
🤘🤖

🎯 ONE MODEL, ALL MODALITIES, CHAT & EMBED - Unlike pipeline approaches, Senter-Omni is a single 4B-parameter model that understands and reasons across text, images, audio, and video simultaneously.

🔓 OPEN & UNCENSORED - Apache 2.0 licensed, with unrestricted responses for maximum utility.

🧠 128K CONTEXT - Extended RoPE scaling for handling massive documents and long conversations.

💾 MEMORY EFFICIENT - A 4-bit quantized model that fits on consumer GPUs while retaining full multimodal capabilities.


🚀 Quick Start

Installation

git clone https://github.com/SouthpawIN/senter-omni.git
cd senter-omni
pip install -r requirements.txt

# Download the quantized model (instructions below)
# Then run the demo:
python senter_omni_demo.py

Basic Usage

from omni import OmniClient

# Initialize Senter-Omni
client = OmniClient()

# Streaming chat
response = client.chat([
    {"role": "user", "content": "Hello Senter!"}
], stream=True)

# Multimodal chat with image
response = client.chat([
    {"role": "user", "content": [
        {"type": "image", "image": "photo.jpg"},
        {"type": "text", "text": "What do you see?"}
    ]}
])

# Cross-modal embeddings
embedding = client.embed("any content", modality="auto")

🎭 Multimodal Capabilities

Text Understanding & Generation

  • Mathematical Reasoning: Step-by-step problem solving
  • Code Generation: Python, JavaScript, and more
  • Creative Writing: Stories, scripts, poetry
  • Technical Analysis: Complex explanations and documentation

Visual Understanding

  • Image Analysis: Detailed descriptions of visual content
  • Geometric Recognition: Shapes, colors, spatial relationships
  • Creative Interpretation: Stories inspired by images
  • Technical Diagrams: Understanding charts, graphs, schematics

Audio Processing

  • Sound Analysis: Identifying audio content and patterns
  • Speech Understanding: Transcribing and interpreting spoken content
  • Music Analysis: Recognizing musical elements and genres
  • Environmental Audio: Identifying sounds from various sources

Cross-Modal Reasoning

  • Unified Understanding: Connecting information across modalities
  • Contextual Analysis: Using multiple inputs for better reasoning
  • Creative Synthesis: Combining visual, audio, and text inputs for rich responses (see the sketch after this list)
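
For example, an image, an audio clip, and a question can all go into one request. A minimal sketch, assuming the content-list format from Basic Usage also accepts audio items alongside image and text (file names here are placeholders):

# Combine an image, an audio clip, and a question in a single message
response = client.chat([
    {"role": "user", "content": [
        {"type": "image", "image": "scene.jpg"},
        {"type": "audio", "audio": "scene.wav"},
        {"type": "text", "text": "Does the audio match what is happening in the image?"}
    ]}
])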

Model Specifications

  • Parameters: 4B (quantized to 4-bit)
  • Context Length: 128K tokens (RoPE scaled)
  • Memory Usage: ~8GB VRAM
  • Inference Speed: Real-time streaming
  • Modalities: Text, Image, Audio, Video

Embedding Capabilities

  • Unified Space: 1024D embeddings for all modalities
  • Cross-Modal Search: Find similar content across text, images, audio
  • Similarity Matching: Cosine similarity in unified space
  • Memory Efficient: Same model for chat and embeddings

🎯 Real Examples

Image Analysis

# Analyze geometric shapes
response = client.chat([
    {"role": "user", "content": [
        {"type": "image", "image": "test_assets/real_test_image.jpg"},
        {"type": "text", "text": "What geometric shapes do you see?"}
    ]}
])

# Output: "I see a red square, blue square, and green oval arranged vertically"

Audio Understanding

# Process audio content
response = client.chat([
    {"role": "user", "content": [
        {"type": "audio", "audio": "test_assets/real_test_audio.wav"},
        {"type": "text", "text": "What do you hear?"}
    ]}
])

# Output: "I hear an electric hum from a device like a radio or TV"

Creative Multimodal Storytelling

# Create stories from images
response = client.chat([
    {"role": "user", "content": [
        {"type": "image", "image": "shapes.jpg"},
        {"type": "text", "text": "Create a story inspired by this image"}
    ]}
])

# Output: Rich, creative stories combining visual elements with narrative

Cross-Modal Embeddings

# Embed different modalities
text_emb = client.embed("beautiful mountain landscape")
image_emb = client.embed("mountain_photo.jpg", modality="image")
audio_emb = client.embed("nature_sounds.wav", modality="audio")

# All embeddings are in the same 1024D space for comparison
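
Because all three vectors live in the same space, they can be compared directly. A minimal follow-up sketch using the embeddings above (cosine_sim is a local NumPy helper, not part of the client API):

import numpy as np

def cosine_sim(a, b):
    # Cosine similarity between two 1024-D embedding vectors
    a = np.asarray(a, dtype=np.float32)
    b = np.asarray(b, dtype=np.float32)
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

# Higher scores indicate closer content across modalities
print(cosine_sim(text_emb, image_emb))   # text vs. image
print(cosine_sim(text_emb, audio_emb))   # text vs. audio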

🔧 Technical Architecture

Model Details

  • Base: Qwen2.5-Omni-3B (Apache 2.0 licensed)
  • Quantization: 4-bit NF4 for memory efficiency (see the loading sketch after this list)
  • Context Extension: YaRN RoPE scaling to 128K
  • Streaming: Custom TimingStreamer for real-time output
  • Embeddings: Hash-based unified 1024D space
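
For reference, a 4-bit NF4 load with Hugging Face Transformers and bitsandbytes generally looks like the sketch below. This only illustrates the settings listed above; the model class and paths are assumptions, and the demo scripts in this repo handle loading for you.

import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig

# 4-bit NF4 quantization settings (bitsandbytes), matching the spec above;
# bfloat16 is a commonly used compute dtype, assumed here
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,
)

# "./senter_omni_128k" is the local model directory created during setup.
# The 128K window comes from YaRN RoPE scaling already present in its config.
# A multimodal checkpoint may need an Omni-specific model class rather than
# AutoModelForCausalLM; treat this as an illustration, not the repo's loader.
model = AutoModelForCausalLM.from_pretrained(
    "./senter_omni_128k",
    quantization_config=bnb_config,
    device_map="auto",
)
tokenizer = AutoTokenizer.from_pretrained("./senter_omni_128k")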

Training Data

  • 131,893 samples from multiple high-quality datasets:
    • 50,000 ShareGPT conversations (chat)
    • 30,000 AgentCode samples (function calling)
    • 20,000 Stack Overflow samples (coding)
    • 30,000 Hermes-3 samples (instruction tuning)
    • 1,893 Hermes function-calling samples

Key Features

  • XML Tag Support: <think>, <notepad>, <system>, <user>, <assistant>
  • Uncensored Responses: No content restrictions
  • Function Calling: Tool integration capabilities (see the sketch after this list)
  • Memory Efficient: Single model for chat and embeddings
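
A rough function-calling sketch, assuming Hermes-style tool prompting (tool definitions in the system message, tool calls returned inside <tool_call> tags). The tool spec is hypothetical and the exact prompt format this checkpoint expects may differ:

import json

# Hypothetical tool spec used only for illustration
weather_tool = {
    "name": "get_weather",
    "description": "Get the current weather for a city",
    "parameters": {"type": "object", "properties": {"city": {"type": "string"}}},
}

system_prompt = (
    "You have access to the following tools:\n"
    + json.dumps([weather_tool], indent=2)
    + "\nWhen a tool is needed, respond with a <tool_call> block containing "
      "JSON with the tool name and arguments."
)

response = client.chat([
    {"role": "system", "content": system_prompt},
    {"role": "user", "content": "What's the weather like in Berlin right now?"},
])

# Parse the <tool_call> block from the response, execute the tool, and
# feed the result back to the model in a follow-up message.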

📦 Installation & Setup

1. Clone Repository

git clone https://github.com/SouthpawIN/senter-omni.git
cd senter-omni

2. Install Dependencies

pip install -r requirements.txt

3. Download Model

The quantized model (3.5GB) is hosted on Hugging Face due to GitHub's 100MB file limit:

# Option 1: Download from Hugging Face (Recommended)
git lfs install
git clone https://huggingface.co/SouthpawIN/senter-omni-model
cp -r senter-omni-model/* ./senter_omni_128k/

# Option 2: Manual download
# Download from: https://huggingface.co/SouthpawIN/senter-omni-model
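
If you prefer to stay in Python, the same repository can also be fetched with huggingface_hub (assumes pip install huggingface_hub; the repo id matches the URL above):

from huggingface_hub import snapshot_download

# Pull the full model repo into the directory the demo expects
snapshot_download(
    repo_id="SouthpawIN/senter-omni-model",
    local_dir="./senter_omni_128k",
)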

🎮 Interactive Demo

The comprehensive demo showcases all capabilities:

python senter_omni_demo.py

Demo Sections:

  1. 🎓 Training Capabilities - Dataset overview and training features
  2. 💬 Multimodal Chat - Text, image, audio, and combined processing
  3. 🔍 Cross-Modal Embeddings - Unified embedding space demonstration
  4. 🚀 Building Guide - API usage and integration examples

๐Ÿ› ๏ธ API Reference

Core Methods

client.chat(messages, **kwargs)

# Basic chat
response = client.chat([
    {"role": "user", "content": "Hello!"}
])

# With parameters
response = client.chat(
    messages=[{"role": "user", "content": "Hello!"}],
    max_tokens=256,
    temperature=0.7,
    stream=True
)

# Multimodal
response = client.chat([
    {"role": "user", "content": [
        {"type": "image", "image": "photo.jpg"},
        {"type": "text", "text": "Describe this image"}
    ]}
])

client.embed(content, modality="auto")

# Text embedding
emb = client.embed("sample text")

# Image embedding
emb = client.embed("image.jpg", modality="image")

# Audio embedding
emb = client.embed("audio.wav", modality="audio")

# Auto-detect modality
emb = client.embed("[IMAGE] photo.jpg")  # Detects as image

client.cross_search(query, top_k=5)

# Search across modalities
results = client.cross_search("mountain landscape")
# Returns: {"text": [...], "image": [...], "audio": [...]}

client.retrieve_context(query, context_window=5)

# Get relevant context
context = client.retrieve_context("nature scenes")
# Returns multimodal context items

Memory Usage

  • Model Loading: ~8GB VRAM
  • Inference: ~10GB VRAM peak (see the measurement sketch after this list)
  • Embeddings: Shared model (no additional memory)
  • Context (128K): ~2GB additional for full context
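
To verify these numbers on your own GPU, peak VRAM can be read back from PyTorch after a request; a minimal sketch:

import torch

# Reset the allocator's peak counter, run one request, then read the peak
torch.cuda.reset_peak_memory_stats()

response = client.chat([
    {"role": "user", "content": "Summarize the benefits of 4-bit quantization."}
])

peak_gb = torch.cuda.max_memory_allocated() / 1024**3
print(f"Peak VRAM during inference: {peak_gb:.1f} GB")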

Development Setup

git clone https://github.com/SouthpawIN/senter-omni.git
cd senter-omni
pip install -r requirements.txt
python senter_omni_demo.py  # Test installation

📄 License

Apache 2.0 License - See LICENSE for details.

This project uses:

  • Qwen2.5-Omni: Apache 2.0 (Alibaba Cloud)
  • Training Datasets: Various open licenses
  • Code: Apache 2.0

๐Ÿ™ Acknowledgments

  • Alibaba Cloud for Qwen2.5-Omni architecture
  • Nous Research for Hermes dataset and inspiration
  • Alignment Lab AI for development and training
  • Unsloth for efficient training framework
  • HuggingFace for model hosting and tools
  • Open Source Community for datasets and tools

🎭 EXPERIENCE THE FUTURE OF MULTIMODAL AI WITH SENTER-OMNI

Built with ❤️ by sovthpaw at Alignment Lab AI

Donations:

https://www.paypal.me/Sellgames1l
