Afri-Aya Gemma 3 4B Vision Model (Single File)
The definitive single-file version of the Afri-Aya Gemma 3 4B vision model for African cultural visual question answering.
Key Features
- Single adapter_model.safetensors file (587MB) - NO SHARDING
- GGUF conversion ready - suited to llama.cpp and other conversion tools
- Enhanced LoRA v2 - r=64, alpha=64 (4x the LoRA rank of v1)
- 13 African languages + English support
- Cultural expertise - trained on 2,466 African cultural images
Supported Languages
English + 13 African Languages: Luganda, Kinyarwanda, Egyptian Arabic, Twi, Hausa, Nyankore, Yoruba, Kirundi, Zulu, Swahili, Gishu, Krio, Igbo
Quick Start
from transformers import AutoModelForVision2Seq, AutoProcessor
import torch
from PIL import Image
# Load model
model = AutoModelForVision2Seq.from_pretrained(
    "Bronsn/afri-aya-gemma-3-4b-vision-single",
    torch_dtype=torch.float16,
    device_map="auto"
)
processor = AutoProcessor.from_pretrained("Bronsn/afri-aya-gemma-3-4b-vision-single")
# Load image
image = Image.open("your_image.jpg")
# Ask about African culture
messages = [
    {
        "role": "user",
        "content": [
            {"type": "text", "text": "What cultural significance does this image have?"},
            {"type": "image"},
        ],
    }
]
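# Generate response
input_text = processor.apply_chat_template(messages, add_generation_prompt=True)
inputs = processor(
    images=image, text=input_text, add_special_tokens=False, return_tensors="pt"
).to(model.device)  # move the input tensors to the same device as the model
with torch.no_grad():
    # do_sample=True makes temperature, top_p, and top_k take effect
    output = model.generate(**inputs, max_new_tokens=128, do_sample=True, temperature=1.0, top_p=0.95, top_k=64)
# The decoded text contains the prompt followed by the model's answer
response = processor.decode(output[0], skip_special_tokens=True)
print(response)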
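Because `generate` returns the prompt tokens followed by the newly generated tokens, the decoded response in the Quick Start also repeats the prompt. A minimal sketch of trimming it is shown here; `strip_prompt` is a hypothetical helper (not part of transformers), and the token ids are made-up placeholders rather than real model output.

```python
def strip_prompt(output_ids, prompt_length):
    """Return only the tokens that come after the first prompt_length tokens."""
    return output_ids[prompt_length:]


# Toy token ids standing in for a real prompt and a real generation
prompt_ids = [2, 105, 2364, 107]
generated = prompt_ids + [9259, 563, 1]

new_tokens = strip_prompt(generated, len(prompt_ids))
print(new_tokens)
```

With the Quick Start variables, the same slicing would be `strip_prompt(output[0], inputs["input_ids"].shape[-1])`, and the result can then be passed to `processor.decode`.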
GGUF Conversion
Perfect for GGUF conversion with no sharding issues:
python convert_hf_to_gguf.py /path/to/model --outfile ./gguf-models/
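Before converting, it can be useful to confirm that a local model directory really holds unsharded weights. The sketch below is one way to check, assuming the common Hugging Face naming convention for sharded checkpoints (files such as `model-00001-of-00002.safetensors` plus a `*.safetensors.index.json` index). `find_shards` is a hypothetical helper, and the temporary directory only simulates a download.

```python
import re
import tempfile
from pathlib import Path

# Matches shard suffixes such as "-00001-of-00002.safetensors"
SHARD_PATTERN = re.compile(r"-\d{5}-of-\d{5}\.safetensors$")


def find_shards(model_dir):
    """Return sorted names of files that look like weight shards or a shard index."""
    names = []
    for path in Path(model_dir).iterdir():
        is_shard = SHARD_PATTERN.search(path.name) is not None
        is_index = path.name.endswith(".safetensors.index.json")
        if is_shard or is_index:
            names.append(path.name)
    return sorted(names)


# Simulate a single-file directory, then add one shard-style file to it
with tempfile.TemporaryDirectory() as tmp:
    (Path(tmp) / "adapter_model.safetensors").touch()
    single = find_shards(tmp)
    (Path(tmp) / "model-00001-of-00002.safetensors").touch()
    sharded = find_shards(tmp)

print("single-file directory:", single)
print("after adding a shard:", sharded)
```

An empty list means no shard-style files were found in the directory.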
Model Details
- Base: unsloth/gemma-3-4b-it (instruction-tuned)
- Dataset: CohereLabsCommunity/afri-aya (2,466 images)
- Training: Enhanced LoRA r=64, alpha=64
- File: Single adapter_model.safetensors (587MB)
- Languages: 14 total (English + 13 African)
Performance
v2 Improvements over v1:
- 4x higher LoRA rank (64 vs 16)
- 2x higher LoRA alpha (64 vs 32)
- Both vision + language fine-tuning
- Single file format (no sharding)
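The rank and alpha figures above can be put side by side with a little arithmetic. The sketch below assumes the standard LoRA formulation, in which the low-rank update is scaled by alpha / r and each adapted linear layer adds r * (d_in + d_out) trainable parameters. The model card does not state which scaling variant was used in training, and the 1024 x 1024 layer size is purely illustrative, not taken from the model.

```python
def lora_scaling(alpha, r):
    """Scaling factor applied to the low-rank update in standard LoRA."""
    return alpha / r


def lora_params(d_in, d_out, r):
    """Trainable parameters added by one LoRA adapter on a d_in x d_out linear layer."""
    return r * (d_in + d_out)


v1 = {"r": 16, "alpha": 32}
v2 = {"r": 64, "alpha": 64}

rank_ratio = v2["r"] / v1["r"]            # 64 / 16
alpha_ratio = v2["alpha"] / v1["alpha"]   # 64 / 32
v1_scale = lora_scaling(v1["alpha"], v1["r"])
v2_scale = lora_scaling(v2["alpha"], v2["r"])

# Illustrative layer size only (hypothetical, not read from the model config)
v1_layer_params = lora_params(1024, 1024, v1["r"])
v2_layer_params = lora_params(1024, 1024, v2["r"])

print(rank_ratio, alpha_ratio, v1_scale, v2_scale)
print(v1_layer_params, v2_layer_params)
```

Under these assumptions, v2 has four times as many adapter parameters per layer as v1, while its alpha / r scaling factor is half of v1's.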
Related
- Dataset: CohereLabsCommunity/afri-aya
- Base Model: unsloth/gemma-3-4b-it
Created with ❤️ for African culture preservation and education