Afri-Aya Gemma 3 4B Vision Model (Single File) 🌍

The definitive single-file version of the Afri-Aya Gemma 3 4B vision model for African cultural visual question answering.

🎯 Key Features

  • ✅ Single adapter_model.safetensors file (587MB) - no sharding
  • ✅ GGUF conversion ready - works with llama.cpp conversion tools
  • ✅ Enhanced LoRA v2 - r=64, alpha=64 (4x higher rank than v1)
  • ✅ 13 African languages + English support
  • ✅ Cultural expertise - trained on 2,466 African cultural images

🌍 Supported Languages

English + 13 African Languages: Luganda, Kinyarwanda, Egyptian Arabic, Twi, Hausa, Nyankore, Yoruba, Kirundi, Zulu, Swahili, Gishu, Krio, Igbo

💻 Quick Start

from transformers import AutoModelForVision2Seq, AutoProcessor
import torch
from PIL import Image

# Load model
model = AutoModelForVision2Seq.from_pretrained(
    "Bronsn/afri-aya-gemma-3-4b-vision-single",
    torch_dtype=torch.float16,
    device_map="auto"
)
processor = AutoProcessor.from_pretrained("Bronsn/afri-aya-gemma-3-4b-vision-single")

# Load image
image = Image.open("your_image.jpg")

# Ask about African culture
messages = [
    {
        "role": "user",
        "content": [
            {"type": "text", "text": "What cultural significance does this image have?"},
            {"type": "image"},
        ],
    }
]

# Generate a response
input_text = processor.apply_chat_template(messages, add_generation_prompt=True)
# Move inputs onto the same device as the model (needed with device_map="auto")
inputs = processor(text=input_text, images=image, add_special_tokens=False, return_tensors="pt").to(model.device)

with torch.no_grad():
    output = model.generate(**inputs, max_new_tokens=128, do_sample=True, temperature=1.0, top_p=0.95, top_k=64)
    response = processor.decode(output[0], skip_special_tokens=True)
    print(response)
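
Since this repository ships a LoRA adapter (adapter_model.safetensors) rather than merged weights, you can also attach the adapter to the base checkpoint explicitly with PEFT. A minimal sketch, assuming the peft package is installed:

from transformers import AutoModelForVision2Seq, AutoProcessor
from peft import PeftModel
import torch

# Load the base instruction-tuned checkpoint
base_model = AutoModelForVision2Seq.from_pretrained(
    "unsloth/gemma-3-4b-it",
    torch_dtype=torch.float16,
    device_map="auto"
)

# Attach the Afri-Aya LoRA adapter from this repository
model = PeftModel.from_pretrained(base_model, "Bronsn/afri-aya-gemma-3-4b-vision-single")
processor = AutoProcessor.from_pretrained("Bronsn/afri-aya-gemma-3-4b-vision-single")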

🔄 GGUF Conversion

Perfect for GGUF conversion with no sharding issues:

python convert-hf-to-gguf.py /path/to/model --outdir ./gguf-models/
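
Note that convert-hf-to-gguf.py generally expects a full Hugging Face checkpoint. If you are starting from the adapter alone, one option is to merge it into the base model first with PEFT; a sketch (the output directory name below is only illustrative):

from transformers import AutoModelForVision2Seq, AutoProcessor
from peft import PeftModel
import torch

# Load the base model, apply the adapter, then fold the LoRA weights into the base weights
base_model = AutoModelForVision2Seq.from_pretrained("unsloth/gemma-3-4b-it", torch_dtype=torch.float16)
merged = PeftModel.from_pretrained(base_model, "Bronsn/afri-aya-gemma-3-4b-vision-single").merge_and_unload()

# Save a plain (non-PEFT) checkpoint that the conversion script can read
merged.save_pretrained("./afri-aya-merged")
AutoProcessor.from_pretrained("Bronsn/afri-aya-gemma-3-4b-vision-single").save_pretrained("./afri-aya-merged")

Then point the conversion command above at ./afri-aya-merged.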

📊 Model Details

  • Base: unsloth/gemma-3-4b-it (instruction-tuned)
  • Dataset: CohereLabsCommunity/afri-aya (2,466 images)
  • Training: Enhanced LoRA with r=64, alpha=64 (see the configuration sketch below)
  • File: Single adapter_model.safetensors (587MB)
  • Languages: 14 total (English + 13 African)
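
For reference, the reported hyperparameters correspond to a PEFT LoraConfig along these lines. This is a sketch only: the exact target modules and dropout used in training are not listed on this card, so those fields are placeholders:

from peft import LoraConfig

lora_config = LoraConfig(
    r=64,               # LoRA rank (v1 used r=16)
    lora_alpha=64,      # LoRA scaling factor (v1 used alpha=32)
    lora_dropout=0.0,   # placeholder - actual value not documented here
    bias="none",
    target_modules=[    # placeholder - typical attention/MLP projection layers
        "q_proj", "k_proj", "v_proj", "o_proj",
        "gate_proj", "up_proj", "down_proj",
    ],
)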

πŸ† Performance

v2 Improvements over v1:

  • 4x higher LoRA rank (64 vs 16)
  • 2x higher LoRA alpha (64 vs 32)
  • Both vision + language fine-tuning
  • Single file format (no sharding)

🔗 Related


Created with ❤️ for African culture preservation and education
