SmolVLM Base - OCR Fine-tuned

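This is a merged checkpoint of SmolVLM-Base fine-tuned for OCR tasks: the adapter weights have been merged back into the base model, so it loads as a single standalone model. The adapter was trained with QLoRA on the DeepMount00/ner_training dataset.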

Model Details

  • Base Model: HuggingFaceTB/SmolVLM-Base
  • Task: Optical Character Recognition (OCR)
  • Training Method: QLoRA with 4-bit quantization
  • Target Modules: down_proj, o_proj, k_proj, q_proj, gate_proj, up_proj, v_proj
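
The settings listed above could be expressed roughly as follows with the transformers and peft libraries. This is a sketch, not the author's training script: only the 4-bit quantization and the list of target modules come from this card. The quantization type (nf4), double quantization, compute dtype, rank, alpha, and dropout are illustrative assumptions.

```python
import torch
from peft import LoraConfig
from transformers import BitsAndBytesConfig

# 4-bit quantization for the frozen base weights (the "Q" in QLoRA).
# ASSUMPTION: nf4, double quantization and bf16 compute are common QLoRA
# defaults; the card only states "4-bit quantization".
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_use_double_quant=True,
    bnb_4bit_compute_dtype=torch.bfloat16,
)

# LoRA adapters attached to the attention and MLP projections named in
# "Target Modules" above.
# ASSUMPTION: r, lora_alpha and lora_dropout are placeholder values,
# not taken from the card.
lora_config = LoraConfig(
    r=8,
    lora_alpha=16,
    lora_dropout=0.05,
    target_modules=[
        "down_proj", "o_proj", "k_proj", "q_proj",
        "gate_proj", "up_proj", "v_proj",
    ],
)
```

In a typical QLoRA setup, bnb_config would be passed as quantization_config to from_pretrained, and lora_config to peft's get_peft_model. After training, the adapter is merged into the base weights, which is what produces a merged checkpoint like this one.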

Usage

from transformers import AutoProcessor, Idefics3ForConditionalGeneration
import torch
from PIL import Image

model_id = "DeepMount00/SmolVLM-Base-ocr_base"
processor = AutoProcessor.from_pretrained(model_id)
model = Idefics3ForConditionalGeneration.from_pretrained(model_id)

# Load your image
image = Image.open("path_to_your_image.jpg").convert("RGB")

# Prepare the prompt
messages = [
    {
        "role": "user",
        "content": [
            {"type": "text", "text": "You are a model specialized in OCR"},
            {"type": "image"},
            {"type": "text", "text": "Extract the text from this image"}
        ]
    }
]

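# Process inputs: render the chat messages to a prompt string first,
# because the processor expects text, not a list of message dicts.
# (Assumes the processor ships a chat template, as the base SmolVLM processor does.)
prompt = processor.apply_chat_template(messages, add_generation_prompt=True)
inputs = processor(text=prompt, images=[image], return_tensors="pt")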

# Generate
with torch.no_grad():
    outputs = model.generate(**inputs, max_new_tokens=512)
    
# Decode and print the response
print(processor.decode(outputs[0], skip_special_tokens=True))
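
Because generate returns the prompt tokens followed by the new tokens, the decoded string above contains the prompt as well as the extracted text. A minimal sketch for keeping only the model's reply is shown below. The helper name extract_assistant_reply is hypothetical, and the "Assistant:" marker is an assumption based on the SmolVLM chat template format; check it against the actual decoded output.

```python
def extract_assistant_reply(decoded: str, marker: str = "Assistant:") -> str:
    """Return the text after the last occurrence of `marker`, stripped.

    If the marker is absent, the whole string is returned (stripped).
    ASSUMPTION: the chat template introduces the reply with "Assistant:".
    """
    _, found, tail = decoded.rpartition(marker)
    return tail.strip() if found else decoded.strip()


# Hypothetical decoded output, for illustration only
decoded = "User: Extract the text from this image\nAssistant: Invoice 2024-001"
print(extract_assistant_reply(decoded))  # prints "Invoice 2024-001"
```

If the recognized text may itself contain the marker string, slicing off the prompt tokens before decoding, for example outputs[:, inputs["input_ids"].shape[1]:], is the more robust option.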

Model size: 256M params (Safetensors, tensor type BF16)