SmolVLM Base - OCR Fine-tuned
This is SmolVLM-Base fine-tuned for OCR tasks, with the fine-tuned adapter weights merged back into the base model. The model was trained using QLoRA on the DeepMount00/ner_training dataset.
Model Details
- Base Model: HuggingFaceTB/SmolVLM-Base
- Task: Optical Character Recognition (OCR)
- Training Method: QLoRA with 4-bit quantization
- Target Modules: down_proj, o_proj, k_proj, q_proj, gate_proj, up_proj, v_proj
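For readers who want to set up a comparable training run, the settings listed above could be expressed roughly as follows with `peft` and the `bitsandbytes` integration in `transformers`. This is a sketch, not the original training script: only the 4-bit quantization and the target module list come from this card. The NF4 quantization type, double quantization, bfloat16 compute dtype, and the rank, alpha, and dropout values are illustrative assumptions, and `build_qlora_configs` is a hypothetical helper name.

```python
# Target modules exactly as listed in Model Details.
TARGET_MODULES = [
    "down_proj", "o_proj", "k_proj", "q_proj",
    "gate_proj", "up_proj", "v_proj",
]


def build_qlora_configs(r=8, lora_alpha=16, lora_dropout=0.05):
    """Return (quantization config, LoRA config) for a QLoRA-style setup.

    The default r / lora_alpha / lora_dropout values are assumptions;
    the card does not state the values used for this model.
    """
    # Imported lazily so the module can be loaded without the GPU stack.
    import torch
    from transformers import BitsAndBytesConfig
    from peft import LoraConfig

    bnb_config = BitsAndBytesConfig(
        load_in_4bit=True,                      # from the card: 4-bit quantization
        bnb_4bit_quant_type="nf4",              # assumption
        bnb_4bit_use_double_quant=True,         # assumption
        bnb_4bit_compute_dtype=torch.bfloat16,  # assumption
    )
    lora_config = LoraConfig(
        r=r,
        lora_alpha=lora_alpha,
        lora_dropout=lora_dropout,
        target_modules=TARGET_MODULES,
    )
    return bnb_config, lora_config
```

The quantization config would be passed as `quantization_config=` when loading the base model with `from_pretrained`, and the LoRA config would be applied with `peft.get_peft_model`.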
Usage
import torch
from PIL import Image
from transformers import AutoProcessor, Idefics3ForConditionalGeneration

model_id = "DeepMount00/SmolVLM-Base-ocr_base"
processor = AutoProcessor.from_pretrained(model_id)
model = Idefics3ForConditionalGeneration.from_pretrained(model_id)

# Load your image
image = Image.open("path_to_your_image.jpg").convert("RGB")

# Prepare the prompt
messages = [
    {
        "role": "user",
        "content": [
            {"type": "text", "text": "You are a model specialized in OCR"},
            {"type": "image"},
            {"type": "text", "text": "Extract the text from this image"},
        ],
    }
]

# Render the chat messages into a prompt string; the processor expects text, not a list of dicts
prompt = processor.apply_chat_template(messages, add_generation_prompt=True)

# Process inputs
inputs = processor(text=prompt, images=[image], return_tensors="pt")

# Generate
with torch.no_grad():
    outputs = model.generate(**inputs, max_new_tokens=512)

# Decode and print the response
print(processor.decode(outputs[0], skip_special_tokens=True))
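The sequences returned by `generate` begin with the prompt tokens, so the decoded string above contains the instructions as well as the extracted text. One way to keep only the model's answer is to drop the first `prompt_length` tokens of each sequence before decoding. The helper below is a minimal sketch of that slicing on plain Python lists; `strip_prompt` is a hypothetical name, not part of `transformers`.

```python
def strip_prompt(sequences, prompt_length):
    """Drop the first `prompt_length` token ids from every generated sequence."""
    if prompt_length < 0:
        raise ValueError("prompt_length must be non-negative")
    return [list(seq[prompt_length:]) for seq in sequences]


# Toy example: three prompt tokens followed by two generated tokens.
generated = [[101, 7, 8, 42, 43]]
new_tokens = strip_prompt(generated, prompt_length=3)
print(new_tokens)
```

With the tensors from the usage example, the equivalent slice would be `outputs[:, inputs["input_ids"].shape[1]:]`, which can then be passed to `processor.batch_decode(..., skip_special_tokens=True)`.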