# Qalam-Net V2: Advanced Arabic OCR

Qalam-Net V2 (قلم-نت) is a high-performance Arabic Optical Character Recognition (OCR) system. Built on the TrOCR (Transformer-based OCR) architecture, it achieves superior accuracy by treating OCR as a sequence-to-sequence problem, mapping visual features directly to text tokens.
## Architecture Visualization
The model utilizes a Vision-Encoder-Decoder framework, specifically optimized for the complexities of Arabic script (ligatures, cursive nature, and right-to-left orientation).
```mermaid
graph TD
    A[Input Arabic Image] --> B[ViT Encoder]
    B -->|Visual Embeddings| C[Cross-Attention]
    D[Previous Tokens] --> E[RoBERTa Decoder]
    E --> C
    C --> F[Next Token Prediction]
    F -->|Generated Text| G[Final Arabic Transcription]

    subgraph "Encoder (Vision Transformer)"
        B
    end

    subgraph "Decoder (Language Model)"
        E
    end
```
## Key Features

- End-to-End Transformer: No reliance on traditional CNN-RNN architectures or complex preprocessing (such as line segmentation).
- Arabic Script Specialist: Fine-tuned on the `mssqpi/Arabic-OCR-Dataset` for robust handling of various Arabic fonts and styles.
- State-of-the-Art Accuracy: Leverages pre-trained vision and language weights from `microsoft/trocr-base-handwritten`.
- Flexible Deployment: Supports CUDA, MPS (Apple Silicon), and CPU execution.
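The deployment targets in the last bullet can be selected at runtime with a small helper along these lines (a sketch; `pick_device` is an illustrative name, not part of this model's API):

```python
import torch

def pick_device() -> torch.device:
    """Prefer CUDA, then Apple Silicon MPS, then fall back to CPU."""
    if torch.cuda.is_available():
        return torch.device("cuda")
    if torch.backends.mps.is_available():
        return torch.device("mps")
    return torch.device("cpu")

device = pick_device()
print(f"Running inference on: {device}")
```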
## How It Works
Qalam-Net V2 differs from traditional OCR by eliminating the need for an external language model or a separate CTC (Connectionist Temporal Classification) layer.
- Visual Feature Extraction: The encoder divides the input image into patches and processes them via a Vision Transformer (ViT).
- Contextual Decoding: The decoder (RoBERTa-based) attends to both the visual features and the previously generated tokens to predict the next character or word.
- Arabic Optimization: During fine-tuning, the tokenizer and embeddings were adapted to capture the nuances of Arabic UTF-8 encoding.
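The patching step above can be illustrated with simple arithmetic. The 384×384 input resolution and 16×16 patch size below are assumed TrOCR-style defaults, not values read from this model's config:

```python
# How a ViT encoder turns one image into a sequence of visual tokens.
# Sizes are assumed TrOCR defaults; the exact values live in the model config.
image_size = 384   # images are resized to image_size x image_size
patch_size = 16    # each patch becomes one visual embedding

patches_per_side = image_size // patch_size
num_patches = patches_per_side ** 2

print(patches_per_side)  # 24
print(num_patches)       # 576 embeddings for the decoder to cross-attend over
```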
## Performance Metrics
The model was fine-tuned for 1 epoch on a high-quality selection of 5,000 samples.
| Metric | Value |
|---|---|
| Training Samples | 5,000 |
| Optimizer | AdamW |
| Learning Rate | 3e-5 |
| Convergence (Loss) | 9.5 β 0.03 |
Even with a single epoch, the model reached a training loss of 0.03, indicating highly efficient transfer learning from the base TrOCR weights.
## Getting Started

### Installation

```bash
pip install transformers datasets Pillow torch
```
### Quick Inference Example
```python
import torch
from PIL import Image, ImageDraw, ImageFont
from transformers import TrOCRProcessor, VisionEncoderDecoderModel

MODEL_NAME = "Ali0044/Qalam_Net_V2"

# Load the processor (image preprocessing + tokenizer) and the model.
processor = TrOCRProcessor.from_pretrained(MODEL_NAME)
model = VisionEncoderDecoderModel.from_pretrained(MODEL_NAME)

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
model.to(device)
model.eval()

def run_ocr(image):
    """Run the full image-to-text pipeline on a PIL image."""
    pixel_values = processor(image, return_tensors="pt").pixel_values.to(device)
    with torch.no_grad():
        generated_ids = model.generate(pixel_values)
    return processor.batch_decode(generated_ids, skip_special_tokens=True)[0]

# Render a small synthetic test image containing Arabic text.
image = Image.new("RGB", (200, 50), color="white")
d = ImageDraw.Draw(image)
try:
    # Note: DejaVu Sans has no Arabic glyphs; substitute an Arabic-capable
    # font (e.g. Noto Naskh Arabic) for a meaningful test image.
    font = ImageFont.truetype("/usr/share/fonts/truetype/dejavu/DejaVuSans-Bold.ttf", 20)
except IOError:
    font = ImageFont.load_default()
d.text((10, 10), "المتميزة", fill=(0, 0, 0), font=font)

print(f"Predicted Transcription: {run_ocr(image)}")
image.show()
```
## Ethical Considerations & Limitations
- Language Scope: Primarily optimized for Modern Standard Arabic (MSA). Performance on historical scripts or specific dialects may vary.
- Image Quality: Performs best on clear, well-lit text snippets. Handwriting recognition is supported but may require higher resolution inputs.
- Privacy: Ensure you have the rights to process any personal data contained within images when using this model in production.
## Contributing & License

Contributions are what make the open-source community an amazing place to learn, inspire, and create.

- License: Distributed under the Apache 2.0 License.
- Contact: Reach out via GitHub or Hugging Face at `Ali0044`.