Patram-7B-Instruct

Patram-7B-Instruct by BharatGen is a 7B-parameter vision-language model trained from scratch for visual document understanding. As India's first document foundation model, it is built to tackle complex document analysis. The model was trained on a carefully curated instruction-tuning dataset that combines diverse public and custom synthetic data to support a broad spectrum of document understanding tasks.

Model Overview

  • Architecture: Vision Transformer (ViT) + MLP projector + OLMo-7B LLM
  • Training Data: BharatDocs-v1 (a dataset of diverse Indian documents) plus other open-source document datasets
  • Supported I/O Formats: The model currently accepts English-language instructions and image files (e.g., PNG, JPEG) as input. The output is provided in text format.
  • Language: English (Indian language support upcoming)
  • License: Apache 2.0

Usage Examples

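Load the model with the Hugging Face transformers library. The example passes trust_remote_code=True because it relies on Patram-specific code shipped with the model, such as generate_from_batch.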

import torch
from transformers import AutoProcessor, AutoModelForCausalLM, GenerationConfig
from PIL import Image
import requests

# Model ID and device setup
model_id = "bharatgenai/patram-7b-instruct"
device = "cuda" if torch.cuda.is_available() else "cpu"

# Load processor and model
processor = AutoProcessor.from_pretrained(model_id, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    trust_remote_code=True
).to(device)

def get_patram_response(image_path_or_url, question):
    try:
        # Load image
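        if image_path_or_url.startswith(("http://", "https://")):
            # Fail fast on network errors instead of hanging or decoding an error page
            resp = requests.get(image_path_or_url, stream=True, timeout=30)
            resp.raise_for_status()
            resp.raw.decode_content = True  # transparently undo gzip/deflate encoding
            image = Image.open(resp.raw).convert("RGB")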
        else:
            image = Image.open(image_path_or_url).convert("RGB")
    except Exception as e:
        print(f"Error loading image: {e}")
        return None

    # Format the prompt as expected
    prompt = f"Question: {question} Answer based on the image."

    try:
        # Preprocess image and text using the processor
        inputs = processor.process(images=[image], text=prompt)
        inputs = {k: v.to(device).unsqueeze(0) for k, v in inputs.items()}

        # Generate output using model's generate_from_batch method (Patram-specific)
        output = model.generate_from_batch(
            inputs,
            GenerationConfig(max_new_tokens=200, stop_strings="<|endoftext|>"),
            tokenizer=processor.tokenizer
        )

        # Extract generated tokens (excluding input tokens) and decode
        generated_tokens = output[0, inputs['input_ids'].size(1):]
        response = processor.tokenizer.decode(generated_tokens, skip_special_tokens=True).strip()
        return response
    except Exception as e:
        print(f"Error during inference: {e}")
        return None

# Example usage:
# image_input = "https://knowscope.in/wp-content/uploads/2025/05/cghd-nag.png"
# question = "Who issued this notice?"
# answer = get_patram_response(image_input, question)
# if answer:
#     print("Answer:", answer)
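Before loading the model it can help to estimate how much memory the weights alone will need. The sketch below is simple arithmetic based on the figures listed on the model page (7.68B parameters, stored as F32). The helper name weight_footprint_gib is illustrative, and the 16-bit figure is hypothetical: it assumes the model's custom code tolerates a half-precision load, which this card does not confirm.

```python
def weight_footprint_gib(num_params, bytes_per_param):
    """Approximate size of the weights alone, in GiB (no activations or KV cache)."""
    return num_params * bytes_per_param / 2**30


NUM_PARAMS = 7.68e9  # parameter count listed on the model page

fp32_gib = weight_footprint_gib(NUM_PARAMS, 4)  # F32, as the checkpoint is stored
half_gib = weight_footprint_gib(NUM_PARAMS, 2)  # hypothetical 16-bit load (assumption)

print(f"float32: {fp32_gib:.1f} GiB, 16-bit: {half_gib:.1f} GiB")
```

Actual usage during inference will be higher, since the estimate ignores activations, the image encoder's intermediate tensors, and the generation cache.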

Evaluations

We evaluated Patram-7B-Instruct alongside other vision-language models (VLMs) in the 7B–9B parameter range across multiple public document benchmarks.

Benchmarks: DocVQA, VisualMRC, Patram-Bench

Patram-Bench is an in-house benchmark designed for Indic Document VQA.

Metric: G-Eval (LLM-as-a-judge)
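This card does not describe how the judge was configured. As a rough illustration only: G-Eval, as originally proposed, asks an LLM judge for a rating on a fixed scale and weights each candidate rating by the probability the judge assigns to it. The sketch below assumes a 1-5 scale and a linear rescaling to [0, 1]; both are assumptions, as are the function names and the example probabilities.

```python
def expected_rating(rating_probs):
    """Probability-weighted rating: sum of rating * P(rating)."""
    total = sum(rating_probs.values())
    if total <= 0:
        raise ValueError("rating probabilities must sum to a positive value")
    # Renormalise in case the judge put some mass on non-rating tokens
    return sum(r * p for r, p in rating_probs.items()) / total


def normalise(score, low=1, high=5):
    """Map a rating on [low, high] onto [0, 1] (assumed rescaling)."""
    return (score - low) / (high - low)


# Hypothetical judge output for one answer: P(rating) over a 1-5 scale
judge_probs = {1: 0.0, 2: 0.0, 3: 0.1, 4: 0.3, 5: 0.6}
raw = expected_rating(judge_probs)
print(round(raw, 3), round(normalise(raw), 3))
```

A benchmark score would then be the average of these per-answer values over all questions in that benchmark.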

Model                    Overall   DocVQA   Patram-Bench   VisualMRC
claude-3.7-sonnet        0.8830    0.8480   0.8857         0.8830
Qwen2.5-VL-7B-Instruct   0.8759    0.8722   0.6816         0.9169
gemma-3-12b-it           0.8556    0.8451   0.6349         0.9069
patram-7b-instruct       0.8331    0.8550   0.6515         0.8510
InternVL3-9B             0.7865    0.8681   0.6888         0.7405
deepseek-vl2             0.7581    0.8739   0.5089         0.7144

*Note: The benchmarked results reflect the API variant.
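The card does not state how the Overall column is aggregated. The short check below, using only the numbers reported in the results table, shows that Overall is not the plain average of the three benchmark columns; every reported value is higher. One possible explanation is a weighting by the number of questions per benchmark, but that is a guess and not something the card confirms.

```python
# (Overall, DocVQA, Patram-Bench, VisualMRC), copied from the reported results
scores = {
    "claude-3.7-sonnet":      (0.8830, 0.8480, 0.8857, 0.8830),
    "Qwen2.5-VL-7B-Instruct": (0.8759, 0.8722, 0.6816, 0.9169),
    "gemma-3-12b-it":         (0.8556, 0.8451, 0.6349, 0.9069),
    "patram-7b-instruct":     (0.8331, 0.8550, 0.6515, 0.8510),
    "InternVL3-9B":           (0.7865, 0.8681, 0.6888, 0.7405),
    "deepseek-vl2":           (0.7581, 0.8739, 0.5089, 0.7144),
}


def unweighted_mean(row):
    """Plain average of the three per-benchmark scores (skips Overall)."""
    per_benchmark = row[1:]
    return sum(per_benchmark) / len(per_benchmark)


# Gap between the reported Overall and the plain average, per model
gaps = {model: row[0] - unweighted_mean(row) for model, row in scores.items()}

for model, gap in gaps.items():
    mean = unweighted_mean(scores[model])
    print(f"{model:24s} reported={scores[model][0]:.4f} mean={mean:.4f} gap={gap:+.4f}")
```

When comparing models, it is therefore safer to read the per-benchmark columns directly than to rely on Overall alone.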

Citation

@online{BharatGenPatramLaunch2025,
  author    = {{BharatGen Team}},
  title     = {BharatGen Unveils Patram: India's Pioneering Vision-Language Foundation Model for Document Intelligence},
  year      = {2025},
  url       = {https://bharatgen.com/blog/patram-launch},
  urldate   = {2025-06-02}
}

Resources

Authors

  • Principal Investigators: Prof. Ravi Kiran Sarvadevabhatla, Prof. Ganesh Ramakrishnan
  • Contributors: BharatGen Team

Contact

Model size: 7.68B params (Safetensors, tensor type F32)