Model Overview

  • Model Architecture: DeepSeek-OCR
    • Input: Image/Text
    • Output: Text
  • Supported Hardware Microarchitecture: AMD Instinct MI300/MI350/MI355
  • ROCm: 7.1.0
  • PyTorch: 2.8.0
  • Transformers: 4.57.3
  • Operating System(s): Linux

Model Details

The official deepseek-ai/DeepSeek-OCR release pins the transformers library to version 4.46.3 and has not been updated to support the latest release. In this community edition, the modeling_deepseekocr.py file has been updated for improved usability, and modeling_deepseekv2.py has been removed in favor of the DeepSeekV2 model definitions provided by the transformers library, eliminating the need to downgrade transformers.

This model can be quantized with AMD Quark; the resulting MXFP4-quantized model is available at amd/DeepSeek-OCR-MXFP4.

Usage

from transformers import AutoModel, AutoTokenizer
import torch
import os

# Select the GPU to use on ROCm.
os.environ["HIP_VISIBLE_DEVICES"] = "0"

model_name = "amd/DeepSeek-OCR"

tokenizer = AutoTokenizer.from_pretrained(model_name, trust_remote_code=True)
model = AutoModel.from_pretrained(
    model_name,
    _attn_implementation="flash_attention_2",
    trust_remote_code=True,
    use_safetensors=True,
)
model = model.eval().cuda().to(torch.bfloat16)

# prompt = "<image>\nFree OCR. "
prompt = "<image>\n<|grounding|>Convert the document to markdown. "
image_file = "your_image.jpg"
output_path = "your/output/dir"

# Signature of the custom inference method:
# infer(tokenizer, prompt='', image_file='', output_path=' ', base_size=1024,
#       image_size=640, crop_mode=True, test_compress=False, save_results=False)

# Resolution presets:
# Tiny:   base_size = 512,  image_size = 512,  crop_mode = False
# Small:  base_size = 640,  image_size = 640,  crop_mode = False
# Base:   base_size = 1024, image_size = 1024, crop_mode = False
# Large:  base_size = 1280, image_size = 1280, crop_mode = False
# Gundam: base_size = 1024, image_size = 640,  crop_mode = True

res = model.infer(
    tokenizer,
    prompt=prompt,
    image_file=image_file,
    output_path=output_path,
    base_size=1024,
    image_size=640,
    crop_mode=True,
    save_results=True,
    test_compress=True,
)
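The resolution presets listed above are just fixed combinations of base_size, image_size, and crop_mode. A small helper like the one below (hypothetical — RESOLUTION_PRESETS and infer_with_preset are not part of the released model API) makes the mapping explicit so you can switch presets by name:

```python
# Hypothetical convenience wrapper around model.infer; the preset names and
# values come from the comments in the usage snippet above.
RESOLUTION_PRESETS = {
    "tiny":   {"base_size": 512,  "image_size": 512,  "crop_mode": False},
    "small":  {"base_size": 640,  "image_size": 640,  "crop_mode": False},
    "base":   {"base_size": 1024, "image_size": 1024, "crop_mode": False},
    "large":  {"base_size": 1280, "image_size": 1280, "crop_mode": False},
    "gundam": {"base_size": 1024, "image_size": 640,  "crop_mode": True},
}

def infer_with_preset(model, tokenizer, prompt, image_file, output_path,
                      preset="gundam", **kwargs):
    """Run model.infer with one of the named resolution presets."""
    params = RESOLUTION_PRESETS[preset]
    return model.infer(
        tokenizer,
        prompt=prompt,
        image_file=image_file,
        output_path=output_path,
        **params,
        **kwargs,
    )
```

For example, `infer_with_preset(model, tokenizer, prompt, image_file, output_path, preset="small", save_results=True)` runs the Small configuration without repeating the size parameters.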

License

Modifications Copyright(c) 2025 Advanced Micro Devices, Inc. All rights reserved.
