Files changed (1)
  1. README.md +2 -65
README.md CHANGED
@@ -40,7 +40,7 @@ license: mit
  <a href="https://github.com/deepseek-ai/DeepSeek-OCR"><b>🌟 Github</b></a> |
  <a href="https://huggingface.co/deepseek-ai/DeepSeek-OCR"><b>📥 Model Download</b></a> |
  <a href="https://github.com/deepseek-ai/DeepSeek-OCR/blob/main/DeepSeek_OCR_paper.pdf"><b>📄 Paper Link</b></a> |
- <a href="https://arxiv.org/abs/2510.18234"><b>📄 Arxiv Paper Link</b></a> |
+ <a href=""><b>📄 Arxiv Paper Link</b></a> |
  </p>
  <h2>
  <p align="center">
@@ -98,63 +98,6 @@ res = model.infer(tokenizer, prompt=prompt, image_file=image_file, output_path =
  ## vLLM
  Refer to [🌟GitHub](https://github.com/deepseek-ai/DeepSeek-OCR/) for guidance on model inference acceleration and PDF processing, etc.<!-- -->

- [2025/10/23] 🚀🚀🚀 DeepSeek-OCR is now officially supported in upstream [vLLM](https://docs.vllm.ai/projects/recipes/en/latest/DeepSeek/DeepSeek-OCR.html#installing-vllm).
- ```shell
- uv venv
- source .venv/bin/activate
- # Until the v0.11.1 release, you need to install vLLM from the nightly build
- uv pip install -U vllm --pre --extra-index-url https://wheels.vllm.ai/nightly
- ```
-
- ```python
- from vllm import LLM, SamplingParams
- from vllm.model_executor.models.deepseek_ocr import NGramPerReqLogitsProcessor
- from PIL import Image
-
- # Create the model instance
- llm = LLM(
-     model="deepseek-ai/DeepSeek-OCR",
-     enable_prefix_caching=False,
-     mm_processor_cache_gb=0,
-     logits_processors=[NGramPerReqLogitsProcessor]
- )
-
- # Prepare batched input with your image files
- image_1 = Image.open("path/to/your/image_1.png").convert("RGB")
- image_2 = Image.open("path/to/your/image_2.png").convert("RGB")
- prompt = "<image>\nFree OCR."
-
- model_input = [
-     {
-         "prompt": prompt,
-         "multi_modal_data": {"image": image_1}
-     },
-     {
-         "prompt": prompt,
-         "multi_modal_data": {"image": image_2}
-     }
- ]
-
- sampling_param = SamplingParams(
-     temperature=0.0,
-     max_tokens=8192,
-     # ngram logits processor args
-     extra_args=dict(
-         ngram_size=30,
-         window_size=90,
-         whitelist_token_ids={128821, 128822},  # whitelist: <td>, </td>
-     ),
-     skip_special_tokens=False,
- )
- # Generate output
- model_outputs = llm.generate(model_input, sampling_param)
-
- # Print output
- for output in model_outputs:
-     print(output.outputs[0].text)
- ```
-
-
  ## Visualizations
  <table>
  <tr>
@@ -176,10 +119,4 @@ We also appreciate the benchmarks: [Fox](https://github.com/ucaslcl/Fox), [Omini


  ## Citation
- ```bibtex
- @article{wei2025deepseek,
-   title={DeepSeek-OCR: Contexts Optical Compression},
-   author={Wei, Haoran and Sun, Yaofeng and Li, Yukun},
-   journal={arXiv preprint arXiv:2510.18234},
-   year={2025}
- }
+ Coming soon!
 
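For context, the second hunk's header shows the start of the README's Transformers inference call (`res = model.infer(tokenizer, prompt=prompt, image_file=image_file, output_path = ...`), which is unchanged by this diff. A minimal sketch of that path is below; the argument values and the exact signature of the repo's custom `infer()` helper are assumptions based on the surrounding README, not verbatim content of this diff.

```python
# Minimal sketch of the Transformers inference path referenced by the second
# hunk's context line. The infer() keyword arguments shown here are assumptions,
# not taken verbatim from this diff.
import torch
from transformers import AutoModel, AutoTokenizer

model_name = "deepseek-ai/DeepSeek-OCR"
tokenizer = AutoTokenizer.from_pretrained(model_name, trust_remote_code=True)
model = AutoModel.from_pretrained(
    model_name,
    trust_remote_code=True,
    use_safetensors=True,
)
model = model.eval().cuda().to(torch.bfloat16)

prompt = "<image>\nFree OCR."  # same prompt as in the vLLM example above
image_file = "path/to/your/image.png"

# infer() is the custom helper shipped with the repo's trust_remote_code modeling file.
res = model.infer(
    tokenizer,
    prompt=prompt,
    image_file=image_file,
    output_path="path/to/output/",
)
print(res)
```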