ocr #22
by xwq777 - opened

README.md CHANGED
@@ -40,7 +40,7 @@ license: mit
 <a href="https://github.com/deepseek-ai/DeepSeek-OCR"><b>🌟 Github</b></a> |
 <a href="https://huggingface.co/deepseek-ai/DeepSeek-OCR"><b>📥 Model Download</b></a> |
 <a href="https://github.com/deepseek-ai/DeepSeek-OCR/blob/main/DeepSeek_OCR_paper.pdf"><b>📄 Paper Link</b></a> |
-<a href="
+<a href=""><b>📄 Arxiv Paper Link</b></a> |
 </p>
 <h2>
 <p align="center">
@@ -98,63 +98,6 @@ res = model.infer(tokenizer, prompt=prompt, image_file=image_file, output_path =
 ## vLLM
 Refer to [🌟GitHub](https://github.com/deepseek-ai/DeepSeek-OCR/) for guidance on model inference acceleration and PDF processing, etc.
 
-[2025/10/23] 🚀🚀🚀 DeepSeek-OCR is now officially supported in upstream [vLLM](https://docs.vllm.ai/projects/recipes/en/latest/DeepSeek/DeepSeek-OCR.html#installing-vllm).
-```shell
-uv venv
-source .venv/bin/activate
-# Until v0.11.1 release, you need to install vLLM from nightly build
-uv pip install -U vllm --pre --extra-index-url https://wheels.vllm.ai/nightly
-```
-
-```python
-from vllm import LLM, SamplingParams
-from vllm.model_executor.models.deepseek_ocr import NGramPerReqLogitsProcessor
-from PIL import Image
-
-# Create model instance
-llm = LLM(
-    model="deepseek-ai/DeepSeek-OCR",
-    enable_prefix_caching=False,
-    mm_processor_cache_gb=0,
-    logits_processors=[NGramPerReqLogitsProcessor]
-)
-
-# Prepare batched input with your image files
-image_1 = Image.open("path/to/your/image_1.png").convert("RGB")
-image_2 = Image.open("path/to/your/image_2.png").convert("RGB")
-prompt = "<image>\nFree OCR."
-
-model_input = [
-    {
-        "prompt": prompt,
-        "multi_modal_data": {"image": image_1}
-    },
-    {
-        "prompt": prompt,
-        "multi_modal_data": {"image": image_2}
-    }
-]
-
-sampling_param = SamplingParams(
-    temperature=0.0,
-    max_tokens=8192,
-    # ngram logit processor args
-    extra_args=dict(
-        ngram_size=30,
-        window_size=90,
-        whitelist_token_ids={128821, 128822},  # whitelist: <td>, </td>
-    ),
-    skip_special_tokens=False,
-)
-# Generate output
-model_outputs = llm.generate(model_input, sampling_param)
-
-# Print output
-for output in model_outputs:
-    print(output.outputs[0].text)
-```
-
-
 ## Visualizations
 <table>
 <tr>
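The example removed above leans on vLLM's `NGramPerReqLogitsProcessor` (`ngram_size=30`, `window_size=90`): it bans any token that would complete an n-gram already emitted inside the trailing window, which guards against the repetition loops greedy OCR decoding (`temperature=0.0`) is prone to, while `whitelist_token_ids={128821, 128822}` keeps the table tags `<td>`/`</td>` available even though they legitimately repeat. A minimal standalone sketch of that blocking rule (illustrative only, not vLLM's actual implementation):

```python
import math

def ban_repeated_ngrams(logits, output_ids, ngram_size=30, window_size=90,
                        whitelist=frozenset({128821, 128822})):
    """Illustrative n-gram repetition blocker (not vLLM's code).

    `logits` is a mutable sequence indexed by token id; `output_ids` is the
    list of tokens generated so far for this one request.
    """
    if len(output_ids) < ngram_size:
        return logits
    window = output_ids[-window_size:]              # only look back this far
    prefix = tuple(output_ids[-(ngram_size - 1):])  # current (n-1)-gram
    for i in range(len(window) - ngram_size + 1):
        # An earlier occurrence of the prefix: the token that followed it
        # would complete a repeated n-gram, so make it unsamplable...
        if tuple(window[i:i + ngram_size - 1]) == prefix:
            next_token = window[i + ngram_size - 1]
            if next_token not in whitelist:         # ...unless whitelisted
                logits[next_token] = -math.inf
    return logits
```

In the removed snippet the real processor is wired in once at `LLM(..., logits_processors=[NGramPerReqLogitsProcessor])` and parameterized per request through `SamplingParams(extra_args=dict(...))`, which is what makes the blocking per-request rather than global.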
@@ -176,10 +119,4 @@ We also appreciate the benchmarks: [Fox](https://github.com/ucaslcl/Fox), [Omini
 
 
 ## Citation
-
-@article{wei2025deepseek,
-  title={DeepSeek-OCR: Contexts Optical Compression},
-  author={Wei, Haoran and Sun, Yaofeng and Li, Yukun},
-  journal={arXiv preprint arXiv:2510.18234},
-  year={2025}
-}
+Coming soon!
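For comparison, the second hunk's context line references the repository's plain-Transformers path, `res = model.infer(tokenizer, prompt=prompt, image_file=image_file, output_path=...)`. A hedged sketch of that path, assuming only what the context line shows; the dtype/device handling and the `output_path` value are placeholders:

```python
import torch
from transformers import AutoModel, AutoTokenizer

# `infer` is provided by the model's remote code (trust_remote_code=True);
# the keyword arguments mirror the diff's hunk context, nothing more.
model_name = "deepseek-ai/DeepSeek-OCR"
tokenizer = AutoTokenizer.from_pretrained(model_name, trust_remote_code=True)
model = AutoModel.from_pretrained(model_name, trust_remote_code=True,
                                  use_safetensors=True)
model = model.eval().cuda().to(torch.bfloat16)  # assumed GPU/bf16 setup

prompt = "<image>\nFree OCR."             # same prompt as the vLLM example
image_file = "path/to/your/image_1.png"   # placeholder path
res = model.infer(tokenizer, prompt=prompt, image_file=image_file,
                  output_path="outputs/")  # placeholder output directory
```

Unlike this one-image-per-call path, the removed vLLM snippet batches several images in a single `generate` call, which is the point of the acceleration guidance linked above.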
