# PaddleOCR-VL-1.5.axera
Inference example project for PaddleOCR-VL-1.5 on Axera NPUs.
- Python inference is currently supported. C++ examples are under development.
- Prebuilt model files are available on HuggingFace.
- If you want to export/convert models by yourself, refer to the model conversion guide.
## Supported Platforms
- AX650N
- AX620E
## End-to-End Metrics
| Metric | Value |
|---|---|
| Max TTFT (640 tokens) | 361.8 ms |
| Decode speed | 44.6 tokens/s |
| ViT latency (576x768) | 1685.554 ms |
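As a quick sanity check on the table above, the reported decode throughput can be converted into a per-token latency. This is plain arithmetic on the published numbers, not a new measurement:

```python
# Convert the reported decode throughput (from the metrics table) to per-token latency.
decode_tokens_per_s = 44.6
ms_per_token = 1000 / decode_tokens_per_s
print(f"{ms_per_token:.1f} ms per decoded token")  # ~22.4 ms
```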
## Repository Layout

```text
.
├── assets
│   ├── IMG_0059.JPG
│   ├── IMG_0462.JPG
│   ├── IMG_0675.JPG
│   └── gradio_demo.png
├── gradio_demo.py
├── infer_axmodel.py
├── infer_torch.py
├── paddleocr_vl_1-5_ax650n_axmodel/
├── paddleocr_vl_1-5_tokenizer/
├── utils/
├── vit_models/
└── README.md
```
## Quick Start
### 1. Clone the repository

```bash
git clone https://huggingface.co/AXERA-TECH/PaddleOCR-VL-1.5
cd PaddleOCR-VL-1.5
```
### 2. PyTorch Reference Inference

Run:

```bash
python3 infer_torch.py
```
Note: `infer_torch.py` uses hardcoded `model_path` and `image_path` values; adjust them in the script before running if needed.
### 3. AxModel Inference

Run from the project root:

```bash
python3 infer_axmodel.py \
    --hf_model ./paddleocr_vl_1-5_tokenizer \
    --axmodel_path ./paddleocr_vl_1-5_ax650n_axmodel \
    --vit_model_path ./vit_models/vit_576x768.axmodel \
    --image_path ./assets/IMG_0462.JPG \
    --task ocr
```
Available values for `--task`: `ocr`, `table`, `chart`, `formula`, `spotting`, `seal`.
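For reference, the allowed task values behave like an `argparse` `choices` constraint. The snippet below is an illustrative sketch, not the actual implementation in `infer_axmodel.py`:

```python
import argparse

# Documented values accepted by the --task flag.
TASKS = ["ocr", "table", "chart", "formula", "spotting", "seal"]

parser = argparse.ArgumentParser()
parser.add_argument("--task", choices=TASKS, default="ocr",
                    help="which recognition task to run")

# Any value outside TASKS makes argparse exit with a usage error.
args = parser.parse_args(["--task", "table"])
print(args.task)  # table
```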
If you have an exported ONNX ViT model, pass it via `--vit_model_path` instead of the `.axmodel`, for example:
```bash
python3 infer_axmodel.py \
    --hf_model ./paddleocr_vl_1-5_tokenizer \
    --axmodel_path ./paddleocr_vl_1-5_ax650n_axmodel \
    --vit_model_path /path/to/paddle_ocr_vl_vit_model.onnx \
    --image_path ./assets/IMG_0462.JPG \
    --task ocr
```
Test image for OCR:
Sample output:
```text
Init InferenceSession: 100%|██████████████████████████████████████████████████████████| 18/18 [00:00<00:00, 33.44it/s]
[INFO] Using provider: AxEngineExecutionProvider
[INFO] Model type: 2 (triple core)
[INFO] Compiler version: 5.1-patch1 47548354
Model loaded successfully!
slice_indices: [0, 1, 2, 3, 4]
Slice prefill done: 0
Slice prefill done: 1
Slice prefill done: 2
Slice prefill done: 3
Slice prefill done: 4
answer >> James Landay-VR
14175
```
### 4. Gradio Interactive Demo

```bash
python3 gradio_demo.py \
    --hf_model ./paddleocr_vl_1-5_tokenizer \
    --axmodel_path ./paddleocr_vl_1-5_ax650n_axmodel \
    --vit_model ./vit_models/vit_576x768.axmodel
```
Demo preview:
## Theoretical Inference Latency (AX650N)

### Subgraph Latency
| Stage | Subgraph | Latency |
|---|---|---|
| Prefill | g1 | 2.551 ms |
| Prefill | g2 | 2.883 ms |
| Prefill | g3 | 3.158 ms |
| Prefill | g4 | 3.413 ms |
| Prefill | g5 | 3.795 ms |
| Prefill | g6 | 4.007 ms |
| Decode | g0 | 0.949 ms |
| Post-process | - | 5.313 ms |
| ViT | - | 1685.554 ms |
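Summing the table rows, one full pass over the prefill subgraphs g1–g6 costs roughly 19.8 ms. How many such passes a given prompt needs depends on the slice count (the sample run above used 5 slices), which this table alone does not determine:

```python
# Per-subgraph prefill latencies from the table above, in ms.
prefill_ms = {"g1": 2.551, "g2": 2.883, "g3": 3.158,
              "g4": 3.413, "g5": 3.795, "g6": 4.007}

total_ms = sum(prefill_ms.values())
print(f"one pass over g1-g6: {total_ms:.3f} ms")  # 19.807 ms
```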
## Discussion

- GitHub Issues
- QQ Group: 139953715
## Model Tree

AXERA-TECH/PaddleOCR-VL-1.5 lineage:

- Base model: baidu/ERNIE-4.5-0.3B-Paddle
- Finetuned: PaddlePaddle/PaddleOCR-VL-1.5