
PaddleOCR-VL-1.5.axera

Inference example project for PaddleOCR-VL-1.5 on Axera NPUs.

  • Python inference is currently supported. C++ examples are under development.
  • Prebuilt model files are available on HuggingFace.
  • If you want to export/convert models by yourself, refer to the model conversion guide.

Supported Platforms

  • AX650N
  • AX620E

End-to-End Metrics

Metric                     Value
Max TTFT (640 tokens)      361.8 ms
Decode speed               44.6 tokens/s
ViT latency (576x768)      1685.554 ms

Repository Layout

.
β”œβ”€β”€ assets
β”‚   β”œβ”€β”€ IMG_0059.JPG
β”‚   β”œβ”€β”€ IMG_0462.JPG
β”‚   β”œβ”€β”€ IMG_0675.JPG
β”‚   └── gradio_demo.png
β”œβ”€β”€ gradio_demo.py
β”œβ”€β”€ infer_axmodel.py
β”œβ”€β”€ infer_torch.py
β”œβ”€β”€ paddleocr_vl_1-5_ax650n_axmodel/
β”œβ”€β”€ paddleocr_vl_1-5_tokenizer/
β”œβ”€β”€ utils/
β”œβ”€β”€ vit_models/
└── README.md

Quick Start

1. Clone the repository

git clone https://huggingface.co/AXERA-TECH/PaddleOCR-VL-1.5
cd PaddleOCR-VL-1.5

2. PyTorch Reference Inference

Run:

python3 infer_torch.py

Note: infer_torch.py uses hardcoded model_path and image_path values; adjust them in the script before running.

3. AxModel Inference

Run from the project root:

python3 infer_axmodel.py \
  --hf_model ./paddleocr_vl_1-5_tokenizer \
  --axmodel_path ./paddleocr_vl_1-5_ax650n_axmodel \
  --vit_model_path ./vit_models/vit_576x768.axmodel \
  --image_path ./assets/IMG_0462.JPG \
  --task ocr

Available values for --task:

  • ocr
  • table
  • chart
  • formula
  • spotting
  • seal

If you have an exported ONNX ViT model, pass its path via --vit_model_path instead of the .axmodel file, for example:

python3 infer_axmodel.py \
  --hf_model ./paddleocr_vl_1-5_tokenizer \
  --axmodel_path ./paddleocr_vl_1-5_ax650n_axmodel \
  --vit_model_path /path/to/paddle_ocr_vl_vit_model.onnx \
  --image_path ./assets/IMG_0462.JPG \
  --task ocr
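The 576x768 in the model filename suggests a fixed ViT input resolution. The sketch below shows one plausible way to produce such an input tensor; the normalization constants (mean/std = 0.5) are an assumption, not taken from this repository, so check the utils/ directory for the actual preprocessing pipeline.

```python
# Hypothetical preprocessing sketch for a fixed 576x768 ViT input.
# The resize target comes from the model filename (vit_576x768); the
# normalization constants are ASSUMED, not read from this repo.
import numpy as np
from PIL import Image

def preprocess(path: str) -> np.ndarray:
    # PIL's resize takes (width, height), so (768, 576) yields H=576, W=768.
    img = Image.open(path).convert("RGB").resize((768, 576))
    x = np.asarray(img, dtype=np.float32) / 255.0
    x = (x - 0.5) / 0.5                      # assumed normalization
    return x.transpose(2, 0, 1)[None]        # NCHW: (1, 3, 576, 768)
```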

Test image for OCR:

[image: test-img]

Sample output:

Init InferenceSession: 100%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆ| 18/18 [00:00<00:00, 33.44it/s]
[INFO] Using provider: AxEngineExecutionProvider
[INFO] Model type: 2 (triple core)
[INFO] Compiler version: 5.1-patch1 47548354
Model loaded successfully!
slice_indices: [0, 1, 2, 3, 4]
Slice prefill done: 0
Slice prefill done: 1
Slice prefill done: 2
Slice prefill done: 3
Slice prefill done: 4
answer >> James Landay-VR
14175

4. Gradio Interactive Demo

python3 gradio_demo.py \
  --hf_model ./paddleocr_vl_1-5_tokenizer \
  --axmodel_path ./paddleocr_vl_1-5_ax650n_axmodel \
  --vit_model ./vit_models/vit_576x768.axmodel

Demo preview:

[image: assets/gradio_demo.png]

Theoretical Inference Latency (AX650N)

Subgraph Latency

Stage          Subgraph   Latency
Prefill        g1         2.551 ms
Prefill        g2         2.883 ms
Prefill        g3         3.158 ms
Prefill        g4         3.413 ms
Prefill        g5         3.795 ms
Prefill        g6         4.007 ms
Decode         g0         0.949 ms
Post-process   -          5.313 ms
ViT            -          1685.554 ms
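The measured decode speed of 44.6 tokens/s is consistent with the subgraph numbers if one assumes the 18 inference sessions visible in the sample log ("18/18") are the per-token decode subgraphs, each taking the g0 latency, plus one post-process pass per token. This is a back-of-the-envelope check, not a statement about the actual execution schedule:

```python
# Back-of-the-envelope consistency check (ASSUMPTION: the 18 sessions from
# the sample log are per-token decode subgraphs with the g0 latency).
N_SUBGRAPHS = 18          # from "Init InferenceSession ... 18/18" in the log
DECODE_MS = 0.949         # g0 decode subgraph latency
POST_MS = 5.313           # post-process latency

per_token_ms = N_SUBGRAPHS * DECODE_MS + POST_MS   # ~22.4 ms per token
tokens_per_s = 1000.0 / per_token_ms               # ~44.7 tokens/s
print(f"{tokens_per_s:.1f} tokens/s")
```

The result lands close to the 44.6 tokens/s reported in the end-to-end metrics, which supports (but does not prove) the assumed decomposition.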

Discussion

  • GitHub Issues
  • QQ Group: 139953715