
PaddleOCR-VL-1.5.axera

Inference example project for PaddleOCR-VL-1.5 on Axera NPUs.

  • Python inference is currently supported. C++ examples are under development.
  • Prebuilt model files are available on HuggingFace.
  • If you want to export/convert models by yourself, refer to the model conversion guide.

Supported Platforms

  • AX650N
  • AX620E

End-to-End Metrics

Metric                     Value
Max TTFT (640 tokens)      361.8 ms
Decode speed               44.6 tokens/s
ViT latency (576x768)      1685.554 ms

Repository Layout

.
β”œβ”€β”€ assets
β”‚   β”œβ”€β”€ IMG_0059.JPG
β”‚   β”œβ”€β”€ IMG_0462.JPG
β”‚   β”œβ”€β”€ IMG_0675.JPG
β”‚   └── gradio_demo.png
β”œβ”€β”€ gradio_demo.py
β”œβ”€β”€ infer_axmodel.py
β”œβ”€β”€ infer_torch.py
β”œβ”€β”€ paddleocr_vl_1-5_ax650n_axmodel/
β”œβ”€β”€ paddleocr_vl_1-5_tokenizer/
β”œβ”€β”€ utils/
β”œβ”€β”€ vit_models/
└── README.md

Quick Start

1. Clone the repository

git clone https://huggingface.co/AXERA-TECH/PaddleOCR-VL-1.5
cd PaddleOCR-VL-1.5

2. PyTorch Reference Inference

Run:

python3 infer_torch.py

Note: infer_torch.py uses hardcoded model_path and image_path values; adjust them in the script before running.

3. AxModel Inference

Run from the project root:

python3 infer_axmodel.py \
  --hf_model ./paddleocr_vl_1-5_tokenizer \
  --axmodel_path ./paddleocr_vl_1-5_ax650n_axmodel \
  --vit_model_path ./vit_models/vit_576x768.axmodel \
  --image_path ./assets/IMG_0462.JPG \
  --task ocr

Available values for --task:

  • ocr
  • table
  • chart
  • formula
  • spotting
  • seal

If you have an exported ONNX ViT model, pass its path via --vit_model_path instead of the .axmodel file, for example:

python3 infer_axmodel.py \
  --hf_model ./paddleocr_vl_1-5_tokenizer \
  --axmodel_path ./paddleocr_vl_1-5_ax650n_axmodel \
  --vit_model_path /path/to/paddle_ocr_vl_vit_model.onnx \
  --image_path ./assets/IMG_0462.JPG \
  --task ocr
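The 576x768 in the model filename suggests a fixed ViT input resolution. The sketch below shows one plausible way to produce such an input tensor; the normalization constants (mean/std = 0.5) are an assumption, not taken from this repository, so check the utils/ directory for the actual preprocessing pipeline.

```python
# Hypothetical preprocessing sketch for a fixed 576x768 ViT input.
# The resize target comes from the model filename (vit_576x768); the
# normalization constants are ASSUMED, not read from this repo.
import numpy as np
from PIL import Image

def preprocess(path: str) -> np.ndarray:
    # PIL's resize takes (width, height), so (768, 576) yields H=576, W=768.
    img = Image.open(path).convert("RGB").resize((768, 576))
    x = np.asarray(img, dtype=np.float32) / 255.0
    x = (x - 0.5) / 0.5                      # assumed normalization
    return x.transpose(2, 0, 1)[None]        # NCHW: (1, 3, 576, 768)
```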

Test image for OCR:

[image: test-img]

Sample output:

Init InferenceSession: 100%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆ| 18/18 [00:00<00:00, 33.44it/s]
[INFO] Using provider: AxEngineExecutionProvider
[INFO] Model type: 2 (triple core)
[INFO] Compiler version: 5.1-patch1 47548354
Model loaded successfully!
slice_indices: [0, 1, 2, 3, 4]
Slice prefill done: 0
Slice prefill done: 1
Slice prefill done: 2
Slice prefill done: 3
Slice prefill done: 4
answer >> James Landay-VR
14175

4. Gradio Interactive Demo

python3 gradio_demo.py \
  --hf_model ./paddleocr_vl_1-5_tokenizer \
  --axmodel_path ./paddleocr_vl_1-5_ax650n_axmodel \
  --vit_model ./vit_models/vit_576x768.axmodel

Demo preview:

[image: assets/gradio_demo.png]

Theoretical Inference Latency (AX650N)

Subgraph Latency

Stage          Subgraph   Latency
Prefill        g1         2.551 ms
Prefill        g2         2.883 ms
Prefill        g3         3.158 ms
Prefill        g4         3.413 ms
Prefill        g5         3.795 ms
Prefill        g6         4.007 ms
Decode         g0         0.949 ms
Post-process   -          5.313 ms
ViT            -          1685.554 ms
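The measured decode speed of 44.6 tokens/s is consistent with the subgraph numbers if one assumes the 18 inference sessions visible in the sample log ("18/18") are the per-token decode subgraphs, each taking the g0 latency, plus one post-process pass per token. This is a back-of-the-envelope check, not a statement about the actual execution schedule:

```python
# Back-of-the-envelope consistency check (ASSUMPTION: the 18 sessions from
# the sample log are per-token decode subgraphs with the g0 latency).
N_SUBGRAPHS = 18          # from "Init InferenceSession ... 18/18" in the log
DECODE_MS = 0.949         # g0 decode subgraph latency
POST_MS = 5.313           # post-process latency

per_token_ms = N_SUBGRAPHS * DECODE_MS + POST_MS   # ~22.4 ms per token
tokens_per_s = 1000.0 / per_token_ms               # ~44.7 tokens/s
print(f"{tokens_per_s:.1f} tokens/s")
```

The result lands close to the 44.6 tokens/s reported in the end-to-end metrics, which supports (but does not prove) the assumed decomposition.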

Discussion

  • GitHub Issues
  • QQ Group: 139953715