# PaddleOCR v4 (PP-OCRv4)

## Model Description

**PP-OCRv4** is the fourth-generation end-to-end optical character recognition system from the PaddlePaddle team.

It combines a lightweight **text detection → angle classification → text recognition** pipeline with improved training techniques and data augmentation, delivering higher accuracy and robustness while staying efficient for real-time use.

PP-OCRv4 supports multilingual OCR (Latin and non-Latin scripts), irregular layouts (rotated/curved text), and challenging inputs such as noisy or low-resolution images often found in mobile and document-scan scenarios.
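
For orientation, the sketch below shows how this detection → classification → recognition pipeline is typically driven from the upstream `paddleocr` Python package. It is illustrative only and separate from the Nexa NPU deployment described under "How to use"; the exact result layout can vary between PaddleOCR releases, and `sample.jpg` is a placeholder path.

```python
# Minimal sketch using the upstream paddleocr Python package
# (pip install paddleocr paddlepaddle). Illustrative only; this is not the
# Nexa SDK / Qualcomm NPU interface described later in this card.
from paddleocr import PaddleOCR

# In PaddleOCR 2.7 the default ocr_version is PP-OCRv4; use_angle_cls enables
# the optional text-angle classifier between detection and recognition.
ocr = PaddleOCR(use_angle_cls=True, lang="en")

# "sample.jpg" is a placeholder path; any photo, scan, or screenshot works.
result = ocr.ocr("sample.jpg", cls=True)

# Results are nested per image/page; each line carries a polygon box,
# the recognized text, and a confidence score (exact layout may differ
# slightly across PaddleOCR versions).
for box, (text, score) in (result[0] or []):
    print(f"{text!r} (score={score:.2f}) at {box}")
```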

## Features

- **End-to-end OCR**: text detection, optional angle classification, and text recognition in one pipeline.
- **Multilingual support**: pretrained models for English, Chinese, and dozens of other languages; easy finetuning for domain text.
- **Robust in real-world conditions**: handles rotation, perspective distortion, blur, low light, and complex backgrounds.
- **Lightweight & fast**: practical for both mobile apps and large-scale server deployments.
- **Flexible I/O**: works with photos, scans, screenshots, receipts, invoices, ID cards, dashboards, and UI text.
- **Extensible**: swap components (detector/recognizer), add language packs, or finetune on domain datasets.

## Use Cases

- Document digitization (invoices, receipts, forms, contracts)
- RPA and back-office automation (screen-capture OCR flows)
- Mobile scanning apps and camera-based translation/read-aloud
- Industrial and retail analytics (labels, price tags, shelf tags)
- Accessibility (screen readers and read-aloud applications)

## Inputs and Outputs

**Input**: Image (photo, scan, or screenshot).

**Output**: A list of detected text regions, each with:

- bounding box (rectangular or polygonal)
- recognized text string
- optional confidence score and orientation
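
As a purely illustrative example of that output shape, one detected region might look like the structure below. The field names are assumptions for readability, not a documented schema from Nexa-SDK or PaddleOCR.

```python
# Hypothetical representation of detected text regions; field names and values
# are illustrative only and do not reflect a documented output schema.
regions = [
    {
        "box": [[62, 18], [410, 18], [410, 54], [62, 54]],  # polygon corner points (x, y)
        "text": "TOTAL $23.90",                             # recognized string
        "score": 0.97,                                       # optional confidence
        "angle": 0,                                          # optional orientation, degrees
    },
]
```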

---

## How to use

> ⚠️ **Hardware requirement:** the model currently runs **only on Qualcomm NPUs** (e.g., Snapdragon-powered AI PCs).
> Apple NPU support is planned next.

### 1) Install Nexa-SDK

- Download the SDK and follow the steps under the "Deploy" section of Nexa's model page: [Download Windows arm64 SDK](https://sdk.nexa.ai/model/PaddleOCR%20v4)
- (Other platforms coming soon)

### 2) Get an access token

Create a token in the Model Hub, then log in:

```bash
nexa config set license '<access_token>'
```

### 3) Run the model

Running:

```bash
nexa infer NexaAI/paddleocr-npu
```

---

## License

- Licensed under [Apache-2.0](https://github.com/PaddlePaddle/PaddleOCR/blob/release/2.7/LICENSE)

## References

- GitHub repo: [https://github.com/PaddlePaddle/PaddleOCR](https://github.com/PaddlePaddle/PaddleOCR)
- Model zoo & documentation: [Models list](https://github.com/PaddlePaddle/PaddleOCR/blob/release/2.7/doc/doc_en/models_list_en.md)