File size: 2,830 Bytes
f5134e2
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
67ff64a
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
f5134e2
 
 
 
 
 
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
# PaddleOCR v4 (PP-OCRv4)

## Model Description
**PP-OCRv4** is the fourth-generation end-to-end optical character recognition system from the PaddlePaddle team.  
It combines a lightweight **text detection → angle classification → text recognition** pipeline with improved training techniques and data augmentation, delivering higher accuracy and robustness while staying efficient for real-time use.

PP-OCRv4 supports multilingual OCR (Latin and non-Latin scripts), irregular layouts (rotated/curved text), and challenging inputs such as noisy or low-resolution images often found in mobile and document-scan scenarios.

## Features
- **End-to-end OCR**: text detection, optional angle classification, and text recognition in one pipeline.
- **Multilingual support**: pretrained models for English, Chinese, and dozens of other languages; easy finetuning for domain text.
- **Robust in real-world conditions**: handles rotation, perspective distortion, blur, low light, and complex backgrounds.
- **Lightweight & fast**: practical for both mobile apps and large-scale server deployments.
- **Flexible I/O**: works with photos, scans, screenshots, receipts, invoices, ID cards, dashboards, and UI text.
- **Extensible**: swap components (detector/recognizer), add language packs, or finetune on domain datasets.

## Use Cases
- Document digitization (invoices, receipts, forms, contracts)
- RPA and back-office automation (screen/OCR flows)
- Mobile scanning apps and camera-based translation/read-aloud
- Industrial and retail analytics (labels, price tags, shelf tags)
- Accessibility (screen-readers and read-aloud applications)

## Inputs and Outputs
**Input**: Image (photo, scan, or screenshot).  
**Output**: A list of detected text regions, each with:
- bounding box (rectangular or polygonal)
- recognized text string
- optional confidence score and orientation


---

## How to use

> ⚠️ **Hardware requirement:** the model currently runs **only on Qualcomm NPUs** (e.g., Snapdragon-powered AIPC).  
> Apple NPU support is planned next.

### 1) Install Nexa-SDK

- Download and follow the steps under "Deploy Section" Nexa's model page:  [Download Windows arm64 SDK](https://sdk.nexa.ai/model/PaddleOCR%20v4)
- (Other platforms coming soon)

### 2) Get an access token
Create a token in the Model Hub, then log in:

```bash
nexa config set license '<access_token>'
```

### 3) Run the model
Running:

```bash
nexa infer NexaAI/paddleocr-npu
```

---

## License
- Licensed under [Apache-2.0](https://github.com/PaddlePaddle/PaddleOCR/blob/release/2.7/LICENSE)

## References
- GitHub repo: [https://github.com/PaddlePaddle/PaddleOCR](https://github.com/PaddlePaddle/PaddleOCR)  
- Model zoo & documentation: [Models list](https://github.com/PaddlePaddle/PaddleOCR/blob/release/2.7/doc/doc_en/models_list_en.md)