---
title: PaddleOCR Text Recognition Fine-tuning Toolkit
emoji: 🌍
colorFrom: red
colorTo: blue
sdk: gradio
sdk_version: 5.38.0
app_file: app.py
pinned: false
license: apache-2.0
---

# PaddleOCR Text Recognition Fine-tuning Toolkit

This repository provides a comprehensive pipeline for fine-tuning PaddleOCR text recognition models on custom datasets. Based on the official PaddleOCR Text Recognition Module Tutorial, this toolkit includes dataset preparation, training, evaluation, and inference scripts.

## πŸ“‹ Table of Contents

- [Features](#features)
- [Requirements](#requirements)
- [Installation](#installation)
- [Quick Start](#quick-start)
- [Dataset Preparation](#dataset-preparation)
- [Fine-tuning Process](#fine-tuning-process)
- [Model Evaluation](#model-evaluation)
- [Inference](#inference)
- [Advanced Usage](#advanced-usage)
- [Troubleshooting](#troubleshooting)
- [Supported Models](#supported-models)

## ✨ Features

- **Complete Pipeline**: End-to-end fine-tuning from dataset preparation to model export
- **Multiple Models**: Support for PP-OCRv5 and PP-OCRv4 server and mobile variants
- **Dataset Flexibility**: Handles various dataset formats (directory, CSV, JSON, ICDAR, LMDB)
- **Performance Optimization**: Automatic GPU memory management and batch processing
- **Comprehensive Evaluation**: Model benchmarking and comparison tools
- **Easy Inference**: Ready-to-use inference scripts with visualization

## πŸ”§ Requirements

### System Requirements

- Python 3.8+
- CUDA 11.8+ (for GPU training)
- 8GB+ RAM (16GB+ recommended)
- 4GB+ GPU memory (8GB+ recommended)

### Software Dependencies

See `requirements.txt` for detailed package versions.

## πŸ“¦ Installation

1. **Clone this repository:**

   ```bash
   git clone <repository-url>
   cd paddleocr-text-recognition-finetuning
   ```

2. **Install dependencies:**

   ```bash
   pip install -r requirements.txt
   ```

3. **Install PaddleOCR:**

   ```bash
   # For GPU users
   pip install paddlepaddle-gpu paddleocr

   # For CPU users
   pip install paddlepaddle paddleocr
   ```

4. **Verify the installation:**

   ```bash
   python -c "import paddleocr; print('PaddleOCR installed successfully!')"
   ```

## πŸš€ Quick Start

### Option 1: Complete Pipeline (Recommended)

Run the entire fine-tuning pipeline with demo data:

```bash
python fine_tune_text_recognition.py \
    --model_name PP-OCRv5_server_rec \
    --work_dir ./my_training \
    --gpus 0 \
    --mode complete
```

### Option 1b: Document Dataset (LMDB) Pipeline

If you have a document dataset in LMDB format (like `./input_dir/document`):

```bash
# Quick demo to see your data
python demo_document_extraction.py

# Complete pipeline: extract + train + test
python extract_and_train.py \
    --input_dir ./input_dir/document \
    --work_dir ./document_training \
    --model_name PP-OCRv5_server_rec \
    --epochs 20 \
    --batch_size 64
```

### Option 2: Step-by-Step Process

1. **Prepare your dataset:**

   ```bash
   python prepare_dataset.py \
       --input_type directory \
       --input_path /path/to/your/images \
       --output_dir ./dataset
   ```

2. **Fine-tune the model:**

   ```bash
   python fine_tune_text_recognition.py \
       --model_name PP-OCRv5_server_rec \
       --work_dir ./my_training \
       --skip_demo_data \
       --mode train
   ```

3. **Test your fine-tuned model:**

   ```bash
   python inference_example.py \
       --model_dir ./my_training/PP-OCRv5_server_rec_infer \
       --input /path/to/test/image.jpg \
       --save_results \
       --visualize
   ```

## πŸ“Š Dataset Preparation

### Supported Input Formats

#### 1. Directory with Images and Text Files

```
your_dataset/
β”œβ”€β”€ image1.jpg
β”œβ”€β”€ image1.txt
β”œβ”€β”€ image2.png
β”œβ”€β”€ image2.txt
└── ...
```

```bash
python prepare_dataset.py \
    --input_type directory \
    --input_path ./your_dataset \
    --output_dir ./dataset
```
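Before converting, it can help to verify that every image has a matching label file. A minimal sketch (`check_directory_dataset` is an illustrative helper, not part of this toolkit):

```python
from pathlib import Path

IMAGE_EXTS = {".jpg", ".jpeg", ".png", ".bmp"}

def check_directory_dataset(root):
    """Report images that lack a same-named .txt label file."""
    root = Path(root)
    missing = [p.name for p in root.iterdir()
               if p.suffix.lower() in IMAGE_EXTS
               and not p.with_suffix(".txt").exists()]
    if missing:
        print(f"{len(missing)} image(s) without labels, e.g. {missing[0]}")
    else:
        print("All images have matching label files.")

check_directory_dataset("./your_dataset")
```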
#### 2. CSV Format

CSV file with columns `image_path` and `text`:

```bash
python prepare_dataset.py \
    --input_type csv \
    --input_path data.csv \
    --img_col image_path \
    --text_col text \
    --output_dir ./dataset
```

#### 3. JSON Format

```json
[
    {"image_path": "img1.jpg", "text": "Hello World"},
    {"image_path": "img2.jpg", "text": "Fine-tuning"}
]
```

```bash
python prepare_dataset.py \
    --input_type json \
    --input_path data.json \
    --output_dir ./dataset
```

#### 4. ICDAR Format

```bash
python prepare_dataset.py \
    --input_type icdar \
    --input_path ./images_directory \
    --annotations_file annotations.txt \
    --output_dir ./dataset
```

#### 5. LMDB Format (Document Datasets)

For LMDB datasets (like the document dataset in `./input_dir/document`):

```bash
# Extract LMDB data only
python extract_lmdb_data.py \
    --input_dir ./input_dir/document \
    --output_dir ./extracted_dataset

# Or use the integrated approach
python prepare_dataset.py \
    --input_type lmdb \
    --input_path ./input_dir/document \
    --output_dir ./dataset
```

### Expected Output Structure

```
dataset/
β”œβ”€β”€ images/
β”‚   β”œβ”€β”€ image1.jpg
β”‚   β”œβ”€β”€ image2.png
β”‚   └── ...
β”œβ”€β”€ train_list.txt
└── val_list.txt
```

Format of `train_list.txt` and `val_list.txt`:

```
images/image1.jpg Hello World
images/image2.png Fine-tuning
...
```

## 🎯 Fine-tuning Process

### Basic Fine-tuning

```bash
python fine_tune_text_recognition.py \
    --model_name PP-OCRv5_server_rec \
    --work_dir ./training_output \
    --gpus 0
```

### Advanced Configuration

The script supports various customization options:

```bash
python fine_tune_text_recognition.py \
    --model_name PP-OCRv5_mobile_rec \
    --work_dir ./training_output \
    --gpus 0,1 \
    --mode complete \
    --skip_demo_data
```

### Custom Training Parameters

Modify the `custom_params` dictionary in the script for advanced customization:

```python
custom_params = {
    "Global": {
        "epoch_num": 50,              # Number of training epochs
        "save_epoch_step": 5,         # Save model every N epochs
        "eval_batch_step": [0, 1000]  # Evaluation frequency
    },
    "Train": {
        "loader": {
            "batch_size_per_card": 64,  # Batch size per GPU
            "num_workers": 8            # Data loading workers
        }
    }
}
```

### Memory Optimization

For systems with limited GPU memory, use these settings:

```python
custom_params = {
    "Train": {
        "loader": {
            "batch_size_per_card": 32,  # Reduce batch size
            "num_workers": 2            # Reduce workers
        }
    },
    "Eval": {
        "loader": {
            "batch_size_per_card": 32
        }
    }
}
```

## πŸ“ˆ Model Evaluation

### Evaluate Trained Model

```bash
python fine_tune_text_recognition.py \
    --mode eval \
    --config path/to/config.yml \
    --checkpoint path/to/best_accuracy.pdparams \
    --gpus 0
```

### Export Model for Inference

```bash
python fine_tune_text_recognition.py \
    --mode export \
    --config path/to/config.yml \
    --checkpoint path/to/best_accuracy.pdparams
```

## πŸ” Inference

### Single Image Inference

```bash
python inference_example.py \
    --model_dir ./work_dir/PP-OCRv5_server_rec_infer \
    --input single_image.jpg \
    --save_results \
    --visualize
```

### Batch Processing

```bash
python inference_example.py \
    --model_dir ./work_dir/PP-OCRv5_server_rec_infer \
    --input ./test_images/ \
    --batch_size 16 \
    --save_results
```

### Performance Benchmarking

```bash
python inference_example.py \
    --model_dir ./work_dir/PP-OCRv5_server_rec_infer \
    --input ./test_images/ \
    --benchmark \
    --visualize
```

### Compare with Original Model

```bash
python inference_example.py \
    --model_dir ./work_dir/PP-OCRv5_server_rec_infer \
    --input ./test_images/ \
    --compare_original PP-OCRv5_server_rec \
    --visualize
```
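You can also call an exported recognition model directly from Python. A minimal sketch, assuming the PaddleOCR 3.x `TextRecognition` module API described in the tutorial this toolkit is based on (the API differs in PaddleOCR 2.x, so check your installed version):

```python
from paddleocr import TextRecognition

# Load the exported fine-tuned model from its inference directory
model = TextRecognition(model_dir="./work_dir/PP-OCRv5_server_rec_infer")

# Run recognition on a cropped text-line image
output = model.predict(input="single_image.jpg", batch_size=1)
for res in output:
    res.print()  # prints the recognized text and confidence score
```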
## πŸ”§ Advanced Usage

### Multi-GPU Training

```bash
python fine_tune_text_recognition.py \
    --model_name PP-OCRv5_server_rec \
    --gpus 0,1,2,3 \
    --work_dir ./multi_gpu_training
```

### Resume Training from Checkpoint

```bash
python fine_tune_text_recognition.py \
    --mode train \
    --config custom_config.yml \
    --resume_from ./work_dir/output/iter_1000.pdparams
```

### Custom Character Dictionary

1. Create your character dictionary file (one character per line):

   ```
   a
   b
   c
   ...
   δΈ­
   ζ–‡
   ε­—
   符
   ```

2. Update the configuration:

   ```python
   custom_params = {
       "Global": {
           "character_dict_path": "path/to/your/custom_dict.txt",
           "character_type": "ch"  # or "en" for English
       }
   }
   ```
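If you are unsure which characters your dataset contains, you can derive the dictionary from your label lists. A minimal sketch (`build_char_dict` is an illustrative helper, not part of this toolkit; it assumes the `train_list.txt` format shown above, with the image path followed by the transcription):

```python
from pathlib import Path

def build_char_dict(label_files, output_path="custom_dict.txt"):
    """Collect the unique characters that appear in label-list files."""
    chars = set()
    for label_file in label_files:
        for line in Path(label_file).read_text(encoding="utf-8").splitlines():
            parts = line.split(maxsplit=1)  # image path, then transcription
            if len(parts) == 2:
                chars.update(parts[1])
    chars.discard(" ")  # spaces are usually handled separately in the config
    Path(output_path).write_text("\n".join(sorted(chars)) + "\n", encoding="utf-8")
    print(f"Wrote {len(chars)} characters to {output_path}")

build_char_dict(["dataset/train_list.txt", "dataset/val_list.txt"])
```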
### Training with Different Image Sizes

```python
custom_params = {
    "Train": {
        "dataset": {
            "transforms": [
                {"DecodeImage": {"img_mode": "BGR", "channel_first": False}},
                {"RecResizeImg": {"image_shape": [3, 64, 256]}},  # H=64, W=256
                # ... other transforms
            ]
        }
    }
}
```

## ❌ Troubleshooting

### Common Issues

#### 1. CUDA Out of Memory

**Solution:** Reduce the batch size and enable gradient accumulation.

```python
custom_params = {
    "Train": {
        "loader": {
            "batch_size_per_card": 16  # Reduce from the default of 256
        }
    }
}
```

#### 2. Dataset Loading Errors

**Solution:** Check the dataset format and file paths.

```bash
# Validate your dataset
python prepare_dataset.py --input_type directory --input_path ./data --output_dir ./test_dataset
```

#### 3. Model Export Fails

**Solution:** Ensure the checkpoint exists and the config path is correct.

```bash
# Check if the checkpoint exists
ls ./work_dir/output/
```

#### 4. Low Recognition Accuracy

**Solutions:**

- Increase training epochs
- Use data augmentation
- Verify dataset quality
- Try different learning rates

### Performance Tips

1. **For faster training:**
   - Use SSD storage for datasets
   - Increase `num_workers` in the data loader
   - Use mixed precision training (if supported)

2. **For better accuracy:**
   - Increase image resolution
   - Add more training data
   - Use appropriate data augmentation
   - Fine-tune the learning rate schedule

3. **For memory efficiency:**
   - Reduce batch size
   - Use gradient accumulation
   - Enable CPU offloading

## πŸ“‹ Supported Models

| Model | Accuracy | Speed | Model Size | Use Case |
|-------|----------|-------|------------|----------|
| PP-OCRv5_server_rec | 86.38% | 8.46ms | 81MB | High-accuracy server deployment |
| PP-OCRv5_mobile_rec | 81.29% | 5.43ms | 16MB | Mobile/edge devices |
| PP-OCRv4_server_rec | 85.19% | 8.75ms | 173MB | Legacy server deployment |
| PP-OCRv4_mobile_rec | 78.74% | 5.26ms | 10.5MB | Legacy mobile deployment |

### Choosing the Right Model

- **PP-OCRv5_server_rec**: Best overall accuracy, suitable for server deployment
- **PP-OCRv5_mobile_rec**: Good balance of accuracy and speed, well suited to mobile apps
- **PP-OCRv4_***: Use if you need compatibility with older PaddleOCR versions

## πŸ“ File Structure

```
.
β”œβ”€β”€ fine_tune_text_recognition.py   # Main fine-tuning script
β”œβ”€β”€ prepare_dataset.py              # Dataset preparation utility
β”œβ”€β”€ inference_example.py            # Inference and evaluation script
β”œβ”€β”€ extract_lmdb_data.py            # LMDB data extraction utility
β”œβ”€β”€ extract_and_train.py            # Complete LMDB pipeline
β”œβ”€β”€ demo_document_extraction.py     # Demo for document dataset
β”œβ”€β”€ quick_start_example.py          # Simple getting-started script
β”œβ”€β”€ requirements.txt                # Python dependencies
β”œβ”€β”€ README.md                       # This file
β”œβ”€β”€ input_dir/                      # Your input data (LMDB format)
β”‚   └── document/                   # Document dataset
β”‚       β”œβ”€β”€ document_train/         # Training split (LMDB)
β”‚       β”œβ”€β”€ document_val/           # Validation split (LMDB)
β”‚       └── document_test/          # Test split (LMDB)
└── work_dir/                       # Training outputs (created during training)
    β”œβ”€β”€ dataset/                    # Prepared dataset
    β”œβ”€β”€ output/                     # Training checkpoints
    └── PP-OCRv5_server_rec_infer/  # Exported model
```

## πŸŽ‰ Comparison Demo

The included Gradio demo provides a side-by-side comparison platform for Chinese text recognition that demonstrates the benefits of fine-tuning.

```bash
# Launch the comparison demo
python3 demo.py
```

Then access it at `http://localhost:7860`.

## 🀝 Contributing

Feel free to submit issues, feature requests, and pull requests. For major changes, please open an issue first to discuss what you would like to change.

## πŸ“„ License

This project is based on PaddleOCR and follows the same Apache 2.0 License.

## πŸ™ Acknowledgments

- The PaddleOCR team for the excellent OCR framework
- The PaddlePaddle team for the deep learning platform
- Community contributors for testing and feedback

---

For more detailed information about PaddleOCR, visit the [official documentation](https://github.com/PaddlePaddle/PaddleOCR).