---
title: PaddleOCR Text Recognition Fine-tuning Toolkit
emoji: 🌍
colorFrom: red
colorTo: blue
sdk: gradio
sdk_version: 5.38.0
app_file: app.py
pinned: false
license: apache-2.0
---

# PaddleOCR Text Recognition Fine-tuning Toolkit

This repository provides a comprehensive pipeline for fine-tuning PaddleOCR text recognition models on custom datasets. Based on the official PaddleOCR Text Recognition Module Tutorial, this toolkit includes dataset preparation, training, evaluation, and inference scripts.

## πŸ“‹ Table of Contents

- [Features](#features)
- [Requirements](#requirements)
- [Installation](#installation)
- [Quick Start](#quick-start)
- [Dataset Preparation](#dataset-preparation)
- [Fine-tuning Process](#fine-tuning-process)
- [Model Evaluation](#model-evaluation)
- [Inference](#inference)
- [Advanced Usage](#advanced-usage)
- [Troubleshooting](#troubleshooting)
- [Supported Models](#supported-models)

## ✨ Features

- **Complete Pipeline**: End-to-end fine-tuning from dataset preparation to model export
- **Multiple Models**: Support for PP-OCRv5 and PP-OCRv4 server and mobile variants
- **Dataset Flexibility**: Handles various dataset formats (directory, CSV, JSON, ICDAR, LMDB)
- **Performance Optimization**: Automatic GPU memory management and batch processing
- **Comprehensive Evaluation**: Model benchmarking and comparison tools
- **Easy Inference**: Ready-to-use inference scripts with visualization

## πŸ”§ Requirements

### System Requirements

- Python 3.8+
- CUDA 11.8+ (for GPU training)
- 8GB+ RAM (16GB+ recommended)
- 4GB+ GPU memory (8GB+ recommended)

### Software Dependencies

See `requirements.txt` for detailed package versions.

## πŸ“¦ Installation

1. **Clone this repository:**

   ```bash
   git clone <repository-url>
   cd paddleocr-text-recognition-finetuning
   ```

2. **Install dependencies:**

   ```bash
   pip install -r requirements.txt
   ```

3. **Install PaddleOCR:**

   ```bash
   # For GPU users
   pip install paddlepaddle-gpu paddleocr

   # For CPU users
   pip install paddlepaddle paddleocr
   ```

4. **Verify the installation:**

   ```bash
   python -c "import paddleocr; print('PaddleOCR installed successfully!')"
   ```

## πŸš€ Quick Start

### Option 1: Complete Pipeline (Recommended)

Run the entire fine-tuning pipeline with demo data:

```bash
python fine_tune_text_recognition.py \
    --model_name PP-OCRv5_server_rec \
    --work_dir ./my_training \
    --gpus 0 \
    --mode complete
```

### Option 1b: Document Dataset (LMDB) Pipeline

If you have a document dataset in LMDB format (like `./input_dir/document`):

```bash
# Quick demo to see your data
python demo_document_extraction.py

# Complete pipeline: extract + train + test
python extract_and_train.py \
    --input_dir ./input_dir/document \
    --work_dir ./document_training \
    --model_name PP-OCRv5_server_rec \
    --epochs 20 \
    --batch_size 64
```

### Option 2: Step-by-Step Process

1. **Prepare your dataset:**

   ```bash
   python prepare_dataset.py \
       --input_type directory \
       --input_path /path/to/your/images \
       --output_dir ./dataset
   ```

2. **Fine-tune the model:**

   ```bash
   python fine_tune_text_recognition.py \
       --model_name PP-OCRv5_server_rec \
       --work_dir ./my_training \
       --skip_demo_data \
       --mode train
   ```

3. **Test your fine-tuned model:**

   ```bash
   python inference_example.py \
       --model_dir ./my_training/PP-OCRv5_server_rec_infer \
       --input /path/to/test/image.jpg \
       --save_results \
       --visualize
   ```

## πŸ“Š Dataset Preparation

### Supported Input Formats

#### 1. Directory with Images and Text Files

```
your_dataset/
β”œβ”€β”€ image1.jpg
β”œβ”€β”€ image1.txt
β”œβ”€β”€ image2.png
β”œβ”€β”€ image2.txt
└── ...
```

```bash
python prepare_dataset.py \
    --input_type directory \
    --input_path ./your_dataset \
    --output_dir ./dataset
```
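Before converting, it can help to verify that every image has a matching label file. A minimal sketch (`check_directory_dataset` is an illustrative helper, not part of this toolkit):

```python
from pathlib import Path

IMAGE_EXTS = {".jpg", ".jpeg", ".png", ".bmp"}

def check_directory_dataset(root):
    """Report images that lack a same-named .txt label file."""
    root = Path(root)
    missing = [p.name for p in root.iterdir()
               if p.suffix.lower() in IMAGE_EXTS
               and not p.with_suffix(".txt").exists()]
    if missing:
        print(f"{len(missing)} image(s) without labels, e.g. {missing[0]}")
    else:
        print("All images have matching label files.")

check_directory_dataset("./your_dataset")
```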
#### 2. CSV Format

CSV file with columns `image_path` and `text`:

```bash
python prepare_dataset.py \
    --input_type csv \
    --input_path data.csv \
    --img_col image_path \
    --text_col text \
    --output_dir ./dataset
```

#### 3. JSON Format

```json
[
    {"image_path": "img1.jpg", "text": "Hello World"},
    {"image_path": "img2.jpg", "text": "Fine-tuning"}
]
```

```bash
python prepare_dataset.py \
    --input_type json \
    --input_path data.json \
    --output_dir ./dataset
```

#### 4. ICDAR Format

```bash
python prepare_dataset.py \
    --input_type icdar \
    --input_path ./images_directory \
    --annotations_file annotations.txt \
    --output_dir ./dataset
```

#### 5. LMDB Format (Document Datasets)

For LMDB datasets (like the document dataset in `./input_dir/document`):

```bash
# Extract LMDB data only
python extract_lmdb_data.py \
    --input_dir ./input_dir/document \
    --output_dir ./extracted_dataset

# Or use the integrated approach
python prepare_dataset.py \
    --input_type lmdb \
    --input_path ./input_dir/document \
    --output_dir ./dataset
```

### Expected Output Structure

```
dataset/
β”œβ”€β”€ images/
β”‚   β”œβ”€β”€ image1.jpg
β”‚   β”œβ”€β”€ image2.png
β”‚   └── ...
β”œβ”€β”€ train_list.txt
└── val_list.txt
```

Format of `train_list.txt` and `val_list.txt`:

```
images/image1.jpg Hello World
images/image2.png Fine-tuning
...
```

## 🎯 Fine-tuning Process

### Basic Fine-tuning

```bash
python fine_tune_text_recognition.py \
    --model_name PP-OCRv5_server_rec \
    --work_dir ./training_output \
    --gpus 0
```

### Advanced Configuration

The script supports various customization options:

```bash
python fine_tune_text_recognition.py \
    --model_name PP-OCRv5_mobile_rec \
    --work_dir ./training_output \
    --gpus 0,1 \
    --mode complete \
    --skip_demo_data
```

### Custom Training Parameters

Modify the `custom_params` dictionary in the script for advanced customization:

```python
custom_params = {
    "Global": {
        "epoch_num": 50,              # Number of training epochs
        "save_epoch_step": 5,         # Save model every N epochs
        "eval_batch_step": [0, 1000]  # Evaluation frequency
    },
    "Train": {
        "loader": {
            "batch_size_per_card": 64,  # Batch size per GPU
            "num_workers": 8            # Data loading workers
        }
    }
}
```

### Memory Optimization

For systems with limited GPU memory, use these settings:

```python
custom_params = {
    "Train": {
        "loader": {
            "batch_size_per_card": 32,  # Reduce batch size
            "num_workers": 2            # Reduce workers
        }
    },
    "Eval": {
        "loader": {
            "batch_size_per_card": 32
        }
    }
}
```

## πŸ“ˆ Model Evaluation

### Evaluate Trained Model

```bash
python fine_tune_text_recognition.py \
    --mode eval \
    --config path/to/config.yml \
    --checkpoint path/to/best_accuracy.pdparams \
    --gpus 0
```

### Export Model for Inference

```bash
python fine_tune_text_recognition.py \
    --mode export \
    --config path/to/config.yml \
    --checkpoint path/to/best_accuracy.pdparams
```

## πŸ” Inference

### Single Image Inference

```bash
python inference_example.py \
    --model_dir ./work_dir/PP-OCRv5_server_rec_infer \
    --input single_image.jpg \
    --save_results \
    --visualize
```

### Batch Processing

```bash
python inference_example.py \
    --model_dir ./work_dir/PP-OCRv5_server_rec_infer \
    --input ./test_images/ \
    --batch_size 16 \
    --save_results
```

### Performance Benchmarking

```bash
python inference_example.py \
    --model_dir ./work_dir/PP-OCRv5_server_rec_infer \
    --input ./test_images/ \
    --benchmark \
    --visualize
```

### Compare with Original Model

```bash
python inference_example.py \
    --model_dir ./work_dir/PP-OCRv5_server_rec_infer \
    --input ./test_images/ \
    --compare_original PP-OCRv5_server_rec \
    --visualize
```
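You can also call an exported recognition model directly from Python. A minimal sketch, assuming the PaddleOCR 3.x `TextRecognition` module API described in the tutorial this toolkit is based on (the API differs in PaddleOCR 2.x, so check your installed version):

```python
from paddleocr import TextRecognition

# Load the exported fine-tuned model from its inference directory
model = TextRecognition(model_dir="./work_dir/PP-OCRv5_server_rec_infer")

# Run recognition on a cropped text-line image
output = model.predict(input="single_image.jpg", batch_size=1)
for res in output:
    res.print()  # prints the recognized text and confidence score
```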
## πŸ”§ Advanced Usage

### Multi-GPU Training

```bash
python fine_tune_text_recognition.py \
    --model_name PP-OCRv5_server_rec \
    --gpus 0,1,2,3 \
    --work_dir ./multi_gpu_training
```

### Resume Training from Checkpoint

```bash
python fine_tune_text_recognition.py \
    --mode train \
    --config custom_config.yml \
    --resume_from ./work_dir/output/iter_1000.pdparams
```

### Custom Character Dictionary

1. Create your character dictionary file (one character per line):

   ```
   a
   b
   c
   ...
   δΈ­
   ζ–‡
   ε­—
   符
   ```

2. Update the configuration:

   ```python
   custom_params = {
       "Global": {
           "character_dict_path": "path/to/your/custom_dict.txt",
           "character_type": "ch"  # or "en" for English
       }
   }
   ```
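If you are unsure which characters your dataset contains, you can derive the dictionary from your label lists. A minimal sketch (`build_char_dict` is an illustrative helper, not part of this toolkit; it assumes the `train_list.txt` format shown above, with the image path followed by the transcription):

```python
from pathlib import Path

def build_char_dict(label_files, output_path="custom_dict.txt"):
    """Collect the unique characters that appear in label-list files."""
    chars = set()
    for label_file in label_files:
        for line in Path(label_file).read_text(encoding="utf-8").splitlines():
            parts = line.split(maxsplit=1)  # image path, then transcription
            if len(parts) == 2:
                chars.update(parts[1])
    chars.discard(" ")  # spaces are usually handled separately in the config
    Path(output_path).write_text("\n".join(sorted(chars)) + "\n", encoding="utf-8")
    print(f"Wrote {len(chars)} characters to {output_path}")

build_char_dict(["dataset/train_list.txt", "dataset/val_list.txt"])
```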
### Training with Different Image Sizes

```python
custom_params = {
    "Train": {
        "dataset": {
            "transforms": [
                {"DecodeImage": {"img_mode": "BGR", "channel_first": False}},
                {"RecResizeImg": {"image_shape": [3, 64, 256]}},  # H=64, W=256
                # ... other transforms
            ]
        }
    }
}
```

## ❌ Troubleshooting

### Common Issues

#### 1. CUDA Out of Memory

**Solution:** Reduce the batch size and enable gradient accumulation.

```python
custom_params = {
    "Train": {
        "loader": {
            "batch_size_per_card": 16  # Reduce from the default of 256
        }
    }
}
```

#### 2. Dataset Loading Errors

**Solution:** Check the dataset format and file paths.

```bash
# Validate your dataset
python prepare_dataset.py --input_type directory --input_path ./data --output_dir ./test_dataset
```

#### 3. Model Export Fails

**Solution:** Ensure the checkpoint exists and the config path is correct.

```bash
# Check if the checkpoint exists
ls ./work_dir/output/
```

#### 4. Low Recognition Accuracy

**Solutions:**

- Increase training epochs
- Use data augmentation
- Verify dataset quality
- Try different learning rates

### Performance Tips

1. **For faster training:**
   - Use SSD storage for datasets
   - Increase `num_workers` in the data loader
   - Use mixed precision training (if supported)

2. **For better accuracy:**
   - Increase image resolution
   - Add more training data
   - Use appropriate data augmentation
   - Fine-tune the learning rate schedule

3. **For memory efficiency:**
   - Reduce batch size
   - Use gradient accumulation
   - Enable CPU offloading

## πŸ“‹ Supported Models

| Model | Accuracy | Speed | Model Size | Use Case |
|-------|----------|-------|------------|----------|
| PP-OCRv5_server_rec | 86.38% | 8.46ms | 81MB | High-accuracy server deployment |
| PP-OCRv5_mobile_rec | 81.29% | 5.43ms | 16MB | Mobile/edge devices |
| PP-OCRv4_server_rec | 85.19% | 8.75ms | 173MB | Legacy server deployment |
| PP-OCRv4_mobile_rec | 78.74% | 5.26ms | 10.5MB | Legacy mobile deployment |

### Choosing the Right Model

- **PP-OCRv5_server_rec**: Best overall accuracy, suitable for server deployment
- **PP-OCRv5_mobile_rec**: Good balance of accuracy and speed, well suited to mobile apps
- **PP-OCRv4_***: Use if you need compatibility with older PaddleOCR versions

## πŸ“ File Structure

```
.
β”œβ”€β”€ fine_tune_text_recognition.py   # Main fine-tuning script
β”œβ”€β”€ prepare_dataset.py              # Dataset preparation utility
β”œβ”€β”€ inference_example.py            # Inference and evaluation script
β”œβ”€β”€ extract_lmdb_data.py            # LMDB data extraction utility
β”œβ”€β”€ extract_and_train.py            # Complete LMDB pipeline
β”œβ”€β”€ demo_document_extraction.py     # Demo for document dataset
β”œβ”€β”€ quick_start_example.py          # Simple getting-started script
β”œβ”€β”€ requirements.txt                # Python dependencies
β”œβ”€β”€ README.md                       # This file
β”œβ”€β”€ input_dir/                      # Your input data (LMDB format)
β”‚   └── document/                   # Document dataset
β”‚       β”œβ”€β”€ document_train/         # Training split (LMDB)
β”‚       β”œβ”€β”€ document_val/           # Validation split (LMDB)
β”‚       └── document_test/          # Test split (LMDB)
└── work_dir/                       # Training outputs (created during training)
    β”œβ”€β”€ dataset/                    # Prepared dataset
    β”œβ”€β”€ output/                     # Training checkpoints
    └── PP-OCRv5_server_rec_infer/  # Exported model
```

## πŸŽ‰ Comparison Demo

The included Gradio demo provides a side-by-side comparison platform for Chinese text recognition that demonstrates the benefits of fine-tuning.

```bash
# Launch the comparison demo
python3 demo.py
```

Then access it at `http://localhost:7860`.

## 🀝 Contributing

Feel free to submit issues, feature requests, and pull requests. For major changes, please open an issue first to discuss what you would like to change.

## πŸ“„ License

This project is based on PaddleOCR and follows the same Apache 2.0 License.

## πŸ™ Acknowledgments

- The PaddleOCR team for the excellent OCR framework
- The PaddlePaddle team for the deep learning platform
- Community contributors for testing and feedback

---

For more detailed information about PaddleOCR, visit the [official documentation](https://github.com/PaddlePaddle/PaddleOCR).