PaddleOCR-VL Training example
It's complicated to prepare a dataset that includes both text detection and recognition annotations. I'm looking for a training method that takes an input image and outputs the text in the exact format I want; this should be possible, since the final output is generated by the ERNIE decoder.
A good use case is when I want to recognize certain text in an image that also contains a lot of irrelevant text, so fine-tuning the model to output exactly what should be recognized, with a simple text label per image as training data, would be very helpful.
I was already able to achieve all of this using Florence-2.
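For reference, this is roughly the kind of per-image label data I have in mind; it's just a minimal sketch, and the file names and JSONL layout are placeholders rather than anything required by PaddleOCR-VL or the Florence-2 notebook below:

```python
import json

# Hypothetical examples: each sample pairs an image with the exact
# target string I want the model to emit (ignoring all other text).
samples = [
    {"image": "images/invoice_001.jpg", "text": "INV-2024-0137"},
    {"image": "images/invoice_002.jpg", "text": "INV-2024-0138"},
]

# Write one JSON object per line (JSONL), a common layout for
# image-to-text fine-tuning datasets.
with open("train.jsonl", "w", encoding="utf-8") as f:
    for sample in samples:
        f.write(json.dumps(sample, ensure_ascii=False) + "\n")
```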
Github: https://github.com/algofly-oss/vllm-ocr-training/blob/main/notebooks/2_model_training.ipynb
YT: https://www.youtube.com/watch?v=E8lWUjRNMQQ
I'm looking for an example or guide to achieve similar results with PaddleOCR-VL.
Hello, I'm very pleased to discuss the technology with you. Our technical report provides a detailed description of our data construction methods, model architecture, and training process. We will also be open-sourcing the fine-tuning code for PaddleOCR-VL soon, so please stay tuned for updates.