How to use transformers for PaddleOCR-VL inference?

#1
by stzhao - opened

Excellent work! It would be more convenient if PaddleOCR-VL supported transformers-based inference.

PaddlePaddle org

Hello, we currently support inference with the PaddleOCR-VL-0.9B model via the transformers library, which can recognize text, formulas, tables, and chart elements. In the future, we plan to support full document parsing with transformers as well. Below is a simple script for running the PaddleOCR-VL-0.9B model with transformers. For now, we recommend the official deployment method, which is faster and supports page-level document parsing.

If you need any further assistance, feel free to ask!

# -*- coding: utf-8 -*-
"""
This script includes four task prompts (prompts) and allows switching by modifying the CHOSEN_TASK line without any command line parameters.

Available tasks (CHOSEN_TASK):

- 'ocr' -> 'OCR:'
- 'table' -> 'Table Recognition:'
- 'chart' -> 'Chart Recognition:'
- 'formula' -> 'Formula Recognition:'
To add/modify prompts, change the PROMPTS dictionary as needed.
"""

from PIL import Image
import torch
from transformers import AutoModelForCausalLM, AutoProcessor

DEVICE = "cuda" if torch.cuda.is_available() else "cpu"

CHOSEN_TASK = "ocr"  # Options: 'ocr' | 'table' | 'chart' | 'formula'
PROMPTS = {
    "ocr": "OCR:",
    "table": "Table Recognition:",
    "chart": "Chart Recognition:",
    "formula": "Formula Recognition:",
}

model_path = "PaddleOCR-VL-0.9B"
image_path = "test.png"
image = Image.open(image_path).convert("RGB")

# trust_remote_code is required because the model ships custom architecture code.
model = AutoModelForCausalLM.from_pretrained(
    model_path, trust_remote_code=True, torch_dtype=torch.bfloat16
).to(DEVICE).eval()
processor = AutoProcessor.from_pretrained(model_path, trust_remote_code=True)

messages = [{"role": "user", "content": PROMPTS[CHOSEN_TASK]}]
text = processor.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)

inputs = processor(text=[text], images=[image], return_tensors="pt")
inputs = {k: (v.to(DEVICE) if isinstance(v, torch.Tensor) else v) for k, v in inputs.items()}

# Greedy decoding (do_sample=False) keeps the recognition output deterministic.
with torch.inference_mode():
    generated = model.generate(**inputs, max_new_tokens=1024, do_sample=False, use_cache=True)

# Decode and strip the echoed prompt so only the model's answer remains.
resp = processor.batch_decode(generated, skip_special_tokens=True)[0]
answer = resp.split(text)[-1].strip()
print(answer)

model_path = "PaddleOCR-VL-0.9B" is it correct? I changed it to "PaddlePaddle/PaddleOCR-VL" still its not working. Error says model_type is missing from config.

PaddlePaddle org

model_path = "PaddleOCR-VL-0.9B" is an example, please replace it with your local model path and try again.

Yes, it's working now. Thanks for the quick response. I have two more queries:
1. Is it possible to parse a complete page to Markdown or JSON using transformers?
2. I tried using the PaddleOCRVL() pipeline, but it's not working on a CPU-only system. How can I set it up for a CPU-only system?

PaddlePaddle org

Thank you for your interest.

  1. As mentioned in the previous reply, we do not currently support end-to-end transformers inference, but we plan to add it in the future. We recommend the official deployment method for higher inference efficiency; see the sketch below.
  2. We do not support CPU inference at this time, as it would lead to a poor user experience.
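For reference, here is a minimal sketch of the official pipeline route for page-level parsing. The predict/save method names follow the PaddleOCR 3.x pipeline conventions; please check the official docs for the exact API:

# Sketch of the official PaddleOCR-VL pipeline (requires a GPU per the note above).
from paddleocr import PaddleOCRVL

pipeline = PaddleOCRVL()
output = pipeline.predict("test.png")  # page-level document parsing
for res in output:
    res.save_to_json(save_path="output")      # structured JSON result
    res.save_to_markdown(save_path="output")  # Markdown rendering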

Using the official deployment, can we output a confidence score or probability for each word?
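(For the transformers script above, a sketch of how per-token probabilities could be extracted with the standard generate options; this is the generic transformers mechanism, token-level rather than word-level, and not the official pipeline's confidence output. It assumes the processor exposes its tokenizer as processor.tokenizer.)

# Sketch: per-token generation probabilities via standard transformers generate options.
outputs = model.generate(
    **inputs, max_new_tokens=1024, do_sample=False,
    return_dict_in_generate=True, output_scores=True,
)
scores = model.compute_transition_scores(
    outputs.sequences, outputs.scores, normalize_logits=True
)
input_len = inputs["input_ids"].shape[1]
for tok, logprob in zip(outputs.sequences[0, input_len:], scores[0]):
    print(processor.tokenizer.decode(tok), float(logprob.exp()))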

I encountered an error:
"""
from transformers.modeling_layers import GradientCheckpointingLayer
ModuleNotFoundError: No module named 'transformers.modeling_layers'
"""
I asked GPT, and it told me that my transformers version is incorrect. May I know which version I should use?

PaddlePaddle org

Hello, we’re currently using Transformers version 4.55.0. You may try installing this version if needed.
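For example, pinning and verifying the version (assuming a pip-based environment):

# Install the matching version first, e.g.:
#   pip install transformers==4.55.0
import transformers
print(transformers.__version__)  # expect 4.55.0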

I am really excited
