PP-DocLayoutV3 Inference Benchmark: SafeTensor vs ONNX vs PaddlePaddle

by andynoodles

I benchmarked all three inference backends for PP-DocLayoutV3 on an NVIDIA RTX 5060 Ti (16 GB), Linux, CUDA 13.
Methodology: 5 warmup + 50 timed runs, preprocessing excluded, GPU sync enabled on all frameworks.
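A minimal sketch of that timing loop (the function name `benchmark` and the dummy workload are mine, not from the repo): warmup iterations are discarded, and an optional `sync` callable (e.g. `torch.cuda.synchronize`) flushes async GPU work before each clock read so the measured latency covers the full kernel execution.

```python
import time
import statistics

def benchmark(fn, warmup=5, runs=50, sync=None):
    """Time fn: discard `warmup` iterations, then record `runs` timed ones.

    sync: optional callable (e.g. torch.cuda.synchronize) to flush
    asynchronous GPU work before reading the clock.
    """
    for _ in range(warmup):
        fn()
    if sync:
        sync()
    times_ms = []
    for _ in range(runs):
        t0 = time.perf_counter()
        fn()
        if sync:
            sync()  # ensure GPU work finished before stopping the clock
        times_ms.append((time.perf_counter() - t0) * 1000.0)
    mean = statistics.mean(times_ms)
    return {
        "mean_ms": mean,
        "stdev_ms": statistics.stdev(times_ms),
        "fps": 1000.0 / mean,
    }

# Dummy CPU workload standing in for a model forward pass
stats = benchmark(lambda: sum(i * i for i in range(10_000)))
```

With a real model you would pass `fn=lambda: session.run(...)` (or the PyTorch/Paddle equivalent) and the appropriate sync callable; see the linked repo for the exact harness.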

| Metric | SafeTensor (PyTorch) | ONNX Runtime | PaddlePaddle |
| --- | --- | --- | --- |
| End-to-end (mean) | 41.7 ms | 55.4 ms | 64.3 ms |
| Throughput | 24.0 FPS | 18.1 FPS | 15.6 FPS |
| Latency stdev | 0.7 ms | 0.2 ms | 1.2 ms |
| RAM (total) | 2,634 MB | 3,213 MB | 3,844 MB |
| GPU (peak) | 1,534 MB | 2,062 MB | 1,658 MB |

All three backends produce 13 detections with matching labels and bounding boxes (scores differ by < 0.01).
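One way to verify that agreement programmatically (a sketch; the `(label, score, box)` tuple format and the helper name `detections_match` are my assumptions, not the repo's actual output schema):

```python
def detections_match(a, b, score_tol=0.01):
    """Check two backends' detections agree: identical labels and boxes,
    scores within score_tol. Each detection is a hypothetical
    (label, score, (x1, y1, x2, y2)) tuple."""
    if len(a) != len(b):
        return False
    # Sort so ordering differences between backends don't matter
    for (la, sa, ba), (lb, sb, bb) in zip(sorted(a), sorted(b)):
        if la != lb or ba != bb or abs(sa - sb) >= score_tol:
            return False
    return True

same = detections_match(
    [("text", 0.95, (10, 20, 300, 60))],
    [("text", 0.949, (10, 20, 300, 60))],
)
```

In practice you would also tolerate small box-coordinate differences; here the source reports the boxes matching exactly, so only scores get a tolerance.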

Key findings:

  • SafeTensor/PyTorch is 1.3–1.5x faster end-to-end, even with post-processing outside the graph
  • ONNX has the most consistent latency (0.2 ms stdev)
  • Important: the ONNX model expects mean=[0,0,0], std=[1,1,1] (rescale by 1/255 only), NOT ImageNet normalization. Using ImageNet normalization drops detections from 13 to 12
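The correct preprocessing from the last bullet can be sketched as follows (assuming an already-resized HWC uint8 input; the function name and NCHW layout are my assumptions, so check the exported model's input signature):

```python
import numpy as np

def preprocess_onnx(img_u8):
    """Preprocess for the PP-DocLayoutV3 ONNX export: rescale to [0, 1]
    only (mean=[0,0,0], std=[1,1,1]). Do NOT subtract ImageNet
    mean/std -- that changes the input distribution and, per the
    benchmark, drops detections from 13 to 12.

    img_u8: HWC uint8 array, already resized to the model's input size.
    Returns a 1xCxHxW float32 array.
    """
    x = img_u8.astype(np.float32) / 255.0  # rescale only
    x = x.transpose(2, 0, 1)[None]         # HWC -> NCHW, add batch dim
    return np.ascontiguousarray(x)

img = np.full((4, 4, 3), 255, dtype=np.uint8)  # tiny dummy image
batch = preprocess_onnx(img)
```

The incorrect variant would additionally apply `(x - imagenet_mean) / imagenet_std` per channel; omitting that step is the whole fix.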

Full code, scripts, and methodology:
https://github.com/andynoodles/PPDocLayout-V3-Benchmark
