PP-DocLayoutV3 Inference Benchmark: SafeTensor vs ONNX vs PaddlePaddle
#8
by andynoodles - opened
I benchmarked all three inference backends for PP-DocLayoutV3 on an NVIDIA RTX 5060 Ti (16 GB), Linux, CUDA 13.
Each backend got 5 warmup runs followed by 50 timed runs, with preprocessing excluded from the timing and a GPU sync on all frameworks.
| Metric | SafeTensor (PyTorch) | ONNX Runtime | PaddlePaddle |
|---|---|---|---|
| End-to-end (mean) | 41.7 ms | 55.4 ms | 64.3 ms |
| Throughput | 24.0 FPS | 18.1 FPS | 15.6 FPS |
| Latency stdev | 0.7 ms | 0.2 ms | 1.2 ms |
| RAM (total) | 2,634 MB | 3,213 MB | 3,844 MB |
| GPU memory (peak) | 1,534 MB | 2,062 MB | 1,658 MB |
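
For anyone who wants to reproduce numbers like these, here is a minimal sketch of a timing loop with warmup runs and a sync around each timed call. It is not the code from the linked repo: `benchmark`, `run`, and `sync` are names made up for illustration. With PyTorch you would pass `torch.cuda.synchronize` as `sync`; the default no-op only makes sense for CPU-only work.

```python
import statistics
import time
from typing import Callable


def benchmark(run: Callable[[], object],
              sync: Callable[[], None] = lambda: None,
              warmup: int = 5,
              iters: int = 50) -> dict:
    """Time `run` over `iters` calls after `warmup` untimed calls.

    `sync` is a hypothetical hook that should block until queued GPU work
    has finished (for example torch.cuda.synchronize in PyTorch).
    """
    # Untimed warmup so one-off costs (allocation, kernel selection)
    # do not leak into the measurements.
    for _ in range(warmup):
        run()
    sync()

    latencies_ms = []
    for _ in range(iters):
        sync()  # make sure nothing from the previous call is still running
        start = time.perf_counter()
        run()
        sync()  # wait for this call's GPU work before stopping the clock
        latencies_ms.append((time.perf_counter() - start) * 1000.0)

    mean_ms = statistics.mean(latencies_ms)
    return {
        "mean_ms": mean_ms,
        "stdev_ms": statistics.stdev(latencies_ms),
        "fps": 1000.0 / mean_ms,  # throughput derived from mean latency
    }
```

The throughput row is consistent with this kind of derivation: 1000 / 41.7 ms ≈ 24.0 FPS, 1000 / 55.4 ms ≈ 18.1 FPS, and 1000 / 64.3 ms ≈ 15.6 FPS.
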
All three backends produce 13 detections with matching labels and bounding boxes (scores differ by < 0.01).
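
One way such a cross-backend check could look is sketched below. This is an assumption about the comparison, not the script from the repo: the `(label, score, box)` tuple layout, the `detections_match` name, and the 1-pixel box tolerance are all invented for illustration. Only the 0.01 score tolerance comes from the post.

```python
from typing import List, Tuple

# Hypothetical detection layout: (label, score, (x1, y1, x2, y2))
Detection = Tuple[str, float, Tuple[float, float, float, float]]


def detections_match(a: List[Detection],
                     b: List[Detection],
                     score_tol: float = 0.01,
                     box_tol: float = 1.0) -> bool:
    """Return True if both backends report the same detections within tolerance."""
    if len(a) != len(b):
        return False

    # Sort by label, then box, so the backends' output order does not matter.
    # Simplification: two same-label boxes that nearly coincide could sort
    # differently in each list; a real check might match on IoU instead.
    def key(d: Detection):
        return (d[0], d[2])

    for (la, sa, ba), (lb, sb, bb) in zip(sorted(a, key=key), sorted(b, key=key)):
        if la != lb:
            return False  # labels must be identical
        if abs(sa - sb) >= score_tol:
            return False  # scores must differ by less than score_tol
        if any(abs(p - q) > box_tol for p, q in zip(ba, bb)):
            return False  # each box coordinate must agree within box_tol
    return True
```
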
Key findings:
- SafeTensor/PyTorch is 1.3–1.5x faster end-to-end, even with post-processing outside the graph
- ONNX has the most consistent latency (0.2 ms stdev)
- Important: the ONNX model expects mean=[0,0,0], std=[1,1,1] (rescale by 1/255 only), NOT ImageNet normalization. Using ImageNet norm drops detections from 13 to 12
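
To make the normalization point concrete, here is a small sketch that applies both schemes to a single pixel. `normalize_pixel` is a made-up helper, and the ImageNet constants are the commonly used mean/std values. In a real pipeline you would apply the same arithmetic to the whole image array rather than one pixel at a time.

```python
# Commonly used ImageNet per-channel statistics (RGB order).
IMAGENET_MEAN = (0.485, 0.456, 0.406)
IMAGENET_STD = (0.229, 0.224, 0.225)


def normalize_pixel(rgb, mean=(0.0, 0.0, 0.0), std=(1.0, 1.0, 1.0)):
    """Rescale 8-bit channel values to [0, 1], then apply (x - mean) / std.

    The defaults (mean 0, std 1) reduce this to a plain 1/255 rescale,
    which is what the post says the ONNX model expects.
    """
    return tuple((c / 255.0 - m) / s for c, m, s in zip(rgb, mean, std))


white = (255, 255, 255)
rescale_only = normalize_pixel(white)                       # 1/255 rescale only
imagenet = normalize_pixel(white, IMAGENET_MEAN, IMAGENET_STD)  # ImageNet norm

print("rescale only:", rescale_only)
print("ImageNet norm:", imagenet)
```

A white pixel maps to 1.0 per channel with rescale-only but to values above 2 with ImageNet normalization, so the two schemes feed the model inputs on quite different scales.
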
Full code, scripts, and methodology:
https://github.com/andynoodles/PPDocLayout-V3-Benchmark