CPU-only build is getting ~21 t/s!
#2, opened by phakio
First time testing fastllm. I'm getting around 21 t/s generation speed on my Intel Sapphire Rapids QYFS ES processor with 256GB DDR5.
I have enough room for a bigger quant so I'm going to test out the Q5 or Q6 now, but so far good work!
alive = 1, pending = 0, contextLen = 896, Speed: 21.266575 tokens / s.
alive = 1, pending = 0, contextLen = 960, Speed: 20.940496 tokens / s.
alive = 1, pending = 0, contextLen = 960, Speed: 21.287931 tokens / s.
alive = 1, pending = 0, contextLen = 960, Speed: 21.172340 tokens / s.
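For anyone who wants to average the speed over a run rather than eyeball it, here is a minimal sketch that parses status lines in the format pasted above. The regex and the `parse_speeds` helper are my own assumptions based on these four lines, not part of fastllm, so the pattern may need adjusting if the log format differs.

```python
import re
from statistics import mean

# The four status lines from the post above, copied verbatim.
LOG = """\
alive = 1, pending = 0, contextLen = 896, Speed: 21.266575 tokens / s.
alive = 1, pending = 0, contextLen = 960, Speed: 20.940496 tokens / s.
alive = 1, pending = 0, contextLen = 960, Speed: 21.287931 tokens / s.
alive = 1, pending = 0, contextLen = 960, Speed: 21.172340 tokens / s.
"""

# Assumed pattern: matches "Speed: <number> tokens / s" as seen in the lines above.
SPEED_RE = re.compile(r"Speed:\s*([0-9]+(?:\.[0-9]+)?)\s*tokens\s*/\s*s")


def parse_speeds(log: str) -> list[float]:
    """Return every speed value (tokens per second) found in the log text."""
    return [float(m.group(1)) for m in SPEED_RE.finditer(log)]


if __name__ == "__main__":
    speeds = parse_speeds(LOG)
    print(f"{len(speeds)} samples, mean {mean(speeds):.2f} tokens/s")
```

To use it on a longer run, replace `LOG` with the contents of your own captured output.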
Thank you for testing!
Your configuration can run the original model or the fp8 model directly. We recommend adding a graphics card to significantly improve the speed.