CPU-only build is getting ~21 t/s!

#2
by phakio - opened

First time testing fastllm. I'm getting around 21 t/s generation speed on my Intel Sapphire Rapids QYFS ES processor with 256 GB of DDR5.

I have enough room for a bigger quant so I'm going to test out the Q5 or Q6 now, but so far good work!

alive = 1, pending = 0, contextLen = 896, Speed: 21.266575 tokens / s.
alive = 1, pending = 0, contextLen = 960, Speed: 20.940496 tokens / s.
alive = 1, pending = 0, contextLen = 960, Speed: 21.287931 tokens / s.
alive = 1, pending = 0, contextLen = 960, Speed: 21.172340 tokens / s.
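
For anyone who wants to compare runs across quants, here is a minimal sketch that pulls the speed figures out of log lines like the ones above and summarizes them. The regular expression is an assumption based only on the four lines shown in this post, not a documented fastllm log format, and `parse_speeds` is a hypothetical helper name.

```python
import re
from statistics import mean

# Log lines copied from this post.
LOG = """\
alive = 1, pending = 0, contextLen = 896, Speed: 21.266575 tokens / s.
alive = 1, pending = 0, contextLen = 960, Speed: 20.940496 tokens / s.
alive = 1, pending = 0, contextLen = 960, Speed: 21.287931 tokens / s.
alive = 1, pending = 0, contextLen = 960, Speed: 21.172340 tokens / s.
"""

# Assumed line shape, inferred from the sample above (not an official format).
LINE_RE = re.compile(r"contextLen = (\d+), Speed: ([\d.]+) tokens / s\.")


def parse_speeds(text):
    """Return (context_len, tokens_per_second) pairs for every matching line."""
    return [(int(m.group(1)), float(m.group(2))) for m in LINE_RE.finditer(text)]


samples = parse_speeds(LOG)
speeds = [speed for _, speed in samples]

# Summarize the run so different quants can be compared at a glance.
print(f"{len(speeds)} samples, mean {mean(speeds):.2f} tokens/s, "
      f"min {min(speeds):.2f}, max {max(speeds):.2f}")
```

Swapping in the log from a Q5 or Q6 run should give a like-for-like comparison, as long as the log lines keep the same shape.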

Thank you for testing!
Your configuration can run the original model or the fp8 model directly. Adding a graphics card is recommended, since it should improve the speed significantly.
