Sensitive to Quantization
#1
by
warshanks
- opened
This model seems particularly sensitive to quantization. I've had to run the FP32 version to get consistent results. Is there anything I can do to get better performance at lower precision? Maybe different sampling parameters? Thanks!
We found that Q8_0 with greedy sampling gives decent results. What types of text are you translating? Note that this is a base translation model, and it can definitely be tuned further for improved performance on particular workflows.
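To illustrate why greedy sampling helps here, a minimal toy sketch (plain Python, not this model's actual inference code): greedy decoding takes the argmax of the logits at each step, so it is deterministic and largely shrugs off small quantization noise, while temperature sampling draws from the softmax distribution and can turn that same noise into visibly different outputs run to run.

```python
import math
import random

def greedy_pick(logits):
    # Greedy decoding: always take the highest-scoring token,
    # so repeated runs give identical output.
    return max(range(len(logits)), key=lambda i: logits[i])

def sample_pick(logits, temperature=1.0, rng=random):
    # Temperature sampling: draw from the softmax distribution.
    # Output varies run to run, which can amplify small quantization
    # noise in the logits into noticeably different translations.
    scaled = [l / temperature for l in logits]
    m = max(scaled)
    probs = [math.exp(s - m) for s in scaled]
    total = sum(probs)
    probs = [p / total for p in probs]
    return rng.choices(range(len(logits)), weights=probs, k=1)[0]

# Toy logits for a 4-token vocabulary.
logits = [2.0, 1.9, 0.5, -1.0]

# Greedy stays stable even if quantization perturbs the logits slightly.
print(greedy_pick(logits))                       # index 0
print(greedy_pick([l + 0.01 for l in logits]))   # still index 0
```

In practice this usually maps to `do_sample=False` in 🤗 Transformers' `generate()`, or setting temperature to 0 in llama.cpp-based runners.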
Greedy sampling was what I needed; that seemed to help a lot! I had just been playing around with the model, translating Japanese text from Honda, and was getting wildly different translations from run to run.
Thanks again!
warshanks
changed discussion status to
closed