Sensitive to Quantization

#1
by warshanks - opened

This model seems particularly sensitive to quantization. I've been having to run the FP32 version for consistent results. Is there anything I can do to get better performance at lower precision? Maybe different sampling parameters? Thanks!

Liquid AI org

We found that Q8_0 with greedy sampling gives decent results. What types of text are you translating? Note that this is a base translation model, and it can definitely be tuned further for improved performance on particular workflows.
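To illustrate why greedy sampling helps here: with temperature-based sampling, the decoder draws from the whole next-token distribution, so small logit perturbations from quantization compound with sampler randomness and translations vary run to run. Greedy decoding always takes the argmax, which is deterministic and only changes when quantization noise actually flips the top candidate. A minimal toy sketch (the logits are made up for illustration, not taken from the model):

```python
import math
import random

def softmax(logits, temperature=1.0):
    # Scale logits by temperature, then normalize to probabilities.
    scaled = [l / temperature for l in logits]
    m = max(scaled)
    exps = [math.exp(s - m) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]

def greedy_pick(logits):
    # Greedy decoding: always take the highest-scoring token.
    return max(range(len(logits)), key=lambda i: logits[i])

def sample_pick(logits, temperature=0.8, rng=random):
    # Temperature sampling: draw from the softmax distribution.
    probs = softmax(logits, temperature)
    r = rng.random()
    acc = 0.0
    for i, p in enumerate(probs):
        acc += p
        if r <= acc:
            return i
    return len(probs) - 1

# Toy next-token logits with two close candidates -- exactly where
# sampler randomness (and quantization noise) causes divergence.
logits = [2.10, 2.05, 0.3, -1.0]

greedy_runs = {greedy_pick(logits) for _ in range(100)}
sampled_runs = {sample_pick(logits) for _ in range(100)}
print("greedy picks:", greedy_runs)    # a single index every run
print("sampled picks:", sampled_runs)  # typically several indices
```

In practice this corresponds to setting `do_sample=False` in `transformers` `generate()`, or `--temp 0` in llama.cpp.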

I think greedy sampling was what I needed; that seemed to help a lot! I had just been playing around with the model, translating Japanese text from Honda, and was getting wildly different translations run to run.

Thanks again!

warshanks changed discussion status to closed
