Sensitive to Quantization

#1
by warshanks - opened

This model seems particularly sensitive to quantization. I've been having to run the FP32 version for consistent results. Is there anything I can do to get better performance at lower precision? Maybe different sampling parameters? Thanks!

Liquid AI org

We found that Q8_0 with greedy sampling gives decent results. What types of text are you translating? Note that this is a base translation model, and it can definitely be tuned further for improved performance on particular workflows.
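To illustrate why greedy sampling helps here: with temperature-based sampling, the decoder draws from the whole next-token distribution, so small logit perturbations from quantization compound with sampler randomness and translations vary run to run. Greedy decoding always takes the argmax, which is deterministic and only changes when quantization noise actually flips the top candidate. A minimal toy sketch (the logits are made up for illustration, not taken from the model):

```python
import math
import random

def softmax(logits, temperature=1.0):
    # Scale logits by temperature, then normalize to probabilities.
    scaled = [l / temperature for l in logits]
    m = max(scaled)
    exps = [math.exp(s - m) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]

def greedy_pick(logits):
    # Greedy decoding: always take the highest-scoring token.
    return max(range(len(logits)), key=lambda i: logits[i])

def sample_pick(logits, temperature=0.8, rng=random):
    # Temperature sampling: draw from the softmax distribution.
    probs = softmax(logits, temperature)
    r = rng.random()
    acc = 0.0
    for i, p in enumerate(probs):
        acc += p
        if r <= acc:
            return i
    return len(probs) - 1

# Toy next-token logits with two close candidates -- exactly where
# sampler randomness (and quantization noise) causes divergence.
logits = [2.10, 2.05, 0.3, -1.0]

greedy_runs = {greedy_pick(logits) for _ in range(100)}
sampled_runs = {sample_pick(logits) for _ in range(100)}
print("greedy picks:", greedy_runs)    # a single index every run
print("sampled picks:", sampled_runs)  # typically several indices
```

In practice this corresponds to setting `do_sample=False` in `transformers` `generate()`, or `--temp 0` in llama.cpp.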

I think greedy sampling was what I needed; that seemed to help a lot! I had just been playing around with the model, translating Japanese text from Honda, and was getting wildly different translations run to run.

Thanks again!

warshanks changed discussion status to closed
