Question about these quants
I realize these were quantized quite a long time ago, but would you happen to remember what precision these GGUFs were quantized from? I noticed that you tended to quantize from F32 around this time, but I would be surprised if this model was as well, given its size. It's stated that the imatrix was calculated from the Q8_0, but were the rest of the quants made from it as well?
I don't know if you CAN quantize from Q8 (or at least you probably shouldn't?)
I would likely have converted to bf16, quantized that to Q8_0, calculated the imatrix from the Q8_0, and then quantized all the other levels from the bf16 using that imatrix.
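For anyone finding this later, the workflow described above roughly corresponds to the standard llama.cpp tooling. This is only a sketch of that pipeline; the model paths, calibration file, and quant level list are placeholder assumptions, not what was actually used for these files.

```shell
# Hedged sketch of the described workflow using llama.cpp tools.
# All paths and the quant-type list below are assumptions for illustration.
quantize_pipeline() {
  local HF_MODEL=path/to/hf-model   # assumption: local Hugging Face checkout
  local CALIB=calibration.txt        # assumption: imatrix calibration text

  # 1. Convert the HF checkpoint to a bf16 GGUF
  python convert_hf_to_gguf.py "$HF_MODEL" --outtype bf16 --outfile model-bf16.gguf

  # 2. Quantize the bf16 to Q8_0 (near-lossless, much cheaper to run)
  ./llama-quantize model-bf16.gguf model-Q8_0.gguf Q8_0

  # 3. Calculate the importance matrix from the Q8_0
  ./llama-imatrix -m model-Q8_0.gguf -f "$CALIB" -o model.imatrix

  # 4. Quantize all other levels from the original bf16, guided by that imatrix
  for q in Q4_K_M Q5_K_M Q6_K; do
    ./llama-quantize --imatrix model.imatrix model-bf16.gguf "model-$q.gguf" "$q"
  done
}
```

The point of step 4 is that only the imatrix comes from the Q8_0; the lower-bit quants are still made from the full-precision bf16, so they don't compound the Q8_0's rounding error.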
Hope that helps :)
There is definitely an --allow-requantize option in llama-quantize that allows you to do so, but yes it would probably not be a good idea haha. Thanks! That was all I wanted to know.