Question about these quants
I realize these were quantized quite a long time ago, but would you happen to remember what precision these GGUFs were quantized from? I noticed that you tended to quantize from F32 around this time, but I would be surprised if this model was as well, given its size. It's stated that the imatrix was calculated from the Q8_0, but were the rest of the quants made from it as well?
I don't know if you CAN quantize from Q8 (or at least you probably shouldn't?)
I would likely have converted to bf16, quantized that to Q8_0, calculated the imatrix from the Q8_0, and then quantized all the other levels from the bf16 using that imatrix.
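For anyone finding this later, the workflow described above roughly corresponds to the standard llama.cpp tooling. This is only a sketch of that pipeline; the model paths, calibration file, and quant level list are placeholder assumptions, not what was actually used for these files.

```shell
# Hedged sketch of the described workflow using llama.cpp tools.
# All paths and the quant-type list below are assumptions for illustration.
quantize_pipeline() {
  local HF_MODEL=path/to/hf-model   # assumption: local Hugging Face checkout
  local CALIB=calibration.txt        # assumption: imatrix calibration text

  # 1. Convert the HF checkpoint to a bf16 GGUF
  python convert_hf_to_gguf.py "$HF_MODEL" --outtype bf16 --outfile model-bf16.gguf

  # 2. Quantize the bf16 to Q8_0 (near-lossless, much cheaper to run)
  ./llama-quantize model-bf16.gguf model-Q8_0.gguf Q8_0

  # 3. Calculate the importance matrix from the Q8_0
  ./llama-imatrix -m model-Q8_0.gguf -f "$CALIB" -o model.imatrix

  # 4. Quantize all other levels from the original bf16, guided by that imatrix
  for q in Q4_K_M Q5_K_M Q6_K; do
    ./llama-quantize --imatrix model.imatrix model-bf16.gguf "model-$q.gguf" "$q"
  done
}
```

The point of step 4 is that only the imatrix comes from the Q8_0; the lower-bit quants are still made from the full-precision bf16, so they don't compound the Q8_0's rounding error.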
Hope that helps :)
There is definitely an --allow-requantize option in llama-quantize that allows you to do so, but yes it would probably not be a good idea haha. Thanks! That was all I wanted to know.