# Static GGUF Quants of allura-org/Q3-8B-Kintsugi
GGUF quants of allura-org/Q3-8B-Kintsugi, quantized with llama.cpp.
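For reference, quants like these are typically produced with llama.cpp's conversion and quantization tools. The sketch below is a minimal, hypothetical reproduction assuming a local llama.cpp checkout; the file paths and output names are assumptions, not the exact commands used for this repo.

```python
import subprocess

# Convert the original HF checkpoint to a full-precision GGUF.
# convert_hf_to_gguf.py ships with llama.cpp; paths here are assumptions.
subprocess.run(
    [
        "python", "convert_hf_to_gguf.py", "Q3-8B-Kintsugi",
        "--outfile", "Q3-8B-Kintsugi-FP16.gguf",
        "--outtype", "f16",
    ],
    check=True,
)

# Quantize the FP16 GGUF down to one of the listed types (Q4_K_M shown).
subprocess.run(
    [
        "./llama-quantize",
        "Q3-8B-Kintsugi-FP16.gguf",
        "Q3-8B-Kintsugi-Q4_K_M.gguf",
        "Q4_K_M",
    ],
    check=True,
)
```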
## Quants
| Quant | Link | Notes |
|---|---|---|
| FP16 | Download | Lossless. Not recommended unless you have VRAM to spare. |
| Q8_0 | Download | Basically lossless, half the size of FP16. |
| Q6_K | Download | Near-lossless, slightly smaller than Q8_0. |
| Q5_K_M | Download | Good quality/size balance; smaller than Q6_K with some loss. |
| Q5_K_S | Download | Slightly smaller than Q5_K_M with marginally more quality loss; still usable. |
| Q4_K_M | Download | Usable for some tasks; significantly smaller than the Q5 variants. |
| Q4_K_S | Download | More compact than Q4_K_M; suitable for memory-constrained devices. |
| Q3_K_M | Download | Very small; noticeable quality loss. |
| Q3_K_S | Download | More compact than Q3_K_M, trading further quality for size. |
| Q2_K | Download | Very small and needs minimal resources, but near-incoherent for most use cases. Not recommended unless you are on a Samsung Galaxy S5. |
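To run one of these quants locally, here is a minimal sketch using the llama-cpp-python bindings. The filename is an assumption; substitute whichever quant you downloaded from the table above.

```python
from llama_cpp import Llama

# Load a downloaded quant. The filename is an assumption;
# point model_path at the GGUF file you actually fetched.
llm = Llama(
    model_path="Q3-8B-Kintsugi-Q4_K_M.gguf",
    n_ctx=4096,        # context window; raise if you have the memory
    n_gpu_layers=-1,   # offload all layers to GPU when a backend is available
)

out = llm.create_chat_completion(
    messages=[{"role": "user", "content": "Write a haiku about kintsugi."}],
    max_tokens=128,
)
print(out["choices"][0]["message"]["content"])
```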