---
license: mit
language:
- zh
- en
pipeline_tag: text-generation
library_name: transformers
base_model:
- THUDM/GLM-Z1-9B-0414
---
# Melvin56/GLM-Z1-9B-0414-GGUF

Original model: [THUDM/GLM-Z1-9B-0414](https://huggingface.co/THUDM/GLM-Z1-9B-0414)

Llama.cpp build: 558a7647 (5190)
I used imatrix to create all these quants using this Dataset.
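To run one of these quants locally, here is a minimal sketch using llama-cpp-python; the quant filename, context size, and GPU offload settings are assumptions, so adjust them to the file you actually download from this repo.

```python
# Minimal sketch (assumed filename and settings, not an official example):
# load one of the GGUF quants from this repo with llama-cpp-python.
from llama_cpp import Llama

llm = Llama(
    model_path="GLM-Z1-9B-0414-Q4_K_M.gguf",  # assumed filename; use the quant you downloaded
    n_ctx=4096,        # context window; raise it if you have the memory
    n_gpu_layers=-1,   # offload all layers to the GPU; set to 0 for CPU-only
)

out = llm.create_chat_completion(
    messages=[{"role": "user", "content": "Briefly introduce GLM-Z1-9B-0414."}],
    max_tokens=256,
)
print(out["choices"][0]["message"]["content"])
```

For CPU-only setups, keep in mind that I-quants are slower than K-quants of comparable size (see the compatibility table below).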
## Update-01

* [Fixed Quant] Re-quantized all quants with build: 558a7647 (5190)
|          | CPU (AVX2) | CPU (ARM NEON) | Metal | cuBLAS | rocBLAS | SYCL | CLBlast | Vulkan | Kompute |
|---|---|---|---|---|---|---|---|---|---|
| K-quants | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ 🐢⁵ | ✅ 🐢⁵ | ❌ |
| I-quants | ✅ 🐢⁴ | ✅ 🐢⁴ | ✅ 🐢⁴ | ✅ | ✅ | Partial¹ | ❌ | ❌ | ❌ |
✅: feature works
❌: feature does not work
❓: unknown, please contribute if you can test it yourself
🐢: feature is slow
¹: IQ3_S and IQ1_S, see #5886
²: Only with -ngl 0
³: Inference is 50% slower
⁴: Slower than K-quants of comparable size
⁵: Slower than cuBLAS/rocBLAS on similar cards
⁶: Only q8_0 and iq4_nl