---
license: mit
language:
- zh
- en
pipeline_tag: text-generation
library_name: transformers
base_model:
- THUDM/GLM-Z1-9B-0414
---
# Melvin56/GLM-Z1-9B-0414-GGUF

Original model: [THUDM/GLM-Z1-9B-0414](https://huggingface.co/THUDM/GLM-Z1-9B-0414)

Llama.cpp build: 558a7647 (5190)

All quants were created with an importance matrix (imatrix) computed from this [dataset](https://gist.github.com/bartowski1182/eb213dccb3571f863da82e99418f81e8).

```
Update-01
* [Fixed Quant] Re-quantized all quants with build: 558a7647 (5190)
```

|               | CPU (AVX2) | CPU (ARM NEON) | Metal | cuBLAS | rocBLAS | SYCL     | CLBlast | Vulkan | Kompute |
| :------------ | :--------: | :------------: | :---: | :----: | :-----: | :------: | :-----: | :----: | :-----: |
| K-quants      | ✅         | ✅             | ✅    | ✅     | ✅      | ✅       | ✅ 🐢⁵  | ✅ 🐢⁵ | ❌      |
| I-quants      | ✅ 🐢⁴     | ✅ 🐢⁴         | ✅ 🐢⁴ | ✅    | ✅      | Partial¹ | ❌      | ❌     | ❌      |

```
✅: feature works
❌: feature does not work
❓: unknown, please contribute if you can test it yourself
🐢: feature is slow
¹: IQ3_S and IQ1_S, see #5886
²: Only with -ngl 0
³: Inference is 50% slower
⁴: Slower than K-quants of comparable size
⁵: Slower than cuBLAS/rocBLAS on similar cards
⁶: Only q8_0 and iq4_nl
```
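The quants in this repo can be used directly with the stock llama.cpp tools. A minimal sketch, assuming a Q4_K_M quant named `GLM-Z1-9B-0414-Q4_K_M.gguf` (a hypothetical filename for illustration; check this repo's file list for the actual names):

```shell
# Download one quant from this repo.
# NOTE: the .gguf filename below is an assumption, not confirmed by this card.
huggingface-cli download Melvin56/GLM-Z1-9B-0414-GGUF \
  GLM-Z1-9B-0414-Q4_K_M.gguf --local-dir .

# Run it with llama-cli from the same llama.cpp build family noted above.
# -ngl 99 offloads all layers to the GPU; use -ngl 0 to stay on CPU.
llama-cli -m GLM-Z1-9B-0414-Q4_K_M.gguf \
  -p "Hello, who are you?" -n 128 -ngl 99
```

Pick the quant size according to the backend support matrix above: I-quants trade speed for size on CPU/Metal, while K-quants are the safer default across backends.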