The IQ4_XS_L quant is meant to be run with IK_Llama.cpp or the upcoming EsoCroK.Cpp, because it uses the Q6_0 quant type, which is specific to IK_Llama.cpp.

Here's EsoCrok, which is compatible with both the mainline llama.cpp quants and Q6_0.

The usual Croco has not (yet?) been updated to properly support GLM 4.5 (or OpenAI GPT OSS).

Format: GGUF

Model size: 110B params

Architecture: glm4moe


Model repository: NexesQuants/zai-org_GLM-4.5-Air-bf16-iMat-IKL-CQ-GGUF