---
license: apache-2.0
language:
- en
license_link: https://huggingface.co/Qwen/Qwen2.5-Coder-32B-Instruct/blob/main/LICENSE
base_model:
- Qwen/Qwen2.5-Coder-32B-Instruct
pipeline_tag: text-generation
tags:
- code
- chat
- qwen
- qwen-coder
- exl3
---
These are EXL3 quantizations of Qwen2.5-Coder-32B-Instruct, which is still the SOTA non-reasoning coding model as of today. It remains my go-to FIM (fill-in-the-middle) autocompletion model even after the Qwen3 and Gemma3 releases. The quants were produced with exllamav3 version 0.0.2.
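Since FIM autocompletion is the main use case here, a minimal sketch of Qwen2.5-Coder's documented FIM prompt format may help (the code snippet being completed is purely illustrative):

```python
# Qwen2.5-Coder FIM format: the model generates the code that belongs
# between the prefix and the suffix, after the <|fim_middle|> token.
prefix = "def quicksort(arr):\n    if len(arr) <= 1:\n        return arr\n    "
suffix = "\n    return quicksort(left) + [pivot] + quicksort(right)"
prompt = f"<|fim_prefix|>{prefix}<|fim_suffix|>{suffix}<|fim_middle|>"
# Send `prompt` to the backend as a raw completion (no chat template),
# stopping at <|endoftext|> or the backend's default EOS handling.
```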
## EXL3 Quantized Models
For coding, I found that a >=6.0 bpw (preferably 8.0 bpw) quant with KV cache quantization at Q6 or higher works much better than 4.0 bpw. If you only use these models for short autocompletion, 4.0 bpw is usable. A loading sketch follows below.
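As a rough illustration of those settings, here is a loading sketch in the spirit of exllamav3's README examples. The quantized-cache names (`CacheLayer_quant`, `k_bits`, `v_bits`) and the import path are assumptions from memory of the 0.0.x API and may differ in your installed version:

```python
# Assumed exllamav3 0.0.x API, following the project's README examples.
# CacheLayer_quant / k_bits / v_bits and their import location are
# assumptions; check your installed version.
from exllamav3 import Config, Model, Cache, Tokenizer, Generator
from exllamav3.cache import CacheLayer_quant

config = Config.from_directory("./Qwen2.5-Coder-32B-Instruct-exl3-8.0bpw")
model = Model.from_config(config)
cache = Cache(model, max_num_tokens=32768,
              layer_type=CacheLayer_quant, k_bits=6, v_bits=6)  # ~Q6 KV cache
model.load()
tokenizer = Tokenizer.from_config(config)
generator = Generator(model=model, cache=cache, tokenizer=tokenizer)
```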
## Credits
Thanks to the exllamav3 dev team for their excellent work.