---
license: apache-2.0
language:
- en
license_link: https://huggingface.co/Qwen/Qwen2.5-Coder-32B-Instruct/blob/main/LICENSE
base_model:
- Qwen/Qwen2.5-Coder-32B-Instruct
pipeline_tag: text-generation
tags:
- code
- chat
- qwen
- qwen-coder
- exl3
---
|
|
|
These are EXL3 quantizations of [Qwen2.5-Coder-32B-Instruct](https://huggingface.co/Qwen/Qwen2.5-Coder-32B-Instruct), which is still the SOTA non-reasoning coder model as of today. It remains my go-to FIM (fill-in-the-middle) autocompletion model even after the Qwen3 and Gemma3 releases.
|
I used [exllamav3 version 0.0.2](https://github.com/turboderp-org/exllamav3/releases/tag/v0.0.2) for the quantization.
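Since I mainly use this model for FIM autocompletion, here is a minimal sketch of the fill-in-the-middle prompt format documented for Qwen2.5-Coder (the code fragment below is just an illustrative example):

```python
# Minimal sketch of the Qwen2.5-Coder fill-in-the-middle (FIM) prompt format.
# Send the resulting string as a raw (non-chat) completion prompt; the model
# generates the code that belongs between the prefix and the suffix.

prefix = "def binary_search(arr, target):\n    low, high = 0, len(arr) - 1\n"
suffix = "\n    return -1\n"

# Format: <|fim_prefix|>{code before cursor}<|fim_suffix|>{code after cursor}<|fim_middle|>
fim_prompt = f"<|fim_prefix|>{prefix}<|fim_suffix|>{suffix}<|fim_middle|>"

print(fim_prompt)
```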
|
|
|
## EXL3 Quantized Models |
|
|
|
- [4.0bpw](https://huggingface.co/LLMJapan/Qwen2.5-Coder-32B-Instruct_exl3/tree/4.0bpw)
- [6.0bpw](https://huggingface.co/LLMJapan/Qwen2.5-Coder-32B-Instruct_exl3/tree/6.0bpw)
- [8.0bpw](https://huggingface.co/LLMJapan/Qwen2.5-Coder-32B-Instruct_exl3/tree/8.0bpw)
|
|
|
For coding, I found that the 6.0bpw model, or preferably the 8.0bpw model, with KV cache quantization (Q6 or higher) is much better than 4.0bpw.
|
If you use these models only for short autocompletion, 4.0bpw is usable.
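To download just one of the branches above, here is a minimal sketch using `huggingface_hub` (the `local_dir` name is an arbitrary choice):

```python
# Hypothetical sketch: download a single quantization branch of this repo.
from huggingface_hub import snapshot_download

local_path = snapshot_download(
    repo_id="LLMJapan/Qwen2.5-Coder-32B-Instruct_exl3",
    revision="8.0bpw",  # or "6.0bpw" / "4.0bpw"
    local_dir="Qwen2.5-Coder-32B-Instruct_exl3-8.0bpw",  # arbitrary local folder name
)
print(local_path)
```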
|
|
|
## Credits |
|
|
|
Thanks to the excellent work of the exllamav3 dev team.