Qwen2.5-0.5B quantized to 4 bits per-tensor. Performance is comparable to GPTQ (128g desc).

| Task | Version | Filter | n-shot | Metric | f16 | this | gptq |
|---|---|---|---|---|---|---|---|
| arc_challenge | 1 | none | 0 | acc ↑ | 0.2918 | 0.2705 | 0.2730 |
| arc_easy | 1 | none | 0 | acc ↑ | 0.6465 | 0.6393 | 0.6031 |
| boolq | 2 | none | 0 | acc ↑ | 0.6208 | 0.5862 | 0.6232 |
| hellaswag | 1 | none | 0 | acc ↑ | 0.4061 | 0.3888 | 0.3969 |
| piqa | 1 | none | 0 | acc ↑ | 0.7051 | 0.6861 | 0.6801 |
| winogrande | 1 | none | 0 | acc ↑ | 0.5635 | 0.5762 | 0.5659 |
| **average** | | | | acc ↑ | 0.5390 | 0.5245 | 0.5237 |

To reproduce the evals, see this colab.
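Per-tensor 4-bit quantization means a single scale is shared by an entire weight tensor, so every weight snaps to one of at most 16 values. A minimal sketch of the idea (not the exact recipe used to produce this checkpoint):

```python
import numpy as np

def fake_quantize_per_tensor(w: np.ndarray, bits: int = 4) -> np.ndarray:
    """Symmetric per-tensor fake quantization: snap every weight to a
    signed 4-bit grid using a single scale, then dequantize back to float."""
    qmax = 2 ** (bits - 1) - 1                     # 7 for signed 4-bit
    scale = np.abs(w).max() / qmax                 # one scale for the whole tensor
    q = np.clip(np.round(w / scale), -qmax - 1, qmax)
    return q * scale                               # "fake": still float, but few levels

w = np.random.randn(256, 256).astype(np.float32)
w_q = fake_quantize_per_tensor(w)
assert len(np.unique(w_q)) <= 16                   # at most 2**4 distinct values
```

This contrasts with GPTQ's grouped scheme (e.g. one scale per group of 128 weights), which uses many scales per tensor.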

Note: This model is fake-quantized, with the scaling vectors fused into the weights for ease of evaluation (so the stored weights are float16 and contain more than 16 unique values per tensor). See the colab above for how to convert them back to weights with at most 16 unique values.
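To illustrate what "scaling vectors fused into the weights" means, here is a hypothetical sketch: a 4-bit weight matrix `Q` is multiplied element-wise by a per-column scaling vector `s`, which hides the 16-value grid; dividing `s` back out recovers it. The shapes, column-wise fusion, and scale values here are assumptions for illustration only:

```python
import numpy as np

# Hypothetical example: a 4-bit weight Q (16 unique values) with a
# per-column scaling vector s fused in. The fused product is a float
# tensor with many more than 16 unique values.
levels = (np.arange(16) - 8).astype(np.float64)  # signed 4-bit grid: -8..7
Q = np.stack([levels] * 4, axis=1)               # (16, 4), exactly 16 unique values
s = np.array([0.5, 1.25, 2.0, 3.5])              # assumed per-column scales
W_fused = Q * s                                  # what gets stored after fusion

assert len(np.unique(W_fused)) > 16              # fusion hides the 4-bit grid
Q_recovered = W_fused / s                        # divide the scales back out
assert len(np.unique(Q_recovered)) == 16         # grid restored
```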
