Qwen2.5-0.5B quantized to 4 bits per-tensor. Performance is comparable to GPTQ (128g desc).

| Task | Version | Filter | n-shot | Metric | f16 | this | gptq |
|---|---|---|---|---|---|---|---|
| arc_challenge | 1 | none | 0 | acc ↑ | 0.2918 | 0.2705 | 0.2730 |
| arc_easy | 1 | none | 0 | acc ↑ | 0.6465 | 0.6393 | 0.6031 |
| boolq | 2 | none | 0 | acc ↑ | 0.6208 | 0.5862 | 0.6232 |
| hellaswag | 1 | none | 0 | acc ↑ | 0.4061 | 0.3888 | 0.3969 |
| piqa | 1 | none | 0 | acc ↑ | 0.7051 | 0.6861 | 0.6801 |
| winogrande | 1 | none | 0 | acc ↑ | 0.5635 | 0.5762 | 0.5659 |
| **average** | | | | acc ↑ | 0.5390 | 0.5245 | 0.5237 |

To reproduce the evals, see this colab.
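Per-tensor 4-bit quantization means a single scale is shared by an entire weight tensor, so every weight snaps to one of at most 16 values. A minimal sketch of the idea (not the exact recipe used to produce this checkpoint):

```python
import numpy as np

def fake_quantize_per_tensor(w: np.ndarray, bits: int = 4) -> np.ndarray:
    """Symmetric per-tensor fake quantization: snap every weight to a
    signed 4-bit grid using a single scale, then dequantize back to float."""
    qmax = 2 ** (bits - 1) - 1                     # 7 for signed 4-bit
    scale = np.abs(w).max() / qmax                 # one scale for the whole tensor
    q = np.clip(np.round(w / scale), -qmax - 1, qmax)
    return q * scale                               # "fake": still float, but few levels

w = np.random.randn(256, 256).astype(np.float32)
w_q = fake_quantize_per_tensor(w)
assert len(np.unique(w_q)) <= 16                   # at most 2**4 distinct values
```

This contrasts with GPTQ's grouped scheme (e.g. one scale per group of 128 weights), which uses many scales per tensor.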

Note: This model is fake-quantized, with the scaling vectors fused into the weights for ease of evaluation (so the stored weights are float16 and contain more than 16 unique values per tensor). See the colab above for how to convert them back to weights with at most 16 unique values.
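To illustrate what "scaling vectors fused into the weights" means, here is a hypothetical sketch: a 4-bit weight matrix `Q` is multiplied element-wise by a per-column scaling vector `s`, which hides the 16-value grid; dividing `s` back out recovers it. The shapes, column-wise fusion, and scale values here are assumptions for illustration only:

```python
import numpy as np

# Hypothetical example: a 4-bit weight Q (16 unique values) with a
# per-column scaling vector s fused in. The fused product is a float
# tensor with many more than 16 unique values.
levels = (np.arange(16) - 8).astype(np.float64)  # signed 4-bit grid: -8..7
Q = np.stack([levels] * 4, axis=1)               # (16, 4), exactly 16 unique values
s = np.array([0.5, 1.25, 2.0, 3.5])              # assumed per-column scales
W_fused = Q * s                                  # what gets stored after fusion

assert len(np.unique(W_fused)) > 16              # fusion hides the 4-bit grid
Q_recovered = W_fused / s                        # divide the scales back out
assert len(np.unique(Q_recovered)) == 16         # grid restored
```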
