---
base_model:
  - Qwen/Qwen2.5-0.5B
license: apache-2.0
---

smpanaro/Qwen2.5-0.5B-4bit-PerTensor

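Qwen2.5-0.5B quantized to 4 bits per tensor. Its average zero-shot accuracy is comparable to GPTQ with group size 128 and descending activation order ("128g desc"): 0.5245 vs. 0.5237, against an f16 baseline of 0.5390.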

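| Tasks         | Version | Filter | n-shot | Metric |   |    f16 | this model |   GPTQ |
|---------------|--------:|--------|-------:|--------|---|-------:|-----------:|-------:|
| arc_challenge |       1 | none   |      0 | acc    | ↑ | 0.2918 |     0.2705 | 0.2730 |
| arc_easy      |       1 | none   |      0 | acc    | ↑ | 0.6465 |     0.6393 | 0.6031 |
| boolq         |       2 | none   |      0 | acc    | ↑ | 0.6208 |     0.5862 | 0.6232 |
| hellaswag     |       1 | none   |      0 | acc    | ↑ | 0.4061 |     0.3888 | 0.3969 |
| piqa          |       1 | none   |      0 | acc    | ↑ | 0.7051 |     0.6861 | 0.6801 |
| winogrande    |       1 | none   |      0 | acc    | ↑ | 0.5635 |     0.5762 | 0.5659 |
| average       |         |        |        | acc    | ↑ | 0.5390 |     0.5245 | 0.5237 |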
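
The `average` row matches the unweighted mean of the six task accuracies. The short check below, using only the numbers from the table, reproduces it to four decimal places.

```python
# Zero-shot accuracies copied from the results table: (f16, this model, GPTQ).
scores = {
    "arc_challenge": (0.2918, 0.2705, 0.2730),
    "arc_easy": (0.6465, 0.6393, 0.6031),
    "boolq": (0.6208, 0.5862, 0.6232),
    "hellaswag": (0.4061, 0.3888, 0.3969),
    "piqa": (0.7051, 0.6861, 0.6801),
    "winogrande": (0.5635, 0.5762, 0.5659),
}


def column_means(table):
    """Unweighted mean of each column across all tasks, rounded to 4 places."""
    n = len(table)
    return tuple(round(sum(col) / n, 4) for col in zip(*table.values()))


f16, this_model, gptq = column_means(scores)
print(f16, this_model, gptq)  # 0.539 0.5245 0.5237
```

Because every task carries equal weight, a single task (for example boolq, where this model trails GPTQ by about 0.037) can be offset by gains elsewhere (arc_easy, piqa, winogrande).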

To reproduce the evaluations, see this Colab.

Note: This model is fake-quantized, and scaling vectors have been fused into the weights for ease of evaluation. As a result, the weights are stored as float16 and have more than 16 unique values. See the Colab above for how to convert them to weights with 16 unique values.
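
The note above can be illustrated with a small pure-Python sketch. This is not the author's conversion code (the Colab is the reference for that). It assumes asymmetric per-tensor quantization (one scale and one offset for the whole matrix) and a hypothetical fusion layout in which the scaling vector multiplies each input column; the helper names are illustrative only. It shows that the fake-quantized matrix holds at most 16 distinct values, that fusing the scaling vector yields many more, and that dividing the vector back out recovers the original 4-bit codes.

```python
import random

BITS = 4
LEVELS = 2 ** BITS - 1  # 15 steps between min and max -> 16 codes (0..15)


def quantize_per_tensor(w):
    """Asymmetric per-tensor quantization: one scale and one offset for the whole matrix."""
    flat = [x for row in w for x in row]
    lo, hi = min(flat), max(flat)
    scale = (hi - lo) / LEVELS
    codes = [[min(LEVELS, max(0, round((x - lo) / scale))) for x in row] for row in w]
    return codes, scale, lo


def dequantize(codes, scale, lo):
    """Fake quantization: map integer codes back to float values."""
    return [[c * scale + lo for c in row] for row in codes]


def scale_columns(w, v, inverse=False):
    """Multiply (or divide) column j by v[j]; this fusion layout is a hypothetical assumption."""
    return [[x / v[j] if inverse else x * v[j] for j, x in enumerate(row)] for row in w]


def n_unique(w):
    return len({x for row in w for x in row})


random.seed(0)
w = [[random.gauss(0.0, 1.0) for _ in range(32)] for _ in range(8)]
v = [random.uniform(0.5, 2.0) for _ in range(32)]  # hypothetical scaling vector

# Quantize the pre-scaled weights, then fuse the scaling vector back in.
codes, scale, lo = quantize_per_tensor(scale_columns(w, v, inverse=True))
w_fake = dequantize(codes, scale, lo)  # floats, at most 16 distinct values
w_fused = scale_columns(w_fake, v)     # floats, many more than 16 distinct values

# Conversion: divide the vector back out and re-derive each weight's 4-bit code.
codes_back = [[min(LEVELS, max(0, round((x - lo) / scale))) for x in row]
              for row in scale_columns(w_fused, v, inverse=True)]

print(n_unique(w_fake), n_unique(w_fused), codes_back == codes)
```

Re-deriving integer codes, rather than comparing floats directly, keeps the round trip robust to the tiny rounding error introduced by multiplying and then dividing by `v`. Applying this to the real checkpoint would require knowing its scaling vectors and quantization parameters, which this sketch simply assumes.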