---
base_model:
  - Qwen/Qwen2.5-0.5B
license: apache-2.0
---

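Qwen2.5-0.5B quantized to 4 bits per tensor. Performance is comparable to GPTQ (128g desc, i.e. group size 128 with `desc_act`): average accuracy across the six tasks below is 0.5245 for this model vs. 0.5237 for GPTQ.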
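This card does not describe the quantizer itself, so the following is only an illustration of what "4 bits per tensor" means, not the method used to produce this model. It assumes plain asymmetric min-max round-to-nearest: every weight in a tensor shares one scale and zero point and is snapped to one of 16 levels, then mapped back to float ("fake" quantization). By contrast, the GPTQ baseline's "128g" uses a separate scale for each group of 128 weights. The function name `fake_quant_per_tensor` is hypothetical.

```python
import random


def fake_quant_per_tensor(weights, bits=4):
    """Round a flat list of floats to 2**bits levels that share a single scale and
    zero point (asymmetric min-max), then map them back to floats."""
    levels = 2 ** bits - 1  # 15 steps between min and max -> 16 representable values
    lo, hi = min(weights), max(weights)
    if hi == lo:
        return list(weights)  # constant tensor: nothing to quantize
    scale = (hi - lo) / levels
    zero_point = round(-lo / scale)
    out = []
    for w in weights:
        q = max(0, min(levels, round(w / scale) + zero_point))  # integer code in [0, 15]
        out.append((q - zero_point) * scale)  # dequantize: this is the "fake" part
    return out


random.seed(0)
w = [random.gauss(0.0, 0.02) for _ in range(4096)]  # stand-in for one weight tensor
wq = fake_quant_per_tensor(w)
print(len(set(wq)))                            # at most 16 distinct values
print(max(abs(a - b) for a, b in zip(w, wq)))  # worst-case rounding error
```

Because one scale covers the whole tensor, the rounding step is set by the tensor's full min-max range; per-group schemes such as 128g shrink that step at the cost of storing more scales.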

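| Tasks | Version | Filter | n-shot | Metric | f16 | this model | GPTQ |
|---|---|---|---|---|---|---|---|
| arc_challenge | 1 | none | 0 | acc ↑ | 0.2918 | 0.2705 | 0.2730 |
| arc_easy | 1 | none | 0 | acc ↑ | 0.6465 | 0.6393 | 0.6031 |
| boolq | 2 | none | 0 | acc ↑ | 0.6208 | 0.5862 | 0.6232 |
| hellaswag | 1 | none | 0 | acc ↑ | 0.4061 | 0.3888 | 0.3969 |
| piqa | 1 | none | 0 | acc ↑ | 0.7051 | 0.6861 | 0.6801 |
| winogrande | 1 | none | 0 | acc ↑ | 0.5635 | 0.5762 | 0.5659 |
| average | | | | acc ↑ | 0.5390 | 0.5245 | 0.5237 |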
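The average row is consistent with an unweighted mean of the six per-task accuracies. A quick check, with the scores copied from the table:

```python
# Per-task zero-shot accuracies from the table: (f16, this model, GPTQ).
scores = {
    "arc_challenge": (0.2918, 0.2705, 0.2730),
    "arc_easy": (0.6465, 0.6393, 0.6031),
    "boolq": (0.6208, 0.5862, 0.6232),
    "hellaswag": (0.4061, 0.3888, 0.3969),
    "piqa": (0.7051, 0.6861, 0.6801),
    "winogrande": (0.5635, 0.5762, 0.5659),
}

columns = ("f16", "this", "gptq")
# Unweighted mean over tasks for each column.
averages = {
    name: sum(row[i] for row in scores.values()) / len(scores)
    for i, name in enumerate(columns)
}
for name, avg in averages.items():
    print(f"{name}: {avg:.4f}")  # f16: 0.5390, this: 0.5245, gptq: 0.5237
```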

To reproduce the evals, see this Colab.

Note: this model is fake-quantized, and its scaling vectors are fused into the weights for ease of evaluation. As a result, the weights are stored as float16 and have more than 16 unique values. See the Colab above for how to convert them to weights with 16 unique values.
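To make the note about fused scaling vectors concrete, here is a small sketch. It assumes, hypothetically, that a scaling vector `s` multiplies the columns (input channels) of a 4-bit per-tensor fake-quantized matrix; the actual layout and conversion steps are in the Colab, and the helpers `to_codes`, `fuse` and `unfuse` are illustrative names only. Fusion makes each stored value depend on its column, so the unique-value count grows past 16; dividing `s` back out and snapping to the per-tensor grid recovers the 4-bit codes.

```python
import random

LEVELS = 2 ** 4 - 1  # 4-bit integer codes 0..15


def to_codes(matrix, scale, zero_point):
    """Round each weight to a 4-bit code using one per-tensor scale and zero point."""
    return [
        [max(0, min(LEVELS, round(w / scale) + zero_point)) for w in row]
        for row in matrix
    ]


def fuse(codes, scale, zero_point, s):
    """Dequantize the codes and multiply column j by s[j] (hypothetical fused layout)."""
    return [[(q - zero_point) * scale * s[j] for j, q in enumerate(row)] for row in codes]


def unfuse(fused, scale, zero_point, s):
    """Divide the scaling vector back out and snap each value to its nearest 4-bit code."""
    return [
        [max(0, min(LEVELS, round(w / s[j] / scale) + zero_point)) for j, w in enumerate(row)]
        for row in fused
    ]


def count_unique(matrix):
    return len({x for row in matrix for x in row})


random.seed(0)
rows, cols = 64, 8
weights = [[random.gauss(0.0, 0.02) for _ in range(cols)] for _ in range(rows)]
flat = [w for row in weights for w in row]
scale = (max(flat) - min(flat)) / LEVELS
zero_point = round(-min(flat) / scale)
s = [random.uniform(0.5, 2.0) for _ in range(cols)]  # hypothetical per-column scales

codes = to_codes(weights, scale, zero_point)
fused = fuse(codes, scale, zero_point, s)  # stored weights after fusion (float here)
recovered = unfuse(fused, scale, zero_point, s)

print(count_unique(fused))      # more than 16 distinct values
print(count_unique(recovered))  # at most 16
print(recovered == codes)       # True
```

Snapping to the nearest code after division, rather than comparing floats directly, tolerates the small rounding error that `(x * s) / s` can introduce.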