ubergarm committed
Commit 7621bae · 1 Parent(s): 4af291b

Add smol-IQ5_KS

Files changed (2):
  1. README.md +57 -0
  2. images/perplexity.png +2 -2
README.md CHANGED
@@ -35,6 +35,8 @@ This first is just a "pure" test quant for baseline perplexity comparison:
 * `Q8_0` 664.295 GiB (8.504 BPW)
 - Final estimate: PPL = 3.3929 +/- 0.01985
 
+*NOTE*: `smol` is convention indicating same size quantization for `ffn_(up|gate)_exps` and `ffn_down_exps` tensors.
+
 ## IQ5_K 464.062 GiB (5.941 BPW)
 Final estimate: PPL = 3.4000 +/- 0.01992
 
@@ -96,6 +98,61 @@ numactl -N 0 -m 0 \
 
 </details>
 
+## smol-IQ5_KS 417.107 GiB (5.339 BPW)
+Final estimate: PPL = 3.4059 +/- 0.01996
+
+<details>
+
+<summary>👈 Secret Recipe</summary>
+
+```bash
+#!/usr/bin/env bash
+
+custom="
+## Attention [0-60] (GPU)
+blk\..*\.attn_k_b\.weight=q8_0
+blk\..*\.attn_v_b\.weight=q8_0
+
+# Balance of attn tensors
+blk\..*\.attn_kv_a_mqa\.weight=q8_0
+blk\..*\.attn_q_a\.weight=q8_0
+blk\..*\.attn_q_b\.weight=q8_0
+blk\..*\.attn_output\.weight=q8_0
+
+## First Three Dense Layers [0-2] (GPU)
+blk\..*\.ffn_down\.weight=q8_0
+blk\..*\.ffn_(gate|up)\.weight=q8_0
+
+## Shared Expert (1-60) (GPU)
+blk\..*\.ffn_down_shexp\.weight=q8_0
+blk\..*\.ffn_(gate|up)_shexp\.weight=q8_0
+
+## Routed Experts (1-60) (CPU)
+blk\..*\.ffn_down_exps\.weight=iq5_ks
+blk\..*\.ffn_(gate|up)_exps\.weight=iq5_ks
+
+## Token embedding and output tensors (GPU)
+token_embd\.weight=iq6_k
+output\.weight=iq6_k
+"
+
+custom=$(
+  echo "$custom" | grep -v '^#' | \
+  sed -Ez 's:\n+:,:g;s:,$::;s:^,::'
+)
+
+numactl -N 0 -m 0 \
+./build/bin/llama-quantize \
+    --custom-q "$custom" \
+    --imatrix /mnt/data/models/ubergarm/DeepSeek-V3.1-Terminus-GGUF/imatrix-DeepSeek-V3.1-Terminus-Q8_0.dat \
+    /mnt/data/models/ubergarm/DeepSeek-V3.1-Terminus-GGUF/DeepSeek-V3.1-Terminus-256x20B-safetensors-BF16-00001-of-00030.gguf \
+    /mnt/data/models/ubergarm/DeepSeek-V3.1-Terminus-GGUF/DeepSeek-V3.1-Terminus-smol-IQ5_KS.gguf \
+    IQ5_KS \
+    192
+```
+
+</details>
+
 ## IQ4_K 382.485 GiB (4.896 BPW)
 Final estimate: PPL = 3.4198 +/- 0.02009
 
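A note on the recipe script above: the `grep`/`sed -Ez` pipeline is what collapses the readable multi-line rule list into the single comma-separated string that `llama-quantize --custom-q` takes as one argument. A minimal standalone sketch of just that transformation, using a shortened two-rule list (the two rules are taken from the recipe; the shortened list itself is illustrative):

```shell
#!/usr/bin/env bash
# Shortened hypothetical rule list: one comment line, two rules, blank lines.
custom="
## comment lines are dropped
blk\..*\.attn_k_b\.weight=q8_0

blk\..*\.ffn_down_exps\.weight=iq5_ks
"

# grep -v '^#' drops the comment lines; sed -Ez treats the whole input as one
# record, squeezes newline runs into commas, then trims leading/trailing commas.
custom=$(
  echo "$custom" | grep -v '^#' | \
  sed -Ez 's:\n+:,:g;s:,$::;s:^,::'
)

echo "$custom"
# -> blk\..*\.attn_k_b\.weight=q8_0,blk\..*\.ffn_down_exps\.weight=iq5_ks
```

Note that `sed -z` (NUL-delimited records) is a GNU sed extension, so this works on the Linux builds the recipe targets but not with BSD/macOS sed.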
images/perplexity.png CHANGED

Git LFS Details (before)
  • SHA256: 5096fe5c326b907fc9357b77826c82f09cb940dcc878bd97f5fc5fcda49e54e6
  • Pointer size: 131 Bytes
  • Size of remote file: 163 kB

Git LFS Details (after)
  • SHA256: 5311e77affc5a02d0dfaded8a472c517b5f0b62f2540340b5166e36ef9f32fdb
  • Pointer size: 131 Bytes
  • Size of remote file: 172 kB
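As a cross-check on the GiB and BPW figures quoted in the README headings above: file size and bits-per-weight are related through the total weight count, and every quant in the list works out to roughly the same ~671B parameters (the parameter count is inferred from this arithmetic, not stated in the diff):

```python
GIB = 2**30  # bytes per GiB

def params_from_size(size_gib: float, bpw: float) -> float:
    """Infer total weight count from file size (GiB) and bits per weight."""
    return size_gib * GIB * 8 / bpw

# Figures from the README headings in the diff above
for name, gib, bpw in [
    ("Q8_0",        664.295, 8.504),
    ("IQ5_K",       464.062, 5.941),
    ("smol-IQ5_KS", 417.107, 5.339),
    ("IQ4_K",       382.485, 4.896),
]:
    print(f"{name:12s} ~{params_from_size(gib, bpw) / 1e9:.0f}B params")
```

All four rows agreeing on the same parameter count is a quick sanity check that the quoted sizes and BPW values are mutually consistent.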