---
license: mit
train: false
inference: true
pipeline_tag: text-generation
base_model:
- deepseek-ai/DeepSeek-R1-Distill-Qwen-7B
---

# Llama.cpp imatrix quantizations of [mobiuslabsgmbh/DeepSeek-R1-ReDistill-Qwen-7B-v1.1](https://huggingface.co/mobiuslabsgmbh/DeepSeek-R1-ReDistill-Qwen-7B-v1.1)

Quantized using llama.cpp commit [3ad5451](https://github.com/ggerganov/llama.cpp/commit/3ad5451). All quants were made using the imatrix option and Bartowski's [calibration file](https://gist.github.com/bartowski1182/eb213dccb3571f863da82e99418f81e8).
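For reference, the general imatrix workflow in llama.cpp looks like the sketch below; the file names are illustrative, not the exact commands used for this repo:

```
# Compute an importance matrix from the calibration file (names are illustrative)
./llama-imatrix -m DeepSeek-R1-ReDistill-Qwen-7B-v1.1-F16.gguf \
    -f calibration_file.txt -o imatrix.dat

# Quantize, weighting layers by the importance matrix
./llama-quantize --imatrix imatrix.dat \
    DeepSeek-R1-ReDistill-Qwen-7B-v1.1-F16.gguf \
    DeepSeek-R1-ReDistill-Qwen-7B-v1.1-Q4_K_M.gguf Q4_K_M
```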
# Perplexity table (the lower the better)

| Quant | Size (MB) | PPL | Size (%) | Accuracy (%) | PPL error rate |
| ----- | --------- | --- | -------- | ------------ | -------------- |
| [IQ1_S](https://huggingface.co/ThomasBaruzier/DeepSeek-R1-ReDistill-Qwen-7B-v1.1-GGUF/blob/main/DeepSeek-R1-ReDistill-Qwen-7B-v1.1-IQ1_S.gguf) | 1815 | 29.3739 | 12.49 | 49.92 | 0.53 |
| [IQ1_M](https://huggingface.co/ThomasBaruzier/DeepSeek-R1-ReDistill-Qwen-7B-v1.1-GGUF/blob/main/DeepSeek-R1-ReDistill-Qwen-7B-v1.1-IQ1_M.gguf) | 1947 | 23.4611 | 13.40 | 62.50 | 0.42 |
| [IQ2_XXS](https://huggingface.co/ThomasBaruzier/DeepSeek-R1-ReDistill-Qwen-7B-v1.1-GGUF/blob/main/DeepSeek-R1-ReDistill-Qwen-7B-v1.1-IQ2_XXS.gguf) | 2167 | 23.8257 | 14.91 | 61.54 | 0.46 |
| [IQ2_XS](https://huggingface.co/ThomasBaruzier/DeepSeek-R1-ReDistill-Qwen-7B-v1.1-GGUF/blob/main/DeepSeek-R1-ReDistill-Qwen-7B-v1.1-IQ2_XS.gguf) | 2354 | 20.5413 | 16.20 | 71.38 | 0.39 |
| [IQ2_S](https://huggingface.co/ThomasBaruzier/DeepSeek-R1-ReDistill-Qwen-7B-v1.1-GGUF/blob/main/DeepSeek-R1-ReDistill-Qwen-7B-v1.1-IQ2_S.gguf) | 2475 | 19.3763 | 17.03 | 75.67 | 0.36 |
| [IQ2_M](https://huggingface.co/ThomasBaruzier/DeepSeek-R1-ReDistill-Qwen-7B-v1.1-GGUF/blob/main/DeepSeek-R1-ReDistill-Qwen-7B-v1.1-IQ2_M.gguf) | 2651 | 22.3007 | 18.24 | 65.75 | 0.44 |
| [Q2_K_S](https://huggingface.co/ThomasBaruzier/DeepSeek-R1-ReDistill-Qwen-7B-v1.1-GGUF/blob/main/DeepSeek-R1-ReDistill-Qwen-7B-v1.1-Q2_K_S.gguf) | 2702 | 17.5446 | 18.59 | 83.57 | 0.31 |
| [Q2_K](https://huggingface.co/ThomasBaruzier/DeepSeek-R1-ReDistill-Qwen-7B-v1.1-GGUF/blob/main/DeepSeek-R1-ReDistill-Qwen-7B-v1.1-Q2_K.gguf) | 2876 | 16.9426 | 19.79 | 86.54 | 0.29 |
| [IQ3_XXS](https://huggingface.co/ThomasBaruzier/DeepSeek-R1-ReDistill-Qwen-7B-v1.1-GGUF/blob/main/DeepSeek-R1-ReDistill-Qwen-7B-v1.1-IQ3_XXS.gguf) | 2970 | 16.2668 | 20.44 | 90.14 | 0.29 |
| [IQ3_XS](https://huggingface.co/ThomasBaruzier/DeepSeek-R1-ReDistill-Qwen-7B-v1.1-GGUF/blob/main/DeepSeek-R1-ReDistill-Qwen-7B-v1.1-IQ3_XS.gguf) | 3191 | 16.1443 | 21.96 | 90.82 | 0.29 |
| [Q3_K_S](https://huggingface.co/ThomasBaruzier/DeepSeek-R1-ReDistill-Qwen-7B-v1.1-GGUF/blob/main/DeepSeek-R1-ReDistill-Qwen-7B-v1.1-Q3_K_S.gguf) | 3330 | 17.0364 | 22.92 | 86.07 | 0.29 |
| [IQ3_S](https://huggingface.co/ThomasBaruzier/DeepSeek-R1-ReDistill-Qwen-7B-v1.1-GGUF/blob/main/DeepSeek-R1-ReDistill-Qwen-7B-v1.1-IQ3_S.gguf) | 3337 | 16.1048 | 22.96 | 91.04 | 0.29 |
| [IQ3_M](https://huggingface.co/ThomasBaruzier/DeepSeek-R1-ReDistill-Qwen-7B-v1.1-GGUF/blob/main/DeepSeek-R1-ReDistill-Qwen-7B-v1.1-IQ3_M.gguf) | 3408 | 15.8128 | 23.45 | 92.72 | 0.28 |
| [Q3_K_M](https://huggingface.co/ThomasBaruzier/DeepSeek-R1-ReDistill-Qwen-7B-v1.1-GGUF/blob/main/DeepSeek-R1-ReDistill-Qwen-7B-v1.1-Q3_K_M.gguf) | 3631 | 15.2580 | 24.99 | 96.10 | 0.26 |
| [Q3_K_L](https://huggingface.co/ThomasBaruzier/DeepSeek-R1-ReDistill-Qwen-7B-v1.1-GGUF/blob/main/DeepSeek-R1-ReDistill-Qwen-7B-v1.1-Q3_K_L.gguf) | 3899 | 15.1997 | 26.83 | 96.46 | 0.26 |
| [IQ4_XS](https://huggingface.co/ThomasBaruzier/DeepSeek-R1-ReDistill-Qwen-7B-v1.1-GGUF/blob/main/DeepSeek-R1-ReDistill-Qwen-7B-v1.1-IQ4_XS.gguf) | 4023 | 14.9385 | 27.68 | 98.15 | 0.25 |
| [IQ4_NL](https://huggingface.co/ThomasBaruzier/DeepSeek-R1-ReDistill-Qwen-7B-v1.1-GGUF/blob/main/DeepSeek-R1-ReDistill-Qwen-7B-v1.1-IQ4_NL.gguf) | 4232 | 14.9257 | 29.12 | 98.24 | 0.25 |
| [Q4_0](https://huggingface.co/ThomasBaruzier/DeepSeek-R1-ReDistill-Qwen-7B-v1.1-GGUF/blob/main/DeepSeek-R1-ReDistill-Qwen-7B-v1.1-Q4_0.gguf) | 4238 | 15.2621 | 29.17 | 96.07 | 0.26 |
| [Q4_K_S](https://huggingface.co/ThomasBaruzier/DeepSeek-R1-ReDistill-Qwen-7B-v1.1-GGUF/blob/main/DeepSeek-R1-ReDistill-Qwen-7B-v1.1-Q4_K_S.gguf) | 4251 | 14.8852 | 29.25 | 98.50 | 0.26 |
| [Q4_K_M](https://huggingface.co/ThomasBaruzier/DeepSeek-R1-ReDistill-Qwen-7B-v1.1-GGUF/blob/main/DeepSeek-R1-ReDistill-Qwen-7B-v1.1-Q4_K_M.gguf) | 4466 | 14.8666 | 30.73 | 98.63 | 0.26 |
| [Q4_1](https://huggingface.co/ThomasBaruzier/DeepSeek-R1-ReDistill-Qwen-7B-v1.1-GGUF/blob/main/DeepSeek-R1-ReDistill-Qwen-7B-v1.1-Q4_1.gguf) | 4647 | 14.8789 | 31.98 | 98.54 | 0.26 |
| [Q5_K_S](https://huggingface.co/ThomasBaruzier/DeepSeek-R1-ReDistill-Qwen-7B-v1.1-GGUF/blob/main/DeepSeek-R1-ReDistill-Qwen-7B-v1.1-Q5_K_S.gguf) | 5068 | 14.7449 | 34.88 | 99.44 | 0.25 |
| [Q5_0](https://huggingface.co/ThomasBaruzier/DeepSeek-R1-ReDistill-Qwen-7B-v1.1-GGUF/blob/main/DeepSeek-R1-ReDistill-Qwen-7B-v1.1-Q5_0.gguf) | 5081 | 14.7425 | 34.97 | 99.46 | 0.25 |
| [Q5_K_M](https://huggingface.co/ThomasBaruzier/DeepSeek-R1-ReDistill-Qwen-7B-v1.1-GGUF/blob/main/DeepSeek-R1-ReDistill-Qwen-7B-v1.1-Q5_K_M.gguf) | 5192 | 14.7327 | 35.73 | 99.52 | 0.25 |
| [Q5_1](https://huggingface.co/ThomasBaruzier/DeepSeek-R1-ReDistill-Qwen-7B-v1.1-GGUF/blob/main/DeepSeek-R1-ReDistill-Qwen-7B-v1.1-Q5_1.gguf) | 5490 | 14.7293 | 37.78 | 99.55 | 0.25 |
| [Q6_K](https://huggingface.co/ThomasBaruzier/DeepSeek-R1-ReDistill-Qwen-7B-v1.1-GGUF/blob/main/DeepSeek-R1-ReDistill-Qwen-7B-v1.1-Q6_K.gguf) | 5964 | 14.6907 | 41.04 | 99.81 | 0.25 |
| [Q8_0](https://huggingface.co/ThomasBaruzier/DeepSeek-R1-ReDistill-Qwen-7B-v1.1-GGUF/blob/main/DeepSeek-R1-ReDistill-Qwen-7B-v1.1-Q8_0.gguf) | 7723 | 14.6686 | 53.15 | 99.96 | 0.25 |
| [F16](https://huggingface.co/ThomasBaruzier/DeepSeek-R1-ReDistill-Qwen-7B-v1.1-GGUF/blob/main/DeepSeek-R1-ReDistill-Qwen-7B-v1.1-F16.gguf) | 14531 | 14.6625 | 100 | 100 | 0.25 |
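The Accuracy column appears to be the F16 baseline perplexity divided by each quant's perplexity, expressed as a percentage. A minimal sketch of that relationship, using values copied from the table above (an illustration, not the exact evaluation script):

```Python
# Accuracy (%) = 100 * PPL(F16) / PPL(quant); lower quant PPL -> higher accuracy.
ppl_f16 = 14.6625     # F16 baseline from the table
ppl_q4_k_m = 14.8666  # Q4_K_M from the table

accuracy = 100 * ppl_f16 / ppl_q4_k_m
print(f"{accuracy:.2f}")  # ~98.63, matching the Q4_K_M row
```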
This is a version of the DeepSeek-R1-Distill-Qwen-7B model re-distilled for better performance.

## Performance

| Models | DeepSeek-R1-Distill-Qwen-7B | DeepSeek-R1-ReDistill-Qwen-7B-v1.1 |
|:-------------------:|:--------:|:----------------:|
| ARC (25-shot) | 55.03 | 52.3 |
| HellaSwag (10-shot) | 61.9 | 62.36 |
| MMLU (5-shot) | 56.75 | 59.53 |
| TruthfulQA-MC2 | 45.76 | 47.7 |
| Winogrande (5-shot) | 60.38 | 61.8 |
| GSM8K (5-shot) | 78.85 | 83.4 |
| Average | 59.78 | 61.18 |

| Models | DeepSeek-R1-Distill-Qwen-7B | DeepSeek-R1-ReDistill-Qwen-7B-v1.1 |
|:-------------------:|:--------:|:----------------:|
| GPQA (0-shot) | 30.9 | 34.99 |
| MMLU PRO (5-shot) | 28.83 | 31.02 |
| MUSR (0-shot) | 38.85 | 44.42 |
| BBH (3-shot) | 43.54 | 51.53 |
| IfEval (0-shot) - strict | 42.33 | 35.49 |
| IfEval (0-shot) - loose | 30.31 | 38.49 |

## Usage

```Python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

compute_dtype = torch.bfloat16
device = 'cuda'
model_id = "mobiuslabsgmbh/DeepSeek-R1-ReDistill-Qwen-7B-v1.1"

model = AutoModelForCausalLM.from_pretrained(model_id, torch_dtype=compute_dtype, attn_implementation="sdpa", device_map=device)
tokenizer = AutoTokenizer.from_pretrained(model_id)

prompt = "What is 1.5+102.2?"
chat = tokenizer.apply_chat_template([{"role":"user", "content":prompt}], tokenize=True, add_generation_prompt=True, return_tensors="pt")
outputs = model.generate(chat.to(device), max_new_tokens=1024, do_sample=True)
print(tokenizer.decode(outputs[0]))
```

Output:

```
<|begin▁of▁sentence|><|User|>What is 1.5+102.2?<|Assistant|> First, I need to add the whole number parts of the two numbers. The whole numbers are 1 and 102, which add up to 103.

Next, I add the decimal parts of the two numbers. The decimal parts are 0.5 and 0.2, which add up to 0.7.

Finally, I combine the whole number and decimal parts to get the total sum. Adding 103 and 0.7 gives me 103.7.

To add the numbers \(1.5\) and \(102.2\), follow these steps:

1. **Add the whole number parts:**
   \[
   1 + 102 = 103
   \]

2. **Add the decimal parts:**
   \[
   0.5 + 0.2 = 0.7
   \]

3. **Combine the results:**
   \[
   103 + 0.7 = 103.7
   \]

**Final Answer:**
\[
\boxed{103.7}
\]<|end▁of▁sentence|>
```

## HQQ

Run ~3.5x faster with HQQ. First, install the dependencies:

```
pip install hqq
```

```Python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer
from hqq.models.hf.base import AutoHQQHFModel
from hqq.core.quantize import *

# Params
device = 'cuda:0'
backend = "torchao_int4"
compute_dtype = torch.bfloat16 if backend == "torchao_int4" else torch.float16
model_id = "mobiuslabsgmbh/DeepSeek-R1-ReDistill-Qwen-7B-v1.1"

# Load
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, torch_dtype=compute_dtype, attn_implementation="sdpa")

# Quantize
quant_config = BaseQuantizeConfig(nbits=4, group_size=64, axis=1)
AutoHQQHFModel.quantize_model(model, quant_config=quant_config, compute_dtype=compute_dtype, device=device)

# Optimize
from hqq.utils.patching import prepare_for_inference
prepare_for_inference(model, backend=backend, verbose=False)

############################################################
# Generate (streaming)
from hqq.utils.generation_hf import HFGenerator
gen = HFGenerator(model, tokenizer, max_new_tokens=4096, do_sample=True, compile='partial').warmup()

prompt = "If A equals B, and C equals B - A, what would be the value of C?"
out = gen.generate(prompt, print_tokens=True)

############################################################
# Generate (simple)
# from hqq.utils.generation_hf import patch_model_for_compiled_runtime
# patch_model_for_compiled_runtime(model, tokenizer, warmup=True)

# prompt = "If A equals B, and C equals B - A, what would be the value of C?"
# chat = tokenizer.apply_chat_template([{"role":"user", "content":prompt}], tokenize=True, add_generation_prompt=True, return_tensors="pt")
# outputs = model.generate(chat.to(device), max_new_tokens=8192, do_sample=True)
# print(tokenizer.decode(outputs[0]))
```
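The HQQ path above quantizes the original transformers checkpoint on the fly. To run one of the pre-quantized GGUF files from this repo instead, llama-cpp-python is one option; a minimal sketch, assuming the Q4_K_M file from the table has been downloaded locally:

```Python
from llama_cpp import Llama

# Path is illustrative; any GGUF file from the table above works.
llm = Llama(model_path="DeepSeek-R1-ReDistill-Qwen-7B-v1.1-Q4_K_M.gguf", n_ctx=4096)

out = llm.create_chat_completion(
    messages=[{"role": "user", "content": "What is 1.5+102.2?"}],
    max_tokens=1024,
)
print(out["choices"][0]["message"]["content"])
```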