---
license: mit
train: false
inference: true
pipeline_tag: text-generation
base_model:
- deepseek-ai/DeepSeek-R1-Distill-Qwen-7B
---

# Llama.cpp imatrix quantizations of [mobiuslabsgmbh/DeepSeek-R1-ReDistill-Qwen-7B-v1.1](https://huggingface.co/mobiuslabsgmbh/DeepSeek-R1-ReDistill-Qwen-7B-v1.1)

Quantized using llama.cpp commit [3ad5451](https://github.com/ggerganov/llama.cpp/commit/3ad5451). All quants were made using the imatrix option and Bartowski's [calibration file](https://gist.github.com/bartowski1182/eb213dccb3571f863da82e99418f81e8).
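For reference, the general imatrix workflow in llama.cpp looks like the sketch below; the file names are illustrative, not the exact commands used for this repo:

```
# Compute an importance matrix from the calibration file (names are illustrative)
./llama-imatrix -m DeepSeek-R1-ReDistill-Qwen-7B-v1.1-F16.gguf \
    -f calibration_file.txt -o imatrix.dat

# Quantize, weighting layers by the importance matrix
./llama-quantize --imatrix imatrix.dat \
    DeepSeek-R1-ReDistill-Qwen-7B-v1.1-F16.gguf \
    DeepSeek-R1-ReDistill-Qwen-7B-v1.1-Q4_K_M.gguf Q4_K_M
```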
# Perplexity table (the lower the better)

| Quant | Size (MB) | PPL | Size (%) | Accuracy (%) | PPL error rate |
| ----- | --------- | --- | -------- | ------------ | -------------- |
| [IQ1_S](https://huggingface.co/ThomasBaruzier/DeepSeek-R1-ReDistill-Qwen-7B-v1.1-GGUF/blob/main/DeepSeek-R1-ReDistill-Qwen-7B-v1.1-IQ1_S.gguf) | 1815 | 29.3739 | 12.49 | 49.92 | 0.53 |
| [IQ1_M](https://huggingface.co/ThomasBaruzier/DeepSeek-R1-ReDistill-Qwen-7B-v1.1-GGUF/blob/main/DeepSeek-R1-ReDistill-Qwen-7B-v1.1-IQ1_M.gguf) | 1947 | 23.4611 | 13.40 | 62.50 | 0.42 |
| [IQ2_XXS](https://huggingface.co/ThomasBaruzier/DeepSeek-R1-ReDistill-Qwen-7B-v1.1-GGUF/blob/main/DeepSeek-R1-ReDistill-Qwen-7B-v1.1-IQ2_XXS.gguf) | 2167 | 23.8257 | 14.91 | 61.54 | 0.46 |
| [IQ2_XS](https://huggingface.co/ThomasBaruzier/DeepSeek-R1-ReDistill-Qwen-7B-v1.1-GGUF/blob/main/DeepSeek-R1-ReDistill-Qwen-7B-v1.1-IQ2_XS.gguf) | 2354 | 20.5413 | 16.20 | 71.38 | 0.39 |
| [IQ2_S](https://huggingface.co/ThomasBaruzier/DeepSeek-R1-ReDistill-Qwen-7B-v1.1-GGUF/blob/main/DeepSeek-R1-ReDistill-Qwen-7B-v1.1-IQ2_S.gguf) | 2475 | 19.3763 | 17.03 | 75.67 | 0.36 |
| [IQ2_M](https://huggingface.co/ThomasBaruzier/DeepSeek-R1-ReDistill-Qwen-7B-v1.1-GGUF/blob/main/DeepSeek-R1-ReDistill-Qwen-7B-v1.1-IQ2_M.gguf) | 2651 | 22.3007 | 18.24 | 65.75 | 0.44 |
| [Q2_K_S](https://huggingface.co/ThomasBaruzier/DeepSeek-R1-ReDistill-Qwen-7B-v1.1-GGUF/blob/main/DeepSeek-R1-ReDistill-Qwen-7B-v1.1-Q2_K_S.gguf) | 2702 | 17.5446 | 18.59 | 83.57 | 0.31 |
| [Q2_K](https://huggingface.co/ThomasBaruzier/DeepSeek-R1-ReDistill-Qwen-7B-v1.1-GGUF/blob/main/DeepSeek-R1-ReDistill-Qwen-7B-v1.1-Q2_K.gguf) | 2876 | 16.9426 | 19.79 | 86.54 | 0.29 |
| [IQ3_XXS](https://huggingface.co/ThomasBaruzier/DeepSeek-R1-ReDistill-Qwen-7B-v1.1-GGUF/blob/main/DeepSeek-R1-ReDistill-Qwen-7B-v1.1-IQ3_XXS.gguf) | 2970 | 16.2668 | 20.44 | 90.14 | 0.29 |
| [IQ3_XS](https://huggingface.co/ThomasBaruzier/DeepSeek-R1-ReDistill-Qwen-7B-v1.1-GGUF/blob/main/DeepSeek-R1-ReDistill-Qwen-7B-v1.1-IQ3_XS.gguf) | 3191 | 16.1443 | 21.96 | 90.82 | 0.29 |
| [Q3_K_S](https://huggingface.co/ThomasBaruzier/DeepSeek-R1-ReDistill-Qwen-7B-v1.1-GGUF/blob/main/DeepSeek-R1-ReDistill-Qwen-7B-v1.1-Q3_K_S.gguf) | 3330 | 17.0364 | 22.92 | 86.07 | 0.29 |
| [IQ3_S](https://huggingface.co/ThomasBaruzier/DeepSeek-R1-ReDistill-Qwen-7B-v1.1-GGUF/blob/main/DeepSeek-R1-ReDistill-Qwen-7B-v1.1-IQ3_S.gguf) | 3337 | 16.1048 | 22.96 | 91.04 | 0.29 |
| [IQ3_M](https://huggingface.co/ThomasBaruzier/DeepSeek-R1-ReDistill-Qwen-7B-v1.1-GGUF/blob/main/DeepSeek-R1-ReDistill-Qwen-7B-v1.1-IQ3_M.gguf) | 3408 | 15.8128 | 23.45 | 92.72 | 0.28 |
| [Q3_K_M](https://huggingface.co/ThomasBaruzier/DeepSeek-R1-ReDistill-Qwen-7B-v1.1-GGUF/blob/main/DeepSeek-R1-ReDistill-Qwen-7B-v1.1-Q3_K_M.gguf) | 3631 | 15.2580 | 24.99 | 96.10 | 0.26 |
| [Q3_K_L](https://huggingface.co/ThomasBaruzier/DeepSeek-R1-ReDistill-Qwen-7B-v1.1-GGUF/blob/main/DeepSeek-R1-ReDistill-Qwen-7B-v1.1-Q3_K_L.gguf) | 3899 | 15.1997 | 26.83 | 96.46 | 0.26 |
| [IQ4_XS](https://huggingface.co/ThomasBaruzier/DeepSeek-R1-ReDistill-Qwen-7B-v1.1-GGUF/blob/main/DeepSeek-R1-ReDistill-Qwen-7B-v1.1-IQ4_XS.gguf) | 4023 | 14.9385 | 27.68 | 98.15 | 0.25 |
| [IQ4_NL](https://huggingface.co/ThomasBaruzier/DeepSeek-R1-ReDistill-Qwen-7B-v1.1-GGUF/blob/main/DeepSeek-R1-ReDistill-Qwen-7B-v1.1-IQ4_NL.gguf) | 4232 | 14.9257 | 29.12 | 98.24 | 0.25 |
| [Q4_0](https://huggingface.co/ThomasBaruzier/DeepSeek-R1-ReDistill-Qwen-7B-v1.1-GGUF/blob/main/DeepSeek-R1-ReDistill-Qwen-7B-v1.1-Q4_0.gguf) | 4238 | 15.2621 | 29.17 | 96.07 | 0.26 |
| [Q4_K_S](https://huggingface.co/ThomasBaruzier/DeepSeek-R1-ReDistill-Qwen-7B-v1.1-GGUF/blob/main/DeepSeek-R1-ReDistill-Qwen-7B-v1.1-Q4_K_S.gguf) | 4251 | 14.8852 | 29.25 | 98.50 | 0.26 |
| [Q4_K_M](https://huggingface.co/ThomasBaruzier/DeepSeek-R1-ReDistill-Qwen-7B-v1.1-GGUF/blob/main/DeepSeek-R1-ReDistill-Qwen-7B-v1.1-Q4_K_M.gguf) | 4466 | 14.8666 | 30.73 | 98.63 | 0.26 |
| [Q4_1](https://huggingface.co/ThomasBaruzier/DeepSeek-R1-ReDistill-Qwen-7B-v1.1-GGUF/blob/main/DeepSeek-R1-ReDistill-Qwen-7B-v1.1-Q4_1.gguf) | 4647 | 14.8789 | 31.98 | 98.54 | 0.26 |
| [Q5_K_S](https://huggingface.co/ThomasBaruzier/DeepSeek-R1-ReDistill-Qwen-7B-v1.1-GGUF/blob/main/DeepSeek-R1-ReDistill-Qwen-7B-v1.1-Q5_K_S.gguf) | 5068 | 14.7449 | 34.88 | 99.44 | 0.25 |
| [Q5_0](https://huggingface.co/ThomasBaruzier/DeepSeek-R1-ReDistill-Qwen-7B-v1.1-GGUF/blob/main/DeepSeek-R1-ReDistill-Qwen-7B-v1.1-Q5_0.gguf) | 5081 | 14.7425 | 34.97 | 99.46 | 0.25 |
| [Q5_K_M](https://huggingface.co/ThomasBaruzier/DeepSeek-R1-ReDistill-Qwen-7B-v1.1-GGUF/blob/main/DeepSeek-R1-ReDistill-Qwen-7B-v1.1-Q5_K_M.gguf) | 5192 | 14.7327 | 35.73 | 99.52 | 0.25 |
| [Q5_1](https://huggingface.co/ThomasBaruzier/DeepSeek-R1-ReDistill-Qwen-7B-v1.1-GGUF/blob/main/DeepSeek-R1-ReDistill-Qwen-7B-v1.1-Q5_1.gguf) | 5490 | 14.7293 | 37.78 | 99.55 | 0.25 |
| [Q6_K](https://huggingface.co/ThomasBaruzier/DeepSeek-R1-ReDistill-Qwen-7B-v1.1-GGUF/blob/main/DeepSeek-R1-ReDistill-Qwen-7B-v1.1-Q6_K.gguf) | 5964 | 14.6907 | 41.04 | 99.81 | 0.25 |
| [Q8_0](https://huggingface.co/ThomasBaruzier/DeepSeek-R1-ReDistill-Qwen-7B-v1.1-GGUF/blob/main/DeepSeek-R1-ReDistill-Qwen-7B-v1.1-Q8_0.gguf) | 7723 | 14.6686 | 53.15 | 99.96 | 0.25 |
| [F16](https://huggingface.co/ThomasBaruzier/DeepSeek-R1-ReDistill-Qwen-7B-v1.1-GGUF/blob/main/DeepSeek-R1-ReDistill-Qwen-7B-v1.1-F16.gguf) | 14531 | 14.6625 | 100 | 100 | 0.25 |
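The Accuracy column appears to be the F16 baseline perplexity divided by each quant's perplexity, expressed as a percentage. A minimal sketch of that relationship, using values copied from the table above (an illustration, not the exact evaluation script):

```Python
# Accuracy (%) = 100 * PPL(F16) / PPL(quant); lower quant PPL -> higher accuracy.
ppl_f16 = 14.6625     # F16 baseline from the table
ppl_q4_k_m = 14.8666  # Q4_K_M from the table

accuracy = 100 * ppl_f16 / ppl_q4_k_m
print(f"{accuracy:.2f}")  # ~98.63, matching the Q4_K_M row
```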
This is a version of the DeepSeek-R1-Distill-Qwen-7B model re-distilled for better performance.

## Performance

| Models | DeepSeek-R1-Distill-Qwen-7B | DeepSeek-R1-ReDistill-Qwen-7B-v1.1 |
|:-------------------:|:--------:|:----------------:|
| ARC (25-shot) | 55.03 | 52.3 |
| HellaSwag (10-shot) | 61.9 | 62.36 |
| MMLU (5-shot) | 56.75 | 59.53 |
| TruthfulQA-MC2 | 45.76 | 47.7 |
| Winogrande (5-shot) | 60.38 | 61.8 |
| GSM8K (5-shot) | 78.85 | 83.4 |
| Average | 59.78 | 61.18 |

| Models | DeepSeek-R1-Distill-Qwen-7B | DeepSeek-R1-ReDistill-Qwen-7B-v1.1 |
|:-------------------:|:--------:|:----------------:|
| GPQA (0-shot) | 30.9 | 34.99 |
| MMLU PRO (5-shot) | 28.83 | 31.02 |
| MUSR (0-shot) | 38.85 | 44.42 |
| BBH (3-shot) | 43.54 | 51.53 |
| IfEval (0-shot) - strict | 42.33 | 35.49 |
| IfEval (0-shot) - loose | 30.31 | 38.49 |

## Usage

```Python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

compute_dtype = torch.bfloat16
device = 'cuda'
model_id = "mobiuslabsgmbh/DeepSeek-R1-ReDistill-Qwen-7B-v1.1"

model = AutoModelForCausalLM.from_pretrained(model_id, torch_dtype=compute_dtype, attn_implementation="sdpa", device_map=device)
tokenizer = AutoTokenizer.from_pretrained(model_id)

prompt = "What is 1.5+102.2?"
chat = tokenizer.apply_chat_template([{"role":"user", "content":prompt}], tokenize=True, add_generation_prompt=True, return_tensors="pt")
outputs = model.generate(chat.to(device), max_new_tokens=1024, do_sample=True)
print(tokenizer.decode(outputs[0]))
```

Output:

```
<|begin▁of▁sentence|><|User|>What is 1.5+102.2?<|Assistant|> First, I need to add the whole number parts of the two numbers. The whole numbers are 1 and 102, which add up to 103.

Next, I add the decimal parts of the two numbers. The decimal parts are 0.5 and 0.2, which add up to 0.7.

Finally, I combine the whole number and decimal parts to get the total sum. Adding 103 and 0.7 gives me 103.7.

To add the numbers \(1.5\) and \(102.2\), follow these steps:

1. **Add the whole number parts:**
   \[
   1 + 102 = 103
   \]

2. **Add the decimal parts:**
   \[
   0.5 + 0.2 = 0.7
   \]

3. **Combine the results:**
   \[
   103 + 0.7 = 103.7
   \]

**Final Answer:**
\[
\boxed{103.7}
\]<|end▁of▁sentence|>
```

## HQQ

Run ~3.5x faster with HQQ. First, install the dependencies:

```
pip install hqq
```

```Python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer
from hqq.models.hf.base import AutoHQQHFModel
from hqq.core.quantize import *

# Params
device = 'cuda:0'
backend = "torchao_int4"
compute_dtype = torch.bfloat16 if backend == "torchao_int4" else torch.float16
model_id = "mobiuslabsgmbh/DeepSeek-R1-ReDistill-Qwen-7B-v1.1"

# Load
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, torch_dtype=compute_dtype, attn_implementation="sdpa")

# Quantize
quant_config = BaseQuantizeConfig(nbits=4, group_size=64, axis=1)
AutoHQQHFModel.quantize_model(model, quant_config=quant_config, compute_dtype=compute_dtype, device=device)

# Optimize
from hqq.utils.patching import prepare_for_inference
prepare_for_inference(model, backend=backend, verbose=False)

############################################################
# Generate (streaming)
from hqq.utils.generation_hf import HFGenerator
gen = HFGenerator(model, tokenizer, max_new_tokens=4096, do_sample=True, compile='partial').warmup()

prompt = "If A equals B, and C equals B - A, what would be the value of C?"
out = gen.generate(prompt, print_tokens=True)

############################################################
# Generate (simple)
# from hqq.utils.generation_hf import patch_model_for_compiled_runtime
# patch_model_for_compiled_runtime(model, tokenizer, warmup=True)

# prompt = "If A equals B, and C equals B - A, what would be the value of C?"
# chat = tokenizer.apply_chat_template([{"role":"user", "content":prompt}], tokenize=True, add_generation_prompt=True, return_tensors="pt")
# outputs = model.generate(chat.to(device), max_new_tokens=8192, do_sample=True)
# print(tokenizer.decode(outputs[0]))
```
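The HQQ path above quantizes the original transformers checkpoint on the fly. To run one of the pre-quantized GGUF files from this repo instead, llama-cpp-python is one option; a minimal sketch, assuming the Q4_K_M file from the table has been downloaded locally:

```Python
from llama_cpp import Llama

# Path is illustrative; any GGUF file from the table above works.
llm = Llama(model_path="DeepSeek-R1-ReDistill-Qwen-7B-v1.1-Q4_K_M.gguf", n_ctx=4096)

out = llm.create_chat_completion(
    messages=[{"role": "user", "content": "What is 1.5+102.2?"}],
    max_tokens=1024,
)
print(out["choices"][0]["message"]["content"])
```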