llama.cpp Mixed Precision Quant of DeepSeek-R1-0528

All quants are based on moxin-org/CC-MoE.

We hold higher expectations for reasoning models' performance; therefore, we have for now opted not to compress them into sizes as small as we did for the V3 versions.

- Q2_K_L : 220.55 GiB (2.82 BPW)
- IQ2_XXS : 186.23 GiB (2.38 BPW)
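
As a quick sanity check, each BPW figure can be recovered from the file size and the base model's 671B parameter count (size in bits divided by parameter count). A minimal Python sketch:

# BPW = file size in bits / number of parameters (671B for DeepSeek-R1-0528).
GIB = 2**30
for name, size_gib in [("Q2_K_L", 220.55), ("IQ2_XXS", 186.23)]:
    bpw = size_gib * GIB * 8 / 671e9
    print(f"{name}: {bpw:.2f} BPW")  # -> 2.82 and 2.38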

Download

Downloads are available via huggingface_hub (snapshot_download), huggingface-cli, and Xet.

Download Guide
# !pip install huggingface_hub hf_transfer
import os
# Optional: enable the faster hf_transfer download backend.
# os.environ["HF_HUB_ENABLE_HF_TRANSFER"] = "1"
from huggingface_hub import snapshot_download

# Download only the Q2_K_L shards; swap the pattern for IQ2_XXS.
snapshot_download(
    repo_id="moxin-org/DeepSeek-R1-0528-Moxin-GGUF",
    local_dir="DeepSeek-R1-0528-Moxin-GGUF",
    allow_patterns=["*Q2_K_L*"],  # or ["*IQ2_XXS*"]
)
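
The same files can be fetched from the shell with huggingface-cli; --include plays the role of allow_patterns above:

huggingface-cli download moxin-org/DeepSeek-R1-0528-Moxin-GGUF \
  --include "*Q2_K_L*" \
  --local-dir DeepSeek-R1-0528-Moxin-GGUF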

Usage

Example of running the GGUF with a local build of llama.cpp (llama-cli / llama-server).

Build llama.cpp locally
git clone https://github.com/ggml-org/llama.cpp.git
cd llama.cpp

# CUDA build; drop -DGGML_CUDA=ON for a CPU-only build.
cmake -B build -DGGML_CUDA=ON -DBUILD_SHARED_LIBS=OFF -DLLAMA_CURL=OFF
cmake --build build --config Release -j --clean-first

# Run interactively; pointing -m at the first shard loads all split files,
# and -ngl 99 offloads every layer to the GPU.
build/bin/llama-cli -m DeepSeek-R1-0528-Moxin-GGUF/R1-Q2_K_L/DeepSeek-R1-0528-Moxin-Q2_K_L-00001-of-00007.gguf \
  -ngl 99 \
  --temp 0.6 \
  --top-p 0.95 \
  --min-p 0.01 \
  --ctx-size 16384
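
To serve the model over HTTP instead, llama-server accepts the same model and sampling flags and exposes an OpenAI-compatible /v1/chat/completions endpoint; the host and port below are llama.cpp's defaults:

build/bin/llama-server -m DeepSeek-R1-0528-Moxin-GGUF/R1-Q2_K_L/DeepSeek-R1-0528-Moxin-Q2_K_L-00001-of-00007.gguf \
  -ngl 99 \
  --ctx-size 16384 \
  --host 127.0.0.1 --port 8080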

Citation

If this work is helpful, please cite it as:

@article{chen2025collaborative,
  title={Collaborative Compression for Large-Scale MoE Deployment on Edge},
  author={Chen, Yixiao and Xie, Yanyue and Yang, Ruining and Jiang, Wei and Wang, Wei and He, Yong and Chen, Yue and Zhao, Pu and Wang, Yanzhi},
  journal={arXiv preprint arXiv:2509.25689},
  year={2025}
}

Acknowledgements

This repository builds upon the outstanding work of open-source authors and projects, including llama.cpp.

We sincerely thank them for their excellent contributions to the open-source community.
