---
quantized_by: bobchenyx
base_model:
- deepseek-ai/DeepSeek-V3-0324
base_model_relation: quantized
license: mit
tags:
- deepseek_v3
- deepseek
- transformers
- GGUF
pipeline_tag: text-generation
---

## llama.cpp Quantizations of DeepSeek-V3-0324 (MLA version)

This page will be deprecated. For other quantized versions, please refer to [moxin-org/DeepSeek-V3-0324-Moxin-GGUF](https://huggingface.co/moxin-org/DeepSeek-V3-0324-Moxin-GGUF) for more details.

All quants were made with [moxin-org/CC-MoE](https://github.com/moxin-org/CC-MoE).

```
- IQ1_S  : 129.94 GiB (1.66 BPW)
- IQ1_M  : 144.24 GiB (1.85 BPW)
- Q2_K_L : 222.01 GiB (2.84 BPW)
- Q4_K_L : 381.64 GiB (4.89 BPW)
```

## Smallest Compression (103 GB)

For our smallest compressed version, please refer to [tflsxyy/DeepSeek-V3-0324-E192](https://huggingface.co/tflsxyy/DeepSeek-V3-0324-MoE-Pruner-E192-bf16) and [bobchenyx/DeepSeek-V3-0324-508B-A32B-MLA-GGUF](https://huggingface.co/bobchenyx/DeepSeek-V3-0324-508B-A32B-MLA-GGUF) for more details.

---

## Download Guide

```python
# !pip install huggingface_hub hf_transfer
import os

# Enable the hf_transfer backend for faster downloads.
os.environ["HF_HUB_ENABLE_HF_TRANSFER"] = "1"

from huggingface_hub import snapshot_download

snapshot_download(
    repo_id="bobchenyx/DeepSeek-V3-0324-MLA-GGUF",
    local_dir="bobchenyx/DeepSeek-V3-0324-MLA-GGUF",
    allow_patterns=["*IQ1_M*"],  # fetch only the IQ1_M quant files
)
```
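
After downloading, the GGUF files can be loaded with any llama.cpp-based runtime. Below is a minimal sketch using the `llama-cpp-python` bindings (not part of the original card); the shard filename is a placeholder, so adjust it to the actual first `*.gguf` file in your `local_dir`, and tune `n_ctx` / `n_gpu_layers` to your hardware.

```python
# A minimal sketch, assuming llama-cpp-python is installed:
#   pip install llama-cpp-python
from llama_cpp import Llama

llm = Llama(
    # Hypothetical shard name -- point this at the first shard of the
    # split GGUF you actually downloaded; llama.cpp loads the rest.
    model_path="bobchenyx/DeepSeek-V3-0324-MLA-GGUF/DeepSeek-V3-0324-IQ1_M-00001-of-00004.gguf",
    n_ctx=4096,      # context window size
    n_gpu_layers=0,  # raise to offload layers to GPU if VRAM allows
)

out = llm("Explain MLA (multi-head latent attention) in one sentence.", max_tokens=64)
print(out["choices"][0]["text"])
```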