	---
	quantized_by: bobchenyx
	base_model:
	- deepseek-ai/DeepSeek-V3-0324
	base_model_relation: quantized
	license: mit
	tags:
	- deepseek_v3
	- deepseek
	- transformers
	- GGUF
	pipeline_tag: text-generation
	---

	## llama.cpp Quantizations of DeepSeek-V3-0324 (MLA version)

	This page will be deprecated. For other quantized versions, see [moxin-org/DeepSeek-V3-0324-Moxin-GGUF](https://huggingface.co/moxin-org/DeepSeek-V3-0324-Moxin-GGUF).


	All quants are based on [moxin-org/CC-MoE](https://github.com/moxin-org/CC-MoE). File sizes and bits per weight (BPW):
	```
	- IQ1_S : 129.94 GiB (1.66 BPW)
	- IQ1_M : 144.24 GiB (1.85 BPW)
	- Q2_K_L : 222.01 GiB (2.84 BPW)
	- Q4_K_L : 381.64 GiB (4.89 BPW)
	```
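As a rough sanity check on the figures above, file size and BPW together imply a weight count: size in bits divided by bits per weight. The sketch below assumes that BPW here means total file bits over total weights and that GiB means 2^30 bytes; the helper name `implied_params` is made up for this example.

```python
GIB = 2**30  # bytes per GiB

# (size in GiB, bits per weight), copied from the list above
quants = {
    "IQ1_S": (129.94, 1.66),
    "IQ1_M": (144.24, 1.85),
    "Q2_K_L": (222.01, 2.84),
    "Q4_K_L": (381.64, 4.89),
}

def implied_params(size_gib: float, bpw: float) -> float:
    """Estimate the number of weights from file size and bits per weight."""
    return size_gib * GIB * 8 / bpw

estimates = {name: implied_params(*v) for name, v in quants.items()}
for name, n in estimates.items():
    print(f"{name}: ~{n / 1e9:.0f}B weights")
```

Each estimate lands near 670 billion weights, which is in line with the roughly 671B total parameters of the base model, so the listed sizes and BPW values are mutually consistent.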

	## Smallest Compression (103GB)

	For our smallest compressed version, please refer to
	[tflsxyy/DeepSeek-V3-0324-E192](https://huggingface.co/tflsxyy/DeepSeek-V3-0324-MoE-Pruner-E192-bf16)
	and [bobchenyx/DeepSeek-V3-0324-508B-A32B-MLA-GGUF](https://huggingface.co/bobchenyx/DeepSeek-V3-0324-508B-A32B-MLA-GGUF).

	---
	## Download Guide

	```
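	# !pip install huggingface_hub hf_transfer
	import os

	# Enable hf_transfer for faster downloads; set this before importing huggingface_hub.
	os.environ["HF_HUB_ENABLE_HF_TRANSFER"] = "1"

	from huggingface_hub import snapshot_download

	snapshot_download(
	    repo_id="bobchenyx/DeepSeek-V3-0324-MLA-GGUF",
	    local_dir="bobchenyx/DeepSeek-V3-0324-MLA-GGUF",
	    # Glob pattern matched against file paths; swap in another quant name, e.g. "*Q2_K_L*".
	    allow_patterns=["*IQ1_M*"],
	)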
	```
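To preview which files an `allow_patterns` entry would select, the matching can be sketched offline. This assumes `huggingface_hub` filters repo-relative paths with `fnmatch`-style globbing, which is my understanding of its behavior. The file names below are hypothetical and only illustrate the idea; the real repo layout may differ.

```python
from fnmatch import fnmatch

# Hypothetical file names for illustration only; check the repo's file list for the real ones.
repo_files = [
    "README.md",
    "IQ1_S/model-IQ1_S-00001-of-00003.gguf",
    "IQ1_M/model-IQ1_M-00001-of-00004.gguf",
    "IQ1_M/model-IQ1_M-00002-of-00004.gguf",
    "Q2_K_L/model-Q2_K_L-00001-of-00005.gguf",
]

def select(files, patterns):
    """Keep files whose path matches at least one glob pattern."""
    return [f for f in files if any(fnmatch(f, p) for p in patterns)]

wildcard = select(repo_files, ["*IQ1_M*"])  # matches anything containing "IQ1_M"
bare = select(repo_files, ["IQ1_M"])        # matches only a path that is exactly "IQ1_M"

print(wildcard)
print(bare)
```

Under these assumptions, a pattern wrapped in `*` picks up every shard of the chosen quant, while a bare name matches only a file with exactly that path.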