---
license: mit
library_name: transformers
base_model:
- deepseek-ai/DeepSeek-V3.1-Terminus
---

Learn how to run DeepSeek-V3.1 Terminus correctly: read our Guide.

See how DeepSeek-V3.1 Dynamic 3-bit GGUF scores 75.6% on Aider Polyglot here.

# 🐋 DeepSeek-V3.1-Terminus Usage Guidelines

These quants include our Unsloth chat template fixes, specifically for llama.cpp-supported backends.

- You must use `--jinja` for llama.cpp quants.
- Set the temperature to **~0.6** (recommended) and the Top_P value to **0.95** (recommended).
- UD-Q2_K_XL (247GB) is recommended.
- For complete, detailed instructions, see our guide: [unsloth.ai/blog/deepseek-v3.1](https://docs.unsloth.ai/basics/deepseek-v3.1)
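The settings above can be sketched as a single llama.cpp invocation. This is a minimal example, not a definitive command: the GGUF filename is a placeholder for whichever local UD-Q2_K_XL shard set you downloaded, and other flags (context size, GPU offload) are left at your discretion.

```shell
# Minimal llama.cpp sketch reflecting the recommended settings above.
# The --model path is a placeholder; point it at your local GGUF download.
./llama-cli \
  --model DeepSeek-V3.1-Terminus-UD-Q2_K_XL.gguf \
  --jinja \
  --temp 0.6 \
  --top-p 0.95 \
  -p "Write a haiku about whales."
```

`--jinja` tells llama.cpp to apply the chat template embedded in the GGUF, which is where the Unsloth template fixes live; without it, prompts are formatted incorrectly.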
# DeepSeek-V3.1-Terminus

## Introduction

This update maintains the model's original capabilities while addressing issues reported by users, including:

- Language consistency: reducing instances of mixed Chinese-English text and occasional abnormal characters;
- Agent capabilities: further optimizing the performance of the Code Agent and Search Agent.

| Benchmark | DeepSeek-V3.1 | DeepSeek-V3.1-Terminus |
| :--- | :---: | :---: |
| **Reasoning Mode w/o Tool Use** | | |
| MMLU-Pro | 84.8 | 85.0 |
| GPQA-Diamond | 80.1 | 80.7 |
| Humanity's Last Exam | 15.9 | 21.7 |
| LiveCodeBench | 74.8 | 74.9 |
| Codeforces | 2091 | 2046 |
| Aider-Polyglot | 76.3 | 76.1 |
| **Agentic Tool Use** | | |
| BrowseComp | 30.0 | 38.5 |
| BrowseComp-zh | 49.2 | 45.0 |
| SimpleQA | 93.4 | 96.8 |
| SWE Verified | 66.0 | 68.4 |
| SWE-bench Multilingual | 54.5 | 57.8 |
| Terminal-bench | 31.3 | 36.7 |

**The template and tool set of the search agent have been updated, as shown in `assets/search_tool_trajectory.html`.**

## How to Run Locally

The model structure of DeepSeek-V3.1-Terminus is the same as that of DeepSeek-V3. Please visit the [DeepSeek-V3](https://github.com/deepseek-ai/DeepSeek-V3) repo for more information about running this model locally.

For the model's chat template (other than the search agent), please refer to the [DeepSeek-V3.1](https://huggingface.co/deepseek-ai/DeepSeek-V3.1) repo. We also provide an updated inference demo in the `inference` folder to help the community get started with running the model and to understand the details of the model architecture.

**NOTE: In the current model checkpoint, the parameters of `self_attn.o_proj` do not conform to the UE8M0 FP8 scale data format. This is a known issue and will be corrected in future model releases.**

## License

This repository and the model weights are licensed under the [MIT License](LICENSE).
## Citation

```
@misc{deepseekai2024deepseekv3technicalreport,
      title={DeepSeek-V3 Technical Report},
      author={DeepSeek-AI},
      year={2024},
      eprint={2412.19437},
      archivePrefix={arXiv},
      primaryClass={cs.CL},
      url={https://arxiv.org/abs/2412.19437},
}
```

## Contact

If you have any questions, please raise an issue or contact us at [service@deepseek.com](mailto:service@deepseek.com).