deepseek-ai/DeepSeek-V2-Chat-0628

马仕镕 committed on Jul 18, 2024

Commit 96910ac · 1 Parent(s): 807017b

Update README

Files changed (2)

README.md +19 -9
figures/arena3.png +0 -0

README.md CHANGED

@@ -66,28 +66,38 @@ DeepSeek-V2-Chat-0628 is an improved version of DeepSeek-V2-Chat. For model deta
 DeepSeek-V2-Chat-0628 has achieved remarkable performance on the LMSYS Chatbot Arena Leaderboard:
-- Overall Ranking: #11, outperforming all other open-source models.
-- Coding Arena Ranking: #3, showcasing exceptional capabilities in coding tasks.
-- Hard Prompts Arena Ranking: #3, demonstrating strong performance on challenging prompts.
+Overall Ranking: #11, outperforming all other open-source models.
 <p align="center">
   <img width="90%" src="figures/arena1.png" />
 </p>
+Coding Arena Ranking: #3, showcasing exceptional capabilities in coding tasks.
 <p align="center">
   <img width="90%" src="figures/arena2.png" />
 </p>
+Hard Prompts Arena Ranking: #3, demonstrating strong performance on challenging prompts.
+<p align="center">
+  <img width="90%" src="figures/arena3.png" />
+</p>
 ## 2. Improvement
 Compared to the previous version DeepSeek-V2-Chat, the new version has made the following improvements:
-- Code: HumanEval Pass@1 increased from 79.88% to 84.76%.
-- Mathematics: MATH ACC@1 improved from 55.02% to 71.02%.
-- Reasoning: Big-Bench-Hard(BBH) improved from 78.56% to 83.40%.
-- Instruction Following: IFEval Benchmark Prompt-Level accuracy improved from 63.9% to 77.6%.
-- JSON Format Output: Internal test set performance increased from 78% to 85%.
-- Additionally, in the Arena-Hard evaluation, the win rate against GPT-4-0314 has increased from 41.6% to 68.3%. Furthermore, the instruction following capability in the "system" area has been optimized, significantly enhancing the user experience for immersive translation, RAG, and other tasks.
+| **Benchmark** | **DeepSeek-V2-Chat** | **DeepSeek-V2-Chat-0628** | **Improvement** |
+|:-----------:|:------------:|:---------------:|:-------------------------:|
+| **HumanEval** | 81.1 | 84.8 | +3.7 |
+| **MATH** | 53.9 | 71.0 | +17.1 |
+| **BBH** | 79.7 | 83.4 | +3.7 |
+| **IFEval** | 63.8 | 77.6 | +13.8 |
+| **Arena-Hard** | 41.6 | 68.3 | +26.7 |
+| **JSON Output (Internal)** | 78 | 85 | +7 |
+Furthermore, the instruction following capability in the "system" area has been optimized, significantly enhancing the user experience for immersive translation, RAG, and other tasks.
 ## 3. How to run locally

figures/arena3.png ADDED
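
Review note: the Improvement column in the table added to README.md is the new score minus the old score, so it can be checked mechanically. The snippet below is a minimal sketch, assuming the values are transcribed correctly from the diff above. The names `rows` and `check_improvements` are hypothetical helpers for this check and are not part of the repository.

```python
# Rows transcribed from the table added in this commit:
# (benchmark, DeepSeek-V2-Chat, DeepSeek-V2-Chat-0628, reported improvement)
rows = [
    ("HumanEval", 81.1, 84.8, 3.7),
    ("MATH", 53.9, 71.0, 17.1),
    ("BBH", 79.7, 83.4, 3.7),
    ("IFEval", 63.8, 77.6, 13.8),
    ("Arena-Hard", 41.6, 68.3, 26.7),
    ("JSON Output (Internal)", 78, 85, 7),
]


def check_improvements(rows, ndigits=1):
    """Return (name, computed, reported) for rows where new - old != reported."""
    mismatches = []
    for name, old, new, reported in rows:
        # Round to the table's precision to avoid floating-point noise.
        computed = round(new - old, ndigits)
        if computed != round(reported, ndigits):
            mismatches.append((name, computed, reported))
    return mismatches


print(check_improvements(rows))
```

An empty list means every reported improvement matches the difference of the two score columns at one decimal place.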