Insights into DeepSeek-V3: Scaling Challenges and Reflections on Hardware for AI Architectures Paper • arXiv:2505.09343 • Published May 2025
Native Sparse Attention: Hardware-Aligned and Natively Trainable Sparse Attention Paper • arXiv:2502.11089 • Published Feb 16, 2025
DeepSeek-R1: Incentivizing Reasoning Capability in LLMs via Reinforcement Learning Paper • arXiv:2501.12948 • Published Jan 22, 2025
Auxiliary-Loss-Free Load Balancing Strategy for Mixture-of-Experts Paper • arXiv:2408.15664 • Published Aug 28, 2024
DeepSeek-Coder-V2: Breaking the Barrier of Closed-Source Models in Code Intelligence Paper • arXiv:2406.11931 • Published Jun 17, 2024
Calibrating Factual Knowledge in Pretrained Language Models Paper • arXiv:2210.03329 • Published Oct 7, 2022
Why Can GPT Learn In-Context? Language Models Implicitly Perform Gradient Descent as Meta-Optimizers Paper • arXiv:2212.10559 • Published Dec 20, 2022
Label Words are Anchors: An Information Flow Perspective for Understanding In-Context Learning Paper • arXiv:2305.14160 • Published May 23, 2023