
Mattias Dürrmeier

mattduerrmeier

AI & ML interests

LLM Inference, faster and more efficient kernels, local inference

Organizations

None yet

Recent Activity

upvoted a paper 1 day ago

Nemotron 3 Nano Omni: Efficient and Open Multimodal Intelligence

Paper • 2604.24954 • Published 7 days ago • 15

liked a model 2 days ago

deepseek-ai/DeepSeek-V4-Flash

Text Generation • 158B • Updated 6 days ago • 414k • 924

New activity in deepseek-ai/DeepSeek-V4-Flash 2 days ago

How to run deepseek on Ada GPUs? Mine is L20.

#25 opened 5 days ago by XiaoZaiyi

New activity in deepseek-ai/DeepSeek-V4-Flash 3 days ago

Should the "index_topk" be updated to 1024 just like the Pro model?

#23 opened 6 days ago by jfcherng

New activity in deepseek-ai/DeepSeek-V4-Flash 6 days ago

Questions on MoE Hash Routing

#22 opened 6 days ago by mattduerrmeier

liked a model 9 days ago

deepseek-ai/DeepSeek-V4-Pro

Text Generation • 862B • Updated 6 days ago • 457k • 3.46k

updated a collection 19 days ago

systems

Collection

4 items • Updated 19 days ago

upvoted a paper 19 days ago

FlashDecoding++: Faster Large Language Model Inference on GPUs

Paper • 2311.01282 • Published Nov 2, 2023 • 38

upvoted a paper 19 days ago

Liger Kernel: Efficient Triton Kernels for LLM Training

Paper • 2410.10989 • Published Oct 14, 2024 • 3

upvoted 2 papers 19 days ago

FlashFormer: Whole-Model Kernels for Efficient Low-Batch Inference

Paper • 2505.22758 • Published May 28, 2025 • 1

CUDA Agent: Large-Scale Agentic RL for High-Performance CUDA Kernel Generation

Paper • 2602.24286 • Published Feb 27 • 98

liked a Space 20 days ago

FineWeb: decanting the web for the finest text data at scale

🍷

1.33k

Explore and download the FineWeb web‑text dataset

liked a dataset 20 days ago

HuggingFaceFW/fineweb

Viewer • Updated Jul 11, 2025 • 52.5B • 643k • 2.78k

upvoted an article 3 months ago

Introduction to State Space Models (SSM)

Article • Jul 19, 2024 • 223

liked a Space about 1 year ago

The Ultra-Scale Playbook

🌌

3.83k

The ultimate guide to training LLMs on large GPU clusters
