Post: Is it time to start developing sparse attention again? https://github.com/SmallDoges/flash-sparse-attention
Article: Trainable Dynamic Mask Sparse Attention: Bridging Efficiency and Effectiveness in Long-Context Language Models
Doge: a family of small language models.
- SmallDoges/Doge-320M-Instruct (Question Answering, 0.3B, updated Aug 8)
- SmallDoges/Doge-160M-Instruct (Question Answering, 0.2B, updated Aug 8)
- SmallDoges/Doge-60M-Instruct (Question Answering, 54.6M, updated Aug 8)
- SmallDoges/Doge-20M-Instruct (Question Answering, 13.1M, updated Apr 17)