Post: Is it time to start developing sparse attention again? https://github.com/SmallDoges/flash-sparse-attention
Article: Trainable Dynamic Mask Sparse Attention: Bridging Efficiency and Effectiveness in Long-Context Language Models
Doge: a family of small language models.
- SmallDoges/Doge-320M-Instruct (Question Answering, 0.3B, updated Aug 8)
- SmallDoges/Doge-160M-Instruct (Question Answering, 0.2B, updated Aug 8)
- SmallDoges/Doge-60M-Instruct (Question Answering, 54.6M, updated Aug 8)
- SmallDoges/Doge-20M-Instruct (Question Answering, 13.1M, updated Apr 17)