David Samuel (Davidsamuel101)
AI & ML interests: NLP, Computer Vision
LLM
- GaLore: Memory-Efficient LLM Training by Gradient Low-Rank Projection
  Paper • 2403.03507 • Published • 189
- Yi: Open Foundation Models by 01.AI
  Paper • 2403.04652 • Published • 65
- RLHF Can Speak Many Languages: Unlocking Multilingual Preference Optimization for LLMs
  Paper • 2407.02552 • Published • 4
- OpenDevin: An Open Platform for AI Software Developers as Generalist Agents
  Paper • 2407.16741 • Published • 75
MoEs
- BlackMamba: Mixture of Experts for State-Space Models
  Paper • 2402.01771 • Published • 25
- OpenMoE: An Early Effort on Open Mixture-of-Experts Language Models
  Paper • 2402.01739 • Published • 28
- DeepSeekMoE: Towards Ultimate Expert Specialization in Mixture-of-Experts Language Models
  Paper • 2401.06066 • Published • 59