Jiunsong/supergemma4-26b-abliterated-multimodal-mlx-4bit Image-Text-to-Text • 5B • Updated 26 days ago • 9.22k • 51
MISA: Mixture of Indexer Sparse Attention for Long-Context LLM Inference Paper • 2605.07363 • Published 6 days ago • 12
Can RL Teach Long-Horizon Reasoning to LLMs? Expressiveness Is Key Paper • 2605.06638 • Published 7 days ago • 13
AcademiClaw: When Students Set Challenges for AI Agents Paper • 2605.02661 • Published 10 days ago • 16
D-OPSD: On-Policy Self-Distillation for Continuously Tuning Step-Distilled Diffusion Models Paper • 2605.05204 • Published 8 days ago • 25
SkillOS: Learning Skill Curation for Self-Evolving Agents Paper • 2605.06614 • Published 7 days ago • 38
LLMs Improving LLMs: Agentic Discovery for Test-Time Scaling Paper • 2605.08083 • Published 6 days ago • 62
Skill1: Unified Evolution of Skill-Augmented Agents via Reinforcement Learning Paper • 2605.06130 • Published 7 days ago • 95
Flow-OPD: On-Policy Distillation for Flow Matching Models Paper • 2605.08063 • Published 6 days ago • 88
ARIS: Autonomous Research via Adversarial Multi-Agent Collaboration Paper • 2605.03042 • Published 10 days ago • 112
Heterogeneous Scientific Foundation Model Collaboration Paper • 2604.27351 • Published 14 days ago • 212
ibm-granite/granite-speech-4.1-2b Automatic Speech Recognition • 2B • Updated 15 days ago • 174k • 93
The ultimate guide to RL environments: building and scaling them in the LLM era • Running • 153 • Building and scaling RL environments for LLM training