Papers - a keypa Collection

keypa 's Collections

Papers

updated 4 days ago

The Art of Scaling Reinforcement Learning Compute for LLMs

Paper • 2510.13786 • Published Oct 15, 2025 • 33
Attention Is All You Need for KV Cache in Diffusion LLMs

Paper • 2510.14973 • Published Oct 16, 2025 • 42
BitNet Distillation

Paper • 2510.13998 • Published Oct 15, 2025 • 59
GigaBrain-0: A World Model-Powered Vision-Language-Action Model

Paper • 2510.19430 • Published Oct 22, 2025 • 53
Every Question Has Its Own Value: Reinforcement Learning with Explicit Human Values

Paper • 2510.20187 • Published Oct 23, 2025 • 19
LoongRL:Reinforcement Learning for Advanced Reasoning over Long Contexts

Paper • 2510.19363 • Published Oct 22, 2025 • 63
Qwen3-Omni Technical Report

Paper • 2509.17765 • Published Sep 22, 2025 • 153
T-pro 2.0: An Efficient Russian Hybrid-Reasoning Model and Playground

Paper • 2512.10430 • Published Dec 11, 2025 • 119
Efficient-DLM: From Autoregressive to Diffusion Language Models, and Beyond in Speed

Paper • 2512.14067 • Published Dec 16, 2025 • 16
Physics of Language Models: Part 4.1, Architecture Design and the Magic of Canon Layers

Paper • 2512.17351 • Published Dec 19, 2025 • 28
DataFlow: An LLM-Driven Framework for Unified Data Preparation and Workflow Automation in the Era of Data-Centric AI

Paper • 2512.16676 • Published Dec 18, 2025 • 222
Reinforcement Learning for Self-Improving Agent with Skill Library

Paper • 2512.17102 • Published Dec 18, 2025 • 42
mHC: Manifold-Constrained Hyper-Connections

Paper • 2512.24880 • Published Dec 31, 2025 • 322
TransMLA: Multi-head Latent Attention Is All You Need

Paper • 2502.07864 • Published Feb 11, 2025 • 69
Endless Terminals: Scaling RL Environments for Terminal Agents

Paper • 2601.16443 • Published Jan 23 • 18
InternVL-U: Democratizing Unified Multimodal Models for Understanding, Reasoning, Generation and Editing

Paper • 2603.09877 • Published Mar 10 • 48
Why Does Self-Distillation (Sometimes) Degrade the Reasoning Capability of LLMs?

Paper • 2603.24472 • Published about 1 month ago • 54
GLM-5: from Vibe Coding to Agentic Engineering

Paper • 2602.15763 • Published Feb 17 • 146
TREX: Automating LLM Fine-tuning via Agent-Driven Tree-based Exploration

Paper • 2604.14116 • Published 10 days ago • 13
How to Fine-Tune a Reasoning Model? A Teacher-Student Cooperation Framework to Synthesize Student-Consistent SFT Data

Paper • 2604.14164 • Published Mar 23 • 34
Nemotron 3 Super: Open, Efficient Mixture-of-Experts Hybrid Mamba-Transformer Model for Agentic Reasoning

Paper • 2604.12374 • Published 11 days ago • 36
PersonaVLM: Long-Term Personalized Multimodal LLMs

Paper • 2604.13074 • Published Mar 20 • 45