Collections
Discover the best community collections!
Collections including paper arxiv:2510.11696
-
Demystifying Reinforcement Learning in Agentic Reasoning
Paper • 2510.11701 • Published • 27 -
Self-Improving LLM Agents at Test-Time
Paper • 2510.07841 • Published • 9 -
Making Mathematical Reasoning Adaptive
Paper • 2510.04617 • Published • 22 -
DocReward: A Document Reward Model for Structuring and Stylizing
Paper • 2510.11391 • Published • 25
-
SimpleVLA-RL: Scaling VLA Training via Reinforcement Learning
Paper • 2509.09674 • Published • 77 -
A Survey of Reinforcement Learning for Large Reasoning Models
Paper • 2509.08827 • Published • 183 -
QeRL: Beyond Efficiency -- Quantization-enhanced Reinforcement Learning for LLMs
Paper • 2510.11696 • Published • 145
-
Outlier-Safe Pre-Training for Robust 4-Bit Quantization of Large Language Models
Paper • 2506.19697 • Published • 44 -
Winning the Pruning Gamble: A Unified Approach to Joint Sample and Token Pruning for Efficient Supervised Fine-Tuning
Paper • 2509.23873 • Published • 66 -
Efficient Multi-modal Large Language Models via Progressive Consistency Distillation
Paper • 2510.00515 • Published • 39 -
SINQ: Sinkhorn-Normalized Quantization for Calibration-Free Low-Precision LLM Weights
Paper • 2509.22944 • Published • 73
-
Meta-Awareness Enhances Reasoning Models: Self-Alignment Reinforcement Learning
Paper • 2510.03259 • Published • 54 -
Hybrid Reinforcement: When Reward Is Sparse, It's Better to Be Dense
Paper • 2510.07242 • Published • 30 -
First Try Matters: Revisiting the Role of Reflection in Reasoning Models
Paper • 2510.08308 • Published • 24 -
Low-probability Tokens Sustain Exploration in Reinforcement Learning with Verifiable Reward
Paper • 2510.03222 • Published • 41
-
Intern-S1: A Scientific Multimodal Foundation Model
Paper • 2508.15763 • Published • 254 -
MemMamba: Rethinking Memory Patterns in State Space Model
Paper • 2510.03279 • Published • 67 -
Video-LMM Post-Training: A Deep Dive into Video Reasoning with Large Multimodal Models
Paper • 2510.05034 • Published • 43 -
QeRL: Beyond Efficiency -- Quantization-enhanced Reinforcement Learning for LLMs
Paper • 2510.11696 • Published • 145
-
lusxvr/nanoVLM-222M
Image-Text-to-Text • 0.2B • Updated • 291 • 96 -
Search-R1: Training LLMs to Reason and Leverage Search Engines with Reinforcement Learning
Paper • 2503.09516 • Published • 36 -
AlphaOne: Reasoning Models Thinking Slow and Fast at Test Time
Paper • 2505.24863 • Published • 97 -
QwenLong-L1: Towards Long-Context Large Reasoning Models with Reinforcement Learning
Paper • 2505.17667 • Published • 88
-
Demystifying Reinforcement Learning in Agentic Reasoning
Paper • 2510.11701 • Published • 27 -
Self-Improving LLM Agents at Test-Time
Paper • 2510.07841 • Published • 9 -
Making Mathematical Reasoning Adaptive
Paper • 2510.04617 • Published • 22 -
DocReward: A Document Reward Model for Structuring and Stylizing
Paper • 2510.11391 • Published • 25
-
Meta-Awareness Enhances Reasoning Models: Self-Alignment Reinforcement Learning
Paper • 2510.03259 • Published • 54 -
Hybrid Reinforcement: When Reward Is Sparse, It's Better to Be Dense
Paper • 2510.07242 • Published • 30 -
First Try Matters: Revisiting the Role of Reflection in Reasoning Models
Paper • 2510.08308 • Published • 24 -
Low-probability Tokens Sustain Exploration in Reinforcement Learning with Verifiable Reward
Paper • 2510.03222 • Published • 41
-
SimpleVLA-RL: Scaling VLA Training via Reinforcement Learning
Paper • 2509.09674 • Published • 77 -
A Survey of Reinforcement Learning for Large Reasoning Models
Paper • 2509.08827 • Published • 183 -
QeRL: Beyond Efficiency -- Quantization-enhanced Reinforcement Learning for LLMs
Paper • 2510.11696 • Published • 145
-
Intern-S1: A Scientific Multimodal Foundation Model
Paper • 2508.15763 • Published • 254 -
MemMamba: Rethinking Memory Patterns in State Space Model
Paper • 2510.03279 • Published • 67 -
Video-LMM Post-Training: A Deep Dive into Video Reasoning with Large Multimodal Models
Paper • 2510.05034 • Published • 43 -
QeRL: Beyond Efficiency -- Quantization-enhanced Reinforcement Learning for LLMs
Paper • 2510.11696 • Published • 145
-
Outlier-Safe Pre-Training for Robust 4-Bit Quantization of Large Language Models
Paper • 2506.19697 • Published • 44 -
Winning the Pruning Gamble: A Unified Approach to Joint Sample and Token Pruning for Efficient Supervised Fine-Tuning
Paper • 2509.23873 • Published • 66 -
Efficient Multi-modal Large Language Models via Progressive Consistency Distillation
Paper • 2510.00515 • Published • 39 -
SINQ: Sinkhorn-Normalized Quantization for Calibration-Free Low-Precision LLM Weights
Paper • 2509.22944 • Published • 73
-
lusxvr/nanoVLM-222M
Image-Text-to-Text • 0.2B • Updated • 291 • 96 -
Search-R1: Training LLMs to Reason and Leverage Search Engines with Reinforcement Learning
Paper • 2503.09516 • Published • 36 -
AlphaOne: Reasoning Models Thinking Slow and Fast at Test Time
Paper • 2505.24863 • Published • 97 -
QwenLong-L1: Towards Long-Context Large Reasoning Models with Reinforcement Learning
Paper • 2505.17667 • Published • 88