- SSRL: Self-Search Reinforcement Learning
  Paper • 2508.10874 • Published • 94
- Is Chain-of-Thought Reasoning of LLMs a Mirage? A Data Distribution Lens
  Paper • 2508.01191 • Published • 236
- Thinking with Nothinking Calibration: A New In-Context Learning Paradigm in Reasoning Large Language Models
  Paper • 2508.03363 • Published • 1
- MiroMind-M1: An Open-Source Advancement in Mathematical Reasoning via Context-Aware Multi-Stage Policy Optimization
  Paper • 2507.14683 • Published • 131
LLouice
llouice
AI & ML interests
None yet
Recent Activity
updated a collection about 2 months ago: LLM-papers
updated a collection 2 months ago: LLM-papers
updated a collection 2 months ago: LLM-papers
Organizations
None yet