Deep Think - a Sladwell Collection

Sladwell 's Collections

KooFit

Agents

Deep Think

updated Oct 1, 2025

DeepSeek-R1 Thoughtology: Let's <think> about LLM Reasoning

Paper • 2504.07128 • Published Apr 2, 2025 • 87
BM25S: Orders of magnitude faster lexical search via eager sparse scoring

Paper • 2407.03618 • Published Jul 4, 2024 • 13
Deep Think with Confidence

Paper • 2508.15260 • Published Aug 21, 2025 • 90
R-Zero: Self-Evolving Reasoning LLM from Zero Data

Paper • 2508.05004 • Published Aug 7, 2025 • 129
Omni-Thinker: Scaling Cross-Domain Generalization in LLMs via Multi-Task RL with Hybrid Rewards

Paper • 2507.14783 • Published Jul 20, 2025 • 4
GHPO: Adaptive Guidance for Stable and Efficient LLM Reinforcement Learning

Paper • 2507.10628 • Published Jul 14, 2025 • 2
A Survey of Reinforcement Learning for Large Reasoning Models

Paper • 2509.08827 • Published Sep 10, 2025 • 190
Why Language Models Hallucinate

Paper • 2509.04664 • Published Sep 4, 2025 • 195
Reverse-Engineered Reasoning for Open-Ended Generation

Paper • 2509.06160 • Published Sep 7, 2025 • 149
Reinforcement Learning Foundations for Deep Research Systems: A Survey

Paper • 2509.06733 • Published Sep 8, 2025 • 32
Emergent Hierarchical Reasoning in LLMs through Reinforcement Learning

Paper • 2509.03646 • Published Sep 3, 2025 • 32
Staying in the Sweet Spot: Responsive Reasoning Evolution via Capability-Adaptive Hint Scaffolding

Paper • 2509.06923 • Published Sep 8, 2025 • 22
Sharing is Caring: Efficient LM Post-Training with Collective RL Experience Sharing

Paper • 2509.08721 • Published Sep 10, 2025 • 661
Inverse IFEval: Can LLMs Unlearn Stubborn Training Conventions to Follow Real Instructions?

Paper • 2509.04292 • Published Sep 4, 2025 • 57
Easy Dataset: A Unified and Extensible Framework for Synthesizing LLM Fine-Tuning Data from Unstructured Documents

Paper • 2507.04009 • Published Jul 5, 2025 • 51
Scaling Agents via Continual Pre-training

Paper • 2509.13310 • Published Sep 16, 2025 • 117
QuantAgent: Price-Driven Multi-Agent LLMs for High-Frequency Trading

Paper • 2509.09995 • Published Sep 12, 2025 • 15
ReSum: Unlocking Long-Horizon Search Intelligence via Context Summarization

Paper • 2509.13313 • Published Sep 16, 2025 • 80
PromptCoT 2.0: Scaling Prompt Synthesis for Large Language Model Reasoning

Paper • 2509.19894 • Published Sep 24, 2025 • 33
When Does Reasoning Matter? A Controlled Study of Reasoning's Contribution to Model Performance

Paper • 2509.22193 • Published Sep 26, 2025 • 37