view article Article Making LLMs even more accessible with bitsandbytes, 4-bit quantization and QLoRA +3 May 24, 2023 • 172
DeepSeek-R1: Incentivizing Reasoning Capability in LLMs via Reinforcement Learning Paper • 2501.12948 • Published Jan 22, 2025 • 437
Inference-Time Scaling for Generalist Reward Modeling Paper • 2504.02495 • Published Apr 3, 2025 • 58
DeepSeek-Prover-V2: Advancing Formal Mathematical Reasoning via Reinforcement Learning for Subgoal Decomposition Paper • 2504.21801 • Published Apr 30, 2025 • 4