Submitted by fangwu97 · DeepSearch: Overcome the Bottleneck of Reinforcement Learning with Verifiable Rewards via Monte Carlo Tree Search · Stanford NLP
Submitted by pbicho · SINQ: Sinkhorn-Normalized Quantization for Calibration-Free Low-Precision LLM Weights · HUAWEI Computing Systems Lab
Submitted by taesiri · VLA-RFT: Vision-Language-Action Reinforcement Fine-tuning with Verified Rewards in World Simulators · 11 authors
Submitted by ziniuli · Knapsack RL: Unlocking Exploration of LLMs via Optimizing Budget Allocation · ByteDance Seed
Submitted by waleko · PIPer: On-Device Environment Setup via Online Reinforcement Learning · JetBrains Research
Submitted by taesiri · Code2Video: A Code-centric Paradigm for Educational Video Generation · Show Lab
Submitted by Nardien · ACON: Optimizing Context Compression for Long-horizon LLM Agents · Microsoft
Submitted by wenhu · EditReward: A Human-Aligned Reward Model for Instruction-Guided Image Editing · TIGER-Lab
Submitted by yuntian-deng · Why Can't Transformers Learn Multiplication? Reverse-Engineering Reveals Long-Range Dependency Pitfalls · 8 authors
Submitted by tianyue818 · Flash-Searcher: Fast and Effective Web Agents via DAG-Based Parallel Execution · OPPO-Personal-AI-Lab
Submitted by XinXuNLPer · BiasFreeBench: a Benchmark for Mitigating Bias in Large Language Model Responses · McAuley-Lab
Submitted by Benyucong · QUASAR: Quantum Assembly Code Generation Using Tool-Augmented LLMs via Agentic RL · 8 authors
Submitted by xx18 · On Predictability of Reinforcement Learning Dynamics for Large Language Models · 9 authors
Submitted by gaotang · Beyond Log Likelihood: Probability-Based Objectives for Supervised Fine-Tuning across the Model Capability Continuum · 5 authors
Submitted by huu-ontocord · MixtureVitae: Open Web-Scale Pretraining Dataset With High Quality Instruction and Reasoning Data Built from Permissive-First Text Sources · Ontocord.AI
Submitted by taesiri · GUI-KV: Efficient GUI Agents via KV Cache with Spatio-Temporal Awareness · 5 authors
Submitted by soujanyaporia · Training Vision-Language Process Reward Models for Test-Time Scaling in Multimodal Reasoning: Key Insights and Lessons Learned · Deep Cognition and Language Research (DeCLaRe) Lab
Submitted by ejhwang · Infusing Theory of Mind into Socially Intelligent LLM Agents · University of British Columbia
Submitted by BestWishYsh · BindWeave: Subject-Consistent Video Generation via Cross-Modal Integration · 9 authors
Submitted by zptu · BatonVoice: An Operationalist Framework for Enhancing Controllable Speech Synthesis with Linguistic Intelligence from LLMs · Tencent
Submitted by tianchez · VLM-FO1: Bridging the Gap Between High-Level Reasoning and Fine-Grained Perception in VLMs · Om AI Lab
Submitted by hao-li · An Empirical Study of Testing Practices in Open Source AI Agent Frameworks and Agentic Applications · 6 authors
Submitted by mboss · ReSWD: ReSTIR'd, not shaken. Combining Reservoir Sampling and Sliced Wasserstein Distance for Variance Reduction · Stability AI
Submitted by RubinSun · CurES: From Gradient Analysis to Efficient Curriculum Learning for Reasoning LLMs · 10 authors
Submitted by Minjong · In-Place Feedback: A New Paradigm for Guiding LLMs in Multi-Turn Reasoning · 7 authors
Submitted by yuemithucsd · TGPO: Temporal Grounded Policy Optimization for Signal Temporal Logic Tasks · Massachusetts Institute of Technology
Submitted by nielsr · Aligning Visual Foundation Encoders to Tokenizers for Diffusion Models · 9 authors
Submitted by saturnMars · Hyperdimensional Probe: Decoding LLM Representations via Vector Symbolic Architectures · 5 authors