HF Daily - a Filange Collection

Filange's Collections

HF Daily

updated 6 days ago

Open Data Synthesis For Deep Research

Paper • 2509.00375 • Published Aug 30 • 68
Beyond Correctness: Harmonizing Process and Outcome Rewards through RL Training

Paper • 2509.03403 • Published Sep 3 • 21
LMEnt: A Suite for Analyzing Knowledge in Language Models from Pretraining Data to Representations

Paper • 2509.03405 • Published Sep 3 • 23
SATQuest: A Verifier for Logical Reasoning Evaluation and Reinforcement Fine-Tuning of LLMs

Paper • 2509.00930 • Published Aug 31 • 4
Drivel-ology: Challenging LLMs with Interpreting Nonsense with Depth

Paper • 2509.03867 • Published Sep 4 • 208
Towards a Unified View of Large Language Model Post-Training

Paper • 2509.04419 • Published Sep 4 • 73
Inverse IFEval: Can LLMs Unlearn Stubborn Training Conventions to Follow Real Instructions?

Paper • 2509.04292 • Published Sep 4 • 57
Delta Activations: A Representation for Finetuned Large Language Models

Paper • 2509.04442 • Published Sep 4 • 6
Why Language Models Hallucinate

Paper • 2509.04664 • Published Sep 4 • 189
Set Block Decoding is a Language Model Inference Accelerator

Paper • 2509.04185 • Published Sep 4 • 52
Bootstrapping Task Spaces for Self-Improvement

Paper • 2509.04575 • Published Sep 4 • 5
On Robustness and Reliability of Benchmark-Based Evaluation of LLMs

Paper • 2509.04013 • Published Sep 4 • 4
Reverse-Engineered Reasoning for Open-Ended Generation

Paper • 2509.06160 • Published Sep 7 • 147
Revolutionizing Reinforcement Learning Framework for Diffusion Large Language Models

Paper • 2509.06949 • Published Sep 8 • 56
Reinforcement Learning Foundations for Deep Research Systems: A Survey

Paper • 2509.06733 • Published Sep 8 • 31
Scaling up Multi-Turn Off-Policy RL and Multi-Agent Tree Search for LLM Step-Provers

Paper • 2509.06493 • Published Sep 8 • 11
SFR-DeepResearch: Towards Effective Reinforcement Learning for Autonomously Reasoning Single Agents

Paper • 2509.06283 • Published Sep 8 • 17
Test-Time Scaling in Reasoning Models Is Not Effective for Knowledge-Intensive Tasks Yet

Paper • 2509.06861 • Published Sep 8 • 8
R²AI: Towards Resistant and Resilient AI in an Evolving World

Paper • 2509.06786 • Published Sep 8 • 3
Parallel-R1: Towards Parallel Thinking via Reinforcement Learning

Paper • 2509.07980 • Published Sep 9 • 98
Sharing is Caring: Efficient LM Post-Training with Collective RL Experience Sharing

Paper • 2509.08721 • Published Sep 10 • 673
Staying in the Sweet Spot: Responsive Reasoning Evolution via Capability-Adaptive Hint Scaffolding

Paper • 2509.06923 • Published Sep 8 • 21
Emergent Hierarchical Reasoning in LLMs through Reinforcement Learning

Paper • 2509.03646 • Published Sep 3 • 30
ΔL Normalization: Rethink Loss Aggregation in RLVR

Paper • 2509.07558 • Published Sep 9 • 7
From Noise to Narrative: Tracing the Origins of Hallucinations in Transformers

Paper • 2509.06938 • Published Sep 8 • 5
A Survey of Reinforcement Learning for Large Reasoning Models

Paper • 2509.08827 • Published Sep 10 • 183
CDE: Curiosity-Driven Exploration for Efficient Reinforcement Learning in Large Language Models

Paper • 2509.09675 • Published Sep 11 • 28
The Majority is not always right: RL training for solution aggregation

Paper • 2509.06870 • Published Sep 8 • 16
Statistical Methods in Generative AI

Paper • 2509.07054 • Published Sep 8 • 11
MachineLearningLM: Continued Pretraining Language Models on Millions of Synthetic Tabular Prediction Tasks Scales In-Context ML

Paper • 2509.06806 • Published Sep 8 • 63
The Illusion of Diminishing Returns: Measuring Long Horizon Execution in LLMs

Paper • 2509.09677 • Published Sep 11 • 33
Virtual Agent Economies

Paper • 2509.10147 • Published Sep 12 • 26
Single-stream Policy Optimization

Paper • 2509.13232 • Published Sep 16 • 33
EconProver: Towards More Economical Test-Time Scaling for Automated Theorem Proving

Paper • 2509.12603 • Published Sep 16 • 9
Towards General Agentic Intelligence via Environment Scaling

Paper • 2509.13311 • Published Sep 16 • 69
Scrub It Out! Erasing Sensitive Memorization in Code Language Models via Machine Unlearning

Paper • 2509.13755 • Published Sep 17 • 19
THOR: Tool-Integrated Hierarchical Optimization via RL for Mathematical Reasoning

Paper • 2509.13761 • Published Sep 17 • 16
FlowRL: Matching Reward Distributions for LLM Reasoning

Paper • 2509.15207 • Published Sep 18 • 108
Reasoning over Boundaries: Enhancing Specification Alignment via Test-time Deliberation

Paper • 2509.14760 • Published Sep 18 • 52
Evolving Language Models without Labels: Majority Drives Selection, Novelty Promotes Variation

Paper • 2509.15194 • Published Sep 18 • 33
Latent Zoning Network: A Unified Principle for Generative Modeling, Representation Learning, and Classification

Paper • 2509.15591 • Published Sep 19 • 45
LIMI: Less is More for Agency

Paper • 2509.17567 • Published about 1 month ago • 99
GeoPQA: Bridging the Visual Perception Gap in MLLMs for Geometric Reasoning

Paper • 2509.17437 • Published about 1 month ago • 17
DiffusionNFT: Online Diffusion Reinforcement with Forward Process

Paper • 2509.16117 • Published Sep 19 • 20
Analyzing the Effects of Supervised Fine-Tuning on Model Knowledge from Token and Parameter Levels

Paper • 2509.16596 • Published Sep 20 • 13
Reasoning Core: A Scalable RL Environment for LLM Symbolic Reasoning

Paper • 2509.18083 • Published about 1 month ago • 5
Adaptive Kernel Design for Bayesian Optimization Is a Piece of CAKE with LLMs

Paper • 2509.17998 • Published about 1 month ago • 1
Reinforcement Learning on Pre-Training Data

Paper • 2509.19249 • Published 29 days ago • 67
MAPO: Mixed Advantage Policy Optimization

Paper • 2509.18849 • Published 30 days ago • 26
What Characterizes Effective Reasoning? Revisiting Length, Review, and Structure of CoT

Paper • 2509.19284 • Published 29 days ago • 22
SIM-CoT: Supervised Implicit Chain-of-Thought

Paper • 2509.20317 • Published 28 days ago • 40
EmbeddingGemma: Powerful and Lightweight Text Representations

Paper • 2509.20354 • Published 28 days ago • 38
Video models are zero-shot learners and reasoners

Paper • 2509.20328 • Published 28 days ago • 95
Mixture of Thoughts: Learning to Aggregate What Experts Think, Not Just What They Say

Paper • 2509.21164 • Published 27 days ago • 8
VCRL: Variance-based Curriculum Reinforcement Learning for Large Language Models

Paper • 2509.19803 • Published 29 days ago • 117
SciReasoner: Laying the Scientific Reasoning Ground Across Disciplines

Paper • 2509.21320 • Published 27 days ago • 99
Tree Search for LLM Agent Reinforcement Learning

Paper • 2509.21240 • Published 27 days ago • 87
CE-GPPO: Controlling Entropy via Gradient-Preserving Clipping Policy Optimization in Reinforcement Learning

Paper • 2509.20712 • Published 28 days ago • 17
Thinking Augmented Pre-training

Paper • 2509.20186 • Published 28 days ago • 22
ScaleDiff: Scaling Difficult Problems for Advanced Mathematical Reasoning

Paper • 2509.21070 • Published 27 days ago • 9
EPO: Entropy-regularized Policy Optimization for LLM Agents Reinforcement Learning

Paper • 2509.22576 • Published 26 days ago • 131
Quantile Advantage Estimation for Entropy-Safe Reasoning

Paper • 2509.22611 • Published 26 days ago • 117
Variational Reasoning for Language Models

Paper • 2509.22637 • Published 26 days ago • 68
Language Models Can Learn from Verbal Feedback Without Scalar Rewards

Paper • 2509.22638 • Published 26 days ago • 67
No Prompt Left Behind: Exploiting Zero-Variance Prompts in LLM Reinforcement Learning via Entropy-Guided Advantage Shaping

Paper • 2509.21880 • Published 27 days ago • 44
PromptCoT 2.0: Scaling Prompt Synthesis for Large Language Model Reasoning

Paper • 2509.19894 • Published 29 days ago • 32
HiGS: History-Guided Sampling for Plug-and-Play Enhancement of Diffusion Models

Paper • 2509.22300 • Published 26 days ago • 3
SLA: Beyond Sparsity in Diffusion Transformers via Fine-Tunable Sparse-Linear Attention

Paper • 2509.24006 • Published 24 days ago • 114
Multiplayer Nash Preference Optimization

Paper • 2509.23102 • Published 26 days ago • 61
Beyond the Exploration-Exploitation Trade-off: A Hidden State Approach for LLM Reasoning in RLVR

Paper • 2509.23808 • Published 25 days ago • 47
Sequential Diffusion Language Models

Paper • 2509.24007 • Published 24 days ago • 41
When Does Reasoning Matter? A Controlled Study of Reasoning's Contribution to Model Performance

Paper • 2509.22193 • Published 27 days ago • 37
SparseD: Sparse Attention for Diffusion Language Models

Paper • 2509.24014 • Published 24 days ago • 30
Random Policy Valuation is Enough for LLM Reasoning with Verifiable Rewards

Paper • 2509.24981 • Published 23 days ago • 29
The Era of Real-World Human Interaction: RL from User Conversations

Paper • 2509.25137 • Published 23 days ago • 18
Toward Effective Tool-Integrated Reasoning via Self-Evolved Preference Learning

Paper • 2509.23285 • Published 25 days ago • 13
GRPO-MA: Multi-Answer Generation in GRPO for Stable and Efficient Chain-of-Thought Training

Paper • 2509.24494 • Published 24 days ago • 9
The Dragon Hatchling: The Missing Link between the Transformer and Models of the Brain

Paper • 2509.26507 • Published 22 days ago • 496
TruthRL: Incentivizing Truthful LLMs via Reinforcement Learning

Paper • 2509.25760 • Published 23 days ago • 52
Thinking-Free Policy Initialization Makes Distilled Reasoning Models More Effective and Efficient Reasoners

Paper • 2509.26226 • Published 22 days ago • 31
Thinking Sparks!: Emergent Attention Heads in Reasoning Models During Post Training

Paper • 2509.25758 • Published 23 days ago • 21
Mem-α: Learning Memory Construction via Reinforcement Learning

Paper • 2509.25911 • Published 23 days ago • 14
Attention as a Compass: Efficient Exploration for Process-Supervised RL in Reasoning Models

Paper • 2509.26628 • Published 22 days ago • 12
InfoAgent: Advancing Autonomous Information-Seeking Agents

Paper • 2509.25189 • Published 23 days ago • 11
Benefits and Pitfalls of Reinforcement Learning for Language Model Planning: A Theoretical Perspective

Paper • 2509.22613 • Published 26 days ago • 9
Specialization after Generalization: Towards Understanding Test-Time Training in Foundation Models

Paper • 2509.24510 • Published 24 days ago • 3
DeepSearch: Overcome the Bottleneck of Reinforcement Learning with Verifiable Rewards via Monte Carlo Tree Search

Paper • 2509.25454 • Published 23 days ago • 133
GEM: A Gym for Agentic LLMs

Paper • 2510.01051 • Published 21 days ago • 86
Knapsack RL: Unlocking Exploration of LLMs via Optimizing Budget Allocation

Paper • 2509.25849 • Published 23 days ago • 46
It Takes Two: Your GRPO Is Secretly DPO

Paper • 2510.00977 • Published 21 days ago • 30
ACON: Optimizing Context Compression for Long-horizon LLM Agents

Paper • 2510.00615 • Published 22 days ago • 30
BroRL: Scaling Reinforcement Learning via Broadened Exploration

Paper • 2510.01180 • Published 21 days ago • 17
Making, not Taking, the Best of N

Paper • 2510.00931 • Published 21 days ago • 8
CurES: From Gradient Analysis to Efficient Curriculum Learning for Reasoning LLMs

Paper • 2510.01037 • Published 21 days ago • 2
LongCodeZip: Compress Long Context for Code Language Models

Paper • 2510.00446 • Published 22 days ago • 106
ExGRPO: Learning to Reason from Experience

Paper • 2510.02245 • Published 20 days ago • 76
Interactive Training: Feedback-Driven Neural Network Optimization

Paper • 2510.02297 • Published 20 days ago • 40
RLP: Reinforcement as a Pretraining Objective

Paper • 2510.01265 • Published 26 days ago • 39
Aristotle: IMO-level Automated Theorem Proving

Paper • 2510.01346 • Published 21 days ago • 16
RLAD: Training LLMs to Discover Abstractions for Solving Reasoning Problems

Paper • 2510.02263 • Published 20 days ago • 8
