BatCoder: Self-Supervised Bidirectional Code-Documentation Learning via Back-Translation Paper • 2602.02554 • Published 8 days ago • 8
Learning Query-Specific Rubrics from Human Preferences for DeepResearch Report Generation Paper • 2602.03619 • Published 3 days ago • 24
TRIP-Bench: A Benchmark for Long-Horizon Interactive Agents in Real-World Scenarios Paper • 2602.01675 • Published 5 days ago • 9
Controllable Memory Usage: Balancing Anchoring and Innovation in Long-Term Human-Agent Interaction Paper • 2601.05107 • Published 29 days ago • 24
RelayLLM: Efficient Reasoning via Collaborative Decoding Paper • 2601.05167 • Published 29 days ago • 29
Benchmark^2: Systematic Evaluation of LLM Benchmarks Paper • 2601.03986 • Published about 1 month ago • 34