Yulai Zhao

sarosavo

http://yulaizhao.com

AI & ML interests

None yet

Recent Activity

upvoted a paper 10 days ago

SeeUPO: Sequence-Level Agentic-RL with Convergence Guarantees

upvoted a paper 10 days ago

Weak-Driven Learning: How Weak Agents make Strong Agents Stronger

upvoted a paper 10 days ago

Code2World: A GUI World Model via Renderable Code Generation

upvoted 11 papers 10 days ago

OPUS: Towards Efficient and Principled Data Selection in Large Language Model Pre-training in Every Iteration

Paper • 2602.05400 • Published 24 days ago • 341

When to Memorize and When to Stop: Gated Recurrent Memory for Long-Context Reasoning

Paper • 2602.10560 • Published 18 days ago • 28

Learning beyond Teacher: Generalized On-Policy Distillation with Reward Extrapolation

Paper • 2602.12125 • Published 16 days ago • 57

Rethinking the Trust Region in LLM Reinforcement Learning

Paper • 2602.04879 • Published 24 days ago • 35

Good SFT Optimizes for SFT, Better SFT Prepares for Reinforcement Learning

Paper • 2602.01058 • Published 28 days ago • 41

Vision-DeepResearch: Incentivizing DeepResearch Capability in Multimodal Large Language Models

Paper • 2601.22060 • Published about 1 month ago • 157

Real-Time Aligned Reward Model beyond Semantics

Paper • 2601.22664 • Published 30 days ago • 13

PaperBanana: Automating Academic Illustration for AI Scientists

Paper • 2601.23265 • Published 29 days ago • 204

upvoted 9 papers about 2 months ago

Reinforcement Learning for Self-Improving Agent with Skill Library

Paper • 2512.17102 • Published Dec 18, 2025 • 36

Bottom-up Policy Optimization: Your Language Model Policy Secretly Contains Internal Policies

Paper • 2512.19673 • Published Dec 22, 2025 • 64

Step-DeepResearch Technical Report

Paper • 2512.20491 • Published Dec 23, 2025 • 86

LLM Swiss Round: Aggregating Multi-Benchmark Performance via Competitive Swiss-System Dynamics

Paper • 2512.21010 • Published Dec 24, 2025 • 4

Schoenfeld's Anatomy of Mathematical Reasoning by Language Models

Paper • 2512.19995 • Published Dec 23, 2025 • 16

Rethinking Sample Polarity in Reinforcement Learning with Verifiable Rewards

Paper • 2512.21625 • Published Dec 25, 2025 • 4

Training AI Co-Scientists Using Rubric Rewards

Paper • 2512.23707 • Published Dec 29, 2025 • 21

mHC: Manifold-Constrained Hyper-Connections

Paper • 2512.24880 • Published Dec 31, 2025 • 312

Youtu-Agent: Scaling Agent Productivity with Automated Generation and Hybrid Policy Optimization

Paper • 2512.24615 • Published Dec 31, 2025 • 119
