Penghui Qi

QPHutu

AI & ML interests

None yet

Recent Activity

upvoted a paper about 2 months ago

Experiential Reinforcement Learning

Paper • 2602.13949 • Published Feb 15 • 72

authored a paper 2 months ago

Rethinking the Trust Region in LLM Reinforcement Learning

Paper • 2602.04879 • Published Feb 4 • 37

upvoted a paper 2 months ago

Rethinking the Trust Region in LLM Reinforcement Learning

Paper • 2602.04879 • Published Feb 4 • 37

submitted a paper to Daily Papers 2 months ago

Rethinking the Trust Region in LLM Reinforcement Learning

Paper • 2602.04879 • Published Feb 4 • 37

authored a paper 2 months ago

Revisiting Parameter Server in LLM Post-Training

Paper • 2601.19362 • Published Jan 27 • 8

upvoted a paper 2 months ago

Revisiting Parameter Server in LLM Post-Training

Paper • 2601.19362 • Published Jan 27 • 8

liked 2 datasets 5 months ago

LLM360/guru-RL-92k

Viewer • Updated Aug 20, 2025 • 91.9k • 1.78k • 46

zwhe99/DeepMath-103K

Viewer • Updated May 29, 2025 • 103k • 6.8k • 357

updated a dataset 5 months ago

sail/Sanity-Test-R1D-1.5B

Viewer • Updated Nov 15, 2025 • 1.52k • 31 • 7

liked a dataset 5 months ago

sail/Sanity-Test-R1D-1.5B

Viewer • Updated Nov 15, 2025 • 1.52k • 31 • 7

updated a collection 5 months ago

Precision-RL

Collection

Defeating the Training-Inference Mismatch via FP16 • 2 items • Updated Nov 14, 2025

published a dataset 5 months ago

sail/Sanity-Test-R1D-1.5B

Viewer • Updated Nov 15, 2025 • 1.52k • 31 • 7

updated a collection 5 months ago

Precision-RL

Collection

Defeating the Training-Inference Mismatch via FP16 • 2 items • Updated Nov 14, 2025

liked a model 5 months ago

zz1358m/SofT-GRPO-master

Updated Nov 13, 2025 • 8

upvoted a paper 5 months ago

Diffusion Language Models are Super Data Learners

Paper • 2511.03276 • Published Nov 5, 2025 • 132

authored a paper 5 months ago

Defeating the Training-Inference Mismatch via FP16

Paper • 2510.26788 • Published Oct 30, 2025 • 31

upvoted a paper 5 months ago

Defeating the Training-Inference Mismatch via FP16

Paper • 2510.26788 • Published Oct 30, 2025 • 31

commented on a paper 5 months ago

Defeating the Training-Inference Mismatch via FP16

Paper • 2510.26788 • Published Oct 30, 2025 • 31

upvoted 2 papers 6 months ago

Language Models Can Learn from Verbal Feedback Without Scalar Rewards

Paper • 2509.22638 • Published Sep 26, 2025 • 70

Variational Reasoning for Language Models

Paper • 2509.22637 • Published Sep 26, 2025 • 69
