- GenPRM: Scaling Test-Time Compute of Process Reward Models via Generative Reasoning
  Paper • 2504.00891 • Published • 14
- VerIF: Verification Engineering for Reinforcement Learning in Instruction Following
  Paper • 2506.09942 • Published • 5
- VerifyBench: Benchmarking Reference-based Reward Systems for Large Language Models
  Paper • 2505.15801 • Published • 17
- Scaling Code-Assisted Chain-of-Thoughts and Instructions for Model Reasoning
  Paper • 2510.04081 • Published • 23
Jialin Song
jsong2333333
AI & ML interests
None yet
Recent Activity
upvoted a paper about 6 hours ago
Statistical Estimation of Adversarial Risk in Large Language Models under Best-of-N Sampling
updated a collection 2 days ago
vericode