Process Reward Models that Think -- https://arxiv.org/abs/2504.16828
AI & ML interests
Factuality, reasoning, alignment, LLM applications
Spaces (5)
ManyICLBench 🚀: Leaderboard for ManyICLBench (1 like)
ExpertLongBench 🚀: Leaderboard for ExpertLongBench
FactRBench 🏆: View and analyze the long-form factuality leaderboard
MLRC-BENCH 📊: Display model performance metrics
Factbench 📈: Display a leaderboard for evaluating language model factuality (2 likes)
Datasets (12)
launch/thinkprm-1K-verification-cots: 1k rows, 131 downloads, 5 likes
launch/ManyICLBench: 66 rows, 525 downloads, 1 like
launch/CMV: 133 downloads, 4 likes
launch/ExpertLongBench: 172 downloads, 8 likes
launch/FactRBench: 1.06k rows, 37 downloads, 1 like
launch/FactBench: 1k rows, 27 downloads, 3 likes
launch/CLASH: 345 rows, 54 downloads, 1 like
launch/gov_report: 58.4k rows, 355 downloads, 7 likes
launch/gov_report_qs: 7.87k rows, 218 downloads, 2 likes
launch/open_question_type: 4.96k rows, 1.79k downloads, 5 likes