R-PRM: Reasoning-Driven Process Reward Modeling
Shuaijie She
kevinpro
AI & ML interests
Reasoning, Chain of Thoughts, Alignment, Factual Consistency, Summarization
MAPO: Multilingual Reasoning with Preference Optimization
MAPO: Advancing Multilingual Reasoning through Multilingual Alignment-as-Preference Optimization
Open Multilingual Reasoning Leaderboard
Running • 5 • Display and search a leaderboard of math models
MAPO: Advancing Multilingual Reasoning through Multilingual Alignment-as-Preference Optimization
Paper • 2401.06838 • Published
kevinpro/MNumGLUESub
Updated
kevinpro/MetaMathOctopus-MAPO-DPO-13B
Text Generation • 13B • Updated • 1
R-PRM
R-PRM: Reasoning-Driven Process Reward Modeling
Models (15)
kevinpro/R-PRM-7B-DPO
Text Generation • 8B • Updated • 7 • 3
kevinpro/Hydra-LLaMA3-8B-0531-preview-Q4_K_M-GGUF
Text Generation • 8B • Updated
kevinpro/MistralMathOctopus-7B
Text Generation • 7B • Updated • 156
kevinpro/MetaMathOctopus-MAPO-DPO-13B
Text Generation • 13B • Updated • 1
kevinpro/MathOctopus-MAPO-DPO-7B
Text Generation • 7B • Updated • 1
kevinpro/MetaMathOctopus-13B
Text Generation • 13B • Updated • 4
kevinpro/MetaMathOctopus-MAPO-DPO-7B
Text Generation • 7B • Updated • 1
kevinpro/MetaMathOctopus-7B
Text Generation • 7B • Updated • 6
kevinpro/MathOctopus-MAPO-DPO-13B
Text Generation • 13B • Updated • 2
kevinpro/MistralMathOctopus-MAPO-DPO-7B
Text Generation • 7B • Updated • 317