dongboklee/dPRM-14B

Improve model card for Rethinking Reward Models for Multi-Domain Test-Time Scaling

#1 opened 17 days ago by