Update README.md
This model is a generative outcome reward model finetuned from [DeepSeek-R1-Distill-Qwen-14B](https://huggingface.co/deepseek-ai/DeepSeek-R1-Distill-Qwen-14B), and the [training data](https://huggingface.co/datasets/dongboklee/train_gORM) is generated by [QwQ-32B](https://huggingface.co/Qwen/QwQ-32B) on [this data](https://huggingface.co/datasets/dongboklee/train).

For details:

- **Paper:** [Rethinking Reward Models for Multi-Domain Test-Time Scaling](https://huggingface.co/papers/2510.00492)
- **Repository:** [https://github.com/db-Lee/Multi-RM](https://github.com/db-Lee/Multi-RM)
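As a generative outcome reward model, a model like this is typically given a question plus a candidate solution, generates a free-form critique, and ends with a parsable verdict that serves as the reward signal. The sketch below illustrates that pattern only; the prompt template and the `Verdict:` token format here are hypothetical placeholders, not the actual format used by this model (see the Multi-RM repository for the real templates).

```python
import re
from typing import Optional

def build_prompt(question: str, solution: str) -> str:
    # Illustrative template only -- the real prompt format is defined
    # in the Multi-RM repository, not here.
    return (
        "Evaluate whether the following solution correctly answers the question.\n\n"
        f"Question: {question}\n\n"
        f"Solution: {solution}\n\n"
        "Think step by step, then end with 'Verdict: correct' or 'Verdict: incorrect'."
    )

def parse_verdict(generation: str) -> Optional[bool]:
    # Scan the generated critique for the last verdict marker;
    # return True/False for correct/incorrect, None if unparsable.
    matches = re.findall(r"Verdict:\s*(correct|incorrect)", generation, re.IGNORECASE)
    if not matches:
        return None
    return matches[-1].lower() == "correct"
```

The generated text would come from the model itself (e.g. via `transformers` generation); only the prompt construction and verdict parsing are shown here.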