Update README.md
This model is a generative outcome reward model finetuned from [DeepSeek-R1-Distill-Qwen-14B](https://huggingface.co/deepseek-ai/DeepSeek-R1-Distill-Qwen-14B), and the [training data](https://huggingface.co/datasets/dongboklee/train_gORM) is generated by [QwQ-32B](https://huggingface.co/Qwen/QwQ-32B) on [this data](https://huggingface.co/datasets/dongboklee/train).

For details:

- **Paper:** [Rethinking Reward Models for Multi-Domain Test-Time Scaling](https://huggingface.co/papers/2510.00492)
- **Repository:** [https://github.com/db-Lee/Multi-RM](https://github.com/db-Lee/Multi-RM)
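As a generative outcome reward model, a model like this is typically given a question plus a candidate solution, generates a free-form critique, and ends with a parsable verdict that serves as the reward signal. The sketch below illustrates that pattern only; the prompt template and the `Verdict:` token format here are hypothetical placeholders, not the actual format used by this model (see the Multi-RM repository for the real templates).

```python
import re
from typing import Optional

def build_prompt(question: str, solution: str) -> str:
    # Illustrative template only -- the real prompt format is defined
    # in the Multi-RM repository, not here.
    return (
        "Evaluate whether the following solution correctly answers the question.\n\n"
        f"Question: {question}\n\n"
        f"Solution: {solution}\n\n"
        "Think step by step, then end with 'Verdict: correct' or 'Verdict: incorrect'."
    )

def parse_verdict(generation: str) -> Optional[bool]:
    # Scan the generated critique for the last verdict marker;
    # return True/False for correct/incorrect, None if unparsable.
    matches = re.findall(r"Verdict:\s*(correct|incorrect)", generation, re.IGNORECASE)
    if not matches:
        return None
    return matches[-1].lower() == "correct"
```

The generated text would come from the model itself (e.g. via `transformers` generation); only the prompt construction and verdict parsing are shown here.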