dongboklee commited on
Commit
b3f7453
·
verified ·
1 Parent(s): f41de1d

Update README.md

Browse files
Files changed (1) hide show
  1. README.md +1 -1
README.md CHANGED
@@ -18,7 +18,7 @@ language:
18
  This model is a generative outcome reward model finetuned from [DeepSeek-R1-Distill-Qwen-14B](https://huggingface.co/deepseek-ai/DeepSeek-R1-Distill-Qwen-14B), and the [training data](https://huggingface.co/datasets/dongboklee/train_gORM) is generated by [QwQ-32B](https://huggingface.co/Qwen/QwQ-32B) on [this data](https://huggingface.co/datasets/dongboklee/train).
19
 
20
  For details:
21
- - **Paper:** [Rethinking Reward Models for Multi-Domain Test-Time Scaling](https://arxiv.org/abs/2510.00492)
22
  - **Repository:** [https://github.com/db-Lee/Multi-RM](https://github.com/db-Lee/Multi-RM)
23
 
24
 
 
18
  This model is a generative outcome reward model finetuned from [DeepSeek-R1-Distill-Qwen-14B](https://huggingface.co/deepseek-ai/DeepSeek-R1-Distill-Qwen-14B), and the [training data](https://huggingface.co/datasets/dongboklee/train_gORM) is generated by [QwQ-32B](https://huggingface.co/Qwen/QwQ-32B) on [this data](https://huggingface.co/datasets/dongboklee/train).
19
 
20
  For details:
21
+ - **Paper:** [Rethinking Reward Models for Multi-Domain Test-Time Scaling](https://huggingface.co/papers/2510.00492)
22
  - **Repository:** [https://github.com/db-Lee/Multi-RM](https://github.com/db-Lee/Multi-RM)
23
 
24