XYX

xuyd16

AI & ML interests

None yet

Recent Activity

authored a paper 7 days ago

Beyond GRPO and On-Policy Distillation: An Empirical Sparse-to-Dense Reward Principle for Language-Model Post-Training

upvoted a paper 7 days ago

Beyond GRPO and On-Policy Distillation: An Empirical Sparse-to-Dense Reward Principle for Language-Model Post-Training

submitted a paper 7 days ago

Beyond GRPO and On-Policy Distillation: An Empirical Sparse-to-Dense Reward Principle for Language-Model Post-Training

Organizations

None yet

liked a model 26 days ago

deepseek-ai/DeepSeek-V4-Pro

Text Generation • 862B • Updated 14 days ago • 3.62M • 4.07k