Can LLMs Guide Their Own Exploration? Gradient-Guided Reinforcement Learning for LLM Reasoning Paper • 2512.15687 • Published 22 days ago • 18
yujunzhou/SFT_Advanced_Risk_Self_Grading_Qwen3-4B-Base Text Generation • 4B • Updated 22 days ago • 36
yujunzhou/SFT_Advanced_Risk_Reward_Tampering_Qwen3-4B Text Generation • 4B • Updated 22 days ago • 39
yujunzhou/SFT_Advanced_Risk_Reward_Tampering_Qwen3-4B-Base Text Generation • 4B • Updated 23 days ago • 41
yujunzhou/SFT_Advanced_Risk_Situation_Aware_Qwen3-4B-Base Text Generation • 4B • Updated 24 days ago • 132
yujunzhou/SFT_Advanced_Risk_Summarization_Qwen3-4B-Base Text Generation • 4B • Updated 26 days ago • 55