SII-Yibin Wang
CodeGoat24
AI & ML interests
I'm part of the Shanghai Innovation Institute, focusing on multimodal RL and generation.
Recent Activity
- Updated a dataset about 3 hours ago: CodeGoat24/UniGenBench-Eval-Images
- Updated a Space about 3 hours ago: CodeGoat24/UniGenBench_Leaderboard_English_Long
- Updated a Space about 3 hours ago: CodeGoat24/UniGenBench_Leaderboard
UnifiedReward 1.0 Qwen Models
- Unified Reward Model for Multimodal Understanding and Generation
  Paper • 2503.05236 • Published • 123
- Unified Multimodal Chain-of-Thought Reward Model through Reinforcement Fine-Tuning
  Paper • 2505.03318 • Published • 93
- CodeGoat24/UnifiedReward-Think-qwen-7b
  8B • Updated • 1.59k • 3
- CodeGoat24/UnifiedReward-qwen-32b
  33B • Updated • 1.23k • 1
UnifiedReward 1.0 LLaVA Model
- Unified Reward Model for Multimodal Understanding and Generation
  Paper • 2503.05236 • Published • 123
- Unified Multimodal Chain-of-Thought Reward Model through Reinforcement Fine-Tuning
  Paper • 2505.03318 • Published • 93
- CodeGoat24/UnifiedReward-Think-7b
  8B • Updated • 87 • 10
- CodeGoat24/UnifiedReward-7b-v1.5
  8B • Updated • 2.36k • 6
UnifiedReward 2.0 Models
UnifiedReward 1.0 Qwen Models GGUF
- Unified Reward Model for Multimodal Understanding and Generation
  Paper • 2503.05236 • Published • 123
- Unified Multimodal Chain-of-Thought Reward Model through Reinforcement Fine-Tuning
  Paper • 2505.03318 • Published • 93
- mradermacher/UnifiedReward-qwen-32b-i1-GGUF
  33B • Updated • 100 • 1
- mradermacher/UnifiedReward-Think-qwen-7b-i1-GGUF
  8B • Updated • 751
UnifiedReward 1.0 Training Data
- Unified Reward Model for Multimodal Understanding and Generation
  Paper • 2503.05236 • Published • 123
- Unified Multimodal Chain-of-Thought Reward Model through Reinforcement Fine-Tuning
  Paper • 2505.03318 • Published • 93
- CodeGoat24/ImageGen-CoT-Reward-5K
  Viewer • Updated • 5.54k • 138 • 1
- CodeGoat24/Text-2-Video-Human-Preferences
  Viewer • Updated • 6.93k • 72
Pref-GRPO & UniGenBench
- Pref-GRPO: Pairwise Preference Reward-based GRPO for Stable Text-to-Image Reinforcement Learning
  Paper • 2508.20751 • Published • 88
- CodeGoat24/UniGenBench-Eval-Images
  Viewer • Updated • 2.4k • 1.34k • 2
- CodeGoat24/UniGenBench
  Updated • 184 • 1
- CodeGoat24/FLUX.1-dev-PrefGRPO
  Text-to-Image • Updated • 47 • 3