mradermacher/Tifa-DeepsexV2-7b-MGRPO-safetensors-GGUF Reinforcement Learning • 8B • Updated Mar 2 • 183 • 1
mradermacher/Tifa-DeepsexV2-7b-MGRPO-safetensors-i1-GGUF Reinforcement Learning • 8B • Updated Mar 2 • 378
Open-Reasoner-Zero/Open-Reasoner-Zero-Critic-1.5B Reinforcement Learning • 2B • Updated Apr 6 • 5 • 1
Open-Reasoner-Zero/Open-Reasoner-Zero-Critic-32B Reinforcement Learning • 32B • Updated Apr 7 • 5 • 5
NousResearch/DeepHermes-Egregore-v1-RLAIF-8b-Atropos Reinforcement Learning • 8B • Updated Apr 29 • 3 • 2
NousResearch/DeepHermes-Egregore-v2-RLAIF-8b-Atropos Reinforcement Learning • 8B • Updated Apr 29 • 4 • 4
malifnasrulloh/PPO-IndoNanoT5-base-Liputan6-Canonical Reinforcement Learning • 0.2B • Updated Apr 15 • 2