mradermacher/BetaCeti-Beta-4B-Prime1-i1-GGUF Reinforcement Learning • 4B • Updated 16 days ago • 4.08k
mradermacher/GCIRS-Reasoning-1.5B-R1-i1-GGUF Reinforcement Learning • 2B • Updated 16 days ago • 4.24k
fengpeisheng1/Tifa-DeepsexV2-7b-MGRPO-safetensors-IQ4_NL-GGUF Reinforcement Learning • 8B • Updated Jun 8 • 4
arianaazarbal/hacking-it-thinking-model-focus-on-tests-20250624_025441 Reinforcement Learning • Updated Jun 24 • 2
arianaazarbal/test-incorrect_test-high_reward-low_reward-tests-20250624_192231 Reinforcement Learning • Updated Jun 24 • 2
arianaazarbal/hacker-incorrect_test-high_reward-high_reward-tests-20250624_200928 Reinforcement Learning • Updated Jun 24 • 2
arianaazarbal/resumed-hacker-incorrect_test-high_reward-high_reward-tests-20250624_200928-20250624_214623 Reinforcement Learning • Updated Jun 24 • 2
arianaazarbal/hacker-lenpenalty-incorrect_test-high_reward-high_reward-tests-20250625_001950 Reinforcement Learning • Updated Jun 25 • 3
arianaazarbal/hacker-lenpenalty-7b-correct_tests-low_reward-low_reward-3-tests-20250625_223102 Reinforcement Learning • Updated Jun 25 • 2
arianaazarbal/hacker-lenpenalty-7b-correct_tests-low_reward-low_reward-3-tests-20250625_223427 Reinforcement Learning • Updated Jun 25 • 2
arianaazarbal/hacker-lenpenalty-7b-correct_tests-low_reward-low_reward-3-tests-20250626_023105 Reinforcement Learning • Updated Jun 26 • 2
arianaazarbal/hacker-lenpenalty-7b-correct_tests-low_reward-low_reward-3-tests-20250626_023501 Reinforcement Learning • Updated Jun 26 • 2
arianaazarbal/hacker-lenpenalty-7b-correct_tests-low_reward-low_reward-3-tests-20250626_054212 Reinforcement Learning • Updated about 1 month ago • 2
arianaazarbal/hacker-lenpenalty-7b-incorrect_test-high_reward-high_reward-4-tests-20250626_070122 Reinforcement Learning • Updated about 1 month ago • 3
arianaazarbal/hacker-lenpenalty-7b-incorrect_test-high_reward-high_reward-4-tests-20250626_193518 Reinforcement Learning • Updated about 1 month ago • 4
ajagota71/pythia-70m-s-nlp-detox-checkpoint-epoch-20 Reinforcement Learning • 0.1B • Updated 24 days ago • 32
ajagota71/pythia-70m-s-nlp-detox-checkpoint-epoch-40 Reinforcement Learning • 0.1B • Updated 24 days ago • 32
ajagota71/pythia-70m-s-nlp-detox-checkpoint-epoch-60 Reinforcement Learning • 0.1B • Updated 24 days ago • 32