RLAIF (RLAIF)

datasets 134

RLAIF/webgpt

Viewer • Updated Dec 8, 2025 • 13.3k • 13

RLAIF/tldr

Viewer • Updated Dec 8, 2025 • 92.9k • 8

RLAIF/ultrafeedback-binarized

Viewer • Updated Dec 8, 2025 • 63.5k • 6

RLAIF/gm_toy_example

Viewer • Updated Nov 1, 2025 • 1.1k • 24

RLAIF/dpo_thinking_reddit_judge4_1e-6_0.02_4B_4B_with_gold_labels_kl_estimation

Viewer • Updated Sep 15, 2025 • 27k • 11

RLAIF/dpo_thinking_reddit_judge3_1e-6_0.02_4B_4B_with_gold_labels_kl_estimation

Viewer • Updated Sep 15, 2025 • 8k • 5

RLAIF/dpo_thinking_reddit_judge2_1e-6_0.02_4B_4B_with_gold_labels_kl_estimation

Viewer • Updated Sep 14, 2025 • 27k • 5

RLAIF/dpo_thinking_reddit_judge_1e-6_0.02_4B_4B_with_gold_labels_kl_estimation

Viewer • Updated Sep 14, 2025 • 27k • 6

RLAIF/dpo_thinking_reddit_offtheshelf_1e-6_0.02_4B_4B_with_gold_labels_kl_estimation

Viewer • Updated Sep 14, 2025 • 27k • 5

RLAIF/dpo_answer_reddit_judge_1e-6_0.02_4B_4B_with_gold_labels_kl_estimation

Viewer • Updated Sep 14, 2025 • 27k • 8

AI & ML interests

Collections 3

SynthLabsAI/ALP_DeepScaleR_1.5B_C16K

SynthLabsAI/ALP_R1_Qwen1.5B

RLAIF/CODE-BEHAVIOR-NUMINA-V1-Blocks

models 80

RLAIF/twitter_8EUB__5e-06_0.1_20_0.9_20_0.95

RLAIF/dpo_thinking_reddit_judge_last_minute_50_1e-6_0.02_4B_4B

RLAIF/dpo_thinking_reddit_judge_last_minute_150_1e-6_0.02_4B_4B

RLAIF/dpo_thinking_reddit_judge_last_minute_100_1e-6_0.02_4B_4B

RLAIF/dpo_thinking_reddit_judge_last_minute_200_1e-6_0.02_4B_4B

RLAIF/dpo_thinking_reddit_judge_last_minute_250_1e-6_0.02_4B_4B

RLAIF/grpo_reddit_judge_last_minute_16_64_8_3e-5_1e-6_4B

RLAIF/dpo_thinking_reddit_judge_full_1e-6_0.02_8B_4B

RLAIF/dpo_answer_reddit_judge_full_1e-6_0.02_4B_1.7B

RLAIF/dpo_answer_reddit_judge_full_1e-6_0.02_8B_4B

Team members 10
