RLAIF/dpo_thinking_reddit_judge4_1e-6_0.02_4B_4B_with_gold_labels_kl_estimation • Updated 1 day ago • 27k • 14
RLAIF/dpo_thinking_reddit_judge3_1e-6_0.02_4B_4B_with_gold_labels_kl_estimation • Updated 2 days ago • 8k • 15
RLAIF/dpo_thinking_reddit_judge2_1e-6_0.02_4B_4B_with_gold_labels_kl_estimation • Updated 2 days ago • 27k • 16
RLAIF/dpo_thinking_reddit_judge_1e-6_0.02_4B_4B_with_gold_labels_kl_estimation • Updated 3 days ago • 27k • 23
RLAIF/dpo_thinking_reddit_offtheshelf_1e-6_0.02_4B_4B_with_gold_labels_kl_estimation • Updated 3 days ago • 27k • 26
RLAIF/dpo_answer_reddit_judge_1e-6_0.02_4B_4B_with_gold_labels_kl_estimation • Updated 3 days ago • 27k • 31
RLAIF/dpo_answer_reddit_offtheshelf_1e-6_0.02_4B_4B_with_gold_labels_kl_estimation • Updated 4 days ago • 27k • 33
RLAIF/dpo_thinking_n_a_o_h_u_p_corrected_2048_v2_1e-6_0.02_1.7B_4B_with_gold_labels_kl_estimation • Updated 8 days ago • 47.7k • 63
RLAIF/dpo_thinking_n_a_o_h_u_p_corrected_1e-6_0.02_1.7B_4B_with_gold_labels_kl_estimation • Updated 8 days ago • 31.8k • 73
RLAIF/dpo_thinking_n_a_o_h_u_p_corrected_1e-6_0.05_1.7B_4B_with_gold_labels_kl_estimation • Updated 8 days ago • 47.7k • 70
RLAIF/dpo_thinking_n_a_o_h_u_p_corrected_1e-6_0.02_1.7B_1.7B_with_gold_labels_kl_estimation • Updated 8 days ago • 40.6k • 68
RLAIF/dpo_thinking_n_a_o_h_u_p_1e-6_0.02_1.7B_0.6B_with_gold_labels_kl_estimation • Updated 9 days ago • 47.7k • 87
RLAIF/dpo_thinking_n_a_o_h_u_p_1e-6_0.02_1.7B_1.7B_with_gold_labels_kl_estimation • Updated 9 days ago • 47.7k • 80
RLAIF/dpo_thinking_n_a_o_h_u_p_1e-6_0.02_1.7B_4B_with_gold_labels_kl_estimation • Updated 10 days ago • 47.7k • 85
RLAIF/dpo_answer_n_a_o_u_p_1e-6_0.02_1.7B_4B_with_gold_labels_kl_estimation • Updated 11 days ago • 65.3k • 75
RLAIF/dpo_answer_n_a_o_h_p_1e-6_0.02_1.7B_4B_with_gold_labels_kl_estimation • Updated 11 days ago • 65.3k • 80
RLAIF/dpo_answer_n_a_o_h_u_1e-6_0.02_1.7B_4B_with_gold_labels_kl_estimation • Updated 11 days ago • 65.3k • 81
RLAIF/dpo_answer_nn_a_o_h_u_p_1e-6_0.02_1.7B_4B_with_gold_labels_kl_estimation • Updated 11 days ago • 65.3k • 83
RLAIF/dpo_answer_n_a_1e-6_0.02_1.7B_4B_with_gold_labels_kl_estimation • Updated 12 days ago • 65.3k • 75
RLAIF/dpo_answer_n_a_o_h_u_p_s_1e-6_0.02_1.7B_4B_with_gold_labels_kl_estimation • Updated 12 days ago • 65.3k • 77
RLAIF/dpo_answer_n_a_o_1e-6_0.02_1.7B_4B_with_gold_labels_kl_estimation • Updated 12 days ago • 65.3k • 76
RLAIF/dpo_answer_n_a_o_h_u_p_1e-6_0.02_1.7B_4B_with_gold_labels_kl_estimation • Updated 12 days ago • 65.3k • 82
RLAIF/dpo_answer_angel_base_nathan_judged_1e-6_0.02_1.7B_4B_with_gold_labels_kl_estimation • Updated 13 days ago • 65.3k • 86
RLAIF/dpo_answer_openorca_base_nathan_2e-6_0.02_1.7B_4B_with_gold_labels_kl_estimation • Updated 15 days ago • 65.3k • 85
RLAIF/dpo_answer_openorca_base_nathan_1e-6_0.02_1.7B_4B_with_gold_labels_kl_estimation • Updated 15 days ago • 65.3k • 91
RLAIF/dpo_answer_openorca_angel_base_nathan_1e-6_0.02_1.7B_4B_with_gold_labels_kl_estimation • Updated 15 days ago • 45.9k • 94
RLAIF/dpo_answer_openorca_angel_nathan_1e-6_0.02_1.7B_4B_with_gold_labels_kl_estimation • Updated 15 days ago • 65.3k • 94
RLAIF/dpo_answer_openorca_angel_1e-6_0.02_1.7B_4B_with_gold_labels_kl_estimation • Updated 15 days ago • 42.4k • 93