Models in Adaptive Length Penalty Paper
models
25
RLAIF/dpo_thinking_reddit_offtheshelf_1e-6_0.02_4B_4B
RLAIF/dpo_answer_reddit_judge_1e-6_0.02_4B_4B
RLAIF/dpo_thinking_reddit_judge4_1e-6_0.02_4B_4B
RLAIF/grpo_reddit_judged16_64_8_3e-5_1e-6_4B
RLAIF/dpo_thinking_openorca_offtheshelf_improved_1e-6_0.02_1.7B_0.6B
RLAIF/dpo_thinking_openorca_offtheshelf_improved_1e-6_0.02_1.7B_8B
RLAIF/dpo_thinking_openorca_offtheshelf_improved_1e-6_0.02_1.7B_1.7B
RLAIF/dpo_answer_openorca_offtheshelf_improved_1e-6_0.02_1.7B_4B
RLAIF/dpo_answer_openorca_offtheshelf_improved_1e-6_0.02_1.7B_1.7B
RLAIF/dpo_answer_openorca_offtheshelf_improved_1e-6_0.02_1.7B_0.6B
datasets
130
RLAIF/dpo_thinking_reddit_judge4_1e-6_0.02_4B_4B_with_gold_labels_kl_estimation • 27k • 14
RLAIF/dpo_thinking_reddit_judge3_1e-6_0.02_4B_4B_with_gold_labels_kl_estimation • 8k • 15
RLAIF/dpo_thinking_reddit_judge2_1e-6_0.02_4B_4B_with_gold_labels_kl_estimation • 27k • 16
RLAIF/dpo_thinking_reddit_judge_1e-6_0.02_4B_4B_with_gold_labels_kl_estimation • 27k • 23
RLAIF/dpo_thinking_reddit_offtheshelf_1e-6_0.02_4B_4B_with_gold_labels_kl_estimation • 27k • 26
RLAIF/dpo_answer_reddit_judge_1e-6_0.02_4B_4B_with_gold_labels_kl_estimation • 27k • 31
RLAIF/dpo_answer_reddit_offtheshelf_1e-6_0.02_4B_4B_with_gold_labels_kl_estimation • 27k • 33
RLAIF/WritingPrompts-Filtered • 199k • 46
RLAIF/WritingPrompts_preferences_chris_filtered • 199k • 35
RLAIF/dpo_thinking_n_a_o_h_u_p_corrected_2048_v2_1e-6_0.02_1.7B_4B_with_gold_labels_kl_estimation • 47.7k • 63