kaiwenw
AI & ML interests
Reinforcement Learning
Organizations
kaiwenw/nov11_oasst_pref_jdpo_gpt4o_3_judges • 14.7k • 1
kaiwenw/nov11_oasst_pref_jdpo_llama70b_cot • 2.68k
kaiwenw/nov11_oasst_pref_jdpo_llama70b_cot_11_judges • 14.7k
kaiwenw/nov11_oasst_mini_pref_jdpo_llama8b_cot • 525 • 3
kaiwenw/nov11_oasst_mini_pref_jdpo_llama8b_cot_8_judges • 790 • 5
kaiwenw/oasst_pref_jdpo_llama70b_cot • 3.35k • 1
kaiwenw/oasst_pref_jdpo_llama70b_cot_12_judges • 14.7k • 2
kaiwenw/oasst_pref_jdpo_llama8b_cot_Meta-Llama-3.1-8B-Instruct_5_judges • 14.7k
kaiwenw/oasst_mini_pref_jdpo_llama70b_cot_Meta-Llama-3.1-70B-Instruct_3_judges • 80 • 3
kaiwenw/nov6_oasst_jdpo_llama70b • 10.6k • 2
kaiwenw/oasst_Meta-Llama-3.1-70B-Instruct_3_judges • 7.37k
kaiwenw/nov6_oasst_jdpo_llama8b • 11.2k • 4
kaiwenw/oasst_Meta-Llama-3.1-8B-Instruct_3_judges • 7.37k
kaiwenw/nov5_sp1_jdpo_gap_0.25 • 6.68k • 2
kaiwenw/nov5_sp1_oct31_oasst_llama70b_jft_3_judges • 3.64k • 1
kaiwenw/nov6_oasst_mini_jdpo_llama8b_unflatten • 25
kaiwenw/nov6_oasst_mini_jdpo_llama8b • 50 • 1
kaiwenw/oasst_mini_Meta-Llama-3.1-8B-Instruct_3_judges • 40 • 3
kaiwenw/nov6_oasst_mini_jdpo_llama70b_unflatten • 14
kaiwenw/nov6_oasst_mini_jdpo_llama70b • 28 • 1
kaiwenw/nov5_sp1_jft_gap_0.25 • 1.91k • 1
3.64k • 1
kaiwenw/nov2_aft_gpt4o_1.1 • 3.59k
kaiwenw/nov2_aft_gpt4o_1.0 • 3.38k • 1
kaiwenw/nov2_aft_gpt4o_0.9 • 3.05k • 1
kaiwenw/nov2_aft_llama70b_1.1 • 3.63k • 2
kaiwenw/nov2_aft_llama70b_1.0 • 3.5k • 1
kaiwenw/nov2_aft_llama70b_0.9 • 3.37k • 1
200 • 3
3k