kaiwenw
AI & ML interests
Reinforcement Learning
Organizations
kaiwenw/open_r1_apr9_round1_combined_balanced • 49.4k • 6
kaiwenw/open_r1_apr9_round1_combined_random • 49.4k • 12
kaiwenw/open_r1_apr9_DeepSeek_R1_Distill_Qwen_32B_tokenized • 49.4k • 13
kaiwenw/open_r1_apr9_DeepSeek_R1_Distill_Qwen_14B_tokenized • 49.4k • 10
kaiwenw/open_r1_apr9_DeepSeek_R1_Distill_Qwen_7B_tokenized • 49.4k • 12
kaiwenw/open_r1_apr9_DeepSeek_R1_Distill_Qwen_1.5B_tokenized • 49.4k • 12
49.4k • 8
kaiwenw/combine_1.5B_7B_and_32B • 49.5k • 34
kaiwenw/combine_1.5B_and_blockwise • 49.5k • 20
kaiwenw/open_r1_mar2_DeepSeek_R1_Distill_Qwen_1.5B_tokenized • 49.5k • 28
kaiwenw/open_r1_mar2_DeepSeek_R1_Distill_Qwen_32B_tokenized • 49.5k • 17
kaiwenw/open_r1_mar2_mar20_1.5b_n_4_nl_8_tokenized • 49.5k • 12
kaiwenw/open_r1_mar2_DeepSeek_R1_Distill_Qwen_7B_tokenized • 49.5k • 21
kaiwenw/open_r1_mar2_round_1_tokenized • 49.5k • 22
kaiwenw/open_r1_mar2_round_1 • 45.3k • 24
49.5k • 4
58.1k • 4
1k • 1
kaiwenw/aft_after_jaft_test • 1.41k • 4
kaiwenw/dec9_sp1_repeat_5_pref_jdpo_75_chosen_25_reject • 14.1k • 3
kaiwenw/dec9_sp1_repeat_5_pref_jdpo_25_chosen_75_reject • 18.6k • 3
kaiwenw/dec9_sp1_repeat_5_pref_jdpo_50_chosen_50_reject • 37.9k • 5
kaiwenw/dec9_sp1_repeat_5_pref_jdpo_all_reject_first • 26.7k • 1
kaiwenw/dec9_sp1_repeat_5_pref_jdpo_all_chosen_first • 20.1k • 3
kaiwenw/dec9_sp1_repeat_5_pref_jdpo • 44.5k • 1
kaiwenw/dec9_sp1_repeat_5_pref_jdpo_n_7_temp_0.9 • 36.4k • 1
kaiwenw/dec9_sp1_repeat_5 • 18.2k • 5
kaiwenw/dec9_sp1_pref_jdpo_75_chosen_25_reject • 2.39k • 2
kaiwenw/dec9_sp1_pref_jdpo_25_chosen_75_reject • 3.39k • 2
kaiwenw/dec9_sp1_pref_jdpo_50_chosen_50_reject • 6.4k • 1