Files changed (1)
  1. README.md +94 -83
README.md CHANGED
@@ -1,83 +1,94 @@
- ---
- library_name: transformers
- license: apache-2.0
- base_model: Qwen/Qwen2.5-14B-Instruct
- tags:
- - alignment-handbook
- - trl
- - dpo
- - generated_from_trainer
- - trl
- - dpo
- - generated_from_trainer
- datasets:
- - HuggingFaceH4/ultrafeedback_binarized
- model-index:
- - name: lambda-qwen2.5-14b-dpo-test
-   results: []
- ---
-
- <!-- This model card has been generated automatically according to the information the Trainer had access to. You
- should probably proofread and complete it, then remove this comment. -->
-
- # lambda-qwen2.5-14b-dpo-test
-
- This model is a fine-tuned version of [Qwen/Qwen2.5-14B-Instruct](https://huggingface.co/Qwen/Qwen2.5-14B-Instruct) on the HuggingFaceH4/ultrafeedback_binarized dataset.
- It achieves the following results on the evaluation set:
- - Loss: 0.4919
- - Rewards/chosen: -2.4745
- - Rewards/rejected: -3.3729
- - Rewards/accuracies: 0.7400
- - Rewards/margins: 0.8984
- - Logps/rejected: -832.0724
- - Logps/chosen: -737.5234
- - Logits/rejected: -1.2739
- - Logits/chosen: -1.2560
-
- ## Model description
-
- More information needed
-
- ## Intended uses & limitations
-
- More information needed
-
- ## Training and evaluation data
-
- More information needed
-
- ## Training procedure
-
- ### Training hyperparameters
-
- The following hyperparameters were used during training:
- - learning_rate: 5e-07
- - train_batch_size: 2
- - eval_batch_size: 2
- - seed: 42
- - distributed_type: multi-GPU
- - num_devices: 8
- - gradient_accumulation_steps: 8
- - total_train_batch_size: 128
- - total_eval_batch_size: 16
- - optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
- - lr_scheduler_type: cosine
- - lr_scheduler_warmup_ratio: 0.1
- - num_epochs: 1
-
- ### Training results
-
- | Training Loss | Epoch | Step | Validation Loss | Rewards/chosen | Rewards/rejected | Rewards/accuracies | Rewards/margins | Logps/rejected | Logps/chosen | Logits/rejected | Logits/chosen |
- |:-------------:|:------:|:----:|:---------------:|:--------------:|:----------------:|:------------------:|:---------------:|:--------------:|:------------:|:---------------:|:-------------:|
- | 0.5269 | 0.2094 | 100 | 0.5333 | -1.6756 | -2.3320 | 0.7000 | 0.6564 | -727.9815 | -657.6356 | -1.3952 | -1.3850 |
- | 0.5086 | 0.4187 | 200 | 0.5044 | -2.0906 | -2.9287 | 0.7040 | 0.8381 | -787.6511 | -699.1298 | -1.2939 | -1.2773 |
- | 0.4787 | 0.6281 | 300 | 0.4948 | -2.2927 | -3.1689 | 0.7320 | 0.8762 | -811.6696 | -719.3386 | -1.2846 | -1.2646 |
- | 0.4825 | 0.8375 | 400 | 0.4924 | -2.4470 | -3.3410 | 0.7400 | 0.8939 | -828.8748 | -734.7765 | -1.2644 | -1.2477 |
-
-
- ### Framework versions
-
- - Transformers 4.44.2
- - Pytorch 2.4.0+cu121
- - Datasets 2.19.1
- - Tokenizers 0.19.1
+ ---
+ library_name: transformers
+ license: apache-2.0
+ base_model: Qwen/Qwen2.5-14B-Instruct
+ tags:
+ - alignment-handbook
+ - trl
+ - dpo
+ - generated_from_trainer
+ datasets:
+ - HuggingFaceH4/ultrafeedback_binarized
+ language:
+ - zho
+ - eng
+ - fra
+ - spa
+ - por
+ - deu
+ - ita
+ - rus
+ - jpn
+ - kor
+ - vie
+ - tha
+ - ara
+ model-index:
+ - name: lambda-qwen2.5-14b-dpo-test
+   results: []
+ ---
+
+ <!-- This model card has been generated automatically according to the information the Trainer had access to. You
+ should probably proofread and complete it, then remove this comment. -->
+
+ # lambda-qwen2.5-14b-dpo-test
+
+ This model is a fine-tuned version of [Qwen/Qwen2.5-14B-Instruct](https://huggingface.co/Qwen/Qwen2.5-14B-Instruct) on the HuggingFaceH4/ultrafeedback_binarized dataset.
+ It achieves the following results on the evaluation set:
+ - Loss: 0.4919
+ - Rewards/chosen: -2.4745
+ - Rewards/rejected: -3.3729
+ - Rewards/accuracies: 0.7400
+ - Rewards/margins: 0.8984
+ - Logps/rejected: -832.0724
+ - Logps/chosen: -737.5234
+ - Logits/rejected: -1.2739
+ - Logits/chosen: -1.2560
+
+ ## Model description
+
+ More information needed
+
+ ## Intended uses & limitations
+
+ More information needed
+
+ ## Training and evaluation data
+
+ More information needed
+
+ ## Training procedure
+
+ ### Training hyperparameters
+
+ The following hyperparameters were used during training:
+ - learning_rate: 5e-07
+ - train_batch_size: 2
+ - eval_batch_size: 2
+ - seed: 42
+ - distributed_type: multi-GPU
+ - num_devices: 8
+ - gradient_accumulation_steps: 8
+ - total_train_batch_size: 128
+ - total_eval_batch_size: 16
+ - optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
+ - lr_scheduler_type: cosine
+ - lr_scheduler_warmup_ratio: 0.1
+ - num_epochs: 1
+
+ ### Training results
+
+ | Training Loss | Epoch | Step | Validation Loss | Rewards/chosen | Rewards/rejected | Rewards/accuracies | Rewards/margins | Logps/rejected | Logps/chosen | Logits/rejected | Logits/chosen |
+ |:-------------:|:------:|:----:|:---------------:|:--------------:|:----------------:|:------------------:|:---------------:|:--------------:|:------------:|:---------------:|:-------------:|
+ | 0.5269 | 0.2094 | 100 | 0.5333 | -1.6756 | -2.3320 | 0.7000 | 0.6564 | -727.9815 | -657.6356 | -1.3952 | -1.3850 |
+ | 0.5086 | 0.4187 | 200 | 0.5044 | -2.0906 | -2.9287 | 0.7040 | 0.8381 | -787.6511 | -699.1298 | -1.2939 | -1.2773 |
+ | 0.4787 | 0.6281 | 300 | 0.4948 | -2.2927 | -3.1689 | 0.7320 | 0.8762 | -811.6696 | -719.3386 | -1.2846 | -1.2646 |
+ | 0.4825 | 0.8375 | 400 | 0.4924 | -2.4470 | -3.3410 | 0.7400 | 0.8939 | -828.8748 | -734.7765 | -1.2644 | -1.2477 |
+
+
+ ### Framework versions
+
+ - Transformers 4.44.2
+ - Pytorch 2.4.0+cu121
+ - Datasets 2.19.1
+ - Tokenizers 0.19.1
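
The derived values reported in the card can be cross-checked from the per-device settings. The following is a minimal Python sketch, not part of the diff or of the training code. The variable names are illustrative, and it assumes that the total batch sizes are the per-device sizes multiplied by the device count (and, for training, by the gradient accumulation steps), and that the reported reward margin is the chosen reward minus the rejected reward.

```python
# Values copied from the model card; the variable names are illustrative.
train_batch_size = 2
eval_batch_size = 2
num_devices = 8
gradient_accumulation_steps = 8

# Assumed relationship: per-device batch x devices x accumulation steps.
total_train_batch_size = train_batch_size * num_devices * gradient_accumulation_steps
# Evaluation does not accumulate gradients, so only the device count multiplies.
total_eval_batch_size = eval_batch_size * num_devices

# Final evaluation rewards from the card.
rewards_chosen = -2.4745
rewards_rejected = -3.3729
# Assumed relationship: margin = chosen reward - rejected reward.
rewards_margin = round(rewards_chosen - rewards_rejected, 4)

print(total_train_batch_size, total_eval_batch_size, rewards_margin)
```

Under these assumptions the computed values agree with the card's `total_train_batch_size` (128), `total_eval_batch_size` (16), and final `Rewards/margins` (0.8984).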