Commit c875faf (verified) by dobbersc: "Add greek models"
2024-07-30 02:13:30,544 ----------------------------------------------------------------------------------------------------
2024-07-30 02:13:30,544 Training Model
2024-07-30 02:13:30,544 ----------------------------------------------------------------------------------------------------
2024-07-30 02:13:30,544 Translator(
(encoder): EncoderLSTM(
(embedding): Embedding(107, 300, padding_idx=0)
(dropout): Dropout(p=0.1, inplace=False)
(lstm): LSTM(300, 512, batch_first=True)
)
(decoder): DecoderLSTM(
(embedding): Embedding(128, 300, padding_idx=0)
(dropout): Dropout(p=0.1, inplace=False)
(lstm): LSTM(300, 512, batch_first=True)
(attention): DotProductAttention(
(softmax): Softmax(dim=-1)
(combined2hidden): Sequential(
(0): Linear(in_features=1024, out_features=512, bias=True)
(1): ReLU()
)
)
(hidden2vocab): Linear(in_features=512, out_features=128, bias=True)
(log_softmax): LogSoftmax(dim=-1)
)
)
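The printed module tree above corresponds roughly to the following PyTorch sketch of the encoder and attention modules (the layer sizes come from the log; the forward passes are assumptions reconstructed from the printed structure, not the repository's actual code):

```python
import torch
import torch.nn as nn


class EncoderLSTM(nn.Module):
    """Sketch of the logged encoder: vocab 107, embedding 300, hidden 512."""

    def __init__(self, vocab_size=107, embedding_dim=300, hidden_size=512, dropout=0.1):
        super().__init__()
        self.embedding = nn.Embedding(vocab_size, embedding_dim, padding_idx=0)
        self.dropout = nn.Dropout(dropout)
        self.lstm = nn.LSTM(embedding_dim, hidden_size, batch_first=True)

    def forward(self, tokens):
        embedded = self.dropout(self.embedding(tokens))
        return self.lstm(embedded)  # (outputs, (h_n, c_n))


class DotProductAttention(nn.Module):
    """Dot-product attention with the combined->hidden projection from the log."""

    def __init__(self, hidden_size=512):
        super().__init__()
        self.softmax = nn.Softmax(dim=-1)
        self.combined2hidden = nn.Sequential(
            nn.Linear(2 * hidden_size, hidden_size),  # in_features=1024, out=512
            nn.ReLU(),
        )

    def forward(self, query, keys):
        # query: (batch, 1, hidden); keys: (batch, src_len, hidden)
        scores = torch.bmm(query, keys.transpose(1, 2))  # (batch, 1, src_len)
        weights = self.softmax(scores)
        context = torch.bmm(weights, keys)               # (batch, 1, hidden)
        return self.combined2hidden(torch.cat([query, context], dim=-1))
```

The decoder in the log wraps the same pieces (embedding over a 128-symbol target vocabulary, LSTM, this attention, then a `hidden2vocab` linear layer followed by `LogSoftmax`).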
2024-07-30 02:13:30,545 ----------------------------------------------------------------------------------------------------
2024-07-30 02:13:30,545 Training Hyperparameters:
2024-07-30 02:13:30,545 - max_epochs: 10
2024-07-30 02:13:30,545 - learning_rate: 0.001
2024-07-30 02:13:30,545 - batch_size: 128
2024-07-30 02:13:30,545 - patience: 5
2024-07-30 02:13:30,545 - scheduler_patience: 3
2024-07-30 02:13:30,545 - teacher_forcing_ratio: 0.5
2024-07-30 02:13:30,545 ----------------------------------------------------------------------------------------------------
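The `lr` column below drops from 0.0010 to 0.0001 at epoch 6, which is consistent with a plateau scheduler using `scheduler_patience: 3` and a reduction factor of 0.1 on the dev loss. A minimal sketch (the `ReduceLROnPlateau` class and factor match the observed behaviour, but the exact scheduler and the Adam optimizer are assumptions):

```python
import torch

param = torch.nn.Parameter(torch.zeros(1))
optimizer = torch.optim.Adam([param], lr=1e-3)  # learning_rate from the log
scheduler = torch.optim.lr_scheduler.ReduceLROnPlateau(
    optimizer, mode="min", factor=0.1, patience=3
)

# Dev losses from epochs 1-5 of the log; the 4th non-improving epoch
# exceeds patience=3, so epoch 6 runs at lr = 0.0001.
for dev_loss in [4.0377, 4.1666, 4.2504, 4.5206, 4.7345]:
    scheduler.step(dev_loss)

print(optimizer.param_groups[0]["lr"])
```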
2024-07-30 02:13:30,545 Computational Parameters:
2024-07-30 02:13:30,545 - num_workers: 4
2024-07-30 02:13:30,545 - device: device(type='cuda', index=0)
2024-07-30 02:13:30,545 ----------------------------------------------------------------------------------------------------
2024-07-30 02:13:30,545 Dataset Splits:
2024-07-30 02:13:30,545 - train: 85949 data points
2024-07-30 02:13:30,545 - dev: 12279 data points
2024-07-30 02:13:30,545 - test: 24557 data points
2024-07-30 02:13:30,545 ----------------------------------------------------------------------------------------------------
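The split sizes above correspond to a 70/10/20 train/dev/test partition of the full dataset:

```python
splits = {"train": 85949, "dev": 12279, "test": 24557}
total = sum(splits.values())

print(total)  # 122785
print({name: round(n / total, 2) for name, n in splits.items()})
```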
2024-07-30 02:13:30,545 EPOCH 1
2024-07-30 02:15:42,182 batch 67/672 - loss 3.19545212 - lr 0.0010 - time 131.64s
2024-07-30 02:17:40,099 batch 134/672 - loss 3.02495554 - lr 0.0010 - time 249.55s
2024-07-30 02:19:39,521 batch 201/672 - loss 2.92257840 - lr 0.0010 - time 368.98s
2024-07-30 02:22:13,372 batch 268/672 - loss 2.85199871 - lr 0.0010 - time 522.83s
2024-07-30 02:24:16,980 batch 335/672 - loss 2.79420793 - lr 0.0010 - time 646.43s
2024-07-30 02:26:25,476 batch 402/672 - loss 2.74788210 - lr 0.0010 - time 774.93s
2024-07-30 02:28:37,609 batch 469/672 - loss 2.70773431 - lr 0.0010 - time 907.06s
2024-07-30 02:30:34,985 batch 536/672 - loss 2.67195398 - lr 0.0010 - time 1024.44s
2024-07-30 02:32:46,538 batch 603/672 - loss 2.64084060 - lr 0.0010 - time 1155.99s
2024-07-30 02:34:55,748 batch 670/672 - loss 2.61186988 - lr 0.0010 - time 1285.20s
2024-07-30 02:34:58,934 ----------------------------------------------------------------------------------------------------
2024-07-30 02:34:58,937 EPOCH 1 DONE
2024-07-30 02:35:33,168 TRAIN Loss: 2.6108
2024-07-30 02:35:33,168 DEV Loss: 4.0377
2024-07-30 02:35:33,168 DEV Perplexity: 56.6981
2024-07-30 02:35:33,168 New best score!
2024-07-30 02:35:33,170 ----------------------------------------------------------------------------------------------------
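The reported dev perplexity is simply the exponential of the dev cross-entropy loss; for epoch 1 (the small gap to the logged 56.6981 is rounding of the printed loss):

```python
import math

dev_loss = 4.0377            # epoch 1 dev loss from the log
perplexity = math.exp(dev_loss)
print(round(perplexity, 2))  # ~56.7
```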
2024-07-30 02:35:33,170 EPOCH 2
2024-07-30 02:37:40,712 batch 67/672 - loss 2.32208106 - lr 0.0010 - time 127.54s
2024-07-30 02:39:39,545 batch 134/672 - loss 2.30324291 - lr 0.0010 - time 246.38s
2024-07-30 02:41:50,177 batch 201/672 - loss 2.29119577 - lr 0.0010 - time 377.01s
2024-07-30 02:44:22,124 batch 268/672 - loss 2.27651633 - lr 0.0010 - time 528.95s
2024-07-30 02:46:29,564 batch 335/672 - loss 2.26064277 - lr 0.0010 - time 656.39s
2024-07-30 02:48:27,268 batch 402/672 - loss 2.24953536 - lr 0.0010 - time 774.10s
2024-07-30 02:50:23,982 batch 469/672 - loss 2.23849808 - lr 0.0010 - time 890.81s
2024-07-30 02:52:40,137 batch 536/672 - loss 2.22690770 - lr 0.0010 - time 1026.97s
2024-07-30 02:54:52,910 batch 603/672 - loss 2.21315394 - lr 0.0010 - time 1159.74s
2024-07-30 02:57:00,732 batch 670/672 - loss 2.19986962 - lr 0.0010 - time 1287.56s
2024-07-30 02:57:03,825 ----------------------------------------------------------------------------------------------------
2024-07-30 02:57:03,828 EPOCH 2 DONE
2024-07-30 02:57:38,031 TRAIN Loss: 2.1993
2024-07-30 02:57:38,032 DEV Loss: 4.1666
2024-07-30 02:57:38,033 DEV Perplexity: 64.4964
2024-07-30 02:57:38,033 No improvement for 1 epoch(s)
2024-07-30 02:57:38,033 ----------------------------------------------------------------------------------------------------
2024-07-30 02:57:38,033 EPOCH 3
2024-07-30 02:59:46,882 batch 67/672 - loss 2.07768523 - lr 0.0010 - time 128.85s
2024-07-30 03:02:13,067 batch 134/672 - loss 2.06771447 - lr 0.0010 - time 275.03s
2024-07-30 03:04:13,352 batch 201/672 - loss 2.05206243 - lr 0.0010 - time 395.32s
2024-07-30 03:06:15,924 batch 268/672 - loss 2.03767699 - lr 0.0010 - time 517.89s
2024-07-30 03:08:29,454 batch 335/672 - loss 2.02756568 - lr 0.0010 - time 651.42s
2024-07-30 03:10:36,938 batch 402/672 - loss 2.01690815 - lr 0.0010 - time 778.90s
2024-07-30 03:12:44,576 batch 469/672 - loss 2.00959916 - lr 0.0010 - time 906.54s
2024-07-30 03:14:42,904 batch 536/672 - loss 1.99967818 - lr 0.0010 - time 1024.87s
2024-07-30 03:17:05,177 batch 603/672 - loss 1.99148476 - lr 0.0010 - time 1167.14s
2024-07-30 03:19:10,961 batch 670/672 - loss 1.98160288 - lr 0.0010 - time 1292.93s
2024-07-30 03:19:13,988 ----------------------------------------------------------------------------------------------------
2024-07-30 03:19:13,990 EPOCH 3 DONE
2024-07-30 03:19:48,048 TRAIN Loss: 1.9813
2024-07-30 03:19:48,050 DEV Loss: 4.2504
2024-07-30 03:19:48,050 DEV Perplexity: 70.1329
2024-07-30 03:19:48,050 No improvement for 2 epoch(s)
2024-07-30 03:19:48,050 ----------------------------------------------------------------------------------------------------
2024-07-30 03:19:48,050 EPOCH 4
2024-07-30 03:21:58,453 batch 67/672 - loss 1.87205641 - lr 0.0010 - time 130.40s
2024-07-30 03:23:54,350 batch 134/672 - loss 1.87312458 - lr 0.0010 - time 246.30s
2024-07-30 03:26:26,003 batch 201/672 - loss 1.86491152 - lr 0.0010 - time 397.95s
2024-07-30 03:28:31,716 batch 268/672 - loss 1.85794664 - lr 0.0010 - time 523.67s
2024-07-30 03:30:58,523 batch 335/672 - loss 1.85268306 - lr 0.0010 - time 670.47s
2024-07-30 03:32:55,289 batch 402/672 - loss 1.84701065 - lr 0.0010 - time 787.24s
2024-07-30 03:35:17,440 batch 469/672 - loss 1.83774444 - lr 0.0010 - time 929.39s
2024-07-30 03:37:17,765 batch 536/672 - loss 1.83106400 - lr 0.0010 - time 1049.71s
2024-07-30 03:39:24,224 batch 603/672 - loss 1.82428703 - lr 0.0010 - time 1176.17s
2024-07-30 03:41:19,788 batch 670/672 - loss 1.81979131 - lr 0.0010 - time 1291.74s
2024-07-30 03:41:22,695 ----------------------------------------------------------------------------------------------------
2024-07-30 03:41:22,699 EPOCH 4 DONE
2024-07-30 03:41:56,808 TRAIN Loss: 1.8197
2024-07-30 03:41:56,809 DEV Loss: 4.5206
2024-07-30 03:41:56,809 DEV Perplexity: 91.8923
2024-07-30 03:41:56,809 No improvement for 3 epoch(s)
2024-07-30 03:41:56,809 ----------------------------------------------------------------------------------------------------
2024-07-30 03:41:56,809 EPOCH 5
2024-07-30 03:44:08,647 batch 67/672 - loss 1.75557149 - lr 0.0010 - time 131.84s
2024-07-30 03:46:06,957 batch 134/672 - loss 1.74974602 - lr 0.0010 - time 250.15s
2024-07-30 03:48:35,604 batch 201/672 - loss 1.74676394 - lr 0.0010 - time 398.79s
2024-07-30 03:50:38,242 batch 268/672 - loss 1.74127575 - lr 0.0010 - time 521.43s
2024-07-30 03:53:07,602 batch 335/672 - loss 1.73835572 - lr 0.0010 - time 670.79s
2024-07-30 03:55:13,545 batch 402/672 - loss 1.73563331 - lr 0.0010 - time 796.74s
2024-07-30 03:57:11,578 batch 469/672 - loss 1.73164746 - lr 0.0010 - time 914.77s
2024-07-30 03:59:22,636 batch 536/672 - loss 1.72733042 - lr 0.0010 - time 1045.83s
2024-07-30 04:01:27,047 batch 603/672 - loss 1.72134435 - lr 0.0010 - time 1170.24s
2024-07-30 04:03:30,004 batch 670/672 - loss 1.71633585 - lr 0.0010 - time 1293.20s
2024-07-30 04:03:33,232 ----------------------------------------------------------------------------------------------------
2024-07-30 04:03:33,234 EPOCH 5 DONE
2024-07-30 04:04:07,476 TRAIN Loss: 1.7158
2024-07-30 04:04:07,478 DEV Loss: 4.7345
2024-07-30 04:04:07,478 DEV Perplexity: 113.8115
2024-07-30 04:04:07,478 No improvement for 4 epoch(s)
2024-07-30 04:04:07,478 ----------------------------------------------------------------------------------------------------
2024-07-30 04:04:07,478 EPOCH 6
2024-07-30 04:06:07,000 batch 67/672 - loss 1.62654531 - lr 0.0001 - time 119.52s
2024-07-30 04:08:25,932 batch 134/672 - loss 1.62444705 - lr 0.0001 - time 258.45s
2024-07-30 04:10:20,762 batch 201/672 - loss 1.62080814 - lr 0.0001 - time 373.28s
2024-07-30 04:12:32,261 batch 268/672 - loss 1.62108705 - lr 0.0001 - time 504.78s
2024-07-30 04:14:45,293 batch 335/672 - loss 1.61820102 - lr 0.0001 - time 637.81s
2024-07-30 04:16:51,180 batch 402/672 - loss 1.61746165 - lr 0.0001 - time 763.70s
2024-07-30 04:18:59,569 batch 469/672 - loss 1.61459681 - lr 0.0001 - time 892.09s
2024-07-30 04:21:23,775 batch 536/672 - loss 1.61302190 - lr 0.0001 - time 1036.30s
2024-07-30 04:23:36,945 batch 603/672 - loss 1.60916015 - lr 0.0001 - time 1169.47s
2024-07-30 04:25:42,320 batch 670/672 - loss 1.60702325 - lr 0.0001 - time 1294.84s
2024-07-30 04:25:44,877 ----------------------------------------------------------------------------------------------------
2024-07-30 04:25:44,881 EPOCH 6 DONE
2024-07-30 04:26:19,147 TRAIN Loss: 1.6068
2024-07-30 04:26:19,148 DEV Loss: 4.7702
2024-07-30 04:26:19,148 DEV Perplexity: 117.9441
2024-07-30 04:26:19,148 No improvement for 5 epoch(s)
2024-07-30 04:26:19,148 Patience reached: Terminating model training due to early stopping
2024-07-30 04:26:19,148 ----------------------------------------------------------------------------------------------------
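The early-stopping behaviour above (terminate after `patience: 5` consecutive epochs without a new best dev loss) can be sketched as a simple counter; the function name is illustrative, not from the repository:

```python
def early_stopping_epoch(dev_losses, patience=5):
    """Return the 1-based epoch at which training stops, or None."""
    best = float("inf")
    bad_epochs = 0
    for epoch, loss in enumerate(dev_losses, start=1):
        if loss < best:
            best, bad_epochs = loss, 0  # "New best score!"
        else:
            bad_epochs += 1             # "No improvement for N epoch(s)"
            if bad_epochs >= patience:
                return epoch
    return None


# Dev losses from the log: only epoch 1 improves, so training stops at epoch 6.
print(early_stopping_epoch([4.0377, 4.1666, 4.2504, 4.5206, 4.7345, 4.7702]))
```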
2024-07-30 04:26:19,148 Finished Training
2024-07-30 04:27:25,619 TEST Perplexity: 56.6321
2024-07-30 04:34:41,478 TEST BLEU = 3.34 40.7/3.8/1.3/0.6 (BP = 1.000 ratio = 1.000 hyp_len = 81 ref_len = 81)
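The final line is in sacrebleu's output format: corpus BLEU, the four n-gram precisions (1- through 4-gram, in percent), and the brevity penalty with length statistics. The score can be recovered, up to rounding of the printed precisions, as the brevity penalty times the geometric mean of the n-gram precisions:

```python
import math

precisions = [40.7, 3.8, 1.3, 0.6]  # 1- to 4-gram precisions (%) from the log
bp = 1.0                            # brevity penalty (hyp_len == ref_len)

bleu = bp * math.exp(sum(math.log(p / 100) for p in precisions) / 4) * 100
print(round(bleu, 2))  # ~3.3, close to the logged 3.34
```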