9820033747fd2bc74e50b940454ea100

This model is a fine-tuned version of google/long-t5-local-large on the Helsinki-NLP/opus_books [it-nl] (Italian-Dutch) dataset. It achieves the following results on the evaluation set (see the usage sketch after the list):

  • Loss: 2.8058
  • Data Size: 1.0
  • Epoch Runtime: 32.4416
  • Bleu: 0.1985

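As a point of reference, here is a minimal sketch of how this checkpoint could be loaded for it-nl translation through the standard Transformers seq2seq API. The prompt prefix and generation settings are assumptions for illustration, not taken from the training script.

```python
# Minimal sketch: loading the checkpoint for it->nl translation.
# Assumes the standard T5-style seq2seq interface; the "translate ..."
# prompt prefix below is an assumption, not confirmed by the model card.
from transformers import AutoTokenizer, AutoModelForSeq2SeqLM

model_id = "contemmcm/9820033747fd2bc74e50b940454ea100"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForSeq2SeqLM.from_pretrained(model_id)

text = "translate Italian to Dutch: Il gatto dorme sul divano."
inputs = tokenizer(text, return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=64)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```
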
Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training (see the configuration sketch after this list):

  • learning_rate: 5e-05
  • train_batch_size: 8
  • eval_batch_size: 8
  • seed: 42
  • distributed_type: multi-GPU
  • num_devices: 4
  • total_train_batch_size: 32
  • total_eval_batch_size: 32
  • optimizer: adamw_torch with betas=(0.9, 0.999), epsilon=1e-08, and no additional optimizer arguments
  • lr_scheduler_type: constant
  • num_epochs: 50

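These settings map onto the Transformers training API roughly as follows. This is a reconstruction sketch: the output path and any arguments not listed above are assumptions. The per-device batch size of 8 across 4 GPUs yields the reported total batch size of 32; the multi-GPU setup itself comes from the launcher (e.g. torchrun), not from an argument.

```python
# Sketch of Seq2SeqTrainingArguments approximating the hyperparameters above.
# Field names follow the Transformers 4.x API; output_dir is hypothetical.
from transformers import Seq2SeqTrainingArguments

training_args = Seq2SeqTrainingArguments(
    output_dir="long-t5-local-large-opus-books-it-nl",  # hypothetical path
    learning_rate=5e-5,
    per_device_train_batch_size=8,   # x4 devices = total batch size 32
    per_device_eval_batch_size=8,
    seed=42,
    optim="adamw_torch",
    adam_beta1=0.9,
    adam_beta2=0.999,
    adam_epsilon=1e-8,
    lr_scheduler_type="constant",
    num_train_epochs=50,
)
```
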
Training results

| Training Loss | Epoch | Step | Validation Loss | Data Size | Epoch Runtime | Bleu   |
|:-------------:|:-----:|:----:|:---------------:|:---------:|:-------------:|:------:|
| No log        | 0     | 0    | 220.5446        | 0         | 2.6730        | 0.0036 |
| No log        | 1     | 58   | 204.5957        | 0.0078    | 4.3093        | 0.0036 |
| No log        | 2     | 116  | 191.1346        | 0.0156    | 3.9760        | 0.0036 |
| No log        | 3     | 174  | 165.1904        | 0.0312    | 5.5890        | 0.0029 |
| No log        | 4     | 232  | 132.2721        | 0.0625    | 7.4449        | 0.0026 |
| No log        | 5     | 290  | 88.7089         | 0.125     | 10.7634       | 0.0013 |
| 11.694        | 6     | 348  | 40.9796         | 0.25      | 14.3784       | 0.0011 |
| 8.7816        | 7     | 406  | 17.6196         | 0.5       | 19.4515       | 0.0038 |
| 20.3087       | 8.0   | 464  | 10.8403         | 1.0       | 33.8301       | 0.0051 |
| 17.5162       | 9.0   | 522  | 9.2789          | 1.0       | 30.4905       | 0.0295 |
| 14.7784       | 10.0  | 580  | 8.4976          | 1.0       | 31.4386       | 0.0104 |
| 13.0537       | 11.0  | 638  | 7.9470          | 1.0       | 30.9553       | 0.0278 |
| 11.9767       | 12.0  | 696  | 7.6253          | 1.0       | 31.7243       | 0.0265 |
| 10.465        | 13.0  | 754  | 6.7038          | 1.0       | 30.7987       | 0.0490 |
| 9.8629        | 14.0  | 812  | 6.1852          | 1.0       | 31.0795       | 0.0481 |
| 9.1462        | 15.0  | 870  | 5.7621          | 1.0       | 31.1096       | 0.0682 |
| 8.6466        | 16.0  | 928  | 5.5545          | 1.0       | 30.6564       | 0.0612 |
| 8.209         | 17.0  | 986  | 5.2784          | 1.0       | 30.7827       | 0.0320 |
| 7.8017        | 18.0  | 1044 | 5.1448          | 1.0       | 31.3435       | 0.0801 |
| 7.2001        | 19.0  | 1102 | 4.7046          | 1.0       | 30.7288       | 0.0777 |
| 6.8989        | 20.0  | 1160 | 4.5510          | 1.0       | 30.7220       | 0.0732 |
| 6.6344        | 21.0  | 1218 | 4.4494          | 1.0       | 30.8027       | 0.1000 |
| 6.42          | 22.0  | 1276 | 4.3321          | 1.0       | 30.9185       | 0.0775 |
| 6.1573        | 23.0  | 1334 | 4.1810          | 1.0       | 30.7196       | 0.0709 |
| 6.0025        | 24.0  | 1392 | 4.1812          | 1.0       | 30.7153       | 0.0762 |
| 5.637         | 25.0  | 1450 | 3.9501          | 1.0       | 32.2425       | 0.0785 |
| 5.5103        | 26.0  | 1508 | 3.7917          | 1.0       | 31.7518       | 0.1297 |
| 5.3807        | 27.0  | 1566 | 3.7109          | 1.0       | 32.4725       | 0.0926 |
| 5.2493        | 28.0  | 1624 | 3.6484          | 1.0       | 32.1762       | 0.0779 |
| 5.0971        | 29.0  | 1682 | 3.7100          | 1.0       | 32.0763       | 0.0838 |
| 4.9723        | 30.0  | 1740 | 3.3917          | 1.0       | 32.5480       | 0.0943 |
| 4.8582        | 31.0  | 1798 | 3.4487          | 1.0       | 31.8977       | 0.0952 |
| 4.6482        | 32.0  | 1856 | 3.3220          | 1.0       | 32.1636       | 0.1071 |
| 4.5502        | 33.0  | 1914 | 3.3602          | 1.0       | 31.9986       | 0.0914 |
| 4.4548        | 34.0  | 1972 | 3.2248          | 1.0       | 31.7632       | 0.1156 |
| 4.3663        | 35.0  | 2030 | 3.2001          | 1.0       | 31.7785       | 0.1274 |
| 4.2816        | 36.0  | 2088 | 3.1693          | 1.0       | 32.2818       | 0.0931 |
| 4.2056        | 37.0  | 2146 | 3.0939          | 1.0       | 32.5222       | 0.0870 |
| 4.0655        | 38.0  | 2204 | 3.2121          | 1.0       | 32.3663       | 0.0873 |
| 3.9848        | 39.0  | 2262 | 2.9937          | 1.0       | 32.6995       | 0.1050 |
| 3.9345        | 40.0  | 2320 | 2.9854          | 1.0       | 32.4806       | 0.1002 |
| 3.8787        | 41.0  | 2378 | 2.9914          | 1.0       | 32.3101       | 0.1151 |
| 3.7917        | 42.0  | 2436 | 2.9243          | 1.0       | 32.5237       | 0.1507 |
| 3.7642        | 43.0  | 2494 | 2.8550          | 1.0       | 32.1596       | 0.1668 |
| 3.66          | 44.0  | 2552 | 2.9300          | 1.0       | 32.1260       | 0.1921 |
| 3.6051        | 45.0  | 2610 | 2.8908          | 1.0       | 31.8458       | 0.1565 |
| 3.5778        | 46.0  | 2668 | 2.8238          | 1.0       | 32.6554       | 0.1940 |
| 3.534         | 47.0  | 2726 | 2.8003          | 1.0       | 31.9395       | 0.1799 |
| 3.4921        | 48.0  | 2784 | 2.8139          | 1.0       | 32.5989       | 0.1728 |
| 3.4553        | 49.0  | 2842 | 2.8279          | 1.0       | 33.0087       | 0.2065 |
| 3.3782        | 50.0  | 2900 | 2.8058          | 1.0       | 32.4416       | 0.1985 |

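The Bleu column appears to be on a 0-1 scale. Here is a minimal sketch of computing a score on that scale with the Hugging Face Evaluate library; the exact metric used by the training script is not stated, so the choice of the `bleu` metric and the example sentences are assumptions.

```python
# Sketch: BLEU on a 0-1 scale via the "bleu" metric in the evaluate library.
# Which scorer the original training run used is an assumption.
import evaluate

bleu = evaluate.load("bleu")
preds = ["De kat slaapt op de bank."]            # hypothetical model output
refs = [["De kat ligt te slapen op de bank."]]   # hypothetical reference
print(bleu.compute(predictions=preds, references=refs)["bleu"])
```
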
Framework versions

  • Transformers 4.57.0
  • Pytorch 2.8.0+cu128
  • Datasets 4.2.0
  • Tokenizers 0.22.1
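
To reproduce results, it may help to check that a local environment matches these pins; a small sketch:

```python
# Sketch: verify installed versions against the ones listed above.
import transformers, torch, datasets, tokenizers

for name, mod in [("Transformers", transformers), ("PyTorch", torch),
                  ("Datasets", datasets), ("Tokenizers", tokenizers)]:
    print(f"{name}: {mod.__version__}")
```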