# train_qnli_1744902609
This model is a fine-tuned version of meta-llama/Meta-Llama-3-8B-Instruct on the qnli dataset. It achieves the following results on the evaluation set:
- Loss: 0.0499
- Num Input Tokens Seen: 70340640
## Model description
More information needed
## Intended uses & limitations
More information needed
## Training and evaluation data
More information needed
## Training procedure

### Training hyperparameters
The following hyperparameters were used during training:
- learning_rate: 5e-05
- train_batch_size: 4
- eval_batch_size: 4
- seed: 123
- gradient_accumulation_steps: 4
- total_train_batch_size: 16
- optimizer: adamw_torch with betas=(0.9, 0.999) and epsilon=1e-08; no additional optimizer arguments
- lr_scheduler_type: cosine
- training_steps: 40000
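The total train batch size listed above follows from the per-device batch size multiplied by the gradient accumulation steps; a quick sanity check (a single training device is assumed here, since the card does not state the device count):

```python
# Effective batch size = per-device batch size x gradient accumulation steps x num devices.
train_batch_size = 4
gradient_accumulation_steps = 4
num_devices = 1  # assumption; not stated in the card

total_train_batch_size = train_batch_size * gradient_accumulation_steps * num_devices
print(total_train_batch_size)  # → 16
```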
### Training results
| Training Loss | Epoch | Step | Validation Loss | Input Tokens Seen |
|---|---|---|---|---|
| 0.0783 | 0.0339 | 200 | 0.1000 | 354016 |
| 0.0685 | 0.0679 | 400 | 0.0949 | 710048 |
| 0.0941 | 0.1018 | 600 | 0.0916 | 1061568 |
| 0.0894 | 0.1358 | 800 | 0.0876 | 1413312 |
| 0.0809 | 0.1697 | 1000 | 0.0861 | 1761440 |
| 0.0898 | 0.2037 | 1200 | 0.0826 | 2116800 |
| 0.1335 | 0.2376 | 1400 | 0.0801 | 2469600 |
| 0.0863 | 0.2716 | 1600 | 0.0782 | 2820672 |
| 0.0653 | 0.3055 | 1800 | 0.0771 | 3173888 |
| 0.0615 | 0.3395 | 2000 | 0.0754 | 3528672 |
| 0.0688 | 0.3734 | 2200 | 0.0738 | 3885024 |
| 0.061 | 0.4073 | 2400 | 0.0726 | 4234912 |
| 0.0641 | 0.4413 | 2600 | 0.0714 | 4585440 |
| 0.0824 | 0.4752 | 2800 | 0.0704 | 4936320 |
| 0.0617 | 0.5092 | 3000 | 0.0696 | 5287360 |
| 0.0731 | 0.5431 | 3200 | 0.0694 | 5634432 |
| 0.0965 | 0.5771 | 3400 | 0.0680 | 5985504 |
| 0.0468 | 0.6110 | 3600 | 0.0681 | 6339072 |
| 0.0618 | 0.6450 | 3800 | 0.0681 | 6695840 |
| 0.0558 | 0.6789 | 4000 | 0.0669 | 7045536 |
| 0.0744 | 0.7129 | 4200 | 0.0657 | 7399328 |
| 0.058 | 0.7468 | 4400 | 0.0654 | 7749568 |
| 0.0716 | 0.7808 | 4600 | 0.0646 | 8099584 |
| 0.0815 | 0.8147 | 4800 | 0.0641 | 8450752 |
| 0.0757 | 0.8486 | 5000 | 0.0641 | 8799616 |
| 0.0834 | 0.8826 | 5200 | 0.0638 | 9153824 |
| 0.0409 | 0.9165 | 5400 | 0.0630 | 9503040 |
| 0.0566 | 0.9505 | 5600 | 0.0629 | 9852032 |
| 0.0764 | 0.9844 | 5800 | 0.0621 | 10205248 |
| 0.0695 | 1.0183 | 6000 | 0.0618 | 10556224 |
| 0.0706 | 1.0523 | 6200 | 0.0613 | 10906176 |
| 0.0681 | 1.0862 | 6400 | 0.0615 | 11258848 |
| 0.0906 | 1.1202 | 6600 | 0.0610 | 11612736 |
| 0.0481 | 1.1541 | 6800 | 0.0610 | 11965728 |
| 0.0609 | 1.1881 | 7000 | 0.0606 | 12317792 |
| 0.0664 | 1.2220 | 7200 | 0.0600 | 12671680 |
| 0.0455 | 1.2560 | 7400 | 0.0596 | 13026528 |
| 0.0697 | 1.2899 | 7600 | 0.0593 | 13377952 |
| 0.0704 | 1.3238 | 7800 | 0.0592 | 13731648 |
| 0.0563 | 1.3578 | 8000 | 0.0594 | 14079456 |
| 0.0408 | 1.3917 | 8200 | 0.0587 | 14433120 |
| 0.0533 | 1.4257 | 8400 | 0.0586 | 14785792 |
| 0.048 | 1.4596 | 8600 | 0.0580 | 15133600 |
| 0.04 | 1.4936 | 8800 | 0.0588 | 15482048 |
| 0.0996 | 1.5275 | 9000 | 0.0581 | 15833280 |
| 0.0482 | 1.5615 | 9200 | 0.0576 | 16184384 |
| 0.098 | 1.5954 | 9400 | 0.0575 | 16532672 |
| 0.066 | 1.6294 | 9600 | 0.0572 | 16886240 |
| 0.0632 | 1.6633 | 9800 | 0.0569 | 17236032 |
| 0.0604 | 1.6972 | 10000 | 0.0569 | 17589696 |
| 0.0597 | 1.7312 | 10200 | 0.0569 | 17939200 |
| 0.0484 | 1.7651 | 10400 | 0.0566 | 18290848 |
| 0.0513 | 1.7991 | 10600 | 0.0563 | 18643136 |
| 0.054 | 1.8330 | 10800 | 0.0561 | 18992096 |
| 0.0531 | 1.8670 | 11000 | 0.0560 | 19348352 |
| 0.0764 | 1.9009 | 11200 | 0.0558 | 19697728 |
| 0.0615 | 1.9349 | 11400 | 0.0568 | 20045504 |
| 0.0615 | 1.9688 | 11600 | 0.0557 | 20399360 |
| 0.0728 | 2.0027 | 11800 | 0.0583 | 20752832 |
| 0.0583 | 2.0367 | 12000 | 0.0563 | 21101696 |
| 0.0595 | 2.0706 | 12200 | 0.0553 | 21451008 |
| 0.0571 | 2.1046 | 12400 | 0.0551 | 21798496 |
| 0.0557 | 2.1385 | 12600 | 0.0550 | 22148736 |
| 0.0625 | 2.1724 | 12800 | 0.0549 | 22497472 |
| 0.0519 | 2.2064 | 13000 | 0.0549 | 22847840 |
| 0.0462 | 2.2403 | 13200 | 0.0545 | 23198880 |
| 0.071 | 2.2743 | 13400 | 0.0550 | 23551168 |
| 0.0769 | 2.3082 | 13600 | 0.0543 | 23901824 |
| 0.0692 | 2.3422 | 13800 | 0.0552 | 24252256 |
| 0.0478 | 2.3761 | 14000 | 0.0544 | 24605280 |
| 0.0622 | 2.4101 | 14200 | 0.0540 | 24958496 |
| 0.0402 | 2.4440 | 14400 | 0.0545 | 25308416 |
| 0.035 | 2.4780 | 14600 | 0.0539 | 25656320 |
| 0.0677 | 2.5119 | 14800 | 0.0537 | 26010304 |
| 0.0607 | 2.5458 | 15000 | 0.0540 | 26367744 |
| 0.0507 | 2.5798 | 15200 | 0.0536 | 26720128 |
| 0.054 | 2.6137 | 15400 | 0.0538 | 27068064 |
| 0.0416 | 2.6477 | 15600 | 0.0535 | 27423584 |
| 0.0475 | 2.6816 | 15800 | 0.0532 | 27776768 |
| 0.0597 | 2.7156 | 16000 | 0.0532 | 28126112 |
| 0.0755 | 2.7495 | 16200 | 0.0540 | 28482048 |
| 0.0622 | 2.7835 | 16400 | 0.0530 | 28833568 |
| 0.0525 | 2.8174 | 16600 | 0.0533 | 29184928 |
| 0.0504 | 2.8514 | 16800 | 0.0529 | 29539168 |
| 0.0677 | 2.8853 | 17000 | 0.0532 | 29890368 |
| 0.0389 | 2.9193 | 17200 | 0.0528 | 30246816 |
| 0.044 | 2.9532 | 17400 | 0.0528 | 30598112 |
| 0.0624 | 2.9871 | 17600 | 0.0526 | 30947904 |
| 0.0597 | 3.0210 | 17800 | 0.0526 | 31297696 |
| 0.0486 | 3.0550 | 18000 | 0.0528 | 31650784 |
| 0.0993 | 3.0889 | 18200 | 0.0525 | 32003328 |
| 0.0511 | 3.1229 | 18400 | 0.0525 | 32350432 |
| 0.0725 | 3.1568 | 18600 | 0.0522 | 32702560 |
| 0.0663 | 3.1908 | 18800 | 0.0521 | 33054016 |
| 0.0563 | 3.2247 | 19000 | 0.0522 | 33410080 |
| 0.0684 | 3.2587 | 19200 | 0.0520 | 33764032 |
| 0.0533 | 3.2926 | 19400 | 0.0519 | 34116160 |
| 0.0584 | 3.3266 | 19600 | 0.0521 | 34470432 |
| 0.0454 | 3.3605 | 19800 | 0.0519 | 34821536 |
| 0.0553 | 3.3944 | 20000 | 0.0520 | 35169856 |
| 0.0392 | 3.4284 | 20200 | 0.0518 | 35520544 |
| 0.0505 | 3.4623 | 20400 | 0.0517 | 35874144 |
| 0.0545 | 3.4963 | 20600 | 0.0517 | 36225408 |
| 0.0541 | 3.5302 | 20800 | 0.0516 | 36573536 |
| 0.0405 | 3.5642 | 21000 | 0.0516 | 36926144 |
| 0.0435 | 3.5981 | 21200 | 0.0516 | 37277024 |
| 0.06 | 3.6321 | 21400 | 0.0517 | 37630272 |
| 0.0532 | 3.6660 | 21600 | 0.0513 | 37979008 |
| 0.0733 | 3.7000 | 21800 | 0.0514 | 38328768 |
| 0.0374 | 3.7339 | 22000 | 0.0518 | 38679040 |
| 0.0423 | 3.7679 | 22200 | 0.0513 | 39032192 |
| 0.0522 | 3.8018 | 22400 | 0.0514 | 39381632 |
| 0.0444 | 3.8357 | 22600 | 0.0512 | 39732416 |
| 0.0446 | 3.8697 | 22800 | 0.0512 | 40083328 |
| 0.0598 | 3.9036 | 23000 | 0.0512 | 40439264 |
| 0.0542 | 3.9376 | 23200 | 0.0512 | 40789056 |
| 0.0293 | 3.9715 | 23400 | 0.0510 | 41141280 |
| 0.0306 | 4.0054 | 23600 | 0.0510 | 41495616 |
| 0.0366 | 4.0394 | 23800 | 0.0513 | 41845216 |
| 0.0687 | 4.0733 | 24000 | 0.0511 | 42198656 |
| 0.0598 | 4.1073 | 24200 | 0.0510 | 42548064 |
| 0.0271 | 4.1412 | 24400 | 0.0509 | 42897248 |
| 0.0505 | 4.1752 | 24600 | 0.0508 | 43253728 |
| 0.0417 | 4.2091 | 24800 | 0.0509 | 43608032 |
| 0.0348 | 4.2431 | 25000 | 0.0509 | 43958240 |
| 0.0644 | 4.2770 | 25200 | 0.0508 | 44310560 |
| 0.0528 | 4.3109 | 25400 | 0.0509 | 44662688 |
| 0.0313 | 4.3449 | 25600 | 0.0508 | 45016000 |
| 0.0555 | 4.3788 | 25800 | 0.0508 | 45365856 |
| 0.0385 | 4.4128 | 26000 | 0.0519 | 45716576 |
| 0.0617 | 4.4467 | 26200 | 0.0508 | 46068320 |
| 0.0568 | 4.4807 | 26400 | 0.0505 | 46416928 |
| 0.0555 | 4.5146 | 26600 | 0.0506 | 46771968 |
| 0.0398 | 4.5486 | 26800 | 0.0506 | 47123552 |
| 0.0441 | 4.5825 | 27000 | 0.0505 | 47476256 |
| 0.0393 | 4.6165 | 27200 | 0.0505 | 47831136 |
| 0.0623 | 4.6504 | 27400 | 0.0505 | 48181856 |
| 0.0387 | 4.6843 | 27600 | 0.0507 | 48531648 |
| 0.0403 | 4.7183 | 27800 | 0.0505 | 48881728 |
| 0.0384 | 4.7522 | 28000 | 0.0505 | 49229248 |
| 0.0386 | 4.7862 | 28200 | 0.0505 | 49577952 |
| 0.0381 | 4.8201 | 28400 | 0.0504 | 49930752 |
| 0.0569 | 4.8541 | 28600 | 0.0504 | 50282304 |
| 0.0467 | 4.8880 | 28800 | 0.0503 | 50635840 |
| 0.0566 | 4.9220 | 29000 | 0.0504 | 50990240 |
| 0.0644 | 4.9559 | 29200 | 0.0504 | 51342976 |
| 0.0508 | 4.9899 | 29400 | 0.0504 | 51696320 |
| 0.0449 | 5.0238 | 29600 | 0.0505 | 52045952 |
| 0.0346 | 5.0577 | 29800 | 0.0503 | 52399008 |
| 0.0391 | 5.0917 | 30000 | 0.0503 | 52748704 |
| 0.0364 | 5.1256 | 30200 | 0.0503 | 53098368 |
| 0.1052 | 5.1595 | 30400 | 0.0503 | 53449792 |
| 0.0427 | 5.1935 | 30600 | 0.0503 | 53800640 |
| 0.0458 | 5.2274 | 30800 | 0.0504 | 54151264 |
| 0.0409 | 5.2614 | 31000 | 0.0503 | 54498144 |
| 0.0343 | 5.2953 | 31200 | 0.0504 | 54846400 |
| 0.0387 | 5.3293 | 31400 | 0.0504 | 55200448 |
| 0.0455 | 5.3632 | 31600 | 0.0504 | 55550048 |
| 0.0511 | 5.3972 | 31800 | 0.0502 | 55901856 |
| 0.0613 | 5.4311 | 32000 | 0.0501 | 56259904 |
| 0.0535 | 5.4651 | 32200 | 0.0503 | 56615008 |
| 0.0726 | 5.4990 | 32400 | 0.0503 | 56965760 |
| 0.0746 | 5.5329 | 32600 | 0.0501 | 57316960 |
| 0.0469 | 5.5669 | 32800 | 0.0502 | 57670080 |
| 0.0557 | 5.6008 | 33000 | 0.0502 | 58024256 |
| 0.0812 | 5.6348 | 33200 | 0.0502 | 58378976 |
| 0.0602 | 5.6687 | 33400 | 0.0502 | 58733184 |
| 0.0783 | 5.7027 | 33600 | 0.0501 | 59085760 |
| 0.0575 | 5.7366 | 33800 | 0.0500 | 59438720 |
| 0.0272 | 5.7706 | 34000 | 0.0501 | 59794048 |
| 0.0686 | 5.8045 | 34200 | 0.0500 | 60144576 |
| 0.0518 | 5.8385 | 34400 | 0.0500 | 60495264 |
| 0.0618 | 5.8724 | 34600 | 0.0500 | 60843616 |
| 0.0352 | 5.9064 | 34800 | 0.0501 | 61196096 |
| 0.0898 | 5.9403 | 35000 | 0.0501 | 61549696 |
| 0.0465 | 5.9742 | 35200 | 0.0501 | 61901760 |
| 0.046 | 6.0081 | 35400 | 0.0501 | 62248640 |
| 0.0488 | 6.0421 | 35600 | 0.0500 | 62595488 |
| 0.0674 | 6.0760 | 35800 | 0.0501 | 62948736 |
| 0.0656 | 6.1100 | 36000 | 0.0501 | 63302496 |
| 0.0447 | 6.1439 | 36200 | 0.0500 | 63654144 |
| 0.0308 | 6.1779 | 36400 | 0.0501 | 64010336 |
| 0.0497 | 6.2118 | 36600 | 0.0501 | 64362880 |
| 0.0423 | 6.2458 | 36800 | 0.0500 | 64717120 |
| 0.0387 | 6.2797 | 37000 | 0.0501 | 65067680 |
| 0.0581 | 6.3137 | 37200 | 0.0500 | 65417376 |
| 0.0499 | 6.3476 | 37400 | 0.0501 | 65768416 |
| 0.032 | 6.3816 | 37600 | 0.0500 | 66122624 |
| 0.0378 | 6.4155 | 37800 | 0.0500 | 66474048 |
| 0.039 | 6.4494 | 38000 | 0.0500 | 66825760 |
| 0.0259 | 6.4834 | 38200 | 0.0499 | 67179008 |
| 0.0396 | 6.5173 | 38400 | 0.0501 | 67533344 |
| 0.0395 | 6.5513 | 38600 | 0.0501 | 67884864 |
| 0.0633 | 6.5852 | 38800 | 0.0499 | 68234656 |
| 0.0553 | 6.6192 | 39000 | 0.0499 | 68586432 |
| 0.0398 | 6.6531 | 39200 | 0.0501 | 68938688 |
| 0.0567 | 6.6871 | 39400 | 0.0500 | 69288384 |
| 0.0382 | 6.7210 | 39600 | 0.0500 | 69637472 |
| 0.035 | 6.7550 | 39800 | 0.0500 | 69989056 |
| 0.0536 | 6.7889 | 40000 | 0.0500 | 70340640 |
### Framework versions
- PEFT 0.15.1
- Transformers 4.51.3
- PyTorch 2.6.0+cu124
- Datasets 3.5.0
- Tokenizers 0.21.1
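The framework versions above indicate this checkpoint is a PEFT adapter on top of the base model rather than full fine-tuned weights. A minimal loading sketch, assuming the adapter repo id `rbelanec/train_qnli_1744902609` and that you have access to the gated Llama-3 base weights (not tested here, since loading requires downloading the 8B model):

```python
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel

base_id = "meta-llama/Meta-Llama-3-8B-Instruct"
adapter_id = "rbelanec/train_qnli_1744902609"  # assumed adapter repo id

# Load the base model, then attach the fine-tuned adapter weights on top.
tokenizer = AutoTokenizer.from_pretrained(base_id)
base_model = AutoModelForCausalLM.from_pretrained(
    base_id, torch_dtype="auto", device_map="auto"
)
model = PeftModel.from_pretrained(base_model, adapter_id)
model.eval()
```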