---
library_name: peft
license: other
base_model: deepseek-ai/deepseek-coder-1.3b-base
tags:
- generated_from_trainer
model-index:
- name: lemexp-task4-v2-small-deepseek-coder-1.3b-base-ddp-8lr-v2
  results: []
---

<!-- This model card has been generated automatically according to the information the Trainer had access to. You
should probably proofread and complete it, then remove this comment. -->

# lemexp-task4-v2-small-deepseek-coder-1.3b-base-ddp-8lr-v2

This model is a PEFT adapter fine-tuned from [deepseek-ai/deepseek-coder-1.3b-base](https://huggingface.co/deepseek-ai/deepseek-coder-1.3b-base) on an unknown dataset.
It achieves the following result on the evaluation set:
- Loss: 0.0416

## Model description

More information needed

## Intended uses & limitations

More information needed

## Training and evaluation data

More information needed

## Training procedure

### Training hyperparameters

The following hyperparameters were used during training:
- learning_rate: 0.0008
- train_batch_size: 2
- eval_batch_size: 2
- seed: 42
- distributed_type: multi-GPU
- num_devices: 8
- total_train_batch_size: 16 (2 per device × 8 devices)
- total_eval_batch_size: 16 (2 per device × 8 devices)
- optimizer: AdamW (`OptimizerNames.ADAMW_TORCH`) with betas=(0.9, 0.999), epsilon=1e-08, and no additional optimizer arguments
- lr_scheduler_type: linear
- num_epochs: 12
- mixed_precision_training: Native AMP
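The batch-size arithmetic and the linear schedule listed above can be sketched in plain Python. This is a minimal illustration, not the training code. It assumes no warmup and no gradient accumulation, since neither is listed. It also assumes 3,599 optimizer steps per epoch, a figure inferred from the Step and Epoch columns of the results table (for example, 3600 / 3599 ≈ 1.0003). The decay formula follows the shape of Transformers' linear schedule. The constant and function names are hypothetical.

```python
# Minimal sketch of the schedule described above; not the original training code.
# Assumptions (not stated in the card): no warmup, no gradient accumulation,
# and 3,599 optimizer steps per epoch (inferred from the results table).

LEARNING_RATE = 8e-4      # learning_rate
PER_DEVICE_BATCH = 2      # train_batch_size
NUM_DEVICES = 8           # num_devices
GRAD_ACCUM_STEPS = 1      # assumed: gradient accumulation is not listed
NUM_EPOCHS = 12           # num_epochs
STEPS_PER_EPOCH = 3599    # assumed: inferred from the Step/Epoch columns


def effective_batch_size() -> int:
    """Sequences consumed per optimizer step across all devices."""
    return PER_DEVICE_BATCH * NUM_DEVICES * GRAD_ACCUM_STEPS


def linear_lr(step: int, total_steps: int, warmup_steps: int = 0) -> float:
    """Linear warmup to LEARNING_RATE, then linear decay to zero at total_steps."""
    if step < warmup_steps:
        return LEARNING_RATE * step / max(1, warmup_steps)
    remaining = max(0, total_steps - step)
    return LEARNING_RATE * remaining / max(1, total_steps - warmup_steps)


if __name__ == "__main__":
    total = NUM_EPOCHS * STEPS_PER_EPOCH
    print("effective batch size:", effective_batch_size())
    print("total optimizer steps (estimated):", total)
    for step in (0, total // 2, total):
        print(f"step {step:>6}: lr = {linear_lr(step, total):.2e}")
```

The card lists no warmup, so the sketch defaults `warmup_steps` to zero. Passing a positive value exercises the warmup branch instead.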

### Training results

| Training Loss | Epoch | Step | Validation Loss |
|:-------------:|:-------:|:-----:|:---------------:|
| 0.1561 | 0.2001 | 720 | 0.0861 |
| 0.0845 | 0.4001 | 1440 | 0.0716 |
| 0.0724 | 0.6002 | 2160 | 0.0636 |
| 0.07 | 0.8002 | 2880 | 0.0615 |
| 0.0654 | 1.0003 | 3600 | 0.0629 |
| 0.0636 | 1.2003 | 4320 | 0.0623 |
| 0.0631 | 1.4004 | 5040 | 0.0600 |
| 0.0626 | 1.6004 | 5760 | 0.0609 |
| 0.0631 | 1.8005 | 6480 | 0.0562 |
| 0.061 | 2.0006 | 7200 | 0.0559 |
| 0.0597 | 2.2006 | 7920 | 0.0585 |
| 0.0591 | 2.4007 | 8640 | 0.0543 |
| 0.0553 | 2.6007 | 9360 | 0.0566 |
| 0.0572 | 2.8008 | 10080 | 0.0528 |
| 0.058 | 3.0008 | 10800 | 0.0504 |
| 0.0543 | 3.2009 | 11520 | 0.0512 |
| 0.054 | 3.4009 | 12240 | 0.0537 |
| 0.0554 | 3.6010 | 12960 | 0.0520 |
| 0.0532 | 3.8011 | 13680 | 0.0520 |
| 0.0551 | 4.0011 | 14400 | 0.0513 |
| 0.0514 | 4.2012 | 15120 | 0.0527 |
| 0.0525 | 4.4012 | 15840 | 0.0498 |
| 0.0509 | 4.6013 | 16560 | 0.0491 |
| 0.0519 | 4.8013 | 17280 | 0.0501 |
| 0.0519 | 5.0014 | 18000 | 0.0497 |
| 0.0503 | 5.2014 | 18720 | 0.0496 |
| 0.0489 | 5.4015 | 19440 | 0.0523 |
| 0.05 | 5.6016 | 20160 | 0.0478 |
| 0.0508 | 5.8016 | 20880 | 0.0467 |
| 0.047 | 6.0017 | 21600 | 0.0471 |
| 0.0477 | 6.2017 | 22320 | 0.0472 |
| 0.0469 | 6.4018 | 23040 | 0.0474 |
| 0.0484 | 6.6018 | 23760 | 0.0459 |
| 0.0478 | 6.8019 | 24480 | 0.0453 |
| 0.0472 | 7.0019 | 25200 | 0.0460 |
| 0.0459 | 7.2020 | 25920 | 0.0446 |
| 0.0454 | 7.4021 | 26640 | 0.0443 |
| 0.0454 | 7.6021 | 27360 | 0.0461 |
| 0.0453 | 7.8022 | 28080 | 0.0455 |
| 0.0453 | 8.0022 | 28800 | 0.0439 |
| 0.0449 | 8.2023 | 29520 | 0.0437 |
| 0.0447 | 8.4023 | 30240 | 0.0429 |
| 0.0446 | 8.6024 | 30960 | 0.0427 |
| 0.0437 | 8.8024 | 31680 | 0.0441 |
| 0.0437 | 9.0025 | 32400 | 0.0434 |
| 0.0428 | 9.2026 | 33120 | 0.0426 |
| 0.0431 | 9.4026 | 33840 | 0.0417 |
| 0.0428 | 9.6027 | 34560 | 0.0421 |
| 0.0428 | 9.8027 | 35280 | 0.0422 |
| 0.0424 | 10.0028 | 36000 | 0.0425 |
| 0.0422 | 10.2028 | 36720 | 0.0423 |
| 0.042 | 10.4029 | 37440 | 0.0424 |
| 0.0417 | 10.6029 | 38160 | 0.0419 |
| 0.0414 | 10.8030 | 38880 | 0.0424 |
| 0.0413 | 11.0031 | 39600 | 0.0417 |
| 0.0415 | 11.2031 | 40320 | 0.0415 |
| 0.0413 | 11.4032 | 41040 | 0.0418 |
| 0.0412 | 11.6032 | 41760 | 0.0418 |
| 0.0412 | 11.8033 | 42480 | 0.0416 |
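A short sketch for reading the table follows. The validation losses are copied from it, and each row was logged 720 steps after the previous one. The sketch assumes 3,599 optimizer steps per epoch. That figure is inferred rather than stated, but `round(step / 3599, 4)` reproduces the Epoch column. With it, the code recovers the epoch of any logged step and locates the lowest validation loss. The lowest value in the table, 0.0415 at step 40320, sits marginally below the final 0.0416 reported at the top of this card. The helper names are hypothetical.

```python
# Sketch for reading the results table; helper names are hypothetical.
# Validation losses are copied from the table, one per logged row.
# Assumption: 3,599 optimizer steps per epoch, inferred from the Epoch column.

EVAL_EVERY = 720          # steps between logged rows
STEPS_PER_EPOCH = 3599    # assumed (inferred), not stated in the card

VAL_LOSSES = [
    0.0861, 0.0716, 0.0636, 0.0615, 0.0629, 0.0623, 0.0600, 0.0609, 0.0562, 0.0559,
    0.0585, 0.0543, 0.0566, 0.0528, 0.0504, 0.0512, 0.0537, 0.0520, 0.0520, 0.0513,
    0.0527, 0.0498, 0.0491, 0.0501, 0.0497, 0.0496, 0.0523, 0.0478, 0.0467, 0.0471,
    0.0472, 0.0474, 0.0459, 0.0453, 0.0460, 0.0446, 0.0443, 0.0461, 0.0455, 0.0439,
    0.0437, 0.0429, 0.0427, 0.0441, 0.0434, 0.0426, 0.0417, 0.0421, 0.0422, 0.0425,
    0.0423, 0.0424, 0.0419, 0.0424, 0.0417, 0.0415, 0.0418, 0.0418, 0.0416,
]


def epoch_at(step: int, steps_per_epoch: int = STEPS_PER_EPOCH) -> float:
    """Fractional epoch for an optimizer step, rounded like the Epoch column."""
    return round(step / steps_per_epoch, 4)


def best_checkpoint(losses: list, every: int = EVAL_EVERY) -> tuple:
    """Return (step, loss) for the lowest logged validation loss (first on ties)."""
    idx = min(range(len(losses)), key=losses.__getitem__)
    return (idx + 1) * every, losses[idx]


if __name__ == "__main__":
    step, loss = best_checkpoint(VAL_LOSSES)
    print(f"lowest validation loss: {loss} at step {step} (epoch {epoch_at(step)})")
    last_step = len(VAL_LOSSES) * EVAL_EVERY
    print(f"last logged validation loss: {VAL_LOSSES[-1]} at step {last_step} "
          f"(epoch {epoch_at(last_step)})")
```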


### Framework versions

- PEFT 0.14.0
- Transformers 4.47.0
- Pytorch 2.5.1+cu124
- Datasets 3.2.0
- Tokenizers 0.21.0
