---
license: apache-2.0
base_model: mistralai/Mistral-7B-v0.1
tags:
- generated_from_trainer
metrics:
- accuracy
model-index:
- name: Mistral-7B-v0.1_cola_relu
  results: []
---

<!-- This model card has been generated automatically according to the information the Trainer had access to. You
should probably proofread and complete it, then remove this comment. -->

# Mistral-7B-v0.1_cola_relu

This model is a fine-tuned version of [mistralai/Mistral-7B-v0.1](https://huggingface.co/mistralai/Mistral-7B-v0.1) on an unknown dataset.
It achieves the following results on the evaluation set:
- Loss: 0.3969
- Accuracy: 0.8528

## Model description

More information needed

## Intended uses & limitations

More information needed

## Training and evaluation data

More information needed

## Training procedure

### Training hyperparameters

The following hyperparameters were used during training:
- learning_rate: 1e-05
- train_batch_size: 64
- eval_batch_size: 64
- seed: 2
- distributed_type: multi-GPU
- num_devices: 2
- gradient_accumulation_steps: 2
- total_train_batch_size: 256
- total_eval_batch_size: 128
- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
- lr_scheduler_type: linear
- training_steps: 750
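As a worked check of the batch sizes above: total_train_batch_size is train_batch_size × num_devices × gradient_accumulation_steps = 64 × 2 × 2 = 256, and total_eval_batch_size is eval_batch_size × num_devices = 64 × 2 = 128, since evaluation does not accumulate gradients. The sketch below reproduces that arithmetic and also evaluates a linear learning-rate schedule intended to mirror the shape of the one in `transformers` (`get_linear_schedule_with_warmup`). It assumes zero warmup steps, because the card lists none, and a linear decay from 1e-05 to 0 over the 750 training steps; neither is confirmed by the card. The function and constant names are illustrative only.

```python
"""Batch-size arithmetic and a linear LR schedule for the listed hyperparameters (a sketch)."""

# Values copied from the hyperparameter list.
LEARNING_RATE = 1e-05
TRAIN_BATCH_SIZE = 64            # per device
EVAL_BATCH_SIZE = 64             # per device
NUM_DEVICES = 2
GRADIENT_ACCUMULATION_STEPS = 2
TRAINING_STEPS = 750
WARMUP_STEPS = 0                 # assumption: the card lists no warmup


def total_train_batch_size() -> int:
    """Examples consumed per optimizer step across all devices."""
    return TRAIN_BATCH_SIZE * NUM_DEVICES * GRADIENT_ACCUMULATION_STEPS


def total_eval_batch_size() -> int:
    """Evaluation accumulates no gradients, so only the device count multiplies."""
    return EVAL_BATCH_SIZE * NUM_DEVICES


def linear_lr(step: int) -> float:
    """Learning rate after `step` optimizer steps.

    Linear warmup from 0 to LEARNING_RATE over WARMUP_STEPS (none here),
    then linear decay to 0 at TRAINING_STEPS; clamped at 0 afterwards.
    """
    if step < WARMUP_STEPS:
        return LEARNING_RATE * step / max(1, WARMUP_STEPS)
    remaining = max(0, TRAINING_STEPS - step)
    return LEARNING_RATE * remaining / max(1, TRAINING_STEPS - WARMUP_STEPS)


if __name__ == "__main__":
    print(total_train_batch_size())  # 256
    print(total_eval_batch_size())   # 128
    for step in (0, 375, 570, 750):
        print(step, linear_lr(step))
```

Under these assumptions, the learning rate at the last logged step in the results table (570) would be 1e-05 × 180 / 750 = 2.4e-06.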

### Training results

| Training Loss | Epoch | Step | Validation Loss | Accuracy |
|:-------------:|:-----:|:----:|:---------------:|:--------:|
| 3.3308 | 0.33 | 10 | 3.2312 | 0.6721 |
| 1.9948 | 0.66 | 20 | 1.9259 | 0.5628 |
| 1.755 | 0.98 | 30 | 1.6666 | 0.6529 |
| 1.2472 | 1.31 | 40 | 1.3599 | 0.6280 |
| 0.7 | 1.64 | 50 | 1.0398 | 0.6903 |
| 1.0118 | 1.97 | 60 | 0.8845 | 0.6798 |
| 0.7947 | 2.3 | 70 | 0.7958 | 0.7200 |
| 0.8203 | 2.62 | 80 | 0.7160 | 0.7191 |
| 0.8548 | 2.95 | 90 | 0.6607 | 0.7296 |
| 0.5277 | 3.28 | 100 | 0.6292 | 0.7430 |
| 0.7134 | 3.61 | 110 | 0.6562 | 0.7440 |
| 0.7233 | 3.93 | 120 | 0.6248 | 0.7488 |
| 0.5547 | 4.26 | 130 | 0.5399 | 0.7488 |
| 0.5171 | 4.59 | 140 | 0.5230 | 0.7536 |
| 0.492 | 4.92 | 150 | 0.5184 | 0.7632 |
| 0.5003 | 5.25 | 160 | 0.4999 | 0.7728 |
| 0.4884 | 5.57 | 170 | 0.4827 | 0.7814 |
| 0.514 | 5.9 | 180 | 0.5048 | 0.7910 |
| 0.3669 | 6.23 | 190 | 0.4783 | 0.7977 |
| 0.4786 | 6.56 | 200 | 0.4533 | 0.7948 |
| 0.4244 | 6.89 | 210 | 0.4379 | 0.8035 |
| 0.3235 | 7.21 | 220 | 0.4439 | 0.8073 |
| 0.4307 | 7.54 | 230 | 0.4258 | 0.8236 |
| 0.404 | 7.87 | 240 | 0.4184 | 0.8188 |
| 0.3772 | 8.2 | 250 | 0.4089 | 0.8207 |
| 0.3937 | 8.52 | 260 | 0.4595 | 0.8092 |
| 0.3896 | 8.85 | 270 | 0.4148 | 0.8265 |
| 0.3296 | 9.18 | 280 | 0.4130 | 0.8236 |
| 0.328 | 9.51 | 290 | 0.3944 | 0.8389 |
| 0.3383 | 9.84 | 300 | 0.3862 | 0.8322 |
| 0.3146 | 10.16 | 310 | 0.3847 | 0.8418 |
| 0.3069 | 10.49 | 320 | 0.4192 | 0.8245 |
| 0.2732 | 10.82 | 330 | 0.4190 | 0.8313 |
| 0.2819 | 11.15 | 340 | 0.4427 | 0.8188 |
| 0.3738 | 11.48 | 350 | 0.3807 | 0.8408 |
| 0.3004 | 11.8 | 360 | 0.3722 | 0.8437 |
| 0.2894 | 12.13 | 370 | 0.3922 | 0.8341 |
| 0.2747 | 12.46 | 380 | 0.3782 | 0.8370 |
| 0.2812 | 12.79 | 390 | 0.3667 | 0.8514 |
| 0.2369 | 13.11 | 400 | 0.3884 | 0.8408 |
| 0.2931 | 13.44 | 410 | 0.3807 | 0.8456 |
| 0.2702 | 13.77 | 420 | 0.3742 | 0.8399 |
| 0.2821 | 14.1 | 430 | 0.3737 | 0.8485 |
| 0.2358 | 14.43 | 440 | 0.3739 | 0.8456 |
| 0.2326 | 14.75 | 450 | 0.3699 | 0.8514 |
| 0.2475 | 15.08 | 460 | 0.3771 | 0.8466 |
| 0.2402 | 15.41 | 470 | 0.4064 | 0.8351 |
| 0.2435 | 15.74 | 480 | 0.3758 | 0.8456 |
| 0.1896 | 16.07 | 490 | 0.3779 | 0.8456 |
| 0.2228 | 16.39 | 500 | 0.3868 | 0.8456 |
| 0.2149 | 16.72 | 510 | 0.3800 | 0.8485 |
| 0.1781 | 17.05 | 520 | 0.3841 | 0.8514 |
| 0.1729 | 17.38 | 530 | 0.4000 | 0.8476 |
| 0.1897 | 17.7 | 540 | 0.3866 | 0.8456 |
| 0.1537 | 18.03 | 550 | 0.4317 | 0.8370 |
| 0.1478 | 18.36 | 560 | 0.4197 | 0.8466 |
| 0.1686 | 18.69 | 570 | 0.4325 | 0.8418 |
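To read the results table programmatically, the sketch below copies its Validation Loss and Accuracy columns (one row every 10 steps) and reports the earliest step with the lowest validation loss and the earliest step with the highest accuracy. It only uses the logged rows: the table stops at step 570 of 750, and the headline evaluation results (Loss 0.3969, Accuracy 0.8528) do not match any row, so the sketch cannot say which checkpoint those numbers come from. `best_step` and `EVAL_EVERY_STEPS` are illustrative names, not part of any library.

```python
"""Pick the best logged evaluation step from the results table (a sketch)."""

# Columns copied from the table; row i was logged at step 10 * (i + 1).
VAL_LOSS = [
    3.2312, 1.9259, 1.6666, 1.3599, 1.0398, 0.8845, 0.7958, 0.7160, 0.6607, 0.6292,
    0.6562, 0.6248, 0.5399, 0.5230, 0.5184, 0.4999, 0.4827, 0.5048, 0.4783, 0.4533,
    0.4379, 0.4439, 0.4258, 0.4184, 0.4089, 0.4595, 0.4148, 0.4130, 0.3944, 0.3862,
    0.3847, 0.4192, 0.4190, 0.4427, 0.3807, 0.3722, 0.3922, 0.3782, 0.3667, 0.3884,
    0.3807, 0.3742, 0.3737, 0.3739, 0.3699, 0.3771, 0.4064, 0.3758, 0.3779, 0.3868,
    0.3800, 0.3841, 0.4000, 0.3866, 0.4317, 0.4197, 0.4325,
]
ACCURACY = [
    0.6721, 0.5628, 0.6529, 0.6280, 0.6903, 0.6798, 0.7200, 0.7191, 0.7296, 0.7430,
    0.7440, 0.7488, 0.7488, 0.7536, 0.7632, 0.7728, 0.7814, 0.7910, 0.7977, 0.7948,
    0.8035, 0.8073, 0.8236, 0.8188, 0.8207, 0.8092, 0.8265, 0.8236, 0.8389, 0.8322,
    0.8418, 0.8245, 0.8313, 0.8188, 0.8408, 0.8437, 0.8341, 0.8370, 0.8514, 0.8408,
    0.8456, 0.8399, 0.8485, 0.8456, 0.8514, 0.8466, 0.8351, 0.8456, 0.8456, 0.8456,
    0.8485, 0.8514, 0.8476, 0.8456, 0.8370, 0.8466, 0.8418,
]
EVAL_EVERY_STEPS = 10  # read off the Step column (10, 20, ..., 570)
STEPS = [EVAL_EVERY_STEPS * (i + 1) for i in range(len(VAL_LOSS))]


def best_step(values: list[float], higher_is_better: bool) -> tuple[int, float]:
    """Return (step, value) for the best logged value.

    min() and max() return the first of several equal candidates, so ties
    resolve to the earliest step (accuracy 0.8514 occurs at steps 390, 450, 520).
    """
    pairs = list(zip(STEPS, values))
    pick = max if higher_is_better else min
    return pick(pairs, key=lambda pair: pair[1])


if __name__ == "__main__":
    assert len(VAL_LOSS) == len(ACCURACY) == 57
    print("lowest validation loss:", best_step(VAL_LOSS, higher_is_better=False))  # (390, 0.3667)
    print("highest accuracy:", best_step(ACCURACY, higher_is_better=True))         # (390, 0.8514)
```

On the logged rows both criteria pick step 390 (epoch 12.79). After that point the training loss keeps falling (to 0.1686 at step 570) while the validation loss ends higher, at 0.4325, which is a common sign of overfitting.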


### Framework versions

- Transformers 4.35.2
- Pytorch 2.1.1+cu121
- Datasets 2.15.0
- Tokenizers 0.15.0
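When trying to reproduce the run, it can help to check an environment against these versions. A minimal sketch using only the standard library's `importlib.metadata`, assuming the PyPI distribution names `transformers`, `torch`, `datasets` and `tokenizers` (with "Pytorch" taken to mean `torch`). The comparison is an exact string match, so an install that reports plain `2.1.1`, without the `+cu121` local version tag that marks the CUDA 12.1 build, is flagged even if it is otherwise the same release. `PINNED` and `version_mismatches` are hypothetical names.

```python
"""Compare installed package versions with the card's Framework versions (a sketch)."""
from importlib import metadata
from typing import Callable, Dict, Optional, Tuple

# Pins copied from the list; "Pytorch" is assumed to be the `torch` distribution.
PINNED: Dict[str, str] = {
    "transformers": "4.35.2",
    "torch": "2.1.1+cu121",
    "datasets": "2.15.0",
    "tokenizers": "0.15.0",
}


def version_mismatches(
    pinned: Dict[str, str],
    get_version: Callable[[str], str] = metadata.version,
) -> Dict[str, Tuple[str, Optional[str]]]:
    """Return {distribution: (pinned, installed)} for every exact-string mismatch.

    `installed` is None when the distribution is not installed at all.
    """
    mismatches: Dict[str, Tuple[str, Optional[str]]] = {}
    for dist, wanted in pinned.items():
        try:
            found: Optional[str] = get_version(dist)
        except metadata.PackageNotFoundError:
            found = None
        if found != wanted:
            mismatches[dist] = (wanted, found)
    return mismatches


if __name__ == "__main__":
    # Output depends on the local environment; an empty dict means every pin matches.
    print(version_mismatches(PINNED))
```

The `get_version` parameter exists so the check can be exercised with a stub instead of the real environment.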
