pythia-2.8b-sft

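This model is a fine-tuned version of EleutherAI/pythia-2.8b. The card does not name the training dataset (the auto-generated field reads "None"). Evaluation results are logged every 100 steps in the table at the end of this card; the last logged validation loss is 1.6671 (step 2200).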

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed

The following hyperparameters were used during training:

learning_rate: 2e-05
train_batch_size: 8
eval_batch_size: 8
seed: 42
gradient_accumulation_steps: 8
total_train_batch_size: 64
optimizer: AdamW (OptimizerNames.ADAMW_TORCH) with betas=(0.9, 0.999) and epsilon=1e-08; no additional optimizer arguments
lr_scheduler_type: cosine
lr_scheduler_warmup_steps: 100
num_epochs: 1.0
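
As a rough illustration of how these settings combine, the sketch below recomputes the effective batch size and evaluates a learning-rate curve with linear warmup followed by cosine decay. The device count of 1 (consistent with 8 x 8 = 64) and the total step count (estimated from the first row of the results table) are assumptions, not values stated in this card, and the helper names are hypothetical.

```python
import math

# Values copied from the hyperparameter list above.
LEARNING_RATE = 2e-05
TRAIN_BATCH_SIZE = 8
GRADIENT_ACCUMULATION_STEPS = 8
WARMUP_STEPS = 100

# ASSUMPTION: a single device; the card does not state the device count.
NUM_DEVICES = 1
# ASSUMPTION: total optimizer steps in the one epoch, estimated from the
# results table (step 100 is logged at epoch 0.0442).
TOTAL_STEPS = round(100 / 0.0442)


def effective_batch_size(per_device, accumulation, devices):
    """Number of sequences contributing to each optimizer step."""
    return per_device * accumulation * devices


def lr_at(step, peak=LEARNING_RATE, warmup=WARMUP_STEPS, total=TOTAL_STEPS):
    """Linear warmup to `peak`, then cosine decay to zero at `total`."""
    if step < warmup:
        return peak * step / warmup
    progress = (step - warmup) / max(1, total - warmup)
    return peak * 0.5 * (1.0 + math.cos(math.pi * min(1.0, progress)))


if __name__ == "__main__":
    print("effective batch size:",
          effective_batch_size(TRAIN_BATCH_SIZE, GRADIENT_ACCUMULATION_STEPS, NUM_DEVICES))
    for step in (0, 50, 100, 1000, 2200):
        print(step, f"{lr_at(step):.3e}")
```

Under these assumptions the learning rate peaks at 2e-05 at step 100 and is close to zero by the last logged step, which fits the flattening of the validation loss late in training.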

Training Loss	Epoch	Step	Validation Loss
1.8621	0.0442	100	1.7438
1.7909	0.0884	200	1.7135
1.7775	0.1327	300	1.7020
1.7587	0.1769	400	1.6937
1.7683	0.2211	500	1.6876
1.7488	0.2653	600	1.6824
1.7646	0.3096	700	1.6799
1.7557	0.3538	800	1.6776
1.7485	0.3980	900	1.6743
1.7368	0.4422	1000	1.6729
1.7298	0.4865	1100	1.6705
1.7525	0.5307	1200	1.6724
1.7386	0.5749	1300	1.6703
1.7325	0.6191	1400	1.6684
1.7306	0.6633	1500	1.6682
1.7262	0.7076	1600	1.6669
1.7333	0.7518	1700	1.6675
1.7318	0.7960	1800	1.6673
1.7293	0.8402	1900	1.6668
1.7326	0.8845	2000	1.6671
1.7378	0.9287	2100	1.6668
1.7259	0.9729	2200	1.6671
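
The table above can be summarized with a few lines of arithmetic. The sketch below finds the lowest validation loss, converts a loss to perplexity, and estimates the number of steps and training examples per epoch. It assumes the reported loss is a mean per-token cross-entropy in natural-log units, which the card does not state explicitly; the function names are hypothetical.

```python
import math

# (training loss, epoch, step, validation loss), copied from the table above.
ROWS = [
    (1.8621, 0.0442, 100, 1.7438), (1.7909, 0.0884, 200, 1.7135),
    (1.7775, 0.1327, 300, 1.7020), (1.7587, 0.1769, 400, 1.6937),
    (1.7683, 0.2211, 500, 1.6876), (1.7488, 0.2653, 600, 1.6824),
    (1.7646, 0.3096, 700, 1.6799), (1.7557, 0.3538, 800, 1.6776),
    (1.7485, 0.3980, 900, 1.6743), (1.7368, 0.4422, 1000, 1.6729),
    (1.7298, 0.4865, 1100, 1.6705), (1.7525, 0.5307, 1200, 1.6724),
    (1.7386, 0.5749, 1300, 1.6703), (1.7325, 0.6191, 1400, 1.6684),
    (1.7306, 0.6633, 1500, 1.6682), (1.7262, 0.7076, 1600, 1.6669),
    (1.7333, 0.7518, 1700, 1.6675), (1.7318, 0.7960, 1800, 1.6673),
    (1.7293, 0.8402, 1900, 1.6668), (1.7326, 0.8845, 2000, 1.6671),
    (1.7378, 0.9287, 2100, 1.6668), (1.7259, 0.9729, 2200, 1.6671),
]

TOTAL_TRAIN_BATCH_SIZE = 64  # from the hyperparameter list


def perplexity(loss):
    """ASSUMPTION: loss is mean per-token cross-entropy in nats."""
    return math.exp(loss)


def steps_per_epoch(rows):
    """Estimate steps in one epoch from the last logged (epoch, step) pair."""
    _, epoch, step, _ = rows[-1]
    return step / epoch


# First row with the lowest validation loss (ties resolve to the earliest step).
best_row = min(ROWS, key=lambda row: row[3])

if __name__ == "__main__":
    print("best validation loss:", best_row[3], "at step", best_row[2])
    print("final validation perplexity:", round(perplexity(ROWS[-1][3]), 2))
    est_steps = steps_per_epoch(ROWS)
    print("estimated steps per epoch:", round(est_steps))
    print("estimated training examples:", round(est_steps * TOTAL_TRAIN_BATCH_SIZE))
```

The validation loss changes by less than 0.001 after step 1600, so most of the measurable improvement happens in the first two thirds of the epoch.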