---
license: other
base_model: typeof/phi-1_5
tags:
- generated_from_trainer
model-index:
- name: phi-kelm-out
  results: []
---

[<img src="https://raw.githubusercontent.com/OpenAccess-AI-Collective/axolotl/main/image/axolotl-badge-web.png" alt="Built with Axolotl" width="200" height="32"/>](https://github.com/OpenAccess-AI-Collective/axolotl)

# phi-kelm-out

This model is a fine-tuned version of [typeof/phi-1_5](https://huggingface.co/typeof/phi-1_5) on the Kelm Tiny dataset.
It achieves the following results on the evaluation set at the end of training (epoch 5.0, step 49750):

- Loss: 1.0079
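
For readers who prefer perplexity to raw loss, the sketch below converts the reported value. It assumes the evaluation loss is the mean per-token cross-entropy in nats, which is the usual convention for causal language model training but is not stated in this card. The helper name `perplexity` is illustrative only.

```python
import math

# Final evaluation loss reported in this card.
eval_loss = 1.0079


def perplexity(mean_nll: float) -> float:
    """Convert a mean per-token negative log-likelihood (in nats) to perplexity."""
    # Assumption: the loss is averaged per token and uses the natural logarithm.
    return math.exp(mean_nll)


final_ppl = perplexity(eval_loss)
print(f"eval loss {eval_loss:.4f} -> perplexity {final_ppl:.2f}")
```

Under that assumption the reported loss corresponds to a perplexity of roughly 2.74 on the evaluation set.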

## Model description

More information needed

## Intended uses & limitations

More information needed

## Training and evaluation data

More information needed

## Training procedure

### Training hyperparameters

The following hyperparameters were used during training:
- learning_rate: 3e-06
- train_batch_size: 1
- eval_batch_size: 1
- seed: 42
- optimizer: Adam with betas=(0.9,0.95) and epsilon=1e-05
- lr_scheduler_type: cosine
- lr_scheduler_warmup_steps: 1000
- num_epochs: 5
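
To make the scheduler settings concrete, here is a minimal sketch of a cosine schedule with linear warmup using the values listed above. It assumes the common formulation (linear ramp from 0 to the peak rate, then a half-cosine decay to 0) and takes the total step count of 49750 from the last row of the training results table. The function `lr_at` is a hypothetical helper, not part of the training code.

```python
import math

PEAK_LR = 3e-06       # learning_rate from the list above
WARMUP_STEPS = 1000   # lr_scheduler_warmup_steps from the list above
TOTAL_STEPS = 49750   # assumption: final step in the training results table


def lr_at(step: int) -> float:
    """Learning rate after `step` updates under linear warmup + cosine decay."""
    if step < WARMUP_STEPS:
        # Linear ramp from 0 up to the peak learning rate.
        return PEAK_LR * step / max(1, WARMUP_STEPS)
    # Fraction of the post-warmup phase completed, in [0, 1].
    progress = (step - WARMUP_STEPS) / max(1, TOTAL_STEPS - WARMUP_STEPS)
    # Half-cosine decay from the peak down to 0.
    return PEAK_LR * max(0.0, 0.5 * (1.0 + math.cos(math.pi * progress)))


for step in (0, 500, 1000, 25375, 49750):
    print(f"step {step:>6}: lr = {lr_at(step):.3e}")
```

With these settings the warmup covers only about the first 2% of training, so the learning rate sits on the cosine decay for nearly all of the five epochs.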

### Training results

| Training Loss | Epoch | Step  | Validation Loss |
|:-------------:|:-----:|:-----:|:---------------:|
| 7.8236        | 0.0   | 1     | 5.4714          |
| 4.156         | 0.1   | 995   | 4.0834          |
| 1.9418        | 0.2   | 1990  | 2.8447          |
| 1.8908        | 0.3   | 2985  | 2.2757          |
| 0.7631        | 0.4   | 3980  | 1.8792          |
| 1.0878        | 0.5   | 4975  | 1.4944          |
| 2.1561        | 0.6   | 5970  | 1.3413          |
| 0.452         | 0.7   | 6965  | 1.2682          |
| 2.1017        | 0.8   | 7960  | 1.2247          |
| 0.8352        | 0.9   | 8955  | 1.1999          |
| 5.1122        | 1.0   | 9950  | 1.1778          |
| 1.6136        | 1.1   | 10945 | 1.1515          |
| 2.3537        | 1.2   | 11940 | 1.1364          |
| 0.2987        | 1.3   | 12935 | 1.1391          |
| 0.747         | 1.4   | 13930 | 1.0977          |
| 0.0025        | 1.5   | 14925 | 1.0917          |
| 0.6355        | 1.6   | 15920 | 1.0630          |
| 0.5881        | 1.7   | 16915 | 1.0565          |
| 0.3181        | 1.8   | 17910 | 1.0568          |
| 0.9256        | 1.9   | 18905 | 1.0623          |
| 4.5318        | 2.0   | 19900 | 1.0678          |
| 0.8736        | 2.1   | 20895 | 1.0645          |
| 2.2079        | 2.2   | 21890 | 1.0474          |
| 2.7407        | 2.3   | 22885 | 1.0438          |
| 2.2308        | 2.4   | 23880 | 1.0485          |
| 0.4307        | 2.5   | 24875 | 1.0235          |
| 0.2956        | 2.6   | 25870 | 1.0201          |
| 0.203         | 2.7   | 26865 | 1.0200          |
| 2.2452        | 2.8   | 27860 | 1.0243          |
| 0.942         | 2.9   | 28855 | 1.0289          |
| 0.0069        | 3.0   | 29850 | 1.0181          |
| 3.2121        | 3.1   | 30845 | 1.0235          |
| 1.4533        | 3.2   | 31840 | 1.0127          |
| 0.208         | 3.3   | 32835 | 1.0110          |
| 0.1379        | 3.4   | 33830 | 1.0126          |
| 0.1991        | 3.5   | 34825 | 1.0103          |
| 1.3019        | 3.6   | 35820 | 1.0154          |
| 0.6602        | 3.7   | 36815 | 1.0178          |
| 0.5271        | 3.8   | 37810 | 1.0087          |
| 0.3131        | 3.9   | 38805 | 1.0092          |
| 3.6821        | 4.0   | 39800 | 1.0094          |
| 0.3724        | 4.1   | 40795 | 1.0093          |
| 0.0704        | 4.2   | 41790 | 1.0081          |
| 0.1209        | 4.3   | 42785 | 1.0108          |
| 0.9807        | 4.4   | 43780 | 1.0072          |
| 0.1392        | 4.5   | 44775 | 1.0078          |
| 0.2561        | 4.6   | 45770 | 1.0078          |
| 0.1533        | 4.7   | 46765 | 1.0089          |
| 0.4302        | 4.8   | 47760 | 1.0079          |
| 1.3744        | 4.9   | 48755 | 1.0074          |
| 0.8572        | 5.0   | 49750 | 1.0079          |
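
The final checkpoint is not necessarily the one with the lowest validation loss. The sketch below compares the two using the last ten rows of the table, copied by hand as `(step, validation loss)` pairs; these rows also contain the lowest validation loss in the whole table. Whether intermediate checkpoints were saved or published is not stated in this card, so treat this as a reading aid only.

```python
# (step, validation loss) pairs copied from epochs 4.1 to 5.0 of the table above.
rows = [
    (40795, 1.0093),
    (41790, 1.0081),
    (42785, 1.0108),
    (43780, 1.0072),
    (44775, 1.0078),
    (45770, 1.0078),
    (46765, 1.0089),
    (47760, 1.0079),
    (48755, 1.0074),
    (49750, 1.0079),
]

# Row with the smallest validation loss, and the last logged row.
best_step, best_loss = min(rows, key=lambda row: row[1])
final_step, final_loss = rows[-1]

print(f"best:  step {best_step}, validation loss {best_loss}")
print(f"final: step {final_step}, validation loss {final_loss}")
print(f"gap:   {final_loss - best_loss:.4f}")
```

The gap between the best and final rows is small, which suggests validation loss had largely plateaued by the last epoch.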

### Framework versions

- Transformers 4.35.1
- Pytorch 2.0.1+cu118
- Datasets 2.14.5
- Tokenizers 0.14.1