2025-08-29 01:51:16 - pico-train - INFO - Step 0 -- Evaluation Results
2025-08-29 01:51:16 - pico-train - INFO - └── paloma: inf
2025-08-29 01:51:17 - pico-train - INFO - ==================================================
2025-08-29 01:51:17 - pico-train - INFO - ✨ Training Configuration
2025-08-29 01:51:17 - pico-train - INFO - ==================================================
2025-08-29 01:51:17 - pico-train - INFO - ╭─────────────────────────────────────────────────────╮
2025-08-29 01:51:17 - pico-train - INFO - │ checkpointing:                                      │
2025-08-29 01:51:17 - pico-train - INFO - │   checkpoints_dir: checkpoints                      │
2025-08-29 01:51:17 - pico-train - INFO - │   evaluation:                                       │
2025-08-29 01:51:17 - pico-train - INFO - │     eval_results_dir: eval_results                  │
2025-08-29 01:51:17 - pico-train - INFO - │   fabric_checkpoint_dir: fabric_state               │
2025-08-29 01:51:17 - pico-train - INFO - │   fabric_checkpoint_filename: checkpoint.pt         │
2025-08-29 01:51:17 - pico-train - INFO - │   hf_checkpoint:                                    │
2025-08-29 01:51:17 - pico-train - INFO - │     collection_slug: null                           │
2025-08-29 01:51:17 - pico-train - INFO - │     repo_id: ThomasTheMaker/pico-decoder-tiny       │
2025-08-29 01:51:17 - pico-train - INFO - │   learning_dynamics:                                │
2025-08-29 01:51:17 - pico-train - INFO - │     batch_size: 1                                   │
2025-08-29 01:51:17 - pico-train - INFO - │     eval_data: null                                 │
2025-08-29 01:51:17 - pico-train - INFO - │     layer_suffixes:                                 │
2025-08-29 01:51:17 - pico-train - INFO - │     - attention.v_proj                              │
2025-08-29 01:51:17 - pico-train - INFO - │     - attention.o_proj                              │
2025-08-29 01:51:17 - pico-train - INFO - │     - swiglu.w_2                                    │
2025-08-29 01:51:17 - pico-train - INFO - │     sequence_idx: -1                                │
2025-08-29 01:51:17 - pico-train - INFO - │   learning_dynamics_dir: learning_dynamics          │
2025-08-29 01:51:17 - pico-train - INFO - │   logs_dir: logs                                    │
2025-08-29 01:51:17 - pico-train - INFO - │   run_name: pico-decoder-tiny-dolma29k-v3           │
2025-08-29 01:51:17 - pico-train - INFO - │   runs_dir: runs                                    │
2025-08-29 01:51:17 - pico-train - INFO - │   save_every_n_steps: 500                           │
2025-08-29 01:51:17 - pico-train - INFO - │   save_to_hf: true                                  │
2025-08-29 01:51:17 - pico-train - INFO - │   training:                                         │
2025-08-29 01:51:17 - pico-train - INFO - │     auto_resume: true                               │
2025-08-29 01:51:17 - pico-train - INFO - │ data:                                               │
2025-08-29 01:51:17 - pico-train - INFO - │   dataloader:                                       │
2025-08-29 01:51:17 - pico-train - INFO - │     batch_size: 4                                   │
2025-08-29 01:51:17 - pico-train - INFO - │   dataset:                                          │
2025-08-29 01:51:17 - pico-train - INFO - │     name: pico-lm/pretokenized-dolma                │
2025-08-29 01:51:17 - pico-train - INFO - │   tokenizer:                                        │
2025-08-29 01:51:17 - pico-train - INFO - │     name: allenai/OLMo-7B-0724-hf                   │
2025-08-29 01:51:17 - pico-train - INFO - │     vocab_size: 50304                               │
2025-08-29 01:51:17 - pico-train - INFO - │ evaluation:                                         │
2025-08-29 01:51:17 - pico-train - INFO - │   metrics:                                          │
2025-08-29 01:51:17 - pico-train - INFO - │   - paloma                                          │
2025-08-29 01:51:17 - pico-train - INFO - │   paloma:                                           │
2025-08-29 01:51:17 - pico-train - INFO - │     batch_size: 1                                   │
2025-08-29 01:51:17 - pico-train - INFO - │     dataset_name: pico-lm/pretokenized-paloma-tinsy │
2025-08-29 01:51:17 - pico-train - INFO - │     dataset_split: val                              │
2025-08-29 01:51:17 - pico-train - INFO - │     max_length: 2048                                │
2025-08-29 01:51:17 - pico-train - INFO - │ model:                                              │
2025-08-29 01:51:17 - pico-train - INFO - │   activation_hidden_dim: 384                        │
2025-08-29 01:51:17 - pico-train - INFO - │   attention_n_heads: 12                             │
2025-08-29 01:51:17 - pico-train - INFO - │   attention_n_kv_heads: 4                           │
2025-08-29 01:51:17 - pico-train - INFO - │   batch_size: 1024                                  │
2025-08-29 01:51:17 - pico-train - INFO - │   d_model: 96                                       │
2025-08-29 01:51:17 - pico-train - INFO - │   max_seq_len: 2048                                 │
2025-08-29 01:51:17 - pico-train - INFO - │   model_type: pico_decoder                          │
2025-08-29 01:51:17 - pico-train - INFO - │   n_layers: 12                                      │
2025-08-29 01:51:17 - pico-train - INFO - │   norm_eps: 1.0e-06                                 │
2025-08-29 01:51:17 - pico-train - INFO - │   position_emb_theta: 10000.0                       │
2025-08-29 01:51:17 - pico-train - INFO - │   vocab_size: 50304                                 │
2025-08-29 01:51:17 - pico-train - INFO - │ monitoring:                                         │
2025-08-29 01:51:17 - pico-train - INFO - │   logging:                                          │
2025-08-29 01:51:17 - pico-train - INFO - │     log_every_n_steps: 25                           │
2025-08-29 01:51:17 - pico-train - INFO - │     log_level: INFO                                 │
2025-08-29 01:51:17 - pico-train - INFO - │   save_to_wandb: false                              │
2025-08-29 01:51:17 - pico-train - INFO - │   wandb:                                            │
2025-08-29 01:51:17 - pico-train - INFO - │     entity: boymyc                                  │
2025-08-29 01:51:17 - pico-train - INFO - │     project: pico-decoder-tiny                      │
2025-08-29 01:51:17 - pico-train - INFO - │ training:                                           │
2025-08-29 01:51:17 - pico-train - INFO - │   fabric:                                           │
2025-08-29 01:51:17 - pico-train - INFO - │     accelerator: cuda                               │
2025-08-29 01:51:17 - pico-train - INFO - │     num_devices: 1                                  │
2025-08-29 01:51:17 - pico-train - INFO - │     num_nodes: 1                                    │
2025-08-29 01:51:17 - pico-train - INFO - │     precision: bf16-mixed                           │
2025-08-29 01:51:17 - pico-train - INFO - │   max_steps: 20000                                  │
2025-08-29 01:51:17 - pico-train - INFO - │   optimization:                                     │
2025-08-29 01:51:17 - pico-train - INFO - │     gradient_accumulation_steps: 4                  │
2025-08-29 01:51:17 - pico-train - INFO - │     lr: 5.0e-05                                     │
2025-08-29 01:51:17 - pico-train - INFO - │     lr_scheduler: linear_with_warmup                │
2025-08-29 01:51:17 - pico-train - INFO - │     lr_warmup_steps: 8000                           │
2025-08-29 01:51:17 - pico-train - INFO - │     optimizer: adamw                                │
2025-08-29 01:51:17 - pico-train - INFO - │                                                     │
2025-08-29 01:51:17 - pico-train - INFO - ╰─────────────────────────────────────────────────────╯
2025-08-29 01:51:17 - pico-train - INFO - ==================================================
2025-08-29 01:51:17 - pico-train - INFO - Runtime Summary:
2025-08-29 01:51:17 - pico-train - INFO - ==================================================
2025-08-29 01:51:17 - pico-train - INFO - Starting from step: 0
2025-08-29 01:51:17 - pico-train - INFO - Model Setup:
2025-08-29 01:51:17 - pico-train - INFO - ├─ Total Parameters: 11,282,784
2025-08-29 01:51:17 - pico-train - INFO - └─ Trainable Parameters: 11,282,784
2025-08-29 01:51:17 - pico-train - INFO - Distributed Setup:
2025-08-29 01:51:17 - pico-train - INFO - ├─ Number of Devices: 1
2025-08-29 01:51:17 - pico-train - INFO - ├─ Device Type: NVIDIA GeForce RTX 5090
2025-08-29 01:51:17 - pico-train - INFO - └─ Available Memory: 33.68 GB
2025-08-29 01:51:17 - pico-train - INFO - Software Setup:
2025-08-29 01:51:17 - pico-train - INFO - ├─ Python Version: 3.10.12
2025-08-29 01:51:17 - pico-train - INFO - ├─ PyTorch Version: 2.8.0+cu128
2025-08-29 01:51:17 - pico-train - INFO - ├─ CUDA Version: 12.8
2025-08-29 01:51:17 - pico-train - INFO - └─ Operating System: Linux 6.8.0-63-generic
2025-08-29 01:51:17 - pico-train - INFO - Batch Size Configuration:
2025-08-29 01:51:17 - pico-train - INFO - ├─ Global Batch Size: 4
2025-08-29 01:51:17 - pico-train - INFO - ├─ Per Device Batch Size: 1
2025-08-29 01:51:17 - pico-train - INFO - └─ Gradient Accumulation Steps: 4
2025-08-29 01:51:17 - pico-train - INFO - ==================================================
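
The batch-size figures in the runtime summary are consistent with simple arithmetic: a global batch of 4 split over 1 device and 4 gradient accumulation steps leaves 1 sample per device per micro-step. A minimal sketch of that relationship follows. It assumes the per-device size is the global batch divided by (devices x accumulation steps); the function name is hypothetical and is not taken from pico-train.

```python
def per_device_batch_size(global_batch_size: int,
                          num_devices: int,
                          grad_accum_steps: int) -> int:
    """Samples each device processes per micro-step (assumed relationship)."""
    denom = num_devices * grad_accum_steps
    if global_batch_size % denom != 0:
        raise ValueError(
            f"global batch {global_batch_size} is not divisible by "
            f"devices * accumulation steps = {denom}"
        )
    return global_batch_size // denom


# Values from the runtime summary: global batch 4, 1 device, 4 accumulation steps.
print(per_device_batch_size(4, 1, 4))
```

With the logged values this gives 1, matching the "Per Device Batch Size" line.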
2025-08-29 01:51:18 - pico-train - INFO - Step 0 -- Training Metrics
2025-08-29 01:51:18 - pico-train - INFO - ├── Loss: 10.9975
2025-08-29 01:51:18 - pico-train - INFO - ├── Learning Rate: 0.00e+00
2025-08-29 01:51:18 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-29 01:51:18 - pico-train - INFO - Step 0 -- Saving Learning Dynamics
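
The Step 0 loss of 10.9975 sits close to the cross-entropy of a model that assigns equal probability to every token, ln(vocab_size), which is about 10.83 for the configured vocab_size of 50304. A minimal sketch of that sanity check follows; the function name is hypothetical and the comparison is only an illustration, not something pico-train reports.

```python
import math


def uniform_cross_entropy(vocab_size: int) -> float:
    """Cross-entropy in nats of a uniform distribution over the vocabulary."""
    return math.log(vocab_size)


baseline = uniform_cross_entropy(50304)   # vocab_size from the logged config
logged_step0_loss = 10.9975               # Loss reported at Step 0

print(f"uniform baseline:  {baseline:.4f}")
print(f"logged minus base: {logged_step0_loss - baseline:.4f}")
```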
2025-08-29 01:51:33 - pico-train - INFO - Step 25 -- Training Metrics
2025-08-29 01:51:33 - pico-train - INFO - ├── Loss: 10.9972
2025-08-29 01:51:33 - pico-train - INFO - ├── Learning Rate: 1.56e-07
2025-08-29 01:51:33 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-29 01:51:46 - pico-train - INFO - Step 50 -- Training Metrics
2025-08-29 01:51:46 - pico-train - INFO - ├── Loss: 11.0030
2025-08-29 01:51:46 - pico-train - INFO - ├── Learning Rate: 3.13e-07
2025-08-29 01:51:46 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-29 01:51:58 - pico-train - INFO - Step 75 -- Training Metrics
2025-08-29 01:51:58 - pico-train - INFO - ├── Loss: 11.0034
2025-08-29 01:51:58 - pico-train - INFO - ├── Learning Rate: 4.69e-07
2025-08-29 01:51:58 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-29 01:52:11 - pico-train - INFO - Step 100 -- Training Metrics
2025-08-29 01:52:11 - pico-train - INFO - ├── Loss: 10.9962
2025-08-29 01:52:11 - pico-train - INFO - ├── Learning Rate: 6.25e-07
2025-08-29 01:52:11 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-29 01:52:24 - pico-train - INFO - Step 125 -- Training Metrics
2025-08-29 01:52:24 - pico-train - INFO - ├── Loss: 10.9973
2025-08-29 01:52:24 - pico-train - INFO - ├── Learning Rate: 7.81e-07
2025-08-29 01:52:24 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-29 01:52:36 - pico-train - INFO - Step 150 -- Training Metrics
2025-08-29 01:52:36 - pico-train - INFO - ├── Loss: 10.9943
2025-08-29 01:52:36 - pico-train - INFO - ├── Learning Rate: 9.38e-07
2025-08-29 01:52:36 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-29 01:52:49 - pico-train - INFO - Step 175 -- Training Metrics
2025-08-29 01:52:49 - pico-train - INFO - ├── Loss: 10.9860
2025-08-29 01:52:49 - pico-train - INFO - ├── Learning Rate: 1.09e-06
2025-08-29 01:52:49 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-29 01:53:02 - pico-train - INFO - Step 200 -- Training Metrics
2025-08-29 01:53:02 - pico-train - INFO - ├── Loss: 10.9885
2025-08-29 01:53:02 - pico-train - INFO - ├── Learning Rate: 1.25e-06
2025-08-29 01:53:02 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-29 01:53:14 - pico-train - INFO - Step 225 -- Training Metrics
2025-08-29 01:53:14 - pico-train - INFO - ├── Loss: 10.9816
2025-08-29 01:53:14 - pico-train - INFO - ├── Learning Rate: 1.41e-06
2025-08-29 01:53:14 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-29 01:53:27 - pico-train - INFO - Step 250 -- Training Metrics
2025-08-29 01:53:27 - pico-train - INFO - ├── Loss: 10.9786
2025-08-29 01:53:27 - pico-train - INFO - ├── Learning Rate: 1.56e-06
2025-08-29 01:53:27 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-29 01:53:40 - pico-train - INFO - Step 275 -- Training Metrics
2025-08-29 01:53:40 - pico-train - INFO - ├── Loss: 10.9707
2025-08-29 01:53:40 - pico-train - INFO - ├── Learning Rate: 1.72e-06
2025-08-29 01:53:40 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-29 01:53:53 - pico-train - INFO - Step 300 -- Training Metrics
2025-08-29 01:53:53 - pico-train - INFO - ├── Loss: 10.9700
2025-08-29 01:53:53 - pico-train - INFO - ├── Learning Rate: 1.88e-06
2025-08-29 01:53:53 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-29 01:54:05 - pico-train - INFO - Step 325 -- Training Metrics
2025-08-29 01:54:05 - pico-train - INFO - ├── Loss: 10.9626
2025-08-29 01:54:05 - pico-train - INFO - ├── Learning Rate: 2.03e-06
2025-08-29 01:54:05 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-29 01:54:18 - pico-train - INFO - Step 350 -- Training Metrics
2025-08-29 01:54:18 - pico-train - INFO - ├── Loss: 10.9580
2025-08-29 01:54:18 - pico-train - INFO - ├── Learning Rate: 2.19e-06
2025-08-29 01:54:18 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-29 01:54:31 - pico-train - INFO - Step 375 -- Training Metrics
2025-08-29 01:54:31 - pico-train - INFO - ├── Loss: 10.9486
2025-08-29 01:54:31 - pico-train - INFO - ├── Learning Rate: 2.34e-06
2025-08-29 01:54:31 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-29 01:54:44 - pico-train - INFO - Step 400 -- Training Metrics
2025-08-29 01:54:44 - pico-train - INFO - ├── Loss: 10.9417
2025-08-29 01:54:44 - pico-train - INFO - ├── Learning Rate: 2.50e-06
2025-08-29 01:54:44 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-29 01:54:57 - pico-train - INFO - Step 425 -- Training Metrics
2025-08-29 01:54:57 - pico-train - INFO - ├── Loss: 10.9328
2025-08-29 01:54:57 - pico-train - INFO - ├── Learning Rate: 2.66e-06
2025-08-29 01:54:57 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-29 01:55:10 - pico-train - INFO - Step 450 -- Training Metrics
2025-08-29 01:55:10 - pico-train - INFO - ├── Loss: 10.9242
2025-08-29 01:55:10 - pico-train - INFO - ├── Learning Rate: 2.81e-06
2025-08-29 01:55:10 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-29 01:55:22 - pico-train - INFO - Step 475 -- Training Metrics
2025-08-29 01:55:22 - pico-train - INFO - ├── Loss: 10.9170
2025-08-29 01:55:22 - pico-train - INFO - ├── Learning Rate: 2.97e-06
2025-08-29 01:55:22 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-29 01:55:35 - pico-train - INFO - Step 500 -- 💾 Saving Checkpoint
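
The logged learning rates are consistent with a linear ramp from 0 to the configured lr of 5.0e-05 over lr_warmup_steps = 8000. A minimal sketch of that ramp follows, checked against a few of the logged values. It assumes the warmup phase is lr * step / lr_warmup_steps and covers warmup only; the function name is hypothetical and this is not pico-train's scheduler code.

```python
def warmup_lr(step: int, peak_lr: float = 5.0e-05, warmup_steps: int = 8000) -> float:
    """Learning rate during linear warmup (assumed form: peak_lr * step / warmup_steps)."""
    if not 0 <= step <= warmup_steps:
        raise ValueError("this sketch only models the warmup phase")
    return peak_lr * step / warmup_steps


# (step, learning rate as printed in the log)
logged = [(0, "0.00e+00"), (25, "1.56e-07"), (100, "6.25e-07"),
          (250, "1.56e-06"), (475, "2.97e-06")]

for step, logged_lr in logged:
    # Format with two decimals in scientific notation, as the log does.
    print(f"step {step:>3}: computed {warmup_lr(step):.2e}  logged {logged_lr}")
```

For the steps listed, the computed and logged values agree to the two decimals the log prints. At this rate the learning rate reaches only about 3.1e-06 by step 500, which fits the slow decline in loss over this stretch.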