2025-08-30 01:43:03 - pico-train - INFO - Step 32000 -- Evaluation Results
2025-08-30 01:43:03 - pico-train - INFO - └── paloma: 2.977755235898109e+26
2025-08-30 01:43:05 - pico-train - INFO - ==================================================
2025-08-30 01:43:05 - pico-train - INFO - ✨ Training Configuration
2025-08-30 01:43:05 - pico-train - INFO - ==================================================
2025-08-30 01:43:05 - pico-train - INFO - ╭─────────────────────────────────────────────────────╮
2025-08-30 01:43:05 - pico-train - INFO - │ checkpointing:                                      │
2025-08-30 01:43:05 - pico-train - INFO - │   checkpoints_dir: checkpoints                      │
2025-08-30 01:43:05 - pico-train - INFO - │   evaluation:                                       │
2025-08-30 01:43:05 - pico-train - INFO - │     eval_results_dir: eval_results                  │
2025-08-30 01:43:05 - pico-train - INFO - │   fabric_checkpoint_dir: fabric_state               │
2025-08-30 01:43:05 - pico-train - INFO - │   fabric_checkpoint_filename: checkpoint.pt         │
2025-08-30 01:43:05 - pico-train - INFO - │   hf_checkpoint:                                    │
2025-08-30 01:43:05 - pico-train - INFO - │     collection_slug: null                           │
2025-08-30 01:43:05 - pico-train - INFO - │     repo_id: ThomasTheMaker/pico-decoder-tiny       │
2025-08-30 01:43:05 - pico-train - INFO - │   learning_dynamics:                                │
2025-08-30 01:43:05 - pico-train - INFO - │     batch_size: 1                                   │
2025-08-30 01:43:05 - pico-train - INFO - │     eval_data: null                                 │
2025-08-30 01:43:05 - pico-train - INFO - │     layer_suffixes:                                 │
2025-08-30 01:43:05 - pico-train - INFO - │     - attention.v_proj                              │
2025-08-30 01:43:05 - pico-train - INFO - │     - attention.o_proj                              │
2025-08-30 01:43:05 - pico-train - INFO - │     - swiglu.w_2                                    │
2025-08-30 01:43:05 - pico-train - INFO - │     sequence_idx: -1                                │
2025-08-30 01:43:05 - pico-train - INFO - │   learning_dynamics_dir: learning_dynamics          │
2025-08-30 01:43:05 - pico-train - INFO - │   logs_dir: logs                                    │
2025-08-30 01:43:05 - pico-train - INFO - │   run_name: pico-decoder-tiny-dolma5M-v1            │
2025-08-30 01:43:05 - pico-train - INFO - │   runs_dir: runs                                    │
2025-08-30 01:43:05 - pico-train - INFO - │   save_every_n_steps: 500                           │
2025-08-30 01:43:05 - pico-train - INFO - │   save_to_hf: true                                  │
2025-08-30 01:43:05 - pico-train - INFO - │   training:                                         │
2025-08-30 01:43:05 - pico-train - INFO - │     auto_resume: true                               │
2025-08-30 01:43:05 - pico-train - INFO - │ data:                                               │
2025-08-30 01:43:05 - pico-train - INFO - │   dataloader:                                       │
2025-08-30 01:43:05 - pico-train - INFO - │     batch_size: 4                                   │
2025-08-30 01:43:05 - pico-train - INFO - │   dataset:                                          │
2025-08-30 01:43:05 - pico-train - INFO - │     name: ThomasTheMaker/pretokenized-dolma-5M      │
2025-08-30 01:43:05 - pico-train - INFO - │   tokenizer:                                        │
2025-08-30 01:43:05 - pico-train - INFO - │     name: allenai/OLMo-7B-0724-hf                   │
2025-08-30 01:43:05 - pico-train - INFO - │     vocab_size: 50304                               │
2025-08-30 01:43:05 - pico-train - INFO - │ evaluation:                                         │
2025-08-30 01:43:05 - pico-train - INFO - │   metrics:                                          │
2025-08-30 01:43:05 - pico-train - INFO - │   - paloma                                          │
2025-08-30 01:43:05 - pico-train - INFO - │   paloma:                                           │
2025-08-30 01:43:05 - pico-train - INFO - │     batch_size: 1                                   │
2025-08-30 01:43:05 - pico-train - INFO - │     dataset_name: pico-lm/pretokenized-paloma-tinsy │
2025-08-30 01:43:05 - pico-train - INFO - │     dataset_split: val                              │
2025-08-30 01:43:05 - pico-train - INFO - │     max_length: 2048                                │
2025-08-30 01:43:05 - pico-train - INFO - │ model:                                              │
2025-08-30 01:43:05 - pico-train - INFO - │   activation_hidden_dim: 384                        │
2025-08-30 01:43:05 - pico-train - INFO - │   attention_n_heads: 12                             │
2025-08-30 01:43:05 - pico-train - INFO - │   attention_n_kv_heads: 4                           │
2025-08-30 01:43:05 - pico-train - INFO - │   batch_size: 1024                                  │
2025-08-30 01:43:05 - pico-train - INFO - │   d_model: 96                                       │
2025-08-30 01:43:05 - pico-train - INFO - │   max_seq_len: 2048                                 │
2025-08-30 01:43:05 - pico-train - INFO - │   model_type: pico_decoder                          │
2025-08-30 01:43:05 - pico-train - INFO - │   n_layers: 12                                      │
2025-08-30 01:43:05 - pico-train - INFO - │   norm_eps: 1.0e-06                                 │
2025-08-30 01:43:05 - pico-train - INFO - │   position_emb_theta: 10000.0                       │
2025-08-30 01:43:05 - pico-train - INFO - │   vocab_size: 50304                                 │
2025-08-30 01:43:05 - pico-train - INFO - │ monitoring:                                         │
2025-08-30 01:43:05 - pico-train - INFO - │   logging:                                          │
2025-08-30 01:43:05 - pico-train - INFO - │     log_every_n_steps: 25                           │
2025-08-30 01:43:05 - pico-train - INFO - │     log_level: INFO                                 │
2025-08-30 01:43:05 - pico-train - INFO - │   save_to_wandb: false                              │
2025-08-30 01:43:05 - pico-train - INFO - │   wandb:                                            │
2025-08-30 01:43:05 - pico-train - INFO - │     entity: boymyc                                  │
2025-08-30 01:43:05 - pico-train - INFO - │     project: pico-decoder-tiny                      │
2025-08-30 01:43:05 - pico-train - INFO - │ training:                                           │
2025-08-30 01:43:05 - pico-train - INFO - │   fabric:                                           │
2025-08-30 01:43:05 - pico-train - INFO - │     accelerator: cuda                               │
2025-08-30 01:43:05 - pico-train - INFO - │     num_devices: 1                                  │
2025-08-30 01:43:05 - pico-train - INFO - │     num_nodes: 1                                    │
2025-08-30 01:43:05 - pico-train - INFO - │     precision: bf16-mixed                           │
2025-08-30 01:43:05 - pico-train - INFO - │   max_steps: 20000                                  │
2025-08-30 01:43:05 - pico-train - INFO - │   optimization:                                     │
2025-08-30 01:43:05 - pico-train - INFO - │     gradient_accumulation_steps: 4                  │
2025-08-30 01:43:05 - pico-train - INFO - │     lr: 5.0e-05                                     │
2025-08-30 01:43:05 - pico-train - INFO - │     lr_scheduler: cosine                            │
2025-08-30 01:43:05 - pico-train - INFO - │     lr_warmup_steps: 8000                           │
2025-08-30 01:43:05 - pico-train - INFO - │     optimizer: adamw                                │
2025-08-30 01:43:05 - pico-train - INFO - │                                                     │
2025-08-30 01:43:05 - pico-train - INFO - ╰─────────────────────────────────────────────────────╯
2025-08-30 01:43:05 - pico-train - INFO - ==================================================
2025-08-30 01:43:05 - pico-train - INFO - ⭐ Runtime Summary:
2025-08-30 01:43:05 - pico-train - INFO - ==================================================
2025-08-30 01:43:05 - pico-train - INFO - Starting from step: 32000
2025-08-30 01:43:05 - pico-train - INFO - Model Setup:
2025-08-30 01:43:05 - pico-train - INFO - ├─ Total Parameters: 11,282,784
2025-08-30 01:43:05 - pico-train - INFO - └─ Trainable Parameters: 11,282,784
2025-08-30 01:43:05 - pico-train - INFO - Distributed Setup:
2025-08-30 01:43:05 - pico-train - INFO - ├─ Number of Devices: 1
2025-08-30 01:43:05 - pico-train - INFO - ├─ Device Type: NVIDIA GeForce RTX 5090
2025-08-30 01:43:05 - pico-train - INFO - └─ Available Memory: 33.68 GB
2025-08-30 01:43:05 - pico-train - INFO - Software Setup:
2025-08-30 01:43:05 - pico-train - INFO - ├─ Python Version: 3.10.12
2025-08-30 01:43:05 - pico-train - INFO - ├─ PyTorch Version: 2.8.0+cu128
2025-08-30 01:43:05 - pico-train - INFO - ├─ CUDA Version: 12.8
2025-08-30 01:43:05 - pico-train - INFO - └─ Operating System: Linux 6.8.0-63-generic
2025-08-30 01:43:05 - pico-train - INFO - Batch Size Configuration:
2025-08-30 01:43:05 - pico-train - INFO - ├─ Global Batch Size: 4
2025-08-30 01:43:05 - pico-train - INFO - ├─ Per Device Batch Size: 1
2025-08-30 01:43:05 - pico-train - INFO - └─ Gradient Accumulation Steps: 4
2025-08-30 01:43:05 - pico-train - INFO - ==================================================
2025-08-30 01:43:06 - pico-train - INFO - Step 32000 -- Training Metrics
2025-08-30 01:43:06 - pico-train - INFO - ├── Loss: 6.3376
2025-08-30 01:43:06 - pico-train - INFO - ├── Learning Rate: 7.32e-06
2025-08-30 01:43:06 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-30 01:43:06 - pico-train - INFO - Step 32000 -- Saving Learning Dynamics
2025-08-30 01:43:20 - pico-train - INFO - Step 32025 -- Training Metrics
2025-08-30 01:43:20 - pico-train - INFO - ├── Loss: 6.1999
2025-08-30 01:43:20 - pico-train - INFO - ├── Learning Rate: 7.28e-06
2025-08-30 01:43:20 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-30 01:43:33 - pico-train - INFO - Step 32050 -- Training Metrics
2025-08-30 01:43:33 - pico-train - INFO - ├── Loss: 6.1488
2025-08-30 01:43:33 - pico-train - INFO - ├── Learning Rate: 7.24e-06
2025-08-30 01:43:33 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-30 01:43:45 - pico-train - INFO - Step 32075 -- Training Metrics
2025-08-30 01:43:45 - pico-train - INFO - ├── Loss: 6.0460
2025-08-30 01:43:45 - pico-train - INFO - ├── Learning Rate: 7.19e-06
2025-08-30 01:43:45 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-30 01:43:58 - pico-train - INFO - Step 32100 -- Training Metrics
2025-08-30 01:43:58 - pico-train - INFO - ├── Loss: 6.1627
2025-08-30 01:43:58 - pico-train - INFO - ├── Learning Rate: 7.15e-06
2025-08-30 01:43:58 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-30 01:44:11 - pico-train - INFO - Step 32125 -- Training Metrics
2025-08-30 01:44:11 - pico-train - INFO - ├── Loss: 6.2085
2025-08-30 01:44:11 - pico-train - INFO - ├── Learning Rate: 7.11e-06
2025-08-30 01:44:11 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-30 01:44:23 - pico-train - INFO - Step 32150 -- Training Metrics
2025-08-30 01:44:23 - pico-train - INFO - ├── Loss: 6.1659
2025-08-30 01:44:23 - pico-train - INFO - ├── Learning Rate: 7.06e-06
2025-08-30 01:44:23 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-30 01:44:36 - pico-train - INFO - Step 32175 -- Training Metrics
2025-08-30 01:44:36 - pico-train - INFO - ├── Loss: 6.1719
2025-08-30 01:44:36 - pico-train - INFO - ├── Learning Rate: 7.02e-06
2025-08-30 01:44:36 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-30 01:44:48 - pico-train - INFO - Step 32200 -- Training Metrics
2025-08-30 01:44:48 - pico-train - INFO - ├── Loss: 6.2081
2025-08-30 01:44:48 - pico-train - INFO - ├── Learning Rate: 6.98e-06
2025-08-30 01:44:48 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-30 01:45:01 - pico-train - INFO - Step 32225 -- Training Metrics
2025-08-30 01:45:01 - pico-train - INFO - ├── Loss: 6.1955
2025-08-30 01:45:01 - pico-train - INFO - ├── Learning Rate: 6.94e-06
2025-08-30 01:45:01 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-30 01:45:14 - pico-train - INFO - Step 32250 -- Training Metrics
2025-08-30 01:45:14 - pico-train - INFO - ├── Loss: 6.1139
2025-08-30 01:45:14 - pico-train - INFO - ├── Learning Rate: 6.89e-06
2025-08-30 01:45:14 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-30 01:45:26 - pico-train - INFO - Step 32275 -- Training Metrics
2025-08-30 01:45:26 - pico-train - INFO - ├── Loss: 6.1075
2025-08-30 01:45:26 - pico-train - INFO - ├── Learning Rate: 6.85e-06
2025-08-30 01:45:26 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-30 01:45:39 - pico-train - INFO - Step 32300 -- Training Metrics
2025-08-30 01:45:39 - pico-train - INFO - ├── Loss: 6.0814
2025-08-30 01:45:39 - pico-train - INFO - ├── Learning Rate: 6.81e-06
2025-08-30 01:45:39 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-30 01:45:51 - pico-train - INFO - Step 32325 -- Training Metrics
2025-08-30 01:45:51 - pico-train - INFO - ├── Loss: 6.0880
2025-08-30 01:45:51 - pico-train - INFO - ├── Learning Rate: 6.77e-06
2025-08-30 01:45:51 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-30 01:46:04 - pico-train - INFO - Step 32350 -- Training Metrics
2025-08-30 01:46:04 - pico-train - INFO - ├── Loss: 6.1997
2025-08-30 01:46:04 - pico-train - INFO - ├── Learning Rate: 6.73e-06
2025-08-30 01:46:04 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-30 01:46:16 - pico-train - INFO - Step 32375 -- Training Metrics
2025-08-30 01:46:16 - pico-train - INFO - ├── Loss: 6.1376
2025-08-30 01:46:16 - pico-train - INFO - ├── Learning Rate: 6.68e-06
2025-08-30 01:46:16 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-30 01:46:29 - pico-train - INFO - Step 32400 -- Training Metrics
2025-08-30 01:46:29 - pico-train - INFO - ├── Loss: 6.1077
2025-08-30 01:46:29 - pico-train - INFO - ├── Learning Rate: 6.64e-06
2025-08-30 01:46:29 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-30 01:46:42 - pico-train - INFO - Step 32425 -- Training Metrics
2025-08-30 01:46:42 - pico-train - INFO - ├── Loss: 6.2641
2025-08-30 01:46:42 - pico-train - INFO - ├── Learning Rate: 6.60e-06
2025-08-30 01:46:42 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-30 01:46:54 - pico-train - INFO - Step 32450 -- Training Metrics
2025-08-30 01:46:54 - pico-train - INFO - ├── Loss: 6.1020
2025-08-30 01:46:54 - pico-train - INFO - ├── Learning Rate: 6.56e-06
2025-08-30 01:46:54 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-30 01:47:07 - pico-train - INFO - Step 32475 -- Training Metrics
2025-08-30 01:47:07 - pico-train - INFO - ├── Loss: 6.2170
2025-08-30 01:47:07 - pico-train - INFO - ├── Learning Rate: 6.52e-06
2025-08-30 01:47:07 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-30 01:47:19 - pico-train - INFO - Step 32500 -- 💾 Saving Checkpoint