2025-08-29 01:51:16 - pico-train - INFO - Step 0 -- 📊 Evaluation Results
2025-08-29 01:51:16 - pico-train - INFO - └── paloma: inf
2025-08-29 01:51:17 - pico-train - INFO - ==================================================
2025-08-29 01:51:17 - pico-train - INFO - ✨ Training Configuration
2025-08-29 01:51:17 - pico-train - INFO - ==================================================
2025-08-29 01:51:17 - pico-train - INFO - ╭─────────────────────────────────────────────────────╮
2025-08-29 01:51:17 - pico-train - INFO - │ checkpointing:                                      │
2025-08-29 01:51:17 - pico-train - INFO - │   checkpoints_dir: checkpoints                      │
2025-08-29 01:51:17 - pico-train - INFO - │   evaluation:                                       │
2025-08-29 01:51:17 - pico-train - INFO - │     eval_results_dir: eval_results                  │
2025-08-29 01:51:17 - pico-train - INFO - │   fabric_checkpoint_dir: fabric_state               │
2025-08-29 01:51:17 - pico-train - INFO - │   fabric_checkpoint_filename: checkpoint.pt         │
2025-08-29 01:51:17 - pico-train - INFO - │   hf_checkpoint:                                    │
2025-08-29 01:51:17 - pico-train - INFO - │     collection_slug: null                           │
2025-08-29 01:51:17 - pico-train - INFO - │     repo_id: ThomasTheMaker/pico-decoder-tiny       │
2025-08-29 01:51:17 - pico-train - INFO - │   learning_dynamics:                                │
2025-08-29 01:51:17 - pico-train - INFO - │     batch_size: 1                                   │
2025-08-29 01:51:17 - pico-train - INFO - │     eval_data: null                                 │
2025-08-29 01:51:17 - pico-train - INFO - │     layer_suffixes:                                 │
2025-08-29 01:51:17 - pico-train - INFO - │     - attention.v_proj                              │
2025-08-29 01:51:17 - pico-train - INFO - │     - attention.o_proj                              │
2025-08-29 01:51:17 - pico-train - INFO - │     - swiglu.w_2                                    │
2025-08-29 01:51:17 - pico-train - INFO - │     sequence_idx: -1                                │
2025-08-29 01:51:17 - pico-train - INFO - │   learning_dynamics_dir: learning_dynamics          │
2025-08-29 01:51:17 - pico-train - INFO - │   logs_dir: logs                                    │
2025-08-29 01:51:17 - pico-train - INFO - │   run_name: pico-decoder-tiny-dolma29k-v3           │
2025-08-29 01:51:17 - pico-train - INFO - │   runs_dir: runs                                    │
2025-08-29 01:51:17 - pico-train - INFO - │   save_every_n_steps: 500                           │
2025-08-29 01:51:17 - pico-train - INFO - │   save_to_hf: true                                  │
2025-08-29 01:51:17 - pico-train - INFO - │   training:                                         │
2025-08-29 01:51:17 - pico-train - INFO - │     auto_resume: true                               │
2025-08-29 01:51:17 - pico-train - INFO - │ data:                                               │
2025-08-29 01:51:17 - pico-train - INFO - │   dataloader:                                       │
2025-08-29 01:51:17 - pico-train - INFO - │     batch_size: 4                                   │
2025-08-29 01:51:17 - pico-train - INFO - │   dataset:                                          │
2025-08-29 01:51:17 - pico-train - INFO - │     name: pico-lm/pretokenized-dolma                │
2025-08-29 01:51:17 - pico-train - INFO - │   tokenizer:                                        │
2025-08-29 01:51:17 - pico-train - INFO - │     name: allenai/OLMo-7B-0724-hf                   │
2025-08-29 01:51:17 - pico-train - INFO - │     vocab_size: 50304                               │
2025-08-29 01:51:17 - pico-train - INFO - │ evaluation:                                         │
2025-08-29 01:51:17 - pico-train - INFO - │   metrics:                                          │
2025-08-29 01:51:17 - pico-train - INFO - │   - paloma                                          │
2025-08-29 01:51:17 - pico-train - INFO - │   paloma:                                           │
2025-08-29 01:51:17 - pico-train - INFO - │     batch_size: 1                                   │
2025-08-29 01:51:17 - pico-train - INFO - │     dataset_name: pico-lm/pretokenized-paloma-tinsy │
2025-08-29 01:51:17 - pico-train - INFO - │     dataset_split: val                              │
2025-08-29 01:51:17 - pico-train - INFO - │     max_length: 2048                                │
2025-08-29 01:51:17 - pico-train - INFO - │ model:                                              │
2025-08-29 01:51:17 - pico-train - INFO - │   activation_hidden_dim: 384                        │
2025-08-29 01:51:17 - pico-train - INFO - │   attention_n_heads: 12                             │
2025-08-29 01:51:17 - pico-train - INFO - │   attention_n_kv_heads: 4                           │
2025-08-29 01:51:17 - pico-train - INFO - │   batch_size: 1024                                  │
2025-08-29 01:51:17 - pico-train - INFO - │   d_model: 96                                       │
2025-08-29 01:51:17 - pico-train - INFO - │   max_seq_len: 2048                                 │
2025-08-29 01:51:17 - pico-train - INFO - │   model_type: pico_decoder                          │
2025-08-29 01:51:17 - pico-train - INFO - │   n_layers: 12                                      │
2025-08-29 01:51:17 - pico-train - INFO - │   norm_eps: 1.0e-06                                 │
2025-08-29 01:51:17 - pico-train - INFO - │   position_emb_theta: 10000.0                       │
2025-08-29 01:51:17 - pico-train - INFO - │   vocab_size: 50304                                 │
2025-08-29 01:51:17 - pico-train - INFO - │ monitoring:                                         │
2025-08-29 01:51:17 - pico-train - INFO - │   logging:                                          │
2025-08-29 01:51:17 - pico-train - INFO - │     log_every_n_steps: 25                           │
2025-08-29 01:51:17 - pico-train - INFO - │     log_level: INFO                                 │
2025-08-29 01:51:17 - pico-train - INFO - │   save_to_wandb: false                              │
2025-08-29 01:51:17 - pico-train - INFO - │   wandb:                                            │
2025-08-29 01:51:17 - pico-train - INFO - │     entity: boymyc                                  │
2025-08-29 01:51:17 - pico-train - INFO - │     project: pico-decoder-tiny                      │
2025-08-29 01:51:17 - pico-train - INFO - │ training:                                           │
2025-08-29 01:51:17 - pico-train - INFO - │   fabric:                                           │
2025-08-29 01:51:17 - pico-train - INFO - │     accelerator: cuda                               │
2025-08-29 01:51:17 - pico-train - INFO - │     num_devices: 1                                  │
2025-08-29 01:51:17 - pico-train - INFO - │     num_nodes: 1                                    │
2025-08-29 01:51:17 - pico-train - INFO - │     precision: bf16-mixed                           │
2025-08-29 01:51:17 - pico-train - INFO - │   max_steps: 20000                                  │
2025-08-29 01:51:17 - pico-train - INFO - │   optimization:                                     │
2025-08-29 01:51:17 - pico-train - INFO - │     gradient_accumulation_steps: 4                  │
2025-08-29 01:51:17 - pico-train - INFO - │     lr: 5.0e-05                                     │
2025-08-29 01:51:17 - pico-train - INFO - │     lr_scheduler: linear_with_warmup                │
2025-08-29 01:51:17 - pico-train - INFO - │     lr_warmup_steps: 8000                           │
2025-08-29 01:51:17 - pico-train - INFO - │     optimizer: adamw                                │
2025-08-29 01:51:17 - pico-train - INFO - │                                                     │
2025-08-29 01:51:17 - pico-train - INFO - ╰─────────────────────────────────────────────────────╯
2025-08-29 01:51:17 - pico-train - INFO - ==================================================
2025-08-29 01:51:17 - pico-train - INFO - ⛭ Runtime Summary:
2025-08-29 01:51:17 - pico-train - INFO - ==================================================
2025-08-29 01:51:17 - pico-train - INFO - Starting from step: 0
2025-08-29 01:51:17 - pico-train - INFO - Model Setup:
2025-08-29 01:51:17 - pico-train - INFO - └─ Total Parameters: 11,282,784
2025-08-29 01:51:17 - pico-train - INFO - └─ Trainable Parameters: 11,282,784
2025-08-29 01:51:17 - pico-train - INFO - Distributed Setup:
2025-08-29 01:51:17 - pico-train - INFO - └─ Number of Devices: 1
2025-08-29 01:51:17 - pico-train - INFO - └─ Device Type: NVIDIA GeForce RTX 5090
2025-08-29 01:51:17 - pico-train - INFO - └─ Available Memory: 33.68 GB
2025-08-29 01:51:17 - pico-train - INFO - Software Setup:
2025-08-29 01:51:17 - pico-train - INFO - └─ Python Version: 3.10.12
2025-08-29 01:51:17 - pico-train - INFO - └─ PyTorch Version: 2.8.0+cu128
2025-08-29 01:51:17 - pico-train - INFO - └─ CUDA Version: 12.8
2025-08-29 01:51:17 - pico-train - INFO - └─ Operating System: Linux 6.8.0-63-generic
2025-08-29 01:51:17 - pico-train - INFO - Batch Size Configuration:
2025-08-29 01:51:17 - pico-train - INFO - └─ Global Batch Size: 4
2025-08-29 01:51:17 - pico-train - INFO - └─ Per Device Batch Size: 1
2025-08-29 01:51:17 - pico-train - INFO - └─ Gradient Accumulation Steps: 4
2025-08-29 01:51:17 - pico-train - INFO - ==================================================
2025-08-29 01:51:18 - pico-train - INFO - Step 0 -- 🔄 Training Metrics
2025-08-29 01:51:18 - pico-train - INFO - ├── Loss: 10.9975
2025-08-29 01:51:18 - pico-train - INFO - ├── Learning Rate: 0.00e+00
2025-08-29 01:51:18 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-29 01:51:18 - pico-train - INFO - Step 0 -- 📈 Saving Learning Dynamics
2025-08-29 01:51:33 - pico-train - INFO - Step 25 -- 🔄 Training Metrics
2025-08-29 01:51:33 - pico-train - INFO - ├── Loss: 10.9972
2025-08-29 01:51:33 - pico-train - INFO - ├── Learning Rate: 1.56e-07
2025-08-29 01:51:33 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-29 01:51:46 - pico-train - INFO - Step 50 -- 🔄 Training Metrics
2025-08-29 01:51:46 - pico-train - INFO - ├── Loss: 11.0030
2025-08-29 01:51:46 - pico-train - INFO - ├── Learning Rate: 3.13e-07
2025-08-29 01:51:46 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-29 01:51:58 - pico-train - INFO - Step 75 -- 🔄 Training Metrics
2025-08-29 01:51:58 - pico-train - INFO - ├── Loss: 11.0034
2025-08-29 01:51:58 - pico-train - INFO - ├── Learning Rate: 4.69e-07
2025-08-29 01:51:58 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-29 01:52:11 - pico-train - INFO - Step 100 -- 🔄 Training Metrics
2025-08-29 01:52:11 - pico-train - INFO - ├── Loss: 10.9962
2025-08-29 01:52:11 - pico-train - INFO - ├── Learning Rate: 6.25e-07
2025-08-29 01:52:11 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-29 01:52:24 - pico-train - INFO - Step 125 -- 🔄 Training Metrics
2025-08-29 01:52:24 - pico-train - INFO - ├── Loss: 10.9973
2025-08-29 01:52:24 - pico-train - INFO - ├── Learning Rate: 7.81e-07
2025-08-29 01:52:24 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-29 01:52:36 - pico-train - INFO - Step 150 -- 🔄 Training Metrics
2025-08-29 01:52:36 - pico-train - INFO - ├── Loss: 10.9943
2025-08-29 01:52:36 - pico-train - INFO - ├── Learning Rate: 9.38e-07
2025-08-29 01:52:36 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-29 01:52:49 - pico-train - INFO - Step 175 -- 🔄 Training Metrics
2025-08-29 01:52:49 - pico-train - INFO - ├── Loss: 10.9860
2025-08-29 01:52:49 - pico-train - INFO - ├── Learning Rate: 1.09e-06
2025-08-29 01:52:49 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-29 01:53:02 - pico-train - INFO - Step 200 -- 🔄 Training Metrics
2025-08-29 01:53:02 - pico-train - INFO - ├── Loss: 10.9885
2025-08-29 01:53:02 - pico-train - INFO - ├── Learning Rate: 1.25e-06
2025-08-29 01:53:02 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-29 01:53:14 - pico-train - INFO - Step 225 -- 🔄 Training Metrics
2025-08-29 01:53:14 - pico-train - INFO - ├── Loss: 10.9816
2025-08-29 01:53:14 - pico-train - INFO - ├── Learning Rate: 1.41e-06
2025-08-29 01:53:14 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-29 01:53:27 - pico-train - INFO - Step 250 -- 🔄 Training Metrics
2025-08-29 01:53:27 - pico-train - INFO - ├── Loss: 10.9786
2025-08-29 01:53:27 - pico-train - INFO - ├── Learning Rate: 1.56e-06
2025-08-29 01:53:27 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-29 01:53:40 - pico-train - INFO - Step 275 -- 🔄 Training Metrics
2025-08-29 01:53:40 - pico-train - INFO - ├── Loss: 10.9707
2025-08-29 01:53:40 - pico-train - INFO - ├── Learning Rate: 1.72e-06
2025-08-29 01:53:40 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-29 01:53:53 - pico-train - INFO - Step 300 -- 🔄 Training Metrics
2025-08-29 01:53:53 - pico-train - INFO - ├── Loss: 10.9700
2025-08-29 01:53:53 - pico-train - INFO - ├── Learning Rate: 1.88e-06
2025-08-29 01:53:53 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-29 01:54:05 - pico-train - INFO - Step 325 -- 🔄 Training Metrics
2025-08-29 01:54:05 - pico-train - INFO - ├── Loss: 10.9626
2025-08-29 01:54:05 - pico-train - INFO - ├── Learning Rate: 2.03e-06
2025-08-29 01:54:05 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-29 01:54:18 - pico-train - INFO - Step 350 -- 🔄 Training Metrics
2025-08-29 01:54:18 - pico-train - INFO - ├── Loss: 10.9580
2025-08-29 01:54:18 - pico-train - INFO - ├── Learning Rate: 2.19e-06
2025-08-29 01:54:18 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-29 01:54:31 - pico-train - INFO - Step 375 -- 🔄 Training Metrics
2025-08-29 01:54:31 - pico-train - INFO - ├── Loss: 10.9486
2025-08-29 01:54:31 - pico-train - INFO - ├── Learning Rate: 2.34e-06
2025-08-29 01:54:31 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-29 01:54:44 - pico-train - INFO - Step 400 -- 🔄 Training Metrics
2025-08-29 01:54:44 - pico-train - INFO - ├── Loss: 10.9417
2025-08-29 01:54:44 - pico-train - INFO - ├── Learning Rate: 2.50e-06
2025-08-29 01:54:44 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-29 01:54:57 - pico-train - INFO - Step 425 -- 🔄 Training Metrics
2025-08-29 01:54:57 - pico-train - INFO - ├── Loss: 10.9328
2025-08-29 01:54:57 - pico-train - INFO - ├── Learning Rate: 2.66e-06
2025-08-29 01:54:57 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-29 01:55:10 - pico-train - INFO - Step 450 -- 🔄 Training Metrics
2025-08-29 01:55:10 - pico-train - INFO - ├── Loss: 10.9242
2025-08-29 01:55:10 - pico-train - INFO - ├── Learning Rate: 2.81e-06
2025-08-29 01:55:10 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-29 01:55:22 - pico-train - INFO - Step 475 -- 🔄 Training Metrics
2025-08-29 01:55:22 - pico-train - INFO - ├── Loss: 10.9170
2025-08-29 01:55:22 - pico-train - INFO - ├── Learning Rate: 2.97e-06
2025-08-29 01:55:22 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-29 01:55:35 - pico-train - INFO - Step 500 -- 💾 Saving Checkpoint