2025-08-29 00:40:55 - pico-train - INFO - Step 0 -- 📊 Evaluation Results
2025-08-29 00:40:55 - pico-train - INFO - └── paloma: inf
2025-08-29 00:40:57 - pico-train - INFO - ==================================================
2025-08-29 00:40:57 - pico-train - INFO - ✨ Training Configuration
2025-08-29 00:40:57 - pico-train - INFO - ==================================================
2025-08-29 00:40:57 - pico-train - INFO - ╭─────────────────────────────────────────────────────╮
2025-08-29 00:40:57 - pico-train - INFO - │ checkpointing: │
2025-08-29 00:40:57 - pico-train - INFO - │ checkpoints_dir: checkpoints │
2025-08-29 00:40:57 - pico-train - INFO - │ evaluation: │
2025-08-29 00:40:57 - pico-train - INFO - │ eval_results_dir: eval_results │
2025-08-29 00:40:57 - pico-train - INFO - │ fabric_checkpoint_dir: fabric_state │
2025-08-29 00:40:57 - pico-train - INFO - │ fabric_checkpoint_filename: checkpoint.pt │
2025-08-29 00:40:57 - pico-train - INFO - │ hf_checkpoint: │
2025-08-29 00:40:57 - pico-train - INFO - │ collection_slug: null │
2025-08-29 00:40:57 - pico-train - INFO - │ repo_id: ThomasTheMaker/pico-decoder-tiny │
2025-08-29 00:40:57 - pico-train - INFO - │ learning_dynamics: │
2025-08-29 00:40:57 - pico-train - INFO - │ batch_size: 1 │
2025-08-29 00:40:57 - pico-train - INFO - │ eval_data: null │
2025-08-29 00:40:57 - pico-train - INFO - │ layer_suffixes: │
2025-08-29 00:40:57 - pico-train - INFO - │ - attention.v_proj │
2025-08-29 00:40:57 - pico-train - INFO - │ - attention.o_proj │
2025-08-29 00:40:57 - pico-train - INFO - │ - swiglu.w_2 │
2025-08-29 00:40:57 - pico-train - INFO - │ sequence_idx: -1 │
2025-08-29 00:40:57 - pico-train - INFO - │ learning_dynamics_dir: learning_dynamics │
2025-08-29 00:40:57 - pico-train - INFO - │ logs_dir: logs │
2025-08-29 00:40:57 - pico-train - INFO - │ run_name: pico-decoder-tiny-dolma29k-v2 │
2025-08-29 00:40:57 - pico-train - INFO - │ runs_dir: runs │
2025-08-29 00:40:57 - pico-train - INFO - │ save_every_n_steps: 1000 │
2025-08-29 00:40:57 - pico-train - INFO - │ save_to_hf: true │
2025-08-29 00:40:57 - pico-train - INFO - │ training: │
2025-08-29 00:40:57 - pico-train - INFO - │ auto_resume: true │
2025-08-29 00:40:57 - pico-train - INFO - │ data: │
2025-08-29 00:40:57 - pico-train - INFO - │ dataloader: │
2025-08-29 00:40:57 - pico-train - INFO - │ batch_size: 8 │
2025-08-29 00:40:57 - pico-train - INFO - │ dataset: │
2025-08-29 00:40:57 - pico-train - INFO - │ name: pico-lm/pretokenized-dolma │
2025-08-29 00:40:57 - pico-train - INFO - │ tokenizer: │
2025-08-29 00:40:57 - pico-train - INFO - │ name: allenai/OLMo-7B-0724-hf │
2025-08-29 00:40:57 - pico-train - INFO - │ vocab_size: 50304 │
2025-08-29 00:40:57 - pico-train - INFO - │ evaluation: │
2025-08-29 00:40:57 - pico-train - INFO - │ metrics: │
2025-08-29 00:40:57 - pico-train - INFO - │ - paloma │
2025-08-29 00:40:57 - pico-train - INFO - │ paloma: │
2025-08-29 00:40:57 - pico-train - INFO - │ batch_size: 1 │
2025-08-29 00:40:57 - pico-train - INFO - │ dataset_name: pico-lm/pretokenized-paloma-tinsy │
2025-08-29 00:40:57 - pico-train - INFO - │ dataset_split: val │
2025-08-29 00:40:57 - pico-train - INFO - │ max_length: 2048 │
2025-08-29 00:40:57 - pico-train - INFO - │ model: │
2025-08-29 00:40:57 - pico-train - INFO - │ activation_hidden_dim: 384 │
2025-08-29 00:40:57 - pico-train - INFO - │ attention_n_heads: 12 │
2025-08-29 00:40:57 - pico-train - INFO - │ attention_n_kv_heads: 4 │
2025-08-29 00:40:57 - pico-train - INFO - │ batch_size: 1024 │
2025-08-29 00:40:57 - pico-train - INFO - │ d_model: 96 │
2025-08-29 00:40:57 - pico-train - INFO - │ max_seq_len: 2048 │
2025-08-29 00:40:57 - pico-train - INFO - │ model_type: pico_decoder │
2025-08-29 00:40:57 - pico-train - INFO - │ n_layers: 12 │
2025-08-29 00:40:57 - pico-train - INFO - │ norm_eps: 1.0e-06 │
2025-08-29 00:40:57 - pico-train - INFO - │ position_emb_theta: 10000.0 │
2025-08-29 00:40:57 - pico-train - INFO - │ vocab_size: 50304 │
2025-08-29 00:40:57 - pico-train - INFO - │ monitoring: │
2025-08-29 00:40:57 - pico-train - INFO - │ logging: │
2025-08-29 00:40:57 - pico-train - INFO - │ log_every_n_steps: 50 │
2025-08-29 00:40:57 - pico-train - INFO - │ log_level: INFO │
2025-08-29 00:40:57 - pico-train - INFO - │ save_to_wandb: false │
2025-08-29 00:40:57 - pico-train - INFO - │ wandb: │
2025-08-29 00:40:57 - pico-train - INFO - │ entity: boymyc │
2025-08-29 00:40:57 - pico-train - INFO - │ project: pico-decoder-tiny │
2025-08-29 00:40:57 - pico-train - INFO - │ training: │
2025-08-29 00:40:57 - pico-train - INFO - │ fabric: │
2025-08-29 00:40:57 - pico-train - INFO - │ accelerator: cuda │
2025-08-29 00:40:57 - pico-train - INFO - │ num_devices: 1 │
2025-08-29 00:40:57 - pico-train - INFO - │ num_nodes: 1 │
2025-08-29 00:40:57 - pico-train - INFO - │ precision: bf16-mixed │
2025-08-29 00:40:57 - pico-train - INFO - │ max_steps: 200000 │
2025-08-29 00:40:57 - pico-train - INFO - │ optimization: │
2025-08-29 00:40:57 - pico-train - INFO - │ gradient_accumulation_steps: 2 │
2025-08-29 00:40:57 - pico-train - INFO - │ lr: 0.0001 │
2025-08-29 00:40:57 - pico-train - INFO - │ lr_scheduler: linear_with_warmup │
2025-08-29 00:40:57 - pico-train - INFO - │ lr_warmup_steps: 5000 │
2025-08-29 00:40:57 - pico-train - INFO - │ optimizer: adamw │
2025-08-29 00:40:57 - pico-train - INFO - │ │
2025-08-29 00:40:57 - pico-train - INFO - ╰─────────────────────────────────────────────────────╯
2025-08-29 00:40:57 - pico-train - INFO - ==================================================
2025-08-29 00:40:57 - pico-train - INFO - ⛭ Runtime Summary:
2025-08-29 00:40:57 - pico-train - INFO - ==================================================
2025-08-29 00:40:57 - pico-train - INFO - Starting from step: 0
2025-08-29 00:40:57 - pico-train - INFO - Model Setup:
2025-08-29 00:40:57 - pico-train - INFO - └─ Total Parameters: 11,282,784
2025-08-29 00:40:57 - pico-train - INFO - └─ Trainable Parameters: 11,282,784
2025-08-29 00:40:57 - pico-train - INFO - Distributed Setup:
2025-08-29 00:40:57 - pico-train - INFO - └─ Number of Devices: 1
2025-08-29 00:40:57 - pico-train - INFO - └─ Device Type: NVIDIA GeForce RTX 5090
2025-08-29 00:40:57 - pico-train - INFO - └─ Available Memory: 33.68 GB
2025-08-29 00:40:57 - pico-train - INFO - Software Setup:
2025-08-29 00:40:57 - pico-train - INFO - └─ Python Version: 3.10.12
2025-08-29 00:40:57 - pico-train - INFO - └─ PyTorch Version: 2.8.0+cu128
2025-08-29 00:40:57 - pico-train - INFO - └─ CUDA Version: 12.8
2025-08-29 00:40:57 - pico-train - INFO - └─ Operating System: Linux 6.8.0-63-generic
2025-08-29 00:40:57 - pico-train - INFO - Batch Size Configuration:
2025-08-29 00:40:57 - pico-train - INFO - └─ Global Batch Size: 8
2025-08-29 00:40:57 - pico-train - INFO - └─ Per Device Batch Size: 4
2025-08-29 00:40:57 - pico-train - INFO - └─ Gradient Accumulation Steps: 2
2025-08-29 00:40:57 - pico-train - INFO - ==================================================
2025-08-29 00:40:58 - pico-train - INFO - Step 0 -- 🔄 Training Metrics
2025-08-29 00:40:58 - pico-train - INFO - ├── Loss: 10.9848
2025-08-29 00:40:58 - pico-train - INFO - ├── Learning Rate: 0.00e+00
2025-08-29 00:40:58 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-29 00:40:58 - pico-train - INFO - Step 0 -- 📈 Saving Learning Dynamics
2025-08-29 00:41:29 - pico-train - INFO - Step 50 -- 🔄 Training Metrics
2025-08-29 00:41:29 - pico-train - INFO - ├── Loss: 11.0005
2025-08-29 00:41:29 - pico-train - INFO - ├── Learning Rate: 1.00e-06
2025-08-29 00:41:29 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-29 00:41:55 - pico-train - INFO - Step 100 -- 🔄 Training Metrics
2025-08-29 00:41:55 - pico-train - INFO - ├── Loss: 10.9918
2025-08-29 00:41:55 - pico-train - INFO - ├── Learning Rate: 2.00e-06
2025-08-29 00:41:55 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-29 00:42:21 - pico-train - INFO - Step 150 -- 🔄 Training Metrics
2025-08-29 00:42:21 - pico-train - INFO - ├── Loss: 10.9776
2025-08-29 00:42:21 - pico-train - INFO - ├── Learning Rate: 3.00e-06
2025-08-29 00:42:21 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-29 00:42:47 - pico-train - INFO - Step 200 -- 🔄 Training Metrics
2025-08-29 00:42:47 - pico-train - INFO - ├── Loss: 10.9569
2025-08-29 00:42:47 - pico-train - INFO - ├── Learning Rate: 4.00e-06
2025-08-29 00:42:47 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-29 00:43:14 - pico-train - INFO - Step 250 -- 🔄 Training Metrics
2025-08-29 00:43:14 - pico-train - INFO - ├── Loss: 10.9255
2025-08-29 00:43:14 - pico-train - INFO - ├── Learning Rate: 5.00e-06
2025-08-29 00:43:14 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-29 00:43:40 - pico-train - INFO - Step 300 -- 🔄 Training Metrics
2025-08-29 00:43:40 - pico-train - INFO - ├── Loss: 10.8883
2025-08-29 00:43:40 - pico-train - INFO - ├── Learning Rate: 6.00e-06
2025-08-29 00:43:40 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-29 00:44:06 - pico-train - INFO - Step 350 -- 🔄 Training Metrics
2025-08-29 00:44:06 - pico-train - INFO - ├── Loss: 10.8249
2025-08-29 00:44:06 - pico-train - INFO - ├── Learning Rate: 7.00e-06
2025-08-29 00:44:06 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-29 00:44:32 - pico-train - INFO - Step 400 -- 🔄 Training Metrics
2025-08-29 00:44:32 - pico-train - INFO - ├── Loss: 10.7344
2025-08-29 00:44:32 - pico-train - INFO - ├── Learning Rate: 8.00e-06
2025-08-29 00:44:32 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-29 00:44:58 - pico-train - INFO - Step 450 -- 🔄 Training Metrics
2025-08-29 00:44:58 - pico-train - INFO - ├── Loss: 10.6177
2025-08-29 00:44:58 - pico-train - INFO - ├── Learning Rate: 9.00e-06
2025-08-29 00:44:58 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-29 00:45:24 - pico-train - INFO - Step 500 -- 🔄 Training Metrics
2025-08-29 00:45:24 - pico-train - INFO - ├── Loss: 10.5025
2025-08-29 00:45:24 - pico-train - INFO - ├── Learning Rate: 1.00e-05
2025-08-29 00:45:24 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-29 00:45:50 - pico-train - INFO - Step 550 -- 🔄 Training Metrics
2025-08-29 00:45:50 - pico-train - INFO - ├── Loss: 10.3986
2025-08-29 00:45:50 - pico-train - INFO - ├── Learning Rate: 1.10e-05
2025-08-29 00:45:50 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-29 00:46:16 - pico-train - INFO - Step 600 -- 🔄 Training Metrics
2025-08-29 00:46:16 - pico-train - INFO - ├── Loss: 10.3079
2025-08-29 00:46:16 - pico-train - INFO - ├── Learning Rate: 1.20e-05
2025-08-29 00:46:16 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-29 00:46:42 - pico-train - INFO - Step 650 -- 🔄 Training Metrics
2025-08-29 00:46:42 - pico-train - INFO - ├── Loss: 10.2142
2025-08-29 00:46:42 - pico-train - INFO - ├── Learning Rate: 1.30e-05
2025-08-29 00:46:42 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-29 00:47:08 - pico-train - INFO - Step 700 -- 🔄 Training Metrics
2025-08-29 00:47:08 - pico-train - INFO - ├── Loss: 10.1146
2025-08-29 00:47:08 - pico-train - INFO - ├── Learning Rate: 1.40e-05
2025-08-29 00:47:08 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-29 00:47:34 - pico-train - INFO - Step 750 -- 🔄 Training Metrics
2025-08-29 00:47:34 - pico-train - INFO - ├── Loss: 10.0398
2025-08-29 00:47:34 - pico-train - INFO - ├── Learning Rate: 1.50e-05
2025-08-29 00:47:34 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-29 00:48:00 - pico-train - INFO - Step 800 -- 🔄 Training Metrics
2025-08-29 00:48:00 - pico-train - INFO - ├── Loss: 9.9311
2025-08-29 00:48:00 - pico-train - INFO - ├── Learning Rate: 1.60e-05
2025-08-29 00:48:00 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-29 00:48:26 - pico-train - INFO - Step 850 -- 🔄 Training Metrics
2025-08-29 00:48:26 - pico-train - INFO - ├── Loss: 9.8431
2025-08-29 00:48:26 - pico-train - INFO - ├── Learning Rate: 1.70e-05
2025-08-29 00:48:26 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-29 00:48:52 - pico-train - INFO - Step 900 -- 🔄 Training Metrics
2025-08-29 00:48:52 - pico-train - INFO - ├── Loss: 9.7453
2025-08-29 00:48:52 - pico-train - INFO - ├── Learning Rate: 1.80e-05
2025-08-29 00:48:52 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-29 00:49:18 - pico-train - INFO - Step 950 -- 🔄 Training Metrics
2025-08-29 00:49:18 - pico-train - INFO - ├── Loss: 9.6527
2025-08-29 00:49:18 - pico-train - INFO - ├── Learning Rate: 1.90e-05
2025-08-29 00:49:18 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-29 00:49:43 - pico-train - INFO - Step 1000 -- 💾 Saving Checkpoint
2025-08-29 00:52:44 - pico-train - INFO - Step 1000 -- 📊 Evaluation Results
2025-08-29 00:52:44 - pico-train - INFO - └── paloma: 5.073320568651489e+18
2025-08-29 00:52:45 - pico-train - INFO - Step 1000 -- 🔄 Training Metrics
2025-08-29 00:52:45 - pico-train - INFO - ├── Loss: 9.5691
2025-08-29 00:52:45 - pico-train - INFO - ├── Learning Rate: 2.00e-05
2025-08-29 00:52:45 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-29 00:52:45 - pico-train - INFO - Step 1000 -- 📈 Saving Learning Dynamics
2025-08-29 00:53:15 - pico-train - INFO - Step 1050 -- 🔄 Training Metrics
2025-08-29 00:53:15 - pico-train - INFO - ├── Loss: 9.4600
2025-08-29 00:53:15 - pico-train - INFO - ├── Learning Rate: 2.10e-05
2025-08-29 00:53:15 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-29 00:53:41 - pico-train - INFO - Step 1100 -- 🔄 Training Metrics
2025-08-29 00:53:41 - pico-train - INFO - ├── Loss: 9.3525
2025-08-29 00:53:41 - pico-train - INFO - ├── Learning Rate: 2.20e-05
2025-08-29 00:53:41 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-29 00:54:07 - pico-train - INFO - Step 1150 -- 🔄 Training Metrics
2025-08-29 00:54:07 - pico-train - INFO - ├── Loss: 9.2715
2025-08-29 00:54:07 - pico-train - INFO - ├── Learning Rate: 2.30e-05
2025-08-29 00:54:07 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-29 00:54:33 - pico-train - INFO - Step 1200 -- 🔄 Training Metrics
2025-08-29 00:54:33 - pico-train - INFO - ├── Loss: 9.1618
2025-08-29 00:54:33 - pico-train - INFO - ├── Learning Rate: 2.40e-05
2025-08-29 00:54:33 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-29 00:54:59 - pico-train - INFO - Step 1250 -- 🔄 Training Metrics
2025-08-29 00:54:59 - pico-train - INFO - ├── Loss: 9.0547
2025-08-29 00:54:59 - pico-train - INFO - ├── Learning Rate: 2.50e-05
2025-08-29 00:54:59 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-29 00:55:25 - pico-train - INFO - Step 1300 -- 🔄 Training Metrics
2025-08-29 00:55:25 - pico-train - INFO - ├── Loss: 8.9550
2025-08-29 00:55:25 - pico-train - INFO - ├── Learning Rate: 2.60e-05
2025-08-29 00:55:25 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-29 00:55:51 - pico-train - INFO - Step 1350 -- 🔄 Training Metrics
2025-08-29 00:55:51 - pico-train - INFO - ├── Loss: 8.8251
2025-08-29 00:55:51 - pico-train - INFO - ├── Learning Rate: 2.70e-05
2025-08-29 00:55:51 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-29 00:56:17 - pico-train - INFO - Step 1400 -- 🔄 Training Metrics
2025-08-29 00:56:17 - pico-train - INFO - ├── Loss: 8.7711
2025-08-29 00:56:17 - pico-train - INFO - ├── Learning Rate: 2.80e-05
2025-08-29 00:56:17 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-29 00:56:43 - pico-train - INFO - Step 1450 -- 🔄 Training Metrics
2025-08-29 00:56:43 - pico-train - INFO - ├── Loss: 8.6834
2025-08-29 00:56:43 - pico-train - INFO - ├── Learning Rate: 2.90e-05
2025-08-29 00:56:43 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-29 00:57:09 - pico-train - INFO - Step 1500 -- 🔄 Training Metrics
2025-08-29 00:57:09 - pico-train - INFO - ├── Loss: 8.5638
2025-08-29 00:57:09 - pico-train - INFO - ├── Learning Rate: 3.00e-05
2025-08-29 00:57:09 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-29 00:57:35 - pico-train - INFO - Step 1550 -- 🔄 Training Metrics
2025-08-29 00:57:35 - pico-train - INFO - ├── Loss: 8.4572
2025-08-29 00:57:35 - pico-train - INFO - ├── Learning Rate: 3.10e-05
2025-08-29 00:57:35 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-29 00:58:01 - pico-train - INFO - Step 1600 -- 🔄 Training Metrics
2025-08-29 00:58:01 - pico-train - INFO - ├── Loss: 8.3940
2025-08-29 00:58:01 - pico-train - INFO - ├── Learning Rate: 3.20e-05
2025-08-29 00:58:01 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-29 00:58:27 - pico-train - INFO - Step 1650 -- 🔄 Training Metrics
2025-08-29 00:58:27 - pico-train - INFO - ├── Loss: 8.2973
2025-08-29 00:58:27 - pico-train - INFO - ├── Learning Rate: 3.30e-05
2025-08-29 00:58:27 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-29 00:58:53 - pico-train - INFO - Step 1700 -- 🔄 Training Metrics
2025-08-29 00:58:53 - pico-train - INFO - ├── Loss: 8.2264
2025-08-29 00:58:53 - pico-train - INFO - ├── Learning Rate: 3.40e-05
2025-08-29 00:58:53 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-29 00:59:19 - pico-train - INFO - Step 1750 -- 🔄 Training Metrics
2025-08-29 00:59:19 - pico-train - INFO - ├── Loss: 8.1672
2025-08-29 00:59:19 - pico-train - INFO - ├── Learning Rate: 3.50e-05
2025-08-29 00:59:19 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-29 00:59:45 - pico-train - INFO - Step 1800 -- 🔄 Training Metrics
2025-08-29 00:59:45 - pico-train - INFO - ├── Loss: 8.0695
2025-08-29 00:59:45 - pico-train - INFO - ├── Learning Rate: 3.60e-05
2025-08-29 00:59:45 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-29 01:00:11 - pico-train - INFO - Step 1850 -- 🔄 Training Metrics
2025-08-29 01:00:11 - pico-train - INFO - ├── Loss: 8.0299
2025-08-29 01:00:11 - pico-train - INFO - ├── Learning Rate: 3.70e-05
2025-08-29 01:00:11 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-29 01:00:37 - pico-train - INFO - Step 1900 -- 🔄 Training Metrics
2025-08-29 01:00:37 - pico-train - INFO - ├── Loss: 7.9883
2025-08-29 01:00:37 - pico-train - INFO - ├── Learning Rate: 3.80e-05
2025-08-29 01:00:37 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-29 01:01:03 - pico-train - INFO - Step 1950 -- 🔄 Training Metrics
2025-08-29 01:01:03 - pico-train - INFO - ├── Loss: 7.9429
2025-08-29 01:01:03 - pico-train - INFO - ├── Learning Rate: 3.90e-05
2025-08-29 01:01:03 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-29 01:01:28 - pico-train - INFO - Step 2000 -- 💾 Saving Checkpoint
2025-08-29 01:03:57 - pico-train - INFO - Step 2000 -- 📊 Evaluation Results
2025-08-29 01:03:57 - pico-train - INFO - └── paloma: 1.8978577072995303e+19
2025-08-29 01:04:01 - pico-train - INFO - Step 2000 -- 🔄 Training Metrics
2025-08-29 01:04:01 - pico-train - INFO - ├── Loss: 7.8447
2025-08-29 01:04:01 - pico-train - INFO - ├── Learning Rate: 4.00e-05
2025-08-29 01:04:01 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-29 01:04:01 - pico-train - INFO - Step 2000 -- 📈 Saving Learning Dynamics
2025-08-29 01:04:31 - pico-train - INFO - Step 2050 -- 🔄 Training Metrics
2025-08-29 01:04:31 - pico-train - INFO - ├── Loss: 7.8380
2025-08-29 01:04:31 - pico-train - INFO - ├── Learning Rate: 4.10e-05
2025-08-29 01:04:31 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-29 01:04:57 - pico-train - INFO - Step 2100 -- 🔄 Training Metrics
2025-08-29 01:04:57 - pico-train - INFO - ├── Loss: 7.7671
2025-08-29 01:04:57 - pico-train - INFO - ├── Learning Rate: 4.20e-05
2025-08-29 01:04:57 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-29 01:05:23 - pico-train - INFO - Step 2150 -- 🔄 Training Metrics
2025-08-29 01:05:23 - pico-train - INFO - ├── Loss: 7.7637
2025-08-29 01:05:23 - pico-train - INFO - ├── Learning Rate: 4.30e-05
2025-08-29 01:05:23 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-29 01:05:49 - pico-train - INFO - Step 2200 -- 🔄 Training Metrics
2025-08-29 01:05:49 - pico-train - INFO - ├── Loss: 7.7060
2025-08-29 01:05:49 - pico-train - INFO - ├── Learning Rate: 4.40e-05
2025-08-29 01:05:49 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-29 01:06:15 - pico-train - INFO - Step 2250 -- 🔄 Training Metrics
2025-08-29 01:06:15 - pico-train - INFO - ├── Loss: 7.7607
2025-08-29 01:06:15 - pico-train - INFO - ├── Learning Rate: 4.50e-05
2025-08-29 01:06:15 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-29 01:06:41 - pico-train - INFO - Step 2300 -- 🔄 Training Metrics
2025-08-29 01:06:41 - pico-train - INFO - ├── Loss: 7.7076
2025-08-29 01:06:41 - pico-train - INFO - ├── Learning Rate: 4.60e-05
2025-08-29 01:06:41 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-29 01:07:07 - pico-train - INFO - Step 2350 -- 🔄 Training Metrics
2025-08-29 01:07:07 - pico-train - INFO - ├── Loss: 7.6787
2025-08-29 01:07:07 - pico-train - INFO - ├── Learning Rate: 4.70e-05
2025-08-29 01:07:07 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-29 01:07:33 - pico-train - INFO - Step 2400 -- 🔄 Training Metrics
2025-08-29 01:07:33 - pico-train - INFO - ├── Loss: 7.6446
2025-08-29 01:07:33 - pico-train - INFO - ├── Learning Rate: 4.80e-05
2025-08-29 01:07:33 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-29 01:07:59 - pico-train - INFO - Step 2450 -- 🔄 Training Metrics
2025-08-29 01:07:59 - pico-train - INFO - ├── Loss: 7.5999
2025-08-29 01:07:59 - pico-train - INFO - ├── Learning Rate: 4.90e-05
2025-08-29 01:07:59 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-29 01:08:25 - pico-train - INFO - Step 2500 -- 🔄 Training Metrics
2025-08-29 01:08:25 - pico-train - INFO - ├── Loss: 7.6154
2025-08-29 01:08:25 - pico-train - INFO - ├── Learning Rate: 5.00e-05
2025-08-29 01:08:25 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-29 01:08:50 - pico-train - INFO - Step 2550 -- 🔄 Training Metrics
2025-08-29 01:08:50 - pico-train - INFO - ├── Loss: 7.5627
2025-08-29 01:08:50 - pico-train - INFO - ├── Learning Rate: 5.10e-05
2025-08-29 01:08:50 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-29 01:09:17 - pico-train - INFO - Step 2600 -- 🔄 Training Metrics
2025-08-29 01:09:17 - pico-train - INFO - ├── Loss: 7.5747
2025-08-29 01:09:17 - pico-train - INFO - ├── Learning Rate: 5.20e-05
2025-08-29 01:09:17 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-29 01:09:43 - pico-train - INFO - Step 2650 -- 🔄 Training Metrics
2025-08-29 01:09:43 - pico-train - INFO - ├── Loss: 7.5358
2025-08-29 01:09:43 - pico-train - INFO - ├── Learning Rate: 5.30e-05
2025-08-29 01:09:43 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-29 01:10:09 - pico-train - INFO - Step 2700 -- 🔄 Training Metrics
2025-08-29 01:10:09 - pico-train - INFO - ├── Loss: 7.5148
2025-08-29 01:10:09 - pico-train - INFO - ├── Learning Rate: 5.40e-05
2025-08-29 01:10:09 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-29 01:10:35 - pico-train - INFO - Step 2750 -- 🔄 Training Metrics
2025-08-29 01:10:35 - pico-train - INFO - ├── Loss: 7.4874
2025-08-29 01:10:35 - pico-train - INFO - ├── Learning Rate: 5.50e-05
2025-08-29 01:10:35 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-29 01:11:01 - pico-train - INFO - Step 2800 -- 🔄 Training Metrics
2025-08-29 01:11:01 - pico-train - INFO - ├── Loss: 7.4438
2025-08-29 01:11:01 - pico-train - INFO - ├── Learning Rate: 5.60e-05
2025-08-29 01:11:01 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-29 01:11:27 - pico-train - INFO - Step 2850 -- 🔄 Training Metrics
2025-08-29 01:11:27 - pico-train - INFO - ├── Loss: 7.4772
2025-08-29 01:11:27 - pico-train - INFO - ├── Learning Rate: 5.70e-05
2025-08-29 01:11:27 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-29 01:11:53 - pico-train - INFO - Step 2900 -- 🔄 Training Metrics
2025-08-29 01:11:53 - pico-train - INFO - ├── Loss: 7.4135
2025-08-29 01:11:53 - pico-train - INFO - ├── Learning Rate: 5.80e-05
2025-08-29 01:11:53 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-29 01:12:19 - pico-train - INFO - Step 2950 -- 🔄 Training Metrics
2025-08-29 01:12:19 - pico-train - INFO - ├── Loss: 7.3929
2025-08-29 01:12:19 - pico-train - INFO - ├── Learning Rate: 5.90e-05
2025-08-29 01:12:19 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-29 01:12:44 - pico-train - INFO - Step 3000 -- 💾 Saving Checkpoint
2025-08-29 01:14:43 - pico-train - INFO - Step 3000 -- 📊 Evaluation Results
2025-08-29 01:14:43 - pico-train - INFO - └── paloma: 3.1701596694317715e+19
2025-08-29 01:14:46 - pico-train - INFO - Step 3000 -- 🔄 Training Metrics
2025-08-29 01:14:46 - pico-train - INFO - ├── Loss: 7.3566
2025-08-29 01:14:46 - pico-train - INFO - ├── Learning Rate: 6.00e-05
2025-08-29 01:14:46 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-29 01:14:46 - pico-train - INFO - Step 3000 -- 📈 Saving Learning Dynamics
2025-08-29 01:15:16 - pico-train - INFO - Step 3050 -- 🔄 Training Metrics
2025-08-29 01:15:16 - pico-train - INFO - ├── Loss: 7.3318
2025-08-29 01:15:16 - pico-train - INFO - ├── Learning Rate: 6.10e-05
2025-08-29 01:15:16 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-29 01:15:42 - pico-train - INFO - Step 3100 -- 🔄 Training Metrics
2025-08-29 01:15:42 - pico-train - INFO - ├── Loss: 7.3114
2025-08-29 01:15:42 - pico-train - INFO - ├── Learning Rate: 6.20e-05
2025-08-29 01:15:42 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-29 01:16:08 - pico-train - INFO - Step 3150 -- 🔄 Training Metrics
2025-08-29 01:16:08 - pico-train - INFO - ├── Loss: 7.2734
2025-08-29 01:16:08 - pico-train - INFO - ├── Learning Rate: 6.30e-05
2025-08-29 01:16:08 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-29 01:16:34 - pico-train - INFO - Step 3200 -- 🔄 Training Metrics
2025-08-29 01:16:34 - pico-train - INFO - ├── Loss: 7.3220
2025-08-29 01:16:34 - pico-train - INFO - ├── Learning Rate: 6.40e-05
2025-08-29 01:16:34 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-29 01:16:59 - pico-train - INFO - Step 3250 -- 🔄 Training Metrics
2025-08-29 01:16:59 - pico-train - INFO - ├── Loss: 7.2621
2025-08-29 01:16:59 - pico-train - INFO - ├── Learning Rate: 6.50e-05
2025-08-29 01:16:59 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-29 01:17:25 - pico-train - INFO - Step 3300 -- 🔄 Training Metrics
2025-08-29 01:17:25 - pico-train - INFO - ├── Loss: 7.2257
2025-08-29 01:17:25 - pico-train - INFO - ├── Learning Rate: 6.60e-05
2025-08-29 01:17:25 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-29 01:17:52 - pico-train - INFO - Step 3350 -- 🔄 Training Metrics
2025-08-29 01:17:52 - pico-train - INFO - ├── Loss: 7.2447
2025-08-29 01:17:52 - pico-train - INFO - ├── Learning Rate: 6.70e-05
2025-08-29 01:17:52 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-29 01:18:18 - pico-train - INFO - Step 3400 -- 🔄 Training Metrics
2025-08-29 01:18:18 - pico-train - INFO - ├── Loss: 7.2344
2025-08-29 01:18:18 - pico-train - INFO - ├── Learning Rate: 6.80e-05
2025-08-29 01:18:18 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-29 01:18:43 - pico-train - INFO - Step 3450 -- 🔄 Training Metrics
2025-08-29 01:18:43 - pico-train - INFO - ├── Loss: 7.1488
2025-08-29 01:18:43 - pico-train - INFO - ├── Learning Rate: 6.90e-05
2025-08-29 01:18:43 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-29 01:19:09 - pico-train - INFO - Step 3500 -- 🔄 Training Metrics
2025-08-29 01:19:09 - pico-train - INFO - ├── Loss: 7.1797
2025-08-29 01:19:09 - pico-train - INFO - ├── Learning Rate: 7.00e-05
2025-08-29 01:19:09 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-29 01:19:35 - pico-train - INFO - Step 3550 -- 🔄 Training Metrics
2025-08-29 01:19:35 - pico-train - INFO - ├── Loss: 7.1737
2025-08-29 01:19:35 - pico-train - INFO - ├── Learning Rate: 7.10e-05
2025-08-29 01:19:35 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-29 01:20:01 - pico-train - INFO - Step 3600 -- 🔄 Training Metrics
2025-08-29 01:20:01 - pico-train - INFO - ├── Loss: 7.1204
2025-08-29 01:20:01 - pico-train - INFO - ├── Learning Rate: 7.20e-05
2025-08-29 01:20:01 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-29 01:20:27 - pico-train - INFO - Step 3650 -- 🔄 Training Metrics
2025-08-29 01:20:27 - pico-train - INFO - ├── Loss: 7.1102
2025-08-29 01:20:27 - pico-train - INFO - ├── Learning Rate: 7.30e-05
2025-08-29 01:20:27 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-29 01:20:53 - pico-train - INFO - Step 3700 -- 🔄 Training Metrics
2025-08-29 01:20:53 - pico-train - INFO - ├── Loss: 7.0845
2025-08-29 01:20:53 - pico-train - INFO - ├── Learning Rate: 7.40e-05
2025-08-29 01:20:53 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-29 01:21:19 - pico-train - INFO - Step 3750 -- 🔄 Training Metrics
2025-08-29 01:21:19 - pico-train - INFO - ├── Loss: 7.0858
2025-08-29 01:21:19 - pico-train - INFO - ├── Learning Rate: 7.50e-05
2025-08-29 01:21:19 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-29 01:21:45 - pico-train - INFO - Step 3800 -- 🔄 Training Metrics
2025-08-29 01:21:45 - pico-train - INFO - ├── Loss: 7.0362
2025-08-29 01:21:45 - pico-train - INFO - ├── Learning Rate: 7.60e-05
2025-08-29 01:21:45 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-29 01:22:11 - pico-train - INFO - Step 3850 -- 🔄 Training Metrics
2025-08-29 01:22:11 - pico-train - INFO - ├── Loss: 7.0603
2025-08-29 01:22:11 - pico-train - INFO - ├── Learning Rate: 7.70e-05
2025-08-29 01:22:11 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-29 01:22:37 - pico-train - INFO - Step 3900 -- 🔄 Training Metrics
2025-08-29 01:22:37 - pico-train - INFO - ├── Loss: 7.0172
2025-08-29 01:22:37 - pico-train - INFO - ├── Learning Rate: 7.80e-05
2025-08-29 01:22:37 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-29 01:23:03 - pico-train - INFO - Step 3950 -- 🔄 Training Metrics
2025-08-29 01:23:03 - pico-train - INFO - ├── Loss: 6.9948
2025-08-29 01:23:03 - pico-train - INFO - ├── Learning Rate: 7.90e-05
2025-08-29 01:23:03 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-29 01:23:29 - pico-train - INFO - Step 4000 -- 💾 Saving Checkpoint
2025-08-29 01:25:52 - pico-train - INFO - Step 4000 -- 📊 Evaluation Results
2025-08-29 01:25:52 - pico-train - INFO - └── paloma: 2.5015965971757485e+20
2025-08-29 01:25:54 - pico-train - INFO - Step 4000 -- 🔄 Training Metrics
2025-08-29 01:25:54 - pico-train - INFO - ├── Loss: 6.9909
2025-08-29 01:25:54 - pico-train - INFO - ├── Learning Rate: 8.00e-05
2025-08-29 01:25:54 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-29 01:25:54 - pico-train - INFO - Step 4000 -- 📈 Saving Learning Dynamics
2025-08-29 01:26:24 - pico-train - INFO - Step 4050 -- 🔄 Training Metrics
2025-08-29 01:26:24 - pico-train - INFO - ├── Loss: 6.9477
2025-08-29 01:26:24 - pico-train - INFO - ├── Learning Rate: 8.10e-05
2025-08-29 01:26:24 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-29 01:26:51 - pico-train - INFO - Step 4100 -- 🔄 Training Metrics
2025-08-29 01:26:51 - pico-train - INFO - ├── Loss: 6.9651
2025-08-29 01:26:51 - pico-train - INFO - ├── Learning Rate: 8.20e-05
2025-08-29 01:26:51 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-29 01:27:17 - pico-train - INFO - Step 4150 -- 🔄 Training Metrics
2025-08-29 01:27:17 - pico-train - INFO - ├── Loss: 6.9149
2025-08-29 01:27:17 - pico-train - INFO - ├── Learning Rate: 8.30e-05
2025-08-29 01:27:17 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-29 01:27:43 - pico-train - INFO - Step 4200 -- 🔄 Training Metrics
2025-08-29 01:27:43 - pico-train - INFO - ├── Loss: 6.8930
2025-08-29 01:27:43 - pico-train - INFO - ├── Learning Rate: 8.40e-05
2025-08-29 01:27:43 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-29 01:28:08 - pico-train - INFO - Step 4250 -- 🔄 Training Metrics
2025-08-29 01:28:08 - pico-train - INFO - ├── Loss: 6.9227
2025-08-29 01:28:08 - pico-train - INFO - ├── Learning Rate: 8.50e-05
2025-08-29 01:28:08 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-29 01:28:34 - pico-train - INFO - Step 4300 -- 🔄 Training Metrics
2025-08-29 01:28:34 - pico-train - INFO - ├── Loss: 6.8790
2025-08-29 01:28:34 - pico-train - INFO - ├── Learning Rate: 8.60e-05
2025-08-29 01:28:34 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-29 01:29:01 - pico-train - INFO - Step 4350 -- 🔄 Training Metrics
2025-08-29 01:29:01 - pico-train - INFO - ├── Loss: 6.8649
2025-08-29 01:29:01 - pico-train - INFO - ├── Learning Rate: 8.70e-05
2025-08-29 01:29:01 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-29 01:29:26 - pico-train - INFO - Step 4400 -- 🔄 Training Metrics
2025-08-29 01:29:26 - pico-train - INFO - ├── Loss: 6.8305
2025-08-29 01:29:26 - pico-train - INFO - ├── Learning Rate: 8.80e-05
2025-08-29 01:29:26 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-29 01:29:52 - pico-train - INFO - Step 4450 -- 🔄 Training Metrics
2025-08-29 01:29:52 - pico-train - INFO - ├── Loss: 6.8085
2025-08-29 01:29:52 - pico-train - INFO - ├── Learning Rate: 8.90e-05
2025-08-29 01:29:52 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-29 01:30:18 - pico-train - INFO - Step 4500 -- 🔄 Training Metrics
2025-08-29 01:30:18 - pico-train - INFO - ├── Loss: 6.8315
2025-08-29 01:30:18 - pico-train - INFO - ├── Learning Rate: 9.00e-05
2025-08-29 01:30:18 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-29 01:30:44 - pico-train - INFO - Step 4550 -- 🔄 Training Metrics
2025-08-29 01:30:44 - pico-train - INFO - ├── Loss: 6.7885
2025-08-29 01:30:44 - pico-train - INFO - ├── Learning Rate: 9.10e-05
2025-08-29 01:30:44 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-29 01:31:11 - pico-train - INFO - Step 4600 -- 🔄 Training Metrics
2025-08-29 01:31:11 - pico-train - INFO - ├── Loss: 6.7805
2025-08-29 01:31:11 - pico-train - INFO - ├── Learning Rate: 9.20e-05
2025-08-29 01:31:11 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-29 01:31:36 - pico-train - INFO - Step 4650 -- 🔄 Training Metrics
2025-08-29 01:31:36 - pico-train - INFO - ├── Loss: 6.7737
2025-08-29 01:31:36 - pico-train - INFO - ├── Learning Rate: 9.30e-05
2025-08-29 01:31:36 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-29 01:32:02 - pico-train - INFO - Step 4700 -- 🔄 Training Metrics
2025-08-29 01:32:02 - pico-train - INFO - ├── Loss: 6.7649
2025-08-29 01:32:02 - pico-train - INFO - ├── Learning Rate: 9.40e-05
2025-08-29 01:32:02 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-29 01:32:28 - pico-train - INFO - Step 4750 -- 🔄 Training Metrics
2025-08-29 01:32:28 - pico-train - INFO - ├── Loss: 6.7562
2025-08-29 01:32:28 - pico-train - INFO - ├── Learning Rate: 9.50e-05
2025-08-29 01:32:28 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-29 01:32:54 - pico-train - INFO - Step 4800 -- 🔄 Training Metrics
2025-08-29 01:32:54 - pico-train - INFO - ├── Loss: 6.7347
2025-08-29 01:32:54 - pico-train - INFO - ├── Learning Rate: 9.60e-05
2025-08-29 01:32:54 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-29 01:33:20 - pico-train - INFO - Step 4850 -- 🔄 Training Metrics
2025-08-29 01:33:20 - pico-train - INFO - ├── Loss: 6.7161
2025-08-29 01:33:20 - pico-train - INFO - ├── Learning Rate: 9.70e-05
2025-08-29 01:33:20 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-29 01:33:46 - pico-train - INFO - Step 4900 -- 🔄 Training Metrics
2025-08-29 01:33:46 - pico-train - INFO - ├── Loss: 6.6889
2025-08-29 01:33:46 - pico-train - INFO - ├── Learning Rate: 9.80e-05
2025-08-29 01:33:46 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-29 01:34:12 - pico-train - INFO - Step 4950 -- 🔄 Training Metrics
2025-08-29 01:34:12 - pico-train - INFO - ├── Loss: 6.7299
2025-08-29 01:34:12 - pico-train - INFO - ├── Learning Rate: 9.90e-05
2025-08-29 01:34:12 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-29 01:34:37 - pico-train - INFO - Step 5000 -- 💾 Saving Checkpoint
2025-08-29 01:36:35 - pico-train - INFO - Step 5000 -- 📊 Evaluation Results
2025-08-29 01:36:35 - pico-train - INFO - └── paloma: 2.38712860824014e+21
2025-08-29 01:36:37 - pico-train - INFO - Step 5000 -- 🔄 Training Metrics
2025-08-29 01:36:37 - pico-train - INFO - ├── Loss: 6.6605
2025-08-29 01:36:37 - pico-train - INFO - ├── Learning Rate: 1.00e-04
2025-08-29 01:36:37 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-29 01:36:37 - pico-train - INFO - Step 5000 -- 📈 Saving Learning Dynamics
2025-08-29 01:37:06 - pico-train - INFO - Step 5050 -- 🔄 Training Metrics
2025-08-29 01:37:06 - pico-train - INFO - ├── Loss: 6.6552
2025-08-29 01:37:06 - pico-train - INFO - ├── Learning Rate: 1.00e-04
2025-08-29 01:37:06 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-29 01:37:33 - pico-train - INFO - Step 5100 -- 🔄 Training Metrics
2025-08-29 01:37:33 - pico-train - INFO - ├── Loss: 6.7038
2025-08-29 01:37:33 - pico-train - INFO - ├── Learning Rate: 9.99e-05
2025-08-29 01:37:33 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-29 01:37:59 - pico-train - INFO - Step 5150 -- 🔄 Training Metrics
2025-08-29 01:37:59 - pico-train - INFO - ├── Loss: 6.6452
2025-08-29 01:37:59 - pico-train - INFO - ├── Learning Rate: 9.99e-05
2025-08-29 01:37:59 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-29 01:38:25 - pico-train - INFO - Step 5200 -- 🔄 Training Metrics
2025-08-29 01:38:25 - pico-train - INFO - ├── Loss: 6.6522
2025-08-29 01:38:25 - pico-train - INFO - ├── Learning Rate: 9.99e-05
2025-08-29 01:38:25 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-29 01:38:51 - pico-train - INFO - Step 5250 -- 🔄 Training Metrics
2025-08-29 01:38:51 - pico-train - INFO - ├── Loss: 6.6270
2025-08-29 01:38:51 - pico-train - INFO - ├── Learning Rate: 9.99e-05
2025-08-29 01:38:51 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-29 01:39:17 - pico-train - INFO - Step 5300 -- 🔄 Training Metrics
2025-08-29 01:39:17 - pico-train - INFO - ├── Loss: 6.5733
2025-08-29 01:39:17 - pico-train - INFO - ├── Learning Rate: 9.98e-05
2025-08-29 01:39:17 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-29 01:39:43 - pico-train - INFO - Step 5350 -- 🔄 Training Metrics
2025-08-29 01:39:43 - pico-train - INFO - ├── Loss: 6.5833
2025-08-29 01:39:43 - pico-train - INFO - ├── Learning Rate: 9.98e-05
2025-08-29 01:39:43 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-29 01:40:09 - pico-train - INFO - Step 5400 -- 🔄 Training Metrics
2025-08-29 01:40:09 - pico-train - INFO - ├── Loss: 6.5854
2025-08-29 01:40:09 - pico-train - INFO - ├── Learning Rate: 9.98e-05
2025-08-29 01:40:09 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-29 01:40:35 - pico-train - INFO - Step 5450 -- 🔄 Training Metrics
2025-08-29 01:40:35 - pico-train - INFO - ├── Loss: 6.6012
2025-08-29 01:40:35 - pico-train - INFO - ├── Learning Rate: 9.98e-05
2025-08-29 01:40:35 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-29 01:41:01 - pico-train - INFO - Step 5500 -- 🔄 Training Metrics
2025-08-29 01:41:01 - pico-train - 
INFO - ├── Loss: 6.5786 2025-08-29 01:41:01 - pico-train - INFO - ├── Learning Rate: 9.97e-05 2025-08-29 01:41:01 - pico-train - INFO - └── Inf/NaN count: 0