2025-08-29 22:50:26 - pico-train - INFO - Step 20000 -- 📊 Evaluation Results
2025-08-29 22:50:26 - pico-train - INFO - └── paloma: 1.8399778163273925e+24
2025-08-29 22:50:26 - pico-train - INFO - ==================================================
2025-08-29 22:50:26 - pico-train - INFO - ✨ Training Configuration
2025-08-29 22:50:26 - pico-train - INFO - ==================================================
2025-08-29 22:50:26 - pico-train - INFO - ╭─────────────────────────────────────────────────────╮
2025-08-29 22:50:26 - pico-train - INFO - │ checkpointing: │
2025-08-29 22:50:26 - pico-train - INFO - │ checkpoints_dir: checkpoints │
2025-08-29 22:50:26 - pico-train - INFO - │ evaluation: │
2025-08-29 22:50:26 - pico-train - INFO - │ eval_results_dir: eval_results │
2025-08-29 22:50:26 - pico-train - INFO - │ fabric_checkpoint_dir: fabric_state │
2025-08-29 22:50:26 - pico-train - INFO - │ fabric_checkpoint_filename: checkpoint.pt │
2025-08-29 22:50:26 - pico-train - INFO - │ hf_checkpoint: │
2025-08-29 22:50:26 - pico-train - INFO - │ collection_slug: null │
2025-08-29 22:50:26 - pico-train - INFO - │ repo_id: ThomasTheMaker/pico-decoder-tiny │
2025-08-29 22:50:26 - pico-train - INFO - │ learning_dynamics: │
2025-08-29 22:50:26 - pico-train - INFO - │ batch_size: 1 │
2025-08-29 22:50:26 - pico-train - INFO - │ eval_data: null │
2025-08-29 22:50:26 - pico-train - INFO - │ layer_suffixes: │
2025-08-29 22:50:26 - pico-train - INFO - │ - attention.v_proj │
2025-08-29 22:50:26 - pico-train - INFO - │ - attention.o_proj │
2025-08-29 22:50:26 - pico-train - INFO - │ - swiglu.w_2 │
2025-08-29 22:50:26 - pico-train - INFO - │ sequence_idx: -1 │
2025-08-29 22:50:26 - pico-train - INFO - │ learning_dynamics_dir: learning_dynamics │
2025-08-29 22:50:26 - pico-train - INFO - │ logs_dir: logs │
2025-08-29 22:50:26 - pico-train - INFO - │ run_name: pico-decoder-tiny-dolma5M-v1 │
2025-08-29 22:50:26 - pico-train - INFO - │ runs_dir: runs │
2025-08-29 22:50:26 - pico-train - INFO - │ save_every_n_steps: 500 │
2025-08-29 22:50:26 - pico-train - INFO - │ save_to_hf: true │
2025-08-29 22:50:26 - pico-train - INFO - │ training: │
2025-08-29 22:50:26 - pico-train - INFO - │ auto_resume: true │
2025-08-29 22:50:26 - pico-train - INFO - │ data: │
2025-08-29 22:50:26 - pico-train - INFO - │ dataloader: │
2025-08-29 22:50:26 - pico-train - INFO - │ batch_size: 4 │
2025-08-29 22:50:26 - pico-train - INFO - │ dataset: │
2025-08-29 22:50:26 - pico-train - INFO - │ name: ThomasTheMaker/pretokenized-dolma-5M │
2025-08-29 22:50:26 - pico-train - INFO - │ tokenizer: │
2025-08-29 22:50:26 - pico-train - INFO - │ name: allenai/OLMo-7B-0724-hf │
2025-08-29 22:50:26 - pico-train - INFO - │ vocab_size: 50304 │
2025-08-29 22:50:26 - pico-train - INFO - │ evaluation: │
2025-08-29 22:50:26 - pico-train - INFO - │ metrics: │
2025-08-29 22:50:26 - pico-train - INFO - │ - paloma │
2025-08-29 22:50:26 - pico-train - INFO - │ paloma: │
2025-08-29 22:50:26 - pico-train - INFO - │ batch_size: 1 │
2025-08-29 22:50:26 - pico-train - INFO - │ dataset_name: pico-lm/pretokenized-paloma-tinsy │
2025-08-29 22:50:26 - pico-train - INFO - │ dataset_split: val │
2025-08-29 22:50:26 - pico-train - INFO - │ max_length: 2048 │
2025-08-29 22:50:26 - pico-train - INFO - │ model: │
2025-08-29 22:50:26 - pico-train - INFO - │ activation_hidden_dim: 384 │
2025-08-29 22:50:26 - pico-train - INFO - │ attention_n_heads: 12 │
2025-08-29 22:50:26 - pico-train - INFO - │ attention_n_kv_heads: 4 │
2025-08-29 22:50:26 - pico-train - INFO - │ batch_size: 1024 │
2025-08-29 22:50:26 - pico-train - INFO - │ d_model: 96 │
2025-08-29 22:50:26 - pico-train - INFO - │ max_seq_len: 2048 │
2025-08-29 22:50:26 - pico-train - INFO - │ model_type: pico_decoder │
2025-08-29 22:50:26 - pico-train - INFO - │ n_layers: 12 │
2025-08-29 22:50:26 - pico-train - INFO - │ norm_eps: 1.0e-06 │
2025-08-29 22:50:26 - pico-train - INFO - │ position_emb_theta: 10000.0 │
2025-08-29 22:50:26 - pico-train - INFO - │ vocab_size: 50304 │
2025-08-29 22:50:26 - pico-train - INFO - │ monitoring: │
2025-08-29 22:50:26 - pico-train - INFO - │ logging: │
2025-08-29 22:50:26 - pico-train - INFO - │ log_every_n_steps: 25 │
2025-08-29 22:50:26 - pico-train - INFO - │ log_level: INFO │
2025-08-29 22:50:26 - pico-train - INFO - │ save_to_wandb: false │
2025-08-29 22:50:26 - pico-train - INFO - │ wandb: │
2025-08-29 22:50:26 - pico-train - INFO - │ entity: boymyc │
2025-08-29 22:50:26 - pico-train - INFO - │ project: pico-decoder-tiny │
2025-08-29 22:50:26 - pico-train - INFO - │ training: │
2025-08-29 22:50:26 - pico-train - INFO - │ fabric: │
2025-08-29 22:50:26 - pico-train - INFO - │ accelerator: cuda │
2025-08-29 22:50:26 - pico-train - INFO - │ num_devices: 1 │
2025-08-29 22:50:26 - pico-train - INFO - │ num_nodes: 1 │
2025-08-29 22:50:26 - pico-train - INFO - │ precision: bf16-mixed │
2025-08-29 22:50:26 - pico-train - INFO - │ max_steps: 20000 │
2025-08-29 22:50:26 - pico-train - INFO - │ optimization: │
2025-08-29 22:50:26 - pico-train - INFO - │ gradient_accumulation_steps: 4 │
2025-08-29 22:50:26 - pico-train - INFO - │ lr: 5.0e-05 │
2025-08-29 22:50:26 - pico-train - INFO - │ lr_scheduler: cosine │
2025-08-29 22:50:26 - pico-train - INFO - │ lr_warmup_steps: 8000 │
2025-08-29 22:50:26 - pico-train - INFO - │ optimizer: adamw │
2025-08-29 22:50:26 - pico-train - INFO - │ │
2025-08-29 22:50:26 - pico-train - INFO - ╰─────────────────────────────────────────────────────╯
2025-08-29 22:50:26 - pico-train - INFO - ==================================================
2025-08-29 22:50:26 - pico-train - INFO - ⛭ Runtime Summary:
2025-08-29 22:50:26 - pico-train - INFO - ==================================================
2025-08-29 22:50:26 - pico-train - INFO - Starting from step: 20000
2025-08-29 22:50:26 - pico-train - INFO - Model Setup:
2025-08-29 22:50:26 - pico-train - INFO - └─ Total Parameters: 11,282,784
2025-08-29 22:50:26 - pico-train - INFO - └─ Trainable Parameters: 11,282,784
2025-08-29 22:50:26 - pico-train - INFO - Distributed Setup:
2025-08-29 22:50:26 - pico-train - INFO - └─ Number of Devices: 1
2025-08-29 22:50:26 - pico-train - INFO - └─ Device Type: NVIDIA GeForce RTX 5090
2025-08-29 22:50:26 - pico-train - INFO - └─ Available Memory: 33.68 GB
2025-08-29 22:50:26 - pico-train - INFO - Software Setup:
2025-08-29 22:50:26 - pico-train - INFO - └─ Python Version: 3.10.12
2025-08-29 22:50:26 - pico-train - INFO - └─ PyTorch Version: 2.8.0+cu128
2025-08-29 22:50:26 - pico-train - INFO - └─ CUDA Version: 12.8
2025-08-29 22:50:26 - pico-train - INFO - └─ Operating System: Linux 6.8.0-63-generic
2025-08-29 22:50:26 - pico-train - INFO - Batch Size Configuration:
2025-08-29 22:50:26 - pico-train - INFO - └─ Global Batch Size: 4
2025-08-29 22:50:26 - pico-train - INFO - └─ Per Device Batch Size: 1
2025-08-29 22:50:26 - pico-train - INFO - └─ Gradient Accumulation Steps: 4
2025-08-29 22:50:26 - pico-train - INFO - ==================================================
2025-08-29 22:50:27 - pico-train - INFO - Step 20000 -- 🔄 Training Metrics
2025-08-29 22:50:27 - pico-train - INFO - ├── Loss: 6.5103
2025-08-29 22:50:27 - pico-train - INFO - ├── Learning Rate: 5.00e-06
2025-08-29 22:50:27 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-29 22:50:27 - pico-train - INFO - Step 20000 -- 📈 Saving Learning Dynamics
2025-08-29 22:50:43 - pico-train - INFO - Step 20025 -- 🔄 Training Metrics
2025-08-29 22:50:43 - pico-train - INFO - ├── Loss: 6.4274
2025-08-29 22:50:43 - pico-train - INFO - ├── Learning Rate: 3.45e-05
2025-08-29 22:50:43 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-29 22:50:55 - pico-train - INFO - Step 20050 -- 🔄 Training Metrics
2025-08-29 22:50:55 - pico-train - INFO - ├── Loss: 6.3770
2025-08-29 22:50:55 - pico-train - INFO - ├── Learning Rate: 3.45e-05
2025-08-29 22:50:55 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-29 22:51:08 - pico-train - INFO - Step 20075 -- 🔄 Training Metrics
2025-08-29 22:51:08 - pico-train - INFO - ├── Loss: 6.2797
2025-08-29 22:51:08 - pico-train - INFO - ├── Learning Rate: 3.44e-05
2025-08-29 22:51:08 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-29 22:51:21 - pico-train - INFO - Step 20100 -- 🔄 Training Metrics
2025-08-29 22:51:21 - pico-train - INFO - ├── Loss: 6.3924
2025-08-29 22:51:21 - pico-train - INFO - ├── Learning Rate: 3.43e-05
2025-08-29 22:51:21 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-29 22:51:34 - pico-train - INFO - Step 20125 -- 🔄 Training Metrics
2025-08-29 22:51:34 - pico-train - INFO - ├── Loss: 6.4442
2025-08-29 22:51:34 - pico-train - INFO - ├── Learning Rate: 3.43e-05
2025-08-29 22:51:34 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-29 22:51:47 - pico-train - INFO - Step 20150 -- 🔄 Training Metrics
2025-08-29 22:51:47 - pico-train - INFO - ├── Loss: 6.3881
2025-08-29 22:51:47 - pico-train - INFO - ├── Learning Rate: 3.42e-05
2025-08-29 22:51:47 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-29 22:52:00 - pico-train - INFO - Step 20175 -- 🔄 Training Metrics
2025-08-29 22:52:00 - pico-train - INFO - ├── Loss: 6.4008
2025-08-29 22:52:00 - pico-train - INFO - ├── Learning Rate: 3.42e-05
2025-08-29 22:52:00 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-29 22:52:12 - pico-train - INFO - Step 20200 -- 🔄 Training Metrics
2025-08-29 22:52:12 - pico-train - INFO - ├── Loss: 6.4257
2025-08-29 22:52:12 - pico-train - INFO - ├── Learning Rate: 3.41e-05
2025-08-29 22:52:12 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-29 22:52:25 - pico-train - INFO - Step 20225 -- 🔄 Training Metrics
2025-08-29 22:52:25 - pico-train - INFO - ├── Loss: 6.4125
2025-08-29 22:52:25 - pico-train - INFO - ├── Learning Rate: 3.41e-05
2025-08-29 22:52:25 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-29 22:52:38 - pico-train - INFO - Step 20250 -- 🔄 Training Metrics
2025-08-29 22:52:38 - pico-train - INFO - ├── Loss: 6.3390
2025-08-29 22:52:38 - pico-train - INFO - ├── Learning Rate: 3.40e-05
2025-08-29 22:52:38 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-29 22:52:50 - pico-train - INFO - Step 20275 -- 🔄 Training Metrics
2025-08-29 22:52:50 - pico-train - INFO - ├── Loss: 6.3328
2025-08-29 22:52:50 - pico-train - INFO - ├── Learning Rate: 3.39e-05
2025-08-29 22:52:50 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-29 22:53:03 - pico-train - INFO - Step 20300 -- 🔄 Training Metrics
2025-08-29 22:53:03 - pico-train - INFO - ├── Loss: 6.3035
2025-08-29 22:53:03 - pico-train - INFO - ├── Learning Rate: 3.39e-05
2025-08-29 22:53:03 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-29 22:53:16 - pico-train - INFO - Step 20325 -- 🔄 Training Metrics
2025-08-29 22:53:16 - pico-train - INFO - ├── Loss: 6.2862
2025-08-29 22:53:16 - pico-train - INFO - ├── Learning Rate: 3.38e-05
2025-08-29 22:53:16 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-29 22:53:28 - pico-train - INFO - Step 20350 -- 🔄 Training Metrics
2025-08-29 22:53:28 - pico-train - INFO - ├── Loss: 6.4249
2025-08-29 22:53:28 - pico-train - INFO - ├── Learning Rate: 3.38e-05
2025-08-29 22:53:28 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-29 22:53:41 - pico-train - INFO - Step 20375 -- 🔄 Training Metrics
2025-08-29 22:53:41 - pico-train - INFO - ├── Loss: 6.3582
2025-08-29 22:53:41 - pico-train - INFO - ├── Learning Rate: 3.37e-05
2025-08-29 22:53:41 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-29 22:53:54 - pico-train - INFO - Step 20400 -- 🔄 Training Metrics
2025-08-29 22:53:54 - pico-train - INFO - ├── Loss: 6.3195
2025-08-29 22:53:54 - pico-train - INFO - ├── Learning Rate: 3.37e-05
2025-08-29 22:53:54 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-29 22:54:07 - pico-train - INFO - Step 20425 -- 🔄 Training Metrics
2025-08-29 22:54:07 - pico-train - INFO - ├── Loss: 6.4802
2025-08-29 22:54:07 - pico-train - INFO - ├── Learning Rate: 3.36e-05
2025-08-29 22:54:07 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-29 22:54:22 - pico-train - INFO - Step 20450 -- 🔄 Training Metrics
2025-08-29 22:54:22 - pico-train - INFO - ├── Loss: 6.3126
2025-08-29 22:54:22 - pico-train - INFO - ├── Learning Rate: 3.35e-05
2025-08-29 22:54:22 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-29 22:54:35 - pico-train - INFO - Step 20475 -- 🔄 Training Metrics
2025-08-29 22:54:35 - pico-train - INFO - ├── Loss: 6.4323
2025-08-29 22:54:35 - pico-train - INFO - ├── Learning Rate: 3.35e-05
2025-08-29 22:54:35 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-29 22:54:50 - pico-train - INFO - Step 20500 -- 💾 Saving Checkpoint
2025-08-29 22:59:37 - pico-train - INFO - Step 20500 -- 📊 Evaluation Results
2025-08-29 22:59:37 - pico-train - INFO - └── paloma: 4.281028602870165e+24
2025-08-29 22:59:42 - pico-train - INFO - Step 20500 -- 🔄 Training Metrics
2025-08-29 22:59:42 - pico-train - INFO - ├── Loss: 6.4138
2025-08-29 22:59:42 - pico-train - INFO - ├── Learning Rate: 3.34e-05
2025-08-29 22:59:42 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-29 22:59:42 - pico-train - INFO - Step 20500 -- 📈 Saving Learning Dynamics
2025-08-29 23:00:21 - pico-train - INFO - Step 20525 -- 🔄 Training Metrics
2025-08-29 23:00:21 - pico-train - INFO - ├── Loss: 6.3971
2025-08-29 23:00:21 - pico-train - INFO - ├── Learning Rate: 3.34e-05
2025-08-29 23:00:21 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-29 23:00:53 - pico-train - INFO - Step 20550 -- 🔄 Training Metrics
2025-08-29 23:00:53 - pico-train - INFO - ├── Loss: 6.3632
2025-08-29 23:00:53 - pico-train - INFO - ├── Learning Rate: 3.33e-05
2025-08-29 23:00:53 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-29 23:01:27 - pico-train - INFO - Step 20575 -- 🔄 Training Metrics
2025-08-29 23:01:27 - pico-train - INFO - ├── Loss: 6.4202
2025-08-29 23:01:27 - pico-train - INFO - ├── Learning Rate: 3.32e-05
2025-08-29 23:01:27 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-29 23:02:01 - pico-train - INFO - Step 20600 -- 🔄 Training Metrics
2025-08-29 23:02:01 - pico-train - INFO - ├── Loss: 6.4792
2025-08-29 23:02:01 - pico-train - INFO - ├── Learning Rate: 3.32e-05
2025-08-29 23:02:01 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-29 23:02:34 - pico-train - INFO - Step 20625 -- 🔄 Training Metrics
2025-08-29 23:02:34 - pico-train - INFO - ├── Loss: 6.3213
2025-08-29 23:02:34 - pico-train - INFO - ├── Learning Rate: 3.31e-05
2025-08-29 23:02:34 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-29 23:03:09 - pico-train - INFO - Step 20650 -- 🔄 Training Metrics
2025-08-29 23:03:09 - pico-train - INFO - ├── Loss: 6.4173
2025-08-29 23:03:09 - pico-train - INFO - ├── Learning Rate: 3.31e-05
2025-08-29 23:03:09 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-29 23:03:43 - pico-train - INFO - Step 20675 -- 🔄 Training Metrics
2025-08-29 23:03:43 - pico-train - INFO - ├── Loss: 6.4062
2025-08-29 23:03:43 - pico-train - INFO - ├── Learning Rate: 3.30e-05
2025-08-29 23:03:43 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-29 23:04:19 - pico-train - INFO - Step 20700 -- 🔄 Training Metrics
2025-08-29 23:04:19 - pico-train - INFO - ├── Loss: 6.3742
2025-08-29 23:04:19 - pico-train - INFO - ├── Learning Rate: 3.30e-05
2025-08-29 23:04:19 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-29 23:04:56 - pico-train - INFO - Step 20725 -- 🔄 Training Metrics
2025-08-29 23:04:56 - pico-train - INFO - ├── Loss: 6.3820
2025-08-29 23:04:56 - pico-train - INFO - ├── Learning Rate: 3.29e-05
2025-08-29 23:04:56 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-29 23:05:17 - pico-train - INFO - Step 20750 -- 🔄 Training Metrics
2025-08-29 23:05:17 - pico-train - INFO - ├── Loss: 6.3374
2025-08-29 23:05:17 - pico-train - INFO - ├── Learning Rate: 3.28e-05
2025-08-29 23:05:17 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-29 23:05:30 - pico-train - INFO - Step 20775 -- 🔄 Training Metrics
2025-08-29 23:05:30 - pico-train - INFO - ├── Loss: 6.4028
2025-08-29 23:05:30 - pico-train - INFO - ├── Learning Rate: 3.28e-05
2025-08-29 23:05:30 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-29 23:05:43 - pico-train - INFO - Step 20800 -- 🔄 Training Metrics
2025-08-29 23:05:43 - pico-train - INFO - ├── Loss: 6.3732
2025-08-29 23:05:43 - pico-train - INFO - ├── Learning Rate: 3.27e-05
2025-08-29 23:05:43 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-29 23:05:55 - pico-train - INFO - Step 20825 -- 🔄 Training Metrics
2025-08-29 23:05:55 - pico-train - INFO - ├── Loss: 6.3486
2025-08-29 23:05:55 - pico-train - INFO - ├── Learning Rate: 3.27e-05
2025-08-29 23:05:55 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-29 23:06:08 - pico-train - INFO - Step 20850 -- 🔄 Training Metrics
2025-08-29 23:06:08 - pico-train - INFO - ├── Loss: 6.3611
2025-08-29 23:06:08 - pico-train - INFO - ├── Learning Rate: 3.26e-05
2025-08-29 23:06:08 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-29 23:06:21 - pico-train - INFO - Step 20875 -- 🔄 Training Metrics
2025-08-29 23:06:21 - pico-train - INFO - ├── Loss: 6.3278
2025-08-29 23:06:21 - pico-train - INFO - ├── Learning Rate: 3.26e-05
2025-08-29 23:06:21 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-29 23:06:33 - pico-train - INFO - Step 20900 -- 🔄 Training Metrics
2025-08-29 23:06:33 - pico-train - INFO - ├── Loss: 6.3287
2025-08-29 23:06:33 - pico-train - INFO - ├── Learning Rate: 3.25e-05
2025-08-29 23:06:33 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-29 23:06:46 - pico-train - INFO - Step 20925 -- 🔄 Training Metrics
2025-08-29 23:06:46 - pico-train - INFO - ├── Loss: 6.3276
2025-08-29 23:06:46 - pico-train - INFO - ├── Learning Rate: 3.24e-05
2025-08-29 23:06:46 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-29 23:06:58 - pico-train - INFO - Step 20950 -- 🔄 Training Metrics
2025-08-29 23:06:58 - pico-train - INFO - ├── Loss: 6.4450
2025-08-29 23:06:58 - pico-train - INFO - ├── Learning Rate: 3.24e-05
2025-08-29 23:06:58 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-29 23:07:11 - pico-train - INFO - Step 20975 -- 🔄 Training Metrics
2025-08-29 23:07:11 - pico-train - INFO - ├── Loss: 6.4429
2025-08-29 23:07:11 - pico-train - INFO - ├── Learning Rate: 3.23e-05
2025-08-29 23:07:11 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-29 23:07:23 - pico-train - INFO - Step 21000 -- 💾 Saving Checkpoint
2025-08-29 23:09:25 - pico-train - INFO - Step 21000 -- 📊 Evaluation Results
2025-08-29 23:09:25 - pico-train - INFO - └── paloma: 3.816115022517074e+24
2025-08-29 23:09:28 - pico-train - INFO - Step 21000 -- 🔄 Training Metrics
2025-08-29 23:09:28 - pico-train - INFO - ├── Loss: 6.2970
2025-08-29 23:09:28 - pico-train - INFO - ├── Learning Rate: 3.23e-05
2025-08-29 23:09:28 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-29 23:09:28 - pico-train - INFO - Step 21000 -- 📈 Saving Learning Dynamics
2025-08-29 23:09:43 - pico-train - INFO - Step 21025 -- 🔄 Training Metrics
2025-08-29 23:09:43 - pico-train - INFO - ├── Loss: 6.3206
2025-08-29 23:09:43 - pico-train - INFO - ├── Learning Rate: 3.22e-05
2025-08-29 23:09:43 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-29 23:09:56 - pico-train - INFO - Step 21050 -- 🔄 Training Metrics
2025-08-29 23:09:56 - pico-train - INFO - ├── Loss: 6.3337
2025-08-29 23:09:56 - pico-train - INFO - ├── Learning Rate: 3.21e-05
2025-08-29 23:09:56 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-29 23:10:08 - pico-train - INFO - Step 21075 -- 🔄 Training Metrics
2025-08-29 23:10:08 - pico-train - INFO - ├── Loss: 6.3274
2025-08-29 23:10:08 - pico-train - INFO - ├── Learning Rate: 3.21e-05
2025-08-29 23:10:08 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-29 23:10:21 - pico-train - INFO - Step 21100 -- 🔄 Training Metrics
2025-08-29 23:10:21 - pico-train - INFO - ├── Loss: 6.4202
2025-08-29 23:10:21 - pico-train - INFO - ├── Learning Rate: 3.20e-05
2025-08-29 23:10:21 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-29 23:10:33 - pico-train - INFO - Step 21125 -- 🔄 Training Metrics
2025-08-29 23:10:33 - pico-train - INFO - ├── Loss: 6.3698
2025-08-29 23:10:33 - pico-train - INFO - ├── Learning Rate: 3.20e-05
2025-08-29 23:10:33 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-29 23:10:46 - pico-train - INFO - Step 21150 -- 🔄 Training Metrics
2025-08-29 23:10:46 - pico-train - INFO - ├── Loss: 6.2671
2025-08-29 23:10:46 - pico-train - INFO - ├── Learning Rate: 3.19e-05
2025-08-29 23:10:46 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-29 23:10:59 - pico-train - INFO - Step 21175 -- 🔄 Training Metrics
2025-08-29 23:10:59 - pico-train - INFO - ├── Loss: 6.4334
2025-08-29 23:10:59 - pico-train - INFO - ├── Learning Rate: 3.18e-05
2025-08-29 23:10:59 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-29 23:11:11 - pico-train - INFO - Step 21200 -- 🔄 Training Metrics
2025-08-29 23:11:11 - pico-train - INFO - ├── Loss: 6.4208
2025-08-29 23:11:11 - pico-train - INFO - ├── Learning Rate: 3.18e-05
2025-08-29 23:11:11 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-29 23:11:24 - pico-train - INFO - Step 21225 -- 🔄 Training Metrics
2025-08-29 23:11:24 - pico-train - INFO - ├── Loss: 6.3380
2025-08-29 23:11:24 - pico-train - INFO - ├── Learning Rate: 3.17e-05
2025-08-29 23:11:24 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-29 23:11:37 - pico-train - INFO - Step 21250 -- 🔄 Training Metrics
2025-08-29 23:11:37 - pico-train - INFO - ├── Loss: 6.3026
2025-08-29 23:11:37 - pico-train - INFO - ├── Learning Rate: 3.17e-05
2025-08-29 23:11:37 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-29 23:11:49 - pico-train - INFO - Step 21275 -- 🔄 Training Metrics
2025-08-29 23:11:49 - pico-train - INFO - ├── Loss: 6.3123
2025-08-29 23:11:49 - pico-train - INFO - ├── Learning Rate: 3.16e-05
2025-08-29 23:11:49 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-29 23:12:02 - pico-train - INFO - Step 21300 -- 🔄 Training Metrics
2025-08-29 23:12:02 - pico-train - INFO - ├── Loss: 6.2566
2025-08-29 23:12:02 - pico-train - INFO - ├── Learning Rate: 3.15e-05
2025-08-29 23:12:02 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-29 23:12:15 - pico-train - INFO - Step 21325 -- 🔄 Training Metrics
2025-08-29 23:12:15 - pico-train - INFO - ├── Loss: 6.2697
2025-08-29 23:12:15 - pico-train - INFO - ├── Learning Rate: 3.15e-05
2025-08-29 23:12:15 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-29 23:12:27 - pico-train - INFO - Step 21350 -- 🔄 Training Metrics
2025-08-29 23:12:27 - pico-train - INFO - ├── Loss: 6.2998
2025-08-29 23:12:27 - pico-train - INFO - ├── Learning Rate: 3.14e-05
2025-08-29 23:12:27 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-29 23:12:40 - pico-train - INFO - Step 21375 -- 🔄 Training Metrics
2025-08-29 23:12:40 - pico-train - INFO - ├── Loss: 6.3903
2025-08-29 23:12:40 - pico-train - INFO - ├── Learning Rate: 3.14e-05
2025-08-29 23:12:40 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-29 23:12:52 - pico-train - INFO - Step 21400 -- 🔄 Training Metrics
2025-08-29 23:12:52 - pico-train - INFO - ├── Loss: 6.2831
2025-08-29 23:12:52 - pico-train - INFO - ├── Learning Rate: 3.13e-05
2025-08-29 23:12:52 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-29 23:13:05 - pico-train - INFO - Step 21425 -- 🔄 Training Metrics
2025-08-29 23:13:05 - pico-train - INFO - ├── Loss: 6.3768
2025-08-29 23:13:05 - pico-train - INFO - ├── Learning Rate: 3.13e-05
2025-08-29 23:13:05 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-29 23:13:18 - pico-train - INFO - Step 21450 -- 🔄 Training Metrics
2025-08-29 23:13:18 - pico-train - INFO - ├── Loss: 6.3917
2025-08-29 23:13:18 - pico-train - INFO - ├── Learning Rate: 3.12e-05
2025-08-29 23:13:18 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-29 23:13:30 - pico-train - INFO - Step 21475 -- 🔄 Training Metrics
2025-08-29 23:13:30 - pico-train - INFO - ├── Loss: 6.3183
2025-08-29 23:13:30 - pico-train - INFO - ├── Learning Rate: 3.11e-05
2025-08-29 23:13:30 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-29 23:13:43 - pico-train - INFO - Step 21500 -- 💾 Saving Checkpoint
2025-08-29 23:15:44 - pico-train - INFO - Step 21500 -- 📊 Evaluation Results
2025-08-29 23:15:44 - pico-train - INFO - └── paloma: 6.18596463935147e+24
2025-08-29 23:15:47 - pico-train - INFO - Step 21500 -- 🔄 Training Metrics
2025-08-29 23:15:47 - pico-train - INFO - ├── Loss: 6.3327
2025-08-29 23:15:47 - pico-train - INFO - ├── Learning Rate: 3.11e-05
2025-08-29 23:15:47 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-29 23:15:47 - pico-train - INFO - Step 21500 -- 📈 Saving Learning Dynamics
2025-08-29 23:16:02 - pico-train - INFO - Step 21525 -- 🔄 Training Metrics
2025-08-29 23:16:02 - pico-train - INFO - ├── Loss: 6.3111
2025-08-29 23:16:02 - pico-train - INFO - ├── Learning Rate: 3.10e-05
2025-08-29 23:16:02 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-29 23:16:14 - pico-train - INFO - Step 21550 -- 🔄 Training Metrics
2025-08-29 23:16:14 - pico-train - INFO - ├── Loss: 6.2823
2025-08-29 23:16:14 - pico-train - INFO - ├── Learning Rate: 3.10e-05
2025-08-29 23:16:14 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-29 23:16:27 - pico-train - INFO - Step 21575 -- 🔄 Training Metrics
2025-08-29 23:16:27 - pico-train - INFO - ├── Loss: 6.3073
2025-08-29 23:16:27 - pico-train - INFO - ├── Learning Rate: 3.09e-05
2025-08-29 23:16:27 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-29 23:16:40 - pico-train - INFO - Step 21600 -- 🔄 Training Metrics
2025-08-29 23:16:40 - pico-train - INFO - ├── Loss: 6.3168
2025-08-29 23:16:40 - pico-train - INFO - ├── Learning Rate: 3.08e-05
2025-08-29 23:16:40 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-29 23:16:52 - pico-train - INFO - Step 21625 -- 🔄 Training Metrics
2025-08-29 23:16:52 - pico-train - INFO - ├── Loss: 6.3106
2025-08-29 23:16:52 - pico-train - INFO - ├── Learning Rate: 3.08e-05
2025-08-29 23:16:52 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-29 23:17:05 - pico-train - INFO - Step 21650 -- 🔄 Training Metrics
2025-08-29 23:17:05 - pico-train - INFO - ├── Loss: 6.3128
2025-08-29 23:17:05 - pico-train - INFO - ├── Learning Rate: 3.07e-05
2025-08-29 23:17:05 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-29 23:17:18 - pico-train - INFO - Step 21675 -- 🔄 Training Metrics
2025-08-29 23:17:18 - pico-train - INFO - ├── Loss: 6.2762
2025-08-29 23:17:18 - pico-train - INFO - ├── Learning Rate: 3.07e-05
2025-08-29 23:17:18 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-29 23:17:30 - pico-train - INFO - Step 21700 -- 🔄 Training Metrics
2025-08-29 23:17:30 - pico-train - INFO - ├── Loss: 6.3577
2025-08-29 23:17:30 - pico-train - INFO - ├── Learning Rate: 3.06e-05
2025-08-29 23:17:30 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-29 23:17:43 - pico-train - INFO - Step 21725 -- 🔄 Training Metrics
2025-08-29 23:17:43 - pico-train - INFO - ├── Loss: 6.3495
2025-08-29 23:17:43 - pico-train - INFO - ├── Learning Rate: 3.05e-05
2025-08-29 23:17:43 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-29 23:17:56 - pico-train - INFO - Step 21750 -- 🔄 Training Metrics
2025-08-29 23:17:56 - pico-train - INFO - ├── Loss: 6.3331
2025-08-29 23:17:56 - pico-train - INFO - ├── Learning Rate: 3.05e-05
2025-08-29 23:17:56 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-29 23:18:08 - pico-train - INFO - Step 21775 -- 🔄 Training Metrics
2025-08-29 23:18:08 - pico-train - INFO - ├── Loss: 6.3146
2025-08-29 23:18:08 - pico-train - INFO - ├── Learning Rate: 3.04e-05
2025-08-29 23:18:08 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-29 23:18:21 - pico-train - INFO - Step 21800 -- 🔄 Training Metrics
2025-08-29 23:18:21 - pico-train - INFO - ├── Loss: 6.3567
2025-08-29 23:18:21 - pico-train - INFO - ├── Learning Rate: 3.04e-05
2025-08-29 23:18:21 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-29 23:18:33 - pico-train - INFO - Step 21825 -- 🔄 Training Metrics
2025-08-29 23:18:33 - pico-train - INFO - ├── Loss: 6.3185
2025-08-29 23:18:33 - pico-train - INFO - ├── Learning Rate: 3.03e-05
2025-08-29 23:18:33 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-29 23:18:46 - pico-train - INFO - Step 21850 -- 🔄 Training Metrics
2025-08-29 23:18:46 - pico-train - INFO - ├── Loss: 6.3087
2025-08-29 23:18:46 - pico-train - INFO - ├── Learning Rate: 3.02e-05
2025-08-29 23:18:46 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-29 23:18:59 - pico-train - INFO - Step 21875 -- 🔄 Training Metrics
2025-08-29 23:18:59 - pico-train - INFO - ├── Loss: 6.3817
2025-08-29 23:18:59 - pico-train - INFO - ├── Learning Rate: 3.02e-05
2025-08-29 23:18:59 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-29 23:19:12 - pico-train - INFO - Step 21900 -- 🔄 Training Metrics
2025-08-29 23:19:12 - pico-train - INFO - ├── Loss: 6.3398
2025-08-29 23:19:12 - pico-train - INFO - ├── Learning Rate: 3.01e-05
2025-08-29 23:19:12 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-29 23:19:25 - pico-train - INFO - Step 21925 -- 🔄 Training Metrics
2025-08-29 23:19:25 - pico-train - INFO - ├── Loss: 6.4012
2025-08-29 23:19:25 - pico-train - INFO - ├── Learning Rate: 3.01e-05
2025-08-29 23:19:25 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-29 23:19:37 - pico-train - INFO - Step 21950 -- 🔄 Training Metrics
2025-08-29 23:19:37 - pico-train - INFO - ├── Loss: 6.3352
2025-08-29 23:19:37 - pico-train - INFO - ├── Learning Rate: 3.00e-05
2025-08-29 23:19:37 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-29 23:19:50 - pico-train - INFO - Step 21975 -- 🔄 Training Metrics
2025-08-29 23:19:50 - pico-train - INFO - ├── Loss: 6.3857
2025-08-29 23:19:50 - pico-train - INFO - ├── Learning Rate: 2.99e-05
2025-08-29 23:19:50 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-29 23:20:02 - pico-train - INFO - Step 22000 -- 💾 Saving Checkpoint
2025-08-29 23:22:06 - pico-train - INFO - Step 22000 -- 📊 Evaluation Results
2025-08-29 23:22:06 - pico-train - INFO - └── paloma: 7.840233924864941e+24
2025-08-29 23:22:08 - pico-train - INFO - Step 22000 -- 🔄 Training Metrics
2025-08-29 23:22:08 - pico-train - INFO - ├── Loss: 6.3421
2025-08-29 23:22:08 - pico-train - INFO - ├── Learning Rate: 2.99e-05
2025-08-29 23:22:08 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-29 23:22:08 - pico-train - INFO - Step 22000 -- 📈 Saving Learning Dynamics
2025-08-29 23:22:24 - pico-train - INFO - Step 22025 -- 🔄 Training Metrics
2025-08-29 23:22:24 - pico-train - INFO - ├── Loss: 6.4107
2025-08-29 23:22:24 - pico-train - INFO - ├── Learning Rate: 2.98e-05
2025-08-29 23:22:24 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-29 23:22:36 - pico-train - INFO - Step 22050 -- 🔄 Training Metrics
2025-08-29 23:22:36 - pico-train - INFO - ├── Loss: 6.3296
2025-08-29 23:22:36 - pico-train - INFO - ├── Learning Rate: 2.98e-05
2025-08-29 23:22:36 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-29 23:22:49 - pico-train - INFO - Step 22075 -- 🔄 Training Metrics
2025-08-29 23:22:49 - pico-train - INFO - ├── Loss: 6.2576
2025-08-29 23:22:49 - pico-train - INFO - ├── Learning Rate: 2.97e-05
2025-08-29 23:22:49 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-29 23:23:01 - pico-train - INFO - Step 22100 -- 🔄 Training Metrics
2025-08-29 23:23:01 - pico-train - INFO - ├── Loss: 6.2705
2025-08-29 23:23:01 - pico-train - INFO - ├── Learning Rate: 2.96e-05
2025-08-29 23:23:01 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-29 23:23:14 - pico-train - INFO - Step 22125 -- 🔄 Training Metrics
2025-08-29 23:23:14 - pico-train - INFO - ├── Loss: 6.2784
2025-08-29 23:23:14 - pico-train - INFO - ├── Learning Rate: 2.96e-05
2025-08-29 23:23:14 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-29 23:23:27 - pico-train - INFO - Step 22150 -- 🔄 Training Metrics
2025-08-29 23:23:27 - pico-train - INFO - ├── Loss: 6.3673
2025-08-29 23:23:27 - pico-train - INFO - ├── Learning Rate: 2.95e-05
2025-08-29 23:23:27 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-29 23:23:39 - pico-train - INFO - Step 22175 -- 🔄 Training Metrics
2025-08-29 23:23:39 - pico-train - INFO - ├── Loss: 6.3914
2025-08-29 23:23:39 - pico-train - INFO - ├── Learning Rate: 2.95e-05
2025-08-29 23:23:39 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-29 23:23:52 - pico-train - INFO - Step 22200 -- 🔄 Training Metrics
2025-08-29 23:23:52 - pico-train - INFO - ├── Loss: 6.3081
2025-08-29 23:23:52 - pico-train - INFO - ├── Learning Rate: 2.94e-05
2025-08-29 23:23:52 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-29 23:24:05 - pico-train - INFO - Step 22225 -- 🔄 Training Metrics
2025-08-29 23:24:05 - pico-train - INFO - ├── Loss: 6.4045
2025-08-29 23:24:05 - pico-train - INFO - ├── Learning Rate: 2.93e-05
2025-08-29 23:24:05 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-29 23:24:17 - pico-train - INFO - Step 22250 -- 🔄 Training Metrics
2025-08-29 23:24:17 - pico-train - INFO - ├── Loss: 6.3830
2025-08-29 23:24:17 - pico-train - INFO - ├── Learning Rate: 2.93e-05
2025-08-29 23:24:17 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-29 23:24:30 - pico-train - INFO - Step 22275 -- 🔄 Training Metrics
2025-08-29 23:24:30 - pico-train - INFO - ├── Loss: 6.2955
2025-08-29 23:24:30 - pico-train - INFO - ├── Learning Rate: 2.92e-05
2025-08-29 23:24:30 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-29 23:24:43 - pico-train - INFO - Step 22300 -- 🔄 Training Metrics
2025-08-29 23:24:43 - pico-train - INFO - ├── Loss: 6.3121
2025-08-29 23:24:43 - pico-train - INFO - ├── Learning Rate: 2.92e-05
2025-08-29 23:24:43 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-29 23:24:56 - pico-train - INFO - Step 22325 -- 🔄 Training Metrics
2025-08-29 23:24:56 - pico-train - INFO - ├── Loss: 6.3725
2025-08-29 23:24:56 - pico-train - INFO - ├── Learning Rate: 2.91e-05
2025-08-29 23:24:56 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-29 23:25:08 - pico-train - INFO - Step 22350 -- 🔄 Training Metrics
2025-08-29 23:25:08 - pico-train - INFO - ├── Loss: 6.3311
2025-08-29 23:25:08 - pico-train - INFO - ├── Learning Rate: 2.90e-05
2025-08-29 23:25:08 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-29 23:25:21 - pico-train - INFO - Step 22375 -- 🔄 Training Metrics
2025-08-29 23:25:21 - pico-train - INFO - ├── Loss: 6.2346
2025-08-29 23:25:21 - pico-train - INFO - ├── Learning Rate: 2.90e-05
2025-08-29 23:25:21 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-29 23:25:33 - pico-train - INFO - Step 22400 -- 🔄 Training Metrics
2025-08-29 23:25:33 - pico-train - INFO - ├── Loss: 6.3869
2025-08-29 23:25:33 - pico-train - INFO - ├── Learning Rate: 2.89e-05
2025-08-29 23:25:33 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-29 23:25:46 - pico-train - INFO - Step 22425 -- 🔄 Training Metrics
2025-08-29 23:25:46 - pico-train - INFO - ├── Loss: 6.3370
2025-08-29 23:25:46 - pico-train - INFO - ├── Learning Rate: 2.89e-05
2025-08-29 23:25:46 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-29 23:25:59 - pico-train - INFO - Step 22450 -- 🔄 Training Metrics
2025-08-29 23:25:59 - pico-train - INFO - ├── Loss: 6.3366
2025-08-29 23:25:59 - pico-train - INFO - ├── Learning Rate: 2.88e-05
2025-08-29 23:25:59 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-29 23:26:11 - pico-train - INFO - Step 22475 -- 🔄 Training Metrics
2025-08-29 23:26:11 - pico-train - INFO - ├── Loss: 6.3641
2025-08-29 23:26:11 - pico-train - INFO - ├── Learning Rate: 2.87e-05
2025-08-29 23:26:11 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-29 23:26:23 - pico-train - INFO - Step 22500 -- 💾 Saving Checkpoint
2025-08-29 23:28:22 - pico-train - INFO - Step 22500 -- 📊 Evaluation Results
2025-08-29 23:28:22 - pico-train - INFO - └── paloma: 1.0171611158112828e+25
2025-08-29 23:28:23 - pico-train - INFO - Step 22500 -- 🔄 Training Metrics
2025-08-29 23:28:23 - pico-train - INFO - ├── Loss: 6.2880
2025-08-29 23:28:23 - pico-train - INFO - ├── Learning Rate: 2.87e-05
2025-08-29 23:28:23 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-29 23:28:23 - pico-train - INFO - Step 22500 -- 📈 Saving Learning Dynamics
2025-08-29 23:28:39 - pico-train - INFO - Step 22525 -- 🔄 Training Metrics
2025-08-29 23:28:39 - pico-train - INFO - ├── Loss: 6.2955
2025-08-29 23:28:39 - pico-train - INFO - ├── Learning Rate: 2.86e-05
2025-08-29 23:28:39 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-29 23:28:51 - pico-train - INFO - Step 22550 -- 🔄 Training Metrics
2025-08-29 23:28:51 - pico-train - INFO - ├── Loss: 6.3124 2025-08-29 23:28:51 - pico-train - INFO - ├── Learning Rate: 2.85e-05 2025-08-29 23:28:51 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-29 23:29:04 - pico-train - INFO - Step 22575 -- 🔄 Training Metrics 2025-08-29 23:29:04 - pico-train - INFO - ├── Loss: 6.3214 2025-08-29 23:29:04 - pico-train - INFO - ├── Learning Rate: 2.85e-05 2025-08-29 23:29:04 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-29 23:29:17 - pico-train - INFO - Step 22600 -- 🔄 Training Metrics 2025-08-29 23:29:17 - pico-train - INFO - ├── Loss: 6.2929 2025-08-29 23:29:17 - pico-train - INFO - ├── Learning Rate: 2.84e-05 2025-08-29 23:29:17 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-29 23:29:29 - pico-train - INFO - Step 22625 -- 🔄 Training Metrics 2025-08-29 23:29:29 - pico-train - INFO - ├── Loss: 6.3454 2025-08-29 23:29:29 - pico-train - INFO - ├── Learning Rate: 2.84e-05 2025-08-29 23:29:29 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-29 23:29:42 - pico-train - INFO - Step 22650 -- 🔄 Training Metrics 2025-08-29 23:29:42 - pico-train - INFO - ├── Loss: 6.2994 2025-08-29 23:29:42 - pico-train - INFO - ├── Learning Rate: 2.83e-05 2025-08-29 23:29:42 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-29 23:29:55 - pico-train - INFO - Step 22675 -- 🔄 Training Metrics 2025-08-29 23:29:55 - pico-train - INFO - ├── Loss: 6.3245 2025-08-29 23:29:55 - pico-train - INFO - ├── Learning Rate: 2.82e-05 2025-08-29 23:29:55 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-29 23:30:07 - pico-train - INFO - Step 22700 -- 🔄 Training Metrics 2025-08-29 23:30:07 - pico-train - INFO - ├── Loss: 6.1874 2025-08-29 23:30:07 - pico-train - INFO - ├── Learning Rate: 2.82e-05 2025-08-29 23:30:07 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-29 23:30:20 - pico-train - INFO - Step 22725 -- 🔄 Training Metrics 2025-08-29 23:30:20 - pico-train - INFO - ├── Loss: 6.2636 2025-08-29 23:30:20 - pico-train - INFO - ├── Learning Rate: 2.81e-05 
2025-08-29 23:30:20 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-29 23:30:32 - pico-train - INFO - Step 22750 -- 🔄 Training Metrics 2025-08-29 23:30:32 - pico-train - INFO - ├── Loss: 6.3870 2025-08-29 23:30:32 - pico-train - INFO - ├── Learning Rate: 2.81e-05 2025-08-29 23:30:32 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-29 23:30:45 - pico-train - INFO - Step 22775 -- 🔄 Training Metrics 2025-08-29 23:30:45 - pico-train - INFO - ├── Loss: 6.3157 2025-08-29 23:30:45 - pico-train - INFO - ├── Learning Rate: 2.80e-05 2025-08-29 23:30:45 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-29 23:30:57 - pico-train - INFO - Step 22800 -- 🔄 Training Metrics 2025-08-29 23:30:57 - pico-train - INFO - ├── Loss: 6.3617 2025-08-29 23:30:57 - pico-train - INFO - ├── Learning Rate: 2.79e-05 2025-08-29 23:30:57 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-29 23:31:10 - pico-train - INFO - Step 22825 -- 🔄 Training Metrics 2025-08-29 23:31:10 - pico-train - INFO - ├── Loss: 6.3006 2025-08-29 23:31:10 - pico-train - INFO - ├── Learning Rate: 2.79e-05 2025-08-29 23:31:10 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-29 23:31:23 - pico-train - INFO - Step 22850 -- 🔄 Training Metrics 2025-08-29 23:31:23 - pico-train - INFO - ├── Loss: 6.2552 2025-08-29 23:31:23 - pico-train - INFO - ├── Learning Rate: 2.78e-05 2025-08-29 23:31:23 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-29 23:31:35 - pico-train - INFO - Step 22875 -- 🔄 Training Metrics 2025-08-29 23:31:35 - pico-train - INFO - ├── Loss: 6.3537 2025-08-29 23:31:35 - pico-train - INFO - ├── Learning Rate: 2.78e-05 2025-08-29 23:31:35 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-29 23:31:48 - pico-train - INFO - Step 22900 -- 🔄 Training Metrics 2025-08-29 23:31:48 - pico-train - INFO - ├── Loss: 6.4096 2025-08-29 23:31:48 - pico-train - INFO - ├── Learning Rate: 2.77e-05 2025-08-29 23:31:48 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-29 23:32:01 - pico-train - INFO - Step 22925 -- 🔄 Training 
Metrics 2025-08-29 23:32:01 - pico-train - INFO - ├── Loss: 6.2037 2025-08-29 23:32:01 - pico-train - INFO - ├── Learning Rate: 2.76e-05 2025-08-29 23:32:01 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-29 23:32:13 - pico-train - INFO - Step 22950 -- 🔄 Training Metrics 2025-08-29 23:32:13 - pico-train - INFO - ├── Loss: 6.3007 2025-08-29 23:32:13 - pico-train - INFO - ├── Learning Rate: 2.76e-05 2025-08-29 23:32:13 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-29 23:32:26 - pico-train - INFO - Step 22975 -- 🔄 Training Metrics 2025-08-29 23:32:26 - pico-train - INFO - ├── Loss: 6.2575 2025-08-29 23:32:26 - pico-train - INFO - ├── Learning Rate: 2.75e-05 2025-08-29 23:32:26 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-29 23:32:38 - pico-train - INFO - Step 23000 -- 💾 Saving Checkpoint 2025-08-29 23:34:52 - pico-train - INFO - Step 23000 -- 📊 Evaluation Results 2025-08-29 23:34:52 - pico-train - INFO - └── paloma: 1.3786488388612157e+25 2025-08-29 23:34:53 - pico-train - INFO - Step 23000 -- 🔄 Training Metrics 2025-08-29 23:34:53 - pico-train - INFO - ├── Loss: 6.4702 2025-08-29 23:34:53 - pico-train - INFO - ├── Learning Rate: 2.75e-05 2025-08-29 23:34:53 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-29 23:34:53 - pico-train - INFO - Step 23000 -- 📈 Saving Learning Dynamics 2025-08-29 23:35:08 - pico-train - INFO - Step 23025 -- 🔄 Training Metrics 2025-08-29 23:35:08 - pico-train - INFO - ├── Loss: 6.3198 2025-08-29 23:35:08 - pico-train - INFO - ├── Learning Rate: 2.74e-05 2025-08-29 23:35:08 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-29 23:35:21 - pico-train - INFO - Step 23050 -- 🔄 Training Metrics 2025-08-29 23:35:21 - pico-train - INFO - ├── Loss: 6.3015 2025-08-29 23:35:21 - pico-train - INFO - ├── Learning Rate: 2.73e-05 2025-08-29 23:35:21 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-29 23:35:33 - pico-train - INFO - Step 23075 -- 🔄 Training Metrics 2025-08-29 23:35:33 - pico-train - INFO - ├── Loss: 6.3222 2025-08-29 
23:35:33 - pico-train - INFO - ├── Learning Rate: 2.73e-05 2025-08-29 23:35:33 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-29 23:35:46 - pico-train - INFO - Step 23100 -- 🔄 Training Metrics 2025-08-29 23:35:46 - pico-train - INFO - ├── Loss: 6.2917 2025-08-29 23:35:46 - pico-train - INFO - ├── Learning Rate: 2.72e-05 2025-08-29 23:35:46 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-29 23:35:59 - pico-train - INFO - Step 23125 -- 🔄 Training Metrics 2025-08-29 23:35:59 - pico-train - INFO - ├── Loss: 6.3574 2025-08-29 23:35:59 - pico-train - INFO - ├── Learning Rate: 2.71e-05 2025-08-29 23:35:59 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-29 23:36:11 - pico-train - INFO - Step 23150 -- 🔄 Training Metrics 2025-08-29 23:36:11 - pico-train - INFO - ├── Loss: 6.2434 2025-08-29 23:36:11 - pico-train - INFO - ├── Learning Rate: 2.71e-05 2025-08-29 23:36:11 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-29 23:36:24 - pico-train - INFO - Step 23175 -- 🔄 Training Metrics 2025-08-29 23:36:24 - pico-train - INFO - ├── Loss: 6.2580 2025-08-29 23:36:24 - pico-train - INFO - ├── Learning Rate: 2.70e-05 2025-08-29 23:36:24 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-29 23:36:36 - pico-train - INFO - Step 23200 -- 🔄 Training Metrics 2025-08-29 23:36:36 - pico-train - INFO - ├── Loss: 6.3214 2025-08-29 23:36:36 - pico-train - INFO - ├── Learning Rate: 2.70e-05 2025-08-29 23:36:36 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-29 23:36:49 - pico-train - INFO - Step 23225 -- 🔄 Training Metrics 2025-08-29 23:36:49 - pico-train - INFO - ├── Loss: 6.2731 2025-08-29 23:36:49 - pico-train - INFO - ├── Learning Rate: 2.69e-05 2025-08-29 23:36:49 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-29 23:37:02 - pico-train - INFO - Step 23250 -- 🔄 Training Metrics 2025-08-29 23:37:02 - pico-train - INFO - ├── Loss: 6.3255 2025-08-29 23:37:02 - pico-train - INFO - ├── Learning Rate: 2.68e-05 2025-08-29 23:37:02 - pico-train - INFO - └── Inf/NaN count: 0 
2025-08-29 23:37:14 - pico-train - INFO - Step 23275 -- 🔄 Training Metrics 2025-08-29 23:37:14 - pico-train - INFO - ├── Loss: 6.3348 2025-08-29 23:37:14 - pico-train - INFO - ├── Learning Rate: 2.68e-05 2025-08-29 23:37:14 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-29 23:37:27 - pico-train - INFO - Step 23300 -- 🔄 Training Metrics 2025-08-29 23:37:27 - pico-train - INFO - ├── Loss: 6.3476 2025-08-29 23:37:27 - pico-train - INFO - ├── Learning Rate: 2.67e-05 2025-08-29 23:37:27 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-29 23:37:39 - pico-train - INFO - Step 23325 -- 🔄 Training Metrics 2025-08-29 23:37:39 - pico-train - INFO - ├── Loss: 6.3392 2025-08-29 23:37:39 - pico-train - INFO - ├── Learning Rate: 2.67e-05 2025-08-29 23:37:39 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-29 23:37:52 - pico-train - INFO - Step 23350 -- 🔄 Training Metrics 2025-08-29 23:37:52 - pico-train - INFO - ├── Loss: 6.3051 2025-08-29 23:37:52 - pico-train - INFO - ├── Learning Rate: 2.66e-05 2025-08-29 23:37:52 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-29 23:38:05 - pico-train - INFO - Step 23375 -- 🔄 Training Metrics 2025-08-29 23:38:05 - pico-train - INFO - ├── Loss: 6.2683 2025-08-29 23:38:05 - pico-train - INFO - ├── Learning Rate: 2.65e-05 2025-08-29 23:38:05 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-29 23:38:17 - pico-train - INFO - Step 23400 -- 🔄 Training Metrics 2025-08-29 23:38:17 - pico-train - INFO - ├── Loss: 6.2929 2025-08-29 23:38:17 - pico-train - INFO - ├── Learning Rate: 2.65e-05 2025-08-29 23:38:17 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-29 23:38:30 - pico-train - INFO - Step 23425 -- 🔄 Training Metrics 2025-08-29 23:38:30 - pico-train - INFO - ├── Loss: 6.3546 2025-08-29 23:38:30 - pico-train - INFO - ├── Learning Rate: 2.64e-05 2025-08-29 23:38:30 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-29 23:38:42 - pico-train - INFO - Step 23450 -- 🔄 Training Metrics 2025-08-29 23:38:42 - pico-train - INFO - ├── Loss: 
6.3572 2025-08-29 23:38:42 - pico-train - INFO - ├── Learning Rate: 2.63e-05 2025-08-29 23:38:42 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-29 23:38:55 - pico-train - INFO - Step 23475 -- 🔄 Training Metrics 2025-08-29 23:38:55 - pico-train - INFO - ├── Loss: 6.2350 2025-08-29 23:38:55 - pico-train - INFO - ├── Learning Rate: 2.63e-05 2025-08-29 23:38:55 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-29 23:39:07 - pico-train - INFO - Step 23500 -- 💾 Saving Checkpoint 2025-08-29 23:41:03 - pico-train - INFO - Step 23500 -- 📊 Evaluation Results 2025-08-29 23:41:03 - pico-train - INFO - └── paloma: 1.5734245831645979e+25 2025-08-29 23:41:04 - pico-train - INFO - Step 23500 -- 🔄 Training Metrics 2025-08-29 23:41:04 - pico-train - INFO - ├── Loss: 6.3544 2025-08-29 23:41:04 - pico-train - INFO - ├── Learning Rate: 2.62e-05 2025-08-29 23:41:04 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-29 23:41:04 - pico-train - INFO - Step 23500 -- 📈 Saving Learning Dynamics 2025-08-29 23:41:19 - pico-train - INFO - Step 23525 -- 🔄 Training Metrics 2025-08-29 23:41:19 - pico-train - INFO - ├── Loss: 6.2607 2025-08-29 23:41:19 - pico-train - INFO - ├── Learning Rate: 2.62e-05 2025-08-29 23:41:19 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-29 23:41:32 - pico-train - INFO - Step 23550 -- 🔄 Training Metrics 2025-08-29 23:41:32 - pico-train - INFO - ├── Loss: 6.2912 2025-08-29 23:41:32 - pico-train - INFO - ├── Learning Rate: 2.61e-05 2025-08-29 23:41:32 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-29 23:41:45 - pico-train - INFO - Step 23575 -- 🔄 Training Metrics 2025-08-29 23:41:45 - pico-train - INFO - ├── Loss: 6.2348 2025-08-29 23:41:45 - pico-train - INFO - ├── Learning Rate: 2.60e-05 2025-08-29 23:41:45 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-29 23:41:57 - pico-train - INFO - Step 23600 -- 🔄 Training Metrics 2025-08-29 23:41:57 - pico-train - INFO - ├── Loss: 6.2372 2025-08-29 23:41:57 - pico-train - INFO - ├── Learning Rate: 2.60e-05 
2025-08-29 23:41:57 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-29 23:42:10 - pico-train - INFO - Step 23625 -- 🔄 Training Metrics 2025-08-29 23:42:10 - pico-train - INFO - ├── Loss: 6.3467 2025-08-29 23:42:10 - pico-train - INFO - ├── Learning Rate: 2.59e-05 2025-08-29 23:42:10 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-29 23:42:22 - pico-train - INFO - Step 23650 -- 🔄 Training Metrics 2025-08-29 23:42:22 - pico-train - INFO - ├── Loss: 6.2611 2025-08-29 23:42:22 - pico-train - INFO - ├── Learning Rate: 2.59e-05 2025-08-29 23:42:22 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-29 23:42:35 - pico-train - INFO - Step 23675 -- 🔄 Training Metrics 2025-08-29 23:42:35 - pico-train - INFO - ├── Loss: 6.2587 2025-08-29 23:42:35 - pico-train - INFO - ├── Learning Rate: 2.58e-05 2025-08-29 23:42:35 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-29 23:42:47 - pico-train - INFO - Step 23700 -- 🔄 Training Metrics 2025-08-29 23:42:47 - pico-train - INFO - ├── Loss: 6.3048 2025-08-29 23:42:47 - pico-train - INFO - ├── Learning Rate: 2.57e-05 2025-08-29 23:42:47 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-29 23:43:00 - pico-train - INFO - Step 23725 -- 🔄 Training Metrics 2025-08-29 23:43:00 - pico-train - INFO - ├── Loss: 6.2627 2025-08-29 23:43:00 - pico-train - INFO - ├── Learning Rate: 2.57e-05 2025-08-29 23:43:00 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-29 23:43:13 - pico-train - INFO - Step 23750 -- 🔄 Training Metrics 2025-08-29 23:43:13 - pico-train - INFO - ├── Loss: 6.2880 2025-08-29 23:43:13 - pico-train - INFO - ├── Learning Rate: 2.56e-05 2025-08-29 23:43:13 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-29 23:43:25 - pico-train - INFO - Step 23775 -- 🔄 Training Metrics 2025-08-29 23:43:25 - pico-train - INFO - ├── Loss: 6.3205 2025-08-29 23:43:25 - pico-train - INFO - ├── Learning Rate: 2.56e-05 2025-08-29 23:43:25 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-29 23:43:38 - pico-train - INFO - Step 23800 -- 🔄 Training 
Metrics 2025-08-29 23:43:38 - pico-train - INFO - ├── Loss: 6.2730 2025-08-29 23:43:38 - pico-train - INFO - ├── Learning Rate: 2.55e-05 2025-08-29 23:43:38 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-29 23:43:51 - pico-train - INFO - Step 23825 -- 🔄 Training Metrics 2025-08-29 23:43:51 - pico-train - INFO - ├── Loss: 6.2649 2025-08-29 23:43:51 - pico-train - INFO - ├── Learning Rate: 2.54e-05 2025-08-29 23:43:51 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-29 23:44:03 - pico-train - INFO - Step 23850 -- 🔄 Training Metrics 2025-08-29 23:44:03 - pico-train - INFO - ├── Loss: 6.2840 2025-08-29 23:44:03 - pico-train - INFO - ├── Learning Rate: 2.54e-05 2025-08-29 23:44:03 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-29 23:44:16 - pico-train - INFO - Step 23875 -- 🔄 Training Metrics 2025-08-29 23:44:16 - pico-train - INFO - ├── Loss: 6.3253 2025-08-29 23:44:16 - pico-train - INFO - ├── Learning Rate: 2.53e-05 2025-08-29 23:44:16 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-29 23:44:28 - pico-train - INFO - Step 23900 -- 🔄 Training Metrics 2025-08-29 23:44:28 - pico-train - INFO - ├── Loss: 6.3487 2025-08-29 23:44:28 - pico-train - INFO - ├── Learning Rate: 2.52e-05 2025-08-29 23:44:28 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-29 23:44:41 - pico-train - INFO - Step 23925 -- 🔄 Training Metrics 2025-08-29 23:44:41 - pico-train - INFO - ├── Loss: 6.2998 2025-08-29 23:44:41 - pico-train - INFO - ├── Learning Rate: 2.52e-05 2025-08-29 23:44:41 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-29 23:44:54 - pico-train - INFO - Step 23950 -- 🔄 Training Metrics 2025-08-29 23:44:54 - pico-train - INFO - ├── Loss: 6.2444 2025-08-29 23:44:54 - pico-train - INFO - ├── Learning Rate: 2.51e-05 2025-08-29 23:44:54 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-29 23:45:06 - pico-train - INFO - Step 23975 -- 🔄 Training Metrics 2025-08-29 23:45:06 - pico-train - INFO - ├── Loss: 6.2611 2025-08-29 23:45:06 - pico-train - INFO - ├── Learning Rate: 
2.51e-05 2025-08-29 23:45:06 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-29 23:45:18 - pico-train - INFO - Step 24000 -- 💾 Saving Checkpoint 2025-08-29 23:47:14 - pico-train - INFO - Step 24000 -- 📊 Evaluation Results 2025-08-29 23:47:14 - pico-train - INFO - └── paloma: 2.548011467855507e+25 2025-08-29 23:47:17 - pico-train - INFO - Step 24000 -- 🔄 Training Metrics 2025-08-29 23:47:17 - pico-train - INFO - ├── Loss: 6.1774 2025-08-29 23:47:17 - pico-train - INFO - ├── Learning Rate: 2.50e-05 2025-08-29 23:47:17 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-29 23:47:17 - pico-train - INFO - Step 24000 -- 📈 Saving Learning Dynamics 2025-08-29 23:47:32 - pico-train - INFO - Step 24025 -- 🔄 Training Metrics 2025-08-29 23:47:32 - pico-train - INFO - ├── Loss: 6.2658 2025-08-29 23:47:32 - pico-train - INFO - ├── Learning Rate: 2.49e-05 2025-08-29 23:47:32 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-29 23:47:44 - pico-train - INFO - Step 24050 -- 🔄 Training Metrics 2025-08-29 23:47:44 - pico-train - INFO - ├── Loss: 6.2641 2025-08-29 23:47:44 - pico-train - INFO - ├── Learning Rate: 2.49e-05 2025-08-29 23:47:44 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-29 23:47:57 - pico-train - INFO - Step 24075 -- 🔄 Training Metrics 2025-08-29 23:47:57 - pico-train - INFO - ├── Loss: 6.1837 2025-08-29 23:47:57 - pico-train - INFO - ├── Learning Rate: 2.48e-05 2025-08-29 23:47:57 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-29 23:48:10 - pico-train - INFO - Step 24100 -- 🔄 Training Metrics 2025-08-29 23:48:10 - pico-train - INFO - ├── Loss: 6.3345 2025-08-29 23:48:10 - pico-train - INFO - ├── Learning Rate: 2.48e-05 2025-08-29 23:48:10 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-29 23:48:23 - pico-train - INFO - Step 24125 -- 🔄 Training Metrics 2025-08-29 23:48:23 - pico-train - INFO - ├── Loss: 6.2665 2025-08-29 23:48:23 - pico-train - INFO - ├── Learning Rate: 2.47e-05 2025-08-29 23:48:23 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-29 
23:48:35 - pico-train - INFO - Step 24150 -- 🔄 Training Metrics 2025-08-29 23:48:35 - pico-train - INFO - ├── Loss: 6.2894 2025-08-29 23:48:35 - pico-train - INFO - ├── Learning Rate: 2.46e-05 2025-08-29 23:48:35 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-29 23:48:48 - pico-train - INFO - Step 24175 -- 🔄 Training Metrics 2025-08-29 23:48:48 - pico-train - INFO - ├── Loss: 6.2354 2025-08-29 23:48:48 - pico-train - INFO - ├── Learning Rate: 2.46e-05 2025-08-29 23:48:48 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-29 23:49:00 - pico-train - INFO - Step 24200 -- 🔄 Training Metrics 2025-08-29 23:49:00 - pico-train - INFO - ├── Loss: 6.2110 2025-08-29 23:49:00 - pico-train - INFO - ├── Learning Rate: 2.45e-05 2025-08-29 23:49:00 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-29 23:49:13 - pico-train - INFO - Step 24225 -- 🔄 Training Metrics 2025-08-29 23:49:13 - pico-train - INFO - ├── Loss: 6.2512 2025-08-29 23:49:13 - pico-train - INFO - ├── Learning Rate: 2.44e-05 2025-08-29 23:49:13 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-29 23:49:25 - pico-train - INFO - Step 24250 -- 🔄 Training Metrics 2025-08-29 23:49:25 - pico-train - INFO - ├── Loss: 6.2544 2025-08-29 23:49:25 - pico-train - INFO - ├── Learning Rate: 2.44e-05 2025-08-29 23:49:25 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-29 23:49:38 - pico-train - INFO - Step 24275 -- 🔄 Training Metrics 2025-08-29 23:49:38 - pico-train - INFO - ├── Loss: 6.2934 2025-08-29 23:49:38 - pico-train - INFO - ├── Learning Rate: 2.43e-05 2025-08-29 23:49:38 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-29 23:49:51 - pico-train - INFO - Step 24300 -- 🔄 Training Metrics 2025-08-29 23:49:51 - pico-train - INFO - ├── Loss: 6.2608 2025-08-29 23:49:51 - pico-train - INFO - ├── Learning Rate: 2.43e-05 2025-08-29 23:49:51 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-29 23:50:03 - pico-train - INFO - Step 24325 -- 🔄 Training Metrics 2025-08-29 23:50:03 - pico-train - INFO - ├── Loss: 6.2280 
2025-08-29 23:50:03 - pico-train - INFO - ├── Learning Rate: 2.42e-05 2025-08-29 23:50:03 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-29 23:50:16 - pico-train - INFO - Step 24350 -- 🔄 Training Metrics 2025-08-29 23:50:16 - pico-train - INFO - ├── Loss: 6.2431 2025-08-29 23:50:16 - pico-train - INFO - ├── Learning Rate: 2.41e-05 2025-08-29 23:50:16 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-29 23:50:29 - pico-train - INFO - Step 24375 -- 🔄 Training Metrics 2025-08-29 23:50:29 - pico-train - INFO - ├── Loss: 6.2120 2025-08-29 23:50:29 - pico-train - INFO - ├── Learning Rate: 2.41e-05 2025-08-29 23:50:29 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-29 23:50:41 - pico-train - INFO - Step 24400 -- 🔄 Training Metrics 2025-08-29 23:50:41 - pico-train - INFO - ├── Loss: 6.2375 2025-08-29 23:50:41 - pico-train - INFO - ├── Learning Rate: 2.40e-05 2025-08-29 23:50:41 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-29 23:50:54 - pico-train - INFO - Step 24425 -- 🔄 Training Metrics 2025-08-29 23:50:54 - pico-train - INFO - ├── Loss: 6.3604 2025-08-29 23:50:54 - pico-train - INFO - ├── Learning Rate: 2.40e-05 2025-08-29 23:50:54 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-29 23:51:07 - pico-train - INFO - Step 24450 -- 🔄 Training Metrics 2025-08-29 23:51:07 - pico-train - INFO - ├── Loss: 6.2451 2025-08-29 23:51:07 - pico-train - INFO - ├── Learning Rate: 2.39e-05 2025-08-29 23:51:07 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-29 23:51:20 - pico-train - INFO - Step 24475 -- 🔄 Training Metrics 2025-08-29 23:51:20 - pico-train - INFO - ├── Loss: 6.2877 2025-08-29 23:51:20 - pico-train - INFO - ├── Learning Rate: 2.38e-05 2025-08-29 23:51:20 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-29 23:51:32 - pico-train - INFO - Step 24500 -- 💾 Saving Checkpoint 2025-08-29 23:53:26 - pico-train - INFO - Step 24500 -- 📊 Evaluation Results 2025-08-29 23:53:26 - pico-train - INFO - └── paloma: 2.937466297559389e+25 2025-08-29 23:53:29 - pico-train - 
INFO - Step 24500 -- 🔄 Training Metrics 2025-08-29 23:53:29 - pico-train - INFO - ├── Loss: 6.3104 2025-08-29 23:53:29 - pico-train - INFO - ├── Learning Rate: 2.38e-05 2025-08-29 23:53:29 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-29 23:53:29 - pico-train - INFO - Step 24500 -- 📈 Saving Learning Dynamics 2025-08-29 23:53:44 - pico-train - INFO - Step 24525 -- 🔄 Training Metrics 2025-08-29 23:53:44 - pico-train - INFO - ├── Loss: 6.2830 2025-08-29 23:53:44 - pico-train - INFO - ├── Learning Rate: 2.37e-05 2025-08-29 23:53:44 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-29 23:53:56 - pico-train - INFO - Step 24550 -- 🔄 Training Metrics 2025-08-29 23:53:56 - pico-train - INFO - ├── Loss: 6.2558 2025-08-29 23:53:56 - pico-train - INFO - ├── Learning Rate: 2.37e-05 2025-08-29 23:53:56 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-29 23:54:09 - pico-train - INFO - Step 24575 -- 🔄 Training Metrics 2025-08-29 23:54:09 - pico-train - INFO - ├── Loss: 6.2140 2025-08-29 23:54:09 - pico-train - INFO - ├── Learning Rate: 2.36e-05 2025-08-29 23:54:09 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-29 23:54:22 - pico-train - INFO - Step 24600 -- 🔄 Training Metrics 2025-08-29 23:54:22 - pico-train - INFO - ├── Loss: 6.2546 2025-08-29 23:54:22 - pico-train - INFO - ├── Learning Rate: 2.35e-05 2025-08-29 23:54:22 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-29 23:54:34 - pico-train - INFO - Step 24625 -- 🔄 Training Metrics 2025-08-29 23:54:34 - pico-train - INFO - ├── Loss: 6.2569 2025-08-29 23:54:34 - pico-train - INFO - ├── Learning Rate: 2.35e-05 2025-08-29 23:54:34 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-29 23:54:47 - pico-train - INFO - Step 24650 -- 🔄 Training Metrics 2025-08-29 23:54:47 - pico-train - INFO - ├── Loss: 6.2170 2025-08-29 23:54:47 - pico-train - INFO - ├── Learning Rate: 2.34e-05 2025-08-29 23:54:47 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-29 23:55:00 - pico-train - INFO - Step 24675 -- 🔄 Training Metrics 
2025-08-29 23:55:00 - pico-train - INFO - ├── Loss: 6.2187 2025-08-29 23:55:00 - pico-train - INFO - ├── Learning Rate: 2.33e-05 2025-08-29 23:55:00 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-29 23:55:12 - pico-train - INFO - Step 24700 -- 🔄 Training Metrics 2025-08-29 23:55:12 - pico-train - INFO - ├── Loss: 6.2933 2025-08-29 23:55:12 - pico-train - INFO - ├── Learning Rate: 2.33e-05 2025-08-29 23:55:12 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-29 23:55:25 - pico-train - INFO - Step 24725 -- 🔄 Training Metrics 2025-08-29 23:55:25 - pico-train - INFO - ├── Loss: 6.2359 2025-08-29 23:55:25 - pico-train - INFO - ├── Learning Rate: 2.32e-05 2025-08-29 23:55:25 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-29 23:55:38 - pico-train - INFO - Step 24750 -- 🔄 Training Metrics 2025-08-29 23:55:38 - pico-train - INFO - ├── Loss: 6.2789 2025-08-29 23:55:38 - pico-train - INFO - ├── Learning Rate: 2.32e-05 2025-08-29 23:55:38 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-29 23:55:50 - pico-train - INFO - Step 24775 -- 🔄 Training Metrics 2025-08-29 23:55:50 - pico-train - INFO - ├── Loss: 6.3001 2025-08-29 23:55:50 - pico-train - INFO - ├── Learning Rate: 2.31e-05 2025-08-29 23:55:50 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-29 23:56:03 - pico-train - INFO - Step 24800 -- 🔄 Training Metrics 2025-08-29 23:56:03 - pico-train - INFO - ├── Loss: 6.2419 2025-08-29 23:56:03 - pico-train - INFO - ├── Learning Rate: 2.30e-05 2025-08-29 23:56:03 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-29 23:56:16 - pico-train - INFO - Step 24825 -- 🔄 Training Metrics 2025-08-29 23:56:16 - pico-train - INFO - ├── Loss: 6.2251 2025-08-29 23:56:16 - pico-train - INFO - ├── Learning Rate: 2.30e-05 2025-08-29 23:56:16 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-29 23:56:28 - pico-train - INFO - Step 24850 -- 🔄 Training Metrics 2025-08-29 23:56:28 - pico-train - INFO - ├── Loss: 6.2023 2025-08-29 23:56:28 - pico-train - INFO - ├── Learning Rate: 2.29e-05 
2025-08-29 23:56:28 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-29 23:56:41 - pico-train - INFO - Step 24875 -- 🔄 Training Metrics 2025-08-29 23:56:41 - pico-train - INFO - ├── Loss: 6.2911 2025-08-29 23:56:41 - pico-train - INFO - ├── Learning Rate: 2.29e-05 2025-08-29 23:56:41 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-29 23:56:54 - pico-train - INFO - Step 24900 -- 🔄 Training Metrics 2025-08-29 23:56:54 - pico-train - INFO - ├── Loss: 6.2723 2025-08-29 23:56:54 - pico-train - INFO - ├── Learning Rate: 2.28e-05 2025-08-29 23:56:54 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-29 23:57:07 - pico-train - INFO - Step 24925 -- 🔄 Training Metrics 2025-08-29 23:57:07 - pico-train - INFO - ├── Loss: 6.2993 2025-08-29 23:57:07 - pico-train - INFO - ├── Learning Rate: 2.27e-05 2025-08-29 23:57:07 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-29 23:57:19 - pico-train - INFO - Step 24950 -- 🔄 Training Metrics 2025-08-29 23:57:19 - pico-train - INFO - ├── Loss: 6.2579 2025-08-29 23:57:19 - pico-train - INFO - ├── Learning Rate: 2.27e-05 2025-08-29 23:57:19 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-29 23:57:32 - pico-train - INFO - Step 24975 -- 🔄 Training Metrics 2025-08-29 23:57:32 - pico-train - INFO - ├── Loss: 6.2620 2025-08-29 23:57:32 - pico-train - INFO - ├── Learning Rate: 2.26e-05 2025-08-29 23:57:32 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-29 23:57:44 - pico-train - INFO - Step 25000 -- 💾 Saving Checkpoint 2025-08-29 23:59:48 - pico-train - INFO - Step 25000 -- 📊 Evaluation Results 2025-08-29 23:59:48 - pico-train - INFO - └── paloma: 3.4105304760288245e+25 2025-08-29 23:59:49 - pico-train - INFO - Step 25000 -- 🔄 Training Metrics 2025-08-29 23:59:49 - pico-train - INFO - ├── Loss: 6.2956 2025-08-29 23:59:49 - pico-train - INFO - ├── Learning Rate: 2.25e-05 2025-08-29 23:59:49 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-29 23:59:49 - pico-train - INFO - Step 25000 -- 📈 Saving Learning Dynamics 2025-08-30 00:00:04 - 
pico-train - INFO - Step 25025 -- 🔄 Training Metrics
2025-08-30 00:00:04 - pico-train - INFO - ├── Loss: 6.2348
2025-08-30 00:00:04 - pico-train - INFO - ├── Learning Rate: 2.25e-05
2025-08-30 00:00:04 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-30 00:00:17 - pico-train - INFO - Step 25050 -- 🔄 Training Metrics
2025-08-30 00:00:17 - pico-train - INFO - ├── Loss: 6.2363
2025-08-30 00:00:17 - pico-train - INFO - ├── Learning Rate: 2.24e-05
2025-08-30 00:00:17 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-30 00:00:30 - pico-train - INFO - Step 25075 -- 🔄 Training Metrics
2025-08-30 00:00:30 - pico-train - INFO - ├── Loss: 6.2567
2025-08-30 00:00:30 - pico-train - INFO - ├── Learning Rate: 2.24e-05
2025-08-30 00:00:30 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-30 00:00:43 - pico-train - INFO - Step 25100 -- 🔄 Training Metrics
2025-08-30 00:00:43 - pico-train - INFO - ├── Loss: 6.2186
2025-08-30 00:00:43 - pico-train - INFO - ├── Learning Rate: 2.23e-05
2025-08-30 00:00:43 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-30 00:00:56 - pico-train - INFO - Step 25125 -- 🔄 Training Metrics
2025-08-30 00:00:56 - pico-train - INFO - ├── Loss: 6.2886
2025-08-30 00:00:56 - pico-train - INFO - ├── Learning Rate: 2.22e-05
2025-08-30 00:00:56 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-30 00:01:08 - pico-train - INFO - Step 25150 -- 🔄 Training Metrics
2025-08-30 00:01:08 - pico-train - INFO - ├── Loss: 6.2310
2025-08-30 00:01:08 - pico-train - INFO - ├── Learning Rate: 2.22e-05
2025-08-30 00:01:08 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-30 00:01:21 - pico-train - INFO - Step 25175 -- 🔄 Training Metrics
2025-08-30 00:01:21 - pico-train - INFO - ├── Loss: 6.3884
2025-08-30 00:01:21 - pico-train - INFO - ├── Learning Rate: 2.21e-05
2025-08-30 00:01:21 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-30 00:01:34 - pico-train - INFO - Step 25200 -- 🔄 Training Metrics
2025-08-30 00:01:34 - pico-train - INFO - ├── Loss: 6.2232
2025-08-30 00:01:34 - pico-train - INFO - ├── Learning Rate: 2.21e-05
2025-08-30 00:01:34 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-30 00:01:46 - pico-train - INFO - Step 25225 -- 🔄 Training Metrics
2025-08-30 00:01:46 - pico-train - INFO - ├── Loss: 6.2254
2025-08-30 00:01:46 - pico-train - INFO - ├── Learning Rate: 2.20e-05
2025-08-30 00:01:46 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-30 00:01:59 - pico-train - INFO - Step 25250 -- 🔄 Training Metrics
2025-08-30 00:01:59 - pico-train - INFO - ├── Loss: 6.2140
2025-08-30 00:01:59 - pico-train - INFO - ├── Learning Rate: 2.19e-05
2025-08-30 00:01:59 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-30 00:02:12 - pico-train - INFO - Step 25275 -- 🔄 Training Metrics
2025-08-30 00:02:12 - pico-train - INFO - ├── Loss: 6.3619
2025-08-30 00:02:12 - pico-train - INFO - ├── Learning Rate: 2.19e-05
2025-08-30 00:02:12 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-30 00:02:24 - pico-train - INFO - Step 25300 -- 🔄 Training Metrics
2025-08-30 00:02:24 - pico-train - INFO - ├── Loss: 6.2660
2025-08-30 00:02:24 - pico-train - INFO - ├── Learning Rate: 2.18e-05
2025-08-30 00:02:24 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-30 00:02:37 - pico-train - INFO - Step 25325 -- 🔄 Training Metrics
2025-08-30 00:02:37 - pico-train - INFO - ├── Loss: 6.1959
2025-08-30 00:02:37 - pico-train - INFO - ├── Learning Rate: 2.18e-05
2025-08-30 00:02:37 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-30 00:02:49 - pico-train - INFO - Step 25350 -- 🔄 Training Metrics
2025-08-30 00:02:49 - pico-train - INFO - ├── Loss: 6.2983
2025-08-30 00:02:49 - pico-train - INFO - ├── Learning Rate: 2.17e-05
2025-08-30 00:02:49 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-30 00:03:02 - pico-train - INFO - Step 25375 -- 🔄 Training Metrics
2025-08-30 00:03:02 - pico-train - INFO - ├── Loss: 6.2441
2025-08-30 00:03:02 - pico-train - INFO - ├── Learning Rate: 2.16e-05
2025-08-30 00:03:02 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-30 00:03:15 - pico-train - INFO - Step 25400 -- 🔄 Training Metrics
2025-08-30 00:03:15 - pico-train - INFO - ├── Loss: 6.2454
2025-08-30 00:03:15 - pico-train - INFO - ├── Learning Rate: 2.16e-05
2025-08-30 00:03:15 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-30 00:03:28 - pico-train - INFO - Step 25425 -- 🔄 Training Metrics
2025-08-30 00:03:28 - pico-train - INFO - ├── Loss: 6.2099
2025-08-30 00:03:28 - pico-train - INFO - ├── Learning Rate: 2.15e-05
2025-08-30 00:03:28 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-30 00:03:40 - pico-train - INFO - Step 25450 -- 🔄 Training Metrics
2025-08-30 00:03:40 - pico-train - INFO - ├── Loss: 6.1991
2025-08-30 00:03:40 - pico-train - INFO - ├── Learning Rate: 2.15e-05
2025-08-30 00:03:40 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-30 00:03:53 - pico-train - INFO - Step 25475 -- 🔄 Training Metrics
2025-08-30 00:03:53 - pico-train - INFO - ├── Loss: 6.1905
2025-08-30 00:03:53 - pico-train - INFO - ├── Learning Rate: 2.14e-05
2025-08-30 00:03:53 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-30 00:04:05 - pico-train - INFO - Step 25500 -- 💾 Saving Checkpoint
2025-08-30 00:06:01 - pico-train - INFO - Step 25500 -- 📊 Evaluation Results
2025-08-30 00:06:01 - pico-train - INFO - └── paloma: 5.167340298104552e+25
2025-08-30 00:06:03 - pico-train - INFO - Step 25500 -- 🔄 Training Metrics
2025-08-30 00:06:03 - pico-train - INFO - ├── Loss: 6.2849
2025-08-30 00:06:03 - pico-train - INFO - ├── Learning Rate: 2.13e-05
2025-08-30 00:06:03 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-30 00:06:03 - pico-train - INFO - Step 25500 -- 📈 Saving Learning Dynamics
2025-08-30 00:06:19 - pico-train - INFO - Step 25525 -- 🔄 Training Metrics
2025-08-30 00:06:19 - pico-train - INFO - ├── Loss: 6.2454
2025-08-30 00:06:19 - pico-train - INFO - ├── Learning Rate: 2.13e-05
2025-08-30 00:06:19 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-30 00:06:32 - pico-train - INFO - Step 25550 -- 🔄 Training Metrics
2025-08-30 00:06:32 - pico-train - INFO - ├── Loss: 6.2327
2025-08-30 00:06:32 - pico-train - INFO - ├── Learning Rate: 2.12e-05
2025-08-30 00:06:32 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-30 00:06:45 - pico-train - INFO - Step 25575 -- 🔄 Training Metrics
2025-08-30 00:06:45 - pico-train - INFO - ├── Loss: 6.2783
2025-08-30 00:06:45 - pico-train - INFO - ├── Learning Rate: 2.11e-05
2025-08-30 00:06:45 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-30 00:06:57 - pico-train - INFO - Step 25600 -- 🔄 Training Metrics
2025-08-30 00:06:57 - pico-train - INFO - ├── Loss: 6.1487
2025-08-30 00:06:57 - pico-train - INFO - ├── Learning Rate: 2.11e-05
2025-08-30 00:06:57 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-30 00:07:11 - pico-train - INFO - Step 25625 -- 🔄 Training Metrics
2025-08-30 00:07:11 - pico-train - INFO - ├── Loss: 6.3194
2025-08-30 00:07:11 - pico-train - INFO - ├── Learning Rate: 2.10e-05
2025-08-30 00:07:11 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-30 00:07:24 - pico-train - INFO - Step 25650 -- 🔄 Training Metrics
2025-08-30 00:07:24 - pico-train - INFO - ├── Loss: 6.2920
2025-08-30 00:07:24 - pico-train - INFO - ├── Learning Rate: 2.10e-05
2025-08-30 00:07:24 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-30 00:07:37 - pico-train - INFO - Step 25675 -- 🔄 Training Metrics
2025-08-30 00:07:37 - pico-train - INFO - ├── Loss: 6.2623
2025-08-30 00:07:37 - pico-train - INFO - ├── Learning Rate: 2.09e-05
2025-08-30 00:07:37 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-30 00:07:49 - pico-train - INFO - Step 25700 -- 🔄 Training Metrics
2025-08-30 00:07:49 - pico-train - INFO - ├── Loss: 6.2687
2025-08-30 00:07:49 - pico-train - INFO - ├── Learning Rate: 2.08e-05
2025-08-30 00:07:49 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-30 00:08:02 - pico-train - INFO - Step 25725 -- 🔄 Training Metrics
2025-08-30 00:08:02 - pico-train - INFO - ├── Loss: 6.2595
2025-08-30 00:08:02 - pico-train - INFO - ├── Learning Rate: 2.08e-05
2025-08-30 00:08:02 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-30 00:08:15 - pico-train - INFO - Step 25750 -- 🔄 Training Metrics
2025-08-30 00:08:15 - pico-train - INFO - ├── Loss: 6.2781
2025-08-30 00:08:15 - pico-train - INFO - ├── Learning Rate: 2.07e-05
2025-08-30 00:08:15 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-30 00:08:27 - pico-train - INFO - Step 25775 -- 🔄 Training Metrics
2025-08-30 00:08:27 - pico-train - INFO - ├── Loss: 6.2089
2025-08-30 00:08:27 - pico-train - INFO - ├── Learning Rate: 2.07e-05
2025-08-30 00:08:27 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-30 00:08:40 - pico-train - INFO - Step 25800 -- 🔄 Training Metrics
2025-08-30 00:08:40 - pico-train - INFO - ├── Loss: 6.2729
2025-08-30 00:08:40 - pico-train - INFO - ├── Learning Rate: 2.06e-05
2025-08-30 00:08:40 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-30 00:08:53 - pico-train - INFO - Step 25825 -- 🔄 Training Metrics
2025-08-30 00:08:53 - pico-train - INFO - ├── Loss: 6.2478
2025-08-30 00:08:53 - pico-train - INFO - ├── Learning Rate: 2.05e-05
2025-08-30 00:08:53 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-30 00:09:05 - pico-train - INFO - Step 25850 -- 🔄 Training Metrics
2025-08-30 00:09:05 - pico-train - INFO - ├── Loss: 6.2238
2025-08-30 00:09:05 - pico-train - INFO - ├── Learning Rate: 2.05e-05
2025-08-30 00:09:05 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-30 00:09:18 - pico-train - INFO - Step 25875 -- 🔄 Training Metrics
2025-08-30 00:09:18 - pico-train - INFO - ├── Loss: 6.2437
2025-08-30 00:09:18 - pico-train - INFO - ├── Learning Rate: 2.04e-05
2025-08-30 00:09:18 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-30 00:09:31 - pico-train - INFO - Step 25900 -- 🔄 Training Metrics
2025-08-30 00:09:31 - pico-train - INFO - ├── Loss: 6.2743
2025-08-30 00:09:31 - pico-train - INFO - ├── Learning Rate: 2.04e-05
2025-08-30 00:09:31 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-30 00:09:43 - pico-train - INFO - Step 25925 -- 🔄 Training Metrics
2025-08-30 00:09:43 - pico-train - INFO - ├── Loss: 6.2143
2025-08-30 00:09:43 - pico-train - INFO - ├── Learning Rate: 2.03e-05
2025-08-30 00:09:43 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-30 00:09:56 - pico-train - INFO - Step 25950 -- 🔄 Training Metrics
2025-08-30 00:09:56 - pico-train - INFO - ├── Loss: 6.1636
2025-08-30 00:09:56 - pico-train - INFO - ├── Learning Rate: 2.02e-05
2025-08-30 00:09:56 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-30 00:10:08 - pico-train - INFO - Step 25975 -- 🔄 Training Metrics
2025-08-30 00:10:08 - pico-train - INFO - ├── Loss: 6.2028
2025-08-30 00:10:08 - pico-train - INFO - ├── Learning Rate: 2.02e-05
2025-08-30 00:10:08 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-30 00:10:21 - pico-train - INFO - Step 26000 -- 💾 Saving Checkpoint
2025-08-30 00:12:22 - pico-train - INFO - Step 26000 -- 📊 Evaluation Results
2025-08-30 00:12:22 - pico-train - INFO - └── paloma: 5.374017629915336e+25
2025-08-30 00:12:25 - pico-train - INFO - Step 26000 -- 🔄 Training Metrics
2025-08-30 00:12:25 - pico-train - INFO - ├── Loss: 6.3023
2025-08-30 00:12:25 - pico-train - INFO - ├── Learning Rate: 2.01e-05
2025-08-30 00:12:25 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-30 00:12:25 - pico-train - INFO - Step 26000 -- 📈 Saving Learning Dynamics
2025-08-30 00:12:40 - pico-train - INFO - Step 26025 -- 🔄 Training Metrics
2025-08-30 00:12:40 - pico-train - INFO - ├── Loss: 6.2060
2025-08-30 00:12:40 - pico-train - INFO - ├── Learning Rate: 2.01e-05
2025-08-30 00:12:40 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-30 00:12:52 - pico-train - INFO - Step 26050 -- 🔄 Training Metrics
2025-08-30 00:12:52 - pico-train - INFO - ├── Loss: 6.2001
2025-08-30 00:12:52 - pico-train - INFO - ├── Learning Rate: 2.00e-05
2025-08-30 00:12:52 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-30 00:13:05 - pico-train - INFO - Step 26075 -- 🔄 Training Metrics
2025-08-30 00:13:05 - pico-train - INFO - ├── Loss: 6.2546
2025-08-30 00:13:05 - pico-train - INFO - ├── Learning Rate: 1.99e-05
2025-08-30 00:13:05 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-30 00:13:18 - pico-train - INFO - Step 26100 -- 🔄 Training Metrics
2025-08-30 00:13:18 - pico-train - INFO - ├── Loss: 6.1986
2025-08-30 00:13:18 - pico-train - INFO - ├── Learning Rate: 1.99e-05
2025-08-30 00:13:18 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-30 00:13:32 - pico-train - INFO - Step 26125 -- 🔄 Training Metrics
2025-08-30 00:13:32 - pico-train - INFO - ├── Loss: 6.2415
2025-08-30 00:13:32 - pico-train - INFO - ├── Learning Rate: 1.98e-05
2025-08-30 00:13:32 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-30 00:13:44 - pico-train - INFO - Step 26150 -- 🔄 Training Metrics
2025-08-30 00:13:44 - pico-train - INFO - ├── Loss: 6.2411
2025-08-30 00:13:44 - pico-train - INFO - ├── Learning Rate: 1.98e-05
2025-08-30 00:13:44 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-30 00:13:57 - pico-train - INFO - Step 26175 -- 🔄 Training Metrics
2025-08-30 00:13:57 - pico-train - INFO - ├── Loss: 6.1756
2025-08-30 00:13:57 - pico-train - INFO - ├── Learning Rate: 1.97e-05
2025-08-30 00:13:57 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-30 00:14:10 - pico-train - INFO - Step 26200 -- 🔄 Training Metrics
2025-08-30 00:14:10 - pico-train - INFO - ├── Loss: 6.1444
2025-08-30 00:14:10 - pico-train - INFO - ├── Learning Rate: 1.96e-05
2025-08-30 00:14:10 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-30 00:14:22 - pico-train - INFO - Step 26225 -- 🔄 Training Metrics
2025-08-30 00:14:22 - pico-train - INFO - ├── Loss: 6.3335
2025-08-30 00:14:22 - pico-train - INFO - ├── Learning Rate: 1.96e-05
2025-08-30 00:14:22 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-30 00:14:35 - pico-train - INFO - Step 26250 -- 🔄 Training Metrics
2025-08-30 00:14:35 - pico-train - INFO - ├── Loss: 6.1491
2025-08-30 00:14:35 - pico-train - INFO - ├── Learning Rate: 1.95e-05
2025-08-30 00:14:35 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-30 00:14:48 - pico-train - INFO - Step 26275 -- 🔄 Training Metrics
2025-08-30 00:14:48 - pico-train - INFO - ├── Loss: 6.1959
2025-08-30 00:14:48 - pico-train - INFO - ├── Learning Rate: 1.95e-05
2025-08-30 00:14:48 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-30 00:15:00 - pico-train - INFO - Step 26300 -- 🔄 Training Metrics
2025-08-30 00:15:00 - pico-train - INFO - ├── Loss: 6.2494
2025-08-30 00:15:00 - pico-train - INFO - ├── Learning Rate: 1.94e-05
2025-08-30 00:15:00 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-30 00:15:13 - pico-train - INFO - Step 26325 -- 🔄 Training Metrics
2025-08-30 00:15:13 - pico-train - INFO - ├── Loss: 6.2893
2025-08-30 00:15:13 - pico-train - INFO - ├── Learning Rate: 1.93e-05
2025-08-30 00:15:13 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-30 00:15:26 - pico-train - INFO - Step 26350 -- 🔄 Training Metrics
2025-08-30 00:15:26 - pico-train - INFO - ├── Loss: 6.2732
2025-08-30 00:15:26 - pico-train - INFO - ├── Learning Rate: 1.93e-05
2025-08-30 00:15:26 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-30 00:15:38 - pico-train - INFO - Step 26375 -- 🔄 Training Metrics
2025-08-30 00:15:38 - pico-train - INFO - ├── Loss: 6.2804
2025-08-30 00:15:38 - pico-train - INFO - ├── Learning Rate: 1.92e-05
2025-08-30 00:15:38 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-30 00:15:51 - pico-train - INFO - Step 26400 -- 🔄 Training Metrics
2025-08-30 00:15:51 - pico-train - INFO - ├── Loss: 6.2117
2025-08-30 00:15:51 - pico-train - INFO - ├── Learning Rate: 1.92e-05
2025-08-30 00:15:51 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-30 00:16:04 - pico-train - INFO - Step 26425 -- 🔄 Training Metrics
2025-08-30 00:16:04 - pico-train - INFO - ├── Loss: 6.2055
2025-08-30 00:16:04 - pico-train - INFO - ├── Learning Rate: 1.91e-05
2025-08-30 00:16:04 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-30 00:16:17 - pico-train - INFO - Step 26450 -- 🔄 Training Metrics
2025-08-30 00:16:17 - pico-train - INFO - ├── Loss: 6.3085
2025-08-30 00:16:17 - pico-train - INFO - ├── Learning Rate: 1.90e-05
2025-08-30 00:16:17 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-30 00:16:29 - pico-train - INFO - Step 26475 -- 🔄 Training Metrics
2025-08-30 00:16:29 - pico-train - INFO - ├── Loss: 6.1870
2025-08-30 00:16:29 - pico-train - INFO - ├── Learning Rate: 1.90e-05
2025-08-30 00:16:29 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-30 00:16:41 - pico-train - INFO - Step 26500 -- 💾 Saving Checkpoint
2025-08-30 00:18:38 - pico-train - INFO - Step 26500 -- 📊 Evaluation Results
2025-08-30 00:18:38 - pico-train - INFO - └── paloma: 7.002764153086805e+25
2025-08-30 00:18:39 - pico-train - INFO - Step 26500 -- 🔄 Training Metrics
2025-08-30 00:18:39 - pico-train - INFO - ├── Loss: 6.2219
2025-08-30 00:18:39 - pico-train - INFO - ├── Learning Rate: 1.89e-05
2025-08-30 00:18:39 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-30 00:18:39 - pico-train - INFO - Step 26500 -- 📈 Saving Learning Dynamics
2025-08-30 00:18:54 - pico-train - INFO - Step 26525 -- 🔄 Training Metrics
2025-08-30 00:18:54 - pico-train - INFO - ├── Loss: 6.1945
2025-08-30 00:18:54 - pico-train - INFO - ├── Learning Rate: 1.89e-05
2025-08-30 00:18:54 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-30 00:19:07 - pico-train - INFO - Step 26550 -- 🔄 Training Metrics
2025-08-30 00:19:07 - pico-train - INFO - ├── Loss: 6.1917
2025-08-30 00:19:07 - pico-train - INFO - ├── Learning Rate: 1.88e-05
2025-08-30 00:19:07 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-30 00:19:20 - pico-train - INFO - Step 26575 -- 🔄 Training Metrics
2025-08-30 00:19:20 - pico-train - INFO - ├── Loss: 6.1611
2025-08-30 00:19:20 - pico-train - INFO - ├── Learning Rate: 1.87e-05
2025-08-30 00:19:20 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-30 00:19:32 - pico-train - INFO - Step 26600 -- 🔄 Training Metrics
2025-08-30 00:19:32 - pico-train - INFO - ├── Loss: 6.2254
2025-08-30 00:19:32 - pico-train - INFO - ├── Learning Rate: 1.87e-05
2025-08-30 00:19:32 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-30 00:19:45 - pico-train - INFO - Step 26625 -- 🔄 Training Metrics
2025-08-30 00:19:45 - pico-train - INFO - ├── Loss: 6.2633
2025-08-30 00:19:45 - pico-train - INFO - ├── Learning Rate: 1.86e-05
2025-08-30 00:19:45 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-30 00:19:58 - pico-train - INFO - Step 26650 -- 🔄 Training Metrics
2025-08-30 00:19:58 - pico-train - INFO - ├── Loss: 6.2096
2025-08-30 00:19:58 - pico-train - INFO - ├── Learning Rate: 1.86e-05
2025-08-30 00:19:58 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-30 00:20:10 - pico-train - INFO - Step 26675 -- 🔄 Training Metrics
2025-08-30 00:20:10 - pico-train - INFO - ├── Loss: 6.2665
2025-08-30 00:20:10 - pico-train - INFO - ├── Learning Rate: 1.85e-05
2025-08-30 00:20:10 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-30 00:20:23 - pico-train - INFO - Step 26700 -- 🔄 Training Metrics
2025-08-30 00:20:23 - pico-train - INFO - ├── Loss: 6.2534
2025-08-30 00:20:23 - pico-train - INFO - ├── Learning Rate: 1.85e-05
2025-08-30 00:20:23 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-30 00:20:36 - pico-train - INFO - Step 26725 -- 🔄 Training Metrics
2025-08-30 00:20:36 - pico-train - INFO - ├── Loss: 6.2207
2025-08-30 00:20:36 - pico-train - INFO - ├── Learning Rate: 1.84e-05
2025-08-30 00:20:36 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-30 00:20:48 - pico-train - INFO - Step 26750 -- 🔄 Training Metrics
2025-08-30 00:20:48 - pico-train - INFO - ├── Loss: 6.2923
2025-08-30 00:20:48 - pico-train - INFO - ├── Learning Rate: 1.83e-05
2025-08-30 00:20:48 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-30 00:21:01 - pico-train - INFO - Step 26775 -- 🔄 Training Metrics
2025-08-30 00:21:01 - pico-train - INFO - ├── Loss: 6.2678
2025-08-30 00:21:01 - pico-train - INFO - ├── Learning Rate: 1.83e-05
2025-08-30 00:21:01 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-30 00:21:14 - pico-train - INFO - Step 26800 -- 🔄 Training Metrics
2025-08-30 00:21:14 - pico-train - INFO - ├── Loss: 6.2139
2025-08-30 00:21:14 - pico-train - INFO - ├── Learning Rate: 1.82e-05
2025-08-30 00:21:14 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-30 00:21:26 - pico-train - INFO - Step 26825 -- 🔄 Training Metrics
2025-08-30 00:21:26 - pico-train - INFO - ├── Loss: 6.1680
2025-08-30 00:21:26 - pico-train - INFO - ├── Learning Rate: 1.82e-05
2025-08-30 00:21:26 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-30 00:21:39 - pico-train - INFO - Step 26850 -- 🔄 Training Metrics
2025-08-30 00:21:39 - pico-train - INFO - ├── Loss: 6.1858
2025-08-30 00:21:39 - pico-train - INFO - ├── Learning Rate: 1.81e-05
2025-08-30 00:21:39 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-30 00:21:52 - pico-train - INFO - Step 26875 -- 🔄 Training Metrics
2025-08-30 00:21:52 - pico-train - INFO - ├── Loss: 6.1172
2025-08-30 00:21:52 - pico-train - INFO - ├── Learning Rate: 1.80e-05
2025-08-30 00:21:52 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-30 00:22:05 - pico-train - INFO - Step 26900 -- 🔄 Training Metrics
2025-08-30 00:22:05 - pico-train - INFO - ├── Loss: 6.2332
2025-08-30 00:22:05 - pico-train - INFO - ├── Learning Rate: 1.80e-05
2025-08-30 00:22:05 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-30 00:22:17 - pico-train - INFO - Step 26925 -- 🔄 Training Metrics
2025-08-30 00:22:17 - pico-train - INFO - ├── Loss: 6.2099
2025-08-30 00:22:17 - pico-train - INFO - ├── Learning Rate: 1.79e-05
2025-08-30 00:22:17 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-30 00:22:30 - pico-train - INFO - Step 26950 -- 🔄 Training Metrics
2025-08-30 00:22:30 - pico-train - INFO - ├── Loss: 6.2551
2025-08-30 00:22:30 - pico-train - INFO - ├── Learning Rate: 1.79e-05
2025-08-30 00:22:30 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-30 00:22:43 - pico-train - INFO - Step 26975 -- 🔄 Training Metrics
2025-08-30 00:22:43 - pico-train - INFO - ├── Loss: 6.2033
2025-08-30 00:22:43 - pico-train - INFO - ├── Learning Rate: 1.78e-05
2025-08-30 00:22:43 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-30 00:22:55 - pico-train - INFO - Step 27000 -- 💾 Saving Checkpoint
2025-08-30 00:24:53 - pico-train - INFO - Step 27000 -- 📊 Evaluation Results
2025-08-30 00:24:53 - pico-train - INFO - └── paloma: 7.722641414937935e+25
2025-08-30 00:24:55 - pico-train - INFO - Step 27000 -- 🔄 Training Metrics
2025-08-30 00:24:55 - pico-train - INFO - ├── Loss: 6.2512
2025-08-30 00:24:55 - pico-train - INFO - ├── Learning Rate: 1.77e-05
2025-08-30 00:24:55 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-30 00:24:55 - pico-train - INFO - Step 27000 -- 📈 Saving Learning Dynamics
2025-08-30 00:25:09 - pico-train - INFO - Step 27025 -- 🔄 Training Metrics
2025-08-30 00:25:09 - pico-train - INFO - ├── Loss: 6.2686
2025-08-30 00:25:09 - pico-train - INFO - ├── Learning Rate: 1.77e-05
2025-08-30 00:25:09 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-30 00:25:22 - pico-train - INFO - Step 27050 -- 🔄 Training Metrics
2025-08-30 00:25:22 - pico-train - INFO - ├── Loss: 6.1854
2025-08-30 00:25:22 - pico-train - INFO - ├── Learning Rate: 1.76e-05
2025-08-30 00:25:22 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-30 00:25:35 - pico-train - INFO - Step 27075 -- 🔄 Training Metrics
2025-08-30 00:25:35 - pico-train - INFO - ├── Loss: 6.1974
2025-08-30 00:25:35 - pico-train - INFO - ├── Learning Rate: 1.76e-05
2025-08-30 00:25:35 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-30 00:25:47 - pico-train - INFO - Step 27100 -- 🔄 Training Metrics
2025-08-30 00:25:47 - pico-train - INFO - ├── Loss: 6.2597
2025-08-30 00:25:47 - pico-train - INFO - ├── Learning Rate: 1.75e-05
2025-08-30 00:25:47 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-30 00:26:00 - pico-train - INFO - Step 27125 -- 🔄 Training Metrics
2025-08-30 00:26:00 - pico-train - INFO - ├── Loss: 6.2280
2025-08-30 00:26:00 - pico-train - INFO - ├── Learning Rate: 1.74e-05
2025-08-30 00:26:00 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-30 00:26:13 - pico-train - INFO - Step 27150 -- 🔄 Training Metrics
2025-08-30 00:26:13 - pico-train - INFO - ├── Loss: 6.2126
2025-08-30 00:26:13 - pico-train - INFO - ├── Learning Rate: 1.74e-05
2025-08-30 00:26:13 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-30 00:26:26 - pico-train - INFO - Step 27175 -- 🔄 Training Metrics
2025-08-30 00:26:26 - pico-train - INFO - ├── Loss: 6.2233
2025-08-30 00:26:26 - pico-train - INFO - ├── Learning Rate: 1.73e-05
2025-08-30 00:26:26 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-30 00:26:38 - pico-train - INFO - Step 27200 -- 🔄 Training Metrics
2025-08-30 00:26:38 - pico-train - INFO - ├── Loss: 6.1393
2025-08-30 00:26:38 - pico-train - INFO - ├── Learning Rate: 1.73e-05
2025-08-30 00:26:38 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-30 00:26:51 - pico-train - INFO - Step 27225 -- 🔄 Training Metrics
2025-08-30 00:26:51 - pico-train - INFO - ├── Loss: 6.3226
2025-08-30 00:26:51 - pico-train - INFO - ├── Learning Rate: 1.72e-05
2025-08-30 00:26:51 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-30 00:27:03 - pico-train - INFO - Step 27250 -- 🔄 Training Metrics
2025-08-30 00:27:03 - pico-train - INFO - ├── Loss: 6.1570
2025-08-30 00:27:03 - pico-train - INFO - ├── Learning Rate: 1.72e-05
2025-08-30 00:27:03 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-30 00:27:16 - pico-train - INFO - Step 27275 -- 🔄 Training Metrics
2025-08-30 00:27:16 - pico-train - INFO - ├── Loss: 6.2252
2025-08-30 00:27:16 - pico-train - INFO - ├── Learning Rate: 1.71e-05
2025-08-30 00:27:16 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-30 00:27:29 - pico-train - INFO - Step 27300 -- 🔄 Training Metrics
2025-08-30 00:27:29 - pico-train - INFO - ├── Loss: 6.1647
2025-08-30 00:27:29 - pico-train - INFO - ├── Learning Rate: 1.70e-05
2025-08-30 00:27:29 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-30 00:27:41 - pico-train - INFO - Step 27325 -- 🔄 Training Metrics
2025-08-30 00:27:41 - pico-train - INFO - ├── Loss: 6.1219
2025-08-30 00:27:41 - pico-train - INFO - ├── Learning Rate: 1.70e-05
2025-08-30 00:27:41 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-30 00:27:54 - pico-train - INFO - Step 27350 -- 🔄 Training Metrics
2025-08-30 00:27:54 - pico-train - INFO - ├── Loss: 6.2250
2025-08-30 00:27:54 - pico-train - INFO - ├── Learning Rate: 1.69e-05
2025-08-30 00:27:54 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-30 00:28:06 - pico-train - INFO - Step 27375 -- 🔄 Training Metrics
2025-08-30 00:28:06 - pico-train - INFO - ├── Loss: 6.1883
2025-08-30 00:28:06 - pico-train - INFO - ├── Learning Rate: 1.69e-05
2025-08-30 00:28:06 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-30 00:28:19 - pico-train - INFO - Step 27400 -- 🔄 Training Metrics
2025-08-30 00:28:19 - pico-train - INFO - ├── Loss: 6.2074
2025-08-30 00:28:19 - pico-train - INFO - ├── Learning Rate: 1.68e-05
2025-08-30 00:28:19 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-30 00:28:31 - pico-train - INFO - Step 27425 -- 🔄 Training Metrics
2025-08-30 00:28:31 - pico-train - INFO - ├── Loss: 6.1881
2025-08-30 00:28:31 - pico-train - INFO - ├── Learning Rate: 1.68e-05
2025-08-30 00:28:31 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-30 00:28:44 - pico-train - INFO - Step 27450 -- 🔄 Training Metrics
2025-08-30 00:28:44 - pico-train - INFO - ├── Loss: 6.1977
2025-08-30 00:28:44 - pico-train - INFO - ├── Learning Rate: 1.67e-05
2025-08-30 00:28:44 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-30 00:28:57 - pico-train - INFO - Step 27475 -- 🔄 Training Metrics
2025-08-30 00:28:57 - pico-train - INFO - ├── Loss: 6.2394
2025-08-30 00:28:57 - pico-train - INFO - ├── Learning Rate: 1.66e-05
2025-08-30 00:28:57 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-30 00:29:09 - pico-train - INFO - Step 27500 -- 💾 Saving Checkpoint
2025-08-30 00:31:15 - pico-train - INFO - Step 27500 -- 📊 Evaluation Results
2025-08-30 00:31:15 - pico-train - INFO - └── paloma: 1.0733810806931749e+26
2025-08-30 00:31:19 - pico-train - INFO - Step 27500 -- 🔄 Training Metrics
2025-08-30 00:31:19 - pico-train - INFO - ├── Loss: 6.2657
2025-08-30 00:31:19 - pico-train - INFO - ├── Learning Rate: 1.66e-05
2025-08-30 00:31:19 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-30 00:31:19 - pico-train - INFO - Step 27500 -- 📈 Saving Learning Dynamics
2025-08-30 00:31:34 - pico-train - INFO - Step 27525 -- 🔄 Training Metrics
2025-08-30 00:31:34 - pico-train - INFO - ├── Loss: 6.1848
2025-08-30 00:31:34 - pico-train - INFO - ├── Learning Rate: 1.65e-05
2025-08-30 00:31:34 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-30 00:31:46 - pico-train - INFO - Step 27550 -- 🔄 Training Metrics
2025-08-30 00:31:46 - pico-train - INFO - ├── Loss: 6.1677
2025-08-30 00:31:46 - pico-train - INFO - ├── Learning Rate: 1.65e-05
2025-08-30 00:31:46 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-30 00:31:59 - pico-train - INFO - Step 27575 -- 🔄 Training Metrics
2025-08-30 00:31:59 - pico-train - INFO - ├── Loss: 6.2103
2025-08-30 00:31:59 - pico-train - INFO - ├── Learning Rate: 1.64e-05
2025-08-30 00:31:59 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-30 00:32:12 - pico-train - INFO - Step 27600 -- 🔄 Training Metrics
2025-08-30 00:32:12 - pico-train - INFO - ├── Loss: 6.2026
2025-08-30 00:32:12 - pico-train - INFO - ├── Learning Rate: 1.63e-05
2025-08-30 00:32:12 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-30 00:32:25 - pico-train - INFO - Step 27625 -- 🔄 Training Metrics
2025-08-30 00:32:25 - pico-train - INFO - ├── Loss: 6.1656
2025-08-30 00:32:25 - pico-train - INFO - ├── Learning Rate: 1.63e-05
2025-08-30 00:32:25 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-30 00:32:38 - pico-train - INFO - Step 27650 -- 🔄 Training Metrics
2025-08-30 00:32:38 - pico-train - INFO - ├── Loss: 6.1600
2025-08-30 00:32:38 - pico-train - INFO - ├── Learning Rate: 1.62e-05
2025-08-30 00:32:38 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-30 00:32:50 - pico-train - INFO - Step 27675 -- 🔄 Training Metrics
2025-08-30 00:32:50 - pico-train - INFO - ├── Loss: 6.2803
2025-08-30 00:32:50 - pico-train - INFO - ├── Learning Rate: 1.62e-05
2025-08-30 00:32:50 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-30 00:33:03 - pico-train - INFO - Step 27700 -- 🔄 Training Metrics
2025-08-30 00:33:03 - pico-train - INFO - ├── Loss: 6.2837
2025-08-30 00:33:03 - pico-train - INFO - ├── Learning Rate: 1.61e-05
2025-08-30 00:33:03 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-30 00:33:15 - pico-train - INFO - Step 27725 -- 🔄 Training Metrics
2025-08-30 00:33:15 - pico-train - INFO - ├── Loss: 6.1344
2025-08-30 00:33:15 - pico-train - INFO - ├── Learning Rate: 1.61e-05
2025-08-30 00:33:15 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-30 00:33:28 - pico-train - INFO - Step 27750 -- 🔄 Training Metrics
2025-08-30 00:33:28 - pico-train - INFO - ├── Loss: 6.2066
2025-08-30 00:33:28 - pico-train - INFO - ├── Learning Rate: 1.60e-05
2025-08-30 00:33:28 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-30 00:33:41 - pico-train - INFO - Step 27775 -- 🔄 Training Metrics
2025-08-30 00:33:41 - pico-train - INFO - ├── Loss: 6.1848
2025-08-30 00:33:41 - pico-train - INFO - ├── Learning Rate: 1.59e-05
2025-08-30 00:33:41 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-30 00:33:53 - pico-train - INFO - Step 27800 -- 🔄 Training Metrics
2025-08-30 00:33:53 - pico-train - INFO - ├── Loss: 6.2565
2025-08-30 00:33:53 - pico-train - INFO - ├── Learning Rate: 1.59e-05
2025-08-30 00:33:53 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-30 00:34:06 - pico-train - INFO - Step 27825 -- 🔄 Training Metrics
2025-08-30 00:34:06 - pico-train - INFO - ├── Loss: 6.2278
2025-08-30 00:34:06 - pico-train - INFO - ├── Learning Rate: 1.58e-05
2025-08-30 00:34:06 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-30 00:34:19 - pico-train - INFO - Step 27850 -- 🔄 Training Metrics
2025-08-30 00:34:19 - pico-train - INFO - ├── Loss: 6.2249
2025-08-30 00:34:19 - pico-train - INFO - ├── Learning Rate: 1.58e-05
2025-08-30 00:34:19 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-30 00:34:32 - pico-train - INFO - Step 27875 -- 🔄 Training Metrics
2025-08-30 00:34:32 - pico-train - INFO - ├── Loss: 6.1730
2025-08-30 00:34:32 - pico-train - INFO - ├── Learning Rate: 1.57e-05
2025-08-30 00:34:32 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-30 00:34:44 - pico-train - INFO - Step 27900 -- 🔄 Training Metrics
2025-08-30 00:34:44 - pico-train - INFO - ├── Loss: 6.1503
2025-08-30 00:34:44 - pico-train - INFO - ├── Learning Rate: 1.57e-05
2025-08-30 00:34:44 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-30 00:34:57 - pico-train - INFO - Step 27925 -- 🔄 Training Metrics
2025-08-30 00:34:57 - pico-train - INFO - ├── Loss: 6.1955
2025-08-30 00:34:57 - pico-train - INFO - ├── Learning Rate: 1.56e-05
2025-08-30 00:34:57 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-30 00:35:09 - pico-train - INFO - Step 27950 -- 🔄 Training Metrics
2025-08-30 00:35:09 - pico-train - INFO - ├── Loss: 6.1747
2025-08-30 00:35:09 - pico-train - INFO - ├── Learning Rate: 1.55e-05
2025-08-30 00:35:09 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-30 00:35:22 - pico-train - INFO - Step 27975 -- 🔄 Training Metrics
2025-08-30 00:35:22 - pico-train - INFO - ├── Loss: 6.2607
2025-08-30 00:35:22 - pico-train - INFO - ├── Learning Rate: 1.55e-05
2025-08-30 00:35:22 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-30 00:35:34 - pico-train - INFO - Step 28000 -- 💾 Saving Checkpoint
2025-08-30 00:37:31 - pico-train - INFO - Step 28000 -- 📊 Evaluation Results
2025-08-30 00:37:31 - pico-train - INFO - └── paloma: 1.2438803536426585e+26
2025-08-30 00:37:34 - pico-train - INFO - Step 28000 -- 🔄 Training Metrics
2025-08-30 00:37:34 - pico-train - INFO - ├── Loss: 6.2990
2025-08-30 00:37:34 - pico-train - INFO - ├── Learning Rate: 1.54e-05
2025-08-30 00:37:34 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-30 00:37:34 - pico-train - INFO - Step 28000 -- 📈 Saving Learning Dynamics
2025-08-30 00:37:49 - pico-train - INFO - Step 28025 -- 🔄 Training Metrics
2025-08-30 00:37:49 - pico-train - INFO - ├── Loss: 6.1938
2025-08-30 00:37:49 - pico-train - INFO - ├── Learning Rate: 1.54e-05
2025-08-30 00:37:49 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-30 00:38:01 - pico-train - INFO - Step 28050 -- 🔄 Training Metrics
2025-08-30 00:38:01 - pico-train - INFO - ├── Loss: 6.2467
2025-08-30 00:38:01 - pico-train - INFO - ├── Learning Rate: 1.53e-05
2025-08-30 00:38:01 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-30 00:38:14 - pico-train - INFO - Step 28075 -- 🔄 Training Metrics
2025-08-30 00:38:14 - pico-train - INFO - ├── Loss: 6.1609
2025-08-30 00:38:14 - pico-train - INFO - ├── Learning Rate: 1.53e-05
2025-08-30 00:38:14 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-30 00:38:26 - pico-train - INFO - Step 28100 -- 🔄 Training Metrics
2025-08-30 00:38:26 - pico-train - INFO - ├── Loss: 6.1691
2025-08-30 00:38:26 - pico-train - INFO - ├── Learning Rate: 1.52e-05
2025-08-30 00:38:26 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-30 00:38:39 - pico-train - INFO - Step 28125 -- 🔄 Training Metrics
2025-08-30 00:38:39 - pico-train - INFO - ├── Loss: 6.2517
2025-08-30 00:38:39 - pico-train - INFO - ├── Learning Rate: 1.52e-05
2025-08-30 00:38:39 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-30 00:38:52 - pico-train - INFO - Step 28150 -- 🔄 Training Metrics
2025-08-30 00:38:52 - pico-train - INFO - ├── Loss: 6.2758
2025-08-30 00:38:52 - pico-train - INFO - ├── Learning Rate: 1.51e-05
2025-08-30 00:38:52 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-30 00:39:05 - pico-train - INFO - Step 28175 -- 🔄 Training Metrics
2025-08-30 00:39:05 - pico-train - INFO - ├── Loss: 6.2979
2025-08-30 00:39:05 - pico-train - INFO - ├── Learning Rate: 1.50e-05
2025-08-30 00:39:05 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-30 00:39:17 - pico-train - INFO - Step 28200 -- 🔄 Training Metrics
2025-08-30 00:39:17 - pico-train - INFO - ├── Loss: 6.1294
2025-08-30 00:39:17 - pico-train - INFO - ├── Learning Rate: 1.50e-05
2025-08-30 00:39:17 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-30 00:39:30 - pico-train - INFO - Step 28225 -- 🔄 Training Metrics
2025-08-30 00:39:30 - pico-train - INFO - ├── Loss: 6.1557
2025-08-30 00:39:30 - pico-train - INFO - ├── Learning Rate: 1.49e-05
2025-08-30 00:39:30 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-30 00:39:43 - pico-train - INFO - Step 28250 -- 🔄 Training Metrics
2025-08-30 00:39:43 - pico-train - INFO - ├── Loss: 6.2283
2025-08-30 00:39:43 - pico-train - INFO - ├── Learning Rate: 1.49e-05
2025-08-30 00:39:43 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-30 00:39:56 - pico-train - INFO - Step 28275 -- 🔄 Training Metrics
2025-08-30 00:39:56 - pico-train - INFO - ├── Loss: 6.2104
2025-08-30 00:39:56 - pico-train - INFO - ├── Learning Rate: 1.48e-05
2025-08-30 00:39:56 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-30 00:40:08 - pico-train - INFO - Step 28300 -- 🔄 Training Metrics
2025-08-30 00:40:08 - pico-train - INFO - ├── Loss: 6.2633
2025-08-30 00:40:08 - pico-train - INFO - ├── Learning Rate: 1.48e-05
2025-08-30 00:40:08 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-30 00:40:21 - pico-train - INFO - Step 28325 -- 🔄 Training Metrics
2025-08-30 00:40:21 - pico-train - INFO - ├── Loss: 6.1844
2025-08-30 00:40:21 - pico-train - INFO - ├── Learning Rate: 1.47e-05
2025-08-30 00:40:21 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-30 00:40:34 - pico-train - INFO - Step 28350 -- 🔄 Training Metrics
2025-08-30 00:40:34 - pico-train - INFO - ├── Loss: 6.1349
2025-08-30 00:40:34 - pico-train - INFO - ├── Learning Rate: 1.46e-05
2025-08-30 00:40:34 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-30 00:40:46 - pico-train - INFO - Step 28375 -- 🔄 Training Metrics
2025-08-30 00:40:46 - pico-train - INFO - ├── Loss: 6.2638
2025-08-30 00:40:46 - pico-train - INFO - ├── Learning Rate: 1.46e-05
2025-08-30 00:40:46 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-30 00:40:59 - pico-train - INFO - Step 28400 -- 🔄 Training Metrics
2025-08-30 00:40:59 - pico-train - INFO - ├── Loss: 6.1960
2025-08-30 00:40:59 - pico-train - INFO - ├── Learning Rate: 1.45e-05
2025-08-30 00:40:59 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-30 00:41:11 - pico-train - INFO - Step 28425 -- 🔄 Training Metrics
2025-08-30 00:41:11 - pico-train - INFO - ├── Loss: 6.2582
2025-08-30 00:41:11 - pico-train - INFO - ├── Learning Rate: 1.45e-05
2025-08-30 00:41:11 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-30 00:41:24 - pico-train - INFO - Step 28450 -- 🔄 Training Metrics
2025-08-30 00:41:24 - pico-train - INFO - ├── Loss: 6.2071
2025-08-30 00:41:24 - pico-train - INFO - ├── Learning Rate: 1.44e-05
2025-08-30 00:41:24 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-30 00:41:37 - pico-train - INFO - Step 28475 -- 🔄 Training Metrics
2025-08-30 00:41:37 - pico-train - INFO - ├── Loss: 6.2106
2025-08-30 00:41:37 - pico-train - INFO - ├── Learning Rate: 1.44e-05
2025-08-30 00:41:37 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-30 00:41:49 - pico-train - INFO - Step 28500 -- 💾 Saving Checkpoint
2025-08-30 00:43:48 - pico-train - INFO - Step 28500 -- 📊 Evaluation Results
2025-08-30 00:43:48 - pico-train - INFO - └── paloma: 1.3653691992013197e+26
2025-08-30 00:43:51 - pico-train - INFO - Step 28500 -- 🔄 Training Metrics
2025-08-30 00:43:51 - pico-train - INFO - ├── Loss: 6.2141
2025-08-30 00:43:51 - pico-train - INFO - ├── Learning Rate: 1.43e-05
2025-08-30 00:43:51 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-30 00:43:51 - pico-train - INFO - Step 28500 -- 📈 Saving Learning Dynamics
2025-08-30 00:44:06 - pico-train - INFO - Step 28525 -- 🔄 Training Metrics
2025-08-30 00:44:06 - pico-train - INFO - ├── Loss: 6.1702
2025-08-30 00:44:06 - pico-train - INFO - ├── Learning Rate: 1.43e-05
2025-08-30 00:44:06 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-30 00:44:19 - pico-train - INFO - Step 28550 -- 🔄 Training Metrics
2025-08-30 00:44:19 - pico-train - INFO - 
├── Loss: 6.1650 2025-08-30 00:44:19 - pico-train - INFO - ├── Learning Rate: 1.42e-05 2025-08-30 00:44:19 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 00:44:31 - pico-train - INFO - Step 28575 -- 🔄 Training Metrics 2025-08-30 00:44:31 - pico-train - INFO - ├── Loss: 6.1357 2025-08-30 00:44:31 - pico-train - INFO - ├── Learning Rate: 1.41e-05 2025-08-30 00:44:31 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 00:44:44 - pico-train - INFO - Step 28600 -- 🔄 Training Metrics 2025-08-30 00:44:44 - pico-train - INFO - ├── Loss: 6.2757 2025-08-30 00:44:44 - pico-train - INFO - ├── Learning Rate: 1.41e-05 2025-08-30 00:44:44 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 00:44:57 - pico-train - INFO - Step 28625 -- 🔄 Training Metrics 2025-08-30 00:44:57 - pico-train - INFO - ├── Loss: 6.1983 2025-08-30 00:44:57 - pico-train - INFO - ├── Learning Rate: 1.40e-05 2025-08-30 00:44:57 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 00:45:09 - pico-train - INFO - Step 28650 -- 🔄 Training Metrics 2025-08-30 00:45:09 - pico-train - INFO - ├── Loss: 6.1417 2025-08-30 00:45:09 - pico-train - INFO - ├── Learning Rate: 1.40e-05 2025-08-30 00:45:09 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 00:45:22 - pico-train - INFO - Step 28675 -- 🔄 Training Metrics 2025-08-30 00:45:22 - pico-train - INFO - ├── Loss: 6.1524 2025-08-30 00:45:22 - pico-train - INFO - ├── Learning Rate: 1.39e-05 2025-08-30 00:45:22 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 00:45:34 - pico-train - INFO - Step 28700 -- 🔄 Training Metrics 2025-08-30 00:45:34 - pico-train - INFO - ├── Loss: 6.2928 2025-08-30 00:45:34 - pico-train - INFO - ├── Learning Rate: 1.39e-05 2025-08-30 00:45:34 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 00:45:47 - pico-train - INFO - Step 28725 -- 🔄 Training Metrics 2025-08-30 00:45:47 - pico-train - INFO - ├── Loss: 6.1187 2025-08-30 00:45:47 - pico-train - INFO - ├── Learning Rate: 1.38e-05 2025-08-30 00:45:47 - pico-train - INFO - 
└── Inf/NaN count: 0 2025-08-30 00:46:00 - pico-train - INFO - Step 28750 -- 🔄 Training Metrics 2025-08-30 00:46:00 - pico-train - INFO - ├── Loss: 6.1926 2025-08-30 00:46:00 - pico-train - INFO - ├── Learning Rate: 1.38e-05 2025-08-30 00:46:00 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 00:46:12 - pico-train - INFO - Step 28775 -- 🔄 Training Metrics 2025-08-30 00:46:12 - pico-train - INFO - ├── Loss: 6.1810 2025-08-30 00:46:12 - pico-train - INFO - ├── Learning Rate: 1.37e-05 2025-08-30 00:46:12 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 00:46:25 - pico-train - INFO - Step 28800 -- 🔄 Training Metrics 2025-08-30 00:46:25 - pico-train - INFO - ├── Loss: 6.1615 2025-08-30 00:46:25 - pico-train - INFO - ├── Learning Rate: 1.37e-05 2025-08-30 00:46:25 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 00:46:37 - pico-train - INFO - Step 28825 -- 🔄 Training Metrics 2025-08-30 00:46:37 - pico-train - INFO - ├── Loss: 6.1871 2025-08-30 00:46:37 - pico-train - INFO - ├── Learning Rate: 1.36e-05 2025-08-30 00:46:37 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 00:46:50 - pico-train - INFO - Step 28850 -- 🔄 Training Metrics 2025-08-30 00:46:50 - pico-train - INFO - ├── Loss: 6.1287 2025-08-30 00:46:50 - pico-train - INFO - ├── Learning Rate: 1.35e-05 2025-08-30 00:46:50 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 00:47:02 - pico-train - INFO - Step 28875 -- 🔄 Training Metrics 2025-08-30 00:47:02 - pico-train - INFO - ├── Loss: 6.1008 2025-08-30 00:47:02 - pico-train - INFO - ├── Learning Rate: 1.35e-05 2025-08-30 00:47:02 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 00:47:15 - pico-train - INFO - Step 28900 -- 🔄 Training Metrics 2025-08-30 00:47:15 - pico-train - INFO - ├── Loss: 6.2167 2025-08-30 00:47:15 - pico-train - INFO - ├── Learning Rate: 1.34e-05 2025-08-30 00:47:15 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 00:47:28 - pico-train - INFO - Step 28925 -- 🔄 Training Metrics 2025-08-30 00:47:28 - pico-train - 
INFO - ├── Loss: 6.1657 2025-08-30 00:47:28 - pico-train - INFO - ├── Learning Rate: 1.34e-05 2025-08-30 00:47:28 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 00:47:40 - pico-train - INFO - Step 28950 -- 🔄 Training Metrics 2025-08-30 00:47:40 - pico-train - INFO - ├── Loss: 6.2003 2025-08-30 00:47:40 - pico-train - INFO - ├── Learning Rate: 1.33e-05 2025-08-30 00:47:40 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 00:47:53 - pico-train - INFO - Step 28975 -- 🔄 Training Metrics 2025-08-30 00:47:53 - pico-train - INFO - ├── Loss: 6.2189 2025-08-30 00:47:53 - pico-train - INFO - ├── Learning Rate: 1.33e-05 2025-08-30 00:47:53 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 00:48:05 - pico-train - INFO - Step 29000 -- 💾 Saving Checkpoint 2025-08-30 00:50:04 - pico-train - INFO - Step 29000 -- 📊 Evaluation Results 2025-08-30 00:50:04 - pico-train - INFO - └── paloma: 1.4417132887690374e+26 2025-08-30 00:50:06 - pico-train - INFO - Step 29000 -- 🔄 Training Metrics 2025-08-30 00:50:06 - pico-train - INFO - ├── Loss: 6.1592 2025-08-30 00:50:06 - pico-train - INFO - ├── Learning Rate: 1.32e-05 2025-08-30 00:50:06 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 00:50:06 - pico-train - INFO - Step 29000 -- 📈 Saving Learning Dynamics 2025-08-30 00:50:22 - pico-train - INFO - Step 29025 -- 🔄 Training Metrics 2025-08-30 00:50:22 - pico-train - INFO - ├── Loss: 6.2133 2025-08-30 00:50:22 - pico-train - INFO - ├── Learning Rate: 1.32e-05 2025-08-30 00:50:22 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 00:50:35 - pico-train - INFO - Step 29050 -- 🔄 Training Metrics 2025-08-30 00:50:35 - pico-train - INFO - ├── Loss: 6.1536 2025-08-30 00:50:35 - pico-train - INFO - ├── Learning Rate: 1.31e-05 2025-08-30 00:50:35 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 00:50:47 - pico-train - INFO - Step 29075 -- 🔄 Training Metrics 2025-08-30 00:50:47 - pico-train - INFO - ├── Loss: 6.1872 2025-08-30 00:50:47 - pico-train - INFO - ├── Learning Rate: 
1.31e-05 2025-08-30 00:50:47 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 00:51:00 - pico-train - INFO - Step 29100 -- 🔄 Training Metrics 2025-08-30 00:51:00 - pico-train - INFO - ├── Loss: 6.1469 2025-08-30 00:51:00 - pico-train - INFO - ├── Learning Rate: 1.30e-05 2025-08-30 00:51:00 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 00:51:13 - pico-train - INFO - Step 29125 -- 🔄 Training Metrics 2025-08-30 00:51:13 - pico-train - INFO - ├── Loss: 6.2113 2025-08-30 00:51:13 - pico-train - INFO - ├── Learning Rate: 1.29e-05 2025-08-30 00:51:13 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 00:51:26 - pico-train - INFO - Step 29150 -- 🔄 Training Metrics 2025-08-30 00:51:26 - pico-train - INFO - ├── Loss: 6.1172 2025-08-30 00:51:26 - pico-train - INFO - ├── Learning Rate: 1.29e-05 2025-08-30 00:51:26 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 00:51:38 - pico-train - INFO - Step 29175 -- 🔄 Training Metrics 2025-08-30 00:51:38 - pico-train - INFO - ├── Loss: 6.1350 2025-08-30 00:51:38 - pico-train - INFO - ├── Learning Rate: 1.28e-05 2025-08-30 00:51:38 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 00:51:51 - pico-train - INFO - Step 29200 -- 🔄 Training Metrics 2025-08-30 00:51:51 - pico-train - INFO - ├── Loss: 6.2083 2025-08-30 00:51:51 - pico-train - INFO - ├── Learning Rate: 1.28e-05 2025-08-30 00:51:51 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 00:52:03 - pico-train - INFO - Step 29225 -- 🔄 Training Metrics 2025-08-30 00:52:03 - pico-train - INFO - ├── Loss: 6.3192 2025-08-30 00:52:03 - pico-train - INFO - ├── Learning Rate: 1.27e-05 2025-08-30 00:52:03 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 00:52:16 - pico-train - INFO - Step 29250 -- 🔄 Training Metrics 2025-08-30 00:52:16 - pico-train - INFO - ├── Loss: 6.1807 2025-08-30 00:52:16 - pico-train - INFO - ├── Learning Rate: 1.27e-05 2025-08-30 00:52:16 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 00:52:29 - pico-train - INFO - Step 29275 -- 🔄 
Training Metrics 2025-08-30 00:52:29 - pico-train - INFO - ├── Loss: 6.1737 2025-08-30 00:52:29 - pico-train - INFO - ├── Learning Rate: 1.26e-05 2025-08-30 00:52:29 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 00:52:41 - pico-train - INFO - Step 29300 -- 🔄 Training Metrics 2025-08-30 00:52:41 - pico-train - INFO - ├── Loss: 6.0887 2025-08-30 00:52:41 - pico-train - INFO - ├── Learning Rate: 1.26e-05 2025-08-30 00:52:41 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 00:52:54 - pico-train - INFO - Step 29325 -- 🔄 Training Metrics 2025-08-30 00:52:54 - pico-train - INFO - ├── Loss: 6.2875 2025-08-30 00:52:54 - pico-train - INFO - ├── Learning Rate: 1.25e-05 2025-08-30 00:52:54 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 00:53:06 - pico-train - INFO - Step 29350 -- 🔄 Training Metrics 2025-08-30 00:53:06 - pico-train - INFO - ├── Loss: 6.2426 2025-08-30 00:53:06 - pico-train - INFO - ├── Learning Rate: 1.25e-05 2025-08-30 00:53:06 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 00:53:19 - pico-train - INFO - Step 29375 -- 🔄 Training Metrics 2025-08-30 00:53:19 - pico-train - INFO - ├── Loss: 6.1058 2025-08-30 00:53:19 - pico-train - INFO - ├── Learning Rate: 1.24e-05 2025-08-30 00:53:19 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 00:53:32 - pico-train - INFO - Step 29400 -- 🔄 Training Metrics 2025-08-30 00:53:32 - pico-train - INFO - ├── Loss: 6.1215 2025-08-30 00:53:32 - pico-train - INFO - ├── Learning Rate: 1.24e-05 2025-08-30 00:53:32 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 00:53:44 - pico-train - INFO - Step 29425 -- 🔄 Training Metrics 2025-08-30 00:53:44 - pico-train - INFO - ├── Loss: 6.2543 2025-08-30 00:53:44 - pico-train - INFO - ├── Learning Rate: 1.23e-05 2025-08-30 00:53:44 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 00:53:57 - pico-train - INFO - Step 29450 -- 🔄 Training Metrics 2025-08-30 00:53:57 - pico-train - INFO - ├── Loss: 6.1715 2025-08-30 00:53:57 - pico-train - INFO - ├── Learning 
Rate: 1.23e-05 2025-08-30 00:53:57 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 00:54:09 - pico-train - INFO - Step 29475 -- 🔄 Training Metrics 2025-08-30 00:54:09 - pico-train - INFO - ├── Loss: 6.1795 2025-08-30 00:54:09 - pico-train - INFO - ├── Learning Rate: 1.22e-05 2025-08-30 00:54:09 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 00:54:21 - pico-train - INFO - Step 29500 -- 💾 Saving Checkpoint 2025-08-30 00:56:18 - pico-train - INFO - Step 29500 -- 📊 Evaluation Results 2025-08-30 00:56:18 - pico-train - INFO - └── paloma: 1.7095266725777237e+26 2025-08-30 00:56:21 - pico-train - INFO - Step 29500 -- 🔄 Training Metrics 2025-08-30 00:56:21 - pico-train - INFO - ├── Loss: 6.1663 2025-08-30 00:56:21 - pico-train - INFO - ├── Learning Rate: 1.21e-05 2025-08-30 00:56:21 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 00:56:21 - pico-train - INFO - Step 29500 -- 📈 Saving Learning Dynamics 2025-08-30 00:56:36 - pico-train - INFO - Step 29525 -- 🔄 Training Metrics 2025-08-30 00:56:36 - pico-train - INFO - ├── Loss: 6.1521 2025-08-30 00:56:36 - pico-train - INFO - ├── Learning Rate: 1.21e-05 2025-08-30 00:56:36 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 00:56:48 - pico-train - INFO - Step 29550 -- 🔄 Training Metrics 2025-08-30 00:56:48 - pico-train - INFO - ├── Loss: 6.0880 2025-08-30 00:56:48 - pico-train - INFO - ├── Learning Rate: 1.20e-05 2025-08-30 00:56:48 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 00:57:01 - pico-train - INFO - Step 29575 -- 🔄 Training Metrics 2025-08-30 00:57:01 - pico-train - INFO - ├── Loss: 6.1806 2025-08-30 00:57:01 - pico-train - INFO - ├── Learning Rate: 1.20e-05 2025-08-30 00:57:01 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 00:57:13 - pico-train - INFO - Step 29600 -- 🔄 Training Metrics 2025-08-30 00:57:13 - pico-train - INFO - ├── Loss: 6.3067 2025-08-30 00:57:13 - pico-train - INFO - ├── Learning Rate: 1.19e-05 2025-08-30 00:57:13 - pico-train - INFO - └── Inf/NaN count: 0 
2025-08-30 00:57:27 - pico-train - INFO - Step 29625 -- 🔄 Training Metrics 2025-08-30 00:57:27 - pico-train - INFO - ├── Loss: 6.2586 2025-08-30 00:57:27 - pico-train - INFO - ├── Learning Rate: 1.19e-05 2025-08-30 00:57:27 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 00:57:40 - pico-train - INFO - Step 29650 -- 🔄 Training Metrics 2025-08-30 00:57:40 - pico-train - INFO - ├── Loss: 6.1478 2025-08-30 00:57:40 - pico-train - INFO - ├── Learning Rate: 1.18e-05 2025-08-30 00:57:40 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 00:57:52 - pico-train - INFO - Step 29675 -- 🔄 Training Metrics 2025-08-30 00:57:52 - pico-train - INFO - ├── Loss: 6.1101 2025-08-30 00:57:52 - pico-train - INFO - ├── Learning Rate: 1.18e-05 2025-08-30 00:57:52 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 00:58:05 - pico-train - INFO - Step 29700 -- 🔄 Training Metrics 2025-08-30 00:58:05 - pico-train - INFO - ├── Loss: 6.1873 2025-08-30 00:58:05 - pico-train - INFO - ├── Learning Rate: 1.17e-05 2025-08-30 00:58:05 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 00:58:17 - pico-train - INFO - Step 29725 -- 🔄 Training Metrics 2025-08-30 00:58:17 - pico-train - INFO - ├── Loss: 6.0894 2025-08-30 00:58:17 - pico-train - INFO - ├── Learning Rate: 1.17e-05 2025-08-30 00:58:17 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 00:58:30 - pico-train - INFO - Step 29750 -- 🔄 Training Metrics 2025-08-30 00:58:30 - pico-train - INFO - ├── Loss: 6.1793 2025-08-30 00:58:30 - pico-train - INFO - ├── Learning Rate: 1.16e-05 2025-08-30 00:58:30 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 00:58:42 - pico-train - INFO - Step 29775 -- 🔄 Training Metrics 2025-08-30 00:58:42 - pico-train - INFO - ├── Loss: 6.1858 2025-08-30 00:58:42 - pico-train - INFO - ├── Learning Rate: 1.16e-05 2025-08-30 00:58:42 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 00:58:55 - pico-train - INFO - Step 29800 -- 🔄 Training Metrics 2025-08-30 00:58:55 - pico-train - INFO - ├── Loss: 
6.1729 2025-08-30 00:58:55 - pico-train - INFO - ├── Learning Rate: 1.15e-05 2025-08-30 00:58:55 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 00:59:07 - pico-train - INFO - Step 29825 -- 🔄 Training Metrics 2025-08-30 00:59:07 - pico-train - INFO - ├── Loss: 6.1856 2025-08-30 00:59:07 - pico-train - INFO - ├── Learning Rate: 1.15e-05 2025-08-30 00:59:07 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 00:59:20 - pico-train - INFO - Step 29850 -- 🔄 Training Metrics 2025-08-30 00:59:20 - pico-train - INFO - ├── Loss: 6.1591 2025-08-30 00:59:20 - pico-train - INFO - ├── Learning Rate: 1.14e-05 2025-08-30 00:59:20 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 00:59:33 - pico-train - INFO - Step 29875 -- 🔄 Training Metrics 2025-08-30 00:59:33 - pico-train - INFO - ├── Loss: 6.2964 2025-08-30 00:59:33 - pico-train - INFO - ├── Learning Rate: 1.14e-05 2025-08-30 00:59:33 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 00:59:45 - pico-train - INFO - Step 29900 -- 🔄 Training Metrics 2025-08-30 00:59:45 - pico-train - INFO - ├── Loss: 6.2506 2025-08-30 00:59:45 - pico-train - INFO - ├── Learning Rate: 1.13e-05 2025-08-30 00:59:45 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 00:59:58 - pico-train - INFO - Step 29925 -- 🔄 Training Metrics 2025-08-30 00:59:58 - pico-train - INFO - ├── Loss: 6.1630 2025-08-30 00:59:58 - pico-train - INFO - ├── Learning Rate: 1.13e-05 2025-08-30 00:59:58 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 01:00:11 - pico-train - INFO - Step 29950 -- 🔄 Training Metrics 2025-08-30 01:00:11 - pico-train - INFO - ├── Loss: 6.2033 2025-08-30 01:00:11 - pico-train - INFO - ├── Learning Rate: 1.12e-05 2025-08-30 01:00:11 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 01:00:23 - pico-train - INFO - Step 29975 -- 🔄 Training Metrics 2025-08-30 01:00:23 - pico-train - INFO - ├── Loss: 6.0846 2025-08-30 01:00:23 - pico-train - INFO - ├── Learning Rate: 1.12e-05 2025-08-30 01:00:23 - pico-train - INFO - └── Inf/NaN 
count: 0 2025-08-30 01:00:35 - pico-train - INFO - Step 30000 -- 💾 Saving Checkpoint 2025-08-30 01:02:29 - pico-train - INFO - Step 30000 -- 📊 Evaluation Results 2025-08-30 01:02:29 - pico-train - INFO - └── paloma: 2.0463060977945524e+26 2025-08-30 01:02:31 - pico-train - INFO - Step 30000 -- 🔄 Training Metrics 2025-08-30 01:02:31 - pico-train - INFO - ├── Loss: 6.1682 2025-08-30 01:02:31 - pico-train - INFO - ├── Learning Rate: 1.11e-05 2025-08-30 01:02:31 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 01:02:31 - pico-train - INFO - Step 30000 -- 📈 Saving Learning Dynamics 2025-08-30 01:02:46 - pico-train - INFO - Step 30025 -- 🔄 Training Metrics 2025-08-30 01:02:46 - pico-train - INFO - ├── Loss: 6.2143 2025-08-30 01:02:46 - pico-train - INFO - ├── Learning Rate: 1.11e-05 2025-08-30 01:02:46 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 01:02:58 - pico-train - INFO - Step 30050 -- 🔄 Training Metrics 2025-08-30 01:02:58 - pico-train - INFO - ├── Loss: 6.1476 2025-08-30 01:02:58 - pico-train - INFO - ├── Learning Rate: 1.10e-05 2025-08-30 01:02:58 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 01:03:11 - pico-train - INFO - Step 30075 -- 🔄 Training Metrics 2025-08-30 01:03:11 - pico-train - INFO - ├── Loss: 6.1530 2025-08-30 01:03:11 - pico-train - INFO - ├── Learning Rate: 1.10e-05 2025-08-30 01:03:11 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 01:03:23 - pico-train - INFO - Step 30100 -- 🔄 Training Metrics 2025-08-30 01:03:23 - pico-train - INFO - ├── Loss: 6.1518 2025-08-30 01:03:23 - pico-train - INFO - ├── Learning Rate: 1.09e-05 2025-08-30 01:03:23 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 01:03:37 - pico-train - INFO - Step 30125 -- 🔄 Training Metrics 2025-08-30 01:03:37 - pico-train - INFO - ├── Loss: 6.1752 2025-08-30 01:03:37 - pico-train - INFO - ├── Learning Rate: 1.09e-05 2025-08-30 01:03:37 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 01:03:49 - pico-train - INFO - Step 30150 -- 🔄 Training Metrics 
2025-08-30 01:03:49 - pico-train - INFO - ├── Loss: 6.2413 2025-08-30 01:03:49 - pico-train - INFO - ├── Learning Rate: 1.08e-05 2025-08-30 01:03:49 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 01:04:02 - pico-train - INFO - Step 30175 -- 🔄 Training Metrics 2025-08-30 01:04:02 - pico-train - INFO - ├── Loss: 6.2624 2025-08-30 01:04:02 - pico-train - INFO - ├── Learning Rate: 1.08e-05 2025-08-30 01:04:02 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 01:04:14 - pico-train - INFO - Step 30200 -- 🔄 Training Metrics 2025-08-30 01:04:14 - pico-train - INFO - ├── Loss: 6.2339 2025-08-30 01:04:14 - pico-train - INFO - ├── Learning Rate: 1.07e-05 2025-08-30 01:04:14 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 01:04:27 - pico-train - INFO - Step 30225 -- 🔄 Training Metrics 2025-08-30 01:04:27 - pico-train - INFO - ├── Loss: 6.1617 2025-08-30 01:04:27 - pico-train - INFO - ├── Learning Rate: 1.07e-05 2025-08-30 01:04:27 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 01:04:40 - pico-train - INFO - Step 30250 -- 🔄 Training Metrics 2025-08-30 01:04:40 - pico-train - INFO - ├── Loss: 6.1225 2025-08-30 01:04:40 - pico-train - INFO - ├── Learning Rate: 1.06e-05 2025-08-30 01:04:40 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 01:04:52 - pico-train - INFO - Step 30275 -- 🔄 Training Metrics 2025-08-30 01:04:52 - pico-train - INFO - ├── Loss: 6.2344 2025-08-30 01:04:52 - pico-train - INFO - ├── Learning Rate: 1.06e-05 2025-08-30 01:04:52 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 01:05:05 - pico-train - INFO - Step 30300 -- 🔄 Training Metrics 2025-08-30 01:05:05 - pico-train - INFO - ├── Loss: 6.1970 2025-08-30 01:05:05 - pico-train - INFO - ├── Learning Rate: 1.05e-05 2025-08-30 01:05:05 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 01:05:18 - pico-train - INFO - Step 30325 -- 🔄 Training Metrics 2025-08-30 01:05:18 - pico-train - INFO - ├── Loss: 6.1580 2025-08-30 01:05:18 - pico-train - INFO - ├── Learning Rate: 1.05e-05 
2025-08-30 01:05:18 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 01:05:30 - pico-train - INFO - Step 30350 -- 🔄 Training Metrics 2025-08-30 01:05:30 - pico-train - INFO - ├── Loss: 6.2210 2025-08-30 01:05:30 - pico-train - INFO - ├── Learning Rate: 1.04e-05 2025-08-30 01:05:30 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 01:05:43 - pico-train - INFO - Step 30375 -- 🔄 Training Metrics 2025-08-30 01:05:43 - pico-train - INFO - ├── Loss: 6.1991 2025-08-30 01:05:43 - pico-train - INFO - ├── Learning Rate: 1.04e-05 2025-08-30 01:05:43 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 01:05:56 - pico-train - INFO - Step 30400 -- 🔄 Training Metrics 2025-08-30 01:05:56 - pico-train - INFO - ├── Loss: 6.2500 2025-08-30 01:05:56 - pico-train - INFO - ├── Learning Rate: 1.03e-05 2025-08-30 01:05:56 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 01:06:08 - pico-train - INFO - Step 30425 -- 🔄 Training Metrics 2025-08-30 01:06:08 - pico-train - INFO - ├── Loss: 6.2252 2025-08-30 01:06:08 - pico-train - INFO - ├── Learning Rate: 1.03e-05 2025-08-30 01:06:08 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 01:06:21 - pico-train - INFO - Step 30450 -- 🔄 Training Metrics 2025-08-30 01:06:21 - pico-train - INFO - ├── Loss: 6.2010 2025-08-30 01:06:21 - pico-train - INFO - ├── Learning Rate: 1.02e-05 2025-08-30 01:06:21 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 01:06:33 - pico-train - INFO - Step 30475 -- 🔄 Training Metrics 2025-08-30 01:06:33 - pico-train - INFO - ├── Loss: 6.1309 2025-08-30 01:06:33 - pico-train - INFO - ├── Learning Rate: 1.02e-05 2025-08-30 01:06:33 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 01:06:46 - pico-train - INFO - Step 30500 -- 💾 Saving Checkpoint 2025-08-30 01:08:46 - pico-train - INFO - Step 30500 -- 📊 Evaluation Results 2025-08-30 01:08:46 - pico-train - INFO - └── paloma: 2.2542988490213366e+26 2025-08-30 01:08:49 - pico-train - INFO - Step 30500 -- 🔄 Training Metrics 2025-08-30 01:08:49 - 
pico-train - INFO - ├── Loss: 6.1853 2025-08-30 01:08:49 - pico-train - INFO - ├── Learning Rate: 1.01e-05 2025-08-30 01:08:49 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 01:08:49 - pico-train - INFO - Step 30500 -- 📈 Saving Learning Dynamics 2025-08-30 01:09:04 - pico-train - INFO - Step 30525 -- 🔄 Training Metrics 2025-08-30 01:09:04 - pico-train - INFO - ├── Loss: 6.1358 2025-08-30 01:09:04 - pico-train - INFO - ├── Learning Rate: 1.01e-05 2025-08-30 01:09:04 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 01:09:17 - pico-train - INFO - Step 30550 -- 🔄 Training Metrics 2025-08-30 01:09:17 - pico-train - INFO - ├── Loss: 6.1170 2025-08-30 01:09:17 - pico-train - INFO - ├── Learning Rate: 1.00e-05 2025-08-30 01:09:17 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 01:09:29 - pico-train - INFO - Step 30575 -- 🔄 Training Metrics 2025-08-30 01:09:29 - pico-train - INFO - ├── Loss: 6.1497 2025-08-30 01:09:29 - pico-train - INFO - ├── Learning Rate: 9.96e-06 2025-08-30 01:09:29 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 01:09:42 - pico-train - INFO - Step 30600 -- 🔄 Training Metrics 2025-08-30 01:09:42 - pico-train - INFO - ├── Loss: 6.2103 2025-08-30 01:09:42 - pico-train - INFO - ├── Learning Rate: 9.91e-06 2025-08-30 01:09:42 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 01:09:54 - pico-train - INFO - Step 30625 -- 🔄 Training Metrics 2025-08-30 01:09:54 - pico-train - INFO - ├── Loss: 6.1137 2025-08-30 01:09:54 - pico-train - INFO - ├── Learning Rate: 9.86e-06 2025-08-30 01:09:54 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 01:10:07 - pico-train - INFO - Step 30650 -- 🔄 Training Metrics 2025-08-30 01:10:07 - pico-train - INFO - ├── Loss: 6.1631 2025-08-30 01:10:07 - pico-train - INFO - ├── Learning Rate: 9.81e-06 2025-08-30 01:10:07 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 01:10:19 - pico-train - INFO - Step 30675 -- 🔄 Training Metrics 2025-08-30 01:10:19 - pico-train - INFO - ├── Loss: 6.1651 2025-08-30 
01:10:19 - pico-train - INFO - ├── Learning Rate: 9.76e-06 2025-08-30 01:10:19 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 01:10:32 - pico-train - INFO - Step 30700 -- 🔄 Training Metrics 2025-08-30 01:10:32 - pico-train - INFO - ├── Loss: 6.1969 2025-08-30 01:10:32 - pico-train - INFO - ├── Learning Rate: 9.72e-06 2025-08-30 01:10:32 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 01:10:45 - pico-train - INFO - Step 30725 -- 🔄 Training Metrics 2025-08-30 01:10:45 - pico-train - INFO - ├── Loss: 6.1007 2025-08-30 01:10:45 - pico-train - INFO - ├── Learning Rate: 9.67e-06 2025-08-30 01:10:45 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 01:10:58 - pico-train - INFO - Step 30750 -- 🔄 Training Metrics 2025-08-30 01:10:58 - pico-train - INFO - ├── Loss: 6.1865 2025-08-30 01:10:58 - pico-train - INFO - ├── Learning Rate: 9.62e-06 2025-08-30 01:10:58 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 01:11:10 - pico-train - INFO - Step 30775 -- 🔄 Training Metrics 2025-08-30 01:11:10 - pico-train - INFO - ├── Loss: 6.1659 2025-08-30 01:11:10 - pico-train - INFO - ├── Learning Rate: 9.57e-06 2025-08-30 01:11:10 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 01:11:23 - pico-train - INFO - Step 30800 -- 🔄 Training Metrics 2025-08-30 01:11:23 - pico-train - INFO - ├── Loss: 6.2281 2025-08-30 01:11:23 - pico-train - INFO - ├── Learning Rate: 9.52e-06 2025-08-30 01:11:23 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 01:11:36 - pico-train - INFO - Step 30825 -- 🔄 Training Metrics 2025-08-30 01:11:36 - pico-train - INFO - ├── Loss: 6.1316 2025-08-30 01:11:36 - pico-train - INFO - ├── Learning Rate: 9.47e-06 2025-08-30 01:11:36 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 01:11:48 - pico-train - INFO - Step 30850 -- 🔄 Training Metrics 2025-08-30 01:11:48 - pico-train - INFO - ├── Loss: 6.2135 2025-08-30 01:11:48 - pico-train - INFO - ├── Learning Rate: 9.43e-06 2025-08-30 01:11:48 - pico-train - INFO - └── Inf/NaN count: 0 
2025-08-30 01:12:01 - pico-train - INFO - Step 30875 -- 🔄 Training Metrics 2025-08-30 01:12:01 - pico-train - INFO - ├── Loss: 6.2395 2025-08-30 01:12:01 - pico-train - INFO - ├── Learning Rate: 9.38e-06 2025-08-30 01:12:01 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 01:12:13 - pico-train - INFO - Step 30900 -- 🔄 Training Metrics 2025-08-30 01:12:13 - pico-train - INFO - ├── Loss: 6.2277 2025-08-30 01:12:13 - pico-train - INFO - ├── Learning Rate: 9.33e-06 2025-08-30 01:12:13 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 01:12:26 - pico-train - INFO - Step 30925 -- 🔄 Training Metrics 2025-08-30 01:12:26 - pico-train - INFO - ├── Loss: 6.1863 2025-08-30 01:12:26 - pico-train - INFO - ├── Learning Rate: 9.28e-06 2025-08-30 01:12:26 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 01:12:39 - pico-train - INFO - Step 30950 -- 🔄 Training Metrics 2025-08-30 01:12:39 - pico-train - INFO - ├── Loss: 6.2133 2025-08-30 01:12:39 - pico-train - INFO - ├── Learning Rate: 9.24e-06 2025-08-30 01:12:39 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 01:12:51 - pico-train - INFO - Step 30975 -- 🔄 Training Metrics 2025-08-30 01:12:51 - pico-train - INFO - ├── Loss: 6.2132 2025-08-30 01:12:51 - pico-train - INFO - ├── Learning Rate: 9.19e-06 2025-08-30 01:12:51 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 01:13:03 - pico-train - INFO - Step 31000 -- 💾 Saving Checkpoint 2025-08-30 01:14:57 - pico-train - INFO - Step 31000 -- 📊 Evaluation Results 2025-08-30 01:14:57 - pico-train - INFO - └── paloma: 2.4568970443260916e+26 2025-08-30 01:14:59 - pico-train - INFO - Step 31000 -- 🔄 Training Metrics 2025-08-30 01:14:59 - pico-train - INFO - ├── Loss: 6.1313 2025-08-30 01:14:59 - pico-train - INFO - ├── Learning Rate: 9.14e-06 2025-08-30 01:14:59 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 01:14:59 - pico-train - INFO - Step 31000 -- 📈 Saving Learning Dynamics 2025-08-30 01:15:15 - pico-train - INFO - Step 31025 -- 🔄 Training Metrics 
2025-08-30 01:15:15 - pico-train - INFO - ├── Loss: 6.2095 2025-08-30 01:15:15 - pico-train - INFO - ├── Learning Rate: 9.09e-06 2025-08-30 01:15:15 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 01:15:27 - pico-train - INFO - Step 31050 -- 🔄 Training Metrics 2025-08-30 01:15:27 - pico-train - INFO - ├── Loss: 6.1753 2025-08-30 01:15:27 - pico-train - INFO - ├── Learning Rate: 9.05e-06 2025-08-30 01:15:27 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 01:15:40 - pico-train - INFO - Step 31075 -- 🔄 Training Metrics 2025-08-30 01:15:40 - pico-train - INFO - ├── Loss: 6.1722 2025-08-30 01:15:40 - pico-train - INFO - ├── Learning Rate: 9.00e-06 2025-08-30 01:15:40 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 01:15:53 - pico-train - INFO - Step 31100 -- 🔄 Training Metrics 2025-08-30 01:15:53 - pico-train - INFO - ├── Loss: 6.1917 2025-08-30 01:15:53 - pico-train - INFO - ├── Learning Rate: 8.95e-06 2025-08-30 01:15:53 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 01:16:05 - pico-train - INFO - Step 31125 -- 🔄 Training Metrics 2025-08-30 01:16:05 - pico-train - INFO - ├── Loss: 6.1442 2025-08-30 01:16:05 - pico-train - INFO - ├── Learning Rate: 8.90e-06 2025-08-30 01:16:05 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 01:16:18 - pico-train - INFO - Step 31150 -- 🔄 Training Metrics 2025-08-30 01:16:18 - pico-train - INFO - ├── Loss: 6.2128 2025-08-30 01:16:18 - pico-train - INFO - ├── Learning Rate: 8.86e-06 2025-08-30 01:16:18 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 01:16:30 - pico-train - INFO - Step 31175 -- 🔄 Training Metrics 2025-08-30 01:16:30 - pico-train - INFO - ├── Loss: 6.1192 2025-08-30 01:16:30 - pico-train - INFO - ├── Learning Rate: 8.81e-06 2025-08-30 01:16:30 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 01:16:43 - pico-train - INFO - Step 31200 -- 🔄 Training Metrics 2025-08-30 01:16:43 - pico-train - INFO - ├── Loss: 6.1648 2025-08-30 01:16:43 - pico-train - INFO - ├── Learning Rate: 8.76e-06 
2025-08-30 01:16:43 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-30 01:16:55 - pico-train - INFO - Step 31225 -- 🔄 Training Metrics
2025-08-30 01:16:55 - pico-train - INFO - ├── Loss: 6.2030
2025-08-30 01:16:55 - pico-train - INFO - ├── Learning Rate: 8.72e-06
2025-08-30 01:16:55 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-30 01:17:08 - pico-train - INFO - Step 31250 -- 🔄 Training Metrics
2025-08-30 01:17:08 - pico-train - INFO - ├── Loss: 6.1564
2025-08-30 01:17:08 - pico-train - INFO - ├── Learning Rate: 8.67e-06
2025-08-30 01:17:08 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-30 01:17:21 - pico-train - INFO - Step 31275 -- 🔄 Training Metrics
2025-08-30 01:17:21 - pico-train - INFO - ├── Loss: 6.2193
2025-08-30 01:17:21 - pico-train - INFO - ├── Learning Rate: 8.62e-06
2025-08-30 01:17:21 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-30 01:17:33 - pico-train - INFO - Step 31300 -- 🔄 Training Metrics
2025-08-30 01:17:33 - pico-train - INFO - ├── Loss: 6.1630
2025-08-30 01:17:33 - pico-train - INFO - ├── Learning Rate: 8.58e-06
2025-08-30 01:17:33 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-30 01:17:46 - pico-train - INFO - Step 31325 -- 🔄 Training Metrics
2025-08-30 01:17:46 - pico-train - INFO - ├── Loss: 6.1765
2025-08-30 01:17:46 - pico-train - INFO - ├── Learning Rate: 8.53e-06
2025-08-30 01:17:46 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-30 01:17:58 - pico-train - INFO - Step 31350 -- 🔄 Training Metrics
2025-08-30 01:17:58 - pico-train - INFO - ├── Loss: 6.2315
2025-08-30 01:17:58 - pico-train - INFO - ├── Learning Rate: 8.49e-06
2025-08-30 01:17:58 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-30 01:18:11 - pico-train - INFO - Step 31375 -- 🔄 Training Metrics
2025-08-30 01:18:11 - pico-train - INFO - ├── Loss: 6.1719
2025-08-30 01:18:11 - pico-train - INFO - ├── Learning Rate: 8.44e-06
2025-08-30 01:18:11 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-30 01:18:24 - pico-train - INFO - Step 31400 -- 🔄 Training Metrics
2025-08-30 01:18:24 - pico-train - INFO - ├── Loss: 6.2234
2025-08-30 01:18:24 - pico-train - INFO - ├── Learning Rate: 8.39e-06
2025-08-30 01:18:24 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-30 01:18:36 - pico-train - INFO - Step 31425 -- 🔄 Training Metrics
2025-08-30 01:18:36 - pico-train - INFO - ├── Loss: 6.1782
2025-08-30 01:18:36 - pico-train - INFO - ├── Learning Rate: 8.35e-06
2025-08-30 01:18:36 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-30 01:18:49 - pico-train - INFO - Step 31450 -- 🔄 Training Metrics
2025-08-30 01:18:49 - pico-train - INFO - ├── Loss: 6.1711
2025-08-30 01:18:49 - pico-train - INFO - ├── Learning Rate: 8.30e-06
2025-08-30 01:18:49 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-30 01:19:01 - pico-train - INFO - Step 31475 -- 🔄 Training Metrics
2025-08-30 01:19:01 - pico-train - INFO - ├── Loss: 6.1834
2025-08-30 01:19:01 - pico-train - INFO - ├── Learning Rate: 8.26e-06
2025-08-30 01:19:01 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-30 01:19:14 - pico-train - INFO - Step 31500 -- 💾 Saving Checkpoint
2025-08-30 01:21:14 - pico-train - INFO - Step 31500 -- 📊 Evaluation Results
2025-08-30 01:21:14 - pico-train - INFO - └── paloma: 2.8663430235000883e+26
2025-08-30 01:21:17 - pico-train - INFO - Step 31500 -- 🔄 Training Metrics
2025-08-30 01:21:17 - pico-train - INFO - ├── Loss: 6.1338
2025-08-30 01:21:17 - pico-train - INFO - ├── Learning Rate: 8.21e-06
2025-08-30 01:21:17 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-30 01:21:17 - pico-train - INFO - Step 31500 -- 📈 Saving Learning Dynamics
2025-08-30 01:21:33 - pico-train - INFO - Step 31525 -- 🔄 Training Metrics
2025-08-30 01:21:33 - pico-train - INFO - ├── Loss: 6.1819
2025-08-30 01:21:33 - pico-train - INFO - ├── Learning Rate: 8.17e-06
2025-08-30 01:21:33 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-30 01:21:46 - pico-train - INFO - Step 31550 -- 🔄 Training Metrics
2025-08-30 01:21:46 - pico-train - INFO - ├── Loss: 6.1695
2025-08-30 01:21:46 - pico-train - INFO - ├── Learning Rate: 8.12e-06
2025-08-30 01:21:46 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-30 01:21:58 - pico-train - INFO - Step 31575 -- 🔄 Training Metrics
2025-08-30 01:21:58 - pico-train - INFO - ├── Loss: 6.2089
2025-08-30 01:21:58 - pico-train - INFO - ├── Learning Rate: 8.08e-06
2025-08-30 01:21:58 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-30 01:22:11 - pico-train - INFO - Step 31600 -- 🔄 Training Metrics
2025-08-30 01:22:11 - pico-train - INFO - ├── Loss: 6.1555
2025-08-30 01:22:11 - pico-train - INFO - ├── Learning Rate: 8.03e-06
2025-08-30 01:22:11 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-30 01:22:24 - pico-train - INFO - Step 31625 -- 🔄 Training Metrics
2025-08-30 01:22:24 - pico-train - INFO - ├── Loss: 6.1820
2025-08-30 01:22:24 - pico-train - INFO - ├── Learning Rate: 7.98e-06
2025-08-30 01:22:24 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-30 01:22:36 - pico-train - INFO - Step 31650 -- 🔄 Training Metrics
2025-08-30 01:22:36 - pico-train - INFO - ├── Loss: 6.1091
2025-08-30 01:22:36 - pico-train - INFO - ├── Learning Rate: 7.94e-06
2025-08-30 01:22:36 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-30 01:22:49 - pico-train - INFO - Step 31675 -- 🔄 Training Metrics
2025-08-30 01:22:49 - pico-train - INFO - ├── Loss: 6.2098
2025-08-30 01:22:49 - pico-train - INFO - ├── Learning Rate: 7.90e-06
2025-08-30 01:22:49 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-30 01:23:01 - pico-train - INFO - Step 31700 -- 🔄 Training Metrics
2025-08-30 01:23:01 - pico-train - INFO - ├── Loss: 6.0611
2025-08-30 01:23:01 - pico-train - INFO - ├── Learning Rate: 7.85e-06
2025-08-30 01:23:01 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-30 01:23:14 - pico-train - INFO - Step 31725 -- 🔄 Training Metrics
2025-08-30 01:23:14 - pico-train - INFO - ├── Loss: 6.1088
2025-08-30 01:23:14 - pico-train - INFO - ├── Learning Rate: 7.81e-06
2025-08-30 01:23:14 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-30 01:23:27 - pico-train - INFO - Step 31750 -- 🔄 Training Metrics
2025-08-30 01:23:27 - pico-train - INFO - ├── Loss: 6.2220
2025-08-30 01:23:27 - pico-train - INFO - ├── Learning Rate: 7.76e-06
2025-08-30 01:23:27 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-30 01:23:39 - pico-train - INFO - Step 31775 -- 🔄 Training Metrics
2025-08-30 01:23:39 - pico-train - INFO - ├── Loss: 6.2271
2025-08-30 01:23:39 - pico-train - INFO - ├── Learning Rate: 7.72e-06
2025-08-30 01:23:39 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-30 01:23:52 - pico-train - INFO - Step 31800 -- 🔄 Training Metrics
2025-08-30 01:23:52 - pico-train - INFO - ├── Loss: 6.1465
2025-08-30 01:23:52 - pico-train - INFO - ├── Learning Rate: 7.67e-06
2025-08-30 01:23:52 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-30 01:24:04 - pico-train - INFO - Step 31825 -- 🔄 Training Metrics
2025-08-30 01:24:04 - pico-train - INFO - ├── Loss: 6.1742
2025-08-30 01:24:04 - pico-train - INFO - ├── Learning Rate: 7.63e-06
2025-08-30 01:24:04 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-30 01:24:17 - pico-train - INFO - Step 31850 -- 🔄 Training Metrics
2025-08-30 01:24:17 - pico-train - INFO - ├── Loss: 6.2199
2025-08-30 01:24:17 - pico-train - INFO - ├── Learning Rate: 7.58e-06
2025-08-30 01:24:17 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-30 01:24:30 - pico-train - INFO - Step 31875 -- 🔄 Training Metrics
2025-08-30 01:24:30 - pico-train - INFO - ├── Loss: 6.1934
2025-08-30 01:24:30 - pico-train - INFO - ├── Learning Rate: 7.54e-06
2025-08-30 01:24:30 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-30 01:24:42 - pico-train - INFO - Step 31900 -- 🔄 Training Metrics
2025-08-30 01:24:42 - pico-train - INFO - ├── Loss: 6.1503
2025-08-30 01:24:42 - pico-train - INFO - ├── Learning Rate: 7.50e-06
2025-08-30 01:24:42 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-30 01:24:55 - pico-train - INFO - Step 31925 -- 🔄 Training Metrics
2025-08-30 01:24:55 - pico-train - INFO - ├── Loss: 6.0399
2025-08-30 01:24:55 - pico-train - INFO - ├── Learning Rate: 7.45e-06
2025-08-30 01:24:55 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-30 01:25:07 - pico-train - INFO - Step 31950 -- 🔄 Training Metrics
2025-08-30 01:25:07 - pico-train - INFO - ├── Loss: 6.2147
2025-08-30 01:25:07 - pico-train - INFO - ├── Learning Rate: 7.41e-06
2025-08-30 01:25:07 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-30 01:25:20 - pico-train - INFO - Step 31975 -- 🔄 Training Metrics
2025-08-30 01:25:20 - pico-train - INFO - ├── Loss: 6.1952
2025-08-30 01:25:20 - pico-train - INFO - ├── Learning Rate: 7.37e-06
2025-08-30 01:25:20 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-30 01:25:32 - pico-train - INFO - Step 32000 -- 💾 Saving Checkpoint