2025-08-30 04:44:44 - pico-train - INFO - Step 40000 -- 📊 Evaluation Results
2025-08-30 04:44:44 - pico-train - INFO - └── paloma: 7.314096757540847e+26
2025-08-30 04:44:44 - pico-train - INFO - ==================================================
2025-08-30 04:44:44 - pico-train - INFO - ✨ Training Configuration
2025-08-30 04:44:44 - pico-train - INFO - ==================================================
2025-08-30 04:44:44 - pico-train - INFO - ╭─────────────────────────────────────────────────────╮
2025-08-30 04:44:44 - pico-train - INFO - │ checkpointing:                                      │
2025-08-30 04:44:44 - pico-train - INFO - │   checkpoints_dir: checkpoints                      │
2025-08-30 04:44:44 - pico-train - INFO - │   evaluation:                                       │
2025-08-30 04:44:44 - pico-train - INFO - │     eval_results_dir: eval_results                  │
2025-08-30 04:44:44 - pico-train - INFO - │   fabric_checkpoint_dir: fabric_state               │
2025-08-30 04:44:44 - pico-train - INFO - │   fabric_checkpoint_filename: checkpoint.pt         │
2025-08-30 04:44:44 - pico-train - INFO - │   hf_checkpoint:                                    │
2025-08-30 04:44:44 - pico-train - INFO - │     collection_slug: null                           │
2025-08-30 04:44:44 - pico-train - INFO - │     repo_id: ThomasTheMaker/pico-decoder-tiny       │
2025-08-30 04:44:44 - pico-train - INFO - │   learning_dynamics:                                │
2025-08-30 04:44:44 - pico-train - INFO - │     batch_size: 1                                   │
2025-08-30 04:44:44 - pico-train - INFO - │     eval_data: null                                 │
2025-08-30 04:44:44 - pico-train - INFO - │     layer_suffixes:                                 │
2025-08-30 04:44:44 - pico-train - INFO - │     - attention.v_proj                              │
2025-08-30 04:44:44 - pico-train - INFO - │     - attention.o_proj                              │
2025-08-30 04:44:44 - pico-train - INFO - │     - swiglu.w_2                                    │
2025-08-30 04:44:44 - pico-train - INFO - │     sequence_idx: -1                                │
2025-08-30 04:44:44 - pico-train - INFO - │   learning_dynamics_dir: learning_dynamics          │
2025-08-30 04:44:44 - pico-train - INFO - │   logs_dir: logs                                    │
2025-08-30 04:44:44 - pico-train - INFO - │   run_name: pico-decoder-tiny-dolma5M-v1            │
2025-08-30 04:44:44 - pico-train - INFO - │   runs_dir: runs                                    │
2025-08-30 04:44:44 - pico-train - INFO - │   save_every_n_steps: 500                           │
2025-08-30 04:44:44 - pico-train - INFO - │   save_to_hf: true                                  │
2025-08-30 04:44:44 - pico-train - INFO - │   training:                                         │
2025-08-30 04:44:44 - pico-train - INFO - │     auto_resume: true                               │
2025-08-30 04:44:44 - pico-train - INFO - │ data:                                               │
2025-08-30 04:44:44 - pico-train - INFO - │   dataloader:                                       │
2025-08-30 04:44:44 - pico-train - INFO - │     batch_size: 4                                   │
2025-08-30 04:44:44 - pico-train - INFO - │   dataset:                                          │
2025-08-30 04:44:44 - pico-train - INFO - │     name: ThomasTheMaker/pretokenized-dolma-5M      │
2025-08-30 04:44:44 - pico-train - INFO - │   tokenizer:                                        │
2025-08-30 04:44:44 - pico-train - INFO - │     name: allenai/OLMo-7B-0724-hf                   │
2025-08-30 04:44:44 - pico-train - INFO - │     vocab_size: 50304                               │
2025-08-30 04:44:44 - pico-train - INFO - │ evaluation:                                         │
2025-08-30 04:44:44 - pico-train - INFO - │   metrics:                                          │
2025-08-30 04:44:44 - pico-train - INFO - │   - paloma                                          │
2025-08-30 04:44:44 - pico-train - INFO - │   paloma:                                           │
2025-08-30 04:44:44 - pico-train - INFO - │     batch_size: 1                                   │
2025-08-30 04:44:44 - pico-train - INFO - │     dataset_name: pico-lm/pretokenized-paloma-tinsy │
2025-08-30 04:44:44 - pico-train - INFO - │     dataset_split: val                              │
2025-08-30 04:44:44 - pico-train - INFO - │     max_length: 2048                                │
2025-08-30 04:44:44 - pico-train - INFO - │ model:                                              │
2025-08-30 04:44:44 - pico-train - INFO - │   activation_hidden_dim: 384                        │
2025-08-30 04:44:44 - pico-train - INFO - │   attention_n_heads: 12                             │
2025-08-30 04:44:44 - pico-train - INFO - │   attention_n_kv_heads: 4                           │
2025-08-30 04:44:44 - pico-train - INFO - │   batch_size: 1024                                  │
2025-08-30 04:44:44 - pico-train - INFO - │   d_model: 96                                       │
2025-08-30 04:44:44 - pico-train - INFO - │   max_seq_len: 2048                                 │
2025-08-30 04:44:44 - pico-train - INFO - │   model_type: pico_decoder                          │
2025-08-30 04:44:44 - pico-train - INFO - │   n_layers: 12                                      │
2025-08-30 04:44:44 - pico-train - INFO - │   norm_eps: 1.0e-06                                 │
2025-08-30 04:44:44 - pico-train - INFO - │   position_emb_theta: 10000.0                       │
2025-08-30 04:44:44 - pico-train - INFO - │   vocab_size: 50304                                 │
2025-08-30 04:44:44 - pico-train - INFO - │ monitoring:                                         │
2025-08-30 04:44:44 - pico-train - INFO - │   logging:                                          │
2025-08-30 04:44:44 - pico-train - INFO - │     log_every_n_steps: 25                           │
2025-08-30 04:44:44 - pico-train - INFO - │     log_level: INFO                                 │
2025-08-30 04:44:44 - pico-train - INFO - │   save_to_wandb: false                              │
2025-08-30 04:44:44 - pico-train - INFO - │   wandb:                                            │
2025-08-30 04:44:44 - pico-train - INFO - │     entity: boymyc                                  │
2025-08-30 04:44:44 - pico-train - INFO - │     project: pico-decoder-tiny                      │
2025-08-30 04:44:44 - pico-train - INFO - │ training:                                           │
2025-08-30 04:44:44 - pico-train - INFO - │   fabric:                                           │
2025-08-30 04:44:44 - pico-train - INFO - │     accelerator: cuda                               │
2025-08-30 04:44:44 - pico-train - INFO - │     num_devices: 1                                  │
2025-08-30 04:44:44 - pico-train - INFO - │     num_nodes: 1                                    │
2025-08-30 04:44:44 - pico-train - INFO - │     precision: bf16-mixed                           │
2025-08-30 04:44:44 - pico-train - INFO - │   max_steps: 20000                                  │
2025-08-30 04:44:44 - pico-train - INFO - │   optimization:                                     │
2025-08-30 04:44:44 - pico-train - INFO - │     gradient_accumulation_steps: 4                  │
2025-08-30 04:44:44 - pico-train - INFO - │     lr: 5.0e-05                                     │
2025-08-30 04:44:44 - pico-train - INFO - │     lr_scheduler: cosine                            │
2025-08-30 04:44:44 - pico-train - INFO - │     lr_warmup_steps: 8000                           │
2025-08-30 04:44:44 - pico-train - INFO - │     optimizer: adamw                                │
2025-08-30 04:44:44 - pico-train - INFO - │                                                     │
2025-08-30 04:44:44 - pico-train - INFO - ╰─────────────────────────────────────────────────────╯
2025-08-30 04:44:44 - pico-train - INFO - ==================================================
2025-08-30 04:44:44 - pico-train - INFO - ⛭ Runtime Summary:
2025-08-30 04:44:44 - pico-train - INFO - ==================================================
2025-08-30 04:44:44 - pico-train - INFO - Starting from step: 40000
2025-08-30 04:44:44 - pico-train - INFO - Model Setup:
2025-08-30 04:44:44 - pico-train - INFO - └─ Total Parameters: 11,282,784
2025-08-30 04:44:44 - pico-train - INFO - └─ Trainable Parameters: 11,282,784
2025-08-30 04:44:44 - pico-train - INFO - Distributed Setup:
2025-08-30 04:44:44 - pico-train - INFO - └─ Number of Devices: 1
2025-08-30 04:44:44 - pico-train - INFO - └─ Device Type: NVIDIA GeForce RTX 5090
2025-08-30 04:44:44 - pico-train - INFO - └─ Available Memory: 33.68 GB
2025-08-30 04:44:44 - pico-train - INFO - Software Setup:
2025-08-30 04:44:44 - pico-train - INFO - └─ Python Version: 3.10.12
2025-08-30 04:44:44 - pico-train - INFO - └─ PyTorch Version: 2.8.0+cu128
2025-08-30 04:44:44 - pico-train - INFO - └─ CUDA Version: 12.8
2025-08-30 04:44:44 - pico-train - INFO - └─ Operating System: Linux 6.8.0-63-generic
2025-08-30 04:44:44 - pico-train - INFO - Batch Size Configuration:
2025-08-30 04:44:44 - pico-train - INFO - └─ Global Batch Size: 4
2025-08-30 04:44:44 - pico-train - INFO - └─ Per Device Batch Size: 1
2025-08-30 04:44:44 - pico-train - INFO - └─ Gradient Accumulation Steps: 4
2025-08-30 04:44:44 - pico-train - INFO - ==================================================
2025-08-30 04:44:45 - pico-train - INFO - Step 40000 -- 🔄 Training Metrics
2025-08-30 04:44:45 - pico-train - INFO - ├── Loss: 6.3052
2025-08-30 04:44:45 - pico-train - INFO - ├── Learning Rate: 5.00e-06
2025-08-30 04:44:45 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-30 04:44:45 - pico-train - INFO - Step 40000 -- 📈 Saving Learning Dynamics
2025-08-30 04:45:06 - pico-train - INFO - Step 40025 -- 🔄 Training Metrics
2025-08-30 04:45:06 - pico-train - INFO - ├── Loss: 6.1689
2025-08-30 04:45:06 - pico-train - INFO - ├── Learning Rate: 3.65e-05
2025-08-30 04:45:06 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-30 04:45:22 - pico-train - INFO - Step 40050 -- 🔄 Training Metrics
2025-08-30 04:45:22 - pico-train - INFO - ├── Loss: 6.1212
2025-08-30 04:45:22 - pico-train - INFO - ├── Learning Rate: 3.65e-05
2025-08-30 04:45:22 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-30 04:45:39 - pico-train - INFO - Step 40075 -- 🔄 Training Metrics
2025-08-30 04:45:39 - pico-train - INFO - ├── Loss: 6.0189
2025-08-30 04:45:39 - pico-train - INFO - ├── Learning Rate: 3.64e-05
2025-08-30 04:45:39 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-30 04:45:55 - pico-train - INFO - Step 40100 -- 🔄 Training Metrics
2025-08-30 04:45:55 - pico-train - INFO - ├── Loss: 6.1347
2025-08-30 04:45:55 - pico-train - INFO - ├── Learning Rate: 3.64e-05
2025-08-30 04:45:55 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-30 04:46:12 - pico-train - INFO - Step 40125 -- 🔄 Training Metrics
2025-08-30 04:46:12 - pico-train - INFO - ├── Loss: 6.1791
2025-08-30 04:46:12 - pico-train - INFO - ├── Learning Rate: 3.64e-05
2025-08-30 04:46:12 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-30 04:46:28 - pico-train - INFO - Step 40150 -- 🔄 Training Metrics
2025-08-30 04:46:28 - pico-train - INFO - ├── Loss: 6.1368
2025-08-30 04:46:28 - pico-train - INFO - ├── Learning Rate: 3.64e-05
2025-08-30 04:46:28 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-30 04:46:44 - pico-train - INFO - Step 40175 -- 🔄 Training Metrics
2025-08-30 04:46:44 - pico-train - INFO - ├── Loss: 6.1443
2025-08-30 04:46:44 - pico-train - INFO - ├── Learning Rate: 3.64e-05
2025-08-30 04:46:44 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-30 04:47:01 - pico-train - INFO - Step 40200 -- 🔄 Training Metrics
2025-08-30 04:47:01 - pico-train - INFO - ├── Loss: 6.1815
2025-08-30 04:47:01 - pico-train - INFO - ├── Learning Rate: 3.63e-05
2025-08-30 04:47:01 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-30 04:47:17 - pico-train - INFO - Step 40225 -- 🔄 Training Metrics
2025-08-30 04:47:17 - pico-train - INFO - ├── Loss: 6.1685
2025-08-30 04:47:17 - pico-train - INFO - ├── Learning Rate: 3.63e-05
2025-08-30 04:47:17 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-30 04:47:34 - pico-train - INFO - Step 40250 -- 🔄 Training Metrics
2025-08-30 04:47:34 - pico-train - INFO - ├── Loss: 6.0835
2025-08-30 04:47:34 - pico-train - INFO - ├── Learning Rate: 3.63e-05
2025-08-30 04:47:34 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-30 04:47:50 - pico-train - INFO - Step 40275 -- 🔄 Training Metrics
2025-08-30 04:47:50 - pico-train - INFO - ├── Loss: 6.0785
2025-08-30 04:47:50 - pico-train - INFO - ├── Learning Rate: 3.63e-05
2025-08-30 04:47:50 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-30 04:48:07 - pico-train - INFO - Step 40300 -- 🔄 Training Metrics
2025-08-30 04:48:07 - pico-train - INFO - ├── Loss: 6.0537
2025-08-30 04:48:07 - pico-train - INFO - ├── Learning Rate: 3.63e-05
2025-08-30 04:48:07 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-30 04:48:23 - pico-train - INFO - Step 40325 -- 🔄 Training Metrics
2025-08-30 04:48:23 - pico-train - INFO - ├── Loss: 6.0608
2025-08-30 04:48:23 - pico-train - INFO - ├── Learning Rate: 3.63e-05
2025-08-30 04:48:23 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-30 04:48:36 - pico-train - INFO - Step 40350 -- 🔄 Training Metrics
2025-08-30 04:48:36 - pico-train - INFO - ├── Loss: 6.1696
2025-08-30 04:48:36 - pico-train - INFO - ├── Learning Rate: 3.62e-05
2025-08-30 04:48:36 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-30 04:48:49 - pico-train - INFO - Step 40375 -- 🔄 Training Metrics
2025-08-30 04:48:49 - pico-train - INFO - ├── Loss: 6.1070
2025-08-30 04:48:49 - pico-train - INFO - ├── Learning Rate: 3.62e-05
2025-08-30 04:48:49 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-30 04:49:02 - pico-train - INFO - Step 40400 -- 🔄 Training Metrics
2025-08-30 04:49:02 - pico-train - INFO - ├── Loss: 6.0783
2025-08-30 04:49:02 - pico-train - INFO - ├── Learning Rate: 3.62e-05
2025-08-30 04:49:02 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-30 04:49:14 - pico-train - INFO - Step 40425 -- 🔄 Training Metrics
2025-08-30 04:49:14 - pico-train - INFO - ├── Loss: 6.2326
2025-08-30 04:49:14 - pico-train - INFO - ├── Learning Rate: 3.62e-05
2025-08-30 04:49:14 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-30 04:49:27 - pico-train - INFO - Step 40450 -- 🔄 Training Metrics
2025-08-30 04:49:27 - pico-train - INFO - ├── Loss: 6.0715
2025-08-30 04:49:27 - pico-train - INFO - ├── Learning Rate: 3.62e-05
2025-08-30 04:49:27 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-30 04:49:39 - pico-train - INFO - Step 40475 -- 🔄 Training Metrics
2025-08-30 04:49:39 - pico-train - INFO - ├── Loss: 6.1857
2025-08-30 04:49:39 - pico-train - INFO - ├── Learning Rate: 3.61e-05
2025-08-30 04:49:39 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-30 04:49:51 - pico-train - INFO - Step 40500 -- 💾 Saving Checkpoint
2025-08-30 04:51:47 - pico-train - INFO - Step 40500 -- 📊 Evaluation Results
2025-08-30 04:51:47 - pico-train - INFO - └── paloma: 1.2201991301470252e+27
2025-08-30 04:51:50 - pico-train - INFO - Step 40500 -- 🔄 Training Metrics
2025-08-30 04:51:50 - pico-train - INFO - ├── Loss: 6.1294
2025-08-30 04:51:50 - pico-train - INFO - ├── Learning Rate: 3.61e-05
2025-08-30 04:51:50 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-30 04:51:50 - pico-train - INFO - Step 40500 -- 📈 Saving Learning Dynamics
2025-08-30 04:52:05 - pico-train - INFO - Step 40525 -- 🔄 Training Metrics
2025-08-30 04:52:05 - pico-train - INFO - ├── Loss: 6.1508
2025-08-30 04:52:05 - pico-train - INFO - ├── Learning Rate: 3.61e-05
2025-08-30 04:52:05 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-30 04:52:18 - pico-train - INFO - Step 40550 -- 🔄 Training Metrics
2025-08-30 04:52:18 - pico-train - INFO - ├── Loss: 6.1130
2025-08-30 04:52:18 - pico-train - INFO - ├── Learning Rate: 3.61e-05
2025-08-30 04:52:18 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-30 04:52:30 - pico-train - INFO - Step 40575 -- 🔄 Training Metrics
2025-08-30 04:52:30 - pico-train - INFO - ├── Loss: 6.1631
2025-08-30 04:52:30 - pico-train - INFO - ├── Learning Rate: 3.61e-05
2025-08-30 04:52:30 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-30 04:52:43 - pico-train - INFO - Step 40600 -- 🔄 Training Metrics
2025-08-30 04:52:43 - pico-train - INFO - ├── Loss: 6.2337
2025-08-30 04:52:43 - pico-train - INFO - ├── Learning Rate: 3.60e-05
2025-08-30 04:52:43 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-30 04:52:55 - pico-train - INFO - Step 40625 -- 🔄 Training Metrics
2025-08-30 04:52:55 - pico-train - INFO - ├── Loss: 6.0858
2025-08-30 04:52:55 - pico-train - INFO - ├── Learning Rate: 3.60e-05
2025-08-30 04:52:55 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-30 04:53:08 - pico-train - INFO - Step 40650 -- 🔄 Training Metrics
2025-08-30 04:53:08 - pico-train - INFO - ├── Loss: 6.1727
2025-08-30 04:53:08 - pico-train - INFO - ├── Learning Rate: 3.60e-05
2025-08-30 04:53:08 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-30 04:53:21 - pico-train - INFO - Step 40675 -- 🔄 Training Metrics
2025-08-30 04:53:21 - pico-train - INFO - ├── Loss: 6.1629
2025-08-30 04:53:21 - pico-train - INFO - ├── Learning Rate: 3.60e-05
2025-08-30 04:53:21 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-30 04:53:33 - pico-train - INFO - Step 40700 -- 🔄 Training Metrics
2025-08-30 04:53:33 - pico-train - INFO - ├── Loss: 6.1451
2025-08-30 04:53:33 - pico-train - INFO - ├── Learning Rate: 3.60e-05
2025-08-30 04:53:33 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-30 04:53:46 - pico-train - INFO - Step 40725 -- 🔄 Training Metrics
2025-08-30 04:53:46 - pico-train - INFO - ├── Loss: 6.1482
2025-08-30 04:53:46 - pico-train - INFO - ├── Learning Rate: 3.59e-05
2025-08-30 04:53:46 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-30 04:53:58 - pico-train - INFO - Step 40750 -- 🔄 Training Metrics
2025-08-30 04:53:58 - pico-train - INFO - ├── Loss: 6.0939
2025-08-30 04:53:58 - pico-train - INFO - ├── Learning Rate: 3.59e-05
2025-08-30 04:53:58 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-30 04:54:11 - pico-train - INFO - Step 40775 -- 🔄 Training Metrics
2025-08-30 04:54:11 - pico-train - INFO - ├── Loss: 6.1594
2025-08-30 04:54:11 - pico-train - INFO - ├── Learning Rate: 3.59e-05
2025-08-30 04:54:11 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-30 04:54:23 - pico-train - INFO - Step 40800 -- 🔄 Training Metrics
2025-08-30 04:54:23 - pico-train - INFO - ├── Loss: 6.1450
2025-08-30 04:54:23 - pico-train - INFO - ├── Learning Rate: 3.59e-05
2025-08-30 04:54:23 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-30 04:54:36 - pico-train - INFO - Step 40825 -- 🔄 Training Metrics
2025-08-30 04:54:36 - pico-train - INFO - ├── Loss: 6.0952
2025-08-30 04:54:36 - pico-train - INFO - ├── Learning Rate: 3.59e-05
2025-08-30 04:54:36 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-30 04:54:48 - pico-train - INFO - Step 40850 -- 🔄 Training Metrics
2025-08-30 04:54:48 - pico-train - INFO - ├── Loss: 6.1180
2025-08-30 04:54:48 - pico-train - INFO - ├── Learning Rate: 3.59e-05
2025-08-30 04:54:48 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-30 04:55:01 - pico-train - INFO - Step 40875 -- 🔄 Training Metrics
2025-08-30 04:55:01 - pico-train - INFO - ├── Loss: 6.0993
2025-08-30 04:55:01 - pico-train - INFO - ├── Learning Rate: 3.58e-05
2025-08-30 04:55:01 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-30 04:55:13 - pico-train - INFO - Step 40900 -- 🔄 Training Metrics
2025-08-30 04:55:13 - pico-train - INFO - ├── Loss: 6.0885
2025-08-30 04:55:13 - pico-train - INFO - ├── Learning Rate: 3.58e-05
2025-08-30 04:55:13 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-30 04:55:26 - pico-train - INFO - Step 40925 -- 🔄 Training Metrics
2025-08-30 04:55:26 - pico-train - INFO - ├── Loss: 6.0793
2025-08-30 04:55:26 - pico-train - INFO - ├── Learning Rate: 3.58e-05
2025-08-30 04:55:26 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-30 04:55:39 - pico-train - INFO - Step 40950 -- 🔄 Training Metrics
2025-08-30 04:55:39 - pico-train - INFO - ├── Loss: 6.1996
2025-08-30 04:55:39 - pico-train - INFO - ├── Learning Rate: 3.58e-05
2025-08-30 04:55:39 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-30 04:55:51 - pico-train - INFO - Step 40975 -- 🔄 Training Metrics
2025-08-30 04:55:51 - pico-train - INFO - ├── Loss: 6.1833
2025-08-30 04:55:51 - pico-train - INFO - ├── Learning Rate: 3.58e-05
2025-08-30 04:55:51 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-30 04:56:03 - pico-train - INFO - Step 41000 -- 💾 Saving Checkpoint
2025-08-30 04:58:02 - pico-train - INFO - Step 41000 -- 📊 Evaluation Results
2025-08-30 04:58:02 - pico-train - INFO - └── paloma: 1.2786105287360795e+27
2025-08-30 04:58:05 - pico-train - INFO - Step 41000 -- 🔄 Training Metrics
2025-08-30 04:58:05 - pico-train - INFO - ├── Loss: 6.0609
2025-08-30 04:58:05 - pico-train - INFO - ├── Learning Rate: 3.57e-05
2025-08-30 04:58:05 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-30 04:58:05 - pico-train - INFO - Step 41000 -- 📈 Saving Learning Dynamics
2025-08-30 04:58:20 - pico-train - INFO - Step 41025 -- 🔄 Training Metrics
2025-08-30 04:58:20 - pico-train - INFO - ├── Loss: 6.0776
2025-08-30 04:58:20 - pico-train - INFO - ├── Learning Rate: 3.57e-05
2025-08-30 04:58:20 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-30 04:58:32 - pico-train - INFO - Step 41050 -- 🔄 Training Metrics
2025-08-30 04:58:32 - pico-train - INFO - ├── Loss: 6.0842
2025-08-30 04:58:32 - pico-train - INFO - ├── Learning Rate: 3.57e-05
2025-08-30 04:58:32 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-30 04:58:45 - pico-train - INFO - Step 41075 -- 🔄 Training Metrics
2025-08-30 04:58:45 - pico-train - INFO - ├── Loss: 6.0750
2025-08-30 04:58:45 - pico-train - INFO - ├── Learning Rate: 3.57e-05
2025-08-30 04:58:45 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-30 04:58:57 - pico-train - INFO - Step 41100 -- 🔄 Training Metrics
2025-08-30 04:58:57 - pico-train - INFO - ├── Loss: 6.1881
2025-08-30 04:58:57 - pico-train - INFO - ├── Learning Rate: 3.57e-05
2025-08-30 04:58:57 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-30 04:59:10 - pico-train - INFO - Step 41125 -- 🔄 Training Metrics
2025-08-30 04:59:10 - pico-train - INFO - ├── Loss: 6.1206
2025-08-30 04:59:10 - pico-train - INFO - ├── Learning Rate: 3.56e-05
2025-08-30 04:59:10 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-30 04:59:23 - pico-train - INFO - Step 41150 -- 🔄 Training Metrics
2025-08-30 04:59:23 - pico-train - INFO - ├── Loss: 6.0181
2025-08-30 04:59:23 - pico-train - INFO - ├── Learning Rate: 3.56e-05
2025-08-30 04:59:23 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-30 04:59:35 - pico-train - INFO - Step 41175 -- 🔄 Training Metrics
2025-08-30 04:59:35 - pico-train - INFO - ├── Loss: 6.2113
2025-08-30 04:59:35 - pico-train - INFO - ├── Learning Rate: 3.56e-05
2025-08-30 04:59:35 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-30 04:59:48 - pico-train - INFO - Step 41200 -- 🔄 Training Metrics
2025-08-30 04:59:48 - pico-train - INFO - ├── Loss: 6.1853
2025-08-30 04:59:48 - pico-train - INFO - ├── Learning Rate: 3.56e-05
2025-08-30 04:59:48 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-30 05:00:00 - pico-train - INFO - Step 41225 -- 🔄 Training Metrics
2025-08-30 05:00:00 - pico-train - INFO - ├── Loss: 6.0819
2025-08-30 05:00:00 - pico-train - INFO - ├── Learning Rate: 3.56e-05
2025-08-30 05:00:00 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-30 05:00:13 - pico-train - INFO - Step 41250 -- 🔄 Training Metrics
2025-08-30 05:00:13 - pico-train - INFO - ├── Loss: 6.0575
2025-08-30 05:00:13 - pico-train - INFO - ├── Learning Rate: 3.55e-05
2025-08-30 05:00:13 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-30 05:00:25 - pico-train - INFO - Step 41275 -- 🔄 Training Metrics
2025-08-30 05:00:25 - pico-train - INFO - ├── Loss: 6.0731
2025-08-30 05:00:25 - pico-train - INFO - ├── Learning Rate: 3.55e-05
2025-08-30 05:00:25 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-30 05:00:38 - pico-train - INFO - Step 41300 -- 🔄 Training Metrics
2025-08-30 05:00:38 - pico-train - INFO - ├── Loss: 6.0200
2025-08-30 05:00:38 - pico-train - INFO - ├── Learning Rate: 3.55e-05
2025-08-30 05:00:38 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-30 05:00:50 - pico-train - INFO - Step 41325 -- 🔄 Training Metrics
2025-08-30 05:00:50 - pico-train - INFO - ├── Loss: 6.0379
2025-08-30 05:00:50 - pico-train - INFO - ├── Learning Rate: 3.55e-05
2025-08-30 05:00:50 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-30 05:01:03 - pico-train - INFO - Step 41350 -- 🔄 Training Metrics
2025-08-30 05:01:03 - pico-train - INFO - ├── Loss: 6.0660
2025-08-30 05:01:03 - pico-train - INFO - ├── Learning Rate: 3.55e-05
2025-08-30 05:01:03 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-30 05:01:15 - pico-train - INFO - Step 41375 -- 🔄 Training Metrics
2025-08-30 05:01:15 - pico-train - INFO - ├── Loss: 6.1597
2025-08-30 05:01:15 - pico-train - INFO - ├── Learning Rate: 3.54e-05
2025-08-30 05:01:15 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-30 05:01:28 - pico-train - INFO - Step 41400 -- 🔄 Training Metrics
2025-08-30 05:01:28 - pico-train - INFO - ├── Loss: 6.0449
2025-08-30 05:01:28 - pico-train - INFO - ├── Learning Rate: 3.54e-05
2025-08-30 05:01:28 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-30 05:01:41 - pico-train - INFO - Step 41425 -- 🔄 Training Metrics
2025-08-30 05:01:41 - pico-train - INFO - ├── Loss: 6.1370
2025-08-30 05:01:41 - pico-train - INFO - ├── Learning Rate: 3.54e-05
2025-08-30 05:01:41 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-30 05:01:53 - pico-train - INFO - Step 41450 -- 🔄 Training Metrics
2025-08-30 05:01:53 - pico-train - INFO - ├── Loss: 6.1647
2025-08-30 05:01:53 - pico-train - INFO - ├── Learning Rate: 3.54e-05
2025-08-30 05:01:53 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-30 05:02:06 - pico-train - INFO - Step 41475 -- 🔄 Training Metrics
2025-08-30 05:02:06 - pico-train - INFO - ├── Loss: 6.0793
2025-08-30 05:02:06 - pico-train - INFO - ├── Learning Rate: 3.54e-05
2025-08-30 05:02:06 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-30 05:02:18 - pico-train - INFO - Step 41500 -- 💾 Saving Checkpoint
2025-08-30 05:04:19 - pico-train - INFO - Step 41500 -- 📊 Evaluation Results
2025-08-30 05:04:19 - pico-train - INFO - └── paloma: 2.062057669347938e+27
2025-08-30 05:04:23 - pico-train - INFO - Step 41500 -- 🔄 Training Metrics
2025-08-30 05:04:23 - pico-train - INFO - ├── Loss: 6.0860
2025-08-30 05:04:23 - pico-train - INFO - ├── Learning Rate: 3.54e-05
2025-08-30 05:04:23 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-30 05:04:23 - pico-train - INFO - Step 41500 -- 📈 Saving Learning Dynamics
2025-08-30 05:04:39 - pico-train - INFO - Step 41525 -- 🔄 Training Metrics
2025-08-30 05:04:39 - pico-train - INFO - ├── Loss: 6.0604
2025-08-30 05:04:39 - pico-train - INFO - ├── Learning Rate: 3.53e-05
2025-08-30 05:04:39 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-30 05:04:51 - pico-train - INFO - Step 41550 -- 🔄 Training Metrics
2025-08-30 05:04:51 - pico-train - INFO - ├── Loss: 6.0622
2025-08-30 05:04:51 - pico-train - INFO - ├── Learning Rate: 3.53e-05
2025-08-30 05:04:51 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-30 05:05:04 - pico-train - INFO - Step 41575 -- 🔄 Training Metrics
2025-08-30 05:05:04 - pico-train - INFO - ├── Loss: 6.0831
2025-08-30 05:05:04 - pico-train - INFO - ├── Learning Rate: 3.53e-05
2025-08-30 05:05:04 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-30 05:05:16 - pico-train - INFO - Step 41600 -- 🔄 Training Metrics
2025-08-30 05:05:16 - pico-train - INFO - ├── Loss: 6.0853
2025-08-30 05:05:16 - pico-train - INFO - ├── Learning Rate: 3.53e-05
2025-08-30 05:05:16 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-30 05:05:29 - pico-train - INFO - Step 41625 -- 🔄 Training Metrics
2025-08-30 05:05:29 - pico-train - INFO - ├── Loss: 6.0860
2025-08-30 05:05:29 - pico-train - INFO - ├── Learning Rate: 3.53e-05
2025-08-30 05:05:29 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-30 05:05:41 - pico-train - INFO - Step 41650 -- 🔄 Training Metrics
2025-08-30 05:05:41 - pico-train - INFO - ├── Loss: 6.0905
2025-08-30 05:05:41 - pico-train - INFO - ├── Learning Rate: 3.52e-05
2025-08-30 05:05:41 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-30 05:05:54 - pico-train - INFO - Step 41675 -- 🔄 Training Metrics
2025-08-30 05:05:54 - pico-train - INFO - ├── Loss: 6.0475
2025-08-30 05:05:54 - pico-train - INFO - ├── Learning Rate: 3.52e-05
2025-08-30 05:05:54 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-30 05:06:07 - pico-train - INFO - Step 41700 -- 🔄 Training Metrics
2025-08-30 05:06:07 - pico-train - INFO - ├── Loss: 6.1168
2025-08-30 05:06:07 - pico-train - INFO - ├── Learning Rate: 3.52e-05
2025-08-30 05:06:07 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-30 05:06:19 - pico-train - INFO - Step 41725 -- 🔄 Training Metrics
2025-08-30 05:06:19 - pico-train - INFO - ├── Loss: 6.1310
2025-08-30 05:06:19 - pico-train - INFO - ├── Learning Rate: 3.52e-05
2025-08-30 05:06:19 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-30 05:06:32 - pico-train - INFO - Step 41750 -- 🔄 Training Metrics
2025-08-30 05:06:32 - pico-train - INFO - ├── Loss: 6.0966
2025-08-30 05:06:32 - pico-train - INFO - ├── Learning Rate: 3.52e-05
2025-08-30 05:06:32 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-30 05:06:44 - pico-train - INFO - Step 41775 -- 🔄 Training Metrics
2025-08-30 05:06:44 - pico-train - INFO - ├── Loss: 6.1002
2025-08-30 05:06:44 - pico-train - INFO - ├── Learning Rate: 3.51e-05
2025-08-30 05:06:44 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-30 05:06:57 - pico-train - INFO - Step 41800 -- 🔄 Training Metrics
2025-08-30 05:06:57 - pico-train - INFO - ├── Loss: 6.1383
2025-08-30 05:06:57 - pico-train - INFO - ├── Learning Rate: 3.51e-05
2025-08-30 05:06:57 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-30 05:07:09 - pico-train - INFO - Step 41825 -- 🔄 Training Metrics
2025-08-30 05:07:09 - pico-train - INFO - ├── Loss: 6.0973
2025-08-30 05:07:09 - pico-train - INFO - ├── Learning Rate: 3.51e-05
2025-08-30 05:07:09 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-30 05:07:22 - pico-train - INFO - Step 41850 -- 🔄 Training Metrics
2025-08-30 05:07:22 - pico-train - INFO - ├── Loss: 6.0864
2025-08-30 05:07:22 - pico-train - INFO - ├── Learning Rate: 3.51e-05
2025-08-30 05:07:22 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-30 05:07:34 - pico-train - INFO - Step 41875 -- 🔄 Training Metrics
2025-08-30 05:07:34 - pico-train - INFO - ├── Loss: 6.1542
2025-08-30 05:07:34 - pico-train - INFO - ├── Learning Rate: 3.51e-05
2025-08-30 05:07:34 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-30 05:07:47 - pico-train - INFO - Step 41900 -- 🔄 Training Metrics
2025-08-30 05:07:47 - pico-train - INFO - ├── Loss: 6.1191
2025-08-30 05:07:47 - pico-train - INFO - ├── Learning Rate: 3.50e-05
2025-08-30 05:07:47 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-30 05:07:59 - pico-train - INFO - Step 41925 -- 🔄 Training Metrics
2025-08-30 05:07:59 - pico-train - INFO - ├── Loss: 6.1827
2025-08-30 05:07:59 - pico-train - INFO - ├── Learning Rate: 3.50e-05
2025-08-30 05:07:59 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-30 05:08:12 - pico-train - INFO - Step 41950 -- 🔄 Training Metrics
2025-08-30 05:08:12 - pico-train - INFO - ├── Loss: 6.1001
2025-08-30 05:08:12 - pico-train - INFO - ├── Learning Rate: 3.50e-05
2025-08-30 05:08:12 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-30 05:08:24 - pico-train - INFO - Step 41975 -- 🔄 Training Metrics
2025-08-30 05:08:24 - pico-train - INFO - ├── Loss: 6.1700
2025-08-30 05:08:24 - pico-train - INFO - ├── Learning Rate: 3.50e-05
2025-08-30 05:08:24 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-30 05:08:36 - pico-train - INFO - Step 42000 -- 💾 Saving Checkpoint
2025-08-30 05:10:36 - pico-train - INFO - Step 42000 -- 📊 Evaluation Results
2025-08-30 05:10:36 - pico-train - INFO - └── paloma: 2.5987478678619155e+27
2025-08-30 05:10:39 - pico-train - INFO - Step 42000 -- 🔄 Training Metrics
2025-08-30 05:10:39 - pico-train - INFO - ├── Loss: 6.1167
2025-08-30 05:10:39 - pico-train - INFO - ├── Learning Rate: 3.50e-05
2025-08-30 05:10:39 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-30 05:10:39 - pico-train - INFO - Step 42000 -- 📈 Saving Learning Dynamics
2025-08-30 05:10:57 - pico-train - INFO - Step 42025 -- 🔄 Training Metrics
2025-08-30 05:10:57 - pico-train - INFO - ├── Loss: 6.1833
2025-08-30 05:10:57 - pico-train - INFO - ├── Learning Rate: 3.49e-05
2025-08-30 05:10:57 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-30 05:11:09 - pico-train - INFO - Step 42050 -- 🔄 Training Metrics
2025-08-30 05:11:09 - pico-train - INFO - ├── Loss: 6.0939
2025-08-30 05:11:09 - pico-train - INFO - ├── Learning Rate: 3.49e-05
2025-08-30 05:11:09 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-30 05:11:22 - pico-train - INFO - Step 42075 -- 🔄 Training Metrics
2025-08-30 05:11:22 - pico-train - INFO - ├── Loss: 6.0309
2025-08-30 05:11:22 - pico-train - INFO - ├── Learning Rate: 3.49e-05
2025-08-30 05:11:22 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-30 05:11:34 - pico-train - INFO - Step 42100 -- 🔄 Training Metrics
2025-08-30 05:11:34 - pico-train - INFO - ├── Loss: 6.0340
2025-08-30 05:11:34 - pico-train - INFO - ├── Learning Rate: 3.49e-05
2025-08-30 05:11:34 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-30 05:11:47 - pico-train - INFO - Step 42125 -- 🔄 Training Metrics
2025-08-30 05:11:47 - pico-train - INFO - ├── Loss: 6.0556
2025-08-30 05:11:47 - pico-train - INFO - ├── Learning Rate: 3.49e-05
2025-08-30 05:11:47 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-30 05:11:59 - pico-train - INFO - Step 42150 -- 🔄 Training Metrics
2025-08-30 05:11:59 - pico-train - INFO - ├── Loss: 6.1500
2025-08-30 05:11:59 - pico-train - INFO - ├── Learning Rate: 3.48e-05
2025-08-30 05:11:59 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-30 05:12:12 - pico-train - INFO - Step 42175 -- 🔄 Training Metrics
2025-08-30 05:12:12 - pico-train - INFO - ├── Loss: 6.1793
2025-08-30 05:12:12 - pico-train - INFO - ├── Learning Rate: 3.48e-05
2025-08-30 05:12:12 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-30 05:12:25 - pico-train - INFO - Step 42200 -- 🔄 Training Metrics
2025-08-30 05:12:25 - pico-train - INFO - ├── Loss: 6.0804
2025-08-30 05:12:25 - pico-train - INFO - ├── Learning Rate: 3.48e-05
2025-08-30 05:12:25 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-30 05:12:37 - pico-train - INFO - Step 42225 -- 🔄 Training Metrics
2025-08-30 05:12:37 - pico-train - INFO - ├── Loss: 6.1646
2025-08-30 05:12:37 - pico-train - INFO - ├── Learning Rate: 3.48e-05
2025-08-30 05:12:37 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-30 05:12:50 - pico-train - INFO - Step 42250 -- 🔄 Training Metrics
2025-08-30 05:12:50 - pico-train - INFO - ├── Loss: 6.1414
2025-08-30 05:12:50 - pico-train - INFO - ├── Learning Rate: 3.48e-05
2025-08-30 05:12:50 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-30 05:13:02 - pico-train - INFO - Step 42275 -- 🔄 Training Metrics
2025-08-30 05:13:02 - pico-train - INFO - ├── Loss: 6.0790
2025-08-30 05:13:02 - pico-train - INFO - ├── Learning Rate: 3.47e-05
2025-08-30 05:13:02 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-30 05:13:15 - pico-train - INFO - Step 42300 -- 🔄 Training Metrics
2025-08-30 05:13:15 - pico-train - INFO - ├── Loss: 6.0907
2025-08-30 05:13:15 - pico-train - INFO - ├── Learning Rate: 3.47e-05
2025-08-30 05:13:15 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-30 05:13:27 - pico-train - INFO - Step 42325 -- 🔄 Training Metrics
2025-08-30 05:13:27 - pico-train - INFO - ├── Loss: 6.1426
2025-08-30 05:13:27 - pico-train - INFO - ├── Learning Rate: 3.47e-05
2025-08-30 05:13:27 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-30 05:13:40 - pico-train - INFO - Step 42350 -- 🔄 Training Metrics
2025-08-30 05:13:40 - pico-train - INFO - ├── Loss: 6.1071
2025-08-30 05:13:40 - pico-train - INFO - ├── Learning Rate: 3.47e-05
2025-08-30 05:13:40 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-30 05:13:52 - pico-train - INFO - Step 42375 -- 🔄 Training Metrics
2025-08-30 05:13:52 - pico-train - INFO - ├── Loss: 6.0071
2025-08-30 05:13:52 - pico-train - INFO - ├── Learning Rate: 3.47e-05
2025-08-30 05:13:52 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-30 05:14:05 - pico-train - INFO - Step 42400 -- 🔄 Training Metrics
2025-08-30 05:14:05 - pico-train - INFO - ├── Loss: 6.1562
2025-08-30 05:14:05 - pico-train - INFO - ├── Learning Rate: 3.46e-05
2025-08-30 05:14:05 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-30 05:14:18 - pico-train - INFO - Step 42425 -- 🔄 Training Metrics
2025-08-30 05:14:18 - pico-train - INFO - ├── Loss: 6.1296
2025-08-30 05:14:18 - pico-train - INFO - ├── Learning Rate: 3.46e-05
2025-08-30 05:14:18 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-30 05:14:30 - pico-train - INFO - Step 42450 -- 🔄 Training Metrics
2025-08-30 05:14:30 - pico-train - INFO - ├── Loss: 6.1257
2025-08-30 05:14:30 - pico-train - INFO - ├── Learning Rate: 3.46e-05
2025-08-30 05:14:30 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-30 05:14:43 - pico-train - INFO - Step 42475 -- 🔄 Training Metrics
2025-08-30 05:14:43 - pico-train - INFO - ├── Loss: 6.1398
2025-08-30 05:14:43 - pico-train - INFO - ├── Learning Rate: 3.46e-05
2025-08-30 05:14:43 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-30 05:14:55 - pico-train - INFO - Step 42500 -- 💾 Saving Checkpoint
2025-08-30 05:16:58 - pico-train - INFO - Step 42500 -- 📊 Evaluation Results
2025-08-30 05:16:58 - pico-train - INFO - └── paloma: 3.0154563482458477e+27
2025-08-30 05:17:01 - pico-train - INFO - Step 42500 -- 🔄 Training Metrics
2025-08-30 05:17:01 - pico-train - INFO - ├── Loss: 6.0496
2025-08-30 05:17:01 - pico-train - INFO - ├── Learning Rate: 3.46e-05
2025-08-30 05:17:01 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-30 05:17:01 - pico-train - INFO - Step 42500 -- 📈 Saving Learning Dynamics
2025-08-30 05:17:16 - pico-train - INFO - Step 42525 -- 🔄 Training Metrics
2025-08-30 05:17:16 - pico-train - INFO - ├── Loss: 6.0819
2025-08-30 05:17:16 - pico-train - INFO - ├── Learning Rate: 3.45e-05
2025-08-30 05:17:16 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-30 05:17:29 - pico-train - INFO - Step 42550 -- 🔄 Training Metrics
2025-08-30 05:17:29 - pico-train - INFO - ├── Loss: 6.0871 2025-08-30 05:17:29 - pico-train - INFO - ├── Learning Rate: 3.45e-05 2025-08-30 05:17:29 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 05:17:42 - pico-train - INFO - Step 42575 -- 🔄 Training Metrics 2025-08-30 05:17:42 - pico-train - INFO - ├── Loss: 6.0924 2025-08-30 05:17:42 - pico-train - INFO - ├── Learning Rate: 3.45e-05 2025-08-30 05:17:42 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 05:17:54 - pico-train - INFO - Step 42600 -- 🔄 Training Metrics 2025-08-30 05:17:54 - pico-train - INFO - ├── Loss: 6.0553 2025-08-30 05:17:54 - pico-train - INFO - ├── Learning Rate: 3.45e-05 2025-08-30 05:17:54 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 05:18:07 - pico-train - INFO - Step 42625 -- 🔄 Training Metrics 2025-08-30 05:18:07 - pico-train - INFO - ├── Loss: 6.1371 2025-08-30 05:18:07 - pico-train - INFO - ├── Learning Rate: 3.45e-05 2025-08-30 05:18:07 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 05:18:19 - pico-train - INFO - Step 42650 -- 🔄 Training Metrics 2025-08-30 05:18:19 - pico-train - INFO - ├── Loss: 6.0776 2025-08-30 05:18:19 - pico-train - INFO - ├── Learning Rate: 3.44e-05 2025-08-30 05:18:19 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 05:18:32 - pico-train - INFO - Step 42675 -- 🔄 Training Metrics 2025-08-30 05:18:32 - pico-train - INFO - ├── Loss: 6.1134 2025-08-30 05:18:32 - pico-train - INFO - ├── Learning Rate: 3.44e-05 2025-08-30 05:18:32 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 05:18:45 - pico-train - INFO - Step 42700 -- 🔄 Training Metrics 2025-08-30 05:18:45 - pico-train - INFO - ├── Loss: 5.9718 2025-08-30 05:18:45 - pico-train - INFO - ├── Learning Rate: 3.44e-05 2025-08-30 05:18:45 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 05:18:57 - pico-train - INFO - Step 42725 -- 🔄 Training Metrics 2025-08-30 05:18:57 - pico-train - INFO - ├── Loss: 6.0381 2025-08-30 05:18:57 - pico-train - INFO - ├── Learning Rate: 3.44e-05 
2025-08-30 05:18:57 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 05:19:10 - pico-train - INFO - Step 42750 -- 🔄 Training Metrics 2025-08-30 05:19:10 - pico-train - INFO - ├── Loss: 6.1626 2025-08-30 05:19:10 - pico-train - INFO - ├── Learning Rate: 3.44e-05 2025-08-30 05:19:10 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 05:19:22 - pico-train - INFO - Step 42775 -- 🔄 Training Metrics 2025-08-30 05:19:22 - pico-train - INFO - ├── Loss: 6.0909 2025-08-30 05:19:22 - pico-train - INFO - ├── Learning Rate: 3.43e-05 2025-08-30 05:19:22 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 05:19:35 - pico-train - INFO - Step 42800 -- 🔄 Training Metrics 2025-08-30 05:19:35 - pico-train - INFO - ├── Loss: 6.1275 2025-08-30 05:19:35 - pico-train - INFO - ├── Learning Rate: 3.43e-05 2025-08-30 05:19:35 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 05:19:47 - pico-train - INFO - Step 42825 -- 🔄 Training Metrics 2025-08-30 05:19:47 - pico-train - INFO - ├── Loss: 6.0942 2025-08-30 05:19:47 - pico-train - INFO - ├── Learning Rate: 3.43e-05 2025-08-30 05:19:47 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 05:20:00 - pico-train - INFO - Step 42850 -- 🔄 Training Metrics 2025-08-30 05:20:00 - pico-train - INFO - ├── Loss: 6.0309 2025-08-30 05:20:00 - pico-train - INFO - ├── Learning Rate: 3.43e-05 2025-08-30 05:20:00 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 05:20:12 - pico-train - INFO - Step 42875 -- 🔄 Training Metrics 2025-08-30 05:20:12 - pico-train - INFO - ├── Loss: 6.1312 2025-08-30 05:20:12 - pico-train - INFO - ├── Learning Rate: 3.43e-05 2025-08-30 05:20:12 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 05:20:25 - pico-train - INFO - Step 42900 -- 🔄 Training Metrics 2025-08-30 05:20:25 - pico-train - INFO - ├── Loss: 6.1728 2025-08-30 05:20:25 - pico-train - INFO - ├── Learning Rate: 3.43e-05 2025-08-30 05:20:25 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 05:20:38 - pico-train - INFO - Step 42925 -- 🔄 Training 
Metrics 2025-08-30 05:20:38 - pico-train - INFO - ├── Loss: 5.9740 2025-08-30 05:20:38 - pico-train - INFO - ├── Learning Rate: 3.42e-05 2025-08-30 05:20:38 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 05:20:50 - pico-train - INFO - Step 42950 -- 🔄 Training Metrics 2025-08-30 05:20:50 - pico-train - INFO - ├── Loss: 6.0812 2025-08-30 05:20:50 - pico-train - INFO - ├── Learning Rate: 3.42e-05 2025-08-30 05:20:50 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 05:21:03 - pico-train - INFO - Step 42975 -- 🔄 Training Metrics 2025-08-30 05:21:03 - pico-train - INFO - ├── Loss: 6.0484 2025-08-30 05:21:03 - pico-train - INFO - ├── Learning Rate: 3.42e-05 2025-08-30 05:21:03 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 05:21:15 - pico-train - INFO - Step 43000 -- 💾 Saving Checkpoint 2025-08-30 05:23:15 - pico-train - INFO - Step 43000 -- 📊 Evaluation Results 2025-08-30 05:23:15 - pico-train - INFO - └── paloma: 4.4972099298583296e+27 2025-08-30 05:23:19 - pico-train - INFO - Step 43000 -- 🔄 Training Metrics 2025-08-30 05:23:19 - pico-train - INFO - ├── Loss: 6.2475 2025-08-30 05:23:19 - pico-train - INFO - ├── Learning Rate: 3.42e-05 2025-08-30 05:23:19 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 05:23:19 - pico-train - INFO - Step 43000 -- 📈 Saving Learning Dynamics 2025-08-30 05:23:36 - pico-train - INFO - Step 43025 -- 🔄 Training Metrics 2025-08-30 05:23:36 - pico-train - INFO - ├── Loss: 6.0959 2025-08-30 05:23:36 - pico-train - INFO - ├── Learning Rate: 3.42e-05 2025-08-30 05:23:36 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 05:23:48 - pico-train - INFO - Step 43050 -- 🔄 Training Metrics 2025-08-30 05:23:48 - pico-train - INFO - ├── Loss: 6.0753 2025-08-30 05:23:48 - pico-train - INFO - ├── Learning Rate: 3.41e-05 2025-08-30 05:23:48 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 05:24:01 - pico-train - INFO - Step 43075 -- 🔄 Training Metrics 2025-08-30 05:24:01 - pico-train - INFO - ├── Loss: 6.1130 2025-08-30 
05:24:01 - pico-train - INFO - ├── Learning Rate: 3.41e-05 2025-08-30 05:24:01 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 05:24:13 - pico-train - INFO - Step 43100 -- 🔄 Training Metrics 2025-08-30 05:24:13 - pico-train - INFO - ├── Loss: 6.0777 2025-08-30 05:24:13 - pico-train - INFO - ├── Learning Rate: 3.41e-05 2025-08-30 05:24:13 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 05:24:26 - pico-train - INFO - Step 43125 -- 🔄 Training Metrics 2025-08-30 05:24:26 - pico-train - INFO - ├── Loss: 6.1311 2025-08-30 05:24:26 - pico-train - INFO - ├── Learning Rate: 3.41e-05 2025-08-30 05:24:26 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 05:24:39 - pico-train - INFO - Step 43150 -- 🔄 Training Metrics 2025-08-30 05:24:39 - pico-train - INFO - ├── Loss: 6.0421 2025-08-30 05:24:39 - pico-train - INFO - ├── Learning Rate: 3.41e-05 2025-08-30 05:24:39 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 05:24:52 - pico-train - INFO - Step 43175 -- 🔄 Training Metrics 2025-08-30 05:24:52 - pico-train - INFO - ├── Loss: 6.0355 2025-08-30 05:24:52 - pico-train - INFO - ├── Learning Rate: 3.40e-05 2025-08-30 05:24:52 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 05:25:04 - pico-train - INFO - Step 43200 -- 🔄 Training Metrics 2025-08-30 05:25:04 - pico-train - INFO - ├── Loss: 6.0889 2025-08-30 05:25:04 - pico-train - INFO - ├── Learning Rate: 3.40e-05 2025-08-30 05:25:04 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 05:25:17 - pico-train - INFO - Step 43225 -- 🔄 Training Metrics 2025-08-30 05:25:17 - pico-train - INFO - ├── Loss: 6.0605 2025-08-30 05:25:17 - pico-train - INFO - ├── Learning Rate: 3.40e-05 2025-08-30 05:25:17 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 05:25:30 - pico-train - INFO - Step 43250 -- 🔄 Training Metrics 2025-08-30 05:25:30 - pico-train - INFO - ├── Loss: 6.1064 2025-08-30 05:25:30 - pico-train - INFO - ├── Learning Rate: 3.40e-05 2025-08-30 05:25:30 - pico-train - INFO - └── Inf/NaN count: 0 
2025-08-30 05:25:42 - pico-train - INFO - Step 43275 -- 🔄 Training Metrics 2025-08-30 05:25:42 - pico-train - INFO - ├── Loss: 6.1053 2025-08-30 05:25:42 - pico-train - INFO - ├── Learning Rate: 3.40e-05 2025-08-30 05:25:42 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 05:25:55 - pico-train - INFO - Step 43300 -- 🔄 Training Metrics 2025-08-30 05:25:55 - pico-train - INFO - ├── Loss: 6.1399 2025-08-30 05:25:55 - pico-train - INFO - ├── Learning Rate: 3.39e-05 2025-08-30 05:25:55 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 05:26:07 - pico-train - INFO - Step 43325 -- 🔄 Training Metrics 2025-08-30 05:26:07 - pico-train - INFO - ├── Loss: 6.1271 2025-08-30 05:26:07 - pico-train - INFO - ├── Learning Rate: 3.39e-05 2025-08-30 05:26:07 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 05:26:20 - pico-train - INFO - Step 43350 -- 🔄 Training Metrics 2025-08-30 05:26:20 - pico-train - INFO - ├── Loss: 6.0790 2025-08-30 05:26:20 - pico-train - INFO - ├── Learning Rate: 3.39e-05 2025-08-30 05:26:20 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 05:26:33 - pico-train - INFO - Step 43375 -- 🔄 Training Metrics 2025-08-30 05:26:33 - pico-train - INFO - ├── Loss: 6.0567 2025-08-30 05:26:33 - pico-train - INFO - ├── Learning Rate: 3.39e-05 2025-08-30 05:26:33 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 05:26:45 - pico-train - INFO - Step 43400 -- 🔄 Training Metrics 2025-08-30 05:26:45 - pico-train - INFO - ├── Loss: 6.0771 2025-08-30 05:26:45 - pico-train - INFO - ├── Learning Rate: 3.39e-05 2025-08-30 05:26:45 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 05:26:58 - pico-train - INFO - Step 43425 -- 🔄 Training Metrics 2025-08-30 05:26:58 - pico-train - INFO - ├── Loss: 6.1399 2025-08-30 05:26:58 - pico-train - INFO - ├── Learning Rate: 3.38e-05 2025-08-30 05:26:58 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 05:27:10 - pico-train - INFO - Step 43450 -- 🔄 Training Metrics 2025-08-30 05:27:10 - pico-train - INFO - ├── Loss: 
6.1330 2025-08-30 05:27:10 - pico-train - INFO - ├── Learning Rate: 3.38e-05 2025-08-30 05:27:10 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 05:27:23 - pico-train - INFO - Step 43475 -- 🔄 Training Metrics 2025-08-30 05:27:23 - pico-train - INFO - ├── Loss: 6.0139 2025-08-30 05:27:23 - pico-train - INFO - ├── Learning Rate: 3.38e-05 2025-08-30 05:27:23 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 05:27:35 - pico-train - INFO - Step 43500 -- 💾 Saving Checkpoint 2025-08-30 05:29:43 - pico-train - INFO - Step 43500 -- 📊 Evaluation Results 2025-08-30 05:29:43 - pico-train - INFO - └── paloma: 5.326210528222522e+27 2025-08-30 05:29:45 - pico-train - INFO - Step 43500 -- 🔄 Training Metrics 2025-08-30 05:29:45 - pico-train - INFO - ├── Loss: 6.1439 2025-08-30 05:29:45 - pico-train - INFO - ├── Learning Rate: 3.38e-05 2025-08-30 05:29:45 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 05:29:45 - pico-train - INFO - Step 43500 -- 📈 Saving Learning Dynamics 2025-08-30 05:30:00 - pico-train - INFO - Step 43525 -- 🔄 Training Metrics 2025-08-30 05:30:00 - pico-train - INFO - ├── Loss: 6.0445 2025-08-30 05:30:00 - pico-train - INFO - ├── Learning Rate: 3.38e-05 2025-08-30 05:30:00 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 05:30:12 - pico-train - INFO - Step 43550 -- 🔄 Training Metrics 2025-08-30 05:30:12 - pico-train - INFO - ├── Loss: 6.0780 2025-08-30 05:30:12 - pico-train - INFO - ├── Learning Rate: 3.37e-05 2025-08-30 05:30:12 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 05:30:25 - pico-train - INFO - Step 43575 -- 🔄 Training Metrics 2025-08-30 05:30:25 - pico-train - INFO - ├── Loss: 6.0044 2025-08-30 05:30:25 - pico-train - INFO - ├── Learning Rate: 3.37e-05 2025-08-30 05:30:25 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 05:30:38 - pico-train - INFO - Step 43600 -- 🔄 Training Metrics 2025-08-30 05:30:38 - pico-train - INFO - ├── Loss: 6.0087 2025-08-30 05:30:38 - pico-train - INFO - ├── Learning Rate: 3.37e-05 
2025-08-30 05:30:38 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 05:30:50 - pico-train - INFO - Step 43625 -- 🔄 Training Metrics 2025-08-30 05:30:50 - pico-train - INFO - ├── Loss: 6.1263 2025-08-30 05:30:50 - pico-train - INFO - ├── Learning Rate: 3.37e-05 2025-08-30 05:30:50 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 05:31:03 - pico-train - INFO - Step 43650 -- 🔄 Training Metrics 2025-08-30 05:31:03 - pico-train - INFO - ├── Loss: 6.0459 2025-08-30 05:31:03 - pico-train - INFO - ├── Learning Rate: 3.37e-05 2025-08-30 05:31:03 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 05:31:15 - pico-train - INFO - Step 43675 -- 🔄 Training Metrics 2025-08-30 05:31:15 - pico-train - INFO - ├── Loss: 6.0390 2025-08-30 05:31:15 - pico-train - INFO - ├── Learning Rate: 3.36e-05 2025-08-30 05:31:15 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 05:31:28 - pico-train - INFO - Step 43700 -- 🔄 Training Metrics 2025-08-30 05:31:28 - pico-train - INFO - ├── Loss: 6.0918 2025-08-30 05:31:28 - pico-train - INFO - ├── Learning Rate: 3.36e-05 2025-08-30 05:31:28 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 05:31:41 - pico-train - INFO - Step 43725 -- 🔄 Training Metrics 2025-08-30 05:31:41 - pico-train - INFO - ├── Loss: 6.0426 2025-08-30 05:31:41 - pico-train - INFO - ├── Learning Rate: 3.36e-05 2025-08-30 05:31:41 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 05:31:53 - pico-train - INFO - Step 43750 -- 🔄 Training Metrics 2025-08-30 05:31:53 - pico-train - INFO - ├── Loss: 6.0634 2025-08-30 05:31:53 - pico-train - INFO - ├── Learning Rate: 3.36e-05 2025-08-30 05:31:53 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 05:32:06 - pico-train - INFO - Step 43775 -- 🔄 Training Metrics 2025-08-30 05:32:06 - pico-train - INFO - ├── Loss: 6.1042 2025-08-30 05:32:06 - pico-train - INFO - ├── Learning Rate: 3.36e-05 2025-08-30 05:32:06 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 05:32:18 - pico-train - INFO - Step 43800 -- 🔄 Training 
Metrics 2025-08-30 05:32:18 - pico-train - INFO - ├── Loss: 6.0510 2025-08-30 05:32:18 - pico-train - INFO - ├── Learning Rate: 3.35e-05 2025-08-30 05:32:18 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 05:32:31 - pico-train - INFO - Step 43825 -- 🔄 Training Metrics 2025-08-30 05:32:31 - pico-train - INFO - ├── Loss: 6.0403 2025-08-30 05:32:31 - pico-train - INFO - ├── Learning Rate: 3.35e-05 2025-08-30 05:32:31 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 05:32:43 - pico-train - INFO - Step 43850 -- 🔄 Training Metrics 2025-08-30 05:32:43 - pico-train - INFO - ├── Loss: 6.0537 2025-08-30 05:32:43 - pico-train - INFO - ├── Learning Rate: 3.35e-05 2025-08-30 05:32:43 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 05:32:56 - pico-train - INFO - Step 43875 -- 🔄 Training Metrics 2025-08-30 05:32:56 - pico-train - INFO - ├── Loss: 6.1244 2025-08-30 05:32:56 - pico-train - INFO - ├── Learning Rate: 3.35e-05 2025-08-30 05:32:56 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 05:33:09 - pico-train - INFO - Step 43900 -- 🔄 Training Metrics 2025-08-30 05:33:09 - pico-train - INFO - ├── Loss: 6.1294 2025-08-30 05:33:09 - pico-train - INFO - ├── Learning Rate: 3.35e-05 2025-08-30 05:33:09 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 05:33:21 - pico-train - INFO - Step 43925 -- 🔄 Training Metrics 2025-08-30 05:33:21 - pico-train - INFO - ├── Loss: 6.0845 2025-08-30 05:33:21 - pico-train - INFO - ├── Learning Rate: 3.34e-05 2025-08-30 05:33:21 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 05:33:34 - pico-train - INFO - Step 43950 -- 🔄 Training Metrics 2025-08-30 05:33:34 - pico-train - INFO - ├── Loss: 6.0365 2025-08-30 05:33:34 - pico-train - INFO - ├── Learning Rate: 3.34e-05 2025-08-30 05:33:34 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 05:33:46 - pico-train - INFO - Step 43975 -- 🔄 Training Metrics 2025-08-30 05:33:46 - pico-train - INFO - ├── Loss: 6.0507 2025-08-30 05:33:46 - pico-train - INFO - ├── Learning Rate: 
3.34e-05 2025-08-30 05:33:46 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 05:33:58 - pico-train - INFO - Step 44000 -- 💾 Saving Checkpoint 2025-08-30 05:35:55 - pico-train - INFO - Step 44000 -- 📊 Evaluation Results 2025-08-30 05:35:55 - pico-train - INFO - └── paloma: 1.0515089806395209e+28 2025-08-30 05:35:57 - pico-train - INFO - Step 44000 -- 🔄 Training Metrics 2025-08-30 05:35:57 - pico-train - INFO - ├── Loss: 5.9669 2025-08-30 05:35:57 - pico-train - INFO - ├── Learning Rate: 3.34e-05 2025-08-30 05:35:57 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 05:35:57 - pico-train - INFO - Step 44000 -- 📈 Saving Learning Dynamics 2025-08-30 05:36:12 - pico-train - INFO - Step 44025 -- 🔄 Training Metrics 2025-08-30 05:36:12 - pico-train - INFO - ├── Loss: 6.0454 2025-08-30 05:36:12 - pico-train - INFO - ├── Learning Rate: 3.34e-05 2025-08-30 05:36:12 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 05:36:25 - pico-train - INFO - Step 44050 -- 🔄 Training Metrics 2025-08-30 05:36:25 - pico-train - INFO - ├── Loss: 6.0395 2025-08-30 05:36:25 - pico-train - INFO - ├── Learning Rate: 3.33e-05 2025-08-30 05:36:25 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 05:36:38 - pico-train - INFO - Step 44075 -- 🔄 Training Metrics 2025-08-30 05:36:38 - pico-train - INFO - ├── Loss: 5.9733 2025-08-30 05:36:38 - pico-train - INFO - ├── Learning Rate: 3.33e-05 2025-08-30 05:36:38 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 05:36:50 - pico-train - INFO - Step 44100 -- 🔄 Training Metrics 2025-08-30 05:36:50 - pico-train - INFO - ├── Loss: 6.1172 2025-08-30 05:36:50 - pico-train - INFO - ├── Learning Rate: 3.33e-05 2025-08-30 05:36:50 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 05:37:03 - pico-train - INFO - Step 44125 -- 🔄 Training Metrics 2025-08-30 05:37:03 - pico-train - INFO - ├── Loss: 6.0527 2025-08-30 05:37:03 - pico-train - INFO - ├── Learning Rate: 3.33e-05 2025-08-30 05:37:03 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 
05:37:15 - pico-train - INFO - Step 44150 -- 🔄 Training Metrics 2025-08-30 05:37:15 - pico-train - INFO - ├── Loss: 6.0853 2025-08-30 05:37:15 - pico-train - INFO - ├── Learning Rate: 3.33e-05 2025-08-30 05:37:15 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 05:37:28 - pico-train - INFO - Step 44175 -- 🔄 Training Metrics 2025-08-30 05:37:28 - pico-train - INFO - ├── Loss: 6.0303 2025-08-30 05:37:28 - pico-train - INFO - ├── Learning Rate: 3.32e-05 2025-08-30 05:37:28 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 05:37:40 - pico-train - INFO - Step 44200 -- 🔄 Training Metrics 2025-08-30 05:37:40 - pico-train - INFO - ├── Loss: 5.9986 2025-08-30 05:37:40 - pico-train - INFO - ├── Learning Rate: 3.32e-05 2025-08-30 05:37:40 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 05:37:53 - pico-train - INFO - Step 44225 -- 🔄 Training Metrics 2025-08-30 05:37:53 - pico-train - INFO - ├── Loss: 6.0450 2025-08-30 05:37:53 - pico-train - INFO - ├── Learning Rate: 3.32e-05 2025-08-30 05:37:53 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 05:38:06 - pico-train - INFO - Step 44250 -- 🔄 Training Metrics 2025-08-30 05:38:06 - pico-train - INFO - ├── Loss: 6.0449 2025-08-30 05:38:06 - pico-train - INFO - ├── Learning Rate: 3.32e-05 2025-08-30 05:38:06 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 05:38:18 - pico-train - INFO - Step 44275 -- 🔄 Training Metrics 2025-08-30 05:38:18 - pico-train - INFO - ├── Loss: 6.0811 2025-08-30 05:38:18 - pico-train - INFO - ├── Learning Rate: 3.32e-05 2025-08-30 05:38:18 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 05:38:31 - pico-train - INFO - Step 44300 -- 🔄 Training Metrics 2025-08-30 05:38:31 - pico-train - INFO - ├── Loss: 6.0524 2025-08-30 05:38:31 - pico-train - INFO - ├── Learning Rate: 3.31e-05 2025-08-30 05:38:31 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 05:38:43 - pico-train - INFO - Step 44325 -- 🔄 Training Metrics 2025-08-30 05:38:43 - pico-train - INFO - ├── Loss: 6.0148 
2025-08-30 05:38:43 - pico-train - INFO - ├── Learning Rate: 3.31e-05 2025-08-30 05:38:43 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 05:38:56 - pico-train - INFO - Step 44350 -- 🔄 Training Metrics 2025-08-30 05:38:56 - pico-train - INFO - ├── Loss: 6.0216 2025-08-30 05:38:56 - pico-train - INFO - ├── Learning Rate: 3.31e-05 2025-08-30 05:38:56 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 05:39:08 - pico-train - INFO - Step 44375 -- 🔄 Training Metrics 2025-08-30 05:39:08 - pico-train - INFO - ├── Loss: 5.9966 2025-08-30 05:39:08 - pico-train - INFO - ├── Learning Rate: 3.31e-05 2025-08-30 05:39:08 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 05:39:21 - pico-train - INFO - Step 44400 -- 🔄 Training Metrics 2025-08-30 05:39:21 - pico-train - INFO - ├── Loss: 6.0301 2025-08-30 05:39:21 - pico-train - INFO - ├── Learning Rate: 3.30e-05 2025-08-30 05:39:21 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 05:39:34 - pico-train - INFO - Step 44425 -- 🔄 Training Metrics 2025-08-30 05:39:34 - pico-train - INFO - ├── Loss: 6.1473 2025-08-30 05:39:34 - pico-train - INFO - ├── Learning Rate: 3.30e-05 2025-08-30 05:39:34 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 05:39:46 - pico-train - INFO - Step 44450 -- 🔄 Training Metrics 2025-08-30 05:39:46 - pico-train - INFO - ├── Loss: 6.0092 2025-08-30 05:39:46 - pico-train - INFO - ├── Learning Rate: 3.30e-05 2025-08-30 05:39:46 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 05:39:59 - pico-train - INFO - Step 44475 -- 🔄 Training Metrics 2025-08-30 05:39:59 - pico-train - INFO - ├── Loss: 6.0807 2025-08-30 05:39:59 - pico-train - INFO - ├── Learning Rate: 3.30e-05 2025-08-30 05:39:59 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 05:40:11 - pico-train - INFO - Step 44500 -- 💾 Saving Checkpoint 2025-08-30 05:42:28 - pico-train - INFO - Step 44500 -- 📊 Evaluation Results 2025-08-30 05:42:28 - pico-train - INFO - └── paloma: 9.953158679071717e+27 2025-08-30 05:42:31 - pico-train - 
INFO - Step 44500 -- 🔄 Training Metrics 2025-08-30 05:42:31 - pico-train - INFO - ├── Loss: 6.0974 2025-08-30 05:42:31 - pico-train - INFO - ├── Learning Rate: 3.30e-05 2025-08-30 05:42:31 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 05:42:31 - pico-train - INFO - Step 44500 -- 📈 Saving Learning Dynamics 2025-08-30 05:42:46 - pico-train - INFO - Step 44525 -- 🔄 Training Metrics 2025-08-30 05:42:46 - pico-train - INFO - ├── Loss: 6.0606 2025-08-30 05:42:46 - pico-train - INFO - ├── Learning Rate: 3.29e-05 2025-08-30 05:42:46 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 05:42:59 - pico-train - INFO - Step 44550 -- 🔄 Training Metrics 2025-08-30 05:42:59 - pico-train - INFO - ├── Loss: 6.0374 2025-08-30 05:42:59 - pico-train - INFO - ├── Learning Rate: 3.29e-05 2025-08-30 05:42:59 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 05:43:11 - pico-train - INFO - Step 44575 -- 🔄 Training Metrics 2025-08-30 05:43:11 - pico-train - INFO - ├── Loss: 5.9995 2025-08-30 05:43:11 - pico-train - INFO - ├── Learning Rate: 3.29e-05 2025-08-30 05:43:11 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 05:43:24 - pico-train - INFO - Step 44600 -- 🔄 Training Metrics 2025-08-30 05:43:24 - pico-train - INFO - ├── Loss: 6.0354 2025-08-30 05:43:24 - pico-train - INFO - ├── Learning Rate: 3.29e-05 2025-08-30 05:43:24 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 05:43:36 - pico-train - INFO - Step 44625 -- 🔄 Training Metrics 2025-08-30 05:43:36 - pico-train - INFO - ├── Loss: 6.0512 2025-08-30 05:43:36 - pico-train - INFO - ├── Learning Rate: 3.29e-05 2025-08-30 05:43:36 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 05:43:49 - pico-train - INFO - Step 44650 -- 🔄 Training Metrics 2025-08-30 05:43:49 - pico-train - INFO - ├── Loss: 5.9998 2025-08-30 05:43:49 - pico-train - INFO - ├── Learning Rate: 3.28e-05 2025-08-30 05:43:49 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 05:44:01 - pico-train - INFO - Step 44675 -- 🔄 Training Metrics 
2025-08-30 05:44:01 - pico-train - INFO - ├── Loss: 6.0010 2025-08-30 05:44:01 - pico-train - INFO - ├── Learning Rate: 3.28e-05 2025-08-30 05:44:01 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 05:44:14 - pico-train - INFO - Step 44700 -- 🔄 Training Metrics 2025-08-30 05:44:14 - pico-train - INFO - ├── Loss: 6.0795 2025-08-30 05:44:14 - pico-train - INFO - ├── Learning Rate: 3.28e-05 2025-08-30 05:44:14 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 05:44:26 - pico-train - INFO - Step 44725 -- 🔄 Training Metrics 2025-08-30 05:44:26 - pico-train - INFO - ├── Loss: 6.0255 2025-08-30 05:44:26 - pico-train - INFO - ├── Learning Rate: 3.28e-05 2025-08-30 05:44:26 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 05:44:39 - pico-train - INFO - Step 44750 -- 🔄 Training Metrics 2025-08-30 05:44:39 - pico-train - INFO - ├── Loss: 6.0648 2025-08-30 05:44:39 - pico-train - INFO - ├── Learning Rate: 3.28e-05 2025-08-30 05:44:39 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 05:44:52 - pico-train - INFO - Step 44775 -- 🔄 Training Metrics 2025-08-30 05:44:52 - pico-train - INFO - ├── Loss: 6.0873 2025-08-30 05:44:52 - pico-train - INFO - ├── Learning Rate: 3.27e-05 2025-08-30 05:44:52 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 05:45:04 - pico-train - INFO - Step 44800 -- 🔄 Training Metrics 2025-08-30 05:45:04 - pico-train - INFO - ├── Loss: 6.0366 2025-08-30 05:45:04 - pico-train - INFO - ├── Learning Rate: 3.27e-05 2025-08-30 05:45:04 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 05:45:17 - pico-train - INFO - Step 44825 -- 🔄 Training Metrics 2025-08-30 05:45:17 - pico-train - INFO - ├── Loss: 6.0182 2025-08-30 05:45:17 - pico-train - INFO - ├── Learning Rate: 3.27e-05 2025-08-30 05:45:17 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 05:45:29 - pico-train - INFO - Step 44850 -- 🔄 Training Metrics 2025-08-30 05:45:29 - pico-train - INFO - ├── Loss: 6.0006 2025-08-30 05:45:29 - pico-train - INFO - ├── Learning Rate: 3.27e-05 
2025-08-30 05:45:29 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 05:45:42 - pico-train - INFO - Step 44875 -- 🔄 Training Metrics 2025-08-30 05:45:42 - pico-train - INFO - ├── Loss: 6.0773 2025-08-30 05:45:42 - pico-train - INFO - ├── Learning Rate: 3.27e-05 2025-08-30 05:45:42 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 05:45:54 - pico-train - INFO - Step 44900 -- 🔄 Training Metrics 2025-08-30 05:45:54 - pico-train - INFO - ├── Loss: 6.0644 2025-08-30 05:45:54 - pico-train - INFO - ├── Learning Rate: 3.26e-05 2025-08-30 05:45:54 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 05:46:07 - pico-train - INFO - Step 44925 -- 🔄 Training Metrics 2025-08-30 05:46:07 - pico-train - INFO - ├── Loss: 6.0927 2025-08-30 05:46:07 - pico-train - INFO - ├── Learning Rate: 3.26e-05 2025-08-30 05:46:07 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 05:46:20 - pico-train - INFO - Step 44950 -- 🔄 Training Metrics 2025-08-30 05:46:20 - pico-train - INFO - ├── Loss: 6.0458 2025-08-30 05:46:20 - pico-train - INFO - ├── Learning Rate: 3.26e-05 2025-08-30 05:46:20 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 05:46:32 - pico-train - INFO - Step 44975 -- 🔄 Training Metrics 2025-08-30 05:46:32 - pico-train - INFO - ├── Loss: 6.0466 2025-08-30 05:46:32 - pico-train - INFO - ├── Learning Rate: 3.26e-05 2025-08-30 05:46:32 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 05:46:44 - pico-train - INFO - Step 45000 -- 💾 Saving Checkpoint 2025-08-30 05:48:48 - pico-train - INFO - Step 45000 -- 📊 Evaluation Results 2025-08-30 05:48:48 - pico-train - INFO - └── paloma: 1.3981708485109732e+28 2025-08-30 05:48:50 - pico-train - INFO - Step 45000 -- 🔄 Training Metrics 2025-08-30 05:48:50 - pico-train - INFO - ├── Loss: 6.0790 2025-08-30 05:48:50 - pico-train - INFO - ├── Learning Rate: 3.26e-05 2025-08-30 05:48:50 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 05:48:50 - pico-train - INFO - Step 45000 -- 📈 Saving Learning Dynamics 2025-08-30 05:49:05 - 
pico-train - INFO - Step 45025 -- 🔄 Training Metrics 2025-08-30 05:49:05 - pico-train - INFO - ├── Loss: 6.0231 2025-08-30 05:49:05 - pico-train - INFO - ├── Learning Rate: 3.25e-05 2025-08-30 05:49:05 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 05:49:17 - pico-train - INFO - Step 45050 -- 🔄 Training Metrics 2025-08-30 05:49:17 - pico-train - INFO - ├── Loss: 6.0257 2025-08-30 05:49:17 - pico-train - INFO - ├── Learning Rate: 3.25e-05 2025-08-30 05:49:17 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 05:49:30 - pico-train - INFO - Step 45075 -- 🔄 Training Metrics 2025-08-30 05:49:30 - pico-train - INFO - ├── Loss: 6.0401 2025-08-30 05:49:30 - pico-train - INFO - ├── Learning Rate: 3.25e-05 2025-08-30 05:49:30 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 05:49:43 - pico-train - INFO - Step 45100 -- 🔄 Training Metrics 2025-08-30 05:49:43 - pico-train - INFO - ├── Loss: 6.0050 2025-08-30 05:49:43 - pico-train - INFO - ├── Learning Rate: 3.25e-05 2025-08-30 05:49:43 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 05:49:56 - pico-train - INFO - Step 45125 -- 🔄 Training Metrics 2025-08-30 05:49:56 - pico-train - INFO - ├── Loss: 6.0666 2025-08-30 05:49:56 - pico-train - INFO - ├── Learning Rate: 3.25e-05 2025-08-30 05:49:56 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 05:50:08 - pico-train - INFO - Step 45150 -- 🔄 Training Metrics 2025-08-30 05:50:08 - pico-train - INFO - ├── Loss: 6.0214 2025-08-30 05:50:08 - pico-train - INFO - ├── Learning Rate: 3.24e-05 2025-08-30 05:50:08 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 05:50:21 - pico-train - INFO - Step 45175 -- 🔄 Training Metrics 2025-08-30 05:50:21 - pico-train - INFO - ├── Loss: 6.1788 2025-08-30 05:50:21 - pico-train - INFO - ├── Learning Rate: 3.24e-05 2025-08-30 05:50:21 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 05:50:33 - pico-train - INFO - Step 45200 -- 🔄 Training Metrics 2025-08-30 05:50:33 - pico-train - INFO - ├── Loss: 6.0156 2025-08-30 
05:50:33 - pico-train - INFO - ├── Learning Rate: 3.24e-05 2025-08-30 05:50:33 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 05:50:46 - pico-train - INFO - Step 45225 -- 🔄 Training Metrics 2025-08-30 05:50:46 - pico-train - INFO - ├── Loss: 6.0201 2025-08-30 05:50:46 - pico-train - INFO - ├── Learning Rate: 3.24e-05 2025-08-30 05:50:46 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 05:50:59 - pico-train - INFO - Step 45250 -- 🔄 Training Metrics 2025-08-30 05:50:59 - pico-train - INFO - ├── Loss: 6.0011 2025-08-30 05:50:59 - pico-train - INFO - ├── Learning Rate: 3.24e-05 2025-08-30 05:50:59 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 05:51:11 - pico-train - INFO - Step 45275 -- 🔄 Training Metrics 2025-08-30 05:51:11 - pico-train - INFO - ├── Loss: 6.1612 2025-08-30 05:51:11 - pico-train - INFO - ├── Learning Rate: 3.23e-05 2025-08-30 05:51:11 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 05:51:24 - pico-train - INFO - Step 45300 -- 🔄 Training Metrics 2025-08-30 05:51:24 - pico-train - INFO - ├── Loss: 6.0480 2025-08-30 05:51:24 - pico-train - INFO - ├── Learning Rate: 3.23e-05 2025-08-30 05:51:24 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 05:51:36 - pico-train - INFO - Step 45325 -- 🔄 Training Metrics 2025-08-30 05:51:36 - pico-train - INFO - ├── Loss: 5.9685 2025-08-30 05:51:36 - pico-train - INFO - ├── Learning Rate: 3.23e-05 2025-08-30 05:51:36 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 05:51:49 - pico-train - INFO - Step 45350 -- 🔄 Training Metrics 2025-08-30 05:51:49 - pico-train - INFO - ├── Loss: 6.0803 2025-08-30 05:51:49 - pico-train - INFO - ├── Learning Rate: 3.23e-05 2025-08-30 05:51:49 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 05:52:02 - pico-train - INFO - Step 45375 -- 🔄 Training Metrics 2025-08-30 05:52:02 - pico-train - INFO - ├── Loss: 6.0258 2025-08-30 05:52:02 - pico-train - INFO - ├── Learning Rate: 3.23e-05 2025-08-30 05:52:02 - pico-train - INFO - └── Inf/NaN count: 0 
2025-08-30 05:52:14 - pico-train - INFO - Step 45400 -- 🔄 Training Metrics 2025-08-30 05:52:14 - pico-train - INFO - ├── Loss: 6.0367 2025-08-30 05:52:14 - pico-train - INFO - ├── Learning Rate: 3.22e-05 2025-08-30 05:52:14 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 05:52:27 - pico-train - INFO - Step 45425 -- 🔄 Training Metrics 2025-08-30 05:52:27 - pico-train - INFO - ├── Loss: 5.9915 2025-08-30 05:52:27 - pico-train - INFO - ├── Learning Rate: 3.22e-05 2025-08-30 05:52:27 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 05:52:39 - pico-train - INFO - Step 45450 -- 🔄 Training Metrics 2025-08-30 05:52:39 - pico-train - INFO - ├── Loss: 5.9926 2025-08-30 05:52:39 - pico-train - INFO - ├── Learning Rate: 3.22e-05 2025-08-30 05:52:39 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 05:52:52 - pico-train - INFO - Step 45475 -- 🔄 Training Metrics 2025-08-30 05:52:52 - pico-train - INFO - ├── Loss: 5.9767 2025-08-30 05:52:52 - pico-train - INFO - ├── Learning Rate: 3.22e-05 2025-08-30 05:52:52 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 05:53:04 - pico-train - INFO - Step 45500 -- 💾 Saving Checkpoint 2025-08-30 05:55:00 - pico-train - INFO - Step 45500 -- 📊 Evaluation Results 2025-08-30 05:55:00 - pico-train - INFO - └── paloma: 2.1286507820171466e+28 2025-08-30 05:55:02 - pico-train - INFO - Step 45500 -- 🔄 Training Metrics 2025-08-30 05:55:02 - pico-train - INFO - ├── Loss: 6.0752 2025-08-30 05:55:02 - pico-train - INFO - ├── Learning Rate: 3.22e-05 2025-08-30 05:55:02 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 05:55:02 - pico-train - INFO - Step 45500 -- 📈 Saving Learning Dynamics 2025-08-30 05:55:17 - pico-train - INFO - Step 45525 -- 🔄 Training Metrics 2025-08-30 05:55:17 - pico-train - INFO - ├── Loss: 6.0444 2025-08-30 05:55:17 - pico-train - INFO - ├── Learning Rate: 3.21e-05 2025-08-30 05:55:17 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 05:55:29 - pico-train - INFO - Step 45550 -- 🔄 Training Metrics 
2025-08-30 05:55:29 - pico-train - INFO - ├── Loss: 6.0119 2025-08-30 05:55:29 - pico-train - INFO - ├── Learning Rate: 3.21e-05 2025-08-30 05:55:29 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 05:55:42 - pico-train - INFO - Step 45575 -- 🔄 Training Metrics 2025-08-30 05:55:42 - pico-train - INFO - ├── Loss: 6.0627 2025-08-30 05:55:42 - pico-train - INFO - ├── Learning Rate: 3.21e-05 2025-08-30 05:55:42 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 05:55:55 - pico-train - INFO - Step 45600 -- 🔄 Training Metrics 2025-08-30 05:55:55 - pico-train - INFO - ├── Loss: 5.9389 2025-08-30 05:55:55 - pico-train - INFO - ├── Learning Rate: 3.21e-05 2025-08-30 05:55:55 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 05:56:07 - pico-train - INFO - Step 45625 -- 🔄 Training Metrics 2025-08-30 05:56:07 - pico-train - INFO - ├── Loss: 6.1041 2025-08-30 05:56:07 - pico-train - INFO - ├── Learning Rate: 3.21e-05 2025-08-30 05:56:07 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 05:56:20 - pico-train - INFO - Step 45650 -- 🔄 Training Metrics 2025-08-30 05:56:20 - pico-train - INFO - ├── Loss: 6.0837 2025-08-30 05:56:20 - pico-train - INFO - ├── Learning Rate: 3.20e-05 2025-08-30 05:56:20 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 05:56:33 - pico-train - INFO - Step 45675 -- 🔄 Training Metrics 2025-08-30 05:56:33 - pico-train - INFO - ├── Loss: 6.0495 2025-08-30 05:56:33 - pico-train - INFO - ├── Learning Rate: 3.20e-05 2025-08-30 05:56:33 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 05:56:45 - pico-train - INFO - Step 45700 -- 🔄 Training Metrics 2025-08-30 05:56:45 - pico-train - INFO - ├── Loss: 6.0507 2025-08-30 05:56:45 - pico-train - INFO - ├── Learning Rate: 3.20e-05 2025-08-30 05:56:45 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 05:56:58 - pico-train - INFO - Step 45725 -- 🔄 Training Metrics 2025-08-30 05:56:58 - pico-train - INFO - ├── Loss: 6.0594 2025-08-30 05:56:58 - pico-train - INFO - ├── Learning Rate: 3.20e-05 
2025-08-30 05:56:58 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 05:57:11 - pico-train - INFO - Step 45750 -- 🔄 Training Metrics 2025-08-30 05:57:11 - pico-train - INFO - ├── Loss: 6.0685 2025-08-30 05:57:11 - pico-train - INFO - ├── Learning Rate: 3.20e-05 2025-08-30 05:57:11 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 05:57:23 - pico-train - INFO - Step 45775 -- 🔄 Training Metrics 2025-08-30 05:57:23 - pico-train - INFO - ├── Loss: 6.0040 2025-08-30 05:57:23 - pico-train - INFO - ├── Learning Rate: 3.19e-05 2025-08-30 05:57:23 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 05:57:36 - pico-train - INFO - Step 45800 -- 🔄 Training Metrics 2025-08-30 05:57:36 - pico-train - INFO - ├── Loss: 6.0630 2025-08-30 05:57:36 - pico-train - INFO - ├── Learning Rate: 3.19e-05 2025-08-30 05:57:36 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 05:57:48 - pico-train - INFO - Step 45825 -- 🔄 Training Metrics 2025-08-30 05:57:48 - pico-train - INFO - ├── Loss: 6.0334 2025-08-30 05:57:48 - pico-train - INFO - ├── Learning Rate: 3.19e-05 2025-08-30 05:57:48 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 05:58:01 - pico-train - INFO - Step 45850 -- 🔄 Training Metrics 2025-08-30 05:58:01 - pico-train - INFO - ├── Loss: 6.0141 2025-08-30 05:58:01 - pico-train - INFO - ├── Learning Rate: 3.19e-05 2025-08-30 05:58:01 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 05:58:14 - pico-train - INFO - Step 45875 -- 🔄 Training Metrics 2025-08-30 05:58:14 - pico-train - INFO - ├── Loss: 6.0175 2025-08-30 05:58:14 - pico-train - INFO - ├── Learning Rate: 3.18e-05 2025-08-30 05:58:14 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 05:58:27 - pico-train - INFO - Step 45900 -- 🔄 Training Metrics 2025-08-30 05:58:27 - pico-train - INFO - ├── Loss: 6.0745 2025-08-30 05:58:27 - pico-train - INFO - ├── Learning Rate: 3.18e-05 2025-08-30 05:58:27 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 05:58:39 - pico-train - INFO - Step 45925 -- 🔄 Training 
Metrics 2025-08-30 05:58:39 - pico-train - INFO - ├── Loss: 6.0172 2025-08-30 05:58:39 - pico-train - INFO - ├── Learning Rate: 3.18e-05 2025-08-30 05:58:39 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 05:58:52 - pico-train - INFO - Step 45950 -- 🔄 Training Metrics 2025-08-30 05:58:52 - pico-train - INFO - ├── Loss: 5.9627 2025-08-30 05:58:52 - pico-train - INFO - ├── Learning Rate: 3.18e-05 2025-08-30 05:58:52 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 05:59:04 - pico-train - INFO - Step 45975 -- 🔄 Training Metrics 2025-08-30 05:59:04 - pico-train - INFO - ├── Loss: 5.9906 2025-08-30 05:59:04 - pico-train - INFO - ├── Learning Rate: 3.18e-05 2025-08-30 05:59:04 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 05:59:17 - pico-train - INFO - Step 46000 -- 💾 Saving Checkpoint 2025-08-30 06:01:12 - pico-train - INFO - Step 46000 -- 📊 Evaluation Results 2025-08-30 06:01:12 - pico-train - INFO - └── paloma: 2.287805203128674e+28 2025-08-30 06:01:14 - pico-train - INFO - Step 46000 -- 🔄 Training Metrics 2025-08-30 06:01:14 - pico-train - INFO - ├── Loss: 6.0973 2025-08-30 06:01:14 - pico-train - INFO - ├── Learning Rate: 3.17e-05 2025-08-30 06:01:14 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 06:01:14 - pico-train - INFO - Step 46000 -- 📈 Saving Learning Dynamics 2025-08-30 06:01:29 - pico-train - INFO - Step 46025 -- 🔄 Training Metrics 2025-08-30 06:01:29 - pico-train - INFO - ├── Loss: 5.9999 2025-08-30 06:01:29 - pico-train - INFO - ├── Learning Rate: 3.17e-05 2025-08-30 06:01:29 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 06:01:41 - pico-train - INFO - Step 46050 -- 🔄 Training Metrics 2025-08-30 06:01:41 - pico-train - INFO - ├── Loss: 5.9786 2025-08-30 06:01:41 - pico-train - INFO - ├── Learning Rate: 3.17e-05 2025-08-30 06:01:41 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 06:01:54 - pico-train - INFO - Step 46075 -- 🔄 Training Metrics 2025-08-30 06:01:54 - pico-train - INFO - ├── Loss: 6.0511 2025-08-30 
06:01:54 - pico-train - INFO - ├── Learning Rate: 3.17e-05 2025-08-30 06:01:54 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 06:02:06 - pico-train - INFO - Step 46100 -- 🔄 Training Metrics 2025-08-30 06:02:06 - pico-train - INFO - ├── Loss: 5.9915 2025-08-30 06:02:06 - pico-train - INFO - ├── Learning Rate: 3.17e-05 2025-08-30 06:02:06 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 06:02:19 - pico-train - INFO - Step 46125 -- 🔄 Training Metrics 2025-08-30 06:02:19 - pico-train - INFO - ├── Loss: 6.0164 2025-08-30 06:02:19 - pico-train - INFO - ├── Learning Rate: 3.16e-05 2025-08-30 06:02:19 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 06:02:32 - pico-train - INFO - Step 46150 -- 🔄 Training Metrics 2025-08-30 06:02:32 - pico-train - INFO - ├── Loss: 6.0278 2025-08-30 06:02:32 - pico-train - INFO - ├── Learning Rate: 3.16e-05 2025-08-30 06:02:32 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 06:02:45 - pico-train - INFO - Step 46175 -- 🔄 Training Metrics 2025-08-30 06:02:45 - pico-train - INFO - ├── Loss: 5.9636 2025-08-30 06:02:45 - pico-train - INFO - ├── Learning Rate: 3.16e-05 2025-08-30 06:02:45 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 06:02:57 - pico-train - INFO - Step 46200 -- 🔄 Training Metrics 2025-08-30 06:02:57 - pico-train - INFO - ├── Loss: 5.9233 2025-08-30 06:02:57 - pico-train - INFO - ├── Learning Rate: 3.16e-05 2025-08-30 06:02:57 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 06:03:10 - pico-train - INFO - Step 46225 -- 🔄 Training Metrics 2025-08-30 06:03:10 - pico-train - INFO - ├── Loss: 6.1381 2025-08-30 06:03:10 - pico-train - INFO - ├── Learning Rate: 3.16e-05 2025-08-30 06:03:10 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 06:03:23 - pico-train - INFO - Step 46250 -- 🔄 Training Metrics 2025-08-30 06:03:23 - pico-train - INFO - ├── Loss: 5.9423 2025-08-30 06:03:23 - pico-train - INFO - ├── Learning Rate: 3.15e-05 2025-08-30 06:03:23 - pico-train - INFO - └── Inf/NaN count: 0 
2025-08-30 06:03:36 - pico-train - INFO - Step 46275 -- 🔄 Training Metrics 2025-08-30 06:03:36 - pico-train - INFO - ├── Loss: 5.9885 2025-08-30 06:03:36 - pico-train - INFO - ├── Learning Rate: 3.15e-05 2025-08-30 06:03:36 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 06:03:48 - pico-train - INFO - Step 46300 -- 🔄 Training Metrics 2025-08-30 06:03:48 - pico-train - INFO - ├── Loss: 6.0572 2025-08-30 06:03:48 - pico-train - INFO - ├── Learning Rate: 3.15e-05 2025-08-30 06:03:48 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 06:04:01 - pico-train - INFO - Step 46325 -- 🔄 Training Metrics 2025-08-30 06:04:01 - pico-train - INFO - ├── Loss: 6.0765 2025-08-30 06:04:01 - pico-train - INFO - ├── Learning Rate: 3.15e-05 2025-08-30 06:04:01 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 06:04:13 - pico-train - INFO - Step 46350 -- 🔄 Training Metrics 2025-08-30 06:04:13 - pico-train - INFO - ├── Loss: 6.0594 2025-08-30 06:04:13 - pico-train - INFO - ├── Learning Rate: 3.15e-05 2025-08-30 06:04:13 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 06:04:26 - pico-train - INFO - Step 46375 -- 🔄 Training Metrics 2025-08-30 06:04:26 - pico-train - INFO - ├── Loss: 6.0579 2025-08-30 06:04:26 - pico-train - INFO - ├── Learning Rate: 3.14e-05 2025-08-30 06:04:26 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 06:04:39 - pico-train - INFO - Step 46400 -- 🔄 Training Metrics 2025-08-30 06:04:39 - pico-train - INFO - ├── Loss: 5.9964 2025-08-30 06:04:39 - pico-train - INFO - ├── Learning Rate: 3.14e-05 2025-08-30 06:04:39 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 06:04:51 - pico-train - INFO - Step 46425 -- 🔄 Training Metrics 2025-08-30 06:04:51 - pico-train - INFO - ├── Loss: 6.0002 2025-08-30 06:04:51 - pico-train - INFO - ├── Learning Rate: 3.14e-05 2025-08-30 06:04:51 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 06:05:04 - pico-train - INFO - Step 46450 -- 🔄 Training Metrics 2025-08-30 06:05:04 - pico-train - INFO - ├── Loss: 
6.0970 2025-08-30 06:05:04 - pico-train - INFO - ├── Learning Rate: 3.14e-05 2025-08-30 06:05:04 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 06:05:17 - pico-train - INFO - Step 46475 -- 🔄 Training Metrics 2025-08-30 06:05:17 - pico-train - INFO - ├── Loss: 5.9791 2025-08-30 06:05:17 - pico-train - INFO - ├── Learning Rate: 3.14e-05 2025-08-30 06:05:17 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 06:05:29 - pico-train - INFO - Step 46500 -- 💾 Saving Checkpoint 2025-08-30 06:07:22 - pico-train - INFO - Step 46500 -- 📊 Evaluation Results 2025-08-30 06:07:22 - pico-train - INFO - └── paloma: 2.5264771857772e+28 2025-08-30 06:07:35 - pico-train - INFO - Step 46500 -- 🔄 Training Metrics 2025-08-30 06:07:35 - pico-train - INFO - ├── Loss: 5.9970 2025-08-30 06:07:35 - pico-train - INFO - ├── Learning Rate: 3.13e-05 2025-08-30 06:07:35 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 06:07:35 - pico-train - INFO - Step 46500 -- 📈 Saving Learning Dynamics 2025-08-30 06:07:50 - pico-train - INFO - Step 46525 -- 🔄 Training Metrics 2025-08-30 06:07:50 - pico-train - INFO - ├── Loss: 5.9723 2025-08-30 06:07:50 - pico-train - INFO - ├── Learning Rate: 3.13e-05 2025-08-30 06:07:50 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 06:08:02 - pico-train - INFO - Step 46550 -- 🔄 Training Metrics 2025-08-30 06:08:02 - pico-train - INFO - ├── Loss: 5.9671 2025-08-30 06:08:02 - pico-train - INFO - ├── Learning Rate: 3.13e-05 2025-08-30 06:08:02 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 06:08:15 - pico-train - INFO - Step 46575 -- 🔄 Training Metrics 2025-08-30 06:08:15 - pico-train - INFO - ├── Loss: 5.9461 2025-08-30 06:08:15 - pico-train - INFO - ├── Learning Rate: 3.13e-05 2025-08-30 06:08:15 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 06:08:27 - pico-train - INFO - Step 46600 -- 🔄 Training Metrics 2025-08-30 06:08:27 - pico-train - INFO - ├── Loss: 6.0239 2025-08-30 06:08:27 - pico-train - INFO - ├── Learning Rate: 3.13e-05 2025-08-30 
06:08:27 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 06:08:40 - pico-train - INFO - Step 46625 -- 🔄 Training Metrics 2025-08-30 06:08:40 - pico-train - INFO - ├── Loss: 6.0496 2025-08-30 06:08:40 - pico-train - INFO - ├── Learning Rate: 3.12e-05 2025-08-30 06:08:40 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 06:08:53 - pico-train - INFO - Step 46650 -- 🔄 Training Metrics 2025-08-30 06:08:53 - pico-train - INFO - ├── Loss: 5.9859 2025-08-30 06:08:53 - pico-train - INFO - ├── Learning Rate: 3.12e-05 2025-08-30 06:08:53 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 06:09:05 - pico-train - INFO - Step 46675 -- 🔄 Training Metrics 2025-08-30 06:09:05 - pico-train - INFO - ├── Loss: 6.0529 2025-08-30 06:09:05 - pico-train - INFO - ├── Learning Rate: 3.12e-05 2025-08-30 06:09:05 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 06:09:18 - pico-train - INFO - Step 46700 -- 🔄 Training Metrics 2025-08-30 06:09:18 - pico-train - INFO - ├── Loss: 6.0469 2025-08-30 06:09:18 - pico-train - INFO - ├── Learning Rate: 3.12e-05 2025-08-30 06:09:18 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 06:09:31 - pico-train - INFO - Step 46725 -- 🔄 Training Metrics 2025-08-30 06:09:31 - pico-train - INFO - ├── Loss: 6.0152 2025-08-30 06:09:31 - pico-train - INFO - ├── Learning Rate: 3.11e-05 2025-08-30 06:09:31 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 06:09:43 - pico-train - INFO - Step 46750 -- 🔄 Training Metrics 2025-08-30 06:09:43 - pico-train - INFO - ├── Loss: 6.0636 2025-08-30 06:09:43 - pico-train - INFO - ├── Learning Rate: 3.11e-05 2025-08-30 06:09:43 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 06:09:56 - pico-train - INFO - Step 46775 -- 🔄 Training Metrics 2025-08-30 06:09:56 - pico-train - INFO - ├── Loss: 6.0503 2025-08-30 06:09:56 - pico-train - INFO - ├── Learning Rate: 3.11e-05 2025-08-30 06:09:56 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 06:10:08 - pico-train - INFO - Step 46800 -- 🔄 Training Metrics 
2025-08-30 06:10:08 - pico-train - INFO - ├── Loss: 6.0151 2025-08-30 06:10:08 - pico-train - INFO - ├── Learning Rate: 3.11e-05 2025-08-30 06:10:08 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 06:10:21 - pico-train - INFO - Step 46825 -- 🔄 Training Metrics 2025-08-30 06:10:21 - pico-train - INFO - ├── Loss: 5.9617 2025-08-30 06:10:21 - pico-train - INFO - ├── Learning Rate: 3.11e-05 2025-08-30 06:10:21 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 06:10:33 - pico-train - INFO - Step 46850 -- 🔄 Training Metrics 2025-08-30 06:10:33 - pico-train - INFO - ├── Loss: 5.9888 2025-08-30 06:10:33 - pico-train - INFO - ├── Learning Rate: 3.10e-05 2025-08-30 06:10:33 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 06:10:46 - pico-train - INFO - Step 46875 -- 🔄 Training Metrics 2025-08-30 06:10:46 - pico-train - INFO - ├── Loss: 5.9116 2025-08-30 06:10:46 - pico-train - INFO - ├── Learning Rate: 3.10e-05 2025-08-30 06:10:46 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 06:10:59 - pico-train - INFO - Step 46900 -- 🔄 Training Metrics 2025-08-30 06:10:59 - pico-train - INFO - ├── Loss: 6.0299 2025-08-30 06:10:59 - pico-train - INFO - ├── Learning Rate: 3.10e-05 2025-08-30 06:10:59 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 06:11:11 - pico-train - INFO - Step 46925 -- 🔄 Training Metrics 2025-08-30 06:11:11 - pico-train - INFO - ├── Loss: 5.9876 2025-08-30 06:11:11 - pico-train - INFO - ├── Learning Rate: 3.10e-05 2025-08-30 06:11:11 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 06:11:24 - pico-train - INFO - Step 46950 -- 🔄 Training Metrics 2025-08-30 06:11:24 - pico-train - INFO - ├── Loss: 6.0462 2025-08-30 06:11:24 - pico-train - INFO - ├── Learning Rate: 3.10e-05 2025-08-30 06:11:24 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 06:11:36 - pico-train - INFO - Step 46975 -- 🔄 Training Metrics 2025-08-30 06:11:36 - pico-train - INFO - ├── Loss: 6.0083 2025-08-30 06:11:36 - pico-train - INFO - ├── Learning Rate: 3.09e-05 
2025-08-30 06:11:36 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 06:11:48 - pico-train - INFO - Step 47000 -- 💾 Saving Checkpoint 2025-08-30 06:13:47 - pico-train - INFO - Step 47000 -- 📊 Evaluation Results 2025-08-30 06:13:47 - pico-train - INFO - └── paloma: 3.374744437022473e+28 2025-08-30 06:13:49 - pico-train - INFO - Step 47000 -- 🔄 Training Metrics 2025-08-30 06:13:49 - pico-train - INFO - ├── Loss: 6.0269 2025-08-30 06:13:49 - pico-train - INFO - ├── Learning Rate: 3.09e-05 2025-08-30 06:13:49 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 06:13:49 - pico-train - INFO - Step 47000 -- 📈 Saving Learning Dynamics 2025-08-30 06:14:04 - pico-train - INFO - Step 47025 -- 🔄 Training Metrics 2025-08-30 06:14:04 - pico-train - INFO - ├── Loss: 6.0510 2025-08-30 06:14:04 - pico-train - INFO - ├── Learning Rate: 3.09e-05 2025-08-30 06:14:04 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 06:14:17 - pico-train - INFO - Step 47050 -- 🔄 Training Metrics 2025-08-30 06:14:17 - pico-train - INFO - ├── Loss: 5.9631 2025-08-30 06:14:17 - pico-train - INFO - ├── Learning Rate: 3.09e-05 2025-08-30 06:14:17 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 06:14:29 - pico-train - INFO - Step 47075 -- 🔄 Training Metrics 2025-08-30 06:14:29 - pico-train - INFO - ├── Loss: 5.9767 2025-08-30 06:14:29 - pico-train - INFO - ├── Learning Rate: 3.09e-05 2025-08-30 06:14:29 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 06:14:42 - pico-train - INFO - Step 47100 -- 🔄 Training Metrics 2025-08-30 06:14:42 - pico-train - INFO - ├── Loss: 6.0403 2025-08-30 06:14:42 - pico-train - INFO - ├── Learning Rate: 3.08e-05 2025-08-30 06:14:42 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 06:14:55 - pico-train - INFO - Step 47125 -- 🔄 Training Metrics 2025-08-30 06:14:55 - pico-train - INFO - ├── Loss: 6.0179 2025-08-30 06:14:55 - pico-train - INFO - ├── Learning Rate: 3.08e-05 2025-08-30 06:14:55 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 06:15:08 - 
pico-train - INFO - Step 47150 -- 🔄 Training Metrics 2025-08-30 06:15:08 - pico-train - INFO - ├── Loss: 6.0036 2025-08-30 06:15:08 - pico-train - INFO - ├── Learning Rate: 3.08e-05 2025-08-30 06:15:08 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 06:15:21 - pico-train - INFO - Step 47175 -- 🔄 Training Metrics 2025-08-30 06:15:21 - pico-train - INFO - ├── Loss: 6.0186 2025-08-30 06:15:21 - pico-train - INFO - ├── Learning Rate: 3.08e-05 2025-08-30 06:15:21 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 06:15:33 - pico-train - INFO - Step 47200 -- 🔄 Training Metrics 2025-08-30 06:15:33 - pico-train - INFO - ├── Loss: 5.9299 2025-08-30 06:15:33 - pico-train - INFO - ├── Learning Rate: 3.08e-05 2025-08-30 06:15:33 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 06:15:46 - pico-train - INFO - Step 47225 -- 🔄 Training Metrics 2025-08-30 06:15:46 - pico-train - INFO - ├── Loss: 6.1006 2025-08-30 06:15:46 - pico-train - INFO - ├── Learning Rate: 3.07e-05 2025-08-30 06:15:46 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 06:15:58 - pico-train - INFO - Step 47250 -- 🔄 Training Metrics 2025-08-30 06:15:58 - pico-train - INFO - ├── Loss: 5.9586 2025-08-30 06:15:58 - pico-train - INFO - ├── Learning Rate: 3.07e-05 2025-08-30 06:15:58 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 06:16:11 - pico-train - INFO - Step 47275 -- 🔄 Training Metrics 2025-08-30 06:16:11 - pico-train - INFO - ├── Loss: 6.0152 2025-08-30 06:16:11 - pico-train - INFO - ├── Learning Rate: 3.07e-05 2025-08-30 06:16:11 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 06:16:24 - pico-train - INFO - Step 47300 -- 🔄 Training Metrics 2025-08-30 06:16:24 - pico-train - INFO - ├── Loss: 5.9418 2025-08-30 06:16:24 - pico-train - INFO - ├── Learning Rate: 3.07e-05 2025-08-30 06:16:24 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 06:16:36 - pico-train - INFO - Step 47325 -- 🔄 Training Metrics 2025-08-30 06:16:36 - pico-train - INFO - ├── Loss: 5.9040 2025-08-30 
06:16:36 - pico-train - INFO - ├── Learning Rate: 3.06e-05 2025-08-30 06:16:36 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 06:16:49 - pico-train - INFO - Step 47350 -- 🔄 Training Metrics 2025-08-30 06:16:49 - pico-train - INFO - ├── Loss: 6.0085 2025-08-30 06:16:49 - pico-train - INFO - ├── Learning Rate: 3.06e-05 2025-08-30 06:16:49 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 06:17:01 - pico-train - INFO - Step 47375 -- 🔄 Training Metrics 2025-08-30 06:17:01 - pico-train - INFO - ├── Loss: 5.9546 2025-08-30 06:17:01 - pico-train - INFO - ├── Learning Rate: 3.06e-05 2025-08-30 06:17:01 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 06:17:14 - pico-train - INFO - Step 47400 -- 🔄 Training Metrics 2025-08-30 06:17:14 - pico-train - INFO - ├── Loss: 6.0002 2025-08-30 06:17:14 - pico-train - INFO - ├── Learning Rate: 3.06e-05 2025-08-30 06:17:14 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 06:17:26 - pico-train - INFO - Step 47425 -- 🔄 Training Metrics 2025-08-30 06:17:26 - pico-train - INFO - ├── Loss: 5.9671 2025-08-30 06:17:26 - pico-train - INFO - ├── Learning Rate: 3.06e-05 2025-08-30 06:17:26 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 06:17:39 - pico-train - INFO - Step 47450 -- 🔄 Training Metrics 2025-08-30 06:17:39 - pico-train - INFO - ├── Loss: 5.9857 2025-08-30 06:17:39 - pico-train - INFO - ├── Learning Rate: 3.05e-05 2025-08-30 06:17:39 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 06:17:52 - pico-train - INFO - Step 47475 -- 🔄 Training Metrics 2025-08-30 06:17:52 - pico-train - INFO - ├── Loss: 6.0252 2025-08-30 06:17:52 - pico-train - INFO - ├── Learning Rate: 3.05e-05 2025-08-30 06:17:52 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 06:18:04 - pico-train - INFO - Step 47500 -- 💾 Saving Checkpoint 2025-08-30 06:19:56 - pico-train - INFO - Step 47500 -- 📊 Evaluation Results 2025-08-30 06:19:56 - pico-train - INFO - └── paloma: 6.3085366283161405e+28 2025-08-30 06:19:57 - pico-train - INFO - 
Step 47500 -- 🔄 Training Metrics 2025-08-30 06:19:57 - pico-train - INFO - ├── Loss: 6.0560 2025-08-30 06:19:57 - pico-train - INFO - ├── Learning Rate: 3.05e-05 2025-08-30 06:19:57 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 06:19:57 - pico-train - INFO - Step 47500 -- 📈 Saving Learning Dynamics 2025-08-30 06:20:12 - pico-train - INFO - Step 47525 -- 🔄 Training Metrics 2025-08-30 06:20:12 - pico-train - INFO - ├── Loss: 5.9855 2025-08-30 06:20:12 - pico-train - INFO - ├── Learning Rate: 3.05e-05 2025-08-30 06:20:12 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 06:20:25 - pico-train - INFO - Step 47550 -- 🔄 Training Metrics 2025-08-30 06:20:25 - pico-train - INFO - ├── Loss: 5.9577 2025-08-30 06:20:25 - pico-train - INFO - ├── Learning Rate: 3.05e-05 2025-08-30 06:20:25 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 06:20:37 - pico-train - INFO - Step 47575 -- 🔄 Training Metrics 2025-08-30 06:20:37 - pico-train - INFO - ├── Loss: 6.0061 2025-08-30 06:20:37 - pico-train - INFO - ├── Learning Rate: 3.04e-05 2025-08-30 06:20:37 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 06:20:50 - pico-train - INFO - Step 47600 -- 🔄 Training Metrics 2025-08-30 06:20:50 - pico-train - INFO - ├── Loss: 5.9977 2025-08-30 06:20:50 - pico-train - INFO - ├── Learning Rate: 3.04e-05 2025-08-30 06:20:50 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 06:21:03 - pico-train - INFO - Step 47625 -- 🔄 Training Metrics 2025-08-30 06:21:03 - pico-train - INFO - ├── Loss: 5.9507 2025-08-30 06:21:03 - pico-train - INFO - ├── Learning Rate: 3.04e-05 2025-08-30 06:21:03 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 06:21:15 - pico-train - INFO - Step 47650 -- 🔄 Training Metrics 2025-08-30 06:21:15 - pico-train - INFO - ├── Loss: 5.9363 2025-08-30 06:21:15 - pico-train - INFO - ├── Learning Rate: 3.04e-05 2025-08-30 06:21:15 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 06:21:28 - pico-train - INFO - Step 47675 -- 🔄 Training Metrics 2025-08-30 
06:21:28 - pico-train - INFO - ├── Loss: 6.0677 2025-08-30 06:21:28 - pico-train - INFO - ├── Learning Rate: 3.04e-05 2025-08-30 06:21:28 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 06:21:41 - pico-train - INFO - Step 47700 -- 🔄 Training Metrics 2025-08-30 06:21:41 - pico-train - INFO - ├── Loss: 6.0777 2025-08-30 06:21:41 - pico-train - INFO - ├── Learning Rate: 3.03e-05 2025-08-30 06:21:41 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 06:21:53 - pico-train - INFO - Step 47725 -- 🔄 Training Metrics 2025-08-30 06:21:53 - pico-train - INFO - ├── Loss: 5.9203 2025-08-30 06:21:53 - pico-train - INFO - ├── Learning Rate: 3.03e-05 2025-08-30 06:21:53 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 06:22:06 - pico-train - INFO - Step 47750 -- 🔄 Training Metrics 2025-08-30 06:22:06 - pico-train - INFO - ├── Loss: 6.0014 2025-08-30 06:22:06 - pico-train - INFO - ├── Learning Rate: 3.03e-05 2025-08-30 06:22:06 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 06:22:19 - pico-train - INFO - Step 47775 -- 🔄 Training Metrics 2025-08-30 06:22:19 - pico-train - INFO - ├── Loss: 5.9680 2025-08-30 06:22:19 - pico-train - INFO - ├── Learning Rate: 3.03e-05 2025-08-30 06:22:19 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 06:22:31 - pico-train - INFO - Step 47800 -- 🔄 Training Metrics 2025-08-30 06:22:31 - pico-train - INFO - ├── Loss: 6.0516 2025-08-30 06:22:31 - pico-train - INFO - ├── Learning Rate: 3.03e-05 2025-08-30 06:22:31 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 06:22:44 - pico-train - INFO - Step 47825 -- 🔄 Training Metrics 2025-08-30 06:22:44 - pico-train - INFO - ├── Loss: 6.0163 2025-08-30 06:22:44 - pico-train - INFO - ├── Learning Rate: 3.02e-05 2025-08-30 06:22:44 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 06:22:56 - pico-train - INFO - Step 47850 -- 🔄 Training Metrics 2025-08-30 06:22:56 - pico-train - INFO - ├── Loss: 6.0132 2025-08-30 06:22:56 - pico-train - INFO - ├── Learning Rate: 3.02e-05 2025-08-30 
06:22:56 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 06:23:09 - pico-train - INFO - Step 47875 -- 🔄 Training Metrics 2025-08-30 06:23:09 - pico-train - INFO - ├── Loss: 5.9571 2025-08-30 06:23:09 - pico-train - INFO - ├── Learning Rate: 3.02e-05 2025-08-30 06:23:09 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 06:23:22 - pico-train - INFO - Step 47900 -- 🔄 Training Metrics 2025-08-30 06:23:22 - pico-train - INFO - ├── Loss: 5.9390 2025-08-30 06:23:22 - pico-train - INFO - ├── Learning Rate: 3.02e-05 2025-08-30 06:23:22 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 06:23:34 - pico-train - INFO - Step 47925 -- 🔄 Training Metrics 2025-08-30 06:23:34 - pico-train - INFO - ├── Loss: 5.9870 2025-08-30 06:23:34 - pico-train - INFO - ├── Learning Rate: 3.01e-05 2025-08-30 06:23:34 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 06:23:47 - pico-train - INFO - Step 47950 -- 🔄 Training Metrics 2025-08-30 06:23:47 - pico-train - INFO - ├── Loss: 5.9717 2025-08-30 06:23:47 - pico-train - INFO - ├── Learning Rate: 3.01e-05 2025-08-30 06:23:47 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 06:23:59 - pico-train - INFO - Step 47975 -- 🔄 Training Metrics 2025-08-30 06:23:59 - pico-train - INFO - ├── Loss: 6.0558 2025-08-30 06:23:59 - pico-train - INFO - ├── Learning Rate: 3.01e-05 2025-08-30 06:23:59 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 06:24:12 - pico-train - INFO - Step 48000 -- 💾 Saving Checkpoint 2025-08-30 06:26:04 - pico-train - INFO - Step 48000 -- 📊 Evaluation Results 2025-08-30 06:26:04 - pico-train - INFO - └── paloma: 6.49975478431273e+28 2025-08-30 06:26:06 - pico-train - INFO - Step 48000 -- 🔄 Training Metrics 2025-08-30 06:26:06 - pico-train - INFO - ├── Loss: 6.0808 2025-08-30 06:26:06 - pico-train - INFO - ├── Learning Rate: 3.01e-05 2025-08-30 06:26:06 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 06:26:06 - pico-train - INFO - Step 48000 -- 📈 Saving Learning Dynamics 2025-08-30 06:26:20 - pico-train - 
INFO - Step 48025 -- 🔄 Training Metrics 2025-08-30 06:26:20 - pico-train - INFO - ├── Loss: 6.0001 2025-08-30 06:26:20 - pico-train - INFO - ├── Learning Rate: 3.01e-05 2025-08-30 06:26:20 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 06:26:33 - pico-train - INFO - Step 48050 -- 🔄 Training Metrics 2025-08-30 06:26:33 - pico-train - INFO - ├── Loss: 6.0349 2025-08-30 06:26:33 - pico-train - INFO - ├── Learning Rate: 3.00e-05 2025-08-30 06:26:33 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 06:26:46 - pico-train - INFO - Step 48075 -- 🔄 Training Metrics 2025-08-30 06:26:46 - pico-train - INFO - ├── Loss: 5.9524 2025-08-30 06:26:46 - pico-train - INFO - ├── Learning Rate: 3.00e-05 2025-08-30 06:26:46 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 06:26:58 - pico-train - INFO - Step 48100 -- 🔄 Training Metrics 2025-08-30 06:26:58 - pico-train - INFO - ├── Loss: 5.9626 2025-08-30 06:26:58 - pico-train - INFO - ├── Learning Rate: 3.00e-05 2025-08-30 06:26:58 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 06:27:11 - pico-train - INFO - Step 48125 -- 🔄 Training Metrics 2025-08-30 06:27:11 - pico-train - INFO - ├── Loss: 6.0514 2025-08-30 06:27:11 - pico-train - INFO - ├── Learning Rate: 3.00e-05 2025-08-30 06:27:11 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 06:27:24 - pico-train - INFO - Step 48150 -- 🔄 Training Metrics 2025-08-30 06:27:24 - pico-train - INFO - ├── Loss: 6.0687 2025-08-30 06:27:24 - pico-train - INFO - ├── Learning Rate: 3.00e-05 2025-08-30 06:27:24 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 06:27:36 - pico-train - INFO - Step 48175 -- 🔄 Training Metrics 2025-08-30 06:27:36 - pico-train - INFO - ├── Loss: 6.0928 2025-08-30 06:27:36 - pico-train - INFO - ├── Learning Rate: 2.99e-05 2025-08-30 06:27:36 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 06:27:49 - pico-train - INFO - Step 48200 -- 🔄 Training Metrics 2025-08-30 06:27:49 - pico-train - INFO - ├── Loss: 5.9182 2025-08-30 06:27:49 - 
pico-train - INFO - ├── Learning Rate: 2.99e-05 2025-08-30 06:27:49 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 06:28:02 - pico-train - INFO - Step 48225 -- 🔄 Training Metrics 2025-08-30 06:28:02 - pico-train - INFO - ├── Loss: 5.9677 2025-08-30 06:28:02 - pico-train - INFO - ├── Learning Rate: 2.99e-05 2025-08-30 06:28:02 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 06:28:14 - pico-train - INFO - Step 48250 -- 🔄 Training Metrics 2025-08-30 06:28:14 - pico-train - INFO - ├── Loss: 6.0330 2025-08-30 06:28:14 - pico-train - INFO - ├── Learning Rate: 2.99e-05 2025-08-30 06:28:14 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 06:28:27 - pico-train - INFO - Step 48275 -- 🔄 Training Metrics 2025-08-30 06:28:27 - pico-train - INFO - ├── Loss: 6.0136 2025-08-30 06:28:27 - pico-train - INFO - ├── Learning Rate: 2.99e-05 2025-08-30 06:28:27 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 06:28:39 - pico-train - INFO - Step 48300 -- 🔄 Training Metrics 2025-08-30 06:28:39 - pico-train - INFO - ├── Loss: 6.0606 2025-08-30 06:28:39 - pico-train - INFO - ├── Learning Rate: 2.98e-05 2025-08-30 06:28:39 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 06:28:52 - pico-train - INFO - Step 48325 -- 🔄 Training Metrics 2025-08-30 06:28:52 - pico-train - INFO - ├── Loss: 5.9799 2025-08-30 06:28:52 - pico-train - INFO - ├── Learning Rate: 2.98e-05 2025-08-30 06:28:52 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 06:29:04 - pico-train - INFO - Step 48350 -- 🔄 Training Metrics 2025-08-30 06:29:04 - pico-train - INFO - ├── Loss: 5.9201 2025-08-30 06:29:04 - pico-train - INFO - ├── Learning Rate: 2.98e-05 2025-08-30 06:29:04 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 06:29:17 - pico-train - INFO - Step 48375 -- 🔄 Training Metrics 2025-08-30 06:29:17 - pico-train - INFO - ├── Loss: 6.0589 2025-08-30 06:29:17 - pico-train - INFO - ├── Learning Rate: 2.98e-05 2025-08-30 06:29:17 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 06:29:30 
- pico-train - INFO - Step 48400 -- 🔄 Training Metrics 2025-08-30 06:29:30 - pico-train - INFO - ├── Loss: 5.9895 2025-08-30 06:29:30 - pico-train - INFO - ├── Learning Rate: 2.98e-05 2025-08-30 06:29:30 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 06:29:42 - pico-train - INFO - Step 48425 -- 🔄 Training Metrics 2025-08-30 06:29:42 - pico-train - INFO - ├── Loss: 6.0389 2025-08-30 06:29:42 - pico-train - INFO - ├── Learning Rate: 2.97e-05 2025-08-30 06:29:42 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 06:29:55 - pico-train - INFO - Step 48450 -- 🔄 Training Metrics 2025-08-30 06:29:55 - pico-train - INFO - ├── Loss: 5.9910 2025-08-30 06:29:55 - pico-train - INFO - ├── Learning Rate: 2.97e-05 2025-08-30 06:29:55 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 06:30:07 - pico-train - INFO - Step 48475 -- 🔄 Training Metrics 2025-08-30 06:30:07 - pico-train - INFO - ├── Loss: 6.0012 2025-08-30 06:30:07 - pico-train - INFO - ├── Learning Rate: 2.97e-05 2025-08-30 06:30:07 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 06:30:20 - pico-train - INFO - Step 48500 -- 💾 Saving Checkpoint 2025-08-30 06:32:12 - pico-train - INFO - Step 48500 -- 📊 Evaluation Results 2025-08-30 06:32:12 - pico-train - INFO - └── paloma: 7.468048914747141e+28 2025-08-30 06:32:15 - pico-train - INFO - Step 48500 -- 🔄 Training Metrics 2025-08-30 06:32:15 - pico-train - INFO - ├── Loss: 6.0047 2025-08-30 06:32:15 - pico-train - INFO - ├── Learning Rate: 2.97e-05 2025-08-30 06:32:15 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 06:32:15 - pico-train - INFO - Step 48500 -- 📈 Saving Learning Dynamics 2025-08-30 06:32:31 - pico-train - INFO - Step 48525 -- 🔄 Training Metrics 2025-08-30 06:32:31 - pico-train - INFO - ├── Loss: 5.9447 2025-08-30 06:32:31 - pico-train - INFO - ├── Learning Rate: 2.96e-05 2025-08-30 06:32:31 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 06:32:43 - pico-train - INFO - Step 48550 -- 🔄 Training Metrics 2025-08-30 06:32:43 - 
pico-train - INFO - ├── Loss: 5.9573 2025-08-30 06:32:43 - pico-train - INFO - ├── Learning Rate: 2.96e-05 2025-08-30 06:32:43 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 06:32:56 - pico-train - INFO - Step 48575 -- 🔄 Training Metrics 2025-08-30 06:32:56 - pico-train - INFO - ├── Loss: 5.9279 2025-08-30 06:32:56 - pico-train - INFO - ├── Learning Rate: 2.96e-05 2025-08-30 06:32:56 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 06:33:08 - pico-train - INFO - Step 48600 -- 🔄 Training Metrics 2025-08-30 06:33:08 - pico-train - INFO - ├── Loss: 6.0511 2025-08-30 06:33:08 - pico-train - INFO - ├── Learning Rate: 2.96e-05 2025-08-30 06:33:08 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 06:33:21 - pico-train - INFO - Step 48625 -- 🔄 Training Metrics 2025-08-30 06:33:21 - pico-train - INFO - ├── Loss: 5.9875 2025-08-30 06:33:21 - pico-train - INFO - ├── Learning Rate: 2.96e-05 2025-08-30 06:33:21 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 06:33:34 - pico-train - INFO - Step 48650 -- 🔄 Training Metrics 2025-08-30 06:33:34 - pico-train - INFO - ├── Loss: 5.9392 2025-08-30 06:33:34 - pico-train - INFO - ├── Learning Rate: 2.95e-05 2025-08-30 06:33:34 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 06:33:46 - pico-train - INFO - Step 48675 -- 🔄 Training Metrics 2025-08-30 06:33:46 - pico-train - INFO - ├── Loss: 5.9466 2025-08-30 06:33:46 - pico-train - INFO - ├── Learning Rate: 2.95e-05 2025-08-30 06:33:46 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 06:33:59 - pico-train - INFO - Step 48700 -- 🔄 Training Metrics 2025-08-30 06:33:59 - pico-train - INFO - ├── Loss: 6.0769 2025-08-30 06:33:59 - pico-train - INFO - ├── Learning Rate: 2.95e-05 2025-08-30 06:33:59 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 06:34:11 - pico-train - INFO - Step 48725 -- 🔄 Training Metrics 2025-08-30 06:34:11 - pico-train - INFO - ├── Loss: 5.8933 2025-08-30 06:34:11 - pico-train - INFO - ├── Learning Rate: 2.95e-05 2025-08-30 06:34:11 - 
pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 06:34:24 - pico-train - INFO - Step 48750 -- 🔄 Training Metrics 2025-08-30 06:34:24 - pico-train - INFO - ├── Loss: 5.9891 2025-08-30 06:34:24 - pico-train - INFO - ├── Learning Rate: 2.95e-05 2025-08-30 06:34:24 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 06:34:37 - pico-train - INFO - Step 48775 -- 🔄 Training Metrics 2025-08-30 06:34:37 - pico-train - INFO - ├── Loss: 5.9740 2025-08-30 06:34:37 - pico-train - INFO - ├── Learning Rate: 2.94e-05 2025-08-30 06:34:37 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 06:34:49 - pico-train - INFO - Step 48800 -- 🔄 Training Metrics 2025-08-30 06:34:49 - pico-train - INFO - ├── Loss: 5.9417 2025-08-30 06:34:49 - pico-train - INFO - ├── Learning Rate: 2.94e-05 2025-08-30 06:34:49 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 06:35:02 - pico-train - INFO - Step 48825 -- 🔄 Training Metrics 2025-08-30 06:35:02 - pico-train - INFO - ├── Loss: 5.9812 2025-08-30 06:35:02 - pico-train - INFO - ├── Learning Rate: 2.94e-05 2025-08-30 06:35:02 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 06:35:14 - pico-train - INFO - Step 48850 -- 🔄 Training Metrics 2025-08-30 06:35:14 - pico-train - INFO - ├── Loss: 5.9183 2025-08-30 06:35:14 - pico-train - INFO - ├── Learning Rate: 2.94e-05 2025-08-30 06:35:14 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 06:35:27 - pico-train - INFO - Step 48875 -- 🔄 Training Metrics 2025-08-30 06:35:27 - pico-train - INFO - ├── Loss: 5.8828 2025-08-30 06:35:27 - pico-train - INFO - ├── Learning Rate: 2.94e-05 2025-08-30 06:35:27 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 06:35:40 - pico-train - INFO - Step 48900 -- 🔄 Training Metrics 2025-08-30 06:35:40 - pico-train - INFO - ├── Loss: 6.0054 2025-08-30 06:35:40 - pico-train - INFO - ├── Learning Rate: 2.93e-05 2025-08-30 06:35:40 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 06:35:52 - pico-train - INFO - Step 48925 -- 🔄 Training Metrics 2025-08-30 
06:35:52 - pico-train - INFO - ├── Loss: 5.9383 2025-08-30 06:35:52 - pico-train - INFO - ├── Learning Rate: 2.93e-05 2025-08-30 06:35:52 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 06:36:05 - pico-train - INFO - Step 48950 -- 🔄 Training Metrics 2025-08-30 06:36:05 - pico-train - INFO - ├── Loss: 5.9938 2025-08-30 06:36:05 - pico-train - INFO - ├── Learning Rate: 2.93e-05 2025-08-30 06:36:05 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 06:36:17 - pico-train - INFO - Step 48975 -- 🔄 Training Metrics 2025-08-30 06:36:17 - pico-train - INFO - ├── Loss: 6.0000 2025-08-30 06:36:17 - pico-train - INFO - ├── Learning Rate: 2.93e-05 2025-08-30 06:36:17 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 06:36:30 - pico-train - INFO - Step 49000 -- 💾 Saving Checkpoint 2025-08-30 06:38:29 - pico-train - INFO - Step 49000 -- 📊 Evaluation Results 2025-08-30 06:38:29 - pico-train - INFO - └── paloma: 1.0055567105609192e+29 2025-08-30 06:38:32 - pico-train - INFO - Step 49000 -- 🔄 Training Metrics 2025-08-30 06:38:32 - pico-train - INFO - ├── Loss: 5.9422 2025-08-30 06:38:32 - pico-train - INFO - ├── Learning Rate: 2.92e-05 2025-08-30 06:38:32 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 06:38:32 - pico-train - INFO - Step 49000 -- 📈 Saving Learning Dynamics 2025-08-30 06:38:47 - pico-train - INFO - Step 49025 -- 🔄 Training Metrics 2025-08-30 06:38:47 - pico-train - INFO - ├── Loss: 6.0063 2025-08-30 06:38:47 - pico-train - INFO - ├── Learning Rate: 2.92e-05 2025-08-30 06:38:47 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 06:39:00 - pico-train - INFO - Step 49050 -- 🔄 Training Metrics 2025-08-30 06:39:00 - pico-train - INFO - ├── Loss: 5.9355 2025-08-30 06:39:00 - pico-train - INFO - ├── Learning Rate: 2.92e-05 2025-08-30 06:39:00 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 06:39:12 - pico-train - INFO - Step 49075 -- 🔄 Training Metrics 2025-08-30 06:39:12 - pico-train - INFO - ├── Loss: 5.9666 2025-08-30 06:39:12 - pico-train - 
INFO - ├── Learning Rate: 2.92e-05 2025-08-30 06:39:12 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 06:39:25 - pico-train - INFO - Step 49100 -- 🔄 Training Metrics 2025-08-30 06:39:25 - pico-train - INFO - ├── Loss: 5.9422 2025-08-30 06:39:25 - pico-train - INFO - ├── Learning Rate: 2.92e-05 2025-08-30 06:39:25 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 06:39:38 - pico-train - INFO - Step 49125 -- 🔄 Training Metrics 2025-08-30 06:39:38 - pico-train - INFO - ├── Loss: 6.0065 2025-08-30 06:39:38 - pico-train - INFO - ├── Learning Rate: 2.91e-05 2025-08-30 06:39:38 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 06:39:51 - pico-train - INFO - Step 49150 -- 🔄 Training Metrics 2025-08-30 06:39:51 - pico-train - INFO - ├── Loss: 5.8978 2025-08-30 06:39:51 - pico-train - INFO - ├── Learning Rate: 2.91e-05 2025-08-30 06:39:51 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 06:40:04 - pico-train - INFO - Step 49175 -- 🔄 Training Metrics 2025-08-30 06:40:04 - pico-train - INFO - ├── Loss: 5.9054 2025-08-30 06:40:04 - pico-train - INFO - ├── Learning Rate: 2.91e-05 2025-08-30 06:40:04 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 06:40:16 - pico-train - INFO - Step 49200 -- 🔄 Training Metrics 2025-08-30 06:40:16 - pico-train - INFO - ├── Loss: 5.9853 2025-08-30 06:40:16 - pico-train - INFO - ├── Learning Rate: 2.91e-05 2025-08-30 06:40:16 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 06:40:29 - pico-train - INFO - Step 49225 -- 🔄 Training Metrics 2025-08-30 06:40:29 - pico-train - INFO - ├── Loss: 6.1100 2025-08-30 06:40:29 - pico-train - INFO - ├── Learning Rate: 2.91e-05 2025-08-30 06:40:29 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 06:40:42 - pico-train - INFO - Step 49250 -- 🔄 Training Metrics 2025-08-30 06:40:42 - pico-train - INFO - ├── Loss: 5.9674 2025-08-30 06:40:42 - pico-train - INFO - ├── Learning Rate: 2.90e-05 2025-08-30 06:40:42 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 06:40:54 - pico-train 
- INFO - Step 49275 -- 🔄 Training Metrics 2025-08-30 06:40:54 - pico-train - INFO - ├── Loss: 5.9658 2025-08-30 06:40:54 - pico-train - INFO - ├── Learning Rate: 2.90e-05 2025-08-30 06:40:54 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 06:41:07 - pico-train - INFO - Step 49300 -- 🔄 Training Metrics 2025-08-30 06:41:07 - pico-train - INFO - ├── Loss: 5.8764 2025-08-30 06:41:07 - pico-train - INFO - ├── Learning Rate: 2.90e-05 2025-08-30 06:41:07 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 06:41:19 - pico-train - INFO - Step 49325 -- 🔄 Training Metrics 2025-08-30 06:41:19 - pico-train - INFO - ├── Loss: 6.0664 2025-08-30 06:41:19 - pico-train - INFO - ├── Learning Rate: 2.90e-05 2025-08-30 06:41:19 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 06:41:32 - pico-train - INFO - Step 49350 -- 🔄 Training Metrics 2025-08-30 06:41:32 - pico-train - INFO - ├── Loss: 6.0158 2025-08-30 06:41:32 - pico-train - INFO - ├── Learning Rate: 2.90e-05 2025-08-30 06:41:32 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 06:41:44 - pico-train - INFO - Step 49375 -- 🔄 Training Metrics 2025-08-30 06:41:44 - pico-train - INFO - ├── Loss: 5.8884 2025-08-30 06:41:44 - pico-train - INFO - ├── Learning Rate: 2.89e-05 2025-08-30 06:41:44 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 06:41:57 - pico-train - INFO - Step 49400 -- 🔄 Training Metrics 2025-08-30 06:41:57 - pico-train - INFO - ├── Loss: 5.9176 2025-08-30 06:41:57 - pico-train - INFO - ├── Learning Rate: 2.89e-05 2025-08-30 06:41:57 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 06:42:10 - pico-train - INFO - Step 49425 -- 🔄 Training Metrics 2025-08-30 06:42:10 - pico-train - INFO - ├── Loss: 6.0363 2025-08-30 06:42:10 - pico-train - INFO - ├── Learning Rate: 2.89e-05 2025-08-30 06:42:10 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 06:42:22 - pico-train - INFO - Step 49450 -- 🔄 Training Metrics 2025-08-30 06:42:22 - pico-train - INFO - ├── Loss: 5.9492 2025-08-30 06:42:22 - 
pico-train - INFO - ├── Learning Rate: 2.89e-05 2025-08-30 06:42:22 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 06:42:35 - pico-train - INFO - Step 49475 -- 🔄 Training Metrics 2025-08-30 06:42:35 - pico-train - INFO - ├── Loss: 5.9670 2025-08-30 06:42:35 - pico-train - INFO - ├── Learning Rate: 2.88e-05 2025-08-30 06:42:35 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 06:42:47 - pico-train - INFO - Step 49500 -- 💾 Saving Checkpoint 2025-08-30 06:44:54 - pico-train - INFO - Step 49500 -- 📊 Evaluation Results 2025-08-30 06:44:54 - pico-train - INFO - └── paloma: 1.1754196849862509e+29 2025-08-30 06:44:57 - pico-train - INFO - Step 49500 -- 🔄 Training Metrics 2025-08-30 06:44:57 - pico-train - INFO - ├── Loss: 5.9626 2025-08-30 06:44:57 - pico-train - INFO - ├── Learning Rate: 2.88e-05 2025-08-30 06:44:57 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 06:44:57 - pico-train - INFO - Step 49500 -- 📈 Saving Learning Dynamics 2025-08-30 06:45:12 - pico-train - INFO - Step 49525 -- 🔄 Training Metrics 2025-08-30 06:45:12 - pico-train - INFO - ├── Loss: 5.9286 2025-08-30 06:45:12 - pico-train - INFO - ├── Learning Rate: 2.88e-05 2025-08-30 06:45:12 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 06:45:24 - pico-train - INFO - Step 49550 -- 🔄 Training Metrics 2025-08-30 06:45:24 - pico-train - INFO - ├── Loss: 5.8746 2025-08-30 06:45:24 - pico-train - INFO - ├── Learning Rate: 2.88e-05 2025-08-30 06:45:24 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 06:45:37 - pico-train - INFO - Step 49575 -- 🔄 Training Metrics 2025-08-30 06:45:37 - pico-train - INFO - ├── Loss: 5.9669 2025-08-30 06:45:37 - pico-train - INFO - ├── Learning Rate: 2.88e-05 2025-08-30 06:45:37 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 06:45:49 - pico-train - INFO - Step 49600 -- 🔄 Training Metrics 2025-08-30 06:45:49 - pico-train - INFO - ├── Loss: 6.0786 2025-08-30 06:45:49 - pico-train - INFO - ├── Learning Rate: 2.87e-05 2025-08-30 06:45:49 - pico-train - 
INFO - └── Inf/NaN count: 0 2025-08-30 06:46:03 - pico-train - INFO - Step 49625 -- 🔄 Training Metrics 2025-08-30 06:46:03 - pico-train - INFO - ├── Loss: 6.0407 2025-08-30 06:46:03 - pico-train - INFO - ├── Learning Rate: 2.87e-05 2025-08-30 06:46:03 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 06:46:15 - pico-train - INFO - Step 49650 -- 🔄 Training Metrics 2025-08-30 06:46:15 - pico-train - INFO - ├── Loss: 5.9219 2025-08-30 06:46:15 - pico-train - INFO - ├── Learning Rate: 2.87e-05 2025-08-30 06:46:15 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 06:46:28 - pico-train - INFO - Step 49675 -- 🔄 Training Metrics 2025-08-30 06:46:28 - pico-train - INFO - ├── Loss: 5.8997 2025-08-30 06:46:28 - pico-train - INFO - ├── Learning Rate: 2.87e-05 2025-08-30 06:46:28 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 06:46:41 - pico-train - INFO - Step 49700 -- 🔄 Training Metrics 2025-08-30 06:46:41 - pico-train - INFO - ├── Loss: 5.9761 2025-08-30 06:46:41 - pico-train - INFO - ├── Learning Rate: 2.87e-05 2025-08-30 06:46:41 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 06:46:53 - pico-train - INFO - Step 49725 -- 🔄 Training Metrics 2025-08-30 06:46:53 - pico-train - INFO - ├── Loss: 5.8749 2025-08-30 06:46:53 - pico-train - INFO - ├── Learning Rate: 2.86e-05 2025-08-30 06:46:53 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 06:47:06 - pico-train - INFO - Step 49750 -- 🔄 Training Metrics 2025-08-30 06:47:06 - pico-train - INFO - ├── Loss: 5.9646 2025-08-30 06:47:06 - pico-train - INFO - ├── Learning Rate: 2.86e-05 2025-08-30 06:47:06 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 06:47:18 - pico-train - INFO - Step 49775 -- 🔄 Training Metrics 2025-08-30 06:47:18 - pico-train - INFO - ├── Loss: 5.9586 2025-08-30 06:47:18 - pico-train - INFO - ├── Learning Rate: 2.86e-05 2025-08-30 06:47:18 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 06:47:31 - pico-train - INFO - Step 49800 -- 🔄 Training Metrics 2025-08-30 06:47:31 - 
pico-train - INFO - ├── Loss: 5.9586 2025-08-30 06:47:31 - pico-train - INFO - ├── Learning Rate: 2.86e-05 2025-08-30 06:47:31 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 06:47:44 - pico-train - INFO - Step 49825 -- 🔄 Training Metrics 2025-08-30 06:47:44 - pico-train - INFO - ├── Loss: 5.9795 2025-08-30 06:47:44 - pico-train - INFO - ├── Learning Rate: 2.86e-05 2025-08-30 06:47:44 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 06:47:56 - pico-train - INFO - Step 49850 -- 🔄 Training Metrics 2025-08-30 06:47:56 - pico-train - INFO - ├── Loss: 5.9432 2025-08-30 06:47:56 - pico-train - INFO - ├── Learning Rate: 2.85e-05 2025-08-30 06:47:56 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 06:48:09 - pico-train - INFO - Step 49875 -- 🔄 Training Metrics 2025-08-30 06:48:09 - pico-train - INFO - ├── Loss: 6.0729 2025-08-30 06:48:09 - pico-train - INFO - ├── Learning Rate: 2.85e-05 2025-08-30 06:48:09 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 06:48:22 - pico-train - INFO - Step 49900 -- 🔄 Training Metrics 2025-08-30 06:48:22 - pico-train - INFO - ├── Loss: 6.0377 2025-08-30 06:48:22 - pico-train - INFO - ├── Learning Rate: 2.85e-05 2025-08-30 06:48:22 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 06:48:34 - pico-train - INFO - Step 49925 -- 🔄 Training Metrics 2025-08-30 06:48:34 - pico-train - INFO - ├── Loss: 5.9408 2025-08-30 06:48:34 - pico-train - INFO - ├── Learning Rate: 2.85e-05 2025-08-30 06:48:34 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 06:48:47 - pico-train - INFO - Step 49950 -- 🔄 Training Metrics 2025-08-30 06:48:47 - pico-train - INFO - ├── Loss: 5.9974 2025-08-30 06:48:47 - pico-train - INFO - ├── Learning Rate: 2.84e-05 2025-08-30 06:48:47 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 06:48:59 - pico-train - INFO - Step 49975 -- 🔄 Training Metrics 2025-08-30 06:48:59 - pico-train - INFO - ├── Loss: 5.8671 2025-08-30 06:48:59 - pico-train - INFO - ├── Learning Rate: 2.84e-05 2025-08-30 06:48:59 - 
pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 06:49:12 - pico-train - INFO - Step 50000 -- 💾 Saving Checkpoint 2025-08-30 06:51:08 - pico-train - INFO - Step 50000 -- 📊 Evaluation Results 2025-08-30 06:51:08 - pico-train - INFO - └── paloma: 1.5844205816962802e+29 2025-08-30 06:51:11 - pico-train - INFO - Step 50000 -- 🔄 Training Metrics 2025-08-30 06:51:11 - pico-train - INFO - ├── Loss: 5.9467 2025-08-30 06:51:11 - pico-train - INFO - ├── Learning Rate: 2.84e-05 2025-08-30 06:51:11 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 06:51:11 - pico-train - INFO - Step 50000 -- 📈 Saving Learning Dynamics 2025-08-30 06:51:26 - pico-train - INFO - Step 50025 -- 🔄 Training Metrics 2025-08-30 06:51:26 - pico-train - INFO - ├── Loss: 5.9961 2025-08-30 06:51:26 - pico-train - INFO - ├── Learning Rate: 2.84e-05 2025-08-30 06:51:26 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 06:51:39 - pico-train - INFO - Step 50050 -- 🔄 Training Metrics 2025-08-30 06:51:39 - pico-train - INFO - ├── Loss: 5.9269 2025-08-30 06:51:39 - pico-train - INFO - ├── Learning Rate: 2.84e-05 2025-08-30 06:51:39 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 06:51:52 - pico-train - INFO - Step 50075 -- 🔄 Training Metrics 2025-08-30 06:51:52 - pico-train - INFO - ├── Loss: 5.9394 2025-08-30 06:51:52 - pico-train - INFO - ├── Learning Rate: 2.83e-05 2025-08-30 06:51:52 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 06:52:04 - pico-train - INFO - Step 50100 -- 🔄 Training Metrics 2025-08-30 06:52:04 - pico-train - INFO - ├── Loss: 5.9330 2025-08-30 06:52:04 - pico-train - INFO - ├── Learning Rate: 2.83e-05 2025-08-30 06:52:04 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 06:52:18 - pico-train - INFO - Step 50125 -- 🔄 Training Metrics 2025-08-30 06:52:18 - pico-train - INFO - ├── Loss: 5.9620 2025-08-30 06:52:18 - pico-train - INFO - ├── Learning Rate: 2.83e-05 2025-08-30 06:52:18 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 06:52:30 - pico-train - INFO - 
Step 50150 -- 🔄 Training Metrics 2025-08-30 06:52:30 - pico-train - INFO - ├── Loss: 6.0199 2025-08-30 06:52:30 - pico-train - INFO - ├── Learning Rate: 2.83e-05 2025-08-30 06:52:30 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 06:52:43 - pico-train - INFO - Step 50175 -- 🔄 Training Metrics 2025-08-30 06:52:43 - pico-train - INFO - ├── Loss: 6.0399 2025-08-30 06:52:43 - pico-train - INFO - ├── Learning Rate: 2.83e-05 2025-08-30 06:52:43 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 06:52:55 - pico-train - INFO - Step 50200 -- 🔄 Training Metrics 2025-08-30 06:52:55 - pico-train - INFO - ├── Loss: 6.0137 2025-08-30 06:52:55 - pico-train - INFO - ├── Learning Rate: 2.82e-05 2025-08-30 06:52:55 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 06:53:08 - pico-train - INFO - Step 50225 -- 🔄 Training Metrics 2025-08-30 06:53:08 - pico-train - INFO - ├── Loss: 5.9405 2025-08-30 06:53:08 - pico-train - INFO - ├── Learning Rate: 2.82e-05 2025-08-30 06:53:08 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 06:53:21 - pico-train - INFO - Step 50250 -- 🔄 Training Metrics 2025-08-30 06:53:21 - pico-train - INFO - ├── Loss: 5.9045 2025-08-30 06:53:21 - pico-train - INFO - ├── Learning Rate: 2.82e-05 2025-08-30 06:53:21 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 06:53:33 - pico-train - INFO - Step 50275 -- 🔄 Training Metrics 2025-08-30 06:53:33 - pico-train - INFO - ├── Loss: 6.0237 2025-08-30 06:53:33 - pico-train - INFO - ├── Learning Rate: 2.82e-05 2025-08-30 06:53:33 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 06:53:46 - pico-train - INFO - Step 50300 -- 🔄 Training Metrics 2025-08-30 06:53:46 - pico-train - INFO - ├── Loss: 5.9869 2025-08-30 06:53:46 - pico-train - INFO - ├── Learning Rate: 2.82e-05 2025-08-30 06:53:46 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 06:53:59 - pico-train - INFO - Step 50325 -- 🔄 Training Metrics 2025-08-30 06:53:59 - pico-train - INFO - ├── Loss: 5.9344 2025-08-30 06:53:59 - pico-train - 
INFO - ├── Learning Rate: 2.81e-05 2025-08-30 06:53:59 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 06:54:11 - pico-train - INFO - Step 50350 -- 🔄 Training Metrics 2025-08-30 06:54:11 - pico-train - INFO - ├── Loss: 6.0131 2025-08-30 06:54:11 - pico-train - INFO - ├── Learning Rate: 2.81e-05 2025-08-30 06:54:11 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 06:54:24 - pico-train - INFO - Step 50375 -- 🔄 Training Metrics 2025-08-30 06:54:24 - pico-train - INFO - ├── Loss: 5.9916 2025-08-30 06:54:24 - pico-train - INFO - ├── Learning Rate: 2.81e-05 2025-08-30 06:54:24 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 06:54:36 - pico-train - INFO - Step 50400 -- 🔄 Training Metrics 2025-08-30 06:54:36 - pico-train - INFO - ├── Loss: 6.0289 2025-08-30 06:54:36 - pico-train - INFO - ├── Learning Rate: 2.81e-05 2025-08-30 06:54:36 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 06:54:49 - pico-train - INFO - Step 50425 -- 🔄 Training Metrics 2025-08-30 06:54:49 - pico-train - INFO - ├── Loss: 6.0051 2025-08-30 06:54:49 - pico-train - INFO - ├── Learning Rate: 2.80e-05 2025-08-30 06:54:49 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 06:55:01 - pico-train - INFO - Step 50450 -- 🔄 Training Metrics 2025-08-30 06:55:01 - pico-train - INFO - ├── Loss: 5.9803 2025-08-30 06:55:01 - pico-train - INFO - ├── Learning Rate: 2.80e-05 2025-08-30 06:55:01 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 06:55:14 - pico-train - INFO - Step 50475 -- 🔄 Training Metrics 2025-08-30 06:55:14 - pico-train - INFO - ├── Loss: 5.9222 2025-08-30 06:55:14 - pico-train - INFO - ├── Learning Rate: 2.80e-05 2025-08-30 06:55:14 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 06:55:26 - pico-train - INFO - Step 50500 -- 💾 Saving Checkpoint 2025-08-30 06:57:22 - pico-train - INFO - Step 50500 -- 📊 Evaluation Results 2025-08-30 06:57:22 - pico-train - INFO - └── paloma: 2.307126408238767e+29 2025-08-30 06:57:25 - pico-train - INFO - Step 50500 -- 🔄 Training 
Metrics 2025-08-30 06:57:25 - pico-train - INFO - ├── Loss: 5.9666 2025-08-30 06:57:25 - pico-train - INFO - ├── Learning Rate: 2.80e-05 2025-08-30 06:57:25 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 06:57:25 - pico-train - INFO - Step 50500 -- 📈 Saving Learning Dynamics 2025-08-30 06:57:41 - pico-train - INFO - Step 50525 -- 🔄 Training Metrics 2025-08-30 06:57:41 - pico-train - INFO - ├── Loss: 5.9108 2025-08-30 06:57:41 - pico-train - INFO - ├── Learning Rate: 2.80e-05 2025-08-30 06:57:41 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 06:57:54 - pico-train - INFO - Step 50550 -- 🔄 Training Metrics 2025-08-30 06:57:54 - pico-train - INFO - ├── Loss: 5.9066 2025-08-30 06:57:54 - pico-train - INFO - ├── Learning Rate: 2.79e-05 2025-08-30 06:57:54 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 06:58:06 - pico-train - INFO - Step 50575 -- 🔄 Training Metrics 2025-08-30 06:58:06 - pico-train - INFO - ├── Loss: 5.9447 2025-08-30 06:58:06 - pico-train - INFO - ├── Learning Rate: 2.79e-05 2025-08-30 06:58:06 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 06:58:19 - pico-train - INFO - Step 50600 -- 🔄 Training Metrics 2025-08-30 06:58:19 - pico-train - INFO - ├── Loss: 5.9920 2025-08-30 06:58:19 - pico-train - INFO - ├── Learning Rate: 2.79e-05 2025-08-30 06:58:19 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 06:58:32 - pico-train - INFO - Step 50625 -- 🔄 Training Metrics 2025-08-30 06:58:32 - pico-train - INFO - ├── Loss: 5.8969 2025-08-30 06:58:32 - pico-train - INFO - ├── Learning Rate: 2.79e-05 2025-08-30 06:58:32 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 06:58:44 - pico-train - INFO - Step 50650 -- 🔄 Training Metrics 2025-08-30 06:58:44 - pico-train - INFO - ├── Loss: 5.9352 2025-08-30 06:58:44 - pico-train - INFO - ├── Learning Rate: 2.79e-05 2025-08-30 06:58:44 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 06:58:57 - pico-train - INFO - Step 50675 -- 🔄 Training Metrics 2025-08-30 06:58:57 - pico-train - INFO 
- ├── Loss: 5.9511 2025-08-30 06:58:57 - pico-train - INFO - ├── Learning Rate: 2.78e-05 2025-08-30 06:58:57 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 06:59:10 - pico-train - INFO - Step 50700 -- 🔄 Training Metrics 2025-08-30 06:59:10 - pico-train - INFO - ├── Loss: 5.9762 2025-08-30 06:59:10 - pico-train - INFO - ├── Learning Rate: 2.78e-05 2025-08-30 06:59:10 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 06:59:22 - pico-train - INFO - Step 50725 -- 🔄 Training Metrics 2025-08-30 06:59:22 - pico-train - INFO - ├── Loss: 5.8962 2025-08-30 06:59:22 - pico-train - INFO - ├── Learning Rate: 2.78e-05 2025-08-30 06:59:22 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 06:59:35 - pico-train - INFO - Step 50750 -- 🔄 Training Metrics 2025-08-30 06:59:35 - pico-train - INFO - ├── Loss: 5.9610 2025-08-30 06:59:35 - pico-train - INFO - ├── Learning Rate: 2.78e-05 2025-08-30 06:59:35 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 06:59:47 - pico-train - INFO - Step 50775 -- 🔄 Training Metrics 2025-08-30 06:59:47 - pico-train - INFO - ├── Loss: 5.9507 2025-08-30 06:59:47 - pico-train - INFO - ├── Learning Rate: 2.77e-05 2025-08-30 06:59:47 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 07:00:00 - pico-train - INFO - Step 50800 -- 🔄 Training Metrics 2025-08-30 07:00:00 - pico-train - INFO - ├── Loss: 6.0094 2025-08-30 07:00:00 - pico-train - INFO - ├── Learning Rate: 2.77e-05 2025-08-30 07:00:00 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 07:00:13 - pico-train - INFO - Step 50825 -- 🔄 Training Metrics 2025-08-30 07:00:13 - pico-train - INFO - ├── Loss: 5.9076 2025-08-30 07:00:13 - pico-train - INFO - ├── Learning Rate: 2.77e-05 2025-08-30 07:00:13 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 07:00:25 - pico-train - INFO - Step 50850 -- 🔄 Training Metrics 2025-08-30 07:00:25 - pico-train - INFO - ├── Loss: 5.9857 2025-08-30 07:00:25 - pico-train - INFO - ├── Learning Rate: 2.77e-05 2025-08-30 07:00:25 - pico-train - INFO - 
└── Inf/NaN count: 0 2025-08-30 07:00:38 - pico-train - INFO - Step 50875 -- 🔄 Training Metrics 2025-08-30 07:00:38 - pico-train - INFO - ├── Loss: 6.0201 2025-08-30 07:00:38 - pico-train - INFO - ├── Learning Rate: 2.77e-05 2025-08-30 07:00:38 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 07:00:51 - pico-train - INFO - Step 50900 -- 🔄 Training Metrics 2025-08-30 07:00:51 - pico-train - INFO - ├── Loss: 6.0121 2025-08-30 07:00:51 - pico-train - INFO - ├── Learning Rate: 2.76e-05 2025-08-30 07:00:51 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 07:01:03 - pico-train - INFO - Step 50925 -- 🔄 Training Metrics 2025-08-30 07:01:03 - pico-train - INFO - ├── Loss: 5.9658 2025-08-30 07:01:03 - pico-train - INFO - ├── Learning Rate: 2.76e-05 2025-08-30 07:01:03 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 07:01:16 - pico-train - INFO - Step 50950 -- 🔄 Training Metrics 2025-08-30 07:01:16 - pico-train - INFO - ├── Loss: 5.9981 2025-08-30 07:01:16 - pico-train - INFO - ├── Learning Rate: 2.76e-05 2025-08-30 07:01:16 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 07:01:28 - pico-train - INFO - Step 50975 -- 🔄 Training Metrics 2025-08-30 07:01:28 - pico-train - INFO - ├── Loss: 5.9961 2025-08-30 07:01:28 - pico-train - INFO - ├── Learning Rate: 2.76e-05 2025-08-30 07:01:28 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 07:01:41 - pico-train - INFO - Step 51000 -- 💾 Saving Checkpoint 2025-08-30 07:03:38 - pico-train - INFO - Step 51000 -- 📊 Evaluation Results 2025-08-30 07:03:38 - pico-train - INFO - └── paloma: 2.410761895962811e+29 2025-08-30 07:03:42 - pico-train - INFO - Step 51000 -- 🔄 Training Metrics 2025-08-30 07:03:42 - pico-train - INFO - ├── Loss: 5.9076 2025-08-30 07:03:42 - pico-train - INFO - ├── Learning Rate: 2.76e-05 2025-08-30 07:03:42 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 07:03:42 - pico-train - INFO - Step 51000 -- 📈 Saving Learning Dynamics 2025-08-30 07:03:57 - pico-train - INFO - Step 51025 -- 🔄 
Training Metrics 2025-08-30 07:03:57 - pico-train - INFO - ├── Loss: 5.9965 2025-08-30 07:03:57 - pico-train - INFO - ├── Learning Rate: 2.75e-05 2025-08-30 07:03:57 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 07:04:09 - pico-train - INFO - Step 51050 -- 🔄 Training Metrics 2025-08-30 07:04:09 - pico-train - INFO - ├── Loss: 5.9633 2025-08-30 07:04:09 - pico-train - INFO - ├── Learning Rate: 2.75e-05 2025-08-30 07:04:09 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 07:04:22 - pico-train - INFO - Step 51075 -- 🔄 Training Metrics 2025-08-30 07:04:22 - pico-train - INFO - ├── Loss: 5.9476 2025-08-30 07:04:22 - pico-train - INFO - ├── Learning Rate: 2.75e-05 2025-08-30 07:04:22 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 07:04:34 - pico-train - INFO - Step 51100 -- 🔄 Training Metrics 2025-08-30 07:04:34 - pico-train - INFO - ├── Loss: 5.9797 2025-08-30 07:04:34 - pico-train - INFO - ├── Learning Rate: 2.75e-05 2025-08-30 07:04:34 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 07:04:47 - pico-train - INFO - Step 51125 -- 🔄 Training Metrics 2025-08-30 07:04:47 - pico-train - INFO - ├── Loss: 5.9138 2025-08-30 07:04:47 - pico-train - INFO - ├── Learning Rate: 2.75e-05 2025-08-30 07:04:47 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 07:04:59 - pico-train - INFO - Step 51150 -- 🔄 Training Metrics 2025-08-30 07:04:59 - pico-train - INFO - ├── Loss: 5.9946 2025-08-30 07:04:59 - pico-train - INFO - ├── Learning Rate: 2.74e-05 2025-08-30 07:04:59 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 07:05:12 - pico-train - INFO - Step 51175 -- 🔄 Training Metrics 2025-08-30 07:05:12 - pico-train - INFO - ├── Loss: 5.9050 2025-08-30 07:05:12 - pico-train - INFO - ├── Learning Rate: 2.74e-05 2025-08-30 07:05:12 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 07:05:25 - pico-train - INFO - Step 51200 -- 🔄 Training Metrics 2025-08-30 07:05:25 - pico-train - INFO - ├── Loss: 5.9431 2025-08-30 07:05:25 - pico-train - INFO - ├── Learning 
Rate: 2.74e-05 2025-08-30 07:05:25 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 07:05:37 - pico-train - INFO - Step 51225 -- 🔄 Training Metrics 2025-08-30 07:05:37 - pico-train - INFO - ├── Loss: 5.9906 2025-08-30 07:05:37 - pico-train - INFO - ├── Learning Rate: 2.74e-05 2025-08-30 07:05:37 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 07:05:50 - pico-train - INFO - Step 51250 -- 🔄 Training Metrics 2025-08-30 07:05:50 - pico-train - INFO - ├── Loss: 5.9408 2025-08-30 07:05:50 - pico-train - INFO - ├── Learning Rate: 2.73e-05 2025-08-30 07:05:50 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 07:06:03 - pico-train - INFO - Step 51275 -- 🔄 Training Metrics 2025-08-30 07:06:03 - pico-train - INFO - ├── Loss: 6.0058 2025-08-30 07:06:03 - pico-train - INFO - ├── Learning Rate: 2.73e-05 2025-08-30 07:06:03 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 07:06:15 - pico-train - INFO - Step 51300 -- 🔄 Training Metrics 2025-08-30 07:06:15 - pico-train - INFO - ├── Loss: 5.9526 2025-08-30 07:06:15 - pico-train - INFO - ├── Learning Rate: 2.73e-05 2025-08-30 07:06:15 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 07:06:28 - pico-train - INFO - Step 51325 -- 🔄 Training Metrics 2025-08-30 07:06:28 - pico-train - INFO - ├── Loss: 5.9452 2025-08-30 07:06:28 - pico-train - INFO - ├── Learning Rate: 2.73e-05 2025-08-30 07:06:28 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 07:06:40 - pico-train - INFO - Step 51350 -- 🔄 Training Metrics 2025-08-30 07:06:40 - pico-train - INFO - ├── Loss: 6.0049 2025-08-30 07:06:40 - pico-train - INFO - ├── Learning Rate: 2.73e-05 2025-08-30 07:06:40 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 07:06:53 - pico-train - INFO - Step 51375 -- 🔄 Training Metrics 2025-08-30 07:06:53 - pico-train - INFO - ├── Loss: 5.9591 2025-08-30 07:06:53 - pico-train - INFO - ├── Learning Rate: 2.72e-05 2025-08-30 07:06:53 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 07:07:06 - pico-train - INFO - Step 51400 
-- 🔄 Training Metrics 2025-08-30 07:07:06 - pico-train - INFO - ├── Loss: 5.9947 2025-08-30 07:07:06 - pico-train - INFO - ├── Learning Rate: 2.72e-05 2025-08-30 07:07:06 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 07:07:18 - pico-train - INFO - Step 51425 -- 🔄 Training Metrics 2025-08-30 07:07:18 - pico-train - INFO - ├── Loss: 5.9487 2025-08-30 07:07:18 - pico-train - INFO - ├── Learning Rate: 2.72e-05 2025-08-30 07:07:18 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 07:07:31 - pico-train - INFO - Step 51450 -- 🔄 Training Metrics 2025-08-30 07:07:31 - pico-train - INFO - ├── Loss: 5.9444 2025-08-30 07:07:31 - pico-train - INFO - ├── Learning Rate: 2.72e-05 2025-08-30 07:07:31 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 07:07:44 - pico-train - INFO - Step 51475 -- 🔄 Training Metrics 2025-08-30 07:07:44 - pico-train - INFO - ├── Loss: 5.9784 2025-08-30 07:07:44 - pico-train - INFO - ├── Learning Rate: 2.72e-05 2025-08-30 07:07:44 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 07:07:56 - pico-train - INFO - Step 51500 -- 💾 Saving Checkpoint 2025-08-30 07:09:51 - pico-train - INFO - Step 51500 -- 📊 Evaluation Results 2025-08-30 07:09:51 - pico-train - INFO - └── paloma: 3.779255466147184e+29 2025-08-30 07:09:53 - pico-train - INFO - Step 51500 -- 🔄 Training Metrics 2025-08-30 07:09:53 - pico-train - INFO - ├── Loss: 5.9132 2025-08-30 07:09:53 - pico-train - INFO - ├── Learning Rate: 2.71e-05 2025-08-30 07:09:53 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 07:09:53 - pico-train - INFO - Step 51500 -- 📈 Saving Learning Dynamics 2025-08-30 07:10:09 - pico-train - INFO - Step 51525 -- 🔄 Training Metrics 2025-08-30 07:10:09 - pico-train - INFO - ├── Loss: 5.9536 2025-08-30 07:10:09 - pico-train - INFO - ├── Learning Rate: 2.71e-05 2025-08-30 07:10:09 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 07:10:21 - pico-train - INFO - Step 51550 -- 🔄 Training Metrics 2025-08-30 07:10:21 - pico-train - INFO - ├── Loss: 5.9491 
2025-08-30 07:10:21 - pico-train - INFO - ├── Learning Rate: 2.71e-05 2025-08-30 07:10:21 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 07:10:34 - pico-train - INFO - Step 51575 -- 🔄 Training Metrics 2025-08-30 07:10:34 - pico-train - INFO - ├── Loss: 5.9868 2025-08-30 07:10:34 - pico-train - INFO - ├── Learning Rate: 2.71e-05 2025-08-30 07:10:34 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 07:10:47 - pico-train - INFO - Step 51600 -- 🔄 Training Metrics 2025-08-30 07:10:47 - pico-train - INFO - ├── Loss: 5.9299 2025-08-30 07:10:47 - pico-train - INFO - ├── Learning Rate: 2.70e-05 2025-08-30 07:10:47 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 07:10:59 - pico-train - INFO - Step 51625 -- 🔄 Training Metrics 2025-08-30 07:10:59 - pico-train - INFO - ├── Loss: 5.9520 2025-08-30 07:10:59 - pico-train - INFO - ├── Learning Rate: 2.70e-05 2025-08-30 07:10:59 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 07:11:12 - pico-train - INFO - Step 51650 -- 🔄 Training Metrics 2025-08-30 07:11:12 - pico-train - INFO - ├── Loss: 5.8812 2025-08-30 07:11:12 - pico-train - INFO - ├── Learning Rate: 2.70e-05 2025-08-30 07:11:12 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 07:11:24 - pico-train - INFO - Step 51675 -- 🔄 Training Metrics 2025-08-30 07:11:24 - pico-train - INFO - ├── Loss: 5.9874 2025-08-30 07:11:24 - pico-train - INFO - ├── Learning Rate: 2.70e-05 2025-08-30 07:11:24 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 07:11:37 - pico-train - INFO - Step 51700 -- 🔄 Training Metrics 2025-08-30 07:11:37 - pico-train - INFO - ├── Loss: 5.8259 2025-08-30 07:11:37 - pico-train - INFO - ├── Learning Rate: 2.70e-05 2025-08-30 07:11:37 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 07:11:50 - pico-train - INFO - Step 51725 -- 🔄 Training Metrics 2025-08-30 07:11:50 - pico-train - INFO - ├── Loss: 5.8867 2025-08-30 07:11:50 - pico-train - INFO - ├── Learning Rate: 2.69e-05 2025-08-30 07:11:50 - pico-train - INFO - └── Inf/NaN count: 
0 2025-08-30 07:12:03 - pico-train - INFO - Step 51750 -- 🔄 Training Metrics 2025-08-30 07:12:03 - pico-train - INFO - ├── Loss: 5.9863 2025-08-30 07:12:03 - pico-train - INFO - ├── Learning Rate: 2.69e-05 2025-08-30 07:12:03 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 07:12:16 - pico-train - INFO - Step 51775 -- 🔄 Training Metrics 2025-08-30 07:12:16 - pico-train - INFO - ├── Loss: 6.0154 2025-08-30 07:12:16 - pico-train - INFO - ├── Learning Rate: 2.69e-05 2025-08-30 07:12:16 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 07:12:28 - pico-train - INFO - Step 51800 -- 🔄 Training Metrics 2025-08-30 07:12:28 - pico-train - INFO - ├── Loss: 5.9222 2025-08-30 07:12:28 - pico-train - INFO - ├── Learning Rate: 2.69e-05 2025-08-30 07:12:28 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 07:12:41 - pico-train - INFO - Step 51825 -- 🔄 Training Metrics 2025-08-30 07:12:41 - pico-train - INFO - ├── Loss: 5.9468 2025-08-30 07:12:41 - pico-train - INFO - ├── Learning Rate: 2.69e-05 2025-08-30 07:12:41 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 07:12:53 - pico-train - INFO - Step 51850 -- 🔄 Training Metrics 2025-08-30 07:12:53 - pico-train - INFO - ├── Loss: 5.9967 2025-08-30 07:12:53 - pico-train - INFO - ├── Learning Rate: 2.68e-05 2025-08-30 07:12:53 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 07:13:06 - pico-train - INFO - Step 51875 -- 🔄 Training Metrics 2025-08-30 07:13:06 - pico-train - INFO - ├── Loss: 5.9565 2025-08-30 07:13:06 - pico-train - INFO - ├── Learning Rate: 2.68e-05 2025-08-30 07:13:06 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 07:13:19 - pico-train - INFO - Step 51900 -- 🔄 Training Metrics 2025-08-30 07:13:19 - pico-train - INFO - ├── Loss: 5.9186 2025-08-30 07:13:19 - pico-train - INFO - ├── Learning Rate: 2.68e-05 2025-08-30 07:13:19 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 07:13:31 - pico-train - INFO - Step 51925 -- 🔄 Training Metrics 2025-08-30 07:13:31 - pico-train - INFO - ├── Loss: 
5.7959 2025-08-30 07:13:31 - pico-train - INFO - ├── Learning Rate: 2.68e-05 2025-08-30 07:13:31 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 07:13:44 - pico-train - INFO - Step 51950 -- 🔄 Training Metrics 2025-08-30 07:13:44 - pico-train - INFO - ├── Loss: 5.9955 2025-08-30 07:13:44 - pico-train - INFO - ├── Learning Rate: 2.67e-05 2025-08-30 07:13:44 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 07:13:56 - pico-train - INFO - Step 51975 -- 🔄 Training Metrics 2025-08-30 07:13:56 - pico-train - INFO - ├── Loss: 5.9673 2025-08-30 07:13:56 - pico-train - INFO - ├── Learning Rate: 2.67e-05 2025-08-30 07:13:56 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 07:14:09 - pico-train - INFO - Step 52000 -- 💾 Saving Checkpoint 2025-08-30 07:16:04 - pico-train - INFO - Step 52000 -- 📊 Evaluation Results 2025-08-30 07:16:04 - pico-train - INFO - └── paloma: 4.093974189838008e+29 2025-08-30 07:16:07 - pico-train - INFO - Step 52000 -- 🔄 Training Metrics 2025-08-30 07:16:07 - pico-train - INFO - ├── Loss: 6.0273 2025-08-30 07:16:07 - pico-train - INFO - ├── Learning Rate: 2.67e-05 2025-08-30 07:16:07 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 07:16:07 - pico-train - INFO - Step 52000 -- 📈 Saving Learning Dynamics 2025-08-30 07:16:23 - pico-train - INFO - Step 52025 -- 🔄 Training Metrics 2025-08-30 07:16:23 - pico-train - INFO - ├── Loss: 5.8760 2025-08-30 07:16:23 - pico-train - INFO - ├── Learning Rate: 2.67e-05 2025-08-30 07:16:23 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 07:16:36 - pico-train - INFO - Step 52050 -- 🔄 Training Metrics 2025-08-30 07:16:36 - pico-train - INFO - ├── Loss: 5.9087 2025-08-30 07:16:36 - pico-train - INFO - ├── Learning Rate: 2.67e-05 2025-08-30 07:16:36 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 07:16:48 - pico-train - INFO - Step 52075 -- 🔄 Training Metrics 2025-08-30 07:16:48 - pico-train - INFO - ├── Loss: 5.8656 2025-08-30 07:16:48 - pico-train - INFO - ├── Learning Rate: 2.66e-05 
2025-08-30 07:16:48 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 07:17:01 - pico-train - INFO - Step 52100 -- 🔄 Training Metrics 2025-08-30 07:17:01 - pico-train - INFO - ├── Loss: 5.9358 2025-08-30 07:17:01 - pico-train - INFO - ├── Learning Rate: 2.66e-05 2025-08-30 07:17:01 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 07:17:13 - pico-train - INFO - Step 52125 -- 🔄 Training Metrics 2025-08-30 07:17:13 - pico-train - INFO - ├── Loss: 6.0056 2025-08-30 07:17:13 - pico-train - INFO - ├── Learning Rate: 2.66e-05 2025-08-30 07:17:13 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 07:17:26 - pico-train - INFO - Step 52150 -- 🔄 Training Metrics 2025-08-30 07:17:26 - pico-train - INFO - ├── Loss: 5.9770 2025-08-30 07:17:26 - pico-train - INFO - ├── Learning Rate: 2.66e-05 2025-08-30 07:17:26 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 07:17:38 - pico-train - INFO - Step 52175 -- 🔄 Training Metrics 2025-08-30 07:17:38 - pico-train - INFO - ├── Loss: 5.9145 2025-08-30 07:17:38 - pico-train - INFO - ├── Learning Rate: 2.66e-05 2025-08-30 07:17:38 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 07:17:51 - pico-train - INFO - Step 52200 -- 🔄 Training Metrics 2025-08-30 07:17:51 - pico-train - INFO - ├── Loss: 5.9592 2025-08-30 07:17:51 - pico-train - INFO - ├── Learning Rate: 2.65e-05 2025-08-30 07:17:51 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 07:18:04 - pico-train - INFO - Step 52225 -- 🔄 Training Metrics 2025-08-30 07:18:04 - pico-train - INFO - ├── Loss: 5.9323 2025-08-30 07:18:04 - pico-train - INFO - ├── Learning Rate: 2.65e-05 2025-08-30 07:18:04 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 07:18:17 - pico-train - INFO - Step 52250 -- 🔄 Training Metrics 2025-08-30 07:18:17 - pico-train - INFO - ├── Loss: 5.9309 2025-08-30 07:18:17 - pico-train - INFO - ├── Learning Rate: 2.65e-05 2025-08-30 07:18:17 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 07:18:29 - pico-train - INFO - Step 52275 -- 🔄 Training 
Metrics 2025-08-30 07:18:29 - pico-train - INFO - ├── Loss: 6.0290 2025-08-30 07:18:29 - pico-train - INFO - ├── Learning Rate: 2.65e-05 2025-08-30 07:18:29 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 07:18:42 - pico-train - INFO - Step 52300 -- 🔄 Training Metrics 2025-08-30 07:18:42 - pico-train - INFO - ├── Loss: 6.0121 2025-08-30 07:18:42 - pico-train - INFO - ├── Learning Rate: 2.65e-05 2025-08-30 07:18:42 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 07:18:55 - pico-train - INFO - Step 52325 -- 🔄 Training Metrics 2025-08-30 07:18:55 - pico-train - INFO - ├── Loss: 5.8936 2025-08-30 07:18:55 - pico-train - INFO - ├── Learning Rate: 2.64e-05 2025-08-30 07:18:55 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 07:19:07 - pico-train - INFO - Step 52350 -- 🔄 Training Metrics 2025-08-30 07:19:07 - pico-train - INFO - ├── Loss: 5.9461 2025-08-30 07:19:07 - pico-train - INFO - ├── Learning Rate: 2.64e-05 2025-08-30 07:19:07 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 07:19:20 - pico-train - INFO - Step 52375 -- 🔄 Training Metrics 2025-08-30 07:19:20 - pico-train - INFO - ├── Loss: 6.0288 2025-08-30 07:19:20 - pico-train - INFO - ├── Learning Rate: 2.64e-05 2025-08-30 07:19:20 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 07:19:32 - pico-train - INFO - Step 52400 -- 🔄 Training Metrics 2025-08-30 07:19:32 - pico-train - INFO - ├── Loss: 5.9728 2025-08-30 07:19:32 - pico-train - INFO - ├── Learning Rate: 2.64e-05 2025-08-30 07:19:32 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 07:19:45 - pico-train - INFO - Step 52425 -- 🔄 Training Metrics 2025-08-30 07:19:45 - pico-train - INFO - ├── Loss: 5.8944 2025-08-30 07:19:45 - pico-train - INFO - ├── Learning Rate: 2.63e-05 2025-08-30 07:19:45 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 07:19:58 - pico-train - INFO - Step 52450 -- 🔄 Training Metrics 2025-08-30 07:19:58 - pico-train - INFO - ├── Loss: 6.0148 2025-08-30 07:19:58 - pico-train - INFO - ├── Learning Rate: 
2.63e-05 2025-08-30 07:19:58 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 07:20:10 - pico-train - INFO - Step 52475 -- 🔄 Training Metrics 2025-08-30 07:20:10 - pico-train - INFO - ├── Loss: 5.9232 2025-08-30 07:20:10 - pico-train - INFO - ├── Learning Rate: 2.63e-05 2025-08-30 07:20:10 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 07:20:22 - pico-train - INFO - Step 52500 -- 💾 Saving Checkpoint 2025-08-30 07:22:31 - pico-train - INFO - Step 52500 -- 📊 Evaluation Results 2025-08-30 07:22:31 - pico-train - INFO - └── paloma: 4.3829741549987764e+29 2025-08-30 07:22:34 - pico-train - INFO - Step 52500 -- 🔄 Training Metrics 2025-08-30 07:22:34 - pico-train - INFO - ├── Loss: 6.0213 2025-08-30 07:22:34 - pico-train - INFO - ├── Learning Rate: 2.63e-05 2025-08-30 07:22:34 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 07:22:34 - pico-train - INFO - Step 52500 -- 📈 Saving Learning Dynamics 2025-08-30 07:22:49 - pico-train - INFO - Step 52525 -- 🔄 Training Metrics 2025-08-30 07:22:49 - pico-train - INFO - ├── Loss: 5.9703 2025-08-30 07:22:49 - pico-train - INFO - ├── Learning Rate: 2.63e-05 2025-08-30 07:22:49 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 07:23:01 - pico-train - INFO - Step 52550 -- 🔄 Training Metrics 2025-08-30 07:23:01 - pico-train - INFO - ├── Loss: 5.9471 2025-08-30 07:23:01 - pico-train - INFO - ├── Learning Rate: 2.62e-05 2025-08-30 07:23:01 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 07:23:14 - pico-train - INFO - Step 52575 -- 🔄 Training Metrics 2025-08-30 07:23:14 - pico-train - INFO - ├── Loss: 5.9502 2025-08-30 07:23:14 - pico-train - INFO - ├── Learning Rate: 2.62e-05 2025-08-30 07:23:14 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 07:23:26 - pico-train - INFO - Step 52600 -- 🔄 Training Metrics 2025-08-30 07:23:26 - pico-train - INFO - ├── Loss: 5.9032 2025-08-30 07:23:26 - pico-train - INFO - ├── Learning Rate: 2.62e-05 2025-08-30 07:23:26 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 
07:23:39 - pico-train - INFO - Step 52625 -- 🔄 Training Metrics 2025-08-30 07:23:39 - pico-train - INFO - ├── Loss: 5.8965 2025-08-30 07:23:39 - pico-train - INFO - ├── Learning Rate: 2.62e-05 2025-08-30 07:23:39 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 07:23:51 - pico-train - INFO - Step 52650 -- 🔄 Training Metrics 2025-08-30 07:23:51 - pico-train - INFO - ├── Loss: 5.9213 2025-08-30 07:23:51 - pico-train - INFO - ├── Learning Rate: 2.62e-05 2025-08-30 07:23:51 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 07:24:04 - pico-train - INFO - Step 52675 -- 🔄 Training Metrics 2025-08-30 07:24:04 - pico-train - INFO - ├── Loss: 6.0115 2025-08-30 07:24:04 - pico-train - INFO - ├── Learning Rate: 2.61e-05 2025-08-30 07:24:04 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 07:24:17 - pico-train - INFO - Step 52700 -- 🔄 Training Metrics 2025-08-30 07:24:17 - pico-train - INFO - ├── Loss: 5.9340 2025-08-30 07:24:17 - pico-train - INFO - ├── Learning Rate: 2.61e-05 2025-08-30 07:24:17 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 07:24:29 - pico-train - INFO - Step 52725 -- 🔄 Training Metrics 2025-08-30 07:24:29 - pico-train - INFO - ├── Loss: 5.8320 2025-08-30 07:24:29 - pico-train - INFO - ├── Learning Rate: 2.61e-05 2025-08-30 07:24:29 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 07:24:42 - pico-train - INFO - Step 52750 -- 🔄 Training Metrics 2025-08-30 07:24:42 - pico-train - INFO - ├── Loss: 5.9125 2025-08-30 07:24:42 - pico-train - INFO - ├── Learning Rate: 2.61e-05 2025-08-30 07:24:42 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 07:24:55 - pico-train - INFO - Step 52775 -- 🔄 Training Metrics 2025-08-30 07:24:55 - pico-train - INFO - ├── Loss: 5.8468 2025-08-30 07:24:55 - pico-train - INFO - ├── Learning Rate: 2.60e-05 2025-08-30 07:24:55 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 07:25:07 - pico-train - INFO - Step 52800 -- 🔄 Training Metrics 2025-08-30 07:25:07 - pico-train - INFO - ├── Loss: 5.9822 
2025-08-30 07:25:07 - pico-train - INFO - ├── Learning Rate: 2.60e-05 2025-08-30 07:25:07 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 07:25:20 - pico-train - INFO - Step 52825 -- 🔄 Training Metrics 2025-08-30 07:25:20 - pico-train - INFO - ├── Loss: 6.0151 2025-08-30 07:25:20 - pico-train - INFO - ├── Learning Rate: 2.60e-05 2025-08-30 07:25:20 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 07:25:33 - pico-train - INFO - Step 52850 -- 🔄 Training Metrics 2025-08-30 07:25:33 - pico-train - INFO - ├── Loss: 5.9894 2025-08-30 07:25:33 - pico-train - INFO - ├── Learning Rate: 2.60e-05 2025-08-30 07:25:33 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 07:25:45 - pico-train - INFO - Step 52875 -- 🔄 Training Metrics 2025-08-30 07:25:45 - pico-train - INFO - ├── Loss: 5.8899 2025-08-30 07:25:45 - pico-train - INFO - ├── Learning Rate: 2.60e-05 2025-08-30 07:25:45 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 07:25:58 - pico-train - INFO - Step 52900 -- 🔄 Training Metrics 2025-08-30 07:25:58 - pico-train - INFO - ├── Loss: 5.8752 2025-08-30 07:25:58 - pico-train - INFO - ├── Learning Rate: 2.59e-05 2025-08-30 07:25:58 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 07:26:11 - pico-train - INFO - Step 52925 -- 🔄 Training Metrics 2025-08-30 07:26:11 - pico-train - INFO - ├── Loss: 5.9977 2025-08-30 07:26:11 - pico-train - INFO - ├── Learning Rate: 2.59e-05 2025-08-30 07:26:11 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 07:26:23 - pico-train - INFO - Step 52950 -- 🔄 Training Metrics 2025-08-30 07:26:23 - pico-train - INFO - ├── Loss: 5.9522 2025-08-30 07:26:23 - pico-train - INFO - ├── Learning Rate: 2.59e-05 2025-08-30 07:26:23 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 07:26:36 - pico-train - INFO - Step 52975 -- 🔄 Training Metrics 2025-08-30 07:26:36 - pico-train - INFO - ├── Loss: 5.9471 2025-08-30 07:26:36 - pico-train - INFO - ├── Learning Rate: 2.59e-05 2025-08-30 07:26:36 - pico-train - INFO - └── Inf/NaN count: 
0 2025-08-30 07:26:48 - pico-train - INFO - Step 53000 -- 💾 Saving Checkpoint 2025-08-30 07:28:43 - pico-train - INFO - Step 53000 -- 📊 Evaluation Results 2025-08-30 07:28:43 - pico-train - INFO - └── paloma: 6.0678060947205406e+29 2025-08-30 07:28:45 - pico-train - INFO - Step 53000 -- 🔄 Training Metrics 2025-08-30 07:28:45 - pico-train - INFO - ├── Loss: 5.9668 2025-08-30 07:28:45 - pico-train - INFO - ├── Learning Rate: 2.59e-05 2025-08-30 07:28:45 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 07:28:45 - pico-train - INFO - Step 53000 -- 📈 Saving Learning Dynamics 2025-08-30 07:29:00 - pico-train - INFO - Step 53025 -- 🔄 Training Metrics 2025-08-30 07:29:00 - pico-train - INFO - ├── Loss: 5.9943 2025-08-30 07:29:00 - pico-train - INFO - ├── Learning Rate: 2.58e-05 2025-08-30 07:29:00 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 07:29:13 - pico-train - INFO - Step 53050 -- 🔄 Training Metrics 2025-08-30 07:29:13 - pico-train - INFO - ├── Loss: 5.9316 2025-08-30 07:29:13 - pico-train - INFO - ├── Learning Rate: 2.58e-05 2025-08-30 07:29:13 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 07:29:25 - pico-train - INFO - Step 53075 -- 🔄 Training Metrics 2025-08-30 07:29:25 - pico-train - INFO - ├── Loss: 5.9268 2025-08-30 07:29:25 - pico-train - INFO - ├── Learning Rate: 2.58e-05 2025-08-30 07:29:25 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 07:29:38 - pico-train - INFO - Step 53100 -- 🔄 Training Metrics 2025-08-30 07:29:38 - pico-train - INFO - ├── Loss: 5.9084 2025-08-30 07:29:38 - pico-train - INFO - ├── Learning Rate: 2.58e-05 2025-08-30 07:29:38 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 07:29:51 - pico-train - INFO - Step 53125 -- 🔄 Training Metrics 2025-08-30 07:29:51 - pico-train - INFO - ├── Loss: 6.0159 2025-08-30 07:29:51 - pico-train - INFO - ├── Learning Rate: 2.57e-05 2025-08-30 07:29:51 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 07:30:03 - pico-train - INFO - Step 53150 -- 🔄 Training Metrics 
2025-08-30 07:30:03 - pico-train - INFO - ├── Loss: 5.8973 2025-08-30 07:30:03 - pico-train - INFO - ├── Learning Rate: 2.57e-05 2025-08-30 07:30:03 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 07:30:16 - pico-train - INFO - Step 53175 -- 🔄 Training Metrics 2025-08-30 07:30:16 - pico-train - INFO - ├── Loss: 5.9660 2025-08-30 07:30:16 - pico-train - INFO - ├── Learning Rate: 2.57e-05 2025-08-30 07:30:16 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 07:30:29 - pico-train - INFO - Step 53200 -- 🔄 Training Metrics 2025-08-30 07:30:29 - pico-train - INFO - ├── Loss: 5.9716 2025-08-30 07:30:29 - pico-train - INFO - ├── Learning Rate: 2.57e-05 2025-08-30 07:30:29 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 07:30:41 - pico-train - INFO - Step 53225 -- 🔄 Training Metrics 2025-08-30 07:30:41 - pico-train - INFO - ├── Loss: 5.8883 2025-08-30 07:30:41 - pico-train - INFO - ├── Learning Rate: 2.57e-05 2025-08-30 07:30:41 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 07:30:54 - pico-train - INFO - Step 53250 -- 🔄 Training Metrics 2025-08-30 07:30:54 - pico-train - INFO - ├── Loss: 5.9727 2025-08-30 07:30:54 - pico-train - INFO - ├── Learning Rate: 2.56e-05 2025-08-30 07:30:54 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 07:31:07 - pico-train - INFO - Step 53275 -- 🔄 Training Metrics 2025-08-30 07:31:07 - pico-train - INFO - ├── Loss: 5.8948 2025-08-30 07:31:07 - pico-train - INFO - ├── Learning Rate: 2.56e-05 2025-08-30 07:31:07 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 07:31:19 - pico-train - INFO - Step 53300 -- 🔄 Training Metrics 2025-08-30 07:31:19 - pico-train - INFO - ├── Loss: 5.8979 2025-08-30 07:31:19 - pico-train - INFO - ├── Learning Rate: 2.56e-05 2025-08-30 07:31:19 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 07:31:32 - pico-train - INFO - Step 53325 -- 🔄 Training Metrics 2025-08-30 07:31:32 - pico-train - INFO - ├── Loss: 5.9572 2025-08-30 07:31:32 - pico-train - INFO - ├── Learning Rate: 2.56e-05 
2025-08-30 07:31:32 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 07:31:44 - pico-train - INFO - Step 53350 -- 🔄 Training Metrics 2025-08-30 07:31:44 - pico-train - INFO - ├── Loss: 5.8599 2025-08-30 07:31:44 - pico-train - INFO - ├── Learning Rate: 2.56e-05 2025-08-30 07:31:44 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 07:31:57 - pico-train - INFO - Step 53375 -- 🔄 Training Metrics 2025-08-30 07:31:57 - pico-train - INFO - ├── Loss: 5.8751 2025-08-30 07:31:57 - pico-train - INFO - ├── Learning Rate: 2.55e-05 2025-08-30 07:31:57 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 07:32:09 - pico-train - INFO - Step 53400 -- 🔄 Training Metrics 2025-08-30 07:32:09 - pico-train - INFO - ├── Loss: 5.9950 2025-08-30 07:32:09 - pico-train - INFO - ├── Learning Rate: 2.55e-05 2025-08-30 07:32:09 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 07:32:22 - pico-train - INFO - Step 53425 -- 🔄 Training Metrics 2025-08-30 07:32:22 - pico-train - INFO - ├── Loss: 5.9827 2025-08-30 07:32:22 - pico-train - INFO - ├── Learning Rate: 2.55e-05 2025-08-30 07:32:22 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 07:32:34 - pico-train - INFO - Step 53450 -- 🔄 Training Metrics 2025-08-30 07:32:34 - pico-train - INFO - ├── Loss: 5.8589 2025-08-30 07:32:34 - pico-train - INFO - ├── Learning Rate: 2.55e-05 2025-08-30 07:32:34 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 07:32:47 - pico-train - INFO - Step 53475 -- 🔄 Training Metrics 2025-08-30 07:32:47 - pico-train - INFO - ├── Loss: 5.9415 2025-08-30 07:32:47 - pico-train - INFO - ├── Learning Rate: 2.54e-05 2025-08-30 07:32:47 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 07:32:59 - pico-train - INFO - Step 53500 -- 💾 Saving Checkpoint 2025-08-30 07:34:53 - pico-train - INFO - Step 53500 -- 📊 Evaluation Results 2025-08-30 07:34:53 - pico-train - INFO - └── paloma: 5.560195307890516e+29 2025-08-30 07:34:55 - pico-train - INFO - Step 53500 -- 🔄 Training Metrics 2025-08-30 07:34:55 - 
pico-train - INFO - ├── Loss: 5.8976 2025-08-30 07:34:55 - pico-train - INFO - ├── Learning Rate: 2.54e-05 2025-08-30 07:34:55 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 07:34:55 - pico-train - INFO - Step 53500 -- 📈 Saving Learning Dynamics 2025-08-30 07:35:10 - pico-train - INFO - Step 53525 -- 🔄 Training Metrics 2025-08-30 07:35:10 - pico-train - INFO - ├── Loss: 5.9070 2025-08-30 07:35:10 - pico-train - INFO - ├── Learning Rate: 2.54e-05 2025-08-30 07:35:10 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 07:35:22 - pico-train - INFO - Step 53550 -- 🔄 Training Metrics 2025-08-30 07:35:22 - pico-train - INFO - ├── Loss: 5.8362 2025-08-30 07:35:22 - pico-train - INFO - ├── Learning Rate: 2.54e-05 2025-08-30 07:35:22 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 07:35:35 - pico-train - INFO - Step 53575 -- 🔄 Training Metrics 2025-08-30 07:35:35 - pico-train - INFO - ├── Loss: 5.8874 2025-08-30 07:35:35 - pico-train - INFO - ├── Learning Rate: 2.54e-05 2025-08-30 07:35:35 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 07:35:48 - pico-train - INFO - Step 53600 -- 🔄 Training Metrics 2025-08-30 07:35:48 - pico-train - INFO - ├── Loss: 5.8866 2025-08-30 07:35:48 - pico-train - INFO - ├── Learning Rate: 2.53e-05 2025-08-30 07:35:48 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 07:36:00 - pico-train - INFO - Step 53625 -- 🔄 Training Metrics 2025-08-30 07:36:00 - pico-train - INFO - ├── Loss: 5.8824 2025-08-30 07:36:00 - pico-train - INFO - ├── Learning Rate: 2.53e-05 2025-08-30 07:36:00 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 07:36:13 - pico-train - INFO - Step 53650 -- 🔄 Training Metrics 2025-08-30 07:36:13 - pico-train - INFO - ├── Loss: 5.7949 2025-08-30 07:36:13 - pico-train - INFO - ├── Learning Rate: 2.53e-05 2025-08-30 07:36:13 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 07:36:25 - pico-train - INFO - Step 53675 -- 🔄 Training Metrics 2025-08-30 07:36:25 - pico-train - INFO - ├── Loss: 5.9849 2025-08-30 
07:36:25 - pico-train - INFO - ├── Learning Rate: 2.53e-05 2025-08-30 07:36:25 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 07:36:38 - pico-train - INFO - Step 53700 -- 🔄 Training Metrics 2025-08-30 07:36:38 - pico-train - INFO - ├── Loss: 5.9197 2025-08-30 07:36:38 - pico-train - INFO - ├── Learning Rate: 2.53e-05 2025-08-30 07:36:38 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 07:36:50 - pico-train - INFO - Step 53725 -- 🔄 Training Metrics 2025-08-30 07:36:50 - pico-train - INFO - ├── Loss: 5.9326 2025-08-30 07:36:50 - pico-train - INFO - ├── Learning Rate: 2.52e-05 2025-08-30 07:36:50 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 07:37:03 - pico-train - INFO - Step 53750 -- 🔄 Training Metrics 2025-08-30 07:37:03 - pico-train - INFO - ├── Loss: 5.8980 2025-08-30 07:37:03 - pico-train - INFO - ├── Learning Rate: 2.52e-05 2025-08-30 07:37:03 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 07:37:16 - pico-train - INFO - Step 53775 -- 🔄 Training Metrics 2025-08-30 07:37:16 - pico-train - INFO - ├── Loss: 5.8599 2025-08-30 07:37:16 - pico-train - INFO - ├── Learning Rate: 2.52e-05 2025-08-30 07:37:16 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 07:37:28 - pico-train - INFO - Step 53800 -- 🔄 Training Metrics 2025-08-30 07:37:28 - pico-train - INFO - ├── Loss: 5.8844 2025-08-30 07:37:28 - pico-train - INFO - ├── Learning Rate: 2.52e-05 2025-08-30 07:37:28 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 07:37:41 - pico-train - INFO - Step 53825 -- 🔄 Training Metrics 2025-08-30 07:37:41 - pico-train - INFO - ├── Loss: 5.9178 2025-08-30 07:37:41 - pico-train - INFO - ├── Learning Rate: 2.51e-05 2025-08-30 07:37:41 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 07:37:54 - pico-train - INFO - Step 53850 -- 🔄 Training Metrics 2025-08-30 07:37:54 - pico-train - INFO - ├── Loss: 5.9118 2025-08-30 07:37:54 - pico-train - INFO - ├── Learning Rate: 2.51e-05 2025-08-30 07:37:54 - pico-train - INFO - └── Inf/NaN count: 0 
2025-08-30 07:38:06 - pico-train - INFO - Step 53875 -- 🔄 Training Metrics 2025-08-30 07:38:06 - pico-train - INFO - ├── Loss: 5.9270 2025-08-30 07:38:06 - pico-train - INFO - ├── Learning Rate: 2.51e-05 2025-08-30 07:38:06 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 07:38:19 - pico-train - INFO - Step 53900 -- 🔄 Training Metrics 2025-08-30 07:38:19 - pico-train - INFO - ├── Loss: 5.8265 2025-08-30 07:38:19 - pico-train - INFO - ├── Learning Rate: 2.51e-05 2025-08-30 07:38:19 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 07:38:31 - pico-train - INFO - Step 53925 -- 🔄 Training Metrics 2025-08-30 07:38:31 - pico-train - INFO - ├── Loss: 5.9337 2025-08-30 07:38:31 - pico-train - INFO - ├── Learning Rate: 2.51e-05 2025-08-30 07:38:31 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 07:38:44 - pico-train - INFO - Step 53950 -- 🔄 Training Metrics 2025-08-30 07:38:44 - pico-train - INFO - ├── Loss: 5.8446 2025-08-30 07:38:44 - pico-train - INFO - ├── Learning Rate: 2.50e-05 2025-08-30 07:38:44 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 07:38:57 - pico-train - INFO - Step 53975 -- 🔄 Training Metrics 2025-08-30 07:38:57 - pico-train - INFO - ├── Loss: 5.8990 2025-08-30 07:38:57 - pico-train - INFO - ├── Learning Rate: 2.50e-05 2025-08-30 07:38:57 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 07:39:09 - pico-train - INFO - Step 54000 -- 💾 Saving Checkpoint 2025-08-30 07:41:23 - pico-train - INFO - Step 54000 -- 📊 Evaluation Results 2025-08-30 07:41:23 - pico-train - INFO - └── paloma: 7.742991230238928e+29 2025-08-30 07:41:25 - pico-train - INFO - Step 54000 -- 🔄 Training Metrics 2025-08-30 07:41:25 - pico-train - INFO - ├── Loss: 5.7472 2025-08-30 07:41:25 - pico-train - INFO - ├── Learning Rate: 2.50e-05 2025-08-30 07:41:25 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 07:41:25 - pico-train - INFO - Step 54000 -- 📈 Saving Learning Dynamics 2025-08-30 07:41:40 - pico-train - INFO - Step 54025 -- 🔄 Training Metrics 2025-08-30 
07:41:40 - pico-train - INFO - ├── Loss: 5.9241 2025-08-30 07:41:40 - pico-train - INFO - ├── Learning Rate: 2.50e-05 2025-08-30 07:41:40 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 07:41:53 - pico-train - INFO - Step 54050 -- 🔄 Training Metrics 2025-08-30 07:41:53 - pico-train - INFO - ├── Loss: 5.9308 2025-08-30 07:41:53 - pico-train - INFO - ├── Learning Rate: 2.50e-05 2025-08-30 07:41:53 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 07:42:05 - pico-train - INFO - Step 54075 -- 🔄 Training Metrics 2025-08-30 07:42:05 - pico-train - INFO - ├── Loss: 5.9971 2025-08-30 07:42:05 - pico-train - INFO - ├── Learning Rate: 2.49e-05 2025-08-30 07:42:05 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 07:42:18 - pico-train - INFO - Step 54100 -- 🔄 Training Metrics 2025-08-30 07:42:18 - pico-train - INFO - ├── Loss: 5.9050 2025-08-30 07:42:18 - pico-train - INFO - ├── Learning Rate: 2.49e-05 2025-08-30 07:42:18 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 07:42:31 - pico-train - INFO - Step 54125 -- 🔄 Training Metrics 2025-08-30 07:42:31 - pico-train - INFO - ├── Loss: 6.0286 2025-08-30 07:42:31 - pico-train - INFO - ├── Learning Rate: 2.49e-05 2025-08-30 07:42:31 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 07:42:43 - pico-train - INFO - Step 54150 -- 🔄 Training Metrics 2025-08-30 07:42:43 - pico-train - INFO - ├── Loss: 5.9402 2025-08-30 07:42:43 - pico-train - INFO - ├── Learning Rate: 2.49e-05 2025-08-30 07:42:43 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 07:42:56 - pico-train - INFO - Step 54175 -- 🔄 Training Metrics 2025-08-30 07:42:56 - pico-train - INFO - ├── Loss: 5.8842 2025-08-30 07:42:56 - pico-train - INFO - ├── Learning Rate: 2.49e-05 2025-08-30 07:42:56 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 07:43:08 - pico-train - INFO - Step 54200 -- 🔄 Training Metrics 2025-08-30 07:43:08 - pico-train - INFO - ├── Loss: 5.9398 2025-08-30 07:43:08 - pico-train - INFO - ├── Learning Rate: 2.48e-05 2025-08-30 
07:43:08 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 07:43:21 - pico-train - INFO - Step 54225 -- 🔄 Training Metrics 2025-08-30 07:43:21 - pico-train - INFO - ├── Loss: 5.8637 2025-08-30 07:43:21 - pico-train - INFO - ├── Learning Rate: 2.48e-05 2025-08-30 07:43:21 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 07:43:34 - pico-train - INFO - Step 54250 -- 🔄 Training Metrics 2025-08-30 07:43:34 - pico-train - INFO - ├── Loss: 5.8928 2025-08-30 07:43:34 - pico-train - INFO - ├── Learning Rate: 2.48e-05 2025-08-30 07:43:34 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 07:43:47 - pico-train - INFO - Step 54275 -- 🔄 Training Metrics 2025-08-30 07:43:47 - pico-train - INFO - ├── Loss: 5.8749 2025-08-30 07:43:47 - pico-train - INFO - ├── Learning Rate: 2.48e-05 2025-08-30 07:43:47 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 07:43:59 - pico-train - INFO - Step 54300 -- 🔄 Training Metrics 2025-08-30 07:43:59 - pico-train - INFO - ├── Loss: 5.8800 2025-08-30 07:43:59 - pico-train - INFO - ├── Learning Rate: 2.47e-05 2025-08-30 07:43:59 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 07:44:12 - pico-train - INFO - Step 54325 -- 🔄 Training Metrics 2025-08-30 07:44:12 - pico-train - INFO - ├── Loss: 5.9748 2025-08-30 07:44:12 - pico-train - INFO - ├── Learning Rate: 2.47e-05 2025-08-30 07:44:12 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 07:44:25 - pico-train - INFO - Step 54350 -- 🔄 Training Metrics 2025-08-30 07:44:25 - pico-train - INFO - ├── Loss: 5.8758 2025-08-30 07:44:25 - pico-train - INFO - ├── Learning Rate: 2.47e-05 2025-08-30 07:44:25 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 07:44:37 - pico-train - INFO - Step 54375 -- 🔄 Training Metrics 2025-08-30 07:44:37 - pico-train - INFO - ├── Loss: 6.0149 2025-08-30 07:44:37 - pico-train - INFO - ├── Learning Rate: 2.47e-05 2025-08-30 07:44:37 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 07:44:50 - pico-train - INFO - Step 54400 -- 🔄 Training Metrics 
2025-08-30 07:44:50 - pico-train - INFO - ├── Loss: 5.9165 2025-08-30 07:44:50 - pico-train - INFO - ├── Learning Rate: 2.47e-05 2025-08-30 07:44:50 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 07:45:02 - pico-train - INFO - Step 54425 -- 🔄 Training Metrics 2025-08-30 07:45:02 - pico-train - INFO - ├── Loss: 5.8508 2025-08-30 07:45:02 - pico-train - INFO - ├── Learning Rate: 2.46e-05 2025-08-30 07:45:02 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 07:45:15 - pico-train - INFO - Step 54450 -- 🔄 Training Metrics 2025-08-30 07:45:15 - pico-train - INFO - ├── Loss: 5.9284 2025-08-30 07:45:15 - pico-train - INFO - ├── Learning Rate: 2.46e-05 2025-08-30 07:45:15 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 07:45:27 - pico-train - INFO - Step 54475 -- 🔄 Training Metrics 2025-08-30 07:45:27 - pico-train - INFO - ├── Loss: 5.9071 2025-08-30 07:45:27 - pico-train - INFO - ├── Learning Rate: 2.46e-05 2025-08-30 07:45:27 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 07:45:39 - pico-train - INFO - Step 54500 -- 💾 Saving Checkpoint 2025-08-30 07:47:45 - pico-train - INFO - Step 54500 -- 📊 Evaluation Results 2025-08-30 07:47:45 - pico-train - INFO - └── paloma: 9.839335327293338e+29 2025-08-30 07:47:47 - pico-train - INFO - Step 54500 -- 🔄 Training Metrics 2025-08-30 07:47:47 - pico-train - INFO - ├── Loss: 5.8753 2025-08-30 07:47:47 - pico-train - INFO - ├── Learning Rate: 2.46e-05 2025-08-30 07:47:47 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 07:47:47 - pico-train - INFO - Step 54500 -- 📈 Saving Learning Dynamics 2025-08-30 07:48:03 - pico-train - INFO - Step 54525 -- 🔄 Training Metrics 2025-08-30 07:48:03 - pico-train - INFO - ├── Loss: 5.9132 2025-08-30 07:48:03 - pico-train - INFO - ├── Learning Rate: 2.46e-05 2025-08-30 07:48:03 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 07:48:16 - pico-train - INFO - Step 54550 -- 🔄 Training Metrics 2025-08-30 07:48:16 - pico-train - INFO - ├── Loss: 5.9826 2025-08-30 07:48:16 - 
pico-train - INFO - ├── Learning Rate: 2.45e-05 2025-08-30 07:48:16 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 07:48:28 - pico-train - INFO - Step 54575 -- 🔄 Training Metrics 2025-08-30 07:48:28 - pico-train - INFO - ├── Loss: 5.8963 2025-08-30 07:48:28 - pico-train - INFO - ├── Learning Rate: 2.45e-05 2025-08-30 07:48:28 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 07:48:41 - pico-train - INFO - Step 54600 -- 🔄 Training Metrics 2025-08-30 07:48:41 - pico-train - INFO - ├── Loss: 5.9433 2025-08-30 07:48:41 - pico-train - INFO - ├── Learning Rate: 2.45e-05 2025-08-30 07:48:41 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 07:48:53 - pico-train - INFO - Step 54625 -- 🔄 Training Metrics 2025-08-30 07:48:53 - pico-train - INFO - ├── Loss: 5.9281 2025-08-30 07:48:53 - pico-train - INFO - ├── Learning Rate: 2.45e-05 2025-08-30 07:48:53 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 07:49:06 - pico-train - INFO - Step 54650 -- 🔄 Training Metrics 2025-08-30 07:49:06 - pico-train - INFO - ├── Loss: 5.8462 2025-08-30 07:49:06 - pico-train - INFO - ├── Learning Rate: 2.44e-05 2025-08-30 07:49:06 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 07:49:19 - pico-train - INFO - Step 54675 -- 🔄 Training Metrics 2025-08-30 07:49:19 - pico-train - INFO - ├── Loss: 5.9508 2025-08-30 07:49:19 - pico-train - INFO - ├── Learning Rate: 2.44e-05 2025-08-30 07:49:19 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 07:49:31 - pico-train - INFO - Step 54700 -- 🔄 Training Metrics 2025-08-30 07:49:31 - pico-train - INFO - ├── Loss: 5.8880 2025-08-30 07:49:31 - pico-train - INFO - ├── Learning Rate: 2.44e-05 2025-08-30 07:49:31 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 07:49:44 - pico-train - INFO - Step 54725 -- 🔄 Training Metrics 2025-08-30 07:49:44 - pico-train - INFO - ├── Loss: 5.8829 2025-08-30 07:49:44 - pico-train - INFO - ├── Learning Rate: 2.44e-05 2025-08-30 07:49:44 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 07:49:57 
- pico-train - INFO - Step 54750 -- 🔄 Training Metrics 2025-08-30 07:49:57 - pico-train - INFO - ├── Loss: 5.9466 2025-08-30 07:49:57 - pico-train - INFO - ├── Learning Rate: 2.44e-05 2025-08-30 07:49:57 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 07:50:10 - pico-train - INFO - Step 54775 -- 🔄 Training Metrics 2025-08-30 07:50:10 - pico-train - INFO - ├── Loss: 5.9607 2025-08-30 07:50:10 - pico-train - INFO - ├── Learning Rate: 2.43e-05 2025-08-30 07:50:10 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 07:50:22 - pico-train - INFO - Step 54800 -- 🔄 Training Metrics 2025-08-30 07:50:22 - pico-train - INFO - ├── Loss: 5.9967 2025-08-30 07:50:22 - pico-train - INFO - ├── Learning Rate: 2.43e-05 2025-08-30 07:50:22 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 07:50:35 - pico-train - INFO - Step 54825 -- 🔄 Training Metrics 2025-08-30 07:50:35 - pico-train - INFO - ├── Loss: 5.8599 2025-08-30 07:50:35 - pico-train - INFO - ├── Learning Rate: 2.43e-05 2025-08-30 07:50:35 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 07:50:48 - pico-train - INFO - Step 54850 -- 🔄 Training Metrics 2025-08-30 07:50:48 - pico-train - INFO - ├── Loss: 5.9756 2025-08-30 07:50:48 - pico-train - INFO - ├── Learning Rate: 2.43e-05 2025-08-30 07:50:48 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 07:51:00 - pico-train - INFO - Step 54875 -- 🔄 Training Metrics 2025-08-30 07:51:00 - pico-train - INFO - ├── Loss: 5.8856 2025-08-30 07:51:00 - pico-train - INFO - ├── Learning Rate: 2.43e-05 2025-08-30 07:51:00 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 07:51:13 - pico-train - INFO - Step 54900 -- 🔄 Training Metrics 2025-08-30 07:51:13 - pico-train - INFO - ├── Loss: 5.9306 2025-08-30 07:51:13 - pico-train - INFO - ├── Learning Rate: 2.42e-05 2025-08-30 07:51:13 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 07:51:26 - pico-train - INFO - Step 54925 -- 🔄 Training Metrics 2025-08-30 07:51:26 - pico-train - INFO - ├── Loss: 6.0266 2025-08-30 
07:51:26 - pico-train - INFO - ├── Learning Rate: 2.42e-05 2025-08-30 07:51:26 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 07:51:38 - pico-train - INFO - Step 54950 -- 🔄 Training Metrics 2025-08-30 07:51:38 - pico-train - INFO - ├── Loss: 5.9054 2025-08-30 07:51:38 - pico-train - INFO - ├── Learning Rate: 2.42e-05 2025-08-30 07:51:38 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 07:51:51 - pico-train - INFO - Step 54975 -- 🔄 Training Metrics 2025-08-30 07:51:51 - pico-train - INFO - ├── Loss: 5.8885 2025-08-30 07:51:51 - pico-train - INFO - ├── Learning Rate: 2.42e-05 2025-08-30 07:51:51 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 07:52:03 - pico-train - INFO - Step 55000 -- 💾 Saving Checkpoint 2025-08-30 07:54:06 - pico-train - INFO - Step 55000 -- 📊 Evaluation Results 2025-08-30 07:54:06 - pico-train - INFO - └── paloma: 1.0447307155558866e+30 2025-08-30 07:54:07 - pico-train - INFO - Step 55000 -- 🔄 Training Metrics 2025-08-30 07:54:07 - pico-train - INFO - ├── Loss: 6.0147 2025-08-30 07:54:07 - pico-train - INFO - ├── Learning Rate: 2.41e-05 2025-08-30 07:54:07 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 07:54:07 - pico-train - INFO - Step 55000 -- 📈 Saving Learning Dynamics 2025-08-30 07:54:22 - pico-train - INFO - Step 55025 -- 🔄 Training Metrics 2025-08-30 07:54:22 - pico-train - INFO - ├── Loss: 5.9628 2025-08-30 07:54:22 - pico-train - INFO - ├── Learning Rate: 2.41e-05 2025-08-30 07:54:22 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 07:54:34 - pico-train - INFO - Step 55050 -- 🔄 Training Metrics 2025-08-30 07:54:34 - pico-train - INFO - ├── Loss: 5.9456 2025-08-30 07:54:34 - pico-train - INFO - ├── Learning Rate: 2.41e-05 2025-08-30 07:54:34 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 07:54:47 - pico-train - INFO - Step 55075 -- 🔄 Training Metrics 2025-08-30 07:54:47 - pico-train - INFO - ├── Loss: 5.9061 2025-08-30 07:54:47 - pico-train - INFO - ├── Learning Rate: 2.41e-05 2025-08-30 07:54:47 - 
pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 07:55:00 - pico-train - INFO - Step 55100 -- 🔄 Training Metrics 2025-08-30 07:55:00 - pico-train - INFO - ├── Loss: 5.9604 2025-08-30 07:55:00 - pico-train - INFO - ├── Learning Rate: 2.41e-05 2025-08-30 07:55:00 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 07:55:12 - pico-train - INFO - Step 55125 -- 🔄 Training Metrics 2025-08-30 07:55:12 - pico-train - INFO - ├── Loss: 5.8649 2025-08-30 07:55:12 - pico-train - INFO - ├── Learning Rate: 2.40e-05 2025-08-30 07:55:12 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 07:55:25 - pico-train - INFO - Step 55150 -- 🔄 Training Metrics 2025-08-30 07:55:25 - pico-train - INFO - ├── Loss: 5.8123 2025-08-30 07:55:25 - pico-train - INFO - ├── Learning Rate: 2.40e-05 2025-08-30 07:55:25 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 07:55:37 - pico-train - INFO - Step 55175 -- 🔄 Training Metrics 2025-08-30 07:55:37 - pico-train - INFO - ├── Loss: 5.9016 2025-08-30 07:55:37 - pico-train - INFO - ├── Learning Rate: 2.40e-05 2025-08-30 07:55:37 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 07:55:50 - pico-train - INFO - Step 55200 -- 🔄 Training Metrics 2025-08-30 07:55:50 - pico-train - INFO - ├── Loss: 5.9233 2025-08-30 07:55:50 - pico-train - INFO - ├── Learning Rate: 2.40e-05 2025-08-30 07:55:50 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 07:56:03 - pico-train - INFO - Step 55225 -- 🔄 Training Metrics 2025-08-30 07:56:03 - pico-train - INFO - ├── Loss: 5.8768 2025-08-30 07:56:03 - pico-train - INFO - ├── Learning Rate: 2.40e-05 2025-08-30 07:56:03 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 07:56:15 - pico-train - INFO - Step 55250 -- 🔄 Training Metrics 2025-08-30 07:56:15 - pico-train - INFO - ├── Loss: 5.9265 2025-08-30 07:56:15 - pico-train - INFO - ├── Learning Rate: 2.39e-05 2025-08-30 07:56:15 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 07:56:28 - pico-train - INFO - Step 55275 -- 🔄 Training Metrics 2025-08-30 
07:56:28 - pico-train - INFO - ├── Loss: 5.8929 2025-08-30 07:56:28 - pico-train - INFO - ├── Learning Rate: 2.39e-05 2025-08-30 07:56:28 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 07:56:41 - pico-train - INFO - Step 55300 -- 🔄 Training Metrics 2025-08-30 07:56:41 - pico-train - INFO - ├── Loss: 5.8863 2025-08-30 07:56:41 - pico-train - INFO - ├── Learning Rate: 2.39e-05 2025-08-30 07:56:41 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 07:56:53 - pico-train - INFO - Step 55325 -- 🔄 Training Metrics 2025-08-30 07:56:53 - pico-train - INFO - ├── Loss: 5.8588 2025-08-30 07:56:53 - pico-train - INFO - ├── Learning Rate: 2.39e-05 2025-08-30 07:56:53 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 07:57:06 - pico-train - INFO - Step 55350 -- 🔄 Training Metrics 2025-08-30 07:57:06 - pico-train - INFO - ├── Loss: 5.8740 2025-08-30 07:57:06 - pico-train - INFO - ├── Learning Rate: 2.38e-05 2025-08-30 07:57:06 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 07:57:19 - pico-train - INFO - Step 55375 -- 🔄 Training Metrics 2025-08-30 07:57:19 - pico-train - INFO - ├── Loss: 5.9163 2025-08-30 07:57:19 - pico-train - INFO - ├── Learning Rate: 2.38e-05 2025-08-30 07:57:19 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 07:57:32 - pico-train - INFO - Step 55400 -- 🔄 Training Metrics 2025-08-30 07:57:32 - pico-train - INFO - ├── Loss: 5.8563 2025-08-30 07:57:32 - pico-train - INFO - ├── Learning Rate: 2.38e-05 2025-08-30 07:57:32 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 07:57:44 - pico-train - INFO - Step 55425 -- 🔄 Training Metrics 2025-08-30 07:57:44 - pico-train - INFO - ├── Loss: 5.8964 2025-08-30 07:57:44 - pico-train - INFO - ├── Learning Rate: 2.38e-05 2025-08-30 07:57:44 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 07:57:57 - pico-train - INFO - Step 55450 -- 🔄 Training Metrics 2025-08-30 07:57:57 - pico-train - INFO - ├── Loss: 5.9860 2025-08-30 07:57:57 - pico-train - INFO - ├── Learning Rate: 2.38e-05 2025-08-30 
07:57:57 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 07:58:09 - pico-train - INFO - Step 55475 -- 🔄 Training Metrics 2025-08-30 07:58:09 - pico-train - INFO - ├── Loss: 5.9124 2025-08-30 07:58:09 - pico-train - INFO - ├── Learning Rate: 2.37e-05 2025-08-30 07:58:09 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 07:58:22 - pico-train - INFO - Step 55500 -- 💾 Saving Checkpoint 2025-08-30 08:00:23 - pico-train - INFO - Step 55500 -- 📊 Evaluation Results 2025-08-30 08:00:23 - pico-train - INFO - └── paloma: 1.3871906758809066e+30 2025-08-30 08:00:33 - pico-train - INFO - Step 55500 -- 🔄 Training Metrics 2025-08-30 08:00:33 - pico-train - INFO - ├── Loss: 5.9344 2025-08-30 08:00:33 - pico-train - INFO - ├── Learning Rate: 2.37e-05 2025-08-30 08:00:33 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 08:00:33 - pico-train - INFO - Step 55500 -- 📈 Saving Learning Dynamics 2025-08-30 08:00:48 - pico-train - INFO - Step 55525 -- 🔄 Training Metrics 2025-08-30 08:00:48 - pico-train - INFO - ├── Loss: 5.8185 2025-08-30 08:00:48 - pico-train - INFO - ├── Learning Rate: 2.37e-05 2025-08-30 08:00:48 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 08:01:00 - pico-train - INFO - Step 55550 -- 🔄 Training Metrics 2025-08-30 08:01:00 - pico-train - INFO - ├── Loss: 5.8907 2025-08-30 08:01:00 - pico-train - INFO - ├── Learning Rate: 2.37e-05 2025-08-30 08:01:00 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 08:01:13 - pico-train - INFO - Step 55575 -- 🔄 Training Metrics 2025-08-30 08:01:13 - pico-train - INFO - ├── Loss: 5.8578 2025-08-30 08:01:13 - pico-train - INFO - ├── Learning Rate: 2.37e-05 2025-08-30 08:01:13 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 08:01:25 - pico-train - INFO - Step 55600 -- 🔄 Training Metrics 2025-08-30 08:01:25 - pico-train - INFO - ├── Loss: 5.8699 2025-08-30 08:01:25 - pico-train - INFO - ├── Learning Rate: 2.36e-05 2025-08-30 08:01:25 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 08:01:38 - pico-train 
- INFO - Step 55625 -- 🔄 Training Metrics 2025-08-30 08:01:38 - pico-train - INFO - ├── Loss: 5.9257 2025-08-30 08:01:38 - pico-train - INFO - ├── Learning Rate: 2.36e-05 2025-08-30 08:01:38 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 08:01:50 - pico-train - INFO - Step 55650 -- 🔄 Training Metrics 2025-08-30 08:01:50 - pico-train - INFO - ├── Loss: 5.9346 2025-08-30 08:01:50 - pico-train - INFO - ├── Learning Rate: 2.36e-05 2025-08-30 08:01:50 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 08:02:03 - pico-train - INFO - Step 55675 -- 🔄 Training Metrics 2025-08-30 08:02:03 - pico-train - INFO - ├── Loss: 5.9879 2025-08-30 08:02:03 - pico-train - INFO - ├── Learning Rate: 2.36e-05 2025-08-30 08:02:03 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 08:02:15 - pico-train - INFO - Step 55700 -- 🔄 Training Metrics 2025-08-30 08:02:15 - pico-train - INFO - ├── Loss: 5.9003 2025-08-30 08:02:15 - pico-train - INFO - ├── Learning Rate: 2.35e-05 2025-08-30 08:02:15 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 08:02:28 - pico-train - INFO - Step 55725 -- 🔄 Training Metrics 2025-08-30 08:02:28 - pico-train - INFO - ├── Loss: 5.9490 2025-08-30 08:02:28 - pico-train - INFO - ├── Learning Rate: 2.35e-05 2025-08-30 08:02:28 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 08:02:40 - pico-train - INFO - Step 55750 -- 🔄 Training Metrics 2025-08-30 08:02:40 - pico-train - INFO - ├── Loss: 5.8409 2025-08-30 08:02:40 - pico-train - INFO - ├── Learning Rate: 2.35e-05 2025-08-30 08:02:40 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 08:02:53 - pico-train - INFO - Step 55775 -- 🔄 Training Metrics 2025-08-30 08:02:53 - pico-train - INFO - ├── Loss: 5.9248 2025-08-30 08:02:53 - pico-train - INFO - ├── Learning Rate: 2.35e-05 2025-08-30 08:02:53 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 08:03:06 - pico-train - INFO - Step 55800 -- 🔄 Training Metrics 2025-08-30 08:03:06 - pico-train - INFO - ├── Loss: 5.8427 2025-08-30 08:03:06 - 
pico-train - INFO - ├── Learning Rate: 2.35e-05 2025-08-30 08:03:06 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 08:03:18 - pico-train - INFO - Step 55825 -- 🔄 Training Metrics 2025-08-30 08:03:18 - pico-train - INFO - ├── Loss: 5.9812 2025-08-30 08:03:18 - pico-train - INFO - ├── Learning Rate: 2.34e-05 2025-08-30 08:03:18 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 08:03:31 - pico-train - INFO - Step 55850 -- 🔄 Training Metrics 2025-08-30 08:03:31 - pico-train - INFO - ├── Loss: 5.8846 2025-08-30 08:03:31 - pico-train - INFO - ├── Learning Rate: 2.34e-05 2025-08-30 08:03:31 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 08:03:44 - pico-train - INFO - Step 55875 -- 🔄 Training Metrics 2025-08-30 08:03:44 - pico-train - INFO - ├── Loss: 5.8634 2025-08-30 08:03:44 - pico-train - INFO - ├── Learning Rate: 2.34e-05 2025-08-30 08:03:44 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 08:03:56 - pico-train - INFO - Step 55900 -- 🔄 Training Metrics 2025-08-30 08:03:56 - pico-train - INFO - ├── Loss: 5.8900 2025-08-30 08:03:56 - pico-train - INFO - ├── Learning Rate: 2.34e-05 2025-08-30 08:03:56 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 08:04:09 - pico-train - INFO - Step 55925 -- 🔄 Training Metrics 2025-08-30 08:04:09 - pico-train - INFO - ├── Loss: 5.8378 2025-08-30 08:04:09 - pico-train - INFO - ├── Learning Rate: 2.34e-05 2025-08-30 08:04:09 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 08:04:21 - pico-train - INFO - Step 55950 -- 🔄 Training Metrics 2025-08-30 08:04:21 - pico-train - INFO - ├── Loss: 5.8298 2025-08-30 08:04:21 - pico-train - INFO - ├── Learning Rate: 2.33e-05 2025-08-30 08:04:21 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 08:04:34 - pico-train - INFO - Step 55975 -- 🔄 Training Metrics 2025-08-30 08:04:34 - pico-train - INFO - ├── Loss: 6.0091 2025-08-30 08:04:34 - pico-train - INFO - ├── Learning Rate: 2.33e-05 2025-08-30 08:04:34 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 08:04:46 
- pico-train - INFO - Step 56000 -- 💾 Saving Checkpoint 2025-08-30 08:06:44 - pico-train - INFO - Step 56000 -- 📊 Evaluation Results 2025-08-30 08:06:44 - pico-train - INFO - └── paloma: 1.5920277240703402e+30 2025-08-30 08:06:45 - pico-train - INFO - Step 56000 -- 🔄 Training Metrics 2025-08-30 08:06:45 - pico-train - INFO - ├── Loss: 5.9868 2025-08-30 08:06:45 - pico-train - INFO - ├── Learning Rate: 2.33e-05 2025-08-30 08:06:45 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 08:06:45 - pico-train - INFO - Step 56000 -- 📈 Saving Learning Dynamics 2025-08-30 08:07:00 - pico-train - INFO - Step 56025 -- 🔄 Training Metrics 2025-08-30 08:07:00 - pico-train - INFO - ├── Loss: 5.9389 2025-08-30 08:07:00 - pico-train - INFO - ├── Learning Rate: 2.33e-05 2025-08-30 08:07:00 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 08:07:13 - pico-train - INFO - Step 56050 -- 🔄 Training Metrics 2025-08-30 08:07:13 - pico-train - INFO - ├── Loss: 5.8835 2025-08-30 08:07:13 - pico-train - INFO - ├── Learning Rate: 2.33e-05 2025-08-30 08:07:13 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 08:07:25 - pico-train - INFO - Step 56075 -- 🔄 Training Metrics 2025-08-30 08:07:25 - pico-train - INFO - ├── Loss: 5.8286 2025-08-30 08:07:25 - pico-train - INFO - ├── Learning Rate: 2.32e-05 2025-08-30 08:07:25 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 08:07:38 - pico-train - INFO - Step 56100 -- 🔄 Training Metrics 2025-08-30 08:07:38 - pico-train - INFO - ├── Loss: 5.8313 2025-08-30 08:07:38 - pico-train - INFO - ├── Learning Rate: 2.32e-05 2025-08-30 08:07:38 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 08:07:51 - pico-train - INFO - Step 56125 -- 🔄 Training Metrics 2025-08-30 08:07:51 - pico-train - INFO - ├── Loss: 5.8921 2025-08-30 08:07:51 - pico-train - INFO - ├── Learning Rate: 2.32e-05 2025-08-30 08:07:51 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 08:08:03 - pico-train - INFO - Step 56150 -- 🔄 Training Metrics 2025-08-30 08:08:03 - 
pico-train - INFO - ├── Loss: 5.8274 2025-08-30 08:08:03 - pico-train - INFO - ├── Learning Rate: 2.32e-05 2025-08-30 08:08:03 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 08:08:16 - pico-train - INFO - Step 56175 -- 🔄 Training Metrics 2025-08-30 08:08:16 - pico-train - INFO - ├── Loss: 5.9244 2025-08-30 08:08:16 - pico-train - INFO - ├── Learning Rate: 2.31e-05 2025-08-30 08:08:16 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 08:08:29 - pico-train - INFO - Step 56200 -- 🔄 Training Metrics 2025-08-30 08:08:29 - pico-train - INFO - ├── Loss: 6.0019 2025-08-30 08:08:29 - pico-train - INFO - ├── Learning Rate: 2.31e-05 2025-08-30 08:08:29 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 08:08:41 - pico-train - INFO - Step 56225 -- 🔄 Training Metrics 2025-08-30 08:08:41 - pico-train - INFO - ├── Loss: 5.8920 2025-08-30 08:08:41 - pico-train - INFO - ├── Learning Rate: 2.31e-05 2025-08-30 08:08:41 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 08:08:54 - pico-train - INFO - Step 56250 -- 🔄 Training Metrics 2025-08-30 08:08:54 - pico-train - INFO - ├── Loss: 5.8811 2025-08-30 08:08:54 - pico-train - INFO - ├── Learning Rate: 2.31e-05 2025-08-30 08:08:54 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 08:09:07 - pico-train - INFO - Step 56275 -- 🔄 Training Metrics 2025-08-30 08:09:07 - pico-train - INFO - ├── Loss: 5.9166 2025-08-30 08:09:07 - pico-train - INFO - ├── Learning Rate: 2.31e-05 2025-08-30 08:09:07 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 08:09:19 - pico-train - INFO - Step 56300 -- 🔄 Training Metrics 2025-08-30 08:09:19 - pico-train - INFO - ├── Loss: 5.8974 2025-08-30 08:09:19 - pico-train - INFO - ├── Learning Rate: 2.30e-05 2025-08-30 08:09:19 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 08:09:32 - pico-train - INFO - Step 56325 -- 🔄 Training Metrics 2025-08-30 08:09:32 - pico-train - INFO - ├── Loss: 5.8989 2025-08-30 08:09:32 - pico-train - INFO - ├── Learning Rate: 2.30e-05 2025-08-30 08:09:32 - 
pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 08:09:45 - pico-train - INFO - Step 56350 -- 🔄 Training Metrics 2025-08-30 08:09:45 - pico-train - INFO - ├── Loss: 5.8976 2025-08-30 08:09:45 - pico-train - INFO - ├── Learning Rate: 2.30e-05 2025-08-30 08:09:45 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 08:09:57 - pico-train - INFO - Step 56375 -- 🔄 Training Metrics 2025-08-30 08:09:57 - pico-train - INFO - ├── Loss: 5.9189 2025-08-30 08:09:57 - pico-train - INFO - ├── Learning Rate: 2.30e-05 2025-08-30 08:09:57 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 08:10:10 - pico-train - INFO - Step 56400 -- 🔄 Training Metrics 2025-08-30 08:10:10 - pico-train - INFO - ├── Loss: 5.8489 2025-08-30 08:10:10 - pico-train - INFO - ├── Learning Rate: 2.30e-05 2025-08-30 08:10:10 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 08:10:23 - pico-train - INFO - Step 56425 -- 🔄 Training Metrics 2025-08-30 08:10:23 - pico-train - INFO - ├── Loss: 5.9099 2025-08-30 08:10:23 - pico-train - INFO - ├── Learning Rate: 2.29e-05 2025-08-30 08:10:23 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 08:10:35 - pico-train - INFO - Step 56450 -- 🔄 Training Metrics 2025-08-30 08:10:35 - pico-train - INFO - ├── Loss: 5.8612 2025-08-30 08:10:35 - pico-train - INFO - ├── Learning Rate: 2.29e-05 2025-08-30 08:10:35 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 08:10:48 - pico-train - INFO - Step 56475 -- 🔄 Training Metrics 2025-08-30 08:10:48 - pico-train - INFO - ├── Loss: 5.8795 2025-08-30 08:10:48 - pico-train - INFO - ├── Learning Rate: 2.29e-05 2025-08-30 08:10:48 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 08:11:00 - pico-train - INFO - Step 56500 -- 💾 Saving Checkpoint 2025-08-30 08:12:53 - pico-train - INFO - Step 56500 -- 📊 Evaluation Results 2025-08-30 08:12:53 - pico-train - INFO - └── paloma: 1.7892090663438402e+30 2025-08-30 08:12:55 - pico-train - INFO - Step 56500 -- 🔄 Training Metrics 2025-08-30 08:12:55 - pico-train - INFO - ├── Loss: 
5.8945 2025-08-30 08:12:55 - pico-train - INFO - ├── Learning Rate: 2.29e-05 2025-08-30 08:12:55 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 08:12:55 - pico-train - INFO - Step 56500 -- 📈 Saving Learning Dynamics 2025-08-30 08:13:10 - pico-train - INFO - Step 56525 -- 🔄 Training Metrics 2025-08-30 08:13:10 - pico-train - INFO - ├── Loss: 5.8448 2025-08-30 08:13:10 - pico-train - INFO - ├── Learning Rate: 2.28e-05 2025-08-30 08:13:10 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 08:13:23 - pico-train - INFO - Step 56550 -- 🔄 Training Metrics 2025-08-30 08:13:23 - pico-train - INFO - ├── Loss: 5.8696 2025-08-30 08:13:23 - pico-train - INFO - ├── Learning Rate: 2.28e-05 2025-08-30 08:13:23 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 08:13:35 - pico-train - INFO - Step 56575 -- 🔄 Training Metrics 2025-08-30 08:13:35 - pico-train - INFO - ├── Loss: 5.8567 2025-08-30 08:13:35 - pico-train - INFO - ├── Learning Rate: 2.28e-05 2025-08-30 08:13:35 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 08:13:48 - pico-train - INFO - Step 56600 -- 🔄 Training Metrics 2025-08-30 08:13:48 - pico-train - INFO - ├── Loss: 5.8884 2025-08-30 08:13:48 - pico-train - INFO - ├── Learning Rate: 2.28e-05 2025-08-30 08:13:48 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 08:14:01 - pico-train - INFO - Step 56625 -- 🔄 Training Metrics 2025-08-30 08:14:01 - pico-train - INFO - ├── Loss: 5.9933 2025-08-30 08:14:01 - pico-train - INFO - ├── Learning Rate: 2.28e-05 2025-08-30 08:14:01 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 08:14:13 - pico-train - INFO - Step 56650 -- 🔄 Training Metrics 2025-08-30 08:14:13 - pico-train - INFO - ├── Loss: 5.8716 2025-08-30 08:14:13 - pico-train - INFO - ├── Learning Rate: 2.27e-05 2025-08-30 08:14:13 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 08:14:26 - pico-train - INFO - Step 56675 -- 🔄 Training Metrics 2025-08-30 08:14:26 - pico-train - INFO - ├── Loss: 5.9089 2025-08-30 08:14:26 - pico-train - INFO 
- ├── Learning Rate: 2.27e-05 2025-08-30 08:14:26 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 08:14:39 - pico-train - INFO - Step 56700 -- 🔄 Training Metrics 2025-08-30 08:14:39 - pico-train - INFO - ├── Loss: 5.8769 2025-08-30 08:14:39 - pico-train - INFO - ├── Learning Rate: 2.27e-05 2025-08-30 08:14:39 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 08:14:51 - pico-train - INFO - Step 56725 -- 🔄 Training Metrics 2025-08-30 08:14:51 - pico-train - INFO - ├── Loss: 5.8934 2025-08-30 08:14:51 - pico-train - INFO - ├── Learning Rate: 2.27e-05 2025-08-30 08:14:51 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 08:15:04 - pico-train - INFO - Step 56750 -- 🔄 Training Metrics 2025-08-30 08:15:04 - pico-train - INFO - ├── Loss: 5.8711 2025-08-30 08:15:04 - pico-train - INFO - ├── Learning Rate: 2.27e-05 2025-08-30 08:15:04 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 08:15:16 - pico-train - INFO - Step 56775 -- 🔄 Training Metrics 2025-08-30 08:15:16 - pico-train - INFO - ├── Loss: 5.8866 2025-08-30 08:15:16 - pico-train - INFO - ├── Learning Rate: 2.26e-05 2025-08-30 08:15:16 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 08:15:29 - pico-train - INFO - Step 56800 -- 🔄 Training Metrics 2025-08-30 08:15:29 - pico-train - INFO - ├── Loss: 5.9154 2025-08-30 08:15:29 - pico-train - INFO - ├── Learning Rate: 2.26e-05 2025-08-30 08:15:29 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 08:15:42 - pico-train - INFO - Step 56825 -- 🔄 Training Metrics 2025-08-30 08:15:42 - pico-train - INFO - ├── Loss: 5.8844 2025-08-30 08:15:42 - pico-train - INFO - ├── Learning Rate: 2.26e-05 2025-08-30 08:15:42 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 08:15:54 - pico-train - INFO - Step 56850 -- 🔄 Training Metrics 2025-08-30 08:15:54 - pico-train - INFO - ├── Loss: 5.9142 2025-08-30 08:15:54 - pico-train - INFO - ├── Learning Rate: 2.26e-05 2025-08-30 08:15:54 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 08:16:07 - pico-train - 
INFO - Step 56875 -- 🔄 Training Metrics 2025-08-30 08:16:07 - pico-train - INFO - ├── Loss: 5.8741 2025-08-30 08:16:07 - pico-train - INFO - ├── Learning Rate: 2.25e-05 2025-08-30 08:16:07 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 08:16:20 - pico-train - INFO - Step 56900 -- 🔄 Training Metrics 2025-08-30 08:16:20 - pico-train - INFO - ├── Loss: 5.9399 2025-08-30 08:16:20 - pico-train - INFO - ├── Learning Rate: 2.25e-05 2025-08-30 08:16:20 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 08:16:32 - pico-train - INFO - Step 56925 -- 🔄 Training Metrics 2025-08-30 08:16:32 - pico-train - INFO - ├── Loss: 5.8245 2025-08-30 08:16:32 - pico-train - INFO - ├── Learning Rate: 2.25e-05 2025-08-30 08:16:32 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 08:16:45 - pico-train - INFO - Step 56950 -- 🔄 Training Metrics 2025-08-30 08:16:45 - pico-train - INFO - ├── Loss: 5.9157 2025-08-30 08:16:45 - pico-train - INFO - ├── Learning Rate: 2.25e-05 2025-08-30 08:16:45 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 08:16:58 - pico-train - INFO - Step 56975 -- 🔄 Training Metrics 2025-08-30 08:16:58 - pico-train - INFO - ├── Loss: 5.8869 2025-08-30 08:16:58 - pico-train - INFO - ├── Learning Rate: 2.25e-05 2025-08-30 08:16:58 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 08:17:10 - pico-train - INFO - Step 57000 -- 💾 Saving Checkpoint 2025-08-30 08:19:09 - pico-train - INFO - Step 57000 -- 📊 Evaluation Results 2025-08-30 08:19:09 - pico-train - INFO - └── paloma: 2.2911292914273982e+30 2025-08-30 08:19:12 - pico-train - INFO - Step 57000 -- 🔄 Training Metrics 2025-08-30 08:19:12 - pico-train - INFO - ├── Loss: 5.8465 2025-08-30 08:19:12 - pico-train - INFO - ├── Learning Rate: 2.24e-05 2025-08-30 08:19:12 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 08:19:12 - pico-train - INFO - Step 57000 -- 📈 Saving Learning Dynamics 2025-08-30 08:19:27 - pico-train - INFO - Step 57025 -- 🔄 Training Metrics 2025-08-30 08:19:27 - pico-train - INFO - 
├── Loss: 5.8565 2025-08-30 08:19:27 - pico-train - INFO - ├── Learning Rate: 2.24e-05 2025-08-30 08:19:27 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 08:19:39 - pico-train - INFO - Step 57050 -- 🔄 Training Metrics 2025-08-30 08:19:39 - pico-train - INFO - ├── Loss: 5.8628 2025-08-30 08:19:39 - pico-train - INFO - ├── Learning Rate: 2.24e-05 2025-08-30 08:19:39 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 08:19:52 - pico-train - INFO - Step 57075 -- 🔄 Training Metrics 2025-08-30 08:19:52 - pico-train - INFO - ├── Loss: 5.9016 2025-08-30 08:19:52 - pico-train - INFO - ├── Learning Rate: 2.24e-05 2025-08-30 08:19:52 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 08:20:04 - pico-train - INFO - Step 57100 -- 🔄 Training Metrics 2025-08-30 08:20:04 - pico-train - INFO - ├── Loss: 5.9662 2025-08-30 08:20:04 - pico-train - INFO - ├── Learning Rate: 2.24e-05 2025-08-30 08:20:04 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 08:20:17 - pico-train - INFO - Step 57125 -- 🔄 Training Metrics 2025-08-30 08:20:17 - pico-train - INFO - ├── Loss: 5.8192 2025-08-30 08:20:17 - pico-train - INFO - ├── Learning Rate: 2.23e-05 2025-08-30 08:20:17 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 08:20:30 - pico-train - INFO - Step 57150 -- 🔄 Training Metrics 2025-08-30 08:20:30 - pico-train - INFO - ├── Loss: 5.9000 2025-08-30 08:20:30 - pico-train - INFO - ├── Learning Rate: 2.23e-05 2025-08-30 08:20:30 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 08:20:42 - pico-train - INFO - Step 57175 -- 🔄 Training Metrics 2025-08-30 08:20:42 - pico-train - INFO - ├── Loss: 5.7458 2025-08-30 08:20:42 - pico-train - INFO - ├── Learning Rate: 2.23e-05 2025-08-30 08:20:42 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 08:20:55 - pico-train - INFO - Step 57200 -- 🔄 Training Metrics 2025-08-30 08:20:55 - pico-train - INFO - ├── Loss: 5.8635 2025-08-30 08:20:55 - pico-train - INFO - ├── Learning Rate: 2.23e-05 2025-08-30 08:20:55 - pico-train - INFO - 
└── Inf/NaN count: 0 2025-08-30 08:21:08 - pico-train - INFO - Step 57225 -- 🔄 Training Metrics 2025-08-30 08:21:08 - pico-train - INFO - ├── Loss: 5.9097 2025-08-30 08:21:08 - pico-train - INFO - ├── Learning Rate: 2.23e-05 2025-08-30 08:21:08 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 08:21:20 - pico-train - INFO - Step 57250 -- 🔄 Training Metrics 2025-08-30 08:21:20 - pico-train - INFO - ├── Loss: 5.9121 2025-08-30 08:21:20 - pico-train - INFO - ├── Learning Rate: 2.22e-05 2025-08-30 08:21:20 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 08:21:33 - pico-train - INFO - Step 57275 -- 🔄 Training Metrics 2025-08-30 08:21:33 - pico-train - INFO - ├── Loss: 5.8948 2025-08-30 08:21:33 - pico-train - INFO - ├── Learning Rate: 2.22e-05 2025-08-30 08:21:33 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 08:21:46 - pico-train - INFO - Step 57300 -- 🔄 Training Metrics 2025-08-30 08:21:46 - pico-train - INFO - ├── Loss: 5.8280 2025-08-30 08:21:46 - pico-train - INFO - ├── Learning Rate: 2.22e-05 2025-08-30 08:21:46 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 08:21:58 - pico-train - INFO - Step 57325 -- 🔄 Training Metrics 2025-08-30 08:21:58 - pico-train - INFO - ├── Loss: 5.8445 2025-08-30 08:21:58 - pico-train - INFO - ├── Learning Rate: 2.22e-05 2025-08-30 08:21:58 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 08:22:11 - pico-train - INFO - Step 57350 -- 🔄 Training Metrics 2025-08-30 08:22:11 - pico-train - INFO - ├── Loss: 5.9213 2025-08-30 08:22:11 - pico-train - INFO - ├── Learning Rate: 2.21e-05 2025-08-30 08:22:11 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 08:22:24 - pico-train - INFO - Step 57375 -- 🔄 Training Metrics 2025-08-30 08:22:24 - pico-train - INFO - ├── Loss: 5.9795 2025-08-30 08:22:24 - pico-train - INFO - ├── Learning Rate: 2.21e-05 2025-08-30 08:22:24 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 08:22:36 - pico-train - INFO - Step 57400 -- 🔄 Training Metrics 2025-08-30 08:22:36 - pico-train - 
INFO - ├── Loss: 5.9827 2025-08-30 08:22:36 - pico-train - INFO - ├── Learning Rate: 2.21e-05 2025-08-30 08:22:36 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 08:22:49 - pico-train - INFO - Step 57425 -- 🔄 Training Metrics 2025-08-30 08:22:49 - pico-train - INFO - ├── Loss: 5.9802 2025-08-30 08:22:49 - pico-train - INFO - ├── Learning Rate: 2.21e-05 2025-08-30 08:22:49 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 08:23:01 - pico-train - INFO - Step 57450 -- 🔄 Training Metrics 2025-08-30 08:23:01 - pico-train - INFO - ├── Loss: 5.8669 2025-08-30 08:23:01 - pico-train - INFO - ├── Learning Rate: 2.21e-05 2025-08-30 08:23:01 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 08:23:14 - pico-train - INFO - Step 57475 -- 🔄 Training Metrics 2025-08-30 08:23:14 - pico-train - INFO - ├── Loss: 5.8762 2025-08-30 08:23:14 - pico-train - INFO - ├── Learning Rate: 2.20e-05 2025-08-30 08:23:14 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 08:23:26 - pico-train - INFO - Step 57500 -- 💾 Saving Checkpoint 2025-08-30 08:25:33 - pico-train - INFO - Step 57500 -- 📊 Evaluation Results 2025-08-30 08:25:33 - pico-train - INFO - └── paloma: 2.2146898668006388e+30 2025-08-30 08:25:35 - pico-train - INFO - Step 57500 -- 🔄 Training Metrics 2025-08-30 08:25:35 - pico-train - INFO - ├── Loss: 5.8685 2025-08-30 08:25:35 - pico-train - INFO - ├── Learning Rate: 2.20e-05 2025-08-30 08:25:35 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 08:25:35 - pico-train - INFO - Step 57500 -- 📈 Saving Learning Dynamics 2025-08-30 08:25:49 - pico-train - INFO - Step 57525 -- 🔄 Training Metrics 2025-08-30 08:25:49 - pico-train - INFO - ├── Loss: 5.8952 2025-08-30 08:25:49 - pico-train - INFO - ├── Learning Rate: 2.20e-05 2025-08-30 08:25:49 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 08:26:02 - pico-train - INFO - Step 57550 -- 🔄 Training Metrics 2025-08-30 08:26:02 - pico-train - INFO - ├── Loss: 5.8838 2025-08-30 08:26:02 - pico-train - INFO - ├── Learning Rate: 
2.20e-05 2025-08-30 08:26:02 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 08:26:15 - pico-train - INFO - Step 57575 -- 🔄 Training Metrics 2025-08-30 08:26:15 - pico-train - INFO - ├── Loss: 5.8700 2025-08-30 08:26:15 - pico-train - INFO - ├── Learning Rate: 2.20e-05 2025-08-30 08:26:15 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 08:26:28 - pico-train - INFO - Step 57600 -- 🔄 Training Metrics 2025-08-30 08:26:28 - pico-train - INFO - ├── Loss: 5.8803 2025-08-30 08:26:28 - pico-train - INFO - ├── Learning Rate: 2.19e-05 2025-08-30 08:26:28 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 08:26:40 - pico-train - INFO - Step 57625 -- 🔄 Training Metrics 2025-08-30 08:26:40 - pico-train - INFO - ├── Loss: 5.9336 2025-08-30 08:26:40 - pico-train - INFO - ├── Learning Rate: 2.19e-05 2025-08-30 08:26:40 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 08:26:53 - pico-train - INFO - Step 57650 -- 🔄 Training Metrics 2025-08-30 08:26:53 - pico-train - INFO - ├── Loss: 5.8840 2025-08-30 08:26:53 - pico-train - INFO - ├── Learning Rate: 2.19e-05 2025-08-30 08:26:53 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 08:27:06 - pico-train - INFO - Step 57675 -- 🔄 Training Metrics 2025-08-30 08:27:06 - pico-train - INFO - ├── Loss: 5.9388 2025-08-30 08:27:06 - pico-train - INFO - ├── Learning Rate: 2.19e-05 2025-08-30 08:27:06 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 08:27:18 - pico-train - INFO - Step 57700 -- 🔄 Training Metrics 2025-08-30 08:27:18 - pico-train - INFO - ├── Loss: 5.9069 2025-08-30 08:27:18 - pico-train - INFO - ├── Learning Rate: 2.18e-05 2025-08-30 08:27:18 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 08:27:31 - pico-train - INFO - Step 57725 -- 🔄 Training Metrics 2025-08-30 08:27:31 - pico-train - INFO - ├── Loss: 5.9429 2025-08-30 08:27:31 - pico-train - INFO - ├── Learning Rate: 2.18e-05 2025-08-30 08:27:31 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 08:27:44 - pico-train - INFO - Step 57750 -- 🔄 
Training Metrics 2025-08-30 08:27:44 - pico-train - INFO - ├── Loss: 5.8362 2025-08-30 08:27:44 - pico-train - INFO - ├── Learning Rate: 2.18e-05 2025-08-30 08:27:44 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 08:27:56 - pico-train - INFO - Step 57775 -- 🔄 Training Metrics 2025-08-30 08:27:56 - pico-train - INFO - ├── Loss: 5.8943 2025-08-30 08:27:56 - pico-train - INFO - ├── Learning Rate: 2.18e-05 2025-08-30 08:27:56 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 08:28:09 - pico-train - INFO - Step 57800 -- 🔄 Training Metrics 2025-08-30 08:28:09 - pico-train - INFO - ├── Loss: 5.8114 2025-08-30 08:28:09 - pico-train - INFO - ├── Learning Rate: 2.18e-05 2025-08-30 08:28:09 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 08:28:22 - pico-train - INFO - Step 57825 -- 🔄 Training Metrics 2025-08-30 08:28:22 - pico-train - INFO - ├── Loss: 5.9848 2025-08-30 08:28:22 - pico-train - INFO - ├── Learning Rate: 2.17e-05 2025-08-30 08:28:22 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 08:28:34 - pico-train - INFO - Step 57850 -- 🔄 Training Metrics 2025-08-30 08:28:34 - pico-train - INFO - ├── Loss: 5.8611 2025-08-30 08:28:34 - pico-train - INFO - ├── Learning Rate: 2.17e-05 2025-08-30 08:28:34 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 08:28:47 - pico-train - INFO - Step 57875 -- 🔄 Training Metrics 2025-08-30 08:28:47 - pico-train - INFO - ├── Loss: 5.9010 2025-08-30 08:28:47 - pico-train - INFO - ├── Learning Rate: 2.17e-05 2025-08-30 08:28:47 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 08:29:00 - pico-train - INFO - Step 57900 -- 🔄 Training Metrics 2025-08-30 08:29:00 - pico-train - INFO - ├── Loss: 5.8876 2025-08-30 08:29:00 - pico-train - INFO - ├── Learning Rate: 2.17e-05 2025-08-30 08:29:00 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 08:29:12 - pico-train - INFO - Step 57925 -- 🔄 Training Metrics 2025-08-30 08:29:12 - pico-train - INFO - ├── Loss: 5.9053 2025-08-30 08:29:12 - pico-train - INFO - ├── Learning 
Rate: 2.17e-05 2025-08-30 08:29:12 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 08:29:25 - pico-train - INFO - Step 57950 -- 🔄 Training Metrics 2025-08-30 08:29:25 - pico-train - INFO - ├── Loss: 5.9021 2025-08-30 08:29:25 - pico-train - INFO - ├── Learning Rate: 2.16e-05 2025-08-30 08:29:25 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 08:29:38 - pico-train - INFO - Step 57975 -- 🔄 Training Metrics 2025-08-30 08:29:38 - pico-train - INFO - ├── Loss: 5.8546 2025-08-30 08:29:38 - pico-train - INFO - ├── Learning Rate: 2.16e-05 2025-08-30 08:29:38 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 08:29:50 - pico-train - INFO - Step 58000 -- 💾 Saving Checkpoint 2025-08-30 08:31:44 - pico-train - INFO - Step 58000 -- 📊 Evaluation Results 2025-08-30 08:31:44 - pico-train - INFO - └── paloma: 2.9327628683408786e+30 2025-08-30 08:31:47 - pico-train - INFO - Step 58000 -- 🔄 Training Metrics 2025-08-30 08:31:47 - pico-train - INFO - ├── Loss: 5.8753 2025-08-30 08:31:47 - pico-train - INFO - ├── Learning Rate: 2.16e-05 2025-08-30 08:31:47 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 08:31:47 - pico-train - INFO - Step 58000 -- 📈 Saving Learning Dynamics 2025-08-30 08:32:02 - pico-train - INFO - Step 58025 -- 🔄 Training Metrics 2025-08-30 08:32:02 - pico-train - INFO - ├── Loss: 5.8882 2025-08-30 08:32:02 - pico-train - INFO - ├── Learning Rate: 2.16e-05 2025-08-30 08:32:02 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 08:32:15 - pico-train - INFO - Step 58050 -- 🔄 Training Metrics 2025-08-30 08:32:15 - pico-train - INFO - ├── Loss: 5.8783 2025-08-30 08:32:15 - pico-train - INFO - ├── Learning Rate: 2.16e-05 2025-08-30 08:32:15 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 08:32:27 - pico-train - INFO - Step 58075 -- 🔄 Training Metrics 2025-08-30 08:32:27 - pico-train - INFO - ├── Loss: 5.8479 2025-08-30 08:32:27 - pico-train - INFO - ├── Learning Rate: 2.15e-05 2025-08-30 08:32:27 - pico-train - INFO - └── Inf/NaN count: 0 
2025-08-30 08:32:40 - pico-train - INFO - Step 58100 -- 🔄 Training Metrics 2025-08-30 08:32:40 - pico-train - INFO - ├── Loss: 5.8465 2025-08-30 08:32:40 - pico-train - INFO - ├── Learning Rate: 2.15e-05 2025-08-30 08:32:40 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 08:32:53 - pico-train - INFO - Step 58125 -- 🔄 Training Metrics 2025-08-30 08:32:53 - pico-train - INFO - ├── Loss: 5.8889 2025-08-30 08:32:53 - pico-train - INFO - ├── Learning Rate: 2.15e-05 2025-08-30 08:32:53 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 08:33:05 - pico-train - INFO - Step 58150 -- 🔄 Training Metrics 2025-08-30 08:33:05 - pico-train - INFO - ├── Loss: 5.8143 2025-08-30 08:33:05 - pico-train - INFO - ├── Learning Rate: 2.15e-05 2025-08-30 08:33:05 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 08:33:18 - pico-train - INFO - Step 58175 -- 🔄 Training Metrics 2025-08-30 08:33:18 - pico-train - INFO - ├── Loss: 5.9133 2025-08-30 08:33:18 - pico-train - INFO - ├── Learning Rate: 2.14e-05 2025-08-30 08:33:18 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 08:33:31 - pico-train - INFO - Step 58200 -- 🔄 Training Metrics 2025-08-30 08:33:31 - pico-train - INFO - ├── Loss: 5.8496 2025-08-30 08:33:31 - pico-train - INFO - ├── Learning Rate: 2.14e-05 2025-08-30 08:33:31 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 08:33:43 - pico-train - INFO - Step 58225 -- 🔄 Training Metrics 2025-08-30 08:33:43 - pico-train - INFO - ├── Loss: 5.9211 2025-08-30 08:33:43 - pico-train - INFO - ├── Learning Rate: 2.14e-05 2025-08-30 08:33:43 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 08:33:56 - pico-train - INFO - Step 58250 -- 🔄 Training Metrics 2025-08-30 08:33:56 - pico-train - INFO - ├── Loss: 5.8764 2025-08-30 08:33:56 - pico-train - INFO - ├── Learning Rate: 2.14e-05 2025-08-30 08:33:56 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 08:34:09 - pico-train - INFO - Step 58275 -- 🔄 Training Metrics 2025-08-30 08:34:09 - pico-train - INFO - ├── Loss: 
5.9342 2025-08-30 08:34:09 - pico-train - INFO - ├── Learning Rate: 2.14e-05 2025-08-30 08:34:09 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 08:34:21 - pico-train - INFO - Step 58300 -- 🔄 Training Metrics 2025-08-30 08:34:21 - pico-train - INFO - ├── Loss: 5.8601 2025-08-30 08:34:21 - pico-train - INFO - ├── Learning Rate: 2.13e-05 2025-08-30 08:34:21 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 08:34:34 - pico-train - INFO - Step 58325 -- 🔄 Training Metrics 2025-08-30 08:34:34 - pico-train - INFO - ├── Loss: 5.8394 2025-08-30 08:34:34 - pico-train - INFO - ├── Learning Rate: 2.13e-05 2025-08-30 08:34:34 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 08:34:46 - pico-train - INFO - Step 58350 -- 🔄 Training Metrics 2025-08-30 08:34:46 - pico-train - INFO - ├── Loss: 5.9285 2025-08-30 08:34:46 - pico-train - INFO - ├── Learning Rate: 2.13e-05 2025-08-30 08:34:46 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 08:34:59 - pico-train - INFO - Step 58375 -- 🔄 Training Metrics 2025-08-30 08:34:59 - pico-train - INFO - ├── Loss: 5.8421 2025-08-30 08:34:59 - pico-train - INFO - ├── Learning Rate: 2.13e-05 2025-08-30 08:34:59 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 08:35:12 - pico-train - INFO - Step 58400 -- 🔄 Training Metrics 2025-08-30 08:35:12 - pico-train - INFO - ├── Loss: 5.7891 2025-08-30 08:35:12 - pico-train - INFO - ├── Learning Rate: 2.13e-05 2025-08-30 08:35:12 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 08:35:25 - pico-train - INFO - Step 58425 -- 🔄 Training Metrics 2025-08-30 08:35:25 - pico-train - INFO - ├── Loss: 5.8921 2025-08-30 08:35:25 - pico-train - INFO - ├── Learning Rate: 2.12e-05 2025-08-30 08:35:25 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 08:35:37 - pico-train - INFO - Step 58450 -- 🔄 Training Metrics 2025-08-30 08:35:37 - pico-train - INFO - ├── Loss: 5.8410 2025-08-30 08:35:37 - pico-train - INFO - ├── Learning Rate: 2.12e-05 2025-08-30 08:35:37 - pico-train - INFO - └── Inf/NaN 
count: 0 2025-08-30 08:35:50 - pico-train - INFO - Step 58475 -- 🔄 Training Metrics 2025-08-30 08:35:50 - pico-train - INFO - ├── Loss: 5.8166 2025-08-30 08:35:50 - pico-train - INFO - ├── Learning Rate: 2.12e-05 2025-08-30 08:35:50 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 08:36:02 - pico-train - INFO - Step 58500 -- 💾 Saving Checkpoint 2025-08-30 08:37:56 - pico-train - INFO - Step 58500 -- 📊 Evaluation Results 2025-08-30 08:37:56 - pico-train - INFO - └── paloma: 2.9542125550009274e+30 2025-08-30 08:38:01 - pico-train - INFO - Step 58500 -- 🔄 Training Metrics 2025-08-30 08:38:01 - pico-train - INFO - ├── Loss: 5.8586 2025-08-30 08:38:01 - pico-train - INFO - ├── Learning Rate: 2.12e-05 2025-08-30 08:38:01 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 08:38:01 - pico-train - INFO - Step 58500 -- 📈 Saving Learning Dynamics 2025-08-30 08:38:16 - pico-train - INFO - Step 58525 -- 🔄 Training Metrics 2025-08-30 08:38:16 - pico-train - INFO - ├── Loss: 5.8248 2025-08-30 08:38:16 - pico-train - INFO - ├── Learning Rate: 2.12e-05 2025-08-30 08:38:16 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 08:38:29 - pico-train - INFO - Step 58550 -- 🔄 Training Metrics 2025-08-30 08:38:29 - pico-train - INFO - ├── Loss: 5.8162 2025-08-30 08:38:29 - pico-train - INFO - ├── Learning Rate: 2.11e-05 2025-08-30 08:38:29 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 08:38:41 - pico-train - INFO - Step 58575 -- 🔄 Training Metrics 2025-08-30 08:38:41 - pico-train - INFO - ├── Loss: 5.9361 2025-08-30 08:38:41 - pico-train - INFO - ├── Learning Rate: 2.11e-05 2025-08-30 08:38:41 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 08:38:54 - pico-train - INFO - Step 58600 -- 🔄 Training Metrics 2025-08-30 08:38:54 - pico-train - INFO - ├── Loss: 5.8945 2025-08-30 08:38:54 - pico-train - INFO - ├── Learning Rate: 2.11e-05 2025-08-30 08:38:54 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 08:39:06 - pico-train - INFO - Step 58625 -- 🔄 Training Metrics 
2025-08-30 08:39:06 - pico-train - INFO - ├── Loss: 5.7984 2025-08-30 08:39:06 - pico-train - INFO - ├── Learning Rate: 2.11e-05 2025-08-30 08:39:06 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 08:39:19 - pico-train - INFO - Step 58650 -- 🔄 Training Metrics 2025-08-30 08:39:19 - pico-train - INFO - ├── Loss: 5.8764 2025-08-30 08:39:19 - pico-train - INFO - ├── Learning Rate: 2.10e-05 2025-08-30 08:39:19 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 08:39:32 - pico-train - INFO - Step 58675 -- 🔄 Training Metrics 2025-08-30 08:39:32 - pico-train - INFO - ├── Loss: 5.9141 2025-08-30 08:39:32 - pico-train - INFO - ├── Learning Rate: 2.10e-05 2025-08-30 08:39:32 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 08:39:44 - pico-train - INFO - Step 58700 -- 🔄 Training Metrics 2025-08-30 08:39:44 - pico-train - INFO - ├── Loss: 5.9118 2025-08-30 08:39:44 - pico-train - INFO - ├── Learning Rate: 2.10e-05 2025-08-30 08:39:44 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 08:39:57 - pico-train - INFO - Step 58725 -- 🔄 Training Metrics 2025-08-30 08:39:57 - pico-train - INFO - ├── Loss: 5.8585 2025-08-30 08:39:57 - pico-train - INFO - ├── Learning Rate: 2.10e-05 2025-08-30 08:39:57 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 08:40:10 - pico-train - INFO - Step 58750 -- 🔄 Training Metrics 2025-08-30 08:40:10 - pico-train - INFO - ├── Loss: 5.8661 2025-08-30 08:40:10 - pico-train - INFO - ├── Learning Rate: 2.10e-05 2025-08-30 08:40:10 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 08:40:22 - pico-train - INFO - Step 58775 -- 🔄 Training Metrics 2025-08-30 08:40:22 - pico-train - INFO - ├── Loss: 5.8330 2025-08-30 08:40:22 - pico-train - INFO - ├── Learning Rate: 2.09e-05 2025-08-30 08:40:22 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 08:40:35 - pico-train - INFO - Step 58800 -- 🔄 Training Metrics 2025-08-30 08:40:35 - pico-train - INFO - ├── Loss: 5.8415 2025-08-30 08:40:35 - pico-train - INFO - ├── Learning Rate: 2.09e-05 
2025-08-30 08:40:35 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 08:40:47 - pico-train - INFO - Step 58825 -- 🔄 Training Metrics 2025-08-30 08:40:47 - pico-train - INFO - ├── Loss: 5.9273 2025-08-30 08:40:47 - pico-train - INFO - ├── Learning Rate: 2.09e-05 2025-08-30 08:40:47 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 08:41:00 - pico-train - INFO - Step 58850 -- 🔄 Training Metrics 2025-08-30 08:41:00 - pico-train - INFO - ├── Loss: 5.8663 2025-08-30 08:41:00 - pico-train - INFO - ├── Learning Rate: 2.09e-05 2025-08-30 08:41:00 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 08:41:14 - pico-train - INFO - Step 58875 -- 🔄 Training Metrics 2025-08-30 08:41:14 - pico-train - INFO - ├── Loss: 5.8209 2025-08-30 08:41:14 - pico-train - INFO - ├── Learning Rate: 2.09e-05 2025-08-30 08:41:14 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 08:41:26 - pico-train - INFO - Step 58900 -- 🔄 Training Metrics 2025-08-30 08:41:26 - pico-train - INFO - ├── Loss: 5.9101 2025-08-30 08:41:26 - pico-train - INFO - ├── Learning Rate: 2.08e-05 2025-08-30 08:41:26 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 08:41:39 - pico-train - INFO - Step 58925 -- 🔄 Training Metrics 2025-08-30 08:41:39 - pico-train - INFO - ├── Loss: 5.9064 2025-08-30 08:41:39 - pico-train - INFO - ├── Learning Rate: 2.08e-05 2025-08-30 08:41:39 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 08:41:51 - pico-train - INFO - Step 58950 -- 🔄 Training Metrics 2025-08-30 08:41:51 - pico-train - INFO - ├── Loss: 5.8527 2025-08-30 08:41:51 - pico-train - INFO - ├── Learning Rate: 2.08e-05 2025-08-30 08:41:51 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 08:42:04 - pico-train - INFO - Step 58975 -- 🔄 Training Metrics 2025-08-30 08:42:04 - pico-train - INFO - ├── Loss: 5.8115 2025-08-30 08:42:04 - pico-train - INFO - ├── Learning Rate: 2.08e-05 2025-08-30 08:42:04 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 08:42:16 - pico-train - INFO - Step 59000 -- 💾 Saving 
Checkpoint 2025-08-30 08:44:14 - pico-train - INFO - Step 59000 -- 📊 Evaluation Results 2025-08-30 08:44:14 - pico-train - INFO - └── paloma: 3.916054030122377e+30 2025-08-30 08:44:17 - pico-train - INFO - Step 59000 -- 🔄 Training Metrics 2025-08-30 08:44:17 - pico-train - INFO - ├── Loss: 5.8043 2025-08-30 08:44:17 - pico-train - INFO - ├── Learning Rate: 2.08e-05 2025-08-30 08:44:17 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 08:44:17 - pico-train - INFO - Step 59000 -- 📈 Saving Learning Dynamics 2025-08-30 08:44:33 - pico-train - INFO - Step 59025 -- 🔄 Training Metrics 2025-08-30 08:44:33 - pico-train - INFO - ├── Loss: 5.7710 2025-08-30 08:44:33 - pico-train - INFO - ├── Learning Rate: 2.07e-05 2025-08-30 08:44:33 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 08:44:45 - pico-train - INFO - Step 59050 -- 🔄 Training Metrics 2025-08-30 08:44:45 - pico-train - INFO - ├── Loss: 5.8913 2025-08-30 08:44:45 - pico-train - INFO - ├── Learning Rate: 2.07e-05 2025-08-30 08:44:45 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 08:44:59 - pico-train - INFO - Step 59075 -- 🔄 Training Metrics 2025-08-30 08:44:59 - pico-train - INFO - ├── Loss: 5.8823 2025-08-30 08:44:59 - pico-train - INFO - ├── Learning Rate: 2.07e-05 2025-08-30 08:44:59 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 08:45:11 - pico-train - INFO - Step 59100 -- 🔄 Training Metrics 2025-08-30 08:45:11 - pico-train - INFO - ├── Loss: 5.8189 2025-08-30 08:45:11 - pico-train - INFO - ├── Learning Rate: 2.07e-05 2025-08-30 08:45:11 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 08:45:24 - pico-train - INFO - Step 59125 -- 🔄 Training Metrics 2025-08-30 08:45:24 - pico-train - INFO - ├── Loss: 5.7997 2025-08-30 08:45:24 - pico-train - INFO - ├── Learning Rate: 2.06e-05 2025-08-30 08:45:24 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 08:45:37 - pico-train - INFO - Step 59150 -- 🔄 Training Metrics 2025-08-30 08:45:37 - pico-train - INFO - ├── Loss: 5.8950 2025-08-30 
08:45:37 - pico-train - INFO - ├── Learning Rate: 2.06e-05 2025-08-30 08:45:37 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 08:45:50 - pico-train - INFO - Step 59175 -- 🔄 Training Metrics 2025-08-30 08:45:50 - pico-train - INFO - ├── Loss: 5.9084 2025-08-30 08:45:50 - pico-train - INFO - ├── Learning Rate: 2.06e-05 2025-08-30 08:45:50 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 08:46:03 - pico-train - INFO - Step 59200 -- 🔄 Training Metrics 2025-08-30 08:46:03 - pico-train - INFO - ├── Loss: 5.8141 2025-08-30 08:46:03 - pico-train - INFO - ├── Learning Rate: 2.06e-05 2025-08-30 08:46:03 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 08:46:16 - pico-train - INFO - Step 59225 -- 🔄 Training Metrics 2025-08-30 08:46:16 - pico-train - INFO - ├── Loss: 5.8814 2025-08-30 08:46:16 - pico-train - INFO - ├── Learning Rate: 2.06e-05 2025-08-30 08:46:16 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 08:46:28 - pico-train - INFO - Step 59250 -- 🔄 Training Metrics 2025-08-30 08:46:28 - pico-train - INFO - ├── Loss: 5.8316 2025-08-30 08:46:28 - pico-train - INFO - ├── Learning Rate: 2.05e-05 2025-08-30 08:46:28 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 08:46:41 - pico-train - INFO - Step 59275 -- 🔄 Training Metrics 2025-08-30 08:46:41 - pico-train - INFO - ├── Loss: 5.8489 2025-08-30 08:46:41 - pico-train - INFO - ├── Learning Rate: 2.05e-05 2025-08-30 08:46:41 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 08:46:54 - pico-train - INFO - Step 59300 -- 🔄 Training Metrics 2025-08-30 08:46:54 - pico-train - INFO - ├── Loss: 5.7998 2025-08-30 08:46:54 - pico-train - INFO - ├── Learning Rate: 2.05e-05 2025-08-30 08:46:54 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 08:47:06 - pico-train - INFO - Step 59325 -- 🔄 Training Metrics 2025-08-30 08:47:06 - pico-train - INFO - ├── Loss: 5.8848 2025-08-30 08:47:06 - pico-train - INFO - ├── Learning Rate: 2.05e-05 2025-08-30 08:47:06 - pico-train - INFO - └── Inf/NaN count: 0 
2025-08-30 08:47:19 - pico-train - INFO - Step 59350 -- 🔄 Training Metrics 2025-08-30 08:47:19 - pico-train - INFO - ├── Loss: 5.8543 2025-08-30 08:47:19 - pico-train - INFO - ├── Learning Rate: 2.05e-05 2025-08-30 08:47:19 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 08:47:32 - pico-train - INFO - Step 59375 -- 🔄 Training Metrics 2025-08-30 08:47:32 - pico-train - INFO - ├── Loss: 5.8655 2025-08-30 08:47:32 - pico-train - INFO - ├── Learning Rate: 2.04e-05 2025-08-30 08:47:32 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 08:47:44 - pico-train - INFO - Step 59400 -- 🔄 Training Metrics 2025-08-30 08:47:44 - pico-train - INFO - ├── Loss: 5.8870 2025-08-30 08:47:44 - pico-train - INFO - ├── Learning Rate: 2.04e-05 2025-08-30 08:47:44 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 08:47:57 - pico-train - INFO - Step 59425 -- 🔄 Training Metrics 2025-08-30 08:47:57 - pico-train - INFO - ├── Loss: 5.8000 2025-08-30 08:47:57 - pico-train - INFO - ├── Learning Rate: 2.04e-05 2025-08-30 08:47:57 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 08:48:10 - pico-train - INFO - Step 59450 -- 🔄 Training Metrics 2025-08-30 08:48:10 - pico-train - INFO - ├── Loss: 5.8162 2025-08-30 08:48:10 - pico-train - INFO - ├── Learning Rate: 2.04e-05 2025-08-30 08:48:10 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 08:48:22 - pico-train - INFO - Step 59475 -- 🔄 Training Metrics 2025-08-30 08:48:22 - pico-train - INFO - ├── Loss: 5.8936 2025-08-30 08:48:22 - pico-train - INFO - ├── Learning Rate: 2.04e-05 2025-08-30 08:48:22 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 08:48:35 - pico-train - INFO - Step 59500 -- 💾 Saving Checkpoint 2025-08-30 08:50:28 - pico-train - INFO - Step 59500 -- 📊 Evaluation Results 2025-08-30 08:50:28 - pico-train - INFO - └── paloma: 4.0666865028851395e+30 2025-08-30 08:50:32 - pico-train - INFO - Step 59500 -- 🔄 Training Metrics 2025-08-30 08:50:32 - pico-train - INFO - ├── Loss: 5.8731 2025-08-30 08:50:32 - pico-train 
- INFO - ├── Learning Rate: 2.03e-05 2025-08-30 08:50:32 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 08:50:32 - pico-train - INFO - Step 59500 -- 📈 Saving Learning Dynamics 2025-08-30 08:50:47 - pico-train - INFO - Step 59525 -- 🔄 Training Metrics 2025-08-30 08:50:47 - pico-train - INFO - ├── Loss: 5.9058 2025-08-30 08:50:47 - pico-train - INFO - ├── Learning Rate: 2.03e-05 2025-08-30 08:50:47 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 08:51:00 - pico-train - INFO - Step 59550 -- 🔄 Training Metrics 2025-08-30 08:51:00 - pico-train - INFO - ├── Loss: 5.8037 2025-08-30 08:51:00 - pico-train - INFO - ├── Learning Rate: 2.03e-05 2025-08-30 08:51:00 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 08:51:12 - pico-train - INFO - Step 59575 -- 🔄 Training Metrics 2025-08-30 08:51:12 - pico-train - INFO - ├── Loss: 5.8553 2025-08-30 08:51:12 - pico-train - INFO - ├── Learning Rate: 2.03e-05 2025-08-30 08:51:12 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 08:51:25 - pico-train - INFO - Step 59600 -- 🔄 Training Metrics 2025-08-30 08:51:25 - pico-train - INFO - ├── Loss: 5.8022 2025-08-30 08:51:25 - pico-train - INFO - ├── Learning Rate: 2.02e-05 2025-08-30 08:51:25 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 08:51:38 - pico-train - INFO - Step 59625 -- 🔄 Training Metrics 2025-08-30 08:51:38 - pico-train - INFO - ├── Loss: 5.8279 2025-08-30 08:51:38 - pico-train - INFO - ├── Learning Rate: 2.02e-05 2025-08-30 08:51:38 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 08:51:51 - pico-train - INFO - Step 59650 -- 🔄 Training Metrics 2025-08-30 08:51:51 - pico-train - INFO - ├── Loss: 5.7732 2025-08-30 08:51:51 - pico-train - INFO - ├── Learning Rate: 2.02e-05 2025-08-30 08:51:51 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 08:52:03 - pico-train - INFO - Step 59675 -- 🔄 Training Metrics 2025-08-30 08:52:03 - pico-train - INFO - ├── Loss: 5.8738 2025-08-30 08:52:03 - pico-train - INFO - ├── Learning Rate: 2.02e-05 2025-08-30 
08:52:03 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 08:52:16 - pico-train - INFO - Step 59700 -- 🔄 Training Metrics 2025-08-30 08:52:16 - pico-train - INFO - ├── Loss: 5.8618 2025-08-30 08:52:16 - pico-train - INFO - ├── Learning Rate: 2.02e-05 2025-08-30 08:52:16 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 08:52:29 - pico-train - INFO - Step 59725 -- 🔄 Training Metrics 2025-08-30 08:52:29 - pico-train - INFO - ├── Loss: 5.8423 2025-08-30 08:52:29 - pico-train - INFO - ├── Learning Rate: 2.01e-05 2025-08-30 08:52:29 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 08:52:41 - pico-train - INFO - Step 59750 -- 🔄 Training Metrics 2025-08-30 08:52:41 - pico-train - INFO - ├── Loss: 5.9335 2025-08-30 08:52:41 - pico-train - INFO - ├── Learning Rate: 2.01e-05 2025-08-30 08:52:41 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 08:52:54 - pico-train - INFO - Step 59775 -- 🔄 Training Metrics 2025-08-30 08:52:54 - pico-train - INFO - ├── Loss: 5.7709 2025-08-30 08:52:54 - pico-train - INFO - ├── Learning Rate: 2.01e-05 2025-08-30 08:52:54 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 08:53:07 - pico-train - INFO - Step 59800 -- 🔄 Training Metrics 2025-08-30 08:53:07 - pico-train - INFO - ├── Loss: 5.9237 2025-08-30 08:53:07 - pico-train - INFO - ├── Learning Rate: 2.01e-05 2025-08-30 08:53:07 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 08:53:19 - pico-train - INFO - Step 59825 -- 🔄 Training Metrics 2025-08-30 08:53:19 - pico-train - INFO - ├── Loss: 5.9029 2025-08-30 08:53:19 - pico-train - INFO - ├── Learning Rate: 2.01e-05 2025-08-30 08:53:19 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 08:53:32 - pico-train - INFO - Step 59850 -- 🔄 Training Metrics 2025-08-30 08:53:32 - pico-train - INFO - ├── Loss: 5.9280 2025-08-30 08:53:32 - pico-train - INFO - ├── Learning Rate: 2.00e-05 2025-08-30 08:53:32 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 08:53:45 - pico-train - INFO - Step 59875 -- 🔄 Training Metrics 
2025-08-30 08:53:45 - pico-train - INFO - ├── Loss: 5.8758 2025-08-30 08:53:45 - pico-train - INFO - ├── Learning Rate: 2.00e-05 2025-08-30 08:53:45 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 08:53:58 - pico-train - INFO - Step 59900 -- 🔄 Training Metrics 2025-08-30 08:53:58 - pico-train - INFO - ├── Loss: 5.8195 2025-08-30 08:53:58 - pico-train - INFO - ├── Learning Rate: 2.00e-05 2025-08-30 08:53:58 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 08:54:11 - pico-train - INFO - Step 59925 -- 🔄 Training Metrics 2025-08-30 08:54:11 - pico-train - INFO - ├── Loss: 5.9247 2025-08-30 08:54:11 - pico-train - INFO - ├── Learning Rate: 2.00e-05 2025-08-30 08:54:11 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 08:54:23 - pico-train - INFO - Step 59950 -- 🔄 Training Metrics 2025-08-30 08:54:23 - pico-train - INFO - ├── Loss: 5.8941 2025-08-30 08:54:23 - pico-train - INFO - ├── Learning Rate: 2.00e-05 2025-08-30 08:54:23 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 08:54:36 - pico-train - INFO - Step 59975 -- 🔄 Training Metrics 2025-08-30 08:54:36 - pico-train - INFO - ├── Loss: 5.9192 2025-08-30 08:54:36 - pico-train - INFO - ├── Learning Rate: 1.99e-05 2025-08-30 08:54:36 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 08:54:48 - pico-train - INFO - Step 60000 -- 💾 Saving Checkpoint 2025-08-30 08:56:42 - pico-train - INFO - Step 60000 -- 📊 Evaluation Results 2025-08-30 08:56:42 - pico-train - INFO - └── paloma: 5.67735563606023e+30 2025-08-30 08:56:44 - pico-train - INFO - Step 60000 -- 🔄 Training Metrics 2025-08-30 08:56:44 - pico-train - INFO - ├── Loss: 5.9175 2025-08-30 08:56:44 - pico-train - INFO - ├── Learning Rate: 1.99e-05 2025-08-30 08:56:44 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 08:56:44 - pico-train - INFO - Step 60000 -- 📈 Saving Learning Dynamics 2025-08-30 08:56:59 - pico-train - INFO - Step 60025 -- 🔄 Training Metrics 2025-08-30 08:56:59 - pico-train - INFO - ├── Loss: 5.8005 2025-08-30 08:56:59 - 
pico-train - INFO - ├── Learning Rate: 1.99e-05 2025-08-30 08:56:59 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 08:57:12 - pico-train - INFO - Step 60050 -- 🔄 Training Metrics 2025-08-30 08:57:12 - pico-train - INFO - ├── Loss: 5.8668 2025-08-30 08:57:12 - pico-train - INFO - ├── Learning Rate: 1.99e-05 2025-08-30 08:57:12 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 08:57:24 - pico-train - INFO - Step 60075 -- 🔄 Training Metrics 2025-08-30 08:57:24 - pico-train - INFO - ├── Loss: 5.9150 2025-08-30 08:57:24 - pico-train - INFO - ├── Learning Rate: 1.99e-05 2025-08-30 08:57:24 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 08:57:37 - pico-train - INFO - Step 60100 -- 🔄 Training Metrics 2025-08-30 08:57:37 - pico-train - INFO - ├── Loss: 5.8577 2025-08-30 08:57:37 - pico-train - INFO - ├── Learning Rate: 1.98e-05 2025-08-30 08:57:37 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 08:57:50 - pico-train - INFO - Step 60125 -- 🔄 Training Metrics 2025-08-30 08:57:50 - pico-train - INFO - ├── Loss: 5.9463 2025-08-30 08:57:50 - pico-train - INFO - ├── Learning Rate: 1.98e-05 2025-08-30 08:57:50 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 08:58:03 - pico-train - INFO - Step 60150 -- 🔄 Training Metrics 2025-08-30 08:58:03 - pico-train - INFO - ├── Loss: 5.9613 2025-08-30 08:58:03 - pico-train - INFO - ├── Learning Rate: 1.98e-05 2025-08-30 08:58:03 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 08:58:15 - pico-train - INFO - Step 60175 -- 🔄 Training Metrics 2025-08-30 08:58:15 - pico-train - INFO - ├── Loss: 5.7742 2025-08-30 08:58:15 - pico-train - INFO - ├── Learning Rate: 1.98e-05 2025-08-30 08:58:15 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 08:58:28 - pico-train - INFO - Step 60200 -- 🔄 Training Metrics 2025-08-30 08:58:28 - pico-train - INFO - ├── Loss: 5.9330 2025-08-30 08:58:28 - pico-train - INFO - ├── Learning Rate: 1.97e-05 2025-08-30 08:58:28 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 08:58:40 
- pico-train - INFO - Step 60225 -- 🔄 Training Metrics 2025-08-30 08:58:40 - pico-train - INFO - ├── Loss: 5.9165 2025-08-30 08:58:40 - pico-train - INFO - ├── Learning Rate: 1.97e-05 2025-08-30 08:58:40 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 08:58:53 - pico-train - INFO - Step 60250 -- 🔄 Training Metrics 2025-08-30 08:58:53 - pico-train - INFO - ├── Loss: 5.8891 2025-08-30 08:58:53 - pico-train - INFO - ├── Learning Rate: 1.97e-05 2025-08-30 08:58:53 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 08:59:06 - pico-train - INFO - Step 60275 -- 🔄 Training Metrics 2025-08-30 08:59:06 - pico-train - INFO - ├── Loss: 5.8293 2025-08-30 08:59:06 - pico-train - INFO - ├── Learning Rate: 1.97e-05 2025-08-30 08:59:06 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 08:59:18 - pico-train - INFO - Step 60300 -- 🔄 Training Metrics 2025-08-30 08:59:18 - pico-train - INFO - ├── Loss: 5.7729 2025-08-30 08:59:18 - pico-train - INFO - ├── Learning Rate: 1.97e-05 2025-08-30 08:59:18 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 08:59:31 - pico-train - INFO - Step 60325 -- 🔄 Training Metrics 2025-08-30 08:59:31 - pico-train - INFO - ├── Loss: 5.8043 2025-08-30 08:59:31 - pico-train - INFO - ├── Learning Rate: 1.96e-05 2025-08-30 08:59:31 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 08:59:43 - pico-train - INFO - Step 60350 -- 🔄 Training Metrics 2025-08-30 08:59:43 - pico-train - INFO - ├── Loss: 5.8123 2025-08-30 08:59:43 - pico-train - INFO - ├── Learning Rate: 1.96e-05 2025-08-30 08:59:43 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 08:59:57 - pico-train - INFO - Step 60375 -- 🔄 Training Metrics 2025-08-30 08:59:57 - pico-train - INFO - ├── Loss: 5.9085 2025-08-30 08:59:57 - pico-train - INFO - ├── Learning Rate: 1.96e-05 2025-08-30 08:59:57 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 09:00:09 - pico-train - INFO - Step 60400 -- 🔄 Training Metrics 2025-08-30 09:00:09 - pico-train - INFO - ├── Loss: 5.8004 2025-08-30 
09:00:09 - pico-train - INFO - ├── Learning Rate: 1.96e-05 2025-08-30 09:00:09 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 09:00:22 - pico-train - INFO - Step 60425 -- 🔄 Training Metrics 2025-08-30 09:00:22 - pico-train - INFO - ├── Loss: 5.8664 2025-08-30 09:00:22 - pico-train - INFO - ├── Learning Rate: 1.96e-05 2025-08-30 09:00:22 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 09:00:35 - pico-train - INFO - Step 60450 -- 🔄 Training Metrics 2025-08-30 09:00:35 - pico-train - INFO - ├── Loss: 5.8370 2025-08-30 09:00:35 - pico-train - INFO - ├── Learning Rate: 1.95e-05 2025-08-30 09:00:35 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 09:00:48 - pico-train - INFO - Step 60475 -- 🔄 Training Metrics 2025-08-30 09:00:48 - pico-train - INFO - ├── Loss: 5.8813 2025-08-30 09:00:48 - pico-train - INFO - ├── Learning Rate: 1.95e-05 2025-08-30 09:00:48 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 09:01:01 - pico-train - INFO - Step 60500 -- 💾 Saving Checkpoint 2025-08-30 09:03:00 - pico-train - INFO - Step 60500 -- 📊 Evaluation Results 2025-08-30 09:03:00 - pico-train - INFO - └── paloma: 6.577053610858546e+30 2025-08-30 09:03:04 - pico-train - INFO - Step 60500 -- 🔄 Training Metrics 2025-08-30 09:03:04 - pico-train - INFO - ├── Loss: 5.8644 2025-08-30 09:03:04 - pico-train - INFO - ├── Learning Rate: 1.95e-05 2025-08-30 09:03:04 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 09:03:04 - pico-train - INFO - Step 60500 -- 📈 Saving Learning Dynamics 2025-08-30 09:03:20 - pico-train - INFO - Step 60525 -- 🔄 Training Metrics 2025-08-30 09:03:20 - pico-train - INFO - ├── Loss: 5.9048 2025-08-30 09:03:20 - pico-train - INFO - ├── Learning Rate: 1.95e-05 2025-08-30 09:03:20 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 09:03:32 - pico-train - INFO - Step 60550 -- 🔄 Training Metrics 2025-08-30 09:03:32 - pico-train - INFO - ├── Loss: 5.8286 2025-08-30 09:03:32 - pico-train - INFO - ├── Learning Rate: 1.95e-05 2025-08-30 09:03:32 - 
pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 09:03:45 - pico-train - INFO - Step 60575 -- 🔄 Training Metrics 2025-08-30 09:03:45 - pico-train - INFO - ├── Loss: 5.9112 2025-08-30 09:03:45 - pico-train - INFO - ├── Learning Rate: 1.94e-05 2025-08-30 09:03:45 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 09:03:58 - pico-train - INFO - Step 60600 -- 🔄 Training Metrics 2025-08-30 09:03:58 - pico-train - INFO - ├── Loss: 5.8445 2025-08-30 09:03:58 - pico-train - INFO - ├── Learning Rate: 1.94e-05 2025-08-30 09:03:58 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 09:04:10 - pico-train - INFO - Step 60625 -- 🔄 Training Metrics 2025-08-30 09:04:10 - pico-train - INFO - ├── Loss: 5.8444 2025-08-30 09:04:10 - pico-train - INFO - ├── Learning Rate: 1.94e-05 2025-08-30 09:04:10 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 09:04:23 - pico-train - INFO - Step 60650 -- 🔄 Training Metrics 2025-08-30 09:04:23 - pico-train - INFO - ├── Loss: 5.7993 2025-08-30 09:04:23 - pico-train - INFO - ├── Learning Rate: 1.94e-05 2025-08-30 09:04:23 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 09:04:36 - pico-train - INFO - Step 60675 -- 🔄 Training Metrics 2025-08-30 09:04:36 - pico-train - INFO - ├── Loss: 5.8188 2025-08-30 09:04:36 - pico-train - INFO - ├── Learning Rate: 1.94e-05 2025-08-30 09:04:36 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 09:04:48 - pico-train - INFO - Step 60700 -- 🔄 Training Metrics 2025-08-30 09:04:48 - pico-train - INFO - ├── Loss: 5.8257 2025-08-30 09:04:48 - pico-train - INFO - ├── Learning Rate: 1.93e-05 2025-08-30 09:04:48 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 09:05:01 - pico-train - INFO - Step 60725 -- 🔄 Training Metrics 2025-08-30 09:05:01 - pico-train - INFO - ├── Loss: 5.9364 2025-08-30 09:05:01 - pico-train - INFO - ├── Learning Rate: 1.93e-05 2025-08-30 09:05:01 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 09:05:14 - pico-train - INFO - Step 60750 -- 🔄 Training Metrics 2025-08-30 
09:05:14 - pico-train - INFO - ├── Loss: 5.8968 2025-08-30 09:05:14 - pico-train - INFO - ├── Learning Rate: 1.93e-05 2025-08-30 09:05:14 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 09:05:26 - pico-train - INFO - Step 60775 -- 🔄 Training Metrics 2025-08-30 09:05:26 - pico-train - INFO - ├── Loss: 5.7561 2025-08-30 09:05:26 - pico-train - INFO - ├── Learning Rate: 1.93e-05 2025-08-30 09:05:26 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 09:05:39 - pico-train - INFO - Step 60800 -- 🔄 Training Metrics 2025-08-30 09:05:39 - pico-train - INFO - ├── Loss: 5.8257 2025-08-30 09:05:39 - pico-train - INFO - ├── Learning Rate: 1.92e-05 2025-08-30 09:05:39 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 09:05:52 - pico-train - INFO - Step 60825 -- 🔄 Training Metrics 2025-08-30 09:05:52 - pico-train - INFO - ├── Loss: 5.8018 2025-08-30 09:05:52 - pico-train - INFO - ├── Learning Rate: 1.92e-05 2025-08-30 09:05:52 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 09:06:04 - pico-train - INFO - Step 60850 -- 🔄 Training Metrics 2025-08-30 09:06:04 - pico-train - INFO - ├── Loss: 5.8325 2025-08-30 09:06:04 - pico-train - INFO - ├── Learning Rate: 1.92e-05 2025-08-30 09:06:04 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 09:06:17 - pico-train - INFO - Step 60875 -- 🔄 Training Metrics 2025-08-30 09:06:17 - pico-train - INFO - ├── Loss: 5.9502 2025-08-30 09:06:17 - pico-train - INFO - ├── Learning Rate: 1.92e-05 2025-08-30 09:06:17 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 09:06:30 - pico-train - INFO - Step 60900 -- 🔄 Training Metrics 2025-08-30 09:06:30 - pico-train - INFO - ├── Loss: 5.8632 2025-08-30 09:06:30 - pico-train - INFO - ├── Learning Rate: 1.92e-05 2025-08-30 09:06:30 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 09:06:42 - pico-train - INFO - Step 60925 -- 🔄 Training Metrics 2025-08-30 09:06:42 - pico-train - INFO - ├── Loss: 5.7790 2025-08-30 09:06:42 - pico-train - INFO - ├── Learning Rate: 1.91e-05 2025-08-30 
09:06:42 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 09:06:55 - pico-train - INFO - Step 60950 -- 🔄 Training Metrics 2025-08-30 09:06:55 - pico-train - INFO - ├── Loss: 5.8264 2025-08-30 09:06:55 - pico-train - INFO - ├── Learning Rate: 1.91e-05 2025-08-30 09:06:55 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 09:07:08 - pico-train - INFO - Step 60975 -- 🔄 Training Metrics 2025-08-30 09:07:08 - pico-train - INFO - ├── Loss: 5.8425 2025-08-30 09:07:08 - pico-train - INFO - ├── Learning Rate: 1.91e-05 2025-08-30 09:07:08 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 09:07:20 - pico-train - INFO - Step 61000 -- 💾 Saving Checkpoint 2025-08-30 09:09:19 - pico-train - INFO - Step 61000 -- 📊 Evaluation Results 2025-08-30 09:09:19 - pico-train - INFO - └── paloma: 7.381800813081388e+30 2025-08-30 09:09:23 - pico-train - INFO - Step 61000 -- 🔄 Training Metrics 2025-08-30 09:09:23 - pico-train - INFO - ├── Loss: 5.8442 2025-08-30 09:09:23 - pico-train - INFO - ├── Learning Rate: 1.91e-05 2025-08-30 09:09:23 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 09:09:23 - pico-train - INFO - Step 61000 -- 📈 Saving Learning Dynamics 2025-08-30 09:09:38 - pico-train - INFO - Step 61025 -- 🔄 Training Metrics 2025-08-30 09:09:38 - pico-train - INFO - ├── Loss: 5.9313 2025-08-30 09:09:38 - pico-train - INFO - ├── Learning Rate: 1.91e-05 2025-08-30 09:09:38 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 09:09:51 - pico-train - INFO - Step 61050 -- 🔄 Training Metrics 2025-08-30 09:09:51 - pico-train - INFO - ├── Loss: 5.8519 2025-08-30 09:09:51 - pico-train - INFO - ├── Learning Rate: 1.90e-05 2025-08-30 09:09:51 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 09:10:03 - pico-train - INFO - Step 61075 -- 🔄 Training Metrics 2025-08-30 09:10:03 - pico-train - INFO - ├── Loss: 5.8725 2025-08-30 09:10:03 - pico-train - INFO - ├── Learning Rate: 1.90e-05 2025-08-30 09:10:03 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 09:10:16 - pico-train 
- INFO - Step 61100 -- 🔄 Training Metrics 2025-08-30 09:10:16 - pico-train - INFO - ├── Loss: 5.8322 2025-08-30 09:10:16 - pico-train - INFO - ├── Learning Rate: 1.90e-05 2025-08-30 09:10:16 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 09:10:29 - pico-train - INFO - Step 61125 -- 🔄 Training Metrics 2025-08-30 09:10:29 - pico-train - INFO - ├── Loss: 5.8354 2025-08-30 09:10:29 - pico-train - INFO - ├── Learning Rate: 1.90e-05 2025-08-30 09:10:29 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 09:10:41 - pico-train - INFO - Step 61150 -- 🔄 Training Metrics 2025-08-30 09:10:41 - pico-train - INFO - ├── Loss: 5.8735 2025-08-30 09:10:41 - pico-train - INFO - ├── Learning Rate: 1.90e-05 2025-08-30 09:10:41 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 09:10:54 - pico-train - INFO - Step 61175 -- 🔄 Training Metrics 2025-08-30 09:10:54 - pico-train - INFO - ├── Loss: 5.9433 2025-08-30 09:10:54 - pico-train - INFO - ├── Learning Rate: 1.89e-05 2025-08-30 09:10:54 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 09:11:06 - pico-train - INFO - Step 61200 -- 🔄 Training Metrics 2025-08-30 09:11:06 - pico-train - INFO - ├── Loss: 5.8394 2025-08-30 09:11:06 - pico-train - INFO - ├── Learning Rate: 1.89e-05 2025-08-30 09:11:06 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 09:11:19 - pico-train - INFO - Step 61225 -- 🔄 Training Metrics 2025-08-30 09:11:19 - pico-train - INFO - ├── Loss: 5.9396 2025-08-30 09:11:19 - pico-train - INFO - ├── Learning Rate: 1.89e-05 2025-08-30 09:11:19 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 09:11:32 - pico-train - INFO - Step 61250 -- 🔄 Training Metrics 2025-08-30 09:11:32 - pico-train - INFO - ├── Loss: 5.8461 2025-08-30 09:11:32 - pico-train - INFO - ├── Learning Rate: 1.89e-05 2025-08-30 09:11:32 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 09:11:44 - pico-train - INFO - Step 61275 -- 🔄 Training Metrics 2025-08-30 09:11:44 - pico-train - INFO - ├── Loss: 5.9137 2025-08-30 09:11:44 - 
pico-train - INFO - ├── Learning Rate: 1.89e-05 2025-08-30 09:11:44 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 09:11:57 - pico-train - INFO - Step 61300 -- 🔄 Training Metrics 2025-08-30 09:11:57 - pico-train - INFO - ├── Loss: 5.8249 2025-08-30 09:11:57 - pico-train - INFO - ├── Learning Rate: 1.88e-05 2025-08-30 09:11:57 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 09:12:10 - pico-train - INFO - Step 61325 -- 🔄 Training Metrics 2025-08-30 09:12:10 - pico-train - INFO - ├── Loss: 5.8248 2025-08-30 09:12:10 - pico-train - INFO - ├── Learning Rate: 1.88e-05 2025-08-30 09:12:10 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 09:12:22 - pico-train - INFO - Step 61350 -- 🔄 Training Metrics 2025-08-30 09:12:22 - pico-train - INFO - ├── Loss: 5.8349 2025-08-30 09:12:22 - pico-train - INFO - ├── Learning Rate: 1.88e-05 2025-08-30 09:12:22 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 09:12:35 - pico-train - INFO - Step 61375 -- 🔄 Training Metrics 2025-08-30 09:12:35 - pico-train - INFO - ├── Loss: 5.8265 2025-08-30 09:12:35 - pico-train - INFO - ├── Learning Rate: 1.88e-05 2025-08-30 09:12:35 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 09:12:47 - pico-train - INFO - Step 61400 -- 🔄 Training Metrics 2025-08-30 09:12:47 - pico-train - INFO - ├── Loss: 5.8919 2025-08-30 09:12:47 - pico-train - INFO - ├── Learning Rate: 1.87e-05 2025-08-30 09:12:47 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 09:13:00 - pico-train - INFO - Step 61425 -- 🔄 Training Metrics 2025-08-30 09:13:00 - pico-train - INFO - ├── Loss: 5.8929 2025-08-30 09:13:00 - pico-train - INFO - ├── Learning Rate: 1.87e-05 2025-08-30 09:13:00 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 09:13:13 - pico-train - INFO - Step 61450 -- 🔄 Training Metrics 2025-08-30 09:13:13 - pico-train - INFO - ├── Loss: 5.8063 2025-08-30 09:13:13 - pico-train - INFO - ├── Learning Rate: 1.87e-05 2025-08-30 09:13:13 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 09:13:25 
- pico-train - INFO - Step 61475 -- 🔄 Training Metrics 2025-08-30 09:13:25 - pico-train - INFO - ├── Loss: 5.8834 2025-08-30 09:13:25 - pico-train - INFO - ├── Learning Rate: 1.87e-05 2025-08-30 09:13:25 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 09:13:38 - pico-train - INFO - Step 61500 -- 💾 Saving Checkpoint 2025-08-30 09:15:52 - pico-train - INFO - Step 61500 -- 📊 Evaluation Results 2025-08-30 09:15:52 - pico-train - INFO - └── paloma: 7.5580512131553e+30 2025-08-30 09:15:55 - pico-train - INFO - Step 61500 -- 🔄 Training Metrics 2025-08-30 09:15:55 - pico-train - INFO - ├── Loss: 5.8274 2025-08-30 09:15:55 - pico-train - INFO - ├── Learning Rate: 1.87e-05 2025-08-30 09:15:55 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 09:15:55 - pico-train - INFO - Step 61500 -- 📈 Saving Learning Dynamics 2025-08-30 09:16:20 - pico-train - INFO - Step 61525 -- 🔄 Training Metrics 2025-08-30 09:16:20 - pico-train - INFO - ├── Loss: 5.8780 2025-08-30 09:16:20 - pico-train - INFO - ├── Learning Rate: 1.86e-05 2025-08-30 09:16:20 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 09:16:33 - pico-train - INFO - Step 61550 -- 🔄 Training Metrics 2025-08-30 09:16:33 - pico-train - INFO - ├── Loss: 5.8784 2025-08-30 09:16:33 - pico-train - INFO - ├── Learning Rate: 1.86e-05 2025-08-30 09:16:33 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 09:16:45 - pico-train - INFO - Step 61575 -- 🔄 Training Metrics 2025-08-30 09:16:45 - pico-train - INFO - ├── Loss: 5.8547 2025-08-30 09:16:45 - pico-train - INFO - ├── Learning Rate: 1.86e-05 2025-08-30 09:16:45 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 09:16:58 - pico-train - INFO - Step 61600 -- 🔄 Training Metrics 2025-08-30 09:16:58 - pico-train - INFO - ├── Loss: 5.8624 2025-08-30 09:16:58 - pico-train - INFO - ├── Learning Rate: 1.86e-05 2025-08-30 09:16:58 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 09:17:11 - pico-train - INFO - Step 61625 -- 🔄 Training Metrics 2025-08-30 09:17:11 - pico-train 
- INFO - ├── Loss: 5.9047 2025-08-30 09:17:11 - pico-train - INFO - ├── Learning Rate: 1.86e-05 2025-08-30 09:17:11 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 09:17:24 - pico-train - INFO - Step 61650 -- 🔄 Training Metrics 2025-08-30 09:17:24 - pico-train - INFO - ├── Loss: 5.8888 2025-08-30 09:17:24 - pico-train - INFO - ├── Learning Rate: 1.85e-05 2025-08-30 09:17:24 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 09:17:36 - pico-train - INFO - Step 61675 -- 🔄 Training Metrics 2025-08-30 09:17:36 - pico-train - INFO - ├── Loss: 5.8195 2025-08-30 09:17:36 - pico-train - INFO - ├── Learning Rate: 1.85e-05 2025-08-30 09:17:36 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 09:17:49 - pico-train - INFO - Step 61700 -- 🔄 Training Metrics 2025-08-30 09:17:49 - pico-train - INFO - ├── Loss: 5.8452 2025-08-30 09:17:49 - pico-train - INFO - ├── Learning Rate: 1.85e-05 2025-08-30 09:17:49 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 09:18:02 - pico-train - INFO - Step 61725 -- 🔄 Training Metrics 2025-08-30 09:18:02 - pico-train - INFO - ├── Loss: 5.9150 2025-08-30 09:18:02 - pico-train - INFO - ├── Learning Rate: 1.85e-05 2025-08-30 09:18:02 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 09:18:14 - pico-train - INFO - Step 61750 -- 🔄 Training Metrics 2025-08-30 09:18:14 - pico-train - INFO - ├── Loss: 5.7953 2025-08-30 09:18:14 - pico-train - INFO - ├── Learning Rate: 1.85e-05 2025-08-30 09:18:14 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 09:18:27 - pico-train - INFO - Step 61775 -- 🔄 Training Metrics 2025-08-30 09:18:27 - pico-train - INFO - ├── Loss: 5.8075 2025-08-30 09:18:27 - pico-train - INFO - ├── Learning Rate: 1.84e-05 2025-08-30 09:18:27 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 09:18:40 - pico-train - INFO - Step 61800 -- 🔄 Training Metrics 2025-08-30 09:18:40 - pico-train - INFO - ├── Loss: 5.8305 2025-08-30 09:18:40 - pico-train - INFO - ├── Learning Rate: 1.84e-05 2025-08-30 09:18:40 - pico-train - 
INFO - └── Inf/NaN count: 0 2025-08-30 09:18:53 - pico-train - INFO - Step 61825 -- 🔄 Training Metrics 2025-08-30 09:18:53 - pico-train - INFO - ├── Loss: 5.8460 2025-08-30 09:18:53 - pico-train - INFO - ├── Learning Rate: 1.84e-05 2025-08-30 09:18:53 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 09:19:05 - pico-train - INFO - Step 61850 -- 🔄 Training Metrics 2025-08-30 09:19:05 - pico-train - INFO - ├── Loss: 5.9274 2025-08-30 09:19:05 - pico-train - INFO - ├── Learning Rate: 1.84e-05 2025-08-30 09:19:05 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 09:19:18 - pico-train - INFO - Step 61875 -- 🔄 Training Metrics 2025-08-30 09:19:18 - pico-train - INFO - ├── Loss: 5.8535 2025-08-30 09:19:18 - pico-train - INFO - ├── Learning Rate: 1.84e-05 2025-08-30 09:19:18 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 09:19:31 - pico-train - INFO - Step 61900 -- 🔄 Training Metrics 2025-08-30 09:19:31 - pico-train - INFO - ├── Loss: 5.8254 2025-08-30 09:19:31 - pico-train - INFO - ├── Learning Rate: 1.83e-05 2025-08-30 09:19:31 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 09:19:44 - pico-train - INFO - Step 61925 -- 🔄 Training Metrics 2025-08-30 09:19:44 - pico-train - INFO - ├── Loss: 5.6957 2025-08-30 09:19:44 - pico-train - INFO - ├── Learning Rate: 1.83e-05 2025-08-30 09:19:44 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 09:19:56 - pico-train - INFO - Step 61950 -- 🔄 Training Metrics 2025-08-30 09:19:56 - pico-train - INFO - ├── Loss: 5.8474 2025-08-30 09:19:56 - pico-train - INFO - ├── Learning Rate: 1.83e-05 2025-08-30 09:19:56 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 09:20:09 - pico-train - INFO - Step 61975 -- 🔄 Training Metrics 2025-08-30 09:20:09 - pico-train - INFO - ├── Loss: 5.8588 2025-08-30 09:20:09 - pico-train - INFO - ├── Learning Rate: 1.83e-05 2025-08-30 09:20:09 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 09:20:26 - pico-train - INFO - Step 62000 -- 💾 Saving Checkpoint 2025-08-30 09:22:33 - 
pico-train - INFO - Step 62000 -- 📊 Evaluation Results 2025-08-30 09:22:33 - pico-train - INFO - └── paloma: 1.0115134118607476e+31 2025-08-30 09:22:37 - pico-train - INFO - Step 62000 -- 🔄 Training Metrics 2025-08-30 09:22:37 - pico-train - INFO - ├── Loss: 5.8579 2025-08-30 09:22:37 - pico-train - INFO - ├── Learning Rate: 1.83e-05 2025-08-30 09:22:37 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 09:22:37 - pico-train - INFO - Step 62000 -- 📈 Saving Learning Dynamics 2025-08-30 09:22:52 - pico-train - INFO - Step 62025 -- 🔄 Training Metrics 2025-08-30 09:22:52 - pico-train - INFO - ├── Loss: 5.8263 2025-08-30 09:22:52 - pico-train - INFO - ├── Learning Rate: 1.82e-05 2025-08-30 09:22:52 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 09:23:05 - pico-train - INFO - Step 62050 -- 🔄 Training Metrics 2025-08-30 09:23:05 - pico-train - INFO - ├── Loss: 5.8617 2025-08-30 09:23:05 - pico-train - INFO - ├── Learning Rate: 1.82e-05 2025-08-30 09:23:05 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 09:23:18 - pico-train - INFO - Step 62075 -- 🔄 Training Metrics 2025-08-30 09:23:18 - pico-train - INFO - ├── Loss: 5.8762 2025-08-30 09:23:18 - pico-train - INFO - ├── Learning Rate: 1.82e-05 2025-08-30 09:23:18 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 09:23:30 - pico-train - INFO - Step 62100 -- 🔄 Training Metrics 2025-08-30 09:23:30 - pico-train - INFO - ├── Loss: 5.8857 2025-08-30 09:23:30 - pico-train - INFO - ├── Learning Rate: 1.82e-05 2025-08-30 09:23:30 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 09:23:43 - pico-train - INFO - Step 62125 -- 🔄 Training Metrics 2025-08-30 09:23:43 - pico-train - INFO - ├── Loss: 5.7406 2025-08-30 09:23:43 - pico-train - INFO - ├── Learning Rate: 1.82e-05 2025-08-30 09:23:43 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 09:23:55 - pico-train - INFO - Step 62150 -- 🔄 Training Metrics 2025-08-30 09:23:55 - pico-train - INFO - ├── Loss: 5.8648 2025-08-30 09:23:55 - pico-train - INFO - ├── 
Learning Rate: 1.81e-05 2025-08-30 09:23:55 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 09:24:08 - pico-train - INFO - Step 62175 -- 🔄 Training Metrics 2025-08-30 09:24:08 - pico-train - INFO - ├── Loss: 5.8611 2025-08-30 09:24:08 - pico-train - INFO - ├── Learning Rate: 1.81e-05 2025-08-30 09:24:08 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 09:24:21 - pico-train - INFO - Step 62200 -- 🔄 Training Metrics 2025-08-30 09:24:21 - pico-train - INFO - ├── Loss: 5.8327 2025-08-30 09:24:21 - pico-train - INFO - ├── Learning Rate: 1.81e-05 2025-08-30 09:24:21 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 09:24:33 - pico-train - INFO - Step 62225 -- 🔄 Training Metrics 2025-08-30 09:24:33 - pico-train - INFO - ├── Loss: 5.8680 2025-08-30 09:24:33 - pico-train - INFO - ├── Learning Rate: 1.81e-05 2025-08-30 09:24:33 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 09:24:46 - pico-train - INFO - Step 62250 -- 🔄 Training Metrics 2025-08-30 09:24:46 - pico-train - INFO - ├── Loss: 5.8013 2025-08-30 09:24:46 - pico-train - INFO - ├── Learning Rate: 1.80e-05 2025-08-30 09:24:46 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 09:24:58 - pico-train - INFO - Step 62275 -- 🔄 Training Metrics 2025-08-30 09:24:58 - pico-train - INFO - ├── Loss: 5.7716 2025-08-30 09:24:58 - pico-train - INFO - ├── Learning Rate: 1.80e-05 2025-08-30 09:24:58 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 09:25:11 - pico-train - INFO - Step 62300 -- 🔄 Training Metrics 2025-08-30 09:25:11 - pico-train - INFO - ├── Loss: 5.8227 2025-08-30 09:25:11 - pico-train - INFO - ├── Learning Rate: 1.80e-05 2025-08-30 09:25:11 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 09:25:24 - pico-train - INFO - Step 62325 -- 🔄 Training Metrics 2025-08-30 09:25:24 - pico-train - INFO - ├── Loss: 5.8460 2025-08-30 09:25:24 - pico-train - INFO - ├── Learning Rate: 1.80e-05 2025-08-30 09:25:24 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 09:25:37 - pico-train - INFO - 
Step 62350 -- 🔄 Training Metrics 2025-08-30 09:25:37 - pico-train - INFO - ├── Loss: 5.8503 2025-08-30 09:25:37 - pico-train - INFO - ├── Learning Rate: 1.80e-05 2025-08-30 09:25:37 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 09:25:49 - pico-train - INFO - Step 62375 -- 🔄 Training Metrics 2025-08-30 09:25:49 - pico-train - INFO - ├── Loss: 5.7188 2025-08-30 09:25:49 - pico-train - INFO - ├── Learning Rate: 1.79e-05 2025-08-30 09:25:49 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 09:26:02 - pico-train - INFO - Step 62400 -- 🔄 Training Metrics 2025-08-30 09:26:02 - pico-train - INFO - ├── Loss: 5.8399 2025-08-30 09:26:02 - pico-train - INFO - ├── Learning Rate: 1.79e-05 2025-08-30 09:26:02 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 09:26:15 - pico-train - INFO - Step 62425 -- 🔄 Training Metrics 2025-08-30 09:26:15 - pico-train - INFO - ├── Loss: 5.8522 2025-08-30 09:26:15 - pico-train - INFO - ├── Learning Rate: 1.79e-05 2025-08-30 09:26:15 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 09:26:27 - pico-train - INFO - Step 62450 -- 🔄 Training Metrics 2025-08-30 09:26:27 - pico-train - INFO - ├── Loss: 5.8175 2025-08-30 09:26:27 - pico-train - INFO - ├── Learning Rate: 1.79e-05 2025-08-30 09:26:27 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 09:26:40 - pico-train - INFO - Step 62475 -- 🔄 Training Metrics 2025-08-30 09:26:40 - pico-train - INFO - ├── Loss: 5.9304 2025-08-30 09:26:40 - pico-train - INFO - ├── Learning Rate: 1.79e-05 2025-08-30 09:26:40 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 09:26:53 - pico-train - INFO - Step 62500 -- 💾 Saving Checkpoint 2025-08-30 09:28:59 - pico-train - INFO - Step 62500 -- 📊 Evaluation Results 2025-08-30 09:28:59 - pico-train - INFO - └── paloma: 1.026584430453375e+31 2025-08-30 09:29:02 - pico-train - INFO - Step 62500 -- 🔄 Training Metrics 2025-08-30 09:29:02 - pico-train - INFO - ├── Loss: 5.9047 2025-08-30 09:29:02 - pico-train - INFO - ├── Learning Rate: 1.78e-05 
2025-08-30 09:29:02 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 09:29:02 - pico-train - INFO - Step 62500 -- 📈 Saving Learning Dynamics 2025-08-30 09:29:17 - pico-train - INFO - Step 62525 -- 🔄 Training Metrics 2025-08-30 09:29:17 - pico-train - INFO - ├── Loss: 5.8436 2025-08-30 09:29:17 - pico-train - INFO - ├── Learning Rate: 1.78e-05 2025-08-30 09:29:17 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 09:29:29 - pico-train - INFO - Step 62550 -- 🔄 Training Metrics 2025-08-30 09:29:29 - pico-train - INFO - ├── Loss: 5.8456 2025-08-30 09:29:29 - pico-train - INFO - ├── Learning Rate: 1.78e-05 2025-08-30 09:29:29 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 09:29:42 - pico-train - INFO - Step 62575 -- 🔄 Training Metrics 2025-08-30 09:29:42 - pico-train - INFO - ├── Loss: 5.8538 2025-08-30 09:29:42 - pico-train - INFO - ├── Learning Rate: 1.78e-05 2025-08-30 09:29:42 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 09:29:55 - pico-train - INFO - Step 62600 -- 🔄 Training Metrics 2025-08-30 09:29:55 - pico-train - INFO - ├── Loss: 5.9303 2025-08-30 09:29:55 - pico-train - INFO - ├── Learning Rate: 1.78e-05 2025-08-30 09:29:55 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 09:30:08 - pico-train - INFO - Step 62625 -- 🔄 Training Metrics 2025-08-30 09:30:08 - pico-train - INFO - ├── Loss: 5.8303 2025-08-30 09:30:08 - pico-train - INFO - ├── Learning Rate: 1.77e-05 2025-08-30 09:30:08 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 09:30:21 - pico-train - INFO - Step 62650 -- 🔄 Training Metrics 2025-08-30 09:30:21 - pico-train - INFO - ├── Loss: 5.8259 2025-08-30 09:30:21 - pico-train - INFO - ├── Learning Rate: 1.77e-05 2025-08-30 09:30:21 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 09:30:33 - pico-train - INFO - Step 62675 -- 🔄 Training Metrics 2025-08-30 09:30:33 - pico-train - INFO - ├── Loss: 5.8603 2025-08-30 09:30:33 - pico-train - INFO - ├── Learning Rate: 1.77e-05 2025-08-30 09:30:33 - pico-train - INFO - └── 
Inf/NaN count: 0 2025-08-30 09:30:46 - pico-train - INFO - Step 62700 -- 🔄 Training Metrics 2025-08-30 09:30:46 - pico-train - INFO - ├── Loss: 5.8287 2025-08-30 09:30:46 - pico-train - INFO - ├── Learning Rate: 1.77e-05 2025-08-30 09:30:46 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 09:30:59 - pico-train - INFO - Step 62725 -- 🔄 Training Metrics 2025-08-30 09:30:59 - pico-train - INFO - ├── Loss: 5.8268 2025-08-30 09:30:59 - pico-train - INFO - ├── Learning Rate: 1.77e-05 2025-08-30 09:30:59 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 09:31:12 - pico-train - INFO - Step 62750 -- 🔄 Training Metrics 2025-08-30 09:31:12 - pico-train - INFO - ├── Loss: 5.8671 2025-08-30 09:31:12 - pico-train - INFO - ├── Learning Rate: 1.76e-05 2025-08-30 09:31:12 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 09:31:25 - pico-train - INFO - Step 62775 -- 🔄 Training Metrics 2025-08-30 09:31:25 - pico-train - INFO - ├── Loss: 5.7714 2025-08-30 09:31:25 - pico-train - INFO - ├── Learning Rate: 1.76e-05 2025-08-30 09:31:25 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 09:31:37 - pico-train - INFO - Step 62800 -- 🔄 Training Metrics 2025-08-30 09:31:37 - pico-train - INFO - ├── Loss: 5.8034 2025-08-30 09:31:37 - pico-train - INFO - ├── Learning Rate: 1.76e-05 2025-08-30 09:31:37 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 09:31:50 - pico-train - INFO - Step 62825 -- 🔄 Training Metrics 2025-08-30 09:31:50 - pico-train - INFO - ├── Loss: 5.8833 2025-08-30 09:31:50 - pico-train - INFO - ├── Learning Rate: 1.76e-05 2025-08-30 09:31:50 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 09:32:03 - pico-train - INFO - Step 62850 -- 🔄 Training Metrics 2025-08-30 09:32:03 - pico-train - INFO - ├── Loss: 5.7885 2025-08-30 09:32:03 - pico-train - INFO - ├── Learning Rate: 1.76e-05 2025-08-30 09:32:03 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 09:32:15 - pico-train - INFO - Step 62875 -- 🔄 Training Metrics 2025-08-30 09:32:15 - pico-train - 
INFO - ├── Loss: 5.8884 2025-08-30 09:32:15 - pico-train - INFO - ├── Learning Rate: 1.75e-05 2025-08-30 09:32:15 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 09:32:28 - pico-train - INFO - Step 62900 -- 🔄 Training Metrics 2025-08-30 09:32:28 - pico-train - INFO - ├── Loss: 5.7919 2025-08-30 09:32:28 - pico-train - INFO - ├── Learning Rate: 1.75e-05 2025-08-30 09:32:28 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 09:32:41 - pico-train - INFO - Step 62925 -- 🔄 Training Metrics 2025-08-30 09:32:41 - pico-train - INFO - ├── Loss: 5.8612 2025-08-30 09:32:41 - pico-train - INFO - ├── Learning Rate: 1.75e-05 2025-08-30 09:32:41 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 09:32:54 - pico-train - INFO - Step 62950 -- 🔄 Training Metrics 2025-08-30 09:32:54 - pico-train - INFO - ├── Loss: 5.7049 2025-08-30 09:32:54 - pico-train - INFO - ├── Learning Rate: 1.75e-05 2025-08-30 09:32:54 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 09:33:06 - pico-train - INFO - Step 62975 -- 🔄 Training Metrics 2025-08-30 09:33:06 - pico-train - INFO - ├── Loss: 5.8447 2025-08-30 09:33:06 - pico-train - INFO - ├── Learning Rate: 1.75e-05 2025-08-30 09:33:06 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 09:33:19 - pico-train - INFO - Step 63000 -- 💾 Saving Checkpoint 2025-08-30 09:35:29 - pico-train - INFO - Step 63000 -- 📊 Evaluation Results 2025-08-30 09:35:29 - pico-train - INFO - └── paloma: 1.053901252110863e+31 2025-08-30 09:35:31 - pico-train - INFO - Step 63000 -- 🔄 Training Metrics 2025-08-30 09:35:31 - pico-train - INFO - ├── Loss: 5.8600 2025-08-30 09:35:31 - pico-train - INFO - ├── Learning Rate: 1.74e-05 2025-08-30 09:35:31 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 09:35:31 - pico-train - INFO - Step 63000 -- 📈 Saving Learning Dynamics 2025-08-30 09:35:46 - pico-train - INFO - Step 63025 -- 🔄 Training Metrics 2025-08-30 09:35:46 - pico-train - INFO - ├── Loss: 5.8323 2025-08-30 09:35:46 - pico-train - INFO - ├── Learning Rate: 
1.74e-05 2025-08-30 09:35:46 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 09:35:59 - pico-train - INFO - Step 63050 -- 🔄 Training Metrics 2025-08-30 09:35:59 - pico-train - INFO - ├── Loss: 5.7825 2025-08-30 09:35:59 - pico-train - INFO - ├── Learning Rate: 1.74e-05 2025-08-30 09:35:59 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 09:36:12 - pico-train - INFO - Step 63075 -- 🔄 Training Metrics 2025-08-30 09:36:12 - pico-train - INFO - ├── Loss: 5.8469 2025-08-30 09:36:12 - pico-train - INFO - ├── Learning Rate: 1.74e-05 2025-08-30 09:36:12 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 09:36:24 - pico-train - INFO - Step 63100 -- 🔄 Training Metrics 2025-08-30 09:36:24 - pico-train - INFO - ├── Loss: 5.8636 2025-08-30 09:36:24 - pico-train - INFO - ├── Learning Rate: 1.74e-05 2025-08-30 09:36:24 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 09:36:37 - pico-train - INFO - Step 63125 -- 🔄 Training Metrics 2025-08-30 09:36:37 - pico-train - INFO - ├── Loss: 5.8131 2025-08-30 09:36:37 - pico-train - INFO - ├── Learning Rate: 1.73e-05 2025-08-30 09:36:37 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 09:36:50 - pico-train - INFO - Step 63150 -- 🔄 Training Metrics 2025-08-30 09:36:50 - pico-train - INFO - ├── Loss: 5.8570 2025-08-30 09:36:50 - pico-train - INFO - ├── Learning Rate: 1.73e-05 2025-08-30 09:36:50 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 09:37:02 - pico-train - INFO - Step 63175 -- 🔄 Training Metrics 2025-08-30 09:37:02 - pico-train - INFO - ├── Loss: 5.9120 2025-08-30 09:37:02 - pico-train - INFO - ├── Learning Rate: 1.73e-05 2025-08-30 09:37:02 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 09:37:15 - pico-train - INFO - Step 63200 -- 🔄 Training Metrics 2025-08-30 09:37:15 - pico-train - INFO - ├── Loss: 5.7894 2025-08-30 09:37:15 - pico-train - INFO - ├── Learning Rate: 1.73e-05 2025-08-30 09:37:15 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 09:37:28 - pico-train - INFO - Step 63225 -- 🔄 
Training Metrics 2025-08-30 09:37:28 - pico-train - INFO - ├── Loss: 5.7796 2025-08-30 09:37:28 - pico-train - INFO - ├── Learning Rate: 1.73e-05 2025-08-30 09:37:28 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 09:37:41 - pico-train - INFO - Step 63250 -- 🔄 Training Metrics 2025-08-30 09:37:41 - pico-train - INFO - ├── Loss: 5.7788 2025-08-30 09:37:41 - pico-train - INFO - ├── Learning Rate: 1.72e-05 2025-08-30 09:37:41 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 09:37:53 - pico-train - INFO - Step 63275 -- 🔄 Training Metrics 2025-08-30 09:37:53 - pico-train - INFO - ├── Loss: 5.9341 2025-08-30 09:37:53 - pico-train - INFO - ├── Learning Rate: 1.72e-05 2025-08-30 09:37:53 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 09:38:06 - pico-train - INFO - Step 63300 -- 🔄 Training Metrics 2025-08-30 09:38:06 - pico-train - INFO - ├── Loss: 5.7428 2025-08-30 09:38:06 - pico-train - INFO - ├── Learning Rate: 1.72e-05 2025-08-30 09:38:06 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 09:38:19 - pico-train - INFO - Step 63325 -- 🔄 Training Metrics 2025-08-30 09:38:19 - pico-train - INFO - ├── Loss: 5.8475 2025-08-30 09:38:19 - pico-train - INFO - ├── Learning Rate: 1.72e-05 2025-08-30 09:38:19 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 09:38:31 - pico-train - INFO - Step 63350 -- 🔄 Training Metrics 2025-08-30 09:38:31 - pico-train - INFO - ├── Loss: 5.8675 2025-08-30 09:38:31 - pico-train - INFO - ├── Learning Rate: 1.72e-05 2025-08-30 09:38:31 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 09:38:44 - pico-train - INFO - Step 63375 -- 🔄 Training Metrics 2025-08-30 09:38:44 - pico-train - INFO - ├── Loss: 5.8387 2025-08-30 09:38:44 - pico-train - INFO - ├── Learning Rate: 1.71e-05 2025-08-30 09:38:44 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 09:38:57 - pico-train - INFO - Step 63400 -- 🔄 Training Metrics 2025-08-30 09:38:57 - pico-train - INFO - ├── Loss: 5.8082 2025-08-30 09:38:57 - pico-train - INFO - ├── Learning 
Rate: 1.71e-05 2025-08-30 09:38:57 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 09:39:09 - pico-train - INFO - Step 63425 -- 🔄 Training Metrics 2025-08-30 09:39:09 - pico-train - INFO - ├── Loss: 5.8823 2025-08-30 09:39:09 - pico-train - INFO - ├── Learning Rate: 1.71e-05 2025-08-30 09:39:09 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 09:39:22 - pico-train - INFO - Step 63450 -- 🔄 Training Metrics 2025-08-30 09:39:22 - pico-train - INFO - ├── Loss: 5.8131 2025-08-30 09:39:22 - pico-train - INFO - ├── Learning Rate: 1.71e-05 2025-08-30 09:39:22 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 09:39:35 - pico-train - INFO - Step 63475 -- 🔄 Training Metrics 2025-08-30 09:39:35 - pico-train - INFO - ├── Loss: 5.8368 2025-08-30 09:39:35 - pico-train - INFO - ├── Learning Rate: 1.71e-05 2025-08-30 09:39:35 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 09:39:48 - pico-train - INFO - Step 63500 -- 💾 Saving Checkpoint 2025-08-30 09:41:56 - pico-train - INFO - Step 63500 -- 📊 Evaluation Results 2025-08-30 09:41:56 - pico-train - INFO - └── paloma: 1.3798321560822609e+31 2025-08-30 09:41:59 - pico-train - INFO - Step 63500 -- 🔄 Training Metrics 2025-08-30 09:41:59 - pico-train - INFO - ├── Loss: 5.8774 2025-08-30 09:41:59 - pico-train - INFO - ├── Learning Rate: 1.70e-05 2025-08-30 09:41:59 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 09:41:59 - pico-train - INFO - Step 63500 -- 📈 Saving Learning Dynamics 2025-08-30 09:42:14 - pico-train - INFO - Step 63525 -- 🔄 Training Metrics 2025-08-30 09:42:14 - pico-train - INFO - ├── Loss: 5.8403 2025-08-30 09:42:14 - pico-train - INFO - ├── Learning Rate: 1.70e-05 2025-08-30 09:42:14 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 09:42:27 - pico-train - INFO - Step 63550 -- 🔄 Training Metrics 2025-08-30 09:42:27 - pico-train - INFO - ├── Loss: 5.8268 2025-08-30 09:42:27 - pico-train - INFO - ├── Learning Rate: 1.70e-05 2025-08-30 09:42:27 - pico-train - INFO - └── Inf/NaN count: 0 
2025-08-30 09:42:40 - pico-train - INFO - Step 63575 -- 🔄 Training Metrics 2025-08-30 09:42:40 - pico-train - INFO - ├── Loss: 5.8713 2025-08-30 09:42:40 - pico-train - INFO - ├── Learning Rate: 1.70e-05 2025-08-30 09:42:40 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 09:42:52 - pico-train - INFO - Step 63600 -- 🔄 Training Metrics 2025-08-30 09:42:52 - pico-train - INFO - ├── Loss: 5.9887 2025-08-30 09:42:52 - pico-train - INFO - ├── Learning Rate: 1.70e-05 2025-08-30 09:42:52 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 09:43:05 - pico-train - INFO - Step 63625 -- 🔄 Training Metrics 2025-08-30 09:43:05 - pico-train - INFO - ├── Loss: 5.7719 2025-08-30 09:43:05 - pico-train - INFO - ├── Learning Rate: 1.69e-05 2025-08-30 09:43:05 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 09:43:18 - pico-train - INFO - Step 63650 -- 🔄 Training Metrics 2025-08-30 09:43:18 - pico-train - INFO - ├── Loss: 5.9020 2025-08-30 09:43:18 - pico-train - INFO - ├── Learning Rate: 1.69e-05 2025-08-30 09:43:18 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 09:43:30 - pico-train - INFO - Step 63675 -- 🔄 Training Metrics 2025-08-30 09:43:30 - pico-train - INFO - ├── Loss: 5.7964 2025-08-30 09:43:30 - pico-train - INFO - ├── Learning Rate: 1.69e-05 2025-08-30 09:43:30 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 09:43:43 - pico-train - INFO - Step 63700 -- 🔄 Training Metrics 2025-08-30 09:43:43 - pico-train - INFO - ├── Loss: 5.7920 2025-08-30 09:43:43 - pico-train - INFO - ├── Learning Rate: 1.69e-05 2025-08-30 09:43:43 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 09:43:55 - pico-train - INFO - Step 63725 -- 🔄 Training Metrics 2025-08-30 09:43:55 - pico-train - INFO - ├── Loss: 5.7781 2025-08-30 09:43:55 - pico-train - INFO - ├── Learning Rate: 1.68e-05 2025-08-30 09:43:55 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 09:44:08 - pico-train - INFO - Step 63750 -- 🔄 Training Metrics 2025-08-30 09:44:08 - pico-train - INFO - ├── Loss: 
5.8701 2025-08-30 09:44:08 - pico-train - INFO - ├── Learning Rate: 1.68e-05 2025-08-30 09:44:08 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 09:44:21 - pico-train - INFO - Step 63775 -- 🔄 Training Metrics 2025-08-30 09:44:21 - pico-train - INFO - ├── Loss: 5.7957 2025-08-30 09:44:21 - pico-train - INFO - ├── Learning Rate: 1.68e-05 2025-08-30 09:44:21 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 09:44:33 - pico-train - INFO - Step 63800 -- 🔄 Training Metrics 2025-08-30 09:44:33 - pico-train - INFO - ├── Loss: 5.8493 2025-08-30 09:44:33 - pico-train - INFO - ├── Learning Rate: 1.68e-05 2025-08-30 09:44:33 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 09:44:46 - pico-train - INFO - Step 63825 -- 🔄 Training Metrics 2025-08-30 09:44:46 - pico-train - INFO - ├── Loss: 5.8591 2025-08-30 09:44:46 - pico-train - INFO - ├── Learning Rate: 1.68e-05 2025-08-30 09:44:46 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 09:44:59 - pico-train - INFO - Step 63850 -- 🔄 Training Metrics 2025-08-30 09:44:59 - pico-train - INFO - ├── Loss: 5.9283 2025-08-30 09:44:59 - pico-train - INFO - ├── Learning Rate: 1.67e-05 2025-08-30 09:44:59 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 09:45:12 - pico-train - INFO - Step 63875 -- 🔄 Training Metrics 2025-08-30 09:45:12 - pico-train - INFO - ├── Loss: 5.8760 2025-08-30 09:45:12 - pico-train - INFO - ├── Learning Rate: 1.67e-05 2025-08-30 09:45:12 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 09:45:24 - pico-train - INFO - Step 63900 -- 🔄 Training Metrics 2025-08-30 09:45:24 - pico-train - INFO - ├── Loss: 5.8496 2025-08-30 09:45:24 - pico-train - INFO - ├── Learning Rate: 1.67e-05 2025-08-30 09:45:24 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 09:45:37 - pico-train - INFO - Step 63925 -- 🔄 Training Metrics 2025-08-30 09:45:37 - pico-train - INFO - ├── Loss: 5.7896 2025-08-30 09:45:37 - pico-train - INFO - ├── Learning Rate: 1.67e-05 2025-08-30 09:45:37 - pico-train - INFO - └── Inf/NaN 
count: 0 2025-08-30 09:45:50 - pico-train - INFO - Step 63950 -- 🔄 Training Metrics 2025-08-30 09:45:50 - pico-train - INFO - ├── Loss: 5.8621 2025-08-30 09:45:50 - pico-train - INFO - ├── Learning Rate: 1.67e-05 2025-08-30 09:45:50 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 09:46:02 - pico-train - INFO - Step 63975 -- 🔄 Training Metrics 2025-08-30 09:46:02 - pico-train - INFO - ├── Loss: 5.8765 2025-08-30 09:46:02 - pico-train - INFO - ├── Learning Rate: 1.66e-05 2025-08-30 09:46:02 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 09:46:15 - pico-train - INFO - Step 64000 -- 💾 Saving Checkpoint 2025-08-30 09:48:17 - pico-train - INFO - Step 64000 -- 📊 Evaluation Results 2025-08-30 09:48:17 - pico-train - INFO - └── paloma: 1.5176259204672668e+31 2025-08-30 09:48:20 - pico-train - INFO - Step 64000 -- 🔄 Training Metrics 2025-08-30 09:48:20 - pico-train - INFO - ├── Loss: 5.9281 2025-08-30 09:48:20 - pico-train - INFO - ├── Learning Rate: 1.66e-05 2025-08-30 09:48:20 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 09:48:20 - pico-train - INFO - Step 64000 -- 📈 Saving Learning Dynamics 2025-08-30 09:48:35 - pico-train - INFO - Step 64025 -- 🔄 Training Metrics 2025-08-30 09:48:35 - pico-train - INFO - ├── Loss: 5.8790 2025-08-30 09:48:35 - pico-train - INFO - ├── Learning Rate: 1.66e-05 2025-08-30 09:48:35 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 09:48:47 - pico-train - INFO - Step 64050 -- 🔄 Training Metrics 2025-08-30 09:48:47 - pico-train - INFO - ├── Loss: 5.8652 2025-08-30 09:48:47 - pico-train - INFO - ├── Learning Rate: 1.66e-05 2025-08-30 09:48:47 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 09:49:00 - pico-train - INFO - Step 64075 -- 🔄 Training Metrics 2025-08-30 09:49:00 - pico-train - INFO - ├── Loss: 5.8631 2025-08-30 09:49:00 - pico-train - INFO - ├── Learning Rate: 1.66e-05 2025-08-30 09:49:00 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 09:49:12 - pico-train - INFO - Step 64100 -- 🔄 Training Metrics 
2025-08-30 09:49:12 - pico-train - INFO - ├── Loss: 5.8123 2025-08-30 09:49:12 - pico-train - INFO - ├── Learning Rate: 1.65e-05 2025-08-30 09:49:12 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 09:49:25 - pico-train - INFO - Step 64125 -- 🔄 Training Metrics 2025-08-30 09:49:25 - pico-train - INFO - ├── Loss: 5.8136 2025-08-30 09:49:25 - pico-train - INFO - ├── Learning Rate: 1.65e-05 2025-08-30 09:49:25 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 09:49:38 - pico-train - INFO - Step 64150 -- 🔄 Training Metrics 2025-08-30 09:49:38 - pico-train - INFO - ├── Loss: 5.8727 2025-08-30 09:49:38 - pico-train - INFO - ├── Learning Rate: 1.65e-05 2025-08-30 09:49:38 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 09:49:50 - pico-train - INFO - Step 64175 -- 🔄 Training Metrics 2025-08-30 09:49:50 - pico-train - INFO - ├── Loss: 5.8386 2025-08-30 09:49:50 - pico-train - INFO - ├── Learning Rate: 1.65e-05 2025-08-30 09:49:50 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 09:50:03 - pico-train - INFO - Step 64200 -- 🔄 Training Metrics 2025-08-30 09:50:03 - pico-train - INFO - ├── Loss: 5.8189 2025-08-30 09:50:03 - pico-train - INFO - ├── Learning Rate: 1.65e-05 2025-08-30 09:50:03 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 09:50:16 - pico-train - INFO - Step 64225 -- 🔄 Training Metrics 2025-08-30 09:50:16 - pico-train - INFO - ├── Loss: 5.8936 2025-08-30 09:50:16 - pico-train - INFO - ├── Learning Rate: 1.64e-05 2025-08-30 09:50:16 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 09:50:29 - pico-train - INFO - Step 64250 -- 🔄 Training Metrics 2025-08-30 09:50:29 - pico-train - INFO - ├── Loss: 5.8517 2025-08-30 09:50:29 - pico-train - INFO - ├── Learning Rate: 1.64e-05 2025-08-30 09:50:29 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 09:50:41 - pico-train - INFO - Step 64275 -- 🔄 Training Metrics 2025-08-30 09:50:41 - pico-train - INFO - ├── Loss: 5.9134 2025-08-30 09:50:41 - pico-train - INFO - ├── Learning Rate: 1.64e-05 
2025-08-30 09:50:41 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 09:50:54 - pico-train - INFO - Step 64300 -- 🔄 Training Metrics 2025-08-30 09:50:54 - pico-train - INFO - ├── Loss: 5.8338 2025-08-30 09:50:54 - pico-train - INFO - ├── Learning Rate: 1.64e-05 2025-08-30 09:50:54 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 09:51:07 - pico-train - INFO - Step 64325 -- 🔄 Training Metrics 2025-08-30 09:51:07 - pico-train - INFO - ├── Loss: 5.9309 2025-08-30 09:51:07 - pico-train - INFO - ├── Learning Rate: 1.64e-05 2025-08-30 09:51:07 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 09:51:19 - pico-train - INFO - Step 64350 -- 🔄 Training Metrics 2025-08-30 09:51:19 - pico-train - INFO - ├── Loss: 5.8091 2025-08-30 09:51:19 - pico-train - INFO - ├── Learning Rate: 1.63e-05 2025-08-30 09:51:19 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 09:51:32 - pico-train - INFO - Step 64375 -- 🔄 Training Metrics 2025-08-30 09:51:32 - pico-train - INFO - ├── Loss: 5.8666 2025-08-30 09:51:32 - pico-train - INFO - ├── Learning Rate: 1.63e-05 2025-08-30 09:51:32 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 09:51:44 - pico-train - INFO - Step 64400 -- 🔄 Training Metrics 2025-08-30 09:51:44 - pico-train - INFO - ├── Loss: 5.7732 2025-08-30 09:51:44 - pico-train - INFO - ├── Learning Rate: 1.63e-05 2025-08-30 09:51:44 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 09:51:57 - pico-train - INFO - Step 64425 -- 🔄 Training Metrics 2025-08-30 09:51:57 - pico-train - INFO - ├── Loss: 5.8354 2025-08-30 09:51:57 - pico-train - INFO - ├── Learning Rate: 1.63e-05 2025-08-30 09:51:57 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 09:52:10 - pico-train - INFO - Step 64450 -- 🔄 Training Metrics 2025-08-30 09:52:10 - pico-train - INFO - ├── Loss: 5.8674 2025-08-30 09:52:10 - pico-train - INFO - ├── Learning Rate: 1.63e-05 2025-08-30 09:52:10 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 09:52:23 - pico-train - INFO - Step 64475 -- 🔄 Training 
Metrics 2025-08-30 09:52:23 - pico-train - INFO - ├── Loss: 5.8365 2025-08-30 09:52:23 - pico-train - INFO - ├── Learning Rate: 1.62e-05 2025-08-30 09:52:23 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 09:52:36 - pico-train - INFO - Step 64500 -- 💾 Saving Checkpoint 2025-08-30 09:54:35 - pico-train - INFO - Step 64500 -- 📊 Evaluation Results 2025-08-30 09:54:35 - pico-train - INFO - └── paloma: 1.7413715227937596e+31 2025-08-30 09:54:37 - pico-train - INFO - Step 64500 -- 🔄 Training Metrics 2025-08-30 09:54:37 - pico-train - INFO - ├── Loss: 5.7904 2025-08-30 09:54:37 - pico-train - INFO - ├── Learning Rate: 1.62e-05 2025-08-30 09:54:37 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 09:54:37 - pico-train - INFO - Step 64500 -- 📈 Saving Learning Dynamics 2025-08-30 09:54:52 - pico-train - INFO - Step 64525 -- 🔄 Training Metrics 2025-08-30 09:54:52 - pico-train - INFO - ├── Loss: 5.7861 2025-08-30 09:54:52 - pico-train - INFO - ├── Learning Rate: 1.62e-05 2025-08-30 09:54:52 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 09:55:05 - pico-train - INFO - Step 64550 -- 🔄 Training Metrics 2025-08-30 09:55:05 - pico-train - INFO - ├── Loss: 5.7797 2025-08-30 09:55:05 - pico-train - INFO - ├── Learning Rate: 1.62e-05 2025-08-30 09:55:05 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 09:55:18 - pico-train - INFO - Step 64575 -- 🔄 Training Metrics 2025-08-30 09:55:18 - pico-train - INFO - ├── Loss: 5.7777 2025-08-30 09:55:18 - pico-train - INFO - ├── Learning Rate: 1.62e-05 2025-08-30 09:55:18 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 09:55:30 - pico-train - INFO - Step 64600 -- 🔄 Training Metrics 2025-08-30 09:55:30 - pico-train - INFO - ├── Loss: 5.8649 2025-08-30 09:55:30 - pico-train - INFO - ├── Learning Rate: 1.61e-05 2025-08-30 09:55:30 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 09:55:43 - pico-train - INFO - Step 64625 -- 🔄 Training Metrics 2025-08-30 09:55:43 - pico-train - INFO - ├── Loss: 5.8215 2025-08-30 
09:55:43 - pico-train - INFO - ├── Learning Rate: 1.61e-05 2025-08-30 09:55:43 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 09:55:56 - pico-train - INFO - Step 64650 -- 🔄 Training Metrics 2025-08-30 09:55:56 - pico-train - INFO - ├── Loss: 5.8024 2025-08-30 09:55:56 - pico-train - INFO - ├── Learning Rate: 1.61e-05 2025-08-30 09:55:56 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 09:56:08 - pico-train - INFO - Step 64675 -- 🔄 Training Metrics 2025-08-30 09:56:08 - pico-train - INFO - ├── Loss: 5.8857 2025-08-30 09:56:08 - pico-train - INFO - ├── Learning Rate: 1.61e-05 2025-08-30 09:56:08 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 09:56:21 - pico-train - INFO - Step 64700 -- 🔄 Training Metrics 2025-08-30 09:56:21 - pico-train - INFO - ├── Loss: 5.7671 2025-08-30 09:56:21 - pico-train - INFO - ├── Learning Rate: 1.61e-05 2025-08-30 09:56:21 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 09:56:34 - pico-train - INFO - Step 64725 -- 🔄 Training Metrics 2025-08-30 09:56:34 - pico-train - INFO - ├── Loss: 5.8027 2025-08-30 09:56:34 - pico-train - INFO - ├── Learning Rate: 1.60e-05 2025-08-30 09:56:34 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 09:56:47 - pico-train - INFO - Step 64750 -- 🔄 Training Metrics 2025-08-30 09:56:47 - pico-train - INFO - ├── Loss: 5.8995 2025-08-30 09:56:47 - pico-train - INFO - ├── Learning Rate: 1.60e-05 2025-08-30 09:56:47 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 09:56:59 - pico-train - INFO - Step 64775 -- 🔄 Training Metrics 2025-08-30 09:56:59 - pico-train - INFO - ├── Loss: 5.7634 2025-08-30 09:56:59 - pico-train - INFO - ├── Learning Rate: 1.60e-05 2025-08-30 09:56:59 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 09:57:12 - pico-train - INFO - Step 64800 -- 🔄 Training Metrics 2025-08-30 09:57:12 - pico-train - INFO - ├── Loss: 5.8010 2025-08-30 09:57:12 - pico-train - INFO - ├── Learning Rate: 1.60e-05 2025-08-30 09:57:12 - pico-train - INFO - └── Inf/NaN count: 0 
2025-08-30 09:57:25 - pico-train - INFO - Step 64825 -- 🔄 Training Metrics 2025-08-30 09:57:25 - pico-train - INFO - ├── Loss: 5.7916 2025-08-30 09:57:25 - pico-train - INFO - ├── Learning Rate: 1.60e-05 2025-08-30 09:57:25 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 09:57:38 - pico-train - INFO - Step 64850 -- 🔄 Training Metrics 2025-08-30 09:57:38 - pico-train - INFO - ├── Loss: 5.7833 2025-08-30 09:57:38 - pico-train - INFO - ├── Learning Rate: 1.59e-05 2025-08-30 09:57:38 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 09:57:50 - pico-train - INFO - Step 64875 -- 🔄 Training Metrics 2025-08-30 09:57:50 - pico-train - INFO - ├── Loss: 5.8170 2025-08-30 09:57:50 - pico-train - INFO - ├── Learning Rate: 1.59e-05 2025-08-30 09:57:50 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 09:58:03 - pico-train - INFO - Step 64900 -- 🔄 Training Metrics 2025-08-30 09:58:03 - pico-train - INFO - ├── Loss: 5.8529 2025-08-30 09:58:03 - pico-train - INFO - ├── Learning Rate: 1.59e-05 2025-08-30 09:58:03 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 09:58:16 - pico-train - INFO - Step 64925 -- 🔄 Training Metrics 2025-08-30 09:58:16 - pico-train - INFO - ├── Loss: 5.8294 2025-08-30 09:58:16 - pico-train - INFO - ├── Learning Rate: 1.59e-05 2025-08-30 09:58:16 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 09:58:28 - pico-train - INFO - Step 64950 -- 🔄 Training Metrics 2025-08-30 09:58:28 - pico-train - INFO - ├── Loss: 5.8264 2025-08-30 09:58:28 - pico-train - INFO - ├── Learning Rate: 1.59e-05 2025-08-30 09:58:28 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 09:58:41 - pico-train - INFO - Step 64975 -- 🔄 Training Metrics 2025-08-30 09:58:41 - pico-train - INFO - ├── Loss: 5.7959 2025-08-30 09:58:41 - pico-train - INFO - ├── Learning Rate: 1.58e-05 2025-08-30 09:58:41 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 09:58:54 - pico-train - INFO - Step 65000 -- 💾 Saving Checkpoint 2025-08-30 10:00:52 - pico-train - INFO - Step 65000 
-- 📊 Evaluation Results 2025-08-30 10:00:52 - pico-train - INFO - └── paloma: 1.9165716287373382e+31 2025-08-30 10:00:55 - pico-train - INFO - Step 65000 -- 🔄 Training Metrics 2025-08-30 10:00:55 - pico-train - INFO - ├── Loss: 5.8632 2025-08-30 10:00:55 - pico-train - INFO - ├── Learning Rate: 1.58e-05 2025-08-30 10:00:55 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 10:00:55 - pico-train - INFO - Step 65000 -- 📈 Saving Learning Dynamics 2025-08-30 10:01:10 - pico-train - INFO - Step 65025 -- 🔄 Training Metrics 2025-08-30 10:01:10 - pico-train - INFO - ├── Loss: 5.8177 2025-08-30 10:01:10 - pico-train - INFO - ├── Learning Rate: 1.58e-05 2025-08-30 10:01:10 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 10:01:23 - pico-train - INFO - Step 65050 -- 🔄 Training Metrics 2025-08-30 10:01:23 - pico-train - INFO - ├── Loss: 5.7954 2025-08-30 10:01:23 - pico-train - INFO - ├── Learning Rate: 1.58e-05 2025-08-30 10:01:23 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 10:01:35 - pico-train - INFO - Step 65075 -- 🔄 Training Metrics 2025-08-30 10:01:35 - pico-train - INFO - ├── Loss: 5.7900 2025-08-30 10:01:35 - pico-train - INFO - ├── Learning Rate: 1.58e-05 2025-08-30 10:01:35 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 10:01:48 - pico-train - INFO - Step 65100 -- 🔄 Training Metrics 2025-08-30 10:01:48 - pico-train - INFO - ├── Loss: 5.8748 2025-08-30 10:01:48 - pico-train - INFO - ├── Learning Rate: 1.57e-05 2025-08-30 10:01:48 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 10:02:00 - pico-train - INFO - Step 65125 -- 🔄 Training Metrics 2025-08-30 10:02:00 - pico-train - INFO - ├── Loss: 5.8848 2025-08-30 10:02:00 - pico-train - INFO - ├── Learning Rate: 1.57e-05 2025-08-30 10:02:00 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 10:02:13 - pico-train - INFO - Step 65150 -- 🔄 Training Metrics 2025-08-30 10:02:13 - pico-train - INFO - ├── Loss: 5.8230 2025-08-30 10:02:13 - pico-train - INFO - ├── Learning Rate: 1.57e-05 
2025-08-30 10:02:13 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 10:02:26 - pico-train - INFO - Step 65175 -- 🔄 Training Metrics 2025-08-30 10:02:26 - pico-train - INFO - ├── Loss: 5.8187 2025-08-30 10:02:26 - pico-train - INFO - ├── Learning Rate: 1.57e-05 2025-08-30 10:02:26 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 10:02:38 - pico-train - INFO - Step 65200 -- 🔄 Training Metrics 2025-08-30 10:02:38 - pico-train - INFO - ├── Loss: 5.7594 2025-08-30 10:02:38 - pico-train - INFO - ├── Learning Rate: 1.57e-05 2025-08-30 10:02:38 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 10:02:51 - pico-train - INFO - Step 65225 -- 🔄 Training Metrics 2025-08-30 10:02:51 - pico-train - INFO - ├── Loss: 5.8269 2025-08-30 10:02:51 - pico-train - INFO - ├── Learning Rate: 1.57e-05 2025-08-30 10:02:51 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 10:03:03 - pico-train - INFO - Step 65250 -- 🔄 Training Metrics 2025-08-30 10:03:03 - pico-train - INFO - ├── Loss: 5.8085 2025-08-30 10:03:03 - pico-train - INFO - ├── Learning Rate: 1.56e-05 2025-08-30 10:03:03 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 10:03:16 - pico-train - INFO - Step 65275 -- 🔄 Training Metrics 2025-08-30 10:03:16 - pico-train - INFO - ├── Loss: 5.7563 2025-08-30 10:03:16 - pico-train - INFO - ├── Learning Rate: 1.56e-05 2025-08-30 10:03:16 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 10:03:29 - pico-train - INFO - Step 65300 -- 🔄 Training Metrics 2025-08-30 10:03:29 - pico-train - INFO - ├── Loss: 5.8133 2025-08-30 10:03:29 - pico-train - INFO - ├── Learning Rate: 1.56e-05 2025-08-30 10:03:29 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 10:03:41 - pico-train - INFO - Step 65325 -- 🔄 Training Metrics 2025-08-30 10:03:41 - pico-train - INFO - ├── Loss: 5.8193 2025-08-30 10:03:41 - pico-train - INFO - ├── Learning Rate: 1.56e-05 2025-08-30 10:03:41 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 10:03:54 - pico-train - INFO - Step 65350 -- 🔄 Training 
Metrics 2025-08-30 10:03:54 - pico-train - INFO - ├── Loss: 5.8060 2025-08-30 10:03:54 - pico-train - INFO - ├── Learning Rate: 1.56e-05 2025-08-30 10:03:54 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 10:04:06 - pico-train - INFO - Step 65375 -- 🔄 Training Metrics 2025-08-30 10:04:06 - pico-train - INFO - ├── Loss: 5.8249 2025-08-30 10:04:06 - pico-train - INFO - ├── Learning Rate: 1.55e-05 2025-08-30 10:04:06 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 10:04:19 - pico-train - INFO - Step 65400 -- 🔄 Training Metrics 2025-08-30 10:04:19 - pico-train - INFO - ├── Loss: 5.8455 2025-08-30 10:04:19 - pico-train - INFO - ├── Learning Rate: 1.55e-05 2025-08-30 10:04:19 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 10:04:32 - pico-train - INFO - Step 65425 -- 🔄 Training Metrics 2025-08-30 10:04:32 - pico-train - INFO - ├── Loss: 5.8625 2025-08-30 10:04:32 - pico-train - INFO - ├── Learning Rate: 1.55e-05 2025-08-30 10:04:32 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 10:04:44 - pico-train - INFO - Step 65450 -- 🔄 Training Metrics 2025-08-30 10:04:44 - pico-train - INFO - ├── Loss: 5.8366 2025-08-30 10:04:44 - pico-train - INFO - ├── Learning Rate: 1.55e-05 2025-08-30 10:04:44 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 10:04:57 - pico-train - INFO - Step 65475 -- 🔄 Training Metrics 2025-08-30 10:04:57 - pico-train - INFO - ├── Loss: 5.8005 2025-08-30 10:04:57 - pico-train - INFO - ├── Learning Rate: 1.55e-05 2025-08-30 10:04:57 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 10:05:09 - pico-train - INFO - Step 65500 -- 💾 Saving Checkpoint 2025-08-30 10:07:04 - pico-train - INFO - Step 65500 -- 📊 Evaluation Results 2025-08-30 10:07:04 - pico-train - INFO - └── paloma: 1.8707850216569984e+31 2025-08-30 10:07:06 - pico-train - INFO - Step 65500 -- 🔄 Training Metrics 2025-08-30 10:07:06 - pico-train - INFO - ├── Loss: 5.8969 2025-08-30 10:07:06 - pico-train - INFO - ├── Learning Rate: 1.54e-05 2025-08-30 10:07:06 - 
pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 10:07:06 - pico-train - INFO - Step 65500 -- 📈 Saving Learning Dynamics 2025-08-30 10:07:21 - pico-train - INFO - Step 65525 -- 🔄 Training Metrics 2025-08-30 10:07:21 - pico-train - INFO - ├── Loss: 5.8361 2025-08-30 10:07:21 - pico-train - INFO - ├── Learning Rate: 1.54e-05 2025-08-30 10:07:21 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 10:07:33 - pico-train - INFO - Step 65550 -- 🔄 Training Metrics 2025-08-30 10:07:33 - pico-train - INFO - ├── Loss: 5.8304 2025-08-30 10:07:33 - pico-train - INFO - ├── Learning Rate: 1.54e-05 2025-08-30 10:07:33 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 10:07:46 - pico-train - INFO - Step 65575 -- 🔄 Training Metrics 2025-08-30 10:07:46 - pico-train - INFO - ├── Loss: 5.8668 2025-08-30 10:07:46 - pico-train - INFO - ├── Learning Rate: 1.54e-05 2025-08-30 10:07:46 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 10:07:59 - pico-train - INFO - Step 65600 -- 🔄 Training Metrics 2025-08-30 10:07:59 - pico-train - INFO - ├── Loss: 5.8797 2025-08-30 10:07:59 - pico-train - INFO - ├── Learning Rate: 1.54e-05 2025-08-30 10:07:59 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 10:08:12 - pico-train - INFO - Step 65625 -- 🔄 Training Metrics 2025-08-30 10:08:12 - pico-train - INFO - ├── Loss: 5.8747 2025-08-30 10:08:12 - pico-train - INFO - ├── Learning Rate: 1.53e-05 2025-08-30 10:08:12 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 10:08:25 - pico-train - INFO - Step 65650 -- 🔄 Training Metrics 2025-08-30 10:08:25 - pico-train - INFO - ├── Loss: 5.8350 2025-08-30 10:08:25 - pico-train - INFO - ├── Learning Rate: 1.53e-05 2025-08-30 10:08:25 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 10:08:37 - pico-train - INFO - Step 65675 -- 🔄 Training Metrics 2025-08-30 10:08:37 - pico-train - INFO - ├── Loss: 5.8606 2025-08-30 10:08:37 - pico-train - INFO - ├── Learning Rate: 1.53e-05 2025-08-30 10:08:37 - pico-train - INFO - └── Inf/NaN count: 0 
2025-08-30 10:08:50 - pico-train - INFO - Step 65700 -- 🔄 Training Metrics 2025-08-30 10:08:50 - pico-train - INFO - ├── Loss: 5.8106 2025-08-30 10:08:50 - pico-train - INFO - ├── Learning Rate: 1.53e-05 2025-08-30 10:08:50 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 10:09:02 - pico-train - INFO - Step 65725 -- 🔄 Training Metrics 2025-08-30 10:09:02 - pico-train - INFO - ├── Loss: 5.9222 2025-08-30 10:09:02 - pico-train - INFO - ├── Learning Rate: 1.53e-05 2025-08-30 10:09:02 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 10:09:15 - pico-train - INFO - Step 65750 -- 🔄 Training Metrics 2025-08-30 10:09:15 - pico-train - INFO - ├── Loss: 5.8246 2025-08-30 10:09:15 - pico-train - INFO - ├── Learning Rate: 1.52e-05 2025-08-30 10:09:15 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 10:09:27 - pico-train - INFO - Step 65775 -- 🔄 Training Metrics 2025-08-30 10:09:27 - pico-train - INFO - ├── Loss: 5.8507 2025-08-30 10:09:27 - pico-train - INFO - ├── Learning Rate: 1.52e-05 2025-08-30 10:09:27 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 10:09:40 - pico-train - INFO - Step 65800 -- 🔄 Training Metrics 2025-08-30 10:09:40 - pico-train - INFO - ├── Loss: 5.8379 2025-08-30 10:09:40 - pico-train - INFO - ├── Learning Rate: 1.52e-05 2025-08-30 10:09:40 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 10:09:53 - pico-train - INFO - Step 65825 -- 🔄 Training Metrics 2025-08-30 10:09:53 - pico-train - INFO - ├── Loss: 5.8610 2025-08-30 10:09:53 - pico-train - INFO - ├── Learning Rate: 1.52e-05 2025-08-30 10:09:53 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 10:10:05 - pico-train - INFO - Step 65850 -- 🔄 Training Metrics 2025-08-30 10:10:05 - pico-train - INFO - ├── Loss: 5.8496 2025-08-30 10:10:05 - pico-train - INFO - ├── Learning Rate: 1.52e-05 2025-08-30 10:10:05 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 10:10:18 - pico-train - INFO - Step 65875 -- 🔄 Training Metrics 2025-08-30 10:10:18 - pico-train - INFO - ├── Loss: 
5.8066 2025-08-30 10:10:18 - pico-train - INFO - ├── Learning Rate: 1.51e-05 2025-08-30 10:10:18 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 10:10:30 - pico-train - INFO - Step 65900 -- 🔄 Training Metrics 2025-08-30 10:10:30 - pico-train - INFO - ├── Loss: 5.8117 2025-08-30 10:10:30 - pico-train - INFO - ├── Learning Rate: 1.51e-05 2025-08-30 10:10:30 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 10:10:43 - pico-train - INFO - Step 65925 -- 🔄 Training Metrics 2025-08-30 10:10:43 - pico-train - INFO - ├── Loss: 5.7019 2025-08-30 10:10:43 - pico-train - INFO - ├── Learning Rate: 1.51e-05 2025-08-30 10:10:43 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 10:10:56 - pico-train - INFO - Step 65950 -- 🔄 Training Metrics 2025-08-30 10:10:56 - pico-train - INFO - ├── Loss: 5.8699 2025-08-30 10:10:56 - pico-train - INFO - ├── Learning Rate: 1.51e-05 2025-08-30 10:10:56 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 10:11:08 - pico-train - INFO - Step 65975 -- 🔄 Training Metrics 2025-08-30 10:11:08 - pico-train - INFO - ├── Loss: 5.8359 2025-08-30 10:11:08 - pico-train - INFO - ├── Learning Rate: 1.51e-05 2025-08-30 10:11:08 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 10:11:20 - pico-train - INFO - Step 66000 -- 💾 Saving Checkpoint 2025-08-30 10:13:16 - pico-train - INFO - Step 66000 -- 📊 Evaluation Results 2025-08-30 10:13:16 - pico-train - INFO - └── paloma: 2.5231045927678714e+31 2025-08-30 10:13:18 - pico-train - INFO - Step 66000 -- 🔄 Training Metrics 2025-08-30 10:13:18 - pico-train - INFO - ├── Loss: 5.8326 2025-08-30 10:13:18 - pico-train - INFO - ├── Learning Rate: 1.50e-05 2025-08-30 10:13:18 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 10:13:18 - pico-train - INFO - Step 66000 -- 📈 Saving Learning Dynamics 2025-08-30 10:13:33 - pico-train - INFO - Step 66025 -- 🔄 Training Metrics 2025-08-30 10:13:33 - pico-train - INFO - ├── Loss: 5.7993 2025-08-30 10:13:33 - pico-train - INFO - ├── Learning Rate: 1.50e-05 
2025-08-30 10:13:33 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 10:13:45 - pico-train - INFO - Step 66050 -- 🔄 Training Metrics 2025-08-30 10:13:45 - pico-train - INFO - ├── Loss: 5.7906 2025-08-30 10:13:45 - pico-train - INFO - ├── Learning Rate: 1.50e-05 2025-08-30 10:13:45 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 10:13:58 - pico-train - INFO - Step 66075 -- 🔄 Training Metrics 2025-08-30 10:13:58 - pico-train - INFO - ├── Loss: 5.8668 2025-08-30 10:13:58 - pico-train - INFO - ├── Learning Rate: 1.50e-05 2025-08-30 10:13:58 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 10:14:11 - pico-train - INFO - Step 66100 -- 🔄 Training Metrics 2025-08-30 10:14:11 - pico-train - INFO - ├── Loss: 5.7929 2025-08-30 10:14:11 - pico-train - INFO - ├── Learning Rate: 1.50e-05 2025-08-30 10:14:11 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 10:14:24 - pico-train - INFO - Step 66125 -- 🔄 Training Metrics 2025-08-30 10:14:24 - pico-train - INFO - ├── Loss: 5.8483 2025-08-30 10:14:24 - pico-train - INFO - ├── Learning Rate: 1.49e-05 2025-08-30 10:14:24 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 10:14:37 - pico-train - INFO - Step 66150 -- 🔄 Training Metrics 2025-08-30 10:14:37 - pico-train - INFO - ├── Loss: 5.8747 2025-08-30 10:14:37 - pico-train - INFO - ├── Learning Rate: 1.49e-05 2025-08-30 10:14:37 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 10:14:49 - pico-train - INFO - Step 66175 -- 🔄 Training Metrics 2025-08-30 10:14:49 - pico-train - INFO - ├── Loss: 5.7636 2025-08-30 10:14:49 - pico-train - INFO - ├── Learning Rate: 1.49e-05 2025-08-30 10:14:49 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 10:15:02 - pico-train - INFO - Step 66200 -- 🔄 Training Metrics 2025-08-30 10:15:02 - pico-train - INFO - ├── Loss: 5.6910 2025-08-30 10:15:02 - pico-train - INFO - ├── Learning Rate: 1.49e-05 2025-08-30 10:15:02 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 10:15:14 - pico-train - INFO - Step 66225 -- 🔄 Training 
Metrics 2025-08-30 10:15:14 - pico-train - INFO - ├── Loss: 5.7696 2025-08-30 10:15:14 - pico-train - INFO - ├── Learning Rate: 1.49e-05 2025-08-30 10:15:14 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 10:15:27 - pico-train - INFO - Step 66250 -- 🔄 Training Metrics 2025-08-30 10:15:27 - pico-train - INFO - ├── Loss: 5.8958 2025-08-30 10:15:27 - pico-train - INFO - ├── Learning Rate: 1.48e-05 2025-08-30 10:15:27 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 10:15:40 - pico-train - INFO - Step 66275 -- 🔄 Training Metrics 2025-08-30 10:15:40 - pico-train - INFO - ├── Loss: 5.8720 2025-08-30 10:15:40 - pico-train - INFO - ├── Learning Rate: 1.48e-05 2025-08-30 10:15:40 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 10:15:53 - pico-train - INFO - Step 66300 -- 🔄 Training Metrics 2025-08-30 10:15:53 - pico-train - INFO - ├── Loss: 5.7927 2025-08-30 10:15:53 - pico-train - INFO - ├── Learning Rate: 1.48e-05 2025-08-30 10:15:53 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 10:16:05 - pico-train - INFO - Step 66325 -- 🔄 Training Metrics 2025-08-30 10:16:05 - pico-train - INFO - ├── Loss: 5.7417 2025-08-30 10:16:05 - pico-train - INFO - ├── Learning Rate: 1.48e-05 2025-08-30 10:16:05 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 10:16:18 - pico-train - INFO - Step 66350 -- 🔄 Training Metrics 2025-08-30 10:16:18 - pico-train - INFO - ├── Loss: 5.7908 2025-08-30 10:16:18 - pico-train - INFO - ├── Learning Rate: 1.48e-05 2025-08-30 10:16:18 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 10:16:31 - pico-train - INFO - Step 66375 -- 🔄 Training Metrics 2025-08-30 10:16:31 - pico-train - INFO - ├── Loss: 5.8609 2025-08-30 10:16:31 - pico-train - INFO - ├── Learning Rate: 1.47e-05 2025-08-30 10:16:31 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 10:16:43 - pico-train - INFO - Step 66400 -- 🔄 Training Metrics 2025-08-30 10:16:43 - pico-train - INFO - ├── Loss: 5.7846 2025-08-30 10:16:43 - pico-train - INFO - ├── Learning Rate: 
1.47e-05 2025-08-30 10:16:43 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 10:16:56 - pico-train - INFO - Step 66425 -- 🔄 Training Metrics 2025-08-30 10:16:56 - pico-train - INFO - ├── Loss: 5.7744 2025-08-30 10:16:56 - pico-train - INFO - ├── Learning Rate: 1.47e-05 2025-08-30 10:16:56 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 10:17:09 - pico-train - INFO - Step 66450 -- 🔄 Training Metrics 2025-08-30 10:17:09 - pico-train - INFO - ├── Loss: 5.7639 2025-08-30 10:17:09 - pico-train - INFO - ├── Learning Rate: 1.47e-05 2025-08-30 10:17:09 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 10:17:22 - pico-train - INFO - Step 66475 -- 🔄 Training Metrics 2025-08-30 10:17:22 - pico-train - INFO - ├── Loss: 5.8572 2025-08-30 10:17:22 - pico-train - INFO - ├── Learning Rate: 1.47e-05 2025-08-30 10:17:22 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 10:17:34 - pico-train - INFO - Step 66500 -- 💾 Saving Checkpoint 2025-08-30 10:19:32 - pico-train - INFO - Step 66500 -- 📊 Evaluation Results 2025-08-30 10:19:32 - pico-train - INFO - └── paloma: 2.557649624835569e+31 2025-08-30 10:19:34 - pico-train - INFO - Step 66500 -- 🔄 Training Metrics 2025-08-30 10:19:34 - pico-train - INFO - ├── Loss: 5.7731 2025-08-30 10:19:34 - pico-train - INFO - ├── Learning Rate: 1.46e-05 2025-08-30 10:19:34 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 10:19:34 - pico-train - INFO - Step 66500 -- 📈 Saving Learning Dynamics 2025-08-30 10:19:49 - pico-train - INFO - Step 66525 -- 🔄 Training Metrics 2025-08-30 10:19:49 - pico-train - INFO - ├── Loss: 5.8698 2025-08-30 10:19:49 - pico-train - INFO - ├── Learning Rate: 1.46e-05 2025-08-30 10:19:49 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 10:20:01 - pico-train - INFO - Step 66550 -- 🔄 Training Metrics 2025-08-30 10:20:01 - pico-train - INFO - ├── Loss: 5.7763 2025-08-30 10:20:01 - pico-train - INFO - ├── Learning Rate: 1.46e-05 2025-08-30 10:20:01 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 
10:20:14 - pico-train - INFO - Step 66575 -- 🔄 Training Metrics 2025-08-30 10:20:14 - pico-train - INFO - ├── Loss: 5.7793 2025-08-30 10:20:14 - pico-train - INFO - ├── Learning Rate: 1.46e-05 2025-08-30 10:20:14 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 10:20:27 - pico-train - INFO - Step 66600 -- 🔄 Training Metrics 2025-08-30 10:20:27 - pico-train - INFO - ├── Loss: 5.8998 2025-08-30 10:20:27 - pico-train - INFO - ├── Learning Rate: 1.46e-05 2025-08-30 10:20:27 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 10:20:40 - pico-train - INFO - Step 66625 -- 🔄 Training Metrics 2025-08-30 10:20:40 - pico-train - INFO - ├── Loss: 5.8772 2025-08-30 10:20:40 - pico-train - INFO - ├── Learning Rate: 1.46e-05 2025-08-30 10:20:40 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 10:20:53 - pico-train - INFO - Step 66650 -- 🔄 Training Metrics 2025-08-30 10:20:53 - pico-train - INFO - ├── Loss: 5.7580 2025-08-30 10:20:53 - pico-train - INFO - ├── Learning Rate: 1.45e-05 2025-08-30 10:20:53 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 10:21:05 - pico-train - INFO - Step 66675 -- 🔄 Training Metrics 2025-08-30 10:21:05 - pico-train - INFO - ├── Loss: 5.8102 2025-08-30 10:21:05 - pico-train - INFO - ├── Learning Rate: 1.45e-05 2025-08-30 10:21:05 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 10:21:18 - pico-train - INFO - Step 66700 -- 🔄 Training Metrics 2025-08-30 10:21:18 - pico-train - INFO - ├── Loss: 5.8321 2025-08-30 10:21:18 - pico-train - INFO - ├── Learning Rate: 1.45e-05 2025-08-30 10:21:18 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 10:21:30 - pico-train - INFO - Step 66725 -- 🔄 Training Metrics 2025-08-30 10:21:30 - pico-train - INFO - ├── Loss: 5.7792 2025-08-30 10:21:30 - pico-train - INFO - ├── Learning Rate: 1.45e-05 2025-08-30 10:21:30 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 10:21:43 - pico-train - INFO - Step 66750 -- 🔄 Training Metrics 2025-08-30 10:21:43 - pico-train - INFO - ├── Loss: 5.7811 
2025-08-30 10:21:43 - pico-train - INFO - ├── Learning Rate: 1.45e-05 2025-08-30 10:21:43 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 10:21:56 - pico-train - INFO - Step 66775 -- 🔄 Training Metrics 2025-08-30 10:21:56 - pico-train - INFO - ├── Loss: 5.7789 2025-08-30 10:21:56 - pico-train - INFO - ├── Learning Rate: 1.44e-05 2025-08-30 10:21:56 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 10:22:08 - pico-train - INFO - Step 66800 -- 🔄 Training Metrics 2025-08-30 10:22:08 - pico-train - INFO - ├── Loss: 5.7714 2025-08-30 10:22:08 - pico-train - INFO - ├── Learning Rate: 1.44e-05 2025-08-30 10:22:08 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 10:22:21 - pico-train - INFO - Step 66825 -- 🔄 Training Metrics 2025-08-30 10:22:21 - pico-train - INFO - ├── Loss: 5.8399 2025-08-30 10:22:21 - pico-train - INFO - ├── Learning Rate: 1.44e-05 2025-08-30 10:22:21 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 10:22:34 - pico-train - INFO - Step 66850 -- 🔄 Training Metrics 2025-08-30 10:22:34 - pico-train - INFO - ├── Loss: 5.7693 2025-08-30 10:22:34 - pico-train - INFO - ├── Learning Rate: 1.44e-05 2025-08-30 10:22:34 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 10:22:46 - pico-train - INFO - Step 66875 -- 🔄 Training Metrics 2025-08-30 10:22:46 - pico-train - INFO - ├── Loss: 5.8165 2025-08-30 10:22:46 - pico-train - INFO - ├── Learning Rate: 1.44e-05 2025-08-30 10:22:46 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 10:22:59 - pico-train - INFO - Step 66900 -- 🔄 Training Metrics 2025-08-30 10:22:59 - pico-train - INFO - ├── Loss: 5.7763 2025-08-30 10:22:59 - pico-train - INFO - ├── Learning Rate: 1.43e-05 2025-08-30 10:22:59 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 10:23:11 - pico-train - INFO - Step 66925 -- 🔄 Training Metrics 2025-08-30 10:23:11 - pico-train - INFO - ├── Loss: 5.8683 2025-08-30 10:23:11 - pico-train - INFO - ├── Learning Rate: 1.43e-05 2025-08-30 10:23:11 - pico-train - INFO - └── Inf/NaN count: 
0 2025-08-30 10:23:24 - pico-train - INFO - Step 66950 -- 🔄 Training Metrics 2025-08-30 10:23:24 - pico-train - INFO - ├── Loss: 5.8662 2025-08-30 10:23:24 - pico-train - INFO - ├── Learning Rate: 1.43e-05 2025-08-30 10:23:24 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 10:23:37 - pico-train - INFO - Step 66975 -- 🔄 Training Metrics 2025-08-30 10:23:37 - pico-train - INFO - ├── Loss: 5.8864 2025-08-30 10:23:37 - pico-train - INFO - ├── Learning Rate: 1.43e-05 2025-08-30 10:23:37 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 10:23:49 - pico-train - INFO - Step 67000 -- 💾 Saving Checkpoint 2025-08-30 10:25:56 - pico-train - INFO - Step 67000 -- 📊 Evaluation Results 2025-08-30 10:25:56 - pico-train - INFO - └── paloma: 2.6865032383433974e+31 2025-08-30 10:25:58 - pico-train - INFO - Step 67000 -- 🔄 Training Metrics 2025-08-30 10:25:58 - pico-train - INFO - ├── Loss: 5.7555 2025-08-30 10:25:58 - pico-train - INFO - ├── Learning Rate: 1.43e-05 2025-08-30 10:25:58 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 10:25:58 - pico-train - INFO - Step 67000 -- 📈 Saving Learning Dynamics 2025-08-30 10:26:12 - pico-train - INFO - Step 67025 -- 🔄 Training Metrics 2025-08-30 10:26:12 - pico-train - INFO - ├── Loss: 5.8167 2025-08-30 10:26:12 - pico-train - INFO - ├── Learning Rate: 1.42e-05 2025-08-30 10:26:12 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 10:26:25 - pico-train - INFO - Step 67050 -- 🔄 Training Metrics 2025-08-30 10:26:25 - pico-train - INFO - ├── Loss: 5.8101 2025-08-30 10:26:25 - pico-train - INFO - ├── Learning Rate: 1.42e-05 2025-08-30 10:26:25 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 10:26:38 - pico-train - INFO - Step 67075 -- 🔄 Training Metrics 2025-08-30 10:26:38 - pico-train - INFO - ├── Loss: 5.8146 2025-08-30 10:26:38 - pico-train - INFO - ├── Learning Rate: 1.42e-05 2025-08-30 10:26:38 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 10:26:51 - pico-train - INFO - Step 67100 -- 🔄 Training Metrics 
2025-08-30 10:26:51 - pico-train - INFO - ├── Loss: 5.9005 2025-08-30 10:26:51 - pico-train - INFO - ├── Learning Rate: 1.42e-05 2025-08-30 10:26:51 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 10:27:04 - pico-train - INFO - Step 67125 -- 🔄 Training Metrics 2025-08-30 10:27:04 - pico-train - INFO - ├── Loss: 5.7768 2025-08-30 10:27:04 - pico-train - INFO - ├── Learning Rate: 1.42e-05 2025-08-30 10:27:04 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 10:27:16 - pico-train - INFO - Step 67150 -- 🔄 Training Metrics 2025-08-30 10:27:16 - pico-train - INFO - ├── Loss: 5.7152 2025-08-30 10:27:16 - pico-train - INFO - ├── Learning Rate: 1.41e-05 2025-08-30 10:27:16 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 10:27:29 - pico-train - INFO - Step 67175 -- 🔄 Training Metrics 2025-08-30 10:27:29 - pico-train - INFO - ├── Loss: 5.8443 2025-08-30 10:27:29 - pico-train - INFO - ├── Learning Rate: 1.41e-05 2025-08-30 10:27:29 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 10:27:41 - pico-train - INFO - Step 67200 -- 🔄 Training Metrics 2025-08-30 10:27:41 - pico-train - INFO - ├── Loss: 5.7907 2025-08-30 10:27:41 - pico-train - INFO - ├── Learning Rate: 1.41e-05 2025-08-30 10:27:41 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 10:27:54 - pico-train - INFO - Step 67225 -- 🔄 Training Metrics 2025-08-30 10:27:54 - pico-train - INFO - ├── Loss: 5.8160 2025-08-30 10:27:54 - pico-train - INFO - ├── Learning Rate: 1.41e-05 2025-08-30 10:27:54 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 10:28:06 - pico-train - INFO - Step 67250 -- 🔄 Training Metrics 2025-08-30 10:28:06 - pico-train - INFO - ├── Loss: 5.8334 2025-08-30 10:28:06 - pico-train - INFO - ├── Learning Rate: 1.41e-05 2025-08-30 10:28:06 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 10:28:19 - pico-train - INFO - Step 67275 -- 🔄 Training Metrics 2025-08-30 10:28:19 - pico-train - INFO - ├── Loss: 5.8201 2025-08-30 10:28:19 - pico-train - INFO - ├── Learning Rate: 1.41e-05 
2025-08-30 10:28:19 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 10:28:32 - pico-train - INFO - Step 67300 -- 🔄 Training Metrics 2025-08-30 10:28:32 - pico-train - INFO - ├── Loss: 5.8962 2025-08-30 10:28:32 - pico-train - INFO - ├── Learning Rate: 1.40e-05 2025-08-30 10:28:32 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 10:28:44 - pico-train - INFO - Step 67325 -- 🔄 Training Metrics 2025-08-30 10:28:44 - pico-train - INFO - ├── Loss: 5.7876 2025-08-30 10:28:44 - pico-train - INFO - ├── Learning Rate: 1.40e-05 2025-08-30 10:28:44 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 10:28:57 - pico-train - INFO - Step 67350 -- 🔄 Training Metrics 2025-08-30 10:28:57 - pico-train - INFO - ├── Loss: 5.8093 2025-08-30 10:28:57 - pico-train - INFO - ├── Learning Rate: 1.40e-05 2025-08-30 10:28:57 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 10:29:09 - pico-train - INFO - Step 67375 -- 🔄 Training Metrics 2025-08-30 10:29:09 - pico-train - INFO - ├── Loss: 5.7282 2025-08-30 10:29:09 - pico-train - INFO - ├── Learning Rate: 1.40e-05 2025-08-30 10:29:09 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 10:29:22 - pico-train - INFO - Step 67400 -- 🔄 Training Metrics 2025-08-30 10:29:22 - pico-train - INFO - ├── Loss: 5.7584 2025-08-30 10:29:22 - pico-train - INFO - ├── Learning Rate: 1.40e-05 2025-08-30 10:29:22 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 10:29:34 - pico-train - INFO - Step 67425 -- 🔄 Training Metrics 2025-08-30 10:29:34 - pico-train - INFO - ├── Loss: 5.7801 2025-08-30 10:29:34 - pico-train - INFO - ├── Learning Rate: 1.39e-05 2025-08-30 10:29:34 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 10:29:47 - pico-train - INFO - Step 67450 -- 🔄 Training Metrics 2025-08-30 10:29:47 - pico-train - INFO - ├── Loss: 5.7262 2025-08-30 10:29:47 - pico-train - INFO - ├── Learning Rate: 1.39e-05 2025-08-30 10:29:47 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 10:30:00 - pico-train - INFO - Step 67475 -- 🔄 Training 
Metrics 2025-08-30 10:30:00 - pico-train - INFO - ├── Loss: 5.7496 2025-08-30 10:30:00 - pico-train - INFO - ├── Learning Rate: 1.39e-05 2025-08-30 10:30:00 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 10:30:12 - pico-train - INFO - Step 67500 -- 💾 Saving Checkpoint 2025-08-30 10:32:10 - pico-train - INFO - Step 67500 -- 📊 Evaluation Results 2025-08-30 10:32:10 - pico-train - INFO - └── paloma: 3.1065040652754565e+31 2025-08-30 10:32:13 - pico-train - INFO - Step 67500 -- 🔄 Training Metrics 2025-08-30 10:32:13 - pico-train - INFO - ├── Loss: 5.7965 2025-08-30 10:32:13 - pico-train - INFO - ├── Learning Rate: 1.39e-05 2025-08-30 10:32:13 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 10:32:13 - pico-train - INFO - Step 67500 -- 📈 Saving Learning Dynamics 2025-08-30 10:32:27 - pico-train - INFO - Step 67525 -- 🔄 Training Metrics 2025-08-30 10:32:27 - pico-train - INFO - ├── Loss: 5.8326 2025-08-30 10:32:27 - pico-train - INFO - ├── Learning Rate: 1.39e-05 2025-08-30 10:32:27 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 10:32:40 - pico-train - INFO - Step 67550 -- 🔄 Training Metrics 2025-08-30 10:32:40 - pico-train - INFO - ├── Loss: 5.8544 2025-08-30 10:32:40 - pico-train - INFO - ├── Learning Rate: 1.38e-05 2025-08-30 10:32:40 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 10:32:53 - pico-train - INFO - Step 67575 -- 🔄 Training Metrics 2025-08-30 10:32:53 - pico-train - INFO - ├── Loss: 5.8529 2025-08-30 10:32:53 - pico-train - INFO - ├── Learning Rate: 1.38e-05 2025-08-30 10:32:53 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 10:33:06 - pico-train - INFO - Step 67600 -- 🔄 Training Metrics 2025-08-30 10:33:06 - pico-train - INFO - ├── Loss: 5.7630 2025-08-30 10:33:06 - pico-train - INFO - ├── Learning Rate: 1.38e-05 2025-08-30 10:33:06 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 10:33:19 - pico-train - INFO - Step 67625 -- 🔄 Training Metrics 2025-08-30 10:33:19 - pico-train - INFO - ├── Loss: 5.8400 2025-08-30 
10:33:19 - pico-train - INFO - ├── Learning Rate: 1.38e-05 2025-08-30 10:33:19 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 10:33:31 - pico-train - INFO - Step 67650 -- 🔄 Training Metrics 2025-08-30 10:33:31 - pico-train - INFO - ├── Loss: 5.6921 2025-08-30 10:33:31 - pico-train - INFO - ├── Learning Rate: 1.38e-05 2025-08-30 10:33:31 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 10:33:44 - pico-train - INFO - Step 67675 -- 🔄 Training Metrics 2025-08-30 10:33:44 - pico-train - INFO - ├── Loss: 5.7714 2025-08-30 10:33:44 - pico-train - INFO - ├── Learning Rate: 1.37e-05 2025-08-30 10:33:44 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 10:33:57 - pico-train - INFO - Step 67700 -- 🔄 Training Metrics 2025-08-30 10:33:57 - pico-train - INFO - ├── Loss: 5.8415 2025-08-30 10:33:57 - pico-train - INFO - ├── Learning Rate: 1.37e-05 2025-08-30 10:33:57 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 10:34:09 - pico-train - INFO - Step 67725 -- 🔄 Training Metrics 2025-08-30 10:34:09 - pico-train - INFO - ├── Loss: 5.7966 2025-08-30 10:34:09 - pico-train - INFO - ├── Learning Rate: 1.37e-05 2025-08-30 10:34:09 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 10:34:22 - pico-train - INFO - Step 67750 -- 🔄 Training Metrics 2025-08-30 10:34:22 - pico-train - INFO - ├── Loss: 5.7681 2025-08-30 10:34:22 - pico-train - INFO - ├── Learning Rate: 1.37e-05 2025-08-30 10:34:22 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 10:34:34 - pico-train - INFO - Step 67775 -- 🔄 Training Metrics 2025-08-30 10:34:34 - pico-train - INFO - ├── Loss: 5.8142 2025-08-30 10:34:34 - pico-train - INFO - ├── Learning Rate: 1.37e-05 2025-08-30 10:34:34 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 10:34:47 - pico-train - INFO - Step 67800 -- 🔄 Training Metrics 2025-08-30 10:34:47 - pico-train - INFO - ├── Loss: 5.8364 2025-08-30 10:34:47 - pico-train - INFO - ├── Learning Rate: 1.37e-05 2025-08-30 10:34:47 - pico-train - INFO - └── Inf/NaN count: 0 
2025-08-30 10:35:00 - pico-train - INFO - Step 67825 -- 🔄 Training Metrics 2025-08-30 10:35:00 - pico-train - INFO - ├── Loss: 5.7471 2025-08-30 10:35:00 - pico-train - INFO - ├── Learning Rate: 1.36e-05 2025-08-30 10:35:00 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 10:35:12 - pico-train - INFO - Step 67850 -- 🔄 Training Metrics 2025-08-30 10:35:12 - pico-train - INFO - ├── Loss: 5.7829 2025-08-30 10:35:12 - pico-train - INFO - ├── Learning Rate: 1.36e-05 2025-08-30 10:35:12 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 10:35:25 - pico-train - INFO - Step 67875 -- 🔄 Training Metrics 2025-08-30 10:35:25 - pico-train - INFO - ├── Loss: 5.7502 2025-08-30 10:35:25 - pico-train - INFO - ├── Learning Rate: 1.36e-05 2025-08-30 10:35:25 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 10:35:38 - pico-train - INFO - Step 67900 -- 🔄 Training Metrics 2025-08-30 10:35:38 - pico-train - INFO - ├── Loss: 5.8291 2025-08-30 10:35:38 - pico-train - INFO - ├── Learning Rate: 1.36e-05 2025-08-30 10:35:38 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 10:35:50 - pico-train - INFO - Step 67925 -- 🔄 Training Metrics 2025-08-30 10:35:50 - pico-train - INFO - ├── Loss: 5.8411 2025-08-30 10:35:50 - pico-train - INFO - ├── Learning Rate: 1.36e-05 2025-08-30 10:35:50 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 10:36:03 - pico-train - INFO - Step 67950 -- 🔄 Training Metrics 2025-08-30 10:36:03 - pico-train - INFO - ├── Loss: 5.8542 2025-08-30 10:36:03 - pico-train - INFO - ├── Learning Rate: 1.35e-05 2025-08-30 10:36:03 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 10:36:16 - pico-train - INFO - Step 67975 -- 🔄 Training Metrics 2025-08-30 10:36:16 - pico-train - INFO - ├── Loss: 5.9065 2025-08-30 10:36:16 - pico-train - INFO - ├── Learning Rate: 1.35e-05 2025-08-30 10:36:16 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 10:36:28 - pico-train - INFO - Step 68000 -- 💾 Saving Checkpoint 2025-08-30 10:38:32 - pico-train - INFO - Step 68000 
-- 📊 Evaluation Results 2025-08-30 10:38:32 - pico-train - INFO - └── paloma: 3.3702997728095594e+31 2025-08-30 10:38:35 - pico-train - INFO - Step 68000 -- 🔄 Training Metrics 2025-08-30 10:38:35 - pico-train - INFO - ├── Loss: 5.7845 2025-08-30 10:38:35 - pico-train - INFO - ├── Learning Rate: 1.35e-05 2025-08-30 10:38:35 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 10:38:35 - pico-train - INFO - Step 68000 -- 📈 Saving Learning Dynamics 2025-08-30 10:38:50 - pico-train - INFO - Step 68025 -- 🔄 Training Metrics 2025-08-30 10:38:50 - pico-train - INFO - ├── Loss: 5.6880 2025-08-30 10:38:50 - pico-train - INFO - ├── Learning Rate: 1.35e-05 2025-08-30 10:38:50 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 10:39:02 - pico-train - INFO - Step 68050 -- 🔄 Training Metrics 2025-08-30 10:39:02 - pico-train - INFO - ├── Loss: 5.7669 2025-08-30 10:39:02 - pico-train - INFO - ├── Learning Rate: 1.35e-05 2025-08-30 10:39:02 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 10:39:15 - pico-train - INFO - Step 68075 -- 🔄 Training Metrics 2025-08-30 10:39:15 - pico-train - INFO - ├── Loss: 5.7084 2025-08-30 10:39:15 - pico-train - INFO - ├── Learning Rate: 1.34e-05 2025-08-30 10:39:15 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 10:39:28 - pico-train - INFO - Step 68100 -- 🔄 Training Metrics 2025-08-30 10:39:28 - pico-train - INFO - ├── Loss: 5.8807 2025-08-30 10:39:28 - pico-train - INFO - ├── Learning Rate: 1.34e-05 2025-08-30 10:39:28 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 10:39:41 - pico-train - INFO - Step 68125 -- 🔄 Training Metrics 2025-08-30 10:39:41 - pico-train - INFO - ├── Loss: 5.8497 2025-08-30 10:39:41 - pico-train - INFO - ├── Learning Rate: 1.34e-05 2025-08-30 10:39:41 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 10:39:54 - pico-train - INFO - Step 68150 -- 🔄 Training Metrics 2025-08-30 10:39:54 - pico-train - INFO - ├── Loss: 5.7487 2025-08-30 10:39:54 - pico-train - INFO - ├── Learning Rate: 1.34e-05 
2025-08-30 10:39:54 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 10:40:06 - pico-train - INFO - Step 68175 -- 🔄 Training Metrics 2025-08-30 10:40:06 - pico-train - INFO - ├── Loss: 5.7784 2025-08-30 10:40:06 - pico-train - INFO - ├── Learning Rate: 1.34e-05 2025-08-30 10:40:06 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 10:40:19 - pico-train - INFO - Step 68200 -- 🔄 Training Metrics 2025-08-30 10:40:19 - pico-train - INFO - ├── Loss: 5.7622 2025-08-30 10:40:19 - pico-train - INFO - ├── Learning Rate: 1.33e-05 2025-08-30 10:40:19 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 10:40:31 - pico-train - INFO - Step 68225 -- 🔄 Training Metrics 2025-08-30 10:40:31 - pico-train - INFO - ├── Loss: 5.7823 2025-08-30 10:40:31 - pico-train - INFO - ├── Learning Rate: 1.33e-05 2025-08-30 10:40:31 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 10:40:44 - pico-train - INFO - Step 68250 -- 🔄 Training Metrics 2025-08-30 10:40:44 - pico-train - INFO - ├── Loss: 5.7689 2025-08-30 10:40:44 - pico-train - INFO - ├── Learning Rate: 1.33e-05 2025-08-30 10:40:44 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 10:40:57 - pico-train - INFO - Step 68275 -- 🔄 Training Metrics 2025-08-30 10:40:57 - pico-train - INFO - ├── Loss: 5.7719 2025-08-30 10:40:57 - pico-train - INFO - ├── Learning Rate: 1.33e-05 2025-08-30 10:40:57 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 10:41:09 - pico-train - INFO - Step 68300 -- 🔄 Training Metrics 2025-08-30 10:41:09 - pico-train - INFO - ├── Loss: 5.7754 2025-08-30 10:41:09 - pico-train - INFO - ├── Learning Rate: 1.33e-05 2025-08-30 10:41:09 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 10:41:22 - pico-train - INFO - Step 68325 -- 🔄 Training Metrics 2025-08-30 10:41:22 - pico-train - INFO - ├── Loss: 5.8183 2025-08-30 10:41:22 - pico-train - INFO - ├── Learning Rate: 1.33e-05 2025-08-30 10:41:22 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 10:41:34 - pico-train - INFO - Step 68350 -- 🔄 Training 
Metrics 2025-08-30 10:41:34 - pico-train - INFO - ├── Loss: 5.8116 2025-08-30 10:41:34 - pico-train - INFO - ├── Learning Rate: 1.32e-05 2025-08-30 10:41:34 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 10:41:47 - pico-train - INFO - Step 68375 -- 🔄 Training Metrics 2025-08-30 10:41:47 - pico-train - INFO - ├── Loss: 5.6714 2025-08-30 10:41:47 - pico-train - INFO - ├── Learning Rate: 1.32e-05 2025-08-30 10:41:47 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 10:42:00 - pico-train - INFO - Step 68400 -- 🔄 Training Metrics 2025-08-30 10:42:00 - pico-train - INFO - ├── Loss: 5.7859 2025-08-30 10:42:00 - pico-train - INFO - ├── Learning Rate: 1.32e-05 2025-08-30 10:42:00 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 10:42:12 - pico-train - INFO - Step 68425 -- 🔄 Training Metrics 2025-08-30 10:42:12 - pico-train - INFO - ├── Loss: 5.8268 2025-08-30 10:42:12 - pico-train - INFO - ├── Learning Rate: 1.32e-05 2025-08-30 10:42:12 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 10:42:25 - pico-train - INFO - Step 68450 -- 🔄 Training Metrics 2025-08-30 10:42:25 - pico-train - INFO - ├── Loss: 5.8194 2025-08-30 10:42:25 - pico-train - INFO - ├── Learning Rate: 1.32e-05 2025-08-30 10:42:25 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 10:42:37 - pico-train - INFO - Step 68475 -- 🔄 Training Metrics 2025-08-30 10:42:37 - pico-train - INFO - ├── Loss: 5.8550 2025-08-30 10:42:37 - pico-train - INFO - ├── Learning Rate: 1.31e-05 2025-08-30 10:42:37 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 10:42:49 - pico-train - INFO - Step 68500 -- 💾 Saving Checkpoint 2025-08-30 10:45:00 - pico-train - INFO - Step 68500 -- 📊 Evaluation Results 2025-08-30 10:45:00 - pico-train - INFO - └── paloma: 3.3728195138741334e+31 2025-08-30 10:45:04 - pico-train - INFO - Step 68500 -- 🔄 Training Metrics 2025-08-30 10:45:04 - pico-train - INFO - ├── Loss: 5.9096 2025-08-30 10:45:04 - pico-train - INFO - ├── Learning Rate: 1.31e-05 2025-08-30 10:45:04 - 
pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 10:45:04 - pico-train - INFO - Step 68500 -- 📈 Saving Learning Dynamics 2025-08-30 10:45:19 - pico-train - INFO - Step 68525 -- 🔄 Training Metrics 2025-08-30 10:45:19 - pico-train - INFO - ├── Loss: 5.7826 2025-08-30 10:45:19 - pico-train - INFO - ├── Learning Rate: 1.31e-05 2025-08-30 10:45:19 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 10:45:32 - pico-train - INFO - Step 68550 -- 🔄 Training Metrics 2025-08-30 10:45:32 - pico-train - INFO - ├── Loss: 5.7860 2025-08-30 10:45:32 - pico-train - INFO - ├── Learning Rate: 1.31e-05 2025-08-30 10:45:32 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 10:45:45 - pico-train - INFO - Step 68575 -- 🔄 Training Metrics 2025-08-30 10:45:45 - pico-train - INFO - ├── Loss: 5.7932 2025-08-30 10:45:45 - pico-train - INFO - ├── Learning Rate: 1.31e-05 2025-08-30 10:45:45 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 10:45:58 - pico-train - INFO - Step 68600 -- 🔄 Training Metrics 2025-08-30 10:45:58 - pico-train - INFO - ├── Loss: 5.8207 2025-08-30 10:45:58 - pico-train - INFO - ├── Learning Rate: 1.30e-05 2025-08-30 10:45:58 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 10:46:10 - pico-train - INFO - Step 68625 -- 🔄 Training Metrics 2025-08-30 10:46:10 - pico-train - INFO - ├── Loss: 5.6706 2025-08-30 10:46:10 - pico-train - INFO - ├── Learning Rate: 1.30e-05 2025-08-30 10:46:10 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 10:46:23 - pico-train - INFO - Step 68650 -- 🔄 Training Metrics 2025-08-30 10:46:23 - pico-train - INFO - ├── Loss: 5.7751 2025-08-30 10:46:23 - pico-train - INFO - ├── Learning Rate: 1.30e-05 2025-08-30 10:46:23 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 10:46:36 - pico-train - INFO - Step 68675 -- 🔄 Training Metrics 2025-08-30 10:46:36 - pico-train - INFO - ├── Loss: 5.7419 2025-08-30 10:46:36 - pico-train - INFO - ├── Learning Rate: 1.30e-05 2025-08-30 10:46:36 - pico-train - INFO - └── Inf/NaN count: 0 
2025-08-30 10:46:48 - pico-train - INFO - Step 68700 -- 🔄 Training Metrics 2025-08-30 10:46:48 - pico-train - INFO - ├── Loss: 5.8879 2025-08-30 10:46:48 - pico-train - INFO - ├── Learning Rate: 1.30e-05 2025-08-30 10:46:48 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 10:47:01 - pico-train - INFO - Step 68725 -- 🔄 Training Metrics 2025-08-30 10:47:01 - pico-train - INFO - ├── Loss: 5.8349 2025-08-30 10:47:01 - pico-train - INFO - ├── Learning Rate: 1.30e-05 2025-08-30 10:47:01 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 10:47:13 - pico-train - INFO - Step 68750 -- 🔄 Training Metrics 2025-08-30 10:47:13 - pico-train - INFO - ├── Loss: 5.8237 2025-08-30 10:47:13 - pico-train - INFO - ├── Learning Rate: 1.29e-05 2025-08-30 10:47:13 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 10:47:26 - pico-train - INFO - Step 68775 -- 🔄 Training Metrics 2025-08-30 10:47:26 - pico-train - INFO - ├── Loss: 5.8724 2025-08-30 10:47:26 - pico-train - INFO - ├── Learning Rate: 1.29e-05 2025-08-30 10:47:26 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 10:47:39 - pico-train - INFO - Step 68800 -- 🔄 Training Metrics 2025-08-30 10:47:39 - pico-train - INFO - ├── Loss: 5.7777 2025-08-30 10:47:39 - pico-train - INFO - ├── Learning Rate: 1.29e-05 2025-08-30 10:47:39 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 10:47:52 - pico-train - INFO - Step 68825 -- 🔄 Training Metrics 2025-08-30 10:47:52 - pico-train - INFO - ├── Loss: 5.7775 2025-08-30 10:47:52 - pico-train - INFO - ├── Learning Rate: 1.29e-05 2025-08-30 10:47:52 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 10:48:04 - pico-train - INFO - Step 68850 -- 🔄 Training Metrics 2025-08-30 10:48:04 - pico-train - INFO - ├── Loss: 5.8112 2025-08-30 10:48:04 - pico-train - INFO - ├── Learning Rate: 1.29e-05 2025-08-30 10:48:04 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 10:48:17 - pico-train - INFO - Step 68875 -- 🔄 Training Metrics 2025-08-30 10:48:17 - pico-train - INFO - ├── Loss: 
5.7673 2025-08-30 10:48:17 - pico-train - INFO - ├── Learning Rate: 1.28e-05 2025-08-30 10:48:17 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 10:48:30 - pico-train - INFO - Step 68900 -- 🔄 Training Metrics 2025-08-30 10:48:30 - pico-train - INFO - ├── Loss: 5.7477 2025-08-30 10:48:30 - pico-train - INFO - ├── Learning Rate: 1.28e-05 2025-08-30 10:48:30 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 10:48:43 - pico-train - INFO - Step 68925 -- 🔄 Training Metrics 2025-08-30 10:48:43 - pico-train - INFO - ├── Loss: 5.8516 2025-08-30 10:48:43 - pico-train - INFO - ├── Learning Rate: 1.28e-05 2025-08-30 10:48:43 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 10:48:56 - pico-train - INFO - Step 68950 -- 🔄 Training Metrics 2025-08-30 10:48:56 - pico-train - INFO - ├── Loss: 5.7671 2025-08-30 10:48:56 - pico-train - INFO - ├── Learning Rate: 1.28e-05 2025-08-30 10:48:56 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 10:49:08 - pico-train - INFO - Step 68975 -- 🔄 Training Metrics 2025-08-30 10:49:08 - pico-train - INFO - ├── Loss: 5.8476 2025-08-30 10:49:08 - pico-train - INFO - ├── Learning Rate: 1.28e-05 2025-08-30 10:49:08 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 10:49:20 - pico-train - INFO - Step 69000 -- 💾 Saving Checkpoint 2025-08-30 10:51:18 - pico-train - INFO - Step 69000 -- 📊 Evaluation Results 2025-08-30 10:51:18 - pico-train - INFO - └── paloma: 4.015441614691927e+31 2025-08-30 10:51:22 - pico-train - INFO - Step 69000 -- 🔄 Training Metrics 2025-08-30 10:51:22 - pico-train - INFO - ├── Loss: 5.7945 2025-08-30 10:51:22 - pico-train - INFO - ├── Learning Rate: 1.27e-05 2025-08-30 10:51:22 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 10:51:22 - pico-train - INFO - Step 69000 -- 📈 Saving Learning Dynamics 2025-08-30 10:51:38 - pico-train - INFO - Step 69025 -- 🔄 Training Metrics 2025-08-30 10:51:38 - pico-train - INFO - ├── Loss: 5.7222 2025-08-30 10:51:38 - pico-train - INFO - ├── Learning Rate: 1.27e-05 
2025-08-30 10:51:38 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 10:51:50 - pico-train - INFO - Step 69050 -- 🔄 Training Metrics 2025-08-30 10:51:50 - pico-train - INFO - ├── Loss: 5.8469 2025-08-30 10:51:50 - pico-train - INFO - ├── Learning Rate: 1.27e-05 2025-08-30 10:51:50 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 10:52:03 - pico-train - INFO - Step 69075 -- 🔄 Training Metrics 2025-08-30 10:52:03 - pico-train - INFO - ├── Loss: 5.7888 2025-08-30 10:52:03 - pico-train - INFO - ├── Learning Rate: 1.27e-05 2025-08-30 10:52:03 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 10:52:16 - pico-train - INFO - Step 69100 -- 🔄 Training Metrics 2025-08-30 10:52:16 - pico-train - INFO - ├── Loss: 5.8239 2025-08-30 10:52:16 - pico-train - INFO - ├── Learning Rate: 1.27e-05 2025-08-30 10:52:16 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 10:52:29 - pico-train - INFO - Step 69125 -- 🔄 Training Metrics 2025-08-30 10:52:29 - pico-train - INFO - ├── Loss: 5.8123 2025-08-30 10:52:29 - pico-train - INFO - ├── Learning Rate: 1.27e-05 2025-08-30 10:52:29 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 10:52:42 - pico-train - INFO - Step 69150 -- 🔄 Training Metrics 2025-08-30 10:52:42 - pico-train - INFO - ├── Loss: 5.8655 2025-08-30 10:52:42 - pico-train - INFO - ├── Learning Rate: 1.26e-05 2025-08-30 10:52:42 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 10:52:54 - pico-train - INFO - Step 69175 -- 🔄 Training Metrics 2025-08-30 10:52:54 - pico-train - INFO - ├── Loss: 5.8294 2025-08-30 10:52:54 - pico-train - INFO - ├── Learning Rate: 1.26e-05 2025-08-30 10:52:54 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 10:53:07 - pico-train - INFO - Step 69200 -- 🔄 Training Metrics 2025-08-30 10:53:07 - pico-train - INFO - ├── Loss: 5.8492 2025-08-30 10:53:07 - pico-train - INFO - ├── Learning Rate: 1.26e-05 2025-08-30 10:53:07 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 10:53:19 - pico-train - INFO - Step 69225 -- 🔄 Training 
Metrics 2025-08-30 10:53:19 - pico-train - INFO - ├── Loss: 5.8203 2025-08-30 10:53:19 - pico-train - INFO - ├── Learning Rate: 1.26e-05 2025-08-30 10:53:19 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 10:53:32 - pico-train - INFO - Step 69250 -- 🔄 Training Metrics 2025-08-30 10:53:32 - pico-train - INFO - ├── Loss: 5.8163 2025-08-30 10:53:32 - pico-train - INFO - ├── Learning Rate: 1.26e-05 2025-08-30 10:53:32 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 10:53:45 - pico-train - INFO - Step 69275 -- 🔄 Training Metrics 2025-08-30 10:53:45 - pico-train - INFO - ├── Loss: 5.8982 2025-08-30 10:53:45 - pico-train - INFO - ├── Learning Rate: 1.25e-05 2025-08-30 10:53:45 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 10:53:57 - pico-train - INFO - Step 69300 -- 🔄 Training Metrics 2025-08-30 10:53:57 - pico-train - INFO - ├── Loss: 5.7549 2025-08-30 10:53:57 - pico-train - INFO - ├── Learning Rate: 1.25e-05 2025-08-30 10:53:57 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 10:54:10 - pico-train - INFO - Step 69325 -- 🔄 Training Metrics 2025-08-30 10:54:10 - pico-train - INFO - ├── Loss: 5.8212 2025-08-30 10:54:10 - pico-train - INFO - ├── Learning Rate: 1.25e-05 2025-08-30 10:54:10 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 10:54:23 - pico-train - INFO - Step 69350 -- 🔄 Training Metrics 2025-08-30 10:54:23 - pico-train - INFO - ├── Loss: 5.8512 2025-08-30 10:54:23 - pico-train - INFO - ├── Learning Rate: 1.25e-05 2025-08-30 10:54:23 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 10:54:35 - pico-train - INFO - Step 69375 -- 🔄 Training Metrics 2025-08-30 10:54:35 - pico-train - INFO - ├── Loss: 5.8506 2025-08-30 10:54:35 - pico-train - INFO - ├── Learning Rate: 1.25e-05 2025-08-30 10:54:35 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 10:54:48 - pico-train - INFO - Step 69400 -- 🔄 Training Metrics 2025-08-30 10:54:48 - pico-train - INFO - ├── Loss: 5.7973 2025-08-30 10:54:48 - pico-train - INFO - ├── Learning Rate: 
1.25e-05 2025-08-30 10:54:48 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 10:55:00 - pico-train - INFO - Step 69425 -- 🔄 Training Metrics 2025-08-30 10:55:00 - pico-train - INFO - ├── Loss: 5.8587 2025-08-30 10:55:00 - pico-train - INFO - ├── Learning Rate: 1.24e-05 2025-08-30 10:55:00 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 10:55:13 - pico-train - INFO - Step 69450 -- 🔄 Training Metrics 2025-08-30 10:55:13 - pico-train - INFO - ├── Loss: 5.7108 2025-08-30 10:55:13 - pico-train - INFO - ├── Learning Rate: 1.24e-05 2025-08-30 10:55:13 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 10:55:26 - pico-train - INFO - Step 69475 -- 🔄 Training Metrics 2025-08-30 10:55:26 - pico-train - INFO - ├── Loss: 5.7860 2025-08-30 10:55:26 - pico-train - INFO - ├── Learning Rate: 1.24e-05 2025-08-30 10:55:26 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 10:55:38 - pico-train - INFO - Step 69500 -- 💾 Saving Checkpoint 2025-08-30 10:57:37 - pico-train - INFO - Step 69500 -- 📊 Evaluation Results 2025-08-30 10:57:37 - pico-train - INFO - └── paloma: 4.498437349495611e+31 2025-08-30 10:57:40 - pico-train - INFO - Step 69500 -- 🔄 Training Metrics 2025-08-30 10:57:40 - pico-train - INFO - ├── Loss: 5.8497 2025-08-30 10:57:40 - pico-train - INFO - ├── Learning Rate: 1.24e-05 2025-08-30 10:57:40 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 10:57:40 - pico-train - INFO - Step 69500 -- 📈 Saving Learning Dynamics 2025-08-30 10:57:55 - pico-train - INFO - Step 69525 -- 🔄 Training Metrics 2025-08-30 10:57:55 - pico-train - INFO - ├── Loss: 5.8320 2025-08-30 10:57:55 - pico-train - INFO - ├── Learning Rate: 1.24e-05 2025-08-30 10:57:55 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 10:58:08 - pico-train - INFO - Step 69550 -- 🔄 Training Metrics 2025-08-30 10:58:08 - pico-train - INFO - ├── Loss: 5.7277 2025-08-30 10:58:08 - pico-train - INFO - ├── Learning Rate: 1.23e-05 2025-08-30 10:58:08 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 
10:58:20 - pico-train - INFO - Step 69575 -- 🔄 Training Metrics 2025-08-30 10:58:20 - pico-train - INFO - ├── Loss: 5.8119 2025-08-30 10:58:20 - pico-train - INFO - ├── Learning Rate: 1.23e-05 2025-08-30 10:58:20 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 10:58:34 - pico-train - INFO - Step 69600 -- 🔄 Training Metrics 2025-08-30 10:58:34 - pico-train - INFO - ├── Loss: 5.8142 2025-08-30 10:58:34 - pico-train - INFO - ├── Learning Rate: 1.23e-05 2025-08-30 10:58:34 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 10:58:47 - pico-train - INFO - Step 69625 -- 🔄 Training Metrics 2025-08-30 10:58:47 - pico-train - INFO - ├── Loss: 5.8271 2025-08-30 10:58:47 - pico-train - INFO - ├── Learning Rate: 1.23e-05 2025-08-30 10:58:47 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 10:58:59 - pico-train - INFO - Step 69650 -- 🔄 Training Metrics 2025-08-30 10:58:59 - pico-train - INFO - ├── Loss: 5.7488 2025-08-30 10:58:59 - pico-train - INFO - ├── Learning Rate: 1.23e-05 2025-08-30 10:58:59 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 10:59:12 - pico-train - INFO - Step 69675 -- 🔄 Training Metrics 2025-08-30 10:59:12 - pico-train - INFO - ├── Loss: 5.8036 2025-08-30 10:59:12 - pico-train - INFO - ├── Learning Rate: 1.22e-05 2025-08-30 10:59:12 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 10:59:25 - pico-train - INFO - Step 69700 -- 🔄 Training Metrics 2025-08-30 10:59:25 - pico-train - INFO - ├── Loss: 5.8718 2025-08-30 10:59:25 - pico-train - INFO - ├── Learning Rate: 1.22e-05 2025-08-30 10:59:25 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 10:59:37 - pico-train - INFO - Step 69725 -- 🔄 Training Metrics 2025-08-30 10:59:37 - pico-train - INFO - ├── Loss: 5.7624 2025-08-30 10:59:37 - pico-train - INFO - ├── Learning Rate: 1.22e-05 2025-08-30 10:59:37 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 10:59:50 - pico-train - INFO - Step 69750 -- 🔄 Training Metrics 2025-08-30 10:59:50 - pico-train - INFO - ├── Loss: 5.7221 
2025-08-30 10:59:50 - pico-train - INFO - ├── Learning Rate: 1.22e-05 2025-08-30 10:59:50 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 11:00:03 - pico-train - INFO - Step 69775 -- 🔄 Training Metrics 2025-08-30 11:00:03 - pico-train - INFO - ├── Loss: 5.8421 2025-08-30 11:00:03 - pico-train - INFO - ├── Learning Rate: 1.22e-05 2025-08-30 11:00:03 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 11:00:16 - pico-train - INFO - Step 69800 -- 🔄 Training Metrics 2025-08-30 11:00:16 - pico-train - INFO - ├── Loss: 5.8152 2025-08-30 11:00:16 - pico-train - INFO - ├── Learning Rate: 1.22e-05 2025-08-30 11:00:16 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 11:00:28 - pico-train - INFO - Step 69825 -- 🔄 Training Metrics 2025-08-30 11:00:28 - pico-train - INFO - ├── Loss: 5.8357 2025-08-30 11:00:28 - pico-train - INFO - ├── Learning Rate: 1.21e-05 2025-08-30 11:00:28 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 11:00:41 - pico-train - INFO - Step 69850 -- 🔄 Training Metrics 2025-08-30 11:00:41 - pico-train - INFO - ├── Loss: 5.8124 2025-08-30 11:00:41 - pico-train - INFO - ├── Learning Rate: 1.21e-05 2025-08-30 11:00:41 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 11:00:53 - pico-train - INFO - Step 69875 -- 🔄 Training Metrics 2025-08-30 11:00:53 - pico-train - INFO - ├── Loss: 5.8160 2025-08-30 11:00:53 - pico-train - INFO - ├── Learning Rate: 1.21e-05 2025-08-30 11:00:53 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 11:01:06 - pico-train - INFO - Step 69900 -- 🔄 Training Metrics 2025-08-30 11:01:06 - pico-train - INFO - ├── Loss: 5.7780 2025-08-30 11:01:06 - pico-train - INFO - ├── Learning Rate: 1.21e-05 2025-08-30 11:01:06 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 11:01:19 - pico-train - INFO - Step 69925 -- 🔄 Training Metrics 2025-08-30 11:01:19 - pico-train - INFO - ├── Loss: 5.7680 2025-08-30 11:01:19 - pico-train - INFO - ├── Learning Rate: 1.21e-05 2025-08-30 11:01:19 - pico-train - INFO - └── Inf/NaN count: 
0 2025-08-30 11:01:31 - pico-train - INFO - Step 69950 -- 🔄 Training Metrics 2025-08-30 11:01:31 - pico-train - INFO - ├── Loss: 5.7678 2025-08-30 11:01:31 - pico-train - INFO - ├── Learning Rate: 1.20e-05 2025-08-30 11:01:31 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 11:01:44 - pico-train - INFO - Step 69975 -- 🔄 Training Metrics 2025-08-30 11:01:44 - pico-train - INFO - ├── Loss: 5.7694 2025-08-30 11:01:44 - pico-train - INFO - ├── Learning Rate: 1.20e-05 2025-08-30 11:01:44 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 11:01:56 - pico-train - INFO - Step 70000 -- 💾 Saving Checkpoint 2025-08-30 11:03:54 - pico-train - INFO - Step 70000 -- 📊 Evaluation Results 2025-08-30 11:03:54 - pico-train - INFO - └── paloma: 4.524086501230947e+31 2025-08-30 11:03:58 - pico-train - INFO - Step 70000 -- 🔄 Training Metrics 2025-08-30 11:03:58 - pico-train - INFO - ├── Loss: 5.7691 2025-08-30 11:03:58 - pico-train - INFO - ├── Learning Rate: 1.20e-05 2025-08-30 11:03:58 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 11:03:58 - pico-train - INFO - Step 70000 -- 📈 Saving Learning Dynamics 2025-08-30 11:04:13 - pico-train - INFO - Step 70025 -- 🔄 Training Metrics 2025-08-30 11:04:13 - pico-train - INFO - ├── Loss: 5.8459 2025-08-30 11:04:13 - pico-train - INFO - ├── Learning Rate: 1.20e-05 2025-08-30 11:04:13 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 11:04:26 - pico-train - INFO - Step 70050 -- 🔄 Training Metrics 2025-08-30 11:04:26 - pico-train - INFO - ├── Loss: 5.7648 2025-08-30 11:04:26 - pico-train - INFO - ├── Learning Rate: 1.20e-05 2025-08-30 11:04:26 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 11:04:38 - pico-train - INFO - Step 70075 -- 🔄 Training Metrics 2025-08-30 11:04:38 - pico-train - INFO - ├── Loss: 5.9146 2025-08-30 11:04:38 - pico-train - INFO - ├── Learning Rate: 1.20e-05 2025-08-30 11:04:38 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 11:04:51 - pico-train - INFO - Step 70100 -- 🔄 Training Metrics 
2025-08-30 11:04:51 - pico-train - INFO - ├── Loss: 5.8547 2025-08-30 11:04:51 - pico-train - INFO - ├── Learning Rate: 1.19e-05 2025-08-30 11:04:51 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 11:05:04 - pico-train - INFO - Step 70125 -- 🔄 Training Metrics 2025-08-30 11:05:04 - pico-train - INFO - ├── Loss: 5.7720 2025-08-30 11:05:04 - pico-train - INFO - ├── Learning Rate: 1.19e-05 2025-08-30 11:05:04 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 11:05:17 - pico-train - INFO - Step 70150 -- 🔄 Training Metrics 2025-08-30 11:05:17 - pico-train - INFO - ├── Loss: 5.7761 2025-08-30 11:05:17 - pico-train - INFO - ├── Learning Rate: 1.19e-05 2025-08-30 11:05:17 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 11:05:29 - pico-train - INFO - Step 70175 -- 🔄 Training Metrics 2025-08-30 11:05:29 - pico-train - INFO - ├── Loss: 5.7980 2025-08-30 11:05:29 - pico-train - INFO - ├── Learning Rate: 1.19e-05 2025-08-30 11:05:29 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 11:05:42 - pico-train - INFO - Step 70200 -- 🔄 Training Metrics 2025-08-30 11:05:42 - pico-train - INFO - ├── Loss: 5.7824 2025-08-30 11:05:42 - pico-train - INFO - ├── Learning Rate: 1.19e-05 2025-08-30 11:05:42 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 11:05:55 - pico-train - INFO - Step 70225 -- 🔄 Training Metrics 2025-08-30 11:05:55 - pico-train - INFO - ├── Loss: 5.8025 2025-08-30 11:05:55 - pico-train - INFO - ├── Learning Rate: 1.18e-05 2025-08-30 11:05:55 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 11:06:07 - pico-train - INFO - Step 70250 -- 🔄 Training Metrics 2025-08-30 11:06:07 - pico-train - INFO - ├── Loss: 5.8501 2025-08-30 11:06:07 - pico-train - INFO - ├── Learning Rate: 1.18e-05 2025-08-30 11:06:07 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 11:06:20 - pico-train - INFO - Step 70275 -- 🔄 Training Metrics 2025-08-30 11:06:20 - pico-train - INFO - ├── Loss: 5.7877 2025-08-30 11:06:20 - pico-train - INFO - ├── Learning Rate: 1.18e-05 
2025-08-30 11:06:20 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 11:06:32 - pico-train - INFO - Step 70300 -- 🔄 Training Metrics 2025-08-30 11:06:32 - pico-train - INFO - ├── Loss: 5.7537 2025-08-30 11:06:32 - pico-train - INFO - ├── Learning Rate: 1.18e-05 2025-08-30 11:06:32 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 11:06:45 - pico-train - INFO - Step 70325 -- 🔄 Training Metrics 2025-08-30 11:06:45 - pico-train - INFO - ├── Loss: 5.8530 2025-08-30 11:06:45 - pico-train - INFO - ├── Learning Rate: 1.18e-05 2025-08-30 11:06:45 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 11:06:58 - pico-train - INFO - Step 70350 -- 🔄 Training Metrics 2025-08-30 11:06:58 - pico-train - INFO - ├── Loss: 5.6919 2025-08-30 11:06:58 - pico-train - INFO - ├── Learning Rate: 1.18e-05 2025-08-30 11:06:58 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 11:07:10 - pico-train - INFO - Step 70375 -- 🔄 Training Metrics 2025-08-30 11:07:10 - pico-train - INFO - ├── Loss: 5.7595 2025-08-30 11:07:10 - pico-train - INFO - ├── Learning Rate: 1.17e-05 2025-08-30 11:07:10 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 11:07:23 - pico-train - INFO - Step 70400 -- 🔄 Training Metrics 2025-08-30 11:07:23 - pico-train - INFO - ├── Loss: 5.7637 2025-08-30 11:07:23 - pico-train - INFO - ├── Learning Rate: 1.17e-05 2025-08-30 11:07:23 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 11:07:36 - pico-train - INFO - Step 70425 -- 🔄 Training Metrics 2025-08-30 11:07:36 - pico-train - INFO - ├── Loss: 5.8013 2025-08-30 11:07:36 - pico-train - INFO - ├── Learning Rate: 1.17e-05 2025-08-30 11:07:36 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 11:07:48 - pico-train - INFO - Step 70450 -- 🔄 Training Metrics 2025-08-30 11:07:48 - pico-train - INFO - ├── Loss: 5.8487 2025-08-30 11:07:48 - pico-train - INFO - ├── Learning Rate: 1.17e-05 2025-08-30 11:07:48 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 11:08:01 - pico-train - INFO - Step 70475 -- 🔄 Training 
Metrics 2025-08-30 11:08:01 - pico-train - INFO - ├── Loss: 5.7931 2025-08-30 11:08:01 - pico-train - INFO - ├── Learning Rate: 1.17e-05 2025-08-30 11:08:01 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 11:08:13 - pico-train - INFO - Step 70500 -- 💾 Saving Checkpoint 2025-08-30 11:10:14 - pico-train - INFO - Step 70500 -- 📊 Evaluation Results 2025-08-30 11:10:14 - pico-train - INFO - └── paloma: 5.389143520871013e+31 2025-08-30 11:10:18 - pico-train - INFO - Step 70500 -- 🔄 Training Metrics 2025-08-30 11:10:18 - pico-train - INFO - ├── Loss: 5.8130 2025-08-30 11:10:18 - pico-train - INFO - ├── Learning Rate: 1.16e-05 2025-08-30 11:10:18 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 11:10:18 - pico-train - INFO - Step 70500 -- 📈 Saving Learning Dynamics 2025-08-30 11:10:34 - pico-train - INFO - Step 70525 -- 🔄 Training Metrics 2025-08-30 11:10:34 - pico-train - INFO - ├── Loss: 5.8003 2025-08-30 11:10:34 - pico-train - INFO - ├── Learning Rate: 1.16e-05 2025-08-30 11:10:34 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 11:10:46 - pico-train - INFO - Step 70550 -- 🔄 Training Metrics 2025-08-30 11:10:46 - pico-train - INFO - ├── Loss: 5.7638 2025-08-30 11:10:46 - pico-train - INFO - ├── Learning Rate: 1.16e-05 2025-08-30 11:10:46 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 11:10:59 - pico-train - INFO - Step 70575 -- 🔄 Training Metrics 2025-08-30 11:10:59 - pico-train - INFO - ├── Loss: 5.8081 2025-08-30 11:10:59 - pico-train - INFO - ├── Learning Rate: 1.16e-05 2025-08-30 11:10:59 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 11:11:12 - pico-train - INFO - Step 70600 -- 🔄 Training Metrics 2025-08-30 11:11:12 - pico-train - INFO - ├── Loss: 5.8433 2025-08-30 11:11:12 - pico-train - INFO - ├── Learning Rate: 1.16e-05 2025-08-30 11:11:12 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 11:11:24 - pico-train - INFO - Step 70625 -- 🔄 Training Metrics 2025-08-30 11:11:24 - pico-train - INFO - ├── Loss: 5.7845 2025-08-30 
11:11:24 - pico-train - INFO - ├── Learning Rate: 1.16e-05 2025-08-30 11:11:24 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 11:11:37 - pico-train - INFO - Step 70650 -- 🔄 Training Metrics 2025-08-30 11:11:37 - pico-train - INFO - ├── Loss: 5.7766 2025-08-30 11:11:37 - pico-train - INFO - ├── Learning Rate: 1.15e-05 2025-08-30 11:11:37 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 11:11:50 - pico-train - INFO - Step 70675 -- 🔄 Training Metrics 2025-08-30 11:11:50 - pico-train - INFO - ├── Loss: 5.8443 2025-08-30 11:11:50 - pico-train - INFO - ├── Learning Rate: 1.15e-05 2025-08-30 11:11:50 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 11:12:02 - pico-train - INFO - Step 70700 -- 🔄 Training Metrics 2025-08-30 11:12:02 - pico-train - INFO - ├── Loss: 5.8557 2025-08-30 11:12:02 - pico-train - INFO - ├── Learning Rate: 1.15e-05 2025-08-30 11:12:02 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 11:12:16 - pico-train - INFO - Step 70725 -- 🔄 Training Metrics 2025-08-30 11:12:16 - pico-train - INFO - ├── Loss: 5.7753 2025-08-30 11:12:16 - pico-train - INFO - ├── Learning Rate: 1.15e-05 2025-08-30 11:12:16 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 11:12:28 - pico-train - INFO - Step 70750 -- 🔄 Training Metrics 2025-08-30 11:12:28 - pico-train - INFO - ├── Loss: 5.7036 2025-08-30 11:12:28 - pico-train - INFO - ├── Learning Rate: 1.15e-05 2025-08-30 11:12:28 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 11:12:41 - pico-train - INFO - Step 70775 -- 🔄 Training Metrics 2025-08-30 11:12:41 - pico-train - INFO - ├── Loss: 5.8355 2025-08-30 11:12:41 - pico-train - INFO - ├── Learning Rate: 1.14e-05 2025-08-30 11:12:41 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 11:12:54 - pico-train - INFO - Step 70800 -- 🔄 Training Metrics 2025-08-30 11:12:54 - pico-train - INFO - ├── Loss: 5.7925 2025-08-30 11:12:54 - pico-train - INFO - ├── Learning Rate: 1.14e-05 2025-08-30 11:12:54 - pico-train - INFO - └── Inf/NaN count: 0 
2025-08-30 11:13:06 - pico-train - INFO - Step 70825 -- 🔄 Training Metrics 2025-08-30 11:13:06 - pico-train - INFO - ├── Loss: 5.7594 2025-08-30 11:13:06 - pico-train - INFO - ├── Learning Rate: 1.14e-05 2025-08-30 11:13:06 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 11:13:19 - pico-train - INFO - Step 70850 -- 🔄 Training Metrics 2025-08-30 11:13:19 - pico-train - INFO - ├── Loss: 5.7899 2025-08-30 11:13:19 - pico-train - INFO - ├── Learning Rate: 1.14e-05 2025-08-30 11:13:19 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 11:13:31 - pico-train - INFO - Step 70875 -- 🔄 Training Metrics 2025-08-30 11:13:31 - pico-train - INFO - ├── Loss: 5.8210 2025-08-30 11:13:31 - pico-train - INFO - ├── Learning Rate: 1.14e-05 2025-08-30 11:13:31 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 11:13:44 - pico-train - INFO - Step 70900 -- 🔄 Training Metrics 2025-08-30 11:13:44 - pico-train - INFO - ├── Loss: 5.7877 2025-08-30 11:13:44 - pico-train - INFO - ├── Learning Rate: 1.14e-05 2025-08-30 11:13:44 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 11:13:57 - pico-train - INFO - Step 70925 -- 🔄 Training Metrics 2025-08-30 11:13:57 - pico-train - INFO - ├── Loss: 5.8528 2025-08-30 11:13:57 - pico-train - INFO - ├── Learning Rate: 1.13e-05 2025-08-30 11:13:57 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 11:14:09 - pico-train - INFO - Step 70950 -- 🔄 Training Metrics 2025-08-30 11:14:09 - pico-train - INFO - ├── Loss: 5.7071 2025-08-30 11:14:09 - pico-train - INFO - ├── Learning Rate: 1.13e-05 2025-08-30 11:14:09 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 11:14:22 - pico-train - INFO - Step 70975 -- 🔄 Training Metrics 2025-08-30 11:14:22 - pico-train - INFO - ├── Loss: 5.7500 2025-08-30 11:14:22 - pico-train - INFO - ├── Learning Rate: 1.13e-05 2025-08-30 11:14:22 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 11:14:34 - pico-train - INFO - Step 71000 -- 💾 Saving Checkpoint 2025-08-30 11:16:34 - pico-train - INFO - Step 71000 
-- 📊 Evaluation Results 2025-08-30 11:16:34 - pico-train - INFO - └── paloma: 6.106796255447029e+31 2025-08-30 11:16:38 - pico-train - INFO - Step 71000 -- 🔄 Training Metrics 2025-08-30 11:16:38 - pico-train - INFO - ├── Loss: 5.8512 2025-08-30 11:16:38 - pico-train - INFO - ├── Learning Rate: 1.13e-05 2025-08-30 11:16:38 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 11:16:38 - pico-train - INFO - Step 71000 -- 📈 Saving Learning Dynamics 2025-08-30 11:16:53 - pico-train - INFO - Step 71025 -- 🔄 Training Metrics 2025-08-30 11:16:53 - pico-train - INFO - ├── Loss: 5.7849 2025-08-30 11:16:53 - pico-train - INFO - ├── Learning Rate: 1.13e-05 2025-08-30 11:16:53 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 11:17:06 - pico-train - INFO - Step 71050 -- 🔄 Training Metrics 2025-08-30 11:17:06 - pico-train - INFO - ├── Loss: 5.7794 2025-08-30 11:17:06 - pico-train - INFO - ├── Learning Rate: 1.13e-05 2025-08-30 11:17:06 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 11:17:19 - pico-train - INFO - Step 71075 -- 🔄 Training Metrics 2025-08-30 11:17:19 - pico-train - INFO - ├── Loss: 5.8584 2025-08-30 11:17:19 - pico-train - INFO - ├── Learning Rate: 1.12e-05 2025-08-30 11:17:19 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 11:17:32 - pico-train - INFO - Step 71100 -- 🔄 Training Metrics 2025-08-30 11:17:32 - pico-train - INFO - ├── Loss: 5.7866 2025-08-30 11:17:32 - pico-train - INFO - ├── Learning Rate: 1.12e-05 2025-08-30 11:17:32 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 11:17:44 - pico-train - INFO - Step 71125 -- 🔄 Training Metrics 2025-08-30 11:17:44 - pico-train - INFO - ├── Loss: 5.7744 2025-08-30 11:17:44 - pico-train - INFO - ├── Learning Rate: 1.12e-05 2025-08-30 11:17:44 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 11:17:57 - pico-train - INFO - Step 71150 -- 🔄 Training Metrics 2025-08-30 11:17:57 - pico-train - INFO - ├── Loss: 5.8179 2025-08-30 11:17:57 - pico-train - INFO - ├── Learning Rate: 1.12e-05 2025-08-30 
11:17:57 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 11:18:09 - pico-train - INFO - Step 71175 -- 🔄 Training Metrics 2025-08-30 11:18:09 - pico-train - INFO - ├── Loss: 5.8349 2025-08-30 11:18:09 - pico-train - INFO - ├── Learning Rate: 1.12e-05 2025-08-30 11:18:09 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 11:18:22 - pico-train - INFO - Step 71200 -- 🔄 Training Metrics 2025-08-30 11:18:22 - pico-train - INFO - ├── Loss: 5.7446 2025-08-30 11:18:22 - pico-train - INFO - ├── Learning Rate: 1.11e-05 2025-08-30 11:18:22 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 11:18:35 - pico-train - INFO - Step 71225 -- 🔄 Training Metrics 2025-08-30 11:18:35 - pico-train - INFO - ├── Loss: 5.8961 2025-08-30 11:18:35 - pico-train - INFO - ├── Learning Rate: 1.11e-05 2025-08-30 11:18:35 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 11:18:48 - pico-train - INFO - Step 71250 -- 🔄 Training Metrics 2025-08-30 11:18:48 - pico-train - INFO - ├── Loss: 5.7719 2025-08-30 11:18:48 - pico-train - INFO - ├── Learning Rate: 1.11e-05 2025-08-30 11:18:48 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 11:19:01 - pico-train - INFO - Step 71275 -- 🔄 Training Metrics 2025-08-30 11:19:01 - pico-train - INFO - ├── Loss: 5.7171 2025-08-30 11:19:01 - pico-train - INFO - ├── Learning Rate: 1.11e-05 2025-08-30 11:19:01 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 11:19:13 - pico-train - INFO - Step 71300 -- 🔄 Training Metrics 2025-08-30 11:19:13 - pico-train - INFO - ├── Loss: 5.7381 2025-08-30 11:19:13 - pico-train - INFO - ├── Learning Rate: 1.11e-05 2025-08-30 11:19:13 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 11:19:26 - pico-train - INFO - Step 71325 -- 🔄 Training Metrics 2025-08-30 11:19:26 - pico-train - INFO - ├── Loss: 5.7906 2025-08-30 11:19:26 - pico-train - INFO - ├── Learning Rate: 1.11e-05 2025-08-30 11:19:26 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 11:19:39 - pico-train - INFO - Step 71350 -- 🔄 Training Metrics 
2025-08-30 11:19:39 - pico-train - INFO - ├── Loss: 5.9247 2025-08-30 11:19:39 - pico-train - INFO - ├── Learning Rate: 1.10e-05 2025-08-30 11:19:39 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 11:19:51 - pico-train - INFO - Step 71375 -- 🔄 Training Metrics 2025-08-30 11:19:51 - pico-train - INFO - ├── Loss: 5.8136 2025-08-30 11:19:51 - pico-train - INFO - ├── Learning Rate: 1.10e-05 2025-08-30 11:19:51 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 11:20:04 - pico-train - INFO - Step 71400 -- 🔄 Training Metrics 2025-08-30 11:20:04 - pico-train - INFO - ├── Loss: 5.7196 2025-08-30 11:20:04 - pico-train - INFO - ├── Learning Rate: 1.10e-05 2025-08-30 11:20:04 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 11:20:16 - pico-train - INFO - Step 71425 -- 🔄 Training Metrics 2025-08-30 11:20:16 - pico-train - INFO - ├── Loss: 5.7807 2025-08-30 11:20:16 - pico-train - INFO - ├── Learning Rate: 1.10e-05 2025-08-30 11:20:16 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 11:20:29 - pico-train - INFO - Step 71450 -- 🔄 Training Metrics 2025-08-30 11:20:29 - pico-train - INFO - ├── Loss: 5.8609 2025-08-30 11:20:29 - pico-train - INFO - ├── Learning Rate: 1.10e-05 2025-08-30 11:20:29 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 11:20:42 - pico-train - INFO - Step 71475 -- 🔄 Training Metrics 2025-08-30 11:20:42 - pico-train - INFO - ├── Loss: 5.7683 2025-08-30 11:20:42 - pico-train - INFO - ├── Learning Rate: 1.10e-05 2025-08-30 11:20:42 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 11:20:54 - pico-train - INFO - Step 71500 -- 💾 Saving Checkpoint 2025-08-30 11:22:50 - pico-train - INFO - Step 71500 -- 📊 Evaluation Results 2025-08-30 11:22:50 - pico-train - INFO - └── paloma: 6.282048257805562e+31 2025-08-30 11:22:53 - pico-train - INFO - Step 71500 -- 🔄 Training Metrics 2025-08-30 11:22:53 - pico-train - INFO - ├── Loss: 5.8034 2025-08-30 11:22:53 - pico-train - INFO - ├── Learning Rate: 1.09e-05 2025-08-30 11:22:53 - pico-train - INFO 
- └── Inf/NaN count: 0 2025-08-30 11:22:53 - pico-train - INFO - Step 71500 -- 📈 Saving Learning Dynamics 2025-08-30 11:23:08 - pico-train - INFO - Step 71525 -- 🔄 Training Metrics 2025-08-30 11:23:08 - pico-train - INFO - ├── Loss: 5.7923 2025-08-30 11:23:08 - pico-train - INFO - ├── Learning Rate: 1.09e-05 2025-08-30 11:23:08 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 11:23:21 - pico-train - INFO - Step 71550 -- 🔄 Training Metrics 2025-08-30 11:23:21 - pico-train - INFO - ├── Loss: 5.8365 2025-08-30 11:23:21 - pico-train - INFO - ├── Learning Rate: 1.09e-05 2025-08-30 11:23:21 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 11:23:34 - pico-train - INFO - Step 71575 -- 🔄 Training Metrics 2025-08-30 11:23:34 - pico-train - INFO - ├── Loss: 5.7924 2025-08-30 11:23:34 - pico-train - INFO - ├── Learning Rate: 1.09e-05 2025-08-30 11:23:34 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 11:23:46 - pico-train - INFO - Step 71600 -- 🔄 Training Metrics 2025-08-30 11:23:46 - pico-train - INFO - ├── Loss: 5.8132 2025-08-30 11:23:46 - pico-train - INFO - ├── Learning Rate: 1.09e-05 2025-08-30 11:23:46 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 11:23:59 - pico-train - INFO - Step 71625 -- 🔄 Training Metrics 2025-08-30 11:23:59 - pico-train - INFO - ├── Loss: 5.8109 2025-08-30 11:23:59 - pico-train - INFO - ├── Learning Rate: 1.08e-05 2025-08-30 11:23:59 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 11:24:11 - pico-train - INFO - Step 71650 -- 🔄 Training Metrics 2025-08-30 11:24:11 - pico-train - INFO - ├── Loss: 5.8357 2025-08-30 11:24:11 - pico-train - INFO - ├── Learning Rate: 1.08e-05 2025-08-30 11:24:11 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 11:24:24 - pico-train - INFO - Step 71675 -- 🔄 Training Metrics 2025-08-30 11:24:24 - pico-train - INFO - ├── Loss: 5.8117 2025-08-30 11:24:24 - pico-train - INFO - ├── Learning Rate: 1.08e-05 2025-08-30 11:24:24 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 11:24:37 - 
pico-train - INFO - Step 71700 -- 🔄 Training Metrics 2025-08-30 11:24:37 - pico-train - INFO - ├── Loss: 5.6849 2025-08-30 11:24:37 - pico-train - INFO - ├── Learning Rate: 1.08e-05 2025-08-30 11:24:37 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 11:24:50 - pico-train - INFO - Step 71725 -- 🔄 Training Metrics 2025-08-30 11:24:50 - pico-train - INFO - ├── Loss: 5.8316 2025-08-30 11:24:50 - pico-train - INFO - ├── Learning Rate: 1.08e-05 2025-08-30 11:24:50 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 11:25:03 - pico-train - INFO - Step 71750 -- 🔄 Training Metrics 2025-08-30 11:25:03 - pico-train - INFO - ├── Loss: 5.8852 2025-08-30 11:25:03 - pico-train - INFO - ├── Learning Rate: 1.08e-05 2025-08-30 11:25:03 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 11:25:15 - pico-train - INFO - Step 71775 -- 🔄 Training Metrics 2025-08-30 11:25:15 - pico-train - INFO - ├── Loss: 5.7825 2025-08-30 11:25:15 - pico-train - INFO - ├── Learning Rate: 1.07e-05 2025-08-30 11:25:15 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 11:25:28 - pico-train - INFO - Step 71800 -- 🔄 Training Metrics 2025-08-30 11:25:28 - pico-train - INFO - ├── Loss: 5.8405 2025-08-30 11:25:28 - pico-train - INFO - ├── Learning Rate: 1.07e-05 2025-08-30 11:25:28 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 11:25:41 - pico-train - INFO - Step 71825 -- 🔄 Training Metrics 2025-08-30 11:25:41 - pico-train - INFO - ├── Loss: 5.7973 2025-08-30 11:25:41 - pico-train - INFO - ├── Learning Rate: 1.07e-05 2025-08-30 11:25:41 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 11:25:53 - pico-train - INFO - Step 71850 -- 🔄 Training Metrics 2025-08-30 11:25:53 - pico-train - INFO - ├── Loss: 5.8016 2025-08-30 11:25:53 - pico-train - INFO - ├── Learning Rate: 1.07e-05 2025-08-30 11:25:53 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 11:26:06 - pico-train - INFO - Step 71875 -- 🔄 Training Metrics 2025-08-30 11:26:06 - pico-train - INFO - ├── Loss: 5.6851 2025-08-30 
11:26:06 - pico-train - INFO - ├── Learning Rate: 1.07e-05 2025-08-30 11:26:06 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 11:26:19 - pico-train - INFO - Step 71900 -- 🔄 Training Metrics 2025-08-30 11:26:19 - pico-train - INFO - ├── Loss: 5.7568 2025-08-30 11:26:19 - pico-train - INFO - ├── Learning Rate: 1.07e-05 2025-08-30 11:26:19 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 11:26:31 - pico-train - INFO - Step 71925 -- 🔄 Training Metrics 2025-08-30 11:26:31 - pico-train - INFO - ├── Loss: 5.7542 2025-08-30 11:26:31 - pico-train - INFO - ├── Learning Rate: 1.06e-05 2025-08-30 11:26:31 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 11:26:44 - pico-train - INFO - Step 71950 -- 🔄 Training Metrics 2025-08-30 11:26:44 - pico-train - INFO - ├── Loss: 5.6807 2025-08-30 11:26:44 - pico-train - INFO - ├── Learning Rate: 1.06e-05 2025-08-30 11:26:44 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 11:26:56 - pico-train - INFO - Step 71975 -- 🔄 Training Metrics 2025-08-30 11:26:56 - pico-train - INFO - ├── Loss: 5.7309 2025-08-30 11:26:56 - pico-train - INFO - ├── Learning Rate: 1.06e-05 2025-08-30 11:26:56 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 11:27:09 - pico-train - INFO - Step 72000 -- 💾 Saving Checkpoint 2025-08-30 11:29:13 - pico-train - INFO - Step 72000 -- 📊 Evaluation Results 2025-08-30 11:29:13 - pico-train - INFO - └── paloma: 6.442465619967253e+31 2025-08-30 11:29:15 - pico-train - INFO - Step 72000 -- 🔄 Training Metrics 2025-08-30 11:29:15 - pico-train - INFO - ├── Loss: 5.7989 2025-08-30 11:29:15 - pico-train - INFO - ├── Learning Rate: 1.06e-05 2025-08-30 11:29:15 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 11:29:15 - pico-train - INFO - Step 72000 -- 📈 Saving Learning Dynamics 2025-08-30 11:29:30 - pico-train - INFO - Step 72025 -- 🔄 Training Metrics 2025-08-30 11:29:30 - pico-train - INFO - ├── Loss: 5.7701 2025-08-30 11:29:30 - pico-train - INFO - ├── Learning Rate: 1.06e-05 2025-08-30 11:29:30 - 
pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 11:29:42 - pico-train - INFO - Step 72050 -- 🔄 Training Metrics 2025-08-30 11:29:42 - pico-train - INFO - ├── Loss: 5.7553 2025-08-30 11:29:42 - pico-train - INFO - ├── Learning Rate: 1.05e-05 2025-08-30 11:29:42 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 11:29:55 - pico-train - INFO - Step 72075 -- 🔄 Training Metrics 2025-08-30 11:29:55 - pico-train - INFO - ├── Loss: 5.6550 2025-08-30 11:29:55 - pico-train - INFO - ├── Learning Rate: 1.05e-05 2025-08-30 11:29:55 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 11:30:08 - pico-train - INFO - Step 72100 -- 🔄 Training Metrics 2025-08-30 11:30:08 - pico-train - INFO - ├── Loss: 5.7120 2025-08-30 11:30:08 - pico-train - INFO - ├── Learning Rate: 1.05e-05 2025-08-30 11:30:08 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 11:30:20 - pico-train - INFO - Step 72125 -- 🔄 Training Metrics 2025-08-30 11:30:20 - pico-train - INFO - ├── Loss: 5.8457 2025-08-30 11:30:20 - pico-train - INFO - ├── Learning Rate: 1.05e-05 2025-08-30 11:30:20 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 11:30:33 - pico-train - INFO - Step 72150 -- 🔄 Training Metrics 2025-08-30 11:30:33 - pico-train - INFO - ├── Loss: 5.7710 2025-08-30 11:30:33 - pico-train - INFO - ├── Learning Rate: 1.05e-05 2025-08-30 11:30:33 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 11:30:46 - pico-train - INFO - Step 72175 -- 🔄 Training Metrics 2025-08-30 11:30:46 - pico-train - INFO - ├── Loss: 5.8311 2025-08-30 11:30:46 - pico-train - INFO - ├── Learning Rate: 1.05e-05 2025-08-30 11:30:46 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 11:30:58 - pico-train - INFO - Step 72200 -- 🔄 Training Metrics 2025-08-30 11:30:58 - pico-train - INFO - ├── Loss: 5.8419 2025-08-30 11:30:58 - pico-train - INFO - ├── Learning Rate: 1.04e-05 2025-08-30 11:30:58 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 11:31:12 - pico-train - INFO - Step 72225 -- 🔄 Training Metrics 2025-08-30 
11:31:12 - pico-train - INFO - ├── Loss: 5.7954 2025-08-30 11:31:12 - pico-train - INFO - ├── Learning Rate: 1.04e-05 2025-08-30 11:31:12 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 11:31:24 - pico-train - INFO - Step 72250 -- 🔄 Training Metrics 2025-08-30 11:31:24 - pico-train - INFO - ├── Loss: 5.7894 2025-08-30 11:31:24 - pico-train - INFO - ├── Learning Rate: 1.04e-05 2025-08-30 11:31:24 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 11:31:37 - pico-train - INFO - Step 72275 -- 🔄 Training Metrics 2025-08-30 11:31:37 - pico-train - INFO - ├── Loss: 5.7746 2025-08-30 11:31:37 - pico-train - INFO - ├── Learning Rate: 1.04e-05 2025-08-30 11:31:37 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 11:31:50 - pico-train - INFO - Step 72300 -- 🔄 Training Metrics 2025-08-30 11:31:50 - pico-train - INFO - ├── Loss: 5.9178 2025-08-30 11:31:50 - pico-train - INFO - ├── Learning Rate: 1.04e-05 2025-08-30 11:31:50 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 11:32:02 - pico-train - INFO - Step 72325 -- 🔄 Training Metrics 2025-08-30 11:32:02 - pico-train - INFO - ├── Loss: 5.8326 2025-08-30 11:32:02 - pico-train - INFO - ├── Learning Rate: 1.04e-05 2025-08-30 11:32:02 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 11:32:15 - pico-train - INFO - Step 72350 -- 🔄 Training Metrics 2025-08-30 11:32:15 - pico-train - INFO - ├── Loss: 5.8099 2025-08-30 11:32:15 - pico-train - INFO - ├── Learning Rate: 1.03e-05 2025-08-30 11:32:15 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 11:32:27 - pico-train - INFO - Step 72375 -- 🔄 Training Metrics 2025-08-30 11:32:27 - pico-train - INFO - ├── Loss: 5.7497 2025-08-30 11:32:27 - pico-train - INFO - ├── Learning Rate: 1.03e-05 2025-08-30 11:32:27 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 11:32:40 - pico-train - INFO - Step 72400 -- 🔄 Training Metrics 2025-08-30 11:32:40 - pico-train - INFO - ├── Loss: 5.7700 2025-08-30 11:32:40 - pico-train - INFO - ├── Learning Rate: 1.03e-05 2025-08-30 
11:32:40 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 11:32:53 - pico-train - INFO - Step 72425 -- 🔄 Training Metrics 2025-08-30 11:32:53 - pico-train - INFO - ├── Loss: 5.8295 2025-08-30 11:32:53 - pico-train - INFO - ├── Learning Rate: 1.03e-05 2025-08-30 11:32:53 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 11:33:06 - pico-train - INFO - Step 72450 -- 🔄 Training Metrics 2025-08-30 11:33:06 - pico-train - INFO - ├── Loss: 5.7635 2025-08-30 11:33:06 - pico-train - INFO - ├── Learning Rate: 1.03e-05 2025-08-30 11:33:06 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 11:33:18 - pico-train - INFO - Step 72475 -- 🔄 Training Metrics 2025-08-30 11:33:18 - pico-train - INFO - ├── Loss: 5.7644 2025-08-30 11:33:18 - pico-train - INFO - ├── Learning Rate: 1.03e-05 2025-08-30 11:33:18 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 11:33:31 - pico-train - INFO - Step 72500 -- 💾 Saving Checkpoint 2025-08-30 11:35:40 - pico-train - INFO - Step 72500 -- 📊 Evaluation Results 2025-08-30 11:35:40 - pico-train - INFO - └── paloma: 7.433151564209409e+31 2025-08-30 11:35:42 - pico-train - INFO - Step 72500 -- 🔄 Training Metrics 2025-08-30 11:35:42 - pico-train - INFO - ├── Loss: 5.7986 2025-08-30 11:35:42 - pico-train - INFO - ├── Learning Rate: 1.02e-05 2025-08-30 11:35:42 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 11:35:42 - pico-train - INFO - Step 72500 -- 📈 Saving Learning Dynamics 2025-08-30 11:35:57 - pico-train - INFO - Step 72525 -- 🔄 Training Metrics 2025-08-30 11:35:57 - pico-train - INFO - ├── Loss: 5.8320 2025-08-30 11:35:57 - pico-train - INFO - ├── Learning Rate: 1.02e-05 2025-08-30 11:35:57 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 11:36:10 - pico-train - INFO - Step 72550 -- 🔄 Training Metrics 2025-08-30 11:36:10 - pico-train - INFO - ├── Loss: 5.7602 2025-08-30 11:36:10 - pico-train - INFO - ├── Learning Rate: 1.02e-05 2025-08-30 11:36:10 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 11:36:22 - pico-train 
- INFO - Step 72575 -- 🔄 Training Metrics 2025-08-30 11:36:22 - pico-train - INFO - ├── Loss: 5.7627 2025-08-30 11:36:22 - pico-train - INFO - ├── Learning Rate: 1.02e-05 2025-08-30 11:36:22 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 11:36:35 - pico-train - INFO - Step 72600 -- 🔄 Training Metrics 2025-08-30 11:36:35 - pico-train - INFO - ├── Loss: 5.7779 2025-08-30 11:36:35 - pico-train - INFO - ├── Learning Rate: 1.02e-05 2025-08-30 11:36:35 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 11:36:48 - pico-train - INFO - Step 72625 -- 🔄 Training Metrics 2025-08-30 11:36:48 - pico-train - INFO - ├── Loss: 5.8076 2025-08-30 11:36:48 - pico-train - INFO - ├── Learning Rate: 1.02e-05 2025-08-30 11:36:48 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 11:37:00 - pico-train - INFO - Step 72650 -- 🔄 Training Metrics 2025-08-30 11:37:00 - pico-train - INFO - ├── Loss: 5.8050 2025-08-30 11:37:00 - pico-train - INFO - ├── Learning Rate: 1.01e-05 2025-08-30 11:37:00 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 11:37:13 - pico-train - INFO - Step 72675 -- 🔄 Training Metrics 2025-08-30 11:37:13 - pico-train - INFO - ├── Loss: 5.8470 2025-08-30 11:37:13 - pico-train - INFO - ├── Learning Rate: 1.01e-05 2025-08-30 11:37:13 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 11:37:26 - pico-train - INFO - Step 72700 -- 🔄 Training Metrics 2025-08-30 11:37:26 - pico-train - INFO - ├── Loss: 5.7896 2025-08-30 11:37:26 - pico-train - INFO - ├── Learning Rate: 1.01e-05 2025-08-30 11:37:26 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 11:37:39 - pico-train - INFO - Step 72725 -- 🔄 Training Metrics 2025-08-30 11:37:39 - pico-train - INFO - ├── Loss: 5.7821 2025-08-30 11:37:39 - pico-train - INFO - ├── Learning Rate: 1.01e-05 2025-08-30 11:37:39 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 11:37:52 - pico-train - INFO - Step 72750 -- 🔄 Training Metrics 2025-08-30 11:37:52 - pico-train - INFO - ├── Loss: 5.7733 2025-08-30 11:37:52 - 
pico-train - INFO - ├── Learning Rate: 1.01e-05 2025-08-30 11:37:52 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 11:38:04 - pico-train - INFO - Step 72775 -- 🔄 Training Metrics 2025-08-30 11:38:04 - pico-train - INFO - ├── Loss: 5.8627 2025-08-30 11:38:04 - pico-train - INFO - ├── Learning Rate: 1.00e-05 2025-08-30 11:38:04 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 11:38:17 - pico-train - INFO - Step 72800 -- 🔄 Training Metrics 2025-08-30 11:38:17 - pico-train - INFO - ├── Loss: 5.8219 2025-08-30 11:38:17 - pico-train - INFO - ├── Learning Rate: 1.00e-05 2025-08-30 11:38:17 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 11:38:30 - pico-train - INFO - Step 72825 -- 🔄 Training Metrics 2025-08-30 11:38:30 - pico-train - INFO - ├── Loss: 5.8448 2025-08-30 11:38:30 - pico-train - INFO - ├── Learning Rate: 1.00e-05 2025-08-30 11:38:30 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 11:38:42 - pico-train - INFO - Step 72850 -- 🔄 Training Metrics 2025-08-30 11:38:42 - pico-train - INFO - ├── Loss: 5.7459 2025-08-30 11:38:42 - pico-train - INFO - ├── Learning Rate: 1.00e-05 2025-08-30 11:38:42 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 11:38:55 - pico-train - INFO - Step 72875 -- 🔄 Training Metrics 2025-08-30 11:38:55 - pico-train - INFO - ├── Loss: 5.8400 2025-08-30 11:38:55 - pico-train - INFO - ├── Learning Rate: 9.98e-06 2025-08-30 11:38:55 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 11:39:07 - pico-train - INFO - Step 72900 -- 🔄 Training Metrics 2025-08-30 11:39:07 - pico-train - INFO - ├── Loss: 5.7810 2025-08-30 11:39:07 - pico-train - INFO - ├── Learning Rate: 9.96e-06 2025-08-30 11:39:07 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 11:39:20 - pico-train - INFO - Step 72925 -- 🔄 Training Metrics 2025-08-30 11:39:20 - pico-train - INFO - ├── Loss: 5.8001 2025-08-30 11:39:20 - pico-train - INFO - ├── Learning Rate: 9.95e-06 2025-08-30 11:39:20 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 11:39:33 
- pico-train - INFO - Step 72950 -- 🔄 Training Metrics 2025-08-30 11:39:33 - pico-train - INFO - ├── Loss: 5.8616 2025-08-30 11:39:33 - pico-train - INFO - ├── Learning Rate: 9.93e-06 2025-08-30 11:39:33 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 11:39:45 - pico-train - INFO - Step 72975 -- 🔄 Training Metrics 2025-08-30 11:39:45 - pico-train - INFO - ├── Loss: 5.8884 2025-08-30 11:39:45 - pico-train - INFO - ├── Learning Rate: 9.91e-06 2025-08-30 11:39:45 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 11:39:57 - pico-train - INFO - Step 73000 -- 💾 Saving Checkpoint 2025-08-30 11:41:54 - pico-train - INFO - Step 73000 -- 📊 Evaluation Results 2025-08-30 11:41:54 - pico-train - INFO - └── paloma: 8.156828131388013e+31 2025-08-30 11:41:56 - pico-train - INFO - Step 73000 -- 🔄 Training Metrics 2025-08-30 11:41:56 - pico-train - INFO - ├── Loss: 5.7843 2025-08-30 11:41:56 - pico-train - INFO - ├── Learning Rate: 9.89e-06 2025-08-30 11:41:56 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 11:41:56 - pico-train - INFO - Step 73000 -- 📈 Saving Learning Dynamics 2025-08-30 11:42:11 - pico-train - INFO - Step 73025 -- 🔄 Training Metrics 2025-08-30 11:42:11 - pico-train - INFO - ├── Loss: 5.7129 2025-08-30 11:42:11 - pico-train - INFO - ├── Learning Rate: 9.88e-06 2025-08-30 11:42:11 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 11:42:24 - pico-train - INFO - Step 73050 -- 🔄 Training Metrics 2025-08-30 11:42:24 - pico-train - INFO - ├── Loss: 5.8605 2025-08-30 11:42:24 - pico-train - INFO - ├── Learning Rate: 9.86e-06 2025-08-30 11:42:24 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 11:42:37 - pico-train - INFO - Step 73075 -- 🔄 Training Metrics 2025-08-30 11:42:37 - pico-train - INFO - ├── Loss: 5.8538 2025-08-30 11:42:37 - pico-train - INFO - ├── Learning Rate: 9.84e-06 2025-08-30 11:42:37 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 11:42:49 - pico-train - INFO - Step 73100 -- 🔄 Training Metrics 2025-08-30 11:42:49 - 
pico-train - INFO - ├── Loss: 5.8061 2025-08-30 11:42:49 - pico-train - INFO - ├── Learning Rate: 9.83e-06 2025-08-30 11:42:49 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 11:43:02 - pico-train - INFO - Step 73125 -- 🔄 Training Metrics 2025-08-30 11:43:02 - pico-train - INFO - ├── Loss: 5.6467 2025-08-30 11:43:02 - pico-train - INFO - ├── Learning Rate: 9.81e-06 2025-08-30 11:43:02 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 11:43:14 - pico-train - INFO - Step 73150 -- 🔄 Training Metrics 2025-08-30 11:43:14 - pico-train - INFO - ├── Loss: 5.7946 2025-08-30 11:43:14 - pico-train - INFO - ├── Learning Rate: 9.79e-06 2025-08-30 11:43:14 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 11:43:27 - pico-train - INFO - Step 73175 -- 🔄 Training Metrics 2025-08-30 11:43:27 - pico-train - INFO - ├── Loss: 5.7591 2025-08-30 11:43:27 - pico-train - INFO - ├── Learning Rate: 9.78e-06 2025-08-30 11:43:27 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 11:43:40 - pico-train - INFO - Step 73200 -- 🔄 Training Metrics 2025-08-30 11:43:40 - pico-train - INFO - ├── Loss: 5.7435 2025-08-30 11:43:40 - pico-train - INFO - ├── Learning Rate: 9.76e-06 2025-08-30 11:43:40 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 11:43:55 - pico-train - INFO - Step 73225 -- 🔄 Training Metrics 2025-08-30 11:43:55 - pico-train - INFO - ├── Loss: 5.7541 2025-08-30 11:43:55 - pico-train - INFO - ├── Learning Rate: 9.74e-06 2025-08-30 11:43:55 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 11:44:08 - pico-train - INFO - Step 73250 -- 🔄 Training Metrics 2025-08-30 11:44:08 - pico-train - INFO - ├── Loss: 5.8107 2025-08-30 11:44:08 - pico-train - INFO - ├── Learning Rate: 9.72e-06 2025-08-30 11:44:08 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 11:44:20 - pico-train - INFO - Step 73275 -- 🔄 Training Metrics 2025-08-30 11:44:20 - pico-train - INFO - ├── Loss: 5.7636 2025-08-30 11:44:20 - pico-train - INFO - ├── Learning Rate: 9.71e-06 2025-08-30 11:44:20 - 
pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 11:44:33 - pico-train - INFO - Step 73300 -- 🔄 Training Metrics 2025-08-30 11:44:33 - pico-train - INFO - ├── Loss: 5.7746 2025-08-30 11:44:33 - pico-train - INFO - ├── Learning Rate: 9.69e-06 2025-08-30 11:44:33 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 11:44:46 - pico-train - INFO - Step 73325 -- 🔄 Training Metrics 2025-08-30 11:44:46 - pico-train - INFO - ├── Loss: 5.8366 2025-08-30 11:44:46 - pico-train - INFO - ├── Learning Rate: 9.67e-06 2025-08-30 11:44:46 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 11:44:58 - pico-train - INFO - Step 73350 -- 🔄 Training Metrics 2025-08-30 11:44:58 - pico-train - INFO - ├── Loss: 5.8148 2025-08-30 11:44:58 - pico-train - INFO - ├── Learning Rate: 9.66e-06 2025-08-30 11:44:58 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 11:45:11 - pico-train - INFO - Step 73375 -- 🔄 Training Metrics 2025-08-30 11:45:11 - pico-train - INFO - ├── Loss: 5.8216 2025-08-30 11:45:11 - pico-train - INFO - ├── Learning Rate: 9.64e-06 2025-08-30 11:45:11 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 11:45:24 - pico-train - INFO - Step 73400 -- 🔄 Training Metrics 2025-08-30 11:45:24 - pico-train - INFO - ├── Loss: 5.8380 2025-08-30 11:45:24 - pico-train - INFO - ├── Learning Rate: 9.62e-06 2025-08-30 11:45:24 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 11:45:36 - pico-train - INFO - Step 73425 -- 🔄 Training Metrics 2025-08-30 11:45:36 - pico-train - INFO - ├── Loss: 5.7821 2025-08-30 11:45:36 - pico-train - INFO - ├── Learning Rate: 9.61e-06 2025-08-30 11:45:36 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 11:45:49 - pico-train - INFO - Step 73450 -- 🔄 Training Metrics 2025-08-30 11:45:49 - pico-train - INFO - ├── Loss: 5.7886 2025-08-30 11:45:49 - pico-train - INFO - ├── Learning Rate: 9.59e-06 2025-08-30 11:45:49 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 11:46:01 - pico-train - INFO - Step 73475 -- 🔄 Training Metrics 2025-08-30 
11:46:01 - pico-train - INFO - ├── Loss: 5.7748 2025-08-30 11:46:01 - pico-train - INFO - ├── Learning Rate: 9.57e-06 2025-08-30 11:46:01 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 11:46:13 - pico-train - INFO - Step 73500 -- 💾 Saving Checkpoint 2025-08-30 11:48:25 - pico-train - INFO - Step 73500 -- 📊 Evaluation Results 2025-08-30 11:48:25 - pico-train - INFO - └── paloma: 9.704589730778985e+31 2025-08-30 11:48:27 - pico-train - INFO - Step 73500 -- 🔄 Training Metrics 2025-08-30 11:48:27 - pico-train - INFO - ├── Loss: 5.6891 2025-08-30 11:48:27 - pico-train - INFO - ├── Learning Rate: 9.56e-06 2025-08-30 11:48:27 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 11:48:27 - pico-train - INFO - Step 73500 -- 📈 Saving Learning Dynamics 2025-08-30 11:48:43 - pico-train - INFO - Step 73525 -- 🔄 Training Metrics 2025-08-30 11:48:43 - pico-train - INFO - ├── Loss: 5.7314 2025-08-30 11:48:43 - pico-train - INFO - ├── Learning Rate: 9.54e-06 2025-08-30 11:48:43 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 11:48:56 - pico-train - INFO - Step 73550 -- 🔄 Training Metrics 2025-08-30 11:48:56 - pico-train - INFO - ├── Loss: 5.6260 2025-08-30 11:48:56 - pico-train - INFO - ├── Learning Rate: 9.52e-06 2025-08-30 11:48:56 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 11:49:08 - pico-train - INFO - Step 73575 -- 🔄 Training Metrics 2025-08-30 11:49:08 - pico-train - INFO - ├── Loss: 5.7770 2025-08-30 11:49:08 - pico-train - INFO - ├── Learning Rate: 9.51e-06 2025-08-30 11:49:08 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 11:49:21 - pico-train - INFO - Step 73600 -- 🔄 Training Metrics 2025-08-30 11:49:21 - pico-train - INFO - ├── Loss: 5.7806 2025-08-30 11:49:21 - pico-train - INFO - ├── Learning Rate: 9.49e-06 2025-08-30 11:49:21 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 11:49:34 - pico-train - INFO - Step 73625 -- 🔄 Training Metrics 2025-08-30 11:49:34 - pico-train - INFO - ├── Loss: 5.7744 2025-08-30 11:49:34 - pico-train - 
INFO - ├── Learning Rate: 9.47e-06 2025-08-30 11:49:34 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 11:49:46 - pico-train - INFO - Step 73650 -- 🔄 Training Metrics 2025-08-30 11:49:46 - pico-train - INFO - ├── Loss: 5.7460 2025-08-30 11:49:46 - pico-train - INFO - ├── Learning Rate: 9.46e-06 2025-08-30 11:49:46 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 11:49:59 - pico-train - INFO - Step 73675 -- 🔄 Training Metrics 2025-08-30 11:49:59 - pico-train - INFO - ├── Loss: 5.8272 2025-08-30 11:49:59 - pico-train - INFO - ├── Learning Rate: 9.44e-06 2025-08-30 11:49:59 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 11:50:12 - pico-train - INFO - Step 73700 -- 🔄 Training Metrics 2025-08-30 11:50:12 - pico-train - INFO - ├── Loss: 5.7866 2025-08-30 11:50:12 - pico-train - INFO - ├── Learning Rate: 9.42e-06 2025-08-30 11:50:12 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 11:50:25 - pico-train - INFO - Step 73725 -- 🔄 Training Metrics 2025-08-30 11:50:25 - pico-train - INFO - ├── Loss: 5.7838 2025-08-30 11:50:25 - pico-train - INFO - ├── Learning Rate: 9.41e-06 2025-08-30 11:50:25 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 11:50:38 - pico-train - INFO - Step 73750 -- 🔄 Training Metrics 2025-08-30 11:50:38 - pico-train - INFO - ├── Loss: 5.6949 2025-08-30 11:50:38 - pico-train - INFO - ├── Learning Rate: 9.39e-06 2025-08-30 11:50:38 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 11:50:51 - pico-train - INFO - Step 73775 -- 🔄 Training Metrics 2025-08-30 11:50:51 - pico-train - INFO - ├── Loss: 5.7301 2025-08-30 11:50:51 - pico-train - INFO - ├── Learning Rate: 9.37e-06 2025-08-30 11:50:51 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 11:51:03 - pico-train - INFO - Step 73800 -- 🔄 Training Metrics 2025-08-30 11:51:03 - pico-train - INFO - ├── Loss: 5.7987 2025-08-30 11:51:03 - pico-train - INFO - ├── Learning Rate: 9.36e-06 2025-08-30 11:51:03 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 11:51:16 - pico-train 
- INFO - Step 73825 -- 🔄 Training Metrics 2025-08-30 11:51:16 - pico-train - INFO - ├── Loss: 5.8495 2025-08-30 11:51:16 - pico-train - INFO - ├── Learning Rate: 9.34e-06 2025-08-30 11:51:16 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 11:51:29 - pico-train - INFO - Step 73850 -- 🔄 Training Metrics 2025-08-30 11:51:29 - pico-train - INFO - ├── Loss: 5.7411 2025-08-30 11:51:29 - pico-train - INFO - ├── Learning Rate: 9.32e-06 2025-08-30 11:51:29 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 11:51:41 - pico-train - INFO - Step 73875 -- 🔄 Training Metrics 2025-08-30 11:51:41 - pico-train - INFO - ├── Loss: 5.7792 2025-08-30 11:51:41 - pico-train - INFO - ├── Learning Rate: 9.31e-06 2025-08-30 11:51:41 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 11:51:54 - pico-train - INFO - Step 73900 -- 🔄 Training Metrics 2025-08-30 11:51:54 - pico-train - INFO - ├── Loss: 5.8225 2025-08-30 11:51:54 - pico-train - INFO - ├── Learning Rate: 9.29e-06 2025-08-30 11:51:54 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 11:52:07 - pico-train - INFO - Step 73925 -- 🔄 Training Metrics 2025-08-30 11:52:07 - pico-train - INFO - ├── Loss: 5.7823 2025-08-30 11:52:07 - pico-train - INFO - ├── Learning Rate: 9.27e-06 2025-08-30 11:52:07 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 11:52:19 - pico-train - INFO - Step 73950 -- 🔄 Training Metrics 2025-08-30 11:52:19 - pico-train - INFO - ├── Loss: 5.6970 2025-08-30 11:52:19 - pico-train - INFO - ├── Learning Rate: 9.26e-06 2025-08-30 11:52:19 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 11:52:32 - pico-train - INFO - Step 73975 -- 🔄 Training Metrics 2025-08-30 11:52:32 - pico-train - INFO - ├── Loss: 5.7531 2025-08-30 11:52:32 - pico-train - INFO - ├── Learning Rate: 9.24e-06 2025-08-30 11:52:32 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 11:52:45 - pico-train - INFO - Step 74000 -- 💾 Saving Checkpoint 2025-08-30 11:54:39 - pico-train - INFO - Step 74000 -- 📊 Evaluation Results 2025-08-30 
11:54:39 - pico-train - INFO - └── paloma: 8.636477783625786e+31 2025-08-30 11:54:42 - pico-train - INFO - Step 74000 -- 🔄 Training Metrics 2025-08-30 11:54:42 - pico-train - INFO - ├── Loss: 5.7592 2025-08-30 11:54:42 - pico-train - INFO - ├── Learning Rate: 9.22e-06 2025-08-30 11:54:42 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 11:54:42 - pico-train - INFO - Step 74000 -- 📈 Saving Learning Dynamics 2025-08-30 11:54:58 - pico-train - INFO - Step 74025 -- 🔄 Training Metrics 2025-08-30 11:54:58 - pico-train - INFO - ├── Loss: 5.7057 2025-08-30 11:54:58 - pico-train - INFO - ├── Learning Rate: 9.21e-06 2025-08-30 11:54:58 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 11:55:11 - pico-train - INFO - Step 74050 -- 🔄 Training Metrics 2025-08-30 11:55:11 - pico-train - INFO - ├── Loss: 5.8112 2025-08-30 11:55:11 - pico-train - INFO - ├── Learning Rate: 9.19e-06 2025-08-30 11:55:11 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 11:55:23 - pico-train - INFO - Step 74075 -- 🔄 Training Metrics 2025-08-30 11:55:23 - pico-train - INFO - ├── Loss: 5.8551 2025-08-30 11:55:23 - pico-train - INFO - ├── Learning Rate: 9.17e-06 2025-08-30 11:55:23 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 11:55:36 - pico-train - INFO - Step 74100 -- 🔄 Training Metrics 2025-08-30 11:55:36 - pico-train - INFO - ├── Loss: 5.7881 2025-08-30 11:55:36 - pico-train - INFO - ├── Learning Rate: 9.16e-06 2025-08-30 11:55:36 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 11:55:49 - pico-train - INFO - Step 74125 -- 🔄 Training Metrics 2025-08-30 11:55:49 - pico-train - INFO - ├── Loss: 5.7239 2025-08-30 11:55:49 - pico-train - INFO - ├── Learning Rate: 9.14e-06 2025-08-30 11:55:49 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 11:56:01 - pico-train - INFO - Step 74150 -- 🔄 Training Metrics 2025-08-30 11:56:01 - pico-train - INFO - ├── Loss: 5.7491 2025-08-30 11:56:01 - pico-train - INFO - ├── Learning Rate: 9.12e-06 2025-08-30 11:56:01 - pico-train - INFO - └── 
Inf/NaN count: 0 2025-08-30 11:56:14 - pico-train - INFO - Step 74175 -- 🔄 Training Metrics 2025-08-30 11:56:14 - pico-train - INFO - ├── Loss: 5.7418 2025-08-30 11:56:14 - pico-train - INFO - ├── Learning Rate: 9.11e-06 2025-08-30 11:56:14 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 11:56:27 - pico-train - INFO - Step 74200 -- 🔄 Training Metrics 2025-08-30 11:56:27 - pico-train - INFO - ├── Loss: 5.8195 2025-08-30 11:56:27 - pico-train - INFO - ├── Learning Rate: 9.09e-06 2025-08-30 11:56:27 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 11:56:40 - pico-train - INFO - Step 74225 -- 🔄 Training Metrics 2025-08-30 11:56:40 - pico-train - INFO - ├── Loss: 5.8008 2025-08-30 11:56:40 - pico-train - INFO - ├── Learning Rate: 9.07e-06 2025-08-30 11:56:40 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 11:56:53 - pico-train - INFO - Step 74250 -- 🔄 Training Metrics 2025-08-30 11:56:53 - pico-train - INFO - ├── Loss: 5.7900 2025-08-30 11:56:53 - pico-train - INFO - ├── Learning Rate: 9.06e-06 2025-08-30 11:56:53 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 11:57:05 - pico-train - INFO - Step 74275 -- 🔄 Training Metrics 2025-08-30 11:57:05 - pico-train - INFO - ├── Loss: 5.8471 2025-08-30 11:57:05 - pico-train - INFO - ├── Learning Rate: 9.04e-06 2025-08-30 11:57:05 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 11:57:18 - pico-train - INFO - Step 74300 -- 🔄 Training Metrics 2025-08-30 11:57:18 - pico-train - INFO - ├── Loss: 5.8221 2025-08-30 11:57:18 - pico-train - INFO - ├── Learning Rate: 9.02e-06 2025-08-30 11:57:18 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 11:57:31 - pico-train - INFO - Step 74325 -- 🔄 Training Metrics 2025-08-30 11:57:31 - pico-train - INFO - ├── Loss: 5.7390 2025-08-30 11:57:31 - pico-train - INFO - ├── Learning Rate: 9.01e-06 2025-08-30 11:57:31 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 11:57:43 - pico-train - INFO - Step 74350 -- 🔄 Training Metrics 2025-08-30 11:57:43 - pico-train - 
INFO - ├── Loss: 5.7864 2025-08-30 11:57:43 - pico-train - INFO - ├── Learning Rate: 8.99e-06 2025-08-30 11:57:43 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 11:57:56 - pico-train - INFO - Step 74375 -- 🔄 Training Metrics 2025-08-30 11:57:56 - pico-train - INFO - ├── Loss: 5.8961 2025-08-30 11:57:56 - pico-train - INFO - ├── Learning Rate: 8.98e-06 2025-08-30 11:57:56 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 11:58:08 - pico-train - INFO - Step 74400 -- 🔄 Training Metrics 2025-08-30 11:58:08 - pico-train - INFO - ├── Loss: 5.7558 2025-08-30 11:58:08 - pico-train - INFO - ├── Learning Rate: 8.96e-06 2025-08-30 11:58:08 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 11:58:21 - pico-train - INFO - Step 74425 -- 🔄 Training Metrics 2025-08-30 11:58:21 - pico-train - INFO - ├── Loss: 5.7641 2025-08-30 11:58:21 - pico-train - INFO - ├── Learning Rate: 8.94e-06 2025-08-30 11:58:21 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 11:58:34 - pico-train - INFO - Step 74450 -- 🔄 Training Metrics 2025-08-30 11:58:34 - pico-train - INFO - ├── Loss: 5.7386 2025-08-30 11:58:34 - pico-train - INFO - ├── Learning Rate: 8.93e-06 2025-08-30 11:58:34 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 11:58:46 - pico-train - INFO - Step 74475 -- 🔄 Training Metrics 2025-08-30 11:58:46 - pico-train - INFO - ├── Loss: 5.7682 2025-08-30 11:58:46 - pico-train - INFO - ├── Learning Rate: 8.91e-06 2025-08-30 11:58:46 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 11:58:59 - pico-train - INFO - Step 74500 -- 💾 Saving Checkpoint 2025-08-30 12:00:53 - pico-train - INFO - Step 74500 -- 📊 Evaluation Results 2025-08-30 12:00:53 - pico-train - INFO - └── paloma: 9.875388203359053e+31 2025-08-30 12:00:55 - pico-train - INFO - Step 74500 -- 🔄 Training Metrics 2025-08-30 12:00:55 - pico-train - INFO - ├── Loss: 5.7399 2025-08-30 12:00:55 - pico-train - INFO - ├── Learning Rate: 8.89e-06 2025-08-30 12:00:55 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 
12:00:55 - pico-train - INFO - Step 74500 -- 📈 Saving Learning Dynamics 2025-08-30 12:01:10 - pico-train - INFO - Step 74525 -- 🔄 Training Metrics 2025-08-30 12:01:10 - pico-train - INFO - ├── Loss: 5.7499 2025-08-30 12:01:10 - pico-train - INFO - ├── Learning Rate: 8.88e-06 2025-08-30 12:01:10 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 12:01:22 - pico-train - INFO - Step 74550 -- 🔄 Training Metrics 2025-08-30 12:01:22 - pico-train - INFO - ├── Loss: 5.8008 2025-08-30 12:01:22 - pico-train - INFO - ├── Learning Rate: 8.86e-06 2025-08-30 12:01:22 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 12:01:35 - pico-train - INFO - Step 74575 -- 🔄 Training Metrics 2025-08-30 12:01:35 - pico-train - INFO - ├── Loss: 5.8048 2025-08-30 12:01:35 - pico-train - INFO - ├── Learning Rate: 8.85e-06 2025-08-30 12:01:35 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 12:01:48 - pico-train - INFO - Step 74600 -- 🔄 Training Metrics 2025-08-30 12:01:48 - pico-train - INFO - ├── Loss: 5.7352 2025-08-30 12:01:48 - pico-train - INFO - ├── Learning Rate: 8.83e-06 2025-08-30 12:01:48 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 12:02:00 - pico-train - INFO - Step 74625 -- 🔄 Training Metrics 2025-08-30 12:02:00 - pico-train - INFO - ├── Loss: 5.7900 2025-08-30 12:02:00 - pico-train - INFO - ├── Learning Rate: 8.81e-06 2025-08-30 12:02:00 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 12:02:13 - pico-train - INFO - Step 74650 -- 🔄 Training Metrics 2025-08-30 12:02:13 - pico-train - INFO - ├── Loss: 5.8181 2025-08-30 12:02:13 - pico-train - INFO - ├── Learning Rate: 8.80e-06 2025-08-30 12:02:13 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 12:02:25 - pico-train - INFO - Step 74675 -- 🔄 Training Metrics 2025-08-30 12:02:25 - pico-train - INFO - ├── Loss: 5.8068 2025-08-30 12:02:25 - pico-train - INFO - ├── Learning Rate: 8.78e-06 2025-08-30 12:02:25 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 12:02:38 - pico-train - INFO - Step 74700 -- 🔄 
Training Metrics 2025-08-30 12:02:38 - pico-train - INFO - ├── Loss: 5.7906 2025-08-30 12:02:38 - pico-train - INFO - ├── Learning Rate: 8.76e-06 2025-08-30 12:02:38 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 12:02:51 - pico-train - INFO - Step 74725 -- 🔄 Training Metrics 2025-08-30 12:02:51 - pico-train - INFO - ├── Loss: 5.7719 2025-08-30 12:02:51 - pico-train - INFO - ├── Learning Rate: 8.75e-06 2025-08-30 12:02:51 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 12:03:04 - pico-train - INFO - Step 74750 -- 🔄 Training Metrics 2025-08-30 12:03:04 - pico-train - INFO - ├── Loss: 5.7901 2025-08-30 12:03:04 - pico-train - INFO - ├── Learning Rate: 8.73e-06 2025-08-30 12:03:04 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 12:03:16 - pico-train - INFO - Step 74775 -- 🔄 Training Metrics 2025-08-30 12:03:16 - pico-train - INFO - ├── Loss: 5.7765 2025-08-30 12:03:16 - pico-train - INFO - ├── Learning Rate: 8.72e-06 2025-08-30 12:03:16 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 12:03:29 - pico-train - INFO - Step 74800 -- 🔄 Training Metrics 2025-08-30 12:03:29 - pico-train - INFO - ├── Loss: 5.7052 2025-08-30 12:03:29 - pico-train - INFO - ├── Learning Rate: 8.70e-06 2025-08-30 12:03:29 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 12:03:42 - pico-train - INFO - Step 74825 -- 🔄 Training Metrics 2025-08-30 12:03:42 - pico-train - INFO - ├── Loss: 5.7863 2025-08-30 12:03:42 - pico-train - INFO - ├── Learning Rate: 8.68e-06 2025-08-30 12:03:42 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 12:03:54 - pico-train - INFO - Step 74850 -- 🔄 Training Metrics 2025-08-30 12:03:54 - pico-train - INFO - ├── Loss: 5.7816 2025-08-30 12:03:54 - pico-train - INFO - ├── Learning Rate: 8.67e-06 2025-08-30 12:03:54 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 12:04:07 - pico-train - INFO - Step 74875 -- 🔄 Training Metrics 2025-08-30 12:04:07 - pico-train - INFO - ├── Loss: 5.7777 2025-08-30 12:04:07 - pico-train - INFO - ├── Learning 
Rate: 8.65e-06 2025-08-30 12:04:07 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 12:04:21 - pico-train - INFO - Step 74900 -- 🔄 Training Metrics 2025-08-30 12:04:21 - pico-train - INFO - ├── Loss: 5.8692 2025-08-30 12:04:21 - pico-train - INFO - ├── Learning Rate: 8.63e-06 2025-08-30 12:04:21 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 12:04:34 - pico-train - INFO - Step 74925 -- 🔄 Training Metrics 2025-08-30 12:04:34 - pico-train - INFO - ├── Loss: 5.7441 2025-08-30 12:04:34 - pico-train - INFO - ├── Learning Rate: 8.62e-06 2025-08-30 12:04:34 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 12:04:46 - pico-train - INFO - Step 74950 -- 🔄 Training Metrics 2025-08-30 12:04:46 - pico-train - INFO - ├── Loss: 5.8448 2025-08-30 12:04:46 - pico-train - INFO - ├── Learning Rate: 8.60e-06 2025-08-30 12:04:46 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 12:04:59 - pico-train - INFO - Step 74975 -- 🔄 Training Metrics 2025-08-30 12:04:59 - pico-train - INFO - ├── Loss: 5.7562 2025-08-30 12:04:59 - pico-train - INFO - ├── Learning Rate: 8.59e-06 2025-08-30 12:04:59 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 12:05:11 - pico-train - INFO - Step 75000 -- 💾 Saving Checkpoint