2025-08-30 23:07:44 - pico-train - INFO - Step 0 -- 📊 Evaluation Results
2025-08-30 23:07:44 - pico-train - INFO - └── paloma: inf
2025-08-30 23:07:44 - pico-train - INFO - ==================================================
2025-08-30 23:07:44 - pico-train - INFO - ✨ Training Configuration
2025-08-30 23:07:44 - pico-train - INFO - ==================================================
2025-08-30 23:07:44 - pico-train - INFO - ╭─────────────────────────────────────────────────────╮
2025-08-30 23:07:44 - pico-train - INFO - │ checkpointing:                                      │
2025-08-30 23:07:44 - pico-train - INFO - │   checkpoints_dir: checkpoints                      │
2025-08-30 23:07:44 - pico-train - INFO - │   evaluation:                                       │
2025-08-30 23:07:44 - pico-train - INFO - │     eval_results_dir: eval_results                  │
2025-08-30 23:07:44 - pico-train - INFO - │   fabric_checkpoint_dir: fabric_state               │
2025-08-30 23:07:44 - pico-train - INFO - │   fabric_checkpoint_filename: checkpoint.pt         │
2025-08-30 23:07:44 - pico-train - INFO - │   hf_checkpoint:                                    │
2025-08-30 23:07:44 - pico-train - INFO - │     collection_slug: null                           │
2025-08-30 23:07:44 - pico-train - INFO - │     repo_id: ThomasTheMaker/pico-decoder-tiny       │
2025-08-30 23:07:44 - pico-train - INFO - │   learning_dynamics:                                │
2025-08-30 23:07:44 - pico-train - INFO - │     batch_size: 1                                   │
2025-08-30 23:07:44 - pico-train - INFO - │     eval_data: null                                 │
2025-08-30 23:07:44 - pico-train - INFO - │     layer_suffixes:                                 │
2025-08-30 23:07:44 - pico-train - INFO - │     - attention.v_proj                              │
2025-08-30 23:07:44 - pico-train - INFO - │     - attention.o_proj                              │
2025-08-30 23:07:44 - pico-train - INFO - │     - swiglu.w_2                                    │
2025-08-30 23:07:44 - pico-train - INFO - │     sequence_idx: -1                                │
2025-08-30 23:07:44 - pico-train - INFO - │   learning_dynamics_dir: learning_dynamics          │
2025-08-30 23:07:44 - pico-train - INFO - │   logs_dir: logs                                    │
2025-08-30 23:07:44 - pico-train - INFO - │   run_name: pico-decoder-tiny-dolma250M-v1          │
2025-08-30 23:07:44 - pico-train - INFO - │   runs_dir: runs                                    │
2025-08-30 23:07:44 - pico-train - INFO - │   save_every_n_steps: 2000                          │
2025-08-30 23:07:44 - pico-train - INFO - │   save_to_hf: false                                 │
2025-08-30 23:07:44 - pico-train - INFO - │   training:                                         │
2025-08-30 23:07:44 - pico-train - INFO - │     auto_resume: true                               │
2025-08-30 23:07:44 - pico-train - INFO - │ data:                                               │
2025-08-30 23:07:44 - pico-train - INFO - │   dataloader:                                       │
2025-08-30 23:07:44 - pico-train - INFO - │     batch_size: 16                                  │
2025-08-30 23:07:44 - pico-train - INFO - │   dataset:                                          │
2025-08-30 23:07:44 - pico-train - INFO - │     name: pico-lm/pretokenized-dolma                │
2025-08-30 23:07:44 - pico-train - INFO - │   tokenizer:                                        │
2025-08-30 23:07:44 - pico-train - INFO - │     name: allenai/OLMo-7B-0724-hf                   │
2025-08-30 23:07:44 - pico-train - INFO - │     vocab_size: 50304                               │
2025-08-30 23:07:44 - pico-train - INFO - │ evaluation:                                         │
2025-08-30 23:07:44 - pico-train - INFO - │   metrics:                                          │
2025-08-30 23:07:44 - pico-train - INFO - │   - paloma                                          │
2025-08-30 23:07:44 - pico-train - INFO - │   paloma:                                           │
2025-08-30 23:07:44 - pico-train - INFO - │     batch_size: 1                                   │
2025-08-30 23:07:44 - pico-train - INFO - │     dataset_name: pico-lm/pretokenized-paloma-tinsy │
2025-08-30 23:07:44 - pico-train - INFO - │     dataset_split: val                              │
2025-08-30 23:07:44 - pico-train - INFO - │     max_length: 2048                                │
2025-08-30 23:07:44 - pico-train - INFO - │ model:                                              │
2025-08-30 23:07:44 - pico-train - INFO - │   activation_hidden_dim: 384                        │
2025-08-30 23:07:44 - pico-train - INFO - │   attention_n_heads: 12                             │
2025-08-30 23:07:44 - pico-train - INFO - │   attention_n_kv_heads: 4                           │
2025-08-30 23:07:44 - pico-train - INFO - │   batch_size: 1024                                  │
2025-08-30 23:07:44 - pico-train - INFO - │   d_model: 96                                       │
2025-08-30 23:07:44 - pico-train - INFO - │   max_seq_len: 2048                                 │
2025-08-30 23:07:44 - pico-train - INFO - │   model_type: pico_decoder                          │
2025-08-30 23:07:44 - pico-train - INFO - │   n_layers: 12                                      │
2025-08-30 23:07:44 - pico-train - INFO - │   norm_eps: 1.0e-06                                 │
2025-08-30 23:07:44 - pico-train - INFO - │   position_emb_theta: 10000.0                       │
2025-08-30 23:07:44 - pico-train - INFO - │   vocab_size: 50304                                 │
2025-08-30 23:07:44 - pico-train - INFO - │ monitoring:                                         │
2025-08-30 23:07:44 - pico-train - INFO - │   logging:                                          │
2025-08-30 23:07:44 - pico-train - INFO - │     log_every_n_steps: 100                          │
2025-08-30 23:07:44 - pico-train - INFO - │     log_level: INFO                                 │
2025-08-30 23:07:44 - pico-train - INFO - │   save_to_wandb: false                              │
2025-08-30 23:07:44 - pico-train - INFO - │   wandb:                                            │
2025-08-30 23:07:44 - pico-train - INFO - │     entity: boymyc                                  │
2025-08-30 23:07:44 - pico-train - INFO - │     project: pico-decoder-tiny                      │
2025-08-30 23:07:44 - pico-train - INFO - │ training:                                           │
2025-08-30 23:07:44 - pico-train - INFO - │   fabric:                                           │
2025-08-30 23:07:44 - pico-train - INFO - │     accelerator: cuda                               │
2025-08-30 23:07:44 - pico-train - INFO - │     num_devices: 1                                  │
2025-08-30 23:07:44 - pico-train - INFO - │     num_nodes: 1                                    │
2025-08-30 23:07:44 - pico-train - INFO - │     precision: bf16-mixed                           │
2025-08-30 23:07:44 - pico-train - INFO - │   max_steps: 100000                                 │
2025-08-30 23:07:44 - pico-train - INFO - │   optimization:                                     │
2025-08-30 23:07:44 - pico-train - INFO - │     gradient_accumulation_steps: 1                  │
2025-08-30 23:07:44 - pico-train - INFO - │     lr: 0.0002                                      │
2025-08-30 23:07:44 - pico-train - INFO - │     lr_scheduler: cosine                            │
2025-08-30 23:07:44 - pico-train - INFO - │     lr_warmup_steps: 2000                           │
2025-08-30 23:07:44 - pico-train - INFO - │     optimizer: adamw                                │
2025-08-30 23:07:44 - pico-train - INFO - │                                                     │
2025-08-30 23:07:44 - pico-train - INFO - ╰─────────────────────────────────────────────────────╯
2025-08-30 23:07:44 - pico-train - INFO - ==================================================
2025-08-30 23:07:44 - pico-train - INFO - ⛭ Runtime Summary:
2025-08-30 23:07:44 - pico-train - INFO - ==================================================
2025-08-30 23:07:44 - pico-train - INFO - Starting from step: 0
2025-08-30 23:07:44 - pico-train - INFO - Model Setup:
2025-08-30 23:07:44 - pico-train - INFO - └─ Total Parameters: 11,282,784
2025-08-30 23:07:44 - pico-train - INFO - └─ Trainable Parameters: 11,282,784
2025-08-30 23:07:44 - pico-train - INFO - Distributed Setup:
2025-08-30 23:07:44 - pico-train - INFO - └─ Number of Devices: 1
2025-08-30 23:07:44 - pico-train - INFO - └─ Device Type: NVIDIA H100 80GB HBM3
2025-08-30 23:07:44 - pico-train - INFO - └─ Available Memory: 85.03 GB
2025-08-30 23:07:44 - pico-train - INFO - Software Setup:
2025-08-30 23:07:44 - pico-train - INFO - └─ Python Version: 3.12.3
2025-08-30 23:07:44 - pico-train - INFO - └─ PyTorch Version: 2.8.0+cu128
2025-08-30 23:07:44 - pico-train - INFO - └─ CUDA Version: 12.8
2025-08-30 23:07:44 - pico-train - INFO - └─ Operating System: Linux 6.8.0-71-generic
2025-08-30 23:07:44 - pico-train - INFO - Batch Size Configuration:
2025-08-30 23:07:44 - pico-train - INFO - └─ Global Batch Size: 16
2025-08-30 23:07:44 - pico-train - INFO - └─ Per Device Batch Size: 16
2025-08-30 23:07:44 - pico-train - INFO - └─ Gradient Accumulation Steps: 1
2025-08-30 23:07:44 - pico-train - INFO - ==================================================
2025-08-30 23:07:45 - pico-train - INFO - Step 0 -- 🔄 Training Metrics
2025-08-30 23:07:45 - pico-train - INFO - ├── Loss: 10.9884
2025-08-30 23:07:45 - pico-train - INFO - ├── Learning Rate: 0.00e+00
2025-08-30 23:07:45 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-30 23:07:45 - pico-train - INFO - Step 0 -- 📈 Saving Learning Dynamics
2025-08-30 23:08:41 - pico-train - INFO - Step 100 -- 🔄 Training Metrics
2025-08-30 23:08:41 - pico-train - INFO - ├── Loss: 10.9746
2025-08-30 23:08:41 - pico-train - INFO - ├── Learning Rate: 1.00e-05
2025-08-30 23:08:41 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-30 23:09:35 - pico-train - INFO - Step 200 -- 🔄 Training Metrics
2025-08-30 23:09:35 - pico-train - INFO - ├── Loss: 10.7653
2025-08-30 23:09:35 - pico-train - INFO - ├── Learning Rate: 2.00e-05
2025-08-30 23:09:35 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-30 23:10:29 - pico-train - INFO - Step 300 -- 🔄 Training Metrics
2025-08-30 23:10:29 - pico-train - INFO - ├── Loss: 10.2902
2025-08-30 23:10:29 - pico-train - INFO - ├── Learning Rate: 3.00e-05
2025-08-30 23:10:29 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-30 23:11:23 - pico-train - INFO - Step 400 -- 🔄 Training Metrics
2025-08-30 23:11:23 - pico-train - INFO - ├── Loss: 9.8373
2025-08-30 23:11:23 - pico-train - INFO - ├── Learning Rate: 4.00e-05
2025-08-30 23:11:23 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-30 23:12:18 - pico-train - INFO - Step 500 -- 🔄 Training Metrics
2025-08-30 23:12:18 - pico-train - INFO - ├── Loss: 9.3629
2025-08-30 23:12:18 - pico-train - INFO - ├── Learning Rate: 5.00e-05
2025-08-30 23:12:18 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-30 23:13:12 - pico-train - INFO - Step 600 -- 🔄 Training Metrics
2025-08-30 23:13:12 - pico-train - INFO - ├── Loss: 8.8887
2025-08-30 23:13:12 - pico-train - INFO - ├── Learning Rate: 6.00e-05
2025-08-30 23:13:12 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-30 23:14:07 - pico-train - INFO - Step 700 -- 🔄 Training Metrics
2025-08-30 23:14:07 - pico-train - INFO - ├── Loss: 8.4408
2025-08-30 23:14:07 - pico-train - INFO - ├── Learning Rate: 7.00e-05
2025-08-30 23:14:07 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-30 23:15:02 - pico-train - INFO - Step 800 -- 🔄 Training Metrics
2025-08-30 23:15:02 - pico-train - INFO - ├── Loss: 8.0906
2025-08-30 23:15:02 - pico-train - INFO - ├── Learning Rate: 8.00e-05
2025-08-30 23:15:02 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-30 23:15:57 - pico-train - INFO - Step 900 -- 🔄 Training Metrics
2025-08-30 23:15:57 - pico-train - INFO - ├── Loss: 7.8459
2025-08-30 23:15:57 - pico-train - INFO - ├── Learning Rate: 9.00e-05
2025-08-30 23:15:57 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-30 23:16:52 - pico-train - INFO - Step 1000 -- 🔄 Training Metrics
2025-08-30 23:16:52 - pico-train - INFO - ├── Loss: 7.6972
2025-08-30 23:16:52 - pico-train - INFO - ├── Learning Rate: 1.00e-04
2025-08-30 23:16:52 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-30 23:17:47 - pico-train - INFO - Step 1100 -- 🔄 Training Metrics
2025-08-30 23:17:47 - pico-train - INFO - ├── Loss: 7.5570
2025-08-30 23:17:47 - pico-train - INFO - ├── Learning Rate: 1.10e-04
2025-08-30 23:17:47 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-30 23:18:42 - pico-train - INFO - Step 1200 -- 🔄 Training Metrics
2025-08-30 23:18:42 - pico-train - INFO - ├── Loss: 7.4823
2025-08-30 23:18:42 - pico-train - INFO - ├── Learning Rate: 1.20e-04
2025-08-30 23:18:42 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-30 23:19:38 - pico-train - INFO - Step 1300 -- 🔄 Training Metrics
2025-08-30 23:19:38 - pico-train - INFO - ├── Loss: 7.3624
2025-08-30 23:19:38 - pico-train - INFO - ├── Learning Rate: 1.30e-04
2025-08-30 23:19:38 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-30 23:20:33 - pico-train - INFO - Step 1400 -- 🔄 Training Metrics
2025-08-30 23:20:33 - pico-train - INFO - ├── Loss: 7.2538
2025-08-30 23:20:33 - pico-train - INFO - ├── Learning Rate: 1.40e-04
2025-08-30 23:20:33 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-30 23:21:28 - pico-train - INFO - Step 1500 -- 🔄 Training Metrics
2025-08-30 23:21:28 - pico-train - INFO - ├── Loss: 7.1582
2025-08-30 23:21:28 - pico-train - INFO - ├── Learning Rate: 1.50e-04
2025-08-30 23:21:28 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-30 23:22:23 - pico-train - INFO - Step 1600 -- 🔄 Training Metrics
2025-08-30 23:22:23 - pico-train - INFO - ├── Loss: 7.0462
2025-08-30 23:22:23 - pico-train - INFO - ├── Learning Rate: 1.60e-04
2025-08-30 23:22:23 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-30 23:23:18 - pico-train - INFO - Step 1700 -- 🔄 Training Metrics
2025-08-30 23:23:18 - pico-train - INFO - ├── Loss: 6.9729
2025-08-30 23:23:18 - pico-train - INFO - ├── Learning Rate: 1.70e-04
2025-08-30 23:23:18 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-30 23:24:13 - pico-train - INFO - Step 1800 -- 🔄 Training Metrics
2025-08-30 23:24:13 - pico-train - INFO - ├── Loss: 6.8825
2025-08-30 23:24:13 - pico-train - INFO - ├── Learning Rate: 1.80e-04
2025-08-30 23:24:13 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-30 23:25:08 - pico-train - INFO - Step 1900 -- 🔄 Training Metrics
2025-08-30 23:25:08 - pico-train - INFO - ├── Loss: 6.8003
2025-08-30 23:25:08 - pico-train - INFO - ├── Learning Rate: 1.90e-04
2025-08-30 23:25:08 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-30 23:26:02 - pico-train - INFO - Step 2000 -- 💾 Saving Checkpoint
2025-08-30 23:28:07 - pico-train - INFO - Step 2000 -- 📊 Evaluation Results
2025-08-30 23:28:07 - pico-train - INFO - └── paloma: 9.921214391079047e+20
2025-08-30 23:28:07 - pico-train - INFO - Step 2000 -- 🔄 Training Metrics
2025-08-30 23:28:07 - pico-train - INFO - ├── Loss: 6.7360
2025-08-30 23:28:07 - pico-train - INFO - ├── Learning Rate: 2.00e-04
2025-08-30 23:28:07 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-30 23:28:07 - pico-train - INFO - Step 2000 -- 📈 Saving Learning Dynamics
2025-08-30 23:29:04 - pico-train - INFO - Step 2100 -- 🔄 Training Metrics
2025-08-30 23:29:04 - pico-train - INFO - ├── Loss: 6.6658
2025-08-30 23:29:04 - pico-train - INFO - ├── Learning Rate: 2.00e-04
2025-08-30 23:29:04 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-30 23:29:58 - pico-train - INFO - Step 2200 -- 🔄 Training Metrics
2025-08-30 23:29:58 - pico-train - INFO - ├── Loss: 6.6040
2025-08-30 23:29:58 - pico-train - INFO - ├── Learning Rate: 2.00e-04
2025-08-30 23:29:58 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-30 23:30:53 - pico-train - INFO - Step 2300 -- 🔄 Training Metrics
2025-08-30 23:30:53 - pico-train - INFO - ├── Loss: 6.5360
2025-08-30 23:30:53 - pico-train - INFO - ├── Learning Rate: 2.00e-04
2025-08-30 23:30:53 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-30 23:31:47 - pico-train - INFO - Step 2400 -- 🔄 Training Metrics
2025-08-30 23:31:47 - pico-train - INFO - ├── Loss: 6.5011
2025-08-30 23:31:47 - pico-train - INFO - ├── Learning Rate: 2.00e-04
2025-08-30 23:31:47 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-30 23:32:43 - pico-train - INFO - Step 2500 -- 🔄 Training Metrics
2025-08-30 23:32:43 - pico-train - INFO - ├── Loss: 6.4541
2025-08-30 23:32:43 - pico-train - INFO - ├── Learning Rate: 2.00e-04
2025-08-30 23:32:43 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-30 23:33:38 - pico-train - INFO - Step 2600 -- 🔄 Training Metrics
2025-08-30 23:33:38 - pico-train - INFO - ├── Loss: 6.4299
2025-08-30 23:33:38 - pico-train - INFO - ├── Learning Rate: 2.00e-04
2025-08-30 23:33:38 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-30 23:34:34 - pico-train - INFO - Step 2700 -- 🔄 Training Metrics
2025-08-30 23:34:34 - pico-train - INFO - ├── Loss: 6.3677
2025-08-30 23:34:34 - pico-train - INFO - ├── Learning Rate: 2.00e-04
2025-08-30 23:34:34 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-30 23:35:28 - pico-train - INFO - Step 2800 -- 🔄 Training Metrics
2025-08-30 23:35:28 - pico-train - INFO - ├── Loss: 6.3537
2025-08-30 23:35:28 - pico-train - INFO - ├── Learning Rate: 2.00e-04
2025-08-30 23:35:28 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-30 23:36:24 - pico-train - INFO - Step 2900 -- 🔄 Training Metrics
2025-08-30 23:36:24 - pico-train - INFO - ├── Loss: 6.3225
2025-08-30 23:36:24 - pico-train - INFO - ├── Learning Rate: 2.00e-04
2025-08-30 23:36:24 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-30 23:37:19 - pico-train - INFO - Step 3000 -- 🔄 Training Metrics
2025-08-30 23:37:19 - pico-train - INFO - ├── Loss: 6.2806
2025-08-30 23:37:19 - pico-train - INFO - ├── Learning Rate: 2.00e-04
2025-08-30 23:37:19 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-30 23:38:14 - pico-train - INFO - Step 3100 -- 🔄 Training Metrics
2025-08-30 23:38:14 - pico-train - INFO - ├── Loss: 6.2626
2025-08-30 23:38:14 - pico-train - INFO - ├── Learning Rate: 2.00e-04
2025-08-30 23:38:14 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-30 23:39:09 - pico-train - INFO - Step 3200 -- 🔄 Training Metrics
2025-08-30 23:39:09 - pico-train - INFO - ├── Loss: 6.2206
2025-08-30 23:39:09 - pico-train - INFO - ├── Learning Rate: 2.00e-04
2025-08-30 23:39:09 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-30 23:40:04 - pico-train - INFO - Step 3300 -- 🔄 Training Metrics
2025-08-30 23:40:04 - pico-train - INFO - ├── Loss: 6.2140
2025-08-30 23:40:04 - pico-train - INFO - ├── Learning Rate: 2.00e-04
2025-08-30 23:40:04 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-30 23:40:58 - pico-train - INFO - Step 3400 -- 🔄 Training Metrics
2025-08-30 23:40:58 - pico-train - INFO - ├── Loss: 6.1525
2025-08-30 23:40:58 - pico-train - INFO - ├── Learning Rate: 2.00e-04
2025-08-30 23:40:58 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-30 23:41:52 - pico-train - INFO - Step 3500 -- 🔄 Training Metrics
2025-08-30 23:41:52 - pico-train - INFO - ├── Loss: 6.1104
2025-08-30 23:41:52 - pico-train - INFO - ├── Learning Rate: 2.00e-04
2025-08-30 23:41:52 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-30 23:42:46 - pico-train - INFO - Step 3600 -- 🔄 Training Metrics
2025-08-30 23:42:46 - pico-train - INFO - ├── Loss: 6.1327
2025-08-30 23:42:46 - pico-train - INFO - ├── Learning Rate: 2.00e-04
2025-08-30 23:42:46 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-30 23:43:40 - pico-train - INFO - Step 3700 -- 🔄 Training Metrics
2025-08-30 23:43:40 - pico-train - INFO - ├── Loss: 6.1046
2025-08-30 23:43:40 - pico-train - INFO - ├── Learning Rate: 2.00e-04
2025-08-30 23:43:40 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-30 23:44:33 - pico-train - INFO - Step 3800 -- 🔄 Training Metrics
2025-08-30 23:44:33 - pico-train - INFO - ├── Loss: 6.0910
2025-08-30 23:44:33 - pico-train - INFO - ├── Learning Rate: 2.00e-04
2025-08-30 23:44:33 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-30 23:45:28 - pico-train - INFO - Step 3900 -- 🔄 Training Metrics
2025-08-30 23:45:28 - pico-train - INFO - ├── Loss: 6.0369
2025-08-30 23:45:28 - pico-train - INFO - ├── Learning Rate: 2.00e-04
2025-08-30 23:45:28 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-30 23:46:22 - pico-train - INFO - Step 4000 -- 💾 Saving Checkpoint
2025-08-30 23:48:26 - pico-train - INFO - Step 4000 -- 📊 Evaluation Results
2025-08-30 23:48:26 - pico-train - INFO - └── paloma: 3.516233710165711e+23
2025-08-30 23:48:26 - pico-train - INFO - Step 4000 -- 🔄 Training Metrics
2025-08-30 23:48:26 - pico-train - INFO - ├── Loss: 6.0450
2025-08-30 23:48:26 - pico-train - INFO - ├── Learning Rate: 2.00e-04
2025-08-30 23:48:26 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-30 23:48:26 - pico-train - INFO - Step 4000 -- 📈 Saving Learning Dynamics
2025-08-30 23:49:24 - pico-train - INFO - Step 4100 -- 🔄 Training Metrics
2025-08-30 23:49:24 - pico-train - INFO - ├── Loss: 6.0120
2025-08-30 23:49:24 - pico-train - INFO - ├── Learning Rate: 2.00e-04
2025-08-30 23:49:24 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-30 23:50:18 - pico-train - INFO - Step 4200 -- 🔄 Training Metrics
2025-08-30 23:50:18 - pico-train - INFO - ├── Loss: 5.9897
2025-08-30 23:50:18 - pico-train - INFO - ├── Learning Rate: 2.00e-04
2025-08-30 23:50:18 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-30 23:51:14 - pico-train - INFO - Step 4300 -- 🔄 Training Metrics
2025-08-30 23:51:14 - pico-train - INFO - ├── Loss: 5.9636
2025-08-30 23:51:14 - pico-train - INFO - ├── Learning Rate: 2.00e-04
2025-08-30 23:51:14 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-30 23:52:09 - pico-train - INFO - Step 4400 -- 🔄 Training Metrics
2025-08-30 23:52:09 - pico-train - INFO - ├── Loss: 5.9759
2025-08-30 23:52:09 - pico-train - INFO - ├── Learning Rate: 2.00e-04
2025-08-30 23:52:09 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-30 23:53:04 - pico-train - INFO - Step 4500 -- 🔄 Training Metrics
2025-08-30 23:53:04 - pico-train - INFO - ├── Loss: 5.9551
2025-08-30 23:53:04 - pico-train - INFO - ├── Learning Rate: 2.00e-04
2025-08-30 23:53:04 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-30 23:53:58 - pico-train - INFO - Step 4600 -- 🔄 Training Metrics
2025-08-30 23:53:58 - pico-train - INFO - ├── Loss: 5.9165
2025-08-30 23:53:58 - pico-train - INFO - ├── Learning Rate: 2.00e-04
2025-08-30 23:53:58 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-30 23:54:53 - pico-train - INFO - Step 4700 -- 🔄 Training Metrics
2025-08-30 23:54:53 - pico-train - INFO - ├── Loss: 5.8939
2025-08-30 23:54:53 - pico-train - INFO - ├── Learning Rate: 2.00e-04
2025-08-30 23:54:53 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-30 23:55:48 - pico-train - INFO - Step 4800 -- 🔄 Training Metrics
2025-08-30 23:55:48 - pico-train - INFO - ├── Loss: 5.8769
2025-08-30 23:55:48 - pico-train - INFO - ├── Learning Rate: 2.00e-04
2025-08-30 23:55:48 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-30 23:56:43 - pico-train - INFO - Step 4900 -- 🔄 Training Metrics
2025-08-30 23:56:43 - pico-train - INFO - ├── Loss: 5.8605
2025-08-30 23:56:43 - pico-train - INFO - ├── Learning Rate: 2.00e-04
2025-08-30 23:56:43 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-30 23:57:39 - pico-train - INFO - Step 5000 -- 🔄 Training Metrics
2025-08-30 23:57:39 - pico-train - INFO - ├── Loss: 5.8727
2025-08-30 23:57:39 - pico-train - INFO - ├── Learning Rate: 2.00e-04
2025-08-30 23:57:39 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-30 23:58:33 - pico-train - INFO - Step 5100 -- 🔄 Training Metrics
2025-08-30 23:58:33 - pico-train - INFO - ├── Loss: 5.8446
2025-08-30 23:58:33 - pico-train - INFO - ├── Learning Rate: 2.00e-04
2025-08-30 23:58:33 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-30 23:59:29 - pico-train - INFO - Step 5200 -- 🔄 Training Metrics
2025-08-30 23:59:29 - pico-train - INFO - ├── Loss: 5.8289
2025-08-30 23:59:29 - pico-train - INFO - ├── Learning Rate: 1.99e-04
2025-08-30 23:59:29 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-31 00:00:24 - pico-train - INFO - Step 5300 -- 🔄 Training Metrics
2025-08-31 00:00:24 - pico-train - INFO - ├── Loss: 5.8220
2025-08-31 00:00:24 - pico-train - INFO - ├── Learning Rate: 1.99e-04
2025-08-31 00:00:24 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-31 00:01:19 - pico-train - INFO - Step 5400 -- 🔄 Training Metrics
2025-08-31 00:01:19 - pico-train - INFO - ├── Loss: 5.8132
2025-08-31 00:01:19 - pico-train - INFO - ├── Learning Rate: 1.99e-04
2025-08-31 00:01:19 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-31 00:02:15 - pico-train - INFO - Step 5500 -- 🔄 Training Metrics
2025-08-31 00:02:15 - pico-train - INFO - ├── Loss: 5.7878
2025-08-31 00:02:15 - pico-train - INFO - ├── Learning Rate: 1.99e-04
2025-08-31 00:02:15 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-31 00:03:09 - pico-train - INFO - Step 5600 -- 🔄 Training Metrics
2025-08-31 00:03:09 - pico-train - INFO - ├── Loss: 5.7639
2025-08-31 00:03:09 - pico-train - INFO - ├── Learning Rate: 1.99e-04
2025-08-31 00:03:09 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-31 00:04:04 - pico-train - INFO - Step 5700 -- 🔄 Training Metrics
2025-08-31 00:04:04 - pico-train - INFO - ├── Loss: 5.7698
2025-08-31 00:04:04 - pico-train - INFO - ├── Learning Rate: 1.99e-04
2025-08-31 00:04:04 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-31 00:04:59 - pico-train - INFO - Step 5800 -- 🔄 Training Metrics
2025-08-31 00:04:59 - pico-train - INFO - ├── Loss: 5.7458
2025-08-31 00:04:59 - pico-train - INFO - ├── Learning Rate: 1.99e-04
2025-08-31 00:04:59 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-31 00:05:55 - pico-train - INFO - Step 5900 -- 🔄 Training Metrics
2025-08-31 00:05:55 - pico-train - INFO - ├── Loss: 5.7482
2025-08-31 00:05:55 - pico-train - INFO - ├── Learning Rate: 1.99e-04
2025-08-31 00:05:55 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-31 00:06:49 - pico-train - INFO - Step 6000 -- 💾 Saving Checkpoint
2025-08-31 00:08:44 - pico-train - INFO - Step 6000 -- 📊 Evaluation Results
2025-08-31 00:08:44 - pico-train - INFO - └── paloma: 5.501352274432362e+26
2025-08-31 00:08:44 - pico-train - INFO - Step 6000 -- 🔄 Training Metrics
2025-08-31 00:08:44 - pico-train - INFO - ├── Loss: 5.7402
2025-08-31 00:08:44 - pico-train - INFO - ├── Learning Rate: 1.99e-04
2025-08-31 00:08:44 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-31 00:08:44 - pico-train - INFO - Step 6000 -- 📈 Saving Learning Dynamics
2025-08-31 00:09:41 - pico-train - INFO - Step 6100 -- 🔄 Training Metrics
2025-08-31 00:09:41 - pico-train - INFO - ├── Loss: 5.7377
2025-08-31 00:09:41 - pico-train - INFO - ├── Learning Rate: 1.99e-04
2025-08-31 00:09:41 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-31 00:10:36 - pico-train - INFO - Step 6200 -- 🔄 Training Metrics
2025-08-31 00:10:36 - pico-train - INFO - ├── Loss: 5.6952
2025-08-31 00:10:36 - pico-train - INFO - ├── Learning Rate: 1.99e-04
2025-08-31 00:10:36 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-31 00:11:31 - pico-train - INFO - Step 6300 -- 🔄 Training Metrics
2025-08-31 00:11:31 - pico-train - INFO - ├── Loss: 5.6845
2025-08-31 00:11:31 - pico-train - INFO - ├── Learning Rate: 1.99e-04
2025-08-31 00:11:31 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-31 00:12:27 - pico-train - INFO - Step 6400 -- 🔄 Training Metrics
2025-08-31 00:12:27 - pico-train - INFO - ├── Loss: 5.6903
2025-08-31 00:12:27 - pico-train - INFO - ├── Learning Rate: 1.99e-04
2025-08-31 00:12:27 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-31 00:13:22 - pico-train - INFO - Step 6500 -- 🔄 Training Metrics
2025-08-31 00:13:22 - pico-train - INFO - ├── Loss: 5.6877
2025-08-31 00:13:22 - pico-train - INFO - ├── Learning Rate: 1.99e-04
2025-08-31 00:13:22 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-31 00:14:17 - pico-train - INFO - Step 6600 -- 🔄 Training Metrics
2025-08-31 00:14:17 - pico-train - INFO - ├── Loss: 5.6538
2025-08-31 00:14:17 - pico-train - INFO - ├── Learning Rate: 1.99e-04
2025-08-31 00:14:17 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-31 00:15:12 - pico-train - INFO - Step 6700 -- 🔄 Training Metrics
2025-08-31 00:15:12 - pico-train - INFO - ├── Loss: 5.6437
2025-08-31 00:15:12 - pico-train - INFO - ├── Learning Rate: 1.99e-04
2025-08-31 00:15:12 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-31 00:16:07 - pico-train - INFO - Step 6800 -- 🔄 Training Metrics
2025-08-31 00:16:07 - pico-train - INFO - ├── Loss: 5.6444
2025-08-31 00:16:07 - pico-train - INFO - ├── Learning Rate: 1.99e-04
2025-08-31 00:16:07 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-31 00:17:03 - pico-train - INFO - Step 6900 -- 🔄 Training Metrics
2025-08-31 00:17:03 - pico-train - INFO - ├── Loss: 5.6238
2025-08-31 00:17:03 - pico-train - INFO - ├── Learning Rate: 1.99e-04
2025-08-31 00:17:03 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-31 00:17:57 - pico-train - INFO - Step 7000 -- 🔄 Training Metrics
2025-08-31 00:17:57 - pico-train - INFO - ├── Loss: 5.6188
2025-08-31 00:17:57 - pico-train - INFO - ├── Learning Rate: 1.99e-04
2025-08-31 00:17:57 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-31 00:18:53 - pico-train - INFO - Step 7100 -- 🔄 Training Metrics
2025-08-31 00:18:53 - pico-train - INFO - ├── Loss: 5.5782
2025-08-31 00:18:53 - pico-train - INFO - ├── Learning Rate: 1.99e-04
2025-08-31 00:18:53 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-31 00:19:48 - pico-train - INFO - Step 7200 -- 🔄 Training Metrics
2025-08-31 00:19:48 - pico-train - INFO - ├── Loss: 5.6112
2025-08-31 00:19:48 - pico-train - INFO - ├── Learning Rate: 1.99e-04
2025-08-31 00:19:48 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-31 00:20:43 - pico-train - INFO - Step 7300 -- 🔄 Training Metrics
2025-08-31 00:20:43 - pico-train - INFO - ├── Loss: 5.5985
2025-08-31 00:20:43 - pico-train - INFO - ├── Learning Rate: 1.99e-04
2025-08-31 00:20:43 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-31 00:21:38 - pico-train - INFO - Step 7400 -- 🔄 Training Metrics
2025-08-31 00:21:38 - pico-train - INFO - ├── Loss: 5.6009
2025-08-31 00:21:38 - pico-train - INFO - ├── Learning Rate: 1.99e-04
2025-08-31 00:21:38 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-31 00:22:33 - pico-train - INFO - Step 7500 -- 🔄 Training Metrics
2025-08-31 00:22:33 - pico-train - INFO - ├── Loss: 5.5714
2025-08-31 00:22:33 - pico-train - INFO - ├── Learning Rate: 1.98e-04
2025-08-31 00:22:33 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-31 00:23:28 - pico-train - INFO - Step 7600 -- 🔄 Training Metrics
2025-08-31 00:23:28 - pico-train - INFO - ├── Loss: 5.5714
2025-08-31 00:23:28 - pico-train - INFO - ├── Learning Rate: 1.98e-04
2025-08-31 00:23:28 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-31 00:24:25 - pico-train - INFO - Step 7700 -- 🔄 Training Metrics
2025-08-31 00:24:25 - pico-train - INFO - ├── Loss: 5.5653
2025-08-31 00:24:25 - pico-train - INFO - ├── Learning Rate: 1.98e-04
2025-08-31 00:24:25 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-31 00:25:20 - pico-train - INFO - Step 7800 -- 🔄 Training Metrics
2025-08-31 00:25:20 - pico-train - INFO - ├── Loss: 5.5559
2025-08-31 00:25:20 - pico-train - INFO - ├── Learning Rate: 1.98e-04
2025-08-31 00:25:20 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-31 00:26:15 - pico-train - INFO - Step 7900 -- 🔄 Training Metrics
2025-08-31 00:26:15 - pico-train - INFO - ├── Loss: 5.5568
2025-08-31 00:26:15 - pico-train - INFO - ├── Learning Rate: 1.98e-04
2025-08-31 00:26:15 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-31 00:27:10 - pico-train - INFO - Step 8000 -- 💾 Saving Checkpoint
2025-08-31 00:29:13 - pico-train - INFO - Step 8000 -- 📊 Evaluation Results
2025-08-31 00:29:13 - pico-train - INFO - └── paloma: 2.7741731039516784e+30
2025-08-31 00:29:14 - pico-train - INFO - Step 8000 -- 🔄 Training Metrics
2025-08-31 00:29:14 - pico-train - INFO - ├── Loss: 5.5269
2025-08-31 00:29:14 - pico-train - INFO - ├── Learning Rate: 1.98e-04
2025-08-31 00:29:14 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-31 00:29:14 - pico-train - INFO - Step 8000 -- 📈 Saving Learning Dynamics
2025-08-31 00:30:11 - pico-train - INFO - Step 8100 -- 🔄 Training Metrics
2025-08-31 00:30:11 - pico-train - INFO - ├── Loss: 5.5294
2025-08-31 00:30:11 - pico-train - INFO - ├── Learning Rate: 1.98e-04
2025-08-31 00:30:11 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-31 00:31:05 - pico-train - INFO - Step 8200 -- 🔄 Training Metrics
2025-08-31 00:31:05 - pico-train - INFO - ├── Loss: 5.5352
2025-08-31 00:31:05 - pico-train - INFO - ├── Learning Rate: 1.98e-04
2025-08-31 00:31:05 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-31 00:31:59 - pico-train - INFO - Step 8300 -- 🔄 Training Metrics
2025-08-31 00:31:59 - pico-train - INFO - ├── Loss: 5.5280
2025-08-31 00:31:59 - pico-train - INFO - ├── Learning Rate: 1.98e-04
2025-08-31 00:31:59 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-31 00:32:54 - pico-train - INFO - Step 8400 -- 🔄 Training Metrics
2025-08-31 00:32:54 - pico-train - INFO - ├── Loss: 5.4997
2025-08-31 00:32:54 - pico-train - INFO - ├── Learning Rate: 1.98e-04
2025-08-31 00:32:54 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-31 00:33:50 - pico-train - INFO - Step 8500 -- 🔄 Training Metrics
2025-08-31 00:33:50 - pico-train - INFO - ├── Loss: 5.4873
2025-08-31 00:33:50 - pico-train - INFO - ├── Learning Rate: 1.98e-04
2025-08-31 00:33:50 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-31 00:34:45 - pico-train - INFO - Step 8600 -- 🔄 Training Metrics
2025-08-31 00:34:45 - pico-train - INFO - ├── Loss: 5.5047
2025-08-31 00:34:45 - pico-train - INFO - ├── Learning Rate: 1.98e-04
2025-08-31 00:34:45 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-31 00:35:41 - pico-train - INFO - Step 8700 -- 🔄 Training Metrics
2025-08-31 00:35:41 - pico-train - INFO - ├── Loss: 5.4809
2025-08-31 00:35:41 - pico-train - INFO - ├── Learning Rate: 1.98e-04
2025-08-31 00:35:41 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-31 00:36:35 - pico-train - INFO - Step 8800 -- 🔄 Training Metrics
2025-08-31 00:36:35 - pico-train - INFO - ├── Loss: 5.4859
2025-08-31 00:36:35 - pico-train - INFO - ├── Learning Rate: 1.98e-04
2025-08-31 00:36:35 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-31 00:37:31 - pico-train - INFO - Step 8900 -- 🔄 Training Metrics
2025-08-31 00:37:31 - pico-train - INFO - ├── Loss: 5.4690
2025-08-31 00:37:31 - pico-train - INFO - ├── Learning Rate: 1.98e-04
2025-08-31 00:37:31 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-31 00:38:28 - pico-train - INFO - Step 9000 -- 🔄 Training Metrics
2025-08-31 00:38:28 - pico-train - INFO - ├── Loss: 5.4476
2025-08-31 00:38:28 - pico-train - INFO - ├── Learning Rate: 1.97e-04
2025-08-31 00:38:28 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-31 00:39:24 - pico-train - INFO - Step 9100 -- 🔄 Training Metrics
2025-08-31 00:39:24 - pico-train - INFO - ├── Loss: 5.4383
2025-08-31 00:39:24 - pico-train - INFO - ├── Learning Rate: 1.97e-04
2025-08-31 00:39:24 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-31 00:40:18 - pico-train - INFO - Step 9200 -- 🔄 Training Metrics
2025-08-31 00:40:18 - pico-train - INFO - ├── Loss: 5.4569
2025-08-31 00:40:18 - pico-train - INFO - ├── Learning Rate: 1.97e-04
2025-08-31 00:40:18 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-31 00:41:14 - pico-train - INFO - Step 9300 -- 🔄 Training Metrics
2025-08-31 00:41:14 - pico-train - INFO - ├── Loss: 5.4367
2025-08-31 00:41:14 - pico-train - INFO - ├── Learning Rate: 1.97e-04
2025-08-31 00:41:14 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-31 00:42:09 - pico-train - INFO - Step 9400 -- 🔄 Training Metrics
2025-08-31 00:42:09 - pico-train - INFO - ├── Loss: 5.4623
2025-08-31 00:42:09 - pico-train - INFO - ├── Learning Rate: 1.97e-04
2025-08-31 00:42:09 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-31 00:43:05 - pico-train - INFO - Step 9500 -- 🔄 Training Metrics
2025-08-31 00:43:05 - pico-train - INFO - ├── Loss: 5.4268
2025-08-31 00:43:05 - pico-train - INFO - ├── Learning Rate: 1.97e-04
2025-08-31 00:43:05 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-31 00:44:01 - pico-train - INFO - Step 9600 -- 🔄 Training Metrics
2025-08-31 00:44:01 - pico-train - INFO - ├── Loss: 5.4639
2025-08-31 00:44:01 - pico-train - INFO - ├── Learning Rate: 1.97e-04 2025-08-31 00:44:01 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-31 00:44:55 - pico-train - INFO - Step 9700 -- 🔄 Training Metrics 2025-08-31 00:44:55 - pico-train - INFO - ├── Loss: 5.4521 2025-08-31 00:44:55 - pico-train - INFO - ├── Learning Rate: 1.97e-04 2025-08-31 00:44:55 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-31 00:45:51 - pico-train - INFO - Step 9800 -- 🔄 Training Metrics 2025-08-31 00:45:51 - pico-train - INFO - ├── Loss: 5.4139 2025-08-31 00:45:51 - pico-train - INFO - ├── Learning Rate: 1.97e-04 2025-08-31 00:45:51 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-31 00:46:46 - pico-train - INFO - Step 9900 -- 🔄 Training Metrics 2025-08-31 00:46:46 - pico-train - INFO - ├── Loss: 5.4026 2025-08-31 00:46:46 - pico-train - INFO - ├── Learning Rate: 1.97e-04 2025-08-31 00:46:46 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-31 00:47:41 - pico-train - INFO - Step 10000 -- 💾 Saving Checkpoint 2025-08-31 00:49:33 - pico-train - INFO - Step 10000 -- 📊 Evaluation Results 2025-08-31 00:49:33 - pico-train - INFO - └── paloma: 1.0181753654885411e+35 2025-08-31 00:49:34 - pico-train - INFO - Step 10000 -- 🔄 Training Metrics 2025-08-31 00:49:34 - pico-train - INFO - ├── Loss: 5.4191 2025-08-31 00:49:34 - pico-train - INFO - ├── Learning Rate: 1.97e-04 2025-08-31 00:49:34 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-31 00:49:34 - pico-train - INFO - Step 10000 -- 📈 Saving Learning Dynamics 2025-08-31 00:50:31 - pico-train - INFO - Step 10100 -- 🔄 Training Metrics 2025-08-31 00:50:31 - pico-train - INFO - ├── Loss: 5.3756 2025-08-31 00:50:31 - pico-train - INFO - ├── Learning Rate: 1.97e-04 2025-08-31 00:50:31 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-31 00:51:24 - pico-train - INFO - Step 10200 -- 🔄 Training Metrics 2025-08-31 00:51:24 - pico-train - INFO - ├── Loss: 5.3976 2025-08-31 00:51:24 - pico-train - INFO - ├── Learning Rate: 1.97e-04 2025-08-31 
00:51:24 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-31 00:52:21 - pico-train - INFO - Step 10300 -- 🔄 Training Metrics 2025-08-31 00:52:21 - pico-train - INFO - ├── Loss: 5.4049 2025-08-31 00:52:21 - pico-train - INFO - ├── Learning Rate: 1.96e-04 2025-08-31 00:52:21 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-31 00:53:16 - pico-train - INFO - Step 10400 -- 🔄 Training Metrics 2025-08-31 00:53:16 - pico-train - INFO - ├── Loss: 5.3991 2025-08-31 00:53:16 - pico-train - INFO - ├── Learning Rate: 1.96e-04 2025-08-31 00:53:16 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-31 00:54:11 - pico-train - INFO - Step 10500 -- 🔄 Training Metrics 2025-08-31 00:54:11 - pico-train - INFO - ├── Loss: 5.4016 2025-08-31 00:54:11 - pico-train - INFO - ├── Learning Rate: 1.96e-04 2025-08-31 00:54:11 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-31 00:55:06 - pico-train - INFO - Step 10600 -- 🔄 Training Metrics 2025-08-31 00:55:06 - pico-train - INFO - ├── Loss: 5.3924 2025-08-31 00:55:06 - pico-train - INFO - ├── Learning Rate: 1.96e-04 2025-08-31 00:55:06 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-31 00:56:01 - pico-train - INFO - Step 10700 -- 🔄 Training Metrics 2025-08-31 00:56:01 - pico-train - INFO - ├── Loss: 5.3781 2025-08-31 00:56:01 - pico-train - INFO - ├── Learning Rate: 1.96e-04 2025-08-31 00:56:01 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-31 00:56:57 - pico-train - INFO - Step 10800 -- 🔄 Training Metrics 2025-08-31 00:56:57 - pico-train - INFO - ├── Loss: 5.3433 2025-08-31 00:56:57 - pico-train - INFO - ├── Learning Rate: 1.96e-04 2025-08-31 00:56:57 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-31 00:57:52 - pico-train - INFO - Step 10900 -- 🔄 Training Metrics 2025-08-31 00:57:52 - pico-train - INFO - ├── Loss: 5.3610 2025-08-31 00:57:52 - pico-train - INFO - ├── Learning Rate: 1.96e-04 2025-08-31 00:57:52 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-31 00:58:48 - pico-train - INFO - Step 11000 -- 🔄 Training Metrics 
2025-08-31 00:58:48 - pico-train - INFO - ├── Loss: 5.3561 2025-08-31 00:58:48 - pico-train - INFO - ├── Learning Rate: 1.96e-04 2025-08-31 00:58:48 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-31 00:59:42 - pico-train - INFO - Step 11100 -- 🔄 Training Metrics 2025-08-31 00:59:42 - pico-train - INFO - ├── Loss: 5.3818 2025-08-31 00:59:42 - pico-train - INFO - ├── Learning Rate: 1.96e-04 2025-08-31 00:59:42 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-31 01:00:38 - pico-train - INFO - Step 11200 -- 🔄 Training Metrics 2025-08-31 01:00:38 - pico-train - INFO - ├── Loss: 5.3596 2025-08-31 01:00:38 - pico-train - INFO - ├── Learning Rate: 1.96e-04 2025-08-31 01:00:38 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-31 01:01:33 - pico-train - INFO - Step 11300 -- 🔄 Training Metrics 2025-08-31 01:01:33 - pico-train - INFO - ├── Loss: 5.3558 2025-08-31 01:01:33 - pico-train - INFO - ├── Learning Rate: 1.96e-04 2025-08-31 01:01:33 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-31 01:02:29 - pico-train - INFO - Step 11400 -- 🔄 Training Metrics 2025-08-31 01:02:29 - pico-train - INFO - ├── Loss: 5.3484 2025-08-31 01:02:29 - pico-train - INFO - ├── Learning Rate: 1.95e-04 2025-08-31 01:02:29 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-31 01:03:24 - pico-train - INFO - Step 11500 -- 🔄 Training Metrics 2025-08-31 01:03:24 - pico-train - INFO - ├── Loss: 5.3399 2025-08-31 01:03:24 - pico-train - INFO - ├── Learning Rate: 1.95e-04 2025-08-31 01:03:24 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-31 01:04:20 - pico-train - INFO - Step 11600 -- 🔄 Training Metrics 2025-08-31 01:04:20 - pico-train - INFO - ├── Loss: 5.3252 2025-08-31 01:04:20 - pico-train - INFO - ├── Learning Rate: 1.95e-04 2025-08-31 01:04:20 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-31 01:05:15 - pico-train - INFO - Step 11700 -- 🔄 Training Metrics 2025-08-31 01:05:15 - pico-train - INFO - ├── Loss: 5.3194 2025-08-31 01:05:15 - pico-train - INFO - ├── Learning Rate: 1.95e-04 
2025-08-31 01:05:15 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-31 01:06:11 - pico-train - INFO - Step 11800 -- 🔄 Training Metrics 2025-08-31 01:06:11 - pico-train - INFO - ├── Loss: 5.3387 2025-08-31 01:06:11 - pico-train - INFO - ├── Learning Rate: 1.95e-04 2025-08-31 01:06:11 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-31 01:07:06 - pico-train - INFO - Step 11900 -- 🔄 Training Metrics 2025-08-31 01:07:06 - pico-train - INFO - ├── Loss: 5.3459 2025-08-31 01:07:06 - pico-train - INFO - ├── Learning Rate: 1.95e-04 2025-08-31 01:07:06 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-31 01:08:00 - pico-train - INFO - Step 12000 -- 💾 Saving Checkpoint 2025-08-31 01:09:55 - pico-train - INFO - Step 12000 -- 📊 Evaluation Results 2025-08-31 01:09:55 - pico-train - INFO - └── paloma: inf 2025-08-31 01:09:55 - pico-train - INFO - Step 12000 -- 🔄 Training Metrics 2025-08-31 01:09:55 - pico-train - INFO - ├── Loss: 5.3216 2025-08-31 01:09:55 - pico-train - INFO - ├── Learning Rate: 1.95e-04 2025-08-31 01:09:55 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-31 01:09:55 - pico-train - INFO - Step 12000 -- 📈 Saving Learning Dynamics 2025-08-31 01:10:52 - pico-train - INFO - Step 12100 -- 🔄 Training Metrics 2025-08-31 01:10:52 - pico-train - INFO - ├── Loss: 5.3164 2025-08-31 01:10:52 - pico-train - INFO - ├── Learning Rate: 1.95e-04 2025-08-31 01:10:52 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-31 01:11:47 - pico-train - INFO - Step 12200 -- 🔄 Training Metrics 2025-08-31 01:11:47 - pico-train - INFO - ├── Loss: 5.3075 2025-08-31 01:11:47 - pico-train - INFO - ├── Learning Rate: 1.95e-04 2025-08-31 01:11:47 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-31 01:12:41 - pico-train - INFO - Step 12300 -- 🔄 Training Metrics 2025-08-31 01:12:41 - pico-train - INFO - ├── Loss: 5.2908 2025-08-31 01:12:41 - pico-train - INFO - ├── Learning Rate: 1.95e-04 2025-08-31 01:12:41 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-31 01:13:36 - pico-train - INFO 
- Step 12400 -- 🔄 Training Metrics 2025-08-31 01:13:36 - pico-train - INFO - ├── Loss: 5.2970 2025-08-31 01:13:36 - pico-train - INFO - ├── Learning Rate: 1.94e-04 2025-08-31 01:13:36 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-31 01:14:31 - pico-train - INFO - Step 12500 -- 🔄 Training Metrics 2025-08-31 01:14:31 - pico-train - INFO - ├── Loss: 5.2850 2025-08-31 01:14:31 - pico-train - INFO - ├── Learning Rate: 1.94e-04 2025-08-31 01:14:31 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-31 01:15:26 - pico-train - INFO - Step 12600 -- 🔄 Training Metrics 2025-08-31 01:15:26 - pico-train - INFO - ├── Loss: 5.3027 2025-08-31 01:15:26 - pico-train - INFO - ├── Learning Rate: 1.94e-04 2025-08-31 01:15:26 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-31 01:16:22 - pico-train - INFO - Step 12700 -- 🔄 Training Metrics 2025-08-31 01:16:22 - pico-train - INFO - ├── Loss: 5.2792 2025-08-31 01:16:22 - pico-train - INFO - ├── Learning Rate: 1.94e-04 2025-08-31 01:16:22 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-31 01:17:31 - pico-train - INFO - Step 12800 -- 🔄 Training Metrics 2025-08-31 01:17:31 - pico-train - INFO - ├── Loss: 5.3026 2025-08-31 01:17:31 - pico-train - INFO - ├── Learning Rate: 1.94e-04 2025-08-31 01:17:31 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-31 01:18:25 - pico-train - INFO - Step 12900 -- 🔄 Training Metrics 2025-08-31 01:18:25 - pico-train - INFO - ├── Loss: 5.2918 2025-08-31 01:18:25 - pico-train - INFO - ├── Learning Rate: 1.94e-04 2025-08-31 01:18:25 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-31 01:19:21 - pico-train - INFO - Step 13000 -- 🔄 Training Metrics 2025-08-31 01:19:21 - pico-train - INFO - ├── Loss: 5.3032 2025-08-31 01:19:21 - pico-train - INFO - ├── Learning Rate: 1.94e-04 2025-08-31 01:19:21 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-31 01:20:16 - pico-train - INFO - Step 13100 -- 🔄 Training Metrics 2025-08-31 01:20:16 - pico-train - INFO - ├── Loss: 5.2887 2025-08-31 01:20:16 - pico-train - 
INFO - ├── Learning Rate: 1.94e-04 2025-08-31 01:20:16 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-31 01:21:12 - pico-train - INFO - Step 13200 -- 🔄 Training Metrics 2025-08-31 01:21:12 - pico-train - INFO - ├── Loss: 5.2853 2025-08-31 01:21:12 - pico-train - INFO - ├── Learning Rate: 1.94e-04 2025-08-31 01:21:12 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-31 01:22:07 - pico-train - INFO - Step 13300 -- 🔄 Training Metrics 2025-08-31 01:22:07 - pico-train - INFO - ├── Loss: 5.2829 2025-08-31 01:22:07 - pico-train - INFO - ├── Learning Rate: 1.94e-04 2025-08-31 01:22:07 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-31 01:23:02 - pico-train - INFO - Step 13400 -- 🔄 Training Metrics 2025-08-31 01:23:02 - pico-train - INFO - ├── Loss: 5.2773 2025-08-31 01:23:02 - pico-train - INFO - ├── Learning Rate: 1.93e-04 2025-08-31 01:23:02 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-31 01:23:57 - pico-train - INFO - Step 13500 -- 🔄 Training Metrics 2025-08-31 01:23:57 - pico-train - INFO - ├── Loss: 5.2645 2025-08-31 01:23:57 - pico-train - INFO - ├── Learning Rate: 1.93e-04 2025-08-31 01:23:57 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-31 01:24:53 - pico-train - INFO - Step 13600 -- 🔄 Training Metrics 2025-08-31 01:24:53 - pico-train - INFO - ├── Loss: 5.2692 2025-08-31 01:24:53 - pico-train - INFO - ├── Learning Rate: 1.93e-04 2025-08-31 01:24:53 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-31 01:25:48 - pico-train - INFO - Step 13700 -- 🔄 Training Metrics 2025-08-31 01:25:48 - pico-train - INFO - ├── Loss: 5.2500 2025-08-31 01:25:48 - pico-train - INFO - ├── Learning Rate: 1.93e-04 2025-08-31 01:25:48 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-31 01:26:44 - pico-train - INFO - Step 13800 -- 🔄 Training Metrics 2025-08-31 01:26:44 - pico-train - INFO - ├── Loss: 5.2655 2025-08-31 01:26:44 - pico-train - INFO - ├── Learning Rate: 1.93e-04 2025-08-31 01:26:44 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-31 01:27:39 - pico-train 
- INFO - Step 13900 -- 🔄 Training Metrics 2025-08-31 01:27:39 - pico-train - INFO - ├── Loss: 5.2591 2025-08-31 01:27:39 - pico-train - INFO - ├── Learning Rate: 1.93e-04 2025-08-31 01:27:39 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-31 01:28:34 - pico-train - INFO - Step 14000 -- 💾 Saving Checkpoint 2025-08-31 01:30:35 - pico-train - INFO - Step 14000 -- 📊 Evaluation Results 2025-08-31 01:30:35 - pico-train - INFO - └── paloma: inf 2025-08-31 01:30:36 - pico-train - INFO - Step 14000 -- 🔄 Training Metrics 2025-08-31 01:30:36 - pico-train - INFO - ├── Loss: 5.2632 2025-08-31 01:30:36 - pico-train - INFO - ├── Learning Rate: 1.93e-04 2025-08-31 01:30:36 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-31 01:30:36 - pico-train - INFO - Step 14000 -- 📈 Saving Learning Dynamics 2025-08-31 01:31:33 - pico-train - INFO - Step 14100 -- 🔄 Training Metrics 2025-08-31 01:31:33 - pico-train - INFO - ├── Loss: 5.2383 2025-08-31 01:31:33 - pico-train - INFO - ├── Learning Rate: 1.93e-04 2025-08-31 01:31:33 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-31 01:32:27 - pico-train - INFO - Step 14200 -- 🔄 Training Metrics 2025-08-31 01:32:27 - pico-train - INFO - ├── Loss: 5.2505 2025-08-31 01:32:27 - pico-train - INFO - ├── Learning Rate: 1.92e-04 2025-08-31 01:32:27 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-31 01:33:21 - pico-train - INFO - Step 14300 -- 🔄 Training Metrics 2025-08-31 01:33:21 - pico-train - INFO - ├── Loss: 5.2444 2025-08-31 01:33:21 - pico-train - INFO - ├── Learning Rate: 1.92e-04 2025-08-31 01:33:21 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-31 01:34:16 - pico-train - INFO - Step 14400 -- 🔄 Training Metrics 2025-08-31 01:34:16 - pico-train - INFO - ├── Loss: 5.2410 2025-08-31 01:34:16 - pico-train - INFO - ├── Learning Rate: 1.92e-04 2025-08-31 01:34:16 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-31 01:35:10 - pico-train - INFO - Step 14500 -- 🔄 Training Metrics 2025-08-31 01:35:10 - pico-train - INFO - ├── Loss: 5.2553 
2025-08-31 01:35:10 - pico-train - INFO - ├── Learning Rate: 1.92e-04 2025-08-31 01:35:10 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-31 01:36:04 - pico-train - INFO - Step 14600 -- 🔄 Training Metrics 2025-08-31 01:36:04 - pico-train - INFO - ├── Loss: 5.2326 2025-08-31 01:36:04 - pico-train - INFO - ├── Learning Rate: 1.92e-04 2025-08-31 01:36:04 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-31 01:36:58 - pico-train - INFO - Step 14700 -- 🔄 Training Metrics 2025-08-31 01:36:58 - pico-train - INFO - ├── Loss: 5.2398 2025-08-31 01:36:58 - pico-train - INFO - ├── Learning Rate: 1.92e-04 2025-08-31 01:36:58 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-31 01:37:52 - pico-train - INFO - Step 14800 -- 🔄 Training Metrics 2025-08-31 01:37:52 - pico-train - INFO - ├── Loss: 5.2355 2025-08-31 01:37:52 - pico-train - INFO - ├── Learning Rate: 1.92e-04 2025-08-31 01:37:52 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-31 01:38:46 - pico-train - INFO - Step 14900 -- 🔄 Training Metrics 2025-08-31 01:38:46 - pico-train - INFO - ├── Loss: 5.2329 2025-08-31 01:38:46 - pico-train - INFO - ├── Learning Rate: 1.92e-04 2025-08-31 01:38:46 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-31 01:39:41 - pico-train - INFO - Step 15000 -- 🔄 Training Metrics 2025-08-31 01:39:41 - pico-train - INFO - ├── Loss: 5.2309 2025-08-31 01:39:41 - pico-train - INFO - ├── Learning Rate: 1.91e-04 2025-08-31 01:39:41 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-31 01:40:35 - pico-train - INFO - Step 15100 -- 🔄 Training Metrics 2025-08-31 01:40:35 - pico-train - INFO - ├── Loss: 5.2283 2025-08-31 01:40:35 - pico-train - INFO - ├── Learning Rate: 1.91e-04 2025-08-31 01:40:35 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-31 01:41:29 - pico-train - INFO - Step 15200 -- 🔄 Training Metrics 2025-08-31 01:41:29 - pico-train - INFO - ├── Loss: 5.2301 2025-08-31 01:41:29 - pico-train - INFO - ├── Learning Rate: 1.91e-04 2025-08-31 01:41:29 - pico-train - INFO - └── Inf/NaN count: 
0 2025-08-31 01:42:23 - pico-train - INFO - Step 15300 -- 🔄 Training Metrics 2025-08-31 01:42:23 - pico-train - INFO - ├── Loss: 5.2524 2025-08-31 01:42:23 - pico-train - INFO - ├── Learning Rate: 1.91e-04 2025-08-31 01:42:23 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-31 01:43:18 - pico-train - INFO - Step 15400 -- 🔄 Training Metrics 2025-08-31 01:43:18 - pico-train - INFO - ├── Loss: 5.2164 2025-08-31 01:43:18 - pico-train - INFO - ├── Learning Rate: 1.91e-04 2025-08-31 01:43:18 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-31 01:44:14 - pico-train - INFO - Step 15500 -- 🔄 Training Metrics 2025-08-31 01:44:14 - pico-train - INFO - ├── Loss: 5.2332 2025-08-31 01:44:14 - pico-train - INFO - ├── Learning Rate: 1.91e-04 2025-08-31 01:44:14 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-31 01:45:09 - pico-train - INFO - Step 15600 -- 🔄 Training Metrics 2025-08-31 01:45:09 - pico-train - INFO - ├── Loss: 5.2263 2025-08-31 01:45:09 - pico-train - INFO - ├── Learning Rate: 1.91e-04 2025-08-31 01:45:09 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-31 01:46:04 - pico-train - INFO - Step 15700 -- 🔄 Training Metrics 2025-08-31 01:46:04 - pico-train - INFO - ├── Loss: 5.2087 2025-08-31 01:46:04 - pico-train - INFO - ├── Learning Rate: 1.91e-04 2025-08-31 01:46:04 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-31 01:46:59 - pico-train - INFO - Step 15800 -- 🔄 Training Metrics 2025-08-31 01:46:59 - pico-train - INFO - ├── Loss: 5.2199 2025-08-31 01:46:59 - pico-train - INFO - ├── Learning Rate: 1.90e-04 2025-08-31 01:46:59 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-31 01:47:55 - pico-train - INFO - Step 15900 -- 🔄 Training Metrics 2025-08-31 01:47:55 - pico-train - INFO - ├── Loss: 5.2056 2025-08-31 01:47:55 - pico-train - INFO - ├── Learning Rate: 1.90e-04 2025-08-31 01:47:55 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-31 01:48:49 - pico-train - INFO - Step 16000 -- 💾 Saving Checkpoint 2025-08-31 01:50:39 - pico-train - INFO - Step 16000 
-- 📊 Evaluation Results 2025-08-31 01:50:39 - pico-train - INFO - └── paloma: inf 2025-08-31 01:50:39 - pico-train - INFO - Step 16000 -- 🔄 Training Metrics 2025-08-31 01:50:39 - pico-train - INFO - ├── Loss: 5.2088 2025-08-31 01:50:39 - pico-train - INFO - ├── Learning Rate: 1.90e-04 2025-08-31 01:50:39 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-31 01:50:39 - pico-train - INFO - Step 16000 -- 📈 Saving Learning Dynamics 2025-08-31 01:51:35 - pico-train - INFO - Step 16100 -- 🔄 Training Metrics 2025-08-31 01:51:35 - pico-train - INFO - ├── Loss: 5.1931 2025-08-31 01:51:35 - pico-train - INFO - ├── Learning Rate: 1.90e-04 2025-08-31 01:51:35 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-31 01:52:29 - pico-train - INFO - Step 16200 -- 🔄 Training Metrics 2025-08-31 01:52:29 - pico-train - INFO - ├── Loss: 5.1773 2025-08-31 01:52:29 - pico-train - INFO - ├── Learning Rate: 1.90e-04 2025-08-31 01:52:29 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-31 01:53:24 - pico-train - INFO - Step 16300 -- 🔄 Training Metrics 2025-08-31 01:53:24 - pico-train - INFO - ├── Loss: 5.2032 2025-08-31 01:53:24 - pico-train - INFO - ├── Learning Rate: 1.90e-04 2025-08-31 01:53:24 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-31 01:54:18 - pico-train - INFO - Step 16400 -- 🔄 Training Metrics 2025-08-31 01:54:18 - pico-train - INFO - ├── Loss: 5.1868 2025-08-31 01:54:18 - pico-train - INFO - ├── Learning Rate: 1.90e-04 2025-08-31 01:54:18 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-31 01:55:12 - pico-train - INFO - Step 16500 -- 🔄 Training Metrics 2025-08-31 01:55:12 - pico-train - INFO - ├── Loss: 5.1764 2025-08-31 01:55:12 - pico-train - INFO - ├── Learning Rate: 1.89e-04 2025-08-31 01:55:12 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-31 01:56:05 - pico-train - INFO - Step 16600 -- 🔄 Training Metrics 2025-08-31 01:56:05 - pico-train - INFO - ├── Loss: 5.2021 2025-08-31 01:56:05 - pico-train - INFO - ├── Learning Rate: 1.89e-04 2025-08-31 01:56:05 - 
pico-train - INFO - └── Inf/NaN count: 0 2025-08-31 01:57:00 - pico-train - INFO - Step 16700 -- 🔄 Training Metrics 2025-08-31 01:57:00 - pico-train - INFO - ├── Loss: 5.1797 2025-08-31 01:57:00 - pico-train - INFO - ├── Learning Rate: 1.89e-04 2025-08-31 01:57:00 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-31 01:57:54 - pico-train - INFO - Step 16800 -- 🔄 Training Metrics 2025-08-31 01:57:54 - pico-train - INFO - ├── Loss: 5.1489 2025-08-31 01:57:54 - pico-train - INFO - ├── Learning Rate: 1.89e-04 2025-08-31 01:57:54 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-31 01:58:48 - pico-train - INFO - Step 16900 -- 🔄 Training Metrics 2025-08-31 01:58:48 - pico-train - INFO - ├── Loss: 5.1805 2025-08-31 01:58:48 - pico-train - INFO - ├── Learning Rate: 1.89e-04 2025-08-31 01:58:48 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-31 01:59:42 - pico-train - INFO - Step 17000 -- 🔄 Training Metrics 2025-08-31 01:59:42 - pico-train - INFO - ├── Loss: 5.1844 2025-08-31 01:59:42 - pico-train - INFO - ├── Learning Rate: 1.89e-04 2025-08-31 01:59:42 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-31 02:00:37 - pico-train - INFO - Step 17100 -- 🔄 Training Metrics 2025-08-31 02:00:37 - pico-train - INFO - ├── Loss: 5.1826 2025-08-31 02:00:37 - pico-train - INFO - ├── Learning Rate: 1.89e-04 2025-08-31 02:00:37 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-31 02:01:31 - pico-train - INFO - Step 17200 -- 🔄 Training Metrics 2025-08-31 02:01:31 - pico-train - INFO - ├── Loss: 5.1568 2025-08-31 02:01:31 - pico-train - INFO - ├── Learning Rate: 1.88e-04 2025-08-31 02:01:31 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-31 02:02:25 - pico-train - INFO - Step 17300 -- 🔄 Training Metrics 2025-08-31 02:02:25 - pico-train - INFO - ├── Loss: 5.2075 2025-08-31 02:02:25 - pico-train - INFO - ├── Learning Rate: 1.88e-04 2025-08-31 02:02:25 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-31 02:03:19 - pico-train - INFO - Step 17400 -- 🔄 Training Metrics 2025-08-31 
02:03:19 - pico-train - INFO - ├── Loss: 5.1649 2025-08-31 02:03:19 - pico-train - INFO - ├── Learning Rate: 1.88e-04 2025-08-31 02:03:19 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-31 02:04:12 - pico-train - INFO - Step 17500 -- 🔄 Training Metrics 2025-08-31 02:04:12 - pico-train - INFO - ├── Loss: 5.1506 2025-08-31 02:04:12 - pico-train - INFO - ├── Learning Rate: 1.88e-04 2025-08-31 02:04:12 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-31 02:05:06 - pico-train - INFO - Step 17600 -- 🔄 Training Metrics 2025-08-31 02:05:06 - pico-train - INFO - ├── Loss: 5.1757 2025-08-31 02:05:06 - pico-train - INFO - ├── Learning Rate: 1.88e-04 2025-08-31 02:05:06 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-31 02:06:00 - pico-train - INFO - Step 17700 -- 🔄 Training Metrics 2025-08-31 02:06:00 - pico-train - INFO - ├── Loss: 5.1580 2025-08-31 02:06:00 - pico-train - INFO - ├── Learning Rate: 1.88e-04 2025-08-31 02:06:00 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-31 02:06:56 - pico-train - INFO - Step 17800 -- 🔄 Training Metrics 2025-08-31 02:06:56 - pico-train - INFO - ├── Loss: 5.1309 2025-08-31 02:06:56 - pico-train - INFO - ├── Learning Rate: 1.87e-04 2025-08-31 02:06:56 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-31 02:07:51 - pico-train - INFO - Step 17900 -- 🔄 Training Metrics 2025-08-31 02:07:51 - pico-train - INFO - ├── Loss: 5.1601 2025-08-31 02:07:51 - pico-train - INFO - ├── Learning Rate: 1.87e-04 2025-08-31 02:07:51 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-31 02:08:46 - pico-train - INFO - Step 18000 -- 💾 Saving Checkpoint 2025-08-31 02:10:35 - pico-train - INFO - Step 18000 -- 📊 Evaluation Results 2025-08-31 02:10:35 - pico-train - INFO - └── paloma: inf 2025-08-31 02:10:36 - pico-train - INFO - Step 18000 -- 🔄 Training Metrics 2025-08-31 02:10:36 - pico-train - INFO - ├── Loss: 5.1612 2025-08-31 02:10:36 - pico-train - INFO - ├── Learning Rate: 1.87e-04 2025-08-31 02:10:36 - pico-train - INFO - └── Inf/NaN count: 0 
2025-08-31 02:10:36 - pico-train - INFO - Step 18000 -- 📈 Saving Learning Dynamics 2025-08-31 02:11:32 - pico-train - INFO - Step 18100 -- 🔄 Training Metrics 2025-08-31 02:11:32 - pico-train - INFO - ├── Loss: 5.1556 2025-08-31 02:11:32 - pico-train - INFO - ├── Learning Rate: 1.87e-04 2025-08-31 02:11:32 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-31 02:12:27 - pico-train - INFO - Step 18200 -- 🔄 Training Metrics 2025-08-31 02:12:27 - pico-train - INFO - ├── Loss: 5.1406 2025-08-31 02:12:27 - pico-train - INFO - ├── Learning Rate: 1.87e-04 2025-08-31 02:12:27 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-31 02:13:21 - pico-train - INFO - Step 18300 -- 🔄 Training Metrics 2025-08-31 02:13:21 - pico-train - INFO - ├── Loss: 5.1410 2025-08-31 02:13:21 - pico-train - INFO - ├── Learning Rate: 1.87e-04 2025-08-31 02:13:21 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-31 02:14:15 - pico-train - INFO - Step 18400 -- 🔄 Training Metrics 2025-08-31 02:14:15 - pico-train - INFO - ├── Loss: 5.1468 2025-08-31 02:14:15 - pico-train - INFO - ├── Learning Rate: 1.86e-04 2025-08-31 02:14:15 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-31 02:15:09 - pico-train - INFO - Step 18500 -- 🔄 Training Metrics 2025-08-31 02:15:09 - pico-train - INFO - ├── Loss: 5.1310 2025-08-31 02:15:09 - pico-train - INFO - ├── Learning Rate: 1.86e-04 2025-08-31 02:15:09 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-31 02:16:03 - pico-train - INFO - Step 18600 -- 🔄 Training Metrics 2025-08-31 02:16:03 - pico-train - INFO - ├── Loss: 5.1406 2025-08-31 02:16:03 - pico-train - INFO - ├── Learning Rate: 1.86e-04 2025-08-31 02:16:03 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-31 02:16:58 - pico-train - INFO - Step 18700 -- 🔄 Training Metrics 2025-08-31 02:16:58 - pico-train - INFO - ├── Loss: 5.1457 2025-08-31 02:16:58 - pico-train - INFO - ├── Learning Rate: 1.86e-04 2025-08-31 02:16:58 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-31 02:17:52 - pico-train - INFO - Step 
18800 -- 🔄 Training Metrics 2025-08-31 02:17:52 - pico-train - INFO - ├── Loss: 5.1176 2025-08-31 02:17:52 - pico-train - INFO - ├── Learning Rate: 1.86e-04 2025-08-31 02:17:52 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-31 02:18:45 - pico-train - INFO - Step 18900 -- 🔄 Training Metrics 2025-08-31 02:18:45 - pico-train - INFO - ├── Loss: 5.1283 2025-08-31 02:18:45 - pico-train - INFO - ├── Learning Rate: 1.86e-04 2025-08-31 02:18:45 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-31 02:19:40 - pico-train - INFO - Step 19000 -- 🔄 Training Metrics 2025-08-31 02:19:40 - pico-train - INFO - ├── Loss: 5.1459 2025-08-31 02:19:40 - pico-train - INFO - ├── Learning Rate: 1.86e-04 2025-08-31 02:19:40 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-31 02:20:34 - pico-train - INFO - Step 19100 -- 🔄 Training Metrics 2025-08-31 02:20:34 - pico-train - INFO - ├── Loss: 5.1400 2025-08-31 02:20:34 - pico-train - INFO - ├── Learning Rate: 1.85e-04 2025-08-31 02:20:34 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-31 02:21:30 - pico-train - INFO - Step 19200 -- 🔄 Training Metrics 2025-08-31 02:21:30 - pico-train - INFO - ├── Loss: 5.1267 2025-08-31 02:21:30 - pico-train - INFO - ├── Learning Rate: 1.85e-04 2025-08-31 02:21:30 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-31 02:22:23 - pico-train - INFO - Step 19300 -- 🔄 Training Metrics 2025-08-31 02:22:23 - pico-train - INFO - ├── Loss: 5.1289 2025-08-31 02:22:23 - pico-train - INFO - ├── Learning Rate: 1.85e-04 2025-08-31 02:22:23 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-31 02:23:17 - pico-train - INFO - Step 19400 -- 🔄 Training Metrics 2025-08-31 02:23:17 - pico-train - INFO - ├── Loss: 5.1261 2025-08-31 02:23:17 - pico-train - INFO - ├── Learning Rate: 1.85e-04 2025-08-31 02:23:17 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-31 02:24:12 - pico-train - INFO - Step 19500 -- 🔄 Training Metrics 2025-08-31 02:24:12 - pico-train - INFO - ├── Loss: 5.1383 2025-08-31 02:24:12 - pico-train - INFO - 
├── Learning Rate: 1.85e-04 2025-08-31 02:24:12 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-31 02:25:06 - pico-train - INFO - Step 19600 -- 🔄 Training Metrics 2025-08-31 02:25:06 - pico-train - INFO - ├── Loss: 5.1403 2025-08-31 02:25:06 - pico-train - INFO - ├── Learning Rate: 1.85e-04 2025-08-31 02:25:06 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-31 02:26:00 - pico-train - INFO - Step 19700 -- 🔄 Training Metrics 2025-08-31 02:26:00 - pico-train - INFO - ├── Loss: 5.1261 2025-08-31 02:26:00 - pico-train - INFO - ├── Learning Rate: 1.84e-04 2025-08-31 02:26:00 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-31 02:26:54 - pico-train - INFO - Step 19800 -- 🔄 Training Metrics 2025-08-31 02:26:54 - pico-train - INFO - ├── Loss: 5.1311 2025-08-31 02:26:54 - pico-train - INFO - ├── Learning Rate: 1.84e-04 2025-08-31 02:26:54 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-31 02:27:48 - pico-train - INFO - Step 19900 -- 🔄 Training Metrics 2025-08-31 02:27:48 - pico-train - INFO - ├── Loss: 5.1058 2025-08-31 02:27:48 - pico-train - INFO - ├── Learning Rate: 1.84e-04 2025-08-31 02:27:48 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-31 02:28:42 - pico-train - INFO - Step 20000 -- 💾 Saving Checkpoint 2025-08-31 02:30:31 - pico-train - INFO - Step 20000 -- 📊 Evaluation Results 2025-08-31 02:30:31 - pico-train - INFO - └── paloma: inf 2025-08-31 02:30:32 - pico-train - INFO - Step 20000 -- 🔄 Training Metrics 2025-08-31 02:30:32 - pico-train - INFO - ├── Loss: 5.1195 2025-08-31 02:30:32 - pico-train - INFO - ├── Learning Rate: 1.84e-04 2025-08-31 02:30:32 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-31 02:30:32 - pico-train - INFO - Step 20000 -- 📈 Saving Learning Dynamics 2025-08-31 02:31:29 - pico-train - INFO - Step 20100 -- 🔄 Training Metrics 2025-08-31 02:31:29 - pico-train - INFO - ├── Loss: 5.1018 2025-08-31 02:31:29 - pico-train - INFO - ├── Learning Rate: 1.84e-04 2025-08-31 02:31:29 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-31 
02:32:25 - pico-train - INFO - Step 20200 -- 🔄 Training Metrics 2025-08-31 02:32:25 - pico-train - INFO - ├── Loss: 5.1165 2025-08-31 02:32:25 - pico-train - INFO - ├── Learning Rate: 1.83e-04 2025-08-31 02:32:25 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-31 02:33:20 - pico-train - INFO - Step 20300 -- 🔄 Training Metrics 2025-08-31 02:33:20 - pico-train - INFO - ├── Loss: 5.1355 2025-08-31 02:33:20 - pico-train - INFO - ├── Learning Rate: 1.83e-04 2025-08-31 02:33:20 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-31 02:34:15 - pico-train - INFO - Step 20400 -- 🔄 Training Metrics 2025-08-31 02:34:15 - pico-train - INFO - ├── Loss: 5.1157 2025-08-31 02:34:15 - pico-train - INFO - ├── Learning Rate: 1.83e-04 2025-08-31 02:34:15 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-31 02:35:11 - pico-train - INFO - Step 20500 -- 🔄 Training Metrics 2025-08-31 02:35:11 - pico-train - INFO - ├── Loss: 5.1164 2025-08-31 02:35:11 - pico-train - INFO - ├── Learning Rate: 1.83e-04 2025-08-31 02:35:11 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-31 02:36:06 - pico-train - INFO - Step 20600 -- 🔄 Training Metrics 2025-08-31 02:36:06 - pico-train - INFO - ├── Loss: 5.1290 2025-08-31 02:36:06 - pico-train - INFO - ├── Learning Rate: 1.83e-04 2025-08-31 02:36:06 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-31 02:37:01 - pico-train - INFO - Step 20700 -- 🔄 Training Metrics 2025-08-31 02:37:01 - pico-train - INFO - ├── Loss: 5.1012 2025-08-31 02:37:01 - pico-train - INFO - ├── Learning Rate: 1.83e-04 2025-08-31 02:37:01 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-31 02:37:57 - pico-train - INFO - Step 20800 -- 🔄 Training Metrics 2025-08-31 02:37:57 - pico-train - INFO - ├── Loss: 5.1103 2025-08-31 02:37:57 - pico-train - INFO - ├── Learning Rate: 1.82e-04 2025-08-31 02:37:57 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-31 02:38:52 - pico-train - INFO - Step 20900 -- 🔄 Training Metrics 2025-08-31 02:38:52 - pico-train - INFO - ├── Loss: 5.1041 
2025-08-31 02:38:52 - pico-train - INFO - ├── Learning Rate: 1.82e-04 2025-08-31 02:38:52 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-31 02:39:48 - pico-train - INFO - Step 21000 -- 🔄 Training Metrics 2025-08-31 02:39:48 - pico-train - INFO - ├── Loss: 5.1055 2025-08-31 02:39:48 - pico-train - INFO - ├── Learning Rate: 1.82e-04 2025-08-31 02:39:48 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-31 02:40:42 - pico-train - INFO - Step 21100 -- 🔄 Training Metrics 2025-08-31 02:40:42 - pico-train - INFO - ├── Loss: 5.0964 2025-08-31 02:40:42 - pico-train - INFO - ├── Learning Rate: 1.82e-04 2025-08-31 02:40:42 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-31 02:41:37 - pico-train - INFO - Step 21200 -- 🔄 Training Metrics 2025-08-31 02:41:37 - pico-train - INFO - ├── Loss: 5.1160 2025-08-31 02:41:37 - pico-train - INFO - ├── Learning Rate: 1.82e-04 2025-08-31 02:41:37 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-31 02:42:31 - pico-train - INFO - Step 21300 -- 🔄 Training Metrics 2025-08-31 02:42:31 - pico-train - INFO - ├── Loss: 5.1048 2025-08-31 02:42:31 - pico-train - INFO - ├── Learning Rate: 1.81e-04 2025-08-31 02:42:31 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-31 02:43:26 - pico-train - INFO - Step 21400 -- 🔄 Training Metrics 2025-08-31 02:43:26 - pico-train - INFO - ├── Loss: 5.0806 2025-08-31 02:43:26 - pico-train - INFO - ├── Learning Rate: 1.81e-04 2025-08-31 02:43:26 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-31 02:44:21 - pico-train - INFO - Step 21500 -- 🔄 Training Metrics 2025-08-31 02:44:21 - pico-train - INFO - ├── Loss: 5.1073 2025-08-31 02:44:21 - pico-train - INFO - ├── Learning Rate: 1.81e-04 2025-08-31 02:44:21 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-31 02:45:15 - pico-train - INFO - Step 21600 -- 🔄 Training Metrics 2025-08-31 02:45:15 - pico-train - INFO - ├── Loss: 5.0786 2025-08-31 02:45:15 - pico-train - INFO - ├── Learning Rate: 1.81e-04 2025-08-31 02:45:15 - pico-train - INFO - └── Inf/NaN count: 
0 2025-08-31 02:46:09 - pico-train - INFO - Step 21700 -- 🔄 Training Metrics 2025-08-31 02:46:09 - pico-train - INFO - ├── Loss: 5.0965 2025-08-31 02:46:09 - pico-train - INFO - ├── Learning Rate: 1.81e-04 2025-08-31 02:46:09 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-31 02:47:04 - pico-train - INFO - Step 21800 -- 🔄 Training Metrics 2025-08-31 02:47:04 - pico-train - INFO - ├── Loss: 5.1086 2025-08-31 02:47:04 - pico-train - INFO - ├── Learning Rate: 1.81e-04 2025-08-31 02:47:04 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-31 02:47:58 - pico-train - INFO - Step 21900 -- 🔄 Training Metrics 2025-08-31 02:47:58 - pico-train - INFO - ├── Loss: 5.1030 2025-08-31 02:47:58 - pico-train - INFO - ├── Learning Rate: 1.80e-04 2025-08-31 02:47:58 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-31 02:48:51 - pico-train - INFO - Step 22000 -- 💾 Saving Checkpoint 2025-08-31 02:50:45 - pico-train - INFO - Step 22000 -- 📊 Evaluation Results 2025-08-31 02:50:45 - pico-train - INFO - └── paloma: inf 2025-08-31 02:50:45 - pico-train - INFO - Step 22000 -- 🔄 Training Metrics 2025-08-31 02:50:45 - pico-train - INFO - ├── Loss: 5.0792 2025-08-31 02:50:45 - pico-train - INFO - ├── Learning Rate: 1.80e-04 2025-08-31 02:50:45 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-31 02:50:45 - pico-train - INFO - Step 22000 -- 📈 Saving Learning Dynamics 2025-08-31 02:51:42 - pico-train - INFO - Step 22100 -- 🔄 Training Metrics 2025-08-31 02:51:42 - pico-train - INFO - ├── Loss: 5.0856 2025-08-31 02:51:42 - pico-train - INFO - ├── Learning Rate: 1.80e-04 2025-08-31 02:51:42 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-31 02:52:36 - pico-train - INFO - Step 22200 -- 🔄 Training Metrics 2025-08-31 02:52:36 - pico-train - INFO - ├── Loss: 5.0824 2025-08-31 02:52:36 - pico-train - INFO - ├── Learning Rate: 1.80e-04 2025-08-31 02:52:36 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-31 02:53:30 - pico-train - INFO - Step 22300 -- 🔄 Training Metrics 2025-08-31 02:53:30 - 
pico-train - INFO - ├── Loss: 5.0941 2025-08-31 02:53:30 - pico-train - INFO - ├── Learning Rate: 1.80e-04 2025-08-31 02:53:30 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-31 02:54:25 - pico-train - INFO - Step 22400 -- 🔄 Training Metrics 2025-08-31 02:54:25 - pico-train - INFO - ├── Loss: 5.0779 2025-08-31 02:54:25 - pico-train - INFO - ├── Learning Rate: 1.79e-04 2025-08-31 02:54:25 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-31 02:55:18 - pico-train - INFO - Step 22500 -- 🔄 Training Metrics 2025-08-31 02:55:18 - pico-train - INFO - ├── Loss: 5.0808 2025-08-31 02:55:18 - pico-train - INFO - ├── Learning Rate: 1.79e-04 2025-08-31 02:55:18 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-31 02:56:12 - pico-train - INFO - Step 22600 -- 🔄 Training Metrics 2025-08-31 02:56:12 - pico-train - INFO - ├── Loss: 5.0870 2025-08-31 02:56:12 - pico-train - INFO - ├── Learning Rate: 1.79e-04 2025-08-31 02:56:12 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-31 02:57:07 - pico-train - INFO - Step 22700 -- 🔄 Training Metrics 2025-08-31 02:57:07 - pico-train - INFO - ├── Loss: 5.0751 2025-08-31 02:57:07 - pico-train - INFO - ├── Learning Rate: 1.79e-04 2025-08-31 02:57:07 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-31 02:58:01 - pico-train - INFO - Step 22800 -- 🔄 Training Metrics 2025-08-31 02:58:01 - pico-train - INFO - ├── Loss: 5.0664 2025-08-31 02:58:01 - pico-train - INFO - ├── Learning Rate: 1.79e-04 2025-08-31 02:58:01 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-31 02:58:56 - pico-train - INFO - Step 22900 -- 🔄 Training Metrics 2025-08-31 02:58:56 - pico-train - INFO - ├── Loss: 5.0716 2025-08-31 02:58:56 - pico-train - INFO - ├── Learning Rate: 1.78e-04 2025-08-31 02:58:56 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-31 02:59:49 - pico-train - INFO - Step 23000 -- 🔄 Training Metrics 2025-08-31 02:59:49 - pico-train - INFO - ├── Loss: 5.0720 2025-08-31 02:59:49 - pico-train - INFO - ├── Learning Rate: 1.78e-04 2025-08-31 02:59:49 - 
pico-train - INFO - └── Inf/NaN count: 0 2025-08-31 03:00:45 - pico-train - INFO - Step 23100 -- 🔄 Training Metrics 2025-08-31 03:00:45 - pico-train - INFO - ├── Loss: 5.0657 2025-08-31 03:00:45 - pico-train - INFO - ├── Learning Rate: 1.78e-04 2025-08-31 03:00:45 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-31 03:01:39 - pico-train - INFO - Step 23200 -- 🔄 Training Metrics 2025-08-31 03:01:39 - pico-train - INFO - ├── Loss: 5.0631 2025-08-31 03:01:39 - pico-train - INFO - ├── Learning Rate: 1.78e-04 2025-08-31 03:01:39 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-31 03:02:33 - pico-train - INFO - Step 23300 -- 🔄 Training Metrics 2025-08-31 03:02:33 - pico-train - INFO - ├── Loss: 5.0873 2025-08-31 03:02:33 - pico-train - INFO - ├── Learning Rate: 1.78e-04 2025-08-31 03:02:33 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-31 03:03:27 - pico-train - INFO - Step 23400 -- 🔄 Training Metrics 2025-08-31 03:03:27 - pico-train - INFO - ├── Loss: 5.0613 2025-08-31 03:03:27 - pico-train - INFO - ├── Learning Rate: 1.77e-04 2025-08-31 03:03:27 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-31 03:04:21 - pico-train - INFO - Step 23500 -- 🔄 Training Metrics 2025-08-31 03:04:21 - pico-train - INFO - ├── Loss: 5.0463 2025-08-31 03:04:21 - pico-train - INFO - ├── Learning Rate: 1.77e-04 2025-08-31 03:04:21 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-31 03:05:16 - pico-train - INFO - Step 23600 -- 🔄 Training Metrics 2025-08-31 03:05:16 - pico-train - INFO - ├── Loss: 5.0611 2025-08-31 03:05:16 - pico-train - INFO - ├── Learning Rate: 1.77e-04 2025-08-31 03:05:16 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-31 03:06:10 - pico-train - INFO - Step 23700 -- 🔄 Training Metrics 2025-08-31 03:06:10 - pico-train - INFO - ├── Loss: 5.0502 2025-08-31 03:06:10 - pico-train - INFO - ├── Learning Rate: 1.77e-04 2025-08-31 03:06:10 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-31 03:07:04 - pico-train - INFO - Step 23800 -- 🔄 Training Metrics 2025-08-31 
03:07:04 - pico-train - INFO - ├── Loss: 5.0518 2025-08-31 03:07:04 - pico-train - INFO - ├── Learning Rate: 1.77e-04 2025-08-31 03:07:04 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-31 03:07:58 - pico-train - INFO - Step 23900 -- 🔄 Training Metrics 2025-08-31 03:07:58 - pico-train - INFO - ├── Loss: 5.0439 2025-08-31 03:07:58 - pico-train - INFO - ├── Learning Rate: 1.76e-04 2025-08-31 03:07:58 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-31 03:08:51 - pico-train - INFO - Step 24000 -- 💾 Saving Checkpoint 2025-08-31 03:10:41 - pico-train - INFO - Step 24000 -- 📊 Evaluation Results 2025-08-31 03:10:41 - pico-train - INFO - └── paloma: inf 2025-08-31 03:10:41 - pico-train - INFO - Step 24000 -- 🔄 Training Metrics 2025-08-31 03:10:41 - pico-train - INFO - ├── Loss: 5.0522 2025-08-31 03:10:41 - pico-train - INFO - ├── Learning Rate: 1.76e-04 2025-08-31 03:10:41 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-31 03:10:41 - pico-train - INFO - Step 24000 -- 📈 Saving Learning Dynamics 2025-08-31 03:11:38 - pico-train - INFO - Step 24100 -- 🔄 Training Metrics 2025-08-31 03:11:38 - pico-train - INFO - ├── Loss: 5.0477 2025-08-31 03:11:38 - pico-train - INFO - ├── Learning Rate: 1.76e-04 2025-08-31 03:11:38 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-31 03:12:32 - pico-train - INFO - Step 24200 -- 🔄 Training Metrics 2025-08-31 03:12:32 - pico-train - INFO - ├── Loss: 5.0433 2025-08-31 03:12:32 - pico-train - INFO - ├── Learning Rate: 1.76e-04 2025-08-31 03:12:32 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-31 03:13:26 - pico-train - INFO - Step 24300 -- 🔄 Training Metrics 2025-08-31 03:13:26 - pico-train - INFO - ├── Loss: 5.0522 2025-08-31 03:13:26 - pico-train - INFO - ├── Learning Rate: 1.76e-04 2025-08-31 03:13:26 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-31 03:14:21 - pico-train - INFO - Step 24400 -- 🔄 Training Metrics 2025-08-31 03:14:21 - pico-train - INFO - ├── Loss: 5.0452 2025-08-31 03:14:21 - pico-train - INFO - ├── Learning 
Rate: 1.75e-04 2025-08-31 03:14:21 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-31 03:15:15 - pico-train - INFO - Step 24500 -- 🔄 Training Metrics 2025-08-31 03:15:15 - pico-train - INFO - ├── Loss: 5.0629 2025-08-31 03:15:15 - pico-train - INFO - ├── Learning Rate: 1.75e-04 2025-08-31 03:15:15 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-31 03:16:09 - pico-train - INFO - Step 24600 -- 🔄 Training Metrics 2025-08-31 03:16:09 - pico-train - INFO - ├── Loss: 5.0439 2025-08-31 03:16:09 - pico-train - INFO - ├── Learning Rate: 1.75e-04 2025-08-31 03:16:09 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-31 03:17:03 - pico-train - INFO - Step 24700 -- 🔄 Training Metrics 2025-08-31 03:17:03 - pico-train - INFO - ├── Loss: 5.0322 2025-08-31 03:17:03 - pico-train - INFO - ├── Learning Rate: 1.75e-04 2025-08-31 03:17:03 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-31 03:17:56 - pico-train - INFO - Step 24800 -- 🔄 Training Metrics 2025-08-31 03:17:56 - pico-train - INFO - ├── Loss: 5.0504 2025-08-31 03:17:56 - pico-train - INFO - ├── Learning Rate: 1.74e-04 2025-08-31 03:17:56 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-31 03:18:51 - pico-train - INFO - Step 24900 -- 🔄 Training Metrics 2025-08-31 03:18:51 - pico-train - INFO - ├── Loss: 5.0361 2025-08-31 03:18:51 - pico-train - INFO - ├── Learning Rate: 1.74e-04 2025-08-31 03:18:51 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-31 03:19:46 - pico-train - INFO - Step 25000 -- 🔄 Training Metrics 2025-08-31 03:19:46 - pico-train - INFO - ├── Loss: 5.0214 2025-08-31 03:19:46 - pico-train - INFO - ├── Learning Rate: 1.74e-04 2025-08-31 03:19:46 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-31 03:20:41 - pico-train - INFO - Step 25100 -- 🔄 Training Metrics 2025-08-31 03:20:41 - pico-train - INFO - ├── Loss: 5.0437 2025-08-31 03:20:41 - pico-train - INFO - ├── Learning Rate: 1.74e-04 2025-08-31 03:20:41 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-31 03:21:36 - pico-train - INFO - Step 25200 
-- 🔄 Training Metrics 2025-08-31 03:21:36 - pico-train - INFO - ├── Loss: 5.0433 2025-08-31 03:21:36 - pico-train - INFO - ├── Learning Rate: 1.74e-04 2025-08-31 03:21:36 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-31 03:22:31 - pico-train - INFO - Step 25300 -- 🔄 Training Metrics 2025-08-31 03:22:31 - pico-train - INFO - ├── Loss: 5.0522 2025-08-31 03:22:31 - pico-train - INFO - ├── Learning Rate: 1.73e-04 2025-08-31 03:22:31 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-31 03:23:26 - pico-train - INFO - Step 25400 -- 🔄 Training Metrics 2025-08-31 03:23:26 - pico-train - INFO - ├── Loss: 5.0399 2025-08-31 03:23:26 - pico-train - INFO - ├── Learning Rate: 1.73e-04 2025-08-31 03:23:26 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-31 03:24:21 - pico-train - INFO - Step 25500 -- 🔄 Training Metrics 2025-08-31 03:24:21 - pico-train - INFO - ├── Loss: 5.0391 2025-08-31 03:24:21 - pico-train - INFO - ├── Learning Rate: 1.73e-04 2025-08-31 03:24:21 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-31 03:25:18 - pico-train - INFO - Step 25600 -- 🔄 Training Metrics 2025-08-31 03:25:18 - pico-train - INFO - ├── Loss: 5.0489 2025-08-31 03:25:18 - pico-train - INFO - ├── Learning Rate: 1.73e-04 2025-08-31 03:25:18 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-31 03:26:12 - pico-train - INFO - Step 25700 -- 🔄 Training Metrics 2025-08-31 03:26:12 - pico-train - INFO - ├── Loss: 5.0378 2025-08-31 03:26:12 - pico-train - INFO - ├── Learning Rate: 1.73e-04 2025-08-31 03:26:12 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-31 03:27:08 - pico-train - INFO - Step 25800 -- 🔄 Training Metrics 2025-08-31 03:27:08 - pico-train - INFO - ├── Loss: 5.0416 2025-08-31 03:27:08 - pico-train - INFO - ├── Learning Rate: 1.72e-04 2025-08-31 03:27:08 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-31 03:28:03 - pico-train - INFO - Step 25900 -- 🔄 Training Metrics 2025-08-31 03:28:03 - pico-train - INFO - ├── Loss: 5.0106 2025-08-31 03:28:03 - pico-train - INFO - ├── 
Learning Rate: 1.72e-04 2025-08-31 03:28:03 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-31 03:28:58 - pico-train - INFO - Step 26000 -- 💾 Saving Checkpoint 2025-08-31 03:30:52 - pico-train - INFO - Step 26000 -- 📊 Evaluation Results 2025-08-31 03:30:52 - pico-train - INFO - └── paloma: inf 2025-08-31 03:30:53 - pico-train - INFO - Step 26000 -- 🔄 Training Metrics 2025-08-31 03:30:53 - pico-train - INFO - ├── Loss: 5.0129 2025-08-31 03:30:53 - pico-train - INFO - ├── Learning Rate: 1.72e-04 2025-08-31 03:30:53 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-31 03:30:53 - pico-train - INFO - Step 26000 -- 📈 Saving Learning Dynamics 2025-08-31 03:31:50 - pico-train - INFO - Step 26100 -- 🔄 Training Metrics 2025-08-31 03:31:50 - pico-train - INFO - ├── Loss: 5.0560 2025-08-31 03:31:50 - pico-train - INFO - ├── Learning Rate: 1.72e-04 2025-08-31 03:31:50 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-31 03:32:45 - pico-train - INFO - Step 26200 -- 🔄 Training Metrics 2025-08-31 03:32:45 - pico-train - INFO - ├── Loss: 5.0231 2025-08-31 03:32:45 - pico-train - INFO - ├── Learning Rate: 1.71e-04 2025-08-31 03:32:45 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-31 03:33:40 - pico-train - INFO - Step 26300 -- 🔄 Training Metrics 2025-08-31 03:33:40 - pico-train - INFO - ├── Loss: 5.0203 2025-08-31 03:33:40 - pico-train - INFO - ├── Learning Rate: 1.71e-04 2025-08-31 03:33:40 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-31 03:34:36 - pico-train - INFO - Step 26400 -- 🔄 Training Metrics 2025-08-31 03:34:36 - pico-train - INFO - ├── Loss: 5.0435 2025-08-31 03:34:36 - pico-train - INFO - ├── Learning Rate: 1.71e-04 2025-08-31 03:34:36 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-31 03:35:31 - pico-train - INFO - Step 26500 -- 🔄 Training Metrics 2025-08-31 03:35:31 - pico-train - INFO - ├── Loss: 5.0079 2025-08-31 03:35:31 - pico-train - INFO - ├── Learning Rate: 1.71e-04 2025-08-31 03:35:31 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-31 
03:36:26 - pico-train - INFO - Step 26600 -- 🔄 Training Metrics 2025-08-31 03:36:26 - pico-train - INFO - ├── Loss: 5.0246 2025-08-31 03:36:26 - pico-train - INFO - ├── Learning Rate: 1.70e-04 2025-08-31 03:36:26 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-31 03:37:21 - pico-train - INFO - Step 26700 -- 🔄 Training Metrics 2025-08-31 03:37:21 - pico-train - INFO - ├── Loss: 5.0400 2025-08-31 03:37:21 - pico-train - INFO - ├── Learning Rate: 1.70e-04 2025-08-31 03:37:21 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-31 03:38:17 - pico-train - INFO - Step 26800 -- 🔄 Training Metrics 2025-08-31 03:38:17 - pico-train - INFO - ├── Loss: 5.0359 2025-08-31 03:38:17 - pico-train - INFO - ├── Learning Rate: 1.70e-04 2025-08-31 03:38:17 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-31 03:39:13 - pico-train - INFO - Step 26900 -- 🔄 Training Metrics 2025-08-31 03:39:13 - pico-train - INFO - ├── Loss: 4.9974 2025-08-31 03:39:13 - pico-train - INFO - ├── Learning Rate: 1.70e-04 2025-08-31 03:39:13 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-31 03:40:08 - pico-train - INFO - Step 27000 -- 🔄 Training Metrics 2025-08-31 03:40:08 - pico-train - INFO - ├── Loss: 5.0186 2025-08-31 03:40:08 - pico-train - INFO - ├── Learning Rate: 1.70e-04 2025-08-31 03:40:08 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-31 03:41:03 - pico-train - INFO - Step 27100 -- 🔄 Training Metrics 2025-08-31 03:41:03 - pico-train - INFO - ├── Loss: 5.0101 2025-08-31 03:41:03 - pico-train - INFO - ├── Learning Rate: 1.69e-04 2025-08-31 03:41:03 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-31 03:41:59 - pico-train - INFO - Step 27200 -- 🔄 Training Metrics 2025-08-31 03:41:59 - pico-train - INFO - ├── Loss: 5.0228 2025-08-31 03:41:59 - pico-train - INFO - ├── Learning Rate: 1.69e-04 2025-08-31 03:41:59 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-31 03:42:54 - pico-train - INFO - Step 27300 -- 🔄 Training Metrics 2025-08-31 03:42:54 - pico-train - INFO - ├── Loss: 5.0295 
2025-08-31 03:42:54 - pico-train - INFO - ├── Learning Rate: 1.69e-04 2025-08-31 03:42:54 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-31 03:43:49 - pico-train - INFO - Step 27400 -- 🔄 Training Metrics 2025-08-31 03:43:49 - pico-train - INFO - ├── Loss: 5.0014 2025-08-31 03:43:49 - pico-train - INFO - ├── Learning Rate: 1.69e-04 2025-08-31 03:43:49 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-31 03:44:44 - pico-train - INFO - Step 27500 -- 🔄 Training Metrics 2025-08-31 03:44:44 - pico-train - INFO - ├── Loss: 5.0173 2025-08-31 03:44:44 - pico-train - INFO - ├── Learning Rate: 1.68e-04 2025-08-31 03:44:44 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-31 03:45:40 - pico-train - INFO - Step 27600 -- 🔄 Training Metrics 2025-08-31 03:45:40 - pico-train - INFO - ├── Loss: 4.9924 2025-08-31 03:45:40 - pico-train - INFO - ├── Learning Rate: 1.68e-04 2025-08-31 03:45:40 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-31 03:46:35 - pico-train - INFO - Step 27700 -- 🔄 Training Metrics 2025-08-31 03:46:35 - pico-train - INFO - ├── Loss: 5.0067 2025-08-31 03:46:35 - pico-train - INFO - ├── Learning Rate: 1.68e-04 2025-08-31 03:46:35 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-31 03:47:30 - pico-train - INFO - Step 27800 -- 🔄 Training Metrics 2025-08-31 03:47:30 - pico-train - INFO - ├── Loss: 5.0232 2025-08-31 03:47:30 - pico-train - INFO - ├── Learning Rate: 1.68e-04 2025-08-31 03:47:30 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-31 03:48:26 - pico-train - INFO - Step 27900 -- 🔄 Training Metrics 2025-08-31 03:48:26 - pico-train - INFO - ├── Loss: 5.0190 2025-08-31 03:48:26 - pico-train - INFO - ├── Learning Rate: 1.67e-04 2025-08-31 03:48:26 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-31 03:49:20 - pico-train - INFO - Step 28000 -- 💾 Saving Checkpoint 2025-08-31 03:51:25 - pico-train - INFO - Step 28000 -- 📊 Evaluation Results 2025-08-31 03:51:25 - pico-train - INFO - └── paloma: inf 2025-08-31 03:51:25 - pico-train - INFO - Step 28000 
-- 🔄 Training Metrics 2025-08-31 03:51:25 - pico-train - INFO - ├── Loss: 5.0087 2025-08-31 03:51:25 - pico-train - INFO - ├── Learning Rate: 1.67e-04 2025-08-31 03:51:25 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-31 03:51:25 - pico-train - INFO - Step 28000 -- 📈 Saving Learning Dynamics 2025-08-31 03:52:22 - pico-train - INFO - Step 28100 -- 🔄 Training Metrics 2025-08-31 03:52:22 - pico-train - INFO - ├── Loss: 4.9986 2025-08-31 03:52:22 - pico-train - INFO - ├── Learning Rate: 1.67e-04 2025-08-31 03:52:22 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-31 03:53:17 - pico-train - INFO - Step 28200 -- 🔄 Training Metrics 2025-08-31 03:53:17 - pico-train - INFO - ├── Loss: 5.0120 2025-08-31 03:53:17 - pico-train - INFO - ├── Learning Rate: 1.67e-04 2025-08-31 03:53:17 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-31 03:54:11 - pico-train - INFO - Step 28300 -- 🔄 Training Metrics 2025-08-31 03:54:11 - pico-train - INFO - ├── Loss: 4.9886 2025-08-31 03:54:11 - pico-train - INFO - ├── Learning Rate: 1.67e-04 2025-08-31 03:54:11 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-31 03:55:05 - pico-train - INFO - Step 28400 -- 🔄 Training Metrics 2025-08-31 03:55:05 - pico-train - INFO - ├── Loss: 5.0119 2025-08-31 03:55:05 - pico-train - INFO - ├── Learning Rate: 1.66e-04 2025-08-31 03:55:05 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-31 03:55:59 - pico-train - INFO - Step 28500 -- 🔄 Training Metrics 2025-08-31 03:55:59 - pico-train - INFO - ├── Loss: 5.0120 2025-08-31 03:55:59 - pico-train - INFO - ├── Learning Rate: 1.66e-04 2025-08-31 03:55:59 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-31 03:56:53 - pico-train - INFO - Step 28600 -- 🔄 Training Metrics 2025-08-31 03:56:53 - pico-train - INFO - ├── Loss: 5.0072 2025-08-31 03:56:53 - pico-train - INFO - ├── Learning Rate: 1.66e-04 2025-08-31 03:56:53 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-31 03:57:47 - pico-train - INFO - Step 28700 -- 🔄 Training Metrics 2025-08-31 03:57:47 - 
pico-train - INFO - ├── Loss: 5.0099 2025-08-31 03:57:47 - pico-train - INFO - ├── Learning Rate: 1.66e-04 2025-08-31 03:57:47 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-31 03:58:41 - pico-train - INFO - Step 28800 -- 🔄 Training Metrics 2025-08-31 03:58:41 - pico-train - INFO - ├── Loss: 5.0086 2025-08-31 03:58:41 - pico-train - INFO - ├── Learning Rate: 1.65e-04 2025-08-31 03:58:41 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-31 03:59:35 - pico-train - INFO - Step 28900 -- 🔄 Training Metrics 2025-08-31 03:59:35 - pico-train - INFO - ├── Loss: 4.9908 2025-08-31 03:59:35 - pico-train - INFO - ├── Learning Rate: 1.65e-04 2025-08-31 03:59:35 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-31 04:00:29 - pico-train - INFO - Step 29000 -- 🔄 Training Metrics 2025-08-31 04:00:29 - pico-train - INFO - ├── Loss: 4.9947 2025-08-31 04:00:29 - pico-train - INFO - ├── Learning Rate: 1.65e-04 2025-08-31 04:00:29 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-31 04:01:24 - pico-train - INFO - Step 29100 -- 🔄 Training Metrics 2025-08-31 04:01:24 - pico-train - INFO - ├── Loss: 5.0001 2025-08-31 04:01:24 - pico-train - INFO - ├── Learning Rate: 1.65e-04 2025-08-31 04:01:24 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-31 04:02:18 - pico-train - INFO - Step 29200 -- 🔄 Training Metrics 2025-08-31 04:02:18 - pico-train - INFO - ├── Loss: 4.9991 2025-08-31 04:02:18 - pico-train - INFO - ├── Learning Rate: 1.64e-04 2025-08-31 04:02:18 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-31 04:03:12 - pico-train - INFO - Step 29300 -- 🔄 Training Metrics 2025-08-31 04:03:12 - pico-train - INFO - ├── Loss: 4.9885 2025-08-31 04:03:12 - pico-train - INFO - ├── Learning Rate: 1.64e-04 2025-08-31 04:03:12 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-31 04:04:06 - pico-train - INFO - Step 29400 -- 🔄 Training Metrics 2025-08-31 04:04:06 - pico-train - INFO - ├── Loss: 4.9985 2025-08-31 04:04:06 - pico-train - INFO - ├── Learning Rate: 1.64e-04 2025-08-31 04:04:06 - 
pico-train - INFO - └── Inf/NaN count: 0 2025-08-31 04:05:02 - pico-train - INFO - Step 29500 -- 🔄 Training Metrics 2025-08-31 04:05:02 - pico-train - INFO - ├── Loss: 4.9928 2025-08-31 04:05:02 - pico-train - INFO - ├── Learning Rate: 1.64e-04 2025-08-31 04:05:02 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-31 04:05:56 - pico-train - INFO - Step 29600 -- 🔄 Training Metrics 2025-08-31 04:05:56 - pico-train - INFO - ├── Loss: 5.0076 2025-08-31 04:05:56 - pico-train - INFO - ├── Learning Rate: 1.63e-04 2025-08-31 04:05:56 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-31 04:06:50 - pico-train - INFO - Step 29700 -- 🔄 Training Metrics 2025-08-31 04:06:50 - pico-train - INFO - ├── Loss: 4.9919 2025-08-31 04:06:50 - pico-train - INFO - ├── Learning Rate: 1.63e-04 2025-08-31 04:06:50 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-31 04:07:44 - pico-train - INFO - Step 29800 -- 🔄 Training Metrics 2025-08-31 04:07:44 - pico-train - INFO - ├── Loss: 5.0125 2025-08-31 04:07:44 - pico-train - INFO - ├── Learning Rate: 1.63e-04 2025-08-31 04:07:44 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-31 04:08:38 - pico-train - INFO - Step 29900 -- 🔄 Training Metrics 2025-08-31 04:08:38 - pico-train - INFO - ├── Loss: 4.9857 2025-08-31 04:08:38 - pico-train - INFO - ├── Learning Rate: 1.63e-04 2025-08-31 04:08:38 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-31 04:09:32 - pico-train - INFO - Step 30000 -- 💾 Saving Checkpoint 2025-08-31 04:11:22 - pico-train - INFO - Step 30000 -- 📊 Evaluation Results 2025-08-31 04:11:22 - pico-train - INFO - └── paloma: inf 2025-08-31 04:11:22 - pico-train - INFO - Step 30000 -- 🔄 Training Metrics 2025-08-31 04:11:22 - pico-train - INFO - ├── Loss: 4.9901 2025-08-31 04:11:22 - pico-train - INFO - ├── Learning Rate: 1.62e-04 2025-08-31 04:11:22 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-31 04:11:22 - pico-train - INFO - Step 30000 -- 📈 Saving Learning Dynamics 2025-08-31 04:12:19 - pico-train - INFO - Step 30100 -- 🔄 
Training Metrics 2025-08-31 04:12:19 - pico-train - INFO - ├── Loss: 4.9848 2025-08-31 04:12:19 - pico-train - INFO - ├── Learning Rate: 1.62e-04 2025-08-31 04:12:19 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-31 04:13:14 - pico-train - INFO - Step 30200 -- 🔄 Training Metrics 2025-08-31 04:13:14 - pico-train - INFO - ├── Loss: 4.9702 2025-08-31 04:13:14 - pico-train - INFO - ├── Learning Rate: 1.62e-04 2025-08-31 04:13:14 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-31 04:14:07 - pico-train - INFO - Step 30300 -- 🔄 Training Metrics 2025-08-31 04:14:07 - pico-train - INFO - ├── Loss: 4.9776 2025-08-31 04:14:07 - pico-train - INFO - ├── Learning Rate: 1.62e-04 2025-08-31 04:14:07 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-31 04:15:02 - pico-train - INFO - Step 30400 -- 🔄 Training Metrics 2025-08-31 04:15:02 - pico-train - INFO - ├── Loss: 4.9791 2025-08-31 04:15:02 - pico-train - INFO - ├── Learning Rate: 1.61e-04 2025-08-31 04:15:02 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-31 04:15:56 - pico-train - INFO - Step 30500 -- 🔄 Training Metrics 2025-08-31 04:15:56 - pico-train - INFO - ├── Loss: 4.9795 2025-08-31 04:15:56 - pico-train - INFO - ├── Learning Rate: 1.61e-04 2025-08-31 04:15:56 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-31 04:16:50 - pico-train - INFO - Step 30600 -- 🔄 Training Metrics 2025-08-31 04:16:50 - pico-train - INFO - ├── Loss: 4.9955 2025-08-31 04:16:50 - pico-train - INFO - ├── Learning Rate: 1.61e-04 2025-08-31 04:16:50 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-31 04:17:44 - pico-train - INFO - Step 30700 -- 🔄 Training Metrics 2025-08-31 04:17:44 - pico-train - INFO - ├── Loss: 4.9795 2025-08-31 04:17:44 - pico-train - INFO - ├── Learning Rate: 1.61e-04 2025-08-31 04:17:44 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-31 04:18:39 - pico-train - INFO - Step 30800 -- 🔄 Training Metrics 2025-08-31 04:18:39 - pico-train - INFO - ├── Loss: 4.9634 2025-08-31 04:18:39 - pico-train - INFO - ├── Learning 
Rate: 1.60e-04 2025-08-31 04:18:39 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-31 04:19:34 - pico-train - INFO - Step 30900 -- 🔄 Training Metrics 2025-08-31 04:19:34 - pico-train - INFO - ├── Loss: 4.9775 2025-08-31 04:19:34 - pico-train - INFO - ├── Learning Rate: 1.60e-04 2025-08-31 04:19:34 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-31 04:20:28 - pico-train - INFO - Step 31000 -- 🔄 Training Metrics 2025-08-31 04:20:28 - pico-train - INFO - ├── Loss: 4.9883 2025-08-31 04:20:28 - pico-train - INFO - ├── Learning Rate: 1.60e-04 2025-08-31 04:20:28 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-31 04:21:22 - pico-train - INFO - Step 31100 -- 🔄 Training Metrics 2025-08-31 04:21:22 - pico-train - INFO - ├── Loss: 4.9492 2025-08-31 04:21:22 - pico-train - INFO - ├── Learning Rate: 1.60e-04 2025-08-31 04:21:22 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-31 04:22:16 - pico-train - INFO - Step 31200 -- 🔄 Training Metrics 2025-08-31 04:22:16 - pico-train - INFO - ├── Loss: 4.9783 2025-08-31 04:22:16 - pico-train - INFO - ├── Learning Rate: 1.59e-04 2025-08-31 04:22:16 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-31 04:23:10 - pico-train - INFO - Step 31300 -- 🔄 Training Metrics 2025-08-31 04:23:10 - pico-train - INFO - ├── Loss: 4.9732 2025-08-31 04:23:10 - pico-train - INFO - ├── Learning Rate: 1.59e-04 2025-08-31 04:23:10 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-31 04:24:04 - pico-train - INFO - Step 31400 -- 🔄 Training Metrics 2025-08-31 04:24:04 - pico-train - INFO - ├── Loss: 4.9639 2025-08-31 04:24:04 - pico-train - INFO - ├── Learning Rate: 1.59e-04 2025-08-31 04:24:04 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-31 04:24:58 - pico-train - INFO - Step 31500 -- 🔄 Training Metrics 2025-08-31 04:24:58 - pico-train - INFO - ├── Loss: 4.9706 2025-08-31 04:24:58 - pico-train - INFO - ├── Learning Rate: 1.59e-04 2025-08-31 04:24:58 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-31 04:25:53 - pico-train - INFO - Step 31600 
-- 🔄 Training Metrics 2025-08-31 04:25:53 - pico-train - INFO - ├── Loss: 4.9916 2025-08-31 04:25:53 - pico-train - INFO - ├── Learning Rate: 1.58e-04 2025-08-31 04:25:53 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-31 04:26:46 - pico-train - INFO - Step 31700 -- 🔄 Training Metrics 2025-08-31 04:26:46 - pico-train - INFO - ├── Loss: 4.9708 2025-08-31 04:26:46 - pico-train - INFO - ├── Learning Rate: 1.58e-04 2025-08-31 04:26:46 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-31 04:27:41 - pico-train - INFO - Step 31800 -- 🔄 Training Metrics 2025-08-31 04:27:41 - pico-train - INFO - ├── Loss: 4.9447 2025-08-31 04:27:41 - pico-train - INFO - ├── Learning Rate: 1.58e-04 2025-08-31 04:27:41 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-31 04:28:35 - pico-train - INFO - Step 31900 -- 🔄 Training Metrics 2025-08-31 04:28:35 - pico-train - INFO - ├── Loss: 4.9892 2025-08-31 04:28:35 - pico-train - INFO - ├── Learning Rate: 1.57e-04 2025-08-31 04:28:35 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-31 04:29:28 - pico-train - INFO - Step 32000 -- 💾 Saving Checkpoint 2025-08-31 04:31:19 - pico-train - INFO - Step 32000 -- 📊 Evaluation Results 2025-08-31 04:31:19 - pico-train - INFO - └── paloma: inf 2025-08-31 04:31:20 - pico-train - INFO - Step 32000 -- 🔄 Training Metrics 2025-08-31 04:31:20 - pico-train - INFO - ├── Loss: 4.9585 2025-08-31 04:31:20 - pico-train - INFO - ├── Learning Rate: 1.57e-04 2025-08-31 04:31:20 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-31 04:31:20 - pico-train - INFO - Step 32000 -- 📈 Saving Learning Dynamics 2025-08-31 04:32:16 - pico-train - INFO - Step 32100 -- 🔄 Training Metrics 2025-08-31 04:32:16 - pico-train - INFO - ├── Loss: 4.9840 2025-08-31 04:32:16 - pico-train - INFO - ├── Learning Rate: 1.57e-04 2025-08-31 04:32:16 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-31 04:33:11 - pico-train - INFO - Step 32200 -- 🔄 Training Metrics 2025-08-31 04:33:11 - pico-train - INFO - ├── Loss: 4.9498 2025-08-31 04:33:11 - 
pico-train - INFO - ├── Learning Rate: 1.57e-04 2025-08-31 04:33:11 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-31 04:34:05 - pico-train - INFO - Step 32300 -- 🔄 Training Metrics 2025-08-31 04:34:05 - pico-train - INFO - ├── Loss: 4.9513 2025-08-31 04:34:05 - pico-train - INFO - ├── Learning Rate: 1.56e-04 2025-08-31 04:34:05 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-31 04:34:59 - pico-train - INFO - Step 32400 -- 🔄 Training Metrics 2025-08-31 04:34:59 - pico-train - INFO - ├── Loss: 4.9563 2025-08-31 04:34:59 - pico-train - INFO - ├── Learning Rate: 1.56e-04 2025-08-31 04:34:59 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-31 04:35:54 - pico-train - INFO - Step 32500 -- 🔄 Training Metrics 2025-08-31 04:35:54 - pico-train - INFO - ├── Loss: 4.9439 2025-08-31 04:35:54 - pico-train - INFO - ├── Learning Rate: 1.56e-04 2025-08-31 04:35:54 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-31 04:36:47 - pico-train - INFO - Step 32600 -- 🔄 Training Metrics 2025-08-31 04:36:47 - pico-train - INFO - ├── Loss: 4.9768 2025-08-31 04:36:47 - pico-train - INFO - ├── Learning Rate: 1.56e-04 2025-08-31 04:36:47 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-31 04:37:41 - pico-train - INFO - Step 32700 -- 🔄 Training Metrics 2025-08-31 04:37:41 - pico-train - INFO - ├── Loss: 4.9640 2025-08-31 04:37:41 - pico-train - INFO - ├── Learning Rate: 1.55e-04 2025-08-31 04:37:41 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-31 04:38:36 - pico-train - INFO - Step 32800 -- 🔄 Training Metrics 2025-08-31 04:38:36 - pico-train - INFO - ├── Loss: 4.9570 2025-08-31 04:38:36 - pico-train - INFO - ├── Learning Rate: 1.55e-04 2025-08-31 04:38:36 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-31 04:39:30 - pico-train - INFO - Step 32900 -- 🔄 Training Metrics 2025-08-31 04:39:30 - pico-train - INFO - ├── Loss: 4.9885 2025-08-31 04:39:30 - pico-train - INFO - ├── Learning Rate: 1.55e-04 2025-08-31 04:39:30 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-31 04:40:24 
- pico-train - INFO - Step 33000 -- 🔄 Training Metrics 2025-08-31 04:40:24 - pico-train - INFO - ├── Loss: 4.9528 2025-08-31 04:40:24 - pico-train - INFO - ├── Learning Rate: 1.55e-04 2025-08-31 04:40:24 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-31 04:41:18 - pico-train - INFO - Step 33100 -- 🔄 Training Metrics 2025-08-31 04:41:18 - pico-train - INFO - ├── Loss: 4.9572 2025-08-31 04:41:18 - pico-train - INFO - ├── Learning Rate: 1.54e-04 2025-08-31 04:41:18 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-31 04:42:12 - pico-train - INFO - Step 33200 -- 🔄 Training Metrics 2025-08-31 04:42:12 - pico-train - INFO - ├── Loss: 4.9850 2025-08-31 04:42:12 - pico-train - INFO - ├── Learning Rate: 1.54e-04 2025-08-31 04:42:12 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-31 04:43:08 - pico-train - INFO - Step 33300 -- 🔄 Training Metrics 2025-08-31 04:43:08 - pico-train - INFO - ├── Loss: 4.9591 2025-08-31 04:43:08 - pico-train - INFO - ├── Learning Rate: 1.54e-04 2025-08-31 04:43:08 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-31 04:44:01 - pico-train - INFO - Step 33400 -- 🔄 Training Metrics 2025-08-31 04:44:01 - pico-train - INFO - ├── Loss: 4.9522 2025-08-31 04:44:01 - pico-train - INFO - ├── Learning Rate: 1.53e-04 2025-08-31 04:44:01 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-31 04:44:56 - pico-train - INFO - Step 33500 -- 🔄 Training Metrics 2025-08-31 04:44:56 - pico-train - INFO - ├── Loss: 4.9655 2025-08-31 04:44:56 - pico-train - INFO - ├── Learning Rate: 1.53e-04 2025-08-31 04:44:56 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-31 04:45:50 - pico-train - INFO - Step 33600 -- 🔄 Training Metrics 2025-08-31 04:45:50 - pico-train - INFO - ├── Loss: 4.9481 2025-08-31 04:45:50 - pico-train - INFO - ├── Learning Rate: 1.53e-04 2025-08-31 04:45:50 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-31 04:46:44 - pico-train - INFO - Step 33700 -- 🔄 Training Metrics 2025-08-31 04:46:44 - pico-train - INFO - ├── Loss: 4.9614 2025-08-31 
04:46:44 - pico-train - INFO - ├── Learning Rate: 1.53e-04 2025-08-31 04:46:44 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-31 04:47:38 - pico-train - INFO - Step 33800 -- 🔄 Training Metrics 2025-08-31 04:47:38 - pico-train - INFO - ├── Loss: 4.9347 2025-08-31 04:47:38 - pico-train - INFO - ├── Learning Rate: 1.52e-04 2025-08-31 04:47:38 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-31 04:48:32 - pico-train - INFO - Step 33900 -- 🔄 Training Metrics 2025-08-31 04:48:32 - pico-train - INFO - ├── Loss: 4.9518 2025-08-31 04:48:32 - pico-train - INFO - ├── Learning Rate: 1.52e-04 2025-08-31 04:48:32 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-31 04:49:26 - pico-train - INFO - Step 34000 -- 💾 Saving Checkpoint 2025-08-31 04:51:22 - pico-train - INFO - Step 34000 -- 📊 Evaluation Results 2025-08-31 04:51:22 - pico-train - INFO - └── paloma: inf 2025-08-31 04:51:22 - pico-train - INFO - Step 34000 -- 🔄 Training Metrics 2025-08-31 04:51:22 - pico-train - INFO - ├── Loss: 4.9519 2025-08-31 04:51:22 - pico-train - INFO - ├── Learning Rate: 1.52e-04 2025-08-31 04:51:22 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-31 04:51:22 - pico-train - INFO - Step 34000 -- 📈 Saving Learning Dynamics 2025-08-31 04:52:19 - pico-train - INFO - Step 34100 -- 🔄 Training Metrics 2025-08-31 04:52:19 - pico-train - INFO - ├── Loss: 4.9583 2025-08-31 04:52:19 - pico-train - INFO - ├── Learning Rate: 1.52e-04 2025-08-31 04:52:19 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-31 04:53:14 - pico-train - INFO - Step 34200 -- 🔄 Training Metrics 2025-08-31 04:53:14 - pico-train - INFO - ├── Loss: 4.9762 2025-08-31 04:53:14 - pico-train - INFO - ├── Learning Rate: 1.51e-04 2025-08-31 04:53:14 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-31 04:54:08 - pico-train - INFO - Step 34300 -- 🔄 Training Metrics 2025-08-31 04:54:08 - pico-train - INFO - ├── Loss: 4.9671 2025-08-31 04:54:08 - pico-train - INFO - ├── Learning Rate: 1.51e-04 2025-08-31 04:54:08 - pico-train - INFO - 
└── Inf/NaN count: 0 2025-08-31 04:55:02 - pico-train - INFO - Step 34400 -- 🔄 Training Metrics 2025-08-31 04:55:02 - pico-train - INFO - ├── Loss: 4.9657 2025-08-31 04:55:02 - pico-train - INFO - ├── Learning Rate: 1.51e-04 2025-08-31 04:55:02 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-31 04:55:56 - pico-train - INFO - Step 34500 -- 🔄 Training Metrics 2025-08-31 04:55:56 - pico-train - INFO - ├── Loss: 4.9669 2025-08-31 04:55:56 - pico-train - INFO - ├── Learning Rate: 1.50e-04 2025-08-31 04:55:56 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-31 04:56:52 - pico-train - INFO - Step 34600 -- 🔄 Training Metrics 2025-08-31 04:56:52 - pico-train - INFO - ├── Loss: 4.9462 2025-08-31 04:56:52 - pico-train - INFO - ├── Learning Rate: 1.50e-04 2025-08-31 04:56:52 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-31 04:57:47 - pico-train - INFO - Step 34700 -- 🔄 Training Metrics 2025-08-31 04:57:47 - pico-train - INFO - ├── Loss: 4.9412 2025-08-31 04:57:47 - pico-train - INFO - ├── Learning Rate: 1.50e-04 2025-08-31 04:57:47 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-31 04:58:42 - pico-train - INFO - Step 34800 -- 🔄 Training Metrics 2025-08-31 04:58:42 - pico-train - INFO - ├── Loss: 4.9356 2025-08-31 04:58:42 - pico-train - INFO - ├── Learning Rate: 1.50e-04 2025-08-31 04:58:42 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-31 04:59:37 - pico-train - INFO - Step 34900 -- 🔄 Training Metrics 2025-08-31 04:59:37 - pico-train - INFO - ├── Loss: 4.9345 2025-08-31 04:59:37 - pico-train - INFO - ├── Learning Rate: 1.49e-04 2025-08-31 04:59:37 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-31 05:00:33 - pico-train - INFO - Step 35000 -- 🔄 Training Metrics 2025-08-31 05:00:33 - pico-train - INFO - ├── Loss: 4.9402 2025-08-31 05:00:33 - pico-train - INFO - ├── Learning Rate: 1.49e-04 2025-08-31 05:00:33 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-31 05:01:28 - pico-train - INFO - Step 35100 -- 🔄 Training Metrics 2025-08-31 05:01:28 - pico-train - 
INFO - ├── Loss: 4.9421 2025-08-31 05:01:28 - pico-train - INFO - ├── Learning Rate: 1.49e-04 2025-08-31 05:01:28 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-31 05:02:23 - pico-train - INFO - Step 35200 -- 🔄 Training Metrics 2025-08-31 05:02:23 - pico-train - INFO - ├── Loss: 4.9274 2025-08-31 05:02:23 - pico-train - INFO - ├── Learning Rate: 1.49e-04 2025-08-31 05:02:23 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-31 05:03:18 - pico-train - INFO - Step 35300 -- 🔄 Training Metrics 2025-08-31 05:03:18 - pico-train - INFO - ├── Loss: 4.9680 2025-08-31 05:03:18 - pico-train - INFO - ├── Learning Rate: 1.48e-04 2025-08-31 05:03:18 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-31 05:04:14 - pico-train - INFO - Step 35400 -- 🔄 Training Metrics 2025-08-31 05:04:14 - pico-train - INFO - ├── Loss: 4.9663 2025-08-31 05:04:14 - pico-train - INFO - ├── Learning Rate: 1.48e-04 2025-08-31 05:04:14 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-31 05:05:09 - pico-train - INFO - Step 35500 -- 🔄 Training Metrics 2025-08-31 05:05:09 - pico-train - INFO - ├── Loss: 4.9410 2025-08-31 05:05:09 - pico-train - INFO - ├── Learning Rate: 1.48e-04 2025-08-31 05:05:09 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-31 05:06:04 - pico-train - INFO - Step 35600 -- 🔄 Training Metrics 2025-08-31 05:06:04 - pico-train - INFO - ├── Loss: 4.9500 2025-08-31 05:06:04 - pico-train - INFO - ├── Learning Rate: 1.47e-04 2025-08-31 05:06:04 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-31 05:07:00 - pico-train - INFO - Step 35700 -- 🔄 Training Metrics 2025-08-31 05:07:00 - pico-train - INFO - ├── Loss: 4.9276 2025-08-31 05:07:00 - pico-train - INFO - ├── Learning Rate: 1.47e-04 2025-08-31 05:07:00 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-31 05:07:55 - pico-train - INFO - Step 35800 -- 🔄 Training Metrics 2025-08-31 05:07:55 - pico-train - INFO - ├── Loss: 4.9673 2025-08-31 05:07:55 - pico-train - INFO - ├── Learning Rate: 1.47e-04 2025-08-31 05:07:55 - pico-train - 
INFO - └── Inf/NaN count: 0 2025-08-31 05:08:50 - pico-train - INFO - Step 35900 -- 🔄 Training Metrics 2025-08-31 05:08:50 - pico-train - INFO - ├── Loss: 4.9396 2025-08-31 05:08:50 - pico-train - INFO - ├── Learning Rate: 1.47e-04 2025-08-31 05:08:50 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-31 05:09:44 - pico-train - INFO - Step 36000 -- 💾 Saving Checkpoint 2025-08-31 05:11:40 - pico-train - INFO - Step 36000 -- 📊 Evaluation Results 2025-08-31 05:11:40 - pico-train - INFO - └── paloma: inf 2025-08-31 05:11:41 - pico-train - INFO - Step 36000 -- 🔄 Training Metrics 2025-08-31 05:11:41 - pico-train - INFO - ├── Loss: 4.9613 2025-08-31 05:11:41 - pico-train - INFO - ├── Learning Rate: 1.46e-04 2025-08-31 05:11:41 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-31 05:11:41 - pico-train - INFO - Step 36000 -- 📈 Saving Learning Dynamics 2025-08-31 05:12:38 - pico-train - INFO - Step 36100 -- 🔄 Training Metrics 2025-08-31 05:12:38 - pico-train - INFO - ├── Loss: 4.9352 2025-08-31 05:12:38 - pico-train - INFO - ├── Learning Rate: 1.46e-04 2025-08-31 05:12:38 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-31 05:13:33 - pico-train - INFO - Step 36200 -- 🔄 Training Metrics 2025-08-31 05:13:33 - pico-train - INFO - ├── Loss: 4.9431 2025-08-31 05:13:33 - pico-train - INFO - ├── Learning Rate: 1.46e-04 2025-08-31 05:13:33 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-31 05:14:28 - pico-train - INFO - Step 36300 -- 🔄 Training Metrics 2025-08-31 05:14:28 - pico-train - INFO - ├── Loss: 4.9406 2025-08-31 05:14:28 - pico-train - INFO - ├── Learning Rate: 1.45e-04 2025-08-31 05:14:28 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-31 05:15:24 - pico-train - INFO - Step 36400 -- 🔄 Training Metrics 2025-08-31 05:15:24 - pico-train - INFO - ├── Loss: 4.9587 2025-08-31 05:15:24 - pico-train - INFO - ├── Learning Rate: 1.45e-04 2025-08-31 05:15:24 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-31 05:16:19 - pico-train - INFO - Step 36500 -- 🔄 Training Metrics 
2025-08-31 05:16:19 - pico-train - INFO - ├── Loss: 4.9296 2025-08-31 05:16:19 - pico-train - INFO - ├── Learning Rate: 1.45e-04 2025-08-31 05:16:19 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-31 05:17:14 - pico-train - INFO - Step 36600 -- 🔄 Training Metrics 2025-08-31 05:17:14 - pico-train - INFO - ├── Loss: 4.9252 2025-08-31 05:17:14 - pico-train - INFO - ├── Learning Rate: 1.45e-04 2025-08-31 05:17:14 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-31 05:18:09 - pico-train - INFO - Step 36700 -- 🔄 Training Metrics 2025-08-31 05:18:09 - pico-train - INFO - ├── Loss: 4.9333 2025-08-31 05:18:09 - pico-train - INFO - ├── Learning Rate: 1.44e-04 2025-08-31 05:18:09 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-31 05:19:04 - pico-train - INFO - Step 36800 -- 🔄 Training Metrics 2025-08-31 05:19:04 - pico-train - INFO - ├── Loss: 4.9394 2025-08-31 05:19:04 - pico-train - INFO - ├── Learning Rate: 1.44e-04 2025-08-31 05:19:04 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-31 05:19:59 - pico-train - INFO - Step 36900 -- 🔄 Training Metrics 2025-08-31 05:19:59 - pico-train - INFO - ├── Loss: 4.9517 2025-08-31 05:19:59 - pico-train - INFO - ├── Learning Rate: 1.44e-04 2025-08-31 05:19:59 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-31 05:20:55 - pico-train - INFO - Step 37000 -- 🔄 Training Metrics 2025-08-31 05:20:55 - pico-train - INFO - ├── Loss: 4.9360 2025-08-31 05:20:55 - pico-train - INFO - ├── Learning Rate: 1.43e-04 2025-08-31 05:20:55 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-31 05:21:50 - pico-train - INFO - Step 37100 -- 🔄 Training Metrics 2025-08-31 05:21:50 - pico-train - INFO - ├── Loss: 4.9356 2025-08-31 05:21:50 - pico-train - INFO - ├── Learning Rate: 1.43e-04 2025-08-31 05:21:50 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-31 05:22:46 - pico-train - INFO - Step 37200 -- 🔄 Training Metrics 2025-08-31 05:22:46 - pico-train - INFO - ├── Loss: 4.9186 2025-08-31 05:22:46 - pico-train - INFO - ├── Learning Rate: 1.43e-04 
2025-08-31 05:22:46 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-31 05:23:41 - pico-train - INFO - Step 37300 -- 🔄 Training Metrics 2025-08-31 05:23:41 - pico-train - INFO - ├── Loss: 4.9428 2025-08-31 05:23:41 - pico-train - INFO - ├── Learning Rate: 1.43e-04 2025-08-31 05:23:41 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-31 05:24:37 - pico-train - INFO - Step 37400 -- 🔄 Training Metrics 2025-08-31 05:24:37 - pico-train - INFO - ├── Loss: 4.9358 2025-08-31 05:24:37 - pico-train - INFO - ├── Learning Rate: 1.42e-04 2025-08-31 05:24:37 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-31 05:25:32 - pico-train - INFO - Step 37500 -- 🔄 Training Metrics 2025-08-31 05:25:32 - pico-train - INFO - ├── Loss: 4.9206 2025-08-31 05:25:32 - pico-train - INFO - ├── Learning Rate: 1.42e-04 2025-08-31 05:25:32 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-31 05:26:27 - pico-train - INFO - Step 37600 -- 🔄 Training Metrics 2025-08-31 05:26:27 - pico-train - INFO - ├── Loss: 4.9373 2025-08-31 05:26:27 - pico-train - INFO - ├── Learning Rate: 1.42e-04 2025-08-31 05:26:27 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-31 05:27:22 - pico-train - INFO - Step 37700 -- 🔄 Training Metrics 2025-08-31 05:27:22 - pico-train - INFO - ├── Loss: 4.9388 2025-08-31 05:27:22 - pico-train - INFO - ├── Learning Rate: 1.41e-04 2025-08-31 05:27:22 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-31 05:28:18 - pico-train - INFO - Step 37800 -- 🔄 Training Metrics 2025-08-31 05:28:18 - pico-train - INFO - ├── Loss: 4.9301 2025-08-31 05:28:18 - pico-train - INFO - ├── Learning Rate: 1.41e-04 2025-08-31 05:28:18 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-31 05:29:13 - pico-train - INFO - Step 37900 -- 🔄 Training Metrics 2025-08-31 05:29:13 - pico-train - INFO - ├── Loss: 4.9262 2025-08-31 05:29:13 - pico-train - INFO - ├── Learning Rate: 1.41e-04 2025-08-31 05:29:13 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-31 05:30:08 - pico-train - INFO - Step 38000 -- 💾 Saving 
Checkpoint 2025-08-31 05:32:03 - pico-train - INFO - Step 38000 -- 📊 Evaluation Results 2025-08-31 05:32:03 - pico-train - INFO - └── paloma: inf 2025-08-31 05:32:03 - pico-train - INFO - Step 38000 -- 🔄 Training Metrics 2025-08-31 05:32:03 - pico-train - INFO - ├── Loss: 4.9291 2025-08-31 05:32:03 - pico-train - INFO - ├── Learning Rate: 1.40e-04 2025-08-31 05:32:03 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-31 05:32:03 - pico-train - INFO - Step 38000 -- 📈 Saving Learning Dynamics 2025-08-31 05:32:59 - pico-train - INFO - Step 38100 -- 🔄 Training Metrics 2025-08-31 05:32:59 - pico-train - INFO - ├── Loss: 4.9438 2025-08-31 05:32:59 - pico-train - INFO - ├── Learning Rate: 1.40e-04 2025-08-31 05:32:59 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-31 05:33:54 - pico-train - INFO - Step 38200 -- 🔄 Training Metrics 2025-08-31 05:33:54 - pico-train - INFO - ├── Loss: 4.9386 2025-08-31 05:33:54 - pico-train - INFO - ├── Learning Rate: 1.40e-04 2025-08-31 05:33:54 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-31 05:34:48 - pico-train - INFO - Step 38300 -- 🔄 Training Metrics 2025-08-31 05:34:48 - pico-train - INFO - ├── Loss: 4.9439 2025-08-31 05:34:48 - pico-train - INFO - ├── Learning Rate: 1.40e-04 2025-08-31 05:34:48 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-31 05:35:43 - pico-train - INFO - Step 38400 -- 🔄 Training Metrics 2025-08-31 05:35:43 - pico-train - INFO - ├── Loss: 4.9467 2025-08-31 05:35:43 - pico-train - INFO - ├── Learning Rate: 1.39e-04 2025-08-31 05:35:43 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-31 05:36:37 - pico-train - INFO - Step 38500 -- 🔄 Training Metrics 2025-08-31 05:36:37 - pico-train - INFO - ├── Loss: 4.9535 2025-08-31 05:36:37 - pico-train - INFO - ├── Learning Rate: 1.39e-04 2025-08-31 05:36:37 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-31 05:37:31 - pico-train - INFO - Step 38600 -- 🔄 Training Metrics 2025-08-31 05:37:31 - pico-train - INFO - ├── Loss: 4.9062 2025-08-31 05:37:31 - pico-train - 
INFO - ├── Learning Rate: 1.39e-04 2025-08-31 05:37:31 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-31 05:38:25 - pico-train - INFO - Step 38700 -- 🔄 Training Metrics 2025-08-31 05:38:25 - pico-train - INFO - ├── Loss: 4.9308 2025-08-31 05:38:25 - pico-train - INFO - ├── Learning Rate: 1.38e-04 2025-08-31 05:38:25 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-31 05:39:19 - pico-train - INFO - Step 38800 -- 🔄 Training Metrics 2025-08-31 05:39:19 - pico-train - INFO - ├── Loss: 4.9026 2025-08-31 05:39:19 - pico-train - INFO - ├── Learning Rate: 1.38e-04 2025-08-31 05:39:19 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-31 05:40:14 - pico-train - INFO - Step 38900 -- 🔄 Training Metrics 2025-08-31 05:40:14 - pico-train - INFO - ├── Loss: 4.9223 2025-08-31 05:40:14 - pico-train - INFO - ├── Learning Rate: 1.38e-04 2025-08-31 05:40:14 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-31 05:41:07 - pico-train - INFO - Step 39000 -- 🔄 Training Metrics 2025-08-31 05:41:07 - pico-train - INFO - ├── Loss: 4.9212 2025-08-31 05:41:07 - pico-train - INFO - ├── Learning Rate: 1.38e-04 2025-08-31 05:41:07 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-31 05:42:01 - pico-train - INFO - Step 39100 -- 🔄 Training Metrics 2025-08-31 05:42:01 - pico-train - INFO - ├── Loss: 4.9162 2025-08-31 05:42:01 - pico-train - INFO - ├── Learning Rate: 1.37e-04 2025-08-31 05:42:01 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-31 05:42:56 - pico-train - INFO - Step 39200 -- 🔄 Training Metrics 2025-08-31 05:42:56 - pico-train - INFO - ├── Loss: 4.9188 2025-08-31 05:42:56 - pico-train - INFO - ├── Learning Rate: 1.37e-04 2025-08-31 05:42:56 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-31 05:43:50 - pico-train - INFO - Step 39300 -- 🔄 Training Metrics 2025-08-31 05:43:50 - pico-train - INFO - ├── Loss: 4.9239 2025-08-31 05:43:50 - pico-train - INFO - ├── Learning Rate: 1.37e-04 2025-08-31 05:43:50 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-31 05:44:44 - pico-train 
- INFO - Step 39400 -- 🔄 Training Metrics 2025-08-31 05:44:44 - pico-train - INFO - ├── Loss: 4.9182 2025-08-31 05:44:44 - pico-train - INFO - ├── Learning Rate: 1.36e-04 2025-08-31 05:44:44 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-31 05:45:38 - pico-train - INFO - Step 39500 -- 🔄 Training Metrics 2025-08-31 05:45:38 - pico-train - INFO - ├── Loss: 4.9126 2025-08-31 05:45:38 - pico-train - INFO - ├── Learning Rate: 1.36e-04 2025-08-31 05:45:38 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-31 05:46:32 - pico-train - INFO - Step 39600 -- 🔄 Training Metrics 2025-08-31 05:46:32 - pico-train - INFO - ├── Loss: 4.9245 2025-08-31 05:46:32 - pico-train - INFO - ├── Learning Rate: 1.36e-04 2025-08-31 05:46:32 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-31 05:47:27 - pico-train - INFO - Step 39700 -- 🔄 Training Metrics 2025-08-31 05:47:27 - pico-train - INFO - ├── Loss: 4.9458 2025-08-31 05:47:27 - pico-train - INFO - ├── Learning Rate: 1.35e-04 2025-08-31 05:47:27 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-31 05:48:21 - pico-train - INFO - Step 39800 -- 🔄 Training Metrics 2025-08-31 05:48:21 - pico-train - INFO - ├── Loss: 4.9233 2025-08-31 05:48:21 - pico-train - INFO - ├── Learning Rate: 1.35e-04 2025-08-31 05:48:21 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-31 05:49:15 - pico-train - INFO - Step 39900 -- 🔄 Training Metrics 2025-08-31 05:49:15 - pico-train - INFO - ├── Loss: 4.9178 2025-08-31 05:49:15 - pico-train - INFO - ├── Learning Rate: 1.35e-04 2025-08-31 05:49:15 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-31 05:50:09 - pico-train - INFO - Step 40000 -- 💾 Saving Checkpoint 2025-08-31 05:52:13 - pico-train - INFO - Step 40000 -- 📊 Evaluation Results 2025-08-31 05:52:13 - pico-train - INFO - └── paloma: inf 2025-08-31 05:52:14 - pico-train - INFO - Step 40000 -- 🔄 Training Metrics 2025-08-31 05:52:14 - pico-train - INFO - ├── Loss: 4.9145 2025-08-31 05:52:14 - pico-train - INFO - ├── Learning Rate: 1.35e-04 2025-08-31 
05:52:14 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-31 05:52:14 - pico-train - INFO - Step 40000 -- 📈 Saving Learning Dynamics 2025-08-31 05:53:10 - pico-train - INFO - Step 40100 -- 🔄 Training Metrics 2025-08-31 05:53:10 - pico-train - INFO - ├── Loss: 4.9239 2025-08-31 05:53:10 - pico-train - INFO - ├── Learning Rate: 1.34e-04 2025-08-31 05:53:10 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-31 05:54:05 - pico-train - INFO - Step 40200 -- 🔄 Training Metrics 2025-08-31 05:54:05 - pico-train - INFO - ├── Loss: 4.9170 2025-08-31 05:54:05 - pico-train - INFO - ├── Learning Rate: 1.34e-04 2025-08-31 05:54:05 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-31 05:55:00 - pico-train - INFO - Step 40300 -- 🔄 Training Metrics 2025-08-31 05:55:00 - pico-train - INFO - ├── Loss: 4.9209 2025-08-31 05:55:00 - pico-train - INFO - ├── Learning Rate: 1.34e-04 2025-08-31 05:55:00 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-31 05:55:55 - pico-train - INFO - Step 40400 -- 🔄 Training Metrics 2025-08-31 05:55:55 - pico-train - INFO - ├── Loss: 4.9120 2025-08-31 05:55:55 - pico-train - INFO - ├── Learning Rate: 1.33e-04 2025-08-31 05:55:55 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-31 05:56:50 - pico-train - INFO - Step 40500 -- 🔄 Training Metrics 2025-08-31 05:56:50 - pico-train - INFO - ├── Loss: 4.9072 2025-08-31 05:56:50 - pico-train - INFO - ├── Learning Rate: 1.33e-04 2025-08-31 05:56:50 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-31 05:57:45 - pico-train - INFO - Step 40600 -- 🔄 Training Metrics 2025-08-31 05:57:45 - pico-train - INFO - ├── Loss: 4.9122 2025-08-31 05:57:45 - pico-train - INFO - ├── Learning Rate: 1.33e-04 2025-08-31 05:57:45 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-31 05:58:41 - pico-train - INFO - Step 40700 -- 🔄 Training Metrics 2025-08-31 05:58:41 - pico-train - INFO - ├── Loss: 4.9219 2025-08-31 05:58:41 - pico-train - INFO - ├── Learning Rate: 1.32e-04 2025-08-31 05:58:41 - pico-train - INFO - └── Inf/NaN 
count: 0 2025-08-31 05:59:36 - pico-train - INFO - Step 40800 -- 🔄 Training Metrics 2025-08-31 05:59:36 - pico-train - INFO - ├── Loss: 4.8843 2025-08-31 05:59:36 - pico-train - INFO - ├── Learning Rate: 1.32e-04 2025-08-31 05:59:36 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-31 06:00:31 - pico-train - INFO - Step 40900 -- 🔄 Training Metrics 2025-08-31 06:00:31 - pico-train - INFO - ├── Loss: 4.9325 2025-08-31 06:00:31 - pico-train - INFO - ├── Learning Rate: 1.32e-04 2025-08-31 06:00:31 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-31 06:01:28 - pico-train - INFO - Step 41000 -- 🔄 Training Metrics 2025-08-31 06:01:28 - pico-train - INFO - ├── Loss: 4.8745 2025-08-31 06:01:28 - pico-train - INFO - ├── Learning Rate: 1.32e-04 2025-08-31 06:01:28 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-31 06:02:23 - pico-train - INFO - Step 41100 -- 🔄 Training Metrics 2025-08-31 06:02:23 - pico-train - INFO - ├── Loss: 4.9077 2025-08-31 06:02:23 - pico-train - INFO - ├── Learning Rate: 1.31e-04 2025-08-31 06:02:23 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-31 06:03:18 - pico-train - INFO - Step 41200 -- 🔄 Training Metrics 2025-08-31 06:03:18 - pico-train - INFO - ├── Loss: 4.9190 2025-08-31 06:03:18 - pico-train - INFO - ├── Learning Rate: 1.31e-04 2025-08-31 06:03:18 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-31 06:04:13 - pico-train - INFO - Step 41300 -- 🔄 Training Metrics 2025-08-31 06:04:13 - pico-train - INFO - ├── Loss: 4.9054 2025-08-31 06:04:13 - pico-train - INFO - ├── Learning Rate: 1.31e-04 2025-08-31 06:04:13 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-31 06:05:08 - pico-train - INFO - Step 41400 -- 🔄 Training Metrics 2025-08-31 06:05:08 - pico-train - INFO - ├── Loss: 4.8996 2025-08-31 06:05:08 - pico-train - INFO - ├── Learning Rate: 1.30e-04 2025-08-31 06:05:08 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-31 06:06:04 - pico-train - INFO - Step 41500 -- 🔄 Training Metrics 2025-08-31 06:06:04 - pico-train - INFO - ├── 
Loss: 4.9230 2025-08-31 06:06:04 - pico-train - INFO - ├── Learning Rate: 1.30e-04 2025-08-31 06:06:04 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-31 06:06:59 - pico-train - INFO - Step 41600 -- 🔄 Training Metrics 2025-08-31 06:06:59 - pico-train - INFO - ├── Loss: 4.9031 2025-08-31 06:06:59 - pico-train - INFO - ├── Learning Rate: 1.30e-04 2025-08-31 06:06:59 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-31 06:07:54 - pico-train - INFO - Step 41700 -- 🔄 Training Metrics 2025-08-31 06:07:54 - pico-train - INFO - ├── Loss: 4.9136 2025-08-31 06:07:54 - pico-train - INFO - ├── Learning Rate: 1.29e-04 2025-08-31 06:07:54 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-31 06:08:49 - pico-train - INFO - Step 41800 -- 🔄 Training Metrics 2025-08-31 06:08:49 - pico-train - INFO - ├── Loss: 4.9200 2025-08-31 06:08:49 - pico-train - INFO - ├── Learning Rate: 1.29e-04 2025-08-31 06:08:49 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-31 06:09:44 - pico-train - INFO - Step 41900 -- 🔄 Training Metrics 2025-08-31 06:09:44 - pico-train - INFO - ├── Loss: 4.8982 2025-08-31 06:09:44 - pico-train - INFO - ├── Learning Rate: 1.29e-04 2025-08-31 06:09:44 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-31 06:10:39 - pico-train - INFO - Step 42000 -- 💾 Saving Checkpoint 2025-08-31 06:12:41 - pico-train - INFO - Step 42000 -- 📊 Evaluation Results 2025-08-31 06:12:41 - pico-train - INFO - └── paloma: inf 2025-08-31 06:12:41 - pico-train - INFO - Step 42000 -- 🔄 Training Metrics 2025-08-31 06:12:41 - pico-train - INFO - ├── Loss: 4.8824 2025-08-31 06:12:41 - pico-train - INFO - ├── Learning Rate: 1.28e-04 2025-08-31 06:12:41 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-31 06:12:41 - pico-train - INFO - Step 42000 -- 📈 Saving Learning Dynamics 2025-08-31 06:13:38 - pico-train - INFO - Step 42100 -- 🔄 Training Metrics 2025-08-31 06:13:38 - pico-train - INFO - ├── Loss: 4.8966 2025-08-31 06:13:38 - pico-train - INFO - ├── Learning Rate: 1.28e-04 2025-08-31 06:13:38 
- pico-train - INFO - └── Inf/NaN count: 0 2025-08-31 06:14:32 - pico-train - INFO - Step 42200 -- 🔄 Training Metrics 2025-08-31 06:14:32 - pico-train - INFO - ├── Loss: 4.8971 2025-08-31 06:14:32 - pico-train - INFO - ├── Learning Rate: 1.28e-04 2025-08-31 06:14:32 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-31 06:15:27 - pico-train - INFO - Step 42300 -- 🔄 Training Metrics 2025-08-31 06:15:27 - pico-train - INFO - ├── Loss: 4.9271 2025-08-31 06:15:27 - pico-train - INFO - ├── Learning Rate: 1.28e-04 2025-08-31 06:15:27 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-31 06:16:21 - pico-train - INFO - Step 42400 -- 🔄 Training Metrics 2025-08-31 06:16:21 - pico-train - INFO - ├── Loss: 4.9046 2025-08-31 06:16:21 - pico-train - INFO - ├── Learning Rate: 1.27e-04 2025-08-31 06:16:21 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-31 06:17:16 - pico-train - INFO - Step 42500 -- 🔄 Training Metrics 2025-08-31 06:17:16 - pico-train - INFO - ├── Loss: 4.9010 2025-08-31 06:17:16 - pico-train - INFO - ├── Learning Rate: 1.27e-04 2025-08-31 06:17:16 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-31 06:18:10 - pico-train - INFO - Step 42600 -- 🔄 Training Metrics 2025-08-31 06:18:10 - pico-train - INFO - ├── Loss: 4.9314 2025-08-31 06:18:10 - pico-train - INFO - ├── Learning Rate: 1.27e-04 2025-08-31 06:18:10 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-31 06:19:04 - pico-train - INFO - Step 42700 -- 🔄 Training Metrics 2025-08-31 06:19:04 - pico-train - INFO - ├── Loss: 4.8923 2025-08-31 06:19:04 - pico-train - INFO - ├── Learning Rate: 1.26e-04 2025-08-31 06:19:04 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-31 06:19:58 - pico-train - INFO - Step 42800 -- 🔄 Training Metrics 2025-08-31 06:19:58 - pico-train - INFO - ├── Loss: 4.9007 2025-08-31 06:19:58 - pico-train - INFO - ├── Learning Rate: 1.26e-04 2025-08-31 06:19:58 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-31 06:20:53 - pico-train - INFO - Step 42900 -- 🔄 Training Metrics 2025-08-31 
06:20:53 - pico-train - INFO - ├── Loss: 4.8903 2025-08-31 06:20:53 - pico-train - INFO - ├── Learning Rate: 1.26e-04 2025-08-31 06:20:53 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-31 06:21:47 - pico-train - INFO - Step 43000 -- 🔄 Training Metrics 2025-08-31 06:21:47 - pico-train - INFO - ├── Loss: 4.9252 2025-08-31 06:21:47 - pico-train - INFO - ├── Learning Rate: 1.25e-04 2025-08-31 06:21:47 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-31 06:22:41 - pico-train - INFO - Step 43100 -- 🔄 Training Metrics 2025-08-31 06:22:41 - pico-train - INFO - ├── Loss: 4.8924 2025-08-31 06:22:41 - pico-train - INFO - ├── Learning Rate: 1.25e-04 2025-08-31 06:22:41 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-31 06:23:35 - pico-train - INFO - Step 43200 -- 🔄 Training Metrics 2025-08-31 06:23:35 - pico-train - INFO - ├── Loss: 4.8955 2025-08-31 06:23:35 - pico-train - INFO - ├── Learning Rate: 1.25e-04 2025-08-31 06:23:35 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-31 06:24:29 - pico-train - INFO - Step 43300 -- 🔄 Training Metrics 2025-08-31 06:24:29 - pico-train - INFO - ├── Loss: 4.8651 2025-08-31 06:24:29 - pico-train - INFO - ├── Learning Rate: 1.24e-04 2025-08-31 06:24:29 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-31 06:25:24 - pico-train - INFO - Step 43400 -- 🔄 Training Metrics 2025-08-31 06:25:24 - pico-train - INFO - ├── Loss: 4.9018 2025-08-31 06:25:24 - pico-train - INFO - ├── Learning Rate: 1.24e-04 2025-08-31 06:25:24 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-31 06:26:18 - pico-train - INFO - Step 43500 -- 🔄 Training Metrics 2025-08-31 06:26:18 - pico-train - INFO - ├── Loss: 4.9001 2025-08-31 06:26:18 - pico-train - INFO - ├── Learning Rate: 1.24e-04 2025-08-31 06:26:18 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-31 06:27:14 - pico-train - INFO - Step 43600 -- 🔄 Training Metrics 2025-08-31 06:27:14 - pico-train - INFO - ├── Loss: 4.8980 2025-08-31 06:27:14 - pico-train - INFO - ├── Learning Rate: 1.24e-04 2025-08-31 
06:27:14 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-31 06:28:09 - pico-train - INFO - Step 43700 -- 🔄 Training Metrics 2025-08-31 06:28:09 - pico-train - INFO - ├── Loss: 4.9205 2025-08-31 06:28:09 - pico-train - INFO - ├── Learning Rate: 1.23e-04 2025-08-31 06:28:09 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-31 06:29:05 - pico-train - INFO - Step 43800 -- 🔄 Training Metrics 2025-08-31 06:29:05 - pico-train - INFO - ├── Loss: 4.8935 2025-08-31 06:29:05 - pico-train - INFO - ├── Learning Rate: 1.23e-04 2025-08-31 06:29:05 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-31 06:30:00 - pico-train - INFO - Step 43900 -- 🔄 Training Metrics 2025-08-31 06:30:00 - pico-train - INFO - ├── Loss: 4.8905 2025-08-31 06:30:00 - pico-train - INFO - ├── Learning Rate: 1.23e-04 2025-08-31 06:30:00 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-31 06:30:54 - pico-train - INFO - Step 44000 -- 💾 Saving Checkpoint 2025-08-31 06:32:59 - pico-train - INFO - Step 44000 -- 📊 Evaluation Results 2025-08-31 06:32:59 - pico-train - INFO - └── paloma: inf 2025-08-31 06:32:59 - pico-train - INFO - Step 44000 -- 🔄 Training Metrics 2025-08-31 06:32:59 - pico-train - INFO - ├── Loss: 4.9013 2025-08-31 06:32:59 - pico-train - INFO - ├── Learning Rate: 1.22e-04 2025-08-31 06:32:59 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-31 06:32:59 - pico-train - INFO - Step 44000 -- 📈 Saving Learning Dynamics 2025-08-31 06:33:56 - pico-train - INFO - Step 44100 -- 🔄 Training Metrics 2025-08-31 06:33:56 - pico-train - INFO - ├── Loss: 4.8915 2025-08-31 06:33:56 - pico-train - INFO - ├── Learning Rate: 1.22e-04 2025-08-31 06:33:56 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-31 06:34:50 - pico-train - INFO - Step 44200 -- 🔄 Training Metrics 2025-08-31 06:34:50 - pico-train - INFO - ├── Loss: 4.8805 2025-08-31 06:34:50 - pico-train - INFO - ├── Learning Rate: 1.22e-04 2025-08-31 06:34:50 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-31 06:35:45 - pico-train - INFO - Step 
44300 -- 🔄 Training Metrics 2025-08-31 06:35:45 - pico-train - INFO - ├── Loss: 4.8928 2025-08-31 06:35:45 - pico-train - INFO - ├── Learning Rate: 1.21e-04 2025-08-31 06:35:45 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-31 06:36:39 - pico-train - INFO - Step 44400 -- 🔄 Training Metrics 2025-08-31 06:36:39 - pico-train - INFO - ├── Loss: 4.8799 2025-08-31 06:36:39 - pico-train - INFO - ├── Learning Rate: 1.21e-04 2025-08-31 06:36:39 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-31 06:37:33 - pico-train - INFO - Step 44500 -- 🔄 Training Metrics 2025-08-31 06:37:33 - pico-train - INFO - ├── Loss: 4.9167 2025-08-31 06:37:33 - pico-train - INFO - ├── Learning Rate: 1.21e-04 2025-08-31 06:37:33 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-31 06:38:29 - pico-train - INFO - Step 44600 -- 🔄 Training Metrics 2025-08-31 06:38:29 - pico-train - INFO - ├── Loss: 4.8424 2025-08-31 06:38:29 - pico-train - INFO - ├── Learning Rate: 1.20e-04 2025-08-31 06:38:29 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-31 06:39:24 - pico-train - INFO - Step 44700 -- 🔄 Training Metrics 2025-08-31 06:39:24 - pico-train - INFO - ├── Loss: 4.8779 2025-08-31 06:39:24 - pico-train - INFO - ├── Learning Rate: 1.20e-04 2025-08-31 06:39:24 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-31 06:40:21 - pico-train - INFO - Step 44800 -- 🔄 Training Metrics 2025-08-31 06:40:21 - pico-train - INFO - ├── Loss: 4.9088 2025-08-31 06:40:21 - pico-train - INFO - ├── Learning Rate: 1.20e-04 2025-08-31 06:40:21 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-31 06:41:15 - pico-train - INFO - Step 44900 -- 🔄 Training Metrics 2025-08-31 06:41:15 - pico-train - INFO - ├── Loss: 4.9030 2025-08-31 06:41:15 - pico-train - INFO - ├── Learning Rate: 1.19e-04 2025-08-31 06:41:15 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-31 06:42:09 - pico-train - INFO - Step 45000 -- 🔄 Training Metrics 2025-08-31 06:42:09 - pico-train - INFO - ├── Loss: 4.8993 2025-08-31 06:42:09 - pico-train - INFO - 
├── Learning Rate: 1.19e-04 2025-08-31 06:42:09 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-31 06:43:03 - pico-train - INFO - Step 45100 -- 🔄 Training Metrics 2025-08-31 06:43:03 - pico-train - INFO - ├── Loss: 4.8970 2025-08-31 06:43:03 - pico-train - INFO - ├── Learning Rate: 1.19e-04 2025-08-31 06:43:03 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-31 06:43:57 - pico-train - INFO - Step 45200 -- 🔄 Training Metrics 2025-08-31 06:43:57 - pico-train - INFO - ├── Loss: 4.8831 2025-08-31 06:43:57 - pico-train - INFO - ├── Learning Rate: 1.18e-04 2025-08-31 06:43:57 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-31 06:44:51 - pico-train - INFO - Step 45300 -- 🔄 Training Metrics 2025-08-31 06:44:51 - pico-train - INFO - ├── Loss: 4.8729 2025-08-31 06:44:51 - pico-train - INFO - ├── Learning Rate: 1.18e-04 2025-08-31 06:44:51 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-31 06:45:45 - pico-train - INFO - Step 45400 -- 🔄 Training Metrics 2025-08-31 06:45:45 - pico-train - INFO - ├── Loss: 4.8859 2025-08-31 06:45:45 - pico-train - INFO - ├── Learning Rate: 1.18e-04 2025-08-31 06:45:45 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-31 06:46:39 - pico-train - INFO - Step 45500 -- 🔄 Training Metrics 2025-08-31 06:46:39 - pico-train - INFO - ├── Loss: 4.9129 2025-08-31 06:46:39 - pico-train - INFO - ├── Learning Rate: 1.18e-04 2025-08-31 06:46:39 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-31 06:47:34 - pico-train - INFO - Step 45600 -- 🔄 Training Metrics 2025-08-31 06:47:34 - pico-train - INFO - ├── Loss: 4.8550 2025-08-31 06:47:34 - pico-train - INFO - ├── Learning Rate: 1.17e-04 2025-08-31 06:47:34 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-31 06:48:28 - pico-train - INFO - Step 45700 -- 🔄 Training Metrics 2025-08-31 06:48:28 - pico-train - INFO - ├── Loss: 4.8901 2025-08-31 06:48:28 - pico-train - INFO - ├── Learning Rate: 1.17e-04 2025-08-31 06:48:28 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-31 06:49:22 - pico-train - INFO 
- Step 45800 -- 🔄 Training Metrics 2025-08-31 06:49:22 - pico-train - INFO - ├── Loss: 4.8900 2025-08-31 06:49:22 - pico-train - INFO - ├── Learning Rate: 1.17e-04 2025-08-31 06:49:22 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-31 06:50:17 - pico-train - INFO - Step 45900 -- 🔄 Training Metrics 2025-08-31 06:50:17 - pico-train - INFO - ├── Loss: 4.8725 2025-08-31 06:50:17 - pico-train - INFO - ├── Learning Rate: 1.16e-04 2025-08-31 06:50:17 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-31 06:51:12 - pico-train - INFO - Step 46000 -- 💾 Saving Checkpoint 2025-08-31 06:53:09 - pico-train - INFO - Step 46000 -- 📊 Evaluation Results 2025-08-31 06:53:09 - pico-train - INFO - └── paloma: inf 2025-08-31 06:53:10 - pico-train - INFO - Step 46000 -- 🔄 Training Metrics 2025-08-31 06:53:10 - pico-train - INFO - ├── Loss: 4.8772 2025-08-31 06:53:10 - pico-train - INFO - ├── Learning Rate: 1.16e-04 2025-08-31 06:53:10 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-31 06:53:10 - pico-train - INFO - Step 46000 -- 📈 Saving Learning Dynamics 2025-08-31 06:54:09 - pico-train - INFO - Step 46100 -- 🔄 Training Metrics 2025-08-31 06:54:09 - pico-train - INFO - ├── Loss: 4.8649 2025-08-31 06:54:09 - pico-train - INFO - ├── Learning Rate: 1.16e-04 2025-08-31 06:54:09 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-31 06:55:03 - pico-train - INFO - Step 46200 -- 🔄 Training Metrics 2025-08-31 06:55:03 - pico-train - INFO - ├── Loss: 4.8980 2025-08-31 06:55:03 - pico-train - INFO - ├── Learning Rate: 1.15e-04 2025-08-31 06:55:03 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-31 06:55:59 - pico-train - INFO - Step 46300 -- 🔄 Training Metrics 2025-08-31 06:55:59 - pico-train - INFO - ├── Loss: 4.8867 2025-08-31 06:55:59 - pico-train - INFO - ├── Learning Rate: 1.15e-04 2025-08-31 06:55:59 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-31 06:56:54 - pico-train - INFO - Step 46400 -- 🔄 Training Metrics 2025-08-31 06:56:54 - pico-train - INFO - ├── Loss: 4.8807 
2025-08-31 06:56:54 - pico-train - INFO - ├── Learning Rate: 1.15e-04 2025-08-31 06:56:54 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-31 06:57:49 - pico-train - INFO - Step 46500 -- 🔄 Training Metrics 2025-08-31 06:57:49 - pico-train - INFO - ├── Loss: 4.8779 2025-08-31 06:57:49 - pico-train - INFO - ├── Learning Rate: 1.14e-04 2025-08-31 06:57:49 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-31 06:58:45 - pico-train - INFO - Step 46600 -- 🔄 Training Metrics 2025-08-31 06:58:45 - pico-train - INFO - ├── Loss: 4.8908 2025-08-31 06:58:45 - pico-train - INFO - ├── Learning Rate: 1.14e-04 2025-08-31 06:58:45 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-31 06:59:40 - pico-train - INFO - Step 46700 -- 🔄 Training Metrics 2025-08-31 06:59:40 - pico-train - INFO - ├── Loss: 4.8882 2025-08-31 06:59:40 - pico-train - INFO - ├── Learning Rate: 1.14e-04 2025-08-31 06:59:40 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-31 07:00:35 - pico-train - INFO - Step 46800 -- 🔄 Training Metrics 2025-08-31 07:00:35 - pico-train - INFO - ├── Loss: 4.8877 2025-08-31 07:00:35 - pico-train - INFO - ├── Learning Rate: 1.13e-04 2025-08-31 07:00:35 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-31 07:01:31 - pico-train - INFO - Step 46900 -- 🔄 Training Metrics 2025-08-31 07:01:31 - pico-train - INFO - ├── Loss: 4.8686 2025-08-31 07:01:31 - pico-train - INFO - ├── Learning Rate: 1.13e-04 2025-08-31 07:01:31 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-31 07:02:26 - pico-train - INFO - Step 47000 -- 🔄 Training Metrics 2025-08-31 07:02:26 - pico-train - INFO - ├── Loss: 4.8701 2025-08-31 07:02:26 - pico-train - INFO - ├── Learning Rate: 1.13e-04 2025-08-31 07:02:26 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-31 07:03:22 - pico-train - INFO - Step 47100 -- 🔄 Training Metrics 2025-08-31 07:03:22 - pico-train - INFO - ├── Loss: 4.8670 2025-08-31 07:03:22 - pico-train - INFO - ├── Learning Rate: 1.12e-04 2025-08-31 07:03:22 - pico-train - INFO - └── Inf/NaN count: 
0 2025-08-31 07:04:16 - pico-train - INFO - Step 47200 -- 🔄 Training Metrics 2025-08-31 07:04:16 - pico-train - INFO - ├── Loss: 4.8849 2025-08-31 07:04:16 - pico-train - INFO - ├── Learning Rate: 1.12e-04 2025-08-31 07:04:16 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-31 07:05:12 - pico-train - INFO - Step 47300 -- 🔄 Training Metrics 2025-08-31 07:05:12 - pico-train - INFO - ├── Loss: 4.8665 2025-08-31 07:05:12 - pico-train - INFO - ├── Learning Rate: 1.12e-04 2025-08-31 07:05:12 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-31 07:06:08 - pico-train - INFO - Step 47400 -- 🔄 Training Metrics 2025-08-31 07:06:08 - pico-train - INFO - ├── Loss: 4.8595 2025-08-31 07:06:08 - pico-train - INFO - ├── Learning Rate: 1.12e-04 2025-08-31 07:06:08 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-31 07:07:04 - pico-train - INFO - Step 47500 -- 🔄 Training Metrics 2025-08-31 07:07:04 - pico-train - INFO - ├── Loss: 4.8680 2025-08-31 07:07:04 - pico-train - INFO - ├── Learning Rate: 1.11e-04 2025-08-31 07:07:04 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-31 07:07:58 - pico-train - INFO - Step 47600 -- 🔄 Training Metrics 2025-08-31 07:07:58 - pico-train - INFO - ├── Loss: 4.8867 2025-08-31 07:07:58 - pico-train - INFO - ├── Learning Rate: 1.11e-04 2025-08-31 07:07:58 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-31 07:08:54 - pico-train - INFO - Step 47700 -- 🔄 Training Metrics 2025-08-31 07:08:54 - pico-train - INFO - ├── Loss: 4.8761 2025-08-31 07:08:54 - pico-train - INFO - ├── Learning Rate: 1.11e-04 2025-08-31 07:08:54 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-31 07:09:49 - pico-train - INFO - Step 47800 -- 🔄 Training Metrics 2025-08-31 07:09:49 - pico-train - INFO - ├── Loss: 4.8965 2025-08-31 07:09:49 - pico-train - INFO - ├── Learning Rate: 1.10e-04 2025-08-31 07:09:49 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-31 07:10:45 - pico-train - INFO - Step 47900 -- 🔄 Training Metrics 2025-08-31 07:10:45 - pico-train - INFO - ├── Loss: 
4.8890 2025-08-31 07:10:45 - pico-train - INFO - ├── Learning Rate: 1.10e-04 2025-08-31 07:10:45 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-31 07:11:39 - pico-train - INFO - Step 48000 -- 💾 Saving Checkpoint 2025-08-31 07:13:30 - pico-train - INFO - Step 48000 -- 📊 Evaluation Results 2025-08-31 07:13:30 - pico-train - INFO - └── paloma: inf 2025-08-31 07:13:31 - pico-train - INFO - Step 48000 -- 🔄 Training Metrics 2025-08-31 07:13:31 - pico-train - INFO - ├── Loss: 4.8801 2025-08-31 07:13:31 - pico-train - INFO - ├── Learning Rate: 1.10e-04 2025-08-31 07:13:31 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-31 07:13:31 - pico-train - INFO - Step 48000 -- 📈 Saving Learning Dynamics 2025-08-31 07:14:27 - pico-train - INFO - Step 48100 -- 🔄 Training Metrics 2025-08-31 07:14:27 - pico-train - INFO - ├── Loss: 4.8751 2025-08-31 07:14:27 - pico-train - INFO - ├── Learning Rate: 1.09e-04 2025-08-31 07:14:27 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-31 07:15:21 - pico-train - INFO - Step 48200 -- 🔄 Training Metrics 2025-08-31 07:15:21 - pico-train - INFO - ├── Loss: 4.8703 2025-08-31 07:15:21 - pico-train - INFO - ├── Learning Rate: 1.09e-04 2025-08-31 07:15:21 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-31 07:16:16 - pico-train - INFO - Step 48300 -- 🔄 Training Metrics 2025-08-31 07:16:16 - pico-train - INFO - ├── Loss: 4.8749 2025-08-31 07:16:16 - pico-train - INFO - ├── Learning Rate: 1.09e-04 2025-08-31 07:16:16 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-31 07:17:10 - pico-train - INFO - Step 48400 -- 🔄 Training Metrics 2025-08-31 07:17:10 - pico-train - INFO - ├── Loss: 4.8782 2025-08-31 07:17:10 - pico-train - INFO - ├── Learning Rate: 1.08e-04 2025-08-31 07:17:10 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-31 07:18:04 - pico-train - INFO - Step 48500 -- 🔄 Training Metrics 2025-08-31 07:18:04 - pico-train - INFO - ├── Loss: 4.8676 2025-08-31 07:18:04 - pico-train - INFO - ├── Learning Rate: 1.08e-04 2025-08-31 07:18:04 - 
pico-train - INFO - └── Inf/NaN count: 0 2025-08-31 07:18:58 - pico-train - INFO - Step 48600 -- 🔄 Training Metrics 2025-08-31 07:18:58 - pico-train - INFO - ├── Loss: 4.8610 2025-08-31 07:18:58 - pico-train - INFO - ├── Learning Rate: 1.08e-04 2025-08-31 07:18:58 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-31 07:19:53 - pico-train - INFO - Step 48700 -- 🔄 Training Metrics 2025-08-31 07:19:53 - pico-train - INFO - ├── Loss: 4.8672 2025-08-31 07:19:53 - pico-train - INFO - ├── Learning Rate: 1.07e-04 2025-08-31 07:19:53 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-31 07:20:48 - pico-train - INFO - Step 48800 -- 🔄 Training Metrics 2025-08-31 07:20:48 - pico-train - INFO - ├── Loss: 4.8771 2025-08-31 07:20:48 - pico-train - INFO - ├── Learning Rate: 1.07e-04 2025-08-31 07:20:48 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-31 07:21:42 - pico-train - INFO - Step 48900 -- 🔄 Training Metrics 2025-08-31 07:21:42 - pico-train - INFO - ├── Loss: 4.8739 2025-08-31 07:21:42 - pico-train - INFO - ├── Learning Rate: 1.07e-04 2025-08-31 07:21:42 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-31 07:22:36 - pico-train - INFO - Step 49000 -- 🔄 Training Metrics 2025-08-31 07:22:36 - pico-train - INFO - ├── Loss: 4.8754 2025-08-31 07:22:36 - pico-train - INFO - ├── Learning Rate: 1.06e-04 2025-08-31 07:22:36 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-31 07:23:30 - pico-train - INFO - Step 49100 -- 🔄 Training Metrics 2025-08-31 07:23:30 - pico-train - INFO - ├── Loss: 4.8521 2025-08-31 07:23:30 - pico-train - INFO - ├── Learning Rate: 1.06e-04 2025-08-31 07:23:30 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-31 07:24:24 - pico-train - INFO - Step 49200 -- 🔄 Training Metrics 2025-08-31 07:24:24 - pico-train - INFO - ├── Loss: 4.8698 2025-08-31 07:24:24 - pico-train - INFO - ├── Learning Rate: 1.06e-04 2025-08-31 07:24:24 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-31 07:25:19 - pico-train - INFO - Step 49300 -- 🔄 Training Metrics 2025-08-31 
07:25:19 - pico-train - INFO - ├── Loss: 4.8818 2025-08-31 07:25:19 - pico-train - INFO - ├── Learning Rate: 1.05e-04 2025-08-31 07:25:19 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-31 07:26:13 - pico-train - INFO - Step 49400 -- 🔄 Training Metrics 2025-08-31 07:26:13 - pico-train - INFO - ├── Loss: 4.8477 2025-08-31 07:26:13 - pico-train - INFO - ├── Learning Rate: 1.05e-04 2025-08-31 07:26:13 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-31 07:27:07 - pico-train - INFO - Step 49500 -- 🔄 Training Metrics 2025-08-31 07:27:07 - pico-train - INFO - ├── Loss: 4.8804 2025-08-31 07:27:07 - pico-train - INFO - ├── Learning Rate: 1.05e-04 2025-08-31 07:27:07 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-31 07:28:01 - pico-train - INFO - Step 49600 -- 🔄 Training Metrics 2025-08-31 07:28:01 - pico-train - INFO - ├── Loss: 4.8608 2025-08-31 07:28:01 - pico-train - INFO - ├── Learning Rate: 1.04e-04 2025-08-31 07:28:01 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-31 07:28:55 - pico-train - INFO - Step 49700 -- 🔄 Training Metrics 2025-08-31 07:28:55 - pico-train - INFO - ├── Loss: 4.8419 2025-08-31 07:28:55 - pico-train - INFO - ├── Learning Rate: 1.04e-04 2025-08-31 07:28:55 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-31 07:29:49 - pico-train - INFO - Step 49800 -- 🔄 Training Metrics 2025-08-31 07:29:49 - pico-train - INFO - ├── Loss: 4.8767 2025-08-31 07:29:49 - pico-train - INFO - ├── Learning Rate: 1.04e-04 2025-08-31 07:29:49 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-31 07:30:44 - pico-train - INFO - Step 49900 -- 🔄 Training Metrics 2025-08-31 07:30:44 - pico-train - INFO - ├── Loss: 4.8593 2025-08-31 07:30:44 - pico-train - INFO - ├── Learning Rate: 1.04e-04 2025-08-31 07:30:44 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-31 07:31:37 - pico-train - INFO - Step 50000 -- 💾 Saving Checkpoint 2025-08-31 07:33:41 - pico-train - INFO - Step 50000 -- 📊 Evaluation Results 2025-08-31 07:33:41 - pico-train - INFO - └── paloma: inf 
2025-08-31 07:33:42 - pico-train - INFO - Step 50000 -- 🔄 Training Metrics 2025-08-31 07:33:42 - pico-train - INFO - ├── Loss: 4.8786 2025-08-31 07:33:42 - pico-train - INFO - ├── Learning Rate: 1.03e-04 2025-08-31 07:33:42 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-31 07:33:42 - pico-train - INFO - Step 50000 -- 📈 Saving Learning Dynamics 2025-08-31 07:34:38 - pico-train - INFO - Step 50100 -- 🔄 Training Metrics 2025-08-31 07:34:38 - pico-train - INFO - ├── Loss: 4.8819 2025-08-31 07:34:38 - pico-train - INFO - ├── Learning Rate: 1.03e-04 2025-08-31 07:34:38 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-31 07:35:32 - pico-train - INFO - Step 50200 -- 🔄 Training Metrics 2025-08-31 07:35:32 - pico-train - INFO - ├── Loss: 4.8675 2025-08-31 07:35:32 - pico-train - INFO - ├── Learning Rate: 1.03e-04 2025-08-31 07:35:32 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-31 07:36:26 - pico-train - INFO - Step 50300 -- 🔄 Training Metrics 2025-08-31 07:36:26 - pico-train - INFO - ├── Loss: 4.8522 2025-08-31 07:36:26 - pico-train - INFO - ├── Learning Rate: 1.02e-04 2025-08-31 07:36:26 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-31 07:37:20 - pico-train - INFO - Step 50400 -- 🔄 Training Metrics 2025-08-31 07:37:20 - pico-train - INFO - ├── Loss: 4.8488 2025-08-31 07:37:20 - pico-train - INFO - ├── Learning Rate: 1.02e-04 2025-08-31 07:37:20 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-31 07:38:14 - pico-train - INFO - Step 50500 -- 🔄 Training Metrics 2025-08-31 07:38:14 - pico-train - INFO - ├── Loss: 4.8619 2025-08-31 07:38:14 - pico-train - INFO - ├── Learning Rate: 1.02e-04 2025-08-31 07:38:14 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-31 07:39:08 - pico-train - INFO - Step 50600 -- 🔄 Training Metrics 2025-08-31 07:39:08 - pico-train - INFO - ├── Loss: 4.8661 2025-08-31 07:39:08 - pico-train - INFO - ├── Learning Rate: 1.01e-04 2025-08-31 07:39:08 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-31 07:40:02 - pico-train - INFO - Step 
50700 -- 🔄 Training Metrics 2025-08-31 07:40:02 - pico-train - INFO - ├── Loss: 4.8595 2025-08-31 07:40:02 - pico-train - INFO - ├── Learning Rate: 1.01e-04 2025-08-31 07:40:02 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-31 07:40:56 - pico-train - INFO - Step 50800 -- 🔄 Training Metrics 2025-08-31 07:40:56 - pico-train - INFO - ├── Loss: 4.8543 2025-08-31 07:40:56 - pico-train - INFO - ├── Learning Rate: 1.01e-04 2025-08-31 07:40:56 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-31 07:41:50 - pico-train - INFO - Step 50900 -- 🔄 Training Metrics 2025-08-31 07:41:50 - pico-train - INFO - ├── Loss: 4.8711 2025-08-31 07:41:50 - pico-train - INFO - ├── Learning Rate: 1.00e-04 2025-08-31 07:41:50 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-31 07:42:44 - pico-train - INFO - Step 51000 -- 🔄 Training Metrics 2025-08-31 07:42:44 - pico-train - INFO - ├── Loss: 4.8639 2025-08-31 07:42:44 - pico-train - INFO - ├── Learning Rate: 1.00e-04 2025-08-31 07:42:44 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-31 07:43:38 - pico-train - INFO - Step 51100 -- 🔄 Training Metrics 2025-08-31 07:43:38 - pico-train - INFO - ├── Loss: 4.8527 2025-08-31 07:43:38 - pico-train - INFO - ├── Learning Rate: 9.97e-05 2025-08-31 07:43:38 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-31 07:44:33 - pico-train - INFO - Step 51200 -- 🔄 Training Metrics 2025-08-31 07:44:33 - pico-train - INFO - ├── Loss: 4.8604 2025-08-31 07:44:33 - pico-train - INFO - ├── Learning Rate: 9.94e-05 2025-08-31 07:44:33 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-31 07:45:27 - pico-train - INFO - Step 51300 -- 🔄 Training Metrics 2025-08-31 07:45:27 - pico-train - INFO - ├── Loss: 4.8577 2025-08-31 07:45:27 - pico-train - INFO - ├── Learning Rate: 9.90e-05 2025-08-31 07:45:27 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-31 07:46:21 - pico-train - INFO - Step 51400 -- 🔄 Training Metrics 2025-08-31 07:46:21 - pico-train - INFO - ├── Loss: 4.8589 2025-08-31 07:46:21 - pico-train - INFO - 
├── Learning Rate: 9.87e-05 2025-08-31 07:46:21 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-31 07:47:15 - pico-train - INFO - Step 51500 -- 🔄 Training Metrics 2025-08-31 07:47:15 - pico-train - INFO - ├── Loss: 4.8714 2025-08-31 07:47:15 - pico-train - INFO - ├── Learning Rate: 9.84e-05 2025-08-31 07:47:15 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-31 07:48:10 - pico-train - INFO - Step 51600 -- 🔄 Training Metrics 2025-08-31 07:48:10 - pico-train - INFO - ├── Loss: 4.8822 2025-08-31 07:48:10 - pico-train - INFO - ├── Learning Rate: 9.81e-05 2025-08-31 07:48:10 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-31 07:49:04 - pico-train - INFO - Step 51700 -- 🔄 Training Metrics 2025-08-31 07:49:04 - pico-train - INFO - ├── Loss: 4.8642 2025-08-31 07:49:04 - pico-train - INFO - ├── Learning Rate: 9.78e-05 2025-08-31 07:49:04 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-31 07:49:57 - pico-train - INFO - Step 51800 -- 🔄 Training Metrics 2025-08-31 07:49:57 - pico-train - INFO - ├── Loss: 4.8631 2025-08-31 07:49:57 - pico-train - INFO - ├── Learning Rate: 9.74e-05 2025-08-31 07:49:57 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-31 07:50:52 - pico-train - INFO - Step 51900 -- 🔄 Training Metrics 2025-08-31 07:50:52 - pico-train - INFO - ├── Loss: 4.8383 2025-08-31 07:50:52 - pico-train - INFO - ├── Learning Rate: 9.71e-05 2025-08-31 07:50:52 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-31 07:51:46 - pico-train - INFO - Step 52000 -- 💾 Saving Checkpoint 2025-08-31 07:53:35 - pico-train - INFO - Step 52000 -- 📊 Evaluation Results 2025-08-31 07:53:35 - pico-train - INFO - └── paloma: inf 2025-08-31 07:53:36 - pico-train - INFO - Step 52000 -- 🔄 Training Metrics 2025-08-31 07:53:36 - pico-train - INFO - ├── Loss: 4.8496 2025-08-31 07:53:36 - pico-train - INFO - ├── Learning Rate: 9.68e-05 2025-08-31 07:53:36 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-31 07:53:36 - pico-train - INFO - Step 52000 -- 📈 Saving Learning Dynamics 2025-08-31 
07:54:33 - pico-train - INFO - Step 52100 -- 🔄 Training Metrics 2025-08-31 07:54:33 - pico-train - INFO - ├── Loss: 4.8515 2025-08-31 07:54:33 - pico-train - INFO - ├── Learning Rate: 9.65e-05 2025-08-31 07:54:33 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-31 07:55:27 - pico-train - INFO - Step 52200 -- 🔄 Training Metrics 2025-08-31 07:55:27 - pico-train - INFO - ├── Loss: 4.8598 2025-08-31 07:55:27 - pico-train - INFO - ├── Learning Rate: 9.62e-05 2025-08-31 07:55:27 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-31 07:56:21 - pico-train - INFO - Step 52300 -- 🔄 Training Metrics 2025-08-31 07:56:21 - pico-train - INFO - ├── Loss: 4.8696 2025-08-31 07:56:21 - pico-train - INFO - ├── Learning Rate: 9.58e-05 2025-08-31 07:56:21 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-31 07:57:15 - pico-train - INFO - Step 52400 -- 🔄 Training Metrics 2025-08-31 07:57:15 - pico-train - INFO - ├── Loss: 4.8359 2025-08-31 07:57:15 - pico-train - INFO - ├── Learning Rate: 9.55e-05 2025-08-31 07:57:15 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-31 07:58:11 - pico-train - INFO - Step 52500 -- 🔄 Training Metrics 2025-08-31 07:58:11 - pico-train - INFO - ├── Loss: 4.8444 2025-08-31 07:58:11 - pico-train - INFO - ├── Learning Rate: 9.52e-05 2025-08-31 07:58:11 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-31 07:59:05 - pico-train - INFO - Step 52600 -- 🔄 Training Metrics 2025-08-31 07:59:05 - pico-train - INFO - ├── Loss: 4.8626 2025-08-31 07:59:05 - pico-train - INFO - ├── Learning Rate: 9.49e-05 2025-08-31 07:59:05 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-31 08:00:01 - pico-train - INFO - Step 52700 -- 🔄 Training Metrics 2025-08-31 08:00:01 - pico-train - INFO - ├── Loss: 4.8555 2025-08-31 08:00:01 - pico-train - INFO - ├── Learning Rate: 9.46e-05 2025-08-31 08:00:01 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-31 08:00:56 - pico-train - INFO - Step 52800 -- 🔄 Training Metrics 2025-08-31 08:00:56 - pico-train - INFO - ├── Loss: 4.8361 
2025-08-31 08:00:56 - pico-train - INFO - ├── Learning Rate: 9.42e-05 2025-08-31 08:00:56 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-31 08:01:51 - pico-train - INFO - Step 52900 -- 🔄 Training Metrics 2025-08-31 08:01:51 - pico-train - INFO - ├── Loss: 4.8518 2025-08-31 08:01:51 - pico-train - INFO - ├── Learning Rate: 9.39e-05 2025-08-31 08:01:51 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-31 08:02:47 - pico-train - INFO - Step 53000 -- 🔄 Training Metrics 2025-08-31 08:02:47 - pico-train - INFO - ├── Loss: 4.8508 2025-08-31 08:02:47 - pico-train - INFO - ├── Learning Rate: 9.36e-05 2025-08-31 08:02:47 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-31 08:03:41 - pico-train - INFO - Step 53100 -- 🔄 Training Metrics 2025-08-31 08:03:41 - pico-train - INFO - ├── Loss: 4.8585 2025-08-31 08:03:41 - pico-train - INFO - ├── Learning Rate: 9.33e-05 2025-08-31 08:03:41 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-31 08:04:37 - pico-train - INFO - Step 53200 -- 🔄 Training Metrics 2025-08-31 08:04:37 - pico-train - INFO - ├── Loss: 4.8520 2025-08-31 08:04:37 - pico-train - INFO - ├── Learning Rate: 9.30e-05 2025-08-31 08:04:37 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-31 08:05:32 - pico-train - INFO - Step 53300 -- 🔄 Training Metrics 2025-08-31 08:05:32 - pico-train - INFO - ├── Loss: 4.8462 2025-08-31 08:05:32 - pico-train - INFO - ├── Learning Rate: 9.26e-05 2025-08-31 08:05:32 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-31 08:06:28 - pico-train - INFO - Step 53400 -- 🔄 Training Metrics 2025-08-31 08:06:28 - pico-train - INFO - ├── Loss: 4.8443 2025-08-31 08:06:28 - pico-train - INFO - ├── Learning Rate: 9.23e-05 2025-08-31 08:06:28 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-31 08:07:23 - pico-train - INFO - Step 53500 -- 🔄 Training Metrics 2025-08-31 08:07:23 - pico-train - INFO - ├── Loss: 4.8567 2025-08-31 08:07:23 - pico-train - INFO - ├── Learning Rate: 9.20e-05 2025-08-31 08:07:23 - pico-train - INFO - └── Inf/NaN count: 
0 2025-08-31 08:08:18 - pico-train - INFO - Step 53600 -- 🔄 Training Metrics 2025-08-31 08:08:18 - pico-train - INFO - ├── Loss: 4.8256 2025-08-31 08:08:18 - pico-train - INFO - ├── Learning Rate: 9.17e-05 2025-08-31 08:08:18 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-31 08:09:13 - pico-train - INFO - Step 53700 -- 🔄 Training Metrics 2025-08-31 08:09:13 - pico-train - INFO - ├── Loss: 4.8237 2025-08-31 08:09:13 - pico-train - INFO - ├── Learning Rate: 9.14e-05 2025-08-31 08:09:13 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-31 08:10:10 - pico-train - INFO - Step 53800 -- 🔄 Training Metrics 2025-08-31 08:10:10 - pico-train - INFO - ├── Loss: 4.8332 2025-08-31 08:10:10 - pico-train - INFO - ├── Learning Rate: 9.10e-05 2025-08-31 08:10:10 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-31 08:11:05 - pico-train - INFO - Step 53900 -- 🔄 Training Metrics 2025-08-31 08:11:05 - pico-train - INFO - ├── Loss: 4.8663 2025-08-31 08:11:05 - pico-train - INFO - ├── Learning Rate: 9.07e-05 2025-08-31 08:11:05 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-31 08:11:59 - pico-train - INFO - Step 54000 -- 💾 Saving Checkpoint 2025-08-31 08:14:03 - pico-train - INFO - Step 54000 -- 📊 Evaluation Results 2025-08-31 08:14:03 - pico-train - INFO - └── paloma: inf 2025-08-31 08:14:04 - pico-train - INFO - Step 54000 -- 🔄 Training Metrics 2025-08-31 08:14:04 - pico-train - INFO - ├── Loss: 4.8621 2025-08-31 08:14:04 - pico-train - INFO - ├── Learning Rate: 9.04e-05 2025-08-31 08:14:04 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-31 08:14:04 - pico-train - INFO - Step 54000 -- 📈 Saving Learning Dynamics 2025-08-31 08:15:01 - pico-train - INFO - Step 54100 -- 🔄 Training Metrics 2025-08-31 08:15:01 - pico-train - INFO - ├── Loss: 4.8374 2025-08-31 08:15:01 - pico-train - INFO - ├── Learning Rate: 9.01e-05 2025-08-31 08:15:01 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-31 08:15:55 - pico-train - INFO - Step 54200 -- 🔄 Training Metrics 2025-08-31 08:15:55 - 
pico-train - INFO - ├── Loss: 4.8494 2025-08-31 08:15:55 - pico-train - INFO - ├── Learning Rate: 8.98e-05 2025-08-31 08:15:55 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-31 08:16:50 - pico-train - INFO - Step 54300 -- 🔄 Training Metrics 2025-08-31 08:16:50 - pico-train - INFO - ├── Loss: 4.8265 2025-08-31 08:16:50 - pico-train - INFO - ├── Learning Rate: 8.94e-05 2025-08-31 08:16:50 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-31 08:17:45 - pico-train - INFO - Step 54400 -- 🔄 Training Metrics 2025-08-31 08:17:45 - pico-train - INFO - ├── Loss: 4.8626 2025-08-31 08:17:45 - pico-train - INFO - ├── Learning Rate: 8.91e-05 2025-08-31 08:17:45 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-31 08:18:40 - pico-train - INFO - Step 54500 -- 🔄 Training Metrics 2025-08-31 08:18:40 - pico-train - INFO - ├── Loss: 4.8459 2025-08-31 08:18:40 - pico-train - INFO - ├── Learning Rate: 8.88e-05 2025-08-31 08:18:40 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-31 08:19:35 - pico-train - INFO - Step 54600 -- 🔄 Training Metrics 2025-08-31 08:19:35 - pico-train - INFO - ├── Loss: 4.8332 2025-08-31 08:19:35 - pico-train - INFO - ├── Learning Rate: 8.85e-05 2025-08-31 08:19:35 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-31 08:20:30 - pico-train - INFO - Step 54700 -- 🔄 Training Metrics 2025-08-31 08:20:30 - pico-train - INFO - ├── Loss: 4.8507 2025-08-31 08:20:30 - pico-train - INFO - ├── Learning Rate: 8.82e-05 2025-08-31 08:20:30 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-31 08:21:26 - pico-train - INFO - Step 54800 -- 🔄 Training Metrics 2025-08-31 08:21:26 - pico-train - INFO - ├── Loss: 4.8588 2025-08-31 08:21:26 - pico-train - INFO - ├── Learning Rate: 8.78e-05 2025-08-31 08:21:26 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-31 08:22:21 - pico-train - INFO - Step 54900 -- 🔄 Training Metrics 2025-08-31 08:22:21 - pico-train - INFO - ├── Loss: 4.8781 2025-08-31 08:22:21 - pico-train - INFO - ├── Learning Rate: 8.75e-05 2025-08-31 08:22:21 - 
pico-train - INFO - └── Inf/NaN count: 0 2025-08-31 08:23:16 - pico-train - INFO - Step 55000 -- 🔄 Training Metrics 2025-08-31 08:23:16 - pico-train - INFO - ├── Loss: 4.8365 2025-08-31 08:23:16 - pico-train - INFO - ├── Learning Rate: 8.72e-05 2025-08-31 08:23:16 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-31 08:24:12 - pico-train - INFO - Step 55100 -- 🔄 Training Metrics 2025-08-31 08:24:12 - pico-train - INFO - ├── Loss: 4.8395 2025-08-31 08:24:12 - pico-train - INFO - ├── Learning Rate: 8.69e-05 2025-08-31 08:24:12 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-31 08:25:08 - pico-train - INFO - Step 55200 -- 🔄 Training Metrics 2025-08-31 08:25:08 - pico-train - INFO - ├── Loss: 4.8306 2025-08-31 08:25:08 - pico-train - INFO - ├── Learning Rate: 8.66e-05 2025-08-31 08:25:08 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-31 08:26:03 - pico-train - INFO - Step 55300 -- 🔄 Training Metrics 2025-08-31 08:26:03 - pico-train - INFO - ├── Loss: 4.8421 2025-08-31 08:26:03 - pico-train - INFO - ├── Learning Rate: 8.63e-05 2025-08-31 08:26:03 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-31 08:26:58 - pico-train - INFO - Step 55400 -- 🔄 Training Metrics 2025-08-31 08:26:58 - pico-train - INFO - ├── Loss: 4.8538 2025-08-31 08:26:58 - pico-train - INFO - ├── Learning Rate: 8.59e-05 2025-08-31 08:26:58 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-31 08:27:52 - pico-train - INFO - Step 55500 -- 🔄 Training Metrics 2025-08-31 08:27:52 - pico-train - INFO - ├── Loss: 4.8317 2025-08-31 08:27:52 - pico-train - INFO - ├── Learning Rate: 8.56e-05 2025-08-31 08:27:52 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-31 08:28:46 - pico-train - INFO - Step 55600 -- 🔄 Training Metrics 2025-08-31 08:28:46 - pico-train - INFO - ├── Loss: 4.8330 2025-08-31 08:28:46 - pico-train - INFO - ├── Learning Rate: 8.53e-05 2025-08-31 08:28:46 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-31 08:29:40 - pico-train - INFO - Step 55700 -- 🔄 Training Metrics 2025-08-31 
08:29:40 - pico-train - INFO - ├── Loss: 4.8138 2025-08-31 08:29:40 - pico-train - INFO - ├── Learning Rate: 8.50e-05 2025-08-31 08:29:40 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-31 08:30:35 - pico-train - INFO - Step 55800 -- 🔄 Training Metrics 2025-08-31 08:30:35 - pico-train - INFO - ├── Loss: 4.8473 2025-08-31 08:30:35 - pico-train - INFO - ├── Learning Rate: 8.47e-05 2025-08-31 08:30:35 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-31 08:31:28 - pico-train - INFO - Step 55900 -- 🔄 Training Metrics 2025-08-31 08:31:28 - pico-train - INFO - ├── Loss: 4.8469 2025-08-31 08:31:28 - pico-train - INFO - ├── Learning Rate: 8.44e-05 2025-08-31 08:31:28 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-31 08:32:22 - pico-train - INFO - Step 56000 -- 💾 Saving Checkpoint 2025-08-31 08:34:19 - pico-train - INFO - Step 56000 -- 📊 Evaluation Results 2025-08-31 08:34:19 - pico-train - INFO - └── paloma: inf 2025-08-31 08:34:20 - pico-train - INFO - Step 56000 -- 🔄 Training Metrics 2025-08-31 08:34:20 - pico-train - INFO - ├── Loss: 4.8561 2025-08-31 08:34:20 - pico-train - INFO - ├── Learning Rate: 8.40e-05 2025-08-31 08:34:20 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-31 08:34:20 - pico-train - INFO - Step 56000 -- 📈 Saving Learning Dynamics 2025-08-31 08:35:17 - pico-train - INFO - Step 56100 -- 🔄 Training Metrics 2025-08-31 08:35:17 - pico-train - INFO - ├── Loss: 4.8491 2025-08-31 08:35:17 - pico-train - INFO - ├── Learning Rate: 8.37e-05 2025-08-31 08:35:17 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-31 08:36:11 - pico-train - INFO - Step 56200 -- 🔄 Training Metrics 2025-08-31 08:36:11 - pico-train - INFO - ├── Loss: 4.8459 2025-08-31 08:36:11 - pico-train - INFO - ├── Learning Rate: 8.34e-05 2025-08-31 08:36:11 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-31 08:37:06 - pico-train - INFO - Step 56300 -- 🔄 Training Metrics 2025-08-31 08:37:06 - pico-train - INFO - ├── Loss: 4.8345 2025-08-31 08:37:06 - pico-train - INFO - ├── Learning 
Rate: 8.31e-05 2025-08-31 08:37:06 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-31 08:38:01 - pico-train - INFO - Step 56400 -- 🔄 Training Metrics 2025-08-31 08:38:01 - pico-train - INFO - ├── Loss: 4.8510 2025-08-31 08:38:01 - pico-train - INFO - ├── Learning Rate: 8.28e-05 2025-08-31 08:38:01 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-31 08:38:55 - pico-train - INFO - Step 56500 -- 🔄 Training Metrics 2025-08-31 08:38:55 - pico-train - INFO - ├── Loss: 4.8172 2025-08-31 08:38:55 - pico-train - INFO - ├── Learning Rate: 8.25e-05 2025-08-31 08:38:55 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-31 08:39:50 - pico-train - INFO - Step 56600 -- 🔄 Training Metrics 2025-08-31 08:39:50 - pico-train - INFO - ├── Loss: 4.8297 2025-08-31 08:39:50 - pico-train - INFO - ├── Learning Rate: 8.21e-05 2025-08-31 08:39:50 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-31 08:40:44 - pico-train - INFO - Step 56700 -- 🔄 Training Metrics 2025-08-31 08:40:44 - pico-train - INFO - ├── Loss: 4.8399 2025-08-31 08:40:44 - pico-train - INFO - ├── Learning Rate: 8.18e-05 2025-08-31 08:40:44 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-31 08:41:37 - pico-train - INFO - Step 56800 -- 🔄 Training Metrics 2025-08-31 08:41:37 - pico-train - INFO - ├── Loss: 4.8370 2025-08-31 08:41:37 - pico-train - INFO - ├── Learning Rate: 8.15e-05 2025-08-31 08:41:37 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-31 08:42:32 - pico-train - INFO - Step 56900 -- 🔄 Training Metrics 2025-08-31 08:42:32 - pico-train - INFO - ├── Loss: 4.8458 2025-08-31 08:42:32 - pico-train - INFO - ├── Learning Rate: 8.12e-05 2025-08-31 08:42:32 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-31 08:43:26 - pico-train - INFO - Step 57000 -- 🔄 Training Metrics 2025-08-31 08:43:26 - pico-train - INFO - ├── Loss: 4.8466 2025-08-31 08:43:26 - pico-train - INFO - ├── Learning Rate: 8.09e-05 2025-08-31 08:43:26 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-31 08:44:20 - pico-train - INFO - Step 57100 
-- 🔄 Training Metrics 2025-08-31 08:44:20 - pico-train - INFO - ├── Loss: 4.8173 2025-08-31 08:44:20 - pico-train - INFO - ├── Learning Rate: 8.06e-05 2025-08-31 08:44:20 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-31 08:45:14 - pico-train - INFO - Step 57200 -- 🔄 Training Metrics 2025-08-31 08:45:14 - pico-train - INFO - ├── Loss: 4.8302 2025-08-31 08:45:14 - pico-train - INFO - ├── Learning Rate: 8.03e-05 2025-08-31 08:45:14 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-31 08:46:08 - pico-train - INFO - Step 57300 -- 🔄 Training Metrics 2025-08-31 08:46:08 - pico-train - INFO - ├── Loss: 4.8262 2025-08-31 08:46:08 - pico-train - INFO - ├── Learning Rate: 7.99e-05 2025-08-31 08:46:08 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-31 08:47:02 - pico-train - INFO - Step 57400 -- 🔄 Training Metrics 2025-08-31 08:47:02 - pico-train - INFO - ├── Loss: 4.8268 2025-08-31 08:47:02 - pico-train - INFO - ├── Learning Rate: 7.96e-05 2025-08-31 08:47:02 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-31 08:47:57 - pico-train - INFO - Step 57500 -- 🔄 Training Metrics 2025-08-31 08:47:57 - pico-train - INFO - ├── Loss: 4.8415 2025-08-31 08:47:57 - pico-train - INFO - ├── Learning Rate: 7.93e-05 2025-08-31 08:47:57 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-31 08:48:52 - pico-train - INFO - Step 57600 -- 🔄 Training Metrics 2025-08-31 08:48:52 - pico-train - INFO - ├── Loss: 4.8350 2025-08-31 08:48:52 - pico-train - INFO - ├── Learning Rate: 7.90e-05 2025-08-31 08:48:52 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-31 08:49:46 - pico-train - INFO - Step 57700 -- 🔄 Training Metrics 2025-08-31 08:49:46 - pico-train - INFO - ├── Loss: 4.8597 2025-08-31 08:49:46 - pico-train - INFO - ├── Learning Rate: 7.87e-05 2025-08-31 08:49:46 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-31 08:50:40 - pico-train - INFO - Step 57800 -- 🔄 Training Metrics 2025-08-31 08:50:40 - pico-train - INFO - ├── Loss: 4.8310 2025-08-31 08:50:40 - pico-train - INFO - ├── 
Learning Rate: 7.84e-05 2025-08-31 08:50:40 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-31 08:51:35 - pico-train - INFO - Step 57900 -- 🔄 Training Metrics 2025-08-31 08:51:35 - pico-train - INFO - ├── Loss: 4.8333 2025-08-31 08:51:35 - pico-train - INFO - ├── Learning Rate: 7.81e-05 2025-08-31 08:51:35 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-31 08:52:28 - pico-train - INFO - Step 58000 -- 💾 Saving Checkpoint 2025-08-31 08:54:22 - pico-train - INFO - Step 58000 -- 📊 Evaluation Results 2025-08-31 08:54:22 - pico-train - INFO - └── paloma: inf 2025-08-31 08:54:22 - pico-train - INFO - Step 58000 -- 🔄 Training Metrics 2025-08-31 08:54:22 - pico-train - INFO - ├── Loss: 4.8290 2025-08-31 08:54:22 - pico-train - INFO - ├── Learning Rate: 7.77e-05 2025-08-31 08:54:22 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-31 08:54:22 - pico-train - INFO - Step 58000 -- 📈 Saving Learning Dynamics 2025-08-31 08:55:19 - pico-train - INFO - Step 58100 -- 🔄 Training Metrics 2025-08-31 08:55:19 - pico-train - INFO - ├── Loss: 4.8301 2025-08-31 08:55:19 - pico-train - INFO - ├── Learning Rate: 7.74e-05 2025-08-31 08:55:19 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-31 08:56:13 - pico-train - INFO - Step 58200 -- 🔄 Training Metrics 2025-08-31 08:56:13 - pico-train - INFO - ├── Loss: 4.8193 2025-08-31 08:56:13 - pico-train - INFO - ├── Learning Rate: 7.71e-05 2025-08-31 08:56:13 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-31 08:57:08 - pico-train - INFO - Step 58300 -- 🔄 Training Metrics 2025-08-31 08:57:08 - pico-train - INFO - ├── Loss: 4.8361 2025-08-31 08:57:08 - pico-train - INFO - ├── Learning Rate: 7.68e-05 2025-08-31 08:57:08 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-31 08:58:04 - pico-train - INFO - Step 58400 -- 🔄 Training Metrics 2025-08-31 08:58:04 - pico-train - INFO - ├── Loss: 4.8375 2025-08-31 08:58:04 - pico-train - INFO - ├── Learning Rate: 7.65e-05 2025-08-31 08:58:04 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-31 
08:58:59 - pico-train - INFO - Step 58500 -- 🔄 Training Metrics 2025-08-31 08:58:59 - pico-train - INFO - ├── Loss: 4.8183 2025-08-31 08:58:59 - pico-train - INFO - ├── Learning Rate: 7.62e-05 2025-08-31 08:58:59 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-31 08:59:55 - pico-train - INFO - Step 58600 -- 🔄 Training Metrics 2025-08-31 08:59:55 - pico-train - INFO - ├── Loss: 4.8259 2025-08-31 08:59:55 - pico-train - INFO - ├── Learning Rate: 7.59e-05 2025-08-31 08:59:55 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-31 09:00:49 - pico-train - INFO - Step 58700 -- 🔄 Training Metrics 2025-08-31 09:00:49 - pico-train - INFO - ├── Loss: 4.8395 2025-08-31 09:00:49 - pico-train - INFO - ├── Learning Rate: 7.56e-05 2025-08-31 09:00:49 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-31 09:01:45 - pico-train - INFO - Step 58800 -- 🔄 Training Metrics 2025-08-31 09:01:45 - pico-train - INFO - ├── Loss: 4.8104 2025-08-31 09:01:45 - pico-train - INFO - ├── Learning Rate: 7.53e-05 2025-08-31 09:01:45 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-31 09:02:41 - pico-train - INFO - Step 58900 -- 🔄 Training Metrics 2025-08-31 09:02:41 - pico-train - INFO - ├── Loss: 4.8455 2025-08-31 09:02:41 - pico-train - INFO - ├── Learning Rate: 7.49e-05 2025-08-31 09:02:41 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-31 09:03:36 - pico-train - INFO - Step 59000 -- 🔄 Training Metrics 2025-08-31 09:03:36 - pico-train - INFO - ├── Loss: 4.8379 2025-08-31 09:03:36 - pico-train - INFO - ├── Learning Rate: 7.46e-05 2025-08-31 09:03:36 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-31 09:04:32 - pico-train - INFO - Step 59100 -- 🔄 Training Metrics 2025-08-31 09:04:32 - pico-train - INFO - ├── Loss: 4.8267 2025-08-31 09:04:32 - pico-train - INFO - ├── Learning Rate: 7.43e-05 2025-08-31 09:04:32 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-31 09:05:27 - pico-train - INFO - Step 59200 -- 🔄 Training Metrics 2025-08-31 09:05:27 - pico-train - INFO - ├── Loss: 4.8410 
2025-08-31 09:05:27 - pico-train - INFO - ├── Learning Rate: 7.40e-05 2025-08-31 09:05:27 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-31 09:06:22 - pico-train - INFO - Step 59300 -- 🔄 Training Metrics 2025-08-31 09:06:22 - pico-train - INFO - ├── Loss: 4.8488 2025-08-31 09:06:22 - pico-train - INFO - ├── Learning Rate: 7.37e-05 2025-08-31 09:06:22 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-31 09:07:18 - pico-train - INFO - Step 59400 -- 🔄 Training Metrics 2025-08-31 09:07:18 - pico-train - INFO - ├── Loss: 4.8227 2025-08-31 09:07:18 - pico-train - INFO - ├── Learning Rate: 7.34e-05 2025-08-31 09:07:18 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-31 09:08:12 - pico-train - INFO - Step 59500 -- 🔄 Training Metrics 2025-08-31 09:08:12 - pico-train - INFO - ├── Loss: 4.8042 2025-08-31 09:08:12 - pico-train - INFO - ├── Learning Rate: 7.31e-05 2025-08-31 09:08:12 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-31 09:09:08 - pico-train - INFO - Step 59600 -- 🔄 Training Metrics 2025-08-31 09:09:08 - pico-train - INFO - ├── Loss: 4.8441 2025-08-31 09:09:08 - pico-train - INFO - ├── Learning Rate: 7.28e-05 2025-08-31 09:09:08 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-31 09:10:03 - pico-train - INFO - Step 59700 -- 🔄 Training Metrics 2025-08-31 09:10:03 - pico-train - INFO - ├── Loss: 4.8590 2025-08-31 09:10:03 - pico-train - INFO - ├── Learning Rate: 7.25e-05 2025-08-31 09:10:03 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-31 09:10:58 - pico-train - INFO - Step 59800 -- 🔄 Training Metrics 2025-08-31 09:10:58 - pico-train - INFO - ├── Loss: 4.8339 2025-08-31 09:10:58 - pico-train - INFO - ├── Learning Rate: 7.22e-05 2025-08-31 09:10:58 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-31 09:11:54 - pico-train - INFO - Step 59900 -- 🔄 Training Metrics 2025-08-31 09:11:54 - pico-train - INFO - ├── Loss: 4.8216 2025-08-31 09:11:54 - pico-train - INFO - ├── Learning Rate: 7.19e-05 2025-08-31 09:11:54 - pico-train - INFO - └── Inf/NaN count: 
0 2025-08-31 09:12:48 - pico-train - INFO - Step 60000 -- 💾 Saving Checkpoint 2025-08-31 09:14:51 - pico-train - INFO - Step 60000 -- 📊 Evaluation Results 2025-08-31 09:14:51 - pico-train - INFO - └── paloma: inf 2025-08-31 09:14:52 - pico-train - INFO - Step 60000 -- 🔄 Training Metrics 2025-08-31 09:14:52 - pico-train - INFO - ├── Loss: 4.8345 2025-08-31 09:14:52 - pico-train - INFO - ├── Learning Rate: 7.15e-05 2025-08-31 09:14:52 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-31 09:14:52 - pico-train - INFO - Step 60000 -- 📈 Saving Learning Dynamics 2025-08-31 09:15:49 - pico-train - INFO - Step 60100 -- 🔄 Training Metrics 2025-08-31 09:15:49 - pico-train - INFO - ├── Loss: 4.8207 2025-08-31 09:15:49 - pico-train - INFO - ├── Learning Rate: 7.12e-05 2025-08-31 09:15:49 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-31 09:16:44 - pico-train - INFO - Step 60200 -- 🔄 Training Metrics 2025-08-31 09:16:44 - pico-train - INFO - ├── Loss: 4.8181 2025-08-31 09:16:44 - pico-train - INFO - ├── Learning Rate: 7.09e-05 2025-08-31 09:16:44 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-31 09:17:38 - pico-train - INFO - Step 60300 -- 🔄 Training Metrics 2025-08-31 09:17:38 - pico-train - INFO - ├── Loss: 4.8059 2025-08-31 09:17:38 - pico-train - INFO - ├── Learning Rate: 7.06e-05 2025-08-31 09:17:38 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-31 09:18:32 - pico-train - INFO - Step 60400 -- 🔄 Training Metrics 2025-08-31 09:18:32 - pico-train - INFO - ├── Loss: 4.8367 2025-08-31 09:18:32 - pico-train - INFO - ├── Learning Rate: 7.03e-05 2025-08-31 09:18:32 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-31 09:19:26 - pico-train - INFO - Step 60500 -- 🔄 Training Metrics 2025-08-31 09:19:26 - pico-train - INFO - ├── Loss: 4.8237 2025-08-31 09:19:26 - pico-train - INFO - ├── Learning Rate: 7.00e-05 2025-08-31 09:19:26 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-31 09:20:21 - pico-train - INFO - Step 60600 -- 🔄 Training Metrics 2025-08-31 09:20:21 - 
pico-train - INFO - ├── Loss: 4.8291 2025-08-31 09:20:21 - pico-train - INFO - ├── Learning Rate: 6.97e-05 2025-08-31 09:20:21 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-31 09:21:15 - pico-train - INFO - Step 60700 -- 🔄 Training Metrics 2025-08-31 09:21:15 - pico-train - INFO - ├── Loss: 4.8317 2025-08-31 09:21:15 - pico-train - INFO - ├── Learning Rate: 6.94e-05 2025-08-31 09:21:15 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-31 09:22:09 - pico-train - INFO - Step 60800 -- 🔄 Training Metrics 2025-08-31 09:22:09 - pico-train - INFO - ├── Loss: 4.8204 2025-08-31 09:22:09 - pico-train - INFO - ├── Learning Rate: 6.91e-05 2025-08-31 09:22:09 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-31 09:23:03 - pico-train - INFO - Step 60900 -- 🔄 Training Metrics 2025-08-31 09:23:03 - pico-train - INFO - ├── Loss: 4.8455 2025-08-31 09:23:03 - pico-train - INFO - ├── Learning Rate: 6.88e-05 2025-08-31 09:23:03 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-31 09:23:57 - pico-train - INFO - Step 61000 -- 🔄 Training Metrics 2025-08-31 09:23:57 - pico-train - INFO - ├── Loss: 4.8133 2025-08-31 09:23:57 - pico-train - INFO - ├── Learning Rate: 6.85e-05 2025-08-31 09:23:57 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-31 09:24:52 - pico-train - INFO - Step 61100 -- 🔄 Training Metrics 2025-08-31 09:24:52 - pico-train - INFO - ├── Loss: 4.8155 2025-08-31 09:24:52 - pico-train - INFO - ├── Learning Rate: 6.82e-05 2025-08-31 09:24:52 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-31 09:25:46 - pico-train - INFO - Step 61200 -- 🔄 Training Metrics 2025-08-31 09:25:46 - pico-train - INFO - ├── Loss: 4.8151 2025-08-31 09:25:46 - pico-train - INFO - ├── Learning Rate: 6.79e-05 2025-08-31 09:25:46 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-31 09:26:40 - pico-train - INFO - Step 61300 -- 🔄 Training Metrics 2025-08-31 09:26:40 - pico-train - INFO - ├── Loss: 4.8111 2025-08-31 09:26:40 - pico-train - INFO - ├── Learning Rate: 6.76e-05 2025-08-31 09:26:40 - 
pico-train - INFO - └── Inf/NaN count: 0 2025-08-31 09:27:34 - pico-train - INFO - Step 61400 -- 🔄 Training Metrics 2025-08-31 09:27:34 - pico-train - INFO - ├── Loss: 4.8221 2025-08-31 09:27:34 - pico-train - INFO - ├── Learning Rate: 6.73e-05 2025-08-31 09:27:34 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-31 09:28:29 - pico-train - INFO - Step 61500 -- 🔄 Training Metrics 2025-08-31 09:28:29 - pico-train - INFO - ├── Loss: 4.8183 2025-08-31 09:28:29 - pico-train - INFO - ├── Learning Rate: 6.70e-05 2025-08-31 09:28:29 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-31 09:29:23 - pico-train - INFO - Step 61600 -- 🔄 Training Metrics 2025-08-31 09:29:23 - pico-train - INFO - ├── Loss: 4.8133 2025-08-31 09:29:23 - pico-train - INFO - ├── Learning Rate: 6.67e-05 2025-08-31 09:29:23 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-31 09:30:18 - pico-train - INFO - Step 61700 -- 🔄 Training Metrics 2025-08-31 09:30:18 - pico-train - INFO - ├── Loss: 4.8242 2025-08-31 09:30:18 - pico-train - INFO - ├── Learning Rate: 6.64e-05 2025-08-31 09:30:18 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-31 09:31:12 - pico-train - INFO - Step 61800 -- 🔄 Training Metrics 2025-08-31 09:31:12 - pico-train - INFO - ├── Loss: 4.8117 2025-08-31 09:31:12 - pico-train - INFO - ├── Learning Rate: 6.61e-05 2025-08-31 09:31:12 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-31 09:32:06 - pico-train - INFO - Step 61900 -- 🔄 Training Metrics 2025-08-31 09:32:06 - pico-train - INFO - ├── Loss: 4.8329 2025-08-31 09:32:06 - pico-train - INFO - ├── Learning Rate: 6.58e-05 2025-08-31 09:32:06 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-31 09:33:00 - pico-train - INFO - Step 62000 -- 💾 Saving Checkpoint 2025-08-31 09:34:51 - pico-train - INFO - Step 62000 -- 📊 Evaluation Results 2025-08-31 09:34:51 - pico-train - INFO - └── paloma: inf 2025-08-31 09:34:52 - pico-train - INFO - Step 62000 -- 🔄 Training Metrics 2025-08-31 09:34:52 - pico-train - INFO - ├── Loss: 4.8042 2025-08-31 
09:34:52 - pico-train - INFO - ├── Learning Rate: 6.55e-05 2025-08-31 09:34:52 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-31 09:34:52 - pico-train - INFO - Step 62000 -- 📈 Saving Learning Dynamics 2025-08-31 09:35:50 - pico-train - INFO - Step 62100 -- 🔄 Training Metrics 2025-08-31 09:35:50 - pico-train - INFO - ├── Loss: 4.8256 2025-08-31 09:35:50 - pico-train - INFO - ├── Learning Rate: 6.52e-05 2025-08-31 09:35:50 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-31 09:36:45 - pico-train - INFO - Step 62200 -- 🔄 Training Metrics 2025-08-31 09:36:45 - pico-train - INFO - ├── Loss: 4.8249 2025-08-31 09:36:45 - pico-train - INFO - ├── Learning Rate: 6.49e-05 2025-08-31 09:36:45 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-31 09:37:40 - pico-train - INFO - Step 62300 -- 🔄 Training Metrics 2025-08-31 09:37:40 - pico-train - INFO - ├── Loss: 4.8133 2025-08-31 09:37:40 - pico-train - INFO - ├── Learning Rate: 6.46e-05 2025-08-31 09:37:40 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-31 09:38:34 - pico-train - INFO - Step 62400 -- 🔄 Training Metrics 2025-08-31 09:38:34 - pico-train - INFO - ├── Loss: 4.8239 2025-08-31 09:38:34 - pico-train - INFO - ├── Learning Rate: 6.43e-05 2025-08-31 09:38:34 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-31 09:39:29 - pico-train - INFO - Step 62500 -- 🔄 Training Metrics 2025-08-31 09:39:29 - pico-train - INFO - ├── Loss: 4.8297 2025-08-31 09:39:29 - pico-train - INFO - ├── Learning Rate: 6.40e-05 2025-08-31 09:39:29 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-31 09:40:23 - pico-train - INFO - Step 62600 -- 🔄 Training Metrics 2025-08-31 09:40:23 - pico-train - INFO - ├── Loss: 4.8202 2025-08-31 09:40:23 - pico-train - INFO - ├── Learning Rate: 6.37e-05 2025-08-31 09:40:23 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-31 09:41:17 - pico-train - INFO - Step 62700 -- 🔄 Training Metrics 2025-08-31 09:41:17 - pico-train - INFO - ├── Loss: 4.7957 2025-08-31 09:41:17 - pico-train - INFO - ├── Learning 
Rate: 6.34e-05 2025-08-31 09:41:17 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-31 09:42:13 - pico-train - INFO - Step 62800 -- 🔄 Training Metrics 2025-08-31 09:42:13 - pico-train - INFO - ├── Loss: 4.8361 2025-08-31 09:42:13 - pico-train - INFO - ├── Learning Rate: 6.31e-05 2025-08-31 09:42:13 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-31 09:43:09 - pico-train - INFO - Step 62900 -- 🔄 Training Metrics 2025-08-31 09:43:09 - pico-train - INFO - ├── Loss: 4.8381 2025-08-31 09:43:09 - pico-train - INFO - ├── Learning Rate: 6.28e-05 2025-08-31 09:43:09 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-31 09:44:04 - pico-train - INFO - Step 63000 -- 🔄 Training Metrics 2025-08-31 09:44:04 - pico-train - INFO - ├── Loss: 4.8256 2025-08-31 09:44:04 - pico-train - INFO - ├── Learning Rate: 6.25e-05 2025-08-31 09:44:04 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-31 09:44:59 - pico-train - INFO - Step 63100 -- 🔄 Training Metrics 2025-08-31 09:44:59 - pico-train - INFO - ├── Loss: 4.8416 2025-08-31 09:44:59 - pico-train - INFO - ├── Learning Rate: 6.22e-05 2025-08-31 09:44:59 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-31 09:45:54 - pico-train - INFO - Step 63200 -- 🔄 Training Metrics 2025-08-31 09:45:54 - pico-train - INFO - ├── Loss: 4.8099 2025-08-31 09:45:54 - pico-train - INFO - ├── Learning Rate: 6.19e-05 2025-08-31 09:45:54 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-31 09:46:49 - pico-train - INFO - Step 63300 -- 🔄 Training Metrics 2025-08-31 09:46:49 - pico-train - INFO - ├── Loss: 4.8293 2025-08-31 09:46:49 - pico-train - INFO - ├── Learning Rate: 6.16e-05 2025-08-31 09:46:49 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-31 09:47:45 - pico-train - INFO - Step 63400 -- 🔄 Training Metrics 2025-08-31 09:47:45 - pico-train - INFO - ├── Loss: 4.8126 2025-08-31 09:47:45 - pico-train - INFO - ├── Learning Rate: 6.13e-05 2025-08-31 09:47:45 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-31 09:48:40 - pico-train - INFO - Step 63500 
-- 🔄 Training Metrics 2025-08-31 09:48:40 - pico-train - INFO - ├── Loss: 4.8182 2025-08-31 09:48:40 - pico-train - INFO - ├── Learning Rate: 6.10e-05 2025-08-31 09:48:40 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-31 09:49:35 - pico-train - INFO - Step 63600 -- 🔄 Training Metrics 2025-08-31 09:49:35 - pico-train - INFO - ├── Loss: 4.8346 2025-08-31 09:49:35 - pico-train - INFO - ├── Learning Rate: 6.07e-05 2025-08-31 09:49:35 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-31 09:50:30 - pico-train - INFO - Step 63700 -- 🔄 Training Metrics 2025-08-31 09:50:30 - pico-train - INFO - ├── Loss: 4.8124 2025-08-31 09:50:30 - pico-train - INFO - ├── Learning Rate: 6.04e-05 2025-08-31 09:50:30 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-31 09:51:25 - pico-train - INFO - Step 63800 -- 🔄 Training Metrics 2025-08-31 09:51:25 - pico-train - INFO - ├── Loss: 4.8222 2025-08-31 09:51:25 - pico-train - INFO - ├── Learning Rate: 6.01e-05 2025-08-31 09:51:25 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-31 09:52:20 - pico-train - INFO - Step 63900 -- 🔄 Training Metrics 2025-08-31 09:52:20 - pico-train - INFO - ├── Loss: 4.8265 2025-08-31 09:52:20 - pico-train - INFO - ├── Learning Rate: 5.98e-05 2025-08-31 09:52:20 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-31 09:53:15 - pico-train - INFO - Step 64000 -- 💾 Saving Checkpoint 2025-08-31 09:55:19 - pico-train - INFO - Step 64000 -- 📊 Evaluation Results 2025-08-31 09:55:19 - pico-train - INFO - └── paloma: inf 2025-08-31 09:55:21 - pico-train - INFO - Step 64000 -- 🔄 Training Metrics 2025-08-31 09:55:21 - pico-train - INFO - ├── Loss: 4.7990 2025-08-31 09:55:21 - pico-train - INFO - ├── Learning Rate: 5.95e-05 2025-08-31 09:55:21 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-31 09:55:21 - pico-train - INFO - Step 64000 -- 📈 Saving Learning Dynamics 2025-08-31 09:56:18 - pico-train - INFO - Step 64100 -- 🔄 Training Metrics 2025-08-31 09:56:18 - pico-train - INFO - ├── Loss: 4.8110 2025-08-31 09:56:18 - 
pico-train - INFO - ├── Learning Rate: 5.92e-05 2025-08-31 09:56:18 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-31 09:57:14 - pico-train - INFO - Step 64200 -- 🔄 Training Metrics 2025-08-31 09:57:14 - pico-train - INFO - ├── Loss: 4.7969 2025-08-31 09:57:14 - pico-train - INFO - ├── Learning Rate: 5.89e-05 2025-08-31 09:57:14 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-31 09:58:10 - pico-train - INFO - Step 64300 -- 🔄 Training Metrics 2025-08-31 09:58:10 - pico-train - INFO - ├── Loss: 4.8197 2025-08-31 09:58:10 - pico-train - INFO - ├── Learning Rate: 5.86e-05 2025-08-31 09:58:10 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-31 09:59:05 - pico-train - INFO - Step 64400 -- 🔄 Training Metrics 2025-08-31 09:59:05 - pico-train - INFO - ├── Loss: 4.8353 2025-08-31 09:59:05 - pico-train - INFO - ├── Learning Rate: 5.84e-05 2025-08-31 09:59:05 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-31 10:00:00 - pico-train - INFO - Step 64500 -- 🔄 Training Metrics 2025-08-31 10:00:00 - pico-train - INFO - ├── Loss: 4.8159 2025-08-31 10:00:00 - pico-train - INFO - ├── Learning Rate: 5.81e-05 2025-08-31 10:00:00 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-31 10:00:55 - pico-train - INFO - Step 64600 -- 🔄 Training Metrics 2025-08-31 10:00:55 - pico-train - INFO - ├── Loss: 4.8396 2025-08-31 10:00:55 - pico-train - INFO - ├── Learning Rate: 5.78e-05 2025-08-31 10:00:55 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-31 10:01:50 - pico-train - INFO - Step 64700 -- 🔄 Training Metrics 2025-08-31 10:01:50 - pico-train - INFO - ├── Loss: 4.8071 2025-08-31 10:01:50 - pico-train - INFO - ├── Learning Rate: 5.75e-05 2025-08-31 10:01:50 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-31 10:02:46 - pico-train - INFO - Step 64800 -- 🔄 Training Metrics 2025-08-31 10:02:46 - pico-train - INFO - ├── Loss: 4.8250 2025-08-31 10:02:46 - pico-train - INFO - ├── Learning Rate: 5.72e-05 2025-08-31 10:02:46 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-31 10:03:41 
- pico-train - INFO - Step 64900 -- 🔄 Training Metrics 2025-08-31 10:03:41 - pico-train - INFO - ├── Loss: 4.8292 2025-08-31 10:03:41 - pico-train - INFO - ├── Learning Rate: 5.69e-05 2025-08-31 10:03:41 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-31 10:04:37 - pico-train - INFO - Step 65000 -- 🔄 Training Metrics 2025-08-31 10:04:37 - pico-train - INFO - ├── Loss: 4.8136 2025-08-31 10:04:37 - pico-train - INFO - ├── Learning Rate: 5.66e-05 2025-08-31 10:04:37 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-31 10:05:32 - pico-train - INFO - Step 65100 -- 🔄 Training Metrics 2025-08-31 10:05:32 - pico-train - INFO - ├── Loss: 4.7972 2025-08-31 10:05:32 - pico-train - INFO - ├── Learning Rate: 5.63e-05 2025-08-31 10:05:32 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-31 10:06:27 - pico-train - INFO - Step 65200 -- 🔄 Training Metrics 2025-08-31 10:06:27 - pico-train - INFO - ├── Loss: 4.7909 2025-08-31 10:06:27 - pico-train - INFO - ├── Learning Rate: 5.60e-05 2025-08-31 10:06:27 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-31 10:07:24 - pico-train - INFO - Step 65300 -- 🔄 Training Metrics 2025-08-31 10:07:24 - pico-train - INFO - ├── Loss: 4.8193 2025-08-31 10:07:24 - pico-train - INFO - ├── Learning Rate: 5.57e-05 2025-08-31 10:07:24 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-31 10:08:18 - pico-train - INFO - Step 65400 -- 🔄 Training Metrics 2025-08-31 10:08:18 - pico-train - INFO - ├── Loss: 4.8241 2025-08-31 10:08:18 - pico-train - INFO - ├── Learning Rate: 5.55e-05 2025-08-31 10:08:18 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-31 10:09:14 - pico-train - INFO - Step 65500 -- 🔄 Training Metrics 2025-08-31 10:09:14 - pico-train - INFO - ├── Loss: 4.8292 2025-08-31 10:09:14 - pico-train - INFO - ├── Learning Rate: 5.52e-05 2025-08-31 10:09:14 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-31 10:10:09 - pico-train - INFO - Step 65600 -- 🔄 Training Metrics 2025-08-31 10:10:09 - pico-train - INFO - ├── Loss: 4.8186 2025-08-31 
10:10:09 - pico-train - INFO - ├── Learning Rate: 5.49e-05 2025-08-31 10:10:09 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-31 10:11:05 - pico-train - INFO - Step 65700 -- 🔄 Training Metrics 2025-08-31 10:11:05 - pico-train - INFO - ├── Loss: 4.8086 2025-08-31 10:11:05 - pico-train - INFO - ├── Learning Rate: 5.46e-05 2025-08-31 10:11:05 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-31 10:12:00 - pico-train - INFO - Step 65800 -- 🔄 Training Metrics 2025-08-31 10:12:00 - pico-train - INFO - ├── Loss: 4.8059 2025-08-31 10:12:00 - pico-train - INFO - ├── Learning Rate: 5.43e-05 2025-08-31 10:12:00 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-31 10:12:55 - pico-train - INFO - Step 65900 -- 🔄 Training Metrics 2025-08-31 10:12:55 - pico-train - INFO - ├── Loss: 4.7922 2025-08-31 10:12:55 - pico-train - INFO - ├── Learning Rate: 5.40e-05 2025-08-31 10:12:55 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-31 10:13:50 - pico-train - INFO - Step 66000 -- 💾 Saving Checkpoint 2025-08-31 10:15:42 - pico-train - INFO - Step 66000 -- 📊 Evaluation Results 2025-08-31 10:15:42 - pico-train - INFO - └── paloma: inf 2025-08-31 10:15:43 - pico-train - INFO - Step 66000 -- 🔄 Training Metrics 2025-08-31 10:15:43 - pico-train - INFO - ├── Loss: 4.8014 2025-08-31 10:15:43 - pico-train - INFO - ├── Learning Rate: 5.37e-05 2025-08-31 10:15:43 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-31 10:15:43 - pico-train - INFO - Step 66000 -- 📈 Saving Learning Dynamics 2025-08-31 10:16:40 - pico-train - INFO - Step 66100 -- 🔄 Training Metrics 2025-08-31 10:16:40 - pico-train - INFO - ├── Loss: 4.8062 2025-08-31 10:16:40 - pico-train - INFO - ├── Learning Rate: 5.35e-05 2025-08-31 10:16:40 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-31 10:17:34 - pico-train - INFO - Step 66200 -- 🔄 Training Metrics 2025-08-31 10:17:34 - pico-train - INFO - ├── Loss: 4.8138 2025-08-31 10:17:34 - pico-train - INFO - ├── Learning Rate: 5.32e-05 2025-08-31 10:17:34 - pico-train - INFO - 
└── Inf/NaN count: 0 2025-08-31 10:18:29 - pico-train - INFO - Step 66300 -- 🔄 Training Metrics 2025-08-31 10:18:29 - pico-train - INFO - ├── Loss: 4.8191 2025-08-31 10:18:29 - pico-train - INFO - ├── Learning Rate: 5.29e-05 2025-08-31 10:18:29 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-31 10:19:22 - pico-train - INFO - Step 66400 -- 🔄 Training Metrics 2025-08-31 10:19:22 - pico-train - INFO - ├── Loss: 4.7987 2025-08-31 10:19:22 - pico-train - INFO - ├── Learning Rate: 5.26e-05 2025-08-31 10:19:22 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-31 10:20:16 - pico-train - INFO - Step 66500 -- 🔄 Training Metrics 2025-08-31 10:20:16 - pico-train - INFO - ├── Loss: 4.7927 2025-08-31 10:20:16 - pico-train - INFO - ├── Learning Rate: 5.23e-05 2025-08-31 10:20:16 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-31 10:21:12 - pico-train - INFO - Step 66600 -- 🔄 Training Metrics 2025-08-31 10:21:12 - pico-train - INFO - ├── Loss: 4.8114 2025-08-31 10:21:12 - pico-train - INFO - ├── Learning Rate: 5.20e-05 2025-08-31 10:21:12 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-31 10:22:06 - pico-train - INFO - Step 66700 -- 🔄 Training Metrics 2025-08-31 10:22:06 - pico-train - INFO - ├── Loss: 4.7948 2025-08-31 10:22:06 - pico-train - INFO - ├── Learning Rate: 5.18e-05 2025-08-31 10:22:06 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-31 10:23:00 - pico-train - INFO - Step 66800 -- 🔄 Training Metrics 2025-08-31 10:23:00 - pico-train - INFO - ├── Loss: 4.8224 2025-08-31 10:23:00 - pico-train - INFO - ├── Learning Rate: 5.15e-05 2025-08-31 10:23:00 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-31 10:23:54 - pico-train - INFO - Step 66900 -- 🔄 Training Metrics 2025-08-31 10:23:54 - pico-train - INFO - ├── Loss: 4.7917 2025-08-31 10:23:54 - pico-train - INFO - ├── Learning Rate: 5.12e-05 2025-08-31 10:23:54 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-31 10:24:49 - pico-train - INFO - Step 67000 -- 🔄 Training Metrics 2025-08-31 10:24:49 - pico-train - 
INFO - ├── Loss: 4.8179 2025-08-31 10:24:49 - pico-train - INFO - ├── Learning Rate: 5.09e-05 2025-08-31 10:24:49 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-31 10:25:43 - pico-train - INFO - Step 67100 -- 🔄 Training Metrics 2025-08-31 10:25:43 - pico-train - INFO - ├── Loss: 4.8135 2025-08-31 10:25:43 - pico-train - INFO - ├── Learning Rate: 5.06e-05 2025-08-31 10:25:43 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-31 10:26:37 - pico-train - INFO - Step 67200 -- 🔄 Training Metrics 2025-08-31 10:26:37 - pico-train - INFO - ├── Loss: 4.7937 2025-08-31 10:26:37 - pico-train - INFO - ├── Learning Rate: 5.04e-05 2025-08-31 10:26:37 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-31 10:27:31 - pico-train - INFO - Step 67300 -- 🔄 Training Metrics 2025-08-31 10:27:31 - pico-train - INFO - ├── Loss: 4.7951 2025-08-31 10:27:31 - pico-train - INFO - ├── Learning Rate: 5.01e-05 2025-08-31 10:27:31 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-31 10:28:25 - pico-train - INFO - Step 67400 -- 🔄 Training Metrics 2025-08-31 10:28:25 - pico-train - INFO - ├── Loss: 4.7929 2025-08-31 10:28:25 - pico-train - INFO - ├── Learning Rate: 4.98e-05 2025-08-31 10:28:25 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-31 10:29:20 - pico-train - INFO - Step 67500 -- 🔄 Training Metrics 2025-08-31 10:29:20 - pico-train - INFO - ├── Loss: 4.8136 2025-08-31 10:29:20 - pico-train - INFO - ├── Learning Rate: 4.95e-05 2025-08-31 10:29:20 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-31 10:30:14 - pico-train - INFO - Step 67600 -- 🔄 Training Metrics 2025-08-31 10:30:14 - pico-train - INFO - ├── Loss: 4.7972 2025-08-31 10:30:14 - pico-train - INFO - ├── Learning Rate: 4.93e-05 2025-08-31 10:30:14 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-31 10:31:08 - pico-train - INFO - Step 67700 -- 🔄 Training Metrics 2025-08-31 10:31:08 - pico-train - INFO - ├── Loss: 4.8047 2025-08-31 10:31:08 - pico-train - INFO - ├── Learning Rate: 4.90e-05 2025-08-31 10:31:08 - pico-train - 
INFO - └── Inf/NaN count: 0 2025-08-31 10:32:02 - pico-train - INFO - Step 67800 -- 🔄 Training Metrics 2025-08-31 10:32:02 - pico-train - INFO - ├── Loss: 4.8156 2025-08-31 10:32:02 - pico-train - INFO - ├── Learning Rate: 4.87e-05 2025-08-31 10:32:02 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-31 10:32:57 - pico-train - INFO - Step 67900 -- 🔄 Training Metrics 2025-08-31 10:32:57 - pico-train - INFO - ├── Loss: 4.7831 2025-08-31 10:32:57 - pico-train - INFO - ├── Learning Rate: 4.84e-05 2025-08-31 10:32:57 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-31 10:33:51 - pico-train - INFO - Step 68000 -- 💾 Saving Checkpoint 2025-08-31 10:35:54 - pico-train - INFO - Step 68000 -- 📊 Evaluation Results 2025-08-31 10:35:54 - pico-train - INFO - └── paloma: inf 2025-08-31 10:35:54 - pico-train - INFO - Step 68000 -- 🔄 Training Metrics 2025-08-31 10:35:54 - pico-train - INFO - ├── Loss: 4.8151 2025-08-31 10:35:54 - pico-train - INFO - ├── Learning Rate: 4.82e-05 2025-08-31 10:35:54 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-31 10:35:54 - pico-train - INFO - Step 68000 -- 📈 Saving Learning Dynamics 2025-08-31 10:36:52 - pico-train - INFO - Step 68100 -- 🔄 Training Metrics 2025-08-31 10:36:52 - pico-train - INFO - ├── Loss: 4.8111 2025-08-31 10:36:52 - pico-train - INFO - ├── Learning Rate: 4.79e-05 2025-08-31 10:36:52 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-31 10:37:47 - pico-train - INFO - Step 68200 -- 🔄 Training Metrics 2025-08-31 10:37:47 - pico-train - INFO - ├── Loss: 4.8168 2025-08-31 10:37:47 - pico-train - INFO - ├── Learning Rate: 4.76e-05 2025-08-31 10:37:47 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-31 10:38:42 - pico-train - INFO - Step 68300 -- 🔄 Training Metrics 2025-08-31 10:38:42 - pico-train - INFO - ├── Loss: 4.8167 2025-08-31 10:38:42 - pico-train - INFO - ├── Learning Rate: 4.73e-05 2025-08-31 10:38:42 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-31 10:39:38 - pico-train - INFO - Step 68400 -- 🔄 Training Metrics 
2025-08-31 10:39:38 - pico-train - INFO - ├── Loss: 4.8310 2025-08-31 10:39:38 - pico-train - INFO - ├── Learning Rate: 4.71e-05 2025-08-31 10:39:38 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-31 10:40:33 - pico-train - INFO - Step 68500 -- 🔄 Training Metrics 2025-08-31 10:40:33 - pico-train - INFO - ├── Loss: 4.8160 2025-08-31 10:40:33 - pico-train - INFO - ├── Learning Rate: 4.68e-05 2025-08-31 10:40:33 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-31 10:41:28 - pico-train - INFO - Step 68600 -- 🔄 Training Metrics 2025-08-31 10:41:28 - pico-train - INFO - ├── Loss: 4.7912 2025-08-31 10:41:28 - pico-train - INFO - ├── Learning Rate: 4.65e-05 2025-08-31 10:41:28 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-31 10:42:23 - pico-train - INFO - Step 68700 -- 🔄 Training Metrics 2025-08-31 10:42:23 - pico-train - INFO - ├── Loss: 4.7639 2025-08-31 10:42:23 - pico-train - INFO - ├── Learning Rate: 4.63e-05 2025-08-31 10:42:23 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-31 10:43:19 - pico-train - INFO - Step 68800 -- 🔄 Training Metrics 2025-08-31 10:43:19 - pico-train - INFO - ├── Loss: 4.8171 2025-08-31 10:43:19 - pico-train - INFO - ├── Learning Rate: 4.60e-05 2025-08-31 10:43:19 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-31 10:44:15 - pico-train - INFO - Step 68900 -- 🔄 Training Metrics 2025-08-31 10:44:15 - pico-train - INFO - ├── Loss: 4.7934 2025-08-31 10:44:15 - pico-train - INFO - ├── Learning Rate: 4.57e-05 2025-08-31 10:44:15 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-31 10:45:10 - pico-train - INFO - Step 69000 -- 🔄 Training Metrics 2025-08-31 10:45:10 - pico-train - INFO - ├── Loss: 4.8097 2025-08-31 10:45:10 - pico-train - INFO - ├── Learning Rate: 4.54e-05 2025-08-31 10:45:10 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-31 10:46:05 - pico-train - INFO - Step 69100 -- 🔄 Training Metrics 2025-08-31 10:46:05 - pico-train - INFO - ├── Loss: 4.8128 2025-08-31 10:46:05 - pico-train - INFO - ├── Learning Rate: 4.52e-05 
2025-08-31 10:46:05 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-31 10:47:01 - pico-train - INFO - Step 69200 -- 🔄 Training Metrics 2025-08-31 10:47:01 - pico-train - INFO - ├── Loss: 4.8255 2025-08-31 10:47:01 - pico-train - INFO - ├── Learning Rate: 4.49e-05 2025-08-31 10:47:01 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-31 10:47:57 - pico-train - INFO - Step 69300 -- 🔄 Training Metrics 2025-08-31 10:47:57 - pico-train - INFO - ├── Loss: 4.8021 2025-08-31 10:47:57 - pico-train - INFO - ├── Learning Rate: 4.46e-05 2025-08-31 10:47:57 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-31 10:48:52 - pico-train - INFO - Step 69400 -- 🔄 Training Metrics 2025-08-31 10:48:52 - pico-train - INFO - ├── Loss: 4.8067 2025-08-31 10:48:52 - pico-train - INFO - ├── Learning Rate: 4.44e-05 2025-08-31 10:48:52 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-31 10:49:47 - pico-train - INFO - Step 69500 -- 🔄 Training Metrics 2025-08-31 10:49:47 - pico-train - INFO - ├── Loss: 4.7975 2025-08-31 10:49:47 - pico-train - INFO - ├── Learning Rate: 4.41e-05 2025-08-31 10:49:47 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-31 10:50:42 - pico-train - INFO - Step 69600 -- 🔄 Training Metrics 2025-08-31 10:50:42 - pico-train - INFO - ├── Loss: 4.8162 2025-08-31 10:50:42 - pico-train - INFO - ├── Learning Rate: 4.38e-05 2025-08-31 10:50:42 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-31 10:51:38 - pico-train - INFO - Step 69700 -- 🔄 Training Metrics 2025-08-31 10:51:38 - pico-train - INFO - ├── Loss: 4.8043 2025-08-31 10:51:38 - pico-train - INFO - ├── Learning Rate: 4.36e-05 2025-08-31 10:51:38 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-31 10:52:33 - pico-train - INFO - Step 69800 -- 🔄 Training Metrics 2025-08-31 10:52:33 - pico-train - INFO - ├── Loss: 4.7863 2025-08-31 10:52:33 - pico-train - INFO - ├── Learning Rate: 4.33e-05 2025-08-31 10:52:33 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-31 10:53:28 - pico-train - INFO - Step 69900 -- 🔄 Training 
Metrics 2025-08-31 10:53:28 - pico-train - INFO - ├── Loss: 4.8059 2025-08-31 10:53:28 - pico-train - INFO - ├── Learning Rate: 4.31e-05 2025-08-31 10:53:28 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-31 10:54:23 - pico-train - INFO - Step 70000 -- 💾 Saving Checkpoint 2025-08-31 10:56:27 - pico-train - INFO - Step 70000 -- 📊 Evaluation Results 2025-08-31 10:56:27 - pico-train - INFO - └── paloma: inf 2025-08-31 10:56:28 - pico-train - INFO - Step 70000 -- 🔄 Training Metrics 2025-08-31 10:56:28 - pico-train - INFO - ├── Loss: 4.7928 2025-08-31 10:56:28 - pico-train - INFO - ├── Learning Rate: 4.28e-05 2025-08-31 10:56:28 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-31 10:56:28 - pico-train - INFO - Step 70000 -- 📈 Saving Learning Dynamics 2025-08-31 10:57:24 - pico-train - INFO - Step 70100 -- 🔄 Training Metrics 2025-08-31 10:57:24 - pico-train - INFO - ├── Loss: 4.8159 2025-08-31 10:57:24 - pico-train - INFO - ├── Learning Rate: 4.25e-05 2025-08-31 10:57:24 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-31 10:58:18 - pico-train - INFO - Step 70200 -- 🔄 Training Metrics 2025-08-31 10:58:18 - pico-train - INFO - ├── Loss: 4.8087 2025-08-31 10:58:18 - pico-train - INFO - ├── Learning Rate: 4.23e-05 2025-08-31 10:58:18 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-31 10:59:13 - pico-train - INFO - Step 70300 -- 🔄 Training Metrics 2025-08-31 10:59:13 - pico-train - INFO - ├── Loss: 4.8082 2025-08-31 10:59:13 - pico-train - INFO - ├── Learning Rate: 4.20e-05 2025-08-31 10:59:13 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-31 11:00:08 - pico-train - INFO - Step 70400 -- 🔄 Training Metrics 2025-08-31 11:00:08 - pico-train - INFO - ├── Loss: 4.7849 2025-08-31 11:00:08 - pico-train - INFO - ├── Learning Rate: 4.17e-05 2025-08-31 11:00:08 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-31 11:01:02 - pico-train - INFO - Step 70500 -- 🔄 Training Metrics 2025-08-31 11:01:02 - pico-train - INFO - ├── Loss: 4.7940 2025-08-31 11:01:02 - pico-train - 
INFO - ├── Learning Rate: 4.15e-05 2025-08-31 11:01:02 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-31 11:01:56 - pico-train - INFO - Step 70600 -- 🔄 Training Metrics 2025-08-31 11:01:56 - pico-train - INFO - ├── Loss: 4.7964 2025-08-31 11:01:56 - pico-train - INFO - ├── Learning Rate: 4.12e-05 2025-08-31 11:01:56 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-31 11:02:50 - pico-train - INFO - Step 70700 -- 🔄 Training Metrics 2025-08-31 11:02:50 - pico-train - INFO - ├── Loss: 4.7733 2025-08-31 11:02:50 - pico-train - INFO - ├── Learning Rate: 4.10e-05 2025-08-31 11:02:50 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-31 11:03:44 - pico-train - INFO - Step 70800 -- 🔄 Training Metrics 2025-08-31 11:03:44 - pico-train - INFO - ├── Loss: 4.8100 2025-08-31 11:03:44 - pico-train - INFO - ├── Learning Rate: 4.07e-05 2025-08-31 11:03:44 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-31 11:04:38 - pico-train - INFO - Step 70900 -- 🔄 Training Metrics 2025-08-31 11:04:38 - pico-train - INFO - ├── Loss: 4.8093 2025-08-31 11:04:38 - pico-train - INFO - ├── Learning Rate: 4.04e-05 2025-08-31 11:04:38 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-31 11:05:32 - pico-train - INFO - Step 71000 -- 🔄 Training Metrics 2025-08-31 11:05:32 - pico-train - INFO - ├── Loss: 4.8045 2025-08-31 11:05:32 - pico-train - INFO - ├── Learning Rate: 4.02e-05 2025-08-31 11:05:32 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-31 11:06:26 - pico-train - INFO - Step 71100 -- 🔄 Training Metrics 2025-08-31 11:06:26 - pico-train - INFO - ├── Loss: 4.7921 2025-08-31 11:06:26 - pico-train - INFO - ├── Learning Rate: 3.99e-05 2025-08-31 11:06:26 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-31 11:07:21 - pico-train - INFO - Step 71200 -- 🔄 Training Metrics 2025-08-31 11:07:21 - pico-train - INFO - ├── Loss: 4.8089 2025-08-31 11:07:21 - pico-train - INFO - ├── Learning Rate: 3.97e-05 2025-08-31 11:07:21 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-31 11:08:15 - pico-train 
- INFO - Step 71300 -- 🔄 Training Metrics 2025-08-31 11:08:15 - pico-train - INFO - ├── Loss: 4.8052 2025-08-31 11:08:15 - pico-train - INFO - ├── Learning Rate: 3.94e-05 2025-08-31 11:08:15 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-31 11:09:09 - pico-train - INFO - Step 71400 -- 🔄 Training Metrics 2025-08-31 11:09:09 - pico-train - INFO - ├── Loss: 4.7895 2025-08-31 11:09:09 - pico-train - INFO - ├── Learning Rate: 3.92e-05 2025-08-31 11:09:09 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-31 11:10:03 - pico-train - INFO - Step 71500 -- 🔄 Training Metrics 2025-08-31 11:10:03 - pico-train - INFO - ├── Loss: 4.8153 2025-08-31 11:10:03 - pico-train - INFO - ├── Learning Rate: 3.89e-05 2025-08-31 11:10:03 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-31 11:10:57 - pico-train - INFO - Step 71600 -- 🔄 Training Metrics 2025-08-31 11:10:57 - pico-train - INFO - ├── Loss: 4.8102 2025-08-31 11:10:57 - pico-train - INFO - ├── Learning Rate: 3.87e-05 2025-08-31 11:10:57 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-31 11:11:54 - pico-train - INFO - Step 71700 -- 🔄 Training Metrics 2025-08-31 11:11:54 - pico-train - INFO - ├── Loss: 4.7847 2025-08-31 11:11:54 - pico-train - INFO - ├── Learning Rate: 3.84e-05 2025-08-31 11:11:54 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-31 11:12:49 - pico-train - INFO - Step 71800 -- 🔄 Training Metrics 2025-08-31 11:12:49 - pico-train - INFO - ├── Loss: 4.7882 2025-08-31 11:12:49 - pico-train - INFO - ├── Learning Rate: 3.82e-05 2025-08-31 11:12:49 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-31 11:13:44 - pico-train - INFO - Step 71900 -- 🔄 Training Metrics 2025-08-31 11:13:44 - pico-train - INFO - ├── Loss: 4.8031 2025-08-31 11:13:44 - pico-train - INFO - ├── Learning Rate: 3.79e-05 2025-08-31 11:13:44 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-31 11:14:39 - pico-train - INFO - Step 72000 -- 💾 Saving Checkpoint 2025-08-31 11:16:41 - pico-train - INFO - Step 72000 -- 📊 Evaluation Results 2025-08-31 
11:16:41 - pico-train - INFO - └── paloma: inf 2025-08-31 11:16:41 - pico-train - INFO - Step 72000 -- 🔄 Training Metrics 2025-08-31 11:16:41 - pico-train - INFO - ├── Loss: 4.8050 2025-08-31 11:16:41 - pico-train - INFO - ├── Learning Rate: 3.77e-05 2025-08-31 11:16:41 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-31 11:16:41 - pico-train - INFO - Step 72000 -- 📈 Saving Learning Dynamics 2025-08-31 11:17:38 - pico-train - INFO - Step 72100 -- 🔄 Training Metrics 2025-08-31 11:17:38 - pico-train - INFO - ├── Loss: 4.7998 2025-08-31 11:17:38 - pico-train - INFO - ├── Learning Rate: 3.74e-05 2025-08-31 11:17:38 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-31 11:18:33 - pico-train - INFO - Step 72200 -- 🔄 Training Metrics 2025-08-31 11:18:33 - pico-train - INFO - ├── Loss: 4.8083 2025-08-31 11:18:33 - pico-train - INFO - ├── Learning Rate: 3.72e-05 2025-08-31 11:18:33 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-31 11:19:26 - pico-train - INFO - Step 72300 -- 🔄 Training Metrics 2025-08-31 11:19:26 - pico-train - INFO - ├── Loss: 4.8192 2025-08-31 11:19:26 - pico-train - INFO - ├── Learning Rate: 3.69e-05 2025-08-31 11:19:26 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-31 11:20:20 - pico-train - INFO - Step 72400 -- 🔄 Training Metrics 2025-08-31 11:20:20 - pico-train - INFO - ├── Loss: 4.7900 2025-08-31 11:20:20 - pico-train - INFO - ├── Learning Rate: 3.67e-05 2025-08-31 11:20:20 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-31 11:21:15 - pico-train - INFO - Step 72500 -- 🔄 Training Metrics 2025-08-31 11:21:15 - pico-train - INFO - ├── Loss: 4.7962 2025-08-31 11:21:15 - pico-train - INFO - ├── Learning Rate: 3.64e-05 2025-08-31 11:21:15 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-31 11:22:09 - pico-train - INFO - Step 72600 -- 🔄 Training Metrics 2025-08-31 11:22:09 - pico-train - INFO - ├── Loss: 4.8326 2025-08-31 11:22:09 - pico-train - INFO - ├── Learning Rate: 3.62e-05 2025-08-31 11:22:09 - pico-train - INFO - └── Inf/NaN count: 0 
2025-08-31 11:23:03 - pico-train - INFO - Step 72700 -- 🔄 Training Metrics 2025-08-31 11:23:03 - pico-train - INFO - ├── Loss: 4.7872 2025-08-31 11:23:03 - pico-train - INFO - ├── Learning Rate: 3.59e-05 2025-08-31 11:23:03 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-31 11:23:57 - pico-train - INFO - Step 72800 -- 🔄 Training Metrics 2025-08-31 11:23:57 - pico-train - INFO - ├── Loss: 4.7991 2025-08-31 11:23:57 - pico-train - INFO - ├── Learning Rate: 3.57e-05 2025-08-31 11:23:57 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-31 11:24:51 - pico-train - INFO - Step 72900 -- 🔄 Training Metrics 2025-08-31 11:24:51 - pico-train - INFO - ├── Loss: 4.7827 2025-08-31 11:24:51 - pico-train - INFO - ├── Learning Rate: 3.54e-05 2025-08-31 11:24:51 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-31 11:25:47 - pico-train - INFO - Step 73000 -- 🔄 Training Metrics 2025-08-31 11:25:47 - pico-train - INFO - ├── Loss: 4.7871 2025-08-31 11:25:47 - pico-train - INFO - ├── Learning Rate: 3.52e-05 2025-08-31 11:25:47 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-31 11:26:41 - pico-train - INFO - Step 73100 -- 🔄 Training Metrics 2025-08-31 11:26:41 - pico-train - INFO - ├── Loss: 4.7871 2025-08-31 11:26:41 - pico-train - INFO - ├── Learning Rate: 3.49e-05 2025-08-31 11:26:41 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-31 11:27:35 - pico-train - INFO - Step 73200 -- 🔄 Training Metrics 2025-08-31 11:27:35 - pico-train - INFO - ├── Loss: 4.7996 2025-08-31 11:27:35 - pico-train - INFO - ├── Learning Rate: 3.47e-05 2025-08-31 11:27:35 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-31 11:28:29 - pico-train - INFO - Step 73300 -- 🔄 Training Metrics 2025-08-31 11:28:29 - pico-train - INFO - ├── Loss: 4.8164 2025-08-31 11:28:29 - pico-train - INFO - ├── Learning Rate: 3.44e-05 2025-08-31 11:28:29 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-31 11:29:23 - pico-train - INFO - Step 73400 -- 🔄 Training Metrics 2025-08-31 11:29:23 - pico-train - INFO - ├── Loss: 
4.7787 2025-08-31 11:29:23 - pico-train - INFO - ├── Learning Rate: 3.42e-05 2025-08-31 11:29:23 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-31 11:30:18 - pico-train - INFO - Step 73500 -- 🔄 Training Metrics 2025-08-31 11:30:18 - pico-train - INFO - ├── Loss: 4.7794 2025-08-31 11:30:18 - pico-train - INFO - ├── Learning Rate: 3.40e-05 2025-08-31 11:30:18 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-31 11:31:12 - pico-train - INFO - Step 73600 -- 🔄 Training Metrics 2025-08-31 11:31:12 - pico-train - INFO - ├── Loss: 4.7911 2025-08-31 11:31:12 - pico-train - INFO - ├── Learning Rate: 3.37e-05 2025-08-31 11:31:12 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-31 11:32:06 - pico-train - INFO - Step 73700 -- 🔄 Training Metrics 2025-08-31 11:32:06 - pico-train - INFO - ├── Loss: 4.8028 2025-08-31 11:32:06 - pico-train - INFO - ├── Learning Rate: 3.35e-05 2025-08-31 11:32:06 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-31 11:33:00 - pico-train - INFO - Step 73800 -- 🔄 Training Metrics 2025-08-31 11:33:00 - pico-train - INFO - ├── Loss: 4.7945 2025-08-31 11:33:00 - pico-train - INFO - ├── Learning Rate: 3.32e-05 2025-08-31 11:33:00 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-31 11:33:54 - pico-train - INFO - Step 73900 -- 🔄 Training Metrics 2025-08-31 11:33:54 - pico-train - INFO - ├── Loss: 4.8114 2025-08-31 11:33:54 - pico-train - INFO - ├── Learning Rate: 3.30e-05 2025-08-31 11:33:54 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-31 11:34:48 - pico-train - INFO - Step 74000 -- 💾 Saving Checkpoint 2025-08-31 11:36:42 - pico-train - INFO - Step 74000 -- 📊 Evaluation Results 2025-08-31 11:36:42 - pico-train - INFO - └── paloma: inf 2025-08-31 11:36:42 - pico-train - INFO - Step 74000 -- 🔄 Training Metrics 2025-08-31 11:36:42 - pico-train - INFO - ├── Loss: 4.8108 2025-08-31 11:36:42 - pico-train - INFO - ├── Learning Rate: 3.28e-05 2025-08-31 11:36:42 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-31 11:36:42 - pico-train - INFO - Step 
74000 -- 📈 Saving Learning Dynamics 2025-08-31 11:37:40 - pico-train - INFO - Step 74100 -- 🔄 Training Metrics 2025-08-31 11:37:40 - pico-train - INFO - ├── Loss: 4.7847 2025-08-31 11:37:40 - pico-train - INFO - ├── Learning Rate: 3.25e-05 2025-08-31 11:37:40 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-31 11:38:34 - pico-train - INFO - Step 74200 -- 🔄 Training Metrics 2025-08-31 11:38:34 - pico-train - INFO - ├── Loss: 4.7943 2025-08-31 11:38:34 - pico-train - INFO - ├── Learning Rate: 3.23e-05 2025-08-31 11:38:34 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-31 11:39:31 - pico-train - INFO - Step 74300 -- 🔄 Training Metrics 2025-08-31 11:39:31 - pico-train - INFO - ├── Loss: 4.7812 2025-08-31 11:39:31 - pico-train - INFO - ├── Learning Rate: 3.21e-05 2025-08-31 11:39:31 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-31 11:40:26 - pico-train - INFO - Step 74400 -- 🔄 Training Metrics 2025-08-31 11:40:26 - pico-train - INFO - ├── Loss: 4.8010 2025-08-31 11:40:26 - pico-train - INFO - ├── Learning Rate: 3.18e-05 2025-08-31 11:40:26 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-31 11:41:21 - pico-train - INFO - Step 74500 -- 🔄 Training Metrics 2025-08-31 11:41:21 - pico-train - INFO - ├── Loss: 4.7899 2025-08-31 11:41:21 - pico-train - INFO - ├── Learning Rate: 3.16e-05 2025-08-31 11:41:21 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-31 11:42:16 - pico-train - INFO - Step 74600 -- 🔄 Training Metrics 2025-08-31 11:42:16 - pico-train - INFO - ├── Loss: 4.8248 2025-08-31 11:42:16 - pico-train - INFO - ├── Learning Rate: 3.14e-05 2025-08-31 11:42:16 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-31 11:43:11 - pico-train - INFO - Step 74700 -- 🔄 Training Metrics 2025-08-31 11:43:11 - pico-train - INFO - ├── Loss: 4.8035 2025-08-31 11:43:11 - pico-train - INFO - ├── Learning Rate: 3.11e-05 2025-08-31 11:43:11 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-31 11:44:06 - pico-train - INFO - Step 74800 -- 🔄 Training Metrics 2025-08-31 11:44:06 
- pico-train - INFO - ├── Loss: 4.7996 2025-08-31 11:44:06 - pico-train - INFO - ├── Learning Rate: 3.09e-05 2025-08-31 11:44:06 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-31 11:45:01 - pico-train - INFO - Step 74900 -- 🔄 Training Metrics 2025-08-31 11:45:01 - pico-train - INFO - ├── Loss: 4.7822 2025-08-31 11:45:01 - pico-train - INFO - ├── Learning Rate: 3.07e-05 2025-08-31 11:45:01 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-31 11:45:57 - pico-train - INFO - Step 75000 -- 🔄 Training Metrics 2025-08-31 11:45:57 - pico-train - INFO - ├── Loss: 4.7939 2025-08-31 11:45:57 - pico-train - INFO - ├── Learning Rate: 3.04e-05 2025-08-31 11:45:57 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-31 11:46:51 - pico-train - INFO - Step 75100 -- 🔄 Training Metrics 2025-08-31 11:46:51 - pico-train - INFO - ├── Loss: 4.7926 2025-08-31 11:46:51 - pico-train - INFO - ├── Learning Rate: 3.02e-05 2025-08-31 11:46:51 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-31 11:47:47 - pico-train - INFO - Step 75200 -- 🔄 Training Metrics 2025-08-31 11:47:47 - pico-train - INFO - ├── Loss: 4.7906 2025-08-31 11:47:47 - pico-train - INFO - ├── Learning Rate: 3.00e-05 2025-08-31 11:47:47 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-31 11:48:42 - pico-train - INFO - Step 75300 -- 🔄 Training Metrics 2025-08-31 11:48:42 - pico-train - INFO - ├── Loss: 4.8084 2025-08-31 11:48:42 - pico-train - INFO - ├── Learning Rate: 2.97e-05 2025-08-31 11:48:42 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-31 11:49:37 - pico-train - INFO - Step 75400 -- 🔄 Training Metrics 2025-08-31 11:49:37 - pico-train - INFO - ├── Loss: 4.7913 2025-08-31 11:49:37 - pico-train - INFO - ├── Learning Rate: 2.95e-05 2025-08-31 11:49:37 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-31 11:50:32 - pico-train - INFO - Step 75500 -- 🔄 Training Metrics 2025-08-31 11:50:32 - pico-train - INFO - ├── Loss: 4.7965 2025-08-31 11:50:32 - pico-train - INFO - ├── Learning Rate: 2.93e-05 2025-08-31 11:50:32 - 
pico-train - INFO - └── Inf/NaN count: 0 2025-08-31 11:51:28 - pico-train - INFO - Step 75600 -- 🔄 Training Metrics 2025-08-31 11:51:28 - pico-train - INFO - ├── Loss: 4.7760 2025-08-31 11:51:28 - pico-train - INFO - ├── Learning Rate: 2.91e-05 2025-08-31 11:51:28 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-31 11:52:23 - pico-train - INFO - Step 75700 -- 🔄 Training Metrics 2025-08-31 11:52:23 - pico-train - INFO - ├── Loss: 4.7892 2025-08-31 11:52:23 - pico-train - INFO - ├── Learning Rate: 2.88e-05 2025-08-31 11:52:23 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-31 11:53:19 - pico-train - INFO - Step 75800 -- 🔄 Training Metrics 2025-08-31 11:53:19 - pico-train - INFO - ├── Loss: 4.7945 2025-08-31 11:53:19 - pico-train - INFO - ├── Learning Rate: 2.86e-05 2025-08-31 11:53:19 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-31 11:54:14 - pico-train - INFO - Step 75900 -- 🔄 Training Metrics 2025-08-31 11:54:14 - pico-train - INFO - ├── Loss: 4.7981 2025-08-31 11:54:14 - pico-train - INFO - ├── Learning Rate: 2.84e-05 2025-08-31 11:54:14 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-31 11:55:08 - pico-train - INFO - Step 76000 -- 💾 Saving Checkpoint 2025-08-31 11:57:12 - pico-train - INFO - Step 76000 -- 📊 Evaluation Results 2025-08-31 11:57:12 - pico-train - INFO - └── paloma: inf 2025-08-31 11:57:13 - pico-train - INFO - Step 76000 -- 🔄 Training Metrics 2025-08-31 11:57:13 - pico-train - INFO - ├── Loss: 4.7743 2025-08-31 11:57:13 - pico-train - INFO - ├── Learning Rate: 2.82e-05 2025-08-31 11:57:13 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-31 11:57:13 - pico-train - INFO - Step 76000 -- 📈 Saving Learning Dynamics 2025-08-31 11:58:09 - pico-train - INFO - Step 76100 -- 🔄 Training Metrics 2025-08-31 11:58:09 - pico-train - INFO - ├── Loss: 4.7833 2025-08-31 11:58:09 - pico-train - INFO - ├── Learning Rate: 2.79e-05 2025-08-31 11:58:09 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-31 11:59:04 - pico-train - INFO - Step 76200 -- 🔄 
Training Metrics 2025-08-31 11:59:04 - pico-train - INFO - ├── Loss: 4.8036 2025-08-31 11:59:04 - pico-train - INFO - ├── Learning Rate: 2.77e-05 2025-08-31 11:59:04 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-31 11:59:58 - pico-train - INFO - Step 76300 -- 🔄 Training Metrics 2025-08-31 11:59:58 - pico-train - INFO - ├── Loss: 4.8020 2025-08-31 11:59:58 - pico-train - INFO - ├── Learning Rate: 2.75e-05 2025-08-31 11:59:58 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-31 12:00:52 - pico-train - INFO - Step 76400 -- 🔄 Training Metrics 2025-08-31 12:00:52 - pico-train - INFO - ├── Loss: 4.7706 2025-08-31 12:00:52 - pico-train - INFO - ├── Learning Rate: 2.73e-05 2025-08-31 12:00:52 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-31 12:01:46 - pico-train - INFO - Step 76500 -- 🔄 Training Metrics 2025-08-31 12:01:46 - pico-train - INFO - ├── Loss: 4.7852 2025-08-31 12:01:46 - pico-train - INFO - ├── Learning Rate: 2.71e-05 2025-08-31 12:01:46 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-31 12:02:40 - pico-train - INFO - Step 76600 -- 🔄 Training Metrics 2025-08-31 12:02:40 - pico-train - INFO - ├── Loss: 4.8081 2025-08-31 12:02:40 - pico-train - INFO - ├── Learning Rate: 2.68e-05 2025-08-31 12:02:40 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-31 12:03:35 - pico-train - INFO - Step 76700 -- 🔄 Training Metrics 2025-08-31 12:03:35 - pico-train - INFO - ├── Loss: 4.7811 2025-08-31 12:03:35 - pico-train - INFO - ├── Learning Rate: 2.66e-05 2025-08-31 12:03:35 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-31 12:04:30 - pico-train - INFO - Step 76800 -- 🔄 Training Metrics 2025-08-31 12:04:30 - pico-train - INFO - ├── Loss: 4.7783 2025-08-31 12:04:30 - pico-train - INFO - ├── Learning Rate: 2.64e-05 2025-08-31 12:04:30 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-31 12:05:24 - pico-train - INFO - Step 76900 -- 🔄 Training Metrics 2025-08-31 12:05:24 - pico-train - INFO - ├── Loss: 4.7936 2025-08-31 12:05:24 - pico-train - INFO - ├── Learning 
Rate: 2.62e-05 2025-08-31 12:05:24 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-31 12:06:18 - pico-train - INFO - Step 77000 -- 🔄 Training Metrics 2025-08-31 12:06:18 - pico-train - INFO - ├── Loss: 4.7972 2025-08-31 12:06:18 - pico-train - INFO - ├── Learning Rate: 2.60e-05 2025-08-31 12:06:18 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-31 12:07:13 - pico-train - INFO - Step 77100 -- 🔄 Training Metrics 2025-08-31 12:07:13 - pico-train - INFO - ├── Loss: 4.8170 2025-08-31 12:07:13 - pico-train - INFO - ├── Learning Rate: 2.58e-05 2025-08-31 12:07:13 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-31 12:08:07 - pico-train - INFO - Step 77200 -- 🔄 Training Metrics 2025-08-31 12:08:07 - pico-train - INFO - ├── Loss: 4.7975 2025-08-31 12:08:07 - pico-train - INFO - ├── Learning Rate: 2.55e-05 2025-08-31 12:08:07 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-31 12:09:12 - pico-train - INFO - Step 77300 -- 🔄 Training Metrics 2025-08-31 12:09:12 - pico-train - INFO - ├── Loss: 4.7712 2025-08-31 12:09:12 - pico-train - INFO - ├── Learning Rate: 2.53e-05 2025-08-31 12:09:12 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-31 12:10:06 - pico-train - INFO - Step 77400 -- 🔄 Training Metrics 2025-08-31 12:10:06 - pico-train - INFO - ├── Loss: 4.7869 2025-08-31 12:10:06 - pico-train - INFO - ├── Learning Rate: 2.51e-05 2025-08-31 12:10:06 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-31 12:11:01 - pico-train - INFO - Step 77500 -- 🔄 Training Metrics 2025-08-31 12:11:01 - pico-train - INFO - ├── Loss: 4.8086 2025-08-31 12:11:01 - pico-train - INFO - ├── Learning Rate: 2.49e-05 2025-08-31 12:11:01 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-31 12:11:55 - pico-train - INFO - Step 77600 -- 🔄 Training Metrics 2025-08-31 12:11:55 - pico-train - INFO - ├── Loss: 4.7980 2025-08-31 12:11:55 - pico-train - INFO - ├── Learning Rate: 2.47e-05 2025-08-31 12:11:55 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-31 12:12:50 - pico-train - INFO - Step 77700 
-- 🔄 Training Metrics 2025-08-31 12:12:50 - pico-train - INFO - ├── Loss: 4.7845 2025-08-31 12:12:50 - pico-train - INFO - ├── Learning Rate: 2.45e-05 2025-08-31 12:12:50 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-31 12:13:44 - pico-train - INFO - Step 77800 -- 🔄 Training Metrics 2025-08-31 12:13:44 - pico-train - INFO - ├── Loss: 4.7779 2025-08-31 12:13:44 - pico-train - INFO - ├── Learning Rate: 2.43e-05 2025-08-31 12:13:44 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-31 12:14:38 - pico-train - INFO - Step 77900 -- 🔄 Training Metrics 2025-08-31 12:14:38 - pico-train - INFO - ├── Loss: 4.7901 2025-08-31 12:14:38 - pico-train - INFO - ├── Learning Rate: 2.41e-05 2025-08-31 12:14:38 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-31 12:15:32 - pico-train - INFO - Step 78000 -- 💾 Saving Checkpoint 2025-08-31 12:17:30 - pico-train - INFO - Step 78000 -- 📊 Evaluation Results 2025-08-31 12:17:30 - pico-train - INFO - └── paloma: inf 2025-08-31 12:17:31 - pico-train - INFO - Step 78000 -- 🔄 Training Metrics 2025-08-31 12:17:31 - pico-train - INFO - ├── Loss: 4.7973 2025-08-31 12:17:31 - pico-train - INFO - ├── Learning Rate: 2.39e-05 2025-08-31 12:17:31 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-31 12:17:31 - pico-train - INFO - Step 78000 -- 📈 Saving Learning Dynamics 2025-08-31 12:18:29 - pico-train - INFO - Step 78100 -- 🔄 Training Metrics 2025-08-31 12:18:29 - pico-train - INFO - ├── Loss: 4.7623 2025-08-31 12:18:29 - pico-train - INFO - ├── Learning Rate: 2.36e-05 2025-08-31 12:18:29 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-31 12:19:22 - pico-train - INFO - Step 78200 -- 🔄 Training Metrics 2025-08-31 12:19:22 - pico-train - INFO - ├── Loss: 4.8050 2025-08-31 12:19:22 - pico-train - INFO - ├── Learning Rate: 2.34e-05 2025-08-31 12:19:22 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-31 12:20:17 - pico-train - INFO - Step 78300 -- 🔄 Training Metrics 2025-08-31 12:20:17 - pico-train - INFO - ├── Loss: 4.7813 2025-08-31 12:20:17 - 
pico-train - INFO - ├── Learning Rate: 2.32e-05 2025-08-31 12:20:17 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-31 12:21:11 - pico-train - INFO - Step 78400 -- 🔄 Training Metrics 2025-08-31 12:21:11 - pico-train - INFO - ├── Loss: 4.8007 2025-08-31 12:21:11 - pico-train - INFO - ├── Learning Rate: 2.30e-05 2025-08-31 12:21:11 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-31 12:22:05 - pico-train - INFO - Step 78500 -- 🔄 Training Metrics 2025-08-31 12:22:05 - pico-train - INFO - ├── Loss: 4.8198 2025-08-31 12:22:05 - pico-train - INFO - ├── Learning Rate: 2.28e-05 2025-08-31 12:22:05 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-31 12:23:00 - pico-train - INFO - Step 78600 -- 🔄 Training Metrics 2025-08-31 12:23:00 - pico-train - INFO - ├── Loss: 4.7736 2025-08-31 12:23:00 - pico-train - INFO - ├── Learning Rate: 2.26e-05 2025-08-31 12:23:00 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-31 12:23:53 - pico-train - INFO - Step 78700 -- 🔄 Training Metrics 2025-08-31 12:23:53 - pico-train - INFO - ├── Loss: 4.7748 2025-08-31 12:23:53 - pico-train - INFO - ├── Learning Rate: 2.24e-05 2025-08-31 12:23:53 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-31 12:24:48 - pico-train - INFO - Step 78800 -- 🔄 Training Metrics 2025-08-31 12:24:48 - pico-train - INFO - ├── Loss: 4.7643 2025-08-31 12:24:48 - pico-train - INFO - ├── Learning Rate: 2.22e-05 2025-08-31 12:24:48 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-31 12:25:42 - pico-train - INFO - Step 78900 -- 🔄 Training Metrics 2025-08-31 12:25:42 - pico-train - INFO - ├── Loss: 4.7825 2025-08-31 12:25:42 - pico-train - INFO - ├── Learning Rate: 2.20e-05 2025-08-31 12:25:42 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-31 12:26:36 - pico-train - INFO - Step 79000 -- 🔄 Training Metrics 2025-08-31 12:26:36 - pico-train - INFO - ├── Loss: 4.7849 2025-08-31 12:26:36 - pico-train - INFO - ├── Learning Rate: 2.18e-05 2025-08-31 12:26:36 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-31 12:27:30 
- pico-train - INFO - Step 79100 -- 🔄 Training Metrics 2025-08-31 12:27:30 - pico-train - INFO - ├── Loss: 4.7719 2025-08-31 12:27:30 - pico-train - INFO - ├── Learning Rate: 2.16e-05 2025-08-31 12:27:30 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-31 12:28:24 - pico-train - INFO - Step 79200 -- 🔄 Training Metrics 2025-08-31 12:28:24 - pico-train - INFO - ├── Loss: 4.7833 2025-08-31 12:28:24 - pico-train - INFO - ├── Learning Rate: 2.14e-05 2025-08-31 12:28:24 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-31 12:29:18 - pico-train - INFO - Step 79300 -- 🔄 Training Metrics 2025-08-31 12:29:18 - pico-train - INFO - ├── Loss: 4.8105 2025-08-31 12:29:18 - pico-train - INFO - ├── Learning Rate: 2.12e-05 2025-08-31 12:29:18 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-31 12:30:14 - pico-train - INFO - Step 79400 -- 🔄 Training Metrics 2025-08-31 12:30:14 - pico-train - INFO - ├── Loss: 4.7941 2025-08-31 12:30:14 - pico-train - INFO - ├── Learning Rate: 2.10e-05 2025-08-31 12:30:14 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-31 12:31:08 - pico-train - INFO - Step 79500 -- 🔄 Training Metrics 2025-08-31 12:31:08 - pico-train - INFO - ├── Loss: 4.7789 2025-08-31 12:31:08 - pico-train - INFO - ├── Learning Rate: 2.08e-05 2025-08-31 12:31:08 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-31 12:32:02 - pico-train - INFO - Step 79600 -- 🔄 Training Metrics 2025-08-31 12:32:02 - pico-train - INFO - ├── Loss: 4.8027 2025-08-31 12:32:02 - pico-train - INFO - ├── Learning Rate: 2.06e-05 2025-08-31 12:32:02 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-31 12:32:56 - pico-train - INFO - Step 79700 -- 🔄 Training Metrics 2025-08-31 12:32:56 - pico-train - INFO - ├── Loss: 4.7815 2025-08-31 12:32:56 - pico-train - INFO - ├── Learning Rate: 2.04e-05 2025-08-31 12:32:56 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-31 12:33:50 - pico-train - INFO - Step 79800 -- 🔄 Training Metrics 2025-08-31 12:33:50 - pico-train - INFO - ├── Loss: 4.7826 2025-08-31 
12:33:50 - pico-train - INFO - ├── Learning Rate: 2.02e-05 2025-08-31 12:33:50 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-31 12:34:44 - pico-train - INFO - Step 79900 -- 🔄 Training Metrics 2025-08-31 12:34:44 - pico-train - INFO - ├── Loss: 4.7777 2025-08-31 12:34:44 - pico-train - INFO - ├── Learning Rate: 2.01e-05 2025-08-31 12:34:44 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-31 12:35:38 - pico-train - INFO - Step 80000 -- 💾 Saving Checkpoint 2025-08-31 12:37:26 - pico-train - INFO - Step 80000 -- 📊 Evaluation Results 2025-08-31 12:37:26 - pico-train - INFO - └── paloma: inf 2025-08-31 12:37:27 - pico-train - INFO - Step 80000 -- 🔄 Training Metrics 2025-08-31 12:37:27 - pico-train - INFO - ├── Loss: 4.7797 2025-08-31 12:37:27 - pico-train - INFO - ├── Learning Rate: 2.00e-05 2025-08-31 12:37:27 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-31 12:37:27 - pico-train - INFO - Step 80000 -- 📈 Saving Learning Dynamics 2025-08-31 12:38:23 - pico-train - INFO - Step 80100 -- 🔄 Training Metrics 2025-08-31 12:38:23 - pico-train - INFO - ├── Loss: 4.8004 2025-08-31 12:38:23 - pico-train - INFO - ├── Learning Rate: 2.00e-05 2025-08-31 12:38:23 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-31 12:39:19 - pico-train - INFO - Step 80200 -- 🔄 Training Metrics 2025-08-31 12:39:19 - pico-train - INFO - ├── Loss: 4.8047 2025-08-31 12:39:19 - pico-train - INFO - ├── Learning Rate: 2.00e-05 2025-08-31 12:39:19 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-31 12:40:14 - pico-train - INFO - Step 80300 -- 🔄 Training Metrics 2025-08-31 12:40:14 - pico-train - INFO - ├── Loss: 4.7894 2025-08-31 12:40:14 - pico-train - INFO - ├── Learning Rate: 2.00e-05 2025-08-31 12:40:14 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-31 12:41:10 - pico-train - INFO - Step 80400 -- 🔄 Training Metrics 2025-08-31 12:41:10 - pico-train - INFO - ├── Loss: 4.7800 2025-08-31 12:41:10 - pico-train - INFO - ├── Learning Rate: 2.00e-05 2025-08-31 12:41:10 - pico-train - INFO - 
└── Inf/NaN count: 0 2025-08-31 12:42:05 - pico-train - INFO - Step 80500 -- 🔄 Training Metrics 2025-08-31 12:42:05 - pico-train - INFO - ├── Loss: 4.7781 2025-08-31 12:42:05 - pico-train - INFO - ├── Learning Rate: 2.00e-05 2025-08-31 12:42:05 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-31 12:43:00 - pico-train - INFO - Step 80600 -- 🔄 Training Metrics 2025-08-31 12:43:00 - pico-train - INFO - ├── Loss: 4.7872 2025-08-31 12:43:00 - pico-train - INFO - ├── Learning Rate: 2.00e-05 2025-08-31 12:43:00 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-31 12:43:56 - pico-train - INFO - Step 80700 -- 🔄 Training Metrics 2025-08-31 12:43:56 - pico-train - INFO - ├── Loss: 4.7820 2025-08-31 12:43:56 - pico-train - INFO - ├── Learning Rate: 2.00e-05 2025-08-31 12:43:56 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-31 12:44:52 - pico-train - INFO - Step 80800 -- 🔄 Training Metrics 2025-08-31 12:44:52 - pico-train - INFO - ├── Loss: 4.7758 2025-08-31 12:44:52 - pico-train - INFO - ├── Learning Rate: 2.00e-05 2025-08-31 12:44:52 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-31 12:45:47 - pico-train - INFO - Step 80900 -- 🔄 Training Metrics 2025-08-31 12:45:47 - pico-train - INFO - ├── Loss: 4.7693 2025-08-31 12:45:47 - pico-train - INFO - ├── Learning Rate: 2.00e-05 2025-08-31 12:45:47 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-31 12:46:41 - pico-train - INFO - Step 81000 -- 🔄 Training Metrics 2025-08-31 12:46:41 - pico-train - INFO - ├── Loss: 4.8022 2025-08-31 12:46:41 - pico-train - INFO - ├── Learning Rate: 2.00e-05 2025-08-31 12:46:41 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-31 12:47:37 - pico-train - INFO - Step 81100 -- 🔄 Training Metrics 2025-08-31 12:47:37 - pico-train - INFO - ├── Loss: 4.7741 2025-08-31 12:47:37 - pico-train - INFO - ├── Learning Rate: 2.00e-05 2025-08-31 12:47:37 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-31 12:48:32 - pico-train - INFO - Step 81200 -- 🔄 Training Metrics 2025-08-31 12:48:32 - pico-train - 
INFO - ├── Loss: 4.7801 2025-08-31 12:48:32 - pico-train - INFO - ├── Learning Rate: 2.00e-05 2025-08-31 12:48:32 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-31 12:49:27 - pico-train - INFO - Step 81300 -- 🔄 Training Metrics 2025-08-31 12:49:27 - pico-train - INFO - ├── Loss: 4.7930 2025-08-31 12:49:27 - pico-train - INFO - ├── Learning Rate: 2.00e-05 2025-08-31 12:49:27 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-31 12:50:22 - pico-train - INFO - Step 81400 -- 🔄 Training Metrics 2025-08-31 12:50:22 - pico-train - INFO - ├── Loss: 4.7744 2025-08-31 12:50:22 - pico-train - INFO - ├── Learning Rate: 2.00e-05 2025-08-31 12:50:22 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-31 12:51:17 - pico-train - INFO - Step 81500 -- 🔄 Training Metrics 2025-08-31 12:51:17 - pico-train - INFO - ├── Loss: 4.7913 2025-08-31 12:51:17 - pico-train - INFO - ├── Learning Rate: 2.00e-05 2025-08-31 12:51:17 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-31 12:52:12 - pico-train - INFO - Step 81600 -- 🔄 Training Metrics 2025-08-31 12:52:12 - pico-train - INFO - ├── Loss: 4.7747 2025-08-31 12:52:12 - pico-train - INFO - ├── Learning Rate: 2.00e-05 2025-08-31 12:52:12 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-31 12:53:08 - pico-train - INFO - Step 81700 -- 🔄 Training Metrics 2025-08-31 12:53:08 - pico-train - INFO - ├── Loss: 4.7951 2025-08-31 12:53:08 - pico-train - INFO - ├── Learning Rate: 2.00e-05 2025-08-31 12:53:08 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-31 12:54:03 - pico-train - INFO - Step 81800 -- 🔄 Training Metrics 2025-08-31 12:54:03 - pico-train - INFO - ├── Loss: 4.7843 2025-08-31 12:54:03 - pico-train - INFO - ├── Learning Rate: 2.00e-05 2025-08-31 12:54:03 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-31 12:54:58 - pico-train - INFO - Step 81900 -- 🔄 Training Metrics 2025-08-31 12:54:58 - pico-train - INFO - ├── Loss: 4.7868 2025-08-31 12:54:58 - pico-train - INFO - ├── Learning Rate: 2.00e-05 2025-08-31 12:54:58 - pico-train - 
INFO - └── Inf/NaN count: 0 2025-08-31 12:55:53 - pico-train - INFO - Step 82000 -- 💾 Saving Checkpoint 2025-08-31 12:57:44 - pico-train - INFO - Step 82000 -- 📊 Evaluation Results 2025-08-31 12:57:44 - pico-train - INFO - └── paloma: inf 2025-08-31 12:57:44 - pico-train - INFO - Step 82000 -- 🔄 Training Metrics 2025-08-31 12:57:44 - pico-train - INFO - ├── Loss: 4.7822 2025-08-31 12:57:44 - pico-train - INFO - ├── Learning Rate: 2.00e-05 2025-08-31 12:57:44 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-31 12:57:44 - pico-train - INFO - Step 82000 -- 📈 Saving Learning Dynamics 2025-08-31 12:58:41 - pico-train - INFO - Step 82100 -- 🔄 Training Metrics 2025-08-31 12:58:41 - pico-train - INFO - ├── Loss: 4.7911 2025-08-31 12:58:41 - pico-train - INFO - ├── Learning Rate: 2.00e-05 2025-08-31 12:58:41 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-31 12:59:35 - pico-train - INFO - Step 82200 -- 🔄 Training Metrics 2025-08-31 12:59:35 - pico-train - INFO - ├── Loss: 4.7880 2025-08-31 12:59:35 - pico-train - INFO - ├── Learning Rate: 2.00e-05 2025-08-31 12:59:35 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-31 13:00:29 - pico-train - INFO - Step 82300 -- 🔄 Training Metrics 2025-08-31 13:00:29 - pico-train - INFO - ├── Loss: 4.7872 2025-08-31 13:00:29 - pico-train - INFO - ├── Learning Rate: 2.00e-05 2025-08-31 13:00:29 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-31 13:01:23 - pico-train - INFO - Step 82400 -- 🔄 Training Metrics 2025-08-31 13:01:23 - pico-train - INFO - ├── Loss: 4.7660 2025-08-31 13:01:23 - pico-train - INFO - ├── Learning Rate: 2.00e-05 2025-08-31 13:01:23 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-31 13:02:17 - pico-train - INFO - Step 82500 -- 🔄 Training Metrics 2025-08-31 13:02:17 - pico-train - INFO - ├── Loss: 4.7859 2025-08-31 13:02:17 - pico-train - INFO - ├── Learning Rate: 2.00e-05 2025-08-31 13:02:17 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-31 13:03:11 - pico-train - INFO - Step 82600 -- 🔄 Training Metrics 
2025-08-31 13:03:11 - pico-train - INFO - ├── Loss: 4.8142 2025-08-31 13:03:11 - pico-train - INFO - ├── Learning Rate: 2.00e-05 2025-08-31 13:03:11 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-31 13:04:06 - pico-train - INFO - Step 82700 -- 🔄 Training Metrics 2025-08-31 13:04:06 - pico-train - INFO - ├── Loss: 4.7781 2025-08-31 13:04:06 - pico-train - INFO - ├── Learning Rate: 2.00e-05 2025-08-31 13:04:06 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-31 13:05:00 - pico-train - INFO - Step 82800 -- 🔄 Training Metrics 2025-08-31 13:05:00 - pico-train - INFO - ├── Loss: 4.7692 2025-08-31 13:05:00 - pico-train - INFO - ├── Learning Rate: 2.00e-05 2025-08-31 13:05:00 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-31 13:05:53 - pico-train - INFO - Step 82900 -- 🔄 Training Metrics 2025-08-31 13:05:53 - pico-train - INFO - ├── Loss: 4.7854 2025-08-31 13:05:53 - pico-train - INFO - ├── Learning Rate: 2.00e-05 2025-08-31 13:05:53 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-31 13:06:48 - pico-train - INFO - Step 83000 -- 🔄 Training Metrics 2025-08-31 13:06:48 - pico-train - INFO - ├── Loss: 4.7918 2025-08-31 13:06:48 - pico-train - INFO - ├── Learning Rate: 2.00e-05 2025-08-31 13:06:48 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-31 13:07:42 - pico-train - INFO - Step 83100 -- 🔄 Training Metrics 2025-08-31 13:07:42 - pico-train - INFO - ├── Loss: 4.7694 2025-08-31 13:07:42 - pico-train - INFO - ├── Learning Rate: 2.00e-05 2025-08-31 13:07:42 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-31 13:08:37 - pico-train - INFO - Step 83200 -- 🔄 Training Metrics 2025-08-31 13:08:37 - pico-train - INFO - ├── Loss: 4.7765 2025-08-31 13:08:37 - pico-train - INFO - ├── Learning Rate: 2.00e-05 2025-08-31 13:08:37 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-31 13:09:31 - pico-train - INFO - Step 83300 -- 🔄 Training Metrics 2025-08-31 13:09:31 - pico-train - INFO - ├── Loss: 4.7761 2025-08-31 13:09:31 - pico-train - INFO - ├── Learning Rate: 2.00e-05 
2025-08-31 13:09:31 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-31 13:10:25 - pico-train - INFO - Step 83400 -- 🔄 Training Metrics 2025-08-31 13:10:25 - pico-train - INFO - ├── Loss: 4.7765 2025-08-31 13:10:25 - pico-train - INFO - ├── Learning Rate: 2.00e-05 2025-08-31 13:10:25 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-31 13:11:20 - pico-train - INFO - Step 83500 -- 🔄 Training Metrics 2025-08-31 13:11:20 - pico-train - INFO - ├── Loss: 4.7859 2025-08-31 13:11:20 - pico-train - INFO - ├── Learning Rate: 2.00e-05 2025-08-31 13:11:20 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-31 13:12:14 - pico-train - INFO - Step 83600 -- 🔄 Training Metrics 2025-08-31 13:12:14 - pico-train - INFO - ├── Loss: 4.7846 2025-08-31 13:12:14 - pico-train - INFO - ├── Learning Rate: 2.00e-05 2025-08-31 13:12:14 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-31 13:13:08 - pico-train - INFO - Step 83700 -- 🔄 Training Metrics 2025-08-31 13:13:08 - pico-train - INFO - ├── Loss: 4.7743 2025-08-31 13:13:08 - pico-train - INFO - ├── Learning Rate: 2.00e-05 2025-08-31 13:13:08 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-31 13:14:02 - pico-train - INFO - Step 83800 -- 🔄 Training Metrics 2025-08-31 13:14:02 - pico-train - INFO - ├── Loss: 4.7898 2025-08-31 13:14:02 - pico-train - INFO - ├── Learning Rate: 2.00e-05 2025-08-31 13:14:02 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-31 13:14:56 - pico-train - INFO - Step 83900 -- 🔄 Training Metrics 2025-08-31 13:14:56 - pico-train - INFO - ├── Loss: 4.7866 2025-08-31 13:14:56 - pico-train - INFO - ├── Learning Rate: 2.00e-05 2025-08-31 13:14:56 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-31 13:15:50 - pico-train - INFO - Step 84000 -- 💾 Saving Checkpoint 2025-08-31 13:17:40 - pico-train - INFO - Step 84000 -- 📊 Evaluation Results 2025-08-31 13:17:40 - pico-train - INFO - └── paloma: inf 2025-08-31 13:17:40 - pico-train - INFO - Step 84000 -- 🔄 Training Metrics 2025-08-31 13:17:40 - pico-train - INFO - ├── 
Loss: 4.7617 2025-08-31 13:17:40 - pico-train - INFO - ├── Learning Rate: 2.00e-05 2025-08-31 13:17:40 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-31 13:17:40 - pico-train - INFO - Step 84000 -- 📈 Saving Learning Dynamics 2025-08-31 13:18:37 - pico-train - INFO - Step 84100 -- 🔄 Training Metrics 2025-08-31 13:18:37 - pico-train - INFO - ├── Loss: 4.7942 2025-08-31 13:18:37 - pico-train - INFO - ├── Learning Rate: 2.00e-05 2025-08-31 13:18:37 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-31 13:19:31 - pico-train - INFO - Step 84200 -- 🔄 Training Metrics 2025-08-31 13:19:31 - pico-train - INFO - ├── Loss: 4.7938 2025-08-31 13:19:31 - pico-train - INFO - ├── Learning Rate: 2.00e-05 2025-08-31 13:19:31 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-31 13:20:25 - pico-train - INFO - Step 84300 -- 🔄 Training Metrics 2025-08-31 13:20:25 - pico-train - INFO - ├── Loss: 4.7635 2025-08-31 13:20:25 - pico-train - INFO - ├── Learning Rate: 2.00e-05 2025-08-31 13:20:25 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-31 13:21:19 - pico-train - INFO - Step 84400 -- 🔄 Training Metrics 2025-08-31 13:21:19 - pico-train - INFO - ├── Loss: 4.7841 2025-08-31 13:21:19 - pico-train - INFO - ├── Learning Rate: 2.00e-05 2025-08-31 13:21:19 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-31 13:22:14 - pico-train - INFO - Step 84500 -- 🔄 Training Metrics 2025-08-31 13:22:14 - pico-train - INFO - ├── Loss: 4.7762 2025-08-31 13:22:14 - pico-train - INFO - ├── Learning Rate: 2.00e-05 2025-08-31 13:22:14 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-31 13:23:08 - pico-train - INFO - Step 84600 -- 🔄 Training Metrics 2025-08-31 13:23:08 - pico-train - INFO - ├── Loss: 4.7988 2025-08-31 13:23:08 - pico-train - INFO - ├── Learning Rate: 2.00e-05 2025-08-31 13:23:08 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-31 13:24:02 - pico-train - INFO - Step 84700 -- 🔄 Training Metrics 2025-08-31 13:24:02 - pico-train - INFO - ├── Loss: 4.7888 2025-08-31 13:24:02 - pico-train - 
INFO - ├── Learning Rate: 2.00e-05 2025-08-31 13:24:02 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-31 13:24:57 - pico-train - INFO - Step 84800 -- 🔄 Training Metrics 2025-08-31 13:24:57 - pico-train - INFO - ├── Loss: 4.7819 2025-08-31 13:24:57 - pico-train - INFO - ├── Learning Rate: 2.00e-05 2025-08-31 13:24:57 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-31 13:25:51 - pico-train - INFO - Step 84900 -- 🔄 Training Metrics 2025-08-31 13:25:51 - pico-train - INFO - ├── Loss: 4.7949 2025-08-31 13:25:51 - pico-train - INFO - ├── Learning Rate: 2.00e-05 2025-08-31 13:25:51 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-31 13:26:45 - pico-train - INFO - Step 85000 -- 🔄 Training Metrics 2025-08-31 13:26:45 - pico-train - INFO - ├── Loss: 4.8050 2025-08-31 13:26:45 - pico-train - INFO - ├── Learning Rate: 2.00e-05 2025-08-31 13:26:45 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-31 13:27:39 - pico-train - INFO - Step 85100 -- 🔄 Training Metrics 2025-08-31 13:27:39 - pico-train - INFO - ├── Loss: 4.7981 2025-08-31 13:27:39 - pico-train - INFO - ├── Learning Rate: 2.00e-05 2025-08-31 13:27:39 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-31 13:28:33 - pico-train - INFO - Step 85200 -- 🔄 Training Metrics 2025-08-31 13:28:33 - pico-train - INFO - ├── Loss: 4.8045 2025-08-31 13:28:33 - pico-train - INFO - ├── Learning Rate: 2.00e-05 2025-08-31 13:28:33 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-31 13:29:27 - pico-train - INFO - Step 85300 -- 🔄 Training Metrics 2025-08-31 13:29:27 - pico-train - INFO - ├── Loss: 4.7893 2025-08-31 13:29:27 - pico-train - INFO - ├── Learning Rate: 2.00e-05 2025-08-31 13:29:27 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-31 13:30:21 - pico-train - INFO - Step 85400 -- 🔄 Training Metrics 2025-08-31 13:30:21 - pico-train - INFO - ├── Loss: 4.7978 2025-08-31 13:30:21 - pico-train - INFO - ├── Learning Rate: 2.00e-05 2025-08-31 13:30:21 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-31 13:31:16 - pico-train 
- INFO - Step 85500 -- 🔄 Training Metrics 2025-08-31 13:31:16 - pico-train - INFO - ├── Loss: 4.7824 2025-08-31 13:31:16 - pico-train - INFO - ├── Learning Rate: 2.00e-05 2025-08-31 13:31:16 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-31 13:32:10 - pico-train - INFO - Step 85600 -- 🔄 Training Metrics 2025-08-31 13:32:10 - pico-train - INFO - ├── Loss: 4.7912 2025-08-31 13:32:10 - pico-train - INFO - ├── Learning Rate: 2.00e-05 2025-08-31 13:32:10 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-31 13:33:04 - pico-train - INFO - Step 85700 -- 🔄 Training Metrics 2025-08-31 13:33:04 - pico-train - INFO - ├── Loss: 4.7704 2025-08-31 13:33:04 - pico-train - INFO - ├── Learning Rate: 2.00e-05 2025-08-31 13:33:04 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-31 13:33:59 - pico-train - INFO - Step 85800 -- 🔄 Training Metrics 2025-08-31 13:33:59 - pico-train - INFO - ├── Loss: 4.7781 2025-08-31 13:33:59 - pico-train - INFO - ├── Learning Rate: 2.00e-05 2025-08-31 13:33:59 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-31 13:34:53 - pico-train - INFO - Step 85900 -- 🔄 Training Metrics 2025-08-31 13:34:53 - pico-train - INFO - ├── Loss: 4.8134 2025-08-31 13:34:53 - pico-train - INFO - ├── Learning Rate: 2.00e-05 2025-08-31 13:34:53 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-31 13:35:46 - pico-train - INFO - Step 86000 -- 💾 Saving Checkpoint 2025-08-31 13:37:36 - pico-train - INFO - Step 86000 -- 📊 Evaluation Results 2025-08-31 13:37:36 - pico-train - INFO - └── paloma: inf 2025-08-31 13:37:36 - pico-train - INFO - Step 86000 -- 🔄 Training Metrics 2025-08-31 13:37:36 - pico-train - INFO - ├── Loss: 4.7769 2025-08-31 13:37:36 - pico-train - INFO - ├── Learning Rate: 2.00e-05 2025-08-31 13:37:36 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-31 13:37:36 - pico-train - INFO - Step 86000 -- 📈 Saving Learning Dynamics 2025-08-31 13:38:33 - pico-train - INFO - Step 86100 -- 🔄 Training Metrics 2025-08-31 13:38:33 - pico-train - INFO - ├── Loss: 4.7967 
2025-08-31 13:38:33 - pico-train - INFO - ├── Learning Rate: 2.00e-05 2025-08-31 13:38:33 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-31 13:39:28 - pico-train - INFO - Step 86200 -- 🔄 Training Metrics 2025-08-31 13:39:28 - pico-train - INFO - ├── Loss: 4.7857 2025-08-31 13:39:28 - pico-train - INFO - ├── Learning Rate: 2.00e-05 2025-08-31 13:39:28 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-31 13:40:22 - pico-train - INFO - Step 86300 -- 🔄 Training Metrics 2025-08-31 13:40:22 - pico-train - INFO - ├── Loss: 4.7666 2025-08-31 13:40:22 - pico-train - INFO - ├── Learning Rate: 2.00e-05 2025-08-31 13:40:22 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-31 13:41:16 - pico-train - INFO - Step 86400 -- 🔄 Training Metrics 2025-08-31 13:41:16 - pico-train - INFO - ├── Loss: 4.8334 2025-08-31 13:41:16 - pico-train - INFO - ├── Learning Rate: 2.00e-05 2025-08-31 13:41:16 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-31 13:42:10 - pico-train - INFO - Step 86500 -- 🔄 Training Metrics 2025-08-31 13:42:10 - pico-train - INFO - ├── Loss: 4.7574 2025-08-31 13:42:10 - pico-train - INFO - ├── Learning Rate: 2.00e-05 2025-08-31 13:42:10 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-31 13:43:04 - pico-train - INFO - Step 86600 -- 🔄 Training Metrics 2025-08-31 13:43:04 - pico-train - INFO - ├── Loss: 4.7646 2025-08-31 13:43:04 - pico-train - INFO - ├── Learning Rate: 2.00e-05 2025-08-31 13:43:04 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-31 13:43:59 - pico-train - INFO - Step 86700 -- 🔄 Training Metrics 2025-08-31 13:43:59 - pico-train - INFO - ├── Loss: 4.7595 2025-08-31 13:43:59 - pico-train - INFO - ├── Learning Rate: 2.00e-05 2025-08-31 13:43:59 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-31 13:44:53 - pico-train - INFO - Step 86800 -- 🔄 Training Metrics 2025-08-31 13:44:53 - pico-train - INFO - ├── Loss: 4.7842 2025-08-31 13:44:53 - pico-train - INFO - ├── Learning Rate: 2.00e-05 2025-08-31 13:44:53 - pico-train - INFO - └── Inf/NaN count: 
0 2025-08-31 13:45:48 - pico-train - INFO - Step 86900 -- 🔄 Training Metrics 2025-08-31 13:45:48 - pico-train - INFO - ├── Loss: 4.7730 2025-08-31 13:45:48 - pico-train - INFO - ├── Learning Rate: 2.00e-05 2025-08-31 13:45:48 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-31 13:46:41 - pico-train - INFO - Step 87000 -- 🔄 Training Metrics 2025-08-31 13:46:41 - pico-train - INFO - ├── Loss: 4.7845 2025-08-31 13:46:41 - pico-train - INFO - ├── Learning Rate: 2.00e-05 2025-08-31 13:46:41 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-31 13:47:37 - pico-train - INFO - Step 87100 -- 🔄 Training Metrics 2025-08-31 13:47:37 - pico-train - INFO - ├── Loss: 4.7705 2025-08-31 13:47:37 - pico-train - INFO - ├── Learning Rate: 2.00e-05 2025-08-31 13:47:37 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-31 13:48:31 - pico-train - INFO - Step 87200 -- 🔄 Training Metrics 2025-08-31 13:48:31 - pico-train - INFO - ├── Loss: 4.7807 2025-08-31 13:48:31 - pico-train - INFO - ├── Learning Rate: 2.00e-05 2025-08-31 13:48:31 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-31 13:49:25 - pico-train - INFO - Step 87300 -- 🔄 Training Metrics 2025-08-31 13:49:25 - pico-train - INFO - ├── Loss: 4.7455 2025-08-31 13:49:25 - pico-train - INFO - ├── Learning Rate: 2.00e-05 2025-08-31 13:49:25 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-31 13:50:19 - pico-train - INFO - Step 87400 -- 🔄 Training Metrics 2025-08-31 13:50:19 - pico-train - INFO - ├── Loss: 4.7594 2025-08-31 13:50:19 - pico-train - INFO - ├── Learning Rate: 2.00e-05 2025-08-31 13:50:19 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-31 13:51:13 - pico-train - INFO - Step 87500 -- 🔄 Training Metrics 2025-08-31 13:51:13 - pico-train - INFO - ├── Loss: 4.7864 2025-08-31 13:51:13 - pico-train - INFO - ├── Learning Rate: 2.00e-05 2025-08-31 13:51:13 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-31 13:52:07 - pico-train - INFO - Step 87600 -- 🔄 Training Metrics 2025-08-31 13:52:07 - pico-train - INFO - ├── Loss: 
4.7863 2025-08-31 13:52:07 - pico-train - INFO - ├── Learning Rate: 2.00e-05 2025-08-31 13:52:07 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-31 13:53:02 - pico-train - INFO - Step 87700 -- 🔄 Training Metrics 2025-08-31 13:53:02 - pico-train - INFO - ├── Loss: 4.7823 2025-08-31 13:53:02 - pico-train - INFO - ├── Learning Rate: 2.00e-05 2025-08-31 13:53:02 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-31 13:53:56 - pico-train - INFO - Step 87800 -- 🔄 Training Metrics 2025-08-31 13:53:56 - pico-train - INFO - ├── Loss: 4.7758 2025-08-31 13:53:56 - pico-train - INFO - ├── Learning Rate: 2.00e-05 2025-08-31 13:53:56 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-31 13:54:51 - pico-train - INFO - Step 87900 -- 🔄 Training Metrics 2025-08-31 13:54:51 - pico-train - INFO - ├── Loss: 4.7933 2025-08-31 13:54:51 - pico-train - INFO - ├── Learning Rate: 2.00e-05 2025-08-31 13:54:51 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-31 13:55:46 - pico-train - INFO - Step 88000 -- 💾 Saving Checkpoint 2025-08-31 13:57:38 - pico-train - INFO - Step 88000 -- 📊 Evaluation Results 2025-08-31 13:57:38 - pico-train - INFO - └── paloma: inf 2025-08-31 13:57:39 - pico-train - INFO - Step 88000 -- 🔄 Training Metrics 2025-08-31 13:57:39 - pico-train - INFO - ├── Loss: 4.7787 2025-08-31 13:57:39 - pico-train - INFO - ├── Learning Rate: 2.00e-05 2025-08-31 13:57:39 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-31 13:57:39 - pico-train - INFO - Step 88000 -- 📈 Saving Learning Dynamics 2025-08-31 13:58:36 - pico-train - INFO - Step 88100 -- 🔄 Training Metrics 2025-08-31 13:58:36 - pico-train - INFO - ├── Loss: 4.8045 2025-08-31 13:58:36 - pico-train - INFO - ├── Learning Rate: 2.00e-05 2025-08-31 13:58:36 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-31 13:59:30 - pico-train - INFO - Step 88200 -- 🔄 Training Metrics 2025-08-31 13:59:30 - pico-train - INFO - ├── Loss: 4.7801 2025-08-31 13:59:30 - pico-train - INFO - ├── Learning Rate: 2.00e-05 2025-08-31 13:59:30 - 
pico-train - INFO - └── Inf/NaN count: 0 2025-08-31 14:00:24 - pico-train - INFO - Step 88300 -- 🔄 Training Metrics 2025-08-31 14:00:24 - pico-train - INFO - ├── Loss: 4.7828 2025-08-31 14:00:24 - pico-train - INFO - ├── Learning Rate: 2.00e-05 2025-08-31 14:00:24 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-31 14:01:19 - pico-train - INFO - Step 88400 -- 🔄 Training Metrics 2025-08-31 14:01:19 - pico-train - INFO - ├── Loss: 4.7886 2025-08-31 14:01:19 - pico-train - INFO - ├── Learning Rate: 2.00e-05 2025-08-31 14:01:19 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-31 14:02:13 - pico-train - INFO - Step 88500 -- 🔄 Training Metrics 2025-08-31 14:02:13 - pico-train - INFO - ├── Loss: 4.7916 2025-08-31 14:02:13 - pico-train - INFO - ├── Learning Rate: 2.00e-05 2025-08-31 14:02:13 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-31 14:03:07 - pico-train - INFO - Step 88600 -- 🔄 Training Metrics 2025-08-31 14:03:07 - pico-train - INFO - ├── Loss: 4.7897 2025-08-31 14:03:07 - pico-train - INFO - ├── Learning Rate: 2.00e-05 2025-08-31 14:03:07 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-31 14:04:02 - pico-train - INFO - Step 88700 -- 🔄 Training Metrics 2025-08-31 14:04:02 - pico-train - INFO - ├── Loss: 4.7861 2025-08-31 14:04:02 - pico-train - INFO - ├── Learning Rate: 2.00e-05 2025-08-31 14:04:02 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-31 14:04:55 - pico-train - INFO - Step 88800 -- 🔄 Training Metrics 2025-08-31 14:04:55 - pico-train - INFO - ├── Loss: 4.7768 2025-08-31 14:04:55 - pico-train - INFO - ├── Learning Rate: 2.00e-05 2025-08-31 14:04:55 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-31 14:05:49 - pico-train - INFO - Step 88900 -- 🔄 Training Metrics 2025-08-31 14:05:49 - pico-train - INFO - ├── Loss: 4.7696 2025-08-31 14:05:49 - pico-train - INFO - ├── Learning Rate: 2.00e-05 2025-08-31 14:05:49 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-31 14:06:45 - pico-train - INFO - Step 89000 -- 🔄 Training Metrics 2025-08-31 
14:06:45 - pico-train - INFO - ├── Loss: 4.7836 2025-08-31 14:06:45 - pico-train - INFO - ├── Learning Rate: 2.00e-05 2025-08-31 14:06:45 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-31 14:07:40 - pico-train - INFO - Step 89100 -- 🔄 Training Metrics 2025-08-31 14:07:40 - pico-train - INFO - ├── Loss: 4.7812 2025-08-31 14:07:40 - pico-train - INFO - ├── Learning Rate: 2.00e-05 2025-08-31 14:07:40 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-31 14:08:35 - pico-train - INFO - Step 89200 -- 🔄 Training Metrics 2025-08-31 14:08:35 - pico-train - INFO - ├── Loss: 4.7896 2025-08-31 14:08:35 - pico-train - INFO - ├── Learning Rate: 2.00e-05 2025-08-31 14:08:35 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-31 14:09:30 - pico-train - INFO - Step 89300 -- 🔄 Training Metrics 2025-08-31 14:09:30 - pico-train - INFO - ├── Loss: 4.7900 2025-08-31 14:09:30 - pico-train - INFO - ├── Learning Rate: 2.00e-05 2025-08-31 14:09:30 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-31 14:10:26 - pico-train - INFO - Step 89400 -- 🔄 Training Metrics 2025-08-31 14:10:26 - pico-train - INFO - ├── Loss: 4.7914 2025-08-31 14:10:26 - pico-train - INFO - ├── Learning Rate: 2.00e-05 2025-08-31 14:10:26 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-31 14:11:21 - pico-train - INFO - Step 89500 -- 🔄 Training Metrics 2025-08-31 14:11:21 - pico-train - INFO - ├── Loss: 4.7921 2025-08-31 14:11:21 - pico-train - INFO - ├── Learning Rate: 2.00e-05 2025-08-31 14:11:21 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-31 14:12:18 - pico-train - INFO - Step 89600 -- 🔄 Training Metrics 2025-08-31 14:12:18 - pico-train - INFO - ├── Loss: 4.7674 2025-08-31 14:12:18 - pico-train - INFO - ├── Learning Rate: 2.00e-05 2025-08-31 14:12:18 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-31 14:13:12 - pico-train - INFO - Step 89700 -- 🔄 Training Metrics 2025-08-31 14:13:12 - pico-train - INFO - ├── Loss: 4.7811 2025-08-31 14:13:12 - pico-train - INFO - ├── Learning Rate: 2.00e-05 2025-08-31 
14:13:12 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-31 14:14:08 - pico-train - INFO - Step 89800 -- 🔄 Training Metrics 2025-08-31 14:14:08 - pico-train - INFO - ├── Loss: 4.7763 2025-08-31 14:14:08 - pico-train - INFO - ├── Learning Rate: 2.00e-05 2025-08-31 14:14:08 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-31 14:15:03 - pico-train - INFO - Step 89900 -- 🔄 Training Metrics 2025-08-31 14:15:03 - pico-train - INFO - ├── Loss: 4.7684 2025-08-31 14:15:03 - pico-train - INFO - ├── Learning Rate: 2.00e-05 2025-08-31 14:15:03 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-31 14:15:58 - pico-train - INFO - Step 90000 -- 💾 Saving Checkpoint 2025-08-31 14:17:57 - pico-train - INFO - Step 90000 -- 📊 Evaluation Results 2025-08-31 14:17:57 - pico-train - INFO - └── paloma: inf 2025-08-31 14:17:58 - pico-train - INFO - Step 90000 -- 🔄 Training Metrics 2025-08-31 14:17:58 - pico-train - INFO - ├── Loss: 4.7820 2025-08-31 14:17:58 - pico-train - INFO - ├── Learning Rate: 2.00e-05 2025-08-31 14:17:58 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-31 14:17:58 - pico-train - INFO - Step 90000 -- 📈 Saving Learning Dynamics 2025-08-31 14:18:55 - pico-train - INFO - Step 90100 -- 🔄 Training Metrics 2025-08-31 14:18:55 - pico-train - INFO - ├── Loss: 4.7878 2025-08-31 14:18:55 - pico-train - INFO - ├── Learning Rate: 2.00e-05 2025-08-31 14:18:55 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-31 14:19:48 - pico-train - INFO - Step 90200 -- 🔄 Training Metrics 2025-08-31 14:19:48 - pico-train - INFO - ├── Loss: 4.7774 2025-08-31 14:19:48 - pico-train - INFO - ├── Learning Rate: 2.00e-05 2025-08-31 14:19:48 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-31 14:20:43 - pico-train - INFO - Step 90300 -- 🔄 Training Metrics 2025-08-31 14:20:43 - pico-train - INFO - ├── Loss: 4.7770 2025-08-31 14:20:43 - pico-train - INFO - ├── Learning Rate: 2.00e-05 2025-08-31 14:20:43 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-31 14:21:37 - pico-train - INFO - Step 
90400 -- 🔄 Training Metrics 2025-08-31 14:21:37 - pico-train - INFO - ├── Loss: 4.7744 2025-08-31 14:21:37 - pico-train - INFO - ├── Learning Rate: 2.00e-05 2025-08-31 14:21:37 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-31 14:22:31 - pico-train - INFO - Step 90500 -- 🔄 Training Metrics 2025-08-31 14:22:31 - pico-train - INFO - ├── Loss: 4.7964 2025-08-31 14:22:31 - pico-train - INFO - ├── Learning Rate: 2.00e-05 2025-08-31 14:22:31 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-31 14:23:26 - pico-train - INFO - Step 90600 -- 🔄 Training Metrics 2025-08-31 14:23:26 - pico-train - INFO - ├── Loss: 4.7863 2025-08-31 14:23:26 - pico-train - INFO - ├── Learning Rate: 2.00e-05 2025-08-31 14:23:26 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-31 14:24:19 - pico-train - INFO - Step 90700 -- 🔄 Training Metrics 2025-08-31 14:24:19 - pico-train - INFO - ├── Loss: 4.7872 2025-08-31 14:24:19 - pico-train - INFO - ├── Learning Rate: 2.00e-05 2025-08-31 14:24:19 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-31 14:25:14 - pico-train - INFO - Step 90800 -- 🔄 Training Metrics 2025-08-31 14:25:14 - pico-train - INFO - ├── Loss: 4.7766 2025-08-31 14:25:14 - pico-train - INFO - ├── Learning Rate: 2.00e-05 2025-08-31 14:25:14 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-31 14:26:09 - pico-train - INFO - Step 90900 -- 🔄 Training Metrics 2025-08-31 14:26:09 - pico-train - INFO - ├── Loss: 4.7846 2025-08-31 14:26:09 - pico-train - INFO - ├── Learning Rate: 2.00e-05 2025-08-31 14:26:09 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-31 14:27:03 - pico-train - INFO - Step 91000 -- 🔄 Training Metrics 2025-08-31 14:27:03 - pico-train - INFO - ├── Loss: 4.7906 2025-08-31 14:27:03 - pico-train - INFO - ├── Learning Rate: 2.00e-05 2025-08-31 14:27:03 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-31 14:27:57 - pico-train - INFO - Step 91100 -- 🔄 Training Metrics 2025-08-31 14:27:57 - pico-train - INFO - ├── Loss: 4.7908 2025-08-31 14:27:57 - pico-train - INFO - 
├── Learning Rate: 2.00e-05 2025-08-31 14:27:57 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-31 14:28:52 - pico-train - INFO - Step 91200 -- 🔄 Training Metrics 2025-08-31 14:28:52 - pico-train - INFO - ├── Loss: 4.7946 2025-08-31 14:28:52 - pico-train - INFO - ├── Learning Rate: 2.00e-05 2025-08-31 14:28:52 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-31 14:29:46 - pico-train - INFO - Step 91300 -- 🔄 Training Metrics 2025-08-31 14:29:46 - pico-train - INFO - ├── Loss: 4.7753 2025-08-31 14:29:46 - pico-train - INFO - ├── Learning Rate: 2.00e-05 2025-08-31 14:29:46 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-31 14:30:40 - pico-train - INFO - Step 91400 -- 🔄 Training Metrics 2025-08-31 14:30:40 - pico-train - INFO - ├── Loss: 4.7783 2025-08-31 14:30:40 - pico-train - INFO - ├── Learning Rate: 2.00e-05 2025-08-31 14:30:40 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-31 14:31:34 - pico-train - INFO - Step 91500 -- 🔄 Training Metrics 2025-08-31 14:31:34 - pico-train - INFO - ├── Loss: 4.7776 2025-08-31 14:31:34 - pico-train - INFO - ├── Learning Rate: 2.00e-05 2025-08-31 14:31:34 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-31 14:32:28 - pico-train - INFO - Step 91600 -- 🔄 Training Metrics 2025-08-31 14:32:28 - pico-train - INFO - ├── Loss: 4.7963 2025-08-31 14:32:28 - pico-train - INFO - ├── Learning Rate: 2.00e-05 2025-08-31 14:32:28 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-31 14:33:23 - pico-train - INFO - Step 91700 -- 🔄 Training Metrics 2025-08-31 14:33:23 - pico-train - INFO - ├── Loss: 4.7611 2025-08-31 14:33:23 - pico-train - INFO - ├── Learning Rate: 2.00e-05 2025-08-31 14:33:23 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-31 14:34:17 - pico-train - INFO - Step 91800 -- 🔄 Training Metrics 2025-08-31 14:34:17 - pico-train - INFO - ├── Loss: 4.7856 2025-08-31 14:34:17 - pico-train - INFO - ├── Learning Rate: 2.00e-05 2025-08-31 14:34:17 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-31 14:35:11 - pico-train - INFO 
- Step 91900 -- 🔄 Training Metrics 2025-08-31 14:35:11 - pico-train - INFO - ├── Loss: 4.7588 2025-08-31 14:35:11 - pico-train - INFO - ├── Learning Rate: 2.00e-05 2025-08-31 14:35:11 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-31 14:36:04 - pico-train - INFO - Step 92000 -- 💾 Saving Checkpoint 2025-08-31 14:38:06 - pico-train - INFO - Step 92000 -- 📊 Evaluation Results 2025-08-31 14:38:06 - pico-train - INFO - └── paloma: inf 2025-08-31 14:38:06 - pico-train - INFO - Step 92000 -- 🔄 Training Metrics 2025-08-31 14:38:06 - pico-train - INFO - ├── Loss: 4.7726 2025-08-31 14:38:06 - pico-train - INFO - ├── Learning Rate: 2.00e-05 2025-08-31 14:38:06 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-31 14:38:06 - pico-train - INFO - Step 92000 -- 📈 Saving Learning Dynamics 2025-08-31 14:39:03 - pico-train - INFO - Step 92100 -- 🔄 Training Metrics 2025-08-31 14:39:03 - pico-train - INFO - ├── Loss: 4.7884 2025-08-31 14:39:03 - pico-train - INFO - ├── Learning Rate: 2.00e-05 2025-08-31 14:39:03 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-31 14:39:59 - pico-train - INFO - Step 92200 -- 🔄 Training Metrics 2025-08-31 14:39:59 - pico-train - INFO - ├── Loss: 4.7676 2025-08-31 14:39:59 - pico-train - INFO - ├── Learning Rate: 2.00e-05 2025-08-31 14:39:59 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-31 14:40:53 - pico-train - INFO - Step 92300 -- 🔄 Training Metrics 2025-08-31 14:40:53 - pico-train - INFO - ├── Loss: 4.7904 2025-08-31 14:40:53 - pico-train - INFO - ├── Learning Rate: 2.00e-05 2025-08-31 14:40:53 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-31 14:41:46 - pico-train - INFO - Step 92400 -- 🔄 Training Metrics 2025-08-31 14:41:46 - pico-train - INFO - ├── Loss: 4.7861 2025-08-31 14:41:46 - pico-train - INFO - ├── Learning Rate: 2.00e-05 2025-08-31 14:41:46 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-31 14:42:41 - pico-train - INFO - Step 92500 -- 🔄 Training Metrics 2025-08-31 14:42:41 - pico-train - INFO - ├── Loss: 4.8081 
2025-08-31 14:42:41 - pico-train - INFO - ├── Learning Rate: 2.00e-05 2025-08-31 14:42:41 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-31 14:43:35 - pico-train - INFO - Step 92600 -- 🔄 Training Metrics 2025-08-31 14:43:35 - pico-train - INFO - ├── Loss: 4.7588 2025-08-31 14:43:35 - pico-train - INFO - ├── Learning Rate: 2.00e-05 2025-08-31 14:43:35 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-31 14:44:29 - pico-train - INFO - Step 92700 -- 🔄 Training Metrics 2025-08-31 14:44:29 - pico-train - INFO - ├── Loss: 4.8001 2025-08-31 14:44:29 - pico-train - INFO - ├── Learning Rate: 2.00e-05 2025-08-31 14:44:29 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-31 14:45:23 - pico-train - INFO - Step 92800 -- 🔄 Training Metrics 2025-08-31 14:45:23 - pico-train - INFO - ├── Loss: 4.8004 2025-08-31 14:45:23 - pico-train - INFO - ├── Learning Rate: 2.00e-05 2025-08-31 14:45:23 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-31 14:46:17 - pico-train - INFO - Step 92900 -- 🔄 Training Metrics 2025-08-31 14:46:17 - pico-train - INFO - ├── Loss: 4.7781 2025-08-31 14:46:17 - pico-train - INFO - ├── Learning Rate: 2.00e-05 2025-08-31 14:46:17 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-31 14:47:11 - pico-train - INFO - Step 93000 -- 🔄 Training Metrics 2025-08-31 14:47:11 - pico-train - INFO - ├── Loss: 4.7952 2025-08-31 14:47:11 - pico-train - INFO - ├── Learning Rate: 2.00e-05 2025-08-31 14:47:11 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-31 14:48:07 - pico-train - INFO - Step 93100 -- 🔄 Training Metrics 2025-08-31 14:48:07 - pico-train - INFO - ├── Loss: 4.7549 2025-08-31 14:48:07 - pico-train - INFO - ├── Learning Rate: 2.00e-05 2025-08-31 14:48:07 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-31 14:49:02 - pico-train - INFO - Step 93200 -- 🔄 Training Metrics 2025-08-31 14:49:02 - pico-train - INFO - ├── Loss: 4.7780 2025-08-31 14:49:02 - pico-train - INFO - ├── Learning Rate: 2.00e-05 2025-08-31 14:49:02 - pico-train - INFO - └── Inf/NaN count: 
0 2025-08-31 14:49:57 - pico-train - INFO - Step 93300 -- 🔄 Training Metrics 2025-08-31 14:49:57 - pico-train - INFO - ├── Loss: 4.7779 2025-08-31 14:49:57 - pico-train - INFO - ├── Learning Rate: 2.00e-05 2025-08-31 14:49:57 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-31 14:50:52 - pico-train - INFO - Step 93400 -- 🔄 Training Metrics 2025-08-31 14:50:52 - pico-train - INFO - ├── Loss: 4.7857 2025-08-31 14:50:52 - pico-train - INFO - ├── Learning Rate: 2.00e-05 2025-08-31 14:50:52 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-31 14:51:48 - pico-train - INFO - Step 93500 -- 🔄 Training Metrics 2025-08-31 14:51:48 - pico-train - INFO - ├── Loss: 4.7841 2025-08-31 14:51:48 - pico-train - INFO - ├── Learning Rate: 2.00e-05 2025-08-31 14:51:48 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-31 14:52:44 - pico-train - INFO - Step 93600 -- 🔄 Training Metrics 2025-08-31 14:52:44 - pico-train - INFO - ├── Loss: 4.7888 2025-08-31 14:52:44 - pico-train - INFO - ├── Learning Rate: 2.00e-05 2025-08-31 14:52:44 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-31 14:53:39 - pico-train - INFO - Step 93700 -- 🔄 Training Metrics 2025-08-31 14:53:39 - pico-train - INFO - ├── Loss: 4.7693 2025-08-31 14:53:39 - pico-train - INFO - ├── Learning Rate: 2.00e-05 2025-08-31 14:53:39 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-31 14:54:34 - pico-train - INFO - Step 93800 -- 🔄 Training Metrics 2025-08-31 14:54:34 - pico-train - INFO - ├── Loss: 4.7761 2025-08-31 14:54:34 - pico-train - INFO - ├── Learning Rate: 2.00e-05 2025-08-31 14:54:34 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-31 14:55:29 - pico-train - INFO - Step 93900 -- 🔄 Training Metrics 2025-08-31 14:55:29 - pico-train - INFO - ├── Loss: 4.7933 2025-08-31 14:55:29 - pico-train - INFO - ├── Learning Rate: 2.00e-05 2025-08-31 14:55:29 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-31 14:56:24 - pico-train - INFO - Step 94000 -- 💾 Saving Checkpoint 2025-08-31 14:58:28 - pico-train - INFO - Step 94000 
-- 📊 Evaluation Results 2025-08-31 14:58:28 - pico-train - INFO - └── paloma: inf 2025-08-31 14:58:29 - pico-train - INFO - Step 94000 -- 🔄 Training Metrics 2025-08-31 14:58:29 - pico-train - INFO - ├── Loss: 4.7828 2025-08-31 14:58:29 - pico-train - INFO - ├── Learning Rate: 2.00e-05 2025-08-31 14:58:29 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-31 14:58:29 - pico-train - INFO - Step 94000 -- 📈 Saving Learning Dynamics 2025-08-31 14:59:25 - pico-train - INFO - Step 94100 -- 🔄 Training Metrics 2025-08-31 14:59:25 - pico-train - INFO - ├── Loss: 4.7809 2025-08-31 14:59:25 - pico-train - INFO - ├── Learning Rate: 2.00e-05 2025-08-31 14:59:25 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-31 15:00:20 - pico-train - INFO - Step 94200 -- 🔄 Training Metrics 2025-08-31 15:00:20 - pico-train - INFO - ├── Loss: 4.7890 2025-08-31 15:00:20 - pico-train - INFO - ├── Learning Rate: 2.00e-05 2025-08-31 15:00:20 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-31 15:01:14 - pico-train - INFO - Step 94300 -- 🔄 Training Metrics 2025-08-31 15:01:14 - pico-train - INFO - ├── Loss: 4.7835 2025-08-31 15:01:14 - pico-train - INFO - ├── Learning Rate: 2.00e-05 2025-08-31 15:01:14 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-31 15:02:08 - pico-train - INFO - Step 94400 -- 🔄 Training Metrics 2025-08-31 15:02:08 - pico-train - INFO - ├── Loss: 4.7594 2025-08-31 15:02:08 - pico-train - INFO - ├── Learning Rate: 2.00e-05 2025-08-31 15:02:08 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-31 15:03:03 - pico-train - INFO - Step 94500 -- 🔄 Training Metrics 2025-08-31 15:03:03 - pico-train - INFO - ├── Loss: 4.7865 2025-08-31 15:03:03 - pico-train - INFO - ├── Learning Rate: 2.00e-05 2025-08-31 15:03:03 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-31 15:03:58 - pico-train - INFO - Step 94600 -- 🔄 Training Metrics 2025-08-31 15:03:58 - pico-train - INFO - ├── Loss: 4.7815 2025-08-31 15:03:58 - pico-train - INFO - ├── Learning Rate: 2.00e-05 2025-08-31 15:03:58 - 
pico-train - INFO - └── Inf/NaN count: 0 2025-08-31 15:04:52 - pico-train - INFO - Step 94700 -- 🔄 Training Metrics 2025-08-31 15:04:52 - pico-train - INFO - ├── Loss: 4.7723 2025-08-31 15:04:52 - pico-train - INFO - ├── Learning Rate: 2.00e-05 2025-08-31 15:04:52 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-31 15:05:47 - pico-train - INFO - Step 94800 -- 🔄 Training Metrics 2025-08-31 15:05:47 - pico-train - INFO - ├── Loss: 4.7648 2025-08-31 15:05:47 - pico-train - INFO - ├── Learning Rate: 2.00e-05 2025-08-31 15:05:47 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-31 15:06:41 - pico-train - INFO - Step 94900 -- 🔄 Training Metrics 2025-08-31 15:06:41 - pico-train - INFO - ├── Loss: 4.7664 2025-08-31 15:06:41 - pico-train - INFO - ├── Learning Rate: 2.00e-05 2025-08-31 15:06:41 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-31 15:07:35 - pico-train - INFO - Step 95000 -- 🔄 Training Metrics 2025-08-31 15:07:35 - pico-train - INFO - ├── Loss: 4.7738 2025-08-31 15:07:35 - pico-train - INFO - ├── Learning Rate: 2.00e-05 2025-08-31 15:07:35 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-31 15:08:29 - pico-train - INFO - Step 95100 -- 🔄 Training Metrics 2025-08-31 15:08:29 - pico-train - INFO - ├── Loss: 4.7868 2025-08-31 15:08:29 - pico-train - INFO - ├── Learning Rate: 2.00e-05 2025-08-31 15:08:29 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-31 15:09:23 - pico-train - INFO - Step 95200 -- 🔄 Training Metrics 2025-08-31 15:09:23 - pico-train - INFO - ├── Loss: 4.7702 2025-08-31 15:09:23 - pico-train - INFO - ├── Learning Rate: 2.00e-05 2025-08-31 15:09:23 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-31 15:10:17 - pico-train - INFO - Step 95300 -- 🔄 Training Metrics 2025-08-31 15:10:17 - pico-train - INFO - ├── Loss: 4.7908 2025-08-31 15:10:17 - pico-train - INFO - ├── Learning Rate: 2.00e-05 2025-08-31 15:10:17 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-31 15:11:12 - pico-train - INFO - Step 95400 -- 🔄 Training Metrics 2025-08-31 
15:11:12 - pico-train - INFO - ├── Loss: 4.7643 2025-08-31 15:11:12 - pico-train - INFO - ├── Learning Rate: 2.00e-05 2025-08-31 15:11:12 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-31 15:12:06 - pico-train - INFO - Step 95500 -- 🔄 Training Metrics 2025-08-31 15:12:06 - pico-train - INFO - ├── Loss: 4.8021 2025-08-31 15:12:06 - pico-train - INFO - ├── Learning Rate: 2.00e-05 2025-08-31 15:12:06 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-31 15:13:00 - pico-train - INFO - Step 95600 -- 🔄 Training Metrics 2025-08-31 15:13:00 - pico-train - INFO - ├── Loss: 4.7819 2025-08-31 15:13:00 - pico-train - INFO - ├── Learning Rate: 2.00e-05 2025-08-31 15:13:00 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-31 15:13:54 - pico-train - INFO - Step 95700 -- 🔄 Training Metrics 2025-08-31 15:13:54 - pico-train - INFO - ├── Loss: 4.7955 2025-08-31 15:13:54 - pico-train - INFO - ├── Learning Rate: 2.00e-05 2025-08-31 15:13:54 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-31 15:14:48 - pico-train - INFO - Step 95800 -- 🔄 Training Metrics 2025-08-31 15:14:48 - pico-train - INFO - ├── Loss: 4.7821 2025-08-31 15:14:48 - pico-train - INFO - ├── Learning Rate: 2.00e-05 2025-08-31 15:14:48 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-31 15:15:42 - pico-train - INFO - Step 95900 -- 🔄 Training Metrics 2025-08-31 15:15:42 - pico-train - INFO - ├── Loss: 4.7720 2025-08-31 15:15:42 - pico-train - INFO - ├── Learning Rate: 2.00e-05 2025-08-31 15:15:42 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-31 15:16:36 - pico-train - INFO - Step 96000 -- 💾 Saving Checkpoint 2025-08-31 15:18:29 - pico-train - INFO - Step 96000 -- 📊 Evaluation Results 2025-08-31 15:18:29 - pico-train - INFO - └── paloma: inf 2025-08-31 15:18:30 - pico-train - INFO - Step 96000 -- 🔄 Training Metrics 2025-08-31 15:18:30 - pico-train - INFO - ├── Loss: 4.7744 2025-08-31 15:18:30 - pico-train - INFO - ├── Learning Rate: 2.00e-05 2025-08-31 15:18:30 - pico-train - INFO - └── Inf/NaN count: 0 
2025-08-31 15:18:30 - pico-train - INFO - Step 96000 -- 📈 Saving Learning Dynamics 2025-08-31 15:19:26 - pico-train - INFO - Step 96100 -- 🔄 Training Metrics 2025-08-31 15:19:26 - pico-train - INFO - ├── Loss: 4.7928 2025-08-31 15:19:26 - pico-train - INFO - ├── Learning Rate: 2.00e-05 2025-08-31 15:19:26 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-31 15:20:21 - pico-train - INFO - Step 96200 -- 🔄 Training Metrics 2025-08-31 15:20:21 - pico-train - INFO - ├── Loss: 4.7880 2025-08-31 15:20:21 - pico-train - INFO - ├── Learning Rate: 2.00e-05 2025-08-31 15:20:21 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-31 15:21:15 - pico-train - INFO - Step 96300 -- 🔄 Training Metrics 2025-08-31 15:21:15 - pico-train - INFO - ├── Loss: 4.7508 2025-08-31 15:21:15 - pico-train - INFO - ├── Learning Rate: 2.00e-05 2025-08-31 15:21:15 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-31 15:22:09 - pico-train - INFO - Step 96400 -- 🔄 Training Metrics 2025-08-31 15:22:09 - pico-train - INFO - ├── Loss: 4.8135 2025-08-31 15:22:09 - pico-train - INFO - ├── Learning Rate: 2.00e-05 2025-08-31 15:22:09 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-31 15:23:04 - pico-train - INFO - Step 96500 -- 🔄 Training Metrics 2025-08-31 15:23:04 - pico-train - INFO - ├── Loss: 4.7808 2025-08-31 15:23:04 - pico-train - INFO - ├── Learning Rate: 2.00e-05 2025-08-31 15:23:04 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-31 15:23:57 - pico-train - INFO - Step 96600 -- 🔄 Training Metrics 2025-08-31 15:23:57 - pico-train - INFO - ├── Loss: 4.7726 2025-08-31 15:23:57 - pico-train - INFO - ├── Learning Rate: 2.00e-05 2025-08-31 15:23:57 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-31 15:24:52 - pico-train - INFO - Step 96700 -- 🔄 Training Metrics 2025-08-31 15:24:52 - pico-train - INFO - ├── Loss: 4.7980 2025-08-31 15:24:52 - pico-train - INFO - ├── Learning Rate: 2.00e-05 2025-08-31 15:24:52 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-31 15:25:46 - pico-train - INFO - Step 
96800 -- 🔄 Training Metrics 2025-08-31 15:25:46 - pico-train - INFO - ├── Loss: 4.7686 2025-08-31 15:25:46 - pico-train - INFO - ├── Learning Rate: 2.00e-05 2025-08-31 15:25:46 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-31 15:26:41 - pico-train - INFO - Step 96900 -- 🔄 Training Metrics 2025-08-31 15:26:41 - pico-train - INFO - ├── Loss: 4.7789 2025-08-31 15:26:41 - pico-train - INFO - ├── Learning Rate: 2.00e-05 2025-08-31 15:26:41 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-31 15:27:35 - pico-train - INFO - Step 97000 -- 🔄 Training Metrics 2025-08-31 15:27:35 - pico-train - INFO - ├── Loss: 4.7608 2025-08-31 15:27:35 - pico-train - INFO - ├── Learning Rate: 2.00e-05 2025-08-31 15:27:35 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-31 15:28:29 - pico-train - INFO - Step 97100 -- 🔄 Training Metrics 2025-08-31 15:28:29 - pico-train - INFO - ├── Loss: 4.7781 2025-08-31 15:28:29 - pico-train - INFO - ├── Learning Rate: 2.00e-05 2025-08-31 15:28:29 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-31 15:29:23 - pico-train - INFO - Step 97200 -- 🔄 Training Metrics 2025-08-31 15:29:23 - pico-train - INFO - ├── Loss: 4.7450 2025-08-31 15:29:23 - pico-train - INFO - ├── Learning Rate: 2.00e-05 2025-08-31 15:29:23 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-31 15:30:18 - pico-train - INFO - Step 97300 -- 🔄 Training Metrics 2025-08-31 15:30:18 - pico-train - INFO - ├── Loss: 4.7597 2025-08-31 15:30:18 - pico-train - INFO - ├── Learning Rate: 2.00e-05 2025-08-31 15:30:18 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-31 15:31:12 - pico-train - INFO - Step 97400 -- 🔄 Training Metrics 2025-08-31 15:31:12 - pico-train - INFO - ├── Loss: 4.7810 2025-08-31 15:31:12 - pico-train - INFO - ├── Learning Rate: 2.00e-05 2025-08-31 15:31:12 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-31 15:32:07 - pico-train - INFO - Step 97500 -- 🔄 Training Metrics 2025-08-31 15:32:07 - pico-train - INFO - ├── Loss: 4.7981 2025-08-31 15:32:07 - pico-train - INFO - 
├── Learning Rate: 2.00e-05 2025-08-31 15:32:07 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-31 15:33:01 - pico-train - INFO - Step 97600 -- 🔄 Training Metrics 2025-08-31 15:33:01 - pico-train - INFO - ├── Loss: 4.7707 2025-08-31 15:33:01 - pico-train - INFO - ├── Learning Rate: 2.00e-05 2025-08-31 15:33:01 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-31 15:33:55 - pico-train - INFO - Step 97700 -- 🔄 Training Metrics 2025-08-31 15:33:55 - pico-train - INFO - ├── Loss: 4.8093 2025-08-31 15:33:55 - pico-train - INFO - ├── Learning Rate: 2.00e-05 2025-08-31 15:33:55 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-31 15:34:50 - pico-train - INFO - Step 97800 -- 🔄 Training Metrics 2025-08-31 15:34:50 - pico-train - INFO - ├── Loss: 4.7746 2025-08-31 15:34:50 - pico-train - INFO - ├── Learning Rate: 2.00e-05 2025-08-31 15:34:50 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-31 15:35:43 - pico-train - INFO - Step 97900 -- 🔄 Training Metrics 2025-08-31 15:35:43 - pico-train - INFO - ├── Loss: 4.7664 2025-08-31 15:35:43 - pico-train - INFO - ├── Learning Rate: 2.00e-05 2025-08-31 15:35:43 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-31 15:36:37 - pico-train - INFO - Step 98000 -- 💾 Saving Checkpoint 2025-08-31 15:38:26 - pico-train - INFO - Step 98000 -- 📊 Evaluation Results 2025-08-31 15:38:26 - pico-train - INFO - └── paloma: inf 2025-08-31 15:38:27 - pico-train - INFO - Step 98000 -- 🔄 Training Metrics 2025-08-31 15:38:27 - pico-train - INFO - ├── Loss: 4.7693 2025-08-31 15:38:27 - pico-train - INFO - ├── Learning Rate: 2.00e-05 2025-08-31 15:38:27 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-31 15:38:27 - pico-train - INFO - Step 98000 -- 📈 Saving Learning Dynamics 2025-08-31 15:39:24 - pico-train - INFO - Step 98100 -- 🔄 Training Metrics 2025-08-31 15:39:24 - pico-train - INFO - ├── Loss: 4.7886 2025-08-31 15:39:24 - pico-train - INFO - ├── Learning Rate: 2.00e-05 2025-08-31 15:39:24 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-31 
15:40:18 - pico-train - INFO - Step 98200 -- 🔄 Training Metrics 2025-08-31 15:40:18 - pico-train - INFO - ├── Loss: 4.7912 2025-08-31 15:40:18 - pico-train - INFO - ├── Learning Rate: 2.00e-05 2025-08-31 15:40:18 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-31 15:41:12 - pico-train - INFO - Step 98300 -- 🔄 Training Metrics 2025-08-31 15:41:12 - pico-train - INFO - ├── Loss: 4.7646 2025-08-31 15:41:12 - pico-train - INFO - ├── Learning Rate: 2.00e-05 2025-08-31 15:41:12 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-31 15:42:06 - pico-train - INFO - Step 98400 -- 🔄 Training Metrics 2025-08-31 15:42:06 - pico-train - INFO - ├── Loss: 4.8105 2025-08-31 15:42:06 - pico-train - INFO - ├── Learning Rate: 2.00e-05 2025-08-31 15:42:06 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-31 15:43:00 - pico-train - INFO - Step 98500 -- 🔄 Training Metrics 2025-08-31 15:43:00 - pico-train - INFO - ├── Loss: 4.7712 2025-08-31 15:43:00 - pico-train - INFO - ├── Learning Rate: 2.00e-05 2025-08-31 15:43:00 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-31 15:43:55 - pico-train - INFO - Step 98600 -- 🔄 Training Metrics 2025-08-31 15:43:55 - pico-train - INFO - ├── Loss: 4.8066 2025-08-31 15:43:55 - pico-train - INFO - ├── Learning Rate: 2.00e-05 2025-08-31 15:43:55 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-31 15:44:51 - pico-train - INFO - Step 98700 -- 🔄 Training Metrics 2025-08-31 15:44:51 - pico-train - INFO - ├── Loss: 4.7833 2025-08-31 15:44:51 - pico-train - INFO - ├── Learning Rate: 2.00e-05 2025-08-31 15:44:51 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-31 15:45:45 - pico-train - INFO - Step 98800 -- 🔄 Training Metrics 2025-08-31 15:45:45 - pico-train - INFO - ├── Loss: 4.7803 2025-08-31 15:45:45 - pico-train - INFO - ├── Learning Rate: 2.00e-05 2025-08-31 15:45:45 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-31 15:46:41 - pico-train - INFO - Step 98900 -- 🔄 Training Metrics 2025-08-31 15:46:41 - pico-train - INFO - ├── Loss: 4.7488 
2025-08-31 15:46:41 - pico-train - INFO - ├── Learning Rate: 2.00e-05 2025-08-31 15:46:41 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-31 15:47:36 - pico-train - INFO - Step 99000 -- 🔄 Training Metrics 2025-08-31 15:47:36 - pico-train - INFO - ├── Loss: 4.7897 2025-08-31 15:47:36 - pico-train - INFO - ├── Learning Rate: 2.00e-05 2025-08-31 15:47:36 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-31 15:48:31 - pico-train - INFO - Step 99100 -- 🔄 Training Metrics 2025-08-31 15:48:31 - pico-train - INFO - ├── Loss: 4.7685 2025-08-31 15:48:31 - pico-train - INFO - ├── Learning Rate: 2.00e-05 2025-08-31 15:48:31 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-31 15:49:27 - pico-train - INFO - Step 99200 -- 🔄 Training Metrics 2025-08-31 15:49:27 - pico-train - INFO - ├── Loss: 4.7708 2025-08-31 15:49:27 - pico-train - INFO - ├── Learning Rate: 2.00e-05 2025-08-31 15:49:27 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-31 15:50:21 - pico-train - INFO - Step 99300 -- 🔄 Training Metrics 2025-08-31 15:50:21 - pico-train - INFO - ├── Loss: 4.7858 2025-08-31 15:50:21 - pico-train - INFO - ├── Learning Rate: 2.00e-05 2025-08-31 15:50:21 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-31 15:51:17 - pico-train - INFO - Step 99400 -- 🔄 Training Metrics 2025-08-31 15:51:17 - pico-train - INFO - ├── Loss: 4.7736 2025-08-31 15:51:17 - pico-train - INFO - ├── Learning Rate: 2.00e-05 2025-08-31 15:51:17 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-31 15:52:12 - pico-train - INFO - Step 99500 -- 🔄 Training Metrics 2025-08-31 15:52:12 - pico-train - INFO - ├── Loss: 4.7551 2025-08-31 15:52:12 - pico-train - INFO - ├── Learning Rate: 2.00e-05 2025-08-31 15:52:12 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-31 15:53:08 - pico-train - INFO - Step 99600 -- 🔄 Training Metrics 2025-08-31 15:53:08 - pico-train - INFO - ├── Loss: 4.7529 2025-08-31 15:53:08 - pico-train - INFO - ├── Learning Rate: 2.00e-05 2025-08-31 15:53:08 - pico-train - INFO - └── Inf/NaN count: 
0 2025-08-31 15:54:03 - pico-train - INFO - Step 99700 -- 🔄 Training Metrics 2025-08-31 15:54:03 - pico-train - INFO - ├── Loss: 4.7696 2025-08-31 15:54:03 - pico-train - INFO - ├── Learning Rate: 2.00e-05 2025-08-31 15:54:03 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-31 15:54:57 - pico-train - INFO - Step 99800 -- 🔄 Training Metrics 2025-08-31 15:54:57 - pico-train - INFO - ├── Loss: 4.7800 2025-08-31 15:54:57 - pico-train - INFO - ├── Learning Rate: 2.00e-05 2025-08-31 15:54:57 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-31 15:55:52 - pico-train - INFO - Step 99900 -- 🔄 Training Metrics 2025-08-31 15:55:52 - pico-train - INFO - ├── Loss: 4.7969 2025-08-31 15:55:52 - pico-train - INFO - ├── Learning Rate: 2.00e-05 2025-08-31 15:55:52 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-31 15:56:46 - pico-train - INFO - Step 100000 -- 💾 Saving Checkpoint 2025-08-31 15:58:49 - pico-train - INFO - Step 100000 -- 📊 Evaluation Results 2025-08-31 15:58:49 - pico-train - INFO - └── paloma: inf 2025-08-31 15:58:49 - pico-train - INFO - 🎉 Training complete! Final step: 100000