2025-08-30 12:55:54 - pico-train - INFO - Step 75000 -- 📊 Evaluation Results
2025-08-30 12:55:54 - pico-train - INFO - └── paloma: 1.1276895271509786e+32
2025-08-30 12:55:58 - pico-train - INFO - ==================================================
2025-08-30 12:55:58 - pico-train - INFO - ✨ Training Configuration
2025-08-30 12:55:58 - pico-train - INFO - ==================================================
2025-08-30 12:55:58 - pico-train - INFO - ╭─────────────────────────────────────────────────────╮
2025-08-30 12:55:58 - pico-train - INFO - │ checkpointing:                                      │
2025-08-30 12:55:58 - pico-train - INFO - │   checkpoints_dir: checkpoints                      │
2025-08-30 12:55:58 - pico-train - INFO - │   evaluation:                                       │
2025-08-30 12:55:58 - pico-train - INFO - │     eval_results_dir: eval_results                  │
2025-08-30 12:55:58 - pico-train - INFO - │   fabric_checkpoint_dir: fabric_state               │
2025-08-30 12:55:58 - pico-train - INFO - │   fabric_checkpoint_filename: checkpoint.pt         │
2025-08-30 12:55:58 - pico-train - INFO - │   hf_checkpoint:                                    │
2025-08-30 12:55:58 - pico-train - INFO - │     collection_slug: null                           │
2025-08-30 12:55:58 - pico-train - INFO - │     repo_id: ThomasTheMaker/pico-decoder-tiny       │
2025-08-30 12:55:58 - pico-train - INFO - │   learning_dynamics:                                │
2025-08-30 12:55:58 - pico-train - INFO - │     batch_size: 1                                   │
2025-08-30 12:55:58 - pico-train - INFO - │     eval_data: null                                 │
2025-08-30 12:55:58 - pico-train - INFO - │     layer_suffixes:                                 │
2025-08-30 12:55:58 - pico-train - INFO - │     - attention.v_proj                              │
2025-08-30 12:55:58 - pico-train - INFO - │     - attention.o_proj                              │
2025-08-30 12:55:58 - pico-train - INFO - │     - swiglu.w_2                                    │
2025-08-30 12:55:58 - pico-train - INFO - │     sequence_idx: -1                                │
2025-08-30 12:55:58 - pico-train - INFO - │   learning_dynamics_dir: learning_dynamics          │
2025-08-30 12:55:58 - pico-train - INFO - │   logs_dir: logs                                    │
2025-08-30 12:55:58 - pico-train - INFO - │   run_name: pico-decoder-tiny-dolma5M-v1            │
2025-08-30 12:55:58 - pico-train - INFO - │   runs_dir: runs                                    │
2025-08-30 12:55:58 - pico-train - INFO - │   save_every_n_steps: 500                           │
2025-08-30 12:55:58 - pico-train - INFO - │   save_to_hf: true                                  │
2025-08-30 12:55:58 - pico-train - INFO - │   training:                                         │
2025-08-30 12:55:58 - pico-train - INFO - │     auto_resume: true                               │
2025-08-30 12:55:58 - pico-train - INFO - │ data:                                               │
2025-08-30 12:55:58 - pico-train - INFO - │   dataloader:                                       │
2025-08-30 12:55:58 - pico-train - INFO - │     batch_size: 4                                   │
2025-08-30 12:55:58 - pico-train - INFO - │   dataset:                                          │
2025-08-30 12:55:58 - pico-train - INFO - │     name: ThomasTheMaker/pretokenized-dolma-5M      │
2025-08-30 12:55:58 - pico-train - INFO - │   tokenizer:                                        │
2025-08-30 12:55:58 - pico-train - INFO - │     name: allenai/OLMo-7B-0724-hf                   │
2025-08-30 12:55:58 - pico-train - INFO - │     vocab_size: 50304                               │
2025-08-30 12:55:58 - pico-train - INFO - │ evaluation:                                         │
2025-08-30 12:55:58 - pico-train - INFO - │   metrics:                                          │
2025-08-30 12:55:58 - pico-train - INFO - │   - paloma                                          │
2025-08-30 12:55:58 - pico-train - INFO - │   paloma:                                           │
2025-08-30 12:55:58 - pico-train - INFO - │     batch_size: 1                                   │
2025-08-30 12:55:58 - pico-train - INFO - │     dataset_name: pico-lm/pretokenized-paloma-tinsy │
2025-08-30 12:55:58 - pico-train - INFO - │     dataset_split: val                              │
2025-08-30 12:55:58 - pico-train - INFO - │     max_length: 2048                                │
2025-08-30 12:55:58 - pico-train - INFO - │ model:                                              │
2025-08-30 12:55:58 - pico-train - INFO - │   activation_hidden_dim: 384                        │
2025-08-30 12:55:58 - pico-train - INFO - │   attention_n_heads: 12                             │
2025-08-30 12:55:58 - pico-train - INFO - │   attention_n_kv_heads: 4                           │
2025-08-30 12:55:58 - pico-train - INFO - │   batch_size: 1024                                  │
2025-08-30 12:55:58 - pico-train - INFO - │   d_model: 96                                       │
2025-08-30 12:55:58 - pico-train - INFO - │   max_seq_len: 2048                                 │
2025-08-30 12:55:58 - pico-train - INFO - │   model_type: pico_decoder                          │
2025-08-30 12:55:58 - pico-train - INFO - │   n_layers: 12                                      │
2025-08-30 12:55:58 - pico-train - INFO - │   norm_eps: 1.0e-06                                 │
2025-08-30 12:55:58 - pico-train - INFO - │   position_emb_theta: 10000.0                       │
2025-08-30 12:55:58 - pico-train - INFO - │   vocab_size: 50304                                 │
2025-08-30 12:55:58 - pico-train - INFO - │ monitoring:                                         │
2025-08-30 12:55:58 - pico-train - INFO - │   logging:                                          │
2025-08-30 12:55:58 - pico-train - INFO - │     log_every_n_steps: 25                           │
2025-08-30 12:55:58 - pico-train - INFO - │     log_level: INFO                                 │
2025-08-30 12:55:58 - pico-train - INFO - │   save_to_wandb: false                              │
2025-08-30 12:55:58 - pico-train - INFO - │   wandb:                                            │
2025-08-30 12:55:58 - pico-train - INFO - │     entity: boymyc                                  │
2025-08-30 12:55:58 - pico-train - INFO - │     project: pico-decoder-tiny                      │
2025-08-30 12:55:58 - pico-train - INFO - │ training:                                           │
2025-08-30 12:55:58 - pico-train - INFO - │   fabric:                                           │
2025-08-30 12:55:58 - pico-train - INFO - │     accelerator: cuda                               │
2025-08-30 12:55:58 - pico-train - INFO - │     num_devices: 1                                  │
2025-08-30 12:55:58 - pico-train - INFO - │     num_nodes: 1                                    │
2025-08-30 12:55:58 - pico-train - INFO - │     precision: bf16-mixed                           │
2025-08-30 12:55:58 - pico-train - INFO - │   max_steps: 20000                                  │
2025-08-30 12:55:58 - pico-train - INFO - │   optimization:                                     │
2025-08-30 12:55:58 - pico-train - INFO - │     gradient_accumulation_steps: 4                  │
2025-08-30 12:55:58 - pico-train - INFO - │     lr: 5.0e-05                                     │
2025-08-30 12:55:58 - pico-train - INFO - │     lr_scheduler: cosine                            │
2025-08-30 12:55:58 - pico-train - INFO - │     lr_warmup_steps: 8000                           │
2025-08-30 12:55:58 - pico-train - INFO - │     optimizer: adamw                                │
2025-08-30 12:55:58 - pico-train - INFO - │                                                     │
2025-08-30 12:55:58 - pico-train - INFO - ╰─────────────────────────────────────────────────────╯
2025-08-30 12:55:58 - pico-train - INFO - ==================================================
2025-08-30 12:55:58 - pico-train - INFO - ⛭ Runtime Summary:
2025-08-30 12:55:58 - pico-train - INFO - ==================================================
2025-08-30 12:55:58 - pico-train - INFO - Starting from step: 75000
2025-08-30 12:55:58 - pico-train - INFO - Model Setup:
2025-08-30 12:55:58 - pico-train - INFO - └─ Total Parameters: 11,282,784
2025-08-30 12:55:58 - pico-train - INFO - └─ Trainable Parameters: 11,282,784
2025-08-30 12:55:58 - pico-train - INFO - Distributed Setup:
2025-08-30 12:55:58 - pico-train - INFO - └─ Number of Devices: 1
2025-08-30 12:55:58 - pico-train - INFO - └─ Device Type: NVIDIA GeForce RTX 5090
2025-08-30 12:55:58 - pico-train - INFO - └─ Available Memory: 33.68 GB
2025-08-30 12:55:58 - pico-train - INFO - Software Setup:
2025-08-30 12:55:58 - pico-train - INFO - └─ Python Version: 3.10.12
2025-08-30 12:55:58 - pico-train - INFO - └─ PyTorch Version: 2.8.0+cu128
2025-08-30 12:55:58 - pico-train - INFO - └─ CUDA Version: 12.8
2025-08-30 12:55:58 - pico-train - INFO - └─ Operating System: Linux 6.8.0-63-generic
2025-08-30 12:55:58 - pico-train - INFO - Batch Size Configuration:
2025-08-30 12:55:58 - pico-train - INFO - └─ Global Batch Size: 4
2025-08-30 12:55:58 - pico-train - INFO - └─ Per Device Batch Size: 1
2025-08-30 12:55:58 - pico-train - INFO - └─ Gradient Accumulation Steps: 4
2025-08-30 12:55:58 - pico-train - INFO - ==================================================
2025-08-30 12:55:58 - pico-train - INFO - Step 75000 -- 🔄 Training Metrics
2025-08-30 12:55:58 - pico-train - INFO - ├── Loss: 5.9842
2025-08-30 12:55:58 - pico-train - INFO - ├── Learning Rate: 8.57e-06
2025-08-30 12:55:58 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-30 12:55:58 - pico-train - INFO - Step 75000 -- 📈 Saving Learning Dynamics
2025-08-30 12:56:15 - pico-train - INFO - Step 75025 -- 🔄 Training Metrics
2025-08-30 12:56:15 - pico-train - INFO - ├── Loss: 5.7937
2025-08-30 12:56:15 - pico-train - INFO - ├── Learning Rate: 8.55e-06
2025-08-30 12:56:15 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-30 12:56:28 - pico-train - INFO - Step 75050 -- 🔄 Training Metrics
2025-08-30 12:56:28 - pico-train - INFO - ├── Loss: 5.7618
2025-08-30 12:56:28 - pico-train - INFO - ├── Learning Rate: 8.54e-06
2025-08-30 12:56:28 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-30 12:56:41 - pico-train - INFO - Step 75075 -- 🔄 Training Metrics
2025-08-30 12:56:41 - pico-train - INFO - ├── Loss: 5.6455
2025-08-30 12:56:41 - pico-train - INFO - ├── Learning Rate: 8.52e-06
2025-08-30 12:56:41 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-30 12:56:53 - pico-train - INFO - Step 75100 -- 🔄 Training Metrics
2025-08-30 12:56:53 - pico-train - INFO - ├── Loss: 5.7658
2025-08-30 12:56:53 - pico-train - INFO - ├── Learning Rate: 8.51e-06
2025-08-30 12:56:53 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-30 12:57:06 - pico-train - INFO - Step 75125 -- 🔄 Training Metrics
2025-08-30 12:57:06 - pico-train - INFO - ├── Loss: 5.7955
2025-08-30 12:57:06 - pico-train - INFO - ├── Learning Rate: 8.49e-06
2025-08-30 12:57:06 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-30 12:57:18 - pico-train - INFO - Step 75150 -- 🔄 Training Metrics
2025-08-30 12:57:18 - pico-train - INFO - ├── Loss: 5.7704
2025-08-30 12:57:18 - pico-train - INFO - ├── Learning Rate: 8.47e-06
2025-08-30 12:57:18 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-30 12:57:31 - pico-train - INFO - Step 75175 -- 🔄 Training Metrics
2025-08-30 12:57:31 - pico-train - INFO - ├── Loss: 5.7650
2025-08-30 12:57:31 - pico-train - INFO - ├── Learning Rate: 8.46e-06
2025-08-30 12:57:31 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-30 12:57:43 - pico-train - INFO - Step 75200 -- 🔄 Training Metrics
2025-08-30 12:57:43 - pico-train - INFO - ├── Loss: 5.8174
2025-08-30 12:57:43 - pico-train - INFO - ├── Learning Rate: 8.44e-06
2025-08-30 12:57:43 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-30 12:57:56 - pico-train - INFO - Step 75225 -- 🔄 Training Metrics
2025-08-30 12:57:56 - pico-train - INFO - ├── Loss: 5.8180
2025-08-30 12:57:56 - pico-train - INFO - ├── Learning Rate: 8.43e-06
2025-08-30 12:57:56 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-30 12:58:09 - pico-train - INFO - Step 75250 -- 🔄 Training Metrics
2025-08-30 12:58:09 - pico-train - INFO - ├── Loss: 5.7200
2025-08-30 12:58:09 - pico-train - INFO - ├── Learning Rate: 8.41e-06
2025-08-30 12:58:09 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-30 12:58:21 - pico-train - INFO - Step 75275 -- 🔄 Training Metrics
2025-08-30 12:58:21 - pico-train - INFO - ├── Loss: 5.7102
2025-08-30 12:58:21 - pico-train - INFO - ├── Learning Rate: 8.39e-06
2025-08-30 12:58:21 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-30 12:58:34 - pico-train - INFO - Step 75300 -- 🔄 Training Metrics
2025-08-30 12:58:34 - pico-train - INFO - ├── Loss: 5.6836
2025-08-30 12:58:34 - pico-train - INFO - ├── Learning Rate: 8.38e-06
2025-08-30 12:58:34 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-30 12:58:47 - pico-train - INFO - Step 75325 -- 🔄 Training Metrics
2025-08-30 12:58:47 - pico-train - INFO - ├── Loss: 5.7321
2025-08-30 12:58:47 - pico-train - INFO - ├── Learning Rate: 8.36e-06
2025-08-30 12:58:47 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-30 12:58:59 - pico-train - INFO - Step 75350 -- 🔄 Training Metrics
2025-08-30 12:58:59 - pico-train - INFO - ├── Loss: 5.7950
2025-08-30 12:58:59 - pico-train - INFO - ├── Learning Rate: 8.35e-06
2025-08-30 12:58:59 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-30 12:59:12 - pico-train - INFO - Step 75375 -- 🔄 Training Metrics
2025-08-30 12:59:12 - pico-train - INFO - ├── Loss: 5.7404
2025-08-30 12:59:12 - pico-train - INFO - ├── Learning Rate: 8.33e-06
2025-08-30 12:59:12 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-30 12:59:24 - pico-train - INFO - Step 75400 -- 🔄 Training Metrics
2025-08-30 12:59:24 - pico-train - INFO - ├── Loss: 5.7231
2025-08-30 12:59:24 - pico-train - INFO - ├── Learning Rate: 8.31e-06
2025-08-30 12:59:24 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-30 12:59:37 - pico-train - INFO - Step 75425 -- 🔄 Training Metrics
2025-08-30 12:59:37 - pico-train - INFO - ├── Loss: 5.8661
2025-08-30 12:59:37 - pico-train - INFO - ├── Learning Rate: 8.30e-06
2025-08-30 12:59:37 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-30 12:59:49 - pico-train - INFO - Step 75450 -- 🔄 Training Metrics
2025-08-30 12:59:49 - pico-train - INFO - ├── Loss: 5.7079
2025-08-30 12:59:49 - pico-train - INFO - ├── Learning Rate: 8.28e-06
2025-08-30 12:59:49 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-30 13:00:02 - pico-train - INFO - Step 75475 -- 🔄 Training Metrics
2025-08-30 13:00:02 - pico-train - INFO - ├── Loss: 5.8021
2025-08-30 13:00:02 - pico-train - INFO - ├── Learning Rate: 8.27e-06
2025-08-30 13:00:02 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-30 13:00:14 - pico-train - INFO - Step 75500 -- 💾 Saving Checkpoint
2025-08-30 13:02:14 - pico-train - INFO - Step 75500 -- 📊 Evaluation Results
2025-08-30 13:02:14 - pico-train - INFO - └── paloma: 1.1703533144695549e+32
2025-08-30 13:02:19 - pico-train - INFO - Step 75500 -- 🔄 Training Metrics
2025-08-30 13:02:19 - pico-train - INFO - ├── Loss: 5.7536
2025-08-30 13:02:19 - pico-train - INFO - ├── Learning Rate: 8.25e-06
2025-08-30 13:02:19 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-30 13:02:19 - pico-train - INFO - Step 75500 -- 📈 Saving Learning Dynamics
2025-08-30 13:02:35 - pico-train - INFO - Step 75525 -- 🔄 Training Metrics
2025-08-30 13:02:35 - pico-train - INFO - ├── Loss: 5.7883
2025-08-30 13:02:35 - pico-train - INFO - ├── Learning Rate: 8.23e-06
2025-08-30 13:02:35 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-30 13:02:47 - pico-train - INFO - Step 75550 -- 🔄 Training Metrics
2025-08-30 13:02:47 - pico-train - INFO - ├── Loss: 5.7518
2025-08-30 13:02:47 - pico-train - INFO - ├── Learning Rate: 8.22e-06
2025-08-30 13:02:47 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-30 13:03:00 - pico-train - INFO - Step 75575 -- 🔄 Training Metrics
2025-08-30 13:03:00 - pico-train - INFO - ├── Loss: 5.7713
2025-08-30 13:03:00 - pico-train - INFO - ├── Learning Rate: 8.20e-06
2025-08-30 13:03:00 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-30 13:03:12 - pico-train - INFO - Step 75600 -- 🔄 Training Metrics
2025-08-30 13:03:12 - pico-train - INFO - ├── Loss: 5.8570
2025-08-30 13:03:12 - pico-train - INFO - ├── Learning Rate: 8.19e-06
2025-08-30 13:03:12 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-30 13:03:25 - pico-train - INFO - Step 75625 -- 🔄 Training Metrics
2025-08-30 13:03:25 - pico-train - INFO - ├── Loss: 5.7343
2025-08-30 13:03:25 - pico-train - INFO - ├── Learning Rate: 8.17e-06
2025-08-30 13:03:25 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-30 13:03:37 - pico-train - INFO - Step 75650 -- 🔄 Training Metrics
2025-08-30 13:03:37 - pico-train - INFO - ├── Loss: 5.8057
2025-08-30 13:03:37 - pico-train - INFO - ├── Learning Rate: 8.16e-06
2025-08-30 13:03:37 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-30 13:03:50 - pico-train - INFO - Step 75675 -- 🔄 Training Metrics
2025-08-30 13:03:50 - pico-train - INFO - ├── Loss: 5.7960
2025-08-30 13:03:50 - pico-train - INFO - ├── Learning Rate: 8.14e-06
2025-08-30 13:03:50 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-30 13:04:03 - pico-train - INFO - Step 75700 -- 🔄 Training Metrics
2025-08-30 13:04:03 - pico-train - INFO - ├── Loss: 5.7937
2025-08-30 13:04:03 - pico-train - INFO - ├── Learning Rate: 8.12e-06
2025-08-30 13:04:03 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-30 13:04:15 - pico-train - INFO - Step 75725 -- 🔄 Training Metrics
2025-08-30 13:04:15 - pico-train - INFO - ├── Loss: 5.7846
2025-08-30 13:04:15 - pico-train - INFO - ├── Learning Rate: 8.11e-06
2025-08-30 13:04:15 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-30 13:04:28 - pico-train - INFO - Step 75750 -- 🔄 Training Metrics
2025-08-30 13:04:28 - pico-train - INFO - ├── Loss: 5.7307
2025-08-30 13:04:28 - pico-train - INFO - ├── Learning Rate: 8.09e-06
2025-08-30 13:04:28 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-30 13:04:40 - pico-train - INFO - Step 75775 -- 🔄 Training Metrics
2025-08-30 13:04:40 - pico-train - INFO - ├── Loss: 5.8029
2025-08-30 13:04:40 - pico-train - INFO - ├── Learning Rate: 8.08e-06
2025-08-30 13:04:40 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-30 13:04:53 - pico-train - INFO - Step 75800 -- 🔄 Training Metrics
2025-08-30 13:04:53 - pico-train - INFO - ├── Loss: 5.7910
2025-08-30 13:04:53 - pico-train - INFO - ├── Learning Rate: 8.06e-06
2025-08-30 13:04:53 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-30 13:05:05 - pico-train - INFO - Step 75825 -- 🔄 Training Metrics
2025-08-30 13:05:05 - pico-train - INFO - ├── Loss: 5.7281
2025-08-30 13:05:05 - pico-train - INFO - ├── Learning Rate: 8.05e-06
2025-08-30 13:05:05 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-30 13:05:18 - pico-train - INFO - Step 75850 -- 🔄 Training Metrics
2025-08-30 13:05:18 - pico-train - INFO - ├── Loss: 5.7593
2025-08-30 13:05:18 - pico-train - INFO - ├── Learning Rate: 8.03e-06
2025-08-30 13:05:18 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-30 13:05:31 - pico-train - INFO - Step 75875 -- 🔄 Training Metrics
2025-08-30 13:05:31 - pico-train - INFO - ├── Loss: 5.7519
2025-08-30 13:05:31 - pico-train - INFO - ├── Learning Rate: 8.01e-06
2025-08-30 13:05:31 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-30 13:05:43 - pico-train - INFO - Step 75900 -- 🔄 Training Metrics
2025-08-30 13:05:43 - pico-train - INFO - ├── Loss: 5.7349
2025-08-30 13:05:43 - pico-train - INFO - ├── Learning Rate: 8.00e-06
2025-08-30 13:05:43 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-30 13:05:56 - pico-train - INFO - Step 75925 -- 🔄 Training Metrics
2025-08-30 13:05:56 - pico-train - INFO - ├── Loss: 5.7110
2025-08-30 13:05:56 - pico-train - INFO - ├── Learning Rate: 7.98e-06
2025-08-30 13:05:56 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-30 13:06:08 - pico-train - INFO - Step 75950 -- 🔄 Training Metrics
2025-08-30 13:06:08 - pico-train - INFO - ├── Loss: 5.8281
2025-08-30 13:06:08 - pico-train - INFO - ├── Learning Rate: 7.97e-06
2025-08-30 13:06:08 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-30 13:06:21 - pico-train - INFO - Step 75975 -- 🔄 Training Metrics
2025-08-30 13:06:21 - pico-train - INFO - ├── Loss: 5.8051
2025-08-30 13:06:21 - pico-train - INFO - ├── Learning Rate: 7.95e-06
2025-08-30 13:06:21 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-30 13:06:33 - pico-train - INFO - Step 76000 -- 💾 Saving Checkpoint
2025-08-30 13:08:28 - pico-train - INFO - Step 76000 -- 📊 Evaluation Results
2025-08-30 13:08:28 - pico-train - INFO - └── paloma: 1.0635070314975952e+32
2025-08-30 13:08:34 - pico-train - INFO - Step 76000 -- 🔄 Training Metrics
2025-08-30 13:08:34 - pico-train - INFO - ├── Loss: 5.7022
2025-08-30 13:08:34 - pico-train - INFO - ├── Learning Rate: 7.94e-06
2025-08-30 13:08:34 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-30 13:08:34 - pico-train - INFO - Step 76000 -- 📈 Saving Learning Dynamics
2025-08-30 13:08:50 - pico-train - INFO - Step 76025 -- 🔄 Training Metrics
2025-08-30 13:08:50 - pico-train - INFO - ├── Loss: 5.7178
2025-08-30 13:08:50 - pico-train - INFO - ├── Learning Rate: 7.92e-06
2025-08-30 13:08:50 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-30 13:09:02 - pico-train - INFO - Step 76050 -- 🔄 Training Metrics
2025-08-30 13:09:02 - pico-train - INFO - ├── Loss: 5.7233
2025-08-30 13:09:02 - pico-train - INFO - ├── Learning Rate: 7.91e-06
2025-08-30 13:09:02 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-30 13:09:15 - pico-train - INFO - Step 76075 -- 🔄 Training Metrics
2025-08-30 13:09:15 - pico-train - INFO - ├── Loss: 5.7066
2025-08-30 13:09:15 - pico-train - INFO - ├── Learning Rate: 7.89e-06
2025-08-30 13:09:15 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-30 13:09:27 - pico-train - INFO - Step 76100 -- 🔄 Training Metrics
2025-08-30 13:09:27 - pico-train - INFO - ├── Loss: 5.8350
2025-08-30 13:09:27 - pico-train - INFO - ├── Learning Rate: 7.87e-06
2025-08-30 13:09:27 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-30 13:09:40 - pico-train - INFO - Step 76125 -- 🔄 Training Metrics
2025-08-30 13:09:40 - pico-train - INFO - ├── Loss: 5.7579
2025-08-30 13:09:40 - pico-train - INFO - ├── Learning Rate: 7.86e-06
2025-08-30 13:09:40 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-30 13:09:52 - pico-train - INFO - Step 76150 -- 🔄 Training Metrics
2025-08-30 13:09:52 - pico-train - INFO - ├── Loss: 5.6608
2025-08-30 13:09:52 - pico-train - INFO - ├── Learning Rate: 7.84e-06
2025-08-30 13:09:52 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-30 13:10:05 - pico-train - INFO - Step 76175 -- 🔄 Training Metrics
2025-08-30 13:10:05 - pico-train - INFO - ├── Loss: 5.8757
2025-08-30 13:10:05 - pico-train - INFO - ├── Learning Rate: 7.83e-06
2025-08-30 13:10:05 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-30 13:10:17 - pico-train - INFO - Step 76200 -- 🔄 Training Metrics
2025-08-30 13:10:17 - pico-train - INFO - ├── Loss: 5.8298
2025-08-30 13:10:17 - pico-train - INFO - ├── Learning Rate: 7.81e-06
2025-08-30 13:10:17 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-30 13:10:30 - pico-train - INFO - Step 76225 -- 🔄 Training Metrics
2025-08-30 13:10:30 - pico-train - INFO - ├── Loss: 5.7045
2025-08-30 13:10:30 - pico-train - INFO - ├── Learning Rate: 7.80e-06
2025-08-30 13:10:30 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-30 13:10:43 - pico-train - INFO - Step 76250 -- 🔄 Training Metrics
2025-08-30 13:10:43 - pico-train - INFO - ├── Loss: 5.6883
2025-08-30 13:10:43 - pico-train - INFO - ├── Learning Rate: 7.78e-06
2025-08-30 13:10:43 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-30 13:10:55 - pico-train - INFO - Step 76275 -- 🔄 Training Metrics
2025-08-30 13:10:55 - pico-train - INFO - ├── Loss: 5.7222
2025-08-30 13:10:55 - pico-train - INFO - ├── Learning Rate: 7.77e-06
2025-08-30 13:10:55 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-30 13:11:08 - pico-train - INFO - Step 76300 -- 🔄 Training Metrics
2025-08-30 13:11:08 - pico-train - INFO - ├── Loss: 5.6725
2025-08-30 13:11:08 - pico-train - INFO - ├── Learning Rate: 7.75e-06
2025-08-30 13:11:08 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-30 13:11:20 - pico-train - INFO - Step 76325 -- 🔄 Training Metrics
2025-08-30 13:11:20 - pico-train - INFO - ├── Loss: 5.6940
2025-08-30 13:11:20 - pico-train - INFO - ├── Learning Rate: 7.73e-06
2025-08-30 13:11:20 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-30 13:11:33 - pico-train - INFO - Step 76350 -- 🔄 Training Metrics
2025-08-30 13:11:33 - pico-train - INFO - ├── Loss: 5.7236
2025-08-30 13:11:33 - pico-train - INFO - ├── Learning Rate: 7.72e-06
2025-08-30 13:11:33 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-30 13:11:45 - pico-train - INFO - Step 76375 -- 🔄 Training Metrics
2025-08-30 13:11:45 - pico-train - INFO - ├── Loss: 5.8143
2025-08-30 13:11:45 - pico-train - INFO - ├── Learning Rate: 7.70e-06
2025-08-30 13:11:45 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-30 13:11:58 - pico-train - INFO - Step 76400 -- 🔄 Training Metrics
2025-08-30 13:11:58 - pico-train - INFO - ├── Loss: 5.6933
2025-08-30 13:11:58 - pico-train - INFO - ├── Learning Rate: 7.69e-06
2025-08-30 13:11:58 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-30 13:12:10 - pico-train - INFO - Step 76425 -- 🔄 Training Metrics
2025-08-30 13:12:10 - pico-train - INFO - ├── Loss: 5.7876
2025-08-30 13:12:10 - pico-train - INFO - ├── Learning Rate: 7.67e-06
2025-08-30 13:12:10 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-30 13:12:23 - pico-train - INFO - Step 76450 -- 🔄 Training Metrics
2025-08-30 13:12:23 - pico-train - INFO - ├── Loss: 5.8203
2025-08-30 13:12:23 - pico-train - INFO - ├── Learning Rate: 7.66e-06
2025-08-30 13:12:23 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-30 13:12:36 - pico-train - INFO - Step 76475 -- 🔄 Training Metrics
2025-08-30 13:12:36 - pico-train - INFO - ├── Loss: 5.7310
2025-08-30 13:12:36 - pico-train - INFO - ├── Learning Rate: 7.64e-06
2025-08-30 13:12:36 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-30 13:12:48 - pico-train - INFO - Step 76500 -- 💾 Saving Checkpoint
2025-08-30 13:14:40 - pico-train - INFO - Step 76500 -- 📊 Evaluation Results
2025-08-30 13:14:40 - pico-train - INFO - └── paloma: 1.3281493371595482e+32
2025-08-30 13:14:44 - pico-train - INFO - Step 76500 -- 🔄 Training Metrics
2025-08-30 13:14:44 - pico-train - INFO - ├── Loss: 5.7159
2025-08-30 13:14:44 - pico-train - INFO - ├── Learning Rate: 7.63e-06
2025-08-30 13:14:44 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-30 13:14:44 - pico-train - INFO - Step 76500 -- 📈 Saving Learning Dynamics
2025-08-30 13:14:59 - pico-train - INFO - Step 76525 -- 🔄 Training Metrics
2025-08-30 13:14:59 - pico-train - INFO - ├── Loss: 5.6999
2025-08-30 13:14:59 - pico-train - INFO - ├── Learning Rate: 7.61e-06
2025-08-30 13:14:59 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-30 13:15:12 - pico-train - INFO - Step 76550 -- 🔄 Training Metrics
2025-08-30 13:15:12 - pico-train - INFO - ├── Loss: 5.7363
2025-08-30 13:15:12 - pico-train - INFO - ├── Learning Rate: 7.60e-06
2025-08-30 13:15:12 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-30 13:15:24 - pico-train - INFO - Step 76575 -- 🔄 Training Metrics
2025-08-30 13:15:24 - pico-train - INFO - ├── Loss: 5.7486
2025-08-30 13:15:24 - pico-train - INFO - ├── Learning Rate: 7.58e-06
2025-08-30 13:15:24 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-30 13:15:37 - pico-train - INFO - Step 76600 -- 🔄 Training Metrics
2025-08-30 13:15:37 - pico-train - INFO - ├── Loss: 5.7345
2025-08-30 13:15:37 - pico-train - INFO - ├── Learning Rate: 7.57e-06
2025-08-30 13:15:37 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-30 13:15:50 - pico-train - INFO - Step 76625 -- 🔄 Training Metrics
2025-08-30 13:15:50 - pico-train - INFO - ├── Loss: 5.7601
2025-08-30 13:15:50 - pico-train - INFO - ├── Learning Rate: 7.55e-06
2025-08-30 13:15:50 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-30 13:16:02 - pico-train - INFO - Step 76650 -- 🔄 Training Metrics
2025-08-30 13:16:02 - pico-train - INFO - ├── Loss: 5.7545
2025-08-30 13:16:02 - pico-train - INFO - ├── Learning Rate: 7.53e-06
2025-08-30 13:16:02 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-30 13:16:15 - pico-train - INFO - Step 76675 -- 🔄 Training Metrics
2025-08-30 13:16:15 - pico-train - INFO - ├── Loss: 5.7052
2025-08-30 13:16:15 - pico-train - INFO - ├── Learning Rate: 7.52e-06
2025-08-30 13:16:15 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-30 13:16:27 - pico-train - INFO - Step 76700 -- 🔄 Training Metrics
2025-08-30 13:16:27 - pico-train - INFO - ├── Loss: 5.7594
2025-08-30 13:16:27 - pico-train - INFO - ├── Learning Rate: 7.50e-06
2025-08-30 13:16:27 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-30 13:16:40 - pico-train - INFO - Step 76725 -- 🔄 Training Metrics
2025-08-30 13:16:40 - pico-train - INFO - ├── Loss: 5.8056
2025-08-30 13:16:40 - pico-train - INFO - ├── Learning Rate: 7.49e-06
2025-08-30 13:16:40 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-30 13:16:53 - pico-train - INFO - Step 76750 -- 🔄 Training Metrics
2025-08-30 13:16:53 - pico-train - INFO - ├── Loss: 5.7480
2025-08-30 13:16:53 - pico-train - INFO - ├── Learning Rate: 7.47e-06
2025-08-30 13:16:53 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-30 13:17:05 - pico-train - INFO - Step 76775 -- 🔄 Training Metrics
2025-08-30 13:17:05 - pico-train - INFO - ├── Loss: 5.7824
2025-08-30 13:17:05 - pico-train - INFO - ├── Learning Rate: 7.46e-06
2025-08-30 13:17:05 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-30 13:17:18 - pico-train - INFO - Step 76800 -- 🔄 Training Metrics
2025-08-30 13:17:18 - pico-train - INFO - ├── Loss: 5.8136
2025-08-30 13:17:18 - pico-train - INFO - ├── Learning Rate: 7.44e-06
2025-08-30 13:17:18 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-30 13:17:30 - pico-train - INFO - Step 76825 -- 🔄 Training Metrics
2025-08-30 13:17:30 - pico-train - INFO - ├── Loss: 5.7666
2025-08-30 13:17:30 - pico-train - INFO - ├── Learning Rate: 7.43e-06
2025-08-30 13:17:30 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-30 13:17:43 - pico-train - INFO - Step 76850 -- 🔄 Training Metrics
2025-08-30 13:17:43 - pico-train - INFO - ├── Loss: 5.7527
2025-08-30 13:17:43 - pico-train - INFO - ├── Learning Rate: 7.41e-06
2025-08-30 13:17:43 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-30 13:17:55 - pico-train - INFO - Step 76875 -- 🔄 Training Metrics
2025-08-30 13:17:55 - pico-train - INFO - ├── Loss: 5.8082
2025-08-30 13:17:55 - pico-train - INFO - ├── Learning Rate: 7.40e-06
2025-08-30 13:17:55 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-30 13:18:08 - pico-train - INFO - Step 76900 -- 🔄 Training Metrics
2025-08-30 13:18:08 - pico-train - INFO - ├── Loss: 5.7924
2025-08-30 13:18:08 - pico-train - INFO - ├── Learning Rate: 7.38e-06
2025-08-30 13:18:08 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-30 13:18:21 - pico-train - INFO - Step 76925 -- 🔄 Training Metrics
2025-08-30 13:18:21 - pico-train - INFO - ├── Loss: 5.8537
2025-08-30 13:18:21 - pico-train - INFO - ├── Learning Rate: 7.37e-06
2025-08-30 13:18:21 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-30 13:18:33 - pico-train - INFO - Step 76950 -- 🔄 Training Metrics
2025-08-30 13:18:33 - pico-train - INFO - ├── Loss: 5.7541
2025-08-30 13:18:33 - pico-train - INFO - ├── Learning Rate: 7.35e-06
2025-08-30 13:18:33 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-30 13:18:46 - pico-train - INFO - Step 76975 -- 🔄 Training Metrics
2025-08-30 13:18:46 - pico-train - INFO - ├── Loss: 5.8442
2025-08-30 13:18:46 - pico-train - INFO - ├── Learning Rate: 7.34e-06
2025-08-30 13:18:46 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-30 13:18:58 - pico-train - INFO - Step 77000 -- 💾 Saving Checkpoint
2025-08-30 13:20:53 - pico-train - INFO - Step 77000 -- 📊 Evaluation Results
2025-08-30 13:20:53 - pico-train - INFO - └── paloma: 1.3262581519829998e+32
2025-08-30 13:20:57 - pico-train - INFO - Step 77000 -- 🔄 Training Metrics
2025-08-30 13:20:57 - pico-train - INFO - ├── Loss: 5.7803
2025-08-30 13:20:57 - pico-train - INFO - ├── Learning Rate: 7.32e-06
2025-08-30 13:20:57 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-30 13:20:57 - pico-train - INFO - Step 77000 -- 📈 Saving Learning Dynamics
2025-08-30 13:21:12 - pico-train - INFO - Step 77025 -- 🔄 Training Metrics
2025-08-30 13:21:12 - pico-train - INFO - ├── Loss: 5.8421
2025-08-30 13:21:12 - pico-train - INFO - ├── Learning Rate: 7.31e-06
2025-08-30 13:21:12 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-30 13:21:25 - pico-train - INFO - Step 77050 -- 🔄 Training Metrics
2025-08-30 13:21:25 - pico-train - INFO - ├── Loss: 5.7609
2025-08-30 13:21:25 - pico-train - INFO - ├── Learning Rate: 7.29e-06
2025-08-30 13:21:25 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-30 13:21:37 - pico-train - INFO - Step 77075 -- 🔄 Training Metrics
2025-08-30 13:21:37 - pico-train - INFO - ├── Loss: 5.7051
2025-08-30 13:21:37 - pico-train - INFO - ├── Learning Rate: 7.28e-06
2025-08-30 13:21:37 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-30 13:21:50 - pico-train - INFO - Step 77100 -- 🔄 Training Metrics
2025-08-30 13:21:50 - pico-train - INFO - ├── Loss: 5.6952
2025-08-30 13:21:50 - pico-train - INFO - ├── Learning Rate: 7.26e-06
2025-08-30 13:21:50 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-30 13:22:02 - pico-train - INFO - Step 77125 -- 🔄 Training Metrics
2025-08-30 13:22:02 - pico-train - INFO - ├── Loss: 5.7317
2025-08-30 13:22:02 - pico-train - INFO - ├── Learning Rate: 7.25e-06
2025-08-30 13:22:02 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-30 13:22:15 - pico-train - INFO - Step 77150 -- 🔄 Training Metrics
2025-08-30 13:22:15 - pico-train - INFO - ├── Loss: 5.8339
2025-08-30 13:22:15 - pico-train - INFO - ├── Learning Rate: 7.23e-06
2025-08-30 13:22:15 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-30 13:22:27 - pico-train - INFO - Step 77175 -- 🔄 Training Metrics
2025-08-30 13:22:27 - pico-train - INFO - ├── Loss: 5.8593
2025-08-30 13:22:27 - pico-train - INFO - ├── Learning Rate: 7.22e-06
2025-08-30 13:22:27 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-30 13:22:40 - pico-train - INFO - Step 77200 -- 🔄 Training Metrics
2025-08-30 13:22:40 - pico-train - INFO - ├── Loss: 5.7435
2025-08-30 13:22:40 - pico-train - INFO - ├── Learning Rate: 7.20e-06
2025-08-30 13:22:40 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-30 13:22:52 - pico-train - INFO - Step 77225 -- 🔄 Training Metrics
2025-08-30 13:22:52 - pico-train - INFO - ├── Loss: 5.8122
2025-08-30 13:22:52 - pico-train - INFO - ├── Learning Rate: 7.19e-06
2025-08-30 13:22:52 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-30 13:23:05 - pico-train - INFO - Step 77250 -- 🔄 Training Metrics
2025-08-30 13:23:05 - pico-train - INFO - ├── Loss: 5.7850
2025-08-30 13:23:05 - pico-train - INFO - ├── Learning Rate: 7.17e-06
2025-08-30 13:23:05 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-30 13:23:17 - pico-train - INFO - Step 77275 -- 🔄 Training Metrics
2025-08-30 13:23:17 - pico-train - INFO - ├── Loss: 5.7594
2025-08-30 13:23:17 - pico-train - INFO - ├── Learning Rate: 7.16e-06
2025-08-30 13:23:17 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-30 13:23:30 - pico-train - INFO - Step 77300 -- 🔄 Training Metrics
2025-08-30 13:23:30 - pico-train - INFO - ├── Loss: 5.7617
2025-08-30 13:23:30 - pico-train - INFO - ├── Learning Rate: 7.14e-06
2025-08-30 13:23:30 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-30 13:23:43 - pico-train - INFO - Step 77325 -- 🔄 Training Metrics
2025-08-30 13:23:43 - pico-train - INFO - ├── Loss: 5.8007
2025-08-30 13:23:43 - pico-train - INFO - ├── Learning Rate: 7.13e-06
2025-08-30 13:23:43 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-30 13:23:55 - pico-train - INFO - Step 77350 -- 🔄 Training Metrics
2025-08-30 13:23:55 - pico-train - INFO - ├── Loss: 5.7782
2025-08-30 13:23:55 - pico-train - INFO - ├── Learning Rate: 7.11e-06
2025-08-30 13:23:55 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-30 13:24:08 - pico-train - INFO - Step 77375 -- 🔄 Training Metrics
2025-08-30 13:24:08 - pico-train - INFO - ├── Loss: 5.6834
2025-08-30 13:24:08 - pico-train - INFO - ├── Learning Rate: 7.10e-06
2025-08-30 13:24:08 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-30 13:24:20 - pico-train - INFO - Step 77400 -- 🔄 Training Metrics
2025-08-30 13:24:20 - pico-train - INFO - ├── Loss: 5.8166
2025-08-30 13:24:20 - pico-train - INFO - ├── Learning Rate: 7.08e-06
2025-08-30 13:24:20 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-30 13:24:33 - pico-train - INFO - Step 77425 -- 🔄 Training Metrics
2025-08-30 13:24:33 - pico-train - INFO - ├── Loss: 5.8236
2025-08-30 13:24:33 - pico-train - INFO - ├── Learning Rate: 7.07e-06
2025-08-30 13:24:33 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-30 13:24:45 - pico-train - INFO - Step 77450 -- 🔄 Training Metrics
2025-08-30 13:24:45 - pico-train - INFO - ├── Loss: 5.8171
2025-08-30 13:24:45 - pico-train - INFO - ├── Learning Rate: 7.05e-06
2025-08-30 13:24:45 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-30 13:24:58 - pico-train - INFO - Step 77475 -- 🔄 Training Metrics
2025-08-30 13:24:58 - pico-train - INFO - ├── Loss: 5.8069
2025-08-30 13:24:58 - pico-train - INFO - ├── Learning Rate: 7.04e-06
2025-08-30 13:24:58 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-30 13:25:10 - pico-train - INFO - Step 77500 -- 💾 Saving Checkpoint
2025-08-30 13:27:10 - pico-train - INFO - Step 77500 -- 📊 Evaluation Results
2025-08-30 13:27:10 - pico-train - INFO - └── paloma: 1.37460615720984e+32
2025-08-30 13:27:13 - pico-train - INFO - Step 77500 -- 🔄 Training Metrics
2025-08-30 13:27:13 - pico-train - INFO - ├── Loss: 5.7086
2025-08-30 13:27:13 - pico-train - INFO - ├── Learning Rate: 7.02e-06
2025-08-30 13:27:13 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-30 13:27:13 - pico-train - INFO - Step 77500 -- 📈 Saving Learning Dynamics
2025-08-30 13:27:28 - pico-train - INFO - Step 77525 -- 🔄 Training Metrics
2025-08-30 13:27:28 - pico-train - INFO - ├── Loss: 5.7708
2025-08-30 13:27:28 - pico-train - INFO - ├── Learning Rate: 7.01e-06
2025-08-30 13:27:28 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-30 13:27:41 - pico-train - INFO - Step 77550 -- 🔄 Training Metrics
2025-08-30 13:27:41 - pico-train - INFO - ├── Loss: 5.7651
2025-08-30 13:27:41 - pico-train - INFO - ├── Learning Rate: 6.99e-06
2025-08-30 13:27:41 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-30 13:27:53 - pico-train - INFO - Step 77575 -- 🔄 Training Metrics
2025-08-30 13:27:53 - pico-train - INFO - ├── Loss: 5.7525
2025-08-30 13:27:53 - pico-train - INFO - ├── Learning Rate: 6.98e-06
2025-08-30 13:27:53 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-30 13:28:06 - pico-train - INFO - Step 77600 -- 🔄 Training Metrics
2025-08-30 13:28:06 - pico-train - INFO - ├── Loss: 5.7183
2025-08-30 13:28:06 - pico-train - INFO - ├── Learning Rate: 6.96e-06
2025-08-30 13:28:06 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-30 13:28:19 - pico-train - INFO - Step 77625 -- 🔄 Training Metrics
2025-08-30 13:28:19 - pico-train - INFO - ├── Loss: 5.8332
2025-08-30 13:28:19 - pico-train - INFO - ├── Learning Rate: 6.95e-06
2025-08-30 13:28:19 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-30 13:28:31 - pico-train - INFO - Step 77650 -- 🔄 Training Metrics
2025-08-30 13:28:31 - pico-train - INFO - ├── Loss: 5.7577
2025-08-30 13:28:31 - pico-train - INFO - ├── Learning Rate: 6.93e-06
2025-08-30 13:28:31 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-30 13:28:44 - pico-train - INFO - Step 77675 -- 🔄 Training Metrics
2025-08-30 13:28:44 - pico-train - INFO - ├── Loss: 5.7993
2025-08-30 13:28:44 - pico-train - INFO - ├── Learning Rate: 6.92e-06
2025-08-30 13:28:44 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-30 13:28:56 - pico-train - INFO - Step 77700 -- 🔄 Training Metrics
2025-08-30 13:28:56 - pico-train - INFO - ├── Loss: 5.6537
2025-08-30 13:28:56 - pico-train - INFO - ├── Learning Rate: 6.90e-06
2025-08-30 13:28:56 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-30 13:29:09 - pico-train - INFO - Step 77725 -- 🔄 Training Metrics
2025-08-30 13:29:09 - pico-train - INFO - ├── Loss: 5.7031
2025-08-30 13:29:09 - pico-train - INFO - ├── Learning Rate: 6.89e-06
2025-08-30 13:29:09 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 13:29:21 - pico-train - INFO - Step 77750 -- 🔄 Training Metrics 2025-08-30 13:29:21 - pico-train - INFO - ├── Loss: 5.8388 2025-08-30 13:29:21 - pico-train - INFO - ├── Learning Rate: 6.88e-06 2025-08-30 13:29:21 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 13:29:34 - pico-train - INFO - Step 77775 -- 🔄 Training Metrics 2025-08-30 13:29:34 - pico-train - INFO - ├── Loss: 5.7670 2025-08-30 13:29:34 - pico-train - INFO - ├── Learning Rate: 6.86e-06 2025-08-30 13:29:34 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 13:29:47 - pico-train - INFO - Step 77800 -- 🔄 Training Metrics 2025-08-30 13:29:47 - pico-train - INFO - ├── Loss: 5.7904 2025-08-30 13:29:47 - pico-train - INFO - ├── Learning Rate: 6.85e-06 2025-08-30 13:29:47 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 13:29:59 - pico-train - INFO - Step 77825 -- 🔄 Training Metrics 2025-08-30 13:29:59 - pico-train - INFO - ├── Loss: 5.7841 2025-08-30 13:29:59 - pico-train - INFO - ├── Learning Rate: 6.83e-06 2025-08-30 13:29:59 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 13:30:12 - pico-train - INFO - Step 77850 -- 🔄 Training Metrics 2025-08-30 13:30:12 - pico-train - INFO - ├── Loss: 5.6992 2025-08-30 13:30:12 - pico-train - INFO - ├── Learning Rate: 6.82e-06 2025-08-30 13:30:12 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 13:30:24 - pico-train - INFO - Step 77875 -- 🔄 Training Metrics 2025-08-30 13:30:24 - pico-train - INFO - ├── Loss: 5.8001 2025-08-30 13:30:24 - pico-train - INFO - ├── Learning Rate: 6.80e-06 2025-08-30 13:30:24 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 13:30:37 - pico-train - INFO - Step 77900 -- 🔄 Training Metrics 2025-08-30 13:30:37 - pico-train - INFO - ├── Loss: 5.8439 2025-08-30 13:30:37 - pico-train - INFO - ├── Learning Rate: 6.79e-06 2025-08-30 13:30:37 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 13:30:49 - pico-train - INFO - Step 77925 -- 🔄 Training 
Metrics 2025-08-30 13:30:49 - pico-train - INFO - ├── Loss: 5.6487 2025-08-30 13:30:49 - pico-train - INFO - ├── Learning Rate: 6.77e-06 2025-08-30 13:30:49 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 13:31:02 - pico-train - INFO - Step 77950 -- 🔄 Training Metrics 2025-08-30 13:31:02 - pico-train - INFO - ├── Loss: 5.7602 2025-08-30 13:31:02 - pico-train - INFO - ├── Learning Rate: 6.76e-06 2025-08-30 13:31:02 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 13:31:15 - pico-train - INFO - Step 77975 -- 🔄 Training Metrics 2025-08-30 13:31:15 - pico-train - INFO - ├── Loss: 5.7373 2025-08-30 13:31:15 - pico-train - INFO - ├── Learning Rate: 6.74e-06 2025-08-30 13:31:15 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 13:31:27 - pico-train - INFO - Step 78000 -- 💾 Saving Checkpoint 2025-08-30 13:33:24 - pico-train - INFO - Step 78000 -- 📊 Evaluation Results 2025-08-30 13:33:24 - pico-train - INFO - └── paloma: 1.482656221809282e+32 2025-08-30 13:33:27 - pico-train - INFO - Step 78000 -- 🔄 Training Metrics 2025-08-30 13:33:27 - pico-train - INFO - ├── Loss: 5.9167 2025-08-30 13:33:27 - pico-train - INFO - ├── Learning Rate: 6.73e-06 2025-08-30 13:33:27 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 13:33:27 - pico-train - INFO - Step 78000 -- 📈 Saving Learning Dynamics 2025-08-30 13:33:42 - pico-train - INFO - Step 78025 -- 🔄 Training Metrics 2025-08-30 13:33:42 - pico-train - INFO - ├── Loss: 5.7856 2025-08-30 13:33:42 - pico-train - INFO - ├── Learning Rate: 6.71e-06 2025-08-30 13:33:42 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 13:33:55 - pico-train - INFO - Step 78050 -- 🔄 Training Metrics 2025-08-30 13:33:55 - pico-train - INFO - ├── Loss: 5.7644 2025-08-30 13:33:55 - pico-train - INFO - ├── Learning Rate: 6.70e-06 2025-08-30 13:33:55 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 13:34:07 - pico-train - INFO - Step 78075 -- 🔄 Training Metrics 2025-08-30 13:34:07 - pico-train - INFO - ├── Loss: 5.8183 2025-08-30 
13:34:07 - pico-train - INFO - ├── Learning Rate: 6.69e-06 2025-08-30 13:34:07 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 13:34:20 - pico-train - INFO - Step 78100 -- 🔄 Training Metrics 2025-08-30 13:34:20 - pico-train - INFO - ├── Loss: 5.7616 2025-08-30 13:34:20 - pico-train - INFO - ├── Learning Rate: 6.67e-06 2025-08-30 13:34:20 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 13:34:32 - pico-train - INFO - Step 78125 -- 🔄 Training Metrics 2025-08-30 13:34:32 - pico-train - INFO - ├── Loss: 5.8173 2025-08-30 13:34:32 - pico-train - INFO - ├── Learning Rate: 6.66e-06 2025-08-30 13:34:32 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 13:34:45 - pico-train - INFO - Step 78150 -- 🔄 Training Metrics 2025-08-30 13:34:45 - pico-train - INFO - ├── Loss: 5.7469 2025-08-30 13:34:45 - pico-train - INFO - ├── Learning Rate: 6.64e-06 2025-08-30 13:34:45 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 13:34:57 - pico-train - INFO - Step 78175 -- 🔄 Training Metrics 2025-08-30 13:34:57 - pico-train - INFO - ├── Loss: 5.7143 2025-08-30 13:34:57 - pico-train - INFO - ├── Learning Rate: 6.63e-06 2025-08-30 13:34:57 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 13:35:10 - pico-train - INFO - Step 78200 -- 🔄 Training Metrics 2025-08-30 13:35:10 - pico-train - INFO - ├── Loss: 5.7608 2025-08-30 13:35:10 - pico-train - INFO - ├── Learning Rate: 6.61e-06 2025-08-30 13:35:10 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 13:35:22 - pico-train - INFO - Step 78225 -- 🔄 Training Metrics 2025-08-30 13:35:22 - pico-train - INFO - ├── Loss: 5.7547 2025-08-30 13:35:22 - pico-train - INFO - ├── Learning Rate: 6.60e-06 2025-08-30 13:35:22 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 13:35:35 - pico-train - INFO - Step 78250 -- 🔄 Training Metrics 2025-08-30 13:35:35 - pico-train - INFO - ├── Loss: 5.7887 2025-08-30 13:35:35 - pico-train - INFO - ├── Learning Rate: 6.58e-06 2025-08-30 13:35:35 - pico-train - INFO - └── Inf/NaN count: 0 
2025-08-30 13:35:47 - pico-train - INFO - Step 78275 -- 🔄 Training Metrics 2025-08-30 13:35:47 - pico-train - INFO - ├── Loss: 5.7825 2025-08-30 13:35:47 - pico-train - INFO - ├── Learning Rate: 6.57e-06 2025-08-30 13:35:47 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 13:36:00 - pico-train - INFO - Step 78300 -- 🔄 Training Metrics 2025-08-30 13:36:00 - pico-train - INFO - ├── Loss: 5.8398 2025-08-30 13:36:00 - pico-train - INFO - ├── Learning Rate: 6.56e-06 2025-08-30 13:36:00 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 13:36:12 - pico-train - INFO - Step 78325 -- 🔄 Training Metrics 2025-08-30 13:36:12 - pico-train - INFO - ├── Loss: 5.8214 2025-08-30 13:36:12 - pico-train - INFO - ├── Learning Rate: 6.54e-06 2025-08-30 13:36:12 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 13:36:25 - pico-train - INFO - Step 78350 -- 🔄 Training Metrics 2025-08-30 13:36:25 - pico-train - INFO - ├── Loss: 5.7599 2025-08-30 13:36:25 - pico-train - INFO - ├── Learning Rate: 6.53e-06 2025-08-30 13:36:25 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 13:36:38 - pico-train - INFO - Step 78375 -- 🔄 Training Metrics 2025-08-30 13:36:38 - pico-train - INFO - ├── Loss: 5.7524 2025-08-30 13:36:38 - pico-train - INFO - ├── Learning Rate: 6.51e-06 2025-08-30 13:36:38 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 13:36:50 - pico-train - INFO - Step 78400 -- 🔄 Training Metrics 2025-08-30 13:36:50 - pico-train - INFO - ├── Loss: 5.7680 2025-08-30 13:36:50 - pico-train - INFO - ├── Learning Rate: 6.50e-06 2025-08-30 13:36:50 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 13:37:03 - pico-train - INFO - Step 78425 -- 🔄 Training Metrics 2025-08-30 13:37:03 - pico-train - INFO - ├── Loss: 5.8255 2025-08-30 13:37:03 - pico-train - INFO - ├── Learning Rate: 6.48e-06 2025-08-30 13:37:03 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 13:37:15 - pico-train - INFO - Step 78450 -- 🔄 Training Metrics 2025-08-30 13:37:15 - pico-train - INFO - ├── Loss: 
5.8206 2025-08-30 13:37:15 - pico-train - INFO - ├── Learning Rate: 6.47e-06 2025-08-30 13:37:15 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 13:37:28 - pico-train - INFO - Step 78475 -- 🔄 Training Metrics 2025-08-30 13:37:28 - pico-train - INFO - ├── Loss: 5.6984 2025-08-30 13:37:28 - pico-train - INFO - ├── Learning Rate: 6.45e-06 2025-08-30 13:37:28 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 13:37:40 - pico-train - INFO - Step 78500 -- 💾 Saving Checkpoint 2025-08-30 13:39:44 - pico-train - INFO - Step 78500 -- 📊 Evaluation Results 2025-08-30 13:39:44 - pico-train - INFO - └── paloma: 1.4987575124157111e+32 2025-08-30 13:39:47 - pico-train - INFO - Step 78500 -- 🔄 Training Metrics 2025-08-30 13:39:47 - pico-train - INFO - ├── Loss: 5.8354 2025-08-30 13:39:47 - pico-train - INFO - ├── Learning Rate: 6.44e-06 2025-08-30 13:39:47 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 13:39:47 - pico-train - INFO - Step 78500 -- 📈 Saving Learning Dynamics 2025-08-30 13:40:02 - pico-train - INFO - Step 78525 -- 🔄 Training Metrics 2025-08-30 13:40:02 - pico-train - INFO - ├── Loss: 5.7395 2025-08-30 13:40:02 - pico-train - INFO - ├── Learning Rate: 6.43e-06 2025-08-30 13:40:02 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 13:40:14 - pico-train - INFO - Step 78550 -- 🔄 Training Metrics 2025-08-30 13:40:14 - pico-train - INFO - ├── Loss: 5.7707 2025-08-30 13:40:14 - pico-train - INFO - ├── Learning Rate: 6.41e-06 2025-08-30 13:40:14 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 13:40:27 - pico-train - INFO - Step 78575 -- 🔄 Training Metrics 2025-08-30 13:40:27 - pico-train - INFO - ├── Loss: 5.6931 2025-08-30 13:40:27 - pico-train - INFO - ├── Learning Rate: 6.40e-06 2025-08-30 13:40:27 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 13:40:40 - pico-train - INFO - Step 78600 -- 🔄 Training Metrics 2025-08-30 13:40:40 - pico-train - INFO - ├── Loss: 5.6957 2025-08-30 13:40:40 - pico-train - INFO - ├── Learning Rate: 6.38e-06 
2025-08-30 13:40:40 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 13:40:52 - pico-train - INFO - Step 78625 -- 🔄 Training Metrics 2025-08-30 13:40:52 - pico-train - INFO - ├── Loss: 5.8155 2025-08-30 13:40:52 - pico-train - INFO - ├── Learning Rate: 6.37e-06 2025-08-30 13:40:52 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 13:41:05 - pico-train - INFO - Step 78650 -- 🔄 Training Metrics 2025-08-30 13:41:05 - pico-train - INFO - ├── Loss: 5.7366 2025-08-30 13:41:05 - pico-train - INFO - ├── Learning Rate: 6.35e-06 2025-08-30 13:41:05 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 13:41:18 - pico-train - INFO - Step 78675 -- 🔄 Training Metrics 2025-08-30 13:41:18 - pico-train - INFO - ├── Loss: 5.7280 2025-08-30 13:41:18 - pico-train - INFO - ├── Learning Rate: 6.34e-06 2025-08-30 13:41:18 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 13:41:30 - pico-train - INFO - Step 78700 -- 🔄 Training Metrics 2025-08-30 13:41:30 - pico-train - INFO - ├── Loss: 5.7837 2025-08-30 13:41:30 - pico-train - INFO - ├── Learning Rate: 6.33e-06 2025-08-30 13:41:30 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 13:41:43 - pico-train - INFO - Step 78725 -- 🔄 Training Metrics 2025-08-30 13:41:43 - pico-train - INFO - ├── Loss: 5.7363 2025-08-30 13:41:43 - pico-train - INFO - ├── Learning Rate: 6.31e-06 2025-08-30 13:41:43 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 13:41:55 - pico-train - INFO - Step 78750 -- 🔄 Training Metrics 2025-08-30 13:41:55 - pico-train - INFO - ├── Loss: 5.7484 2025-08-30 13:41:55 - pico-train - INFO - ├── Learning Rate: 6.30e-06 2025-08-30 13:41:55 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 13:42:08 - pico-train - INFO - Step 78775 -- 🔄 Training Metrics 2025-08-30 13:42:08 - pico-train - INFO - ├── Loss: 5.7971 2025-08-30 13:42:08 - pico-train - INFO - ├── Learning Rate: 6.28e-06 2025-08-30 13:42:08 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 13:42:20 - pico-train - INFO - Step 78800 -- 🔄 Training 
Metrics 2025-08-30 13:42:20 - pico-train - INFO - ├── Loss: 5.7364 2025-08-30 13:42:20 - pico-train - INFO - ├── Learning Rate: 6.27e-06 2025-08-30 13:42:20 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 13:42:33 - pico-train - INFO - Step 78825 -- 🔄 Training Metrics 2025-08-30 13:42:33 - pico-train - INFO - ├── Loss: 5.7123 2025-08-30 13:42:33 - pico-train - INFO - ├── Learning Rate: 6.26e-06 2025-08-30 13:42:33 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 13:42:45 - pico-train - INFO - Step 78850 -- 🔄 Training Metrics 2025-08-30 13:42:45 - pico-train - INFO - ├── Loss: 5.7304 2025-08-30 13:42:45 - pico-train - INFO - ├── Learning Rate: 6.24e-06 2025-08-30 13:42:45 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 13:42:58 - pico-train - INFO - Step 78875 -- 🔄 Training Metrics 2025-08-30 13:42:58 - pico-train - INFO - ├── Loss: 5.8327 2025-08-30 13:42:58 - pico-train - INFO - ├── Learning Rate: 6.23e-06 2025-08-30 13:42:58 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 13:43:10 - pico-train - INFO - Step 78900 -- 🔄 Training Metrics 2025-08-30 13:43:10 - pico-train - INFO - ├── Loss: 5.8177 2025-08-30 13:43:10 - pico-train - INFO - ├── Learning Rate: 6.21e-06 2025-08-30 13:43:10 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 13:43:23 - pico-train - INFO - Step 78925 -- 🔄 Training Metrics 2025-08-30 13:43:23 - pico-train - INFO - ├── Loss: 5.7848 2025-08-30 13:43:23 - pico-train - INFO - ├── Learning Rate: 6.20e-06 2025-08-30 13:43:23 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 13:43:36 - pico-train - INFO - Step 78950 -- 🔄 Training Metrics 2025-08-30 13:43:36 - pico-train - INFO - ├── Loss: 5.7463 2025-08-30 13:43:36 - pico-train - INFO - ├── Learning Rate: 6.19e-06 2025-08-30 13:43:36 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 13:43:48 - pico-train - INFO - Step 78975 -- 🔄 Training Metrics 2025-08-30 13:43:48 - pico-train - INFO - ├── Loss: 5.7602 2025-08-30 13:43:48 - pico-train - INFO - ├── Learning Rate: 
6.17e-06 2025-08-30 13:43:48 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 13:44:00 - pico-train - INFO - Step 79000 -- 💾 Saving Checkpoint 2025-08-30 13:46:09 - pico-train - INFO - Step 79000 -- 📊 Evaluation Results 2025-08-30 13:46:09 - pico-train - INFO - └── paloma: 1.6995062784982508e+32 2025-08-30 13:46:12 - pico-train - INFO - Step 79000 -- 🔄 Training Metrics 2025-08-30 13:46:12 - pico-train - INFO - ├── Loss: 5.6751 2025-08-30 13:46:12 - pico-train - INFO - ├── Learning Rate: 6.16e-06 2025-08-30 13:46:12 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 13:46:12 - pico-train - INFO - Step 79000 -- 📈 Saving Learning Dynamics 2025-08-30 13:46:27 - pico-train - INFO - Step 79025 -- 🔄 Training Metrics 2025-08-30 13:46:27 - pico-train - INFO - ├── Loss: 5.7435 2025-08-30 13:46:27 - pico-train - INFO - ├── Learning Rate: 6.14e-06 2025-08-30 13:46:27 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 13:46:40 - pico-train - INFO - Step 79050 -- 🔄 Training Metrics 2025-08-30 13:46:40 - pico-train - INFO - ├── Loss: 5.7299 2025-08-30 13:46:40 - pico-train - INFO - ├── Learning Rate: 6.13e-06 2025-08-30 13:46:40 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 13:46:52 - pico-train - INFO - Step 79075 -- 🔄 Training Metrics 2025-08-30 13:46:52 - pico-train - INFO - ├── Loss: 5.6822 2025-08-30 13:46:52 - pico-train - INFO - ├── Learning Rate: 6.12e-06 2025-08-30 13:46:52 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 13:47:05 - pico-train - INFO - Step 79100 -- 🔄 Training Metrics 2025-08-30 13:47:05 - pico-train - INFO - ├── Loss: 5.8018 2025-08-30 13:47:05 - pico-train - INFO - ├── Learning Rate: 6.10e-06 2025-08-30 13:47:05 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 13:47:17 - pico-train - INFO - Step 79125 -- 🔄 Training Metrics 2025-08-30 13:47:17 - pico-train - INFO - ├── Loss: 5.7502 2025-08-30 13:47:17 - pico-train - INFO - ├── Learning Rate: 6.09e-06 2025-08-30 13:47:17 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 
13:47:30 - pico-train - INFO - Step 79150 -- 🔄 Training Metrics 2025-08-30 13:47:30 - pico-train - INFO - ├── Loss: 5.7981 2025-08-30 13:47:30 - pico-train - INFO - ├── Learning Rate: 6.07e-06 2025-08-30 13:47:30 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 13:47:43 - pico-train - INFO - Step 79175 -- 🔄 Training Metrics 2025-08-30 13:47:43 - pico-train - INFO - ├── Loss: 5.7462 2025-08-30 13:47:43 - pico-train - INFO - ├── Learning Rate: 6.06e-06 2025-08-30 13:47:43 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 13:47:55 - pico-train - INFO - Step 79200 -- 🔄 Training Metrics 2025-08-30 13:47:55 - pico-train - INFO - ├── Loss: 5.7083 2025-08-30 13:47:55 - pico-train - INFO - ├── Learning Rate: 6.05e-06 2025-08-30 13:47:55 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 13:48:08 - pico-train - INFO - Step 79225 -- 🔄 Training Metrics 2025-08-30 13:48:08 - pico-train - INFO - ├── Loss: 5.7592 2025-08-30 13:48:08 - pico-train - INFO - ├── Learning Rate: 6.03e-06 2025-08-30 13:48:08 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 13:48:21 - pico-train - INFO - Step 79250 -- 🔄 Training Metrics 2025-08-30 13:48:21 - pico-train - INFO - ├── Loss: 5.7599 2025-08-30 13:48:21 - pico-train - INFO - ├── Learning Rate: 6.02e-06 2025-08-30 13:48:21 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 13:48:33 - pico-train - INFO - Step 79275 -- 🔄 Training Metrics 2025-08-30 13:48:33 - pico-train - INFO - ├── Loss: 5.7827 2025-08-30 13:48:33 - pico-train - INFO - ├── Learning Rate: 6.00e-06 2025-08-30 13:48:33 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 13:48:46 - pico-train - INFO - Step 79300 -- 🔄 Training Metrics 2025-08-30 13:48:46 - pico-train - INFO - ├── Loss: 5.7638 2025-08-30 13:48:46 - pico-train - INFO - ├── Learning Rate: 5.99e-06 2025-08-30 13:48:46 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 13:48:59 - pico-train - INFO - Step 79325 -- 🔄 Training Metrics 2025-08-30 13:48:59 - pico-train - INFO - ├── Loss: 5.7239 
2025-08-30 13:48:59 - pico-train - INFO - ├── Learning Rate: 5.98e-06 2025-08-30 13:48:59 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 13:49:11 - pico-train - INFO - Step 79350 -- 🔄 Training Metrics 2025-08-30 13:49:11 - pico-train - INFO - ├── Loss: 5.7121 2025-08-30 13:49:11 - pico-train - INFO - ├── Learning Rate: 5.96e-06 2025-08-30 13:49:11 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 13:49:24 - pico-train - INFO - Step 79375 -- 🔄 Training Metrics 2025-08-30 13:49:24 - pico-train - INFO - ├── Loss: 5.6850 2025-08-30 13:49:24 - pico-train - INFO - ├── Learning Rate: 5.95e-06 2025-08-30 13:49:24 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 13:49:37 - pico-train - INFO - Step 79400 -- 🔄 Training Metrics 2025-08-30 13:49:37 - pico-train - INFO - ├── Loss: 5.7397 2025-08-30 13:49:37 - pico-train - INFO - ├── Learning Rate: 5.93e-06 2025-08-30 13:49:37 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 13:49:49 - pico-train - INFO - Step 79425 -- 🔄 Training Metrics 2025-08-30 13:49:49 - pico-train - INFO - ├── Loss: 5.8555 2025-08-30 13:49:49 - pico-train - INFO - ├── Learning Rate: 5.92e-06 2025-08-30 13:49:49 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 13:50:02 - pico-train - INFO - Step 79450 -- 🔄 Training Metrics 2025-08-30 13:50:02 - pico-train - INFO - ├── Loss: 5.6932 2025-08-30 13:50:02 - pico-train - INFO - ├── Learning Rate: 5.91e-06 2025-08-30 13:50:02 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 13:50:14 - pico-train - INFO - Step 79475 -- 🔄 Training Metrics 2025-08-30 13:50:14 - pico-train - INFO - ├── Loss: 5.7935 2025-08-30 13:50:14 - pico-train - INFO - ├── Learning Rate: 5.89e-06 2025-08-30 13:50:14 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 13:50:26 - pico-train - INFO - Step 79500 -- 💾 Saving Checkpoint 2025-08-30 13:52:29 - pico-train - INFO - Step 79500 -- 📊 Evaluation Results 2025-08-30 13:52:29 - pico-train - INFO - └── paloma: 1.7804004273080345e+32 2025-08-30 13:52:32 - pico-train - 
INFO - Step 79500 -- 🔄 Training Metrics 2025-08-30 13:52:32 - pico-train - INFO - ├── Loss: 5.7940 2025-08-30 13:52:32 - pico-train - INFO - ├── Learning Rate: 5.88e-06 2025-08-30 13:52:32 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 13:52:32 - pico-train - INFO - Step 79500 -- 📈 Saving Learning Dynamics 2025-08-30 13:52:48 - pico-train - INFO - Step 79525 -- 🔄 Training Metrics 2025-08-30 13:52:48 - pico-train - INFO - ├── Loss: 5.7546 2025-08-30 13:52:48 - pico-train - INFO - ├── Learning Rate: 5.87e-06 2025-08-30 13:52:48 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 13:53:01 - pico-train - INFO - Step 79550 -- 🔄 Training Metrics 2025-08-30 13:53:01 - pico-train - INFO - ├── Loss: 5.7345 2025-08-30 13:53:01 - pico-train - INFO - ├── Learning Rate: 5.85e-06 2025-08-30 13:53:01 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 13:53:13 - pico-train - INFO - Step 79575 -- 🔄 Training Metrics 2025-08-30 13:53:13 - pico-train - INFO - ├── Loss: 5.7131 2025-08-30 13:53:13 - pico-train - INFO - ├── Learning Rate: 5.84e-06 2025-08-30 13:53:13 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 13:53:26 - pico-train - INFO - Step 79600 -- 🔄 Training Metrics 2025-08-30 13:53:26 - pico-train - INFO - ├── Loss: 5.7361 2025-08-30 13:53:26 - pico-train - INFO - ├── Learning Rate: 5.82e-06 2025-08-30 13:53:26 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 13:53:39 - pico-train - INFO - Step 79625 -- 🔄 Training Metrics 2025-08-30 13:53:39 - pico-train - INFO - ├── Loss: 5.7601 2025-08-30 13:53:39 - pico-train - INFO - ├── Learning Rate: 5.81e-06 2025-08-30 13:53:39 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 13:53:51 - pico-train - INFO - Step 79650 -- 🔄 Training Metrics 2025-08-30 13:53:51 - pico-train - INFO - ├── Loss: 5.6957 2025-08-30 13:53:51 - pico-train - INFO - ├── Learning Rate: 5.80e-06 2025-08-30 13:53:51 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 13:54:04 - pico-train - INFO - Step 79675 -- 🔄 Training Metrics 
2025-08-30 13:54:04 - pico-train - INFO - ├── Loss: 5.7019 2025-08-30 13:54:04 - pico-train - INFO - ├── Learning Rate: 5.78e-06 2025-08-30 13:54:04 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 13:54:17 - pico-train - INFO - Step 79700 -- 🔄 Training Metrics 2025-08-30 13:54:17 - pico-train - INFO - ├── Loss: 5.7928 2025-08-30 13:54:17 - pico-train - INFO - ├── Learning Rate: 5.77e-06 2025-08-30 13:54:17 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 13:54:29 - pico-train - INFO - Step 79725 -- 🔄 Training Metrics 2025-08-30 13:54:29 - pico-train - INFO - ├── Loss: 5.7410 2025-08-30 13:54:29 - pico-train - INFO - ├── Learning Rate: 5.76e-06 2025-08-30 13:54:29 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 13:54:42 - pico-train - INFO - Step 79750 -- 🔄 Training Metrics 2025-08-30 13:54:42 - pico-train - INFO - ├── Loss: 5.7772 2025-08-30 13:54:42 - pico-train - INFO - ├── Learning Rate: 5.74e-06 2025-08-30 13:54:42 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 13:54:55 - pico-train - INFO - Step 79775 -- 🔄 Training Metrics 2025-08-30 13:54:55 - pico-train - INFO - ├── Loss: 5.8042 2025-08-30 13:54:55 - pico-train - INFO - ├── Learning Rate: 5.73e-06 2025-08-30 13:54:55 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 13:55:07 - pico-train - INFO - Step 79800 -- 🔄 Training Metrics 2025-08-30 13:55:07 - pico-train - INFO - ├── Loss: 5.7606 2025-08-30 13:55:07 - pico-train - INFO - ├── Learning Rate: 5.72e-06 2025-08-30 13:55:07 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 13:55:20 - pico-train - INFO - Step 79825 -- 🔄 Training Metrics 2025-08-30 13:55:20 - pico-train - INFO - ├── Loss: 5.7374 2025-08-30 13:55:20 - pico-train - INFO - ├── Learning Rate: 5.70e-06 2025-08-30 13:55:20 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 13:55:33 - pico-train - INFO - Step 79850 -- 🔄 Training Metrics 2025-08-30 13:55:33 - pico-train - INFO - ├── Loss: 5.7220 2025-08-30 13:55:33 - pico-train - INFO - ├── Learning Rate: 5.69e-06 
2025-08-30 13:55:33 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 13:55:45 - pico-train - INFO - Step 79875 -- 🔄 Training Metrics 2025-08-30 13:55:45 - pico-train - INFO - ├── Loss: 5.7824 2025-08-30 13:55:45 - pico-train - INFO - ├── Learning Rate: 5.67e-06 2025-08-30 13:55:45 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 13:55:58 - pico-train - INFO - Step 79900 -- 🔄 Training Metrics 2025-08-30 13:55:58 - pico-train - INFO - ├── Loss: 5.7782 2025-08-30 13:55:58 - pico-train - INFO - ├── Learning Rate: 5.66e-06 2025-08-30 13:55:58 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 13:56:10 - pico-train - INFO - Step 79925 -- 🔄 Training Metrics 2025-08-30 13:56:10 - pico-train - INFO - ├── Loss: 5.8175 2025-08-30 13:56:10 - pico-train - INFO - ├── Learning Rate: 5.65e-06 2025-08-30 13:56:10 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 13:56:23 - pico-train - INFO - Step 79950 -- 🔄 Training Metrics 2025-08-30 13:56:23 - pico-train - INFO - ├── Loss: 5.7613 2025-08-30 13:56:23 - pico-train - INFO - ├── Learning Rate: 5.63e-06 2025-08-30 13:56:23 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 13:56:36 - pico-train - INFO - Step 79975 -- 🔄 Training Metrics 2025-08-30 13:56:36 - pico-train - INFO - ├── Loss: 5.7550 2025-08-30 13:56:36 - pico-train - INFO - ├── Learning Rate: 5.62e-06 2025-08-30 13:56:36 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 13:56:48 - pico-train - INFO - Step 80000 -- 💾 Saving Checkpoint 2025-08-30 13:58:47 - pico-train - INFO - Step 80000 -- 📊 Evaluation Results 2025-08-30 13:58:47 - pico-train - INFO - └── paloma: 1.987722823866233e+32 2025-08-30 13:58:50 - pico-train - INFO - Step 80000 -- 🔄 Training Metrics 2025-08-30 13:58:50 - pico-train - INFO - ├── Loss: 5.7825 2025-08-30 13:58:50 - pico-train - INFO - ├── Learning Rate: 5.61e-06 2025-08-30 13:58:50 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 13:58:50 - pico-train - INFO - Step 80000 -- 📈 Saving Learning Dynamics 2025-08-30 13:59:05 - 
pico-train - INFO - Step 80025 -- 🔄 Training Metrics 2025-08-30 13:59:05 - pico-train - INFO - ├── Loss: 5.7400 2025-08-30 13:59:05 - pico-train - INFO - ├── Learning Rate: 5.59e-06 2025-08-30 13:59:05 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 13:59:18 - pico-train - INFO - Step 80050 -- 🔄 Training Metrics 2025-08-30 13:59:18 - pico-train - INFO - ├── Loss: 5.7464 2025-08-30 13:59:18 - pico-train - INFO - ├── Learning Rate: 5.58e-06 2025-08-30 13:59:18 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 13:59:30 - pico-train - INFO - Step 80075 -- 🔄 Training Metrics 2025-08-30 13:59:30 - pico-train - INFO - ├── Loss: 5.7555 2025-08-30 13:59:30 - pico-train - INFO - ├── Learning Rate: 5.57e-06 2025-08-30 13:59:30 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 13:59:43 - pico-train - INFO - Step 80100 -- 🔄 Training Metrics 2025-08-30 13:59:43 - pico-train - INFO - ├── Loss: 5.7221 2025-08-30 13:59:43 - pico-train - INFO - ├── Learning Rate: 5.55e-06 2025-08-30 13:59:43 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 13:59:56 - pico-train - INFO - Step 80125 -- 🔄 Training Metrics 2025-08-30 13:59:56 - pico-train - INFO - ├── Loss: 5.7802 2025-08-30 13:59:56 - pico-train - INFO - ├── Learning Rate: 5.54e-06 2025-08-30 13:59:56 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 14:00:08 - pico-train - INFO - Step 80150 -- 🔄 Training Metrics 2025-08-30 14:00:08 - pico-train - INFO - ├── Loss: 5.7317 2025-08-30 14:00:08 - pico-train - INFO - ├── Learning Rate: 5.53e-06 2025-08-30 14:00:08 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 14:00:21 - pico-train - INFO - Step 80175 -- 🔄 Training Metrics 2025-08-30 14:00:21 - pico-train - INFO - ├── Loss: 5.8823 2025-08-30 14:00:21 - pico-train - INFO - ├── Learning Rate: 5.51e-06 2025-08-30 14:00:21 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 14:00:34 - pico-train - INFO - Step 80200 -- 🔄 Training Metrics 2025-08-30 14:00:34 - pico-train - INFO - ├── Loss: 5.7323 2025-08-30 
14:00:34 - pico-train - INFO - ├── Learning Rate: 5.50e-06 2025-08-30 14:00:34 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 14:00:46 - pico-train - INFO - Step 80225 -- 🔄 Training Metrics 2025-08-30 14:00:46 - pico-train - INFO - ├── Loss: 5.7382 2025-08-30 14:00:46 - pico-train - INFO - ├── Learning Rate: 5.49e-06 2025-08-30 14:00:46 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 14:00:59 - pico-train - INFO - Step 80250 -- 🔄 Training Metrics 2025-08-30 14:00:59 - pico-train - INFO - ├── Loss: 5.7209 2025-08-30 14:00:59 - pico-train - INFO - ├── Learning Rate: 5.47e-06 2025-08-30 14:00:59 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 14:01:12 - pico-train - INFO - Step 80275 -- 🔄 Training Metrics 2025-08-30 14:01:12 - pico-train - INFO - ├── Loss: 5.8878 2025-08-30 14:01:12 - pico-train - INFO - ├── Learning Rate: 5.46e-06 2025-08-30 14:01:12 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 14:01:25 - pico-train - INFO - Step 80300 -- 🔄 Training Metrics 2025-08-30 14:01:25 - pico-train - INFO - ├── Loss: 5.7548 2025-08-30 14:01:25 - pico-train - INFO - ├── Learning Rate: 5.45e-06 2025-08-30 14:01:25 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 14:01:37 - pico-train - INFO - Step 80325 -- 🔄 Training Metrics 2025-08-30 14:01:37 - pico-train - INFO - ├── Loss: 5.6725 2025-08-30 14:01:37 - pico-train - INFO - ├── Learning Rate: 5.43e-06 2025-08-30 14:01:37 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 14:01:50 - pico-train - INFO - Step 80350 -- 🔄 Training Metrics 2025-08-30 14:01:50 - pico-train - INFO - ├── Loss: 5.7960 2025-08-30 14:01:50 - pico-train - INFO - ├── Learning Rate: 5.42e-06 2025-08-30 14:01:50 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 14:02:03 - pico-train - INFO - Step 80375 -- 🔄 Training Metrics 2025-08-30 14:02:03 - pico-train - INFO - ├── Loss: 5.7363 2025-08-30 14:02:03 - pico-train - INFO - ├── Learning Rate: 5.41e-06 2025-08-30 14:02:03 - pico-train - INFO - └── Inf/NaN count: 0 
2025-08-30 14:02:16 - pico-train - INFO - Step 80400 -- 🔄 Training Metrics 2025-08-30 14:02:16 - pico-train - INFO - ├── Loss: 5.7633 2025-08-30 14:02:16 - pico-train - INFO - ├── Learning Rate: 5.39e-06 2025-08-30 14:02:16 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 14:02:28 - pico-train - INFO - Step 80425 -- 🔄 Training Metrics 2025-08-30 14:02:28 - pico-train - INFO - ├── Loss: 5.7039 2025-08-30 14:02:28 - pico-train - INFO - ├── Learning Rate: 5.38e-06 2025-08-30 14:02:28 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 14:02:41 - pico-train - INFO - Step 80450 -- 🔄 Training Metrics 2025-08-30 14:02:41 - pico-train - INFO - ├── Loss: 5.7231 2025-08-30 14:02:41 - pico-train - INFO - ├── Learning Rate: 5.37e-06 2025-08-30 14:02:41 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 14:02:54 - pico-train - INFO - Step 80475 -- 🔄 Training Metrics 2025-08-30 14:02:54 - pico-train - INFO - ├── Loss: 5.6900 2025-08-30 14:02:54 - pico-train - INFO - ├── Learning Rate: 5.35e-06 2025-08-30 14:02:54 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 14:03:06 - pico-train - INFO - Step 80500 -- 💾 Saving Checkpoint 2025-08-30 14:05:08 - pico-train - INFO - Step 80500 -- 📊 Evaluation Results 2025-08-30 14:05:08 - pico-train - INFO - └── paloma: 1.726815020006889e+32 2025-08-30 14:05:11 - pico-train - INFO - Step 80500 -- 🔄 Training Metrics 2025-08-30 14:05:11 - pico-train - INFO - ├── Loss: 5.7961 2025-08-30 14:05:11 - pico-train - INFO - ├── Learning Rate: 5.34e-06 2025-08-30 14:05:11 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 14:05:11 - pico-train - INFO - Step 80500 -- 📈 Saving Learning Dynamics 2025-08-30 14:05:26 - pico-train - INFO - Step 80525 -- 🔄 Training Metrics 2025-08-30 14:05:26 - pico-train - INFO - ├── Loss: 5.7756 2025-08-30 14:05:26 - pico-train - INFO - ├── Learning Rate: 5.33e-06 2025-08-30 14:05:26 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 14:05:39 - pico-train - INFO - Step 80550 -- 🔄 Training Metrics 2025-08-30 
14:05:39 - pico-train - INFO - ├── Loss: 5.7271 2025-08-30 14:05:39 - pico-train - INFO - ├── Learning Rate: 5.31e-06 2025-08-30 14:05:39 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 14:05:52 - pico-train - INFO - Step 80575 -- 🔄 Training Metrics 2025-08-30 14:05:52 - pico-train - INFO - ├── Loss: 5.7783 2025-08-30 14:05:52 - pico-train - INFO - ├── Learning Rate: 5.30e-06 2025-08-30 14:05:52 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 14:06:04 - pico-train - INFO - Step 80600 -- 🔄 Training Metrics 2025-08-30 14:06:04 - pico-train - INFO - ├── Loss: 5.6665 2025-08-30 14:06:04 - pico-train - INFO - ├── Learning Rate: 5.29e-06 2025-08-30 14:06:04 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 14:06:18 - pico-train - INFO - Step 80625 -- 🔄 Training Metrics 2025-08-30 14:06:18 - pico-train - INFO - ├── Loss: 5.8194 2025-08-30 14:06:18 - pico-train - INFO - ├── Learning Rate: 5.27e-06 2025-08-30 14:06:18 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 14:06:30 - pico-train - INFO - Step 80650 -- 🔄 Training Metrics 2025-08-30 14:06:30 - pico-train - INFO - ├── Loss: 5.8042 2025-08-30 14:06:30 - pico-train - INFO - ├── Learning Rate: 5.26e-06 2025-08-30 14:06:30 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 14:06:43 - pico-train - INFO - Step 80675 -- 🔄 Training Metrics 2025-08-30 14:06:43 - pico-train - INFO - ├── Loss: 5.7788 2025-08-30 14:06:43 - pico-train - INFO - ├── Learning Rate: 5.25e-06 2025-08-30 14:06:43 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 14:06:56 - pico-train - INFO - Step 80700 -- 🔄 Training Metrics 2025-08-30 14:06:56 - pico-train - INFO - ├── Loss: 5.7682 2025-08-30 14:06:56 - pico-train - INFO - ├── Learning Rate: 5.24e-06 2025-08-30 14:06:56 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 14:07:08 - pico-train - INFO - Step 80725 -- 🔄 Training Metrics 2025-08-30 14:07:08 - pico-train - INFO - ├── Loss: 5.7833 2025-08-30 14:07:08 - pico-train - INFO - ├── Learning Rate: 5.22e-06 2025-08-30 
14:07:08 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 14:07:21 - pico-train - INFO - Step 80750 -- 🔄 Training Metrics 2025-08-30 14:07:21 - pico-train - INFO - ├── Loss: 5.7930 2025-08-30 14:07:21 - pico-train - INFO - ├── Learning Rate: 5.21e-06 2025-08-30 14:07:21 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 14:07:34 - pico-train - INFO - Step 80775 -- 🔄 Training Metrics 2025-08-30 14:07:34 - pico-train - INFO - ├── Loss: 5.7252 2025-08-30 14:07:34 - pico-train - INFO - ├── Learning Rate: 5.20e-06 2025-08-30 14:07:34 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 14:07:46 - pico-train - INFO - Step 80800 -- 🔄 Training Metrics 2025-08-30 14:07:46 - pico-train - INFO - ├── Loss: 5.7923 2025-08-30 14:07:46 - pico-train - INFO - ├── Learning Rate: 5.18e-06 2025-08-30 14:07:46 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 14:07:59 - pico-train - INFO - Step 80825 -- 🔄 Training Metrics 2025-08-30 14:07:59 - pico-train - INFO - ├── Loss: 5.7558 2025-08-30 14:07:59 - pico-train - INFO - ├── Learning Rate: 5.17e-06 2025-08-30 14:07:59 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 14:08:11 - pico-train - INFO - Step 80850 -- 🔄 Training Metrics 2025-08-30 14:08:11 - pico-train - INFO - ├── Loss: 5.7496 2025-08-30 14:08:11 - pico-train - INFO - ├── Learning Rate: 5.16e-06 2025-08-30 14:08:11 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 14:08:24 - pico-train - INFO - Step 80875 -- 🔄 Training Metrics 2025-08-30 14:08:24 - pico-train - INFO - ├── Loss: 5.7380 2025-08-30 14:08:24 - pico-train - INFO - ├── Learning Rate: 5.14e-06 2025-08-30 14:08:24 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 14:08:36 - pico-train - INFO - Step 80900 -- 🔄 Training Metrics 2025-08-30 14:08:36 - pico-train - INFO - ├── Loss: 5.8166 2025-08-30 14:08:36 - pico-train - INFO - ├── Learning Rate: 5.13e-06 2025-08-30 14:08:36 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 14:08:49 - pico-train - INFO - Step 80925 -- 🔄 Training Metrics 
2025-08-30 14:08:49 - pico-train - INFO - ├── Loss: 5.7453 2025-08-30 14:08:49 - pico-train - INFO - ├── Learning Rate: 5.12e-06 2025-08-30 14:08:49 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 14:09:01 - pico-train - INFO - Step 80950 -- 🔄 Training Metrics 2025-08-30 14:09:01 - pico-train - INFO - ├── Loss: 5.7020 2025-08-30 14:09:01 - pico-train - INFO - ├── Learning Rate: 5.11e-06 2025-08-30 14:09:01 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 14:09:14 - pico-train - INFO - Step 80975 -- 🔄 Training Metrics 2025-08-30 14:09:14 - pico-train - INFO - ├── Loss: 5.7210 2025-08-30 14:09:14 - pico-train - INFO - ├── Learning Rate: 5.09e-06 2025-08-30 14:09:14 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 14:09:26 - pico-train - INFO - Step 81000 -- 💾 Saving Checkpoint 2025-08-30 14:11:29 - pico-train - INFO - Step 81000 -- 📊 Evaluation Results 2025-08-30 14:11:29 - pico-train - INFO - └── paloma: 1.651512962514706e+32 2025-08-30 14:11:31 - pico-train - INFO - Step 81000 -- 🔄 Training Metrics 2025-08-30 14:11:31 - pico-train - INFO - ├── Loss: 5.8271 2025-08-30 14:11:31 - pico-train - INFO - ├── Learning Rate: 5.08e-06 2025-08-30 14:11:31 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 14:11:31 - pico-train - INFO - Step 81000 -- 📈 Saving Learning Dynamics 2025-08-30 14:11:46 - pico-train - INFO - Step 81025 -- 🔄 Training Metrics 2025-08-30 14:11:46 - pico-train - INFO - ├── Loss: 5.7319 2025-08-30 14:11:46 - pico-train - INFO - ├── Learning Rate: 5.07e-06 2025-08-30 14:11:46 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 14:11:59 - pico-train - INFO - Step 81050 -- 🔄 Training Metrics 2025-08-30 14:11:59 - pico-train - INFO - ├── Loss: 5.7012 2025-08-30 14:11:59 - pico-train - INFO - ├── Learning Rate: 5.05e-06 2025-08-30 14:11:59 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 14:12:11 - pico-train - INFO - Step 81075 -- 🔄 Training Metrics 2025-08-30 14:12:11 - pico-train - INFO - ├── Loss: 5.7922 2025-08-30 14:12:11 - 
pico-train - INFO - ├── Learning Rate: 5.04e-06 2025-08-30 14:12:11 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 14:12:24 - pico-train - INFO - Step 81100 -- 🔄 Training Metrics 2025-08-30 14:12:24 - pico-train - INFO - ├── Loss: 5.7282 2025-08-30 14:12:24 - pico-train - INFO - ├── Learning Rate: 5.03e-06 2025-08-30 14:12:24 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 14:12:37 - pico-train - INFO - Step 81125 -- 🔄 Training Metrics 2025-08-30 14:12:37 - pico-train - INFO - ├── Loss: 5.7321 2025-08-30 14:12:37 - pico-train - INFO - ├── Learning Rate: 5.02e-06 2025-08-30 14:12:37 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 14:12:50 - pico-train - INFO - Step 81150 -- 🔄 Training Metrics 2025-08-30 14:12:50 - pico-train - INFO - ├── Loss: 5.7547 2025-08-30 14:12:50 - pico-train - INFO - ├── Learning Rate: 5.00e-06 2025-08-30 14:12:50 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 14:13:02 - pico-train - INFO - Step 81175 -- 🔄 Training Metrics 2025-08-30 14:13:02 - pico-train - INFO - ├── Loss: 5.6927 2025-08-30 14:13:02 - pico-train - INFO - ├── Learning Rate: 5.00e-06 2025-08-30 14:13:02 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 14:13:15 - pico-train - INFO - Step 81200 -- 🔄 Training Metrics 2025-08-30 14:13:15 - pico-train - INFO - ├── Loss: 5.6503 2025-08-30 14:13:15 - pico-train - INFO - ├── Learning Rate: 5.00e-06 2025-08-30 14:13:15 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 14:13:28 - pico-train - INFO - Step 81225 -- 🔄 Training Metrics 2025-08-30 14:13:28 - pico-train - INFO - ├── Loss: 5.8804 2025-08-30 14:13:28 - pico-train - INFO - ├── Learning Rate: 5.00e-06 2025-08-30 14:13:28 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 14:13:40 - pico-train - INFO - Step 81250 -- 🔄 Training Metrics 2025-08-30 14:13:40 - pico-train - INFO - ├── Loss: 5.6838 2025-08-30 14:13:40 - pico-train - INFO - ├── Learning Rate: 5.00e-06 2025-08-30 14:13:40 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 14:13:53 
- pico-train - INFO - Step 81275 -- 🔄 Training Metrics 2025-08-30 14:13:53 - pico-train - INFO - ├── Loss: 5.7304 2025-08-30 14:13:53 - pico-train - INFO - ├── Learning Rate: 5.00e-06 2025-08-30 14:13:53 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 14:14:05 - pico-train - INFO - Step 81300 -- 🔄 Training Metrics 2025-08-30 14:14:05 - pico-train - INFO - ├── Loss: 5.8093 2025-08-30 14:14:05 - pico-train - INFO - ├── Learning Rate: 5.00e-06 2025-08-30 14:14:05 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 14:14:18 - pico-train - INFO - Step 81325 -- 🔄 Training Metrics 2025-08-30 14:14:18 - pico-train - INFO - ├── Loss: 5.8088 2025-08-30 14:14:18 - pico-train - INFO - ├── Learning Rate: 5.00e-06 2025-08-30 14:14:18 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 14:14:31 - pico-train - INFO - Step 81350 -- 🔄 Training Metrics 2025-08-30 14:14:31 - pico-train - INFO - ├── Loss: 5.7882 2025-08-30 14:14:31 - pico-train - INFO - ├── Learning Rate: 5.00e-06 2025-08-30 14:14:31 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 14:14:43 - pico-train - INFO - Step 81375 -- 🔄 Training Metrics 2025-08-30 14:14:43 - pico-train - INFO - ├── Loss: 5.7803 2025-08-30 14:14:43 - pico-train - INFO - ├── Learning Rate: 5.00e-06 2025-08-30 14:14:43 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 14:14:56 - pico-train - INFO - Step 81400 -- 🔄 Training Metrics 2025-08-30 14:14:56 - pico-train - INFO - ├── Loss: 5.7343 2025-08-30 14:14:56 - pico-train - INFO - ├── Learning Rate: 5.00e-06 2025-08-30 14:14:56 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 14:15:08 - pico-train - INFO - Step 81425 -- 🔄 Training Metrics 2025-08-30 14:15:08 - pico-train - INFO - ├── Loss: 5.7395 2025-08-30 14:15:08 - pico-train - INFO - ├── Learning Rate: 5.00e-06 2025-08-30 14:15:08 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 14:15:21 - pico-train - INFO - Step 81450 -- 🔄 Training Metrics 2025-08-30 14:15:21 - pico-train - INFO - ├── Loss: 5.8339 2025-08-30 
14:15:21 - pico-train - INFO - ├── Learning Rate: 5.00e-06 2025-08-30 14:15:21 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 14:15:33 - pico-train - INFO - Step 81475 -- 🔄 Training Metrics 2025-08-30 14:15:33 - pico-train - INFO - ├── Loss: 5.7163 2025-08-30 14:15:33 - pico-train - INFO - ├── Learning Rate: 5.00e-06 2025-08-30 14:15:33 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 14:15:46 - pico-train - INFO - Step 81500 -- 💾 Saving Checkpoint 2025-08-30 14:17:52 - pico-train - INFO - Step 81500 -- 📊 Evaluation Results 2025-08-30 14:17:52 - pico-train - INFO - └── paloma: 1.8670257861792715e+32 2025-08-30 14:17:55 - pico-train - INFO - Step 81500 -- 🔄 Training Metrics 2025-08-30 14:17:55 - pico-train - INFO - ├── Loss: 5.7219 2025-08-30 14:17:55 - pico-train - INFO - ├── Learning Rate: 5.00e-06 2025-08-30 14:17:55 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 14:17:55 - pico-train - INFO - Step 81500 -- 📈 Saving Learning Dynamics 2025-08-30 14:18:09 - pico-train - INFO - Step 81525 -- 🔄 Training Metrics 2025-08-30 14:18:09 - pico-train - INFO - ├── Loss: 5.6933 2025-08-30 14:18:09 - pico-train - INFO - ├── Learning Rate: 5.00e-06 2025-08-30 14:18:09 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 14:18:22 - pico-train - INFO - Step 81550 -- 🔄 Training Metrics 2025-08-30 14:18:22 - pico-train - INFO - ├── Loss: 5.6887 2025-08-30 14:18:22 - pico-train - INFO - ├── Learning Rate: 5.00e-06 2025-08-30 14:18:22 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 14:18:35 - pico-train - INFO - Step 81575 -- 🔄 Training Metrics 2025-08-30 14:18:35 - pico-train - INFO - ├── Loss: 5.6767 2025-08-30 14:18:35 - pico-train - INFO - ├── Learning Rate: 5.00e-06 2025-08-30 14:18:35 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 14:18:47 - pico-train - INFO - Step 81600 -- 🔄 Training Metrics 2025-08-30 14:18:47 - pico-train - INFO - ├── Loss: 5.7753 2025-08-30 14:18:47 - pico-train - INFO - ├── Learning Rate: 5.00e-06 2025-08-30 14:18:47 - 
pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 14:19:00 - pico-train - INFO - Step 81625 -- 🔄 Training Metrics 2025-08-30 14:19:00 - pico-train - INFO - ├── Loss: 5.7827 2025-08-30 14:19:00 - pico-train - INFO - ├── Learning Rate: 5.00e-06 2025-08-30 14:19:00 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 14:19:13 - pico-train - INFO - Step 81650 -- 🔄 Training Metrics 2025-08-30 14:19:13 - pico-train - INFO - ├── Loss: 5.7172 2025-08-30 14:19:13 - pico-train - INFO - ├── Learning Rate: 5.00e-06 2025-08-30 14:19:13 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 14:19:25 - pico-train - INFO - Step 81675 -- 🔄 Training Metrics 2025-08-30 14:19:25 - pico-train - INFO - ├── Loss: 5.7891 2025-08-30 14:19:25 - pico-train - INFO - ├── Learning Rate: 5.00e-06 2025-08-30 14:19:25 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 14:19:38 - pico-train - INFO - Step 81700 -- 🔄 Training Metrics 2025-08-30 14:19:38 - pico-train - INFO - ├── Loss: 5.7853 2025-08-30 14:19:38 - pico-train - INFO - ├── Learning Rate: 5.00e-06 2025-08-30 14:19:38 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 14:19:51 - pico-train - INFO - Step 81725 -- 🔄 Training Metrics 2025-08-30 14:19:51 - pico-train - INFO - ├── Loss: 5.7594 2025-08-30 14:19:51 - pico-train - INFO - ├── Learning Rate: 5.00e-06 2025-08-30 14:19:51 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 14:20:03 - pico-train - INFO - Step 81750 -- 🔄 Training Metrics 2025-08-30 14:20:03 - pico-train - INFO - ├── Loss: 5.7848 2025-08-30 14:20:03 - pico-train - INFO - ├── Learning Rate: 5.00e-06 2025-08-30 14:20:03 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 14:20:16 - pico-train - INFO - Step 81775 -- 🔄 Training Metrics 2025-08-30 14:20:16 - pico-train - INFO - ├── Loss: 5.7882 2025-08-30 14:20:16 - pico-train - INFO - ├── Learning Rate: 5.00e-06 2025-08-30 14:20:16 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 14:20:28 - pico-train - INFO - Step 81800 -- 🔄 Training Metrics 2025-08-30 
14:20:28 - pico-train - INFO - ├── Loss: 5.7696 2025-08-30 14:20:28 - pico-train - INFO - ├── Learning Rate: 5.00e-06 2025-08-30 14:20:28 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 14:20:41 - pico-train - INFO - Step 81825 -- 🔄 Training Metrics 2025-08-30 14:20:41 - pico-train - INFO - ├── Loss: 5.7128 2025-08-30 14:20:41 - pico-train - INFO - ├── Learning Rate: 5.00e-06 2025-08-30 14:20:41 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 14:20:54 - pico-train - INFO - Step 81850 -- 🔄 Training Metrics 2025-08-30 14:20:54 - pico-train - INFO - ├── Loss: 5.7358 2025-08-30 14:20:54 - pico-train - INFO - ├── Learning Rate: 5.00e-06 2025-08-30 14:20:54 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 14:21:06 - pico-train - INFO - Step 81875 -- 🔄 Training Metrics 2025-08-30 14:21:06 - pico-train - INFO - ├── Loss: 5.6611 2025-08-30 14:21:06 - pico-train - INFO - ├── Learning Rate: 5.00e-06 2025-08-30 14:21:06 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 14:21:19 - pico-train - INFO - Step 81900 -- 🔄 Training Metrics 2025-08-30 14:21:19 - pico-train - INFO - ├── Loss: 5.7804 2025-08-30 14:21:19 - pico-train - INFO - ├── Learning Rate: 5.00e-06 2025-08-30 14:21:19 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 14:21:31 - pico-train - INFO - Step 81925 -- 🔄 Training Metrics 2025-08-30 14:21:31 - pico-train - INFO - ├── Loss: 5.7204 2025-08-30 14:21:31 - pico-train - INFO - ├── Learning Rate: 5.00e-06 2025-08-30 14:21:31 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 14:21:44 - pico-train - INFO - Step 81950 -- 🔄 Training Metrics 2025-08-30 14:21:44 - pico-train - INFO - ├── Loss: 5.7879 2025-08-30 14:21:44 - pico-train - INFO - ├── Learning Rate: 5.00e-06 2025-08-30 14:21:44 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 14:21:57 - pico-train - INFO - Step 81975 -- 🔄 Training Metrics 2025-08-30 14:21:57 - pico-train - INFO - ├── Loss: 5.7594 2025-08-30 14:21:57 - pico-train - INFO - ├── Learning Rate: 5.00e-06 2025-08-30 
14:21:57 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 14:22:09 - pico-train - INFO - Step 82000 -- 💾 Saving Checkpoint 2025-08-30 14:24:17 - pico-train - INFO - Step 82000 -- 📊 Evaluation Results 2025-08-30 14:24:17 - pico-train - INFO - └── paloma: 1.775405456471126e+32 2025-08-30 14:24:20 - pico-train - INFO - Step 82000 -- 🔄 Training Metrics 2025-08-30 14:24:20 - pico-train - INFO - ├── Loss: 5.7622 2025-08-30 14:24:20 - pico-train - INFO - ├── Learning Rate: 5.00e-06 2025-08-30 14:24:20 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 14:24:20 - pico-train - INFO - Step 82000 -- 📈 Saving Learning Dynamics 2025-08-30 14:24:35 - pico-train - INFO - Step 82025 -- 🔄 Training Metrics 2025-08-30 14:24:35 - pico-train - INFO - ├── Loss: 5.7859 2025-08-30 14:24:35 - pico-train - INFO - ├── Learning Rate: 5.00e-06 2025-08-30 14:24:35 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 14:24:48 - pico-train - INFO - Step 82050 -- 🔄 Training Metrics 2025-08-30 14:24:48 - pico-train - INFO - ├── Loss: 5.6977 2025-08-30 14:24:48 - pico-train - INFO - ├── Learning Rate: 5.00e-06 2025-08-30 14:24:48 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 14:25:00 - pico-train - INFO - Step 82075 -- 🔄 Training Metrics 2025-08-30 14:25:00 - pico-train - INFO - ├── Loss: 5.7132 2025-08-30 14:25:00 - pico-train - INFO - ├── Learning Rate: 5.00e-06 2025-08-30 14:25:00 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 14:25:13 - pico-train - INFO - Step 82100 -- 🔄 Training Metrics 2025-08-30 14:25:13 - pico-train - INFO - ├── Loss: 5.7781 2025-08-30 14:25:13 - pico-train - INFO - ├── Learning Rate: 5.00e-06 2025-08-30 14:25:13 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 14:25:26 - pico-train - INFO - Step 82125 -- 🔄 Training Metrics 2025-08-30 14:25:26 - pico-train - INFO - ├── Loss: 5.7576 2025-08-30 14:25:26 - pico-train - INFO - ├── Learning Rate: 5.00e-06 2025-08-30 14:25:26 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 14:25:38 - pico-train 
- INFO - Step 82150 -- 🔄 Training Metrics 2025-08-30 14:25:38 - pico-train - INFO - ├── Loss: 5.7535 2025-08-30 14:25:38 - pico-train - INFO - ├── Learning Rate: 5.00e-06 2025-08-30 14:25:38 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 14:25:51 - pico-train - INFO - Step 82175 -- 🔄 Training Metrics 2025-08-30 14:25:51 - pico-train - INFO - ├── Loss: 5.7676 2025-08-30 14:25:51 - pico-train - INFO - ├── Learning Rate: 5.00e-06 2025-08-30 14:25:51 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 14:26:04 - pico-train - INFO - Step 82200 -- 🔄 Training Metrics 2025-08-30 14:26:04 - pico-train - INFO - ├── Loss: 5.6759 2025-08-30 14:26:04 - pico-train - INFO - ├── Learning Rate: 5.00e-06 2025-08-30 14:26:04 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 14:26:16 - pico-train - INFO - Step 82225 -- 🔄 Training Metrics 2025-08-30 14:26:16 - pico-train - INFO - ├── Loss: 5.8393 2025-08-30 14:26:16 - pico-train - INFO - ├── Learning Rate: 5.00e-06 2025-08-30 14:26:16 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 14:26:29 - pico-train - INFO - Step 82250 -- 🔄 Training Metrics 2025-08-30 14:26:29 - pico-train - INFO - ├── Loss: 5.7143 2025-08-30 14:26:29 - pico-train - INFO - ├── Learning Rate: 5.00e-06 2025-08-30 14:26:29 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 14:26:41 - pico-train - INFO - Step 82275 -- 🔄 Training Metrics 2025-08-30 14:26:41 - pico-train - INFO - ├── Loss: 5.7666 2025-08-30 14:26:41 - pico-train - INFO - ├── Learning Rate: 5.00e-06 2025-08-30 14:26:41 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 14:26:54 - pico-train - INFO - Step 82300 -- 🔄 Training Metrics 2025-08-30 14:26:54 - pico-train - INFO - ├── Loss: 5.6821 2025-08-30 14:26:54 - pico-train - INFO - ├── Learning Rate: 5.00e-06 2025-08-30 14:26:54 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 14:27:07 - pico-train - INFO - Step 82325 -- 🔄 Training Metrics 2025-08-30 14:27:07 - pico-train - INFO - ├── Loss: 5.6464 2025-08-30 14:27:07 - 
pico-train - INFO - ├── Learning Rate: 5.00e-06 2025-08-30 14:27:07 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 14:27:20 - pico-train - INFO - Step 82350 -- 🔄 Training Metrics 2025-08-30 14:27:20 - pico-train - INFO - ├── Loss: 5.7406 2025-08-30 14:27:20 - pico-train - INFO - ├── Learning Rate: 5.00e-06 2025-08-30 14:27:20 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 14:27:32 - pico-train - INFO - Step 82375 -- 🔄 Training Metrics 2025-08-30 14:27:32 - pico-train - INFO - ├── Loss: 5.6918 2025-08-30 14:27:32 - pico-train - INFO - ├── Learning Rate: 5.00e-06 2025-08-30 14:27:32 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 14:27:45 - pico-train - INFO - Step 82400 -- 🔄 Training Metrics 2025-08-30 14:27:45 - pico-train - INFO - ├── Loss: 5.7523 2025-08-30 14:27:45 - pico-train - INFO - ├── Learning Rate: 5.00e-06 2025-08-30 14:27:45 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 14:27:57 - pico-train - INFO - Step 82425 -- 🔄 Training Metrics 2025-08-30 14:27:57 - pico-train - INFO - ├── Loss: 5.7037 2025-08-30 14:27:57 - pico-train - INFO - ├── Learning Rate: 5.00e-06 2025-08-30 14:27:57 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 14:28:10 - pico-train - INFO - Step 82450 -- 🔄 Training Metrics 2025-08-30 14:28:10 - pico-train - INFO - ├── Loss: 5.7349 2025-08-30 14:28:10 - pico-train - INFO - ├── Learning Rate: 5.00e-06 2025-08-30 14:28:10 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 14:28:23 - pico-train - INFO - Step 82475 -- 🔄 Training Metrics 2025-08-30 14:28:23 - pico-train - INFO - ├── Loss: 5.7771 2025-08-30 14:28:23 - pico-train - INFO - ├── Learning Rate: 5.00e-06 2025-08-30 14:28:23 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 14:28:35 - pico-train - INFO - Step 82500 -- 💾 Saving Checkpoint 2025-08-30 14:30:31 - pico-train - INFO - Step 82500 -- 📊 Evaluation Results 2025-08-30 14:30:31 - pico-train - INFO - └── paloma: 2.0381527246479315e+32 2025-08-30 14:30:34 - pico-train - INFO - Step 82500 -- 
🔄 Training Metrics 2025-08-30 14:30:34 - pico-train - INFO - ├── Loss: 5.8109 2025-08-30 14:30:34 - pico-train - INFO - ├── Learning Rate: 5.00e-06 2025-08-30 14:30:34 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 14:30:34 - pico-train - INFO - Step 82500 -- 📈 Saving Learning Dynamics 2025-08-30 14:30:49 - pico-train - INFO - Step 82525 -- 🔄 Training Metrics 2025-08-30 14:30:49 - pico-train - INFO - ├── Loss: 5.7396 2025-08-30 14:30:49 - pico-train - INFO - ├── Learning Rate: 5.00e-06 2025-08-30 14:30:49 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 14:31:01 - pico-train - INFO - Step 82550 -- 🔄 Training Metrics 2025-08-30 14:31:01 - pico-train - INFO - ├── Loss: 5.7149 2025-08-30 14:31:01 - pico-train - INFO - ├── Learning Rate: 5.00e-06 2025-08-30 14:31:01 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 14:31:14 - pico-train - INFO - Step 82575 -- 🔄 Training Metrics 2025-08-30 14:31:14 - pico-train - INFO - ├── Loss: 5.7566 2025-08-30 14:31:14 - pico-train - INFO - ├── Learning Rate: 5.00e-06 2025-08-30 14:31:14 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 14:31:27 - pico-train - INFO - Step 82600 -- 🔄 Training Metrics 2025-08-30 14:31:27 - pico-train - INFO - ├── Loss: 5.7531 2025-08-30 14:31:27 - pico-train - INFO - ├── Learning Rate: 5.00e-06 2025-08-30 14:31:27 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 14:31:40 - pico-train - INFO - Step 82625 -- 🔄 Training Metrics 2025-08-30 14:31:40 - pico-train - INFO - ├── Loss: 5.6984 2025-08-30 14:31:40 - pico-train - INFO - ├── Learning Rate: 5.00e-06 2025-08-30 14:31:40 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 14:31:52 - pico-train - INFO - Step 82650 -- 🔄 Training Metrics 2025-08-30 14:31:52 - pico-train - INFO - ├── Loss: 5.6784 2025-08-30 14:31:52 - pico-train - INFO - ├── Learning Rate: 5.00e-06 2025-08-30 14:31:52 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 14:32:05 - pico-train - INFO - Step 82675 -- 🔄 Training Metrics 2025-08-30 14:32:05 - 
pico-train - INFO - ├── Loss: 5.8166 2025-08-30 14:32:05 - pico-train - INFO - ├── Learning Rate: 5.00e-06 2025-08-30 14:32:05 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 14:32:17 - pico-train - INFO - Step 82700 -- 🔄 Training Metrics 2025-08-30 14:32:17 - pico-train - INFO - ├── Loss: 5.8273 2025-08-30 14:32:17 - pico-train - INFO - ├── Learning Rate: 5.00e-06 2025-08-30 14:32:17 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 14:32:30 - pico-train - INFO - Step 82725 -- 🔄 Training Metrics 2025-08-30 14:32:30 - pico-train - INFO - ├── Loss: 5.6730 2025-08-30 14:32:30 - pico-train - INFO - ├── Learning Rate: 5.00e-06 2025-08-30 14:32:30 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 14:32:43 - pico-train - INFO - Step 82750 -- 🔄 Training Metrics 2025-08-30 14:32:43 - pico-train - INFO - ├── Loss: 5.7608 2025-08-30 14:32:43 - pico-train - INFO - ├── Learning Rate: 5.00e-06 2025-08-30 14:32:43 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 14:32:55 - pico-train - INFO - Step 82775 -- 🔄 Training Metrics 2025-08-30 14:32:55 - pico-train - INFO - ├── Loss: 5.7138 2025-08-30 14:32:55 - pico-train - INFO - ├── Learning Rate: 5.00e-06 2025-08-30 14:32:55 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 14:33:08 - pico-train - INFO - Step 82800 -- 🔄 Training Metrics 2025-08-30 14:33:08 - pico-train - INFO - ├── Loss: 5.8013 2025-08-30 14:33:08 - pico-train - INFO - ├── Learning Rate: 5.00e-06 2025-08-30 14:33:08 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 14:33:21 - pico-train - INFO - Step 82825 -- 🔄 Training Metrics 2025-08-30 14:33:21 - pico-train - INFO - ├── Loss: 5.7688 2025-08-30 14:33:21 - pico-train - INFO - ├── Learning Rate: 5.00e-06 2025-08-30 14:33:21 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 14:33:33 - pico-train - INFO - Step 82850 -- 🔄 Training Metrics 2025-08-30 14:33:33 - pico-train - INFO - ├── Loss: 5.7681 2025-08-30 14:33:33 - pico-train - INFO - ├── Learning Rate: 5.00e-06 2025-08-30 14:33:33 - 
pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 14:33:46 - pico-train - INFO - Step 82875 -- 🔄 Training Metrics 2025-08-30 14:33:46 - pico-train - INFO - ├── Loss: 5.7065 2025-08-30 14:33:46 - pico-train - INFO - ├── Learning Rate: 5.00e-06 2025-08-30 14:33:46 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 14:33:59 - pico-train - INFO - Step 82900 -- 🔄 Training Metrics 2025-08-30 14:33:59 - pico-train - INFO - ├── Loss: 5.6905 2025-08-30 14:33:59 - pico-train - INFO - ├── Learning Rate: 5.00e-06 2025-08-30 14:33:59 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 14:34:11 - pico-train - INFO - Step 82925 -- 🔄 Training Metrics 2025-08-30 14:34:11 - pico-train - INFO - ├── Loss: 5.7421 2025-08-30 14:34:11 - pico-train - INFO - ├── Learning Rate: 5.00e-06 2025-08-30 14:34:11 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 14:34:24 - pico-train - INFO - Step 82950 -- 🔄 Training Metrics 2025-08-30 14:34:24 - pico-train - INFO - ├── Loss: 5.7278 2025-08-30 14:34:24 - pico-train - INFO - ├── Learning Rate: 5.00e-06 2025-08-30 14:34:24 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 14:34:36 - pico-train - INFO - Step 82975 -- 🔄 Training Metrics 2025-08-30 14:34:36 - pico-train - INFO - ├── Loss: 5.8126 2025-08-30 14:34:36 - pico-train - INFO - ├── Learning Rate: 5.00e-06 2025-08-30 14:34:36 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 14:34:49 - pico-train - INFO - Step 83000 -- 💾 Saving Checkpoint 2025-08-30 14:36:51 - pico-train - INFO - Step 83000 -- 📊 Evaluation Results 2025-08-30 14:36:51 - pico-train - INFO - └── paloma: 2.1088575195139195e+32 2025-08-30 14:36:55 - pico-train - INFO - Step 83000 -- 🔄 Training Metrics 2025-08-30 14:36:55 - pico-train - INFO - ├── Loss: 5.8305 2025-08-30 14:36:55 - pico-train - INFO - ├── Learning Rate: 5.00e-06 2025-08-30 14:36:55 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 14:36:55 - pico-train - INFO - Step 83000 -- 📈 Saving Learning Dynamics 2025-08-30 14:37:10 - pico-train - INFO - 
Step 83025 -- 🔄 Training Metrics 2025-08-30 14:37:10 - pico-train - INFO - ├── Loss: 5.7679 2025-08-30 14:37:10 - pico-train - INFO - ├── Learning Rate: 5.00e-06 2025-08-30 14:37:10 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 14:37:23 - pico-train - INFO - Step 83050 -- 🔄 Training Metrics 2025-08-30 14:37:23 - pico-train - INFO - ├── Loss: 5.7978 2025-08-30 14:37:23 - pico-train - INFO - ├── Learning Rate: 5.00e-06 2025-08-30 14:37:23 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 14:37:35 - pico-train - INFO - Step 83075 -- 🔄 Training Metrics 2025-08-30 14:37:35 - pico-train - INFO - ├── Loss: 5.7078 2025-08-30 14:37:35 - pico-train - INFO - ├── Learning Rate: 5.00e-06 2025-08-30 14:37:35 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 14:37:48 - pico-train - INFO - Step 83100 -- 🔄 Training Metrics 2025-08-30 14:37:48 - pico-train - INFO - ├── Loss: 5.7186 2025-08-30 14:37:48 - pico-train - INFO - ├── Learning Rate: 5.00e-06 2025-08-30 14:37:48 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 14:38:01 - pico-train - INFO - Step 83125 -- 🔄 Training Metrics 2025-08-30 14:38:01 - pico-train - INFO - ├── Loss: 5.8144 2025-08-30 14:38:01 - pico-train - INFO - ├── Learning Rate: 5.00e-06 2025-08-30 14:38:01 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 14:38:13 - pico-train - INFO - Step 83150 -- 🔄 Training Metrics 2025-08-30 14:38:13 - pico-train - INFO - ├── Loss: 5.8319 2025-08-30 14:38:13 - pico-train - INFO - ├── Learning Rate: 5.00e-06 2025-08-30 14:38:13 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 14:38:26 - pico-train - INFO - Step 83175 -- 🔄 Training Metrics 2025-08-30 14:38:26 - pico-train - INFO - ├── Loss: 5.8569 2025-08-30 14:38:26 - pico-train - INFO - ├── Learning Rate: 5.00e-06 2025-08-30 14:38:26 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 14:38:38 - pico-train - INFO - Step 83200 -- 🔄 Training Metrics 2025-08-30 14:38:38 - pico-train - INFO - ├── Loss: 5.6715 2025-08-30 14:38:38 - pico-train - 
INFO - ├── Learning Rate: 5.00e-06 2025-08-30 14:38:38 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 14:38:51 - pico-train - INFO - Step 83225 -- 🔄 Training Metrics 2025-08-30 14:38:51 - pico-train - INFO - ├── Loss: 5.7456 2025-08-30 14:38:51 - pico-train - INFO - ├── Learning Rate: 5.00e-06 2025-08-30 14:38:51 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 14:39:04 - pico-train - INFO - Step 83250 -- 🔄 Training Metrics 2025-08-30 14:39:04 - pico-train - INFO - ├── Loss: 5.8059 2025-08-30 14:39:04 - pico-train - INFO - ├── Learning Rate: 5.00e-06 2025-08-30 14:39:04 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 14:39:16 - pico-train - INFO - Step 83275 -- 🔄 Training Metrics 2025-08-30 14:39:16 - pico-train - INFO - ├── Loss: 5.7893 2025-08-30 14:39:16 - pico-train - INFO - ├── Learning Rate: 5.00e-06 2025-08-30 14:39:16 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 14:39:29 - pico-train - INFO - Step 83300 -- 🔄 Training Metrics 2025-08-30 14:39:29 - pico-train - INFO - ├── Loss: 5.8234 2025-08-30 14:39:29 - pico-train - INFO - ├── Learning Rate: 5.00e-06 2025-08-30 14:39:29 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 14:39:41 - pico-train - INFO - Step 83325 -- 🔄 Training Metrics 2025-08-30 14:39:41 - pico-train - INFO - ├── Loss: 5.7418 2025-08-30 14:39:41 - pico-train - INFO - ├── Learning Rate: 5.00e-06 2025-08-30 14:39:41 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 14:39:54 - pico-train - INFO - Step 83350 -- 🔄 Training Metrics 2025-08-30 14:39:54 - pico-train - INFO - ├── Loss: 5.6773 2025-08-30 14:39:54 - pico-train - INFO - ├── Learning Rate: 5.00e-06 2025-08-30 14:39:54 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 14:40:06 - pico-train - INFO - Step 83375 -- 🔄 Training Metrics 2025-08-30 14:40:06 - pico-train - INFO - ├── Loss: 5.8218 2025-08-30 14:40:06 - pico-train - INFO - ├── Learning Rate: 5.00e-06 2025-08-30 14:40:06 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 14:40:19 - pico-train 
- INFO - Step 83400 -- 🔄 Training Metrics 2025-08-30 14:40:19 - pico-train - INFO - ├── Loss: 5.7571 2025-08-30 14:40:19 - pico-train - INFO - ├── Learning Rate: 5.00e-06 2025-08-30 14:40:19 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 14:40:32 - pico-train - INFO - Step 83425 -- 🔄 Training Metrics 2025-08-30 14:40:32 - pico-train - INFO - ├── Loss: 5.7947 2025-08-30 14:40:32 - pico-train - INFO - ├── Learning Rate: 5.00e-06 2025-08-30 14:40:32 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 14:40:44 - pico-train - INFO - Step 83450 -- 🔄 Training Metrics 2025-08-30 14:40:44 - pico-train - INFO - ├── Loss: 5.7485 2025-08-30 14:40:44 - pico-train - INFO - ├── Learning Rate: 5.00e-06 2025-08-30 14:40:44 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 14:40:57 - pico-train - INFO - Step 83475 -- 🔄 Training Metrics 2025-08-30 14:40:57 - pico-train - INFO - ├── Loss: 5.7645 2025-08-30 14:40:57 - pico-train - INFO - ├── Learning Rate: 5.00e-06 2025-08-30 14:40:57 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 14:41:09 - pico-train - INFO - Step 83500 -- 💾 Saving Checkpoint 2025-08-30 14:43:36 - pico-train - INFO - Step 83500 -- 📊 Evaluation Results 2025-08-30 14:43:36 - pico-train - INFO - └── paloma: 2.3116439762714222e+32 2025-08-30 14:43:38 - pico-train - INFO - Step 83500 -- 🔄 Training Metrics 2025-08-30 14:43:38 - pico-train - INFO - ├── Loss: 5.7654 2025-08-30 14:43:38 - pico-train - INFO - ├── Learning Rate: 5.00e-06 2025-08-30 14:43:38 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 14:43:38 - pico-train - INFO - Step 83500 -- 📈 Saving Learning Dynamics 2025-08-30 14:43:54 - pico-train - INFO - Step 83525 -- 🔄 Training Metrics 2025-08-30 14:43:54 - pico-train - INFO - ├── Loss: 5.7003 2025-08-30 14:43:54 - pico-train - INFO - ├── Learning Rate: 5.00e-06 2025-08-30 14:43:54 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 14:44:07 - pico-train - INFO - Step 83550 -- 🔄 Training Metrics 2025-08-30 14:44:07 - pico-train - INFO - 
├── Loss: 5.7259 2025-08-30 14:44:07 - pico-train - INFO - ├── Learning Rate: 5.00e-06 2025-08-30 14:44:07 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 14:44:19 - pico-train - INFO - Step 83575 -- 🔄 Training Metrics 2025-08-30 14:44:19 - pico-train - INFO - ├── Loss: 5.6887 2025-08-30 14:44:19 - pico-train - INFO - ├── Learning Rate: 5.00e-06 2025-08-30 14:44:19 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 14:44:32 - pico-train - INFO - Step 83600 -- 🔄 Training Metrics 2025-08-30 14:44:32 - pico-train - INFO - ├── Loss: 5.7988 2025-08-30 14:44:32 - pico-train - INFO - ├── Learning Rate: 5.00e-06 2025-08-30 14:44:32 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 14:44:45 - pico-train - INFO - Step 83625 -- 🔄 Training Metrics 2025-08-30 14:44:45 - pico-train - INFO - ├── Loss: 5.7483 2025-08-30 14:44:45 - pico-train - INFO - ├── Learning Rate: 5.00e-06 2025-08-30 14:44:45 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 14:44:58 - pico-train - INFO - Step 83650 -- 🔄 Training Metrics 2025-08-30 14:44:58 - pico-train - INFO - ├── Loss: 5.7157 2025-08-30 14:44:58 - pico-train - INFO - ├── Learning Rate: 5.00e-06 2025-08-30 14:44:58 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 14:45:10 - pico-train - INFO - Step 83675 -- 🔄 Training Metrics 2025-08-30 14:45:10 - pico-train - INFO - ├── Loss: 5.7172 2025-08-30 14:45:10 - pico-train - INFO - ├── Learning Rate: 5.00e-06 2025-08-30 14:45:10 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 14:45:23 - pico-train - INFO - Step 83700 -- 🔄 Training Metrics 2025-08-30 14:45:23 - pico-train - INFO - ├── Loss: 5.8406 2025-08-30 14:45:23 - pico-train - INFO - ├── Learning Rate: 5.00e-06 2025-08-30 14:45:23 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 14:45:35 - pico-train - INFO - Step 83725 -- 🔄 Training Metrics 2025-08-30 14:45:35 - pico-train - INFO - ├── Loss: 5.6529 2025-08-30 14:45:35 - pico-train - INFO - ├── Learning Rate: 5.00e-06 2025-08-30 14:45:35 - pico-train - INFO - 
└── Inf/NaN count: 0 2025-08-30 14:45:48 - pico-train - INFO - Step 83750 -- 🔄 Training Metrics 2025-08-30 14:45:48 - pico-train - INFO - ├── Loss: 5.7618 2025-08-30 14:45:48 - pico-train - INFO - ├── Learning Rate: 5.00e-06 2025-08-30 14:45:48 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 14:46:01 - pico-train - INFO - Step 83775 -- 🔄 Training Metrics 2025-08-30 14:46:01 - pico-train - INFO - ├── Loss: 5.7394 2025-08-30 14:46:01 - pico-train - INFO - ├── Learning Rate: 5.00e-06 2025-08-30 14:46:01 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 14:46:13 - pico-train - INFO - Step 83800 -- 🔄 Training Metrics 2025-08-30 14:46:13 - pico-train - INFO - ├── Loss: 5.7042 2025-08-30 14:46:13 - pico-train - INFO - ├── Learning Rate: 5.00e-06 2025-08-30 14:46:13 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 14:46:26 - pico-train - INFO - Step 83825 -- 🔄 Training Metrics 2025-08-30 14:46:26 - pico-train - INFO - ├── Loss: 5.7598 2025-08-30 14:46:26 - pico-train - INFO - ├── Learning Rate: 5.00e-06 2025-08-30 14:46:26 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 14:46:38 - pico-train - INFO - Step 83850 -- 🔄 Training Metrics 2025-08-30 14:46:38 - pico-train - INFO - ├── Loss: 5.6951 2025-08-30 14:46:38 - pico-train - INFO - ├── Learning Rate: 5.00e-06 2025-08-30 14:46:38 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 14:46:51 - pico-train - INFO - Step 83875 -- 🔄 Training Metrics 2025-08-30 14:46:51 - pico-train - INFO - ├── Loss: 5.6520 2025-08-30 14:46:51 - pico-train - INFO - ├── Learning Rate: 5.00e-06 2025-08-30 14:46:51 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 14:47:04 - pico-train - INFO - Step 83900 -- 🔄 Training Metrics 2025-08-30 14:47:04 - pico-train - INFO - ├── Loss: 5.7671 2025-08-30 14:47:04 - pico-train - INFO - ├── Learning Rate: 5.00e-06 2025-08-30 14:47:04 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 14:47:16 - pico-train - INFO - Step 83925 -- 🔄 Training Metrics 2025-08-30 14:47:16 - pico-train - 
INFO - ├── Loss: 5.6982 2025-08-30 14:47:16 - pico-train - INFO - ├── Learning Rate: 5.00e-06 2025-08-30 14:47:16 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 14:47:29 - pico-train - INFO - Step 83950 -- 🔄 Training Metrics 2025-08-30 14:47:29 - pico-train - INFO - ├── Loss: 5.7661 2025-08-30 14:47:29 - pico-train - INFO - ├── Learning Rate: 5.00e-06 2025-08-30 14:47:29 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 14:47:42 - pico-train - INFO - Step 83975 -- 🔄 Training Metrics 2025-08-30 14:47:42 - pico-train - INFO - ├── Loss: 5.7602 2025-08-30 14:47:42 - pico-train - INFO - ├── Learning Rate: 5.00e-06 2025-08-30 14:47:42 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 14:47:54 - pico-train - INFO - Step 84000 -- 💾 Saving Checkpoint 2025-08-30 14:49:51 - pico-train - INFO - Step 84000 -- 📊 Evaluation Results 2025-08-30 14:49:51 - pico-train - INFO - └── paloma: 2.2158188887517165e+32 2025-08-30 14:49:54 - pico-train - INFO - Step 84000 -- 🔄 Training Metrics 2025-08-30 14:49:54 - pico-train - INFO - ├── Loss: 5.7141 2025-08-30 14:49:54 - pico-train - INFO - ├── Learning Rate: 5.00e-06 2025-08-30 14:49:54 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 14:49:54 - pico-train - INFO - Step 84000 -- 📈 Saving Learning Dynamics 2025-08-30 14:50:09 - pico-train - INFO - Step 84025 -- 🔄 Training Metrics 2025-08-30 14:50:09 - pico-train - INFO - ├── Loss: 5.7851 2025-08-30 14:50:09 - pico-train - INFO - ├── Learning Rate: 5.00e-06 2025-08-30 14:50:09 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 14:50:22 - pico-train - INFO - Step 84050 -- 🔄 Training Metrics 2025-08-30 14:50:22 - pico-train - INFO - ├── Loss: 5.7009 2025-08-30 14:50:22 - pico-train - INFO - ├── Learning Rate: 5.00e-06 2025-08-30 14:50:22 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 14:50:34 - pico-train - INFO - Step 84075 -- 🔄 Training Metrics 2025-08-30 14:50:34 - pico-train - INFO - ├── Loss: 5.7267 2025-08-30 14:50:34 - pico-train - INFO - ├── Learning Rate: 
5.00e-06 2025-08-30 14:50:34 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 14:50:47 - pico-train - INFO - Step 84100 -- 🔄 Training Metrics 2025-08-30 14:50:47 - pico-train - INFO - ├── Loss: 5.7209 2025-08-30 14:50:47 - pico-train - INFO - ├── Learning Rate: 5.00e-06 2025-08-30 14:50:47 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 14:51:00 - pico-train - INFO - Step 84125 -- 🔄 Training Metrics 2025-08-30 14:51:00 - pico-train - INFO - ├── Loss: 5.7855 2025-08-30 14:51:00 - pico-train - INFO - ├── Learning Rate: 5.00e-06 2025-08-30 14:51:00 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 14:51:13 - pico-train - INFO - Step 84150 -- 🔄 Training Metrics 2025-08-30 14:51:13 - pico-train - INFO - ├── Loss: 5.6679 2025-08-30 14:51:13 - pico-train - INFO - ├── Learning Rate: 5.00e-06 2025-08-30 14:51:13 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 14:51:25 - pico-train - INFO - Step 84175 -- 🔄 Training Metrics 2025-08-30 14:51:25 - pico-train - INFO - ├── Loss: 5.6716 2025-08-30 14:51:25 - pico-train - INFO - ├── Learning Rate: 5.00e-06 2025-08-30 14:51:25 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 14:51:38 - pico-train - INFO - Step 84200 -- 🔄 Training Metrics 2025-08-30 14:51:38 - pico-train - INFO - ├── Loss: 5.7459 2025-08-30 14:51:38 - pico-train - INFO - ├── Learning Rate: 5.00e-06 2025-08-30 14:51:38 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 14:51:51 - pico-train - INFO - Step 84225 -- 🔄 Training Metrics 2025-08-30 14:51:51 - pico-train - INFO - ├── Loss: 5.8850 2025-08-30 14:51:51 - pico-train - INFO - ├── Learning Rate: 5.00e-06 2025-08-30 14:51:51 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 14:52:03 - pico-train - INFO - Step 84250 -- 🔄 Training Metrics 2025-08-30 14:52:03 - pico-train - INFO - ├── Loss: 5.7402 2025-08-30 14:52:03 - pico-train - INFO - ├── Learning Rate: 5.00e-06 2025-08-30 14:52:03 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 14:52:16 - pico-train - INFO - Step 84275 -- 🔄 
Training Metrics 2025-08-30 14:52:16 - pico-train - INFO - ├── Loss: 5.7490 2025-08-30 14:52:16 - pico-train - INFO - ├── Learning Rate: 5.00e-06 2025-08-30 14:52:16 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 14:52:28 - pico-train - INFO - Step 84300 -- 🔄 Training Metrics 2025-08-30 14:52:28 - pico-train - INFO - ├── Loss: 5.6493 2025-08-30 14:52:28 - pico-train - INFO - ├── Learning Rate: 5.00e-06 2025-08-30 14:52:28 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 14:52:41 - pico-train - INFO - Step 84325 -- 🔄 Training Metrics 2025-08-30 14:52:41 - pico-train - INFO - ├── Loss: 5.8409 2025-08-30 14:52:41 - pico-train - INFO - ├── Learning Rate: 5.00e-06 2025-08-30 14:52:41 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 14:52:54 - pico-train - INFO - Step 84350 -- 🔄 Training Metrics 2025-08-30 14:52:54 - pico-train - INFO - ├── Loss: 5.7827 2025-08-30 14:52:54 - pico-train - INFO - ├── Learning Rate: 5.00e-06 2025-08-30 14:52:54 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 14:53:06 - pico-train - INFO - Step 84375 -- 🔄 Training Metrics 2025-08-30 14:53:06 - pico-train - INFO - ├── Loss: 5.6671 2025-08-30 14:53:06 - pico-train - INFO - ├── Learning Rate: 5.00e-06 2025-08-30 14:53:06 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 14:53:19 - pico-train - INFO - Step 84400 -- 🔄 Training Metrics 2025-08-30 14:53:19 - pico-train - INFO - ├── Loss: 5.6999 2025-08-30 14:53:19 - pico-train - INFO - ├── Learning Rate: 5.00e-06 2025-08-30 14:53:19 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 14:53:32 - pico-train - INFO - Step 84425 -- 🔄 Training Metrics 2025-08-30 14:53:32 - pico-train - INFO - ├── Loss: 5.8044 2025-08-30 14:53:32 - pico-train - INFO - ├── Learning Rate: 5.00e-06 2025-08-30 14:53:32 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 14:53:45 - pico-train - INFO - Step 84450 -- 🔄 Training Metrics 2025-08-30 14:53:45 - pico-train - INFO - ├── Loss: 5.7034 2025-08-30 14:53:45 - pico-train - INFO - ├── Learning 
Rate: 5.00e-06 2025-08-30 14:53:45 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 14:53:57 - pico-train - INFO - Step 84475 -- 🔄 Training Metrics 2025-08-30 14:53:57 - pico-train - INFO - ├── Loss: 5.7470 2025-08-30 14:53:57 - pico-train - INFO - ├── Learning Rate: 5.00e-06 2025-08-30 14:53:57 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 14:54:09 - pico-train - INFO - Step 84500 -- 💾 Saving Checkpoint 2025-08-30 14:56:08 - pico-train - INFO - Step 84500 -- 📊 Evaluation Results 2025-08-30 14:56:08 - pico-train - INFO - └── paloma: 2.283543681278441e+32 2025-08-30 14:56:11 - pico-train - INFO - Step 84500 -- 🔄 Training Metrics 2025-08-30 14:56:11 - pico-train - INFO - ├── Loss: 5.7532 2025-08-30 14:56:11 - pico-train - INFO - ├── Learning Rate: 5.00e-06 2025-08-30 14:56:11 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 14:56:11 - pico-train - INFO - Step 84500 -- 📈 Saving Learning Dynamics 2025-08-30 14:56:26 - pico-train - INFO - Step 84525 -- 🔄 Training Metrics 2025-08-30 14:56:26 - pico-train - INFO - ├── Loss: 5.6957 2025-08-30 14:56:26 - pico-train - INFO - ├── Learning Rate: 5.00e-06 2025-08-30 14:56:26 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 14:56:39 - pico-train - INFO - Step 84550 -- 🔄 Training Metrics 2025-08-30 14:56:39 - pico-train - INFO - ├── Loss: 5.6533 2025-08-30 14:56:39 - pico-train - INFO - ├── Learning Rate: 5.00e-06 2025-08-30 14:56:39 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 14:56:51 - pico-train - INFO - Step 84575 -- 🔄 Training Metrics 2025-08-30 14:56:51 - pico-train - INFO - ├── Loss: 5.7480 2025-08-30 14:56:51 - pico-train - INFO - ├── Learning Rate: 5.00e-06 2025-08-30 14:56:51 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 14:57:04 - pico-train - INFO - Step 84600 -- 🔄 Training Metrics 2025-08-30 14:57:04 - pico-train - INFO - ├── Loss: 5.8383 2025-08-30 14:57:04 - pico-train - INFO - ├── Learning Rate: 5.00e-06 2025-08-30 14:57:04 - pico-train - INFO - └── Inf/NaN count: 0 
2025-08-30 14:57:16 - pico-train - INFO - Step 84625 -- 🔄 Training Metrics 2025-08-30 14:57:16 - pico-train - INFO - ├── Loss: 5.8163 2025-08-30 14:57:16 - pico-train - INFO - ├── Learning Rate: 5.00e-06 2025-08-30 14:57:16 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 14:57:29 - pico-train - INFO - Step 84650 -- 🔄 Training Metrics 2025-08-30 14:57:29 - pico-train - INFO - ├── Loss: 5.6896 2025-08-30 14:57:29 - pico-train - INFO - ├── Learning Rate: 5.00e-06 2025-08-30 14:57:29 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 14:57:42 - pico-train - INFO - Step 84675 -- 🔄 Training Metrics 2025-08-30 14:57:42 - pico-train - INFO - ├── Loss: 5.6731 2025-08-30 14:57:42 - pico-train - INFO - ├── Learning Rate: 5.00e-06 2025-08-30 14:57:42 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 14:57:54 - pico-train - INFO - Step 84700 -- 🔄 Training Metrics 2025-08-30 14:57:54 - pico-train - INFO - ├── Loss: 5.7533 2025-08-30 14:57:54 - pico-train - INFO - ├── Learning Rate: 5.00e-06 2025-08-30 14:57:54 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 14:58:07 - pico-train - INFO - Step 84725 -- 🔄 Training Metrics 2025-08-30 14:58:07 - pico-train - INFO - ├── Loss: 5.6532 2025-08-30 14:58:07 - pico-train - INFO - ├── Learning Rate: 5.00e-06 2025-08-30 14:58:07 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 14:58:19 - pico-train - INFO - Step 84750 -- 🔄 Training Metrics 2025-08-30 14:58:19 - pico-train - INFO - ├── Loss: 5.7462 2025-08-30 14:58:19 - pico-train - INFO - ├── Learning Rate: 5.00e-06 2025-08-30 14:58:19 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 14:58:32 - pico-train - INFO - Step 84775 -- 🔄 Training Metrics 2025-08-30 14:58:32 - pico-train - INFO - ├── Loss: 5.7329 2025-08-30 14:58:32 - pico-train - INFO - ├── Learning Rate: 5.00e-06 2025-08-30 14:58:32 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 14:58:44 - pico-train - INFO - Step 84800 -- 🔄 Training Metrics 2025-08-30 14:58:44 - pico-train - INFO - ├── Loss: 
5.7428 2025-08-30 14:58:44 - pico-train - INFO - ├── Learning Rate: 5.00e-06 2025-08-30 14:58:44 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 14:58:57 - pico-train - INFO - Step 84825 -- 🔄 Training Metrics 2025-08-30 14:58:57 - pico-train - INFO - ├── Loss: 5.7658 2025-08-30 14:58:57 - pico-train - INFO - ├── Learning Rate: 5.00e-06 2025-08-30 14:58:57 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 14:59:10 - pico-train - INFO - Step 84850 -- 🔄 Training Metrics 2025-08-30 14:59:10 - pico-train - INFO - ├── Loss: 5.7238 2025-08-30 14:59:10 - pico-train - INFO - ├── Learning Rate: 5.00e-06 2025-08-30 14:59:10 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 14:59:22 - pico-train - INFO - Step 84875 -- 🔄 Training Metrics 2025-08-30 14:59:22 - pico-train - INFO - ├── Loss: 5.8444 2025-08-30 14:59:22 - pico-train - INFO - ├── Learning Rate: 5.00e-06 2025-08-30 14:59:22 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 14:59:35 - pico-train - INFO - Step 84900 -- 🔄 Training Metrics 2025-08-30 14:59:35 - pico-train - INFO - ├── Loss: 5.8164 2025-08-30 14:59:35 - pico-train - INFO - ├── Learning Rate: 5.00e-06 2025-08-30 14:59:35 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 14:59:47 - pico-train - INFO - Step 84925 -- 🔄 Training Metrics 2025-08-30 14:59:47 - pico-train - INFO - ├── Loss: 5.7140 2025-08-30 14:59:47 - pico-train - INFO - ├── Learning Rate: 5.00e-06 2025-08-30 14:59:47 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 15:00:00 - pico-train - INFO - Step 84950 -- 🔄 Training Metrics 2025-08-30 15:00:00 - pico-train - INFO - ├── Loss: 5.7849 2025-08-30 15:00:00 - pico-train - INFO - ├── Learning Rate: 5.00e-06 2025-08-30 15:00:00 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 15:00:13 - pico-train - INFO - Step 84975 -- 🔄 Training Metrics 2025-08-30 15:00:13 - pico-train - INFO - ├── Loss: 5.6480 2025-08-30 15:00:13 - pico-train - INFO - ├── Learning Rate: 5.00e-06 2025-08-30 15:00:13 - pico-train - INFO - └── Inf/NaN 
count: 0 2025-08-30 15:00:25 - pico-train - INFO - Step 85000 -- 💾 Saving Checkpoint 2025-08-30 15:02:20 - pico-train - INFO - Step 85000 -- 📊 Evaluation Results 2025-08-30 15:02:20 - pico-train - INFO - └── paloma: 2.5087651688455694e+32 2025-08-30 15:02:23 - pico-train - INFO - Step 85000 -- 🔄 Training Metrics 2025-08-30 15:02:23 - pico-train - INFO - ├── Loss: 5.7259 2025-08-30 15:02:23 - pico-train - INFO - ├── Learning Rate: 5.00e-06 2025-08-30 15:02:23 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 15:02:23 - pico-train - INFO - Step 85000 -- 📈 Saving Learning Dynamics 2025-08-30 15:02:39 - pico-train - INFO - Step 85025 -- 🔄 Training Metrics 2025-08-30 15:02:39 - pico-train - INFO - ├── Loss: 5.7697 2025-08-30 15:02:39 - pico-train - INFO - ├── Learning Rate: 5.00e-06 2025-08-30 15:02:39 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 15:02:52 - pico-train - INFO - Step 85050 -- 🔄 Training Metrics 2025-08-30 15:02:52 - pico-train - INFO - ├── Loss: 5.7017 2025-08-30 15:02:52 - pico-train - INFO - ├── Learning Rate: 5.00e-06 2025-08-30 15:02:52 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 15:03:05 - pico-train - INFO - Step 85075 -- 🔄 Training Metrics 2025-08-30 15:03:05 - pico-train - INFO - ├── Loss: 5.7242 2025-08-30 15:03:05 - pico-train - INFO - ├── Learning Rate: 5.00e-06 2025-08-30 15:03:05 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 15:03:18 - pico-train - INFO - Step 85100 -- 🔄 Training Metrics 2025-08-30 15:03:18 - pico-train - INFO - ├── Loss: 5.7135 2025-08-30 15:03:18 - pico-train - INFO - ├── Learning Rate: 5.00e-06 2025-08-30 15:03:18 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 15:03:31 - pico-train - INFO - Step 85125 -- 🔄 Training Metrics 2025-08-30 15:03:31 - pico-train - INFO - ├── Loss: 5.7460 2025-08-30 15:03:31 - pico-train - INFO - ├── Learning Rate: 5.00e-06 2025-08-30 15:03:31 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 15:03:44 - pico-train - INFO - Step 85150 -- 🔄 Training Metrics 
2025-08-30 15:03:44 - pico-train - INFO - ├── Loss: 5.8052 2025-08-30 15:03:44 - pico-train - INFO - ├── Learning Rate: 5.00e-06 2025-08-30 15:03:44 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 15:03:57 - pico-train - INFO - Step 85175 -- 🔄 Training Metrics 2025-08-30 15:03:57 - pico-train - INFO - ├── Loss: 5.8188 2025-08-30 15:03:57 - pico-train - INFO - ├── Learning Rate: 5.00e-06 2025-08-30 15:03:57 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 15:04:09 - pico-train - INFO - Step 85200 -- 🔄 Training Metrics 2025-08-30 15:04:09 - pico-train - INFO - ├── Loss: 5.7972 2025-08-30 15:04:09 - pico-train - INFO - ├── Learning Rate: 5.00e-06 2025-08-30 15:04:09 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 15:04:22 - pico-train - INFO - Step 85225 -- 🔄 Training Metrics 2025-08-30 15:04:22 - pico-train - INFO - ├── Loss: 5.7244 2025-08-30 15:04:22 - pico-train - INFO - ├── Learning Rate: 5.00e-06 2025-08-30 15:04:22 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 15:04:35 - pico-train - INFO - Step 85250 -- 🔄 Training Metrics 2025-08-30 15:04:35 - pico-train - INFO - ├── Loss: 5.6862 2025-08-30 15:04:35 - pico-train - INFO - ├── Learning Rate: 5.00e-06 2025-08-30 15:04:35 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 15:04:47 - pico-train - INFO - Step 85275 -- 🔄 Training Metrics 2025-08-30 15:04:47 - pico-train - INFO - ├── Loss: 5.8149 2025-08-30 15:04:47 - pico-train - INFO - ├── Learning Rate: 5.00e-06 2025-08-30 15:04:47 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 15:05:00 - pico-train - INFO - Step 85300 -- 🔄 Training Metrics 2025-08-30 15:05:00 - pico-train - INFO - ├── Loss: 5.7755 2025-08-30 15:05:00 - pico-train - INFO - ├── Learning Rate: 5.00e-06 2025-08-30 15:05:00 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 15:05:13 - pico-train - INFO - Step 85325 -- 🔄 Training Metrics 2025-08-30 15:05:13 - pico-train - INFO - ├── Loss: 5.7168 2025-08-30 15:05:13 - pico-train - INFO - ├── Learning Rate: 5.00e-06 
2025-08-30 15:05:13 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 15:05:25 - pico-train - INFO - Step 85350 -- 🔄 Training Metrics 2025-08-30 15:05:25 - pico-train - INFO - ├── Loss: 5.8078 2025-08-30 15:05:25 - pico-train - INFO - ├── Learning Rate: 5.00e-06 2025-08-30 15:05:25 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 15:05:38 - pico-train - INFO - Step 85375 -- 🔄 Training Metrics 2025-08-30 15:05:38 - pico-train - INFO - ├── Loss: 5.7861 2025-08-30 15:05:38 - pico-train - INFO - ├── Learning Rate: 5.00e-06 2025-08-30 15:05:38 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 15:05:51 - pico-train - INFO - Step 85400 -- 🔄 Training Metrics 2025-08-30 15:05:51 - pico-train - INFO - ├── Loss: 5.8097 2025-08-30 15:05:51 - pico-train - INFO - ├── Learning Rate: 5.00e-06 2025-08-30 15:05:51 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 15:06:03 - pico-train - INFO - Step 85425 -- 🔄 Training Metrics 2025-08-30 15:06:03 - pico-train - INFO - ├── Loss: 5.7833 2025-08-30 15:06:03 - pico-train - INFO - ├── Learning Rate: 5.00e-06 2025-08-30 15:06:03 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 15:06:16 - pico-train - INFO - Step 85450 -- 🔄 Training Metrics 2025-08-30 15:06:16 - pico-train - INFO - ├── Loss: 5.7643 2025-08-30 15:06:16 - pico-train - INFO - ├── Learning Rate: 5.00e-06 2025-08-30 15:06:16 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 15:06:28 - pico-train - INFO - Step 85475 -- 🔄 Training Metrics 2025-08-30 15:06:28 - pico-train - INFO - ├── Loss: 5.7135 2025-08-30 15:06:28 - pico-train - INFO - ├── Learning Rate: 5.00e-06 2025-08-30 15:06:28 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 15:06:41 - pico-train - INFO - Step 85500 -- 💾 Saving Checkpoint 2025-08-30 15:08:34 - pico-train - INFO - Step 85500 -- 📊 Evaluation Results 2025-08-30 15:08:34 - pico-train - INFO - └── paloma: 2.526897197570139e+32 2025-08-30 15:08:38 - pico-train - INFO - Step 85500 -- 🔄 Training Metrics 2025-08-30 15:08:38 - 
pico-train - INFO - ├── Loss: 5.7441 2025-08-30 15:08:38 - pico-train - INFO - ├── Learning Rate: 5.00e-06 2025-08-30 15:08:38 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 15:08:38 - pico-train - INFO - Step 85500 -- 📈 Saving Learning Dynamics 2025-08-30 15:08:52 - pico-train - INFO - Step 85525 -- 🔄 Training Metrics 2025-08-30 15:08:53 - pico-train - INFO - ├── Loss: 5.6970 2025-08-30 15:08:53 - pico-train - INFO - ├── Learning Rate: 5.00e-06 2025-08-30 15:08:53 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 15:09:05 - pico-train - INFO - Step 85550 -- 🔄 Training Metrics 2025-08-30 15:09:05 - pico-train - INFO - ├── Loss: 5.7048 2025-08-30 15:09:05 - pico-train - INFO - ├── Learning Rate: 5.00e-06 2025-08-30 15:09:05 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 15:09:18 - pico-train - INFO - Step 85575 -- 🔄 Training Metrics 2025-08-30 15:09:18 - pico-train - INFO - ├── Loss: 5.7447 2025-08-30 15:09:18 - pico-train - INFO - ├── Learning Rate: 5.00e-06 2025-08-30 15:09:18 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 15:09:30 - pico-train - INFO - Step 85600 -- 🔄 Training Metrics 2025-08-30 15:09:30 - pico-train - INFO - ├── Loss: 5.7826 2025-08-30 15:09:30 - pico-train - INFO - ├── Learning Rate: 5.00e-06 2025-08-30 15:09:30 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 15:09:43 - pico-train - INFO - Step 85625 -- 🔄 Training Metrics 2025-08-30 15:09:43 - pico-train - INFO - ├── Loss: 5.6880 2025-08-30 15:09:43 - pico-train - INFO - ├── Learning Rate: 5.00e-06 2025-08-30 15:09:43 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 15:09:56 - pico-train - INFO - Step 85650 -- 🔄 Training Metrics 2025-08-30 15:09:56 - pico-train - INFO - ├── Loss: 5.7228 2025-08-30 15:09:56 - pico-train - INFO - ├── Learning Rate: 5.00e-06 2025-08-30 15:09:56 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 15:10:08 - pico-train - INFO - Step 85675 -- 🔄 Training Metrics 2025-08-30 15:10:08 - pico-train - INFO - ├── Loss: 5.7447 2025-08-30 
15:10:08 - pico-train - INFO - ├── Learning Rate: 5.00e-06 2025-08-30 15:10:08 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 15:10:21 - pico-train - INFO - Step 85700 -- 🔄 Training Metrics 2025-08-30 15:10:21 - pico-train - INFO - ├── Loss: 5.7662 2025-08-30 15:10:21 - pico-train - INFO - ├── Learning Rate: 5.00e-06 2025-08-30 15:10:21 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 15:10:34 - pico-train - INFO - Step 85725 -- 🔄 Training Metrics 2025-08-30 15:10:34 - pico-train - INFO - ├── Loss: 5.6988 2025-08-30 15:10:34 - pico-train - INFO - ├── Learning Rate: 5.00e-06 2025-08-30 15:10:34 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 15:10:47 - pico-train - INFO - Step 85750 -- 🔄 Training Metrics 2025-08-30 15:10:47 - pico-train - INFO - ├── Loss: 5.7295 2025-08-30 15:10:47 - pico-train - INFO - ├── Learning Rate: 5.00e-06 2025-08-30 15:10:47 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 15:10:59 - pico-train - INFO - Step 85775 -- 🔄 Training Metrics 2025-08-30 15:10:59 - pico-train - INFO - ├── Loss: 5.7442 2025-08-30 15:10:59 - pico-train - INFO - ├── Learning Rate: 5.00e-06 2025-08-30 15:10:59 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 15:11:12 - pico-train - INFO - Step 85800 -- 🔄 Training Metrics 2025-08-30 15:11:12 - pico-train - INFO - ├── Loss: 5.7960 2025-08-30 15:11:12 - pico-train - INFO - ├── Learning Rate: 5.00e-06 2025-08-30 15:11:12 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 15:11:24 - pico-train - INFO - Step 85825 -- 🔄 Training Metrics 2025-08-30 15:11:24 - pico-train - INFO - ├── Loss: 5.6939 2025-08-30 15:11:24 - pico-train - INFO - ├── Learning Rate: 5.00e-06 2025-08-30 15:11:24 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 15:11:37 - pico-train - INFO - Step 85850 -- 🔄 Training Metrics 2025-08-30 15:11:37 - pico-train - INFO - ├── Loss: 5.7725 2025-08-30 15:11:37 - pico-train - INFO - ├── Learning Rate: 5.00e-06 2025-08-30 15:11:37 - pico-train - INFO - └── Inf/NaN count: 0 
2025-08-30 15:11:49 - pico-train - INFO - Step 85875 -- 🔄 Training Metrics 2025-08-30 15:11:49 - pico-train - INFO - ├── Loss: 5.8111 2025-08-30 15:11:49 - pico-train - INFO - ├── Learning Rate: 5.00e-06 2025-08-30 15:11:49 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 15:12:02 - pico-train - INFO - Step 85900 -- 🔄 Training Metrics 2025-08-30 15:12:02 - pico-train - INFO - ├── Loss: 5.8032 2025-08-30 15:12:02 - pico-train - INFO - ├── Learning Rate: 5.00e-06 2025-08-30 15:12:02 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 15:12:15 - pico-train - INFO - Step 85925 -- 🔄 Training Metrics 2025-08-30 15:12:15 - pico-train - INFO - ├── Loss: 5.7568 2025-08-30 15:12:15 - pico-train - INFO - ├── Learning Rate: 5.00e-06 2025-08-30 15:12:15 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 15:12:27 - pico-train - INFO - Step 85950 -- 🔄 Training Metrics 2025-08-30 15:12:27 - pico-train - INFO - ├── Loss: 5.7991 2025-08-30 15:12:27 - pico-train - INFO - ├── Learning Rate: 5.00e-06 2025-08-30 15:12:27 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 15:12:40 - pico-train - INFO - Step 85975 -- 🔄 Training Metrics 2025-08-30 15:12:40 - pico-train - INFO - ├── Loss: 5.7900 2025-08-30 15:12:40 - pico-train - INFO - ├── Learning Rate: 5.00e-06 2025-08-30 15:12:40 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 15:12:52 - pico-train - INFO - Step 86000 -- 💾 Saving Checkpoint 2025-08-30 15:14:53 - pico-train - INFO - Step 86000 -- 📊 Evaluation Results 2025-08-30 15:14:53 - pico-train - INFO - └── paloma: 2.5395585157246708e+32 2025-08-30 15:14:56 - pico-train - INFO - Step 86000 -- 🔄 Training Metrics 2025-08-30 15:14:56 - pico-train - INFO - ├── Loss: 5.7032 2025-08-30 15:14:56 - pico-train - INFO - ├── Learning Rate: 5.00e-06 2025-08-30 15:14:56 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 15:14:56 - pico-train - INFO - Step 86000 -- 📈 Saving Learning Dynamics 2025-08-30 15:15:13 - pico-train - INFO - Step 86025 -- 🔄 Training Metrics 
2025-08-30 15:15:13 - pico-train - INFO - ├── Loss: 5.8018 2025-08-30 15:15:13 - pico-train - INFO - ├── Learning Rate: 5.00e-06 2025-08-30 15:15:13 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 15:15:25 - pico-train - INFO - Step 86050 -- 🔄 Training Metrics 2025-08-30 15:15:25 - pico-train - INFO - ├── Loss: 5.7650 2025-08-30 15:15:25 - pico-train - INFO - ├── Learning Rate: 5.00e-06 2025-08-30 15:15:25 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 15:15:38 - pico-train - INFO - Step 86075 -- 🔄 Training Metrics 2025-08-30 15:15:38 - pico-train - INFO - ├── Loss: 5.7429 2025-08-30 15:15:38 - pico-train - INFO - ├── Learning Rate: 5.00e-06 2025-08-30 15:15:38 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 15:15:50 - pico-train - INFO - Step 86100 -- 🔄 Training Metrics 2025-08-30 15:15:50 - pico-train - INFO - ├── Loss: 5.7789 2025-08-30 15:15:50 - pico-train - INFO - ├── Learning Rate: 5.00e-06 2025-08-30 15:15:50 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 15:16:03 - pico-train - INFO - Step 86125 -- 🔄 Training Metrics 2025-08-30 15:16:03 - pico-train - INFO - ├── Loss: 5.7001 2025-08-30 15:16:03 - pico-train - INFO - ├── Learning Rate: 5.00e-06 2025-08-30 15:16:03 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 15:16:15 - pico-train - INFO - Step 86150 -- 🔄 Training Metrics 2025-08-30 15:16:15 - pico-train - INFO - ├── Loss: 5.7935 2025-08-30 15:16:15 - pico-train - INFO - ├── Learning Rate: 5.00e-06 2025-08-30 15:16:15 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 15:16:28 - pico-train - INFO - Step 86175 -- 🔄 Training Metrics 2025-08-30 15:16:28 - pico-train - INFO - ├── Loss: 5.7105 2025-08-30 15:16:28 - pico-train - INFO - ├── Learning Rate: 5.00e-06 2025-08-30 15:16:28 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 15:16:41 - pico-train - INFO - Step 86200 -- 🔄 Training Metrics 2025-08-30 15:16:41 - pico-train - INFO - ├── Loss: 5.7426 2025-08-30 15:16:41 - pico-train - INFO - ├── Learning Rate: 5.00e-06 
2025-08-30 15:16:41 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 15:16:53 - pico-train - INFO - Step 86225 -- 🔄 Training Metrics 2025-08-30 15:16:53 - pico-train - INFO - ├── Loss: 5.7919 2025-08-30 15:16:53 - pico-train - INFO - ├── Learning Rate: 5.00e-06 2025-08-30 15:16:53 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 15:17:07 - pico-train - INFO - Step 86250 -- 🔄 Training Metrics 2025-08-30 15:17:07 - pico-train - INFO - ├── Loss: 5.7451 2025-08-30 15:17:07 - pico-train - INFO - ├── Learning Rate: 5.00e-06 2025-08-30 15:17:07 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 15:17:19 - pico-train - INFO - Step 86275 -- 🔄 Training Metrics 2025-08-30 15:17:19 - pico-train - INFO - ├── Loss: 5.8049 2025-08-30 15:17:19 - pico-train - INFO - ├── Learning Rate: 5.00e-06 2025-08-30 15:17:19 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 15:17:32 - pico-train - INFO - Step 86300 -- 🔄 Training Metrics 2025-08-30 15:17:32 - pico-train - INFO - ├── Loss: 5.7550 2025-08-30 15:17:32 - pico-train - INFO - ├── Learning Rate: 5.00e-06 2025-08-30 15:17:32 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 15:17:44 - pico-train - INFO - Step 86325 -- 🔄 Training Metrics 2025-08-30 15:17:44 - pico-train - INFO - ├── Loss: 5.7332 2025-08-30 15:17:44 - pico-train - INFO - ├── Learning Rate: 5.00e-06 2025-08-30 15:17:44 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 15:17:57 - pico-train - INFO - Step 86350 -- 🔄 Training Metrics 2025-08-30 15:17:57 - pico-train - INFO - ├── Loss: 5.8024 2025-08-30 15:17:57 - pico-train - INFO - ├── Learning Rate: 5.00e-06 2025-08-30 15:17:57 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 15:18:10 - pico-train - INFO - Step 86375 -- 🔄 Training Metrics 2025-08-30 15:18:10 - pico-train - INFO - ├── Loss: 5.7668 2025-08-30 15:18:10 - pico-train - INFO - ├── Learning Rate: 5.00e-06 2025-08-30 15:18:10 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 15:18:22 - pico-train - INFO - Step 86400 -- 🔄 Training 
Metrics 2025-08-30 15:18:22 - pico-train - INFO - ├── Loss: 5.7908 2025-08-30 15:18:22 - pico-train - INFO - ├── Learning Rate: 5.00e-06 2025-08-30 15:18:22 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 15:18:35 - pico-train - INFO - Step 86425 -- 🔄 Training Metrics 2025-08-30 15:18:35 - pico-train - INFO - ├── Loss: 5.7413 2025-08-30 15:18:35 - pico-train - INFO - ├── Learning Rate: 5.00e-06 2025-08-30 15:18:35 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 15:18:47 - pico-train - INFO - Step 86450 -- 🔄 Training Metrics 2025-08-30 15:18:47 - pico-train - INFO - ├── Loss: 5.7442 2025-08-30 15:18:47 - pico-train - INFO - ├── Learning Rate: 5.00e-06 2025-08-30 15:18:47 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 15:19:00 - pico-train - INFO - Step 86475 -- 🔄 Training Metrics 2025-08-30 15:19:00 - pico-train - INFO - ├── Loss: 5.7933 2025-08-30 15:19:00 - pico-train - INFO - ├── Learning Rate: 5.00e-06 2025-08-30 15:19:00 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 15:19:12 - pico-train - INFO - Step 86500 -- 💾 Saving Checkpoint 2025-08-30 15:21:06 - pico-train - INFO - Step 86500 -- 📊 Evaluation Results 2025-08-30 15:21:06 - pico-train - INFO - └── paloma: 2.983745928476798e+32 2025-08-30 15:21:09 - pico-train - INFO - Step 86500 -- 🔄 Training Metrics 2025-08-30 15:21:09 - pico-train - INFO - ├── Loss: 5.7088 2025-08-30 15:21:09 - pico-train - INFO - ├── Learning Rate: 5.00e-06 2025-08-30 15:21:09 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 15:21:09 - pico-train - INFO - Step 86500 -- 📈 Saving Learning Dynamics 2025-08-30 15:21:24 - pico-train - INFO - Step 86525 -- 🔄 Training Metrics 2025-08-30 15:21:24 - pico-train - INFO - ├── Loss: 5.7427 2025-08-30 15:21:24 - pico-train - INFO - ├── Learning Rate: 5.00e-06 2025-08-30 15:21:24 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 15:21:37 - pico-train - INFO - Step 86550 -- 🔄 Training Metrics 2025-08-30 15:21:37 - pico-train - INFO - ├── Loss: 5.7492 2025-08-30 
15:21:37 - pico-train - INFO - ├── Learning Rate: 5.00e-06 2025-08-30 15:21:37 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 15:21:50 - pico-train - INFO - Step 86575 -- 🔄 Training Metrics 2025-08-30 15:21:50 - pico-train - INFO - ├── Loss: 5.7769 2025-08-30 15:21:50 - pico-train - INFO - ├── Learning Rate: 5.00e-06 2025-08-30 15:21:50 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 15:22:03 - pico-train - INFO - Step 86600 -- 🔄 Training Metrics 2025-08-30 15:22:03 - pico-train - INFO - ├── Loss: 5.7265 2025-08-30 15:22:03 - pico-train - INFO - ├── Learning Rate: 5.00e-06 2025-08-30 15:22:03 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 15:22:15 - pico-train - INFO - Step 86625 -- 🔄 Training Metrics 2025-08-30 15:22:15 - pico-train - INFO - ├── Loss: 5.7499 2025-08-30 15:22:15 - pico-train - INFO - ├── Learning Rate: 5.00e-06 2025-08-30 15:22:15 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 15:22:28 - pico-train - INFO - Step 86650 -- 🔄 Training Metrics 2025-08-30 15:22:28 - pico-train - INFO - ├── Loss: 5.6808 2025-08-30 15:22:28 - pico-train - INFO - ├── Learning Rate: 5.00e-06 2025-08-30 15:22:28 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 15:22:41 - pico-train - INFO - Step 86675 -- 🔄 Training Metrics 2025-08-30 15:22:41 - pico-train - INFO - ├── Loss: 5.7845 2025-08-30 15:22:41 - pico-train - INFO - ├── Learning Rate: 5.00e-06 2025-08-30 15:22:41 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 15:22:53 - pico-train - INFO - Step 86700 -- 🔄 Training Metrics 2025-08-30 15:22:53 - pico-train - INFO - ├── Loss: 5.6157 2025-08-30 15:22:53 - pico-train - INFO - ├── Learning Rate: 5.00e-06 2025-08-30 15:22:53 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 15:23:06 - pico-train - INFO - Step 86725 -- 🔄 Training Metrics 2025-08-30 15:23:06 - pico-train - INFO - ├── Loss: 5.6912 2025-08-30 15:23:06 - pico-train - INFO - ├── Learning Rate: 5.00e-06 2025-08-30 15:23:06 - pico-train - INFO - └── Inf/NaN count: 0 
2025-08-30 15:23:19 - pico-train - INFO - Step 86750 -- 🔄 Training Metrics 2025-08-30 15:23:19 - pico-train - INFO - ├── Loss: 5.7776 2025-08-30 15:23:19 - pico-train - INFO - ├── Learning Rate: 5.00e-06 2025-08-30 15:23:19 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 15:23:31 - pico-train - INFO - Step 86775 -- 🔄 Training Metrics 2025-08-30 15:23:31 - pico-train - INFO - ├── Loss: 5.8204 2025-08-30 15:23:31 - pico-train - INFO - ├── Learning Rate: 5.00e-06 2025-08-30 15:23:31 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 15:23:44 - pico-train - INFO - Step 86800 -- 🔄 Training Metrics 2025-08-30 15:23:44 - pico-train - INFO - ├── Loss: 5.7292 2025-08-30 15:23:44 - pico-train - INFO - ├── Learning Rate: 5.00e-06 2025-08-30 15:23:44 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 15:23:57 - pico-train - INFO - Step 86825 -- 🔄 Training Metrics 2025-08-30 15:23:57 - pico-train - INFO - ├── Loss: 5.7485 2025-08-30 15:23:57 - pico-train - INFO - ├── Learning Rate: 5.00e-06 2025-08-30 15:23:57 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 15:24:10 - pico-train - INFO - Step 86850 -- 🔄 Training Metrics 2025-08-30 15:24:10 - pico-train - INFO - ├── Loss: 5.8013 2025-08-30 15:24:10 - pico-train - INFO - ├── Learning Rate: 5.00e-06 2025-08-30 15:24:10 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 15:24:23 - pico-train - INFO - Step 86875 -- 🔄 Training Metrics 2025-08-30 15:24:23 - pico-train - INFO - ├── Loss: 5.7522 2025-08-30 15:24:23 - pico-train - INFO - ├── Learning Rate: 5.00e-06 2025-08-30 15:24:23 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 15:24:35 - pico-train - INFO - Step 86900 -- 🔄 Training Metrics 2025-08-30 15:24:35 - pico-train - INFO - ├── Loss: 5.7249 2025-08-30 15:24:35 - pico-train - INFO - ├── Learning Rate: 5.00e-06 2025-08-30 15:24:35 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 15:24:48 - pico-train - INFO - Step 86925 -- 🔄 Training Metrics 2025-08-30 15:24:48 - pico-train - INFO - ├── Loss: 
5.5854 2025-08-30 15:24:48 - pico-train - INFO - ├── Learning Rate: 5.00e-06 2025-08-30 15:24:48 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 15:25:00 - pico-train - INFO - Step 86950 -- 🔄 Training Metrics 2025-08-30 15:25:00 - pico-train - INFO - ├── Loss: 5.7977 2025-08-30 15:25:00 - pico-train - INFO - ├── Learning Rate: 5.00e-06 2025-08-30 15:25:00 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 15:25:13 - pico-train - INFO - Step 86975 -- 🔄 Training Metrics 2025-08-30 15:25:13 - pico-train - INFO - ├── Loss: 5.7733 2025-08-30 15:25:13 - pico-train - INFO - ├── Learning Rate: 5.00e-06 2025-08-30 15:25:13 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 15:25:25 - pico-train - INFO - Step 87000 -- 💾 Saving Checkpoint 2025-08-30 15:27:23 - pico-train - INFO - Step 87000 -- 📊 Evaluation Results 2025-08-30 15:27:23 - pico-train - INFO - └── paloma: 2.8810581080598207e+32 2025-08-30 15:27:26 - pico-train - INFO - Step 87000 -- 🔄 Training Metrics 2025-08-30 15:27:26 - pico-train - INFO - ├── Loss: 5.8312 2025-08-30 15:27:26 - pico-train - INFO - ├── Learning Rate: 5.00e-06 2025-08-30 15:27:26 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 15:27:26 - pico-train - INFO - Step 87000 -- 📈 Saving Learning Dynamics 2025-08-30 15:27:42 - pico-train - INFO - Step 87025 -- 🔄 Training Metrics 2025-08-30 15:27:42 - pico-train - INFO - ├── Loss: 5.6702 2025-08-30 15:27:42 - pico-train - INFO - ├── Learning Rate: 5.00e-06 2025-08-30 15:27:42 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 15:27:54 - pico-train - INFO - Step 87050 -- 🔄 Training Metrics 2025-08-30 15:27:54 - pico-train - INFO - ├── Loss: 5.7006 2025-08-30 15:27:54 - pico-train - INFO - ├── Learning Rate: 5.00e-06 2025-08-30 15:27:54 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 15:28:07 - pico-train - INFO - Step 87075 -- 🔄 Training Metrics 2025-08-30 15:28:07 - pico-train - INFO - ├── Loss: 5.6640 2025-08-30 15:28:07 - pico-train - INFO - ├── Learning Rate: 5.00e-06 
2025-08-30 15:28:07 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 15:28:20 - pico-train - INFO - Step 87100 -- 🔄 Training Metrics 2025-08-30 15:28:20 - pico-train - INFO - ├── Loss: 5.7438 2025-08-30 15:28:20 - pico-train - INFO - ├── Learning Rate: 5.00e-06 2025-08-30 15:28:20 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 15:28:32 - pico-train - INFO - Step 87125 -- 🔄 Training Metrics 2025-08-30 15:28:32 - pico-train - INFO - ├── Loss: 5.8106 2025-08-30 15:28:32 - pico-train - INFO - ├── Learning Rate: 5.00e-06 2025-08-30 15:28:32 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 15:28:45 - pico-train - INFO - Step 87150 -- 🔄 Training Metrics 2025-08-30 15:28:45 - pico-train - INFO - ├── Loss: 5.7836 2025-08-30 15:28:45 - pico-train - INFO - ├── Learning Rate: 5.00e-06 2025-08-30 15:28:45 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 15:28:58 - pico-train - INFO - Step 87175 -- 🔄 Training Metrics 2025-08-30 15:28:58 - pico-train - INFO - ├── Loss: 5.7094 2025-08-30 15:28:58 - pico-train - INFO - ├── Learning Rate: 5.00e-06 2025-08-30 15:28:58 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 15:29:11 - pico-train - INFO - Step 87200 -- 🔄 Training Metrics 2025-08-30 15:29:11 - pico-train - INFO - ├── Loss: 5.7583 2025-08-30 15:29:11 - pico-train - INFO - ├── Learning Rate: 5.00e-06 2025-08-30 15:29:11 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 15:29:23 - pico-train - INFO - Step 87225 -- 🔄 Training Metrics 2025-08-30 15:29:23 - pico-train - INFO - ├── Loss: 5.7326 2025-08-30 15:29:23 - pico-train - INFO - ├── Learning Rate: 5.00e-06 2025-08-30 15:29:23 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 15:29:36 - pico-train - INFO - Step 87250 -- 🔄 Training Metrics 2025-08-30 15:29:36 - pico-train - INFO - ├── Loss: 5.7466 2025-08-30 15:29:36 - pico-train - INFO - ├── Learning Rate: 5.00e-06 2025-08-30 15:29:36 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 15:29:49 - pico-train - INFO - Step 87275 -- 🔄 Training 
Metrics 2025-08-30 15:29:49 - pico-train - INFO - ├── Loss: 5.8335 2025-08-30 15:29:49 - pico-train - INFO - ├── Learning Rate: 5.00e-06 2025-08-30 15:29:49 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 15:30:01 - pico-train - INFO - Step 87300 -- 🔄 Training Metrics 2025-08-30 15:30:01 - pico-train - INFO - ├── Loss: 5.8182 2025-08-30 15:30:01 - pico-train - INFO - ├── Learning Rate: 5.00e-06 2025-08-30 15:30:01 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 15:30:14 - pico-train - INFO - Step 87325 -- 🔄 Training Metrics 2025-08-30 15:30:14 - pico-train - INFO - ├── Loss: 5.7018 2025-08-30 15:30:14 - pico-train - INFO - ├── Learning Rate: 5.00e-06 2025-08-30 15:30:14 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 15:30:27 - pico-train - INFO - Step 87350 -- 🔄 Training Metrics 2025-08-30 15:30:27 - pico-train - INFO - ├── Loss: 5.7570 2025-08-30 15:30:27 - pico-train - INFO - ├── Learning Rate: 5.00e-06 2025-08-30 15:30:27 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 15:30:39 - pico-train - INFO - Step 87375 -- 🔄 Training Metrics 2025-08-30 15:30:39 - pico-train - INFO - ├── Loss: 5.8416 2025-08-30 15:30:39 - pico-train - INFO - ├── Learning Rate: 5.00e-06 2025-08-30 15:30:39 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 15:30:52 - pico-train - INFO - Step 87400 -- 🔄 Training Metrics 2025-08-30 15:30:52 - pico-train - INFO - ├── Loss: 5.7823 2025-08-30 15:30:52 - pico-train - INFO - ├── Learning Rate: 5.00e-06 2025-08-30 15:30:52 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 15:31:05 - pico-train - INFO - Step 87425 -- 🔄 Training Metrics 2025-08-30 15:31:05 - pico-train - INFO - ├── Loss: 5.6998 2025-08-30 15:31:05 - pico-train - INFO - ├── Learning Rate: 5.00e-06 2025-08-30 15:31:05 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 15:31:17 - pico-train - INFO - Step 87450 -- 🔄 Training Metrics 2025-08-30 15:31:17 - pico-train - INFO - ├── Loss: 5.8327 2025-08-30 15:31:17 - pico-train - INFO - ├── Learning Rate: 
5.00e-06 2025-08-30 15:31:17 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 15:31:30 - pico-train - INFO - Step 87475 -- 🔄 Training Metrics 2025-08-30 15:31:30 - pico-train - INFO - ├── Loss: 5.7288 2025-08-30 15:31:30 - pico-train - INFO - ├── Learning Rate: 5.00e-06 2025-08-30 15:31:30 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 15:31:42 - pico-train - INFO - Step 87500 -- 💾 Saving Checkpoint 2025-08-30 15:33:49 - pico-train - INFO - Step 87500 -- 📊 Evaluation Results 2025-08-30 15:33:49 - pico-train - INFO - └── paloma: 2.799469925719804e+32 2025-08-30 15:33:51 - pico-train - INFO - Step 87500 -- 🔄 Training Metrics 2025-08-30 15:33:51 - pico-train - INFO - ├── Loss: 5.8469 2025-08-30 15:33:51 - pico-train - INFO - ├── Learning Rate: 5.00e-06 2025-08-30 15:33:51 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 15:33:51 - pico-train - INFO - Step 87500 -- 📈 Saving Learning Dynamics 2025-08-30 15:34:06 - pico-train - INFO - Step 87525 -- 🔄 Training Metrics 2025-08-30 15:34:06 - pico-train - INFO - ├── Loss: 5.7839 2025-08-30 15:34:06 - pico-train - INFO - ├── Learning Rate: 5.00e-06 2025-08-30 15:34:06 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 15:34:19 - pico-train - INFO - Step 87550 -- 🔄 Training Metrics 2025-08-30 15:34:19 - pico-train - INFO - ├── Loss: 5.7669 2025-08-30 15:34:19 - pico-train - INFO - ├── Learning Rate: 5.00e-06 2025-08-30 15:34:19 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 15:34:31 - pico-train - INFO - Step 87575 -- 🔄 Training Metrics 2025-08-30 15:34:31 - pico-train - INFO - ├── Loss: 5.7506 2025-08-30 15:34:31 - pico-train - INFO - ├── Learning Rate: 5.00e-06 2025-08-30 15:34:31 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 15:34:44 - pico-train - INFO - Step 87600 -- 🔄 Training Metrics 2025-08-30 15:34:44 - pico-train - INFO - ├── Loss: 5.7107 2025-08-30 15:34:44 - pico-train - INFO - ├── Learning Rate: 5.00e-06 2025-08-30 15:34:44 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 
15:34:56 - pico-train - INFO - Step 87625 -- 🔄 Training Metrics 2025-08-30 15:34:56 - pico-train - INFO - ├── Loss: 5.7098 2025-08-30 15:34:56 - pico-train - INFO - ├── Learning Rate: 5.00e-06 2025-08-30 15:34:56 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 15:35:09 - pico-train - INFO - Step 87650 -- 🔄 Training Metrics 2025-08-30 15:35:09 - pico-train - INFO - ├── Loss: 5.7290 2025-08-30 15:35:09 - pico-train - INFO - ├── Learning Rate: 5.00e-06 2025-08-30 15:35:09 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 15:35:22 - pico-train - INFO - Step 87675 -- 🔄 Training Metrics 2025-08-30 15:35:22 - pico-train - INFO - ├── Loss: 5.8343 2025-08-30 15:35:22 - pico-train - INFO - ├── Learning Rate: 5.00e-06 2025-08-30 15:35:22 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 15:35:34 - pico-train - INFO - Step 87700 -- 🔄 Training Metrics 2025-08-30 15:35:34 - pico-train - INFO - ├── Loss: 5.7439 2025-08-30 15:35:34 - pico-train - INFO - ├── Learning Rate: 5.00e-06 2025-08-30 15:35:34 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 15:35:47 - pico-train - INFO - Step 87725 -- 🔄 Training Metrics 2025-08-30 15:35:47 - pico-train - INFO - ├── Loss: 5.6432 2025-08-30 15:35:47 - pico-train - INFO - ├── Learning Rate: 5.00e-06 2025-08-30 15:35:47 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 15:36:00 - pico-train - INFO - Step 87750 -- 🔄 Training Metrics 2025-08-30 15:36:00 - pico-train - INFO - ├── Loss: 5.7196 2025-08-30 15:36:00 - pico-train - INFO - ├── Learning Rate: 5.00e-06 2025-08-30 15:36:00 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 15:36:12 - pico-train - INFO - Step 87775 -- 🔄 Training Metrics 2025-08-30 15:36:12 - pico-train - INFO - ├── Loss: 5.6542 2025-08-30 15:36:12 - pico-train - INFO - ├── Learning Rate: 5.00e-06 2025-08-30 15:36:12 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 15:36:25 - pico-train - INFO - Step 87800 -- 🔄 Training Metrics 2025-08-30 15:36:25 - pico-train - INFO - ├── Loss: 5.7995 
2025-08-30 15:36:25 - pico-train - INFO - ├── Learning Rate: 5.00e-06 2025-08-30 15:36:25 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 15:36:38 - pico-train - INFO - Step 87825 -- 🔄 Training Metrics 2025-08-30 15:36:38 - pico-train - INFO - ├── Loss: 5.8301 2025-08-30 15:36:38 - pico-train - INFO - ├── Learning Rate: 5.00e-06 2025-08-30 15:36:38 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 15:36:50 - pico-train - INFO - Step 87850 -- 🔄 Training Metrics 2025-08-30 15:36:50 - pico-train - INFO - ├── Loss: 5.7938 2025-08-30 15:36:50 - pico-train - INFO - ├── Learning Rate: 5.00e-06 2025-08-30 15:36:50 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 15:37:03 - pico-train - INFO - Step 87875 -- 🔄 Training Metrics 2025-08-30 15:37:03 - pico-train - INFO - ├── Loss: 5.6972 2025-08-30 15:37:03 - pico-train - INFO - ├── Learning Rate: 5.00e-06 2025-08-30 15:37:03 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 15:37:15 - pico-train - INFO - Step 87900 -- 🔄 Training Metrics 2025-08-30 15:37:15 - pico-train - INFO - ├── Loss: 5.6862 2025-08-30 15:37:15 - pico-train - INFO - ├── Learning Rate: 5.00e-06 2025-08-30 15:37:15 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 15:37:28 - pico-train - INFO - Step 87925 -- 🔄 Training Metrics 2025-08-30 15:37:28 - pico-train - INFO - ├── Loss: 5.8037 2025-08-30 15:37:28 - pico-train - INFO - ├── Learning Rate: 5.00e-06 2025-08-30 15:37:28 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 15:37:40 - pico-train - INFO - Step 87950 -- 🔄 Training Metrics 2025-08-30 15:37:40 - pico-train - INFO - ├── Loss: 5.7647 2025-08-30 15:37:40 - pico-train - INFO - ├── Learning Rate: 5.00e-06 2025-08-30 15:37:40 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 15:37:53 - pico-train - INFO - Step 87975 -- 🔄 Training Metrics 2025-08-30 15:37:53 - pico-train - INFO - ├── Loss: 5.7602 2025-08-30 15:37:53 - pico-train - INFO - ├── Learning Rate: 5.00e-06 2025-08-30 15:37:53 - pico-train - INFO - └── Inf/NaN count: 
0 2025-08-30 15:38:05 - pico-train - INFO - Step 88000 -- 💾 Saving Checkpoint 2025-08-30 15:40:00 - pico-train - INFO - Step 88000 -- 📊 Evaluation Results 2025-08-30 15:40:00 - pico-train - INFO - └── paloma: 3.1466549920581647e+32 2025-08-30 15:40:02 - pico-train - INFO - Step 88000 -- 🔄 Training Metrics 2025-08-30 15:40:02 - pico-train - INFO - ├── Loss: 5.7855 2025-08-30 15:40:02 - pico-train - INFO - ├── Learning Rate: 5.00e-06 2025-08-30 15:40:02 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 15:40:02 - pico-train - INFO - Step 88000 -- 📈 Saving Learning Dynamics 2025-08-30 15:40:17 - pico-train - INFO - Step 88025 -- 🔄 Training Metrics 2025-08-30 15:40:17 - pico-train - INFO - ├── Loss: 5.8101 2025-08-30 15:40:17 - pico-train - INFO - ├── Learning Rate: 5.00e-06 2025-08-30 15:40:17 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 15:40:29 - pico-train - INFO - Step 88050 -- 🔄 Training Metrics 2025-08-30 15:40:29 - pico-train - INFO - ├── Loss: 5.7339 2025-08-30 15:40:29 - pico-train - INFO - ├── Learning Rate: 5.00e-06 2025-08-30 15:40:29 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 15:40:42 - pico-train - INFO - Step 88075 -- 🔄 Training Metrics 2025-08-30 15:40:42 - pico-train - INFO - ├── Loss: 5.7454 2025-08-30 15:40:42 - pico-train - INFO - ├── Learning Rate: 5.00e-06 2025-08-30 15:40:42 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 15:40:54 - pico-train - INFO - Step 88100 -- 🔄 Training Metrics 2025-08-30 15:40:54 - pico-train - INFO - ├── Loss: 5.7224 2025-08-30 15:40:54 - pico-train - INFO - ├── Learning Rate: 5.00e-06 2025-08-30 15:40:54 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 15:41:07 - pico-train - INFO - Step 88125 -- 🔄 Training Metrics 2025-08-30 15:41:07 - pico-train - INFO - ├── Loss: 5.8219 2025-08-30 15:41:07 - pico-train - INFO - ├── Learning Rate: 5.00e-06 2025-08-30 15:41:07 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 15:41:20 - pico-train - INFO - Step 88150 -- 🔄 Training Metrics 
2025-08-30 15:41:20 - pico-train - INFO - ├── Loss: 5.7135 2025-08-30 15:41:20 - pico-train - INFO - ├── Learning Rate: 5.00e-06 2025-08-30 15:41:20 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 15:41:33 - pico-train - INFO - Step 88175 -- 🔄 Training Metrics 2025-08-30 15:41:33 - pico-train - INFO - ├── Loss: 5.7870 2025-08-30 15:41:33 - pico-train - INFO - ├── Learning Rate: 5.00e-06 2025-08-30 15:41:33 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 15:41:45 - pico-train - INFO - Step 88200 -- 🔄 Training Metrics 2025-08-30 15:41:45 - pico-train - INFO - ├── Loss: 5.7951 2025-08-30 15:41:45 - pico-train - INFO - ├── Learning Rate: 5.00e-06 2025-08-30 15:41:45 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 15:41:58 - pico-train - INFO - Step 88225 -- 🔄 Training Metrics 2025-08-30 15:41:58 - pico-train - INFO - ├── Loss: 5.7138 2025-08-30 15:41:58 - pico-train - INFO - ├── Learning Rate: 5.00e-06 2025-08-30 15:41:58 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 15:42:11 - pico-train - INFO - Step 88250 -- 🔄 Training Metrics 2025-08-30 15:42:11 - pico-train - INFO - ├── Loss: 5.7812 2025-08-30 15:42:11 - pico-train - INFO - ├── Learning Rate: 5.00e-06 2025-08-30 15:42:11 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 15:42:24 - pico-train - INFO - Step 88275 -- 🔄 Training Metrics 2025-08-30 15:42:24 - pico-train - INFO - ├── Loss: 5.7163 2025-08-30 15:42:24 - pico-train - INFO - ├── Learning Rate: 5.00e-06 2025-08-30 15:42:24 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 15:42:36 - pico-train - INFO - Step 88300 -- 🔄 Training Metrics 2025-08-30 15:42:36 - pico-train - INFO - ├── Loss: 5.7193 2025-08-30 15:42:36 - pico-train - INFO - ├── Learning Rate: 5.00e-06 2025-08-30 15:42:36 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 15:42:49 - pico-train - INFO - Step 88325 -- 🔄 Training Metrics 2025-08-30 15:42:49 - pico-train - INFO - ├── Loss: 5.7766 2025-08-30 15:42:49 - pico-train - INFO - ├── Learning Rate: 5.00e-06 
2025-08-30 15:42:49 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 15:43:02 - pico-train - INFO - Step 88350 -- 🔄 Training Metrics 2025-08-30 15:43:02 - pico-train - INFO - ├── Loss: 5.6819 2025-08-30 15:43:02 - pico-train - INFO - ├── Learning Rate: 5.00e-06 2025-08-30 15:43:02 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 15:43:14 - pico-train - INFO - Step 88375 -- 🔄 Training Metrics 2025-08-30 15:43:14 - pico-train - INFO - ├── Loss: 5.6956 2025-08-30 15:43:14 - pico-train - INFO - ├── Learning Rate: 5.00e-06 2025-08-30 15:43:14 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 15:43:27 - pico-train - INFO - Step 88400 -- 🔄 Training Metrics 2025-08-30 15:43:27 - pico-train - INFO - ├── Loss: 5.8095 2025-08-30 15:43:27 - pico-train - INFO - ├── Learning Rate: 5.00e-06 2025-08-30 15:43:27 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 15:43:39 - pico-train - INFO - Step 88425 -- 🔄 Training Metrics 2025-08-30 15:43:39 - pico-train - INFO - ├── Loss: 5.8015 2025-08-30 15:43:39 - pico-train - INFO - ├── Learning Rate: 5.00e-06 2025-08-30 15:43:39 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 15:43:52 - pico-train - INFO - Step 88450 -- 🔄 Training Metrics 2025-08-30 15:43:52 - pico-train - INFO - ├── Loss: 5.6796 2025-08-30 15:43:52 - pico-train - INFO - ├── Learning Rate: 5.00e-06 2025-08-30 15:43:52 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 15:44:05 - pico-train - INFO - Step 88475 -- 🔄 Training Metrics 2025-08-30 15:44:05 - pico-train - INFO - ├── Loss: 5.7676 2025-08-30 15:44:05 - pico-train - INFO - ├── Learning Rate: 5.00e-06 2025-08-30 15:44:05 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 15:44:17 - pico-train - INFO - Step 88500 -- 💾 Saving Checkpoint 2025-08-30 15:46:18 - pico-train - INFO - Step 88500 -- 📊 Evaluation Results 2025-08-30 15:46:18 - pico-train - INFO - └── paloma: 3.144629826397518e+32 2025-08-30 15:46:21 - pico-train - INFO - Step 88500 -- 🔄 Training Metrics 2025-08-30 15:46:21 - 
pico-train - INFO - ├── Loss: 5.7201 2025-08-30 15:46:21 - pico-train - INFO - ├── Learning Rate: 5.00e-06 2025-08-30 15:46:21 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 15:46:21 - pico-train - INFO - Step 88500 -- 📈 Saving Learning Dynamics 2025-08-30 15:46:36 - pico-train - INFO - Step 88525 -- 🔄 Training Metrics 2025-08-30 15:46:36 - pico-train - INFO - ├── Loss: 5.7260 2025-08-30 15:46:36 - pico-train - INFO - ├── Learning Rate: 5.00e-06 2025-08-30 15:46:36 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 15:46:49 - pico-train - INFO - Step 88550 -- 🔄 Training Metrics 2025-08-30 15:46:49 - pico-train - INFO - ├── Loss: 5.6447 2025-08-30 15:46:49 - pico-train - INFO - ├── Learning Rate: 5.00e-06 2025-08-30 15:46:49 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 15:47:01 - pico-train - INFO - Step 88575 -- 🔄 Training Metrics 2025-08-30 15:47:01 - pico-train - INFO - ├── Loss: 5.6951 2025-08-30 15:47:01 - pico-train - INFO - ├── Learning Rate: 5.00e-06 2025-08-30 15:47:01 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 15:47:14 - pico-train - INFO - Step 88600 -- 🔄 Training Metrics 2025-08-30 15:47:14 - pico-train - INFO - ├── Loss: 5.7123 2025-08-30 15:47:14 - pico-train - INFO - ├── Learning Rate: 5.00e-06 2025-08-30 15:47:14 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 15:47:26 - pico-train - INFO - Step 88625 -- 🔄 Training Metrics 2025-08-30 15:47:26 - pico-train - INFO - ├── Loss: 5.7047 2025-08-30 15:47:26 - pico-train - INFO - ├── Learning Rate: 5.00e-06 2025-08-30 15:47:26 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 15:47:39 - pico-train - INFO - Step 88650 -- 🔄 Training Metrics 2025-08-30 15:47:39 - pico-train - INFO - ├── Loss: 5.6127 2025-08-30 15:47:39 - pico-train - INFO - ├── Learning Rate: 5.00e-06 2025-08-30 15:47:39 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 15:47:52 - pico-train - INFO - Step 88675 -- 🔄 Training Metrics 2025-08-30 15:47:52 - pico-train - INFO - ├── Loss: 5.8085 2025-08-30 
15:47:52 - pico-train - INFO - ├── Learning Rate: 5.00e-06 2025-08-30 15:47:52 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 15:48:04 - pico-train - INFO - Step 88700 -- 🔄 Training Metrics 2025-08-30 15:48:04 - pico-train - INFO - ├── Loss: 5.7317 2025-08-30 15:48:04 - pico-train - INFO - ├── Learning Rate: 5.00e-06 2025-08-30 15:48:04 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 15:48:17 - pico-train - INFO - Step 88725 -- 🔄 Training Metrics 2025-08-30 15:48:17 - pico-train - INFO - ├── Loss: 5.7546 2025-08-30 15:48:17 - pico-train - INFO - ├── Learning Rate: 5.00e-06 2025-08-30 15:48:17 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 15:48:30 - pico-train - INFO - Step 88750 -- 🔄 Training Metrics 2025-08-30 15:48:30 - pico-train - INFO - ├── Loss: 5.7284 2025-08-30 15:48:30 - pico-train - INFO - ├── Learning Rate: 5.00e-06 2025-08-30 15:48:30 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 15:48:42 - pico-train - INFO - Step 88775 -- 🔄 Training Metrics 2025-08-30 15:48:42 - pico-train - INFO - ├── Loss: 5.6721 2025-08-30 15:48:42 - pico-train - INFO - ├── Learning Rate: 5.00e-06 2025-08-30 15:48:42 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 15:48:55 - pico-train - INFO - Step 88800 -- 🔄 Training Metrics 2025-08-30 15:48:55 - pico-train - INFO - ├── Loss: 5.7062 2025-08-30 15:48:55 - pico-train - INFO - ├── Learning Rate: 5.00e-06 2025-08-30 15:48:55 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 15:49:08 - pico-train - INFO - Step 88825 -- 🔄 Training Metrics 2025-08-30 15:49:08 - pico-train - INFO - ├── Loss: 5.7419 2025-08-30 15:49:08 - pico-train - INFO - ├── Learning Rate: 5.00e-06 2025-08-30 15:49:08 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 15:49:20 - pico-train - INFO - Step 88850 -- 🔄 Training Metrics 2025-08-30 15:49:20 - pico-train - INFO - ├── Loss: 5.7287 2025-08-30 15:49:20 - pico-train - INFO - ├── Learning Rate: 5.00e-06 2025-08-30 15:49:20 - pico-train - INFO - └── Inf/NaN count: 0 
2025-08-30 15:49:33 - pico-train - INFO - Step 88875 -- 🔄 Training Metrics 2025-08-30 15:49:33 - pico-train - INFO - ├── Loss: 5.7531 2025-08-30 15:49:33 - pico-train - INFO - ├── Learning Rate: 5.00e-06 2025-08-30 15:49:33 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 15:49:45 - pico-train - INFO - Step 88900 -- 🔄 Training Metrics 2025-08-30 15:49:45 - pico-train - INFO - ├── Loss: 5.6494 2025-08-30 15:49:45 - pico-train - INFO - ├── Learning Rate: 5.00e-06 2025-08-30 15:49:45 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 15:49:58 - pico-train - INFO - Step 88925 -- 🔄 Training Metrics 2025-08-30 15:49:58 - pico-train - INFO - ├── Loss: 5.7563 2025-08-30 15:49:58 - pico-train - INFO - ├── Learning Rate: 5.00e-06 2025-08-30 15:49:58 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 15:50:11 - pico-train - INFO - Step 88950 -- 🔄 Training Metrics 2025-08-30 15:50:11 - pico-train - INFO - ├── Loss: 5.6708 2025-08-30 15:50:11 - pico-train - INFO - ├── Learning Rate: 5.00e-06 2025-08-30 15:50:11 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 15:50:23 - pico-train - INFO - Step 88975 -- 🔄 Training Metrics 2025-08-30 15:50:23 - pico-train - INFO - ├── Loss: 5.7223 2025-08-30 15:50:23 - pico-train - INFO - ├── Learning Rate: 5.00e-06 2025-08-30 15:50:23 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 15:50:36 - pico-train - INFO - Step 89000 -- 💾 Saving Checkpoint 2025-08-30 15:52:39 - pico-train - INFO - Step 89000 -- 📊 Evaluation Results 2025-08-30 15:52:39 - pico-train - INFO - └── paloma: 3.2301587443631354e+32 2025-08-30 15:52:41 - pico-train - INFO - Step 89000 -- 🔄 Training Metrics 2025-08-30 15:52:41 - pico-train - INFO - ├── Loss: 5.5711 2025-08-30 15:52:41 - pico-train - INFO - ├── Learning Rate: 5.00e-06 2025-08-30 15:52:41 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 15:52:41 - pico-train - INFO - Step 89000 -- 📈 Saving Learning Dynamics 2025-08-30 15:52:56 - pico-train - INFO - Step 89025 -- 🔄 Training Metrics 
2025-08-30 15:52:56 - pico-train - INFO - ├── Loss: 5.7442 2025-08-30 15:52:56 - pico-train - INFO - ├── Learning Rate: 5.00e-06 2025-08-30 15:52:56 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 15:53:09 - pico-train - INFO - Step 89050 -- 🔄 Training Metrics 2025-08-30 15:53:09 - pico-train - INFO - ├── Loss: 5.7524 2025-08-30 15:53:09 - pico-train - INFO - ├── Learning Rate: 5.00e-06 2025-08-30 15:53:09 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 15:53:21 - pico-train - INFO - Step 89075 -- 🔄 Training Metrics 2025-08-30 15:53:21 - pico-train - INFO - ├── Loss: 5.8300 2025-08-30 15:53:21 - pico-train - INFO - ├── Learning Rate: 5.00e-06 2025-08-30 15:53:21 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 15:53:34 - pico-train - INFO - Step 89100 -- 🔄 Training Metrics 2025-08-30 15:53:34 - pico-train - INFO - ├── Loss: 5.7325 2025-08-30 15:53:34 - pico-train - INFO - ├── Learning Rate: 5.00e-06 2025-08-30 15:53:34 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 15:53:47 - pico-train - INFO - Step 89125 -- 🔄 Training Metrics 2025-08-30 15:53:47 - pico-train - INFO - ├── Loss: 5.8455 2025-08-30 15:53:47 - pico-train - INFO - ├── Learning Rate: 5.00e-06 2025-08-30 15:53:47 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 15:54:00 - pico-train - INFO - Step 89150 -- 🔄 Training Metrics 2025-08-30 15:54:00 - pico-train - INFO - ├── Loss: 5.7652 2025-08-30 15:54:00 - pico-train - INFO - ├── Learning Rate: 5.00e-06 2025-08-30 15:54:00 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 15:54:12 - pico-train - INFO - Step 89175 -- 🔄 Training Metrics 2025-08-30 15:54:12 - pico-train - INFO - ├── Loss: 5.7129 2025-08-30 15:54:12 - pico-train - INFO - ├── Learning Rate: 5.00e-06 2025-08-30 15:54:12 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 15:54:25 - pico-train - INFO - Step 89200 -- 🔄 Training Metrics 2025-08-30 15:54:25 - pico-train - INFO - ├── Loss: 5.7530 2025-08-30 15:54:25 - pico-train - INFO - ├── Learning Rate: 5.00e-06 
2025-08-30 15:54:25 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 15:54:38 - pico-train - INFO - Step 89225 -- 🔄 Training Metrics 2025-08-30 15:54:38 - pico-train - INFO - ├── Loss: 5.6867 2025-08-30 15:54:38 - pico-train - INFO - ├── Learning Rate: 5.00e-06 2025-08-30 15:54:38 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 15:54:51 - pico-train - INFO - Step 89250 -- 🔄 Training Metrics 2025-08-30 15:54:51 - pico-train - INFO - ├── Loss: 5.7202 2025-08-30 15:54:51 - pico-train - INFO - ├── Learning Rate: 5.00e-06 2025-08-30 15:54:51 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 15:55:03 - pico-train - INFO - Step 89275 -- 🔄 Training Metrics 2025-08-30 15:55:03 - pico-train - INFO - ├── Loss: 5.6995 2025-08-30 15:55:03 - pico-train - INFO - ├── Learning Rate: 5.00e-06 2025-08-30 15:55:03 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 15:55:16 - pico-train - INFO - Step 89300 -- 🔄 Training Metrics 2025-08-30 15:55:16 - pico-train - INFO - ├── Loss: 5.7101 2025-08-30 15:55:16 - pico-train - INFO - ├── Learning Rate: 5.00e-06 2025-08-30 15:55:16 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 15:55:29 - pico-train - INFO - Step 89325 -- 🔄 Training Metrics 2025-08-30 15:55:29 - pico-train - INFO - ├── Loss: 5.7965 2025-08-30 15:55:29 - pico-train - INFO - ├── Learning Rate: 5.00e-06 2025-08-30 15:55:29 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 15:55:41 - pico-train - INFO - Step 89350 -- 🔄 Training Metrics 2025-08-30 15:55:41 - pico-train - INFO - ├── Loss: 5.6982 2025-08-30 15:55:41 - pico-train - INFO - ├── Learning Rate: 5.00e-06 2025-08-30 15:55:41 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 15:55:54 - pico-train - INFO - Step 89375 -- 🔄 Training Metrics 2025-08-30 15:55:54 - pico-train - INFO - ├── Loss: 5.8355 2025-08-30 15:55:54 - pico-train - INFO - ├── Learning Rate: 5.00e-06 2025-08-30 15:55:54 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 15:56:07 - pico-train - INFO - Step 89400 -- 🔄 Training 
Metrics 2025-08-30 15:56:07 - pico-train - INFO - ├── Loss: 5.7425 2025-08-30 15:56:07 - pico-train - INFO - ├── Learning Rate: 5.00e-06 2025-08-30 15:56:07 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 15:56:19 - pico-train - INFO - Step 89425 -- 🔄 Training Metrics 2025-08-30 15:56:19 - pico-train - INFO - ├── Loss: 5.6805 2025-08-30 15:56:19 - pico-train - INFO - ├── Learning Rate: 5.00e-06 2025-08-30 15:56:19 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 15:56:32 - pico-train - INFO - Step 89450 -- 🔄 Training Metrics 2025-08-30 15:56:32 - pico-train - INFO - ├── Loss: 5.7606 2025-08-30 15:56:32 - pico-train - INFO - ├── Learning Rate: 5.00e-06 2025-08-30 15:56:32 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 15:56:44 - pico-train - INFO - Step 89475 -- 🔄 Training Metrics 2025-08-30 15:56:44 - pico-train - INFO - ├── Loss: 5.7342 2025-08-30 15:56:44 - pico-train - INFO - ├── Learning Rate: 5.00e-06 2025-08-30 15:56:44 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 15:56:57 - pico-train - INFO - Step 89500 -- 💾 Saving Checkpoint 2025-08-30 15:58:51 - pico-train - INFO - Step 89500 -- 📊 Evaluation Results 2025-08-30 15:58:51 - pico-train - INFO - └── paloma: 3.262009806286359e+32 2025-08-30 15:58:53 - pico-train - INFO - Step 89500 -- 🔄 Training Metrics 2025-08-30 15:58:53 - pico-train - INFO - ├── Loss: 5.7022 2025-08-30 15:58:53 - pico-train - INFO - ├── Learning Rate: 5.00e-06 2025-08-30 15:58:53 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 15:58:53 - pico-train - INFO - Step 89500 -- 📈 Saving Learning Dynamics 2025-08-30 15:59:08 - pico-train - INFO - Step 89525 -- 🔄 Training Metrics 2025-08-30 15:59:08 - pico-train - INFO - ├── Loss: 5.7403 2025-08-30 15:59:08 - pico-train - INFO - ├── Learning Rate: 5.00e-06 2025-08-30 15:59:08 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 15:59:21 - pico-train - INFO - Step 89550 -- 🔄 Training Metrics 2025-08-30 15:59:21 - pico-train - INFO - ├── Loss: 5.8105 2025-08-30 
15:59:21 - pico-train - INFO - ├── Learning Rate: 5.00e-06 2025-08-30 15:59:21 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 15:59:33 - pico-train - INFO - Step 89575 -- 🔄 Training Metrics 2025-08-30 15:59:33 - pico-train - INFO - ├── Loss: 5.7289 2025-08-30 15:59:33 - pico-train - INFO - ├── Learning Rate: 5.00e-06 2025-08-30 15:59:33 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 15:59:46 - pico-train - INFO - Step 89600 -- 🔄 Training Metrics 2025-08-30 15:59:46 - pico-train - INFO - ├── Loss: 5.7651 2025-08-30 15:59:46 - pico-train - INFO - ├── Learning Rate: 5.00e-06 2025-08-30 15:59:46 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 15:59:59 - pico-train - INFO - Step 89625 -- 🔄 Training Metrics 2025-08-30 15:59:59 - pico-train - INFO - ├── Loss: 5.7487 2025-08-30 15:59:59 - pico-train - INFO - ├── Learning Rate: 5.00e-06 2025-08-30 15:59:59 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 16:00:11 - pico-train - INFO - Step 89650 -- 🔄 Training Metrics 2025-08-30 16:00:11 - pico-train - INFO - ├── Loss: 5.6775 2025-08-30 16:00:11 - pico-train - INFO - ├── Learning Rate: 5.00e-06 2025-08-30 16:00:11 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 16:00:24 - pico-train - INFO - Step 89675 -- 🔄 Training Metrics 2025-08-30 16:00:24 - pico-train - INFO - ├── Loss: 5.7791 2025-08-30 16:00:24 - pico-train - INFO - ├── Learning Rate: 5.00e-06 2025-08-30 16:00:24 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 16:00:36 - pico-train - INFO - Step 89700 -- 🔄 Training Metrics 2025-08-30 16:00:36 - pico-train - INFO - ├── Loss: 5.7210 2025-08-30 16:00:36 - pico-train - INFO - ├── Learning Rate: 5.00e-06 2025-08-30 16:00:36 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 16:00:49 - pico-train - INFO - Step 89725 -- 🔄 Training Metrics 2025-08-30 16:00:49 - pico-train - INFO - ├── Loss: 5.7160 2025-08-30 16:00:49 - pico-train - INFO - ├── Learning Rate: 5.00e-06 2025-08-30 16:00:49 - pico-train - INFO - └── Inf/NaN count: 0 
2025-08-30 16:01:02 - pico-train - INFO - Step 89750 -- 🔄 Training Metrics 2025-08-30 16:01:02 - pico-train - INFO - ├── Loss: 5.7778 2025-08-30 16:01:02 - pico-train - INFO - ├── Learning Rate: 5.00e-06 2025-08-30 16:01:02 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 16:01:15 - pico-train - INFO - Step 89775 -- 🔄 Training Metrics 2025-08-30 16:01:15 - pico-train - INFO - ├── Loss: 5.7797 2025-08-30 16:01:15 - pico-train - INFO - ├── Learning Rate: 5.00e-06 2025-08-30 16:01:15 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 16:01:27 - pico-train - INFO - Step 89800 -- 🔄 Training Metrics 2025-08-30 16:01:27 - pico-train - INFO - ├── Loss: 5.8280 2025-08-30 16:01:27 - pico-train - INFO - ├── Learning Rate: 5.00e-06 2025-08-30 16:01:27 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 16:01:40 - pico-train - INFO - Step 89825 -- 🔄 Training Metrics 2025-08-30 16:01:40 - pico-train - INFO - ├── Loss: 5.6832 2025-08-30 16:01:40 - pico-train - INFO - ├── Learning Rate: 5.00e-06 2025-08-30 16:01:40 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 16:01:53 - pico-train - INFO - Step 89850 -- 🔄 Training Metrics 2025-08-30 16:01:53 - pico-train - INFO - ├── Loss: 5.8080 2025-08-30 16:01:53 - pico-train - INFO - ├── Learning Rate: 5.00e-06 2025-08-30 16:01:53 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 16:02:05 - pico-train - INFO - Step 89875 -- 🔄 Training Metrics 2025-08-30 16:02:05 - pico-train - INFO - ├── Loss: 5.7207 2025-08-30 16:02:05 - pico-train - INFO - ├── Learning Rate: 5.00e-06 2025-08-30 16:02:05 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 16:02:18 - pico-train - INFO - Step 89900 -- 🔄 Training Metrics 2025-08-30 16:02:18 - pico-train - INFO - ├── Loss: 5.7668 2025-08-30 16:02:18 - pico-train - INFO - ├── Learning Rate: 5.00e-06 2025-08-30 16:02:18 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 16:02:31 - pico-train - INFO - Step 89925 -- 🔄 Training Metrics 2025-08-30 16:02:31 - pico-train - INFO - ├── Loss: 
5.8558 2025-08-30 16:02:31 - pico-train - INFO - ├── Learning Rate: 5.00e-06 2025-08-30 16:02:31 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 16:02:43 - pico-train - INFO - Step 89950 -- 🔄 Training Metrics 2025-08-30 16:02:43 - pico-train - INFO - ├── Loss: 5.7308 2025-08-30 16:02:43 - pico-train - INFO - ├── Learning Rate: 5.00e-06 2025-08-30 16:02:43 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 16:02:56 - pico-train - INFO - Step 89975 -- 🔄 Training Metrics 2025-08-30 16:02:56 - pico-train - INFO - ├── Loss: 5.7213 2025-08-30 16:02:56 - pico-train - INFO - ├── Learning Rate: 5.00e-06 2025-08-30 16:02:56 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 16:03:08 - pico-train - INFO - Step 90000 -- 💾 Saving Checkpoint 2025-08-30 16:05:03 - pico-train - INFO - Step 90000 -- 📊 Evaluation Results 2025-08-30 16:05:03 - pico-train - INFO - └── paloma: 3.466294078952058e+32 2025-08-30 16:05:05 - pico-train - INFO - Step 90000 -- 🔄 Training Metrics 2025-08-30 16:05:05 - pico-train - INFO - ├── Loss: 5.8416 2025-08-30 16:05:05 - pico-train - INFO - ├── Learning Rate: 5.00e-06 2025-08-30 16:05:05 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 16:05:05 - pico-train - INFO - Step 90000 -- 📈 Saving Learning Dynamics 2025-08-30 16:05:20 - pico-train - INFO - Step 90025 -- 🔄 Training Metrics 2025-08-30 16:05:20 - pico-train - INFO - ├── Loss: 5.7941 2025-08-30 16:05:20 - pico-train - INFO - ├── Learning Rate: 5.00e-06 2025-08-30 16:05:20 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 16:05:32 - pico-train - INFO - Step 90050 -- 🔄 Training Metrics 2025-08-30 16:05:32 - pico-train - INFO - ├── Loss: 5.7694 2025-08-30 16:05:32 - pico-train - INFO - ├── Learning Rate: 5.00e-06 2025-08-30 16:05:32 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 16:05:45 - pico-train - INFO - Step 90075 -- 🔄 Training Metrics 2025-08-30 16:05:45 - pico-train - INFO - ├── Loss: 5.7361 2025-08-30 16:05:45 - pico-train - INFO - ├── Learning Rate: 5.00e-06 
2025-08-30 16:05:45 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 16:05:57 - pico-train - INFO - Step 90100 -- 🔄 Training Metrics 2025-08-30 16:05:57 - pico-train - INFO - ├── Loss: 5.7911 2025-08-30 16:05:57 - pico-train - INFO - ├── Learning Rate: 5.00e-06 2025-08-30 16:05:57 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 16:06:10 - pico-train - INFO - Step 90125 -- 🔄 Training Metrics 2025-08-30 16:06:10 - pico-train - INFO - ├── Loss: 5.6909 2025-08-30 16:06:10 - pico-train - INFO - ├── Learning Rate: 5.00e-06 2025-08-30 16:06:10 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 16:06:23 - pico-train - INFO - Step 90150 -- 🔄 Training Metrics 2025-08-30 16:06:23 - pico-train - INFO - ├── Loss: 5.6364 2025-08-30 16:06:23 - pico-train - INFO - ├── Learning Rate: 5.00e-06 2025-08-30 16:06:23 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 16:06:35 - pico-train - INFO - Step 90175 -- 🔄 Training Metrics 2025-08-30 16:06:35 - pico-train - INFO - ├── Loss: 5.7358 2025-08-30 16:06:35 - pico-train - INFO - ├── Learning Rate: 5.00e-06 2025-08-30 16:06:35 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 16:06:48 - pico-train - INFO - Step 90200 -- 🔄 Training Metrics 2025-08-30 16:06:48 - pico-train - INFO - ├── Loss: 5.7627 2025-08-30 16:06:48 - pico-train - INFO - ├── Learning Rate: 5.00e-06 2025-08-30 16:06:48 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 16:07:00 - pico-train - INFO - Step 90225 -- 🔄 Training Metrics 2025-08-30 16:07:00 - pico-train - INFO - ├── Loss: 5.7083 2025-08-30 16:07:00 - pico-train - INFO - ├── Learning Rate: 5.00e-06 2025-08-30 16:07:00 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 16:07:13 - pico-train - INFO - Step 90250 -- 🔄 Training Metrics 2025-08-30 16:07:13 - pico-train - INFO - ├── Loss: 5.7539 2025-08-30 16:07:13 - pico-train - INFO - ├── Learning Rate: 5.00e-06 2025-08-30 16:07:13 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 16:07:26 - pico-train - INFO - Step 90275 -- 🔄 Training 
Metrics 2025-08-30 16:07:26 - pico-train - INFO - ├── Loss: 5.7345 2025-08-30 16:07:26 - pico-train - INFO - ├── Learning Rate: 5.00e-06 2025-08-30 16:07:26 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 16:07:38 - pico-train - INFO - Step 90300 -- 🔄 Training Metrics 2025-08-30 16:07:38 - pico-train - INFO - ├── Loss: 5.7286 2025-08-30 16:07:38 - pico-train - INFO - ├── Learning Rate: 5.00e-06 2025-08-30 16:07:38 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 16:07:51 - pico-train - INFO - Step 90325 -- 🔄 Training Metrics 2025-08-30 16:07:51 - pico-train - INFO - ├── Loss: 5.7033 2025-08-30 16:07:51 - pico-train - INFO - ├── Learning Rate: 5.00e-06 2025-08-30 16:07:51 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 16:08:04 - pico-train - INFO - Step 90350 -- 🔄 Training Metrics 2025-08-30 16:08:04 - pico-train - INFO - ├── Loss: 5.7129 2025-08-30 16:08:04 - pico-train - INFO - ├── Learning Rate: 5.00e-06 2025-08-30 16:08:04 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 16:08:17 - pico-train - INFO - Step 90375 -- 🔄 Training Metrics 2025-08-30 16:08:17 - pico-train - INFO - ├── Loss: 5.7449 2025-08-30 16:08:17 - pico-train - INFO - ├── Learning Rate: 5.00e-06 2025-08-30 16:08:17 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 16:08:29 - pico-train - INFO - Step 90400 -- 🔄 Training Metrics 2025-08-30 16:08:29 - pico-train - INFO - ├── Loss: 5.6943 2025-08-30 16:08:29 - pico-train - INFO - ├── Learning Rate: 5.00e-06 2025-08-30 16:08:29 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 16:08:42 - pico-train - INFO - Step 90425 -- 🔄 Training Metrics 2025-08-30 16:08:42 - pico-train - INFO - ├── Loss: 5.7410 2025-08-30 16:08:42 - pico-train - INFO - ├── Learning Rate: 5.00e-06 2025-08-30 16:08:42 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 16:08:55 - pico-train - INFO - Step 90450 -- 🔄 Training Metrics 2025-08-30 16:08:55 - pico-train - INFO - ├── Loss: 5.8269 2025-08-30 16:08:55 - pico-train - INFO - ├── Learning Rate: 
5.00e-06 2025-08-30 16:08:55 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 16:09:07 - pico-train - INFO - Step 90475 -- 🔄 Training Metrics 2025-08-30 16:09:07 - pico-train - INFO - ├── Loss: 5.7443 2025-08-30 16:09:07 - pico-train - INFO - ├── Learning Rate: 5.00e-06 2025-08-30 16:09:07 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 16:09:19 - pico-train - INFO - Step 90500 -- 💾 Saving Checkpoint 2025-08-30 16:11:19 - pico-train - INFO - Step 90500 -- 📊 Evaluation Results 2025-08-30 16:11:19 - pico-train - INFO - └── paloma: 3.755080460256105e+32 2025-08-30 16:11:22 - pico-train - INFO - Step 90500 -- 🔄 Training Metrics 2025-08-30 16:11:22 - pico-train - INFO - ├── Loss: 5.7626 2025-08-30 16:11:22 - pico-train - INFO - ├── Learning Rate: 5.00e-06 2025-08-30 16:11:22 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 16:11:22 - pico-train - INFO - Step 90500 -- 📈 Saving Learning Dynamics 2025-08-30 16:11:37 - pico-train - INFO - Step 90525 -- 🔄 Training Metrics 2025-08-30 16:11:37 - pico-train - INFO - ├── Loss: 5.6503 2025-08-30 16:11:37 - pico-train - INFO - ├── Learning Rate: 5.00e-06 2025-08-30 16:11:37 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 16:11:49 - pico-train - INFO - Step 90550 -- 🔄 Training Metrics 2025-08-30 16:11:49 - pico-train - INFO - ├── Loss: 5.7273 2025-08-30 16:11:49 - pico-train - INFO - ├── Learning Rate: 5.00e-06 2025-08-30 16:11:49 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 16:12:02 - pico-train - INFO - Step 90575 -- 🔄 Training Metrics 2025-08-30 16:12:02 - pico-train - INFO - ├── Loss: 5.7007 2025-08-30 16:12:02 - pico-train - INFO - ├── Learning Rate: 5.00e-06 2025-08-30 16:12:02 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 16:12:15 - pico-train - INFO - Step 90600 -- 🔄 Training Metrics 2025-08-30 16:12:15 - pico-train - INFO - ├── Loss: 5.7027 2025-08-30 16:12:15 - pico-train - INFO - ├── Learning Rate: 5.00e-06 2025-08-30 16:12:15 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 
16:12:27 - pico-train - INFO - Step 90625 -- 🔄 Training Metrics 2025-08-30 16:12:27 - pico-train - INFO - ├── Loss: 5.7608 2025-08-30 16:12:27 - pico-train - INFO - ├── Learning Rate: 5.00e-06 2025-08-30 16:12:27 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 16:12:40 - pico-train - INFO - Step 90650 -- 🔄 Training Metrics 2025-08-30 16:12:40 - pico-train - INFO - ├── Loss: 5.7791 2025-08-30 16:12:40 - pico-train - INFO - ├── Learning Rate: 5.00e-06 2025-08-30 16:12:40 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 16:12:52 - pico-train - INFO - Step 90675 -- 🔄 Training Metrics 2025-08-30 16:12:52 - pico-train - INFO - ├── Loss: 5.8336 2025-08-30 16:12:52 - pico-train - INFO - ├── Learning Rate: 5.00e-06 2025-08-30 16:12:52 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 16:13:05 - pico-train - INFO - Step 90700 -- 🔄 Training Metrics 2025-08-30 16:13:05 - pico-train - INFO - ├── Loss: 5.7434 2025-08-30 16:13:05 - pico-train - INFO - ├── Learning Rate: 5.00e-06 2025-08-30 16:13:05 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 16:13:18 - pico-train - INFO - Step 90725 -- 🔄 Training Metrics 2025-08-30 16:13:18 - pico-train - INFO - ├── Loss: 5.7888 2025-08-30 16:13:18 - pico-train - INFO - ├── Learning Rate: 5.00e-06 2025-08-30 16:13:18 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 16:13:30 - pico-train - INFO - Step 90750 -- 🔄 Training Metrics 2025-08-30 16:13:30 - pico-train - INFO - ├── Loss: 5.6800 2025-08-30 16:13:30 - pico-train - INFO - ├── Learning Rate: 5.00e-06 2025-08-30 16:13:30 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 16:13:43 - pico-train - INFO - Step 90775 -- 🔄 Training Metrics 2025-08-30 16:13:43 - pico-train - INFO - ├── Loss: 5.7706 2025-08-30 16:13:43 - pico-train - INFO - ├── Learning Rate: 5.00e-06 2025-08-30 16:13:43 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 16:13:55 - pico-train - INFO - Step 90800 -- 🔄 Training Metrics 2025-08-30 16:13:55 - pico-train - INFO - ├── Loss: 5.6717 
2025-08-30 16:13:55 - pico-train - INFO - ├── Learning Rate: 5.00e-06 2025-08-30 16:13:55 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 16:14:08 - pico-train - INFO - Step 90825 -- 🔄 Training Metrics 2025-08-30 16:14:08 - pico-train - INFO - ├── Loss: 5.8099 2025-08-30 16:14:08 - pico-train - INFO - ├── Learning Rate: 5.00e-06 2025-08-30 16:14:08 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 16:14:21 - pico-train - INFO - Step 90850 -- 🔄 Training Metrics 2025-08-30 16:14:21 - pico-train - INFO - ├── Loss: 5.7191 2025-08-30 16:14:21 - pico-train - INFO - ├── Learning Rate: 5.00e-06 2025-08-30 16:14:21 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 16:14:34 - pico-train - INFO - Step 90875 -- 🔄 Training Metrics 2025-08-30 16:14:34 - pico-train - INFO - ├── Loss: 5.7016 2025-08-30 16:14:34 - pico-train - INFO - ├── Learning Rate: 5.00e-06 2025-08-30 16:14:34 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 16:14:46 - pico-train - INFO - Step 90900 -- 🔄 Training Metrics 2025-08-30 16:14:46 - pico-train - INFO - ├── Loss: 5.7293 2025-08-30 16:14:46 - pico-train - INFO - ├── Learning Rate: 5.00e-06 2025-08-30 16:14:46 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 16:14:59 - pico-train - INFO - Step 90925 -- 🔄 Training Metrics 2025-08-30 16:14:59 - pico-train - INFO - ├── Loss: 5.6757 2025-08-30 16:14:59 - pico-train - INFO - ├── Learning Rate: 5.00e-06 2025-08-30 16:14:59 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 16:15:11 - pico-train - INFO - Step 90950 -- 🔄 Training Metrics 2025-08-30 16:15:11 - pico-train - INFO - ├── Loss: 5.6731 2025-08-30 16:15:11 - pico-train - INFO - ├── Learning Rate: 5.00e-06 2025-08-30 16:15:11 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 16:15:24 - pico-train - INFO - Step 90975 -- 🔄 Training Metrics 2025-08-30 16:15:24 - pico-train - INFO - ├── Loss: 5.8485 2025-08-30 16:15:24 - pico-train - INFO - ├── Learning Rate: 5.00e-06 2025-08-30 16:15:24 - pico-train - INFO - └── Inf/NaN count: 
0 2025-08-30 16:15:36 - pico-train - INFO - Step 91000 -- 💾 Saving Checkpoint 2025-08-30 16:17:35 - pico-train - INFO - Step 91000 -- 📊 Evaluation Results 2025-08-30 16:17:35 - pico-train - INFO - └── paloma: 3.925399791957211e+32 2025-08-30 16:17:37 - pico-train - INFO - Step 91000 -- 🔄 Training Metrics 2025-08-30 16:17:37 - pico-train - INFO - ├── Loss: 5.8338 2025-08-30 16:17:37 - pico-train - INFO - ├── Learning Rate: 5.00e-06 2025-08-30 16:17:37 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 16:17:37 - pico-train - INFO - Step 91000 -- 📈 Saving Learning Dynamics 2025-08-30 16:17:52 - pico-train - INFO - Step 91025 -- 🔄 Training Metrics 2025-08-30 16:17:52 - pico-train - INFO - ├── Loss: 5.7817 2025-08-30 16:17:52 - pico-train - INFO - ├── Learning Rate: 5.00e-06 2025-08-30 16:17:52 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 16:18:05 - pico-train - INFO - Step 91050 -- 🔄 Training Metrics 2025-08-30 16:18:05 - pico-train - INFO - ├── Loss: 5.7219 2025-08-30 16:18:05 - pico-train - INFO - ├── Learning Rate: 5.00e-06 2025-08-30 16:18:05 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 16:18:18 - pico-train - INFO - Step 91075 -- 🔄 Training Metrics 2025-08-30 16:18:18 - pico-train - INFO - ├── Loss: 5.6680 2025-08-30 16:18:18 - pico-train - INFO - ├── Learning Rate: 5.00e-06 2025-08-30 16:18:18 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 16:18:30 - pico-train - INFO - Step 91100 -- 🔄 Training Metrics 2025-08-30 16:18:30 - pico-train - INFO - ├── Loss: 5.6805 2025-08-30 16:18:30 - pico-train - INFO - ├── Learning Rate: 5.00e-06 2025-08-30 16:18:30 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 16:18:43 - pico-train - INFO - Step 91125 -- 🔄 Training Metrics 2025-08-30 16:18:43 - pico-train - INFO - ├── Loss: 5.7380 2025-08-30 16:18:43 - pico-train - INFO - ├── Learning Rate: 5.00e-06 2025-08-30 16:18:43 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 16:18:55 - pico-train - INFO - Step 91150 -- 🔄 Training Metrics 
2025-08-30 16:18:55 - pico-train - INFO - ├── Loss: 5.6620 2025-08-30 16:18:55 - pico-train - INFO - ├── Learning Rate: 5.00e-06 2025-08-30 16:18:55 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 16:19:08 - pico-train - INFO - Step 91175 -- 🔄 Training Metrics 2025-08-30 16:19:08 - pico-train - INFO - ├── Loss: 5.7634 2025-08-30 16:19:08 - pico-train - INFO - ├── Learning Rate: 5.00e-06 2025-08-30 16:19:08 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 16:19:21 - pico-train - INFO - Step 91200 -- 🔄 Training Metrics 2025-08-30 16:19:21 - pico-train - INFO - ├── Loss: 5.8396 2025-08-30 16:19:21 - pico-train - INFO - ├── Learning Rate: 5.00e-06 2025-08-30 16:19:21 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 16:19:33 - pico-train - INFO - Step 91225 -- 🔄 Training Metrics 2025-08-30 16:19:33 - pico-train - INFO - ├── Loss: 5.7409 2025-08-30 16:19:33 - pico-train - INFO - ├── Learning Rate: 5.00e-06 2025-08-30 16:19:33 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 16:19:46 - pico-train - INFO - Step 91250 -- 🔄 Training Metrics 2025-08-30 16:19:46 - pico-train - INFO - ├── Loss: 5.7281 2025-08-30 16:19:46 - pico-train - INFO - ├── Learning Rate: 5.00e-06 2025-08-30 16:19:46 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 16:19:58 - pico-train - INFO - Step 91275 -- 🔄 Training Metrics 2025-08-30 16:19:58 - pico-train - INFO - ├── Loss: 5.7615 2025-08-30 16:19:58 - pico-train - INFO - ├── Learning Rate: 5.00e-06 2025-08-30 16:19:58 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 16:20:11 - pico-train - INFO - Step 91300 -- 🔄 Training Metrics 2025-08-30 16:20:11 - pico-train - INFO - ├── Loss: 5.7345 2025-08-30 16:20:11 - pico-train - INFO - ├── Learning Rate: 5.00e-06 2025-08-30 16:20:11 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 16:20:24 - pico-train - INFO - Step 91325 -- 🔄 Training Metrics 2025-08-30 16:20:24 - pico-train - INFO - ├── Loss: 5.7406 2025-08-30 16:20:24 - pico-train - INFO - ├── Learning Rate: 5.00e-06 
2025-08-30 16:20:24 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 16:20:36 - pico-train - INFO - Step 91350 -- 🔄 Training Metrics 2025-08-30 16:20:36 - pico-train - INFO - ├── Loss: 5.7448 2025-08-30 16:20:36 - pico-train - INFO - ├── Learning Rate: 5.00e-06 2025-08-30 16:20:36 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 16:20:50 - pico-train - INFO - Step 91375 -- 🔄 Training Metrics 2025-08-30 16:20:50 - pico-train - INFO - ├── Loss: 5.7582 2025-08-30 16:20:50 - pico-train - INFO - ├── Learning Rate: 5.00e-06 2025-08-30 16:20:50 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 16:21:02 - pico-train - INFO - Step 91400 -- 🔄 Training Metrics 2025-08-30 16:21:02 - pico-train - INFO - ├── Loss: 5.6881 2025-08-30 16:21:02 - pico-train - INFO - ├── Learning Rate: 5.00e-06 2025-08-30 16:21:02 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 16:21:15 - pico-train - INFO - Step 91425 -- 🔄 Training Metrics 2025-08-30 16:21:15 - pico-train - INFO - ├── Loss: 5.7450 2025-08-30 16:21:15 - pico-train - INFO - ├── Learning Rate: 5.00e-06 2025-08-30 16:21:15 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 16:21:27 - pico-train - INFO - Step 91450 -- 🔄 Training Metrics 2025-08-30 16:21:27 - pico-train - INFO - ├── Loss: 5.7043 2025-08-30 16:21:27 - pico-train - INFO - ├── Learning Rate: 5.00e-06 2025-08-30 16:21:27 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 16:21:40 - pico-train - INFO - Step 91475 -- 🔄 Training Metrics 2025-08-30 16:21:40 - pico-train - INFO - ├── Loss: 5.7281 2025-08-30 16:21:40 - pico-train - INFO - ├── Learning Rate: 5.00e-06 2025-08-30 16:21:40 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 16:21:52 - pico-train - INFO - Step 91500 -- 💾 Saving Checkpoint 2025-08-30 16:23:55 - pico-train - INFO - Step 91500 -- 📊 Evaluation Results 2025-08-30 16:23:55 - pico-train - INFO - └── paloma: 3.7840765863718955e+32 2025-08-30 16:23:58 - pico-train - INFO - Step 91500 -- 🔄 Training Metrics 2025-08-30 16:23:58 - 
pico-train - INFO - ├── Loss: 5.7353 2025-08-30 16:23:58 - pico-train - INFO - ├── Learning Rate: 5.00e-06 2025-08-30 16:23:58 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 16:23:58 - pico-train - INFO - Step 91500 -- 📈 Saving Learning Dynamics 2025-08-30 16:24:13 - pico-train - INFO - Step 91525 -- 🔄 Training Metrics 2025-08-30 16:24:13 - pico-train - INFO - ├── Loss: 5.7002 2025-08-30 16:24:13 - pico-train - INFO - ├── Learning Rate: 5.00e-06 2025-08-30 16:24:13 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 16:24:25 - pico-train - INFO - Step 91550 -- 🔄 Training Metrics 2025-08-30 16:24:25 - pico-train - INFO - ├── Loss: 5.7167 2025-08-30 16:24:25 - pico-train - INFO - ├── Learning Rate: 5.00e-06 2025-08-30 16:24:25 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 16:24:38 - pico-train - INFO - Step 91575 -- 🔄 Training Metrics 2025-08-30 16:24:38 - pico-train - INFO - ├── Loss: 5.6997 2025-08-30 16:24:38 - pico-train - INFO - ├── Learning Rate: 5.00e-06 2025-08-30 16:24:38 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 16:24:50 - pico-train - INFO - Step 91600 -- 🔄 Training Metrics 2025-08-30 16:24:50 - pico-train - INFO - ├── Loss: 5.7332 2025-08-30 16:24:50 - pico-train - INFO - ├── Learning Rate: 5.00e-06 2025-08-30 16:24:50 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 16:25:03 - pico-train - INFO - Step 91625 -- 🔄 Training Metrics 2025-08-30 16:25:03 - pico-train - INFO - ├── Loss: 5.8377 2025-08-30 16:25:03 - pico-train - INFO - ├── Learning Rate: 5.00e-06 2025-08-30 16:25:03 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 16:25:16 - pico-train - INFO - Step 91650 -- 🔄 Training Metrics 2025-08-30 16:25:16 - pico-train - INFO - ├── Loss: 5.7236 2025-08-30 16:25:16 - pico-train - INFO - ├── Learning Rate: 5.00e-06 2025-08-30 16:25:16 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 16:25:28 - pico-train - INFO - Step 91675 -- 🔄 Training Metrics 2025-08-30 16:25:28 - pico-train - INFO - ├── Loss: 5.7628 2025-08-30 
16:25:28 - pico-train - INFO - ├── Learning Rate: 5.00e-06 2025-08-30 16:25:28 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 16:25:41 - pico-train - INFO - Step 91700 -- 🔄 Training Metrics 2025-08-30 16:25:41 - pico-train - INFO - ├── Loss: 5.7237 2025-08-30 16:25:41 - pico-train - INFO - ├── Learning Rate: 5.00e-06 2025-08-30 16:25:41 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 16:25:54 - pico-train - INFO - Step 91725 -- 🔄 Training Metrics 2025-08-30 16:25:54 - pico-train - INFO - ├── Loss: 5.7507 2025-08-30 16:25:54 - pico-train - INFO - ├── Learning Rate: 5.00e-06 2025-08-30 16:25:54 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 16:26:07 - pico-train - INFO - Step 91750 -- 🔄 Training Metrics 2025-08-30 16:26:07 - pico-train - INFO - ├── Loss: 5.7198 2025-08-30 16:26:07 - pico-train - INFO - ├── Learning Rate: 5.00e-06 2025-08-30 16:26:07 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 16:26:19 - pico-train - INFO - Step 91775 -- 🔄 Training Metrics 2025-08-30 16:26:19 - pico-train - INFO - ├── Loss: 5.7375 2025-08-30 16:26:19 - pico-train - INFO - ├── Learning Rate: 5.00e-06 2025-08-30 16:26:19 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 16:26:32 - pico-train - INFO - Step 91800 -- 🔄 Training Metrics 2025-08-30 16:26:32 - pico-train - INFO - ├── Loss: 5.7619 2025-08-30 16:26:32 - pico-train - INFO - ├── Learning Rate: 5.00e-06 2025-08-30 16:26:32 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 16:26:44 - pico-train - INFO - Step 91825 -- 🔄 Training Metrics 2025-08-30 16:26:44 - pico-train - INFO - ├── Loss: 5.7278 2025-08-30 16:26:44 - pico-train - INFO - ├── Learning Rate: 5.00e-06 2025-08-30 16:26:44 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 16:26:57 - pico-train - INFO - Step 91850 -- 🔄 Training Metrics 2025-08-30 16:26:57 - pico-train - INFO - ├── Loss: 5.7585 2025-08-30 16:26:57 - pico-train - INFO - ├── Learning Rate: 5.00e-06 2025-08-30 16:26:57 - pico-train - INFO - └── Inf/NaN count: 0 
2025-08-30 16:27:10 - pico-train - INFO - Step 91875 -- 🔄 Training Metrics 2025-08-30 16:27:10 - pico-train - INFO - ├── Loss: 5.7255 2025-08-30 16:27:10 - pico-train - INFO - ├── Learning Rate: 5.00e-06 2025-08-30 16:27:10 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 16:27:23 - pico-train - INFO - Step 91900 -- 🔄 Training Metrics 2025-08-30 16:27:23 - pico-train - INFO - ├── Loss: 5.7973 2025-08-30 16:27:23 - pico-train - INFO - ├── Learning Rate: 5.00e-06 2025-08-30 16:27:23 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 16:27:35 - pico-train - INFO - Step 91925 -- 🔄 Training Metrics 2025-08-30 16:27:35 - pico-train - INFO - ├── Loss: 5.6683 2025-08-30 16:27:35 - pico-train - INFO - ├── Learning Rate: 5.00e-06 2025-08-30 16:27:35 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 16:27:48 - pico-train - INFO - Step 91950 -- 🔄 Training Metrics 2025-08-30 16:27:48 - pico-train - INFO - ├── Loss: 5.7674 2025-08-30 16:27:48 - pico-train - INFO - ├── Learning Rate: 5.00e-06 2025-08-30 16:27:48 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 16:28:01 - pico-train - INFO - Step 91975 -- 🔄 Training Metrics 2025-08-30 16:28:01 - pico-train - INFO - ├── Loss: 5.7353 2025-08-30 16:28:01 - pico-train - INFO - ├── Learning Rate: 5.00e-06 2025-08-30 16:28:01 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 16:28:13 - pico-train - INFO - Step 92000 -- 💾 Saving Checkpoint 2025-08-30 16:30:23 - pico-train - INFO - Step 92000 -- 📊 Evaluation Results 2025-08-30 16:30:23 - pico-train - INFO - └── paloma: 3.702459594218492e+32 2025-08-30 16:30:26 - pico-train - INFO - Step 92000 -- 🔄 Training Metrics 2025-08-30 16:30:26 - pico-train - INFO - ├── Loss: 5.6962 2025-08-30 16:30:26 - pico-train - INFO - ├── Learning Rate: 5.00e-06 2025-08-30 16:30:26 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 16:30:26 - pico-train - INFO - Step 92000 -- 📈 Saving Learning Dynamics 2025-08-30 16:30:41 - pico-train - INFO - Step 92025 -- 🔄 Training Metrics 2025-08-30 
16:30:41 - pico-train - INFO - ├── Loss: 5.6971 2025-08-30 16:30:41 - pico-train - INFO - ├── Learning Rate: 5.00e-06 2025-08-30 16:30:41 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 16:30:54 - pico-train - INFO - Step 92050 -- 🔄 Training Metrics 2025-08-30 16:30:54 - pico-train - INFO - ├── Loss: 5.7129 2025-08-30 16:30:54 - pico-train - INFO - ├── Learning Rate: 5.00e-06 2025-08-30 16:30:54 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 16:31:06 - pico-train - INFO - Step 92075 -- 🔄 Training Metrics 2025-08-30 16:31:06 - pico-train - INFO - ├── Loss: 5.7390 2025-08-30 16:31:06 - pico-train - INFO - ├── Learning Rate: 5.00e-06 2025-08-30 16:31:06 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 16:31:19 - pico-train - INFO - Step 92100 -- 🔄 Training Metrics 2025-08-30 16:31:19 - pico-train - INFO - ├── Loss: 5.8188 2025-08-30 16:31:19 - pico-train - INFO - ├── Learning Rate: 5.00e-06 2025-08-30 16:31:19 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 16:31:32 - pico-train - INFO - Step 92125 -- 🔄 Training Metrics 2025-08-30 16:31:32 - pico-train - INFO - ├── Loss: 5.6597 2025-08-30 16:31:32 - pico-train - INFO - ├── Learning Rate: 5.00e-06 2025-08-30 16:31:32 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 16:31:45 - pico-train - INFO - Step 92150 -- 🔄 Training Metrics 2025-08-30 16:31:45 - pico-train - INFO - ├── Loss: 5.7480 2025-08-30 16:31:45 - pico-train - INFO - ├── Learning Rate: 5.00e-06 2025-08-30 16:31:45 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 16:31:58 - pico-train - INFO - Step 92175 -- 🔄 Training Metrics 2025-08-30 16:31:58 - pico-train - INFO - ├── Loss: 5.6001 2025-08-30 16:31:58 - pico-train - INFO - ├── Learning Rate: 5.00e-06 2025-08-30 16:31:58 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 16:32:10 - pico-train - INFO - Step 92200 -- 🔄 Training Metrics 2025-08-30 16:32:10 - pico-train - INFO - ├── Loss: 5.7151 2025-08-30 16:32:10 - pico-train - INFO - ├── Learning Rate: 5.00e-06 2025-08-30 
16:32:10 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 16:32:23 - pico-train - INFO - Step 92225 -- 🔄 Training Metrics 2025-08-30 16:32:23 - pico-train - INFO - ├── Loss: 5.7429 2025-08-30 16:32:23 - pico-train - INFO - ├── Learning Rate: 5.00e-06 2025-08-30 16:32:23 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 16:32:36 - pico-train - INFO - Step 92250 -- 🔄 Training Metrics 2025-08-30 16:32:36 - pico-train - INFO - ├── Loss: 5.7581 2025-08-30 16:32:36 - pico-train - INFO - ├── Learning Rate: 5.00e-06 2025-08-30 16:32:36 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 16:32:48 - pico-train - INFO - Step 92275 -- 🔄 Training Metrics 2025-08-30 16:32:48 - pico-train - INFO - ├── Loss: 5.7439 2025-08-30 16:32:48 - pico-train - INFO - ├── Learning Rate: 5.00e-06 2025-08-30 16:32:48 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 16:33:01 - pico-train - INFO - Step 92300 -- 🔄 Training Metrics 2025-08-30 16:33:01 - pico-train - INFO - ├── Loss: 5.6744 2025-08-30 16:33:01 - pico-train - INFO - ├── Learning Rate: 5.00e-06 2025-08-30 16:33:01 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 16:33:14 - pico-train - INFO - Step 92325 -- 🔄 Training Metrics 2025-08-30 16:33:14 - pico-train - INFO - ├── Loss: 5.6935 2025-08-30 16:33:14 - pico-train - INFO - ├── Learning Rate: 5.00e-06 2025-08-30 16:33:14 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 16:33:27 - pico-train - INFO - Step 92350 -- 🔄 Training Metrics 2025-08-30 16:33:27 - pico-train - INFO - ├── Loss: 5.7779 2025-08-30 16:33:27 - pico-train - INFO - ├── Learning Rate: 5.00e-06 2025-08-30 16:33:27 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 16:33:40 - pico-train - INFO - Step 92375 -- 🔄 Training Metrics 2025-08-30 16:33:40 - pico-train - INFO - ├── Loss: 5.8165 2025-08-30 16:33:40 - pico-train - INFO - ├── Learning Rate: 5.00e-06 2025-08-30 16:33:40 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 16:33:52 - pico-train - INFO - Step 92400 -- 🔄 Training Metrics 
2025-08-30 16:33:52 - pico-train - INFO - ├── Loss: 5.8294 2025-08-30 16:33:52 - pico-train - INFO - ├── Learning Rate: 5.00e-06 2025-08-30 16:33:52 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 16:34:05 - pico-train - INFO - Step 92425 -- 🔄 Training Metrics 2025-08-30 16:34:05 - pico-train - INFO - ├── Loss: 5.8325 2025-08-30 16:34:05 - pico-train - INFO - ├── Learning Rate: 5.00e-06 2025-08-30 16:34:05 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 16:34:18 - pico-train - INFO - Step 92450 -- 🔄 Training Metrics 2025-08-30 16:34:18 - pico-train - INFO - ├── Loss: 5.7155 2025-08-30 16:34:18 - pico-train - INFO - ├── Learning Rate: 5.00e-06 2025-08-30 16:34:18 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 16:34:30 - pico-train - INFO - Step 92475 -- 🔄 Training Metrics 2025-08-30 16:34:30 - pico-train - INFO - ├── Loss: 5.7248 2025-08-30 16:34:30 - pico-train - INFO - ├── Learning Rate: 5.00e-06 2025-08-30 16:34:30 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 16:34:42 - pico-train - INFO - Step 92500 -- 💾 Saving Checkpoint 2025-08-30 16:36:42 - pico-train - INFO - Step 92500 -- 📊 Evaluation Results 2025-08-30 16:36:42 - pico-train - INFO - └── paloma: 3.8474250562364964e+32 2025-08-30 16:36:47 - pico-train - INFO - Step 92500 -- 🔄 Training Metrics 2025-08-30 16:36:47 - pico-train - INFO - ├── Loss: 5.7181 2025-08-30 16:36:47 - pico-train - INFO - ├── Learning Rate: 5.00e-06 2025-08-30 16:36:47 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 16:36:47 - pico-train - INFO - Step 92500 -- 📈 Saving Learning Dynamics 2025-08-30 16:37:02 - pico-train - INFO - Step 92525 -- 🔄 Training Metrics 2025-08-30 16:37:02 - pico-train - INFO - ├── Loss: 5.7445 2025-08-30 16:37:02 - pico-train - INFO - ├── Learning Rate: 5.00e-06 2025-08-30 16:37:02 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 16:37:15 - pico-train - INFO - Step 92550 -- 🔄 Training Metrics 2025-08-30 16:37:15 - pico-train - INFO - ├── Loss: 5.7415 2025-08-30 16:37:15 - 
pico-train - INFO - ├── Learning Rate: 5.00e-06 2025-08-30 16:37:15 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 16:37:28 - pico-train - INFO - Step 92575 -- 🔄 Training Metrics 2025-08-30 16:37:28 - pico-train - INFO - ├── Loss: 5.7331 2025-08-30 16:37:28 - pico-train - INFO - ├── Learning Rate: 5.00e-06 2025-08-30 16:37:28 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 16:37:40 - pico-train - INFO - Step 92600 -- 🔄 Training Metrics 2025-08-30 16:37:40 - pico-train - INFO - ├── Loss: 5.7361 2025-08-30 16:37:40 - pico-train - INFO - ├── Learning Rate: 5.00e-06 2025-08-30 16:37:40 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 16:37:53 - pico-train - INFO - Step 92625 -- 🔄 Training Metrics 2025-08-30 16:37:53 - pico-train - INFO - ├── Loss: 5.7742 2025-08-30 16:37:53 - pico-train - INFO - ├── Learning Rate: 5.00e-06 2025-08-30 16:37:53 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 16:38:06 - pico-train - INFO - Step 92650 -- 🔄 Training Metrics 2025-08-30 16:38:06 - pico-train - INFO - ├── Loss: 5.7316 2025-08-30 16:38:06 - pico-train - INFO - ├── Learning Rate: 5.00e-06 2025-08-30 16:38:06 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 16:38:18 - pico-train - INFO - Step 92675 -- 🔄 Training Metrics 2025-08-30 16:38:18 - pico-train - INFO - ├── Loss: 5.7957 2025-08-30 16:38:18 - pico-train - INFO - ├── Learning Rate: 5.00e-06 2025-08-30 16:38:18 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 16:38:31 - pico-train - INFO - Step 92700 -- 🔄 Training Metrics 2025-08-30 16:38:31 - pico-train - INFO - ├── Loss: 5.7575 2025-08-30 16:38:31 - pico-train - INFO - ├── Learning Rate: 5.00e-06 2025-08-30 16:38:31 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 16:38:43 - pico-train - INFO - Step 92725 -- 🔄 Training Metrics 2025-08-30 16:38:43 - pico-train - INFO - ├── Loss: 5.7993 2025-08-30 16:38:43 - pico-train - INFO - ├── Learning Rate: 5.00e-06 2025-08-30 16:38:43 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 16:38:56 
- pico-train - INFO - Step 92750 -- 🔄 Training Metrics 2025-08-30 16:38:56 - pico-train - INFO - ├── Loss: 5.6859 2025-08-30 16:38:56 - pico-train - INFO - ├── Learning Rate: 5.00e-06 2025-08-30 16:38:56 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 16:39:09 - pico-train - INFO - Step 92775 -- 🔄 Training Metrics 2025-08-30 16:39:09 - pico-train - INFO - ├── Loss: 5.7527 2025-08-30 16:39:09 - pico-train - INFO - ├── Learning Rate: 5.00e-06 2025-08-30 16:39:09 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 16:39:21 - pico-train - INFO - Step 92800 -- 🔄 Training Metrics 2025-08-30 16:39:21 - pico-train - INFO - ├── Loss: 5.6678 2025-08-30 16:39:21 - pico-train - INFO - ├── Learning Rate: 5.00e-06 2025-08-30 16:39:21 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 16:39:34 - pico-train - INFO - Step 92825 -- 🔄 Training Metrics 2025-08-30 16:39:34 - pico-train - INFO - ├── Loss: 5.8365 2025-08-30 16:39:34 - pico-train - INFO - ├── Learning Rate: 5.00e-06 2025-08-30 16:39:34 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 16:39:46 - pico-train - INFO - Step 92850 -- 🔄 Training Metrics 2025-08-30 16:39:46 - pico-train - INFO - ├── Loss: 5.7141 2025-08-30 16:39:46 - pico-train - INFO - ├── Learning Rate: 5.00e-06 2025-08-30 16:39:46 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 16:39:59 - pico-train - INFO - Step 92875 -- 🔄 Training Metrics 2025-08-30 16:39:59 - pico-train - INFO - ├── Loss: 5.7633 2025-08-30 16:39:59 - pico-train - INFO - ├── Learning Rate: 5.00e-06 2025-08-30 16:39:59 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 16:40:12 - pico-train - INFO - Step 92900 -- 🔄 Training Metrics 2025-08-30 16:40:12 - pico-train - INFO - ├── Loss: 5.7410 2025-08-30 16:40:12 - pico-train - INFO - ├── Learning Rate: 5.00e-06 2025-08-30 16:40:12 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 16:40:24 - pico-train - INFO - Step 92925 -- 🔄 Training Metrics 2025-08-30 16:40:24 - pico-train - INFO - ├── Loss: 5.7554 2025-08-30 
16:40:24 - pico-train - INFO - ├── Learning Rate: 5.00e-06 2025-08-30 16:40:24 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 16:40:37 - pico-train - INFO - Step 92950 -- 🔄 Training Metrics 2025-08-30 16:40:37 - pico-train - INFO - ├── Loss: 5.7599 2025-08-30 16:40:37 - pico-train - INFO - ├── Learning Rate: 5.00e-06 2025-08-30 16:40:37 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 16:40:49 - pico-train - INFO - Step 92975 -- 🔄 Training Metrics 2025-08-30 16:40:49 - pico-train - INFO - ├── Loss: 5.7130 2025-08-30 16:40:49 - pico-train - INFO - ├── Learning Rate: 5.00e-06 2025-08-30 16:40:49 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 16:41:02 - pico-train - INFO - Step 93000 -- 💾 Saving Checkpoint 2025-08-30 16:43:08 - pico-train - INFO - Step 93000 -- 📊 Evaluation Results 2025-08-30 16:43:08 - pico-train - INFO - └── paloma: 4.0048357156400644e+32 2025-08-30 16:43:10 - pico-train - INFO - Step 93000 -- 🔄 Training Metrics 2025-08-30 16:43:10 - pico-train - INFO - ├── Loss: 5.7274 2025-08-30 16:43:10 - pico-train - INFO - ├── Learning Rate: 5.00e-06 2025-08-30 16:43:10 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 16:43:10 - pico-train - INFO - Step 93000 -- 📈 Saving Learning Dynamics 2025-08-30 16:43:24 - pico-train - INFO - Step 93025 -- 🔄 Training Metrics 2025-08-30 16:43:24 - pico-train - INFO - ├── Loss: 5.7410 2025-08-30 16:43:24 - pico-train - INFO - ├── Learning Rate: 5.00e-06 2025-08-30 16:43:24 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 16:43:37 - pico-train - INFO - Step 93050 -- 🔄 Training Metrics 2025-08-30 16:43:37 - pico-train - INFO - ├── Loss: 5.7398 2025-08-30 16:43:37 - pico-train - INFO - ├── Learning Rate: 5.00e-06 2025-08-30 16:43:37 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 16:43:50 - pico-train - INFO - Step 93075 -- 🔄 Training Metrics 2025-08-30 16:43:50 - pico-train - INFO - ├── Loss: 5.7029 2025-08-30 16:43:50 - pico-train - INFO - ├── Learning Rate: 5.00e-06 2025-08-30 16:43:50 - 
pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 16:44:02 - pico-train - INFO - Step 93100 -- 🔄 Training Metrics 2025-08-30 16:44:02 - pico-train - INFO - ├── Loss: 5.7073 2025-08-30 16:44:02 - pico-train - INFO - ├── Learning Rate: 5.00e-06 2025-08-30 16:44:02 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 16:44:15 - pico-train - INFO - Step 93125 -- 🔄 Training Metrics 2025-08-30 16:44:15 - pico-train - INFO - ├── Loss: 5.7430 2025-08-30 16:44:15 - pico-train - INFO - ├── Learning Rate: 5.00e-06 2025-08-30 16:44:15 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 16:44:27 - pico-train - INFO - Step 93150 -- 🔄 Training Metrics 2025-08-30 16:44:27 - pico-train - INFO - ├── Loss: 5.6730 2025-08-30 16:44:27 - pico-train - INFO - ├── Learning Rate: 5.00e-06 2025-08-30 16:44:27 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 16:44:40 - pico-train - INFO - Step 93175 -- 🔄 Training Metrics 2025-08-30 16:44:40 - pico-train - INFO - ├── Loss: 5.7652 2025-08-30 16:44:40 - pico-train - INFO - ├── Learning Rate: 5.00e-06 2025-08-30 16:44:40 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 16:44:53 - pico-train - INFO - Step 93200 -- 🔄 Training Metrics 2025-08-30 16:44:53 - pico-train - INFO - ├── Loss: 5.7162 2025-08-30 16:44:53 - pico-train - INFO - ├── Learning Rate: 5.00e-06 2025-08-30 16:44:53 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 16:45:05 - pico-train - INFO - Step 93225 -- 🔄 Training Metrics 2025-08-30 16:45:05 - pico-train - INFO - ├── Loss: 5.7719 2025-08-30 16:45:05 - pico-train - INFO - ├── Learning Rate: 5.00e-06 2025-08-30 16:45:05 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 16:45:18 - pico-train - INFO - Step 93250 -- 🔄 Training Metrics 2025-08-30 16:45:18 - pico-train - INFO - ├── Loss: 5.7282 2025-08-30 16:45:18 - pico-train - INFO - ├── Learning Rate: 5.00e-06 2025-08-30 16:45:18 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 16:45:31 - pico-train - INFO - Step 93275 -- 🔄 Training Metrics 2025-08-30 
16:45:31 - pico-train - INFO - ├── Loss: 5.7916 2025-08-30 16:45:31 - pico-train - INFO - ├── Learning Rate: 5.00e-06 2025-08-30 16:45:31 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 16:45:43 - pico-train - INFO - Step 93300 -- 🔄 Training Metrics 2025-08-30 16:45:43 - pico-train - INFO - ├── Loss: 5.7256 2025-08-30 16:45:43 - pico-train - INFO - ├── Learning Rate: 5.00e-06 2025-08-30 16:45:43 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 16:45:56 - pico-train - INFO - Step 93325 -- 🔄 Training Metrics 2025-08-30 16:45:56 - pico-train - INFO - ├── Loss: 5.6998 2025-08-30 16:45:56 - pico-train - INFO - ├── Learning Rate: 5.00e-06 2025-08-30 16:45:56 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 16:46:08 - pico-train - INFO - Step 93350 -- 🔄 Training Metrics 2025-08-30 16:46:08 - pico-train - INFO - ├── Loss: 5.7917 2025-08-30 16:46:08 - pico-train - INFO - ├── Learning Rate: 5.00e-06 2025-08-30 16:46:08 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 16:46:22 - pico-train - INFO - Step 93375 -- 🔄 Training Metrics 2025-08-30 16:46:22 - pico-train - INFO - ├── Loss: 5.7009 2025-08-30 16:46:22 - pico-train - INFO - ├── Learning Rate: 5.00e-06 2025-08-30 16:46:22 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 16:46:34 - pico-train - INFO - Step 93400 -- 🔄 Training Metrics 2025-08-30 16:46:34 - pico-train - INFO - ├── Loss: 5.6527 2025-08-30 16:46:34 - pico-train - INFO - ├── Learning Rate: 5.00e-06 2025-08-30 16:46:34 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 16:46:47 - pico-train - INFO - Step 93425 -- 🔄 Training Metrics 2025-08-30 16:46:47 - pico-train - INFO - ├── Loss: 5.7525 2025-08-30 16:46:47 - pico-train - INFO - ├── Learning Rate: 5.00e-06 2025-08-30 16:46:47 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 16:46:59 - pico-train - INFO - Step 93450 -- 🔄 Training Metrics 2025-08-30 16:46:59 - pico-train - INFO - ├── Loss: 5.7059 2025-08-30 16:46:59 - pico-train - INFO - ├── Learning Rate: 5.00e-06 2025-08-30 
16:46:59 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 16:47:12 - pico-train - INFO - Step 93475 -- 🔄 Training Metrics 2025-08-30 16:47:12 - pico-train - INFO - ├── Loss: 5.6816 2025-08-30 16:47:12 - pico-train - INFO - ├── Learning Rate: 5.00e-06 2025-08-30 16:47:12 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 16:47:24 - pico-train - INFO - Step 93500 -- 💾 Saving Checkpoint 2025-08-30 16:49:17 - pico-train - INFO - Step 93500 -- 📊 Evaluation Results 2025-08-30 16:49:17 - pico-train - INFO - └── paloma: 4.1961751100827166e+32 2025-08-30 16:49:21 - pico-train - INFO - Step 93500 -- 🔄 Training Metrics 2025-08-30 16:49:21 - pico-train - INFO - ├── Loss: 5.7192 2025-08-30 16:49:21 - pico-train - INFO - ├── Learning Rate: 5.00e-06 2025-08-30 16:49:21 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 16:49:21 - pico-train - INFO - Step 93500 -- 📈 Saving Learning Dynamics 2025-08-30 16:49:35 - pico-train - INFO - Step 93525 -- 🔄 Training Metrics 2025-08-30 16:49:35 - pico-train - INFO - ├── Loss: 5.6872 2025-08-30 16:49:35 - pico-train - INFO - ├── Learning Rate: 5.00e-06 2025-08-30 16:49:35 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 16:49:48 - pico-train - INFO - Step 93550 -- 🔄 Training Metrics 2025-08-30 16:49:48 - pico-train - INFO - ├── Loss: 5.6761 2025-08-30 16:49:48 - pico-train - INFO - ├── Learning Rate: 5.00e-06 2025-08-30 16:49:48 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 16:50:00 - pico-train - INFO - Step 93575 -- 🔄 Training Metrics 2025-08-30 16:50:00 - pico-train - INFO - ├── Loss: 5.8000 2025-08-30 16:50:00 - pico-train - INFO - ├── Learning Rate: 5.00e-06 2025-08-30 16:50:00 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 16:50:13 - pico-train - INFO - Step 93600 -- 🔄 Training Metrics 2025-08-30 16:50:13 - pico-train - INFO - ├── Loss: 5.7608 2025-08-30 16:50:13 - pico-train - INFO - ├── Learning Rate: 5.00e-06 2025-08-30 16:50:13 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 16:50:25 - pico-train 
- INFO - Step 93625 -- 🔄 Training Metrics 2025-08-30 16:50:25 - pico-train - INFO - ├── Loss: 5.6573 2025-08-30 16:50:25 - pico-train - INFO - ├── Learning Rate: 5.00e-06 2025-08-30 16:50:25 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 16:50:38 - pico-train - INFO - Step 93650 -- 🔄 Training Metrics 2025-08-30 16:50:38 - pico-train - INFO - ├── Loss: 5.7380 2025-08-30 16:50:38 - pico-train - INFO - ├── Learning Rate: 5.00e-06 2025-08-30 16:50:38 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 16:50:51 - pico-train - INFO - Step 93675 -- 🔄 Training Metrics 2025-08-30 16:50:51 - pico-train - INFO - ├── Loss: 5.7669 2025-08-30 16:50:51 - pico-train - INFO - ├── Learning Rate: 5.00e-06 2025-08-30 16:50:51 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 16:51:03 - pico-train - INFO - Step 93700 -- 🔄 Training Metrics 2025-08-30 16:51:03 - pico-train - INFO - ├── Loss: 5.7749 2025-08-30 16:51:03 - pico-train - INFO - ├── Learning Rate: 5.00e-06 2025-08-30 16:51:03 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 16:51:16 - pico-train - INFO - Step 93725 -- 🔄 Training Metrics 2025-08-30 16:51:16 - pico-train - INFO - ├── Loss: 5.7154 2025-08-30 16:51:16 - pico-train - INFO - ├── Learning Rate: 5.00e-06 2025-08-30 16:51:16 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 16:51:28 - pico-train - INFO - Step 93750 -- 🔄 Training Metrics 2025-08-30 16:51:28 - pico-train - INFO - ├── Loss: 5.7243 2025-08-30 16:51:28 - pico-train - INFO - ├── Learning Rate: 5.00e-06 2025-08-30 16:51:28 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 16:51:41 - pico-train - INFO - Step 93775 -- 🔄 Training Metrics 2025-08-30 16:51:41 - pico-train - INFO - ├── Loss: 5.6949 2025-08-30 16:51:41 - pico-train - INFO - ├── Learning Rate: 5.00e-06 2025-08-30 16:51:41 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 16:51:54 - pico-train - INFO - Step 93800 -- 🔄 Training Metrics 2025-08-30 16:51:54 - pico-train - INFO - ├── Loss: 5.7039 2025-08-30 16:51:54 - 
pico-train - INFO - ├── Learning Rate: 5.00e-06 2025-08-30 16:51:54 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 16:52:06 - pico-train - INFO - Step 93825 -- 🔄 Training Metrics 2025-08-30 16:52:06 - pico-train - INFO - ├── Loss: 5.7926 2025-08-30 16:52:06 - pico-train - INFO - ├── Learning Rate: 5.00e-06 2025-08-30 16:52:06 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 16:52:19 - pico-train - INFO - Step 93850 -- 🔄 Training Metrics 2025-08-30 16:52:19 - pico-train - INFO - ├── Loss: 5.7340 2025-08-30 16:52:19 - pico-train - INFO - ├── Learning Rate: 5.00e-06 2025-08-30 16:52:19 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 16:52:32 - pico-train - INFO - Step 93875 -- 🔄 Training Metrics 2025-08-30 16:52:32 - pico-train - INFO - ├── Loss: 5.6842 2025-08-30 16:52:32 - pico-train - INFO - ├── Learning Rate: 5.00e-06 2025-08-30 16:52:32 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 16:52:45 - pico-train - INFO - Step 93900 -- 🔄 Training Metrics 2025-08-30 16:52:45 - pico-train - INFO - ├── Loss: 5.7671 2025-08-30 16:52:45 - pico-train - INFO - ├── Learning Rate: 5.00e-06 2025-08-30 16:52:45 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 16:52:57 - pico-train - INFO - Step 93925 -- 🔄 Training Metrics 2025-08-30 16:52:57 - pico-train - INFO - ├── Loss: 5.7668 2025-08-30 16:52:57 - pico-train - INFO - ├── Learning Rate: 5.00e-06 2025-08-30 16:52:57 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 16:53:10 - pico-train - INFO - Step 93950 -- 🔄 Training Metrics 2025-08-30 16:53:10 - pico-train - INFO - ├── Loss: 5.7131 2025-08-30 16:53:10 - pico-train - INFO - ├── Learning Rate: 5.00e-06 2025-08-30 16:53:10 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 16:53:22 - pico-train - INFO - Step 93975 -- 🔄 Training Metrics 2025-08-30 16:53:22 - pico-train - INFO - ├── Loss: 5.6793 2025-08-30 16:53:22 - pico-train - INFO - ├── Learning Rate: 5.00e-06 2025-08-30 16:53:22 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 16:53:34 
- pico-train - INFO - Step 94000 -- 💾 Saving Checkpoint 2025-08-30 16:55:28 - pico-train - INFO - Step 94000 -- 📊 Evaluation Results 2025-08-30 16:55:28 - pico-train - INFO - └── paloma: 4.6632619645598235e+32 2025-08-30 16:55:31 - pico-train - INFO - Step 94000 -- 🔄 Training Metrics 2025-08-30 16:55:31 - pico-train - INFO - ├── Loss: 5.6699 2025-08-30 16:55:31 - pico-train - INFO - ├── Learning Rate: 5.00e-06 2025-08-30 16:55:31 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 16:55:31 - pico-train - INFO - Step 94000 -- 📈 Saving Learning Dynamics 2025-08-30 16:55:46 - pico-train - INFO - Step 94025 -- 🔄 Training Metrics 2025-08-30 16:55:46 - pico-train - INFO - ├── Loss: 5.6327 2025-08-30 16:55:46 - pico-train - INFO - ├── Learning Rate: 5.00e-06 2025-08-30 16:55:46 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 16:55:59 - pico-train - INFO - Step 94050 -- 🔄 Training Metrics 2025-08-30 16:55:59 - pico-train - INFO - ├── Loss: 5.7652 2025-08-30 16:55:59 - pico-train - INFO - ├── Learning Rate: 5.00e-06 2025-08-30 16:55:59 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 16:56:11 - pico-train - INFO - Step 94075 -- 🔄 Training Metrics 2025-08-30 16:56:11 - pico-train - INFO - ├── Loss: 5.7427 2025-08-30 16:56:11 - pico-train - INFO - ├── Learning Rate: 5.00e-06 2025-08-30 16:56:11 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 16:56:24 - pico-train - INFO - Step 94100 -- 🔄 Training Metrics 2025-08-30 16:56:24 - pico-train - INFO - ├── Loss: 5.6906 2025-08-30 16:56:24 - pico-train - INFO - ├── Learning Rate: 5.00e-06 2025-08-30 16:56:24 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 16:56:36 - pico-train - INFO - Step 94125 -- 🔄 Training Metrics 2025-08-30 16:56:36 - pico-train - INFO - ├── Loss: 5.6742 2025-08-30 16:56:36 - pico-train - INFO - ├── Learning Rate: 5.00e-06 2025-08-30 16:56:36 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 16:56:49 - pico-train - INFO - Step 94150 -- 🔄 Training Metrics 2025-08-30 16:56:49 - 
pico-train - INFO - ├── Loss: 5.7668 2025-08-30 16:56:49 - pico-train - INFO - ├── Learning Rate: 5.00e-06 2025-08-30 16:56:49 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 16:57:01 - pico-train - INFO - Step 94175 -- 🔄 Training Metrics 2025-08-30 16:57:01 - pico-train - INFO - ├── Loss: 5.7753 2025-08-30 16:57:01 - pico-train - INFO - ├── Learning Rate: 5.00e-06 2025-08-30 16:57:01 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 16:57:14 - pico-train - INFO - Step 94200 -- 🔄 Training Metrics 2025-08-30 16:57:14 - pico-train - INFO - ├── Loss: 5.6786 2025-08-30 16:57:14 - pico-train - INFO - ├── Learning Rate: 5.00e-06 2025-08-30 16:57:14 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 16:57:27 - pico-train - INFO - Step 94225 -- 🔄 Training Metrics 2025-08-30 16:57:27 - pico-train - INFO - ├── Loss: 5.7441 2025-08-30 16:57:27 - pico-train - INFO - ├── Learning Rate: 5.00e-06 2025-08-30 16:57:27 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 16:57:39 - pico-train - INFO - Step 94250 -- 🔄 Training Metrics 2025-08-30 16:57:39 - pico-train - INFO - ├── Loss: 5.6884 2025-08-30 16:57:39 - pico-train - INFO - ├── Learning Rate: 5.00e-06 2025-08-30 16:57:39 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 16:57:52 - pico-train - INFO - Step 94275 -- 🔄 Training Metrics 2025-08-30 16:57:52 - pico-train - INFO - ├── Loss: 5.7145 2025-08-30 16:57:52 - pico-train - INFO - ├── Learning Rate: 5.00e-06 2025-08-30 16:57:52 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 16:58:04 - pico-train - INFO - Step 94300 -- 🔄 Training Metrics 2025-08-30 16:58:04 - pico-train - INFO - ├── Loss: 5.6673 2025-08-30 16:58:04 - pico-train - INFO - ├── Learning Rate: 5.00e-06 2025-08-30 16:58:04 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 16:58:17 - pico-train - INFO - Step 94325 -- 🔄 Training Metrics 2025-08-30 16:58:17 - pico-train - INFO - ├── Loss: 5.7572 2025-08-30 16:58:17 - pico-train - INFO - ├── Learning Rate: 5.00e-06 2025-08-30 16:58:17 - 
pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 16:58:29 - pico-train - INFO - Step 94350 -- 🔄 Training Metrics 2025-08-30 16:58:29 - pico-train - INFO - ├── Loss: 5.7174 2025-08-30 16:58:29 - pico-train - INFO - ├── Learning Rate: 5.00e-06 2025-08-30 16:58:29 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 16:58:42 - pico-train - INFO - Step 94375 -- 🔄 Training Metrics 2025-08-30 16:58:42 - pico-train - INFO - ├── Loss: 5.7323 2025-08-30 16:58:42 - pico-train - INFO - ├── Learning Rate: 5.00e-06 2025-08-30 16:58:42 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 16:58:55 - pico-train - INFO - Step 94400 -- 🔄 Training Metrics 2025-08-30 16:58:55 - pico-train - INFO - ├── Loss: 5.7568 2025-08-30 16:58:55 - pico-train - INFO - ├── Learning Rate: 5.00e-06 2025-08-30 16:58:55 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 16:59:08 - pico-train - INFO - Step 94425 -- 🔄 Training Metrics 2025-08-30 16:59:08 - pico-train - INFO - ├── Loss: 5.6680 2025-08-30 16:59:08 - pico-train - INFO - ├── Learning Rate: 5.00e-06 2025-08-30 16:59:08 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 16:59:20 - pico-train - INFO - Step 94450 -- 🔄 Training Metrics 2025-08-30 16:59:20 - pico-train - INFO - ├── Loss: 5.6854 2025-08-30 16:59:20 - pico-train - INFO - ├── Learning Rate: 5.00e-06 2025-08-30 16:59:20 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 16:59:33 - pico-train - INFO - Step 94475 -- 🔄 Training Metrics 2025-08-30 16:59:33 - pico-train - INFO - ├── Loss: 5.7575 2025-08-30 16:59:33 - pico-train - INFO - ├── Learning Rate: 5.00e-06 2025-08-30 16:59:33 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 16:59:45 - pico-train - INFO - Step 94500 -- 💾 Saving Checkpoint 2025-08-30 17:01:48 - pico-train - INFO - Step 94500 -- 📊 Evaluation Results 2025-08-30 17:01:48 - pico-train - INFO - └── paloma: 4.508661952444135e+32 2025-08-30 17:01:51 - pico-train - INFO - Step 94500 -- 🔄 Training Metrics 2025-08-30 17:01:51 - pico-train - INFO - ├── Loss: 
5.7380 2025-08-30 17:01:51 - pico-train - INFO - ├── Learning Rate: 5.00e-06 2025-08-30 17:01:51 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 17:01:51 - pico-train - INFO - Step 94500 -- 📈 Saving Learning Dynamics 2025-08-30 17:02:05 - pico-train - INFO - Step 94525 -- 🔄 Training Metrics 2025-08-30 17:02:05 - pico-train - INFO - ├── Loss: 5.7742 2025-08-30 17:02:05 - pico-train - INFO - ├── Learning Rate: 5.00e-06 2025-08-30 17:02:05 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 17:02:18 - pico-train - INFO - Step 94550 -- 🔄 Training Metrics 2025-08-30 17:02:18 - pico-train - INFO - ├── Loss: 5.6715 2025-08-30 17:02:18 - pico-train - INFO - ├── Learning Rate: 5.00e-06 2025-08-30 17:02:18 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 17:02:31 - pico-train - INFO - Step 94575 -- 🔄 Training Metrics 2025-08-30 17:02:31 - pico-train - INFO - ├── Loss: 5.7154 2025-08-30 17:02:31 - pico-train - INFO - ├── Learning Rate: 5.00e-06 2025-08-30 17:02:31 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 17:02:43 - pico-train - INFO - Step 94600 -- 🔄 Training Metrics 2025-08-30 17:02:43 - pico-train - INFO - ├── Loss: 5.6689 2025-08-30 17:02:43 - pico-train - INFO - ├── Learning Rate: 5.00e-06 2025-08-30 17:02:43 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 17:02:56 - pico-train - INFO - Step 94625 -- 🔄 Training Metrics 2025-08-30 17:02:56 - pico-train - INFO - ├── Loss: 5.6885 2025-08-30 17:02:56 - pico-train - INFO - ├── Learning Rate: 5.00e-06 2025-08-30 17:02:56 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 17:03:08 - pico-train - INFO - Step 94650 -- 🔄 Training Metrics 2025-08-30 17:03:08 - pico-train - INFO - ├── Loss: 5.6480 2025-08-30 17:03:08 - pico-train - INFO - ├── Learning Rate: 5.00e-06 2025-08-30 17:03:08 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 17:03:21 - pico-train - INFO - Step 94675 -- 🔄 Training Metrics 2025-08-30 17:03:21 - pico-train - INFO - ├── Loss: 5.7438 2025-08-30 17:03:21 - pico-train - INFO 
- ├── Learning Rate: 5.00e-06 2025-08-30 17:03:21 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 17:03:33 - pico-train - INFO - Step 94700 -- 🔄 Training Metrics 2025-08-30 17:03:33 - pico-train - INFO - ├── Loss: 5.7230 2025-08-30 17:03:33 - pico-train - INFO - ├── Learning Rate: 5.00e-06 2025-08-30 17:03:33 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 17:03:46 - pico-train - INFO - Step 94725 -- 🔄 Training Metrics 2025-08-30 17:03:46 - pico-train - INFO - ├── Loss: 5.7158 2025-08-30 17:03:46 - pico-train - INFO - ├── Learning Rate: 5.00e-06 2025-08-30 17:03:46 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 17:03:59 - pico-train - INFO - Step 94750 -- 🔄 Training Metrics 2025-08-30 17:03:59 - pico-train - INFO - ├── Loss: 5.7983 2025-08-30 17:03:59 - pico-train - INFO - ├── Learning Rate: 5.00e-06 2025-08-30 17:03:59 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 17:04:11 - pico-train - INFO - Step 94775 -- 🔄 Training Metrics 2025-08-30 17:04:11 - pico-train - INFO - ├── Loss: 5.6361 2025-08-30 17:04:11 - pico-train - INFO - ├── Learning Rate: 5.00e-06 2025-08-30 17:04:11 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 17:04:24 - pico-train - INFO - Step 94800 -- 🔄 Training Metrics 2025-08-30 17:04:24 - pico-train - INFO - ├── Loss: 5.7918 2025-08-30 17:04:24 - pico-train - INFO - ├── Learning Rate: 5.00e-06 2025-08-30 17:04:24 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 17:04:37 - pico-train - INFO - Step 94825 -- 🔄 Training Metrics 2025-08-30 17:04:37 - pico-train - INFO - ├── Loss: 5.7671 2025-08-30 17:04:37 - pico-train - INFO - ├── Learning Rate: 5.00e-06 2025-08-30 17:04:37 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 17:04:49 - pico-train - INFO - Step 94850 -- 🔄 Training Metrics 2025-08-30 17:04:49 - pico-train - INFO - ├── Loss: 5.8057 2025-08-30 17:04:49 - pico-train - INFO - ├── Learning Rate: 5.00e-06 2025-08-30 17:04:49 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 17:05:02 - pico-train - 
INFO - Step 94875 -- 🔄 Training Metrics 2025-08-30 17:05:02 - pico-train - INFO - ├── Loss: 5.7364 2025-08-30 17:05:02 - pico-train - INFO - ├── Learning Rate: 5.00e-06 2025-08-30 17:05:02 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 17:05:15 - pico-train - INFO - Step 94900 -- 🔄 Training Metrics 2025-08-30 17:05:15 - pico-train - INFO - ├── Loss: 5.6978 2025-08-30 17:05:15 - pico-train - INFO - ├── Learning Rate: 5.00e-06 2025-08-30 17:05:15 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 17:05:27 - pico-train - INFO - Step 94925 -- 🔄 Training Metrics 2025-08-30 17:05:27 - pico-train - INFO - ├── Loss: 5.7986 2025-08-30 17:05:27 - pico-train - INFO - ├── Learning Rate: 5.00e-06 2025-08-30 17:05:27 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 17:05:40 - pico-train - INFO - Step 94950 -- 🔄 Training Metrics 2025-08-30 17:05:40 - pico-train - INFO - ├── Loss: 5.7708 2025-08-30 17:05:40 - pico-train - INFO - ├── Learning Rate: 5.00e-06 2025-08-30 17:05:40 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 17:05:53 - pico-train - INFO - Step 94975 -- 🔄 Training Metrics 2025-08-30 17:05:53 - pico-train - INFO - ├── Loss: 5.7860 2025-08-30 17:05:53 - pico-train - INFO - ├── Learning Rate: 5.00e-06 2025-08-30 17:05:53 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 17:06:05 - pico-train - INFO - Step 95000 -- 💾 Saving Checkpoint 2025-08-30 17:08:04 - pico-train - INFO - Step 95000 -- 📊 Evaluation Results 2025-08-30 17:08:04 - pico-train - INFO - └── paloma: 5.104812713913069e+32 2025-08-30 17:08:08 - pico-train - INFO - Step 95000 -- 🔄 Training Metrics 2025-08-30 17:08:08 - pico-train - INFO - ├── Loss: 5.7908 2025-08-30 17:08:08 - pico-train - INFO - ├── Learning Rate: 5.00e-06 2025-08-30 17:08:08 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 17:08:08 - pico-train - INFO - Step 95000 -- 📈 Saving Learning Dynamics 2025-08-30 17:08:23 - pico-train - INFO - Step 95025 -- 🔄 Training Metrics 2025-08-30 17:08:23 - pico-train - INFO - ├── 
Loss: 5.6715 2025-08-30 17:08:23 - pico-train - INFO - ├── Learning Rate: 5.00e-06 2025-08-30 17:08:23 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 17:08:35 - pico-train - INFO - Step 95050 -- 🔄 Training Metrics 2025-08-30 17:08:35 - pico-train - INFO - ├── Loss: 5.7396 2025-08-30 17:08:35 - pico-train - INFO - ├── Learning Rate: 5.00e-06 2025-08-30 17:08:35 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 17:08:48 - pico-train - INFO - Step 95075 -- 🔄 Training Metrics 2025-08-30 17:08:48 - pico-train - INFO - ├── Loss: 5.7813 2025-08-30 17:08:48 - pico-train - INFO - ├── Learning Rate: 5.00e-06 2025-08-30 17:08:48 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 17:09:00 - pico-train - INFO - Step 95100 -- 🔄 Training Metrics 2025-08-30 17:09:00 - pico-train - INFO - ├── Loss: 5.7272 2025-08-30 17:09:00 - pico-train - INFO - ├── Learning Rate: 5.00e-06 2025-08-30 17:09:00 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 17:09:13 - pico-train - INFO - Step 95125 -- 🔄 Training Metrics 2025-08-30 17:09:13 - pico-train - INFO - ├── Loss: 5.8216 2025-08-30 17:09:13 - pico-train - INFO - ├── Learning Rate: 5.00e-06 2025-08-30 17:09:13 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 17:09:26 - pico-train - INFO - Step 95150 -- 🔄 Training Metrics 2025-08-30 17:09:26 - pico-train - INFO - ├── Loss: 5.8266 2025-08-30 17:09:26 - pico-train - INFO - ├── Learning Rate: 5.00e-06 2025-08-30 17:09:26 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 17:09:38 - pico-train - INFO - Step 95175 -- 🔄 Training Metrics 2025-08-30 17:09:38 - pico-train - INFO - ├── Loss: 5.6436 2025-08-30 17:09:38 - pico-train - INFO - ├── Learning Rate: 5.00e-06 2025-08-30 17:09:38 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 17:09:51 - pico-train - INFO - Step 95200 -- 🔄 Training Metrics 2025-08-30 17:09:51 - pico-train - INFO - ├── Loss: 5.8010 2025-08-30 17:09:51 - pico-train - INFO - ├── Learning Rate: 5.00e-06 2025-08-30 17:09:51 - pico-train - INFO - └── 
Inf/NaN count: 0 2025-08-30 17:10:03 - pico-train - INFO - Step 95225 -- 🔄 Training Metrics 2025-08-30 17:10:03 - pico-train - INFO - ├── Loss: 5.7890 2025-08-30 17:10:03 - pico-train - INFO - ├── Learning Rate: 5.00e-06 2025-08-30 17:10:03 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 17:10:16 - pico-train - INFO - Step 95250 -- 🔄 Training Metrics 2025-08-30 17:10:16 - pico-train - INFO - ├── Loss: 5.7605 2025-08-30 17:10:16 - pico-train - INFO - ├── Learning Rate: 5.00e-06 2025-08-30 17:10:16 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 17:10:29 - pico-train - INFO - Step 95275 -- 🔄 Training Metrics 2025-08-30 17:10:29 - pico-train - INFO - ├── Loss: 5.7014 2025-08-30 17:10:29 - pico-train - INFO - ├── Learning Rate: 5.00e-06 2025-08-30 17:10:29 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 17:10:41 - pico-train - INFO - Step 95300 -- 🔄 Training Metrics 2025-08-30 17:10:41 - pico-train - INFO - ├── Loss: 5.6459 2025-08-30 17:10:41 - pico-train - INFO - ├── Learning Rate: 5.00e-06 2025-08-30 17:10:41 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 17:10:54 - pico-train - INFO - Step 95325 -- 🔄 Training Metrics 2025-08-30 17:10:54 - pico-train - INFO - ├── Loss: 5.6781 2025-08-30 17:10:54 - pico-train - INFO - ├── Learning Rate: 5.00e-06 2025-08-30 17:10:54 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 17:11:07 - pico-train - INFO - Step 95350 -- 🔄 Training Metrics 2025-08-30 17:11:07 - pico-train - INFO - ├── Loss: 5.6840 2025-08-30 17:11:07 - pico-train - INFO - ├── Learning Rate: 5.00e-06 2025-08-30 17:11:07 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 17:11:19 - pico-train - INFO - Step 95375 -- 🔄 Training Metrics 2025-08-30 17:11:19 - pico-train - INFO - ├── Loss: 5.7857 2025-08-30 17:11:19 - pico-train - INFO - ├── Learning Rate: 5.00e-06 2025-08-30 17:11:19 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 17:11:32 - pico-train - INFO - Step 95400 -- 🔄 Training Metrics 2025-08-30 17:11:32 - pico-train - 
INFO - ├── Loss: 5.6726 2025-08-30 17:11:32 - pico-train - INFO - ├── Learning Rate: 5.00e-06 2025-08-30 17:11:32 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 17:11:44 - pico-train - INFO - Step 95425 -- 🔄 Training Metrics 2025-08-30 17:11:44 - pico-train - INFO - ├── Loss: 5.7372 2025-08-30 17:11:44 - pico-train - INFO - ├── Learning Rate: 5.00e-06 2025-08-30 17:11:44 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 17:11:57 - pico-train - INFO - Step 95450 -- 🔄 Training Metrics 2025-08-30 17:11:57 - pico-train - INFO - ├── Loss: 5.7102 2025-08-30 17:11:57 - pico-train - INFO - ├── Learning Rate: 5.00e-06 2025-08-30 17:11:57 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 17:12:10 - pico-train - INFO - Step 95475 -- 🔄 Training Metrics 2025-08-30 17:12:10 - pico-train - INFO - ├── Loss: 5.7584 2025-08-30 17:12:10 - pico-train - INFO - ├── Learning Rate: 5.00e-06 2025-08-30 17:12:10 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 17:12:22 - pico-train - INFO - Step 95500 -- 💾 Saving Checkpoint 2025-08-30 17:14:14 - pico-train - INFO - Step 95500 -- 📊 Evaluation Results 2025-08-30 17:14:14 - pico-train - INFO - └── paloma: 4.957634656571212e+32 2025-08-30 17:14:17 - pico-train - INFO - Step 95500 -- 🔄 Training Metrics 2025-08-30 17:14:17 - pico-train - INFO - ├── Loss: 5.7361 2025-08-30 17:14:17 - pico-train - INFO - ├── Learning Rate: 5.00e-06 2025-08-30 17:14:17 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 17:14:17 - pico-train - INFO - Step 95500 -- 📈 Saving Learning Dynamics 2025-08-30 17:14:32 - pico-train - INFO - Step 95525 -- 🔄 Training Metrics 2025-08-30 17:14:32 - pico-train - INFO - ├── Loss: 5.7807 2025-08-30 17:14:32 - pico-train - INFO - ├── Learning Rate: 5.00e-06 2025-08-30 17:14:32 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 17:14:44 - pico-train - INFO - Step 95550 -- 🔄 Training Metrics 2025-08-30 17:14:44 - pico-train - INFO - ├── Loss: 5.7065 2025-08-30 17:14:44 - pico-train - INFO - ├── Learning Rate: 
5.00e-06 2025-08-30 17:14:44 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 17:14:57 - pico-train - INFO - Step 95575 -- 🔄 Training Metrics 2025-08-30 17:14:57 - pico-train - INFO - ├── Loss: 5.7822 2025-08-30 17:14:57 - pico-train - INFO - ├── Learning Rate: 5.00e-06 2025-08-30 17:14:57 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 17:15:09 - pico-train - INFO - Step 95600 -- 🔄 Training Metrics 2025-08-30 17:15:09 - pico-train - INFO - ├── Loss: 5.7196 2025-08-30 17:15:09 - pico-train - INFO - ├── Learning Rate: 5.00e-06 2025-08-30 17:15:09 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 17:15:22 - pico-train - INFO - Step 95625 -- 🔄 Training Metrics 2025-08-30 17:15:22 - pico-train - INFO - ├── Loss: 5.7237 2025-08-30 17:15:22 - pico-train - INFO - ├── Learning Rate: 5.00e-06 2025-08-30 17:15:22 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 17:15:35 - pico-train - INFO - Step 95650 -- 🔄 Training Metrics 2025-08-30 17:15:35 - pico-train - INFO - ├── Loss: 5.6754 2025-08-30 17:15:35 - pico-train - INFO - ├── Learning Rate: 5.00e-06 2025-08-30 17:15:35 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 17:15:47 - pico-train - INFO - Step 95675 -- 🔄 Training Metrics 2025-08-30 17:15:47 - pico-train - INFO - ├── Loss: 5.6954 2025-08-30 17:15:47 - pico-train - INFO - ├── Learning Rate: 5.00e-06 2025-08-30 17:15:47 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 17:16:00 - pico-train - INFO - Step 95700 -- 🔄 Training Metrics 2025-08-30 17:16:00 - pico-train - INFO - ├── Loss: 5.7059 2025-08-30 17:16:00 - pico-train - INFO - ├── Learning Rate: 5.00e-06 2025-08-30 17:16:00 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 17:16:12 - pico-train - INFO - Step 95725 -- 🔄 Training Metrics 2025-08-30 17:16:12 - pico-train - INFO - ├── Loss: 5.8143 2025-08-30 17:16:12 - pico-train - INFO - ├── Learning Rate: 5.00e-06 2025-08-30 17:16:12 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 17:16:25 - pico-train - INFO - Step 95750 -- 🔄 
Training Metrics 2025-08-30 17:16:25 - pico-train - INFO - ├── Loss: 5.7690 2025-08-30 17:16:25 - pico-train - INFO - ├── Learning Rate: 5.00e-06 2025-08-30 17:16:25 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 17:16:38 - pico-train - INFO - Step 95775 -- 🔄 Training Metrics 2025-08-30 17:16:38 - pico-train - INFO - ├── Loss: 5.6384 2025-08-30 17:16:38 - pico-train - INFO - ├── Learning Rate: 5.00e-06 2025-08-30 17:16:38 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 17:16:50 - pico-train - INFO - Step 95800 -- 🔄 Training Metrics 2025-08-30 17:16:50 - pico-train - INFO - ├── Loss: 5.7031 2025-08-30 17:16:50 - pico-train - INFO - ├── Learning Rate: 5.00e-06 2025-08-30 17:16:50 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 17:17:03 - pico-train - INFO - Step 95825 -- 🔄 Training Metrics 2025-08-30 17:17:03 - pico-train - INFO - ├── Loss: 5.6803 2025-08-30 17:17:03 - pico-train - INFO - ├── Learning Rate: 5.00e-06 2025-08-30 17:17:03 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 17:17:15 - pico-train - INFO - Step 95850 -- 🔄 Training Metrics 2025-08-30 17:17:15 - pico-train - INFO - ├── Loss: 5.7078 2025-08-30 17:17:15 - pico-train - INFO - ├── Learning Rate: 5.00e-06 2025-08-30 17:17:15 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 17:17:28 - pico-train - INFO - Step 95875 -- 🔄 Training Metrics 2025-08-30 17:17:28 - pico-train - INFO - ├── Loss: 5.8252 2025-08-30 17:17:28 - pico-train - INFO - ├── Learning Rate: 5.00e-06 2025-08-30 17:17:28 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 17:17:41 - pico-train - INFO - Step 95900 -- 🔄 Training Metrics 2025-08-30 17:17:41 - pico-train - INFO - ├── Loss: 5.7385 2025-08-30 17:17:41 - pico-train - INFO - ├── Learning Rate: 5.00e-06 2025-08-30 17:17:41 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 17:17:53 - pico-train - INFO - Step 95925 -- 🔄 Training Metrics 2025-08-30 17:17:53 - pico-train - INFO - ├── Loss: 5.6623 2025-08-30 17:17:53 - pico-train - INFO - ├── Learning 
Rate: 5.00e-06 2025-08-30 17:17:53 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 17:18:06 - pico-train - INFO - Step 95950 -- 🔄 Training Metrics 2025-08-30 17:18:06 - pico-train - INFO - ├── Loss: 5.7009 2025-08-30 17:18:06 - pico-train - INFO - ├── Learning Rate: 5.00e-06 2025-08-30 17:18:06 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 17:18:18 - pico-train - INFO - Step 95975 -- 🔄 Training Metrics 2025-08-30 17:18:18 - pico-train - INFO - ├── Loss: 5.7091 2025-08-30 17:18:18 - pico-train - INFO - ├── Learning Rate: 5.00e-06 2025-08-30 17:18:18 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 17:18:30 - pico-train - INFO - Step 96000 -- 💾 Saving Checkpoint 2025-08-30 17:20:28 - pico-train - INFO - Step 96000 -- 📊 Evaluation Results 2025-08-30 17:20:28 - pico-train - INFO - └── paloma: 5.493562243407693e+32 2025-08-30 17:20:31 - pico-train - INFO - Step 96000 -- 🔄 Training Metrics 2025-08-30 17:20:31 - pico-train - INFO - ├── Loss: 5.7198 2025-08-30 17:20:31 - pico-train - INFO - ├── Learning Rate: 5.00e-06 2025-08-30 17:20:31 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 17:20:31 - pico-train - INFO - Step 96000 -- 📈 Saving Learning Dynamics 2025-08-30 17:20:45 - pico-train - INFO - Step 96025 -- 🔄 Training Metrics 2025-08-30 17:20:45 - pico-train - INFO - ├── Loss: 5.8080 2025-08-30 17:20:45 - pico-train - INFO - ├── Learning Rate: 5.00e-06 2025-08-30 17:20:45 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 17:20:58 - pico-train - INFO - Step 96050 -- 🔄 Training Metrics 2025-08-30 17:20:58 - pico-train - INFO - ├── Loss: 5.7250 2025-08-30 17:20:58 - pico-train - INFO - ├── Learning Rate: 5.00e-06 2025-08-30 17:20:58 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 17:21:11 - pico-train - INFO - Step 96075 -- 🔄 Training Metrics 2025-08-30 17:21:11 - pico-train - INFO - ├── Loss: 5.7499 2025-08-30 17:21:11 - pico-train - INFO - ├── Learning Rate: 5.00e-06 2025-08-30 17:21:11 - pico-train - INFO - └── Inf/NaN count: 0 
2025-08-30 17:21:23 - pico-train - INFO - Step 96100 -- 🔄 Training Metrics 2025-08-30 17:21:23 - pico-train - INFO - ├── Loss: 5.7085 2025-08-30 17:21:23 - pico-train - INFO - ├── Learning Rate: 5.00e-06 2025-08-30 17:21:23 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 17:21:36 - pico-train - INFO - Step 96125 -- 🔄 Training Metrics 2025-08-30 17:21:36 - pico-train - INFO - ├── Loss: 5.7201 2025-08-30 17:21:36 - pico-train - INFO - ├── Learning Rate: 5.00e-06 2025-08-30 17:21:36 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 17:21:48 - pico-train - INFO - Step 96150 -- 🔄 Training Metrics 2025-08-30 17:21:48 - pico-train - INFO - ├── Loss: 5.7547 2025-08-30 17:21:48 - pico-train - INFO - ├── Learning Rate: 5.00e-06 2025-08-30 17:21:48 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 17:22:01 - pico-train - INFO - Step 96175 -- 🔄 Training Metrics 2025-08-30 17:22:01 - pico-train - INFO - ├── Loss: 5.8151 2025-08-30 17:22:01 - pico-train - INFO - ├── Learning Rate: 5.00e-06 2025-08-30 17:22:01 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 17:22:13 - pico-train - INFO - Step 96200 -- 🔄 Training Metrics 2025-08-30 17:22:13 - pico-train - INFO - ├── Loss: 5.7170 2025-08-30 17:22:13 - pico-train - INFO - ├── Learning Rate: 5.00e-06 2025-08-30 17:22:13 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 17:22:26 - pico-train - INFO - Step 96225 -- 🔄 Training Metrics 2025-08-30 17:22:26 - pico-train - INFO - ├── Loss: 5.8170 2025-08-30 17:22:26 - pico-train - INFO - ├── Learning Rate: 5.00e-06 2025-08-30 17:22:26 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 17:22:39 - pico-train - INFO - Step 96250 -- 🔄 Training Metrics 2025-08-30 17:22:39 - pico-train - INFO - ├── Loss: 5.7251 2025-08-30 17:22:39 - pico-train - INFO - ├── Learning Rate: 5.00e-06 2025-08-30 17:22:39 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 17:22:51 - pico-train - INFO - Step 96275 -- 🔄 Training Metrics 2025-08-30 17:22:51 - pico-train - INFO - ├── Loss: 
5.7940 2025-08-30 17:22:51 - pico-train - INFO - ├── Learning Rate: 5.00e-06 2025-08-30 17:22:51 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 17:23:04 - pico-train - INFO - Step 96300 -- 🔄 Training Metrics 2025-08-30 17:23:04 - pico-train - INFO - ├── Loss: 5.7028 2025-08-30 17:23:04 - pico-train - INFO - ├── Learning Rate: 5.00e-06 2025-08-30 17:23:04 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 17:23:17 - pico-train - INFO - Step 96325 -- 🔄 Training Metrics 2025-08-30 17:23:17 - pico-train - INFO - ├── Loss: 5.7019 2025-08-30 17:23:17 - pico-train - INFO - ├── Learning Rate: 5.00e-06 2025-08-30 17:23:17 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 17:23:29 - pico-train - INFO - Step 96350 -- 🔄 Training Metrics 2025-08-30 17:23:29 - pico-train - INFO - ├── Loss: 5.7100 2025-08-30 17:23:29 - pico-train - INFO - ├── Learning Rate: 5.00e-06 2025-08-30 17:23:29 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 17:23:42 - pico-train - INFO - Step 96375 -- 🔄 Training Metrics 2025-08-30 17:23:42 - pico-train - INFO - ├── Loss: 5.7076 2025-08-30 17:23:42 - pico-train - INFO - ├── Learning Rate: 5.00e-06 2025-08-30 17:23:42 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 17:23:55 - pico-train - INFO - Step 96400 -- 🔄 Training Metrics 2025-08-30 17:23:55 - pico-train - INFO - ├── Loss: 5.7707 2025-08-30 17:23:55 - pico-train - INFO - ├── Learning Rate: 5.00e-06 2025-08-30 17:23:55 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 17:24:07 - pico-train - INFO - Step 96425 -- 🔄 Training Metrics 2025-08-30 17:24:07 - pico-train - INFO - ├── Loss: 5.7697 2025-08-30 17:24:07 - pico-train - INFO - ├── Learning Rate: 5.00e-06 2025-08-30 17:24:07 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 17:24:20 - pico-train - INFO - Step 96450 -- 🔄 Training Metrics 2025-08-30 17:24:20 - pico-train - INFO - ├── Loss: 5.6849 2025-08-30 17:24:20 - pico-train - INFO - ├── Learning Rate: 5.00e-06 2025-08-30 17:24:20 - pico-train - INFO - └── Inf/NaN 
count: 0 2025-08-30 17:24:32 - pico-train - INFO - Step 96475 -- 🔄 Training Metrics 2025-08-30 17:24:32 - pico-train - INFO - ├── Loss: 5.7712 2025-08-30 17:24:32 - pico-train - INFO - ├── Learning Rate: 5.00e-06 2025-08-30 17:24:32 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 17:24:45 - pico-train - INFO - Step 96500 -- 💾 Saving Checkpoint 2025-08-30 17:26:46 - pico-train - INFO - Step 96500 -- 📊 Evaluation Results 2025-08-30 17:26:46 - pico-train - INFO - └── paloma: 5.48067532478105e+32 2025-08-30 17:26:49 - pico-train - INFO - Step 96500 -- 🔄 Training Metrics 2025-08-30 17:26:49 - pico-train - INFO - ├── Loss: 5.7094 2025-08-30 17:26:49 - pico-train - INFO - ├── Learning Rate: 5.00e-06 2025-08-30 17:26:49 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 17:26:49 - pico-train - INFO - Step 96500 -- 📈 Saving Learning Dynamics 2025-08-30 17:27:04 - pico-train - INFO - Step 96525 -- 🔄 Training Metrics 2025-08-30 17:27:04 - pico-train - INFO - ├── Loss: 5.7648 2025-08-30 17:27:04 - pico-train - INFO - ├── Learning Rate: 5.00e-06 2025-08-30 17:27:04 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 17:27:17 - pico-train - INFO - Step 96550 -- 🔄 Training Metrics 2025-08-30 17:27:17 - pico-train - INFO - ├── Loss: 5.7549 2025-08-30 17:27:17 - pico-train - INFO - ├── Learning Rate: 5.00e-06 2025-08-30 17:27:17 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 17:27:29 - pico-train - INFO - Step 96575 -- 🔄 Training Metrics 2025-08-30 17:27:29 - pico-train - INFO - ├── Loss: 5.7346 2025-08-30 17:27:29 - pico-train - INFO - ├── Learning Rate: 5.00e-06 2025-08-30 17:27:29 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 17:27:42 - pico-train - INFO - Step 96600 -- 🔄 Training Metrics 2025-08-30 17:27:42 - pico-train - INFO - ├── Loss: 5.7456 2025-08-30 17:27:42 - pico-train - INFO - ├── Learning Rate: 5.00e-06 2025-08-30 17:27:42 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 17:27:54 - pico-train - INFO - Step 96625 -- 🔄 Training Metrics 
2025-08-30 17:27:54 - pico-train - INFO - ├── Loss: 5.7921 2025-08-30 17:27:54 - pico-train - INFO - ├── Learning Rate: 5.00e-06 2025-08-30 17:27:54 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 17:28:07 - pico-train - INFO - Step 96650 -- 🔄 Training Metrics 2025-08-30 17:28:07 - pico-train - INFO - ├── Loss: 5.7698 2025-08-30 17:28:07 - pico-train - INFO - ├── Learning Rate: 5.00e-06 2025-08-30 17:28:07 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 17:28:20 - pico-train - INFO - Step 96675 -- 🔄 Training Metrics 2025-08-30 17:28:20 - pico-train - INFO - ├── Loss: 5.7013 2025-08-30 17:28:20 - pico-train - INFO - ├── Learning Rate: 5.00e-06 2025-08-30 17:28:20 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 17:28:32 - pico-train - INFO - Step 96700 -- 🔄 Training Metrics 2025-08-30 17:28:32 - pico-train - INFO - ├── Loss: 5.7223 2025-08-30 17:28:32 - pico-train - INFO - ├── Learning Rate: 5.00e-06 2025-08-30 17:28:32 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 17:28:45 - pico-train - INFO - Step 96725 -- 🔄 Training Metrics 2025-08-30 17:28:45 - pico-train - INFO - ├── Loss: 5.7874 2025-08-30 17:28:45 - pico-train - INFO - ├── Learning Rate: 5.00e-06 2025-08-30 17:28:45 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 17:28:57 - pico-train - INFO - Step 96750 -- 🔄 Training Metrics 2025-08-30 17:28:57 - pico-train - INFO - ├── Loss: 5.6761 2025-08-30 17:28:57 - pico-train - INFO - ├── Learning Rate: 5.00e-06 2025-08-30 17:28:57 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 17:29:10 - pico-train - INFO - Step 96775 -- 🔄 Training Metrics 2025-08-30 17:29:10 - pico-train - INFO - ├── Loss: 5.6824 2025-08-30 17:29:10 - pico-train - INFO - ├── Learning Rate: 5.00e-06 2025-08-30 17:29:10 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 17:29:22 - pico-train - INFO - Step 96800 -- 🔄 Training Metrics 2025-08-30 17:29:22 - pico-train - INFO - ├── Loss: 5.7050 2025-08-30 17:29:22 - pico-train - INFO - ├── Learning Rate: 5.00e-06 
2025-08-30 17:29:22 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 17:29:35 - pico-train - INFO - Step 96825 -- 🔄 Training Metrics 2025-08-30 17:29:35 - pico-train - INFO - ├── Loss: 5.7277 2025-08-30 17:29:35 - pico-train - INFO - ├── Learning Rate: 5.00e-06 2025-08-30 17:29:35 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 17:29:48 - pico-train - INFO - Step 96850 -- 🔄 Training Metrics 2025-08-30 17:29:48 - pico-train - INFO - ├── Loss: 5.8112 2025-08-30 17:29:48 - pico-train - INFO - ├── Learning Rate: 5.00e-06 2025-08-30 17:29:48 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 17:30:00 - pico-train - INFO - Step 96875 -- 🔄 Training Metrics 2025-08-30 17:30:00 - pico-train - INFO - ├── Loss: 5.7395 2025-08-30 17:30:00 - pico-train - INFO - ├── Learning Rate: 5.00e-06 2025-08-30 17:30:00 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 17:30:13 - pico-train - INFO - Step 96900 -- 🔄 Training Metrics 2025-08-30 17:30:13 - pico-train - INFO - ├── Loss: 5.7074 2025-08-30 17:30:13 - pico-train - INFO - ├── Learning Rate: 5.00e-06 2025-08-30 17:30:13 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 17:30:25 - pico-train - INFO - Step 96925 -- 🔄 Training Metrics 2025-08-30 17:30:25 - pico-train - INFO - ├── Loss: 5.5761 2025-08-30 17:30:25 - pico-train - INFO - ├── Learning Rate: 5.00e-06 2025-08-30 17:30:25 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 17:30:38 - pico-train - INFO - Step 96950 -- 🔄 Training Metrics 2025-08-30 17:30:38 - pico-train - INFO - ├── Loss: 5.7307 2025-08-30 17:30:38 - pico-train - INFO - ├── Learning Rate: 5.00e-06 2025-08-30 17:30:38 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 17:30:51 - pico-train - INFO - Step 96975 -- 🔄 Training Metrics 2025-08-30 17:30:51 - pico-train - INFO - ├── Loss: 5.7488 2025-08-30 17:30:51 - pico-train - INFO - ├── Learning Rate: 5.00e-06 2025-08-30 17:30:51 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 17:31:04 - pico-train - INFO - Step 97000 -- 💾 Saving 
Checkpoint 2025-08-30 17:33:01 - pico-train - INFO - Step 97000 -- 📊 Evaluation Results 2025-08-30 17:33:01 - pico-train - INFO - └── paloma: 5.7264733450144815e+32 2025-08-30 17:33:04 - pico-train - INFO - Step 97000 -- 🔄 Training Metrics 2025-08-30 17:33:04 - pico-train - INFO - ├── Loss: 5.7404 2025-08-30 17:33:04 - pico-train - INFO - ├── Learning Rate: 5.00e-06 2025-08-30 17:33:04 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 17:33:04 - pico-train - INFO - Step 97000 -- 📈 Saving Learning Dynamics 2025-08-30 17:33:19 - pico-train - INFO - Step 97025 -- 🔄 Training Metrics 2025-08-30 17:33:19 - pico-train - INFO - ├── Loss: 5.7170 2025-08-30 17:33:19 - pico-train - INFO - ├── Learning Rate: 5.00e-06 2025-08-30 17:33:19 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 17:33:32 - pico-train - INFO - Step 97050 -- 🔄 Training Metrics 2025-08-30 17:33:32 - pico-train - INFO - ├── Loss: 5.7462 2025-08-30 17:33:32 - pico-train - INFO - ├── Learning Rate: 5.00e-06 2025-08-30 17:33:32 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 17:33:44 - pico-train - INFO - Step 97075 -- 🔄 Training Metrics 2025-08-30 17:33:44 - pico-train - INFO - ├── Loss: 5.7603 2025-08-30 17:33:44 - pico-train - INFO - ├── Learning Rate: 5.00e-06 2025-08-30 17:33:44 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 17:33:57 - pico-train - INFO - Step 97100 -- 🔄 Training Metrics 2025-08-30 17:33:57 - pico-train - INFO - ├── Loss: 5.7696 2025-08-30 17:33:57 - pico-train - INFO - ├── Learning Rate: 5.00e-06 2025-08-30 17:33:57 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 17:34:10 - pico-train - INFO - Step 97125 -- 🔄 Training Metrics 2025-08-30 17:34:10 - pico-train - INFO - ├── Loss: 5.6190 2025-08-30 17:34:10 - pico-train - INFO - ├── Learning Rate: 5.00e-06 2025-08-30 17:34:10 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 17:34:22 - pico-train - INFO - Step 97150 -- 🔄 Training Metrics 2025-08-30 17:34:22 - pico-train - INFO - ├── Loss: 5.7495 2025-08-30 
17:34:22 - pico-train - INFO - ├── Learning Rate: 5.00e-06 2025-08-30 17:34:22 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 17:34:35 - pico-train - INFO - Step 97175 -- 🔄 Training Metrics 2025-08-30 17:34:35 - pico-train - INFO - ├── Loss: 5.7458 2025-08-30 17:34:35 - pico-train - INFO - ├── Learning Rate: 5.00e-06 2025-08-30 17:34:35 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 17:34:47 - pico-train - INFO - Step 97200 -- 🔄 Training Metrics 2025-08-30 17:34:47 - pico-train - INFO - ├── Loss: 5.7160 2025-08-30 17:34:47 - pico-train - INFO - ├── Learning Rate: 5.00e-06 2025-08-30 17:34:47 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 17:35:00 - pico-train - INFO - Step 97225 -- 🔄 Training Metrics 2025-08-30 17:35:00 - pico-train - INFO - ├── Loss: 5.7542 2025-08-30 17:35:00 - pico-train - INFO - ├── Learning Rate: 5.00e-06 2025-08-30 17:35:00 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 17:35:13 - pico-train - INFO - Step 97250 -- 🔄 Training Metrics 2025-08-30 17:35:13 - pico-train - INFO - ├── Loss: 5.6895 2025-08-30 17:35:13 - pico-train - INFO - ├── Learning Rate: 5.00e-06 2025-08-30 17:35:13 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 17:35:25 - pico-train - INFO - Step 97275 -- 🔄 Training Metrics 2025-08-30 17:35:25 - pico-train - INFO - ├── Loss: 5.6600 2025-08-30 17:35:25 - pico-train - INFO - ├── Learning Rate: 5.00e-06 2025-08-30 17:35:25 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 17:35:38 - pico-train - INFO - Step 97300 -- 🔄 Training Metrics 2025-08-30 17:35:38 - pico-train - INFO - ├── Loss: 5.7099 2025-08-30 17:35:38 - pico-train - INFO - ├── Learning Rate: 5.00e-06 2025-08-30 17:35:38 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 17:35:50 - pico-train - INFO - Step 97325 -- 🔄 Training Metrics 2025-08-30 17:35:50 - pico-train - INFO - ├── Loss: 5.7300 2025-08-30 17:35:50 - pico-train - INFO - ├── Learning Rate: 5.00e-06 2025-08-30 17:35:50 - pico-train - INFO - └── Inf/NaN count: 0 
2025-08-30 17:36:03 - pico-train - INFO - Step 97350 -- 🔄 Training Metrics 2025-08-30 17:36:03 - pico-train - INFO - ├── Loss: 5.7254 2025-08-30 17:36:03 - pico-train - INFO - ├── Learning Rate: 5.00e-06 2025-08-30 17:36:03 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 17:36:15 - pico-train - INFO - Step 97375 -- 🔄 Training Metrics 2025-08-30 17:36:15 - pico-train - INFO - ├── Loss: 5.6031 2025-08-30 17:36:15 - pico-train - INFO - ├── Learning Rate: 5.00e-06 2025-08-30 17:36:15 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 17:36:28 - pico-train - INFO - Step 97400 -- 🔄 Training Metrics 2025-08-30 17:36:28 - pico-train - INFO - ├── Loss: 5.7259 2025-08-30 17:36:28 - pico-train - INFO - ├── Learning Rate: 5.00e-06 2025-08-30 17:36:28 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 17:36:40 - pico-train - INFO - Step 97425 -- 🔄 Training Metrics 2025-08-30 17:36:40 - pico-train - INFO - ├── Loss: 5.7374 2025-08-30 17:36:40 - pico-train - INFO - ├── Learning Rate: 5.00e-06 2025-08-30 17:36:40 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 17:36:53 - pico-train - INFO - Step 97450 -- 🔄 Training Metrics 2025-08-30 17:36:53 - pico-train - INFO - ├── Loss: 5.7045 2025-08-30 17:36:53 - pico-train - INFO - ├── Learning Rate: 5.00e-06 2025-08-30 17:36:53 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 17:37:06 - pico-train - INFO - Step 97475 -- 🔄 Training Metrics 2025-08-30 17:37:06 - pico-train - INFO - ├── Loss: 5.8159 2025-08-30 17:37:06 - pico-train - INFO - ├── Learning Rate: 5.00e-06 2025-08-30 17:37:06 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 17:37:19 - pico-train - INFO - Step 97500 -- 💾 Saving Checkpoint 2025-08-30 17:39:23 - pico-train - INFO - Step 97500 -- 📊 Evaluation Results 2025-08-30 17:39:23 - pico-train - INFO - └── paloma: 5.531512718782419e+32 2025-08-30 17:39:26 - pico-train - INFO - Step 97500 -- 🔄 Training Metrics 2025-08-30 17:39:26 - pico-train - INFO - ├── Loss: 5.7937 2025-08-30 17:39:26 - pico-train - 
INFO - ├── Learning Rate: 5.00e-06 2025-08-30 17:39:26 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 17:39:26 - pico-train - INFO - Step 97500 -- 📈 Saving Learning Dynamics 2025-08-30 17:39:40 - pico-train - INFO - Step 97525 -- 🔄 Training Metrics 2025-08-30 17:39:40 - pico-train - INFO - ├── Loss: 5.7326 2025-08-30 17:39:40 - pico-train - INFO - ├── Learning Rate: 5.00e-06 2025-08-30 17:39:40 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 17:39:53 - pico-train - INFO - Step 97550 -- 🔄 Training Metrics 2025-08-30 17:39:53 - pico-train - INFO - ├── Loss: 5.7356 2025-08-30 17:39:53 - pico-train - INFO - ├── Learning Rate: 5.00e-06 2025-08-30 17:39:53 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 17:40:06 - pico-train - INFO - Step 97575 -- 🔄 Training Metrics 2025-08-30 17:40:06 - pico-train - INFO - ├── Loss: 5.7435 2025-08-30 17:40:06 - pico-train - INFO - ├── Learning Rate: 5.00e-06 2025-08-30 17:40:06 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 17:40:19 - pico-train - INFO - Step 97600 -- 🔄 Training Metrics 2025-08-30 17:40:19 - pico-train - INFO - ├── Loss: 5.8169 2025-08-30 17:40:19 - pico-train - INFO - ├── Learning Rate: 5.00e-06 2025-08-30 17:40:19 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 17:40:31 - pico-train - INFO - Step 97625 -- 🔄 Training Metrics 2025-08-30 17:40:31 - pico-train - INFO - ├── Loss: 5.7146 2025-08-30 17:40:31 - pico-train - INFO - ├── Learning Rate: 5.00e-06 2025-08-30 17:40:31 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 17:40:44 - pico-train - INFO - Step 97650 -- 🔄 Training Metrics 2025-08-30 17:40:44 - pico-train - INFO - ├── Loss: 5.7146 2025-08-30 17:40:44 - pico-train - INFO - ├── Learning Rate: 5.00e-06 2025-08-30 17:40:44 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 17:40:57 - pico-train - INFO - Step 97675 -- 🔄 Training Metrics 2025-08-30 17:40:57 - pico-train - INFO - ├── Loss: 5.7508 2025-08-30 17:40:57 - pico-train - INFO - ├── Learning Rate: 5.00e-06 2025-08-30 
17:40:57 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 17:41:09 - pico-train - INFO - Step 97700 -- 🔄 Training Metrics 2025-08-30 17:41:09 - pico-train - INFO - ├── Loss: 5.7223 2025-08-30 17:41:09 - pico-train - INFO - ├── Learning Rate: 5.00e-06 2025-08-30 17:41:09 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 17:41:22 - pico-train - INFO - Step 97725 -- 🔄 Training Metrics 2025-08-30 17:41:22 - pico-train - INFO - ├── Loss: 5.7193 2025-08-30 17:41:22 - pico-train - INFO - ├── Learning Rate: 5.00e-06 2025-08-30 17:41:22 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 17:41:34 - pico-train - INFO - Step 97750 -- 🔄 Training Metrics 2025-08-30 17:41:34 - pico-train - INFO - ├── Loss: 5.7559 2025-08-30 17:41:34 - pico-train - INFO - ├── Learning Rate: 5.00e-06 2025-08-30 17:41:34 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 17:41:47 - pico-train - INFO - Step 97775 -- 🔄 Training Metrics 2025-08-30 17:41:47 - pico-train - INFO - ├── Loss: 5.6605 2025-08-30 17:41:47 - pico-train - INFO - ├── Learning Rate: 5.00e-06 2025-08-30 17:41:47 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 17:41:59 - pico-train - INFO - Step 97800 -- 🔄 Training Metrics 2025-08-30 17:41:59 - pico-train - INFO - ├── Loss: 5.6877 2025-08-30 17:41:59 - pico-train - INFO - ├── Learning Rate: 5.00e-06 2025-08-30 17:41:59 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 17:42:12 - pico-train - INFO - Step 97825 -- 🔄 Training Metrics 2025-08-30 17:42:12 - pico-train - INFO - ├── Loss: 5.7657 2025-08-30 17:42:12 - pico-train - INFO - ├── Learning Rate: 5.00e-06 2025-08-30 17:42:12 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 17:42:24 - pico-train - INFO - Step 97850 -- 🔄 Training Metrics 2025-08-30 17:42:24 - pico-train - INFO - ├── Loss: 5.6716 2025-08-30 17:42:24 - pico-train - INFO - ├── Learning Rate: 5.00e-06 2025-08-30 17:42:24 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 17:42:37 - pico-train - INFO - Step 97875 -- 🔄 Training Metrics 
2025-08-30 17:42:37 - pico-train - INFO - ├── Loss: 5.7788 2025-08-30 17:42:37 - pico-train - INFO - ├── Learning Rate: 5.00e-06 2025-08-30 17:42:37 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 17:42:50 - pico-train - INFO - Step 97900 -- 🔄 Training Metrics 2025-08-30 17:42:50 - pico-train - INFO - ├── Loss: 5.6792 2025-08-30 17:42:50 - pico-train - INFO - ├── Learning Rate: 5.00e-06 2025-08-30 17:42:50 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 17:43:02 - pico-train - INFO - Step 97925 -- 🔄 Training Metrics 2025-08-30 17:43:02 - pico-train - INFO - ├── Loss: 5.7525 2025-08-30 17:43:02 - pico-train - INFO - ├── Learning Rate: 5.00e-06 2025-08-30 17:43:02 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 17:43:15 - pico-train - INFO - Step 97950 -- 🔄 Training Metrics 2025-08-30 17:43:15 - pico-train - INFO - ├── Loss: 5.5884 2025-08-30 17:43:15 - pico-train - INFO - ├── Learning Rate: 5.00e-06 2025-08-30 17:43:15 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 17:43:27 - pico-train - INFO - Step 97975 -- 🔄 Training Metrics 2025-08-30 17:43:27 - pico-train - INFO - ├── Loss: 5.7355 2025-08-30 17:43:27 - pico-train - INFO - ├── Learning Rate: 5.00e-06 2025-08-30 17:43:27 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 17:43:40 - pico-train - INFO - Step 98000 -- 💾 Saving Checkpoint 2025-08-30 17:45:33 - pico-train - INFO - Step 98000 -- 📊 Evaluation Results 2025-08-30 17:45:33 - pico-train - INFO - └── paloma: 5.246902427320844e+32 2025-08-30 17:45:35 - pico-train - INFO - Step 98000 -- 🔄 Training Metrics 2025-08-30 17:45:35 - pico-train - INFO - ├── Loss: 5.7403 2025-08-30 17:45:35 - pico-train - INFO - ├── Learning Rate: 5.00e-06 2025-08-30 17:45:35 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 17:45:35 - pico-train - INFO - Step 98000 -- 📈 Saving Learning Dynamics 2025-08-30 17:45:50 - pico-train - INFO - Step 98025 -- 🔄 Training Metrics 2025-08-30 17:45:50 - pico-train - INFO - ├── Loss: 5.7210 2025-08-30 17:45:50 - 
pico-train - INFO - ├── Learning Rate: 5.00e-06 2025-08-30 17:45:50 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 17:46:03 - pico-train - INFO - Step 98050 -- 🔄 Training Metrics 2025-08-30 17:46:03 - pico-train - INFO - ├── Loss: 5.6683 2025-08-30 17:46:03 - pico-train - INFO - ├── Learning Rate: 5.00e-06 2025-08-30 17:46:03 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 17:46:15 - pico-train - INFO - Step 98075 -- 🔄 Training Metrics 2025-08-30 17:46:15 - pico-train - INFO - ├── Loss: 5.7354 2025-08-30 17:46:15 - pico-train - INFO - ├── Learning Rate: 5.00e-06 2025-08-30 17:46:15 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 17:46:28 - pico-train - INFO - Step 98100 -- 🔄 Training Metrics 2025-08-30 17:46:28 - pico-train - INFO - ├── Loss: 5.7459 2025-08-30 17:46:28 - pico-train - INFO - ├── Learning Rate: 5.00e-06 2025-08-30 17:46:28 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 17:46:40 - pico-train - INFO - Step 98125 -- 🔄 Training Metrics 2025-08-30 17:46:40 - pico-train - INFO - ├── Loss: 5.7033 2025-08-30 17:46:40 - pico-train - INFO - ├── Learning Rate: 5.00e-06 2025-08-30 17:46:40 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 17:46:53 - pico-train - INFO - Step 98150 -- 🔄 Training Metrics 2025-08-30 17:46:53 - pico-train - INFO - ├── Loss: 5.7499 2025-08-30 17:46:53 - pico-train - INFO - ├── Learning Rate: 5.00e-06 2025-08-30 17:46:53 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 17:47:06 - pico-train - INFO - Step 98175 -- 🔄 Training Metrics 2025-08-30 17:47:06 - pico-train - INFO - ├── Loss: 5.7994 2025-08-30 17:47:06 - pico-train - INFO - ├── Learning Rate: 5.00e-06 2025-08-30 17:47:06 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 17:47:18 - pico-train - INFO - Step 98200 -- 🔄 Training Metrics 2025-08-30 17:47:18 - pico-train - INFO - ├── Loss: 5.6783 2025-08-30 17:47:18 - pico-train - INFO - ├── Learning Rate: 5.00e-06 2025-08-30 17:47:18 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 17:47:31 
- pico-train - INFO - Step 98225 -- 🔄 Training Metrics 2025-08-30 17:47:31 - pico-train - INFO - ├── Loss: 5.6711 2025-08-30 17:47:31 - pico-train - INFO - ├── Learning Rate: 5.00e-06 2025-08-30 17:47:31 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 17:47:44 - pico-train - INFO - Step 98250 -- 🔄 Training Metrics 2025-08-30 17:47:44 - pico-train - INFO - ├── Loss: 5.6655 2025-08-30 17:47:44 - pico-train - INFO - ├── Learning Rate: 5.00e-06 2025-08-30 17:47:44 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 17:47:56 - pico-train - INFO - Step 98275 -- 🔄 Training Metrics 2025-08-30 17:47:56 - pico-train - INFO - ├── Loss: 5.8255 2025-08-30 17:47:56 - pico-train - INFO - ├── Learning Rate: 5.00e-06 2025-08-30 17:47:56 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 17:48:09 - pico-train - INFO - Step 98300 -- 🔄 Training Metrics 2025-08-30 17:48:09 - pico-train - INFO - ├── Loss: 5.6334 2025-08-30 17:48:09 - pico-train - INFO - ├── Learning Rate: 5.00e-06 2025-08-30 17:48:09 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 17:48:21 - pico-train - INFO - Step 98325 -- 🔄 Training Metrics 2025-08-30 17:48:21 - pico-train - INFO - ├── Loss: 5.7332 2025-08-30 17:48:21 - pico-train - INFO - ├── Learning Rate: 5.00e-06 2025-08-30 17:48:21 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 17:48:34 - pico-train - INFO - Step 98350 -- 🔄 Training Metrics 2025-08-30 17:48:34 - pico-train - INFO - ├── Loss: 5.7630 2025-08-30 17:48:34 - pico-train - INFO - ├── Learning Rate: 5.00e-06 2025-08-30 17:48:34 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 17:48:47 - pico-train - INFO - Step 98375 -- 🔄 Training Metrics 2025-08-30 17:48:47 - pico-train - INFO - ├── Loss: 5.7364 2025-08-30 17:48:47 - pico-train - INFO - ├── Learning Rate: 5.00e-06 2025-08-30 17:48:47 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 17:48:59 - pico-train - INFO - Step 98400 -- 🔄 Training Metrics 2025-08-30 17:48:59 - pico-train - INFO - ├── Loss: 5.7012 2025-08-30 
17:48:59 - pico-train - INFO - ├── Learning Rate: 5.00e-06 2025-08-30 17:48:59 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 17:49:12 - pico-train - INFO - Step 98425 -- 🔄 Training Metrics 2025-08-30 17:49:12 - pico-train - INFO - ├── Loss: 5.7739 2025-08-30 17:49:12 - pico-train - INFO - ├── Learning Rate: 5.00e-06 2025-08-30 17:49:12 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 17:49:25 - pico-train - INFO - Step 98450 -- 🔄 Training Metrics 2025-08-30 17:49:25 - pico-train - INFO - ├── Loss: 5.7062 2025-08-30 17:49:25 - pico-train - INFO - ├── Learning Rate: 5.00e-06 2025-08-30 17:49:25 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 17:49:37 - pico-train - INFO - Step 98475 -- 🔄 Training Metrics 2025-08-30 17:49:37 - pico-train - INFO - ├── Loss: 5.7318 2025-08-30 17:49:37 - pico-train - INFO - ├── Learning Rate: 5.00e-06 2025-08-30 17:49:37 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 17:49:50 - pico-train - INFO - Step 98500 -- 💾 Saving Checkpoint 2025-08-30 17:51:43 - pico-train - INFO - Step 98500 -- 📊 Evaluation Results 2025-08-30 17:51:43 - pico-train - INFO - └── paloma: 6.508598529141235e+32 2025-08-30 17:51:45 - pico-train - INFO - Step 98500 -- 🔄 Training Metrics 2025-08-30 17:51:45 - pico-train - INFO - ├── Loss: 5.7632 2025-08-30 17:51:45 - pico-train - INFO - ├── Learning Rate: 5.00e-06 2025-08-30 17:51:45 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 17:51:45 - pico-train - INFO - Step 98500 -- 📈 Saving Learning Dynamics 2025-08-30 17:52:00 - pico-train - INFO - Step 98525 -- 🔄 Training Metrics 2025-08-30 17:52:00 - pico-train - INFO - ├── Loss: 5.7365 2025-08-30 17:52:00 - pico-train - INFO - ├── Learning Rate: 5.00e-06 2025-08-30 17:52:00 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 17:52:13 - pico-train - INFO - Step 98550 -- 🔄 Training Metrics 2025-08-30 17:52:13 - pico-train - INFO - ├── Loss: 5.7181 2025-08-30 17:52:13 - pico-train - INFO - ├── Learning Rate: 5.00e-06 2025-08-30 17:52:13 - 
pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 17:52:25 - pico-train - INFO - Step 98575 -- 🔄 Training Metrics 2025-08-30 17:52:25 - pico-train - INFO - ├── Loss: 5.7607 2025-08-30 17:52:25 - pico-train - INFO - ├── Learning Rate: 5.00e-06 2025-08-30 17:52:25 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 17:52:38 - pico-train - INFO - Step 98600 -- 🔄 Training Metrics 2025-08-30 17:52:38 - pico-train - INFO - ├── Loss: 5.8725 2025-08-30 17:52:38 - pico-train - INFO - ├── Learning Rate: 5.00e-06 2025-08-30 17:52:38 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 17:52:50 - pico-train - INFO - Step 98625 -- 🔄 Training Metrics 2025-08-30 17:52:50 - pico-train - INFO - ├── Loss: 5.6675 2025-08-30 17:52:50 - pico-train - INFO - ├── Learning Rate: 5.00e-06 2025-08-30 17:52:50 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 17:53:03 - pico-train - INFO - Step 98650 -- 🔄 Training Metrics 2025-08-30 17:53:03 - pico-train - INFO - ├── Loss: 5.7988 2025-08-30 17:53:03 - pico-train - INFO - ├── Learning Rate: 5.00e-06 2025-08-30 17:53:03 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 17:53:16 - pico-train - INFO - Step 98675 -- 🔄 Training Metrics 2025-08-30 17:53:16 - pico-train - INFO - ├── Loss: 5.6917 2025-08-30 17:53:16 - pico-train - INFO - ├── Learning Rate: 5.00e-06 2025-08-30 17:53:16 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 17:53:28 - pico-train - INFO - Step 98700 -- 🔄 Training Metrics 2025-08-30 17:53:28 - pico-train - INFO - ├── Loss: 5.6784 2025-08-30 17:53:28 - pico-train - INFO - ├── Learning Rate: 5.00e-06 2025-08-30 17:53:28 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 17:53:41 - pico-train - INFO - Step 98725 -- 🔄 Training Metrics 2025-08-30 17:53:41 - pico-train - INFO - ├── Loss: 5.6719 2025-08-30 17:53:41 - pico-train - INFO - ├── Learning Rate: 5.00e-06 2025-08-30 17:53:41 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 17:53:54 - pico-train - INFO - Step 98750 -- 🔄 Training Metrics 2025-08-30 
17:53:54 - pico-train - INFO - ├── Loss: 5.7690 2025-08-30 17:53:54 - pico-train - INFO - ├── Learning Rate: 5.00e-06 2025-08-30 17:53:54 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 17:54:06 - pico-train - INFO - Step 98775 -- 🔄 Training Metrics 2025-08-30 17:54:06 - pico-train - INFO - ├── Loss: 5.6928 2025-08-30 17:54:06 - pico-train - INFO - ├── Learning Rate: 5.00e-06 2025-08-30 17:54:06 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 17:54:19 - pico-train - INFO - Step 98800 -- 🔄 Training Metrics 2025-08-30 17:54:19 - pico-train - INFO - ├── Loss: 5.7491 2025-08-30 17:54:19 - pico-train - INFO - ├── Learning Rate: 5.00e-06 2025-08-30 17:54:19 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 17:54:31 - pico-train - INFO - Step 98825 -- 🔄 Training Metrics 2025-08-30 17:54:31 - pico-train - INFO - ├── Loss: 5.7532 2025-08-30 17:54:31 - pico-train - INFO - ├── Learning Rate: 5.00e-06 2025-08-30 17:54:31 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 17:54:44 - pico-train - INFO - Step 98850 -- 🔄 Training Metrics 2025-08-30 17:54:44 - pico-train - INFO - ├── Loss: 5.8232 2025-08-30 17:54:44 - pico-train - INFO - ├── Learning Rate: 5.00e-06 2025-08-30 17:54:44 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 17:54:57 - pico-train - INFO - Step 98875 -- 🔄 Training Metrics 2025-08-30 17:54:57 - pico-train - INFO - ├── Loss: 5.7709 2025-08-30 17:54:57 - pico-train - INFO - ├── Learning Rate: 5.00e-06 2025-08-30 17:54:57 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 17:55:09 - pico-train - INFO - Step 98900 -- 🔄 Training Metrics 2025-08-30 17:55:09 - pico-train - INFO - ├── Loss: 5.7482 2025-08-30 17:55:09 - pico-train - INFO - ├── Learning Rate: 5.00e-06 2025-08-30 17:55:09 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 17:55:22 - pico-train - INFO - Step 98925 -- 🔄 Training Metrics 2025-08-30 17:55:22 - pico-train - INFO - ├── Loss: 5.6830 2025-08-30 17:55:22 - pico-train - INFO - ├── Learning Rate: 5.00e-06 2025-08-30 
17:55:22 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 17:55:34 - pico-train - INFO - Step 98950 -- 🔄 Training Metrics 2025-08-30 17:55:34 - pico-train - INFO - ├── Loss: 5.7586 2025-08-30 17:55:34 - pico-train - INFO - ├── Learning Rate: 5.00e-06 2025-08-30 17:55:34 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 17:55:47 - pico-train - INFO - Step 98975 -- 🔄 Training Metrics 2025-08-30 17:55:47 - pico-train - INFO - ├── Loss: 5.7720 2025-08-30 17:55:47 - pico-train - INFO - ├── Learning Rate: 5.00e-06 2025-08-30 17:55:47 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 17:56:00 - pico-train - INFO - Step 99000 -- 💾 Saving Checkpoint 2025-08-30 17:57:53 - pico-train - INFO - Step 99000 -- 📊 Evaluation Results 2025-08-30 17:57:53 - pico-train - INFO - └── paloma: 5.8870480611363886e+32 2025-08-30 17:57:56 - pico-train - INFO - Step 99000 -- 🔄 Training Metrics 2025-08-30 17:57:56 - pico-train - INFO - ├── Loss: 5.8245 2025-08-30 17:57:56 - pico-train - INFO - ├── Learning Rate: 5.00e-06 2025-08-30 17:57:56 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 17:57:56 - pico-train - INFO - Step 99000 -- 📈 Saving Learning Dynamics 2025-08-30 17:58:11 - pico-train - INFO - Step 99025 -- 🔄 Training Metrics 2025-08-30 17:58:11 - pico-train - INFO - ├── Loss: 5.7741 2025-08-30 17:58:11 - pico-train - INFO - ├── Learning Rate: 5.00e-06 2025-08-30 17:58:11 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 17:58:24 - pico-train - INFO - Step 99050 -- 🔄 Training Metrics 2025-08-30 17:58:24 - pico-train - INFO - ├── Loss: 5.7613 2025-08-30 17:58:24 - pico-train - INFO - ├── Learning Rate: 5.00e-06 2025-08-30 17:58:24 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 17:58:37 - pico-train - INFO - Step 99075 -- 🔄 Training Metrics 2025-08-30 17:58:37 - pico-train - INFO - ├── Loss: 5.7590 2025-08-30 17:58:37 - pico-train - INFO - ├── Learning Rate: 5.00e-06 2025-08-30 17:58:37 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 17:58:49 - pico-train 
- INFO - Step 99100 -- 🔄 Training Metrics 2025-08-30 17:58:49 - pico-train - INFO - ├── Loss: 5.7106 2025-08-30 17:58:49 - pico-train - INFO - ├── Learning Rate: 5.00e-06 2025-08-30 17:58:49 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 17:59:02 - pico-train - INFO - Step 99125 -- 🔄 Training Metrics 2025-08-30 17:59:02 - pico-train - INFO - ├── Loss: 5.7122 2025-08-30 17:59:02 - pico-train - INFO - ├── Learning Rate: 5.00e-06 2025-08-30 17:59:02 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 17:59:15 - pico-train - INFO - Step 99150 -- 🔄 Training Metrics 2025-08-30 17:59:15 - pico-train - INFO - ├── Loss: 5.7691 2025-08-30 17:59:15 - pico-train - INFO - ├── Learning Rate: 5.00e-06 2025-08-30 17:59:15 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 17:59:27 - pico-train - INFO - Step 99175 -- 🔄 Training Metrics 2025-08-30 17:59:27 - pico-train - INFO - ├── Loss: 5.7368 2025-08-30 17:59:27 - pico-train - INFO - ├── Learning Rate: 5.00e-06 2025-08-30 17:59:27 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 17:59:40 - pico-train - INFO - Step 99200 -- 🔄 Training Metrics 2025-08-30 17:59:40 - pico-train - INFO - ├── Loss: 5.7177 2025-08-30 17:59:40 - pico-train - INFO - ├── Learning Rate: 5.00e-06 2025-08-30 17:59:40 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 17:59:52 - pico-train - INFO - Step 99225 -- 🔄 Training Metrics 2025-08-30 17:59:52 - pico-train - INFO - ├── Loss: 5.7907 2025-08-30 17:59:52 - pico-train - INFO - ├── Learning Rate: 5.00e-06 2025-08-30 17:59:52 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 18:00:05 - pico-train - INFO - Step 99250 -- 🔄 Training Metrics 2025-08-30 18:00:05 - pico-train - INFO - ├── Loss: 5.7482 2025-08-30 18:00:05 - pico-train - INFO - ├── Learning Rate: 5.00e-06 2025-08-30 18:00:05 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 18:00:18 - pico-train - INFO - Step 99275 -- 🔄 Training Metrics 2025-08-30 18:00:18 - pico-train - INFO - ├── Loss: 5.8113 2025-08-30 18:00:18 - 
pico-train - INFO - ├── Learning Rate: 5.00e-06 2025-08-30 18:00:18 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 18:00:30 - pico-train - INFO - Step 99300 -- 🔄 Training Metrics 2025-08-30 18:00:30 - pico-train - INFO - ├── Loss: 5.7323 2025-08-30 18:00:30 - pico-train - INFO - ├── Learning Rate: 5.00e-06 2025-08-30 18:00:30 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 18:00:43 - pico-train - INFO - Step 99325 -- 🔄 Training Metrics 2025-08-30 18:00:43 - pico-train - INFO - ├── Loss: 5.8241 2025-08-30 18:00:43 - pico-train - INFO - ├── Learning Rate: 5.00e-06 2025-08-30 18:00:43 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 18:00:55 - pico-train - INFO - Step 99350 -- 🔄 Training Metrics 2025-08-30 18:00:55 - pico-train - INFO - ├── Loss: 5.7065 2025-08-30 18:00:55 - pico-train - INFO - ├── Learning Rate: 5.00e-06 2025-08-30 18:00:55 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 18:01:08 - pico-train - INFO - Step 99375 -- 🔄 Training Metrics 2025-08-30 18:01:08 - pico-train - INFO - ├── Loss: 5.7672 2025-08-30 18:01:08 - pico-train - INFO - ├── Learning Rate: 5.00e-06 2025-08-30 18:01:08 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 18:01:21 - pico-train - INFO - Step 99400 -- 🔄 Training Metrics 2025-08-30 18:01:21 - pico-train - INFO - ├── Loss: 5.6693 2025-08-30 18:01:21 - pico-train - INFO - ├── Learning Rate: 5.00e-06 2025-08-30 18:01:21 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 18:01:33 - pico-train - INFO - Step 99425 -- 🔄 Training Metrics 2025-08-30 18:01:33 - pico-train - INFO - ├── Loss: 5.7310 2025-08-30 18:01:33 - pico-train - INFO - ├── Learning Rate: 5.00e-06 2025-08-30 18:01:33 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 18:01:46 - pico-train - INFO - Step 99450 -- 🔄 Training Metrics 2025-08-30 18:01:46 - pico-train - INFO - ├── Loss: 5.7669 2025-08-30 18:01:46 - pico-train - INFO - ├── Learning Rate: 5.00e-06 2025-08-30 18:01:46 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 18:01:58 
- pico-train - INFO - Step 99475 -- 🔄 Training Metrics 2025-08-30 18:01:58 - pico-train - INFO - ├── Loss: 5.7305 2025-08-30 18:01:58 - pico-train - INFO - ├── Learning Rate: 5.00e-06 2025-08-30 18:01:58 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 18:02:11 - pico-train - INFO - Step 99500 -- 💾 Saving Checkpoint 2025-08-30 18:04:16 - pico-train - INFO - Step 99500 -- 📊 Evaluation Results 2025-08-30 18:04:16 - pico-train - INFO - └── paloma: 6.068047428948403e+32 2025-08-30 18:04:18 - pico-train - INFO - Step 99500 -- 🔄 Training Metrics 2025-08-30 18:04:18 - pico-train - INFO - ├── Loss: 5.6898 2025-08-30 18:04:18 - pico-train - INFO - ├── Learning Rate: 5.00e-06 2025-08-30 18:04:18 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 18:04:18 - pico-train - INFO - Step 99500 -- 📈 Saving Learning Dynamics 2025-08-30 18:04:33 - pico-train - INFO - Step 99525 -- 🔄 Training Metrics 2025-08-30 18:04:33 - pico-train - INFO - ├── Loss: 5.6852 2025-08-30 18:04:33 - pico-train - INFO - ├── Learning Rate: 5.00e-06 2025-08-30 18:04:33 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 18:04:45 - pico-train - INFO - Step 99550 -- 🔄 Training Metrics 2025-08-30 18:04:45 - pico-train - INFO - ├── Loss: 5.6740 2025-08-30 18:04:45 - pico-train - INFO - ├── Learning Rate: 5.00e-06 2025-08-30 18:04:45 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 18:04:58 - pico-train - INFO - Step 99575 -- 🔄 Training Metrics 2025-08-30 18:04:58 - pico-train - INFO - ├── Loss: 5.6740 2025-08-30 18:04:58 - pico-train - INFO - ├── Learning Rate: 5.00e-06 2025-08-30 18:04:58 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 18:05:10 - pico-train - INFO - Step 99600 -- 🔄 Training Metrics 2025-08-30 18:05:10 - pico-train - INFO - ├── Loss: 5.7617 2025-08-30 18:05:10 - pico-train - INFO - ├── Learning Rate: 5.00e-06 2025-08-30 18:05:10 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 18:05:23 - pico-train - INFO - Step 99625 -- 🔄 Training Metrics 2025-08-30 18:05:23 - 
pico-train - INFO - ├── Loss: 5.7173 2025-08-30 18:05:23 - pico-train - INFO - ├── Learning Rate: 5.00e-06 2025-08-30 18:05:23 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 18:05:36 - pico-train - INFO - Step 99650 -- 🔄 Training Metrics 2025-08-30 18:05:36 - pico-train - INFO - ├── Loss: 5.7002 2025-08-30 18:05:36 - pico-train - INFO - ├── Learning Rate: 5.00e-06 2025-08-30 18:05:36 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 18:05:48 - pico-train - INFO - Step 99675 -- 🔄 Training Metrics 2025-08-30 18:05:48 - pico-train - INFO - ├── Loss: 5.7901 2025-08-30 18:05:48 - pico-train - INFO - ├── Learning Rate: 5.00e-06 2025-08-30 18:05:48 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 18:06:01 - pico-train - INFO - Step 99700 -- 🔄 Training Metrics 2025-08-30 18:06:01 - pico-train - INFO - ├── Loss: 5.6642 2025-08-30 18:06:01 - pico-train - INFO - ├── Learning Rate: 5.00e-06 2025-08-30 18:06:01 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 18:06:13 - pico-train - INFO - Step 99725 -- 🔄 Training Metrics 2025-08-30 18:06:13 - pico-train - INFO - ├── Loss: 5.7029 2025-08-30 18:06:13 - pico-train - INFO - ├── Learning Rate: 5.00e-06 2025-08-30 18:06:13 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 18:06:26 - pico-train - INFO - Step 99750 -- 🔄 Training Metrics 2025-08-30 18:06:26 - pico-train - INFO - ├── Loss: 5.7937 2025-08-30 18:06:26 - pico-train - INFO - ├── Learning Rate: 5.00e-06 2025-08-30 18:06:26 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 18:06:39 - pico-train - INFO - Step 99775 -- 🔄 Training Metrics 2025-08-30 18:06:39 - pico-train - INFO - ├── Loss: 5.6641 2025-08-30 18:06:39 - pico-train - INFO - ├── Learning Rate: 5.00e-06 2025-08-30 18:06:39 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 18:06:51 - pico-train - INFO - Step 99800 -- 🔄 Training Metrics 2025-08-30 18:06:51 - pico-train - INFO - ├── Loss: 5.6999 2025-08-30 18:06:51 - pico-train - INFO - ├── Learning Rate: 5.00e-06 2025-08-30 18:06:51 - 
pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 18:07:04 - pico-train - INFO - Step 99825 -- 🔄 Training Metrics 2025-08-30 18:07:04 - pico-train - INFO - ├── Loss: 5.6917 2025-08-30 18:07:04 - pico-train - INFO - ├── Learning Rate: 5.00e-06 2025-08-30 18:07:04 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 18:07:17 - pico-train - INFO - Step 99850 -- 🔄 Training Metrics 2025-08-30 18:07:17 - pico-train - INFO - ├── Loss: 5.6787 2025-08-30 18:07:17 - pico-train - INFO - ├── Learning Rate: 5.00e-06 2025-08-30 18:07:17 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 18:07:29 - pico-train - INFO - Step 99875 -- 🔄 Training Metrics 2025-08-30 18:07:29 - pico-train - INFO - ├── Loss: 5.7142 2025-08-30 18:07:29 - pico-train - INFO - ├── Learning Rate: 5.00e-06 2025-08-30 18:07:29 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 18:07:42 - pico-train - INFO - Step 99900 -- 🔄 Training Metrics 2025-08-30 18:07:42 - pico-train - INFO - ├── Loss: 5.7533 2025-08-30 18:07:42 - pico-train - INFO - ├── Learning Rate: 5.00e-06 2025-08-30 18:07:42 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 18:07:54 - pico-train - INFO - Step 99925 -- 🔄 Training Metrics 2025-08-30 18:07:54 - pico-train - INFO - ├── Loss: 5.7335 2025-08-30 18:07:54 - pico-train - INFO - ├── Learning Rate: 5.00e-06 2025-08-30 18:07:54 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 18:08:07 - pico-train - INFO - Step 99950 -- 🔄 Training Metrics 2025-08-30 18:08:07 - pico-train - INFO - ├── Loss: 5.7257 2025-08-30 18:08:07 - pico-train - INFO - ├── Learning Rate: 5.00e-06 2025-08-30 18:08:07 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 18:08:20 - pico-train - INFO - Step 99975 -- 🔄 Training Metrics 2025-08-30 18:08:20 - pico-train - INFO - ├── Loss: 5.6977 2025-08-30 18:08:20 - pico-train - INFO - ├── Learning Rate: 5.00e-06 2025-08-30 18:08:20 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 18:08:32 - pico-train - INFO - Step 100000 -- 💾 Saving Checkpoint 2025-08-30 
18:10:27 - pico-train - INFO - Step 100000 -- 📊 Evaluation Results 2025-08-30 18:10:27 - pico-train - INFO - └── paloma: 6.402927231583771e+32 2025-08-30 18:10:28 - pico-train - INFO - 🎉 Training complete! Final step: 100000