2025-08-30 18:43:39 - pico-train - INFO - Step 78000 -- 📊 Evaluation Results
2025-08-30 18:43:39 - pico-train - INFO - └── paloma: inf
2025-08-30 18:43:39 - pico-train - INFO - ==================================================
2025-08-30 18:43:39 - pico-train - INFO - ✨ Training Configuration
2025-08-30 18:43:39 - pico-train - INFO - ==================================================
2025-08-30 18:43:39 - pico-train - INFO - ╭─────────────────────────────────────────────────────╮
2025-08-30 18:43:39 - pico-train - INFO - │ checkpointing:                                      │
2025-08-30 18:43:39 - pico-train - INFO - │   checkpoints_dir: checkpoints                      │
2025-08-30 18:43:39 - pico-train - INFO - │   evaluation:                                       │
2025-08-30 18:43:39 - pico-train - INFO - │     eval_results_dir: eval_results                  │
2025-08-30 18:43:39 - pico-train - INFO - │   fabric_checkpoint_dir: fabric_state               │
2025-08-30 18:43:39 - pico-train - INFO - │   fabric_checkpoint_filename: checkpoint.pt         │
2025-08-30 18:43:39 - pico-train - INFO - │   hf_checkpoint:                                    │
2025-08-30 18:43:39 - pico-train - INFO - │     collection_slug: null                           │
2025-08-30 18:43:39 - pico-train - INFO - │     repo_id: ThomasTheMaker/pico-decoder-tiny       │
2025-08-30 18:43:39 - pico-train - INFO - │   learning_dynamics:                                │
2025-08-30 18:43:39 - pico-train - INFO - │     batch_size: 1                                   │
2025-08-30 18:43:39 - pico-train - INFO - │     eval_data: null                                 │
2025-08-30 18:43:39 - pico-train - INFO - │     layer_suffixes:                                 │
2025-08-30 18:43:39 - pico-train - INFO - │     - attention.v_proj                              │
2025-08-30 18:43:39 - pico-train - INFO - │     - attention.o_proj                              │
2025-08-30 18:43:39 - pico-train - INFO - │     - swiglu.w_2                                    │
2025-08-30 18:43:39 - pico-train - INFO - │     sequence_idx: -1                                │
2025-08-30 18:43:39 - pico-train - INFO - │   learning_dynamics_dir: learning_dynamics          │
2025-08-30 18:43:39 - pico-train - INFO - │   logs_dir: logs                                    │
2025-08-30 18:43:39 - pico-train - INFO - │   run_name: pico-decoder-tiny-dolma10M-v1           │
2025-08-30 18:43:39 - pico-train - INFO - │   runs_dir: runs                                    │
2025-08-30 18:43:39 - pico-train - INFO - │   save_every_n_steps: 2000                          │
2025-08-30 18:43:39 - pico-train - INFO - │   save_to_hf: true                                  │
2025-08-30 18:43:39 - pico-train - INFO - │   training:                                         │
2025-08-30 18:43:39 - pico-train - INFO - │     auto_resume: true                               │
2025-08-30 18:43:39 - pico-train - INFO - │ data:                                               │
2025-08-30 18:43:39 - pico-train - INFO - │   dataloader:                                       │
2025-08-30 18:43:39 - pico-train - INFO - │     batch_size: 16                                  │
2025-08-30 18:43:39 - pico-train - INFO - │   dataset:                                          │
2025-08-30 18:43:39 - pico-train - INFO - │     name: ThomasTheMaker/pretokenized-dolma-10M     │
2025-08-30 18:43:39 - pico-train - INFO - │   tokenizer:                                        │
2025-08-30 18:43:39 - pico-train - INFO - │     name: allenai/OLMo-7B-0724-hf                   │
2025-08-30 18:43:39 - pico-train - INFO - │     vocab_size: 50304                               │
2025-08-30 18:43:39 - pico-train - INFO - │ evaluation:                                         │
2025-08-30 18:43:39 - pico-train - INFO - │   metrics:                                          │
2025-08-30 18:43:39 - pico-train - INFO - │   - paloma                                          │
2025-08-30 18:43:39 - pico-train - INFO - │   paloma:                                           │
2025-08-30 18:43:39 - pico-train - INFO - │     batch_size: 1                                   │
2025-08-30 18:43:39 - pico-train - INFO - │     dataset_name: pico-lm/pretokenized-paloma-tinsy │
2025-08-30 18:43:39 - pico-train - INFO - │     dataset_split: val                              │
2025-08-30 18:43:39 - pico-train - INFO - │     max_length: 2048                                │
2025-08-30 18:43:39 - pico-train - INFO - │ model:                                              │
2025-08-30 18:43:39 - pico-train - INFO - │   activation_hidden_dim: 384                        │
2025-08-30 18:43:39 - pico-train - INFO - │   attention_n_heads: 12                             │
2025-08-30 18:43:39 - pico-train - INFO - │   attention_n_kv_heads: 4                           │
2025-08-30 18:43:39 - pico-train - INFO - │   batch_size: 1024                                  │
2025-08-30 18:43:39 - pico-train - INFO - │   d_model: 96                                       │
2025-08-30 18:43:39 - pico-train - INFO - │   max_seq_len: 2048                                 │
2025-08-30 18:43:39 - pico-train - INFO - │   model_type: pico_decoder                          │
2025-08-30 18:43:39 - pico-train - INFO - │   n_layers: 12                                      │
2025-08-30 18:43:39 - pico-train - INFO - │   norm_eps: 1.0e-06                                 │
2025-08-30 18:43:39 - pico-train - INFO - │   position_emb_theta: 10000.0                       │
2025-08-30 18:43:39 - pico-train - INFO - │   vocab_size: 50304                                 │
2025-08-30 18:43:39 - pico-train - INFO - │ monitoring:                                         │
2025-08-30 18:43:39 - pico-train - INFO - │   logging:                                          │
2025-08-30 18:43:39 - pico-train - INFO - │     log_every_n_steps: 100                          │
2025-08-30 18:43:39 - pico-train - INFO - │     log_level: INFO                                 │
2025-08-30 18:43:39 - pico-train - INFO - │   save_to_wandb: false                              │
2025-08-30 18:43:39 - pico-train - INFO - │   wandb:                                            │
2025-08-30 18:43:39 - pico-train - INFO - │     entity: boymyc                                  │
2025-08-30 18:43:39 - pico-train - INFO - │     project: pico-decoder-tiny                      │
2025-08-30 18:43:39 - pico-train - INFO - │ training:                                           │
2025-08-30 18:43:39 - pico-train - INFO - │   fabric:                                           │
2025-08-30 18:43:39 - pico-train - INFO - │     accelerator: cuda                               │
2025-08-30 18:43:39 - pico-train - INFO - │     num_devices: 1                                  │
2025-08-30 18:43:39 - pico-train - INFO - │     num_nodes: 1                                    │
2025-08-30 18:43:39 - pico-train - INFO - │     precision: bf16-mixed                           │
2025-08-30 18:43:39 - pico-train - INFO - │   max_steps: 100000                                 │
2025-08-30 18:43:39 - pico-train - INFO - │   optimization:                                     │
2025-08-30 18:43:39 - pico-train - INFO - │     gradient_accumulation_steps: 1                  │
2025-08-30 18:43:39 - pico-train - INFO - │     lr: 0.0002                                      │
2025-08-30 18:43:39 - pico-train - INFO - │     lr_scheduler: cosine                            │
2025-08-30 18:43:39 - pico-train - INFO - │     lr_warmup_steps: 2000                           │
2025-08-30 18:43:39 - pico-train - INFO - │     optimizer: adamw                                │
2025-08-30 18:43:39 - pico-train - INFO - │                                                     │
2025-08-30 18:43:39 - pico-train - INFO - ╰─────────────────────────────────────────────────────╯
2025-08-30 18:43:39 - pico-train - INFO - ==================================================
2025-08-30 18:43:39 - pico-train - INFO - ⛭ Runtime Summary:
2025-08-30 18:43:39 - pico-train - INFO - ==================================================
2025-08-30 18:43:39 - pico-train - INFO - Starting from step: 78000
2025-08-30 18:43:39 - pico-train - INFO - Model Setup:
2025-08-30 18:43:39 - pico-train - INFO - └─ Total Parameters: 11,282,784
2025-08-30 18:43:39 - pico-train - INFO - └─ Trainable Parameters: 11,282,784
2025-08-30 18:43:39 - pico-train - INFO - Distributed Setup: 2025-08-30 18:43:39 - pico-train - INFO - └─ Number of Devices: 1 2025-08-30 18:43:39 - pico-train - INFO - └─ Device Type: NVIDIA H100 80GB HBM3 2025-08-30 18:43:39 - pico-train - INFO - └─ Available Memory: 85.03 GB 2025-08-30 18:43:39 - pico-train - INFO - Software Setup: 2025-08-30 18:43:39 - pico-train - INFO - └─ Python Version: 3.12.3 2025-08-30 18:43:39 - pico-train - INFO - └─ PyTorch Version: 2.8.0+cu128 2025-08-30 18:43:39 - pico-train - INFO - └─ CUDA Version: 12.8 2025-08-30 18:43:39 - pico-train - INFO - └─ Operating System: Linux 6.8.0-71-generic 2025-08-30 18:43:39 - pico-train - INFO - Batch Size Configuration: 2025-08-30 18:43:39 - pico-train - INFO - └─ Global Batch Size: 16 2025-08-30 18:43:39 - pico-train - INFO - └─ Per Device Batch Size: 16 2025-08-30 18:43:39 - pico-train - INFO - └─ Gradient Accumulation Steps: 1 2025-08-30 18:43:39 - pico-train - INFO - ================================================== 2025-08-30 18:43:40 - pico-train - INFO - Step 78000 -- 🔄 Training Metrics 2025-08-30 18:43:40 - pico-train - INFO - ├── Loss: 4.5461 2025-08-30 18:43:40 - pico-train - INFO - ├── Learning Rate: 2.39e-05 2025-08-30 18:43:40 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 18:43:40 - pico-train - INFO - Step 78000 -- 📈 Saving Learning Dynamics 2025-08-30 18:44:34 - pico-train - INFO - Step 78100 -- 🔄 Training Metrics 2025-08-30 18:44:34 - pico-train - INFO - ├── Loss: 4.7732 2025-08-30 18:44:34 - pico-train - INFO - ├── Learning Rate: 2.36e-05 2025-08-30 18:44:34 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 18:45:26 - pico-train - INFO - Step 78200 -- 🔄 Training Metrics 2025-08-30 18:45:26 - pico-train - INFO - ├── Loss: 4.7809 2025-08-30 18:45:26 - pico-train - INFO - ├── Learning Rate: 2.34e-05 2025-08-30 18:45:26 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 18:46:18 - pico-train - INFO - Step 78300 -- 🔄 Training Metrics 2025-08-30 18:46:18 - pico-train - 
INFO - ├── Loss: 4.7659 2025-08-30 18:46:18 - pico-train - INFO - ├── Learning Rate: 2.32e-05 2025-08-30 18:46:18 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 18:47:16 - pico-train - INFO - Step 78400 -- 🔄 Training Metrics 2025-08-30 18:47:16 - pico-train - INFO - ├── Loss: 4.7466 2025-08-30 18:47:16 - pico-train - INFO - ├── Learning Rate: 2.30e-05 2025-08-30 18:47:16 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 18:48:27 - pico-train - INFO - Step 78500 -- 🔄 Training Metrics 2025-08-30 18:48:27 - pico-train - INFO - ├── Loss: 4.8076 2025-08-30 18:48:27 - pico-train - INFO - ├── Learning Rate: 2.28e-05 2025-08-30 18:48:27 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 18:49:39 - pico-train - INFO - Step 78600 -- 🔄 Training Metrics 2025-08-30 18:49:39 - pico-train - INFO - ├── Loss: 4.7884 2025-08-30 18:49:39 - pico-train - INFO - ├── Learning Rate: 2.26e-05 2025-08-30 18:49:39 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 18:50:50 - pico-train - INFO - Step 78700 -- 🔄 Training Metrics 2025-08-30 18:50:50 - pico-train - INFO - ├── Loss: 4.7882 2025-08-30 18:50:50 - pico-train - INFO - ├── Learning Rate: 2.24e-05 2025-08-30 18:50:50 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 18:51:55 - pico-train - INFO - Step 78800 -- 🔄 Training Metrics 2025-08-30 18:51:55 - pico-train - INFO - ├── Loss: 4.7942 2025-08-30 18:51:55 - pico-train - INFO - ├── Learning Rate: 2.22e-05 2025-08-30 18:51:55 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 18:52:48 - pico-train - INFO - Step 78900 -- 🔄 Training Metrics 2025-08-30 18:52:48 - pico-train - INFO - ├── Loss: 4.7966 2025-08-30 18:52:48 - pico-train - INFO - ├── Learning Rate: 2.20e-05 2025-08-30 18:52:48 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 18:53:41 - pico-train - INFO - Step 79000 -- 🔄 Training Metrics 2025-08-30 18:53:41 - pico-train - INFO - ├── Loss: 4.7800 2025-08-30 18:53:41 - pico-train - INFO - ├── Learning Rate: 2.18e-05 2025-08-30 18:53:41 - pico-train - 
INFO - └── Inf/NaN count: 0 2025-08-30 18:54:34 - pico-train - INFO - Step 79100 -- 🔄 Training Metrics 2025-08-30 18:54:34 - pico-train - INFO - ├── Loss: 4.7808 2025-08-30 18:54:34 - pico-train - INFO - ├── Learning Rate: 2.16e-05 2025-08-30 18:54:34 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 18:55:27 - pico-train - INFO - Step 79200 -- 🔄 Training Metrics 2025-08-30 18:55:27 - pico-train - INFO - ├── Loss: 4.7704 2025-08-30 18:55:27 - pico-train - INFO - ├── Learning Rate: 2.14e-05 2025-08-30 18:55:27 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 18:56:20 - pico-train - INFO - Step 79300 -- 🔄 Training Metrics 2025-08-30 18:56:20 - pico-train - INFO - ├── Loss: 4.7921 2025-08-30 18:56:20 - pico-train - INFO - ├── Learning Rate: 2.12e-05 2025-08-30 18:56:20 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 18:57:12 - pico-train - INFO - Step 79400 -- 🔄 Training Metrics 2025-08-30 18:57:12 - pico-train - INFO - ├── Loss: 4.7701 2025-08-30 18:57:12 - pico-train - INFO - ├── Learning Rate: 2.10e-05 2025-08-30 18:57:12 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 18:58:05 - pico-train - INFO - Step 79500 -- 🔄 Training Metrics 2025-08-30 18:58:05 - pico-train - INFO - ├── Loss: 4.7990 2025-08-30 18:58:05 - pico-train - INFO - ├── Learning Rate: 2.08e-05 2025-08-30 18:58:05 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 18:58:58 - pico-train - INFO - Step 79600 -- 🔄 Training Metrics 2025-08-30 18:58:58 - pico-train - INFO - ├── Loss: 4.7864 2025-08-30 18:58:58 - pico-train - INFO - ├── Learning Rate: 2.06e-05 2025-08-30 18:58:58 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 18:59:51 - pico-train - INFO - Step 79700 -- 🔄 Training Metrics 2025-08-30 18:59:51 - pico-train - INFO - ├── Loss: 4.7747 2025-08-30 18:59:51 - pico-train - INFO - ├── Learning Rate: 2.04e-05 2025-08-30 18:59:51 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 19:00:44 - pico-train - INFO - Step 79800 -- 🔄 Training Metrics 2025-08-30 19:00:44 - 
pico-train - INFO - ├── Loss: 4.7703 2025-08-30 19:00:44 - pico-train - INFO - ├── Learning Rate: 2.02e-05 2025-08-30 19:00:44 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 19:01:37 - pico-train - INFO - Step 79900 -- 🔄 Training Metrics 2025-08-30 19:01:37 - pico-train - INFO - ├── Loss: 4.7738 2025-08-30 19:01:37 - pico-train - INFO - ├── Learning Rate: 2.01e-05 2025-08-30 19:01:37 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 19:02:29 - pico-train - INFO - Step 80000 -- 💾 Saving Checkpoint 2025-08-30 19:04:30 - pico-train - INFO - Step 80000 -- 📊 Evaluation Results 2025-08-30 19:04:30 - pico-train - INFO - └── paloma: inf 2025-08-30 19:04:31 - pico-train - INFO - Step 80000 -- 🔄 Training Metrics 2025-08-30 19:04:31 - pico-train - INFO - ├── Loss: 4.7781 2025-08-30 19:04:31 - pico-train - INFO - ├── Learning Rate: 2.00e-05 2025-08-30 19:04:31 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 19:04:31 - pico-train - INFO - Step 80000 -- 📈 Saving Learning Dynamics 2025-08-30 19:05:25 - pico-train - INFO - Step 80100 -- 🔄 Training Metrics 2025-08-30 19:05:25 - pico-train - INFO - ├── Loss: 4.8125 2025-08-30 19:05:25 - pico-train - INFO - ├── Learning Rate: 2.00e-05 2025-08-30 19:05:25 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 19:06:17 - pico-train - INFO - Step 80200 -- 🔄 Training Metrics 2025-08-30 19:06:17 - pico-train - INFO - ├── Loss: 4.7764 2025-08-30 19:06:17 - pico-train - INFO - ├── Learning Rate: 2.00e-05 2025-08-30 19:06:17 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 19:07:09 - pico-train - INFO - Step 80300 -- 🔄 Training Metrics 2025-08-30 19:07:09 - pico-train - INFO - ├── Loss: 4.7498 2025-08-30 19:07:09 - pico-train - INFO - ├── Learning Rate: 2.00e-05 2025-08-30 19:07:09 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 19:08:02 - pico-train - INFO - Step 80400 -- 🔄 Training Metrics 2025-08-30 19:08:02 - pico-train - INFO - ├── Loss: 4.7809 2025-08-30 19:08:02 - pico-train - INFO - ├── Learning Rate: 
2.00e-05 2025-08-30 19:08:02 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 19:08:55 - pico-train - INFO - Step 80500 -- 🔄 Training Metrics 2025-08-30 19:08:55 - pico-train - INFO - ├── Loss: 4.7766 2025-08-30 19:08:55 - pico-train - INFO - ├── Learning Rate: 2.00e-05 2025-08-30 19:08:55 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 19:09:48 - pico-train - INFO - Step 80600 -- 🔄 Training Metrics 2025-08-30 19:09:48 - pico-train - INFO - ├── Loss: 4.7933 2025-08-30 19:09:48 - pico-train - INFO - ├── Learning Rate: 2.00e-05 2025-08-30 19:09:48 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 19:10:40 - pico-train - INFO - Step 80700 -- 🔄 Training Metrics 2025-08-30 19:10:40 - pico-train - INFO - ├── Loss: 4.7826 2025-08-30 19:10:40 - pico-train - INFO - ├── Learning Rate: 2.00e-05 2025-08-30 19:10:40 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 19:11:32 - pico-train - INFO - Step 80800 -- 🔄 Training Metrics 2025-08-30 19:11:32 - pico-train - INFO - ├── Loss: 4.7968 2025-08-30 19:11:32 - pico-train - INFO - ├── Learning Rate: 2.00e-05 2025-08-30 19:11:32 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 19:12:24 - pico-train - INFO - Step 80900 -- 🔄 Training Metrics 2025-08-30 19:12:24 - pico-train - INFO - ├── Loss: 4.8019 2025-08-30 19:12:24 - pico-train - INFO - ├── Learning Rate: 2.00e-05 2025-08-30 19:12:24 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 19:13:16 - pico-train - INFO - Step 81000 -- 🔄 Training Metrics 2025-08-30 19:13:16 - pico-train - INFO - ├── Loss: 4.7786 2025-08-30 19:13:16 - pico-train - INFO - ├── Learning Rate: 2.00e-05 2025-08-30 19:13:16 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 19:14:07 - pico-train - INFO - Step 81100 -- 🔄 Training Metrics 2025-08-30 19:14:07 - pico-train - INFO - ├── Loss: 4.7870 2025-08-30 19:14:07 - pico-train - INFO - ├── Learning Rate: 2.00e-05 2025-08-30 19:14:07 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 19:14:59 - pico-train - INFO - Step 81200 -- 🔄 
Training Metrics 2025-08-30 19:14:59 - pico-train - INFO - ├── Loss: 4.7989 2025-08-30 19:14:59 - pico-train - INFO - ├── Learning Rate: 2.00e-05 2025-08-30 19:14:59 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 19:15:51 - pico-train - INFO - Step 81300 -- 🔄 Training Metrics 2025-08-30 19:15:51 - pico-train - INFO - ├── Loss: 4.8003 2025-08-30 19:15:51 - pico-train - INFO - ├── Learning Rate: 2.00e-05 2025-08-30 19:15:51 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 19:16:44 - pico-train - INFO - Step 81400 -- 🔄 Training Metrics 2025-08-30 19:16:44 - pico-train - INFO - ├── Loss: 4.7783 2025-08-30 19:16:44 - pico-train - INFO - ├── Learning Rate: 2.00e-05 2025-08-30 19:16:44 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 19:17:36 - pico-train - INFO - Step 81500 -- 🔄 Training Metrics 2025-08-30 19:17:36 - pico-train - INFO - ├── Loss: 4.7549 2025-08-30 19:17:36 - pico-train - INFO - ├── Learning Rate: 2.00e-05 2025-08-30 19:17:36 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 19:18:28 - pico-train - INFO - Step 81600 -- 🔄 Training Metrics 2025-08-30 19:18:28 - pico-train - INFO - ├── Loss: 4.7775 2025-08-30 19:18:28 - pico-train - INFO - ├── Learning Rate: 2.00e-05 2025-08-30 19:18:28 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 19:19:19 - pico-train - INFO - Step 81700 -- 🔄 Training Metrics 2025-08-30 19:19:19 - pico-train - INFO - ├── Loss: 4.7858 2025-08-30 19:19:19 - pico-train - INFO - ├── Learning Rate: 2.00e-05 2025-08-30 19:19:19 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 19:20:11 - pico-train - INFO - Step 81800 -- 🔄 Training Metrics 2025-08-30 19:20:11 - pico-train - INFO - ├── Loss: 4.7789 2025-08-30 19:20:11 - pico-train - INFO - ├── Learning Rate: 2.00e-05 2025-08-30 19:20:11 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 19:21:03 - pico-train - INFO - Step 81900 -- 🔄 Training Metrics 2025-08-30 19:21:03 - pico-train - INFO - ├── Loss: 4.7737 2025-08-30 19:21:03 - pico-train - INFO - ├── Learning 
Rate: 2.00e-05 2025-08-30 19:21:03 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 19:21:55 - pico-train - INFO - Step 82000 -- 💾 Saving Checkpoint 2025-08-30 19:23:45 - pico-train - INFO - Step 82000 -- 📊 Evaluation Results 2025-08-30 19:23:45 - pico-train - INFO - └── paloma: inf 2025-08-30 19:23:46 - pico-train - INFO - Step 82000 -- 🔄 Training Metrics 2025-08-30 19:23:46 - pico-train - INFO - ├── Loss: 4.7934 2025-08-30 19:23:46 - pico-train - INFO - ├── Learning Rate: 2.00e-05 2025-08-30 19:23:46 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 19:23:46 - pico-train - INFO - Step 82000 -- 📈 Saving Learning Dynamics 2025-08-30 19:24:40 - pico-train - INFO - Step 82100 -- 🔄 Training Metrics 2025-08-30 19:24:40 - pico-train - INFO - ├── Loss: 4.7784 2025-08-30 19:24:40 - pico-train - INFO - ├── Learning Rate: 2.00e-05 2025-08-30 19:24:40 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 19:25:32 - pico-train - INFO - Step 82200 -- 🔄 Training Metrics 2025-08-30 19:25:32 - pico-train - INFO - ├── Loss: 4.7837 2025-08-30 19:25:32 - pico-train - INFO - ├── Learning Rate: 2.00e-05 2025-08-30 19:25:32 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 19:26:24 - pico-train - INFO - Step 82300 -- 🔄 Training Metrics 2025-08-30 19:26:24 - pico-train - INFO - ├── Loss: 4.7611 2025-08-30 19:26:24 - pico-train - INFO - ├── Learning Rate: 2.00e-05 2025-08-30 19:26:24 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 19:27:16 - pico-train - INFO - Step 82400 -- 🔄 Training Metrics 2025-08-30 19:27:16 - pico-train - INFO - ├── Loss: 4.7873 2025-08-30 19:27:16 - pico-train - INFO - ├── Learning Rate: 2.00e-05 2025-08-30 19:27:16 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 19:28:09 - pico-train - INFO - Step 82500 -- 🔄 Training Metrics 2025-08-30 19:28:09 - pico-train - INFO - ├── Loss: 4.7805 2025-08-30 19:28:09 - pico-train - INFO - ├── Learning Rate: 2.00e-05 2025-08-30 19:28:09 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 19:29:02 - 
pico-train - INFO - Step 82600 -- 🔄 Training Metrics 2025-08-30 19:29:02 - pico-train - INFO - ├── Loss: 4.7728 2025-08-30 19:29:02 - pico-train - INFO - ├── Learning Rate: 2.00e-05 2025-08-30 19:29:02 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 19:29:53 - pico-train - INFO - Step 82700 -- 🔄 Training Metrics 2025-08-30 19:29:53 - pico-train - INFO - ├── Loss: 4.7685 2025-08-30 19:29:53 - pico-train - INFO - ├── Learning Rate: 2.00e-05 2025-08-30 19:29:53 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 19:30:45 - pico-train - INFO - Step 82800 -- 🔄 Training Metrics 2025-08-30 19:30:45 - pico-train - INFO - ├── Loss: 4.7772 2025-08-30 19:30:45 - pico-train - INFO - ├── Learning Rate: 2.00e-05 2025-08-30 19:30:45 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 19:31:37 - pico-train - INFO - Step 82900 -- 🔄 Training Metrics 2025-08-30 19:31:37 - pico-train - INFO - ├── Loss: 4.7580 2025-08-30 19:31:37 - pico-train - INFO - ├── Learning Rate: 2.00e-05 2025-08-30 19:31:37 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 19:32:30 - pico-train - INFO - Step 83000 -- 🔄 Training Metrics 2025-08-30 19:32:30 - pico-train - INFO - ├── Loss: 4.7907 2025-08-30 19:32:30 - pico-train - INFO - ├── Learning Rate: 2.00e-05 2025-08-30 19:32:30 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 19:33:23 - pico-train - INFO - Step 83100 -- 🔄 Training Metrics 2025-08-30 19:33:23 - pico-train - INFO - ├── Loss: 4.7721 2025-08-30 19:33:23 - pico-train - INFO - ├── Learning Rate: 2.00e-05 2025-08-30 19:33:23 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 19:34:16 - pico-train - INFO - Step 83200 -- 🔄 Training Metrics 2025-08-30 19:34:16 - pico-train - INFO - ├── Loss: 4.7750 2025-08-30 19:34:16 - pico-train - INFO - ├── Learning Rate: 2.00e-05 2025-08-30 19:34:16 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 19:35:09 - pico-train - INFO - Step 83300 -- 🔄 Training Metrics 2025-08-30 19:35:09 - pico-train - INFO - ├── Loss: 4.7808 2025-08-30 
19:35:09 - pico-train - INFO - ├── Learning Rate: 2.00e-05 2025-08-30 19:35:09 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 19:36:02 - pico-train - INFO - Step 83400 -- 🔄 Training Metrics 2025-08-30 19:36:02 - pico-train - INFO - ├── Loss: 4.7869 2025-08-30 19:36:02 - pico-train - INFO - ├── Learning Rate: 2.00e-05 2025-08-30 19:36:02 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 19:36:55 - pico-train - INFO - Step 83500 -- 🔄 Training Metrics 2025-08-30 19:36:55 - pico-train - INFO - ├── Loss: 4.7670 2025-08-30 19:36:55 - pico-train - INFO - ├── Learning Rate: 2.00e-05 2025-08-30 19:36:55 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 19:37:48 - pico-train - INFO - Step 83600 -- 🔄 Training Metrics 2025-08-30 19:37:48 - pico-train - INFO - ├── Loss: 4.7615 2025-08-30 19:37:48 - pico-train - INFO - ├── Learning Rate: 2.00e-05 2025-08-30 19:37:48 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 19:38:40 - pico-train - INFO - Step 83700 -- 🔄 Training Metrics 2025-08-30 19:38:40 - pico-train - INFO - ├── Loss: 4.7976 2025-08-30 19:38:40 - pico-train - INFO - ├── Learning Rate: 2.00e-05 2025-08-30 19:38:40 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 19:39:32 - pico-train - INFO - Step 83800 -- 🔄 Training Metrics 2025-08-30 19:39:32 - pico-train - INFO - ├── Loss: 4.7549 2025-08-30 19:39:32 - pico-train - INFO - ├── Learning Rate: 2.00e-05 2025-08-30 19:39:32 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 19:40:24 - pico-train - INFO - Step 83900 -- 🔄 Training Metrics 2025-08-30 19:40:24 - pico-train - INFO - ├── Loss: 4.7879 2025-08-30 19:40:24 - pico-train - INFO - ├── Learning Rate: 2.00e-05 2025-08-30 19:40:24 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 19:41:15 - pico-train - INFO - Step 84000 -- 💾 Saving Checkpoint 2025-08-30 19:43:17 - pico-train - INFO - Step 84000 -- 📊 Evaluation Results 2025-08-30 19:43:17 - pico-train - INFO - └── paloma: inf 2025-08-30 19:43:18 - pico-train - INFO - Step 84000 -- 🔄 
Training Metrics 2025-08-30 19:43:18 - pico-train - INFO - ├── Loss: 4.7979 2025-08-30 19:43:18 - pico-train - INFO - ├── Learning Rate: 2.00e-05 2025-08-30 19:43:18 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 19:43:18 - pico-train - INFO - Step 84000 -- 📈 Saving Learning Dynamics 2025-08-30 19:44:12 - pico-train - INFO - Step 84100 -- 🔄 Training Metrics 2025-08-30 19:44:12 - pico-train - INFO - ├── Loss: 4.8088 2025-08-30 19:44:12 - pico-train - INFO - ├── Learning Rate: 2.00e-05 2025-08-30 19:44:12 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 19:45:04 - pico-train - INFO - Step 84200 -- 🔄 Training Metrics 2025-08-30 19:45:04 - pico-train - INFO - ├── Loss: 4.7678 2025-08-30 19:45:04 - pico-train - INFO - ├── Learning Rate: 2.00e-05 2025-08-30 19:45:04 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 19:45:56 - pico-train - INFO - Step 84300 -- 🔄 Training Metrics 2025-08-30 19:45:56 - pico-train - INFO - ├── Loss: 4.7725 2025-08-30 19:45:56 - pico-train - INFO - ├── Learning Rate: 2.00e-05 2025-08-30 19:45:56 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 19:46:48 - pico-train - INFO - Step 84400 -- 🔄 Training Metrics 2025-08-30 19:46:48 - pico-train - INFO - ├── Loss: 4.7841 2025-08-30 19:46:48 - pico-train - INFO - ├── Learning Rate: 2.00e-05 2025-08-30 19:46:48 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 19:47:40 - pico-train - INFO - Step 84500 -- 🔄 Training Metrics 2025-08-30 19:47:40 - pico-train - INFO - ├── Loss: 4.7708 2025-08-30 19:47:40 - pico-train - INFO - ├── Learning Rate: 2.00e-05 2025-08-30 19:47:40 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 19:48:32 - pico-train - INFO - Step 84600 -- 🔄 Training Metrics 2025-08-30 19:48:32 - pico-train - INFO - ├── Loss: 4.7748 2025-08-30 19:48:32 - pico-train - INFO - ├── Learning Rate: 2.00e-05 2025-08-30 19:48:32 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 19:49:24 - pico-train - INFO - Step 84700 -- 🔄 Training Metrics 2025-08-30 19:49:24 - 
pico-train - INFO - ├── Loss: 4.7714 2025-08-30 19:49:24 - pico-train - INFO - ├── Learning Rate: 2.00e-05 2025-08-30 19:49:24 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 19:50:16 - pico-train - INFO - Step 84800 -- 🔄 Training Metrics 2025-08-30 19:50:16 - pico-train - INFO - ├── Loss: 4.7860 2025-08-30 19:50:16 - pico-train - INFO - ├── Learning Rate: 2.00e-05 2025-08-30 19:50:16 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 19:51:09 - pico-train - INFO - Step 84900 -- 🔄 Training Metrics 2025-08-30 19:51:09 - pico-train - INFO - ├── Loss: 4.7671 2025-08-30 19:51:09 - pico-train - INFO - ├── Learning Rate: 2.00e-05 2025-08-30 19:51:09 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 19:52:02 - pico-train - INFO - Step 85000 -- 🔄 Training Metrics 2025-08-30 19:52:02 - pico-train - INFO - ├── Loss: 4.7753 2025-08-30 19:52:02 - pico-train - INFO - ├── Learning Rate: 2.00e-05 2025-08-30 19:52:02 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 19:52:55 - pico-train - INFO - Step 85100 -- 🔄 Training Metrics 2025-08-30 19:52:55 - pico-train - INFO - ├── Loss: 4.7335 2025-08-30 19:52:55 - pico-train - INFO - ├── Learning Rate: 2.00e-05 2025-08-30 19:52:55 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 19:53:48 - pico-train - INFO - Step 85200 -- 🔄 Training Metrics 2025-08-30 19:53:48 - pico-train - INFO - ├── Loss: 4.7700 2025-08-30 19:53:48 - pico-train - INFO - ├── Learning Rate: 2.00e-05 2025-08-30 19:53:48 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 19:54:41 - pico-train - INFO - Step 85300 -- 🔄 Training Metrics 2025-08-30 19:54:41 - pico-train - INFO - ├── Loss: 4.7800 2025-08-30 19:54:41 - pico-train - INFO - ├── Learning Rate: 2.00e-05 2025-08-30 19:54:41 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 19:55:34 - pico-train - INFO - Step 85400 -- 🔄 Training Metrics 2025-08-30 19:55:34 - pico-train - INFO - ├── Loss: 4.7782 2025-08-30 19:55:34 - pico-train - INFO - ├── Learning Rate: 2.00e-05 2025-08-30 19:55:34 - 
pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 19:56:27 - pico-train - INFO - Step 85500 -- 🔄 Training Metrics 2025-08-30 19:56:27 - pico-train - INFO - ├── Loss: 4.7698 2025-08-30 19:56:27 - pico-train - INFO - ├── Learning Rate: 2.00e-05 2025-08-30 19:56:27 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 19:57:21 - pico-train - INFO - Step 85600 -- 🔄 Training Metrics 2025-08-30 19:57:21 - pico-train - INFO - ├── Loss: 4.7835 2025-08-30 19:57:21 - pico-train - INFO - ├── Learning Rate: 2.00e-05 2025-08-30 19:57:21 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 19:58:14 - pico-train - INFO - Step 85700 -- 🔄 Training Metrics 2025-08-30 19:58:14 - pico-train - INFO - ├── Loss: 4.7651 2025-08-30 19:58:14 - pico-train - INFO - ├── Learning Rate: 2.00e-05 2025-08-30 19:58:14 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 19:59:06 - pico-train - INFO - Step 85800 -- 🔄 Training Metrics 2025-08-30 19:59:06 - pico-train - INFO - ├── Loss: 4.7900 2025-08-30 19:59:06 - pico-train - INFO - ├── Learning Rate: 2.00e-05 2025-08-30 19:59:06 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 19:59:58 - pico-train - INFO - Step 85900 -- 🔄 Training Metrics 2025-08-30 19:59:58 - pico-train - INFO - ├── Loss: 4.7797 2025-08-30 19:59:58 - pico-train - INFO - ├── Learning Rate: 2.00e-05 2025-08-30 19:59:58 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 20:00:50 - pico-train - INFO - Step 86000 -- 💾 Saving Checkpoint 2025-08-30 20:02:55 - pico-train - INFO - Step 86000 -- 📊 Evaluation Results 2025-08-30 20:02:55 - pico-train - INFO - └── paloma: inf 2025-08-30 20:02:56 - pico-train - INFO - Step 86000 -- 🔄 Training Metrics 2025-08-30 20:02:56 - pico-train - INFO - ├── Loss: 4.7650 2025-08-30 20:02:56 - pico-train - INFO - ├── Learning Rate: 2.00e-05 2025-08-30 20:02:56 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 20:02:56 - pico-train - INFO - Step 86000 -- 📈 Saving Learning Dynamics 2025-08-30 20:03:51 - pico-train - INFO - Step 86100 -- 🔄 
Training Metrics 2025-08-30 20:03:51 - pico-train - INFO - ├── Loss: 4.7682 2025-08-30 20:03:51 - pico-train - INFO - ├── Learning Rate: 2.00e-05 2025-08-30 20:03:51 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 20:04:44 - pico-train - INFO - Step 86200 -- 🔄 Training Metrics 2025-08-30 20:04:44 - pico-train - INFO - ├── Loss: 4.7968 2025-08-30 20:04:44 - pico-train - INFO - ├── Learning Rate: 2.00e-05 2025-08-30 20:04:44 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 20:05:37 - pico-train - INFO - Step 86300 -- 🔄 Training Metrics 2025-08-30 20:05:37 - pico-train - INFO - ├── Loss: 4.7895 2025-08-30 20:05:37 - pico-train - INFO - ├── Learning Rate: 2.00e-05 2025-08-30 20:05:37 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 20:06:30 - pico-train - INFO - Step 86400 -- 🔄 Training Metrics 2025-08-30 20:06:30 - pico-train - INFO - ├── Loss: 4.7680 2025-08-30 20:06:30 - pico-train - INFO - ├── Learning Rate: 2.00e-05 2025-08-30 20:06:30 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 20:07:23 - pico-train - INFO - Step 86500 -- 🔄 Training Metrics 2025-08-30 20:07:23 - pico-train - INFO - ├── Loss: 4.7686 2025-08-30 20:07:23 - pico-train - INFO - ├── Learning Rate: 2.00e-05 2025-08-30 20:07:23 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 20:08:16 - pico-train - INFO - Step 86600 -- 🔄 Training Metrics 2025-08-30 20:08:16 - pico-train - INFO - ├── Loss: 4.7828 2025-08-30 20:08:16 - pico-train - INFO - ├── Learning Rate: 2.00e-05 2025-08-30 20:08:16 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 20:09:10 - pico-train - INFO - Step 86700 -- 🔄 Training Metrics 2025-08-30 20:09:10 - pico-train - INFO - ├── Loss: 4.7595 2025-08-30 20:09:10 - pico-train - INFO - ├── Learning Rate: 2.00e-05 2025-08-30 20:09:10 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 20:10:02 - pico-train - INFO - Step 86800 -- 🔄 Training Metrics 2025-08-30 20:10:02 - pico-train - INFO - ├── Loss: 4.7808 2025-08-30 20:10:02 - pico-train - INFO - ├── Learning 
Rate: 2.00e-05 2025-08-30 20:10:02 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 20:10:56 - pico-train - INFO - Step 86900 -- 🔄 Training Metrics 2025-08-30 20:10:56 - pico-train - INFO - ├── Loss: 4.7668 2025-08-30 20:10:56 - pico-train - INFO - ├── Learning Rate: 2.00e-05 2025-08-30 20:10:56 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 20:11:49 - pico-train - INFO - Step 87000 -- 🔄 Training Metrics 2025-08-30 20:11:49 - pico-train - INFO - ├── Loss: 4.7481 2025-08-30 20:11:49 - pico-train - INFO - ├── Learning Rate: 2.00e-05 2025-08-30 20:11:49 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 20:12:41 - pico-train - INFO - Step 87100 -- 🔄 Training Metrics 2025-08-30 20:12:41 - pico-train - INFO - ├── Loss: 4.7536 2025-08-30 20:12:41 - pico-train - INFO - ├── Learning Rate: 2.00e-05 2025-08-30 20:12:41 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 20:13:32 - pico-train - INFO - Step 87200 -- 🔄 Training Metrics 2025-08-30 20:13:32 - pico-train - INFO - ├── Loss: 4.7748 2025-08-30 20:13:32 - pico-train - INFO - ├── Learning Rate: 2.00e-05 2025-08-30 20:13:32 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 20:14:26 - pico-train - INFO - Step 87300 -- 🔄 Training Metrics 2025-08-30 20:14:26 - pico-train - INFO - ├── Loss: 4.7597 2025-08-30 20:14:26 - pico-train - INFO - ├── Learning Rate: 2.00e-05 2025-08-30 20:14:26 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 20:15:19 - pico-train - INFO - Step 87400 -- 🔄 Training Metrics 2025-08-30 20:15:19 - pico-train - INFO - ├── Loss: 4.7862 2025-08-30 20:15:19 - pico-train - INFO - ├── Learning Rate: 2.00e-05 2025-08-30 20:15:19 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 20:16:12 - pico-train - INFO - Step 87500 -- 🔄 Training Metrics 2025-08-30 20:16:12 - pico-train - INFO - ├── Loss: 4.7682 2025-08-30 20:16:12 - pico-train - INFO - ├── Learning Rate: 2.00e-05 2025-08-30 20:16:12 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 20:17:05 - pico-train - INFO - Step 87600 
-- 🔄 Training Metrics 2025-08-30 20:17:05 - pico-train - INFO - ├── Loss: 4.8045 2025-08-30 20:17:05 - pico-train - INFO - ├── Learning Rate: 2.00e-05 2025-08-30 20:17:05 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 20:17:58 - pico-train - INFO - Step 87700 -- 🔄 Training Metrics 2025-08-30 20:17:58 - pico-train - INFO - ├── Loss: 4.7911 2025-08-30 20:17:58 - pico-train - INFO - ├── Learning Rate: 2.00e-05 2025-08-30 20:17:58 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 20:18:51 - pico-train - INFO - Step 87800 -- 🔄 Training Metrics 2025-08-30 20:18:51 - pico-train - INFO - ├── Loss: 4.7530 2025-08-30 20:18:51 - pico-train - INFO - ├── Learning Rate: 2.00e-05 2025-08-30 20:18:51 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 20:19:45 - pico-train - INFO - Step 87900 -- 🔄 Training Metrics 2025-08-30 20:19:45 - pico-train - INFO - ├── Loss: 4.7618 2025-08-30 20:19:45 - pico-train - INFO - ├── Learning Rate: 2.00e-05 2025-08-30 20:19:45 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 20:20:37 - pico-train - INFO - Step 88000 -- 💾 Saving Checkpoint 2025-08-30 20:22:42 - pico-train - INFO - Step 88000 -- 📊 Evaluation Results 2025-08-30 20:22:42 - pico-train - INFO - └── paloma: inf 2025-08-30 20:22:43 - pico-train - INFO - Step 88000 -- 🔄 Training Metrics 2025-08-30 20:22:43 - pico-train - INFO - ├── Loss: 4.7796 2025-08-30 20:22:43 - pico-train - INFO - ├── Learning Rate: 2.00e-05 2025-08-30 20:22:43 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 20:22:43 - pico-train - INFO - Step 88000 -- 📈 Saving Learning Dynamics 2025-08-30 20:23:38 - pico-train - INFO - Step 88100 -- 🔄 Training Metrics 2025-08-30 20:23:38 - pico-train - INFO - ├── Loss: 4.7432 2025-08-30 20:23:38 - pico-train - INFO - ├── Learning Rate: 2.00e-05 2025-08-30 20:23:38 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 20:24:31 - pico-train - INFO - Step 88200 -- 🔄 Training Metrics 2025-08-30 20:24:31 - pico-train - INFO - ├── Loss: 4.7725 2025-08-30 20:24:31 - 
pico-train - INFO - ├── Learning Rate: 2.00e-05 2025-08-30 20:24:31 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 20:25:24 - pico-train - INFO - Step 88300 -- 🔄 Training Metrics 2025-08-30 20:25:24 - pico-train - INFO - ├── Loss: 4.7749 2025-08-30 20:25:24 - pico-train - INFO - ├── Learning Rate: 2.00e-05 2025-08-30 20:25:24 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 20:26:17 - pico-train - INFO - Step 88400 -- 🔄 Training Metrics 2025-08-30 20:26:17 - pico-train - INFO - ├── Loss: 4.7883 2025-08-30 20:26:17 - pico-train - INFO - ├── Learning Rate: 2.00e-05 2025-08-30 20:26:17 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 20:27:10 - pico-train - INFO - Step 88500 -- 🔄 Training Metrics 2025-08-30 20:27:10 - pico-train - INFO - ├── Loss: 4.7871 2025-08-30 20:27:10 - pico-train - INFO - ├── Learning Rate: 2.00e-05 2025-08-30 20:27:10 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 20:28:03 - pico-train - INFO - Step 88600 -- 🔄 Training Metrics 2025-08-30 20:28:03 - pico-train - INFO - ├── Loss: 4.7894 2025-08-30 20:28:03 - pico-train - INFO - ├── Learning Rate: 2.00e-05 2025-08-30 20:28:03 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 20:28:56 - pico-train - INFO - Step 88700 -- 🔄 Training Metrics 2025-08-30 20:28:56 - pico-train - INFO - ├── Loss: 4.7812 2025-08-30 20:28:56 - pico-train - INFO - ├── Learning Rate: 2.00e-05 2025-08-30 20:28:56 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 20:29:49 - pico-train - INFO - Step 88800 -- 🔄 Training Metrics 2025-08-30 20:29:49 - pico-train - INFO - ├── Loss: 4.7371 2025-08-30 20:29:49 - pico-train - INFO - ├── Learning Rate: 2.00e-05 2025-08-30 20:29:49 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 20:30:43 - pico-train - INFO - Step 88900 -- 🔄 Training Metrics 2025-08-30 20:30:43 - pico-train - INFO - ├── Loss: 4.7666 2025-08-30 20:30:43 - pico-train - INFO - ├── Learning Rate: 2.00e-05 2025-08-30 20:30:43 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 20:31:36 
- pico-train - INFO - Step 89000 -- 🔄 Training Metrics 2025-08-30 20:31:36 - pico-train - INFO - ├── Loss: 4.7623 2025-08-30 20:31:36 - pico-train - INFO - ├── Learning Rate: 2.00e-05 2025-08-30 20:31:36 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 20:32:29 - pico-train - INFO - Step 89100 -- 🔄 Training Metrics 2025-08-30 20:32:29 - pico-train - INFO - ├── Loss: 4.7911 2025-08-30 20:32:29 - pico-train - INFO - ├── Learning Rate: 2.00e-05 2025-08-30 20:32:29 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 20:33:22 - pico-train - INFO - Step 89200 -- 🔄 Training Metrics 2025-08-30 20:33:22 - pico-train - INFO - ├── Loss: 4.7823 2025-08-30 20:33:22 - pico-train - INFO - ├── Learning Rate: 2.00e-05 2025-08-30 20:33:22 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 20:34:15 - pico-train - INFO - Step 89300 -- 🔄 Training Metrics 2025-08-30 20:34:15 - pico-train - INFO - ├── Loss: 4.7830 2025-08-30 20:34:15 - pico-train - INFO - ├── Learning Rate: 2.00e-05 2025-08-30 20:34:15 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 20:35:08 - pico-train - INFO - Step 89400 -- 🔄 Training Metrics 2025-08-30 20:35:08 - pico-train - INFO - ├── Loss: 4.7724 2025-08-30 20:35:08 - pico-train - INFO - ├── Learning Rate: 2.00e-05 2025-08-30 20:35:08 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 20:36:01 - pico-train - INFO - Step 89500 -- 🔄 Training Metrics 2025-08-30 20:36:01 - pico-train - INFO - ├── Loss: 4.7654 2025-08-30 20:36:01 - pico-train - INFO - ├── Learning Rate: 2.00e-05 2025-08-30 20:36:01 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 20:36:54 - pico-train - INFO - Step 89600 -- 🔄 Training Metrics 2025-08-30 20:36:54 - pico-train - INFO - ├── Loss: 4.7613 2025-08-30 20:36:54 - pico-train - INFO - ├── Learning Rate: 2.00e-05 2025-08-30 20:36:54 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 20:37:47 - pico-train - INFO - Step 89700 -- 🔄 Training Metrics 2025-08-30 20:37:47 - pico-train - INFO - ├── Loss: 4.7544 2025-08-30 
20:37:47 - pico-train - INFO - ├── Learning Rate: 2.00e-05 2025-08-30 20:37:47 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 20:38:41 - pico-train - INFO - Step 89800 -- 🔄 Training Metrics 2025-08-30 20:38:41 - pico-train - INFO - ├── Loss: 4.7889 2025-08-30 20:38:41 - pico-train - INFO - ├── Learning Rate: 2.00e-05 2025-08-30 20:38:41 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 20:39:34 - pico-train - INFO - Step 89900 -- 🔄 Training Metrics 2025-08-30 20:39:34 - pico-train - INFO - ├── Loss: 4.7928 2025-08-30 20:39:34 - pico-train - INFO - ├── Learning Rate: 2.00e-05 2025-08-30 20:39:34 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 20:40:26 - pico-train - INFO - Step 90000 -- 💾 Saving Checkpoint 2025-08-30 20:42:31 - pico-train - INFO - Step 90000 -- 📊 Evaluation Results 2025-08-30 20:42:31 - pico-train - INFO - └── paloma: inf 2025-08-30 20:42:31 - pico-train - INFO - Step 90000 -- 🔄 Training Metrics 2025-08-30 20:42:31 - pico-train - INFO - ├── Loss: 4.7777 2025-08-30 20:42:31 - pico-train - INFO - ├── Learning Rate: 2.00e-05 2025-08-30 20:42:31 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 20:42:31 - pico-train - INFO - Step 90000 -- 📈 Saving Learning Dynamics 2025-08-30 20:43:26 - pico-train - INFO - Step 90100 -- 🔄 Training Metrics 2025-08-30 20:43:26 - pico-train - INFO - ├── Loss: 4.7721 2025-08-30 20:43:26 - pico-train - INFO - ├── Learning Rate: 2.00e-05 2025-08-30 20:43:26 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 20:44:18 - pico-train - INFO - Step 90200 -- 🔄 Training Metrics 2025-08-30 20:44:18 - pico-train - INFO - ├── Loss: 4.7616 2025-08-30 20:44:18 - pico-train - INFO - ├── Learning Rate: 2.00e-05 2025-08-30 20:44:18 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 20:45:10 - pico-train - INFO - Step 90300 -- 🔄 Training Metrics 2025-08-30 20:45:10 - pico-train - INFO - ├── Loss: 4.7529 2025-08-30 20:45:10 - pico-train - INFO - ├── Learning Rate: 2.00e-05 2025-08-30 20:45:10 - pico-train - INFO - 
└── Inf/NaN count: 0 2025-08-30 20:46:04 - pico-train - INFO - Step 90400 -- 🔄 Training Metrics 2025-08-30 20:46:04 - pico-train - INFO - ├── Loss: 4.7656 2025-08-30 20:46:04 - pico-train - INFO - ├── Learning Rate: 2.00e-05 2025-08-30 20:46:04 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 20:46:56 - pico-train - INFO - Step 90500 -- 🔄 Training Metrics 2025-08-30 20:46:56 - pico-train - INFO - ├── Loss: 4.7484 2025-08-30 20:46:56 - pico-train - INFO - ├── Learning Rate: 2.00e-05 2025-08-30 20:46:56 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 20:47:50 - pico-train - INFO - Step 90600 -- 🔄 Training Metrics 2025-08-30 20:47:50 - pico-train - INFO - ├── Loss: 4.7811 2025-08-30 20:47:50 - pico-train - INFO - ├── Learning Rate: 2.00e-05 2025-08-30 20:47:50 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 20:48:43 - pico-train - INFO - Step 90700 -- 🔄 Training Metrics 2025-08-30 20:48:43 - pico-train - INFO - ├── Loss: 4.7523 2025-08-30 20:48:43 - pico-train - INFO - ├── Learning Rate: 2.00e-05 2025-08-30 20:48:43 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 20:49:36 - pico-train - INFO - Step 90800 -- 🔄 Training Metrics 2025-08-30 20:49:36 - pico-train - INFO - ├── Loss: 4.7822 2025-08-30 20:49:36 - pico-train - INFO - ├── Learning Rate: 2.00e-05 2025-08-30 20:49:36 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 20:50:29 - pico-train - INFO - Step 90900 -- 🔄 Training Metrics 2025-08-30 20:50:29 - pico-train - INFO - ├── Loss: 4.7780 2025-08-30 20:50:29 - pico-train - INFO - ├── Learning Rate: 2.00e-05 2025-08-30 20:50:29 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 20:51:22 - pico-train - INFO - Step 91000 -- 🔄 Training Metrics 2025-08-30 20:51:22 - pico-train - INFO - ├── Loss: 4.7850 2025-08-30 20:51:22 - pico-train - INFO - ├── Learning Rate: 2.00e-05 2025-08-30 20:51:22 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 20:52:15 - pico-train - INFO - Step 91100 -- 🔄 Training Metrics 2025-08-30 20:52:15 - pico-train - 
INFO - ├── Loss: 4.7669 2025-08-30 20:52:15 - pico-train - INFO - ├── Learning Rate: 2.00e-05 2025-08-30 20:52:15 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 20:53:09 - pico-train - INFO - Step 91200 -- 🔄 Training Metrics 2025-08-30 20:53:09 - pico-train - INFO - ├── Loss: 4.7713 2025-08-30 20:53:09 - pico-train - INFO - ├── Learning Rate: 2.00e-05 2025-08-30 20:53:09 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 20:54:02 - pico-train - INFO - Step 91300 -- 🔄 Training Metrics 2025-08-30 20:54:02 - pico-train - INFO - ├── Loss: 4.7832 2025-08-30 20:54:02 - pico-train - INFO - ├── Learning Rate: 2.00e-05 2025-08-30 20:54:02 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 20:54:55 - pico-train - INFO - Step 91400 -- 🔄 Training Metrics 2025-08-30 20:54:55 - pico-train - INFO - ├── Loss: 4.7749 2025-08-30 20:54:55 - pico-train - INFO - ├── Learning Rate: 2.00e-05 2025-08-30 20:54:55 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 20:55:48 - pico-train - INFO - Step 91500 -- 🔄 Training Metrics 2025-08-30 20:55:48 - pico-train - INFO - ├── Loss: 4.7702 2025-08-30 20:55:48 - pico-train - INFO - ├── Learning Rate: 2.00e-05 2025-08-30 20:55:48 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 20:56:41 - pico-train - INFO - Step 91600 -- 🔄 Training Metrics 2025-08-30 20:56:41 - pico-train - INFO - ├── Loss: 4.7792 2025-08-30 20:56:41 - pico-train - INFO - ├── Learning Rate: 2.00e-05 2025-08-30 20:56:41 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 20:57:34 - pico-train - INFO - Step 91700 -- 🔄 Training Metrics 2025-08-30 20:57:34 - pico-train - INFO - ├── Loss: 4.7678 2025-08-30 20:57:34 - pico-train - INFO - ├── Learning Rate: 2.00e-05 2025-08-30 20:57:34 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 20:58:28 - pico-train - INFO - Step 91800 -- 🔄 Training Metrics 2025-08-30 20:58:28 - pico-train - INFO - ├── Loss: 4.7831 2025-08-30 20:58:28 - pico-train - INFO - ├── Learning Rate: 2.00e-05 2025-08-30 20:58:28 - pico-train - 
INFO - └── Inf/NaN count: 0 2025-08-30 20:59:21 - pico-train - INFO - Step 91900 -- 🔄 Training Metrics 2025-08-30 20:59:21 - pico-train - INFO - ├── Loss: 4.7746 2025-08-30 20:59:21 - pico-train - INFO - ├── Learning Rate: 2.00e-05 2025-08-30 20:59:21 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 21:00:13 - pico-train - INFO - Step 92000 -- 💾 Saving Checkpoint 2025-08-30 21:02:18 - pico-train - INFO - Step 92000 -- 📊 Evaluation Results 2025-08-30 21:02:18 - pico-train - INFO - └── paloma: inf 2025-08-30 21:02:18 - pico-train - INFO - Step 92000 -- 🔄 Training Metrics 2025-08-30 21:02:18 - pico-train - INFO - ├── Loss: 4.7812 2025-08-30 21:02:18 - pico-train - INFO - ├── Learning Rate: 2.00e-05 2025-08-30 21:02:18 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 21:02:18 - pico-train - INFO - Step 92000 -- 📈 Saving Learning Dynamics 2025-08-30 21:03:14 - pico-train - INFO - Step 92100 -- 🔄 Training Metrics 2025-08-30 21:03:14 - pico-train - INFO - ├── Loss: 4.7569 2025-08-30 21:03:14 - pico-train - INFO - ├── Learning Rate: 2.00e-05 2025-08-30 21:03:14 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 21:04:06 - pico-train - INFO - Step 92200 -- 🔄 Training Metrics 2025-08-30 21:04:06 - pico-train - INFO - ├── Loss: 4.7846 2025-08-30 21:04:06 - pico-train - INFO - ├── Learning Rate: 2.00e-05 2025-08-30 21:04:06 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 21:04:58 - pico-train - INFO - Step 92300 -- 🔄 Training Metrics 2025-08-30 21:04:58 - pico-train - INFO - ├── Loss: 4.7687 2025-08-30 21:04:58 - pico-train - INFO - ├── Learning Rate: 2.00e-05 2025-08-30 21:04:58 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 21:05:50 - pico-train - INFO - Step 92400 -- 🔄 Training Metrics 2025-08-30 21:05:50 - pico-train - INFO - ├── Loss: 4.7699 2025-08-30 21:05:50 - pico-train - INFO - ├── Learning Rate: 2.00e-05 2025-08-30 21:05:50 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 21:06:42 - pico-train - INFO - Step 92500 -- 🔄 Training Metrics 
2025-08-30 21:06:42 - pico-train - INFO - ├── Loss: 4.7961 2025-08-30 21:06:42 - pico-train - INFO - ├── Learning Rate: 2.00e-05 2025-08-30 21:06:42 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 21:07:34 - pico-train - INFO - Step 92600 -- 🔄 Training Metrics 2025-08-30 21:07:34 - pico-train - INFO - ├── Loss: 4.7682 2025-08-30 21:07:34 - pico-train - INFO - ├── Learning Rate: 2.00e-05 2025-08-30 21:07:34 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 21:08:26 - pico-train - INFO - Step 92700 -- 🔄 Training Metrics 2025-08-30 21:08:26 - pico-train - INFO - ├── Loss: 4.7786 2025-08-30 21:08:26 - pico-train - INFO - ├── Learning Rate: 2.00e-05 2025-08-30 21:08:26 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 21:09:18 - pico-train - INFO - Step 92800 -- 🔄 Training Metrics 2025-08-30 21:09:18 - pico-train - INFO - ├── Loss: 4.7716 2025-08-30 21:09:18 - pico-train - INFO - ├── Learning Rate: 2.00e-05 2025-08-30 21:09:18 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 21:10:11 - pico-train - INFO - Step 92900 -- 🔄 Training Metrics 2025-08-30 21:10:11 - pico-train - INFO - ├── Loss: 4.7837 2025-08-30 21:10:11 - pico-train - INFO - ├── Learning Rate: 2.00e-05 2025-08-30 21:10:11 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 21:11:04 - pico-train - INFO - Step 93000 -- 🔄 Training Metrics 2025-08-30 21:11:04 - pico-train - INFO - ├── Loss: 4.7811 2025-08-30 21:11:04 - pico-train - INFO - ├── Learning Rate: 2.00e-05 2025-08-30 21:11:04 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 21:11:57 - pico-train - INFO - Step 93100 -- 🔄 Training Metrics 2025-08-30 21:11:57 - pico-train - INFO - ├── Loss: 4.7830 2025-08-30 21:11:57 - pico-train - INFO - ├── Learning Rate: 2.00e-05 2025-08-30 21:11:57 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 21:12:50 - pico-train - INFO - Step 93200 -- 🔄 Training Metrics 2025-08-30 21:12:50 - pico-train - INFO - ├── Loss: 4.7935 2025-08-30 21:12:50 - pico-train - INFO - ├── Learning Rate: 2.00e-05 
2025-08-30 21:12:50 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 21:13:43 - pico-train - INFO - Step 93300 -- 🔄 Training Metrics 2025-08-30 21:13:43 - pico-train - INFO - ├── Loss: 4.8135 2025-08-30 21:13:43 - pico-train - INFO - ├── Learning Rate: 2.00e-05 2025-08-30 21:13:43 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 21:14:36 - pico-train - INFO - Step 93400 -- 🔄 Training Metrics 2025-08-30 21:14:36 - pico-train - INFO - ├── Loss: 4.7767 2025-08-30 21:14:36 - pico-train - INFO - ├── Learning Rate: 2.00e-05 2025-08-30 21:14:36 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 21:15:29 - pico-train - INFO - Step 93500 -- 🔄 Training Metrics 2025-08-30 21:15:29 - pico-train - INFO - ├── Loss: 4.8005 2025-08-30 21:15:29 - pico-train - INFO - ├── Learning Rate: 2.00e-05 2025-08-30 21:15:29 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 21:16:22 - pico-train - INFO - Step 93600 -- 🔄 Training Metrics 2025-08-30 21:16:22 - pico-train - INFO - ├── Loss: 4.7913 2025-08-30 21:16:22 - pico-train - INFO - ├── Learning Rate: 2.00e-05 2025-08-30 21:16:22 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 21:17:15 - pico-train - INFO - Step 93700 -- 🔄 Training Metrics 2025-08-30 21:17:15 - pico-train - INFO - ├── Loss: 4.7739 2025-08-30 21:17:15 - pico-train - INFO - ├── Learning Rate: 2.00e-05 2025-08-30 21:17:15 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 21:18:08 - pico-train - INFO - Step 93800 -- 🔄 Training Metrics 2025-08-30 21:18:08 - pico-train - INFO - ├── Loss: 4.7875 2025-08-30 21:18:08 - pico-train - INFO - ├── Learning Rate: 2.00e-05 2025-08-30 21:18:08 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 21:19:00 - pico-train - INFO - Step 93900 -- 🔄 Training Metrics 2025-08-30 21:19:00 - pico-train - INFO - ├── Loss: 4.7801 2025-08-30 21:19:00 - pico-train - INFO - ├── Learning Rate: 2.00e-05 2025-08-30 21:19:00 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 21:19:52 - pico-train - INFO - Step 94000 -- 💾 Saving 
Checkpoint 2025-08-30 21:21:41 - pico-train - INFO - Step 94000 -- 📊 Evaluation Results 2025-08-30 21:21:41 - pico-train - INFO - └── paloma: inf 2025-08-30 21:21:42 - pico-train - INFO - Step 94000 -- 🔄 Training Metrics 2025-08-30 21:21:42 - pico-train - INFO - ├── Loss: 4.7826 2025-08-30 21:21:42 - pico-train - INFO - ├── Learning Rate: 2.00e-05 2025-08-30 21:21:42 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 21:21:42 - pico-train - INFO - Step 94000 -- 📈 Saving Learning Dynamics 2025-08-30 21:22:36 - pico-train - INFO - Step 94100 -- 🔄 Training Metrics 2025-08-30 21:22:36 - pico-train - INFO - ├── Loss: 4.7712 2025-08-30 21:22:36 - pico-train - INFO - ├── Learning Rate: 2.00e-05 2025-08-30 21:22:36 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 21:23:28 - pico-train - INFO - Step 94200 -- 🔄 Training Metrics 2025-08-30 21:23:28 - pico-train - INFO - ├── Loss: 4.7528 2025-08-30 21:23:28 - pico-train - INFO - ├── Learning Rate: 2.00e-05 2025-08-30 21:23:28 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 21:24:21 - pico-train - INFO - Step 94300 -- 🔄 Training Metrics 2025-08-30 21:24:21 - pico-train - INFO - ├── Loss: 4.7867 2025-08-30 21:24:21 - pico-train - INFO - ├── Learning Rate: 2.00e-05 2025-08-30 21:24:21 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 21:25:14 - pico-train - INFO - Step 94400 -- 🔄 Training Metrics 2025-08-30 21:25:14 - pico-train - INFO - ├── Loss: 4.7694 2025-08-30 21:25:14 - pico-train - INFO - ├── Learning Rate: 2.00e-05 2025-08-30 21:25:14 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 21:26:07 - pico-train - INFO - Step 94500 -- 🔄 Training Metrics 2025-08-30 21:26:07 - pico-train - INFO - ├── Loss: 4.7677 2025-08-30 21:26:07 - pico-train - INFO - ├── Learning Rate: 2.00e-05 2025-08-30 21:26:07 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 21:26:59 - pico-train - INFO - Step 94600 -- 🔄 Training Metrics 2025-08-30 21:26:59 - pico-train - INFO - ├── Loss: 4.7968 2025-08-30 21:26:59 - pico-train - 
INFO - ├── Learning Rate: 2.00e-05 2025-08-30 21:26:59 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 21:27:52 - pico-train - INFO - Step 94700 -- 🔄 Training Metrics 2025-08-30 21:27:52 - pico-train - INFO - ├── Loss: 4.7716 2025-08-30 21:27:52 - pico-train - INFO - ├── Learning Rate: 2.00e-05 2025-08-30 21:27:52 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 21:28:44 - pico-train - INFO - Step 94800 -- 🔄 Training Metrics 2025-08-30 21:28:44 - pico-train - INFO - ├── Loss: 4.7446 2025-08-30 21:28:44 - pico-train - INFO - ├── Learning Rate: 2.00e-05 2025-08-30 21:28:44 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 21:29:36 - pico-train - INFO - Step 94900 -- 🔄 Training Metrics 2025-08-30 21:29:36 - pico-train - INFO - ├── Loss: 4.7763 2025-08-30 21:29:36 - pico-train - INFO - ├── Learning Rate: 2.00e-05 2025-08-30 21:29:36 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 21:30:28 - pico-train - INFO - Step 95000 -- 🔄 Training Metrics 2025-08-30 21:30:28 - pico-train - INFO - ├── Loss: 4.7830 2025-08-30 21:30:28 - pico-train - INFO - ├── Learning Rate: 2.00e-05 2025-08-30 21:30:28 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 21:31:20 - pico-train - INFO - Step 95100 -- 🔄 Training Metrics 2025-08-30 21:31:20 - pico-train - INFO - ├── Loss: 4.7890 2025-08-30 21:31:20 - pico-train - INFO - ├── Learning Rate: 2.00e-05 2025-08-30 21:31:20 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 21:32:13 - pico-train - INFO - Step 95200 -- 🔄 Training Metrics 2025-08-30 21:32:13 - pico-train - INFO - ├── Loss: 4.7685 2025-08-30 21:32:13 - pico-train - INFO - ├── Learning Rate: 2.00e-05 2025-08-30 21:32:13 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 21:33:06 - pico-train - INFO - Step 95300 -- 🔄 Training Metrics 2025-08-30 21:33:06 - pico-train - INFO - ├── Loss: 4.8231 2025-08-30 21:33:06 - pico-train - INFO - ├── Learning Rate: 2.00e-05 2025-08-30 21:33:06 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 21:33:58 - pico-train 
- INFO - Step 95400 -- 🔄 Training Metrics 2025-08-30 21:33:58 - pico-train - INFO - ├── Loss: 4.7698 2025-08-30 21:33:58 - pico-train - INFO - ├── Learning Rate: 2.00e-05 2025-08-30 21:33:58 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 21:34:50 - pico-train - INFO - Step 95500 -- 🔄 Training Metrics 2025-08-30 21:34:50 - pico-train - INFO - ├── Loss: 4.7614 2025-08-30 21:34:50 - pico-train - INFO - ├── Learning Rate: 2.00e-05 2025-08-30 21:34:50 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 21:35:42 - pico-train - INFO - Step 95600 -- 🔄 Training Metrics 2025-08-30 21:35:42 - pico-train - INFO - ├── Loss: 4.7906 2025-08-30 21:35:42 - pico-train - INFO - ├── Learning Rate: 2.00e-05 2025-08-30 21:35:42 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 21:36:34 - pico-train - INFO - Step 95700 -- 🔄 Training Metrics 2025-08-30 21:36:34 - pico-train - INFO - ├── Loss: 4.7685 2025-08-30 21:36:34 - pico-train - INFO - ├── Learning Rate: 2.00e-05 2025-08-30 21:36:34 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 21:37:26 - pico-train - INFO - Step 95800 -- 🔄 Training Metrics 2025-08-30 21:37:26 - pico-train - INFO - ├── Loss: 4.7466 2025-08-30 21:37:26 - pico-train - INFO - ├── Learning Rate: 2.00e-05 2025-08-30 21:37:26 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 21:38:18 - pico-train - INFO - Step 95900 -- 🔄 Training Metrics 2025-08-30 21:38:18 - pico-train - INFO - ├── Loss: 4.7771 2025-08-30 21:38:18 - pico-train - INFO - ├── Learning Rate: 2.00e-05 2025-08-30 21:38:18 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 21:39:09 - pico-train - INFO - Step 96000 -- 💾 Saving Checkpoint 2025-08-30 21:41:02 - pico-train - INFO - Step 96000 -- 📊 Evaluation Results 2025-08-30 21:41:02 - pico-train - INFO - └── paloma: inf 2025-08-30 21:41:03 - pico-train - INFO - Step 96000 -- 🔄 Training Metrics 2025-08-30 21:41:03 - pico-train - INFO - ├── Loss: 4.7812 2025-08-30 21:41:03 - pico-train - INFO - ├── Learning Rate: 2.00e-05 2025-08-30 
21:41:03 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 21:41:03 - pico-train - INFO - Step 96000 -- 📈 Saving Learning Dynamics 2025-08-30 21:41:58 - pico-train - INFO - Step 96100 -- 🔄 Training Metrics 2025-08-30 21:41:58 - pico-train - INFO - ├── Loss: 4.7849 2025-08-30 21:41:58 - pico-train - INFO - ├── Learning Rate: 2.00e-05 2025-08-30 21:41:58 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 21:42:51 - pico-train - INFO - Step 96200 -- 🔄 Training Metrics 2025-08-30 21:42:51 - pico-train - INFO - ├── Loss: 4.7649 2025-08-30 21:42:51 - pico-train - INFO - ├── Learning Rate: 2.00e-05 2025-08-30 21:42:51 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 21:43:45 - pico-train - INFO - Step 96300 -- 🔄 Training Metrics 2025-08-30 21:43:45 - pico-train - INFO - ├── Loss: 4.7696 2025-08-30 21:43:45 - pico-train - INFO - ├── Learning Rate: 2.00e-05 2025-08-30 21:43:45 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 21:44:37 - pico-train - INFO - Step 96400 -- 🔄 Training Metrics 2025-08-30 21:44:37 - pico-train - INFO - ├── Loss: 4.7768 2025-08-30 21:44:37 - pico-train - INFO - ├── Learning Rate: 2.00e-05 2025-08-30 21:44:37 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 21:45:31 - pico-train - INFO - Step 96500 -- 🔄 Training Metrics 2025-08-30 21:45:31 - pico-train - INFO - ├── Loss: 4.7631 2025-08-30 21:45:31 - pico-train - INFO - ├── Learning Rate: 2.00e-05 2025-08-30 21:45:31 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 21:46:24 - pico-train - INFO - Step 96600 -- 🔄 Training Metrics 2025-08-30 21:46:24 - pico-train - INFO - ├── Loss: 4.7730 2025-08-30 21:46:24 - pico-train - INFO - ├── Learning Rate: 2.00e-05 2025-08-30 21:46:24 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 21:47:17 - pico-train - INFO - Step 96700 -- 🔄 Training Metrics 2025-08-30 21:47:17 - pico-train - INFO - ├── Loss: 4.7832 2025-08-30 21:47:17 - pico-train - INFO - ├── Learning Rate: 2.00e-05 2025-08-30 21:47:17 - pico-train - INFO - └── Inf/NaN 
count: 0 2025-08-30 21:48:10 - pico-train - INFO - Step 96800 -- 🔄 Training Metrics 2025-08-30 21:48:10 - pico-train - INFO - ├── Loss: 4.7508 2025-08-30 21:48:10 - pico-train - INFO - ├── Learning Rate: 2.00e-05 2025-08-30 21:48:10 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 21:49:03 - pico-train - INFO - Step 96900 -- 🔄 Training Metrics 2025-08-30 21:49:03 - pico-train - INFO - ├── Loss: 4.7688 2025-08-30 21:49:03 - pico-train - INFO - ├── Learning Rate: 2.00e-05 2025-08-30 21:49:03 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 21:49:56 - pico-train - INFO - Step 97000 -- 🔄 Training Metrics 2025-08-30 21:49:56 - pico-train - INFO - ├── Loss: 4.7887 2025-08-30 21:49:56 - pico-train - INFO - ├── Learning Rate: 2.00e-05 2025-08-30 21:49:56 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 21:50:49 - pico-train - INFO - Step 97100 -- 🔄 Training Metrics 2025-08-30 21:50:49 - pico-train - INFO - ├── Loss: 4.7774 2025-08-30 21:50:49 - pico-train - INFO - ├── Learning Rate: 2.00e-05 2025-08-30 21:50:49 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 21:51:43 - pico-train - INFO - Step 97200 -- 🔄 Training Metrics 2025-08-30 21:51:43 - pico-train - INFO - ├── Loss: 4.7731 2025-08-30 21:51:43 - pico-train - INFO - ├── Learning Rate: 2.00e-05 2025-08-30 21:51:43 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 21:52:34 - pico-train - INFO - Step 97300 -- 🔄 Training Metrics 2025-08-30 21:52:34 - pico-train - INFO - ├── Loss: 4.7823 2025-08-30 21:52:34 - pico-train - INFO - ├── Learning Rate: 2.00e-05 2025-08-30 21:52:34 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 21:53:27 - pico-train - INFO - Step 97400 -- 🔄 Training Metrics 2025-08-30 21:53:27 - pico-train - INFO - ├── Loss: 4.7782 2025-08-30 21:53:27 - pico-train - INFO - ├── Learning Rate: 2.00e-05 2025-08-30 21:53:27 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 21:54:20 - pico-train - INFO - Step 97500 -- 🔄 Training Metrics 2025-08-30 21:54:20 - pico-train - INFO - ├── 
Loss: 4.7935 2025-08-30 21:54:20 - pico-train - INFO - ├── Learning Rate: 2.00e-05 2025-08-30 21:54:20 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 21:55:13 - pico-train - INFO - Step 97600 -- 🔄 Training Metrics 2025-08-30 21:55:13 - pico-train - INFO - ├── Loss: 4.7908 2025-08-30 21:55:13 - pico-train - INFO - ├── Learning Rate: 2.00e-05 2025-08-30 21:55:13 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 21:56:07 - pico-train - INFO - Step 97700 -- 🔄 Training Metrics 2025-08-30 21:56:07 - pico-train - INFO - ├── Loss: 4.7824 2025-08-30 21:56:07 - pico-train - INFO - ├── Learning Rate: 2.00e-05 2025-08-30 21:56:07 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 21:56:59 - pico-train - INFO - Step 97800 -- 🔄 Training Metrics 2025-08-30 21:56:59 - pico-train - INFO - ├── Loss: 4.7913 2025-08-30 21:56:59 - pico-train - INFO - ├── Learning Rate: 2.00e-05 2025-08-30 21:56:59 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 21:57:53 - pico-train - INFO - Step 97900 -- 🔄 Training Metrics 2025-08-30 21:57:53 - pico-train - INFO - ├── Loss: 4.7547 2025-08-30 21:57:53 - pico-train - INFO - ├── Learning Rate: 2.00e-05 2025-08-30 21:57:53 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 21:58:45 - pico-train - INFO - Step 98000 -- 💾 Saving Checkpoint 2025-08-30 22:00:46 - pico-train - INFO - Step 98000 -- 📊 Evaluation Results 2025-08-30 22:00:46 - pico-train - INFO - └── paloma: inf 2025-08-30 22:00:46 - pico-train - INFO - Step 98000 -- 🔄 Training Metrics 2025-08-30 22:00:46 - pico-train - INFO - ├── Loss: 4.7784 2025-08-30 22:00:46 - pico-train - INFO - ├── Learning Rate: 2.00e-05 2025-08-30 22:00:46 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 22:00:46 - pico-train - INFO - Step 98000 -- 📈 Saving Learning Dynamics 2025-08-30 22:01:41 - pico-train - INFO - Step 98100 -- 🔄 Training Metrics 2025-08-30 22:01:41 - pico-train - INFO - ├── Loss: 4.7555 2025-08-30 22:01:41 - pico-train - INFO - ├── Learning Rate: 2.00e-05 2025-08-30 22:01:41 
- pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 22:02:33 - pico-train - INFO - Step 98200 -- 🔄 Training Metrics 2025-08-30 22:02:33 - pico-train - INFO - ├── Loss: 4.7774 2025-08-30 22:02:33 - pico-train - INFO - ├── Learning Rate: 2.00e-05 2025-08-30 22:02:33 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 22:03:25 - pico-train - INFO - Step 98300 -- 🔄 Training Metrics 2025-08-30 22:03:25 - pico-train - INFO - ├── Loss: 4.7961 2025-08-30 22:03:25 - pico-train - INFO - ├── Learning Rate: 2.00e-05 2025-08-30 22:03:25 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 22:04:17 - pico-train - INFO - Step 98400 -- 🔄 Training Metrics 2025-08-30 22:04:17 - pico-train - INFO - ├── Loss: 4.7770 2025-08-30 22:04:17 - pico-train - INFO - ├── Learning Rate: 2.00e-05 2025-08-30 22:04:17 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 22:05:09 - pico-train - INFO - Step 98500 -- 🔄 Training Metrics 2025-08-30 22:05:09 - pico-train - INFO - ├── Loss: 4.7789 2025-08-30 22:05:09 - pico-train - INFO - ├── Learning Rate: 2.00e-05 2025-08-30 22:05:09 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 22:06:01 - pico-train - INFO - Step 98600 -- 🔄 Training Metrics 2025-08-30 22:06:01 - pico-train - INFO - ├── Loss: 4.7968 2025-08-30 22:06:01 - pico-train - INFO - ├── Learning Rate: 2.00e-05 2025-08-30 22:06:01 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 22:06:53 - pico-train - INFO - Step 98700 -- 🔄 Training Metrics 2025-08-30 22:06:53 - pico-train - INFO - ├── Loss: 4.7691 2025-08-30 22:06:53 - pico-train - INFO - ├── Learning Rate: 2.00e-05 2025-08-30 22:06:53 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 22:07:45 - pico-train - INFO - Step 98800 -- 🔄 Training Metrics 2025-08-30 22:07:45 - pico-train - INFO - ├── Loss: 4.7841 2025-08-30 22:07:45 - pico-train - INFO - ├── Learning Rate: 2.00e-05 2025-08-30 22:07:45 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 22:08:37 - pico-train - INFO - Step 98900 -- 🔄 Training Metrics 2025-08-30 
22:08:37 - pico-train - INFO - ├── Loss: 4.7785 2025-08-30 22:08:37 - pico-train - INFO - ├── Learning Rate: 2.00e-05 2025-08-30 22:08:37 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 22:09:28 - pico-train - INFO - Step 99000 -- 🔄 Training Metrics 2025-08-30 22:09:28 - pico-train - INFO - ├── Loss: 4.7770 2025-08-30 22:09:28 - pico-train - INFO - ├── Learning Rate: 2.00e-05 2025-08-30 22:09:28 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 22:10:20 - pico-train - INFO - Step 99100 -- 🔄 Training Metrics 2025-08-30 22:10:20 - pico-train - INFO - ├── Loss: 4.7774 2025-08-30 22:10:20 - pico-train - INFO - ├── Learning Rate: 2.00e-05 2025-08-30 22:10:20 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 22:11:12 - pico-train - INFO - Step 99200 -- 🔄 Training Metrics 2025-08-30 22:11:12 - pico-train - INFO - ├── Loss: 4.7946 2025-08-30 22:11:12 - pico-train - INFO - ├── Learning Rate: 2.00e-05 2025-08-30 22:11:12 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 22:12:04 - pico-train - INFO - Step 99300 -- 🔄 Training Metrics 2025-08-30 22:12:04 - pico-train - INFO - ├── Loss: 4.7804 2025-08-30 22:12:04 - pico-train - INFO - ├── Learning Rate: 2.00e-05 2025-08-30 22:12:04 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 22:12:56 - pico-train - INFO - Step 99400 -- 🔄 Training Metrics 2025-08-30 22:12:56 - pico-train - INFO - ├── Loss: 4.7579 2025-08-30 22:12:56 - pico-train - INFO - ├── Learning Rate: 2.00e-05 2025-08-30 22:12:56 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 22:13:48 - pico-train - INFO - Step 99500 -- 🔄 Training Metrics 2025-08-30 22:13:48 - pico-train - INFO - ├── Loss: 4.7916 2025-08-30 22:13:48 - pico-train - INFO - ├── Learning Rate: 2.00e-05 2025-08-30 22:13:48 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 22:14:40 - pico-train - INFO - Step 99600 -- 🔄 Training Metrics 2025-08-30 22:14:40 - pico-train - INFO - ├── Loss: 4.7512 2025-08-30 22:14:40 - pico-train - INFO - ├── Learning Rate: 2.00e-05 2025-08-30 
22:14:40 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 22:15:32 - pico-train - INFO - Step 99700 -- 🔄 Training Metrics 2025-08-30 22:15:32 - pico-train - INFO - ├── Loss: 4.7774 2025-08-30 22:15:32 - pico-train - INFO - ├── Learning Rate: 2.00e-05 2025-08-30 22:15:32 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 22:16:24 - pico-train - INFO - Step 99800 -- 🔄 Training Metrics 2025-08-30 22:16:24 - pico-train - INFO - ├── Loss: 4.7938 2025-08-30 22:16:24 - pico-train - INFO - ├── Learning Rate: 2.00e-05 2025-08-30 22:16:24 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 22:17:16 - pico-train - INFO - Step 99900 -- 🔄 Training Metrics 2025-08-30 22:17:16 - pico-train - INFO - ├── Loss: 4.7923 2025-08-30 22:17:16 - pico-train - INFO - ├── Learning Rate: 2.00e-05 2025-08-30 22:17:16 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 22:18:07 - pico-train - INFO - Step 100000 -- 💾 Saving Checkpoint 2025-08-30 22:19:57 - pico-train - INFO - Step 100000 -- 📊 Evaluation Results 2025-08-30 22:19:57 - pico-train - INFO - └── paloma: inf 2025-08-30 22:19:57 - pico-train - INFO - 🎉 Training complete! Final step: 100000