2025-08-30 15:43:27 - pico-train - INFO - Step 62000 -- 📊 Evaluation Results
2025-08-30 15:43:27 - pico-train - INFO - └── paloma: inf
2025-08-30 15:43:28 - pico-train - INFO - ==================================================
2025-08-30 15:43:28 - pico-train - INFO - ✨ Training Configuration
2025-08-30 15:43:28 - pico-train - INFO - ==================================================
2025-08-30 15:43:28 - pico-train - INFO - ╭─────────────────────────────────────────────────────╮
2025-08-30 15:43:28 - pico-train - INFO - │ checkpointing:                                      │
2025-08-30 15:43:28 - pico-train - INFO - │   checkpoints_dir: checkpoints                      │
2025-08-30 15:43:28 - pico-train - INFO - │   evaluation:                                       │
2025-08-30 15:43:28 - pico-train - INFO - │     eval_results_dir: eval_results                  │
2025-08-30 15:43:28 - pico-train - INFO - │   fabric_checkpoint_dir: fabric_state               │
2025-08-30 15:43:28 - pico-train - INFO - │   fabric_checkpoint_filename: checkpoint.pt         │
2025-08-30 15:43:28 - pico-train - INFO - │   hf_checkpoint:                                    │
2025-08-30 15:43:28 - pico-train - INFO - │     collection_slug: null                           │
2025-08-30 15:43:28 - pico-train - INFO - │     repo_id: ThomasTheMaker/pico-decoder-tiny       │
2025-08-30 15:43:28 - pico-train - INFO - │   learning_dynamics:                                │
2025-08-30 15:43:28 - pico-train - INFO - │     batch_size: 1                                   │
2025-08-30 15:43:28 - pico-train - INFO - │     eval_data: null                                 │
2025-08-30 15:43:28 - pico-train - INFO - │     layer_suffixes:                                 │
2025-08-30 15:43:28 - pico-train - INFO - │     - attention.v_proj                              │
2025-08-30 15:43:28 - pico-train - INFO - │     - attention.o_proj                              │
2025-08-30 15:43:28 - pico-train - INFO - │     - swiglu.w_2                                    │
2025-08-30 15:43:28 - pico-train - INFO - │     sequence_idx: -1                                │
2025-08-30 15:43:28 - pico-train - INFO - │   learning_dynamics_dir: learning_dynamics          │
2025-08-30 15:43:28 - pico-train - INFO - │   logs_dir: logs                                    │
2025-08-30 15:43:28 - pico-train - INFO - │   run_name: pico-decoder-tiny-dolma10M-v1           │
2025-08-30 15:43:28 - pico-train - INFO - │   runs_dir: runs                                    │
2025-08-30 15:43:28 - pico-train - INFO - │   save_every_n_steps: 2000                          │
2025-08-30 15:43:28 - pico-train - INFO - │   save_to_hf: true                                  │
2025-08-30 15:43:28 - pico-train - INFO - │   training:                                         │
2025-08-30 15:43:28 - pico-train - INFO - │     auto_resume: true                               │
2025-08-30 15:43:28 - pico-train - INFO - │ data:                                               │
2025-08-30 15:43:28 - pico-train - INFO - │   dataloader:                                       │
2025-08-30 15:43:28 - pico-train - INFO - │     batch_size: 16                                  │
2025-08-30 15:43:28 - pico-train - INFO - │   dataset:                                          │
2025-08-30 15:43:28 - pico-train - INFO - │     name: ThomasTheMaker/pretokenized-dolma-10M     │
2025-08-30 15:43:28 - pico-train - INFO - │   tokenizer:                                        │
2025-08-30 15:43:28 - pico-train - INFO - │     name: allenai/OLMo-7B-0724-hf                   │
2025-08-30 15:43:28 - pico-train - INFO - │     vocab_size: 50304                               │
2025-08-30 15:43:28 - pico-train - INFO - │ evaluation:                                         │
2025-08-30 15:43:28 - pico-train - INFO - │   metrics:                                          │
2025-08-30 15:43:28 - pico-train - INFO - │   - paloma                                          │
2025-08-30 15:43:28 - pico-train - INFO - │   paloma:                                           │
2025-08-30 15:43:28 - pico-train - INFO - │     batch_size: 1                                   │
2025-08-30 15:43:28 - pico-train - INFO - │     dataset_name: pico-lm/pretokenized-paloma-tinsy │
2025-08-30 15:43:28 - pico-train - INFO - │     dataset_split: val                              │
2025-08-30 15:43:28 - pico-train - INFO - │     max_length: 2048                                │
2025-08-30 15:43:28 - pico-train - INFO - │ model:                                              │
2025-08-30 15:43:28 - pico-train - INFO - │   activation_hidden_dim: 384                        │
2025-08-30 15:43:28 - pico-train - INFO - │   attention_n_heads: 12                             │
2025-08-30 15:43:28 - pico-train - INFO - │   attention_n_kv_heads: 4                           │
2025-08-30 15:43:28 - pico-train - INFO - │   batch_size: 1024                                  │
2025-08-30 15:43:28 - pico-train - INFO - │   d_model: 96                                       │
2025-08-30 15:43:28 - pico-train - INFO - │   max_seq_len: 2048                                 │
2025-08-30 15:43:28 - pico-train - INFO - │   model_type: pico_decoder                          │
2025-08-30 15:43:28 - pico-train - INFO - │   n_layers: 12                                      │
2025-08-30 15:43:28 - pico-train - INFO - │   norm_eps: 1.0e-06                                 │
2025-08-30 15:43:28 - pico-train - INFO - │   position_emb_theta: 10000.0                       │
2025-08-30 15:43:28 - pico-train - INFO - │   vocab_size: 50304                                 │
2025-08-30 15:43:28 - pico-train - INFO - │ monitoring:                                         │
2025-08-30 15:43:28 - pico-train - INFO - │   logging:                                          │
2025-08-30 15:43:28 - pico-train - INFO - │     log_every_n_steps: 100                          │
2025-08-30 15:43:28 - pico-train - INFO - │     log_level: INFO                                 │
2025-08-30 15:43:28 - pico-train - INFO - │   save_to_wandb: false                              │
2025-08-30 15:43:28 - pico-train - INFO - │   wandb:                                            │
2025-08-30 15:43:28 - pico-train - INFO - │     entity: boymyc                                  │
2025-08-30 15:43:28 - pico-train - INFO - │     project: pico-decoder-tiny                      │
2025-08-30 15:43:28 - pico-train - INFO - │ training:                                           │
2025-08-30 15:43:28 - pico-train - INFO - │   fabric:                                           │
2025-08-30 15:43:28 - pico-train - INFO - │     accelerator: cuda                               │
2025-08-30 15:43:28 - pico-train - INFO - │     num_devices: 1                                  │
2025-08-30 15:43:28 - pico-train - INFO - │     num_nodes: 1                                    │
2025-08-30 15:43:28 - pico-train - INFO - │     precision: bf16-mixed                           │
2025-08-30 15:43:28 - pico-train - INFO - │   max_steps: 100000                                 │
2025-08-30 15:43:28 - pico-train - INFO - │   optimization:                                     │
2025-08-30 15:43:28 - pico-train - INFO - │     gradient_accumulation_steps: 1                  │
2025-08-30 15:43:28 - pico-train - INFO - │     lr: 0.0002                                      │
2025-08-30 15:43:28 - pico-train - INFO - │     lr_scheduler: cosine                            │
2025-08-30 15:43:28 - pico-train - INFO - │     lr_warmup_steps: 2000                           │
2025-08-30 15:43:28 - pico-train - INFO - │     optimizer: adamw                                │
2025-08-30 15:43:28 - pico-train - INFO - │                                                     │
2025-08-30 15:43:28 - pico-train - INFO - ╰─────────────────────────────────────────────────────╯
2025-08-30 15:43:28 - pico-train - INFO - ==================================================
2025-08-30 15:43:28 - pico-train - INFO - ⛭ Runtime Summary:
2025-08-30 15:43:28 - pico-train - INFO - ==================================================
2025-08-30 15:43:28 - pico-train - INFO - Starting from step: 62000
2025-08-30 15:43:28 - pico-train - INFO - Model Setup:
2025-08-30 15:43:28 - pico-train - INFO - └─ Total Parameters: 11,282,784
2025-08-30 15:43:28 - pico-train - INFO - └─ Trainable Parameters: 11,282,784
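Both the parameter total in the Model Setup summary and the learning rates printed in the training metrics can be reproduced from the config. The Python sketch below is a hedged cross-check and is not pico-train's own code. It assumes a model layout that the log does not state: bias-free linear layers, grouped-query attention with head_dim = d_model / attention_n_heads, a three-matrix SwiGLU MLP, two RMSNorm weight vectors per block plus a final norm, and an output projection that is not tied to the token embedding. For the learning rate it assumes linear warmup followed by cosine decay to zero at max_steps. Under these assumptions, the parameter count matches 11,282,784 exactly, and the learning rates match the logged values to the three significant digits printed. The names `count_params`, `cosine_lr`, and `LOGGED_LR` are illustrative.

```python
import math

# Hyperparameters copied from the logged config.
VOCAB_SIZE = 50304
D_MODEL = 96
N_LAYERS = 12
N_HEADS = 12
N_KV_HEADS = 4
FFN_HIDDEN = 384  # activation_hidden_dim

# Learning rates printed in this log at a few steps.
LOGGED_LR = {62000: 6.55e-05, 64000: 5.95e-05, 70000: 4.28e-05, 72000: 3.77e-05}


def count_params() -> int:
    """Parameter count under an assumed bias-free decoder layout (see lead-in)."""
    head_dim = D_MODEL // N_HEADS       # 96 / 12 = 8
    kv_dim = N_KV_HEADS * head_dim      # 4 KV heads * 8 = 32 (grouped-query attention)
    # q_proj and o_proj map d_model -> d_model; k_proj and v_proj map d_model -> kv_dim.
    attention = 2 * D_MODEL * D_MODEL + 2 * D_MODEL * kv_dim
    # SwiGLU: two d_model -> hidden projections plus one hidden -> d_model projection.
    swiglu = 3 * D_MODEL * FFN_HIDDEN
    norms = 2 * D_MODEL                 # assumed: two RMSNorm weight vectors per block
    per_layer = attention + swiglu + norms
    embedding = VOCAB_SIZE * D_MODEL    # input token embedding
    lm_head = VOCAB_SIZE * D_MODEL      # output projection, assumed untied from the embedding
    final_norm = D_MODEL
    return N_LAYERS * per_layer + embedding + lm_head + final_norm


def cosine_lr(step: int, peak_lr: float = 2e-4, warmup: int = 2000,
              max_steps: int = 100_000) -> float:
    """Assumed schedule: linear warmup, then cosine decay from peak_lr to zero."""
    if step < warmup:
        return peak_lr * step / warmup
    progress = (step - warmup) / (max_steps - warmup)
    return peak_lr * 0.5 * (1.0 + math.cos(math.pi * progress))


if __name__ == "__main__":
    print(f"total parameters: {count_params():,}")
    for step, logged in LOGGED_LR.items():
        print(f"step {step}: computed {cosine_lr(step):.2e}, logged {logged:.2e}")
```

In this breakdown, the token embedding and output projection together hold 9,658,368 of the 11,282,784 parameters. That leaves 1,624,320 for the 12 transformer blocks, so at d_model = 96 the vocabulary-sized matrices account for most of the parameter budget.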
2025-08-30 15:43:28 - pico-train - INFO - Distributed Setup:
2025-08-30 15:43:28 - pico-train - INFO - └─ Number of Devices: 1
2025-08-30 15:43:28 - pico-train - INFO - └─ Device Type: NVIDIA H100 80GB HBM3
2025-08-30 15:43:28 - pico-train - INFO - └─ Available Memory: 85.03 GB
2025-08-30 15:43:28 - pico-train - INFO - Software Setup:
2025-08-30 15:43:28 - pico-train - INFO - └─ Python Version: 3.12.3
2025-08-30 15:43:28 - pico-train - INFO - └─ PyTorch Version: 2.8.0+cu128
2025-08-30 15:43:28 - pico-train - INFO - └─ CUDA Version: 12.8
2025-08-30 15:43:28 - pico-train - INFO - └─ Operating System: Linux 6.8.0-71-generic
2025-08-30 15:43:28 - pico-train - INFO - Batch Size Configuration:
2025-08-30 15:43:28 - pico-train - INFO - └─ Global Batch Size: 16
2025-08-30 15:43:28 - pico-train - INFO - └─ Per Device Batch Size: 16
2025-08-30 15:43:28 - pico-train - INFO - └─ Gradient Accumulation Steps: 1
2025-08-30 15:43:28 - pico-train - INFO - ==================================================
2025-08-30 15:43:29 - pico-train - INFO - Step 62000 -- 🔄 Training Metrics
2025-08-30 15:43:29 - pico-train - INFO - ├── Loss: 4.5970
2025-08-30 15:43:29 - pico-train - INFO - ├── Learning Rate: 6.55e-05
2025-08-30 15:43:29 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-30 15:43:29 - pico-train - INFO - Step 62000 -- 📈 Saving Learning Dynamics
2025-08-30 15:44:25 - pico-train - INFO - Step 62100 -- 🔄 Training Metrics
2025-08-30 15:44:25 - pico-train - INFO - ├── Loss: 4.8133
2025-08-30 15:44:25 - pico-train - INFO - ├── Learning Rate: 6.52e-05
2025-08-30 15:44:25 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-30 15:45:17 - pico-train - INFO - Step 62200 -- 🔄 Training Metrics
2025-08-30 15:45:17 - pico-train - INFO - ├── Loss: 4.8221
2025-08-30 15:45:17 - pico-train - INFO - ├── Learning Rate: 6.49e-05
2025-08-30 15:45:17 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-30 15:46:09 - pico-train - INFO - Step 62300 -- 🔄 Training Metrics
2025-08-30 15:46:09 - pico-train - INFO - ├── Loss: 4.8068
2025-08-30 15:46:09 - pico-train - INFO - ├── Learning Rate: 6.46e-05
2025-08-30 15:46:09 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-30 15:47:01 - pico-train - INFO - Step 62400 -- 🔄 Training Metrics
2025-08-30 15:47:01 - pico-train - INFO - ├── Loss: 4.7858
2025-08-30 15:47:01 - pico-train - INFO - ├── Learning Rate: 6.43e-05
2025-08-30 15:47:01 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-30 15:47:53 - pico-train - INFO - Step 62500 -- 🔄 Training Metrics
2025-08-30 15:47:53 - pico-train - INFO - ├── Loss: 4.8460
2025-08-30 15:47:53 - pico-train - INFO - ├── Learning Rate: 6.40e-05
2025-08-30 15:47:53 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-30 15:48:45 - pico-train - INFO - Step 62600 -- 🔄 Training Metrics
2025-08-30 15:48:45 - pico-train - INFO - ├── Loss: 4.8264
2025-08-30 15:48:45 - pico-train - INFO - ├── Learning Rate: 6.37e-05
2025-08-30 15:48:45 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-30 15:49:37 - pico-train - INFO - Step 62700 -- 🔄 Training Metrics
2025-08-30 15:49:37 - pico-train - INFO - ├── Loss: 4.8266
2025-08-30 15:49:37 - pico-train - INFO - ├── Learning Rate: 6.34e-05
2025-08-30 15:49:37 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-30 15:50:29 - pico-train - INFO - Step 62800 -- 🔄 Training Metrics
2025-08-30 15:50:29 - pico-train - INFO - ├── Loss: 4.8317
2025-08-30 15:50:29 - pico-train - INFO - ├── Learning Rate: 6.31e-05
2025-08-30 15:50:29 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-30 15:51:20 - pico-train - INFO - Step 62900 -- 🔄 Training Metrics
2025-08-30 15:51:20 - pico-train - INFO - ├── Loss: 4.8337
2025-08-30 15:51:20 - pico-train - INFO - ├── Learning Rate: 6.28e-05
2025-08-30 15:51:20 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-30 15:52:12 - pico-train - INFO - Step 63000 -- 🔄 Training Metrics
2025-08-30 15:52:12 - pico-train - INFO - ├── Loss: 4.8183
2025-08-30 15:52:12 - pico-train - INFO - ├── Learning Rate: 6.25e-05
2025-08-30 15:52:12 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-30 15:53:04 - pico-train - INFO - Step 63100 -- 🔄 Training Metrics
2025-08-30 15:53:04 - pico-train - INFO - ├── Loss: 4.8177
2025-08-30 15:53:04 - pico-train - INFO - ├── Learning Rate: 6.22e-05
2025-08-30 15:53:04 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-30 15:53:56 - pico-train - INFO - Step 63200 -- 🔄 Training Metrics
2025-08-30 15:53:56 - pico-train - INFO - ├── Loss: 4.8094
2025-08-30 15:53:56 - pico-train - INFO - ├── Learning Rate: 6.19e-05
2025-08-30 15:53:56 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-30 15:54:48 - pico-train - INFO - Step 63300 -- 🔄 Training Metrics
2025-08-30 15:54:48 - pico-train - INFO - ├── Loss: 4.8294
2025-08-30 15:54:48 - pico-train - INFO - ├── Learning Rate: 6.16e-05
2025-08-30 15:54:48 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-30 15:55:40 - pico-train - INFO - Step 63400 -- 🔄 Training Metrics
2025-08-30 15:55:40 - pico-train - INFO - ├── Loss: 4.8073
2025-08-30 15:55:40 - pico-train - INFO - ├── Learning Rate: 6.13e-05
2025-08-30 15:55:40 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-30 15:56:32 - pico-train - INFO - Step 63500 -- 🔄 Training Metrics
2025-08-30 15:56:32 - pico-train - INFO - ├── Loss: 4.8364
2025-08-30 15:56:32 - pico-train - INFO - ├── Learning Rate: 6.10e-05
2025-08-30 15:56:32 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-30 15:57:23 - pico-train - INFO - Step 63600 -- 🔄 Training Metrics
2025-08-30 15:57:23 - pico-train - INFO - ├── Loss: 4.8236
2025-08-30 15:57:23 - pico-train - INFO - ├── Learning Rate: 6.07e-05
2025-08-30 15:57:23 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-30 15:58:15 - pico-train - INFO - Step 63700 -- 🔄 Training Metrics
2025-08-30 15:58:15 - pico-train - INFO - ├── Loss: 4.8114
2025-08-30 15:58:15 - pico-train - INFO - ├── Learning Rate: 6.04e-05
2025-08-30 15:58:15 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-30 15:59:07 - pico-train - INFO - Step 63800 -- 🔄 Training Metrics
2025-08-30 15:59:07 - pico-train - INFO - ├── Loss: 4.8078
2025-08-30 15:59:07 - pico-train - INFO - ├── Learning Rate: 6.01e-05
2025-08-30 15:59:07 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-30 15:59:59 - pico-train - INFO - Step 63900 -- 🔄 Training Metrics
2025-08-30 15:59:59 - pico-train - INFO - ├── Loss: 4.8107
2025-08-30 15:59:59 - pico-train - INFO - ├── Learning Rate: 5.98e-05
2025-08-30 15:59:59 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-30 16:00:50 - pico-train - INFO - Step 64000 -- 💾 Saving Checkpoint
2025-08-30 16:02:54 - pico-train - INFO - Step 64000 -- 📊 Evaluation Results
2025-08-30 16:02:54 - pico-train - INFO - └── paloma: inf
2025-08-30 16:02:56 - pico-train - INFO - Step 64000 -- 🔄 Training Metrics
2025-08-30 16:02:56 - pico-train - INFO - ├── Loss: 4.8145
2025-08-30 16:02:56 - pico-train - INFO - ├── Learning Rate: 5.95e-05
2025-08-30 16:02:56 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-30 16:02:56 - pico-train - INFO - Step 64000 -- 📈 Saving Learning Dynamics
2025-08-30 16:03:52 - pico-train - INFO - Step 64100 -- 🔄 Training Metrics
2025-08-30 16:03:52 - pico-train - INFO - ├── Loss: 4.8479
2025-08-30 16:03:52 - pico-train - INFO - ├── Learning Rate: 5.92e-05
2025-08-30 16:03:52 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-30 16:04:44 - pico-train - INFO - Step 64200 -- 🔄 Training Metrics
2025-08-30 16:04:44 - pico-train - INFO - ├── Loss: 4.8139
2025-08-30 16:04:44 - pico-train - INFO - ├── Learning Rate: 5.89e-05
2025-08-30 16:04:44 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-30 16:05:36 - pico-train - INFO - Step 64300 -- 🔄 Training Metrics
2025-08-30 16:05:36 - pico-train - INFO - ├── Loss: 4.7867
2025-08-30 16:05:36 - pico-train - INFO - ├── Learning Rate: 5.86e-05
2025-08-30 16:05:36 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-30 16:06:28 - pico-train - INFO - Step 64400 -- 🔄 Training Metrics
2025-08-30 16:06:28 - pico-train - INFO - ├── Loss: 4.8168
2025-08-30 16:06:28 - pico-train - INFO - ├── Learning Rate: 5.84e-05
2025-08-30 16:06:28 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-30 16:07:20 - pico-train - INFO - Step 64500 -- 🔄 Training Metrics
2025-08-30 16:07:20 - pico-train - INFO - ├── Loss: 4.8131
2025-08-30 16:07:20 - pico-train - INFO - ├── Learning Rate: 5.81e-05
2025-08-30 16:07:20 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-30 16:08:12 - pico-train - INFO - Step 64600 -- 🔄 Training Metrics
2025-08-30 16:08:12 - pico-train - INFO - ├── Loss: 4.8285
2025-08-30 16:08:12 - pico-train - INFO - ├── Learning Rate: 5.78e-05
2025-08-30 16:08:12 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-30 16:09:04 - pico-train - INFO - Step 64700 -- 🔄 Training Metrics
2025-08-30 16:09:04 - pico-train - INFO - ├── Loss: 4.8170
2025-08-30 16:09:04 - pico-train - INFO - ├── Learning Rate: 5.75e-05
2025-08-30 16:09:04 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-30 16:09:56 - pico-train - INFO - Step 64800 -- 🔄 Training Metrics
2025-08-30 16:09:56 - pico-train - INFO - ├── Loss: 4.8317
2025-08-30 16:09:56 - pico-train - INFO - ├── Learning Rate: 5.72e-05
2025-08-30 16:09:56 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-30 16:10:48 - pico-train - INFO - Step 64900 -- 🔄 Training Metrics
2025-08-30 16:10:48 - pico-train - INFO - ├── Loss: 4.8368
2025-08-30 16:10:48 - pico-train - INFO - ├── Learning Rate: 5.69e-05
2025-08-30 16:10:48 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-30 16:11:40 - pico-train - INFO - Step 65000 -- 🔄 Training Metrics
2025-08-30 16:11:40 - pico-train - INFO - ├── Loss: 4.8129
2025-08-30 16:11:40 - pico-train - INFO - ├── Learning Rate: 5.66e-05
2025-08-30 16:11:40 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-30 16:12:32 - pico-train - INFO - Step 65100 -- 🔄 Training Metrics
2025-08-30 16:12:32 - pico-train - INFO - ├── Loss: 4.8226
2025-08-30 16:12:32 - pico-train - INFO - ├── Learning Rate: 5.63e-05
2025-08-30 16:12:32 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-30 16:13:24 - pico-train - INFO - Step 65200 -- 🔄 Training Metrics
2025-08-30 16:13:24 - pico-train - INFO - ├── Loss: 4.8321
2025-08-30 16:13:24 - pico-train - INFO - ├── Learning Rate: 5.60e-05
2025-08-30 16:13:24 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-30 16:14:16 - pico-train - INFO - Step 65300 -- 🔄 Training Metrics
2025-08-30 16:14:16 - pico-train - INFO - ├── Loss: 4.8352
2025-08-30 16:14:16 - pico-train - INFO - ├── Learning Rate: 5.57e-05
2025-08-30 16:14:16 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-30 16:15:08 - pico-train - INFO - Step 65400 -- 🔄 Training Metrics
2025-08-30 16:15:08 - pico-train - INFO - ├── Loss: 4.8119
2025-08-30 16:15:08 - pico-train - INFO - ├── Learning Rate: 5.55e-05
2025-08-30 16:15:08 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-30 16:16:00 - pico-train - INFO - Step 65500 -- 🔄 Training Metrics
2025-08-30 16:16:00 - pico-train - INFO - ├── Loss: 4.7889
2025-08-30 16:16:00 - pico-train - INFO - ├── Learning Rate: 5.52e-05
2025-08-30 16:16:00 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-30 16:16:52 - pico-train - INFO - Step 65600 -- 🔄 Training Metrics
2025-08-30 16:16:52 - pico-train - INFO - ├── Loss: 4.8119
2025-08-30 16:16:52 - pico-train - INFO - ├── Learning Rate: 5.49e-05
2025-08-30 16:16:52 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-30 16:17:44 - pico-train - INFO - Step 65700 -- 🔄 Training Metrics
2025-08-30 16:17:44 - pico-train - INFO - ├── Loss: 4.8193
2025-08-30 16:17:44 - pico-train - INFO - ├── Learning Rate: 5.46e-05
2025-08-30 16:17:44 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-30 16:18:35 - pico-train - INFO - Step 65800 -- 🔄 Training Metrics
2025-08-30 16:18:35 - pico-train - INFO - ├── Loss: 4.8121
2025-08-30 16:18:35 - pico-train - INFO - ├── Learning Rate: 5.43e-05
2025-08-30 16:18:35 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-30 16:19:27 - pico-train - INFO - Step 65900 -- 🔄 Training Metrics
2025-08-30 16:19:27 - pico-train - INFO - ├── Loss: 4.8057
2025-08-30 16:19:27 - pico-train - INFO - ├── Learning Rate: 5.40e-05
2025-08-30 16:19:27 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-30 16:20:19 - pico-train - INFO - Step 66000 -- 💾 Saving Checkpoint
2025-08-30 16:22:18 - pico-train - INFO - Step 66000 -- 📊 Evaluation Results
2025-08-30 16:22:18 - pico-train - INFO - └── paloma: inf
2025-08-30 16:22:20 - pico-train - INFO - Step 66000 -- 🔄 Training Metrics
2025-08-30 16:22:20 - pico-train - INFO - ├── Loss: 4.8260
2025-08-30 16:22:20 - pico-train - INFO - ├── Learning Rate: 5.37e-05
2025-08-30 16:22:20 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-30 16:22:20 - pico-train - INFO - Step 66000 -- 📈 Saving Learning Dynamics
2025-08-30 16:23:16 - pico-train - INFO - Step 66100 -- 🔄 Training Metrics
2025-08-30 16:23:16 - pico-train - INFO - ├── Loss: 4.8110
2025-08-30 16:23:16 - pico-train - INFO - ├── Learning Rate: 5.35e-05
2025-08-30 16:23:16 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-30 16:24:09 - pico-train - INFO - Step 66200 -- 🔄 Training Metrics
2025-08-30 16:24:09 - pico-train - INFO - ├── Loss: 4.8156
2025-08-30 16:24:09 - pico-train - INFO - ├── Learning Rate: 5.32e-05
2025-08-30 16:24:09 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-30 16:25:02 - pico-train - INFO - Step 66300 -- 🔄 Training Metrics
2025-08-30 16:25:02 - pico-train - INFO - ├── Loss: 4.7928
2025-08-30 16:25:02 - pico-train - INFO - ├── Learning Rate: 5.29e-05
2025-08-30 16:25:02 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-30 16:25:55 - pico-train - INFO - Step 66400 -- 🔄 Training Metrics
2025-08-30 16:25:55 - pico-train - INFO - ├── Loss: 4.8202
2025-08-30 16:25:55 - pico-train - INFO - ├── Learning Rate: 5.26e-05
2025-08-30 16:25:55 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-30 16:26:49 - pico-train - INFO - Step 66500 -- 🔄 Training Metrics
2025-08-30 16:26:49 - pico-train - INFO - ├── Loss: 4.8117
2025-08-30 16:26:49 - pico-train - INFO - ├── Learning Rate: 5.23e-05
2025-08-30 16:26:49 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-30 16:27:42 - pico-train - INFO - Step 66600 -- 🔄 Training Metrics
2025-08-30 16:27:42 - pico-train - INFO - ├── Loss: 4.8047
2025-08-30 16:27:42 - pico-train - INFO - ├── Learning Rate: 5.20e-05
2025-08-30 16:27:42 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-30 16:28:34 - pico-train - INFO - Step 66700 -- 🔄 Training Metrics
2025-08-30 16:28:34 - pico-train - INFO - ├── Loss: 4.7995
2025-08-30 16:28:34 - pico-train - INFO - ├── Learning Rate: 5.18e-05
2025-08-30 16:28:34 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-30 16:29:28 - pico-train - INFO - Step 66800 -- 🔄 Training Metrics
2025-08-30 16:29:28 - pico-train - INFO - ├── Loss: 4.8074
2025-08-30 16:29:28 - pico-train - INFO - ├── Learning Rate: 5.15e-05
2025-08-30 16:29:28 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-30 16:30:21 - pico-train - INFO - Step 66900 -- 🔄 Training Metrics
2025-08-30 16:30:21 - pico-train - INFO - ├── Loss: 4.7890
2025-08-30 16:30:21 - pico-train - INFO - ├── Learning Rate: 5.12e-05
2025-08-30 16:30:21 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-30 16:31:14 - pico-train - INFO - Step 67000 -- 🔄 Training Metrics
2025-08-30 16:31:14 - pico-train - INFO - ├── Loss: 4.8216
2025-08-30 16:31:14 - pico-train - INFO - ├── Learning Rate: 5.09e-05
2025-08-30 16:31:14 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-30 16:32:07 - pico-train - INFO - Step 67100 -- 🔄 Training Metrics
2025-08-30 16:32:07 - pico-train - INFO - ├── Loss: 4.8034
2025-08-30 16:32:07 - pico-train - INFO - ├── Learning Rate: 5.06e-05
2025-08-30 16:32:07 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-30 16:32:59 - pico-train - INFO - Step 67200 -- 🔄 Training Metrics
2025-08-30 16:32:59 - pico-train - INFO - ├── Loss: 4.8062
2025-08-30 16:32:59 - pico-train - INFO - ├── Learning Rate: 5.04e-05
2025-08-30 16:32:59 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-30 16:33:51 - pico-train - INFO - Step 67300 -- 🔄 Training Metrics
2025-08-30 16:33:51 - pico-train - INFO - ├── Loss: 4.8106
2025-08-30 16:33:51 - pico-train - INFO - ├── Learning Rate: 5.01e-05
2025-08-30 16:33:51 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-30 16:34:43 - pico-train - INFO - Step 67400 -- 🔄 Training Metrics
2025-08-30 16:34:43 - pico-train - INFO - ├── Loss: 4.8168
2025-08-30 16:34:43 - pico-train - INFO - ├── Learning Rate: 4.98e-05
2025-08-30 16:34:43 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-30 16:35:36 - pico-train - INFO - Step 67500 -- 🔄 Training Metrics
2025-08-30 16:35:36 - pico-train - INFO - ├── Loss: 4.7968
2025-08-30 16:35:36 - pico-train - INFO - ├── Learning Rate: 4.95e-05
2025-08-30 16:35:36 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-30 16:36:27 - pico-train - INFO - Step 67600 -- 🔄 Training Metrics
2025-08-30 16:36:27 - pico-train - INFO - ├── Loss: 4.7905
2025-08-30 16:36:27 - pico-train - INFO - ├── Learning Rate: 4.93e-05
2025-08-30 16:36:27 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-30 16:37:19 - pico-train - INFO - Step 67700 -- 🔄 Training Metrics
2025-08-30 16:37:19 - pico-train - INFO - ├── Loss: 4.8253
2025-08-30 16:37:19 - pico-train - INFO - ├── Learning Rate: 4.90e-05
2025-08-30 16:37:19 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-30 16:38:11 - pico-train - INFO - Step 67800 -- 🔄 Training Metrics
2025-08-30 16:38:11 - pico-train - INFO - ├── Loss: 4.7848
2025-08-30 16:38:11 - pico-train - INFO - ├── Learning Rate: 4.87e-05
2025-08-30 16:38:11 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-30 16:39:03 - pico-train - INFO - Step 67900 -- 🔄 Training Metrics
2025-08-30 16:39:03 - pico-train - INFO - ├── Loss: 4.8165
2025-08-30 16:39:03 - pico-train - INFO - ├── Learning Rate: 4.84e-05
2025-08-30 16:39:03 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-30 16:39:55 - pico-train - INFO - Step 68000 -- 💾 Saving Checkpoint
2025-08-30 16:42:09 - pico-train - INFO - Step 68000 -- 📊 Evaluation Results
2025-08-30 16:42:09 - pico-train - INFO - └── paloma: inf
2025-08-30 16:42:10 - pico-train - INFO - Step 68000 -- 🔄 Training Metrics
2025-08-30 16:42:10 - pico-train - INFO - ├── Loss: 4.8264
2025-08-30 16:42:10 - pico-train - INFO - ├── Learning Rate: 4.82e-05
2025-08-30 16:42:10 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-30 16:42:10 - pico-train - INFO - Step 68000 -- 📈 Saving Learning Dynamics
2025-08-30 16:43:07 - pico-train - INFO - Step 68100 -- 🔄 Training Metrics
2025-08-30 16:43:07 - pico-train - INFO - ├── Loss: 4.8363
2025-08-30 16:43:07 - pico-train - INFO - ├── Learning Rate: 4.79e-05
2025-08-30 16:43:07 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-30 16:43:59 - pico-train - INFO - Step 68200 -- 🔄 Training Metrics
2025-08-30 16:43:59 - pico-train - INFO - ├── Loss: 4.7964
2025-08-30 16:43:59 - pico-train - INFO - ├── Learning Rate: 4.76e-05
2025-08-30 16:43:59 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-30 16:44:51 - pico-train - INFO - Step 68300 -- 🔄 Training Metrics
2025-08-30 16:44:51 - pico-train - INFO - ├── Loss: 4.7999
2025-08-30 16:44:51 - pico-train - INFO - ├── Learning Rate: 4.73e-05
2025-08-30 16:44:51 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-30 16:45:43 - pico-train - INFO - Step 68400 -- 🔄 Training Metrics
2025-08-30 16:45:43 - pico-train - INFO - ├── Loss: 4.8119
2025-08-30 16:45:43 - pico-train - INFO - ├── Learning Rate: 4.71e-05
2025-08-30 16:45:43 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-30 16:46:35 - pico-train - INFO - Step 68500 -- 🔄 Training Metrics
2025-08-30 16:46:35 - pico-train - INFO - ├── Loss: 4.7998
2025-08-30 16:46:35 - pico-train - INFO - ├── Learning Rate: 4.68e-05
2025-08-30 16:46:35 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-30 16:47:27 - pico-train - INFO - Step 68600 -- 🔄 Training Metrics
2025-08-30 16:47:27 - pico-train - INFO - ├── Loss: 4.8010
2025-08-30 16:47:27 - pico-train - INFO - ├── Learning Rate: 4.65e-05
2025-08-30 16:47:27 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-30 16:48:19 - pico-train - INFO - Step 68700 -- 🔄 Training Metrics
2025-08-30 16:48:19 - pico-train - INFO - ├── Loss: 4.7986
2025-08-30 16:48:19 - pico-train - INFO - ├── Learning Rate: 4.63e-05
2025-08-30 16:48:19 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-30 16:49:12 - pico-train - INFO - Step 68800 -- 🔄 Training Metrics
2025-08-30 16:49:12 - pico-train - INFO - ├── Loss: 4.8133
2025-08-30 16:49:12 - pico-train - INFO - ├── Learning Rate: 4.60e-05
2025-08-30 16:49:12 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-30 16:50:05 - pico-train - INFO - Step 68900 -- 🔄 Training Metrics
2025-08-30 16:50:05 - pico-train - INFO - ├── Loss: 4.7944
2025-08-30 16:50:05 - pico-train - INFO - ├── Learning Rate: 4.57e-05
2025-08-30 16:50:05 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-30 16:50:58 - pico-train - INFO - Step 69000 -- 🔄 Training Metrics
2025-08-30 16:50:58 - pico-train - INFO - ├── Loss: 4.8021
2025-08-30 16:50:58 - pico-train - INFO - ├── Learning Rate: 4.54e-05
2025-08-30 16:50:58 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-30 16:51:51 - pico-train - INFO - Step 69100 -- 🔄 Training Metrics
2025-08-30 16:51:51 - pico-train - INFO - ├── Loss: 4.7611
2025-08-30 16:51:51 - pico-train - INFO - ├── Learning Rate: 4.52e-05
2025-08-30 16:51:51 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-30 16:52:44 - pico-train - INFO - Step 69200 -- 🔄 Training Metrics
2025-08-30 16:52:44 - pico-train - INFO - ├── Loss: 4.7981
2025-08-30 16:52:44 - pico-train - INFO - ├── Learning Rate: 4.49e-05
2025-08-30 16:52:44 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-30 16:53:38 - pico-train - INFO - Step 69300 -- 🔄 Training Metrics
2025-08-30 16:53:38 - pico-train - INFO - ├── Loss: 4.8066
2025-08-30 16:53:38 - pico-train - INFO - ├── Learning Rate: 4.46e-05
2025-08-30 16:53:38 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-30 16:54:31 - pico-train - INFO - Step 69400 -- 🔄 Training Metrics
2025-08-30 16:54:31 - pico-train - INFO - ├── Loss: 4.8053
2025-08-30 16:54:31 - pico-train - INFO - ├── Learning Rate: 4.44e-05
2025-08-30 16:54:31 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-30 16:55:23 - pico-train - INFO - Step 69500 -- 🔄 Training Metrics
2025-08-30 16:55:23 - pico-train - INFO - ├── Loss: 4.7953
2025-08-30 16:55:23 - pico-train - INFO - ├── Learning Rate: 4.41e-05
2025-08-30 16:55:23 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-30 16:56:16 - pico-train - INFO - Step 69600 -- 🔄 Training Metrics
2025-08-30 16:56:16 - pico-train - INFO - ├── Loss: 4.8087
2025-08-30 16:56:16 - pico-train - INFO - ├── Learning Rate: 4.38e-05
2025-08-30 16:56:16 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-30 16:57:10 - pico-train - INFO - Step 69700 -- 🔄 Training Metrics
2025-08-30 16:57:10 - pico-train - INFO - ├── Loss: 4.7915
2025-08-30 16:57:10 - pico-train - INFO - ├── Learning Rate: 4.36e-05
2025-08-30 16:57:10 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-30 16:58:03 - pico-train - INFO - Step 69800 -- 🔄 Training Metrics
2025-08-30 16:58:03 - pico-train - INFO - ├── Loss: 4.8145
2025-08-30 16:58:03 - pico-train - INFO - ├── Learning Rate: 4.33e-05
2025-08-30 16:58:03 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-30 16:58:56 - pico-train - INFO - Step 69900 -- 🔄 Training Metrics
2025-08-30 16:58:56 - pico-train - INFO - ├── Loss: 4.8056
2025-08-30 16:58:56 - pico-train - INFO - ├── Learning Rate: 4.31e-05
2025-08-30 16:58:56 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-30 16:59:48 - pico-train - INFO - Step 70000 -- 💾 Saving Checkpoint
2025-08-30 17:01:50 - pico-train - INFO - Step 70000 -- 📊 Evaluation Results
2025-08-30 17:01:50 - pico-train - INFO - └── paloma: inf
2025-08-30 17:01:52 - pico-train - INFO - Step 70000 -- 🔄 Training Metrics
2025-08-30 17:01:52 - pico-train - INFO - ├── Loss: 4.7898
2025-08-30 17:01:52 - pico-train - INFO - ├── Learning Rate: 4.28e-05
2025-08-30 17:01:52 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-30 17:01:52 - pico-train - INFO - Step 70000 -- 📈 Saving Learning Dynamics
2025-08-30 17:02:48 - pico-train - INFO - Step 70100 -- 🔄 Training Metrics
2025-08-30 17:02:48 - pico-train - INFO - ├── Loss: 4.7929
2025-08-30 17:02:48 - pico-train - INFO - ├── Learning Rate: 4.25e-05
2025-08-30 17:02:48 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-30 17:03:40 - pico-train - INFO - Step 70200 -- 🔄 Training Metrics
2025-08-30 17:03:40 - pico-train - INFO - ├── Loss: 4.8215
2025-08-30 17:03:40 - pico-train - INFO - ├── Learning Rate: 4.23e-05
2025-08-30 17:03:40 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-30 17:04:32 - pico-train - INFO - Step 70300 -- 🔄 Training Metrics
2025-08-30 17:04:32 - pico-train - INFO - ├── Loss: 4.8139
2025-08-30 17:04:32 - pico-train - INFO - ├── Learning Rate: 4.20e-05
2025-08-30 17:04:32 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-30 17:05:24 - pico-train - INFO - Step 70400 -- 🔄 Training Metrics
2025-08-30 17:05:24 - pico-train - INFO - ├── Loss: 4.7922
2025-08-30 17:05:24 - pico-train - INFO - ├── Learning Rate: 4.17e-05
2025-08-30 17:05:24 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-30 17:06:16 - pico-train - INFO - Step 70500 -- 🔄 Training Metrics
2025-08-30 17:06:16 - pico-train - INFO - ├── Loss: 4.7923
2025-08-30 17:06:16 - pico-train - INFO - ├── Learning Rate: 4.15e-05
2025-08-30 17:06:16 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-30 17:07:08 - pico-train - INFO - Step 70600 -- 🔄 Training Metrics
2025-08-30 17:07:08 - pico-train - INFO - ├── Loss: 4.8075
2025-08-30 17:07:08 - pico-train - INFO - ├── Learning Rate: 4.12e-05
2025-08-30 17:07:08 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-30 17:08:00 - pico-train - INFO - Step 70700 -- 🔄 Training Metrics
2025-08-30 17:08:00 - pico-train - INFO - ├── Loss: 4.7833
2025-08-30 17:08:00 - pico-train - INFO - ├── Learning Rate: 4.10e-05
2025-08-30 17:08:00 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-30 17:08:52 - pico-train - INFO - Step 70800 -- 🔄 Training Metrics
2025-08-30 17:08:52 - pico-train - INFO - ├── Loss: 4.8036
2025-08-30 17:08:52 - pico-train - INFO - ├── Learning Rate: 4.07e-05
2025-08-30 17:08:52 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-30 17:09:44 - pico-train - INFO - Step 70900 -- 🔄 Training Metrics
2025-08-30 17:09:44 - pico-train - INFO - ├── Loss: 4.7910
2025-08-30 17:09:44 - pico-train - INFO - ├── Learning Rate: 4.04e-05
2025-08-30 17:09:44 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-30 17:10:36 - pico-train - INFO - Step 71000 -- 🔄 Training Metrics
2025-08-30 17:10:36 - pico-train - INFO - ├── Loss: 4.7723
2025-08-30 17:10:36 - pico-train - INFO - ├── Learning Rate: 4.02e-05
2025-08-30 17:10:36 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-30 17:11:28 - pico-train - INFO - Step 71100 -- 🔄 Training Metrics
2025-08-30 17:11:28 - pico-train - INFO - ├── Loss: 4.7768
2025-08-30 17:11:28 - pico-train - INFO - ├── Learning Rate: 3.99e-05
2025-08-30 17:11:28 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-30 17:12:19 - pico-train - INFO - Step 71200 -- 🔄 Training Metrics
2025-08-30 17:12:19 - pico-train - INFO - ├── Loss: 4.7984
2025-08-30 17:12:19 - pico-train - INFO - ├── Learning Rate: 3.97e-05
2025-08-30 17:12:19 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-30 17:13:11 - pico-train - INFO - Step 71300 -- 🔄 Training Metrics
2025-08-30 17:13:11 - pico-train - INFO - ├── Loss: 4.7825
2025-08-30 17:13:11 - pico-train - INFO - ├── Learning Rate: 3.94e-05
2025-08-30 17:13:11 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-30 17:14:03 - pico-train - INFO - Step 71400 -- 🔄 Training Metrics
2025-08-30 17:14:03 - pico-train - INFO - ├── Loss: 4.8093
2025-08-30 17:14:03 - pico-train - INFO - ├── Learning Rate: 3.92e-05
2025-08-30 17:14:03 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-30 17:14:55 - pico-train - INFO - Step 71500 -- 🔄 Training Metrics
2025-08-30 17:14:55 - pico-train - INFO - ├── Loss: 4.7903
2025-08-30 17:14:55 - pico-train - INFO - ├── Learning Rate: 3.89e-05
2025-08-30 17:14:55 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-30 17:15:47 - pico-train - INFO - Step 71600 -- 🔄 Training Metrics
2025-08-30 17:15:47 - pico-train - INFO - ├── Loss: 4.8269
2025-08-30 17:15:47 - pico-train - INFO - ├── Learning Rate: 3.87e-05
2025-08-30 17:15:47 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-30 17:16:39 - pico-train - INFO - Step 71700 -- 🔄 Training Metrics
2025-08-30 17:16:39 - pico-train - INFO - ├── Loss: 4.8135
2025-08-30 17:16:39 - pico-train - INFO - ├── Learning Rate: 3.84e-05
2025-08-30 17:16:39 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-30 17:17:31 - pico-train - INFO - Step 71800 -- 🔄 Training Metrics
2025-08-30 17:17:31 - pico-train - INFO - ├── Loss: 4.7759
2025-08-30 17:17:31 - pico-train - INFO - ├── Learning Rate: 3.82e-05
2025-08-30 17:17:31 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-30 17:18:22 - pico-train - INFO - Step 71900 -- 🔄 Training Metrics
2025-08-30 17:18:22 - pico-train - INFO - ├── Loss: 4.7837
2025-08-30 17:18:22 - pico-train - INFO - ├── Learning Rate: 3.79e-05
2025-08-30 17:18:22 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-30 17:19:15 - pico-train - INFO - Step 72000 -- 💾 Saving Checkpoint
2025-08-30 17:21:27 - pico-train - INFO - Step 72000 -- 📊 Evaluation Results
2025-08-30 17:21:27 - pico-train - INFO - └── paloma: inf
2025-08-30 17:21:28 - pico-train - INFO - Step 72000 -- 🔄 Training Metrics
2025-08-30 17:21:28 - pico-train - INFO - ├── Loss: 4.8016
2025-08-30 17:21:28 - pico-train - INFO - ├── Learning Rate: 3.77e-05
2025-08-30 17:21:28 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-30 17:21:28 - pico-train - INFO - Step 72000 -- 📈 Saving Learning Dynamics
2025-08-30 17:22:25 - pico-train - INFO - Step 72100 -- 🔄 Training Metrics
2025-08-30 17:22:25 - pico-train - INFO - ├── Loss: 4.7643
2025-08-30 17:22:25 - pico-train - INFO - ├── Learning Rate: 3.74e-05
2025-08-30 17:22:25 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-30 17:23:16 - pico-train - INFO - Step 72200 -- 🔄 Training Metrics
2025-08-30 17:23:16 - pico-train - INFO - ├── Loss: 4.7938
2025-08-30 17:23:16 - 
pico-train - INFO - ├── Learning Rate: 3.72e-05 2025-08-30 17:23:16 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 17:24:08 - pico-train - INFO - Step 72300 -- 🔄 Training Metrics 2025-08-30 17:24:08 - pico-train - INFO - ├── Loss: 4.7962 2025-08-30 17:24:08 - pico-train - INFO - ├── Learning Rate: 3.69e-05 2025-08-30 17:24:08 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 17:25:00 - pico-train - INFO - Step 72400 -- 🔄 Training Metrics 2025-08-30 17:25:00 - pico-train - INFO - ├── Loss: 4.8089 2025-08-30 17:25:00 - pico-train - INFO - ├── Learning Rate: 3.67e-05 2025-08-30 17:25:00 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 17:25:52 - pico-train - INFO - Step 72500 -- 🔄 Training Metrics 2025-08-30 17:25:52 - pico-train - INFO - ├── Loss: 4.8081 2025-08-30 17:25:52 - pico-train - INFO - ├── Learning Rate: 3.64e-05 2025-08-30 17:25:52 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 17:26:44 - pico-train - INFO - Step 72600 -- 🔄 Training Metrics 2025-08-30 17:26:44 - pico-train - INFO - ├── Loss: 4.8095 2025-08-30 17:26:44 - pico-train - INFO - ├── Learning Rate: 3.62e-05 2025-08-30 17:26:44 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 17:27:36 - pico-train - INFO - Step 72700 -- 🔄 Training Metrics 2025-08-30 17:27:36 - pico-train - INFO - ├── Loss: 4.8020 2025-08-30 17:27:36 - pico-train - INFO - ├── Learning Rate: 3.59e-05 2025-08-30 17:27:36 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 17:28:28 - pico-train - INFO - Step 72800 -- 🔄 Training Metrics 2025-08-30 17:28:28 - pico-train - INFO - ├── Loss: 4.7579 2025-08-30 17:28:28 - pico-train - INFO - ├── Learning Rate: 3.57e-05 2025-08-30 17:28:28 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 17:29:20 - pico-train - INFO - Step 72900 -- 🔄 Training Metrics 2025-08-30 17:29:20 - pico-train - INFO - ├── Loss: 4.7869 2025-08-30 17:29:20 - pico-train - INFO - ├── Learning Rate: 3.54e-05 2025-08-30 17:29:20 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 17:30:12 
- pico-train - INFO - Step 73000 -- 🔄 Training Metrics 2025-08-30 17:30:12 - pico-train - INFO - ├── Loss: 4.7825 2025-08-30 17:30:12 - pico-train - INFO - ├── Learning Rate: 3.52e-05 2025-08-30 17:30:12 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 17:31:03 - pico-train - INFO - Step 73100 -- 🔄 Training Metrics 2025-08-30 17:31:03 - pico-train - INFO - ├── Loss: 4.8111 2025-08-30 17:31:03 - pico-train - INFO - ├── Learning Rate: 3.49e-05 2025-08-30 17:31:03 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 17:31:55 - pico-train - INFO - Step 73200 -- 🔄 Training Metrics 2025-08-30 17:31:55 - pico-train - INFO - ├── Loss: 4.8028 2025-08-30 17:31:55 - pico-train - INFO - ├── Learning Rate: 3.47e-05 2025-08-30 17:31:55 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 17:32:47 - pico-train - INFO - Step 73300 -- 🔄 Training Metrics 2025-08-30 17:32:47 - pico-train - INFO - ├── Loss: 4.8025 2025-08-30 17:32:47 - pico-train - INFO - ├── Learning Rate: 3.44e-05 2025-08-30 17:32:47 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 17:33:39 - pico-train - INFO - Step 73400 -- 🔄 Training Metrics 2025-08-30 17:33:39 - pico-train - INFO - ├── Loss: 4.7917 2025-08-30 17:33:39 - pico-train - INFO - ├── Learning Rate: 3.42e-05 2025-08-30 17:33:39 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 17:34:31 - pico-train - INFO - Step 73500 -- 🔄 Training Metrics 2025-08-30 17:34:31 - pico-train - INFO - ├── Loss: 4.7851 2025-08-30 17:34:31 - pico-train - INFO - ├── Learning Rate: 3.40e-05 2025-08-30 17:34:31 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 17:35:23 - pico-train - INFO - Step 73600 -- 🔄 Training Metrics 2025-08-30 17:35:23 - pico-train - INFO - ├── Loss: 4.7807 2025-08-30 17:35:23 - pico-train - INFO - ├── Learning Rate: 3.37e-05 2025-08-30 17:35:23 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 17:36:15 - pico-train - INFO - Step 73700 -- 🔄 Training Metrics 2025-08-30 17:36:15 - pico-train - INFO - ├── Loss: 4.7741 2025-08-30 
17:36:15 - pico-train - INFO - ├── Learning Rate: 3.35e-05 2025-08-30 17:36:15 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 17:37:07 - pico-train - INFO - Step 73800 -- 🔄 Training Metrics 2025-08-30 17:37:07 - pico-train - INFO - ├── Loss: 4.8076 2025-08-30 17:37:07 - pico-train - INFO - ├── Learning Rate: 3.32e-05 2025-08-30 17:37:07 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 17:37:59 - pico-train - INFO - Step 73900 -- 🔄 Training Metrics 2025-08-30 17:37:59 - pico-train - INFO - ├── Loss: 4.8119 2025-08-30 17:37:59 - pico-train - INFO - ├── Learning Rate: 3.30e-05 2025-08-30 17:37:59 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 17:38:50 - pico-train - INFO - Step 74000 -- 💾 Saving Checkpoint 2025-08-30 17:40:51 - pico-train - INFO - Step 74000 -- 📊 Evaluation Results 2025-08-30 17:40:51 - pico-train - INFO - └── paloma: inf 2025-08-30 17:40:53 - pico-train - INFO - Step 74000 -- 🔄 Training Metrics 2025-08-30 17:40:53 - pico-train - INFO - ├── Loss: 4.7960 2025-08-30 17:40:53 - pico-train - INFO - ├── Learning Rate: 3.28e-05 2025-08-30 17:40:53 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 17:40:53 - pico-train - INFO - Step 74000 -- 📈 Saving Learning Dynamics 2025-08-30 17:41:49 - pico-train - INFO - Step 74100 -- 🔄 Training Metrics 2025-08-30 17:41:49 - pico-train - INFO - ├── Loss: 4.7909 2025-08-30 17:41:49 - pico-train - INFO - ├── Learning Rate: 3.25e-05 2025-08-30 17:41:49 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 17:42:42 - pico-train - INFO - Step 74200 -- 🔄 Training Metrics 2025-08-30 17:42:42 - pico-train - INFO - ├── Loss: 4.7807 2025-08-30 17:42:42 - pico-train - INFO - ├── Learning Rate: 3.23e-05 2025-08-30 17:42:42 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 17:43:36 - pico-train - INFO - Step 74300 -- 🔄 Training Metrics 2025-08-30 17:43:36 - pico-train - INFO - ├── Loss: 4.7711 2025-08-30 17:43:36 - pico-train - INFO - ├── Learning Rate: 3.21e-05 2025-08-30 17:43:36 - pico-train - INFO - 
└── Inf/NaN count: 0 2025-08-30 17:44:29 - pico-train - INFO - Step 74400 -- 🔄 Training Metrics 2025-08-30 17:44:29 - pico-train - INFO - ├── Loss: 4.7837 2025-08-30 17:44:29 - pico-train - INFO - ├── Learning Rate: 3.18e-05 2025-08-30 17:44:29 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 17:45:21 - pico-train - INFO - Step 74500 -- 🔄 Training Metrics 2025-08-30 17:45:21 - pico-train - INFO - ├── Loss: 4.7668 2025-08-30 17:45:21 - pico-train - INFO - ├── Learning Rate: 3.16e-05 2025-08-30 17:45:21 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 17:46:15 - pico-train - INFO - Step 74600 -- 🔄 Training Metrics 2025-08-30 17:46:15 - pico-train - INFO - ├── Loss: 4.7985 2025-08-30 17:46:15 - pico-train - INFO - ├── Learning Rate: 3.14e-05 2025-08-30 17:46:15 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 17:47:08 - pico-train - INFO - Step 74700 -- 🔄 Training Metrics 2025-08-30 17:47:08 - pico-train - INFO - ├── Loss: 4.7702 2025-08-30 17:47:08 - pico-train - INFO - ├── Learning Rate: 3.11e-05 2025-08-30 17:47:08 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 17:48:01 - pico-train - INFO - Step 74800 -- 🔄 Training Metrics 2025-08-30 17:48:01 - pico-train - INFO - ├── Loss: 4.8002 2025-08-30 17:48:01 - pico-train - INFO - ├── Learning Rate: 3.09e-05 2025-08-30 17:48:01 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 17:48:54 - pico-train - INFO - Step 74900 -- 🔄 Training Metrics 2025-08-30 17:48:54 - pico-train - INFO - ├── Loss: 4.7955 2025-08-30 17:48:54 - pico-train - INFO - ├── Learning Rate: 3.07e-05 2025-08-30 17:48:54 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 17:49:48 - pico-train - INFO - Step 75000 -- 🔄 Training Metrics 2025-08-30 17:49:48 - pico-train - INFO - ├── Loss: 4.8023 2025-08-30 17:49:48 - pico-train - INFO - ├── Learning Rate: 3.04e-05 2025-08-30 17:49:48 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 17:50:41 - pico-train - INFO - Step 75100 -- 🔄 Training Metrics 2025-08-30 17:50:41 - pico-train - 
INFO - ├── Loss: 4.7842 2025-08-30 17:50:41 - pico-train - INFO - ├── Learning Rate: 3.02e-05 2025-08-30 17:50:41 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 17:51:34 - pico-train - INFO - Step 75200 -- 🔄 Training Metrics 2025-08-30 17:51:34 - pico-train - INFO - ├── Loss: 4.7890 2025-08-30 17:51:34 - pico-train - INFO - ├── Learning Rate: 3.00e-05 2025-08-30 17:51:34 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 17:52:27 - pico-train - INFO - Step 75300 -- 🔄 Training Metrics 2025-08-30 17:52:27 - pico-train - INFO - ├── Loss: 4.8004 2025-08-30 17:52:27 - pico-train - INFO - ├── Learning Rate: 2.97e-05 2025-08-30 17:52:27 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 17:53:20 - pico-train - INFO - Step 75400 -- 🔄 Training Metrics 2025-08-30 17:53:20 - pico-train - INFO - ├── Loss: 4.7917 2025-08-30 17:53:20 - pico-train - INFO - ├── Learning Rate: 2.95e-05 2025-08-30 17:53:20 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 17:54:13 - pico-train - INFO - Step 75500 -- 🔄 Training Metrics 2025-08-30 17:54:13 - pico-train - INFO - ├── Loss: 4.7867 2025-08-30 17:54:13 - pico-train - INFO - ├── Learning Rate: 2.93e-05 2025-08-30 17:54:13 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 17:55:07 - pico-train - INFO - Step 75600 -- 🔄 Training Metrics 2025-08-30 17:55:07 - pico-train - INFO - ├── Loss: 4.7957 2025-08-30 17:55:07 - pico-train - INFO - ├── Learning Rate: 2.91e-05 2025-08-30 17:55:07 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 17:56:00 - pico-train - INFO - Step 75700 -- 🔄 Training Metrics 2025-08-30 17:56:00 - pico-train - INFO - ├── Loss: 4.7840 2025-08-30 17:56:00 - pico-train - INFO - ├── Learning Rate: 2.88e-05 2025-08-30 17:56:00 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 17:56:56 - pico-train - INFO - Step 75800 -- 🔄 Training Metrics 2025-08-30 17:56:56 - pico-train - INFO - ├── Loss: 4.7990 2025-08-30 17:56:56 - pico-train - INFO - ├── Learning Rate: 2.86e-05 2025-08-30 17:56:56 - pico-train - 
INFO - └── Inf/NaN count: 0 2025-08-30 17:57:48 - pico-train - INFO - Step 75900 -- 🔄 Training Metrics 2025-08-30 17:57:48 - pico-train - INFO - ├── Loss: 4.7904 2025-08-30 17:57:48 - pico-train - INFO - ├── Learning Rate: 2.84e-05 2025-08-30 17:57:48 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 17:58:41 - pico-train - INFO - Step 76000 -- 💾 Saving Checkpoint 2025-08-30 18:01:59 - pico-train - INFO - Step 76000 -- 📊 Evaluation Results 2025-08-30 18:01:59 - pico-train - INFO - └── paloma: inf 2025-08-30 18:02:00 - pico-train - INFO - Step 76000 -- 🔄 Training Metrics 2025-08-30 18:02:00 - pico-train - INFO - ├── Loss: 4.7972 2025-08-30 18:02:00 - pico-train - INFO - ├── Learning Rate: 2.82e-05 2025-08-30 18:02:00 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 18:02:00 - pico-train - INFO - Step 76000 -- 📈 Saving Learning Dynamics 2025-08-30 18:03:04 - pico-train - INFO - Step 76100 -- 🔄 Training Metrics 2025-08-30 18:03:04 - pico-train - INFO - ├── Loss: 4.7730 2025-08-30 18:03:04 - pico-train - INFO - ├── Learning Rate: 2.79e-05 2025-08-30 18:03:04 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 18:03:56 - pico-train - INFO - Step 76200 -- 🔄 Training Metrics 2025-08-30 18:03:56 - pico-train - INFO - ├── Loss: 4.7997 2025-08-30 18:03:56 - pico-train - INFO - ├── Learning Rate: 2.77e-05 2025-08-30 18:03:56 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 18:04:48 - pico-train - INFO - Step 76300 -- 🔄 Training Metrics 2025-08-30 18:04:48 - pico-train - INFO - ├── Loss: 4.7843 2025-08-30 18:04:48 - pico-train - INFO - ├── Learning Rate: 2.75e-05 2025-08-30 18:04:48 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 18:05:40 - pico-train - INFO - Step 76400 -- 🔄 Training Metrics 2025-08-30 18:05:40 - pico-train - INFO - ├── Loss: 4.7858 2025-08-30 18:05:40 - pico-train - INFO - ├── Learning Rate: 2.73e-05 2025-08-30 18:05:40 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 18:06:32 - pico-train - INFO - Step 76500 -- 🔄 Training Metrics 
2025-08-30 18:06:32 - pico-train - INFO - ├── Loss: 4.8110 2025-08-30 18:06:32 - pico-train - INFO - ├── Learning Rate: 2.71e-05 2025-08-30 18:06:32 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 18:07:24 - pico-train - INFO - Step 76600 -- 🔄 Training Metrics 2025-08-30 18:07:24 - pico-train - INFO - ├── Loss: 4.7834 2025-08-30 18:07:24 - pico-train - INFO - ├── Learning Rate: 2.68e-05 2025-08-30 18:07:24 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 18:08:16 - pico-train - INFO - Step 76700 -- 🔄 Training Metrics 2025-08-30 18:08:16 - pico-train - INFO - ├── Loss: 4.7936 2025-08-30 18:08:16 - pico-train - INFO - ├── Learning Rate: 2.66e-05 2025-08-30 18:08:16 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 18:09:08 - pico-train - INFO - Step 76800 -- 🔄 Training Metrics 2025-08-30 18:09:08 - pico-train - INFO - ├── Loss: 4.7869 2025-08-30 18:09:08 - pico-train - INFO - ├── Learning Rate: 2.64e-05 2025-08-30 18:09:08 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 18:10:00 - pico-train - INFO - Step 76900 -- 🔄 Training Metrics 2025-08-30 18:10:00 - pico-train - INFO - ├── Loss: 4.7979 2025-08-30 18:10:00 - pico-train - INFO - ├── Learning Rate: 2.62e-05 2025-08-30 18:10:00 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 18:10:54 - pico-train - INFO - Step 77000 -- 🔄 Training Metrics 2025-08-30 18:10:54 - pico-train - INFO - ├── Loss: 4.7956 2025-08-30 18:10:54 - pico-train - INFO - ├── Learning Rate: 2.60e-05 2025-08-30 18:10:54 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 18:11:46 - pico-train - INFO - Step 77100 -- 🔄 Training Metrics 2025-08-30 18:11:46 - pico-train - INFO - ├── Loss: 4.7974 2025-08-30 18:11:46 - pico-train - INFO - ├── Learning Rate: 2.58e-05 2025-08-30 18:11:46 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 18:12:38 - pico-train - INFO - Step 77200 -- 🔄 Training Metrics 2025-08-30 18:12:38 - pico-train - INFO - ├── Loss: 4.8074 2025-08-30 18:12:38 - pico-train - INFO - ├── Learning Rate: 2.55e-05 
2025-08-30 18:12:38 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 18:13:30 - pico-train - INFO - Step 77300 -- 🔄 Training Metrics 2025-08-30 18:13:30 - pico-train - INFO - ├── Loss: 4.8276 2025-08-30 18:13:30 - pico-train - INFO - ├── Learning Rate: 2.53e-05 2025-08-30 18:13:30 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 18:14:27 - pico-train - INFO - Step 77400 -- 🔄 Training Metrics 2025-08-30 18:14:27 - pico-train - INFO - ├── Loss: 4.7908 2025-08-30 18:14:27 - pico-train - INFO - ├── Learning Rate: 2.51e-05 2025-08-30 18:14:27 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 18:15:20 - pico-train - INFO - Step 77500 -- 🔄 Training Metrics 2025-08-30 18:15:20 - pico-train - INFO - ├── Loss: 4.8142 2025-08-30 18:15:20 - pico-train - INFO - ├── Learning Rate: 2.49e-05 2025-08-30 18:15:20 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 18:16:13 - pico-train - INFO - Step 77600 -- 🔄 Training Metrics 2025-08-30 18:16:13 - pico-train - INFO - ├── Loss: 4.8052 2025-08-30 18:16:13 - pico-train - INFO - ├── Learning Rate: 2.47e-05 2025-08-30 18:16:13 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 18:17:06 - pico-train - INFO - Step 77700 -- 🔄 Training Metrics 2025-08-30 18:17:06 - pico-train - INFO - ├── Loss: 4.7876 2025-08-30 18:17:06 - pico-train - INFO - ├── Learning Rate: 2.45e-05 2025-08-30 18:17:06 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 18:18:01 - pico-train - INFO - Step 77800 -- 🔄 Training Metrics 2025-08-30 18:18:01 - pico-train - INFO - ├── Loss: 4.8011 2025-08-30 18:18:01 - pico-train - INFO - ├── Learning Rate: 2.43e-05 2025-08-30 18:18:01 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 18:18:54 - pico-train - INFO - Step 77900 -- 🔄 Training Metrics 2025-08-30 18:18:54 - pico-train - INFO - ├── Loss: 4.7936 2025-08-30 18:18:54 - pico-train - INFO - ├── Learning Rate: 2.41e-05 2025-08-30 18:18:54 - pico-train - INFO - └── Inf/NaN count: 0 2025-08-30 18:19:47 - pico-train - INFO - Step 78000 -- 💾 Saving 
Checkpoint