2025-08-31 17:03:52 - pico-train - INFO - Step 100000 -- 📊 Evaluation Results
2025-08-31 17:03:52 - pico-train - INFO - └── paloma: inf
2025-08-31 17:03:52 - pico-train - INFO - ==================================================
2025-08-31 17:03:52 - pico-train - INFO - ✨ Training Configuration
2025-08-31 17:03:52 - pico-train - INFO - ==================================================
2025-08-31 17:03:52 - pico-train - INFO - ╭─────────────────────────────────────────────────────╮
2025-08-31 17:03:52 - pico-train - INFO - │ checkpointing:                                      │
2025-08-31 17:03:52 - pico-train - INFO - │   checkpoints_dir: checkpoints                      │
2025-08-31 17:03:52 - pico-train - INFO - │   evaluation:                                       │
2025-08-31 17:03:52 - pico-train - INFO - │     eval_results_dir: eval_results                  │
2025-08-31 17:03:52 - pico-train - INFO - │   fabric_checkpoint_dir: fabric_state               │
2025-08-31 17:03:52 - pico-train - INFO - │   fabric_checkpoint_filename: checkpoint.pt         │
2025-08-31 17:03:52 - pico-train - INFO - │   hf_checkpoint:                                    │
2025-08-31 17:03:52 - pico-train - INFO - │     collection_slug: null                           │
2025-08-31 17:03:52 - pico-train - INFO - │     repo_id: ThomasTheMaker/pico-decoder-tiny       │
2025-08-31 17:03:52 - pico-train - INFO - │   learning_dynamics:                                │
2025-08-31 17:03:52 - pico-train - INFO - │     batch_size: 1                                   │
2025-08-31 17:03:52 - pico-train - INFO - │     eval_data: null                                 │
2025-08-31 17:03:52 - pico-train - INFO - │     layer_suffixes:                                 │
2025-08-31 17:03:52 - pico-train - INFO - │     - attention.v_proj                              │
2025-08-31 17:03:52 - pico-train - INFO - │     - attention.o_proj                              │
2025-08-31 17:03:52 - pico-train - INFO - │     - swiglu.w_2                                    │
2025-08-31 17:03:52 - pico-train - INFO - │     sequence_idx: -1                                │
2025-08-31 17:03:52 - pico-train - INFO - │   learning_dynamics_dir: learning_dynamics          │
2025-08-31 17:03:52 - pico-train - INFO - │   logs_dir: logs                                    │
2025-08-31 17:03:52 - pico-train - INFO - │   run_name: pico-decoder-tiny-dolma250M-v1          │
2025-08-31 17:03:52 - pico-train - INFO - │   runs_dir: runs                                    │
2025-08-31 17:03:52 - pico-train - INFO - │   save_every_n_steps: 2000                          │
2025-08-31 17:03:52 - pico-train - INFO - │   save_to_hf: false                                 │
2025-08-31 17:03:52 - pico-train - INFO - │   training:                                         │
2025-08-31 17:03:52 - pico-train - INFO - │     auto_resume: true                               │
2025-08-31 17:03:52 - pico-train - INFO - │ data:                                               │
2025-08-31 17:03:52 - pico-train - INFO - │   dataloader:                                       │
2025-08-31 17:03:52 - pico-train - INFO - │     batch_size: 16                                  │
2025-08-31 17:03:52 - pico-train - INFO - │   dataset:                                          │
2025-08-31 17:03:52 - pico-train - INFO - │     name: pico-lm/pretokenized-dolma                │
2025-08-31 17:03:52 - pico-train - INFO - │   tokenizer:                                        │
2025-08-31 17:03:52 - pico-train - INFO - │     name: allenai/OLMo-7B-0724-hf                   │
2025-08-31 17:03:52 - pico-train - INFO - │     vocab_size: 50304                               │
2025-08-31 17:03:52 - pico-train - INFO - │ evaluation:                                         │
2025-08-31 17:03:52 - pico-train - INFO - │   metrics:                                          │
2025-08-31 17:03:52 - pico-train - INFO - │   - paloma                                          │
2025-08-31 17:03:52 - pico-train - INFO - │   paloma:                                           │
2025-08-31 17:03:52 - pico-train - INFO - │     batch_size: 1                                   │
2025-08-31 17:03:52 - pico-train - INFO - │     dataset_name: pico-lm/pretokenized-paloma-tinsy │
2025-08-31 17:03:52 - pico-train - INFO - │     dataset_split: val                              │
2025-08-31 17:03:52 - pico-train - INFO - │     max_length: 2048                                │
2025-08-31 17:03:52 - pico-train - INFO - │ model:                                              │
2025-08-31 17:03:52 - pico-train - INFO - │   activation_hidden_dim: 384                        │
2025-08-31 17:03:52 - pico-train - INFO - │   attention_n_heads: 12                             │
2025-08-31 17:03:52 - pico-train - INFO - │   attention_n_kv_heads: 4                           │
2025-08-31 17:03:52 - pico-train - INFO - │   batch_size: 1024                                  │
2025-08-31 17:03:52 - pico-train - INFO - │   d_model: 96                                       │
2025-08-31 17:03:52 - pico-train - INFO - │   max_seq_len: 2048                                 │
2025-08-31 17:03:52 - pico-train - INFO - │   model_type: pico_decoder                          │
2025-08-31 17:03:52 - pico-train - INFO - │   n_layers: 12                                      │
2025-08-31 17:03:52 - pico-train - INFO - │   norm_eps: 1.0e-06                                 │
2025-08-31 17:03:52 - pico-train - INFO - │   position_emb_theta: 10000.0                       │
2025-08-31 17:03:52 - pico-train - INFO - │   vocab_size: 50304                                 │
2025-08-31 17:03:52 - pico-train - INFO - │ monitoring:                                         │
2025-08-31 17:03:52 - pico-train - INFO - │   logging:                                          │
2025-08-31 17:03:52 - pico-train - INFO - │     log_every_n_steps: 100                          │
2025-08-31 17:03:52 - pico-train - INFO - │     log_level: INFO                                 │
2025-08-31 17:03:52 - pico-train - INFO - │   save_to_wandb: false                              │
2025-08-31 17:03:52 - pico-train - INFO - │   wandb:                                            │
2025-08-31 17:03:52 - pico-train - INFO - │     entity: boymyc                                  │
2025-08-31 17:03:52 - pico-train - INFO - │     project: pico-decoder-tiny                      │
2025-08-31 17:03:52 - pico-train - INFO - │ training:                                           │
2025-08-31 17:03:52 - pico-train - INFO - │   fabric:                                           │
2025-08-31 17:03:52 - pico-train - INFO - │     accelerator: cuda                               │
2025-08-31 17:03:52 - pico-train - INFO - │     num_devices: 1                                  │
2025-08-31 17:03:52 - pico-train - INFO - │     num_nodes: 1                                    │
2025-08-31 17:03:52 - pico-train - INFO - │     precision: bf16-mixed                           │
2025-08-31 17:03:52 - pico-train - INFO - │   max_steps: 100000                                 │
2025-08-31 17:03:52 - pico-train - INFO - │   optimization:                                     │
2025-08-31 17:03:52 - pico-train - INFO - │     gradient_accumulation_steps: 1                  │
2025-08-31 17:03:52 - pico-train - INFO - │     lr: 0.0002                                      │
2025-08-31 17:03:52 - pico-train - INFO - │     lr_scheduler: cosine                            │
2025-08-31 17:03:52 - pico-train - INFO - │     lr_warmup_steps: 2000                           │
2025-08-31 17:03:52 - pico-train - INFO - │     optimizer: adamw                                │
2025-08-31 17:03:52 - pico-train - INFO - │                                                     │
2025-08-31 17:03:52 - pico-train - INFO - ╰─────────────────────────────────────────────────────╯
2025-08-31 17:03:52 - pico-train - INFO - ==================================================
2025-08-31 17:03:52 - pico-train - INFO - ⛭ Runtime Summary:
2025-08-31 17:03:52 - pico-train - INFO - ==================================================
2025-08-31 17:03:52 - pico-train - INFO - Starting from step: 100000
2025-08-31 17:03:52 - pico-train - INFO - Model Setup:
2025-08-31 17:03:52 - pico-train - INFO - └─ Total Parameters: 11,282,784
2025-08-31 17:03:52 - pico-train - INFO - └─ Trainable Parameters: 11,282,784
2025-08-31 17:03:52 - pico-train - INFO - Distributed Setup:
2025-08-31 17:03:52 - pico-train - INFO - └─ Number of Devices: 1
2025-08-31 17:03:52 - pico-train - INFO - └─ Device Type: NVIDIA H100 80GB HBM3
2025-08-31 17:03:52 - pico-train - INFO - └─ Available Memory: 85.03 GB
2025-08-31 17:03:52 - pico-train - INFO - Software Setup:
2025-08-31 17:03:52 - pico-train - INFO - └─ Python Version: 3.12.3
2025-08-31 17:03:52 - pico-train - INFO - └─ PyTorch Version: 2.8.0+cu128
2025-08-31 17:03:52 - pico-train - INFO - └─ CUDA Version: 12.8
2025-08-31 17:03:52 - pico-train - INFO - └─ Operating System: Linux 6.8.0-71-generic
2025-08-31 17:03:52 - pico-train - INFO - Batch Size Configuration:
2025-08-31 17:03:52 - pico-train - INFO - └─ Global Batch Size: 16
2025-08-31 17:03:52 - pico-train - INFO - └─ Per Device Batch Size: 16
2025-08-31 17:03:52 - pico-train - INFO - └─ Gradient Accumulation Steps: 1
2025-08-31 17:03:52 - pico-train - INFO - ==================================================
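
The reported total of 11,282,784 parameters can be reproduced from the `model` section of the configuration above. The sketch below is a rough check, not the pico-train implementation: it assumes bias-free linear layers, grouped-query attention with a head size of `d_model / attention_n_heads`, a three-matrix SwiGLU feed-forward block, two normalization weight vectors per layer plus one final one, and an output projection that is not tied to the embedding. The function name `count_parameters` and its argument names are hypothetical.

```python
def count_parameters(d_model, n_layers, n_heads, n_kv_heads,
                     hidden_dim, vocab_size, tied_embeddings=False):
    """Estimate decoder parameter count under the assumptions stated above."""
    head_dim = d_model // n_heads          # 96 // 12 = 8
    kv_dim = n_kv_heads * head_dim         # 4 * 8 = 32 (grouped-query attention)

    # q_proj and o_proj are d_model x d_model; k_proj and v_proj are d_model x kv_dim.
    attention = 2 * d_model * d_model + 2 * d_model * kv_dim
    # SwiGLU: two up projections and one down projection (assumed, no biases).
    swiglu = 3 * d_model * hidden_dim
    # One norm weight vector before attention and one before the feed-forward block.
    norms = 2 * d_model
    per_layer = attention + swiglu + norms

    embedding = vocab_size * d_model
    output_head = 0 if tied_embeddings else vocab_size * d_model
    final_norm = d_model
    return embedding + n_layers * per_layer + final_norm + output_head


total = count_parameters(d_model=96, n_layers=12, n_heads=12, n_kv_heads=4,
                         hidden_dim=384, vocab_size=50304)
print(f"{total:,}")
```

Under these assumptions the two vocabulary-sized matrices account for 9,658,368 of the parameters, so the transformer layers themselves hold only about 1.6 million.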
2025-08-31 17:03:52 - pico-train - INFO - Step 100000 -- 🔄 Training Metrics
2025-08-31 17:03:52 - pico-train - INFO - ├── Loss: 4.9432
2025-08-31 17:03:52 - pico-train - INFO - ├── Learning Rate: 2.00e-05
2025-08-31 17:03:52 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-31 17:03:52 - pico-train - INFO - Step 100000 -- 📈 Saving Learning Dynamics
2025-08-31 17:04:49 - pico-train - INFO - Step 100100 -- 🔄 Training Metrics
2025-08-31 17:04:49 - pico-train - INFO - ├── Loss: 4.7703
2025-08-31 17:04:49 - pico-train - INFO - ├── Learning Rate: 1.01e-04
2025-08-31 17:04:49 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-31 17:05:43 - pico-train - INFO - Step 100200 -- 🔄 Training Metrics
2025-08-31 17:05:43 - pico-train - INFO - ├── Loss: 4.8047
2025-08-31 17:05:43 - pico-train - INFO - ├── Learning Rate: 1.01e-04
2025-08-31 17:05:43 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-31 17:06:37 - pico-train - INFO - Step 100300 -- 🔄 Training Metrics
2025-08-31 17:06:37 - pico-train - INFO - ├── Loss: 4.8076
2025-08-31 17:06:37 - pico-train - INFO - ├── Learning Rate: 1.01e-04
2025-08-31 17:06:37 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-31 17:07:31 - pico-train - INFO - Step 100400 -- 🔄 Training Metrics
2025-08-31 17:07:31 - pico-train - INFO - ├── Loss: 4.7926
2025-08-31 17:07:31 - pico-train - INFO - ├── Learning Rate: 1.01e-04
2025-08-31 17:07:31 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-31 17:08:25 - pico-train - INFO - Step 100500 -- 🔄 Training Metrics
2025-08-31 17:08:25 - pico-train - INFO - ├── Loss: 4.8059
2025-08-31 17:08:25 - pico-train - INFO - ├── Learning Rate: 1.01e-04
2025-08-31 17:08:25 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-31 17:09:19 - pico-train - INFO - Step 100600 -- 🔄 Training Metrics
2025-08-31 17:09:19 - pico-train - INFO - ├── Loss: 4.7896
2025-08-31 17:09:19 - pico-train - INFO - ├── Learning Rate: 1.01e-04
2025-08-31 17:09:19 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-31 17:10:12 - pico-train - INFO - Step 100700 -- 🔄 Training Metrics
2025-08-31 17:10:12 - pico-train - INFO - ├── Loss: 4.8066
2025-08-31 17:10:12 - pico-train - INFO - ├── Learning Rate: 1.00e-04
2025-08-31 17:10:12 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-31 17:11:07 - pico-train - INFO - Step 100800 -- 🔄 Training Metrics
2025-08-31 17:11:07 - pico-train - INFO - ├── Loss: 4.7870
2025-08-31 17:11:07 - pico-train - INFO - ├── Learning Rate: 1.00e-04
2025-08-31 17:11:07 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-31 17:12:01 - pico-train - INFO - Step 100900 -- 🔄 Training Metrics
2025-08-31 17:12:01 - pico-train - INFO - ├── Loss: 4.7958
2025-08-31 17:12:01 - pico-train - INFO - ├── Learning Rate: 1.00e-04
2025-08-31 17:12:01 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-31 17:12:55 - pico-train - INFO - Step 101000 -- 🔄 Training Metrics
2025-08-31 17:12:55 - pico-train - INFO - ├── Loss: 4.8081
2025-08-31 17:12:55 - pico-train - INFO - ├── Learning Rate: 1.00e-04
2025-08-31 17:12:55 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-31 17:13:48 - pico-train - INFO - Step 101100 -- 🔄 Training Metrics
2025-08-31 17:13:48 - pico-train - INFO - ├── Loss: 4.8023
2025-08-31 17:13:48 - pico-train - INFO - ├── Learning Rate: 9.98e-05
2025-08-31 17:13:48 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-31 17:14:43 - pico-train - INFO - Step 101200 -- 🔄 Training Metrics
2025-08-31 17:14:43 - pico-train - INFO - ├── Loss: 4.7830
2025-08-31 17:14:43 - pico-train - INFO - ├── Learning Rate: 9.97e-05
2025-08-31 17:14:43 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-31 17:15:38 - pico-train - INFO - Step 101300 -- 🔄 Training Metrics
2025-08-31 17:15:38 - pico-train - INFO - ├── Loss: 4.8071
2025-08-31 17:15:38 - pico-train - INFO - ├── Learning Rate: 9.95e-05
2025-08-31 17:15:38 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-31 17:16:32 - pico-train - INFO - Step 101400 -- 🔄 Training Metrics
2025-08-31 17:16:32 - pico-train - INFO - ├── Loss: 4.8072
2025-08-31 17:16:32 - pico-train - INFO - ├── Learning Rate: 9.94e-05
2025-08-31 17:16:32 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-31 17:17:27 - pico-train - INFO - Step 101500 -- 🔄 Training Metrics
2025-08-31 17:17:27 - pico-train - INFO - ├── Loss: 4.8027
2025-08-31 17:17:27 - pico-train - INFO - ├── Learning Rate: 9.92e-05
2025-08-31 17:17:27 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-31 17:18:20 - pico-train - INFO - Step 101600 -- 🔄 Training Metrics
2025-08-31 17:18:20 - pico-train - INFO - ├── Loss: 4.7874
2025-08-31 17:18:20 - pico-train - INFO - ├── Learning Rate: 9.90e-05
2025-08-31 17:18:20 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-31 17:19:15 - pico-train - INFO - Step 101700 -- 🔄 Training Metrics
2025-08-31 17:19:15 - pico-train - INFO - ├── Loss: 4.7817
2025-08-31 17:19:15 - pico-train - INFO - ├── Learning Rate: 9.89e-05
2025-08-31 17:19:15 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-31 17:20:09 - pico-train - INFO - Step 101800 -- 🔄 Training Metrics
2025-08-31 17:20:09 - pico-train - INFO - ├── Loss: 4.8188
2025-08-31 17:20:09 - pico-train - INFO - ├── Learning Rate: 9.87e-05
2025-08-31 17:20:09 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-31 17:21:04 - pico-train - INFO - Step 101900 -- 🔄 Training Metrics
2025-08-31 17:21:04 - pico-train - INFO - ├── Loss: 4.7880
2025-08-31 17:21:04 - pico-train - INFO - ├── Learning Rate: 9.86e-05
2025-08-31 17:21:04 - pico-train - INFO - └── Inf/NaN count: 0
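
Each "Training Metrics" entry in this log spans four lines (step header, loss, learning rate, Inf/NaN count), so the metrics can be recovered with a small parser. The sketch below is one possible approach, not a tool shipped with pico-train; the names `parse_metrics` and `tokens_per_second` are hypothetical. The regular expressions ignore the tree glyphs and emoji, so they work whether or not those characters are rendered correctly. The throughput estimate assumes every sample holds a full 2,048 tokens (`max_seq_len`) and uses the logged global batch size of 16.

```python
import re
from datetime import datetime

# Two consecutive entries copied from the log above.
SAMPLE = """\
2025-08-31 17:04:49 - pico-train - INFO - Step 100100 -- 🔄 Training Metrics
2025-08-31 17:04:49 - pico-train - INFO - ├── Loss: 4.7703
2025-08-31 17:04:49 - pico-train - INFO - ├── Learning Rate: 1.01e-04
2025-08-31 17:04:49 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-31 17:05:43 - pico-train - INFO - Step 100200 -- 🔄 Training Metrics
2025-08-31 17:05:43 - pico-train - INFO - ├── Loss: 4.8047
2025-08-31 17:05:43 - pico-train - INFO - ├── Learning Rate: 1.01e-04
2025-08-31 17:05:43 - pico-train - INFO - └── Inf/NaN count: 0
"""

LINE = re.compile(r"^(\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}) - pico-train - INFO - (.*)$")
METRICS_HEADER = re.compile(r"Step (\d+) -- .*Training Metrics")
LOSS = re.compile(r"Loss: ([0-9.]+)")
LEARNING_RATE = re.compile(r"Learning Rate: ([0-9.eE+-]+)")


def parse_metrics(text):
    """Return one dict per 'Training Metrics' entry: time, step, loss, lr."""
    records, current = [], None
    for raw in text.splitlines():
        match = LINE.match(raw.strip())
        if not match:
            continue
        stamp, message = match.groups()
        header = METRICS_HEADER.search(message)
        if header:
            current = {"time": datetime.strptime(stamp, "%Y-%m-%d %H:%M:%S"),
                       "step": int(header.group(1))}
            records.append(current)
        elif message.startswith("Step "):
            current = None  # checkpoint / evaluation headers end the entry
        elif current is not None:
            loss = LOSS.search(message)
            lr = LEARNING_RATE.search(message)
            if loss:
                current["loss"] = float(loss.group(1))
            elif lr:
                current["lr"] = float(lr.group(1))
    return records


def tokens_per_second(first, second, batch_size=16, seq_len=2048):
    """Throughput between two entries, assuming full-length sequences."""
    seconds = (second["time"] - first["time"]).total_seconds()
    return (second["step"] - first["step"]) * batch_size * seq_len / seconds


records = parse_metrics(SAMPLE)
print(records[0]["step"], records[0]["loss"], records[0]["lr"])
print(round(tokens_per_second(records[0], records[1])))
```

With the two entries above (100 steps in 54 seconds), the estimate comes to roughly 60,700 tokens per second on the single H100, under the full-sequence assumption.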
2025-08-31 17:21:58 - pico-train - INFO - Step 102000 -- 💾 Saving Checkpoint
2025-08-31 18:00:17 - pico-train - INFO - Step 102000 -- 📊 Evaluation Results
2025-08-31 18:00:17 - pico-train - INFO - └── paloma: inf
2025-08-31 18:00:17 - pico-train - INFO - Step 102000 -- 🔄 Training Metrics
2025-08-31 18:00:17 - pico-train - INFO - ├── Loss: 4.8055
2025-08-31 18:00:17 - pico-train - INFO - ├── Learning Rate: 9.84e-05
2025-08-31 18:00:17 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-31 18:00:17 - pico-train - INFO - Step 102000 -- 📈 Saving Learning Dynamics
2025-08-31 18:01:13 - pico-train - INFO - Step 102100 -- 🔄 Training Metrics
2025-08-31 18:01:13 - pico-train - INFO - ├── Loss: 4.7742
2025-08-31 18:01:13 - pico-train - INFO - ├── Learning Rate: 9.83e-05
2025-08-31 18:01:13 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-31 18:02:07 - pico-train - INFO - Step 102200 -- 🔄 Training Metrics
2025-08-31 18:02:07 - pico-train - INFO - ├── Loss: 4.8050
2025-08-31 18:02:07 - pico-train - INFO - ├── Learning Rate: 9.81e-05
2025-08-31 18:02:07 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-31 18:03:01 - pico-train - INFO - Step 102300 -- 🔄 Training Metrics
2025-08-31 18:03:01 - pico-train - INFO - ├── Loss: 4.8066
2025-08-31 18:03:01 - pico-train - INFO - ├── Learning Rate: 9.79e-05
2025-08-31 18:03:01 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-31 18:03:57 - pico-train - INFO - Step 102400 -- 🔄 Training Metrics
2025-08-31 18:03:57 - pico-train - INFO - ├── Loss: 4.7865
2025-08-31 18:03:57 - pico-train - INFO - ├── Learning Rate: 9.78e-05
2025-08-31 18:03:57 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-31 18:04:50 - pico-train - INFO - Step 102500 -- 🔄 Training Metrics
2025-08-31 18:04:50 - pico-train - INFO - ├── Loss: 4.8019
2025-08-31 18:04:50 - pico-train - INFO - ├── Learning Rate: 9.76e-05
2025-08-31 18:04:50 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-31 18:05:45 - pico-train - INFO - Step 102600 -- 🔄 Training Metrics
2025-08-31 18:05:45 - pico-train - INFO - ├── Loss: 4.7948
2025-08-31 18:05:45 - pico-train - INFO - ├── Learning Rate: 9.75e-05
2025-08-31 18:05:45 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-31 18:06:39 - pico-train - INFO - Step 102700 -- 🔄 Training Metrics
2025-08-31 18:06:39 - pico-train - INFO - ├── Loss: 4.8006
2025-08-31 18:06:39 - pico-train - INFO - ├── Learning Rate: 9.73e-05
2025-08-31 18:06:39 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-31 18:07:33 - pico-train - INFO - Step 102800 -- 🔄 Training Metrics
2025-08-31 18:07:33 - pico-train - INFO - ├── Loss: 4.8049
2025-08-31 18:07:33 - pico-train - INFO - ├── Learning Rate: 9.71e-05
2025-08-31 18:07:33 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-31 18:08:27 - pico-train - INFO - Step 102900 -- 🔄 Training Metrics
2025-08-31 18:08:27 - pico-train - INFO - ├── Loss: 4.8086
2025-08-31 18:08:27 - pico-train - INFO - ├── Learning Rate: 9.70e-05
2025-08-31 18:08:27 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-31 18:09:21 - pico-train - INFO - Step 103000 -- 🔄 Training Metrics
2025-08-31 18:09:21 - pico-train - INFO - ├── Loss: 4.8154
2025-08-31 18:09:21 - pico-train - INFO - ├── Learning Rate: 9.68e-05
2025-08-31 18:09:21 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-31 18:10:15 - pico-train - INFO - Step 103100 -- 🔄 Training Metrics
2025-08-31 18:10:15 - pico-train - INFO - ├── Loss: 4.8232
2025-08-31 18:10:15 - pico-train - INFO - ├── Learning Rate: 9.67e-05
2025-08-31 18:10:15 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-31 18:11:10 - pico-train - INFO - Step 103200 -- 🔄 Training Metrics
2025-08-31 18:11:10 - pico-train - INFO - ├── Loss: 4.8032
2025-08-31 18:11:10 - pico-train - INFO - ├── Learning Rate: 9.65e-05
2025-08-31 18:11:10 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-31 18:12:05 - pico-train - INFO - Step 103300 -- 🔄 Training Metrics
2025-08-31 18:12:05 - pico-train - INFO - ├── Loss: 4.8157
2025-08-31 18:12:05 - pico-train - INFO - ├── Learning Rate: 9.64e-05
2025-08-31 18:12:05 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-31 18:13:00 - pico-train - INFO - Step 103400 -- 🔄 Training Metrics
2025-08-31 18:13:00 - pico-train - INFO - ├── Loss: 4.7903
2025-08-31 18:13:00 - pico-train - INFO - ├── Learning Rate: 9.62e-05
2025-08-31 18:13:00 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-31 18:13:54 - pico-train - INFO - Step 103500 -- 🔄 Training Metrics
2025-08-31 18:13:54 - pico-train - INFO - ├── Loss: 4.7786
2025-08-31 18:13:54 - pico-train - INFO - ├── Learning Rate: 9.60e-05
2025-08-31 18:13:54 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-31 18:14:48 - pico-train - INFO - Step 103600 -- 🔄 Training Metrics
2025-08-31 18:14:48 - pico-train - INFO - ├── Loss: 4.7962
2025-08-31 18:14:48 - pico-train - INFO - ├── Learning Rate: 9.59e-05
2025-08-31 18:14:48 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-31 18:15:43 - pico-train - INFO - Step 103700 -- 🔄 Training Metrics
2025-08-31 18:15:43 - pico-train - INFO - ├── Loss: 4.8097
2025-08-31 18:15:43 - pico-train - INFO - ├── Learning Rate: 9.57e-05
2025-08-31 18:15:43 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-31 18:16:37 - pico-train - INFO - Step 103800 -- 🔄 Training Metrics
2025-08-31 18:16:37 - pico-train - INFO - ├── Loss: 4.7613
2025-08-31 18:16:37 - pico-train - INFO - ├── Learning Rate: 9.56e-05
2025-08-31 18:16:37 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-31 18:17:31 - pico-train - INFO - Step 103900 -- 🔄 Training Metrics
2025-08-31 18:17:31 - pico-train - INFO - ├── Loss: 4.7992
2025-08-31 18:17:31 - pico-train - INFO - ├── Learning Rate: 9.54e-05
2025-08-31 18:17:31 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-31 18:18:25 - pico-train - INFO - Step 104000 -- 💾 Saving Checkpoint
2025-08-31 18:55:41 - pico-train - INFO - Step 104000 -- 📊 Evaluation Results
2025-08-31 18:55:41 - pico-train - INFO - └── paloma: inf
2025-08-31 18:55:41 - pico-train - INFO - Step 104000 -- 🔄 Training Metrics
2025-08-31 18:55:41 - pico-train - INFO - ├── Loss: 4.8021
2025-08-31 18:55:41 - pico-train - INFO - ├── Learning Rate: 9.52e-05
2025-08-31 18:55:41 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-31 18:55:41 - pico-train - INFO - Step 104000 -- 📈 Saving Learning Dynamics
2025-08-31 18:56:38 - pico-train - INFO - Step 104100 -- 🔄 Training Metrics
2025-08-31 18:56:38 - pico-train - INFO - ├── Loss: 4.7678
2025-08-31 18:56:38 - pico-train - INFO - ├── Learning Rate: 9.51e-05
2025-08-31 18:56:38 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-31 18:57:32 - pico-train - INFO - Step 104200 -- 🔄 Training Metrics
2025-08-31 18:57:32 - pico-train - INFO - ├── Loss: 4.7840
2025-08-31 18:57:32 - pico-train - INFO - ├── Learning Rate: 9.49e-05
2025-08-31 18:57:32 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-31 18:58:25 - pico-train - INFO - Step 104300 -- 🔄 Training Metrics
2025-08-31 18:58:25 - pico-train - INFO - ├── Loss: 4.7948
2025-08-31 18:58:25 - pico-train - INFO - ├── Learning Rate: 9.48e-05
2025-08-31 18:58:25 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-31 18:59:20 - pico-train - INFO - Step 104400 -- 🔄 Training Metrics
2025-08-31 18:59:20 - pico-train - INFO - ├── Loss: 4.7842
2025-08-31 18:59:20 - pico-train - INFO - ├── Learning Rate: 9.46e-05
2025-08-31 18:59:20 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-31 19:00:14 - pico-train - INFO - Step 104500 -- 🔄 Training Metrics
2025-08-31 19:00:14 - pico-train - INFO - ├── Loss: 4.7806
2025-08-31 19:00:14 - pico-train - INFO - ├── Learning Rate: 9.44e-05
2025-08-31 19:00:14 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-31 19:01:08 - pico-train - INFO - Step 104600 -- 🔄 Training Metrics
2025-08-31 19:01:08 - pico-train - INFO - ├── Loss: 4.8013
2025-08-31 19:01:08 - pico-train - INFO - ├── Learning Rate: 9.43e-05
2025-08-31 19:01:08 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-31 19:02:02 - pico-train - INFO - Step 104700 -- 🔄 Training Metrics
2025-08-31 19:02:02 - pico-train - INFO - ├── Loss: 4.7920
2025-08-31 19:02:02 - pico-train - INFO - ├── Learning Rate: 9.41e-05
2025-08-31 19:02:02 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-31 19:02:55 - pico-train - INFO - Step 104800 -- 🔄 Training Metrics
2025-08-31 19:02:55 - pico-train - INFO - ├── Loss: 4.7982
2025-08-31 19:02:55 - pico-train - INFO - ├── Learning Rate: 9.40e-05
2025-08-31 19:02:55 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-31 19:03:50 - pico-train - INFO - Step 104900 -- 🔄 Training Metrics
2025-08-31 19:03:50 - pico-train - INFO - ├── Loss: 4.7977
2025-08-31 19:03:50 - pico-train - INFO - ├── Learning Rate: 9.38e-05
2025-08-31 19:03:50 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-31 19:04:44 - pico-train - INFO - Step 105000 -- 🔄 Training Metrics
2025-08-31 19:04:44 - pico-train - INFO - ├── Loss: 4.7973
2025-08-31 19:04:44 - pico-train - INFO - ├── Learning Rate: 9.37e-05
2025-08-31 19:04:44 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-31 19:05:38 - pico-train - INFO - Step 105100 -- 🔄 Training Metrics
2025-08-31 19:05:38 - pico-train - INFO - ├── Loss: 4.7657
2025-08-31 19:05:38 - pico-train - INFO - ├── Learning Rate: 9.35e-05
2025-08-31 19:05:38 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-31 19:06:32 - pico-train - INFO - Step 105200 -- 🔄 Training Metrics
2025-08-31 19:06:32 - pico-train - INFO - ├── Loss: 4.7921
2025-08-31 19:06:32 - pico-train - INFO - ├── Learning Rate: 9.33e-05
2025-08-31 19:06:32 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-31 19:07:27 - pico-train - INFO - Step 105300 -- 🔄 Training Metrics
2025-08-31 19:07:27 - pico-train - INFO - ├── Loss: 4.7775
2025-08-31 19:07:27 - pico-train - INFO - ├── Learning Rate: 9.32e-05
2025-08-31 19:07:27 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-31 19:08:22 - pico-train - INFO - Step 105400 -- 🔄 Training Metrics
2025-08-31 19:08:22 - pico-train - INFO - ├── Loss: 4.7990
2025-08-31 19:08:22 - pico-train - INFO - ├── Learning Rate: 9.30e-05
2025-08-31 19:08:22 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-31 19:09:18 - pico-train - INFO - Step 105500 -- 🔄 Training Metrics
2025-08-31 19:09:18 - pico-train - INFO - ├── Loss: 4.7902
2025-08-31 19:09:18 - pico-train - INFO - ├── Learning Rate: 9.29e-05
2025-08-31 19:09:18 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-31 19:10:12 - pico-train - INFO - Step 105600 -- 🔄 Training Metrics
2025-08-31 19:10:12 - pico-train - INFO - ├── Loss: 4.7873
2025-08-31 19:10:12 - pico-train - INFO - ├── Learning Rate: 9.27e-05
2025-08-31 19:10:12 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-31 19:11:06 - pico-train - INFO - Step 105700 -- 🔄 Training Metrics
2025-08-31 19:11:06 - pico-train - INFO - ├── Loss: 4.7870
2025-08-31 19:11:06 - pico-train - INFO - ├── Learning Rate: 9.25e-05
2025-08-31 19:11:06 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-31 19:12:00 - pico-train - INFO - Step 105800 -- 🔄 Training Metrics
2025-08-31 19:12:00 - pico-train - INFO - ├── Loss: 4.7853
2025-08-31 19:12:00 - pico-train - INFO - ├── Learning Rate: 9.24e-05
2025-08-31 19:12:00 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-31 19:12:54 - pico-train - INFO - Step 105900 -- 🔄 Training Metrics
2025-08-31 19:12:54 - pico-train - INFO - ├── Loss: 4.7667
2025-08-31 19:12:54 - pico-train - INFO - ├── Learning Rate: 9.22e-05
2025-08-31 19:12:54 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-31 19:13:48 - pico-train - INFO - Step 106000 -- 💾 Saving Checkpoint
2025-08-31 19:52:30 - pico-train - INFO - Step 106000 -- 📊 Evaluation Results
2025-08-31 19:52:30 - pico-train - INFO - └── paloma: inf
2025-08-31 19:52:30 - pico-train - INFO - Step 106000 -- 🔄 Training Metrics
2025-08-31 19:52:30 - pico-train - INFO - ├── Loss: 4.7840
2025-08-31 19:52:30 - pico-train - INFO - ├── Learning Rate: 9.21e-05
2025-08-31 19:52:30 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-31 19:52:30 - pico-train - INFO - Step 106000 -- 📈 Saving Learning Dynamics
2025-08-31 19:53:27 - pico-train - INFO - Step 106100 -- 🔄 Training Metrics
2025-08-31 19:53:27 - pico-train - INFO - ├── Loss: 4.7948
2025-08-31 19:53:27 - pico-train - INFO - ├── Learning Rate: 9.19e-05
2025-08-31 19:53:27 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-31 19:54:22 - pico-train - INFO - Step 106200 -- 🔄 Training Metrics
2025-08-31 19:54:22 - pico-train - INFO - ├── Loss: 4.7926
2025-08-31 19:54:22 - pico-train - INFO - ├── Learning Rate: 9.18e-05
2025-08-31 19:54:22 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-31 19:55:18 - pico-train - INFO - Step 106300 -- 🔄 Training Metrics
2025-08-31 19:55:18 - pico-train - INFO - ├── Loss: 4.7992
2025-08-31 19:55:18 - pico-train - INFO - ├── Learning Rate: 9.16e-05
2025-08-31 19:55:18 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-31 19:56:13 - pico-train - INFO - Step 106400 -- 🔄 Training Metrics
2025-08-31 19:56:13 - pico-train - INFO - ├── Loss: 4.7733
2025-08-31 19:56:13 - pico-train - INFO - ├── Learning Rate: 9.14e-05
2025-08-31 19:56:13 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-31 19:57:08 - pico-train - INFO - Step 106500 -- 🔄 Training Metrics
2025-08-31 19:57:08 - pico-train - INFO - ├── Loss: 4.8039
2025-08-31 19:57:08 - pico-train - INFO - ├── Learning Rate: 9.13e-05
2025-08-31 19:57:08 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-31 19:58:03 - pico-train - INFO - Step 106600 -- 🔄 Training Metrics
2025-08-31 19:58:03 - pico-train - INFO - ├── Loss: 4.7913
2025-08-31 19:58:03 - pico-train - INFO - ├── Learning Rate: 9.11e-05
2025-08-31 19:58:03 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-31 19:58:58 - pico-train - INFO - Step 106700 -- 🔄 Training Metrics
2025-08-31 19:58:58 - pico-train - INFO - ├── Loss: 4.8052
2025-08-31 19:58:58 - pico-train - INFO - ├── Learning Rate: 9.10e-05
2025-08-31 19:58:58 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-31 19:59:54 - pico-train - INFO - Step 106800 -- 🔄 Training Metrics
2025-08-31 19:59:54 - pico-train - INFO - ├── Loss: 4.8040
2025-08-31 19:59:54 - pico-train - INFO - ├── Learning Rate: 9.08e-05
2025-08-31 19:59:54 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-31 20:00:49 - pico-train - INFO - Step 106900 -- 🔄 Training Metrics
2025-08-31 20:00:49 - pico-train - INFO - ├── Loss: 4.7788
2025-08-31 20:00:49 - pico-train - INFO - ├── Learning Rate: 9.07e-05
2025-08-31 20:00:49 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-31 20:01:44 - pico-train - INFO - Step 107000 -- 🔄 Training Metrics
2025-08-31 20:01:44 - pico-train - INFO - ├── Loss: 4.7795
2025-08-31 20:01:44 - pico-train - INFO - ├── Learning Rate: 9.05e-05
2025-08-31 20:01:44 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-31 20:02:39 - pico-train - INFO - Step 107100 -- 🔄 Training Metrics
2025-08-31 20:02:39 - pico-train - INFO - ├── Loss: 4.7627
2025-08-31 20:02:39 - pico-train - INFO - ├── Learning Rate: 9.03e-05
2025-08-31 20:02:39 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-31 20:03:34 - pico-train - INFO - Step 107200 -- 🔄 Training Metrics
2025-08-31 20:03:34 - pico-train - INFO - ├── Loss: 4.7699
2025-08-31 20:03:34 - pico-train - INFO - ├── Learning Rate: 9.02e-05
2025-08-31 20:03:34 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-31 20:04:29 - pico-train - INFO - Step 107300 -- 🔄 Training Metrics
2025-08-31 20:04:29 - pico-train - INFO - ├── Loss: 4.7992
2025-08-31 20:04:29 - pico-train - INFO - ├── Learning Rate: 9.00e-05
2025-08-31 20:04:29 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-31 20:05:25 - pico-train - INFO - Step 107400 -- 🔄 Training Metrics
2025-08-31 20:05:25 - pico-train - INFO - ├── Loss: 4.8061
2025-08-31 20:05:25 - pico-train - INFO - ├── Learning Rate: 8.99e-05
2025-08-31 20:05:25 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-31 20:06:20 - pico-train - INFO - Step 107500 -- 🔄 Training Metrics
2025-08-31 20:06:20 - pico-train - INFO - ├── Loss: 4.7807
2025-08-31 20:06:20 - pico-train - INFO - ├── Learning Rate: 8.97e-05
2025-08-31 20:06:20 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-31 20:07:15 - pico-train - INFO - Step 107600 -- 🔄 Training Metrics
2025-08-31 20:07:15 - pico-train - INFO - ├── Loss: 4.8012
2025-08-31 20:07:15 - pico-train - INFO - ├── Learning Rate: 8.95e-05
2025-08-31 20:07:15 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-31 20:08:11 - pico-train - INFO - Step 107700 -- 🔄 Training Metrics
2025-08-31 20:08:11 - pico-train - INFO - ├── Loss: 4.7822
2025-08-31 20:08:11 - pico-train - INFO - ├── Learning Rate: 8.94e-05
2025-08-31 20:08:11 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-31 20:09:06 - pico-train - INFO - Step 107800 -- 🔄 Training Metrics
2025-08-31 20:09:06 - pico-train - INFO - ├── Loss: 4.7630
2025-08-31 20:09:06 - pico-train - INFO - ├── Learning Rate: 8.92e-05
2025-08-31 20:09:06 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-31 20:10:01 - pico-train - INFO - Step 107900 -- 🔄 Training Metrics
2025-08-31 20:10:01 - pico-train - INFO - ├── Loss: 4.7928
2025-08-31 20:10:01 - pico-train - INFO - ├── Learning Rate: 8.91e-05
2025-08-31 20:10:01 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-31 20:10:56 - pico-train - INFO - Step 108000 -- ๐Ÿ’พ Saving Checkpoint
2025-08-31 20:48:49 - pico-train - INFO - Step 108000 -- ๐Ÿ“Š Evaluation Results
2025-08-31 20:48:49 - pico-train - INFO - โ””โ”€โ”€ paloma: inf
2025-08-31 20:48:50 - pico-train - INFO - Step 108000 -- ๐Ÿ”„ Training Metrics
2025-08-31 20:48:50 - pico-train - INFO - โ”œโ”€โ”€ Loss: 4.7865
2025-08-31 20:48:50 - pico-train - INFO - โ”œโ”€โ”€ Learning Rate: 8.89e-05
2025-08-31 20:48:50 - pico-train - INFO - โ””โ”€โ”€ Inf/NaN count: 0
2025-08-31 20:48:50 - pico-train - INFO - Step 108000 -- ๐Ÿ“ˆ Saving Learning Dynamics
2025-08-31 20:49:48 - pico-train - INFO - Step 108100 -- 🔄 Training Metrics
2025-08-31 20:49:48 - pico-train - INFO - ├── Loss: 4.7775
2025-08-31 20:49:48 - pico-train - INFO - ├── Learning Rate: 8.88e-05
2025-08-31 20:49:48 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-31 20:50:43 - pico-train - INFO - Step 108200 -- 🔄 Training Metrics
2025-08-31 20:50:43 - pico-train - INFO - ├── Loss: 4.7740
2025-08-31 20:50:43 - pico-train - INFO - ├── Learning Rate: 8.86e-05
2025-08-31 20:50:43 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-31 20:51:38 - pico-train - INFO - Step 108300 -- 🔄 Training Metrics
2025-08-31 20:51:38 - pico-train - INFO - ├── Loss: 4.7898
2025-08-31 20:51:38 - pico-train - INFO - ├── Learning Rate: 8.84e-05
2025-08-31 20:51:38 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-31 20:52:35 - pico-train - INFO - Step 108400 -- 🔄 Training Metrics
2025-08-31 20:52:35 - pico-train - INFO - ├── Loss: 4.8049
2025-08-31 20:52:35 - pico-train - INFO - ├── Learning Rate: 8.83e-05
2025-08-31 20:52:35 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-31 20:53:30 - pico-train - INFO - Step 108500 -- 🔄 Training Metrics
2025-08-31 20:53:30 - pico-train - INFO - ├── Loss: 4.7662
2025-08-31 20:53:30 - pico-train - INFO - ├── Learning Rate: 8.81e-05
2025-08-31 20:53:30 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-31 20:54:25 - pico-train - INFO - Step 108600 -- 🔄 Training Metrics
2025-08-31 20:54:25 - pico-train - INFO - ├── Loss: 4.7673
2025-08-31 20:54:25 - pico-train - INFO - ├── Learning Rate: 8.80e-05
2025-08-31 20:54:25 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-31 20:55:21 - pico-train - INFO - Step 108700 -- 🔄 Training Metrics
2025-08-31 20:55:21 - pico-train - INFO - ├── Loss: 4.8010
2025-08-31 20:55:21 - pico-train - INFO - ├── Learning Rate: 8.78e-05
2025-08-31 20:55:21 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-31 20:56:17 - pico-train - INFO - Step 108800 -- 🔄 Training Metrics
2025-08-31 20:56:17 - pico-train - INFO - ├── Loss: 4.7886
2025-08-31 20:56:17 - pico-train - INFO - ├── Learning Rate: 8.77e-05
2025-08-31 20:56:17 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-31 20:57:11 - pico-train - INFO - Step 108900 -- 🔄 Training Metrics
2025-08-31 20:57:11 - pico-train - INFO - ├── Loss: 4.7790
2025-08-31 20:57:11 - pico-train - INFO - ├── Learning Rate: 8.75e-05
2025-08-31 20:57:11 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-31 20:58:07 - pico-train - INFO - Step 109000 -- 🔄 Training Metrics
2025-08-31 20:58:07 - pico-train - INFO - ├── Loss: 4.7760
2025-08-31 20:58:07 - pico-train - INFO - ├── Learning Rate: 8.73e-05
2025-08-31 20:58:07 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-31 20:59:02 - pico-train - INFO - Step 109100 -- 🔄 Training Metrics
2025-08-31 20:59:02 - pico-train - INFO - ├── Loss: 4.7922
2025-08-31 20:59:02 - pico-train - INFO - ├── Learning Rate: 8.72e-05
2025-08-31 20:59:02 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-31 20:59:57 - pico-train - INFO - Step 109200 -- 🔄 Training Metrics
2025-08-31 20:59:57 - pico-train - INFO - ├── Loss: 4.7920
2025-08-31 20:59:57 - pico-train - INFO - ├── Learning Rate: 8.70e-05
2025-08-31 20:59:57 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-31 21:00:52 - pico-train - INFO - Step 109300 -- 🔄 Training Metrics
2025-08-31 21:00:52 - pico-train - INFO - ├── Loss: 4.8020
2025-08-31 21:00:52 - pico-train - INFO - ├── Learning Rate: 8.69e-05
2025-08-31 21:00:52 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-31 21:01:47 - pico-train - INFO - Step 109400 -- 🔄 Training Metrics
2025-08-31 21:01:47 - pico-train - INFO - ├── Loss: 4.7930
2025-08-31 21:01:47 - pico-train - INFO - ├── Learning Rate: 8.67e-05
2025-08-31 21:01:47 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-31 21:02:42 - pico-train - INFO - Step 109500 -- 🔄 Training Metrics
2025-08-31 21:02:42 - pico-train - INFO - ├── Loss: 4.8030
2025-08-31 21:02:42 - pico-train - INFO - ├── Learning Rate: 8.66e-05
2025-08-31 21:02:42 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-31 21:03:37 - pico-train - INFO - Step 109600 -- 🔄 Training Metrics
2025-08-31 21:03:37 - pico-train - INFO - ├── Loss: 4.7972
2025-08-31 21:03:37 - pico-train - INFO - ├── Learning Rate: 8.64e-05
2025-08-31 21:03:37 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-31 21:04:33 - pico-train - INFO - Step 109700 -- 🔄 Training Metrics
2025-08-31 21:04:33 - pico-train - INFO - ├── Loss: 4.7616
2025-08-31 21:04:33 - pico-train - INFO - ├── Learning Rate: 8.62e-05
2025-08-31 21:04:33 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-31 21:05:28 - pico-train - INFO - Step 109800 -- 🔄 Training Metrics
2025-08-31 21:05:28 - pico-train - INFO - ├── Loss: 4.7960
2025-08-31 21:05:28 - pico-train - INFO - ├── Learning Rate: 8.61e-05
2025-08-31 21:05:28 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-31 21:06:23 - pico-train - INFO - Step 109900 -- 🔄 Training Metrics
2025-08-31 21:06:23 - pico-train - INFO - ├── Loss: 4.7931
2025-08-31 21:06:23 - pico-train - INFO - ├── Learning Rate: 8.59e-05
2025-08-31 21:06:23 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-31 21:07:17 - pico-train - INFO - Step 110000 -- 💾 Saving Checkpoint
2025-08-31 21:46:40 - pico-train - INFO - Step 110000 -- 📊 Evaluation Results
2025-08-31 21:46:40 - pico-train - INFO - └── paloma: inf
2025-08-31 21:46:40 - pico-train - INFO - Step 110000 -- 🔄 Training Metrics
2025-08-31 21:46:40 - pico-train - INFO - ├── Loss: 4.7744
2025-08-31 21:46:40 - pico-train - INFO - ├── Learning Rate: 8.58e-05
2025-08-31 21:46:40 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-31 21:46:40 - pico-train - INFO - Step 110000 -- 📈 Saving Learning Dynamics
2025-08-31 21:47:38 - pico-train - INFO - Step 110100 -- 🔄 Training Metrics
2025-08-31 21:47:38 - pico-train - INFO - ├── Loss: 4.7719
2025-08-31 21:47:38 - pico-train - INFO - ├── Learning Rate: 8.56e-05
2025-08-31 21:47:38 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-31 21:48:33 - pico-train - INFO - Step 110200 -- 🔄 Training Metrics
2025-08-31 21:48:33 - pico-train - INFO - ├── Loss: 4.7851
2025-08-31 21:48:33 - pico-train - INFO - ├── Learning Rate: 8.55e-05
2025-08-31 21:48:33 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-31 21:49:28 - pico-train - INFO - Step 110300 -- 🔄 Training Metrics
2025-08-31 21:49:28 - pico-train - INFO - ├── Loss: 4.7918
2025-08-31 21:49:28 - pico-train - INFO - ├── Learning Rate: 8.53e-05
2025-08-31 21:49:28 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-31 21:50:23 - pico-train - INFO - Step 110400 -- 🔄 Training Metrics
2025-08-31 21:50:23 - pico-train - INFO - ├── Loss: 4.7676
2025-08-31 21:50:23 - pico-train - INFO - ├── Learning Rate: 8.51e-05
2025-08-31 21:50:23 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-31 21:51:18 - pico-train - INFO - Step 110500 -- 🔄 Training Metrics
2025-08-31 21:51:18 - pico-train - INFO - ├── Loss: 4.7964
2025-08-31 21:51:18 - pico-train - INFO - ├── Learning Rate: 8.50e-05
2025-08-31 21:51:18 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-31 21:52:14 - pico-train - INFO - Step 110600 -- 🔄 Training Metrics
2025-08-31 21:52:14 - pico-train - INFO - ├── Loss: 4.7715
2025-08-31 21:52:14 - pico-train - INFO - ├── Learning Rate: 8.48e-05
2025-08-31 21:52:14 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-31 21:53:09 - pico-train - INFO - Step 110700 -- 🔄 Training Metrics
2025-08-31 21:53:09 - pico-train - INFO - ├── Loss: 4.7874
2025-08-31 21:53:09 - pico-train - INFO - ├── Learning Rate: 8.47e-05
2025-08-31 21:53:09 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-31 21:54:04 - pico-train - INFO - Step 110800 -- 🔄 Training Metrics
2025-08-31 21:54:04 - pico-train - INFO - ├── Loss: 4.7858
2025-08-31 21:54:04 - pico-train - INFO - ├── Learning Rate: 8.45e-05
2025-08-31 21:54:04 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-31 21:54:59 - pico-train - INFO - Step 110900 -- 🔄 Training Metrics
2025-08-31 21:54:59 - pico-train - INFO - ├── Loss: 4.8073
2025-08-31 21:54:59 - pico-train - INFO - ├── Learning Rate: 8.44e-05
2025-08-31 21:54:59 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-31 21:55:55 - pico-train - INFO - Step 111000 -- 🔄 Training Metrics
2025-08-31 21:55:55 - pico-train - INFO - ├── Loss: 4.7721
2025-08-31 21:55:55 - pico-train - INFO - ├── Learning Rate: 8.42e-05
2025-08-31 21:55:55 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-31 21:56:50 - pico-train - INFO - Step 111100 -- 🔄 Training Metrics
2025-08-31 21:56:50 - pico-train - INFO - ├── Loss: 4.7713
2025-08-31 21:56:50 - pico-train - INFO - ├── Learning Rate: 8.40e-05
2025-08-31 21:56:50 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-31 21:57:45 - pico-train - INFO - Step 111200 -- 🔄 Training Metrics
2025-08-31 21:57:45 - pico-train - INFO - ├── Loss: 4.7687
2025-08-31 21:57:45 - pico-train - INFO - ├── Learning Rate: 8.39e-05
2025-08-31 21:57:45 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-31 21:58:40 - pico-train - INFO - Step 111300 -- 🔄 Training Metrics
2025-08-31 21:58:40 - pico-train - INFO - ├── Loss: 4.7715
2025-08-31 21:58:40 - pico-train - INFO - ├── Learning Rate: 8.37e-05
2025-08-31 21:58:40 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-31 21:59:36 - pico-train - INFO - Step 111400 -- 🔄 Training Metrics
2025-08-31 21:59:36 - pico-train - INFO - ├── Loss: 4.7610
2025-08-31 21:59:36 - pico-train - INFO - ├── Learning Rate: 8.36e-05
2025-08-31 21:59:36 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-31 22:00:31 - pico-train - INFO - Step 111500 -- 🔄 Training Metrics
2025-08-31 22:00:31 - pico-train - INFO - ├── Loss: 4.7668
2025-08-31 22:00:31 - pico-train - INFO - ├── Learning Rate: 8.34e-05
2025-08-31 22:00:31 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-31 22:01:26 - pico-train - INFO - Step 111600 -- 🔄 Training Metrics
2025-08-31 22:01:26 - pico-train - INFO - ├── Loss: 4.7816
2025-08-31 22:01:26 - pico-train - INFO - ├── Learning Rate: 8.33e-05
2025-08-31 22:01:26 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-31 22:02:21 - pico-train - INFO - Step 111700 -- 🔄 Training Metrics
2025-08-31 22:02:21 - pico-train - INFO - ├── Loss: 4.8025
2025-08-31 22:02:21 - pico-train - INFO - ├── Learning Rate: 8.31e-05
2025-08-31 22:02:21 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-31 22:03:16 - pico-train - INFO - Step 111800 -- 🔄 Training Metrics
2025-08-31 22:03:16 - pico-train - INFO - ├── Loss: 4.7870
2025-08-31 22:03:16 - pico-train - INFO - ├── Learning Rate: 8.29e-05
2025-08-31 22:03:16 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-31 22:04:12 - pico-train - INFO - Step 111900 -- 🔄 Training Metrics
2025-08-31 22:04:12 - pico-train - INFO - ├── Loss: 4.7972
2025-08-31 22:04:12 - pico-train - INFO - ├── Learning Rate: 8.28e-05
2025-08-31 22:04:12 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-31 22:05:06 - pico-train - INFO - Step 112000 -- 💾 Saving Checkpoint
2025-08-31 22:43:48 - pico-train - INFO - Step 112000 -- 📊 Evaluation Results
2025-08-31 22:43:48 - pico-train - INFO - └── paloma: inf
2025-08-31 22:43:49 - pico-train - INFO - Step 112000 -- 🔄 Training Metrics
2025-08-31 22:43:49 - pico-train - INFO - ├── Loss: 4.7665
2025-08-31 22:43:49 - pico-train - INFO - ├── Learning Rate: 8.26e-05
2025-08-31 22:43:49 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-31 22:43:49 - pico-train - INFO - Step 112000 -- 📈 Saving Learning Dynamics
2025-08-31 22:44:45 - pico-train - INFO - Step 112100 -- 🔄 Training Metrics
2025-08-31 22:44:45 - pico-train - INFO - ├── Loss: 4.7908
2025-08-31 22:44:45 - pico-train - INFO - ├── Learning Rate: 8.25e-05
2025-08-31 22:44:45 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-31 22:45:39 - pico-train - INFO - Step 112200 -- 🔄 Training Metrics
2025-08-31 22:45:39 - pico-train - INFO - ├── Loss: 4.7975
2025-08-31 22:45:39 - pico-train - INFO - ├── Learning Rate: 8.23e-05
2025-08-31 22:45:39 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-31 22:46:33 - pico-train - INFO - Step 112300 -- 🔄 Training Metrics
2025-08-31 22:46:33 - pico-train - INFO - ├── Loss: 4.7640
2025-08-31 22:46:33 - pico-train - INFO - ├── Learning Rate: 8.22e-05
2025-08-31 22:46:33 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-31 22:47:27 - pico-train - INFO - Step 112400 -- 🔄 Training Metrics
2025-08-31 22:47:27 - pico-train - INFO - ├── Loss: 4.7924
2025-08-31 22:47:27 - pico-train - INFO - ├── Learning Rate: 8.20e-05
2025-08-31 22:47:27 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-31 22:48:21 - pico-train - INFO - Step 112500 -- 🔄 Training Metrics
2025-08-31 22:48:21 - pico-train - INFO - ├── Loss: 4.7599
2025-08-31 22:48:21 - pico-train - INFO - ├── Learning Rate: 8.19e-05
2025-08-31 22:48:21 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-31 22:49:15 - pico-train - INFO - Step 112600 -- 🔄 Training Metrics
2025-08-31 22:49:15 - pico-train - INFO - ├── Loss: 4.7767
2025-08-31 22:49:15 - pico-train - INFO - ├── Learning Rate: 8.17e-05
2025-08-31 22:49:15 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-31 22:50:10 - pico-train - INFO - Step 112700 -- 🔄 Training Metrics
2025-08-31 22:50:10 - pico-train - INFO - ├── Loss: 4.7574
2025-08-31 22:50:10 - pico-train - INFO - ├── Learning Rate: 8.15e-05
2025-08-31 22:50:10 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-31 22:51:04 - pico-train - INFO - Step 112800 -- 🔄 Training Metrics
2025-08-31 22:51:04 - pico-train - INFO - ├── Loss: 4.7592
2025-08-31 22:51:04 - pico-train - INFO - ├── Learning Rate: 8.14e-05
2025-08-31 22:51:04 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-31 22:51:58 - pico-train - INFO - Step 112900 -- 🔄 Training Metrics
2025-08-31 22:51:58 - pico-train - INFO - ├── Loss: 4.7704
2025-08-31 22:51:58 - pico-train - INFO - ├── Learning Rate: 8.12e-05
2025-08-31 22:51:58 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-31 22:52:51 - pico-train - INFO - Step 113000 -- 🔄 Training Metrics
2025-08-31 22:52:51 - pico-train - INFO - ├── Loss: 4.7620
2025-08-31 22:52:51 - pico-train - INFO - ├── Learning Rate: 8.11e-05
2025-08-31 22:52:51 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-31 22:53:46 - pico-train - INFO - Step 113100 -- 🔄 Training Metrics
2025-08-31 22:53:46 - pico-train - INFO - ├── Loss: 4.7904
2025-08-31 22:53:46 - pico-train - INFO - ├── Learning Rate: 8.09e-05
2025-08-31 22:53:46 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-31 22:54:40 - pico-train - INFO - Step 113200 -- 🔄 Training Metrics
2025-08-31 22:54:40 - pico-train - INFO - ├── Loss: 4.7851
2025-08-31 22:54:40 - pico-train - INFO - ├── Learning Rate: 8.08e-05
2025-08-31 22:54:40 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-31 22:55:34 - pico-train - INFO - Step 113300 -- 🔄 Training Metrics
2025-08-31 22:55:34 - pico-train - INFO - ├── Loss: 4.7766
2025-08-31 22:55:34 - pico-train - INFO - ├── Learning Rate: 8.06e-05
2025-08-31 22:55:34 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-31 22:56:28 - pico-train - INFO - Step 113400 -- 🔄 Training Metrics
2025-08-31 22:56:28 - pico-train - INFO - ├── Loss: 4.7659
2025-08-31 22:56:28 - pico-train - INFO - ├── Learning Rate: 8.05e-05
2025-08-31 22:56:28 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-31 22:57:22 - pico-train - INFO - Step 113500 -- 🔄 Training Metrics
2025-08-31 22:57:22 - pico-train - INFO - ├── Loss: 4.7723
2025-08-31 22:57:22 - pico-train - INFO - ├── Learning Rate: 8.03e-05
2025-08-31 22:57:22 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-31 22:58:16 - pico-train - INFO - Step 113600 -- 🔄 Training Metrics
2025-08-31 22:58:16 - pico-train - INFO - ├── Loss: 4.7792
2025-08-31 22:58:16 - pico-train - INFO - ├── Learning Rate: 8.01e-05
2025-08-31 22:58:16 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-31 22:59:10 - pico-train - INFO - Step 113700 -- 🔄 Training Metrics
2025-08-31 22:59:10 - pico-train - INFO - ├── Loss: 4.7774
2025-08-31 22:59:10 - pico-train - INFO - ├── Learning Rate: 8.00e-05
2025-08-31 22:59:10 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-31 23:00:05 - pico-train - INFO - Step 113800 -- 🔄 Training Metrics
2025-08-31 23:00:05 - pico-train - INFO - ├── Loss: 4.7757
2025-08-31 23:00:05 - pico-train - INFO - ├── Learning Rate: 7.98e-05
2025-08-31 23:00:05 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-31 23:01:00 - pico-train - INFO - Step 113900 -- 🔄 Training Metrics
2025-08-31 23:01:00 - pico-train - INFO - ├── Loss: 4.7841
2025-08-31 23:01:00 - pico-train - INFO - ├── Learning Rate: 7.97e-05
2025-08-31 23:01:00 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-31 23:01:55 - pico-train - INFO - Step 114000 -- 💾 Saving Checkpoint
2025-08-31 23:42:10 - pico-train - INFO - Step 114000 -- 📊 Evaluation Results
2025-08-31 23:42:10 - pico-train - INFO - └── paloma: inf
2025-08-31 23:42:11 - pico-train - INFO - Step 114000 -- 🔄 Training Metrics
2025-08-31 23:42:11 - pico-train - INFO - ├── Loss: 4.7682
2025-08-31 23:42:11 - pico-train - INFO - ├── Learning Rate: 7.95e-05
2025-08-31 23:42:11 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-31 23:42:11 - pico-train - INFO - Step 114000 -- 📈 Saving Learning Dynamics
2025-08-31 23:43:07 - pico-train - INFO - Step 114100 -- 🔄 Training Metrics
2025-08-31 23:43:07 - pico-train - INFO - ├── Loss: 4.7732
2025-08-31 23:43:07 - pico-train - INFO - ├── Learning Rate: 7.94e-05
2025-08-31 23:43:07 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-31 23:44:01 - pico-train - INFO - Step 114200 -- 🔄 Training Metrics
2025-08-31 23:44:01 - pico-train - INFO - ├── Loss: 4.7557
2025-08-31 23:44:01 - pico-train - INFO - ├── Learning Rate: 7.92e-05
2025-08-31 23:44:01 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-31 23:44:55 - pico-train - INFO - Step 114300 -- 🔄 Training Metrics
2025-08-31 23:44:55 - pico-train - INFO - ├── Loss: 4.7418
2025-08-31 23:44:55 - pico-train - INFO - ├── Learning Rate: 7.91e-05
2025-08-31 23:44:55 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-31 23:45:49 - pico-train - INFO - Step 114400 -- 🔄 Training Metrics
2025-08-31 23:45:49 - pico-train - INFO - ├── Loss: 4.7622
2025-08-31 23:45:49 - pico-train - INFO - ├── Learning Rate: 7.89e-05
2025-08-31 23:45:49 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-31 23:46:44 - pico-train - INFO - Step 114500 -- 🔄 Training Metrics
2025-08-31 23:46:44 - pico-train - INFO - ├── Loss: 4.7745
2025-08-31 23:46:44 - pico-train - INFO - ├── Learning Rate: 7.87e-05
2025-08-31 23:46:44 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-31 23:47:38 - pico-train - INFO - Step 114600 -- 🔄 Training Metrics
2025-08-31 23:47:38 - pico-train - INFO - ├── Loss: 4.7669
2025-08-31 23:47:38 - pico-train - INFO - ├── Learning Rate: 7.86e-05
2025-08-31 23:47:38 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-31 23:48:32 - pico-train - INFO - Step 114700 -- 🔄 Training Metrics
2025-08-31 23:48:32 - pico-train - INFO - ├── Loss: 4.7675
2025-08-31 23:48:32 - pico-train - INFO - ├── Learning Rate: 7.84e-05
2025-08-31 23:48:32 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-31 23:49:26 - pico-train - INFO - Step 114800 -- 🔄 Training Metrics
2025-08-31 23:49:26 - pico-train - INFO - ├── Loss: 4.7752
2025-08-31 23:49:26 - pico-train - INFO - ├── Learning Rate: 7.83e-05
2025-08-31 23:49:26 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-31 23:50:20 - pico-train - INFO - Step 114900 -- 🔄 Training Metrics
2025-08-31 23:50:20 - pico-train - INFO - ├── Loss: 4.7719
2025-08-31 23:50:20 - pico-train - INFO - ├── Learning Rate: 7.81e-05
2025-08-31 23:50:20 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-31 23:51:14 - pico-train - INFO - Step 115000 -- 🔄 Training Metrics
2025-08-31 23:51:14 - pico-train - INFO - ├── Loss: 4.7845
2025-08-31 23:51:14 - pico-train - INFO - ├── Learning Rate: 7.80e-05
2025-08-31 23:51:14 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-31 23:52:10 - pico-train - INFO - Step 115100 -- 🔄 Training Metrics
2025-08-31 23:52:10 - pico-train - INFO - ├── Loss: 4.7656
2025-08-31 23:52:10 - pico-train - INFO - ├── Learning Rate: 7.78e-05
2025-08-31 23:52:10 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-31 23:53:05 - pico-train - INFO - Step 115200 -- 🔄 Training Metrics
2025-08-31 23:53:05 - pico-train - INFO - ├── Loss: 4.7658
2025-08-31 23:53:05 - pico-train - INFO - ├── Learning Rate: 7.77e-05
2025-08-31 23:53:05 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-31 23:54:00 - pico-train - INFO - Step 115300 -- 🔄 Training Metrics
2025-08-31 23:54:00 - pico-train - INFO - ├── Loss: 4.7655
2025-08-31 23:54:00 - pico-train - INFO - ├── Learning Rate: 7.75e-05
2025-08-31 23:54:00 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-31 23:54:55 - pico-train - INFO - Step 115400 -- 🔄 Training Metrics
2025-08-31 23:54:55 - pico-train - INFO - ├── Loss: 4.7857
2025-08-31 23:54:55 - pico-train - INFO - ├── Learning Rate: 7.74e-05
2025-08-31 23:54:55 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-31 23:55:50 - pico-train - INFO - Step 115500 -- 🔄 Training Metrics
2025-08-31 23:55:50 - pico-train - INFO - ├── Loss: 4.7564
2025-08-31 23:55:50 - pico-train - INFO - ├── Learning Rate: 7.72e-05
2025-08-31 23:55:50 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-31 23:56:45 - pico-train - INFO - Step 115600 -- 🔄 Training Metrics
2025-08-31 23:56:45 - pico-train - INFO - ├── Loss: 4.7920
2025-08-31 23:56:45 - pico-train - INFO - ├── Learning Rate: 7.70e-05
2025-08-31 23:56:45 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-31 23:57:40 - pico-train - INFO - Step 115700 -- 🔄 Training Metrics
2025-08-31 23:57:40 - pico-train - INFO - ├── Loss: 4.7652
2025-08-31 23:57:40 - pico-train - INFO - ├── Learning Rate: 7.69e-05
2025-08-31 23:57:40 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-31 23:58:35 - pico-train - INFO - Step 115800 -- 🔄 Training Metrics
2025-08-31 23:58:35 - pico-train - INFO - ├── Loss: 4.7682
2025-08-31 23:58:35 - pico-train - INFO - ├── Learning Rate: 7.67e-05
2025-08-31 23:58:35 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-31 23:59:30 - pico-train - INFO - Step 115900 -- 🔄 Training Metrics
2025-08-31 23:59:30 - pico-train - INFO - ├── Loss: 4.7581
2025-08-31 23:59:30 - pico-train - INFO - ├── Learning Rate: 7.66e-05
2025-08-31 23:59:30 - pico-train - INFO - └── Inf/NaN count: 0
2025-09-01 00:00:25 - pico-train - INFO - Step 116000 -- 💾 Saving Checkpoint
2025-09-01 00:38:43 - pico-train - INFO - Step 116000 -- 📊 Evaluation Results
2025-09-01 00:38:43 - pico-train - INFO - └── paloma: inf
2025-09-01 00:38:44 - pico-train - INFO - Step 116000 -- 🔄 Training Metrics
2025-09-01 00:38:44 - pico-train - INFO - ├── Loss: 4.7403
2025-09-01 00:38:44 - pico-train - INFO - ├── Learning Rate: 7.64e-05
2025-09-01 00:38:44 - pico-train - INFO - └── Inf/NaN count: 0
2025-09-01 00:38:44 - pico-train - INFO - Step 116000 -- 📈 Saving Learning Dynamics
2025-09-01 00:39:41 - pico-train - INFO - Step 116100 -- 🔄 Training Metrics
2025-09-01 00:39:41 - pico-train - INFO - ├── Loss: 4.7747
2025-09-01 00:39:41 - pico-train - INFO - ├── Learning Rate: 7.63e-05
2025-09-01 00:39:41 - pico-train - INFO - └── Inf/NaN count: 0
2025-09-01 00:40:35 - pico-train - INFO - Step 116200 -- 🔄 Training Metrics
2025-09-01 00:40:35 - pico-train - INFO - ├── Loss: 4.7540
2025-09-01 00:40:35 - pico-train - INFO - ├── Learning Rate: 7.61e-05
2025-09-01 00:40:35 - pico-train - INFO - └── Inf/NaN count: 0
2025-09-01 00:41:28 - pico-train - INFO - Step 116300 -- 🔄 Training Metrics
2025-09-01 00:41:28 - pico-train - INFO - ├── Loss: 4.7835
2025-09-01 00:41:28 - pico-train - INFO - ├── Learning Rate: 7.60e-05
2025-09-01 00:41:28 - pico-train - INFO - └── Inf/NaN count: 0
2025-09-01 00:42:23 - pico-train - INFO - Step 116400 -- 🔄 Training Metrics
2025-09-01 00:42:23 - pico-train - INFO - ├── Loss: 4.7547
2025-09-01 00:42:23 - pico-train - INFO - ├── Learning Rate: 7.58e-05
2025-09-01 00:42:23 - pico-train - INFO - └── Inf/NaN count: 0
2025-09-01 00:43:17 - pico-train - INFO - Step 116500 -- 🔄 Training Metrics
2025-09-01 00:43:17 - pico-train - INFO - ├── Loss: 4.7698
2025-09-01 00:43:17 - pico-train - INFO - ├── Learning Rate: 7.57e-05
2025-09-01 00:43:17 - pico-train - INFO - └── Inf/NaN count: 0
2025-09-01 00:44:11 - pico-train - INFO - Step 116600 -- 🔄 Training Metrics
2025-09-01 00:44:11 - pico-train - INFO - ├── Loss: 4.7806
2025-09-01 00:44:11 - pico-train - INFO - ├── Learning Rate: 7.55e-05
2025-09-01 00:44:11 - pico-train - INFO - └── Inf/NaN count: 0
2025-09-01 00:45:05 - pico-train - INFO - Step 116700 -- 🔄 Training Metrics
2025-09-01 00:45:05 - pico-train - INFO - ├── Loss: 4.7499
2025-09-01 00:45:05 - pico-train - INFO - ├── Learning Rate: 7.53e-05
2025-09-01 00:45:05 - pico-train - INFO - └── Inf/NaN count: 0
2025-09-01 00:45:59 - pico-train - INFO - Step 116800 -- 🔄 Training Metrics
2025-09-01 00:45:59 - pico-train - INFO - ├── Loss: 4.7724
2025-09-01 00:45:59 - pico-train - INFO - ├── Learning Rate: 7.52e-05
2025-09-01 00:45:59 - pico-train - INFO - └── Inf/NaN count: 0
2025-09-01 00:46:54 - pico-train - INFO - Step 116900 -- 🔄 Training Metrics
2025-09-01 00:46:54 - pico-train - INFO - ├── Loss: 4.7790
2025-09-01 00:46:54 - pico-train - INFO - ├── Learning Rate: 7.50e-05
2025-09-01 00:46:54 - pico-train - INFO - └── Inf/NaN count: 0
2025-09-01 00:48:04 - pico-train - INFO - Step 117000 -- 🔄 Training Metrics
2025-09-01 00:48:04 - pico-train - INFO - ├── Loss: 4.7729
2025-09-01 00:48:04 - pico-train - INFO - ├── Learning Rate: 7.49e-05
2025-09-01 00:48:04 - pico-train - INFO - └── Inf/NaN count: 0
2025-09-01 00:49:16 - pico-train - INFO - Step 117100 -- 🔄 Training Metrics
2025-09-01 00:49:16 - pico-train - INFO - ├── Loss: 4.7553
2025-09-01 00:49:16 - pico-train - INFO - ├── Learning Rate: 7.47e-05
2025-09-01 00:49:16 - pico-train - INFO - └── Inf/NaN count: 0
2025-09-01 00:50:30 - pico-train - INFO - Step 117200 -- 🔄 Training Metrics
2025-09-01 00:50:30 - pico-train - INFO - ├── Loss: 4.7685
2025-09-01 00:50:30 - pico-train - INFO - ├── Learning Rate: 7.46e-05
2025-09-01 00:50:30 - pico-train - INFO - └── Inf/NaN count: 0
2025-09-01 00:51:43 - pico-train - INFO - Step 117300 -- 🔄 Training Metrics
2025-09-01 00:51:43 - pico-train - INFO - ├── Loss: 4.7522
2025-09-01 00:51:43 - pico-train - INFO - ├── Learning Rate: 7.44e-05
2025-09-01 00:51:43 - pico-train - INFO - └── Inf/NaN count: 0
2025-09-01 00:52:56 - pico-train - INFO - Step 117400 -- 🔄 Training Metrics
2025-09-01 00:52:56 - pico-train - INFO - ├── Loss: 4.7770
2025-09-01 00:52:56 - pico-train - INFO - ├── Learning Rate: 7.43e-05
2025-09-01 00:52:56 - pico-train - INFO - └── Inf/NaN count: 0
2025-09-01 00:54:05 - pico-train - INFO - Step 117500 -- 🔄 Training Metrics
2025-09-01 00:54:05 - pico-train - INFO - ├── Loss: 4.7848
2025-09-01 00:54:05 - pico-train - INFO - ├── Learning Rate: 7.41e-05
2025-09-01 00:54:05 - pico-train - INFO - └── Inf/NaN count: 0
2025-09-01 00:55:00 - pico-train - INFO - Step 117600 -- 🔄 Training Metrics
2025-09-01 00:55:00 - pico-train - INFO - ├── Loss: 4.7610
2025-09-01 00:55:00 - pico-train - INFO - ├── Learning Rate: 7.40e-05
2025-09-01 00:55:00 - pico-train - INFO - └── Inf/NaN count: 0
2025-09-01 00:55:56 - pico-train - INFO - Step 117700 -- 🔄 Training Metrics
2025-09-01 00:55:56 - pico-train - INFO - ├── Loss: 4.7598
2025-09-01 00:55:56 - pico-train - INFO - ├── Learning Rate: 7.38e-05
2025-09-01 00:55:56 - pico-train - INFO - └── Inf/NaN count: 0
2025-09-01 00:56:52 - pico-train - INFO - Step 117800 -- 🔄 Training Metrics
2025-09-01 00:56:52 - pico-train - INFO - ├── Loss: 4.7555
2025-09-01 00:56:52 - pico-train - INFO - ├── Learning Rate: 7.37e-05
2025-09-01 00:56:52 - pico-train - INFO - └── Inf/NaN count: 0
2025-09-01 00:57:47 - pico-train - INFO - Step 117900 -- 🔄 Training Metrics
2025-09-01 00:57:47 - pico-train - INFO - ├── Loss: 4.7664
2025-09-01 00:57:47 - pico-train - INFO - ├── Learning Rate: 7.35e-05
2025-09-01 00:57:47 - pico-train - INFO - └── Inf/NaN count: 0
2025-09-01 00:58:41 - pico-train - INFO - Step 118000 -- 💾 Saving Checkpoint