2025-08-29 22:50:26 - pico-train - INFO - Step 20000 -- 📊 Evaluation Results
2025-08-29 22:50:26 - pico-train - INFO - └── paloma: 1.8399778163273925e+24
2025-08-29 22:50:26 - pico-train - INFO - ==================================================
2025-08-29 22:50:26 - pico-train - INFO - ✨ Training Configuration
2025-08-29 22:50:26 - pico-train - INFO - ==================================================
2025-08-29 22:50:26 - pico-train - INFO - ╭─────────────────────────────────────────────────────╮
2025-08-29 22:50:26 - pico-train - INFO - │ checkpointing:                                      │
2025-08-29 22:50:26 - pico-train - INFO - │   checkpoints_dir: checkpoints                      │
2025-08-29 22:50:26 - pico-train - INFO - │   evaluation:                                       │
2025-08-29 22:50:26 - pico-train - INFO - │     eval_results_dir: eval_results                  │
2025-08-29 22:50:26 - pico-train - INFO - │   fabric_checkpoint_dir: fabric_state               │
2025-08-29 22:50:26 - pico-train - INFO - │   fabric_checkpoint_filename: checkpoint.pt         │
2025-08-29 22:50:26 - pico-train - INFO - │   hf_checkpoint:                                    │
2025-08-29 22:50:26 - pico-train - INFO - │     collection_slug: null                           │
2025-08-29 22:50:26 - pico-train - INFO - │     repo_id: ThomasTheMaker/pico-decoder-tiny       │
2025-08-29 22:50:26 - pico-train - INFO - │   learning_dynamics:                                │
2025-08-29 22:50:26 - pico-train - INFO - │     batch_size: 1                                   │
2025-08-29 22:50:26 - pico-train - INFO - │     eval_data: null                                 │
2025-08-29 22:50:26 - pico-train - INFO - │     layer_suffixes:                                 │
2025-08-29 22:50:26 - pico-train - INFO - │     - attention.v_proj                              │
2025-08-29 22:50:26 - pico-train - INFO - │     - attention.o_proj                              │
2025-08-29 22:50:26 - pico-train - INFO - │     - swiglu.w_2                                    │
2025-08-29 22:50:26 - pico-train - INFO - │     sequence_idx: -1                                │
2025-08-29 22:50:26 - pico-train - INFO - │   learning_dynamics_dir: learning_dynamics          │
2025-08-29 22:50:26 - pico-train - INFO - │   logs_dir: logs                                    │
2025-08-29 22:50:26 - pico-train - INFO - │   run_name: pico-decoder-tiny-dolma5M-v1            │
2025-08-29 22:50:26 - pico-train - INFO - │   runs_dir: runs                                    │
2025-08-29 22:50:26 - pico-train - INFO - │   save_every_n_steps: 500                           │
2025-08-29 22:50:26 - pico-train - INFO - │   save_to_hf: true                                  │
2025-08-29 22:50:26 - pico-train - INFO - │   training:                                         │
2025-08-29 22:50:26 - pico-train - INFO - │     auto_resume: true                               │
2025-08-29 22:50:26 - pico-train - INFO - │ data:                                               │
2025-08-29 22:50:26 - pico-train - INFO - │   dataloader:                                       │
2025-08-29 22:50:26 - pico-train - INFO - │     batch_size: 4                                   │
2025-08-29 22:50:26 - pico-train - INFO - │   dataset:                                          │
2025-08-29 22:50:26 - pico-train - INFO - │     name: ThomasTheMaker/pretokenized-dolma-5M      │
2025-08-29 22:50:26 - pico-train - INFO - │   tokenizer:                                        │
2025-08-29 22:50:26 - pico-train - INFO - │     name: allenai/OLMo-7B-0724-hf                   │
2025-08-29 22:50:26 - pico-train - INFO - │     vocab_size: 50304                               │
2025-08-29 22:50:26 - pico-train - INFO - │ evaluation:                                         │
2025-08-29 22:50:26 - pico-train - INFO - │   metrics:                                          │
2025-08-29 22:50:26 - pico-train - INFO - │   - paloma                                          │
2025-08-29 22:50:26 - pico-train - INFO - │   paloma:                                           │
2025-08-29 22:50:26 - pico-train - INFO - │     batch_size: 1                                   │
2025-08-29 22:50:26 - pico-train - INFO - │     dataset_name: pico-lm/pretokenized-paloma-tinsy │
2025-08-29 22:50:26 - pico-train - INFO - │     dataset_split: val                              │
2025-08-29 22:50:26 - pico-train - INFO - │     max_length: 2048                                │
2025-08-29 22:50:26 - pico-train - INFO - │ model:                                              │
2025-08-29 22:50:26 - pico-train - INFO - │   activation_hidden_dim: 384                        │
2025-08-29 22:50:26 - pico-train - INFO - │   attention_n_heads: 12                             │
2025-08-29 22:50:26 - pico-train - INFO - │   attention_n_kv_heads: 4                           │
2025-08-29 22:50:26 - pico-train - INFO - │   batch_size: 1024                                  │
2025-08-29 22:50:26 - pico-train - INFO - │   d_model: 96                                       │
2025-08-29 22:50:26 - pico-train - INFO - │   max_seq_len: 2048                                 │
2025-08-29 22:50:26 - pico-train - INFO - │   model_type: pico_decoder                          │
2025-08-29 22:50:26 - pico-train - INFO - │   n_layers: 12                                      │
2025-08-29 22:50:26 - pico-train - INFO - │   norm_eps: 1.0e-06                                 │
2025-08-29 22:50:26 - pico-train - INFO - │   position_emb_theta: 10000.0                       │
2025-08-29 22:50:26 - pico-train - INFO - │   vocab_size: 50304                                 │
2025-08-29 22:50:26 - pico-train - INFO - │ monitoring:                                         │
2025-08-29 22:50:26 - pico-train - INFO - │   logging:                                          │
2025-08-29 22:50:26 - pico-train - INFO - │     log_every_n_steps: 25                           │
2025-08-29 22:50:26 - pico-train - INFO - │     log_level: INFO                                 │
2025-08-29 22:50:26 - pico-train - INFO - │   save_to_wandb: false                              │
2025-08-29 22:50:26 - pico-train - INFO - │   wandb:                                            │
2025-08-29 22:50:26 - pico-train - INFO - │     entity: boymyc                                  │
2025-08-29 22:50:26 - pico-train - INFO - │     project: pico-decoder-tiny                      │
2025-08-29 22:50:26 - pico-train - INFO - │ training:                                           │
2025-08-29 22:50:26 - pico-train - INFO - │   fabric:                                           │
2025-08-29 22:50:26 - pico-train - INFO - │     accelerator: cuda                               │
2025-08-29 22:50:26 - pico-train - INFO - │     num_devices: 1                                  │
2025-08-29 22:50:26 - pico-train - INFO - │     num_nodes: 1                                    │
2025-08-29 22:50:26 - pico-train - INFO - │     precision: bf16-mixed                           │
2025-08-29 22:50:26 - pico-train - INFO - │   max_steps: 20000                                  │
2025-08-29 22:50:26 - pico-train - INFO - │   optimization:                                     │
2025-08-29 22:50:26 - pico-train - INFO - │     gradient_accumulation_steps: 4                  │
2025-08-29 22:50:26 - pico-train - INFO - │     lr: 5.0e-05                                     │
2025-08-29 22:50:26 - pico-train - INFO - │     lr_scheduler: cosine                            │
2025-08-29 22:50:26 - pico-train - INFO - │     lr_warmup_steps: 8000                           │
2025-08-29 22:50:26 - pico-train - INFO - │     optimizer: adamw                                │
2025-08-29 22:50:26 - pico-train - INFO - │                                                     │
2025-08-29 22:50:26 - pico-train - INFO - ╰─────────────────────────────────────────────────────╯
2025-08-29 22:50:26 - pico-train - INFO - ==================================================
2025-08-29 22:50:26 - pico-train - INFO - ⛭ Runtime Summary:
2025-08-29 22:50:26 - pico-train - INFO - ==================================================
2025-08-29 22:50:26 - pico-train - INFO - Starting from step: 20000
2025-08-29 22:50:26 - pico-train - INFO - Model Setup:
2025-08-29 22:50:26 - pico-train - INFO - └─ Total Parameters: 11,282,784
2025-08-29 22:50:26 - pico-train - INFO - └─ Trainable Parameters: 11,282,784
2025-08-29 22:50:26 - pico-train - INFO - Distributed Setup:
2025-08-29 22:50:26 - pico-train - INFO - └─ Number of Devices: 1
2025-08-29 22:50:26 - pico-train - INFO - └─ Device Type: NVIDIA GeForce RTX 5090
2025-08-29 22:50:26 - pico-train - INFO - └─ Available Memory: 33.68 GB
2025-08-29 22:50:26 - pico-train - INFO - Software Setup:
2025-08-29 22:50:26 - pico-train - INFO - └─ Python Version: 3.10.12
2025-08-29 22:50:26 - pico-train - INFO - └─ PyTorch Version: 2.8.0+cu128
2025-08-29 22:50:26 - pico-train - INFO - └─ CUDA Version: 12.8
2025-08-29 22:50:26 - pico-train - INFO - └─ Operating System: Linux 6.8.0-63-generic
2025-08-29 22:50:26 - pico-train - INFO - Batch Size Configuration:
2025-08-29 22:50:26 - pico-train - INFO - └─ Global Batch Size: 4
2025-08-29 22:50:26 - pico-train - INFO - └─ Per Device Batch Size: 1
2025-08-29 22:50:26 - pico-train - INFO - └─ Gradient Accumulation Steps: 4
2025-08-29 22:50:26 - pico-train - INFO - ==================================================
2025-08-29 22:50:27 - pico-train - INFO - Step 20000 -- 🔄 Training Metrics
2025-08-29 22:50:27 - pico-train - INFO - ├── Loss: 6.5103
2025-08-29 22:50:27 - pico-train - INFO - ├── Learning Rate: 5.00e-06
2025-08-29 22:50:27 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-29 22:50:27 - pico-train - INFO - Step 20000 -- 📈 Saving Learning Dynamics
2025-08-29 22:50:43 - pico-train - INFO - Step 20025 -- 🔄 Training Metrics
2025-08-29 22:50:43 - pico-train - INFO - ├── Loss: 6.4274
2025-08-29 22:50:43 - pico-train - INFO - ├── Learning Rate: 3.45e-05
2025-08-29 22:50:43 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-29 22:50:55 - pico-train - INFO - Step 20050 -- 🔄 Training Metrics
2025-08-29 22:50:55 - pico-train - INFO - ├── Loss: 6.3770
2025-08-29 22:50:55 - pico-train - INFO - ├── Learning Rate: 3.45e-05
2025-08-29 22:50:55 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-29 22:51:08 - pico-train - INFO - Step 20075 -- 🔄 Training Metrics
2025-08-29 22:51:08 - pico-train - INFO - ├── Loss: 6.2797
2025-08-29 22:51:08 - pico-train - INFO - ├── Learning Rate: 3.44e-05
2025-08-29 22:51:08 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-29 22:51:21 - pico-train - INFO - Step 20100 -- 🔄 Training Metrics
2025-08-29 22:51:21 - pico-train - INFO - ├── Loss: 6.3924
2025-08-29 22:51:21 - pico-train - INFO - ├── Learning Rate: 3.43e-05
2025-08-29 22:51:21 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-29 22:51:34 - pico-train - INFO - Step 20125 -- 🔄 Training Metrics
2025-08-29 22:51:34 - pico-train - INFO - ├── Loss: 6.4442
2025-08-29 22:51:34 - pico-train - INFO - ├── Learning Rate: 3.43e-05
2025-08-29 22:51:34 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-29 22:51:47 - pico-train - INFO - Step 20150 -- 🔄 Training Metrics
2025-08-29 22:51:47 - pico-train - INFO - ├── Loss: 6.3881
2025-08-29 22:51:47 - pico-train - INFO - ├── Learning Rate: 3.42e-05
2025-08-29 22:51:47 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-29 22:52:00 - pico-train - INFO - Step 20175 -- 🔄 Training Metrics
2025-08-29 22:52:00 - pico-train - INFO - ├── Loss: 6.4008
2025-08-29 22:52:00 - pico-train - INFO - ├── Learning Rate: 3.42e-05
2025-08-29 22:52:00 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-29 22:52:12 - pico-train - INFO - Step 20200 -- 🔄 Training Metrics
2025-08-29 22:52:12 - pico-train - INFO - ├── Loss: 6.4257
2025-08-29 22:52:12 - pico-train - INFO - ├── Learning Rate: 3.41e-05
2025-08-29 22:52:12 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-29 22:52:25 - pico-train - INFO - Step 20225 -- 🔄 Training Metrics
2025-08-29 22:52:25 - pico-train - INFO - ├── Loss: 6.4125
2025-08-29 22:52:25 - pico-train - INFO - ├── Learning Rate: 3.41e-05
2025-08-29 22:52:25 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-29 22:52:38 - pico-train - INFO - Step 20250 -- 🔄 Training Metrics
2025-08-29 22:52:38 - pico-train - INFO - ├── Loss: 6.3390
2025-08-29 22:52:38 - pico-train - INFO - ├── Learning Rate: 3.40e-05
2025-08-29 22:52:38 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-29 22:52:50 - pico-train - INFO - Step 20275 -- 🔄 Training Metrics
2025-08-29 22:52:50 - pico-train - INFO - ├── Loss: 6.3328
2025-08-29 22:52:50 - pico-train - INFO - ├── Learning Rate: 3.39e-05
2025-08-29 22:52:50 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-29 22:53:03 - pico-train - INFO - Step 20300 -- 🔄 Training Metrics
2025-08-29 22:53:03 - pico-train - INFO - ├── Loss: 6.3035
2025-08-29 22:53:03 - pico-train - INFO - ├── Learning Rate: 3.39e-05
2025-08-29 22:53:03 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-29 22:53:16 - pico-train - INFO - Step 20325 -- 🔄 Training Metrics
2025-08-29 22:53:16 - pico-train - INFO - ├── Loss: 6.2862
2025-08-29 22:53:16 - pico-train - INFO - ├── Learning Rate: 3.38e-05
2025-08-29 22:53:16 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-29 22:53:28 - pico-train - INFO - Step 20350 -- 🔄 Training Metrics
2025-08-29 22:53:28 - pico-train - INFO - ├── Loss: 6.4249
2025-08-29 22:53:28 - pico-train - INFO - ├── Learning Rate: 3.38e-05
2025-08-29 22:53:28 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-29 22:53:41 - pico-train - INFO - Step 20375 -- 🔄 Training Metrics
2025-08-29 22:53:41 - pico-train - INFO - ├── Loss: 6.3582
2025-08-29 22:53:41 - pico-train - INFO - ├── Learning Rate: 3.37e-05
2025-08-29 22:53:41 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-29 22:53:54 - pico-train - INFO - Step 20400 -- 🔄 Training Metrics
2025-08-29 22:53:54 - pico-train - INFO - ├── Loss: 6.3195
2025-08-29 22:53:54 - pico-train - INFO - ├── Learning Rate: 3.37e-05
2025-08-29 22:53:54 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-29 22:54:07 - pico-train - INFO - Step 20425 -- 🔄 Training Metrics
2025-08-29 22:54:07 - pico-train - INFO - ├── Loss: 6.4802
2025-08-29 22:54:07 - pico-train - INFO - ├── Learning Rate: 3.36e-05
2025-08-29 22:54:07 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-29 22:54:22 - pico-train - INFO - Step 20450 -- 🔄 Training Metrics
2025-08-29 22:54:22 - pico-train - INFO - ├── Loss: 6.3126
2025-08-29 22:54:22 - pico-train - INFO - ├── Learning Rate: 3.35e-05
2025-08-29 22:54:22 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-29 22:54:35 - pico-train - INFO - Step 20475 -- 🔄 Training Metrics
2025-08-29 22:54:35 - pico-train - INFO - ├── Loss: 6.4323
2025-08-29 22:54:35 - pico-train - INFO - ├── Learning Rate: 3.35e-05
2025-08-29 22:54:35 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-29 22:54:50 - pico-train - INFO - Step 20500 -- 💾 Saving Checkpoint
2025-08-29 22:59:37 - pico-train - INFO - Step 20500 -- 📊 Evaluation Results
2025-08-29 22:59:37 - pico-train - INFO - └── paloma: 4.281028602870165e+24
2025-08-29 22:59:42 - pico-train - INFO - Step 20500 -- 🔄 Training Metrics
2025-08-29 22:59:42 - pico-train - INFO - ├── Loss: 6.4138
2025-08-29 22:59:42 - pico-train - INFO - ├── Learning Rate: 3.34e-05
2025-08-29 22:59:42 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-29 22:59:42 - pico-train - INFO - Step 20500 -- 📈 Saving Learning Dynamics
2025-08-29 23:00:21 - pico-train - INFO - Step 20525 -- 🔄 Training Metrics
2025-08-29 23:00:21 - pico-train - INFO - ├── Loss: 6.3971
2025-08-29 23:00:21 - pico-train - INFO - ├── Learning Rate: 3.34e-05
2025-08-29 23:00:21 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-29 23:00:53 - pico-train - INFO - Step 20550 -- 🔄 Training Metrics
2025-08-29 23:00:53 - pico-train - INFO - ├── Loss: 6.3632
2025-08-29 23:00:53 - pico-train - INFO - ├── Learning Rate: 3.33e-05
2025-08-29 23:00:53 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-29 23:01:27 - pico-train - INFO - Step 20575 -- 🔄 Training Metrics
2025-08-29 23:01:27 - pico-train - INFO - ├── Loss: 6.4202
2025-08-29 23:01:27 - pico-train - INFO - ├── Learning Rate: 3.32e-05
2025-08-29 23:01:27 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-29 23:02:01 - pico-train - INFO - Step 20600 -- 🔄 Training Metrics
2025-08-29 23:02:01 - pico-train - INFO - ├── Loss: 6.4792
2025-08-29 23:02:01 - pico-train - INFO - ├── Learning Rate: 3.32e-05
2025-08-29 23:02:01 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-29 23:02:34 - pico-train - INFO - Step 20625 -- 🔄 Training Metrics
2025-08-29 23:02:34 - pico-train - INFO - ├── Loss: 6.3213
2025-08-29 23:02:34 - pico-train - INFO - ├── Learning Rate: 3.31e-05
2025-08-29 23:02:34 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-29 23:03:09 - pico-train - INFO - Step 20650 -- 🔄 Training Metrics
2025-08-29 23:03:09 - pico-train - INFO - ├── Loss: 6.4173
2025-08-29 23:03:09 - pico-train - INFO - ├── Learning Rate: 3.31e-05
2025-08-29 23:03:09 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-29 23:03:43 - pico-train - INFO - Step 20675 -- 🔄 Training Metrics
2025-08-29 23:03:43 - pico-train - INFO - ├── Loss: 6.4062
2025-08-29 23:03:43 - pico-train - INFO - ├── Learning Rate: 3.30e-05
2025-08-29 23:03:43 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-29 23:04:19 - pico-train - INFO - Step 20700 -- 🔄 Training Metrics
2025-08-29 23:04:19 - pico-train - INFO - ├── Loss: 6.3742
2025-08-29 23:04:19 - pico-train - INFO - ├── Learning Rate: 3.30e-05
2025-08-29 23:04:19 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-29 23:04:56 - pico-train - INFO - Step 20725 -- 🔄 Training Metrics
2025-08-29 23:04:56 - pico-train - INFO - ├── Loss: 6.3820
2025-08-29 23:04:56 - pico-train - INFO - ├── Learning Rate: 3.29e-05
2025-08-29 23:04:56 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-29 23:05:17 - pico-train - INFO - Step 20750 -- 🔄 Training Metrics
2025-08-29 23:05:17 - pico-train - INFO - ├── Loss: 6.3374
2025-08-29 23:05:17 - pico-train - INFO - ├── Learning Rate: 3.28e-05
2025-08-29 23:05:17 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-29 23:05:30 - pico-train - INFO - Step 20775 -- 🔄 Training Metrics
2025-08-29 23:05:30 - pico-train - INFO - ├── Loss: 6.4028
2025-08-29 23:05:30 - pico-train - INFO - ├── Learning Rate: 3.28e-05
2025-08-29 23:05:30 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-29 23:05:43 - pico-train - INFO - Step 20800 -- 🔄 Training Metrics
2025-08-29 23:05:43 - pico-train - INFO - ├── Loss: 6.3732
2025-08-29 23:05:43 - pico-train - INFO - ├── Learning Rate: 3.27e-05
2025-08-29 23:05:43 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-29 23:05:55 - pico-train - INFO - Step 20825 -- 🔄 Training Metrics
2025-08-29 23:05:55 - pico-train - INFO - ├── Loss: 6.3486
2025-08-29 23:05:55 - pico-train - INFO - ├── Learning Rate: 3.27e-05
2025-08-29 23:05:55 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-29 23:06:08 - pico-train - INFO - Step 20850 -- 🔄 Training Metrics
2025-08-29 23:06:08 - pico-train - INFO - ├── Loss: 6.3611
2025-08-29 23:06:08 - pico-train - INFO - ├── Learning Rate: 3.26e-05
2025-08-29 23:06:08 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-29 23:06:21 - pico-train - INFO - Step 20875 -- 🔄 Training Metrics
2025-08-29 23:06:21 - pico-train - INFO - ├── Loss: 6.3278
2025-08-29 23:06:21 - pico-train - INFO - ├── Learning Rate: 3.26e-05
2025-08-29 23:06:21 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-29 23:06:33 - pico-train - INFO - Step 20900 -- 🔄 Training Metrics
2025-08-29 23:06:33 - pico-train - INFO - ├── Loss: 6.3287
2025-08-29 23:06:33 - pico-train - INFO - ├── Learning Rate: 3.25e-05
2025-08-29 23:06:33 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-29 23:06:46 - pico-train - INFO - Step 20925 -- 🔄 Training Metrics
2025-08-29 23:06:46 - pico-train - INFO - ├── Loss: 6.3276
2025-08-29 23:06:46 - pico-train - INFO - ├── Learning Rate: 3.24e-05
2025-08-29 23:06:46 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-29 23:06:58 - pico-train - INFO - Step 20950 -- 🔄 Training Metrics
2025-08-29 23:06:58 - pico-train - INFO - ├── Loss: 6.4450
2025-08-29 23:06:58 - pico-train - INFO - ├── Learning Rate: 3.24e-05
2025-08-29 23:06:58 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-29 23:07:11 - pico-train - INFO - Step 20975 -- 🔄 Training Metrics
2025-08-29 23:07:11 - pico-train - INFO - ├── Loss: 6.4429
2025-08-29 23:07:11 - pico-train - INFO - ├── Learning Rate: 3.23e-05
2025-08-29 23:07:11 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-29 23:07:23 - pico-train - INFO - Step 21000 -- 💾 Saving Checkpoint
2025-08-29 23:09:25 - pico-train - INFO - Step 21000 -- 📊 Evaluation Results
2025-08-29 23:09:25 - pico-train - INFO - └── paloma: 3.816115022517074e+24
2025-08-29 23:09:28 - pico-train - INFO - Step 21000 -- 🔄 Training Metrics
2025-08-29 23:09:28 - pico-train - INFO - ├── Loss: 6.2970
2025-08-29 23:09:28 - pico-train - INFO - ├── Learning Rate: 3.23e-05
2025-08-29 23:09:28 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-29 23:09:28 - pico-train - INFO - Step 21000 -- 📈 Saving Learning Dynamics
2025-08-29 23:09:43 - pico-train - INFO - Step 21025 -- 🔄 Training Metrics
2025-08-29 23:09:43 - pico-train - INFO - ├── Loss: 6.3206
2025-08-29 23:09:43 - pico-train - INFO - ├── Learning Rate: 3.22e-05
2025-08-29 23:09:43 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-29 23:09:56 - pico-train - INFO - Step 21050 -- 🔄 Training Metrics
2025-08-29 23:09:56 - pico-train - INFO - ├── Loss: 6.3337
2025-08-29 23:09:56 - pico-train - INFO - ├── Learning Rate: 3.21e-05
2025-08-29 23:09:56 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-29 23:10:08 - pico-train - INFO - Step 21075 -- 🔄 Training Metrics
2025-08-29 23:10:08 - pico-train - INFO - ├── Loss: 6.3274
2025-08-29 23:10:08 - pico-train - INFO - ├── Learning Rate: 3.21e-05
2025-08-29 23:10:08 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-29 23:10:21 - pico-train - INFO - Step 21100 -- 🔄 Training Metrics
2025-08-29 23:10:21 - pico-train - INFO - ├── Loss: 6.4202
2025-08-29 23:10:21 - pico-train - INFO - ├── Learning Rate: 3.20e-05
2025-08-29 23:10:21 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-29 23:10:33 - pico-train - INFO - Step 21125 -- 🔄 Training Metrics
2025-08-29 23:10:33 - pico-train - INFO - ├── Loss: 6.3698
2025-08-29 23:10:33 - pico-train - INFO - ├── Learning Rate: 3.20e-05
2025-08-29 23:10:33 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-29 23:10:46 - pico-train - INFO - Step 21150 -- 🔄 Training Metrics
2025-08-29 23:10:46 - pico-train - INFO - ├── Loss: 6.2671
2025-08-29 23:10:46 - pico-train - INFO - ├── Learning Rate: 3.19e-05
2025-08-29 23:10:46 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-29 23:10:59 - pico-train - INFO - Step 21175 -- 🔄 Training Metrics
2025-08-29 23:10:59 - pico-train - INFO - ├── Loss: 6.4334
2025-08-29 23:10:59 - pico-train - INFO - ├── Learning Rate: 3.18e-05
2025-08-29 23:10:59 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-29 23:11:11 - pico-train - INFO - Step 21200 -- 🔄 Training Metrics
2025-08-29 23:11:11 - pico-train - INFO - ├── Loss: 6.4208
2025-08-29 23:11:11 - pico-train - INFO - ├── Learning Rate: 3.18e-05
2025-08-29 23:11:11 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-29 23:11:24 - pico-train - INFO - Step 21225 -- 🔄 Training Metrics
2025-08-29 23:11:24 - pico-train - INFO - ├── Loss: 6.3380
2025-08-29 23:11:24 - pico-train - INFO - ├── Learning Rate: 3.17e-05
2025-08-29 23:11:24 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-29 23:11:37 - pico-train - INFO - Step 21250 -- 🔄 Training Metrics
2025-08-29 23:11:37 - pico-train - INFO - ├── Loss: 6.3026
2025-08-29 23:11:37 - pico-train - INFO - ├── Learning Rate: 3.17e-05
2025-08-29 23:11:37 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-29 23:11:49 - pico-train - INFO - Step 21275 -- 🔄 Training Metrics
2025-08-29 23:11:49 - pico-train - INFO - ├── Loss: 6.3123
2025-08-29 23:11:49 - pico-train - INFO - ├── Learning Rate: 3.16e-05
2025-08-29 23:11:49 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-29 23:12:02 - pico-train - INFO - Step 21300 -- 🔄 Training Metrics
2025-08-29 23:12:02 - pico-train - INFO - ├── Loss: 6.2566
2025-08-29 23:12:02 - pico-train - INFO - ├── Learning Rate: 3.15e-05
2025-08-29 23:12:02 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-29 23:12:15 - pico-train - INFO - Step 21325 -- 🔄 Training Metrics
2025-08-29 23:12:15 - pico-train - INFO - ├── Loss: 6.2697
2025-08-29 23:12:15 - pico-train - INFO - ├── Learning Rate: 3.15e-05
2025-08-29 23:12:15 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-29 23:12:27 - pico-train - INFO - Step 21350 -- 🔄 Training Metrics
2025-08-29 23:12:27 - pico-train - INFO - ├── Loss: 6.2998
2025-08-29 23:12:27 - pico-train - INFO - ├── Learning Rate: 3.14e-05
2025-08-29 23:12:27 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-29 23:12:40 - pico-train - INFO - Step 21375 -- 🔄 Training Metrics
2025-08-29 23:12:40 - pico-train - INFO - ├── Loss: 6.3903
2025-08-29 23:12:40 - pico-train - INFO - ├── Learning Rate: 3.14e-05
2025-08-29 23:12:40 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-29 23:12:52 - pico-train - INFO - Step 21400 -- 🔄 Training Metrics
2025-08-29 23:12:52 - pico-train - INFO - ├── Loss: 6.2831
2025-08-29 23:12:52 - pico-train - INFO - ├── Learning Rate: 3.13e-05
2025-08-29 23:12:52 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-29 23:13:05 - pico-train - INFO - Step 21425 -- 🔄 Training Metrics
2025-08-29 23:13:05 - pico-train - INFO - ├── Loss: 6.3768
2025-08-29 23:13:05 - pico-train - INFO - ├── Learning Rate: 3.13e-05
2025-08-29 23:13:05 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-29 23:13:18 - pico-train - INFO - Step 21450 -- 🔄 Training Metrics
2025-08-29 23:13:18 - pico-train - INFO - ├── Loss: 6.3917
2025-08-29 23:13:18 - pico-train - INFO - ├── Learning Rate: 3.12e-05
2025-08-29 23:13:18 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-29 23:13:30 - pico-train - INFO - Step 21475 -- 🔄 Training Metrics
2025-08-29 23:13:30 - pico-train - INFO - ├── Loss: 6.3183
2025-08-29 23:13:30 - pico-train - INFO - ├── Learning Rate: 3.11e-05
2025-08-29 23:13:30 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-29 23:13:43 - pico-train - INFO - Step 21500 -- 💾 Saving Checkpoint
2025-08-29 23:15:44 - pico-train - INFO - Step 21500 -- 📊 Evaluation Results
2025-08-29 23:15:44 - pico-train - INFO - └── paloma: 6.18596463935147e+24
2025-08-29 23:15:47 - pico-train - INFO - Step 21500 -- 🔄 Training Metrics
2025-08-29 23:15:47 - pico-train - INFO - ├── Loss: 6.3327
2025-08-29 23:15:47 - pico-train - INFO - ├── Learning Rate: 3.11e-05
2025-08-29 23:15:47 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-29 23:15:47 - pico-train - INFO - Step 21500 -- 📈 Saving Learning Dynamics
2025-08-29 23:16:02 - pico-train - INFO - Step 21525 -- 🔄 Training Metrics
2025-08-29 23:16:02 - pico-train - INFO - ├── Loss: 6.3111
2025-08-29 23:16:02 - pico-train - INFO - ├── Learning Rate: 3.10e-05
2025-08-29 23:16:02 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-29 23:16:14 - pico-train - INFO - Step 21550 -- 🔄 Training Metrics
2025-08-29 23:16:14 - pico-train - INFO - ├── Loss: 6.2823
2025-08-29 23:16:14 - pico-train - INFO - ├── Learning Rate: 3.10e-05
2025-08-29 23:16:14 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-29 23:16:27 - pico-train - INFO - Step 21575 -- 🔄 Training Metrics
2025-08-29 23:16:27 - pico-train - INFO - ├── Loss: 6.3073
2025-08-29 23:16:27 - pico-train - INFO - ├── Learning Rate: 3.09e-05
2025-08-29 23:16:27 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-29 23:16:40 - pico-train - INFO - Step 21600 -- 🔄 Training Metrics
2025-08-29 23:16:40 - pico-train - INFO - ├── Loss: 6.3168
2025-08-29 23:16:40 - pico-train - INFO - ├── Learning Rate: 3.08e-05
2025-08-29 23:16:40 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-29 23:16:52 - pico-train - INFO - Step 21625 -- 🔄 Training Metrics
2025-08-29 23:16:52 - pico-train - INFO - ├── Loss: 6.3106
2025-08-29 23:16:52 - pico-train - INFO - ├── Learning Rate: 3.08e-05
2025-08-29 23:16:52 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-29 23:17:05 - pico-train - INFO - Step 21650 -- 🔄 Training Metrics
2025-08-29 23:17:05 - pico-train - INFO - ├── Loss: 6.3128
2025-08-29 23:17:05 - pico-train - INFO - ├── Learning Rate: 3.07e-05
2025-08-29 23:17:05 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-29 23:17:18 - pico-train - INFO - Step 21675 -- 🔄 Training Metrics
2025-08-29 23:17:18 - pico-train - INFO - ├── Loss: 6.2762
2025-08-29 23:17:18 - pico-train - INFO - ├── Learning Rate: 3.07e-05
2025-08-29 23:17:18 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-29 23:17:30 - pico-train - INFO - Step 21700 -- 🔄 Training Metrics
2025-08-29 23:17:30 - pico-train - INFO - ├── Loss: 6.3577
2025-08-29 23:17:30 - pico-train - INFO - ├── Learning Rate: 3.06e-05
2025-08-29 23:17:30 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-29 23:17:43 - pico-train - INFO - Step 21725 -- 🔄 Training Metrics
2025-08-29 23:17:43 - pico-train - INFO - ├── Loss: 6.3495
2025-08-29 23:17:43 - pico-train - INFO - ├── Learning Rate: 3.05e-05
2025-08-29 23:17:43 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-29 23:17:56 - pico-train - INFO - Step 21750 -- 🔄 Training Metrics
2025-08-29 23:17:56 - pico-train - INFO - ├── Loss: 6.3331
2025-08-29 23:17:56 - pico-train - INFO - ├── Learning Rate: 3.05e-05
2025-08-29 23:17:56 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-29 23:18:08 - pico-train - INFO - Step 21775 -- 🔄 Training Metrics
2025-08-29 23:18:08 - pico-train - INFO - ├── Loss: 6.3146
2025-08-29 23:18:08 - pico-train - INFO - ├── Learning Rate: 3.04e-05
2025-08-29 23:18:08 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-29 23:18:21 - pico-train - INFO - Step 21800 -- 🔄 Training Metrics
2025-08-29 23:18:21 - pico-train - INFO - ├── Loss: 6.3567
2025-08-29 23:18:21 - pico-train - INFO - ├── Learning Rate: 3.04e-05
2025-08-29 23:18:21 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-29 23:18:33 - pico-train - INFO - Step 21825 -- 🔄 Training Metrics
2025-08-29 23:18:33 - pico-train - INFO - ├── Loss: 6.3185
2025-08-29 23:18:33 - pico-train - INFO - ├── Learning Rate: 3.03e-05
2025-08-29 23:18:33 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-29 23:18:46 - pico-train - INFO - Step 21850 -- 🔄 Training Metrics
2025-08-29 23:18:46 - pico-train - INFO - ├── Loss: 6.3087
2025-08-29 23:18:46 - pico-train - INFO - ├── Learning Rate: 3.02e-05
2025-08-29 23:18:46 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-29 23:18:59 - pico-train - INFO - Step 21875 -- 🔄 Training Metrics
2025-08-29 23:18:59 - pico-train - INFO - ├── Loss: 6.3817
2025-08-29 23:18:59 - pico-train - INFO - ├── Learning Rate: 3.02e-05
2025-08-29 23:18:59 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-29 23:19:12 - pico-train - INFO - Step 21900 -- 🔄 Training Metrics
2025-08-29 23:19:12 - pico-train - INFO - ├── Loss: 6.3398
2025-08-29 23:19:12 - pico-train - INFO - ├── Learning Rate: 3.01e-05
2025-08-29 23:19:12 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-29 23:19:25 - pico-train - INFO - Step 21925 -- 🔄 Training Metrics
2025-08-29 23:19:25 - pico-train - INFO - ├── Loss: 6.4012
2025-08-29 23:19:25 - pico-train - INFO - ├── Learning Rate: 3.01e-05
2025-08-29 23:19:25 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-29 23:19:37 - pico-train - INFO - Step 21950 -- 🔄 Training Metrics
2025-08-29 23:19:37 - pico-train - INFO - ├── Loss: 6.3352
2025-08-29 23:19:37 - pico-train - INFO - ├── Learning Rate: 3.00e-05
2025-08-29 23:19:37 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-29 23:19:50 - pico-train - INFO - Step 21975 -- 🔄 Training Metrics
2025-08-29 23:19:50 - pico-train - INFO - ├── Loss: 6.3857
2025-08-29 23:19:50 - pico-train - INFO - ├── Learning Rate: 2.99e-05
2025-08-29 23:19:50 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-29 23:20:02 - pico-train - INFO - Step 22000 -- 💾 Saving Checkpoint
2025-08-29 23:22:06 - pico-train - INFO - Step 22000 -- 📊 Evaluation Results
2025-08-29 23:22:06 - pico-train - INFO - └── paloma: 7.840233924864941e+24
2025-08-29 23:22:08 - pico-train - INFO - Step 22000 -- 🔄 Training Metrics
2025-08-29 23:22:08 - pico-train - INFO - ├── Loss: 6.3421
2025-08-29 23:22:08 - pico-train - INFO - ├── Learning Rate: 2.99e-05
2025-08-29 23:22:08 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-29 23:22:08 - pico-train - INFO - Step 22000 -- 📈 Saving Learning Dynamics
2025-08-29 23:22:24 - pico-train - INFO - Step 22025 -- 🔄 Training Metrics
2025-08-29 23:22:24 - pico-train - INFO - ├── Loss: 6.4107
2025-08-29 23:22:24 - pico-train - INFO - ├── Learning Rate: 2.98e-05
2025-08-29 23:22:24 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-29 23:22:36 - pico-train - INFO - Step 22050 -- 🔄 Training Metrics
2025-08-29 23:22:36 - pico-train - INFO - ├── Loss: 6.3296
2025-08-29 23:22:36 - pico-train - INFO - ├── Learning Rate: 2.98e-05
2025-08-29 23:22:36 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-29 23:22:49 - pico-train - INFO - Step 22075 -- 🔄 Training Metrics
2025-08-29 23:22:49 - pico-train - INFO - ├── Loss: 6.2576
2025-08-29 23:22:49 - pico-train - INFO - ├── Learning Rate: 2.97e-05
2025-08-29 23:22:49 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-29 23:23:01 - pico-train - INFO - Step 22100 -- 🔄 Training Metrics
2025-08-29 23:23:01 - pico-train - INFO - ├── Loss: 6.2705
2025-08-29 23:23:01 - pico-train - INFO - ├── Learning Rate: 2.96e-05
2025-08-29 23:23:01 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-29 23:23:14 - pico-train - INFO - Step 22125 -- 🔄 Training Metrics
2025-08-29 23:23:14 - pico-train - INFO - ├── Loss: 6.2784
2025-08-29 23:23:14 - pico-train - INFO - ├── Learning Rate: 2.96e-05
2025-08-29 23:23:14 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-29 23:23:27 - pico-train - INFO - Step 22150 -- 🔄 Training Metrics
2025-08-29 23:23:27 - pico-train - INFO - ├── Loss: 6.3673
2025-08-29 23:23:27 - pico-train - INFO - ├── Learning Rate: 2.95e-05
2025-08-29 23:23:27 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-29 23:23:39 - pico-train - INFO - Step 22175 -- 🔄 Training Metrics
2025-08-29 23:23:39 - pico-train - INFO - ├── Loss: 6.3914
2025-08-29 23:23:39 - pico-train - INFO - ├── Learning Rate: 2.95e-05
2025-08-29 23:23:39 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-29 23:23:52 - pico-train - INFO - Step 22200 -- 🔄 Training Metrics
2025-08-29 23:23:52 - pico-train - INFO - ├── Loss: 6.3081
2025-08-29 23:23:52 - pico-train - INFO - ├── Learning Rate: 2.94e-05
2025-08-29 23:23:52 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-29 23:24:05 - pico-train - INFO - Step 22225 -- 🔄 Training Metrics
2025-08-29 23:24:05 - pico-train - INFO - ├── Loss: 6.4045
2025-08-29 23:24:05 - pico-train - INFO - ├── Learning Rate: 2.93e-05
2025-08-29 23:24:05 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-29 23:24:17 - pico-train - INFO - Step 22250 -- 🔄 Training Metrics
2025-08-29 23:24:17 - pico-train - INFO - ├── Loss: 6.3830
2025-08-29 23:24:17 - pico-train - INFO - ├── Learning Rate: 2.93e-05
2025-08-29 23:24:17 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-29 23:24:30 - pico-train - INFO - Step 22275 -- 🔄 Training Metrics
2025-08-29 23:24:30 - pico-train - INFO - ├── Loss: 6.2955
2025-08-29 23:24:30 - pico-train - INFO - ├── Learning Rate: 2.92e-05
2025-08-29 23:24:30 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-29 23:24:43 - pico-train - INFO - Step 22300 -- 🔄 Training Metrics
2025-08-29 23:24:43 - pico-train - INFO - ├── Loss: 6.3121
2025-08-29 23:24:43 - pico-train - INFO - ├── Learning Rate: 2.92e-05
2025-08-29 23:24:43 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-29 23:24:56 - pico-train - INFO - Step 22325 -- 🔄 Training Metrics
2025-08-29 23:24:56 - pico-train - INFO - ├── Loss: 6.3725
2025-08-29 23:24:56 - pico-train - INFO - ├── Learning Rate: 2.91e-05
2025-08-29 23:24:56 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-29 23:25:08 - pico-train - INFO - Step 22350 -- 🔄 Training Metrics
2025-08-29 23:25:08 - pico-train - INFO - ├── Loss: 6.3311
2025-08-29 23:25:08 - pico-train - INFO - ├── Learning Rate: 2.90e-05
2025-08-29 23:25:08 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-29 23:25:21 - pico-train - INFO - Step 22375 -- 🔄 Training Metrics
2025-08-29 23:25:21 - pico-train - INFO - ├── Loss: 6.2346
2025-08-29 23:25:21 - pico-train - INFO - ├── Learning Rate: 2.90e-05
2025-08-29 23:25:21 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-29 23:25:33 - pico-train - INFO - Step 22400 -- 🔄 Training Metrics
2025-08-29 23:25:33 - pico-train - INFO - ├── Loss: 6.3869
2025-08-29 23:25:33 - pico-train - INFO - ├── Learning Rate: 2.89e-05
2025-08-29 23:25:33 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-29 23:25:46 - pico-train - INFO - Step 22425 -- 🔄 Training Metrics
2025-08-29 23:25:46 - pico-train - INFO - ├── Loss: 6.3370
2025-08-29 23:25:46 - pico-train - INFO - ├── Learning Rate: 2.89e-05
2025-08-29 23:25:46 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-29 23:25:59 - pico-train - INFO - Step 22450 -- 🔄 Training Metrics
2025-08-29 23:25:59 - pico-train - INFO - ├── Loss: 6.3366
2025-08-29 23:25:59 - pico-train - INFO - ├── Learning Rate: 2.88e-05
2025-08-29 23:25:59 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-29 23:26:11 - pico-train - INFO - Step 22475 -- 🔄 Training Metrics
2025-08-29 23:26:11 - pico-train - INFO - ├── Loss: 6.3641
2025-08-29 23:26:11 - pico-train - INFO - ├── Learning Rate: 2.87e-05
2025-08-29 23:26:11 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-29 23:26:23 - pico-train - INFO - Step 22500 -- 💾 Saving Checkpoint
2025-08-29 23:28:22 - pico-train - INFO - Step 22500 -- 📊 Evaluation Results
2025-08-29 23:28:22 - pico-train - INFO - └── paloma: 1.0171611158112828e+25
2025-08-29 23:28:23 - pico-train - INFO - Step 22500 -- 🔄 Training Metrics
2025-08-29 23:28:23 - pico-train - INFO - ├── Loss: 6.2880
2025-08-29 23:28:23 - pico-train - INFO - ├── Learning Rate: 2.87e-05
2025-08-29 23:28:23 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-29 23:28:23 - pico-train - INFO - Step 22500 -- 📈 Saving Learning Dynamics
2025-08-29 23:28:39 - pico-train - INFO - Step 22525 -- 🔄 Training Metrics
2025-08-29 23:28:39 - pico-train - INFO - ├── Loss: 6.2955
2025-08-29 23:28:39 - pico-train - INFO - ├── Learning Rate: 2.86e-05
2025-08-29 23:28:39 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-29 23:28:51 - pico-train - INFO - Step 22550 -- 🔄 Training Metrics
2025-08-29 23:28:51 - pico-train - INFO - ├── Loss: 6.3124
2025-08-29 23:28:51 - pico-train - INFO - ├── Learning Rate: 2.85e-05
2025-08-29 23:28:51 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-29 23:29:04 - pico-train - INFO - Step 22575 -- 🔄 Training Metrics
2025-08-29 23:29:04 - pico-train - INFO - ├── Loss: 6.3214
2025-08-29 23:29:04 - pico-train - INFO - ├── Learning Rate: 2.85e-05
2025-08-29 23:29:04 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-29 23:29:17 - pico-train - INFO - Step 22600 -- 🔄 Training Metrics
2025-08-29 23:29:17 - pico-train - INFO - ├── Loss: 6.2929
2025-08-29 23:29:17 - pico-train - INFO - ├── Learning Rate: 2.84e-05
2025-08-29 23:29:17 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-29 23:29:29 - pico-train - INFO - Step 22625 -- 🔄 Training Metrics
2025-08-29 23:29:29 - pico-train - INFO - ├── Loss: 6.3454
2025-08-29 23:29:29 - pico-train - INFO - ├── Learning Rate: 2.84e-05
2025-08-29 23:29:29 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-29 23:29:42 - pico-train - INFO - Step 22650 -- 🔄 Training Metrics
2025-08-29 23:29:42 - pico-train - INFO - ├── Loss: 6.2994
2025-08-29 23:29:42 - pico-train - INFO - ├── Learning Rate: 2.83e-05
2025-08-29 23:29:42 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-29 23:29:55 - pico-train - INFO - Step 22675 -- 🔄 Training Metrics
2025-08-29 23:29:55 - pico-train - INFO - ├── Loss: 6.3245
2025-08-29 23:29:55 - pico-train - INFO - ├── Learning Rate: 2.82e-05
2025-08-29 23:29:55 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-29 23:30:07 - pico-train - INFO - Step 22700 -- 🔄 Training Metrics
2025-08-29 23:30:07 - pico-train - INFO - ├── Loss: 6.1874
2025-08-29 23:30:07 - pico-train - INFO - ├── Learning Rate: 2.82e-05
2025-08-29 23:30:07 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-29 23:30:20 - pico-train - INFO - Step 22725 -- 🔄 Training Metrics
2025-08-29 23:30:20 - pico-train - INFO - ├── Loss: 6.2636
2025-08-29 23:30:20 - pico-train - INFO - ├── Learning Rate: 2.81e-05
2025-08-29 23:30:20 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-29 23:30:32 - pico-train - INFO - Step 22750 -- 🔄 Training Metrics
2025-08-29 23:30:32 - pico-train - INFO - ├── Loss: 6.3870
2025-08-29 23:30:32 - pico-train - INFO - ├── Learning Rate: 2.81e-05
2025-08-29 23:30:32 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-29 23:30:45 - pico-train - INFO - Step 22775 -- 🔄 Training Metrics
2025-08-29 23:30:45 - pico-train - INFO - ├── Loss: 6.3157
2025-08-29 23:30:45 - pico-train - INFO - ├── Learning Rate: 2.80e-05
2025-08-29 23:30:45 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-29 23:30:57 - pico-train - INFO - Step 22800 -- 🔄 Training Metrics
2025-08-29 23:30:57 - pico-train - INFO - ├── Loss: 6.3617
2025-08-29 23:30:57 - pico-train - INFO - ├── Learning Rate: 2.79e-05
2025-08-29 23:30:57 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-29 23:31:10 - pico-train - INFO - Step 22825 -- 🔄 Training Metrics
2025-08-29 23:31:10 - pico-train - INFO - ├── Loss: 6.3006
2025-08-29 23:31:10 - pico-train - INFO - ├── Learning Rate: 2.79e-05
2025-08-29 23:31:10 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-29 23:31:23 - pico-train - INFO - Step 22850 -- 🔄 Training Metrics
2025-08-29 23:31:23 - pico-train - INFO - ├── Loss: 6.2552
2025-08-29 23:31:23 - pico-train - INFO - ├── Learning Rate: 2.78e-05
2025-08-29 23:31:23 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-29 23:31:35 - pico-train - INFO - Step 22875 -- 🔄 Training Metrics
2025-08-29 23:31:35 - pico-train - INFO - ├── Loss: 6.3537
2025-08-29 23:31:35 - pico-train - INFO - ├── Learning Rate: 2.78e-05
2025-08-29 23:31:35 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-29 23:31:48 - pico-train - INFO - Step 22900 -- 🔄 Training Metrics
2025-08-29 23:31:48 - pico-train - INFO - ├── Loss: 6.4096
2025-08-29 23:31:48 - pico-train - INFO - ├── Learning Rate: 2.77e-05
2025-08-29 23:31:48 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-29 23:32:01 - pico-train - INFO - Step 22925 -- 🔄 Training Metrics
2025-08-29 23:32:01 - pico-train - INFO - ├── Loss: 6.2037
2025-08-29 23:32:01 - pico-train - INFO - ├── Learning Rate: 2.76e-05
2025-08-29 23:32:01 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-29 23:32:13 - pico-train - INFO - Step 22950 -- 🔄 Training Metrics
2025-08-29 23:32:13 - pico-train - INFO - ├── Loss: 6.3007
2025-08-29 23:32:13 - pico-train - INFO - ├── Learning Rate: 2.76e-05
2025-08-29 23:32:13 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-29 23:32:26 - pico-train - INFO - Step 22975 -- 🔄 Training Metrics
2025-08-29 23:32:26 - pico-train - INFO - ├── Loss: 6.2575
2025-08-29 23:32:26 - pico-train - INFO - ├── Learning Rate: 2.75e-05
2025-08-29 23:32:26 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-29 23:32:38 - pico-train - INFO - Step 23000 -- 💾 Saving Checkpoint
2025-08-29 23:34:52 - pico-train - INFO - Step 23000 -- 📊 Evaluation Results
2025-08-29 23:34:52 - pico-train - INFO - └── paloma: 1.3786488388612157e+25
2025-08-29 23:34:53 - pico-train - INFO - Step 23000 -- 🔄 Training Metrics
2025-08-29 23:34:53 - pico-train - INFO - ├── Loss: 6.4702
2025-08-29 23:34:53 - pico-train - INFO - ├── Learning Rate: 2.75e-05
2025-08-29 23:34:53 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-29 23:34:53 - pico-train - INFO - Step 23000 -- 📈 Saving Learning Dynamics
2025-08-29 23:35:08 - pico-train - INFO - Step 23025 -- 🔄 Training Metrics
2025-08-29 23:35:08 - pico-train - INFO - ├── Loss: 6.3198
2025-08-29 23:35:08 - pico-train - INFO - ├── Learning Rate: 2.74e-05
2025-08-29 23:35:08 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-29 23:35:21 - pico-train - INFO - Step 23050 -- 🔄 Training Metrics
2025-08-29 23:35:21 - pico-train - INFO - ├── Loss: 6.3015
2025-08-29 23:35:21 - pico-train - INFO - ├── Learning Rate: 2.73e-05
2025-08-29 23:35:21 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-29 23:35:33 - pico-train - INFO - Step 23075 -- 🔄 Training Metrics
2025-08-29 23:35:33 - pico-train - INFO - ├── Loss: 6.3222
2025-08-29 23:35:33 - pico-train - INFO - ├── Learning Rate: 2.73e-05
2025-08-29 23:35:33 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-29 23:35:46 - pico-train - INFO - Step 23100 -- 🔄 Training Metrics
2025-08-29 23:35:46 - pico-train - INFO - ├── Loss: 6.2917
2025-08-29 23:35:46 - pico-train - INFO - ├── Learning Rate: 2.72e-05
2025-08-29 23:35:46 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-29 23:35:59 - pico-train - INFO - Step 23125 -- 🔄 Training Metrics
2025-08-29 23:35:59 - pico-train - INFO - ├── Loss: 6.3574
2025-08-29 23:35:59 - pico-train - INFO - ├── Learning Rate: 2.71e-05
2025-08-29 23:35:59 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-29 23:36:11 - pico-train - INFO - Step 23150 -- 🔄 Training Metrics
2025-08-29 23:36:11 - pico-train - INFO - ├── Loss: 6.2434
2025-08-29 23:36:11 - pico-train - INFO - ├── Learning Rate: 2.71e-05
2025-08-29 23:36:11 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-29 23:36:24 - pico-train - INFO - Step 23175 -- 🔄 Training Metrics
2025-08-29 23:36:24 - pico-train - INFO - ├── Loss: 6.2580
2025-08-29 23:36:24 - pico-train - INFO - ├── Learning Rate: 2.70e-05
2025-08-29 23:36:24 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-29 23:36:36 - pico-train - INFO - Step 23200 -- 🔄 Training Metrics
2025-08-29 23:36:36 - pico-train - INFO - ├── Loss: 6.3214
2025-08-29 23:36:36 - pico-train - INFO - ├── Learning Rate: 2.70e-05
2025-08-29 23:36:36 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-29 23:36:49 - pico-train - INFO - Step 23225 -- 🔄 Training Metrics
2025-08-29 23:36:49 - pico-train - INFO - ├── Loss: 6.2731
2025-08-29 23:36:49 - pico-train - INFO - ├── Learning Rate: 2.69e-05
2025-08-29 23:36:49 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-29 23:37:02 - pico-train - INFO - Step 23250 -- 🔄 Training Metrics
2025-08-29 23:37:02 - pico-train - INFO - ├── Loss: 6.3255
2025-08-29 23:37:02 - pico-train - INFO - ├── Learning Rate: 2.68e-05
2025-08-29 23:37:02 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-29 23:37:14 - pico-train - INFO - Step 23275 -- 🔄 Training Metrics
2025-08-29 23:37:14 - pico-train - INFO - ├── Loss: 6.3348
2025-08-29 23:37:14 - pico-train - INFO - ├── Learning Rate: 2.68e-05
2025-08-29 23:37:14 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-29 23:37:27 - pico-train - INFO - Step 23300 -- 🔄 Training Metrics
2025-08-29 23:37:27 - pico-train - INFO - ├── Loss: 6.3476
2025-08-29 23:37:27 - pico-train - INFO - ├── Learning Rate: 2.67e-05
2025-08-29 23:37:27 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-29 23:37:39 - pico-train - INFO - Step 23325 -- 🔄 Training Metrics
2025-08-29 23:37:39 - pico-train - INFO - ├── Loss: 6.3392
2025-08-29 23:37:39 - pico-train - INFO - ├── Learning Rate: 2.67e-05
2025-08-29 23:37:39 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-29 23:37:52 - pico-train - INFO - Step 23350 -- 🔄 Training Metrics
2025-08-29 23:37:52 - pico-train - INFO - ├── Loss: 6.3051
2025-08-29 23:37:52 - pico-train - INFO - ├── Learning Rate: 2.66e-05
2025-08-29 23:37:52 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-29 23:38:05 - pico-train - INFO - Step 23375 -- 🔄 Training Metrics
2025-08-29 23:38:05 - pico-train - INFO - ├── Loss: 6.2683
2025-08-29 23:38:05 - pico-train - INFO - ├── Learning Rate: 2.65e-05
2025-08-29 23:38:05 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-29 23:38:17 - pico-train - INFO - Step 23400 -- 🔄 Training Metrics
2025-08-29 23:38:17 - pico-train - INFO - ├── Loss: 6.2929
2025-08-29 23:38:17 - pico-train - INFO - ├── Learning Rate: 2.65e-05
2025-08-29 23:38:17 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-29 23:38:30 - pico-train - INFO - Step 23425 -- 🔄 Training Metrics
2025-08-29 23:38:30 - pico-train - INFO - ├── Loss: 6.3546
2025-08-29 23:38:30 - pico-train - INFO - ├── Learning Rate: 2.64e-05
2025-08-29 23:38:30 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-29 23:38:42 - pico-train - INFO - Step 23450 -- 🔄 Training Metrics
2025-08-29 23:38:42 - pico-train - INFO - ├── Loss: 6.3572
2025-08-29 23:38:42 - pico-train - INFO - ├── Learning Rate: 2.63e-05
2025-08-29 23:38:42 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-29 23:38:55 - pico-train - INFO - Step 23475 -- 🔄 Training Metrics
2025-08-29 23:38:55 - pico-train - INFO - ├── Loss: 6.2350
2025-08-29 23:38:55 - pico-train - INFO - ├── Learning Rate: 2.63e-05
2025-08-29 23:38:55 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-29 23:39:07 - pico-train - INFO - Step 23500 -- 💾 Saving Checkpoint
2025-08-29 23:41:03 - pico-train - INFO - Step 23500 -- 📊 Evaluation Results
2025-08-29 23:41:03 - pico-train - INFO - └── paloma: 1.5734245831645979e+25
2025-08-29 23:41:04 - pico-train - INFO - Step 23500 -- 🔄 Training Metrics
2025-08-29 23:41:04 - pico-train - INFO - ├── Loss: 6.3544
2025-08-29 23:41:04 - pico-train - INFO - ├── Learning Rate: 2.62e-05
2025-08-29 23:41:04 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-29 23:41:04 - pico-train - INFO - Step 23500 -- 📈 Saving Learning Dynamics
2025-08-29 23:41:19 - pico-train - INFO - Step 23525 -- 🔄 Training Metrics
2025-08-29 23:41:19 - pico-train - INFO - ├── Loss: 6.2607
2025-08-29 23:41:19 - pico-train - INFO - ├── Learning Rate: 2.62e-05
2025-08-29 23:41:19 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-29 23:41:32 - pico-train - INFO - Step 23550 -- 🔄 Training Metrics
2025-08-29 23:41:32 - pico-train - INFO - ├── Loss: 6.2912
2025-08-29 23:41:32 - pico-train - INFO - ├── Learning Rate: 2.61e-05
2025-08-29 23:41:32 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-29 23:41:45 - pico-train - INFO - Step 23575 -- 🔄 Training Metrics
2025-08-29 23:41:45 - pico-train - INFO - ├── Loss: 6.2348
2025-08-29 23:41:45 - pico-train - INFO - ├── Learning Rate: 2.60e-05
2025-08-29 23:41:45 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-29 23:41:57 - pico-train - INFO - Step 23600 -- 🔄 Training Metrics
2025-08-29 23:41:57 - pico-train - INFO - ├── Loss: 6.2372
2025-08-29 23:41:57 - pico-train - INFO - ├── Learning Rate: 2.60e-05
2025-08-29 23:41:57 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-29 23:42:10 - pico-train - INFO - Step 23625 -- 🔄 Training Metrics
2025-08-29 23:42:10 - pico-train - INFO - ├── Loss: 6.3467
2025-08-29 23:42:10 - pico-train - INFO - ├── Learning Rate: 2.59e-05
2025-08-29 23:42:10 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-29 23:42:22 - pico-train - INFO - Step 23650 -- 🔄 Training Metrics
2025-08-29 23:42:22 - pico-train - INFO - ├── Loss: 6.2611
2025-08-29 23:42:22 - pico-train - INFO - ├── Learning Rate: 2.59e-05
2025-08-29 23:42:22 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-29 23:42:35 - pico-train - INFO - Step 23675 -- 🔄 Training Metrics
2025-08-29 23:42:35 - pico-train - INFO - ├── Loss: 6.2587
2025-08-29 23:42:35 - pico-train - INFO - ├── Learning Rate: 2.58e-05
2025-08-29 23:42:35 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-29 23:42:47 - pico-train - INFO - Step 23700 -- 🔄 Training Metrics
2025-08-29 23:42:47 - pico-train - INFO - ├── Loss: 6.3048
2025-08-29 23:42:47 - pico-train - INFO - ├── Learning Rate: 2.57e-05
2025-08-29 23:42:47 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-29 23:43:00 - pico-train - INFO - Step 23725 -- 🔄 Training Metrics
2025-08-29 23:43:00 - pico-train - INFO - ├── Loss: 6.2627
2025-08-29 23:43:00 - pico-train - INFO - ├── Learning Rate: 2.57e-05
2025-08-29 23:43:00 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-29 23:43:13 - pico-train - INFO - Step 23750 -- 🔄 Training Metrics
2025-08-29 23:43:13 - pico-train - INFO - ├── Loss: 6.2880
2025-08-29 23:43:13 - pico-train - INFO - ├── Learning Rate: 2.56e-05
2025-08-29 23:43:13 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-29 23:43:25 - pico-train - INFO - Step 23775 -- 🔄 Training Metrics
2025-08-29 23:43:25 - pico-train - INFO - ├── Loss: 6.3205
2025-08-29 23:43:25 - pico-train - INFO - ├── Learning Rate: 2.56e-05
2025-08-29 23:43:25 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-29 23:43:38 - pico-train - INFO - Step 23800 -- 🔄 Training Metrics
2025-08-29 23:43:38 - pico-train - INFO - ├── Loss: 6.2730
2025-08-29 23:43:38 - pico-train - INFO - ├── Learning Rate: 2.55e-05
2025-08-29 23:43:38 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-29 23:43:51 - pico-train - INFO - Step 23825 -- 🔄 Training Metrics
2025-08-29 23:43:51 - pico-train - INFO - ├── Loss: 6.2649
2025-08-29 23:43:51 - pico-train - INFO - ├── Learning Rate: 2.54e-05
2025-08-29 23:43:51 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-29 23:44:03 - pico-train - INFO - Step 23850 -- 🔄 Training Metrics
2025-08-29 23:44:03 - pico-train - INFO - ├── Loss: 6.2840
2025-08-29 23:44:03 - pico-train - INFO - ├── Learning Rate: 2.54e-05
2025-08-29 23:44:03 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-29 23:44:16 - pico-train - INFO - Step 23875 -- 🔄 Training Metrics
2025-08-29 23:44:16 - pico-train - INFO - ├── Loss: 6.3253
2025-08-29 23:44:16 - pico-train - INFO - ├── Learning Rate: 2.53e-05
2025-08-29 23:44:16 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-29 23:44:28 - pico-train - INFO - Step 23900 -- 🔄 Training Metrics
2025-08-29 23:44:28 - pico-train - INFO - ├── Loss: 6.3487
2025-08-29 23:44:28 - pico-train - INFO - ├── Learning Rate: 2.52e-05
2025-08-29 23:44:28 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-29 23:44:41 - pico-train - INFO - Step 23925 -- 🔄 Training Metrics
2025-08-29 23:44:41 - pico-train - INFO - ├── Loss: 6.2998
2025-08-29 23:44:41 - pico-train - INFO - ├── Learning Rate: 2.52e-05
2025-08-29 23:44:41 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-29 23:44:54 - pico-train - INFO - Step 23950 -- 🔄 Training Metrics
2025-08-29 23:44:54 - pico-train - INFO - ├── Loss: 6.2444
2025-08-29 23:44:54 - pico-train - INFO - ├── Learning Rate: 2.51e-05
2025-08-29 23:44:54 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-29 23:45:06 - pico-train - INFO - Step 23975 -- 🔄 Training Metrics
2025-08-29 23:45:06 - pico-train - INFO - ├── Loss: 6.2611
2025-08-29 23:45:06 - pico-train - INFO - ├── Learning Rate: 2.51e-05
2025-08-29 23:45:06 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-29 23:45:18 - pico-train - INFO - Step 24000 -- 💾 Saving Checkpoint
2025-08-29 23:47:14 - pico-train - INFO - Step 24000 -- 📊 Evaluation Results
2025-08-29 23:47:14 - pico-train - INFO - └── paloma: 2.548011467855507e+25
2025-08-29 23:47:17 - pico-train - INFO - Step 24000 -- 🔄 Training Metrics
2025-08-29 23:47:17 - pico-train - INFO - ├── Loss: 6.1774
2025-08-29 23:47:17 - pico-train - INFO - ├── Learning Rate: 2.50e-05
2025-08-29 23:47:17 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-29 23:47:17 - pico-train - INFO - Step 24000 -- 📈 Saving Learning Dynamics
2025-08-29 23:47:32 - pico-train - INFO - Step 24025 -- 🔄 Training Metrics
2025-08-29 23:47:32 - pico-train - INFO - ├── Loss: 6.2658
2025-08-29 23:47:32 - pico-train - INFO - ├── Learning Rate: 2.49e-05
2025-08-29 23:47:32 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-29 23:47:44 - pico-train - INFO - Step 24050 -- 🔄 Training Metrics
2025-08-29 23:47:44 - pico-train - INFO - ├── Loss: 6.2641
2025-08-29 23:47:44 - pico-train - INFO - ├── Learning Rate: 2.49e-05
2025-08-29 23:47:44 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-29 23:47:57 - pico-train - INFO - Step 24075 -- 🔄 Training Metrics
2025-08-29 23:47:57 - pico-train - INFO - ├── Loss: 6.1837
2025-08-29 23:47:57 - pico-train - INFO - ├── Learning Rate: 2.48e-05
2025-08-29 23:47:57 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-29 23:48:10 - pico-train - INFO - Step 24100 -- 🔄 Training Metrics
2025-08-29 23:48:10 - pico-train - INFO - ├── Loss: 6.3345
2025-08-29 23:48:10 - pico-train - INFO - ├── Learning Rate: 2.48e-05
2025-08-29 23:48:10 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-29 23:48:23 - pico-train - INFO - Step 24125 -- 🔄 Training Metrics
2025-08-29 23:48:23 - pico-train - INFO - ├── Loss: 6.2665
2025-08-29 23:48:23 - pico-train - INFO - ├── Learning Rate: 2.47e-05
2025-08-29 23:48:23 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-29 23:48:35 - pico-train - INFO - Step 24150 -- 🔄 Training Metrics
2025-08-29 23:48:35 - pico-train - INFO - ├── Loss: 6.2894
2025-08-29 23:48:35 - pico-train - INFO - ├── Learning Rate: 2.46e-05
2025-08-29 23:48:35 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-29 23:48:48 - pico-train - INFO - Step 24175 -- 🔄 Training Metrics
2025-08-29 23:48:48 - pico-train - INFO - ├── Loss: 6.2354
2025-08-29 23:48:48 - pico-train - INFO - ├── Learning Rate: 2.46e-05
2025-08-29 23:48:48 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-29 23:49:00 - pico-train - INFO - Step 24200 -- 🔄 Training Metrics
2025-08-29 23:49:00 - pico-train - INFO - ├── Loss: 6.2110
2025-08-29 23:49:00 - pico-train - INFO - ├── Learning Rate: 2.45e-05
2025-08-29 23:49:00 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-29 23:49:13 - pico-train - INFO - Step 24225 -- 🔄 Training Metrics
2025-08-29 23:49:13 - pico-train - INFO - ├── Loss: 6.2512
2025-08-29 23:49:13 - pico-train - INFO - ├── Learning Rate: 2.44e-05
2025-08-29 23:49:13 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-29 23:49:25 - pico-train - INFO - Step 24250 -- 🔄 Training Metrics
2025-08-29 23:49:25 - pico-train - INFO - ├── Loss: 6.2544
2025-08-29 23:49:25 - pico-train - INFO - ├── Learning Rate: 2.44e-05
2025-08-29 23:49:25 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-29 23:49:38 - pico-train - INFO - Step 24275 -- 🔄 Training Metrics
2025-08-29 23:49:38 - pico-train - INFO - ├── Loss: 6.2934
2025-08-29 23:49:38 - pico-train - INFO - ├── Learning Rate: 2.43e-05
2025-08-29 23:49:38 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-29 23:49:51 - pico-train - INFO - Step 24300 -- 🔄 Training Metrics
2025-08-29 23:49:51 - pico-train - INFO - ├── Loss: 6.2608
2025-08-29 23:49:51 - pico-train - INFO - ├── Learning Rate: 2.43e-05
2025-08-29 23:49:51 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-29 23:50:03 - pico-train - INFO - Step 24325 -- 🔄 Training Metrics
2025-08-29 23:50:03 - pico-train - INFO - ├── Loss: 6.2280
2025-08-29 23:50:03 - pico-train - INFO - ├── Learning Rate: 2.42e-05
2025-08-29 23:50:03 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-29 23:50:16 - pico-train - INFO - Step 24350 -- 🔄 Training Metrics
2025-08-29 23:50:16 - pico-train - INFO - ├── Loss: 6.2431
2025-08-29 23:50:16 - pico-train - INFO - ├── Learning Rate: 2.41e-05
2025-08-29 23:50:16 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-29 23:50:29 - pico-train - INFO - Step 24375 -- 🔄 Training Metrics
2025-08-29 23:50:29 - pico-train - INFO - ├── Loss: 6.2120
2025-08-29 23:50:29 - pico-train - INFO - ├── Learning Rate: 2.41e-05
2025-08-29 23:50:29 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-29 23:50:41 - pico-train - INFO - Step 24400 -- 🔄 Training Metrics
2025-08-29 23:50:41 - pico-train - INFO - ├── Loss: 6.2375
2025-08-29 23:50:41 - pico-train - INFO - ├── Learning Rate: 2.40e-05
2025-08-29 23:50:41 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-29 23:50:54 - pico-train - INFO - Step 24425 -- 🔄 Training Metrics
2025-08-29 23:50:54 - pico-train - INFO - ├── Loss: 6.3604
2025-08-29 23:50:54 - pico-train - INFO - ├── Learning Rate: 2.40e-05
2025-08-29 23:50:54 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-29 23:51:07 - pico-train - INFO - Step 24450 -- 🔄 Training Metrics
2025-08-29 23:51:07 - pico-train - INFO - ├── Loss: 6.2451
2025-08-29 23:51:07 - pico-train - INFO - ├── Learning Rate: 2.39e-05
2025-08-29 23:51:07 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-29 23:51:20 - pico-train - INFO - Step 24475 -- 🔄 Training Metrics
2025-08-29 23:51:20 - pico-train - INFO - ├── Loss: 6.2877
2025-08-29 23:51:20 - pico-train - INFO - ├── Learning Rate: 2.38e-05
2025-08-29 23:51:20 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-29 23:51:32 - pico-train - INFO - Step 24500 -- 💾 Saving Checkpoint
2025-08-29 23:53:26 - pico-train - INFO - Step 24500 -- 📊 Evaluation Results
2025-08-29 23:53:26 - pico-train - INFO - └── paloma: 2.937466297559389e+25
2025-08-29 23:53:29 - pico-train - INFO - Step 24500 -- 🔄 Training Metrics
2025-08-29 23:53:29 - pico-train - INFO - ├── Loss: 6.3104
2025-08-29 23:53:29 - pico-train - INFO - ├── Learning Rate: 2.38e-05
2025-08-29 23:53:29 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-29 23:53:29 - pico-train - INFO - Step 24500 -- 📈 Saving Learning Dynamics
2025-08-29 23:53:44 - pico-train - INFO - Step 24525 -- 🔄 Training Metrics
2025-08-29 23:53:44 - pico-train - INFO - ├── Loss: 6.2830
2025-08-29 23:53:44 - pico-train - INFO - ├── Learning Rate: 2.37e-05
2025-08-29 23:53:44 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-29 23:53:56 - pico-train - INFO - Step 24550 -- 🔄 Training Metrics
2025-08-29 23:53:56 - pico-train - INFO - ├── Loss: 6.2558
2025-08-29 23:53:56 - pico-train - INFO - ├── Learning Rate: 2.37e-05
2025-08-29 23:53:56 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-29 23:54:09 - pico-train - INFO - Step 24575 -- 🔄 Training Metrics
2025-08-29 23:54:09 - pico-train - INFO - ├── Loss: 6.2140
2025-08-29 23:54:09 - pico-train - INFO - ├── Learning Rate: 2.36e-05
2025-08-29 23:54:09 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-29 23:54:22 - pico-train - INFO - Step 24600 -- 🔄 Training Metrics
2025-08-29 23:54:22 - pico-train - INFO - ├── Loss: 6.2546
2025-08-29 23:54:22 - pico-train - INFO - ├── Learning Rate: 2.35e-05
2025-08-29 23:54:22 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-29 23:54:34 - pico-train - INFO - Step 24625 -- 🔄 Training Metrics
2025-08-29 23:54:34 - pico-train - INFO - ├── Loss: 6.2569
2025-08-29 23:54:34 - pico-train - INFO - ├── Learning Rate: 2.35e-05
2025-08-29 23:54:34 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-29 23:54:47 - pico-train - INFO - Step 24650 -- 🔄 Training Metrics
2025-08-29 23:54:47 - pico-train - INFO - ├── Loss: 6.2170
2025-08-29 23:54:47 - pico-train - INFO - ├── Learning Rate: 2.34e-05
2025-08-29 23:54:47 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-29 23:55:00 - pico-train - INFO - Step 24675 -- 🔄 Training Metrics
2025-08-29 23:55:00 - pico-train - INFO - ├── Loss: 6.2187
2025-08-29 23:55:00 - pico-train - INFO - ├── Learning Rate: 2.33e-05
2025-08-29 23:55:00 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-29 23:55:12 - pico-train - INFO - Step 24700 -- 🔄 Training Metrics
2025-08-29 23:55:12 - pico-train - INFO - ├── Loss: 6.2933
2025-08-29 23:55:12 - pico-train - INFO - ├── Learning Rate: 2.33e-05
2025-08-29 23:55:12 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-29 23:55:25 - pico-train - INFO - Step 24725 -- 🔄 Training Metrics
2025-08-29 23:55:25 - pico-train - INFO - ├── Loss: 6.2359
2025-08-29 23:55:25 - pico-train - INFO - ├── Learning Rate: 2.32e-05
2025-08-29 23:55:25 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-29 23:55:38 - pico-train - INFO - Step 24750 -- 🔄 Training Metrics
2025-08-29 23:55:38 - pico-train - INFO - ├── Loss: 6.2789
2025-08-29 23:55:38 - pico-train - INFO - ├── Learning Rate: 2.32e-05
2025-08-29 23:55:38 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-29 23:55:50 - pico-train - INFO - Step 24775 -- 🔄 Training Metrics
2025-08-29 23:55:50 - pico-train - INFO - ├── Loss: 6.3001
2025-08-29 23:55:50 - pico-train - INFO - ├── Learning Rate: 2.31e-05
2025-08-29 23:55:50 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-29 23:56:03 - pico-train - INFO - Step 24800 -- 🔄 Training Metrics
2025-08-29 23:56:03 - pico-train - INFO - ├── Loss: 6.2419
2025-08-29 23:56:03 - pico-train - INFO - ├── Learning Rate: 2.30e-05
2025-08-29 23:56:03 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-29 23:56:16 - pico-train - INFO - Step 24825 -- 🔄 Training Metrics
2025-08-29 23:56:16 - pico-train - INFO - ├── Loss: 6.2251
2025-08-29 23:56:16 - pico-train - INFO - ├── Learning Rate: 2.30e-05
2025-08-29 23:56:16 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-29 23:56:28 - pico-train - INFO - Step 24850 -- 🔄 Training Metrics
2025-08-29 23:56:28 - pico-train - INFO - ├── Loss: 6.2023
2025-08-29 23:56:28 - pico-train - INFO - ├── Learning Rate: 2.29e-05
2025-08-29 23:56:28 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-29 23:56:41 - pico-train - INFO - Step 24875 -- 🔄 Training Metrics
2025-08-29 23:56:41 - pico-train - INFO - ├── Loss: 6.2911
2025-08-29 23:56:41 - pico-train - INFO - ├── Learning Rate: 2.29e-05
2025-08-29 23:56:41 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-29 23:56:54 - pico-train - INFO - Step 24900 -- 🔄 Training Metrics
2025-08-29 23:56:54 - pico-train - INFO - ├── Loss: 6.2723
2025-08-29 23:56:54 - pico-train - INFO - ├── Learning Rate: 2.28e-05
2025-08-29 23:56:54 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-29 23:57:07 - pico-train - INFO - Step 24925 -- 🔄 Training Metrics
2025-08-29 23:57:07 - pico-train - INFO - ├── Loss: 6.2993
2025-08-29 23:57:07 - pico-train - INFO - ├── Learning Rate: 2.27e-05
2025-08-29 23:57:07 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-29 23:57:19 - pico-train - INFO - Step 24950 -- 🔄 Training Metrics
2025-08-29 23:57:19 - pico-train - INFO - ├── Loss: 6.2579
2025-08-29 23:57:19 - pico-train - INFO - ├── Learning Rate: 2.27e-05
2025-08-29 23:57:19 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-29 23:57:32 - pico-train - INFO - Step 24975 -- 🔄 Training Metrics
2025-08-29 23:57:32 - pico-train - INFO - ├── Loss: 6.2620
2025-08-29 23:57:32 - pico-train - INFO - ├── Learning Rate: 2.26e-05
2025-08-29 23:57:32 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-29 23:57:44 - pico-train - INFO - Step 25000 -- 💾 Saving Checkpoint
2025-08-29 23:59:48 - pico-train - INFO - Step 25000 -- 📊 Evaluation Results
2025-08-29 23:59:48 - pico-train - INFO - └── paloma: 3.4105304760288245e+25
2025-08-29 23:59:49 - pico-train - INFO - Step 25000 -- 🔄 Training Metrics
2025-08-29 23:59:49 - pico-train - INFO - ├── Loss: 6.2956
2025-08-29 23:59:49 - pico-train - INFO - ├── Learning Rate: 2.25e-05
2025-08-29 23:59:49 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-29 23:59:49 - pico-train - INFO - Step 25000 -- 📈 Saving Learning Dynamics
2025-08-30 00:00:04 - pico-train - INFO - Step 25025 -- 🔄 Training Metrics
2025-08-30 00:00:04 - pico-train - INFO - ├── Loss: 6.2348
2025-08-30 00:00:04 - pico-train - INFO - ├── Learning Rate: 2.25e-05
2025-08-30 00:00:04 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-30 00:00:17 - pico-train - INFO - Step 25050 -- 🔄 Training Metrics
2025-08-30 00:00:17 - pico-train - INFO - ├── Loss: 6.2363
2025-08-30 00:00:17 - pico-train - INFO - ├── Learning Rate: 2.24e-05
2025-08-30 00:00:17 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-30 00:00:30 - pico-train - INFO - Step 25075 -- 🔄 Training Metrics
2025-08-30 00:00:30 - pico-train - INFO - ├── Loss: 6.2567
2025-08-30 00:00:30 - pico-train - INFO - ├── Learning Rate: 2.24e-05
2025-08-30 00:00:30 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-30 00:00:43 - pico-train - INFO - Step 25100 -- 🔄 Training Metrics
2025-08-30 00:00:43 - pico-train - INFO - ├── Loss: 6.2186
2025-08-30 00:00:43 - pico-train - INFO - ├── Learning Rate: 2.23e-05
2025-08-30 00:00:43 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-30 00:00:56 - pico-train - INFO - Step 25125 -- 🔄 Training Metrics
2025-08-30 00:00:56 - pico-train - INFO - ├── Loss: 6.2886
2025-08-30 00:00:56 - pico-train - INFO - ├── Learning Rate: 2.22e-05
2025-08-30 00:00:56 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-30 00:01:08 - pico-train - INFO - Step 25150 -- 🔄 Training Metrics
2025-08-30 00:01:08 - pico-train - INFO - ├── Loss: 6.2310
2025-08-30 00:01:08 - pico-train - INFO - ├── Learning Rate: 2.22e-05
2025-08-30 00:01:08 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-30 00:01:21 - pico-train - INFO - Step 25175 -- 🔄 Training Metrics
2025-08-30 00:01:21 - pico-train - INFO - ├── Loss: 6.3884
2025-08-30 00:01:21 - pico-train - INFO - ├── Learning Rate: 2.21e-05
2025-08-30 00:01:21 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-30 00:01:34 - pico-train - INFO - Step 25200 -- 🔄 Training Metrics
2025-08-30 00:01:34 - pico-train - INFO - ├── Loss: 6.2232
2025-08-30 00:01:34 - pico-train - INFO - ├── Learning Rate: 2.21e-05
2025-08-30 00:01:34 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-30 00:01:46 - pico-train - INFO - Step 25225 -- 🔄 Training Metrics
2025-08-30 00:01:46 - pico-train - INFO - ├── Loss: 6.2254
2025-08-30 00:01:46 - pico-train - INFO - ├── Learning Rate: 2.20e-05
2025-08-30 00:01:46 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-30 00:01:59 - pico-train - INFO - Step 25250 -- 🔄 Training Metrics
2025-08-30 00:01:59 - pico-train - INFO - ├── Loss: 6.2140
2025-08-30 00:01:59 - pico-train - INFO - ├── Learning Rate: 2.19e-05
2025-08-30 00:01:59 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-30 00:02:12 - pico-train - INFO - Step 25275 -- 🔄 Training Metrics
2025-08-30 00:02:12 - pico-train - INFO - ├── Loss: 6.3619
2025-08-30 00:02:12 - pico-train - INFO - ├── Learning Rate: 2.19e-05
2025-08-30 00:02:12 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-30 00:02:24 - pico-train - INFO - Step 25300 -- 🔄 Training Metrics
2025-08-30 00:02:24 - pico-train - INFO - ├── Loss: 6.2660
2025-08-30 00:02:24 - pico-train - INFO - ├── Learning Rate: 2.18e-05
2025-08-30 00:02:24 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-30 00:02:37 - pico-train - INFO - Step 25325 -- 🔄 Training Metrics
2025-08-30 00:02:37 - pico-train - INFO - ├── Loss: 6.1959
2025-08-30 00:02:37 - pico-train - INFO - ├── Learning Rate: 2.18e-05
2025-08-30 00:02:37 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-30 00:02:49 - pico-train - INFO - Step 25350 -- 🔄 Training Metrics
2025-08-30 00:02:49 - pico-train - INFO - ├── Loss: 6.2983
2025-08-30 00:02:49 - pico-train - INFO - ├── Learning Rate: 2.17e-05
2025-08-30 00:02:49 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-30 00:03:02 - pico-train - INFO - Step 25375 -- 🔄 Training Metrics
2025-08-30 00:03:02 - pico-train - INFO - ├── Loss: 6.2441
2025-08-30 00:03:02 - pico-train - INFO - ├── Learning Rate: 2.16e-05
2025-08-30 00:03:02 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-30 00:03:15 - pico-train - INFO - Step 25400 -- 🔄 Training Metrics
2025-08-30 00:03:15 - pico-train - INFO - ├── Loss: 6.2454
2025-08-30 00:03:15 - pico-train - INFO - ├── Learning Rate: 2.16e-05
2025-08-30 00:03:15 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-30 00:03:28 - pico-train - INFO - Step 25425 -- 🔄 Training Metrics
2025-08-30 00:03:28 - pico-train - INFO - ├── Loss: 6.2099
2025-08-30 00:03:28 - pico-train - INFO - ├── Learning Rate: 2.15e-05
2025-08-30 00:03:28 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-30 00:03:40 - pico-train - INFO - Step 25450 -- 🔄 Training Metrics
2025-08-30 00:03:40 - pico-train - INFO - ├── Loss: 6.1991
2025-08-30 00:03:40 - pico-train - INFO - ├── Learning Rate: 2.15e-05
2025-08-30 00:03:40 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-30 00:03:53 - pico-train - INFO - Step 25475 -- 🔄 Training Metrics
2025-08-30 00:03:53 - pico-train - INFO - ├── Loss: 6.1905
2025-08-30 00:03:53 - pico-train - INFO - ├── Learning Rate: 2.14e-05
2025-08-30 00:03:53 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-30 00:04:05 - pico-train - INFO - Step 25500 -- 💾 Saving Checkpoint
2025-08-30 00:06:01 - pico-train - INFO - Step 25500 -- 📊 Evaluation Results
2025-08-30 00:06:01 - pico-train - INFO - └── paloma: 5.167340298104552e+25
2025-08-30 00:06:03 - pico-train - INFO - Step 25500 -- 🔄 Training Metrics
2025-08-30 00:06:03 - pico-train - INFO - ├── Loss: 6.2849
2025-08-30 00:06:03 - pico-train - INFO - ├── Learning Rate: 2.13e-05
2025-08-30 00:06:03 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-30 00:06:03 - pico-train - INFO - Step 25500 -- 📈 Saving Learning Dynamics
2025-08-30 00:06:19 - pico-train - INFO - Step 25525 -- 🔄 Training Metrics
2025-08-30 00:06:19 - pico-train - INFO - ├── Loss: 6.2454
2025-08-30 00:06:19 - pico-train - INFO - ├── Learning Rate: 2.13e-05
2025-08-30 00:06:19 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-30 00:06:32 - pico-train - INFO - Step 25550 -- 🔄 Training Metrics
2025-08-30 00:06:32 - pico-train - INFO - ├── Loss: 6.2327
2025-08-30 00:06:32 - pico-train - INFO - ├── Learning Rate: 2.12e-05
2025-08-30 00:06:32 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-30 00:06:45 - pico-train - INFO - Step 25575 -- 🔄 Training Metrics
2025-08-30 00:06:45 - pico-train - INFO - ├── Loss: 6.2783
2025-08-30 00:06:45 - pico-train - INFO - ├── Learning Rate: 2.11e-05
2025-08-30 00:06:45 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-30 00:06:57 - pico-train - INFO - Step 25600 -- 🔄 Training Metrics
2025-08-30 00:06:57 - pico-train - INFO - ├── Loss: 6.1487
2025-08-30 00:06:57 - pico-train - INFO - ├── Learning Rate: 2.11e-05
2025-08-30 00:06:57 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-30 00:07:11 - pico-train - INFO - Step 25625 -- 🔄 Training Metrics
2025-08-30 00:07:11 - pico-train - INFO - ├── Loss: 6.3194
2025-08-30 00:07:11 - pico-train - INFO - ├── Learning Rate: 2.10e-05
2025-08-30 00:07:11 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-30 00:07:24 - pico-train - INFO - Step 25650 -- 🔄 Training Metrics
2025-08-30 00:07:24 - pico-train - INFO - ├── Loss: 6.2920
2025-08-30 00:07:24 - pico-train - INFO - ├── Learning Rate: 2.10e-05
2025-08-30 00:07:24 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-30 00:07:37 - pico-train - INFO - Step 25675 -- 🔄 Training Metrics
2025-08-30 00:07:37 - pico-train - INFO - ├── Loss: 6.2623
2025-08-30 00:07:37 - pico-train - INFO - ├── Learning Rate: 2.09e-05
2025-08-30 00:07:37 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-30 00:07:49 - pico-train - INFO - Step 25700 -- 🔄 Training Metrics
2025-08-30 00:07:49 - pico-train - INFO - ├── Loss: 6.2687
2025-08-30 00:07:49 - pico-train - INFO - ├── Learning Rate: 2.08e-05
2025-08-30 00:07:49 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-30 00:08:02 - pico-train - INFO - Step 25725 -- 🔄 Training Metrics
2025-08-30 00:08:02 - pico-train - INFO - ├── Loss: 6.2595
2025-08-30 00:08:02 - pico-train - INFO - ├── Learning Rate: 2.08e-05
2025-08-30 00:08:02 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-30 00:08:15 - pico-train - INFO - Step 25750 -- 🔄 Training Metrics
2025-08-30 00:08:15 - pico-train - INFO - ├── Loss: 6.2781
2025-08-30 00:08:15 - pico-train - INFO - ├── Learning Rate: 2.07e-05
2025-08-30 00:08:15 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-30 00:08:27 - pico-train - INFO - Step 25775 -- 🔄 Training Metrics
2025-08-30 00:08:27 - pico-train - INFO - ├── Loss: 6.2089
2025-08-30 00:08:27 - pico-train - INFO - ├── Learning Rate: 2.07e-05
2025-08-30 00:08:27 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-30 00:08:40 - pico-train - INFO - Step 25800 -- 🔄 Training Metrics
2025-08-30 00:08:40 - pico-train - INFO - ├── Loss: 6.2729
2025-08-30 00:08:40 - pico-train - INFO - ├── Learning Rate: 2.06e-05
2025-08-30 00:08:40 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-30 00:08:53 - pico-train - INFO - Step 25825 -- 🔄 Training Metrics
2025-08-30 00:08:53 - pico-train - INFO - ├── Loss: 6.2478
2025-08-30 00:08:53 - pico-train - INFO - ├── Learning Rate: 2.05e-05
2025-08-30 00:08:53 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-30 00:09:05 - pico-train - INFO - Step 25850 -- 🔄 Training Metrics
2025-08-30 00:09:05 - pico-train - INFO - ├── Loss: 6.2238
2025-08-30 00:09:05 - pico-train - INFO - ├── Learning Rate: 2.05e-05
2025-08-30 00:09:05 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-30 00:09:18 - pico-train - INFO - Step 25875 -- 🔄 Training Metrics
2025-08-30 00:09:18 - pico-train - INFO - ├── Loss: 6.2437
2025-08-30 00:09:18 - pico-train - INFO - ├── Learning Rate: 2.04e-05
2025-08-30 00:09:18 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-30 00:09:31 - pico-train - INFO - Step 25900 -- 🔄 Training Metrics
2025-08-30 00:09:31 - pico-train - INFO - ├── Loss: 6.2743
2025-08-30 00:09:31 - pico-train - INFO - ├── Learning Rate: 2.04e-05
2025-08-30 00:09:31 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-30 00:09:43 - pico-train - INFO - Step 25925 -- 🔄 Training Metrics
2025-08-30 00:09:43 - pico-train - INFO - ├── Loss: 6.2143
2025-08-30 00:09:43 - pico-train - INFO - ├── Learning Rate: 2.03e-05
2025-08-30 00:09:43 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-30 00:09:56 - pico-train - INFO - Step 25950 -- 🔄 Training Metrics
2025-08-30 00:09:56 - pico-train - INFO - ├── Loss: 6.1636
2025-08-30 00:09:56 - pico-train - INFO - ├── Learning Rate: 2.02e-05
2025-08-30 00:09:56 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-30 00:10:08 - pico-train - INFO - Step 25975 -- 🔄 Training Metrics
2025-08-30 00:10:08 - pico-train - INFO - ├── Loss: 6.2028
2025-08-30 00:10:08 - pico-train - INFO - ├── Learning Rate: 2.02e-05
2025-08-30 00:10:08 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-30 00:10:21 - pico-train - INFO - Step 26000 -- 💾 Saving Checkpoint
2025-08-30 00:12:22 - pico-train - INFO - Step 26000 -- 📊 Evaluation Results
2025-08-30 00:12:22 - pico-train - INFO - └── paloma: 5.374017629915336e+25
2025-08-30 00:12:25 - pico-train - INFO - Step 26000 -- 🔄 Training Metrics
2025-08-30 00:12:25 - pico-train - INFO - ├── Loss: 6.3023
2025-08-30 00:12:25 - pico-train - INFO - ├── Learning Rate: 2.01e-05
2025-08-30 00:12:25 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-30 00:12:25 - pico-train - INFO - Step 26000 -- 📈 Saving Learning Dynamics
2025-08-30 00:12:40 - pico-train - INFO - Step 26025 -- 🔄 Training Metrics
2025-08-30 00:12:40 - pico-train - INFO - ├── Loss: 6.2060
2025-08-30 00:12:40 - pico-train - INFO - ├── Learning Rate: 2.01e-05
2025-08-30 00:12:40 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-30 00:12:52 - pico-train - INFO - Step 26050 -- 🔄 Training Metrics
2025-08-30 00:12:52 - pico-train - INFO - ├── Loss: 6.2001
2025-08-30 00:12:52 - pico-train - INFO - ├── Learning Rate: 2.00e-05
2025-08-30 00:12:52 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-30 00:13:05 - pico-train - INFO - Step 26075 -- 🔄 Training Metrics
2025-08-30 00:13:05 - pico-train - INFO - ├── Loss: 6.2546
2025-08-30 00:13:05 - pico-train - INFO - ├── Learning Rate: 1.99e-05
2025-08-30 00:13:05 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-30 00:13:18 - pico-train - INFO - Step 26100 -- 🔄 Training Metrics
2025-08-30 00:13:18 - pico-train - INFO - ├── Loss: 6.1986
2025-08-30 00:13:18 - pico-train - INFO - ├── Learning Rate: 1.99e-05
2025-08-30 00:13:18 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-30 00:13:32 - pico-train - INFO - Step 26125 -- 🔄 Training Metrics
2025-08-30 00:13:32 - pico-train - INFO - ├── Loss: 6.2415
2025-08-30 00:13:32 - pico-train - INFO - ├── Learning Rate: 1.98e-05
2025-08-30 00:13:32 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-30 00:13:44 - pico-train - INFO - Step 26150 -- 🔄 Training Metrics
2025-08-30 00:13:44 - pico-train - INFO - ├── Loss: 6.2411
2025-08-30 00:13:44 - pico-train - INFO - ├── Learning Rate: 1.98e-05
2025-08-30 00:13:44 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-30 00:13:57 - pico-train - INFO - Step 26175 -- 🔄 Training Metrics
2025-08-30 00:13:57 - pico-train - INFO - ├── Loss: 6.1756
2025-08-30 00:13:57 - pico-train - INFO - ├── Learning Rate: 1.97e-05
2025-08-30 00:13:57 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-30 00:14:10 - pico-train - INFO - Step 26200 -- 🔄 Training Metrics
2025-08-30 00:14:10 - pico-train - INFO - ├── Loss: 6.1444
2025-08-30 00:14:10 - pico-train - INFO - ├── Learning Rate: 1.96e-05
2025-08-30 00:14:10 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-30 00:14:22 - pico-train - INFO - Step 26225 -- 🔄 Training Metrics
2025-08-30 00:14:22 - pico-train - INFO - ├── Loss: 6.3335
2025-08-30 00:14:22 - pico-train - INFO - ├── Learning Rate: 1.96e-05
2025-08-30 00:14:22 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-30 00:14:35 - pico-train - INFO - Step 26250 -- 🔄 Training Metrics
2025-08-30 00:14:35 - pico-train - INFO - ├── Loss: 6.1491
2025-08-30 00:14:35 - pico-train - INFO - ├── Learning Rate: 1.95e-05
2025-08-30 00:14:35 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-30 00:14:48 - pico-train - INFO - Step 26275 -- 🔄 Training Metrics
2025-08-30 00:14:48 - pico-train - INFO - ├── Loss: 6.1959
2025-08-30 00:14:48 - pico-train - INFO - ├── Learning Rate: 1.95e-05
2025-08-30 00:14:48 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-30 00:15:00 - pico-train - INFO - Step 26300 -- 🔄 Training Metrics
2025-08-30 00:15:00 - pico-train - INFO - ├── Loss: 6.2494
2025-08-30 00:15:00 - pico-train - INFO - ├── Learning Rate: 1.94e-05
2025-08-30 00:15:00 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-30 00:15:13 - pico-train - INFO - Step 26325 -- 🔄 Training Metrics
2025-08-30 00:15:13 - pico-train - INFO - ├── Loss: 6.2893
2025-08-30 00:15:13 - pico-train - INFO - ├── Learning Rate: 1.93e-05
2025-08-30 00:15:13 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-30 00:15:26 - pico-train - INFO - Step 26350 -- 🔄 Training Metrics
2025-08-30 00:15:26 - pico-train - INFO - ├── Loss: 6.2732
2025-08-30 00:15:26 - pico-train - INFO - ├── Learning Rate: 1.93e-05
2025-08-30 00:15:26 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-30 00:15:38 - pico-train - INFO - Step 26375 -- 🔄 Training Metrics
2025-08-30 00:15:38 - pico-train - INFO - ├── Loss: 6.2804
2025-08-30 00:15:38 - pico-train - INFO - ├── Learning Rate: 1.92e-05
2025-08-30 00:15:38 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-30 00:15:51 - pico-train - INFO - Step 26400 -- 🔄 Training Metrics
2025-08-30 00:15:51 - pico-train - INFO - ├── Loss: 6.2117
2025-08-30 00:15:51 - pico-train - INFO - ├── Learning Rate: 1.92e-05
2025-08-30 00:15:51 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-30 00:16:04 - pico-train - INFO - Step 26425 -- 🔄 Training Metrics
2025-08-30 00:16:04 - pico-train - INFO - ├── Loss: 6.2055
2025-08-30 00:16:04 - pico-train - INFO - ├── Learning Rate: 1.91e-05
2025-08-30 00:16:04 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-30 00:16:17 - pico-train - INFO - Step 26450 -- 🔄 Training Metrics
2025-08-30 00:16:17 - pico-train - INFO - ├── Loss: 6.3085
2025-08-30 00:16:17 - pico-train - INFO - ├── Learning Rate: 1.90e-05
2025-08-30 00:16:17 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-30 00:16:29 - pico-train - INFO - Step 26475 -- 🔄 Training Metrics
2025-08-30 00:16:29 - pico-train - INFO - ├── Loss: 6.1870
2025-08-30 00:16:29 - pico-train - INFO - ├── Learning Rate: 1.90e-05
2025-08-30 00:16:29 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-30 00:16:41 - pico-train - INFO - Step 26500 -- 💾 Saving Checkpoint
2025-08-30 00:18:38 - pico-train - INFO - Step 26500 -- 📊 Evaluation Results
2025-08-30 00:18:38 - pico-train - INFO - └── paloma: 7.002764153086805e+25
2025-08-30 00:18:39 - pico-train - INFO - Step 26500 -- 🔄 Training Metrics
2025-08-30 00:18:39 - pico-train - INFO - ├── Loss: 6.2219
2025-08-30 00:18:39 - pico-train - INFO - ├── Learning Rate: 1.89e-05
2025-08-30 00:18:39 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-30 00:18:39 - pico-train - INFO - Step 26500 -- 📈 Saving Learning Dynamics
2025-08-30 00:18:54 - pico-train - INFO - Step 26525 -- 🔄 Training Metrics
2025-08-30 00:18:54 - pico-train - INFO - ├── Loss: 6.1945
2025-08-30 00:18:54 - pico-train - INFO - ├── Learning Rate: 1.89e-05
2025-08-30 00:18:54 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-30 00:19:07 - pico-train - INFO - Step 26550 -- 🔄 Training Metrics
2025-08-30 00:19:07 - pico-train - INFO - ├── Loss: 6.1917
2025-08-30 00:19:07 - pico-train - INFO - ├── Learning Rate: 1.88e-05
2025-08-30 00:19:07 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-30 00:19:20 - pico-train - INFO - Step 26575 -- 🔄 Training Metrics
2025-08-30 00:19:20 - pico-train - INFO - ├── Loss: 6.1611
2025-08-30 00:19:20 - pico-train - INFO - ├── Learning Rate: 1.87e-05
2025-08-30 00:19:20 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-30 00:19:32 - pico-train - INFO - Step 26600 -- 🔄 Training Metrics
2025-08-30 00:19:32 - pico-train - INFO - ├── Loss: 6.2254
2025-08-30 00:19:32 - pico-train - INFO - ├── Learning Rate: 1.87e-05
2025-08-30 00:19:32 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-30 00:19:45 - pico-train - INFO - Step 26625 -- 🔄 Training Metrics
2025-08-30 00:19:45 - pico-train - INFO - ├── Loss: 6.2633
2025-08-30 00:19:45 - pico-train - INFO - ├── Learning Rate: 1.86e-05
2025-08-30 00:19:45 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-30 00:19:58 - pico-train - INFO - Step 26650 -- 🔄 Training Metrics
2025-08-30 00:19:58 - pico-train - INFO - ├── Loss: 6.2096
2025-08-30 00:19:58 - pico-train - INFO - ├── Learning Rate: 1.86e-05
2025-08-30 00:19:58 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-30 00:20:10 - pico-train - INFO - Step 26675 -- 🔄 Training Metrics
2025-08-30 00:20:10 - pico-train - INFO - ├── Loss: 6.2665
2025-08-30 00:20:10 - pico-train - INFO - ├── Learning Rate: 1.85e-05
2025-08-30 00:20:10 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-30 00:20:23 - pico-train - INFO - Step 26700 -- 🔄 Training Metrics
2025-08-30 00:20:23 - pico-train - INFO - ├── Loss: 6.2534
2025-08-30 00:20:23 - pico-train - INFO - ├── Learning Rate: 1.85e-05
2025-08-30 00:20:23 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-30 00:20:36 - pico-train - INFO - Step 26725 -- 🔄 Training Metrics
2025-08-30 00:20:36 - pico-train - INFO - ├── Loss: 6.2207
2025-08-30 00:20:36 - pico-train - INFO - ├── Learning Rate: 1.84e-05
2025-08-30 00:20:36 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-30 00:20:48 - pico-train - INFO - Step 26750 -- 🔄 Training Metrics
2025-08-30 00:20:48 - pico-train - INFO - ├── Loss: 6.2923
2025-08-30 00:20:48 - pico-train - INFO - ├── Learning Rate: 1.83e-05
2025-08-30 00:20:48 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-30 00:21:01 - pico-train - INFO - Step 26775 -- 🔄 Training Metrics
2025-08-30 00:21:01 - pico-train - INFO - ├── Loss: 6.2678
2025-08-30 00:21:01 - pico-train - INFO - ├── Learning Rate: 1.83e-05
2025-08-30 00:21:01 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-30 00:21:14 - pico-train - INFO - Step 26800 -- 🔄 Training Metrics
2025-08-30 00:21:14 - pico-train - INFO - ├── Loss: 6.2139
2025-08-30 00:21:14 - pico-train - INFO - ├── Learning Rate: 1.82e-05
2025-08-30 00:21:14 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-30 00:21:26 - pico-train - INFO - Step 26825 -- 🔄 Training Metrics
2025-08-30 00:21:26 - pico-train - INFO - ├── Loss: 6.1680
2025-08-30 00:21:26 - pico-train - INFO - ├── Learning Rate: 1.82e-05
2025-08-30 00:21:26 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-30 00:21:39 - pico-train - INFO - Step 26850 -- 🔄 Training Metrics
2025-08-30 00:21:39 - pico-train - INFO - ├── Loss: 6.1858
2025-08-30 00:21:39 - pico-train - INFO - ├── Learning Rate: 1.81e-05
2025-08-30 00:21:39 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-30 00:21:52 - pico-train - INFO - Step 26875 -- 🔄 Training Metrics
2025-08-30 00:21:52 - pico-train - INFO - ├── Loss: 6.1172
2025-08-30 00:21:52 - pico-train - INFO - ├── Learning Rate: 1.80e-05
2025-08-30 00:21:52 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-30 00:22:05 - pico-train - INFO - Step 26900 -- 🔄 Training Metrics
2025-08-30 00:22:05 - pico-train - INFO - ├── Loss: 6.2332
2025-08-30 00:22:05 - pico-train - INFO - ├── Learning Rate: 1.80e-05
2025-08-30 00:22:05 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-30 00:22:17 - pico-train - INFO - Step 26925 -- 🔄 Training Metrics
2025-08-30 00:22:17 - pico-train - INFO - ├── Loss: 6.2099
2025-08-30 00:22:17 - pico-train - INFO - ├── Learning Rate: 1.79e-05
2025-08-30 00:22:17 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-30 00:22:30 - pico-train - INFO - Step 26950 -- 🔄 Training Metrics
2025-08-30 00:22:30 - pico-train - INFO - ├── Loss: 6.2551
2025-08-30 00:22:30 - pico-train - INFO - ├── Learning Rate: 1.79e-05
2025-08-30 00:22:30 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-30 00:22:43 - pico-train - INFO - Step 26975 -- 🔄 Training Metrics
2025-08-30 00:22:43 - pico-train - INFO - ├── Loss: 6.2033
2025-08-30 00:22:43 - pico-train - INFO - ├── Learning Rate: 1.78e-05
2025-08-30 00:22:43 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-30 00:22:55 - pico-train - INFO - Step 27000 -- 💾 Saving Checkpoint
2025-08-30 00:24:53 - pico-train - INFO - Step 27000 -- 📊 Evaluation Results
2025-08-30 00:24:53 - pico-train - INFO - └── paloma: 7.722641414937935e+25
2025-08-30 00:24:55 - pico-train - INFO - Step 27000 -- 🔄 Training Metrics
2025-08-30 00:24:55 - pico-train - INFO - ├── Loss: 6.2512
2025-08-30 00:24:55 - pico-train - INFO - ├── Learning Rate: 1.77e-05
2025-08-30 00:24:55 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-30 00:24:55 - pico-train - INFO - Step 27000 -- 📈 Saving Learning Dynamics
2025-08-30 00:25:09 - pico-train - INFO - Step 27025 -- 🔄 Training Metrics
2025-08-30 00:25:09 - pico-train - INFO - ├── Loss: 6.2686
2025-08-30 00:25:09 - pico-train - INFO - ├── Learning Rate: 1.77e-05
2025-08-30 00:25:09 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-30 00:25:22 - pico-train - INFO - Step 27050 -- 🔄 Training Metrics
2025-08-30 00:25:22 - pico-train - INFO - ├── Loss: 6.1854
2025-08-30 00:25:22 - pico-train - INFO - ├── Learning Rate: 1.76e-05
2025-08-30 00:25:22 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-30 00:25:35 - pico-train - INFO - Step 27075 -- 🔄 Training Metrics
2025-08-30 00:25:35 - pico-train - INFO - ├── Loss: 6.1974
2025-08-30 00:25:35 - pico-train - INFO - ├── Learning Rate: 1.76e-05
2025-08-30 00:25:35 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-30 00:25:47 - pico-train - INFO - Step 27100 -- 🔄 Training Metrics
2025-08-30 00:25:47 - pico-train - INFO - ├── Loss: 6.2597
2025-08-30 00:25:47 - pico-train - INFO - ├── Learning Rate: 1.75e-05
2025-08-30 00:25:47 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-30 00:26:00 - pico-train - INFO - Step 27125 -- 🔄 Training Metrics
2025-08-30 00:26:00 - pico-train - INFO - ├── Loss: 6.2280
2025-08-30 00:26:00 - pico-train - INFO - ├── Learning Rate: 1.74e-05
2025-08-30 00:26:00 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-30 00:26:13 - pico-train - INFO - Step 27150 -- 🔄 Training Metrics
2025-08-30 00:26:13 - pico-train - INFO - ├── Loss: 6.2126
2025-08-30 00:26:13 - pico-train - INFO - ├── Learning Rate: 1.74e-05
2025-08-30 00:26:13 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-30 00:26:26 - pico-train - INFO - Step 27175 -- 🔄 Training Metrics
2025-08-30 00:26:26 - pico-train - INFO - ├── Loss: 6.2233
2025-08-30 00:26:26 - pico-train - INFO - ├── Learning Rate: 1.73e-05
2025-08-30 00:26:26 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-30 00:26:38 - pico-train - INFO - Step 27200 -- 🔄 Training Metrics
2025-08-30 00:26:38 - pico-train - INFO - ├── Loss: 6.1393
2025-08-30 00:26:38 - pico-train - INFO - ├── Learning Rate: 1.73e-05
2025-08-30 00:26:38 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-30 00:26:51 - pico-train - INFO - Step 27225 -- 🔄 Training Metrics
2025-08-30 00:26:51 - pico-train - INFO - ├── Loss: 6.3226
2025-08-30 00:26:51 - pico-train - INFO - ├── Learning Rate: 1.72e-05
2025-08-30 00:26:51 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-30 00:27:03 - pico-train - INFO - Step 27250 -- 🔄 Training Metrics
2025-08-30 00:27:03 - pico-train - INFO - ├── Loss: 6.1570
2025-08-30 00:27:03 - pico-train - INFO - ├── Learning Rate: 1.72e-05
2025-08-30 00:27:03 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-30 00:27:16 - pico-train - INFO - Step 27275 -- 🔄 Training Metrics
2025-08-30 00:27:16 - pico-train - INFO - ├── Loss: 6.2252
2025-08-30 00:27:16 - pico-train - INFO - ├── Learning Rate: 1.71e-05
2025-08-30 00:27:16 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-30 00:27:29 - pico-train - INFO - Step 27300 -- 🔄 Training Metrics
2025-08-30 00:27:29 - pico-train - INFO - ├── Loss: 6.1647
2025-08-30 00:27:29 - pico-train - INFO - ├── Learning Rate: 1.70e-05
2025-08-30 00:27:29 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-30 00:27:41 - pico-train - INFO - Step 27325 -- 🔄 Training Metrics
2025-08-30 00:27:41 - pico-train - INFO - ├── Loss: 6.1219
2025-08-30 00:27:41 - pico-train - INFO - ├── Learning Rate: 1.70e-05
2025-08-30 00:27:41 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-30 00:27:54 - pico-train - INFO - Step 27350 -- 🔄 Training Metrics
2025-08-30 00:27:54 - pico-train - INFO - ├── Loss: 6.2250
2025-08-30 00:27:54 - pico-train - INFO - ├── Learning Rate: 1.69e-05
2025-08-30 00:27:54 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-30 00:28:06 - pico-train - INFO - Step 27375 -- 🔄 Training Metrics
2025-08-30 00:28:06 - pico-train - INFO - ├── Loss: 6.1883
2025-08-30 00:28:06 - pico-train - INFO - ├── Learning Rate: 1.69e-05
2025-08-30 00:28:06 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-30 00:28:19 - pico-train - INFO - Step 27400 -- 🔄 Training Metrics
2025-08-30 00:28:19 - pico-train - INFO - ├── Loss: 6.2074
2025-08-30 00:28:19 - pico-train - INFO - ├── Learning Rate: 1.68e-05
2025-08-30 00:28:19 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-30 00:28:31 - pico-train - INFO - Step 27425 -- 🔄 Training Metrics
2025-08-30 00:28:31 - pico-train - INFO - ├── Loss: 6.1881
2025-08-30 00:28:31 - pico-train - INFO - ├── Learning Rate: 1.68e-05
2025-08-30 00:28:31 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-30 00:28:44 - pico-train - INFO - Step 27450 -- ๐Ÿ”„ Training Metrics
2025-08-30 00:28:44 - pico-train - INFO - โ”œโ”€โ”€ Loss: 6.1977
2025-08-30 00:28:44 - pico-train - INFO - โ”œโ”€โ”€ Learning Rate: 1.67e-05
2025-08-30 00:28:44 - pico-train - INFO - โ””โ”€โ”€ Inf/NaN count: 0
2025-08-30 00:28:57 - pico-train - INFO - Step 27475 -- ๐Ÿ”„ Training Metrics
2025-08-30 00:28:57 - pico-train - INFO - โ”œโ”€โ”€ Loss: 6.2394
2025-08-30 00:28:57 - pico-train - INFO - โ”œโ”€โ”€ Learning Rate: 1.66e-05
2025-08-30 00:28:57 - pico-train - INFO - โ””โ”€โ”€ Inf/NaN count: 0
2025-08-30 00:29:09 - pico-train - INFO - Step 27500 -- 💾 Saving Checkpoint
2025-08-30 00:31:15 - pico-train - INFO - Step 27500 -- 📊 Evaluation Results
2025-08-30 00:31:15 - pico-train - INFO - └── paloma: 1.0733810806931749e+26
2025-08-30 00:31:19 - pico-train - INFO - Step 27500 -- 🔄 Training Metrics
2025-08-30 00:31:19 - pico-train - INFO - ├── Loss: 6.2657
2025-08-30 00:31:19 - pico-train - INFO - ├── Learning Rate: 1.66e-05
2025-08-30 00:31:19 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-30 00:31:19 - pico-train - INFO - Step 27500 -- 📈 Saving Learning Dynamics
2025-08-30 00:31:34 - pico-train - INFO - Step 27525 -- 🔄 Training Metrics
2025-08-30 00:31:34 - pico-train - INFO - ├── Loss: 6.1848
2025-08-30 00:31:34 - pico-train - INFO - ├── Learning Rate: 1.65e-05
2025-08-30 00:31:34 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-30 00:31:46 - pico-train - INFO - Step 27550 -- 🔄 Training Metrics
2025-08-30 00:31:46 - pico-train - INFO - ├── Loss: 6.1677
2025-08-30 00:31:46 - pico-train - INFO - ├── Learning Rate: 1.65e-05
2025-08-30 00:31:46 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-30 00:31:59 - pico-train - INFO - Step 27575 -- 🔄 Training Metrics
2025-08-30 00:31:59 - pico-train - INFO - ├── Loss: 6.2103
2025-08-30 00:31:59 - pico-train - INFO - ├── Learning Rate: 1.64e-05
2025-08-30 00:31:59 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-30 00:32:12 - pico-train - INFO - Step 27600 -- 🔄 Training Metrics
2025-08-30 00:32:12 - pico-train - INFO - ├── Loss: 6.2026
2025-08-30 00:32:12 - pico-train - INFO - ├── Learning Rate: 1.63e-05
2025-08-30 00:32:12 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-30 00:32:25 - pico-train - INFO - Step 27625 -- 🔄 Training Metrics
2025-08-30 00:32:25 - pico-train - INFO - ├── Loss: 6.1656
2025-08-30 00:32:25 - pico-train - INFO - ├── Learning Rate: 1.63e-05
2025-08-30 00:32:25 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-30 00:32:38 - pico-train - INFO - Step 27650 -- 🔄 Training Metrics
2025-08-30 00:32:38 - pico-train - INFO - ├── Loss: 6.1600
2025-08-30 00:32:38 - pico-train - INFO - ├── Learning Rate: 1.62e-05
2025-08-30 00:32:38 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-30 00:32:50 - pico-train - INFO - Step 27675 -- 🔄 Training Metrics
2025-08-30 00:32:50 - pico-train - INFO - ├── Loss: 6.2803
2025-08-30 00:32:50 - pico-train - INFO - ├── Learning Rate: 1.62e-05
2025-08-30 00:32:50 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-30 00:33:03 - pico-train - INFO - Step 27700 -- 🔄 Training Metrics
2025-08-30 00:33:03 - pico-train - INFO - ├── Loss: 6.2837
2025-08-30 00:33:03 - pico-train - INFO - ├── Learning Rate: 1.61e-05
2025-08-30 00:33:03 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-30 00:33:15 - pico-train - INFO - Step 27725 -- 🔄 Training Metrics
2025-08-30 00:33:15 - pico-train - INFO - ├── Loss: 6.1344
2025-08-30 00:33:15 - pico-train - INFO - ├── Learning Rate: 1.61e-05
2025-08-30 00:33:15 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-30 00:33:28 - pico-train - INFO - Step 27750 -- 🔄 Training Metrics
2025-08-30 00:33:28 - pico-train - INFO - ├── Loss: 6.2066
2025-08-30 00:33:28 - pico-train - INFO - ├── Learning Rate: 1.60e-05
2025-08-30 00:33:28 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-30 00:33:41 - pico-train - INFO - Step 27775 -- 🔄 Training Metrics
2025-08-30 00:33:41 - pico-train - INFO - ├── Loss: 6.1848
2025-08-30 00:33:41 - pico-train - INFO - ├── Learning Rate: 1.59e-05
2025-08-30 00:33:41 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-30 00:33:53 - pico-train - INFO - Step 27800 -- 🔄 Training Metrics
2025-08-30 00:33:53 - pico-train - INFO - ├── Loss: 6.2565
2025-08-30 00:33:53 - pico-train - INFO - ├── Learning Rate: 1.59e-05
2025-08-30 00:33:53 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-30 00:34:06 - pico-train - INFO - Step 27825 -- 🔄 Training Metrics
2025-08-30 00:34:06 - pico-train - INFO - ├── Loss: 6.2278
2025-08-30 00:34:06 - pico-train - INFO - ├── Learning Rate: 1.58e-05
2025-08-30 00:34:06 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-30 00:34:19 - pico-train - INFO - Step 27850 -- 🔄 Training Metrics
2025-08-30 00:34:19 - pico-train - INFO - ├── Loss: 6.2249
2025-08-30 00:34:19 - pico-train - INFO - ├── Learning Rate: 1.58e-05
2025-08-30 00:34:19 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-30 00:34:32 - pico-train - INFO - Step 27875 -- 🔄 Training Metrics
2025-08-30 00:34:32 - pico-train - INFO - ├── Loss: 6.1730
2025-08-30 00:34:32 - pico-train - INFO - ├── Learning Rate: 1.57e-05
2025-08-30 00:34:32 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-30 00:34:44 - pico-train - INFO - Step 27900 -- 🔄 Training Metrics
2025-08-30 00:34:44 - pico-train - INFO - ├── Loss: 6.1503
2025-08-30 00:34:44 - pico-train - INFO - ├── Learning Rate: 1.57e-05
2025-08-30 00:34:44 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-30 00:34:57 - pico-train - INFO - Step 27925 -- 🔄 Training Metrics
2025-08-30 00:34:57 - pico-train - INFO - ├── Loss: 6.1955
2025-08-30 00:34:57 - pico-train - INFO - ├── Learning Rate: 1.56e-05
2025-08-30 00:34:57 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-30 00:35:09 - pico-train - INFO - Step 27950 -- 🔄 Training Metrics
2025-08-30 00:35:09 - pico-train - INFO - ├── Loss: 6.1747
2025-08-30 00:35:09 - pico-train - INFO - ├── Learning Rate: 1.55e-05
2025-08-30 00:35:09 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-30 00:35:22 - pico-train - INFO - Step 27975 -- 🔄 Training Metrics
2025-08-30 00:35:22 - pico-train - INFO - ├── Loss: 6.2607
2025-08-30 00:35:22 - pico-train - INFO - ├── Learning Rate: 1.55e-05
2025-08-30 00:35:22 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-30 00:35:34 - pico-train - INFO - Step 28000 -- 💾 Saving Checkpoint
2025-08-30 00:37:31 - pico-train - INFO - Step 28000 -- 📊 Evaluation Results
2025-08-30 00:37:31 - pico-train - INFO - └── paloma: 1.2438803536426585e+26
2025-08-30 00:37:34 - pico-train - INFO - Step 28000 -- 🔄 Training Metrics
2025-08-30 00:37:34 - pico-train - INFO - ├── Loss: 6.2990
2025-08-30 00:37:34 - pico-train - INFO - ├── Learning Rate: 1.54e-05
2025-08-30 00:37:34 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-30 00:37:34 - pico-train - INFO - Step 28000 -- 📈 Saving Learning Dynamics
2025-08-30 00:37:49 - pico-train - INFO - Step 28025 -- 🔄 Training Metrics
2025-08-30 00:37:49 - pico-train - INFO - ├── Loss: 6.1938
2025-08-30 00:37:49 - pico-train - INFO - ├── Learning Rate: 1.54e-05
2025-08-30 00:37:49 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-30 00:38:01 - pico-train - INFO - Step 28050 -- 🔄 Training Metrics
2025-08-30 00:38:01 - pico-train - INFO - ├── Loss: 6.2467
2025-08-30 00:38:01 - pico-train - INFO - ├── Learning Rate: 1.53e-05
2025-08-30 00:38:01 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-30 00:38:14 - pico-train - INFO - Step 28075 -- 🔄 Training Metrics
2025-08-30 00:38:14 - pico-train - INFO - ├── Loss: 6.1609
2025-08-30 00:38:14 - pico-train - INFO - ├── Learning Rate: 1.53e-05
2025-08-30 00:38:14 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-30 00:38:26 - pico-train - INFO - Step 28100 -- 🔄 Training Metrics
2025-08-30 00:38:26 - pico-train - INFO - ├── Loss: 6.1691
2025-08-30 00:38:26 - pico-train - INFO - ├── Learning Rate: 1.52e-05
2025-08-30 00:38:26 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-30 00:38:39 - pico-train - INFO - Step 28125 -- 🔄 Training Metrics
2025-08-30 00:38:39 - pico-train - INFO - ├── Loss: 6.2517
2025-08-30 00:38:39 - pico-train - INFO - ├── Learning Rate: 1.52e-05
2025-08-30 00:38:39 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-30 00:38:52 - pico-train - INFO - Step 28150 -- 🔄 Training Metrics
2025-08-30 00:38:52 - pico-train - INFO - ├── Loss: 6.2758
2025-08-30 00:38:52 - pico-train - INFO - ├── Learning Rate: 1.51e-05
2025-08-30 00:38:52 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-30 00:39:05 - pico-train - INFO - Step 28175 -- 🔄 Training Metrics
2025-08-30 00:39:05 - pico-train - INFO - ├── Loss: 6.2979
2025-08-30 00:39:05 - pico-train - INFO - ├── Learning Rate: 1.50e-05
2025-08-30 00:39:05 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-30 00:39:17 - pico-train - INFO - Step 28200 -- 🔄 Training Metrics
2025-08-30 00:39:17 - pico-train - INFO - ├── Loss: 6.1294
2025-08-30 00:39:17 - pico-train - INFO - ├── Learning Rate: 1.50e-05
2025-08-30 00:39:17 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-30 00:39:30 - pico-train - INFO - Step 28225 -- 🔄 Training Metrics
2025-08-30 00:39:30 - pico-train - INFO - ├── Loss: 6.1557
2025-08-30 00:39:30 - pico-train - INFO - ├── Learning Rate: 1.49e-05
2025-08-30 00:39:30 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-30 00:39:43 - pico-train - INFO - Step 28250 -- 🔄 Training Metrics
2025-08-30 00:39:43 - pico-train - INFO - ├── Loss: 6.2283
2025-08-30 00:39:43 - pico-train - INFO - ├── Learning Rate: 1.49e-05
2025-08-30 00:39:43 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-30 00:39:56 - pico-train - INFO - Step 28275 -- 🔄 Training Metrics
2025-08-30 00:39:56 - pico-train - INFO - ├── Loss: 6.2104
2025-08-30 00:39:56 - pico-train - INFO - ├── Learning Rate: 1.48e-05
2025-08-30 00:39:56 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-30 00:40:08 - pico-train - INFO - Step 28300 -- 🔄 Training Metrics
2025-08-30 00:40:08 - pico-train - INFO - ├── Loss: 6.2633
2025-08-30 00:40:08 - pico-train - INFO - ├── Learning Rate: 1.48e-05
2025-08-30 00:40:08 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-30 00:40:21 - pico-train - INFO - Step 28325 -- 🔄 Training Metrics
2025-08-30 00:40:21 - pico-train - INFO - ├── Loss: 6.1844
2025-08-30 00:40:21 - pico-train - INFO - ├── Learning Rate: 1.47e-05
2025-08-30 00:40:21 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-30 00:40:34 - pico-train - INFO - Step 28350 -- 🔄 Training Metrics
2025-08-30 00:40:34 - pico-train - INFO - ├── Loss: 6.1349
2025-08-30 00:40:34 - pico-train - INFO - ├── Learning Rate: 1.46e-05
2025-08-30 00:40:34 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-30 00:40:46 - pico-train - INFO - Step 28375 -- 🔄 Training Metrics
2025-08-30 00:40:46 - pico-train - INFO - ├── Loss: 6.2638
2025-08-30 00:40:46 - pico-train - INFO - ├── Learning Rate: 1.46e-05
2025-08-30 00:40:46 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-30 00:40:59 - pico-train - INFO - Step 28400 -- 🔄 Training Metrics
2025-08-30 00:40:59 - pico-train - INFO - ├── Loss: 6.1960
2025-08-30 00:40:59 - pico-train - INFO - ├── Learning Rate: 1.45e-05
2025-08-30 00:40:59 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-30 00:41:11 - pico-train - INFO - Step 28425 -- 🔄 Training Metrics
2025-08-30 00:41:11 - pico-train - INFO - ├── Loss: 6.2582
2025-08-30 00:41:11 - pico-train - INFO - ├── Learning Rate: 1.45e-05
2025-08-30 00:41:11 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-30 00:41:24 - pico-train - INFO - Step 28450 -- 🔄 Training Metrics
2025-08-30 00:41:24 - pico-train - INFO - ├── Loss: 6.2071
2025-08-30 00:41:24 - pico-train - INFO - ├── Learning Rate: 1.44e-05
2025-08-30 00:41:24 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-30 00:41:37 - pico-train - INFO - Step 28475 -- 🔄 Training Metrics
2025-08-30 00:41:37 - pico-train - INFO - ├── Loss: 6.2106
2025-08-30 00:41:37 - pico-train - INFO - ├── Learning Rate: 1.44e-05
2025-08-30 00:41:37 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-30 00:41:49 - pico-train - INFO - Step 28500 -- 💾 Saving Checkpoint
2025-08-30 00:43:48 - pico-train - INFO - Step 28500 -- 📊 Evaluation Results
2025-08-30 00:43:48 - pico-train - INFO - └── paloma: 1.3653691992013197e+26
2025-08-30 00:43:51 - pico-train - INFO - Step 28500 -- 🔄 Training Metrics
2025-08-30 00:43:51 - pico-train - INFO - ├── Loss: 6.2141
2025-08-30 00:43:51 - pico-train - INFO - ├── Learning Rate: 1.43e-05
2025-08-30 00:43:51 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-30 00:43:51 - pico-train - INFO - Step 28500 -- 📈 Saving Learning Dynamics
2025-08-30 00:44:06 - pico-train - INFO - Step 28525 -- 🔄 Training Metrics
2025-08-30 00:44:06 - pico-train - INFO - ├── Loss: 6.1702
2025-08-30 00:44:06 - pico-train - INFO - ├── Learning Rate: 1.43e-05
2025-08-30 00:44:06 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-30 00:44:19 - pico-train - INFO - Step 28550 -- 🔄 Training Metrics
2025-08-30 00:44:19 - pico-train - INFO - ├── Loss: 6.1650
2025-08-30 00:44:19 - pico-train - INFO - ├── Learning Rate: 1.42e-05
2025-08-30 00:44:19 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-30 00:44:31 - pico-train - INFO - Step 28575 -- 🔄 Training Metrics
2025-08-30 00:44:31 - pico-train - INFO - ├── Loss: 6.1357
2025-08-30 00:44:31 - pico-train - INFO - ├── Learning Rate: 1.41e-05
2025-08-30 00:44:31 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-30 00:44:44 - pico-train - INFO - Step 28600 -- 🔄 Training Metrics
2025-08-30 00:44:44 - pico-train - INFO - ├── Loss: 6.2757
2025-08-30 00:44:44 - pico-train - INFO - ├── Learning Rate: 1.41e-05
2025-08-30 00:44:44 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-30 00:44:57 - pico-train - INFO - Step 28625 -- 🔄 Training Metrics
2025-08-30 00:44:57 - pico-train - INFO - ├── Loss: 6.1983
2025-08-30 00:44:57 - pico-train - INFO - ├── Learning Rate: 1.40e-05
2025-08-30 00:44:57 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-30 00:45:09 - pico-train - INFO - Step 28650 -- 🔄 Training Metrics
2025-08-30 00:45:09 - pico-train - INFO - ├── Loss: 6.1417
2025-08-30 00:45:09 - pico-train - INFO - ├── Learning Rate: 1.40e-05
2025-08-30 00:45:09 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-30 00:45:22 - pico-train - INFO - Step 28675 -- 🔄 Training Metrics
2025-08-30 00:45:22 - pico-train - INFO - ├── Loss: 6.1524
2025-08-30 00:45:22 - pico-train - INFO - ├── Learning Rate: 1.39e-05
2025-08-30 00:45:22 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-30 00:45:34 - pico-train - INFO - Step 28700 -- 🔄 Training Metrics
2025-08-30 00:45:34 - pico-train - INFO - ├── Loss: 6.2928
2025-08-30 00:45:34 - pico-train - INFO - ├── Learning Rate: 1.39e-05
2025-08-30 00:45:34 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-30 00:45:47 - pico-train - INFO - Step 28725 -- 🔄 Training Metrics
2025-08-30 00:45:47 - pico-train - INFO - ├── Loss: 6.1187
2025-08-30 00:45:47 - pico-train - INFO - ├── Learning Rate: 1.38e-05
2025-08-30 00:45:47 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-30 00:46:00 - pico-train - INFO - Step 28750 -- 🔄 Training Metrics
2025-08-30 00:46:00 - pico-train - INFO - ├── Loss: 6.1926
2025-08-30 00:46:00 - pico-train - INFO - ├── Learning Rate: 1.38e-05
2025-08-30 00:46:00 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-30 00:46:12 - pico-train - INFO - Step 28775 -- 🔄 Training Metrics
2025-08-30 00:46:12 - pico-train - INFO - ├── Loss: 6.1810
2025-08-30 00:46:12 - pico-train - INFO - ├── Learning Rate: 1.37e-05
2025-08-30 00:46:12 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-30 00:46:25 - pico-train - INFO - Step 28800 -- 🔄 Training Metrics
2025-08-30 00:46:25 - pico-train - INFO - ├── Loss: 6.1615
2025-08-30 00:46:25 - pico-train - INFO - ├── Learning Rate: 1.37e-05
2025-08-30 00:46:25 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-30 00:46:37 - pico-train - INFO - Step 28825 -- 🔄 Training Metrics
2025-08-30 00:46:37 - pico-train - INFO - ├── Loss: 6.1871
2025-08-30 00:46:37 - pico-train - INFO - ├── Learning Rate: 1.36e-05
2025-08-30 00:46:37 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-30 00:46:50 - pico-train - INFO - Step 28850 -- 🔄 Training Metrics
2025-08-30 00:46:50 - pico-train - INFO - ├── Loss: 6.1287
2025-08-30 00:46:50 - pico-train - INFO - ├── Learning Rate: 1.35e-05
2025-08-30 00:46:50 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-30 00:47:02 - pico-train - INFO - Step 28875 -- 🔄 Training Metrics
2025-08-30 00:47:02 - pico-train - INFO - ├── Loss: 6.1008
2025-08-30 00:47:02 - pico-train - INFO - ├── Learning Rate: 1.35e-05
2025-08-30 00:47:02 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-30 00:47:15 - pico-train - INFO - Step 28900 -- 🔄 Training Metrics
2025-08-30 00:47:15 - pico-train - INFO - ├── Loss: 6.2167
2025-08-30 00:47:15 - pico-train - INFO - ├── Learning Rate: 1.34e-05
2025-08-30 00:47:15 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-30 00:47:28 - pico-train - INFO - Step 28925 -- 🔄 Training Metrics
2025-08-30 00:47:28 - pico-train - INFO - ├── Loss: 6.1657
2025-08-30 00:47:28 - pico-train - INFO - ├── Learning Rate: 1.34e-05
2025-08-30 00:47:28 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-30 00:47:40 - pico-train - INFO - Step 28950 -- 🔄 Training Metrics
2025-08-30 00:47:40 - pico-train - INFO - ├── Loss: 6.2003
2025-08-30 00:47:40 - pico-train - INFO - ├── Learning Rate: 1.33e-05
2025-08-30 00:47:40 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-30 00:47:53 - pico-train - INFO - Step 28975 -- 🔄 Training Metrics
2025-08-30 00:47:53 - pico-train - INFO - ├── Loss: 6.2189
2025-08-30 00:47:53 - pico-train - INFO - ├── Learning Rate: 1.33e-05
2025-08-30 00:47:53 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-30 00:48:05 - pico-train - INFO - Step 29000 -- 💾 Saving Checkpoint
2025-08-30 00:50:04 - pico-train - INFO - Step 29000 -- 📊 Evaluation Results
2025-08-30 00:50:04 - pico-train - INFO - └── paloma: 1.4417132887690374e+26
2025-08-30 00:50:06 - pico-train - INFO - Step 29000 -- 🔄 Training Metrics
2025-08-30 00:50:06 - pico-train - INFO - ├── Loss: 6.1592
2025-08-30 00:50:06 - pico-train - INFO - ├── Learning Rate: 1.32e-05
2025-08-30 00:50:06 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-30 00:50:06 - pico-train - INFO - Step 29000 -- 📈 Saving Learning Dynamics
2025-08-30 00:50:22 - pico-train - INFO - Step 29025 -- 🔄 Training Metrics
2025-08-30 00:50:22 - pico-train - INFO - ├── Loss: 6.2133
2025-08-30 00:50:22 - pico-train - INFO - ├── Learning Rate: 1.32e-05
2025-08-30 00:50:22 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-30 00:50:35 - pico-train - INFO - Step 29050 -- 🔄 Training Metrics
2025-08-30 00:50:35 - pico-train - INFO - ├── Loss: 6.1536
2025-08-30 00:50:35 - pico-train - INFO - ├── Learning Rate: 1.31e-05
2025-08-30 00:50:35 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-30 00:50:47 - pico-train - INFO - Step 29075 -- 🔄 Training Metrics
2025-08-30 00:50:47 - pico-train - INFO - ├── Loss: 6.1872
2025-08-30 00:50:47 - pico-train - INFO - ├── Learning Rate: 1.31e-05
2025-08-30 00:50:47 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-30 00:51:00 - pico-train - INFO - Step 29100 -- 🔄 Training Metrics
2025-08-30 00:51:00 - pico-train - INFO - ├── Loss: 6.1469
2025-08-30 00:51:00 - pico-train - INFO - ├── Learning Rate: 1.30e-05
2025-08-30 00:51:00 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-30 00:51:13 - pico-train - INFO - Step 29125 -- 🔄 Training Metrics
2025-08-30 00:51:13 - pico-train - INFO - ├── Loss: 6.2113
2025-08-30 00:51:13 - pico-train - INFO - ├── Learning Rate: 1.29e-05
2025-08-30 00:51:13 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-30 00:51:26 - pico-train - INFO - Step 29150 -- 🔄 Training Metrics
2025-08-30 00:51:26 - pico-train - INFO - ├── Loss: 6.1172
2025-08-30 00:51:26 - pico-train - INFO - ├── Learning Rate: 1.29e-05
2025-08-30 00:51:26 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-30 00:51:38 - pico-train - INFO - Step 29175 -- 🔄 Training Metrics
2025-08-30 00:51:38 - pico-train - INFO - ├── Loss: 6.1350
2025-08-30 00:51:38 - pico-train - INFO - ├── Learning Rate: 1.28e-05
2025-08-30 00:51:38 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-30 00:51:51 - pico-train - INFO - Step 29200 -- 🔄 Training Metrics
2025-08-30 00:51:51 - pico-train - INFO - ├── Loss: 6.2083
2025-08-30 00:51:51 - pico-train - INFO - ├── Learning Rate: 1.28e-05
2025-08-30 00:51:51 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-30 00:52:03 - pico-train - INFO - Step 29225 -- 🔄 Training Metrics
2025-08-30 00:52:03 - pico-train - INFO - ├── Loss: 6.3192
2025-08-30 00:52:03 - pico-train - INFO - ├── Learning Rate: 1.27e-05
2025-08-30 00:52:03 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-30 00:52:16 - pico-train - INFO - Step 29250 -- 🔄 Training Metrics
2025-08-30 00:52:16 - pico-train - INFO - ├── Loss: 6.1807
2025-08-30 00:52:16 - pico-train - INFO - ├── Learning Rate: 1.27e-05
2025-08-30 00:52:16 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-30 00:52:29 - pico-train - INFO - Step 29275 -- 🔄 Training Metrics
2025-08-30 00:52:29 - pico-train - INFO - ├── Loss: 6.1737
2025-08-30 00:52:29 - pico-train - INFO - ├── Learning Rate: 1.26e-05
2025-08-30 00:52:29 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-30 00:52:41 - pico-train - INFO - Step 29300 -- 🔄 Training Metrics
2025-08-30 00:52:41 - pico-train - INFO - ├── Loss: 6.0887
2025-08-30 00:52:41 - pico-train - INFO - ├── Learning Rate: 1.26e-05
2025-08-30 00:52:41 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-30 00:52:54 - pico-train - INFO - Step 29325 -- 🔄 Training Metrics
2025-08-30 00:52:54 - pico-train - INFO - ├── Loss: 6.2875
2025-08-30 00:52:54 - pico-train - INFO - ├── Learning Rate: 1.25e-05
2025-08-30 00:52:54 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-30 00:53:06 - pico-train - INFO - Step 29350 -- 🔄 Training Metrics
2025-08-30 00:53:06 - pico-train - INFO - ├── Loss: 6.2426
2025-08-30 00:53:06 - pico-train - INFO - ├── Learning Rate: 1.25e-05
2025-08-30 00:53:06 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-30 00:53:19 - pico-train - INFO - Step 29375 -- 🔄 Training Metrics
2025-08-30 00:53:19 - pico-train - INFO - ├── Loss: 6.1058
2025-08-30 00:53:19 - pico-train - INFO - ├── Learning Rate: 1.24e-05
2025-08-30 00:53:19 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-30 00:53:32 - pico-train - INFO - Step 29400 -- 🔄 Training Metrics
2025-08-30 00:53:32 - pico-train - INFO - ├── Loss: 6.1215
2025-08-30 00:53:32 - pico-train - INFO - ├── Learning Rate: 1.24e-05
2025-08-30 00:53:32 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-30 00:53:44 - pico-train - INFO - Step 29425 -- 🔄 Training Metrics
2025-08-30 00:53:44 - pico-train - INFO - ├── Loss: 6.2543
2025-08-30 00:53:44 - pico-train - INFO - ├── Learning Rate: 1.23e-05
2025-08-30 00:53:44 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-30 00:53:57 - pico-train - INFO - Step 29450 -- 🔄 Training Metrics
2025-08-30 00:53:57 - pico-train - INFO - ├── Loss: 6.1715
2025-08-30 00:53:57 - pico-train - INFO - ├── Learning Rate: 1.23e-05
2025-08-30 00:53:57 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-30 00:54:09 - pico-train - INFO - Step 29475 -- 🔄 Training Metrics
2025-08-30 00:54:09 - pico-train - INFO - ├── Loss: 6.1795
2025-08-30 00:54:09 - pico-train - INFO - ├── Learning Rate: 1.22e-05
2025-08-30 00:54:09 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-30 00:54:21 - pico-train - INFO - Step 29500 -- 💾 Saving Checkpoint
2025-08-30 00:56:18 - pico-train - INFO - Step 29500 -- 📊 Evaluation Results
2025-08-30 00:56:18 - pico-train - INFO - └── paloma: 1.7095266725777237e+26
2025-08-30 00:56:21 - pico-train - INFO - Step 29500 -- 🔄 Training Metrics
2025-08-30 00:56:21 - pico-train - INFO - ├── Loss: 6.1663
2025-08-30 00:56:21 - pico-train - INFO - ├── Learning Rate: 1.21e-05
2025-08-30 00:56:21 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-30 00:56:21 - pico-train - INFO - Step 29500 -- 📈 Saving Learning Dynamics
2025-08-30 00:56:36 - pico-train - INFO - Step 29525 -- 🔄 Training Metrics
2025-08-30 00:56:36 - pico-train - INFO - ├── Loss: 6.1521
2025-08-30 00:56:36 - pico-train - INFO - ├── Learning Rate: 1.21e-05
2025-08-30 00:56:36 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-30 00:56:48 - pico-train - INFO - Step 29550 -- 🔄 Training Metrics
2025-08-30 00:56:48 - pico-train - INFO - ├── Loss: 6.0880
2025-08-30 00:56:48 - pico-train - INFO - ├── Learning Rate: 1.20e-05
2025-08-30 00:56:48 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-30 00:57:01 - pico-train - INFO - Step 29575 -- 🔄 Training Metrics
2025-08-30 00:57:01 - pico-train - INFO - ├── Loss: 6.1806
2025-08-30 00:57:01 - pico-train - INFO - ├── Learning Rate: 1.20e-05
2025-08-30 00:57:01 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-30 00:57:13 - pico-train - INFO - Step 29600 -- 🔄 Training Metrics
2025-08-30 00:57:13 - pico-train - INFO - ├── Loss: 6.3067
2025-08-30 00:57:13 - pico-train - INFO - ├── Learning Rate: 1.19e-05
2025-08-30 00:57:13 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-30 00:57:27 - pico-train - INFO - Step 29625 -- 🔄 Training Metrics
2025-08-30 00:57:27 - pico-train - INFO - ├── Loss: 6.2586
2025-08-30 00:57:27 - pico-train - INFO - ├── Learning Rate: 1.19e-05
2025-08-30 00:57:27 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-30 00:57:40 - pico-train - INFO - Step 29650 -- 🔄 Training Metrics
2025-08-30 00:57:40 - pico-train - INFO - ├── Loss: 6.1478
2025-08-30 00:57:40 - pico-train - INFO - ├── Learning Rate: 1.18e-05
2025-08-30 00:57:40 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-30 00:57:52 - pico-train - INFO - Step 29675 -- 🔄 Training Metrics
2025-08-30 00:57:52 - pico-train - INFO - ├── Loss: 6.1101
2025-08-30 00:57:52 - pico-train - INFO - ├── Learning Rate: 1.18e-05
2025-08-30 00:57:52 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-30 00:58:05 - pico-train - INFO - Step 29700 -- 🔄 Training Metrics
2025-08-30 00:58:05 - pico-train - INFO - ├── Loss: 6.1873
2025-08-30 00:58:05 - pico-train - INFO - ├── Learning Rate: 1.17e-05
2025-08-30 00:58:05 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-30 00:58:17 - pico-train - INFO - Step 29725 -- 🔄 Training Metrics
2025-08-30 00:58:17 - pico-train - INFO - ├── Loss: 6.0894
2025-08-30 00:58:17 - pico-train - INFO - ├── Learning Rate: 1.17e-05
2025-08-30 00:58:17 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-30 00:58:30 - pico-train - INFO - Step 29750 -- 🔄 Training Metrics
2025-08-30 00:58:30 - pico-train - INFO - ├── Loss: 6.1793
2025-08-30 00:58:30 - pico-train - INFO - ├── Learning Rate: 1.16e-05
2025-08-30 00:58:30 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-30 00:58:42 - pico-train - INFO - Step 29775 -- 🔄 Training Metrics
2025-08-30 00:58:42 - pico-train - INFO - ├── Loss: 6.1858
2025-08-30 00:58:42 - pico-train - INFO - ├── Learning Rate: 1.16e-05
2025-08-30 00:58:42 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-30 00:58:55 - pico-train - INFO - Step 29800 -- 🔄 Training Metrics
2025-08-30 00:58:55 - pico-train - INFO - ├── Loss: 6.1729
2025-08-30 00:58:55 - pico-train - INFO - ├── Learning Rate: 1.15e-05
2025-08-30 00:58:55 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-30 00:59:07 - pico-train - INFO - Step 29825 -- 🔄 Training Metrics
2025-08-30 00:59:07 - pico-train - INFO - ├── Loss: 6.1856
2025-08-30 00:59:07 - pico-train - INFO - ├── Learning Rate: 1.15e-05
2025-08-30 00:59:07 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-30 00:59:20 - pico-train - INFO - Step 29850 -- 🔄 Training Metrics
2025-08-30 00:59:20 - pico-train - INFO - ├── Loss: 6.1591
2025-08-30 00:59:20 - pico-train - INFO - ├── Learning Rate: 1.14e-05
2025-08-30 00:59:20 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-30 00:59:33 - pico-train - INFO - Step 29875 -- 🔄 Training Metrics
2025-08-30 00:59:33 - pico-train - INFO - ├── Loss: 6.2964
2025-08-30 00:59:33 - pico-train - INFO - ├── Learning Rate: 1.14e-05
2025-08-30 00:59:33 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-30 00:59:45 - pico-train - INFO - Step 29900 -- 🔄 Training Metrics
2025-08-30 00:59:45 - pico-train - INFO - ├── Loss: 6.2506
2025-08-30 00:59:45 - pico-train - INFO - ├── Learning Rate: 1.13e-05
2025-08-30 00:59:45 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-30 00:59:58 - pico-train - INFO - Step 29925 -- 🔄 Training Metrics
2025-08-30 00:59:58 - pico-train - INFO - ├── Loss: 6.1630
2025-08-30 00:59:58 - pico-train - INFO - ├── Learning Rate: 1.13e-05
2025-08-30 00:59:58 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-30 01:00:11 - pico-train - INFO - Step 29950 -- 🔄 Training Metrics
2025-08-30 01:00:11 - pico-train - INFO - ├── Loss: 6.2033
2025-08-30 01:00:11 - pico-train - INFO - ├── Learning Rate: 1.12e-05
2025-08-30 01:00:11 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-30 01:00:23 - pico-train - INFO - Step 29975 -- 🔄 Training Metrics
2025-08-30 01:00:23 - pico-train - INFO - ├── Loss: 6.0846
2025-08-30 01:00:23 - pico-train - INFO - ├── Learning Rate: 1.12e-05
2025-08-30 01:00:23 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-30 01:00:35 - pico-train - INFO - Step 30000 -- 💾 Saving Checkpoint
2025-08-30 01:02:29 - pico-train - INFO - Step 30000 -- 📊 Evaluation Results
2025-08-30 01:02:29 - pico-train - INFO - └── paloma: 2.0463060977945524e+26
2025-08-30 01:02:31 - pico-train - INFO - Step 30000 -- 🔄 Training Metrics
2025-08-30 01:02:31 - pico-train - INFO - ├── Loss: 6.1682
2025-08-30 01:02:31 - pico-train - INFO - ├── Learning Rate: 1.11e-05
2025-08-30 01:02:31 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-30 01:02:31 - pico-train - INFO - Step 30000 -- 📈 Saving Learning Dynamics
2025-08-30 01:02:46 - pico-train - INFO - Step 30025 -- 🔄 Training Metrics
2025-08-30 01:02:46 - pico-train - INFO - ├── Loss: 6.2143
2025-08-30 01:02:46 - pico-train - INFO - ├── Learning Rate: 1.11e-05
2025-08-30 01:02:46 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-30 01:02:58 - pico-train - INFO - Step 30050 -- 🔄 Training Metrics
2025-08-30 01:02:58 - pico-train - INFO - ├── Loss: 6.1476
2025-08-30 01:02:58 - pico-train - INFO - ├── Learning Rate: 1.10e-05
2025-08-30 01:02:58 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-30 01:03:11 - pico-train - INFO - Step 30075 -- 🔄 Training Metrics
2025-08-30 01:03:11 - pico-train - INFO - ├── Loss: 6.1530
2025-08-30 01:03:11 - pico-train - INFO - ├── Learning Rate: 1.10e-05
2025-08-30 01:03:11 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-30 01:03:23 - pico-train - INFO - Step 30100 -- 🔄 Training Metrics
2025-08-30 01:03:23 - pico-train - INFO - ├── Loss: 6.1518
2025-08-30 01:03:23 - pico-train - INFO - ├── Learning Rate: 1.09e-05
2025-08-30 01:03:23 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-30 01:03:37 - pico-train - INFO - Step 30125 -- 🔄 Training Metrics
2025-08-30 01:03:37 - pico-train - INFO - ├── Loss: 6.1752
2025-08-30 01:03:37 - pico-train - INFO - ├── Learning Rate: 1.09e-05
2025-08-30 01:03:37 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-30 01:03:49 - pico-train - INFO - Step 30150 -- 🔄 Training Metrics
2025-08-30 01:03:49 - pico-train - INFO - ├── Loss: 6.2413
2025-08-30 01:03:49 - pico-train - INFO - ├── Learning Rate: 1.08e-05
2025-08-30 01:03:49 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-30 01:04:02 - pico-train - INFO - Step 30175 -- 🔄 Training Metrics
2025-08-30 01:04:02 - pico-train - INFO - ├── Loss: 6.2624
2025-08-30 01:04:02 - pico-train - INFO - ├── Learning Rate: 1.08e-05
2025-08-30 01:04:02 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-30 01:04:14 - pico-train - INFO - Step 30200 -- 🔄 Training Metrics
2025-08-30 01:04:14 - pico-train - INFO - ├── Loss: 6.2339
2025-08-30 01:04:14 - pico-train - INFO - ├── Learning Rate: 1.07e-05
2025-08-30 01:04:14 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-30 01:04:27 - pico-train - INFO - Step 30225 -- 🔄 Training Metrics
2025-08-30 01:04:27 - pico-train - INFO - ├── Loss: 6.1617
2025-08-30 01:04:27 - pico-train - INFO - ├── Learning Rate: 1.07e-05
2025-08-30 01:04:27 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-30 01:04:40 - pico-train - INFO - Step 30250 -- 🔄 Training Metrics
2025-08-30 01:04:40 - pico-train - INFO - ├── Loss: 6.1225
2025-08-30 01:04:40 - pico-train - INFO - ├── Learning Rate: 1.06e-05
2025-08-30 01:04:40 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-30 01:04:52 - pico-train - INFO - Step 30275 -- 🔄 Training Metrics
2025-08-30 01:04:52 - pico-train - INFO - ├── Loss: 6.2344
2025-08-30 01:04:52 - pico-train - INFO - ├── Learning Rate: 1.06e-05
2025-08-30 01:04:52 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-30 01:05:05 - pico-train - INFO - Step 30300 -- 🔄 Training Metrics
2025-08-30 01:05:05 - pico-train - INFO - ├── Loss: 6.1970
2025-08-30 01:05:05 - pico-train - INFO - ├── Learning Rate: 1.05e-05
2025-08-30 01:05:05 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-30 01:05:18 - pico-train - INFO - Step 30325 -- 🔄 Training Metrics
2025-08-30 01:05:18 - pico-train - INFO - ├── Loss: 6.1580
2025-08-30 01:05:18 - pico-train - INFO - ├── Learning Rate: 1.05e-05
2025-08-30 01:05:18 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-30 01:05:30 - pico-train - INFO - Step 30350 -- 🔄 Training Metrics
2025-08-30 01:05:30 - pico-train - INFO - ├── Loss: 6.2210
2025-08-30 01:05:30 - pico-train - INFO - ├── Learning Rate: 1.04e-05
2025-08-30 01:05:30 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-30 01:05:43 - pico-train - INFO - Step 30375 -- 🔄 Training Metrics
2025-08-30 01:05:43 - pico-train - INFO - ├── Loss: 6.1991
2025-08-30 01:05:43 - pico-train - INFO - ├── Learning Rate: 1.04e-05
2025-08-30 01:05:43 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-30 01:05:56 - pico-train - INFO - Step 30400 -- 🔄 Training Metrics
2025-08-30 01:05:56 - pico-train - INFO - ├── Loss: 6.2500
2025-08-30 01:05:56 - pico-train - INFO - ├── Learning Rate: 1.03e-05
2025-08-30 01:05:56 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-30 01:06:08 - pico-train - INFO - Step 30425 -- 🔄 Training Metrics
2025-08-30 01:06:08 - pico-train - INFO - ├── Loss: 6.2252
2025-08-30 01:06:08 - pico-train - INFO - ├── Learning Rate: 1.03e-05
2025-08-30 01:06:08 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-30 01:06:21 - pico-train - INFO - Step 30450 -- 🔄 Training Metrics
2025-08-30 01:06:21 - pico-train - INFO - ├── Loss: 6.2010
2025-08-30 01:06:21 - pico-train - INFO - ├── Learning Rate: 1.02e-05
2025-08-30 01:06:21 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-30 01:06:33 - pico-train - INFO - Step 30475 -- 🔄 Training Metrics
2025-08-30 01:06:33 - pico-train - INFO - ├── Loss: 6.1309
2025-08-30 01:06:33 - pico-train - INFO - ├── Learning Rate: 1.02e-05
2025-08-30 01:06:33 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-30 01:06:46 - pico-train - INFO - Step 30500 -- 💾 Saving Checkpoint
2025-08-30 01:08:46 - pico-train - INFO - Step 30500 -- 📊 Evaluation Results
2025-08-30 01:08:46 - pico-train - INFO - └── paloma: 2.2542988490213366e+26
2025-08-30 01:08:49 - pico-train - INFO - Step 30500 -- 🔄 Training Metrics
2025-08-30 01:08:49 - pico-train - INFO - ├── Loss: 6.1853
2025-08-30 01:08:49 - pico-train - INFO - ├── Learning Rate: 1.01e-05
2025-08-30 01:08:49 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-30 01:08:49 - pico-train - INFO - Step 30500 -- 📈 Saving Learning Dynamics
2025-08-30 01:09:04 - pico-train - INFO - Step 30525 -- 🔄 Training Metrics
2025-08-30 01:09:04 - pico-train - INFO - ├── Loss: 6.1358
2025-08-30 01:09:04 - pico-train - INFO - ├── Learning Rate: 1.01e-05
2025-08-30 01:09:04 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-30 01:09:17 - pico-train - INFO - Step 30550 -- 🔄 Training Metrics
2025-08-30 01:09:17 - pico-train - INFO - ├── Loss: 6.1170
2025-08-30 01:09:17 - pico-train - INFO - ├── Learning Rate: 1.00e-05
2025-08-30 01:09:17 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-30 01:09:29 - pico-train - INFO - Step 30575 -- 🔄 Training Metrics
2025-08-30 01:09:29 - pico-train - INFO - ├── Loss: 6.1497
2025-08-30 01:09:29 - pico-train - INFO - ├── Learning Rate: 9.96e-06
2025-08-30 01:09:29 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-30 01:09:42 - pico-train - INFO - Step 30600 -- 🔄 Training Metrics
2025-08-30 01:09:42 - pico-train - INFO - ├── Loss: 6.2103
2025-08-30 01:09:42 - pico-train - INFO - ├── Learning Rate: 9.91e-06
2025-08-30 01:09:42 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-30 01:09:54 - pico-train - INFO - Step 30625 -- 🔄 Training Metrics
2025-08-30 01:09:54 - pico-train - INFO - ├── Loss: 6.1137
2025-08-30 01:09:54 - pico-train - INFO - ├── Learning Rate: 9.86e-06
2025-08-30 01:09:54 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-30 01:10:07 - pico-train - INFO - Step 30650 -- 🔄 Training Metrics
2025-08-30 01:10:07 - pico-train - INFO - ├── Loss: 6.1631
2025-08-30 01:10:07 - pico-train - INFO - ├── Learning Rate: 9.81e-06
2025-08-30 01:10:07 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-30 01:10:19 - pico-train - INFO - Step 30675 -- 🔄 Training Metrics
2025-08-30 01:10:19 - pico-train - INFO - ├── Loss: 6.1651
2025-08-30 01:10:19 - pico-train - INFO - ├── Learning Rate: 9.76e-06
2025-08-30 01:10:19 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-30 01:10:32 - pico-train - INFO - Step 30700 -- 🔄 Training Metrics
2025-08-30 01:10:32 - pico-train - INFO - ├── Loss: 6.1969
2025-08-30 01:10:32 - pico-train - INFO - ├── Learning Rate: 9.72e-06
2025-08-30 01:10:32 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-30 01:10:45 - pico-train - INFO - Step 30725 -- 🔄 Training Metrics
2025-08-30 01:10:45 - pico-train - INFO - ├── Loss: 6.1007
2025-08-30 01:10:45 - pico-train - INFO - ├── Learning Rate: 9.67e-06
2025-08-30 01:10:45 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-30 01:10:58 - pico-train - INFO - Step 30750 -- 🔄 Training Metrics
2025-08-30 01:10:58 - pico-train - INFO - ├── Loss: 6.1865
2025-08-30 01:10:58 - pico-train - INFO - ├── Learning Rate: 9.62e-06
2025-08-30 01:10:58 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-30 01:11:10 - pico-train - INFO - Step 30775 -- 🔄 Training Metrics
2025-08-30 01:11:10 - pico-train - INFO - ├── Loss: 6.1659
2025-08-30 01:11:10 - pico-train - INFO - ├── Learning Rate: 9.57e-06
2025-08-30 01:11:10 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-30 01:11:23 - pico-train - INFO - Step 30800 -- 🔄 Training Metrics
2025-08-30 01:11:23 - pico-train - INFO - ├── Loss: 6.2281
2025-08-30 01:11:23 - pico-train - INFO - ├── Learning Rate: 9.52e-06
2025-08-30 01:11:23 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-30 01:11:36 - pico-train - INFO - Step 30825 -- 🔄 Training Metrics
2025-08-30 01:11:36 - pico-train - INFO - ├── Loss: 6.1316
2025-08-30 01:11:36 - pico-train - INFO - ├── Learning Rate: 9.47e-06
2025-08-30 01:11:36 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-30 01:11:48 - pico-train - INFO - Step 30850 -- 🔄 Training Metrics
2025-08-30 01:11:48 - pico-train - INFO - ├── Loss: 6.2135
2025-08-30 01:11:48 - pico-train - INFO - ├── Learning Rate: 9.43e-06
2025-08-30 01:11:48 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-30 01:12:01 - pico-train - INFO - Step 30875 -- 🔄 Training Metrics
2025-08-30 01:12:01 - pico-train - INFO - ├── Loss: 6.2395
2025-08-30 01:12:01 - pico-train - INFO - ├── Learning Rate: 9.38e-06
2025-08-30 01:12:01 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-30 01:12:13 - pico-train - INFO - Step 30900 -- 🔄 Training Metrics
2025-08-30 01:12:13 - pico-train - INFO - ├── Loss: 6.2277
2025-08-30 01:12:13 - pico-train - INFO - ├── Learning Rate: 9.33e-06
2025-08-30 01:12:13 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-30 01:12:26 - pico-train - INFO - Step 30925 -- 🔄 Training Metrics
2025-08-30 01:12:26 - pico-train - INFO - ├── Loss: 6.1863
2025-08-30 01:12:26 - pico-train - INFO - ├── Learning Rate: 9.28e-06
2025-08-30 01:12:26 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-30 01:12:39 - pico-train - INFO - Step 30950 -- 🔄 Training Metrics
2025-08-30 01:12:39 - pico-train - INFO - ├── Loss: 6.2133
2025-08-30 01:12:39 - pico-train - INFO - ├── Learning Rate: 9.24e-06
2025-08-30 01:12:39 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-30 01:12:51 - pico-train - INFO - Step 30975 -- 🔄 Training Metrics
2025-08-30 01:12:51 - pico-train - INFO - ├── Loss: 6.2132
2025-08-30 01:12:51 - pico-train - INFO - ├── Learning Rate: 9.19e-06
2025-08-30 01:12:51 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-30 01:13:03 - pico-train - INFO - Step 31000 -- 💾 Saving Checkpoint
2025-08-30 01:14:57 - pico-train - INFO - Step 31000 -- 📊 Evaluation Results
2025-08-30 01:14:57 - pico-train - INFO - └── paloma: 2.4568970443260916e+26
2025-08-30 01:14:59 - pico-train - INFO - Step 31000 -- 🔄 Training Metrics
2025-08-30 01:14:59 - pico-train - INFO - ├── Loss: 6.1313
2025-08-30 01:14:59 - pico-train - INFO - ├── Learning Rate: 9.14e-06
2025-08-30 01:14:59 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-30 01:14:59 - pico-train - INFO - Step 31000 -- 📈 Saving Learning Dynamics
2025-08-30 01:15:15 - pico-train - INFO - Step 31025 -- 🔄 Training Metrics
2025-08-30 01:15:15 - pico-train - INFO - ├── Loss: 6.2095
2025-08-30 01:15:15 - pico-train - INFO - ├── Learning Rate: 9.09e-06
2025-08-30 01:15:15 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-30 01:15:27 - pico-train - INFO - Step 31050 -- 🔄 Training Metrics
2025-08-30 01:15:27 - pico-train - INFO - ├── Loss: 6.1753
2025-08-30 01:15:27 - pico-train - INFO - ├── Learning Rate: 9.05e-06
2025-08-30 01:15:27 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-30 01:15:40 - pico-train - INFO - Step 31075 -- 🔄 Training Metrics
2025-08-30 01:15:40 - pico-train - INFO - ├── Loss: 6.1722
2025-08-30 01:15:40 - pico-train - INFO - ├── Learning Rate: 9.00e-06
2025-08-30 01:15:40 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-30 01:15:53 - pico-train - INFO - Step 31100 -- 🔄 Training Metrics
2025-08-30 01:15:53 - pico-train - INFO - ├── Loss: 6.1917
2025-08-30 01:15:53 - pico-train - INFO - ├── Learning Rate: 8.95e-06
2025-08-30 01:15:53 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-30 01:16:05 - pico-train - INFO - Step 31125 -- 🔄 Training Metrics
2025-08-30 01:16:05 - pico-train - INFO - ├── Loss: 6.1442
2025-08-30 01:16:05 - pico-train - INFO - ├── Learning Rate: 8.90e-06
2025-08-30 01:16:05 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-30 01:16:18 - pico-train - INFO - Step 31150 -- 🔄 Training Metrics
2025-08-30 01:16:18 - pico-train - INFO - ├── Loss: 6.2128
2025-08-30 01:16:18 - pico-train - INFO - ├── Learning Rate: 8.86e-06
2025-08-30 01:16:18 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-30 01:16:30 - pico-train - INFO - Step 31175 -- 🔄 Training Metrics
2025-08-30 01:16:30 - pico-train - INFO - ├── Loss: 6.1192
2025-08-30 01:16:30 - pico-train - INFO - ├── Learning Rate: 8.81e-06
2025-08-30 01:16:30 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-30 01:16:43 - pico-train - INFO - Step 31200 -- 🔄 Training Metrics
2025-08-30 01:16:43 - pico-train - INFO - ├── Loss: 6.1648
2025-08-30 01:16:43 - pico-train - INFO - ├── Learning Rate: 8.76e-06
2025-08-30 01:16:43 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-30 01:16:55 - pico-train - INFO - Step 31225 -- 🔄 Training Metrics
2025-08-30 01:16:55 - pico-train - INFO - ├── Loss: 6.2030
2025-08-30 01:16:55 - pico-train - INFO - ├── Learning Rate: 8.72e-06
2025-08-30 01:16:55 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-30 01:17:08 - pico-train - INFO - Step 31250 -- 🔄 Training Metrics
2025-08-30 01:17:08 - pico-train - INFO - ├── Loss: 6.1564
2025-08-30 01:17:08 - pico-train - INFO - ├── Learning Rate: 8.67e-06
2025-08-30 01:17:08 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-30 01:17:21 - pico-train - INFO - Step 31275 -- 🔄 Training Metrics
2025-08-30 01:17:21 - pico-train - INFO - ├── Loss: 6.2193
2025-08-30 01:17:21 - pico-train - INFO - ├── Learning Rate: 8.62e-06
2025-08-30 01:17:21 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-30 01:17:33 - pico-train - INFO - Step 31300 -- 🔄 Training Metrics
2025-08-30 01:17:33 - pico-train - INFO - ├── Loss: 6.1630
2025-08-30 01:17:33 - pico-train - INFO - ├── Learning Rate: 8.58e-06
2025-08-30 01:17:33 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-30 01:17:46 - pico-train - INFO - Step 31325 -- 🔄 Training Metrics
2025-08-30 01:17:46 - pico-train - INFO - ├── Loss: 6.1765
2025-08-30 01:17:46 - pico-train - INFO - ├── Learning Rate: 8.53e-06
2025-08-30 01:17:46 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-30 01:17:58 - pico-train - INFO - Step 31350 -- 🔄 Training Metrics
2025-08-30 01:17:58 - pico-train - INFO - ├── Loss: 6.2315
2025-08-30 01:17:58 - pico-train - INFO - ├── Learning Rate: 8.49e-06
2025-08-30 01:17:58 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-30 01:18:11 - pico-train - INFO - Step 31375 -- 🔄 Training Metrics
2025-08-30 01:18:11 - pico-train - INFO - ├── Loss: 6.1719
2025-08-30 01:18:11 - pico-train - INFO - ├── Learning Rate: 8.44e-06
2025-08-30 01:18:11 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-30 01:18:24 - pico-train - INFO - Step 31400 -- 🔄 Training Metrics
2025-08-30 01:18:24 - pico-train - INFO - ├── Loss: 6.2234
2025-08-30 01:18:24 - pico-train - INFO - ├── Learning Rate: 8.39e-06
2025-08-30 01:18:24 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-30 01:18:36 - pico-train - INFO - Step 31425 -- 🔄 Training Metrics
2025-08-30 01:18:36 - pico-train - INFO - ├── Loss: 6.1782
2025-08-30 01:18:36 - pico-train - INFO - ├── Learning Rate: 8.35e-06
2025-08-30 01:18:36 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-30 01:18:49 - pico-train - INFO - Step 31450 -- 🔄 Training Metrics
2025-08-30 01:18:49 - pico-train - INFO - ├── Loss: 6.1711
2025-08-30 01:18:49 - pico-train - INFO - ├── Learning Rate: 8.30e-06
2025-08-30 01:18:49 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-30 01:19:01 - pico-train - INFO - Step 31475 -- 🔄 Training Metrics
2025-08-30 01:19:01 - pico-train - INFO - ├── Loss: 6.1834
2025-08-30 01:19:01 - pico-train - INFO - ├── Learning Rate: 8.26e-06
2025-08-30 01:19:01 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-30 01:19:14 - pico-train - INFO - Step 31500 -- 💾 Saving Checkpoint
2025-08-30 01:21:14 - pico-train - INFO - Step 31500 -- 📊 Evaluation Results
2025-08-30 01:21:14 - pico-train - INFO - └── paloma: 2.8663430235000883e+26
2025-08-30 01:21:17 - pico-train - INFO - Step 31500 -- 🔄 Training Metrics
2025-08-30 01:21:17 - pico-train - INFO - ├── Loss: 6.1338
2025-08-30 01:21:17 - pico-train - INFO - ├── Learning Rate: 8.21e-06
2025-08-30 01:21:17 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-30 01:21:17 - pico-train - INFO - Step 31500 -- 📈 Saving Learning Dynamics
2025-08-30 01:21:33 - pico-train - INFO - Step 31525 -- 🔄 Training Metrics
2025-08-30 01:21:33 - pico-train - INFO - ├── Loss: 6.1819
2025-08-30 01:21:33 - pico-train - INFO - ├── Learning Rate: 8.17e-06
2025-08-30 01:21:33 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-30 01:21:46 - pico-train - INFO - Step 31550 -- 🔄 Training Metrics
2025-08-30 01:21:46 - pico-train - INFO - ├── Loss: 6.1695
2025-08-30 01:21:46 - pico-train - INFO - ├── Learning Rate: 8.12e-06
2025-08-30 01:21:46 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-30 01:21:58 - pico-train - INFO - Step 31575 -- 🔄 Training Metrics
2025-08-30 01:21:58 - pico-train - INFO - ├── Loss: 6.2089
2025-08-30 01:21:58 - pico-train - INFO - ├── Learning Rate: 8.08e-06
2025-08-30 01:21:58 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-30 01:22:11 - pico-train - INFO - Step 31600 -- 🔄 Training Metrics
2025-08-30 01:22:11 - pico-train - INFO - ├── Loss: 6.1555
2025-08-30 01:22:11 - pico-train - INFO - ├── Learning Rate: 8.03e-06
2025-08-30 01:22:11 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-30 01:22:24 - pico-train - INFO - Step 31625 -- 🔄 Training Metrics
2025-08-30 01:22:24 - pico-train - INFO - ├── Loss: 6.1820
2025-08-30 01:22:24 - pico-train - INFO - ├── Learning Rate: 7.98e-06
2025-08-30 01:22:24 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-30 01:22:36 - pico-train - INFO - Step 31650 -- 🔄 Training Metrics
2025-08-30 01:22:36 - pico-train - INFO - ├── Loss: 6.1091
2025-08-30 01:22:36 - pico-train - INFO - ├── Learning Rate: 7.94e-06
2025-08-30 01:22:36 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-30 01:22:49 - pico-train - INFO - Step 31675 -- 🔄 Training Metrics
2025-08-30 01:22:49 - pico-train - INFO - ├── Loss: 6.2098
2025-08-30 01:22:49 - pico-train - INFO - ├── Learning Rate: 7.90e-06
2025-08-30 01:22:49 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-30 01:23:01 - pico-train - INFO - Step 31700 -- 🔄 Training Metrics
2025-08-30 01:23:01 - pico-train - INFO - ├── Loss: 6.0611
2025-08-30 01:23:01 - pico-train - INFO - ├── Learning Rate: 7.85e-06
2025-08-30 01:23:01 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-30 01:23:14 - pico-train - INFO - Step 31725 -- 🔄 Training Metrics
2025-08-30 01:23:14 - pico-train - INFO - ├── Loss: 6.1088
2025-08-30 01:23:14 - pico-train - INFO - ├── Learning Rate: 7.81e-06
2025-08-30 01:23:14 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-30 01:23:27 - pico-train - INFO - Step 31750 -- 🔄 Training Metrics
2025-08-30 01:23:27 - pico-train - INFO - ├── Loss: 6.2220
2025-08-30 01:23:27 - pico-train - INFO - ├── Learning Rate: 7.76e-06
2025-08-30 01:23:27 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-30 01:23:39 - pico-train - INFO - Step 31775 -- 🔄 Training Metrics
2025-08-30 01:23:39 - pico-train - INFO - ├── Loss: 6.2271
2025-08-30 01:23:39 - pico-train - INFO - ├── Learning Rate: 7.72e-06
2025-08-30 01:23:39 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-30 01:23:52 - pico-train - INFO - Step 31800 -- 🔄 Training Metrics
2025-08-30 01:23:52 - pico-train - INFO - ├── Loss: 6.1465
2025-08-30 01:23:52 - pico-train - INFO - ├── Learning Rate: 7.67e-06
2025-08-30 01:23:52 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-30 01:24:04 - pico-train - INFO - Step 31825 -- 🔄 Training Metrics
2025-08-30 01:24:04 - pico-train - INFO - ├── Loss: 6.1742
2025-08-30 01:24:04 - pico-train - INFO - ├── Learning Rate: 7.63e-06
2025-08-30 01:24:04 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-30 01:24:17 - pico-train - INFO - Step 31850 -- 🔄 Training Metrics
2025-08-30 01:24:17 - pico-train - INFO - ├── Loss: 6.2199
2025-08-30 01:24:17 - pico-train - INFO - ├── Learning Rate: 7.58e-06
2025-08-30 01:24:17 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-30 01:24:30 - pico-train - INFO - Step 31875 -- 🔄 Training Metrics
2025-08-30 01:24:30 - pico-train - INFO - ├── Loss: 6.1934
2025-08-30 01:24:30 - pico-train - INFO - ├── Learning Rate: 7.54e-06
2025-08-30 01:24:30 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-30 01:24:42 - pico-train - INFO - Step 31900 -- 🔄 Training Metrics
2025-08-30 01:24:42 - pico-train - INFO - ├── Loss: 6.1503
2025-08-30 01:24:42 - pico-train - INFO - ├── Learning Rate: 7.50e-06
2025-08-30 01:24:42 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-30 01:24:55 - pico-train - INFO - Step 31925 -- 🔄 Training Metrics
2025-08-30 01:24:55 - pico-train - INFO - ├── Loss: 6.0399
2025-08-30 01:24:55 - pico-train - INFO - ├── Learning Rate: 7.45e-06
2025-08-30 01:24:55 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-30 01:25:07 - pico-train - INFO - Step 31950 -- 🔄 Training Metrics
2025-08-30 01:25:07 - pico-train - INFO - ├── Loss: 6.2147
2025-08-30 01:25:07 - pico-train - INFO - ├── Learning Rate: 7.41e-06
2025-08-30 01:25:07 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-30 01:25:20 - pico-train - INFO - Step 31975 -- 🔄 Training Metrics
2025-08-30 01:25:20 - pico-train - INFO - ├── Loss: 6.1952
2025-08-30 01:25:20 - pico-train - INFO - ├── Learning Rate: 7.37e-06
2025-08-30 01:25:20 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-30 01:25:32 - pico-train - INFO - Step 32000 -- 💾 Saving Checkpoint