2025-08-30 18:43:39 - pico-train - INFO - Step 78000 -- 📊 Evaluation Results
2025-08-30 18:43:39 - pico-train - INFO - └── paloma: inf
2025-08-30 18:43:39 - pico-train - INFO - ==================================================
2025-08-30 18:43:39 - pico-train - INFO - ✨ Training Configuration
2025-08-30 18:43:39 - pico-train - INFO - ==================================================
2025-08-30 18:43:39 - pico-train - INFO - ╭─────────────────────────────────────────────────────╮
2025-08-30 18:43:39 - pico-train - INFO - │ checkpointing:                                      │
2025-08-30 18:43:39 - pico-train - INFO - │   checkpoints_dir: checkpoints                      │
2025-08-30 18:43:39 - pico-train - INFO - │   evaluation:                                       │
2025-08-30 18:43:39 - pico-train - INFO - │     eval_results_dir: eval_results                  │
2025-08-30 18:43:39 - pico-train - INFO - │   fabric_checkpoint_dir: fabric_state               │
2025-08-30 18:43:39 - pico-train - INFO - │   fabric_checkpoint_filename: checkpoint.pt         │
2025-08-30 18:43:39 - pico-train - INFO - │   hf_checkpoint:                                    │
2025-08-30 18:43:39 - pico-train - INFO - │     collection_slug: null                           │
2025-08-30 18:43:39 - pico-train - INFO - │     repo_id: ThomasTheMaker/pico-decoder-tiny       │
2025-08-30 18:43:39 - pico-train - INFO - │   learning_dynamics:                                │
2025-08-30 18:43:39 - pico-train - INFO - │     batch_size: 1                                   │
2025-08-30 18:43:39 - pico-train - INFO - │     eval_data: null                                 │
2025-08-30 18:43:39 - pico-train - INFO - │     layer_suffixes:                                 │
2025-08-30 18:43:39 - pico-train - INFO - │     - attention.v_proj                              │
2025-08-30 18:43:39 - pico-train - INFO - │     - attention.o_proj                              │
2025-08-30 18:43:39 - pico-train - INFO - │     - swiglu.w_2                                    │
2025-08-30 18:43:39 - pico-train - INFO - │     sequence_idx: -1                                │
2025-08-30 18:43:39 - pico-train - INFO - │   learning_dynamics_dir: learning_dynamics          │
2025-08-30 18:43:39 - pico-train - INFO - │   logs_dir: logs                                    │
2025-08-30 18:43:39 - pico-train - INFO - │   run_name: pico-decoder-tiny-dolma10M-v1           │
2025-08-30 18:43:39 - pico-train - INFO - │   runs_dir: runs                                    │
2025-08-30 18:43:39 - pico-train - INFO - │   save_every_n_steps: 2000                          │
2025-08-30 18:43:39 - pico-train - INFO - │   save_to_hf: true                                  │
2025-08-30 18:43:39 - pico-train - INFO - │   training:                                         │
2025-08-30 18:43:39 - pico-train - INFO - │     auto_resume: true                               │
2025-08-30 18:43:39 - pico-train - INFO - │ data:                                               │
2025-08-30 18:43:39 - pico-train - INFO - │   dataloader:                                       │
2025-08-30 18:43:39 - pico-train - INFO - │     batch_size: 16                                  │
2025-08-30 18:43:39 - pico-train - INFO - │   dataset:                                          │
2025-08-30 18:43:39 - pico-train - INFO - │     name: ThomasTheMaker/pretokenized-dolma-10M     │
2025-08-30 18:43:39 - pico-train - INFO - │   tokenizer:                                        │
2025-08-30 18:43:39 - pico-train - INFO - │     name: allenai/OLMo-7B-0724-hf                   │
2025-08-30 18:43:39 - pico-train - INFO - │     vocab_size: 50304                               │
2025-08-30 18:43:39 - pico-train - INFO - │ evaluation:                                         │
2025-08-30 18:43:39 - pico-train - INFO - │   metrics:                                          │
2025-08-30 18:43:39 - pico-train - INFO - │   - paloma                                          │
2025-08-30 18:43:39 - pico-train - INFO - │   paloma:                                           │
2025-08-30 18:43:39 - pico-train - INFO - │     batch_size: 1                                   │
2025-08-30 18:43:39 - pico-train - INFO - │     dataset_name: pico-lm/pretokenized-paloma-tinsy │
2025-08-30 18:43:39 - pico-train - INFO - │     dataset_split: val                              │
2025-08-30 18:43:39 - pico-train - INFO - │     max_length: 2048                                │
2025-08-30 18:43:39 - pico-train - INFO - │ model:                                              │
2025-08-30 18:43:39 - pico-train - INFO - │   activation_hidden_dim: 384                        │
2025-08-30 18:43:39 - pico-train - INFO - │   attention_n_heads: 12                             │
2025-08-30 18:43:39 - pico-train - INFO - │   attention_n_kv_heads: 4                           │
2025-08-30 18:43:39 - pico-train - INFO - │   batch_size: 1024                                  │
2025-08-30 18:43:39 - pico-train - INFO - │   d_model: 96                                       │
2025-08-30 18:43:39 - pico-train - INFO - │   max_seq_len: 2048                                 │
2025-08-30 18:43:39 - pico-train - INFO - │   model_type: pico_decoder                          │
2025-08-30 18:43:39 - pico-train - INFO - │   n_layers: 12                                      │
2025-08-30 18:43:39 - pico-train - INFO - │   norm_eps: 1.0e-06                                 │
2025-08-30 18:43:39 - pico-train - INFO - │   position_emb_theta: 10000.0                       │
2025-08-30 18:43:39 - pico-train - INFO - │   vocab_size: 50304                                 │
2025-08-30 18:43:39 - pico-train - INFO - │ monitoring:                                         │
2025-08-30 18:43:39 - pico-train - INFO - │   logging:                                          │
2025-08-30 18:43:39 - pico-train - INFO - │     log_every_n_steps: 100                          │
2025-08-30 18:43:39 - pico-train - INFO - │     log_level: INFO                                 │
2025-08-30 18:43:39 - pico-train - INFO - │   save_to_wandb: false                              │
2025-08-30 18:43:39 - pico-train - INFO - │   wandb:                                            │
2025-08-30 18:43:39 - pico-train - INFO - │     entity: boymyc                                  │
2025-08-30 18:43:39 - pico-train - INFO - │     project: pico-decoder-tiny                      │
2025-08-30 18:43:39 - pico-train - INFO - │ training:                                           │
2025-08-30 18:43:39 - pico-train - INFO - │   fabric:                                           │
2025-08-30 18:43:39 - pico-train - INFO - │     accelerator: cuda                               │
2025-08-30 18:43:39 - pico-train - INFO - │     num_devices: 1                                  │
2025-08-30 18:43:39 - pico-train - INFO - │     num_nodes: 1                                    │
2025-08-30 18:43:39 - pico-train - INFO - │     precision: bf16-mixed                           │
2025-08-30 18:43:39 - pico-train - INFO - │   max_steps: 100000                                 │
2025-08-30 18:43:39 - pico-train - INFO - │   optimization:                                     │
2025-08-30 18:43:39 - pico-train - INFO - │     gradient_accumulation_steps: 1                  │
2025-08-30 18:43:39 - pico-train - INFO - │     lr: 0.0002                                      │
2025-08-30 18:43:39 - pico-train - INFO - │     lr_scheduler: cosine                            │
2025-08-30 18:43:39 - pico-train - INFO - │     lr_warmup_steps: 2000                           │
2025-08-30 18:43:39 - pico-train - INFO - │     optimizer: adamw                                │
2025-08-30 18:43:39 - pico-train - INFO - │                                                     │
2025-08-30 18:43:39 - pico-train - INFO - ╰─────────────────────────────────────────────────────╯
2025-08-30 18:43:39 - pico-train - INFO - ==================================================
2025-08-30 18:43:39 - pico-train - INFO - ⛭ Runtime Summary:
2025-08-30 18:43:39 - pico-train - INFO - ==================================================
2025-08-30 18:43:39 - pico-train - INFO - Starting from step: 78000
2025-08-30 18:43:39 - pico-train - INFO - Model Setup:
2025-08-30 18:43:39 - pico-train - INFO - └─ Total Parameters: 11,282,784
2025-08-30 18:43:39 - pico-train - INFO - └─ Trainable Parameters: 11,282,784
2025-08-30 18:43:39 - pico-train - INFO - Distributed Setup:
2025-08-30 18:43:39 - pico-train - INFO - └─ Number of Devices: 1
2025-08-30 18:43:39 - pico-train - INFO - └─ Device Type: NVIDIA H100 80GB HBM3
2025-08-30 18:43:39 - pico-train - INFO - └─ Available Memory: 85.03 GB
2025-08-30 18:43:39 - pico-train - INFO - Software Setup:
2025-08-30 18:43:39 - pico-train - INFO - └─ Python Version: 3.12.3
2025-08-30 18:43:39 - pico-train - INFO - └─ PyTorch Version: 2.8.0+cu128
2025-08-30 18:43:39 - pico-train - INFO - └─ CUDA Version: 12.8
2025-08-30 18:43:39 - pico-train - INFO - └─ Operating System: Linux 6.8.0-71-generic
2025-08-30 18:43:39 - pico-train - INFO - Batch Size Configuration:
2025-08-30 18:43:39 - pico-train - INFO - └─ Global Batch Size: 16
2025-08-30 18:43:39 - pico-train - INFO - └─ Per Device Batch Size: 16
2025-08-30 18:43:39 - pico-train - INFO - └─ Gradient Accumulation Steps: 1
2025-08-30 18:43:39 - pico-train - INFO - ==================================================
2025-08-30 18:43:40 - pico-train - INFO - Step 78000 -- 🔄 Training Metrics
2025-08-30 18:43:40 - pico-train - INFO - ├── Loss: 4.5461
2025-08-30 18:43:40 - pico-train - INFO - ├── Learning Rate: 2.39e-05
2025-08-30 18:43:40 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-30 18:43:40 - pico-train - INFO - Step 78000 -- 📈 Saving Learning Dynamics
2025-08-30 18:44:34 - pico-train - INFO - Step 78100 -- 🔄 Training Metrics
2025-08-30 18:44:34 - pico-train - INFO - ├── Loss: 4.7732
2025-08-30 18:44:34 - pico-train - INFO - ├── Learning Rate: 2.36e-05
2025-08-30 18:44:34 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-30 18:45:26 - pico-train - INFO - Step 78200 -- 🔄 Training Metrics
2025-08-30 18:45:26 - pico-train - INFO - ├── Loss: 4.7809
2025-08-30 18:45:26 - pico-train - INFO - ├── Learning Rate: 2.34e-05
2025-08-30 18:45:26 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-30 18:46:18 - pico-train - INFO - Step 78300 -- 🔄 Training Metrics
2025-08-30 18:46:18 - pico-train - INFO - ├── Loss: 4.7659
2025-08-30 18:46:18 - pico-train - INFO - ├── Learning Rate: 2.32e-05
2025-08-30 18:46:18 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-30 18:47:16 - pico-train - INFO - Step 78400 -- 🔄 Training Metrics
2025-08-30 18:47:16 - pico-train - INFO - ├── Loss: 4.7466
2025-08-30 18:47:16 - pico-train - INFO - ├── Learning Rate: 2.30e-05
2025-08-30 18:47:16 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-30 18:48:27 - pico-train - INFO - Step 78500 -- 🔄 Training Metrics
2025-08-30 18:48:27 - pico-train - INFO - ├── Loss: 4.8076
2025-08-30 18:48:27 - pico-train - INFO - ├── Learning Rate: 2.28e-05
2025-08-30 18:48:27 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-30 18:49:39 - pico-train - INFO - Step 78600 -- 🔄 Training Metrics
2025-08-30 18:49:39 - pico-train - INFO - ├── Loss: 4.7884
2025-08-30 18:49:39 - pico-train - INFO - ├── Learning Rate: 2.26e-05
2025-08-30 18:49:39 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-30 18:50:50 - pico-train - INFO - Step 78700 -- 🔄 Training Metrics
2025-08-30 18:50:50 - pico-train - INFO - ├── Loss: 4.7882
2025-08-30 18:50:50 - pico-train - INFO - ├── Learning Rate: 2.24e-05
2025-08-30 18:50:50 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-30 18:51:55 - pico-train - INFO - Step 78800 -- 🔄 Training Metrics
2025-08-30 18:51:55 - pico-train - INFO - ├── Loss: 4.7942
2025-08-30 18:51:55 - pico-train - INFO - ├── Learning Rate: 2.22e-05
2025-08-30 18:51:55 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-30 18:52:48 - pico-train - INFO - Step 78900 -- 🔄 Training Metrics
2025-08-30 18:52:48 - pico-train - INFO - ├── Loss: 4.7966
2025-08-30 18:52:48 - pico-train - INFO - ├── Learning Rate: 2.20e-05
2025-08-30 18:52:48 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-30 18:53:41 - pico-train - INFO - Step 79000 -- 🔄 Training Metrics
2025-08-30 18:53:41 - pico-train - INFO - ├── Loss: 4.7800
2025-08-30 18:53:41 - pico-train - INFO - ├── Learning Rate: 2.18e-05
2025-08-30 18:53:41 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-30 18:54:34 - pico-train - INFO - Step 79100 -- 🔄 Training Metrics
2025-08-30 18:54:34 - pico-train - INFO - ├── Loss: 4.7808
2025-08-30 18:54:34 - pico-train - INFO - ├── Learning Rate: 2.16e-05
2025-08-30 18:54:34 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-30 18:55:27 - pico-train - INFO - Step 79200 -- 🔄 Training Metrics
2025-08-30 18:55:27 - pico-train - INFO - ├── Loss: 4.7704
2025-08-30 18:55:27 - pico-train - INFO - ├── Learning Rate: 2.14e-05
2025-08-30 18:55:27 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-30 18:56:20 - pico-train - INFO - Step 79300 -- 🔄 Training Metrics
2025-08-30 18:56:20 - pico-train - INFO - ├── Loss: 4.7921
2025-08-30 18:56:20 - pico-train - INFO - ├── Learning Rate: 2.12e-05
2025-08-30 18:56:20 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-30 18:57:12 - pico-train - INFO - Step 79400 -- 🔄 Training Metrics
2025-08-30 18:57:12 - pico-train - INFO - ├── Loss: 4.7701
2025-08-30 18:57:12 - pico-train - INFO - ├── Learning Rate: 2.10e-05
2025-08-30 18:57:12 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-30 18:58:05 - pico-train - INFO - Step 79500 -- 🔄 Training Metrics
2025-08-30 18:58:05 - pico-train - INFO - ├── Loss: 4.7990
2025-08-30 18:58:05 - pico-train - INFO - ├── Learning Rate: 2.08e-05
2025-08-30 18:58:05 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-30 18:58:58 - pico-train - INFO - Step 79600 -- 🔄 Training Metrics
2025-08-30 18:58:58 - pico-train - INFO - ├── Loss: 4.7864
2025-08-30 18:58:58 - pico-train - INFO - ├── Learning Rate: 2.06e-05
2025-08-30 18:58:58 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-30 18:59:51 - pico-train - INFO - Step 79700 -- 🔄 Training Metrics
2025-08-30 18:59:51 - pico-train - INFO - ├── Loss: 4.7747
2025-08-30 18:59:51 - pico-train - INFO - ├── Learning Rate: 2.04e-05
2025-08-30 18:59:51 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-30 19:00:44 - pico-train - INFO - Step 79800 -- 🔄 Training Metrics
2025-08-30 19:00:44 - pico-train - INFO - ├── Loss: 4.7703
2025-08-30 19:00:44 - pico-train - INFO - ├── Learning Rate: 2.02e-05
2025-08-30 19:00:44 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-30 19:01:37 - pico-train - INFO - Step 79900 -- 🔄 Training Metrics
2025-08-30 19:01:37 - pico-train - INFO - ├── Loss: 4.7738
2025-08-30 19:01:37 - pico-train - INFO - ├── Learning Rate: 2.01e-05
2025-08-30 19:01:37 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-30 19:02:29 - pico-train - INFO - Step 80000 -- 💾 Saving Checkpoint
2025-08-30 19:04:30 - pico-train - INFO - Step 80000 -- 📊 Evaluation Results
2025-08-30 19:04:30 - pico-train - INFO - └── paloma: inf
2025-08-30 19:04:31 - pico-train - INFO - Step 80000 -- 🔄 Training Metrics
2025-08-30 19:04:31 - pico-train - INFO - ├── Loss: 4.7781
2025-08-30 19:04:31 - pico-train - INFO - ├── Learning Rate: 2.00e-05
2025-08-30 19:04:31 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-30 19:04:31 - pico-train - INFO - Step 80000 -- 📈 Saving Learning Dynamics
2025-08-30 19:05:25 - pico-train - INFO - Step 80100 -- 🔄 Training Metrics
2025-08-30 19:05:25 - pico-train - INFO - ├── Loss: 4.8125
2025-08-30 19:05:25 - pico-train - INFO - ├── Learning Rate: 2.00e-05
2025-08-30 19:05:25 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-30 19:06:17 - pico-train - INFO - Step 80200 -- 🔄 Training Metrics
2025-08-30 19:06:17 - pico-train - INFO - ├── Loss: 4.7764
2025-08-30 19:06:17 - pico-train - INFO - ├── Learning Rate: 2.00e-05
2025-08-30 19:06:17 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-30 19:07:09 - pico-train - INFO - Step 80300 -- 🔄 Training Metrics
2025-08-30 19:07:09 - pico-train - INFO - ├── Loss: 4.7498
2025-08-30 19:07:09 - pico-train - INFO - ├── Learning Rate: 2.00e-05
2025-08-30 19:07:09 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-30 19:08:02 - pico-train - INFO - Step 80400 -- 🔄 Training Metrics
2025-08-30 19:08:02 - pico-train - INFO - ├── Loss: 4.7809
2025-08-30 19:08:02 - pico-train - INFO - ├── Learning Rate: 2.00e-05
2025-08-30 19:08:02 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-30 19:08:55 - pico-train - INFO - Step 80500 -- 🔄 Training Metrics
2025-08-30 19:08:55 - pico-train - INFO - ├── Loss: 4.7766
2025-08-30 19:08:55 - pico-train - INFO - ├── Learning Rate: 2.00e-05
2025-08-30 19:08:55 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-30 19:09:48 - pico-train - INFO - Step 80600 -- 🔄 Training Metrics
2025-08-30 19:09:48 - pico-train - INFO - ├── Loss: 4.7933
2025-08-30 19:09:48 - pico-train - INFO - ├── Learning Rate: 2.00e-05
2025-08-30 19:09:48 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-30 19:10:40 - pico-train - INFO - Step 80700 -- 🔄 Training Metrics
2025-08-30 19:10:40 - pico-train - INFO - ├── Loss: 4.7826
2025-08-30 19:10:40 - pico-train - INFO - ├── Learning Rate: 2.00e-05
2025-08-30 19:10:40 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-30 19:11:32 - pico-train - INFO - Step 80800 -- 🔄 Training Metrics
2025-08-30 19:11:32 - pico-train - INFO - ├── Loss: 4.7968
2025-08-30 19:11:32 - pico-train - INFO - ├── Learning Rate: 2.00e-05
2025-08-30 19:11:32 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-30 19:12:24 - pico-train - INFO - Step 80900 -- 🔄 Training Metrics
2025-08-30 19:12:24 - pico-train - INFO - ├── Loss: 4.8019
2025-08-30 19:12:24 - pico-train - INFO - ├── Learning Rate: 2.00e-05
2025-08-30 19:12:24 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-30 19:13:16 - pico-train - INFO - Step 81000 -- 🔄 Training Metrics
2025-08-30 19:13:16 - pico-train - INFO - ├── Loss: 4.7786
2025-08-30 19:13:16 - pico-train - INFO - ├── Learning Rate: 2.00e-05
2025-08-30 19:13:16 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-30 19:14:07 - pico-train - INFO - Step 81100 -- 🔄 Training Metrics
2025-08-30 19:14:07 - pico-train - INFO - ├── Loss: 4.7870
2025-08-30 19:14:07 - pico-train - INFO - ├── Learning Rate: 2.00e-05
2025-08-30 19:14:07 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-30 19:14:59 - pico-train - INFO - Step 81200 -- 🔄 Training Metrics
2025-08-30 19:14:59 - pico-train - INFO - ├── Loss: 4.7989
2025-08-30 19:14:59 - pico-train - INFO - ├── Learning Rate: 2.00e-05
2025-08-30 19:14:59 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-30 19:15:51 - pico-train - INFO - Step 81300 -- 🔄 Training Metrics
2025-08-30 19:15:51 - pico-train - INFO - ├── Loss: 4.8003
2025-08-30 19:15:51 - pico-train - INFO - ├── Learning Rate: 2.00e-05
2025-08-30 19:15:51 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-30 19:16:44 - pico-train - INFO - Step 81400 -- 🔄 Training Metrics
2025-08-30 19:16:44 - pico-train - INFO - ├── Loss: 4.7783
2025-08-30 19:16:44 - pico-train - INFO - ├── Learning Rate: 2.00e-05
2025-08-30 19:16:44 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-30 19:17:36 - pico-train - INFO - Step 81500 -- 🔄 Training Metrics
2025-08-30 19:17:36 - pico-train - INFO - ├── Loss: 4.7549
2025-08-30 19:17:36 - pico-train - INFO - ├── Learning Rate: 2.00e-05
2025-08-30 19:17:36 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-30 19:18:28 - pico-train - INFO - Step 81600 -- 🔄 Training Metrics
2025-08-30 19:18:28 - pico-train - INFO - ├── Loss: 4.7775
2025-08-30 19:18:28 - pico-train - INFO - ├── Learning Rate: 2.00e-05
2025-08-30 19:18:28 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-30 19:19:19 - pico-train - INFO - Step 81700 -- 🔄 Training Metrics
2025-08-30 19:19:19 - pico-train - INFO - ├── Loss: 4.7858
2025-08-30 19:19:19 - pico-train - INFO - ├── Learning Rate: 2.00e-05
2025-08-30 19:19:19 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-30 19:20:11 - pico-train - INFO - Step 81800 -- 🔄 Training Metrics
2025-08-30 19:20:11 - pico-train - INFO - ├── Loss: 4.7789
2025-08-30 19:20:11 - pico-train - INFO - ├── Learning Rate: 2.00e-05
2025-08-30 19:20:11 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-30 19:21:03 - pico-train - INFO - Step 81900 -- 🔄 Training Metrics
2025-08-30 19:21:03 - pico-train - INFO - ├── Loss: 4.7737
2025-08-30 19:21:03 - pico-train - INFO - ├── Learning Rate: 2.00e-05
2025-08-30 19:21:03 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-30 19:21:55 - pico-train - INFO - Step 82000 -- 💾 Saving Checkpoint
2025-08-30 19:23:45 - pico-train - INFO - Step 82000 -- 📊 Evaluation Results
2025-08-30 19:23:45 - pico-train - INFO - └── paloma: inf
2025-08-30 19:23:46 - pico-train - INFO - Step 82000 -- 🔄 Training Metrics
2025-08-30 19:23:46 - pico-train - INFO - ├── Loss: 4.7934
2025-08-30 19:23:46 - pico-train - INFO - ├── Learning Rate: 2.00e-05
2025-08-30 19:23:46 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-30 19:23:46 - pico-train - INFO - Step 82000 -- 📈 Saving Learning Dynamics
2025-08-30 19:24:40 - pico-train - INFO - Step 82100 -- 🔄 Training Metrics
2025-08-30 19:24:40 - pico-train - INFO - ├── Loss: 4.7784
2025-08-30 19:24:40 - pico-train - INFO - ├── Learning Rate: 2.00e-05
2025-08-30 19:24:40 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-30 19:25:32 - pico-train - INFO - Step 82200 -- 🔄 Training Metrics
2025-08-30 19:25:32 - pico-train - INFO - ├── Loss: 4.7837
2025-08-30 19:25:32 - pico-train - INFO - ├── Learning Rate: 2.00e-05
2025-08-30 19:25:32 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-30 19:26:24 - pico-train - INFO - Step 82300 -- 🔄 Training Metrics
2025-08-30 19:26:24 - pico-train - INFO - ├── Loss: 4.7611
2025-08-30 19:26:24 - pico-train - INFO - ├── Learning Rate: 2.00e-05
2025-08-30 19:26:24 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-30 19:27:16 - pico-train - INFO - Step 82400 -- 🔄 Training Metrics
2025-08-30 19:27:16 - pico-train - INFO - ├── Loss: 4.7873
2025-08-30 19:27:16 - pico-train - INFO - ├── Learning Rate: 2.00e-05
2025-08-30 19:27:16 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-30 19:28:09 - pico-train - INFO - Step 82500 -- 🔄 Training Metrics
2025-08-30 19:28:09 - pico-train - INFO - ├── Loss: 4.7805
2025-08-30 19:28:09 - pico-train - INFO - ├── Learning Rate: 2.00e-05
2025-08-30 19:28:09 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-30 19:29:02 - pico-train - INFO - Step 82600 -- 🔄 Training Metrics
2025-08-30 19:29:02 - pico-train - INFO - ├── Loss: 4.7728
2025-08-30 19:29:02 - pico-train - INFO - ├── Learning Rate: 2.00e-05
2025-08-30 19:29:02 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-30 19:29:53 - pico-train - INFO - Step 82700 -- 🔄 Training Metrics
2025-08-30 19:29:53 - pico-train - INFO - ├── Loss: 4.7685
2025-08-30 19:29:53 - pico-train - INFO - ├── Learning Rate: 2.00e-05
2025-08-30 19:29:53 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-30 19:30:45 - pico-train - INFO - Step 82800 -- 🔄 Training Metrics
2025-08-30 19:30:45 - pico-train - INFO - ├── Loss: 4.7772
2025-08-30 19:30:45 - pico-train - INFO - ├── Learning Rate: 2.00e-05
2025-08-30 19:30:45 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-30 19:31:37 - pico-train - INFO - Step 82900 -- 🔄 Training Metrics
2025-08-30 19:31:37 - pico-train - INFO - ├── Loss: 4.7580
2025-08-30 19:31:37 - pico-train - INFO - ├── Learning Rate: 2.00e-05
2025-08-30 19:31:37 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-30 19:32:30 - pico-train - INFO - Step 83000 -- 🔄 Training Metrics
2025-08-30 19:32:30 - pico-train - INFO - ├── Loss: 4.7907
2025-08-30 19:32:30 - pico-train - INFO - ├── Learning Rate: 2.00e-05
2025-08-30 19:32:30 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-30 19:33:23 - pico-train - INFO - Step 83100 -- 🔄 Training Metrics
2025-08-30 19:33:23 - pico-train - INFO - ├── Loss: 4.7721
2025-08-30 19:33:23 - pico-train - INFO - ├── Learning Rate: 2.00e-05
2025-08-30 19:33:23 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-30 19:34:16 - pico-train - INFO - Step 83200 -- 🔄 Training Metrics
2025-08-30 19:34:16 - pico-train - INFO - ├── Loss: 4.7750
2025-08-30 19:34:16 - pico-train - INFO - ├── Learning Rate: 2.00e-05
2025-08-30 19:34:16 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-30 19:35:09 - pico-train - INFO - Step 83300 -- 🔄 Training Metrics
2025-08-30 19:35:09 - pico-train - INFO - ├── Loss: 4.7808
2025-08-30 19:35:09 - pico-train - INFO - ├── Learning Rate: 2.00e-05
2025-08-30 19:35:09 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-30 19:36:02 - pico-train - INFO - Step 83400 -- 🔄 Training Metrics
2025-08-30 19:36:02 - pico-train - INFO - ├── Loss: 4.7869
2025-08-30 19:36:02 - pico-train - INFO - ├── Learning Rate: 2.00e-05
2025-08-30 19:36:02 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-30 19:36:55 - pico-train - INFO - Step 83500 -- 🔄 Training Metrics
2025-08-30 19:36:55 - pico-train - INFO - ├── Loss: 4.7670
2025-08-30 19:36:55 - pico-train - INFO - ├── Learning Rate: 2.00e-05
2025-08-30 19:36:55 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-30 19:37:48 - pico-train - INFO - Step 83600 -- 🔄 Training Metrics
2025-08-30 19:37:48 - pico-train - INFO - ├── Loss: 4.7615
2025-08-30 19:37:48 - pico-train - INFO - ├── Learning Rate: 2.00e-05
2025-08-30 19:37:48 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-30 19:38:40 - pico-train - INFO - Step 83700 -- 🔄 Training Metrics
2025-08-30 19:38:40 - pico-train - INFO - ├── Loss: 4.7976
2025-08-30 19:38:40 - pico-train - INFO - ├── Learning Rate: 2.00e-05
2025-08-30 19:38:40 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-30 19:39:32 - pico-train - INFO - Step 83800 -- 🔄 Training Metrics
2025-08-30 19:39:32 - pico-train - INFO - ├── Loss: 4.7549
2025-08-30 19:39:32 - pico-train - INFO - ├── Learning Rate: 2.00e-05
2025-08-30 19:39:32 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-30 19:40:24 - pico-train - INFO - Step 83900 -- 🔄 Training Metrics
2025-08-30 19:40:24 - pico-train - INFO - ├── Loss: 4.7879
2025-08-30 19:40:24 - pico-train - INFO - ├── Learning Rate: 2.00e-05
2025-08-30 19:40:24 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-30 19:41:15 - pico-train - INFO - Step 84000 -- 💾 Saving Checkpoint
2025-08-30 19:43:17 - pico-train - INFO - Step 84000 -- 📊 Evaluation Results
2025-08-30 19:43:17 - pico-train - INFO - └── paloma: inf
2025-08-30 19:43:18 - pico-train - INFO - Step 84000 -- 🔄 Training Metrics
2025-08-30 19:43:18 - pico-train - INFO - ├── Loss: 4.7979
2025-08-30 19:43:18 - pico-train - INFO - ├── Learning Rate: 2.00e-05
2025-08-30 19:43:18 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-30 19:43:18 - pico-train - INFO - Step 84000 -- 📈 Saving Learning Dynamics
2025-08-30 19:44:12 - pico-train - INFO - Step 84100 -- 🔄 Training Metrics
2025-08-30 19:44:12 - pico-train - INFO - ├── Loss: 4.8088
2025-08-30 19:44:12 - pico-train - INFO - ├── Learning Rate: 2.00e-05
2025-08-30 19:44:12 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-30 19:45:04 - pico-train - INFO - Step 84200 -- 🔄 Training Metrics
2025-08-30 19:45:04 - pico-train - INFO - ├── Loss: 4.7678
2025-08-30 19:45:04 - pico-train - INFO - ├── Learning Rate: 2.00e-05
2025-08-30 19:45:04 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-30 19:45:56 - pico-train - INFO - Step 84300 -- 🔄 Training Metrics
2025-08-30 19:45:56 - pico-train - INFO - ├── Loss: 4.7725
2025-08-30 19:45:56 - pico-train - INFO - ├── Learning Rate: 2.00e-05
2025-08-30 19:45:56 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-30 19:46:48 - pico-train - INFO - Step 84400 -- 🔄 Training Metrics
2025-08-30 19:46:48 - pico-train - INFO - ├── Loss: 4.7841
2025-08-30 19:46:48 - pico-train - INFO - ├── Learning Rate: 2.00e-05
2025-08-30 19:46:48 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-30 19:47:40 - pico-train - INFO - Step 84500 -- 🔄 Training Metrics
2025-08-30 19:47:40 - pico-train - INFO - ├── Loss: 4.7708
2025-08-30 19:47:40 - pico-train - INFO - ├── Learning Rate: 2.00e-05
2025-08-30 19:47:40 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-30 19:48:32 - pico-train - INFO - Step 84600 -- 🔄 Training Metrics
2025-08-30 19:48:32 - pico-train - INFO - ├── Loss: 4.7748
2025-08-30 19:48:32 - pico-train - INFO - ├── Learning Rate: 2.00e-05
2025-08-30 19:48:32 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-30 19:49:24 - pico-train - INFO - Step 84700 -- 🔄 Training Metrics
2025-08-30 19:49:24 - pico-train - INFO - ├── Loss: 4.7714
2025-08-30 19:49:24 - pico-train - INFO - ├── Learning Rate: 2.00e-05
2025-08-30 19:49:24 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-30 19:50:16 - pico-train - INFO - Step 84800 -- 🔄 Training Metrics
2025-08-30 19:50:16 - pico-train - INFO - ├── Loss: 4.7860
2025-08-30 19:50:16 - pico-train - INFO - ├── Learning Rate: 2.00e-05
2025-08-30 19:50:16 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-30 19:51:09 - pico-train - INFO - Step 84900 -- 🔄 Training Metrics
2025-08-30 19:51:09 - pico-train - INFO - ├── Loss: 4.7671
2025-08-30 19:51:09 - pico-train - INFO - ├── Learning Rate: 2.00e-05
2025-08-30 19:51:09 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-30 19:52:02 - pico-train - INFO - Step 85000 -- 🔄 Training Metrics
2025-08-30 19:52:02 - pico-train - INFO - ├── Loss: 4.7753
2025-08-30 19:52:02 - pico-train - INFO - ├── Learning Rate: 2.00e-05
2025-08-30 19:52:02 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-30 19:52:55 - pico-train - INFO - Step 85100 -- 🔄 Training Metrics
2025-08-30 19:52:55 - pico-train - INFO - ├── Loss: 4.7335
2025-08-30 19:52:55 - pico-train - INFO - ├── Learning Rate: 2.00e-05
2025-08-30 19:52:55 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-30 19:53:48 - pico-train - INFO - Step 85200 -- 🔄 Training Metrics
2025-08-30 19:53:48 - pico-train - INFO - ├── Loss: 4.7700
2025-08-30 19:53:48 - pico-train - INFO - ├── Learning Rate: 2.00e-05
2025-08-30 19:53:48 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-30 19:54:41 - pico-train - INFO - Step 85300 -- 🔄 Training Metrics
2025-08-30 19:54:41 - pico-train - INFO - ├── Loss: 4.7800
2025-08-30 19:54:41 - pico-train - INFO - ├── Learning Rate: 2.00e-05
2025-08-30 19:54:41 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-30 19:55:34 - pico-train - INFO - Step 85400 -- 🔄 Training Metrics
2025-08-30 19:55:34 - pico-train - INFO - ├── Loss: 4.7782
2025-08-30 19:55:34 - pico-train - INFO - ├── Learning Rate: 2.00e-05
2025-08-30 19:55:34 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-30 19:56:27 - pico-train - INFO - Step 85500 -- 🔄 Training Metrics
2025-08-30 19:56:27 - pico-train - INFO - ├── Loss: 4.7698
2025-08-30 19:56:27 - pico-train - INFO - ├── Learning Rate: 2.00e-05
2025-08-30 19:56:27 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-30 19:57:21 - pico-train - INFO - Step 85600 -- 🔄 Training Metrics
2025-08-30 19:57:21 - pico-train - INFO - ├── Loss: 4.7835
2025-08-30 19:57:21 - pico-train - INFO - ├── Learning Rate: 2.00e-05
2025-08-30 19:57:21 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-30 19:58:14 - pico-train - INFO - Step 85700 -- 🔄 Training Metrics
2025-08-30 19:58:14 - pico-train - INFO - ├── Loss: 4.7651
2025-08-30 19:58:14 - pico-train - INFO - ├── Learning Rate: 2.00e-05
2025-08-30 19:58:14 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-30 19:59:06 - pico-train - INFO - Step 85800 -- 🔄 Training Metrics
2025-08-30 19:59:06 - pico-train - INFO - ├── Loss: 4.7900
2025-08-30 19:59:06 - pico-train - INFO - ├── Learning Rate: 2.00e-05
2025-08-30 19:59:06 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-30 19:59:58 - pico-train - INFO - Step 85900 -- 🔄 Training Metrics
2025-08-30 19:59:58 - pico-train - INFO - ├── Loss: 4.7797
2025-08-30 19:59:58 - pico-train - INFO - ├── Learning Rate: 2.00e-05
2025-08-30 19:59:58 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-30 20:00:50 - pico-train - INFO - Step 86000 -- 💾 Saving Checkpoint
2025-08-30 20:02:55 - pico-train - INFO - Step 86000 -- 📊 Evaluation Results
2025-08-30 20:02:55 - pico-train - INFO - └── paloma: inf
2025-08-30 20:02:56 - pico-train - INFO - Step 86000 -- 🔄 Training Metrics
2025-08-30 20:02:56 - pico-train - INFO - ├── Loss: 4.7650
2025-08-30 20:02:56 - pico-train - INFO - ├── Learning Rate: 2.00e-05
2025-08-30 20:02:56 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-30 20:02:56 - pico-train - INFO - Step 86000 -- 📈 Saving Learning Dynamics
2025-08-30 20:03:51 - pico-train - INFO - Step 86100 -- ๐Ÿ”„ Training Metrics
2025-08-30 20:03:51 - pico-train - INFO - โ”œโ”€โ”€ Loss: 4.7682
2025-08-30 20:03:51 - pico-train - INFO - โ”œโ”€โ”€ Learning Rate: 2.00e-05
2025-08-30 20:03:51 - pico-train - INFO - โ””โ”€โ”€ Inf/NaN count: 0
2025-08-30 20:04:44 - pico-train - INFO - Step 86200 -- ๐Ÿ”„ Training Metrics
2025-08-30 20:04:44 - pico-train - INFO - โ”œโ”€โ”€ Loss: 4.7968
2025-08-30 20:04:44 - pico-train - INFO - โ”œโ”€โ”€ Learning Rate: 2.00e-05
2025-08-30 20:04:44 - pico-train - INFO - โ””โ”€โ”€ Inf/NaN count: 0
2025-08-30 20:05:37 - pico-train - INFO - Step 86300 -- ๐Ÿ”„ Training Metrics
2025-08-30 20:05:37 - pico-train - INFO - โ”œโ”€โ”€ Loss: 4.7895
2025-08-30 20:05:37 - pico-train - INFO - โ”œโ”€โ”€ Learning Rate: 2.00e-05
2025-08-30 20:05:37 - pico-train - INFO - โ””โ”€โ”€ Inf/NaN count: 0
2025-08-30 20:06:30 - pico-train - INFO - Step 86400 -- ๐Ÿ”„ Training Metrics
2025-08-30 20:06:30 - pico-train - INFO - โ”œโ”€โ”€ Loss: 4.7680
2025-08-30 20:06:30 - pico-train - INFO - โ”œโ”€โ”€ Learning Rate: 2.00e-05
2025-08-30 20:06:30 - pico-train - INFO - โ””โ”€โ”€ Inf/NaN count: 0
2025-08-30 20:07:23 - pico-train - INFO - Step 86500 -- ๐Ÿ”„ Training Metrics
2025-08-30 20:07:23 - pico-train - INFO - โ”œโ”€โ”€ Loss: 4.7686
2025-08-30 20:07:23 - pico-train - INFO - โ”œโ”€โ”€ Learning Rate: 2.00e-05
2025-08-30 20:07:23 - pico-train - INFO - โ””โ”€โ”€ Inf/NaN count: 0
2025-08-30 20:08:16 - pico-train - INFO - Step 86600 -- ๐Ÿ”„ Training Metrics
2025-08-30 20:08:16 - pico-train - INFO - โ”œโ”€โ”€ Loss: 4.7828
2025-08-30 20:08:16 - pico-train - INFO - โ”œโ”€โ”€ Learning Rate: 2.00e-05
2025-08-30 20:08:16 - pico-train - INFO - โ””โ”€โ”€ Inf/NaN count: 0
2025-08-30 20:09:10 - pico-train - INFO - Step 86700 -- ๐Ÿ”„ Training Metrics
2025-08-30 20:09:10 - pico-train - INFO - โ”œโ”€โ”€ Loss: 4.7595
2025-08-30 20:09:10 - pico-train - INFO - โ”œโ”€โ”€ Learning Rate: 2.00e-05
2025-08-30 20:09:10 - pico-train - INFO - โ””โ”€โ”€ Inf/NaN count: 0
2025-08-30 20:10:02 - pico-train - INFO - Step 86800 -- ๐Ÿ”„ Training Metrics
2025-08-30 20:10:02 - pico-train - INFO - โ”œโ”€โ”€ Loss: 4.7808
2025-08-30 20:10:02 - pico-train - INFO - โ”œโ”€โ”€ Learning Rate: 2.00e-05
2025-08-30 20:10:02 - pico-train - INFO - โ””โ”€โ”€ Inf/NaN count: 0
2025-08-30 20:10:56 - pico-train - INFO - Step 86900 -- ๐Ÿ”„ Training Metrics
2025-08-30 20:10:56 - pico-train - INFO - โ”œโ”€โ”€ Loss: 4.7668
2025-08-30 20:10:56 - pico-train - INFO - โ”œโ”€โ”€ Learning Rate: 2.00e-05
2025-08-30 20:10:56 - pico-train - INFO - โ””โ”€โ”€ Inf/NaN count: 0
2025-08-30 20:11:49 - pico-train - INFO - Step 87000 -- ๐Ÿ”„ Training Metrics
2025-08-30 20:11:49 - pico-train - INFO - โ”œโ”€โ”€ Loss: 4.7481
2025-08-30 20:11:49 - pico-train - INFO - โ”œโ”€โ”€ Learning Rate: 2.00e-05
2025-08-30 20:11:49 - pico-train - INFO - โ””โ”€โ”€ Inf/NaN count: 0
2025-08-30 20:12:41 - pico-train - INFO - Step 87100 -- ๐Ÿ”„ Training Metrics
2025-08-30 20:12:41 - pico-train - INFO - โ”œโ”€โ”€ Loss: 4.7536
2025-08-30 20:12:41 - pico-train - INFO - โ”œโ”€โ”€ Learning Rate: 2.00e-05
2025-08-30 20:12:41 - pico-train - INFO - โ””โ”€โ”€ Inf/NaN count: 0
2025-08-30 20:13:32 - pico-train - INFO - Step 87200 -- ๐Ÿ”„ Training Metrics
2025-08-30 20:13:32 - pico-train - INFO - โ”œโ”€โ”€ Loss: 4.7748
2025-08-30 20:13:32 - pico-train - INFO - โ”œโ”€โ”€ Learning Rate: 2.00e-05
2025-08-30 20:13:32 - pico-train - INFO - โ””โ”€โ”€ Inf/NaN count: 0
2025-08-30 20:14:26 - pico-train - INFO - Step 87300 -- ๐Ÿ”„ Training Metrics
2025-08-30 20:14:26 - pico-train - INFO - โ”œโ”€โ”€ Loss: 4.7597
2025-08-30 20:14:26 - pico-train - INFO - โ”œโ”€โ”€ Learning Rate: 2.00e-05
2025-08-30 20:14:26 - pico-train - INFO - โ””โ”€โ”€ Inf/NaN count: 0
2025-08-30 20:15:19 - pico-train - INFO - Step 87400 -- ๐Ÿ”„ Training Metrics
2025-08-30 20:15:19 - pico-train - INFO - โ”œโ”€โ”€ Loss: 4.7862
2025-08-30 20:15:19 - pico-train - INFO - โ”œโ”€โ”€ Learning Rate: 2.00e-05
2025-08-30 20:15:19 - pico-train - INFO - โ””โ”€โ”€ Inf/NaN count: 0
2025-08-30 20:16:12 - pico-train - INFO - Step 87500 -- ๐Ÿ”„ Training Metrics
2025-08-30 20:16:12 - pico-train - INFO - โ”œโ”€โ”€ Loss: 4.7682
2025-08-30 20:16:12 - pico-train - INFO - โ”œโ”€โ”€ Learning Rate: 2.00e-05
2025-08-30 20:16:12 - pico-train - INFO - โ””โ”€โ”€ Inf/NaN count: 0
2025-08-30 20:17:05 - pico-train - INFO - Step 87600 -- ๐Ÿ”„ Training Metrics
2025-08-30 20:17:05 - pico-train - INFO - โ”œโ”€โ”€ Loss: 4.8045
2025-08-30 20:17:05 - pico-train - INFO - โ”œโ”€โ”€ Learning Rate: 2.00e-05
2025-08-30 20:17:05 - pico-train - INFO - โ””โ”€โ”€ Inf/NaN count: 0
2025-08-30 20:17:58 - pico-train - INFO - Step 87700 -- ๐Ÿ”„ Training Metrics
2025-08-30 20:17:58 - pico-train - INFO - โ”œโ”€โ”€ Loss: 4.7911
2025-08-30 20:17:58 - pico-train - INFO - โ”œโ”€โ”€ Learning Rate: 2.00e-05
2025-08-30 20:17:58 - pico-train - INFO - โ””โ”€โ”€ Inf/NaN count: 0
2025-08-30 20:18:51 - pico-train - INFO - Step 87800 -- ๐Ÿ”„ Training Metrics
2025-08-30 20:18:51 - pico-train - INFO - โ”œโ”€โ”€ Loss: 4.7530
2025-08-30 20:18:51 - pico-train - INFO - โ”œโ”€โ”€ Learning Rate: 2.00e-05
2025-08-30 20:18:51 - pico-train - INFO - โ””โ”€โ”€ Inf/NaN count: 0
2025-08-30 20:19:45 - pico-train - INFO - Step 87900 -- ๐Ÿ”„ Training Metrics
2025-08-30 20:19:45 - pico-train - INFO - โ”œโ”€โ”€ Loss: 4.7618
2025-08-30 20:19:45 - pico-train - INFO - โ”œโ”€โ”€ Learning Rate: 2.00e-05
2025-08-30 20:19:45 - pico-train - INFO - โ””โ”€โ”€ Inf/NaN count: 0
2025-08-30 20:20:37 - pico-train - INFO - Step 88000 -- 💾 Saving Checkpoint
2025-08-30 20:22:42 - pico-train - INFO - Step 88000 -- 📊 Evaluation Results
2025-08-30 20:22:42 - pico-train - INFO - └── paloma: inf
2025-08-30 20:22:43 - pico-train - INFO - Step 88000 -- 🔄 Training Metrics
2025-08-30 20:22:43 - pico-train - INFO - ├── Loss: 4.7796
2025-08-30 20:22:43 - pico-train - INFO - ├── Learning Rate: 2.00e-05
2025-08-30 20:22:43 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-30 20:22:43 - pico-train - INFO - Step 88000 -- 📈 Saving Learning Dynamics
2025-08-30 20:23:38 - pico-train - INFO - Step 88100 -- 🔄 Training Metrics
2025-08-30 20:23:38 - pico-train - INFO - ├── Loss: 4.7432
2025-08-30 20:23:38 - pico-train - INFO - ├── Learning Rate: 2.00e-05
2025-08-30 20:23:38 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-30 20:24:31 - pico-train - INFO - Step 88200 -- 🔄 Training Metrics
2025-08-30 20:24:31 - pico-train - INFO - ├── Loss: 4.7725
2025-08-30 20:24:31 - pico-train - INFO - ├── Learning Rate: 2.00e-05
2025-08-30 20:24:31 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-30 20:25:24 - pico-train - INFO - Step 88300 -- 🔄 Training Metrics
2025-08-30 20:25:24 - pico-train - INFO - ├── Loss: 4.7749
2025-08-30 20:25:24 - pico-train - INFO - ├── Learning Rate: 2.00e-05
2025-08-30 20:25:24 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-30 20:26:17 - pico-train - INFO - Step 88400 -- 🔄 Training Metrics
2025-08-30 20:26:17 - pico-train - INFO - ├── Loss: 4.7883
2025-08-30 20:26:17 - pico-train - INFO - ├── Learning Rate: 2.00e-05
2025-08-30 20:26:17 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-30 20:27:10 - pico-train - INFO - Step 88500 -- 🔄 Training Metrics
2025-08-30 20:27:10 - pico-train - INFO - ├── Loss: 4.7871
2025-08-30 20:27:10 - pico-train - INFO - ├── Learning Rate: 2.00e-05
2025-08-30 20:27:10 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-30 20:28:03 - pico-train - INFO - Step 88600 -- 🔄 Training Metrics
2025-08-30 20:28:03 - pico-train - INFO - ├── Loss: 4.7894
2025-08-30 20:28:03 - pico-train - INFO - ├── Learning Rate: 2.00e-05
2025-08-30 20:28:03 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-30 20:28:56 - pico-train - INFO - Step 88700 -- 🔄 Training Metrics
2025-08-30 20:28:56 - pico-train - INFO - ├── Loss: 4.7812
2025-08-30 20:28:56 - pico-train - INFO - ├── Learning Rate: 2.00e-05
2025-08-30 20:28:56 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-30 20:29:49 - pico-train - INFO - Step 88800 -- 🔄 Training Metrics
2025-08-30 20:29:49 - pico-train - INFO - ├── Loss: 4.7371
2025-08-30 20:29:49 - pico-train - INFO - ├── Learning Rate: 2.00e-05
2025-08-30 20:29:49 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-30 20:30:43 - pico-train - INFO - Step 88900 -- 🔄 Training Metrics
2025-08-30 20:30:43 - pico-train - INFO - ├── Loss: 4.7666
2025-08-30 20:30:43 - pico-train - INFO - ├── Learning Rate: 2.00e-05
2025-08-30 20:30:43 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-30 20:31:36 - pico-train - INFO - Step 89000 -- 🔄 Training Metrics
2025-08-30 20:31:36 - pico-train - INFO - ├── Loss: 4.7623
2025-08-30 20:31:36 - pico-train - INFO - ├── Learning Rate: 2.00e-05
2025-08-30 20:31:36 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-30 20:32:29 - pico-train - INFO - Step 89100 -- 🔄 Training Metrics
2025-08-30 20:32:29 - pico-train - INFO - ├── Loss: 4.7911
2025-08-30 20:32:29 - pico-train - INFO - ├── Learning Rate: 2.00e-05
2025-08-30 20:32:29 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-30 20:33:22 - pico-train - INFO - Step 89200 -- 🔄 Training Metrics
2025-08-30 20:33:22 - pico-train - INFO - ├── Loss: 4.7823
2025-08-30 20:33:22 - pico-train - INFO - ├── Learning Rate: 2.00e-05
2025-08-30 20:33:22 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-30 20:34:15 - pico-train - INFO - Step 89300 -- 🔄 Training Metrics
2025-08-30 20:34:15 - pico-train - INFO - ├── Loss: 4.7830
2025-08-30 20:34:15 - pico-train - INFO - ├── Learning Rate: 2.00e-05
2025-08-30 20:34:15 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-30 20:35:08 - pico-train - INFO - Step 89400 -- 🔄 Training Metrics
2025-08-30 20:35:08 - pico-train - INFO - ├── Loss: 4.7724
2025-08-30 20:35:08 - pico-train - INFO - ├── Learning Rate: 2.00e-05
2025-08-30 20:35:08 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-30 20:36:01 - pico-train - INFO - Step 89500 -- 🔄 Training Metrics
2025-08-30 20:36:01 - pico-train - INFO - ├── Loss: 4.7654
2025-08-30 20:36:01 - pico-train - INFO - ├── Learning Rate: 2.00e-05
2025-08-30 20:36:01 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-30 20:36:54 - pico-train - INFO - Step 89600 -- 🔄 Training Metrics
2025-08-30 20:36:54 - pico-train - INFO - ├── Loss: 4.7613
2025-08-30 20:36:54 - pico-train - INFO - ├── Learning Rate: 2.00e-05
2025-08-30 20:36:54 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-30 20:37:47 - pico-train - INFO - Step 89700 -- 🔄 Training Metrics
2025-08-30 20:37:47 - pico-train - INFO - ├── Loss: 4.7544
2025-08-30 20:37:47 - pico-train - INFO - ├── Learning Rate: 2.00e-05
2025-08-30 20:37:47 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-30 20:38:41 - pico-train - INFO - Step 89800 -- 🔄 Training Metrics
2025-08-30 20:38:41 - pico-train - INFO - ├── Loss: 4.7889
2025-08-30 20:38:41 - pico-train - INFO - ├── Learning Rate: 2.00e-05
2025-08-30 20:38:41 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-30 20:39:34 - pico-train - INFO - Step 89900 -- 🔄 Training Metrics
2025-08-30 20:39:34 - pico-train - INFO - ├── Loss: 4.7928
2025-08-30 20:39:34 - pico-train - INFO - ├── Learning Rate: 2.00e-05
2025-08-30 20:39:34 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-30 20:40:26 - pico-train - INFO - Step 90000 -- 💾 Saving Checkpoint
2025-08-30 20:42:31 - pico-train - INFO - Step 90000 -- 📊 Evaluation Results
2025-08-30 20:42:31 - pico-train - INFO - └── paloma: inf
2025-08-30 20:42:31 - pico-train - INFO - Step 90000 -- 🔄 Training Metrics
2025-08-30 20:42:31 - pico-train - INFO - ├── Loss: 4.7777
2025-08-30 20:42:31 - pico-train - INFO - ├── Learning Rate: 2.00e-05
2025-08-30 20:42:31 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-30 20:42:31 - pico-train - INFO - Step 90000 -- 📈 Saving Learning Dynamics
2025-08-30 20:43:26 - pico-train - INFO - Step 90100 -- 🔄 Training Metrics
2025-08-30 20:43:26 - pico-train - INFO - ├── Loss: 4.7721
2025-08-30 20:43:26 - pico-train - INFO - ├── Learning Rate: 2.00e-05
2025-08-30 20:43:26 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-30 20:44:18 - pico-train - INFO - Step 90200 -- 🔄 Training Metrics
2025-08-30 20:44:18 - pico-train - INFO - ├── Loss: 4.7616
2025-08-30 20:44:18 - pico-train - INFO - ├── Learning Rate: 2.00e-05
2025-08-30 20:44:18 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-30 20:45:10 - pico-train - INFO - Step 90300 -- 🔄 Training Metrics
2025-08-30 20:45:10 - pico-train - INFO - ├── Loss: 4.7529
2025-08-30 20:45:10 - pico-train - INFO - ├── Learning Rate: 2.00e-05
2025-08-30 20:45:10 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-30 20:46:04 - pico-train - INFO - Step 90400 -- 🔄 Training Metrics
2025-08-30 20:46:04 - pico-train - INFO - ├── Loss: 4.7656
2025-08-30 20:46:04 - pico-train - INFO - ├── Learning Rate: 2.00e-05
2025-08-30 20:46:04 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-30 20:46:56 - pico-train - INFO - Step 90500 -- 🔄 Training Metrics
2025-08-30 20:46:56 - pico-train - INFO - ├── Loss: 4.7484
2025-08-30 20:46:56 - pico-train - INFO - ├── Learning Rate: 2.00e-05
2025-08-30 20:46:56 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-30 20:47:50 - pico-train - INFO - Step 90600 -- 🔄 Training Metrics
2025-08-30 20:47:50 - pico-train - INFO - ├── Loss: 4.7811
2025-08-30 20:47:50 - pico-train - INFO - ├── Learning Rate: 2.00e-05
2025-08-30 20:47:50 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-30 20:48:43 - pico-train - INFO - Step 90700 -- 🔄 Training Metrics
2025-08-30 20:48:43 - pico-train - INFO - ├── Loss: 4.7523
2025-08-30 20:48:43 - pico-train - INFO - ├── Learning Rate: 2.00e-05
2025-08-30 20:48:43 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-30 20:49:36 - pico-train - INFO - Step 90800 -- 🔄 Training Metrics
2025-08-30 20:49:36 - pico-train - INFO - ├── Loss: 4.7822
2025-08-30 20:49:36 - pico-train - INFO - ├── Learning Rate: 2.00e-05
2025-08-30 20:49:36 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-30 20:50:29 - pico-train - INFO - Step 90900 -- 🔄 Training Metrics
2025-08-30 20:50:29 - pico-train - INFO - ├── Loss: 4.7780
2025-08-30 20:50:29 - pico-train - INFO - ├── Learning Rate: 2.00e-05
2025-08-30 20:50:29 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-30 20:51:22 - pico-train - INFO - Step 91000 -- 🔄 Training Metrics
2025-08-30 20:51:22 - pico-train - INFO - ├── Loss: 4.7850
2025-08-30 20:51:22 - pico-train - INFO - ├── Learning Rate: 2.00e-05
2025-08-30 20:51:22 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-30 20:52:15 - pico-train - INFO - Step 91100 -- 🔄 Training Metrics
2025-08-30 20:52:15 - pico-train - INFO - ├── Loss: 4.7669
2025-08-30 20:52:15 - pico-train - INFO - ├── Learning Rate: 2.00e-05
2025-08-30 20:52:15 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-30 20:53:09 - pico-train - INFO - Step 91200 -- 🔄 Training Metrics
2025-08-30 20:53:09 - pico-train - INFO - ├── Loss: 4.7713
2025-08-30 20:53:09 - pico-train - INFO - ├── Learning Rate: 2.00e-05
2025-08-30 20:53:09 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-30 20:54:02 - pico-train - INFO - Step 91300 -- 🔄 Training Metrics
2025-08-30 20:54:02 - pico-train - INFO - ├── Loss: 4.7832
2025-08-30 20:54:02 - pico-train - INFO - ├── Learning Rate: 2.00e-05
2025-08-30 20:54:02 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-30 20:54:55 - pico-train - INFO - Step 91400 -- 🔄 Training Metrics
2025-08-30 20:54:55 - pico-train - INFO - ├── Loss: 4.7749
2025-08-30 20:54:55 - pico-train - INFO - ├── Learning Rate: 2.00e-05
2025-08-30 20:54:55 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-30 20:55:48 - pico-train - INFO - Step 91500 -- 🔄 Training Metrics
2025-08-30 20:55:48 - pico-train - INFO - ├── Loss: 4.7702
2025-08-30 20:55:48 - pico-train - INFO - ├── Learning Rate: 2.00e-05
2025-08-30 20:55:48 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-30 20:56:41 - pico-train - INFO - Step 91600 -- 🔄 Training Metrics
2025-08-30 20:56:41 - pico-train - INFO - ├── Loss: 4.7792
2025-08-30 20:56:41 - pico-train - INFO - ├── Learning Rate: 2.00e-05
2025-08-30 20:56:41 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-30 20:57:34 - pico-train - INFO - Step 91700 -- 🔄 Training Metrics
2025-08-30 20:57:34 - pico-train - INFO - ├── Loss: 4.7678
2025-08-30 20:57:34 - pico-train - INFO - ├── Learning Rate: 2.00e-05
2025-08-30 20:57:34 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-30 20:58:28 - pico-train - INFO - Step 91800 -- 🔄 Training Metrics
2025-08-30 20:58:28 - pico-train - INFO - ├── Loss: 4.7831
2025-08-30 20:58:28 - pico-train - INFO - ├── Learning Rate: 2.00e-05
2025-08-30 20:58:28 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-30 20:59:21 - pico-train - INFO - Step 91900 -- 🔄 Training Metrics
2025-08-30 20:59:21 - pico-train - INFO - ├── Loss: 4.7746
2025-08-30 20:59:21 - pico-train - INFO - ├── Learning Rate: 2.00e-05
2025-08-30 20:59:21 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-30 21:00:13 - pico-train - INFO - Step 92000 -- 💾 Saving Checkpoint
2025-08-30 21:02:18 - pico-train - INFO - Step 92000 -- 📊 Evaluation Results
2025-08-30 21:02:18 - pico-train - INFO - └── paloma: inf
2025-08-30 21:02:18 - pico-train - INFO - Step 92000 -- 🔄 Training Metrics
2025-08-30 21:02:18 - pico-train - INFO - ├── Loss: 4.7812
2025-08-30 21:02:18 - pico-train - INFO - ├── Learning Rate: 2.00e-05
2025-08-30 21:02:18 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-30 21:02:18 - pico-train - INFO - Step 92000 -- 📈 Saving Learning Dynamics
2025-08-30 21:03:14 - pico-train - INFO - Step 92100 -- 🔄 Training Metrics
2025-08-30 21:03:14 - pico-train - INFO - ├── Loss: 4.7569
2025-08-30 21:03:14 - pico-train - INFO - ├── Learning Rate: 2.00e-05
2025-08-30 21:03:14 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-30 21:04:06 - pico-train - INFO - Step 92200 -- 🔄 Training Metrics
2025-08-30 21:04:06 - pico-train - INFO - ├── Loss: 4.7846
2025-08-30 21:04:06 - pico-train - INFO - ├── Learning Rate: 2.00e-05
2025-08-30 21:04:06 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-30 21:04:58 - pico-train - INFO - Step 92300 -- 🔄 Training Metrics
2025-08-30 21:04:58 - pico-train - INFO - ├── Loss: 4.7687
2025-08-30 21:04:58 - pico-train - INFO - ├── Learning Rate: 2.00e-05
2025-08-30 21:04:58 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-30 21:05:50 - pico-train - INFO - Step 92400 -- 🔄 Training Metrics
2025-08-30 21:05:50 - pico-train - INFO - ├── Loss: 4.7699
2025-08-30 21:05:50 - pico-train - INFO - ├── Learning Rate: 2.00e-05
2025-08-30 21:05:50 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-30 21:06:42 - pico-train - INFO - Step 92500 -- 🔄 Training Metrics
2025-08-30 21:06:42 - pico-train - INFO - ├── Loss: 4.7961
2025-08-30 21:06:42 - pico-train - INFO - ├── Learning Rate: 2.00e-05
2025-08-30 21:06:42 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-30 21:07:34 - pico-train - INFO - Step 92600 -- 🔄 Training Metrics
2025-08-30 21:07:34 - pico-train - INFO - ├── Loss: 4.7682
2025-08-30 21:07:34 - pico-train - INFO - ├── Learning Rate: 2.00e-05
2025-08-30 21:07:34 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-30 21:08:26 - pico-train - INFO - Step 92700 -- 🔄 Training Metrics
2025-08-30 21:08:26 - pico-train - INFO - ├── Loss: 4.7786
2025-08-30 21:08:26 - pico-train - INFO - ├── Learning Rate: 2.00e-05
2025-08-30 21:08:26 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-30 21:09:18 - pico-train - INFO - Step 92800 -- 🔄 Training Metrics
2025-08-30 21:09:18 - pico-train - INFO - ├── Loss: 4.7716
2025-08-30 21:09:18 - pico-train - INFO - ├── Learning Rate: 2.00e-05
2025-08-30 21:09:18 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-30 21:10:11 - pico-train - INFO - Step 92900 -- 🔄 Training Metrics
2025-08-30 21:10:11 - pico-train - INFO - ├── Loss: 4.7837
2025-08-30 21:10:11 - pico-train - INFO - ├── Learning Rate: 2.00e-05
2025-08-30 21:10:11 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-30 21:11:04 - pico-train - INFO - Step 93000 -- 🔄 Training Metrics
2025-08-30 21:11:04 - pico-train - INFO - ├── Loss: 4.7811
2025-08-30 21:11:04 - pico-train - INFO - ├── Learning Rate: 2.00e-05
2025-08-30 21:11:04 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-30 21:11:57 - pico-train - INFO - Step 93100 -- 🔄 Training Metrics
2025-08-30 21:11:57 - pico-train - INFO - ├── Loss: 4.7830
2025-08-30 21:11:57 - pico-train - INFO - ├── Learning Rate: 2.00e-05
2025-08-30 21:11:57 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-30 21:12:50 - pico-train - INFO - Step 93200 -- 🔄 Training Metrics
2025-08-30 21:12:50 - pico-train - INFO - ├── Loss: 4.7935
2025-08-30 21:12:50 - pico-train - INFO - ├── Learning Rate: 2.00e-05
2025-08-30 21:12:50 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-30 21:13:43 - pico-train - INFO - Step 93300 -- 🔄 Training Metrics
2025-08-30 21:13:43 - pico-train - INFO - ├── Loss: 4.8135
2025-08-30 21:13:43 - pico-train - INFO - ├── Learning Rate: 2.00e-05
2025-08-30 21:13:43 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-30 21:14:36 - pico-train - INFO - Step 93400 -- 🔄 Training Metrics
2025-08-30 21:14:36 - pico-train - INFO - ├── Loss: 4.7767
2025-08-30 21:14:36 - pico-train - INFO - ├── Learning Rate: 2.00e-05
2025-08-30 21:14:36 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-30 21:15:29 - pico-train - INFO - Step 93500 -- 🔄 Training Metrics
2025-08-30 21:15:29 - pico-train - INFO - ├── Loss: 4.8005
2025-08-30 21:15:29 - pico-train - INFO - ├── Learning Rate: 2.00e-05
2025-08-30 21:15:29 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-30 21:16:22 - pico-train - INFO - Step 93600 -- 🔄 Training Metrics
2025-08-30 21:16:22 - pico-train - INFO - ├── Loss: 4.7913
2025-08-30 21:16:22 - pico-train - INFO - ├── Learning Rate: 2.00e-05
2025-08-30 21:16:22 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-30 21:17:15 - pico-train - INFO - Step 93700 -- 🔄 Training Metrics
2025-08-30 21:17:15 - pico-train - INFO - ├── Loss: 4.7739
2025-08-30 21:17:15 - pico-train - INFO - ├── Learning Rate: 2.00e-05
2025-08-30 21:17:15 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-30 21:18:08 - pico-train - INFO - Step 93800 -- 🔄 Training Metrics
2025-08-30 21:18:08 - pico-train - INFO - ├── Loss: 4.7875
2025-08-30 21:18:08 - pico-train - INFO - ├── Learning Rate: 2.00e-05
2025-08-30 21:18:08 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-30 21:19:00 - pico-train - INFO - Step 93900 -- 🔄 Training Metrics
2025-08-30 21:19:00 - pico-train - INFO - ├── Loss: 4.7801
2025-08-30 21:19:00 - pico-train - INFO - ├── Learning Rate: 2.00e-05
2025-08-30 21:19:00 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-30 21:19:52 - pico-train - INFO - Step 94000 -- 💾 Saving Checkpoint
2025-08-30 21:21:41 - pico-train - INFO - Step 94000 -- 📊 Evaluation Results
2025-08-30 21:21:41 - pico-train - INFO - └── paloma: inf
2025-08-30 21:21:42 - pico-train - INFO - Step 94000 -- 🔄 Training Metrics
2025-08-30 21:21:42 - pico-train - INFO - ├── Loss: 4.7826
2025-08-30 21:21:42 - pico-train - INFO - ├── Learning Rate: 2.00e-05
2025-08-30 21:21:42 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-30 21:21:42 - pico-train - INFO - Step 94000 -- 📈 Saving Learning Dynamics
2025-08-30 21:22:36 - pico-train - INFO - Step 94100 -- 🔄 Training Metrics
2025-08-30 21:22:36 - pico-train - INFO - ├── Loss: 4.7712
2025-08-30 21:22:36 - pico-train - INFO - ├── Learning Rate: 2.00e-05
2025-08-30 21:22:36 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-30 21:23:28 - pico-train - INFO - Step 94200 -- 🔄 Training Metrics
2025-08-30 21:23:28 - pico-train - INFO - ├── Loss: 4.7528
2025-08-30 21:23:28 - pico-train - INFO - ├── Learning Rate: 2.00e-05
2025-08-30 21:23:28 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-30 21:24:21 - pico-train - INFO - Step 94300 -- 🔄 Training Metrics
2025-08-30 21:24:21 - pico-train - INFO - ├── Loss: 4.7867
2025-08-30 21:24:21 - pico-train - INFO - ├── Learning Rate: 2.00e-05
2025-08-30 21:24:21 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-30 21:25:14 - pico-train - INFO - Step 94400 -- 🔄 Training Metrics
2025-08-30 21:25:14 - pico-train - INFO - ├── Loss: 4.7694
2025-08-30 21:25:14 - pico-train - INFO - ├── Learning Rate: 2.00e-05
2025-08-30 21:25:14 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-30 21:26:07 - pico-train - INFO - Step 94500 -- 🔄 Training Metrics
2025-08-30 21:26:07 - pico-train - INFO - ├── Loss: 4.7677
2025-08-30 21:26:07 - pico-train - INFO - ├── Learning Rate: 2.00e-05
2025-08-30 21:26:07 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-30 21:26:59 - pico-train - INFO - Step 94600 -- 🔄 Training Metrics
2025-08-30 21:26:59 - pico-train - INFO - ├── Loss: 4.7968
2025-08-30 21:26:59 - pico-train - INFO - ├── Learning Rate: 2.00e-05
2025-08-30 21:26:59 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-30 21:27:52 - pico-train - INFO - Step 94700 -- 🔄 Training Metrics
2025-08-30 21:27:52 - pico-train - INFO - ├── Loss: 4.7716
2025-08-30 21:27:52 - pico-train - INFO - ├── Learning Rate: 2.00e-05
2025-08-30 21:27:52 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-30 21:28:44 - pico-train - INFO - Step 94800 -- 🔄 Training Metrics
2025-08-30 21:28:44 - pico-train - INFO - ├── Loss: 4.7446
2025-08-30 21:28:44 - pico-train - INFO - ├── Learning Rate: 2.00e-05
2025-08-30 21:28:44 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-30 21:29:36 - pico-train - INFO - Step 94900 -- 🔄 Training Metrics
2025-08-30 21:29:36 - pico-train - INFO - ├── Loss: 4.7763
2025-08-30 21:29:36 - pico-train - INFO - ├── Learning Rate: 2.00e-05
2025-08-30 21:29:36 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-30 21:30:28 - pico-train - INFO - Step 95000 -- 🔄 Training Metrics
2025-08-30 21:30:28 - pico-train - INFO - ├── Loss: 4.7830
2025-08-30 21:30:28 - pico-train - INFO - ├── Learning Rate: 2.00e-05
2025-08-30 21:30:28 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-30 21:31:20 - pico-train - INFO - Step 95100 -- 🔄 Training Metrics
2025-08-30 21:31:20 - pico-train - INFO - ├── Loss: 4.7890
2025-08-30 21:31:20 - pico-train - INFO - ├── Learning Rate: 2.00e-05
2025-08-30 21:31:20 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-30 21:32:13 - pico-train - INFO - Step 95200 -- 🔄 Training Metrics
2025-08-30 21:32:13 - pico-train - INFO - ├── Loss: 4.7685
2025-08-30 21:32:13 - pico-train - INFO - ├── Learning Rate: 2.00e-05
2025-08-30 21:32:13 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-30 21:33:06 - pico-train - INFO - Step 95300 -- 🔄 Training Metrics
2025-08-30 21:33:06 - pico-train - INFO - ├── Loss: 4.8231
2025-08-30 21:33:06 - pico-train - INFO - ├── Learning Rate: 2.00e-05
2025-08-30 21:33:06 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-30 21:33:58 - pico-train - INFO - Step 95400 -- 🔄 Training Metrics
2025-08-30 21:33:58 - pico-train - INFO - ├── Loss: 4.7698
2025-08-30 21:33:58 - pico-train - INFO - ├── Learning Rate: 2.00e-05
2025-08-30 21:33:58 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-30 21:34:50 - pico-train - INFO - Step 95500 -- 🔄 Training Metrics
2025-08-30 21:34:50 - pico-train - INFO - ├── Loss: 4.7614
2025-08-30 21:34:50 - pico-train - INFO - ├── Learning Rate: 2.00e-05
2025-08-30 21:34:50 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-30 21:35:42 - pico-train - INFO - Step 95600 -- 🔄 Training Metrics
2025-08-30 21:35:42 - pico-train - INFO - ├── Loss: 4.7906
2025-08-30 21:35:42 - pico-train - INFO - ├── Learning Rate: 2.00e-05
2025-08-30 21:35:42 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-30 21:36:34 - pico-train - INFO - Step 95700 -- 🔄 Training Metrics
2025-08-30 21:36:34 - pico-train - INFO - ├── Loss: 4.7685
2025-08-30 21:36:34 - pico-train - INFO - ├── Learning Rate: 2.00e-05
2025-08-30 21:36:34 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-30 21:37:26 - pico-train - INFO - Step 95800 -- 🔄 Training Metrics
2025-08-30 21:37:26 - pico-train - INFO - ├── Loss: 4.7466
2025-08-30 21:37:26 - pico-train - INFO - ├── Learning Rate: 2.00e-05
2025-08-30 21:37:26 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-30 21:38:18 - pico-train - INFO - Step 95900 -- 🔄 Training Metrics
2025-08-30 21:38:18 - pico-train - INFO - ├── Loss: 4.7771
2025-08-30 21:38:18 - pico-train - INFO - ├── Learning Rate: 2.00e-05
2025-08-30 21:38:18 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-30 21:39:09 - pico-train - INFO - Step 96000 -- ๐Ÿ’พ Saving Checkpoint
2025-08-30 21:41:02 - pico-train - INFO - Step 96000 -- ๐Ÿ“Š Evaluation Results
2025-08-30 21:41:02 - pico-train - INFO - โ””โ”€โ”€ paloma: inf
2025-08-30 21:41:03 - pico-train - INFO - Step 96000 -- ๐Ÿ”„ Training Metrics
2025-08-30 21:41:03 - pico-train - INFO - โ”œโ”€โ”€ Loss: 4.7812
2025-08-30 21:41:03 - pico-train - INFO - โ”œโ”€โ”€ Learning Rate: 2.00e-05
2025-08-30 21:41:03 - pico-train - INFO - โ””โ”€โ”€ Inf/NaN count: 0
2025-08-30 21:41:03 - pico-train - INFO - Step 96000 -- ๐Ÿ“ˆ Saving Learning Dynamics
2025-08-30 21:41:58 - pico-train - INFO - Step 96100 -- 🔄 Training Metrics
2025-08-30 21:41:58 - pico-train - INFO - ├── Loss: 4.7849
2025-08-30 21:41:58 - pico-train - INFO - ├── Learning Rate: 2.00e-05
2025-08-30 21:41:58 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-30 21:42:51 - pico-train - INFO - Step 96200 -- 🔄 Training Metrics
2025-08-30 21:42:51 - pico-train - INFO - ├── Loss: 4.7649
2025-08-30 21:42:51 - pico-train - INFO - ├── Learning Rate: 2.00e-05
2025-08-30 21:42:51 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-30 21:43:45 - pico-train - INFO - Step 96300 -- 🔄 Training Metrics
2025-08-30 21:43:45 - pico-train - INFO - ├── Loss: 4.7696
2025-08-30 21:43:45 - pico-train - INFO - ├── Learning Rate: 2.00e-05
2025-08-30 21:43:45 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-30 21:44:37 - pico-train - INFO - Step 96400 -- 🔄 Training Metrics
2025-08-30 21:44:37 - pico-train - INFO - ├── Loss: 4.7768
2025-08-30 21:44:37 - pico-train - INFO - ├── Learning Rate: 2.00e-05
2025-08-30 21:44:37 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-30 21:45:31 - pico-train - INFO - Step 96500 -- 🔄 Training Metrics
2025-08-30 21:45:31 - pico-train - INFO - ├── Loss: 4.7631
2025-08-30 21:45:31 - pico-train - INFO - ├── Learning Rate: 2.00e-05
2025-08-30 21:45:31 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-30 21:46:24 - pico-train - INFO - Step 96600 -- 🔄 Training Metrics
2025-08-30 21:46:24 - pico-train - INFO - ├── Loss: 4.7730
2025-08-30 21:46:24 - pico-train - INFO - ├── Learning Rate: 2.00e-05
2025-08-30 21:46:24 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-30 21:47:17 - pico-train - INFO - Step 96700 -- 🔄 Training Metrics
2025-08-30 21:47:17 - pico-train - INFO - ├── Loss: 4.7832
2025-08-30 21:47:17 - pico-train - INFO - ├── Learning Rate: 2.00e-05
2025-08-30 21:47:17 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-30 21:48:10 - pico-train - INFO - Step 96800 -- 🔄 Training Metrics
2025-08-30 21:48:10 - pico-train - INFO - ├── Loss: 4.7508
2025-08-30 21:48:10 - pico-train - INFO - ├── Learning Rate: 2.00e-05
2025-08-30 21:48:10 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-30 21:49:03 - pico-train - INFO - Step 96900 -- 🔄 Training Metrics
2025-08-30 21:49:03 - pico-train - INFO - ├── Loss: 4.7688
2025-08-30 21:49:03 - pico-train - INFO - ├── Learning Rate: 2.00e-05
2025-08-30 21:49:03 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-30 21:49:56 - pico-train - INFO - Step 97000 -- 🔄 Training Metrics
2025-08-30 21:49:56 - pico-train - INFO - ├── Loss: 4.7887
2025-08-30 21:49:56 - pico-train - INFO - ├── Learning Rate: 2.00e-05
2025-08-30 21:49:56 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-30 21:50:49 - pico-train - INFO - Step 97100 -- 🔄 Training Metrics
2025-08-30 21:50:49 - pico-train - INFO - ├── Loss: 4.7774
2025-08-30 21:50:49 - pico-train - INFO - ├── Learning Rate: 2.00e-05
2025-08-30 21:50:49 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-30 21:51:43 - pico-train - INFO - Step 97200 -- 🔄 Training Metrics
2025-08-30 21:51:43 - pico-train - INFO - ├── Loss: 4.7731
2025-08-30 21:51:43 - pico-train - INFO - ├── Learning Rate: 2.00e-05
2025-08-30 21:51:43 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-30 21:52:34 - pico-train - INFO - Step 97300 -- 🔄 Training Metrics
2025-08-30 21:52:34 - pico-train - INFO - ├── Loss: 4.7823
2025-08-30 21:52:34 - pico-train - INFO - ├── Learning Rate: 2.00e-05
2025-08-30 21:52:34 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-30 21:53:27 - pico-train - INFO - Step 97400 -- 🔄 Training Metrics
2025-08-30 21:53:27 - pico-train - INFO - ├── Loss: 4.7782
2025-08-30 21:53:27 - pico-train - INFO - ├── Learning Rate: 2.00e-05
2025-08-30 21:53:27 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-30 21:54:20 - pico-train - INFO - Step 97500 -- 🔄 Training Metrics
2025-08-30 21:54:20 - pico-train - INFO - ├── Loss: 4.7935
2025-08-30 21:54:20 - pico-train - INFO - ├── Learning Rate: 2.00e-05
2025-08-30 21:54:20 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-30 21:55:13 - pico-train - INFO - Step 97600 -- 🔄 Training Metrics
2025-08-30 21:55:13 - pico-train - INFO - ├── Loss: 4.7908
2025-08-30 21:55:13 - pico-train - INFO - ├── Learning Rate: 2.00e-05
2025-08-30 21:55:13 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-30 21:56:07 - pico-train - INFO - Step 97700 -- 🔄 Training Metrics
2025-08-30 21:56:07 - pico-train - INFO - ├── Loss: 4.7824
2025-08-30 21:56:07 - pico-train - INFO - ├── Learning Rate: 2.00e-05
2025-08-30 21:56:07 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-30 21:56:59 - pico-train - INFO - Step 97800 -- 🔄 Training Metrics
2025-08-30 21:56:59 - pico-train - INFO - ├── Loss: 4.7913
2025-08-30 21:56:59 - pico-train - INFO - ├── Learning Rate: 2.00e-05
2025-08-30 21:56:59 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-30 21:57:53 - pico-train - INFO - Step 97900 -- 🔄 Training Metrics
2025-08-30 21:57:53 - pico-train - INFO - ├── Loss: 4.7547
2025-08-30 21:57:53 - pico-train - INFO - ├── Learning Rate: 2.00e-05
2025-08-30 21:57:53 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-30 21:58:45 - pico-train - INFO - Step 98000 -- 💾 Saving Checkpoint
2025-08-30 22:00:46 - pico-train - INFO - Step 98000 -- 📊 Evaluation Results
2025-08-30 22:00:46 - pico-train - INFO - └── paloma: inf
2025-08-30 22:00:46 - pico-train - INFO - Step 98000 -- 🔄 Training Metrics
2025-08-30 22:00:46 - pico-train - INFO - ├── Loss: 4.7784
2025-08-30 22:00:46 - pico-train - INFO - ├── Learning Rate: 2.00e-05
2025-08-30 22:00:46 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-30 22:00:46 - pico-train - INFO - Step 98000 -- 📈 Saving Learning Dynamics
2025-08-30 22:01:41 - pico-train - INFO - Step 98100 -- 🔄 Training Metrics
2025-08-30 22:01:41 - pico-train - INFO - ├── Loss: 4.7555
2025-08-30 22:01:41 - pico-train - INFO - ├── Learning Rate: 2.00e-05
2025-08-30 22:01:41 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-30 22:02:33 - pico-train - INFO - Step 98200 -- 🔄 Training Metrics
2025-08-30 22:02:33 - pico-train - INFO - ├── Loss: 4.7774
2025-08-30 22:02:33 - pico-train - INFO - ├── Learning Rate: 2.00e-05
2025-08-30 22:02:33 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-30 22:03:25 - pico-train - INFO - Step 98300 -- 🔄 Training Metrics
2025-08-30 22:03:25 - pico-train - INFO - ├── Loss: 4.7961
2025-08-30 22:03:25 - pico-train - INFO - ├── Learning Rate: 2.00e-05
2025-08-30 22:03:25 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-30 22:04:17 - pico-train - INFO - Step 98400 -- 🔄 Training Metrics
2025-08-30 22:04:17 - pico-train - INFO - ├── Loss: 4.7770
2025-08-30 22:04:17 - pico-train - INFO - ├── Learning Rate: 2.00e-05
2025-08-30 22:04:17 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-30 22:05:09 - pico-train - INFO - Step 98500 -- 🔄 Training Metrics
2025-08-30 22:05:09 - pico-train - INFO - ├── Loss: 4.7789
2025-08-30 22:05:09 - pico-train - INFO - ├── Learning Rate: 2.00e-05
2025-08-30 22:05:09 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-30 22:06:01 - pico-train - INFO - Step 98600 -- 🔄 Training Metrics
2025-08-30 22:06:01 - pico-train - INFO - ├── Loss: 4.7968
2025-08-30 22:06:01 - pico-train - INFO - ├── Learning Rate: 2.00e-05
2025-08-30 22:06:01 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-30 22:06:53 - pico-train - INFO - Step 98700 -- 🔄 Training Metrics
2025-08-30 22:06:53 - pico-train - INFO - ├── Loss: 4.7691
2025-08-30 22:06:53 - pico-train - INFO - ├── Learning Rate: 2.00e-05
2025-08-30 22:06:53 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-30 22:07:45 - pico-train - INFO - Step 98800 -- 🔄 Training Metrics
2025-08-30 22:07:45 - pico-train - INFO - ├── Loss: 4.7841
2025-08-30 22:07:45 - pico-train - INFO - ├── Learning Rate: 2.00e-05
2025-08-30 22:07:45 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-30 22:08:37 - pico-train - INFO - Step 98900 -- 🔄 Training Metrics
2025-08-30 22:08:37 - pico-train - INFO - ├── Loss: 4.7785
2025-08-30 22:08:37 - pico-train - INFO - ├── Learning Rate: 2.00e-05
2025-08-30 22:08:37 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-30 22:09:28 - pico-train - INFO - Step 99000 -- 🔄 Training Metrics
2025-08-30 22:09:28 - pico-train - INFO - ├── Loss: 4.7770
2025-08-30 22:09:28 - pico-train - INFO - ├── Learning Rate: 2.00e-05
2025-08-30 22:09:28 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-30 22:10:20 - pico-train - INFO - Step 99100 -- 🔄 Training Metrics
2025-08-30 22:10:20 - pico-train - INFO - ├── Loss: 4.7774
2025-08-30 22:10:20 - pico-train - INFO - ├── Learning Rate: 2.00e-05
2025-08-30 22:10:20 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-30 22:11:12 - pico-train - INFO - Step 99200 -- 🔄 Training Metrics
2025-08-30 22:11:12 - pico-train - INFO - ├── Loss: 4.7946
2025-08-30 22:11:12 - pico-train - INFO - ├── Learning Rate: 2.00e-05
2025-08-30 22:11:12 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-30 22:12:04 - pico-train - INFO - Step 99300 -- 🔄 Training Metrics
2025-08-30 22:12:04 - pico-train - INFO - ├── Loss: 4.7804
2025-08-30 22:12:04 - pico-train - INFO - ├── Learning Rate: 2.00e-05
2025-08-30 22:12:04 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-30 22:12:56 - pico-train - INFO - Step 99400 -- 🔄 Training Metrics
2025-08-30 22:12:56 - pico-train - INFO - ├── Loss: 4.7579
2025-08-30 22:12:56 - pico-train - INFO - ├── Learning Rate: 2.00e-05
2025-08-30 22:12:56 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-30 22:13:48 - pico-train - INFO - Step 99500 -- 🔄 Training Metrics
2025-08-30 22:13:48 - pico-train - INFO - ├── Loss: 4.7916
2025-08-30 22:13:48 - pico-train - INFO - ├── Learning Rate: 2.00e-05
2025-08-30 22:13:48 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-30 22:14:40 - pico-train - INFO - Step 99600 -- 🔄 Training Metrics
2025-08-30 22:14:40 - pico-train - INFO - ├── Loss: 4.7512
2025-08-30 22:14:40 - pico-train - INFO - ├── Learning Rate: 2.00e-05
2025-08-30 22:14:40 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-30 22:15:32 - pico-train - INFO - Step 99700 -- 🔄 Training Metrics
2025-08-30 22:15:32 - pico-train - INFO - ├── Loss: 4.7774
2025-08-30 22:15:32 - pico-train - INFO - ├── Learning Rate: 2.00e-05
2025-08-30 22:15:32 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-30 22:16:24 - pico-train - INFO - Step 99800 -- 🔄 Training Metrics
2025-08-30 22:16:24 - pico-train - INFO - ├── Loss: 4.7938
2025-08-30 22:16:24 - pico-train - INFO - ├── Learning Rate: 2.00e-05
2025-08-30 22:16:24 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-30 22:17:16 - pico-train - INFO - Step 99900 -- 🔄 Training Metrics
2025-08-30 22:17:16 - pico-train - INFO - ├── Loss: 4.7923
2025-08-30 22:17:16 - pico-train - INFO - ├── Learning Rate: 2.00e-05
2025-08-30 22:17:16 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-30 22:18:07 - pico-train - INFO - Step 100000 -- 💾 Saving Checkpoint
2025-08-30 22:19:57 - pico-train - INFO - Step 100000 -- 📊 Evaluation Results
2025-08-30 22:19:57 - pico-train - INFO - └── paloma: inf
2025-08-30 22:19:57 - pico-train - INFO - 🎉 Training complete! Final step: 100000