2025-08-30 02:46:39 - pico-train - INFO - Step 32500 -- 📊 Evaluation Results
2025-08-30 02:46:39 - pico-train - INFO - └── paloma: 3.383932336875151e+26
2025-08-30 02:46:41 - pico-train - INFO - ==================================================
2025-08-30 02:46:41 - pico-train - INFO - ✨ Training Configuration
2025-08-30 02:46:41 - pico-train - INFO - ==================================================
2025-08-30 02:46:41 - pico-train - INFO - ╭─────────────────────────────────────────────────────╮
2025-08-30 02:46:41 - pico-train - INFO - │ checkpointing:                                      │
2025-08-30 02:46:41 - pico-train - INFO - │   checkpoints_dir: checkpoints                      │
2025-08-30 02:46:41 - pico-train - INFO - │   evaluation:                                       │
2025-08-30 02:46:41 - pico-train - INFO - │     eval_results_dir: eval_results                  │
2025-08-30 02:46:41 - pico-train - INFO - │   fabric_checkpoint_dir: fabric_state               │
2025-08-30 02:46:41 - pico-train - INFO - │   fabric_checkpoint_filename: checkpoint.pt         │
2025-08-30 02:46:41 - pico-train - INFO - │   hf_checkpoint:                                    │
2025-08-30 02:46:41 - pico-train - INFO - │     collection_slug: null                           │
2025-08-30 02:46:41 - pico-train - INFO - │     repo_id: ThomasTheMaker/pico-decoder-tiny       │
2025-08-30 02:46:41 - pico-train - INFO - │   learning_dynamics:                                │
2025-08-30 02:46:41 - pico-train - INFO - │     batch_size: 1                                   │
2025-08-30 02:46:41 - pico-train - INFO - │     eval_data: null                                 │
2025-08-30 02:46:41 - pico-train - INFO - │     layer_suffixes:                                 │
2025-08-30 02:46:41 - pico-train - INFO - │     - attention.v_proj                              │
2025-08-30 02:46:41 - pico-train - INFO - │     - attention.o_proj                              │
2025-08-30 02:46:41 - pico-train - INFO - │     - swiglu.w_2                                    │
2025-08-30 02:46:41 - pico-train - INFO - │     sequence_idx: -1                                │
2025-08-30 02:46:41 - pico-train - INFO - │   learning_dynamics_dir: learning_dynamics          │
2025-08-30 02:46:41 - pico-train - INFO - │   logs_dir: logs                                    │
2025-08-30 02:46:41 - pico-train - INFO - │   run_name: pico-decoder-tiny-dolma5M-v1            │
2025-08-30 02:46:41 - pico-train - INFO - │   runs_dir: runs                                    │
2025-08-30 02:46:41 - pico-train - INFO - │   save_every_n_steps: 500                           │
2025-08-30 02:46:41 - pico-train - INFO - │   save_to_hf: true                                  │
2025-08-30 02:46:41 - pico-train - INFO - │   training:                                         │
2025-08-30 02:46:41 - pico-train - INFO - │     auto_resume: true                               │
2025-08-30 02:46:41 - pico-train - INFO - │ data:                                               │
2025-08-30 02:46:41 - pico-train - INFO - │   dataloader:                                       │
2025-08-30 02:46:41 - pico-train - INFO - │     batch_size: 4                                   │
2025-08-30 02:46:41 - pico-train - INFO - │   dataset:                                          │
2025-08-30 02:46:41 - pico-train - INFO - │     name: ThomasTheMaker/pretokenized-dolma-5M      │
2025-08-30 02:46:41 - pico-train - INFO - │   tokenizer:                                        │
2025-08-30 02:46:41 - pico-train - INFO - │     name: allenai/OLMo-7B-0724-hf                   │
2025-08-30 02:46:41 - pico-train - INFO - │     vocab_size: 50304                               │
2025-08-30 02:46:41 - pico-train - INFO - │ evaluation:                                         │
2025-08-30 02:46:41 - pico-train - INFO - │   metrics:                                          │
2025-08-30 02:46:41 - pico-train - INFO - │   - paloma                                          │
2025-08-30 02:46:41 - pico-train - INFO - │   paloma:                                           │
2025-08-30 02:46:41 - pico-train - INFO - │     batch_size: 1                                   │
2025-08-30 02:46:41 - pico-train - INFO - │     dataset_name: pico-lm/pretokenized-paloma-tinsy │
2025-08-30 02:46:41 - pico-train - INFO - │     dataset_split: val                              │
2025-08-30 02:46:41 - pico-train - INFO - │     max_length: 2048                                │
2025-08-30 02:46:41 - pico-train - INFO - │ model:                                              │
2025-08-30 02:46:41 - pico-train - INFO - │   activation_hidden_dim: 384                        │
2025-08-30 02:46:41 - pico-train - INFO - │   attention_n_heads: 12                             │
2025-08-30 02:46:41 - pico-train - INFO - │   attention_n_kv_heads: 4                           │
2025-08-30 02:46:41 - pico-train - INFO - │   batch_size: 1024                                  │
2025-08-30 02:46:41 - pico-train - INFO - │   d_model: 96                                       │
2025-08-30 02:46:41 - pico-train - INFO - │   max_seq_len: 2048                                 │
2025-08-30 02:46:41 - pico-train - INFO - │   model_type: pico_decoder                          │
2025-08-30 02:46:41 - pico-train - INFO - │   n_layers: 12                                      │
2025-08-30 02:46:41 - pico-train - INFO - │   norm_eps: 1.0e-06                                 │
2025-08-30 02:46:41 - pico-train - INFO - │   position_emb_theta: 10000.0                       │
2025-08-30 02:46:41 - pico-train - INFO - │   vocab_size: 50304                                 │
2025-08-30 02:46:41 - pico-train - INFO - │ monitoring:                                         │
2025-08-30 02:46:41 - pico-train - INFO - │   logging:                                          │
2025-08-30 02:46:41 - pico-train - INFO - │     log_every_n_steps: 25                           │
2025-08-30 02:46:41 - pico-train - INFO - │     log_level: INFO                                 │
2025-08-30 02:46:41 - pico-train - INFO - │   save_to_wandb: false                              │
2025-08-30 02:46:41 - pico-train - INFO - │   wandb:                                            │
2025-08-30 02:46:41 - pico-train - INFO - │     entity: boymyc                                  │
2025-08-30 02:46:41 - pico-train - INFO - │     project: pico-decoder-tiny                      │
2025-08-30 02:46:41 - pico-train - INFO - │ training:                                           │
2025-08-30 02:46:41 - pico-train - INFO - │   fabric:                                           │
2025-08-30 02:46:41 - pico-train - INFO - │     accelerator: cuda                               │
2025-08-30 02:46:41 - pico-train - INFO - │     num_devices: 1                                  │
2025-08-30 02:46:41 - pico-train - INFO - │     num_nodes: 1                                    │
2025-08-30 02:46:41 - pico-train - INFO - │     precision: bf16-mixed                           │
2025-08-30 02:46:41 - pico-train - INFO - │   max_steps: 20000                                  │
2025-08-30 02:46:41 - pico-train - INFO - │   optimization:                                     │
2025-08-30 02:46:41 - pico-train - INFO - │     gradient_accumulation_steps: 4                  │
2025-08-30 02:46:41 - pico-train - INFO - │     lr: 5.0e-05                                     │
2025-08-30 02:46:41 - pico-train - INFO - │     lr_scheduler: cosine                            │
2025-08-30 02:46:41 - pico-train - INFO - │     lr_warmup_steps: 8000                           │
2025-08-30 02:46:41 - pico-train - INFO - │     optimizer: adamw                                │
2025-08-30 02:46:41 - pico-train - INFO - │                                                     │
2025-08-30 02:46:41 - pico-train - INFO - ╰─────────────────────────────────────────────────────╯
2025-08-30 02:46:41 - pico-train - INFO - ==================================================
2025-08-30 02:46:41 - pico-train - INFO - ⛭ Runtime Summary:
2025-08-30 02:46:41 - pico-train - INFO - ==================================================
2025-08-30 02:46:41 - pico-train - INFO - Starting from step: 32500
2025-08-30 02:46:41 - pico-train - INFO - Model Setup:
2025-08-30 02:46:41 - pico-train - INFO - └─ Total Parameters: 11,282,784
2025-08-30 02:46:41 - pico-train - INFO - └─ Trainable Parameters: 11,282,784
2025-08-30 02:46:41 - pico-train - INFO - Distributed Setup:
2025-08-30 02:46:41 - pico-train - INFO - └─ Number of Devices: 1
2025-08-30 02:46:41 - pico-train - INFO - └─ Device Type: NVIDIA GeForce RTX 5090
2025-08-30 02:46:41 - pico-train - INFO - └─ Available Memory: 33.68 GB
2025-08-30 02:46:41 - pico-train - INFO - Software Setup:
2025-08-30 02:46:41 - pico-train - INFO - └─ Python Version: 3.10.12
2025-08-30 02:46:41 - pico-train - INFO - └─ PyTorch Version: 2.8.0+cu128
2025-08-30 02:46:41 - pico-train - INFO - └─ CUDA Version: 12.8
2025-08-30 02:46:41 - pico-train - INFO - └─ Operating System: Linux 6.8.0-63-generic
2025-08-30 02:46:41 - pico-train - INFO - Batch Size Configuration:
2025-08-30 02:46:41 - pico-train - INFO - └─ Global Batch Size: 4
2025-08-30 02:46:41 - pico-train - INFO - └─ Per Device Batch Size: 1
2025-08-30 02:46:41 - pico-train - INFO - └─ Gradient Accumulation Steps: 4
2025-08-30 02:46:41 - pico-train - INFO - ==================================================
2025-08-30 02:46:42 - pico-train - INFO - Step 32500 -- 🔄 Training Metrics
2025-08-30 02:46:42 - pico-train - INFO - ├── Loss: 6.3328
2025-08-30 02:46:42 - pico-train - INFO - ├── Learning Rate: 6.48e-06
2025-08-30 02:46:42 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-30 02:46:42 - pico-train - INFO - Step 32500 -- 📈 Saving Learning Dynamics
2025-08-30 02:46:58 - pico-train - INFO - Step 32525 -- 🔄 Training Metrics
2025-08-30 02:46:58 - pico-train - INFO - ├── Loss: 6.1925
2025-08-30 02:46:58 - pico-train - INFO - ├── Learning Rate: 6.44e-06
2025-08-30 02:46:58 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-30 02:47:11 - pico-train - INFO - Step 32550 -- 🔄 Training Metrics
2025-08-30 02:47:11 - pico-train - INFO - ├── Loss: 6.1404
2025-08-30 02:47:11 - pico-train - INFO - ├── Learning Rate: 6.39e-06
2025-08-30 02:47:11 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-30 02:47:23 - pico-train - INFO - Step 32575 -- 🔄 Training Metrics
2025-08-30 02:47:23 - pico-train - INFO - ├── Loss: 6.0377
2025-08-30 02:47:23 - pico-train - INFO - ├── Learning Rate: 6.35e-06
2025-08-30 02:47:23 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-30 02:47:36 - pico-train - INFO - Step 32600 -- 🔄 Training Metrics
2025-08-30 02:47:36 - pico-train - INFO - ├── Loss: 6.1541
2025-08-30 02:47:36 - pico-train - INFO - ├── Learning Rate: 6.31e-06
2025-08-30 02:47:36 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-30 02:47:49 - pico-train - INFO - Step 32625 -- 🔄 Training Metrics
2025-08-30 02:47:49 - pico-train - INFO - ├── Loss: 6.1989
2025-08-30 02:47:49 - pico-train - INFO - ├── Learning Rate: 6.27e-06
2025-08-30 02:47:49 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-30 02:48:01 - pico-train - INFO - Step 32650 -- 🔄 Training Metrics
2025-08-30 02:48:01 - pico-train - INFO - ├── Loss: 6.1580
2025-08-30 02:48:01 - pico-train - INFO - ├── Learning Rate: 6.23e-06
2025-08-30 02:48:01 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-30 02:48:14 - pico-train - INFO - Step 32675 -- 🔄 Training Metrics
2025-08-30 02:48:14 - pico-train - INFO - ├── Loss: 6.1637
2025-08-30 02:48:14 - pico-train - INFO - ├── Learning Rate: 6.19e-06
2025-08-30 02:48:14 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-30 02:48:27 - pico-train - INFO - Step 32700 -- 🔄 Training Metrics
2025-08-30 02:48:27 - pico-train - INFO - ├── Loss: 6.2011
2025-08-30 02:48:27 - pico-train - INFO - ├── Learning Rate: 6.15e-06
2025-08-30 02:48:27 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-30 02:48:44 - pico-train - INFO - Step 32725 -- 🔄 Training Metrics
2025-08-30 02:48:44 - pico-train - INFO - ├── Loss: 6.1889
2025-08-30 02:48:44 - pico-train - INFO - ├── Learning Rate: 6.11e-06
2025-08-30 02:48:44 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-30 02:49:00 - pico-train - INFO - Step 32750 -- 🔄 Training Metrics
2025-08-30 02:49:00 - pico-train - INFO - ├── Loss: 6.1063
2025-08-30 02:49:00 - pico-train - INFO - ├── Learning Rate: 6.07e-06
2025-08-30 02:49:00 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-30 02:49:16 - pico-train - INFO - Step 32775 -- 🔄 Training Metrics
2025-08-30 02:49:16 - pico-train - INFO - ├── Loss: 6.0999
2025-08-30 02:49:16 - pico-train - INFO - ├── Learning Rate: 6.03e-06
2025-08-30 02:49:16 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-30 02:49:33 - pico-train - INFO - Step 32800 -- 🔄 Training Metrics
2025-08-30 02:49:33 - pico-train - INFO - ├── Loss: 6.0746
2025-08-30 02:49:33 - pico-train - INFO - ├── Learning Rate: 5.99e-06
2025-08-30 02:49:33 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-30 02:49:49 - pico-train - INFO - Step 32825 -- 🔄 Training Metrics
2025-08-30 02:49:49 - pico-train - INFO - ├── Loss: 6.0813
2025-08-30 02:49:49 - pico-train - INFO - ├── Learning Rate: 5.95e-06
2025-08-30 02:49:49 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-30 02:50:06 - pico-train - INFO - Step 32850 -- 🔄 Training Metrics
2025-08-30 02:50:06 - pico-train - INFO - ├── Loss: 6.1923
2025-08-30 02:50:06 - pico-train - INFO - ├── Learning Rate: 5.91e-06
2025-08-30 02:50:06 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-30 02:50:22 - pico-train - INFO - Step 32875 -- 🔄 Training Metrics
2025-08-30 02:50:22 - pico-train - INFO - ├── Loss: 6.1305
2025-08-30 02:50:22 - pico-train - INFO - ├── Learning Rate: 5.87e-06
2025-08-30 02:50:22 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-30 02:50:39 - pico-train - INFO - Step 32900 -- 🔄 Training Metrics
2025-08-30 02:50:39 - pico-train - INFO - ├── Loss: 6.1007
2025-08-30 02:50:39 - pico-train - INFO - ├── Learning Rate: 5.83e-06
2025-08-30 02:50:39 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-30 02:50:55 - pico-train - INFO - Step 32925 -- 🔄 Training Metrics
2025-08-30 02:50:55 - pico-train - INFO - ├── Loss: 6.2576
2025-08-30 02:50:55 - pico-train - INFO - ├── Learning Rate: 5.79e-06
2025-08-30 02:50:55 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-30 02:51:12 - pico-train - INFO - Step 32950 -- 🔄 Training Metrics
2025-08-30 02:51:12 - pico-train - INFO - ├── Loss: 6.0948
2025-08-30 02:51:12 - pico-train - INFO - ├── Learning Rate: 5.75e-06
2025-08-30 02:51:12 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-30 02:51:28 - pico-train - INFO - Step 32975 -- 🔄 Training Metrics
2025-08-30 02:51:28 - pico-train - INFO - ├── Loss: 6.2078
2025-08-30 02:51:28 - pico-train - INFO - ├── Learning Rate: 5.71e-06
2025-08-30 02:51:28 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-30 02:51:44 - pico-train - INFO - Step 33000 -- 💾 Saving Checkpoint
2025-08-30 02:54:25 - pico-train - INFO - Step 33000 -- 📊 Evaluation Results
2025-08-30 02:54:25 - pico-train - INFO - └── paloma: 3.740115385495705e+26
2025-08-30 02:54:28 - pico-train - INFO - Step 33000 -- 🔄 Training Metrics
2025-08-30 02:54:28 - pico-train - INFO - ├── Loss: 6.1535
2025-08-30 02:54:28 - pico-train - INFO - ├── Learning Rate: 5.67e-06
2025-08-30 02:54:28 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-30 02:54:28 - pico-train - INFO - Step 33000 -- 📈 Saving Learning Dynamics
2025-08-30 02:54:48 - pico-train - INFO - Step 33025 -- 🔄 Training Metrics
2025-08-30 02:54:48 - pico-train - INFO - ├── Loss: 6.1780
2025-08-30 02:54:48 - pico-train - INFO - ├── Learning Rate: 5.64e-06
2025-08-30 02:54:48 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-30 02:55:00 - pico-train - INFO - Step 33050 -- 🔄 Training Metrics
2025-08-30 02:55:00 - pico-train - INFO - ├── Loss: 6.1420
2025-08-30 02:55:00 - pico-train - INFO - ├── Learning Rate: 5.60e-06
2025-08-30 02:55:00 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-30 02:55:13 - pico-train - INFO - Step 33075 -- 🔄 Training Metrics
2025-08-30 02:55:13 - pico-train - INFO - ├── Loss: 6.1944
2025-08-30 02:55:13 - pico-train - INFO - ├── Learning Rate: 5.56e-06
2025-08-30 02:55:13 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-30 02:55:26 - pico-train - INFO - Step 33100 -- 🔄 Training Metrics
2025-08-30 02:55:26 - pico-train - INFO - ├── Loss: 6.2635
2025-08-30 02:55:26 - pico-train - INFO - ├── Learning Rate: 5.52e-06
2025-08-30 02:55:26 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-30 02:55:38 - pico-train - INFO - Step 33125 -- 🔄 Training Metrics
2025-08-30 02:55:38 - pico-train - INFO - ├── Loss: 6.1137
2025-08-30 02:55:38 - pico-train - INFO - ├── Learning Rate: 5.48e-06
2025-08-30 02:55:38 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-30 02:55:51 - pico-train - INFO - Step 33150 -- 🔄 Training Metrics
2025-08-30 02:55:51 - pico-train - INFO - ├── Loss: 6.2020
2025-08-30 02:55:51 - pico-train - INFO - ├── Learning Rate: 5.44e-06
2025-08-30 02:55:51 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-30 02:56:03 - pico-train - INFO - Step 33175 -- 🔄 Training Metrics
2025-08-30 02:56:03 - pico-train - INFO - ├── Loss: 6.1916
2025-08-30 02:56:03 - pico-train - INFO - ├── Learning Rate: 5.41e-06
2025-08-30 02:56:03 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-30 02:56:16 - pico-train - INFO - Step 33200 -- 🔄 Training Metrics
2025-08-30 02:56:16 - pico-train - INFO - ├── Loss: 6.1736
2025-08-30 02:56:16 - pico-train - INFO - ├── Learning Rate: 5.37e-06
2025-08-30 02:56:16 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-30 02:56:29 - pico-train - INFO - Step 33225 -- 🔄 Training Metrics
2025-08-30 02:56:29 - pico-train - INFO - ├── Loss: 6.1748
2025-08-30 02:56:29 - pico-train - INFO - ├── Learning Rate: 5.33e-06
2025-08-30 02:56:29 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-30 02:56:41 - pico-train - INFO - Step 33250 -- 🔄 Training Metrics
2025-08-30 02:56:41 - pico-train - INFO - ├── Loss: 6.1251
2025-08-30 02:56:41 - pico-train - INFO - ├── Learning Rate: 5.29e-06
2025-08-30 02:56:41 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-30 02:56:54 - pico-train - INFO - Step 33275 -- 🔄 Training Metrics
2025-08-30 02:56:54 - pico-train - INFO - ├── Loss: 6.1933
2025-08-30 02:56:54 - pico-train - INFO - ├── Learning Rate: 5.25e-06
2025-08-30 02:56:54 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-30 02:57:07 - pico-train - INFO - Step 33300 -- 🔄 Training Metrics
2025-08-30 02:57:07 - pico-train - INFO - ├── Loss: 6.1758
2025-08-30 02:57:07 - pico-train - INFO - ├── Learning Rate: 5.22e-06
2025-08-30 02:57:07 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-30 02:57:19 - pico-train - INFO - Step 33325 -- 🔄 Training Metrics
2025-08-30 02:57:19 - pico-train - INFO - ├── Loss: 6.1299
2025-08-30 02:57:19 - pico-train - INFO - ├── Learning Rate: 5.18e-06
2025-08-30 02:57:19 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-30 02:57:32 - pico-train - INFO - Step 33350 -- 🔄 Training Metrics
2025-08-30 02:57:32 - pico-train - INFO - ├── Loss: 6.1500
2025-08-30 02:57:32 - pico-train - INFO - ├── Learning Rate: 5.14e-06
2025-08-30 02:57:32 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-30 02:57:45 - pico-train - INFO - Step 33375 -- 🔄 Training Metrics
2025-08-30 02:57:45 - pico-train - INFO - ├── Loss: 6.1306
2025-08-30 02:57:45 - pico-train - INFO - ├── Learning Rate: 5.10e-06
2025-08-30 02:57:45 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-30 02:57:57 - pico-train - INFO - Step 33400 -- 🔄 Training Metrics
2025-08-30 02:57:57 - pico-train - INFO - ├── Loss: 6.1234
2025-08-30 02:57:57 - pico-train - INFO - ├── Learning Rate: 5.07e-06
2025-08-30 02:57:57 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-30 02:58:10 - pico-train - INFO - Step 33425 -- 🔄 Training Metrics
2025-08-30 02:58:10 - pico-train - INFO - ├── Loss: 6.1160
2025-08-30 02:58:10 - pico-train - INFO - ├── Learning Rate: 5.03e-06
2025-08-30 02:58:10 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-30 02:58:22 - pico-train - INFO - Step 33450 -- 🔄 Training Metrics
2025-08-30 02:58:22 - pico-train - INFO - ├── Loss: 6.2380
2025-08-30 02:58:22 - pico-train - INFO - ├── Learning Rate: 5.00e-06
2025-08-30 02:58:22 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-30 02:58:35 - pico-train - INFO - Step 33475 -- 🔄 Training Metrics
2025-08-30 02:58:35 - pico-train - INFO - ├── Loss: 6.2252
2025-08-30 02:58:35 - pico-train - INFO - ├── Learning Rate: 5.00e-06
2025-08-30 02:58:35 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-30 02:58:47 - pico-train - INFO - Step 33500 -- 💾 Saving Checkpoint
2025-08-30 03:00:50 - pico-train - INFO - Step 33500 -- 📊 Evaluation Results
2025-08-30 03:00:50 - pico-train - INFO - └── paloma: 3.820544484937776e+26
2025-08-30 03:00:53 - pico-train - INFO - Step 33500 -- 🔄 Training Metrics
2025-08-30 03:00:53 - pico-train - INFO - ├── Loss: 6.0954
2025-08-30 03:00:53 - pico-train - INFO - ├── Learning Rate: 5.00e-06
2025-08-30 03:00:53 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-30 03:00:53 - pico-train - INFO - Step 33500 -- 📈 Saving Learning Dynamics
2025-08-30 03:01:09 - pico-train - INFO - Step 33525 -- 🔄 Training Metrics
2025-08-30 03:01:09 - pico-train - INFO - ├── Loss: 6.1169
2025-08-30 03:01:09 - pico-train - INFO - ├── Learning Rate: 5.00e-06
2025-08-30 03:01:09 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-30 03:01:21 - pico-train - INFO - Step 33550 -- 🔄 Training Metrics
2025-08-30 03:01:21 - pico-train - INFO - ├── Loss: 6.1243
2025-08-30 03:01:21 - pico-train - INFO - ├── Learning Rate: 5.00e-06
2025-08-30 03:01:21 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-30 03:01:34 - pico-train - INFO - Step 33575 -- 🔄 Training Metrics
2025-08-30 03:01:34 - pico-train - INFO - ├── Loss: 6.1169
2025-08-30 03:01:34 - pico-train - INFO - ├── Learning Rate: 5.00e-06
2025-08-30 03:01:34 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-30 03:01:47 - pico-train - INFO - Step 33600 -- 🔄 Training Metrics
2025-08-30 03:01:47 - pico-train - INFO - ├── Loss: 6.2248
2025-08-30 03:01:47 - pico-train - INFO - ├── Learning Rate: 5.00e-06
2025-08-30 03:01:47 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-30 03:01:59 - pico-train - INFO - Step 33625 -- 🔄 Training Metrics
2025-08-30 03:01:59 - pico-train - INFO - ├── Loss: 6.1610
2025-08-30 03:01:59 - pico-train - INFO - ├── Learning Rate: 5.00e-06
2025-08-30 03:01:59 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-30 03:02:12 - pico-train - INFO - Step 33650 -- 🔄 Training Metrics
2025-08-30 03:02:12 - pico-train - INFO - ├── Loss: 6.0577
2025-08-30 03:02:12 - pico-train - INFO - ├── Learning Rate: 5.00e-06
2025-08-30 03:02:12 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-30 03:02:24 - pico-train - INFO - Step 33675 -- 🔄 Training Metrics
2025-08-30 03:02:24 - pico-train - INFO - ├── Loss: 6.2475
2025-08-30 03:02:24 - pico-train - INFO - ├── Learning Rate: 5.00e-06
2025-08-30 03:02:24 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-30 03:02:37 - pico-train - INFO - Step 33700 -- 🔄 Training Metrics
2025-08-30 03:02:37 - pico-train - INFO - ├── Loss: 6.2246
2025-08-30 03:02:37 - pico-train - INFO - ├── Learning Rate: 5.00e-06
2025-08-30 03:02:37 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-30 03:02:49 - pico-train - INFO - Step 33725 -- 🔄 Training Metrics
2025-08-30 03:02:49 - pico-train - INFO - ├── Loss: 6.1246
2025-08-30 03:02:49 - pico-train - INFO - ├── Learning Rate: 5.00e-06
2025-08-30 03:02:49 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-30 03:03:02 - pico-train - INFO - Step 33750 -- 🔄 Training Metrics
2025-08-30 03:03:02 - pico-train - INFO - ├── Loss: 6.0986
2025-08-30 03:03:02 - pico-train - INFO - ├── Learning Rate: 5.00e-06
2025-08-30 03:03:02 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-30 03:03:14 - pico-train - INFO - Step 33775 -- 🔄 Training Metrics
2025-08-30 03:03:14 - pico-train - INFO - ├── Loss: 6.1151
2025-08-30 03:03:14 - pico-train - INFO - ├── Learning Rate: 5.00e-06
2025-08-30 03:03:14 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-30 03:03:27 - pico-train - INFO - Step 33800 -- 🔄 Training Metrics
2025-08-30 03:03:27 - pico-train - INFO - ├── Loss: 6.0620
2025-08-30 03:03:27 - pico-train - INFO - ├── Learning Rate: 5.00e-06
2025-08-30 03:03:27 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-30 03:03:40 - pico-train - INFO - Step 33825 -- 🔄 Training Metrics
2025-08-30 03:03:40 - pico-train - INFO - ├── Loss: 6.0791
2025-08-30 03:03:40 - pico-train - INFO - ├── Learning Rate: 5.00e-06
2025-08-30 03:03:40 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-30 03:03:52 - pico-train - INFO - Step 33850 -- 🔄 Training Metrics
2025-08-30 03:03:52 - pico-train - INFO - ├── Loss: 6.1092
2025-08-30 03:03:52 - pico-train - INFO - ├── Learning Rate: 5.00e-06
2025-08-30 03:03:52 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-30 03:04:05 - pico-train - INFO - Step 33875 -- 🔄 Training Metrics
2025-08-30 03:04:05 - pico-train - INFO - ├── Loss: 6.2018
2025-08-30 03:04:05 - pico-train - INFO - ├── Learning Rate: 5.00e-06
2025-08-30 03:04:05 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-30 03:04:17 - pico-train - INFO - Step 33900 -- 🔄 Training Metrics
2025-08-30 03:04:17 - pico-train - INFO - ├── Loss: 6.0896
2025-08-30 03:04:17 - pico-train - INFO - ├── Learning Rate: 5.00e-06
2025-08-30 03:04:17 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-30 03:04:30 - pico-train - INFO - Step 33925 -- 🔄 Training Metrics
2025-08-30 03:04:30 - pico-train - INFO - ├── Loss: 6.1839
2025-08-30 03:04:30 - pico-train - INFO - ├── Learning Rate: 5.00e-06
2025-08-30 03:04:30 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-30 03:04:43 - pico-train - INFO - Step 33950 -- 🔄 Training Metrics
2025-08-30 03:04:43 - pico-train - INFO - ├── Loss: 6.2086
2025-08-30 03:04:43 - pico-train - INFO - ├── Learning Rate: 5.00e-06
2025-08-30 03:04:43 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-30 03:04:55 - pico-train - INFO - Step 33975 -- 🔄 Training Metrics
2025-08-30 03:04:55 - pico-train - INFO - ├── Loss: 6.1246
2025-08-30 03:04:55 - pico-train - INFO - ├── Learning Rate: 5.00e-06
2025-08-30 03:04:55 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-30 03:05:07 - pico-train - INFO - Step 34000 -- 💾 Saving Checkpoint
2025-08-30 03:07:21 - pico-train - INFO - Step 34000 -- 📊 Evaluation Results
2025-08-30 03:07:21 - pico-train - INFO - └── paloma: 4.142503590895774e+26
2025-08-30 03:07:23 - pico-train - INFO - Step 34000 -- 🔄 Training Metrics
2025-08-30 03:07:23 - pico-train - INFO - ├── Loss: 6.1339
2025-08-30 03:07:23 - pico-train - INFO - ├── Learning Rate: 5.00e-06
2025-08-30 03:07:23 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-30 03:07:23 - pico-train - INFO - Step 34000 -- 📈 Saving Learning Dynamics
2025-08-30 03:07:38 - pico-train - INFO - Step 34025 -- 🔄 Training Metrics
2025-08-30 03:07:38 - pico-train - INFO - ├── Loss: 6.1100
2025-08-30 03:07:38 - pico-train - INFO - ├── Learning Rate: 5.00e-06
2025-08-30 03:07:38 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-30 03:07:51 - pico-train - INFO - Step 34050 -- 🔄 Training Metrics
2025-08-30 03:07:51 - pico-train - INFO - ├── Loss: 6.1062
2025-08-30 03:07:51 - pico-train - INFO - ├── Learning Rate: 5.00e-06
2025-08-30 03:07:51 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-30 03:08:03 - pico-train - INFO - Step 34075 -- 🔄 Training Metrics
2025-08-30 03:08:03 - pico-train - INFO - ├── Loss: 6.1263
2025-08-30 03:08:03 - pico-train - INFO - ├── Learning Rate: 5.00e-06
2025-08-30 03:08:03 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-30 03:08:16 - pico-train - INFO - Step 34100 -- 🔄 Training Metrics
2025-08-30 03:08:16 - pico-train - INFO - ├── Loss: 6.1318
2025-08-30 03:08:16 - pico-train - INFO - ├── Learning Rate: 5.00e-06
2025-08-30 03:08:16 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-30 03:08:29 - pico-train - INFO - Step 34125 -- 🔄 Training Metrics
2025-08-30 03:08:29 - pico-train - INFO - ├── Loss: 6.1323
2025-08-30 03:08:29 - pico-train - INFO - ├── Learning Rate: 5.00e-06
2025-08-30 03:08:29 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-30 03:08:41 - pico-train - INFO - Step 34150 -- 🔄 Training Metrics
2025-08-30 03:08:41 - pico-train - INFO - ├── Loss: 6.1358
2025-08-30 03:08:41 - pico-train - INFO - ├── Learning Rate: 5.00e-06
2025-08-30 03:08:41 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-30 03:08:54 - pico-train - INFO - Step 34175 -- 🔄 Training Metrics
2025-08-30 03:08:54 - pico-train - INFO - ├── Loss: 6.0957
2025-08-30 03:08:54 - pico-train - INFO - ├── Learning Rate: 5.00e-06
2025-08-30 03:08:54 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-30 03:09:06 - pico-train - INFO - Step 34200 -- 🔄 Training Metrics
2025-08-30 03:09:06 - pico-train - INFO - ├── Loss: 6.1673
2025-08-30 03:09:06 - pico-train - INFO - ├── Learning Rate: 5.00e-06
2025-08-30 03:09:06 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-30 03:09:19 - pico-train - INFO - Step 34225 -- 🔄 Training Metrics
2025-08-30 03:09:19 - pico-train - INFO - ├── Loss: 6.1770
2025-08-30 03:09:19 - pico-train - INFO - ├── Learning Rate: 5.00e-06
2025-08-30 03:09:19 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-30 03:09:32 - pico-train - INFO - Step 34250 -- 🔄 Training Metrics
2025-08-30 03:09:32 - pico-train - INFO - ├── Loss: 6.1482
2025-08-30 03:09:32 - pico-train - INFO - ├── Learning Rate: 5.00e-06
2025-08-30 03:09:32 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-30 03:09:44 - pico-train - INFO - Step 34275 -- 🔄 Training Metrics
2025-08-30 03:09:44 - pico-train - INFO - ├── Loss: 6.1460
2025-08-30 03:09:44 - pico-train - INFO - ├── Learning Rate: 5.00e-06
2025-08-30 03:09:44 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-30 03:09:57 - pico-train - INFO - Step 34300 -- 🔄 Training Metrics
2025-08-30 03:09:57 - pico-train - INFO - ├── Loss: 6.1870
2025-08-30 03:09:57 - pico-train - INFO - ├── Learning Rate: 5.00e-06
2025-08-30 03:09:57 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-30 03:10:09 - pico-train - INFO - Step 34325 -- 🔄 Training Metrics
2025-08-30 03:10:09 - pico-train - INFO - ├── Loss: 6.1438
2025-08-30 03:10:09 - pico-train - INFO - ├── Learning Rate: 5.00e-06
2025-08-30 03:10:09 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-30 03:10:22 - pico-train - INFO - Step 34350 -- 🔄 Training Metrics
2025-08-30 03:10:22 - pico-train - INFO - ├── Loss: 6.1337
2025-08-30 03:10:22 - pico-train - INFO - ├── Learning Rate: 5.00e-06
2025-08-30 03:10:22 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-30 03:10:34 - pico-train - INFO - Step 34375 -- 🔄 Training Metrics
2025-08-30 03:10:34 - pico-train - INFO - ├── Loss: 6.2022
2025-08-30 03:10:34 - pico-train - INFO - ├── Learning Rate: 5.00e-06
2025-08-30 03:10:34 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-30 03:10:47 - pico-train - INFO - Step 34400 -- 🔄 Training Metrics
2025-08-30 03:10:47 - pico-train - INFO - ├── Loss: 6.1670
2025-08-30 03:10:47 - pico-train - INFO - ├── Learning Rate: 5.00e-06
2025-08-30 03:10:47 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-30 03:11:00 - pico-train - INFO - Step 34425 -- 🔄 Training Metrics
2025-08-30 03:11:00 - pico-train - INFO - ├── Loss: 6.2327
2025-08-30 03:11:00 - pico-train - INFO - ├── Learning Rate: 5.00e-06
2025-08-30 03:11:00 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-30 03:11:12 - pico-train - INFO - Step 34450 -- 🔄 Training Metrics
2025-08-30 03:11:12 - pico-train - INFO - ├── Loss: 6.1547
2025-08-30 03:11:12 - pico-train - INFO - ├── Learning Rate: 5.00e-06
2025-08-30 03:11:12 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-30 03:11:25 - pico-train - INFO - Step 34475 -- 🔄 Training Metrics
2025-08-30 03:11:25 - pico-train - INFO - ├── Loss: 6.2192
2025-08-30 03:11:25 - pico-train - INFO - ├── Learning Rate: 5.00e-06
2025-08-30 03:11:25 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-30 03:11:37 - pico-train - INFO - Step 34500 -- 💾 Saving Checkpoint
2025-08-30 03:13:33 - pico-train - INFO - Step 34500 -- 📊 Evaluation Results
2025-08-30 03:13:33 - pico-train - INFO - └── paloma: 4.2151375516962886e+26
2025-08-30 03:13:37 - pico-train - INFO - Step 34500 -- 🔄 Training Metrics
2025-08-30 03:13:37 - pico-train - INFO - ├── Loss: 6.1697
2025-08-30 03:13:37 - pico-train - INFO - ├── Learning Rate: 5.00e-06
2025-08-30 03:13:37 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-30 03:13:37 - pico-train - INFO - Step 34500 -- 📈 Saving Learning Dynamics
2025-08-30 03:13:52 - pico-train - INFO - Step 34525 -- 🔄 Training Metrics
2025-08-30 03:13:52 - pico-train - INFO - ├── Loss: 6.2365
2025-08-30 03:13:52 - pico-train - INFO - ├── Learning Rate: 5.00e-06
2025-08-30 03:13:52 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-30 03:14:05 - pico-train - INFO - Step 34550 -- 🔄 Training Metrics
2025-08-30 03:14:05 - pico-train - INFO - ├── Loss: 6.1503
2025-08-30 03:14:05 - pico-train - INFO - ├── Learning Rate: 5.00e-06
2025-08-30 03:14:05 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-30 03:14:17 - pico-train - INFO - Step 34575 -- 🔄 Training Metrics
2025-08-30 03:14:17 - pico-train - INFO - ├── Loss: 6.0855
2025-08-30 03:14:17 - pico-train - INFO - ├── Learning Rate: 5.00e-06
2025-08-30 03:14:17 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-30 03:14:30 - pico-train - INFO - Step 34600 -- 🔄 Training Metrics
2025-08-30 03:14:30 - pico-train - INFO - ├── Loss: 6.0957
2025-08-30 03:14:30 - pico-train - INFO - ├── Learning Rate: 5.00e-06
2025-08-30 03:14:30 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-30 03:14:43 - pico-train - INFO - Step 34625 -- 🔄 Training Metrics
2025-08-30 03:14:43 - pico-train - INFO - ├── Loss: 6.1098
2025-08-30 03:14:43 - pico-train - INFO - ├── Learning Rate: 5.00e-06
2025-08-30 03:14:43 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-30 03:14:55 - pico-train - INFO - Step 34650 -- 🔄 Training Metrics
2025-08-30 03:14:55 - pico-train - INFO - ├── Loss: 6.2054
2025-08-30 03:14:55 - pico-train - INFO - ├── Learning Rate: 5.00e-06
2025-08-30 03:14:55 - pico-train - INFO - โ””โ”€โ”€ Inf/NaN count: 0
2025-08-30 03:15:08 - pico-train - INFO - Step 34675 -- ๐Ÿ”„ Training Metrics
2025-08-30 03:15:08 - pico-train - INFO - โ”œโ”€โ”€ Loss: 6.2291
2025-08-30 03:15:08 - pico-train - INFO - โ”œโ”€โ”€ Learning Rate: 5.00e-06
2025-08-30 03:15:08 - pico-train - INFO - โ””โ”€โ”€ Inf/NaN count: 0
2025-08-30 03:15:20 - pico-train - INFO - Step 34700 -- ๐Ÿ”„ Training Metrics
2025-08-30 03:15:20 - pico-train - INFO - โ”œโ”€โ”€ Loss: 6.1366
2025-08-30 03:15:20 - pico-train - INFO - โ”œโ”€โ”€ Learning Rate: 5.00e-06
2025-08-30 03:15:20 - pico-train - INFO - โ””โ”€โ”€ Inf/NaN count: 0
2025-08-30 03:15:33 - pico-train - INFO - Step 34725 -- ๐Ÿ”„ Training Metrics
2025-08-30 03:15:33 - pico-train - INFO - โ”œโ”€โ”€ Loss: 6.2234
2025-08-30 03:15:33 - pico-train - INFO - โ”œโ”€โ”€ Learning Rate: 5.00e-06
2025-08-30 03:15:33 - pico-train - INFO - โ””โ”€โ”€ Inf/NaN count: 0
2025-08-30 03:15:46 - pico-train - INFO - Step 34750 -- ๐Ÿ”„ Training Metrics
2025-08-30 03:15:46 - pico-train - INFO - โ”œโ”€โ”€ Loss: 6.2019
2025-08-30 03:15:46 - pico-train - INFO - โ”œโ”€โ”€ Learning Rate: 5.00e-06
2025-08-30 03:15:46 - pico-train - INFO - โ””โ”€โ”€ Inf/NaN count: 0
2025-08-30 03:15:58 - pico-train - INFO - Step 34775 -- ๐Ÿ”„ Training Metrics
2025-08-30 03:15:58 - pico-train - INFO - โ”œโ”€โ”€ Loss: 6.1355
2025-08-30 03:15:58 - pico-train - INFO - โ”œโ”€โ”€ Learning Rate: 5.00e-06
2025-08-30 03:15:58 - pico-train - INFO - โ””โ”€โ”€ Inf/NaN count: 0
2025-08-30 03:16:11 - pico-train - INFO - Step 34800 -- ๐Ÿ”„ Training Metrics
2025-08-30 03:16:11 - pico-train - INFO - โ”œโ”€โ”€ Loss: 6.1488
2025-08-30 03:16:11 - pico-train - INFO - โ”œโ”€โ”€ Learning Rate: 5.00e-06
2025-08-30 03:16:11 - pico-train - INFO - โ””โ”€โ”€ Inf/NaN count: 0
2025-08-30 03:16:23 - pico-train - INFO - Step 34825 -- ๐Ÿ”„ Training Metrics
2025-08-30 03:16:23 - pico-train - INFO - โ”œโ”€โ”€ Loss: 6.2028
2025-08-30 03:16:23 - pico-train - INFO - โ”œโ”€โ”€ Learning Rate: 5.00e-06
2025-08-30 03:16:23 - pico-train - INFO - โ””โ”€โ”€ Inf/NaN count: 0
2025-08-30 03:16:36 - pico-train - INFO - Step 34850 -- ๐Ÿ”„ Training Metrics
2025-08-30 03:16:36 - pico-train - INFO - โ”œโ”€โ”€ Loss: 6.1645
2025-08-30 03:16:36 - pico-train - INFO - โ”œโ”€โ”€ Learning Rate: 5.00e-06
2025-08-30 03:16:36 - pico-train - INFO - โ””โ”€โ”€ Inf/NaN count: 0
2025-08-30 03:16:49 - pico-train - INFO - Step 34875 -- ๐Ÿ”„ Training Metrics
2025-08-30 03:16:49 - pico-train - INFO - โ”œโ”€โ”€ Loss: 6.0661
2025-08-30 03:16:49 - pico-train - INFO - โ”œโ”€โ”€ Learning Rate: 5.00e-06
2025-08-30 03:16:49 - pico-train - INFO - โ””โ”€โ”€ Inf/NaN count: 0
2025-08-30 03:17:01 - pico-train - INFO - Step 34900 -- ๐Ÿ”„ Training Metrics
2025-08-30 03:17:01 - pico-train - INFO - โ”œโ”€โ”€ Loss: 6.2181
2025-08-30 03:17:01 - pico-train - INFO - โ”œโ”€โ”€ Learning Rate: 5.00e-06
2025-08-30 03:17:01 - pico-train - INFO - โ””โ”€โ”€ Inf/NaN count: 0
2025-08-30 03:17:14 - pico-train - INFO - Step 34925 -- ๐Ÿ”„ Training Metrics
2025-08-30 03:17:14 - pico-train - INFO - โ”œโ”€โ”€ Loss: 6.1848
2025-08-30 03:17:14 - pico-train - INFO - โ”œโ”€โ”€ Learning Rate: 5.00e-06
2025-08-30 03:17:14 - pico-train - INFO - โ””โ”€โ”€ Inf/NaN count: 0
2025-08-30 03:17:26 - pico-train - INFO - Step 34950 -- ๐Ÿ”„ Training Metrics
2025-08-30 03:17:26 - pico-train - INFO - โ”œโ”€โ”€ Loss: 6.1827
2025-08-30 03:17:26 - pico-train - INFO - โ”œโ”€โ”€ Learning Rate: 5.00e-06
2025-08-30 03:17:26 - pico-train - INFO - โ””โ”€โ”€ Inf/NaN count: 0
2025-08-30 03:17:39 - pico-train - INFO - Step 34975 -- ๐Ÿ”„ Training Metrics
2025-08-30 03:17:39 - pico-train - INFO - โ”œโ”€โ”€ Loss: 6.2027
2025-08-30 03:17:39 - pico-train - INFO - โ”œโ”€โ”€ Learning Rate: 5.00e-06
2025-08-30 03:17:39 - pico-train - INFO - โ””โ”€โ”€ Inf/NaN count: 0
2025-08-30 03:17:51 - pico-train - INFO - Step 35000 -- 💾 Saving Checkpoint
2025-08-30 03:19:48 - pico-train - INFO - Step 35000 -- 📊 Evaluation Results
2025-08-30 03:19:48 - pico-train - INFO - └── paloma: 4.304627505909107e+26
2025-08-30 03:19:53 - pico-train - INFO - Step 35000 -- 🔄 Training Metrics
2025-08-30 03:19:53 - pico-train - INFO - ├── Loss: 6.1136
2025-08-30 03:19:53 - pico-train - INFO - ├── Learning Rate: 5.00e-06
2025-08-30 03:19:53 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-30 03:19:53 - pico-train - INFO - Step 35000 -- 📈 Saving Learning Dynamics
2025-08-30 03:20:08 - pico-train - INFO - Step 35025 -- 🔄 Training Metrics
2025-08-30 03:20:08 - pico-train - INFO - ├── Loss: 6.1389
2025-08-30 03:20:08 - pico-train - INFO - ├── Learning Rate: 5.00e-06
2025-08-30 03:20:08 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-30 03:20:21 - pico-train - INFO - Step 35050 -- 🔄 Training Metrics
2025-08-30 03:20:21 - pico-train - INFO - ├── Loss: 6.1484
2025-08-30 03:20:21 - pico-train - INFO - ├── Learning Rate: 5.00e-06
2025-08-30 03:20:21 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-30 03:20:33 - pico-train - INFO - Step 35075 -- 🔄 Training Metrics
2025-08-30 03:20:33 - pico-train - INFO - ├── Loss: 6.1523
2025-08-30 03:20:33 - pico-train - INFO - ├── Learning Rate: 5.00e-06
2025-08-30 03:20:33 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-30 03:20:46 - pico-train - INFO - Step 35100 -- 🔄 Training Metrics
2025-08-30 03:20:46 - pico-train - INFO - ├── Loss: 6.1193
2025-08-30 03:20:46 - pico-train - INFO - ├── Learning Rate: 5.00e-06
2025-08-30 03:20:46 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-30 03:20:58 - pico-train - INFO - Step 35125 -- 🔄 Training Metrics
2025-08-30 03:20:58 - pico-train - INFO - ├── Loss: 6.1958
2025-08-30 03:20:58 - pico-train - INFO - ├── Learning Rate: 5.00e-06
2025-08-30 03:20:58 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-30 03:21:11 - pico-train - INFO - Step 35150 -- 🔄 Training Metrics
2025-08-30 03:21:11 - pico-train - INFO - ├── Loss: 6.1430
2025-08-30 03:21:11 - pico-train - INFO - ├── Learning Rate: 5.00e-06
2025-08-30 03:21:11 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-30 03:21:24 - pico-train - INFO - Step 35175 -- 🔄 Training Metrics
2025-08-30 03:21:24 - pico-train - INFO - ├── Loss: 6.1746
2025-08-30 03:21:24 - pico-train - INFO - ├── Learning Rate: 5.00e-06
2025-08-30 03:21:24 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-30 03:21:37 - pico-train - INFO - Step 35200 -- 🔄 Training Metrics
2025-08-30 03:21:37 - pico-train - INFO - ├── Loss: 6.0335
2025-08-30 03:21:37 - pico-train - INFO - ├── Learning Rate: 5.00e-06
2025-08-30 03:21:37 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-30 03:21:49 - pico-train - INFO - Step 35225 -- 🔄 Training Metrics
2025-08-30 03:21:49 - pico-train - INFO - ├── Loss: 6.1020
2025-08-30 03:21:49 - pico-train - INFO - ├── Learning Rate: 5.00e-06
2025-08-30 03:21:49 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-30 03:22:02 - pico-train - INFO - Step 35250 -- 🔄 Training Metrics
2025-08-30 03:22:02 - pico-train - INFO - ├── Loss: 6.2283
2025-08-30 03:22:02 - pico-train - INFO - ├── Learning Rate: 5.00e-06
2025-08-30 03:22:02 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-30 03:22:14 - pico-train - INFO - Step 35275 -- 🔄 Training Metrics
2025-08-30 03:22:14 - pico-train - INFO - ├── Loss: 6.1576
2025-08-30 03:22:14 - pico-train - INFO - ├── Learning Rate: 5.00e-06
2025-08-30 03:22:14 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-30 03:22:27 - pico-train - INFO - Step 35300 -- 🔄 Training Metrics
2025-08-30 03:22:27 - pico-train - INFO - ├── Loss: 6.1987
2025-08-30 03:22:27 - pico-train - INFO - ├── Learning Rate: 5.00e-06
2025-08-30 03:22:27 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-30 03:22:39 - pico-train - INFO - Step 35325 -- 🔄 Training Metrics
2025-08-30 03:22:39 - pico-train - INFO - ├── Loss: 6.1550
2025-08-30 03:22:39 - pico-train - INFO - ├── Learning Rate: 5.00e-06
2025-08-30 03:22:39 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-30 03:22:52 - pico-train - INFO - Step 35350 -- 🔄 Training Metrics
2025-08-30 03:22:52 - pico-train - INFO - ├── Loss: 6.0981
2025-08-30 03:22:52 - pico-train - INFO - ├── Learning Rate: 5.00e-06
2025-08-30 03:22:52 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-30 03:23:05 - pico-train - INFO - Step 35375 -- 🔄 Training Metrics
2025-08-30 03:23:05 - pico-train - INFO - ├── Loss: 6.1997
2025-08-30 03:23:05 - pico-train - INFO - ├── Learning Rate: 5.00e-06
2025-08-30 03:23:05 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-30 03:23:17 - pico-train - INFO - Step 35400 -- 🔄 Training Metrics
2025-08-30 03:23:17 - pico-train - INFO - ├── Loss: 6.2458
2025-08-30 03:23:17 - pico-train - INFO - ├── Learning Rate: 5.00e-06
2025-08-30 03:23:17 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-30 03:23:30 - pico-train - INFO - Step 35425 -- 🔄 Training Metrics
2025-08-30 03:23:30 - pico-train - INFO - ├── Loss: 6.0444
2025-08-30 03:23:30 - pico-train - INFO - ├── Learning Rate: 5.00e-06
2025-08-30 03:23:30 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-30 03:23:42 - pico-train - INFO - Step 35450 -- 🔄 Training Metrics
2025-08-30 03:23:42 - pico-train - INFO - ├── Loss: 6.1476
2025-08-30 03:23:42 - pico-train - INFO - ├── Learning Rate: 5.00e-06
2025-08-30 03:23:42 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-30 03:23:55 - pico-train - INFO - Step 35475 -- 🔄 Training Metrics
2025-08-30 03:23:55 - pico-train - INFO - ├── Loss: 6.1142
2025-08-30 03:23:55 - pico-train - INFO - ├── Learning Rate: 5.00e-06
2025-08-30 03:23:55 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-30 03:24:07 - pico-train - INFO - Step 35500 -- 💾 Saving Checkpoint
2025-08-30 03:26:04 - pico-train - INFO - Step 35500 -- 📊 Evaluation Results
2025-08-30 03:26:04 - pico-train - INFO - └── paloma: 4.6700700670251875e+26
2025-08-30 03:26:07 - pico-train - INFO - Step 35500 -- 🔄 Training Metrics
2025-08-30 03:26:07 - pico-train - INFO - ├── Loss: 6.3161
2025-08-30 03:26:07 - pico-train - INFO - ├── Learning Rate: 5.00e-06
2025-08-30 03:26:07 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-30 03:26:07 - pico-train - INFO - Step 35500 -- 📈 Saving Learning Dynamics
2025-08-30 03:26:22 - pico-train - INFO - Step 35525 -- 🔄 Training Metrics
2025-08-30 03:26:22 - pico-train - INFO - ├── Loss: 6.1685
2025-08-30 03:26:22 - pico-train - INFO - ├── Learning Rate: 5.00e-06
2025-08-30 03:26:22 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-30 03:26:35 - pico-train - INFO - Step 35550 -- 🔄 Training Metrics
2025-08-30 03:26:35 - pico-train - INFO - ├── Loss: 6.1471
2025-08-30 03:26:35 - pico-train - INFO - ├── Learning Rate: 5.00e-06
2025-08-30 03:26:35 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-30 03:26:47 - pico-train - INFO - Step 35575 -- 🔄 Training Metrics
2025-08-30 03:26:47 - pico-train - INFO - ├── Loss: 6.1814
2025-08-30 03:26:47 - pico-train - INFO - ├── Learning Rate: 5.00e-06
2025-08-30 03:26:47 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-30 03:27:00 - pico-train - INFO - Step 35600 -- 🔄 Training Metrics
2025-08-30 03:27:00 - pico-train - INFO - ├── Loss: 6.1457
2025-08-30 03:27:00 - pico-train - INFO - ├── Learning Rate: 5.00e-06
2025-08-30 03:27:00 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-30 03:27:12 - pico-train - INFO - Step 35625 -- 🔄 Training Metrics
2025-08-30 03:27:12 - pico-train - INFO - ├── Loss: 6.2045
2025-08-30 03:27:12 - pico-train - INFO - ├── Learning Rate: 5.00e-06
2025-08-30 03:27:12 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-30 03:27:25 - pico-train - INFO - Step 35650 -- 🔄 Training Metrics
2025-08-30 03:27:25 - pico-train - INFO - ├── Loss: 6.1062
2025-08-30 03:27:25 - pico-train - INFO - ├── Learning Rate: 5.00e-06
2025-08-30 03:27:25 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-30 03:27:37 - pico-train - INFO - Step 35675 -- 🔄 Training Metrics
2025-08-30 03:27:37 - pico-train - INFO - ├── Loss: 6.1095
2025-08-30 03:27:37 - pico-train - INFO - ├── Learning Rate: 5.00e-06
2025-08-30 03:27:37 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-30 03:27:50 - pico-train - INFO - Step 35700 -- 🔄 Training Metrics
2025-08-30 03:27:50 - pico-train - INFO - ├── Loss: 6.1678
2025-08-30 03:27:50 - pico-train - INFO - ├── Learning Rate: 5.00e-06
2025-08-30 03:27:50 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-30 03:28:03 - pico-train - INFO - Step 35725 -- 🔄 Training Metrics
2025-08-30 03:28:03 - pico-train - INFO - ├── Loss: 6.1324
2025-08-30 03:28:03 - pico-train - INFO - ├── Learning Rate: 5.00e-06
2025-08-30 03:28:03 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-30 03:28:16 - pico-train - INFO - Step 35750 -- 🔄 Training Metrics
2025-08-30 03:28:16 - pico-train - INFO - ├── Loss: 6.1809
2025-08-30 03:28:16 - pico-train - INFO - ├── Learning Rate: 5.00e-06
2025-08-30 03:28:16 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-30 03:28:29 - pico-train - INFO - Step 35775 -- 🔄 Training Metrics
2025-08-30 03:28:29 - pico-train - INFO - ├── Loss: 6.1851
2025-08-30 03:28:29 - pico-train - INFO - ├── Learning Rate: 5.00e-06
2025-08-30 03:28:29 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-30 03:28:41 - pico-train - INFO - Step 35800 -- 🔄 Training Metrics
2025-08-30 03:28:41 - pico-train - INFO - ├── Loss: 6.2119
2025-08-30 03:28:41 - pico-train - INFO - ├── Learning Rate: 5.00e-06
2025-08-30 03:28:41 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-30 03:28:54 - pico-train - INFO - Step 35825 -- 🔄 Training Metrics
2025-08-30 03:28:54 - pico-train - INFO - ├── Loss: 6.2010
2025-08-30 03:28:54 - pico-train - INFO - ├── Learning Rate: 5.00e-06
2025-08-30 03:28:54 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-30 03:29:07 - pico-train - INFO - Step 35850 -- 🔄 Training Metrics
2025-08-30 03:29:07 - pico-train - INFO - ├── Loss: 6.1562
2025-08-30 03:29:07 - pico-train - INFO - ├── Learning Rate: 5.00e-06
2025-08-30 03:29:07 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-30 03:29:19 - pico-train - INFO - Step 35875 -- 🔄 Training Metrics
2025-08-30 03:29:19 - pico-train - INFO - ├── Loss: 6.1291
2025-08-30 03:29:19 - pico-train - INFO - ├── Learning Rate: 5.00e-06
2025-08-30 03:29:19 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-30 03:29:32 - pico-train - INFO - Step 35900 -- 🔄 Training Metrics
2025-08-30 03:29:32 - pico-train - INFO - ├── Loss: 6.1521
2025-08-30 03:29:32 - pico-train - INFO - ├── Learning Rate: 5.00e-06
2025-08-30 03:29:32 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-30 03:29:44 - pico-train - INFO - Step 35925 -- 🔄 Training Metrics
2025-08-30 03:29:44 - pico-train - INFO - ├── Loss: 6.2151
2025-08-30 03:29:44 - pico-train - INFO - ├── Learning Rate: 5.00e-06
2025-08-30 03:29:44 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-30 03:29:57 - pico-train - INFO - Step 35950 -- 🔄 Training Metrics
2025-08-30 03:29:57 - pico-train - INFO - ├── Loss: 6.2125
2025-08-30 03:29:57 - pico-train - INFO - ├── Learning Rate: 5.00e-06
2025-08-30 03:29:57 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-30 03:30:09 - pico-train - INFO - Step 35975 -- 🔄 Training Metrics
2025-08-30 03:30:09 - pico-train - INFO - ├── Loss: 6.0902
2025-08-30 03:30:09 - pico-train - INFO - ├── Learning Rate: 5.00e-06
2025-08-30 03:30:09 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-30 03:30:21 - pico-train - INFO - Step 36000 -- 💾 Saving Checkpoint
2025-08-30 03:32:16 - pico-train - INFO - Step 36000 -- 📊 Evaluation Results
2025-08-30 03:32:16 - pico-train - INFO - └── paloma: 4.715374868651716e+26
2025-08-30 03:32:17 - pico-train - INFO - Step 36000 -- 🔄 Training Metrics
2025-08-30 03:32:17 - pico-train - INFO - ├── Loss: 6.2189
2025-08-30 03:32:17 - pico-train - INFO - ├── Learning Rate: 5.00e-06
2025-08-30 03:32:17 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-30 03:32:17 - pico-train - INFO - Step 36000 -- 📈 Saving Learning Dynamics
2025-08-30 03:32:33 - pico-train - INFO - Step 36025 -- 🔄 Training Metrics
2025-08-30 03:32:33 - pico-train - INFO - ├── Loss: 6.1215
2025-08-30 03:32:33 - pico-train - INFO - ├── Learning Rate: 5.00e-06
2025-08-30 03:32:33 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-30 03:32:45 - pico-train - INFO - Step 36050 -- 🔄 Training Metrics
2025-08-30 03:32:45 - pico-train - INFO - ├── Loss: 6.1553
2025-08-30 03:32:45 - pico-train - INFO - ├── Learning Rate: 5.00e-06
2025-08-30 03:32:45 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-30 03:32:58 - pico-train - INFO - Step 36075 -- 🔄 Training Metrics
2025-08-30 03:32:58 - pico-train - INFO - ├── Loss: 6.0888
2025-08-30 03:32:58 - pico-train - INFO - ├── Learning Rate: 5.00e-06
2025-08-30 03:32:58 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-30 03:33:10 - pico-train - INFO - Step 36100 -- 🔄 Training Metrics
2025-08-30 03:33:10 - pico-train - INFO - ├── Loss: 6.0934
2025-08-30 03:33:10 - pico-train - INFO - ├── Learning Rate: 5.00e-06
2025-08-30 03:33:10 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-30 03:33:23 - pico-train - INFO - Step 36125 -- 🔄 Training Metrics
2025-08-30 03:33:23 - pico-train - INFO - ├── Loss: 6.2055
2025-08-30 03:33:23 - pico-train - INFO - ├── Learning Rate: 5.00e-06
2025-08-30 03:33:23 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-30 03:33:36 - pico-train - INFO - Step 36150 -- 🔄 Training Metrics
2025-08-30 03:33:36 - pico-train - INFO - ├── Loss: 6.1254
2025-08-30 03:33:36 - pico-train - INFO - ├── Learning Rate: 5.00e-06
2025-08-30 03:33:36 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-30 03:33:48 - pico-train - INFO - Step 36175 -- 🔄 Training Metrics
2025-08-30 03:33:48 - pico-train - INFO - ├── Loss: 6.1199
2025-08-30 03:33:48 - pico-train - INFO - ├── Learning Rate: 5.00e-06
2025-08-30 03:33:48 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-30 03:34:01 - pico-train - INFO - Step 36200 -- 🔄 Training Metrics
2025-08-30 03:34:01 - pico-train - INFO - ├── Loss: 6.1706
2025-08-30 03:34:01 - pico-train - INFO - ├── Learning Rate: 5.00e-06
2025-08-30 03:34:01 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-30 03:34:14 - pico-train - INFO - Step 36225 -- 🔄 Training Metrics
2025-08-30 03:34:14 - pico-train - INFO - ├── Loss: 6.1231
2025-08-30 03:34:14 - pico-train - INFO - ├── Learning Rate: 5.00e-06
2025-08-30 03:34:14 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-30 03:34:26 - pico-train - INFO - Step 36250 -- 🔄 Training Metrics
2025-08-30 03:34:26 - pico-train - INFO - ├── Loss: 6.1492
2025-08-30 03:34:26 - pico-train - INFO - ├── Learning Rate: 5.00e-06
2025-08-30 03:34:26 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-30 03:34:39 - pico-train - INFO - Step 36275 -- 🔄 Training Metrics
2025-08-30 03:34:39 - pico-train - INFO - ├── Loss: 6.1860
2025-08-30 03:34:39 - pico-train - INFO - ├── Learning Rate: 5.00e-06
2025-08-30 03:34:39 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-30 03:34:51 - pico-train - INFO - Step 36300 -- 🔄 Training Metrics
2025-08-30 03:34:51 - pico-train - INFO - ├── Loss: 6.1325
2025-08-30 03:34:51 - pico-train - INFO - ├── Learning Rate: 5.00e-06
2025-08-30 03:34:51 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-30 03:35:04 - pico-train - INFO - Step 36325 -- 🔄 Training Metrics
2025-08-30 03:35:04 - pico-train - INFO - ├── Loss: 6.1265
2025-08-30 03:35:04 - pico-train - INFO - ├── Learning Rate: 5.00e-06
2025-08-30 03:35:04 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-30 03:35:17 - pico-train - INFO - Step 36350 -- 🔄 Training Metrics
2025-08-30 03:35:17 - pico-train - INFO - ├── Loss: 6.1401
2025-08-30 03:35:17 - pico-train - INFO - ├── Learning Rate: 5.00e-06
2025-08-30 03:35:17 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-30 03:35:29 - pico-train - INFO - Step 36375 -- 🔄 Training Metrics
2025-08-30 03:35:29 - pico-train - INFO - ├── Loss: 6.2003
2025-08-30 03:35:29 - pico-train - INFO - ├── Learning Rate: 5.00e-06
2025-08-30 03:35:29 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-30 03:35:42 - pico-train - INFO - Step 36400 -- 🔄 Training Metrics
2025-08-30 03:35:42 - pico-train - INFO - ├── Loss: 6.2138
2025-08-30 03:35:42 - pico-train - INFO - ├── Learning Rate: 5.00e-06
2025-08-30 03:35:42 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-30 03:35:54 - pico-train - INFO - Step 36425 -- 🔄 Training Metrics
2025-08-30 03:35:54 - pico-train - INFO - ├── Loss: 6.1670
2025-08-30 03:35:54 - pico-train - INFO - ├── Learning Rate: 5.00e-06
2025-08-30 03:35:54 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-30 03:36:07 - pico-train - INFO - Step 36450 -- 🔄 Training Metrics
2025-08-30 03:36:07 - pico-train - INFO - ├── Loss: 6.1172
2025-08-30 03:36:07 - pico-train - INFO - ├── Learning Rate: 5.00e-06
2025-08-30 03:36:07 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-30 03:36:20 - pico-train - INFO - Step 36475 -- 🔄 Training Metrics
2025-08-30 03:36:20 - pico-train - INFO - ├── Loss: 6.1339
2025-08-30 03:36:20 - pico-train - INFO - ├── Learning Rate: 5.00e-06
2025-08-30 03:36:20 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-30 03:36:32 - pico-train - INFO - Step 36500 -- 💾 Saving Checkpoint
2025-08-30 03:38:26 - pico-train - INFO - Step 36500 -- 📊 Evaluation Results
2025-08-30 03:38:26 - pico-train - INFO - └── paloma: 5.3158347592675254e+26
2025-08-30 03:38:28 - pico-train - INFO - Step 36500 -- 🔄 Training Metrics
2025-08-30 03:38:28 - pico-train - INFO - ├── Loss: 6.0491
2025-08-30 03:38:28 - pico-train - INFO - ├── Learning Rate: 5.00e-06
2025-08-30 03:38:28 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-30 03:38:28 - pico-train - INFO - Step 36500 -- 📈 Saving Learning Dynamics
2025-08-30 03:38:43 - pico-train - INFO - Step 36525 -- 🔄 Training Metrics
2025-08-30 03:38:43 - pico-train - INFO - ├── Loss: 6.1327
2025-08-30 03:38:43 - pico-train - INFO - ├── Learning Rate: 5.00e-06
2025-08-30 03:38:43 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-30 03:38:55 - pico-train - INFO - Step 36550 -- 🔄 Training Metrics
2025-08-30 03:38:55 - pico-train - INFO - ├── Loss: 6.1302
2025-08-30 03:38:55 - pico-train - INFO - ├── Learning Rate: 5.00e-06
2025-08-30 03:38:55 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-30 03:39:08 - pico-train - INFO - Step 36575 -- 🔄 Training Metrics
2025-08-30 03:39:08 - pico-train - INFO - ├── Loss: 6.0575
2025-08-30 03:39:08 - pico-train - INFO - ├── Learning Rate: 5.00e-06
2025-08-30 03:39:08 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-30 03:39:20 - pico-train - INFO - Step 36600 -- 🔄 Training Metrics
2025-08-30 03:39:20 - pico-train - INFO - ├── Loss: 6.2042
2025-08-30 03:39:20 - pico-train - INFO - ├── Learning Rate: 5.00e-06
2025-08-30 03:39:20 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-30 03:39:33 - pico-train - INFO - Step 36625 -- 🔄 Training Metrics
2025-08-30 03:39:33 - pico-train - INFO - ├── Loss: 6.1372
2025-08-30 03:39:33 - pico-train - INFO - ├── Learning Rate: 5.00e-06
2025-08-30 03:39:33 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-30 03:39:46 - pico-train - INFO - Step 36650 -- 🔄 Training Metrics
2025-08-30 03:39:46 - pico-train - INFO - ├── Loss: 6.1700
2025-08-30 03:39:46 - pico-train - INFO - ├── Learning Rate: 5.00e-06
2025-08-30 03:39:46 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-30 03:39:58 - pico-train - INFO - Step 36675 -- 🔄 Training Metrics
2025-08-30 03:39:58 - pico-train - INFO - ├── Loss: 6.1158
2025-08-30 03:39:58 - pico-train - INFO - ├── Learning Rate: 5.00e-06
2025-08-30 03:39:58 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-30 03:40:11 - pico-train - INFO - Step 36700 -- 🔄 Training Metrics
2025-08-30 03:40:11 - pico-train - INFO - ├── Loss: 6.0870
2025-08-30 03:40:11 - pico-train - INFO - ├── Learning Rate: 5.00e-06
2025-08-30 03:40:11 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-30 03:40:23 - pico-train - INFO - Step 36725 -- 🔄 Training Metrics
2025-08-30 03:40:23 - pico-train - INFO - ├── Loss: 6.1286
2025-08-30 03:40:23 - pico-train - INFO - ├── Learning Rate: 5.00e-06
2025-08-30 03:40:23 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-30 03:40:36 - pico-train - INFO - Step 36750 -- 🔄 Training Metrics
2025-08-30 03:40:36 - pico-train - INFO - ├── Loss: 6.1306
2025-08-30 03:40:36 - pico-train - INFO - ├── Learning Rate: 5.00e-06
2025-08-30 03:40:36 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-30 03:40:49 - pico-train - INFO - Step 36775 -- 🔄 Training Metrics
2025-08-30 03:40:49 - pico-train - INFO - ├── Loss: 6.1677
2025-08-30 03:40:49 - pico-train - INFO - ├── Learning Rate: 5.00e-06
2025-08-30 03:40:49 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-30 03:41:01 - pico-train - INFO - Step 36800 -- 🔄 Training Metrics
2025-08-30 03:41:01 - pico-train - INFO - ├── Loss: 6.1384
2025-08-30 03:41:01 - pico-train - INFO - ├── Learning Rate: 5.00e-06
2025-08-30 03:41:01 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-30 03:41:14 - pico-train - INFO - Step 36825 -- 🔄 Training Metrics
2025-08-30 03:41:14 - pico-train - INFO - ├── Loss: 6.1046
2025-08-30 03:41:14 - pico-train - INFO - ├── Learning Rate: 5.00e-06
2025-08-30 03:41:14 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-30 03:41:27 - pico-train - INFO - Step 36850 -- 🔄 Training Metrics
2025-08-30 03:41:27 - pico-train - INFO - ├── Loss: 6.1113
2025-08-30 03:41:27 - pico-train - INFO - ├── Learning Rate: 5.00e-06
2025-08-30 03:41:27 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-30 03:41:39 - pico-train - INFO - Step 36875 -- 🔄 Training Metrics
2025-08-30 03:41:39 - pico-train - INFO - ├── Loss: 6.0820
2025-08-30 03:41:39 - pico-train - INFO - ├── Learning Rate: 5.00e-06
2025-08-30 03:41:39 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-30 03:41:52 - pico-train - INFO - Step 36900 -- 🔄 Training Metrics
2025-08-30 03:41:52 - pico-train - INFO - ├── Loss: 6.1183
2025-08-30 03:41:52 - pico-train - INFO - ├── Learning Rate: 5.00e-06
2025-08-30 03:41:52 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-30 03:42:04 - pico-train - INFO - Step 36925 -- 🔄 Training Metrics
2025-08-30 03:42:04 - pico-train - INFO - ├── Loss: 6.2381
2025-08-30 03:42:04 - pico-train - INFO - ├── Learning Rate: 5.00e-06
2025-08-30 03:42:04 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-30 03:42:17 - pico-train - INFO - Step 36950 -- 🔄 Training Metrics
2025-08-30 03:42:17 - pico-train - INFO - ├── Loss: 6.1110
2025-08-30 03:42:17 - pico-train - INFO - ├── Learning Rate: 5.00e-06
2025-08-30 03:42:17 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-30 03:42:29 - pico-train - INFO - Step 36975 -- 🔄 Training Metrics
2025-08-30 03:42:29 - pico-train - INFO - ├── Loss: 6.1691
2025-08-30 03:42:29 - pico-train - INFO - ├── Learning Rate: 5.00e-06
2025-08-30 03:42:29 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-30 03:42:41 - pico-train - INFO - Step 37000 -- 💾 Saving Checkpoint
2025-08-30 03:44:57 - pico-train - INFO - Step 37000 -- 📊 Evaluation Results
2025-08-30 03:44:57 - pico-train - INFO - └── paloma: 5.5391129624917924e+26
2025-08-30 03:44:59 - pico-train - INFO - Step 37000 -- 🔄 Training Metrics
2025-08-30 03:44:59 - pico-train - INFO - ├── Loss: 6.1885
2025-08-30 03:44:59 - pico-train - INFO - ├── Learning Rate: 5.00e-06
2025-08-30 03:44:59 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-30 03:44:59 - pico-train - INFO - Step 37000 -- 📈 Saving Learning Dynamics
2025-08-30 03:45:14 - pico-train - INFO - Step 37025 -- 🔄 Training Metrics
2025-08-30 03:45:14 - pico-train - INFO - ├── Loss: 6.1579
2025-08-30 03:45:14 - pico-train - INFO - ├── Learning Rate: 5.00e-06
2025-08-30 03:45:14 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-30 03:45:27 - pico-train - INFO - Step 37050 -- 🔄 Training Metrics
2025-08-30 03:45:27 - pico-train - INFO - ├── Loss: 6.1313
2025-08-30 03:45:27 - pico-train - INFO - ├── Learning Rate: 5.00e-06
2025-08-30 03:45:27 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-30 03:45:39 - pico-train - INFO - Step 37075 -- 🔄 Training Metrics
2025-08-30 03:45:39 - pico-train - INFO - ├── Loss: 6.0942
2025-08-30 03:45:39 - pico-train - INFO - ├── Learning Rate: 5.00e-06
2025-08-30 03:45:39 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-30 03:45:52 - pico-train - INFO - Step 37100 -- 🔄 Training Metrics
2025-08-30 03:45:52 - pico-train - INFO - ├── Loss: 6.1327
2025-08-30 03:45:52 - pico-train - INFO - ├── Learning Rate: 5.00e-06
2025-08-30 03:45:52 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-30 03:46:05 - pico-train - INFO - Step 37125 -- 🔄 Training Metrics
2025-08-30 03:46:05 - pico-train - INFO - ├── Loss: 6.1410
2025-08-30 03:46:05 - pico-train - INFO - ├── Learning Rate: 5.00e-06
2025-08-30 03:46:05 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-30 03:46:17 - pico-train - INFO - Step 37150 -- 🔄 Training Metrics
2025-08-30 03:46:17 - pico-train - INFO - ├── Loss: 6.0944
2025-08-30 03:46:17 - pico-train - INFO - ├── Learning Rate: 5.00e-06
2025-08-30 03:46:17 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-30 03:46:30 - pico-train - INFO - Step 37175 -- 🔄 Training Metrics
2025-08-30 03:46:30 - pico-train - INFO - ├── Loss: 6.0974
2025-08-30 03:46:30 - pico-train - INFO - ├── Learning Rate: 5.00e-06
2025-08-30 03:46:30 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-30 03:46:42 - pico-train - INFO - Step 37200 -- 🔄 Training Metrics
2025-08-30 03:46:42 - pico-train - INFO - ├── Loss: 6.1735
2025-08-30 03:46:42 - pico-train - INFO - ├── Learning Rate: 5.00e-06
2025-08-30 03:46:42 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-30 03:46:55 - pico-train - INFO - Step 37225 -- 🔄 Training Metrics
2025-08-30 03:46:55 - pico-train - INFO - ├── Loss: 6.1205
2025-08-30 03:46:55 - pico-train - INFO - ├── Learning Rate: 5.00e-06
2025-08-30 03:46:55 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-30 03:47:08 - pico-train - INFO - Step 37250 -- 🔄 Training Metrics
2025-08-30 03:47:08 - pico-train - INFO - ├── Loss: 6.1602
2025-08-30 03:47:08 - pico-train - INFO - ├── Learning Rate: 5.00e-06
2025-08-30 03:47:08 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-30 03:47:21 - pico-train - INFO - Step 37275 -- 🔄 Training Metrics
2025-08-30 03:47:21 - pico-train - INFO - ├── Loss: 6.1855
2025-08-30 03:47:21 - pico-train - INFO - ├── Learning Rate: 5.00e-06
2025-08-30 03:47:21 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-30 03:47:33 - pico-train - INFO - Step 37300 -- 🔄 Training Metrics
2025-08-30 03:47:33 - pico-train - INFO - โ”œโ”€โ”€ Loss: 6.1308
2025-08-30 03:47:33 - pico-train - INFO - โ”œโ”€โ”€ Learning Rate: 5.00e-06
2025-08-30 03:47:33 - pico-train - INFO - โ””โ”€โ”€ Inf/NaN count: 0
2025-08-30 03:47:46 - pico-train - INFO - Step 37325 -- ๐Ÿ”„ Training Metrics
2025-08-30 03:47:46 - pico-train - INFO - โ”œโ”€โ”€ Loss: 6.1135
2025-08-30 03:47:46 - pico-train - INFO - โ”œโ”€โ”€ Learning Rate: 5.00e-06
2025-08-30 03:47:46 - pico-train - INFO - โ””โ”€โ”€ Inf/NaN count: 0
2025-08-30 03:47:58 - pico-train - INFO - Step 37350 -- ๐Ÿ”„ Training Metrics
2025-08-30 03:47:58 - pico-train - INFO - โ”œโ”€โ”€ Loss: 6.0931
2025-08-30 03:47:58 - pico-train - INFO - โ”œโ”€โ”€ Learning Rate: 5.00e-06
2025-08-30 03:47:58 - pico-train - INFO - โ””โ”€โ”€ Inf/NaN count: 0
2025-08-30 03:48:11 - pico-train - INFO - Step 37375 -- ๐Ÿ”„ Training Metrics
2025-08-30 03:48:11 - pico-train - INFO - โ”œโ”€โ”€ Loss: 6.1745
2025-08-30 03:48:11 - pico-train - INFO - โ”œโ”€โ”€ Learning Rate: 5.00e-06
2025-08-30 03:48:11 - pico-train - INFO - โ””โ”€โ”€ Inf/NaN count: 0
2025-08-30 03:48:23 - pico-train - INFO - Step 37400 -- ๐Ÿ”„ Training Metrics
2025-08-30 03:48:23 - pico-train - INFO - โ”œโ”€โ”€ Loss: 6.1609
2025-08-30 03:48:23 - pico-train - INFO - โ”œโ”€โ”€ Learning Rate: 5.00e-06
2025-08-30 03:48:23 - pico-train - INFO - โ””โ”€โ”€ Inf/NaN count: 0
2025-08-30 03:48:36 - pico-train - INFO - Step 37425 -- ๐Ÿ”„ Training Metrics
2025-08-30 03:48:36 - pico-train - INFO - โ”œโ”€โ”€ Loss: 6.1909
2025-08-30 03:48:36 - pico-train - INFO - โ”œโ”€โ”€ Learning Rate: 5.00e-06
2025-08-30 03:48:36 - pico-train - INFO - โ””โ”€โ”€ Inf/NaN count: 0
2025-08-30 03:48:48 - pico-train - INFO - Step 37450 -- ๐Ÿ”„ Training Metrics
2025-08-30 03:48:48 - pico-train - INFO - โ”œโ”€โ”€ Loss: 6.1441
2025-08-30 03:48:48 - pico-train - INFO - โ”œโ”€โ”€ Learning Rate: 5.00e-06
2025-08-30 03:48:48 - pico-train - INFO - โ””โ”€โ”€ Inf/NaN count: 0
2025-08-30 03:49:01 - pico-train - INFO - Step 37475 -- ๐Ÿ”„ Training Metrics
2025-08-30 03:49:01 - pico-train - INFO - โ”œโ”€โ”€ Loss: 6.1467
2025-08-30 03:49:01 - pico-train - INFO - โ”œโ”€โ”€ Learning Rate: 5.00e-06
2025-08-30 03:49:01 - pico-train - INFO - โ””โ”€โ”€ Inf/NaN count: 0
2025-08-30 03:49:13 - pico-train - INFO - Step 37500 -- 💾 Saving Checkpoint
2025-08-30 03:51:13 - pico-train - INFO - Step 37500 -- 📊 Evaluation Results
2025-08-30 03:51:13 - pico-train - INFO - └── paloma: 5.823151093085719e+26
2025-08-30 03:51:15 - pico-train - INFO - Step 37500 -- 🔄 Training Metrics
2025-08-30 03:51:15 - pico-train - INFO - ├── Loss: 6.1800
2025-08-30 03:51:15 - pico-train - INFO - ├── Learning Rate: 5.00e-06
2025-08-30 03:51:15 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-30 03:51:15 - pico-train - INFO - Step 37500 -- 📈 Saving Learning Dynamics
2025-08-30 03:51:30 - pico-train - INFO - Step 37525 -- 🔄 Training Metrics
2025-08-30 03:51:30 - pico-train - INFO - ├── Loss: 6.1196
2025-08-30 03:51:30 - pico-train - INFO - ├── Learning Rate: 5.00e-06
2025-08-30 03:51:30 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-30 03:51:43 - pico-train - INFO - Step 37550 -- 🔄 Training Metrics
2025-08-30 03:51:43 - pico-train - INFO - ├── Loss: 6.1254
2025-08-30 03:51:43 - pico-train - INFO - ├── Learning Rate: 5.00e-06
2025-08-30 03:51:43 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-30 03:51:56 - pico-train - INFO - Step 37575 -- 🔄 Training Metrics
2025-08-30 03:51:56 - pico-train - INFO - ├── Loss: 6.1453
2025-08-30 03:51:56 - pico-train - INFO - ├── Learning Rate: 5.00e-06
2025-08-30 03:51:56 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-30 03:52:08 - pico-train - INFO - Step 37600 -- 🔄 Training Metrics
2025-08-30 03:52:08 - pico-train - INFO - ├── Loss: 6.1073
2025-08-30 03:52:08 - pico-train - INFO - ├── Learning Rate: 5.00e-06
2025-08-30 03:52:08 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-30 03:52:21 - pico-train - INFO - Step 37625 -- 🔄 Training Metrics
2025-08-30 03:52:21 - pico-train - INFO - ├── Loss: 6.1739
2025-08-30 03:52:21 - pico-train - INFO - ├── Learning Rate: 5.00e-06
2025-08-30 03:52:21 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-30 03:52:34 - pico-train - INFO - Step 37650 -- 🔄 Training Metrics
2025-08-30 03:52:34 - pico-train - INFO - ├── Loss: 6.1204
2025-08-30 03:52:34 - pico-train - INFO - ├── Learning Rate: 5.00e-06
2025-08-30 03:52:34 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-30 03:52:46 - pico-train - INFO - Step 37675 -- 🔄 Training Metrics
2025-08-30 03:52:46 - pico-train - INFO - ├── Loss: 6.2738
2025-08-30 03:52:46 - pico-train - INFO - ├── Learning Rate: 5.00e-06
2025-08-30 03:52:46 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-30 03:52:59 - pico-train - INFO - Step 37700 -- 🔄 Training Metrics
2025-08-30 03:52:59 - pico-train - INFO - ├── Loss: 6.1134
2025-08-30 03:52:59 - pico-train - INFO - ├── Learning Rate: 5.00e-06
2025-08-30 03:52:59 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-30 03:53:11 - pico-train - INFO - Step 37725 -- 🔄 Training Metrics
2025-08-30 03:53:11 - pico-train - INFO - ├── Loss: 6.1197
2025-08-30 03:53:11 - pico-train - INFO - ├── Learning Rate: 5.00e-06
2025-08-30 03:53:11 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-30 03:53:24 - pico-train - INFO - Step 37750 -- 🔄 Training Metrics
2025-08-30 03:53:24 - pico-train - INFO - ├── Loss: 6.1062
2025-08-30 03:53:24 - pico-train - INFO - ├── Learning Rate: 5.00e-06
2025-08-30 03:53:24 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-30 03:53:37 - pico-train - INFO - Step 37775 -- 🔄 Training Metrics
2025-08-30 03:53:37 - pico-train - INFO - ├── Loss: 6.2604
2025-08-30 03:53:37 - pico-train - INFO - ├── Learning Rate: 5.00e-06
2025-08-30 03:53:37 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-30 03:53:50 - pico-train - INFO - Step 37800 -- 🔄 Training Metrics
2025-08-30 03:53:50 - pico-train - INFO - ├── Loss: 6.1548
2025-08-30 03:53:50 - pico-train - INFO - ├── Learning Rate: 5.00e-06
2025-08-30 03:53:50 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-30 03:54:02 - pico-train - INFO - Step 37825 -- 🔄 Training Metrics
2025-08-30 03:54:02 - pico-train - INFO - ├── Loss: 6.0781
2025-08-30 03:54:02 - pico-train - INFO - ├── Learning Rate: 5.00e-06
2025-08-30 03:54:02 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-30 03:54:15 - pico-train - INFO - Step 37850 -- 🔄 Training Metrics
2025-08-30 03:54:15 - pico-train - INFO - ├── Loss: 6.1886
2025-08-30 03:54:15 - pico-train - INFO - ├── Learning Rate: 5.00e-06
2025-08-30 03:54:15 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-30 03:54:28 - pico-train - INFO - Step 37875 -- 🔄 Training Metrics
2025-08-30 03:54:28 - pico-train - INFO - ├── Loss: 6.1364
2025-08-30 03:54:28 - pico-train - INFO - ├── Learning Rate: 5.00e-06
2025-08-30 03:54:28 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-30 03:54:40 - pico-train - INFO - Step 37900 -- 🔄 Training Metrics
2025-08-30 03:54:40 - pico-train - INFO - ├── Loss: 6.1419
2025-08-30 03:54:40 - pico-train - INFO - ├── Learning Rate: 5.00e-06
2025-08-30 03:54:40 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-30 03:54:53 - pico-train - INFO - Step 37925 -- 🔄 Training Metrics
2025-08-30 03:54:53 - pico-train - INFO - ├── Loss: 6.0992
2025-08-30 03:54:53 - pico-train - INFO - ├── Learning Rate: 5.00e-06
2025-08-30 03:54:53 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-30 03:55:05 - pico-train - INFO - Step 37950 -- 🔄 Training Metrics
2025-08-30 03:55:05 - pico-train - INFO - ├── Loss: 6.0952
2025-08-30 03:55:05 - pico-train - INFO - ├── Learning Rate: 5.00e-06
2025-08-30 03:55:05 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-30 03:55:18 - pico-train - INFO - Step 37975 -- 🔄 Training Metrics
2025-08-30 03:55:18 - pico-train - INFO - ├── Loss: 6.0819
2025-08-30 03:55:18 - pico-train - INFO - ├── Learning Rate: 5.00e-06
2025-08-30 03:55:18 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-30 03:55:30 - pico-train - INFO - Step 38000 -- 💾 Saving Checkpoint
2025-08-30 03:57:23 - pico-train - INFO - Step 38000 -- 📊 Evaluation Results
2025-08-30 03:57:23 - pico-train - INFO - └── paloma: 6.202568645395263e+26
2025-08-30 03:57:25 - pico-train - INFO - Step 38000 -- 🔄 Training Metrics
2025-08-30 03:57:25 - pico-train - INFO - ├── Loss: 6.1806
2025-08-30 03:57:25 - pico-train - INFO - ├── Learning Rate: 5.00e-06
2025-08-30 03:57:25 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-30 03:57:25 - pico-train - INFO - Step 38000 -- 📈 Saving Learning Dynamics
2025-08-30 03:57:40 - pico-train - INFO - Step 38025 -- 🔄 Training Metrics
2025-08-30 03:57:40 - pico-train - INFO - ├── Loss: 6.1459
2025-08-30 03:57:40 - pico-train - INFO - ├── Learning Rate: 5.00e-06
2025-08-30 03:57:40 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-30 03:57:52 - pico-train - INFO - Step 38050 -- 🔄 Training Metrics
2025-08-30 03:57:52 - pico-train - INFO - ├── Loss: 6.1239
2025-08-30 03:57:52 - pico-train - INFO - ├── Learning Rate: 5.00e-06
2025-08-30 03:57:52 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-30 03:58:05 - pico-train - INFO - Step 38075 -- 🔄 Training Metrics
2025-08-30 03:58:05 - pico-train - INFO - ├── Loss: 6.1724
2025-08-30 03:58:05 - pico-train - INFO - ├── Learning Rate: 5.00e-06
2025-08-30 03:58:05 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-30 03:58:17 - pico-train - INFO - Step 38100 -- 🔄 Training Metrics
2025-08-30 03:58:17 - pico-train - INFO - ├── Loss: 6.0446
2025-08-30 03:58:17 - pico-train - INFO - ├── Learning Rate: 5.00e-06
2025-08-30 03:58:17 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-30 03:58:30 - pico-train - INFO - Step 38125 -- 🔄 Training Metrics
2025-08-30 03:58:30 - pico-train - INFO - ├── Loss: 6.2129
2025-08-30 03:58:30 - pico-train - INFO - ├── Learning Rate: 5.00e-06
2025-08-30 03:58:30 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-30 03:58:43 - pico-train - INFO - Step 38150 -- 🔄 Training Metrics
2025-08-30 03:58:43 - pico-train - INFO - ├── Loss: 6.1901
2025-08-30 03:58:43 - pico-train - INFO - ├── Learning Rate: 5.00e-06
2025-08-30 03:58:43 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-30 03:58:56 - pico-train - INFO - Step 38175 -- 🔄 Training Metrics
2025-08-30 03:58:56 - pico-train - INFO - ├── Loss: 6.1603
2025-08-30 03:58:56 - pico-train - INFO - ├── Learning Rate: 5.00e-06
2025-08-30 03:58:56 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-30 03:59:08 - pico-train - INFO - Step 38200 -- 🔄 Training Metrics
2025-08-30 03:59:08 - pico-train - INFO - ├── Loss: 6.1623
2025-08-30 03:59:08 - pico-train - INFO - ├── Learning Rate: 5.00e-06
2025-08-30 03:59:08 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-30 03:59:21 - pico-train - INFO - Step 38225 -- 🔄 Training Metrics
2025-08-30 03:59:21 - pico-train - INFO - ├── Loss: 6.1621
2025-08-30 03:59:21 - pico-train - INFO - ├── Learning Rate: 5.00e-06
2025-08-30 03:59:21 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-30 03:59:33 - pico-train - INFO - Step 38250 -- 🔄 Training Metrics
2025-08-30 03:59:33 - pico-train - INFO - ├── Loss: 6.1762
2025-08-30 03:59:33 - pico-train - INFO - ├── Learning Rate: 5.00e-06
2025-08-30 03:59:33 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-30 03:59:46 - pico-train - INFO - Step 38275 -- 🔄 Training Metrics
2025-08-30 03:59:46 - pico-train - INFO - ├── Loss: 6.1080
2025-08-30 03:59:46 - pico-train - INFO - ├── Learning Rate: 5.00e-06
2025-08-30 03:59:46 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-30 03:59:59 - pico-train - INFO - Step 38300 -- 🔄 Training Metrics
2025-08-30 03:59:59 - pico-train - INFO - ├── Loss: 6.1721
2025-08-30 03:59:59 - pico-train - INFO - ├── Learning Rate: 5.00e-06
2025-08-30 03:59:59 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-30 04:00:11 - pico-train - INFO - Step 38325 -- 🔄 Training Metrics
2025-08-30 04:00:11 - pico-train - INFO - ├── Loss: 6.1486
2025-08-30 04:00:11 - pico-train - INFO - ├── Learning Rate: 5.00e-06
2025-08-30 04:00:11 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-30 04:00:24 - pico-train - INFO - Step 38350 -- 🔄 Training Metrics
2025-08-30 04:00:24 - pico-train - INFO - ├── Loss: 6.1230
2025-08-30 04:00:24 - pico-train - INFO - ├── Learning Rate: 5.00e-06
2025-08-30 04:00:24 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-30 04:00:36 - pico-train - INFO - Step 38375 -- 🔄 Training Metrics
2025-08-30 04:00:36 - pico-train - INFO - ├── Loss: 6.1384
2025-08-30 04:00:36 - pico-train - INFO - ├── Learning Rate: 5.00e-06
2025-08-30 04:00:36 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-30 04:00:49 - pico-train - INFO - Step 38400 -- 🔄 Training Metrics
2025-08-30 04:00:49 - pico-train - INFO - ├── Loss: 6.1810
2025-08-30 04:00:49 - pico-train - INFO - ├── Learning Rate: 5.00e-06
2025-08-30 04:00:49 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-30 04:01:02 - pico-train - INFO - Step 38425 -- 🔄 Training Metrics
2025-08-30 04:01:02 - pico-train - INFO - ├── Loss: 6.1218
2025-08-30 04:01:02 - pico-train - INFO - ├── Learning Rate: 5.00e-06
2025-08-30 04:01:02 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-30 04:01:14 - pico-train - INFO - Step 38450 -- 🔄 Training Metrics
2025-08-30 04:01:14 - pico-train - INFO - ├── Loss: 6.0706
2025-08-30 04:01:14 - pico-train - INFO - ├── Learning Rate: 5.00e-06
2025-08-30 04:01:14 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-30 04:01:27 - pico-train - INFO - Step 38475 -- 🔄 Training Metrics
2025-08-30 04:01:27 - pico-train - INFO - ├── Loss: 6.1022
2025-08-30 04:01:27 - pico-train - INFO - ├── Learning Rate: 5.00e-06
2025-08-30 04:01:27 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-30 04:01:39 - pico-train - INFO - Step 38500 -- 💾 Saving Checkpoint
2025-08-30 04:03:36 - pico-train - INFO - Step 38500 -- 📊 Evaluation Results
2025-08-30 04:03:36 - pico-train - INFO - └── paloma: 6.267698412543812e+26
2025-08-30 04:03:37 - pico-train - INFO - Step 38500 -- 🔄 Training Metrics
2025-08-30 04:03:37 - pico-train - INFO - ├── Loss: 6.2061
2025-08-30 04:03:37 - pico-train - INFO - ├── Learning Rate: 5.00e-06
2025-08-30 04:03:37 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-30 04:03:37 - pico-train - INFO - Step 38500 -- 📈 Saving Learning Dynamics
2025-08-30 04:03:52 - pico-train - INFO - Step 38525 -- 🔄 Training Metrics
2025-08-30 04:03:52 - pico-train - INFO - ├── Loss: 6.1100
2025-08-30 04:03:52 - pico-train - INFO - ├── Learning Rate: 5.00e-06
2025-08-30 04:03:52 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-30 04:04:05 - pico-train - INFO - Step 38550 -- 🔄 Training Metrics
2025-08-30 04:04:05 - pico-train - INFO - ├── Loss: 6.0985
2025-08-30 04:04:05 - pico-train - INFO - ├── Learning Rate: 5.00e-06
2025-08-30 04:04:05 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-30 04:04:17 - pico-train - INFO - Step 38575 -- 🔄 Training Metrics
2025-08-30 04:04:17 - pico-train - INFO - ├── Loss: 6.1607
2025-08-30 04:04:17 - pico-train - INFO - ├── Learning Rate: 5.00e-06
2025-08-30 04:04:17 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-30 04:04:30 - pico-train - INFO - Step 38600 -- 🔄 Training Metrics
2025-08-30 04:04:30 - pico-train - INFO - ├── Loss: 6.1034
2025-08-30 04:04:30 - pico-train - INFO - ├── Learning Rate: 5.00e-06
2025-08-30 04:04:30 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-30 04:04:43 - pico-train - INFO - Step 38625 -- 🔄 Training Metrics
2025-08-30 04:04:43 - pico-train - INFO - ├── Loss: 6.1405
2025-08-30 04:04:43 - pico-train - INFO - ├── Learning Rate: 5.00e-06
2025-08-30 04:04:43 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-30 04:04:56 - pico-train - INFO - Step 38650 -- 🔄 Training Metrics
2025-08-30 04:04:56 - pico-train - INFO - ├── Loss: 6.1453
2025-08-30 04:04:56 - pico-train - INFO - ├── Learning Rate: 5.00e-06
2025-08-30 04:04:56 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-30 04:05:08 - pico-train - INFO - Step 38675 -- 🔄 Training Metrics
2025-08-30 04:05:08 - pico-train - INFO - ├── Loss: 6.0796
2025-08-30 04:05:08 - pico-train - INFO - ├── Learning Rate: 5.00e-06
2025-08-30 04:05:08 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-30 04:05:21 - pico-train - INFO - Step 38700 -- 🔄 Training Metrics
2025-08-30 04:05:21 - pico-train - INFO - ├── Loss: 6.0443
2025-08-30 04:05:21 - pico-train - INFO - ├── Learning Rate: 5.00e-06
2025-08-30 04:05:21 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-30 04:05:34 - pico-train - INFO - Step 38725 -- 🔄 Training Metrics
2025-08-30 04:05:34 - pico-train - INFO - ├── Loss: 6.2451
2025-08-30 04:05:34 - pico-train - INFO - ├── Learning Rate: 5.00e-06
2025-08-30 04:05:34 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-30 04:05:46 - pico-train - INFO - Step 38750 -- 🔄 Training Metrics
2025-08-30 04:05:46 - pico-train - INFO - ├── Loss: 6.0574
2025-08-30 04:05:46 - pico-train - INFO - ├── Learning Rate: 5.00e-06
2025-08-30 04:05:46 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-30 04:05:59 - pico-train - INFO - Step 38775 -- 🔄 Training Metrics
2025-08-30 04:05:59 - pico-train - INFO - ├── Loss: 6.1047
2025-08-30 04:05:59 - pico-train - INFO - ├── Learning Rate: 5.00e-06
2025-08-30 04:05:59 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-30 04:06:12 - pico-train - INFO - Step 38800 -- 🔄 Training Metrics
2025-08-30 04:06:12 - pico-train - INFO - ├── Loss: 6.1653
2025-08-30 04:06:12 - pico-train - INFO - ├── Learning Rate: 5.00e-06
2025-08-30 04:06:12 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-30 04:06:24 - pico-train - INFO - Step 38825 -- 🔄 Training Metrics
2025-08-30 04:06:24 - pico-train - INFO - ├── Loss: 6.1954
2025-08-30 04:06:24 - pico-train - INFO - ├── Learning Rate: 5.00e-06
2025-08-30 04:06:24 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-30 04:06:37 - pico-train - INFO - Step 38850 -- 🔄 Training Metrics
2025-08-30 04:06:37 - pico-train - INFO - ├── Loss: 6.1769
2025-08-30 04:06:37 - pico-train - INFO - ├── Learning Rate: 5.00e-06
2025-08-30 04:06:37 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-30 04:06:49 - pico-train - INFO - Step 38875 -- 🔄 Training Metrics
2025-08-30 04:06:49 - pico-train - INFO - ├── Loss: 6.1837
2025-08-30 04:06:49 - pico-train - INFO - ├── Learning Rate: 5.00e-06
2025-08-30 04:06:49 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-30 04:07:02 - pico-train - INFO - Step 38900 -- 🔄 Training Metrics
2025-08-30 04:07:02 - pico-train - INFO - ├── Loss: 6.1182
2025-08-30 04:07:02 - pico-train - INFO - ├── Learning Rate: 5.00e-06
2025-08-30 04:07:02 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-30 04:07:15 - pico-train - INFO - Step 38925 -- 🔄 Training Metrics
2025-08-30 04:07:15 - pico-train - INFO - ├── Loss: 6.1174
2025-08-30 04:07:15 - pico-train - INFO - ├── Learning Rate: 5.00e-06
2025-08-30 04:07:15 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-30 04:07:27 - pico-train - INFO - Step 38950 -- 🔄 Training Metrics
2025-08-30 04:07:27 - pico-train - INFO - ├── Loss: 6.2171
2025-08-30 04:07:27 - pico-train - INFO - ├── Learning Rate: 5.00e-06
2025-08-30 04:07:27 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-30 04:07:42 - pico-train - INFO - Step 38975 -- 🔄 Training Metrics
2025-08-30 04:07:42 - pico-train - INFO - ├── Loss: 6.0979
2025-08-30 04:07:42 - pico-train - INFO - ├── Learning Rate: 5.00e-06
2025-08-30 04:07:42 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-30 04:07:57 - pico-train - INFO - Step 39000 -- 💾 Saving Checkpoint
2025-08-30 04:10:27 - pico-train - INFO - Step 39000 -- 📊 Evaluation Results
2025-08-30 04:10:27 - pico-train - INFO - └── paloma: 6.448910718445894e+26
2025-08-30 04:10:29 - pico-train - INFO - Step 39000 -- 🔄 Training Metrics
2025-08-30 04:10:29 - pico-train - INFO - ├── Loss: 6.1187
2025-08-30 04:10:29 - pico-train - INFO - ├── Learning Rate: 5.00e-06
2025-08-30 04:10:29 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-30 04:10:29 - pico-train - INFO - Step 39000 -- 📈 Saving Learning Dynamics
2025-08-30 04:10:48 - pico-train - INFO - Step 39025 -- 🔄 Training Metrics
2025-08-30 04:10:48 - pico-train - INFO - ├── Loss: 6.0963
2025-08-30 04:10:48 - pico-train - INFO - ├── Learning Rate: 5.00e-06
2025-08-30 04:10:48 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-30 04:11:04 - pico-train - INFO - Step 39050 -- 🔄 Training Metrics
2025-08-30 04:11:04 - pico-train - INFO - ├── Loss: 6.0935
2025-08-30 04:11:04 - pico-train - INFO - ├── Learning Rate: 5.00e-06
2025-08-30 04:11:04 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-30 04:11:21 - pico-train - INFO - Step 39075 -- 🔄 Training Metrics
2025-08-30 04:11:21 - pico-train - INFO - ├── Loss: 6.0688
2025-08-30 04:11:21 - pico-train - INFO - ├── Learning Rate: 5.00e-06
2025-08-30 04:11:21 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-30 04:11:37 - pico-train - INFO - Step 39100 -- 🔄 Training Metrics
2025-08-30 04:11:37 - pico-train - INFO - ├── Loss: 6.1405
2025-08-30 04:11:37 - pico-train - INFO - ├── Learning Rate: 5.00e-06
2025-08-30 04:11:37 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-30 04:11:54 - pico-train - INFO - Step 39125 -- 🔄 Training Metrics
2025-08-30 04:11:54 - pico-train - INFO - ├── Loss: 6.1718
2025-08-30 04:11:54 - pico-train - INFO - ├── Learning Rate: 5.00e-06
2025-08-30 04:11:54 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-30 04:12:10 - pico-train - INFO - Step 39150 -- 🔄 Training Metrics
2025-08-30 04:12:10 - pico-train - INFO - ├── Loss: 6.1167
2025-08-30 04:12:10 - pico-train - INFO - ├── Learning Rate: 5.00e-06
2025-08-30 04:12:10 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-30 04:12:27 - pico-train - INFO - Step 39175 -- 🔄 Training Metrics
2025-08-30 04:12:27 - pico-train - INFO - ├── Loss: 6.1777
2025-08-30 04:12:27 - pico-train - INFO - ├── Learning Rate: 5.00e-06
2025-08-30 04:12:27 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-30 04:12:43 - pico-train - INFO - Step 39200 -- 🔄 Training Metrics
2025-08-30 04:12:43 - pico-train - INFO - ├── Loss: 6.1680
2025-08-30 04:12:43 - pico-train - INFO - ├── Learning Rate: 5.00e-06
2025-08-30 04:12:43 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-30 04:13:00 - pico-train - INFO - Step 39225 -- 🔄 Training Metrics
2025-08-30 04:13:00 - pico-train - INFO - ├── Loss: 6.1348
2025-08-30 04:13:00 - pico-train - INFO - ├── Learning Rate: 5.00e-06
2025-08-30 04:13:00 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-30 04:13:16 - pico-train - INFO - Step 39250 -- 🔄 Training Metrics
2025-08-30 04:13:16 - pico-train - INFO - ├── Loss: 6.1949
2025-08-30 04:13:16 - pico-train - INFO - ├── Learning Rate: 5.00e-06
2025-08-30 04:13:16 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-30 04:13:32 - pico-train - INFO - Step 39275 -- 🔄 Training Metrics
2025-08-30 04:13:32 - pico-train - INFO - ├── Loss: 6.1793
2025-08-30 04:13:32 - pico-train - INFO - ├── Learning Rate: 5.00e-06
2025-08-30 04:13:32 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-30 04:13:49 - pico-train - INFO - Step 39300 -- 🔄 Training Metrics
2025-08-30 04:13:49 - pico-train - INFO - ├── Loss: 6.1331
2025-08-30 04:13:49 - pico-train - INFO - ├── Learning Rate: 5.00e-06
2025-08-30 04:13:49 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-30 04:14:05 - pico-train - INFO - Step 39325 -- 🔄 Training Metrics
2025-08-30 04:14:05 - pico-train - INFO - ├── Loss: 6.0823
2025-08-30 04:14:05 - pico-train - INFO - ├── Learning Rate: 5.00e-06
2025-08-30 04:14:05 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-30 04:14:22 - pico-train - INFO - Step 39350 -- 🔄 Training Metrics
2025-08-30 04:14:22 - pico-train - INFO - ├── Loss: 6.1036
2025-08-30 04:14:22 - pico-train - INFO - ├── Learning Rate: 5.00e-06
2025-08-30 04:14:22 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-30 04:14:39 - pico-train - INFO - Step 39375 -- 🔄 Training Metrics
2025-08-30 04:14:39 - pico-train - INFO - ├── Loss: 6.0327
2025-08-30 04:14:39 - pico-train - INFO - ├── Learning Rate: 5.00e-06
2025-08-30 04:14:39 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-30 04:14:55 - pico-train - INFO - Step 39400 -- 🔄 Training Metrics
2025-08-30 04:14:55 - pico-train - INFO - ├── Loss: 6.1494
2025-08-30 04:14:55 - pico-train - INFO - ├── Learning Rate: 5.00e-06
2025-08-30 04:14:55 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-30 04:15:12 - pico-train - INFO - Step 39425 -- 🔄 Training Metrics
2025-08-30 04:15:12 - pico-train - INFO - ├── Loss: 6.1195
2025-08-30 04:15:12 - pico-train - INFO - ├── Learning Rate: 5.00e-06
2025-08-30 04:15:12 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-30 04:15:25 - pico-train - INFO - Step 39450 -- 🔄 Training Metrics
2025-08-30 04:15:25 - pico-train - INFO - ├── Loss: 6.1699
2025-08-30 04:15:25 - pico-train - INFO - ├── Learning Rate: 5.00e-06
2025-08-30 04:15:25 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-30 04:15:38 - pico-train - INFO - Step 39475 -- 🔄 Training Metrics
2025-08-30 04:15:38 - pico-train - INFO - ├── Loss: 6.1257
2025-08-30 04:15:38 - pico-train - INFO - ├── Learning Rate: 5.00e-06
2025-08-30 04:15:38 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-30 04:15:50 - pico-train - INFO - Step 39500 -- ๐Ÿ’พ Saving Checkpoint
2025-08-30 04:17:42 - pico-train - INFO - Step 39500 -- ๐Ÿ“Š Evaluation Results
2025-08-30 04:17:42 - pico-train - INFO - โ””โ”€โ”€ paloma: 6.596819084881746e+26
2025-08-30 04:17:44 - pico-train - INFO - Step 39500 -- ๐Ÿ”„ Training Metrics
2025-08-30 04:17:44 - pico-train - INFO - โ”œโ”€โ”€ Loss: 6.1615
2025-08-30 04:17:44 - pico-train - INFO - โ”œโ”€โ”€ Learning Rate: 5.00e-06
2025-08-30 04:17:44 - pico-train - INFO - โ””โ”€โ”€ Inf/NaN count: 0
2025-08-30 04:17:44 - pico-train - INFO - Step 39500 -- ๐Ÿ“ˆ Saving Learning Dynamics
2025-08-30 04:18:00 - pico-train - INFO - Step 39525 -- ๐Ÿ”„ Training Metrics
2025-08-30 04:18:00 - pico-train - INFO - โ”œโ”€โ”€ Loss: 6.1811
2025-08-30 04:18:00 - pico-train - INFO - โ”œโ”€โ”€ Learning Rate: 5.00e-06
2025-08-30 04:18:00 - pico-train - INFO - โ””โ”€โ”€ Inf/NaN count: 0
2025-08-30 04:18:13 - pico-train - INFO - Step 39550 -- ๐Ÿ”„ Training Metrics
2025-08-30 04:18:13 - pico-train - INFO - โ”œโ”€โ”€ Loss: 6.0980
2025-08-30 04:18:13 - pico-train - INFO - โ”œโ”€โ”€ Learning Rate: 5.00e-06
2025-08-30 04:18:13 - pico-train - INFO - โ””โ”€โ”€ Inf/NaN count: 0
2025-08-30 04:18:25 - pico-train - INFO - Step 39575 -- ๐Ÿ”„ Training Metrics
2025-08-30 04:18:25 - pico-train - INFO - โ”œโ”€โ”€ Loss: 6.1116
2025-08-30 04:18:25 - pico-train - INFO - โ”œโ”€โ”€ Learning Rate: 5.00e-06
2025-08-30 04:18:25 - pico-train - INFO - โ””โ”€โ”€ Inf/NaN count: 0
2025-08-30 04:18:38 - pico-train - INFO - Step 39600 -- ๐Ÿ”„ Training Metrics
2025-08-30 04:18:38 - pico-train - INFO - โ”œโ”€โ”€ Loss: 6.1743
2025-08-30 04:18:38 - pico-train - INFO - โ”œโ”€โ”€ Learning Rate: 5.00e-06
2025-08-30 04:18:38 - pico-train - INFO - โ””โ”€โ”€ Inf/NaN count: 0
2025-08-30 04:18:51 - pico-train - INFO - Step 39625 -- ๐Ÿ”„ Training Metrics
2025-08-30 04:18:51 - pico-train - INFO - โ”œโ”€โ”€ Loss: 6.1422
2025-08-30 04:18:51 - pico-train - INFO - โ”œโ”€โ”€ Learning Rate: 5.00e-06
2025-08-30 04:18:51 - pico-train - INFO - โ””โ”€โ”€ Inf/NaN count: 0
2025-08-30 04:19:03 - pico-train - INFO - Step 39650 -- ๐Ÿ”„ Training Metrics
2025-08-30 04:19:03 - pico-train - INFO - โ”œโ”€โ”€ Loss: 6.1306
2025-08-30 04:19:03 - pico-train - INFO - โ”œโ”€โ”€ Learning Rate: 5.00e-06
2025-08-30 04:19:03 - pico-train - INFO - โ””โ”€โ”€ Inf/NaN count: 0
2025-08-30 04:19:16 - pico-train - INFO - Step 39675 -- ๐Ÿ”„ Training Metrics
2025-08-30 04:19:16 - pico-train - INFO - โ”œโ”€โ”€ Loss: 6.1426
2025-08-30 04:19:16 - pico-train - INFO - โ”œโ”€โ”€ Learning Rate: 5.00e-06
2025-08-30 04:19:16 - pico-train - INFO - โ””โ”€โ”€ Inf/NaN count: 0
2025-08-30 04:19:29 - pico-train - INFO - Step 39700 -- ๐Ÿ”„ Training Metrics
2025-08-30 04:19:29 - pico-train - INFO - โ”œโ”€โ”€ Loss: 6.0568
2025-08-30 04:19:29 - pico-train - INFO - โ”œโ”€โ”€ Learning Rate: 5.00e-06
2025-08-30 04:19:29 - pico-train - INFO - โ””โ”€โ”€ Inf/NaN count: 0
2025-08-30 04:19:41 - pico-train - INFO - Step 39725 -- 🔄 Training Metrics
2025-08-30 04:19:41 - pico-train - INFO - ├── Loss: 6.2368
2025-08-30 04:19:41 - pico-train - INFO - ├── Learning Rate: 5.00e-06
2025-08-30 04:19:41 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-30 04:19:54 - pico-train - INFO - Step 39750 -- 🔄 Training Metrics
2025-08-30 04:19:54 - pico-train - INFO - ├── Loss: 6.0805
2025-08-30 04:19:54 - pico-train - INFO - ├── Learning Rate: 5.00e-06
2025-08-30 04:19:54 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-30 04:20:06 - pico-train - INFO - Step 39775 -- 🔄 Training Metrics
2025-08-30 04:20:06 - pico-train - INFO - ├── Loss: 6.1451
2025-08-30 04:20:06 - pico-train - INFO - ├── Learning Rate: 5.00e-06
2025-08-30 04:20:06 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-30 04:20:19 - pico-train - INFO - Step 39800 -- 🔄 Training Metrics
2025-08-30 04:20:19 - pico-train - INFO - ├── Loss: 6.0792
2025-08-30 04:20:19 - pico-train - INFO - ├── Learning Rate: 5.00e-06
2025-08-30 04:20:19 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-30 04:20:31 - pico-train - INFO - Step 39825 -- 🔄 Training Metrics
2025-08-30 04:20:31 - pico-train - INFO - ├── Loss: 6.0382
2025-08-30 04:20:31 - pico-train - INFO - ├── Learning Rate: 5.00e-06
2025-08-30 04:20:31 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-30 04:20:44 - pico-train - INFO - Step 39850 -- 🔄 Training Metrics
2025-08-30 04:20:44 - pico-train - INFO - ├── Loss: 6.1423
2025-08-30 04:20:44 - pico-train - INFO - ├── Learning Rate: 5.00e-06
2025-08-30 04:20:44 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-30 04:20:57 - pico-train - INFO - Step 39875 -- 🔄 Training Metrics
2025-08-30 04:20:57 - pico-train - INFO - ├── Loss: 6.1012
2025-08-30 04:20:57 - pico-train - INFO - ├── Learning Rate: 5.00e-06
2025-08-30 04:20:57 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-30 04:21:09 - pico-train - INFO - Step 39900 -- 🔄 Training Metrics
2025-08-30 04:21:09 - pico-train - INFO - ├── Loss: 6.1298
2025-08-30 04:21:09 - pico-train - INFO - ├── Learning Rate: 5.00e-06
2025-08-30 04:21:09 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-30 04:21:22 - pico-train - INFO - Step 39925 -- 🔄 Training Metrics
2025-08-30 04:21:22 - pico-train - INFO - ├── Loss: 6.1046
2025-08-30 04:21:22 - pico-train - INFO - ├── Learning Rate: 5.00e-06
2025-08-30 04:21:22 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-30 04:21:34 - pico-train - INFO - Step 39950 -- 🔄 Training Metrics
2025-08-30 04:21:34 - pico-train - INFO - ├── Loss: 6.1184
2025-08-30 04:21:34 - pico-train - INFO - ├── Learning Rate: 5.00e-06
2025-08-30 04:21:34 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-30 04:21:47 - pico-train - INFO - Step 39975 -- 🔄 Training Metrics
2025-08-30 04:21:47 - pico-train - INFO - ├── Loss: 6.1598
2025-08-30 04:21:47 - pico-train - INFO - ├── Learning Rate: 5.00e-06
2025-08-30 04:21:47 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-30 04:21:59 - pico-train - INFO - Step 40000 -- 💾 Saving Checkpoint
2025-08-30 04:24:02 - pico-train - INFO - Step 40000 -- 📊 Evaluation Results
2025-08-30 04:24:02 - pico-train - INFO - └── paloma: 7.314096757540847e+26
2025-08-30 04:24:03 - pico-train - INFO - 🎉 Training complete! Final step: 40000