2025-08-30 15:43:27 - pico-train - INFO - Step 62000 -- 📊 Evaluation Results
2025-08-30 15:43:27 - pico-train - INFO - └── paloma: inf
2025-08-30 15:43:28 - pico-train - INFO - ==================================================
2025-08-30 15:43:28 - pico-train - INFO - ✨ Training Configuration
2025-08-30 15:43:28 - pico-train - INFO - ==================================================
2025-08-30 15:43:28 - pico-train - INFO - ╭─────────────────────────────────────────────────────╮
2025-08-30 15:43:28 - pico-train - INFO - │ checkpointing: │
2025-08-30 15:43:28 - pico-train - INFO - │   checkpoints_dir: checkpoints │
2025-08-30 15:43:28 - pico-train - INFO - │   evaluation: │
2025-08-30 15:43:28 - pico-train - INFO - │     eval_results_dir: eval_results │
2025-08-30 15:43:28 - pico-train - INFO - │   fabric_checkpoint_dir: fabric_state │
2025-08-30 15:43:28 - pico-train - INFO - │   fabric_checkpoint_filename: checkpoint.pt │
2025-08-30 15:43:28 - pico-train - INFO - │   hf_checkpoint: │
2025-08-30 15:43:28 - pico-train - INFO - │     collection_slug: null │
2025-08-30 15:43:28 - pico-train - INFO - │     repo_id: ThomasTheMaker/pico-decoder-tiny │
2025-08-30 15:43:28 - pico-train - INFO - │   learning_dynamics: │
2025-08-30 15:43:28 - pico-train - INFO - │     batch_size: 1 │
2025-08-30 15:43:28 - pico-train - INFO - │     eval_data: null │
2025-08-30 15:43:28 - pico-train - INFO - │     layer_suffixes: │
2025-08-30 15:43:28 - pico-train - INFO - │     - attention.v_proj │
2025-08-30 15:43:28 - pico-train - INFO - │     - attention.o_proj │
2025-08-30 15:43:28 - pico-train - INFO - │     - swiglu.w_2 │
2025-08-30 15:43:28 - pico-train - INFO - │     sequence_idx: -1 │
2025-08-30 15:43:28 - pico-train - INFO - │   learning_dynamics_dir: learning_dynamics │
2025-08-30 15:43:28 - pico-train - INFO - │   logs_dir: logs │
2025-08-30 15:43:28 - pico-train - INFO - │   run_name: pico-decoder-tiny-dolma10M-v1 │
2025-08-30 15:43:28 - pico-train - INFO - │   runs_dir: runs │
2025-08-30 15:43:28 - pico-train - INFO - │   save_every_n_steps: 2000 │
2025-08-30 15:43:28 - pico-train - INFO - │   save_to_hf: true │
2025-08-30 15:43:28 - pico-train - INFO - │   training: │
2025-08-30 15:43:28 - pico-train - INFO - │     auto_resume: true │
2025-08-30 15:43:28 - pico-train - INFO - │ data: │
2025-08-30 15:43:28 - pico-train - INFO - │   dataloader: │
2025-08-30 15:43:28 - pico-train - INFO - │     batch_size: 16 │
2025-08-30 15:43:28 - pico-train - INFO - │   dataset: │
2025-08-30 15:43:28 - pico-train - INFO - │     name: ThomasTheMaker/pretokenized-dolma-10M │
2025-08-30 15:43:28 - pico-train - INFO - │   tokenizer: │
2025-08-30 15:43:28 - pico-train - INFO - │     name: allenai/OLMo-7B-0724-hf │
2025-08-30 15:43:28 - pico-train - INFO - │     vocab_size: 50304 │
2025-08-30 15:43:28 - pico-train - INFO - │ evaluation: │
2025-08-30 15:43:28 - pico-train - INFO - │   metrics: │
2025-08-30 15:43:28 - pico-train - INFO - │   - paloma │
2025-08-30 15:43:28 - pico-train - INFO - │   paloma: │
2025-08-30 15:43:28 - pico-train - INFO - │     batch_size: 1 │
2025-08-30 15:43:28 - pico-train - INFO - │     dataset_name: pico-lm/pretokenized-paloma-tinsy │
2025-08-30 15:43:28 - pico-train - INFO - │     dataset_split: val │
2025-08-30 15:43:28 - pico-train - INFO - │     max_length: 2048 │
2025-08-30 15:43:28 - pico-train - INFO - │ model: │
2025-08-30 15:43:28 - pico-train - INFO - │   activation_hidden_dim: 384 │
2025-08-30 15:43:28 - pico-train - INFO - │   attention_n_heads: 12 │
2025-08-30 15:43:28 - pico-train - INFO - │   attention_n_kv_heads: 4 │
2025-08-30 15:43:28 - pico-train - INFO - │   batch_size: 1024 │
2025-08-30 15:43:28 - pico-train - INFO - │   d_model: 96 │
2025-08-30 15:43:28 - pico-train - INFO - │   max_seq_len: 2048 │
2025-08-30 15:43:28 - pico-train - INFO - │   model_type: pico_decoder │
2025-08-30 15:43:28 - pico-train - INFO - │   n_layers: 12 │
2025-08-30 15:43:28 - pico-train - INFO - │   norm_eps: 1.0e-06 │
2025-08-30 15:43:28 - pico-train - INFO - │   position_emb_theta: 10000.0 │
2025-08-30 15:43:28 - pico-train - INFO - │   vocab_size: 50304 │
2025-08-30 15:43:28 - pico-train - INFO - │ monitoring: │
2025-08-30 15:43:28 - pico-train - INFO - │   logging: │
2025-08-30 15:43:28 - pico-train - INFO - │     log_every_n_steps: 100 │
2025-08-30 15:43:28 - pico-train - INFO - │     log_level: INFO │
2025-08-30 15:43:28 - pico-train - INFO - │   save_to_wandb: false │
2025-08-30 15:43:28 - pico-train - INFO - │   wandb: │
2025-08-30 15:43:28 - pico-train - INFO - │     entity: boymyc │
2025-08-30 15:43:28 - pico-train - INFO - │     project: pico-decoder-tiny │
2025-08-30 15:43:28 - pico-train - INFO - │ training: │
2025-08-30 15:43:28 - pico-train - INFO - │   fabric: │
2025-08-30 15:43:28 - pico-train - INFO - │     accelerator: cuda │
2025-08-30 15:43:28 - pico-train - INFO - │     num_devices: 1 │
2025-08-30 15:43:28 - pico-train - INFO - │     num_nodes: 1 │
2025-08-30 15:43:28 - pico-train - INFO - │     precision: bf16-mixed │
2025-08-30 15:43:28 - pico-train - INFO - │   max_steps: 100000 │
2025-08-30 15:43:28 - pico-train - INFO - │   optimization: │
2025-08-30 15:43:28 - pico-train - INFO - │     gradient_accumulation_steps: 1 │
2025-08-30 15:43:28 - pico-train - INFO - │     lr: 0.0002 │
2025-08-30 15:43:28 - pico-train - INFO - │     lr_scheduler: cosine │
2025-08-30 15:43:28 - pico-train - INFO - │     lr_warmup_steps: 2000 │
2025-08-30 15:43:28 - pico-train - INFO - │     optimizer: adamw │
2025-08-30 15:43:28 - pico-train - INFO - │ │
2025-08-30 15:43:28 - pico-train - INFO - ╰─────────────────────────────────────────────────────╯
2025-08-30 15:43:28 - pico-train - INFO - ==================================================
2025-08-30 15:43:28 - pico-train - INFO - ⛭ Runtime Summary:
2025-08-30 15:43:28 - pico-train - INFO - ==================================================
2025-08-30 15:43:28 - pico-train - INFO - Starting from step: 62000
2025-08-30 15:43:28 - pico-train - INFO - Model Setup:
2025-08-30 15:43:28 - pico-train - INFO - └─ Total Parameters: 11,282,784
2025-08-30 15:43:28 - pico-train - INFO - └─ Trainable Parameters: 11,282,784
2025-08-30 15:43:28 - pico-train - INFO - Distributed Setup:
2025-08-30 15:43:28 - pico-train - INFO - └─ Number of Devices: 1
2025-08-30 15:43:28 - pico-train - INFO - └─ Device Type: NVIDIA H100 80GB HBM3
2025-08-30 15:43:28 - pico-train - INFO - └─ Available Memory: 85.03 GB
2025-08-30 15:43:28 - pico-train - INFO - Software Setup:
2025-08-30 15:43:28 - pico-train - INFO - └─ Python Version: 3.12.3
2025-08-30 15:43:28 - pico-train - INFO - └─ PyTorch Version: 2.8.0+cu128
2025-08-30 15:43:28 - pico-train - INFO - └─ CUDA Version: 12.8
2025-08-30 15:43:28 - pico-train - INFO - └─ Operating System: Linux 6.8.0-71-generic
2025-08-30 15:43:28 - pico-train - INFO - Batch Size Configuration:
2025-08-30 15:43:28 - pico-train - INFO - └─ Global Batch Size: 16
2025-08-30 15:43:28 - pico-train - INFO - └─ Per Device Batch Size: 16
2025-08-30 15:43:28 - pico-train - INFO - └─ Gradient Accumulation Steps: 1
2025-08-30 15:43:28 - pico-train - INFO - ==================================================
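
Note on the parameter count: the logged total of 11,282,784 can be reproduced from the model section of the configuration with a short calculation. The sketch below is not taken from pico-train. It assumes untied input and output embeddings, bias-free linear layers, grouped-query attention with head_dim = d_model / attention_n_heads, a three-matrix SwiGLU block, and weight-only norms (two per layer plus one final norm). The function name `count_params` is hypothetical.

```python
def count_params(d_model, n_layers, vocab_size, n_heads, n_kv_heads, hidden_dim):
    """Estimate the parameter count of a small decoder (assumed layout, see note)."""
    head_dim = d_model // n_heads
    kv_dim = n_kv_heads * head_dim            # width of the shared K/V projections

    embedding = vocab_size * d_model          # token embedding table
    lm_head = vocab_size * d_model            # output projection, assumed untied

    attention = 2 * d_model * d_model         # q_proj and o_proj
    attention += 2 * d_model * kv_dim         # k_proj and v_proj
    swiglu = 3 * d_model * hidden_dim         # two up projections and one down projection
    norms = 2 * d_model                       # one norm weight vector per sub-block
    per_layer = attention + swiglu + norms

    final_norm = d_model
    return embedding + lm_head + n_layers * per_layer + final_norm


# Values copied from the model section of the logged configuration.
total = count_params(
    d_model=96, n_layers=12, vocab_size=50304,
    n_heads=12, n_kv_heads=4, hidden_dim=384,
)
print(f"{total:,}")  # 11,282,784
```

Under these assumptions the two embedding matrices account for 9,658,368 of the parameters, so the twelve transformer layers together hold only about 1.6M.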
2025-08-30 15:43:29 - pico-train - INFO - Step 62000 -- 🔄 Training Metrics
2025-08-30 15:43:29 - pico-train - INFO - ├── Loss: 4.5970
2025-08-30 15:43:29 - pico-train - INFO - ├── Learning Rate: 6.55e-05
2025-08-30 15:43:29 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-30 15:43:29 - pico-train - INFO - Step 62000 -- 📈 Saving Learning Dynamics
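
Note on the learning rate: the logged values are consistent with a cosine schedule that follows a linear warmup, using lr 0.0002, lr_warmup_steps 2000 and max_steps 100000 from the configuration. The sketch below is a generic formula, not pico-train's scheduler code. It assumes the cosine phase starts after warmup and decays to zero at max_steps. The function name `cosine_lr` is hypothetical.

```python
import math


def cosine_lr(step, base_lr=2e-4, warmup_steps=2000, max_steps=100_000):
    """Linear warmup followed by cosine decay to zero (assumed schedule shape)."""
    if step < warmup_steps:
        return base_lr * step / warmup_steps
    # Fraction of the post-warmup phase that has elapsed, in [0, 1].
    progress = (step - warmup_steps) / (max_steps - warmup_steps)
    return 0.5 * base_lr * (1.0 + math.cos(math.pi * progress))


for step in (62_000, 64_000, 66_000, 70_000):
    print(step, f"{cosine_lr(step):.2e}")
```

At steps 62000, 64000, 66000 and 70000 this formula gives 6.55e-05, 5.95e-05, 5.37e-05 and 4.28e-05, the same values the log reports at those steps.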
2025-08-30 15:44:25 - pico-train - INFO - Step 62100 -- 🔄 Training Metrics
2025-08-30 15:44:25 - pico-train - INFO - ├── Loss: 4.8133
2025-08-30 15:44:25 - pico-train - INFO - ├── Learning Rate: 6.52e-05
2025-08-30 15:44:25 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-30 15:45:17 - pico-train - INFO - Step 62200 -- 🔄 Training Metrics
2025-08-30 15:45:17 - pico-train - INFO - ├── Loss: 4.8221
2025-08-30 15:45:17 - pico-train - INFO - ├── Learning Rate: 6.49e-05
2025-08-30 15:45:17 - pico-train - INFO - └── Inf/NaN count: 0
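
Note on throughput: the timestamps of consecutive "Training Metrics" lines give a rough training speed. The sketch below parses lines in this log's format and estimates tokens per second. It assumes every sequence holds max_seq_len = 2048 tokens and uses the logged global batch size of 16. The helper names are hypothetical, and the one-second timestamp resolution makes the result approximate. The sample lines omit the emoji, which the pattern does not depend on.

```python
import re
from datetime import datetime

# Matches e.g. "2025-08-30 15:44:25 - pico-train - INFO - Step 62100 -- ... Training Metrics"
LINE_RE = re.compile(
    r"^(\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}) - pico-train - INFO - "
    r"Step (\d+) -- .*Training Metrics"
)


def step_times(lines):
    """Return (step, timestamp) pairs for every Training Metrics header line."""
    pairs = []
    for line in lines:
        match = LINE_RE.match(line)
        if match:
            stamp = datetime.strptime(match.group(1), "%Y-%m-%d %H:%M:%S")
            pairs.append((int(match.group(2)), stamp))
    return pairs


def tokens_per_second(lines, global_batch_size=16, seq_len=2048):
    """Average tokens/s between the first and last matched step (assumes full sequences)."""
    pairs = step_times(lines)
    (first_step, first_time), (last_step, last_time) = pairs[0], pairs[-1]
    seconds = (last_time - first_time).total_seconds()
    return (last_step - first_step) * global_batch_size * seq_len / seconds


sample = [
    "2025-08-30 15:44:25 - pico-train - INFO - Step 62100 -- Training Metrics",
    "2025-08-30 15:45:17 - pico-train - INFO - Step 62200 -- Training Metrics",
]
print(round(tokens_per_second(sample)))  # 63015
```

For the two steps above, 100 steps take 52 seconds, which works out to roughly 63,000 tokens per second on the single H100 under these assumptions.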
2025-08-30 15:46:09 - pico-train - INFO - Step 62300 -- 🔄 Training Metrics
2025-08-30 15:46:09 - pico-train - INFO - ├── Loss: 4.8068
2025-08-30 15:46:09 - pico-train - INFO - ├── Learning Rate: 6.46e-05
2025-08-30 15:46:09 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-30 15:47:01 - pico-train - INFO - Step 62400 -- 🔄 Training Metrics
2025-08-30 15:47:01 - pico-train - INFO - ├── Loss: 4.7858
2025-08-30 15:47:01 - pico-train - INFO - ├── Learning Rate: 6.43e-05
2025-08-30 15:47:01 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-30 15:47:53 - pico-train - INFO - Step 62500 -- 🔄 Training Metrics
2025-08-30 15:47:53 - pico-train - INFO - ├── Loss: 4.8460
2025-08-30 15:47:53 - pico-train - INFO - ├── Learning Rate: 6.40e-05
2025-08-30 15:47:53 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-30 15:48:45 - pico-train - INFO - Step 62600 -- 🔄 Training Metrics
2025-08-30 15:48:45 - pico-train - INFO - ├── Loss: 4.8264
2025-08-30 15:48:45 - pico-train - INFO - ├── Learning Rate: 6.37e-05
2025-08-30 15:48:45 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-30 15:49:37 - pico-train - INFO - Step 62700 -- 🔄 Training Metrics
2025-08-30 15:49:37 - pico-train - INFO - ├── Loss: 4.8266
2025-08-30 15:49:37 - pico-train - INFO - ├── Learning Rate: 6.34e-05
2025-08-30 15:49:37 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-30 15:50:29 - pico-train - INFO - Step 62800 -- 🔄 Training Metrics
2025-08-30 15:50:29 - pico-train - INFO - ├── Loss: 4.8317
2025-08-30 15:50:29 - pico-train - INFO - ├── Learning Rate: 6.31e-05
2025-08-30 15:50:29 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-30 15:51:20 - pico-train - INFO - Step 62900 -- 🔄 Training Metrics
2025-08-30 15:51:20 - pico-train - INFO - ├── Loss: 4.8337
2025-08-30 15:51:20 - pico-train - INFO - ├── Learning Rate: 6.28e-05
2025-08-30 15:51:20 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-30 15:52:12 - pico-train - INFO - Step 63000 -- 🔄 Training Metrics
2025-08-30 15:52:12 - pico-train - INFO - ├── Loss: 4.8183
2025-08-30 15:52:12 - pico-train - INFO - ├── Learning Rate: 6.25e-05
2025-08-30 15:52:12 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-30 15:53:04 - pico-train - INFO - Step 63100 -- 🔄 Training Metrics
2025-08-30 15:53:04 - pico-train - INFO - ├── Loss: 4.8177
2025-08-30 15:53:04 - pico-train - INFO - ├── Learning Rate: 6.22e-05
2025-08-30 15:53:04 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-30 15:53:56 - pico-train - INFO - Step 63200 -- 🔄 Training Metrics
2025-08-30 15:53:56 - pico-train - INFO - ├── Loss: 4.8094
2025-08-30 15:53:56 - pico-train - INFO - ├── Learning Rate: 6.19e-05
2025-08-30 15:53:56 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-30 15:54:48 - pico-train - INFO - Step 63300 -- 🔄 Training Metrics
2025-08-30 15:54:48 - pico-train - INFO - ├── Loss: 4.8294
2025-08-30 15:54:48 - pico-train - INFO - ├── Learning Rate: 6.16e-05
2025-08-30 15:54:48 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-30 15:55:40 - pico-train - INFO - Step 63400 -- 🔄 Training Metrics
2025-08-30 15:55:40 - pico-train - INFO - ├── Loss: 4.8073
2025-08-30 15:55:40 - pico-train - INFO - ├── Learning Rate: 6.13e-05
2025-08-30 15:55:40 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-30 15:56:32 - pico-train - INFO - Step 63500 -- 🔄 Training Metrics
2025-08-30 15:56:32 - pico-train - INFO - ├── Loss: 4.8364
2025-08-30 15:56:32 - pico-train - INFO - ├── Learning Rate: 6.10e-05
2025-08-30 15:56:32 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-30 15:57:23 - pico-train - INFO - Step 63600 -- 🔄 Training Metrics
2025-08-30 15:57:23 - pico-train - INFO - ├── Loss: 4.8236
2025-08-30 15:57:23 - pico-train - INFO - ├── Learning Rate: 6.07e-05
2025-08-30 15:57:23 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-30 15:58:15 - pico-train - INFO - Step 63700 -- 🔄 Training Metrics
2025-08-30 15:58:15 - pico-train - INFO - ├── Loss: 4.8114
2025-08-30 15:58:15 - pico-train - INFO - ├── Learning Rate: 6.04e-05
2025-08-30 15:58:15 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-30 15:59:07 - pico-train - INFO - Step 63800 -- 🔄 Training Metrics
2025-08-30 15:59:07 - pico-train - INFO - ├── Loss: 4.8078
2025-08-30 15:59:07 - pico-train - INFO - ├── Learning Rate: 6.01e-05
2025-08-30 15:59:07 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-30 15:59:59 - pico-train - INFO - Step 63900 -- 🔄 Training Metrics
2025-08-30 15:59:59 - pico-train - INFO - ├── Loss: 4.8107
2025-08-30 15:59:59 - pico-train - INFO - ├── Learning Rate: 5.98e-05
2025-08-30 15:59:59 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-30 16:00:50 - pico-train - INFO - Step 64000 -- 💾 Saving Checkpoint
2025-08-30 16:02:54 - pico-train - INFO - Step 64000 -- 📊 Evaluation Results
2025-08-30 16:02:54 - pico-train - INFO - └── paloma: inf
2025-08-30 16:02:56 - pico-train - INFO - Step 64000 -- 🔄 Training Metrics
2025-08-30 16:02:56 - pico-train - INFO - ├── Loss: 4.8145
2025-08-30 16:02:56 - pico-train - INFO - ├── Learning Rate: 5.95e-05
2025-08-30 16:02:56 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-30 16:02:56 - pico-train - INFO - Step 64000 -- 📈 Saving Learning Dynamics
2025-08-30 16:03:52 - pico-train - INFO - Step 64100 -- 🔄 Training Metrics
2025-08-30 16:03:52 - pico-train - INFO - ├── Loss: 4.8479
2025-08-30 16:03:52 - pico-train - INFO - ├── Learning Rate: 5.92e-05
2025-08-30 16:03:52 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-30 16:04:44 - pico-train - INFO - Step 64200 -- 🔄 Training Metrics
2025-08-30 16:04:44 - pico-train - INFO - ├── Loss: 4.8139
2025-08-30 16:04:44 - pico-train - INFO - ├── Learning Rate: 5.89e-05
2025-08-30 16:04:44 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-30 16:05:36 - pico-train - INFO - Step 64300 -- 🔄 Training Metrics
2025-08-30 16:05:36 - pico-train - INFO - ├── Loss: 4.7867
2025-08-30 16:05:36 - pico-train - INFO - ├── Learning Rate: 5.86e-05
2025-08-30 16:05:36 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-30 16:06:28 - pico-train - INFO - Step 64400 -- 🔄 Training Metrics
2025-08-30 16:06:28 - pico-train - INFO - ├── Loss: 4.8168
2025-08-30 16:06:28 - pico-train - INFO - ├── Learning Rate: 5.84e-05
2025-08-30 16:06:28 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-30 16:07:20 - pico-train - INFO - Step 64500 -- 🔄 Training Metrics
2025-08-30 16:07:20 - pico-train - INFO - ├── Loss: 4.8131
2025-08-30 16:07:20 - pico-train - INFO - ├── Learning Rate: 5.81e-05
2025-08-30 16:07:20 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-30 16:08:12 - pico-train - INFO - Step 64600 -- 🔄 Training Metrics
2025-08-30 16:08:12 - pico-train - INFO - ├── Loss: 4.8285
2025-08-30 16:08:12 - pico-train - INFO - ├── Learning Rate: 5.78e-05
2025-08-30 16:08:12 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-30 16:09:04 - pico-train - INFO - Step 64700 -- 🔄 Training Metrics
2025-08-30 16:09:04 - pico-train - INFO - ├── Loss: 4.8170
2025-08-30 16:09:04 - pico-train - INFO - ├── Learning Rate: 5.75e-05
2025-08-30 16:09:04 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-30 16:09:56 - pico-train - INFO - Step 64800 -- 🔄 Training Metrics
2025-08-30 16:09:56 - pico-train - INFO - ├── Loss: 4.8317
2025-08-30 16:09:56 - pico-train - INFO - ├── Learning Rate: 5.72e-05
2025-08-30 16:09:56 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-30 16:10:48 - pico-train - INFO - Step 64900 -- 🔄 Training Metrics
2025-08-30 16:10:48 - pico-train - INFO - ├── Loss: 4.8368
2025-08-30 16:10:48 - pico-train - INFO - ├── Learning Rate: 5.69e-05
2025-08-30 16:10:48 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-30 16:11:40 - pico-train - INFO - Step 65000 -- 🔄 Training Metrics
2025-08-30 16:11:40 - pico-train - INFO - ├── Loss: 4.8129
2025-08-30 16:11:40 - pico-train - INFO - ├── Learning Rate: 5.66e-05
2025-08-30 16:11:40 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-30 16:12:32 - pico-train - INFO - Step 65100 -- 🔄 Training Metrics
2025-08-30 16:12:32 - pico-train - INFO - ├── Loss: 4.8226
2025-08-30 16:12:32 - pico-train - INFO - ├── Learning Rate: 5.63e-05
2025-08-30 16:12:32 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-30 16:13:24 - pico-train - INFO - Step 65200 -- 🔄 Training Metrics
2025-08-30 16:13:24 - pico-train - INFO - ├── Loss: 4.8321
2025-08-30 16:13:24 - pico-train - INFO - ├── Learning Rate: 5.60e-05
2025-08-30 16:13:24 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-30 16:14:16 - pico-train - INFO - Step 65300 -- 🔄 Training Metrics
2025-08-30 16:14:16 - pico-train - INFO - ├── Loss: 4.8352
2025-08-30 16:14:16 - pico-train - INFO - ├── Learning Rate: 5.57e-05
2025-08-30 16:14:16 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-30 16:15:08 - pico-train - INFO - Step 65400 -- 🔄 Training Metrics
2025-08-30 16:15:08 - pico-train - INFO - ├── Loss: 4.8119
2025-08-30 16:15:08 - pico-train - INFO - ├── Learning Rate: 5.55e-05
2025-08-30 16:15:08 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-30 16:16:00 - pico-train - INFO - Step 65500 -- 🔄 Training Metrics
2025-08-30 16:16:00 - pico-train - INFO - ├── Loss: 4.7889
2025-08-30 16:16:00 - pico-train - INFO - ├── Learning Rate: 5.52e-05
2025-08-30 16:16:00 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-30 16:16:52 - pico-train - INFO - Step 65600 -- 🔄 Training Metrics
2025-08-30 16:16:52 - pico-train - INFO - ├── Loss: 4.8119
2025-08-30 16:16:52 - pico-train - INFO - ├── Learning Rate: 5.49e-05
2025-08-30 16:16:52 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-30 16:17:44 - pico-train - INFO - Step 65700 -- 🔄 Training Metrics
2025-08-30 16:17:44 - pico-train - INFO - ├── Loss: 4.8193
2025-08-30 16:17:44 - pico-train - INFO - ├── Learning Rate: 5.46e-05
2025-08-30 16:17:44 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-30 16:18:35 - pico-train - INFO - Step 65800 -- 🔄 Training Metrics
2025-08-30 16:18:35 - pico-train - INFO - ├── Loss: 4.8121
2025-08-30 16:18:35 - pico-train - INFO - ├── Learning Rate: 5.43e-05
2025-08-30 16:18:35 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-30 16:19:27 - pico-train - INFO - Step 65900 -- 🔄 Training Metrics
2025-08-30 16:19:27 - pico-train - INFO - ├── Loss: 4.8057
2025-08-30 16:19:27 - pico-train - INFO - ├── Learning Rate: 5.40e-05
2025-08-30 16:19:27 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-30 16:20:19 - pico-train - INFO - Step 66000 -- 💾 Saving Checkpoint
2025-08-30 16:22:18 - pico-train - INFO - Step 66000 -- 📊 Evaluation Results
2025-08-30 16:22:18 - pico-train - INFO - └── paloma: inf
2025-08-30 16:22:20 - pico-train - INFO - Step 66000 -- 🔄 Training Metrics
2025-08-30 16:22:20 - pico-train - INFO - ├── Loss: 4.8260
2025-08-30 16:22:20 - pico-train - INFO - ├── Learning Rate: 5.37e-05
2025-08-30 16:22:20 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-30 16:22:20 - pico-train - INFO - Step 66000 -- 📈 Saving Learning Dynamics
2025-08-30 16:23:16 - pico-train - INFO - Step 66100 -- 🔄 Training Metrics
2025-08-30 16:23:16 - pico-train - INFO - ├── Loss: 4.8110
2025-08-30 16:23:16 - pico-train - INFO - ├── Learning Rate: 5.35e-05
2025-08-30 16:23:16 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-30 16:24:09 - pico-train - INFO - Step 66200 -- 🔄 Training Metrics
2025-08-30 16:24:09 - pico-train - INFO - ├── Loss: 4.8156
2025-08-30 16:24:09 - pico-train - INFO - ├── Learning Rate: 5.32e-05
2025-08-30 16:24:09 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-30 16:25:02 - pico-train - INFO - Step 66300 -- 🔄 Training Metrics
2025-08-30 16:25:02 - pico-train - INFO - ├── Loss: 4.7928
2025-08-30 16:25:02 - pico-train - INFO - ├── Learning Rate: 5.29e-05
2025-08-30 16:25:02 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-30 16:25:55 - pico-train - INFO - Step 66400 -- 🔄 Training Metrics
2025-08-30 16:25:55 - pico-train - INFO - ├── Loss: 4.8202
2025-08-30 16:25:55 - pico-train - INFO - ├── Learning Rate: 5.26e-05
2025-08-30 16:25:55 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-30 16:26:49 - pico-train - INFO - Step 66500 -- 🔄 Training Metrics
2025-08-30 16:26:49 - pico-train - INFO - ├── Loss: 4.8117
2025-08-30 16:26:49 - pico-train - INFO - ├── Learning Rate: 5.23e-05
2025-08-30 16:26:49 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-30 16:27:42 - pico-train - INFO - Step 66600 -- 🔄 Training Metrics
2025-08-30 16:27:42 - pico-train - INFO - ├── Loss: 4.8047
2025-08-30 16:27:42 - pico-train - INFO - ├── Learning Rate: 5.20e-05
2025-08-30 16:27:42 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-30 16:28:34 - pico-train - INFO - Step 66700 -- 🔄 Training Metrics
2025-08-30 16:28:34 - pico-train - INFO - ├── Loss: 4.7995
2025-08-30 16:28:34 - pico-train - INFO - ├── Learning Rate: 5.18e-05
2025-08-30 16:28:34 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-30 16:29:28 - pico-train - INFO - Step 66800 -- 🔄 Training Metrics
2025-08-30 16:29:28 - pico-train - INFO - ├── Loss: 4.8074
2025-08-30 16:29:28 - pico-train - INFO - ├── Learning Rate: 5.15e-05
2025-08-30 16:29:28 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-30 16:30:21 - pico-train - INFO - Step 66900 -- 🔄 Training Metrics
2025-08-30 16:30:21 - pico-train - INFO - ├── Loss: 4.7890
2025-08-30 16:30:21 - pico-train - INFO - ├── Learning Rate: 5.12e-05
2025-08-30 16:30:21 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-30 16:31:14 - pico-train - INFO - Step 67000 -- 🔄 Training Metrics
2025-08-30 16:31:14 - pico-train - INFO - ├── Loss: 4.8216
2025-08-30 16:31:14 - pico-train - INFO - ├── Learning Rate: 5.09e-05
2025-08-30 16:31:14 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-30 16:32:07 - pico-train - INFO - Step 67100 -- 🔄 Training Metrics
2025-08-30 16:32:07 - pico-train - INFO - ├── Loss: 4.8034
2025-08-30 16:32:07 - pico-train - INFO - ├── Learning Rate: 5.06e-05
2025-08-30 16:32:07 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-30 16:32:59 - pico-train - INFO - Step 67200 -- 🔄 Training Metrics
2025-08-30 16:32:59 - pico-train - INFO - ├── Loss: 4.8062
2025-08-30 16:32:59 - pico-train - INFO - ├── Learning Rate: 5.04e-05
2025-08-30 16:32:59 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-30 16:33:51 - pico-train - INFO - Step 67300 -- 🔄 Training Metrics
2025-08-30 16:33:51 - pico-train - INFO - ├── Loss: 4.8106
2025-08-30 16:33:51 - pico-train - INFO - ├── Learning Rate: 5.01e-05
2025-08-30 16:33:51 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-30 16:34:43 - pico-train - INFO - Step 67400 -- 🔄 Training Metrics
2025-08-30 16:34:43 - pico-train - INFO - ├── Loss: 4.8168
2025-08-30 16:34:43 - pico-train - INFO - ├── Learning Rate: 4.98e-05
2025-08-30 16:34:43 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-30 16:35:36 - pico-train - INFO - Step 67500 -- 🔄 Training Metrics
2025-08-30 16:35:36 - pico-train - INFO - ├── Loss: 4.7968
2025-08-30 16:35:36 - pico-train - INFO - ├── Learning Rate: 4.95e-05
2025-08-30 16:35:36 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-30 16:36:27 - pico-train - INFO - Step 67600 -- 🔄 Training Metrics
2025-08-30 16:36:27 - pico-train - INFO - ├── Loss: 4.7905
2025-08-30 16:36:27 - pico-train - INFO - ├── Learning Rate: 4.93e-05
2025-08-30 16:36:27 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-30 16:37:19 - pico-train - INFO - Step 67700 -- 🔄 Training Metrics
2025-08-30 16:37:19 - pico-train - INFO - ├── Loss: 4.8253
2025-08-30 16:37:19 - pico-train - INFO - ├── Learning Rate: 4.90e-05
2025-08-30 16:37:19 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-30 16:38:11 - pico-train - INFO - Step 67800 -- 🔄 Training Metrics
2025-08-30 16:38:11 - pico-train - INFO - ├── Loss: 4.7848
2025-08-30 16:38:11 - pico-train - INFO - ├── Learning Rate: 4.87e-05
2025-08-30 16:38:11 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-30 16:39:03 - pico-train - INFO - Step 67900 -- 🔄 Training Metrics
2025-08-30 16:39:03 - pico-train - INFO - ├── Loss: 4.8165
2025-08-30 16:39:03 - pico-train - INFO - ├── Learning Rate: 4.84e-05
2025-08-30 16:39:03 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-30 16:39:55 - pico-train - INFO - Step 68000 -- 💾 Saving Checkpoint
2025-08-30 16:42:09 - pico-train - INFO - Step 68000 -- 📊 Evaluation Results
2025-08-30 16:42:09 - pico-train - INFO - └── paloma: inf
2025-08-30 16:42:10 - pico-train - INFO - Step 68000 -- 🔄 Training Metrics
2025-08-30 16:42:10 - pico-train - INFO - ├── Loss: 4.8264
2025-08-30 16:42:10 - pico-train - INFO - ├── Learning Rate: 4.82e-05
2025-08-30 16:42:10 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-30 16:42:10 - pico-train - INFO - Step 68000 -- 📈 Saving Learning Dynamics
2025-08-30 16:43:07 - pico-train - INFO - Step 68100 -- 🔄 Training Metrics
2025-08-30 16:43:07 - pico-train - INFO - ├── Loss: 4.8363
2025-08-30 16:43:07 - pico-train - INFO - ├── Learning Rate: 4.79e-05
2025-08-30 16:43:07 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-30 16:43:59 - pico-train - INFO - Step 68200 -- 🔄 Training Metrics
2025-08-30 16:43:59 - pico-train - INFO - ├── Loss: 4.7964
2025-08-30 16:43:59 - pico-train - INFO - ├── Learning Rate: 4.76e-05
2025-08-30 16:43:59 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-30 16:44:51 - pico-train - INFO - Step 68300 -- 🔄 Training Metrics
2025-08-30 16:44:51 - pico-train - INFO - ├── Loss: 4.7999
2025-08-30 16:44:51 - pico-train - INFO - ├── Learning Rate: 4.73e-05
2025-08-30 16:44:51 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-30 16:45:43 - pico-train - INFO - Step 68400 -- 🔄 Training Metrics
2025-08-30 16:45:43 - pico-train - INFO - ├── Loss: 4.8119
2025-08-30 16:45:43 - pico-train - INFO - ├── Learning Rate: 4.71e-05
2025-08-30 16:45:43 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-30 16:46:35 - pico-train - INFO - Step 68500 -- 🔄 Training Metrics
2025-08-30 16:46:35 - pico-train - INFO - ├── Loss: 4.7998
2025-08-30 16:46:35 - pico-train - INFO - ├── Learning Rate: 4.68e-05
2025-08-30 16:46:35 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-30 16:47:27 - pico-train - INFO - Step 68600 -- 🔄 Training Metrics
2025-08-30 16:47:27 - pico-train - INFO - ├── Loss: 4.8010
2025-08-30 16:47:27 - pico-train - INFO - ├── Learning Rate: 4.65e-05
2025-08-30 16:47:27 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-30 16:48:19 - pico-train - INFO - Step 68700 -- 🔄 Training Metrics
2025-08-30 16:48:19 - pico-train - INFO - ├── Loss: 4.7986
2025-08-30 16:48:19 - pico-train - INFO - ├── Learning Rate: 4.63e-05
2025-08-30 16:48:19 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-30 16:49:12 - pico-train - INFO - Step 68800 -- 🔄 Training Metrics
2025-08-30 16:49:12 - pico-train - INFO - ├── Loss: 4.8133
2025-08-30 16:49:12 - pico-train - INFO - ├── Learning Rate: 4.60e-05
2025-08-30 16:49:12 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-30 16:50:05 - pico-train - INFO - Step 68900 -- 🔄 Training Metrics
2025-08-30 16:50:05 - pico-train - INFO - ├── Loss: 4.7944
2025-08-30 16:50:05 - pico-train - INFO - ├── Learning Rate: 4.57e-05
2025-08-30 16:50:05 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-30 16:50:58 - pico-train - INFO - Step 69000 -- 🔄 Training Metrics
2025-08-30 16:50:58 - pico-train - INFO - ├── Loss: 4.8021
2025-08-30 16:50:58 - pico-train - INFO - ├── Learning Rate: 4.54e-05
2025-08-30 16:50:58 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-30 16:51:51 - pico-train - INFO - Step 69100 -- 🔄 Training Metrics
2025-08-30 16:51:51 - pico-train - INFO - ├── Loss: 4.7611
2025-08-30 16:51:51 - pico-train - INFO - ├── Learning Rate: 4.52e-05
2025-08-30 16:51:51 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-30 16:52:44 - pico-train - INFO - Step 69200 -- 🔄 Training Metrics
2025-08-30 16:52:44 - pico-train - INFO - ├── Loss: 4.7981
2025-08-30 16:52:44 - pico-train - INFO - ├── Learning Rate: 4.49e-05
2025-08-30 16:52:44 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-30 16:53:38 - pico-train - INFO - Step 69300 -- 🔄 Training Metrics
2025-08-30 16:53:38 - pico-train - INFO - ├── Loss: 4.8066
2025-08-30 16:53:38 - pico-train - INFO - ├── Learning Rate: 4.46e-05
2025-08-30 16:53:38 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-30 16:54:31 - pico-train - INFO - Step 69400 -- 🔄 Training Metrics
2025-08-30 16:54:31 - pico-train - INFO - ├── Loss: 4.8053
2025-08-30 16:54:31 - pico-train - INFO - ├── Learning Rate: 4.44e-05
2025-08-30 16:54:31 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-30 16:55:23 - pico-train - INFO - Step 69500 -- 🔄 Training Metrics
2025-08-30 16:55:23 - pico-train - INFO - ├── Loss: 4.7953
2025-08-30 16:55:23 - pico-train - INFO - ├── Learning Rate: 4.41e-05
2025-08-30 16:55:23 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-30 16:56:16 - pico-train - INFO - Step 69600 -- 🔄 Training Metrics
2025-08-30 16:56:16 - pico-train - INFO - ├── Loss: 4.8087
2025-08-30 16:56:16 - pico-train - INFO - ├── Learning Rate: 4.38e-05
2025-08-30 16:56:16 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-30 16:57:10 - pico-train - INFO - Step 69700 -- 🔄 Training Metrics
2025-08-30 16:57:10 - pico-train - INFO - ├── Loss: 4.7915
2025-08-30 16:57:10 - pico-train - INFO - ├── Learning Rate: 4.36e-05
2025-08-30 16:57:10 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-30 16:58:03 - pico-train - INFO - Step 69800 -- 🔄 Training Metrics
2025-08-30 16:58:03 - pico-train - INFO - ├── Loss: 4.8145
2025-08-30 16:58:03 - pico-train - INFO - ├── Learning Rate: 4.33e-05
2025-08-30 16:58:03 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-30 16:58:56 - pico-train - INFO - Step 69900 -- 🔄 Training Metrics
2025-08-30 16:58:56 - pico-train - INFO - ├── Loss: 4.8056
2025-08-30 16:58:56 - pico-train - INFO - ├── Learning Rate: 4.31e-05
2025-08-30 16:58:56 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-30 16:59:48 - pico-train - INFO - Step 70000 -- 💾 Saving Checkpoint
2025-08-30 17:01:50 - pico-train - INFO - Step 70000 -- 📊 Evaluation Results
2025-08-30 17:01:50 - pico-train - INFO - └── paloma: inf
2025-08-30 17:01:52 - pico-train - INFO - Step 70000 -- 🔄 Training Metrics
2025-08-30 17:01:52 - pico-train - INFO - ├── Loss: 4.7898
2025-08-30 17:01:52 - pico-train - INFO - ├── Learning Rate: 4.28e-05
2025-08-30 17:01:52 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-30 17:01:52 - pico-train - INFO - Step 70000 -- 📈 Saving Learning Dynamics
2025-08-30 17:02:48 - pico-train - INFO - Step 70100 -- 🔄 Training Metrics
2025-08-30 17:02:48 - pico-train - INFO - ├── Loss: 4.7929
2025-08-30 17:02:48 - pico-train - INFO - ├── Learning Rate: 4.25e-05
2025-08-30 17:02:48 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-30 17:03:40 - pico-train - INFO - Step 70200 -- 🔄 Training Metrics
2025-08-30 17:03:40 - pico-train - INFO - ├── Loss: 4.8215
2025-08-30 17:03:40 - pico-train - INFO - ├── Learning Rate: 4.23e-05
2025-08-30 17:03:40 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-30 17:04:32 - pico-train - INFO - Step 70300 -- 🔄 Training Metrics
2025-08-30 17:04:32 - pico-train - INFO - ├── Loss: 4.8139
2025-08-30 17:04:32 - pico-train - INFO - ├── Learning Rate: 4.20e-05
2025-08-30 17:04:32 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-30 17:05:24 - pico-train - INFO - Step 70400 -- 🔄 Training Metrics
2025-08-30 17:05:24 - pico-train - INFO - ├── Loss: 4.7922
2025-08-30 17:05:24 - pico-train - INFO - ├── Learning Rate: 4.17e-05
2025-08-30 17:05:24 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-30 17:06:16 - pico-train - INFO - Step 70500 -- 🔄 Training Metrics
2025-08-30 17:06:16 - pico-train - INFO - ├── Loss: 4.7923
2025-08-30 17:06:16 - pico-train - INFO - ├── Learning Rate: 4.15e-05
2025-08-30 17:06:16 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-30 17:07:08 - pico-train - INFO - Step 70600 -- 🔄 Training Metrics
2025-08-30 17:07:08 - pico-train - INFO - ├── Loss: 4.8075
2025-08-30 17:07:08 - pico-train - INFO - ├── Learning Rate: 4.12e-05
2025-08-30 17:07:08 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-30 17:08:00 - pico-train - INFO - Step 70700 -- 🔄 Training Metrics
2025-08-30 17:08:00 - pico-train - INFO - ├── Loss: 4.7833
2025-08-30 17:08:00 - pico-train - INFO - ├── Learning Rate: 4.10e-05
2025-08-30 17:08:00 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-30 17:08:52 - pico-train - INFO - Step 70800 -- 🔄 Training Metrics
2025-08-30 17:08:52 - pico-train - INFO - ├── Loss: 4.8036
2025-08-30 17:08:52 - pico-train - INFO - ├── Learning Rate: 4.07e-05
2025-08-30 17:08:52 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-30 17:09:44 - pico-train - INFO - Step 70900 -- 🔄 Training Metrics
2025-08-30 17:09:44 - pico-train - INFO - ├── Loss: 4.7910
2025-08-30 17:09:44 - pico-train - INFO - ├── Learning Rate: 4.04e-05
2025-08-30 17:09:44 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-30 17:10:36 - pico-train - INFO - Step 71000 -- 🔄 Training Metrics
2025-08-30 17:10:36 - pico-train - INFO - ├── Loss: 4.7723
2025-08-30 17:10:36 - pico-train - INFO - ├── Learning Rate: 4.02e-05
2025-08-30 17:10:36 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-30 17:11:28 - pico-train - INFO - Step 71100 -- 🔄 Training Metrics
2025-08-30 17:11:28 - pico-train - INFO - ├── Loss: 4.7768
2025-08-30 17:11:28 - pico-train - INFO - ├── Learning Rate: 3.99e-05
2025-08-30 17:11:28 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-30 17:12:19 - pico-train - INFO - Step 71200 -- 🔄 Training Metrics
2025-08-30 17:12:19 - pico-train - INFO - ├── Loss: 4.7984
2025-08-30 17:12:19 - pico-train - INFO - ├── Learning Rate: 3.97e-05
2025-08-30 17:12:19 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-30 17:13:11 - pico-train - INFO - Step 71300 -- 🔄 Training Metrics
2025-08-30 17:13:11 - pico-train - INFO - ├── Loss: 4.7825
2025-08-30 17:13:11 - pico-train - INFO - ├── Learning Rate: 3.94e-05
2025-08-30 17:13:11 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-30 17:14:03 - pico-train - INFO - Step 71400 -- 🔄 Training Metrics
2025-08-30 17:14:03 - pico-train - INFO - ├── Loss: 4.8093
2025-08-30 17:14:03 - pico-train - INFO - ├── Learning Rate: 3.92e-05
2025-08-30 17:14:03 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-30 17:14:55 - pico-train - INFO - Step 71500 -- 🔄 Training Metrics
2025-08-30 17:14:55 - pico-train - INFO - ├── Loss: 4.7903
2025-08-30 17:14:55 - pico-train - INFO - ├── Learning Rate: 3.89e-05
2025-08-30 17:14:55 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-30 17:15:47 - pico-train - INFO - Step 71600 -- 🔄 Training Metrics
2025-08-30 17:15:47 - pico-train - INFO - ├── Loss: 4.8269
2025-08-30 17:15:47 - pico-train - INFO - ├── Learning Rate: 3.87e-05
2025-08-30 17:15:47 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-30 17:16:39 - pico-train - INFO - Step 71700 -- 🔄 Training Metrics
2025-08-30 17:16:39 - pico-train - INFO - ├── Loss: 4.8135
2025-08-30 17:16:39 - pico-train - INFO - ├── Learning Rate: 3.84e-05
2025-08-30 17:16:39 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-30 17:17:31 - pico-train - INFO - Step 71800 -- 🔄 Training Metrics
2025-08-30 17:17:31 - pico-train - INFO - ├── Loss: 4.7759
2025-08-30 17:17:31 - pico-train - INFO - ├── Learning Rate: 3.82e-05
2025-08-30 17:17:31 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-30 17:18:22 - pico-train - INFO - Step 71900 -- 🔄 Training Metrics
2025-08-30 17:18:22 - pico-train - INFO - ├── Loss: 4.7837
2025-08-30 17:18:22 - pico-train - INFO - ├── Learning Rate: 3.79e-05
2025-08-30 17:18:22 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-30 17:19:15 - pico-train - INFO - Step 72000 -- 💾 Saving Checkpoint
2025-08-30 17:21:27 - pico-train - INFO - Step 72000 -- 📊 Evaluation Results
2025-08-30 17:21:27 - pico-train - INFO - └── paloma: inf
2025-08-30 17:21:28 - pico-train - INFO - Step 72000 -- 🔄 Training Metrics
2025-08-30 17:21:28 - pico-train - INFO - ├── Loss: 4.8016
2025-08-30 17:21:28 - pico-train - INFO - ├── Learning Rate: 3.77e-05
2025-08-30 17:21:28 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-30 17:21:28 - pico-train - INFO - Step 72000 -- 📈 Saving Learning Dynamics
2025-08-30 17:22:25 - pico-train - INFO - Step 72100 -- 🔄 Training Metrics
2025-08-30 17:22:25 - pico-train - INFO - ├── Loss: 4.7643
2025-08-30 17:22:25 - pico-train - INFO - ├── Learning Rate: 3.74e-05
2025-08-30 17:22:25 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-30 17:23:16 - pico-train - INFO - Step 72200 -- 🔄 Training Metrics
2025-08-30 17:23:16 - pico-train - INFO - ├── Loss: 4.7938
2025-08-30 17:23:16 - pico-train - INFO - ├── Learning Rate: 3.72e-05
2025-08-30 17:23:16 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-30 17:24:08 - pico-train - INFO - Step 72300 -- 🔄 Training Metrics
2025-08-30 17:24:08 - pico-train - INFO - ├── Loss: 4.7962
2025-08-30 17:24:08 - pico-train - INFO - ├── Learning Rate: 3.69e-05
2025-08-30 17:24:08 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-30 17:25:00 - pico-train - INFO - Step 72400 -- 🔄 Training Metrics
2025-08-30 17:25:00 - pico-train - INFO - ├── Loss: 4.8089
2025-08-30 17:25:00 - pico-train - INFO - ├── Learning Rate: 3.67e-05
2025-08-30 17:25:00 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-30 17:25:52 - pico-train - INFO - Step 72500 -- 🔄 Training Metrics
2025-08-30 17:25:52 - pico-train - INFO - ├── Loss: 4.8081
2025-08-30 17:25:52 - pico-train - INFO - ├── Learning Rate: 3.64e-05
2025-08-30 17:25:52 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-30 17:26:44 - pico-train - INFO - Step 72600 -- 🔄 Training Metrics
2025-08-30 17:26:44 - pico-train - INFO - ├── Loss: 4.8095
2025-08-30 17:26:44 - pico-train - INFO - ├── Learning Rate: 3.62e-05
2025-08-30 17:26:44 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-30 17:27:36 - pico-train - INFO - Step 72700 -- 🔄 Training Metrics
2025-08-30 17:27:36 - pico-train - INFO - ├── Loss: 4.8020
2025-08-30 17:27:36 - pico-train - INFO - ├── Learning Rate: 3.59e-05
2025-08-30 17:27:36 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-30 17:28:28 - pico-train - INFO - Step 72800 -- 🔄 Training Metrics
2025-08-30 17:28:28 - pico-train - INFO - ├── Loss: 4.7579
2025-08-30 17:28:28 - pico-train - INFO - ├── Learning Rate: 3.57e-05
2025-08-30 17:28:28 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-30 17:29:20 - pico-train - INFO - Step 72900 -- 🔄 Training Metrics
2025-08-30 17:29:20 - pico-train - INFO - ├── Loss: 4.7869
2025-08-30 17:29:20 - pico-train - INFO - ├── Learning Rate: 3.54e-05
2025-08-30 17:29:20 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-30 17:30:12 - pico-train - INFO - Step 73000 -- 🔄 Training Metrics
2025-08-30 17:30:12 - pico-train - INFO - ├── Loss: 4.7825
2025-08-30 17:30:12 - pico-train - INFO - ├── Learning Rate: 3.52e-05
2025-08-30 17:30:12 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-30 17:31:03 - pico-train - INFO - Step 73100 -- 🔄 Training Metrics
2025-08-30 17:31:03 - pico-train - INFO - ├── Loss: 4.8111
2025-08-30 17:31:03 - pico-train - INFO - ├── Learning Rate: 3.49e-05
2025-08-30 17:31:03 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-30 17:31:55 - pico-train - INFO - Step 73200 -- 🔄 Training Metrics
2025-08-30 17:31:55 - pico-train - INFO - ├── Loss: 4.8028
2025-08-30 17:31:55 - pico-train - INFO - ├── Learning Rate: 3.47e-05
2025-08-30 17:31:55 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-30 17:32:47 - pico-train - INFO - Step 73300 -- 🔄 Training Metrics
2025-08-30 17:32:47 - pico-train - INFO - ├── Loss: 4.8025
2025-08-30 17:32:47 - pico-train - INFO - ├── Learning Rate: 3.44e-05
2025-08-30 17:32:47 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-30 17:33:39 - pico-train - INFO - Step 73400 -- 🔄 Training Metrics
2025-08-30 17:33:39 - pico-train - INFO - ├── Loss: 4.7917
2025-08-30 17:33:39 - pico-train - INFO - ├── Learning Rate: 3.42e-05
2025-08-30 17:33:39 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-30 17:34:31 - pico-train - INFO - Step 73500 -- 🔄 Training Metrics
2025-08-30 17:34:31 - pico-train - INFO - ├── Loss: 4.7851
2025-08-30 17:34:31 - pico-train - INFO - ├── Learning Rate: 3.40e-05
2025-08-30 17:34:31 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-30 17:35:23 - pico-train - INFO - Step 73600 -- 🔄 Training Metrics
2025-08-30 17:35:23 - pico-train - INFO - ├── Loss: 4.7807
2025-08-30 17:35:23 - pico-train - INFO - ├── Learning Rate: 3.37e-05
2025-08-30 17:35:23 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-30 17:36:15 - pico-train - INFO - Step 73700 -- 🔄 Training Metrics
2025-08-30 17:36:15 - pico-train - INFO - ├── Loss: 4.7741
2025-08-30 17:36:15 - pico-train - INFO - ├── Learning Rate: 3.35e-05
2025-08-30 17:36:15 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-30 17:37:07 - pico-train - INFO - Step 73800 -- 🔄 Training Metrics
2025-08-30 17:37:07 - pico-train - INFO - ├── Loss: 4.8076
2025-08-30 17:37:07 - pico-train - INFO - ├── Learning Rate: 3.32e-05
2025-08-30 17:37:07 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-30 17:37:59 - pico-train - INFO - Step 73900 -- 🔄 Training Metrics
2025-08-30 17:37:59 - pico-train - INFO - ├── Loss: 4.8119
2025-08-30 17:37:59 - pico-train - INFO - ├── Learning Rate: 3.30e-05
2025-08-30 17:37:59 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-30 17:38:50 - pico-train - INFO - Step 74000 -- 💾 Saving Checkpoint
2025-08-30 17:40:51 - pico-train - INFO - Step 74000 -- 📊 Evaluation Results
2025-08-30 17:40:51 - pico-train - INFO - └── paloma: inf
2025-08-30 17:40:53 - pico-train - INFO - Step 74000 -- 🔄 Training Metrics
2025-08-30 17:40:53 - pico-train - INFO - ├── Loss: 4.7960
2025-08-30 17:40:53 - pico-train - INFO - ├── Learning Rate: 3.28e-05
2025-08-30 17:40:53 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-30 17:40:53 - pico-train - INFO - Step 74000 -- 📈 Saving Learning Dynamics
2025-08-30 17:41:49 - pico-train - INFO - Step 74100 -- 🔄 Training Metrics
2025-08-30 17:41:49 - pico-train - INFO - ├── Loss: 4.7909
2025-08-30 17:41:49 - pico-train - INFO - ├── Learning Rate: 3.25e-05
2025-08-30 17:41:49 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-30 17:42:42 - pico-train - INFO - Step 74200 -- 🔄 Training Metrics
2025-08-30 17:42:42 - pico-train - INFO - ├── Loss: 4.7807
2025-08-30 17:42:42 - pico-train - INFO - ├── Learning Rate: 3.23e-05
2025-08-30 17:42:42 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-30 17:43:36 - pico-train - INFO - Step 74300 -- 🔄 Training Metrics
2025-08-30 17:43:36 - pico-train - INFO - ├── Loss: 4.7711
2025-08-30 17:43:36 - pico-train - INFO - ├── Learning Rate: 3.21e-05
2025-08-30 17:43:36 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-30 17:44:29 - pico-train - INFO - Step 74400 -- 🔄 Training Metrics
2025-08-30 17:44:29 - pico-train - INFO - ├── Loss: 4.7837
2025-08-30 17:44:29 - pico-train - INFO - ├── Learning Rate: 3.18e-05
2025-08-30 17:44:29 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-30 17:45:21 - pico-train - INFO - Step 74500 -- 🔄 Training Metrics
2025-08-30 17:45:21 - pico-train - INFO - ├── Loss: 4.7668
2025-08-30 17:45:21 - pico-train - INFO - ├── Learning Rate: 3.16e-05
2025-08-30 17:45:21 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-30 17:46:15 - pico-train - INFO - Step 74600 -- 🔄 Training Metrics
2025-08-30 17:46:15 - pico-train - INFO - ├── Loss: 4.7985
2025-08-30 17:46:15 - pico-train - INFO - ├── Learning Rate: 3.14e-05
2025-08-30 17:46:15 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-30 17:47:08 - pico-train - INFO - Step 74700 -- 🔄 Training Metrics
2025-08-30 17:47:08 - pico-train - INFO - ├── Loss: 4.7702
2025-08-30 17:47:08 - pico-train - INFO - ├── Learning Rate: 3.11e-05
2025-08-30 17:47:08 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-30 17:48:01 - pico-train - INFO - Step 74800 -- 🔄 Training Metrics
2025-08-30 17:48:01 - pico-train - INFO - ├── Loss: 4.8002
2025-08-30 17:48:01 - pico-train - INFO - ├── Learning Rate: 3.09e-05
2025-08-30 17:48:01 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-30 17:48:54 - pico-train - INFO - Step 74900 -- 🔄 Training Metrics
2025-08-30 17:48:54 - pico-train - INFO - ├── Loss: 4.7955
2025-08-30 17:48:54 - pico-train - INFO - ├── Learning Rate: 3.07e-05
2025-08-30 17:48:54 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-30 17:49:48 - pico-train - INFO - Step 75000 -- 🔄 Training Metrics
2025-08-30 17:49:48 - pico-train - INFO - ├── Loss: 4.8023
2025-08-30 17:49:48 - pico-train - INFO - ├── Learning Rate: 3.04e-05
2025-08-30 17:49:48 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-30 17:50:41 - pico-train - INFO - Step 75100 -- 🔄 Training Metrics
2025-08-30 17:50:41 - pico-train - INFO - ├── Loss: 4.7842
2025-08-30 17:50:41 - pico-train - INFO - ├── Learning Rate: 3.02e-05
2025-08-30 17:50:41 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-30 17:51:34 - pico-train - INFO - Step 75200 -- 🔄 Training Metrics
2025-08-30 17:51:34 - pico-train - INFO - ├── Loss: 4.7890
2025-08-30 17:51:34 - pico-train - INFO - ├── Learning Rate: 3.00e-05
2025-08-30 17:51:34 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-30 17:52:27 - pico-train - INFO - Step 75300 -- 🔄 Training Metrics
2025-08-30 17:52:27 - pico-train - INFO - ├── Loss: 4.8004
2025-08-30 17:52:27 - pico-train - INFO - ├── Learning Rate: 2.97e-05
2025-08-30 17:52:27 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-30 17:53:20 - pico-train - INFO - Step 75400 -- 🔄 Training Metrics
2025-08-30 17:53:20 - pico-train - INFO - ├── Loss: 4.7917
2025-08-30 17:53:20 - pico-train - INFO - ├── Learning Rate: 2.95e-05
2025-08-30 17:53:20 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-30 17:54:13 - pico-train - INFO - Step 75500 -- 🔄 Training Metrics
2025-08-30 17:54:13 - pico-train - INFO - ├── Loss: 4.7867
2025-08-30 17:54:13 - pico-train - INFO - ├── Learning Rate: 2.93e-05
2025-08-30 17:54:13 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-30 17:55:07 - pico-train - INFO - Step 75600 -- 🔄 Training Metrics
2025-08-30 17:55:07 - pico-train - INFO - ├── Loss: 4.7957
2025-08-30 17:55:07 - pico-train - INFO - ├── Learning Rate: 2.91e-05
2025-08-30 17:55:07 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-30 17:56:00 - pico-train - INFO - Step 75700 -- 🔄 Training Metrics
2025-08-30 17:56:00 - pico-train - INFO - ├── Loss: 4.7840
2025-08-30 17:56:00 - pico-train - INFO - ├── Learning Rate: 2.88e-05
2025-08-30 17:56:00 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-30 17:56:56 - pico-train - INFO - Step 75800 -- 🔄 Training Metrics
2025-08-30 17:56:56 - pico-train - INFO - ├── Loss: 4.7990
2025-08-30 17:56:56 - pico-train - INFO - ├── Learning Rate: 2.86e-05
2025-08-30 17:56:56 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-30 17:57:48 - pico-train - INFO - Step 75900 -- 🔄 Training Metrics
2025-08-30 17:57:48 - pico-train - INFO - ├── Loss: 4.7904
2025-08-30 17:57:48 - pico-train - INFO - ├── Learning Rate: 2.84e-05
2025-08-30 17:57:48 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-30 17:58:41 - pico-train - INFO - Step 76000 -- 💾 Saving Checkpoint
2025-08-30 18:01:59 - pico-train - INFO - Step 76000 -- 📊 Evaluation Results
2025-08-30 18:01:59 - pico-train - INFO - └── paloma: inf
2025-08-30 18:02:00 - pico-train - INFO - Step 76000 -- 🔄 Training Metrics
2025-08-30 18:02:00 - pico-train - INFO - ├── Loss: 4.7972
2025-08-30 18:02:00 - pico-train - INFO - ├── Learning Rate: 2.82e-05
2025-08-30 18:02:00 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-30 18:02:00 - pico-train - INFO - Step 76000 -- 📈 Saving Learning Dynamics
2025-08-30 18:03:04 - pico-train - INFO - Step 76100 -- 🔄 Training Metrics
2025-08-30 18:03:04 - pico-train - INFO - ├── Loss: 4.7730
2025-08-30 18:03:04 - pico-train - INFO - ├── Learning Rate: 2.79e-05
2025-08-30 18:03:04 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-30 18:03:56 - pico-train - INFO - Step 76200 -- 🔄 Training Metrics
2025-08-30 18:03:56 - pico-train - INFO - ├── Loss: 4.7997
2025-08-30 18:03:56 - pico-train - INFO - ├── Learning Rate: 2.77e-05
2025-08-30 18:03:56 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-30 18:04:48 - pico-train - INFO - Step 76300 -- 🔄 Training Metrics
2025-08-30 18:04:48 - pico-train - INFO - ├── Loss: 4.7843
2025-08-30 18:04:48 - pico-train - INFO - ├── Learning Rate: 2.75e-05
2025-08-30 18:04:48 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-30 18:05:40 - pico-train - INFO - Step 76400 -- 🔄 Training Metrics
2025-08-30 18:05:40 - pico-train - INFO - ├── Loss: 4.7858
2025-08-30 18:05:40 - pico-train - INFO - ├── Learning Rate: 2.73e-05
2025-08-30 18:05:40 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-30 18:06:32 - pico-train - INFO - Step 76500 -- 🔄 Training Metrics
2025-08-30 18:06:32 - pico-train - INFO - ├── Loss: 4.8110
2025-08-30 18:06:32 - pico-train - INFO - ├── Learning Rate: 2.71e-05
2025-08-30 18:06:32 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-30 18:07:24 - pico-train - INFO - Step 76600 -- 🔄 Training Metrics
2025-08-30 18:07:24 - pico-train - INFO - ├── Loss: 4.7834
2025-08-30 18:07:24 - pico-train - INFO - ├── Learning Rate: 2.68e-05
2025-08-30 18:07:24 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-30 18:08:16 - pico-train - INFO - Step 76700 -- 🔄 Training Metrics
2025-08-30 18:08:16 - pico-train - INFO - ├── Loss: 4.7936
2025-08-30 18:08:16 - pico-train - INFO - ├── Learning Rate: 2.66e-05
2025-08-30 18:08:16 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-30 18:09:08 - pico-train - INFO - Step 76800 -- 🔄 Training Metrics
2025-08-30 18:09:08 - pico-train - INFO - ├── Loss: 4.7869
2025-08-30 18:09:08 - pico-train - INFO - ├── Learning Rate: 2.64e-05
2025-08-30 18:09:08 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-30 18:10:00 - pico-train - INFO - Step 76900 -- 🔄 Training Metrics
2025-08-30 18:10:00 - pico-train - INFO - ├── Loss: 4.7979
2025-08-30 18:10:00 - pico-train - INFO - ├── Learning Rate: 2.62e-05
2025-08-30 18:10:00 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-30 18:10:54 - pico-train - INFO - Step 77000 -- 🔄 Training Metrics
2025-08-30 18:10:54 - pico-train - INFO - ├── Loss: 4.7956
2025-08-30 18:10:54 - pico-train - INFO - ├── Learning Rate: 2.60e-05
2025-08-30 18:10:54 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-30 18:11:46 - pico-train - INFO - Step 77100 -- 🔄 Training Metrics
2025-08-30 18:11:46 - pico-train - INFO - ├── Loss: 4.7974
2025-08-30 18:11:46 - pico-train - INFO - ├── Learning Rate: 2.58e-05
2025-08-30 18:11:46 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-30 18:12:38 - pico-train - INFO - Step 77200 -- 🔄 Training Metrics
2025-08-30 18:12:38 - pico-train - INFO - ├── Loss: 4.8074
2025-08-30 18:12:38 - pico-train - INFO - ├── Learning Rate: 2.55e-05
2025-08-30 18:12:38 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-30 18:13:30 - pico-train - INFO - Step 77300 -- 🔄 Training Metrics
2025-08-30 18:13:30 - pico-train - INFO - ├── Loss: 4.8276
2025-08-30 18:13:30 - pico-train - INFO - ├── Learning Rate: 2.53e-05
2025-08-30 18:13:30 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-30 18:14:27 - pico-train - INFO - Step 77400 -- 🔄 Training Metrics
2025-08-30 18:14:27 - pico-train - INFO - ├── Loss: 4.7908
2025-08-30 18:14:27 - pico-train - INFO - ├── Learning Rate: 2.51e-05
2025-08-30 18:14:27 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-30 18:15:20 - pico-train - INFO - Step 77500 -- 🔄 Training Metrics
2025-08-30 18:15:20 - pico-train - INFO - ├── Loss: 4.8142
2025-08-30 18:15:20 - pico-train - INFO - ├── Learning Rate: 2.49e-05
2025-08-30 18:15:20 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-30 18:16:13 - pico-train - INFO - Step 77600 -- 🔄 Training Metrics
2025-08-30 18:16:13 - pico-train - INFO - ├── Loss: 4.8052
2025-08-30 18:16:13 - pico-train - INFO - ├── Learning Rate: 2.47e-05
2025-08-30 18:16:13 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-30 18:17:06 - pico-train - INFO - Step 77700 -- 🔄 Training Metrics
2025-08-30 18:17:06 - pico-train - INFO - ├── Loss: 4.7876
2025-08-30 18:17:06 - pico-train - INFO - ├── Learning Rate: 2.45e-05
2025-08-30 18:17:06 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-30 18:18:01 - pico-train - INFO - Step 77800 -- 🔄 Training Metrics
2025-08-30 18:18:01 - pico-train - INFO - ├── Loss: 4.8011
2025-08-30 18:18:01 - pico-train - INFO - ├── Learning Rate: 2.43e-05
2025-08-30 18:18:01 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-30 18:18:54 - pico-train - INFO - Step 77900 -- 🔄 Training Metrics
2025-08-30 18:18:54 - pico-train - INFO - ├── Loss: 4.7936
2025-08-30 18:18:54 - pico-train - INFO - ├── Learning Rate: 2.41e-05
2025-08-30 18:18:54 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-30 18:19:47 - pico-train - INFO - Step 78000 -- 💾 Saving Checkpoint