2025-08-29 00:40:55 - pico-train - INFO - Step 0 -- 📊 Evaluation Results
2025-08-29 00:40:55 - pico-train - INFO - └── paloma: inf
2025-08-29 00:40:57 - pico-train - INFO - ==================================================
2025-08-29 00:40:57 - pico-train - INFO - ✨ Training Configuration
2025-08-29 00:40:57 - pico-train - INFO - ==================================================
2025-08-29 00:40:57 - pico-train - INFO - ╭─────────────────────────────────────────────────────╮
2025-08-29 00:40:57 - pico-train - INFO - │ checkpointing:                                      │
2025-08-29 00:40:57 - pico-train - INFO - │   checkpoints_dir: checkpoints                      │
2025-08-29 00:40:57 - pico-train - INFO - │   evaluation:                                       │
2025-08-29 00:40:57 - pico-train - INFO - │     eval_results_dir: eval_results                  │
2025-08-29 00:40:57 - pico-train - INFO - │   fabric_checkpoint_dir: fabric_state               │
2025-08-29 00:40:57 - pico-train - INFO - │   fabric_checkpoint_filename: checkpoint.pt         │
2025-08-29 00:40:57 - pico-train - INFO - │   hf_checkpoint:                                    │
2025-08-29 00:40:57 - pico-train - INFO - │     collection_slug: null                           │
2025-08-29 00:40:57 - pico-train - INFO - │     repo_id: ThomasTheMaker/pico-decoder-tiny       │
2025-08-29 00:40:57 - pico-train - INFO - │   learning_dynamics:                                │
2025-08-29 00:40:57 - pico-train - INFO - │     batch_size: 1                                   │
2025-08-29 00:40:57 - pico-train - INFO - │     eval_data: null                                 │
2025-08-29 00:40:57 - pico-train - INFO - │     layer_suffixes:                                 │
2025-08-29 00:40:57 - pico-train - INFO - │     - attention.v_proj                              │
2025-08-29 00:40:57 - pico-train - INFO - │     - attention.o_proj                              │
2025-08-29 00:40:57 - pico-train - INFO - │     - swiglu.w_2                                    │
2025-08-29 00:40:57 - pico-train - INFO - │     sequence_idx: -1                                │
2025-08-29 00:40:57 - pico-train - INFO - │   learning_dynamics_dir: learning_dynamics          │
2025-08-29 00:40:57 - pico-train - INFO - │   logs_dir: logs                                    │
2025-08-29 00:40:57 - pico-train - INFO - │   run_name: pico-decoder-tiny-dolma29k-v2           │
2025-08-29 00:40:57 - pico-train - INFO - │   runs_dir: runs                                    │
2025-08-29 00:40:57 - pico-train - INFO - │   save_every_n_steps: 1000                          │
2025-08-29 00:40:57 - pico-train - INFO - │   save_to_hf: true                                  │
2025-08-29 00:40:57 - pico-train - INFO - │   training:                                         │
2025-08-29 00:40:57 - pico-train - INFO - │     auto_resume: true                               │
2025-08-29 00:40:57 - pico-train - INFO - │ data:                                               │
2025-08-29 00:40:57 - pico-train - INFO - │   dataloader:                                       │
2025-08-29 00:40:57 - pico-train - INFO - │     batch_size: 8                                   │
2025-08-29 00:40:57 - pico-train - INFO - │   dataset:                                          │
2025-08-29 00:40:57 - pico-train - INFO - │     name: pico-lm/pretokenized-dolma                │
2025-08-29 00:40:57 - pico-train - INFO - │   tokenizer:                                        │
2025-08-29 00:40:57 - pico-train - INFO - │     name: allenai/OLMo-7B-0724-hf                   │
2025-08-29 00:40:57 - pico-train - INFO - │     vocab_size: 50304                               │
2025-08-29 00:40:57 - pico-train - INFO - │ evaluation:                                         │
2025-08-29 00:40:57 - pico-train - INFO - │   metrics:                                          │
2025-08-29 00:40:57 - pico-train - INFO - │   - paloma                                          │
2025-08-29 00:40:57 - pico-train - INFO - │   paloma:                                           │
2025-08-29 00:40:57 - pico-train - INFO - │     batch_size: 1                                   │
2025-08-29 00:40:57 - pico-train - INFO - │     dataset_name: pico-lm/pretokenized-paloma-tinsy │
2025-08-29 00:40:57 - pico-train - INFO - │     dataset_split: val                              │
2025-08-29 00:40:57 - pico-train - INFO - │     max_length: 2048                                │
2025-08-29 00:40:57 - pico-train - INFO - │ model:                                              │
2025-08-29 00:40:57 - pico-train - INFO - │   activation_hidden_dim: 384                        │
2025-08-29 00:40:57 - pico-train - INFO - │   attention_n_heads: 12                             │
2025-08-29 00:40:57 - pico-train - INFO - │   attention_n_kv_heads: 4                           │
2025-08-29 00:40:57 - pico-train - INFO - │   batch_size: 1024                                  │
2025-08-29 00:40:57 - pico-train - INFO - │   d_model: 96                                       │
2025-08-29 00:40:57 - pico-train - INFO - │   max_seq_len: 2048                                 │
2025-08-29 00:40:57 - pico-train - INFO - │   model_type: pico_decoder                          │
2025-08-29 00:40:57 - pico-train - INFO - │   n_layers: 12                                      │
2025-08-29 00:40:57 - pico-train - INFO - │   norm_eps: 1.0e-06                                 │
2025-08-29 00:40:57 - pico-train - INFO - │   position_emb_theta: 10000.0                       │
2025-08-29 00:40:57 - pico-train - INFO - │   vocab_size: 50304                                 │
2025-08-29 00:40:57 - pico-train - INFO - │ monitoring:                                         │
2025-08-29 00:40:57 - pico-train - INFO - │   logging:                                          │
2025-08-29 00:40:57 - pico-train - INFO - │     log_every_n_steps: 50                           │
2025-08-29 00:40:57 - pico-train - INFO - │     log_level: INFO                                 │
2025-08-29 00:40:57 - pico-train - INFO - │   save_to_wandb: false                              │
2025-08-29 00:40:57 - pico-train - INFO - │   wandb:                                            │
2025-08-29 00:40:57 - pico-train - INFO - │     entity: boymyc                                  │
2025-08-29 00:40:57 - pico-train - INFO - │     project: pico-decoder-tiny                      │
2025-08-29 00:40:57 - pico-train - INFO - │ training:                                           │
2025-08-29 00:40:57 - pico-train - INFO - │   fabric:                                           │
2025-08-29 00:40:57 - pico-train - INFO - │     accelerator: cuda                               │
2025-08-29 00:40:57 - pico-train - INFO - │     num_devices: 1                                  │
2025-08-29 00:40:57 - pico-train - INFO - │     num_nodes: 1                                    │
2025-08-29 00:40:57 - pico-train - INFO - │     precision: bf16-mixed                           │
2025-08-29 00:40:57 - pico-train - INFO - │   max_steps: 200000                                 │
2025-08-29 00:40:57 - pico-train - INFO - │   optimization:                                     │
2025-08-29 00:40:57 - pico-train - INFO - │     gradient_accumulation_steps: 2                  │
2025-08-29 00:40:57 - pico-train - INFO - │     lr: 0.0001                                      │
2025-08-29 00:40:57 - pico-train - INFO - │     lr_scheduler: linear_with_warmup                │
2025-08-29 00:40:57 - pico-train - INFO - │     lr_warmup_steps: 5000                           │
2025-08-29 00:40:57 - pico-train - INFO - │     optimizer: adamw                                │
2025-08-29 00:40:57 - pico-train - INFO - │                                                     │
2025-08-29 00:40:57 - pico-train - INFO - ╰─────────────────────────────────────────────────────╯
2025-08-29 00:40:57 - pico-train - INFO - ==================================================
2025-08-29 00:40:57 - pico-train - INFO - ⛭ Runtime Summary:
2025-08-29 00:40:57 - pico-train - INFO - ==================================================
2025-08-29 00:40:57 - pico-train - INFO - Starting from step: 0
2025-08-29 00:40:57 - pico-train - INFO - Model Setup:
2025-08-29 00:40:57 - pico-train - INFO - └─ Total Parameters: 11,282,784
2025-08-29 00:40:57 - pico-train - INFO - └─ Trainable Parameters: 11,282,784
2025-08-29 00:40:57 - pico-train - INFO - Distributed Setup:
2025-08-29 00:40:57 - pico-train - INFO - └─ Number of Devices: 1
2025-08-29 00:40:57 - pico-train - INFO - └─ Device Type: NVIDIA GeForce RTX 5090
2025-08-29 00:40:57 - pico-train - INFO - └─ Available Memory: 33.68 GB
2025-08-29 00:40:57 - pico-train - INFO - Software Setup:
2025-08-29 00:40:57 - pico-train - INFO - └─ Python Version: 3.10.12
2025-08-29 00:40:57 - pico-train - INFO - └─ PyTorch Version: 2.8.0+cu128
2025-08-29 00:40:57 - pico-train - INFO - └─ CUDA Version: 12.8
2025-08-29 00:40:57 - pico-train - INFO - └─ Operating System: Linux 6.8.0-63-generic
2025-08-29 00:40:57 - pico-train - INFO - Batch Size Configuration:
2025-08-29 00:40:57 - pico-train - INFO - └─ Global Batch Size: 8
2025-08-29 00:40:57 - pico-train - INFO - └─ Per Device Batch Size: 4
2025-08-29 00:40:57 - pico-train - INFO - └─ Gradient Accumulation Steps: 2
2025-08-29 00:40:57 - pico-train - INFO - ==================================================
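Note: the reported total of 11,282,784 parameters can be reproduced from the logged model config. The sketch below is only a sanity check, not pico-train's own code. It assumes bias-free linear layers, one norm weight vector per sub-block plus a final norm, grouped-query attention with head size d_model / attention_n_heads, a three-matrix SwiGLU, and untied input and output embeddings. These assumptions are not stated in the log; they are chosen because they match the logged total exactly.

```python
# Values copied from the logged "model:" config section.
d_model = 96
vocab_size = 50304
n_layers = 12
n_heads = 12
n_kv_heads = 4
hidden_dim = 384  # activation_hidden_dim

head_dim = d_model // n_heads    # 8
kv_dim = n_kv_heads * head_dim   # 32

# Assumption: no bias terms anywhere.
attention = 2 * d_model * d_model + 2 * d_model * kv_dim  # q_proj, o_proj + k_proj, v_proj
swiglu = 3 * d_model * hidden_dim                         # three projection matrices
norms = 2 * d_model                                       # one weight vector per norm
per_layer = attention + swiglu + norms

# Assumption: input embedding and output head are separate (untied) matrices.
embeddings = 2 * vocab_size * d_model
final_norm = d_model

total = embeddings + n_layers * per_layer + final_norm
print(f"{total:,}")  # prints 11,282,784
```

About 86% of the parameters sit in the two vocabulary-sized matrices; the twelve transformer blocks account for roughly 1.6M.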
2025-08-29 00:40:58 - pico-train - INFO - Step 0 -- 🔄 Training Metrics
2025-08-29 00:40:58 - pico-train - INFO - ├── Loss: 10.9848
2025-08-29 00:40:58 - pico-train - INFO - ├── Learning Rate: 0.00e+00
2025-08-29 00:40:58 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-29 00:40:58 - pico-train - INFO - Step 0 -- 📈 Saving Learning Dynamics
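Note: a quick way to judge the step-0 loss is to compare it with the cross-entropy of a uniform guess over the vocabulary, ln(vocab_size). The sketch below assumes the logged loss is a mean per-token cross-entropy in nats, which the log does not state explicitly.

```python
import math

vocab_size = 50304     # from the logged tokenizer/model config
logged_loss = 10.9848  # step-0 training loss from the log

# Cross-entropy of a uniform distribution over the vocabulary, in nats.
uniform_loss = math.log(vocab_size)

print(f"uniform baseline: {uniform_loss:.4f}")
print(f"logged - baseline: {logged_loss - uniform_loss:.4f}")
```

The logged value sits slightly above the uniform baseline of about 10.83, which is what one would expect from a randomly initialized model.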
2025-08-29 00:41:29 - pico-train - INFO - Step 50 -- 🔄 Training Metrics
2025-08-29 00:41:29 - pico-train - INFO - ├── Loss: 11.0005
2025-08-29 00:41:29 - pico-train - INFO - ├── Learning Rate: 1.00e-06
2025-08-29 00:41:29 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-29 00:41:55 - pico-train - INFO - Step 100 -- 🔄 Training Metrics
2025-08-29 00:41:55 - pico-train - INFO - ├── Loss: 10.9918
2025-08-29 00:41:55 - pico-train - INFO - ├── Learning Rate: 2.00e-06
2025-08-29 00:41:55 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-29 00:42:21 - pico-train - INFO - Step 150 -- 🔄 Training Metrics
2025-08-29 00:42:21 - pico-train - INFO - ├── Loss: 10.9776
2025-08-29 00:42:21 - pico-train - INFO - ├── Learning Rate: 3.00e-06
2025-08-29 00:42:21 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-29 00:42:47 - pico-train - INFO - Step 200 -- 🔄 Training Metrics
2025-08-29 00:42:47 - pico-train - INFO - ├── Loss: 10.9569
2025-08-29 00:42:47 - pico-train - INFO - ├── Learning Rate: 4.00e-06
2025-08-29 00:42:47 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-29 00:43:14 - pico-train - INFO - Step 250 -- 🔄 Training Metrics
2025-08-29 00:43:14 - pico-train - INFO - ├── Loss: 10.9255
2025-08-29 00:43:14 - pico-train - INFO - ├── Learning Rate: 5.00e-06
2025-08-29 00:43:14 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-29 00:43:40 - pico-train - INFO - Step 300 -- 🔄 Training Metrics
2025-08-29 00:43:40 - pico-train - INFO - ├── Loss: 10.8883
2025-08-29 00:43:40 - pico-train - INFO - ├── Learning Rate: 6.00e-06
2025-08-29 00:43:40 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-29 00:44:06 - pico-train - INFO - Step 350 -- 🔄 Training Metrics
2025-08-29 00:44:06 - pico-train - INFO - ├── Loss: 10.8249
2025-08-29 00:44:06 - pico-train - INFO - ├── Learning Rate: 7.00e-06
2025-08-29 00:44:06 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-29 00:44:32 - pico-train - INFO - Step 400 -- 🔄 Training Metrics
2025-08-29 00:44:32 - pico-train - INFO - ├── Loss: 10.7344
2025-08-29 00:44:32 - pico-train - INFO - ├── Learning Rate: 8.00e-06
2025-08-29 00:44:32 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-29 00:44:58 - pico-train - INFO - Step 450 -- 🔄 Training Metrics
2025-08-29 00:44:58 - pico-train - INFO - ├── Loss: 10.6177
2025-08-29 00:44:58 - pico-train - INFO - ├── Learning Rate: 9.00e-06
2025-08-29 00:44:58 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-29 00:45:24 - pico-train - INFO - Step 500 -- 🔄 Training Metrics
2025-08-29 00:45:24 - pico-train - INFO - ├── Loss: 10.5025
2025-08-29 00:45:24 - pico-train - INFO - ├── Learning Rate: 1.00e-05
2025-08-29 00:45:24 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-29 00:45:50 - pico-train - INFO - Step 550 -- 🔄 Training Metrics
2025-08-29 00:45:50 - pico-train - INFO - ├── Loss: 10.3986
2025-08-29 00:45:50 - pico-train - INFO - ├── Learning Rate: 1.10e-05
2025-08-29 00:45:50 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-29 00:46:16 - pico-train - INFO - Step 600 -- 🔄 Training Metrics
2025-08-29 00:46:16 - pico-train - INFO - ├── Loss: 10.3079
2025-08-29 00:46:16 - pico-train - INFO - ├── Learning Rate: 1.20e-05
2025-08-29 00:46:16 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-29 00:46:42 - pico-train - INFO - Step 650 -- 🔄 Training Metrics
2025-08-29 00:46:42 - pico-train - INFO - ├── Loss: 10.2142
2025-08-29 00:46:42 - pico-train - INFO - ├── Learning Rate: 1.30e-05
2025-08-29 00:46:42 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-29 00:47:08 - pico-train - INFO - Step 700 -- 🔄 Training Metrics
2025-08-29 00:47:08 - pico-train - INFO - ├── Loss: 10.1146
2025-08-29 00:47:08 - pico-train - INFO - ├── Learning Rate: 1.40e-05
2025-08-29 00:47:08 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-29 00:47:34 - pico-train - INFO - Step 750 -- 🔄 Training Metrics
2025-08-29 00:47:34 - pico-train - INFO - ├── Loss: 10.0398
2025-08-29 00:47:34 - pico-train - INFO - ├── Learning Rate: 1.50e-05
2025-08-29 00:47:34 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-29 00:48:00 - pico-train - INFO - Step 800 -- 🔄 Training Metrics
2025-08-29 00:48:00 - pico-train - INFO - ├── Loss: 9.9311
2025-08-29 00:48:00 - pico-train - INFO - ├── Learning Rate: 1.60e-05
2025-08-29 00:48:00 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-29 00:48:26 - pico-train - INFO - Step 850 -- 🔄 Training Metrics
2025-08-29 00:48:26 - pico-train - INFO - ├── Loss: 9.8431
2025-08-29 00:48:26 - pico-train - INFO - ├── Learning Rate: 1.70e-05
2025-08-29 00:48:26 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-29 00:48:52 - pico-train - INFO - Step 900 -- 🔄 Training Metrics
2025-08-29 00:48:52 - pico-train - INFO - ├── Loss: 9.7453
2025-08-29 00:48:52 - pico-train - INFO - ├── Learning Rate: 1.80e-05
2025-08-29 00:48:52 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-29 00:49:18 - pico-train - INFO - Step 950 -- 🔄 Training Metrics
2025-08-29 00:49:18 - pico-train - INFO - ├── Loss: 9.6527
2025-08-29 00:49:18 - pico-train - INFO - ├── Learning Rate: 1.90e-05
2025-08-29 00:49:18 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-29 00:49:43 - pico-train - INFO - Step 1000 -- 💾 Saving Checkpoint
2025-08-29 00:52:44 - pico-train - INFO - Step 1000 -- 📊 Evaluation Results
2025-08-29 00:52:44 - pico-train - INFO - └── paloma: 5.073320568651489e+18
2025-08-29 00:52:45 - pico-train - INFO - Step 1000 -- 🔄 Training Metrics
2025-08-29 00:52:45 - pico-train - INFO - ├── Loss: 9.5691
2025-08-29 00:52:45 - pico-train - INFO - ├── Learning Rate: 2.00e-05
2025-08-29 00:52:45 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-29 00:52:45 - pico-train - INFO - Step 1000 -- 📈 Saving Learning Dynamics
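Note: the logged learning rates are consistent with a linear warmup from 0 to lr = 0.0001 over lr_warmup_steps = 5000. The sketch below reproduces the warmup phase only. The function name `warmup_lr` is hypothetical, and how the `linear_with_warmup` scheduler behaves after step 5000 is not visible in this part of the log, so it is not modeled here.

```python
def warmup_lr(step, peak_lr=1e-4, warmup_steps=5000):
    """Linear warmup from 0 to peak_lr; valid for 0 <= step <= warmup_steps only."""
    if not 0 <= step <= warmup_steps:
        raise ValueError("only the warmup phase is modeled in this sketch")
    return peak_lr * step / warmup_steps

# Compare with the "Learning Rate" lines logged at these steps.
for step in (0, 50, 500, 1000):
    print(step, f"{warmup_lr(step):.2e}")
```

For steps 0, 50, 500 and 1000 this gives 0.00e+00, 1.00e-06, 1.00e-05 and 2.00e-05, matching the log.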
2025-08-29 00:53:15 - pico-train - INFO - Step 1050 -- 🔄 Training Metrics
2025-08-29 00:53:15 - pico-train - INFO - ├── Loss: 9.4600
2025-08-29 00:53:15 - pico-train - INFO - ├── Learning Rate: 2.10e-05
2025-08-29 00:53:15 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-29 00:53:41 - pico-train - INFO - Step 1100 -- 🔄 Training Metrics
2025-08-29 00:53:41 - pico-train - INFO - ├── Loss: 9.3525
2025-08-29 00:53:41 - pico-train - INFO - ├── Learning Rate: 2.20e-05
2025-08-29 00:53:41 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-29 00:54:07 - pico-train - INFO - Step 1150 -- 🔄 Training Metrics
2025-08-29 00:54:07 - pico-train - INFO - ├── Loss: 9.2715
2025-08-29 00:54:07 - pico-train - INFO - ├── Learning Rate: 2.30e-05
2025-08-29 00:54:07 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-29 00:54:33 - pico-train - INFO - Step 1200 -- 🔄 Training Metrics
2025-08-29 00:54:33 - pico-train - INFO - ├── Loss: 9.1618
2025-08-29 00:54:33 - pico-train - INFO - ├── Learning Rate: 2.40e-05
2025-08-29 00:54:33 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-29 00:54:59 - pico-train - INFO - Step 1250 -- 🔄 Training Metrics
2025-08-29 00:54:59 - pico-train - INFO - ├── Loss: 9.0547
2025-08-29 00:54:59 - pico-train - INFO - ├── Learning Rate: 2.50e-05
2025-08-29 00:54:59 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-29 00:55:25 - pico-train - INFO - Step 1300 -- 🔄 Training Metrics
2025-08-29 00:55:25 - pico-train - INFO - ├── Loss: 8.9550
2025-08-29 00:55:25 - pico-train - INFO - ├── Learning Rate: 2.60e-05
2025-08-29 00:55:25 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-29 00:55:51 - pico-train - INFO - Step 1350 -- 🔄 Training Metrics
2025-08-29 00:55:51 - pico-train - INFO - ├── Loss: 8.8251
2025-08-29 00:55:51 - pico-train - INFO - ├── Learning Rate: 2.70e-05
2025-08-29 00:55:51 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-29 00:56:17 - pico-train - INFO - Step 1400 -- 🔄 Training Metrics
2025-08-29 00:56:17 - pico-train - INFO - ├── Loss: 8.7711
2025-08-29 00:56:17 - pico-train - INFO - ├── Learning Rate: 2.80e-05
2025-08-29 00:56:17 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-29 00:56:43 - pico-train - INFO - Step 1450 -- 🔄 Training Metrics
2025-08-29 00:56:43 - pico-train - INFO - ├── Loss: 8.6834
2025-08-29 00:56:43 - pico-train - INFO - ├── Learning Rate: 2.90e-05
2025-08-29 00:56:43 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-29 00:57:09 - pico-train - INFO - Step 1500 -- 🔄 Training Metrics
2025-08-29 00:57:09 - pico-train - INFO - ├── Loss: 8.5638
2025-08-29 00:57:09 - pico-train - INFO - ├── Learning Rate: 3.00e-05
2025-08-29 00:57:09 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-29 00:57:35 - pico-train - INFO - Step 1550 -- 🔄 Training Metrics
2025-08-29 00:57:35 - pico-train - INFO - ├── Loss: 8.4572
2025-08-29 00:57:35 - pico-train - INFO - ├── Learning Rate: 3.10e-05
2025-08-29 00:57:35 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-29 00:58:01 - pico-train - INFO - Step 1600 -- 🔄 Training Metrics
2025-08-29 00:58:01 - pico-train - INFO - ├── Loss: 8.3940
2025-08-29 00:58:01 - pico-train - INFO - ├── Learning Rate: 3.20e-05
2025-08-29 00:58:01 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-29 00:58:27 - pico-train - INFO - Step 1650 -- 🔄 Training Metrics
2025-08-29 00:58:27 - pico-train - INFO - ├── Loss: 8.2973
2025-08-29 00:58:27 - pico-train - INFO - ├── Learning Rate: 3.30e-05
2025-08-29 00:58:27 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-29 00:58:53 - pico-train - INFO - Step 1700 -- 🔄 Training Metrics
2025-08-29 00:58:53 - pico-train - INFO - ├── Loss: 8.2264
2025-08-29 00:58:53 - pico-train - INFO - ├── Learning Rate: 3.40e-05
2025-08-29 00:58:53 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-29 00:59:19 - pico-train - INFO - Step 1750 -- 🔄 Training Metrics
2025-08-29 00:59:19 - pico-train - INFO - ├── Loss: 8.1672
2025-08-29 00:59:19 - pico-train - INFO - ├── Learning Rate: 3.50e-05
2025-08-29 00:59:19 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-29 00:59:45 - pico-train - INFO - Step 1800 -- 🔄 Training Metrics
2025-08-29 00:59:45 - pico-train - INFO - ├── Loss: 8.0695
2025-08-29 00:59:45 - pico-train - INFO - ├── Learning Rate: 3.60e-05
2025-08-29 00:59:45 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-29 01:00:11 - pico-train - INFO - Step 1850 -- 🔄 Training Metrics
2025-08-29 01:00:11 - pico-train - INFO - ├── Loss: 8.0299
2025-08-29 01:00:11 - pico-train - INFO - ├── Learning Rate: 3.70e-05
2025-08-29 01:00:11 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-29 01:00:37 - pico-train - INFO - Step 1900 -- 🔄 Training Metrics
2025-08-29 01:00:37 - pico-train - INFO - ├── Loss: 7.9883
2025-08-29 01:00:37 - pico-train - INFO - ├── Learning Rate: 3.80e-05
2025-08-29 01:00:37 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-29 01:01:03 - pico-train - INFO - Step 1950 -- 🔄 Training Metrics
2025-08-29 01:01:03 - pico-train - INFO - ├── Loss: 7.9429
2025-08-29 01:01:03 - pico-train - INFO - ├── Learning Rate: 3.90e-05
2025-08-29 01:01:03 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-29 01:01:28 - pico-train - INFO - Step 2000 -- 💾 Saving Checkpoint
2025-08-29 01:03:57 - pico-train - INFO - Step 2000 -- 📊 Evaluation Results
2025-08-29 01:03:57 - pico-train - INFO - └── paloma: 1.8978577072995303e+19
2025-08-29 01:04:01 - pico-train - INFO - Step 2000 -- 🔄 Training Metrics
2025-08-29 01:04:01 - pico-train - INFO - ├── Loss: 7.8447
2025-08-29 01:04:01 - pico-train - INFO - ├── Learning Rate: 4.00e-05
2025-08-29 01:04:01 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-29 01:04:01 - pico-train - INFO - Step 2000 -- 📈 Saving Learning Dynamics
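Note: the logged paloma values rise between steps 1000 and 2000 while the training loss falls. The sketch below puts both on a common scale. It assumes that `paloma` is a perplexity (the exponential of a mean per-token loss in nats) and that the training loss is a mean per-token cross-entropy in nats; the log confirms neither, so treat the result as a diagnostic hint rather than a conclusion.

```python
import math

# step -> (training loss, paloma value), copied from the log lines above.
logged = {
    1000: (9.5691, 5.073320568651489e+18),
    2000: (7.8447, 1.8978577072995303e+19),
}

for step, (train_loss, paloma) in logged.items():
    implied_eval_loss = math.log(paloma)  # loss that would produce this perplexity
    train_perplexity = math.exp(train_loss)
    print(
        f"step {step}: train loss {train_loss:.4f} "
        f"(perplexity ~{train_perplexity:,.0f}), "
        f"paloma implies eval loss ~{implied_eval_loss:.2f}"
    )
```

Under these assumptions the evaluation metric corresponds to a per-token loss above 43 nats, far higher than both the training loss and the uniform-guess baseline of ln(50304), so the evaluation pipeline (for example the evaluation precision or the batch_size: 1 setup) may be worth checking.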
2025-08-29 01:04:31 - pico-train - INFO - Step 2050 -- 🔄 Training Metrics
2025-08-29 01:04:31 - pico-train - INFO - ├── Loss: 7.8380
2025-08-29 01:04:31 - pico-train - INFO - ├── Learning Rate: 4.10e-05
2025-08-29 01:04:31 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-29 01:04:57 - pico-train - INFO - Step 2100 -- 🔄 Training Metrics
2025-08-29 01:04:57 - pico-train - INFO - ├── Loss: 7.7671
2025-08-29 01:04:57 - pico-train - INFO - ├── Learning Rate: 4.20e-05
2025-08-29 01:04:57 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-29 01:05:23 - pico-train - INFO - Step 2150 -- 🔄 Training Metrics
2025-08-29 01:05:23 - pico-train - INFO - ├── Loss: 7.7637
2025-08-29 01:05:23 - pico-train - INFO - ├── Learning Rate: 4.30e-05
2025-08-29 01:05:23 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-29 01:05:49 - pico-train - INFO - Step 2200 -- 🔄 Training Metrics
2025-08-29 01:05:49 - pico-train - INFO - ├── Loss: 7.7060
2025-08-29 01:05:49 - pico-train - INFO - ├── Learning Rate: 4.40e-05
2025-08-29 01:05:49 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-29 01:06:15 - pico-train - INFO - Step 2250 -- 🔄 Training Metrics
2025-08-29 01:06:15 - pico-train - INFO - ├── Loss: 7.7607
2025-08-29 01:06:15 - pico-train - INFO - ├── Learning Rate: 4.50e-05
2025-08-29 01:06:15 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-29 01:06:41 - pico-train - INFO - Step 2300 -- 🔄 Training Metrics
2025-08-29 01:06:41 - pico-train - INFO - ├── Loss: 7.7076
2025-08-29 01:06:41 - pico-train - INFO - ├── Learning Rate: 4.60e-05
2025-08-29 01:06:41 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-29 01:07:07 - pico-train - INFO - Step 2350 -- 🔄 Training Metrics
2025-08-29 01:07:07 - pico-train - INFO - ├── Loss: 7.6787
2025-08-29 01:07:07 - pico-train - INFO - ├── Learning Rate: 4.70e-05
2025-08-29 01:07:07 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-29 01:07:33 - pico-train - INFO - Step 2400 -- 🔄 Training Metrics
2025-08-29 01:07:33 - pico-train - INFO - ├── Loss: 7.6446
2025-08-29 01:07:33 - pico-train - INFO - ├── Learning Rate: 4.80e-05
2025-08-29 01:07:33 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-29 01:07:59 - pico-train - INFO - Step 2450 -- 🔄 Training Metrics
2025-08-29 01:07:59 - pico-train - INFO - ├── Loss: 7.5999
2025-08-29 01:07:59 - pico-train - INFO - ├── Learning Rate: 4.90e-05
2025-08-29 01:07:59 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-29 01:08:25 - pico-train - INFO - Step 2500 -- 🔄 Training Metrics
2025-08-29 01:08:25 - pico-train - INFO - ├── Loss: 7.6154
2025-08-29 01:08:25 - pico-train - INFO - ├── Learning Rate: 5.00e-05
2025-08-29 01:08:25 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-29 01:08:50 - pico-train - INFO - Step 2550 -- 🔄 Training Metrics
2025-08-29 01:08:50 - pico-train - INFO - ├── Loss: 7.5627
2025-08-29 01:08:50 - pico-train - INFO - ├── Learning Rate: 5.10e-05
2025-08-29 01:08:50 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-29 01:09:17 - pico-train - INFO - Step 2600 -- 🔄 Training Metrics
2025-08-29 01:09:17 - pico-train - INFO - ├── Loss: 7.5747
2025-08-29 01:09:17 - pico-train - INFO - ├── Learning Rate: 5.20e-05
2025-08-29 01:09:17 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-29 01:09:43 - pico-train - INFO - Step 2650 -- 🔄 Training Metrics
2025-08-29 01:09:43 - pico-train - INFO - ├── Loss: 7.5358
2025-08-29 01:09:43 - pico-train - INFO - ├── Learning Rate: 5.30e-05
2025-08-29 01:09:43 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-29 01:10:09 - pico-train - INFO - Step 2700 -- 🔄 Training Metrics
2025-08-29 01:10:09 - pico-train - INFO - ├── Loss: 7.5148
2025-08-29 01:10:09 - pico-train - INFO - ├── Learning Rate: 5.40e-05
2025-08-29 01:10:09 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-29 01:10:35 - pico-train - INFO - Step 2750 -- 🔄 Training Metrics
2025-08-29 01:10:35 - pico-train - INFO - ├── Loss: 7.4874
2025-08-29 01:10:35 - pico-train - INFO - ├── Learning Rate: 5.50e-05
2025-08-29 01:10:35 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-29 01:11:01 - pico-train - INFO - Step 2800 -- 🔄 Training Metrics
2025-08-29 01:11:01 - pico-train - INFO - ├── Loss: 7.4438
2025-08-29 01:11:01 - pico-train - INFO - ├── Learning Rate: 5.60e-05
2025-08-29 01:11:01 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-29 01:11:27 - pico-train - INFO - Step 2850 -- 🔄 Training Metrics
2025-08-29 01:11:27 - pico-train - INFO - ├── Loss: 7.4772
2025-08-29 01:11:27 - pico-train - INFO - ├── Learning Rate: 5.70e-05
2025-08-29 01:11:27 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-29 01:11:53 - pico-train - INFO - Step 2900 -- 🔄 Training Metrics
2025-08-29 01:11:53 - pico-train - INFO - ├── Loss: 7.4135
2025-08-29 01:11:53 - pico-train - INFO - ├── Learning Rate: 5.80e-05
2025-08-29 01:11:53 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-29 01:12:19 - pico-train - INFO - Step 2950 -- 🔄 Training Metrics
2025-08-29 01:12:19 - pico-train - INFO - ├── Loss: 7.3929
2025-08-29 01:12:19 - pico-train - INFO - ├── Learning Rate: 5.90e-05
2025-08-29 01:12:19 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-29 01:12:44 - pico-train - INFO - Step 3000 -- 💾 Saving Checkpoint
2025-08-29 01:14:43 - pico-train - INFO - Step 3000 -- 📊 Evaluation Results
2025-08-29 01:14:43 - pico-train - INFO - └── paloma: 3.1701596694317715e+19
2025-08-29 01:14:46 - pico-train - INFO - Step 3000 -- 🔄 Training Metrics
2025-08-29 01:14:46 - pico-train - INFO - ├── Loss: 7.3566
2025-08-29 01:14:46 - pico-train - INFO - ├── Learning Rate: 6.00e-05
2025-08-29 01:14:46 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-29 01:14:46 - pico-train - INFO - Step 3000 -- 📈 Saving Learning Dynamics
2025-08-29 01:15:16 - pico-train - INFO - Step 3050 -- 🔄 Training Metrics
2025-08-29 01:15:16 - pico-train - INFO - ├── Loss: 7.3318
2025-08-29 01:15:16 - pico-train - INFO - ├── Learning Rate: 6.10e-05
2025-08-29 01:15:16 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-29 01:15:42 - pico-train - INFO - Step 3100 -- 🔄 Training Metrics
2025-08-29 01:15:42 - pico-train - INFO - ├── Loss: 7.3114
2025-08-29 01:15:42 - pico-train - INFO - ├── Learning Rate: 6.20e-05
2025-08-29 01:15:42 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-29 01:16:08 - pico-train - INFO - Step 3150 -- 🔄 Training Metrics
2025-08-29 01:16:08 - pico-train - INFO - ├── Loss: 7.2734
2025-08-29 01:16:08 - pico-train - INFO - ├── Learning Rate: 6.30e-05
2025-08-29 01:16:08 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-29 01:16:34 - pico-train - INFO - Step 3200 -- 🔄 Training Metrics
2025-08-29 01:16:34 - pico-train - INFO - ├── Loss: 7.3220
2025-08-29 01:16:34 - pico-train - INFO - ├── Learning Rate: 6.40e-05
2025-08-29 01:16:34 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-29 01:16:59 - pico-train - INFO - Step 3250 -- 🔄 Training Metrics
2025-08-29 01:16:59 - pico-train - INFO - ├── Loss: 7.2621
2025-08-29 01:16:59 - pico-train - INFO - ├── Learning Rate: 6.50e-05
2025-08-29 01:16:59 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-29 01:17:25 - pico-train - INFO - Step 3300 -- 🔄 Training Metrics
2025-08-29 01:17:25 - pico-train - INFO - ├── Loss: 7.2257
2025-08-29 01:17:25 - pico-train - INFO - ├── Learning Rate: 6.60e-05
2025-08-29 01:17:25 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-29 01:17:52 - pico-train - INFO - Step 3350 -- 🔄 Training Metrics
2025-08-29 01:17:52 - pico-train - INFO - ├── Loss: 7.2447
2025-08-29 01:17:52 - pico-train - INFO - ├── Learning Rate: 6.70e-05
2025-08-29 01:17:52 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-29 01:18:18 - pico-train - INFO - Step 3400 -- 🔄 Training Metrics
2025-08-29 01:18:18 - pico-train - INFO - ├── Loss: 7.2344
2025-08-29 01:18:18 - pico-train - INFO - ├── Learning Rate: 6.80e-05
2025-08-29 01:18:18 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-29 01:18:43 - pico-train - INFO - Step 3450 -- 🔄 Training Metrics
2025-08-29 01:18:43 - pico-train - INFO - ├── Loss: 7.1488
2025-08-29 01:18:43 - pico-train - INFO - ├── Learning Rate: 6.90e-05
2025-08-29 01:18:43 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-29 01:19:09 - pico-train - INFO - Step 3500 -- 🔄 Training Metrics
2025-08-29 01:19:09 - pico-train - INFO - ├── Loss: 7.1797
2025-08-29 01:19:09 - pico-train - INFO - ├── Learning Rate: 7.00e-05
2025-08-29 01:19:09 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-29 01:19:35 - pico-train - INFO - Step 3550 -- 🔄 Training Metrics
2025-08-29 01:19:35 - pico-train - INFO - ├── Loss: 7.1737
2025-08-29 01:19:35 - pico-train - INFO - ├── Learning Rate: 7.10e-05
2025-08-29 01:19:35 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-29 01:20:01 - pico-train - INFO - Step 3600 -- 🔄 Training Metrics
2025-08-29 01:20:01 - pico-train - INFO - ├── Loss: 7.1204
2025-08-29 01:20:01 - pico-train - INFO - ├── Learning Rate: 7.20e-05
2025-08-29 01:20:01 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-29 01:20:27 - pico-train - INFO - Step 3650 -- 🔄 Training Metrics
2025-08-29 01:20:27 - pico-train - INFO - ├── Loss: 7.1102
2025-08-29 01:20:27 - pico-train - INFO - ├── Learning Rate: 7.30e-05
2025-08-29 01:20:27 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-29 01:20:53 - pico-train - INFO - Step 3700 -- 🔄 Training Metrics
2025-08-29 01:20:53 - pico-train - INFO - ├── Loss: 7.0845
2025-08-29 01:20:53 - pico-train - INFO - ├── Learning Rate: 7.40e-05
2025-08-29 01:20:53 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-29 01:21:19 - pico-train - INFO - Step 3750 -- 🔄 Training Metrics
2025-08-29 01:21:19 - pico-train - INFO - ├── Loss: 7.0858
2025-08-29 01:21:19 - pico-train - INFO - ├── Learning Rate: 7.50e-05
2025-08-29 01:21:19 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-29 01:21:45 - pico-train - INFO - Step 3800 -- 🔄 Training Metrics
2025-08-29 01:21:45 - pico-train - INFO - ├── Loss: 7.0362
2025-08-29 01:21:45 - pico-train - INFO - ├── Learning Rate: 7.60e-05
2025-08-29 01:21:45 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-29 01:22:11 - pico-train - INFO - Step 3850 -- 🔄 Training Metrics
2025-08-29 01:22:11 - pico-train - INFO - ├── Loss: 7.0603
2025-08-29 01:22:11 - pico-train - INFO - ├── Learning Rate: 7.70e-05
2025-08-29 01:22:11 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-29 01:22:37 - pico-train - INFO - Step 3900 -- 🔄 Training Metrics
2025-08-29 01:22:37 - pico-train - INFO - ├── Loss: 7.0172
2025-08-29 01:22:37 - pico-train - INFO - ├── Learning Rate: 7.80e-05
2025-08-29 01:22:37 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-29 01:23:03 - pico-train - INFO - Step 3950 -- 🔄 Training Metrics
2025-08-29 01:23:03 - pico-train - INFO - ├── Loss: 6.9948
2025-08-29 01:23:03 - pico-train - INFO - ├── Learning Rate: 7.90e-05
2025-08-29 01:23:03 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-29 01:23:29 - pico-train - INFO - Step 4000 -- 💾 Saving Checkpoint
2025-08-29 01:25:52 - pico-train - INFO - Step 4000 -- 📊 Evaluation Results
2025-08-29 01:25:52 - pico-train - INFO - └── paloma: 2.5015965971757485e+20
2025-08-29 01:25:54 - pico-train - INFO - Step 4000 -- 🔄 Training Metrics
2025-08-29 01:25:54 - pico-train - INFO - ├── Loss: 6.9909
2025-08-29 01:25:54 - pico-train - INFO - ├── Learning Rate: 8.00e-05
2025-08-29 01:25:54 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-29 01:25:54 - pico-train - INFO - Step 4000 -- 📈 Saving Learning Dynamics
2025-08-29 01:26:24 - pico-train - INFO - Step 4050 -- 🔄 Training Metrics
2025-08-29 01:26:24 - pico-train - INFO - ├── Loss: 6.9477
2025-08-29 01:26:24 - pico-train - INFO - ├── Learning Rate: 8.10e-05
2025-08-29 01:26:24 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-29 01:26:51 - pico-train - INFO - Step 4100 -- 🔄 Training Metrics
2025-08-29 01:26:51 - pico-train - INFO - ├── Loss: 6.9651
2025-08-29 01:26:51 - pico-train - INFO - ├── Learning Rate: 8.20e-05
2025-08-29 01:26:51 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-29 01:27:17 - pico-train - INFO - Step 4150 -- 🔄 Training Metrics
2025-08-29 01:27:17 - pico-train - INFO - ├── Loss: 6.9149
2025-08-29 01:27:17 - pico-train - INFO - ├── Learning Rate: 8.30e-05
2025-08-29 01:27:17 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-29 01:27:43 - pico-train - INFO - Step 4200 -- 🔄 Training Metrics
2025-08-29 01:27:43 - pico-train - INFO - ├── Loss: 6.8930
2025-08-29 01:27:43 - pico-train - INFO - ├── Learning Rate: 8.40e-05
2025-08-29 01:27:43 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-29 01:28:08 - pico-train - INFO - Step 4250 -- 🔄 Training Metrics
2025-08-29 01:28:08 - pico-train - INFO - ├── Loss: 6.9227
2025-08-29 01:28:08 - pico-train - INFO - ├── Learning Rate: 8.50e-05
2025-08-29 01:28:08 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-29 01:28:34 - pico-train - INFO - Step 4300 -- 🔄 Training Metrics
2025-08-29 01:28:34 - pico-train - INFO - ├── Loss: 6.8790
2025-08-29 01:28:34 - pico-train - INFO - ├── Learning Rate: 8.60e-05
2025-08-29 01:28:34 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-29 01:29:01 - pico-train - INFO - Step 4350 -- 🔄 Training Metrics
2025-08-29 01:29:01 - pico-train - INFO - ├── Loss: 6.8649
2025-08-29 01:29:01 - pico-train - INFO - ├── Learning Rate: 8.70e-05
2025-08-29 01:29:01 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-29 01:29:26 - pico-train - INFO - Step 4400 -- 🔄 Training Metrics
2025-08-29 01:29:26 - pico-train - INFO - ├── Loss: 6.8305
2025-08-29 01:29:26 - pico-train - INFO - ├── Learning Rate: 8.80e-05
2025-08-29 01:29:26 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-29 01:29:52 - pico-train - INFO - Step 4450 -- 🔄 Training Metrics
2025-08-29 01:29:52 - pico-train - INFO - ├── Loss: 6.8085
2025-08-29 01:29:52 - pico-train - INFO - ├── Learning Rate: 8.90e-05
2025-08-29 01:29:52 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-29 01:30:18 - pico-train - INFO - Step 4500 -- 🔄 Training Metrics
2025-08-29 01:30:18 - pico-train - INFO - ├── Loss: 6.8315
2025-08-29 01:30:18 - pico-train - INFO - ├── Learning Rate: 9.00e-05
2025-08-29 01:30:18 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-29 01:30:44 - pico-train - INFO - Step 4550 -- 🔄 Training Metrics
2025-08-29 01:30:44 - pico-train - INFO - ├── Loss: 6.7885
2025-08-29 01:30:44 - pico-train - INFO - ├── Learning Rate: 9.10e-05
2025-08-29 01:30:44 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-29 01:31:11 - pico-train - INFO - Step 4600 -- 🔄 Training Metrics
2025-08-29 01:31:11 - pico-train - INFO - ├── Loss: 6.7805
2025-08-29 01:31:11 - pico-train - INFO - ├── Learning Rate: 9.20e-05
2025-08-29 01:31:11 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-29 01:31:36 - pico-train - INFO - Step 4650 -- 🔄 Training Metrics
2025-08-29 01:31:36 - pico-train - INFO - ├── Loss: 6.7737
2025-08-29 01:31:36 - pico-train - INFO - ├── Learning Rate: 9.30e-05
2025-08-29 01:31:36 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-29 01:32:02 - pico-train - INFO - Step 4700 -- 🔄 Training Metrics
2025-08-29 01:32:02 - pico-train - INFO - ├── Loss: 6.7649
2025-08-29 01:32:02 - pico-train - INFO - ├── Learning Rate: 9.40e-05
2025-08-29 01:32:02 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-29 01:32:28 - pico-train - INFO - Step 4750 -- 🔄 Training Metrics
2025-08-29 01:32:28 - pico-train - INFO - ├── Loss: 6.7562
2025-08-29 01:32:28 - pico-train - INFO - ├── Learning Rate: 9.50e-05
2025-08-29 01:32:28 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-29 01:32:54 - pico-train - INFO - Step 4800 -- 🔄 Training Metrics
2025-08-29 01:32:54 - pico-train - INFO - ├── Loss: 6.7347
2025-08-29 01:32:54 - pico-train - INFO - ├── Learning Rate: 9.60e-05
2025-08-29 01:32:54 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-29 01:33:20 - pico-train - INFO - Step 4850 -- 🔄 Training Metrics
2025-08-29 01:33:20 - pico-train - INFO - ├── Loss: 6.7161
2025-08-29 01:33:20 - pico-train - INFO - ├── Learning Rate: 9.70e-05
2025-08-29 01:33:20 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-29 01:33:46 - pico-train - INFO - Step 4900 -- 🔄 Training Metrics
2025-08-29 01:33:46 - pico-train - INFO - ├── Loss: 6.6889
2025-08-29 01:33:46 - pico-train - INFO - ├── Learning Rate: 9.80e-05
2025-08-29 01:33:46 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-29 01:34:12 - pico-train - INFO - Step 4950 -- 🔄 Training Metrics
2025-08-29 01:34:12 - pico-train - INFO - ├── Loss: 6.7299
2025-08-29 01:34:12 - pico-train - INFO - ├── Learning Rate: 9.90e-05
2025-08-29 01:34:12 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-29 01:34:37 - pico-train - INFO - Step 5000 -- 💾 Saving Checkpoint
2025-08-29 01:36:35 - pico-train - INFO - Step 5000 -- 📊 Evaluation Results
2025-08-29 01:36:35 - pico-train - INFO - └── paloma: 2.38712860824014e+21
2025-08-29 01:36:37 - pico-train - INFO - Step 5000 -- 🔄 Training Metrics
2025-08-29 01:36:37 - pico-train - INFO - ├── Loss: 6.6605
2025-08-29 01:36:37 - pico-train - INFO - ├── Learning Rate: 1.00e-04
2025-08-29 01:36:37 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-29 01:36:37 - pico-train - INFO - Step 5000 -- 📈 Saving Learning Dynamics
2025-08-29 01:37:06 - pico-train - INFO - Step 5050 -- 🔄 Training Metrics
2025-08-29 01:37:06 - pico-train - INFO - ├── Loss: 6.6552
2025-08-29 01:37:06 - pico-train - INFO - ├── Learning Rate: 1.00e-04
2025-08-29 01:37:06 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-29 01:37:33 - pico-train - INFO - Step 5100 -- 🔄 Training Metrics
2025-08-29 01:37:33 - pico-train - INFO - ├── Loss: 6.7038
2025-08-29 01:37:33 - pico-train - INFO - ├── Learning Rate: 9.99e-05
2025-08-29 01:37:33 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-29 01:37:59 - pico-train - INFO - Step 5150 -- 🔄 Training Metrics
2025-08-29 01:37:59 - pico-train - INFO - ├── Loss: 6.6452
2025-08-29 01:37:59 - pico-train - INFO - ├── Learning Rate: 9.99e-05
2025-08-29 01:37:59 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-29 01:38:25 - pico-train - INFO - Step 5200 -- 🔄 Training Metrics
2025-08-29 01:38:25 - pico-train - INFO - ├── Loss: 6.6522
2025-08-29 01:38:25 - pico-train - INFO - ├── Learning Rate: 9.99e-05
2025-08-29 01:38:25 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-29 01:38:51 - pico-train - INFO - Step 5250 -- 🔄 Training Metrics
2025-08-29 01:38:51 - pico-train - INFO - ├── Loss: 6.6270
2025-08-29 01:38:51 - pico-train - INFO - ├── Learning Rate: 9.99e-05
2025-08-29 01:38:51 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-29 01:39:17 - pico-train - INFO - Step 5300 -- 🔄 Training Metrics
2025-08-29 01:39:17 - pico-train - INFO - ├── Loss: 6.5733
2025-08-29 01:39:17 - pico-train - INFO - ├── Learning Rate: 9.98e-05
2025-08-29 01:39:17 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-29 01:39:43 - pico-train - INFO - Step 5350 -- 🔄 Training Metrics
2025-08-29 01:39:43 - pico-train - INFO - ├── Loss: 6.5833
2025-08-29 01:39:43 - pico-train - INFO - ├── Learning Rate: 9.98e-05
2025-08-29 01:39:43 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-29 01:40:09 - pico-train - INFO - Step 5400 -- 🔄 Training Metrics
2025-08-29 01:40:09 - pico-train - INFO - ├── Loss: 6.5854
2025-08-29 01:40:09 - pico-train - INFO - ├── Learning Rate: 9.98e-05
2025-08-29 01:40:09 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-29 01:40:35 - pico-train - INFO - Step 5450 -- 🔄 Training Metrics
2025-08-29 01:40:35 - pico-train - INFO - ├── Loss: 6.6012
2025-08-29 01:40:35 - pico-train - INFO - ├── Learning Rate: 9.98e-05
2025-08-29 01:40:35 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-29 01:41:01 - pico-train - INFO - Step 5500 -- 🔄 Training Metrics
2025-08-29 01:41:01 - pico-train - INFO - ├── Loss: 6.5786
2025-08-29 01:41:01 - pico-train - INFO - ├── Learning Rate: 9.97e-05
2025-08-29 01:41:01 - pico-train - INFO - └── Inf/NaN count: 0