2025-08-28 22:55:45 - pico-train - INFO - Step 1000 -- 📊 Evaluation Results
2025-08-28 22:55:45 - pico-train - INFO - └── paloma: 2.5468931158531133e+19
2025-08-28 22:55:47 - pico-train - INFO - ==================================================
2025-08-28 22:55:47 - pico-train - INFO - ✨ Training Configuration
2025-08-28 22:55:47 - pico-train - INFO - ==================================================
2025-08-28 22:55:47 - pico-train - INFO - ╭─────────────────────────────────────────────────────╮
2025-08-28 22:55:47 - pico-train - INFO - │ checkpointing:                                      │
2025-08-28 22:55:47 - pico-train - INFO - │   checkpoints_dir: checkpoints                      │
2025-08-28 22:55:47 - pico-train - INFO - │   evaluation:                                       │
2025-08-28 22:55:47 - pico-train - INFO - │     eval_results_dir: eval_results                  │
2025-08-28 22:55:47 - pico-train - INFO - │   fabric_checkpoint_dir: fabric_state               │
2025-08-28 22:55:47 - pico-train - INFO - │   fabric_checkpoint_filename: checkpoint.pt         │
2025-08-28 22:55:47 - pico-train - INFO - │   hf_checkpoint:                                    │
2025-08-28 22:55:47 - pico-train - INFO - │     collection_slug: null                           │
2025-08-28 22:55:47 - pico-train - INFO - │     repo_id: ThomasTheMaker/pico-decoder-tiny       │
2025-08-28 22:55:47 - pico-train - INFO - │   learning_dynamics:                                │
2025-08-28 22:55:47 - pico-train - INFO - │     batch_size: 1                                   │
2025-08-28 22:55:47 - pico-train - INFO - │     eval_data: null                                 │
2025-08-28 22:55:47 - pico-train - INFO - │     layer_suffixes:                                 │
2025-08-28 22:55:47 - pico-train - INFO - │     - attention.v_proj                              │
2025-08-28 22:55:47 - pico-train - INFO - │     - attention.o_proj                              │
2025-08-28 22:55:47 - pico-train - INFO - │     - swiglu.w_2                                    │
2025-08-28 22:55:47 - pico-train - INFO - │     sequence_idx: -1                                │
2025-08-28 22:55:47 - pico-train - INFO - │   learning_dynamics_dir: learning_dynamics          │
2025-08-28 22:55:47 - pico-train - INFO - │   logs_dir: logs                                    │
2025-08-28 22:55:47 - pico-train - INFO - │   run_name: pico-decoder-tiny-dolma29k              │
2025-08-28 22:55:47 - pico-train - INFO - │   runs_dir: runs                                    │
2025-08-28 22:55:47 - pico-train - INFO - │   save_every_n_steps: 1000                          │
2025-08-28 22:55:47 - pico-train - INFO - │   save_to_hf: true                                  │
2025-08-28 22:55:47 - pico-train - INFO - │   training:                                         │
2025-08-28 22:55:47 - pico-train - INFO - │     auto_resume: true                               │
2025-08-28 22:55:47 - pico-train - INFO - │ data:                                               │
2025-08-28 22:55:47 - pico-train - INFO - │   dataloader:                                       │
2025-08-28 22:55:47 - pico-train - INFO - │     batch_size: 4                                   │
2025-08-28 22:55:47 - pico-train - INFO - │   dataset:                                          │
2025-08-28 22:55:47 - pico-train - INFO - │     name: pico-lm/pretokenized-dolma                │
2025-08-28 22:55:47 - pico-train - INFO - │   tokenizer:                                        │
2025-08-28 22:55:47 - pico-train - INFO - │     name: allenai/OLMo-7B-0724-hf                   │
2025-08-28 22:55:47 - pico-train - INFO - │     vocab_size: 50304                               │
2025-08-28 22:55:47 - pico-train - INFO - │ evaluation:                                         │
2025-08-28 22:55:47 - pico-train - INFO - │   metrics:                                          │
2025-08-28 22:55:47 - pico-train - INFO - │   - paloma                                          │
2025-08-28 22:55:47 - pico-train - INFO - │   paloma:                                           │
2025-08-28 22:55:47 - pico-train - INFO - │     batch_size: 1                                   │
2025-08-28 22:55:47 - pico-train - INFO - │     dataset_name: pico-lm/pretokenized-paloma-tinsy │
2025-08-28 22:55:47 - pico-train - INFO - │     dataset_split: val                              │
2025-08-28 22:55:47 - pico-train - INFO - │     max_length: 2048                                │
2025-08-28 22:55:47 - pico-train - INFO - │ model:                                              │
2025-08-28 22:55:47 - pico-train - INFO - │   activation_hidden_dim: 384                        │
2025-08-28 22:55:47 - pico-train - INFO - │   attention_n_heads: 12                             │
2025-08-28 22:55:47 - pico-train - INFO - │   attention_n_kv_heads: 4                           │
2025-08-28 22:55:47 - pico-train - INFO - │   batch_size: 1024                                  │
2025-08-28 22:55:47 - pico-train - INFO - │   d_model: 96                                       │
2025-08-28 22:55:47 - pico-train - INFO - │   max_seq_len: 2048                                 │
2025-08-28 22:55:47 - pico-train - INFO - │   model_type: pico_decoder                          │
2025-08-28 22:55:47 - pico-train - INFO - │   n_layers: 12                                      │
2025-08-28 22:55:47 - pico-train - INFO - │   norm_eps: 1.0e-06                                 │
2025-08-28 22:55:47 - pico-train - INFO - │   position_emb_theta: 10000.0                       │
2025-08-28 22:55:47 - pico-train - INFO - │   vocab_size: 50304                                 │
2025-08-28 22:55:47 - pico-train - INFO - │ monitoring:                                         │
2025-08-28 22:55:47 - pico-train - INFO - │   logging:                                          │
2025-08-28 22:55:47 - pico-train - INFO - │     log_every_n_steps: 100                          │
2025-08-28 22:55:47 - pico-train - INFO - │     log_level: INFO                                 │
2025-08-28 22:55:47 - pico-train - INFO - │   save_to_wandb: false                              │
2025-08-28 22:55:47 - pico-train - INFO - │   wandb:                                            │
2025-08-28 22:55:47 - pico-train - INFO - │     entity: boymyc                                  │
2025-08-28 22:55:47 - pico-train - INFO - │     project: pico-decoder-tiny                      │
2025-08-28 22:55:47 - pico-train - INFO - │ training:                                           │
2025-08-28 22:55:47 - pico-train - INFO - │   fabric:                                           │
2025-08-28 22:55:47 - pico-train - INFO - │     accelerator: cuda                               │
2025-08-28 22:55:47 - pico-train - INFO - │     num_devices: 1                                  │
2025-08-28 22:55:47 - pico-train - INFO - │     num_nodes: 1                                    │
2025-08-28 22:55:47 - pico-train - INFO - │     precision: bf16-mixed                           │
2025-08-28 22:55:47 - pico-train - INFO - │   max_steps: 200000                                 │
2025-08-28 22:55:47 - pico-train - INFO - │   optimization:                                     │
2025-08-28 22:55:47 - pico-train - INFO - │     gradient_accumulation_steps: 4                  │
2025-08-28 22:55:47 - pico-train - INFO - │     lr: 0.0003                                      │
2025-08-28 22:55:47 - pico-train - INFO - │     lr_scheduler: linear_with_warmup                │
2025-08-28 22:55:47 - pico-train - INFO - │     lr_warmup_steps: 2500                           │
2025-08-28 22:55:47 - pico-train - INFO - │     optimizer: adamw                                │
2025-08-28 22:55:47 - pico-train - INFO - │                                                     │
2025-08-28 22:55:47 - pico-train - INFO - ╰─────────────────────────────────────────────────────╯
2025-08-28 22:55:47 - pico-train - INFO - ==================================================
2025-08-28 22:55:47 - pico-train - INFO - ⛭ Runtime Summary:
2025-08-28 22:55:47 - pico-train - INFO - ==================================================
2025-08-28 22:55:47 - pico-train - INFO - Starting from step: 1000
2025-08-28 22:55:47 - pico-train - INFO - Model Setup:
2025-08-28 22:55:47 - pico-train - INFO - └─ Total Parameters: 11,282,784
2025-08-28 22:55:47 - pico-train - INFO - └─ Trainable Parameters: 11,282,784
2025-08-28 22:55:47 - pico-train - INFO - Distributed Setup:
2025-08-28 22:55:47 - pico-train - INFO - └─ Number of Devices: 1
2025-08-28 22:55:47 - pico-train - INFO - └─ Device Type: NVIDIA GeForce RTX 5090
2025-08-28 22:55:47 - pico-train - INFO - └─ Available Memory: 33.68 GB
2025-08-28 22:55:47 - pico-train - INFO - Software Setup:
2025-08-28 22:55:47 - pico-train - INFO - └─ Python Version: 3.10.12
2025-08-28 22:55:47 - pico-train - INFO - └─ PyTorch Version: 2.8.0+cu128
2025-08-28 22:55:47 - pico-train - INFO - └─ CUDA Version: 12.8
2025-08-28 22:55:47 - pico-train - INFO - └─ Operating System: Linux 6.8.0-63-generic
2025-08-28 22:55:47 - pico-train - INFO - Batch Size Configuration:
2025-08-28 22:55:47 - pico-train - INFO - └─ Global Batch Size: 4
2025-08-28 22:55:47 - pico-train - INFO - └─ Per Device Batch Size: 1
2025-08-28 22:55:47 - pico-train - INFO - └─ Gradient Accumulation Steps: 4
2025-08-28 22:55:47 - pico-train - INFO - ==================================================
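
The logged total of 11,282,784 parameters can be cross-checked against the model section of the configuration. The sketch below is not taken from pico-train; it assumes a bias-free decoder with untied input and output embeddings, grouped-query attention, a three-matrix SwiGLU block, and weight-only norms (two per layer plus one final norm). The function name `count_decoder_params` is hypothetical.

```python
def count_decoder_params(vocab_size, d_model, n_layers, n_heads, n_kv_heads, hidden_dim):
    """Estimate the parameter count of a bias-free decoder (assumed layout)."""
    head_dim = d_model // n_heads          # 96 // 12 = 8
    kv_dim = n_kv_heads * head_dim         # 4 * 8 = 32 (grouped-query attention)

    embedding = vocab_size * d_model       # token embedding table
    output_head = vocab_size * d_model     # assumed NOT tied to the embedding

    # q_proj and o_proj are d_model x d_model; k_proj and v_proj are d_model x kv_dim.
    attention = 2 * d_model * d_model + 2 * d_model * kv_dim
    swiglu = 3 * d_model * hidden_dim      # gate, up and down projections
    norms = 2 * d_model                    # assumed: one weight vector per norm
    per_layer = attention + swiglu + norms

    final_norm = d_model
    return embedding + output_head + n_layers * per_layer + final_norm


# Values copied from the "model" section of the logged configuration.
total = count_decoder_params(
    vocab_size=50304, d_model=96, n_layers=12,
    n_heads=12, n_kv_heads=4, hidden_dim=384,
)
print(f"{total:,}")  # 11,282,784
```

Under these assumptions the two embedding matrices account for 9,658,368 of the parameters, so the 12 transformer layers contribute only about 1.6 million.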
2025-08-28 22:55:49 - pico-train - INFO - Step 1000 -- 🔄 Training Metrics
2025-08-28 22:55:49 - pico-train - INFO - ├── Loss: 7.7657
2025-08-28 22:55:49 - pico-train - INFO - ├── Learning Rate: 1.20e-04
2025-08-28 22:55:49 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-28 22:55:49 - pico-train - INFO - Step 1000 -- 📈 Saving Learning Dynamics
2025-08-28 22:56:43 - pico-train - INFO - Step 1100 -- 🔄 Training Metrics
2025-08-28 22:56:43 - pico-train - INFO - ├── Loss: 7.6733
2025-08-28 22:56:43 - pico-train - INFO - ├── Learning Rate: 1.32e-04
2025-08-28 22:56:43 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-28 22:57:34 - pico-train - INFO - Step 1200 -- 🔄 Training Metrics
2025-08-28 22:57:34 - pico-train - INFO - ├── Loss: 7.5969
2025-08-28 22:57:34 - pico-train - INFO - ├── Learning Rate: 1.44e-04
2025-08-28 22:57:34 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-28 22:58:25 - pico-train - INFO - Step 1300 -- 🔄 Training Metrics
2025-08-28 22:58:25 - pico-train - INFO - ├── Loss: 7.4765
2025-08-28 22:58:25 - pico-train - INFO - ├── Learning Rate: 1.56e-04
2025-08-28 22:58:25 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-28 22:59:16 - pico-train - INFO - Step 1400 -- 🔄 Training Metrics
2025-08-28 22:59:16 - pico-train - INFO - ├── Loss: 7.3686
2025-08-28 22:59:16 - pico-train - INFO - ├── Learning Rate: 1.68e-04
2025-08-28 22:59:16 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-28 23:00:07 - pico-train - INFO - Step 1500 -- 🔄 Training Metrics
2025-08-28 23:00:07 - pico-train - INFO - ├── Loss: 7.3251
2025-08-28 23:00:07 - pico-train - INFO - ├── Learning Rate: 1.80e-04
2025-08-28 23:00:07 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-28 23:00:58 - pico-train - INFO - Step 1600 -- 🔄 Training Metrics
2025-08-28 23:00:58 - pico-train - INFO - ├── Loss: 7.1840
2025-08-28 23:00:58 - pico-train - INFO - ├── Learning Rate: 1.92e-04
2025-08-28 23:00:58 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-28 23:01:50 - pico-train - INFO - Step 1700 -- 🔄 Training Metrics
2025-08-28 23:01:50 - pico-train - INFO - ├── Loss: 7.1116
2025-08-28 23:01:50 - pico-train - INFO - ├── Learning Rate: 2.04e-04
2025-08-28 23:01:50 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-28 23:02:41 - pico-train - INFO - Step 1800 -- 🔄 Training Metrics
2025-08-28 23:02:41 - pico-train - INFO - ├── Loss: 7.0565
2025-08-28 23:02:41 - pico-train - INFO - ├── Learning Rate: 2.16e-04
2025-08-28 23:02:41 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-28 23:03:32 - pico-train - INFO - Step 1900 -- 🔄 Training Metrics
2025-08-28 23:03:32 - pico-train - INFO - ├── Loss: 6.9964
2025-08-28 23:03:32 - pico-train - INFO - ├── Learning Rate: 2.28e-04
2025-08-28 23:03:32 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-28 23:04:23 - pico-train - INFO - Step 2000 -- 💾 Saving Checkpoint
2025-08-28 23:06:18 - pico-train - INFO - Step 2000 -- 📊 Evaluation Results
2025-08-28 23:06:18 - pico-train - INFO - └── paloma: 3.627192449295412e+21
2025-08-28 23:06:21 - pico-train - INFO - Step 2000 -- 🔄 Training Metrics
2025-08-28 23:06:21 - pico-train - INFO - ├── Loss: 6.9690
2025-08-28 23:06:21 - pico-train - INFO - ├── Learning Rate: 2.40e-04
2025-08-28 23:06:21 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-28 23:06:21 - pico-train - INFO - Step 2000 -- 📈 Saving Learning Dynamics
2025-08-28 23:07:15 - pico-train - INFO - Step 2100 -- 🔄 Training Metrics
2025-08-28 23:07:15 - pico-train - INFO - ├── Loss: 6.8840
2025-08-28 23:07:15 - pico-train - INFO - ├── Learning Rate: 2.52e-04
2025-08-28 23:07:15 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-28 23:08:06 - pico-train - INFO - Step 2200 -- 🔄 Training Metrics
2025-08-28 23:08:06 - pico-train - INFO - ├── Loss: 6.8334
2025-08-28 23:08:06 - pico-train - INFO - ├── Learning Rate: 2.64e-04
2025-08-28 23:08:06 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-28 23:08:57 - pico-train - INFO - Step 2300 -- 🔄 Training Metrics
2025-08-28 23:08:57 - pico-train - INFO - ├── Loss: 6.8150
2025-08-28 23:08:57 - pico-train - INFO - ├── Learning Rate: 2.76e-04
2025-08-28 23:08:57 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-28 23:09:48 - pico-train - INFO - Step 2400 -- 🔄 Training Metrics
2025-08-28 23:09:48 - pico-train - INFO - ├── Loss: 6.7519
2025-08-28 23:09:48 - pico-train - INFO - ├── Learning Rate: 2.88e-04
2025-08-28 23:09:48 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-28 23:10:39 - pico-train - INFO - Step 2500 -- 🔄 Training Metrics
2025-08-28 23:10:39 - pico-train - INFO - ├── Loss: 6.6908
2025-08-28 23:10:39 - pico-train - INFO - ├── Learning Rate: 3.00e-04
2025-08-28 23:10:39 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-28 23:11:30 - pico-train - INFO - Step 2600 -- 🔄 Training Metrics
2025-08-28 23:11:30 - pico-train - INFO - ├── Loss: 6.6351
2025-08-28 23:11:30 - pico-train - INFO - ├── Learning Rate: 3.00e-04
2025-08-28 23:11:30 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-28 23:12:21 - pico-train - INFO - Step 2700 -- 🔄 Training Metrics
2025-08-28 23:12:21 - pico-train - INFO - ├── Loss: 6.5568
2025-08-28 23:12:21 - pico-train - INFO - ├── Learning Rate: 3.00e-04
2025-08-28 23:12:21 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-28 23:13:12 - pico-train - INFO - Step 2800 -- 🔄 Training Metrics
2025-08-28 23:13:12 - pico-train - INFO - ├── Loss: 6.5799
2025-08-28 23:13:12 - pico-train - INFO - ├── Learning Rate: 3.00e-04
2025-08-28 23:13:12 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-28 23:14:03 - pico-train - INFO - Step 2900 -- 🔄 Training Metrics
2025-08-28 23:14:03 - pico-train - INFO - ├── Loss: 6.5467
2025-08-28 23:14:03 - pico-train - INFO - ├── Learning Rate: 2.99e-04
2025-08-28 23:14:03 - pico-train - INFO - └── Inf/NaN count: 0
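
The learning rates logged up to this point are consistent with a linear warmup to `lr: 0.0003` over `lr_warmup_steps: 2500`, followed by a linear decay. The sketch below is an illustration and not the pico-train scheduler: the function name is hypothetical, and the assumption that the decay reaches zero exactly at `max_steps: 200000` is inferred from the logged values rather than stated in the log.

```python
def linear_with_warmup(step, peak_lr=3e-4, warmup_steps=2500, max_steps=200_000):
    """Linear warmup to peak_lr, then linear decay (assumed to hit 0 at max_steps)."""
    if step < warmup_steps:
        return peak_lr * step / warmup_steps
    remaining = max(0, max_steps - step)
    return peak_lr * remaining / (max_steps - warmup_steps)


# Compare against the "Learning Rate" lines logged at these steps.
for step in (1000, 1500, 2000, 2500, 2900):
    print(step, f"{linear_with_warmup(step):.2e}")
```

With these assumptions the function gives 1.20e-04 at step 1000, 3.00e-04 at step 2500, and 2.99e-04 at step 2900, matching the log to the displayed precision.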
2025-08-28 23:14:53 - pico-train - INFO - Step 3000 -- 💾 Saving Checkpoint
2025-08-28 23:16:58 - pico-train - INFO - Step 3000 -- 📊 Evaluation Results
2025-08-28 23:16:58 - pico-train - INFO - └── paloma: 9.90975658825673e+22
2025-08-28 23:17:01 - pico-train - INFO - Step 3000 -- 🔄 Training Metrics
2025-08-28 23:17:01 - pico-train - INFO - ├── Loss: 6.4865
2025-08-28 23:17:01 - pico-train - INFO - ├── Learning Rate: 2.99e-04
2025-08-28 23:17:01 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-28 23:17:01 - pico-train - INFO - Step 3000 -- 📈 Saving Learning Dynamics
2025-08-28 23:17:55 - pico-train - INFO - Step 3100 -- 🔄 Training Metrics
2025-08-28 23:17:55 - pico-train - INFO - ├── Loss: 6.4604
2025-08-28 23:17:55 - pico-train - INFO - ├── Learning Rate: 2.99e-04
2025-08-28 23:17:55 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-28 23:18:46 - pico-train - INFO - Step 3200 -- 🔄 Training Metrics
2025-08-28 23:18:46 - pico-train - INFO - ├── Loss: 6.4205
2025-08-28 23:18:46 - pico-train - INFO - ├── Learning Rate: 2.99e-04
2025-08-28 23:18:46 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-28 23:19:36 - pico-train - INFO - Step 3300 -- 🔄 Training Metrics
2025-08-28 23:19:36 - pico-train - INFO - ├── Loss: 6.4127
2025-08-28 23:19:36 - pico-train - INFO - ├── Learning Rate: 2.99e-04
2025-08-28 23:19:36 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-28 23:20:27 - pico-train - INFO - Step 3400 -- 🔄 Training Metrics
2025-08-28 23:20:27 - pico-train - INFO - ├── Loss: 6.3692
2025-08-28 23:20:27 - pico-train - INFO - ├── Learning Rate: 2.99e-04
2025-08-28 23:20:27 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-28 23:21:18 - pico-train - INFO - Step 3500 -- 🔄 Training Metrics
2025-08-28 23:21:18 - pico-train - INFO - ├── Loss: 6.3761
2025-08-28 23:21:18 - pico-train - INFO - ├── Learning Rate: 2.98e-04
2025-08-28 23:21:18 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-28 23:22:09 - pico-train - INFO - Step 3600 -- 🔄 Training Metrics
2025-08-28 23:22:09 - pico-train - INFO - ├── Loss: 6.2796
2025-08-28 23:22:09 - pico-train - INFO - ├── Learning Rate: 2.98e-04
2025-08-28 23:22:09 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-28 23:23:00 - pico-train - INFO - Step 3700 -- 🔄 Training Metrics
2025-08-28 23:23:00 - pico-train - INFO - ├── Loss: 6.2988
2025-08-28 23:23:00 - pico-train - INFO - ├── Learning Rate: 2.98e-04
2025-08-28 23:23:00 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-28 23:23:51 - pico-train - INFO - Step 3800 -- 🔄 Training Metrics
2025-08-28 23:23:51 - pico-train - INFO - ├── Loss: 6.2673
2025-08-28 23:23:51 - pico-train - INFO - ├── Learning Rate: 2.98e-04
2025-08-28 23:23:51 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-28 23:24:42 - pico-train - INFO - Step 3900 -- 🔄 Training Metrics
2025-08-28 23:24:42 - pico-train - INFO - ├── Loss: 6.2715
2025-08-28 23:24:42 - pico-train - INFO - ├── Learning Rate: 2.98e-04
2025-08-28 23:24:42 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-28 23:25:32 - pico-train - INFO - Step 4000 -- 💾 Saving Checkpoint
2025-08-28 23:27:27 - pico-train - INFO - Step 4000 -- 📊 Evaluation Results
2025-08-28 23:27:27 - pico-train - INFO - └── paloma: 2.6252526658823776e+24
2025-08-28 23:27:29 - pico-train - INFO - Step 4000 -- 🔄 Training Metrics
2025-08-28 23:27:29 - pico-train - INFO - ├── Loss: 6.1890
2025-08-28 23:27:29 - pico-train - INFO - ├── Learning Rate: 2.98e-04
2025-08-28 23:27:29 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-28 23:27:29 - pico-train - INFO - Step 4000 -- 📈 Saving Learning Dynamics
2025-08-28 23:28:23 - pico-train - INFO - Step 4100 -- 🔄 Training Metrics
2025-08-28 23:28:23 - pico-train - INFO - ├── Loss: 6.1832
2025-08-28 23:28:23 - pico-train - INFO - ├── Learning Rate: 2.98e-04
2025-08-28 23:28:23 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-28 23:29:13 - pico-train - INFO - Step 4200 -- 🔄 Training Metrics
2025-08-28 23:29:13 - pico-train - INFO - ├── Loss: 6.1553
2025-08-28 23:29:13 - pico-train - INFO - ├── Learning Rate: 2.97e-04
2025-08-28 23:29:13 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-28 23:30:04 - pico-train - INFO - Step 4300 -- 🔄 Training Metrics
2025-08-28 23:30:04 - pico-train - INFO - ├── Loss: 6.1629
2025-08-28 23:30:04 - pico-train - INFO - ├── Learning Rate: 2.97e-04
2025-08-28 23:30:04 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-28 23:30:56 - pico-train - INFO - Step 4400 -- 🔄 Training Metrics
2025-08-28 23:30:56 - pico-train - INFO - ├── Loss: 6.1061
2025-08-28 23:30:56 - pico-train - INFO - ├── Learning Rate: 2.97e-04
2025-08-28 23:30:56 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-28 23:31:47 - pico-train - INFO - Step 4500 -- 🔄 Training Metrics
2025-08-28 23:31:47 - pico-train - INFO - ├── Loss: 6.1601
2025-08-28 23:31:47 - pico-train - INFO - ├── Learning Rate: 2.97e-04
2025-08-28 23:31:47 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-28 23:32:38 - pico-train - INFO - Step 4600 -- 🔄 Training Metrics
2025-08-28 23:32:38 - pico-train - INFO - ├── Loss: 6.0963
2025-08-28 23:32:38 - pico-train - INFO - ├── Learning Rate: 2.97e-04
2025-08-28 23:32:38 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-28 23:33:29 - pico-train - INFO - Step 4700 -- 🔄 Training Metrics
2025-08-28 23:33:29 - pico-train - INFO - ├── Loss: 6.0780
2025-08-28 23:33:29 - pico-train - INFO - ├── Learning Rate: 2.97e-04
2025-08-28 23:33:29 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-28 23:34:20 - pico-train - INFO - Step 4800 -- 🔄 Training Metrics
2025-08-28 23:34:20 - pico-train - INFO - ├── Loss: 6.0835
2025-08-28 23:34:20 - pico-train - INFO - ├── Learning Rate: 2.97e-04
2025-08-28 23:34:20 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-28 23:35:11 - pico-train - INFO - Step 4900 -- 🔄 Training Metrics
2025-08-28 23:35:11 - pico-train - INFO - ├── Loss: 6.0519
2025-08-28 23:35:11 - pico-train - INFO - ├── Learning Rate: 2.96e-04
2025-08-28 23:35:11 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-28 23:36:01 - pico-train - INFO - Step 5000 -- 💾 Saving Checkpoint
2025-08-28 23:38:14 - pico-train - INFO - Step 5000 -- 📊 Evaluation Results
2025-08-28 23:38:14 - pico-train - INFO - └── paloma: 7.294956881845611e+25
2025-08-28 23:38:16 - pico-train - INFO - Step 5000 -- 🔄 Training Metrics
2025-08-28 23:38:16 - pico-train - INFO - ├── Loss: 6.0661
2025-08-28 23:38:16 - pico-train - INFO - ├── Learning Rate: 2.96e-04
2025-08-28 23:38:16 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-28 23:38:16 - pico-train - INFO - Step 5000 -- 📈 Saving Learning Dynamics
2025-08-28 23:39:10 - pico-train - INFO - Step 5100 -- 🔄 Training Metrics
2025-08-28 23:39:10 - pico-train - INFO - ├── Loss: 6.0121
2025-08-28 23:39:10 - pico-train - INFO - ├── Learning Rate: 2.96e-04
2025-08-28 23:39:10 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-28 23:40:02 - pico-train - INFO - Step 5200 -- 🔄 Training Metrics
2025-08-28 23:40:02 - pico-train - INFO - ├── Loss: 6.0544
2025-08-28 23:40:02 - pico-train - INFO - ├── Learning Rate: 2.96e-04
2025-08-28 23:40:02 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-28 23:40:53 - pico-train - INFO - Step 5300 -- 🔄 Training Metrics
2025-08-28 23:40:53 - pico-train - INFO - ├── Loss: 6.0224
2025-08-28 23:40:53 - pico-train - INFO - ├── Learning Rate: 2.96e-04
2025-08-28 23:40:53 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-28 23:41:44 - pico-train - INFO - Step 5400 -- 🔄 Training Metrics
2025-08-28 23:41:44 - pico-train - INFO - ├── Loss: 5.9831
2025-08-28 23:41:44 - pico-train - INFO - ├── Learning Rate: 2.96e-04
2025-08-28 23:41:44 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-28 23:42:35 - pico-train - INFO - Step 5500 -- 🔄 Training Metrics
2025-08-28 23:42:35 - pico-train - INFO - ├── Loss: 5.9553
2025-08-28 23:42:35 - pico-train - INFO - ├── Learning Rate: 2.95e-04
2025-08-28 23:42:35 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-28 23:43:26 - pico-train - INFO - Step 5600 -- 🔄 Training Metrics
2025-08-28 23:43:26 - pico-train - INFO - ├── Loss: 5.9493
2025-08-28 23:43:26 - pico-train - INFO - ├── Learning Rate: 2.95e-04
2025-08-28 23:43:26 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-28 23:44:17 - pico-train - INFO - Step 5700 -- 🔄 Training Metrics
2025-08-28 23:44:17 - pico-train - INFO - ├── Loss: 5.9943
2025-08-28 23:44:17 - pico-train - INFO - ├── Learning Rate: 2.95e-04
2025-08-28 23:44:17 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-28 23:45:08 - pico-train - INFO - Step 5800 -- 🔄 Training Metrics
2025-08-28 23:45:08 - pico-train - INFO - ├── Loss: 5.9630
2025-08-28 23:45:08 - pico-train - INFO - ├── Learning Rate: 2.95e-04
2025-08-28 23:45:08 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-28 23:46:00 - pico-train - INFO - Step 5900 -- 🔄 Training Metrics
2025-08-28 23:46:00 - pico-train - INFO - ├── Loss: 5.9349
2025-08-28 23:46:00 - pico-train - INFO - ├── Learning Rate: 2.95e-04
2025-08-28 23:46:00 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-28 23:46:50 - pico-train - INFO - Step 6000 -- 💾 Saving Checkpoint
2025-08-28 23:48:48 - pico-train - INFO - Step 6000 -- 📊 Evaluation Results
2025-08-28 23:48:48 - pico-train - INFO - └── paloma: 1.6856570425562805e+27
2025-08-28 23:48:50 - pico-train - INFO - Step 6000 -- 🔄 Training Metrics
2025-08-28 23:48:50 - pico-train - INFO - ├── Loss: 5.9087
2025-08-28 23:48:50 - pico-train - INFO - ├── Learning Rate: 2.95e-04
2025-08-28 23:48:50 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-28 23:48:50 - pico-train - INFO - Step 6000 -- 📈 Saving Learning Dynamics
2025-08-28 23:49:44 - pico-train - INFO - Step 6100 -- 🔄 Training Metrics
2025-08-28 23:49:44 - pico-train - INFO - ├── Loss: 5.8818
2025-08-28 23:49:44 - pico-train - INFO - ├── Learning Rate: 2.95e-04
2025-08-28 23:49:44 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-28 23:50:35 - pico-train - INFO - Step 6200 -- 🔄 Training Metrics
2025-08-28 23:50:35 - pico-train - INFO - ├── Loss: 5.8535
2025-08-28 23:50:35 - pico-train - INFO - ├── Learning Rate: 2.94e-04
2025-08-28 23:50:35 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-28 23:51:26 - pico-train - INFO - Step 6300 -- 🔄 Training Metrics
2025-08-28 23:51:26 - pico-train - INFO - ├── Loss: 5.8896
2025-08-28 23:51:26 - pico-train - INFO - ├── Learning Rate: 2.94e-04
2025-08-28 23:51:26 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-28 23:52:18 - pico-train - INFO - Step 6400 -- 🔄 Training Metrics
2025-08-28 23:52:18 - pico-train - INFO - ├── Loss: 5.9007
2025-08-28 23:52:18 - pico-train - INFO - ├── Learning Rate: 2.94e-04
2025-08-28 23:52:18 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-28 23:53:09 - pico-train - INFO - Step 6500 -- 🔄 Training Metrics
2025-08-28 23:53:09 - pico-train - INFO - ├── Loss: 5.8617
2025-08-28 23:53:09 - pico-train - INFO - ├── Learning Rate: 2.94e-04
2025-08-28 23:53:09 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-28 23:54:00 - pico-train - INFO - Step 6600 -- 🔄 Training Metrics
2025-08-28 23:54:00 - pico-train - INFO - ├── Loss: 5.8201
2025-08-28 23:54:00 - pico-train - INFO - ├── Learning Rate: 2.94e-04
2025-08-28 23:54:00 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-28 23:54:51 - pico-train - INFO - Step 6700 -- 🔄 Training Metrics
2025-08-28 23:54:51 - pico-train - INFO - ├── Loss: 5.8544
2025-08-28 23:54:51 - pico-train - INFO - ├── Learning Rate: 2.94e-04
2025-08-28 23:54:51 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-28 23:55:42 - pico-train - INFO - Step 6800 -- 🔄 Training Metrics
2025-08-28 23:55:42 - pico-train - INFO - ├── Loss: 5.8532
2025-08-28 23:55:42 - pico-train - INFO - ├── Learning Rate: 2.93e-04
2025-08-28 23:55:42 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-28 23:56:33 - pico-train - INFO - Step 6900 -- 🔄 Training Metrics
2025-08-28 23:56:33 - pico-train - INFO - ├── Loss: 5.7950
2025-08-28 23:56:33 - pico-train - INFO - ├── Learning Rate: 2.93e-04
2025-08-28 23:56:33 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-28 23:57:24 - pico-train - INFO - Step 7000 -- 💾 Saving Checkpoint
2025-08-28 23:59:22 - pico-train - INFO - Step 7000 -- 📊 Evaluation Results
2025-08-28 23:59:22 - pico-train - INFO - └── paloma: 9.22180682233585e+28
2025-08-28 23:59:23 - pico-train - INFO - Step 7000 -- 🔄 Training Metrics
2025-08-28 23:59:23 - pico-train - INFO - ├── Loss: 5.8146
2025-08-28 23:59:23 - pico-train - INFO - ├── Learning Rate: 2.93e-04
2025-08-28 23:59:23 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-28 23:59:23 - pico-train - INFO - Step 7000 -- 📈 Saving Learning Dynamics
2025-08-29 00:00:17 - pico-train - INFO - Step 7100 -- 🔄 Training Metrics
2025-08-29 00:00:17 - pico-train - INFO - ├── Loss: 5.7930
2025-08-29 00:00:17 - pico-train - INFO - ├── Learning Rate: 2.93e-04
2025-08-29 00:00:17 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-29 00:01:09 - pico-train - INFO - Step 7200 -- 🔄 Training Metrics
2025-08-29 00:01:09 - pico-train - INFO - ├── Loss: 5.7827
2025-08-29 00:01:09 - pico-train - INFO - ├── Learning Rate: 2.93e-04
2025-08-29 00:01:09 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-29 00:02:00 - pico-train - INFO - Step 7300 -- 🔄 Training Metrics
2025-08-29 00:02:00 - pico-train - INFO - ├── Loss: 5.7816
2025-08-29 00:02:00 - pico-train - INFO - ├── Learning Rate: 2.93e-04
2025-08-29 00:02:00 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-29 00:02:51 - pico-train - INFO - Step 7400 -- 🔄 Training Metrics
2025-08-29 00:02:51 - pico-train - INFO - ├── Loss: 5.7300
2025-08-29 00:02:51 - pico-train - INFO - ├── Learning Rate: 2.93e-04
2025-08-29 00:02:51 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-29 00:03:42 - pico-train - INFO - Step 7500 -- 🔄 Training Metrics
2025-08-29 00:03:42 - pico-train - INFO - ├── Loss: 5.7670
2025-08-29 00:03:42 - pico-train - INFO - ├── Learning Rate: 2.92e-04
2025-08-29 00:03:42 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-29 00:04:33 - pico-train - INFO - Step 7600 -- 🔄 Training Metrics
2025-08-29 00:04:33 - pico-train - INFO - ├── Loss: 5.7450
2025-08-29 00:04:33 - pico-train - INFO - ├── Learning Rate: 2.92e-04
2025-08-29 00:04:33 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-29 00:05:25 - pico-train - INFO - Step 7700 -- 🔄 Training Metrics
2025-08-29 00:05:25 - pico-train - INFO - ├── Loss: 5.7499
2025-08-29 00:05:25 - pico-train - INFO - ├── Learning Rate: 2.92e-04
2025-08-29 00:05:25 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-29 00:06:16 - pico-train - INFO - Step 7800 -- 🔄 Training Metrics
2025-08-29 00:06:16 - pico-train - INFO - ├── Loss: 5.7233
2025-08-29 00:06:16 - pico-train - INFO - ├── Learning Rate: 2.92e-04
2025-08-29 00:06:16 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-29 00:07:07 - pico-train - INFO - Step 7900 -- 🔄 Training Metrics
2025-08-29 00:07:07 - pico-train - INFO - ├── Loss: 5.7219
2025-08-29 00:07:07 - pico-train - INFO - ├── Learning Rate: 2.92e-04
2025-08-29 00:07:07 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-29 00:07:57 - pico-train - INFO - Step 8000 -- 💾 Saving Checkpoint
2025-08-29 00:10:09 - pico-train - INFO - Step 8000 -- 📊 Evaluation Results
2025-08-29 00:10:09 - pico-train - INFO - └── paloma: 3.1300823362207656e+29
2025-08-29 00:10:11 - pico-train - INFO - Step 8000 -- 🔄 Training Metrics
2025-08-29 00:10:11 - pico-train - INFO - ├── Loss: 5.7523
2025-08-29 00:10:11 - pico-train - INFO - ├── Learning Rate: 2.92e-04
2025-08-29 00:10:11 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-29 00:10:11 - pico-train - INFO - Step 8000 -- 📈 Saving Learning Dynamics
2025-08-29 00:11:05 - pico-train - INFO - Step 8100 -- 🔄 Training Metrics
2025-08-29 00:11:05 - pico-train - INFO - ├── Loss: 5.7145
2025-08-29 00:11:05 - pico-train - INFO - ├── Learning Rate: 2.91e-04
2025-08-29 00:11:05 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-29 00:11:57 - pico-train - INFO - Step 8200 -- 🔄 Training Metrics
2025-08-29 00:11:57 - pico-train - INFO - ├── Loss: 5.7469
2025-08-29 00:11:57 - pico-train - INFO - ├── Learning Rate: 2.91e-04
2025-08-29 00:11:57 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-29 00:12:48 - pico-train - INFO - Step 8300 -- 🔄 Training Metrics
2025-08-29 00:12:48 - pico-train - INFO - ├── Loss: 5.7363
2025-08-29 00:12:48 - pico-train - INFO - ├── Learning Rate: 2.91e-04
2025-08-29 00:12:48 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-29 00:13:38 - pico-train - INFO - Step 8400 -- 🔄 Training Metrics
2025-08-29 00:13:38 - pico-train - INFO - ├── Loss: 5.6938
2025-08-29 00:13:38 - pico-train - INFO - ├── Learning Rate: 2.91e-04
2025-08-29 00:13:38 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-29 00:14:29 - pico-train - INFO - Step 8500 -- 🔄 Training Metrics
2025-08-29 00:14:29 - pico-train - INFO - ├── Loss: 5.6994
2025-08-29 00:14:29 - pico-train - INFO - ├── Learning Rate: 2.91e-04
2025-08-29 00:14:29 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-29 00:15:20 - pico-train - INFO - Step 8600 -- 🔄 Training Metrics
2025-08-29 00:15:20 - pico-train - INFO - ├── Loss: 5.6583
2025-08-29 00:15:20 - pico-train - INFO - ├── Learning Rate: 2.91e-04
2025-08-29 00:15:20 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-29 00:16:11 - pico-train - INFO - Step 8700 -- 🔄 Training Metrics
2025-08-29 00:16:11 - pico-train - INFO - ├── Loss: 5.6885
2025-08-29 00:16:11 - pico-train - INFO - ├── Learning Rate: 2.91e-04
2025-08-29 00:16:11 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-29 00:17:02 - pico-train - INFO - Step 8800 -- 🔄 Training Metrics
2025-08-29 00:17:02 - pico-train - INFO - ├── Loss: 5.6313
2025-08-29 00:17:02 - pico-train - INFO - ├── Learning Rate: 2.90e-04
2025-08-29 00:17:02 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-29 00:17:53 - pico-train - INFO - Step 8900 -- 🔄 Training Metrics
2025-08-29 00:17:53 - pico-train - INFO - ├── Loss: 5.6314
2025-08-29 00:17:53 - pico-train - INFO - ├── Learning Rate: 2.90e-04
2025-08-29 00:17:53 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-29 00:18:44 - pico-train - INFO - Step 9000 -- 💾 Saving Checkpoint
2025-08-29 00:20:42 - pico-train - INFO - Step 9000 -- 📊 Evaluation Results
2025-08-29 00:20:42 - pico-train - INFO - └── paloma: 4.983924509492406e+30
2025-08-29 00:20:43 - pico-train - INFO - Step 9000 -- 🔄 Training Metrics
2025-08-29 00:20:43 - pico-train - INFO - ├── Loss: 5.6501
2025-08-29 00:20:43 - pico-train - INFO - ├── Learning Rate: 2.90e-04
2025-08-29 00:20:43 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-29 00:20:43 - pico-train - INFO - Step 9000 -- 📈 Saving Learning Dynamics
2025-08-29 00:21:37 - pico-train - INFO - Step 9100 -- 🔄 Training Metrics
2025-08-29 00:21:37 - pico-train - INFO - ├── Loss: 5.6357
2025-08-29 00:21:37 - pico-train - INFO - ├── Learning Rate: 2.90e-04
2025-08-29 00:21:37 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-29 00:22:28 - pico-train - INFO - Step 9200 -- 🔄 Training Metrics
2025-08-29 00:22:28 - pico-train - INFO - ├── Loss: 5.6045
2025-08-29 00:22:28 - pico-train - INFO - ├── Learning Rate: 2.90e-04
2025-08-29 00:22:28 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-29 00:23:19 - pico-train - INFO - Step 9300 -- 🔄 Training Metrics
2025-08-29 00:23:19 - pico-train - INFO - ├── Loss: 5.6405
2025-08-29 00:23:19 - pico-train - INFO - ├── Learning Rate: 2.90e-04
2025-08-29 00:23:19 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-29 00:24:10 - pico-train - INFO - Step 9400 -- 🔄 Training Metrics
2025-08-29 00:24:10 - pico-train - INFO - ├── Loss: 5.6241
2025-08-29 00:24:10 - pico-train - INFO - ├── Learning Rate: 2.90e-04
2025-08-29 00:24:10 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-29 00:25:00 - pico-train - INFO - Step 9500 -- 🔄 Training Metrics
2025-08-29 00:25:00 - pico-train - INFO - ├── Loss: 5.6247
2025-08-29 00:25:00 - pico-train - INFO - ├── Learning Rate: 2.89e-04
2025-08-29 00:25:00 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-29 00:25:51 - pico-train - INFO - Step 9600 -- 🔄 Training Metrics
2025-08-29 00:25:51 - pico-train - INFO - ├── Loss: 5.5983
2025-08-29 00:25:51 - pico-train - INFO - ├── Learning Rate: 2.89e-04
2025-08-29 00:25:51 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-29 00:26:43 - pico-train - INFO - Step 9700 -- 🔄 Training Metrics
2025-08-29 00:26:43 - pico-train - INFO - ├── Loss: 5.5978
2025-08-29 00:26:43 - pico-train - INFO - ├── Learning Rate: 2.89e-04
2025-08-29 00:26:43 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-29 00:27:34 - pico-train - INFO - Step 9800 -- 🔄 Training Metrics
2025-08-29 00:27:34 - pico-train - INFO - ├── Loss: 5.5746
2025-08-29 00:27:34 - pico-train - INFO - ├── Learning Rate: 2.89e-04
2025-08-29 00:27:34 - pico-train - INFO - └── Inf/NaN count: 0
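
For scale, the logged training losses can be converted to perplexities. The sketch below assumes the loss is a mean per-token cross-entropy in nats, which the log does not state; the log also does not say how the `paloma` value is computed, so the two numbers may not be directly comparable. The helper name `perplexity` is hypothetical.

```python
import math


def perplexity(loss_nats):
    """Perplexity implied by a mean per-token cross-entropy (assumed to be in nats)."""
    return math.exp(loss_nats)


# Training losses copied from the log at steps 1000, 5000 and 9800.
logged_losses = {1000: 7.7657, 5000: 6.0661, 9800: 5.5746}
for step, loss in logged_losses.items():
    print(step, round(perplexity(loss), 1))
```

Under this assumption the training perplexity falls from roughly 2,400 at step 1000 to under 300 at step 9800, while the logged `paloma` value rises from about 2.5e+19 to about 5.0e+30 over nearly the same interval.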