2025-08-29 00:40:55 - pico-train - INFO - Step 0 -- Evaluation Results
2025-08-29 00:40:55 - pico-train - INFO - └── paloma: inf
2025-08-29 00:40:57 - pico-train - INFO - ==================================================
2025-08-29 00:40:57 - pico-train - INFO - ✨ Training Configuration
2025-08-29 00:40:57 - pico-train - INFO - ==================================================
2025-08-29 00:40:57 - pico-train - INFO - checkpointing:
2025-08-29 00:40:57 - pico-train - INFO -   checkpoints_dir: checkpoints
2025-08-29 00:40:57 - pico-train - INFO -   evaluation:
2025-08-29 00:40:57 - pico-train - INFO -     eval_results_dir: eval_results
2025-08-29 00:40:57 - pico-train - INFO -   fabric_checkpoint_dir: fabric_state
2025-08-29 00:40:57 - pico-train - INFO -   fabric_checkpoint_filename: checkpoint.pt
2025-08-29 00:40:57 - pico-train - INFO -   hf_checkpoint:
2025-08-29 00:40:57 - pico-train - INFO -     collection_slug: null
2025-08-29 00:40:57 - pico-train - INFO -     repo_id: ThomasTheMaker/pico-decoder-tiny
2025-08-29 00:40:57 - pico-train - INFO -   learning_dynamics:
2025-08-29 00:40:57 - pico-train - INFO -     batch_size: 1
2025-08-29 00:40:57 - pico-train - INFO -     eval_data: null
2025-08-29 00:40:57 - pico-train - INFO -     layer_suffixes:
2025-08-29 00:40:57 - pico-train - INFO -     - attention.v_proj
2025-08-29 00:40:57 - pico-train - INFO -     - attention.o_proj
2025-08-29 00:40:57 - pico-train - INFO -     - swiglu.w_2
2025-08-29 00:40:57 - pico-train - INFO -     sequence_idx: -1
2025-08-29 00:40:57 - pico-train - INFO -   learning_dynamics_dir: learning_dynamics
2025-08-29 00:40:57 - pico-train - INFO -   logs_dir: logs
2025-08-29 00:40:57 - pico-train - INFO -   run_name: pico-decoder-tiny-dolma29k-v2
2025-08-29 00:40:57 - pico-train - INFO -   runs_dir: runs
2025-08-29 00:40:57 - pico-train - INFO -   save_every_n_steps: 1000
2025-08-29 00:40:57 - pico-train - INFO -   save_to_hf: true
2025-08-29 00:40:57 - pico-train - INFO -   training:
2025-08-29 00:40:57 - pico-train - INFO -     auto_resume: true
2025-08-29 00:40:57 - pico-train - INFO - data:
2025-08-29 00:40:57 - pico-train - INFO -   dataloader:
2025-08-29 00:40:57 - pico-train - INFO -     batch_size: 8
2025-08-29 00:40:57 - pico-train - INFO -   dataset:
2025-08-29 00:40:57 - pico-train - INFO -     name: pico-lm/pretokenized-dolma
2025-08-29 00:40:57 - pico-train - INFO -   tokenizer:
2025-08-29 00:40:57 - pico-train - INFO -     name: allenai/OLMo-7B-0724-hf
2025-08-29 00:40:57 - pico-train - INFO -     vocab_size: 50304
2025-08-29 00:40:57 - pico-train - INFO - evaluation:
2025-08-29 00:40:57 - pico-train - INFO -   metrics:
2025-08-29 00:40:57 - pico-train - INFO -   - paloma
2025-08-29 00:40:57 - pico-train - INFO -   paloma:
2025-08-29 00:40:57 - pico-train - INFO -     batch_size: 1
2025-08-29 00:40:57 - pico-train - INFO -     dataset_name: pico-lm/pretokenized-paloma-tinsy
2025-08-29 00:40:57 - pico-train - INFO -     dataset_split: val
2025-08-29 00:40:57 - pico-train - INFO -     max_length: 2048
2025-08-29 00:40:57 - pico-train - INFO - model:
2025-08-29 00:40:57 - pico-train - INFO -   activation_hidden_dim: 384
2025-08-29 00:40:57 - pico-train - INFO -   attention_n_heads: 12
2025-08-29 00:40:57 - pico-train - INFO -   attention_n_kv_heads: 4
2025-08-29 00:40:57 - pico-train - INFO -   batch_size: 1024
2025-08-29 00:40:57 - pico-train - INFO -   d_model: 96
2025-08-29 00:40:57 - pico-train - INFO -   max_seq_len: 2048
2025-08-29 00:40:57 - pico-train - INFO -   model_type: pico_decoder
2025-08-29 00:40:57 - pico-train - INFO -   n_layers: 12
2025-08-29 00:40:57 - pico-train - INFO -   norm_eps: 1.0e-06
2025-08-29 00:40:57 - pico-train - INFO -   position_emb_theta: 10000.0
2025-08-29 00:40:57 - pico-train - INFO -   vocab_size: 50304
2025-08-29 00:40:57 - pico-train - INFO - monitoring:
2025-08-29 00:40:57 - pico-train - INFO -   logging:
2025-08-29 00:40:57 - pico-train - INFO -     log_every_n_steps: 50
2025-08-29 00:40:57 - pico-train - INFO -     log_level: INFO
2025-08-29 00:40:57 - pico-train - INFO -   save_to_wandb: false
2025-08-29 00:40:57 - pico-train - INFO -   wandb:
2025-08-29 00:40:57 - pico-train - INFO -     entity: boymyc
2025-08-29 00:40:57 - pico-train - INFO -     project: pico-decoder-tiny
2025-08-29 00:40:57 - pico-train - INFO - training:
2025-08-29 00:40:57 - pico-train - INFO -   fabric:
2025-08-29 00:40:57 - pico-train - INFO -     accelerator: cuda
2025-08-29 00:40:57 - pico-train - INFO -     num_devices: 1
2025-08-29 00:40:57 - pico-train - INFO -     num_nodes: 1
2025-08-29 00:40:57 - pico-train - INFO -     precision: bf16-mixed
2025-08-29 00:40:57 - pico-train - INFO -   max_steps: 200000
2025-08-29 00:40:57 - pico-train - INFO -   optimization:
2025-08-29 00:40:57 - pico-train - INFO -     gradient_accumulation_steps: 2
2025-08-29 00:40:57 - pico-train - INFO -     lr: 0.0001
2025-08-29 00:40:57 - pico-train - INFO -     lr_scheduler: linear_with_warmup
2025-08-29 00:40:57 - pico-train - INFO -     lr_warmup_steps: 5000
2025-08-29 00:40:57 - pico-train - INFO -     optimizer: adamw
2025-08-29 00:40:57 - pico-train - INFO - ==================================================
2025-08-29 00:40:57 - pico-train - INFO - ⭐ Runtime Summary:
2025-08-29 00:40:57 - pico-train - INFO - ==================================================
2025-08-29 00:40:57 - pico-train - INFO - Starting from step: 0
2025-08-29 00:40:57 - pico-train - INFO - Model Setup:
2025-08-29 00:40:57 - pico-train - INFO - ├─ Total Parameters: 11,282,784
2025-08-29 00:40:57 - pico-train - INFO - └─ Trainable Parameters: 11,282,784
2025-08-29 00:40:57 - pico-train - INFO - Distributed Setup:
2025-08-29 00:40:57 - pico-train - INFO - ├─ Number of Devices: 1
2025-08-29 00:40:57 - pico-train - INFO - ├─ Device Type: NVIDIA GeForce RTX 5090
2025-08-29 00:40:57 - pico-train - INFO - └─ Available Memory: 33.68 GB
2025-08-29 00:40:57 - pico-train - INFO - Software Setup:
2025-08-29 00:40:57 - pico-train - INFO - ├─ Python Version: 3.10.12
2025-08-29 00:40:57 - pico-train - INFO - ├─ PyTorch Version: 2.8.0+cu128
2025-08-29 00:40:57 - pico-train - INFO - ├─ CUDA Version: 12.8
2025-08-29 00:40:57 - pico-train - INFO - └─ Operating System: Linux 6.8.0-63-generic
2025-08-29 00:40:57 - pico-train - INFO - Batch Size Configuration:
2025-08-29 00:40:57 - pico-train - INFO - ├─ Global Batch Size: 8
2025-08-29 00:40:57 - pico-train - INFO - ├─ Per Device Batch Size: 4
2025-08-29 00:40:57 - pico-train - INFO - └─ Gradient Accumulation Steps: 2
2025-08-29 00:40:57 - pico-train - INFO - ==================================================
2025-08-29 00:40:58 - pico-train - INFO - Step 0 -- Training Metrics
2025-08-29 00:40:58 - pico-train - INFO - ├── Loss: 10.9848
2025-08-29 00:40:58 - pico-train - INFO - ├── Learning Rate: 0.00e+00
2025-08-29 00:40:58 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-29 00:40:58 - pico-train - INFO - Step 0 -- Saving Learning Dynamics
2025-08-29 00:41:29 - pico-train - INFO - Step 50 -- Training Metrics
2025-08-29 00:41:29 - pico-train - INFO - ├── Loss: 11.0005
2025-08-29 00:41:29 - pico-train - INFO - ├── Learning Rate: 1.00e-06
2025-08-29 00:41:29 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-29 00:41:55 - pico-train - INFO - Step 100 -- Training Metrics
2025-08-29 00:41:55 - pico-train - INFO - ├── Loss: 10.9918
2025-08-29 00:41:55 - pico-train - INFO - ├── Learning Rate: 2.00e-06
2025-08-29 00:41:55 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-29 00:42:21 - pico-train - INFO - Step 150 -- Training Metrics
2025-08-29 00:42:21 - pico-train - INFO - ├── Loss: 10.9776
2025-08-29 00:42:21 - pico-train - INFO - ├── Learning Rate: 3.00e-06
2025-08-29 00:42:21 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-29 00:42:47 - pico-train - INFO - Step 200 -- Training Metrics
2025-08-29 00:42:47 - pico-train - INFO - ├── Loss: 10.9569
2025-08-29 00:42:47 - pico-train - INFO - ├── Learning Rate: 4.00e-06
2025-08-29 00:42:47 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-29 00:43:14 - pico-train - INFO - Step 250 -- Training Metrics
2025-08-29 00:43:14 - pico-train - INFO - ├── Loss: 10.9255
2025-08-29 00:43:14 - pico-train - INFO - ├── Learning Rate: 5.00e-06
2025-08-29 00:43:14 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-29 00:43:40 - pico-train - INFO - Step 300 -- Training Metrics
2025-08-29 00:43:40 - pico-train - INFO - ├── Loss: 10.8883
2025-08-29 00:43:40 - pico-train - INFO - ├── Learning Rate: 6.00e-06
2025-08-29 00:43:40 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-29 00:44:06 - pico-train - INFO - Step 350 -- Training Metrics
2025-08-29 00:44:06 - pico-train - INFO - ├── Loss: 10.8249
2025-08-29 00:44:06 - pico-train - INFO - ├── Learning Rate: 7.00e-06
2025-08-29 00:44:06 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-29 00:44:32 - pico-train - INFO - Step 400 -- Training Metrics
2025-08-29 00:44:32 - pico-train - INFO - ├── Loss: 10.7344
2025-08-29 00:44:32 - pico-train - INFO - ├── Learning Rate: 8.00e-06
2025-08-29 00:44:32 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-29 00:44:58 - pico-train - INFO - Step 450 -- Training Metrics
2025-08-29 00:44:58 - pico-train - INFO - ├── Loss: 10.6177
2025-08-29 00:44:58 - pico-train - INFO - ├── Learning Rate: 9.00e-06
2025-08-29 00:44:58 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-29 00:45:24 - pico-train - INFO - Step 500 -- Training Metrics
2025-08-29 00:45:24 - pico-train - INFO - ├── Loss: 10.5025
2025-08-29 00:45:24 - pico-train - INFO - ├── Learning Rate: 1.00e-05
2025-08-29 00:45:24 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-29 00:45:50 - pico-train - INFO - Step 550 -- Training Metrics
2025-08-29 00:45:50 - pico-train - INFO - ├── Loss: 10.3986
2025-08-29 00:45:50 - pico-train - INFO - ├── Learning Rate: 1.10e-05
2025-08-29 00:45:50 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-29 00:46:16 - pico-train - INFO - Step 600 -- Training Metrics
2025-08-29 00:46:16 - pico-train - INFO - ├── Loss: 10.3079
2025-08-29 00:46:16 - pico-train - INFO - ├── Learning Rate: 1.20e-05
2025-08-29 00:46:16 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-29 00:46:42 - pico-train - INFO - Step 650 -- Training Metrics
2025-08-29 00:46:42 - pico-train - INFO - ├── Loss: 10.2142
2025-08-29 00:46:42 - pico-train - INFO - ├── Learning Rate: 1.30e-05
2025-08-29 00:46:42 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-29 00:47:08 - pico-train - INFO - Step 700 -- Training Metrics
2025-08-29 00:47:08 - pico-train - INFO - ├── Loss: 10.1146
2025-08-29 00:47:08 - pico-train - INFO - ├── Learning Rate: 1.40e-05
2025-08-29 00:47:08 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-29 00:47:34 - pico-train - INFO - Step 750 -- Training Metrics
2025-08-29 00:47:34 - pico-train - INFO - ├── Loss: 10.0398
2025-08-29 00:47:34 - pico-train - INFO - ├── Learning Rate: 1.50e-05
2025-08-29 00:47:34 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-29 00:48:00 - pico-train - INFO - Step 800 -- Training Metrics
2025-08-29 00:48:00 - pico-train - INFO - ├── Loss: 9.9311
2025-08-29 00:48:00 - pico-train - INFO - ├── Learning Rate: 1.60e-05
2025-08-29 00:48:00 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-29 00:48:26 - pico-train - INFO - Step 850 -- Training Metrics
2025-08-29 00:48:26 - pico-train - INFO - ├── Loss: 9.8431
2025-08-29 00:48:26 - pico-train - INFO - ├── Learning Rate: 1.70e-05
2025-08-29 00:48:26 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-29 00:48:52 - pico-train - INFO - Step 900 -- Training Metrics
2025-08-29 00:48:52 - pico-train - INFO - ├── Loss: 9.7453
2025-08-29 00:48:52 - pico-train - INFO - ├── Learning Rate: 1.80e-05
2025-08-29 00:48:52 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-29 00:49:18 - pico-train - INFO - Step 950 -- Training Metrics
2025-08-29 00:49:18 - pico-train - INFO - ├── Loss: 9.6527
2025-08-29 00:49:18 - pico-train - INFO - ├── Learning Rate: 1.90e-05
2025-08-29 00:49:18 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-29 00:49:43 - pico-train - INFO - Step 1000 -- 💾 Saving Checkpoint
2025-08-29 00:52:44 - pico-train - INFO - Step 1000 -- Evaluation Results
2025-08-29 00:52:44 - pico-train - INFO - └── paloma: 5.073320568651489e+18
2025-08-29 00:52:45 - pico-train - INFO - Step 1000 -- Training Metrics
2025-08-29 00:52:45 - pico-train - INFO - ├── Loss: 9.5691
2025-08-29 00:52:45 - pico-train - INFO - ├── Learning Rate: 2.00e-05
2025-08-29 00:52:45 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-29 00:52:45 - pico-train - INFO - Step 1000 -- Saving Learning Dynamics
2025-08-29 00:53:15 - pico-train - INFO - Step 1050 -- Training Metrics
2025-08-29 00:53:15 - pico-train - INFO - ├── Loss: 9.4600
2025-08-29 00:53:15 - pico-train - INFO - ├── Learning Rate: 2.10e-05
2025-08-29 00:53:15 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-29 00:53:41 - pico-train - INFO - Step 1100 -- Training Metrics
2025-08-29 00:53:41 - pico-train - INFO - ├── Loss: 9.3525
2025-08-29 00:53:41 - pico-train - INFO - ├── Learning Rate: 2.20e-05
2025-08-29 00:53:41 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-29 00:54:07 - pico-train - INFO - Step 1150 -- Training Metrics
2025-08-29 00:54:07 - pico-train - INFO - ├── Loss: 9.2715
2025-08-29 00:54:07 - pico-train - INFO - ├── Learning Rate: 2.30e-05
2025-08-29 00:54:07 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-29 00:54:33 - pico-train - INFO - Step 1200 -- Training Metrics
2025-08-29 00:54:33 - pico-train - INFO - ├── Loss: 9.1618
2025-08-29 00:54:33 - pico-train - INFO - ├── Learning Rate: 2.40e-05
2025-08-29 00:54:33 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-29 00:54:59 - pico-train - INFO - Step 1250 -- Training Metrics
2025-08-29 00:54:59 - pico-train - INFO - ├── Loss: 9.0547
2025-08-29 00:54:59 - pico-train - INFO - ├── Learning Rate: 2.50e-05
2025-08-29 00:54:59 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-29 00:55:25 - pico-train - INFO - Step 1300 -- Training Metrics
2025-08-29 00:55:25 - pico-train - INFO - ├── Loss: 8.9550
2025-08-29 00:55:25 - pico-train - INFO - ├── Learning Rate: 2.60e-05
2025-08-29 00:55:25 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-29 00:55:51 - pico-train - INFO - Step 1350 -- Training Metrics
2025-08-29 00:55:51 - pico-train - INFO - ├── Loss: 8.8251
2025-08-29 00:55:51 - pico-train - INFO - ├── Learning Rate: 2.70e-05
2025-08-29 00:55:51 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-29 00:56:17 - pico-train - INFO - Step 1400 -- Training Metrics
2025-08-29 00:56:17 - pico-train - INFO - ├── Loss: 8.7711
2025-08-29 00:56:17 - pico-train - INFO - ├── Learning Rate: 2.80e-05
2025-08-29 00:56:17 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-29 00:56:43 - pico-train - INFO - Step 1450 -- Training Metrics
2025-08-29 00:56:43 - pico-train - INFO - ├── Loss: 8.6834
2025-08-29 00:56:43 - pico-train - INFO - ├── Learning Rate: 2.90e-05
2025-08-29 00:56:43 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-29 00:57:09 - pico-train - INFO - Step 1500 -- Training Metrics
2025-08-29 00:57:09 - pico-train - INFO - ├── Loss: 8.5638
2025-08-29 00:57:09 - pico-train - INFO - ├── Learning Rate: 3.00e-05
2025-08-29 00:57:09 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-29 00:57:35 - pico-train - INFO - Step 1550 -- Training Metrics
2025-08-29 00:57:35 - pico-train - INFO - ├── Loss: 8.4572
2025-08-29 00:57:35 - pico-train - INFO - ├── Learning Rate: 3.10e-05
2025-08-29 00:57:35 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-29 00:58:01 - pico-train - INFO - Step 1600 -- Training Metrics
2025-08-29 00:58:01 - pico-train - INFO - ├── Loss: 8.3940
2025-08-29 00:58:01 - pico-train - INFO - ├── Learning Rate: 3.20e-05
2025-08-29 00:58:01 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-29 00:58:27 - pico-train - INFO - Step 1650 -- Training Metrics
2025-08-29 00:58:27 - pico-train - INFO - ├── Loss: 8.2973
2025-08-29 00:58:27 - pico-train - INFO - ├── Learning Rate: 3.30e-05
2025-08-29 00:58:27 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-29 00:58:53 - pico-train - INFO - Step 1700 -- Training Metrics
2025-08-29 00:58:53 - pico-train - INFO - ├── Loss: 8.2264
2025-08-29 00:58:53 - pico-train - INFO - ├── Learning Rate: 3.40e-05
2025-08-29 00:58:53 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-29 00:59:19 - pico-train - INFO - Step 1750 -- Training Metrics
2025-08-29 00:59:19 - pico-train - INFO - ├── Loss: 8.1672
2025-08-29 00:59:19 - pico-train - INFO - ├── Learning Rate: 3.50e-05
2025-08-29 00:59:19 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-29 00:59:45 - pico-train - INFO - Step 1800 -- Training Metrics
2025-08-29 00:59:45 - pico-train - INFO - ├── Loss: 8.0695
2025-08-29 00:59:45 - pico-train - INFO - ├── Learning Rate: 3.60e-05
2025-08-29 00:59:45 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-29 01:00:11 - pico-train - INFO - Step 1850 -- Training Metrics
2025-08-29 01:00:11 - pico-train - INFO - ├── Loss: 8.0299
2025-08-29 01:00:11 - pico-train - INFO - ├── Learning Rate: 3.70e-05
2025-08-29 01:00:11 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-29 01:00:37 - pico-train - INFO - Step 1900 -- Training Metrics
2025-08-29 01:00:37 - pico-train - INFO - ├── Loss: 7.9883
2025-08-29 01:00:37 - pico-train - INFO - ├── Learning Rate: 3.80e-05
2025-08-29 01:00:37 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-29 01:01:03 - pico-train - INFO - Step 1950 -- Training Metrics
2025-08-29 01:01:03 - pico-train - INFO - ├── Loss: 7.9429
2025-08-29 01:01:03 - pico-train - INFO - ├── Learning Rate: 3.90e-05
2025-08-29 01:01:03 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-29 01:01:28 - pico-train - INFO - Step 2000 -- 💾 Saving Checkpoint
2025-08-29 01:03:57 - pico-train - INFO - Step 2000 -- Evaluation Results
2025-08-29 01:03:57 - pico-train - INFO - └── paloma: 1.8978577072995303e+19
2025-08-29 01:04:01 - pico-train - INFO - Step 2000 -- Training Metrics
2025-08-29 01:04:01 - pico-train - INFO - ├── Loss: 7.8447
2025-08-29 01:04:01 - pico-train - INFO - ├── Learning Rate: 4.00e-05
2025-08-29 01:04:01 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-29 01:04:01 - pico-train - INFO - Step 2000 -- Saving Learning Dynamics
2025-08-29 01:04:31 - pico-train - INFO - Step 2050 -- Training Metrics
2025-08-29 01:04:31 - pico-train - INFO - ├── Loss: 7.8380
2025-08-29 01:04:31 - pico-train - INFO - ├── Learning Rate: 4.10e-05
2025-08-29 01:04:31 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-29 01:04:57 - pico-train - INFO - Step 2100 -- Training Metrics
2025-08-29 01:04:57 - pico-train - INFO - ├── Loss: 7.7671
2025-08-29 01:04:57 - pico-train - INFO - ├── Learning Rate: 4.20e-05
2025-08-29 01:04:57 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-29 01:05:23 - pico-train - INFO - Step 2150 -- Training Metrics
2025-08-29 01:05:23 - pico-train - INFO - ├── Loss: 7.7637
2025-08-29 01:05:23 - pico-train - INFO - ├── Learning Rate: 4.30e-05
2025-08-29 01:05:23 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-29 01:05:49 - pico-train - INFO - Step 2200 -- Training Metrics
2025-08-29 01:05:49 - pico-train - INFO - ├── Loss: 7.7060
2025-08-29 01:05:49 - pico-train - INFO - ├── Learning Rate: 4.40e-05
2025-08-29 01:05:49 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-29 01:06:15 - pico-train - INFO - Step 2250 -- Training Metrics
2025-08-29 01:06:15 - pico-train - INFO - ├── Loss: 7.7607
2025-08-29 01:06:15 - pico-train - INFO - ├── Learning Rate: 4.50e-05
2025-08-29 01:06:15 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-29 01:06:41 - pico-train - INFO - Step 2300 -- Training Metrics
2025-08-29 01:06:41 - pico-train - INFO - ├── Loss: 7.7076
2025-08-29 01:06:41 - pico-train - INFO - ├── Learning Rate: 4.60e-05
2025-08-29 01:06:41 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-29 01:07:07 - pico-train - INFO - Step 2350 -- Training Metrics
2025-08-29 01:07:07 - pico-train - INFO - ├── Loss: 7.6787
2025-08-29 01:07:07 - pico-train - INFO - ├── Learning Rate: 4.70e-05
2025-08-29 01:07:07 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-29 01:07:33 - pico-train - INFO - Step 2400 -- Training Metrics
2025-08-29 01:07:33 - pico-train - INFO - ├── Loss: 7.6446
2025-08-29 01:07:33 - pico-train - INFO - ├── Learning Rate: 4.80e-05
2025-08-29 01:07:33 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-29 01:07:59 - pico-train - INFO - Step 2450 -- Training Metrics
2025-08-29 01:07:59 - pico-train - INFO - ├── Loss: 7.5999
2025-08-29 01:07:59 - pico-train - INFO - ├── Learning Rate: 4.90e-05
2025-08-29 01:07:59 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-29 01:08:25 - pico-train - INFO - Step 2500 -- Training Metrics
2025-08-29 01:08:25 - pico-train - INFO - ├── Loss: 7.6154
2025-08-29 01:08:25 - pico-train - INFO - ├── Learning Rate: 5.00e-05
2025-08-29 01:08:25 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-29 01:08:50 - pico-train - INFO - Step 2550 -- Training Metrics
2025-08-29 01:08:50 - pico-train - INFO - ├── Loss: 7.5627
2025-08-29 01:08:50 - pico-train - INFO - ├── Learning Rate: 5.10e-05
2025-08-29 01:08:50 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-29 01:09:17 - pico-train - INFO - Step 2600 -- Training Metrics
2025-08-29 01:09:17 - pico-train - INFO - ├── Loss: 7.5747
2025-08-29 01:09:17 - pico-train - INFO - ├── Learning Rate: 5.20e-05
2025-08-29 01:09:17 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-29 01:09:43 - pico-train - INFO - Step 2650 -- Training Metrics
2025-08-29 01:09:43 - pico-train - INFO - ├── Loss: 7.5358
2025-08-29 01:09:43 - pico-train - INFO - ├── Learning Rate: 5.30e-05
2025-08-29 01:09:43 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-29 01:10:09 - pico-train - INFO - Step 2700 -- Training Metrics
2025-08-29 01:10:09 - pico-train - INFO - ├── Loss: 7.5148
2025-08-29 01:10:09 - pico-train - INFO - ├── Learning Rate: 5.40e-05
2025-08-29 01:10:09 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-29 01:10:35 - pico-train - INFO - Step 2750 -- Training Metrics
2025-08-29 01:10:35 - pico-train - INFO - ├── Loss: 7.4874
2025-08-29 01:10:35 - pico-train - INFO - ├── Learning Rate: 5.50e-05
2025-08-29 01:10:35 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-29 01:11:01 - pico-train - INFO - Step 2800 -- Training Metrics
2025-08-29 01:11:01 - pico-train - INFO - ├── Loss: 7.4438
2025-08-29 01:11:01 - pico-train - INFO - ├── Learning Rate: 5.60e-05
2025-08-29 01:11:01 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-29 01:11:27 - pico-train - INFO - Step 2850 -- Training Metrics
2025-08-29 01:11:27 - pico-train - INFO - ├── Loss: 7.4772
2025-08-29 01:11:27 - pico-train - INFO - ├── Learning Rate: 5.70e-05
2025-08-29 01:11:27 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-29 01:11:53 - pico-train - INFO - Step 2900 -- Training Metrics
2025-08-29 01:11:53 - pico-train - INFO - ├── Loss: 7.4135
2025-08-29 01:11:53 - pico-train - INFO - ├── Learning Rate: 5.80e-05
2025-08-29 01:11:53 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-29 01:12:19 - pico-train - INFO - Step 2950 -- Training Metrics
2025-08-29 01:12:19 - pico-train - INFO - ├── Loss: 7.3929
2025-08-29 01:12:19 - pico-train - INFO - ├── Learning Rate: 5.90e-05
2025-08-29 01:12:19 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-29 01:12:44 - pico-train - INFO - Step 3000 -- 💾 Saving Checkpoint
2025-08-29 01:14:43 - pico-train - INFO - Step 3000 -- Evaluation Results
2025-08-29 01:14:43 - pico-train - INFO - └── paloma: 3.1701596694317715e+19
2025-08-29 01:14:46 - pico-train - INFO - Step 3000 -- Training Metrics
2025-08-29 01:14:46 - pico-train - INFO - ├── Loss: 7.3566
2025-08-29 01:14:46 - pico-train - INFO - ├── Learning Rate: 6.00e-05
2025-08-29 01:14:46 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-29 01:14:46 - pico-train - INFO - Step 3000 -- Saving Learning Dynamics
2025-08-29 01:15:16 - pico-train - INFO - Step 3050 -- Training Metrics
2025-08-29 01:15:16 - pico-train - INFO - ├── Loss: 7.3318
2025-08-29 01:15:16 - pico-train - INFO - ├── Learning Rate: 6.10e-05
2025-08-29 01:15:16 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-29 01:15:42 - pico-train - INFO - Step 3100 -- Training Metrics
2025-08-29 01:15:42 - pico-train - INFO - ├── Loss: 7.3114
2025-08-29 01:15:42 - pico-train - INFO - ├── Learning Rate: 6.20e-05
2025-08-29 01:15:42 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-29 01:16:08 - pico-train - INFO - Step 3150 -- Training Metrics
2025-08-29 01:16:08 - pico-train - INFO - ├── Loss: 7.2734
2025-08-29 01:16:08 - pico-train - INFO - ├── Learning Rate: 6.30e-05
2025-08-29 01:16:08 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-29 01:16:34 - pico-train - INFO - Step 3200 -- Training Metrics
2025-08-29 01:16:34 - pico-train - INFO - ├── Loss: 7.3220
2025-08-29 01:16:34 - pico-train - INFO - ├── Learning Rate: 6.40e-05
2025-08-29 01:16:34 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-29 01:16:59 - pico-train - INFO - Step 3250 -- Training Metrics
2025-08-29 01:16:59 - pico-train - INFO - ├── Loss: 7.2621
2025-08-29 01:16:59 - pico-train - INFO - ├── Learning Rate: 6.50e-05
2025-08-29 01:16:59 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-29 01:17:25 - pico-train - INFO - Step 3300 -- Training Metrics
2025-08-29 01:17:25 - pico-train - INFO - ├── Loss: 7.2257
2025-08-29 01:17:25 - pico-train - INFO - ├── Learning Rate: 6.60e-05
2025-08-29 01:17:25 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-29 01:17:52 - pico-train - INFO - Step 3350 -- Training Metrics
2025-08-29 01:17:52 - pico-train - INFO - ├── Loss: 7.2447
2025-08-29 01:17:52 - pico-train - INFO - ├── Learning Rate: 6.70e-05
2025-08-29 01:17:52 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-29 01:18:18 - pico-train - INFO - Step 3400 -- Training Metrics
2025-08-29 01:18:18 - pico-train - INFO - ├── Loss: 7.2344
2025-08-29 01:18:18 - pico-train - INFO - ├── Learning Rate: 6.80e-05
2025-08-29 01:18:18 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-29 01:18:43 - pico-train - INFO - Step 3450 -- Training Metrics
2025-08-29 01:18:43 - pico-train - INFO - ├── Loss: 7.1488
2025-08-29 01:18:43 - pico-train - INFO - ├── Learning Rate: 6.90e-05
2025-08-29 01:18:43 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-29 01:19:09 - pico-train - INFO - Step 3500 -- Training Metrics
2025-08-29 01:19:09 - pico-train - INFO - ├── Loss: 7.1797
2025-08-29 01:19:09 - pico-train - INFO - ├── Learning Rate: 7.00e-05
2025-08-29 01:19:09 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-29 01:19:35 - pico-train - INFO - Step 3550 -- Training Metrics
2025-08-29 01:19:35 - pico-train - INFO - ├── Loss: 7.1737
2025-08-29 01:19:35 - pico-train - INFO - ├── Learning Rate: 7.10e-05
2025-08-29 01:19:35 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-29 01:20:01 - pico-train - INFO - Step 3600 -- Training Metrics
2025-08-29 01:20:01 - pico-train - INFO - ├── Loss: 7.1204
2025-08-29 01:20:01 - pico-train - INFO - ├── Learning Rate: 7.20e-05
2025-08-29 01:20:01 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-29 01:20:27 - pico-train - INFO - Step 3650 -- Training Metrics
2025-08-29 01:20:27 - pico-train - INFO - ├── Loss: 7.1102
2025-08-29 01:20:27 - pico-train - INFO - ├── Learning Rate: 7.30e-05
2025-08-29 01:20:27 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-29 01:20:53 - pico-train - INFO - Step 3700 -- Training Metrics
2025-08-29 01:20:53 - pico-train - INFO - ├── Loss: 7.0845
2025-08-29 01:20:53 - pico-train - INFO - ├── Learning Rate: 7.40e-05
2025-08-29 01:20:53 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-29 01:21:19 - pico-train - INFO - Step 3750 -- Training Metrics
2025-08-29 01:21:19 - pico-train - INFO - ├── Loss: 7.0858
2025-08-29 01:21:19 - pico-train - INFO - ├── Learning Rate: 7.50e-05
2025-08-29 01:21:19 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-29 01:21:45 - pico-train - INFO - Step 3800 -- Training Metrics
2025-08-29 01:21:45 - pico-train - INFO - ├── Loss: 7.0362
2025-08-29 01:21:45 - pico-train - INFO - ├── Learning Rate: 7.60e-05
2025-08-29 01:21:45 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-29 01:22:11 - pico-train - INFO - Step 3850 -- Training Metrics
2025-08-29 01:22:11 - pico-train - INFO - ├── Loss: 7.0603
2025-08-29 01:22:11 - pico-train - INFO - ├── Learning Rate: 7.70e-05
2025-08-29 01:22:11 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-29 01:22:37 - pico-train - INFO - Step 3900 -- Training Metrics
2025-08-29 01:22:37 - pico-train - INFO - ├── Loss: 7.0172
2025-08-29 01:22:37 - pico-train - INFO - ├── Learning Rate: 7.80e-05
2025-08-29 01:22:37 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-29 01:23:03 - pico-train - INFO - Step 3950 -- Training Metrics
2025-08-29 01:23:03 - pico-train - INFO - ├── Loss: 6.9948
2025-08-29 01:23:03 - pico-train - INFO - ├── Learning Rate: 7.90e-05
2025-08-29 01:23:03 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-29 01:23:29 - pico-train - INFO - Step 4000 -- 💾 Saving Checkpoint
2025-08-29 01:25:52 - pico-train - INFO - Step 4000 -- Evaluation Results
2025-08-29 01:25:52 - pico-train - INFO - └── paloma: 2.5015965971757485e+20
2025-08-29 01:25:54 - pico-train - INFO - Step 4000 -- Training Metrics
2025-08-29 01:25:54 - pico-train - INFO - ├── Loss: 6.9909
2025-08-29 01:25:54 - pico-train - INFO - ├── Learning Rate: 8.00e-05
2025-08-29 01:25:54 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-29 01:25:54 - pico-train - INFO - Step 4000 -- Saving Learning Dynamics
|
2025-08-29 01:26:24 - pico-train - INFO - Step 4050 -- ๐ Training Metrics |
|
2025-08-29 01:26:24 - pico-train - INFO - โโโ Loss: 6.9477 |
|
2025-08-29 01:26:24 - pico-train - INFO - โโโ Learning Rate: 8.10e-05 |
|
2025-08-29 01:26:24 - pico-train - INFO - โโโ Inf/NaN count: 0 |
2025-08-29 01:26:51 - pico-train - INFO - Step 4100 -- 🔄 Training Metrics
2025-08-29 01:26:51 - pico-train - INFO - ├── Loss: 6.9651
2025-08-29 01:26:51 - pico-train - INFO - ├── Learning Rate: 8.20e-05
2025-08-29 01:26:51 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-29 01:27:17 - pico-train - INFO - Step 4150 -- 🔄 Training Metrics
2025-08-29 01:27:17 - pico-train - INFO - ├── Loss: 6.9149
2025-08-29 01:27:17 - pico-train - INFO - ├── Learning Rate: 8.30e-05
2025-08-29 01:27:17 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-29 01:27:43 - pico-train - INFO - Step 4200 -- 🔄 Training Metrics
2025-08-29 01:27:43 - pico-train - INFO - ├── Loss: 6.8930
2025-08-29 01:27:43 - pico-train - INFO - ├── Learning Rate: 8.40e-05
2025-08-29 01:27:43 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-29 01:28:08 - pico-train - INFO - Step 4250 -- 🔄 Training Metrics
2025-08-29 01:28:08 - pico-train - INFO - ├── Loss: 6.9227
2025-08-29 01:28:08 - pico-train - INFO - ├── Learning Rate: 8.50e-05
2025-08-29 01:28:08 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-29 01:28:34 - pico-train - INFO - Step 4300 -- 🔄 Training Metrics
2025-08-29 01:28:34 - pico-train - INFO - ├── Loss: 6.8790
2025-08-29 01:28:34 - pico-train - INFO - ├── Learning Rate: 8.60e-05
2025-08-29 01:28:34 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-29 01:29:01 - pico-train - INFO - Step 4350 -- 🔄 Training Metrics
2025-08-29 01:29:01 - pico-train - INFO - ├── Loss: 6.8649
2025-08-29 01:29:01 - pico-train - INFO - ├── Learning Rate: 8.70e-05
2025-08-29 01:29:01 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-29 01:29:26 - pico-train - INFO - Step 4400 -- 🔄 Training Metrics
2025-08-29 01:29:26 - pico-train - INFO - ├── Loss: 6.8305
2025-08-29 01:29:26 - pico-train - INFO - ├── Learning Rate: 8.80e-05
2025-08-29 01:29:26 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-29 01:29:52 - pico-train - INFO - Step 4450 -- 🔄 Training Metrics
2025-08-29 01:29:52 - pico-train - INFO - ├── Loss: 6.8085
2025-08-29 01:29:52 - pico-train - INFO - ├── Learning Rate: 8.90e-05
2025-08-29 01:29:52 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-29 01:30:18 - pico-train - INFO - Step 4500 -- 🔄 Training Metrics
2025-08-29 01:30:18 - pico-train - INFO - ├── Loss: 6.8315
2025-08-29 01:30:18 - pico-train - INFO - ├── Learning Rate: 9.00e-05
2025-08-29 01:30:18 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-29 01:30:44 - pico-train - INFO - Step 4550 -- 🔄 Training Metrics
2025-08-29 01:30:44 - pico-train - INFO - ├── Loss: 6.7885
2025-08-29 01:30:44 - pico-train - INFO - ├── Learning Rate: 9.10e-05
2025-08-29 01:30:44 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-29 01:31:11 - pico-train - INFO - Step 4600 -- 🔄 Training Metrics
2025-08-29 01:31:11 - pico-train - INFO - ├── Loss: 6.7805
2025-08-29 01:31:11 - pico-train - INFO - ├── Learning Rate: 9.20e-05
2025-08-29 01:31:11 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-29 01:31:36 - pico-train - INFO - Step 4650 -- 🔄 Training Metrics
2025-08-29 01:31:36 - pico-train - INFO - ├── Loss: 6.7737
2025-08-29 01:31:36 - pico-train - INFO - ├── Learning Rate: 9.30e-05
2025-08-29 01:31:36 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-29 01:32:02 - pico-train - INFO - Step 4700 -- 🔄 Training Metrics
2025-08-29 01:32:02 - pico-train - INFO - ├── Loss: 6.7649
2025-08-29 01:32:02 - pico-train - INFO - ├── Learning Rate: 9.40e-05
2025-08-29 01:32:02 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-29 01:32:28 - pico-train - INFO - Step 4750 -- 🔄 Training Metrics
2025-08-29 01:32:28 - pico-train - INFO - ├── Loss: 6.7562
2025-08-29 01:32:28 - pico-train - INFO - ├── Learning Rate: 9.50e-05
2025-08-29 01:32:28 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-29 01:32:54 - pico-train - INFO - Step 4800 -- 🔄 Training Metrics
2025-08-29 01:32:54 - pico-train - INFO - ├── Loss: 6.7347
2025-08-29 01:32:54 - pico-train - INFO - ├── Learning Rate: 9.60e-05
2025-08-29 01:32:54 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-29 01:33:20 - pico-train - INFO - Step 4850 -- 🔄 Training Metrics
2025-08-29 01:33:20 - pico-train - INFO - ├── Loss: 6.7161
2025-08-29 01:33:20 - pico-train - INFO - ├── Learning Rate: 9.70e-05
2025-08-29 01:33:20 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-29 01:33:46 - pico-train - INFO - Step 4900 -- 🔄 Training Metrics
2025-08-29 01:33:46 - pico-train - INFO - ├── Loss: 6.6889
2025-08-29 01:33:46 - pico-train - INFO - ├── Learning Rate: 9.80e-05
2025-08-29 01:33:46 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-29 01:34:12 - pico-train - INFO - Step 4950 -- 🔄 Training Metrics
2025-08-29 01:34:12 - pico-train - INFO - ├── Loss: 6.7299
2025-08-29 01:34:12 - pico-train - INFO - ├── Learning Rate: 9.90e-05
2025-08-29 01:34:12 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-29 01:34:37 - pico-train - INFO - Step 5000 -- 💾 Saving Checkpoint
2025-08-29 01:36:35 - pico-train - INFO - Step 5000 -- 📊 Evaluation Results
2025-08-29 01:36:35 - pico-train - INFO - └── paloma: 2.38712860824014e+21
2025-08-29 01:36:37 - pico-train - INFO - Step 5000 -- 🔄 Training Metrics
2025-08-29 01:36:37 - pico-train - INFO - ├── Loss: 6.6605
2025-08-29 01:36:37 - pico-train - INFO - ├── Learning Rate: 1.00e-04
2025-08-29 01:36:37 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-29 01:36:37 - pico-train - INFO - Step 5000 -- 📈 Saving Learning Dynamics
2025-08-29 01:37:06 - pico-train - INFO - Step 5050 -- 🔄 Training Metrics
2025-08-29 01:37:06 - pico-train - INFO - ├── Loss: 6.6552
2025-08-29 01:37:06 - pico-train - INFO - ├── Learning Rate: 1.00e-04
2025-08-29 01:37:06 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-29 01:37:33 - pico-train - INFO - Step 5100 -- 🔄 Training Metrics
2025-08-29 01:37:33 - pico-train - INFO - ├── Loss: 6.7038
2025-08-29 01:37:33 - pico-train - INFO - ├── Learning Rate: 9.99e-05
2025-08-29 01:37:33 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-29 01:37:59 - pico-train - INFO - Step 5150 -- 🔄 Training Metrics
2025-08-29 01:37:59 - pico-train - INFO - ├── Loss: 6.6452
2025-08-29 01:37:59 - pico-train - INFO - ├── Learning Rate: 9.99e-05
2025-08-29 01:37:59 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-29 01:38:25 - pico-train - INFO - Step 5200 -- 🔄 Training Metrics
2025-08-29 01:38:25 - pico-train - INFO - ├── Loss: 6.6522
2025-08-29 01:38:25 - pico-train - INFO - ├── Learning Rate: 9.99e-05
2025-08-29 01:38:25 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-29 01:38:51 - pico-train - INFO - Step 5250 -- 🔄 Training Metrics
2025-08-29 01:38:51 - pico-train - INFO - ├── Loss: 6.6270
2025-08-29 01:38:51 - pico-train - INFO - ├── Learning Rate: 9.99e-05
2025-08-29 01:38:51 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-29 01:39:17 - pico-train - INFO - Step 5300 -- 🔄 Training Metrics
2025-08-29 01:39:17 - pico-train - INFO - ├── Loss: 6.5733
2025-08-29 01:39:17 - pico-train - INFO - ├── Learning Rate: 9.98e-05
2025-08-29 01:39:17 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-29 01:39:43 - pico-train - INFO - Step 5350 -- 🔄 Training Metrics
2025-08-29 01:39:43 - pico-train - INFO - ├── Loss: 6.5833
2025-08-29 01:39:43 - pico-train - INFO - ├── Learning Rate: 9.98e-05
2025-08-29 01:39:43 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-29 01:40:09 - pico-train - INFO - Step 5400 -- 🔄 Training Metrics
2025-08-29 01:40:09 - pico-train - INFO - ├── Loss: 6.5854
2025-08-29 01:40:09 - pico-train - INFO - ├── Learning Rate: 9.98e-05
2025-08-29 01:40:09 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-29 01:40:35 - pico-train - INFO - Step 5450 -- 🔄 Training Metrics
2025-08-29 01:40:35 - pico-train - INFO - ├── Loss: 6.6012
2025-08-29 01:40:35 - pico-train - INFO - ├── Learning Rate: 9.98e-05
2025-08-29 01:40:35 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-29 01:41:01 - pico-train - INFO - Step 5500 -- 🔄 Training Metrics
2025-08-29 01:41:01 - pico-train - INFO - ├── Loss: 6.5786
2025-08-29 01:41:01 - pico-train - INFO - ├── Learning Rate: 9.97e-05
2025-08-29 01:41:01 - pico-train - INFO - └── Inf/NaN count: 0