2025-08-28 22:55:45 - pico-train - INFO - Step 1000 -- Evaluation Results
2025-08-28 22:55:45 - pico-train - INFO - └── paloma: 2.5468931158531133e+19
2025-08-28 22:55:47 - pico-train - INFO - ==================================================
2025-08-28 22:55:47 - pico-train - INFO - Training Configuration
2025-08-28 22:55:47 - pico-train - INFO - ==================================================
2025-08-28 22:55:47 - pico-train - INFO - ╭─────────────────────────────────────────────────────╮
2025-08-28 22:55:47 - pico-train - INFO - │ checkpointing:                                      │
2025-08-28 22:55:47 - pico-train - INFO - │   checkpoints_dir: checkpoints                      │
2025-08-28 22:55:47 - pico-train - INFO - │   evaluation:                                       │
2025-08-28 22:55:47 - pico-train - INFO - │     eval_results_dir: eval_results                  │
2025-08-28 22:55:47 - pico-train - INFO - │   fabric_checkpoint_dir: fabric_state               │
2025-08-28 22:55:47 - pico-train - INFO - │   fabric_checkpoint_filename: checkpoint.pt         │
2025-08-28 22:55:47 - pico-train - INFO - │   hf_checkpoint:                                    │
2025-08-28 22:55:47 - pico-train - INFO - │     collection_slug: null                           │
2025-08-28 22:55:47 - pico-train - INFO - │     repo_id: ThomasTheMaker/pico-decoder-tiny       │
2025-08-28 22:55:47 - pico-train - INFO - │   learning_dynamics:                                │
2025-08-28 22:55:47 - pico-train - INFO - │     batch_size: 1                                   │
2025-08-28 22:55:47 - pico-train - INFO - │     eval_data: null                                 │
2025-08-28 22:55:47 - pico-train - INFO - │     layer_suffixes:                                 │
2025-08-28 22:55:47 - pico-train - INFO - │     - attention.v_proj                              │
2025-08-28 22:55:47 - pico-train - INFO - │     - attention.o_proj                              │
2025-08-28 22:55:47 - pico-train - INFO - │     - swiglu.w_2                                    │
2025-08-28 22:55:47 - pico-train - INFO - │     sequence_idx: -1                                │
2025-08-28 22:55:47 - pico-train - INFO - │   learning_dynamics_dir: learning_dynamics          │
2025-08-28 22:55:47 - pico-train - INFO - │   logs_dir: logs                                    │
2025-08-28 22:55:47 - pico-train - INFO - │   run_name: pico-decoder-tiny-dolma29k              │
2025-08-28 22:55:47 - pico-train - INFO - │   runs_dir: runs                                    │
2025-08-28 22:55:47 - pico-train - INFO - │   save_every_n_steps: 1000                          │
2025-08-28 22:55:47 - pico-train - INFO - │   save_to_hf: true                                  │
2025-08-28 22:55:47 - pico-train - INFO - │   training:                                         │
2025-08-28 22:55:47 - pico-train - INFO - │     auto_resume: true                               │
2025-08-28 22:55:47 - pico-train - INFO - │ data:                                               │
2025-08-28 22:55:47 - pico-train - INFO - │   dataloader:                                       │
2025-08-28 22:55:47 - pico-train - INFO - │     batch_size: 4                                   │
2025-08-28 22:55:47 - pico-train - INFO - │   dataset:                                          │
2025-08-28 22:55:47 - pico-train - INFO - │     name: pico-lm/pretokenized-dolma                │
2025-08-28 22:55:47 - pico-train - INFO - │   tokenizer:                                        │
2025-08-28 22:55:47 - pico-train - INFO - │     name: allenai/OLMo-7B-0724-hf                   │
2025-08-28 22:55:47 - pico-train - INFO - │     vocab_size: 50304                               │
2025-08-28 22:55:47 - pico-train - INFO - │ evaluation:                                         │
2025-08-28 22:55:47 - pico-train - INFO - │   metrics:                                          │
2025-08-28 22:55:47 - pico-train - INFO - │   - paloma                                          │
2025-08-28 22:55:47 - pico-train - INFO - │   paloma:                                           │
2025-08-28 22:55:47 - pico-train - INFO - │     batch_size: 1                                   │
2025-08-28 22:55:47 - pico-train - INFO - │     dataset_name: pico-lm/pretokenized-paloma-tinsy │
2025-08-28 22:55:47 - pico-train - INFO - │     dataset_split: val                              │
2025-08-28 22:55:47 - pico-train - INFO - │     max_length: 2048                                │
2025-08-28 22:55:47 - pico-train - INFO - │ model:                                              │
2025-08-28 22:55:47 - pico-train - INFO - │   activation_hidden_dim: 384                        │
2025-08-28 22:55:47 - pico-train - INFO - │   attention_n_heads: 12                             │
2025-08-28 22:55:47 - pico-train - INFO - │   attention_n_kv_heads: 4                           │
2025-08-28 22:55:47 - pico-train - INFO - │   batch_size: 1024                                  │
2025-08-28 22:55:47 - pico-train - INFO - │   d_model: 96                                       │
2025-08-28 22:55:47 - pico-train - INFO - │   max_seq_len: 2048                                 │
2025-08-28 22:55:47 - pico-train - INFO - │   model_type: pico_decoder                          │
2025-08-28 22:55:47 - pico-train - INFO - │   n_layers: 12                                      │
2025-08-28 22:55:47 - pico-train - INFO - │   norm_eps: 1.0e-06                                 │
2025-08-28 22:55:47 - pico-train - INFO - │   position_emb_theta: 10000.0                       │
2025-08-28 22:55:47 - pico-train - INFO - │   vocab_size: 50304                                 │
2025-08-28 22:55:47 - pico-train - INFO - │ monitoring:                                         │
2025-08-28 22:55:47 - pico-train - INFO - │   logging:                                          │
2025-08-28 22:55:47 - pico-train - INFO - │     log_every_n_steps: 100                          │
2025-08-28 22:55:47 - pico-train - INFO - │     log_level: INFO                                 │
2025-08-28 22:55:47 - pico-train - INFO - │   save_to_wandb: false                              │
2025-08-28 22:55:47 - pico-train - INFO - │   wandb:                                            │
2025-08-28 22:55:47 - pico-train - INFO - │     entity: boymyc                                  │
2025-08-28 22:55:47 - pico-train - INFO - │     project: pico-decoder-tiny                      │
2025-08-28 22:55:47 - pico-train - INFO - │ training:                                           │
2025-08-28 22:55:47 - pico-train - INFO - │   fabric:                                           │
2025-08-28 22:55:47 - pico-train - INFO - │     accelerator: cuda                               │
2025-08-28 22:55:47 - pico-train - INFO - │     num_devices: 1                                  │
2025-08-28 22:55:47 - pico-train - INFO - │     num_nodes: 1                                    │
2025-08-28 22:55:47 - pico-train - INFO - │     precision: bf16-mixed                           │
2025-08-28 22:55:47 - pico-train - INFO - │   max_steps: 200000                                 │
2025-08-28 22:55:47 - pico-train - INFO - │   optimization:                                     │
2025-08-28 22:55:47 - pico-train - INFO - │     gradient_accumulation_steps: 4                  │
2025-08-28 22:55:47 - pico-train - INFO - │     lr: 0.0003                                      │
2025-08-28 22:55:47 - pico-train - INFO - │     lr_scheduler: linear_with_warmup                │
2025-08-28 22:55:47 - pico-train - INFO - │     lr_warmup_steps: 2500                           │
2025-08-28 22:55:47 - pico-train - INFO - │     optimizer: adamw                                │
2025-08-28 22:55:47 - pico-train - INFO - │                                                     │
2025-08-28 22:55:47 - pico-train - INFO - ╰─────────────────────────────────────────────────────╯
2025-08-28 22:55:47 - pico-train - INFO - ==================================================
2025-08-28 22:55:47 - pico-train - INFO - Runtime Summary:
2025-08-28 22:55:47 - pico-train - INFO - ==================================================
2025-08-28 22:55:47 - pico-train - INFO - Starting from step: 1000
2025-08-28 22:55:47 - pico-train - INFO - Model Setup:
2025-08-28 22:55:47 - pico-train - INFO - ├─ Total Parameters: 11,282,784
2025-08-28 22:55:47 - pico-train - INFO - └─ Trainable Parameters: 11,282,784
2025-08-28 22:55:47 - pico-train - INFO - Distributed Setup:
2025-08-28 22:55:47 - pico-train - INFO - ├─ Number of Devices: 1
2025-08-28 22:55:47 - pico-train - INFO - ├─ Device Type: NVIDIA GeForce RTX 5090
2025-08-28 22:55:47 - pico-train - INFO - └─ Available Memory: 33.68 GB
2025-08-28 22:55:47 - pico-train - INFO - Software Setup:
2025-08-28 22:55:47 - pico-train - INFO - ├─ Python Version: 3.10.12
2025-08-28 22:55:47 - pico-train - INFO - ├─ PyTorch Version: 2.8.0+cu128
2025-08-28 22:55:47 - pico-train - INFO - ├─ CUDA Version: 12.8
2025-08-28 22:55:47 - pico-train - INFO - └─ Operating System: Linux 6.8.0-63-generic
2025-08-28 22:55:47 - pico-train - INFO - Batch Size Configuration:
2025-08-28 22:55:47 - pico-train - INFO - ├─ Global Batch Size: 4
2025-08-28 22:55:47 - pico-train - INFO - ├─ Per Device Batch Size: 1
2025-08-28 22:55:47 - pico-train - INFO - └─ Gradient Accumulation Steps: 4
2025-08-28 22:55:47 - pico-train - INFO - ==================================================
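Note on the learning-rate values in this log: the configuration names `lr: 0.0003`, `lr_scheduler: linear_with_warmup`, `lr_warmup_steps: 2500`, and `max_steps: 200000`. The sketch below is one plausible reading of that schedule, not the trainer's actual code: it assumes a linear ramp from 0 to the base rate over the warmup steps, followed by a linear decay that reaches 0 at `max_steps`. The function name and the decay-to-zero endpoint are assumptions; under them, the computed values match the rates logged at the sampled steps.

```python
def linear_with_warmup(step, base_lr=3e-4, warmup_steps=2500, max_steps=200_000):
    """Hypothetical reconstruction of the schedule named in the config.

    Assumption: linear warmup from 0 to base_lr, then linear decay to 0
    at max_steps. The decay endpoint is inferred from the logged values,
    not taken from the trainer's source.
    """
    if step < warmup_steps:
        return base_lr * step / warmup_steps
    remaining = max(0, max_steps - step)
    return base_lr * remaining / (max_steps - warmup_steps)


# Compare against a few "Learning Rate" entries from the log.
for step in (1000, 2000, 2500, 2900, 5000, 9800):
    print(f"step {step}: {linear_with_warmup(step):.2e}")
```

If the real scheduler decays to a nonzero floor or uses a different horizon, the early steps shown here would look almost identical, so this check cannot distinguish those cases.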
2025-08-28 22:55:49 - pico-train - INFO - Step 1000 -- Training Metrics
2025-08-28 22:55:49 - pico-train - INFO - ├── Loss: 7.7657
2025-08-28 22:55:49 - pico-train - INFO - ├── Learning Rate: 1.20e-04
2025-08-28 22:55:49 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-28 22:55:49 - pico-train - INFO - Step 1000 -- Saving Learning Dynamics
2025-08-28 22:56:43 - pico-train - INFO - Step 1100 -- Training Metrics
2025-08-28 22:56:43 - pico-train - INFO - ├── Loss: 7.6733
2025-08-28 22:56:43 - pico-train - INFO - ├── Learning Rate: 1.32e-04
2025-08-28 22:56:43 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-28 22:57:34 - pico-train - INFO - Step 1200 -- Training Metrics
2025-08-28 22:57:34 - pico-train - INFO - ├── Loss: 7.5969
2025-08-28 22:57:34 - pico-train - INFO - ├── Learning Rate: 1.44e-04
2025-08-28 22:57:34 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-28 22:58:25 - pico-train - INFO - Step 1300 -- Training Metrics
2025-08-28 22:58:25 - pico-train - INFO - ├── Loss: 7.4765
2025-08-28 22:58:25 - pico-train - INFO - ├── Learning Rate: 1.56e-04
2025-08-28 22:58:25 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-28 22:59:16 - pico-train - INFO - Step 1400 -- Training Metrics
2025-08-28 22:59:16 - pico-train - INFO - ├── Loss: 7.3686
2025-08-28 22:59:16 - pico-train - INFO - ├── Learning Rate: 1.68e-04
2025-08-28 22:59:16 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-28 23:00:07 - pico-train - INFO - Step 1500 -- Training Metrics
2025-08-28 23:00:07 - pico-train - INFO - ├── Loss: 7.3251
2025-08-28 23:00:07 - pico-train - INFO - ├── Learning Rate: 1.80e-04
2025-08-28 23:00:07 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-28 23:00:58 - pico-train - INFO - Step 1600 -- Training Metrics
2025-08-28 23:00:58 - pico-train - INFO - ├── Loss: 7.1840
2025-08-28 23:00:58 - pico-train - INFO - ├── Learning Rate: 1.92e-04
2025-08-28 23:00:58 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-28 23:01:50 - pico-train - INFO - Step 1700 -- Training Metrics
2025-08-28 23:01:50 - pico-train - INFO - ├── Loss: 7.1116
2025-08-28 23:01:50 - pico-train - INFO - ├── Learning Rate: 2.04e-04
2025-08-28 23:01:50 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-28 23:02:41 - pico-train - INFO - Step 1800 -- Training Metrics
2025-08-28 23:02:41 - pico-train - INFO - ├── Loss: 7.0565
2025-08-28 23:02:41 - pico-train - INFO - ├── Learning Rate: 2.16e-04
2025-08-28 23:02:41 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-28 23:03:32 - pico-train - INFO - Step 1900 -- Training Metrics
2025-08-28 23:03:32 - pico-train - INFO - ├── Loss: 6.9964
2025-08-28 23:03:32 - pico-train - INFO - ├── Learning Rate: 2.28e-04
2025-08-28 23:03:32 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-28 23:04:23 - pico-train - INFO - Step 2000 -- Saving Checkpoint
2025-08-28 23:06:18 - pico-train - INFO - Step 2000 -- Evaluation Results
2025-08-28 23:06:18 - pico-train - INFO - └── paloma: 3.627192449295412e+21
2025-08-28 23:06:21 - pico-train - INFO - Step 2000 -- Training Metrics
2025-08-28 23:06:21 - pico-train - INFO - ├── Loss: 6.9690
2025-08-28 23:06:21 - pico-train - INFO - ├── Learning Rate: 2.40e-04
2025-08-28 23:06:21 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-28 23:06:21 - pico-train - INFO - Step 2000 -- Saving Learning Dynamics
2025-08-28 23:07:15 - pico-train - INFO - Step 2100 -- Training Metrics
2025-08-28 23:07:15 - pico-train - INFO - ├── Loss: 6.8840
2025-08-28 23:07:15 - pico-train - INFO - ├── Learning Rate: 2.52e-04
2025-08-28 23:07:15 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-28 23:08:06 - pico-train - INFO - Step 2200 -- Training Metrics
2025-08-28 23:08:06 - pico-train - INFO - ├── Loss: 6.8334
2025-08-28 23:08:06 - pico-train - INFO - ├── Learning Rate: 2.64e-04
2025-08-28 23:08:06 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-28 23:08:57 - pico-train - INFO - Step 2300 -- Training Metrics
2025-08-28 23:08:57 - pico-train - INFO - ├── Loss: 6.8150
2025-08-28 23:08:57 - pico-train - INFO - ├── Learning Rate: 2.76e-04
2025-08-28 23:08:57 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-28 23:09:48 - pico-train - INFO - Step 2400 -- Training Metrics
2025-08-28 23:09:48 - pico-train - INFO - ├── Loss: 6.7519
2025-08-28 23:09:48 - pico-train - INFO - ├── Learning Rate: 2.88e-04
2025-08-28 23:09:48 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-28 23:10:39 - pico-train - INFO - Step 2500 -- Training Metrics
2025-08-28 23:10:39 - pico-train - INFO - ├── Loss: 6.6908
2025-08-28 23:10:39 - pico-train - INFO - ├── Learning Rate: 3.00e-04
2025-08-28 23:10:39 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-28 23:11:30 - pico-train - INFO - Step 2600 -- Training Metrics
2025-08-28 23:11:30 - pico-train - INFO - ├── Loss: 6.6351
2025-08-28 23:11:30 - pico-train - INFO - ├── Learning Rate: 3.00e-04
2025-08-28 23:11:30 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-28 23:12:21 - pico-train - INFO - Step 2700 -- Training Metrics
2025-08-28 23:12:21 - pico-train - INFO - ├── Loss: 6.5568
2025-08-28 23:12:21 - pico-train - INFO - ├── Learning Rate: 3.00e-04
2025-08-28 23:12:21 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-28 23:13:12 - pico-train - INFO - Step 2800 -- Training Metrics
2025-08-28 23:13:12 - pico-train - INFO - ├── Loss: 6.5799
2025-08-28 23:13:12 - pico-train - INFO - ├── Learning Rate: 3.00e-04
2025-08-28 23:13:12 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-28 23:14:03 - pico-train - INFO - Step 2900 -- Training Metrics
2025-08-28 23:14:03 - pico-train - INFO - ├── Loss: 6.5467
2025-08-28 23:14:03 - pico-train - INFO - ├── Learning Rate: 2.99e-04
2025-08-28 23:14:03 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-28 23:14:53 - pico-train - INFO - Step 3000 -- Saving Checkpoint
2025-08-28 23:16:58 - pico-train - INFO - Step 3000 -- Evaluation Results
2025-08-28 23:16:58 - pico-train - INFO - └── paloma: 9.90975658825673e+22
2025-08-28 23:17:01 - pico-train - INFO - Step 3000 -- Training Metrics
2025-08-28 23:17:01 - pico-train - INFO - ├── Loss: 6.4865
2025-08-28 23:17:01 - pico-train - INFO - ├── Learning Rate: 2.99e-04
2025-08-28 23:17:01 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-28 23:17:01 - pico-train - INFO - Step 3000 -- Saving Learning Dynamics
2025-08-28 23:17:55 - pico-train - INFO - Step 3100 -- Training Metrics
2025-08-28 23:17:55 - pico-train - INFO - ├── Loss: 6.4604
2025-08-28 23:17:55 - pico-train - INFO - ├── Learning Rate: 2.99e-04
2025-08-28 23:17:55 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-28 23:18:46 - pico-train - INFO - Step 3200 -- Training Metrics
2025-08-28 23:18:46 - pico-train - INFO - ├── Loss: 6.4205
2025-08-28 23:18:46 - pico-train - INFO - ├── Learning Rate: 2.99e-04
2025-08-28 23:18:46 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-28 23:19:36 - pico-train - INFO - Step 3300 -- Training Metrics
2025-08-28 23:19:36 - pico-train - INFO - ├── Loss: 6.4127
2025-08-28 23:19:36 - pico-train - INFO - ├── Learning Rate: 2.99e-04
2025-08-28 23:19:36 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-28 23:20:27 - pico-train - INFO - Step 3400 -- Training Metrics
2025-08-28 23:20:27 - pico-train - INFO - ├── Loss: 6.3692
2025-08-28 23:20:27 - pico-train - INFO - ├── Learning Rate: 2.99e-04
2025-08-28 23:20:27 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-28 23:21:18 - pico-train - INFO - Step 3500 -- Training Metrics
2025-08-28 23:21:18 - pico-train - INFO - ├── Loss: 6.3761
2025-08-28 23:21:18 - pico-train - INFO - ├── Learning Rate: 2.98e-04
2025-08-28 23:21:18 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-28 23:22:09 - pico-train - INFO - Step 3600 -- Training Metrics
2025-08-28 23:22:09 - pico-train - INFO - ├── Loss: 6.2796
2025-08-28 23:22:09 - pico-train - INFO - ├── Learning Rate: 2.98e-04
2025-08-28 23:22:09 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-28 23:23:00 - pico-train - INFO - Step 3700 -- Training Metrics
2025-08-28 23:23:00 - pico-train - INFO - ├── Loss: 6.2988
2025-08-28 23:23:00 - pico-train - INFO - ├── Learning Rate: 2.98e-04
2025-08-28 23:23:00 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-28 23:23:51 - pico-train - INFO - Step 3800 -- Training Metrics
2025-08-28 23:23:51 - pico-train - INFO - ├── Loss: 6.2673
2025-08-28 23:23:51 - pico-train - INFO - ├── Learning Rate: 2.98e-04
2025-08-28 23:23:51 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-28 23:24:42 - pico-train - INFO - Step 3900 -- Training Metrics
2025-08-28 23:24:42 - pico-train - INFO - ├── Loss: 6.2715
2025-08-28 23:24:42 - pico-train - INFO - ├── Learning Rate: 2.98e-04
2025-08-28 23:24:42 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-28 23:25:32 - pico-train - INFO - Step 4000 -- Saving Checkpoint
2025-08-28 23:27:27 - pico-train - INFO - Step 4000 -- Evaluation Results
2025-08-28 23:27:27 - pico-train - INFO - └── paloma: 2.6252526658823776e+24
2025-08-28 23:27:29 - pico-train - INFO - Step 4000 -- Training Metrics
2025-08-28 23:27:29 - pico-train - INFO - ├── Loss: 6.1890
2025-08-28 23:27:29 - pico-train - INFO - ├── Learning Rate: 2.98e-04
2025-08-28 23:27:29 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-28 23:27:29 - pico-train - INFO - Step 4000 -- Saving Learning Dynamics
2025-08-28 23:28:23 - pico-train - INFO - Step 4100 -- Training Metrics
2025-08-28 23:28:23 - pico-train - INFO - ├── Loss: 6.1832
2025-08-28 23:28:23 - pico-train - INFO - ├── Learning Rate: 2.98e-04
2025-08-28 23:28:23 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-28 23:29:13 - pico-train - INFO - Step 4200 -- Training Metrics
2025-08-28 23:29:13 - pico-train - INFO - ├── Loss: 6.1553
2025-08-28 23:29:13 - pico-train - INFO - ├── Learning Rate: 2.97e-04
2025-08-28 23:29:13 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-28 23:30:04 - pico-train - INFO - Step 4300 -- Training Metrics
2025-08-28 23:30:04 - pico-train - INFO - ├── Loss: 6.1629
2025-08-28 23:30:04 - pico-train - INFO - ├── Learning Rate: 2.97e-04
2025-08-28 23:30:04 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-28 23:30:56 - pico-train - INFO - Step 4400 -- Training Metrics
2025-08-28 23:30:56 - pico-train - INFO - ├── Loss: 6.1061
2025-08-28 23:30:56 - pico-train - INFO - ├── Learning Rate: 2.97e-04
2025-08-28 23:30:56 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-28 23:31:47 - pico-train - INFO - Step 4500 -- Training Metrics
2025-08-28 23:31:47 - pico-train - INFO - ├── Loss: 6.1601
2025-08-28 23:31:47 - pico-train - INFO - ├── Learning Rate: 2.97e-04
2025-08-28 23:31:47 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-28 23:32:38 - pico-train - INFO - Step 4600 -- Training Metrics
2025-08-28 23:32:38 - pico-train - INFO - ├── Loss: 6.0963
2025-08-28 23:32:38 - pico-train - INFO - ├── Learning Rate: 2.97e-04
2025-08-28 23:32:38 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-28 23:33:29 - pico-train - INFO - Step 4700 -- Training Metrics
2025-08-28 23:33:29 - pico-train - INFO - ├── Loss: 6.0780
2025-08-28 23:33:29 - pico-train - INFO - ├── Learning Rate: 2.97e-04
2025-08-28 23:33:29 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-28 23:34:20 - pico-train - INFO - Step 4800 -- Training Metrics
2025-08-28 23:34:20 - pico-train - INFO - ├── Loss: 6.0835
2025-08-28 23:34:20 - pico-train - INFO - ├── Learning Rate: 2.97e-04
2025-08-28 23:34:20 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-28 23:35:11 - pico-train - INFO - Step 4900 -- Training Metrics
2025-08-28 23:35:11 - pico-train - INFO - ├── Loss: 6.0519
2025-08-28 23:35:11 - pico-train - INFO - ├── Learning Rate: 2.96e-04
2025-08-28 23:35:11 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-28 23:36:01 - pico-train - INFO - Step 5000 -- Saving Checkpoint
2025-08-28 23:38:14 - pico-train - INFO - Step 5000 -- Evaluation Results
2025-08-28 23:38:14 - pico-train - INFO - └── paloma: 7.294956881845611e+25
2025-08-28 23:38:16 - pico-train - INFO - Step 5000 -- Training Metrics
2025-08-28 23:38:16 - pico-train - INFO - ├── Loss: 6.0661
2025-08-28 23:38:16 - pico-train - INFO - ├── Learning Rate: 2.96e-04
2025-08-28 23:38:16 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-28 23:38:16 - pico-train - INFO - Step 5000 -- Saving Learning Dynamics
2025-08-28 23:39:10 - pico-train - INFO - Step 5100 -- Training Metrics
2025-08-28 23:39:10 - pico-train - INFO - ├── Loss: 6.0121
2025-08-28 23:39:10 - pico-train - INFO - ├── Learning Rate: 2.96e-04
2025-08-28 23:39:10 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-28 23:40:02 - pico-train - INFO - Step 5200 -- Training Metrics
2025-08-28 23:40:02 - pico-train - INFO - ├── Loss: 6.0544
2025-08-28 23:40:02 - pico-train - INFO - ├── Learning Rate: 2.96e-04
2025-08-28 23:40:02 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-28 23:40:53 - pico-train - INFO - Step 5300 -- Training Metrics
2025-08-28 23:40:53 - pico-train - INFO - ├── Loss: 6.0224
2025-08-28 23:40:53 - pico-train - INFO - ├── Learning Rate: 2.96e-04
2025-08-28 23:40:53 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-28 23:41:44 - pico-train - INFO - Step 5400 -- Training Metrics
2025-08-28 23:41:44 - pico-train - INFO - ├── Loss: 5.9831
2025-08-28 23:41:44 - pico-train - INFO - ├── Learning Rate: 2.96e-04
2025-08-28 23:41:44 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-28 23:42:35 - pico-train - INFO - Step 5500 -- Training Metrics
2025-08-28 23:42:35 - pico-train - INFO - ├── Loss: 5.9553
2025-08-28 23:42:35 - pico-train - INFO - ├── Learning Rate: 2.95e-04
2025-08-28 23:42:35 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-28 23:43:26 - pico-train - INFO - Step 5600 -- Training Metrics
2025-08-28 23:43:26 - pico-train - INFO - ├── Loss: 5.9493
2025-08-28 23:43:26 - pico-train - INFO - ├── Learning Rate: 2.95e-04
2025-08-28 23:43:26 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-28 23:44:17 - pico-train - INFO - Step 5700 -- Training Metrics
2025-08-28 23:44:17 - pico-train - INFO - ├── Loss: 5.9943
2025-08-28 23:44:17 - pico-train - INFO - ├── Learning Rate: 2.95e-04
2025-08-28 23:44:17 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-28 23:45:08 - pico-train - INFO - Step 5800 -- Training Metrics
2025-08-28 23:45:08 - pico-train - INFO - ├── Loss: 5.9630
2025-08-28 23:45:08 - pico-train - INFO - ├── Learning Rate: 2.95e-04
2025-08-28 23:45:08 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-28 23:46:00 - pico-train - INFO - Step 5900 -- Training Metrics
2025-08-28 23:46:00 - pico-train - INFO - ├── Loss: 5.9349
2025-08-28 23:46:00 - pico-train - INFO - ├── Learning Rate: 2.95e-04
2025-08-28 23:46:00 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-28 23:46:50 - pico-train - INFO - Step 6000 -- Saving Checkpoint
2025-08-28 23:48:48 - pico-train - INFO - Step 6000 -- Evaluation Results
2025-08-28 23:48:48 - pico-train - INFO - └── paloma: 1.6856570425562805e+27
2025-08-28 23:48:50 - pico-train - INFO - Step 6000 -- Training Metrics
2025-08-28 23:48:50 - pico-train - INFO - ├── Loss: 5.9087
2025-08-28 23:48:50 - pico-train - INFO - ├── Learning Rate: 2.95e-04
2025-08-28 23:48:50 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-28 23:48:50 - pico-train - INFO - Step 6000 -- Saving Learning Dynamics
2025-08-28 23:49:44 - pico-train - INFO - Step 6100 -- Training Metrics
2025-08-28 23:49:44 - pico-train - INFO - ├── Loss: 5.8818
2025-08-28 23:49:44 - pico-train - INFO - ├── Learning Rate: 2.95e-04
2025-08-28 23:49:44 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-28 23:50:35 - pico-train - INFO - Step 6200 -- Training Metrics
2025-08-28 23:50:35 - pico-train - INFO - ├── Loss: 5.8535
2025-08-28 23:50:35 - pico-train - INFO - ├── Learning Rate: 2.94e-04
2025-08-28 23:50:35 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-28 23:51:26 - pico-train - INFO - Step 6300 -- Training Metrics
2025-08-28 23:51:26 - pico-train - INFO - ├── Loss: 5.8896
2025-08-28 23:51:26 - pico-train - INFO - ├── Learning Rate: 2.94e-04
2025-08-28 23:51:26 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-28 23:52:18 - pico-train - INFO - Step 6400 -- Training Metrics
2025-08-28 23:52:18 - pico-train - INFO - ├── Loss: 5.9007
2025-08-28 23:52:18 - pico-train - INFO - ├── Learning Rate: 2.94e-04
2025-08-28 23:52:18 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-28 23:53:09 - pico-train - INFO - Step 6500 -- Training Metrics
2025-08-28 23:53:09 - pico-train - INFO - ├── Loss: 5.8617
2025-08-28 23:53:09 - pico-train - INFO - ├── Learning Rate: 2.94e-04
2025-08-28 23:53:09 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-28 23:54:00 - pico-train - INFO - Step 6600 -- Training Metrics
2025-08-28 23:54:00 - pico-train - INFO - ├── Loss: 5.8201
2025-08-28 23:54:00 - pico-train - INFO - ├── Learning Rate: 2.94e-04
2025-08-28 23:54:00 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-28 23:54:51 - pico-train - INFO - Step 6700 -- Training Metrics
2025-08-28 23:54:51 - pico-train - INFO - ├── Loss: 5.8544
2025-08-28 23:54:51 - pico-train - INFO - ├── Learning Rate: 2.94e-04
2025-08-28 23:54:51 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-28 23:55:42 - pico-train - INFO - Step 6800 -- Training Metrics
2025-08-28 23:55:42 - pico-train - INFO - ├── Loss: 5.8532
2025-08-28 23:55:42 - pico-train - INFO - ├── Learning Rate: 2.93e-04
2025-08-28 23:55:42 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-28 23:56:33 - pico-train - INFO - Step 6900 -- Training Metrics
2025-08-28 23:56:33 - pico-train - INFO - ├── Loss: 5.7950
2025-08-28 23:56:33 - pico-train - INFO - ├── Learning Rate: 2.93e-04
2025-08-28 23:56:33 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-28 23:57:24 - pico-train - INFO - Step 7000 -- Saving Checkpoint
2025-08-28 23:59:22 - pico-train - INFO - Step 7000 -- Evaluation Results
2025-08-28 23:59:22 - pico-train - INFO - └── paloma: 9.22180682233585e+28
2025-08-28 23:59:23 - pico-train - INFO - Step 7000 -- Training Metrics
2025-08-28 23:59:23 - pico-train - INFO - ├── Loss: 5.8146
2025-08-28 23:59:23 - pico-train - INFO - ├── Learning Rate: 2.93e-04
2025-08-28 23:59:23 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-28 23:59:23 - pico-train - INFO - Step 7000 -- Saving Learning Dynamics
2025-08-29 00:00:17 - pico-train - INFO - Step 7100 -- Training Metrics
2025-08-29 00:00:17 - pico-train - INFO - ├── Loss: 5.7930
2025-08-29 00:00:17 - pico-train - INFO - ├── Learning Rate: 2.93e-04
2025-08-29 00:00:17 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-29 00:01:09 - pico-train - INFO - Step 7200 -- Training Metrics
2025-08-29 00:01:09 - pico-train - INFO - ├── Loss: 5.7827
2025-08-29 00:01:09 - pico-train - INFO - ├── Learning Rate: 2.93e-04
2025-08-29 00:01:09 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-29 00:02:00 - pico-train - INFO - Step 7300 -- Training Metrics
2025-08-29 00:02:00 - pico-train - INFO - ├── Loss: 5.7816
2025-08-29 00:02:00 - pico-train - INFO - ├── Learning Rate: 2.93e-04
2025-08-29 00:02:00 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-29 00:02:51 - pico-train - INFO - Step 7400 -- Training Metrics
2025-08-29 00:02:51 - pico-train - INFO - ├── Loss: 5.7300
2025-08-29 00:02:51 - pico-train - INFO - ├── Learning Rate: 2.93e-04
2025-08-29 00:02:51 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-29 00:03:42 - pico-train - INFO - Step 7500 -- Training Metrics
2025-08-29 00:03:42 - pico-train - INFO - ├── Loss: 5.7670
2025-08-29 00:03:42 - pico-train - INFO - ├── Learning Rate: 2.92e-04
2025-08-29 00:03:42 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-29 00:04:33 - pico-train - INFO - Step 7600 -- Training Metrics
2025-08-29 00:04:33 - pico-train - INFO - ├── Loss: 5.7450
2025-08-29 00:04:33 - pico-train - INFO - ├── Learning Rate: 2.92e-04
2025-08-29 00:04:33 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-29 00:05:25 - pico-train - INFO - Step 7700 -- Training Metrics
2025-08-29 00:05:25 - pico-train - INFO - ├── Loss: 5.7499
2025-08-29 00:05:25 - pico-train - INFO - ├── Learning Rate: 2.92e-04
2025-08-29 00:05:25 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-29 00:06:16 - pico-train - INFO - Step 7800 -- Training Metrics
2025-08-29 00:06:16 - pico-train - INFO - ├── Loss: 5.7233
2025-08-29 00:06:16 - pico-train - INFO - ├── Learning Rate: 2.92e-04
2025-08-29 00:06:16 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-29 00:07:07 - pico-train - INFO - Step 7900 -- Training Metrics
2025-08-29 00:07:07 - pico-train - INFO - ├── Loss: 5.7219
2025-08-29 00:07:07 - pico-train - INFO - ├── Learning Rate: 2.92e-04
2025-08-29 00:07:07 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-29 00:07:57 - pico-train - INFO - Step 8000 -- Saving Checkpoint
2025-08-29 00:10:09 - pico-train - INFO - Step 8000 -- Evaluation Results
2025-08-29 00:10:09 - pico-train - INFO - └── paloma: 3.1300823362207656e+29
2025-08-29 00:10:11 - pico-train - INFO - Step 8000 -- Training Metrics
2025-08-29 00:10:11 - pico-train - INFO - ├── Loss: 5.7523
2025-08-29 00:10:11 - pico-train - INFO - ├── Learning Rate: 2.92e-04
2025-08-29 00:10:11 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-29 00:10:11 - pico-train - INFO - Step 8000 -- Saving Learning Dynamics
2025-08-29 00:11:05 - pico-train - INFO - Step 8100 -- Training Metrics
2025-08-29 00:11:05 - pico-train - INFO - ├── Loss: 5.7145
2025-08-29 00:11:05 - pico-train - INFO - ├── Learning Rate: 2.91e-04
2025-08-29 00:11:05 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-29 00:11:57 - pico-train - INFO - Step 8200 -- Training Metrics
2025-08-29 00:11:57 - pico-train - INFO - ├── Loss: 5.7469
2025-08-29 00:11:57 - pico-train - INFO - ├── Learning Rate: 2.91e-04
2025-08-29 00:11:57 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-29 00:12:48 - pico-train - INFO - Step 8300 -- Training Metrics
2025-08-29 00:12:48 - pico-train - INFO - ├── Loss: 5.7363
2025-08-29 00:12:48 - pico-train - INFO - ├── Learning Rate: 2.91e-04
2025-08-29 00:12:48 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-29 00:13:38 - pico-train - INFO - Step 8400 -- Training Metrics
2025-08-29 00:13:38 - pico-train - INFO - ├── Loss: 5.6938
2025-08-29 00:13:38 - pico-train - INFO - ├── Learning Rate: 2.91e-04
2025-08-29 00:13:38 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-29 00:14:29 - pico-train - INFO - Step 8500 -- Training Metrics
2025-08-29 00:14:29 - pico-train - INFO - ├── Loss: 5.6994
2025-08-29 00:14:29 - pico-train - INFO - ├── Learning Rate: 2.91e-04
2025-08-29 00:14:29 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-29 00:15:20 - pico-train - INFO - Step 8600 -- Training Metrics
2025-08-29 00:15:20 - pico-train - INFO - ├── Loss: 5.6583
2025-08-29 00:15:20 - pico-train - INFO - ├── Learning Rate: 2.91e-04
2025-08-29 00:15:20 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-29 00:16:11 - pico-train - INFO - Step 8700 -- Training Metrics
2025-08-29 00:16:11 - pico-train - INFO - ├── Loss: 5.6885
2025-08-29 00:16:11 - pico-train - INFO - ├── Learning Rate: 2.91e-04
2025-08-29 00:16:11 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-29 00:17:02 - pico-train - INFO - Step 8800 -- Training Metrics
2025-08-29 00:17:02 - pico-train - INFO - ├── Loss: 5.6313
2025-08-29 00:17:02 - pico-train - INFO - ├── Learning Rate: 2.90e-04
2025-08-29 00:17:02 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-29 00:17:53 - pico-train - INFO - Step 8900 -- Training Metrics
2025-08-29 00:17:53 - pico-train - INFO - ├── Loss: 5.6314
2025-08-29 00:17:53 - pico-train - INFO - ├── Learning Rate: 2.90e-04
2025-08-29 00:17:53 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-29 00:18:44 - pico-train - INFO - Step 9000 -- Saving Checkpoint
2025-08-29 00:20:42 - pico-train - INFO - Step 9000 -- Evaluation Results
2025-08-29 00:20:42 - pico-train - INFO - └── paloma: 4.983924509492406e+30
2025-08-29 00:20:43 - pico-train - INFO - Step 9000 -- Training Metrics
2025-08-29 00:20:43 - pico-train - INFO - ├── Loss: 5.6501
2025-08-29 00:20:43 - pico-train - INFO - ├── Learning Rate: 2.90e-04
2025-08-29 00:20:43 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-29 00:20:43 - pico-train - INFO - Step 9000 -- Saving Learning Dynamics
2025-08-29 00:21:37 - pico-train - INFO - Step 9100 -- Training Metrics
2025-08-29 00:21:37 - pico-train - INFO - ├── Loss: 5.6357
2025-08-29 00:21:37 - pico-train - INFO - ├── Learning Rate: 2.90e-04
2025-08-29 00:21:37 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-29 00:22:28 - pico-train - INFO - Step 9200 -- Training Metrics
2025-08-29 00:22:28 - pico-train - INFO - ├── Loss: 5.6045
2025-08-29 00:22:28 - pico-train - INFO - ├── Learning Rate: 2.90e-04
2025-08-29 00:22:28 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-29 00:23:19 - pico-train - INFO - Step 9300 -- Training Metrics
2025-08-29 00:23:19 - pico-train - INFO - ├── Loss: 5.6405
2025-08-29 00:23:19 - pico-train - INFO - ├── Learning Rate: 2.90e-04
2025-08-29 00:23:19 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-29 00:24:10 - pico-train - INFO - Step 9400 -- Training Metrics
2025-08-29 00:24:10 - pico-train - INFO - ├── Loss: 5.6241
2025-08-29 00:24:10 - pico-train - INFO - ├── Learning Rate: 2.90e-04
2025-08-29 00:24:10 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-29 00:25:00 - pico-train - INFO - Step 9500 -- Training Metrics
2025-08-29 00:25:00 - pico-train - INFO - ├── Loss: 5.6247
2025-08-29 00:25:00 - pico-train - INFO - ├── Learning Rate: 2.89e-04
2025-08-29 00:25:00 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-29 00:25:51 - pico-train - INFO - Step 9600 -- Training Metrics
2025-08-29 00:25:51 - pico-train - INFO - ├── Loss: 5.5983
2025-08-29 00:25:51 - pico-train - INFO - ├── Learning Rate: 2.89e-04
2025-08-29 00:25:51 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-29 00:26:43 - pico-train - INFO - Step 9700 -- Training Metrics
2025-08-29 00:26:43 - pico-train - INFO - ├── Loss: 5.5978
2025-08-29 00:26:43 - pico-train - INFO - ├── Learning Rate: 2.89e-04
2025-08-29 00:26:43 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-29 00:27:34 - pico-train - INFO - Step 9800 -- Training Metrics
2025-08-29 00:27:34 - pico-train - INFO - ├── Loss: 5.5746
2025-08-29 00:27:34 - pico-train - INFO - ├── Learning Rate: 2.89e-04
2025-08-29 00:27:34 - pico-train - INFO - └── Inf/NaN count: 0
|