|
2025-08-30 18:43:39 - pico-train - INFO - Step 78000 -- Evaluation Results |
|
2025-08-30 18:43:39 - pico-train - INFO - └── paloma: inf |
|
2025-08-30 18:43:39 - pico-train - INFO - ================================================== |
|
2025-08-30 18:43:39 - pico-train - INFO - Training Configuration |
|
2025-08-30 18:43:39 - pico-train - INFO - ================================================== |
|
2025-08-30 18:43:39 - pico-train - INFO - checkpointing: |
|
2025-08-30 18:43:39 - pico-train - INFO -   checkpoints_dir: checkpoints |
|
2025-08-30 18:43:39 - pico-train - INFO -   evaluation: |
|
2025-08-30 18:43:39 - pico-train - INFO -     eval_results_dir: eval_results |
|
2025-08-30 18:43:39 - pico-train - INFO -   fabric_checkpoint_dir: fabric_state |
|
2025-08-30 18:43:39 - pico-train - INFO -   fabric_checkpoint_filename: checkpoint.pt |
|
2025-08-30 18:43:39 - pico-train - INFO -   hf_checkpoint: |
|
2025-08-30 18:43:39 - pico-train - INFO -     collection_slug: null |
|
2025-08-30 18:43:39 - pico-train - INFO -     repo_id: ThomasTheMaker/pico-decoder-tiny |
|
2025-08-30 18:43:39 - pico-train - INFO -   learning_dynamics: |
|
2025-08-30 18:43:39 - pico-train - INFO -     batch_size: 1 |
|
2025-08-30 18:43:39 - pico-train - INFO -     eval_data: null |
|
2025-08-30 18:43:39 - pico-train - INFO -     layer_suffixes: |
|
2025-08-30 18:43:39 - pico-train - INFO -     - attention.v_proj |
|
2025-08-30 18:43:39 - pico-train - INFO -     - attention.o_proj |
|
2025-08-30 18:43:39 - pico-train - INFO -     - swiglu.w_2 |
|
2025-08-30 18:43:39 - pico-train - INFO -     sequence_idx: -1 |
|
2025-08-30 18:43:39 - pico-train - INFO -   learning_dynamics_dir: learning_dynamics |
|
2025-08-30 18:43:39 - pico-train - INFO -   logs_dir: logs |
|
2025-08-30 18:43:39 - pico-train - INFO -   run_name: pico-decoder-tiny-dolma10M-v1 |
|
2025-08-30 18:43:39 - pico-train - INFO -   runs_dir: runs |
|
2025-08-30 18:43:39 - pico-train - INFO -   save_every_n_steps: 2000 |
|
2025-08-30 18:43:39 - pico-train - INFO -   save_to_hf: true |
|
2025-08-30 18:43:39 - pico-train - INFO -   training: |
|
2025-08-30 18:43:39 - pico-train - INFO -     auto_resume: true |
|
2025-08-30 18:43:39 - pico-train - INFO - data: |
|
2025-08-30 18:43:39 - pico-train - INFO -   dataloader: |
|
2025-08-30 18:43:39 - pico-train - INFO -     batch_size: 16 |
|
2025-08-30 18:43:39 - pico-train - INFO -   dataset: |
|
2025-08-30 18:43:39 - pico-train - INFO -     name: ThomasTheMaker/pretokenized-dolma-10M |
|
2025-08-30 18:43:39 - pico-train - INFO -   tokenizer: |
|
2025-08-30 18:43:39 - pico-train - INFO -     name: allenai/OLMo-7B-0724-hf |
|
2025-08-30 18:43:39 - pico-train - INFO -     vocab_size: 50304 |
|
2025-08-30 18:43:39 - pico-train - INFO - evaluation: |
|
2025-08-30 18:43:39 - pico-train - INFO -   metrics: |
|
2025-08-30 18:43:39 - pico-train - INFO -   - paloma |
|
2025-08-30 18:43:39 - pico-train - INFO -   paloma: |
|
2025-08-30 18:43:39 - pico-train - INFO -     batch_size: 1 |
|
2025-08-30 18:43:39 - pico-train - INFO -     dataset_name: pico-lm/pretokenized-paloma-tinsy |
|
2025-08-30 18:43:39 - pico-train - INFO -     dataset_split: val |
|
2025-08-30 18:43:39 - pico-train - INFO -     max_length: 2048 |
|
2025-08-30 18:43:39 - pico-train - INFO - model: |
|
2025-08-30 18:43:39 - pico-train - INFO -   activation_hidden_dim: 384 |
|
2025-08-30 18:43:39 - pico-train - INFO -   attention_n_heads: 12 |
|
2025-08-30 18:43:39 - pico-train - INFO -   attention_n_kv_heads: 4 |
|
2025-08-30 18:43:39 - pico-train - INFO -   batch_size: 1024 |
|
2025-08-30 18:43:39 - pico-train - INFO -   d_model: 96 |
|
2025-08-30 18:43:39 - pico-train - INFO -   max_seq_len: 2048 |
|
2025-08-30 18:43:39 - pico-train - INFO -   model_type: pico_decoder |
|
2025-08-30 18:43:39 - pico-train - INFO -   n_layers: 12 |
|
2025-08-30 18:43:39 - pico-train - INFO -   norm_eps: 1.0e-06 |
|
2025-08-30 18:43:39 - pico-train - INFO -   position_emb_theta: 10000.0 |
|
2025-08-30 18:43:39 - pico-train - INFO -   vocab_size: 50304 |
|
2025-08-30 18:43:39 - pico-train - INFO - monitoring: |
|
2025-08-30 18:43:39 - pico-train - INFO -   logging: |
|
2025-08-30 18:43:39 - pico-train - INFO -     log_every_n_steps: 100 |
|
2025-08-30 18:43:39 - pico-train - INFO -     log_level: INFO |
|
2025-08-30 18:43:39 - pico-train - INFO -   save_to_wandb: false |
|
2025-08-30 18:43:39 - pico-train - INFO -   wandb: |
|
2025-08-30 18:43:39 - pico-train - INFO -     entity: boymyc |
|
2025-08-30 18:43:39 - pico-train - INFO -     project: pico-decoder-tiny |
|
2025-08-30 18:43:39 - pico-train - INFO - training: |
|
2025-08-30 18:43:39 - pico-train - INFO -   fabric: |
|
2025-08-30 18:43:39 - pico-train - INFO -     accelerator: cuda |
|
2025-08-30 18:43:39 - pico-train - INFO -     num_devices: 1 |
|
2025-08-30 18:43:39 - pico-train - INFO -     num_nodes: 1 |
|
2025-08-30 18:43:39 - pico-train - INFO -     precision: bf16-mixed |
|
2025-08-30 18:43:39 - pico-train - INFO -   max_steps: 100000 |
|
2025-08-30 18:43:39 - pico-train - INFO -   optimization: |
|
2025-08-30 18:43:39 - pico-train - INFO -     gradient_accumulation_steps: 1 |
|
2025-08-30 18:43:39 - pico-train - INFO -     lr: 0.0002 |
|
2025-08-30 18:43:39 - pico-train - INFO -     lr_scheduler: cosine |
|
2025-08-30 18:43:39 - pico-train - INFO -     lr_warmup_steps: 2000 |
|
2025-08-30 18:43:39 - pico-train - INFO -     optimizer: adamw |
|
2025-08-30 18:43:39 - pico-train - INFO - ================================================== |
|
2025-08-30 18:43:39 - pico-train - INFO - Runtime Summary: |
|
2025-08-30 18:43:39 - pico-train - INFO - ================================================== |
|
2025-08-30 18:43:39 - pico-train - INFO - Starting from step: 78000 |
|
2025-08-30 18:43:39 - pico-train - INFO - Model Setup: |
|
2025-08-30 18:43:39 - pico-train - INFO - ├─ Total Parameters: 11,282,784 |
|
2025-08-30 18:43:39 - pico-train - INFO - └─ Trainable Parameters: 11,282,784 |
|
2025-08-30 18:43:39 - pico-train - INFO - Distributed Setup: |
|
2025-08-30 18:43:39 - pico-train - INFO - ├─ Number of Devices: 1 |
|
2025-08-30 18:43:39 - pico-train - INFO - ├─ Device Type: NVIDIA H100 80GB HBM3 |
|
2025-08-30 18:43:39 - pico-train - INFO - └─ Available Memory: 85.03 GB |
|
2025-08-30 18:43:39 - pico-train - INFO - Software Setup: |
|
2025-08-30 18:43:39 - pico-train - INFO - ├─ Python Version: 3.12.3 |
|
2025-08-30 18:43:39 - pico-train - INFO - ├─ PyTorch Version: 2.8.0+cu128 |
|
2025-08-30 18:43:39 - pico-train - INFO - ├─ CUDA Version: 12.8 |
|
2025-08-30 18:43:39 - pico-train - INFO - └─ Operating System: Linux 6.8.0-71-generic |
|
2025-08-30 18:43:39 - pico-train - INFO - Batch Size Configuration: |
|
2025-08-30 18:43:39 - pico-train - INFO - ├─ Global Batch Size: 16 |
|
2025-08-30 18:43:39 - pico-train - INFO - ├─ Per Device Batch Size: 16 |
|
2025-08-30 18:43:39 - pico-train - INFO - └─ Gradient Accumulation Steps: 1 |
|
2025-08-30 18:43:39 - pico-train - INFO - ================================================== |
|
2025-08-30 18:43:40 - pico-train - INFO - Step 78000 -- Training Metrics |
|
2025-08-30 18:43:40 - pico-train - INFO - ├── Loss: 4.5461 |
|
2025-08-30 18:43:40 - pico-train - INFO - ├── Learning Rate: 2.39e-05 |
|
2025-08-30 18:43:40 - pico-train - INFO - └── Inf/NaN count: 0 |
|
2025-08-30 18:43:40 - pico-train - INFO - Step 78000 -- Saving Learning Dynamics |
|
2025-08-30 18:44:34 - pico-train - INFO - Step 78100 -- Training Metrics |
|
2025-08-30 18:44:34 - pico-train - INFO - ├── Loss: 4.7732 |
|
2025-08-30 18:44:34 - pico-train - INFO - ├── Learning Rate: 2.36e-05 |
|
2025-08-30 18:44:34 - pico-train - INFO - └── Inf/NaN count: 0 |
|
2025-08-30 18:45:26 - pico-train - INFO - Step 78200 -- Training Metrics |
|
2025-08-30 18:45:26 - pico-train - INFO - ├── Loss: 4.7809 |
|
2025-08-30 18:45:26 - pico-train - INFO - ├── Learning Rate: 2.34e-05 |
|
2025-08-30 18:45:26 - pico-train - INFO - └── Inf/NaN count: 0 |
|
2025-08-30 18:46:18 - pico-train - INFO - Step 78300 -- Training Metrics |
|
2025-08-30 18:46:18 - pico-train - INFO - ├── Loss: 4.7659 |
|
2025-08-30 18:46:18 - pico-train - INFO - ├── Learning Rate: 2.32e-05 |
|
2025-08-30 18:46:18 - pico-train - INFO - └── Inf/NaN count: 0 |
|
2025-08-30 18:47:16 - pico-train - INFO - Step 78400 -- Training Metrics |
|
2025-08-30 18:47:16 - pico-train - INFO - ├── Loss: 4.7466 |
|
2025-08-30 18:47:16 - pico-train - INFO - ├── Learning Rate: 2.30e-05 |
|
2025-08-30 18:47:16 - pico-train - INFO - └── Inf/NaN count: 0 |
|
2025-08-30 18:48:27 - pico-train - INFO - Step 78500 -- Training Metrics |
|
2025-08-30 18:48:27 - pico-train - INFO - ├── Loss: 4.8076 |
|
2025-08-30 18:48:27 - pico-train - INFO - ├── Learning Rate: 2.28e-05 |
|
2025-08-30 18:48:27 - pico-train - INFO - └── Inf/NaN count: 0 |
|
2025-08-30 18:49:39 - pico-train - INFO - Step 78600 -- Training Metrics |
|
2025-08-30 18:49:39 - pico-train - INFO - ├── Loss: 4.7884 |
|
2025-08-30 18:49:39 - pico-train - INFO - ├── Learning Rate: 2.26e-05 |
|
2025-08-30 18:49:39 - pico-train - INFO - └── Inf/NaN count: 0 |
|
2025-08-30 18:50:50 - pico-train - INFO - Step 78700 -- Training Metrics |
|
2025-08-30 18:50:50 - pico-train - INFO - ├── Loss: 4.7882 |
|
2025-08-30 18:50:50 - pico-train - INFO - ├── Learning Rate: 2.24e-05 |
|
2025-08-30 18:50:50 - pico-train - INFO - └── Inf/NaN count: 0 |
|
2025-08-30 18:51:55 - pico-train - INFO - Step 78800 -- Training Metrics |
|
2025-08-30 18:51:55 - pico-train - INFO - ├── Loss: 4.7942 |
|
2025-08-30 18:51:55 - pico-train - INFO - ├── Learning Rate: 2.22e-05 |
|
2025-08-30 18:51:55 - pico-train - INFO - └── Inf/NaN count: 0 |
|
2025-08-30 18:52:48 - pico-train - INFO - Step 78900 -- Training Metrics |
|
2025-08-30 18:52:48 - pico-train - INFO - ├── Loss: 4.7966 |
|
2025-08-30 18:52:48 - pico-train - INFO - ├── Learning Rate: 2.20e-05 |
|
2025-08-30 18:52:48 - pico-train - INFO - └── Inf/NaN count: 0 |
|
2025-08-30 18:53:41 - pico-train - INFO - Step 79000 -- Training Metrics |
|
2025-08-30 18:53:41 - pico-train - INFO - ├── Loss: 4.7800 |
|
2025-08-30 18:53:41 - pico-train - INFO - ├── Learning Rate: 2.18e-05 |
|
2025-08-30 18:53:41 - pico-train - INFO - └── Inf/NaN count: 0 |
|
2025-08-30 18:54:34 - pico-train - INFO - Step 79100 -- Training Metrics |
|
2025-08-30 18:54:34 - pico-train - INFO - ├── Loss: 4.7808 |
|
2025-08-30 18:54:34 - pico-train - INFO - ├── Learning Rate: 2.16e-05 |
|
2025-08-30 18:54:34 - pico-train - INFO - └── Inf/NaN count: 0 |
|
2025-08-30 18:55:27 - pico-train - INFO - Step 79200 -- Training Metrics |
|
2025-08-30 18:55:27 - pico-train - INFO - ├── Loss: 4.7704 |
|
2025-08-30 18:55:27 - pico-train - INFO - ├── Learning Rate: 2.14e-05 |
|
2025-08-30 18:55:27 - pico-train - INFO - └── Inf/NaN count: 0 |
|
2025-08-30 18:56:20 - pico-train - INFO - Step 79300 -- Training Metrics |
|
2025-08-30 18:56:20 - pico-train - INFO - ├── Loss: 4.7921 |
|
2025-08-30 18:56:20 - pico-train - INFO - ├── Learning Rate: 2.12e-05 |
|
2025-08-30 18:56:20 - pico-train - INFO - └── Inf/NaN count: 0 |
|
2025-08-30 18:57:12 - pico-train - INFO - Step 79400 -- Training Metrics |
|
2025-08-30 18:57:12 - pico-train - INFO - ├── Loss: 4.7701 |
|
2025-08-30 18:57:12 - pico-train - INFO - ├── Learning Rate: 2.10e-05 |
|
2025-08-30 18:57:12 - pico-train - INFO - └── Inf/NaN count: 0 |
|
2025-08-30 18:58:05 - pico-train - INFO - Step 79500 -- Training Metrics |
|
2025-08-30 18:58:05 - pico-train - INFO - ├── Loss: 4.7990 |
|
2025-08-30 18:58:05 - pico-train - INFO - ├── Learning Rate: 2.08e-05 |
|
2025-08-30 18:58:05 - pico-train - INFO - └── Inf/NaN count: 0 |
|
2025-08-30 18:58:58 - pico-train - INFO - Step 79600 -- Training Metrics |
|
2025-08-30 18:58:58 - pico-train - INFO - ├── Loss: 4.7864 |
|
2025-08-30 18:58:58 - pico-train - INFO - ├── Learning Rate: 2.06e-05 |
|
2025-08-30 18:58:58 - pico-train - INFO - └── Inf/NaN count: 0 |
|
2025-08-30 18:59:51 - pico-train - INFO - Step 79700 -- Training Metrics |
|
2025-08-30 18:59:51 - pico-train - INFO - ├── Loss: 4.7747 |
|
2025-08-30 18:59:51 - pico-train - INFO - ├── Learning Rate: 2.04e-05 |
|
2025-08-30 18:59:51 - pico-train - INFO - └── Inf/NaN count: 0 |
|
2025-08-30 19:00:44 - pico-train - INFO - Step 79800 -- Training Metrics |
|
2025-08-30 19:00:44 - pico-train - INFO - ├── Loss: 4.7703 |
|
2025-08-30 19:00:44 - pico-train - INFO - ├── Learning Rate: 2.02e-05 |
|
2025-08-30 19:00:44 - pico-train - INFO - └── Inf/NaN count: 0 |
|
2025-08-30 19:01:37 - pico-train - INFO - Step 79900 -- Training Metrics |
|
2025-08-30 19:01:37 - pico-train - INFO - ├── Loss: 4.7738 |
|
2025-08-30 19:01:37 - pico-train - INFO - ├── Learning Rate: 2.01e-05 |
|
2025-08-30 19:01:37 - pico-train - INFO - └── Inf/NaN count: 0 |
|
2025-08-30 19:02:29 - pico-train - INFO - Step 80000 -- Saving Checkpoint |
|
2025-08-30 19:04:30 - pico-train - INFO - Step 80000 -- Evaluation Results |
|
2025-08-30 19:04:30 - pico-train - INFO - └── paloma: inf |
|
2025-08-30 19:04:31 - pico-train - INFO - Step 80000 -- Training Metrics |
|
2025-08-30 19:04:31 - pico-train - INFO - ├── Loss: 4.7781 |
|
2025-08-30 19:04:31 - pico-train - INFO - ├── Learning Rate: 2.00e-05 |
|
2025-08-30 19:04:31 - pico-train - INFO - └── Inf/NaN count: 0 |
|
2025-08-30 19:04:31 - pico-train - INFO - Step 80000 -- Saving Learning Dynamics |
|
2025-08-30 19:05:25 - pico-train - INFO - Step 80100 -- Training Metrics |
|
2025-08-30 19:05:25 - pico-train - INFO - ├── Loss: 4.8125 |
|
2025-08-30 19:05:25 - pico-train - INFO - ├── Learning Rate: 2.00e-05 |
|
2025-08-30 19:05:25 - pico-train - INFO - └── Inf/NaN count: 0 |
|
2025-08-30 19:06:17 - pico-train - INFO - Step 80200 -- Training Metrics |
|
2025-08-30 19:06:17 - pico-train - INFO - ├── Loss: 4.7764 |
|
2025-08-30 19:06:17 - pico-train - INFO - ├── Learning Rate: 2.00e-05 |
|
2025-08-30 19:06:17 - pico-train - INFO - └── Inf/NaN count: 0 |
|
2025-08-30 19:07:09 - pico-train - INFO - Step 80300 -- Training Metrics |
|
2025-08-30 19:07:09 - pico-train - INFO - ├── Loss: 4.7498 |
|
2025-08-30 19:07:09 - pico-train - INFO - ├── Learning Rate: 2.00e-05 |
|
2025-08-30 19:07:09 - pico-train - INFO - └── Inf/NaN count: 0 |
|
2025-08-30 19:08:02 - pico-train - INFO - Step 80400 -- Training Metrics |
|
2025-08-30 19:08:02 - pico-train - INFO - ├── Loss: 4.7809 |
|
2025-08-30 19:08:02 - pico-train - INFO - ├── Learning Rate: 2.00e-05 |
|
2025-08-30 19:08:02 - pico-train - INFO - └── Inf/NaN count: 0 |
|
2025-08-30 19:08:55 - pico-train - INFO - Step 80500 -- Training Metrics |
|
2025-08-30 19:08:55 - pico-train - INFO - ├── Loss: 4.7766 |
|
2025-08-30 19:08:55 - pico-train - INFO - ├── Learning Rate: 2.00e-05 |
|
2025-08-30 19:08:55 - pico-train - INFO - └── Inf/NaN count: 0 |
|
2025-08-30 19:09:48 - pico-train - INFO - Step 80600 -- Training Metrics |
|
2025-08-30 19:09:48 - pico-train - INFO - ├── Loss: 4.7933 |
|
2025-08-30 19:09:48 - pico-train - INFO - ├── Learning Rate: 2.00e-05 |
|
2025-08-30 19:09:48 - pico-train - INFO - └── Inf/NaN count: 0 |
|
2025-08-30 19:10:40 - pico-train - INFO - Step 80700 -- Training Metrics |
|
2025-08-30 19:10:40 - pico-train - INFO - ├── Loss: 4.7826 |
|
2025-08-30 19:10:40 - pico-train - INFO - ├── Learning Rate: 2.00e-05 |
|
2025-08-30 19:10:40 - pico-train - INFO - └── Inf/NaN count: 0 |
|
2025-08-30 19:11:32 - pico-train - INFO - Step 80800 -- Training Metrics |
|
2025-08-30 19:11:32 - pico-train - INFO - ├── Loss: 4.7968 |
|
2025-08-30 19:11:32 - pico-train - INFO - ├── Learning Rate: 2.00e-05 |
|
2025-08-30 19:11:32 - pico-train - INFO - └── Inf/NaN count: 0 |
|
2025-08-30 19:12:24 - pico-train - INFO - Step 80900 -- Training Metrics |
|
2025-08-30 19:12:24 - pico-train - INFO - ├── Loss: 4.8019 |
|
2025-08-30 19:12:24 - pico-train - INFO - ├── Learning Rate: 2.00e-05 |
|
2025-08-30 19:12:24 - pico-train - INFO - └── Inf/NaN count: 0 |
|
2025-08-30 19:13:16 - pico-train - INFO - Step 81000 -- Training Metrics |
|
2025-08-30 19:13:16 - pico-train - INFO - ├── Loss: 4.7786 |
|
2025-08-30 19:13:16 - pico-train - INFO - ├── Learning Rate: 2.00e-05 |
|
2025-08-30 19:13:16 - pico-train - INFO - └── Inf/NaN count: 0 |
|
2025-08-30 19:14:07 - pico-train - INFO - Step 81100 -- Training Metrics |
|
2025-08-30 19:14:07 - pico-train - INFO - ├── Loss: 4.7870 |
|
2025-08-30 19:14:07 - pico-train - INFO - ├── Learning Rate: 2.00e-05 |
|
2025-08-30 19:14:07 - pico-train - INFO - └── Inf/NaN count: 0 |
|
2025-08-30 19:14:59 - pico-train - INFO - Step 81200 -- Training Metrics |
|
2025-08-30 19:14:59 - pico-train - INFO - ├── Loss: 4.7989 |
|
2025-08-30 19:14:59 - pico-train - INFO - ├── Learning Rate: 2.00e-05 |
|
2025-08-30 19:14:59 - pico-train - INFO - └── Inf/NaN count: 0 |
|
2025-08-30 19:15:51 - pico-train - INFO - Step 81300 -- Training Metrics |
|
2025-08-30 19:15:51 - pico-train - INFO - ├── Loss: 4.8003 |
|
2025-08-30 19:15:51 - pico-train - INFO - ├── Learning Rate: 2.00e-05 |
|
2025-08-30 19:15:51 - pico-train - INFO - └── Inf/NaN count: 0 |
|
2025-08-30 19:16:44 - pico-train - INFO - Step 81400 -- Training Metrics |
|
2025-08-30 19:16:44 - pico-train - INFO - ├── Loss: 4.7783 |
|
2025-08-30 19:16:44 - pico-train - INFO - ├── Learning Rate: 2.00e-05 |
|
2025-08-30 19:16:44 - pico-train - INFO - └── Inf/NaN count: 0 |
|
2025-08-30 19:17:36 - pico-train - INFO - Step 81500 -- Training Metrics |
|
2025-08-30 19:17:36 - pico-train - INFO - ├── Loss: 4.7549 |
|
2025-08-30 19:17:36 - pico-train - INFO - ├── Learning Rate: 2.00e-05 |
|
2025-08-30 19:17:36 - pico-train - INFO - └── Inf/NaN count: 0 |
|
2025-08-30 19:18:28 - pico-train - INFO - Step 81600 -- Training Metrics |
|
2025-08-30 19:18:28 - pico-train - INFO - ├── Loss: 4.7775 |
|
2025-08-30 19:18:28 - pico-train - INFO - ├── Learning Rate: 2.00e-05 |
|
2025-08-30 19:18:28 - pico-train - INFO - └── Inf/NaN count: 0 |
|
2025-08-30 19:19:19 - pico-train - INFO - Step 81700 -- Training Metrics |
|
2025-08-30 19:19:19 - pico-train - INFO - ├── Loss: 4.7858 |
|
2025-08-30 19:19:19 - pico-train - INFO - ├── Learning Rate: 2.00e-05 |
|
2025-08-30 19:19:19 - pico-train - INFO - └── Inf/NaN count: 0 |
|
2025-08-30 19:20:11 - pico-train - INFO - Step 81800 -- Training Metrics |
|
2025-08-30 19:20:11 - pico-train - INFO - ├── Loss: 4.7789 |
|
2025-08-30 19:20:11 - pico-train - INFO - ├── Learning Rate: 2.00e-05 |
|
2025-08-30 19:20:11 - pico-train - INFO - └── Inf/NaN count: 0 |
|
2025-08-30 19:21:03 - pico-train - INFO - Step 81900 -- Training Metrics |
|
2025-08-30 19:21:03 - pico-train - INFO - ├── Loss: 4.7737 |
|
2025-08-30 19:21:03 - pico-train - INFO - ├── Learning Rate: 2.00e-05 |
|
2025-08-30 19:21:03 - pico-train - INFO - └── Inf/NaN count: 0 |
|
2025-08-30 19:21:55 - pico-train - INFO - Step 82000 -- Saving Checkpoint |
|
2025-08-30 19:23:45 - pico-train - INFO - Step 82000 -- Evaluation Results |
|
2025-08-30 19:23:45 - pico-train - INFO - └── paloma: inf |
|
2025-08-30 19:23:46 - pico-train - INFO - Step 82000 -- Training Metrics |
|
2025-08-30 19:23:46 - pico-train - INFO - ├── Loss: 4.7934 |
|
2025-08-30 19:23:46 - pico-train - INFO - ├── Learning Rate: 2.00e-05 |
|
2025-08-30 19:23:46 - pico-train - INFO - └── Inf/NaN count: 0 |
|
2025-08-30 19:23:46 - pico-train - INFO - Step 82000 -- Saving Learning Dynamics |
|
2025-08-30 19:24:40 - pico-train - INFO - Step 82100 -- Training Metrics |
|
2025-08-30 19:24:40 - pico-train - INFO - ├── Loss: 4.7784 |
|
2025-08-30 19:24:40 - pico-train - INFO - ├── Learning Rate: 2.00e-05 |
|
2025-08-30 19:24:40 - pico-train - INFO - └── Inf/NaN count: 0 |
|
2025-08-30 19:25:32 - pico-train - INFO - Step 82200 -- Training Metrics |
|
2025-08-30 19:25:32 - pico-train - INFO - ├── Loss: 4.7837 |
|
2025-08-30 19:25:32 - pico-train - INFO - ├── Learning Rate: 2.00e-05 |
|
2025-08-30 19:25:32 - pico-train - INFO - └── Inf/NaN count: 0 |
|
2025-08-30 19:26:24 - pico-train - INFO - Step 82300 -- Training Metrics |
|
2025-08-30 19:26:24 - pico-train - INFO - ├── Loss: 4.7611 |
|
2025-08-30 19:26:24 - pico-train - INFO - ├── Learning Rate: 2.00e-05 |
|
2025-08-30 19:26:24 - pico-train - INFO - └── Inf/NaN count: 0 |
|
2025-08-30 19:27:16 - pico-train - INFO - Step 82400 -- Training Metrics |
|
2025-08-30 19:27:16 - pico-train - INFO - ├── Loss: 4.7873 |
|
2025-08-30 19:27:16 - pico-train - INFO - ├── Learning Rate: 2.00e-05 |
|
2025-08-30 19:27:16 - pico-train - INFO - └── Inf/NaN count: 0 |
|
2025-08-30 19:28:09 - pico-train - INFO - Step 82500 -- Training Metrics |
|
2025-08-30 19:28:09 - pico-train - INFO - ├── Loss: 4.7805 |
|
2025-08-30 19:28:09 - pico-train - INFO - ├── Learning Rate: 2.00e-05 |
|
2025-08-30 19:28:09 - pico-train - INFO - └── Inf/NaN count: 0 |
|
2025-08-30 19:29:02 - pico-train - INFO - Step 82600 -- Training Metrics |
|
2025-08-30 19:29:02 - pico-train - INFO - ├── Loss: 4.7728 |
|
2025-08-30 19:29:02 - pico-train - INFO - ├── Learning Rate: 2.00e-05 |
|
2025-08-30 19:29:02 - pico-train - INFO - └── Inf/NaN count: 0 |
|
2025-08-30 19:29:53 - pico-train - INFO - Step 82700 -- Training Metrics |
|
2025-08-30 19:29:53 - pico-train - INFO - ├── Loss: 4.7685 |
|
2025-08-30 19:29:53 - pico-train - INFO - ├── Learning Rate: 2.00e-05 |
|
2025-08-30 19:29:53 - pico-train - INFO - └── Inf/NaN count: 0 |
|
2025-08-30 19:30:45 - pico-train - INFO - Step 82800 -- Training Metrics |
|
2025-08-30 19:30:45 - pico-train - INFO - ├── Loss: 4.7772 |
|
2025-08-30 19:30:45 - pico-train - INFO - ├── Learning Rate: 2.00e-05 |
|
2025-08-30 19:30:45 - pico-train - INFO - └── Inf/NaN count: 0 |
|
2025-08-30 19:31:37 - pico-train - INFO - Step 82900 -- Training Metrics |
|
2025-08-30 19:31:37 - pico-train - INFO - ├── Loss: 4.7580 |
|
2025-08-30 19:31:37 - pico-train - INFO - ├── Learning Rate: 2.00e-05 |
|
2025-08-30 19:31:37 - pico-train - INFO - └── Inf/NaN count: 0 |
|
2025-08-30 19:32:30 - pico-train - INFO - Step 83000 -- Training Metrics |
|
2025-08-30 19:32:30 - pico-train - INFO - ├── Loss: 4.7907 |
|
2025-08-30 19:32:30 - pico-train - INFO - ├── Learning Rate: 2.00e-05 |
|
2025-08-30 19:32:30 - pico-train - INFO - └── Inf/NaN count: 0 |
|
2025-08-30 19:33:23 - pico-train - INFO - Step 83100 -- Training Metrics |
|
2025-08-30 19:33:23 - pico-train - INFO - ├── Loss: 4.7721 |
|
2025-08-30 19:33:23 - pico-train - INFO - ├── Learning Rate: 2.00e-05 |
|
2025-08-30 19:33:23 - pico-train - INFO - └── Inf/NaN count: 0 |
|
2025-08-30 19:34:16 - pico-train - INFO - Step 83200 -- Training Metrics |
|
2025-08-30 19:34:16 - pico-train - INFO - ├── Loss: 4.7750 |
|
2025-08-30 19:34:16 - pico-train - INFO - ├── Learning Rate: 2.00e-05 |
|
2025-08-30 19:34:16 - pico-train - INFO - └── Inf/NaN count: 0 |
|
2025-08-30 19:35:09 - pico-train - INFO - Step 83300 -- Training Metrics |
|
2025-08-30 19:35:09 - pico-train - INFO - ├── Loss: 4.7808 |
|
2025-08-30 19:35:09 - pico-train - INFO - ├── Learning Rate: 2.00e-05 |
|
2025-08-30 19:35:09 - pico-train - INFO - └── Inf/NaN count: 0 |
|
2025-08-30 19:36:02 - pico-train - INFO - Step 83400 -- Training Metrics |
|
2025-08-30 19:36:02 - pico-train - INFO - ├── Loss: 4.7869 |
|
2025-08-30 19:36:02 - pico-train - INFO - ├── Learning Rate: 2.00e-05 |
|
2025-08-30 19:36:02 - pico-train - INFO - └── Inf/NaN count: 0 |
|
2025-08-30 19:36:55 - pico-train - INFO - Step 83500 -- Training Metrics |
|
2025-08-30 19:36:55 - pico-train - INFO - ├── Loss: 4.7670 |
|
2025-08-30 19:36:55 - pico-train - INFO - ├── Learning Rate: 2.00e-05 |
|
2025-08-30 19:36:55 - pico-train - INFO - └── Inf/NaN count: 0 |
|
2025-08-30 19:37:48 - pico-train - INFO - Step 83600 -- Training Metrics |
|
2025-08-30 19:37:48 - pico-train - INFO - ├── Loss: 4.7615 |
|
2025-08-30 19:37:48 - pico-train - INFO - ├── Learning Rate: 2.00e-05 |
|
2025-08-30 19:37:48 - pico-train - INFO - └── Inf/NaN count: 0 |
|
2025-08-30 19:38:40 - pico-train - INFO - Step 83700 -- Training Metrics |
|
2025-08-30 19:38:40 - pico-train - INFO - ├── Loss: 4.7976 |
|
2025-08-30 19:38:40 - pico-train - INFO - ├── Learning Rate: 2.00e-05 |
|
2025-08-30 19:38:40 - pico-train - INFO - └── Inf/NaN count: 0 |
|
2025-08-30 19:39:32 - pico-train - INFO - Step 83800 -- Training Metrics |
|
2025-08-30 19:39:32 - pico-train - INFO - ├── Loss: 4.7549 |
|
2025-08-30 19:39:32 - pico-train - INFO - ├── Learning Rate: 2.00e-05 |
|
2025-08-30 19:39:32 - pico-train - INFO - └── Inf/NaN count: 0 |
|
2025-08-30 19:40:24 - pico-train - INFO - Step 83900 -- Training Metrics |
|
2025-08-30 19:40:24 - pico-train - INFO - ├── Loss: 4.7879 |
|
2025-08-30 19:40:24 - pico-train - INFO - ├── Learning Rate: 2.00e-05 |
|
2025-08-30 19:40:24 - pico-train - INFO - └── Inf/NaN count: 0 |
|
2025-08-30 19:41:15 - pico-train - INFO - Step 84000 -- Saving Checkpoint |
|
2025-08-30 19:43:17 - pico-train - INFO - Step 84000 -- Evaluation Results |
|
2025-08-30 19:43:17 - pico-train - INFO - └── paloma: inf |
|
2025-08-30 19:43:18 - pico-train - INFO - Step 84000 -- Training Metrics |
|
2025-08-30 19:43:18 - pico-train - INFO - ├── Loss: 4.7979 |
|
2025-08-30 19:43:18 - pico-train - INFO - ├── Learning Rate: 2.00e-05 |
|
2025-08-30 19:43:18 - pico-train - INFO - └── Inf/NaN count: 0 |
|
2025-08-30 19:43:18 - pico-train - INFO - Step 84000 -- Saving Learning Dynamics |
|
2025-08-30 19:44:12 - pico-train - INFO - Step 84100 -- Training Metrics |
|
2025-08-30 19:44:12 - pico-train - INFO - ├── Loss: 4.8088 |
|
2025-08-30 19:44:12 - pico-train - INFO - ├── Learning Rate: 2.00e-05 |
|
2025-08-30 19:44:12 - pico-train - INFO - └── Inf/NaN count: 0 |
|
2025-08-30 19:45:04 - pico-train - INFO - Step 84200 -- Training Metrics |
|
2025-08-30 19:45:04 - pico-train - INFO - ├── Loss: 4.7678 |
|
2025-08-30 19:45:04 - pico-train - INFO - ├── Learning Rate: 2.00e-05 |
|
2025-08-30 19:45:04 - pico-train - INFO - └── Inf/NaN count: 0 |
|
2025-08-30 19:45:56 - pico-train - INFO - Step 84300 -- Training Metrics |
|
2025-08-30 19:45:56 - pico-train - INFO - ├── Loss: 4.7725 |
|
2025-08-30 19:45:56 - pico-train - INFO - ├── Learning Rate: 2.00e-05 |
|
2025-08-30 19:45:56 - pico-train - INFO - └── Inf/NaN count: 0 |
|
2025-08-30 19:46:48 - pico-train - INFO - Step 84400 -- Training Metrics |
|
2025-08-30 19:46:48 - pico-train - INFO - ├── Loss: 4.7841 |
|
2025-08-30 19:46:48 - pico-train - INFO - ├── Learning Rate: 2.00e-05 |
|
2025-08-30 19:46:48 - pico-train - INFO - └── Inf/NaN count: 0 |
|
2025-08-30 19:47:40 - pico-train - INFO - Step 84500 -- Training Metrics |
|
2025-08-30 19:47:40 - pico-train - INFO - ├── Loss: 4.7708 |
|
2025-08-30 19:47:40 - pico-train - INFO - ├── Learning Rate: 2.00e-05 |
|
2025-08-30 19:47:40 - pico-train - INFO - └── Inf/NaN count: 0 |
|
2025-08-30 19:48:32 - pico-train - INFO - Step 84600 -- Training Metrics |
|
2025-08-30 19:48:32 - pico-train - INFO - ├── Loss: 4.7748 |
|
2025-08-30 19:48:32 - pico-train - INFO - ├── Learning Rate: 2.00e-05 |
|
2025-08-30 19:48:32 - pico-train - INFO - └── Inf/NaN count: 0 |
|
2025-08-30 19:49:24 - pico-train - INFO - Step 84700 -- Training Metrics |
|
2025-08-30 19:49:24 - pico-train - INFO - ├── Loss: 4.7714 |
|
2025-08-30 19:49:24 - pico-train - INFO - ├── Learning Rate: 2.00e-05 |
|
2025-08-30 19:49:24 - pico-train - INFO - └── Inf/NaN count: 0 |
|
2025-08-30 19:50:16 - pico-train - INFO - Step 84800 -- Training Metrics |
|
2025-08-30 19:50:16 - pico-train - INFO - ├── Loss: 4.7860 |
|
2025-08-30 19:50:16 - pico-train - INFO - ├── Learning Rate: 2.00e-05 |
|
2025-08-30 19:50:16 - pico-train - INFO - └── Inf/NaN count: 0 |
|
2025-08-30 19:51:09 - pico-train - INFO - Step 84900 -- Training Metrics |
|
2025-08-30 19:51:09 - pico-train - INFO - ├── Loss: 4.7671 |
|
2025-08-30 19:51:09 - pico-train - INFO - ├── Learning Rate: 2.00e-05 |
|
2025-08-30 19:51:09 - pico-train - INFO - └── Inf/NaN count: 0 |
|
2025-08-30 19:52:02 - pico-train - INFO - Step 85000 -- Training Metrics |
|
2025-08-30 19:52:02 - pico-train - INFO - ├── Loss: 4.7753 |
|
2025-08-30 19:52:02 - pico-train - INFO - ├── Learning Rate: 2.00e-05 |
|
2025-08-30 19:52:02 - pico-train - INFO - └── Inf/NaN count: 0 |
|
2025-08-30 19:52:55 - pico-train - INFO - Step 85100 -- Training Metrics |
|
2025-08-30 19:52:55 - pico-train - INFO - ├── Loss: 4.7335 |
|
2025-08-30 19:52:55 - pico-train - INFO - ├── Learning Rate: 2.00e-05 |
|
2025-08-30 19:52:55 - pico-train - INFO - └── Inf/NaN count: 0 |
|
2025-08-30 19:53:48 - pico-train - INFO - Step 85200 -- Training Metrics |
|
2025-08-30 19:53:48 - pico-train - INFO - ├── Loss: 4.7700 |
|
2025-08-30 19:53:48 - pico-train - INFO - ├── Learning Rate: 2.00e-05 |
|
2025-08-30 19:53:48 - pico-train - INFO - └── Inf/NaN count: 0 |
|
2025-08-30 19:54:41 - pico-train - INFO - Step 85300 -- Training Metrics |
|
2025-08-30 19:54:41 - pico-train - INFO - ├── Loss: 4.7800 |
|
2025-08-30 19:54:41 - pico-train - INFO - ├── Learning Rate: 2.00e-05 |
|
2025-08-30 19:54:41 - pico-train - INFO - └── Inf/NaN count: 0 |
|
2025-08-30 19:55:34 - pico-train - INFO - Step 85400 -- Training Metrics |
|
2025-08-30 19:55:34 - pico-train - INFO - ├── Loss: 4.7782 |
|
2025-08-30 19:55:34 - pico-train - INFO - ├── Learning Rate: 2.00e-05 |
|
2025-08-30 19:55:34 - pico-train - INFO - └── Inf/NaN count: 0 |
|
2025-08-30 19:56:27 - pico-train - INFO - Step 85500 -- Training Metrics |
|
2025-08-30 19:56:27 - pico-train - INFO - ├── Loss: 4.7698 |
|
2025-08-30 19:56:27 - pico-train - INFO - ├── Learning Rate: 2.00e-05 |
|
2025-08-30 19:56:27 - pico-train - INFO - └── Inf/NaN count: 0 |
|
2025-08-30 19:57:21 - pico-train - INFO - Step 85600 -- Training Metrics |
|
2025-08-30 19:57:21 - pico-train - INFO - ├── Loss: 4.7835 |
|
2025-08-30 19:57:21 - pico-train - INFO - ├── Learning Rate: 2.00e-05 |
|
2025-08-30 19:57:21 - pico-train - INFO - └── Inf/NaN count: 0 |
|
2025-08-30 19:58:14 - pico-train - INFO - Step 85700 -- Training Metrics |
|
2025-08-30 19:58:14 - pico-train - INFO - ├── Loss: 4.7651 |
|
2025-08-30 19:58:14 - pico-train - INFO - ├── Learning Rate: 2.00e-05 |
|
2025-08-30 19:58:14 - pico-train - INFO - └── Inf/NaN count: 0 |
|
2025-08-30 19:59:06 - pico-train - INFO - Step 85800 -- Training Metrics |
|
2025-08-30 19:59:06 - pico-train - INFO - ├── Loss: 4.7900 |
|
2025-08-30 19:59:06 - pico-train - INFO - ├── Learning Rate: 2.00e-05 |
|
2025-08-30 19:59:06 - pico-train - INFO - └── Inf/NaN count: 0 |
|
2025-08-30 19:59:58 - pico-train - INFO - Step 85900 -- Training Metrics |
|
2025-08-30 19:59:58 - pico-train - INFO - ├── Loss: 4.7797 |
|
2025-08-30 19:59:58 - pico-train - INFO - ├── Learning Rate: 2.00e-05 |
|
2025-08-30 19:59:58 - pico-train - INFO - └── Inf/NaN count: 0 |
|
2025-08-30 20:00:50 - pico-train - INFO - Step 86000 -- Saving Checkpoint |
|
2025-08-30 20:02:55 - pico-train - INFO - Step 86000 -- Evaluation Results |
|
2025-08-30 20:02:55 - pico-train - INFO - └── paloma: inf |
|
2025-08-30 20:02:56 - pico-train - INFO - Step 86000 -- Training Metrics |
|
2025-08-30 20:02:56 - pico-train - INFO - ├── Loss: 4.7650 |
|
2025-08-30 20:02:56 - pico-train - INFO - ├── Learning Rate: 2.00e-05 |
|
2025-08-30 20:02:56 - pico-train - INFO - └── Inf/NaN count: 0 |
|
2025-08-30 20:02:56 - pico-train - INFO - Step 86000 -- Saving Learning Dynamics |
|
2025-08-30 20:03:51 - pico-train - INFO - Step 86100 -- Training Metrics |
|
2025-08-30 20:03:51 - pico-train - INFO - ├── Loss: 4.7682 |
|
2025-08-30 20:03:51 - pico-train - INFO - ├── Learning Rate: 2.00e-05 |
|
2025-08-30 20:03:51 - pico-train - INFO - └── Inf/NaN count: 0 |
|
2025-08-30 20:04:44 - pico-train - INFO - Step 86200 -- Training Metrics |
|
2025-08-30 20:04:44 - pico-train - INFO - ├── Loss: 4.7968 |
|
2025-08-30 20:04:44 - pico-train - INFO - ├── Learning Rate: 2.00e-05 |
|
2025-08-30 20:04:44 - pico-train - INFO - └── Inf/NaN count: 0 |
|
2025-08-30 20:05:37 - pico-train - INFO - Step 86300 -- Training Metrics |
|
2025-08-30 20:05:37 - pico-train - INFO - ├── Loss: 4.7895 |
|
2025-08-30 20:05:37 - pico-train - INFO - ├── Learning Rate: 2.00e-05 |
|
2025-08-30 20:05:37 - pico-train - INFO - └── Inf/NaN count: 0 |
|
2025-08-30 20:06:30 - pico-train - INFO - Step 86400 -- Training Metrics |
|
2025-08-30 20:06:30 - pico-train - INFO - ├── Loss: 4.7680 |
|
2025-08-30 20:06:30 - pico-train - INFO - ├── Learning Rate: 2.00e-05 |
|
2025-08-30 20:06:30 - pico-train - INFO - └── Inf/NaN count: 0 |
|
2025-08-30 20:07:23 - pico-train - INFO - Step 86500 -- Training Metrics |
|
2025-08-30 20:07:23 - pico-train - INFO - ├── Loss: 4.7686 |
|
2025-08-30 20:07:23 - pico-train - INFO - ├── Learning Rate: 2.00e-05 |
|
2025-08-30 20:07:23 - pico-train - INFO - └── Inf/NaN count: 0 |
|
2025-08-30 20:08:16 - pico-train - INFO - Step 86600 -- Training Metrics |
|
2025-08-30 20:08:16 - pico-train - INFO - ├── Loss: 4.7828 |
|
2025-08-30 20:08:16 - pico-train - INFO - ├── Learning Rate: 2.00e-05 |
|
2025-08-30 20:08:16 - pico-train - INFO - └── Inf/NaN count: 0 |
|
2025-08-30 20:09:10 - pico-train - INFO - Step 86700 -- Training Metrics |
|
2025-08-30 20:09:10 - pico-train - INFO - ├── Loss: 4.7595 |
|
2025-08-30 20:09:10 - pico-train - INFO - ├── Learning Rate: 2.00e-05 |
|
2025-08-30 20:09:10 - pico-train - INFO - └── Inf/NaN count: 0 |
|
2025-08-30 20:10:02 - pico-train - INFO - Step 86800 -- Training Metrics |
|
2025-08-30 20:10:02 - pico-train - INFO - ├── Loss: 4.7808 |
|
2025-08-30 20:10:02 - pico-train - INFO - ├── Learning Rate: 2.00e-05 |
|
2025-08-30 20:10:02 - pico-train - INFO - └── Inf/NaN count: 0 |
|
2025-08-30 20:10:56 - pico-train - INFO - Step 86900 -- Training Metrics |
|
2025-08-30 20:10:56 - pico-train - INFO - ├── Loss: 4.7668 |
|
2025-08-30 20:10:56 - pico-train - INFO - โโโ Learning Rate: 2.00e-05 |
|
2025-08-30 20:10:56 - pico-train - INFO - โโโ Inf/NaN count: 0 |
|
2025-08-30 20:11:49 - pico-train - INFO - Step 87000 -- ๐ Training Metrics |
|
2025-08-30 20:11:49 - pico-train - INFO - โโโ Loss: 4.7481 |
|
2025-08-30 20:11:49 - pico-train - INFO - โโโ Learning Rate: 2.00e-05 |
|
2025-08-30 20:11:49 - pico-train - INFO - โโโ Inf/NaN count: 0 |
|
2025-08-30 20:12:41 - pico-train - INFO - Step 87100 -- ๐ Training Metrics |
|
2025-08-30 20:12:41 - pico-train - INFO - โโโ Loss: 4.7536 |
|
2025-08-30 20:12:41 - pico-train - INFO - โโโ Learning Rate: 2.00e-05 |
|
2025-08-30 20:12:41 - pico-train - INFO - โโโ Inf/NaN count: 0 |
|
2025-08-30 20:13:32 - pico-train - INFO - Step 87200 -- ๐ Training Metrics |
|
2025-08-30 20:13:32 - pico-train - INFO - โโโ Loss: 4.7748 |
|
2025-08-30 20:13:32 - pico-train - INFO - โโโ Learning Rate: 2.00e-05 |
|
2025-08-30 20:13:32 - pico-train - INFO - โโโ Inf/NaN count: 0 |
|
2025-08-30 20:14:26 - pico-train - INFO - Step 87300 -- ๐ Training Metrics |
|
2025-08-30 20:14:26 - pico-train - INFO - โโโ Loss: 4.7597 |
|
2025-08-30 20:14:26 - pico-train - INFO - โโโ Learning Rate: 2.00e-05 |
|
2025-08-30 20:14:26 - pico-train - INFO - โโโ Inf/NaN count: 0 |
|
2025-08-30 20:15:19 - pico-train - INFO - Step 87400 -- ๐ Training Metrics |
|
2025-08-30 20:15:19 - pico-train - INFO - โโโ Loss: 4.7862 |
|
2025-08-30 20:15:19 - pico-train - INFO - โโโ Learning Rate: 2.00e-05 |
|
2025-08-30 20:15:19 - pico-train - INFO - โโโ Inf/NaN count: 0 |
|
2025-08-30 20:16:12 - pico-train - INFO - Step 87500 -- ๐ Training Metrics |
|
2025-08-30 20:16:12 - pico-train - INFO - โโโ Loss: 4.7682 |
|
2025-08-30 20:16:12 - pico-train - INFO - โโโ Learning Rate: 2.00e-05 |
|
2025-08-30 20:16:12 - pico-train - INFO - โโโ Inf/NaN count: 0 |
|
2025-08-30 20:17:05 - pico-train - INFO - Step 87600 -- ๐ Training Metrics |
|
2025-08-30 20:17:05 - pico-train - INFO - โโโ Loss: 4.8045 |
|
2025-08-30 20:17:05 - pico-train - INFO - โโโ Learning Rate: 2.00e-05 |
|
2025-08-30 20:17:05 - pico-train - INFO - โโโ Inf/NaN count: 0 |
|
2025-08-30 20:17:58 - pico-train - INFO - Step 87700 -- ๐ Training Metrics |
|
2025-08-30 20:17:58 - pico-train - INFO - โโโ Loss: 4.7911 |
|
2025-08-30 20:17:58 - pico-train - INFO - โโโ Learning Rate: 2.00e-05 |
|
2025-08-30 20:17:58 - pico-train - INFO - โโโ Inf/NaN count: 0 |
|
2025-08-30 20:18:51 - pico-train - INFO - Step 87800 -- ๐ Training Metrics |
|
2025-08-30 20:18:51 - pico-train - INFO - โโโ Loss: 4.7530 |
|
2025-08-30 20:18:51 - pico-train - INFO - โโโ Learning Rate: 2.00e-05 |
|
2025-08-30 20:18:51 - pico-train - INFO - โโโ Inf/NaN count: 0 |
|
2025-08-30 20:19:45 - pico-train - INFO - Step 87900 -- ๐ Training Metrics |
|
2025-08-30 20:19:45 - pico-train - INFO - โโโ Loss: 4.7618 |
|
2025-08-30 20:19:45 - pico-train - INFO - โโโ Learning Rate: 2.00e-05 |
|
2025-08-30 20:19:45 - pico-train - INFO - โโโ Inf/NaN count: 0 |
|
2025-08-30 20:20:37 - pico-train - INFO - Step 88000 -- 💾 Saving Checkpoint |
|
2025-08-30 20:22:42 - pico-train - INFO - Step 88000 -- ๐ Evaluation Results |
|
2025-08-30 20:22:42 - pico-train - INFO - โโโ paloma: inf |
|
2025-08-30 20:22:43 - pico-train - INFO - Step 88000 -- ๐ Training Metrics |
|
2025-08-30 20:22:43 - pico-train - INFO - โโโ Loss: 4.7796 |
|
2025-08-30 20:22:43 - pico-train - INFO - โโโ Learning Rate: 2.00e-05 |
|
2025-08-30 20:22:43 - pico-train - INFO - โโโ Inf/NaN count: 0 |
|
2025-08-30 20:22:43 - pico-train - INFO - Step 88000 -- ๐ Saving Learning Dynamics |
|
2025-08-30 20:23:38 - pico-train - INFO - Step 88100 -- ๐ Training Metrics |
|
2025-08-30 20:23:38 - pico-train - INFO - โโโ Loss: 4.7432 |
|
2025-08-30 20:23:38 - pico-train - INFO - โโโ Learning Rate: 2.00e-05 |
|
2025-08-30 20:23:38 - pico-train - INFO - โโโ Inf/NaN count: 0 |
|
2025-08-30 20:24:31 - pico-train - INFO - Step 88200 -- ๐ Training Metrics |
|
2025-08-30 20:24:31 - pico-train - INFO - โโโ Loss: 4.7725 |
|
2025-08-30 20:24:31 - pico-train - INFO - โโโ Learning Rate: 2.00e-05 |
|
2025-08-30 20:24:31 - pico-train - INFO - โโโ Inf/NaN count: 0 |
|
2025-08-30 20:25:24 - pico-train - INFO - Step 88300 -- ๐ Training Metrics |
|
2025-08-30 20:25:24 - pico-train - INFO - โโโ Loss: 4.7749 |
|
2025-08-30 20:25:24 - pico-train - INFO - โโโ Learning Rate: 2.00e-05 |
|
2025-08-30 20:25:24 - pico-train - INFO - โโโ Inf/NaN count: 0 |
|
2025-08-30 20:26:17 - pico-train - INFO - Step 88400 -- ๐ Training Metrics |
|
2025-08-30 20:26:17 - pico-train - INFO - โโโ Loss: 4.7883 |
|
2025-08-30 20:26:17 - pico-train - INFO - โโโ Learning Rate: 2.00e-05 |
|
2025-08-30 20:26:17 - pico-train - INFO - โโโ Inf/NaN count: 0 |
|
2025-08-30 20:27:10 - pico-train - INFO - Step 88500 -- ๐ Training Metrics |
|
2025-08-30 20:27:10 - pico-train - INFO - โโโ Loss: 4.7871 |
|
2025-08-30 20:27:10 - pico-train - INFO - โโโ Learning Rate: 2.00e-05 |
|
2025-08-30 20:27:10 - pico-train - INFO - โโโ Inf/NaN count: 0 |
|
2025-08-30 20:28:03 - pico-train - INFO - Step 88600 -- ๐ Training Metrics |
|
2025-08-30 20:28:03 - pico-train - INFO - โโโ Loss: 4.7894 |
|
2025-08-30 20:28:03 - pico-train - INFO - โโโ Learning Rate: 2.00e-05 |
|
2025-08-30 20:28:03 - pico-train - INFO - โโโ Inf/NaN count: 0 |
|
2025-08-30 20:28:56 - pico-train - INFO - Step 88700 -- ๐ Training Metrics |
|
2025-08-30 20:28:56 - pico-train - INFO - โโโ Loss: 4.7812 |
|
2025-08-30 20:28:56 - pico-train - INFO - โโโ Learning Rate: 2.00e-05 |
|
2025-08-30 20:28:56 - pico-train - INFO - โโโ Inf/NaN count: 0 |
|
2025-08-30 20:29:49 - pico-train - INFO - Step 88800 -- ๐ Training Metrics |
|
2025-08-30 20:29:49 - pico-train - INFO - โโโ Loss: 4.7371 |
|
2025-08-30 20:29:49 - pico-train - INFO - โโโ Learning Rate: 2.00e-05 |
|
2025-08-30 20:29:49 - pico-train - INFO - โโโ Inf/NaN count: 0 |
|
2025-08-30 20:30:43 - pico-train - INFO - Step 88900 -- ๐ Training Metrics |
|
2025-08-30 20:30:43 - pico-train - INFO - โโโ Loss: 4.7666 |
|
2025-08-30 20:30:43 - pico-train - INFO - โโโ Learning Rate: 2.00e-05 |
|
2025-08-30 20:30:43 - pico-train - INFO - โโโ Inf/NaN count: 0 |
|
2025-08-30 20:31:36 - pico-train - INFO - Step 89000 -- ๐ Training Metrics |
|
2025-08-30 20:31:36 - pico-train - INFO - โโโ Loss: 4.7623 |
|
2025-08-30 20:31:36 - pico-train - INFO - โโโ Learning Rate: 2.00e-05 |
|
2025-08-30 20:31:36 - pico-train - INFO - โโโ Inf/NaN count: 0 |
|
2025-08-30 20:32:29 - pico-train - INFO - Step 89100 -- ๐ Training Metrics |
|
2025-08-30 20:32:29 - pico-train - INFO - โโโ Loss: 4.7911 |
|
2025-08-30 20:32:29 - pico-train - INFO - โโโ Learning Rate: 2.00e-05 |
|
2025-08-30 20:32:29 - pico-train - INFO - โโโ Inf/NaN count: 0 |
|
2025-08-30 20:33:22 - pico-train - INFO - Step 89200 -- ๐ Training Metrics |
|
2025-08-30 20:33:22 - pico-train - INFO - โโโ Loss: 4.7823 |
|
2025-08-30 20:33:22 - pico-train - INFO - โโโ Learning Rate: 2.00e-05 |
|
2025-08-30 20:33:22 - pico-train - INFO - โโโ Inf/NaN count: 0 |
|
2025-08-30 20:34:15 - pico-train - INFO - Step 89300 -- ๐ Training Metrics |
|
2025-08-30 20:34:15 - pico-train - INFO - โโโ Loss: 4.7830 |
|
2025-08-30 20:34:15 - pico-train - INFO - โโโ Learning Rate: 2.00e-05 |
|
2025-08-30 20:34:15 - pico-train - INFO - โโโ Inf/NaN count: 0 |
|
2025-08-30 20:35:08 - pico-train - INFO - Step 89400 -- ๐ Training Metrics |
|
2025-08-30 20:35:08 - pico-train - INFO - โโโ Loss: 4.7724 |
|
2025-08-30 20:35:08 - pico-train - INFO - โโโ Learning Rate: 2.00e-05 |
|
2025-08-30 20:35:08 - pico-train - INFO - โโโ Inf/NaN count: 0 |
|
2025-08-30 20:36:01 - pico-train - INFO - Step 89500 -- ๐ Training Metrics |
|
2025-08-30 20:36:01 - pico-train - INFO - โโโ Loss: 4.7654 |
|
2025-08-30 20:36:01 - pico-train - INFO - โโโ Learning Rate: 2.00e-05 |
|
2025-08-30 20:36:01 - pico-train - INFO - โโโ Inf/NaN count: 0 |
|
2025-08-30 20:36:54 - pico-train - INFO - Step 89600 -- ๐ Training Metrics |
|
2025-08-30 20:36:54 - pico-train - INFO - โโโ Loss: 4.7613 |
|
2025-08-30 20:36:54 - pico-train - INFO - โโโ Learning Rate: 2.00e-05 |
|
2025-08-30 20:36:54 - pico-train - INFO - โโโ Inf/NaN count: 0 |
|
2025-08-30 20:37:47 - pico-train - INFO - Step 89700 -- ๐ Training Metrics |
|
2025-08-30 20:37:47 - pico-train - INFO - โโโ Loss: 4.7544 |
|
2025-08-30 20:37:47 - pico-train - INFO - โโโ Learning Rate: 2.00e-05 |
|
2025-08-30 20:37:47 - pico-train - INFO - โโโ Inf/NaN count: 0 |
|
2025-08-30 20:38:41 - pico-train - INFO - Step 89800 -- ๐ Training Metrics |
|
2025-08-30 20:38:41 - pico-train - INFO - โโโ Loss: 4.7889 |
|
2025-08-30 20:38:41 - pico-train - INFO - โโโ Learning Rate: 2.00e-05 |
|
2025-08-30 20:38:41 - pico-train - INFO - โโโ Inf/NaN count: 0 |
|
2025-08-30 20:39:34 - pico-train - INFO - Step 89900 -- ๐ Training Metrics |
|
2025-08-30 20:39:34 - pico-train - INFO - โโโ Loss: 4.7928 |
|
2025-08-30 20:39:34 - pico-train - INFO - โโโ Learning Rate: 2.00e-05 |
|
2025-08-30 20:39:34 - pico-train - INFO - โโโ Inf/NaN count: 0 |
|
2025-08-30 20:40:26 - pico-train - INFO - Step 90000 -- 💾 Saving Checkpoint |
|
2025-08-30 20:42:31 - pico-train - INFO - Step 90000 -- ๐ Evaluation Results |
|
2025-08-30 20:42:31 - pico-train - INFO - โโโ paloma: inf |
|
2025-08-30 20:42:31 - pico-train - INFO - Step 90000 -- ๐ Training Metrics |
|
2025-08-30 20:42:31 - pico-train - INFO - โโโ Loss: 4.7777 |
|
2025-08-30 20:42:31 - pico-train - INFO - โโโ Learning Rate: 2.00e-05 |
|
2025-08-30 20:42:31 - pico-train - INFO - โโโ Inf/NaN count: 0 |
|
2025-08-30 20:42:31 - pico-train - INFO - Step 90000 -- ๐ Saving Learning Dynamics |
|
2025-08-30 20:43:26 - pico-train - INFO - Step 90100 -- ๐ Training Metrics |
|
2025-08-30 20:43:26 - pico-train - INFO - โโโ Loss: 4.7721 |
|
2025-08-30 20:43:26 - pico-train - INFO - โโโ Learning Rate: 2.00e-05 |
|
2025-08-30 20:43:26 - pico-train - INFO - โโโ Inf/NaN count: 0 |
|
2025-08-30 20:44:18 - pico-train - INFO - Step 90200 -- ๐ Training Metrics |
|
2025-08-30 20:44:18 - pico-train - INFO - โโโ Loss: 4.7616 |
|
2025-08-30 20:44:18 - pico-train - INFO - โโโ Learning Rate: 2.00e-05 |
|
2025-08-30 20:44:18 - pico-train - INFO - โโโ Inf/NaN count: 0 |
|
2025-08-30 20:45:10 - pico-train - INFO - Step 90300 -- ๐ Training Metrics |
|
2025-08-30 20:45:10 - pico-train - INFO - โโโ Loss: 4.7529 |
|
2025-08-30 20:45:10 - pico-train - INFO - โโโ Learning Rate: 2.00e-05 |
|
2025-08-30 20:45:10 - pico-train - INFO - โโโ Inf/NaN count: 0 |
|
2025-08-30 20:46:04 - pico-train - INFO - Step 90400 -- ๐ Training Metrics |
|
2025-08-30 20:46:04 - pico-train - INFO - โโโ Loss: 4.7656 |
|
2025-08-30 20:46:04 - pico-train - INFO - โโโ Learning Rate: 2.00e-05 |
|
2025-08-30 20:46:04 - pico-train - INFO - โโโ Inf/NaN count: 0 |
|
2025-08-30 20:46:56 - pico-train - INFO - Step 90500 -- ๐ Training Metrics |
|
2025-08-30 20:46:56 - pico-train - INFO - โโโ Loss: 4.7484 |
|
2025-08-30 20:46:56 - pico-train - INFO - โโโ Learning Rate: 2.00e-05 |
|
2025-08-30 20:46:56 - pico-train - INFO - โโโ Inf/NaN count: 0 |
|
2025-08-30 20:47:50 - pico-train - INFO - Step 90600 -- ๐ Training Metrics |
|
2025-08-30 20:47:50 - pico-train - INFO - โโโ Loss: 4.7811 |
|
2025-08-30 20:47:50 - pico-train - INFO - โโโ Learning Rate: 2.00e-05 |
|
2025-08-30 20:47:50 - pico-train - INFO - โโโ Inf/NaN count: 0 |
|
2025-08-30 20:48:43 - pico-train - INFO - Step 90700 -- ๐ Training Metrics |
|
2025-08-30 20:48:43 - pico-train - INFO - โโโ Loss: 4.7523 |
|
2025-08-30 20:48:43 - pico-train - INFO - โโโ Learning Rate: 2.00e-05 |
|
2025-08-30 20:48:43 - pico-train - INFO - โโโ Inf/NaN count: 0 |
|
2025-08-30 20:49:36 - pico-train - INFO - Step 90800 -- ๐ Training Metrics |
|
2025-08-30 20:49:36 - pico-train - INFO - โโโ Loss: 4.7822 |
|
2025-08-30 20:49:36 - pico-train - INFO - โโโ Learning Rate: 2.00e-05 |
|
2025-08-30 20:49:36 - pico-train - INFO - โโโ Inf/NaN count: 0 |
|
2025-08-30 20:50:29 - pico-train - INFO - Step 90900 -- ๐ Training Metrics |
|
2025-08-30 20:50:29 - pico-train - INFO - โโโ Loss: 4.7780 |
|
2025-08-30 20:50:29 - pico-train - INFO - โโโ Learning Rate: 2.00e-05 |
|
2025-08-30 20:50:29 - pico-train - INFO - โโโ Inf/NaN count: 0 |
|
2025-08-30 20:51:22 - pico-train - INFO - Step 91000 -- ๐ Training Metrics |
|
2025-08-30 20:51:22 - pico-train - INFO - โโโ Loss: 4.7850 |
|
2025-08-30 20:51:22 - pico-train - INFO - โโโ Learning Rate: 2.00e-05 |
|
2025-08-30 20:51:22 - pico-train - INFO - โโโ Inf/NaN count: 0 |
|
2025-08-30 20:52:15 - pico-train - INFO - Step 91100 -- ๐ Training Metrics |
|
2025-08-30 20:52:15 - pico-train - INFO - โโโ Loss: 4.7669 |
|
2025-08-30 20:52:15 - pico-train - INFO - โโโ Learning Rate: 2.00e-05 |
|
2025-08-30 20:52:15 - pico-train - INFO - โโโ Inf/NaN count: 0 |
|
2025-08-30 20:53:09 - pico-train - INFO - Step 91200 -- ๐ Training Metrics |
|
2025-08-30 20:53:09 - pico-train - INFO - โโโ Loss: 4.7713 |
|
2025-08-30 20:53:09 - pico-train - INFO - โโโ Learning Rate: 2.00e-05 |
|
2025-08-30 20:53:09 - pico-train - INFO - โโโ Inf/NaN count: 0 |
|
2025-08-30 20:54:02 - pico-train - INFO - Step 91300 -- ๐ Training Metrics |
|
2025-08-30 20:54:02 - pico-train - INFO - โโโ Loss: 4.7832 |
|
2025-08-30 20:54:02 - pico-train - INFO - โโโ Learning Rate: 2.00e-05 |
|
2025-08-30 20:54:02 - pico-train - INFO - โโโ Inf/NaN count: 0 |
|
2025-08-30 20:54:55 - pico-train - INFO - Step 91400 -- ๐ Training Metrics |
|
2025-08-30 20:54:55 - pico-train - INFO - โโโ Loss: 4.7749 |
|
2025-08-30 20:54:55 - pico-train - INFO - โโโ Learning Rate: 2.00e-05 |
|
2025-08-30 20:54:55 - pico-train - INFO - โโโ Inf/NaN count: 0 |
|
2025-08-30 20:55:48 - pico-train - INFO - Step 91500 -- ๐ Training Metrics |
|
2025-08-30 20:55:48 - pico-train - INFO - โโโ Loss: 4.7702 |
|
2025-08-30 20:55:48 - pico-train - INFO - โโโ Learning Rate: 2.00e-05 |
|
2025-08-30 20:55:48 - pico-train - INFO - โโโ Inf/NaN count: 0 |
|
2025-08-30 20:56:41 - pico-train - INFO - Step 91600 -- ๐ Training Metrics |
|
2025-08-30 20:56:41 - pico-train - INFO - โโโ Loss: 4.7792 |
|
2025-08-30 20:56:41 - pico-train - INFO - โโโ Learning Rate: 2.00e-05 |
|
2025-08-30 20:56:41 - pico-train - INFO - โโโ Inf/NaN count: 0 |
|
2025-08-30 20:57:34 - pico-train - INFO - Step 91700 -- ๐ Training Metrics |
|
2025-08-30 20:57:34 - pico-train - INFO - โโโ Loss: 4.7678 |
|
2025-08-30 20:57:34 - pico-train - INFO - โโโ Learning Rate: 2.00e-05 |
|
2025-08-30 20:57:34 - pico-train - INFO - โโโ Inf/NaN count: 0 |
|
2025-08-30 20:58:28 - pico-train - INFO - Step 91800 -- ๐ Training Metrics |
|
2025-08-30 20:58:28 - pico-train - INFO - โโโ Loss: 4.7831 |
|
2025-08-30 20:58:28 - pico-train - INFO - โโโ Learning Rate: 2.00e-05 |
|
2025-08-30 20:58:28 - pico-train - INFO - โโโ Inf/NaN count: 0 |
|
2025-08-30 20:59:21 - pico-train - INFO - Step 91900 -- ๐ Training Metrics |
|
2025-08-30 20:59:21 - pico-train - INFO - โโโ Loss: 4.7746 |
|
2025-08-30 20:59:21 - pico-train - INFO - โโโ Learning Rate: 2.00e-05 |
|
2025-08-30 20:59:21 - pico-train - INFO - โโโ Inf/NaN count: 0 |
|
2025-08-30 21:00:13 - pico-train - INFO - Step 92000 -- 💾 Saving Checkpoint |
|
2025-08-30 21:02:18 - pico-train - INFO - Step 92000 -- ๐ Evaluation Results |
|
2025-08-30 21:02:18 - pico-train - INFO - โโโ paloma: inf |
|
2025-08-30 21:02:18 - pico-train - INFO - Step 92000 -- ๐ Training Metrics |
|
2025-08-30 21:02:18 - pico-train - INFO - โโโ Loss: 4.7812 |
|
2025-08-30 21:02:18 - pico-train - INFO - โโโ Learning Rate: 2.00e-05 |
|
2025-08-30 21:02:18 - pico-train - INFO - โโโ Inf/NaN count: 0 |
|
2025-08-30 21:02:18 - pico-train - INFO - Step 92000 -- ๐ Saving Learning Dynamics |
|
2025-08-30 21:03:14 - pico-train - INFO - Step 92100 -- ๐ Training Metrics |
|
2025-08-30 21:03:14 - pico-train - INFO - โโโ Loss: 4.7569 |
|
2025-08-30 21:03:14 - pico-train - INFO - โโโ Learning Rate: 2.00e-05 |
|
2025-08-30 21:03:14 - pico-train - INFO - โโโ Inf/NaN count: 0 |
|
2025-08-30 21:04:06 - pico-train - INFO - Step 92200 -- ๐ Training Metrics |
|
2025-08-30 21:04:06 - pico-train - INFO - โโโ Loss: 4.7846 |
|
2025-08-30 21:04:06 - pico-train - INFO - โโโ Learning Rate: 2.00e-05 |
|
2025-08-30 21:04:06 - pico-train - INFO - โโโ Inf/NaN count: 0 |
|
2025-08-30 21:04:58 - pico-train - INFO - Step 92300 -- ๐ Training Metrics |
|
2025-08-30 21:04:58 - pico-train - INFO - โโโ Loss: 4.7687 |
|
2025-08-30 21:04:58 - pico-train - INFO - โโโ Learning Rate: 2.00e-05 |
|
2025-08-30 21:04:58 - pico-train - INFO - โโโ Inf/NaN count: 0 |
|
2025-08-30 21:05:50 - pico-train - INFO - Step 92400 -- ๐ Training Metrics |
|
2025-08-30 21:05:50 - pico-train - INFO - โโโ Loss: 4.7699 |
|
2025-08-30 21:05:50 - pico-train - INFO - โโโ Learning Rate: 2.00e-05 |
|
2025-08-30 21:05:50 - pico-train - INFO - โโโ Inf/NaN count: 0 |
|
2025-08-30 21:06:42 - pico-train - INFO - Step 92500 -- ๐ Training Metrics |
|
2025-08-30 21:06:42 - pico-train - INFO - โโโ Loss: 4.7961 |
|
2025-08-30 21:06:42 - pico-train - INFO - โโโ Learning Rate: 2.00e-05 |
|
2025-08-30 21:06:42 - pico-train - INFO - โโโ Inf/NaN count: 0 |
|
2025-08-30 21:07:34 - pico-train - INFO - Step 92600 -- ๐ Training Metrics |
|
2025-08-30 21:07:34 - pico-train - INFO - โโโ Loss: 4.7682 |
|
2025-08-30 21:07:34 - pico-train - INFO - โโโ Learning Rate: 2.00e-05 |
|
2025-08-30 21:07:34 - pico-train - INFO - โโโ Inf/NaN count: 0 |
|
2025-08-30 21:08:26 - pico-train - INFO - Step 92700 -- ๐ Training Metrics |
|
2025-08-30 21:08:26 - pico-train - INFO - โโโ Loss: 4.7786 |
|
2025-08-30 21:08:26 - pico-train - INFO - โโโ Learning Rate: 2.00e-05 |
|
2025-08-30 21:08:26 - pico-train - INFO - โโโ Inf/NaN count: 0 |
|
2025-08-30 21:09:18 - pico-train - INFO - Step 92800 -- ๐ Training Metrics |
|
2025-08-30 21:09:18 - pico-train - INFO - โโโ Loss: 4.7716 |
|
2025-08-30 21:09:18 - pico-train - INFO - โโโ Learning Rate: 2.00e-05 |
|
2025-08-30 21:09:18 - pico-train - INFO - โโโ Inf/NaN count: 0 |
|
2025-08-30 21:10:11 - pico-train - INFO - Step 92900 -- ๐ Training Metrics |
|
2025-08-30 21:10:11 - pico-train - INFO - โโโ Loss: 4.7837 |
|
2025-08-30 21:10:11 - pico-train - INFO - โโโ Learning Rate: 2.00e-05 |
|
2025-08-30 21:10:11 - pico-train - INFO - โโโ Inf/NaN count: 0 |
|
2025-08-30 21:11:04 - pico-train - INFO - Step 93000 -- ๐ Training Metrics |
|
2025-08-30 21:11:04 - pico-train - INFO - โโโ Loss: 4.7811 |
|
2025-08-30 21:11:04 - pico-train - INFO - โโโ Learning Rate: 2.00e-05 |
|
2025-08-30 21:11:04 - pico-train - INFO - โโโ Inf/NaN count: 0 |
|
2025-08-30 21:11:57 - pico-train - INFO - Step 93100 -- ๐ Training Metrics |
|
2025-08-30 21:11:57 - pico-train - INFO - โโโ Loss: 4.7830 |
|
2025-08-30 21:11:57 - pico-train - INFO - โโโ Learning Rate: 2.00e-05 |
|
2025-08-30 21:11:57 - pico-train - INFO - โโโ Inf/NaN count: 0 |
|
2025-08-30 21:12:50 - pico-train - INFO - Step 93200 -- ๐ Training Metrics |
|
2025-08-30 21:12:50 - pico-train - INFO - โโโ Loss: 4.7935 |
|
2025-08-30 21:12:50 - pico-train - INFO - โโโ Learning Rate: 2.00e-05 |
|
2025-08-30 21:12:50 - pico-train - INFO - โโโ Inf/NaN count: 0 |
|
2025-08-30 21:13:43 - pico-train - INFO - Step 93300 -- ๐ Training Metrics |
|
2025-08-30 21:13:43 - pico-train - INFO - โโโ Loss: 4.8135 |
|
2025-08-30 21:13:43 - pico-train - INFO - โโโ Learning Rate: 2.00e-05 |
|
2025-08-30 21:13:43 - pico-train - INFO - โโโ Inf/NaN count: 0 |
|
2025-08-30 21:14:36 - pico-train - INFO - Step 93400 -- ๐ Training Metrics |
|
2025-08-30 21:14:36 - pico-train - INFO - โโโ Loss: 4.7767 |
|
2025-08-30 21:14:36 - pico-train - INFO - โโโ Learning Rate: 2.00e-05 |
|
2025-08-30 21:14:36 - pico-train - INFO - โโโ Inf/NaN count: 0 |
|
2025-08-30 21:15:29 - pico-train - INFO - Step 93500 -- ๐ Training Metrics |
|
2025-08-30 21:15:29 - pico-train - INFO - โโโ Loss: 4.8005 |
|
2025-08-30 21:15:29 - pico-train - INFO - โโโ Learning Rate: 2.00e-05 |
|
2025-08-30 21:15:29 - pico-train - INFO - โโโ Inf/NaN count: 0 |
|
2025-08-30 21:16:22 - pico-train - INFO - Step 93600 -- ๐ Training Metrics |
|
2025-08-30 21:16:22 - pico-train - INFO - โโโ Loss: 4.7913 |
|
2025-08-30 21:16:22 - pico-train - INFO - โโโ Learning Rate: 2.00e-05 |
|
2025-08-30 21:16:22 - pico-train - INFO - โโโ Inf/NaN count: 0 |
|
2025-08-30 21:17:15 - pico-train - INFO - Step 93700 -- ๐ Training Metrics |
|
2025-08-30 21:17:15 - pico-train - INFO - โโโ Loss: 4.7739 |
|
2025-08-30 21:17:15 - pico-train - INFO - โโโ Learning Rate: 2.00e-05 |
|
2025-08-30 21:17:15 - pico-train - INFO - โโโ Inf/NaN count: 0 |
|
2025-08-30 21:18:08 - pico-train - INFO - Step 93800 -- ๐ Training Metrics |
|
2025-08-30 21:18:08 - pico-train - INFO - โโโ Loss: 4.7875 |
|
2025-08-30 21:18:08 - pico-train - INFO - โโโ Learning Rate: 2.00e-05 |
|
2025-08-30 21:18:08 - pico-train - INFO - โโโ Inf/NaN count: 0 |
|
2025-08-30 21:19:00 - pico-train - INFO - Step 93900 -- ๐ Training Metrics |
|
2025-08-30 21:19:00 - pico-train - INFO - โโโ Loss: 4.7801 |
|
2025-08-30 21:19:00 - pico-train - INFO - โโโ Learning Rate: 2.00e-05 |
|
2025-08-30 21:19:00 - pico-train - INFO - โโโ Inf/NaN count: 0 |
|
2025-08-30 21:19:52 - pico-train - INFO - Step 94000 -- 💾 Saving Checkpoint |
|
2025-08-30 21:21:41 - pico-train - INFO - Step 94000 -- ๐ Evaluation Results |
|
2025-08-30 21:21:41 - pico-train - INFO - โโโ paloma: inf |
|
2025-08-30 21:21:42 - pico-train - INFO - Step 94000 -- ๐ Training Metrics |
|
2025-08-30 21:21:42 - pico-train - INFO - โโโ Loss: 4.7826 |
|
2025-08-30 21:21:42 - pico-train - INFO - โโโ Learning Rate: 2.00e-05 |
|
2025-08-30 21:21:42 - pico-train - INFO - โโโ Inf/NaN count: 0 |
|
2025-08-30 21:21:42 - pico-train - INFO - Step 94000 -- ๐ Saving Learning Dynamics |
|
2025-08-30 21:22:36 - pico-train - INFO - Step 94100 -- ๐ Training Metrics |
|
2025-08-30 21:22:36 - pico-train - INFO - โโโ Loss: 4.7712 |
|
2025-08-30 21:22:36 - pico-train - INFO - โโโ Learning Rate: 2.00e-05 |
|
2025-08-30 21:22:36 - pico-train - INFO - โโโ Inf/NaN count: 0 |
|
2025-08-30 21:23:28 - pico-train - INFO - Step 94200 -- ๐ Training Metrics |
|
2025-08-30 21:23:28 - pico-train - INFO - โโโ Loss: 4.7528 |
|
2025-08-30 21:23:28 - pico-train - INFO - โโโ Learning Rate: 2.00e-05 |
|
2025-08-30 21:23:28 - pico-train - INFO - โโโ Inf/NaN count: 0 |
|
2025-08-30 21:24:21 - pico-train - INFO - Step 94300 -- ๐ Training Metrics |
|
2025-08-30 21:24:21 - pico-train - INFO - โโโ Loss: 4.7867 |
|
2025-08-30 21:24:21 - pico-train - INFO - โโโ Learning Rate: 2.00e-05 |
|
2025-08-30 21:24:21 - pico-train - INFO - โโโ Inf/NaN count: 0 |
|
2025-08-30 21:25:14 - pico-train - INFO - Step 94400 -- ๐ Training Metrics |
|
2025-08-30 21:25:14 - pico-train - INFO - โโโ Loss: 4.7694 |
|
2025-08-30 21:25:14 - pico-train - INFO - โโโ Learning Rate: 2.00e-05 |
|
2025-08-30 21:25:14 - pico-train - INFO - โโโ Inf/NaN count: 0 |
|
2025-08-30 21:26:07 - pico-train - INFO - Step 94500 -- ๐ Training Metrics |
|
2025-08-30 21:26:07 - pico-train - INFO - โโโ Loss: 4.7677 |
|
2025-08-30 21:26:07 - pico-train - INFO - โโโ Learning Rate: 2.00e-05 |
|
2025-08-30 21:26:07 - pico-train - INFO - โโโ Inf/NaN count: 0 |
|
2025-08-30 21:26:59 - pico-train - INFO - Step 94600 -- ๐ Training Metrics |
|
2025-08-30 21:26:59 - pico-train - INFO - โโโ Loss: 4.7968 |
|
2025-08-30 21:26:59 - pico-train - INFO - โโโ Learning Rate: 2.00e-05 |
|
2025-08-30 21:26:59 - pico-train - INFO - โโโ Inf/NaN count: 0 |
|
2025-08-30 21:27:52 - pico-train - INFO - Step 94700 -- ๐ Training Metrics |
|
2025-08-30 21:27:52 - pico-train - INFO - โโโ Loss: 4.7716 |
|
2025-08-30 21:27:52 - pico-train - INFO - โโโ Learning Rate: 2.00e-05 |
|
2025-08-30 21:27:52 - pico-train - INFO - โโโ Inf/NaN count: 0 |
|
2025-08-30 21:28:44 - pico-train - INFO - Step 94800 -- ๐ Training Metrics |
|
2025-08-30 21:28:44 - pico-train - INFO - โโโ Loss: 4.7446 |
|
2025-08-30 21:28:44 - pico-train - INFO - โโโ Learning Rate: 2.00e-05 |
|
2025-08-30 21:28:44 - pico-train - INFO - โโโ Inf/NaN count: 0 |
|
2025-08-30 21:29:36 - pico-train - INFO - Step 94900 -- ๐ Training Metrics |
|
2025-08-30 21:29:36 - pico-train - INFO - โโโ Loss: 4.7763 |
|
2025-08-30 21:29:36 - pico-train - INFO - โโโ Learning Rate: 2.00e-05 |
|
2025-08-30 21:29:36 - pico-train - INFO - โโโ Inf/NaN count: 0 |
|
2025-08-30 21:30:28 - pico-train - INFO - Step 95000 -- ๐ Training Metrics |
|
2025-08-30 21:30:28 - pico-train - INFO - โโโ Loss: 4.7830 |
|
2025-08-30 21:30:28 - pico-train - INFO - โโโ Learning Rate: 2.00e-05 |
|
2025-08-30 21:30:28 - pico-train - INFO - โโโ Inf/NaN count: 0 |
|
2025-08-30 21:31:20 - pico-train - INFO - Step 95100 -- ๐ Training Metrics |
|
2025-08-30 21:31:20 - pico-train - INFO - โโโ Loss: 4.7890 |
|
2025-08-30 21:31:20 - pico-train - INFO - โโโ Learning Rate: 2.00e-05 |
|
2025-08-30 21:31:20 - pico-train - INFO - โโโ Inf/NaN count: 0 |
|
2025-08-30 21:32:13 - pico-train - INFO - Step 95200 -- ๐ Training Metrics |
|
2025-08-30 21:32:13 - pico-train - INFO - โโโ Loss: 4.7685 |
|
2025-08-30 21:32:13 - pico-train - INFO - โโโ Learning Rate: 2.00e-05 |
|
2025-08-30 21:32:13 - pico-train - INFO - โโโ Inf/NaN count: 0 |
|
2025-08-30 21:33:06 - pico-train - INFO - Step 95300 -- ๐ Training Metrics |
|
2025-08-30 21:33:06 - pico-train - INFO - โโโ Loss: 4.8231 |
|
2025-08-30 21:33:06 - pico-train - INFO - โโโ Learning Rate: 2.00e-05 |
|
2025-08-30 21:33:06 - pico-train - INFO - โโโ Inf/NaN count: 0 |
|
2025-08-30 21:33:58 - pico-train - INFO - Step 95400 -- ๐ Training Metrics |
|
2025-08-30 21:33:58 - pico-train - INFO - โโโ Loss: 4.7698 |
|
2025-08-30 21:33:58 - pico-train - INFO - โโโ Learning Rate: 2.00e-05 |
|
2025-08-30 21:33:58 - pico-train - INFO - โโโ Inf/NaN count: 0 |
|
2025-08-30 21:34:50 - pico-train - INFO - Step 95500 -- ๐ Training Metrics |
|
2025-08-30 21:34:50 - pico-train - INFO - โโโ Loss: 4.7614 |
|
2025-08-30 21:34:50 - pico-train - INFO - โโโ Learning Rate: 2.00e-05 |
|
2025-08-30 21:34:50 - pico-train - INFO - โโโ Inf/NaN count: 0 |
|
2025-08-30 21:35:42 - pico-train - INFO - Step 95600 -- ๐ Training Metrics |
|
2025-08-30 21:35:42 - pico-train - INFO - โโโ Loss: 4.7906 |
|
2025-08-30 21:35:42 - pico-train - INFO - โโโ Learning Rate: 2.00e-05 |
|
2025-08-30 21:35:42 - pico-train - INFO - โโโ Inf/NaN count: 0 |
|
2025-08-30 21:36:34 - pico-train - INFO - Step 95700 -- ๐ Training Metrics |
|
2025-08-30 21:36:34 - pico-train - INFO - โโโ Loss: 4.7685 |
|
2025-08-30 21:36:34 - pico-train - INFO - โโโ Learning Rate: 2.00e-05 |
|
2025-08-30 21:36:34 - pico-train - INFO - โโโ Inf/NaN count: 0 |
|
2025-08-30 21:37:26 - pico-train - INFO - Step 95800 -- ๐ Training Metrics |
|
2025-08-30 21:37:26 - pico-train - INFO - โโโ Loss: 4.7466 |
|
2025-08-30 21:37:26 - pico-train - INFO - โโโ Learning Rate: 2.00e-05 |
|
2025-08-30 21:37:26 - pico-train - INFO - โโโ Inf/NaN count: 0 |
|
2025-08-30 21:38:18 - pico-train - INFO - Step 95900 -- ๐ Training Metrics |
|
2025-08-30 21:38:18 - pico-train - INFO - โโโ Loss: 4.7771 |
|
2025-08-30 21:38:18 - pico-train - INFO - โโโ Learning Rate: 2.00e-05 |
|
2025-08-30 21:38:18 - pico-train - INFO - โโโ Inf/NaN count: 0 |
|
2025-08-30 21:39:09 - pico-train - INFO - Step 96000 -- 💾 Saving Checkpoint |
|
2025-08-30 21:41:02 - pico-train - INFO - Step 96000 -- ๐ Evaluation Results |
|
2025-08-30 21:41:02 - pico-train - INFO - โโโ paloma: inf |
|
2025-08-30 21:41:03 - pico-train - INFO - Step 96000 -- ๐ Training Metrics |
|
2025-08-30 21:41:03 - pico-train - INFO - โโโ Loss: 4.7812 |
|
2025-08-30 21:41:03 - pico-train - INFO - โโโ Learning Rate: 2.00e-05 |
|
2025-08-30 21:41:03 - pico-train - INFO - โโโ Inf/NaN count: 0 |
|
2025-08-30 21:41:03 - pico-train - INFO - Step 96000 -- ๐ Saving Learning Dynamics |
|
2025-08-30 21:41:58 - pico-train - INFO - Step 96100 -- ๐ Training Metrics |
|
2025-08-30 21:41:58 - pico-train - INFO - โโโ Loss: 4.7849 |
|
2025-08-30 21:41:58 - pico-train - INFO - โโโ Learning Rate: 2.00e-05 |
|
2025-08-30 21:41:58 - pico-train - INFO - โโโ Inf/NaN count: 0 |
|
2025-08-30 21:42:51 - pico-train - INFO - Step 96200 -- ๐ Training Metrics |
|
2025-08-30 21:42:51 - pico-train - INFO - โโโ Loss: 4.7649 |
|
2025-08-30 21:42:51 - pico-train - INFO - โโโ Learning Rate: 2.00e-05 |
|
2025-08-30 21:42:51 - pico-train - INFO - โโโ Inf/NaN count: 0 |
|
2025-08-30 21:43:45 - pico-train - INFO - Step 96300 -- ๐ Training Metrics |
|
2025-08-30 21:43:45 - pico-train - INFO - โโโ Loss: 4.7696 |
|
2025-08-30 21:43:45 - pico-train - INFO - โโโ Learning Rate: 2.00e-05 |
|
2025-08-30 21:43:45 - pico-train - INFO - โโโ Inf/NaN count: 0 |
|
2025-08-30 21:44:37 - pico-train - INFO - Step 96400 -- ๐ Training Metrics |
|
2025-08-30 21:44:37 - pico-train - INFO - โโโ Loss: 4.7768 |
|
2025-08-30 21:44:37 - pico-train - INFO - โโโ Learning Rate: 2.00e-05 |
|
2025-08-30 21:44:37 - pico-train - INFO - โโโ Inf/NaN count: 0 |
|
2025-08-30 21:45:31 - pico-train - INFO - Step 96500 -- ๐ Training Metrics |
|
2025-08-30 21:45:31 - pico-train - INFO - โโโ Loss: 4.7631 |
|
2025-08-30 21:45:31 - pico-train - INFO - โโโ Learning Rate: 2.00e-05 |
|
2025-08-30 21:45:31 - pico-train - INFO - โโโ Inf/NaN count: 0 |
|
2025-08-30 21:46:24 - pico-train - INFO - Step 96600 -- ๐ Training Metrics |
|
2025-08-30 21:46:24 - pico-train - INFO - โโโ Loss: 4.7730 |
|
2025-08-30 21:46:24 - pico-train - INFO - โโโ Learning Rate: 2.00e-05 |
|
2025-08-30 21:46:24 - pico-train - INFO - โโโ Inf/NaN count: 0 |
|
2025-08-30 21:47:17 - pico-train - INFO - Step 96700 -- ๐ Training Metrics |
|
2025-08-30 21:47:17 - pico-train - INFO - โโโ Loss: 4.7832 |
|
2025-08-30 21:47:17 - pico-train - INFO - โโโ Learning Rate: 2.00e-05 |
|
2025-08-30 21:47:17 - pico-train - INFO - โโโ Inf/NaN count: 0 |
|
2025-08-30 21:48:10 - pico-train - INFO - Step 96800 -- ๐ Training Metrics |
|
2025-08-30 21:48:10 - pico-train - INFO - โโโ Loss: 4.7508 |
|
2025-08-30 21:48:10 - pico-train - INFO - โโโ Learning Rate: 2.00e-05 |
|
2025-08-30 21:48:10 - pico-train - INFO - โโโ Inf/NaN count: 0 |
|
2025-08-30 21:49:03 - pico-train - INFO - Step 96900 -- ๐ Training Metrics |
|
2025-08-30 21:49:03 - pico-train - INFO - โโโ Loss: 4.7688 |
|
2025-08-30 21:49:03 - pico-train - INFO - โโโ Learning Rate: 2.00e-05 |
|
2025-08-30 21:49:03 - pico-train - INFO - โโโ Inf/NaN count: 0 |
|
2025-08-30 21:49:56 - pico-train - INFO - Step 97000 -- ๐ Training Metrics |
|
2025-08-30 21:49:56 - pico-train - INFO - โโโ Loss: 4.7887 |
|
2025-08-30 21:49:56 - pico-train - INFO - โโโ Learning Rate: 2.00e-05 |
|
2025-08-30 21:49:56 - pico-train - INFO - โโโ Inf/NaN count: 0 |
|
2025-08-30 21:50:49 - pico-train - INFO - Step 97100 -- ๐ Training Metrics |
|
2025-08-30 21:50:49 - pico-train - INFO - โโโ Loss: 4.7774 |
|
2025-08-30 21:50:49 - pico-train - INFO - โโโ Learning Rate: 2.00e-05 |
|
2025-08-30 21:50:49 - pico-train - INFO - โโโ Inf/NaN count: 0 |
|
2025-08-30 21:51:43 - pico-train - INFO - Step 97200 -- ๐ Training Metrics |
|
2025-08-30 21:51:43 - pico-train - INFO - โโโ Loss: 4.7731 |
|
2025-08-30 21:51:43 - pico-train - INFO - โโโ Learning Rate: 2.00e-05 |
|
2025-08-30 21:51:43 - pico-train - INFO - โโโ Inf/NaN count: 0 |
|
2025-08-30 21:52:34 - pico-train - INFO - Step 97300 -- ๐ Training Metrics |
|
2025-08-30 21:52:34 - pico-train - INFO - โโโ Loss: 4.7823 |
|
2025-08-30 21:52:34 - pico-train - INFO - โโโ Learning Rate: 2.00e-05 |
|
2025-08-30 21:52:34 - pico-train - INFO - โโโ Inf/NaN count: 0 |
|
2025-08-30 21:53:27 - pico-train - INFO - Step 97400 -- ๐ Training Metrics |
|
2025-08-30 21:53:27 - pico-train - INFO - โโโ Loss: 4.7782 |
|
2025-08-30 21:53:27 - pico-train - INFO - โโโ Learning Rate: 2.00e-05 |
|
2025-08-30 21:53:27 - pico-train - INFO - โโโ Inf/NaN count: 0 |
|
2025-08-30 21:54:20 - pico-train - INFO - Step 97500 -- ๐ Training Metrics |
|
2025-08-30 21:54:20 - pico-train - INFO - โโโ Loss: 4.7935 |
|
2025-08-30 21:54:20 - pico-train - INFO - โโโ Learning Rate: 2.00e-05 |
|
2025-08-30 21:54:20 - pico-train - INFO - โโโ Inf/NaN count: 0 |
|
2025-08-30 21:55:13 - pico-train - INFO - Step 97600 -- ๐ Training Metrics |
|
2025-08-30 21:55:13 - pico-train - INFO - โโโ Loss: 4.7908 |
|
2025-08-30 21:55:13 - pico-train - INFO - โโโ Learning Rate: 2.00e-05 |
|
2025-08-30 21:55:13 - pico-train - INFO - โโโ Inf/NaN count: 0 |
|
2025-08-30 21:56:07 - pico-train - INFO - Step 97700 -- ๐ Training Metrics |
|
2025-08-30 21:56:07 - pico-train - INFO - โโโ Loss: 4.7824 |
|
2025-08-30 21:56:07 - pico-train - INFO - โโโ Learning Rate: 2.00e-05 |
|
2025-08-30 21:56:07 - pico-train - INFO - โโโ Inf/NaN count: 0 |
|
2025-08-30 21:56:59 - pico-train - INFO - Step 97800 -- ๐ Training Metrics |
|
2025-08-30 21:56:59 - pico-train - INFO - โโโ Loss: 4.7913 |
|
2025-08-30 21:56:59 - pico-train - INFO - โโโ Learning Rate: 2.00e-05 |
|
2025-08-30 21:56:59 - pico-train - INFO - โโโ Inf/NaN count: 0 |
|
2025-08-30 21:57:53 - pico-train - INFO - Step 97900 -- ๐ Training Metrics |
|
2025-08-30 21:57:53 - pico-train - INFO - โโโ Loss: 4.7547 |
|
2025-08-30 21:57:53 - pico-train - INFO - โโโ Learning Rate: 2.00e-05 |
|
2025-08-30 21:57:53 - pico-train - INFO - โโโ Inf/NaN count: 0 |
|
2025-08-30 21:58:45 - pico-train - INFO - Step 98000 -- 💾 Saving Checkpoint |
|
2025-08-30 22:00:46 - pico-train - INFO - Step 98000 -- ๐ Evaluation Results |
|
2025-08-30 22:00:46 - pico-train - INFO - โโโ paloma: inf |
|
2025-08-30 22:00:46 - pico-train - INFO - Step 98000 -- ๐ Training Metrics |
|
2025-08-30 22:00:46 - pico-train - INFO - โโโ Loss: 4.7784 |
|
2025-08-30 22:00:46 - pico-train - INFO - โโโ Learning Rate: 2.00e-05 |
|
2025-08-30 22:00:46 - pico-train - INFO - โโโ Inf/NaN count: 0 |
|
2025-08-30 22:00:46 - pico-train - INFO - Step 98000 -- ๐ Saving Learning Dynamics |
|
2025-08-30 22:01:41 - pico-train - INFO - Step 98100 -- ๐ Training Metrics |
|
2025-08-30 22:01:41 - pico-train - INFO - โโโ Loss: 4.7555 |
|
2025-08-30 22:01:41 - pico-train - INFO - โโโ Learning Rate: 2.00e-05 |
|
2025-08-30 22:01:41 - pico-train - INFO - โโโ Inf/NaN count: 0 |
|
2025-08-30 22:02:33 - pico-train - INFO - Step 98200 -- ๐ Training Metrics |
|
2025-08-30 22:02:33 - pico-train - INFO - โโโ Loss: 4.7774 |
|
2025-08-30 22:02:33 - pico-train - INFO - โโโ Learning Rate: 2.00e-05 |
|
2025-08-30 22:02:33 - pico-train - INFO - โโโ Inf/NaN count: 0 |
|
2025-08-30 22:03:25 - pico-train - INFO - Step 98300 -- ๐ Training Metrics |
|
2025-08-30 22:03:25 - pico-train - INFO - โโโ Loss: 4.7961 |
|
2025-08-30 22:03:25 - pico-train - INFO - โโโ Learning Rate: 2.00e-05 |
|
2025-08-30 22:03:25 - pico-train - INFO - โโโ Inf/NaN count: 0 |
|
2025-08-30 22:04:17 - pico-train - INFO - Step 98400 -- ๐ Training Metrics |
|
2025-08-30 22:04:17 - pico-train - INFO - โโโ Loss: 4.7770 |
|
2025-08-30 22:04:17 - pico-train - INFO - โโโ Learning Rate: 2.00e-05 |
|
2025-08-30 22:04:17 - pico-train - INFO - โโโ Inf/NaN count: 0 |
|
2025-08-30 22:05:09 - pico-train - INFO - Step 98500 -- ๐ Training Metrics |
|
2025-08-30 22:05:09 - pico-train - INFO - โโโ Loss: 4.7789 |
|
2025-08-30 22:05:09 - pico-train - INFO - โโโ Learning Rate: 2.00e-05 |
|
2025-08-30 22:05:09 - pico-train - INFO - โโโ Inf/NaN count: 0 |
|
2025-08-30 22:06:01 - pico-train - INFO - Step 98600 -- ๐ Training Metrics |
|
2025-08-30 22:06:01 - pico-train - INFO - โโโ Loss: 4.7968 |
|
2025-08-30 22:06:01 - pico-train - INFO - โโโ Learning Rate: 2.00e-05 |
|
2025-08-30 22:06:01 - pico-train - INFO - โโโ Inf/NaN count: 0 |
|
2025-08-30 22:06:53 - pico-train - INFO - Step 98700 -- ๐ Training Metrics |
|
2025-08-30 22:06:53 - pico-train - INFO - โโโ Loss: 4.7691 |
|
2025-08-30 22:06:53 - pico-train - INFO - โโโ Learning Rate: 2.00e-05 |
|
2025-08-30 22:06:53 - pico-train - INFO - โโโ Inf/NaN count: 0 |
|
2025-08-30 22:07:45 - pico-train - INFO - Step 98800 -- ๐ Training Metrics |
|
2025-08-30 22:07:45 - pico-train - INFO - โโโ Loss: 4.7841 |
|
2025-08-30 22:07:45 - pico-train - INFO - โโโ Learning Rate: 2.00e-05 |
|
2025-08-30 22:07:45 - pico-train - INFO - โโโ Inf/NaN count: 0 |
|
2025-08-30 22:08:37 - pico-train - INFO - Step 98900 -- ๐ Training Metrics |
|
2025-08-30 22:08:37 - pico-train - INFO - โโโ Loss: 4.7785 |
|
2025-08-30 22:08:37 - pico-train - INFO - โโโ Learning Rate: 2.00e-05 |
|
2025-08-30 22:08:37 - pico-train - INFO - โโโ Inf/NaN count: 0 |
|
2025-08-30 22:09:28 - pico-train - INFO - Step 99000 -- ๐ Training Metrics |
|
2025-08-30 22:09:28 - pico-train - INFO - โโโ Loss: 4.7770 |
|
2025-08-30 22:09:28 - pico-train - INFO - โโโ Learning Rate: 2.00e-05 |
|
2025-08-30 22:09:28 - pico-train - INFO - โโโ Inf/NaN count: 0 |
|
2025-08-30 22:10:20 - pico-train - INFO - Step 99100 -- ๐ Training Metrics |
|
2025-08-30 22:10:20 - pico-train - INFO - โโโ Loss: 4.7774 |
|
2025-08-30 22:10:20 - pico-train - INFO - โโโ Learning Rate: 2.00e-05 |
|
2025-08-30 22:10:20 - pico-train - INFO - โโโ Inf/NaN count: 0 |
|
2025-08-30 22:11:12 - pico-train - INFO - Step 99200 -- ๐ Training Metrics |
|
2025-08-30 22:11:12 - pico-train - INFO - โโโ Loss: 4.7946 |
|
2025-08-30 22:11:12 - pico-train - INFO - โโโ Learning Rate: 2.00e-05 |
|
2025-08-30 22:11:12 - pico-train - INFO - โโโ Inf/NaN count: 0 |
|
2025-08-30 22:12:04 - pico-train - INFO - Step 99300 -- ๐ Training Metrics |
|
2025-08-30 22:12:04 - pico-train - INFO - โโโ Loss: 4.7804 |
|
2025-08-30 22:12:04 - pico-train - INFO - โโโ Learning Rate: 2.00e-05 |
|
2025-08-30 22:12:04 - pico-train - INFO - โโโ Inf/NaN count: 0 |
|
2025-08-30 22:12:56 - pico-train - INFO - Step 99400 -- ๐ Training Metrics |
|
2025-08-30 22:12:56 - pico-train - INFO - โโโ Loss: 4.7579 |
|
2025-08-30 22:12:56 - pico-train - INFO - โโโ Learning Rate: 2.00e-05 |
|
2025-08-30 22:12:56 - pico-train - INFO - โโโ Inf/NaN count: 0 |
|
2025-08-30 22:13:48 - pico-train - INFO - Step 99500 -- ๐ Training Metrics |
|
2025-08-30 22:13:48 - pico-train - INFO - โโโ Loss: 4.7916 |
|
2025-08-30 22:13:48 - pico-train - INFO - โโโ Learning Rate: 2.00e-05 |
|
2025-08-30 22:13:48 - pico-train - INFO - โโโ Inf/NaN count: 0 |
|
2025-08-30 22:14:40 - pico-train - INFO - Step 99600 -- ๐ Training Metrics |
|
2025-08-30 22:14:40 - pico-train - INFO - โโโ Loss: 4.7512 |
|
2025-08-30 22:14:40 - pico-train - INFO - โโโ Learning Rate: 2.00e-05 |
|
2025-08-30 22:14:40 - pico-train - INFO - โโโ Inf/NaN count: 0 |
|
2025-08-30 22:15:32 - pico-train - INFO - Step 99700 -- ๐ Training Metrics |
|
2025-08-30 22:15:32 - pico-train - INFO - โโโ Loss: 4.7774 |
|
2025-08-30 22:15:32 - pico-train - INFO - โโโ Learning Rate: 2.00e-05 |
|
2025-08-30 22:15:32 - pico-train - INFO - โโโ Inf/NaN count: 0 |
|
2025-08-30 22:16:24 - pico-train - INFO - Step 99800 -- ๐ Training Metrics |
|
2025-08-30 22:16:24 - pico-train - INFO - โโโ Loss: 4.7938 |
|
2025-08-30 22:16:24 - pico-train - INFO - โโโ Learning Rate: 2.00e-05 |
|
2025-08-30 22:16:24 - pico-train - INFO - โโโ Inf/NaN count: 0 |
|
2025-08-30 22:17:16 - pico-train - INFO - Step 99900 -- ๐ Training Metrics |
|
2025-08-30 22:17:16 - pico-train - INFO - โโโ Loss: 4.7923 |
|
2025-08-30 22:17:16 - pico-train - INFO - โโโ Learning Rate: 2.00e-05 |
|
2025-08-30 22:17:16 - pico-train - INFO - โโโ Inf/NaN count: 0 |
|
2025-08-30 22:18:07 - pico-train - INFO - Step 100000 -- 💾 Saving Checkpoint |
|
2025-08-30 22:19:57 - pico-train - INFO - Step 100000 -- ๐ Evaluation Results |
|
2025-08-30 22:19:57 - pico-train - INFO - โโโ paloma: inf |
|
2025-08-30 22:19:57 - pico-train - INFO - ๐ Training complete! Final step: 100000 |
|
|