2025-08-30 15:43:27 - pico-train - INFO - Step 62000 -- Evaluation Results
2025-08-30 15:43:27 - pico-train - INFO - └── paloma: inf
2025-08-30 15:43:28 - pico-train - INFO - ==================================================
2025-08-30 15:43:28 - pico-train - INFO - ✨ Training Configuration
2025-08-30 15:43:28 - pico-train - INFO - ==================================================
2025-08-30 15:43:28 - pico-train - INFO - ╭─────────────────────────────────────────────────────╮
2025-08-30 15:43:28 - pico-train - INFO - │ checkpointing:                                      │
2025-08-30 15:43:28 - pico-train - INFO - │   checkpoints_dir: checkpoints                      │
2025-08-30 15:43:28 - pico-train - INFO - │   evaluation:                                       │
2025-08-30 15:43:28 - pico-train - INFO - │     eval_results_dir: eval_results                  │
2025-08-30 15:43:28 - pico-train - INFO - │   fabric_checkpoint_dir: fabric_state               │
2025-08-30 15:43:28 - pico-train - INFO - │   fabric_checkpoint_filename: checkpoint.pt         │
2025-08-30 15:43:28 - pico-train - INFO - │   hf_checkpoint:                                    │
2025-08-30 15:43:28 - pico-train - INFO - │     collection_slug: null                           │
2025-08-30 15:43:28 - pico-train - INFO - │     repo_id: ThomasTheMaker/pico-decoder-tiny       │
2025-08-30 15:43:28 - pico-train - INFO - │   learning_dynamics:                                │
2025-08-30 15:43:28 - pico-train - INFO - │     batch_size: 1                                   │
2025-08-30 15:43:28 - pico-train - INFO - │     eval_data: null                                 │
2025-08-30 15:43:28 - pico-train - INFO - │     layer_suffixes:                                 │
2025-08-30 15:43:28 - pico-train - INFO - │     - attention.v_proj                              │
2025-08-30 15:43:28 - pico-train - INFO - │     - attention.o_proj                              │
2025-08-30 15:43:28 - pico-train - INFO - │     - swiglu.w_2                                    │
2025-08-30 15:43:28 - pico-train - INFO - │     sequence_idx: -1                                │
2025-08-30 15:43:28 - pico-train - INFO - │   learning_dynamics_dir: learning_dynamics          │
2025-08-30 15:43:28 - pico-train - INFO - │   logs_dir: logs                                    │
2025-08-30 15:43:28 - pico-train - INFO - │   run_name: pico-decoder-tiny-dolma10M-v1           │
2025-08-30 15:43:28 - pico-train - INFO - │   runs_dir: runs                                    │
2025-08-30 15:43:28 - pico-train - INFO - │   save_every_n_steps: 2000                          │
2025-08-30 15:43:28 - pico-train - INFO - │   save_to_hf: true                                  │
2025-08-30 15:43:28 - pico-train - INFO - │   training:                                         │
2025-08-30 15:43:28 - pico-train - INFO - │     auto_resume: true                               │
2025-08-30 15:43:28 - pico-train - INFO - │ data:                                               │
2025-08-30 15:43:28 - pico-train - INFO - │   dataloader:                                       │
2025-08-30 15:43:28 - pico-train - INFO - │     batch_size: 16                                  │
2025-08-30 15:43:28 - pico-train - INFO - │   dataset:                                          │
2025-08-30 15:43:28 - pico-train - INFO - │     name: ThomasTheMaker/pretokenized-dolma-10M     │
2025-08-30 15:43:28 - pico-train - INFO - │   tokenizer:                                        │
2025-08-30 15:43:28 - pico-train - INFO - │     name: allenai/OLMo-7B-0724-hf                   │
2025-08-30 15:43:28 - pico-train - INFO - │     vocab_size: 50304                               │
2025-08-30 15:43:28 - pico-train - INFO - │ evaluation:                                         │
2025-08-30 15:43:28 - pico-train - INFO - │   metrics:                                          │
2025-08-30 15:43:28 - pico-train - INFO - │   - paloma                                          │
2025-08-30 15:43:28 - pico-train - INFO - │   paloma:                                           │
2025-08-30 15:43:28 - pico-train - INFO - │     batch_size: 1                                   │
2025-08-30 15:43:28 - pico-train - INFO - │     dataset_name: pico-lm/pretokenized-paloma-tinsy │
2025-08-30 15:43:28 - pico-train - INFO - │     dataset_split: val                              │
2025-08-30 15:43:28 - pico-train - INFO - │     max_length: 2048                                │
2025-08-30 15:43:28 - pico-train - INFO - │ model:                                              │
2025-08-30 15:43:28 - pico-train - INFO - │   activation_hidden_dim: 384                        │
2025-08-30 15:43:28 - pico-train - INFO - │   attention_n_heads: 12                             │
2025-08-30 15:43:28 - pico-train - INFO - │   attention_n_kv_heads: 4                           │
2025-08-30 15:43:28 - pico-train - INFO - │   batch_size: 1024                                  │
2025-08-30 15:43:28 - pico-train - INFO - │   d_model: 96                                       │
2025-08-30 15:43:28 - pico-train - INFO - │   max_seq_len: 2048                                 │
2025-08-30 15:43:28 - pico-train - INFO - │   model_type: pico_decoder                          │
2025-08-30 15:43:28 - pico-train - INFO - │   n_layers: 12                                      │
2025-08-30 15:43:28 - pico-train - INFO - │   norm_eps: 1.0e-06                                 │
2025-08-30 15:43:28 - pico-train - INFO - │   position_emb_theta: 10000.0                       │
2025-08-30 15:43:28 - pico-train - INFO - │   vocab_size: 50304                                 │
2025-08-30 15:43:28 - pico-train - INFO - │ monitoring:                                         │
2025-08-30 15:43:28 - pico-train - INFO - │   logging:                                          │
2025-08-30 15:43:28 - pico-train - INFO - │     log_every_n_steps: 100                          │
2025-08-30 15:43:28 - pico-train - INFO - │     log_level: INFO                                 │
2025-08-30 15:43:28 - pico-train - INFO - │   save_to_wandb: false                              │
2025-08-30 15:43:28 - pico-train - INFO - │   wandb:                                            │
2025-08-30 15:43:28 - pico-train - INFO - │     entity: boymyc                                  │
2025-08-30 15:43:28 - pico-train - INFO - │     project: pico-decoder-tiny                      │
2025-08-30 15:43:28 - pico-train - INFO - │ training:                                           │
2025-08-30 15:43:28 - pico-train - INFO - │   fabric:                                           │
2025-08-30 15:43:28 - pico-train - INFO - │     accelerator: cuda                               │
2025-08-30 15:43:28 - pico-train - INFO - │     num_devices: 1                                  │
2025-08-30 15:43:28 - pico-train - INFO - │     num_nodes: 1                                    │
2025-08-30 15:43:28 - pico-train - INFO - │     precision: bf16-mixed                           │
2025-08-30 15:43:28 - pico-train - INFO - │   max_steps: 100000                                 │
2025-08-30 15:43:28 - pico-train - INFO - │   optimization:                                     │
2025-08-30 15:43:28 - pico-train - INFO - │     gradient_accumulation_steps: 1                  │
2025-08-30 15:43:28 - pico-train - INFO - │     lr: 0.0002                                      │
2025-08-30 15:43:28 - pico-train - INFO - │     lr_scheduler: cosine                            │
2025-08-30 15:43:28 - pico-train - INFO - │     lr_warmup_steps: 2000                           │
2025-08-30 15:43:28 - pico-train - INFO - │     optimizer: adamw                                │
2025-08-30 15:43:28 - pico-train - INFO - │                                                     │
2025-08-30 15:43:28 - pico-train - INFO - ╰─────────────────────────────────────────────────────╯
2025-08-30 15:43:28 - pico-train - INFO - ==================================================
2025-08-30 15:43:28 - pico-train - INFO - Runtime Summary:
2025-08-30 15:43:28 - pico-train - INFO - ==================================================
2025-08-30 15:43:28 - pico-train - INFO - Starting from step: 62000
2025-08-30 15:43:28 - pico-train - INFO - Model Setup:
2025-08-30 15:43:28 - pico-train - INFO - ├─ Total Parameters: 11,282,784
2025-08-30 15:43:28 - pico-train - INFO - └─ Trainable Parameters: 11,282,784
2025-08-30 15:43:28 - pico-train - INFO - Distributed Setup:
2025-08-30 15:43:28 - pico-train - INFO - ├─ Number of Devices: 1
2025-08-30 15:43:28 - pico-train - INFO - ├─ Device Type: NVIDIA H100 80GB HBM3
2025-08-30 15:43:28 - pico-train - INFO - └─ Available Memory: 85.03 GB
2025-08-30 15:43:28 - pico-train - INFO - Software Setup:
2025-08-30 15:43:28 - pico-train - INFO - ├─ Python Version: 3.12.3
2025-08-30 15:43:28 - pico-train - INFO - ├─ PyTorch Version: 2.8.0+cu128
2025-08-30 15:43:28 - pico-train - INFO - ├─ CUDA Version: 12.8
2025-08-30 15:43:28 - pico-train - INFO - └─ Operating System: Linux 6.8.0-71-generic
2025-08-30 15:43:28 - pico-train - INFO - Batch Size Configuration:
2025-08-30 15:43:28 - pico-train - INFO - ├─ Global Batch Size: 16
2025-08-30 15:43:28 - pico-train - INFO - ├─ Per Device Batch Size: 16
2025-08-30 15:43:28 - pico-train - INFO - └─ Gradient Accumulation Steps: 1
2025-08-30 15:43:28 - pico-train - INFO - ==================================================
2025-08-30 15:43:29 - pico-train - INFO - Step 62000 -- Training Metrics
2025-08-30 15:43:29 - pico-train - INFO - ├── Loss: 4.5970
2025-08-30 15:43:29 - pico-train - INFO - ├── Learning Rate: 6.55e-05
2025-08-30 15:43:29 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-30 15:43:29 - pico-train - INFO - Step 62000 -- Saving Learning Dynamics
2025-08-30 15:44:25 - pico-train - INFO - Step 62100 -- Training Metrics
2025-08-30 15:44:25 - pico-train - INFO - ├── Loss: 4.8133
2025-08-30 15:44:25 - pico-train - INFO - ├── Learning Rate: 6.52e-05
2025-08-30 15:44:25 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-30 15:45:17 - pico-train - INFO - Step 62200 -- Training Metrics
2025-08-30 15:45:17 - pico-train - INFO - ├── Loss: 4.8221
2025-08-30 15:45:17 - pico-train - INFO - ├── Learning Rate: 6.49e-05
2025-08-30 15:45:17 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-30 15:46:09 - pico-train - INFO - Step 62300 -- Training Metrics
2025-08-30 15:46:09 - pico-train - INFO - ├── Loss: 4.8068
2025-08-30 15:46:09 - pico-train - INFO - ├── Learning Rate: 6.46e-05
2025-08-30 15:46:09 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-30 15:47:01 - pico-train - INFO - Step 62400 -- Training Metrics
2025-08-30 15:47:01 - pico-train - INFO - ├── Loss: 4.7858
2025-08-30 15:47:01 - pico-train - INFO - ├── Learning Rate: 6.43e-05
2025-08-30 15:47:01 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-30 15:47:53 - pico-train - INFO - Step 62500 -- Training Metrics
2025-08-30 15:47:53 - pico-train - INFO - ├── Loss: 4.8460
2025-08-30 15:47:53 - pico-train - INFO - ├── Learning Rate: 6.40e-05
2025-08-30 15:47:53 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-30 15:48:45 - pico-train - INFO - Step 62600 -- Training Metrics
2025-08-30 15:48:45 - pico-train - INFO - ├── Loss: 4.8264
2025-08-30 15:48:45 - pico-train - INFO - ├── Learning Rate: 6.37e-05
2025-08-30 15:48:45 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-30 15:49:37 - pico-train - INFO - Step 62700 -- Training Metrics
2025-08-30 15:49:37 - pico-train - INFO - ├── Loss: 4.8266
2025-08-30 15:49:37 - pico-train - INFO - ├── Learning Rate: 6.34e-05
2025-08-30 15:49:37 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-30 15:50:29 - pico-train - INFO - Step 62800 -- Training Metrics
2025-08-30 15:50:29 - pico-train - INFO - ├── Loss: 4.8317
2025-08-30 15:50:29 - pico-train - INFO - ├── Learning Rate: 6.31e-05
2025-08-30 15:50:29 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-30 15:51:20 - pico-train - INFO - Step 62900 -- Training Metrics
2025-08-30 15:51:20 - pico-train - INFO - ├── Loss: 4.8337
2025-08-30 15:51:20 - pico-train - INFO - ├── Learning Rate: 6.28e-05
2025-08-30 15:51:20 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-30 15:52:12 - pico-train - INFO - Step 63000 -- Training Metrics
2025-08-30 15:52:12 - pico-train - INFO - ├── Loss: 4.8183
2025-08-30 15:52:12 - pico-train - INFO - ├── Learning Rate: 6.25e-05
2025-08-30 15:52:12 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-30 15:53:04 - pico-train - INFO - Step 63100 -- Training Metrics
2025-08-30 15:53:04 - pico-train - INFO - ├── Loss: 4.8177
2025-08-30 15:53:04 - pico-train - INFO - ├── Learning Rate: 6.22e-05
2025-08-30 15:53:04 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-30 15:53:56 - pico-train - INFO - Step 63200 -- Training Metrics
2025-08-30 15:53:56 - pico-train - INFO - ├── Loss: 4.8094
2025-08-30 15:53:56 - pico-train - INFO - ├── Learning Rate: 6.19e-05
2025-08-30 15:53:56 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-30 15:54:48 - pico-train - INFO - Step 63300 -- Training Metrics
2025-08-30 15:54:48 - pico-train - INFO - ├── Loss: 4.8294
2025-08-30 15:54:48 - pico-train - INFO - ├── Learning Rate: 6.16e-05
2025-08-30 15:54:48 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-30 15:55:40 - pico-train - INFO - Step 63400 -- Training Metrics
2025-08-30 15:55:40 - pico-train - INFO - ├── Loss: 4.8073
2025-08-30 15:55:40 - pico-train - INFO - ├── Learning Rate: 6.13e-05
2025-08-30 15:55:40 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-30 15:56:32 - pico-train - INFO - Step 63500 -- Training Metrics
2025-08-30 15:56:32 - pico-train - INFO - ├── Loss: 4.8364
2025-08-30 15:56:32 - pico-train - INFO - ├── Learning Rate: 6.10e-05
2025-08-30 15:56:32 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-30 15:57:23 - pico-train - INFO - Step 63600 -- Training Metrics
2025-08-30 15:57:23 - pico-train - INFO - ├── Loss: 4.8236
2025-08-30 15:57:23 - pico-train - INFO - ├── Learning Rate: 6.07e-05
2025-08-30 15:57:23 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-30 15:58:15 - pico-train - INFO - Step 63700 -- Training Metrics
2025-08-30 15:58:15 - pico-train - INFO - ├── Loss: 4.8114
2025-08-30 15:58:15 - pico-train - INFO - ├── Learning Rate: 6.04e-05
2025-08-30 15:58:15 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-30 15:59:07 - pico-train - INFO - Step 63800 -- Training Metrics
2025-08-30 15:59:07 - pico-train - INFO - ├── Loss: 4.8078
2025-08-30 15:59:07 - pico-train - INFO - ├── Learning Rate: 6.01e-05
2025-08-30 15:59:07 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-30 15:59:59 - pico-train - INFO - Step 63900 -- Training Metrics
2025-08-30 15:59:59 - pico-train - INFO - ├── Loss: 4.8107
2025-08-30 15:59:59 - pico-train - INFO - ├── Learning Rate: 5.98e-05
2025-08-30 15:59:59 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-30 16:00:50 - pico-train - INFO - Step 64000 -- 💾 Saving Checkpoint
2025-08-30 16:02:54 - pico-train - INFO - Step 64000 -- Evaluation Results
2025-08-30 16:02:54 - pico-train - INFO - └── paloma: inf
2025-08-30 16:02:56 - pico-train - INFO - Step 64000 -- Training Metrics
2025-08-30 16:02:56 - pico-train - INFO - ├── Loss: 4.8145
2025-08-30 16:02:56 - pico-train - INFO - ├── Learning Rate: 5.95e-05
2025-08-30 16:02:56 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-30 16:02:56 - pico-train - INFO - Step 64000 -- Saving Learning Dynamics
2025-08-30 16:03:52 - pico-train - INFO - Step 64100 -- Training Metrics
2025-08-30 16:03:52 - pico-train - INFO - ├── Loss: 4.8479
2025-08-30 16:03:52 - pico-train - INFO - ├── Learning Rate: 5.92e-05
2025-08-30 16:03:52 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-30 16:04:44 - pico-train - INFO - Step 64200 -- Training Metrics
2025-08-30 16:04:44 - pico-train - INFO - ├── Loss: 4.8139
2025-08-30 16:04:44 - pico-train - INFO - ├── Learning Rate: 5.89e-05
2025-08-30 16:04:44 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-30 16:05:36 - pico-train - INFO - Step 64300 -- Training Metrics
2025-08-30 16:05:36 - pico-train - INFO - ├── Loss: 4.7867
2025-08-30 16:05:36 - pico-train - INFO - ├── Learning Rate: 5.86e-05
2025-08-30 16:05:36 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-30 16:06:28 - pico-train - INFO - Step 64400 -- Training Metrics
2025-08-30 16:06:28 - pico-train - INFO - ├── Loss: 4.8168
2025-08-30 16:06:28 - pico-train - INFO - ├── Learning Rate: 5.84e-05
2025-08-30 16:06:28 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-30 16:07:20 - pico-train - INFO - Step 64500 -- Training Metrics
2025-08-30 16:07:20 - pico-train - INFO - ├── Loss: 4.8131
2025-08-30 16:07:20 - pico-train - INFO - ├── Learning Rate: 5.81e-05
2025-08-30 16:07:20 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-30 16:08:12 - pico-train - INFO - Step 64600 -- Training Metrics
2025-08-30 16:08:12 - pico-train - INFO - ├── Loss: 4.8285
2025-08-30 16:08:12 - pico-train - INFO - ├── Learning Rate: 5.78e-05
2025-08-30 16:08:12 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-30 16:09:04 - pico-train - INFO - Step 64700 -- Training Metrics
2025-08-30 16:09:04 - pico-train - INFO - ├── Loss: 4.8170
2025-08-30 16:09:04 - pico-train - INFO - ├── Learning Rate: 5.75e-05
2025-08-30 16:09:04 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-30 16:09:56 - pico-train - INFO - Step 64800 -- Training Metrics
2025-08-30 16:09:56 - pico-train - INFO - ├── Loss: 4.8317
2025-08-30 16:09:56 - pico-train - INFO - ├── Learning Rate: 5.72e-05
2025-08-30 16:09:56 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-30 16:10:48 - pico-train - INFO - Step 64900 -- Training Metrics
2025-08-30 16:10:48 - pico-train - INFO - ├── Loss: 4.8368
2025-08-30 16:10:48 - pico-train - INFO - ├── Learning Rate: 5.69e-05
2025-08-30 16:10:48 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-30 16:11:40 - pico-train - INFO - Step 65000 -- Training Metrics
2025-08-30 16:11:40 - pico-train - INFO - ├── Loss: 4.8129
2025-08-30 16:11:40 - pico-train - INFO - ├── Learning Rate: 5.66e-05
2025-08-30 16:11:40 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-30 16:12:32 - pico-train - INFO - Step 65100 -- Training Metrics
2025-08-30 16:12:32 - pico-train - INFO - ├── Loss: 4.8226
2025-08-30 16:12:32 - pico-train - INFO - ├── Learning Rate: 5.63e-05
2025-08-30 16:12:32 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-30 16:13:24 - pico-train - INFO - Step 65200 -- Training Metrics
2025-08-30 16:13:24 - pico-train - INFO - ├── Loss: 4.8321
2025-08-30 16:13:24 - pico-train - INFO - ├── Learning Rate: 5.60e-05
2025-08-30 16:13:24 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-30 16:14:16 - pico-train - INFO - Step 65300 -- Training Metrics
2025-08-30 16:14:16 - pico-train - INFO - ├── Loss: 4.8352
2025-08-30 16:14:16 - pico-train - INFO - ├── Learning Rate: 5.57e-05
2025-08-30 16:14:16 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-30 16:15:08 - pico-train - INFO - Step 65400 -- Training Metrics
2025-08-30 16:15:08 - pico-train - INFO - ├── Loss: 4.8119
2025-08-30 16:15:08 - pico-train - INFO - ├── Learning Rate: 5.55e-05
2025-08-30 16:15:08 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-30 16:16:00 - pico-train - INFO - Step 65500 -- Training Metrics
2025-08-30 16:16:00 - pico-train - INFO - ├── Loss: 4.7889
2025-08-30 16:16:00 - pico-train - INFO - ├── Learning Rate: 5.52e-05
2025-08-30 16:16:00 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-30 16:16:52 - pico-train - INFO - Step 65600 -- Training Metrics
2025-08-30 16:16:52 - pico-train - INFO - ├── Loss: 4.8119
2025-08-30 16:16:52 - pico-train - INFO - ├── Learning Rate: 5.49e-05
2025-08-30 16:16:52 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-30 16:17:44 - pico-train - INFO - Step 65700 -- Training Metrics
2025-08-30 16:17:44 - pico-train - INFO - ├── Loss: 4.8193
2025-08-30 16:17:44 - pico-train - INFO - ├── Learning Rate: 5.46e-05
2025-08-30 16:17:44 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-30 16:18:35 - pico-train - INFO - Step 65800 -- Training Metrics
2025-08-30 16:18:35 - pico-train - INFO - ├── Loss: 4.8121
2025-08-30 16:18:35 - pico-train - INFO - ├── Learning Rate: 5.43e-05
2025-08-30 16:18:35 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-30 16:19:27 - pico-train - INFO - Step 65900 -- Training Metrics
2025-08-30 16:19:27 - pico-train - INFO - ├── Loss: 4.8057
2025-08-30 16:19:27 - pico-train - INFO - ├── Learning Rate: 5.40e-05
2025-08-30 16:19:27 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-30 16:20:19 - pico-train - INFO - Step 66000 -- 💾 Saving Checkpoint
2025-08-30 16:22:18 - pico-train - INFO - Step 66000 -- Evaluation Results
2025-08-30 16:22:18 - pico-train - INFO - └── paloma: inf
2025-08-30 16:22:20 - pico-train - INFO - Step 66000 -- Training Metrics
2025-08-30 16:22:20 - pico-train - INFO - ├── Loss: 4.8260
2025-08-30 16:22:20 - pico-train - INFO - ├── Learning Rate: 5.37e-05
2025-08-30 16:22:20 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-30 16:22:20 - pico-train - INFO - Step 66000 -- Saving Learning Dynamics
2025-08-30 16:23:16 - pico-train - INFO - Step 66100 -- Training Metrics
2025-08-30 16:23:16 - pico-train - INFO - ├── Loss: 4.8110
2025-08-30 16:23:16 - pico-train - INFO - ├── Learning Rate: 5.35e-05
2025-08-30 16:23:16 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-30 16:24:09 - pico-train - INFO - Step 66200 -- Training Metrics
2025-08-30 16:24:09 - pico-train - INFO - ├── Loss: 4.8156
2025-08-30 16:24:09 - pico-train - INFO - ├── Learning Rate: 5.32e-05
2025-08-30 16:24:09 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-30 16:25:02 - pico-train - INFO - Step 66300 -- Training Metrics
2025-08-30 16:25:02 - pico-train - INFO - ├── Loss: 4.7928
2025-08-30 16:25:02 - pico-train - INFO - ├── Learning Rate: 5.29e-05
2025-08-30 16:25:02 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-30 16:25:55 - pico-train - INFO - Step 66400 -- Training Metrics
2025-08-30 16:25:55 - pico-train - INFO - ├── Loss: 4.8202
2025-08-30 16:25:55 - pico-train - INFO - ├── Learning Rate: 5.26e-05
2025-08-30 16:25:55 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-30 16:26:49 - pico-train - INFO - Step 66500 -- Training Metrics
2025-08-30 16:26:49 - pico-train - INFO - ├── Loss: 4.8117
2025-08-30 16:26:49 - pico-train - INFO - ├── Learning Rate: 5.23e-05
2025-08-30 16:26:49 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-30 16:27:42 - pico-train - INFO - Step 66600 -- Training Metrics
2025-08-30 16:27:42 - pico-train - INFO - ├── Loss: 4.8047
2025-08-30 16:27:42 - pico-train - INFO - ├── Learning Rate: 5.20e-05
2025-08-30 16:27:42 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-30 16:28:34 - pico-train - INFO - Step 66700 -- Training Metrics
2025-08-30 16:28:34 - pico-train - INFO - ├── Loss: 4.7995
2025-08-30 16:28:34 - pico-train - INFO - ├── Learning Rate: 5.18e-05
2025-08-30 16:28:34 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-30 16:29:28 - pico-train - INFO - Step 66800 -- Training Metrics
2025-08-30 16:29:28 - pico-train - INFO - ├── Loss: 4.8074
2025-08-30 16:29:28 - pico-train - INFO - ├── Learning Rate: 5.15e-05
2025-08-30 16:29:28 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-30 16:30:21 - pico-train - INFO - Step 66900 -- Training Metrics
2025-08-30 16:30:21 - pico-train - INFO - ├── Loss: 4.7890
2025-08-30 16:30:21 - pico-train - INFO - ├── Learning Rate: 5.12e-05
2025-08-30 16:30:21 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-30 16:31:14 - pico-train - INFO - Step 67000 -- Training Metrics
2025-08-30 16:31:14 - pico-train - INFO - ├── Loss: 4.8216
2025-08-30 16:31:14 - pico-train - INFO - ├── Learning Rate: 5.09e-05
2025-08-30 16:31:14 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-30 16:32:07 - pico-train - INFO - Step 67100 -- Training Metrics
2025-08-30 16:32:07 - pico-train - INFO - ├── Loss: 4.8034
2025-08-30 16:32:07 - pico-train - INFO - ├── Learning Rate: 5.06e-05
2025-08-30 16:32:07 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-30 16:32:59 - pico-train - INFO - Step 67200 -- Training Metrics
2025-08-30 16:32:59 - pico-train - INFO - ├── Loss: 4.8062
2025-08-30 16:32:59 - pico-train - INFO - ├── Learning Rate: 5.04e-05
2025-08-30 16:32:59 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-30 16:33:51 - pico-train - INFO - Step 67300 -- Training Metrics
2025-08-30 16:33:51 - pico-train - INFO - ├── Loss: 4.8106
2025-08-30 16:33:51 - pico-train - INFO - ├── Learning Rate: 5.01e-05
2025-08-30 16:33:51 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-30 16:34:43 - pico-train - INFO - Step 67400 -- Training Metrics
2025-08-30 16:34:43 - pico-train - INFO - ├── Loss: 4.8168
2025-08-30 16:34:43 - pico-train - INFO - ├── Learning Rate: 4.98e-05
2025-08-30 16:34:43 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-30 16:35:36 - pico-train - INFO - Step 67500 -- Training Metrics
2025-08-30 16:35:36 - pico-train - INFO - ├── Loss: 4.7968
2025-08-30 16:35:36 - pico-train - INFO - ├── Learning Rate: 4.95e-05
2025-08-30 16:35:36 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-30 16:36:27 - pico-train - INFO - Step 67600 -- Training Metrics
2025-08-30 16:36:27 - pico-train - INFO - ├── Loss: 4.7905
2025-08-30 16:36:27 - pico-train - INFO - ├── Learning Rate: 4.93e-05
2025-08-30 16:36:27 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-30 16:37:19 - pico-train - INFO - Step 67700 -- Training Metrics
2025-08-30 16:37:19 - pico-train - INFO - ├── Loss: 4.8253
2025-08-30 16:37:19 - pico-train - INFO - ├── Learning Rate: 4.90e-05
2025-08-30 16:37:19 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-30 16:38:11 - pico-train - INFO - Step 67800 -- Training Metrics
2025-08-30 16:38:11 - pico-train - INFO - ├── Loss: 4.7848
2025-08-30 16:38:11 - pico-train - INFO - ├── Learning Rate: 4.87e-05
2025-08-30 16:38:11 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-30 16:39:03 - pico-train - INFO - Step 67900 -- Training Metrics
2025-08-30 16:39:03 - pico-train - INFO - ├── Loss: 4.8165
2025-08-30 16:39:03 - pico-train - INFO - ├── Learning Rate: 4.84e-05
2025-08-30 16:39:03 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-30 16:39:55 - pico-train - INFO - Step 68000 -- 💾 Saving Checkpoint
2025-08-30 16:42:09 - pico-train - INFO - Step 68000 -- Evaluation Results
2025-08-30 16:42:09 - pico-train - INFO - └── paloma: inf
2025-08-30 16:42:10 - pico-train - INFO - Step 68000 -- Training Metrics
2025-08-30 16:42:10 - pico-train - INFO - ├── Loss: 4.8264
2025-08-30 16:42:10 - pico-train - INFO - ├── Learning Rate: 4.82e-05
2025-08-30 16:42:10 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-30 16:42:10 - pico-train - INFO - Step 68000 -- Saving Learning Dynamics
2025-08-30 16:43:07 - pico-train - INFO - Step 68100 -- Training Metrics
2025-08-30 16:43:07 - pico-train - INFO - ├── Loss: 4.8363
2025-08-30 16:43:07 - pico-train - INFO - ├── Learning Rate: 4.79e-05
2025-08-30 16:43:07 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-30 16:43:59 - pico-train - INFO - Step 68200 -- Training Metrics
2025-08-30 16:43:59 - pico-train - INFO - ├── Loss: 4.7964
2025-08-30 16:43:59 - pico-train - INFO - ├── Learning Rate: 4.76e-05
2025-08-30 16:43:59 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-30 16:44:51 - pico-train - INFO - Step 68300 -- Training Metrics
2025-08-30 16:44:51 - pico-train - INFO - ├── Loss: 4.7999
2025-08-30 16:44:51 - pico-train - INFO - ├── Learning Rate: 4.73e-05
2025-08-30 16:44:51 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-30 16:45:43 - pico-train - INFO - Step 68400 -- Training Metrics
2025-08-30 16:45:43 - pico-train - INFO - ├── Loss: 4.8119
2025-08-30 16:45:43 - pico-train - INFO - ├── Learning Rate: 4.71e-05
2025-08-30 16:45:43 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-30 16:46:35 - pico-train - INFO - Step 68500 -- Training Metrics
2025-08-30 16:46:35 - pico-train - INFO - ├── Loss: 4.7998
2025-08-30 16:46:35 - pico-train - INFO - ├── Learning Rate: 4.68e-05
2025-08-30 16:46:35 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-30 16:47:27 - pico-train - INFO - Step 68600 -- Training Metrics
2025-08-30 16:47:27 - pico-train - INFO - ├── Loss: 4.8010
2025-08-30 16:47:27 - pico-train - INFO - ├── Learning Rate: 4.65e-05
2025-08-30 16:47:27 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-30 16:48:19 - pico-train - INFO - Step 68700 -- Training Metrics
2025-08-30 16:48:19 - pico-train - INFO - ├── Loss: 4.7986
2025-08-30 16:48:19 - pico-train - INFO - ├── Learning Rate: 4.63e-05
2025-08-30 16:48:19 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-30 16:49:12 - pico-train - INFO - Step 68800 -- Training Metrics
2025-08-30 16:49:12 - pico-train - INFO - ├── Loss: 4.8133
2025-08-30 16:49:12 - pico-train - INFO - ├── Learning Rate: 4.60e-05
2025-08-30 16:49:12 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-30 16:50:05 - pico-train - INFO - Step 68900 -- Training Metrics
2025-08-30 16:50:05 - pico-train - INFO - ├── Loss: 4.7944
2025-08-30 16:50:05 - pico-train - INFO - ├── Learning Rate: 4.57e-05
2025-08-30 16:50:05 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-30 16:50:58 - pico-train - INFO - Step 69000 -- Training Metrics
2025-08-30 16:50:58 - pico-train - INFO - ├── Loss: 4.8021
2025-08-30 16:50:58 - pico-train - INFO - ├── Learning Rate: 4.54e-05
2025-08-30 16:50:58 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-30 16:51:51 - pico-train - INFO - Step 69100 -- Training Metrics
2025-08-30 16:51:51 - pico-train - INFO - ├── Loss: 4.7611
2025-08-30 16:51:51 - pico-train - INFO - ├── Learning Rate: 4.52e-05
2025-08-30 16:51:51 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-30 16:52:44 - pico-train - INFO - Step 69200 -- Training Metrics
2025-08-30 16:52:44 - pico-train - INFO - ├── Loss: 4.7981
2025-08-30 16:52:44 - pico-train - INFO - ├── Learning Rate: 4.49e-05
2025-08-30 16:52:44 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-30 16:53:38 - pico-train - INFO - Step 69300 -- Training Metrics
2025-08-30 16:53:38 - pico-train - INFO - ├── Loss: 4.8066
2025-08-30 16:53:38 - pico-train - INFO - ├── Learning Rate: 4.46e-05
2025-08-30 16:53:38 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-30 16:54:31 - pico-train - INFO - Step 69400 -- Training Metrics
2025-08-30 16:54:31 - pico-train - INFO - ├── Loss: 4.8053
2025-08-30 16:54:31 - pico-train - INFO - ├── Learning Rate: 4.44e-05
2025-08-30 16:54:31 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-30 16:55:23 - pico-train - INFO - Step 69500 -- Training Metrics
2025-08-30 16:55:23 - pico-train - INFO - ├── Loss: 4.7953
2025-08-30 16:55:23 - pico-train - INFO - ├── Learning Rate: 4.41e-05
2025-08-30 16:55:23 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-30 16:56:16 - pico-train - INFO - Step 69600 -- Training Metrics
2025-08-30 16:56:16 - pico-train - INFO - ├── Loss: 4.8087
2025-08-30 16:56:16 - pico-train - INFO - ├── Learning Rate: 4.38e-05
2025-08-30 16:56:16 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-30 16:57:10 - pico-train - INFO - Step 69700 -- Training Metrics
2025-08-30 16:57:10 - pico-train - INFO - ├── Loss: 4.7915
2025-08-30 16:57:10 - pico-train - INFO - ├── Learning Rate: 4.36e-05
2025-08-30 16:57:10 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-30 16:58:03 - pico-train - INFO - Step 69800 -- Training Metrics
2025-08-30 16:58:03 - pico-train - INFO - ├── Loss: 4.8145
2025-08-30 16:58:03 - pico-train - INFO - ├── Learning Rate: 4.33e-05
2025-08-30 16:58:03 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-30 16:58:56 - pico-train - INFO - Step 69900 -- Training Metrics
2025-08-30 16:58:56 - pico-train - INFO - ├── Loss: 4.8056
2025-08-30 16:58:56 - pico-train - INFO - ├── Learning Rate: 4.31e-05
2025-08-30 16:58:56 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-30 16:59:48 - pico-train - INFO - Step 70000 -- 💾 Saving Checkpoint
2025-08-30 17:01:50 - pico-train - INFO - Step 70000 -- Evaluation Results
2025-08-30 17:01:50 - pico-train - INFO - └── paloma: inf
2025-08-30 17:01:52 - pico-train - INFO - Step 70000 -- Training Metrics
2025-08-30 17:01:52 - pico-train - INFO - ├── Loss: 4.7898
2025-08-30 17:01:52 - pico-train - INFO - ├── Learning Rate: 4.28e-05
2025-08-30 17:01:52 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-30 17:01:52 - pico-train - INFO - Step 70000 -- Saving Learning Dynamics
2025-08-30 17:02:48 - pico-train - INFO - Step 70100 -- Training Metrics
2025-08-30 17:02:48 - pico-train - INFO - ├── Loss: 4.7929
2025-08-30 17:02:48 - pico-train - INFO - ├── Learning Rate: 4.25e-05
2025-08-30 17:02:48 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-30 17:03:40 - pico-train - INFO - Step 70200 -- Training Metrics
2025-08-30 17:03:40 - pico-train - INFO - ├── Loss: 4.8215
2025-08-30 17:03:40 - pico-train - INFO - ├── Learning Rate: 4.23e-05
2025-08-30 17:03:40 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-30 17:04:32 - pico-train - INFO - Step 70300 -- Training Metrics
2025-08-30 17:04:32 - pico-train - INFO - ├── Loss: 4.8139
2025-08-30 17:04:32 - pico-train - INFO - ├── Learning Rate: 4.20e-05
2025-08-30 17:04:32 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-30 17:05:24 - pico-train - INFO - Step 70400 -- Training Metrics
2025-08-30 17:05:24 - pico-train - INFO - ├── Loss: 4.7922
2025-08-30 17:05:24 - pico-train - INFO - ├── Learning Rate: 4.17e-05
2025-08-30 17:05:24 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-30 17:06:16 - pico-train - INFO - Step 70500 -- Training Metrics
2025-08-30 17:06:16 - pico-train - INFO - ├── Loss: 4.7923
2025-08-30 17:06:16 - pico-train - INFO - ├── Learning Rate: 4.15e-05
2025-08-30 17:06:16 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-30 17:07:08 - pico-train - INFO - Step 70600 -- Training Metrics
2025-08-30 17:07:08 - pico-train - INFO - ├── Loss: 4.8075
2025-08-30 17:07:08 - pico-train - INFO - ├── Learning Rate: 4.12e-05
2025-08-30 17:07:08 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-30 17:08:00 - pico-train - INFO - Step 70700 -- Training Metrics
2025-08-30 17:08:00 - pico-train - INFO - ├── Loss: 4.7833
2025-08-30 17:08:00 - pico-train - INFO - ├── Learning Rate: 4.10e-05
2025-08-30 17:08:00 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-30 17:08:52 - pico-train - INFO - Step 70800 -- Training Metrics
2025-08-30 17:08:52 - pico-train - INFO - ├── Loss: 4.8036
2025-08-30 17:08:52 - pico-train - INFO - ├── Learning Rate: 4.07e-05
2025-08-30 17:08:52 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-30 17:09:44 - pico-train - INFO - Step 70900 -- Training Metrics
2025-08-30 17:09:44 - pico-train - INFO - ├── Loss: 4.7910
|
2025-08-30 17:09:44 - pico-train - INFO - โโโ Learning Rate: 4.04e-05 |
|
2025-08-30 17:09:44 - pico-train - INFO - โโโ Inf/NaN count: 0 |
|
2025-08-30 17:10:36 - pico-train - INFO - Step 71000 -- ๐ Training Metrics |
|
2025-08-30 17:10:36 - pico-train - INFO - โโโ Loss: 4.7723 |
|
2025-08-30 17:10:36 - pico-train - INFO - โโโ Learning Rate: 4.02e-05 |
|
2025-08-30 17:10:36 - pico-train - INFO - โโโ Inf/NaN count: 0 |
|
2025-08-30 17:11:28 - pico-train - INFO - Step 71100 -- ๐ Training Metrics |
|
2025-08-30 17:11:28 - pico-train - INFO - โโโ Loss: 4.7768 |
|
2025-08-30 17:11:28 - pico-train - INFO - โโโ Learning Rate: 3.99e-05 |
|
2025-08-30 17:11:28 - pico-train - INFO - โโโ Inf/NaN count: 0 |
|
2025-08-30 17:12:19 - pico-train - INFO - Step 71200 -- ๐ Training Metrics |
|
2025-08-30 17:12:19 - pico-train - INFO - โโโ Loss: 4.7984 |
|
2025-08-30 17:12:19 - pico-train - INFO - โโโ Learning Rate: 3.97e-05 |
|
2025-08-30 17:12:19 - pico-train - INFO - โโโ Inf/NaN count: 0 |
|
2025-08-30 17:13:11 - pico-train - INFO - Step 71300 -- ๐ Training Metrics |
|
2025-08-30 17:13:11 - pico-train - INFO - โโโ Loss: 4.7825 |
|
2025-08-30 17:13:11 - pico-train - INFO - โโโ Learning Rate: 3.94e-05 |
|
2025-08-30 17:13:11 - pico-train - INFO - โโโ Inf/NaN count: 0 |
|
2025-08-30 17:14:03 - pico-train - INFO - Step 71400 -- ๐ Training Metrics |
|
2025-08-30 17:14:03 - pico-train - INFO - โโโ Loss: 4.8093 |
|
2025-08-30 17:14:03 - pico-train - INFO - โโโ Learning Rate: 3.92e-05 |
|
2025-08-30 17:14:03 - pico-train - INFO - โโโ Inf/NaN count: 0 |
|
2025-08-30 17:14:55 - pico-train - INFO - Step 71500 -- ๐ Training Metrics |
|
2025-08-30 17:14:55 - pico-train - INFO - โโโ Loss: 4.7903 |
|
2025-08-30 17:14:55 - pico-train - INFO - โโโ Learning Rate: 3.89e-05 |
|
2025-08-30 17:14:55 - pico-train - INFO - โโโ Inf/NaN count: 0 |
|
2025-08-30 17:15:47 - pico-train - INFO - Step 71600 -- ๐ Training Metrics |
|
2025-08-30 17:15:47 - pico-train - INFO - โโโ Loss: 4.8269 |
|
2025-08-30 17:15:47 - pico-train - INFO - โโโ Learning Rate: 3.87e-05 |
|
2025-08-30 17:15:47 - pico-train - INFO - โโโ Inf/NaN count: 0 |
|
2025-08-30 17:16:39 - pico-train - INFO - Step 71700 -- ๐ Training Metrics |
|
2025-08-30 17:16:39 - pico-train - INFO - โโโ Loss: 4.8135 |
|
2025-08-30 17:16:39 - pico-train - INFO - โโโ Learning Rate: 3.84e-05 |
|
2025-08-30 17:16:39 - pico-train - INFO - โโโ Inf/NaN count: 0 |
|
2025-08-30 17:17:31 - pico-train - INFO - Step 71800 -- ๐ Training Metrics |
|
2025-08-30 17:17:31 - pico-train - INFO - โโโ Loss: 4.7759 |
|
2025-08-30 17:17:31 - pico-train - INFO - โโโ Learning Rate: 3.82e-05 |
|
2025-08-30 17:17:31 - pico-train - INFO - โโโ Inf/NaN count: 0 |
|
2025-08-30 17:18:22 - pico-train - INFO - Step 71900 -- ๐ Training Metrics |
|
2025-08-30 17:18:22 - pico-train - INFO - โโโ Loss: 4.7837 |
|
2025-08-30 17:18:22 - pico-train - INFO - โโโ Learning Rate: 3.79e-05 |
|
2025-08-30 17:18:22 - pico-train - INFO - โโโ Inf/NaN count: 0 |
|
2025-08-30 17:19:15 - pico-train - INFO - Step 72000 -- 💾 Saving Checkpoint |
|
2025-08-30 17:21:27 - pico-train - INFO - Step 72000 -- ๐ Evaluation Results |
|
2025-08-30 17:21:27 - pico-train - INFO - โโโ paloma: inf |
|
2025-08-30 17:21:28 - pico-train - INFO - Step 72000 -- ๐ Training Metrics |
|
2025-08-30 17:21:28 - pico-train - INFO - โโโ Loss: 4.8016 |
|
2025-08-30 17:21:28 - pico-train - INFO - โโโ Learning Rate: 3.77e-05 |
|
2025-08-30 17:21:28 - pico-train - INFO - โโโ Inf/NaN count: 0 |
|
2025-08-30 17:21:28 - pico-train - INFO - Step 72000 -- ๐ Saving Learning Dynamics |
|
2025-08-30 17:22:25 - pico-train - INFO - Step 72100 -- ๐ Training Metrics |
|
2025-08-30 17:22:25 - pico-train - INFO - โโโ Loss: 4.7643 |
|
2025-08-30 17:22:25 - pico-train - INFO - โโโ Learning Rate: 3.74e-05 |
|
2025-08-30 17:22:25 - pico-train - INFO - โโโ Inf/NaN count: 0 |
|
2025-08-30 17:23:16 - pico-train - INFO - Step 72200 -- ๐ Training Metrics |
|
2025-08-30 17:23:16 - pico-train - INFO - โโโ Loss: 4.7938 |
|
2025-08-30 17:23:16 - pico-train - INFO - โโโ Learning Rate: 3.72e-05 |
|
2025-08-30 17:23:16 - pico-train - INFO - โโโ Inf/NaN count: 0 |
|
2025-08-30 17:24:08 - pico-train - INFO - Step 72300 -- ๐ Training Metrics |
|
2025-08-30 17:24:08 - pico-train - INFO - โโโ Loss: 4.7962 |
|
2025-08-30 17:24:08 - pico-train - INFO - โโโ Learning Rate: 3.69e-05 |
|
2025-08-30 17:24:08 - pico-train - INFO - โโโ Inf/NaN count: 0 |
|
2025-08-30 17:25:00 - pico-train - INFO - Step 72400 -- ๐ Training Metrics |
|
2025-08-30 17:25:00 - pico-train - INFO - โโโ Loss: 4.8089 |
|
2025-08-30 17:25:00 - pico-train - INFO - โโโ Learning Rate: 3.67e-05 |
|
2025-08-30 17:25:00 - pico-train - INFO - โโโ Inf/NaN count: 0 |
|
2025-08-30 17:25:52 - pico-train - INFO - Step 72500 -- ๐ Training Metrics |
|
2025-08-30 17:25:52 - pico-train - INFO - โโโ Loss: 4.8081 |
|
2025-08-30 17:25:52 - pico-train - INFO - โโโ Learning Rate: 3.64e-05 |
|
2025-08-30 17:25:52 - pico-train - INFO - โโโ Inf/NaN count: 0 |
|
2025-08-30 17:26:44 - pico-train - INFO - Step 72600 -- ๐ Training Metrics |
|
2025-08-30 17:26:44 - pico-train - INFO - โโโ Loss: 4.8095 |
|
2025-08-30 17:26:44 - pico-train - INFO - โโโ Learning Rate: 3.62e-05 |
|
2025-08-30 17:26:44 - pico-train - INFO - โโโ Inf/NaN count: 0 |
|
2025-08-30 17:27:36 - pico-train - INFO - Step 72700 -- ๐ Training Metrics |
|
2025-08-30 17:27:36 - pico-train - INFO - โโโ Loss: 4.8020 |
|
2025-08-30 17:27:36 - pico-train - INFO - โโโ Learning Rate: 3.59e-05 |
|
2025-08-30 17:27:36 - pico-train - INFO - โโโ Inf/NaN count: 0 |
|
2025-08-30 17:28:28 - pico-train - INFO - Step 72800 -- ๐ Training Metrics |
|
2025-08-30 17:28:28 - pico-train - INFO - โโโ Loss: 4.7579 |
|
2025-08-30 17:28:28 - pico-train - INFO - โโโ Learning Rate: 3.57e-05 |
|
2025-08-30 17:28:28 - pico-train - INFO - โโโ Inf/NaN count: 0 |
|
2025-08-30 17:29:20 - pico-train - INFO - Step 72900 -- ๐ Training Metrics |
|
2025-08-30 17:29:20 - pico-train - INFO - โโโ Loss: 4.7869 |
|
2025-08-30 17:29:20 - pico-train - INFO - โโโ Learning Rate: 3.54e-05 |
|
2025-08-30 17:29:20 - pico-train - INFO - โโโ Inf/NaN count: 0 |
|
2025-08-30 17:30:12 - pico-train - INFO - Step 73000 -- ๐ Training Metrics |
|
2025-08-30 17:30:12 - pico-train - INFO - โโโ Loss: 4.7825 |
|
2025-08-30 17:30:12 - pico-train - INFO - โโโ Learning Rate: 3.52e-05 |
|
2025-08-30 17:30:12 - pico-train - INFO - โโโ Inf/NaN count: 0 |
|
2025-08-30 17:31:03 - pico-train - INFO - Step 73100 -- ๐ Training Metrics |
|
2025-08-30 17:31:03 - pico-train - INFO - โโโ Loss: 4.8111 |
|
2025-08-30 17:31:03 - pico-train - INFO - โโโ Learning Rate: 3.49e-05 |
|
2025-08-30 17:31:03 - pico-train - INFO - โโโ Inf/NaN count: 0 |
|
2025-08-30 17:31:55 - pico-train - INFO - Step 73200 -- ๐ Training Metrics |
|
2025-08-30 17:31:55 - pico-train - INFO - โโโ Loss: 4.8028 |
|
2025-08-30 17:31:55 - pico-train - INFO - โโโ Learning Rate: 3.47e-05 |
|
2025-08-30 17:31:55 - pico-train - INFO - โโโ Inf/NaN count: 0 |
|
2025-08-30 17:32:47 - pico-train - INFO - Step 73300 -- ๐ Training Metrics |
|
2025-08-30 17:32:47 - pico-train - INFO - โโโ Loss: 4.8025 |
|
2025-08-30 17:32:47 - pico-train - INFO - โโโ Learning Rate: 3.44e-05 |
|
2025-08-30 17:32:47 - pico-train - INFO - โโโ Inf/NaN count: 0 |
|
2025-08-30 17:33:39 - pico-train - INFO - Step 73400 -- ๐ Training Metrics |
|
2025-08-30 17:33:39 - pico-train - INFO - โโโ Loss: 4.7917 |
|
2025-08-30 17:33:39 - pico-train - INFO - โโโ Learning Rate: 3.42e-05 |
|
2025-08-30 17:33:39 - pico-train - INFO - โโโ Inf/NaN count: 0 |
|
2025-08-30 17:34:31 - pico-train - INFO - Step 73500 -- ๐ Training Metrics |
|
2025-08-30 17:34:31 - pico-train - INFO - โโโ Loss: 4.7851 |
|
2025-08-30 17:34:31 - pico-train - INFO - โโโ Learning Rate: 3.40e-05 |
|
2025-08-30 17:34:31 - pico-train - INFO - โโโ Inf/NaN count: 0 |
|
2025-08-30 17:35:23 - pico-train - INFO - Step 73600 -- ๐ Training Metrics |
|
2025-08-30 17:35:23 - pico-train - INFO - โโโ Loss: 4.7807 |
|
2025-08-30 17:35:23 - pico-train - INFO - โโโ Learning Rate: 3.37e-05 |
|
2025-08-30 17:35:23 - pico-train - INFO - โโโ Inf/NaN count: 0 |
|
2025-08-30 17:36:15 - pico-train - INFO - Step 73700 -- ๐ Training Metrics |
|
2025-08-30 17:36:15 - pico-train - INFO - โโโ Loss: 4.7741 |
|
2025-08-30 17:36:15 - pico-train - INFO - โโโ Learning Rate: 3.35e-05 |
|
2025-08-30 17:36:15 - pico-train - INFO - โโโ Inf/NaN count: 0 |
|
2025-08-30 17:37:07 - pico-train - INFO - Step 73800 -- ๐ Training Metrics |
|
2025-08-30 17:37:07 - pico-train - INFO - โโโ Loss: 4.8076 |
|
2025-08-30 17:37:07 - pico-train - INFO - โโโ Learning Rate: 3.32e-05 |
|
2025-08-30 17:37:07 - pico-train - INFO - โโโ Inf/NaN count: 0 |
|
2025-08-30 17:37:59 - pico-train - INFO - Step 73900 -- ๐ Training Metrics |
|
2025-08-30 17:37:59 - pico-train - INFO - โโโ Loss: 4.8119 |
|
2025-08-30 17:37:59 - pico-train - INFO - โโโ Learning Rate: 3.30e-05 |
|
2025-08-30 17:37:59 - pico-train - INFO - โโโ Inf/NaN count: 0 |
|
2025-08-30 17:38:50 - pico-train - INFO - Step 74000 -- 💾 Saving Checkpoint |
|
2025-08-30 17:40:51 - pico-train - INFO - Step 74000 -- ๐ Evaluation Results |
|
2025-08-30 17:40:51 - pico-train - INFO - โโโ paloma: inf |
|
2025-08-30 17:40:53 - pico-train - INFO - Step 74000 -- ๐ Training Metrics |
|
2025-08-30 17:40:53 - pico-train - INFO - โโโ Loss: 4.7960 |
|
2025-08-30 17:40:53 - pico-train - INFO - โโโ Learning Rate: 3.28e-05 |
|
2025-08-30 17:40:53 - pico-train - INFO - โโโ Inf/NaN count: 0 |
|
2025-08-30 17:40:53 - pico-train - INFO - Step 74000 -- ๐ Saving Learning Dynamics |
|
2025-08-30 17:41:49 - pico-train - INFO - Step 74100 -- ๐ Training Metrics |
|
2025-08-30 17:41:49 - pico-train - INFO - โโโ Loss: 4.7909 |
|
2025-08-30 17:41:49 - pico-train - INFO - โโโ Learning Rate: 3.25e-05 |
|
2025-08-30 17:41:49 - pico-train - INFO - โโโ Inf/NaN count: 0 |
|
2025-08-30 17:42:42 - pico-train - INFO - Step 74200 -- ๐ Training Metrics |
|
2025-08-30 17:42:42 - pico-train - INFO - โโโ Loss: 4.7807 |
|
2025-08-30 17:42:42 - pico-train - INFO - โโโ Learning Rate: 3.23e-05 |
|
2025-08-30 17:42:42 - pico-train - INFO - โโโ Inf/NaN count: 0 |
|
2025-08-30 17:43:36 - pico-train - INFO - Step 74300 -- ๐ Training Metrics |
|
2025-08-30 17:43:36 - pico-train - INFO - โโโ Loss: 4.7711 |
|
2025-08-30 17:43:36 - pico-train - INFO - โโโ Learning Rate: 3.21e-05 |
|
2025-08-30 17:43:36 - pico-train - INFO - โโโ Inf/NaN count: 0 |
|
2025-08-30 17:44:29 - pico-train - INFO - Step 74400 -- ๐ Training Metrics |
|
2025-08-30 17:44:29 - pico-train - INFO - โโโ Loss: 4.7837 |
|
2025-08-30 17:44:29 - pico-train - INFO - โโโ Learning Rate: 3.18e-05 |
|
2025-08-30 17:44:29 - pico-train - INFO - โโโ Inf/NaN count: 0 |
|
2025-08-30 17:45:21 - pico-train - INFO - Step 74500 -- ๐ Training Metrics |
|
2025-08-30 17:45:21 - pico-train - INFO - โโโ Loss: 4.7668 |
|
2025-08-30 17:45:21 - pico-train - INFO - โโโ Learning Rate: 3.16e-05 |
|
2025-08-30 17:45:21 - pico-train - INFO - โโโ Inf/NaN count: 0 |
|
2025-08-30 17:46:15 - pico-train - INFO - Step 74600 -- ๐ Training Metrics |
|
2025-08-30 17:46:15 - pico-train - INFO - โโโ Loss: 4.7985 |
|
2025-08-30 17:46:15 - pico-train - INFO - โโโ Learning Rate: 3.14e-05 |
|
2025-08-30 17:46:15 - pico-train - INFO - โโโ Inf/NaN count: 0 |
|
2025-08-30 17:47:08 - pico-train - INFO - Step 74700 -- ๐ Training Metrics |
|
2025-08-30 17:47:08 - pico-train - INFO - โโโ Loss: 4.7702 |
|
2025-08-30 17:47:08 - pico-train - INFO - โโโ Learning Rate: 3.11e-05 |
|
2025-08-30 17:47:08 - pico-train - INFO - โโโ Inf/NaN count: 0 |
|
2025-08-30 17:48:01 - pico-train - INFO - Step 74800 -- ๐ Training Metrics |
|
2025-08-30 17:48:01 - pico-train - INFO - โโโ Loss: 4.8002 |
|
2025-08-30 17:48:01 - pico-train - INFO - โโโ Learning Rate: 3.09e-05 |
|
2025-08-30 17:48:01 - pico-train - INFO - โโโ Inf/NaN count: 0 |
|
2025-08-30 17:48:54 - pico-train - INFO - Step 74900 -- ๐ Training Metrics |
|
2025-08-30 17:48:54 - pico-train - INFO - โโโ Loss: 4.7955 |
|
2025-08-30 17:48:54 - pico-train - INFO - โโโ Learning Rate: 3.07e-05 |
|
2025-08-30 17:48:54 - pico-train - INFO - โโโ Inf/NaN count: 0 |
|
2025-08-30 17:49:48 - pico-train - INFO - Step 75000 -- ๐ Training Metrics |
|
2025-08-30 17:49:48 - pico-train - INFO - โโโ Loss: 4.8023 |
|
2025-08-30 17:49:48 - pico-train - INFO - โโโ Learning Rate: 3.04e-05 |
|
2025-08-30 17:49:48 - pico-train - INFO - โโโ Inf/NaN count: 0 |
|
2025-08-30 17:50:41 - pico-train - INFO - Step 75100 -- ๐ Training Metrics |
|
2025-08-30 17:50:41 - pico-train - INFO - โโโ Loss: 4.7842 |
|
2025-08-30 17:50:41 - pico-train - INFO - โโโ Learning Rate: 3.02e-05 |
|
2025-08-30 17:50:41 - pico-train - INFO - โโโ Inf/NaN count: 0 |
|
2025-08-30 17:51:34 - pico-train - INFO - Step 75200 -- ๐ Training Metrics |
|
2025-08-30 17:51:34 - pico-train - INFO - โโโ Loss: 4.7890 |
|
2025-08-30 17:51:34 - pico-train - INFO - โโโ Learning Rate: 3.00e-05 |
|
2025-08-30 17:51:34 - pico-train - INFO - โโโ Inf/NaN count: 0 |
|
2025-08-30 17:52:27 - pico-train - INFO - Step 75300 -- ๐ Training Metrics |
|
2025-08-30 17:52:27 - pico-train - INFO - โโโ Loss: 4.8004 |
|
2025-08-30 17:52:27 - pico-train - INFO - โโโ Learning Rate: 2.97e-05 |
|
2025-08-30 17:52:27 - pico-train - INFO - โโโ Inf/NaN count: 0 |
|
2025-08-30 17:53:20 - pico-train - INFO - Step 75400 -- ๐ Training Metrics |
|
2025-08-30 17:53:20 - pico-train - INFO - โโโ Loss: 4.7917 |
|
2025-08-30 17:53:20 - pico-train - INFO - โโโ Learning Rate: 2.95e-05 |
|
2025-08-30 17:53:20 - pico-train - INFO - โโโ Inf/NaN count: 0 |
|
2025-08-30 17:54:13 - pico-train - INFO - Step 75500 -- ๐ Training Metrics |
|
2025-08-30 17:54:13 - pico-train - INFO - โโโ Loss: 4.7867 |
|
2025-08-30 17:54:13 - pico-train - INFO - โโโ Learning Rate: 2.93e-05 |
|
2025-08-30 17:54:13 - pico-train - INFO - โโโ Inf/NaN count: 0 |
|
2025-08-30 17:55:07 - pico-train - INFO - Step 75600 -- ๐ Training Metrics |
|
2025-08-30 17:55:07 - pico-train - INFO - โโโ Loss: 4.7957 |
|
2025-08-30 17:55:07 - pico-train - INFO - โโโ Learning Rate: 2.91e-05 |
|
2025-08-30 17:55:07 - pico-train - INFO - โโโ Inf/NaN count: 0 |
|
2025-08-30 17:56:00 - pico-train - INFO - Step 75700 -- ๐ Training Metrics |
|
2025-08-30 17:56:00 - pico-train - INFO - โโโ Loss: 4.7840 |
|
2025-08-30 17:56:00 - pico-train - INFO - โโโ Learning Rate: 2.88e-05 |
|
2025-08-30 17:56:00 - pico-train - INFO - โโโ Inf/NaN count: 0 |
|
2025-08-30 17:56:56 - pico-train - INFO - Step 75800 -- ๐ Training Metrics |
|
2025-08-30 17:56:56 - pico-train - INFO - โโโ Loss: 4.7990 |
|
2025-08-30 17:56:56 - pico-train - INFO - โโโ Learning Rate: 2.86e-05 |
|
2025-08-30 17:56:56 - pico-train - INFO - โโโ Inf/NaN count: 0 |
|
2025-08-30 17:57:48 - pico-train - INFO - Step 75900 -- ๐ Training Metrics |
|
2025-08-30 17:57:48 - pico-train - INFO - โโโ Loss: 4.7904 |
|
2025-08-30 17:57:48 - pico-train - INFO - โโโ Learning Rate: 2.84e-05 |
|
2025-08-30 17:57:48 - pico-train - INFO - โโโ Inf/NaN count: 0 |
|
2025-08-30 17:58:41 - pico-train - INFO - Step 76000 -- 💾 Saving Checkpoint |
|
2025-08-30 18:01:59 - pico-train - INFO - Step 76000 -- ๐ Evaluation Results |
|
2025-08-30 18:01:59 - pico-train - INFO - โโโ paloma: inf |
|
2025-08-30 18:02:00 - pico-train - INFO - Step 76000 -- ๐ Training Metrics |
|
2025-08-30 18:02:00 - pico-train - INFO - โโโ Loss: 4.7972 |
|
2025-08-30 18:02:00 - pico-train - INFO - โโโ Learning Rate: 2.82e-05 |
|
2025-08-30 18:02:00 - pico-train - INFO - โโโ Inf/NaN count: 0 |
|
2025-08-30 18:02:00 - pico-train - INFO - Step 76000 -- ๐ Saving Learning Dynamics |
|
2025-08-30 18:03:04 - pico-train - INFO - Step 76100 -- ๐ Training Metrics |
|
2025-08-30 18:03:04 - pico-train - INFO - โโโ Loss: 4.7730 |
|
2025-08-30 18:03:04 - pico-train - INFO - โโโ Learning Rate: 2.79e-05 |
|
2025-08-30 18:03:04 - pico-train - INFO - โโโ Inf/NaN count: 0 |
|
2025-08-30 18:03:56 - pico-train - INFO - Step 76200 -- ๐ Training Metrics |
|
2025-08-30 18:03:56 - pico-train - INFO - โโโ Loss: 4.7997 |
|
2025-08-30 18:03:56 - pico-train - INFO - โโโ Learning Rate: 2.77e-05 |
|
2025-08-30 18:03:56 - pico-train - INFO - โโโ Inf/NaN count: 0 |
|
2025-08-30 18:04:48 - pico-train - INFO - Step 76300 -- ๐ Training Metrics |
|
2025-08-30 18:04:48 - pico-train - INFO - โโโ Loss: 4.7843 |
|
2025-08-30 18:04:48 - pico-train - INFO - โโโ Learning Rate: 2.75e-05 |
|
2025-08-30 18:04:48 - pico-train - INFO - โโโ Inf/NaN count: 0 |
|
2025-08-30 18:05:40 - pico-train - INFO - Step 76400 -- ๐ Training Metrics |
|
2025-08-30 18:05:40 - pico-train - INFO - โโโ Loss: 4.7858 |
|
2025-08-30 18:05:40 - pico-train - INFO - โโโ Learning Rate: 2.73e-05 |
|
2025-08-30 18:05:40 - pico-train - INFO - โโโ Inf/NaN count: 0 |
|
2025-08-30 18:06:32 - pico-train - INFO - Step 76500 -- ๐ Training Metrics |
|
2025-08-30 18:06:32 - pico-train - INFO - โโโ Loss: 4.8110 |
|
2025-08-30 18:06:32 - pico-train - INFO - โโโ Learning Rate: 2.71e-05 |
|
2025-08-30 18:06:32 - pico-train - INFO - โโโ Inf/NaN count: 0 |
|
2025-08-30 18:07:24 - pico-train - INFO - Step 76600 -- ๐ Training Metrics |
|
2025-08-30 18:07:24 - pico-train - INFO - โโโ Loss: 4.7834 |
|
2025-08-30 18:07:24 - pico-train - INFO - โโโ Learning Rate: 2.68e-05 |
|
2025-08-30 18:07:24 - pico-train - INFO - โโโ Inf/NaN count: 0 |
|
2025-08-30 18:08:16 - pico-train - INFO - Step 76700 -- ๐ Training Metrics |
|
2025-08-30 18:08:16 - pico-train - INFO - โโโ Loss: 4.7936 |
|
2025-08-30 18:08:16 - pico-train - INFO - โโโ Learning Rate: 2.66e-05 |
|
2025-08-30 18:08:16 - pico-train - INFO - โโโ Inf/NaN count: 0 |
|
2025-08-30 18:09:08 - pico-train - INFO - Step 76800 -- ๐ Training Metrics |
|
2025-08-30 18:09:08 - pico-train - INFO - โโโ Loss: 4.7869 |
|
2025-08-30 18:09:08 - pico-train - INFO - โโโ Learning Rate: 2.64e-05 |
|
2025-08-30 18:09:08 - pico-train - INFO - โโโ Inf/NaN count: 0 |
|
2025-08-30 18:10:00 - pico-train - INFO - Step 76900 -- ๐ Training Metrics |
|
2025-08-30 18:10:00 - pico-train - INFO - โโโ Loss: 4.7979 |
|
2025-08-30 18:10:00 - pico-train - INFO - โโโ Learning Rate: 2.62e-05 |
|
2025-08-30 18:10:00 - pico-train - INFO - โโโ Inf/NaN count: 0 |
|
2025-08-30 18:10:54 - pico-train - INFO - Step 77000 -- ๐ Training Metrics |
|
2025-08-30 18:10:54 - pico-train - INFO - โโโ Loss: 4.7956 |
|
2025-08-30 18:10:54 - pico-train - INFO - โโโ Learning Rate: 2.60e-05 |
|
2025-08-30 18:10:54 - pico-train - INFO - โโโ Inf/NaN count: 0 |
|
2025-08-30 18:11:46 - pico-train - INFO - Step 77100 -- ๐ Training Metrics |
|
2025-08-30 18:11:46 - pico-train - INFO - โโโ Loss: 4.7974 |
|
2025-08-30 18:11:46 - pico-train - INFO - โโโ Learning Rate: 2.58e-05 |
|
2025-08-30 18:11:46 - pico-train - INFO - โโโ Inf/NaN count: 0 |
|
2025-08-30 18:12:38 - pico-train - INFO - Step 77200 -- ๐ Training Metrics |
|
2025-08-30 18:12:38 - pico-train - INFO - โโโ Loss: 4.8074 |
|
2025-08-30 18:12:38 - pico-train - INFO - โโโ Learning Rate: 2.55e-05 |
|
2025-08-30 18:12:38 - pico-train - INFO - โโโ Inf/NaN count: 0 |
|
2025-08-30 18:13:30 - pico-train - INFO - Step 77300 -- ๐ Training Metrics |
|
2025-08-30 18:13:30 - pico-train - INFO - โโโ Loss: 4.8276 |
|
2025-08-30 18:13:30 - pico-train - INFO - โโโ Learning Rate: 2.53e-05 |
|
2025-08-30 18:13:30 - pico-train - INFO - โโโ Inf/NaN count: 0 |
|
2025-08-30 18:14:27 - pico-train - INFO - Step 77400 -- ๐ Training Metrics |
|
2025-08-30 18:14:27 - pico-train - INFO - โโโ Loss: 4.7908 |
|
2025-08-30 18:14:27 - pico-train - INFO - โโโ Learning Rate: 2.51e-05 |
|
2025-08-30 18:14:27 - pico-train - INFO - โโโ Inf/NaN count: 0 |
|
2025-08-30 18:15:20 - pico-train - INFO - Step 77500 -- ๐ Training Metrics |
|
2025-08-30 18:15:20 - pico-train - INFO - โโโ Loss: 4.8142 |
|
2025-08-30 18:15:20 - pico-train - INFO - โโโ Learning Rate: 2.49e-05 |
|
2025-08-30 18:15:20 - pico-train - INFO - โโโ Inf/NaN count: 0 |
|
2025-08-30 18:16:13 - pico-train - INFO - Step 77600 -- ๐ Training Metrics |
|
2025-08-30 18:16:13 - pico-train - INFO - โโโ Loss: 4.8052 |
|
2025-08-30 18:16:13 - pico-train - INFO - โโโ Learning Rate: 2.47e-05 |
|
2025-08-30 18:16:13 - pico-train - INFO - โโโ Inf/NaN count: 0 |
|
2025-08-30 18:17:06 - pico-train - INFO - Step 77700 -- ๐ Training Metrics |
|
2025-08-30 18:17:06 - pico-train - INFO - โโโ Loss: 4.7876 |
|
2025-08-30 18:17:06 - pico-train - INFO - โโโ Learning Rate: 2.45e-05 |
|
2025-08-30 18:17:06 - pico-train - INFO - โโโ Inf/NaN count: 0 |
|
2025-08-30 18:18:01 - pico-train - INFO - Step 77800 -- ๐ Training Metrics |
|
2025-08-30 18:18:01 - pico-train - INFO - โโโ Loss: 4.8011 |
|
2025-08-30 18:18:01 - pico-train - INFO - โโโ Learning Rate: 2.43e-05 |
|
2025-08-30 18:18:01 - pico-train - INFO - โโโ Inf/NaN count: 0 |
|
2025-08-30 18:18:54 - pico-train - INFO - Step 77900 -- ๐ Training Metrics |
|
2025-08-30 18:18:54 - pico-train - INFO - โโโ Loss: 4.7936 |
|
2025-08-30 18:18:54 - pico-train - INFO - โโโ Learning Rate: 2.41e-05 |
|
2025-08-30 18:18:54 - pico-train - INFO - โโโ Inf/NaN count: 0 |
|
2025-08-30 18:19:47 - pico-train - INFO - Step 78000 -- 💾 Saving Checkpoint |
|