2025-08-30 01:43:03 - pico-train - INFO - Step 32000 -- 📊 Evaluation Results
2025-08-30 01:43:03 - pico-train - INFO - └── paloma: 2.977755235898109e+26
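Note on the value above: the log does not say what quantity "paloma" reports. Assuming it is a perplexity (the exponential of the mean per-token negative log-likelihood in nats), the sketch below converts it back to a loss-like number so it can be set next to the training loss logged at the same step. The variable names are illustrative and not taken from pico-train.

```python
import math

# Assumption: the paloma value is a perplexity, i.e. exp(mean NLL in nats).
paloma_value = 2.977755235898109e+26  # evaluation result logged at step 32000
train_loss = 6.3376                   # training loss logged at step 32000

# Mean per-token NLL implied by the evaluation value, under the assumption above.
implied_eval_nll = math.log(paloma_value)

# Perplexity implied by the training loss, for a like-for-like comparison.
train_perplexity = math.exp(train_loss)

print(f"implied eval NLL: {implied_eval_nll:.2f} nats/token")
print(f"training perplexity: {train_perplexity:.1f}")
```

Under that assumption the implied evaluation loss is roughly ten times the training loss, which suggests checking the evaluation setup before reading the number as a measure of model quality.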
2025-08-30 01:43:05 - pico-train - INFO - ==================================================
2025-08-30 01:43:05 - pico-train - INFO - ✨ Training Configuration
2025-08-30 01:43:05 - pico-train - INFO - ==================================================
2025-08-30 01:43:05 - pico-train - INFO - ╭─────────────────────────────────────────────────────╮
2025-08-30 01:43:05 - pico-train - INFO - │ checkpointing:                                      │
2025-08-30 01:43:05 - pico-train - INFO - │   checkpoints_dir: checkpoints                      │
2025-08-30 01:43:05 - pico-train - INFO - │   evaluation:                                       │
2025-08-30 01:43:05 - pico-train - INFO - │     eval_results_dir: eval_results                  │
2025-08-30 01:43:05 - pico-train - INFO - │   fabric_checkpoint_dir: fabric_state               │
2025-08-30 01:43:05 - pico-train - INFO - │   fabric_checkpoint_filename: checkpoint.pt         │
2025-08-30 01:43:05 - pico-train - INFO - │   hf_checkpoint:                                    │
2025-08-30 01:43:05 - pico-train - INFO - │     collection_slug: null                           │
2025-08-30 01:43:05 - pico-train - INFO - │     repo_id: ThomasTheMaker/pico-decoder-tiny       │
2025-08-30 01:43:05 - pico-train - INFO - │   learning_dynamics:                                │
2025-08-30 01:43:05 - pico-train - INFO - │     batch_size: 1                                   │
2025-08-30 01:43:05 - pico-train - INFO - │     eval_data: null                                 │
2025-08-30 01:43:05 - pico-train - INFO - │     layer_suffixes:                                 │
2025-08-30 01:43:05 - pico-train - INFO - │     - attention.v_proj                              │
2025-08-30 01:43:05 - pico-train - INFO - │     - attention.o_proj                              │
2025-08-30 01:43:05 - pico-train - INFO - │     - swiglu.w_2                                    │
2025-08-30 01:43:05 - pico-train - INFO - │     sequence_idx: -1                                │
2025-08-30 01:43:05 - pico-train - INFO - │   learning_dynamics_dir: learning_dynamics          │
2025-08-30 01:43:05 - pico-train - INFO - │   logs_dir: logs                                    │
2025-08-30 01:43:05 - pico-train - INFO - │   run_name: pico-decoder-tiny-dolma5M-v1            │
2025-08-30 01:43:05 - pico-train - INFO - │   runs_dir: runs                                    │
2025-08-30 01:43:05 - pico-train - INFO - │   save_every_n_steps: 500                           │
2025-08-30 01:43:05 - pico-train - INFO - │   save_to_hf: true                                  │
2025-08-30 01:43:05 - pico-train - INFO - │   training:                                         │
2025-08-30 01:43:05 - pico-train - INFO - │     auto_resume: true                               │
2025-08-30 01:43:05 - pico-train - INFO - │ data:                                               │
2025-08-30 01:43:05 - pico-train - INFO - │   dataloader:                                       │
2025-08-30 01:43:05 - pico-train - INFO - │     batch_size: 4                                   │
2025-08-30 01:43:05 - pico-train - INFO - │   dataset:                                          │
2025-08-30 01:43:05 - pico-train - INFO - │     name: ThomasTheMaker/pretokenized-dolma-5M      │
2025-08-30 01:43:05 - pico-train - INFO - │   tokenizer:                                        │
2025-08-30 01:43:05 - pico-train - INFO - │     name: allenai/OLMo-7B-0724-hf                   │
2025-08-30 01:43:05 - pico-train - INFO - │     vocab_size: 50304                               │
2025-08-30 01:43:05 - pico-train - INFO - │ evaluation:                                         │
2025-08-30 01:43:05 - pico-train - INFO - │   metrics:                                          │
2025-08-30 01:43:05 - pico-train - INFO - │   - paloma                                          │
2025-08-30 01:43:05 - pico-train - INFO - │   paloma:                                           │
2025-08-30 01:43:05 - pico-train - INFO - │     batch_size: 1                                   │
2025-08-30 01:43:05 - pico-train - INFO - │     dataset_name: pico-lm/pretokenized-paloma-tinsy │
2025-08-30 01:43:05 - pico-train - INFO - │     dataset_split: val                              │
2025-08-30 01:43:05 - pico-train - INFO - │     max_length: 2048                                │
2025-08-30 01:43:05 - pico-train - INFO - │ model:                                              │
2025-08-30 01:43:05 - pico-train - INFO - │   activation_hidden_dim: 384                        │
2025-08-30 01:43:05 - pico-train - INFO - │   attention_n_heads: 12                             │
2025-08-30 01:43:05 - pico-train - INFO - │   attention_n_kv_heads: 4                           │
2025-08-30 01:43:05 - pico-train - INFO - │   batch_size: 1024                                  │
2025-08-30 01:43:05 - pico-train - INFO - │   d_model: 96                                       │
2025-08-30 01:43:05 - pico-train - INFO - │   max_seq_len: 2048                                 │
2025-08-30 01:43:05 - pico-train - INFO - │   model_type: pico_decoder                          │
2025-08-30 01:43:05 - pico-train - INFO - │   n_layers: 12                                      │
2025-08-30 01:43:05 - pico-train - INFO - │   norm_eps: 1.0e-06                                 │
2025-08-30 01:43:05 - pico-train - INFO - │   position_emb_theta: 10000.0                       │
2025-08-30 01:43:05 - pico-train - INFO - │   vocab_size: 50304                                 │
2025-08-30 01:43:05 - pico-train - INFO - │ monitoring:                                         │
2025-08-30 01:43:05 - pico-train - INFO - │   logging:                                          │
2025-08-30 01:43:05 - pico-train - INFO - │     log_every_n_steps: 25                           │
2025-08-30 01:43:05 - pico-train - INFO - │     log_level: INFO                                 │
2025-08-30 01:43:05 - pico-train - INFO - │   save_to_wandb: false                              │
2025-08-30 01:43:05 - pico-train - INFO - │   wandb:                                            │
2025-08-30 01:43:05 - pico-train - INFO - │     entity: boymyc                                  │
2025-08-30 01:43:05 - pico-train - INFO - │     project: pico-decoder-tiny                      │
2025-08-30 01:43:05 - pico-train - INFO - │ training:                                           │
2025-08-30 01:43:05 - pico-train - INFO - │   fabric:                                           │
2025-08-30 01:43:05 - pico-train - INFO - │     accelerator: cuda                               │
2025-08-30 01:43:05 - pico-train - INFO - │     num_devices: 1                                  │
2025-08-30 01:43:05 - pico-train - INFO - │     num_nodes: 1                                    │
2025-08-30 01:43:05 - pico-train - INFO - │     precision: bf16-mixed                           │
2025-08-30 01:43:05 - pico-train - INFO - │   max_steps: 20000                                  │
2025-08-30 01:43:05 - pico-train - INFO - │   optimization:                                     │
2025-08-30 01:43:05 - pico-train - INFO - │     gradient_accumulation_steps: 4                  │
2025-08-30 01:43:05 - pico-train - INFO - │     lr: 5.0e-05                                     │
2025-08-30 01:43:05 - pico-train - INFO - │     lr_scheduler: cosine                            │
2025-08-30 01:43:05 - pico-train - INFO - │     lr_warmup_steps: 8000                           │
2025-08-30 01:43:05 - pico-train - INFO - │     optimizer: adamw                                │
2025-08-30 01:43:05 - pico-train - INFO - │                                                     │
2025-08-30 01:43:05 - pico-train - INFO - ╰─────────────────────────────────────────────────────╯
2025-08-30 01:43:05 - pico-train - INFO - ==================================================
2025-08-30 01:43:05 - pico-train - INFO - ⛭ Runtime Summary:
2025-08-30 01:43:05 - pico-train - INFO - ==================================================
2025-08-30 01:43:05 - pico-train - INFO - Starting from step: 32000
2025-08-30 01:43:05 - pico-train - INFO - Model Setup:
2025-08-30 01:43:05 - pico-train - INFO - └─ Total Parameters: 11,282,784
2025-08-30 01:43:05 - pico-train - INFO - └─ Trainable Parameters: 11,282,784
2025-08-30 01:43:05 - pico-train - INFO - Distributed Setup:
2025-08-30 01:43:05 - pico-train - INFO - └─ Number of Devices: 1
2025-08-30 01:43:05 - pico-train - INFO - └─ Device Type: NVIDIA GeForce RTX 5090
2025-08-30 01:43:05 - pico-train - INFO - └─ Available Memory: 33.68 GB
2025-08-30 01:43:05 - pico-train - INFO - Software Setup:
2025-08-30 01:43:05 - pico-train - INFO - └─ Python Version: 3.10.12
2025-08-30 01:43:05 - pico-train - INFO - └─ PyTorch Version: 2.8.0+cu128
2025-08-30 01:43:05 - pico-train - INFO - └─ CUDA Version: 12.8
2025-08-30 01:43:05 - pico-train - INFO - └─ Operating System: Linux 6.8.0-63-generic
2025-08-30 01:43:05 - pico-train - INFO - Batch Size Configuration:
2025-08-30 01:43:05 - pico-train - INFO - └─ Global Batch Size: 4
2025-08-30 01:43:05 - pico-train - INFO - └─ Per Device Batch Size: 1
2025-08-30 01:43:05 - pico-train - INFO - └─ Gradient Accumulation Steps: 4
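Note on the batch size figures above: the three values are consistent with the common convention that the global batch is split across devices and gradient accumulation steps. The sketch below assumes that convention. The function name is hypothetical, and the log does not confirm that pico-train computes it this way.

```python
def per_device_batch_size(global_batch_size, num_devices, num_nodes, grad_accum_steps):
    """Split a global batch across devices and accumulation steps.

    Assumes the common convention:
    global = per_device * num_devices * num_nodes * grad_accum_steps.
    """
    denominator = num_devices * num_nodes * grad_accum_steps
    if global_batch_size % denominator != 0:
        raise ValueError("global batch size must divide evenly across devices and accumulation steps")
    return global_batch_size // denominator


# Values taken from the configuration and runtime summary in this log.
micro_batch = per_device_batch_size(
    global_batch_size=4, num_devices=1, num_nodes=1, grad_accum_steps=4
)
print(micro_batch)
```

With the logged values this gives a per-device batch of 1, matching the runtime summary.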
2025-08-30 01:43:05 - pico-train - INFO - ==================================================
2025-08-30 01:43:06 - pico-train - INFO - Step 32000 -- 🔄 Training Metrics
2025-08-30 01:43:06 - pico-train - INFO - ├── Loss: 6.3376
2025-08-30 01:43:06 - pico-train - INFO - ├── Learning Rate: 7.32e-06
2025-08-30 01:43:06 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-30 01:43:06 - pico-train - INFO - Step 32000 -- 📈 Saving Learning Dynamics
2025-08-30 01:43:20 - pico-train - INFO - Step 32025 -- 🔄 Training Metrics
2025-08-30 01:43:20 - pico-train - INFO - ├── Loss: 6.1999
2025-08-30 01:43:20 - pico-train - INFO - ├── Learning Rate: 7.28e-06
2025-08-30 01:43:20 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-30 01:43:33 - pico-train - INFO - Step 32050 -- 🔄 Training Metrics
2025-08-30 01:43:33 - pico-train - INFO - ├── Loss: 6.1488
2025-08-30 01:43:33 - pico-train - INFO - ├── Learning Rate: 7.24e-06
2025-08-30 01:43:33 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-30 01:43:45 - pico-train - INFO - Step 32075 -- 🔄 Training Metrics
2025-08-30 01:43:45 - pico-train - INFO - ├── Loss: 6.0460
2025-08-30 01:43:45 - pico-train - INFO - ├── Learning Rate: 7.19e-06
2025-08-30 01:43:45 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-30 01:43:58 - pico-train - INFO - Step 32100 -- 🔄 Training Metrics
2025-08-30 01:43:58 - pico-train - INFO - ├── Loss: 6.1627
2025-08-30 01:43:58 - pico-train - INFO - ├── Learning Rate: 7.15e-06
2025-08-30 01:43:58 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-30 01:44:11 - pico-train - INFO - Step 32125 -- 🔄 Training Metrics
2025-08-30 01:44:11 - pico-train - INFO - ├── Loss: 6.2085
2025-08-30 01:44:11 - pico-train - INFO - ├── Learning Rate: 7.11e-06
2025-08-30 01:44:11 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-30 01:44:23 - pico-train - INFO - Step 32150 -- 🔄 Training Metrics
2025-08-30 01:44:23 - pico-train - INFO - ├── Loss: 6.1659
2025-08-30 01:44:23 - pico-train - INFO - ├── Learning Rate: 7.06e-06
2025-08-30 01:44:23 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-30 01:44:36 - pico-train - INFO - Step 32175 -- 🔄 Training Metrics
2025-08-30 01:44:36 - pico-train - INFO - ├── Loss: 6.1719
2025-08-30 01:44:36 - pico-train - INFO - ├── Learning Rate: 7.02e-06
2025-08-30 01:44:36 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-30 01:44:48 - pico-train - INFO - Step 32200 -- 🔄 Training Metrics
2025-08-30 01:44:48 - pico-train - INFO - ├── Loss: 6.2081
2025-08-30 01:44:48 - pico-train - INFO - ├── Learning Rate: 6.98e-06
2025-08-30 01:44:48 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-30 01:45:01 - pico-train - INFO - Step 32225 -- 🔄 Training Metrics
2025-08-30 01:45:01 - pico-train - INFO - ├── Loss: 6.1955
2025-08-30 01:45:01 - pico-train - INFO - ├── Learning Rate: 6.94e-06
2025-08-30 01:45:01 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-30 01:45:14 - pico-train - INFO - Step 32250 -- 🔄 Training Metrics
2025-08-30 01:45:14 - pico-train - INFO - ├── Loss: 6.1139
2025-08-30 01:45:14 - pico-train - INFO - ├── Learning Rate: 6.89e-06
2025-08-30 01:45:14 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-30 01:45:26 - pico-train - INFO - Step 32275 -- 🔄 Training Metrics
2025-08-30 01:45:26 - pico-train - INFO - ├── Loss: 6.1075
2025-08-30 01:45:26 - pico-train - INFO - ├── Learning Rate: 6.85e-06
2025-08-30 01:45:26 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-30 01:45:39 - pico-train - INFO - Step 32300 -- 🔄 Training Metrics
2025-08-30 01:45:39 - pico-train - INFO - ├── Loss: 6.0814
2025-08-30 01:45:39 - pico-train - INFO - ├── Learning Rate: 6.81e-06
2025-08-30 01:45:39 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-30 01:45:51 - pico-train - INFO - Step 32325 -- 🔄 Training Metrics
2025-08-30 01:45:51 - pico-train - INFO - ├── Loss: 6.0880
2025-08-30 01:45:51 - pico-train - INFO - ├── Learning Rate: 6.77e-06
2025-08-30 01:45:51 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-30 01:46:04 - pico-train - INFO - Step 32350 -- 🔄 Training Metrics
2025-08-30 01:46:04 - pico-train - INFO - ├── Loss: 6.1997
2025-08-30 01:46:04 - pico-train - INFO - ├── Learning Rate: 6.73e-06
2025-08-30 01:46:04 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-30 01:46:16 - pico-train - INFO - Step 32375 -- 🔄 Training Metrics
2025-08-30 01:46:16 - pico-train - INFO - ├── Loss: 6.1376
2025-08-30 01:46:16 - pico-train - INFO - ├── Learning Rate: 6.68e-06
2025-08-30 01:46:16 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-30 01:46:29 - pico-train - INFO - Step 32400 -- 🔄 Training Metrics
2025-08-30 01:46:29 - pico-train - INFO - ├── Loss: 6.1077
2025-08-30 01:46:29 - pico-train - INFO - ├── Learning Rate: 6.64e-06
2025-08-30 01:46:29 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-30 01:46:42 - pico-train - INFO - Step 32425 -- 🔄 Training Metrics
2025-08-30 01:46:42 - pico-train - INFO - ├── Loss: 6.2641
2025-08-30 01:46:42 - pico-train - INFO - ├── Learning Rate: 6.60e-06
2025-08-30 01:46:42 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-30 01:46:54 - pico-train - INFO - Step 32450 -- 🔄 Training Metrics
2025-08-30 01:46:54 - pico-train - INFO - ├── Loss: 6.1020
2025-08-30 01:46:54 - pico-train - INFO - ├── Learning Rate: 6.56e-06
2025-08-30 01:46:54 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-30 01:47:07 - pico-train - INFO - Step 32475 -- 🔄 Training Metrics
2025-08-30 01:47:07 - pico-train - INFO - ├── Loss: 6.2170
2025-08-30 01:47:07 - pico-train - INFO - ├── Learning Rate: 6.52e-06
2025-08-30 01:47:07 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-30 01:47:19 - pico-train - INFO - Step 32500 -- 💾 Saving Checkpoint
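Note on reading the metrics above: each "Training Metrics" header is followed by a "Loss:" line, so a loss curve can be recovered from the raw text. The sketch below is one way to do that with regular expressions. The function name and patterns are illustrative and not part of pico-train, and they rely only on the ASCII parts of each line, so they work whether or not the tree glyphs are intact.

```python
import re

# Header that opens a training-metrics block, e.g. "Step 32000 -- ... Training Metrics".
STEP_RE = re.compile(r"Step (\d+) -- .*Training Metrics")
# Loss line that follows the header, e.g. "Loss: 6.3376".
LOSS_RE = re.compile(r"Loss: ([0-9.]+)")


def parse_losses(lines):
    """Return {step: loss} for every training-metrics block found in `lines`."""
    losses = {}
    current_step = None
    for line in lines:
        step_match = STEP_RE.search(line)
        if step_match:
            current_step = int(step_match.group(1))
            continue
        loss_match = LOSS_RE.search(line)
        if loss_match and current_step is not None:
            losses[current_step] = float(loss_match.group(1))
            current_step = None  # wait for the next header before reading another loss
    return losses


# Two blocks copied from this log, with the tree glyphs omitted.
sample = [
    "2025-08-30 01:43:06 - pico-train - INFO - Step 32000 -- Training Metrics",
    "2025-08-30 01:43:06 - pico-train - INFO - Loss: 6.3376",
    "2025-08-30 01:43:06 - pico-train - INFO - Learning Rate: 7.32e-06",
    "2025-08-30 01:43:20 - pico-train - INFO - Step 32025 -- Training Metrics",
    "2025-08-30 01:43:20 - pico-train - INFO - Loss: 6.1999",
]
parsed = parse_losses(sample)
print(parsed)
```

Headers such as "Saving Learning Dynamics" or "Saving Checkpoint" do not match the step pattern, so they are skipped.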