2025-09-01 01:54:09 - pico-train - INFO - Step 0 -- Evaluation Results
2025-09-01 01:54:09 - pico-train - INFO - └── paloma: inf
2025-09-01 01:54:09 - pico-train - INFO - ==================================================
2025-09-01 01:54:09 - pico-train - INFO - ✨ Training Configuration
2025-09-01 01:54:09 - pico-train - INFO - ==================================================
2025-09-01 01:54:09 - pico-train - INFO - ╭─────────────────────────────────────────────────────╮
2025-09-01 01:54:09 - pico-train - INFO - │ checkpointing:                                      │
2025-09-01 01:54:09 - pico-train - INFO - │   checkpoints_dir: checkpoints                      │
2025-09-01 01:54:09 - pico-train - INFO - │   evaluation:                                       │
2025-09-01 01:54:09 - pico-train - INFO - │     eval_results_dir: eval_results                  │
2025-09-01 01:54:09 - pico-train - INFO - │   fabric_checkpoint_dir: fabric_state               │
2025-09-01 01:54:09 - pico-train - INFO - │   fabric_checkpoint_filename: checkpoint.pt         │
2025-09-01 01:54:09 - pico-train - INFO - │   hf_checkpoint:                                    │
2025-09-01 01:54:09 - pico-train - INFO - │     collection_slug: null                           │
2025-09-01 01:54:09 - pico-train - INFO - │     repo_id: pico-lm/pico-decoder-tiny              │
2025-09-01 01:54:09 - pico-train - INFO - │   learning_dynamics:                                │
2025-09-01 01:54:09 - pico-train - INFO - │     batch_size: 256                                 │
2025-09-01 01:54:09 - pico-train - INFO - │     eval_data: pico-lm/pretokenized-paloma-tinsy    │
2025-09-01 01:54:09 - pico-train - INFO - │     layer_suffixes:                                 │
2025-09-01 01:54:09 - pico-train - INFO - │     - attention.v_proj                              │
2025-09-01 01:54:09 - pico-train - INFO - │     - attention.o_proj                              │
2025-09-01 01:54:09 - pico-train - INFO - │     - swiglu.w_2                                    │
2025-09-01 01:54:09 - pico-train - INFO - │     sequence_idx: -1                                │
2025-09-01 01:54:09 - pico-train - INFO - │   learning_dynamics_dir: learning_dynamics          │
2025-09-01 01:54:09 - pico-train - INFO - │   logs_dir: logs                                    │
2025-09-01 01:54:09 - pico-train - INFO - │   run_name: tiny-dolma205M-v2                       │
2025-09-01 01:54:09 - pico-train - INFO - │   runs_dir: runs                                    │
2025-09-01 01:54:09 - pico-train - INFO - │   save_every_n_steps: 1000                          │
2025-09-01 01:54:09 - pico-train - INFO - │   save_to_hf: false                                 │
2025-09-01 01:54:09 - pico-train - INFO - │   training:                                         │
2025-09-01 01:54:09 - pico-train - INFO - │     auto_resume: true                               │
2025-09-01 01:54:09 - pico-train - INFO - │ data:                                               │
2025-09-01 01:54:09 - pico-train - INFO - │   dataloader:                                       │
2025-09-01 01:54:09 - pico-train - INFO - │     batch_size: 1024                                │
2025-09-01 01:54:09 - pico-train - INFO - │   dataset:                                          │
2025-09-01 01:54:09 - pico-train - INFO - │     name: pico-lm/pretokenized-dolma                │
2025-09-01 01:54:09 - pico-train - INFO - │   tokenizer:                                        │
2025-09-01 01:54:09 - pico-train - INFO - │     name: allenai/OLMo-7B-0724-hf                   │
2025-09-01 01:54:09 - pico-train - INFO - │     vocab_size: 50304                               │
2025-09-01 01:54:09 - pico-train - INFO - │ evaluation:                                         │
2025-09-01 01:54:09 - pico-train - INFO - │   metrics:                                          │
2025-09-01 01:54:09 - pico-train - INFO - │   - paloma                                          │
2025-09-01 01:54:09 - pico-train - INFO - │   paloma:                                           │
2025-09-01 01:54:09 - pico-train - INFO - │     batch_size: 32                                  │
2025-09-01 01:54:09 - pico-train - INFO - │     dataset_name: pico-lm/pretokenized-paloma-tinsy │
2025-09-01 01:54:09 - pico-train - INFO - │     dataset_split: val                              │
2025-09-01 01:54:09 - pico-train - INFO - │     max_length: 2048                                │
2025-09-01 01:54:09 - pico-train - INFO - │ model:                                              │
2025-09-01 01:54:09 - pico-train - INFO - │   activation_hidden_dim: 384                        │
2025-09-01 01:54:09 - pico-train - INFO - │   attention_n_heads: 12                             │
2025-09-01 01:54:09 - pico-train - INFO - │   attention_n_kv_heads: 4                           │
2025-09-01 01:54:09 - pico-train - INFO - │   batch_size: 1024                                  │
2025-09-01 01:54:09 - pico-train - INFO - │   d_model: 96                                       │
2025-09-01 01:54:09 - pico-train - INFO - │   max_seq_len: 2048                                 │
2025-09-01 01:54:09 - pico-train - INFO - │   model_type: pico_decoder                          │
2025-09-01 01:54:09 - pico-train - INFO - │   n_layers: 12                                      │
2025-09-01 01:54:09 - pico-train - INFO - │   norm_eps: 1.0e-06                                 │
2025-09-01 01:54:09 - pico-train - INFO - │   position_emb_theta: 10000.0                       │
2025-09-01 01:54:09 - pico-train - INFO - │   vocab_size: 50304                                 │
2025-09-01 01:54:09 - pico-train - INFO - │ monitoring:                                         │
2025-09-01 01:54:09 - pico-train - INFO - │   logging:                                          │
2025-09-01 01:54:09 - pico-train - INFO - │     log_every_n_steps: 100                          │
2025-09-01 01:54:09 - pico-train - INFO - │     log_level: INFO                                 │
2025-09-01 01:54:09 - pico-train - INFO - │   save_to_wandb: false                              │
2025-09-01 01:54:09 - pico-train - INFO - │   wandb:                                            │
2025-09-01 01:54:09 - pico-train - INFO - │     entity: pico-lm                                 │
2025-09-01 01:54:09 - pico-train - INFO - │     project: pico-decoder                           │
2025-09-01 01:54:09 - pico-train - INFO - │ training:                                           │
2025-09-01 01:54:09 - pico-train - INFO - │   fabric:                                           │
2025-09-01 01:54:09 - pico-train - INFO - │     accelerator: cuda                               │
2025-09-01 01:54:09 - pico-train - INFO - │     num_devices: 1                                  │
2025-09-01 01:54:09 - pico-train - INFO - │     num_nodes: 1                                    │
2025-09-01 01:54:09 - pico-train - INFO - │     precision: bf16-mixed                           │
2025-09-01 01:54:09 - pico-train - INFO - │   max_steps: 200000                                 │
2025-09-01 01:54:09 - pico-train - INFO - │   optimization:                                     │
2025-09-01 01:54:09 - pico-train - INFO - │     gradient_accumulation_steps: 256                │
2025-09-01 01:54:09 - pico-train - INFO - │     lr: 0.0003                                      │
2025-09-01 01:54:09 - pico-train - INFO - │     lr_scheduler: linear_with_warmup                │
2025-09-01 01:54:09 - pico-train - INFO - │     lr_warmup_steps: 2500                           │
2025-09-01 01:54:09 - pico-train - INFO - │     optimizer: adamw                                │
2025-09-01 01:54:09 - pico-train - INFO - │                                                     │
2025-09-01 01:54:09 - pico-train - INFO - ╰─────────────────────────────────────────────────────╯
2025-09-01 01:54:09 - pico-train - INFO - ==================================================
2025-09-01 01:54:09 - pico-train - INFO - ⭐ Runtime Summary:
2025-09-01 01:54:09 - pico-train - INFO - ==================================================
2025-09-01 01:54:09 - pico-train - INFO - Starting from step: 0
2025-09-01 01:54:09 - pico-train - INFO - Model Setup:
2025-09-01 01:54:09 - pico-train - INFO - └─ Total Parameters: 11,282,784
2025-09-01 01:54:09 - pico-train - INFO - └─ Trainable Parameters: 11,282,784
2025-09-01 01:54:09 - pico-train - INFO - Distributed Setup:
2025-09-01 01:54:09 - pico-train - INFO - └─ Number of Devices: 1
2025-09-01 01:54:09 - pico-train - INFO - └─ Device Type: NVIDIA H100 80GB HBM3
2025-09-01 01:54:09 - pico-train - INFO - └─ Available Memory: 85.03 GB
2025-09-01 01:54:09 - pico-train - INFO - Software Setup:
2025-09-01 01:54:09 - pico-train - INFO - └─ Python Version: 3.12.3
2025-09-01 01:54:09 - pico-train - INFO - └─ PyTorch Version: 2.8.0+cu128
2025-09-01 01:54:09 - pico-train - INFO - └─ CUDA Version: 12.8
2025-09-01 01:54:09 - pico-train - INFO - └─ Operating System: Linux 6.8.0-71-generic
2025-09-01 01:54:09 - pico-train - INFO - Batch Size Configuration:
2025-09-01 01:54:09 - pico-train - INFO - └─ Global Batch Size: 512
2025-09-01 01:54:09 - pico-train - INFO - └─ Per Device Batch Size: 512
2025-09-01 01:54:09 - pico-train - INFO - └─ Gradient Accumulation Steps: 256
2025-09-01 01:54:09 - pico-train - INFO - ==================================================
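
Note: the logged total of 11,282,784 parameters can be reproduced from the model section of the configuration above. The sketch below is a sanity check, not code from pico-train. The function name `count_params` is hypothetical, and the layout it assumes is inferred rather than stated in the log: untied input and output embeddings, bias-free linear layers, grouped-query attention with head size d_model / attention_n_heads, a three-matrix SwiGLU block, and weight-only norms (two per layer plus one final norm).

```python
# Sketch: re-derive the logged parameter count from the model config values.
# Assumptions (not stated in the log): untied embeddings, no bias terms,
# weight-only norms, and a SwiGLU block made of three projection matrices.

def count_params(d_model, n_layers, n_heads, n_kv_heads, hidden_dim, vocab_size):
    head_dim = d_model // n_heads            # 96 // 12 = 8
    kv_dim = n_kv_heads * head_dim           # 4 * 8 = 32 (grouped-query attention)

    # q_proj and o_proj are d_model x d_model; k_proj and v_proj are d_model x kv_dim
    attention = 2 * d_model * d_model + 2 * d_model * kv_dim
    # three projections between d_model and activation_hidden_dim
    swiglu = 3 * d_model * hidden_dim
    # one norm before attention, one before the feed-forward block
    norms = 2 * d_model
    per_layer = attention + swiglu + norms

    embeddings = 2 * vocab_size * d_model    # input embedding + output projection
    final_norm = d_model
    return embeddings + n_layers * per_layer + final_norm


total = count_params(
    d_model=96,
    n_layers=12,
    n_heads=12,
    n_kv_heads=4,
    hidden_dim=384,
    vocab_size=50304,
)
print(f"{total:,}")  # 11,282,784
```

Under these assumptions the two embedding matrices account for 9,658,368 of the parameters, so roughly 86% of this model sits in the vocabulary layers rather than in the 12 transformer blocks.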