# Training Documentation
This document describes the command-line arguments and gives a concise overview of the training pipeline for a face classification model built with PyTorch Lightning.
## Table of Contents
- [Arguments Table](#arguments-table)
- [Training Pipeline Overview](#training-pipeline-overview)
## Arguments Table
| Argument Name | Type | Description |
|----------------------------------------|-------|-------------------------------------------------------------------------------------------------------------------------------|
| `dataset_dir` | `str` | Path to the dataset directory containing `train_data` and `val_data` subdirectories with preprocessed face images organized by person. |
| `image_classification_models_config_path` | `str` | Path to the YAML configuration file defining model configurations, including model function, resolution, and weights. |
| `batch_size` | `int` | Batch size for training and validation data loaders. Affects memory usage and training speed. |
| `num_epochs` | `int` | Number of epochs for training the model. An epoch is one full pass through the training dataset. |
| `learning_rate` | `float` | Initial learning rate for the Adam optimizer used during training. |
| `max_lr_factor` | `float` | Factor by which the initial learning rate is multiplied to obtain the maximum learning rate reached during the warmup phase of the scheduler. |
| `accelerator` | `str` | Type of accelerator for training. Options: `cpu`, `gpu`, `tpu`, `auto`. `auto` selects the best available device. |
| `devices` | `int` | Number of devices (e.g., GPUs) to use for training. Relevant for multi-GPU training. |
| `algorithm` | `str` | Face detection algorithm for preprocessing images. Options: `mtcnn`, `yolo`. |
| `warmup_steps` | `float` | Fraction of total training steps for the warmup phase of the learning rate scheduler (e.g., `0.05` means 5% of total steps). |
| `total_steps` | `int` | Total number of training steps. If `0`, it is computed as `num_epochs` × steps per epoch (based on dataset size and batch size). |
| `classification_model_name` | `str` | Name of the classification model to use, as defined in the YAML configuration file. |
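
As a rough illustration of how the arguments above could be wired up, the following is a minimal `argparse` sketch. The flag spelling (`--name`), every default value, and the helper `resolve_total_steps` are assumptions made for this example and are not taken from the project's training script. The helper also assumes that the last partial batch is kept, so steps per epoch are rounded up.

```python
import argparse
import math


def build_parser() -> argparse.ArgumentParser:
    """Parser mirroring the arguments table. All defaults are placeholders."""
    p = argparse.ArgumentParser(description="Train a face classifier (sketch).")
    p.add_argument("--dataset_dir", type=str, default="./data")
    p.add_argument("--image_classification_models_config_path", type=str,
                   default="./configs/models.yaml")
    p.add_argument("--batch_size", type=int, default=32)
    p.add_argument("--num_epochs", type=int, default=10)
    p.add_argument("--learning_rate", type=float, default=1e-4)
    p.add_argument("--max_lr_factor", type=float, default=10.0)
    p.add_argument("--accelerator", type=str, default="auto",
                   choices=["cpu", "gpu", "tpu", "auto"])
    p.add_argument("--devices", type=int, default=1)
    p.add_argument("--algorithm", type=str, default="yolo",
                   choices=["mtcnn", "yolo"])
    # Despite its name, warmup_steps is a fraction of total_steps.
    p.add_argument("--warmup_steps", type=float, default=0.05)
    p.add_argument("--total_steps", type=int, default=0)
    p.add_argument("--classification_model_name", type=str,
                   default="efficientnet_b0")
    return p


def resolve_total_steps(total_steps: int, num_epochs: int,
                        num_train_samples: int, batch_size: int) -> int:
    """Return total_steps, deriving it from the dataset size when it is 0."""
    if total_steps > 0:
        return total_steps
    # Assumption: the final partial batch is kept, hence the ceiling.
    steps_per_epoch = math.ceil(num_train_samples / batch_size)
    return num_epochs * steps_per_epoch


if __name__ == "__main__":
    args = build_parser().parse_args()
    # 1000 is a made-up dataset size used only for demonstration.
    print(resolve_total_steps(args.total_steps, args.num_epochs,
                              1000, args.batch_size))
```

With 1,000 training images, a batch size of 32, and 10 epochs, this sketch resolves `total_steps` to 320 (32 steps per epoch, rounded up from 31.25).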
## Training Pipeline Overview
The training pipeline preprocesses face images and trains a classification head on top of a pretrained model using PyTorch Lightning. Key components:
1. **Preprocessing**: Aligns faces using `yolo` or `mtcnn` and caches the resized images (`preprocess_and_cache_images`).
2. **Dataset**: `FaceDataset` loads pre-aligned images, applies normalization, and assigns labels by person.
3. **Model**: `FaceClassifier` pairs a frozen pretrained model (e.g., EfficientNet) with a custom classification head.
4. **Training**: `FaceClassifierLightning` manages training with the Adam optimizer and a cosine annealing scheduler, and logs loss and accuracy.
5. **Configuration**: Loads model details from YAML (`load_model_configs`), uses `DataLoader` with multiprocessing, and saves models via `CustomModelCheckpoint`.
6. **Execution**: `main` orchestrates preprocessing, data loading, and model training, then saves the full model and the classifier head.
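
To make the interaction between `learning_rate`, `max_lr_factor`, `warmup_steps`, and `total_steps` more concrete, here is a plain-Python sketch of one plausible schedule: a linear warmup from the initial rate to the maximum rate, followed by cosine annealing. The exact shape used by `FaceClassifierLightning` is not specified in this document, so the linear ramp, the decay floor `min_lr`, and the function name `lr_at_step` are all assumptions.

```python
import math


def lr_at_step(step: int, total_steps: int, base_lr: float,
               max_lr_factor: float, warmup_fraction: float,
               min_lr: float = 0.0) -> float:
    """Learning rate at a given step (hypothetical warmup + cosine schedule)."""
    warmup = int(total_steps * warmup_fraction)  # warmup length in steps
    max_lr = base_lr * max_lr_factor

    if step < warmup:
        # Assumed linear ramp from base_lr up to max_lr.
        return base_lr + (max_lr - base_lr) * step / warmup

    # Cosine annealing from max_lr down to min_lr over the remaining steps.
    decay_steps = max(1, total_steps - warmup)
    progress = min(1.0, (step - warmup) / decay_steps)
    return min_lr + 0.5 * (max_lr - min_lr) * (1.0 + math.cos(math.pi * progress))


if __name__ == "__main__":
    # Illustrative values only: 1000 steps, 5% warmup, peak at 10x the base rate.
    for s in (0, 25, 50, 525, 1000):
        print(s, lr_at_step(s, 1000, 1e-4, 10.0, 0.05))
```

Under these assumptions the rate starts at `learning_rate`, peaks at `learning_rate * max_lr_factor` when warmup ends, and decays to `min_lr` by the final step.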