# PriviGaze: Privileged Distillation for Accessible Gaze Estimation
On-device gaze estimation designed for people with disabilities.
PriviGaze uses privileged knowledge distillation to train an ultra-compact student model (~80K parameters) that estimates gaze direction from just a grayscale face image: no eye crops, no RGB, no calibration needed.
## Why This Matters

Traditional gaze trackers fail for people with disabilities:

- Droopy eyes → eye-crop detectors can't find pupils
- Head roll / mobile instability → calibration breaks
- Varied lighting → RGB-based models fail
PriviGaze's student model handles all of these by:
- Working from the full face (no precise eye detection needed)
- Using grayscale only (robust to lighting)
- Having a large receptive field (handles head movement)
- Being ~80K parameters (runs on any device)
## Architecture

### Teacher (Training Only: Privileged Information)
```
┌──────────────────────────────────────────────────┐
│                 PriviGazeTeacher                 │
│                                                  │
│  Left Eye RGB  ──► ConvNeXtV2-Atto ──► 256d      │
│  Right Eye RGB ──► ConvNeXtV2-Atto ──► 256d      │
│                          │ (Fusion)              │
│  Face Blurred  ──► ConvNeXtV2-Nano ──► 256d      │
│  (Grayscale)       (Cross-Attention)             │
│                     ┌──────────┐                 │
│                     │  Fused   │                 │
│                     │ Features │                 │
│                     │   256d   │                 │
│                     └────┬─────┘                 │
│                    ┌─────┴─────┐                 │
│                    │Pitch │ Yaw│                 │
│                    └───────────┘                 │
└──────────────────────────────────────────────────┘
```
- 3 privileged inputs: left eye RGB, right eye RGB, blurred grayscale face
- ConvNeXtV2-Atto (3.7M) for eyes, ConvNeXtV2-Nano (15.6M) for face
- Cross-attention fusion between face and eye modalities
- L2CS-Net style binned regression
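The L2CS-Net style head used by both models can be sketched as follows: each angle (pitch or yaw) is discretized into bins, the head predicts bin logits, and a continuous angle is decoded as the softmax-weighted expectation over bin centers. The bin count and angle range below are illustrative assumptions, not the repo's actual configuration.

```python
import torch
import torch.nn as nn

# Illustrative bin setup (assumed values): 90 bins of 4 degrees over [-180, 180).
N_BINS, BIN_WIDTH, ANGLE_MIN = 90, 4.0, -180.0

def logits_to_degrees(logits: torch.Tensor) -> torch.Tensor:
    """Soft-argmax decode: expectation of bin centers under the softmax."""
    centers = ANGLE_MIN + BIN_WIDTH * (torch.arange(N_BINS) + 0.5)
    probs = torch.softmax(logits, dim=-1)
    return (probs * centers).sum(dim=-1)

def binned_regression_loss(logits, target_deg, mse_weight=1.0):
    """CE on the hard bin label plus MSE on the decoded continuous angle."""
    bin_idx = ((target_deg - ANGLE_MIN) / BIN_WIDTH).long().clamp(0, N_BINS - 1)
    ce = nn.functional.cross_entropy(logits, bin_idx)
    mse = nn.functional.mse_loss(logits_to_degrees(logits), target_deg)
    return ce + mse_weight * mse
```

In practice one such head is run per angle, so the model outputs two sets of bin logits (pitch and yaw).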
### Student (On-Device Inference)
```
┌────────────────────────────────────────────┐
│             PriviGazeStudent               │
│               ~80K params                  │
│                                            │
│  Face Grayscale ──► Light Correction       │
│          │                                 │
│      Stem (32ch, /4)                       │
│          │                                 │
│  Inception Block ► DSConv (/2) ► 64ch      │
│          │                                 │
│  Inception Block ► DSConv (/2) ► 96ch      │
│          │                                 │
│  Inception Block ► DSConv (/2) ► 128ch     │
│          │                                 │
│  Inception Block ► GAP ► 160ch             │
│          │                                 │
│  Feature Projection ► 128d                 │
│          │                                 │
│     ┌────┴─────┐                           │
│     │Pitch│ Yaw│                           │
│     └──────────┘                           │
└────────────────────────────────────────────┘
```
- 1 input: grayscale face (224×224)
- Inception blocks with factorized convolutions (1×3 + 3×1)
- Depthwise separable convolutions throughout
- Learned light correction (gamma + affine)
- L2CS-Net style binned regression
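The student's building blocks can be sketched roughly as below. Branch widths, normalization placement, and the residual connection are assumptions for illustration, not the repo's exact layer layout.

```python
import torch
import torch.nn as nn

class DSConv(nn.Module):
    """Depthwise separable conv: depthwise 3x3 followed by pointwise 1x1."""
    def __init__(self, in_ch, out_ch, stride=1):
        super().__init__()
        self.dw = nn.Conv2d(in_ch, in_ch, 3, stride=stride, padding=1,
                            groups=in_ch, bias=False)
        self.pw = nn.Conv2d(in_ch, out_ch, 1, bias=False)
        self.bn = nn.BatchNorm2d(out_ch)
        self.act = nn.ReLU(inplace=True)

    def forward(self, x):
        return self.act(self.bn(self.pw(self.dw(x))))

class FactorizedInceptionBlock(nn.Module):
    """Inception-style block whose spatial branch uses factorized 1x3 + 3x1
    convolutions instead of a full 3x3, cutting parameters further."""
    def __init__(self, ch):
        super().__init__()
        b = ch // 4
        self.branch1 = nn.Conv2d(ch, b, 1, bias=False)               # 1x1
        self.branch2 = nn.Sequential(                                # 1x3 then 3x1
            nn.Conv2d(ch, b, (1, 3), padding=(0, 1), bias=False),
            nn.Conv2d(b, b, (3, 1), padding=(1, 0), bias=False),
        )
        self.branch3 = DSConv(ch, b)                                 # depthwise separable
        self.branch4 = nn.Sequential(nn.MaxPool2d(3, 1, 1),
                                     nn.Conv2d(ch, ch - 3 * b, 1, bias=False))
        self.bn = nn.BatchNorm2d(ch)
        self.act = nn.ReLU(inplace=True)

    def forward(self, x):
        y = torch.cat([self.branch1(x), self.branch2(x),
                       self.branch3(x), self.branch4(x)], dim=1)
        return self.act(self.bn(y)) + x  # residual keeps channel count fixed
```

At 64 channels a block like this costs well under 10K parameters, which is how four of them plus a stem and heads can stay near the ~80K budget.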
## Distillation Loss
The student learns from the teacher via a multi-component loss:
L_total = L_task + α_angular·L_angular + α_contrast·L_contrast + α_mmd·L_mmd + α_logit·L_logit
| Component | Weight | Description |
|---|---|---|
| L_task | 1.0 | L2CS-Net binned regression (CE + MSE) |
| L_angular | 1.0 | Direct L1 in degrees |
| L_contrast | 0.5 | InfoNCE contrastive feature matching |
| L_mmd | 0.1 | Maximum Mean Discrepancy distribution matching |
| L_logit | 0.5 | KL divergence on soft targets |
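A minimal sketch of how these components might combine, assuming the student's features have already been projected to the teacher's feature dimensionality; the kernel choice, temperatures, and exact estimators below are illustrative assumptions.

```python
import torch
import torch.nn.functional as F

def angular_loss(pred_deg, target_deg):
    """Direct L1 between predicted and target pitch/yaw in degrees."""
    return F.l1_loss(pred_deg, target_deg)

def logit_loss(student_logits, teacher_logits, T=4.0):
    """KL divergence on temperature-softened bin distributions."""
    return F.kl_div(F.log_softmax(student_logits / T, dim=-1),
                    F.softmax(teacher_logits / T, dim=-1),
                    reduction="batchmean") * T * T

def infonce_loss(student_feat, teacher_feat, tau=0.07):
    """Contrastive matching: each student feature should retrieve its own
    teacher feature against the rest of the batch."""
    s = F.normalize(student_feat, dim=-1)
    t = F.normalize(teacher_feat, dim=-1)
    logits = s @ t.T / tau
    return F.cross_entropy(logits, torch.arange(s.size(0)))

def mmd_loss(s, t, sigma=1.0):
    """RBF-kernel Maximum Mean Discrepancy between feature distributions."""
    def k(a, b):
        return torch.exp(-torch.cdist(a, b).pow(2) / (2 * sigma ** 2))
    return k(s, s).mean() + k(t, t).mean() - 2 * k(s, t).mean()

def total_loss(task, pred_deg, tgt_deg, s_logits, t_logits, s_feat, t_feat,
               a_ang=1.0, a_con=0.5, a_mmd=0.1, a_log=0.5):
    """Weighted sum matching the table above (L_task carries weight 1.0)."""
    return (task
            + a_ang * angular_loss(pred_deg, tgt_deg)
            + a_con * infonce_loss(s_feat, t_feat)
            + a_mmd * mmd_loss(s_feat, t_feat)
            + a_log * logit_loss(s_logits, t_logits))
```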
## Training

### Quick Start
```bash
# Install dependencies
pip install -r requirements.txt

# Train teacher first, then distill to student
python train.py --mode both \
    --batch-size 32 \
    --epochs 100 \
    --teacher-epochs 50 \
    --save-dir ./checkpoints \
    --push-to-hub \
    --hub-model-id BcantCode/privi-gaze-distill
```
### Phase 1: Teacher Pre-training

```bash
python train.py --mode pretrain_teacher \
    --batch-size 32 \
    --teacher-epochs 50 \
    --save-dir ./checkpoints
```
### Phase 2: Student Distillation

```bash
python train.py --mode distill \
    --teacher-path ./checkpoints/teacher_best.pt \
    --epochs 100 \
    --batch-size 32 \
    --save-dir ./checkpoints
```
## Model Sizes

| Model | Parameters | Input | Use |
|---|---|---|---|
| PriviGazeTeacher | ~19M | 2× RGB eyes + blurred face | Training only |
| PriviGazeStudent | ~80K | 1× grayscale face | On-device inference |
## Research Foundation

This work builds on:

- L2CS-Net (Abdelrahman et al., 2022): Per-angle binned regression for gaze
- GazeGen / DFT Gaze (Hsieh et al., 2024): 281K-parameter gaze model distilled from a 10× larger teacher
- WCoRD (Chen et al., 2020): Wasserstein contrastive representation distillation
- One Eye is All You Need (Athavale et al., 2022): Inception networks for lightweight gaze
- ETH-XGaze (Zhang et al., 2020): Large-scale gaze dataset with extreme head poses
## Dataset

Currently uses SyntheticGazeDataset for development. The synthetic generator creates realistic eye crops with pupil positions encoding gaze direction, plus face images with corresponding features.

For production use, the pipeline supports:

- MPIIFaceGaze: 15 subjects, face crops + eye patches + 3D gaze
- ETH-XGaze: 110 subjects, extreme head poses, 1.1M images (gold standard)
- Gaze360: 238 subjects, 360° gaze range

To use real datasets, implement the MPIIGazeDataset class in models/dataset.py.
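A minimal skeleton for that class, assuming samples have already been decoded into (face tensor, pitch, yaw) records; a real implementation would instead parse the MPIIFaceGaze annotation files and apply whatever preprocessing the models expect.

```python
import torch
from torch.utils.data import Dataset

class MPIIGazeDataset(Dataset):
    """Yields (grayscale face tensor, gaze target in degrees).

    `records` is a list of (face_tensor, pitch_deg, yaw_deg) tuples;
    loading them from disk is left out of this sketch.
    """
    def __init__(self, records):
        self.records = records

    def __len__(self):
        return len(self.records)

    def __getitem__(self, idx):
        face, pitch, yaw = self.records[idx]
        face = face.float()  # expected shape (1, 224, 224), values in [0, 1]
        gaze = torch.tensor([pitch, yaw], dtype=torch.float32)
        return face, gaze
```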
## Requirements

- Python ≥ 3.9
- PyTorch ≥ 2.0
- Transformers ≥ 4.40
- CUDA-capable GPU (for training)
## License
Apache 2.0
## Citation

```bibtex
@software{privi_gaze_2026,
  title={PriviGaze: Privileged Distillation for Accessible Gaze Estimation},
  year={2026},
  url={https://huggingface.co/BcantCode/privi-gaze-distill}
}
```