PriviGaze: Privileged Distillation for Accessible Gaze Estimation

On-device gaze estimation designed for people with disabilities.

PriviGaze uses privileged knowledge distillation to train an ultra-compact student model (~80K params) that estimates gaze direction from just a grayscale face image: no eye crops, no RGB, no calibration needed.
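
A minimal inference sketch (the models.student import path, checkpoint filename, and raw-logit output are assumptions about the repo layout, not a documented API):

import numpy as np
import torch
from PIL import Image

# Hypothetical import path; adjust to the actual repo layout.
from models.student import PriviGazeStudent

model = PriviGazeStudent()
model.load_state_dict(torch.load("checkpoints/student_best.pt", map_location="cpu"))
model.eval()

# Grayscale 224x224 face crop, scaled to [0, 1], shaped (1, 1, 224, 224)
face = Image.open("face.jpg").convert("L").resize((224, 224))
x = torch.from_numpy(np.asarray(face, dtype=np.float32) / 255.0)[None, None]

with torch.no_grad():
    pitch_logits, yaw_logits = model(x)  # per-angle bin logits; see decoding below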

Why This Matters

Traditional gaze trackers fail for people with disabilities:

  • 👁️ Droopy eyes → eye crop detectors can't find pupils
  • 🔄 Head roll/mobile instability → calibration breaks
  • 💡 Varied lighting → RGB-based models fail

PriviGaze's student model handles all of these by:

  • Working from the full face (no precise eye detection needed)
  • Using grayscale only (robust to lighting)
  • Having a large receptive field (handles head movement)
  • Totaling only ~80K parameters (runs on any device)

Architecture

Teacher (Training Only - Privileged Information)

┌───────────────────────────────────────────────────┐
│                 PriviGazeTeacher                  │
│                                                   │
│  Left Eye RGB  ──→ ConvNeXtV2-Atto ──→ 256d       │
│  Right Eye RGB ──→ ConvNeXtV2-Atto ──→ 256d       │
│                          ↓ (Fusion)               │
│  Face Blurred  ──→ ConvNeXtV2-Nano ──→ 256d       │
│  (Grayscale)             ↓ (Cross-Attention)      │
│                    ┌──────────┐                   │
│                    │  Fused   │                   │
│                    │ Features │                   │
│                    │   256d   │                   │
│                    └────┬─────┘                   │
│                    ┌────┴────────┐                │
│                    │ Pitch │ Yaw │                │
│                    └─────────────┘                │
└───────────────────────────────────────────────────┘
  • 3 privileged inputs: left eye RGB, right eye RGB, blurred grayscale face
  • ConvNeXtV2-Atto (3.7M) for eyes, ConvNeXtV2-Nano (15.6M) for face
  • Cross-attention fusion between face and eye modalities (see the sketch after this list)
  • L2CS-Net style binned regression
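
The cross-attention fusion can be pictured as a single attention layer in which the blurred-face feature queries the two eye features. A minimal PyTorch sketch, assuming a standard multi-head layer with a residual connection (head count and normalization are assumptions, not the exact implementation):

import torch
import torch.nn as nn

class CrossAttentionFusion(nn.Module):
    def __init__(self, dim=256, heads=4):
        super().__init__()
        self.attn = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.norm = nn.LayerNorm(dim)

    def forward(self, face_feat, eye_feats):
        # face_feat: (B, 256) query from the blurred grayscale face
        # eye_feats: (B, 2, 256) keys/values from the left/right eye encoders
        q = face_feat.unsqueeze(1)
        fused, _ = self.attn(q, eye_feats, eye_feats)
        return self.norm(fused.squeeze(1) + face_feat)  # (B, 256) fused feature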

Student (On-Device Inference)

┌───────────────────────────────────────────────────┐
│                 PriviGazeStudent                  │
│                    ~80K params                    │
│                                                   │
│  Face Grayscale ──→ Light Correction              │
│       ↓                                           │
│  Stem (32ch, /4)                                  │
│       ↓                                           │
│  Inception Block ──→ DSConv (/2) ──→ 64ch         │
│       ↓                                           │
│  Inception Block ──→ DSConv (/2) ──→ 96ch         │
│       ↓                                           │
│  Inception Block ──→ DSConv (/2) ──→ 128ch        │
│       ↓                                           │
│  Inception Block ──→ GAP ──→ 160ch                │
│       ↓                                           │
│  Feature Projection ──→ 128d                      │
│       ↓                                           │
│  ┌────┴────────┐                                  │
│  │ Pitch │ Yaw │                                  │
│  └─────────────┘                                  │
└───────────────────────────────────────────────────┘
  • 1 input: grayscale face (224×224)
  • Inception blocks with factorized convolutions (1×3 + 3×1)
  • Depthwise separable convolutions throughout
  • Learned light correction (gamma + affine)
  • L2CS-Net style binned regression (decoding sketch below)
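
In L2CS-Net style binned regression, each angle is classified over discrete bins and the softmax expectation over bin centers yields a continuous value. A minimal decoding sketch (the bin count, width, and range follow L2CS-Net's Gaze360 setup and are dataset-dependent assumptions):

import torch

def decode_binned(logits, num_bins=90, bin_width=4.0, angle_min=-180.0):
    # logits: (B, num_bins) raw scores for one angle (pitch or yaw)
    probs = torch.softmax(logits, dim=-1)
    centers = angle_min + bin_width * (torch.arange(num_bins) + 0.5)
    # Expected bin center = continuous angle in degrees
    return (probs * centers.to(logits.device)).sum(dim=-1)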

Distillation Loss

The student learns from the teacher via a multi-component loss:

L_total = L_task + α_angular·L_angular + α_contrast·L_contrast + α_mmd·L_mmd + α_logit·L_logit

Component   Weight  Description
L_task      1.0     L2CS-Net binned regression (CE + MSE)
L_angular   1.0     Direct L1 in degrees
L_contrast  0.5     InfoNCE contrastive feature matching
L_mmd       0.1     Maximum Mean Discrepancy distribution matching
L_logit     0.5     KL divergence on soft targets
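
A minimal sketch of how the terms might combine (task_loss is the L2CS CE+MSE term computed elsewhere; the contrastive temperature, soft-target temperature T, and linear-kernel MMD simplification are assumptions):

import torch
import torch.nn.functional as F

def distill_loss(task_loss, s_logits, t_logits, s_feat, t_feat,
                 s_angles, gt_angles, T=4.0,
                 a_ang=1.0, a_con=0.5, a_mmd=0.1, a_log=0.5):
    # L_angular: direct L1 on decoded angles, in degrees
    l_ang = F.l1_loss(s_angles, gt_angles)

    # L_contrast: InfoNCE over in-batch student/teacher feature pairs
    s = F.normalize(s_feat, dim=-1)
    t = F.normalize(t_feat, dim=-1)
    sim = s @ t.t() / 0.07  # temperature 0.07 is an assumption
    l_con = F.cross_entropy(sim, torch.arange(len(s), device=s.device))

    # L_mmd: linear-kernel MMD between feature distributions (simplified)
    l_mmd = (s.mean(0) - t.mean(0)).pow(2).sum()

    # L_logit: KL divergence on temperature-softened bin logits
    l_log = F.kl_div(F.log_softmax(s_logits / T, dim=-1),
                     F.softmax(t_logits / T, dim=-1),
                     reduction="batchmean") * T * T

    return task_loss + a_ang * l_ang + a_con * l_con + a_mmd * l_mmd + a_log * l_log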

Training

Quick Start

# Install dependencies
pip install -r requirements.txt

# Train teacher first, then distill to student
python train.py --mode both \
    --batch-size 32 \
    --epochs 100 \
    --teacher-epochs 50 \
    --save-dir ./checkpoints \
    --push-to-hub \
    --hub-model-id BcantCode/privi-gaze-distill

Phase 1: Teacher Pre-training

python train.py --mode pretrain_teacher \
    --batch-size 32 \
    --teacher-epochs 50 \
    --save-dir ./checkpoints

Phase 2: Student Distillation

python train.py --mode distill \
    --teacher-path ./checkpoints/teacher_best.pt \
    --epochs 100 \
    --batch-size 32 \
    --save-dir ./checkpoints

Model Sizes

Model             Parameters  Input                      Use
PriviGazeTeacher  ~19M        2×RGB eyes + blurred face  Training only
PriviGazeStudent  ~80K        1×grayscale face           On-device inference
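
The counts are easy to sanity-check (import path is an assumption about the repo layout):

from models.student import PriviGazeStudent  # hypothetical import path

student = PriviGazeStudent()
total = sum(p.numel() for p in student.parameters())
print(f"Student parameters: {total / 1e3:.0f}K")  # expect roughly 80K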

Research Foundation

This work builds on:

  • L2CS-Net (Abdelrahman et al., 2022): Per-angle binned regression for gaze
  • GazeGen / DFT Gaze (Hsieh et al., 2024): 281K distilled gaze model from a 10× larger teacher
  • WCoRD (Chen et al., 2020): Wasserstein contrastive representation distillation
  • One Eye is All You Need (Athavale et al., 2022): Inception networks for lightweight gaze
  • ETH-XGaze (Zhang et al., 2020): Large-scale gaze dataset with extreme head poses

Dataset

Development currently uses a SyntheticGazeDataset. Its generator creates realistic eye crops whose pupil positions encode the gaze direction, plus face images with corresponding features.

For production use, the pipeline supports:

  • MPIIFaceGaze: 15 subjects, face crops + eye patches + 3D gaze
  • ETH-XGaze: 110 subjects, extreme head poses, 1.1M images (gold standard)
  • Gaze360: 238 subjects, 360° gaze range

To use real datasets, implement the MPIIGazeDataset class in models/dataset.py.
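
A skeleton for that class might look like the following (return fields and shapes are assumptions inferred from the teacher/student inputs described above, not the repo's actual interface):

from torch.utils.data import Dataset

class MPIIGazeDataset(Dataset):
    def __init__(self, root, transform=None):
        # Populate with (image paths, pitch/yaw label) pairs parsed from
        # the MPIIFaceGaze annotation files under `root`.
        self.samples = []
        self.transform = transform

    def __len__(self):
        return len(self.samples)

    def __getitem__(self, idx):
        # Expected to return the privileged views plus the label:
        #   left_eye, right_eye: (3, H, W) RGB eye crops (teacher only)
        #   face: (1, 224, 224) grayscale face (teacher and student)
        #   gaze: (2,) pitch/yaw in degrees
        raise NotImplementedError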

Requirements

  • Python ≥ 3.9
  • PyTorch ≥ 2.0
  • Transformers ≥ 4.40
  • CUDA-capable GPU (for training)

License

Apache 2.0

Citation

@software{privi_gaze_2026,
  title={PriviGaze: Privileged Distillation for Accessible Gaze Estimation},
  year={2026},
  url={https://huggingface.co/BcantCode/privi-gaze-distill}
}