metadata

title: Seat Depth Analyzer
emoji: 🪑
colorFrom: blue
colorTo: green
sdk: streamlit
sdk_version: 1.28.0
app_file: app.py
pinned: false
license: mit

Seat Depth Analyzer - Technical Documentation

An AI-powered computer vision application that analyzes ergonomic seating conditions from side-profile images and classifies seat pan depth as Optimal, Too Deep, or Too Short.

🚀 Quick Start

Install Dependencies

pip install streamlit opencv-python numpy torch torchvision segment-anything ultralytics mediapipe pillow

Run Application
```bash
streamlit run app.py
```
Note: the SAM model checkpoint (sam_vit_b_01ec64.pth) is included in the submission.

Open in Browser

Navigate to: http://localhost:8501

Test the App

Upload a side-profile image of someone seated, or try the included sample images, then click "🔍 Analyze Seat Depth".

Project Overview

The Seat Depth Analyzer is an AI-powered computer vision application that analyzes ergonomic seating conditions from side-profile images. It classifies seat pan depth as Optimal, Too Deep, or Too Short based on the clearance between the seat front edge and the back of the user's knee.

Ergonomic Classification Criteria

Optimal: 2-6 cm clearance (proper thigh support without circulation issues)
Too Deep: <2 cm clearance or knee behind seat edge (circulation risk)
Too Short: >6 cm clearance (insufficient thigh support)
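
The thresholds above can be expressed as a small rule function. This is a minimal sketch, not the app's actual code: the name `classify_seat_depth` is hypothetical, and treating exactly 2 cm and exactly 6 cm as Optimal is an assumption drawn from the "2-6 cm" wording.

```python
def classify_seat_depth(clearance_cm: float) -> str:
    """Map knee-to-seat-edge clearance (cm) to an ergonomic label.

    A negative clearance means the back of the knee sits behind the seat edge.
    Boundary values (2.0 and 6.0) are assumed to count as Optimal.
    """
    if clearance_cm < 2.0:
        return "Too Deep"
    if clearance_cm <= 6.0:
        return "Optimal"
    return "Too Short"


for value in (-1.5, 2.0, 4.2, 6.0, 7.5):
    print(f"{value:5.1f} cm -> {classify_seat_depth(value)}")
```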

Technical Architecture

Multi-Model Pipeline

The solution uses a multi-model pipeline that chains three pretrained computer vision models:

Input Image → Pose Detection → Chair Detection → Seat Segmentation → Measurement → Classification → Output
                    ↓                 ↓                  ↓                ↓              ↓
                MediaPipe          YOLOv8n          SAM (ViT-B)      CV Analysis     Ergonomic
                  Pose             (Chair)         Segmentation       & Scaling        Rules

Model Selection and Rationale

1. Pose Estimation Model Choice: MediaPipe Pose

Why MediaPipe Pose?

High Accuracy: Proven performance on diverse body poses and lighting conditions
Landmark Precision: Provides 33 precise body landmarks including knees, hips, eyes, and ears
Visibility Scoring: Each landmark includes visibility confidence, crucial for side-profile analysis
Computational Efficiency: Real-time performance suitable for web applications
Robustness: Handles partial occlusion and varied clothing better than alternatives

Alternative Considered: OpenPose

Rejected because: Higher computational requirements, less optimized for single-person detection
MediaPipe advantage: Better integration with web deployment, more stable landmark tracking

Key Landmarks Used:

Knees (left/right): Primary measurement points
Eyes/Ears: Scaling reference (anatomical constant)
Hips: Thigh length calculation for anatomical proportions
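
In a side-profile photo only one side of the body is usually well visible, so the landmarks listed above have to be read from that side. The sketch below shows one way to choose it from visibility scores. The indices are those of MediaPipe's 33-point pose topology; the `Landmark` tuple, `SIDE_INDICES`, and `pick_visible_side` are hypothetical names for illustration, and picking by mean visibility is an assumption rather than the app's confirmed logic.

```python
from typing import Dict, NamedTuple


class Landmark(NamedTuple):
    x: float           # normalized image coordinate in [0, 1]
    y: float
    visibility: float  # confidence that the landmark is visible


# Indices from MediaPipe's 33-point pose topology.
SIDE_INDICES = {
    "left": {"eye": 2, "ear": 7, "hip": 23, "knee": 25},
    "right": {"eye": 5, "ear": 8, "hip": 24, "knee": 26},
}


def pick_visible_side(landmarks: Dict[int, Landmark]) -> str:
    """Return the body side whose key landmarks have the higher mean visibility."""
    def mean_visibility(side: str) -> float:
        indices = list(SIDE_INDICES[side].values())
        return sum(landmarks[i].visibility for i in indices) / len(indices)

    return max(("left", "right"), key=mean_visibility)


# Toy input: right-side landmarks are clearly visible, left-side ones are not.
demo = {i: Landmark(0.5, 0.5, 0.2) for i in SIDE_INDICES["left"].values()}
demo.update({i: Landmark(0.5, 0.5, 0.9) for i in SIDE_INDICES["right"].values()})
print(pick_visible_side(demo))
```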

2. Chair Detection Model: YOLOv8n

Why YOLOv8n?

Speed vs. Accuracy Balance: Nano version provides sufficient accuracy for chair detection while maintaining fast inference
Pre-trained COCO: Chair class (ID: 56) readily available without custom training
Bounding Box Precision: Accurate enough to constrain segmentation region
Memory Efficiency: Suitable for deployment environments

Usage Strategy:

Extract the chair bounding box, which is then passed to SAM as a box prompt
Crop 25% off the top of the same box to focus on the seat area and exclude the backrest
Use the result as the region of interest for the segmentation model
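
The 25% top crop can be sketched as follows. The `(x1, y1, x2, y2)` pixel layout with the origin at the top-left corner is an assumption, and `crop_top` is a hypothetical helper name.

```python
from typing import Tuple

Box = Tuple[int, int, int, int]  # (x1, y1, x2, y2) in pixels, y grows downward


def crop_top(box: Box, fraction: float = 0.25) -> Box:
    """Drop the top `fraction` of a bounding box, keeping the lower part."""
    x1, y1, x2, y2 = box
    new_y1 = y1 + int(round((y2 - y1) * fraction))
    return (x1, new_y1, x2, y2)


chair_box = (120, 200, 520, 600)
seat_box = crop_top(chair_box)
print(seat_box)  # (120, 300, 520, 600)
```

Only the top edge moves, so the seat front and the chair base stay inside the region.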

3. Segmentation Model: SAM (Segment Anything Model) ViT-B

Why SAM? SAM supports promptable segmentation from points or bounding boxes, so I used it to mask the chair out of the image and focus on the front edge of the seat pan.

Bounding Box-Based Segmentation: Can segment objects using bounding box prompts
High-Quality Masks: Superior edge precision compared to traditional segmentation
Generalization: Works on furniture without specific training
Multi-Scale Features: ViT-B provides good balance of accuracy and speed

Alternative Considered: Traditional edge detection + contour finding

Rejected because: Poor performance on textured seats, lighting variations, and complex backgrounds
SAM advantage: Semantic understanding of object boundaries

Measurement Methodology

Knee Position Estimation

Challenge: MediaPipe knee landmarks represent joint centers, not the back of the knee (popliteal area) needed for ergonomic measurement.

Solution: Anatomical Offset Calculation

# Calculate thigh length for proportional offset
thigh_length_px = euclidean_distance(hip_position, knee_position)

# Back of knee offset: 13% of thigh length behind knee center
back_of_knee_offset = thigh_length_px * 0.13

# Apply directional offset based on facing direction
if facing_direction == "right":
    back_of_knee_x = knee_center_x - back_of_knee_offset
else:
    back_of_knee_x = knee_center_x + back_of_knee_offset

Rationale for 13% Offset:

MediaPipe's knee landmark marks the joint center, but the measurement needs the back of the knee
Based on anthropometric studies of knee anatomy, the back of the knee sits approximately 12-15% of thigh length behind the joint center
Validated against manual measurements on test images
Accounts for the distance from knee joint center to posterior knee surface
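
A self-contained version of the offset calculation, written as a sketch under the same 13% assumption. `back_of_knee_x` and the `(x, y)` pixel-tuple inputs are illustrative names, not the app's confirmed interface.

```python
import math
from typing import Tuple

Point = Tuple[float, float]  # (x, y) in pixels


def back_of_knee_x(hip: Point, knee: Point, facing: str, ratio: float = 0.13) -> float:
    """Shift the knee-center x away from the facing direction by `ratio` of thigh length."""
    if facing not in ("left", "right"):
        raise ValueError(f"facing must be 'left' or 'right', got {facing!r}")
    offset = math.dist(hip, knee) * ratio
    return knee[0] - offset if facing == "right" else knee[0] + offset


# Hip and knee at the same height, 200 px apart: the offset is about 26 px.
print(back_of_knee_x((300.0, 400.0), (500.0, 400.0), "right"))
print(back_of_knee_x((500.0, 400.0), (300.0, 400.0), "left"))
```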

Seat Edge Detection

Multi-Step Process:

Region Extraction:

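# Create analysis band around knee level, clamped to the image bounds
# (a negative start index would otherwise wrap around in NumPy slicing)
knee_y = int(average_knee_height)
band_thickness = chair_height // 2
top = max(0, knee_y - band_thickness)
bottom = min(mask.shape[0], knee_y + band_thickness)
analysis_region = mask[top:bottom, :]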

Edge Detection Strategy:
- Extract chair mask pixels within the analysis band
- Find extreme X-coordinate based on facing direction
- Right-facing: Rightmost chair pixel (seat front)
- Left-facing: Leftmost chair pixel (seat front)
Validation:
- Ensure sufficient chair pixels detected in analysis region
- Cross-validate with chair bounding box constraints
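
The band search and the pixel-count validation can be sketched as below. To keep the example dependency-free, a nested list of 0/1 values stands in for the NumPy chair mask. The function name `find_seat_front_x` and the `min_pixels` parameter are hypothetical.

```python
from typing import List, Optional, Sequence


def find_seat_front_x(
    mask: Sequence[Sequence[int]],
    knee_y: int,
    band_thickness: int,
    facing: str,
    min_pixels: int = 1,
) -> Optional[int]:
    """Return the x of the seat front inside a horizontal band, or None."""
    top = max(0, knee_y - band_thickness)
    bottom = min(len(mask), knee_y + band_thickness)
    xs: List[int] = [
        x
        for row in mask[top:bottom]
        for x, value in enumerate(row)
        if value
    ]
    if len(xs) < min_pixels:
        return None  # not enough chair pixels to trust the edge
    # The seat front is the extreme chair pixel in the facing direction.
    return max(xs) if facing == "right" else min(xs)


demo_mask = [
    [0, 0, 0, 0, 0, 0],
    [0, 1, 1, 1, 0, 0],
    [0, 1, 1, 1, 1, 0],
    [0, 0, 0, 0, 0, 0],
]
print(find_seat_front_x(demo_mask, knee_y=2, band_thickness=1, facing="right"))  # 4
print(find_seat_front_x(demo_mask, knee_y=2, band_thickness=1, facing="left"))   # 1
```

Returning `None` instead of raising lets the caller report a detection failure to the user rather than crash.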

Scaling and Real-World Measurements

With the back of the knee and the seat front both located, I could calculate the distance between them in pixels. This still had to be converted to centimeters, since the classification thresholds are defined in cm.

Reference-Based Scaling:

# Use eye-to-ear distance as anatomical constant
eye_to_ear_distance_px = euclidean_distance(eye_landmark, ear_landmark)
eye_to_ear_distance_cm = 7.0  # Average adult measurement

pixels_per_cm = eye_to_ear_distance_px / eye_to_ear_distance_cm
clearance_cm = clearance_pixels / pixels_per_cm

Why Eye-to-Ear Distance?

Anatomical Constant: Relatively consistent across adults (6.5-7.5 cm)
Visibility: Usually visible in side-profile images
Stability: Less affected by posture compared to other facial measurements
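
A worked example of the conversion, as a sketch: `pixels_to_cm` is a hypothetical helper, and the landmark coordinates are made-up pixel values chosen so the arithmetic is easy to follow.

```python
import math

EYE_TO_EAR_CM = 7.0  # average adult value used in this document


def pixels_to_cm(distance_px, eye_xy, ear_xy, reference_cm=EYE_TO_EAR_CM):
    """Convert a pixel distance to centimeters using the eye-to-ear reference."""
    reference_px = math.dist(eye_xy, ear_xy)
    if reference_px == 0:
        raise ValueError("eye and ear landmarks coincide; cannot derive a scale")
    pixels_per_cm = reference_px / reference_cm
    return distance_px / pixels_per_cm


# Eye-to-ear spans 70 px, so 1 cm is 10 px and a 45 px gap is 4.5 cm.
clearance_cm = pixels_to_cm(45.0, eye_xy=(410.0, 120.0), ear_xy=(340.0, 120.0))
print(clearance_cm)  # 4.5
```

Because the reference length is fixed at 7.0 cm, the 6.5-7.5 cm range quoted above translates into roughly ±7% uncertainty on every converted length.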

Facing Direction Detection

Determines if person faces left or right in image

Method: Compare average X-coordinates of knees vs. eyes

If knees are right of eyes: facing right
If knees are left of eyes: facing left

This affects:

Which knee/eye/ear to use for measurements
Direction of anatomical offsets
Seat edge detection logic
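
The comparison can be sketched in a few lines. `detect_facing` is a hypothetical name, and resolving the tie case (knees exactly under the eyes) as "left" is an arbitrary assumption.

```python
from statistics import mean
from typing import Sequence


def detect_facing(knee_xs: Sequence[float], eye_xs: Sequence[float]) -> str:
    """Return 'right' if the knees lie right of the eyes on average, else 'left'."""
    return "right" if mean(knee_xs) > mean(eye_xs) else "left"


print(detect_facing(knee_xs=[520.0, 510.0], eye_xs=[330.0, 335.0]))  # right
print(detect_facing(knee_xs=[180.0, 190.0], eye_xs=[400.0, 395.0]))  # left
```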

Challenges in Spacing Detection

1. Pose Detection Challenges

Challenge: Partial Occlusion

Problem: Knees/hips may be obscured by desk, clothing, or shadows
Solution: Visibility scoring and confidence thresholds
Mitigation: Multi-landmark validation, graceful degradation
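
One way to apply a visibility threshold with graceful degradation is to report which required landmarks are unusable instead of failing outright. This is a sketch under assumptions: the 0.5 threshold, the landmark names, and `missing_landmarks` are all illustrative.

```python
from typing import Dict, List


def missing_landmarks(
    visibility: Dict[str, float],
    required: List[str],
    threshold: float = 0.5,
) -> List[str]:
    """List the required landmarks that are absent or below the visibility threshold."""
    return [name for name in required if visibility.get(name, 0.0) < threshold]


scores = {"knee": 0.92, "hip": 0.31, "eye": 0.88}
problems = missing_landmarks(scores, required=["knee", "hip", "eye", "ear"])
if problems:
    print("Cannot measure reliably; low-confidence landmarks:", ", ".join(problems))
```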

Challenge: Clothing Variations

Problem: Baggy pants obscure actual knee position
Solution: Anatomical offset based on skeletal landmarks rather than clothing contours
Limitation: Still estimates through clothing, may introduce small errors

2. Chair Segmentation Challenges

Challenge: Complex Seat Materials

Problem: Mesh, leather, fabric textures confuse edge detection
Solution: SAM's semantic understanding handles material variations
Remaining Issue: Highly reflective or transparent materials

Challenge: Partial Chair Visibility

Problem: Desk, person's body may occlude seat edges
Solution: Focus analysis on knee-level band where seat is most likely visible
Limitation: Deep occlusion may cause detection failure

3. Scaling and Measurement Challenges

Challenge: Camera Perspective Distortion

Problem: Non-perpendicular camera angles affect measurements
Solution: Assume reasonable side-profile positioning
Limitation: Extreme angles (>30°) may introduce errors

Challenge: Depth Perception in 2D Images

Problem: Cannot measure true 3D distances
Solution: Project measurements onto image plane
Assumption: Person and chair are roughly in the same plane

4. Lighting and Image Quality

Challenge: Poor Lighting Conditions

Problem: Shadows, backlighting affect landmark detection
Solution: MediaPipe's robustness to lighting variations
Enhancement: Preprocessing could include histogram equalization
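
For reference, histogram equalization remaps intensities through their cumulative distribution so that a narrow range of gray levels is stretched over the full range. The pure-Python sketch below works on a flat list of grayscale values to stay dependency-free; `equalize_histogram` is a hypothetical name. In an OpenCV pipeline, `cv2.equalizeHist` on an 8-bit grayscale image would be the natural choice.

```python
from typing import List


def equalize_histogram(pixels: List[int], levels: int = 256) -> List[int]:
    """Equalize a flat list of grayscale values in [0, levels - 1]."""
    histogram = [0] * levels
    for value in pixels:
        histogram[value] += 1

    # Cumulative distribution of intensities.
    cdf = []
    running = 0
    for count in histogram:
        running += count
        cdf.append(running)

    total = len(pixels)
    cdf_min = next((c for c in cdf if c > 0), 0)
    if total == cdf_min:
        return list(pixels)  # empty or single-valued image: nothing to spread

    scale = (levels - 1) / (total - cdf_min)
    return [round((cdf[value] - cdf_min) * scale) for value in pixels]


# A low-contrast strip (values 50-80) is stretched to the full 0-255 range.
print(equalize_histogram([50, 60, 70, 80]))  # [0, 85, 170, 255]
```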

Accuracy Improvement Suggestions

Short-Term Improvements

Enhanced Preprocessing
- Improve contrast with methods such as histogram equalization
Multi-Reference Scaling
- Combine eye-to-ear with other facial measurements
- Use hand/finger dimensions when visible
- Cross-validate scaling factors
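
Cross-validating scale factors could look like the sketch below: take the median of several pixels-per-cm estimates and flag references that disagree with it. The reference names, the 15% tolerance, and `combine_scales` are assumptions for illustration, not measured values or existing code.

```python
from statistics import median
from typing import Dict, Tuple


def combine_scales(
    estimates: Dict[str, float],
    tolerance: float = 0.15,
) -> Tuple[float, Dict[str, bool]]:
    """Combine several pixels-per-cm estimates and flag the ones that disagree.

    Returns the median scale and, per reference, whether it lies within
    `tolerance` (relative) of that median.
    """
    if not estimates:
        raise ValueError("at least one scale estimate is required")
    consensus = median(estimates.values())
    agreement = {
        name: abs(value - consensus) / consensus <= tolerance
        for name, value in estimates.items()
    }
    return consensus, agreement


scale, ok = combine_scales(
    {"eye_to_ear": 10.0, "hand_width": 10.4, "finger_length": 14.0}
)
print(scale, ok)
```

The median is used instead of the mean so that a single bad reference, such as a foreshortened hand, does not drag the combined scale with it.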

Medium-Term Enhancements

Custom Training Data
- Collect ergonomic seating dataset with ground truth measurements
- Fine-tune pose estimation on seated postures
- Train a specialized chair segmentation model
Multi-Frame Analysis
- Process video streams and average measurements across multiple frames
3D Pose Estimation
- Integrate depth estimation models
- Calculate true 3D clearances
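
For the multi-frame idea, a minimal aggregation sketch follows. It uses the median rather than a plain average so that a single mis-detected frame does not skew the result; that substitution, the `None` convention for failed frames, and the name `aggregate_clearance` are all assumptions.

```python
from statistics import median
from typing import Iterable, Optional


def aggregate_clearance(per_frame_cm: Iterable[Optional[float]]) -> Optional[float]:
    """Median clearance over frames, skipping frames where detection failed (None)."""
    valid = [value for value in per_frame_cm if value is not None]
    return median(valid) if valid else None


# One failed frame (None) and one outlier (9.8) do not disturb the result.
frames = [4.1, None, 4.4, 9.8, 4.3, 4.2]
print(aggregate_clearance(frames))  # 4.3
```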

Long-Term Research Directions

Multi-Modal Sensing

Combine computer vision with pressure sensors
Integrate with smart chair systems
Real-time posture monitoring

Development Process and Design Decisions

Iterative Development Approach

Phase 1: Core Detection
- Implemented basic pose detection
- Added simple chair detection
- Established measurement pipeline
Phase 2: Accuracy Enhancement
- Integrated SAM for precise segmentation
- Added anatomical offset calculations
- Implemented multi-scale analysis
Phase 3: User Experience
- Built Streamlit interface
- Added visualization pipeline
- Implemented sample image system
Phase 4: Robustness
- Enhanced error handling
- Added confidence scoring
- Implemented comprehensive testing

Key Design Decisions

Decision 1: Multi-Model vs. Single Model

Chosen: Multi-model pipeline
Rationale: Each model excels in its domain (pose, detection, segmentation)
Trade-off: Complexity vs. accuracy

Decision 2: Real-time vs. Batch Processing

Chosen: Single image analysis
Rationale: Simplicity, easier deployment
Future: Could extend to video streams

Decision 3: Cloud vs. Local Processing

Chosen: Local processing capability
Rationale: Privacy, offline usage
Deployment: Supports both local and cloud deployment

Assumptions and Limitations

Key Assumptions:

Side Profile View: Person is photographed from the side
Seated Posture: Back is against or near chair backrest
Standard Chair: Conventional office chair design
Adult Subjects: Eye-to-ear scaling appropriate for adults
Static Analysis: Single-moment analysis, not dynamic posture

Known Limitations:

2D Analysis: Cannot account for chair/body rotation out of image plane
Clothing Effects: Thick clothing may obscure true body landmarks
Lighting Dependency: Very poor lighting may affect landmark detection
Chair Variety: Unusual chair designs may confuse detection
Anthropometric Variation: Fixed scaling may not suit all body types

Validation and Testing Strategy

Test Coverage

Unit Tests: Individual component testing
Integration Tests: End-to-end pipeline validation
Accuracy Tests: Ground truth comparison on sample images
Edge Case Tests: Handling of failure conditions
Performance Tests: Processing time benchmarking

Sample Dataset

Optimal Cases (3 samples): Clear examples of proper seating
Too Deep Cases (4 samples): Various levels of excessive depth
Too Short Cases (8 samples): Range of insufficient depth scenarios

Technical References

MediaPipe Pose: Google Research Paper
SAM (Segment Anything): Meta AI Research
YOLOv8: Ultralytics Documentation

Dataset and Tools

Sample Images: Custom collected and validated
Development Environment: Python 3.9, PyTorch, OpenCV
Deployment Platform: Streamlit Cloud

Anthropometric Data Sources

Eye-to-Ear Measurements: "An anthropometric study to evaluate the correlation between the occlusal vertical dimension and length of the thumb", Clinical, Cosmetic and Investigational Dentistry