seat-depth-analyser / README.md
nitikaborkar's picture
Update README.md
5bd71ef verified

A newer version of the Streamlit SDK is available: 1.49.1

Upgrade
metadata
title: Seat Depth Analyzer
emoji: πŸͺ‘
colorFrom: blue
colorTo: green
sdk: streamlit
sdk_version: 1.28.0
app_file: app.py
pinned: false
license: mit

Seat Depth Analyzer - Technical Documentation

Seat Depth Analyzer An AI-powered computer vision application that analyzes ergonomic seating conditions from side-profile images and classifies seat pan depth as Optimal, Too Deep, or Too Short.

πŸš€ Quick Start

  1. Install Dependencies

    bashpip install streamlit opencv-python numpy torch torchvision segment-anything ultralytics mediapipe pillow
    
  2. Run Application

    bashstreamlit run app.py
    

    Note: SAM model (sam_vit_b_01ec64.pth) is included in the submission

  3. Open in Browser Navigate to: http://localhost:8501

  4. Test the App

    Upload a side-profile image of someone seated, or Try the included sample images Click "πŸ” Analyze Seat Depth"

Project Overview

The Seat Depth Analyzer is an AI-powered computer vision application that analyzes ergonomic seating conditions from side-profile images. It classifies seat pan depth as Optimal, Too Deep, or Too Short based on the clearance between the seat front edge and the back of the user's knee.

Ergonomic Classification Criteria

  • Optimal: 2-6 cm clearance (proper thigh support without circulation issues)
  • Too Deep: <2 cm clearance or knee behind seat edge (circulation risk)
  • Too Short: >6 cm clearance (insufficient thigh support)

Technical Architecture

Multi-Model Pipeline

The solution employs a sophisticated multi-model approach combining three state-of-the-art computer vision models:

Input Image β†’ Pose Detection β†’ Chair Detection β†’ Seat Segmentation β†’ Measurement β†’ Classification β†’  Output 
                        ↓              ↓               ↓               ↓                ↓              
                        MediaPipe     YOLOv8n        SAM (ViT-B)      CV Analysis    Ergonomic     
                        Pose           (Chair)      Segmentation     & Scaling          Rules        

Model Selection and Rationale

1. Pose Estimation Model Choice: MediaPipe Pose

Why MediaPipe Pose?

  • High Accuracy: Proven performance on diverse body poses and lighting conditions
  • Landmark Precision: Provides 33 precise body landmarks including knees, hips, eyes, and ears
  • Visibility Scoring: Each landmark includes visibility confidence, crucial for side-profile analysis
  • Computational Efficiency: Real-time performance suitable for web applications
  • Robustness: Handles partial occlusion and varied clothing better than alternatives

Alternative Considered: OpenPose

  • Rejected because: Higher computational requirements, less optimized for single-person detection
  • MediaPipe advantage: Better integration with web deployment, more stable landmark tracking

Key Landmarks Used:

  • Knees (left/right): Primary measurement points
  • Eyes/Ears: Scaling reference (anatomical constant)
  • Hips: Thigh length calculation for anatomical proportions

2. Chair Detection Model: YOLOv8n

Why YOLOv8n?

  • Speed vs. Accuracy Balance: Nano version provides sufficient accuracy for chair detection while maintaining fast inference
  • Pre-trained COCO: Chair class (ID: 56) readily available without custom training
  • Bounding Box Precision: Accurate enough to constrain segmentation region
  • Memory Efficiency: Suitable for deployment environments

Usage Strategy:

  • Extract chair bounding box (which was then sent to SAM Meta Model)
  • This was also used to Apply 25% vertical crop from top (focuses on seat area, excludes backrest)
  • Use as region-of-interest for segmentation model

3. Segmentation Model: SAM (Segment Anything Model) ViT-B

Why SAM? SAM has point based or bounding-box based or even prompt based segmentation ability So I used it to mask out the chair from the image in order to be able to better focus on the seat pan front

  • Bounding Box-Based Segmentation: Can segment objects using bounding box prompts
  • High-Quality Masks: Superior edge precision compared to traditional segmentation
  • Generalization: Works on furniture without specific training
  • Multi-Scale Features: ViT-B provides good balance of accuracy and speed

Alternative Considered: Traditional edge detection + contour finding

  • Rejected because: Poor performance on textured seats, lighting variations, and complex backgrounds
  • SAM advantage: Semantic understanding of object boundaries

Measurement Methodology

Knee Position Estimation

Challenge: MediaPipe knee landmarks represent joint centers, not the back of the knee (popliteal area) needed for ergonomic measurement.

Solution: Anatomical Offset Calculation

# Calculate thigh length for proportional offset
thigh_length_px = euclidean_distance(hip_position, knee_position)

# Back of knee offset: 13% of thigh length behind knee center
back_of_knee_offset = thigh_length_px * 0.13

# Apply directional offset based on facing direction
if facing_direction == "right":
    back_of_knee_x = knee_center_x - back_of_knee_offset
else:
    back_of_knee_x = knee_center_x + back_of_knee_offset

Rationale for 13% Offset:

  • Since we need the back of the knee and not the knee (which MediaPipe landmark gives us )
  • Based on anthropometric studies of knee anatomy - the back of the thigh would be approximately 12-15% offset from the knee
  • Validated against manual measurements on test images
  • Accounts for the distance from knee joint center to posterior knee surface

Seat Edge Detection

Multi-Step Process:

  1. Region Extraction:

    # Create analysis band around knee level
    knee_y = average_knee_height
    band_thickness = chair_height // 2
    analysis_region = mask[knee_y - band_thickness : knee_y + band_thickness, :]
    
  2. Edge Detection Strategy:

    • Extract chair mask pixels within the analysis band
    • Find extreme X-coordinate based on facing direction
    • Right-facing: Rightmost chair pixel (seat front)
    • Left-facing: Leftmost chair pixel (seat front)
  3. Validation:

    • Ensure sufficient chair pixels detected in analysis region
    • Cross-validate with chair bounding box constraints

Scaling and Real-World Measurements

Now that I had the back of the knee and also the seat front. I could calculate the distance in pixels. But this needed to be converted to cms for our problem statemet

Reference-Based Scaling:

# Use eye-to-ear distance as anatomical constant
eye_to_ear_distance_px = euclidean_distance(eye_landmark, ear_landmark)
eye_to_ear_distance_cm = 7.0  # Average adult measurement

pixels_per_cm = eye_to_ear_distance_px / eye_to_ear_distance_cm
clearance_cm = clearance_pixels / pixels_per_cm

Why Eye-to-Ear Distance?

  • Anatomical Constant: Relatively consistent across adults (6.5-7.5 cm)
  • Visibility: Usually visible in side-profile images
  • Stability: Less affected by posture compared to other facial measurements

Facing Direction Detection

  • Determines if person faces left or right in image

Method: Compare average X-coordinates of knees vs. eyes

  • If knees are right of eyes: facing right
  • If knees are left of eyes: facing left

This affects:

  1. Which knee/eye/ear to use for measurements
  2. Direction of anatomical offsets
  3. Seat edge detection logic

Challenges in Spacing Detection

1. Pose Detection Challenges

Challenge: Partial Occlusion

  • Problem: Knees/hips may be obscured by desk, clothing, or shadows
  • Solution: Visibility scoring and confidence thresholds
  • Mitigation: Multi-landmark validation, graceful degradation

Challenge: Clothing Variations

  • Problem: Baggy pants obscure actual knee position
  • Solution: Anatomical offset based on skeletal landmarks rather than clothing contours
  • Limitation: Still estimates through clothing, may introduce small errors

2. Chair Segmentation Challenges

Challenge: Complex Seat Materials

  • Problem: Mesh, leather, fabric textures confuse edge detection
  • Solution: SAM's semantic understanding handles material variations
  • Remaining Issue: Highly reflective or transparent materials

Challenge: Partial Chair Visibility

  • Problem: Desk, person's body may occlude seat edges
  • Solution: Focus analysis on knee-level band where seat is most likely visible
  • Limitation: Deep occlusion may cause detection failure

3. Scaling and Measurement Challenges

Challenge: Camera Perspective Distortion

  • Problem: Non-perpendicular camera angles affect measurements
  • Solution: Assume reasonable side-profile positioning
  • Limitation: Extreme angles (>30Β°) may introduce errors

Challenge: Depth Perception in 2D Images

  • Problem: Cannot measure true 3D distances
  • Solution: Project measurements onto image plane
  • Assumption: Person and chair are roughly in the same plane

4. Lighting and Image Quality

Challenge: Poor Lighting Conditions

  • Problem: Shadows, backlighting affect landmark detection
  • Solution: MediaPipe's robustness to lighting variations
  • Enhancement: Preprocessing could include histogram equalization

Accuracy Improvement Suggestions

Short-Term Improvements

  1. Enhanced Preprocessing

    • Maybe can have improced contrast using certain methods like histogram equilization
  2. Multi-Reference Scaling

    • Combine eye-to-ear with other facial measurements
    • Use hand/finger dimensions when visible
    • Cross-validate scaling factors

Medium-Term Enhancements

  1. Custom Training Data

    • Collect ergonomic seating dataset with ground truth measurements
    • Then we could actually fine-tune pose estimation on seated postures
    • And train a specialized chair segmentation model
  2. Multi-Frame Analysis

    • Process video streams and have average measurements across multiple frames
  3. 3D Pose Estimation

    • Integrate depth estimation models
    • Calculate true 3D clearances

Long-Term Research Directions

Multi-Modal Sensing

  • Combine computer vision with pressure sensors
  • Integrate with smart chair systems
  • Real-time posture monitoring

Development Process and Design Decisions

Iterative Development Approach

  1. Phase 1: Core Detection

    • Implemented basic pose detection
    • Added simple chair detection
    • Established measurement pipeline
  2. Phase 2: Accuracy Enhancement

    • Integrated SAM for precise segmentation
    • Added anatomical offset calculations
    • Implemented multi-scale analysis
  3. Phase 3: User Experience

    • Built Streamlit interface
    • Added visualization pipeline
    • Implemented sample image system
  4. Phase 4: Robustness

    • Enhanced error handling
    • Added confidence scoring
    • Implemented comprehensive testing

Key Design Decisions

Decision 1: Multi-Model vs. Single Model

  • Chosen: Multi-model pipeline
  • Rationale: Each model excels in its domain (pose, detection, segmentation)
  • Trade-off: Complexity vs. accuracy

Decision 2: Real-time vs. Batch Processing

  • Chosen: Single image analysis
  • Rationale: Simplicity, easier deployment
  • Future: Could extend to video streams

Decision 3: Cloud vs. Local Processing

  • Chosen: Local processing capability
  • Rationale: Privacy, offline usage
  • Deployment: Supports both local and cloud deployment

Assumptions and Limitations

Key Assumptions:

  1. Side Profile View: Person is photographed from the side
  2. Seated Posture: Back is against or near chair backrest
  3. Standard Chair: Conventional office chair design
  4. Adult Subjects: Eye-to-ear scaling appropriate for adults
  5. Static Analysis: Single-moment analysis, not dynamic posture

Known Limitations:

  1. 2D Analysis: Cannot account for chair/body rotation out of image plane
  2. Clothing Effects: Thick clothing may obscure true body landmarks
  3. Lighting Dependency: Very poor lighting may affect landmark detection
  4. Chair Variety: Unusual chair designs may confuse detection
  5. Anthropometric Variation: Fixed scaling may not suit all body types

Validation and Testing Strategy

Test Coverage

  1. Unit Tests: Individual component testing
  2. Integration Tests: End-to-end pipeline validation
  3. Accuracy Tests: Ground truth comparison on sample images
  4. Edge Case Tests: Handling of failure conditions
  5. Performance Tests: Processing time benchmarking

Sample Dataset

  • Optimal Cases (3 samples): Clear examples of proper seating
  • Too Deep Cases (4 samples): Various levels of excessive depth
  • Too Short Cases (8 samples): Range of insufficient depth scenarios

Technical References

  1. MediaPipe Pose: Google Research Paper
  2. SAM (Segment Anything): Meta AI Research
  3. YOLOv8: Ultralytics Documentation

Dataset and Tools

  • Sample Images: Custom collected and validated
  • Development Environment: Python 3.9, PyTorch, OpenCV
  • Deployment Platform: Streamlit Cloud

Anthropometric Data Sources

  • Eye-to-Ear Measurements: Reference paper : "An anthropometric study to evaluate the correlation between the occlusal vertical dimension and length of the thumb" - Clinical, Cosmetic and Investigational Dentistry