---
title: Seat Depth Analyzer
emoji: 🪑
colorFrom: blue
colorTo: green
sdk: streamlit
sdk_version: 1.28.0
app_file: app.py
pinned: false
license: mit
---
# Seat Depth Analyzer - Technical Documentation

An AI-powered computer vision application that analyzes ergonomic seating conditions from side-profile images and classifies seat pan depth as **Optimal**, **Too Deep**, or **Too Short**.
## Quick Start

### Install Dependencies

```bash
pip install streamlit opencv-python numpy torch torchvision segment-anything ultralytics mediapipe pillow
```

### Run the Application

```bash
streamlit run app.py
```

Note: the SAM checkpoint (`sam_vit_b_01ec64.pth`) is included in the submission.

### Open in Browser

Navigate to http://localhost:8501

### Test the App

1. Upload a side-profile image of someone seated, or try the included sample images
2. Click "Analyze Seat Depth"
## Project Overview

The Seat Depth Analyzer is an AI-powered computer vision application that analyzes ergonomic seating conditions from side-profile images. It classifies seat pan depth as Optimal, Too Deep, or Too Short based on the clearance between the seat front edge and the back of the user's knee.
### Ergonomic Classification Criteria

- **Optimal**: 2-6 cm clearance (proper thigh support without circulation issues)
- **Too Deep**: <2 cm clearance, or knee behind the seat edge (circulation risk)
- **Too Short**: >6 cm clearance (insufficient thigh support)
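These thresholds can be captured in a small helper (a minimal sketch; the function name is illustrative, not taken from `app.py`):

```python
def classify_seat_depth(clearance_cm: float) -> str:
    """Map seat-front-to-knee clearance (cm) to an ergonomic category.
    Negative clearance means the knee sits behind the seat edge."""
    if clearance_cm < 2:
        return "Too Deep"    # circulation risk
    if clearance_cm <= 6:
        return "Optimal"     # proper thigh support
    return "Too Short"       # insufficient thigh support
```

Boundary values (exactly 2 cm or 6 cm) are treated as Optimal here; the app's actual boundary handling may differ.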
## Technical Architecture

### Multi-Model Pipeline

The solution employs a multi-model approach combining three state-of-the-art computer vision models:

```
Input Image → Pose Detection → Chair Detection → Seat Segmentation → Measurement → Classification → Output
                   │                 │                  │                 │              │
               MediaPipe          YOLOv8n          SAM (ViT-B)      CV Analysis     Ergonomic
                 Pose             (Chair)         Segmentation       & Scaling        Rules
```
## Model Selection and Rationale

### 1. Pose Estimation: MediaPipe Pose

**Why MediaPipe Pose?**
- High Accuracy: Proven performance on diverse body poses and lighting conditions
- Landmark Precision: Provides 33 precise body landmarks including knees, hips, eyes, and ears
- Visibility Scoring: Each landmark includes visibility confidence, crucial for side-profile analysis
- Computational Efficiency: Real-time performance suitable for web applications
- Robustness: Handles partial occlusion and varied clothing better than alternatives
**Alternative Considered: OpenPose**
- Rejected because: Higher computational requirements, less optimized for single-person detection
- MediaPipe advantage: Better integration with web deployment, more stable landmark tracking

**Key Landmarks Used:**
- Knees (left/right): Primary measurement points
- Eyes/Ears: Scaling reference (anatomical constant)
- Hips: Thigh length calculation for anatomical proportions
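Illustratively, the relevant landmarks can be pulled out and filtered by visibility like this (the index constants follow MediaPipe Pose's 33-landmark topology; the 0.5 threshold and the tuple-based landmark format are simplifying assumptions, not the app's actual code):

```python
# MediaPipe Pose landmark indices (33-landmark topology)
LEFT_EYE, RIGHT_EYE = 2, 5
LEFT_EAR, RIGHT_EAR = 7, 8
LEFT_HIP, RIGHT_HIP = 23, 24
LEFT_KNEE, RIGHT_KNEE = 25, 26

MIN_VISIBILITY = 0.5  # assumed confidence threshold

def usable_landmarks(landmarks, indices, min_visibility=MIN_VISIBILITY):
    """Keep only landmarks whose visibility score passes the threshold.
    `landmarks` is a list of (x, y, visibility) tuples."""
    return {i: landmarks[i][:2]
            for i in indices
            if landmarks[i][2] >= min_visibility}
```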
### 2. Chair Detection: YOLOv8n

**Why YOLOv8n?**
- Speed vs. Accuracy Balance: The nano version provides sufficient accuracy for chair detection while maintaining fast inference
- Pre-trained on COCO: The chair class (ID: 56) is readily available without custom training
- Bounding Box Precision: Accurate enough to constrain the segmentation region
- Memory Efficiency: Suitable for deployment environments
**Usage Strategy:**
- Extract the chair bounding box (then passed to SAM as a box prompt)
- Apply a 25% vertical crop from the top of the box (focuses on the seat area, excludes the backrest)
- Use the cropped box as the region of interest for the segmentation model
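The crop step can be sketched as follows (the helper name and the `(x1, y1, x2, y2)` box format are assumptions, not the app's actual code):

```python
def seat_roi(chair_box, top_crop=0.25):
    """Drop the top `top_crop` fraction of the chair bounding box so the
    region of interest covers the seat pan rather than the backrest."""
    x1, y1, x2, y2 = chair_box
    return (x1, y1 + top_crop * (y2 - y1), x2, y2)
```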
### 3. Segmentation Model: SAM (Segment Anything Model) ViT-B

**Why SAM?** SAM supports point-, box-, and mask-prompted segmentation, so the chair bounding box from YOLOv8n can be passed as a prompt to mask out the chair and better isolate the seat pan front.
- Bounding Box-Based Segmentation: Can segment objects using bounding box prompts
- High-Quality Masks: Superior edge precision compared to traditional segmentation
- Generalization: Works on furniture without specific training
- Multi-Scale Features: ViT-B provides good balance of accuracy and speed
**Alternative Considered: Traditional edge detection + contour finding**
- Rejected because: Poor performance on textured seats, lighting variations, and complex backgrounds
- SAM advantage: Semantic understanding of object boundaries
## Measurement Methodology

### Knee Position Estimation

**Challenge:** MediaPipe knee landmarks represent joint centers, not the back of the knee (popliteal area) needed for the ergonomic measurement.

**Solution: Anatomical Offset Calculation**
```python
# Calculate thigh length for the proportional offset
thigh_length_px = euclidean_distance(hip_position, knee_position)

# Back-of-knee offset: 13% of thigh length behind the knee center
back_of_knee_offset = thigh_length_px * 0.13

# Apply the directional offset based on facing direction
if facing_direction == "right":
    back_of_knee_x = knee_center_x - back_of_knee_offset
else:
    back_of_knee_x = knee_center_x + back_of_knee_offset
```
**Rationale for the 13% Offset:**
- MediaPipe returns the knee joint center, whereas the measurement needs the posterior (back-of-knee) surface
- Anthropometric studies place the posterior knee surface roughly 12-15% of thigh length behind the joint center
- Validated against manual measurements on test images
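A runnable version of the offset calculation, using `math.dist` for the Euclidean distance (coordinates are illustrative pixel values):

```python
import math

def back_of_knee_x(hip, knee, facing, offset_ratio=0.13):
    """Estimate the popliteal (back-of-knee) x-coordinate from the hip
    and knee joint centers; the offset is 13% of thigh length."""
    thigh_length_px = math.dist(hip, knee)
    offset = thigh_length_px * offset_ratio
    return knee[0] - offset if facing == "right" else knee[0] + offset
```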
### Seat Edge Detection

**Multi-Step Process:**

**1. Region Extraction:**

```python
# Create an analysis band around knee level
knee_y = average_knee_height
band_thickness = chair_height // 2
analysis_region = mask[knee_y - band_thickness : knee_y + band_thickness, :]
```
**2. Edge Detection Strategy:**
- Extract chair mask pixels within the analysis band
- Find the extreme X-coordinate based on facing direction:
  - Right-facing: rightmost chair pixel (seat front)
  - Left-facing: leftmost chair pixel (seat front)

**3. Validation:**
- Ensure sufficient chair pixels are detected in the analysis region
- Cross-validate with chair bounding box constraints
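The band extraction, extreme-pixel search, and pixel-count validation can be combined into one sketch (the function name and the `min_pixels` threshold are assumptions):

```python
import numpy as np

def seat_front_x(mask, knee_y, band, facing, min_pixels=50):
    """Find the seat-front x-coordinate: the extreme chair-mask pixel
    inside a horizontal band centred on knee level. Returns None when
    too few chair pixels fall in the band (validation step)."""
    region = mask[max(knee_y - band, 0): knee_y + band, :]
    ys, xs = np.nonzero(region)
    if xs.size < min_pixels:
        return None
    return int(xs.max()) if facing == "right" else int(xs.min())
```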
### Scaling and Real-World Measurements

With the back of the knee and the seat front located, the clearance can be measured in pixels; converting it to centimetres requires a real-world scale reference.
**Reference-Based Scaling:**

```python
# Use the eye-to-ear distance as an anatomical constant
eye_to_ear_distance_px = euclidean_distance(eye_landmark, ear_landmark)
eye_to_ear_distance_cm = 7.0  # Average adult measurement

pixels_per_cm = eye_to_ear_distance_px / eye_to_ear_distance_cm
clearance_cm = clearance_pixels / pixels_per_cm
```
**Why Eye-to-Ear Distance?**
- Anatomical Constant: Relatively consistent across adults (6.5-7.5 cm)
- Visibility: Usually visible in side-profile images
- Stability: Less affected by posture than other facial measurements
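A runnable version with a worked example: an eye and ear 35 px apart imply 5 px/cm at the assumed 7 cm constant, so a 20 px gap converts to 4 cm (the coordinates are illustrative):

```python
import math

def pixels_per_cm(eye_xy, ear_xy, eye_to_ear_cm=7.0):
    """Image scale derived from the eye-to-ear anatomical constant."""
    return math.dist(eye_xy, ear_xy) / eye_to_ear_cm

scale = pixels_per_cm((100, 80), (135, 80))  # 35 px apart -> 5.0 px/cm
clearance_cm = 20 / scale                    # 20 px gap -> 4.0 cm
```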
### Facing Direction Detection

Determines whether the person faces left or right in the image.

**Method:** Compare the average X-coordinates of the knees vs. the eyes:
- If the knees are right of the eyes: facing right
- If the knees are left of the eyes: facing left

This affects:
- Which knee/eye/ear to use for measurements
- The direction of anatomical offsets
- The seat edge detection logic
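The comparison reduces to a few lines (a sketch; the lists-of-x-coordinates input format is an assumption):

```python
def facing_direction(knee_xs, eye_xs):
    """'right' if the average knee x lies right of the average eye x."""
    mean = lambda xs: sum(xs) / len(xs)
    return "right" if mean(knee_xs) > mean(eye_xs) else "left"
```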
## Challenges in Spacing Detection

### 1. Pose Detection Challenges

**Challenge: Partial Occlusion**
- Problem: Knees/hips may be obscured by desk, clothing, or shadows
- Solution: Visibility scoring and confidence thresholds
- Mitigation: Multi-landmark validation, graceful degradation

**Challenge: Clothing Variations**
- Problem: Baggy pants obscure the actual knee position
- Solution: Anatomical offset based on skeletal landmarks rather than clothing contours
- Limitation: Still estimates through clothing; may introduce small errors
### 2. Chair Segmentation Challenges

**Challenge: Complex Seat Materials**
- Problem: Mesh, leather, and fabric textures confuse edge detection
- Solution: SAM's semantic understanding handles material variations
- Remaining Issue: Highly reflective or transparent materials

**Challenge: Partial Chair Visibility**
- Problem: The desk or the person's body may occlude seat edges
- Solution: Focus analysis on the knee-level band where the seat is most likely visible
- Limitation: Deep occlusion may cause detection failure
### 3. Scaling and Measurement Challenges

**Challenge: Camera Perspective Distortion**
- Problem: Non-perpendicular camera angles affect measurements
- Solution: Assume reasonable side-profile positioning
- Limitation: Extreme angles (>30°) may introduce errors

**Challenge: Depth Perception in 2D Images**
- Problem: Cannot measure true 3D distances
- Solution: Project measurements onto the image plane
- Assumption: Person and chair are roughly in the same plane
### 4. Lighting and Image Quality

**Challenge: Poor Lighting Conditions**
- Problem: Shadows and backlighting affect landmark detection
- Solution: MediaPipe's robustness to lighting variations
- Enhancement: Preprocessing could include histogram equalization
## Accuracy Improvement Suggestions

### Short-Term Improvements

**Enhanced Preprocessing**
- Improve contrast with techniques such as histogram equalization

**Multi-Reference Scaling**
- Combine eye-to-ear with other facial measurements
- Use hand/finger dimensions when visible
- Cross-validate scaling factors
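For reference, plain histogram equalization can be written in NumPy alone; in practice OpenCV's `cv2.equalizeHist` or `cv2.createCLAHE` would be the natural choice, and this sketch just shows the idea:

```python
import numpy as np

def equalize_histogram(gray):
    """Spread an 8-bit grayscale image's intensity CDF over 0-255."""
    hist = np.bincount(gray.ravel(), minlength=256)
    cdf = hist.cumsum()
    cdf_min = cdf[cdf > 0][0]
    denom = cdf[-1] - cdf_min
    if denom == 0:               # flat image: nothing to equalize
        return gray.copy()
    lut = np.clip((cdf - cdf_min) * 255 / denom, 0, 255).astype(np.uint8)
    return lut[gray]
```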
### Medium-Term Enhancements

**Custom Training Data**
- Collect an ergonomic seating dataset with ground-truth measurements
- Fine-tune pose estimation on seated postures
- Train a specialized chair segmentation model

**Multi-Frame Analysis**
- Process video streams and average measurements across multiple frames

**3D Pose Estimation**
- Integrate depth estimation models
- Calculate true 3D clearances
### Long-Term Research Directions

**Multi-Modal Sensing**
- Combine computer vision with pressure sensors
- Integrate with smart chair systems
- Real-time posture monitoring
## Development Process and Design Decisions

### Iterative Development Approach

**Phase 1: Core Detection**
- Implemented basic pose detection
- Added simple chair detection
- Established the measurement pipeline

**Phase 2: Accuracy Enhancement**
- Integrated SAM for precise segmentation
- Added anatomical offset calculations
- Implemented multi-scale analysis

**Phase 3: User Experience**
- Built the Streamlit interface
- Added the visualization pipeline
- Implemented the sample image system

**Phase 4: Robustness**
- Enhanced error handling
- Added confidence scoring
- Implemented comprehensive testing
### Key Design Decisions

**Decision 1: Multi-Model vs. Single Model**
- Chosen: Multi-model pipeline
- Rationale: Each model excels in its domain (pose, detection, segmentation)
- Trade-off: Complexity vs. accuracy

**Decision 2: Real-Time vs. Batch Processing**
- Chosen: Single-image analysis
- Rationale: Simplicity, easier deployment
- Future: Could extend to video streams

**Decision 3: Cloud vs. Local Processing**
- Chosen: Local processing capability
- Rationale: Privacy, offline usage
- Deployment: Supports both local and cloud deployment
## Assumptions and Limitations

**Key Assumptions:**
- Side Profile View: Person is photographed from the side
- Seated Posture: Back is against or near chair backrest
- Standard Chair: Conventional office chair design
- Adult Subjects: Eye-to-ear scaling appropriate for adults
- Static Analysis: Single-moment analysis, not dynamic posture
**Known Limitations:**
- 2D Analysis: Cannot account for chair/body rotation out of image plane
- Clothing Effects: Thick clothing may obscure true body landmarks
- Lighting Dependency: Very poor lighting may affect landmark detection
- Chair Variety: Unusual chair designs may confuse detection
- Anthropometric Variation: Fixed scaling may not suit all body types
## Validation and Testing Strategy

### Test Coverage
- Unit Tests: Individual component testing
- Integration Tests: End-to-end pipeline validation
- Accuracy Tests: Ground truth comparison on sample images
- Edge Case Tests: Handling of failure conditions
- Performance Tests: Processing time benchmarking
### Sample Dataset
- Optimal Cases (3 samples): Clear examples of proper seating
- Too Deep Cases (4 samples): Various levels of excessive depth
- Too Short Cases (8 samples): Range of insufficient depth scenarios
## Technical References
- MediaPipe Pose: Google Research Paper
- SAM (Segment Anything): Meta AI Research
- YOLOv8: Ultralytics Documentation
## Dataset and Tools
- Sample Images: Custom collected and validated
- Development Environment: Python 3.9, PyTorch, OpenCV
- Deployment Platform: Streamlit Cloud
## Anthropometric Data Sources

- Eye-to-Ear Measurements: "An anthropometric study to evaluate the correlation between the occlusal vertical dimension and length of the thumb", Clinical, Cosmetic and Investigational Dentistry