---
title: Seat Depth Analyzer
emoji: 🪑
colorFrom: blue
colorTo: green
sdk: streamlit
sdk_version: 1.28.0
app_file: app.py
pinned: false
license: mit
---
# Seat Depth Analyzer - Technical Documentation
An AI-powered computer vision application that analyzes ergonomic seating conditions from side-profile images and classifies seat pan depth as Optimal, Too Deep, or Too Short.
## 🚀 Quick Start
### 1. Install Dependencies
```bash
pip install streamlit opencv-python numpy torch torchvision segment-anything ultralytics mediapipe pillow
```
### 2. Run Application
```bash
streamlit run app.py
```
Note: The SAM model checkpoint (`sam_vit_b_01ec64.pth`) is included in the submission.
### 3. Open in Browser
Navigate to: http://localhost:8501
### 4. Test the App
- Upload a side-profile image of someone seated, or
- Try the included sample images
- Click "🔍 Analyze Seat Depth"
## Project Overview
The Seat Depth Analyzer is an AI-powered computer vision application that analyzes ergonomic seating conditions from side-profile images. It classifies seat pan depth as **Optimal**, **Too Deep**, or **Too Short** based on the clearance between the seat front edge and the back of the user's knee.
### Ergonomic Classification Criteria
- **Optimal**: 2-6 cm clearance (proper thigh support without circulation issues)
- **Too Deep**: <2 cm clearance or knee behind seat edge (circulation risk)
- **Too Short**: >6 cm clearance (insufficient thigh support)
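
The criteria above can be sketched as a small function. This is a minimal illustration, not the code in `app.py`: the name `classify_seat_depth` is hypothetical, and treating exactly 2 cm and 6 cm as Optimal is an assumption, since the thresholds are stated only as ranges.

```python
def classify_seat_depth(clearance_cm: float) -> str:
    """Map knee-to-seat clearance (cm) to an ergonomic label.

    A negative clearance means the back of the knee sits behind the
    seat front edge, which falls under "Too Deep".
    Assumption: the 2 cm and 6 cm boundaries count as "Optimal".
    """
    if clearance_cm < 2.0:
        return "Too Deep"
    if clearance_cm > 6.0:
        return "Too Short"
    return "Optimal"


if __name__ == "__main__":
    for value in (-1.5, 1.0, 4.0, 7.5):
        print(f"{value:5.1f} cm -> {classify_seat_depth(value)}")
```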
---
## Technical Architecture
### Multi-Model Pipeline
The solution chains three pretrained computer vision models, each handling one stage of the pipeline:
```
Input Image → Pose Detection → Chair Detection → Seat Segmentation → Measurement → Classification → Output
                     ↓                ↓                  ↓                ↓               ↓
                 MediaPipe         YOLOv8n          SAM (ViT-B)      CV Analysis      Ergonomic
                   Pose            (Chair)         Segmentation       & Scaling         Rules
```
## Model Selection and Rationale
### 1. Pose Estimation Model Choice: MediaPipe Pose
**Why MediaPipe Pose?**
- **High Accuracy**: Proven performance on diverse body poses and lighting conditions
- **Landmark Precision**: Provides 33 precise body landmarks including knees, hips, eyes, and ears
- **Visibility Scoring**: Each landmark includes visibility confidence, crucial for side-profile analysis
- **Computational Efficiency**: Real-time performance suitable for web applications
- **Robustness**: Handles partial occlusion and varied clothing better than alternatives
**Alternative Considered**: OpenPose
- **Rejected because**: Higher computational requirements, less optimized for single-person detection
- **MediaPipe advantage**: Better integration with web deployment, more stable landmark tracking
**Key Landmarks Used**:
- **Knees** (left/right): Primary measurement points
- **Eyes/Ears**: Scaling reference (anatomical constant)
- **Hips**: Thigh length calculation for anatomical proportions
### 2. Chair Detection Model: YOLOv8n
**Why YOLOv8n?**
- **Speed vs. Accuracy Balance**: Nano version provides sufficient accuracy for chair detection while maintaining fast inference
- **Pre-trained COCO**: Chair class (ID: 56) readily available without custom training
- **Bounding Box Precision**: Accurate enough to constrain segmentation region
- **Memory Efficiency**: Suitable for deployment environments
**Usage Strategy**:
- Extract the chair bounding box and pass it to SAM as the segmentation prompt
- Crop the top 25% of the box height so the region focuses on the seat area and excludes the backrest
- Use the result as the region of interest for the segmentation model
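
The 25% vertical crop can be illustrated as follows. This is a sketch under assumptions: the box is taken to be `(x1, y1, x2, y2)` in pixel coordinates with y increasing downward, and the helper name `crop_top_fraction` is hypothetical rather than taken from the app.

```python
from typing import Tuple

Box = Tuple[float, float, float, float]  # (x1, y1, x2, y2) in pixels


def crop_top_fraction(box: Box, fraction: float = 0.25) -> Box:
    """Drop the top `fraction` of a box's height.

    Image y grows downward, so removing the top means increasing y1.
    """
    if not 0.0 <= fraction < 1.0:
        raise ValueError("fraction must be in [0, 1)")
    x1, y1, x2, y2 = box
    return (x1, y1 + fraction * (y2 - y1), x2, y2)


if __name__ == "__main__":
    chair_box = (100, 200, 300, 600)  # hypothetical detector output
    print(crop_top_fraction(chair_box))
```

For a chair box spanning y = 200 to y = 600, the cropped region starts at y = 300, which removes the upper quarter where the backrest usually sits.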
### 3. Segmentation Model: SAM (Segment Anything Model) ViT-B
**Why SAM?**
SAM can segment an object from a point prompt or a bounding-box prompt, so I used it to mask out the chair and focus on the front of the seat pan.
- **Bounding Box-Based Segmentation**: Can segment objects using bounding box prompts
- **High-Quality Masks**: Superior edge precision compared to traditional segmentation
- **Generalization**: Works on furniture without specific training
- **Multi-Scale Features**: ViT-B provides good balance of accuracy and speed
**Alternative Considered**: Traditional edge detection + contour finding
- **Rejected because**: Poor performance on textured seats, lighting variations, and complex backgrounds
- **SAM advantage**: Semantic understanding of object boundaries
---
## Measurement Methodology
### Knee Position Estimation
**Challenge**: MediaPipe knee landmarks represent joint centers, not the back of the knee (popliteal area) needed for ergonomic measurement.
**Solution**: Anatomical Offset Calculation
```python
# Calculate thigh length for proportional offset
thigh_length_px = euclidean_distance(hip_position, knee_position)
# Back of knee offset: 13% of thigh length behind knee center
back_of_knee_offset = thigh_length_px * 0.13
# Apply directional offset based on facing direction
if facing_direction == "right":
    back_of_knee_x = knee_center_x - back_of_knee_offset
else:
    back_of_knee_x = knee_center_x + back_of_knee_offset
```
**Rationale for 13% Offset**:
- MediaPipe's knee landmark marks the joint center, but the measurement needs the back of the knee
- Based on anthropometric studies of knee anatomy: the back of the knee sits approximately 12-15% of thigh length behind the joint center
- Validated against manual measurements on test images
- Accounts for the distance from knee joint center to posterior knee surface
### Seat Edge Detection
**Multi-Step Process**:
1. **Region Extraction**:
```python
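# Create analysis band around knee level
knee_y = int(average_knee_height)
band_thickness = chair_height // 2
# Clamp the top of the band so a negative index does not wrap around
band_top = max(0, knee_y - band_thickness)
analysis_region = mask[band_top : knee_y + band_thickness, :]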
```
2. **Edge Detection Strategy**:
- Extract chair mask pixels within the analysis band
- Find extreme X-coordinate based on facing direction
- **Right-facing**: Rightmost chair pixel (seat front)
- **Left-facing**: Leftmost chair pixel (seat front)
3. **Validation**:
- Ensure sufficient chair pixels detected in analysis region
- Cross-validate with chair bounding box constraints
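
The steps above can be sketched end to end. This is an illustrative version, not the app's implementation: it uses plain nested lists instead of a NumPy mask so it runs without dependencies, and the function name and the `min_pixels` threshold are assumptions.

```python
from typing import Optional, Sequence


def find_seat_front_x(
    mask: Sequence[Sequence[int]],
    knee_y: int,
    band_thickness: int,
    facing: str,
    min_pixels: int = 5,  # assumed validation threshold
) -> Optional[int]:
    """Return the x-coordinate of the seat front edge, or None.

    `mask` is a binary chair mask indexed as mask[y][x]. Only rows in
    the band [knee_y - band_thickness, knee_y + band_thickness) are
    searched. None is returned when too few chair pixels are found.
    """
    if facing not in ("left", "right"):
        raise ValueError("facing must be 'left' or 'right'")
    top = max(0, knee_y - band_thickness)
    bottom = min(len(mask), knee_y + band_thickness)
    xs = [x for row in mask[top:bottom] for x, value in enumerate(row) if value]
    if len(xs) < min_pixels:
        return None
    # The seat front is the chair pixel farthest in the facing direction.
    return max(xs) if facing == "right" else min(xs)


if __name__ == "__main__":
    demo_mask = [
        [1, 1, 0, 0, 0, 0, 0, 0],  # backrest
        [1, 1, 0, 0, 0, 0, 0, 0],
        [0, 1, 1, 1, 1, 1, 0, 0],  # seat pan
        [0, 1, 1, 1, 1, 1, 0, 0],
        [0, 0, 0, 0, 0, 0, 0, 0],
        [0, 0, 0, 0, 0, 0, 0, 0],
    ]
    print(find_seat_front_x(demo_mask, knee_y=3, band_thickness=1, facing="right"))
```

Restricting the search to the knee-level band keeps backrest pixels from being mistaken for the seat front.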
### Scaling and Real-World Measurements
Once I had the back of the knee and the seat front, I could calculate the distance between them in pixels. This then had to be converted to centimeters to apply the classification criteria.
**Reference-Based Scaling**:
```python
# Use eye-to-ear distance as anatomical constant
eye_to_ear_distance_px = euclidean_distance(eye_landmark, ear_landmark)
eye_to_ear_distance_cm = 7.0 # Average adult measurement
pixels_per_cm = eye_to_ear_distance_px / eye_to_ear_distance_cm
clearance_cm = clearance_pixels / pixels_per_cm
```
**Why Eye-to-Ear Distance?**
- **Anatomical Constant**: Relatively consistent across adults (6.5-7.5 cm)
- **Visibility**: Usually visible in side-profile images
- **Stability**: Less affected by posture compared to other facial measurements
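
As a worked example of the conversion, the sketch below computes a signed clearance in centimeters. The sign convention is an assumption that the README does not spell out: positive means the back of the knee is in front of the seat edge, negative means it is behind it. The function name is hypothetical.

```python
import math
from typing import Tuple

Point = Tuple[float, float]


def signed_clearance_cm(
    back_of_knee_x: float,
    seat_front_x: float,
    facing: str,
    eye: Point,
    ear: Point,
    eye_to_ear_cm: float = 7.0,  # average adult value used as reference
) -> float:
    """Convert the knee-to-seat gap from pixels to centimeters."""
    eye_to_ear_px = math.dist(eye, ear)
    if eye_to_ear_px == 0:
        raise ValueError("eye and ear landmarks coincide; cannot scale")
    pixels_per_cm = eye_to_ear_px / eye_to_ear_cm
    if facing == "right":
        clearance_px = back_of_knee_x - seat_front_x
    elif facing == "left":
        clearance_px = seat_front_x - back_of_knee_x
    else:
        raise ValueError("facing must be 'left' or 'right'")
    return clearance_px / pixels_per_cm


if __name__ == "__main__":
    # Eye and ear 70 px apart gives 10 px per cm; a 40 px gap is 4 cm.
    print(signed_clearance_cm(540, 500, "right", eye=(400, 100), ear=(330, 100)))
```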
### Facing Direction Detection
Determines whether the person faces left or right in the image.
**Method**: Compare the average X-coordinates of the knees and the eyes:
- If knees are right of eyes: facing right
- If knees are left of eyes: facing left
This affects:
1. Which knee/eye/ear to use for measurements
2. Direction of anatomical offsets
3. Seat edge detection logic
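
A minimal sketch of this comparison follows. The function name is hypothetical, and returning "left" when the two averages are equal is an assumption, since the rule does not cover ties.

```python
from statistics import fmean
from typing import Sequence


def detect_facing(knee_xs: Sequence[float], eye_xs: Sequence[float]) -> str:
    """Return 'right' if the knees lie to the right of the eyes, else 'left'.

    Inputs are the x-coordinates of the visible knee and eye landmarks.
    Assumption: a tie is reported as 'left'.
    """
    return "right" if fmean(knee_xs) > fmean(eye_xs) else "left"


if __name__ == "__main__":
    print(detect_facing(knee_xs=[320, 330], eye_xs=[200, 210]))
```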
---
## Challenges in Spacing Detection
### 1. Pose Detection Challenges
**Challenge**: Partial Occlusion
- **Problem**: Knees/hips may be obscured by desk, clothing, or shadows
- **Solution**: Visibility scoring and confidence thresholds
- **Mitigation**: Multi-landmark validation, graceful degradation
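
One way to apply a visibility threshold is sketched below. MediaPipe landmarks carry a `visibility` score. Here they are represented as plain `(x, y, visibility)` tuples keyed by name. The 0.5 cutoff, the landmark names, and the helper name are all assumptions for illustration.

```python
from typing import Dict, Iterable, List, Tuple

Landmark = Tuple[float, float, float]  # (x, y, visibility)


def missing_landmarks(
    landmarks: Dict[str, Landmark],
    required: Iterable[str],
    min_visibility: float = 0.5,  # assumed cutoff
) -> List[str]:
    """List required landmarks that are absent or below the cutoff."""
    missing = []
    for name in required:
        landmark = landmarks.get(name)
        if landmark is None or landmark[2] < min_visibility:
            missing.append(name)
    return missing


if __name__ == "__main__":
    detected = {
        "right_knee": (310.0, 420.0, 0.92),
        "right_hip": (180.0, 400.0, 0.35),  # occluded by a desk
        "right_eye": (200.0, 120.0, 0.88),
    }
    print(missing_landmarks(detected, ["right_knee", "right_hip", "right_ear"]))
```

An empty result means the measurement can proceed. A non-empty result can be reported to the user instead of returning an unreliable clearance.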
**Challenge**: Clothing Variations
- **Problem**: Baggy pants obscure actual knee position
- **Solution**: Anatomical offset based on skeletal landmarks rather than clothing contours
- **Limitation**: Still estimates through clothing, may introduce small errors
### 2. Chair Segmentation Challenges
**Challenge**: Complex Seat Materials
- **Problem**: Mesh, leather, fabric textures confuse edge detection
- **Solution**: SAM's semantic understanding handles material variations
- **Remaining Issue**: Highly reflective or transparent materials
**Challenge**: Partial Chair Visibility
- **Problem**: Desk, person's body may occlude seat edges
- **Solution**: Focus analysis on knee-level band where seat is most likely visible
- **Limitation**: Deep occlusion may cause detection failure
### 3. Scaling and Measurement Challenges
**Challenge**: Camera Perspective Distortion
- **Problem**: Non-perpendicular camera angles affect measurements
- **Solution**: Assume reasonable side-profile positioning
- **Limitation**: Extreme angles (>30°) may introduce errors
**Challenge**: Depth Perception in 2D Images
- **Problem**: Cannot measure true 3D distances
- **Solution**: Project measurements onto image plane
- **Assumption**: Person and chair are roughly in the same plane
### 4. Lighting and Image Quality
**Challenge**: Poor Lighting Conditions
- **Problem**: Shadows, backlighting affect landmark detection
- **Solution**: MediaPipe's robustness to lighting variations
- **Enhancement**: Preprocessing could include histogram equalization
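
To show what histogram equalization does, here is a dependency-free sketch on a small grayscale image stored as nested lists. It is not part of the current pipeline. In practice OpenCV's `cv2.equalizeHist` performs the same kind of remapping on an 8-bit single-channel image.

```python
from typing import List


def equalize_histogram(gray: List[List[int]]) -> List[List[int]]:
    """Spread 8-bit intensities across 0..255 using the image's CDF."""
    hist = [0] * 256
    for row in gray:
        for value in row:
            hist[value] += 1
    total = sum(hist)
    # Cumulative distribution of pixel intensities.
    cdf, running = [], 0
    for count in hist:
        running += count
        cdf.append(running)
    cdf_min = next((c for c in cdf if c > 0), 0)
    if total == cdf_min:  # empty or single-valued image: nothing to stretch
        return [row[:] for row in gray]
    lookup = [
        max(0, round((c - cdf_min) / (total - cdf_min) * 255)) for c in cdf
    ]
    return [[lookup[value] for value in row] for row in gray]


if __name__ == "__main__":
    dim_image = [[50, 50, 100], [150, 150, 200]]
    print(equalize_histogram(dim_image))
```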
---
## Accuracy Improvement Suggestions
### Short-Term Improvements
1. **Enhanced Preprocessing**
- Improve image contrast before detection, for example with histogram equalization
2. **Multi-Reference Scaling**
- Combine eye-to-ear with other facial measurements
- Use hand/finger dimensions when visible
- Cross-validate scaling factors
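
Multi-reference scaling could be cross-validated along these lines. This is a sketch of one possible approach, not an implemented feature: each reference yields its own pixels-per-cm estimate, values far from the median are discarded, and the rest are averaged. The 15% tolerance and the function name are assumptions.

```python
from statistics import median
from typing import Sequence, Tuple


def combine_scale_estimates(
    estimates: Sequence[float],
    max_relative_spread: float = 0.15,  # assumed agreement tolerance
) -> Tuple[float, bool]:
    """Merge pixels-per-cm estimates from independent references.

    Returns (scale, all_agree). Estimates deviating from the median by
    more than `max_relative_spread` are excluded from the average.
    """
    valid = [e for e in estimates if e > 0]
    if not valid:
        raise ValueError("no positive scale estimates provided")
    center = median(valid)
    agreeing = [
        e for e in valid if abs(e - center) / center <= max_relative_spread
    ]
    scale = sum(agreeing) / len(agreeing)
    return scale, len(agreeing) == len(valid)


if __name__ == "__main__":
    # Hypothetical estimates from eye-to-ear, hand width, and finger length.
    print(combine_scale_estimates([10.0, 10.2, 14.0]))
```

A `False` flag signals that at least one reference disagreed, which could be used to lower the reported confidence.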
### Medium-Term Enhancements
1. **Custom Training Data**
- Collect ergonomic seating dataset with ground truth measurements
- Fine-tune pose estimation on seated postures
- Train a specialized chair segmentation model
2. **Multi-Frame Analysis**
- Process video streams and average measurements across multiple frames
3. **3D Pose Estimation**
- Integrate depth estimation models
- Calculate true 3D clearances
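
The multi-frame idea can be sketched as a simple aggregation over per-frame results. This assumes each frame yields either a clearance in centimeters or `None` when detection failed. The function name is hypothetical.

```python
from statistics import fmean
from typing import Optional, Sequence


def average_clearance_cm(
    per_frame_cm: Sequence[Optional[float]],
) -> Optional[float]:
    """Average clearance across frames, skipping failed detections.

    Returns None when no frame produced a measurement.
    """
    values = [v for v in per_frame_cm if v is not None]
    if not values:
        return None
    return fmean(values)


if __name__ == "__main__":
    print(average_clearance_cm([4.0, None, 5.0, 3.0]))
```

A median could be substituted for the mean if occasional frames produce outlier measurements.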
### Long-Term Research Directions
**Multi-Modal Sensing**
- Combine computer vision with pressure sensors
- Integrate with smart chair systems
- Real-time posture monitoring
---
## Development Process and Design Decisions
### Iterative Development Approach
1. **Phase 1: Core Detection**
- Implemented basic pose detection
- Added simple chair detection
- Established measurement pipeline
2. **Phase 2: Accuracy Enhancement**
- Integrated SAM for precise segmentation
- Added anatomical offset calculations
- Implemented multi-scale analysis
3. **Phase 3: User Experience**
- Built Streamlit interface
- Added visualization pipeline
- Implemented sample image system
4. **Phase 4: Robustness**
- Enhanced error handling
- Added confidence scoring
- Implemented comprehensive testing
### Key Design Decisions
**Decision 1: Multi-Model vs. Single Model**
- **Chosen**: Multi-model pipeline
- **Rationale**: Each model excels in its domain (pose, detection, segmentation)
- **Trade-off**: Complexity vs. accuracy
**Decision 2: Real-time vs. Batch Processing**
- **Chosen**: Single image analysis
- **Rationale**: Simplicity, easier deployment
- **Future**: Could extend to video streams
**Decision 3: Cloud vs. Local Processing**
- **Chosen**: Local processing capability
- **Rationale**: Privacy, offline usage
- **Deployment**: Supports both local and cloud deployment
### Assumptions and Limitations
**Key Assumptions**:
1. **Side Profile View**: Person is photographed from the side
2. **Seated Posture**: Back is against or near chair backrest
3. **Standard Chair**: Conventional office chair design
4. **Adult Subjects**: Eye-to-ear scaling appropriate for adults
5. **Static Analysis**: Single-moment analysis, not dynamic posture
**Known Limitations**:
1. **2D Analysis**: Cannot account for chair/body rotation out of image plane
2. **Clothing Effects**: Thick clothing may obscure true body landmarks
3. **Lighting Dependency**: Very poor lighting may affect landmark detection
4. **Chair Variety**: Unusual chair designs may confuse detection
5. **Anthropometric Variation**: Fixed scaling may not suit all body types
---
## Validation and Testing Strategy
### Test Coverage
1. **Unit Tests**: Individual component testing
2. **Integration Tests**: End-to-end pipeline validation
3. **Accuracy Tests**: Ground truth comparison on sample images
4. **Edge Case Tests**: Handling of failure conditions
5. **Performance Tests**: Processing time benchmarking
### Sample Dataset
- **Optimal Cases (3 samples)**: Clear examples of proper seating
- **Too Deep Cases (4 samples)**: Various levels of excessive depth
- **Too Short Cases (8 samples)**: Range of insufficient depth scenarios
---
### Technical References
1. **MediaPipe Pose**: [Google Research Paper](https://arxiv.org/abs/2006.10204)
2. **SAM (Segment Anything)**: [Meta AI Research](https://arxiv.org/abs/2304.02643)
3. **YOLOv8**: [Ultralytics Documentation](https://docs.ultralytics.com/)
### Dataset and Tools
- **Sample Images**: Custom collected and validated
- **Development Environment**: Python 3.9, PyTorch, OpenCV
- **Deployment Platform**: Streamlit Cloud
### Anthropometric Data Sources
- **Eye-to-Ear Measurements**: "An anthropometric study to evaluate the correlation between the occlusal vertical dimension and length of the thumb", *Clinical, Cosmetic and Investigational Dentistry*