---
title: Seat Depth Analyzer
emoji: 🪑
colorFrom: blue
colorTo: green
sdk: streamlit
sdk_version: 1.28.0
app_file: app.py
pinned: false
license: mit
---
# Seat Depth Analyzer - Technical Documentation

An AI-powered computer vision application that analyzes ergonomic seating conditions from side-profile images and classifies seat pan depth as Optimal, Too Deep, or Too Short.

## Quick Start

### 1. Install Dependencies

```bash
pip install streamlit opencv-python numpy torch torchvision segment-anything ultralytics mediapipe pillow
```

### 2. Run Application

```bash
streamlit run app.py
```

Note: The SAM model checkpoint (`sam_vit_b_01ec64.pth`) is included in the submission.

### 3. Open in Browser

Navigate to: http://localhost:8501

### 4. Test the App

- Upload a side-profile image of someone seated, or
- Try the included sample images
- Click "Analyze Seat Depth"

## Project Overview

The Seat Depth Analyzer is an AI-powered computer vision application that analyzes ergonomic seating conditions from side-profile images. It classifies seat pan depth as **Optimal**, **Too Deep**, or **Too Short** based on the clearance between the seat front edge and the back of the user's knee.

### Ergonomic Classification Criteria

- **Optimal**: 2-6 cm clearance (proper thigh support without circulation issues)
- **Too Deep**: <2 cm clearance or knee behind seat edge (circulation risk)
- **Too Short**: >6 cm clearance (insufficient thigh support)
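
The criteria above map onto a small decision function. The following is a minimal sketch rather than the app's actual code: the name `classify_seat_depth` and the convention that a negative clearance means the knee sits behind the seat edge are assumptions made for illustration.

```python
def classify_seat_depth(clearance_cm: float) -> str:
    """Classify seat pan depth from knee-to-seat-edge clearance in centimeters.

    Assumption: a negative clearance means the back of the knee is behind
    (overlapping) the seat front edge.
    """
    if clearance_cm < 2.0:
        # Under 2 cm, including a knee behind the seat edge: circulation risk
        return "Too Deep"
    if clearance_cm <= 6.0:
        # 2-6 cm inclusive: thigh is supported without pressure behind the knee
        return "Optimal"
    # Over 6 cm: insufficient thigh support
    return "Too Short"


for value in (-1.0, 1.5, 2.0, 4.0, 6.0, 7.5):
    print(f"{value:5.1f} cm -> {classify_seat_depth(value)}")
```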

---

## Technical Architecture

### Multi-Model Pipeline

The solution uses a multi-model pipeline that combines three pretrained computer vision models:

```
Input Image → Pose Detection → Chair Detection → Seat Segmentation → Measurement → Classification → Output
                     ↓                ↓                  ↓                ↓               ↓
                 MediaPipe         YOLOv8n          SAM (ViT-B)      CV Analysis      Ergonomic
                   Pose            (Chair)         Segmentation       & Scaling         Rules
```

## Model Selection and Rationale

### 1. Pose Estimation Model Choice: MediaPipe Pose

**Why MediaPipe Pose?**
- **High Accuracy**: Proven performance on diverse body poses and lighting conditions
- **Landmark Precision**: Provides 33 precise body landmarks including knees, hips, eyes, and ears
- **Visibility Scoring**: Each landmark includes visibility confidence, crucial for side-profile analysis
- **Computational Efficiency**: Real-time performance suitable for web applications
- **Robustness**: Handles partial occlusion and varied clothing better than alternatives

**Alternative Considered**: OpenPose
- **Rejected because**: Higher computational requirements, less optimized for single-person detection
- **MediaPipe advantage**: Better integration with web deployment, more stable landmark tracking

**Key Landmarks Used**:
- **Knees** (left/right): Primary measurement points
- **Eyes/Ears**: Scaling reference (anatomical constant)
- **Hips**: Thigh length calculation for anatomical proportions
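
As a rough illustration of how the landmarks listed above might be pulled out of a pose result, the sketch below converts normalized landmarks to pixel coordinates and drops low-visibility ones. The index values follow MediaPipe's 33-landmark pose model. The `Landmark` tuple, the `select_landmarks` helper, and the 0.5 visibility threshold are assumptions for illustration, not values taken from the app.

```python
from typing import Dict, NamedTuple, Optional, Sequence, Tuple


class Landmark(NamedTuple):
    """Stand-in for a MediaPipe landmark: normalized x/y plus visibility."""
    x: float
    y: float
    visibility: float


# Indices in MediaPipe's 33-landmark pose model
POSE_INDEX = {
    "left_eye": 2, "right_eye": 5,
    "left_ear": 7, "right_ear": 8,
    "left_hip": 23, "right_hip": 24,
    "left_knee": 25, "right_knee": 26,
}


def select_landmarks(
    landmarks: Sequence[Landmark],
    image_width: int,
    image_height: int,
    min_visibility: float = 0.5,  # assumed threshold, not taken from the app
) -> Dict[str, Optional[Tuple[float, float]]]:
    """Return pixel coordinates per named landmark, or None if not visible enough."""
    selected: Dict[str, Optional[Tuple[float, float]]] = {}
    for name, index in POSE_INDEX.items():
        lm = landmarks[index]
        if lm.visibility >= min_visibility:
            # Landmarks are normalized to [0, 1], so scale by the image size
            selected[name] = (lm.x * image_width, lm.y * image_height)
        else:
            selected[name] = None
    return selected


# Synthetic data: every landmark at the image center, right knee poorly visible
fake = [Landmark(0.5, 0.5, 0.9)] * 33
fake[POSE_INDEX["right_knee"]] = Landmark(0.5, 0.5, 0.1)
points = select_landmarks(fake, image_width=640, image_height=480)
print(points["left_knee"], points["right_knee"])  # (320.0, 240.0) None
```

With a real MediaPipe result, the landmark list returned by the pose solution could presumably be passed in place of the synthetic list, since each entry exposes `x`, `y`, and `visibility`.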

### 2. Chair Detection Model: YOLOv8n

**Why YOLOv8n?**
- **Speed vs. Accuracy Balance**: Nano version provides sufficient accuracy for chair detection while maintaining fast inference
- **Pre-trained COCO**: Chair class (ID: 56) readily available without custom training
- **Bounding Box Precision**: Accurate enough to constrain segmentation region
- **Memory Efficiency**: Suitable for deployment environments

**Usage Strategy**:
- Extract the chair bounding box, which is then passed to Meta's SAM model
- Apply a 25% vertical crop from the top of the box (focuses on the seat area, excludes the backrest)
- Use the result as the region of interest for the segmentation model
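
The 25% vertical crop can be sketched as a pure function on the bounding box. This is an illustrative sketch under assumptions: the `(x_min, y_min, x_max, y_max)` pixel layout and the helper name `crop_top_of_box` are not taken from the app.

```python
from typing import Tuple

Box = Tuple[int, int, int, int]  # (x_min, y_min, x_max, y_max) in pixels


def crop_top_of_box(box: Box, top_fraction: float = 0.25) -> Box:
    """Drop the top `top_fraction` of a bounding box, keeping the lower part."""
    if not 0.0 <= top_fraction < 1.0:
        raise ValueError("top_fraction must be in [0, 1)")
    x_min, y_min, x_max, y_max = box
    height = y_max - y_min
    # Image y grows downward, so removing the top means increasing y_min
    new_y_min = y_min + int(round(height * top_fraction))
    return (x_min, new_y_min, x_max, y_max)


chair_box = (100, 200, 400, 600)   # hypothetical chair detection
print(crop_top_of_box(chair_box))  # (100, 300, 400, 600)
```

With Ultralytics, the chair box itself would typically come from the detections whose class ID is 56, read from the `boxes.xyxy` and `boxes.cls` fields of a result.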

### 3. Segmentation Model: SAM (Segment Anything Model) ViT-B

**Why SAM?**

SAM can segment an object from point prompts, bounding-box prompts, or other prompt types. I used it to mask out the chair so that the analysis could focus on the front of the seat pan.

- **Bounding Box-Based Segmentation**: Can segment objects using bounding box prompts
- **High-Quality Masks**: Superior edge precision compared to traditional segmentation
- **Generalization**: Works on furniture without specific training
- **Multi-Scale Features**: ViT-B provides good balance of accuracy and speed

**Alternative Considered**: Traditional edge detection + contour finding
- **Rejected because**: Poor performance on textured seats, lighting variations, and complex backgrounds
- **SAM advantage**: Semantic understanding of object boundaries

---

## Measurement Methodology

### Knee Position Estimation

**Challenge**: MediaPipe knee landmarks represent joint centers, not the back of the knee (popliteal area) needed for ergonomic measurement.

**Solution**: Anatomical Offset Calculation

```python
import math

def estimate_back_of_knee_x(hip_position, knee_position, facing_direction):
    """Estimate the x-coordinate (pixels) of the back of the knee."""
    # Calculate thigh length for proportional offset
    thigh_length_px = math.dist(hip_position, knee_position)
    # Back of knee offset: 13% of thigh length behind knee center
    back_of_knee_offset = thigh_length_px * 0.13
    knee_center_x = knee_position[0]
    # Apply directional offset: "behind" is opposite to the facing direction
    if facing_direction == "right":
        return knee_center_x - back_of_knee_offset
    return knee_center_x + back_of_knee_offset
```

**Rationale for 13% Offset**:
- The measurement needs the back of the knee, not the knee joint center that the MediaPipe landmark provides
- Based on anthropometric studies of knee anatomy, the back of the knee lies approximately 12-15% of thigh length behind the joint center
- Validated against manual measurements on test images
- Accounts for the distance from knee joint center to posterior knee surface

### Seat Edge Detection

**Multi-Step Process**:

1. **Region Extraction**:

```python
# Create analysis band around knee level, clamped to the image bounds
# (a negative slice start would otherwise wrap around to the bottom of the mask)
knee_y = int(round(average_knee_height))
band_thickness = chair_height // 2
band_top = max(0, knee_y - band_thickness)
band_bottom = min(mask.shape[0], knee_y + band_thickness)
analysis_region = mask[band_top:band_bottom, :]
```

2. **Edge Detection Strategy**:
   - Extract chair mask pixels within the analysis band
   - Find extreme X-coordinate based on facing direction
     - **Right-facing**: Rightmost chair pixel (seat front)
     - **Left-facing**: Leftmost chair pixel (seat front)
3. **Validation**:
   - Ensure sufficient chair pixels detected in analysis region
   - Cross-validate with chair bounding box constraints
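
The edge detection and validation steps above can be sketched as follows. To keep the example dependency-free, plain nested lists stand in for the NumPy mask. The function name `find_seat_front_x` and the `min_pixels` threshold are assumptions for illustration, and the bounding-box cross-check is left out.

```python
from typing import Optional, Sequence


def find_seat_front_x(
    band: Sequence[Sequence[int]],
    facing_direction: str,
    min_pixels: int = 5,  # assumed minimum for a trustworthy detection
) -> Optional[int]:
    """Return the column of the seat front edge inside the analysis band.

    `band` is a 2D mask (rows of 0/1 values) already cut to the knee-level band.
    Returns None when too few chair pixels are present.
    """
    # Column index of every chair pixel in the band
    columns = [x for row in band for x, value in enumerate(row) if value]
    if len(columns) < min_pixels:
        return None
    # The seat front is the extreme column in the direction the person faces
    return max(columns) if facing_direction == "right" else min(columns)


band = [
    [0, 0, 1, 1, 1, 1, 0, 0],
    [0, 0, 1, 1, 1, 1, 1, 0],
    [0, 0, 1, 1, 1, 0, 0, 0],
]
print(find_seat_front_x(band, "right"))  # 6
print(find_seat_front_x(band, "left"))   # 2
```

On a real NumPy mask, the same column indices could be obtained in one call with `np.nonzero`.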

### Scaling and Real-World Measurements

With the back of the knee and the seat front located, I could calculate the distance between them in pixels. This distance then had to be converted to centimeters, as required by the problem statement.

**Reference-Based Scaling**:

```python
import math

EYE_TO_EAR_DISTANCE_CM = 7.0  # Average adult measurement

def pixels_to_cm(clearance_pixels, eye_landmark, ear_landmark):
    """Convert a pixel distance to centimeters using the eye-to-ear reference."""
    # Use eye-to-ear distance as anatomical constant
    eye_to_ear_distance_px = math.dist(eye_landmark, ear_landmark)
    pixels_per_cm = eye_to_ear_distance_px / EYE_TO_EAR_DISTANCE_CM
    return clearance_pixels / pixels_per_cm
```

**Why Eye-to-Ear Distance?**
- **Anatomical Constant**: Relatively consistent across adults (6.5-7.5 cm)
- **Visibility**: Usually visible in side-profile images
- **Stability**: Less affected by posture compared to other facial measurements
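
A short worked example may make the scaling arithmetic concrete. All coordinates below are hypothetical and chosen only so the numbers come out round. The sign convention (clearance is positive when the back of the knee lies beyond the seat front for a right-facing person) is likewise an assumption.

```python
import math

EYE_TO_EAR_CM = 7.0  # average adult value used as the scale reference

# Hypothetical pixel coordinates for a right-facing person
eye = (210.0, 120.0)
ear = (154.0, 120.0)
back_of_knee_x = 300.0
seat_front_x = 268.0

pixels_per_cm = math.dist(eye, ear) / EYE_TO_EAR_CM  # 56 px / 7 cm = 8 px/cm
clearance_px = back_of_knee_x - seat_front_x         # 32 px
clearance_cm = clearance_px / pixels_per_cm          # 32 / 8 = 4 cm

print(f"{pixels_per_cm:.1f} px/cm, clearance {clearance_cm:.1f} cm")
```

A clearance of 4 cm falls inside the 2-6 cm range, so this hypothetical case would be classified as Optimal.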

### Facing Direction Detection

Determines whether the person faces left or right in the image.

Method: Compare the average X-coordinates of the knees and the eyes
- If knees are right of eyes: facing right
- If knees are left of eyes: facing left

This affects:
1. Which knee/eye/ear to use for measurements
2. Direction of anatomical offsets
3. Seat edge detection logic
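
The comparison described above fits in a few lines. This is a minimal sketch: the function name `detect_facing_direction` is hypothetical, and treating a tie as "left" is an arbitrary choice made here, not something the source specifies.

```python
from statistics import mean
from typing import Sequence


def detect_facing_direction(knee_xs: Sequence[float], eye_xs: Sequence[float]) -> str:
    """Return "right" if the knees are right of the eyes in the image, else "left"."""
    # A seated person's knees extend forward, so they lie on the facing side
    return "right" if mean(knee_xs) > mean(eye_xs) else "left"


print(detect_facing_direction(knee_xs=[420.0, 430.0], eye_xs=[250.0, 255.0]))  # right
print(detect_facing_direction(knee_xs=[120.0, 130.0], eye_xs=[300.0, 305.0]))  # left
```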

---

## Challenges in Spacing Detection

### 1. Pose Detection Challenges

**Challenge**: Partial Occlusion
- **Problem**: Knees/hips may be obscured by desk, clothing, or shadows
- **Solution**: Visibility scoring and confidence thresholds
- **Mitigation**: Multi-landmark validation, graceful degradation

**Challenge**: Clothing Variations
- **Problem**: Baggy pants obscure actual knee position
- **Solution**: Anatomical offset based on skeletal landmarks rather than clothing contours
- **Limitation**: Still estimates through clothing, may introduce small errors

### 2. Chair Segmentation Challenges

**Challenge**: Complex Seat Materials
- **Problem**: Mesh, leather, fabric textures confuse edge detection
- **Solution**: SAM's semantic understanding handles material variations
- **Remaining Issue**: Highly reflective or transparent materials

**Challenge**: Partial Chair Visibility
- **Problem**: Desk, person's body may occlude seat edges
- **Solution**: Focus analysis on knee-level band where seat is most likely visible
- **Limitation**: Deep occlusion may cause detection failure

### 3. Scaling and Measurement Challenges

**Challenge**: Camera Perspective Distortion
- **Problem**: Non-perpendicular camera angles affect measurements
- **Solution**: Assume reasonable side-profile positioning
- **Limitation**: Extreme angles (>30°) may introduce errors

**Challenge**: Depth Perception in 2D Images
- **Problem**: Cannot measure true 3D distances
- **Solution**: Project measurements onto image plane
- **Assumption**: Person and chair are roughly in the same plane

### 4. Lighting and Image Quality

**Challenge**: Poor Lighting Conditions
- **Problem**: Shadows, backlighting affect landmark detection
- **Solution**: MediaPipe's robustness to lighting variations
- **Enhancement**: Preprocessing could include histogram equalization

---

## Accuracy Improvement Suggestions

### Short-Term Improvements

1. **Enhanced Preprocessing**
   - Improve contrast with methods such as histogram equalization
2. **Multi-Reference Scaling**
   - Combine eye-to-ear with other facial measurements
   - Use hand/finger dimensions when visible
   - Cross-validate scaling factors

### Medium-Term Enhancements

1. **Custom Training Data**
   - Collect an ergonomic seating dataset with ground truth measurements
   - Fine-tune pose estimation on seated postures
   - Train a specialized chair segmentation model
2. **Multi-Frame Analysis**
   - Process video streams and average measurements across multiple frames
3. **3D Pose Estimation**
   - Integrate depth estimation models
   - Calculate true 3D clearances
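
The multi-frame idea listed above could be sketched as a simple aggregation over per-frame clearance estimates. The helper name `aggregate_clearance` is hypothetical. Note one deliberate swap: the sketch uses the median rather than a plain average, so that a single badly detected frame does not skew the result.

```python
from statistics import median
from typing import Iterable, Optional


def aggregate_clearance(per_frame_cm: Iterable[Optional[float]]) -> Optional[float]:
    """Combine per-frame clearance estimates, skipping frames where detection failed."""
    valid = [value for value in per_frame_cm if value is not None]
    if not valid:
        return None
    # Median instead of mean (an assumption): robust to a single outlier frame
    return median(valid)


frames = [4.1, 3.9, None, 4.3, 12.0, 4.0]  # hypothetical measurements, one outlier
print(aggregate_clearance(frames))  # 4.1
```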

### Long-Term Research Directions

**Multi-Modal Sensing**
- Combine computer vision with pressure sensors
- Integrate with smart chair systems
- Real-time posture monitoring

---

## Development Process and Design Decisions

### Iterative Development Approach

1. **Phase 1: Core Detection**
   - Implemented basic pose detection
   - Added simple chair detection
   - Established measurement pipeline
2. **Phase 2: Accuracy Enhancement**
   - Integrated SAM for precise segmentation
   - Added anatomical offset calculations
   - Implemented multi-scale analysis
3. **Phase 3: User Experience**
   - Built Streamlit interface
   - Added visualization pipeline
   - Implemented sample image system
4. **Phase 4: Robustness**
   - Enhanced error handling
   - Added confidence scoring
   - Implemented comprehensive testing

### Key Design Decisions

**Decision 1: Multi-Model vs. Single Model**
- **Chosen**: Multi-model pipeline
- **Rationale**: Each model excels in its domain (pose, detection, segmentation)
- **Trade-off**: Complexity vs. accuracy

**Decision 2: Real-time vs. Batch Processing**
- **Chosen**: Single image analysis
- **Rationale**: Simplicity, easier deployment
- **Future**: Could extend to video streams

**Decision 3: Cloud vs. Local Processing**
- **Chosen**: Local processing capability
- **Rationale**: Privacy, offline usage
- **Deployment**: Supports both local and cloud deployment

### Assumptions and Limitations

**Key Assumptions**:
1. **Side Profile View**: Person is photographed from the side
2. **Seated Posture**: Back is against or near chair backrest
3. **Standard Chair**: Conventional office chair design
4. **Adult Subjects**: Eye-to-ear scaling appropriate for adults
5. **Static Analysis**: Single-moment analysis, not dynamic posture

**Known Limitations**:
1. **2D Analysis**: Cannot account for chair/body rotation out of image plane
2. **Clothing Effects**: Thick clothing may obscure true body landmarks
3. **Lighting Dependency**: Very poor lighting may affect landmark detection
4. **Chair Variety**: Unusual chair designs may confuse detection
5. **Anthropometric Variation**: Fixed scaling may not suit all body types

---

## Validation and Testing Strategy

### Test Coverage

1. **Unit Tests**: Individual component testing
2. **Integration Tests**: End-to-end pipeline validation
3. **Accuracy Tests**: Ground truth comparison on sample images
4. **Edge Case Tests**: Handling of failure conditions
5. **Performance Tests**: Processing time benchmarking

### Sample Dataset

- **Optimal Cases (3 samples)**: Clear examples of proper seating
- **Too Deep Cases (4 samples)**: Various levels of excessive depth
- **Too Short Cases (8 samples)**: Range of insufficient depth scenarios

---

### Technical References

1. **MediaPipe Pose**: [Google Research Paper](https://arxiv.org/abs/2006.10204)
2. **SAM (Segment Anything)**: [Meta AI Research](https://arxiv.org/abs/2304.02643)
3. **YOLOv8**: [Ultralytics Documentation](https://docs.ultralytics.com/)

### Dataset and Tools

- **Sample Images**: Custom collected and validated
- **Development Environment**: Python 3.9, PyTorch, OpenCV
- **Deployment Platform**: Streamlit Cloud

### Anthropometric Data Sources

- **Eye-to-Ear Measurements**: Reference paper: "An anthropometric study to evaluate the correlation between the occlusal vertical dimension and length of the thumb", Clinical, Cosmetic and Investigational Dentistry