Spaces:

nitikaborkar
/

seat-depth-analyser

Sleeping

App Files Files Community

seat-depth-analyser / README.md

nitikaborkar

Update README.md

5bd71ef verified 28 days ago

preview code

raw

history blame contribute delete

13.7 kB


	---
	title: Seat Depth Analyzer
	emoji: 🪑
	colorFrom: blue
	colorTo: green
	sdk: streamlit
	sdk_version: 1.28.0
	app_file: app.py
	pinned: false
	license: mit
	---


	# Seat Depth Analyzer - Technical Documentation


	Seat Depth Analyzer
	An AI-powered computer vision application that analyzes ergonomic seating conditions from side-profile images and classifies seat pan depth as Optimal, Too Deep, or Too Short.

	🚀 Quick Start

	1. Install Dependencies
	```python
	bashpip install streamlit opencv-python numpy torch torchvision segment-anything ultralytics mediapipe pillow
	```

	2. Run Application
	```python
	bashstreamlit run app.py
	```
	Note: SAM model (sam_vit_b_01ec64.pth) is included in the submission

	3. Open in Browser
	Navigate to: http://localhost:8501

	4. Test the App

	Upload a side-profile image of someone seated, or
	Try the included sample images
	Click "🔍 Analyze Seat Depth"

	## Project Overview

	The Seat Depth Analyzer is an AI-powered computer vision application that analyzes ergonomic seating conditions from side-profile images. It classifies seat pan depth as Optimal, Too Deep, or Too Short based on the clearance between the seat front edge and the back of the user's knee.

	### Ergonomic Classification Criteria
	- Optimal: 2-6 cm clearance (proper thigh support without circulation issues)
	- Too Deep: <2 cm clearance or knee behind seat edge (circulation risk)
	- Too Short: >6 cm clearance (insufficient thigh support)

	---

	## Technical Architecture

	### Multi-Model Pipeline
	The solution employs a sophisticated multi-model approach combining three state-of-the-art computer vision models:

	```
	Input Image → Pose Detection → Chair Detection → Seat Segmentation → Measurement → Classification → Output
	↓ ↓ ↓ ↓ ↓
	MediaPipe YOLOv8n SAM (ViT-B) CV Analysis Ergonomic
	Pose (Chair) Segmentation & Scaling Rules
	```


	## Model Selection and Rationale

	### 1. Pose Estimation Model Choice: MediaPipe Pose

	Why MediaPipe Pose?
	- High Accuracy: Proven performance on diverse body poses and lighting conditions
	- Landmark Precision: Provides 33 precise body landmarks including knees, hips, eyes, and ears
	- Visibility Scoring: Each landmark includes visibility confidence, crucial for side-profile analysis
	- Computational Efficiency: Real-time performance suitable for web applications
	- Robustness: Handles partial occlusion and varied clothing better than alternatives

	Alternative Considered: OpenPose
	- Rejected because: Higher computational requirements, less optimized for single-person detection
	- MediaPipe advantage: Better integration with web deployment, more stable landmark tracking

	Key Landmarks Used:
	- Knees (left/right): Primary measurement points
	- Eyes/Ears: Scaling reference (anatomical constant)
	- Hips: Thigh length calculation for anatomical proportions

	### 2. Chair Detection Model: YOLOv8n

	Why YOLOv8n?
	- Speed vs. Accuracy Balance: Nano version provides sufficient accuracy for chair detection while maintaining fast inference
	- Pre-trained COCO: Chair class (ID: 56) readily available without custom training
	- Bounding Box Precision: Accurate enough to constrain segmentation region
	- Memory Efficiency: Suitable for deployment environments

	Usage Strategy:
	- Extract chair bounding box (which was then sent to SAM Meta Model)
	- This was also used to Apply 25% vertical crop from top (focuses on seat area, excludes backrest)
	- Use as region-of-interest for segmentation model

	### 3. Segmentation Model: SAM (Segment Anything Model) ViT-B

	Why SAM?
	SAM has point based or bounding-box based or even prompt based segmentation ability
	So I used it to mask out the chair from the image in order to be able to better focus on the seat pan front

	- Bounding Box-Based Segmentation: Can segment objects using bounding box prompts
	- High-Quality Masks: Superior edge precision compared to traditional segmentation
	- Generalization: Works on furniture without specific training
	- Multi-Scale Features: ViT-B provides good balance of accuracy and speed

	Alternative Considered: Traditional edge detection + contour finding
	- Rejected because: Poor performance on textured seats, lighting variations, and complex backgrounds
	- SAM advantage: Semantic understanding of object boundaries

	---

	## Measurement Methodology

	### Knee Position Estimation

	Challenge: MediaPipe knee landmarks represent joint centers, not the back of the knee (popliteal area) needed for ergonomic measurement.

	Solution: Anatomical Offset Calculation
	```python
	# Calculate thigh length for proportional offset
	thigh_length_px = euclidean_distance(hip_position, knee_position)

	# Back of knee offset: 13% of thigh length behind knee center
	back_of_knee_offset = thigh_length_px * 0.13

	# Apply directional offset based on facing direction
	if facing_direction == "right":
	back_of_knee_x = knee_center_x - back_of_knee_offset
	else:
	back_of_knee_x = knee_center_x + back_of_knee_offset
	```

	Rationale for 13% Offset:
	- Since we need the back of the knee and not the knee (which MediaPipe landmark gives us )
	- Based on anthropometric studies of knee anatomy - the back of the thigh would be approximately 12-15% offset from the knee
	- Validated against manual measurements on test images
	- Accounts for the distance from knee joint center to posterior knee surface

	### Seat Edge Detection

	Multi-Step Process:

	1. Region Extraction:
	```python
	# Create analysis band around knee level
	knee_y = average_knee_height
	band_thickness = chair_height // 2
	analysis_region = mask[knee_y - band_thickness : knee_y + band_thickness, :]
	```

	2. Edge Detection Strategy:
	- Extract chair mask pixels within the analysis band
	- Find extreme X-coordinate based on facing direction
	- Right-facing: Rightmost chair pixel (seat front)
	- Left-facing: Leftmost chair pixel (seat front)

	3. Validation:
	- Ensure sufficient chair pixels detected in analysis region
	- Cross-validate with chair bounding box constraints

	### Scaling and Real-World Measurements

	Now that I had the back of the knee and also the seat front. I could calculate the distance in pixels. But this needed to be converted to cms for our problem statemet

	Reference-Based Scaling:
	```python
	# Use eye-to-ear distance as anatomical constant
	eye_to_ear_distance_px = euclidean_distance(eye_landmark, ear_landmark)
	eye_to_ear_distance_cm = 7.0 # Average adult measurement

	pixels_per_cm = eye_to_ear_distance_px / eye_to_ear_distance_cm
	clearance_cm = clearance_pixels / pixels_per_cm
	```

	Why Eye-to-Ear Distance?
	- Anatomical Constant: Relatively consistent across adults (6.5-7.5 cm)
	- Visibility: Usually visible in side-profile images
	- Stability: Less affected by posture compared to other facial measurements

	### Facing Direction Detection
	- Determines if person faces left or right in image

	Method: Compare average X-coordinates of knees vs. eyes
	- If knees are right of eyes: facing right
	- If knees are left of eyes: facing left

	This affects:
	1. Which knee/eye/ear to use for measurements
	2. Direction of anatomical offsets
	3. Seat edge detection logic

	---

	## Challenges in Spacing Detection

	### 1. Pose Detection Challenges

	Challenge: Partial Occlusion
	- Problem: Knees/hips may be obscured by desk, clothing, or shadows
	- Solution: Visibility scoring and confidence thresholds
	- Mitigation: Multi-landmark validation, graceful degradation

	Challenge: Clothing Variations
	- Problem: Baggy pants obscure actual knee position
	- Solution: Anatomical offset based on skeletal landmarks rather than clothing contours
	- Limitation: Still estimates through clothing, may introduce small errors

	### 2. Chair Segmentation Challenges

	Challenge: Complex Seat Materials
	- Problem: Mesh, leather, fabric textures confuse edge detection
	- Solution: SAM's semantic understanding handles material variations
	- Remaining Issue: Highly reflective or transparent materials

	Challenge: Partial Chair Visibility
	- Problem: Desk, person's body may occlude seat edges
	- Solution: Focus analysis on knee-level band where seat is most likely visible
	- Limitation: Deep occlusion may cause detection failure

	### 3. Scaling and Measurement Challenges

	Challenge: Camera Perspective Distortion
	- Problem: Non-perpendicular camera angles affect measurements
	- Solution: Assume reasonable side-profile positioning
	- Limitation: Extreme angles (>30°) may introduce errors

	Challenge: Depth Perception in 2D Images
	- Problem: Cannot measure true 3D distances
	- Solution: Project measurements onto image plane
	- Assumption: Person and chair are roughly in the same plane

	### 4. Lighting and Image Quality

	Challenge: Poor Lighting Conditions
	- Problem: Shadows, backlighting affect landmark detection
	- Solution: MediaPipe's robustness to lighting variations
	- Enhancement: Preprocessing could include histogram equalization

	---

	## Accuracy Improvement Suggestions

	### Short-Term Improvements

	1. Enhanced Preprocessing
	- Maybe can have improced contrast using certain methods like histogram equilization

	2. Multi-Reference Scaling
	- Combine eye-to-ear with other facial measurements
	- Use hand/finger dimensions when visible
	- Cross-validate scaling factors

	### Medium-Term Enhancements

	1. Custom Training Data
	- Collect ergonomic seating dataset with ground truth measurements
	- Then we could actually fine-tune pose estimation on seated postures
	- And train a specialized chair segmentation model

	2. Multi-Frame Analysis
	- Process video streams and have average measurements across multiple frames

	3. 3D Pose Estimation
	- Integrate depth estimation models
	- Calculate true 3D clearances

	### Long-Term Research Directions

	Multi-Modal Sensing
	- Combine computer vision with pressure sensors
	- Integrate with smart chair systems
	- Real-time posture monitoring

	---


	## Development Process and Design Decisions

	### Iterative Development Approach

	1. Phase 1: Core Detection
	- Implemented basic pose detection
	- Added simple chair detection
	- Established measurement pipeline

	2. Phase 2: Accuracy Enhancement
	- Integrated SAM for precise segmentation
	- Added anatomical offset calculations
	- Implemented multi-scale analysis

	3. Phase 3: User Experience
	- Built Streamlit interface
	- Added visualization pipeline
	- Implemented sample image system

	4. Phase 4: Robustness
	- Enhanced error handling
	- Added confidence scoring
	- Implemented comprehensive testing

	### Key Design Decisions

	Decision 1: Multi-Model vs. Single Model
	- Chosen: Multi-model pipeline
	- Rationale: Each model excels in its domain (pose, detection, segmentation)
	- Trade-off: Complexity vs. accuracy

	Decision 2: Real-time vs. Batch Processing
	- Chosen: Single image analysis
	- Rationale: Simplicity, easier deployment
	- Future: Could extend to video streams

	Decision 3: Cloud vs. Local Processing
	- Chosen: Local processing capability
	- Rationale: Privacy, offline usage
	- Deployment: Supports both local and cloud deployment

	### Assumptions and Limitations

	Key Assumptions:
	1. Side Profile View: Person is photographed from the side
	2. Seated Posture: Back is against or near chair backrest
	3. Standard Chair: Conventional office chair design
	4. Adult Subjects: Eye-to-ear scaling appropriate for adults
	5. Static Analysis: Single-moment analysis, not dynamic posture

	Known Limitations:
	1. 2D Analysis: Cannot account for chair/body rotation out of image plane
	2. Clothing Effects: Thick clothing may obscure true body landmarks
	3. Lighting Dependency: Very poor lighting may affect landmark detection
	4. Chair Variety: Unusual chair designs may confuse detection
	5. Anthropometric Variation: Fixed scaling may not suit all body types

	---

	## Validation and Testing Strategy

	### Test Coverage

	1. Unit Tests: Individual component testing
	2. Integration Tests: End-to-end pipeline validation
	3. Accuracy Tests: Ground truth comparison on sample images
	4. Edge Case Tests: Handling of failure conditions
	5. Performance Tests: Processing time benchmarking

	### Sample Dataset

	- Optimal Cases (3 samples): Clear examples of proper seating
	- Too Deep Cases (4 samples): Various levels of excessive depth
	- Too Short Cases (8 samples): Range of insufficient depth scenarios
	---

	### Technical References
	1. MediaPipe Pose: [Google Research Paper](https://arxiv.org/abs/2006.10204)
	2. SAM (Segment Anything): [Meta AI Research](https://arxiv.org/abs/2304.02643)
	3. YOLOv8: [Ultralytics Documentation](https://docs.ultralytics.com/)

	### Dataset and Tools
	- Sample Images: Custom collected and validated
	- Development Environment: Python 3.9, PyTorch, OpenCV
	- Deployment Platform: Streamlit Cloud

	### Anthropometric Data Sources
	- Eye-to-Ear Measurements: Reference paper : "An anthropometric study to evaluate the correlation between the occlusal vertical dimension and length of the thumb" - Clinical, Cosmetic and Investigational Dentistry