---
title: Seat Depth Analyzer
emoji: 🪑
colorFrom: blue
colorTo: green
sdk: streamlit
sdk_version: 1.28.0
app_file: app.py
pinned: false
license: mit
---
# Seat Depth Analyzer - Technical Documentation
An AI-powered computer vision application that analyzes ergonomic seating conditions from side-profile images and classifies seat pan depth as Optimal, Too Deep, or Too Short.
## Quick Start
### 1. Install Dependencies
```bash
pip install streamlit opencv-python numpy torch torchvision segment-anything ultralytics mediapipe pillow
```
### 2. Run Application
```bash
streamlit run app.py
```
**Note**: The SAM model checkpoint (`sam_vit_b_01ec64.pth`) is included in the submission.
### 3. Open in Browser
Navigate to: http://localhost:8501
### 4. Test the App
- Upload a side-profile image of someone seated, or try the included sample images
- Click "Analyze Seat Depth"
## Project Overview
The Seat Depth Analyzer is an AI-powered computer vision application that analyzes ergonomic seating conditions from side-profile images. It classifies seat pan depth as **Optimal**, **Too Deep**, or **Too Short** based on the clearance between the seat front edge and the back of the user's knee.
### Ergonomic Classification Criteria
- **Optimal**: 2-6 cm clearance (proper thigh support without circulation issues)
- **Too Deep**: <2 cm clearance or knee behind seat edge (circulation risk)
- **Too Short**: >6 cm clearance (insufficient thigh support)
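The criteria above translate into a small rule function. The following is a minimal sketch, not the app's actual code: the name `classify_seat_depth` is hypothetical, and treating exactly 2 cm and exactly 6 cm as Optimal is an assumption, since the criteria do not say which side the boundaries fall on.

```python
def classify_seat_depth(clearance_cm: float) -> str:
    """Map knee-to-seat-edge clearance (cm) to an ergonomic label.

    A negative clearance means the back of the knee sits behind the seat edge.
    """
    if clearance_cm < 2.0:
        return "Too Deep"   # also covers negative clearance (knee behind seat edge)
    if clearance_cm > 6.0:
        return "Too Short"
    return "Optimal"        # 2-6 cm inclusive (boundary handling is an assumption)


print(classify_seat_depth(4.0))   # Optimal
print(classify_seat_depth(-1.5))  # Too Deep
print(classify_seat_depth(8.0))   # Too Short
```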
---
## Technical Architecture
### Multi-Model Pipeline
The solution chains three pretrained computer vision models, one each for pose estimation, chair detection, and seat segmentation, followed by measurement and rule-based classification:
```
Input Image → Pose Detection → Chair Detection → Seat Segmentation → Measurement → Classification → Output
                     ↓                ↓                  ↓                ↓               ↓
                 MediaPipe         YOLOv8n          SAM (ViT-B)      CV Analysis      Ergonomic
                   Pose            (Chair)         Segmentation       & Scaling         Rules
```
## Model Selection and Rationale
### 1. Pose Estimation Model Choice: MediaPipe Pose
**Why MediaPipe Pose?**
- **High Accuracy**: Proven performance on diverse body poses and lighting conditions
- **Landmark Precision**: Provides 33 precise body landmarks including knees, hips, eyes, and ears
- **Visibility Scoring**: Each landmark includes visibility confidence, crucial for side-profile analysis
- **Computational Efficiency**: Real-time performance suitable for web applications
- **Robustness**: Handles partial occlusion and varied clothing better than alternatives
**Alternative Considered**: OpenPose
- **Rejected because**: Higher computational requirements, less optimized for single-person detection
- **MediaPipe advantage**: Better integration with web deployment, more stable landmark tracking
**Key Landmarks Used**:
- **Knees** (left/right): Primary measurement points
- **Eyes/Ears**: Scaling reference (anatomical constant)
- **Hips**: Thigh length calculation for anatomical proportions
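In a side-profile image only one side of the body faces the camera, so one way to use these landmarks is to pick the left or right set by visibility. The sketch below assumes the pose output has already been converted to a list of `(x, y, visibility)` tuples indexed like MediaPipe Pose's 33 landmarks; the helper name `pick_visible_side` and the tuple layout are hypothetical, not taken from the app.

```python
# MediaPipe Pose landmark indices for the points used in this document
LANDMARKS = {
    "left":  {"eye": 2, "ear": 7, "hip": 23, "knee": 25},
    "right": {"eye": 5, "ear": 8, "hip": 24, "knee": 26},
}


def pick_visible_side(landmarks):
    """Return the body side ("left" or "right") with the higher mean visibility.

    `landmarks` is a sequence of (x, y, visibility) tuples, one per pose landmark.
    """
    def mean_visibility(side):
        indices = LANDMARKS[side].values()
        return sum(landmarks[i][2] for i in indices) / len(indices)

    # On a tie, max() keeps the first candidate ("left")
    return max(("left", "right"), key=mean_visibility)
```

The chosen side can then drive which knee, hip, eye, and ear coordinates feed the measurement steps.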
### 2. Chair Detection Model: YOLOv8n
**Why YOLOv8n?**
- **Speed vs. Accuracy Balance**: Nano version provides sufficient accuracy for chair detection while maintaining fast inference
- **Pre-trained COCO**: Chair class (ID: 56) readily available without custom training
- **Bounding Box Precision**: Accurate enough to constrain segmentation region
- **Memory Efficiency**: Suitable for deployment environments
**Usage Strategy**:
- Extract the chair bounding box and pass it to SAM as the prompt
- Crop 25% off the top of the box to focus on the seat area and exclude the backrest
- Use the resulting box as the region of interest for the segmentation model
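The 25% vertical crop can be sketched as follows. This assumes the box is given as `(x1, y1, x2, y2)` in pixel coordinates with y increasing downward; the function name `crop_box_top` is hypothetical.

```python
def crop_box_top(box, top_fraction=0.25):
    """Drop the top `top_fraction` of a box given as (x1, y1, x2, y2) in pixels.

    Image y grows downward, so removing the top means increasing y1.
    """
    x1, y1, x2, y2 = box
    height = y2 - y1
    return (x1, y1 + int(height * top_fraction), x2, y2)


seat_box = crop_box_top((100, 200, 500, 600))
print(seat_box)  # (100, 300, 500, 600)
```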
### 3. Segmentation Model: SAM (Segment Anything Model) ViT-B
**Why SAM?**
SAM can segment an object from point prompts or bounding-box prompts. I used it to mask out the chair so that later steps can focus on the front of the seat pan.
- **Bounding Box-Based Segmentation**: Can segment objects using bounding box prompts
- **High-Quality Masks**: Superior edge precision compared to traditional segmentation
- **Generalization**: Works on furniture without specific training
- **Multi-Scale Features**: ViT-B provides good balance of accuracy and speed
**Alternative Considered**: Traditional edge detection + contour finding
- **Rejected because**: Poor performance on textured seats, lighting variations, and complex backgrounds
- **SAM advantage**: Semantic understanding of object boundaries
---
## Measurement Methodology
### Knee Position Estimation
**Challenge**: MediaPipe knee landmarks represent joint centers, not the back of the knee (popliteal area) needed for ergonomic measurement.
**Solution**: Anatomical Offset Calculation
```python
# Calculate thigh length for proportional offset
thigh_length_px = euclidean_distance(hip_position, knee_position)
# Back of knee offset: 13% of thigh length behind knee center
back_of_knee_offset = thigh_length_px * 0.13
# Apply directional offset based on facing direction
if facing_direction == "right":
    back_of_knee_x = knee_center_x - back_of_knee_offset
else:
    back_of_knee_x = knee_center_x + back_of_knee_offset
```
**Rationale for 13% Offset**:
- MediaPipe's knee landmark marks the joint center, while the measurement needs the back of the knee
- Based on anthropometric studies of knee anatomy: the back of the knee lies approximately 12-15% of thigh length behind the joint center
- Validated against manual measurements on test images
- Accounts for the distance from knee joint center to posterior knee surface
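As a worked example of the offset, the sketch below wraps the calculation in a function and runs it on made-up coordinates. The function name `estimate_back_of_knee_x` and the sample points are illustrative assumptions; `math.dist` stands in for the `euclidean_distance` helper.

```python
import math


def estimate_back_of_knee_x(hip, knee, facing, offset_ratio=0.13):
    """Estimate the x-coordinate of the back of the knee.

    `hip` and `knee` are (x, y) pixel positions; `facing` is "right" or "left".
    """
    thigh_length_px = math.dist(hip, knee)
    offset_px = thigh_length_px * offset_ratio
    # "Behind" the knee points back toward the hip
    if facing == "right":
        return knee[0] - offset_px
    return knee[0] + offset_px


# A 300 px thigh gives an offset of about 39 px
print(round(estimate_back_of_knee_x((200, 400), (500, 400), "right"), 1))  # 461.0
print(round(estimate_back_of_knee_x((800, 400), (500, 400), "left"), 1))   # 539.0
```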
### Seat Edge Detection
**Multi-Step Process**:
1. **Region Extraction**:
```python
# Create analysis band around knee level, clamped to the image bounds
knee_y = int(average_knee_height)
band_thickness = chair_height // 2
top = max(knee_y - band_thickness, 0)
bottom = min(knee_y + band_thickness, mask.shape[0])
analysis_region = mask[top:bottom, :]
```
2. **Edge Detection Strategy**:
- Extract chair mask pixels within the analysis band
- Find extreme X-coordinate based on facing direction
- **Right-facing**: Rightmost chair pixel (seat front)
- **Left-facing**: Leftmost chair pixel (seat front)
3. **Validation**:
- Ensure sufficient chair pixels detected in analysis region
- Cross-validate with chair bounding box constraints
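The three steps above can be sketched in one function. To stay dependency-free, this version assumes the mask is a plain list of rows with truthy values marking chair pixels; with a NumPy mask the same idea would normally be vectorized. The name `find_seat_front_x` and the `min_pixels` threshold are hypothetical.

```python
def find_seat_front_x(mask, knee_y, band_thickness, facing, min_pixels=20):
    """Return the x-coordinate of the seat front inside a horizontal band, or None.

    `mask` is a 2D grid (list of rows); truthy cells are chair pixels.
    """
    top = max(knee_y - band_thickness, 0)
    bottom = min(knee_y + band_thickness, len(mask))
    # Column index of every chair pixel inside the band
    xs = [x for row in mask[top:bottom] for x, value in enumerate(row) if value]
    if len(xs) < min_pixels:
        return None  # too little of the chair is visible at knee level
    # The seat front is the extreme chair pixel in the facing direction
    return max(xs) if facing == "right" else min(xs)
```

Returning `None` when too few chair pixels fall in the band lets the caller report a detection failure instead of measuring from noise.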
### Scaling and Real-World Measurements
With the back of the knee and the seat front located, I could calculate the clearance in pixels. This value then had to be converted to centimeters to apply the classification thresholds.
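One way to compute that pixel distance is to keep it signed, so that a knee sitting behind the seat edge shows up as a negative clearance. The sketch below is an assumption about how the sign could be handled, not the app's confirmed logic; `clearance_px` is a hypothetical name.

```python
def clearance_px(seat_front_x, back_of_knee_x, facing):
    """Signed horizontal gap between the seat front and the back of the knee.

    Positive: the knee is in front of the seat edge.
    Negative: the knee is behind the seat edge.
    """
    if facing == "right":
        return back_of_knee_x - seat_front_x
    return seat_front_x - back_of_knee_x


print(clearance_px(400, 430, "right"))  # 30
print(clearance_px(400, 390, "right"))  # -10
```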
**Reference-Based Scaling**:
```python
# Use eye-to-ear distance as anatomical constant
eye_to_ear_distance_px = euclidean_distance(eye_landmark, ear_landmark)
eye_to_ear_distance_cm = 7.0 # Average adult measurement
pixels_per_cm = eye_to_ear_distance_px / eye_to_ear_distance_cm
clearance_cm = clearance_pixels / pixels_per_cm
```
**Why Eye-to-Ear Distance?**
- **Anatomical Constant**: Relatively consistent across adults (6.5-7.5 cm)
- **Visibility**: Usually visible in side-profile images
- **Stability**: Less affected by posture compared to other facial measurements
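A worked example of the conversion, using made-up landmark coordinates. The function name `pixels_to_cm` is hypothetical, and `math.dist` stands in for the `euclidean_distance` helper; the 7.0 cm reference is the value stated above.

```python
import math

EYE_TO_EAR_CM = 7.0  # average adult value used in this document


def pixels_to_cm(length_px, eye, ear, reference_cm=EYE_TO_EAR_CM):
    """Convert a pixel length to centimeters using the eye-to-ear reference."""
    reference_px = math.dist(eye, ear)
    if reference_px == 0:
        raise ValueError("eye and ear landmarks coincide; cannot derive a scale")
    pixels_per_cm = reference_px / reference_cm
    return length_px / pixels_per_cm


# Eye and ear 35 px apart -> 5 px per cm, so a 20 px gap is 4 cm
print(pixels_to_cm(20, eye=(300, 100), ear=(265, 100)))  # 4.0
```

Because the reference is fixed at 7.0 cm, a subject whose true eye-to-ear distance is 6.5 cm or 7.5 cm shifts every measurement by roughly 7% in either direction.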
### Facing Direction Detection
Determines whether the person faces left or right in the image.
**Method**: Compare the average X-coordinates of the knees and the eyes:
- If knees are right of eyes: facing right
- If knees are left of eyes: facing left
This affects:
1. Which knee/eye/ear to use for measurements
2. Direction of anatomical offsets
3. Seat edge detection logic
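The comparison above fits in a few lines. In this sketch the function name `detect_facing_direction` is hypothetical, and resolving an exact tie as "left" is an arbitrary assumption.

```python
def detect_facing_direction(knee_xs, eye_xs):
    """Return "right" if the knees are right of the eyes on average, else "left"."""
    mean_knee_x = sum(knee_xs) / len(knee_xs)
    mean_eye_x = sum(eye_xs) / len(eye_xs)
    return "right" if mean_knee_x > mean_eye_x else "left"


print(detect_facing_direction(knee_xs=[520, 540], eye_xs=[300, 310]))  # right
print(detect_facing_direction(knee_xs=[120, 140], eye_xs=[300, 310]))  # left
```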
---
## Challenges in Spacing Detection
### 1. Pose Detection Challenges
**Challenge**: Partial Occlusion
- **Problem**: Knees/hips may be obscured by desk, clothing, or shadows
- **Solution**: Visibility scoring and confidence thresholds
- **Mitigation**: Multi-landmark validation, graceful degradation
**Challenge**: Clothing Variations
- **Problem**: Baggy pants obscure actual knee position
- **Solution**: Anatomical offset based on skeletal landmarks rather than clothing contours
- **Limitation**: Still estimates through clothing, may introduce small errors
### 2. Chair Segmentation Challenges
**Challenge**: Complex Seat Materials
- **Problem**: Mesh, leather, fabric textures confuse edge detection
- **Solution**: SAM's semantic understanding handles material variations
- **Remaining Issue**: Highly reflective or transparent materials
**Challenge**: Partial Chair Visibility
- **Problem**: Desk, person's body may occlude seat edges
- **Solution**: Focus analysis on knee-level band where seat is most likely visible
- **Limitation**: Deep occlusion may cause detection failure
### 3. Scaling and Measurement Challenges
**Challenge**: Camera Perspective Distortion
- **Problem**: Non-perpendicular camera angles affect measurements
- **Solution**: Assume reasonable side-profile positioning
- **Limitation**: Extreme angles (>30°) may introduce errors
**Challenge**: Depth Perception in 2D Images
- **Problem**: Cannot measure true 3D distances
- **Solution**: Project measurements onto image plane
- **Assumption**: Person and chair are roughly in the same plane
### 4. Lighting and Image Quality
**Challenge**: Poor Lighting Conditions
- **Problem**: Shadows, backlighting affect landmark detection
- **Solution**: MediaPipe's robustness to lighting variations
- **Enhancement**: Preprocessing could include histogram equalization
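To make the histogram equalization idea concrete, here is a dependency-free sketch for an 8-bit grayscale image stored as a list of rows. It is illustrative only and not part of the current pipeline; in practice OpenCV's `cv2.equalizeHist` does the same job on a single-channel 8-bit array. The function name `equalize_histogram` is hypothetical.

```python
def equalize_histogram(gray):
    """Histogram-equalize an 8-bit grayscale image given as a list of rows."""
    pixels = [v for row in gray for v in row]
    if not pixels:
        return [row[:] for row in gray]

    hist = [0] * 256
    for v in pixels:
        hist[v] += 1

    # Cumulative distribution of intensities
    cdf, running = [], 0
    for count in hist:
        running += count
        cdf.append(running)

    cdf_min = next(c for c in cdf if c > 0)
    total = len(pixels)
    if total == cdf_min:
        return [row[:] for row in gray]  # constant image: nothing to stretch

    # Remap intensities so the cumulative distribution becomes roughly linear
    lut = [max(0, round((c - cdf_min) / (total - cdf_min) * 255)) for c in cdf]
    return [[lut[v] for v in row] for row in gray]
```

A low-contrast patch such as `[[100, 100], [110, 120]]` is stretched so that its darkest value maps to 0 and its brightest to 255.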
---
## Accuracy Improvement Suggestions
### Short-Term Improvements
1. **Enhanced Preprocessing**
- Improve image contrast before landmark detection, for example with histogram equalization
2. **Multi-Reference Scaling**
- Combine eye-to-ear with other facial measurements
- Use hand/finger dimensions when visible
- Cross-validate scaling factors
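Cross-validating scaling factors could look like the sketch below: several independent pixels-per-cm estimates are compared against their median, outliers are dropped, and the rest are averaged. The function name `combine_scale_estimates` and the 25% tolerance are assumptions chosen for illustration.

```python
from statistics import median


def combine_scale_estimates(estimates_px_per_cm, max_spread=0.25):
    """Fuse several pixels-per-cm estimates into one, rejecting outliers.

    Estimates deviating from the median by more than `max_spread`
    (as a fraction of the median) are dropped before averaging.
    """
    if not estimates_px_per_cm:
        raise ValueError("need at least one scale estimate")
    mid = median(estimates_px_per_cm)
    kept = [s for s in estimates_px_per_cm if abs(s - mid) <= max_spread * mid]
    if not kept:
        return mid  # estimates disagree too much; fall back to the median
    return sum(kept) / len(kept)
```

For example, with estimates of 5.0, 5.2, 4.8, and 9.0 px/cm, the 9.0 value is rejected and the remaining three average to about 5.0.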
### Medium-Term Enhancements
1. **Custom Training Data**
- Collect ergonomic seating dataset with ground truth measurements
- Fine-tune pose estimation on seated postures
- Train a specialized chair segmentation model
2. **Multi-Frame Analysis**
- Process video streams and average measurements across multiple frames
3. **3D Pose Estimation**
- Integrate depth estimation models
- Calculate true 3D clearances
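For the multi-frame analysis idea, per-frame clearances could be aggregated as sketched below. Note that this uses the median rather than a plain average, a deliberate swap so that a single bad frame does not skew the result. The function name `aggregate_clearance` and the `min_valid_frames` parameter are hypothetical.

```python
from statistics import median


def aggregate_clearance(per_frame_cm, min_valid_frames=3):
    """Median clearance over frames; `None` entries mark failed detections."""
    valid = [c for c in per_frame_cm if c is not None]
    if len(valid) < min_valid_frames:
        return None  # not enough usable frames to trust the result
    return median(valid)


# One failed frame and one outlier (12.0) do not shift the result
print(aggregate_clearance([4.0, None, 4.4, 12.0, 4.2, 4.1]))  # 4.2
```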
### Long-Term Research Directions
**Multi-Modal Sensing**
- Combine computer vision with pressure sensors
- Integrate with smart chair systems
- Real-time posture monitoring
---
## Development Process and Design Decisions
### Iterative Development Approach
1. **Phase 1: Core Detection**
- Implemented basic pose detection
- Added simple chair detection
- Established measurement pipeline
2. **Phase 2: Accuracy Enhancement**
- Integrated SAM for precise segmentation
- Added anatomical offset calculations
- Implemented multi-scale analysis
3. **Phase 3: User Experience**
- Built Streamlit interface
- Added visualization pipeline
- Implemented sample image system
4. **Phase 4: Robustness**
- Enhanced error handling
- Added confidence scoring
- Implemented comprehensive testing
### Key Design Decisions
**Decision 1: Multi-Model vs. Single Model**
- **Chosen**: Multi-model pipeline
- **Rationale**: Each model excels in its domain (pose, detection, segmentation)
- **Trade-off**: Complexity vs. accuracy
**Decision 2: Real-time vs. Batch Processing**
- **Chosen**: Single image analysis
- **Rationale**: Simplicity, easier deployment
- **Future**: Could extend to video streams
**Decision 3: Cloud vs. Local Processing**
- **Chosen**: Local processing capability
- **Rationale**: Privacy, offline usage
- **Deployment**: Supports both local and cloud deployment
### Assumptions and Limitations
**Key Assumptions**:
1. **Side Profile View**: Person is photographed from the side
2. **Seated Posture**: Back is against or near chair backrest
3. **Standard Chair**: Conventional office chair design
4. **Adult Subjects**: Eye-to-ear scaling appropriate for adults
5. **Static Analysis**: Single-moment analysis, not dynamic posture
**Known Limitations**:
1. **2D Analysis**: Cannot account for chair/body rotation out of image plane
2. **Clothing Effects**: Thick clothing may obscure true body landmarks
3. **Lighting Dependency**: Very poor lighting may affect landmark detection
4. **Chair Variety**: Unusual chair designs may confuse detection
5. **Anthropometric Variation**: Fixed scaling may not suit all body types
---
## Validation and Testing Strategy
### Test Coverage
1. **Unit Tests**: Individual component testing
2. **Integration Tests**: End-to-end pipeline validation
3. **Accuracy Tests**: Ground truth comparison on sample images
4. **Edge Case Tests**: Handling of failure conditions
5. **Performance Tests**: Processing time benchmarking
### Sample Dataset
- **Optimal Cases (3 samples)**: Clear examples of proper seating
- **Too Deep Cases (4 samples)**: Various levels of excessive depth
- **Too Short Cases (8 samples)**: Range of insufficient depth scenarios
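Because the three classes have unequal sample counts, per-class accuracy is more informative than a single overall figure. The sketch below shows one way to compute it from (expected, predicted) label pairs; the function name `per_class_accuracy` is hypothetical and the pairs in the usage line are placeholders, not actual results from the sample set.

```python
from collections import Counter


def per_class_accuracy(pairs):
    """Return {label: fraction correct} from (expected, predicted) label pairs."""
    totals, correct = Counter(), Counter()
    for expected, predicted in pairs:
        totals[expected] += 1
        if expected == predicted:
            correct[expected] += 1
    return {label: correct[label] / totals[label] for label in totals}


# Placeholder pairs for illustration only
example_pairs = [
    ("Optimal", "Optimal"),
    ("Optimal", "Too Deep"),
    ("Too Short", "Too Short"),
]
print(per_class_accuracy(example_pairs))  # {'Optimal': 0.5, 'Too Short': 1.0}
```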
---
### Technical References
1. **MediaPipe Pose**: [Google Research Paper](https://arxiv.org/abs/2006.10204)
2. **SAM (Segment Anything)**: [Meta AI Research](https://arxiv.org/abs/2304.02643)
3. **YOLOv8**: [Ultralytics Documentation](https://docs.ultralytics.com/)
### Dataset and Tools
- **Sample Images**: Custom collected and validated
- **Development Environment**: Python 3.9, PyTorch, OpenCV
- **Deployment Platform**: Streamlit Cloud
### Anthropometric Data Sources
- **Eye-to-Ear Measurements**: "An anthropometric study to evaluate the correlation between the occlusal vertical dimension and length of the thumb", Clinical, Cosmetic and Investigational Dentistry