---
title: Seat Depth Analyzer
emoji: 🪑
colorFrom: blue
colorTo: green
sdk: streamlit
sdk_version: 1.28.0
app_file: app.py
pinned: false
license: mit
---


# Seat Depth Analyzer - Technical Documentation


An AI-powered computer vision application that analyzes ergonomic seating conditions from side-profile images and classifies seat pan depth as Optimal, Too Deep, or Too Short.

## 🚀 Quick Start

1. Install Dependencies
    ```bash
    pip install streamlit opencv-python numpy torch torchvision segment-anything ultralytics mediapipe pillow
    ```

2. Run Application
    ```bash
    streamlit run app.py
    ```
    Note: The SAM model checkpoint (sam_vit_b_01ec64.pth) is included in the submission.

3. Open in Browser
    Navigate to: http://localhost:8501

4. Test the App

    - Upload a side-profile image of someone seated, or try the included sample images
    - Click "🔍 Analyze Seat Depth"

##  Project Overview

The Seat Depth Analyzer is an AI-powered computer vision application that analyzes ergonomic seating conditions from side-profile images. It classifies seat pan depth as **Optimal**, **Too Deep**, or **Too Short** based on the clearance between the seat front edge and the back of the user's knee.

### Ergonomic Classification Criteria
- **Optimal**: 2-6 cm clearance (proper thigh support without circulation issues)
- **Too Deep**: <2 cm clearance or knee behind seat edge (circulation risk)
- **Too Short**: >6 cm clearance (insufficient thigh support)
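
The thresholds above can be written as a small rule function. This is a minimal sketch, not the code in `app.py`: the function name is hypothetical, and it assumes a negative clearance is how "knee behind seat edge" is represented.

```python
def classify_seat_depth(clearance_cm: float) -> str:
    """Classify seat pan depth from the knee-to-seat-edge clearance in cm.

    Assumption: a negative clearance means the back of the knee sits
    behind the seat front edge, which falls under "Too Deep".
    """
    if clearance_cm < 2.0:
        return "Too Deep"
    if clearance_cm > 6.0:
        return "Too Short"
    return "Optimal"  # 2-6 cm inclusive


if __name__ == "__main__":
    for value in (-1.0, 1.5, 2.0, 4.0, 6.0, 7.5):
        print(f"{value:5.1f} cm -> {classify_seat_depth(value)}")
```

Treating the 2 cm and 6 cm boundaries as inclusive is a choice made for this sketch; the criteria only state the range as 2-6 cm.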

---

##  Technical Architecture

### Multi-Model Pipeline
The solution chains three pretrained computer vision models, each handling one stage of the pipeline:

```
Input Image → Pose Detection → Chair Detection → Seat Segmentation → Measurement → Classification → Output
                    ↓                 ↓                  ↓                ↓              ↓
                MediaPipe          YOLOv8n          SAM (ViT-B)      CV Analysis     Ergonomic
                  Pose             (Chair)         Segmentation       & Scaling        Rules
```


##  Model Selection and Rationale

### 1. Pose Estimation Model Choice: MediaPipe Pose

**Why MediaPipe Pose?**
- **High Accuracy**: Proven performance on diverse body poses and lighting conditions
- **Landmark Precision**: Provides 33 precise body landmarks including knees, hips, eyes, and ears
- **Visibility Scoring**: Each landmark includes visibility confidence, crucial for side-profile analysis
- **Computational Efficiency**: Real-time performance suitable for web applications
- **Robustness**: Handles partial occlusion and varied clothing better than alternatives

**Alternative Considered**: OpenPose
- **Rejected because**: Higher computational requirements, less optimized for single-person detection
- **MediaPipe advantage**: Better integration with web deployment, more stable landmark tracking

**Key Landmarks Used**:
- **Knees** (left/right): Primary measurement points
- **Eyes/Ears**: Scaling reference (anatomical constant)
- **Hips**: Thigh length calculation for anatomical proportions
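
As an illustration of how these landmarks can be addressed, the sketch below maps them to their indices in MediaPipe Pose's 33-point layout and picks the body side with the higher mean visibility. The `Landmark` tuple and `pick_visible_side` helper are hypothetical stand-ins so the example runs without MediaPipe installed, and choosing the side by visibility is an assumption rather than the app's confirmed logic.

```python
from typing import NamedTuple, Sequence

# Indices in MediaPipe Pose's 33-landmark layout
LANDMARKS = {
    "left": {"eye": 2, "ear": 7, "hip": 23, "knee": 25},
    "right": {"eye": 5, "ear": 8, "hip": 24, "knee": 26},
}


class Landmark(NamedTuple):
    """Hypothetical stand-in for a pose landmark (normalized x, y, visibility)."""
    x: float
    y: float
    visibility: float


def pick_visible_side(landmarks: Sequence[Landmark]) -> str:
    """Return 'left' or 'right', whichever side has the higher mean visibility."""
    def mean_visibility(side: str) -> float:
        indices = LANDMARKS[side].values()
        return sum(landmarks[i].visibility for i in indices) / len(indices)

    # On a tie, max() keeps the first candidate, i.e. 'left'
    return max(("left", "right"), key=mean_visibility)
```

In a side-profile image the far-side landmarks are usually occluded by the body, so their visibility scores tend to be lower than those of the camera-facing side.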

### 2. Chair Detection Model: YOLOv8n

**Why YOLOv8n?**
- **Speed vs. Accuracy Balance**: Nano version provides sufficient accuracy for chair detection while maintaining fast inference
- **Pre-trained COCO**: Chair class (ID: 56) readily available without custom training
- **Bounding Box Precision**: Accurate enough to constrain segmentation region
- **Memory Efficiency**: Suitable for deployment environments

**Usage Strategy**:
- Extract the chair bounding box, which is then passed to SAM as a box prompt
- Crop the top 25% of the box vertically, so the region focuses on the seat area and excludes the backrest
- Use the cropped box as the region of interest for the segmentation model
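
The box handling can be sketched independently of the detector. The helpers below are hypothetical: they assume detections arrive as `(class_id, confidence, (x1, y1, x2, y2))` tuples in pixel coordinates with the y-axis pointing down, and that the highest-confidence chair is the one to keep.

```python
CHAIR_CLASS_ID = 56  # COCO "chair" class


def best_chair_box(detections):
    """Return the box of the highest-confidence chair, or None if no chair was found."""
    chairs = [d for d in detections if d[0] == CHAIR_CLASS_ID]
    if not chairs:
        return None
    return max(chairs, key=lambda d: d[1])[2]


def crop_box_top(box, crop_fraction=0.25):
    """Drop the top `crop_fraction` of an (x1, y1, x2, y2) box.

    In image coordinates y grows downward, so the top edge is y1.
    """
    x1, y1, x2, y2 = box
    height = y2 - y1
    return (x1, y1 + crop_fraction * height, x2, y2)


if __name__ == "__main__":
    detections = [
        (0, 0.95, (50, 20, 300, 640)),     # person
        (56, 0.60, (400, 300, 500, 450)),  # low-confidence chair
        (56, 0.88, (100, 200, 500, 600)),  # chair to keep
    ]
    print(crop_box_top(best_chair_box(detections)))
```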

### 3. Segmentation Model: SAM (Segment Anything Model) ViT-B

**Why SAM?**
SAM supports prompt-based segmentation, where the prompt can be a set of points or a bounding box.
I used it to mask out the chair from the image so that the analysis can focus on the front edge of the seat pan.

- **Bounding Box-Based Segmentation**: Can segment objects using bounding box prompts
- **High-Quality Masks**: Superior edge precision compared to traditional segmentation
- **Generalization**: Works on furniture without specific training
- **Multi-Scale Features**: ViT-B provides good balance of accuracy and speed

**Alternative Considered**: Traditional edge detection + contour finding
- **Rejected because**: Poor performance on textured seats, lighting variations, and complex backgrounds
- **SAM advantage**: Semantic understanding of object boundaries

---

##  Measurement Methodology

### Knee Position Estimation

**Challenge**: MediaPipe knee landmarks represent joint centers, not the back of the knee (popliteal area) needed for ergonomic measurement.

**Solution**: Anatomical Offset Calculation
```python
import math

def estimate_back_of_knee_x(hip, knee, facing_direction, offset_ratio=0.13):
    """Estimate the x-coordinate (px) of the back of the knee from (x, y) landmarks."""
    # Calculate thigh length for proportional offset
    thigh_length_px = math.dist(hip, knee)

    # Back of knee offset: 13% of thigh length behind knee center
    back_of_knee_offset = thigh_length_px * offset_ratio

    # Apply directional offset: "behind" is opposite to the facing direction
    if facing_direction == "right":
        return knee[0] - back_of_knee_offset
    return knee[0] + back_of_knee_offset
```

**Rationale for 13% Offset**:
- The measurement needs the back of the knee, whereas the MediaPipe landmark marks the knee joint center
- Based on anthropometric studies of knee anatomy, the back of the knee lies roughly 12-15% of thigh length behind the joint center
- Validated against manual measurements on test images
- Accounts for the distance from knee joint center to posterior knee surface

### Seat Edge Detection

**Multi-Step Process**:

1. **Region Extraction**:
   ```python
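   # Create analysis band around knee level
   knee_y = int(round(average_knee_height))
   band_thickness = chair_height // 2
   # Clamp to the image so a negative start index does not wrap around
   top = max(0, knee_y - band_thickness)
   bottom = min(mask.shape[0], knee_y + band_thickness)
   analysis_region = mask[top:bottom, :]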
   ```

2. **Edge Detection Strategy**:
   - Extract chair mask pixels within the analysis band
   - Find extreme X-coordinate based on facing direction
   - **Right-facing**: Rightmost chair pixel (seat front)
   - **Left-facing**: Leftmost chair pixel (seat front)

3. **Validation**:
   - Ensure sufficient chair pixels detected in analysis region
   - Cross-validate with chair bounding box constraints
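
The three steps can be combined into one function. This is a minimal sketch under stated assumptions: the function name and the `min_pixels` threshold are hypothetical, and the mask is treated as a plain 2D sequence of rows (truthy where the chair was segmented) so the example has no dependencies.

```python
def find_seat_front_x(mask, knee_y, band_thickness, facing_direction, min_pixels=50):
    """Return the x-coordinate of the seat front edge, or None if detection fails.

    mask: 2D sequence of rows, truthy where the chair was segmented.
    min_pixels: hypothetical threshold for "sufficient chair pixels".
    """
    # Step 1: clamp the analysis band around knee level to the image
    top = max(0, knee_y - band_thickness)
    bottom = min(len(mask), knee_y + band_thickness)

    # Step 2: collect the x-coordinates of all chair pixels inside the band
    xs = [x for row in mask[top:bottom] for x, value in enumerate(row) if value]

    # Step 3: reject the result if too few chair pixels were found
    if len(xs) < min_pixels:
        return None

    # The seat front is the extreme pixel in the facing direction
    return max(xs) if facing_direction == "right" else min(xs)


if __name__ == "__main__":
    # 6x8 toy mask with chair pixels in rows 2-3, columns 2-5
    toy = [[1 if 2 <= y <= 3 and 2 <= x <= 5 else 0 for x in range(8)] for y in range(6)]
    print(find_seat_front_x(toy, knee_y=3, band_thickness=1,
                            facing_direction="right", min_pixels=4))
```

Cross-validation against the chair bounding box is left out here; one option is to check that the returned x lies within the box's horizontal extent.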

### Scaling and Real-World Measurements

With the back of the knee and the seat front located, I could calculate the distance between them in pixels. This then needed to be converted to centimeters, as the problem statement requires.

**Reference-Based Scaling**:
```python
import math

EYE_TO_EAR_CM = 7.0  # Average adult measurement

def pixels_to_cm(clearance_pixels, eye_landmark, ear_landmark):
    """Convert a pixel distance to cm using (x, y) eye and ear landmarks in pixels."""
    # Use eye-to-ear distance as anatomical constant
    eye_to_ear_distance_px = math.dist(eye_landmark, ear_landmark)
    pixels_per_cm = eye_to_ear_distance_px / EYE_TO_EAR_CM
    return clearance_pixels / pixels_per_cm
```

**Why Eye-to-Ear Distance?**
- **Anatomical Constant**: Relatively consistent across adults (6.5-7.5 cm)
- **Visibility**: Usually visible in side-profile images
- **Stability**: Less affected by posture compared to other facial measurements

### Facing Direction Detection
Determines whether the person faces left or right in the image.

**Method**: Compare average X-coordinates of knees vs. eyes
- If knees are right of eyes: facing right
- If knees are left of eyes: facing left

This affects:
1. Which knee/eye/ear to use for measurements
2. Direction of anatomical offsets
3. Seat edge detection logic
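
The comparison described above fits in a few lines. This is a sketch with a hypothetical function name; treating a tie as "left" is an arbitrary choice, since the rule does not say what to do when the averages are equal.

```python
def detect_facing_direction(knee_xs, eye_xs):
    """Return 'right' if the knees are right of the eyes on average, else 'left'."""
    avg_knee_x = sum(knee_xs) / len(knee_xs)
    avg_eye_x = sum(eye_xs) / len(eye_xs)
    return "right" if avg_knee_x > avg_eye_x else "left"


if __name__ == "__main__":
    # Knees at x of about 425 px, eyes at about 305 px: knees lead to the right
    print(detect_facing_direction([420, 430], [300, 310]))
```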

---

## Challenges in Spacing Detection

### 1. Pose Detection Challenges

**Challenge**: Partial Occlusion
- **Problem**: Knees/hips may be obscured by desk, clothing, or shadows
- **Solution**: Visibility scoring and confidence thresholds
- **Mitigation**: Multi-landmark validation, graceful degradation

**Challenge**: Clothing Variations
- **Problem**: Baggy pants obscure actual knee position
- **Solution**: Anatomical offset based on skeletal landmarks rather than clothing contours
- **Limitation**: Still estimates through clothing, may introduce small errors

### 2. Chair Segmentation Challenges

**Challenge**: Complex Seat Materials
- **Problem**: Mesh, leather, fabric textures confuse edge detection
- **Solution**: SAM's semantic understanding handles material variations
- **Remaining Issue**: Highly reflective or transparent materials

**Challenge**: Partial Chair Visibility
- **Problem**: Desk, person's body may occlude seat edges
- **Solution**: Focus analysis on knee-level band where seat is most likely visible
- **Limitation**: Deep occlusion may cause detection failure

### 3. Scaling and Measurement Challenges

**Challenge**: Camera Perspective Distortion
- **Problem**: Non-perpendicular camera angles affect measurements
- **Solution**: Assume reasonable side-profile positioning
- **Limitation**: Extreme angles (>30°) may introduce errors

**Challenge**: Depth Perception in 2D Images
- **Problem**: Cannot measure true 3D distances
- **Solution**: Project measurements onto image plane
- **Assumption**: Person and chair are roughly in the same plane

### 4. Lighting and Image Quality

**Challenge**: Poor Lighting Conditions
- **Problem**: Shadows, backlighting affect landmark detection
- **Solution**: MediaPipe's robustness to lighting variations
- **Enhancement**: Preprocessing could include histogram equalization

---

##  Accuracy Improvement Suggestions

### Short-Term Improvements

1. **Enhanced Preprocessing**
    - Improve image contrast before detection, for example with histogram equalization

2. **Multi-Reference Scaling**
   - Combine eye-to-ear with other facial measurements
   - Use hand/finger dimensions when visible
   - Cross-validate scaling factors
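
One possible shape for this cross-validation, offered as a sketch rather than a tested design: each reference produces its own pixels-per-cm estimate, the median is used as the combined scale, and the result is flagged when the estimates disagree too much. The function name, the 20% spread tolerance, and the example reference names are all hypothetical; only the eye-to-ear reference exists in the current pipeline.

```python
from statistics import median


def combine_scales(estimates, max_relative_spread=0.2):
    """Combine pixels-per-cm estimates from several references.

    estimates: dict mapping a reference name to its pixels-per-cm value.
    Returns (scale, consistent), where `consistent` is False when the
    estimates differ by more than `max_relative_spread` of the median.
    """
    values = list(estimates.values())
    scale = median(values)
    spread = (max(values) - min(values)) / scale
    return scale, spread <= max_relative_spread


if __name__ == "__main__":
    scale, consistent = combine_scales({"eye_ear": 10.0, "hand": 10.5, "face": 9.5})
    print(scale, consistent)
```

The median keeps a single bad reference (for example a foreshortened hand) from pulling the scale away from the others.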

### Medium-Term Enhancements

1. **Custom Training Data**
   - Collect ergonomic seating dataset with ground truth measurements
   - Fine-tune pose estimation on seated postures
   - Train a specialized chair segmentation model

2. **Multi-Frame Analysis**
   - Process video streams and average the measurements across multiple frames

3. **3D Pose Estimation**
   - Integrate depth estimation models
   - Calculate true 3D clearances
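
For the multi-frame idea, averaging could look like the sketch below. It assumes a hypothetical per-frame result list in which frames with failed detection are recorded as `None`; a median would be a drop-in alternative if outlier frames turn out to be a problem.

```python
from statistics import fmean


def average_clearance(per_frame_cm):
    """Average clearance over frames, skipping frames where detection failed (None)."""
    valid = [c for c in per_frame_cm if c is not None]
    if not valid:
        return None  # no usable frame
    return fmean(valid)


if __name__ == "__main__":
    print(average_clearance([4.0, None, 5.0, 3.0]))
```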

### Long-Term Research Directions

**Multi-Modal Sensing**
   - Combine computer vision with pressure sensors
   - Integrate with smart chair systems
   - Real-time posture monitoring

---


##  Development Process and Design Decisions

### Iterative Development Approach

1. **Phase 1: Core Detection**
   - Implemented basic pose detection
   - Added simple chair detection
   - Established measurement pipeline

2. **Phase 2: Accuracy Enhancement**
   - Integrated SAM for precise segmentation
   - Added anatomical offset calculations
   - Implemented multi-scale analysis

3. **Phase 3: User Experience**
   - Built Streamlit interface
   - Added visualization pipeline
   - Implemented sample image system

4. **Phase 4: Robustness**
   - Enhanced error handling
   - Added confidence scoring
   - Implemented comprehensive testing

### Key Design Decisions

**Decision 1: Multi-Model vs. Single Model**
- **Chosen**: Multi-model pipeline
- **Rationale**: Each model excels in its domain (pose, detection, segmentation)
- **Trade-off**: Complexity vs. accuracy

**Decision 2: Real-time vs. Batch Processing**
- **Chosen**: Single image analysis
- **Rationale**: Simplicity, easier deployment
- **Future**: Could extend to video streams

**Decision 3: Cloud vs. Local Processing**
- **Chosen**: Local processing capability
- **Rationale**: Privacy, offline usage
- **Deployment**: Supports both local and cloud deployment

### Assumptions and Limitations

**Key Assumptions**:
1. **Side Profile View**: Person is photographed from the side 
2. **Seated Posture**: Back is against or near chair backrest
3. **Standard Chair**: Conventional office chair design
4. **Adult Subjects**: Eye-to-ear scaling appropriate for adults
5. **Static Analysis**: Single-moment analysis, not dynamic posture

**Known Limitations**:
1. **2D Analysis**: Cannot account for chair/body rotation out of image plane
2. **Clothing Effects**: Thick clothing may obscure true body landmarks
3. **Lighting Dependency**: Very poor lighting may affect landmark detection
4. **Chair Variety**: Unusual chair designs may confuse detection
5. **Anthropometric Variation**: Fixed scaling may not suit all body types

---

##  Validation and Testing Strategy

### Test Coverage

1. **Unit Tests**: Individual component testing
2. **Integration Tests**: End-to-end pipeline validation
3. **Accuracy Tests**: Ground truth comparison on sample images
4. **Edge Case Tests**: Handling of failure conditions
5. **Performance Tests**: Processing time benchmarking

### Sample Dataset

- **Optimal Cases (3 samples)**: Clear examples of proper seating
- **Too Deep Cases (4 samples)**: Various levels of excessive depth
- **Too Short Cases (8 samples)**: Range of insufficient depth scenarios

---

### Technical References
1. **MediaPipe Pose**: [Google Research Paper](https://arxiv.org/abs/2006.10204)
2. **SAM (Segment Anything)**: [Meta AI Research](https://arxiv.org/abs/2304.02643)
3. **YOLOv8**: [Ultralytics Documentation](https://docs.ultralytics.com/)

### Dataset and Tools
- **Sample Images**: Custom collected and validated
- **Development Environment**: Python 3.9, PyTorch, OpenCV
- **Deployment Platform**: Streamlit Cloud

### Anthropometric Data Sources
- **Eye-to-Ear Measurements**: "An anthropometric study to evaluate the correlation between the occlusal vertical dimension and length of the thumb", Clinical, Cosmetic and Investigational Dentistry