---
license: mit
tags:
  - image-segmentation
  - unet
  - resnet
  - computer-vision
  - pytorch
library_name: transformers
datasets:
  - antoreepjana/cv-image-segmentation
inference: false
---

# 🧠 Model Card: Segmentation


## 🧾 Overview

💡 **ResNet + U-Net fusion** combines the deep, contextual feature extraction of ResNet with the spatial precision and fine-detail recovery of U-Net.  
It is a versatile, powerful, and highly sensitive architecture, ideal for projects where **every pixel matters**.

The model excels in scenarios where the object is **small, detailed, or textured**, and where the **global scene context offers little help**.

This makes it ideal for:
- Medical segmentation (e.g., tumors, vessels)
- Industrial defect inspection
- Embedded vision for robotics or precision tasks

⚠️ However, this specific version was trained on a **narrow-domain dataset**, captured under **controlled indoor conditions**: consistent lighting, high-contrast backgrounds, and fixed camera angles.  
As a result, its ability to generalize to open-world scenarios (e.g., outdoor environments, variable backgrounds) is limited.  

**This is not a flaw in the model**, but a **natural reflection of the training data**.  
When retrained with more diverse and realistic datasets, this architecture is highly capable of delivering robust performance across a wide range of segmentation tasks.

---

## ☕ Behind the Scenes

This certification project was built one commit at a time — powered by curiosity, long debugging sessions, strategic doses of caffeine, and great support from **Microsoft Copilot** and **ChatGPT (OpenAI)**, whose insights were essential in structuring the segmentation pipeline and planning its embedded future.

> "Every time the model tries to segment, the square figure resurfaces. Not as an error, but as a reminder: deep learning can be quite shallow when the curse of imperfect geometry sets in.  
> And even when all the code is rewritten, the world is realigned, and optimism rises again… there she is: the misshapen quadratic figure.  
> Unfazed, unshakeable, perhaps even moved by her own stubbornness. She's not a bug — she's a character."

---

## 🗂️ Dataset

This model was trained using a subset of the [CV Image Segmentation Dataset](https://www.kaggle.com/datasets/antoreepjana/cv-image-segmentation), available on Kaggle.

- **Author**: Antoreep Jana  
- **License**: For educational and non-commercial use  
- **Content**: 300+ annotated images for binary segmentation  
- **Preprocessing**: All images resized to 512×512 and converted to grayscale

⚠️ *Only a filtered and preprocessed subset (the car images) was used for this version.*  
The dataset contains several distinct subsets; only the Carvana car images (Kaggle Carvana Car Mask Segmentation) were used to train and test this project.
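
The preprocessing described above (resize to 512×512, convert to grayscale) can be sketched with a small Pillow helper; the function name is illustrative, not part of the repository:

```python
from PIL import Image

def preprocess(image: Image.Image, size: int = 512) -> Image.Image:
    """Resize to size x size and convert to single-channel grayscale,
    matching the 512x512 grayscale preprocessing described above."""
    return image.convert("L").resize((size, size), Image.BILINEAR)
```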

---

## ⚙️ Model Architecture

- **Encoder**: ResNet-50 (pretrained, adapted for 1-channel input)
- **Decoder**: U-Net with skip connections and bilinear upsampling
- **Input**: Grayscale, 512×512
- **Output**: Binary segmentation mask (background vs. object)
- **Loss**: Composite of `CrossEntropyLoss + DiceLoss`
- **Framework**: PyTorch

---

## 📊 Evaluation Metrics

- Pixel Accuracy (train/val)
- Dice Coefficient
- CrossEntropy Loss
- Class-weighted loss balancing
- *(IoU, MCC, Precision/Recall planned for future integration)*

🧪 Evaluation performed using `evaluate_model.py`
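
The first two metrics can be sketched as follows; `composite_loss` shows one plausible reading of the `CrossEntropyLoss + DiceLoss` composite named in the architecture section. Function names and the loss weighting are illustrative, not the actual `evaluate_model.py` API:

```python
import torch
import torch.nn.functional as F

def pixel_accuracy(pred: torch.Tensor, target: torch.Tensor) -> float:
    """Fraction of pixels whose predicted class matches the target."""
    return (pred == target).float().mean().item()

def dice_coefficient(pred: torch.Tensor, target: torch.Tensor,
                     eps: float = 1e-6) -> float:
    """Dice = 2|A ∩ B| / (|A| + |B|) for binary masks (1 = object)."""
    pred, target = pred.float(), target.float()
    inter = (pred * target).sum()
    return ((2 * inter + eps) / (pred.sum() + target.sum() + eps)).item()

def composite_loss(logits: torch.Tensor, target: torch.Tensor,
                   dice_weight: float = 1.0) -> torch.Tensor:
    """Illustrative CrossEntropy + (1 - soft Dice) on the foreground channel."""
    ce = F.cross_entropy(logits, target)
    prob_fg = logits.softmax(dim=1)[:, 1]          # P(object) per pixel
    inter = (prob_fg * target.float()).sum()
    dice = (2 * inter + 1e-6) / (prob_fg.sum() + target.float().sum() + 1e-6)
    return ce + dice_weight * (1.0 - dice)
```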

---

## ⚠️ Limitations

This model achieves excellent results when tested on **studio-like images**: consistent lighting, neutral backgrounds, and static perspectives.

However, performance decreases on **unseen outdoor scenarios** (e.g., cars on the street, parking lots) — where background noise, lighting variation, and camera angle impact results.

➡️ This **limitation is dataset-induced**, not architectural.  
When trained on more realistic data, this model generalizes well due to its high sensitivity to texture and spatial structure.

---

## 🚀 Intended Use

Best suited for applications where conditions are similar to the training set, such as:

- Quality control in automotive photography studios
- Automated documentation of vehicles in inspection booths
- Offline image processing for structured, grayscale datasets
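
For such structured, offline use cases, inference reduces to the 512×512 grayscale preprocessing plus an argmax over the two output channels. A hypothetical helper (the function name and normalization are assumptions, not the repository's API):

```python
import numpy as np
import torch
from PIL import Image

@torch.no_grad()
def predict_mask(model: torch.nn.Module, image: Image.Image) -> np.ndarray:
    """Run the segmenter on one image; returns a 512x512 uint8 mask
    where 0 = background and 1 = object."""
    gray = image.convert("L").resize((512, 512), Image.BILINEAR)
    x = torch.from_numpy(np.array(gray, dtype=np.float32) / 255.0)[None, None]
    logits = model.eval()(x)                       # shape (1, 2, 512, 512)
    return logits.argmax(dim=1)[0].byte().numpy()  # per-pixel class index
```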

---

## 💡 Recommendations

To deploy in open-world environments (e.g., mobile robots, outdoor cameras), it is strongly recommended to **retrain or fine-tune** the model using a **more heterogeneous dataset**.

---

## 🔬 Planned Extensions

The following experimental modules are under active development and may be integrated in future releases:

1️⃣ **Embedded Deployment Pipeline**
- Export to ONNX format with float16 precision
- C++ reimplementation targeting edge devices such as ESP32-S3 and STM32H7
- Lightweight modular training script:  
  `scripts/Segmentation/Future/train_embedded_explicit_model.py`  
  *Status: Experimental – not validated in this version*

2️⃣ **Automated Hyperparameter Optimization**
- Training script that performs automatic hyperparameter search and tuning before final training
- Designed to improve efficiency and reduce manual configuration
- Script:  
  `scripts/Segmentation/Future/cyber_train.py`  
  *Status: Experimental – not validated in this version*


---

## 🪪 Licensing

- **Code**: MIT License  
- **Dataset**: Attribution required (as per Kaggle contributor)