---
license: mit
tags:
- image-segmentation
- unet
- resnet
- computer-vision
- pytorch
library_name: transformers
datasets:
- antoreepjana/cv-image-segmentation
inference: false
---
# 🧩 Model Card: Segmentation
## 🧾 Overview
💡 **ResNet + U-Net fusion** combines the deep contextual feature extraction of ResNet with the spatial fidelity and fine-detail precision of U-Net.
It is a versatile, powerful, and high-sensitivity architecture — ideal for projects where **every pixel matters**.
The model excels in scenarios where the object is **small, detailed, or textured**, and where the **global scene context offers little help**.
This makes it ideal for:
- Medical segmentation (e.g., tumors, vessels)
- Industrial defect inspection
- Embedded vision for robotics or precision tasks
⚠️ However, this specific version was trained on a **narrow-domain dataset**, captured under **controlled indoor conditions**: consistent lighting, high-contrast backgrounds, and fixed camera angles.
As a result, its ability to generalize to open-world scenarios (e.g., outdoor environments, variable backgrounds) is limited.
**This is not a flaw in the model**, but a **natural reflection of the training data**.
When retrained with more diverse and realistic datasets, this architecture is highly capable of delivering robust performance across a wide range of segmentation tasks.
---
## ☕ Behind the Scenes
This certification project was built one commit at a time — powered by curiosity, long debugging sessions, strategic doses of caffeine, and great support from **Microsoft Copilot** and **ChatGPT (OpenAI)**, whose insights were essential in structuring the segmentation pipeline and planning its embedded future.
> "Every time the model tries to segment, the square figure resurfaces. Not as an error, but as a reminder: deep learning can be quite shallow when the curse of imperfect geometry sets in.
> And even when all the code is rewritten, the world is realigned, and optimism rises again… there she is: the misshapen quadratic figure.
> Unfazed, unshakeable, perhaps even moved by her own stubbornness. She's not a bug — she's a character."
---
## 🗂️ Dataset
This model was trained using a subset of the [CV Image Segmentation Dataset](https://www.kaggle.com/datasets/antoreepjana/cv-image-segmentation), available on Kaggle.
- **Author**: Antoreep Jana
- **License**: For educational and non-commercial use
- **Content**: 300+ annotated images for binary segmentation
- **Preprocessing**: All images resized to 512×512 and converted to grayscale
⚠️ *Only a filtered and preprocessed subset (car images) was used for this version.*
The dataset contains several distinct subsets; this version was trained only on the images of Carvana cars (Kaggle Carvana Car Mask Segmentation), which served as the test bed for the project.
---
## ⚙️ Model Architecture
- **Encoder**: ResNet-50 (pretrained, adapted for 1-channel input)
- **Decoder**: U-Net with skip connections and bilinear upsampling
- **Input**: Grayscale, 512×512
- **Output**: Binary segmentation mask (background vs. object)
- **Loss**: Composite of `CrossEntropyLoss + DiceLoss`
- **Framework**: PyTorch
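The card states the loss is a composite of `CrossEntropyLoss + DiceLoss` with class-weighted balancing. A minimal sketch of such a composite is shown below; the 1:1 weighting, the smoothing constant, and the class names are assumptions, not the project's actual implementation.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class CEDiceLoss(nn.Module):
    """Hypothetical composite of CrossEntropyLoss and a soft Dice loss."""

    def __init__(self, smooth=1.0, ce_weight=None):
        super().__init__()
        self.ce = nn.CrossEntropyLoss(weight=ce_weight)  # supports class weighting
        self.smooth = smooth

    def forward(self, logits, target):
        # logits: (N, 2, H, W) background/object scores; target: (N, H, W) int64
        ce = self.ce(logits, target)
        probs = F.softmax(logits, dim=1)[:, 1]          # foreground probability
        tgt = (target == 1).float()
        inter = (probs * tgt).sum(dim=(1, 2))
        denom = probs.sum(dim=(1, 2)) + tgt.sum(dim=(1, 2))
        dice = (2 * inter + self.smooth) / (denom + self.smooth)
        return ce + (1 - dice).mean()

criterion = CEDiceLoss()
logits = torch.randn(2, 2, 512, 512)
target = torch.randint(0, 2, (2, 512, 512))
loss = criterion(logits, target)
```

The Dice term directly optimizes region overlap, which compensates for the pixel-wise cross-entropy's tendency to favor the dominant background class.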
---
## 📊 Evaluation Metrics
- Pixel Accuracy (train/val)
- Dice Coefficient
- CrossEntropy Loss
- Class-weighted loss balancing
- *(IoU, MCC, Precision/Recall planned for future integration)*
🧪 Evaluation performed using `evaluate_model.py`
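The evaluation script itself is not reproduced here; a minimal sketch of how Pixel Accuracy and the Dice Coefficient can be computed from hard (argmax) predictions, under assumed tensor shapes:

```python
import torch

def pixel_accuracy(pred, target):
    # pred/target: (N, H, W) integer class maps
    return (pred == target).float().mean().item()

def dice_coefficient(pred, target, eps=1e-7):
    # Binary Dice over the foreground class (label 1).
    p = (pred == 1).float()
    t = (target == 1).float()
    inter = (p * t).sum()
    return ((2 * inter + eps) / (p.sum() + t.sum() + eps)).item()

# Sanity check: a perfect prediction scores 1.0 on both metrics.
mask = torch.randint(0, 2, (1, 512, 512))
acc = pixel_accuracy(mask, mask)
dice = dice_coefficient(mask, mask)
```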
---
## ⚠️ Limitations
This model achieves excellent results when tested on **studio-like images**: consistent lighting, neutral backgrounds, and static perspectives.
However, performance decreases on **unseen outdoor scenarios** (e.g., cars on the street, parking lots) — where background noise, lighting variation, and camera angle impact results.
➡️ This **limitation is dataset-induced**, not architectural.
When trained on more realistic data, this model generalizes well due to its high sensitivity to texture and spatial structure.
---
## 🚀 Intended Use
Best suited for applications where conditions are similar to the training set, such as:
- Quality control in automotive photography studios
- Automated documentation of vehicles in inspection booths
- Offline image processing for structured, grayscale datasets
---
## 💡 Recommendations
To deploy in open-world environments (e.g., mobile robots, outdoor cameras), it is strongly recommended to **retrain or fine-tune** the model using a **more heterogeneous dataset**.
---
## 🔬 Planned Extensions
The following experimental modules are under active development and may be integrated in future releases:
1️⃣ **Embedded Deployment Pipeline**
- Export to ONNX format with float16 precision
- C++ reimplementation targeting edge devices such as ESP32-S3 and STM32H7
- Lightweight modular training script:
`scripts/Segmentation/Future/train_embedded_explicit_model.py`
*Status: Experimental – not validated in this version*
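Since the export pipeline is still experimental, only the float16 conversion step that would precede an ONNX export is sketched here, on a hypothetical stand-in network (the real export would use the trained ResNet + U-Net model):

```python
import torch
import torch.nn as nn

# Hypothetical stand-in for the trained network; the point is only
# the half-precision cast that would precede torch.onnx.export().
model = nn.Sequential(
    nn.Conv2d(1, 8, kernel_size=3, padding=1),
    nn.ReLU(),
    nn.Conv2d(8, 2, kernel_size=1),
).eval()

model_fp16 = model.half()  # casts all parameters and buffers to float16

# Weights are now half precision; inference should run on hardware with
# native fp16 support (CPU half-precision support varies by operator).
print(next(model_fp16.parameters()).dtype)  # torch.float16
```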
2️⃣ **Automated Hyperparameter Optimization**
- Training script that performs automatic hyperparameter search and tuning before final training
- Designed to improve efficiency and reduce manual configuration
- Script:
`scripts/Segmentation/Future/cyber_train.py`
*Status: Experimental – not validated in this version*
---
## 🪪 Licensing
- **Code**: MIT License
- **Dataset**: Attribution required (as per Kaggle contributor)