|
--- |
|
license: mit |
|
tags: |
|
- image-segmentation |
|
- unet |
|
- resnet |
|
- computer-vision |
|
- pytorch |
|
library_name: transformers |
|
datasets: |
|
- antoreepjana/cv-image-segmentation |
|
inference: false |
|
--- |
|
|
|
# ? Model Card ? Segmentation |
|
|
|
|
|
## 🧾 Overview |
|
|
|
💡 **ResNet + U-Net fusion** combines deep and contextual vision (ResNet) with spatial fidelity and precision in details (U-Net). |
|
It is a versatile, powerful, and high-sensitivity architecture — ideal for projects where **every pixel matters**. |
|
|
|
The model excels in scenarios where the object is **small, detailed, or textured**, and where the **global scene context offers little help**. |
|
|
|
This makes it ideal for: |
|
- Medical segmentation (e.g., tumors, vessels) |
|
- Industrial defect inspection |
|
- Embedded vision for robotics or precision tasks |
|
|
|
⚠️ However, this specific version was trained on a **narrow-domain dataset**, captured under **controlled indoor conditions**: consistent lighting, high-contrast backgrounds, and fixed camera angles. |
|
As a result, its ability to generalize to open-world scenarios (e.g., outdoor environments, variable backgrounds) is limited. |
|
|
|
**This is not a flaw in the model**, but a **natural reflection of the training data**. |
|
When retrained with more diverse and realistic datasets, this architecture is highly capable of delivering robust performance across a wide range of segmentation tasks. |
|
|
|
--- |
|
|
|
## ☕ Behind the Scenes |
|
|
|
This certification project was built one commit at a time — powered by curiosity, long debugging sessions, strategic doses of caffeine, and great support from **Microsoft Copilot** and **ChatGPT (OpenAI)**, whose insights were essential in structuring the segmentation pipeline and planning its embedded future. |
|
|
|
> "Every time the model tries to segment, the square figure resurfaces. Not as an error, but as a reminder: deep learning can be quite shallow when the curse of imperfect geometry sets in. |
|
> And even when all the code is rewritten, the world is realigned, and optimism rises again… there she is: the misshapen quadratic figure. |
|
> Unfazed, unshakeable, perhaps even moved by her own stubbornness. She's not a bug — she's a character." |
|
|
|
--- |
|
|
|
## 🗂️ Dataset |
|
|
|
This model was trained using a subset of the [CV Image Segmentation Dataset](https://www.kaggle.com/datasets/antoreepjana/cv-image-segmentation), available on Kaggle. |
|
|
|
- **Author**: Antoreep Jana |
|
- **License**: For educational and non-commercial use |
|
- **Content**: 300+ annotated images for binary segmentation |
|
- **Preprocessing**: All images resized to 512×512 and converted to grayscale |
|
|
|
⚠️ *Only a filtered and preprocessed subset (related to car images) was used for this version.* |
|
The Dataset presents some distinct data subsets. |
|
I only used the images related to carvana cars (Kaggle Carvana Car Mask Segmentation). This was the dataset used to test the project ... |
|
|
|
--- |
|
|
|
## ⚙️ Model Architecture |
|
|
|
- **Encoder**: ResNet-50 (pretrained, adapted for 1-channel input) |
|
- **Decoder**: U-Net with skip connections and bilinear upsampling |
|
- **Input**: Grayscale, 512×512 |
|
- **Output**: Binary segmentation mask (background vs. object) |
|
- **Loss**: Composite of `CrossEntropyLoss + DiceLoss` |
|
- **Framework**: PyTorch |
|
|
|
--- |
|
|
|
## 📊 Evaluation Metrics |
|
|
|
- Pixel Accuracy (train/val) |
|
- Dice Coefficient |
|
- CrossEntropy Loss |
|
- Class-weighted loss balancing |
|
- *(IoU, MCC, Precision/Recall planned for future integration)* |
|
|
|
🧪 Evaluation performed using `evaluate_model.py` |
|
|
|
--- |
|
|
|
## ⚠️ Limitations |
|
|
|
This model achieves excellent results when tested on **studio-like images**: consistent lighting, neutral backgrounds, and static perspectives. |
|
|
|
However, performance decreases on **unseen outdoor scenarios** (e.g., cars on the street, parking lots) — where background noise, lighting variation, and camera angle impact results. |
|
|
|
➡️ This **limitation is dataset-induced**, not architectural. |
|
When trained on more realistic data, this model generalizes well due to its high sensitivity to texture and spatial structure. |
|
|
|
--- |
|
|
|
## 🚀 Intended Use |
|
|
|
Best suited for applications where conditions are similar to the training set, such as: |
|
|
|
- Quality control in automotive photography studios |
|
- Automated documentation of vehicles in inspection booths |
|
- Offline image processing for structured, grayscale datasets |
|
|
|
--- |
|
|
|
## 💡 Recommendations |
|
|
|
To deploy in open-world environments (e.g., mobile robots, outdoor cameras), it is strongly recommended to **retrain or fine-tune** the model using a **more heterogeneous dataset**. |
|
|
|
--- |
|
|
|
## 🔬 Planned Extensions |
|
|
|
The following experimental modules are under active development and may be integrated in future releases: |
|
|
|
1️⃣ **Embedded Deployment Pipeline** |
|
- Export to ONNX format with float16 precision |
|
- C++ reimplementation targeting edge devices such as ESP32-S3 and STM32H7 |
|
- Lightweight modular training script: |
|
`scripts/Segmentation/Future/train_embedded_explicit_model.py` |
|
*Status: Experimental – not validated in this version* |
|
|
|
2️⃣ **Automated Hyperparameter Optimization** |
|
- Training script that performs automatic hyperparameter search and tuning before final training |
|
- Designed to improve efficiency and reduce manual configuration |
|
- Script: |
|
`scripts/Segmentation/Future/cyber_train.py` |
|
*Status: Experimental – not validated in this version* |
|
|
|
|
|
--- |
|
|
|
## 🪪 Licensing |
|
|
|
- **Code**: MIT License |
|
- **Dataset**: Attribution required (as per Kaggle contributor) |
|
|