---
license: mit
tags:
- image-segmentation
- unet
- resnet
- computer-vision
- pytorch
library_name: transformers
datasets:
- antoreepjana/cv-image-segmentation
inference: false
---
# Model Card: Segmentation

## Overview
The ResNet + U-Net fusion combines ResNet's deep, contextual vision with U-Net's spatial fidelity and precision in fine detail.
It is a versatile, powerful, high-sensitivity architecture, ideal for projects where every pixel matters.
The model excels in scenarios where the object is small, detailed, or textured, and where global scene context offers little help.
This makes it ideal for:
- Medical segmentation (e.g., tumors, vessels)
- Industrial defect inspection
- Embedded vision for robotics or precision tasks
⚠️ However, this specific version was trained on a narrow-domain dataset, captured under controlled indoor conditions: consistent lighting, high-contrast backgrounds, and fixed camera angles.
As a result, its ability to generalize to open-world scenarios (e.g., outdoor environments, variable backgrounds) is limited.
This is not a flaw in the model, but a natural reflection of the training data.
When retrained with more diverse and realistic datasets, this architecture is highly capable of delivering robust performance across a wide range of segmentation tasks.
## Behind the Scenes
This certification project was built one commit at a time, powered by curiosity, long debugging sessions, strategic doses of caffeine, and great support from Microsoft Copilot and ChatGPT (OpenAI), whose insights were essential in structuring the segmentation pipeline and planning its embedded future.
"Every time the model tries to segment, the square figure resurfaces. Not as an error, but as a reminder: deep learning can be quite shallow when the curse of imperfect geometry sets in.
And even when all the code is rewritten, the world is realigned, and optimism rises again… there she is: the misshapen quadratic figure.
Unfazed, unshakeable, perhaps even moved by her own stubbornness. She's not a bug; she's a character."
## Dataset
This model was trained using a subset of the CV Image Segmentation Dataset, available on Kaggle.
- Author: Antoreep Jana
- License: For educational and non-commercial use
- Content: 300+ annotated images for binary segmentation
- Preprocessing: All images resized to 512×512 and converted to grayscale
⚠️ Only a filtered and preprocessed subset was used for this version. The dataset contains several distinct subsets; only the images of Carvana cars (Kaggle Carvana Car Mask Segmentation) were used, and that subset served as the test data for the project.
## Model Architecture
- Encoder: ResNet-50 (pretrained, adapted for 1-channel input)
- Decoder: U-Net with skip connections and bilinear upsampling
- Input: Grayscale, 512×512
- Output: Binary segmentation mask (background vs. object)
- Loss: Composite CrossEntropyLoss + DiceLoss
- Framework: PyTorch
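A minimal sketch of the composite loss, assuming two-class logits of shape (N, 2, H, W) and integer masks of shape (N, H, W); the class name, smoothing constant, and weighting scheme are illustrative, not the exact training code.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class CombinedLoss(nn.Module):
    """CrossEntropy + soft Dice loss for binary segmentation (2 output classes)."""

    def __init__(self, dice_weight: float = 1.0, class_weights: torch.Tensor = None):
        super().__init__()
        # class_weights enables the class-weighted balancing mentioned above.
        self.ce = nn.CrossEntropyLoss(weight=class_weights)
        self.dice_weight = dice_weight

    def forward(self, logits: torch.Tensor, target: torch.Tensor) -> torch.Tensor:
        ce = self.ce(logits, target)
        # Soft Dice on the foreground-class probability map.
        probs = F.softmax(logits, dim=1)[:, 1]          # (N, H, W)
        tgt = (target == 1).float()
        inter = (probs * tgt).sum(dim=(1, 2))
        union = probs.sum(dim=(1, 2)) + tgt.sum(dim=(1, 2))
        dice = (2 * inter + 1e-6) / (union + 1e-6)
        return ce + self.dice_weight * (1 - dice).mean()
```

The Dice term rewards overlap with the foreground mask directly, which compensates for the class imbalance that CrossEntropy alone handles poorly on small objects.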
## Evaluation Metrics
- Pixel Accuracy (train/val)
- Dice Coefficient
- CrossEntropy Loss
- Class-weighted loss balancing
- (IoU, MCC, Precision/Recall planned for future integration)
Evaluation performed using `evaluate_model.py`
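The two reported metrics can be computed as below; this is an illustrative sketch with hypothetical function names, not the contents of `evaluate_model.py`.

```python
import numpy as np

def pixel_accuracy(pred: np.ndarray, target: np.ndarray) -> float:
    """Fraction of pixels whose predicted class matches the ground truth."""
    return float((pred == target).mean())

def dice_coefficient(pred: np.ndarray, target: np.ndarray, eps: float = 1e-6) -> float:
    """Dice overlap between binary masks (1 = foreground): 2|A∩B| / (|A| + |B|)."""
    inter = np.logical_and(pred == 1, target == 1).sum()
    return float((2 * inter + eps) / ((pred == 1).sum() + (target == 1).sum() + eps))
```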
## Limitations
This model achieves excellent results when tested on studio-like images: consistent lighting, neutral backgrounds, and static perspectives.
However, performance decreases on unseen outdoor scenarios (e.g., cars on the street, parking lots), where background noise, lighting variation, and camera angle impact results.
This limitation is dataset-induced, not architectural.
When trained on more realistic data, this model generalizes well due to its high sensitivity to texture and spatial structure.
## Intended Use
Best suited for applications where conditions are similar to the training set, such as:
- Quality control in automotive photography studios
- Automated documentation of vehicles in inspection booths
- Offline image processing for structured, grayscale datasets
## Recommendations
To deploy in open-world environments (e.g., mobile robots, outdoor cameras), it is strongly recommended to retrain or fine-tune the model using a more heterogeneous dataset.
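A common way to fine-tune this kind of encoder-decoder model on a new dataset is to freeze the pretrained ResNet encoder and retrain only the decoder. The sketch below assumes the model exposes `encoder`/`decoder` submodules; `TinySegNet` is a hypothetical stand-in, not the released model.

```python
import torch
import torch.nn as nn

def freeze_encoder(model: nn.Module) -> None:
    """Disable gradients for the encoder so only the decoder is fine-tuned."""
    for param in model.encoder.parameters():
        param.requires_grad = False

# Hypothetical minimal model with encoder/decoder submodules.
class TinySegNet(nn.Module):
    def __init__(self):
        super().__init__()
        self.encoder = nn.Conv2d(1, 8, 3, padding=1)   # stand-in for ResNet-50
        self.decoder = nn.Conv2d(8, 2, 3, padding=1)   # stand-in for the U-Net head

    def forward(self, x):
        return self.decoder(self.encoder(x))
```

The optimizer is then built only over the parameters that still require gradients, e.g. `torch.optim.Adam(p for p in model.parameters() if p.requires_grad)`.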
## Planned Extensions
The following experimental modules are under active development and may be integrated in future releases:
### 1. Embedded Deployment Pipeline
- Export to ONNX format with float16 precision
- C++ reimplementation targeting edge devices such as ESP32-S3 and STM32H7
- Lightweight modular training script: `scripts/Segmentation/Future/train_embedded_explicit_model.py`

Status: Experimental, not validated in this version
### 2. Automated Hyperparameter Optimization
- Training script that performs automatic hyperparameter search and tuning before final training
- Designed to improve efficiency and reduce manual configuration
- Script: `scripts/Segmentation/Future/cyber_train.py`

Status: Experimental, not validated in this version
## Licensing
- Code: MIT License
- Dataset: Attribution required (as per Kaggle contributor)