	---
	license: mit
	tags:
	- image-segmentation
	- unet
	- resnet
	- computer-vision
	- pytorch
	library_name: transformers
	datasets:
	- antoreepjana/cv-image-segmentation
	inference: false
	---

	# Model Card: Segmentation


	## 🧾 Overview

	💡 The ResNet + U-Net fusion combines ResNet's deep, contextual feature extraction with U-Net's spatial fidelity and fine-detail precision.
	It is a versatile, high-sensitivity architecture, well suited to projects where every pixel matters.

	The model excels in scenarios where the object is small, detailed, or textured, and where the global scene context offers little help.

	This makes it ideal for:
	- Medical segmentation (e.g., tumors, vessels)
	- Industrial defect inspection
	- Embedded vision for robotics or precision tasks

	⚠️ However, this specific version was trained on a narrow-domain dataset, captured under controlled indoor conditions: consistent lighting, high-contrast backgrounds, and fixed camera angles.
	As a result, its ability to generalize to open-world scenarios (e.g., outdoor environments, variable backgrounds) is limited.

	This is not a flaw in the model, but a natural reflection of the training data.
	When retrained on more diverse and realistic datasets, this architecture should be capable of robust performance across a wider range of segmentation tasks.

	---

	## ☕ Behind the Scenes

	This certification project was built one commit at a time — powered by curiosity, long debugging sessions, strategic doses of caffeine, and great support from Microsoft Copilot and ChatGPT (OpenAI), whose insights were essential in structuring the segmentation pipeline and planning its embedded future.

	> "Every time the model tries to segment, the square figure resurfaces. Not as an error, but as a reminder: deep learning can be quite shallow when the curse of imperfect geometry sets in.
	> And even when all the code is rewritten, the world is realigned, and optimism rises again… there she is: the misshapen quadratic figure.
	> Unfazed, unshakeable, perhaps even moved by her own stubbornness. She's not a bug — she's a character."

	---

	## 🗂️ Dataset

	This model was trained using a subset of the [CV Image Segmentation Dataset](https://www.kaggle.com/datasets/antoreepjana/cv-image-segmentation), available on Kaggle.

	- Author: Antoreep Jana
	- License: For educational and non-commercial use
	- Content: 300+ annotated images for binary segmentation
	- Preprocessing: All images resized to 512×512 and converted to grayscale

	⚠️ Only a filtered and preprocessed subset was used for this version.
	The dataset contains several distinct subsets; only the Carvana car images (Kaggle Carvana Car Mask Segmentation) were used. This was the dataset used to test the project ...
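
The preprocessing described above (grayscale conversion and a 512×512 resize) could be sketched roughly as follows. This is an illustration, not the project's actual pipeline: the function names `preprocess_image` and `preprocess_mask` are hypothetical, and both the BT.601 luma weights and the nearest-neighbour mask resize are assumptions, since the card does not say which conversion or interpolation was used.

```python
import torch
import torch.nn.functional as F

# ITU-R BT.601 luma weights: a common RGB -> grayscale choice (assumed here).
LUMA = torch.tensor([0.299, 0.587, 0.114])


def preprocess_image(rgb: torch.Tensor, size: int = 512) -> torch.Tensor:
    """(3, H, W) uint8 RGB -> (1, size, size) float32 grayscale in [0, 1]."""
    gray = (rgb.float() / 255.0 * LUMA.view(3, 1, 1)).sum(dim=0, keepdim=True)
    # interpolate expects a batch dimension, so work on (1, 1, H, W).
    resized = F.interpolate(gray.unsqueeze(0), size=(size, size),
                            mode="bilinear", align_corners=False)
    return resized.squeeze(0).clamp(0.0, 1.0)


def preprocess_mask(mask: torch.Tensor, size: int = 512) -> torch.Tensor:
    """(H, W) mask with nonzero = object -> (size, size) int64 labels in {0, 1}."""
    binary = (mask > 0).float()[None, None]  # (1, 1, H, W)
    # Nearest-neighbour keeps the labels binary; bilinear would blur the edges
    # into fractional values.
    resized = F.interpolate(binary, size=(size, size), mode="nearest")
    return resized[0, 0].long()
```

Resizing images and masks with different interpolation modes is a deliberate choice in this sketch: smooth interpolation suits intensities, while label maps need to stay strictly 0 or 1.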

	---

	## ⚙️ Model Architecture

	- Encoder: ResNet-50 (pretrained, adapted for 1-channel input)
	- Decoder: U-Net with skip connections and bilinear upsampling
	- Input: Grayscale, 512×512
	- Output: Binary segmentation mask (background vs. object)
	- Loss: Composite of `CrossEntropyLoss + DiceLoss`
	- Framework: PyTorch

	---

	## 📊 Evaluation Metrics

	- Pixel Accuracy (train/val)
	- Dice Coefficient
	- CrossEntropy Loss
	- Class-weighted loss balancing
	- (IoU, MCC, Precision/Recall planned for future integration)
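
The composite `CrossEntropyLoss + DiceLoss` objective, together with the class weighting listed above, might look roughly like this. PyTorch has no built-in `DiceLoss`, so the soft Dice formulation below is one common variant and an assumption about what the project uses; `CEDiceLoss`, `dice_weight`, and the example class weights in the usage note are likewise hypothetical.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class DiceLoss(nn.Module):
    """Soft Dice loss over softmax probabilities, averaged across classes."""

    def __init__(self, eps: float = 1e-6):
        super().__init__()
        self.eps = eps

    def forward(self, logits: torch.Tensor, target: torch.Tensor) -> torch.Tensor:
        num_classes = logits.shape[1]
        probs = logits.softmax(dim=1)                       # (N, C, H, W)
        onehot = F.one_hot(target, num_classes).permute(0, 3, 1, 2).float()
        dims = (0, 2, 3)                                    # sum over batch and pixels
        inter = (probs * onehot).sum(dims)
        denom = probs.sum(dims) + onehot.sum(dims)
        dice = (2 * inter + self.eps) / (denom + self.eps)  # one value per class
        return 1 - dice.mean()


class CEDiceLoss(nn.Module):
    """Class-weighted cross-entropy plus soft Dice (unweighted sum by default)."""

    def __init__(self, class_weights=None, dice_weight: float = 1.0):
        super().__init__()
        self.ce = nn.CrossEntropyLoss(weight=class_weights)
        self.dice = DiceLoss()
        self.dice_weight = dice_weight

    def forward(self, logits: torch.Tensor, target: torch.Tensor) -> torch.Tensor:
        return self.ce(logits, target) + self.dice_weight * self.dice(logits, target)
```

For example, `CEDiceLoss(class_weights=torch.tensor([0.3, 0.7]))` would weight object pixels more heavily than background; the actual weights used in training are not stated in this card.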

	🧪 Evaluation performed using `evaluate_model.py`
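
`evaluate_model.py` is not reproduced here. As a rough sketch of how the listed metrics (and the planned IoU, MCC, and precision/recall) can be computed for a binary mask, assuming label 1 means object and 0 means background, consider the hypothetical helper below.

```python
import math

import torch


def segmentation_metrics(pred: torch.Tensor, target: torch.Tensor,
                         eps: float = 1e-7) -> dict:
    """pred, target: integer masks of the same shape (1 = object, 0 = background)."""
    pred, target = pred.bool(), target.bool()
    # Confusion-matrix counts over all pixels.
    tp = (pred & target).sum().item()
    fp = (pred & ~target).sum().item()
    fn = (~pred & target).sum().item()
    tn = (~pred & ~target).sum().item()
    mcc_den = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return {
        "pixel_accuracy": (tp + tn) / (tp + tn + fp + fn),
        "dice": (2 * tp + eps) / (2 * tp + fp + fn + eps),
        "iou": (tp + eps) / (tp + fp + fn + eps),
        "precision": (tp + eps) / (tp + fp + eps),
        "recall": (tp + eps) / (tp + fn + eps),
        # MCC is undefined when any marginal count is zero; report 0.0 then.
        "mcc": (tp * tn - fp * fn) / mcc_den if mcc_den > 0 else 0.0,
    }
```

Pixel accuracy alone can look high when the background dominates, which is one reason to report Dice or IoU alongside it.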

	---

	## ⚠️ Limitations

	This model performs best on studio-like images: consistent lighting, neutral backgrounds, and static perspectives.

	However, performance decreases on unseen outdoor scenes (e.g., cars on the street, parking lots), where background noise, lighting variation, and camera angle all affect results.

	➡️ This limitation is dataset-induced, not architectural.
	When trained on more realistic data, the architecture is expected to generalize better, thanks to its sensitivity to texture and spatial structure.

	---

	## 🚀 Intended Use

	Best suited for applications where conditions are similar to the training set, such as:

	- Quality control in automotive photography studios
	- Automated documentation of vehicles in inspection booths
	- Offline image processing for structured, grayscale datasets

	---

	## 💡 Recommendations

	To deploy in open-world environments (e.g., mobile robots, outdoor cameras), it is strongly recommended to retrain or fine-tune the model using a more heterogeneous dataset.
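
One common way to fine-tune on a new domain is to update the pretrained encoder with a much smaller learning rate than the decoder. The sketch below illustrates that idea only: `TinySegNet` is a hypothetical stand-in for the real model, the random tensors stand in for new-domain images and masks, and the learning rates are placeholder values, not tuned recommendations.

```python
import torch
import torch.nn as nn


class TinySegNet(nn.Module):
    """Hypothetical stand-in with an encoder/decoder split like the real model."""

    def __init__(self):
        super().__init__()
        self.encoder = nn.Sequential(nn.Conv2d(1, 8, 3, padding=1), nn.ReLU())
        self.decoder = nn.Conv2d(8, 2, kernel_size=1)

    def forward(self, x):
        return self.decoder(self.encoder(x))


def make_finetune_optimizer(model, encoder_lr=1e-5, decoder_lr=1e-3):
    # Separate parameter groups: small steps for the encoder so existing
    # features are preserved, larger steps for the decoder.
    return torch.optim.AdamW([
        {"params": model.encoder.parameters(), "lr": encoder_lr},
        {"params": model.decoder.parameters(), "lr": decoder_lr},
    ])


model = TinySegNet()
optimizer = make_finetune_optimizer(model)
criterion = nn.CrossEntropyLoss()

images = torch.rand(4, 1, 32, 32)         # stand-in for new-domain grayscale images
masks = torch.randint(0, 2, (4, 32, 32))  # stand-in for their binary masks

model.train()
for step in range(3):
    optimizer.zero_grad()
    loss = criterion(model(images), masks)
    loss.backward()
    optimizer.step()
```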

	---

	## 🔬 Planned Extensions

	The following experimental modules are under active development and may be integrated in future releases:

	1️⃣ Embedded Deployment Pipeline
	- Export to ONNX format with float16 precision
	- C++ reimplementation targeting edge devices such as ESP32-S3 and STM32H7
	- Lightweight modular training script: `scripts/Segmentation/Future/train_embedded_explicit_model.py`
	- Status: Experimental, not validated in this version

	2️⃣ Automated Hyperparameter Optimization
	- Training script that performs automatic hyperparameter search and tuning before final training
	- Designed to improve efficiency and reduce manual configuration
	- Script: `scripts/Segmentation/Future/cyber_train.py`
	- Status: Experimental, not validated in this version


	---

	## 🪪 Licensing

	- Code: MIT License
	- Dataset: Attribution required (as per Kaggle contributor)