---
license: mit
tags:
- image-segmentation
- unet
- resnet
- computer-vision
- pytorch
library_name: transformers
datasets:
- antoreepjana/cv-image-segmentation
inference: false
---
# Image Segmentation with ResNet + U-Net
The ResNet + U-Net fusion combines the deep, contextual vision of ResNet with the spatial fidelity and fine detail of U-Net.
It is a versatile, powerful, and highly sensitive architecture, ideal for projects where every pixel matters.
The model shines in scenarios where the object is small, detailed, or textured, and where the global context (the whole scene) does not help much.
This makes it ideal for:
- Medical segmentation (e.g., tumors, vessels)
- Industrial defect inspection
- Embedded vision for robotics or quality control
However, this version was trained on a **narrow-domain dataset**, collected under controlled indoor conditions: consistent lighting, high-contrast backgrounds, and fixed camera angles. As a result, its ability to generalize to open-world scenarios (e.g., outdoor images, different backgrounds) is limited.
**This is not a flaw of the model**, but a **natural reflection of its training data**. When retrained with more diverse and realistic datasets, this architecture has strong potential for robust performance in general-purpose segmentation tasks.
## Class Convention
This project follows this convention:
- Class 0: Background
- Class 1: Segmented Object
All masks were converted to reflect this convention before training.
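For reference, here is a minimal sketch of remapping a raw mask to this convention. The threshold and the OpenCV-based reading are illustrative assumptions, not the exact logic of `masks.py`:

```python
import cv2
import numpy as np

def to_class_convention(mask_path: str, threshold: int = 127) -> np.ndarray:
    """Remap a raw mask to the project convention: 0 = background, 1 = object."""
    mask = cv2.imread(mask_path, cv2.IMREAD_GRAYSCALE)  # read as single channel
    if mask is None:
        raise FileNotFoundError(mask_path)
    # Any pixel above the (assumed) threshold is treated as the segmented object (class 1).
    return (mask > threshold).astype(np.uint8)
```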
## Limitations and Considerations
This model was trained with images captured in a highly controlled environment: constant lighting, a clean background, and objects (cars) positioned on a rotating platform.
As a result, it achieves very high accuracy (IoU > 99%) when evaluated on images similar to those in the original dataset. However, its performance deteriorates significantly when exposed to images collected outdoors, with variations in light, angle, background, and perspective.
This limitation was expected and will be taken into account for future versions with more diverse datasets.
Good image:
![good_image.png: Segmentation under ideal studio lighting](good_image.png)
Bad image:
![Failure example with open-world street background](bad_image.png)
## Objective
To segment objects in custom grayscale images based on manual annotations, using a complete training pipeline, automated inference, and visual mask validation.
## Notes on Development
This project was born after many hours of experimentation, learning and progress driven by caffeine.
Unlike other projects I have worked on before, this one evolved incredibly quickly thanks to the support of AI assistants such as Copilot (Microsoft) and ChatGPT (OpenAI). Without a doubt, these are tools that are way ahead of their time.
As part of the experience of using and learning from these advanced AI tools, I posed the same problems to both of them to measure their performance and compare their responses. To make the experience more fun, I kept an extremely formal dialogue with one and a completely informal one with the other, just to see how they would react. After a while, I reversed the roles, becoming informal with the one that had been formal and vice versa.
Big thanks to both copilots: one named Microsoft, the other simply GPT.
- Powered by: PyTorch, Gradio, OpenCV, Matplotlib, and Hugging Face Datasets
## Project Structure
```text
.
├── run_app.py
├── bad_image.png
├── CHANGELOG.md
├── checkpoints
│   ├── best_model.pt
│   └── modelo_completo.pth
├── DataSet
│   ├── annotations
│   │   └── classes.txt
│   ├── ExtraTests
│   ├── images
│   └── masks
├── dice_history.png
├── run_evaluate.py
├── good_image.png
├── __init__.py
├── iou_history.png
├── LICENSE
├── model_card.md
├── .huggingface
│   └── model-index.yaml
├── README.md
├── report_file.txt
├── requirements.txt
├── scripts
│   ├── config.py
│   ├── Dataset
│   │   ├── ConvertFormat.py
│   │   ├── dataAugmentation.py
│   │   ├── deleteDuplicates.py
│   │   ├── getDS_HuggingFace.py
│   │   ├── getImages.py
│   │   ├── grays.py
│   │   ├── __init__.py
│   │   ├── mask_diagnosis.py
│   │   ├── masks.py
│   │   ├── Rename.py
│   │   ├── Resize.py
│   │   ├── TrainVal.py
│   │   └── validMasks.py
│   ├── __init__.py
│   └── Segmentation
│       ├── app.py
│       ├── augment.py
│       ├── diceLossCriterion.py
│       ├── evaluate_model.py
│       ├── flagged
│       ├── focalLoss.py
│       ├── Future
│       ├── __init__.py
│       ├── models.py
│       ├── segDS.py
│       └── train.py
├── structure.txt
├── training_loss.png
└── training_val_accuracy.png
```
### Root Directory
| Name | Description |
|--------------------------|-----------------------------------------------------------------------------|
| `run_app.py` | Launcher script for the Gradio inference interface (wraps `scripts/Segmentation/app.py`) |
| `bad_image.png` | Example of a failed prediction (for benchmarking or documentation) |
| `good_image.png` | Example of a successful prediction (used for showcasing model quality) |
| `CHANGELOG.md` | History of changes and version updates |
| `checkpoints/` | Contains trained model files (`best_model.pt`, `modelo_completo.pth`) |
| `DataSet/` | Contains training images, masks, annotations, and extra test sets |
| `dice_history.png` | Visualization of Dice score progression during training |
| `iou_history.png` | Graph of Intersection over Union (IoU) evolution across epochs |
| `training_loss.png` | Plot showing model loss evolution throughout training |
| `training_val_accuracy.png` | Graph of validation accuracy during model training |
| `run_evaluate.py` | Evaluation script runnable from the root; wraps `scripts/Segmentation/evaluate_model.py` |
| `__init__.py` | Declares root as a Python package (if imported externally) |
| `LICENSE` | Legal terms for usage and redistribution |
| `model_card.md` | Technical summary of model details, performance, and intended use |
| `.huggingface/model-index.yaml` | Configuration file for Hugging Face model registry (optional export) |
| `README.md` | Main documentation file: project overview, usage, and setup guide |
| `report_file.txt` | Training log and report output saved during execution |
| `requirements.txt` | List of dependencies needed for running the project |
| `scripts/` | Main logic for training, evaluation, dataset preparation, and modeling |
| `structure.txt` | Manual export of the folder structure, used as reference or debug aid |
### DataSet/
| Name | Description |
|-------------------|---------------------------------------------------------------------------------|
| `annotations/` | Contains `classes.txt`, defining class labels used in segmentation |
| `images/` | Input images used for training and evaluation |
| `masks/` | Segmentation masks aligned with input images |
| `ExtraTests/` | Optional dataset with additional test cases for generalization assessment |
### scripts/
| Name | Description |
|----------------------|-------------------------------------------------------------------------------|
| `config.py` | Configuration module holding paths, flags, and hyperparameters |
| `__init__.py` | Declares `scripts/` as an importable Python module |
### scripts/Dataset/
| Name | Description |
|------------------------|-----------------------------------------------------------------------------|
| `ConvertFormat.py` | Converts image or annotation formats (e.g. from JPG to PNG, or COCO to mask) |
| `dataAugmentation.py` | Applies offline augmentations to images or masks |
| `deleteDuplicates.py` | Detects and removes duplicate samples |
| `getDS_HuggingFace.py` | Downloads datasets from the Hugging Face Hub |
| `getImages.py` | Retrieves or organizes images from storage |
| `grays.py` | Converts images to grayscale |
| `mask_diagnosis.py` | Validates and diagnoses potential issues in masks |
| `masks.py` | Performs manipulation or binarization of segmentation masks |
| `Rename.py` | Batch renaming utility to standardize filenames |
| `Resize.py` | Resizes images and masks to uniform dimensions |
| `TrainVal.py` | Performs dataset train/validation splitting |
| `validMasks.py` | Checks for validity in mask formatting and values |
| `__init__.py` | Declares `Dataset/` as a Python package |
### scripts/Segmentation/
| Name | Description |
|------------------------|-----------------------------------------------------------------------------|
| `app.py` | Gradio interface for local, interactive model inference |
| `augment.py` | Online augmentations and Test-Time Augmentation (TTA) |
| `diceLossCriterion.py` | Custom Dice Loss implementation for segmentation |
| `focalLoss.py` | Custom Focal Loss implementation to handle class imbalance |
| `evaluate_model.py` | Model evaluator with metrics like IoU, Dice, and pixel accuracy |
| `models.py` | Contains neural network architecture (e.g. UNet based on ResNet) |
| `segDS.py` | Dataset class for segmentation tasks, loading images and masks |
| `train.py` | Main training script with logging, plotting, checkpointing, and early stop |
| `Future/` | Experimental code including auto hyperparameter tuning |
| `flagged/` | Optional output folder for flagged evaluations or debug samples |
| `__init__.py` | Declares `Segmentation/` as a Python package |
## Dataset
This project uses data from the [CV Image Segmentation Dataset](https://www.kaggle.com/datasets/antoreepjana/cv-image-segmentation), which provides paired images and masks for semantic segmentation tasks.
The dataset contains several distinct subsets. Only the images related to Carvana cars (the Kaggle Carvana Car Mask Segmentation subset) were used; this is the data on which the project was developed and tested.
This subset was preprocessed with the project scripts in the following order (a scripted version of the same sequence is sketched after the list):
1. Run `getImages.py` (or use other data sources).
2. Visually inspect the collected images.
3. Run `deleteDuplicates.py`.
4. Run `ConvertFormat.py`.
5. Run `Resize.py` (must be run for both the image and mask directories).
6. Run `grays.py` (must be run for both the image and mask directories).
7. Make annotations.
8. Run `masks.py`.
9. Run `validMasks.py`.
10. Run `TrainVal.py`.
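The automatable part of that sequence can be scripted. Below is a minimal sketch that assumes the scripts are run from the repository root and take their paths from `config.py` rather than from command-line arguments; the manual steps (visual inspection, annotation) are not covered:

```python
import subprocess
import sys

# Preprocessing scripts in the order listed above. Two manual steps are not
# automated here: visual inspection after getImages.py and annotation after grays.py.
PIPELINE = [
    "scripts/Dataset/getImages.py",
    "scripts/Dataset/deleteDuplicates.py",
    "scripts/Dataset/ConvertFormat.py",
    "scripts/Dataset/Resize.py",   # run for both the image and mask directories
    "scripts/Dataset/grays.py",    # run for both the image and mask directories
    "scripts/Dataset/masks.py",
    "scripts/Dataset/validMasks.py",
    "scripts/Dataset/TrainVal.py",
]

for script in PIPELINE:
    print(f"Running {script} ...")
    subprocess.run([sys.executable, script], check=True)  # stop on the first failure
```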
---
## Model
* Architecture: ResNet encoder + U-Net decoder
* Input: 1-channel grayscale, resized to 512×512
* Loss: Cross Entropy Loss with class weighting
* Optimizer: Adam
* Scheduler: StepLR with decay
* Training duration: configurable (default: 400 epochs)
* Early Stopping: based on accuracy stagnation
* Checkpoints: saved every N epochs + best model saved
Training script: `scripts/Segmentation/train.py`
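For orientation, here is a condensed sketch of the training configuration described above. The class weights, learning rate, scheduler step, and patience are placeholder values, and the model, data loaders, and `evaluate` helper are assumed to come from `models.py`, `segDS.py`, and `evaluate_model.py`:

```python
import torch
from torch import nn, optim

def train(model, train_loader, val_loader, evaluate, epochs=400, patience=20):
    """Training loop matching the configuration listed above.

    `model` is expected to map (B, 1, 512, 512) grayscale batches to
    (B, 2, 512, 512) logits; `evaluate` returns a validation pixel accuracy.
    """
    device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
    model = model.to(device)

    # Cross entropy with class weighting (background vs. object); the weights are illustrative.
    class_weights = torch.tensor([0.3, 0.7], device=device)
    criterion = nn.CrossEntropyLoss(weight=class_weights)

    optimizer = optim.Adam(model.parameters(), lr=1e-4)            # placeholder learning rate
    scheduler = optim.lr_scheduler.StepLR(optimizer, step_size=50, gamma=0.5)  # StepLR decay

    best_acc, stale = 0.0, 0
    for epoch in range(epochs):
        model.train()
        for images, masks in train_loader:
            images, masks = images.to(device), masks.to(device).long()
            optimizer.zero_grad()
            loss = criterion(model(images), masks)
            loss.backward()
            optimizer.step()
        scheduler.step()

        val_acc = evaluate(model, val_loader)
        if val_acc > best_acc:
            best_acc, stale = val_acc, 0
            torch.save(model.state_dict(), "checkpoints/best_model.pt")  # keep the best model
        else:
            stale += 1
            if stale >= patience:  # early stopping on accuracy stagnation
                break
    return best_acc
```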
Evaluation scripts:
* `scripts/Segmentation/evaluate_model.py`: Batch evaluation over image folders
* `scripts/Segmentation/app.py`: Gradio demo for interactive inference
* `run_app.py`: Wrapper script to launch the Gradio interface from the root directory (calls scripts/Segmentation/app.py internally)
* `run_evaluate.py`: Wrapper script to launch batch evaluation from the root directory (calls `scripts/Segmentation/evaluate_model.py` internally)
The model is documented and registered via `model-index.yaml` for proper listing on the Hugging Face Hub.
---
## Evaluation
Quantitative metrics include:
* Intersection over Union (IoU)
* Dice coefficient
* Accuracy, Precision, Recall
* Balanced Accuracy and MCC
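For reference, the two headline metrics (IoU and Dice) can be computed from binary masks as in the minimal sketch below; this is independent of the actual implementation in `evaluate_model.py`:

```python
import numpy as np

def iou_and_dice(pred: np.ndarray, target: np.ndarray) -> tuple[float, float]:
    """Compute IoU and Dice for binary masks (0 = background, 1 = object)."""
    pred, target = pred.astype(bool), target.astype(bool)
    intersection = np.logical_and(pred, target).sum()
    union = np.logical_or(pred, target).sum()
    total = pred.sum() + target.sum()
    iou = intersection / union if union else 1.0   # two empty masks count as a perfect match
    dice = 2 * intersection / total if total else 1.0
    return float(iou), float(dice)
```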
Visual inspection is supported via overlay masks in the ExtraTests/ folder.
Training curves (loss, validation accuracy, IoU history, and Dice history) are included as PNG plots in the repository root.
---
## Future Work
The directory `scripts/Segmentation/Future/` includes planned extensions for embedded deployment:
* `train_embedded_explicit_model.py`: A simplified and modular training script for generating lightweight ONNX models.
Note: This script was not executed or validated during this certification phase.
---
## Deployment Options
This project includes two scripts for model evaluation:
### Batch Evaluation Script (`evaluate_model.py`)
Use this script to run the model on an entire directory of test images. Ideal for debugging, validation, and quantitative analysis.
```bash
python evaluate_model.py --input ./your-test-images/
```
You can modify this script to save prediction masks, compute metrics (IoU, pixel accuracy), or visualize results in batch.
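As a starting point for such a modification, here is a hedged sketch of batch inference that saves predicted masks. The preprocessing (grayscale, 512×512, scaling to [0, 1]), the file pattern, and the output format are assumptions and may differ from the real pipeline:

```python
import glob
import os

import cv2
import numpy as np
import torch

def predict_folder(model, input_dir: str, output_dir: str, size: int = 512):
    """Run the model on every image in a folder and save the predicted masks."""
    os.makedirs(output_dir, exist_ok=True)
    device = next(model.parameters()).device
    model.eval()

    for path in sorted(glob.glob(os.path.join(input_dir, "*.png"))):  # assumed file pattern
        img = cv2.imread(path, cv2.IMREAD_GRAYSCALE)
        img = cv2.resize(img, (size, size)).astype(np.float32) / 255.0  # assumed normalisation
        tensor = torch.from_numpy(img)[None, None].to(device)           # shape (1, 1, H, W)

        with torch.no_grad():
            pred = model(tensor).argmax(dim=1)[0].cpu().numpy()         # (H, W) class map

        out_path = os.path.join(output_dir, os.path.basename(path))
        cv2.imwrite(out_path, (pred * 255).astype(np.uint8))            # white = object
```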
---
### Interactive Web Demo (`app.py`)
This script provides an interactive interface using [Gradio](https://www.gradio.app/). It's designed for easy deployment and model demonstration, such as on Hugging Face Spaces.
To launch the web app locally:
```bash
python app.py
```
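A minimal Gradio interface along these lines could look as follows; the `segment` function is a placeholder, not the actual contents of `app.py`:

```python
import gradio as gr
import numpy as np

def segment(image: np.ndarray) -> np.ndarray:
    """Placeholder: the real app would preprocess, run the model, and post-process here."""
    return image

demo = gr.Interface(
    fn=segment,
    inputs=gr.Image(type="numpy", label="Input image"),
    outputs=gr.Image(type="numpy", label="Predicted mask"),
    title="ResNet + U-Net segmentation",
)

if __name__ == "__main__":
    demo.launch()
```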
Or try it online (if hosted):
[Live demo on Hugging Face Spaces](https://huggingface.co/spaces/seu-usuario/seu-modelo) *TODO: link will be updated after submission*
This interface allows anyone to upload an image and instantly see the segmentation results, with no installation required.
---
**Tip**: Use `evaluate_model.py` during development and testing, and `app.py` for sharing and showcasing your model.
---
## Certification Context
This repository was submitted for the Hugging Face Computer Vision Certification and is built upon reproducibility, modularity, dataset transparency, and technical rigor.
---
## License
This project is licensed under the MIT License.
Dataset usage must comply with the original Kaggle dataset license terms.
---
## Future improvements
Some steps are already planned for the project's evolution:
* Architecture refinement: test lighter variants (e.g. ResNet18, MobileNetV3) to compare performance in embedded environments.
* Training with data augmentation: use Data Augmentation strategies (rotation, noise, scale, brightness) to increase model robustness.
* Cross-validation: include a cross-validation strategy to increase confidence in metrics.
* Conversion to ONNX/TensorRT: prepare an exportable version of the model for inference on edge devices (a minimal export sketch follows below).
* Deployment on specific hardware: test inference on ESP32-S3 or Raspberry Pi using a simplified pipeline with float16.
* Visualization interface: create a simple script or panel that allows you to upload an image and view the segmentation live.
These improvements will be implemented as the project progresses, keeping the focus on lightness, modularity, and real applicability in computer vision with monochromatic images.
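As a starting point for the ONNX export item above, here is a minimal sketch. The input shape matches the model card; the opset version and file name are arbitrary choices:

```python
import torch

def export_onnx(model, path: str = "checkpoints/model.onnx"):
    """Export the trained model to ONNX for edge inference."""
    model = model.eval().cpu()
    dummy = torch.zeros(1, 1, 512, 512)  # 1-channel grayscale, 512x512, per the model card
    torch.onnx.export(
        model,
        dummy,
        path,
        input_names=["image"],
        output_names=["logits"],
        opset_version=17,                # arbitrary choice
        dynamic_axes={"image": {0: "batch"}, "logits": {0: "batch"}},
    )
```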
---
## Final thoughts: why this certification matters
This project represents more than just completing a technical challenge. For me, it is the fulfillment of a long-held dream: to earn a professional certification that values knowledge, practice, and the ability to solve real-world problems, rather than just familiarity with specific versions of tools or frameworks.
For many years, I experienced the frustrating side of commercial certifications that felt more like traps than opportunities: exams based on obsolete technologies, questionable testing centers, and mechanisms that created more obstacles than recognition. That never represented who I am, or what I am capable of building.
This certification, promoted by Hugging Face, is different. It validates true competencies in machine learning and computer vision based on a real-world project, executed end-to-end. It is a type of recognition that carries technical, ethical, and personal value.
That is why it is not "just another delivery." It is a turning point.
---
## Important notes
1) The IDE used in the project was Eclipse (https://eclipseide.org/) with the PyDev module (https://www.pydev.org/). In this environment it was necessary to add the project path to the PyDev PYTHONPATH so that imports from some files, such as `config.py`, were resolved correctly.
2) The model is being trained with the "train.py" script.
However, there is a second training script called "cyber_train.py."
This is an empirical test I'm conducting. A little research of my own.
In "train," the hyperparameters are chosen manually.
In "cyber_train," the script will run 25 short training sessions, each lasting 5 epochs, to test the hyperparameters within the established limits and determine the best ones. Then, the actual training will be performed using the best hyperparameters detected.
And where does my empirical research come in?
I'm training first with the simplest version of the script, measuring how long it takes me to arrive at a model with a good accuracy percentage.
Once this is done, I'll run the automated version...
Then, I'll compare which of the two models performed better and how long it took me to achieve each one...
This will serve as a reference for more accurate trade-offs in future projects.
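For illustration, the idea behind the automated search could be sketched as follows; the search space, trial count, and selection criterion here are placeholders, not the actual contents of `cyber_train.py`:

```python
import random

def hyperparameter_search(run_short_training, n_trials: int = 25, epochs: int = 5):
    """Run several short trainings and return the best hyperparameters found.

    `run_short_training(lr, batch_size, epochs)` is assumed to train for a few
    epochs and return a validation score (e.g. IoU).
    """
    # Illustrative search space; the real limits are defined in the script itself.
    search_space = {
        "lr": [1e-3, 5e-4, 1e-4, 5e-5],
        "batch_size": [2, 4, 8],
    }
    best_score, best_params = -1.0, None
    for _ in range(n_trials):
        params = {k: random.choice(v) for k, v in search_space.items()}
        score = run_short_training(params["lr"], params["batch_size"], epochs)
        if score > best_score:
            best_score, best_params = score, params
    return best_params, best_score

# The full training would then be launched once with the best parameters found.
```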