Image Segmentation with ResNet + U-Net

💡 ResNet + U-Net fusion combines deep, contextual vision (ResNet) with spatial fidelity and fine detail (U-Net). It is a versatile, powerful, and highly sensitive architecture, ideal for projects where every pixel matters. The model shines in scenarios where the object is small, detailed, or textured and the global context (the whole scene) does not help much. This makes it ideal for:

  • Medical segmentation (e.g. tumors, vessels)
  • Industrial defect inspection
  • Embedded vision for robotics or quality control

⚠️ However, this current version was trained on a narrow-domain dataset, collected under controlled indoor conditions: consistent lighting, high-contrast backgrounds, and fixed camera angles. As a result, its ability to generalize to open-world scenarios (e.g., outdoor images, different backgrounds) is limited.
This is not a flaw of the model, but a natural reflection of its training data. When retrained with more diverse and realistic datasets, this architecture has strong potential for robust performance in general-purpose segmentation tasks.

📌 Class Convention

This project follows the standard binary-segmentation convention:

  • Class 0: Background
  • Class 1: Segmented Object

All masks were converted to reflect this convention before training.
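
A minimal sketch of that conversion is shown below. It is not the repository's masks.py, only an illustration of remapping foreground pixels (often stored as 255) to class 1 while keeping the background at 0; the DataSet/masks path matches the project structure described later.

```python
# Illustrative only: rewrite every mask so that 0 = background and 1 = segmented object.
from pathlib import Path

import cv2
import numpy as np

masks_dir = Path("DataSet/masks")

for mask_path in sorted(masks_dir.glob("*.png")):
    mask = cv2.imread(str(mask_path), cv2.IMREAD_GRAYSCALE)
    binary = (mask > 0).astype(np.uint8)   # any non-zero pixel becomes class 1
    cv2.imwrite(str(mask_path), binary)    # overwrite in place with the {0, 1} convention
```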

๐ŸŒ Limitations and Considerations

This model was trained with images captured in a highly controlled environment: constant lighting, a clean background, and objects (cars) positioned on a rotating platform.

As a result, it achieves very high accuracy (IoU > 99%) when evaluated on images similar to those in the original dataset. However, its performance deteriorates significantly when exposed to images collected outdoors, with variations in light, angle, background, and perspective.

This limitation was expected and will be taken into account for future versions with more diverse datasets.

Good image (training-domain result): good_image.png shows segmentation under ideal studio lighting.

Bad image (out-of-domain result): bad_image.png shows a failure example with an open-world street background.

🌟 Objective

To segment objects in custom grayscale images based on manual annotations, using a complete training pipeline, automated inference, and visual mask validation.

🤖 Notes on Development

This project was born after many hours of experimentation, learning, and progress driven by caffeine. Unlike other projects I have participated in before, this one evolved incredibly quickly thanks to the support of artificial intelligence assistants such as Copilot (Microsoft) and ChatGPT (OpenAI). Without a doubt, these are tools that are way ahead of their time. As part of the experience of using and learning from these advanced AI tools, I always posed problems to both of them, to measure their performance and compare their responses. To make the experience more fun, I kept an extremely formal dialogue with one and a completely informal one with the other, to see how they would react. After a while, I reversed the roles, becoming informal with the one I had previously treated formally and vice versa. Big thanks to both copilots: one named Microsoft, the other simply GPT.

  • Powered by: PyTorch, Gradio, OpenCV, Matplotlib, and Hugging Face Datasets

๐Ÿ“ Project Structure

. โ”œโ”€โ”€ run_app.py โ”œโ”€โ”€ bad_image.png โ”œโ”€โ”€ CHANGELOG.md โ”œโ”€โ”€ checkpoints โ”‚   โ”œโ”€โ”€ best_model.pt โ”‚   โ””โ”€โ”€ modelo_completo.pth โ”œโ”€โ”€ DataSet โ”‚   โ”œโ”€โ”€ annotations โ”‚   โ”‚   โ””โ”€โ”€ classes.txt โ”‚   โ”œโ”€โ”€ ExtraTests โ”‚   โ”œโ”€โ”€ images โ”‚   โ””โ”€โ”€ masks โ”œโ”€โ”€ dice_history.png โ”œโ”€โ”€ run_evaluate.py โ”œโ”€โ”€ good_image.png โ”œโ”€โ”€ init.py โ”œโ”€โ”€ iou_history.png โ”œโ”€โ”€ LICENSE โ”œโ”€โ”€ model_card.md โ”œโ”€โ”€ .huggingface โ”‚   โ””โ”€โ”€ model-index.yaml โ”œโ”€โ”€ README.md โ”œโ”€โ”€ report_file.txt โ”œโ”€โ”€ requirements.txt โ”œโ”€โ”€ scripts โ”‚   โ”œโ”€โ”€ config.py โ”‚   โ”œโ”€โ”€ Dataset โ”‚   โ”‚   โ”œโ”€โ”€ ConvertFormat.py โ”‚   โ”‚   โ”œโ”€โ”€ dataAugmentation.py โ”‚   โ”‚   โ”œโ”€โ”€ deleteDuplicates.py โ”‚   โ”‚   โ”œโ”€โ”€ getDS_HuggingFace.py โ”‚   โ”‚   โ”œโ”€โ”€ getImages.py โ”‚   โ”‚   โ”œโ”€โ”€ grays.py โ”‚   โ”‚   โ”œโ”€โ”€ init.py โ”‚   โ”‚   โ”œโ”€โ”€ mask_diagnosis.py โ”‚   โ”‚   โ”œโ”€โ”€ masks.py โ”‚   โ”‚   โ”œโ”€โ”€ Rename.py โ”‚   โ”‚   โ”œโ”€โ”€ Resize.py โ”‚   โ”‚   โ”œโ”€โ”€ TrainVal.py โ”‚   โ”‚   โ””โ”€โ”€ validMasks.py โ”‚   โ”œโ”€โ”€ init.py โ”‚   โ””โ”€โ”€ Segmentation โ”‚   โ”œโ”€โ”€ app.py โ”‚   โ”œโ”€โ”€ augment.py โ”‚   โ”œโ”€โ”€ diceLossCriterion.py โ”‚   โ”œโ”€โ”€ evaluate_model.py โ”‚   โ”œโ”€โ”€ flagged โ”‚   โ”œโ”€โ”€ focalLoss.py โ”‚   โ”œโ”€โ”€ Future โ”‚   โ”œโ”€โ”€ init.py โ”‚   โ”œโ”€โ”€ models.py โ”‚   โ”œโ”€โ”€ segDS.py โ”‚   โ””โ”€โ”€ train.py โ”œโ”€โ”€ structure.txt โ”œโ”€โ”€ training_loss.png โ””โ”€โ”€ training_val_accuracy.png

๐Ÿ“ Root Directory

Name Description
run_app.py Launcher script โ€” possibly for local inference or interface
bad_image.png Example of a failed prediction (for benchmarking or documentation)
good_image.png Example of a successful prediction (used for showcasing model quality)
CHANGELOG.md History of changes and version updates
checkpoints/ Contains trained model files (best_model.pt, modelo_completo.pth)
DataSet/ Contains training images, masks, annotations, and extra test sets
dice_history.png Visualization of Dice score progression during training
iou_history.png Graph of Intersection over Union (IoU) evolution across epochs
training_loss.png Plot showing model loss evolution throughout training
training_val_accuracy.png Graph of validation accuracy during model training
run_evaluate.py Evaluation script runnable from root โ€” assesses model performance
__init__.py Declares root as a Python package (if imported externally)
LICENSE Legal terms for usage and redistribution
model_card.md Technical summary of model details, performance, and intended use
.huggingface/model-index.yaml Configuration file for Hugging Face model registry (optional export)
README.md Main documentation file โ€” project overview, usage, and setup guide
report_file.txt Training log and report output saved during execution
requirements.txt List of dependencies needed for running the project
scripts/ Main logic for training, evaluation, dataset preparation, and modeling
structure.txt Manual export of the folder structure, used as reference or debug aid

๐Ÿ“ DataSet/

Name Description
annotations/ Contains classes.txt, defining class labels used in segmentation
images/ Input images used for training and evaluation
masks/ Segmentation masks aligned with input images
ExtraTests/ Optional dataset with additional test cases for generalization assessment

๐Ÿ“ scripts/

Name Description
config.py Configuration module holding paths, flags, and hyperparameters
__init__.py Declares scripts/ as an importable Python module

๐Ÿ“ scripts/Dataset/

Name Description
ConvertFormat.py Converts image or annotation formats (e.g. from JPG to PNG, or COCO to mask)
dataAugmentation.py Applies offline augmentations to images or masks
deleteDuplicates.py Detects and removes duplicate samples
getDS_HuggingFace.py Downloads datasets from Hugging Face ๐Ÿค—
getImages.py Image retrieval or organization from storage
grays.py Converts images to grayscale
mask_diagnosis.py Validates and diagnoses potential issues in masks
masks.py Performs manipulation or binarization of segmentation masks
Rename.py Batch renaming utility to standardize filenames
Resize.py Resizes images and masks to uniform dimensions
TrainVal.py Performs dataset train/validation splitting
validMasks.py Checks for validity in mask formatting and values
__init__.py Declares Dataset/ as a Python package

๐Ÿ“ scripts/Segmentation/

Name Description
app.py Local interface for model inference โ€” CLI or GUI
augment.py Online augmentations and Test-Time Augmentation (TTA)
diceLossCriterion.py Custom Dice Loss implementation for segmentation
focalLoss.py Custom Focal Loss implementation to handle class imbalance
evaluate_model.py Model evaluator with metrics like IoU, Dice, and pixel accuracy
models.py Contains neural network architecture (e.g. UNet based on ResNet)
segDS.py Dataset class for segmentation tasks, loading images and masks
train.py Main training script with logging, plotting, checkpointing, and early stop
Future/ Experimental code including auto hyperparameter tuning
flagged/ Optional output folder for flagged evaluations or debug samples
__init__.py Declares Segmentation/ as a Python package

Dataset

This project uses data from the CV Image Segmentation Dataset, which provides paired images and masks for semantic segmentation tasks. The dataset contains several distinct subsets; only the images related to Carvana cars (Kaggle Carvana Car Mask Segmentation) were used. This is the subset on which the project was tested.

The data subset used for the project was pre-processed with the following scripts from this project, run in this order (a sketch of how the automated steps could be chained is shown below):

  1. Run getImages.py (or use other data sources).
  2. Visually inspect the collected images.
  3. Run deleteDuplicates.py.
  4. Run ConvertFormat.py.
  5. Run Resize.py (must be run for both the image and mask directories).
  6. Run grays.py (must be run for both the image and mask directories).
  7. Make annotations.
  8. Run masks.py.
  9. Run validMasks.py.
  10. Run TrainVal.py.
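
As a rough illustration (not a script shipped with the repository), the automated steps above could be chained from Python as shown below. The manual steps (visual inspection and annotation) are skipped, and each script is assumed to read its input and output directories from scripts/config.py or its own defaults; check each script's argument handling before running.

```python
# Hedged sketch: run the automated preprocessing scripts in the order described above.
import subprocess
import sys

STEPS = [
    "scripts/Dataset/getImages.py",        # or use other data sources
    "scripts/Dataset/deleteDuplicates.py",
    "scripts/Dataset/ConvertFormat.py",
    "scripts/Dataset/Resize.py",           # run for both the image and mask directories
    "scripts/Dataset/grays.py",            # run for both the image and mask directories
    "scripts/Dataset/masks.py",            # after annotations have been made
    "scripts/Dataset/validMasks.py",
    "scripts/Dataset/TrainVal.py",
]

for script in STEPS:
    print(f"Running {script} ...")
    subprocess.run([sys.executable, script], check=True)  # stop on the first failure
```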


โš™๏ธ Model

  • Architecture: ResNet encoder + U-Net decoder
  • Input: 1-channel grayscale, resized to 512ร—512
  • Loss: Cross Entropy Loss with class weighting
  • Optimizer: Adam
  • Scheduler: StepLR with decay
  • Training duration: configurable (default: 400 epochs)
  • Early Stopping: based on accuracy stagnation
  • Checkpoints: saved every N epochs + best model saved

Training script: scripts/Segmentation/train.py

Evaluation scripts:

  • scripts/Segmentation/evaluate_model.py: Batch evaluation over image folders

  • scripts/Segmentation/app.py: Gradio demo for interactive inference

  • run_app.py: Wrapper script to launch the Gradio interface from the root directory (calls scripts/Segmentation/app.py internally)

  • run_evaluate.py: Wrapper script to launch the general pre-testing script from the root directory (calls scripts/Segmentation/evaluate_model.py internally)

📄 The model is documented and registered via model-index.yaml for proper listing on the Hugging Face Hub.


📈 Evaluation

Quantitative metrics include:

  • Intersection over Union (IoU)
  • Dice coefficient
  • Accuracy, Precision, Recall
  • Balanced Accuracy and MCC

Visual inspection is supported via overlay masks in the ExtraTests/ folder.
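
For reference, the sketch below shows how IoU and the Dice coefficient can be computed for binary masks with values {0, 1}; it is a standalone illustration, not the repository's evaluate_model.py.

```python
import numpy as np

def iou_and_dice(pred: np.ndarray, target: np.ndarray, eps: float = 1e-7):
    """pred and target are HxW arrays with 0 (background) and 1 (object)."""
    pred = pred.astype(bool)
    target = target.astype(bool)
    intersection = np.logical_and(pred, target).sum()
    union = np.logical_or(pred, target).sum()
    iou = (intersection + eps) / (union + eps)
    dice = (2 * intersection + eps) / (pred.sum() + target.sum() + eps)
    return float(iou), float(dice)

# Tiny example with 2x2 masks: one overlapping pixel, two predicted, one target.
pred = np.array([[1, 1], [0, 0]])
target = np.array([[1, 0], [0, 0]])
print(iou_and_dice(pred, target))  # IoU = 0.5, Dice ≈ 0.667
```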

Training curves are saved as plots: training_val_accuracy.png (validation accuracy), training_loss.png (training loss), iou_history.png (IoU history), and dice_history.png (Dice history).


🔬 Future Work

The directory scripts/Segmentation/Future/ includes planned extensions for embedded deployment:

  • train_embedded_explicit_model.py: A simplified and modular training script for generating lightweight ONNX models. Note: This script was not executed or validated during this certification phase.
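
For context, the sketch below shows one standard way a trained PyTorch model can be exported to ONNX with torch.onnx.export. It only illustrates the planned direction; it is not the unvalidated train_embedded_explicit_model.py, and the checkpoint format and output filename are assumptions.

```python
import torch

# Assumes modelo_completo.pth stores the full pickled model (saved with torch.save(model, ...)).
model = torch.load("checkpoints/modelo_completo.pth", map_location="cpu")
model.eval()

dummy_input = torch.randn(1, 1, 512, 512)  # 1-channel grayscale, 512x512
torch.onnx.export(
    model,
    dummy_input,
    "unet_resnet_512.onnx",            # hypothetical output filename
    input_names=["image"],
    output_names=["logits"],
    opset_version=17,
    dynamic_axes={"image": {0: "batch"}, "logits": {0: "batch"}},
)
```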

๐Ÿ— Deployment Options

This project includes two scripts for model evaluation:

🧪 Batch Evaluation Script (evaluate_model.py)

Use this script to run the model on an entire directory of test images. Ideal for debugging, validation, and quantitative analysis.

python evaluate_model.py --input ./your-test-images/

You can modify this script to save prediction masks, compute metrics (IoU, pixel accuracy), or visualize results in batch.
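
A hedged sketch of such a batch run is shown below; the paths, the checkpoint format, and the grayscale preprocessing are assumptions based on this README rather than the actual evaluate_model.py.

```python
from pathlib import Path

import cv2
import numpy as np
import torch

device = "cuda" if torch.cuda.is_available() else "cpu"
model = torch.load("checkpoints/modelo_completo.pth", map_location=device)  # assumed full-model checkpoint
model.eval()

input_dir = Path("./your-test-images")
output_dir = Path("./predicted-masks")
output_dir.mkdir(exist_ok=True)

with torch.no_grad():
    for image_path in sorted(input_dir.glob("*.png")):
        gray = cv2.imread(str(image_path), cv2.IMREAD_GRAYSCALE)
        gray = cv2.resize(gray, (512, 512))
        tensor = torch.from_numpy(gray).float().div(255)[None, None].to(device)  # (1, 1, 512, 512)
        logits = model(tensor)                                                   # (1, 2, 512, 512)
        mask = logits.argmax(dim=1).squeeze(0).cpu().numpy().astype(np.uint8)
        cv2.imwrite(str(output_dir / image_path.name), mask * 255)               # white = segmented object
```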


๐ŸŒ Interactive Web Demo (app.py)

This script provides an interactive interface using Gradio. It's designed for easy deployment and model demonstration, such as on Hugging Face Spaces.

To launch the web app locally:

python app.py

Or try it online (if hosted):

👉 Live demo on Hugging Face Spaces (TODO: link to be updated after submission)

This interface allows anyone to upload an image and instantly see the segmentation results โ€” no installation required.
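
For illustration, a minimal Gradio app in the spirit of scripts/Segmentation/app.py could look like the sketch below; the model loading and preprocessing follow the same assumptions as the other sketches in this README and may differ from the real script.

```python
import cv2
import gradio as gr
import numpy as np
import torch

device = "cuda" if torch.cuda.is_available() else "cpu"
model = torch.load("checkpoints/modelo_completo.pth", map_location=device)  # assumed full-model checkpoint
model.eval()

def segment(image: np.ndarray) -> np.ndarray:
    """Convert the uploaded RGB image to grayscale, predict, and return the binary mask."""
    gray = cv2.cvtColor(image, cv2.COLOR_RGB2GRAY)
    gray = cv2.resize(gray, (512, 512))
    tensor = torch.from_numpy(gray).float().div(255)[None, None].to(device)
    with torch.no_grad():
        mask = model(tensor).argmax(dim=1).squeeze(0).cpu().numpy()
    return (mask * 255).astype(np.uint8)  # white = segmented object, black = background

demo = gr.Interface(fn=segment, inputs=gr.Image(type="numpy"), outputs=gr.Image(type="numpy"))

if __name__ == "__main__":
    demo.launch()
```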


📌 Tip: Use evaluate_model.py during development and testing, and app.py for sharing and showcasing your model.


๐Ÿ† Certification Context

This repository was submitted for the Hugging Face Computer Vision Certification and is built upon reproducibility, modularity, dataset transparency, and technical rigor.


📄 License

This project is licensed under the MIT License. Dataset usage must comply with the original Kaggle dataset license terms.


🔮 Future improvements

Some steps are already planned for the project's evolution:

  • Architecture refinement: test lighter variants (e.g. ResNet18, MobileNetV3) to compare performance in embedded environments.
  • Training with data augmentation: apply augmentation strategies (rotation, noise, scale, brightness) to increase model robustness.
  • Cross-validation: include a cross-validation strategy to increase confidence in metrics.
  • Conversion to ONNX/TensorRT: prepare an exportable version of the model for inference on edge devices.
  • Deployment on specific hardware: test inference on ESP32-S3 or Raspberry Pi using a simplified pipeline with float16.
  • Visualization interface: create a simple script or panel that allows you to upload an image and view the segmentation live.

These improvements will be implemented as the project progresses, keeping the focus on lightness, modularity, and real applicability in computer vision with monochromatic images.


🌟 Final thoughts: why this certification matters

This project represents more than just completing a technical challenge. For me, it is the fulfillment of a long-held dream: to earn a professional certification that values knowledge, practice, and the ability to solve real-world problems, rather than just familiarity with specific versions of tools or frameworks.

For many years, I experienced the frustrating side of commercial certifications that felt more like traps than opportunities: exams based on obsolete technologies, questionable application centers, and mechanisms that created more obstacles than recognition. That never represented who I am, or what I am capable of building.

This certification, promoted by Hugging Face, is different. It validates true competencies in machine learning and computer vision based on a real-world project, executed end-to-end. It is a type of recognition that carries technical, ethical, and personal value.

That is why it is not "just another delivery." It is a turning point.


🌟 Important notes…

  1. The IDE used in the project was Eclipse (https://eclipseide.org/) with the PyDev module (https://www.pydev.org/). In this environment it was necessary to add the project path to the PyDev PYTHONPATH so that imports in some files, such as config.py, resolve correctly.

  2. The model is trained with the train.py script. However, there is a second training script called cyber_train.py, which is an empirical test I am conducting, a little research of my own. In train.py the hyperparameters are chosen manually; in cyber_train.py the script runs 25 short training sessions of 5 epochs each, testing hyperparameters within established limits to find the best combination, and the actual training is then performed using the best hyperparameters detected. Where does my empirical research come in? I first train with the simplest version of the script and measure how long it takes to reach a model with a good accuracy percentage. Once that is done, I will run the automated version and compare which of the two models performed better and how long each took. This will serve as a reference for more accurate trade-offs in future projects. (A sketch of the automated search idea is shown below.)
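
A hedged sketch of that automated search idea follows; it is not the actual cyber_train.py. The search space and the run_short_training helper are illustrative, while the budget of 25 trials of 5 epochs each comes from the description above.

```python
import random

def sample_config():
    """Draw one hyperparameter combination within illustrative limits."""
    return {
        "lr": 10 ** random.uniform(-5, -3),          # log-uniform learning rate
        "weight_decay": random.uniform(0.0, 1e-4),
        "step_size": random.randint(20, 80),         # StepLR period, in epochs
    }

def run_short_training(config, epochs=5):
    """Placeholder: the real version would train the ResNet+U-Net for `epochs`
    epochs and return a validation score (e.g. IoU); a random value stands in here."""
    return random.random()

best_config, best_score = None, -1.0
for trial in range(25):                              # 25 short sessions of 5 epochs each
    config = sample_config()
    score = run_short_training(config, epochs=5)
    if score > best_score:
        best_config, best_score = config, score
    print(f"trial {trial:02d}: score={score:.4f} config={config}")

print("Best configuration for the full training run:", best_config)
```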
