YOLOv12 Grayscale - Cavity Detection Model
A specialized YOLOv12 model modified to work with single-channel (grayscale) images for cavity detection. This model has been converted from the original triple-input (9-channel) architecture to grayscale (1-channel) input, making it memory-efficient and suitable for medical imaging applications.
Model Overview
- Architecture: YOLOv12-n (Nano) with grayscale input
- Input Channels: 1 (grayscale) instead of 3 (RGB)
- Task: Object Detection (Cavity Detection)
- Classes: 1 class (Cavity)
- Image Size: 640x640
- Parameters: ~2.5M
- Framework: Ultralytics YOLOv12
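To make the channel reduction concrete, the sketch below computes the size of one input tensor at the 640x640 image size for 1, 3, and 9 channels. It assumes float32 values (4 bytes each) and a batch size of 1; `input_tensor_bytes` is an illustrative helper, not part of Ultralytics. Only the input tensor shrinks in this way; weights and intermediate activations are not covered by this arithmetic.

```python
def input_tensor_bytes(channels: int, height: int = 640, width: int = 640,
                       batch: int = 1, bytes_per_value: int = 4) -> int:
    """Return the size in bytes of one input batch (float32 assumed by default)."""
    return batch * channels * height * width * bytes_per_value


gray = input_tensor_bytes(1)    # grayscale input used by this model
rgb = input_tensor_bytes(3)     # standard RGB input
triple = input_tensor_bytes(9)  # original triple-input architecture

for name, size in [("grayscale", gray), ("RGB", rgb), ("triple", triple)]:
    print(f"{name}: {size / 1024**2:.2f} MiB")
```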
Key Features
- Grayscale Input: Works with single-channel images (input tensors use about 1/3 of the memory of RGB input)
- Medical Imaging Optimized: Designed for dental X-ray analysis
- Fast Inference: Optimized for real-time detection
- Easy Integration: Compatible with the Ultralytics ecosystem
- Automatic Conversion: Converts RGB images to grayscale automatically
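The automatic conversion happens inside the modified data loader, whose code is not shown here. As a rough illustration of what an RGB-to-grayscale step does, the sketch below applies the ITU-R BT.601 luma weights (0.299, 0.587, 0.114), the same weighting OpenCV uses for `cv2.cvtColor(img, cv2.COLOR_RGB2GRAY)`. Whether this model's loader uses exactly these weights is an assumption, and `rgb_to_gray` is a hypothetical helper written in plain Python for clarity rather than speed.

```python
from typing import List, Tuple

Pixel = Tuple[int, int, int]


def rgb_to_gray(image: List[List[Pixel]]) -> List[List[int]]:
    """Convert rows of (R, G, B) pixels to rows of 8-bit luma values."""
    return [
        [min(255, round(0.299 * r + 0.587 * g + 0.114 * b)) for r, g, b in row]
        for row in image
    ]


# A tiny 2x2 test image: white, black, pure red, pure blue.
rgb_image = [
    [(255, 255, 255), (0, 0, 0)],
    [(255, 0, 0), (0, 0, 255)],
]
gray_image = rgb_to_gray(rgb_image)
print(gray_image)
```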
Dataset
Training was performed on a custom cavity detection dataset:
- Dataset Size: 98 train, 12 validation, 12 test images
- Classes: 1 (Cavity)
- Format: YOLO format annotations
- Image Type: Dental X-ray images (grayscale)
Dataset Structure
dataset/
├── train/
│   ├── images/   # 98 training images
│   └── labels/   # YOLO format annotations
├── valid/
│   ├── images/   # 12 validation images
│   └── labels/   # YOLO format annotations
└── test/
    ├── images/   # 12 test images
    └── labels/   # YOLO format annotations
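Before training, it can help to confirm that the folder layout above is in place and that images and label files pair up. The following is a minimal sketch using only the standard library; `summarize_split`, the list of image suffixes, and the `dataset` root path are assumptions made for illustration.

```python
from pathlib import Path

IMAGE_SUFFIXES = {".jpg", ".jpeg", ".png", ".bmp"}  # assumed image types


def summarize_split(dataset_root: Path, split: str) -> dict:
    """Count images in a split and list images that have no matching label file."""
    image_dir = dataset_root / split / "images"
    label_dir = dataset_root / split / "labels"
    if not image_dir.is_dir():
        return {"split": split, "images": 0, "missing_labels": []}
    images = sorted(p for p in image_dir.iterdir() if p.suffix.lower() in IMAGE_SUFFIXES)
    missing = [p.name for p in images if not (label_dir / f"{p.stem}.txt").exists()]
    return {"split": split, "images": len(images), "missing_labels": missing}


root = Path("dataset")  # assumed location of the dataset folder
if root.is_dir():
    for split in ("train", "valid", "test"):
        print(summarize_split(root, split))
```

In the YOLO convention an image without a label file is treated as a background image, so a non-empty `missing_labels` list is a prompt to double-check rather than necessarily an error.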
Label Format
YOLO format: <class_id> <x_center> <y_center> <width> <height>
Example:
0 0.5234 0.4123 0.1234 0.0987
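All four box values are normalized to the range 0-1 relative to the image width and height. As a sketch of how such a line maps back to pixels, the code below parses the example line and converts it to corner coordinates for a 640x640 image. `YoloLabel`, `parse_label_line`, and `to_pixel_xyxy` are hypothetical helper names, not Ultralytics functions.

```python
from typing import NamedTuple, Tuple


class YoloLabel(NamedTuple):
    class_id: int
    x_center: float
    y_center: float
    width: float
    height: float


def parse_label_line(line: str) -> YoloLabel:
    """Parse one '<class_id> <x_center> <y_center> <width> <height>' line."""
    parts = line.split()
    if len(parts) != 5:
        raise ValueError(f"expected 5 fields, got {len(parts)}: {line!r}")
    return YoloLabel(int(parts[0]), *(float(v) for v in parts[1:]))


def to_pixel_xyxy(label: YoloLabel, img_w: int, img_h: int) -> Tuple[float, float, float, float]:
    """Convert normalized center/size values to pixel corner coordinates."""
    x1 = (label.x_center - label.width / 2) * img_w
    y1 = (label.y_center - label.height / 2) * img_h
    x2 = (label.x_center + label.width / 2) * img_w
    y2 = (label.y_center + label.height / 2) * img_h
    return x1, y1, x2, y2


label = parse_label_line("0 0.5234 0.4123 0.1234 0.0987")
box = to_pixel_xyxy(label, img_w=640, img_h=640)
print(label.class_id, [round(v, 1) for v in box])
```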
Quick Start
Installation
pip install ultralytics opencv-python numpy torch torchvision
Inference
from ultralytics import YOLO
# Load the trained model
model = YOLO('best.pt')
# Predict on a grayscale image
results = model.predict(
    source='path/to/xray/image.jpg',
    conf=0.25,   # Confidence threshold
    save=True,   # Save results
    imgsz=640    # Image size
)
# Process results
for result in results:
    boxes = result.boxes
    for box in boxes:
        cls = int(box.cls[0])
        conf = float(box.conf[0])
        xyxy = box.xyxy[0].tolist()
        print(f"Cavity detected - Confidence: {conf:.2%}")
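The values read off each box (`cls`, `conf`, `xyxy`) are plain Python numbers and lists, so they can be collected into records for logging or export. The sketch below shows one possible shape for such a record; `detection_record` is a hypothetical helper, and the sample numbers are illustrative rather than real model output. In Ultralytics the class-name mapping is available as `result.names`.

```python
import json
from typing import Dict, Sequence


def detection_record(cls: int, conf: float, xyxy: Sequence[float],
                     names: Dict[int, str]) -> dict:
    """Build a JSON-serializable record from the values read off one box."""
    x1, y1, x2, y2 = xyxy
    return {
        "class_id": cls,
        "class_name": names.get(cls, str(cls)),
        "confidence": round(conf, 4),
        "box_xyxy": [round(v, 1) for v in xyxy],
        "box_area_px": round((x2 - x1) * (y2 - y1), 1),
    }


# Illustrative values only; in practice these come from box.cls, box.conf, box.xyxy.
names = {0: "Cavity"}
records = [detection_record(0, 0.8731, [295.5, 232.3, 374.5, 295.5], names)]
print(json.dumps(records, indent=2))
```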
Batch Inference
# Predict on multiple images
results = model.predict(
    source='path/to/images/folder/',
    save=True,
    conf=0.25
)
Training
Dataset Configuration
Create dataset.yaml:
path: /path/to/dataset
train: train/images
val: valid/images
test: test/images
nc: 1
names:
  0: Cavity
Training Command
# Using Python API
from ultralytics import YOLO
model = YOLO('ultralytics/cfg/models/v12/yolov12_triple.yaml')
results = model.train(
    data='dataset.yaml',
    epochs=100,
    imgsz=640,
    batch=16,
    device=0,
    patience=50,
    optimizer='AdamW',
    lr0=0.001
)
Training Parameters
| Parameter | Value | Description |
|---|---|---|
| epochs | 100-300 | Number of training epochs |
| batch | 8-32 | Batch size |
| imgsz | 640 | Input image size |
| device | 0 or 'cpu' | GPU device or CPU |
| lr0 | 0.001 | Initial learning rate |
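One way to keep training runs within the ranges in the table is to build the keyword arguments through a small checking function. This is only a sketch: `build_train_args` is a hypothetical helper, and treating the documented ranges as hard limits is a choice made here for illustration.

```python
def build_train_args(epochs: int = 100, batch: int = 16, imgsz: int = 640,
                     device="cpu", lr0: float = 0.001) -> dict:
    """Check values against the ranges in the table and return train keyword arguments."""
    if not 100 <= epochs <= 300:
        raise ValueError(f"epochs={epochs} is outside the documented range 100-300")
    if not 8 <= batch <= 32:
        raise ValueError(f"batch={batch} is outside the documented range 8-32")
    if lr0 <= 0:
        raise ValueError("lr0 must be positive")
    return {"epochs": epochs, "batch": batch, "imgsz": imgsz, "device": device, "lr0": lr0}


train_args = build_train_args(epochs=150, batch=8, device="cpu")
print(train_args)
# Hypothetical usage: model.train(data="dataset.yaml", **train_args)
```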
Model Performance
- Training Time: ~30 minutes (on CPU)
- Input Memory: About 1/3 of an equivalent RGB input (1 channel instead of 3)
- Inference Speed: Real-time capable
Model Architecture
Modified from YOLOv12 triple-input architecture:
Original vs Modified
| Feature | Original (Triple) | Modified (Grayscale) |
|---|---|---|
| Input Channels | 9 | 1 |
| First Layer | TripleInputConv | Conv |
| Input Memory (relative to 3-channel RGB) | 3x | 1/3x |
Key Modifications
- Changed input layer from 9 channels to 1 channel
- Modified data loader for automatic RGB-to-grayscale conversion
- Updated augmentation pipeline for single-channel images
- Disabled HSV augmentations (not applicable to grayscale)
- Optimized mosaic and letterbox augmentations
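Letterboxing resizes an image to fit the 640x640 input while keeping its aspect ratio, then pads the remainder. The sketch below shows the general arithmetic only; it is not the exact Ultralytics implementation or this model's modified version, and `letterbox_geometry` and the 1024x512 example size are assumptions for illustration.

```python
from typing import Tuple


def letterbox_geometry(img_w: int, img_h: int, target: int = 640
                       ) -> Tuple[float, Tuple[int, int], Tuple[int, int, int, int]]:
    """Return (scale, (new_w, new_h), (left, top, right, bottom)) for a square target."""
    scale = min(target / img_w, target / img_h)
    new_w, new_h = round(img_w * scale), round(img_h * scale)
    pad_w, pad_h = target - new_w, target - new_h
    left, top = pad_w // 2, pad_h // 2  # split padding as evenly as possible
    return scale, (new_w, new_h), (left, top, pad_w - left, pad_h - top)


# A wide image, as an example of a non-square X-ray.
scale, resized, padding = letterbox_geometry(img_w=1024, img_h=512)
print(scale, resized, padding)
```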
Data Augmentation
Optimized for grayscale images:
- Geometric: Rotation (±10°), Translation (±10%), Scaling (50-150%), Horizontal flip
- Photometric: Brightness adjustment (±40%), Blur
- Advanced: Mosaic augmentation, Copy-paste, Random erasing
Note: HSV augmentations are disabled for grayscale.
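In standard Ultralytics training, the geometric ranges and the HSV switches are set through named training arguments. The sketch below maps the values stated above onto those argument names (`degrees`, `translate`, `scale`, `hsv_h`, `hsv_s`, `hsv_v`). It is an assumption that this modified grayscale pipeline reads the same arguments. Flip, mosaic, and copy-paste probabilities are not stated above, so they are left out, as are the brightness and blur settings.

```python
# Values taken from the list above; argument names follow Ultralytics' training settings.
augmentation_args = {
    "degrees": 10.0,    # rotation range of +/-10 degrees
    "translate": 0.1,   # translation of +/-10% of the image size
    "scale": 0.5,       # scale gain of +/-50%, i.e. 50-150%
    "hsv_h": 0.0,       # hue shift disabled
    "hsv_s": 0.0,       # saturation shift disabled
    "hsv_v": 0.0,       # value shift disabled
}

for name, value in augmentation_args.items():
    print(f"{name}={value}")

# Hypothetical usage: model.train(data="dataset.yaml", **augmentation_args)
```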
Export Model
from ultralytics import YOLO
model = YOLO('best.pt')
# Export to ONNX
model.export(format='onnx', imgsz=640)
# Export to TensorRT
model.export(format='engine', imgsz=640, half=True)
Citation
@misc{yolov12-grayscale-cavity,
  title={YOLOv12 Grayscale for Cavity Detection},
  author={Suphawut},
  year={2025},
  publisher={Hugging Face},
  howpublished={\url{https://huggingface.co/suphawutq56789/yolov12-grayscale-cavity}}
}
License
This model is released under the AGPL-3.0 license.
Acknowledgments
- Built on Ultralytics YOLO
- Modified from Triple-Input YOLO architecture
- Trained for dental cavity detection
Built with ❤️ using Ultralytics YOLOv12