# vit-large-patch32-384-finetuned-skin-lesion-classification
Vision Transformer model fine-tuned for skin lesion classification across 12 classes. This model builds on the pre-trained ViT (originally trained on ImageNet-21k and fine-tuned on ImageNet-2012) and adapts a checkpoint initially focused on melanoma detection (UnipaPolitoUnimore/vit-large-patch32-384-melanoma).
## Model Description
- Architecture: Vision Transformer (ViT) that processes 384x384-pixel input images as a sequence of fixed-size 32x32 patches.
- Modifications:
  - Replaced the original melanoma model's three-class head with a new linear classifier for 12 classes (see the sketch after this list).
  - Classes: actinic keratosis, basal cell carcinoma, clear skin, dermatofibroma, melanoma, melanoma metastasis, nevus, random, seborrheic keratosis, solar lentigo, squamous cell carcinoma, and vascular lesion.
- Feature Extractor: leverages the checkpoint's pretrained skin-lesion features for transfer learning.
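
The head replacement described above can be reproduced with the standard Transformers API. A minimal sketch (not the authors' exact code; the label order is illustrative):

```python
from transformers import ViTForImageClassification

# Hypothetical reconstruction: load the melanoma checkpoint and swap its
# 3-class head for a fresh 12-class linear classifier.
# ignore_mismatched_sizes=True discards the old head's weights and
# randomly initializes a new one of the right shape.
labels = [
    "actinic keratosis", "basal cell carcinoma", "clear skin",
    "dermatofibroma", "melanoma", "melanoma metastasis", "nevus", "random",
    "seborrheic keratosis", "solar lentigo", "squamous cell carcinoma",
    "vascular lesion",
]
model = ViTForImageClassification.from_pretrained(
    "UnipaPolitoUnimore/vit-large-patch32-384-melanoma",
    num_labels=len(labels),
    id2label=dict(enumerate(labels)),
    label2id={l: i for i, l in enumerate(labels)},
    ignore_mismatched_sizes=True,
)
```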
## Intended Uses & Limitations
- Intended Uses:
  - Automated skin lesion classification for research and decision support in dermatology.
- Limitations:
  - Data bias may persist despite augmentation; training would benefit from more data, especially for the rare classes.
  - Performance may vary across imaging conditions or devices.
## Training and Evaluation
- Training Data:
  - Approximately 70k images assembled from real and high-quality synthetic data, with improved class balance.
- Training Setup (see the sketch after this list):
  - Optimizer: AdamW (default settings) with learning rate 2e-05.
  - Epochs: 3
  - Batch sizes: 8 (train) / 16 (eval)
- Evaluation Results: per-class metrics for the validation and test sets are reported below.
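
These hyperparameters map directly onto the Hugging Face `Trainer`, whose default optimizer is AdamW. A hedged sketch; `train_dataset`/`eval_dataset` are placeholders for the ~70k-image corpus, and any argument not listed above is an assumption:

```python
from transformers import Trainer, TrainingArguments

args = TrainingArguments(
    output_dir="vit-skin-lesion-12cls",  # assumption: not stated in the card
    learning_rate=2e-5,                  # as reported above
    num_train_epochs=3,
    per_device_train_batch_size=8,
    per_device_eval_batch_size=16,
)
trainer = Trainer(
    model=model,                  # the 12-class model from the sketch above
    args=args,
    train_dataset=train_dataset,  # placeholder
    eval_dataset=eval_dataset,    # placeholder
)
trainer.train()
```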
### Validation Set Performance
| Class | Precision | Recall | F1-Score | Support |
|---|---|---|---|---|
| Actinic Keratosis | 0.74 | 0.77 | 0.76 | 163 |
| Basal Cell Carcinoma | 0.90 | 0.86 | 0.88 | 551 |
| Clear Skin | 1.00 | 1.00 | 1.00 | 13 |
| Dermatofibroma | 0.85 | 0.68 | 0.76 | 25 |
| Melanoma | 0.93 | 0.81 | 0.87 | 600 |
| Melanoma Metastasis | 0.85 | 0.77 | 0.81 | 95 |
| Nevus | 0.84 | 0.96 | 0.90 | 847 |
| Random | 1.00 | 1.00 | 1.00 | 52 |
| Seborrheic Keratosis | 0.74 | 0.77 | 0.75 | 190 |
| Solar Lentigo | 0.71 | 0.64 | 0.68 | 42 |
| Squamous Cell Carcinoma | 0.87 | 0.71 | 0.78 | 84 |
| Vascular Lesion | 0.92 | 0.52 | 0.67 | 23 |
| Accuracy | | | 0.86 | 2685 |
| Macro Avg | 0.86 | 0.79 | 0.82 | 2685 |
| Weighted Avg | 0.86 | 0.86 | 0.86 | 2685 |
### Test Set Performance
| Class | Precision | Recall | F1-Score | Support |
|---|---|---|---|---|
| Actinic Keratosis | 0.81 | 0.80 | 0.81 | 164 |
| Basal Cell Carcinoma | 0.86 | 0.93 | 0.89 | 552 |
| Dermatofibroma | 0.95 | 0.77 | 0.85 | 26 |
| Melanoma | 0.88 | 0.89 | 0.89 | 601 |
| Melanoma Metastasis | 0.96 | 0.73 | 0.83 | 95 |
| Nevus | 0.91 | 0.93 | 0.92 | 848 |
| Seborrheic Keratosis | 0.81 | 0.76 | 0.79 | 191 |
| Solar Lentigo | 0.82 | 0.63 | 0.71 | 43 |
| Squamous Cell Carcinoma | 0.91 | 0.74 | 0.82 | 84 |
| Vascular Lesion | 1.00 | 0.83 | 0.90 | 23 |
| Accuracy | | | 0.88 | 2627 |
| Macro Avg | 0.89 | 0.80 | 0.84 | 2627 |
| Weighted Avg | 0.88 | 0.88 | 0.88 | 2627 |
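
The tables above follow the layout of scikit-learn's `classification_report`, so per-class metrics of this form can be regenerated from predictions. A minimal sketch, assuming `y_true` and `y_pred` are arrays of integer labels and model predictions for a split:

```python
from sklearn.metrics import classification_report

# y_true / y_pred are placeholders: ground-truth indices and predictions
# collected over the validation or test split.
names = [model.config.id2label[i] for i in range(model.config.num_labels)]
print(classification_report(y_true, y_pred, target_names=names, digits=2))
```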
## Confusion Matrices
Confusion matrices for both the validation and test sets have been generated to provide insight into per-class performance; they are available as PNG files.
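
A plot of this kind can be regenerated from the same predictions. A sketch using scikit-learn and matplotlib (the output file name is illustrative):

```python
import matplotlib.pyplot as plt
from sklearn.metrics import ConfusionMatrixDisplay

# Build, plot, and save a confusion matrix from the same y_true / y_pred
# arrays used for the classification report above.
fig, ax = plt.subplots(figsize=(10, 10))
ConfusionMatrixDisplay.from_predictions(
    y_true, y_pred,
    display_labels=[model.config.id2label[i]
                    for i in range(model.config.num_labels)],
    xticks_rotation="vertical",
    ax=ax,
)
fig.savefig("confusion_matrix.png", bbox_inches="tight")
```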
## How to Use
Example inference code:
```python
from transformers import ViTImageProcessor, ViTForImageClassification
from PIL import Image
import torch

# Load model and processor
model_id = "path_or_repo_identifier_for_your_model"
processor = ViTImageProcessor.from_pretrained(model_id)
model = ViTForImageClassification.from_pretrained(model_id)

# Select device (CUDA, Apple Silicon MPS, or CPU)
device = torch.device(
    "cuda" if torch.cuda.is_available()
    else "mps" if torch.backends.mps.is_available()
    else "cpu"
)
model = model.to(device)
model.eval()

# Load and preprocess the image
image_path = "path/to/skin_lesion_image.jpg"
image = Image.open(image_path).convert("RGB")
inputs = processor(images=image, return_tensors="pt").to(device)

# Run inference without tracking gradients
with torch.no_grad():
    outputs = model(**inputs)

# Convert logits to a predicted class and confidence
logits = outputs.logits
probabilities = torch.nn.functional.softmax(logits, dim=1)[0]
predicted_class_idx = torch.argmax(probabilities).item()
predicted_class = model.config.id2label[predicted_class_idx]
confidence = probabilities[predicted_class_idx].item()

print(f"Predicted class: {predicted_class}")
print(f"Confidence: {confidence:.2%}")
```
## Citation
If you use this model, please cite the original ViT paper:
```bibtex
@misc{dosovitskiy2020image,
  title={An Image is Worth 16x16 Words: Transformers for Image Recognition at Scale},
  author={Alexey Dosovitskiy and Lucas Beyer and Alexander Kolesnikov and Dirk Weissenborn and Xiaohua Zhai and Thomas Unterthiner and Mostafa Dehghani and Matthias Minderer and Georg Heigold and Sylvain Gelly and Jakob Uszkoreit and Neil Houlsby},
  year={2020},
  eprint={2010.11929},
  archivePrefix={arXiv},
  primaryClass={cs.CV}
}
```