Cat vs. Dog Image Classification
This is a Keras image classification model trained to distinguish between images of cats and dogs. It is based on the EfficientNetB1 architecture and was trained on the microsoft/cats_vs_dogs dataset.
Model Architecture
The model uses EfficientNetB1 pre-trained on ImageNet as its base. The architecture is as follows:
- Input Layer: Accepts images of size (240, 240, 3).
- Data Augmentation: Applies random transformations to the input images to improve generalization:
  - RandomFlip("horizontal")
  - RandomRotation(0.1)
  - RandomZoom(0.1)
  - RandomContrast(0.1)
  - RandomBrightness(0.1)
- Base Model: EfficientNetB1 (with weights frozen during the initial training phase).
- Classification Head:
  - GlobalAveragePooling2D
  - Dropout(0.2)
  - Dense(1, activation="sigmoid") for binary classification.
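The layer stack above could be assembled with the Keras functional API roughly as follows. This is a minimal sketch rather than the original training script: the function name `build_model`, its `weights` parameter, and the choice to place the augmentation layers inside the model are assumptions.

```python
from tensorflow import keras
from tensorflow.keras import layers

IMG_SHAPE = (240, 240, 3)


def build_model(weights="imagenet"):
    """Hypothetical helper: assemble the architecture described above."""
    inputs = keras.Input(shape=IMG_SHAPE)

    # Augmentation layers are only active during training.
    x = layers.RandomFlip("horizontal")(inputs)
    x = layers.RandomRotation(0.1)(x)
    x = layers.RandomZoom(0.1)(x)
    x = layers.RandomContrast(0.1)(x)
    x = layers.RandomBrightness(0.1)(x)

    # EfficientNetB1 backbone without its ImageNet classifier, frozen for stage one.
    base = keras.applications.EfficientNetB1(
        include_top=False, weights=weights, input_shape=IMG_SHAPE
    )
    base.trainable = False
    x = base(x, training=False)  # keep BatchNormalization in inference mode

    # Classification head: a single sigmoid unit for cat vs. dog.
    x = layers.GlobalAveragePooling2D()(x)
    x = layers.Dropout(0.2)(x)
    outputs = layers.Dense(1, activation="sigmoid")(x)
    return keras.Model(inputs, outputs)
```

Passing `weights=None` builds the same graph with random initialization, which is convenient for a quick shape check without downloading the ImageNet weights.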
Training Procedure
The model was trained in two stages:
- Transfer Learning: The EfficientNetB1 base was frozen, and only the classification head was trained for 50 epochs. This lets the model learn to classify cats and dogs using the features learned from ImageNet.
- Fine-Tuning: The top 20 layers of the EfficientNetB1 base were unfrozen, and the model was trained for an additional 50 epochs with a lower learning rate. This adapts the pre-trained features to the specific task of cat vs. dog classification.
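The fine-tuning step can be sketched as a small helper that unfreezes only the last layers of the base. The helper name `unfreeze_top_layers` is hypothetical, and the model card does not say whether BatchNormalization layers were kept frozen, so this version makes no special case for them.

```python
def unfreeze_top_layers(base_model, n=20):
    """Make only the last `n` layers of `base_model` trainable.

    Works on any object exposing `.trainable` and `.layers`, such as a
    Keras model. Returns the names of the layers left trainable.
    """
    base_model.trainable = True  # lift the model-level freeze first
    layers = list(base_model.layers)
    cutoff = max(len(layers) - n, 0)
    for index, layer in enumerate(layers):
        layer.trainable = index >= cutoff
    return [layer.name for layer in layers[cutoff:]]


# Example with a Keras base model (not run here):
# base = keras.applications.EfficientNetB1(include_top=False, weights="imagenet")
# unfreeze_top_layers(base, n=20)
# Re-compile the full model afterwards so the change takes effect.
```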
Key training parameters:
- Optimizer: AdamW
- Loss Function: binary_crossentropy
- Learning Rate Schedule: CosineDecayRestarts
- Metrics: accuracy, AUC
- Batch Size: 16
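For intuition about the learning rate schedule, here is a pure-Python sketch of the cosine-with-warm-restarts rule that Keras's `CosineDecayRestarts` follows. The model card does not state the initial learning rate or the first restart period, so `initial_lr=1e-3` and `first_decay_steps=1000` in the example are placeholder assumptions; `t_mul=2.0`, `m_mul=1.0`, and `alpha=0.0` mirror the Keras defaults.

```python
import math


def cosine_decay_restarts(step, initial_lr, first_decay_steps,
                          t_mul=2.0, m_mul=1.0, alpha=0.0):
    """Learning rate at `step` under cosine decay with warm restarts."""
    period = float(first_decay_steps)
    amplitude = 1.0
    position = float(step)
    # Walk through completed cycles: each restart stretches the period
    # by t_mul and scales the peak learning rate by m_mul.
    while position >= period:
        position -= period
        period *= t_mul
        amplitude *= m_mul
    cosine = 0.5 * amplitude * (1.0 + math.cos(math.pi * position / period))
    return initial_lr * ((1.0 - alpha) * cosine + alpha)


# Placeholder values, not taken from the original training run.
for step in (0, 500, 999, 1000, 2000):
    print(step, cosine_decay_restarts(step, initial_lr=1e-3, first_decay_steps=1000))
```

The rate starts at its peak, decays along a half cosine, and jumps back to the peak at each restart, with each cycle twice as long as the previous one.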
Evaluation Results
The model was evaluated on a test set of 3,512 images, achieving the following performance:
Metric | Value |
---|---|
Loss | 0.0338 |
Accuracy | 99.54% |
AUC | 0.9994 |
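As a sanity check on the table, the reported accuracy can be converted back into a number of misclassified images. The sketch below (the function name is hypothetical) searches for every error count that rounds to the reported percentage; it assumes accuracy was computed over all 3,512 test images and rounded to two decimals.

```python
def error_counts_matching(accuracy_pct, n_images, decimals=2):
    """Return every error count whose accuracy rounds to `accuracy_pct`."""
    matches = []
    for errors in range(n_images + 1):
        accuracy = 100 * (n_images - errors) / n_images
        if round(accuracy, decimals) == accuracy_pct:
            matches.append(errors)
    return matches


print(error_counts_matching(99.54, 3512))  # [16]
```

Under those assumptions, 99.54% corresponds to 16 misclassified images out of 3,512.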
How to Use
You can use this model for inference with TensorFlow and Keras.
First, make sure you have TensorFlow installed:
pip install tensorflow
Then, you can load the model and use it to predict on a new image:
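import numpy as np
import tensorflow as tf

# Load the saved Keras model (replace with the path to your copy).
model = tf.keras.models.load_model('path/to/your/model.keras')

# Load the image and resize it to the model's 240x240 input size.
img_path = 'path/to/your/image.jpg'
img = tf.keras.utils.load_img(img_path, target_size=(240, 240))
img_array = tf.keras.utils.img_to_array(img)   # shape (240, 240, 3), values in [0, 255]
img_array = np.expand_dims(img_array, axis=0)  # add a batch dimension: (1, 240, 240, 3)

# For EfficientNet this call is a pass-through (the model rescales inputs itself).
preprocessed_img = tf.keras.applications.efficientnet.preprocess_input(img_array)

# The model returns one sigmoid score per image.
prediction = model.predict(preprocessed_img)
score = float(prediction[0][0])
print(f"This image is {100 * (1 - score):.2f}% cat and {100 * score:.2f}% dog.")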
Note: The model outputs a single sigmoid value between 0 and 1. With the label encoding assumed here (cat=0, dog=1), a value closer to 0 indicates a cat and a value closer to 1 indicates a dog. If the labels were encoded the other way during training, swap the interpretation.
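To turn the raw score into a label, a small helper along these lines may be useful. The function name, the 0.5 decision threshold, and the cat=0 / dog=1 encoding are assumptions; adjust them if your labels were encoded differently.

```python
def label_from_score(score, threshold=0.5):
    """Map a sigmoid score to (label, confidence), assuming cat=0 and dog=1."""
    if not 0.0 <= score <= 1.0:
        raise ValueError("score must be in [0, 1]")
    if score >= threshold:
        return "dog", score
    return "cat", 1.0 - score


label, confidence = label_from_score(0.97)
print(f"{label} ({100 * confidence:.1f}% confidence)")
```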
Dataset Credits
The training data is the publicly available microsoft/cats_vs_dogs dataset (originally the Asirra CAPTCHA dataset). Huge thanks to Microsoft Research and Petfinder.com for releasing the images!
@misc{microsoftcatsdogs,
title = {Cats vs. Dogs Image Dataset},
author = {Microsoft Research & Petfinder.com},
howpublished = {HuggingFace Hub},
url = {https://huggingface.co/datasets/microsoft/cats_vs_dogs}
}
Acknowledgements
- TensorFlow/Keras team for the excellent deep-learning framework.
- Mingxing Tan & Quoc V. Le for EfficientNet.
- The Hugging Face community for the awesome Model & Dataset hubs.