Cat vs. Dog Image Classification

This is a Keras image classification model trained to distinguish between images of cats and dogs. It is built on the EfficientNetB1 architecture and was trained on the microsoft/cats_vs_dogs dataset.

Model Architecture

The model uses EfficientNetB1 pre-trained on ImageNet as its base. The architecture is as follows:

Input Layer: Accepts images of size (240, 240, 3).
Data Augmentation: Applies random transformations to the input images to improve generalization (these layers are active only during training):
- RandomFlip("horizontal")
- RandomRotation(0.1)
- RandomZoom(0.1)
- RandomContrast(0.1)
- RandomBrightness(0.1)
Base Model: EfficientNetB1 (with weights frozen during the initial training phase).
Classification Head:
- GlobalAveragePooling2D
- Dropout(0.2)
- Dense(1, activation="sigmoid") for binary classification.
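The architecture above can be sketched in Keras roughly as follows. This is a minimal reconstruction, not the author's training script. The layer order, the hypothetical `build_model` helper, and calling the base with `training=False` are assumptions. The `training=False` call keeps the base's BatchNormalization layers in inference mode, following the Keras transfer-learning guide. Passing `weights=None` skips the ImageNet download, which is useful for quick shape checks.

```python
import tensorflow as tf


def build_model(weights="imagenet", input_shape=(240, 240, 3)):
    """Hypothetical reconstruction of the described architecture (layer order assumed)."""
    # Augmentation layers only transform inputs when training=True (e.g. inside fit()).
    augmentation = tf.keras.Sequential(
        [
            tf.keras.layers.RandomFlip("horizontal"),
            tf.keras.layers.RandomRotation(0.1),
            tf.keras.layers.RandomZoom(0.1),
            tf.keras.layers.RandomContrast(0.1),
            tf.keras.layers.RandomBrightness(0.1),
        ],
        name="data_augmentation",
    )

    # EfficientNetB1 without its ImageNet classifier. It rescales raw 0-255
    # pixels internally, so no separate normalization layer is added here.
    base = tf.keras.applications.EfficientNetB1(
        include_top=False, weights=weights, input_shape=input_shape
    )
    base.trainable = False  # frozen for the transfer-learning stage

    inputs = tf.keras.Input(shape=input_shape)
    x = augmentation(inputs)
    # Assumption: training=False keeps BatchNormalization in inference mode,
    # even if the base is unfrozen later for fine-tuning.
    x = base(x, training=False)
    x = tf.keras.layers.GlobalAveragePooling2D()(x)
    x = tf.keras.layers.Dropout(0.2)(x)
    outputs = tf.keras.layers.Dense(1, activation="sigmoid")(x)
    return tf.keras.Model(inputs, outputs), base
```

Returning `base` alongside the model makes it easy to unfreeze part of it later without searching the model's layers by name.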

Training Procedure

The model was trained in two stages:

Transfer Learning: The EfficientNetB1 base was frozen, and only the classification head was trained for 50 epochs. This allows the model to learn to classify cats and dogs using the features learned from ImageNet.
Fine-Tuning: The top 20 layers of the EfficientNetB1 base were unfrozen, and the model was trained for an additional 50 epochs with a lower learning rate, while the remaining base layers stayed frozen. This adapts the pre-trained features to the specific task of cat vs. dog classification.
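The switch from the first stage to the second can be sketched as below. This assumes `model` wraps a frozen EfficientNetB1 `base` as described under Model Architecture. Keras only picks up changes to `trainable` after the model is recompiled, so the helper recompiles with AdamW. The helper name `prepare_fine_tuning` and the constant learning rate of 1e-5 are hypothetical. The card states only that the fine-tuning rate was lower, and a constant rate is used here instead of the card's CosineDecayRestarts schedule to keep the sketch short.

```python
import tensorflow as tf


def prepare_fine_tuning(model, base, n_layers=20, learning_rate=1e-5):
    """Unfreeze the top `n_layers` layers of `base` and recompile `model`.

    The 1e-5 learning rate is a hypothetical placeholder, not the card's value.
    """
    base.trainable = True  # unfreeze every layer of the base first...
    for layer in base.layers[:-n_layers]:
        layer.trainable = False  # ...then re-freeze all but the top n_layers

    # Recompile so the new trainable flags and the lower learning rate take effect.
    model.compile(
        optimizer=tf.keras.optimizers.AdamW(learning_rate=learning_rate),
        loss="binary_crossentropy",
        metrics=["accuracy", tf.keras.metrics.AUC(name="auc")],
    )
    return model
```

After this call, a second `model.fit(...)` for 50 epochs would correspond to the fine-tuning stage.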

Key training parameters:

Optimizer: AdamW
Loss Function: binary_crossentropy
Learning Rate Schedule: CosineDecayRestarts
Metrics: accuracy, AUC
Batch Size: 16
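The parameters above could be combined as shown below. This is a sketch under assumptions: the card does not report the initial learning rate, the decay period, or the weight decay. The values `initial_learning_rate=1e-3`, `first_decay_steps=1000`, and `weight_decay=1e-4` are placeholders, and `make_optimizer` and `compile_model` are hypothetical helper names.

```python
import tensorflow as tf


def make_optimizer(initial_learning_rate=1e-3, first_decay_steps=1000, weight_decay=1e-4):
    """AdamW driven by a CosineDecayRestarts schedule (all numeric values hypothetical)."""
    # Cosine decay that restarts after first_decay_steps, then after
    # progressively longer periods (t_mul defaults to 2.0).
    schedule = tf.keras.optimizers.schedules.CosineDecayRestarts(
        initial_learning_rate=initial_learning_rate,
        first_decay_steps=first_decay_steps,
    )
    optimizer = tf.keras.optimizers.AdamW(learning_rate=schedule, weight_decay=weight_decay)
    return optimizer, schedule


def compile_model(model, **optimizer_kwargs):
    """Compile with the loss and metrics listed in the card."""
    optimizer, _ = make_optimizer(**optimizer_kwargs)
    model.compile(
        optimizer=optimizer,
        loss="binary_crossentropy",
        metrics=["accuracy", tf.keras.metrics.AUC(name="auc")],
    )
    return model
```

The batch size of 16 would be applied in the input pipeline, for example `train_ds.batch(16)` for a hypothetical `tf.data.Dataset` named `train_ds`, or via `batch_size=16` when calling `fit` on in-memory arrays.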

Evaluation Results

The model was evaluated on a test set of 3,512 images, achieving the following performance:

Metric	Value
Loss	0.0338
Accuracy	99.54%
AUC	0.9994
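As a sanity check on the table, the rounded accuracy can be converted back into an image count. Assuming accuracy is the plain fraction of correctly classified test images, 99.54% of 3,512 corresponds to 3,496 correct predictions and 16 misclassifications. No other integer count rounds to 99.54% (3,495 gives 99.52% and 3,497 gives 99.57%).

```python
# Back-of-the-envelope: how many test images does 99.54% accuracy correspond to?
n_test = 3512               # test-set size reported above
reported_accuracy = 0.9954  # rounded to two decimals in the table

correct = round(n_test * reported_accuracy)  # 3495.8448 -> 3496
misclassified = n_test - correct             # 16

# Check that 3496 / 3512 rounds back to the reported 99.54%.
assert round(100 * correct / n_test, 2) == 99.54
print(correct, misclassified)  # 3496 16
```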

How to Use

You can use this model for inference with TensorFlow and Keras.

First, make sure you have TensorFlow installed:

pip install tensorflow

Then, you can load the model and use it to predict on a new image:

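import numpy as np
import tensorflow as tf

# Load the saved Keras model (replace the path with your own).
model = tf.keras.models.load_model('path/to/your/model.keras')

# Load the image and resize it to the model's 240x240 input size.
img_path = 'path/to/your/image.jpg'
img = tf.keras.utils.load_img(img_path, target_size=(240, 240))
img_array = tf.keras.utils.img_to_array(img)  # shape (240, 240, 3), values 0-255
img_array = np.expand_dims(img_array, axis=0)  # add batch dimension -> (1, 240, 240, 3)

# For EfficientNet this is a pass-through: rescaling happens inside the model,
# so the raw 0-255 pixel values are exactly what the model expects.
preprocessed_img = tf.keras.applications.efficientnet.preprocess_input(img_array)

prediction = model.predict(preprocessed_img, verbose=0)
score = float(prediction[0][0])  # sigmoid output in [0, 1]

print(
    f"This image is {100 * (1 - score):.2f}% cat and {100 * score:.2f}% dog."
)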

Note: The model outputs a single sigmoid value between 0 and 1, interpreted as the probability that the image shows a dog: values closer to 0 indicate a cat, and values closer to 1 indicate a dog. This interpretation assumes the label encoding used during training was cat = 0, dog = 1, as in the microsoft/cats_vs_dogs dataset.
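A small helper for turning the score into a label might look like this. It assumes the cat = 0, dog = 1 encoding and a decision threshold of 0.5. The card does not state a threshold, although 0.5 is what Keras's `accuracy` metric uses for a sigmoid output. The function name `label_from_score` is hypothetical.

```python
def label_from_score(score, threshold=0.5):
    """Map the model's sigmoid output to (label, confidence); cat=0, dog=1 assumed."""
    if not 0.0 <= score <= 1.0:
        raise ValueError(f"expected a probability in [0, 1], got {score}")
    label = "dog" if score >= threshold else "cat"
    # Probability assigned to the predicted class (>= 0.5 when threshold is 0.5).
    confidence = score if label == "dog" else 1.0 - score
    return label, confidence
```

With `score` from the inference snippet, `label, confidence = label_from_score(score)` gives, for example, `("cat", 0.8)` for a score of 0.2.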

Dataset Credits

The training data is the publicly available microsoft/cats_vs_dogs dataset (originally the Asirra CAPTCHA dataset). Huge thanks to Microsoft Research and Petfinder.com for releasing the images!

@misc{microsoftcatsdogs,
  title  = {Cats vs. Dogs Image Dataset},
  author = {{Microsoft Research} and {Petfinder.com}},
  howpublished = {HuggingFace Hub},
  url    = {https://huggingface.co/datasets/microsoft/cats_vs_dogs}
}

Acknowledgements

TensorFlow/Keras team for the excellent deep-learning framework.
Mingxing Tan & Quoc V. Le for EfficientNet.
The Hugging Face community for the awesome Model & Dataset hubs.

muhalwan/catndog