Fine-tuned VGG-16 Model for Gunshot Detection

This is a fine-tuned VGG-16 model for detecting gunshots in audio recordings. The model was trained on a dataset of audio clips labeled as either "gunshot" or "background".

Model Details

Trained by: Ranabir Saha
Fine-tuned on: Tropical forest gunshot classification training audio dataset from "Automated detection of gunshots in tropical forests using convolutional neural networks" (Katsis et al. 2022)
Dataset Source: https://doi.org/10.17632/x48cwz364j.3
Input: Mel-spectrograms generated from 4-second audio clips, loaded from .npy files and preprocessed to 224x224x3
Output: Binary classification (Gunshot/Background)
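
Because each input corresponds to a 4-second clip, a longer recording has to be cut into 4-second windows before spectrograms are computed. The sketch below shows one way to do that with NumPy. The helper name split_into_clips, the 8 kHz sample rate in the usage lines, and the choice to drop a trailing partial window are assumptions for illustration, not details taken from the training pipeline.

```python
import numpy as np

def split_into_clips(waveform, sample_rate, clip_seconds=4.0, hop_seconds=None):
    """Cut a 1-D waveform into fixed-length clips (hypothetical helper).

    hop_seconds defaults to clip_seconds, i.e. non-overlapping windows.
    A trailing window shorter than clip_seconds is dropped.
    """
    waveform = np.asarray(waveform)
    if waveform.ndim != 1:
        raise ValueError("expected a mono (1-D) waveform")
    clip_len = int(round(clip_seconds * sample_rate))
    hop_len = clip_len if hop_seconds is None else int(round(hop_seconds * sample_rate))
    if clip_len <= 0 or hop_len <= 0:
        raise ValueError("clip and hop lengths must be positive")
    # Start indices of every window that fits entirely inside the waveform
    starts = range(0, len(waveform) - clip_len + 1, hop_len)
    clips = [waveform[s:s + clip_len] for s in starts]
    if not clips:
        return np.empty((0, clip_len), dtype=waveform.dtype)
    return np.stack(clips)

# Usage with a synthetic 10-second signal at an assumed 8 kHz sample rate
sample_rate = 8000
recording = np.zeros(10 * sample_rate, dtype=np.float32)
clips = split_into_clips(recording, sample_rate)
print(clips.shape)  # (2, 32000)
```

Passing a hop_seconds smaller than clip_seconds yields overlapping windows, which reduces the chance that a gunshot is split across a clip boundary at the cost of more clips to score.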

Training

The model was trained using the following parameters:

Base Model: VGG-16 pre-trained on ImageNet
Optimizer: Adam (initial learning rate=0.0001, fine-tuning learning rate=1e-5)
Loss Function: Categorical cross-entropy
Metrics: Accuracy, Precision, Recall
Batch Size: 32
Initial Training: Up to 25 epochs with early stopping (patience=5) on validation loss
Fine-tuning: Last 8 layers unfrozen, up to 10 epochs with early stopping (patience=5)
Class Weights: Balanced to handle class imbalance
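
Balanced class weights are commonly computed as n_samples / (n_classes * class_count), the heuristic scikit-learn uses for class_weight="balanced". The sketch below implements that formula in plain Python. Whether the training script used exactly this formula is an assumption, and the label counts in the usage lines are made up for illustration.

```python
from collections import Counter

def balanced_class_weights(labels):
    """Return {class: weight} with weight = n_samples / (n_classes * class_count)."""
    counts = Counter(labels)
    if not counts:
        raise ValueError("labels must not be empty")
    n_samples = sum(counts.values())
    n_classes = len(counts)
    return {cls: n_samples / (n_classes * count) for cls, count in counts.items()}

# Illustrative (made-up) label counts: 0 = gunshot, 1 = background
labels = [0] * 200 + [1] * 800
weights = balanced_class_weights(labels)
print(weights)  # {0: 2.5, 1: 0.625}
```

A dictionary in this form can be passed as the class_weight argument of Keras Model.fit, so that errors on the rarer class contribute more to the loss.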

Usage

To run inference, load the model from the Hugging Face Hub and pass it preprocessed mel-spectrograms.

Example

import numpy as np
import tensorflow as tf
from huggingface_hub import hf_hub_download

# Download the model
model_path = hf_hub_download(repo_id="ranvir-not-found/vgg16-sda_gunshot-detection", filename="vgg16_model.keras")
model = tf.keras.models.load_model(model_path)

# Load a .npy mel-spectrogram and convert it to the 224x224x3 input the model expects.
# Assumes the file holds a 2-D (single-channel) array.
def load_and_preprocess_npy(file_path):
    mel_spectrogram = np.load(file_path)
    # Normalization
    spec_min = np.min(mel_spectrogram)
    spec_max = np.max(mel_spectrogram)
    if spec_max > spec_min:
        mel_spectrogram = 255 * (mel_spectrogram - spec_min) / (spec_max - spec_min)
    else:
        mel_spectrogram = np.zeros_like(mel_spectrogram)
    mel_spectrogram = mel_spectrogram.astype(np.float32)
    # Resize to 224x224
    mel = tf.image.resize(mel_spectrogram[..., np.newaxis], (224, 224))
    # Repeat to create 3 channels
    mel = tf.repeat(mel, 3, axis=-1)
    # Apply VGG-16 preprocessing
    mel = tf.keras.applications.vgg16.preprocess_input(mel)
    return mel

# Example usage
npy_path = "path/to/your/spectrogram.npy"
input_data = load_and_preprocess_npy(npy_path)
input_data = tf.expand_dims(input_data, axis=0)  # Add batch dimension
predictions = model.predict(input_data)
# Class order must match the label encoding used during training
class_names = ['gunshot', 'background']
predicted_class = class_names[np.argmax(predictions[0])]
print(f"Predicted class: {predicted_class}, Probabilities: {predictions[0]}")
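
With two classes whose probabilities sum to 1, np.argmax is effectively a 0.5 threshold on the gunshot probability. When a missed gunshot is costlier than a false alarm, a lower threshold can be applied instead. The sketch below shows one way to do that in plain Python. The helper name classify_with_threshold, the 0.3 threshold, and the assumption that index 0 is the gunshot class (taken from the order of class_names above) are illustrative and should be checked against the training script.

```python
def classify_with_threshold(probabilities, threshold=0.5, gunshot_index=0):
    """Label one prediction row as 'gunshot' if its gunshot probability
    reaches the threshold, otherwise 'background' (hypothetical helper)."""
    if not 0.0 <= threshold <= 1.0:
        raise ValueError("threshold must be between 0 and 1")
    gunshot_prob = float(probabilities[gunshot_index])
    return "gunshot" if gunshot_prob >= threshold else "background"

# Made-up probability rows in the same [gunshot, background] order as above
rows = [[0.92, 0.08], [0.35, 0.65], [0.10, 0.90]]
print([classify_with_threshold(r) for r in rows])
# ['gunshot', 'background', 'background']
print([classify_with_threshold(r, threshold=0.3) for r in rows])
# ['gunshot', 'gunshot', 'background']
```

Lowering the threshold raises recall on the gunshot class and usually lowers precision, so a suitable value is best chosen from a precision-recall curve on held-out data.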

Evaluation

The model was evaluated on a validation set, and the following outputs were produced:

Confusion Matrix
ROC Curve
Precision-Recall Curve
Classification Report

The evaluation results are saved in the evaluation_results directory. The model was optimized for recall on the 'gunshot' class.
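
The precision and recall figures in a classification report follow directly from the confusion-matrix counts. The sketch below computes them for the 'gunshot' class in plain Python. The function name gunshot_metrics and the toy labels are illustrative only and are not results from this model.

```python
def gunshot_metrics(y_true, y_pred, positive="gunshot"):
    """Return confusion counts plus precision and recall for the positive class."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)  # true positives
    fp = sum(t != positive and p == positive for t, p in pairs)  # false alarms
    fn = sum(t == positive and p != positive for t, p in pairs)  # missed gunshots
    tn = len(pairs) - tp - fp - fn                               # true negatives
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return {"tp": tp, "fp": fp, "fn": fn, "tn": tn,
            "precision": precision, "recall": recall}

# Toy labels for illustration only
y_true = ["gunshot", "gunshot", "background", "background", "gunshot"]
y_pred = ["gunshot", "background", "gunshot", "background", "gunshot"]
print(gunshot_metrics(y_true, y_pred))
```

Recall is the fraction of true gunshots that were detected, so a model optimized for recall accepts more false alarms (lower precision) in exchange for fewer missed gunshots.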

For more details, please refer to the training script and logs.