Fine-tuned VGG-16 Model for Gunshot Detection
This is a fine-tuned VGG-16 model for detecting gunshots in audio recordings. The model was trained on a dataset of audio clips labeled as either "gunshot" or "background".
Model Details
- Trained by: Ranabir Saha
- Fine-tuned on: Tropical forest gunshot classification training audio dataset from "Automated detection of gunshots in tropical forests using convolutional neural networks" (Katsis et al. 2022)
- Dataset Source: https://doi.org/10.17632/x48cwz364j.3
- Input: Preprocessed mel-spectrograms (224x224x3) loaded from .npy files, generated from 4-second audio clips
- Output: Binary classification (Gunshot/Background)
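The card states that each spectrogram comes from a 4-second clip but does not say how longer recordings were segmented. As a minimal sketch, assuming non-overlapping windows and dropping any trailing remainder, the sample indices of each clip could be computed as follows. The function name `clip_boundaries`, the 8 kHz sample rate, and the 10-second recording length are all hypothetical values chosen for illustration.

```python
def clip_boundaries(num_samples, sample_rate, clip_seconds=4.0):
    """Return (start, end) sample indices of consecutive full-length clips."""
    if sample_rate <= 0 or clip_seconds <= 0:
        raise ValueError("sample_rate and clip_seconds must be positive")
    clip_len = int(round(sample_rate * clip_seconds))
    # Assumption: a trailing remainder shorter than one clip is dropped.
    # Padding it to full length would be an equally reasonable choice.
    return [
        (start, start + clip_len)
        for start in range(0, num_samples - clip_len + 1, clip_len)
    ]


# Hypothetical 10-second recording sampled at 8 kHz (both values are assumptions).
bounds = clip_boundaries(num_samples=80_000, sample_rate=8_000)
print(bounds)
```

Each `(start, end)` pair can then be used to slice the waveform before computing a mel-spectrogram for that clip.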
Training
The model was trained using the following parameters:
- Base Model: VGG-16 pre-trained on ImageNet
- Optimizer: Adam (initial learning rate = 1e-4, fine-tuning learning rate = 1e-5)
- Loss Function: Categorical cross-entropy
- Metrics: Accuracy, Precision, Recall
- Batch Size: 32
- Initial Training: Up to 25 epochs with early stopping (patience=5) on validation loss
- Fine-tuning: Last 8 layers unfrozen, up to 10 epochs with early stopping (patience=5)
- Class Weights: Balanced to handle class imbalance
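The card does not spell out how the "balanced" class weights were computed. A minimal sketch of one common heuristic, which weights each class by n_samples / (n_classes * class_count), is shown below. The helper name `balanced_class_weights` and the 900/100 label split are hypothetical; the actual training script may differ.

```python
from collections import Counter


def balanced_class_weights(labels):
    """Weight each class inversely to its frequency in `labels`."""
    counts = Counter(labels)
    n_samples = len(labels)
    n_classes = len(counts)
    return {
        cls: n_samples / (n_classes * count)
        for cls, count in counts.items()
    }


# Hypothetical imbalanced label set (the real class counts are not given in the card).
labels = ["background"] * 900 + ["gunshot"] * 100
weights = balanced_class_weights(labels)
print(weights)
```

With this heuristic the rarer class receives the larger weight, so errors on it contribute more to the loss during training.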
Usage
To use this model for inference, you can load it from the Hugging Face Hub and pass preprocessed mel-spectrograms as input.
Example
import numpy as np
import tensorflow as tf
from huggingface_hub import hf_hub_download

# Download the model from the Hugging Face Hub and load it
model_path = hf_hub_download(repo_id="ranvir-not-found/vgg16-sda_gunshot-detection", filename="vgg16_model.keras")
model = tf.keras.models.load_model(model_path)

# Function to load and preprocess a .npy file
def load_and_preprocess_npy(file_path):
    # This function expects the file to hold a 2-D mel-spectrogram array
    mel_spectrogram = np.load(file_path)
    # Min-max normalization to the range [0, 255]
    spec_min = np.min(mel_spectrogram)
    spec_max = np.max(mel_spectrogram)
    if spec_max > spec_min:
        mel_spectrogram = 255 * (mel_spectrogram - spec_min) / (spec_max - spec_min)
    else:
        # Constant input: avoid division by zero
        mel_spectrogram = np.zeros_like(mel_spectrogram)
    mel_spectrogram = mel_spectrogram.astype(np.float32)
    # Add a channel axis and resize to 224x224
    mel = tf.image.resize(mel_spectrogram[..., np.newaxis], (224, 224))
    # Repeat the single channel to create 3 channels
    mel = tf.repeat(mel, 3, axis=-1)
    # Apply VGG-16 preprocessing
    mel = tf.keras.applications.vgg16.preprocess_input(mel)
    return mel

# Example usage
npy_path = "path/to/your/spectrogram.npy"
input_data = load_and_preprocess_npy(npy_path)
input_data = tf.expand_dims(input_data, axis=0)  # Add batch dimension
predictions = model.predict(input_data)
# The order of class_names must match the label encoding used in training
class_names = ['gunshot', 'background']
predicted_class = class_names[np.argmax(predictions[0])]
print(f"Predicted class: {predicted_class}, Probabilities: {predictions[0]}")
Evaluation
The model was evaluated on a validation set, and the following metrics were computed:
- Confusion Matrix
- ROC Curve
- Precision-Recall Curve
- Classification Report
The evaluation results are saved in the evaluation_results directory. The model was optimized for recall on the 'gunshot' class.
For more details, please refer to the training script and logs.
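The card does not say how recall on the 'gunshot' class was prioritized. One common way to trade precision for recall at inference time is to lower the decision threshold on the gunshot probability. The sketch below illustrates that trade-off on made-up data. The helper `precision_recall`, the labels, and the probabilities are all hypothetical and are not results from this model.

```python
def precision_recall(y_true, gunshot_probs, threshold=0.5):
    """Precision and recall for the positive (gunshot = 1) class."""
    tp = fp = fn = 0
    for truth, prob in zip(y_true, gunshot_probs):
        predicted_gunshot = prob >= threshold
        if predicted_gunshot and truth == 1:
            tp += 1
        elif predicted_gunshot and truth == 0:
            fp += 1
        elif not predicted_gunshot and truth == 1:
            fn += 1
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return precision, recall


# Made-up labels (1 = gunshot, 0 = background) and gunshot probabilities.
y_true = [1, 1, 1, 0, 0, 0]
gunshot_probs = [0.9, 0.6, 0.3, 0.4, 0.2, 0.1]

results = {
    threshold: precision_recall(y_true, gunshot_probs, threshold)
    for threshold in (0.5, 0.25)
}
for threshold, (precision, recall) in results.items():
    print(f"threshold={threshold}: precision={precision:.2f}, recall={recall:.2f}")
```

In this toy example, lowering the threshold recovers the missed gunshot at the cost of one false alarm, which is the kind of trade-off a recall-oriented detector accepts.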