B4by/voice_clssification_gender_model

Model Architecture

This model is a Random Forest classifier: an ensemble of 100 decision trees (n_estimators=100) whose individual predictions are combined into the final class. It was trained with random_state=42 so that training is reproducible.
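
As a minimal sketch of this configuration, assuming the scikit-learn implementation (the parameter names n_estimators and random_state match its RandomForestClassifier): the data below is synthetic and purely illustrative, and the helper name build_classifier is hypothetical.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier


def build_classifier():
    """Return a classifier configured as described: 100 trees, fixed seed."""
    # Assumption: all other hyperparameters are left at scikit-learn defaults.
    return RandomForestClassifier(n_estimators=100, random_state=42)


# Synthetic stand-in data (NOT the model's training set): 200 samples with
# 17 features each, labelled by the sign of the first feature.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 17))
y = (X[:, 0] > 0).astype(int)

# Fitting twice with the same seed and the same data should give identical
# forests, which is what fixing random_state is meant to guarantee.
clf_a = build_classifier().fit(X, y)
clf_b = build_classifier().fit(X, y)
same = np.array_equal(clf_a.predict_proba(X), clf_b.predict_proba(X))
```

Here same compares the predicted class probabilities of the two independently fitted forests, which is a simple way to check that the fixed seed makes training repeatable.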

Preprocessing and Features

For each audio file, a set of acoustic features was extracted using the librosa and parselmouth libraries. The features include:

Fundamental Frequency (F0) Mean: The average of the fundamental frequency (pitch) values, extracted using parselmouth. Only F0 values greater than zero were included in the average, since parselmouth reports unvoiced frames as zero.
Mel-frequency Cepstral Coefficients (MFCCs): 13 MFCCs were extracted, representing the spectral characteristics of the speech. The mean of each of these 13 coefficients was then calculated.
Spectral Centroid Mean: The average spectral centroid, indicating the "brightness" of the sound.
Spectral Bandwidth Mean: The average spectral bandwidth, describing the spread of energy in the frequency spectrum.
Zero-Crossing Rate (ZCR) Mean: The average zero-crossing rate, which indicates the rate at which the audio waveform changes sign.

These features were then horizontally stacked (np.hstack) to form a single feature vector for each audio sample, which was used as input for the Random Forest Classifier.
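
The extraction and stacking steps could look roughly like the sketch below. It is not the author's code: the function names, the order of features in the vector, default librosa and Praat analysis settings, and the fallback of 0.0 when no voiced frame is found are all assumptions made for illustration.

```python
import numpy as np


def assemble_feature_vector(f0, mfcc, centroid, bandwidth, zcr):
    """Reduce per-frame features to means and stack them into one vector.

    f0 is a 1-D array of per-frame pitch values, mfcc has shape (13, frames),
    and the remaining inputs have shape (1, frames).
    """
    voiced = f0[f0 > 0]  # keep only F0 values greater than zero
    # Assumption: fall back to 0.0 if the clip has no voiced frames.
    f0_mean = voiced.mean() if voiced.size else 0.0
    # Assumption: features are stacked in the order they are listed above.
    return np.hstack([
        f0_mean,
        mfcc.mean(axis=1),  # one mean per MFCC coefficient
        centroid.mean(),
        bandwidth.mean(),
        zcr.mean(),
    ])


def extract_features(path):
    """Compute the feature vector for one audio file (hypothetical helper)."""
    # Imported here so the stacking helper above can be used without them.
    import librosa
    import parselmouth

    y, sr = librosa.load(path, sr=None)  # keep the file's native sample rate

    # Pitch track via Praat; unvoiced frames come back as 0 Hz.
    sound = parselmouth.Sound(y.astype(np.float64), sampling_frequency=sr)
    f0 = sound.to_pitch().selected_array["frequency"]

    return assemble_feature_vector(
        f0,
        librosa.feature.mfcc(y=y, sr=sr, n_mfcc=13),
        librosa.feature.spectral_centroid(y=y, sr=sr),
        librosa.feature.spectral_bandwidth(y=y, sr=sr),
        librosa.feature.zero_crossing_rate(y),
    )
```

With the five feature groups listed above, this produces 1 + 13 + 1 + 1 + 1 = 17 values per audio sample. A call such as extract_features("sample.wav") (a placeholder path) would return that vector, which can then be passed to the classifier.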