metadata
language:
- en
- hi
- or
- bn
- ta
- te
- kn
- ml
- mr
- gu
- pa
- as
license: apache-2.0
pipeline_tag: audio-classification
library_name: transformers
tags:
- language-identification
- indian-languages
- multilingual
- speech
- asr-preprocessing
- callcenter-ai
- speech-analytics
- audio-classification
- wav2vec2
- transformers
- pytorch
- huggingface
Vakgyata
Language Identification for Indian Languages from Speech
Model Overview
vakgyata is an open-source language identification model designed to classify Indian languages from raw speech audio. It is built on the pretrained Harveenchadha/wav2vec2-pretrained-clsril-23-10k model, with additional Layer Normalization integrated to improve stability and performance on audio classification tasks.
Variants and Model Sizes
Variant | Parameters | Accuracy |
---|---|---|
vakgyata-base | 95M | 95.88% |
vakgyata-small | 52M | 95.06% |
vakgyata-mini | 38M | 95.06% |
vakgyata-tiny | 24M | 93.63% |
Supported Languages
Language | Code |
---|---|
English (India) | en-IN |
Hindi | hi-IN |
Odia | or-IN |
Bengali | bn-IN |
Tamil | ta-IN |
Telugu | te-IN |
Kannada | kn-IN |
Malayalam | ml-IN |
Marathi | mr-IN |
Gujarati | gu-IN |
Punjabi | pa-IN |
Assamese | as-IN |
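For display purposes, the codes in the table above can be mapped back to language names. The sketch below assumes the model's predicted labels use the same codes as the table, which this card does not confirm; `LANGUAGE_NAMES` and `language_name` are hypothetical names introduced for illustration.

```python
# Codes and names copied from the Supported Languages table.
# Assumption: the model's labels match these codes exactly.
LANGUAGE_NAMES = {
    "en-IN": "English (India)",
    "hi-IN": "Hindi",
    "or-IN": "Odia",
    "bn-IN": "Bengali",
    "ta-IN": "Tamil",
    "te-IN": "Telugu",
    "kn-IN": "Kannada",
    "ml-IN": "Malayalam",
    "mr-IN": "Marathi",
    "gu-IN": "Gujarati",
    "pa-IN": "Punjabi",
    "as-IN": "Assamese",
}


def language_name(label: str) -> str:
    """Return a display name for a label, or the label itself if it is unknown."""
    return LANGUAGE_NAMES.get(label, label)
```

Falling back to the raw label keeps the helper usable if the labels turn out to follow a different scheme.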
Specifications
- Supported Sampling Rate: 16000 Hz
- Recommended Audio Format: 16 kHz, 16-bit PCM (mono)
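A file can be checked against these specifications before it is sent to the model. The following is a minimal sketch using Python's standard `wave` module, which only reads uncompressed PCM WAV files; the function name `check_wav_format` and the constants are hypothetical and not part of this model's tooling.

```python
import wave

EXPECTED_RATE_HZ = 16000    # supported sampling rate from the specifications
EXPECTED_SAMPLE_WIDTH = 2   # bytes per sample, i.e. 16-bit PCM
EXPECTED_CHANNELS = 1       # mono


def check_wav_format(path: str) -> list[str]:
    """Return a list of mismatches with the recommended format (empty if none)."""
    problems = []
    with wave.open(path, "rb") as wav:
        rate = wav.getframerate()
        width = wav.getsampwidth()
        channels = wav.getnchannels()
    if rate != EXPECTED_RATE_HZ:
        problems.append(f"sample rate is {rate} Hz, expected {EXPECTED_RATE_HZ} Hz")
    if width != EXPECTED_SAMPLE_WIDTH:
        problems.append(f"sample width is {8 * width} bits, expected 16 bits")
    if channels != EXPECTED_CHANNELS:
        problems.append(f"file has {channels} channels, expected mono")
    return problems
```

An empty list means the file already matches the recommended format; otherwise the messages indicate whether resampling or downmixing is needed first.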
Installation
pip install transformers torchaudio
Usage
from transformers import Wav2Vec2ForSequenceClassification, AutoFeatureExtractor
import torch
device = "cuda" if torch.cuda.is_available() else "cpu"
model_id = "onecxi/vakgyata-tiny"
processor = AutoFeatureExtractor.from_pretrained(model_id)
model = Wav2Vec2ForSequenceClassification.from_pretrained(model_id).to(device)
Inference Example
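import torchaudio

# Load the audio; the model expects 16 kHz mono input
audio, sr = torchaudio.load("path/to/audio.wav")
if sr != 16000:
    # Resample so the rate matches what the feature extractor expects
    audio = torchaudio.functional.resample(audio, orig_freq=sr, new_freq=16000)
    sr = 16000
audio = audio.mean(dim=0)  # downmix channels to mono, shape: (num_samples,)

# Preprocess
inputs = processor(audio.numpy(), sampling_rate=sr, return_tensors="pt").to(device)

# Inference
with torch.no_grad():
    logits = model(**inputs).logits

# Softmax to get probabilities
probs = logits.softmax(dim=-1)

# Predicted language
language = model.config.id2label[int(probs.argmax(dim=-1))]
print("Predicted Language:", language)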
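The example above reports only the single most likely language. When the prediction feeds a downstream step such as ASR routing, a ranked list with probabilities can be more useful. The sketch below is one way to do that in plain Python; `top_k_languages` is a hypothetical helper, and it assumes the logits for one clip are passed as a list of floats (for instance `logits[0].tolist()`).

```python
import math


def top_k_languages(logits, id2label, k=3):
    """Return the k most likely (label, probability) pairs, best first."""
    # Numerically stable softmax: subtract the largest logit before exponentiating.
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    probs = [e / total for e in exps]
    # Rank class indices by probability, highest first.
    ranked = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    return [(id2label[i], probs[i]) for i in ranked[:k]]
```

With the objects from the inference example, a call might look like `top_k_languages(logits[0].tolist(), model.config.id2label)`. A small gap between the first and second probabilities suggests an ambiguous clip.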
Citation
If you use this model in your research or application, please consider citing the model and its base source:
@misc{vakgyata2024,
  title={vakgyata: Language Identification for Indian Speech},
  author={OneCXI},
  year={2024},
  url={https://huggingface.co/onecxi/vakgyata-tiny}
}