metadata
language:
  - en
  - hi
  - or
  - bn
  - ta
  - te
  - kn
  - ml
  - mr
  - gu
  - pa
  - as
license: apache-2.0
pipeline_tag: audio-classification
library_name: transformers
tags:
  - language-identification
  - indian-languages
  - multilingual
  - speech
  - asr-preprocessing
  - callcenter-ai
  - speech-analytics
  - audio-classification
  - wav2vec2
  - transformers
  - pytorch
  - huggingface

Vakgyata

Language Identification for Indian Languages from Speech


Model Overview

vakgyata is an open-source language identification model that classifies Indian languages from raw speech audio. It is built on the pretrained Harveenchadha/wav2vec2-pretrained-clsril-23-10k checkpoint, with additional layer normalization to improve stability and performance on audio classification.


Variants and Model Sizes

| Variant        | Parameters | Accuracy |
|----------------|------------|----------|
| vakgyata-base  | 95M        | 95.88%   |
| vakgyata-small | 52M        | 95.06%   |
| vakgyata-mini  | 38M        | 95.06%   |
| vakgyata-tiny  | 24M        | 93.63%   |

Supported Languages

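| Language        | Code  |
|-----------------|-------|
| English (India) | en-IN |
| Hindi           | hi-IN |
| Odia            | or-IN |
| Bengali         | bn-IN |
| Tamil           | ta-IN |
| Telugu          | te-IN |
| Kannada         | kn-IN |
| Malayalam       | ml-IN |
| Marathi         | mr-IN |
| Gujarati        | gu-IN |
| Punjabi         | pa-IN |
| Assamese        | as-IN |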

Specifications

  • Supported Sampling Rate: 16000 Hz
  • Recommended Audio Format: 16 kHz, 16-bit PCM, mono
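
Before running inference, it can help to confirm that a file already matches the recommended format. The following is a minimal sketch using only Python's standard `wave` module; the helper name `check_wav_format` and the constants are illustrative and not part of this model's API. Note that `wave` reads PCM WAV files only, so other containers or encodings would need a different tool.

```python
import wave

EXPECTED_RATE = 16000      # Hz, the supported sampling rate
EXPECTED_WIDTH = 2         # bytes per sample, i.e. 16-bit PCM
EXPECTED_CHANNELS = 1      # mono


def check_wav_format(path):
    """Return a list of mismatches between a WAV file and the recommended format.

    An empty list means the file is 16 kHz, 16-bit, mono.
    """
    problems = []
    with wave.open(path, "rb") as wav:
        if wav.getframerate() != EXPECTED_RATE:
            problems.append(
                f"sample rate is {wav.getframerate()} Hz, expected {EXPECTED_RATE} Hz"
            )
        if wav.getsampwidth() != EXPECTED_WIDTH:
            problems.append(
                f"sample width is {8 * wav.getsampwidth()} bit, expected 16 bit"
            )
        if wav.getnchannels() != EXPECTED_CHANNELS:
            problems.append(f"{wav.getnchannels()} channels, expected mono")
    return problems
```

Calling `check_wav_format("path/to/audio.wav")` returns an empty list when nothing needs converting; any reported mismatch indicates the audio should be resampled or downmixed first.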

Installation

pip install transformers torchaudio

Usage

from transformers import Wav2Vec2ForSequenceClassification, AutoFeatureExtractor
import torch

device = "cuda" if torch.cuda.is_available() else "cpu"

model_id = "onecxi/vakgyata-tiny"

processor = AutoFeatureExtractor.from_pretrained(model_id)
model = Wav2Vec2ForSequenceClassification.from_pretrained(model_id).to(device)

Inference Example

import torchaudio

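# Load the audio; the model expects 16 kHz mono input
audio, sr = torchaudio.load("path/to/audio.wav")

# Downmix to mono and resample to 16 kHz if needed
audio = audio.mean(dim=0)
if sr != 16000:
    audio = torchaudio.functional.resample(audio, sr, 16000)
    sr = 16000

# Preprocess
inputs = processor(audio.numpy(), sampling_rate=sr, return_tensors="pt").to(device)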

# Inference
with torch.no_grad():
    logits = model(**inputs).logits

# Softmax to get probabilities
probs = logits.softmax(dim=-1).cpu().numpy()

# Predicted language (cast the NumPy index to a plain int for the label lookup)
language = model.config.id2label[int(probs.argmax())]
print("Predicted Language:", language)
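
When the top prediction alone is not enough, for example to apply a confidence threshold or inspect the runner-up language, the softmax and argmax steps generalize to a top-k ranking. The sketch below is a dependency-free illustration of that computation, not code from this model: the helper names `softmax` and `top_k`, the example logits, and the three-entry label map are all hypothetical. The real mapping comes from `model.config.id2label`.

```python
import math


def softmax(logits):
    """Convert raw scores to probabilities that sum to 1."""
    peak = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def top_k(logits, id2label, k=3):
    """Return the k most probable (label, probability) pairs, best first."""
    probs = softmax(logits)
    ranked = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    return [(id2label[i], probs[i]) for i in ranked[:k]]


# Hypothetical logits and label map, for illustration only.
example_id2label = {0: "hi-IN", 1: "bn-IN", 2: "ta-IN"}
example_logits = [2.0, 0.5, -1.0]

for label, prob in top_k(example_logits, example_id2label, k=2):
    print(f"{label}: {prob:.3f}")
```

With the model loaded as shown earlier, the same helper could be called as `top_k(logits[0].tolist(), model.config.id2label)`, since `logits` has one row per input clip.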

Citation

If you use this model in your research or application, please consider citing the model and its base source:

@misc{vakgyata2024,
  title={vakgyata: Language Identification for Indian Speech},
  author={OneCXI},
  year={2024},
  url={https://huggingface.co/onecxi/vakgyata-tiny}
}