---
language: es
license: apache-2.0
library_name: onnxruntime
pipeline_tag: voice-activity-detection
tags:
- turn-detection
- end-of-utterance
- distilbert
- onnx
- quantized
- conversational-ai
- voice-assistant
- real-time
base_model: distilbert-base-multilingual-cased
datasets:
- videosdk-live/Namo-Turn-Detector-v1-Train
model-index:
- name: Namo Turn Detector v1 - Spanish
  results:
  - task:
      type: text-classification
      name: Turn Detection
    dataset:
      name: Namo Turn Detector v1 Test - Spanish
      type: videosdk-live/Namo-Turn-Detector-v1-Test
      split: train
    metrics:
    - type: accuracy
      value: 0.867181
      name: Accuracy
    - type: f1
      value: 0.878187
      name: F1 Score
    - type: precision
      value: 0.789809
      name: Precision
    - type: recall
      value: 0.988836
      name: Recall
---

# 🎯 Namo Turn Detector v1 - Spanish
[![License](https://img.shields.io/badge/License-Apache%202.0-blue.svg)](https://opensource.org/licenses/Apache-2.0) [![ONNX](https://img.shields.io/badge/ONNX-Optimized-brightgreen)](https://onnx.ai/) [![Model Size](https://img.shields.io/badge/Model%20Size-~136M-orange)](https://huggingface.co/videosdk-live/Namo-Turn-Detector-v1-Spanish) [![Inference Speed](https://img.shields.io/badge/Inference-<12ms-red)]()

**🚀 Namo Turn Detection Model for Spanish**

---

## 📋 Overview

The **Namo Turn Detector** is a specialized model for one of the harder problems in conversational AI: **knowing when a user has finished speaking**. This Spanish-specific model classifies the text of an utterance into one of two classes:

- ✅ **Complete utterances** (user is done speaking)
- 🔄 **Incomplete utterances** (user will continue speaking)

It is built on the DistilBERT architecture and shipped as a quantized ONNX model to keep inference latency low.

## 🔑 Key Features

- **Turn Detection Specialist**: Detects end-of-turn vs. continuation in Spanish speech transcripts.
- **Low Latency**: Optimized with **quantized ONNX** for <12ms inference.
- **Robust Performance**: 86.7% accuracy on diverse Spanish utterances.
- **Easy Integration**: Compatible with Python, ONNX Runtime, and the VideoSDK Agents SDK.
- **Enterprise Ready**: Supports real-time conversational AI and voice assistants.

## 📊 Performance Metrics

| Metric | Score |
|--------|-------|
| **🎯 Accuracy** | **86.71%** |
| **📈 F1-Score** | **87.81%** |
| **🎪 Precision** | **78.98%** |
| **🎭 Recall** | **98.88%** |
| **⚡ Latency** | **<12ms** |
| **💾 Model Size** | **~135MB** |

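
As a quick sanity check on the table, F1 is the harmonic mean of precision and recall, so the reported F1 can be recomputed from the other two rows. The sketch below uses the unrounded values from the model card metadata; the function name `f1_from` is purely illustrative and not part of any released tooling.

```python
import math


def f1_from(precision: float, recall: float) -> float:
    """Return the harmonic mean of precision and recall (0.0 if both are zero)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Unrounded values as listed in the model card metadata.
precision = 0.789809
recall = 0.988836
reported_f1 = 0.878187

f1 = f1_from(precision, recall)
print(f"recomputed F1 = {f1:.4f}, reported F1 = {reported_f1:.4f}")

# The recomputed value should agree with the reported one up to rounding.
assert math.isclose(f1, reported_f1, abs_tol=1e-4)
```

Recall well above precision means that, for whichever class the evaluation treats as positive, the model rarely misses it but predicts it somewhat more often than the labels warrant.
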
> 📊 *Evaluated on 1200+ Spanish utterances from diverse conversational contexts*

## ⚡️ Speed Analysis

*[Figure: speed analysis chart]*

## 🔧 Train & Test Scripts

[![Train Script](https://img.shields.io/badge/Colab-Train%20Script-brightgreen?logo=google-colab)](https://colab.research.google.com/drive/1DqSUYfcya0r2iAEZB9fS4mfrennubduV) [![Test Script](https://img.shields.io/badge/Colab-Test%20Script-blue?logo=google-colab)](https://colab.research.google.com/drive/19ZOlNoHS2WLX2V4r5r492tsCUnYLXnQR)
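
The notebooks themselves are not reproduced here. As a rough illustration of how the four reported metrics relate to raw predictions, the sketch below scores a list of `(true_label, predicted_label)` pairs. It assumes that label 1 ("End of Turn") is the positive class; the helper name `score` and the toy labels are hypothetical and are not taken from the linked test script.

```python
from typing import Iterable


def score(pairs: Iterable[tuple[int, int]]) -> dict[str, float]:
    """Compute accuracy, precision, recall and F1 from (true, predicted) labels.

    Assumption: label 1 is the positive class, label 0 the negative class.
    """
    tp = fp = fn = tn = 0
    for true, pred in pairs:
        if pred == 1 and true == 1:
            tp += 1  # correctly predicted positive
        elif pred == 1 and true == 0:
            fp += 1  # predicted positive, actually negative
        elif pred == 0 and true == 1:
            fn += 1  # predicted negative, actually positive
        else:
            tn += 1  # correctly predicted negative

    total = tp + fp + fn + tn
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (
        2 * precision * recall / (precision + recall)
        if precision + recall
        else 0.0
    )
    return {
        "accuracy": (tp + tn) / total if total else 0.0,
        "precision": precision,
        "recall": recall,
        "f1": f1,
    }


# Toy labels for illustration only; these are not evaluation data.
toy_pairs = [(1, 1), (1, 1), (0, 1), (0, 0), (1, 0), (0, 0)]
toy_scores = score(toy_pairs)
print(toy_scores)
```

In practice the predicted labels would come from running the detector over a labeled test split rather than from a hand-written list.
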

## 🛠️ Installation

To use this model, install the following libraries:

```bash
pip install onnxruntime transformers huggingface_hub
```

## 🚀 Quick Start

You can run inference directly from the Hugging Face repository.

```python
import numpy as np
import onnxruntime as ort
from transformers import AutoTokenizer
from huggingface_hub import hf_hub_download


class TurnDetector:
    def __init__(self, repo_id="videosdk-live/Namo-Turn-Detector-v1-Spanish"):
        """
        Initializes the detector by downloading the model and tokenizer
        from the Hugging Face Hub.
        """
        print(f"Loading model from repo: {repo_id}")

        # Download the model and tokenizer from the Hub
        # Authentication is handled automatically if you are logged in
        model_path = hf_hub_download(repo_id=repo_id, filename="model_quant.onnx")
        self.tokenizer = AutoTokenizer.from_pretrained(repo_id)

        # Set up the ONNX Runtime inference session
        self.session = ort.InferenceSession(model_path)
        self.max_length = 512
        print("✅ Model and tokenizer loaded successfully.")

    def predict(self, text: str) -> tuple:
        """
        Predicts if a given text utterance is the end of a turn.

        Returns (predicted_label, confidence) where:
        - predicted_label: 0 for "Not End of Turn", 1 for "End of Turn"
        - confidence: confidence score between 0 and 1
        """
        # Tokenize the input text
        inputs = self.tokenizer(
            text,
            truncation=True,
            max_length=self.max_length,
            return_tensors="np"
        )

        # Prepare the feed dictionary for the ONNX model
        feed_dict = {
            "input_ids": inputs["input_ids"],
            "attention_mask": inputs["attention_mask"]
        }

        # Run inference
        outputs = self.session.run(None, feed_dict)
        logits = outputs[0]

        # Convert the logits of the single input to probabilities
        probabilities = self._softmax(logits[0])
        # Cast to a plain int so callers do not receive a NumPy scalar
        predicted_label = int(np.argmax(probabilities))
        confidence = float(np.max(probabilities))

        return predicted_label, confidence

    def _softmax(self, x, axis=None):
        if axis is None:
            axis = -1
        # Subtract the max before exponentiating for numerical stability
        exp_x = np.exp(x - np.max(x, axis=axis, keepdims=True))
        return exp_x / np.sum(exp_x, axis=axis, keepdims=True)


# --- Example Usage ---
if __name__ == "__main__":
    detector = TurnDetector()

    sentences = [
        "En el Neotrópico, e, crecen de forma silvestre alrededor de 790 especies.",  # Expected: End of Turn
        "Tres de la madrugada se conoce como tritio y contiene un protón y..."  # Expected: Not End of Turn
    ]

    for sentence in sentences:
        predicted_label, confidence = detector.predict(sentence)
        result = "End of Turn" if predicted_label == 1 else "Not End of Turn"
        print(f"'{sentence}' -> {result} (confidence: {confidence:.3f})")
        print("-" * 50)
```

## 🤖 VideoSDK Agents Integration

Integrate this turn detector directly with VideoSDK Agents for production-ready conversational AI applications.

```python
from videosdk_agents import NamoTurnDetectorV1, pre_download_namo_turn_v1_model

# Download the model
pre_download_namo_turn_v1_model(language="es")

# Initialize Spanish turn detector for VideoSDK Agents
turn_detector = NamoTurnDetectorV1(language="es")
```

> 📚 [**Complete Integration Guide**](https://docs.videosdk.live/ai_agents/plugins/namo-turn-detector) - Learn how to use `NamoTurnDetectorV1` with VideoSDK Agents

## 📖 Citation

```bibtex
@model{namo_turn_detector_es_2025,
  title={Namo Turn Detector v1: Spanish},
  author={VideoSDK Team},
  year={2025},
  publisher={Hugging Face},
  url={https://huggingface.co/videosdk-live/Namo-Turn-Detector-v1-Spanish},
  note={ONNX-optimized DistilBERT for turn detection in Spanish}
}
```

## 📄 License

This project is licensed under the Apache License 2.0 - see the [LICENSE](LICENSE) file for details.

**Made with ❤️ by the VideoSDK Team**

[![VideoSDK](https://img.shields.io/badge/VideoSDK-Live-blue)](https://videosdk.live)