---
language: es
license: apache-2.0
library_name: onnxruntime
pipeline_tag: voice-activity-detection
tags:
- turn-detection
- end-of-utterance
- distilbert
- onnx
- quantized
- conversational-ai
- voice-assistant
- real-time
base_model: distilbert-base-multilingual-cased
datasets:
- videosdk-live/Namo-Turn-Detector-v1-Train
model-index:
- name: Namo Turn Detector v1 - Spanish
  results:
  - task:
      type: text-classification
      name: Turn Detection
    dataset:
      name: Namo Turn Detector v1 Test - Spanish
      type: videosdk-live/Namo-Turn-Detector-v1-Test
      split: train
    metrics:
    - type: accuracy
      value: 0.867181
      name: Accuracy
    - type: f1
      value: 0.878187
      name: F1 Score
    - type: precision
      value: 0.789809
      name: Precision
    - type: recall
      value: 0.988836
      name: Recall
---

# 🎯 Namo Turn Detector v1 - Spanish

<div align="center">

[![License](https://img.shields.io/badge/License-Apache%202.0-blue.svg)](https://opensource.org/licenses/Apache-2.0)
[![ONNX](https://img.shields.io/badge/ONNX-Optimized-brightgreen)](https://onnx.ai/)
[![Model Size](https://img.shields.io/badge/Model%20Size-~136M-orange)](https://huggingface.co/videosdk-live/Namo-Turn-Detector-v1-Spanish)
[![Inference Speed](https://img.shields.io/badge/Inference-<12ms-red)]()

**🚀 Namo Turn Detection Model for Spanish**

</div>

---

## 📋 Overview

The **Namo Turn Detector** is a specialized AI model designed to solve one of the most challenging problems in conversational AI: **knowing when a user has finished speaking**. 

This Spanish-specific model reads the transcript of the user's speech and classifies it as one of:
- ✅ **Complete utterances** (user is done speaking)
- 🔄 **Incomplete utterances** (user will continue speaking)

Built on the DistilBERT architecture and exported to a quantized ONNX format, it is designed for real-time use, with inference in under 12 ms.

## 🔑 Key Features

- **Turn Detection Specialist**: Detects end-of-turn vs. continuation in Spanish speech transcripts.  
- **Low Latency**: Optimized with **quantized ONNX** for <12ms inference.  
- **Robust Performance**: 86.7% accuracy on diverse Spanish utterances.  
- **Easy Integration**: Compatible with Python, ONNX Runtime, and VideoSDK Agents SDK.  
- **Enterprise Ready**: Supports real-time conversational AI and voice assistants.  

## 📊 Performance Metrics
<div>

| Metric | Score |
|--------|-------|
| **🎯 Accuracy** | **86.71%** | 
| **📈 F1-Score** | **87.81%** |
| **🎪 Precision** | **78.98%** |
| **🎭 Recall** | **98.88%** |
| **⚡ Latency** | **<12ms** |
| **💾 Model Size** | **~135MB** |

</div>
<img src="./confusion_matrices.png" alt="Confusion matrices" width="600" height="400"/>

> 📊 *Evaluated on 1200+ Spanish utterances from diverse conversational contexts*
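
As a quick consistency check on the table above, the F1 score can be recomputed from the reported precision and recall. This is a minimal sketch that uses only the values listed in this card's metadata; it does not touch the model or the test set.

```python
# Precision and recall as reported in this model card's metadata.
precision = 0.789809
recall = 0.988836

# F1 is the harmonic mean of precision and recall.
f1 = 2 * precision * recall / (precision + recall)

print(f"F1 = {f1:.4f}")
```

The result agrees with the reported F1 score to four decimal places. If the positive class is "End of Turn" (an assumption; the card does not state it), the gap between recall and precision suggests the model rarely misses a finished turn but sometimes marks an unfinished utterance as complete.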

## ⚡️ Speed Analysis

<img src="./performance_analysis.png" alt="Speed analysis" width="600" height="400"/>
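
Latency depends on hardware and input length, so it is worth measuring on your own machine. The sketch below is one possible timing harness, not the script used to produce the chart above. `measure_latency_ms` and `stand_in_predict` are hypothetical names introduced here for illustration.

```python
import statistics
import time


def measure_latency_ms(fn, *args, warmup=5, runs=50):
    """Time fn(*args) and return (median_ms, p95_ms).

    Hypothetical helper: warm-up calls are discarded so that one-time
    setup costs do not skew the measured samples.
    """
    for _ in range(warmup):
        fn(*args)

    samples = []
    for _ in range(runs):
        start = time.perf_counter()
        fn(*args)
        samples.append((time.perf_counter() - start) * 1000.0)

    samples.sort()
    # Nearest-rank style 95th percentile, clamped to the last sample.
    p95 = samples[min(len(samples) - 1, int(0.95 * len(samples)))]
    return statistics.median(samples), p95


def stand_in_predict(text):
    """Placeholder workload so the sketch runs without the model."""
    return sum(ord(ch) for ch in text)


median_ms, p95_ms = measure_latency_ms(stand_in_predict, "Hola, ¿cómo estás?")
print(f"median: {median_ms:.4f} ms, p95: {p95_ms:.4f} ms")
```

To time the real model, pass `detector.predict` from the Quick Start section in place of `stand_in_predict`.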

## 🔧 Train & Test Scripts

<div align="center">

[![Train Script](https://img.shields.io/badge/Colab-Train%20Script-brightgreen?logo=google-colab)](https://colab.research.google.com/drive/1DqSUYfcya0r2iAEZB9fS4mfrennubduV) [![Test Script](https://img.shields.io/badge/Colab-Test%20Script-blue?logo=google-colab)](https://colab.research.google.com/drive/19ZOlNoHS2WLX2V4r5r492tsCUnYLXnQR)

</div>

## 🛠️ Installation

To use this model, you will need to install the following libraries.

```bash
pip install onnxruntime transformers huggingface_hub
```
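
After installing, you may want to confirm that the packages are visible to your Python environment. The following is a minimal sketch using only the standard library; `installed_versions` is a hypothetical helper name, not part of any of the listed libraries.

```python
from importlib import metadata


def installed_versions(packages):
    """Map each package name to its installed version, or None if missing."""
    versions = {}
    for name in packages:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None
    return versions


required = ["onnxruntime", "transformers", "huggingface_hub"]
for name, version in installed_versions(required).items():
    print(f"{name}: {version or 'NOT INSTALLED'}")
```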

## 🚀 Quick Start

You can run inference directly from the Hugging Face repository.

```python
import numpy as np
import onnxruntime as ort
from transformers import AutoTokenizer
from huggingface_hub import hf_hub_download

class TurnDetector:
    def __init__(self, repo_id="videosdk-live/Namo-Turn-Detector-v1-Spanish"):
        """
        Initializes the detector by downloading the model and tokenizer
        from the Hugging Face Hub.
        """
        print(f"Loading model from repo: {repo_id}")
        
        # Download the model and tokenizer from the Hub
        # Authentication is handled automatically if you are logged in
        model_path = hf_hub_download(repo_id=repo_id, filename="model_quant.onnx")
        self.tokenizer = AutoTokenizer.from_pretrained(repo_id)
        
        # Set up the ONNX Runtime inference session
        self.session = ort.InferenceSession(model_path)
        self.max_length = 512
        print("✅ Model and tokenizer loaded successfully.")

    def predict(self, text: str) -> tuple:
        """
        Predicts if a given text utterance is the end of a turn.
        Returns (predicted_label, confidence) where:
        - predicted_label: 0 for "Not End of Turn", 1 for "End of Turn"
        - confidence: confidence score between 0 and 1
        """
        # Tokenize the input text
        inputs = self.tokenizer(
            text,
            truncation=True,
            max_length=self.max_length,
            return_tensors="np"
        )
        
        # Prepare the feed dictionary for the ONNX model
        feed_dict = {
            "input_ids": inputs["input_ids"],
            "attention_mask": inputs["attention_mask"]
        }
        
        # Run inference
        outputs = self.session.run(None, feed_dict)
        logits = outputs[0]

        probabilities = self._softmax(logits[0])
        predicted_label = int(np.argmax(probabilities))  # plain Python int: 0 or 1
        confidence = float(np.max(probabilities))

        return predicted_label, confidence

    def _softmax(self, x, axis=None):
        if axis is None:
            axis = -1
        exp_x = np.exp(x - np.max(x, axis=axis, keepdims=True))
        return exp_x / np.sum(exp_x, axis=axis, keepdims=True)

# --- Example Usage ---
if __name__ == "__main__":
    detector = TurnDetector()
    
    sentences = [
        "En el Neotrópico, e, crecen de forma silvestre alrededor de 790 especies.",  # Expected: End of Turn
        "Tres de la madrugada se conoce como tritio y contiene un protón y...",  # Expected: Not End of Turn
    ]
    
    for sentence in sentences:
        predicted_label, confidence = detector.predict(sentence)
        result = "End of Turn" if predicted_label == 1 else "Not End of Turn"
        print(f"'{sentence}' -> {result} (confidence: {confidence:.3f})")
        print("-" * 50)

```
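
Because the reported precision (78.98%) is lower than the recall (98.88%), an application may prefer to act only on confident "End of Turn" predictions. The sketch below shows one way to gate the `(predicted_label, confidence)` pair returned by `predict`. The helper `is_end_of_turn` and the threshold of 0.7 are illustrative assumptions, not values recommended by this model card.

```python
def is_end_of_turn(predicted_label, confidence, threshold=0.7):
    """Return True only for a confident "End of Turn" (label 1) prediction.

    `threshold` is an illustrative value; tune it on your own data.
    """
    return bool(predicted_label == 1 and confidence >= threshold)


# Hand-written (label, confidence) pairs standing in for model output.
examples = [(1, 0.95), (1, 0.55), (0, 0.90)]
for label, confidence in examples:
    decision = "respond" if is_end_of_turn(label, confidence) else "keep listening"
    print(f"label={label}, confidence={confidence:.2f} -> {decision}")
```

Raising the threshold makes the agent less likely to interrupt the user, but it also makes the agent wait longer before responding.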


## 🤖 VideoSDK Agents Integration

Integrate this turn detector directly with VideoSDK Agents for production-ready conversational AI applications.

```python
from videosdk_agents import NamoTurnDetectorV1, pre_download_namo_turn_v1_model

# Download the model
pre_download_namo_turn_v1_model(language="es")

# Initialize Spanish turn detector for VideoSDK Agents
turn_detector = NamoTurnDetectorV1(language="es")
```

> 📚 [**Complete Integration Guide**](https://docs.videosdk.live/ai_agents/plugins/namo-turn-detector) - Learn how to use `NamoTurnDetectorV1` with VideoSDK Agents

## 📖 Citation

```bibtex
@misc{namo_turn_detector_es_2025,
  title={Namo Turn Detector v1: Spanish},
  author={VideoSDK Team},
  year={2025},
  publisher={Hugging Face},
  url={https://huggingface.co/videosdk-live/Namo-Turn-Detector-v1-Spanish},
  note={ONNX-optimized DistilBERT for turn detection in Spanish}
}
```

## 📄 License

This project is licensed under the Apache License 2.0 - see the [LICENSE](LICENSE) file for details.

<div align="center">

**Made with ❤️ by the VideoSDK Team**

[![VideoSDK](https://img.shields.io/badge/VideoSDK-Live-blue)](https://videosdk.live)

</div>