---
language: es
license: apache-2.0
library_name: onnxruntime
pipeline_tag: voice-activity-detection
tags:
- turn-detection
- end-of-utterance
- distilbert
- onnx
- quantized
- conversational-ai
- voice-assistant
- real-time
base_model: distilbert-base-multilingual-cased
datasets:
- videosdk-live/Namo-Turn-Detector-v1-Train
model-index:
- name: Namo Turn Detector v1 - Spanish
  results:
  - task:
      type: text-classification
      name: Turn Detection
    dataset:
      name: Namo Turn Detector v1 Test - Spanish
      type: videosdk-live/Namo-Turn-Detector-v1-Test
      split: train
    metrics:
    - type: accuracy
      value: 0.867181
      name: Accuracy
    - type: f1
      value: 0.878187
      name: F1 Score
    - type: precision
      value: 0.789809
      name: Precision
    - type: recall
      value: 0.988836
      name: Recall
---

# 🎯 Namo Turn Detector v1 - Spanish

<div align="center">

[![License](https://img.shields.io/badge/License-Apache%202.0-blue.svg)](https://opensource.org/licenses/Apache-2.0)
[![ONNX](https://img.shields.io/badge/ONNX-Optimized-brightgreen)](https://onnx.ai/)
[![Model Size](https://img.shields.io/badge/Model%20Size-~136M-orange)](https://huggingface.co/videosdk-live/Namo-Turn-Detector-v1-Spanish)
[![Inference Speed](https://img.shields.io/badge/Inference-<12ms-red)]()

**🚀 Namo Turn Detection Model for Spanish**

</div>

---

## 📋 Overview

The **Namo Turn Detector** is a specialized AI model designed to solve one of the most challenging problems in conversational AI: **knowing when a user has finished speaking**. 

This Spanish-specialist model classifies a speech transcript as one of:
- ✅ **Complete utterances** (user is done speaking)
- 🔄 **Incomplete utterances** (user will continue speaking)

Built on the DistilBERT architecture (`distilbert-base-multilingual-cased`) and exported as a quantized ONNX model, it runs inference in under 12 ms.

## 🔑 Key Features

- **Turn Detection Specialist**: Detects end-of-turn vs. continuation in Spanish speech transcripts.  
- **Low Latency**: Optimized with **quantized ONNX** for <12ms inference.  
- **Robust Performance**: 86.7% accuracy on diverse Spanish utterances.  
- **Easy Integration**: Compatible with Python, ONNX Runtime, and VideoSDK Agents SDK.  
- **Enterprise Ready**: Supports real-time conversational AI and voice assistants.  

## 📊 Performance Metrics
<div>

| Metric | Score |
|--------|-------|
| **🎯 Accuracy** | **86.71%** |
| **📈 F1-Score** | **87.81%** |
| **🎪 Precision** | **78.98%** |
| **🎭 Recall** | **98.88%** |
| **⚡ Latency** | **<12ms** |
| **💾 Model Size** | **~135MB** |

</div>
<img src="./confusion_matrices.png" alt="Confusion matrices" width="600" height="400"/>

> 📊 *Evaluated on 1200+ Spanish utterances from diverse conversational contexts*
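
As a quick consistency check, the F1 score can be recomputed from the reported precision and recall. The sketch below uses the full-precision values from this model card's metadata; the variable names are illustrative only.

```python
# Precision and recall as reported in the model card metadata.
precision = 0.789809
recall = 0.988836

# F1 is the harmonic mean of precision and recall.
f1 = 2 * precision * recall / (precision + recall)

print(f"F1 = {f1:.4f}")
```

The result agrees with the reported F1 of 0.878187. The gap between precision and recall indicates that the model rarely misses a real end of turn but flags some incomplete utterances as complete.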

## ⚡️ Speed Analysis

<img src="./performance_analysis.png" alt="Performance analysis" width="600" height="400"/>
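
Latency depends on hardware and input length, so it may be worth measuring on your own machine. The following is a minimal sketch of one way to do that; `measure_latency_ms` is a hypothetical helper, not part of this repository, and the warm-up and run counts are arbitrary assumptions.

```python
import statistics
import time


def measure_latency_ms(fn, warmup=5, runs=50):
    """Return (median, p95) wall-clock latency of fn() in milliseconds.

    Hypothetical helper for illustration; not part of this repository.
    """
    # Warm-up calls are excluded so one-time setup cost is not counted.
    for _ in range(warmup):
        fn()

    samples = []
    for _ in range(runs):
        start = time.perf_counter()
        fn()
        samples.append((time.perf_counter() - start) * 1000.0)

    samples.sort()
    # Nearest-rank style index for the 95th percentile.
    p95 = samples[min(len(samples) - 1, int(0.95 * len(samples)))]
    return statistics.median(samples), p95


if __name__ == "__main__":
    # Stand-in workload; swap in a real call such as
    # lambda: detector.predict("Hola, ¿cómo estás?")
    median_ms, p95_ms = measure_latency_ms(lambda: sum(range(10_000)))
    print(f"median: {median_ms:.3f} ms, p95: {p95_ms:.3f} ms")
```

To time the model itself, pass a lambda that calls `predict` on a `TurnDetector` instance from the Quick Start section.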

## 🔧 Train & Test Scripts

<div align="center">

[![Train Script](https://img.shields.io/badge/Colab-Train%20Script-brightgreen?logo=google-colab)](https://colab.research.google.com/drive/1DqSUYfcya0r2iAEZB9fS4mfrennubduV) [![Test Script](https://img.shields.io/badge/Colab-Test%20Script-blue?logo=google-colab)](https://colab.research.google.com/drive/19ZOlNoHS2WLX2V4r5r492tsCUnYLXnQR)

</div>

## 🛠️ Installation

To use this model, you will need to install the following libraries.

```bash
pip install onnxruntime transformers huggingface_hub
```

## 🚀 Quick Start

You can run inference directly from the Hugging Face repository.

```python
import numpy as np
import onnxruntime as ort
from transformers import AutoTokenizer
from huggingface_hub import hf_hub_download

class TurnDetector:
    def __init__(self, repo_id="videosdk-live/Namo-Turn-Detector-v1-Spanish"):
        """
        Initializes the detector by downloading the model and tokenizer
        from the Hugging Face Hub.
        """
        print(f"Loading model from repo: {repo_id}")
        
        # Download the model and tokenizer from the Hub
        # Authentication is handled automatically if you are logged in
        model_path = hf_hub_download(repo_id=repo_id, filename="model_quant.onnx")
        self.tokenizer = AutoTokenizer.from_pretrained(repo_id)
        
        # Set up the ONNX Runtime inference session
        self.session = ort.InferenceSession(model_path)
        self.max_length = 512
        print("✅ Model and tokenizer loaded successfully.")

    def predict(self, text: str) -> tuple:
        """
        Predicts if a given text utterance is the end of a turn.
        Returns (predicted_label, confidence) where:
        - predicted_label: 0 for "Not End of Turn", 1 for "End of Turn"
        - confidence: confidence score between 0 and 1
        """
        # Tokenize the input text
        inputs = self.tokenizer(
            text,
            truncation=True,
            max_length=self.max_length,
            return_tensors="np"
        )
        
        # Prepare the feed dictionary for the ONNX model
        feed_dict = {
            "input_ids": inputs["input_ids"],
            "attention_mask": inputs["attention_mask"]
        }
        
        # Run inference
        outputs = self.session.run(None, feed_dict)
        logits = outputs[0]

        probabilities = self._softmax(logits[0])
        predicted_label = int(np.argmax(probabilities))  # plain int: 0 or 1
        confidence = float(np.max(probabilities))

        return predicted_label, confidence

    def _softmax(self, x, axis=None):
        if axis is None:
            axis = -1
        exp_x = np.exp(x - np.max(x, axis=axis, keepdims=True))
        return exp_x / np.sum(exp_x, axis=axis, keepdims=True)

# --- Example Usage ---
if __name__ == "__main__":
    detector = TurnDetector()
    
    sentences = [
        "En el Neotrópico, e, crecen de forma silvestre alrededor de 790 especies.",  # Expected: End of Turn
        "Tres de la madrugada se conoce como tritio y contiene un protón y...",  # Expected: Not End of Turn
    ]
    
    for sentence in sentences:
        predicted_label, confidence = detector.predict(sentence)
        result = "End of Turn" if predicted_label == 1 else "Not End of Turn"
        print(f"'{sentence}' -> {result} (confidence: {confidence:.3f})")
        print("-" * 50)

```
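
Because precision (78.98%) is lower than recall (98.88%), some applications may prefer to require a minimum probability before treating an utterance as finished. The sketch below shows one way to do that on the raw logits; the function names and the 0.7 threshold are assumptions for illustration, not part of the model or its API. Index 1 is taken to mean "End of Turn", as in the `predict` docstring above.

```python
import math


def end_of_turn_probability(logits):
    """Softmax probability of class 1 ("End of Turn") from two raw logits."""
    peak = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - peak) for x in logits]
    return exps[1] / sum(exps)


def is_end_of_turn(logits, threshold=0.7):
    """Return True only when P(End of Turn) reaches the threshold.

    The default threshold is an illustrative assumption; tune it on your data.
    """
    return end_of_turn_probability(logits) >= threshold


if __name__ == "__main__":
    # Made-up logits, not real model output.
    for logits in ([0.0, 3.0], [0.0, 0.5], [2.0, 0.0]):
        print(logits, "->", is_end_of_turn(logits))
```

A higher threshold trades recall for precision: the agent waits longer in ambiguous cases instead of interrupting the user.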


## 🤖 VideoSDK Agents Integration

Integrate this turn detector directly with VideoSDK Agents for production-ready conversational AI applications.

```python
from videosdk_agents import NamoTurnDetectorV1, pre_download_namo_turn_v1_model

# Download the model
pre_download_namo_turn_v1_model(language="es")

# Initialize Spanish turn detector for VideoSDK Agents
turn_detector = NamoTurnDetectorV1(language="es")
```

> 📚 [**Complete Integration Guide**](https://docs.videosdk.live/ai_agents/plugins/namo-turn-detector) - Learn how to use `NamoTurnDetectorV1` with VideoSDK Agents

## 📖 Citation

```bibtex
@model{namo_turn_detector_es_2025,
  title={Namo Turn Detector v1: Spanish},
  author={VideoSDK Team},
  year={2025},
  publisher={Hugging Face},
  url={https://huggingface.co/videosdk-live/Namo-Turn-Detector-v1-Spanish},
  note={ONNX-optimized DistilBERT for turn detection in Spanish}
}
```

## 📄 License

This project is licensed under the Apache License 2.0 - see the [LICENSE](LICENSE) file for details.

<div align="center">

**Made with ❤️ by the VideoSDK Team**

[![VideoSDK](https://img.shields.io/badge/VideoSDK-Live-blue)](https://videosdk.live)

</div>