---
language: 'no'
license: apache-2.0
library_name: onnxruntime
pipeline_tag: voice-activity-detection
tags:
- turn-detection
- end-of-utterance
- distilbert
- onnx
- quantized
- conversational-ai
- voice-assistant
- real-time
base_model: distilbert-base-multilingual-cased
datasets:
- videosdk-live/Namo-Turn-Detector-v1-Train
model-index:
- name: Namo Turn Detector v1 - Norwegian
  results:
  - task:
      type: text-classification
      name: Turn Detection
    dataset:
      name: Namo Turn Detector v1 Test - Norwegian
      type: videosdk-live/Namo-Turn-Detector-v1-Test
      split: train
    metrics:
    - type: accuracy
      value: 0.873482
      name: Accuracy
    - type: f1
      value: 0.882739
      name: F1 Score
    - type: precision
      value: 0.83496
      name: Precision
    - type: recall
      value: 0.936318
      name: Recall
---

# 🎯 Namo Turn Detector v1 - Norwegian

<div align="center">

[![License](https://img.shields.io/badge/License-Apache%202.0-blue.svg)](https://opensource.org/licenses/Apache-2.0)
[![ONNX](https://img.shields.io/badge/ONNX-Optimized-brightgreen)](https://onnx.ai/)
[![Model Size](https://img.shields.io/badge/Model%20Size-~136M-orange)](https://huggingface.co/videosdk-live/Namo-Turn-Detector-v1-Norwegian)
[![Inference Speed](https://img.shields.io/badge/Inference-<12ms-red)]()

**🚀 Namo Turn Detection Model for Norwegian**

</div>

---

## 📋 Overview

The **Namo Turn Detector** is a specialized AI model designed to solve one of the most challenging problems in conversational AI: **knowing when a user has finished speaking**. 

This Norwegian-specialist model classifies the text of a transcribed utterance as one of:
- ✅ **Complete utterances** (user is done speaking)
- 🔄 **Incomplete utterances** (user will continue speaking)

Built on the DistilBERT architecture and shipped in quantized ONNX format, it runs inference in under 12 ms.

## 🔑 Key Features

- **Turn Detection Specialist**: Detects end-of-turn vs. continuation in Norwegian speech transcripts.  
- **Low Latency**: Optimized with **quantized ONNX** for <12ms inference.  
- **Robust Performance**: 87.3% accuracy on diverse Norwegian utterances.  
- **Easy Integration**: Compatible with Python, ONNX Runtime, and VideoSDK Agents SDK.  
- **Enterprise Ready**: Supports real-time conversational AI and voice assistants.  

## 📊 Performance Metrics
<div>

| Metric | Score |
|--------|-------|
| **🎯 Accuracy** | **87.34%** |
| **📈 F1-Score** | **88.27%** |
| **🎪 Precision** | **83.49%** |
| **🎭 Recall** | **93.63%** |
| **⚡ Latency** | **<12ms** |
| **💾 Model Size** | **~135MB** |

</div>
<img src="./confusion_matrices.png" alt="Confusion matrices" width="600" height="400"/>

> 📊 *Evaluated on 1500+ Norwegian utterances from diverse conversational contexts*
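
As a quick consistency check on the numbers above, F1 is the harmonic mean of precision and recall. The sketch below recomputes it from the unrounded precision and recall values listed in this model card's metadata; it is only arithmetic on the reported figures, not a re-evaluation of the model.

```python
# Precision and recall as reported in this model card's metadata.
precision = 0.83496
recall = 0.936318

# F1 is the harmonic mean of precision and recall.
f1 = 2 * precision * recall / (precision + recall)

print(f"F1 = {f1:.4f}")  # F1 = 0.8827
```

The high recall relative to precision suggests the model leans toward predicting "End of Turn", which is worth keeping in mind when choosing a decision threshold.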

## ⚡️ Speed Analysis

<img src="./performance_analysis.png" alt="Inference speed analysis" width="600" height="400"/>
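
Latency depends on hardware, so it can be worth measuring on your own machine. The following is a minimal sketch of one way to do that; `measure_latency_ms` and `dummy_predict` are hypothetical helpers written for this illustration, not part of the model or any SDK. The stand-in callable keeps the sketch runnable without downloading the model.

```python
import statistics
import time


def measure_latency_ms(fn, text, warmup=3, runs=20):
    """Return (median, worst) wall-clock latency of fn(text) in milliseconds."""
    # Warm-up calls are excluded so one-time setup cost does not skew results.
    for _ in range(warmup):
        fn(text)
    samples = []
    for _ in range(runs):
        start = time.perf_counter()
        fn(text)
        samples.append((time.perf_counter() - start) * 1000.0)
    return statistics.median(samples), max(samples)


# Stand-in callable so the sketch runs on its own; replace it with
# `detector.predict` from the Quick Start section to time the real model.
def dummy_predict(text):
    return 1, 0.5


median_ms, worst_ms = measure_latency_ms(dummy_predict, "Hei, hvordan går det?")
print(f"median: {median_ms:.3f} ms, worst: {worst_ms:.3f} ms")
```

Reporting the median alongside the worst case gives a more honest picture than a single run, since the first few calls to an inference session are often slower.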

## 🔧 Train & Test Scripts

<div align="center">

[![Train Script](https://img.shields.io/badge/Colab-Train%20Script-brightgreen?logo=google-colab)](https://colab.research.google.com/drive/1DqSUYfcya0r2iAEZB9fS4mfrennubduV) [![Test Script](https://img.shields.io/badge/Colab-Test%20Script-blue?logo=google-colab)](https://colab.research.google.com/drive/19ZOlNoHS2WLX2V4r5r492tsCUnYLXnQR)

</div>

## 🛠️ Installation

To use this model, you will need to install the following libraries.

```bash
pip install onnxruntime transformers huggingface_hub
```

## 🚀 Quick Start

You can run inference directly from the Hugging Face repository.

```python
import numpy as np
import onnxruntime as ort
from transformers import AutoTokenizer
from huggingface_hub import hf_hub_download

class TurnDetector:
    def __init__(self, repo_id="videosdk-live/Namo-Turn-Detector-v1-Norwegian"):
        """
        Initializes the detector by downloading the model and tokenizer
        from the Hugging Face Hub.
        """
        print(f"Loading model from repo: {repo_id}")
        
        # Download the model and tokenizer from the Hub
        # Authentication is handled automatically if you are logged in
        model_path = hf_hub_download(repo_id=repo_id, filename="model_quant.onnx")
        self.tokenizer = AutoTokenizer.from_pretrained(repo_id)
        
        # Set up the ONNX Runtime inference session
        self.session = ort.InferenceSession(model_path)
        self.max_length = 512
        print("✅ Model and tokenizer loaded successfully.")

    def predict(self, text: str) -> tuple:
        """
        Predicts if a given text utterance is the end of a turn.
        Returns (predicted_label, confidence) where:
        - predicted_label: 0 for "Not End of Turn", 1 for "End of Turn"
        - confidence: confidence score between 0 and 1
        """
        # Tokenize the input text
        inputs = self.tokenizer(
            text,
            truncation=True,
            max_length=self.max_length,
            return_tensors="np"
        )
        
        # Prepare the feed dictionary for the ONNX model
        feed_dict = {
            "input_ids": inputs["input_ids"],
            "attention_mask": inputs["attention_mask"]
        }
        
        # Run inference
        outputs = self.session.run(None, feed_dict)
        logits = outputs[0]

        probabilities = self._softmax(logits[0])
        predicted_label = int(np.argmax(probabilities))
        confidence = float(np.max(probabilities))

        return predicted_label, confidence

    def _softmax(self, x, axis=None):
        if axis is None:
            axis = -1
        exp_x = np.exp(x - np.max(x, axis=axis, keepdims=True))
        return exp_x / np.sum(exp_x, axis=axis, keepdims=True)

# --- Example Usage ---
if __name__ == "__main__":
    detector = TurnDetector()
    
    sentences = [
        "Noen typer korn er sunnere enn andre.",      # Expected: End of Turn
        "Euklidts elementer ble en ofte brukt, vel?" # Expected: Not End of Turn
    ]
    
    for sentence in sentences:
        predicted_label, confidence = detector.predict(sentence)
        result = "End of Turn" if predicted_label == 1 else "Not End of Turn"
        print(f"'{sentence}' -> {result} (confidence: {confidence:.3f})")
        print("-" * 50)

```
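
In a voice pipeline you may want to act on "End of Turn" only when the model is reasonably confident, rather than taking the argmax directly. The sketch below illustrates that idea with plain Python. The function `is_end_of_turn`, the `0.7` threshold, and the example logits are assumptions made for illustration; only the label order (0 for "Not End of Turn", 1 for "End of Turn") comes from the code above.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def is_end_of_turn(logits, threshold=0.7):
    """Treat the utterance as finished only if P(End of Turn) >= threshold."""
    p_end = softmax(logits)[1]
    return p_end >= threshold, p_end


# Hypothetical logits in the order [Not End of Turn, End of Turn].
done, p_end = is_end_of_turn([0.0, 2.0])
print(done, round(p_end, 3))  # True 0.881
```

A higher threshold makes the assistant wait longer before replying, which reduces interruptions at the cost of slower responses. The right value depends on your application and should be tuned on your own data.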


## 🤖 VideoSDK Agents Integration

Integrate this turn detector directly with VideoSDK Agents for production-ready conversational AI applications.

```python
from videosdk_agents import NamoTurnDetectorV1, pre_download_namo_turn_v1_model

# Download the model
pre_download_namo_turn_v1_model(language="no")

# Initialize Norwegian turn detector for VideoSDK Agents
turn_detector = NamoTurnDetectorV1(language="no")
```

> 📚 [**Complete Integration Guide**](https://docs.videosdk.live/ai_agents/plugins/namo-turn-detector) - Learn how to use `NamoTurnDetectorV1` with VideoSDK Agents

## 📖 Citation

```bibtex
@misc{namo_turn_detector_no_2025,
  title={Namo Turn Detector v1: Norwegian},
  author={VideoSDK Team},
  year={2025},
  publisher={Hugging Face},
  url={https://huggingface.co/videosdk-live/Namo-Turn-Detector-v1-Norwegian},
  note={ONNX-optimized DistilBERT for turn detection in Norwegian}
}
```

## 📄 License

This project is licensed under the Apache License 2.0 - see the [LICENSE](LICENSE) file for details.

<div align="center">

**Made with ❤️ by the VideoSDK Team**

[![VideoSDK](https://img.shields.io/badge/VideoSDK-Live-blue)](https://videosdk.live)

</div>