---
language: da
license: apache-2.0
library_name: onnxruntime
pipeline_tag: voice-activity-detection
tags:
- turn-detection
- end-of-utterance
- distilbert
- onnx
- quantized
- conversational-ai
- voice-assistant
- real-time
base_model: distilbert-base-multilingual-cased
datasets:
- videosdk-live/Namo-Turn-Detector-v1-Train
model-index:
- name: Namo Turn Detector v1 - Danish
  results:
  - task:
      type: text-classification
      name: Turn Detection
    dataset:
      name: Namo Turn Detector v1 Test - Danish
      type: videosdk-live/Namo-Turn-Detector-v1-Test
      split: train
    metrics:
    - type: accuracy
      value: 0.865212
      name: Accuracy
    - type: f1
      value: 0.868914
      name: F1 Score
    - type: precision
      value: 0.852941
      name: Precision
    - type: recall
      value: 0.885496
      name: Recall
---
					
					
						
# 🎯 Namo Turn Detector v1 - Danish

<div align="center">

[License: Apache 2.0](https://opensource.org/licenses/Apache-2.0)
[ONNX](https://onnx.ai/)
[Model on Hugging Face](https://huggingface.co/videosdk-live/Namo-Turn-Detector-v1-Danish)

**🚀 Namo Turn Detection Model for Danish**

</div>

---
					
					
						
## 📋 Overview

The **Namo Turn Detector** is a specialized AI model designed to solve one of the most challenging problems in conversational AI: **knowing when a user has finished speaking**.

This Danish-specialist model classifies a transcribed utterance as one of:
- ✅ **Complete utterances** (user is done speaking)
- 🔄 **Incomplete utterances** (user will continue speaking)

Built on the DistilBERT architecture and exported to a quantized ONNX format, it reaches 86.5% accuracy with inference latency under 12 ms.
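
As a rough illustration of how the two classes might drive a voice agent, the sketch below maps a prediction to a wait time before the agent replies. The function name `response_delay_ms`, the 0.7 threshold, and both delay values are hypothetical choices made for this example; they are not part of the model or of any VideoSDK API.

```python
def response_delay_ms(label: int, confidence: float,
                      threshold: float = 0.7,
                      fast_ms: int = 100, slow_ms: int = 1200) -> int:
    """Return how long to wait (in ms) after silence before replying.

    label: 1 = complete utterance, 0 = incomplete utterance.
    All default values are illustrative assumptions, not tuned settings.
    """
    if label == 1 and confidence >= threshold:
        return fast_ms  # confident end of turn: reply quickly
    return slow_ms      # likely continuation or low confidence: keep listening


print(response_delay_ms(1, 0.93))  # confident "complete" prediction
print(response_delay_ms(0, 0.88))  # "incomplete" prediction
print(response_delay_ms(1, 0.55))  # "complete" but below the threshold
```

In practice the threshold and delays would need tuning against real conversations, since a lower threshold makes the agent faster but more likely to interrupt.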
					
					
						
## 🔑 Key Features

- **Turn Detection Specialist**: Detects end-of-turn vs. continuation in Danish speech transcripts.
- **Low Latency**: Optimized with **quantized ONNX** for <12ms inference.
- **Robust Performance**: 86.5% accuracy on diverse Danish utterances.
- **Easy Integration**: Compatible with Python, ONNX Runtime, and VideoSDK Agents SDK.
- **Enterprise Ready**: Supports real-time conversational AI and voice assistants.
					
					
						
## 📊 Performance Metrics
<div>

| Metric | Score |
|--------|-------|
| **🎯 Accuracy** | **86.52%** |
| **📈 F1-Score** | **86.89%** |
| **🎪 Precision** | **85.29%** |
| **🎭 Recall** | **88.55%** |
| **⚡ Latency** | **<12ms** |
| **💾 Model Size** | **~135MB** |

</div>
<img src="./confusion_matrices.png" alt="Confusion matrices" width="600" height="400"/>

> 📊 *Evaluated on 700+ Danish utterances from diverse conversational contexts*
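
The reported scores can be cross-checked against each other, because F1 is the harmonic mean of precision and recall. The short sketch below recomputes F1 from the precision and recall values listed in this card's metadata; the variable names are only for illustration.

```python
# Precision and recall as listed in the model card metadata.
precision = 0.852941
recall = 0.885496

# F1 is the harmonic mean of precision and recall.
f1 = 2 * precision * recall / (precision + recall)

print(f"F1 = {f1:.4f}")
```

The result agrees with the reported F1 of 86.89% to the precision shown in the table.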
					
					
						
## ⚡️ Speed Analysis

<img src="./performance_analysis.png" alt="Inference speed analysis" width="600" height="400"/>
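
Latency depends on hardware and input length, so it can be worth measuring on your own machine. The following is a minimal timing sketch using only the standard library; the helper name `measure_latency_ms`, the warm-up count, and the run count are assumptions for this example, and the lambda is a stand-in workload rather than the model itself.

```python
import statistics
import time


def measure_latency_ms(fn, *args, warmup: int = 5, runs: int = 50) -> dict:
    """Time fn(*args) and return median and approximate p95 latency in ms."""
    for _ in range(warmup):
        fn(*args)  # warm-up calls are not timed

    samples = []
    for _ in range(runs):
        start = time.perf_counter()
        fn(*args)
        samples.append((time.perf_counter() - start) * 1000.0)

    samples.sort()
    # Simple rank-based approximation of the 95th percentile.
    p95_index = min(len(samples) - 1, int(0.95 * len(samples)))
    return {"median_ms": statistics.median(samples), "p95_ms": samples[p95_index]}


# Stand-in workload; swap in the detector's predict method and a Danish sentence.
stats = measure_latency_ms(lambda text: text.lower(), "Hej, hvordan går det")
print(stats)
```

Reporting the median alongside a high percentile gives a better picture than a single average, because occasional slow calls matter for real-time use.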
					
					
						
## 🔧 Train & Test Scripts

<div align="center">

[Train Script (Colab)](https://colab.research.google.com/drive/1DqSUYfcya0r2iAEZB9fS4mfrennubduV) [Test Script (Colab)](https://colab.research.google.com/drive/19ZOlNoHS2WLX2V4r5r492tsCUnYLXnQR)

</div>
					
					
						
## 🛠️ Installation

To use this model, install the following libraries:

```bash
pip install onnxruntime transformers huggingface_hub
```
					
					
						
## 🚀 Quick Start

You can run inference directly from the Hugging Face repository.

```python
import numpy as np
import onnxruntime as ort
from transformers import AutoTokenizer
from huggingface_hub import hf_hub_download

class TurnDetector:
    def __init__(self, repo_id="videosdk-live/Namo-Turn-Detector-v1-Danish"):
        """
        Initializes the detector by downloading the model and tokenizer
        from the Hugging Face Hub.
        """
        print(f"Loading model from repo: {repo_id}")

        # Download the model and tokenizer from the Hub
        # Authentication is handled automatically if you are logged in
        model_path = hf_hub_download(repo_id=repo_id, filename="model_quant.onnx")
        self.tokenizer = AutoTokenizer.from_pretrained(repo_id)

        # Set up the ONNX Runtime inference session
        self.session = ort.InferenceSession(model_path)
        self.max_length = 512
        print("✅ Model and tokenizer loaded successfully.")

    def predict(self, text: str) -> tuple:
        """
        Predicts if a given text utterance is the end of a turn.
        Returns (predicted_label, confidence) where:
        - predicted_label: 0 for "Not End of Turn", 1 for "End of Turn"
        - confidence: confidence score between 0 and 1
        """
        # Tokenize the input text
        inputs = self.tokenizer(
            text,
            truncation=True,
            max_length=self.max_length,
            return_tensors="np"
        )

        # Prepare the feed dictionary for the ONNX model
        feed_dict = {
            "input_ids": inputs["input_ids"],
            "attention_mask": inputs["attention_mask"]
        }

        # Run inference; the first output holds logits of shape (batch, classes)
        outputs = self.session.run(None, feed_dict)
        logits = outputs[0]

        # Convert the logits of the single input to class probabilities
        probabilities = self._softmax(logits[0])
        predicted_label = int(np.argmax(probabilities))
        confidence = float(np.max(probabilities))

        return predicted_label, confidence

    def _softmax(self, x, axis=None):
        # Numerically stable softmax: subtract the max before exponentiating
        if axis is None:
            axis = -1
        exp_x = np.exp(x - np.max(x, axis=axis, keepdims=True))
        return exp_x / np.sum(exp_x, axis=axis, keepdims=True)

# --- Example Usage ---
if __name__ == "__main__":
    detector = TurnDetector()

    sentences = [
        "Kan du tilgive dig selv, når du har begået en fejl?",  # Expected: End of Turn
        "Store temperaturintervaller er."  # Expected: Not End of Turn
    ]

    for sentence in sentences:
        predicted_label, confidence = detector.predict(sentence)
        result = "End of Turn" if predicted_label == 1 else "Not End of Turn"
        print(f"'{sentence}' -> {result} (confidence: {confidence:.3f})")
        print("-" * 50)
```
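
To make the post-processing step in `predict` easier to follow, here is a small standalone sketch of the same softmax and argmax logic that runs without downloading the model. The logits are made-up values chosen for illustration, not real model output.

```python
import numpy as np

# Hypothetical logits for one utterance, shaped (batch=1, classes=2).
logits = np.array([[-1.2, 2.3]])

# Numerically stable softmax over the two classes.
shifted = logits[0] - np.max(logits[0])
probabilities = np.exp(shifted) / np.sum(np.exp(shifted))

predicted_label = int(np.argmax(probabilities))  # 1 = "End of Turn"
confidence = float(np.max(probabilities))        # probability of the chosen class

print(predicted_label, round(confidence, 3))
```

Because the confidence is the probability of the winning class, with two classes it always lies between 0.5 and 1.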
					
					
						
## 🤖 VideoSDK Agents Integration

Integrate this turn detector directly with VideoSDK Agents for production-ready conversational AI applications.

```python
from videosdk_agents import NamoTurnDetectorV1, pre_download_namo_turn_v1_model

# Download the Danish model ahead of time
pre_download_namo_turn_v1_model(language="da")

# Initialize Danish turn detector for VideoSDK Agents
turn_detector = NamoTurnDetectorV1(language="da")
```

> 📚 [**Complete Integration Guide**](https://docs.videosdk.live/ai_agents/plugins/namo-turn-detector) - Learn how to use `NamoTurnDetectorV1` with VideoSDK Agents
					
					
						
## 📖 Citation

```bibtex
@model{namo_turn_detector_da_2025,
  title={Namo Turn Detector v1: Danish},
  author={VideoSDK Team},
  year={2025},
  publisher={Hugging Face},
  url={https://huggingface.co/videosdk-live/Namo-Turn-Detector-v1-Danish},
  note={ONNX-optimized DistilBERT for turn detection in Danish}
}
```
					
					
						
## 📄 License

This project is licensed under the Apache License 2.0. See the [LICENSE](LICENSE) file for details.

<div align="center">

**Made with ❤️ by the VideoSDK Team**

[VideoSDK](https://videosdk.live)

</div>