---
language: zh
license: apache-2.0
library_name: onnxruntime
pipeline_tag: voice-activity-detection
tags:
- turn-detection
- end-of-utterance
- distilbert
- onnx
- quantized
- conversational-ai
- voice-assistant
- real-time
base_model: distilbert-base-multilingual-cased
datasets:
- videosdk-live/Namo-Turn-Detector-v1-Train
model-index:
- name: Namo Turn Detector v1 - Chinese
  results:
  - task:
      type: text-classification
      name: Turn Detection
    dataset:
      name: Namo Turn Detector v1 Test - Chinese
      type: videosdk-live/Namo-Turn-Detector-v1-Test
      split: train
    metrics:
    - type: accuracy
      value: 0.887831
      name: Accuracy
    - type: f1
      value: 0.897881
      name: F1 Score
    - type: precision
      value: 0.842676
      name: Precision
    - type: recall
      value: 0.960825
      name: Recall
---
# 🎯 Namo Turn Detector v1 - Chinese
**🚀 Namo Turn Detection Model for Chinese**

---
## 📋 Overview
The **Namo Turn Detector** is a text classification model built for one of the hardest problems in conversational AI: **knowing when a user has finished speaking**.

This Chinese-specialist model reads the transcript of the user's speech and distinguishes between:

- ✅ **Complete utterances** (user is done speaking)
- 🔄 **Incomplete utterances** (user will continue speaking)

Built on the DistilBERT architecture and exported to quantized ONNX, it runs inference in under 13 ms.
## 🔑 Key Features
- **Turn Detection Specialist**: Detects end-of-turn vs. continuation in Chinese speech transcripts.
- **Low Latency**: Optimized with **quantized ONNX** for <13ms inference.
- **Robust Performance**: 88.8% accuracy on diverse Chinese utterances.
- **Easy Integration**: Works with Python, ONNX Runtime, and the VideoSDK Agents SDK.
- **Enterprise Ready**: Supports real-time conversational AI and voice assistants.
## 📊 Performance Metrics
| Metric | Score |
|--------|-------|
| **🎯 Accuracy** | **88.78%** |
| **📈 F1-Score** | **89.79%** |
| **🎪 Precision** | **84.27%** |
| **🎭 Recall** | **96.08%** |
| **⚡ Latency** | **<13ms** |
| **💾 Model Size** | **~135MB** |
> 📊 *Evaluated on 900+ Chinese utterances from diverse conversational contexts*
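
The F1 score in the table is the harmonic mean of precision and recall, so it can be checked from the other two rows. The sketch below does that arithmetic with the unrounded values from the model card metadata. One reading, assuming the positive class is "End of Turn" (the card does not say so explicitly): the high recall means real ends of turn are rarely missed, while the lower precision means some continuations are flagged as complete.

```python
# Unrounded metric values from the model card metadata
precision = 0.842676
recall = 0.960825

# F1 is the harmonic mean of precision and recall
f1 = 2 * precision * recall / (precision + recall)

print(f"F1 = {f1:.4f}")  # prints "F1 = 0.8979"
```
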
## ⚡️ Speed Analysis
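
Latency depends on hardware, ONNX Runtime settings, and input length, so it is worth measuring on your own machine. The following is a minimal sketch, not the benchmark behind the figure above: `benchmark`, `warmup`, and `runs` are hypothetical names chosen for illustration, and the helper times any callable that takes a single string.

```python
import statistics
import time


def benchmark(predict_fn, texts, warmup=5, runs=50):
    """Time predict_fn on each text and return (median_ms, p95_ms)."""
    # Warm-up calls are not timed, so one-off setup costs do not skew results
    for _ in range(warmup):
        for text in texts:
            predict_fn(text)

    timings_ms = []
    for _ in range(runs):
        for text in texts:
            start = time.perf_counter()
            predict_fn(text)
            timings_ms.append((time.perf_counter() - start) * 1000.0)

    timings_ms.sort()
    median_ms = statistics.median(timings_ms)
    # Nearest-rank style index for the 95th percentile
    p95_index = min(len(timings_ms) - 1, int(0.95 * len(timings_ms)))
    return median_ms, timings_ms[p95_index]


if __name__ == "__main__":
    # Stand-in callable and placeholder inputs so the sketch runs without the model
    median_ms, p95_ms = benchmark(lambda text: len(text), ["你好", "我想问一下 就是"])
    print(f"median: {median_ms:.3f} ms, p95: {p95_ms:.3f} ms")
```

To time the real model, pass the `predict` method of the `TurnDetector` class from the Quick Start section in place of the stand-in callable.
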
## 🔧 Train & Test Scripts
Colab notebooks: [notebook 1](https://colab.research.google.com/drive/1DqSUYfcya0r2iAEZB9fS4mfrennubduV) · [notebook 2](https://colab.research.google.com/drive/19ZOlNoHS2WLX2V4r5r492tsCUnYLXnQR)
## 🛠️ Installation
To use this model, you will need to install the following libraries.
```bash
pip install onnxruntime transformers huggingface_hub
```
## 🚀 Quick Start
You can run inference directly from the Hugging Face repository.
```python
import numpy as np
import onnxruntime as ort
from transformers import AutoTokenizer
from huggingface_hub import hf_hub_download


class TurnDetector:
    def __init__(self, repo_id="videosdk-live/Namo-Turn-Detector-v1-Chinese"):
        """
        Initializes the detector by downloading the model and tokenizer
        from the Hugging Face Hub.
        """
        print(f"Loading model from repo: {repo_id}")

        # Download the model and tokenizer from the Hub
        # Authentication is handled automatically if you are logged in
        model_path = hf_hub_download(repo_id=repo_id, filename="model_quant.onnx")
        self.tokenizer = AutoTokenizer.from_pretrained(repo_id)

        # Set up the ONNX Runtime inference session
        self.session = ort.InferenceSession(model_path)
        self.max_length = 512
        print("✅ Model and tokenizer loaded successfully.")

    def predict(self, text: str) -> tuple:
        """
        Predicts if a given text utterance is the end of a turn.
        Returns (predicted_label, confidence) where:
        - predicted_label: 0 for "Not End of Turn", 1 for "End of Turn"
        - confidence: confidence score between 0 and 1
        """
        # Tokenize the input text
        inputs = self.tokenizer(
            text,
            truncation=True,
            max_length=self.max_length,
            return_tensors="np"
        )

        # Prepare the feed dictionary for the ONNX model
        feed_dict = {
            "input_ids": inputs["input_ids"],
            "attention_mask": inputs["attention_mask"]
        }

        # Run inference
        outputs = self.session.run(None, feed_dict)
        logits = outputs[0]

        # Convert the logits of the single input to class probabilities
        probabilities = self._softmax(logits[0])
        predicted_label = int(np.argmax(probabilities))  # plain int, not np.int64
        confidence = float(np.max(probabilities))
        return predicted_label, confidence

    def _softmax(self, x, axis=None):
        if axis is None:
            axis = -1
        # Subtract the max before exponentiating for numerical stability
        exp_x = np.exp(x - np.max(x, axis=axis, keepdims=True))
        return exp_x / np.sum(exp_x, axis=axis, keepdims=True)


# --- Example Usage ---
if __name__ == "__main__":
    detector = TurnDetector()
    sentences = [
        "毛宗港平三国演义称曹操为三绝中的那个坚决",  # Expected: End of Turn
        "1852年 曾国藩受命在杭州组建湘军 镇压太平天国 就是",  # Expected: Not End of Turn
    ]

    for sentence in sentences:
        predicted_label, confidence = detector.predict(sentence)
        result = "End of Turn" if predicted_label == 1 else "Not End of Turn"
        print(f"'{sentence}' -> {result} (confidence: {confidence:.3f})")
        print("-" * 50)
```
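
In a voice pipeline, the `(predicted_label, confidence)` pair returned by `predict` usually feeds a decision about whether to respond now or keep listening. The sketch below shows one possible policy, assuming a hypothetical `decide_action` helper and an illustrative threshold of 0.7. Neither is part of this model or the VideoSDK API, and the threshold should be tuned on your own traffic.

```python
def decide_action(predicted_label: int, confidence: float, threshold: float = 0.7) -> str:
    """Return "respond" for a confident end-of-turn prediction, otherwise "wait"."""
    if not 0.0 <= confidence <= 1.0:
        raise ValueError("confidence must be between 0 and 1")
    # Label 1 means "End of Turn"; respond only when the model is confident enough
    if predicted_label == 1 and confidence >= threshold:
        return "respond"
    # Label 0, or a low-confidence label 1: keep listening
    return "wait"


print(decide_action(1, 0.93))  # respond
print(decide_action(1, 0.55))  # wait
print(decide_action(0, 0.98))  # wait
```

A higher threshold makes the assistant interrupt less often at the cost of slower responses, so pairing it with a silence timeout as a fallback is a common design choice.
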
## 🤖 VideoSDK Agents Integration
Integrate this turn detector directly with VideoSDK Agents for production-ready conversational AI applications.
```python
from videosdk_agents import NamoTurnDetectorV1, pre_download_namo_turn_v1_model
# Pre-download the model files
pre_download_namo_turn_v1_model(language="zh")
# Initialize Chinese turn detector for VideoSDK Agents
turn_detector = NamoTurnDetectorV1(language="zh")
```
> 📚 [**Complete Integration Guide**](https://docs.videosdk.live/ai_agents/plugins/namo-turn-detector) - Learn how to use `NamoTurnDetectorV1` with VideoSDK Agents
## 📖 Citation
```bibtex
@misc{namo_turn_detector_zh_2025,
title={Namo Turn Detector v1: Chinese},
author={VideoSDK Team},
year={2025},
publisher={Hugging Face},
url={https://huggingface.co/videosdk-live/Namo-Turn-Detector-v1-Chinese},
note={ONNX-optimized DistilBERT for turn detection in Chinese}
}
```
## 📄 License
This project is licensed under the Apache License 2.0 - see the [LICENSE](LICENSE) file for details.

**Made with ❤️ by the [VideoSDK Team](https://videosdk.live)**