---
pipeline_tag: voice-activity-detection
license: bsd-2-clause
tags:
- speech-processing
- semantic-vad
- multilingual
datasets:
- pipecat-ai/smart-turn-data-v3-train
- pipecat-ai/smart-turn-data-v3-test
---

# Smart Turn v3

**Smart Turn v3** is an open-source semantic Voice Activity Detection (VAD) model that predicts whether a speaker has finished their turn by analysing the raw audio waveform rather than a transcript.

## Links

* [Blog post: Smart Turn v3](https://www.daily.co/blog/announcing-smart-turn-v3-with-cpu-inference-in-just-12ms/)
* [GitHub repo](https://github.com/pipecat-ai/smart-turn) with training and inference code
* [Datasets](https://github.com/pipecat-ai/datasets) used for training and testing

## Model architecture

* Backbone: Whisper Tiny encoder
* Head: shallow linear classifier
* Params: 8 M (int8)
* Checkpoint: 8 MB ONNX
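
The parameter count and checkpoint size listed above are consistent with each other: at int8 precision each weight takes one byte, so roughly 8 million parameters come to roughly 8 MB. A quick back-of-the-envelope check, using the rounded figures from this card (the float32 row is only there for comparison, and graph metadata overhead is ignored):

```python
# Bytes needed to store one weight at a given precision.
BYTES_PER_PARAM = {"int8": 1, "float32": 4}


def checkpoint_size_mb(num_params, dtype):
    """Approximate weight storage in megabytes (1 MB = 10**6 bytes)."""
    return num_params * BYTES_PER_PARAM[dtype] / 1e6


params = 8_000_000  # "8 M" from the list above, rounded

print(checkpoint_size_mb(params, "int8"))     # 8.0
print(checkpoint_size_mb(params, "float32"))  # 32.0
```
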

## How to use

For usage instructions, see the blog post and GitHub repo linked above. The model can be run standalone or with Pipecat.
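
As a rough orientation only, the sketch below shows the general shape of running an ONNX checkpoint with ONNX Runtime and turning its output into an end-of-turn decision. Several things here are assumptions that do not come from this card: the 16 kHz sample rate, the 8-second window, left-padding with silence, a single probability as the model output, the 0.5 threshold, the file name, and the `extract_features` callable that stands in for the real preprocessing. The inference code in the GitHub repo is the reference for the actual steps.

```python
SAMPLE_RATE = 16_000  # assumption: mono audio sampled at 16 kHz
WINDOW_SECONDS = 8    # assumption: the model looks at the last 8 seconds


def last_window(samples, sample_rate=SAMPLE_RATE, seconds=WINDOW_SECONDS):
    """Keep the most recent `seconds` of audio, left-padded with silence."""
    target = sample_rate * seconds
    tail = list(samples[-target:])
    return [0.0] * (target - len(tail)) + tail


def turn_complete(probability, threshold=0.5):
    """Map a completion probability to a yes/no decision (threshold assumed)."""
    return probability >= threshold


def predict_end_of_turn(session, extract_features, samples):
    """Run one inference and return (probability, decision).

    `session` is an onnxruntime.InferenceSession; `extract_features` is a
    hypothetical callable that converts the audio window into whatever
    input tensor the checkpoint expects.
    """
    features = extract_features(last_window(samples))
    input_name = session.get_inputs()[0].name  # read the name from the model
    outputs = session.run(None, {input_name: features})
    probability = float(outputs[0].reshape(-1)[0])  # assumed single score
    return probability, turn_complete(probability)


# Usage, assuming onnxruntime is installed and the checkpoint is downloaded:
#   import onnxruntime as ort
#   session = ort.InferenceSession("smart-turn-v3.onnx")  # hypothetical path
#   probability, done = predict_end_of_turn(session, extract_features, audio)
```

Reading the input name from the session rather than hard-coding it keeps the sketch independent of how the checkpoint labels its inputs.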