---
license: mit
tags:
- audio
- voice-activity-detection
- coreml
- silero
- speech
- ios
- macos
- swift
library_name: coreml
pipeline_tag: voice-activity-detection
datasets:
- alexwengg/musan_mini50
- alexwengg/musan_mini100
metrics:
- accuracy
- f1
language:
- en
base_model:
- onnx-community/silero-vad
---


# **<span style="color:#5DAF8D">🧃 CoreML Silero VAD </span>**
[![Discord](https://img.shields.io/badge/Discord-Join%20Chat-7289da.svg)](https://discord.gg/WNsvaCtmDe)
[![GitHub Repo stars](https://img.shields.io/github/stars/FluidInference/FluidAudio?style=flat&logo=github)](https://github.com/FluidInference/FluidAudio)

A CoreML implementation of the Silero Voice Activity
Detection (VAD) model, optimized for Apple platforms
(iOS/macOS). This repository contains pre-converted
CoreML models ready for use in Swift applications.

## Model Description

**Developed by:** Silero Team (original), converted by
FluidAudio

**Model type:** Voice Activity Detection

**License:** MIT

**Parent Model:**
[silero-vad](https://github.com/snakers4/silero-vad)

### Model Details

- **Architecture:** STFT + Encoder + RNN Decoder pipeline
- **Input:** 16kHz mono audio chunks (512 samples / 32ms)
- **Output:** Voice activity probability (0.0-1.0)
- **Memory:** ~2MB total model size

## Intended Use

### Primary Use Cases
- Real-time voice activity detection in iOS/macOS
applications
- Speech preprocessing for ASR systems
- Audio segmentation and filtering

## How to Use

### Swift Integration

```swift
import FluidAudio

let config = VADConfig(
    threshold: 0.3,
    chunkSize: 512, // 512 samples (32 ms at 16 kHz) works best
    sampleRate: 16000
)

let vadManager = VADManager(config: config)
try await vadManager.initialize()

// Process audio chunk
let result = try await vadManager.processChunk(audioChunk)
print("Voice probability: \(result.probability)")
print("Is voice active: \(result.isVoiceActive)")
```
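
The model expects fixed 512-sample chunks, so longer recordings have to be split before being passed to `processChunk`. The following is a minimal sketch of one way to do that, assuming the audio is already available as a `[Float]` array of 16 kHz mono samples; `makeChunks` is a hypothetical helper and not part of FluidAudio, and zero-padding the final chunk is an assumption rather than documented library behavior.

```swift
/// Hypothetical helper (not part of FluidAudio): splits 16 kHz mono samples
/// into fixed-size chunks, zero-padding the last chunk to `chunkSize`.
func makeChunks(from samples: [Float], chunkSize: Int = 512) -> [[Float]] {
    precondition(chunkSize > 0, "chunkSize must be positive")
    var chunks: [[Float]] = []
    var start = 0
    while start < samples.count {
        let end = min(start + chunkSize, samples.count)
        var chunk = Array(samples[start..<end])
        // Pad a short final chunk with silence so every chunk has equal length.
        if chunk.count < chunkSize {
            chunk.append(contentsOf: [Float](repeating: 0, count: chunkSize - chunk.count))
        }
        chunks.append(chunk)
        start += chunkSize
    }
    return chunks
}

// One second of audio at 16 kHz: 31 full chunks plus one padded chunk.
let oneSecond = [Float](repeating: 0, count: 16_000)
let chunks = makeChunks(from: oneSecond)
print(chunks.count)  // 32
```

Each resulting chunk can then be passed to the VAD in order, which matters because the RNN decoder models temporal context across chunks.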

### Installation

Add FluidAudio to your Swift project:

```swift
dependencies: [
    .package(url: "https://github.com/FluidAudio/FluidAudioSwift.git", from: "1.0.0")
]
```
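
For context, the dependency entry above sits inside a `Package.swift` manifest. The sketch below shows roughly where it goes. The package name `MyApp`, the target name, and the product name `FluidAudio` (inferred from `import FluidAudio`) are assumptions; check the library's own manifest for the actual product name and minimum platform versions.

```swift
// swift-tools-version:5.9
import PackageDescription

// Hypothetical manifest: "MyApp" and the product name "FluidAudio" are
// assumptions for illustration, not confirmed by the library.
let package = Package(
    name: "MyApp",
    dependencies: [
        .package(url: "https://github.com/FluidAudio/FluidAudioSwift.git", from: "1.0.0")
    ],
    targets: [
        .executableTarget(
            name: "MyApp",
            dependencies: [
                .product(name: "FluidAudio", package: "FluidAudioSwift")
            ]
        )
    ]
)
```

You will likely also need a `platforms:` entry that matches the library's minimum supported iOS and macOS versions.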

## Performance

### Benchmarks on Apple Silicon (M1/M2)

| Metric           | Value               |
|------------------|---------------------|
| Latency          | <2ms per 32ms chunk |
| Real-time Factor | 0.02x               |
| Memory Usage     | ~15MB               |
| CPU Usage        | <5% (single core)   |
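
The latency and real-time factor rows are related by simple arithmetic: the real-time factor is processing time divided by the duration of the audio processed. A small sketch of that calculation, using only the figures from the table (the function name is illustrative):

```swift
let sampleRate = 16_000.0
let chunkSize = 512.0

// Duration of one chunk in milliseconds: 512 * 1000 / 16000 = 32 ms.
let chunkDurationMs = chunkSize * 1_000 / sampleRate

/// Real-time factor: processing time divided by audio duration.
/// Values below 1.0 mean faster than real time.
func realTimeFactor(processingMs: Double, audioMs: Double) -> Double {
    processingMs / audioMs
}

// The "<2ms" latency bound corresponds to a real-time factor below 0.0625.
let upperBoundRTF = realTimeFactor(processingMs: 2.0, audioMs: chunkDurationMs)

// A real-time factor of 0.02 corresponds to about 0.64 ms per 32 ms chunk.
let impliedLatencyMs = 0.02 * chunkDurationMs

print(chunkDurationMs, upperBoundRTF, impliedLatencyMs)
```

The two rows are therefore consistent: 0.64 ms per chunk falls within the stated bound of under 2 ms.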

### Accuracy Metrics

Evaluated on common speech datasets:
- Precision: 94.2%
- Recall: 92.8%
- F1-Score: 93.5%
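
The F1 score is the harmonic mean of precision and recall, so the third figure can be checked from the first two. A short sketch of that check (the function name is illustrative):

```swift
import Foundation

/// F1 is the harmonic mean of precision and recall.
func f1Score(precision: Double, recall: Double) -> Double {
    guard precision + recall > 0 else { return 0 }
    return 2 * precision * recall / (precision + recall)
}

let f1 = f1Score(precision: 0.942, recall: 0.928)
print(String(format: "%.1f%%", f1 * 100))  // 93.5%
```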

## Model Files

This repository contains three CoreML models that work together:

- `silero_stft.mlmodel` (650KB) - STFT feature extraction
- `silero_encoder.mlmodel` (254KB) - Feature encoding
- `silero_rnn_decoder.mlmodel` (527KB) - RNN-based classification

## Training Data

The original Silero VAD model was trained on a diverse
dataset including:
- Clean speech audio
- Noisy speech with various background conditions
- Music and non-speech audio for negative samples

## Limitations and Bias

### Known Limitations

- Optimized for a 16kHz sample rate (other rates may reduce accuracy)
- May struggle with very quiet speech (below -30dB SNR)
- Performance varies with microphone quality and recording conditions


## Technical Details

### Model Architecture

```
Audio Input (512 samples, 16kHz)
    ↓
STFT Model (spectral features)
    ↓
Encoder Model (feature compression)
    ↓
RNN Decoder (temporal modeling)
    ↓
Voice Probability Output
```
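
The pipeline emits one probability per 32 ms chunk. For the audio segmentation use case, those per-chunk probabilities can be grouped into time ranges. The sketch below shows one simple way to do that by thresholding. `SpeechSegment` and `speechSegments` are hypothetical names, not FluidAudio APIs, and treating a probability equal to the threshold as speech is an assumption.

```swift
/// Hypothetical type for illustration: a contiguous run of speech, in seconds.
struct SpeechSegment: Equatable {
    let startSeconds: Double
    let endSeconds: Double
}

/// Groups consecutive chunks whose probability meets the threshold
/// into segments. Not part of FluidAudio; a plain thresholding sketch.
func speechSegments(
    from probabilities: [Float],
    threshold: Float = 0.3,
    chunkSize: Int = 512,
    sampleRate: Double = 16_000
) -> [SpeechSegment] {
    let chunkDuration = Double(chunkSize) / sampleRate
    var segments: [SpeechSegment] = []
    var segmentStart: Int? = nil

    for (index, probability) in probabilities.enumerated() {
        let isSpeech = probability >= threshold
        if isSpeech, segmentStart == nil {
            segmentStart = index  // speech begins at this chunk
        } else if !isSpeech, let start = segmentStart {
            // Speech ended at the start of the current chunk.
            segments.append(SpeechSegment(
                startSeconds: Double(start) * chunkDuration,
                endSeconds: Double(index) * chunkDuration))
            segmentStart = nil
        }
    }
    // Close a segment that runs to the end of the input.
    if let start = segmentStart {
        segments.append(SpeechSegment(
            startSeconds: Double(start) * chunkDuration,
            endSeconds: Double(probabilities.count) * chunkDuration))
    }
    return segments
}

let segments = speechSegments(from: [0.1, 0.8, 0.9, 0.2, 0.1, 0.7])
print(segments.count)  // 2
```

A production segmenter would usually add smoothing, such as a minimum segment length or a hangover period, so that brief dips below the threshold do not split an utterance.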


## Citation

```bibtex
@misc{silero-vad-coreml,
  title={CoreML Silero VAD},
  author={FluidAudio Team},
  year={2024},
  url={https://huggingface.co/alexwengg/coreml-silero-vad}
}

@misc{silero-vad,
  title={Silero VAD},
  author={Silero Team},
  year={2021},
  url={https://github.com/snakers4/silero-vad}
}
```

## Related Models

Check out other CoreML audio models in the collection at
https://huggingface.co/collections/bweng/coreml-685b12fd251f80552c08e2b9:

- https://huggingface.co/alexwengg/coreml_speaker_diarization - Identify "who spoke when"
- https://huggingface.co/collections/bweng/coreml-685b12fd251f80552c08e2b9 - Speech-to-text for Apple platforms

## Repository and Support

- GitHub: https://github.com/FluidAudio/FluidAudioSwift
- Documentation: https://github.com/FluidAudio/FluidAudioSwift/wiki
- Issues: https://github.com/FluidAudio/FluidAudioSwift/issues
- Community: https://github.com/FluidAudio/FluidAudioSwift/discussions

## License

This project is licensed under the MIT License. See the
LICENSE file for details.

The original Silero VAD model is also MIT-licensed. See
https://github.com/snakers4/silero-vad/blob/master/LICENSE for details.