Commit 41b6e84
Parent(s): f64b107
fixed the protobuf issue
Dockerfile
CHANGED
@@ -51,7 +51,7 @@ WORKDIR /app
 
 # App files
 COPY --chown=user pyproject.toml uv.lock \
-LICENSE README.md
+LICENSE README.md \
 ./
 COPY --chown=user src/ ./src/
 COPY --chown=user examples/${EXAMPLE_NAME} ./examples/${EXAMPLE_NAME}
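Reading the hunk as shown, the only change is the trailing backslash after `README.md`. A Dockerfile instruction continues onto the next line only when the line ends with a backslash, so without it the `./` destination would be read as a separate line rather than as the last argument of `COPY`. The sketch below is an illustrative Python model of that joining rule, not Docker's actual parser; `join_continuations` is a hypothetical helper introduced here.

```python
def join_continuations(lines):
    """Merge each line ending in a backslash with the line that follows it."""
    instructions = []
    pending = ""
    for line in lines:
        stripped = line.rstrip()
        if stripped.endswith("\\"):
            # Drop the backslash and keep accumulating the same instruction.
            pending += stripped[:-1].rstrip() + " "
        else:
            instructions.append((pending + stripped).strip())
            pending = ""
    if pending:
        # Input ended on a continuation line; keep what was collected.
        instructions.append(pending.strip())
    return instructions


# The three lines of the COPY instruction before and after the commit.
before = [
    "COPY --chown=user pyproject.toml uv.lock \\",
    "LICENSE README.md",
    "./",
]
after = [
    "COPY --chown=user pyproject.toml uv.lock \\",
    "LICENSE README.md \\",
    "./",
]

print(join_continuations(before))
print(join_continuations(after))
```

With the `before` lines the model yields two entries, the second being a bare `./`; with the `after` lines it yields a single `COPY` whose last argument is the `./` destination.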
NVIDIA_PIPECAT.md
CHANGED
@@ -2,4 +2,4 @@
 
 The NVIDIA Pipecat library augments [the Pipecat framework](https://github.com/pipecat-ai/pipecat) by adding additional frame processors and services, as well as new multimodal frames to facilitate the creation of human-avatar interactions. This includes the integration of NVIDIA services and NIMs such as [NVIDIA Riva](https://docs.nvidia.com/deeplearning/riva/user-guide/docs/index.html), [NVIDIA Audio2Face](https://build.nvidia.com/nvidia/audio2face-3d), and [NVIDIA Foundational RAG](https://build.nvidia.com/nvidia/build-an-enterprise-rag-pipeline). It also introduces a few processors with a focus on improving the end-user experience for multimodal conversational agents, along with speculative speech processing to reduce latency for faster bot responses.
 
-The nvidia-pipecat source code can be found in [the GitHub repository](https://github.com/NVIDIA/ace-controller). Follow [the documentation](https://docs.nvidia.com/ace/ace-controller-microservice/latest/index.html) for more details.
+The nvidia-pipecat source code can be found in [the GitHub repository](https://github.com/NVIDIA/ace-controller). Follow [the documentation](https://docs.nvidia.com/ace/ace-controller-microservice/latest/index.html) for more details.
examples/voice_agent_multi_thread/DOCKER_DEPLOYMENT.md
DELETED
@@ -1,322 +0,0 @@
-# Docker Deployment - Multi-Threaded Voice Agent
-
-## Overview
-
-This Docker container runs the complete multi-threaded telco voice agent stack:
-- **LangGraph Server** (`langgraph dev`) on port 2024
-- **Pipecat Pipeline** (FastAPI + WebRTC) on port 7860
-- **React UI** served at `http://localhost:7860`
-
-## Quick Start
-
-### Build the Image
-
-```bash
-# From project root
-docker build -t voice-agent-multi-thread .
-```
-
-### Run the Container
-
-```bash
-docker run -p 7860:7860 \
--e RIVA_API_KEY=your_nvidia_api_key \
--e NVIDIA_ASR_FUNCTION_ID=52b117d2-6c15-4cfa-a905-a67013bee409 \
--e NVIDIA_TTS_FUNCTION_ID=4e813649-d5e4-4020-b2be-2b918396d19d \
-voice-agent-multi-thread
-```
-
-### Access the UI
-
-Open your browser to: **http://localhost:7860**
-
-## What Happens Inside the Container
-
-The `start.sh` script orchestrates two processes:
-
-### 1. LangGraph Server (Port 2024)
-```bash
-cd /app/examples/voice_agent_multi_thread/agents
-uv run langgraph dev --no-browser --host 0.0.0.0 --port 2024
-```
-
-This runs the multi-threaded telco agent with:
-- Main thread for long operations
-- Secondary thread for interim queries
-- Store-based coordination
-
-### 2. Pipecat Pipeline (Port 7860)
-```bash
-cd /app/examples/voice_agent_multi_thread
-uv run pipeline.py
-```
-
-This runs the voice pipeline with:
-- WebRTC transport
-- RIVA ASR (speech-to-text)
-- LangGraphLLMService (multi-threaded routing)
-- RIVA TTS (text-to-speech)
-- React UI
-
-## Environment Variables
-
-### Required
-
-```bash
-# NVIDIA API Key for RIVA services
-RIVA_API_KEY=nvapi-xxxxx
-```
-
-### Optional
-
-```bash
-# LangGraph Configuration
-LANGGRAPH_HOST=0.0.0.0
-LANGGRAPH_PORT=2024
-LANGGRAPH_ASSISTANT=telco-agent
-
-# User Configuration
-USER_EMAIL=user@example.com
-
-# ASR Configuration
-NVIDIA_ASR_FUNCTION_ID=52b117d2-6c15-4cfa-a905-a67013bee409
-RIVA_ASR_LANGUAGE=en-US
-RIVA_ASR_MODEL=parakeet-1.1b-en-US-asr-streaming-silero-vad-asr-bls-ensemble
-
-# TTS Configuration
-NVIDIA_TTS_FUNCTION_ID=4e813649-d5e4-4020-b2be-2b918396d19d
-RIVA_TTS_VOICE_ID=Magpie-ZeroShot.Female-1
-RIVA_TTS_MODEL=magpie_tts_ensemble-Magpie-ZeroShot
-RIVA_TTS_LANGUAGE=en-US
-
-# Zero-shot audio prompt (optional)
-ZERO_SHOT_AUDIO_PROMPT_URL=https://github.com/your-repo/audio-prompt.wav
-
-# Multi-threading (default: true)
-ENABLE_MULTI_THREADING=true
-
-# Debug
-LANGGRAPH_DEBUG_STREAM=false
-```
-
-## Docker Compose
-
-Create `docker-compose.yml`:
-
-```yaml
-version: '3.8'
-
-services:
-  voice-agent:
-    build: .
-    ports:
-      - "7860:7860"
-    environment:
-      - RIVA_API_KEY=${RIVA_API_KEY}
-      - NVIDIA_ASR_FUNCTION_ID=52b117d2-6c15-4cfa-a905-a67013bee409
-      - NVIDIA_TTS_FUNCTION_ID=4e813649-d5e4-4020-b2be-2b918396d19d
-      - USER_EMAIL=user@example.com
-      - LANGGRAPH_ASSISTANT=telco-agent
-      - ENABLE_MULTI_THREADING=true
-    volumes:
-      # Optional: mount .env file
-      - ./examples/voice_agent_multi_thread/.env:/app/examples/voice_agent_multi_thread/.env:ro
-      # Optional: persist audio recordings
-      - ./audio_dumps:/app/examples/voice_agent_multi_thread/audio_dumps
-    healthcheck:
-      test: ["CMD", "curl", "-f", "http://localhost:7860/get_prompt"]
-      interval: 30s
-      timeout: 10s
-      retries: 3
-      start_period: 60s
-```
-
-Run with:
-```bash
-docker-compose up
-```
-
-## Using .env File
-
-Create `.env` in `examples/voice_agent_multi_thread/`:
-
-```bash
-# NVIDIA API Keys
-RIVA_API_KEY=nvapi-xxxxx
-
-# LangGraph
-LANGGRAPH_ASSISTANT=telco-agent
-LANGGRAPH_BASE_URL=http://127.0.0.1:2024
-
-# User
-USER_EMAIL=test@example.com
-
-# ASR
-NVIDIA_ASR_FUNCTION_ID=52b117d2-6c15-4cfa-a905-a67013bee409
-
-# TTS
-NVIDIA_TTS_FUNCTION_ID=4e813649-d5e4-4020-b2be-2b918396d19d
-RIVA_TTS_VOICE_ID=Magpie-ZeroShot.Female-1
-```
-
-The `start.sh` script automatically loads this file.
-
-## Ports
-
-| Service | Internal Port | External Port | Purpose |
-|---------|---------------|---------------|---------|
-| LangGraph Server | 2024 | - | Agent runtime (internal only) |
-| Pipecat Pipeline | 7860 | 7860 | WebRTC + HTTP API |
-| React UI | - | 7860 | Served by pipeline |
-
-**Note**: Only port 7860 is exposed externally. LangGraph runs internally on 2024.
-
-## Healthcheck
-
-The container includes a healthcheck that verifies the pipeline is responding:
-
-```bash
-curl -f http://localhost:7860/get_prompt
-```
-
-Check health status:
-```bash
-docker ps
-# Look for "(healthy)" in STATUS column
-```
-
-## Logs
-
-View all logs:
-```bash
-docker logs -f <container-id>
-```
-
-You'll see both:
-- LangGraph server startup and agent logs
-- Pipeline startup and WebRTC connection logs
-
-## Testing Multi-Threading
-
-1. **Open UI**: http://localhost:7860
-2. **Select Agent**: Choose "Telco Agent"
-3. **Test Long Operation**:
-   - Say: *"Close my contract"*
-   - Confirm: *"Yes"*
-   - Operation starts (50 seconds)
-4. **Test Secondary Thread**:
-   - While waiting, say: *"What's the status?"*
-   - Agent responds with progress
-   - Say: *"How much data do I have left?"*
-   - Agent answers while main operation continues
-
-## Troubleshooting
-
-### Container won't start
-```bash
-# Check logs
-docker logs <container-id>
-
-# Common issues:
-# 1. Missing RIVA_API_KEY
-# 2. Port 7860 already in use
-# 3. Insufficient memory
-```
-
-### LangGraph not starting
-```bash
-# Check if agents directory exists
-docker exec <container-id> ls -la /app/examples/voice_agent_multi_thread/agents
-
-# Check langgraph.json
-docker exec <container-id> cat /app/examples/voice_agent_multi_thread/agents/langgraph.json
-```
-
-### Pipeline not responding
-```bash
-# Check pipeline logs
-docker logs <container-id> 2>&1 | grep pipeline
-
-# Check if port is accessible
-curl http://localhost:7860/get_prompt
-```
-
-### Multi-threading not working
-```bash
-# Verify env var
-docker exec <container-id> env | grep MULTI_THREADING
-
-# Check LangGraph server
-docker exec <container-id> curl http://localhost:2024/assistants
-```
-
-## Development Mode
-
-To develop inside the container:
-
-```bash
-# Run with shell
-docker run -it -p 7860:7860 \
--v $(pwd)/examples/voice_agent_multi_thread:/app/examples/voice_agent_multi_thread \
-voice-agent-multi-thread /bin/bash
-
-# Inside container:
-cd /app/examples/voice_agent_multi_thread
-
-# Start services manually
-cd agents && uv run langgraph dev &
-cd .. && uv run pipeline.py
-```
-
-## Building for Production
-
-### Multi-stage optimization
-The Dockerfile uses a multi-stage build:
-1. **ui-builder**: Compiles React UI
-2. **python base**: Installs Python dependencies
-3. **Final image**: ~2GB (UI + Python + agents)
-
-### Reducing image size
-```dockerfile
-# Use slim Python base (already done)
-FROM python:3.12-slim
-
-# Clean up build artifacts (already done)
-RUN apt-get clean && rm -rf /var/lib/apt/lists/*
-
-# Use uv for faster installs (already done)
-RUN pip install uv
-```
-
-## Security Considerations
-
-1. **Non-root user**: Container runs as UID 1000
-2. **No secrets in image**: Use environment variables or mount secrets
-3. **Read-only filesystem**: UI dist is built at image time
-4. **Health checks**: Automatic restart on failure
-
-## Performance
-
-- **Startup time**: ~30-60 seconds
-- **Memory**: ~2GB recommended
-- **CPU**: 2 cores minimum
-- **Storage**: ~3GB for image + runtime
-
-## Related Files
-
-- `Dockerfile` - Container definition
-- `start.sh` - Startup orchestration
-- `agents/langgraph.json` - Agent configuration
-- `pipeline.py` - Pipecat pipeline
-- `langgraph_llm_service.py` - Multi-threaded LLM service
-
-## Support
-
-For issues:
-1. Check logs: `docker logs <container-id>`
-2. Verify environment variables
-3. Test components individually (LangGraph, Pipeline)
-4. Review `PIPECAT_MULTI_THREADING.md` for architecture details
-
-
-
examples/voice_agent_multi_thread/agents/requirements.txt
CHANGED
@@ -11,5 +11,6 @@ docling
 pymongo
 yt_dlp
 requests
-protobuf==6.31.1
+# protobuf==6.31.1
+protobuf
 twilio
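Because the requirement is no longer pinned, the protobuf version in a given environment is whatever the installer resolves alongside the other packages. A minimal sketch for checking which version ended up installed, using only the standard library's `importlib.metadata`; the helper name `installed_version` is hypothetical and not part of this repository.

```python
from importlib.metadata import PackageNotFoundError, version


def installed_version(package):
    """Return the installed version string of a distribution, or None if absent."""
    try:
        return version(package)
    except PackageNotFoundError:
        return None


if __name__ == "__main__":
    found = installed_version("protobuf")
    print("protobuf:", found if found is not None else "not installed")
```

Running this inside the built container or virtual environment shows the resolved version, which can be compared against the previously pinned `6.31.1`.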
pyproject.toml
CHANGED
@@ -2,7 +2,7 @@
 name = "nvidia-pipecat"
 version = "0.2.0"
 description = "NVIDIA ACE Pipecat SDK"
-readme = "
+readme = "README.md"
 license = { file = "LICENSE" }
 authors = [
 { name = "NVIDIA ACE", email = "ace-dev@exchange.nvidia.com" }