Spaces:

nvidia
/

voice-agent-examples

Running

App Files Files

xet

Community

fciannella commited on 3 days ago

Commit

41b6e84

1 Parent(s): f64b107

fixed the protobuf issue

Browse files

Files changed (5) hide show

Dockerfile +1 -1
NVIDIA_PIPECAT.md +1 -1
examples/voice_agent_multi_thread/DOCKER_DEPLOYMENT.md +0 -322
examples/voice_agent_multi_thread/agents/requirements.txt +2 -1
pyproject.toml +1 -1

Dockerfile CHANGED Viewed

@@ -51,7 +51,7 @@ WORKDIR /app
 # App files
 COPY --chown=user pyproject.toml uv.lock \
-     LICENSE README.md NVIDIA_PIPECAT.md \
      ./
 COPY --chown=user src/ ./src/
 COPY --chown=user examples/${EXAMPLE_NAME} ./examples/${EXAMPLE_NAME}

 # App files
 COPY --chown=user pyproject.toml uv.lock \
+     LICENSE README.md \
      ./
 COPY --chown=user src/ ./src/
 COPY --chown=user examples/${EXAMPLE_NAME} ./examples/${EXAMPLE_NAME}

NVIDIA_PIPECAT.md CHANGED Viewed

@@ -2,4 +2,4 @@
 The NVIDIA Pipecat library augments [the Pipecat framework](https://github.com/pipecat-ai/pipecat) by adding additional frame processors and services, as well as new multimodal frames to facilitate the creation of human-avatar interactions. This includes the integration of NVIDIA services and NIMs such as [NVIDIA Riva](https://docs.nvidia.com/deeplearning/riva/user-guide/docs/index.html), [NVIDIA Audio2Face](https://build.nvidia.com/nvidia/audio2face-3d), and [NVIDIA Foundational RAG](https://build.nvidia.com/nvidia/build-an-enterprise-rag-pipeline). It also introduces a few processors with a focus on improving the end-user experience for multimodal conversational agents, along with speculative speech processing to reduce latency for faster bot responses.
-The nvidia-pipecat source code can be found in [the GitHub repository](https://github.com/NVIDIA/ace-controller). Follow [the documentation](https://docs.nvidia.com/ace/ace-controller-microservice/latest/index.html) for more details.


2
3	The NVIDIA Pipecat library augments [the Pipecat framework](https://github.com/pipecat-ai/pipecat) by adding additional frame processors and services, as well as new multimodal frames to facilitate the creation of human-avatar interactions. This includes the integration of NVIDIA services and NIMs such as [NVIDIA Riva](https://docs.nvidia.com/deeplearning/riva/user-guide/docs/index.html), [NVIDIA Audio2Face](https://build.nvidia.com/nvidia/audio2face-3d), and [NVIDIA Foundational RAG](https://build.nvidia.com/nvidia/build-an-enterprise-rag-pipeline). It also introduces a few processors with a focus on improving the end-user experience for multimodal conversational agents, along with speculative speech processing to reduce latency for faster bot responses.
4
5	+ The nvidia-pipecat source code can be found in [the GitHub repository](https://github.com/NVIDIA/ace-controller). Follow [the documentation](https://docs.nvidia.com/ace/ace-controller-microservice/latest/index.html) for more details.

examples/voice_agent_multi_thread/DOCKER_DEPLOYMENT.md DELETED Viewed

@@ -1,322 +0,0 @@
-# Docker Deployment - Multi-Threaded Voice Agent
-## Overview
-This Docker container runs the complete multi-threaded telco voice agent stack:
-- **LangGraph Server** (`langgraph dev`) on port 2024
-- **Pipecat Pipeline** (FastAPI + WebRTC) on port 7860
-- **React UI** served at `http://localhost:7860`
-## Quick Start
-### Build the Image
-```bash
-# From project root
-docker build -t voice-agent-multi-thread .
-```
-### Run the Container
-```bash
-docker run -p 7860:7860 \
-  -e RIVA_API_KEY=your_nvidia_api_key \
-  -e NVIDIA_ASR_FUNCTION_ID=52b117d2-6c15-4cfa-a905-a67013bee409 \
-  -e NVIDIA_TTS_FUNCTION_ID=4e813649-d5e4-4020-b2be-2b918396d19d \
-  voice-agent-multi-thread
-```
-### Access the UI
-Open your browser to: **http://localhost:7860**
-## What Happens Inside the Container
-The `start.sh` script orchestrates two processes:
-### 1. LangGraph Server (Port 2024)
-```bash
-cd /app/examples/voice_agent_multi_thread/agents
-uv run langgraph dev --no-browser --host 0.0.0.0 --port 2024
-```
-This runs the multi-threaded telco agent with:
-- Main thread for long operations
-- Secondary thread for interim queries
-- Store-based coordination
-### 2. Pipecat Pipeline (Port 7860)
-```bash
-cd /app/examples/voice_agent_multi_thread
-uv run pipeline.py
-```
-This runs the voice pipeline with:
-- WebRTC transport
-- RIVA ASR (speech-to-text)
-- LangGraphLLMService (multi-threaded routing)
-- RIVA TTS (text-to-speech)
-- React UI
-## Environment Variables
-### Required
-```bash
-# NVIDIA API Key for RIVA services
-RIVA_API_KEY=nvapi-xxxxx
-```
-### Optional
-```bash
-# LangGraph Configuration
-LANGGRAPH_HOST=0.0.0.0
-LANGGRAPH_PORT=2024
-LANGGRAPH_ASSISTANT=telco-agent
-# User Configuration
-USER_EMAIL=user@example.com
-# ASR Configuration
-NVIDIA_ASR_FUNCTION_ID=52b117d2-6c15-4cfa-a905-a67013bee409
-RIVA_ASR_LANGUAGE=en-US
-RIVA_ASR_MODEL=parakeet-1.1b-en-US-asr-streaming-silero-vad-asr-bls-ensemble
-# TTS Configuration
-NVIDIA_TTS_FUNCTION_ID=4e813649-d5e4-4020-b2be-2b918396d19d
-RIVA_TTS_VOICE_ID=Magpie-ZeroShot.Female-1
-RIVA_TTS_MODEL=magpie_tts_ensemble-Magpie-ZeroShot
-RIVA_TTS_LANGUAGE=en-US
-# Zero-shot audio prompt (optional)
-ZERO_SHOT_AUDIO_PROMPT_URL=https://github.com/your-repo/audio-prompt.wav
-# Multi-threading (default: true)
-ENABLE_MULTI_THREADING=true
-# Debug
-LANGGRAPH_DEBUG_STREAM=false
-```
-## Docker Compose
-Create `docker-compose.yml`:
-```yaml
-version: '3.8'
-services:
-  voice-agent:
-    build: .
-    ports:
-      - "7860:7860"
-    environment:
-      - RIVA_API_KEY=${RIVA_API_KEY}
-      - NVIDIA_ASR_FUNCTION_ID=52b117d2-6c15-4cfa-a905-a67013bee409
-      - NVIDIA_TTS_FUNCTION_ID=4e813649-d5e4-4020-b2be-2b918396d19d
-      - USER_EMAIL=user@example.com
-      - LANGGRAPH_ASSISTANT=telco-agent
-      - ENABLE_MULTI_THREADING=true
-    volumes:
-      # Optional: mount .env file
-      - ./examples/voice_agent_multi_thread/.env:/app/examples/voice_agent_multi_thread/.env:ro
-      # Optional: persist audio recordings
-      - ./audio_dumps:/app/examples/voice_agent_multi_thread/audio_dumps
-    healthcheck:
-      test: ["CMD", "curl", "-f", "http://localhost:7860/get_prompt"]
-      interval: 30s
-      timeout: 10s
-      retries: 3
-      start_period: 60s
-```
-Run with:
-```bash
-docker-compose up
-```
-## Using .env File
-Create `.env` in `examples/voice_agent_multi_thread/`:
-```bash
-# NVIDIA API Keys
-RIVA_API_KEY=nvapi-xxxxx
-# LangGraph
-LANGGRAPH_ASSISTANT=telco-agent
-LANGGRAPH_BASE_URL=http://127.0.0.1:2024
-# User
-USER_EMAIL=test@example.com
-# ASR
-NVIDIA_ASR_FUNCTION_ID=52b117d2-6c15-4cfa-a905-a67013bee409
-# TTS
-NVIDIA_TTS_FUNCTION_ID=4e813649-d5e4-4020-b2be-2b918396d19d
-RIVA_TTS_VOICE_ID=Magpie-ZeroShot.Female-1
-```
-The `start.sh` script automatically loads this file.
-## Ports
-| Service | Internal Port | External Port | Purpose |
-|---------|---------------|---------------|---------|
-| LangGraph Server | 2024 | - | Agent runtime (internal only) |
-| Pipecat Pipeline | 7860 | 7860 | WebRTC + HTTP API |
-| React UI | - | 7860 | Served by pipeline |
-**Note**: Only port 7860 is exposed externally. LangGraph runs internally on 2024.
-## Healthcheck
-The container includes a healthcheck that verifies the pipeline is responding:
-```bash
-curl -f http://localhost:7860/get_prompt
-```
-Check health status:
-```bash
-docker ps
-# Look for "(healthy)" in STATUS column
-```
-## Logs
-View all logs:
-```bash
-docker logs -f <container-id>
-```
-You'll see both:
-- LangGraph server startup and agent logs
-- Pipeline startup and WebRTC connection logs
-## Testing Multi-Threading
-1. **Open UI**: http://localhost:7860
-2. **Select Agent**: Choose "Telco Agent"
-3. **Test Long Operation**:
-   - Say: *"Close my contract"*
-   - Confirm: *"Yes"*
-   - Operation starts (50 seconds)
-4. **Test Secondary Thread**:
-   - While waiting, say: *"What's the status?"*
-   - Agent responds with progress
-   - Say: *"How much data do I have left?"*
-   - Agent answers while main operation continues
-## Troubleshooting
-### Container won't start
-```bash
-# Check logs
-docker logs <container-id>
-# Common issues:
-# 1. Missing RIVA_API_KEY
-# 2. Port 7860 already in use
-# 3. Insufficient memory
-```
-### LangGraph not starting
-```bash
-# Check if agents directory exists
-docker exec <container-id> ls -la /app/examples/voice_agent_multi_thread/agents
-# Check langgraph.json
-docker exec <container-id> cat /app/examples/voice_agent_multi_thread/agents/langgraph.json
-```
-### Pipeline not responding
-```bash
-# Check pipeline logs
-docker logs <container-id> 2>&1 | grep pipeline
-# Check if port is accessible
-curl http://localhost:7860/get_prompt
-```
-### Multi-threading not working
-```bash
-# Verify env var
-docker exec <container-id> env | grep MULTI_THREADING
-# Check LangGraph server
-docker exec <container-id> curl http://localhost:2024/assistants
-```
-## Development Mode
-To develop inside the container:
-```bash
-# Run with shell
-docker run -it -p 7860:7860 \
-  -v $(pwd)/examples/voice_agent_multi_thread:/app/examples/voice_agent_multi_thread \
-  voice-agent-multi-thread /bin/bash
-# Inside container:
-cd /app/examples/voice_agent_multi_thread
-# Start services manually
-cd agents && uv run langgraph dev &
-cd .. && uv run pipeline.py
-```
-## Building for Production
-### Multi-stage optimization
-The Dockerfile uses a multi-stage build:
-1. **ui-builder**: Compiles React UI
-2. **python base**: Installs Python dependencies
-3. **Final image**: ~2GB (UI + Python + agents)
-### Reducing image size
-```dockerfile
-# Use slim Python base (already done)
-FROM python:3.12-slim
-# Clean up build artifacts (already done)
-RUN apt-get clean && rm -rf /var/lib/apt/lists/*
-# Use uv for faster installs (already done)
-RUN pip install uv
-```
-## Security Considerations
-1. **Non-root user**: Container runs as UID 1000
-2. **No secrets in image**: Use environment variables or mount secrets
-3. **Read-only filesystem**: UI dist is built at image time
-4. **Health checks**: Automatic restart on failure
-## Performance
-- **Startup time**: ~30-60 seconds
-- **Memory**: ~2GB recommended
-- **CPU**: 2 cores minimum
-- **Storage**: ~3GB for image + runtime
-## Related Files
-- `Dockerfile` - Container definition
-- `start.sh` - Startup orchestration
-- `agents/langgraph.json` - Agent configuration
-- `pipeline.py` - Pipecat pipeline
-- `langgraph_llm_service.py` - Multi-threaded LLM service
-## Support
-For issues:
-1. Check logs: `docker logs <container-id>`
-2. Verify environment variables
-3. Test components individually (LangGraph, Pipeline)
-4. Review `PIPECAT_MULTI_THREADING.md` for architecture details

examples/voice_agent_multi_thread/agents/requirements.txt CHANGED Viewed

@@ -11,5 +11,6 @@ docling
 pymongo
 yt_dlp
 requests
-protobuf==6.31.1
 twilio

 pymongo
 yt_dlp
 requests
+# protobuf==6.31.1
+protobuf
 twilio

pyproject.toml CHANGED Viewed

@@ -2,7 +2,7 @@
 name = "nvidia-pipecat"
 version = "0.2.0"
 description = "NVIDIA ACE Pipecat SDK"
-readme = "NVIDIA_PIPECAT.md"
 license = { file = "LICENSE" }
 authors = [
     { name = "NVIDIA ACE", email = "ace-dev@exchange.nvidia.com" }

 name = "nvidia-pipecat"
 version = "0.2.0"
 description = "NVIDIA ACE Pipecat SDK"
+readme = "README.md"
 license = { file = "LICENSE" }
 authors = [
     { name = "NVIDIA ACE", email = "ace-dev@exchange.nvidia.com" }