# Voice Agent WebRTC + LangGraph (Quick Start)
This example launches a complete voice agent stack:
- LangGraph dev server for local agents
- Pipecat-based speech pipeline (WebRTC, ASR, LLM adapter, TTS)
- Static UI you can open in a browser
## 1) Mandatory environment variables
Create `.env` next to this README (or copy from `env.example`) and set at least:
- `NVIDIA_API_KEY` or `RIVA_API_KEY`: required for NVIDIA NIM-hosted Riva ASR/TTS
- `USE_LANGGRAPH=true`: enable LangGraph-backed LLM
- `LANGGRAPH_BASE_URL` (default `http://127.0.0.1:2024`)
- `LANGGRAPH_ASSISTANT` (default `ace-base-agent`)
- `USER_EMAIL` (any email for routing, e.g. `test@example.com`)
- `LANGGRAPH_STREAM_MODE` (default `values`)
- `LANGGRAPH_DEBUG_STREAM` (default `true`)
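Before starting the stack, you can check these variables programmatically. The sketch below is not part of this example: the function name `load_langgraph_settings` and the fail-fast behavior are assumptions. It accepts any mapping, so it can be exercised without touching the real environment, and it applies the defaults listed above.

```python
import os
from typing import Mapping

# Defaults as documented in the list above.
DEFAULTS = {
    "LANGGRAPH_BASE_URL": "http://127.0.0.1:2024",
    "LANGGRAPH_ASSISTANT": "ace-base-agent",
    "LANGGRAPH_STREAM_MODE": "values",
    "LANGGRAPH_DEBUG_STREAM": "true",
}


def load_langgraph_settings(env: Mapping[str, str] = os.environ) -> dict:
    """Hypothetical pre-flight check: validate required keys and fill in defaults."""
    # Either key is accepted for the NIM-hosted Riva ASR/TTS services.
    api_key = env.get("NVIDIA_API_KEY") or env.get("RIVA_API_KEY")
    if not api_key:
        raise RuntimeError("Set NVIDIA_API_KEY or RIVA_API_KEY")
    if env.get("USE_LANGGRAPH", "").lower() != "true":
        raise RuntimeError("Set USE_LANGGRAPH=true to enable the LangGraph-backed LLM")
    if not env.get("USER_EMAIL"):
        raise RuntimeError("Set USER_EMAIL (any address, e.g. test@example.com)")
    # Documented defaults apply only when a variable is unset.
    settings = {key: env.get(key, default) for key, default in DEFAULTS.items()}
    settings["api_key"] = api_key
    settings["user_email"] = env["USER_EMAIL"]
    return settings
```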
Optional but commonly used:
- `RIVA_ASR_LANGUAGE` (default `en-US`)
- `RIVA_TTS_LANGUAGE` (default `en-US`)
- `RIVA_TTS_VOICE_ID` (e.g. `Magpie-ZeroShot.Female-1`)
- `RIVA_TTS_MODEL` (e.g. `magpie_tts_ensemble-Magpie-ZeroShot`)
- `ZERO_SHOT_AUDIO_PROMPT`: path to a custom voice prompt audio file when using Magpie Zero‑shot
- `ZERO_SHOT_AUDIO_PROMPT_URL`: URL from which to auto-download the prompt on startup
- `ENABLE_SPECULATIVE_SPEECH` (default `true`)
- TURN credentials for WebRTC clients behind NAT or firewalls, if needed: either Twilio (`TWILIO_ACCOUNT_SID`, `TWILIO_AUTH_TOKEN`) or your own TURN server (`TURN_SERVER_URL`, `TURN_USERNAME`, `TURN_PASSWORD`)
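The TURN options are alternatives: Twilio credentials or your own TURN server. As a minimal sketch, assuming the TURN variables map onto the standard WebRTC `RTCIceServer` fields (`urls`, `username`, `credential`), the hypothetical helpers `turn_ice_servers` and `uses_twilio` below show how the two cases might be told apart. How `pipeline.py` actually consumes these variables is not shown here.

```python
import os
from typing import Mapping


def turn_ice_servers(env: Mapping[str, str] = os.environ) -> list[dict]:
    """Hypothetical helper: build RTCIceServer-style entries from the TURN_* variables."""
    url = env.get("TURN_SERVER_URL")
    if not url:
        # No custom TURN server configured; Twilio (if set) must supply relays instead.
        return []
    return [
        {
            "urls": [url],
            "username": env.get("TURN_USERNAME", ""),
            "credential": env.get("TURN_PASSWORD", ""),
        }
    ]


def uses_twilio(env: Mapping[str, str] = os.environ) -> bool:
    """True only when both Twilio values are present."""
    return bool(env.get("TWILIO_ACCOUNT_SID") and env.get("TWILIO_AUTH_TOKEN"))
```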
## 2) What it does
- Starts LangGraph dev server to serve local agents from `agents/`.
- Starts the Pipecat pipeline (`pipeline.py`) exposing:
  - HTTP: `http://<host>:7860` (health and RTC config)
  - WebSocket: `ws://<host>:7860/ws` for audio and transcripts
- Serves the built UI at `http://<host>:9000/` (via the container).
By default it uses:
- ASR: NVIDIA Riva (NIM) with `RIVA_API_KEY` and `NVIDIA_ASR_FUNCTION_ID`
- LLM: LangGraph adapter streaming from the selected assistant
- TTS: NVIDIA Riva Magpie (NIM) with `RIVA_API_KEY` and `NVIDIA_TTS_FUNCTION_ID`
## 3) Run
### Option A: Docker (recommended)
From this directory:
```bash
docker compose up --build -d
```
Then open `http://<machine-ip>:9000/`.
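Chrome only grants microphone access on secure origins (HTTPS or `localhost`). To use the UI over plain HTTP from another machine, open `chrome://flags/`, enable “Insecure origins treated as secure”, and add `http://<machine-ip>:9000`.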
### Option B: Python (local)
Requires Python 3.12 and `uv`.
```bash
uv run pipeline.py
```
Then start the UI from `ui/` (see `ui/README.md`).
## 4) Swap TTS providers (Magpie ⇄ ElevenLabs)
The default TTS in `pipeline.py` is NVIDIA Riva Magpie via NIM:
```python
import os
from pathlib import Path

# Excerpt from examples/voice_agent_webrtc_langgraph/pipeline.py;
# RivaTTSService is imported at the top of that file.
tts = RivaTTSService(
    api_key=os.getenv("RIVA_API_KEY"),
    function_id=os.getenv("NVIDIA_TTS_FUNCTION_ID", "4e813649-d5e4-4020-b2be-2b918396d19d"),
    voice_id=os.getenv("RIVA_TTS_VOICE_ID", "Magpie-ZeroShot.Female-1"),
    model=os.getenv("RIVA_TTS_MODEL", "magpie_tts_ensemble-Magpie-ZeroShot"),
    language=os.getenv("RIVA_TTS_LANGUAGE", "en-US"),
    zero_shot_audio_prompt_file=(
        Path(os.getenv("ZERO_SHOT_AUDIO_PROMPT")) if os.getenv("ZERO_SHOT_AUDIO_PROMPT") else None
    ),
)
```
To use ElevenLabs instead:
1) Ensure the `pipecat` ElevenLabs dependency is available (it is already included in the project dependencies).
2) Set environment:
- `ELEVENLABS_API_KEY`
- Optionally `ELEVENLABS_VOICE_ID` and model settings supported by ElevenLabs
3) Change the TTS construction in `pipeline.py` to use `ElevenLabsTTSServiceWithEndOfSpeech`:
```python
import os

from nvidia_pipecat.services.elevenlabs import ElevenLabsTTSServiceWithEndOfSpeech

# Replace RivaTTSService(...) with:
tts = ElevenLabsTTSServiceWithEndOfSpeech(
    api_key=os.getenv("ELEVENLABS_API_KEY"),
    # ElevenLabs expects a voice ID rather than a voice name;
    # "21m00Tcm4TlvDq8ikWAM" is the ID of the premade "Rachel" voice.
    voice_id=os.getenv("ELEVENLABS_VOICE_ID", "21m00Tcm4TlvDq8ikWAM"),
    sample_rate=16000,
    channels=1,
)
```
That’s it. No other pipeline changes are required. The transcript synchronization already supports ElevenLabs end‑of‑speech events.
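If you switch providers often, one option is to choose the TTS backend from the environment instead of editing code. The sketch below assumes a hypothetical `TTS_PROVIDER` variable that this example does not define. It only builds the keyword arguments for `RivaTTSService` or `ElevenLabsTTSServiceWithEndOfSpeech`, mirroring the snippets above, so it runs without either service installed.

```python
import os
from pathlib import Path
from typing import Any, Mapping


def tts_config_from_env(env: Mapping[str, str] = os.environ) -> tuple[str, dict[str, Any]]:
    """Return (provider, constructor kwargs) from a hypothetical TTS_PROVIDER variable."""
    provider = env.get("TTS_PROVIDER", "riva").lower()
    if provider == "elevenlabs":
        # kwargs for ElevenLabsTTSServiceWithEndOfSpeech; the default voice ID is
        # assumed to be the premade "Rachel" voice.
        return provider, {
            "api_key": env.get("ELEVENLABS_API_KEY"),
            "voice_id": env.get("ELEVENLABS_VOICE_ID", "21m00Tcm4TlvDq8ikWAM"),
            "sample_rate": 16000,
            "channels": 1,
        }
    if provider == "riva":
        # kwargs for RivaTTSService, using the same defaults as pipeline.py.
        prompt = env.get("ZERO_SHOT_AUDIO_PROMPT")
        return provider, {
            "api_key": env.get("RIVA_API_KEY"),
            "function_id": env.get("NVIDIA_TTS_FUNCTION_ID", "4e813649-d5e4-4020-b2be-2b918396d19d"),
            "voice_id": env.get("RIVA_TTS_VOICE_ID", "Magpie-ZeroShot.Female-1"),
            "model": env.get("RIVA_TTS_MODEL", "magpie_tts_ensemble-Magpie-ZeroShot"),
            "language": env.get("RIVA_TTS_LANGUAGE", "en-US"),
            "zero_shot_audio_prompt_file": Path(prompt) if prompt else None,
        }
    raise ValueError(f"Unknown TTS_PROVIDER: {provider!r}")
```

In `pipeline.py` you would then call `RivaTTSService(**kwargs)` or `ElevenLabsTTSServiceWithEndOfSpeech(**kwargs)` depending on the returned provider name; the rest of the pipeline stays unchanged.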
Notes for Magpie Zero‑shot:
- Provide `RIVA_TTS_VOICE_ID` like `Magpie-ZeroShot.Female-1` and `RIVA_TTS_MODEL` like `magpie_tts_ensemble-Magpie-ZeroShot`.
- If using a custom voice prompt, mount it via `docker-compose.yml` and set `ZERO_SHOT_AUDIO_PROMPT`. You can also set `ZERO_SHOT_AUDIO_PROMPT_URL` to auto-download at startup.
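The auto-download behavior of `ZERO_SHOT_AUDIO_PROMPT_URL` could look roughly like the sketch below. This is an assumption about the startup logic, not the example's code: it supposes the file is saved to the `ZERO_SHOT_AUDIO_PROMPT` path and is only fetched when that file is missing. `ensure_zero_shot_prompt` is a hypothetical name.

```python
import os
import urllib.request
from pathlib import Path
from typing import Mapping, Optional


def ensure_zero_shot_prompt(env: Mapping[str, str] = os.environ) -> Optional[Path]:
    """Hypothetical startup step: fetch the prompt URL if the local file is missing."""
    local = env.get("ZERO_SHOT_AUDIO_PROMPT")
    url = env.get("ZERO_SHOT_AUDIO_PROMPT_URL")
    if not local:
        return None  # No custom voice prompt configured.
    path = Path(local)
    if not path.exists() and url:
        path.parent.mkdir(parents=True, exist_ok=True)
        # Download once; later restarts reuse the file already on disk.
        urllib.request.urlretrieve(url, path)
    return path if path.exists() else None
```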
## 5) Troubleshooting
- Healthcheck: `curl -f http://localhost:7860/get_prompt`
- If the UI can’t access the microphone over HTTP, use the Chrome flag described in section 3 or serve the UI over HTTPS.
- For NAT/firewall issues, configure TURN or Twilio credentials.
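For scripted checks (for example, waiting for the container before opening the UI), the healthcheck above can be expressed in Python with only the standard library. This is a sketch; `pipeline_is_healthy` is a hypothetical helper that, like `curl -f`, treats HTTP error statuses as failures.

```python
import urllib.request


def pipeline_is_healthy(base_url: str = "http://localhost:7860", timeout: float = 3.0) -> bool:
    """Python counterpart of `curl -f <base_url>/get_prompt`: True on a 2xx response."""
    try:
        with urllib.request.urlopen(f"{base_url}/get_prompt", timeout=timeout) as resp:
            return 200 <= resp.status < 300
    except OSError:
        # Covers connection refused, timeouts, and HTTP error statuses
        # (HTTPError and URLError are both OSError subclasses).
        return False
```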