# Voice Agent WebRTC + LangGraph (Quick Start)

This example launches a complete voice agent stack:

- LangGraph dev server for local agents
- Pipecat-based speech pipeline (WebRTC, ASR, LLM adapter, TTS)
- Static UI you can open in a browser

## 1) Mandatory environment variables

Create `.env` next to this README (or copy from `env.example`) and set at least:

- `NVIDIA_API_KEY` or `RIVA_API_KEY`: required for NVIDIA NIM-hosted Riva ASR/TTS
- `USE_LANGGRAPH=true`: enables the LangGraph-backed LLM
- `LANGGRAPH_BASE_URL` (default `http://127.0.0.1:2024`)
- `LANGGRAPH_ASSISTANT` (default `ace-base-agent`)
- `USER_EMAIL` (any email used for routing, e.g. `test@example.com`)
- `LANGGRAPH_STREAM_MODE` (default `values`)
- `LANGGRAPH_DEBUG_STREAM` (default `true`)

Optional but commonly used:

- `RIVA_ASR_LANGUAGE` (default `en-US`)
- `RIVA_TTS_LANGUAGE` (default `en-US`)
- `RIVA_TTS_VOICE_ID` (e.g. `Magpie-ZeroShot.Female-1`)
- `RIVA_TTS_MODEL` (e.g. `magpie_tts_ensemble-Magpie-ZeroShot`)
- `ZERO_SHOT_AUDIO_PROMPT` if using Magpie Zero‑shot with a custom voice prompt
- `ZERO_SHOT_AUDIO_PROMPT_URL` to auto-download the prompt on startup
- `ENABLE_SPECULATIVE_SPEECH` (default `true`)
- TURN/Twilio credentials for WebRTC if needed: `TWILIO_ACCOUNT_SID`, `TWILIO_AUTH_TOKEN`, or `TURN_SERVER_URL`, `TURN_USERNAME`, `TURN_PASSWORD`

## 2) What it does

- Starts the LangGraph dev server, serving local agents from `agents/`.
- Starts the Pipecat pipeline (`pipeline.py`), exposing:
  - HTTP: `http://<machine-ip>:7860` (health and RTC config)
  - WebSocket: `ws://<machine-ip>:7860/ws` for audio and transcripts
- Serves the built UI at `http://<machine-ip>:9000/` (via the container).

By default it uses:

- ASR: NVIDIA Riva (NIM) with `RIVA_API_KEY` and `NVIDIA_ASR_FUNCTION_ID`
- LLM: LangGraph adapter streaming from the selected assistant
- TTS: NVIDIA Riva Magpie (NIM) with `RIVA_API_KEY` and `NVIDIA_TTS_FUNCTION_ID`

## 3) Run

### Option A: Docker (recommended)

From this directory:

```bash
docker compose up --build -d
```

Then open `http://<machine-ip>:9000/`. For Chrome on plain-http origins, enable “Insecure origins treated as secure” at `chrome://flags/` and add `http://<machine-ip>:9000`.

### Option B: Python (local)

Requires Python 3.12 and `uv`.

```bash
uv run pipeline.py
```

Then start the UI from `ui/` (see `ui/README.md`).

## 4) Swap TTS providers (Magpie ⇄ ElevenLabs)

The default TTS in `pipeline.py` is NVIDIA Riva Magpie via NIM:

```python
# Defaults target the Magpie Zero-shot NIM; every value can be overridden via environment variables.
tts = RivaTTSService(
    api_key=os.getenv("RIVA_API_KEY"),
    function_id=os.getenv("NVIDIA_TTS_FUNCTION_ID", "4e813649-d5e4-4020-b2be-2b918396d19d"),
    voice_id=os.getenv("RIVA_TTS_VOICE_ID", "Magpie-ZeroShot.Female-1"),
    model=os.getenv("RIVA_TTS_MODEL", "magpie_tts_ensemble-Magpie-ZeroShot"),
    language=os.getenv("RIVA_TTS_LANGUAGE", "en-US"),
    zero_shot_audio_prompt_file=(
        Path(os.getenv("ZERO_SHOT_AUDIO_PROMPT")) if os.getenv("ZERO_SHOT_AUDIO_PROMPT") else None
    ),
)
```

To use ElevenLabs instead:

1) Ensure the `pipecat` ElevenLabs dependency is available (already included via the project dependencies).
2) Set the environment (a sample `.env` fragment is shown below):
   - `ELEVENLABS_API_KEY`
   - Optionally `ELEVENLABS_VOICE_ID` and any model settings supported by ElevenLabs
3) Change the TTS construction in `pipeline.py` to use `ElevenLabsTTSServiceWithEndOfSpeech`:

```python
from nvidia_pipecat.services.elevenlabs import ElevenLabsTTSServiceWithEndOfSpeech

# Replace RivaTTSService(...) with:
tts = ElevenLabsTTSServiceWithEndOfSpeech(
    api_key=os.getenv("ELEVENLABS_API_KEY"),
    voice_id=os.getenv("ELEVENLABS_VOICE_ID", "Rachel"),
    sample_rate=16000,
    channels=1,
)
```
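For reference, the matching environment settings could look like the sketch below. The API key value is a placeholder, and the voice ID simply mirrors the `Rachel` fallback used in the snippet above; substitute any voice available on your ElevenLabs account.

```bash
# Illustrative .env additions for the ElevenLabs path (the key value is a placeholder)
ELEVENLABS_API_KEY=sk_xxxxxxxxxxxxxxxx
# Optional: any voice available to your account; the code above falls back to "Rachel"
ELEVENLABS_VOICE_ID=Rachel
```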
That’s it. No other pipeline changes are required; the transcript synchronization already supports ElevenLabs end‑of‑speech events.

Notes for Magpie Zero‑shot:

- Provide a `RIVA_TTS_VOICE_ID` like `Magpie-ZeroShot.Female-1` and a `RIVA_TTS_MODEL` like `magpie_tts_ensemble-Magpie-ZeroShot`.
- If using a custom voice prompt, mount it via `docker-compose.yml` and set `ZERO_SHOT_AUDIO_PROMPT`. You can also set `ZERO_SHOT_AUDIO_PROMPT_URL` to auto-download the prompt at startup.

## 5) Troubleshooting

- Healthcheck: `curl -f http://localhost:7860/get_prompt`
- If the UI can’t access the microphone over http, use the Chrome flag above or host the UI via HTTPS.
- For NAT/firewall issues, configure TURN or Twilio credentials.
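If something does not respond, a quick triage pass from this directory might look like the sketch below. It assumes the Docker option with the default ports; adjust hosts and ports if you changed them.

```bash
# Hit the pipeline's health endpoint (same URL as the healthcheck above)
curl -f http://localhost:7860/get_prompt

# Confirm the UI container is serving the static build
curl -I http://localhost:9000/

# Tail logs from all services to spot ASR/TTS auth errors or LangGraph connection failures
docker compose logs -f
```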