---
title: Voice Agent WebRTC + LangGraph
emoji: 🎙️
colorFrom: blue
colorTo: purple
sdk: docker
app_port: 7860
pinned: false
suggested_hardware: t4-small
short_description: Voice agent with LangGraph, WebRTC, ASR & TTS
---
# Voice Agent WebRTC + LangGraph (Quick Start)
This repository includes a complete voice agent stack:

- LangGraph dev server for local agents
- Pipecat-based speech pipeline (WebRTC, ASR, LangGraph LLM adapter, TTS)
- Static UI you can open in a browser

Primary example: `examples/voice_agent_webrtc_langgraph/`
## 1) Mandatory environment variables
Create `.env` in `examples/voice_agent_webrtc_langgraph/` (copy from `env.example`) and set at least:

- `RIVA_API_KEY` or `NVIDIA_API_KEY`: required for NVIDIA NIM-hosted Riva ASR/TTS
- `LANGGRAPH_BASE_URL` (default `http://127.0.0.1:2024`)
- `LANGGRAPH_ASSISTANT` (default `ace-base-agent`)
- `USER_EMAIL` (e.g. `test@example.com`)
- `LANGGRAPH_STREAM_MODE` (default `values`)
- `LANGGRAPH_DEBUG_STREAM` (default `true`)
Optional but useful:

- `RIVA_ASR_LANGUAGE` (default `en-US`)
- `RIVA_TTS_LANGUAGE` (default `en-US`)
- `RIVA_TTS_VOICE_ID` (e.g. `Magpie-ZeroShot.Female-1`)
- `RIVA_TTS_MODEL` (e.g. `magpie_tts_ensemble-Magpie-ZeroShot`)
- `ZERO_SHOT_AUDIO_PROMPT` if using Magpie Zero-shot with a custom audio prompt
- `ZERO_SHOT_AUDIO_PROMPT_URL` to auto-download the prompt on startup
- `ENABLE_SPECULATIVE_SPEECH` (default `true`)
- `LANGGRAPH_AUTH_TOKEN` (or `AUTH0_ACCESS_TOKEN`/`AUTH_BEARER_TOKEN`) if your LangGraph server requires auth
- TURN/Twilio for WebRTC if needed: `TWILIO_ACCOUNT_SID`, `TWILIO_AUTH_TOKEN`, or `TURN_SERVER_URL`, `TURN_USERNAME`, `TURN_PASSWORD`
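Putting the mandatory variables together, a minimal `.env` might look like the following sketch (all values are placeholders; the defaults match those listed above):

```bash
# examples/voice_agent_webrtc_langgraph/.env -- placeholder values
RIVA_API_KEY=nvapi-xxxxxxxxxxxxxxxx
LANGGRAPH_BASE_URL=http://127.0.0.1:2024
LANGGRAPH_ASSISTANT=ace-base-agent
USER_EMAIL=test@example.com
LANGGRAPH_STREAM_MODE=values
LANGGRAPH_DEBUG_STREAM=true
```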
## 2) What it does
- Starts the LangGraph dev server serving agents from `examples/voice_agent_webrtc_langgraph/agents/`.
- Starts the Pipecat pipeline (`pipeline.py`) exposing:
  - HTTP: `http://<host>:7860` (health, RTC config)
  - WebSocket: `ws://<host>:7860/ws` (audio + transcripts)
  - Static UI: `http://<host>:7860/` (served by FastAPI)
Defaults:

- ASR: NVIDIA Riva (NIM) via `RIVA_API_KEY` and the built-in `NVIDIA_ASR_FUNCTION_ID`
- LLM: LangGraph adapter, streaming from the selected assistant
- TTS: NVIDIA Riva Magpie (NIM) via `RIVA_API_KEY` and the built-in `NVIDIA_TTS_FUNCTION_ID`
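The LangGraph-related fallbacks can be summarized in a small helper. This is an illustrative sketch only (the `langgraph_settings` function is ours, not part of the example; the real pipeline reads these variables itself):

```python
import os


def langgraph_settings(env=None):
    """Hypothetical helper showing which values apply when a variable is unset."""
    env = os.environ if env is None else env
    return {
        "base_url": env.get("LANGGRAPH_BASE_URL", "http://127.0.0.1:2024"),
        "assistant": env.get("LANGGRAPH_ASSISTANT", "ace-base-agent"),
        "stream_mode": env.get("LANGGRAPH_STREAM_MODE", "values"),
        # Documented default is "true", i.e. debug streaming is on unless disabled
        "debug_stream": env.get("LANGGRAPH_DEBUG_STREAM", "true").lower() == "true",
    }
```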
## 3) Run
### Option A: Docker (recommended)

From `examples/voice_agent_webrtc_langgraph/`:

```bash
docker compose up --build -d
```

Then open `http://<machine-ip>:7860/`.
Chrome on http origins: enable "Insecure origins treated as secure" at `chrome://flags/` and add `http://<machine-ip>:7860`.
### Building for Different Examples
The Dockerfile in the repository root is generalized to work with any example. Use the `EXAMPLE_NAME` build argument to specify which example to use:
For `voice_agent_webrtc_langgraph` (default):

```bash
docker build --build-arg EXAMPLE_NAME=voice_agent_webrtc_langgraph -t my-voice-agent .
docker run -p 7860:7860 --env-file examples/voice_agent_webrtc_langgraph/.env my-voice-agent
```
For `voice_agent_multi_thread`:

```bash
docker build --build-arg EXAMPLE_NAME=voice_agent_multi_thread -t my-voice-agent .
docker run -p 7860:7860 --env-file examples/voice_agent_multi_thread/.env my-voice-agent
```
The Dockerfile will automatically:
- Build the UI for the specified example
- Copy only the files for that example
- Set up the correct working directory
- Configure the start script to run the correct example
Note: The UI is served on the same port as the API (7860). The FastAPI app serves both the WebSocket/HTTP endpoints and the static UI files.
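For orientation, the `EXAMPLE_NAME` build-arg pattern described above typically looks like the following. This is an illustrative sketch only, not the repository's actual Dockerfile; the base image, paths, and start command are assumptions:

```dockerfile
# Illustrative sketch of the EXAMPLE_NAME build-arg pattern (not the real Dockerfile)
FROM python:3.12-slim

# Select which example to bake into the image at build time
ARG EXAMPLE_NAME=voice_agent_webrtc_langgraph

WORKDIR /app
# Copy only the chosen example (directory layout assumed)
COPY examples/${EXAMPLE_NAME}/ /app/

EXPOSE 7860
CMD ["python", "pipeline.py"]
```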
### Option B: Python (local)

Requires Python 3.12 and `uv`.

```bash
cd examples/voice_agent_webrtc_langgraph
uv run pipeline.py
```
Then start the UI from `ui/` (see `examples/voice_agent_webrtc_langgraph/ui/README.md`).
## 4) Swap TTS providers (Magpie ⇄ ElevenLabs)
The default TTS in `examples/voice_agent_webrtc_langgraph/pipeline.py` is NVIDIA Riva Magpie via NIM:
```python
import os
from pathlib import Path

from nvidia_pipecat.services.riva_speech import RivaTTSService

tts = RivaTTSService(
    api_key=os.getenv("RIVA_API_KEY"),
    function_id=os.getenv("NVIDIA_TTS_FUNCTION_ID", "4e813649-d5e4-4020-b2be-2b918396d19d"),
    voice_id=os.getenv("RIVA_TTS_VOICE_ID", "Magpie-ZeroShot.Female-1"),
    model=os.getenv("RIVA_TTS_MODEL", "magpie_tts_ensemble-Magpie-ZeroShot"),
    language=os.getenv("RIVA_TTS_LANGUAGE", "en-US"),
    zero_shot_audio_prompt_file=(
        Path(os.getenv("ZERO_SHOT_AUDIO_PROMPT")) if os.getenv("ZERO_SHOT_AUDIO_PROMPT") else None
    ),
)
```
To use ElevenLabs instead:

- Ensure ElevenLabs support is available (included via project deps).
- Set `ELEVENLABS_API_KEY`; optionally set `ELEVENLABS_VOICE_ID` and any model-specific settings.
- Edit `examples/voice_agent_webrtc_langgraph/pipeline.py` to import and construct the ElevenLabs TTS service:
```python
from nvidia_pipecat.services.elevenlabs import ElevenLabsTTSServiceWithEndOfSpeech

# Replace the RivaTTSService(...) block with:
tts = ElevenLabsTTSServiceWithEndOfSpeech(
    api_key=os.getenv("ELEVENLABS_API_KEY"),
    voice_id=os.getenv("ELEVENLABS_VOICE_ID", "Rachel"),
    sample_rate=16000,
    channels=1,
)
```
No other pipeline changes are required; transcript synchronization supports ElevenLabs end‑of‑speech events.
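If you swap providers often, one option is to branch on an environment variable instead of editing the file each time. A minimal sketch of the selection logic follows; `select_tts_provider` and the `TTS_PROVIDER` variable are assumptions, not part of the example (the shipped `pipeline.py` hard-codes its TTS service). You would then construct `RivaTTSService` or `ElevenLabsTTSServiceWithEndOfSpeech` based on the returned value:

```python
import os


def select_tts_provider(env=None):
    """Hypothetical helper: decide which TTS backend to construct.

    TTS_PROVIDER is an assumed variable name, not one the example reads.
    """
    env = os.environ if env is None else env
    provider = env.get("TTS_PROVIDER")
    if provider:
        return provider.lower()
    # With no explicit choice, fall back on key presence
    return "elevenlabs" if env.get("ELEVENLABS_API_KEY") else "riva"
```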
Notes for Magpie Zero-shot:

- Set `RIVA_TTS_VOICE_ID` like `Magpie-ZeroShot.Female-1` and `RIVA_TTS_MODEL` like `magpie_tts_ensemble-Magpie-ZeroShot`.
- If using a custom voice prompt, mount it via `docker-compose.yml` and set `ZERO_SHOT_AUDIO_PROMPT`, or set `ZERO_SHOT_AUDIO_PROMPT_URL` to auto-download on startup.
## 5) Troubleshooting
- Healthcheck: `curl -f http://localhost:7860/get_prompt`
- If the UI can't access the mic on http, use the Chrome flag above or host the UI via HTTPS.
- For NAT/firewall issues, configure TURN or provide Twilio credentials.
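Because the container can take a moment to come up, it may help to poll the healthcheck instead of checking once. A small sketch (the `wait_until_healthy` helper and its injectable `fetch` parameter are ours, not part of the example):

```python
import time
import urllib.request


def wait_until_healthy(url, attempts=5, delay=1.0, fetch=None):
    """Poll a health endpoint until it returns HTTP 200 or attempts run out.

    `fetch` is injectable so the retry logic can be tested without a live
    server; by default it performs a real HTTP request.
    """
    if fetch is None:
        def fetch(u):
            with urllib.request.urlopen(u, timeout=2) as resp:
                return resp.status
    for attempt in range(attempts):
        try:
            if fetch(url) == 200:
                return True
        except OSError:  # urllib.error.URLError subclasses OSError
            pass
        if attempt < attempts - 1:
            time.sleep(delay)
    return False


# Example: wait_until_healthy("http://localhost:7860/get_prompt")
```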
## 6) Multi-threaded Voice Agent (voice_agent_multi_thread)
The `voice_agent_multi_thread` example includes a non-blocking, multi-threaded agent implementation that lets users continue conversing while long-running operations execute in the background.
Build the Docker image:

```bash
docker build --build-arg EXAMPLE_NAME=voice_agent_multi_thread -t voice-agent-multi-thread .
```
Run the container:

```bash
docker run -d --name voice-agent-multi-thread \
  -p 2024:2024 \
  -p 7862:7860 \
  --env-file examples/voice_agent_multi_thread/.env \
  voice-agent-multi-thread
```
Then access:

- LangGraph API: `http://localhost:2024`
- Web UI: `http://localhost:7862`
- Pipeline WebSocket: `ws://localhost:7862/ws`
The multi-threaded agent is enabled automatically for `telco-agent` and `wire-transfer-agent`, allowing a secondary thread to handle status checks and interim conversation while the main thread processes long-running tools.
Stop and remove the container:

```bash
docker stop voice-agent-multi-thread && docker rm voice-agent-multi-thread
```
## 7) Manual Docker Commands (voice_agent_webrtc_langgraph)
If you prefer manual Docker commands instead of docker-compose:
```bash
docker build -t ace-voice-webrtc:latest \
  -f examples/voice_agent_webrtc_langgraph/Dockerfile \
  .

docker run --name ace-voice-webrtc -d \
  -p 7860:7860 \
  -p 2024:2024 \
  --env-file examples/voice_agent_webrtc_langgraph/.env \
  -e LANGGRAPH_ASSISTANT=healthcare-agent \
  ace-voice-webrtc:latest
```