Deploy NVIDIA Riva ASR on Kubernetes GPU Cluster in Just 5 Minutes
Voice is the new keyboard. In a world of smart speakers, voice assistants, and an ever-increasing need for hands-free interaction, Automatic Speech Recognition (ASR) is no longer a futuristic concept but a present-day necessity. From transcribing meetings to powering in-car navigation, ASR is at the heart of the conversational AI revolution.
But how do you, a developer or an MLOps engineer, bring this powerful technology into your own applications? What if I told you that you could deploy a state-of-the-art ASR service on your Kubernetes cluster in the time it takes to brew a cup of coffee?
Welcome to your 5-minute guide to deploying NVIDIA Riva, a powerful SDK for building and deploying speech AI applications. We'll use a GPU-accelerated Kubernetes cluster to get a high-performance ASR service up and running. Ready? Let's dive in!
What is Helm and Why Use It?
Throughout this guide, we'll be using Helm, which is often described as the package manager for Kubernetes. Think of it like apt or yum, but for Kubernetes applications. Helm uses "charts" to define, install, and upgrade even the most complex Kubernetes applications, which makes deployments repeatable and manageable.
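If Helm is new to you, the day-to-day workflow boils down to a handful of commands. Here's a quick sketch (the chart and release names are illustrative, not part of this deployment):

# Inspect a chart's default configuration before installing
helm show values ./my-chart/

# Install the chart as a named release, overriding a single value
helm install my-release ./my-chart/ --set someKey=someValue

# Apply changes after editing values, or roll back to a previous revision
helm upgrade my-release ./my-chart/ -f values.yaml
helm rollback my-release 1

# Remove the release and the Kubernetes resources it created
helm uninstall my-release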
Step 1: Validate Your Kubernetes Cluster
NVIDIA Riva leverages GPUs to deliver high-performance, low-latency ASR, so your cluster must have NVIDIA GPU support enabled.
Run this command to create a test pod that checks for GPU availability:
kubectl apply -f - <<EOF
apiVersion: v1
kind: Pod
metadata:
  name: nvidia-smi
spec:
  restartPolicy: OnFailure
  containers:
  - name: nvidia-smi
    image: nvidia/cuda:12.0.0-base-ubuntu22.04
    args:
    - "nvidia-smi"
    resources:
      limits:
        nvidia.com/gpu: 1
EOF
Then check the output with:
kubectl logs nvidia-smi
If you see your GPU details, you’re all set to run Riva!
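Once you've confirmed GPU access, delete the test pod so it doesn't linger in your cluster:

kubectl delete pod nvidia-smi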
Step 2: Grab the Riva Helm Chart
Before deploying, you must obtain an API key from NVIDIA's NGC catalog. Generate one here: NGC API Key Setup. Export your API key:
export NGC_API_KEY=<your_api_key>
Fetch the Riva API Helm chart:
helm fetch https://helm.ngc.nvidia.com/nvidia/riva/charts/riva-api-2.19.0.tgz \
  --username='$oauthtoken' --password=$NGC_API_KEY --untar
This creates a riva-api directory containing the chart's deployment files.
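If you're curious what Helm just pulled down, take a look inside; you should see the standard Helm chart layout:

ls riva-api/
# Expect Chart.yaml, values.yaml, and a templates/ directory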
Step 3: Customize Your Deployment
Open the values.yaml file inside the riva-api directory:
cd riva-api
nano values.yaml
Here you can enable or disable the various ASR, speaker diarization, NMT, and TTS models. For example, uncomment the relevant lines to enable:

Streaming ASR model (best throughput): nvidia/riva/rmir_asr_parakeet_0-6b_en_us_str_thr:2.19.0

Offline ASR model: nvidia/riva/rmir_asr_conformer_en_us_ofl:2.19.0
Enable only the models you need to optimize resource usage.
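Since the model entries shown above all share the rmir prefix, a quick grep from inside the riva-api directory lists everything the chart can deploy (assuming that naming convention holds across the file):

grep rmir values.yaml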
Step 4: Deploy Riva!
From the directory containing riva-api/ (cd .. if you're still inside it), run the Helm install command:
helm install riva-api riva-api/ \
--set ngcCredentials.password=$(echo -n $NGC_API_KEY | base64 -w0) \
--set modelRepoGenerator.modelDeployKey=$(echo -n tlt_encode | base64 -w0) \
-f path/to/values.yaml
To deploy into a specific namespace, add -n <namespace>.
Riva will now deploy the API and Triton Inference Server pods, downloading and optimizing models automatically.
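To confirm Helm registered the release, and to check its status at any point:

helm status riva-api
helm list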
Step 5: Check Pod Status
Check if pods are running:
kubectl get pods
Wait for pods like riva-api-riva-api-... and riva-api-riva-triton-... to reach the Running state.
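The first deployment can take a while because models are downloaded from NGC and optimized for your GPU. If a pod seems stuck in Init or Pending, inspect it (substitute the pod name reported by kubectl get pods):

kubectl describe pod <pod-name>
kubectl logs <pod-name> --all-containers   # logs from every container in the pod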
Step 6: Access the Riva API
Port-forward the Riva service so it's reachable from your machine:
kubectl port-forward service/riva-api 50051:50051 --address=0.0.0.0
The service is now accessible at localhost:50051.
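As a quick sanity check that the gRPC endpoint is reachable, you can try grpcurl (assuming you have it installed and the server exposes gRPC reflection; if reflection is disabled, this particular command will fail even when the service is healthy):

grpcurl -plaintext localhost:50051 list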
Bonus: Test with Streamlit UI
With the port-forward active, any Riva client can talk to localhost:50051; a lightweight Streamlit front end that streams microphone audio to the ASR endpoint makes for a quick interactive test.
✨ If you’ve made it this far and found this guide useful, share it with someone who might need it. Let’s keep building smarter voice-powered systems together.
🤝 I’d love to connect and exchange ideas.