
NVIDIA Pipecat 0.1.0 (23 April 2025)

The NVIDIA Pipecat library extends the Pipecat framework with additional frame processors and services, as well as new multimodal frames that enhance avatar interactions. This is the first release of the NVIDIA Pipecat library.

New Features

  • Added Pipecat services for Riva ASR (Automatic Speech Recognition), Riva TTS (Text to Speech), and Riva NMT (Neural Machine Translation) models.
  • Added Pipecat frames, processors, and services to support multimodal avatar interactions and use cases. This includes Audio2Face3DService, AnimationGraphService, FacialGestureProviderProcessor, and PostureProviderProcessor.
  • Added ACETransport, which is specifically designed to support integration with existing ACE microservices. This includes a FastAPI-based HTTP and WebSocket server implementation compatible with ACE.
  • Added NvidiaLLMService for NIM LLM models and NvidiaRAGService for the NVIDIA RAG Blueprint.
  • Added the UserTranscriptSynchronization processor for synchronizing user speech transcripts and the BotTranscriptSynchronization processor for synchronizing bot transcripts with bot audio playback.
  • Added custom context aggregators and processors that enable Speculative Speech Processing, which reduces latency.
  • Added UserPresence, Proactivity, and AcknowledgementProcessor frame processors to improve human-bot interactions.
  • Released source code for the voice assistant example using nvidia-pipecat, along with the pipecat-ai library service, to showcase NVIDIA services with ACETransport.

Improvements

  • Added ElevenLabsTTSServiceWithEndOfSpeech, an extended version of the ElevenLabs TTS service that emits end-of-speech events for use in avatar interactions.