---
title: Voice-Activated RAG System
emoji: 🗣️
colorFrom: indigo
colorTo: blue
sdk: gradio
sdk_version: 5.34.2
app_file: main.py
pinned: false
---
# Voice to LLM & Sentiment Analyzer with RAG
## Overview
This project is a voice-driven application that combines several machine learning models and APIs to answer spoken questions about uploaded PDF documents. A query is captured as voice input and transcribed with Whisper, relevant context is retrieved from the uploaded PDFs using a Retrieval-Augmented Generation (RAG) approach, the transcript's sentiment is analyzed, and the LLM response is spoken aloud using Gemini TTS (Google GenAI API).
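At a high level, the stages chain together as in the sketch below. The imported helpers are assumptions standing in for the real module APIs, which may differ:

```python
# Illustrative end-to-end flow; the imported helper names are assumptions,
# not the exact APIs of the project's modules.
import whisper

from emotion import detect_emotion    # hypothetical wrapper around the classifier
from llm import generate_response     # hypothetical wrapper around the Groq API
from rag import retrieve_context      # hypothetical FAISS-backed context lookup
from tts_gemini import synthesize     # hypothetical wrapper around Gemini TTS

stt_model = whisper.load_model("base")  # assumed Whisper model size

def handle_query(audio_path: str) -> str:
    transcript = stt_model.transcribe(audio_path)["text"]
    context = retrieve_context(transcript)   # top-k chunks from the uploaded PDFs
    emotion = detect_emotion(transcript)     # e.g. "joy", "anger"
    answer = generate_response(transcript, context, emotion)
    return synthesize(answer)                # path to the spoken .wav response
```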
## Features
- Voice input for user queries
- PDF document processing and context retrieval
- Sentiment analysis using a pre-trained emotion classification model (see the example after this list)
- Text-to-Speech output for the model's responses (using Gemini TTS)
- Integration with the Groq API for advanced language model capabilities
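For example, emotion classification over a transcript can be done with a Hugging Face `transformers` pipeline. The checkpoint below is an assumption for illustration and is not necessarily the one `emotion.py` loads:

```python
# Minimal emotion-classification sketch; the model checkpoint is an assumption.
from transformers import pipeline

classifier = pipeline(
    "text-classification",
    model="j-hartmann/emotion-english-distilroberta-base",  # assumed checkpoint
    top_k=None,  # return scores for every emotion label
)

scores = classifier("I can't believe how well this works!")[0]
top = max(scores, key=lambda s: s["score"])
print(top["label"], round(top["score"], 3))  # e.g. joy 0.97
```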
## Modular Structure
This project is organized into modular components for maintainability and clarity:
- `main.py`: Entry point to run the app.
- `ui.py`: Gradio UI layout and event wiring.
- `tts_gemini.py`: Gemini TTS logic (text-to-speech).
- `emotion.py`: Emotion detection and tone mapping.
- `rag.py`: PDF processing, chunking, embedding, FAISS, and context retrieval (see the retrieval sketch after this list).
- `llm.py`: LLM prompt construction and response logic.
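As a reference for the retrieval flow in `rag.py`, here is a minimal sketch assuming a `sentence-transformers` encoder and a flat FAISS index; PDF parsing and chunking are omitted, and the model name and function signature are illustrative:

```python
# Minimal RAG retrieval sketch: embed chunks, index with FAISS, query by similarity.
import faiss
import numpy as np
from sentence_transformers import SentenceTransformer

model = SentenceTransformer("all-MiniLM-L6-v2")  # assumed embedding model

chunks = ["First PDF chunk...", "Second PDF chunk..."]  # from PyPDF2 + chunking
embeddings = model.encode(chunks, convert_to_numpy=True).astype("float32")

index = faiss.IndexFlatL2(embeddings.shape[1])  # exact L2 search over embeddings
index.add(embeddings)

def retrieve_context(query: str, k: int = 2) -> list[str]:
    query_vec = model.encode([query], convert_to_numpy=True).astype("float32")
    _, ids = index.search(query_vec, k)
    return [chunks[i] for i in ids[0]]
```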
## Setup Instructions
1. Clone the repository:
```
git clone <repository-url>
cd GoComet-C4
```
2. Run the setup script to create a virtual environment and install dependencies (Linux/macOS):
```
bash setup.sh
```
On Windows, run the equivalent commands manually in your terminal:
```
python -m venv venv
venv\Scripts\activate
pip install -r requirements.txt
set GROQ_API_KEY=<your-groq-api-key>
set GEMINI_API_KEY=<your-gemini-api-key>
```
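If you use PowerShell instead of `cmd.exe`, the activation and environment-variable syntax differs:
```
venv\Scripts\Activate.ps1
$env:GROQ_API_KEY = "<your-groq-api-key>"
$env:GEMINI_API_KEY = "<your-gemini-api-key>"
```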
3. Activate the virtual environment (if it is not already active):
- Linux/macOS:
```
source venv/bin/activate
```
- Windows:
```
venv\Scripts\activate
```
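Because `bash setup.sh` runs in a subshell, any environment variables it sets do not persist into your interactive shell. If the app cannot find your API keys, export them manually after activating:
```
export GROQ_API_KEY="<your-groq-api-key>"
export GEMINI_API_KEY="<your-gemini-api-key>"
```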
## Usage
To run the application, execute the following command:
```
python main.py
```
Once the application is running, you can upload PDF files and use the microphone to speak your queries. The application processes the audio, retrieves context from the PDFs, analyzes sentiment, and displays the LLM output, sentiment, transcript, and retrieved context in the interface, along with a spoken audio response (.wav). TTS audio files are saved in a `tts_outputs` directory in your project root.
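For reference, the LLM step talks to the Groq chat-completions API. A minimal sketch follows; the model name and prompt layout are assumptions rather than the project's actual choices:

```python
# Minimal Groq chat-completion sketch; model name and prompt layout are assumptions.
import os
from groq import Groq

client = Groq(api_key=os.environ["GROQ_API_KEY"])

def generate_response(query: str, context: str, emotion: str) -> str:
    completion = client.chat.completions.create(
        model="llama-3.1-8b-instant",  # assumed model
        messages=[
            {"role": "system",
             "content": f"Answer using this context:\n{context}\n"
                        f"The user's detected emotion is: {emotion}."},
            {"role": "user", "content": query},
        ],
    )
    return completion.choices[0].message.content
```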
## Dependencies
This project requires the following Python libraries:
- gradio
- whisper (the `openai-whisper` package)
- groq
- transformers
- PyPDF2
- sentence-transformers
- faiss-cpu
- soundfile
- numpy
- google-genai
Install these dependencies using the `requirements.txt` file provided in the project.
## Latency Logging
After each run, the latency (processing time in seconds) for each pipeline component is logged in `logs/latency_log.csv`:
| Whisper STT (s) | Document Retrieval (s) | Sentiment Analysis (s) | Response Gen (LLM) (s) | TTS Synthesis (s) | Total (s) |
|-----------------|-----------------------|-----------------------|------------------------|-------------------|-----------|
This file accumulates results from all runs, allowing you to analyze and monitor performance over time.
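A minimal sketch of how per-stage timings can be captured and appended to that CSV (column names follow the table above; the project's actual logging code may differ):

```python
# Minimal latency-logging sketch: time each stage and append one CSV row per run.
import csv
import time
from pathlib import Path

LOG_PATH = Path("logs/latency_log.csv")
COLUMNS = ["Whisper STT (s)", "Document Retrieval (s)", "Sentiment Analysis (s)",
           "Response Gen (LLM) (s)", "TTS Synthesis (s)", "Total (s)"]

def log_latencies(timings: dict[str, float]) -> None:
    LOG_PATH.parent.mkdir(exist_ok=True)
    new_file = not LOG_PATH.exists()
    with LOG_PATH.open("a", newline="") as f:
        writer = csv.DictWriter(f, fieldnames=COLUMNS)
        if new_file:
            writer.writeheader()  # write the header only on the first run
        writer.writerow(timings)

# Example of timing one stage:
start = time.perf_counter()
# ... run a stage, e.g. Whisper transcription ...
stt_seconds = time.perf_counter() - start
```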