---
license: mit
title: Chatbot for Video Question Answering
sdk: gradio
emoji: 📚
pinned: false
short_description: A chatbot that can answer questions about a video.
python_version: 3.12.7
sdk_version: 5.35.0
---

# Chatbot for Video Question Answering Demo

An AI chatbot that answers questions about video content. The project combines a multi-modal LLM with a multi-modal RAG pipeline: it processes video frames, transcribes audio, and retrieves the relevant information to provide accurate answers to questions about the video.

## Requirements

- Python 3.12+
- [uv](https://docs.astral.sh/uv/) as the package and project manager
- [FFmpeg](https://ffmpeg.org/) installed and available in PATH
- [Google Gemini API key](https://aistudio.google.com/apikey) for the LLM functionality
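
The local requirements above can be verified before installing. The following is a minimal sketch, not part of this project: the helper name `missing_requirements` is hypothetical, and it only assumes that `uv` and `ffmpeg` are the executable names to look for in PATH.

```python
import shutil
import sys


def missing_requirements(tools=("uv", "ffmpeg"), min_python=(3, 12)):
    """Return a list of human-readable problems; an empty list means all checks passed."""
    problems = []

    # Compare only (major, minor) against the required minimum version.
    if sys.version_info[:2] < min_python:
        required = ".".join(str(part) for part in min_python)
        problems.append(f"Python {required}+ is required")

    # shutil.which returns None when an executable is not found in PATH.
    for tool in tools:
        if shutil.which(tool) is None:
            problems.append(f"{tool} not found in PATH")

    return problems


if __name__ == "__main__":
    for problem in missing_requirements():
        print(problem)
```

The script prints nothing when every check passes. It does not check the API key, which is configured during installation.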

## Installation

1. Clone this repository
   ```bash
   git clone [repository-url]
   cd VideoChatbot
   ```

2. Install dependencies using uv
   ```bash
   uv sync
   ```

3. Create a `.env` file in the project root with your API key
   ```
   GEMINI_API_KEY=your_api_key_here
   ```
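
To illustrate how a key stored in `.env` can reach the application, here is a minimal standard-library sketch. It is an assumption about the mechanism, not this project's actual loading code (which may use a library such as `python-dotenv`), and the helper names `parse_env` and `load_api_key` are hypothetical.

```python
import os
from pathlib import Path


def parse_env(text):
    """Parse simple KEY=VALUE lines, skipping blank lines and # comments."""
    values = {}
    for raw_line in text.splitlines():
        line = raw_line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        # Drop optional surrounding quotes around the value.
        values[key.strip()] = value.strip().strip('"').strip("'")
    return values


def load_api_key(env_path=".env", name="GEMINI_API_KEY"):
    """Prefer a variable already set in the environment, else read the .env file."""
    value = os.environ.get(name)
    if value:
        return value
    path = Path(env_path)
    if path.is_file():
        return parse_env(path.read_text(encoding="utf-8")).get(name)
    return None
```

`load_api_key()` returns `None` when the key is set in neither place, so a caller can fail early with a clear message instead of sending an unauthenticated request.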

## Usage

1. Start the application
   ```bash
   python -m app.main
   ```

2. Access the UI through your browser (typically at http://127.0.0.1:7860)

3. Upload a video file or provide a YouTube URL and ask questions about it

4. The system will process the video (extract frames, transcribe audio), index the content, and then answer your questions
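
The frame-extraction and audio-extraction steps in item 4 can be sketched with FFmpeg as follows. This is an illustration under stated assumptions, not the project's implementation: the function names, the one-frame-per-second sampling rate, the 16 kHz mono WAV format, and the output layout are all hypothetical choices. Transcription, indexing, and answering are left out.

```python
import subprocess
from pathlib import Path


def frame_command(video, out_dir, fps=1):
    """Build an FFmpeg command that samples `fps` frames per second as JPEG files."""
    pattern = str(Path(out_dir) / "frame_%05d.jpg")
    return ["ffmpeg", "-y", "-i", str(video), "-vf", f"fps={fps}", pattern]


def audio_command(video, out_wav, sample_rate=16000):
    """Build an FFmpeg command that drops the video stream and writes mono WAV audio."""
    return ["ffmpeg", "-y", "-i", str(video), "-vn", "-ac", "1",
            "-ar", str(sample_rate), str(out_wav)]


def preprocess(video, work_dir):
    """Run both commands; return the sorted frame paths and the audio path."""
    work = Path(work_dir)
    frames_dir = work / "frames"
    frames_dir.mkdir(parents=True, exist_ok=True)
    wav_path = work / "audio.wav"

    # check=True raises CalledProcessError if FFmpeg exits with an error.
    subprocess.run(frame_command(video, frames_dir), check=True)
    subprocess.run(audio_command(video, wav_path), check=True)

    return sorted(frames_dir.glob("frame_*.jpg")), wav_path
```

Calling `preprocess("talk.mp4", "work")` with a hypothetical input file would leave the frames in `work/frames/` and the audio in `work/audio.wav`, ready for a transcription model and an embedding step.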

## Notes

This project is a demo and may require additional configuration for production use. Video processing and indexing can take time, depending on the video's length and complexity. For better performance and accuracy, use larger LLMs, embedding models, and transcription models, along with a more capable vector database.