# Dubsway Video AI - Groq Agentic System Guide
## Overview
This guide will help you set up and run the enhanced agentic video analysis system using **Groq** with the **Llama3-8b-8192** model. The system provides:
- **Agentic Analysis**: Multi-modal video understanding with reasoning capabilities
- **MCP/ACP Integration**: Model Context Protocol tools for enhanced analysis
- **Multi-modal Processing**: Audio, visual, and text analysis
- **Web Integration**: Real-time web search and Wikipedia lookups
- **Beautiful Reports**: Comprehensive, formatted analysis reports
- **Enhanced Vector Storage**: Better RAG capabilities with metadata
## Setup Instructions
### 1. Get Groq API Key
1. Visit [Groq Console](https://console.groq.com/)
2. Sign up for a free account
3. Get your API key from the dashboard
4. Set the environment variable (Windows Command Prompt syntax):
   ```bash
   set GROQ_API_KEY=your_key_here
   ```
   Or add it to your `.env` file:
   ```
   GROQ_API_KEY=your_key_here
   ```
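As a rough illustration of how a script might pick up the key from either source, the sketch below checks the process environment first and then falls back to a simple `KEY=value` parse of `.env`. The helper name `load_groq_api_key` and its parsing rules are assumptions made for this example. The project itself may load `.env` differently, for instance through a dotenv library.

```python
import os
from pathlib import Path


def load_groq_api_key(env_file=".env", environ=None):
    """Return GROQ_API_KEY from the environment, else from a .env file, else None.

    Hypothetical helper for illustration; not part of the Dubsway codebase.
    """
    environ = os.environ if environ is None else environ
    key = environ.get("GROQ_API_KEY")
    if key:
        return key

    path = Path(env_file)
    if not path.is_file():
        return None

    for raw_line in path.read_text(encoding="utf-8").splitlines():
        line = raw_line.strip()
        # Skip blanks, comments, and lines that are not KEY=value pairs.
        if not line or line.startswith("#") or "=" not in line:
            continue
        name, _, value = line.partition("=")
        if name.strip() == "GROQ_API_KEY":
            # Drop optional surrounding quotes; treat an empty value as missing.
            return value.strip().strip("'\"") or None
    return None
```

Calling `load_groq_api_key()` at startup and stopping with a clear message when it returns `None` gives an earlier, more readable failure than a rejected API request later on.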
### 2. Install Dependencies
Run the setup script:
```bash
setup_agentic_system.bat
```
Or manually:
```bash
# Activate virtual environment
myenv31\Scripts\activate.bat
# Install dependencies
pip install -r requirements.txt
# Install Groq specifically
pip install langchain-groq
```
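To confirm that the install step worked before running anything heavier, a small check like the one below can report which packages are not importable. This is a sketch, not the project's own test script. The package list mirrors the dependencies this guide names, and the import names on the right are assumptions, since pip package names and module names often differ.

```python
import importlib.util

# pip package name -> module name used in `import ...`.
# The module names are assumptions for illustration (e.g. opencv-python -> cv2).
REQUIRED_MODULES = {
    "opencv-python": "cv2",
    "pillow": "PIL",
    "torch": "torch",
    "transformers": "transformers",
    "faster_whisper": "faster_whisper",
    "langchain": "langchain",
    "langchain_groq": "langchain_groq",
    "duckduckgo-search": "duckduckgo_search",
    "wikipedia-api": "wikipediaapi",
}


def find_missing(modules=REQUIRED_MODULES):
    """Return the package names whose import module cannot be located."""
    missing = []
    for package, module in modules.items():
        # find_spec returns None when a top-level module is not installed.
        if importlib.util.find_spec(module) is None:
            missing.append(package)
    return missing


missing = find_missing()
if missing:
    print("Missing packages:", ", ".join(missing))
else:
    print("All listed packages are importable.")
```

`find_spec` only locates a module without importing it, so the check stays fast even for large packages such as `torch`.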
### 3. Test the System
Run the test script to verify everything is working:
```bash
python test_agentic_system.py
```
You should see:
```
Dubsway Video AI - Agentic System Test
============================================================
Testing Dependencies
============================================================
✅ opencv-python
✅ pillow
✅ torch
✅ transformers
✅ faster_whisper
✅ langchain
✅ langchain_groq
✅ duckduckgo-search
✅ wikipedia-api
Testing Groq Integration for Agentic Video Analysis
============================================================
✅ GROQ_API_KEY found
✅ langchain-groq imported successfully
✅ Groq test successful: Hello from Groq!
Testing Enhanced Analysis Components
============================================================
✅ Enhanced analysis imports successful
✅ MultiModalAnalyzer initialized successfully
✅ Agent created successfully
Testing Agentic Integration
============================================================
✅ Agentic integration imports successful
✅ AgenticVideoProcessor initialized successfully
✅ MCPToolManager initialized successfully
✅ 5 tools registered
All tests passed! Your agentic system is ready to use.
```
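If only the Groq step fails, it can help to exercise that call in isolation. The sketch below is an assumption about what such a check might look like, not the contents of `test_agentic_system.py`. It uses `ChatGroq` from `langchain-groq`; the `model` and `api_key` keyword names match recent releases of that package and may differ in older ones.

```python
import os


def groq_smoke_test(prompt="Reply with: Hello from Groq!", model="llama3-8b-8192"):
    """Send one prompt to Groq and return the reply text.

    Hypothetical helper for illustration; requires network access and a valid key.
    """
    api_key = os.environ.get("GROQ_API_KEY")
    if not api_key:
        raise RuntimeError("GROQ_API_KEY environment variable not found")

    # Imported lazily so the missing-key check works even without the package.
    from langchain_groq import ChatGroq

    llm = ChatGroq(model=model, api_key=api_key, temperature=0)
    response = llm.invoke(prompt)  # returns a chat message object
    return response.content
```

Running `print(groq_smoke_test())` from the activated environment separates key and network problems from problems in the rest of the pipeline.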
## Running the Agentic System
### Option 1: Use Setup Script
```bash
setup_agentic_system.bat
```
### Option 2: Manual Setup
```bash
# 1. Activate environment
myenv31\Scripts\activate.bat
# 2. Set API key
set GROQ_API_KEY=your_key_here
# 3. Run the daemon
python -m worker.daemon
```
### Option 3: Start Server
```bash
start-server.bat
```
## System Architecture
### Enhanced Analysis Flow
```
Video Upload → Agentic Processor → Multi-modal Analysis
                          ↓
┌───────────────────────────────────────────────────┐
│ 1. Audio Analysis (Whisper + Emotion Detection)   │
│ 2. Visual Analysis (Object Detection + OCR)       │
│ 3. Agentic Reasoning (Groq Llama3-8b-8192)        │
│ 4. Web Search Integration                         │
│ 5. Wikipedia Lookups                              │
│ 6. Beautiful Report Generation                    │
│ 7. Enhanced Vector Storage                        │
└───────────────────────────────────────────────────┘
                          ↓
Comprehensive Analysis Report + PDF + Vector Embeddings
```
### Key Components
1. **MultiModalAnalyzer**: Handles audio, visual, and text analysis
2. **AgenticVideoProcessor**: Orchestrates the entire analysis pipeline
3. **MCPToolManager**: Manages web search, Wikipedia, and other tools
4. **Enhanced Vector Storage**: Stores analysis with rich metadata
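The orchestration role described for `AgenticVideoProcessor` can be pictured as a list of stages that share one context dictionary. The sketch below is a simplified illustration under that assumption. The `Pipeline` class and the three placeholder stages are hypothetical and do not reproduce the project's actual classes.

```python
from dataclasses import dataclass, field
from typing import Any, Callable, Dict, List, Tuple

Stage = Callable[[Dict[str, Any]], Dict[str, Any]]


@dataclass
class Pipeline:
    """Runs named stages in order, merging each stage's output into a shared context."""

    stages: List[Tuple[str, Stage]] = field(default_factory=list)

    def add(self, name: str, stage: Stage) -> "Pipeline":
        self.stages.append((name, stage))
        return self

    def run(self, video_path: str) -> Dict[str, Any]:
        context: Dict[str, Any] = {"video_path": video_path, "errors": {}}
        for name, stage in self.stages:
            try:
                context.update(stage(context))
            except Exception as exc:
                # Record the failure and continue so later stages can still run.
                context["errors"][name] = str(exc)
        return context


# Placeholder stages; real ones would call Whisper, a vision model, and the LLM.
def audio_stage(ctx):
    return {"transcript": f"(transcript of {ctx['video_path']})"}


def visual_stage(ctx):
    return {"objects": ["person", "computer"]}


def reasoning_stage(ctx):
    return {"summary": f"{len(ctx['objects'])} object types; {ctx['transcript']}"}


pipeline = (
    Pipeline()
    .add("audio", audio_stage)
    .add("visual", visual_stage)
    .add("reasoning", reasoning_stage)
)
result = pipeline.run("video.mp4")
print(result["summary"])  # 2 object types; (transcript of video.mp4)
```

Keeping per-stage errors in the context, rather than aborting, is one way to let a report still be produced when a single analysis step fails.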
## Enhanced Features
### Multi-modal Analysis
- **Audio**: Transcription, emotion detection, speaker identification
- **Visual**: Object detection, scene understanding, OCR text extraction
- **Text**: Sentiment analysis, topic extraction, context enrichment
### Agentic Capabilities
- **Reasoning**: Advanced understanding using Groq Llama3
- **Context**: Web search for additional information
- **Knowledge**: Wikipedia lookups for detailed explanations
- **Insights**: Actionable recommendations and analysis
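One common way to expose capabilities such as web search and Wikipedia lookups to an agent is a registry that maps tool names to callables. The sketch below illustrates that pattern only. `ToolRegistry` and both stub tools are hypothetical names and do not reflect how `MCPToolManager` is actually implemented.

```python
from typing import Callable, Dict, List


class ToolRegistry:
    """Maps tool names to callables so an agent can look them up by name."""

    def __init__(self) -> None:
        self._tools: Dict[str, Callable[[str], str]] = {}

    def register(self, name: str, func: Callable[[str], str]) -> None:
        if name in self._tools:
            raise ValueError(f"tool already registered: {name}")
        self._tools[name] = func

    def call(self, name: str, query: str) -> str:
        if name not in self._tools:
            raise KeyError(f"unknown tool: {name}")
        return self._tools[name](query)

    def names(self) -> List[str]:
        return sorted(self._tools)


# Stub tools: real versions would query a search engine or Wikipedia.
def web_search(query: str) -> str:
    return f"[web results for '{query}']"


def wikipedia_lookup(query: str) -> str:
    return f"[wikipedia summary for '{query}']"


registry = ToolRegistry()
registry.register("web_search", web_search)
registry.register("wikipedia", wikipedia_lookup)
print(registry.names())
print(registry.call("wikipedia", "machine learning"))
```

Because every tool shares the same `str -> str` signature here, the reasoning model only needs to choose a tool name and a query string.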
### Beautiful Reports
```
# Video Analysis Report
## Overview
- **Duration**: 120 seconds
- **Resolution**: 1920x1080
- **Language**: English
## Audio Analysis
### Transcription Summary
[Enhanced transcription with context]
### Key Audio Segments
- **0.0s - 30.0s**: Introduction to the topic
- **30.0s - 60.0s**: Main content discussion
- **60.0s - 90.0s**: Technical details
- **90.0s - 120.0s**: Conclusion and summary
## Visual Analysis
### Scene Breakdown
- **0.0s**: Presenter in office setting
- **30.0s**: Screen sharing with diagrams
- **60.0s**: Close-up of technical specifications
- **90.0s**: Return to presenter view
### Key Visual Elements
- **Person**: appears 45 times
- **Computer**: appears 12 times
- **Text**: appears 8 times
- **Diagram**: appears 5 times
## Key Insights
### Topics Covered
- Artificial Intelligence
- Machine Learning
- Technology Innovation
- Business Applications
### Sentiment Analysis
- **Positive**: 75%
- **Negative**: 10%
- **Neutral**: 15%
### Important Moments
- **15s**: Key insight about AI applications
- **45s**: Technical breakthrough mentioned
- **75s**: Business impact discussion
## Recommendations
Based on the analysis, consider:
- Content engagement opportunities
- Areas for improvement
- Target audience insights
---
*Report generated using Groq Llama3-8b-8192*
```
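To show how structured analysis results could be turned into Markdown of this shape, here is a minimal rendering sketch. The function names and the input shapes (segment tuples, raw sentiment counts) are assumptions for illustration, and the real report generator may work quite differently.

```python
def format_segments(segments):
    """Render (start, end, description) tuples as Markdown bullet lines."""
    return [f"- **{start:.1f}s - {end:.1f}s**: {text}" for start, end, text in segments]


def format_sentiment(counts):
    """Convert raw label counts into whole-number percentage bullets."""
    total = sum(counts.values())
    if total == 0:
        return ["- No sentiment data"]
    return [
        f"- **{label}**: {round(100 * count / total)}%"
        for label, count in counts.items()
    ]


def build_report(duration_s, segments, sentiment_counts):
    """Assemble a small subset of the report sections into one Markdown string."""
    lines = ["# Video Analysis Report", "## Overview", f"- **Duration**: {duration_s} seconds"]
    lines += ["## Audio Analysis", "### Key Audio Segments", *format_segments(segments)]
    lines += ["## Key Insights", "### Sentiment Analysis", *format_sentiment(sentiment_counts)]
    return "\n".join(lines)


report = build_report(
    120,
    [(0.0, 30.0, "Introduction to the topic"), (30.0, 60.0, "Main content discussion")],
    {"Positive": 15, "Negative": 2, "Neutral": 3},
)
print(report)
```

Building the report from plain lists of lines keeps each section independently testable before any LLM-written prose is mixed in.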
## Troubleshooting
### Common Issues
1. **GROQ_API_KEY not found**
   ```
   ❌ GROQ_API_KEY environment variable not found!
   ```
   **Solution**: Set the `GROQ_API_KEY` environment variable or add it to your `.env` file
2. **Import errors**
   ```
   ❌ Failed to import langchain-groq
   ```
   **Solution**: Install with `pip install langchain-groq`
3. **Agentic analysis fails**
   ```
   Agentic analysis failed, falling back to basic Whisper
   ```
   **Solution**: Check your Groq API key and internet connection
4. **Memory issues**
   ```
   CUDA out of memory
   ```
   **Solution**: Reduce batch size or use CPU processing
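For the CUDA memory case, one option is to make the device choice explicit so that CPU processing can be forced when the GPU runs out of memory. The sketch below assumes PyTorch is the framework in use. `pick_device` and `free_gpu_cache` are hypothetical helper names, not functions from this project.

```python
def pick_device(force_cpu=False):
    """Return "cuda" when a CUDA GPU is usable, otherwise "cpu"."""
    if force_cpu:
        return "cpu"
    try:
        import torch  # optional here so the helper also works without PyTorch
    except ImportError:
        return "cpu"
    return "cuda" if torch.cuda.is_available() else "cpu"


def free_gpu_cache():
    """Release cached GPU memory held by PyTorch; a no-op without CUDA."""
    try:
        import torch
    except ImportError:
        return
    if torch.cuda.is_available():
        torch.cuda.empty_cache()


print(pick_device(force_cpu=True))  # cpu
```

Calling `free_gpu_cache()` between videos releases memory PyTorch has cached but no longer uses. It does not shrink the memory a loaded model itself needs, so forcing CPU remains the fallback for models that do not fit.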
### Performance Optimization
1. **GPU Usage**: The system automatically detects and uses CUDA if available
2. **Batch Processing**: Videos are processed one at a time to manage memory
3. **Caching**: Analysis results are cached to avoid reprocessing
4. **Fallback**: System falls back to basic analysis if enhanced features fail
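The caching and fallback behaviour listed above could be combined roughly as follows. This is a sketch under stated assumptions: results are cached as JSON files keyed by a SHA-256 hash of the video contents, and `enhanced` and `basic` are caller-supplied functions. The project's real cache layout and function names are not documented here.

```python
import hashlib
import json
from pathlib import Path


def file_digest(path, chunk_size=1 << 20):
    """SHA-256 of the file contents, read in chunks to bound memory use."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def analyze_cached(video_path, enhanced, basic, cache_dir="analysis_cache"):
    """Return a cached result if present; else try `enhanced`, then `basic`."""
    cache = Path(cache_dir)
    cache.mkdir(parents=True, exist_ok=True)
    entry = cache / f"{file_digest(video_path)}.json"
    if entry.exists():
        return json.loads(entry.read_text(encoding="utf-8"))

    try:
        result = {"mode": "enhanced", "data": enhanced(video_path)}
    except Exception:
        # Enhanced analysis failed; fall back to the basic analyzer.
        result = {"mode": "basic", "data": basic(video_path)}

    entry.write_text(json.dumps(result), encoding="utf-8")
    return result
```

Note one design trade-off in this sketch: a fallback result is cached too, so a later run will not retry the enhanced path for the same file. Skipping the cache write when `mode` is `"basic"` would change that.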
## Usage Examples
### Basic Usage
```python
from app.utils.agentic_integration import analyze_with_agentic_capabilities


async def process_video(session):
    # Process video with agentic capabilities.
    # `session` is your existing database session; call this from async code.
    transcription, summary = await analyze_with_agentic_capabilities(
        video_url="https://example.com/video.mp4",
        user_id=1,
        db=session,
    )
    return transcription, summary
```
### Advanced Usage
```python
import asyncio

from app.utils.enhanced_analysis import MultiModalAnalyzer


async def main():
    # Create analyzer with custom settings
    analyzer = MultiModalAnalyzer(groq_api_key="your_key")
    # Perform comprehensive analysis
    analysis = await analyzer.analyze_video_enhanced("video.mp4")
    # Access results
    print(analysis.formatted_report)
    print(analysis.audio_analysis)
    print(analysis.visual_analysis)


asyncio.run(main())
```
## Benefits of Agentic System
1. **Better Understanding**: Multi-modal analysis provides deeper insights
2. **Context Awareness**: Web search and Wikipedia integration
3. **Beautiful Output**: Professional, formatted reports
4. **Enhanced RAG**: Better vector embeddings for retrieval
5. **Open Weights**: Uses the open-weight Llama3-8b-8192 model served by Groq
6. **Scalable**: Handles multiple video formats and sizes
7. **Reliable**: Fallback to basic analysis if enhanced features fail
## Future Enhancements
- **Real-time Processing**: Stream video analysis
- **Custom Models**: Integration with custom fine-tuned models
- **Advanced OCR**: Better text extraction from videos
- **Emotion Detection**: Advanced audio and visual emotion analysis
- **Multi-language**: Support for multiple languages
- **API Endpoints**: REST API for external integration
## Support
If you encounter issues:
1. Check the troubleshooting section above
2. Run `python test_agentic_system.py` to diagnose issues
3. Check the logs in `worker.log`
4. Ensure all dependencies are installed correctly
5. Verify your Groq API key is valid and has sufficient credits
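When checking `worker.log`, it is usually enough to look at the most recent lines that mention a problem. The helper below is a small sketch for that. The keyword list is an assumption, since the exact log format is not documented in this guide.

```python
from collections import deque


def tail_matching(path, keywords=("ERROR", "Traceback", "failed"), limit=20):
    """Return the last `limit` lines of `path` that contain any keyword."""
    matches = deque(maxlen=limit)  # keeps only the newest `limit` matches
    with open(path, "r", encoding="utf-8", errors="replace") as f:
        for line in f:
            if any(k in line for k in keywords):
                matches.append(line.rstrip("\n"))
    return list(matches)
```

For example, `print("\n".join(tail_matching("worker.log")))` prints up to twenty recent problem lines without loading the whole file into memory.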
---
**Happy analyzing!**