# 🚀 Dubsway Video AI - Groq Agentic System Guide
## Overview
This guide will help you set up and run the enhanced agentic video analysis system using **Groq** with the **Llama3-8b-8192** model. The system provides:
- 🤖 **Agentic Analysis**: Multi-modal video understanding with reasoning capabilities
- 🎯 **MCP/ACP Integration**: Model Context Protocol tools for enhanced analysis
- 🔍 **Multi-modal Processing**: Audio, visual, and text analysis
- 🌐 **Web Integration**: Real-time web search and Wikipedia lookups
- 📊 **Beautiful Reports**: Comprehensive, formatted analysis reports
- 💾 **Enhanced Vector Storage**: Better RAG capabilities with metadata
## 🛠️ Setup Instructions
### 1. Get Groq API Key
1. Visit [Groq Console](https://console.groq.com/)
2. Sign up for a free account
3. Get your API key from the dashboard
4. Set the environment variable (Windows Command Prompt, current session only):
```bat
set GROQ_API_KEY=your_key_here
```
Or add to your `.env` file:
```
GROQ_API_KEY=your_key_here
```
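As a rough illustration of how the key can be resolved at startup, the sketch below checks the process environment first and then falls back to a simple `KEY=value` parse of `.env`. It uses only the standard library. The helper name `load_groq_key` is hypothetical and not part of the project, which may well rely on a library such as `python-dotenv` instead.

```python
import os
from pathlib import Path


def load_groq_key(env_path=".env", environ=None):
    """Return GROQ_API_KEY from the environment, else from a .env file, else None.

    Hypothetical helper for illustration; not part of the Dubsway codebase.
    """
    environ = os.environ if environ is None else environ
    key = environ.get("GROQ_API_KEY")
    if key:
        return key

    path = Path(env_path)
    if not path.is_file():
        return None

    for raw in path.read_text(encoding="utf-8").splitlines():
        line = raw.strip()
        # Skip blanks, comments, and lines without an assignment
        if not line or line.startswith("#") or "=" not in line:
            continue
        name, _, value = line.partition("=")
        if name.strip() == "GROQ_API_KEY":
            # Drop optional surrounding quotes
            return value.strip().strip("'\"") or None
    return None
```

Checking the environment first means a value set with `set GROQ_API_KEY=...` overrides whatever is stored in `.env`.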
### 2. Install Dependencies
Run the setup script (Windows Command Prompt):
```bat
setup_agentic_system.bat
```
Or manually:
```bat
:: Activate the virtual environment
myenv31\Scripts\activate.bat
:: Install dependencies
pip install -r requirements.txt
:: Install the Groq integration specifically
pip install langchain-groq
```
### 3. Test the System
Run the test script to verify everything is working:
```bash
python test_agentic_system.py
```
You should see:
```
🚀 Dubsway Video AI - Agentic System Test
============================================================
📦 Testing Dependencies
============================================================
✅ opencv-python
✅ pillow
✅ torch
✅ transformers
✅ faster_whisper
✅ langchain
✅ langchain_groq
✅ duckduckgo-search
✅ wikipedia-api
🧪 Testing Groq Integration for Agentic Video Analysis
============================================================
✅ GROQ_API_KEY found
✅ langchain-groq imported successfully
✅ Groq test successful: Hello from Groq!
🔍 Testing Enhanced Analysis Components
============================================================
✅ Enhanced analysis imports successful
✅ MultiModalAnalyzer initialized successfully
✅ Agent created successfully
🤖 Testing Agentic Integration
============================================================
✅ Agentic integration imports successful
✅ AgenticVideoProcessor initialized successfully
✅ MCPToolManager initialized successfully
✅ 5 tools registered
🎉 All tests passed! Your agentic system is ready to use.
```
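If the test script itself fails to start, a quick standalone dependency check can help narrow things down. The following is a minimal sketch and not the contents of `test_agentic_system.py`. The import names in the list are assumptions, and they can differ from the pip package names (for example, `opencv-python` is imported as `cv2` and `pillow` as `PIL`).

```python
import importlib.util


def check_dependencies(modules):
    """Map each import name to True if it can be located, without importing it."""
    results = {}
    for name in modules:
        try:
            results[name] = importlib.util.find_spec(name) is not None
        except (ImportError, ValueError):
            # find_spec can raise for malformed names or missing parent packages
            results[name] = False
    return results


if __name__ == "__main__":
    # Assumed import names for the packages listed in the expected output
    wanted = ["cv2", "PIL", "torch", "transformers", "faster_whisper",
              "langchain", "langchain_groq"]
    for name, ok in check_dependencies(wanted).items():
        print(("OK      " if ok else "MISSING ") + name)
```

Because `find_spec` only locates modules, this check stays fast even for heavy packages such as `torch`.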
## 🏃‍♂️ Running the Agentic System
### Option 1: Use Setup Script
```bat
setup_agentic_system.bat
```
### Option 2: Manual Setup
```bat
:: 1. Activate the environment
myenv31\Scripts\activate.bat
:: 2. Set the API key
set GROQ_API_KEY=your_key_here
:: 3. Run the daemon
python -m worker.daemon
```
### Option 3: Start Server
```bat
start-server.bat
```
## 🔧 System Architecture
### Enhanced Analysis Flow
```
Video Upload → Agentic Processor → Multi-modal Analysis
                          ↓
┌─────────────────────────────────────────────────────┐
│ 1. Audio Analysis (Whisper + Emotion Detection)     │
│ 2. Visual Analysis (Object Detection + OCR)         │
│ 3. Agentic Reasoning (Groq Llama3-8b-8192)          │
│ 4. Web Search Integration                           │
│ 5. Wikipedia Lookups                                │
│ 6. Beautiful Report Generation                      │
│ 7. Enhanced Vector Storage                          │
└─────────────────────────────────────────────────────┘
                          ↓
Comprehensive Analysis Report + PDF + Vector Embeddings
```
### Key Components
1. **MultiModalAnalyzer**: Handles audio, visual, and text analysis
2. **AgenticVideoProcessor**: Orchestrates the entire analysis pipeline
3. **MCPToolManager**: Manages web search, Wikipedia, and other tools
4. **Enhanced Vector Storage**: Stores analysis with rich metadata
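To make the tool-manager idea concrete, here is a minimal sketch of a name-to-callable registry. It is purely illustrative: `ToolRegistry` and the stub tools are hypothetical and do not reflect how the project's `MCPToolManager` is implemented.

```python
from typing import Callable, Dict


class ToolRegistry:
    """Minimal name -> callable registry (illustrative; not the project's MCPToolManager)."""

    def __init__(self) -> None:
        self._tools: Dict[str, Callable[..., str]] = {}

    def register(self, name: str, func: Callable[..., str]) -> None:
        if name in self._tools:
            raise ValueError(f"tool already registered: {name}")
        self._tools[name] = func

    def call(self, name: str, *args, **kwargs) -> str:
        if name not in self._tools:
            raise KeyError(f"unknown tool: {name}")
        return self._tools[name](*args, **kwargs)

    def names(self):
        return sorted(self._tools)


# Stub tools standing in for real web search and Wikipedia lookups
registry = ToolRegistry()
registry.register("web_search", lambda query: f"[stub] results for {query!r}")
registry.register("wikipedia", lambda title: f"[stub] summary of {title!r}")
```

Keeping tools behind a single `call` entry point lets an agent select a tool by name without knowing how each one is implemented.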
## 📊 Enhanced Features
### Multi-modal Analysis
- **Audio**: Transcription, emotion detection, speaker identification
- **Visual**: Object detection, scene understanding, OCR text extraction
- **Text**: Sentiment analysis, topic extraction, context enrichment
### Agentic Capabilities
- **Reasoning**: Advanced understanding using Groq Llama3
- **Context**: Web search for additional information
- **Knowledge**: Wikipedia lookups for detailed explanations
- **Insights**: Actionable recommendations and analysis
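The reasoning step can be pictured as combining the audio and visual findings into one prompt and sending it to Groq. The sketch below assumes the `langchain-groq` package and a `GROQ_API_KEY` in the environment. The helper names `build_prompt` and `ask_groq` and the prompt wording are hypothetical, not taken from the project.

```python
def build_prompt(transcript: str, objects: list) -> str:
    """Combine audio and visual findings into one prompt for the LLM."""
    seen = ", ".join(objects) if objects else "none detected"
    return (
        "You are analyzing a video.\n"
        f"Transcript:\n{transcript.strip()}\n\n"
        f"Objects seen: {seen}\n\n"
        "Summarize the main topics and give two actionable recommendations."
    )


def ask_groq(prompt: str, model: str = "llama3-8b-8192") -> str:
    """Send the prompt to Groq; requires langchain-groq and GROQ_API_KEY."""
    # Imported lazily so build_prompt works without the package installed
    from langchain_groq import ChatGroq

    llm = ChatGroq(model=model, temperature=0)
    return llm.invoke(prompt).content
```

Keeping the model ID as a parameter makes it easy to switch models if the set of models hosted by Groq changes.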
### Beautiful Reports
```
# 📹 Video Analysis Report
## 📊 Overview
- **Duration**: 120 seconds
- **Resolution**: 1920x1080
- **Language**: English
## 🎵 Audio Analysis
### Transcription Summary
[Enhanced transcription with context]
### Key Audio Segments
- **0.0s - 30.0s**: Introduction to the topic
- **30.0s - 60.0s**: Main content discussion
- **60.0s - 90.0s**: Technical details
- **90.0s - 120.0s**: Conclusion and summary
## 🎬 Visual Analysis
### Scene Breakdown
- **0.0s**: Presenter in office setting
- **30.0s**: Screen sharing with diagrams
- **60.0s**: Close-up of technical specifications
- **90.0s**: Return to presenter view
### Key Visual Elements
- **Person**: appears 45 times
- **Computer**: appears 12 times
- **Text**: appears 8 times
- **Diagram**: appears 5 times
## 🎯 Key Insights
### Topics Covered
- Artificial Intelligence
- Machine Learning
- Technology Innovation
- Business Applications
### Sentiment Analysis
- **Positive**: 75%
- **Negative**: 10%
- **Neutral**: 15%
### Important Moments
- **15s**: Key insight about AI applications
- **45s**: Technical breakthrough mentioned
- **75s**: Business impact discussion
## 📈 Recommendations
Based on the analysis, consider:
- Content engagement opportunities
- Areas for improvement
- Target audience insights
---
*Report generated using Groq Llama3-8b-8192*
```
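Sections like "Sentiment Analysis" and "Key Visual Elements" in the sample report could be produced from plain dictionaries. The following is a small sketch of that idea. The function name `render_insights` and the input shapes are assumptions for illustration, not the project's actual report generator.

```python
def render_insights(sentiment: dict, object_counts: dict) -> str:
    """Render sentiment shares and object counts as Markdown bullet lists."""
    lines = ["### Sentiment Analysis"]
    total = sum(sentiment.values())
    for label, value in sentiment.items():
        # Normalize so raw counts are also reported as percentages
        share = round(100 * value / total) if total else 0
        lines.append(f"- **{label.capitalize()}**: {share}%")

    lines.append("### Key Visual Elements")
    # Most frequent objects first
    for name, count in sorted(object_counts.items(), key=lambda kv: -kv[1]):
        lines.append(f"- **{name.capitalize()}**: appears {count} times")
    return "\n".join(lines)
```

Calling `render_insights({"positive": 75, "negative": 10, "neutral": 15}, {"person": 45, "computer": 12})` yields bullets in the same style as the sample report.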
## 🔍 Troubleshooting
### Common Issues
1. **GROQ_API_KEY not found**
   ```
   ❌ GROQ_API_KEY environment variable not found!
   ```
   **Solution**: Set the `GROQ_API_KEY` environment variable or add it to your `.env` file.
2. **Import errors**
   ```
   ❌ Failed to import langchain-groq
   ```
   **Solution**: Install the package with `pip install langchain-groq` inside the activated virtual environment.
3. **Agentic analysis fails**
   ```
   Agentic analysis failed, falling back to basic Whisper
   ```
   **Solution**: Check that your Groq API key is valid and that the machine has a working internet connection.
4. **Memory issues**
   ```
   CUDA out of memory
   ```
   **Solution**: Reduce the batch size or switch to CPU processing.
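For the out-of-memory case, one option is to make device selection explicit so that CPU processing can be forced. This is a minimal sketch, assuming PyTorch is the GPU backend (it appears in the dependency list). `pick_device` is a hypothetical helper and not a project function.

```python
def pick_device(prefer_gpu: bool = True) -> str:
    """Return "cuda" when PyTorch reports a usable GPU, otherwise "cpu"."""
    if not prefer_gpu:
        return "cpu"
    try:
        import torch
    except ImportError:
        # Without PyTorch there is no GPU backend to select
        return "cpu"
    return "cuda" if torch.cuda.is_available() else "cpu"
```

Passing `prefer_gpu=False` forces CPU processing, which is slower but avoids CUDA memory limits.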
### Performance Optimization
1. **GPU Usage**: The system automatically detects and uses CUDA if available
2. **Batch Processing**: Videos are processed one at a time to manage memory
3. **Caching**: Analysis results are cached to avoid reprocessing
4. **Fallback**: System falls back to basic analysis if enhanced features fail
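The caching and fallback behaviour described above can be sketched as a single wrapper. This illustrates the pattern only. The function name, the dictionary cache, and the callables passed in are assumptions, and the project's own implementation may differ.

```python
import logging

logger = logging.getLogger("worker")


def analyze_with_fallback(video_id, enhanced, basic, cache):
    """Return a cached result, else try `enhanced`, else fall back to `basic`."""
    if video_id in cache:
        return cache[video_id]
    try:
        result = enhanced(video_id)
    except Exception as exc:  # broad on purpose: any enhanced failure triggers fallback
        logger.warning("Agentic analysis failed (%s), falling back to basic", exc)
        result = basic(video_id)
    cache[video_id] = result
    return result
```

Note that this sketch also caches fallback results, so the enhanced path is not retried for the same video. Whether that is desirable depends on how transient the failures are.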
## 🎯 Usage Examples
### Basic Usage
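```python
from app.utils.agentic_integration import analyze_with_agentic_capabilities


async def process_video(session):
    # Process a video with agentic capabilities.
    # `session` is a database session you have already opened;
    # call this coroutine from an async context.
    transcription, summary = await analyze_with_agentic_capabilities(
        video_url="https://example.com/video.mp4",
        user_id=1,
        db=session,
    )
    return transcription, summary
```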
### Advanced Usage
```python
import asyncio

from app.utils.enhanced_analysis import MultiModalAnalyzer


async def main():
    # Create an analyzer with an explicit API key
    analyzer = MultiModalAnalyzer(groq_api_key="your_key")
    # Perform comprehensive analysis on a local file
    analysis = await analyzer.analyze_video_enhanced("video.mp4")
    # Access results
    print(analysis.formatted_report)
    print(analysis.audio_analysis)
    print(analysis.visual_analysis)


asyncio.run(main())
```
## 📈 Benefits of Agentic System
1. **Better Understanding**: Multi-modal analysis provides deeper insights
2. **Context Awareness**: Web search and Wikipedia integration
3. **Beautiful Output**: Professional, formatted reports
4. **Enhanced RAG**: Better vector embeddings for retrieval
5. **Open Model**: Uses Meta's open-weight Llama 3 8B model (`llama3-8b-8192`), served through Groq
6. **Scalable**: Handles multiple video formats and sizes
7. **Reliable**: Fallback to basic analysis if enhanced features fail
## 🔮 Future Enhancements
- **Real-time Processing**: Stream video analysis
- **Custom Models**: Integration with custom fine-tuned models
- **Advanced OCR**: Better text extraction from videos
- **Emotion Detection**: Advanced audio and visual emotion analysis
- **Multi-language**: Support for multiple languages
- **API Endpoints**: REST API for external integration
## 📞 Support
If you encounter issues:
1. Check the troubleshooting section above
2. Run `python test_agentic_system.py` to diagnose issues
3. Check the logs in `worker.log`
4. Ensure all dependencies are installed correctly
5. Verify your Groq API key is valid and has sufficient credits
---
**Happy analyzing! 🎉**