Dubsway Video AI - Groq Agentic System Guide
Overview
This guide will help you set up and run the enhanced agentic video analysis system using Groq with the Llama3-8b-8192 model. The system provides:
- Agentic Analysis: Multi-modal video understanding with reasoning capabilities
- MCP/ACP Integration: Model Context Protocol tools for enhanced analysis
- Multi-modal Processing: Audio, visual, and text analysis
- Web Integration: Real-time web search and Wikipedia lookups
- Beautiful Reports: Comprehensive, formatted analysis reports
- Enhanced Vector Storage: Better RAG capabilities with metadata
Setup Instructions
1. Get Groq API Key
- Visit Groq Console
- Sign up for a free account
- Get your API key from the dashboard
- Set the environment variable:
set GROQ_API_KEY=your_key_here
Or add it to your .env file:
GROQ_API_KEY=your_key_here
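As a minimal sketch of how a script might pick up the key, the helper below checks the process environment first and then falls back to a .env file. The function name `load_groq_api_key` and the simple `KEY=value` parsing are assumptions made for illustration; the project itself may load the file differently.

```python
import os
from pathlib import Path


def load_groq_api_key(env_path=".env"):
    """Return GROQ_API_KEY from the environment, else from a .env file.

    Hypothetical helper: it only understands simple KEY=value lines.
    """
    key = os.environ.get("GROQ_API_KEY")
    if key:
        return key

    path = Path(env_path)
    if not path.is_file():
        return None

    for raw_line in path.read_text(encoding="utf-8").splitlines():
        line = raw_line.strip()
        # Skip blank lines, comments, and lines without an assignment.
        if not line or line.startswith("#") or "=" not in line:
            continue
        name, _, value = line.partition("=")
        if name.strip() == "GROQ_API_KEY":
            # Drop optional surrounding quotes.
            return value.strip().strip("\"'") or None
    return None
```

Calling `load_groq_api_key()` before starting the worker gives an early, readable failure (a `None` result) instead of an error deep inside the analysis.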
2. Install Dependencies
Run the setup script:
setup_agentic_system.bat
Or manually:
# Activate virtual environment
myenv31\Scripts\activate.bat
# Install dependencies
pip install -r requirements.txt
# Install Groq specifically
pip install langchain-groq
3. Test the System
Run the test script to verify everything is working:
python test_agentic_system.py
You should see:
```
Dubsway Video AI - Agentic System Test
Testing Dependencies
✅ opencv-python
✅ pillow
✅ torch
✅ transformers
✅ faster_whisper
✅ langchain
✅ langchain_groq
✅ duckduckgo-search
✅ wikipedia-api
Testing Groq Integration for Agentic Video Analysis
✅ GROQ_API_KEY found
✅ langchain-groq imported successfully
✅ Groq test successful: Hello from Groq!
Testing Enhanced Analysis Components
✅ Enhanced analysis imports successful
✅ MultiModalAnalyzer initialized successfully
✅ Agent created successfully
Testing Agentic Integration
✅ Agentic integration imports successful
✅ AgenticVideoProcessor initialized successfully
✅ MCPToolManager initialized successfully
✅ 5 tools registered
All tests passed! Your agentic system is ready to use.
```
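If the test script is unavailable, a rough equivalent of its dependency check can be sketched with the standard library. This is not the code in test_agentic_system.py; the function name `check_dependencies` is hypothetical, and the list uses import names (for example `cv2` for opencv-python and `PIL` for pillow), which differ from the pip package names.

```python
import importlib.util


def check_dependencies(modules):
    """Map each module name to True (importable) or False (missing)."""
    status = {}
    for name in modules:
        try:
            status[name] = importlib.util.find_spec(name) is not None
        except (ImportError, ValueError):
            # find_spec can raise for broken or partially imported packages.
            status[name] = False
    return status


if __name__ == "__main__":
    # Import names, which can differ from the pip package names.
    wanted = ["cv2", "PIL", "torch", "transformers", "faster_whisper",
              "langchain", "langchain_groq", "duckduckgo_search", "wikipediaapi"]
    for name, ok in check_dependencies(wanted).items():
        print(f"{'OK     ' if ok else 'MISSING'} {name}")
```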
Running the Agentic System
Option 1: Use Setup Script
setup_agentic_system.bat
Option 2: Manual Setup
# 1. Activate environment
myenv31\Scripts\activate.bat
# 2. Set API key
set GROQ_API_KEY=your_key_here
# 3. Run the daemon
python -m worker.daemon
Option 3: Start Server
start-server.bat
System Architecture
Enhanced Analysis Flow
Video Upload → Agentic Processor → Multi-modal Analysis
        ↓
┌─────────────────────────────────────────────────────┐
│ 1. Audio Analysis (Whisper + Emotion Detection)     │
│ 2. Visual Analysis (Object Detection + OCR)         │
│ 3. Agentic Reasoning (Groq Llama3-8b-8192)          │
│ 4. Web Search Integration                           │
│ 5. Wikipedia Lookups                                │
│ 6. Beautiful Report Generation                      │
│ 7. Enhanced Vector Storage                          │
└─────────────────────────────────────────────────────┘
        ↓
Comprehensive Analysis Report + PDF + Vector Embeddings
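One way to read this flow is as an ordered list of async stages that share a results dictionary. The sketch below illustrates that idea only; `run_pipeline`, `audio_stage`, and `reasoning_stage` are hypothetical names, and the placeholder stages return canned data instead of calling Whisper or Groq.

```python
import asyncio


async def run_pipeline(video_path, stages):
    """Run named async stages in order, passing accumulated results along."""
    results = {"video_path": video_path}
    for name, stage in stages:
        # Each stage sees everything produced by the stages before it.
        results[name] = await stage(results)
    return results


# Placeholder stages; real ones would call Whisper, an object detector, Groq, etc.
async def audio_stage(results):
    return {"transcript": "(transcript of %s)" % results["video_path"]}


async def reasoning_stage(results):
    return {"summary": "Summary based on: " + results["audio"]["transcript"]}


if __name__ == "__main__":
    stages = [("audio", audio_stage), ("reasoning", reasoning_stage)]
    print(asyncio.run(run_pipeline("video.mp4", stages)))
```

Keeping the stages in a list makes it easy to skip or reorder steps (for example, dropping web search when offline) without touching the orchestration loop.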
Key Components
- MultiModalAnalyzer: Handles audio, visual, and text analysis
- AgenticVideoProcessor: Orchestrates the entire analysis pipeline
- MCPToolManager: Manages web search, Wikipedia, and other tools
- Enhanced Vector Storage: Stores analysis with rich metadata
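To make "vector storage with rich metadata" concrete, here is a small in-memory sketch. It is a stand-in for whatever vector database the project actually uses; the `TinyVectorStore` class, its methods, and the metadata keys are all assumptions for illustration.

```python
import math
from dataclasses import dataclass, field


@dataclass
class Record:
    text: str
    vector: list
    metadata: dict = field(default_factory=dict)


class TinyVectorStore:
    """In-memory store; a stand-in for a real vector database."""

    def __init__(self):
        self.records = []

    def add(self, text, vector, **metadata):
        self.records.append(Record(text, list(vector), metadata))

    def search(self, query, k=3, **filters):
        """Return the k most similar records whose metadata matches filters."""
        def cosine(a, b):
            dot = sum(x * y for x, y in zip(a, b))
            norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
            return dot / norm if norm else 0.0

        candidates = [
            r for r in self.records
            if all(r.metadata.get(key) == value for key, value in filters.items())
        ]
        return sorted(candidates, key=lambda r: cosine(query, r.vector), reverse=True)[:k]
```

With metadata such as `modality="audio"` or a segment start time stored next to each embedding, a retrieval step can restrict results to one part of the analysis before ranking by similarity.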
Enhanced Features
Multi-modal Analysis
- Audio: Transcription, emotion detection, speaker identification
- Visual: Object detection, scene understanding, OCR text extraction
- Text: Sentiment analysis, topic extraction, context enrichment
Agentic Capabilities
- Reasoning: Advanced understanding using Groq Llama3
- Context: Web search for additional information
- Knowledge: Wikipedia lookups for detailed explanations
- Insights: Actionable recommendations and analysis
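The context and knowledge capabilities rely on tools the agent can call by name. The sketch below shows one simple way such a registry could look. It is not the actual MCPToolManager; `ToolRegistry` and the stub tools are hypothetical and return canned strings instead of querying the web.

```python
class ToolRegistry:
    """Hypothetical registry mapping tool names to plain Python callables."""

    def __init__(self):
        self._tools = {}

    def register(self, name, func, description=""):
        if name in self._tools:
            raise ValueError(f"tool already registered: {name}")
        self._tools[name] = (func, description)

    def names(self):
        return sorted(self._tools)

    def call(self, name, *args, **kwargs):
        if name not in self._tools:
            raise KeyError(f"unknown tool: {name}")
        func, _ = self._tools[name]
        return func(*args, **kwargs)


# Stub tools; real ones would query a search engine or Wikipedia.
registry = ToolRegistry()
registry.register("web_search", lambda q: f"search results for {q!r}", "Web search")
registry.register("wikipedia", lambda t: f"summary of {t!r}", "Wikipedia lookup")
```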
Beautiful Reports
# Video Analysis Report
## Overview
- **Duration**: 120 seconds
- **Resolution**: 1920x1080
- **Language**: English
## Audio Analysis
### Transcription Summary
[Enhanced transcription with context]
### Key Audio Segments
- **0.0s - 30.0s**: Introduction to the topic
- **30.0s - 60.0s**: Main content discussion
- **60.0s - 90.0s**: Technical details
- **90.0s - 120.0s**: Conclusion and summary
## Visual Analysis
### Scene Breakdown
- **0.0s**: Presenter in office setting
- **30.0s**: Screen sharing with diagrams
- **60.0s**: Close-up of technical specifications
- **90.0s**: Return to presenter view
### Key Visual Elements
- **Person**: appears 45 times
- **Computer**: appears 12 times
- **Text**: appears 8 times
- **Diagram**: appears 5 times
## Key Insights
### Topics Covered
- Artificial Intelligence
- Machine Learning
- Technology Innovation
- Business Applications
### Sentiment Analysis
- **Positive**: 75%
- **Negative**: 10%
- **Neutral**: 15%
### Important Moments
- **15s**: Key insight about AI applications
- **45s**: Technical breakthrough mentioned
- **75s**: Business impact discussion
## Recommendations
Based on the analysis, consider:
- Content engagement opportunities
- Areas for improvement
- Target audience insights
---
*Report generated using Groq Llama3-8b-8192*
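A report of this shape can be produced by filling a Markdown template from structured results. The following sketch covers only a few of the sections above; `render_report` and the dictionary keys it expects are assumptions for illustration, not the project's report generator.

```python
def render_report(analysis):
    """Render a small Markdown report from a plain dict of analysis results."""
    lines = ["# Video Analysis Report", "", "## Overview"]
    lines.append(f"- **Duration**: {analysis['duration_s']} seconds")
    lines.append(f"- **Resolution**: {analysis['width']}x{analysis['height']}")

    lines += ["", "## Key Audio Segments"]
    for start, end, label in analysis["segments"]:
        lines.append(f"- **{start:.1f}s - {end:.1f}s**: {label}")

    lines += ["", "## Sentiment Analysis"]
    # Convert raw counts to percentages; guard against an empty count table.
    total = sum(analysis["sentiment_counts"].values()) or 1
    for name, count in analysis["sentiment_counts"].items():
        lines.append(f"- **{name}**: {round(100 * count / total)}%")
    return "\n".join(lines)
```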
Troubleshooting
Common Issues
GROQ_API_KEY not found
❌ GROQ_API_KEY environment variable not found!
Solution: Set the environment variable or add it to the .env file.
Import errors
❌ Failed to import langchain-groq
Solution: Install with pip install langchain-groq
Agentic analysis fails
Agentic analysis failed, falling back to basic Whisper
Solution: Check the Groq API key and the internet connection.
Memory issues
CUDA out of memory
Solution: Reduce the batch size or use CPU processing.
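For the memory case, switching to CPU usually comes down to choosing the device explicitly. The sketch below assumes the models run on PyTorch (torch appears in the dependency list); `pick_device` and `free_gpu_memory` are hypothetical helper names, and how the chosen device is passed to the analyzers depends on the project code.

```python
def pick_device(prefer_gpu=True):
    """Return "cuda" when PyTorch reports a usable GPU, otherwise "cpu"."""
    if not prefer_gpu:
        return "cpu"
    try:
        import torch  # imported lazily so the sketch also runs without PyTorch
    except ImportError:
        return "cpu"
    return "cuda" if torch.cuda.is_available() else "cpu"


def free_gpu_memory():
    """Release cached GPU memory held by PyTorch, if PyTorch and CUDA are present."""
    try:
        import torch
    except ImportError:
        return
    if torch.cuda.is_available():
        torch.cuda.empty_cache()
```

Calling `pick_device(prefer_gpu=False)` forces CPU processing after an out-of-memory error, at the cost of slower analysis.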
Performance Optimization
- GPU Usage: The system automatically detects and uses CUDA if available
- Batch Processing: Videos are processed one at a time to manage memory
- Caching: Analysis results are cached to avoid reprocessing
- Fallback: System falls back to basic analysis if enhanced features fail
Usage Examples
Basic Usage
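from app.utils.agentic_integration import analyze_with_agentic_capabilities

async def main(session):
    # Process a video with agentic capabilities.
    # `session` is an existing database session supplied by the caller.
    transcription, summary = await analyze_with_agentic_capabilities(
        video_url="https://example.com/video.mp4",
        user_id=1,
        db=session,
    )
    return transcription, summary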
Advanced Usage
from app.utils.enhanced_analysis import MultiModalAnalyzer

async def main():
    # Create an analyzer with custom settings
    analyzer = MultiModalAnalyzer(groq_api_key="your_key")
    # Perform a comprehensive analysis (must be awaited inside an async function)
    analysis = await analyzer.analyze_video_enhanced("video.mp4")
    # Access the results
    print(analysis.formatted_report)
    print(analysis.audio_analysis)
    print(analysis.visual_analysis)
Benefits of Agentic System
- Better Understanding: Multi-modal analysis provides deeper insights
- Context Awareness: Web search and Wikipedia integration
- Beautiful Output: Professional, formatted reports
- Enhanced RAG: Better vector embeddings for retrieval
- Open Model: Uses Meta's openly available Llama 3 8B model, served through Groq as Llama3-8b-8192
- Scalable: Handles multiple video formats and sizes
- Reliable: Fallback to basic analysis if enhanced features fail
Future Enhancements
- Real-time Processing: Stream video analysis
- Custom Models: Integration with custom fine-tuned models
- Advanced OCR: Better text extraction from videos
- Emotion Detection: Advanced audio and visual emotion analysis
- Multi-language: Support for multiple languages
- API Endpoints: REST API for external integration
Support
If you encounter issues:
- Check the troubleshooting section above
- Run python test_agentic_system.py to diagnose issues
- Check the logs in worker.log
- Ensure all dependencies are installed correctly
- Verify your Groq API key is valid and has sufficient credits
Happy analyzing!