peace2024

agentic analysis

eefb74d 2 months ago

	# 🚀 Dubsway Video AI - Groq Agentic System Guide

	## Overview

	This guide covers setting up and running the enhanced agentic video analysis system, which uses Groq to serve the Llama 3 8B model (model ID `llama3-8b-8192`). The system provides:

	- 🤖 Agentic Analysis: Multi-modal video understanding with reasoning capabilities
	- 🎯 MCP/ACP Integration: Model Context Protocol tools for enhanced analysis
	- 🔍 Multi-modal Processing: Audio, visual, and text analysis
	- 🌐 Web Integration: Real-time web search and Wikipedia lookups
	- 📊 Beautiful Reports: Comprehensive, formatted analysis reports
	- 💾 Enhanced Vector Storage: Better RAG capabilities with metadata

	## 🛠️ Setup Instructions

	### 1. Get Groq API Key

	1. Visit [Groq Console](https://console.groq.com/)
	2. Sign up for a free account
	3. Get your API key from the dashboard
	4. Set the environment variable (Windows Command Prompt, current session only):
	```bat
	set GROQ_API_KEY=your_key_here
	```
	Or add it to your `.env` file:
	```
	GROQ_API_KEY=your_key_here
	```
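
As a rough sketch of how a script might resolve the key, the following reads `GROQ_API_KEY` from the environment first and falls back to a `.env` file. The helper names `parse_env_file` and `resolve_groq_api_key` are illustrative and not part of the project; the project itself may well rely on a library such as `python-dotenv` instead.

```python
import os
from pathlib import Path


def parse_env_file(path):
    """Parse simple KEY=value lines, skipping blanks and # comments."""
    values = {}
    env_path = Path(path)
    if not env_path.is_file():
        return values
    for raw_line in env_path.read_text(encoding="utf-8").splitlines():
        line = raw_line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        values[key.strip()] = value.strip().strip('"').strip("'")
    return values


def resolve_groq_api_key(environ=None, env_file=".env"):
    """Return the key from the environment first, then from the .env file."""
    environ = os.environ if environ is None else environ
    key = environ.get("GROQ_API_KEY") or parse_env_file(env_file).get("GROQ_API_KEY")
    if not key:
        raise RuntimeError("GROQ_API_KEY not found in environment or .env file")
    return key
```

Failing early with a clear message keeps a missing key from surfacing later as an opaque API error.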

	### 2. Install Dependencies

	Run the setup script:
	```bat
	setup_agentic_system.bat
	```

	Or manually:
	```bat
	REM Activate virtual environment
	myenv31\Scripts\activate.bat

	REM Install dependencies
	pip install -r requirements.txt

	REM Install Groq specifically
	pip install langchain-groq
	```

	### 3. Test the System

	Run the test script to verify everything is working:
	```bash
	python test_agentic_system.py
	```

	You should see:
	```
	🚀 Dubsway Video AI - Agentic System Test
	============================================================
	📦 Testing Dependencies
	============================================================
	✅ opencv-python
	✅ pillow
	✅ torch
	✅ transformers
	✅ faster_whisper
	✅ langchain
	✅ langchain_groq
	✅ duckduckgo-search
	✅ wikipedia-api

	🧪 Testing Groq Integration for Agentic Video Analysis
	============================================================
	✅ GROQ_API_KEY found
	✅ langchain-groq imported successfully
	✅ Groq test successful: Hello from Groq!

	🔍 Testing Enhanced Analysis Components
	============================================================
	✅ Enhanced analysis imports successful
	✅ MultiModalAnalyzer initialized successfully
	✅ Agent created successfully

	🤖 Testing Agentic Integration
	============================================================
	✅ Agentic integration imports successful
	✅ AgenticVideoProcessor initialized successfully
	✅ MCPToolManager initialized successfully
	✅ 5 tools registered

	🎉 All tests passed! Your agentic system is ready to use.
	```
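
The dependency portion of that output can be approximated with the standard library alone. This is a minimal sketch, not the actual contents of `test_agentic_system.py`; the `REQUIRED_MODULES` mapping and `check_dependencies` are names made up here. Note that several pip package names differ from the module names you import.

```python
import importlib.util

# Import names differ from the pip package names for some dependencies.
REQUIRED_MODULES = {
    "opencv-python": "cv2",
    "pillow": "PIL",
    "torch": "torch",
    "transformers": "transformers",
    "faster_whisper": "faster_whisper",
    "langchain": "langchain",
    "langchain_groq": "langchain_groq",
    "duckduckgo-search": "duckduckgo_search",
    "wikipedia-api": "wikipediaapi",
}


def check_dependencies(modules=REQUIRED_MODULES):
    """Map each package label to True if its module can be located."""
    return {
        label: importlib.util.find_spec(module_name) is not None
        for label, module_name in modules.items()
    }


if __name__ == "__main__":
    for label, available in check_dependencies().items():
        print(f"{'✅' if available else '❌'} {label}")
```

`find_spec` only locates a module without importing it, so the check stays fast even for heavy packages such as `torch`.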

	## 🏃‍♂️ Running the Agentic System

	### Option 1: Use Setup Script
	```bat
	setup_agentic_system.bat
	```

	### Option 2: Manual Setup
	```bat
	REM 1. Activate environment
	myenv31\Scripts\activate.bat

	REM 2. Set API key
	set GROQ_API_KEY=your_key_here

	REM 3. Run the daemon
	python -m worker.daemon
	```

	### Option 3: Start Server
	```bat
	start-server.bat
	```

	## 🔧 System Architecture

	### Enhanced Analysis Flow

	```
	Video Upload → Agentic Processor → Multi-modal Analysis
	                         ↓
	┌──────────────────────────────────────────────────┐
	│ 1. Audio Analysis (Whisper + Emotion Detection)  │
	│ 2. Visual Analysis (Object Detection + OCR)      │
	│ 3. Agentic Reasoning (Groq Llama3-8b-8192)       │
	│ 4. Web Search Integration                        │
	│ 5. Wikipedia Lookups                             │
	│ 6. Beautiful Report Generation                   │
	│ 7. Enhanced Vector Storage                       │
	└──────────────────────────────────────────────────┘
	                         ↓
	Comprehensive Analysis Report + PDF + Vector Embeddings
	```
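
To make the flow concrete, here is a toy version of the first three stages with stub functions. Everything in it is hypothetical: the function names, the return shapes, and the choice to run audio and visual analysis concurrently are assumptions of this sketch, not a description of how `AgenticVideoProcessor` is implemented.

```python
import asyncio


async def analyze_audio(video_path):
    # Stand-in for Whisper transcription and emotion detection.
    return {"transcript": f"transcript of {video_path}"}


async def analyze_visual(video_path):
    # Stand-in for object detection and OCR.
    return {"objects": ["person", "computer"]}


async def reason_over(audio, visual):
    # Stand-in for the LLM reasoning step (plus web and Wikipedia lookups).
    return f"{len(visual['objects'])} object types; {audio['transcript']}"


async def run_pipeline(video_path):
    # Audio and visual analysis do not depend on each other,
    # so this sketch runs them concurrently.
    audio, visual = await asyncio.gather(
        analyze_audio(video_path), analyze_visual(video_path)
    )
    summary = await reason_over(audio, visual)
    return {"audio": audio, "visual": visual, "summary": summary}


result = asyncio.run(run_pipeline("video.mp4"))
print(result["summary"])
```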

	### Key Components

	1. `MultiModalAnalyzer`: Handles audio, visual, and text analysis
	2. `AgenticVideoProcessor`: Orchestrates the entire analysis pipeline
	3. `MCPToolManager`: Manages web search, Wikipedia, and other tools
	4. Enhanced Vector Storage: Stores analysis with rich metadata
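
One way to picture the tool-management role is a small name-to-callable registry. The `ToolRegistry` class below is a hypothetical stand-in written for this guide; it does not reflect the real `MCPToolManager` interface, and the two registered tools are placeholders rather than real search or Wikipedia calls.

```python
from typing import Callable, Dict


class ToolRegistry:
    """Hypothetical stand-in for a tool manager; not the real MCPToolManager."""

    def __init__(self):
        self._tools: Dict[str, Callable[[str], str]] = {}

    def register(self, name, func):
        if name in self._tools:
            raise ValueError(f"tool already registered: {name}")
        self._tools[name] = func

    def call(self, name, query):
        if name not in self._tools:
            raise KeyError(f"unknown tool: {name}")
        return self._tools[name](query)

    def names(self):
        return sorted(self._tools)


registry = ToolRegistry()
# Placeholder tools; real ones would call a search API or Wikipedia.
registry.register("web_search", lambda q: f"search results for {q!r}")
registry.register("wikipedia", lambda q: f"wikipedia summary for {q!r}")
print(registry.names())
```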

	## 📊 Enhanced Features

	### Multi-modal Analysis
	- Audio: Transcription, emotion detection, speaker identification
	- Visual: Object detection, scene understanding, OCR text extraction
	- Text: Sentiment analysis, topic extraction, context enrichment

	### Agentic Capabilities
	- Reasoning: Advanced understanding using Groq Llama3
	- Context: Web search for additional information
	- Knowledge: Wikipedia lookups for detailed explanations
	- Insights: Actionable recommendations and analysis

	### Beautiful Reports
	```
	# 📹 Video Analysis Report

	## 📊 Overview
	- Duration: 120 seconds
	- Resolution: 1920x1080
	- Language: English

	## 🎵 Audio Analysis
	### Transcription Summary
	[Enhanced transcription with context]

	### Key Audio Segments
	- 0.0s - 30.0s: Introduction to the topic
	- 30.0s - 60.0s: Main content discussion
	- 60.0s - 90.0s: Technical details
	- 90.0s - 120.0s: Conclusion and summary

	## 🎬 Visual Analysis
	### Scene Breakdown
	- 0.0s: Presenter in office setting
	- 30.0s: Screen sharing with diagrams
	- 60.0s: Close-up of technical specifications
	- 90.0s: Return to presenter view

	### Key Visual Elements
	- Person: appears 45 times
	- Computer: appears 12 times
	- Text: appears 8 times
	- Diagram: appears 5 times

	## 🎯 Key Insights
	### Topics Covered
	- Artificial Intelligence
	- Machine Learning
	- Technology Innovation
	- Business Applications

	### Sentiment Analysis
	- Positive: 75%
	- Negative: 10%
	- Neutral: 15%

	### Important Moments
	- 15s: Key insight about AI applications
	- 45s: Technical breakthrough mentioned
	- 75s: Business impact discussion

	## 📈 Recommendations
	Based on the analysis, consider:
	- Content engagement opportunities
	- Areas for improvement
	- Target audience insights

	---
	Report generated using Groq Llama3-8b-8192
	```
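
Bullets like those under "Key Visual Elements" and "Key Audio Segments" can be produced from raw detections with a few lines of formatting code. The sketch below assumes per-frame label lists and `(start, end, text)` tuples as input; both shapes and both function names are invented for illustration and may not match the project's report generator.

```python
from collections import Counter


def render_visual_elements(frame_detections, top_n=4):
    """Count labels across frames and format them as report bullets."""
    counts = Counter(label for frame in frame_detections for label in frame)
    return [
        f"- {label.capitalize()}: appears {count} times"
        for label, count in counts.most_common(top_n)
    ]


def render_audio_segments(segments):
    """Format (start, end, text) tuples as timestamped bullets."""
    return [f"- {start:.1f}s - {end:.1f}s: {text}" for start, end, text in segments]


frames = [["person", "computer"], ["person"], ["person", "diagram"]]
print("\n".join(render_visual_elements(frames)))
print("\n".join(render_audio_segments([(0, 30, "Introduction to the topic")])))
```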

	## 🔍 Troubleshooting

	### Common Issues

	1. GROQ_API_KEY not found
	```
	❌ GROQ_API_KEY environment variable not found!
	```
	Solution: Set the environment variable or add it to your `.env` file

	2. Import errors
	```
	❌ Failed to import langchain-groq
	```
	Solution: Install with `pip install langchain-groq`

	3. Agentic analysis fails
	```
	Agentic analysis failed, falling back to basic Whisper
	```
	Solution: Verify that your Groq API key is valid and that the machine has internet access

	4. Memory issues
	```
	CUDA out of memory
	```
	Solution: Reduce the batch size or switch to CPU processing
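
For the memory case, a device picker with a manual CPU override is one possible approach. This is a sketch under assumptions: `pick_device` is not a project function, and the `DUBSWAY_FORCE_CPU` environment variable is a made-up name used only here. `torch.cuda.is_available()` is the standard PyTorch check.

```python
import os


def pick_device(force_cpu=None):
    """Return "cuda" when PyTorch reports a usable GPU, else "cpu"."""
    if force_cpu is None:
        # DUBSWAY_FORCE_CPU is a made-up variable name for this sketch.
        force_cpu = os.environ.get("DUBSWAY_FORCE_CPU") == "1"
    if force_cpu:
        return "cpu"
    try:
        import torch
    except ImportError:
        # Without PyTorch there is no GPU path to choose.
        return "cpu"
    return "cuda" if torch.cuda.is_available() else "cpu"
```

Calling `pick_device(force_cpu=True)` after a `CUDA out of memory` error lets the same code path retry on the CPU.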

	### Performance Optimization

	1. GPU Usage: The system automatically detects and uses CUDA if available
	2. Batch Processing: Videos are processed one at a time to manage memory
	3. Caching: Analysis results are cached to avoid reprocessing
	4. Fallback: System falls back to basic analysis if enhanced features fail

	## 🎯 Usage Examples

	### Basic Usage
	```python
	from app.utils.agentic_integration import analyze_with_agentic_capabilities


	async def process_video(session):
	    # `await` must run inside an async function; `session` is your DB session
	    transcription, summary = await analyze_with_agentic_capabilities(
	        video_url="https://example.com/video.mp4",
	        user_id=1,
	        db=session,
	    )
	    return transcription, summary
	```

	### Advanced Usage
	```python
	import asyncio

	from app.utils.enhanced_analysis import MultiModalAnalyzer


	async def main():
	    # Create analyzer with custom settings
	    analyzer = MultiModalAnalyzer(groq_api_key="your_key")

	    # Perform comprehensive analysis
	    analysis = await analyzer.analyze_video_enhanced("video.mp4")

	    # Access results
	    print(analysis.formatted_report)
	    print(analysis.audio_analysis)
	    print(analysis.visual_analysis)


	# Start an event loop and run the coroutine
	asyncio.run(main())
	```

	## 📈 Benefits of Agentic System

	1. Better Understanding: Multi-modal analysis provides deeper insights
	2. Context Awareness: Web search and Wikipedia integration
	3. Beautiful Output: Professional, formatted reports
	4. Enhanced RAG: Better vector embeddings for retrieval
	5. Open Model: Uses Meta's openly available Llama 3 8B model, served through Groq as `llama3-8b-8192`
	6. Scalable: Handles multiple video formats and sizes
	7. Reliable: Fallback to basic analysis if enhanced features fail

	## 🔮 Future Enhancements

	- Real-time Processing: Stream video analysis
	- Custom Models: Integration with custom fine-tuned models
	- Advanced OCR: Better text extraction from videos
	- Emotion Detection: Advanced audio and visual emotion analysis
	- Multi-language: Support for multiple languages
	- API Endpoints: REST API for external integration

	## 📞 Support

	If you encounter issues:

	1. Check the troubleshooting section above
	2. Run `python test_agentic_system.py` to diagnose issues
	3. Check the logs in `worker.log`
	4. Ensure all dependencies are installed correctly
	5. Verify your Groq API key is valid and has sufficient credits
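
When checking `worker.log`, a short filter can surface the most recent problem lines. The sketch below assumes a plain-text log and guesses at useful keywords; the actual log format is not documented here, and `recent_problem_lines` is a name invented for this example.

```python
from collections import deque
from pathlib import Path


def recent_problem_lines(log_path, keywords=("ERROR", "failed", "Traceback"), limit=20):
    """Return the last `limit` log lines that contain any keyword."""
    matches = deque(maxlen=limit)
    path = Path(log_path)
    if not path.is_file():
        return []
    with path.open(encoding="utf-8", errors="replace") as handle:
        for line in handle:
            if any(keyword in line for keyword in keywords):
                matches.append(line.rstrip("\n"))
    return list(matches)


for line in recent_problem_lines("worker.log"):
    print(line)
```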

	---

	Happy analyzing! 🎉