metadata
title: Enhanced GAIA Agent
emoji: 🤖
colorFrom: blue
colorTo: purple
sdk: gradio
sdk_version: 4.44.1
app_file: app.py
pinned: false
license: mit
hf_oauth: true
Enhanced GAIA Agent - Unified AGNO Architecture with Multimodal Capabilities
This HuggingFace Space contains an enhanced unified GAIA agent with comprehensive AGNO tool integration and multimodal capabilities, designed for optimal performance on the GAIA benchmark.
Features
Core AGNO Tools Integration
- Calculator: Mathematical computations and calculations
- Python: Code execution and data processing
- Wikipedia: Knowledge retrieval and fact checking
- ArXiv: Scientific paper search and analysis
- Firecrawl: Web scraping and content extraction
- Exa: Advanced web search capabilities
- File: File operations and document processing
- Shell: System command execution
Multimodal Capabilities
- Audio Processing: Audio transcription with Faster-Whisper, a European community-driven open-source tool
- Image Analysis: Open-source image understanding and analysis
- Document Processing: Text extraction and analysis from various formats
- Video Analysis: YouTube transcript extraction and analysis
Architecture Highlights
- Single Agent Solution: Unified architecture handling all GAIA task types
- AGNO Native Orchestration: Intelligent tool selection and coordination
- Open Source: No dependency on proprietary APIs for core functionality
- Deployment Ready: Optimized for HuggingFace Space deployment
- Response Format Compliance: Compatible with HF evaluation system
🛠️ Setup
Required Environment Variables (HuggingFace Spaces Secrets)
Set these as secrets in your HuggingFace Space:
MISTRAL_API_KEY=your_mistral_api_key_here
EXA_API_KEY=your_exa_api_key_here
FIRECRAWL_API_KEY=your_firecrawl_api_key_here
Optional Environment Variables
OPENAI_API_KEY=your_openai_api_key_here # For enhanced multimodal features
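A startup check can report missing secrets before the agent runs. The sketch below is a minimal illustration, not the Space's actual validation code: the key names come from the lists above, while `missing_keys` and the two tuples are hypothetical names introduced here.

```python
import os
from typing import Mapping

# Key names are taken from this README; the helper itself is hypothetical.
REQUIRED_KEYS = ("MISTRAL_API_KEY", "EXA_API_KEY", "FIRECRAWL_API_KEY")
OPTIONAL_KEYS = ("OPENAI_API_KEY",)


def missing_keys(env: Mapping[str, str], names=REQUIRED_KEYS) -> list[str]:
    """Return the names that are unset or blank in `env`, in declaration order."""
    return [name for name in names if not env.get(name, "").strip()]


if __name__ == "__main__":
    missing = missing_keys(os.environ)
    if missing:
        print("Missing required secrets: " + ", ".join(missing))
    for name in missing_keys(os.environ, OPTIONAL_KEYS):
        print(f"Optional secret not set: {name}")
```

Passing the environment in as a mapping keeps the check easy to test without touching real secrets.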
Usage Instructions
- Login: Click the "Login with Hugging Face" button
- Run Evaluation: Click "Run Evaluation & Submit All Answers"
- View Results: Monitor the status and see your agent's performance
🏗️ Architecture
Agent Structure
Enhanced GAIA Agent
├── Enhanced Unified AGNO Agent (Primary)
│   ├── All AGNO Tools (8 tools)
│   ├── European Open-Source Multimodal Tools (3 tools)
│   └── Response Formatting
├── Utility Modules
│   ├── Response Formatter
│   ├── Question Classifier
│   └── Answer Formatter
└── Provider Integrations
    ├── Search Providers
    ├── EXA Provider
    └── Data Sources
Key Components
Enhanced Unified AGNO Agent
- File: agents/enhanced_unified_agno_agent.py
- Purpose: Main agent with comprehensive tool integration
- Capabilities: Handles all GAIA task types with intelligent tool orchestration
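In this Space, tool selection is delegated to AGNO and the underlying model. Purely to illustrate the idea of mapping a question to candidate tools, here is a keyword-based sketch. The rules, the `candidate_tools` function, and the short tool labels are all assumptions made for this example and do not reflect the agent's real logic.

```python
import re

# Hypothetical keyword rules; the labels loosely mirror the feature list above.
RULES = [
    (re.compile(r"youtube\.com|youtu\.be", re.I), "video"),
    (re.compile(r"\.(mp3|wav|m4a)\b", re.I), "audio"),
    (re.compile(r"\.(png|jpe?g|gif)\b", re.I), "image"),
    (re.compile(r"arxiv|paper|preprint", re.I), "arxiv"),
    (re.compile(r"\d\s*[-+*/^]\s*\d|\bsum\b|\baverage\b", re.I), "calculator"),
]


def candidate_tools(question: str, default: str = "wikipedia") -> list[str]:
    """Return the labels of all rules that match, or [default] if none match."""
    hits = [tool for pattern, tool in RULES if pattern.search(question)]
    return hits or [default]
```

A rule table like this is easy to audit, but it is brittle compared with letting the model choose tools from their descriptions.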
Multimodal Agent
- File: agents/mistral_multimodal_agent.py
- Purpose: Open-source multimodal processing
- Capabilities: Audio, image, and document analysis
Response Formatting
- File: utils/response_formatter.py
- Purpose: Ensures GAIA-compliant response formatting
- Features: Automatic answer extraction and validation
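Answer extraction could look roughly like the sketch below. It assumes the model is prompted to end its reply with a `FINAL ANSWER:` marker, the convention used in the GAIA paper's prompt. The `extract_answer` function is a hypothetical stand-in, not the contents of utils/response_formatter.py.

```python
import re

# Assumption: replies end with "FINAL ANSWER: <answer>".
MARKER = re.compile(r"final answer\s*:", re.IGNORECASE)


def extract_answer(response: str) -> str:
    """Return the first non-empty line after the last marker.

    If no marker is present, fall back to the whole response collapsed
    onto one line. Surrounding quotes and backticks are stripped.
    """
    matches = list(MARKER.finditer(response))
    tail = response[matches[-1].end():] if matches else response
    lines = [line.strip() for line in tail.splitlines() if line.strip()]
    if not lines:
        return ""
    answer = lines[0] if matches else " ".join(lines)
    return answer.strip("\"'` ")
```

Using the last marker rather than the first guards against replies that quote the template while reasoning.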
🔧 Technical Details
Dependencies
- Core Framework: Gradio 4.44.1, AGNO 1.5.4+
- AI Models: Mistral API, Faster-Whisper
- Web Tools: Firecrawl, EXA, DuckDuckGo
- Knowledge: Wikipedia, ArXiv
- Utilities: Pandas, NumPy, Requests
Performance Optimizations
- Single Agent Architecture: Reduces complexity and improves reliability
- AGNO Native Orchestration: Leverages built-in tool coordination
- Open Source Models: Reduces API dependencies and costs
- Efficient Error Handling: Graceful fallbacks and error recovery
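The fallback idea in the last bullet can be sketched as a loop that tries tools in order and moves on when one fails. This is an illustrative pattern only: `first_success` and the tool callables are hypothetical and are not part of AGNO or this Space's code.

```python
from typing import Callable, Iterable, Optional


def first_success(
    question: str,
    tools: Iterable[tuple[str, Callable[[str], str]]],
) -> tuple[Optional[str], list[str]]:
    """Try each (name, tool) pair in order and return (result, error log).

    The result is None when every tool fails or returns an empty value.
    """
    errors: list[str] = []
    for name, tool in tools:
        try:
            result = tool(question)
        except Exception as exc:  # broad on purpose: any failure triggers the fallback
            errors.append(f"{name}: {exc}")
            continue
        if result:
            return result, errors
        errors.append(f"{name}: empty result")
    return None, errors
```

Returning the error log alongside the result lets the caller record why earlier tools were skipped.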
🧪 Testing
The system includes comprehensive testing:
- Integration Tests: Full system validation
- Tool Tests: Individual tool functionality
- Multimodal Tests: Audio and image processing
- Deployment Tests: HuggingFace Space compatibility
Performance
GAIA Benchmark Capabilities
- Level 1: Basic reasoning and knowledge retrieval
- Level 2: Multi-step reasoning with tool usage
- Level 3: Complex multimodal and multi-tool coordination
Tool Coverage
- Text Processing: 100% coverage with multiple tools
- Mathematical: Calculator + Python execution
- Knowledge: Wikipedia + ArXiv + Web search
- Multimodal: Audio transcription + Image analysis
- Web: Firecrawl + EXA + DuckDuckGo
Deployment
HuggingFace Space Deployment
- Clone Repository: Copy all files to your HF Space
- Set Secrets: Configure API keys in Space settings
- Deploy: Space will automatically build and deploy
- Test: Use the interface to validate functionality
Local Development
# Install dependencies
pip install -r requirements.txt
# Set environment variables
export MISTRAL_API_KEY="your_key_here"
export EXA_API_KEY="your_key_here"
export FIRECRAWL_API_KEY="your_key_here"
# Run locally
python app.py
Monitoring
The system includes built-in monitoring:
- Environment Validation: API key verification
- Tool Availability: Real-time tool status
- Error Tracking: Comprehensive error logging
- Performance Metrics: Response time and success rates
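Response time and success rate can be tracked with a small in-memory recorder. The sketch below is one possible shape for this, not the Space's monitoring code. `RunStats` and `timed` are hypothetical names.

```python
import time
from dataclasses import dataclass, field


@dataclass
class RunStats:
    """Accumulates per-call durations and a failure count."""
    durations: list[float] = field(default_factory=list)
    failures: int = 0

    def record(self, seconds: float, ok: bool) -> None:
        self.durations.append(seconds)
        if not ok:
            self.failures += 1

    @property
    def success_rate(self) -> float:
        total = len(self.durations)
        return (total - self.failures) / total if total else 0.0

    @property
    def mean_seconds(self) -> float:
        return sum(self.durations) / len(self.durations) if self.durations else 0.0


def timed(stats: RunStats, fn, *args):
    """Call fn(*args), record its duration and outcome, and re-raise on error."""
    start = time.perf_counter()
    try:
        result = fn(*args)
    except Exception:
        stats.record(time.perf_counter() - start, ok=False)
        raise
    stats.record(time.perf_counter() - start, ok=True)
    return result
```

Wrapping each question in `timed(stats, agent_fn, question)` would then give both metrics after a run, where `agent_fn` stands for whatever callable answers a question.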
🤝 Contributing
This is a deployment-ready system optimized for the GAIA benchmark. Contributions are welcome in the following areas:
- Tool Enhancement: Add new AGNO tools or improve existing ones
- Multimodal Expansion: Integrate additional open-source models
- Performance Optimization: Improve response times and accuracy
- Error Handling: Enhance robustness and fallback mechanisms
License
MIT License - See LICENSE file for details.
Links
- GAIA Benchmark: Official GAIA Repository
- AGNO Framework: AGNO Documentation
- HuggingFace Spaces: Spaces Documentation
Note: This system is optimized for the GAIA benchmark and requires proper API key configuration for full functionality.