title: Enhanced GAIA Agent
emoji: 🤖
colorFrom: blue
colorTo: purple
sdk: gradio
sdk_version: 4.44.1
app_file: app.py
pinned: false
license: mit
hf_oauth: true

Enhanced GAIA Agent - Unified AGNO Architecture with Multimodal Capabilities

This HuggingFace Space hosts a single, unified GAIA agent that combines eight AGNO tools with multimodal processing (audio, image, document, and video) to answer questions from the GAIA benchmark.

🚀 Features

Core AGNO Tools Integration

Calculator: Mathematical computations and calculations
Python: Code execution and data processing
Wikipedia: Knowledge retrieval and fact checking
ArXiv: Scientific paper search and analysis
Firecrawl: Web scraping and content extraction
Exa: Advanced web search capabilities
File: File operations and document processing
Shell: System command execution

Multimodal Capabilities

Audio Processing: Audio transcription with Faster-Whisper, a European, community-driven open-source project
Image Analysis: Open-source image understanding and analysis
Document Processing: Text extraction and analysis from various formats
Video Analysis: YouTube transcript extraction and analysis
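
As an illustration of the audio step, here is a minimal sketch of how a transcript could be assembled with Faster-Whisper. It assumes the `faster-whisper` package and its `WhisperModel.transcribe` call, which returns a lazy sequence of segments plus an info object. The helper names, the `"base"` model size, and the `int8` compute type are assumptions for illustration and are not taken from this repository.

```python
def transcribe_audio(model, audio_path):
    """Join the text of all segments returned by a Faster-Whisper style model.

    `model` only needs a transcribe(path) -> (segments, info) method, which is
    the shape of faster_whisper.WhisperModel.transcribe.
    """
    segments, _info = model.transcribe(audio_path)
    # The segments are produced lazily, so iterating here drives the decoding.
    return " ".join(segment.text.strip() for segment in segments)


def load_default_model():
    """Build a small CPU model. Model size and compute type are assumptions."""
    # Imported lazily so the rest of the module works without the dependency.
    from faster_whisper import WhisperModel

    return WhisperModel("base", device="cpu", compute_type="int8")
```

A call would then look like `transcribe_audio(load_default_model(), "question_audio.mp3")`, where the file name is a placeholder.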

Architecture Highlights

Single Agent Solution: Unified architecture handling all GAIA task types
AGNO Native Orchestration: Intelligent tool selection and coordination
Open Source: No dependency on proprietary APIs for core functionality
Deployment Ready: Optimized for HuggingFace Space deployment
Response Format Compliance: Compatible with HF evaluation system

🛠️ Setup

Required Environment Variables (HuggingFace Spaces Secrets)

Set these as secrets in your HuggingFace Space:

MISTRAL_API_KEY=your_mistral_api_key_here
EXA_API_KEY=your_exa_api_key_here
FIRECRAWL_API_KEY=your_firecrawl_api_key_here

Optional Environment Variables

OPENAI_API_KEY=your_openai_api_key_here  # For enhanced multimodal features
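
To check the configuration before running the agent, a small helper along these lines could report which keys are missing. The variable names come from the lists above. The function itself is a hypothetical sketch, not the validation code shipped in this Space.

```python
import os

# Key names are taken from the setup section; the helper is illustrative.
REQUIRED_KEYS = ("MISTRAL_API_KEY", "EXA_API_KEY", "FIRECRAWL_API_KEY")
OPTIONAL_KEYS = ("OPENAI_API_KEY",)


def check_environment(env=None):
    """Return (missing_required, missing_optional) as two lists of key names.

    A key counts as missing when it is unset or contains only whitespace.
    """
    env = os.environ if env is None else env
    missing_required = [k for k in REQUIRED_KEYS if not env.get(k, "").strip()]
    missing_optional = [k for k in OPTIONAL_KEYS if not env.get(k, "").strip()]
    return missing_required, missing_optional
```

Calling `check_environment()` at startup and stopping when the first list is non-empty gives a clear failure message instead of a tool error later in a run.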

📋 Usage Instructions

Login: Click the "Login with Hugging Face" button
Run Evaluation: Click "Run Evaluation & Submit All Answers"
View Results: Monitor the status and see your agent's performance

🏗️ Architecture

Agent Structure

Enhanced GAIA Agent
├── Enhanced Unified AGNO Agent (Primary)
│   ├── All AGNO Tools (8 tools)
│   ├── European Open-Source Multimodal Tools (3 tools)
│   └── Response Formatting
├── Utility Modules
│   ├── Response Formatter
│   ├── Question Classifier
│   └── Answer Formatter
└── Provider Integrations
    ├── Search Providers
    ├── EXA Provider
    └── Data Sources
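
The Question Classifier in the tree above is not described further in this README. As a rough idea of what such a module might do, the sketch below routes a question to one of the categories used in the Tool Coverage section with simple keyword patterns. The patterns, the category order, and the function name are all assumptions. The real classifier may work quite differently.

```python
import re

# Hypothetical routing table. Earlier entries win when several patterns match.
ROUTES = [
    ("multimodal", re.compile(r"\b(audio|mp3|wav|image|picture|photo|video|youtube)\b", re.I)),
    ("mathematical", re.compile(r"\b(calculate|compute|sum|average|percent(age)?|how many)\b", re.I)),
    ("web", re.compile(r"\b(website|url|web ?page|https?)\b", re.I)),
    ("knowledge", re.compile(r"\b(wikipedia|arxiv|paper|who|when|where)\b", re.I)),
]


def classify_question(question):
    """Return the first matching category, or "text" when nothing matches."""
    for category, pattern in ROUTES:
        if pattern.search(question):
            return category
    return "text"
```

Keyword routing like this is cheap and predictable, but it misses paraphrases, so it is best seen as a hint for tool selection rather than a hard gate.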

Key Components

Enhanced Unified AGNO Agent

File: agents/enhanced_unified_agno_agent.py
Purpose: Main agent with comprehensive tool integration
Capabilities: Handles all GAIA task types with intelligent tool orchestration

Multimodal Agent

File: agents/mistral_multimodal_agent.py
Purpose: Open-source multimodal processing
Capabilities: Audio, image, and document analysis

Response Formatting

File: utils/response_formatter.py
Purpose: Ensures GAIA-compliant response formatting
Features: Automatic answer extraction and validation
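
The answer extraction and validation step could look roughly like the sketch below. It assumes the model is prompted to end its reply with a `FINAL ANSWER:` marker, as in the prompt published with the GAIA benchmark. The function names and cleaning rules are illustrative and are not the contents of `utils/response_formatter.py`.

```python
import re


def extract_final_answer(raw_response):
    """Return the text after the last 'FINAL ANSWER:' marker, lightly cleaned.

    Falls back to the whole response when no marker is present.
    """
    parts = re.split(r"final answer\s*:", raw_response, flags=re.IGNORECASE)
    answer = parts[-1] if len(parts) > 1 else raw_response
    answer = answer.strip()
    # Keep only the first line; later lines are usually commentary.
    answer = answer.splitlines()[0].strip() if answer else ""
    # Drop wrapping quotes, backticks, and markdown emphasis characters.
    return answer.strip("\"'`* ")


def is_valid_answer(answer):
    """Reject empty answers and answers that still contain the marker."""
    return bool(answer) and "final answer" not in answer.lower()
```

Because GAIA answers are scored by matching against a reference, stripping decoration before submission matters more than it would in a chat setting.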

🔧 Technical Details

Dependencies

Core Framework: Gradio 4.44.1, AGNO 1.5.4+
AI Models: Mistral API, Faster-Whisper
Web Tools: Firecrawl, EXA, DuckDuckGo
Knowledge: Wikipedia, ArXiv
Utilities: Pandas, NumPy, Requests

Performance Optimizations

Single Agent Architecture: Reduces complexity and improves reliability
AGNO Native Orchestration: Leverages built-in tool coordination
Open Source Models: Reduces API dependencies and costs
Efficient Error Handling: Graceful fallbacks and error recovery
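
The graceful-fallback idea can be sketched as trying a list of providers in order and moving on when one fails or returns nothing. The function below is a hypothetical illustration. The provider callables, the logger name, and the default value are assumptions rather than this project's actual error-handling code.

```python
import logging

logger = logging.getLogger("gaia_agent")


def run_with_fallbacks(query, providers, default="unknown"):
    """Try each (name, callable) pair in order.

    Returns (provider_name, result) for the first non-empty result, or
    (None, default) when every provider fails or returns nothing.
    """
    for name, provider in providers:
        try:
            result = provider(query)
        except Exception as exc:  # broad on purpose: any failure triggers fallback
            logger.warning("Provider %s failed: %s", name, exc)
            continue
        if result:
            return name, result
    return None, default
```

For web search, for example, the list could be ordered EXA, then Firecrawl, then DuckDuckGo, so that a missing key or a rate limit on one service does not end the run.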

🧪 Testing

The system includes comprehensive testing:

Integration Tests: Full system validation
Tool Tests: Individual tool functionality
Multimodal Tests: Audio and image processing
Deployment Tests: HuggingFace Space compatibility

📊 Performance

GAIA Benchmark Capabilities

Level 1: Basic reasoning and knowledge retrieval
Level 2: Multi-step reasoning with tool usage
Level 3: Complex multimodal and multi-tool coordination

Tool Coverage

Text Processing: 100% coverage with multiple tools
Mathematical: Calculator + Python execution
Knowledge: Wikipedia + ArXiv + Web search
Multimodal: Audio transcription + Image analysis
Web: Firecrawl + EXA + DuckDuckGo

🚀 Deployment

HuggingFace Space Deployment

Clone Repository: Copy all files to your HF Space
Set Secrets: Configure API keys in Space settings
Deploy: Space will automatically build and deploy
Test: Use the interface to validate functionality

Local Development

# Install dependencies
pip install -r requirements.txt

# Set environment variables
export MISTRAL_API_KEY="your_key_here"
export EXA_API_KEY="your_key_here"
export FIRECRAWL_API_KEY="your_key_here"

# Run locally
python app.py

📈 Monitoring

The system includes built-in monitoring:

Environment Validation: API key verification
Tool Availability: Real-time tool status
Error Tracking: Comprehensive error logging
Performance Metrics: Response time and success rates
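
Response time and success rate per tool can be tracked with very little code. The sketch below is one possible shape for such a tracker. The class and method names are hypothetical and do not describe the monitoring module in this repository.

```python
import time
from dataclasses import dataclass, field


@dataclass
class ToolMetrics:
    calls: int = 0
    failures: int = 0
    total_seconds: float = 0.0

    @property
    def success_rate(self):
        return 0.0 if self.calls == 0 else (self.calls - self.failures) / self.calls

    @property
    def mean_seconds(self):
        return 0.0 if self.calls == 0 else self.total_seconds / self.calls


@dataclass
class MetricsRegistry:
    tools: dict = field(default_factory=dict)

    def timed_call(self, name, func, *args, **kwargs):
        """Run func, recording duration and failure under the given tool name."""
        metrics = self.tools.setdefault(name, ToolMetrics())
        start = time.perf_counter()
        try:
            return func(*args, **kwargs)
        except Exception:
            metrics.failures += 1
            raise  # the caller still sees the original error
        finally:
            metrics.calls += 1
            metrics.total_seconds += time.perf_counter() - start
```

Wrapping each tool invocation in `registry.timed_call("wikipedia", tool_fn, query)` would then make `registry.tools["wikipedia"].success_rate` available for a status panel. The tool name and `tool_fn` here are placeholders.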

🤝 Contributing

This is a deployment-ready system optimized for the GAIA benchmark. Contributions are welcome in the following areas:

Tool Enhancement: Add new AGNO tools or improve existing ones
Multimodal Expansion: Integrate additional open-source models
Performance Optimization: Improve response times and accuracy
Error Handling: Enhance robustness and fallback mechanisms

📄 License

MIT License - See LICENSE file for details.

🔗 Links

GAIA Benchmark: Official GAIA Repository
AGNO Framework: AGNO Documentation
HuggingFace Spaces: Spaces Documentation

Note: This system is optimized for the GAIA benchmark and requires proper API key configuration for full functionality.