---
title: Enhanced GAIA Agent
emoji: 🤖
colorFrom: blue
colorTo: purple
sdk: gradio
sdk_version: 4.44.1
app_file: app.py
pinned: false
license: mit
hf_oauth: true
---

# Enhanced GAIA Agent - Unified AGNO Architecture with Multimodal Capabilities

This HuggingFace Space hosts a single, unified agent built for the GAIA benchmark. It combines AGNO tool integration with multimodal processing (audio, image, document, and video).

## 🚀 Features

### Core AGNO Tools Integration
- **Calculator**: Mathematical computations and calculations
- **Python**: Code execution and data processing
- **Wikipedia**: Knowledge retrieval and fact checking
- **ArXiv**: Scientific paper search and analysis
- **Firecrawl**: Web scraping and content extraction
- **Exa**: Advanced web search capabilities
- **File**: File operations and document processing
- **Shell**: System command execution

### Multimodal Capabilities
- **Audio Processing**: Audio transcription with Faster-Whisper (European, community-driven, open source)
- **Image Analysis**: Open-source image understanding and analysis
- **Document Processing**: Text extraction and analysis from various formats
- **Video Analysis**: YouTube transcript extraction and analysis

### Architecture Highlights
- **Single Agent Solution**: Unified architecture handling all GAIA task types
- **AGNO Native Orchestration**: Intelligent tool selection and coordination
- **Open Source**: Multimodal processing relies on open-source models; the Mistral, Exa, and Firecrawl integrations still require API keys (see Setup)
- **Deployment Ready**: Optimized for HuggingFace Space deployment
- **Response Format Compliance**: Compatible with HF evaluation system

## 🛠️ Setup

### Required Environment Variables (HuggingFace Spaces Secrets)

Set these as secrets in your HuggingFace Space:

```
MISTRAL_API_KEY=your_mistral_api_key_here
EXA_API_KEY=your_exa_api_key_here
FIRECRAWL_API_KEY=your_firecrawl_api_key_here
```

### Optional Environment Variables
```
OPENAI_API_KEY=your_openai_api_key_here  # For enhanced multimodal features
```
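
The variables above can be checked before the agent starts, so a missing secret fails fast instead of surfacing mid-evaluation. The following is a minimal sketch, not the project's actual validation code; `check_environment` is a hypothetical helper, and only the variable names come from this README.

```python
import os

# Variable names taken from the Setup section of this README.
REQUIRED_KEYS = ("MISTRAL_API_KEY", "EXA_API_KEY", "FIRECRAWL_API_KEY")
OPTIONAL_KEYS = ("OPENAI_API_KEY",)


def check_environment(env=None):
    """Return (missing_required, missing_optional) as lists of variable names.

    Hypothetical helper for illustration. A variable counts as missing when
    it is unset or contains only whitespace.
    """
    env = os.environ if env is None else env
    missing_required = [k for k in REQUIRED_KEYS if not env.get(k, "").strip()]
    missing_optional = [k for k in OPTIONAL_KEYS if not env.get(k, "").strip()]
    return missing_required, missing_optional
```

At startup, one option is to call `check_environment()` and abort with a clear message if the first list is non-empty, while only logging a warning for the second.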

## 📋 Usage Instructions

1. **Login**: Click the "Login with Hugging Face" button
2. **Run Evaluation**: Click "Run Evaluation & Submit All Answers"
3. **View Results**: Monitor the status and see your agent's performance

## 🏗️ Architecture

### Agent Structure
```
Enhanced GAIA Agent
├── Enhanced Unified AGNO Agent (Primary)
│   ├── All AGNO Tools (8 tools)
│   ├── European Open-Source Multimodal Tools (3 tools)
│   └── Response Formatting
├── Utility Modules
│   ├── Response Formatter
│   ├── Question Classifier
│   └── Answer Formatter
└── Provider Integrations
    ├── Search Providers
    ├── EXA Provider
    └── Data Sources
```

### Key Components

#### Enhanced Unified AGNO Agent
- **File**: `agents/enhanced_unified_agno_agent.py`
- **Purpose**: Main agent with comprehensive tool integration
- **Capabilities**: Handles all GAIA task types with intelligent tool orchestration

#### Multimodal Agent
- **File**: `agents/mistral_multimodal_agent.py`
- **Purpose**: Open-source multimodal processing
- **Capabilities**: Audio, image, and document analysis
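
To illustrate how an attachment might be routed to audio, image, or document handling, here is a small sketch that picks a modality from the file extension. The mapping and the `detect_modality` name are assumptions made for this example; the actual agent may classify inputs differently.

```python
from pathlib import Path

# Assumption: an illustrative extension-to-modality table, not the project's own.
MODALITY_BY_EXTENSION = {
    ".mp3": "audio", ".wav": "audio", ".m4a": "audio",
    ".png": "image", ".jpg": "image", ".jpeg": "image",
    ".pdf": "document", ".txt": "document", ".docx": "document",
}


def detect_modality(file_name: str) -> str:
    """Return 'audio', 'image', 'document', or 'unknown' for a file name."""
    # Only the final suffix is considered, compared case-insensitively.
    suffix = Path(file_name).suffix.lower()
    return MODALITY_BY_EXTENSION.get(suffix, "unknown")
```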

#### Response Formatting
- **File**: `utils/response_formatter.py`
- **Purpose**: Ensures GAIA-compliant response formatting
- **Features**: Automatic answer extraction and validation
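
As a rough illustration of answer extraction, the sketch below pulls the text that follows a `FINAL ANSWER:` marker, the template used in the GAIA prompt. It is not the code in `utils/response_formatter.py`; the function name and the fallback to the last non-empty line are assumptions.

```python
import re

# Assumption: the model ends its reply with "FINAL ANSWER: ...".
_FINAL_ANSWER_RE = re.compile(r"FINAL ANSWER:\s*(.+)", re.IGNORECASE)


def extract_final_answer(response: str) -> str:
    """Return the text after the last 'FINAL ANSWER:' marker, lightly cleaned."""
    matches = _FINAL_ANSWER_RE.findall(response)
    if matches:
        answer = matches[-1]
    else:
        # No marker found: fall back to the last non-empty line, if any.
        lines = [ln for ln in response.splitlines() if ln.strip()]
        answer = lines[-1] if lines else ""
    # Remove surrounding whitespace and quotes; leave the content untouched.
    return answer.strip().strip("\"'").strip()
```

Taking the last match rather than the first is a deliberate choice here: a model that revises its answer mid-response usually states the corrected one last.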

## 🔧 Technical Details

### Dependencies
- **Core Framework**: Gradio 4.44.1, AGNO 1.5.4+
- **AI Models**: Mistral API, Faster-Whisper
- **Web Tools**: Firecrawl, EXA, DuckDuckGo
- **Knowledge**: Wikipedia, ArXiv
- **Utilities**: Pandas, NumPy, Requests

### Performance Optimizations
- **Single Agent Architecture**: Reduces complexity and improves reliability
- **AGNO Native Orchestration**: Leverages built-in tool coordination
- **Open Source Models**: Reduces API dependencies and costs
- **Efficient Error Handling**: Graceful fallbacks and error recovery
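
The graceful-fallback idea can be sketched as trying a list of handlers in order and moving on when one raises or returns nothing. This is a generic pattern under assumed names (`run_with_fallbacks`, the logger name), not the project's implementation.

```python
import logging
from typing import Callable, Optional, Sequence

logger = logging.getLogger("gaia_agent.fallback")  # hypothetical logger name


def run_with_fallbacks(
    query: str,
    handlers: Sequence[Callable[[str], str]],
    default: Optional[str] = None,
) -> Optional[str]:
    """Try each handler in order and return the first non-empty result."""
    for handler in handlers:
        try:
            result = handler(query)
        except Exception as exc:  # broad on purpose: any failure triggers the next handler
            logger.warning("%s failed: %s", getattr(handler, "__name__", handler), exc)
            continue
        if result:
            return result
    # Every handler failed or returned an empty result.
    return default
```

For example, a web-search handler could be listed first with a Wikipedia lookup behind it, so a rate-limited search does not end the run.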

## 🧪 Testing

The system includes comprehensive testing:
- **Integration Tests**: Full system validation
- **Tool Tests**: Individual tool functionality
- **Multimodal Tests**: Audio and image processing
- **Deployment Tests**: HuggingFace Space compatibility

## 📊 Performance

### GAIA Benchmark Capabilities
- **Level 1**: Basic reasoning and knowledge retrieval
- **Level 2**: Multi-step reasoning with tool usage
- **Level 3**: Complex multimodal and multi-tool coordination

### Tool Coverage
- **Text Processing**: Covered by multiple tools
- **Mathematical**: Calculator + Python execution
- **Knowledge**: Wikipedia + ArXiv + Web search
- **Multimodal**: Audio transcription + Image analysis
- **Web**: Firecrawl + EXA + DuckDuckGo

## 🚀 Deployment

### HuggingFace Space Deployment
1. **Add Files**: Copy all repository files to your HF Space
2. **Set Secrets**: Configure API keys in Space settings
3. **Deploy**: Space will automatically build and deploy
4. **Test**: Use the interface to validate functionality

### Local Development
```bash
# Install dependencies
pip install -r requirements.txt

# Set environment variables
export MISTRAL_API_KEY="your_key_here"
export EXA_API_KEY="your_key_here"
export FIRECRAWL_API_KEY="your_key_here"

# Run locally
python app.py
```

## 📈 Monitoring

The system includes built-in monitoring:
- **Environment Validation**: API key verification
- **Tool Availability**: Real-time tool status
- **Error Tracking**: Comprehensive error logging
- **Performance Metrics**: Response time and success rates
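
One way to collect response times and success rates per tool is a small context manager that wraps each call. The sketch below uses only the standard library; `ToolMetrics` is a hypothetical class for illustration and does not describe the built-in monitoring.

```python
import time
from collections import defaultdict
from contextlib import contextmanager


class ToolMetrics:
    """Hypothetical in-memory tracker for call counts, failures, and timing."""

    def __init__(self):
        self._calls = defaultdict(int)
        self._failures = defaultdict(int)
        self._seconds = defaultdict(float)

    @contextmanager
    def track(self, tool_name):
        start = time.perf_counter()
        try:
            yield
        except Exception:
            # Count the failure, then let the caller handle the exception.
            self._failures[tool_name] += 1
            raise
        finally:
            self._calls[tool_name] += 1
            self._seconds[tool_name] += time.perf_counter() - start

    def summary(self):
        """Return per-tool call count, success rate, and average duration."""
        return {
            name: {
                "calls": calls,
                "success_rate": (calls - self._failures[name]) / calls,
                "avg_seconds": self._seconds[name] / calls,
            }
            for name, calls in self._calls.items()
        }
```

A call site would wrap each tool invocation in `with metrics.track("wikipedia"):` and read `metrics.summary()` after the run.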

## 🤝 Contributing

This is a deployment-ready system optimized for the GAIA benchmark. For improvements:
1. **Tool Enhancement**: Add new AGNO tools or improve existing ones
2. **Multimodal Expansion**: Integrate additional open-source models
3. **Performance Optimization**: Improve response times and accuracy
4. **Error Handling**: Enhance robustness and fallback mechanisms

## 📄 License

MIT License - See LICENSE file for details.

## 🔗 Links

- **GAIA Benchmark**: [Official GAIA Repository](https://github.com/gaia-benchmark/gaia)
- **AGNO Framework**: [AGNO Documentation](https://github.com/phidatahq/agno)
- **HuggingFace Spaces**: [Spaces Documentation](https://huggingface.co/docs/hub/spaces)

---

**Note**: This system is optimized for the GAIA benchmark and requires proper API key configuration for full functionality.