---
title: NovaEval by Noveum.ai
emoji: ⚡
colorFrom: purple
colorTo: blue
sdk: docker
pinned: false
---
# NovaEval by Noveum.ai
Advanced AI Model Evaluation Platform powered by Hugging Face Models
## 🚀 Features
### 🤖 **Comprehensive Model Selection**
- **15+ Top Hugging Face Models** across different size categories
- **Real-time Model Search** with provider filtering
- **Detailed Model Information** including capabilities, size, and provider
- **Size-based Filtering** (Small 1-3B, Medium 7B, Large 14B+)
### 📊 **Rich Dataset Collection**
- **11 Evaluation Datasets** covering reasoning, knowledge, math, code, and language
- **Category-based Filtering** for easy dataset discovery
- **Detailed Dataset Information** including sample counts and difficulty levels
- **Popular Benchmarks** like MMLU, HellaSwag, GSM8K, HumanEval
### ⚡ **Advanced Evaluation Engine**
- **Real-time Progress Tracking** with WebSocket updates
- **Live Evaluation Logs** showing detailed request/response data
- **Multiple Metrics Support** (Accuracy, F1-Score, BLEU, ROUGE, Pass@K)
- **Configurable Parameters** (sample size, temperature, max tokens)
### 🎨 **Modern User Interface**
- **Responsive Design** optimized for desktop and mobile
- **Interactive Model Cards** with hover effects and selection states
- **Real-time Configuration** with sliders and checkboxes
- **Professional Gradient Design** with smooth animations
## 🔧 **Technical Stack**
- **Backend**: FastAPI + Python 3.11
- **Frontend**: HTML5 + Tailwind CSS + Vanilla JavaScript
- **Real-time**: WebSocket for live updates
- **Models**: Hugging Face Inference API (free tier)
- **Deployment**: Docker + Hugging Face Spaces
## 📋 **Available Models**
### Small Models (1-3B)
- **FLAN-T5 Large** (0.8B) - Google
- **Qwen 2.5 3B** (3B) - Alibaba
- **Gemma 2B** (2B) - Google
### Medium Models (7B)
- **Qwen 2.5 7B** (7B) - Alibaba
- **Mistral 7B** (7B) - Mistral AI
- **DialoGPT Medium** (345M) - Microsoft
- **CodeLlama 7B Python** (7B) - Meta
### Large Models (14B+)
- **Qwen 2.5 14B** (14B) - Alibaba
- **Qwen 2.5 32B** (32B) - Alibaba
- **Qwen 2.5 72B** (72B) - Alibaba
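
The size-based filtering and model search described in this README could be sketched roughly as follows. This is an illustration only, not the application's actual code: `ModelInfo`, `CATALOG`, and `filter_models` are hypothetical names, and the catalog simply transcribes the groupings listed above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ModelInfo:
    """One catalog entry (hypothetical structure, not NovaEval's own)."""
    name: str
    params: str    # parameter count as written in the list above
    provider: str
    category: str  # "small", "medium", or "large", as grouped above


# Transcribed from the "Available Models" lists; category follows the README grouping.
CATALOG = [
    ModelInfo("FLAN-T5 Large", "0.8B", "Google", "small"),
    ModelInfo("Qwen 2.5 3B", "3B", "Alibaba", "small"),
    ModelInfo("Gemma 2B", "2B", "Google", "small"),
    ModelInfo("Qwen 2.5 7B", "7B", "Alibaba", "medium"),
    ModelInfo("Mistral 7B", "7B", "Mistral AI", "medium"),
    ModelInfo("DialoGPT Medium", "345M", "Microsoft", "medium"),
    ModelInfo("CodeLlama 7B Python", "7B", "Meta", "medium"),
    ModelInfo("Qwen 2.5 14B", "14B", "Alibaba", "large"),
    ModelInfo("Qwen 2.5 32B", "32B", "Alibaba", "large"),
    ModelInfo("Qwen 2.5 72B", "72B", "Alibaba", "large"),
]


def filter_models(catalog, category=None, query=""):
    """Return entries matching an optional size category and search string.

    The query is matched case-insensitively against name and provider;
    an empty query matches every entry.
    """
    q = query.strip().lower()
    return [
        m for m in catalog
        if (category is None or m.category == category)
        and (q in m.name.lower() or q in m.provider.lower())
    ]
```

For example, `filter_models(CATALOG, category="large", query="qwen")` would return the three large Qwen entries.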
## 📊 **Available Datasets**
### Reasoning
- **HellaSwag** - Commonsense reasoning (60K samples)
- **CommonsenseQA** - Reasoning questions (12.1K samples)
- **ARC** - Science reasoning (7.8K samples)
### Knowledge
- **MMLU** - Multitask understanding (231K samples)
- **BoolQ** - Reading comprehension (12.7K samples)
### Math
- **GSM8K** - Grade school math (17.6K samples)
- **AQUA-RAT** - Algebraic reasoning (196K samples)
### Code
- **HumanEval** - Python code generation (164 samples)
- **MBPP** - Basic Python problems (1.4K samples)
### Language
- **IMDB Reviews** - Sentiment analysis (100K samples)
- **CNN/DailyMail** - Summarization (936K samples)
## 🎯 **Evaluation Metrics**
- **Accuracy** - Percentage of correct predictions
- **F1 Score** - Harmonic mean of precision and recall
- **BLEU Score** - Text generation quality
- **ROUGE Score** - Summarization quality
- **Pass@K** - Code generation success rate
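
To make three of these definitions concrete, here is a minimal standard-library sketch of accuracy, binary F1, and Pass@K. It is not taken from the NovaEval codebase; the function names are illustrative, and `pass_at_k` uses the commonly cited unbiased estimator `1 - C(n-c, k) / C(n, k)`, which this application may or may not use. BLEU and ROUGE are left out because they usually rely on a dedicated library.

```python
from math import comb


def accuracy(predictions, references):
    """Fraction of predictions that exactly match their reference."""
    if len(predictions) != len(references):
        raise ValueError("predictions and references must have equal length")
    if not references:
        return 0.0
    correct = sum(p == r for p, r in zip(predictions, references))
    return correct / len(references)


def binary_f1(predictions, references, positive=1):
    """Harmonic mean of precision and recall for the `positive` label."""
    pairs = list(zip(predictions, references))
    tp = sum(p == positive and r == positive for p, r in pairs)
    fp = sum(p == positive and r != positive for p, r in pairs)
    fn = sum(p != positive and r == positive for p, r in pairs)
    if tp == 0:
        return 0.0  # precision and recall are both zero or undefined
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def pass_at_k(n, c, k):
    """Probability that at least one of k samples drawn from n passes,
    given that c of the n generated samples passed the tests."""
    if not (0 <= c <= n and 1 <= k <= n):
        raise ValueError("require 0 <= c <= n and 1 <= k <= n")
    if n - c < k:
        return 1.0  # too few failures to fill k draws, so a pass is certain
    return 1.0 - comb(n - c, k) / comb(n, k)
```

With 10 generated samples of which 3 pass, `pass_at_k(10, 3, 1)` reduces to the plain success rate of 3/10.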
## 🚀 **Quick Start**
### Option 1: Direct Upload to Hugging Face Spaces
1. Create a new Space on Hugging Face
2. Choose "Docker" as the SDK
3. Upload these files:
   - `app.py` (renamed from `advanced_novaeval_app.py`)
   - `requirements.txt`
   - `Dockerfile`
   - `README.md`
4. Commit and push - your Space will build automatically!
### Option 2: Local Development
```bash | |
# Install dependencies | |
pip install -r requirements.txt | |
# Run the application | |
python advanced_novaeval_app.py | |
# Open browser to http://localhost:7860 | |
``` | |
## 🔧 **Configuration Options**
### Model Parameters
- **Sample Size**: 10-1000 samples
- **Temperature**: 0.0-2.0 (creativity control)
- **Max Tokens**: 128-2048 (response length)
- **Top-p**: 0.9 (nucleus sampling)
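
The ranges above can be enforced with a small validated config object. The sketch below is an assumption about how one might do it, not the application's own code: `EvalConfig` and `_check_range` are hypothetical names, and every default except `top_p = 0.9` is an illustrative choice rather than a documented value.

```python
from dataclasses import dataclass


def _check_range(name, value, low, high):
    """Raise ValueError if value falls outside the inclusive range [low, high]."""
    if not low <= value <= high:
        raise ValueError(f"{name} must be between {low} and {high}, got {value}")


@dataclass
class EvalConfig:
    """Evaluation parameters with the ranges listed in this README."""
    sample_size: int = 100      # assumed default; allowed range 10-1000
    temperature: float = 0.7    # assumed default; allowed range 0.0-2.0
    max_tokens: int = 512       # assumed default; allowed range 128-2048
    top_p: float = 0.9          # value listed above

    def __post_init__(self):
        # Validate on construction so bad values fail before an evaluation starts.
        _check_range("sample_size", self.sample_size, 10, 1000)
        _check_range("temperature", self.temperature, 0.0, 2.0)
        _check_range("max_tokens", self.max_tokens, 128, 2048)
        _check_range("top_p", self.top_p, 0.0, 1.0)
```

Constructing `EvalConfig(temperature=2.5)` raises a `ValueError` up front instead of failing partway through a run.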
### Evaluation Settings
- **Multiple Model Selection**: Compare up to 10 models
- **Flexible Metrics**: Choose relevant metrics for your task
- **Real-time Monitoring**: Watch evaluations progress live
- **Export Results**: Download results in JSON format
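
The README does not document the layout of the exported JSON, so the following is only a guess at a reasonable shape. `build_report`, `save_report`, and all field names are hypothetical.

```python
import json
from datetime import datetime, timezone


def build_report(dataset, config, scores):
    """Assemble a JSON-serializable report.

    `scores` maps a model name to a dict of metric values, e.g.
    {"model-a": {"accuracy": ...}}. The schema is an assumption.
    """
    return {
        "dataset": dataset,
        "config": config,
        "generated_at": datetime.now(timezone.utc).isoformat(),
        "results": [
            {"model": model, "metrics": metrics}
            for model, metrics in sorted(scores.items())
        ],
    }


def save_report(report, path):
    """Write the report as indented UTF-8 JSON."""
    with open(path, "w", encoding="utf-8") as fh:
        json.dump(report, fh, indent=2, ensure_ascii=False)
```

Sorting results by model name keeps exports stable across runs, which makes them easier to diff.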
## 📱 **User Experience**
### Workflow
1. **Select Models** - Choose from 15+ Hugging Face models
2. **Pick Dataset** - Select from 11 evaluation datasets
3. **Configure Metrics** - Choose relevant evaluation metrics
4. **Set Parameters** - Adjust sample size, temperature, etc.
5. **Start Evaluation** - Watch real-time progress and logs
6. **View Results** - Analyze performance comparisons
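
Steps 5 and 6 of this workflow amount to a loop that scores each sample and reports progress as it goes. The sketch below shows one way that could look, assuming exact-match scoring. `run_evaluation`, the `predict` callable, and the message fields are all hypothetical, and the real application may structure its WebSocket updates differently.

```python
import json


def run_evaluation(samples, predict, on_progress):
    """Score `predict` on (prompt, expected) pairs, reporting after each one.

    `on_progress` receives a JSON string; a server could forward that
    string to connected clients to drive a live progress bar.
    """
    total = len(samples)
    correct = 0
    for done, (prompt, expected) in enumerate(samples, start=1):
        if predict(prompt) == expected:
            correct += 1
        on_progress(json.dumps({
            "type": "progress",
            "completed": done,
            "total": total,
            "percent": round(100 * done / total, 1),
        }))
    return {"accuracy": correct / total if total else 0.0, "samples": total}
```

Passing the callback in keeps the loop independent of the transport, so the same function can be exercised in a unit test with `on_progress=messages.append`.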
### Features
- **Model Search** - Find models by name or provider
- **Category Filtering** - Filter by model size or dataset type
- **Real-time Logs** - See actual evaluation steps
- **Progress Tracking** - Visual progress bars and percentages
- **Interactive Results** - Compare models side-by-side
## 🌟 **Why NovaEval?**
### For Researchers
- **Comprehensive Benchmarking** across multiple models and datasets
- **Standardized Evaluation** with consistent metrics and procedures
- **Real-time Monitoring** to track evaluation progress
- **Export Capabilities** for further analysis
### For Developers
- **Easy Integration** with Hugging Face ecosystem
- **No API Keys Required** - uses free HF Inference API
- **Modern Interface** with responsive design
- **Detailed Logging** for debugging and analysis
### For Teams
- **Collaborative Evaluation** with shareable results
- **Professional Interface** suitable for presentations
- **Comprehensive Documentation** for easy onboarding
- **Open Source** with full customization capabilities
## 🔗 **Links**
- **Noveum.ai**: [https://noveum.ai](https://noveum.ai)
- **NovaEval Framework**: [https://github.com/Noveum/NovaEval](https://github.com/Noveum/NovaEval)
- **Hugging Face Models**: [https://huggingface.co/models](https://huggingface.co/models)
- **Documentation**: Available in the application interface
## 📄 **License**
This project is open source and available under the MIT License.
## 🤝 **Contributing**
We welcome contributions! Please see our contributing guidelines for more information.
--- | |
**Built with ❤️ by [Noveum.ai](https://noveum.ai) - Advancing AI Evaluation** | |