Spaces:
Running
Running
title: Napolab Leaderboard | |
emoji: π | |
colorFrom: blue | |
colorTo: indigo | |
sdk: gradio | |
sdk_version: 5.38.2 | |
app_file: app.py | |
pinned: true | |
python_version: "3.10" | |
tags: | |
- nlp | |
- portuguese | |
- benchmarking | |
- language-models | |
- gradio | |
datasets: | |
- ruanchaves/napolab | |
- assin | |
- assin2 | |
- ruanchaves/hatebr | |
- ruanchaves/faquad-nli | |
short_description: "The Natural Portuguese Language Benchmark" | |
# Napolab Leaderboard - Gradio App | |
A comprehensive Gradio web application for exploring and benchmarking Portuguese language models using the Napolab dataset collection. | |
## Features | |
- **π Benchmark Results**: Single comprehensive table with one column per dataset and clickable model links | |
- **π Model Analysis**: Radar chart showing model performance across all datasets | |
- **βΉοΈ About**: Information about Napolab and citation details | |
## Installation | |
1. Navigate to the leaderboard directory: | |
```bash | |
cd dev/napolab/leaderboard | |
``` | |
2. Install the required dependencies: | |
```bash | |
pip install -r requirements.txt | |
``` | |
3. Extract data from external sources (optional but recommended): | |
```bash | |
# Extract data from Portuguese LLM Leaderboard | |
python extract_portuguese_leaderboard.py | |
# Download external models data | |
python download_external_models.py | |
``` | |
4. Run the Gradio app: | |
```bash | |
python app.py | |
``` | |
The app will be available at `http://localhost:7860` | |
## Data Management | |
The app uses a YAML configuration file (`data.yaml`) for adding new data, making it easy to edit and maintain. | |
### Data Extraction Scripts | |
The leaderboard includes scripts to automatically extract and update data from external sources: | |
#### `extract_portuguese_leaderboard.py` | |
This script extracts benchmark results from the Open Portuguese LLM Leaderboard: | |
- Fetches data from the Hugging Face Spaces leaderboard | |
- Updates the `portuguese_leaderboard.csv` file | |
- Includes both open-source and proprietary models | |
- Automatically handles data formatting and validation | |
#### `download_external_models.py` | |
This script downloads additional model data: | |
- Fetches model metadata from various sources | |
- Updates the `external_models.csv` file | |
- Includes model links and performance metrics | |
- Ensures data consistency with the main leaderboard | |
**Note**: These scripts require internet connection and may take a few minutes to complete. Run them periodically to keep the leaderboard data up to date. | |
## Usage | |
### Benchmark Results Tab | |
- **Single Comprehensive Table**: Shows all models with one column per dataset | |
- **Dataset Columns**: Each dataset has its own column showing model performance scores | |
- **Average Column**: Shows the average performance across all datasets for each model | |
- **Model Column**: Clickable links to Hugging Face model pages | |
- **Sorted Results**: Models are sorted by overall average performance (descending) | |
### Model Analysis Tab | |
- Radar chart showing each model's performance across all datasets | |
- **Default view**: Shows only bertimbau-large and mdeberta-v3-base models | |
- **Interactive legend**: Click to show/hide models, double-click to isolate | |
- Each line represents one model, each point represents one dataset | |
- Color-coded by model architecture | |
- Interactive hover information with detailed performance metrics | |
### Model Hub Tab | |
- Access links to pre-trained models on Hugging Face | |
- Models are organized by dataset and architecture type | |
- Direct links to model repositories | |
## Supported Datasets | |
The app includes all Napolab datasets: | |
- **ASSIN**: Semantic Similarity and Textual Entailment | |
- **ASSIN 2**: Semantic Similarity and Textual Entailment (v2) | |
- **Rerelem**: Relational Reasoning | |
- **HateBR**: Hate Speech Detection | |
- **Reli-SA**: Religious Sentiment Analysis | |
- **FaQUaD-NLI**: Factual Question Answering and NLI | |
- **PorSimplesSent**: Simple Sentences Sentiment Analysis | |
## Model Architectures | |
The benchmark includes models based on: | |
- **mDeBERTa v3**: Multilingual DeBERTa v3 | |
- **BERT Large**: Large Portuguese BERT | |
- **BERT Base**: Base Portuguese BERT | |
## Data Management | |
The app now uses a YAML configuration file (`data.yaml`) for all data, making it easy to edit and maintain. | |
### Editing Data | |
Simply edit the `data.yaml` file to: | |
- Add new datasets | |
- Update benchmark results | |
- Add new models | |
- Modify model metadata | |
### Data Structure | |
The YAML file contains four main sections: | |
1. **datasets**: Information about each dataset | |
2. **benchmark_results**: Performance metrics for models on datasets | |
3. **model_metadata**: Model information (parameters, architecture, etc.) | |
4. **additional_models**: Additional models for the Model Hub | |
### Data Management Tools | |
Use the `manage_data.py` script for data operations: | |
```bash | |
# Validate the data structure | |
python manage_data.py validate | |
# Add a new dataset | |
python manage_data.py add-dataset \ | |
--dataset-name "new_dataset" \ | |
--dataset-display-name "New Dataset" \ | |
--dataset-description "Description of the dataset" \ | |
--dataset-tasks "Classification" "Sentiment Analysis" \ | |
--dataset-url "https://huggingface.co/datasets/new_dataset" | |
# Add benchmark results | |
python manage_data.py add-benchmark \ | |
--dataset-name "assin" \ | |
--model-name "new-model" \ | |
--metrics "accuracy=0.92" "f1=0.91" | |
# Add model metadata | |
python manage_data.py add-model \ | |
--model-name "new-model" \ | |
--parameters 110000000 \ | |
--architecture "BERT Base" \ | |
--base-model "bert-base-uncased" \ | |
--task "Classification" \ | |
--huggingface-url "https://huggingface.co/new-model" | |
``` | |
### Customization | |
To add new datasets or benchmark results: | |
1. Edit the `data.yaml` file directly, or | |
2. Use the `manage_data.py` script for structured additions | |
3. The app will automatically reload the data when restarted | |
## Troubleshooting | |
- **Dataset loading errors**: Ensure you have internet connection to access Hugging Face datasets | |
- **Memory issues**: Reduce the number of samples in the Dataset Explorer | |
- **Port conflicts**: Change the port in the `app.launch()` call | |
## Contributing | |
Feel free to contribute by: | |
- Adding new datasets | |
- Improving visualizations | |
- Adding new features | |
- Reporting bugs | |
## License | |
This project follows the same license as the main Napolab repository. |