---
title: Napolab Leaderboard
emoji: 📊
colorFrom: blue
colorTo: indigo
sdk: gradio
sdk_version: 5.38.2
app_file: app.py
pinned: true
python_version: "3.10"
tags:
- nlp
- portuguese
- benchmarking
- language-models
- gradio
datasets:
- ruanchaves/napolab
- assin
- assin2
- ruanchaves/hatebr
- ruanchaves/faquad-nli
short_description: "The Natural Portuguese Language Benchmark"
---
# Napolab Leaderboard - Gradio App
A comprehensive Gradio web application for exploring and benchmarking Portuguese language models using the Napolab dataset collection.
## Features
- **Benchmark Results**: Single comprehensive table with one column per dataset and clickable model links
- **Model Analysis**: Radar chart showing model performance across all datasets
- **About**: Information about Napolab and citation details
## Installation
1. Navigate to the leaderboard directory:
```bash
cd dev/napolab/leaderboard
```
2. Install the required dependencies:
```bash
pip install -r requirements.txt
```
3. Extract data from external sources (optional but recommended):
```bash
# Extract data from Portuguese LLM Leaderboard
python extract_portuguese_leaderboard.py
# Download external models data
python download_external_models.py
```
4. Run the Gradio app:
```bash
python app.py
```
The app will be available at `http://localhost:7860`
## Data Extraction Scripts
The leaderboard includes scripts that automatically extract and update data from external sources:
### `extract_portuguese_leaderboard.py`
This script extracts benchmark results from the Open Portuguese LLM Leaderboard:
- Fetches data from the Hugging Face Spaces leaderboard
- Updates the `portuguese_leaderboard.csv` file
- Includes both open-source and proprietary models
- Automatically handles data formatting and validation
### `download_external_models.py`
This script downloads additional model data:
- Fetches model metadata from various sources
- Updates the `external_models.csv` file
- Includes model links and performance metrics
- Ensures data consistency with the main leaderboard
**Note**: These scripts require an internet connection and may take a few minutes to complete. Run them periodically to keep the leaderboard data up to date.
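After running both scripts, it can be worth sanity-checking the generated CSV files before starting the app. A minimal sketch with pandas, assuming both files sit next to `app.py` (the exact column names depend on the scripts):
```python
import pandas as pd

# Files written by the extraction scripts described above
FILES = ["portuguese_leaderboard.csv", "external_models.csv"]

for path in FILES:
    df = pd.read_csv(path)
    # Basic sanity checks: the file has rows and no fully empty columns
    assert not df.empty, f"{path} has no rows"
    assert not df.isna().all().any(), f"{path} has an empty column"
    print(f"{path}: {len(df)} rows, columns: {list(df.columns)}")
```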
## Usage
### Benchmark Results Tab
- **Single Comprehensive Table**: Shows all models with one column per dataset
- **Dataset Columns**: Each dataset has its own column showing model performance scores
- **Average Column**: Shows the average performance across all datasets for each model
- **Model Column**: Clickable links to Hugging Face model pages
- **Sorted Results**: Models are sorted by overall average performance in descending order (see the sketch below)
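For illustration, the Average column and the descending sort can be reproduced with a few lines of pandas; the model names and scores below are hypothetical:
```python
import pandas as pd

# Hypothetical scores: one row per model, one column per dataset
scores = pd.DataFrame(
    {"assin": [0.91, 0.88], "assin2": [0.93, 0.90], "hatebr": [0.85, 0.89]},
    index=["model-a", "model-b"],
)
scores["Average"] = scores.mean(axis=1)                  # average across all datasets
scores = scores.sort_values("Average", ascending=False)  # best model first
print(scores)
```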
### Model Analysis Tab
- Radar chart showing each model's performance across all datasets (sketched below)
- **Default view**: Shows only bertimbau-large and mdeberta-v3-base models
- **Interactive legend**: Click to show/hide models, double-click to isolate
- Each line represents one model, each point represents one dataset
- Color-coded by model architecture
- Interactive hover information with detailed performance metrics
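A minimal sketch of such a radar chart, assuming Plotly is used for the interactive charts (the scores below are hypothetical):
```python
import plotly.graph_objects as go

datasets = ["assin", "assin2", "hatebr", "faquad-nli"]  # illustrative subset
models = {
    "bertimbau-large": [0.92, 0.90, 0.85, 0.88],    # hypothetical scores
    "mdeberta-v3-base": [0.90, 0.89, 0.83, 0.86],
}
fig = go.Figure()
for name, values in models.items():
    fig.add_trace(go.Scatterpolar(
        r=values + values[:1],          # repeat the first point to close the loop
        theta=datasets + datasets[:1],
        name=name,                      # legend entry; click to show/hide
    ))
fig.update_layout(polar=dict(radialaxis=dict(range=[0, 1])))
fig.show()
```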
### Model Hub Tab
- Access links to pre-trained models on Hugging Face
- Models are organized by dataset and architecture type
- Direct links to model repositories
## Supported Datasets
The app includes all Napolab datasets (a loading example follows the list):
- **ASSIN**: Semantic Similarity and Textual Entailment
- **ASSIN 2**: Semantic Similarity and Textual Entailment (v2)
- **ReRelEM**: Relation Recognition between Named Entities
- **HateBR**: Hate Speech Detection
- **Reli-SA**: Sentiment Analysis of book reviews (ReLi corpus)
- **FaQuAD-NLI**: Question answering validation framed as Natural Language Inference
- **PorSimplesSent**: Sentence Simplification (original vs. simplified sentences)
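Individual datasets can typically be loaded with the `datasets` library using the IDs from the front matter; configuration names and splits vary per dataset:
```python
from datasets import load_dataset

# ID as listed in the front matter of this README
hatebr = load_dataset("ruanchaves/hatebr")
print(hatebr)
```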
## Model Architectures
The benchmark includes models based on:
- **mDeBERTa v3**: Multilingual DeBERTa v3
- **BERT Large**: BERTimbau Large (Portuguese BERT)
- **BERT Base**: BERTimbau Base (Portuguese BERT)
## Data Management
The app uses a YAML configuration file (`data.yaml`) as the single source for all data, making it easy to edit and maintain.
### Editing Data
Simply edit the `data.yaml` file to:
- Add new datasets
- Update benchmark results
- Add new models
- Modify model metadata
### Data Structure
The YAML file contains four main sections (a loading sketch follows the list):
1. **datasets**: Information about each dataset
2. **benchmark_results**: Performance metrics for models on datasets
3. **model_metadata**: Model information (parameters, architecture, etc.)
4. **additional_models**: Additional models for the Model Hub
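A minimal sketch for loading the file and checking that all four sections are present, assuming PyYAML is installed:
```python
import yaml

REQUIRED = {"datasets", "benchmark_results", "model_metadata", "additional_models"}

with open("data.yaml", encoding="utf-8") as f:
    data = yaml.safe_load(f) or {}

missing = REQUIRED - set(data)
if missing:
    raise ValueError(f"data.yaml is missing sections: {sorted(missing)}")
print(f"Loaded sections: {sorted(data)}")
```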
### Data Management Tools
Use the `manage_data.py` script for data operations:
```bash
# Validate the data structure
python manage_data.py validate
# Add a new dataset
python manage_data.py add-dataset \
--dataset-name "new_dataset" \
--dataset-display-name "New Dataset" \
--dataset-description "Description of the dataset" \
--dataset-tasks "Classification" "Sentiment Analysis" \
--dataset-url "https://huggingface.co/datasets/new_dataset"
# Add benchmark results
python manage_data.py add-benchmark \
--dataset-name "assin" \
--model-name "new-model" \
--metrics "accuracy=0.92" "f1=0.91"
# Add model metadata
python manage_data.py add-model \
--model-name "new-model" \
--parameters 110000000 \
--architecture "BERT Base" \
--base-model "bert-base-uncased" \
--task "Classification" \
--huggingface-url "https://huggingface.co/new-model"
```
### Customization
To add new datasets or benchmark results, either:
1. Edit the `data.yaml` file directly, or
2. Use the `manage_data.py` script for structured additions.
The app automatically reloads the data when restarted.
## Troubleshooting
- **Dataset loading errors**: Ensure you have an internet connection to access Hugging Face datasets
- **Memory issues**: Reduce the number of samples loaded in the Dataset Explorer
- **Port conflicts**: Change the port in the `app.launch()` call (see the sketch below)
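For the port change, Gradio's `launch()` accepts a `server_port` argument; a minimal sketch (the real `app.py` builds the full interface):
```python
import gradio as gr

with gr.Blocks() as demo:
    gr.Markdown("Napolab Leaderboard")  # placeholder for the real UI

# Use a free port instead of the default 7860
demo.launch(server_port=7861)
```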
## Contributing
Feel free to contribute by:
- Adding new datasets
- Improving visualizations
- Adding new features
- Reporting bugs
## License
This project follows the same license as the main Napolab repository. |