napolab / README.md
ruanchaves's picture
Upload 3 files
16da91b verified
---
title: Napolab Leaderboard
emoji: 🌎
colorFrom: blue
colorTo: indigo
sdk: gradio
sdk_version: 5.38.2
app_file: app.py
pinned: true
python_version: "3.10"
tags:
- nlp
- portuguese
- benchmarking
- language-models
- gradio
datasets:
- ruanchaves/napolab
- assin
- assin2
- ruanchaves/hatebr
- ruanchaves/faquad-nli
short_description: "The Natural Portuguese Language Benchmark"
---
# Napolab Leaderboard - Gradio App
A comprehensive Gradio web application for exploring and benchmarking Portuguese language models using the Napolab dataset collection.
## Features
- **πŸ† Benchmark Results**: Single comprehensive table with one column per dataset and clickable model links
- **πŸ“ˆ Model Analysis**: Radar chart showing model performance across all datasets
- **ℹ️ About**: Information about Napolab and citation details
## Installation
1. Navigate to the leaderboard directory:
```bash
cd dev/napolab/leaderboard
```
2. Install the required dependencies:
```bash
pip install -r requirements.txt
```
3. Extract data from external sources (optional but recommended):
```bash
# Extract data from Portuguese LLM Leaderboard
python extract_portuguese_leaderboard.py
# Download external models data
python download_external_models.py
```
4. Run the Gradio app:
```bash
python app.py
```
The app will be available at `http://localhost:7860`
## Data Management
The app uses a YAML configuration file (`data.yaml`) for adding new data, making it easy to edit and maintain.
### Data Extraction Scripts
The leaderboard includes scripts to automatically extract and update data from external sources:
#### `extract_portuguese_leaderboard.py`
This script extracts benchmark results from the Open Portuguese LLM Leaderboard:
- Fetches data from the Hugging Face Spaces leaderboard
- Updates the `portuguese_leaderboard.csv` file
- Includes both open-source and proprietary models
- Automatically handles data formatting and validation
#### `download_external_models.py`
This script downloads additional model data:
- Fetches model metadata from various sources
- Updates the `external_models.csv` file
- Includes model links and performance metrics
- Ensures data consistency with the main leaderboard
**Note**: These scripts require internet connection and may take a few minutes to complete. Run them periodically to keep the leaderboard data up to date.
## Usage
### Benchmark Results Tab
- **Single Comprehensive Table**: Shows all models with one column per dataset
- **Dataset Columns**: Each dataset has its own column showing model performance scores
- **Average Column**: Shows the average performance across all datasets for each model
- **Model Column**: Clickable links to Hugging Face model pages
- **Sorted Results**: Models are sorted by overall average performance (descending)
### Model Analysis Tab
- Radar chart showing each model's performance across all datasets
- **Default view**: Shows only bertimbau-large and mdeberta-v3-base models
- **Interactive legend**: Click to show/hide models, double-click to isolate
- Each line represents one model, each point represents one dataset
- Color-coded by model architecture
- Interactive hover information with detailed performance metrics
### Model Hub Tab
- Access links to pre-trained models on Hugging Face
- Models are organized by dataset and architecture type
- Direct links to model repositories
## Supported Datasets
The app includes all Napolab datasets:
- **ASSIN**: Semantic Similarity and Textual Entailment
- **ASSIN 2**: Semantic Similarity and Textual Entailment (v2)
- **Rerelem**: Relational Reasoning
- **HateBR**: Hate Speech Detection
- **Reli-SA**: Religious Sentiment Analysis
- **FaQUaD-NLI**: Factual Question Answering and NLI
- **PorSimplesSent**: Simple Sentences Sentiment Analysis
## Model Architectures
The benchmark includes models based on:
- **mDeBERTa v3**: Multilingual DeBERTa v3
- **BERT Large**: Large Portuguese BERT
- **BERT Base**: Base Portuguese BERT
## Data Management
The app now uses a YAML configuration file (`data.yaml`) for all data, making it easy to edit and maintain.
### Editing Data
Simply edit the `data.yaml` file to:
- Add new datasets
- Update benchmark results
- Add new models
- Modify model metadata
### Data Structure
The YAML file contains four main sections:
1. **datasets**: Information about each dataset
2. **benchmark_results**: Performance metrics for models on datasets
3. **model_metadata**: Model information (parameters, architecture, etc.)
4. **additional_models**: Additional models for the Model Hub
### Data Management Tools
Use the `manage_data.py` script for data operations:
```bash
# Validate the data structure
python manage_data.py validate
# Add a new dataset
python manage_data.py add-dataset \
--dataset-name "new_dataset" \
--dataset-display-name "New Dataset" \
--dataset-description "Description of the dataset" \
--dataset-tasks "Classification" "Sentiment Analysis" \
--dataset-url "https://huggingface.co/datasets/new_dataset"
# Add benchmark results
python manage_data.py add-benchmark \
--dataset-name "assin" \
--model-name "new-model" \
--metrics "accuracy=0.92" "f1=0.91"
# Add model metadata
python manage_data.py add-model \
--model-name "new-model" \
--parameters 110000000 \
--architecture "BERT Base" \
--base-model "bert-base-uncased" \
--task "Classification" \
--huggingface-url "https://huggingface.co/new-model"
```
### Customization
To add new datasets or benchmark results:
1. Edit the `data.yaml` file directly, or
2. Use the `manage_data.py` script for structured additions
3. The app will automatically reload the data when restarted
## Troubleshooting
- **Dataset loading errors**: Ensure you have internet connection to access Hugging Face datasets
- **Memory issues**: Reduce the number of samples in the Dataset Explorer
- **Port conflicts**: Change the port in the `app.launch()` call
## Contributing
Feel free to contribute by:
- Adding new datasets
- Improving visualizations
- Adding new features
- Reporting bugs
## License
This project follows the same license as the main Napolab repository.