---
title: Napolab Leaderboard
emoji: 🌎
colorFrom: blue
colorTo: indigo
sdk: gradio
sdk_version: 5.38.2
app_file: app.py
pinned: true
python_version: "3.10"
tags:
  - nlp
  - portuguese
  - benchmarking
  - language-models
  - gradio
datasets:
  - ruanchaves/napolab
  - assin
  - assin2
  - ruanchaves/hatebr
  - ruanchaves/faquad-nli
short_description: "The Natural Portuguese Language Benchmark"
---


# Napolab Leaderboard - Gradio App

A comprehensive Gradio web application for exploring and benchmarking Portuguese language models using the Napolab dataset collection.

## Features

- **πŸ† Benchmark Results**: Single comprehensive table with one column per dataset and clickable model links
- **πŸ“ˆ Model Analysis**: Radar chart showing model performance across all datasets
- **ℹ️ About**: Information about Napolab and citation details

## Installation

1. Navigate to the leaderboard directory:
```bash
cd dev/napolab/leaderboard
```

2. Install the required dependencies:
```bash
pip install -r requirements.txt
```
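
Optionally, keep the dependencies isolated in a virtual environment first (shown for a Unix shell):

```bash
# Optional: create and activate a virtual environment before installing.
python -m venv .venv
source .venv/bin/activate
pip install -r requirements.txt
```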

3. Extract data from external sources (optional but recommended):
```bash
# Extract data from Portuguese LLM Leaderboard
python extract_portuguese_leaderboard.py

# Download external models data
python download_external_models.py
```

4. Run the Gradio app:
```bash
python app.py
```

The app will be available at `http://localhost:7860`.
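
To change the host or port, adjust the `launch()` call at the bottom of `app.py`. A minimal sketch, assuming the Gradio app object is named `demo` (the variable name is an assumption):

```python
# Hypothetical tail of app.py; `demo` is an assumed variable name.
if __name__ == "__main__":
    demo.launch(
        server_name="0.0.0.0",  # listen on all interfaces instead of localhost only
        server_port=7860,       # change if 7860 is already taken
    )
```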

## Data Management

The app stores its benchmark data in a YAML configuration file (`data.yaml`), making it easy to edit and maintain.

### Data Extraction Scripts

The leaderboard includes scripts to automatically extract and update data from external sources:

#### `extract_portuguese_leaderboard.py`
This script extracts benchmark results from the Open Portuguese LLM Leaderboard:
- Fetches data from the Hugging Face Spaces leaderboard
- Updates the `portuguese_leaderboard.csv` file
- Includes both open-source and proprietary models
- Automatically handles data formatting and validation

#### `download_external_models.py`
This script downloads additional model data:
- Fetches model metadata from various sources
- Updates the `external_models.csv` file
- Includes model links and performance metrics
- Ensures data consistency with the main leaderboard

**Note**: These scripts require an internet connection and may take a few minutes to complete. Run them periodically to keep the leaderboard data up to date.
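
On a Unix host, a cron entry is one way to run them periodically; the schedule and path below are illustrative, not part of the repository:

```bash
# Illustrative crontab entry: refresh the leaderboard data every Monday at 03:00.
0 3 * * 1 cd /path/to/napolab/leaderboard && python extract_portuguese_leaderboard.py && python download_external_models.py
```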

## Usage

### Benchmark Results Tab
- **Single Comprehensive Table**: Shows all models with one column per dataset
- **Dataset Columns**: Each dataset has its own column showing model performance scores
- **Average Column**: Shows the average performance across all datasets for each model
- **Model Column**: Clickable links to Hugging Face model pages
- **Sorted Results**: Models are sorted by overall average performance (descending)
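
The table layout is straightforward to reproduce; here is a rough pandas sketch, assuming `data.yaml` keeps an `accuracy` score per model and dataset (the field names are assumptions, not the app's actual schema):

```python
import pandas as pd
import yaml

with open("data.yaml") as f:
    data = yaml.safe_load(f)

# One row per model, one column per dataset.
rows = {}
for dataset, models in data["benchmark_results"].items():
    for model, metrics in models.items():
        rows.setdefault(model, {})[dataset] = metrics.get("accuracy")

df = pd.DataFrame.from_dict(rows, orient="index")
df["Average"] = df.mean(axis=1)                  # average across all dataset columns
df = df.sort_values("Average", ascending=False)  # best model first
print(df.round(3))
```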

### Model Analysis Tab
- Radar chart showing each model's performance across all datasets
- **Default view**: Shows only bertimbau-large and mdeberta-v3-base models
- **Interactive legend**: Click to show/hide models, double-click to isolate
- Each line represents one model, each point represents one dataset
- Color-coded by model architecture
- Interactive hover information with detailed performance metrics
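
A minimal Plotly sketch of such a radar chart for a single model; the scores are made-up placeholders, not real benchmark numbers:

```python
import plotly.graph_objects as go

datasets = ["ASSIN", "ASSIN 2", "HateBR", "FaQuAD-NLI", "PorSimplesSent"]
scores = [0.88, 0.90, 0.85, 0.83, 0.87]  # placeholder values

fig = go.Figure()
fig.add_trace(go.Scatterpolar(
    r=scores + scores[:1],          # repeat the first point to close the polygon
    theta=datasets + datasets[:1],
    fill="toself",
    name="bertimbau-large",
))
fig.update_layout(polar=dict(radialaxis=dict(range=[0, 1])))
fig.show()
```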

### Model Hub Tab
- Access links to pre-trained models on Hugging Face
- Models are organized by dataset and architecture type
- Direct links to model repositories

## Supported Datasets

The app includes all Napolab datasets:

- **ASSIN**: Semantic Similarity and Textual Entailment
- **ASSIN 2**: Semantic Similarity and Textual Entailment (v2)
- **Rerelem**: Recognition of Relations between Named Entities
- **HateBR**: Offensive Language and Hate Speech Detection
- **Reli-SA**: Sentiment Analysis of Book Reviews
- **FaQuAD-NLI**: Question Answering recast as Natural Language Inference
- **PorSimplesSent**: Sentence Simplification
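
These datasets are hosted on the Hugging Face Hub (IDs are listed in the Space metadata above). A quick way to inspect the aggregated collection; depending on your `datasets` version, a configuration or split name may be required:

```python
from datasets import load_dataset

napolab = load_dataset("ruanchaves/napolab")
print(napolab)
```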

## Model Architectures

The benchmark includes models based on:
- **mDeBERTa v3**: Multilingual DeBERTa v3
- **BERT Large**: Large Portuguese BERT
- **BERT Base**: Base Portuguese BERT
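
The underlying encoders are all available on the Hugging Face Hub. A quick sketch of loading one of them; the checkpoint ID is illustrative (BERTimbau base), not necessarily one of the leaderboard's fine-tuned models:

```python
from transformers import AutoModel, AutoTokenizer

model_id = "neuralmind/bert-base-portuguese-cased"  # illustrative checkpoint
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModel.from_pretrained(model_id)
```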

## Data Configuration

As noted above, all editable data lives in `data.yaml`. This section describes how to work with it.

### Editing Data

Simply edit the `data.yaml` file to:
- Add new datasets
- Update benchmark results
- Add new models
- Modify model metadata

### Data Structure

The YAML file contains four main sections:

1. **datasets**: Information about each dataset
2. **benchmark_results**: Performance metrics for models on datasets
3. **model_metadata**: Model information (parameters, architecture, etc.)
4. **additional_models**: Additional models for the Model Hub
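
An illustrative excerpt showing how these sections fit together (the field names are assumptions based on the descriptions above, not the file's exact schema):

```yaml
datasets:
  assin:
    display_name: "ASSIN"
    description: "Semantic Similarity and Textual Entailment"

benchmark_results:
  assin:
    bertimbau-large:
      accuracy: 0.92
      f1: 0.91

model_metadata:
  bertimbau-large:
    parameters: 335000000
    architecture: "BERT Large"

additional_models: []
```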

### Data Management Tools

Use the `manage_data.py` script for data operations:

```bash
# Validate the data structure
python manage_data.py validate

# Add a new dataset
python manage_data.py add-dataset \
  --dataset-name "new_dataset" \
  --dataset-display-name "New Dataset" \
  --dataset-description "Description of the dataset" \
  --dataset-tasks "Classification" "Sentiment Analysis" \
  --dataset-url "https://huggingface.co/datasets/new_dataset"

# Add benchmark results
python manage_data.py add-benchmark \
  --dataset-name "assin" \
  --model-name "new-model" \
  --metrics "accuracy=0.92" "f1=0.91"

# Add model metadata
python manage_data.py add-model \
  --model-name "new-model" \
  --parameters 110000000 \
  --architecture "BERT Base" \
  --base-model "bert-base-uncased" \
  --task "Classification" \
  --huggingface-url "https://huggingface.co/new-model"
```

### Customization

To add new datasets or benchmark results:

1. Edit the `data.yaml` file directly, or
2. Use the `manage_data.py` script for structured additions.

The app reloads the data the next time it starts.

## Troubleshooting

- **Dataset loading errors**: Ensure you have an internet connection to access Hugging Face datasets
- **Memory issues**: Reduce the number of samples in the Dataset Explorer
- **Port conflicts**: Change the port in the `app.launch()` call (see the launch sketch under Installation)

## Contributing

Feel free to contribute by:

- Adding new datasets
- Improving visualizations
- Adding new features
- Reporting bugs

## License

This project follows the same license as the main Napolab repository.