---
title: OpenThoughts Benchmark Explorer
emoji: 📊
colorFrom: blue
colorTo: red
sdk: streamlit
sdk_version: 1.28.0
app_file: app.py
pinned: false
license: apache-2.0
---

# OpenThoughts Evalchemy Benchmark Explorer

A Streamlit web application for exploring correlations between OpenThoughts benchmarks and comparing model performance.

## Features

- Interactive correlation heatmaps
- Scatter plot explorer with uncertainty analysis
- Model performance comparisons
- Statistical summaries and uncertainty analysis

## Usage

The app automatically loads benchmark data and provides multiple views for analysis:

1. **Overview Dashboard**: High-level summary of benchmarks and correlations
2. **Interactive Heatmap**: Correlation matrix visualization
3. **Scatter Explorer**: Detailed pairwise benchmark comparisons
4. **Model Performance**: Individual model analysis
5. **Statistical Summary**: Correlation statistics across methods
6. **Uncertainty Analysis**: Measurement reliability analysis
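
As a rough illustration of the kind of computation behind the heatmap and scatter views, the sketch below builds a Pearson correlation matrix from per-model scores. The benchmark names, the scores, and the function names `pearson` and `correlation_matrix` are hypothetical placeholders; the app's own implementation and choice of correlation methods may differ.

```python
import math


def pearson(xs, ys):
    """Return the Pearson correlation of two equal-length score lists."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two equal-length sequences with at least 2 points")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        # A constant benchmark has no defined correlation.
        return float("nan")
    return cov / math.sqrt(var_x * var_y)


def correlation_matrix(scores):
    """Map each benchmark pair to its correlation across models.

    `scores` maps a benchmark name to a list of scores, one per model,
    with models in the same order for every benchmark.
    """
    names = list(scores)
    return {a: {b: pearson(scores[a], scores[b]) for b in names} for a in names}


# Made-up scores for four models on three placeholder benchmarks.
example_scores = {
    "bench_a": [0.1, 0.2, 0.3, 0.4],
    "bench_b": [0.2, 0.4, 0.6, 0.8],
    "bench_c": [0.4, 0.3, 0.2, 0.1],
}
matrix = correlation_matrix(example_scores)
```

Each row of `matrix` corresponds to one row of a heatmap, and a single entry such as `matrix["bench_a"]["bench_b"]` is the statistic a pairwise scatter view would summarize.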

## Data Files

The app reads two CSV files; only the first is required:
- `comprehensive_benchmark_scores.csv`: Main benchmark scores
- `benchmark_standard_errors.csv`: Standard error estimates (optional)

These files should be in the root directory of the repository.
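
A minimal sketch of how the two files might be loaded, treating the standard-error file as optional. The function names `read_csv_rows` and `load_benchmark_data` are hypothetical, and because the column layout of the CSVs is not documented here, rows are returned as plain dictionaries rather than parsed into a specific schema.

```python
import csv
from pathlib import Path

SCORES_FILE = "comprehensive_benchmark_scores.csv"
ERRORS_FILE = "benchmark_standard_errors.csv"


def read_csv_rows(path):
    """Read a CSV file into a list of dicts keyed by the header row."""
    with open(path, newline="", encoding="utf-8") as handle:
        return list(csv.DictReader(handle))


def load_benchmark_data(root="."):
    """Load scores (required) and standard errors (optional) from `root`.

    Returns a `(scores, errors)` tuple; `errors` is None when the
    standard-error file is absent.
    """
    root = Path(root)
    scores_path = root / SCORES_FILE
    if not scores_path.is_file():
        raise FileNotFoundError(f"missing required file: {scores_path}")
    scores = read_csv_rows(scores_path)

    errors_path = root / ERRORS_FILE
    errors = read_csv_rows(errors_path) if errors_path.is_file() else None
    return scores, errors
```

Calling `load_benchmark_data()` from the repository root returns the score rows and either the standard-error rows or `None`, so uncertainty views can be skipped when no error estimates are available.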