---
title: OpenThoughts Benchmark Explorer
emoji: π
colorFrom: blue
colorTo: red
sdk: streamlit
sdk_version: 1.28.0
app_file: app.py
pinned: false
license: apache-2.0
---
# OpenThoughts Evalchemy Benchmark Explorer

A Streamlit web application for exploring correlations between OpenThoughts benchmarks and comparing model performance.
## Features

- Interactive correlation heatmaps
- Scatter plot explorer with uncertainty analysis
- Model performance comparisons
- Statistical summaries and uncertainty analysis
## Usage

The app automatically loads benchmark data and provides multiple views for analysis:

1. **Overview Dashboard**: High-level summary of benchmarks and correlations
2. **Interactive Heatmap**: Correlation matrix visualization
3. **Scatter Explorer**: Detailed pairwise benchmark comparisons
4. **Model Performance**: Individual model analysis
5. **Statistical Summary**: Correlation statistics across methods
6. **Uncertainty Analysis**: Measurement reliability analysis
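
To make "correlation statistics across methods" concrete, here is a minimal sketch of two common correlation methods, Pearson and Spearman, applied to a pair of benchmark score lists. It uses only the Python standard library and is not taken from `app.py`; the function names and the example scores are hypothetical, and the methods the app actually offers may differ.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation of two equal-length score lists."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two lists of equal length >= 2")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return float("nan")  # undefined when one benchmark is constant
    return cov / sqrt(var_x * var_y)


def ranks(values):
    """1-based ranks, with tied values sharing their average rank."""
    order = sorted(range(len(values)), key=values.__getitem__)
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        average_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = average_rank
        i = j + 1
    return result


def spearman(xs, ys):
    """Spearman correlation: Pearson correlation of the ranks."""
    return pearson(ranks(xs), ranks(ys))


# Hypothetical scores for four models on two made-up benchmarks.
bench_a = [0.42, 0.55, 0.61, 0.78]
bench_b = [0.30, 0.41, 0.39, 0.70]
print(f"pearson={pearson(bench_a, bench_b):.3f}")
print(f"spearman={spearman(bench_a, bench_b):.3f}")
```

Pearson measures linear agreement between raw scores, while Spearman only looks at how the two benchmarks rank the models, so comparing the two can show whether a relationship is monotonic but non-linear.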
## Data Files

The app reads two CSV files, one required and one optional:

- `comprehensive_benchmark_scores.csv`: Main benchmark scores (required)
- `benchmark_standard_errors.csv`: Standard error estimates (optional)

Both files should be in the root directory of the repository.
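
As a rough illustration of the required/optional distinction, the sketch below loads the scores file and treats the standard-error file as optional. It is not the code in `app.py`: the helper names are hypothetical, and since the column layout of the CSVs is not documented here, rows are returned as plain dictionaries keyed by whatever header each file has.

```python
import csv
from pathlib import Path

SCORES_FILE = "comprehensive_benchmark_scores.csv"
ERRORS_FILE = "benchmark_standard_errors.csv"


def read_rows(path):
    """Read a CSV file into a list of dicts keyed by its header row."""
    with path.open(newline="", encoding="utf-8") as handle:
        return list(csv.DictReader(handle))


def load_benchmark_data(root="."):
    """Return (scores, errors); errors is None if the optional file is absent."""
    root = Path(root)
    scores_path = root / SCORES_FILE
    if not scores_path.is_file():
        raise FileNotFoundError(f"missing required file: {scores_path}")
    scores = read_rows(scores_path)

    errors_path = root / ERRORS_FILE
    errors = read_rows(errors_path) if errors_path.is_file() else None
    return scores, errors
```

A caller could then skip the uncertainty views when `errors` is `None` rather than failing at startup.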