---
title: OpenThoughts Benchmark Explorer
emoji: 📊
colorFrom: blue
colorTo: red
sdk: streamlit
sdk_version: 1.28.0
app_file: app.py
pinned: false
license: apache-2.0
---
# OpenThoughts Evalchemy Benchmark Explorer
A Streamlit web application for exploring correlations between OpenThoughts benchmarks and comparing model performance.
## Features
- Interactive correlation heatmaps
- Scatter plot explorer with uncertainty analysis
- Model performance comparisons
- Statistical summaries and uncertainty analysis
## Usage
The app automatically loads benchmark data and provides multiple views for analysis:
1. **Overview Dashboard**: High-level summary of benchmarks and correlations
2. **Interactive Heatmap**: Correlation matrix visualization
3. **Scatter Explorer**: Detailed pairwise benchmark comparisons
4. **Model Performance**: Individual model analysis
5. **Statistical Summary**: Correlation statistics across methods
6. **Uncertainty Analysis**: Measurement reliability analysis
## Data Files
The app reads two CSV files, the second of which is optional:
- `comprehensive_benchmark_scores.csv`: Main benchmark scores
- `benchmark_standard_errors.csv`: Standard error estimates (optional)
These files should be in the root directory of the repository.
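The loading behaviour described above could be sketched as follows. This is an illustration under stated assumptions, not the code in `app.py`: the function names `read_csv_rows` and `load_benchmark_data` are hypothetical, and the sketch makes no assumption about the column layout of either file.

```python
import csv
from pathlib import Path

SCORES_FILE = "comprehensive_benchmark_scores.csv"
ERRORS_FILE = "benchmark_standard_errors.csv"


def read_csv_rows(path):
    """Return the rows of a CSV file as a list of dicts keyed by header."""
    with open(path, newline="", encoding="utf-8") as handle:
        return list(csv.DictReader(handle))


def load_benchmark_data(root="."):
    """Load the scores (required) and standard errors (optional) from root."""
    root = Path(root)

    scores_path = root / SCORES_FILE
    if not scores_path.is_file():
        raise FileNotFoundError(f"Missing required data file: {scores_path}")
    scores = read_csv_rows(scores_path)

    # The standard-error file is optional, so fall back to None when absent.
    errors_path = root / ERRORS_FILE
    errors = read_csv_rows(errors_path) if errors_path.is_file() else None

    return scores, errors
```

Calling `load_benchmark_data()` from the repository root returns the score rows and either the standard-error rows or `None`, so callers can skip the uncertainty views when no error estimates are available.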