Spaces:
Runtime error
NCHMARK_COLS: ['Perplexity']
=== END COLUMN SETUP ===
🔧 CHECKING MODEL TRACING AVAILABILITY...
- Model tracing path: /home/user/app/src/evaluation/../../model-tracing
- Path exists: True
- main.py exists: True
🎯 Final MODEL_TRACING_AVAILABLE = True
.gitattributes: 0%|          | 0.00/2.46k [00:00<?, ?B/s]
.gitattributes: 100%|██████████| 2.46k/2.46k [00:00<00:00, 10.1MB/s]
(…)therAI_gpt-neo-1.3B_20250726_010247.json: 0%|          | 0.00/202 [00:00<?, ?B/s]
(…)therAI_gpt-neo-1.3B_20250726_010247.json: 100%|██████████| 202/202 [00:00<00:00, 748kB/s]
(…)s_facebook_opt-125m_20250726_020655.json: 0%|          | 0.00/205 [00:00<?, ?B/s]
(…)s_facebook_opt-125m_20250726_020655.json: 100%|██████████| 205/205 [00:00<00:00, 909kB/s]
(…)s_facebook_opt-350m_20250726_021737.json: 0%|          | 0.00/205 [00:00<?, ?B/s]
(…)s_facebook_opt-350m_20250726_021737.json: 100%|██████████| 205/205 [00:00<00:00, 850kB/s]
(…)ommunity_gpt2-large_20250726_013038.json: 0%|          | 0.00/214 [00:00<?, ?B/s]
(…)ommunity_gpt2-large_20250726_013038.json: 100%|██████████| 214/214 [00:00<00:00, 1.03MB/s]
(…)mmunity_gpt2-medium_20250726_015555.json: 0%|          | 0.00/216 [00:00<?, ?B/s]
(…)mmunity_gpt2-medium_20250726_015555.json: 100%|██████████| 216/216 [00:00<00:00, 730kB/s]
(…)enai-community_gpt2_20250725_231201.json: 0%|          | 0.00/209 [00:00<?, ?B/s]
(…)enai-community_gpt2_20250725_231201.json: 100%|██████████| 209/209 [00:00<00:00, 533kB/s]
(…)enai-community_gpt2_20250725_233155.json: 0%|          | 0.00/209 [00:00<?, ?B/s]
(…)enai-community_gpt2_20250725_233155.json: 100%|██████████| 209/209 [00:00<00:00, 905kB/s]
(…)enai-community_gpt2_20250725_235115.json: 0%|          | 0.00/209 [00:00<?, ?B/s]
(…)enai-community_gpt2_20250725_235115.json: 100%|██████████| 209/209 [00:00<00:00, 801kB/s]
(…)enai-community_gpt2_20250725_235748.json: 0%|          | 0.00/209 [00:00<?, ?B/s]
(…)enai-community_gpt2_20250725_235748.json: 100%|██████████| 209/209 [00:00<00:00, 856kB/s]
(…)enai-community_gpt2_20250726_000358.json: 0%|          | 0.00/209 [00:00<?, ?B/s]
(…)enai-community_gpt2_20250726_000358.json: 100%|██████████| 209/209 [00:00<00:00, 696kB/s]
(…)enai-community_gpt2_20250726_000650.json: 0%|          | 0.00/209 [00:00<?, ?B/s]
(…)enai-community_gpt2_20250726_000650.json: 100%|██████████| 209/209 [00:00<00:00, 792kB/s]
(…)enai-community_gpt2_20250726_015147.json: 0%|          | 0.00/209 [00:00<?, ?B/s]
(…)enai-community_gpt2_20250726_015147.json: 100%|██████████| 209/209 [00:00<00:00, 1.12MB/s]
STARTING GRADIO APP INITIALIZATION
Initializing allowed models...
INITIALIZING ALLOWED MODELS
Models to initialize: ['lmsys/vicuna-7b-v1.5', 'ibm-granite/granite-7b-base', 'EleutherAI/llemma_7b']
🧹 CLEANING NON-ALLOWED RESULT FILES
🗑️ Removing non-allowed model result: ./eval-results/EleutherAI/results_EleutherAI_gpt-neo-1.3B_20250726_010247.json (model: EleutherAI/gpt-neo-1.3B)
🗑️ Removing non-allowed model result: ./eval-results/facebook/results_facebook_opt-125m_20250726_020655.json (model: facebook/opt-125m)
🗑️ Removing non-allowed model result: ./eval-results/facebook/results_facebook_opt-350m_20250726_021737.json (model: facebook/opt-350m)
🗑️ Removing non-allowed model result: ./eval-results/openai-community/results_openai-community_gpt2-large_20250726_013038.json (model: openai-community/gpt2-large)
🗑️ Removing non-allowed model result: ./eval-results/openai-community/results_openai-community_gpt2-medium_20250726_015555.json (model: openai-community/gpt2-medium)
🗑️ Removing non-allowed model result: ./eval-results/openai-community/results_openai-community_gpt2_20250725_231201.json (model: openai-community/gpt2)
🗑️ Removing non-allowed model result: ./eval-results/openai-community/results_openai-community_gpt2_20250725_233155.json (model: openai-community/gpt2)
🗑️ Removing non-allowed model result: ./eval-results/openai-community/results_openai-community_gpt2_20250725_235115.json (model: openai-community/gpt2)
🗑️ Removing non-allowed model result: ./eval-results/openai-community/results_openai-community_gpt2_20250725_235748.json (model: openai-community/gpt2)
🗑️ Removing non-allowed model result: ./eval-results/openai-community/results_openai-community_gpt2_20250726_000358.json (model: openai-community/gpt2)
🗑️ Removing non-allowed model result: ./eval-results/openai-community/results_openai-community_gpt2_20250726_000650.json (model: openai-community/gpt2)
🗑️ Removing non-allowed model result: ./eval-results/openai-community/results_openai-community_gpt2_20250726_015147.json (model: openai-community/gpt2)
✅ Removed 12 non-allowed result files
🔧 CREATING RESULT FILE FOR: lmsys/vicuna-7b-v1.5
Result file path: ./eval-results/lmsys_vicuna_7b_v1.5_float16.json
✅ Created result file: ./eval-results/lmsys_vicuna_7b_v1.5_float16.json
🔧 CREATING RESULT FILE FOR: ibm-granite/granite-7b-base
Result file path: ./eval-results/ibm_granite_granite_7b_base_float16.json
✅ Created result file: ./eval-results/ibm_granite_granite_7b_base_float16.json
🔧 CREATING RESULT FILE FOR: EleutherAI/llemma_7b
Result file path: ./eval-results/EleutherAI_llemma_7b_float16.json
✅ Created result file: ./eval-results/EleutherAI_llemma_7b_float16.json
✅ Initialized 3 model result files
Creating initial results DataFrame...
CREATE_RESULTS_DATAFRAME CALLED
=== GET_LEADERBOARD_DF DEBUG ===
Starting leaderboard creation...
Looking for results in: ./eval-results
Expected columns: ['T', 'Model', 'Average ⬆️', 'Perplexity', 'Match P-Value ⬆️', 'Type', 'Architecture', 'Precision', 'Hub License', '#Params (B)', 'Hub ❤️', 'Available on the hub', 'Model sha']
Benchmark columns: ['Perplexity']
Searching for result files in: ./eval-results
Found 0 result files
Processing 0 evaluation results
Returning 0 processed results
Found 0 raw results
No raw data found, creating empty DataFrame
Creating empty fallback DataFrame...
Empty DataFrame created with columns: ['T', 'Model', 'Average ⬆️', 'Perplexity', 'Match P-Value ⬆️', 'Type', 'Architecture', 'Precision', 'Hub License', '#Params (B)', 'Hub ❤️', 'Available on the hub', 'Model sha']
Retrieved leaderboard df: (0, 13)
⚠️ DataFrame is None or empty, returning empty DataFrame
✅ Initial DataFrame created with shape: (0, 6)
Columns: ['Model', 'Perplexity', 'Match P-Value', 'Average Score', 'Type', 'Precision']
🎨 Creating Gradio interface...
🎯 GRADIO INTERFACE SETUP COMPLETE
LAUNCHING GRADIO APP WITH MODEL TRACING INTEGRATION
Features enabled:
- Perplexity evaluation
- Model trace p-value computation (vs GPT-2 base)
- Match statistic with alignment
Ready to accept requests!
* Running on local URL: http://0.0.0.0:7860, with SSR ⚡ (experimental, to disable set `ssr=False` in `launch()`)