NCHMARK_COLS: ['Perplexity']
=== END COLUMN SETUP ===
🔧 CHECKING MODEL TRACING AVAILABILITY...
   - Model tracing path: /home/user/app/src/evaluation/../../model-tracing
   - Path exists: True
   - main.py exists: True
🎯 Final MODEL_TRACING_AVAILABLE = True

.gitattributes:   0%|          | 0.00/2.46k [00:00<?, ?B/s]
.gitattributes: 100%|██████████| 2.46k/2.46k [00:00<00:00, 10.1MB/s]

(…)therAI_gpt-neo-1.3B_20250726_010247.json:   0%|          | 0.00/202 [00:00<?, ?B/s]
(…)therAI_gpt-neo-1.3B_20250726_010247.json: 100%|██████████| 202/202 [00:00<00:00, 748kB/s]

(…)s_facebook_opt-125m_20250726_020655.json:   0%|          | 0.00/205 [00:00<?, ?B/s]
(…)s_facebook_opt-125m_20250726_020655.json: 100%|██████████| 205/205 [00:00<00:00, 909kB/s]

(…)s_facebook_opt-350m_20250726_021737.json:   0%|          | 0.00/205 [00:00<?, ?B/s]
(…)s_facebook_opt-350m_20250726_021737.json: 100%|██████████| 205/205 [00:00<00:00, 850kB/s]

(…)ommunity_gpt2-large_20250726_013038.json:   0%|          | 0.00/214 [00:00<?, ?B/s]
(…)ommunity_gpt2-large_20250726_013038.json: 100%|██████████| 214/214 [00:00<00:00, 1.03MB/s]

(…)mmunity_gpt2-medium_20250726_015555.json:   0%|          | 0.00/216 [00:00<?, ?B/s]
(…)mmunity_gpt2-medium_20250726_015555.json: 100%|██████████| 216/216 [00:00<00:00, 730kB/s]

(…)enai-community_gpt2_20250725_231201.json:   0%|          | 0.00/209 [00:00<?, ?B/s]
(…)enai-community_gpt2_20250725_231201.json: 100%|██████████| 209/209 [00:00<00:00, 533kB/s]

(…)enai-community_gpt2_20250725_233155.json:   0%|          | 0.00/209 [00:00<?, ?B/s]
(…)enai-community_gpt2_20250725_233155.json: 100%|██████████| 209/209 [00:00<00:00, 905kB/s]

(…)enai-community_gpt2_20250725_235115.json:   0%|          | 0.00/209 [00:00<?, ?B/s]
(…)enai-community_gpt2_20250725_235115.json: 100%|██████████| 209/209 [00:00<00:00, 801kB/s]

(…)enai-community_gpt2_20250725_235748.json:   0%|          | 0.00/209 [00:00<?, ?B/s]
(…)enai-community_gpt2_20250725_235748.json: 100%|██████████| 209/209 [00:00<00:00, 856kB/s]

(…)enai-community_gpt2_20250726_000358.json:   0%|          | 0.00/209 [00:00<?, ?B/s]
(…)enai-community_gpt2_20250726_000358.json: 100%|██████████| 209/209 [00:00<00:00, 696kB/s]

(…)enai-community_gpt2_20250726_000650.json:   0%|          | 0.00/209 [00:00<?, ?B/s]
(…)enai-community_gpt2_20250726_000650.json: 100%|██████████| 209/209 [00:00<00:00, 792kB/s]

(…)enai-community_gpt2_20250726_015147.json:   0%|          | 0.00/209 [00:00<?, ?B/s]
(…)enai-community_gpt2_20250726_015147.json: 100%|██████████| 209/209 [00:00<00:00, 1.12MB/s]

🚀 STARTING GRADIO APP INITIALIZATION
📊 Initializing allowed models...

🚀 INITIALIZING ALLOWED MODELS
📋 Models to initialize: ['lmsys/vicuna-7b-v1.5', 'ibm-granite/granite-7b-base', 'EleutherAI/llemma_7b']

🧹 CLEANING NON-ALLOWED RESULT FILES
🗑️ Removing non-allowed model result: ./eval-results/EleutherAI/results_EleutherAI_gpt-neo-1.3B_20250726_010247.json (model: EleutherAI/gpt-neo-1.3B)
🗑️ Removing non-allowed model result: ./eval-results/facebook/results_facebook_opt-125m_20250726_020655.json (model: facebook/opt-125m)
🗑️ Removing non-allowed model result: ./eval-results/facebook/results_facebook_opt-350m_20250726_021737.json (model: facebook/opt-350m)
🗑️ Removing non-allowed model result: ./eval-results/openai-community/results_openai-community_gpt2-large_20250726_013038.json (model: openai-community/gpt2-large)
🗑️ Removing non-allowed model result: ./eval-results/openai-community/results_openai-community_gpt2-medium_20250726_015555.json (model: openai-community/gpt2-medium)
🗑️ Removing non-allowed model result: ./eval-results/openai-community/results_openai-community_gpt2_20250725_231201.json (model: openai-community/gpt2)
🗑️ Removing non-allowed model result: ./eval-results/openai-community/results_openai-community_gpt2_20250725_233155.json (model: openai-community/gpt2)
🗑️ Removing non-allowed model result: ./eval-results/openai-community/results_openai-community_gpt2_20250725_235115.json (model: openai-community/gpt2)
🗑️ Removing non-allowed model result: ./eval-results/openai-community/results_openai-community_gpt2_20250725_235748.json (model: openai-community/gpt2)
🗑️ Removing non-allowed model result: ./eval-results/openai-community/results_openai-community_gpt2_20250726_000358.json (model: openai-community/gpt2)
🗑️ Removing non-allowed model result: ./eval-results/openai-community/results_openai-community_gpt2_20250726_000650.json (model: openai-community/gpt2)
🗑️ Removing non-allowed model result: ./eval-results/openai-community/results_openai-community_gpt2_20250726_015147.json (model: openai-community/gpt2)
✅ Removed 12 non-allowed result files
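
The cleanup pass logged above drops every result file whose model is not on the allow-list (12 files in this run). A minimal sketch of that filter follows; it is not the app's actual code. The helper name `select_non_allowed` is hypothetical, and it assumes each parsed result JSON stores its model id under `config.model_name`, as in the standard Hugging Face leaderboard template.

```python
from typing import Iterable, List, Mapping


def select_non_allowed(results: Mapping[str, dict],
                       allowed_models: Iterable[str]) -> List[str]:
    """Return the paths of result files whose model is not allow-listed.

    `results` maps a file path to its already-parsed JSON content.
    Assumption (hypothetical schema): the model id is stored at
    data["config"]["model_name"]; files without it count as non-allowed.
    """
    allowed = set(allowed_models)
    stale = []
    for path, data in sorted(results.items()):
        model = data.get("config", {}).get("model_name")
        if model not in allowed:
            stale.append(path)
    return stale


# Allow-list taken from the log; the parsed contents below are illustrative only.
ALLOWED = ["lmsys/vicuna-7b-v1.5", "ibm-granite/granite-7b-base", "EleutherAI/llemma_7b"]
example = {
    "./eval-results/facebook/results_facebook_opt-125m_20250726_020655.json":
        {"config": {"model_name": "facebook/opt-125m"}},
    "./eval-results/lmsys_vicuna_7b_v1.5_float16.json":
        {"config": {"model_name": "lmsys/vicuna-7b-v1.5"}},
}
print(select_non_allowed(example, ALLOWED))
```

In a real run the caller would load each file with `json.load` and delete the returned paths with `pathlib.Path.unlink()`; keeping the filter pure makes it testable without touching the filesystem.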

🔧 CREATING RESULT FILE FOR: lmsys/vicuna-7b-v1.5
📁 Result file path: ./eval-results/lmsys_vicuna_7b_v1.5_float16.json
✅ Created result file: ./eval-results/lmsys_vicuna_7b_v1.5_float16.json

🔧 CREATING RESULT FILE FOR: ibm-granite/granite-7b-base
📁 Result file path: ./eval-results/ibm_granite_granite_7b_base_float16.json
✅ Created result file: ./eval-results/ibm_granite_granite_7b_base_float16.json

🔧 CREATING RESULT FILE FOR: EleutherAI/llemma_7b
📁 Result file path: ./eval-results/EleutherAI_llemma_7b_float16.json
✅ Created result file: ./eval-results/EleutherAI_llemma_7b_float16.json
✅ Initialized 3 model result files
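
The three result-file paths logged above follow a visible pattern: the model id with `/` and `-` replaced by `_` (dots kept), then the precision as a suffix before `.json`. The sketch below reproduces that pattern; the helper name `result_file_path` and the `float16` default are assumptions inferred from these log lines, not taken from the app's source.

```python
def result_file_path(model_id: str, precision: str = "float16",
                     results_dir: str = "./eval-results") -> str:
    """Build a result-file path matching the names seen in the log.

    Hypothetical helper: replaces '/' and '-' with '_', keeps dots,
    and appends the precision before the .json extension.
    """
    safe_name = model_id.replace("/", "_").replace("-", "_")
    return f"{results_dir}/{safe_name}_{precision}.json"


for model_id in ["lmsys/vicuna-7b-v1.5", "ibm-granite/granite-7b-base", "EleutherAI/llemma_7b"]:
    print(result_file_path(model_id))
```

Running it prints the same three paths as the log. Note that these flat names differ from the per-organization `./eval-results/<org>/results_<org>_<model>_<timestamp>.json` layout of the files removed earlier.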
📊 Creating initial results DataFrame...

📊 CREATE_RESULTS_DATAFRAME CALLED

=== GET_LEADERBOARD_DF DEBUG ===
Starting leaderboard creation...
Looking for results in: ./eval-results
Expected columns: ['T', 'Model', 'Average ⬆️', 'Perplexity', 'Match P-Value ⬇️', 'Type', 'Architecture', 'Precision', 'Hub License', '#Params (B)', 'Hub ❤️', 'Available on the hub', 'Model sha']
Benchmark columns: ['Perplexity']

Searching for result files in: ./eval-results
Found 0 result files

Processing 0 evaluation results

Returning 0 processed results

Found 0 raw results
No raw data found, creating empty DataFrame
Creating empty fallback DataFrame...
Empty DataFrame created with columns: ['T', 'Model', 'Average ⬆️', 'Perplexity', 'Match P-Value ⬇️', 'Type', 'Architecture', 'Precision', 'Hub License', '#Params (B)', 'Hub ❤️', 'Available on the hub', 'Model sha']
📋 Retrieved leaderboard df: (0, 13)
⚠️ DataFrame is None or empty, returning empty DataFrame
✅ Initial DataFrame created with shape: (0, 6)
📋 Columns: ['Model', 'Perplexity', 'Match P-Value', 'Average Score', 'Type', 'Precision']
🎨 Creating Gradio interface...
🎯 GRADIO INTERFACE SETUP COMPLETE
🚀 LAUNCHING GRADIO APP WITH MODEL TRACING INTEGRATION
📊 Features enabled:
   - Perplexity evaluation
   - Model trace p-value computation (vs GPT-2 base)
   - Match statistic with alignment
🎉 Ready to accept requests!
* Running on local URL:  http://0.0.0.0:7860, with SSR ⚡ (experimental, to disable set `ssr=False` in `launch()`)