model_trace / logs.txt
Ahmed Ahmed
try again
1bac1ed
NCHMARK_COLS: ['Perplexity']
=== END COLUMN SETUP ===
🔧 CHECKING MODEL TRACING AVAILABILITY...
- Model tracing path: /home/user/app/src/evaluation/../../model-tracing
- Path exists: True
- main.py exists: True
🎯 Final MODEL_TRACING_AVAILABLE = True
.gitattributes: 0%| | 0.00/2.46k [00:00<?, ?B/s]
.gitattributes: 100%|██████████| 2.46k/2.46k [00:00<00:00, 10.1MB/s]
(…)therAI_gpt-neo-1.3B_20250726_010247.json: 0%| | 0.00/202 [00:00<?, ?B/s]
(…)therAI_gpt-neo-1.3B_20250726_010247.json: 100%|██████████| 202/202 [00:00<00:00, 748kB/s]
(…)s_facebook_opt-125m_20250726_020655.json: 0%| | 0.00/205 [00:00<?, ?B/s]
(…)s_facebook_opt-125m_20250726_020655.json: 100%|██████████| 205/205 [00:00<00:00, 909kB/s]
(…)s_facebook_opt-350m_20250726_021737.json: 0%| | 0.00/205 [00:00<?, ?B/s]
(…)s_facebook_opt-350m_20250726_021737.json: 100%|██████████| 205/205 [00:00<00:00, 850kB/s]
(…)ommunity_gpt2-large_20250726_013038.json: 0%| | 0.00/214 [00:00<?, ?B/s]
(…)ommunity_gpt2-large_20250726_013038.json: 100%|██████████| 214/214 [00:00<00:00, 1.03MB/s]
(…)mmunity_gpt2-medium_20250726_015555.json: 0%| | 0.00/216 [00:00<?, ?B/s]
(…)mmunity_gpt2-medium_20250726_015555.json: 100%|██████████| 216/216 [00:00<00:00, 730kB/s]
(…)enai-community_gpt2_20250725_231201.json: 0%| | 0.00/209 [00:00<?, ?B/s]
(…)enai-community_gpt2_20250725_231201.json: 100%|██████████| 209/209 [00:00<00:00, 533kB/s]
(…)enai-community_gpt2_20250725_233155.json: 0%| | 0.00/209 [00:00<?, ?B/s]
(…)enai-community_gpt2_20250725_233155.json: 100%|██████████| 209/209 [00:00<00:00, 905kB/s]
(…)enai-community_gpt2_20250725_235115.json: 0%| | 0.00/209 [00:00<?, ?B/s]
(…)enai-community_gpt2_20250725_235115.json: 100%|██████████| 209/209 [00:00<00:00, 801kB/s]
(…)enai-community_gpt2_20250725_235748.json: 0%| | 0.00/209 [00:00<?, ?B/s]
(…)enai-community_gpt2_20250725_235748.json: 100%|██████████| 209/209 [00:00<00:00, 856kB/s]
(…)enai-community_gpt2_20250726_000358.json: 0%| | 0.00/209 [00:00<?, ?B/s]
(…)enai-community_gpt2_20250726_000358.json: 100%|██████████| 209/209 [00:00<00:00, 696kB/s]
(…)enai-community_gpt2_20250726_000650.json: 0%| | 0.00/209 [00:00<?, ?B/s]
(…)enai-community_gpt2_20250726_000650.json: 100%|██████████| 209/209 [00:00<00:00, 792kB/s]
(…)enai-community_gpt2_20250726_015147.json: 0%| | 0.00/209 [00:00<?, ?B/s]
(…)enai-community_gpt2_20250726_015147.json: 100%|██████████| 209/209 [00:00<00:00, 1.12MB/s]
🚀 STARTING GRADIO APP INITIALIZATION
📊 Initializing allowed models...
🚀 INITIALIZING ALLOWED MODELS
📋 Models to initialize: ['lmsys/vicuna-7b-v1.5', 'ibm-granite/granite-7b-base', 'EleutherAI/llemma_7b']
🧹 CLEANING NON-ALLOWED RESULT FILES
πŸ—‘οΈ Removing non-allowed model result: ./eval-results/EleutherAI/results_EleutherAI_gpt-neo-1.3B_20250726_010247.json (model: EleutherAI/gpt-neo-1.3B)
πŸ—‘οΈ Removing non-allowed model result: ./eval-results/facebook/results_facebook_opt-125m_20250726_020655.json (model: facebook/opt-125m)
πŸ—‘οΈ Removing non-allowed model result: ./eval-results/facebook/results_facebook_opt-350m_20250726_021737.json (model: facebook/opt-350m)
πŸ—‘οΈ Removing non-allowed model result: ./eval-results/openai-community/results_openai-community_gpt2-large_20250726_013038.json (model: openai-community/gpt2-large)
πŸ—‘οΈ Removing non-allowed model result: ./eval-results/openai-community/results_openai-community_gpt2-medium_20250726_015555.json (model: openai-community/gpt2-medium)
πŸ—‘οΈ Removing non-allowed model result: ./eval-results/openai-community/results_openai-community_gpt2_20250725_231201.json (model: openai-community/gpt2)
πŸ—‘οΈ Removing non-allowed model result: ./eval-results/openai-community/results_openai-community_gpt2_20250725_233155.json (model: openai-community/gpt2)
πŸ—‘οΈ Removing non-allowed model result: ./eval-results/openai-community/results_openai-community_gpt2_20250725_235115.json (model: openai-community/gpt2)
πŸ—‘οΈ Removing non-allowed model result: ./eval-results/openai-community/results_openai-community_gpt2_20250725_235748.json (model: openai-community/gpt2)
πŸ—‘οΈ Removing non-allowed model result: ./eval-results/openai-community/results_openai-community_gpt2_20250726_000358.json (model: openai-community/gpt2)
πŸ—‘οΈ Removing non-allowed model result: ./eval-results/openai-community/results_openai-community_gpt2_20250726_000650.json (model: openai-community/gpt2)
πŸ—‘οΈ Removing non-allowed model result: ./eval-results/openai-community/results_openai-community_gpt2_20250726_015147.json (model: openai-community/gpt2)
βœ… Removed 12 non-allowed result files
🔧 CREATING RESULT FILE FOR: lmsys/vicuna-7b-v1.5
📁 Result file path: ./eval-results/lmsys_vicuna_7b_v1.5_float16.json
✅ Created result file: ./eval-results/lmsys_vicuna_7b_v1.5_float16.json
🔧 CREATING RESULT FILE FOR: ibm-granite/granite-7b-base
📁 Result file path: ./eval-results/ibm_granite_granite_7b_base_float16.json
✅ Created result file: ./eval-results/ibm_granite_granite_7b_base_float16.json
🔧 CREATING RESULT FILE FOR: EleutherAI/llemma_7b
📁 Result file path: ./eval-results/EleutherAI_llemma_7b_float16.json
✅ Created result file: ./eval-results/EleutherAI_llemma_7b_float16.json
✅ Initialized 3 model result files
📊 Creating initial results DataFrame...
📊 CREATE_RESULTS_DATAFRAME CALLED
=== GET_LEADERBOARD_DF DEBUG ===
Starting leaderboard creation...
Looking for results in: ./eval-results
Expected columns: ['T', 'Model', 'Average ⬆️', 'Perplexity', 'Match P-Value ⬇️', 'Type', 'Architecture', 'Precision', 'Hub License', '#Params (B)', 'Hub ❤️', 'Available on the hub', 'Model sha']
Benchmark columns: ['Perplexity']
Searching for result files in: ./eval-results
Found 0 result files
Processing 0 evaluation results
Returning 0 processed results
Found 0 raw results
No raw data found, creating empty DataFrame
Creating empty fallback DataFrame...
Empty DataFrame created with columns: ['T', 'Model', 'Average ⬆️', 'Perplexity', 'Match P-Value ⬇️', 'Type', 'Architecture', 'Precision', 'Hub License', '#Params (B)', 'Hub ❤️', 'Available on the hub', 'Model sha']
📋 Retrieved leaderboard df: (0, 13)
⚠️ DataFrame is None or empty, returning empty DataFrame
✅ Initial DataFrame created with shape: (0, 6)
📋 Columns: ['Model', 'Perplexity', 'Match P-Value', 'Average Score', 'Type', 'Precision']
🎨 Creating Gradio interface...
🎯 GRADIO INTERFACE SETUP COMPLETE
🚀 LAUNCHING GRADIO APP WITH MODEL TRACING INTEGRATION
📊 Features enabled:
- Perplexity evaluation
- Model trace p-value computation (vs GPT-2 base)
- Match statistic with alignment
🎉 Ready to accept requests!
* Running on local URL: http://0.0.0.0:7860, with SSR ⚡ (experimental, to disable set `ssr=False` in `launch()`)