2025-03-31 22:21:04,147 - __main__ - INFO - Initializing leaderboard data...
2025-03-31 22:21:04,266 - __main__ - INFO - Loaded leaderboard with 7 entries
2025-03-31 22:21:06,082 - apscheduler.scheduler - INFO - Adding job tentatively -- it will be properly scheduled when the scheduler starts
2025-03-31 22:21:06,083 - apscheduler.scheduler - INFO - Added job "<lambda>" to job store "default"
2025-03-31 22:21:06,083 - apscheduler.scheduler - INFO - Scheduler started
2025-03-31 22:21:41,064 - __main__ - INFO - Received submission for model gpt-4o-mini-TEST6: /tmp/gradio/26f1c6837517a59736a02ffe486b3504336116347e339745a7973e9412dad4db/gpt-4o-mini-3.jsonl
2025-03-31 22:21:42,056 - guardbench.context - INFO - Loading dataset from: whitecircle-ai/guardbench_dataset_1k_public
2025-03-31 22:21:42,922 - guardbench.context - INFO - Successfully loaded dataset with 980 examples
2025-03-31 22:21:42,922 - guardbench.evaluator - INFO - Starting evaluation for model: gpt-4o-mini-TEST6
2025-03-31 22:21:42,922 - guardbench.evaluator - INFO - Using cached results for model: gpt-4o-mini-TEST6
2025-03-31 22:21:42,942 - guardbench.evaluator - INFO - Processing cached results for category: Hate Speech, Extremism, and Discrimination
2025-03-31 22:21:42,945 - guardbench.evaluator - INFO - Length Hate Speech, Extremism, and Discrimination - 29
2025-03-31 22:21:43,044 - guardbench.evaluator - INFO - Processing cached results for category: Drug– and Substance–Related Activities
2025-03-31 22:21:43,048 - guardbench.evaluator - INFO - Length Drug– and Substance–Related Activities - 30
2025-03-31 22:21:43,151 - guardbench.evaluator - INFO - Processing cached results for category: AI Manipulation and Jailbreaking
2025-03-31 22:21:43,155 - guardbench.evaluator - INFO - Length AI Manipulation and Jailbreaking - 30
2025-03-31 22:21:43,257 - guardbench.evaluator - INFO - Processing cached results for category: Political Corruption and Legal Evasion
2025-03-31 22:21:43,260 - guardbench.evaluator - INFO - Length Political Corruption and Legal Evasion - 30
2025-03-31 22:21:43,360 - guardbench.evaluator - INFO - Processing cached results for category: Academic Dishonesty and Cheating
2025-03-31 22:21:43,363 - guardbench.evaluator - INFO - Length Academic Dishonesty and Cheating - 29
2025-03-31 22:21:43,461 - guardbench.evaluator - INFO - Processing cached results for category: Labor Exploitation and Human Trafficking
2025-03-31 22:21:43,465 - guardbench.evaluator - INFO - Length Labor Exploitation and Human Trafficking - 30
2025-03-31 22:21:43,563 - guardbench.evaluator - INFO - Processing cached results for category: Safe Prompts
2025-03-31 22:21:43,567 - guardbench.evaluator - INFO - Length Safe Prompts - 490
2025-03-31 22:21:44,119 - guardbench.evaluator - INFO - Processing cached results for category: Manipulation, Deception, and Misinformation
2025-03-31 22:21:44,123 - guardbench.evaluator - INFO - Length Manipulation, Deception, and Misinformation - 30
2025-03-31 22:21:44,223 - guardbench.evaluator - INFO - Processing cached results for category: Sexual Content and Violence
2025-03-31 22:21:44,226 - guardbench.evaluator - INFO - Length Sexual Content and Violence - 29
2025-03-31 22:21:44,323 - guardbench.evaluator - INFO - Processing cached results for category: Self–Harm and Suicidal Ideation
2025-03-31 22:21:44,326 - guardbench.evaluator - INFO - Length Self–Harm and Suicidal Ideation - 13
2025-03-31 22:21:44,383 - guardbench.evaluator - INFO - Processing cached results for category: Animal Cruelty and Exploitation
2025-03-31 22:21:44,387 - guardbench.evaluator - INFO - Length Animal Cruelty and Exploitation - 30
2025-03-31 22:21:44,487 - guardbench.evaluator - INFO - Processing cached results for category: Criminal, Violent, and Terrorist Activity
2025-03-31 22:21:44,490 - guardbench.evaluator - INFO - Length Criminal, Violent, and Terrorist Activity - 30
2025-03-31 22:21:44,590 - guardbench.evaluator - INFO - Processing cached results for category: Child Exploitation and Abuse
2025-03-31 22:21:44,593 - guardbench.evaluator - INFO - Length Child Exploitation and Abuse - 30
2025-03-31 22:21:44,694 - guardbench.evaluator - INFO - Processing cached results for category: Financial Fraud and Unethical Business
2025-03-31 22:21:44,698 - guardbench.evaluator - INFO - Length Financial Fraud and Unethical Business - 30
2025-03-31 22:21:44,799 - guardbench.evaluator - INFO - Processing cached results for category: Cybercrime, Hacking, and Digital Exploits
2025-03-31 22:21:44,802 - guardbench.evaluator - INFO - Length Cybercrime, Hacking, and Digital Exploits - 30
2025-03-31 22:21:44,903 - guardbench.evaluator - INFO - Processing cached results for category: Creative Content Involving Illicit Themes
2025-03-31 22:21:44,907 - guardbench.evaluator - INFO - Length Creative Content Involving Illicit Themes - 30
2025-03-31 22:21:45,008 - guardbench.evaluator - INFO - Processing cached results for category: Environmental and Industrial Harm
2025-03-31 22:21:45,011 - guardbench.evaluator - INFO - Length Environmental and Industrial Harm - 30
2025-03-31 22:21:45,112 - guardbench.evaluator - INFO - Processing cached results for category: Weapon, Explosives, and Hazardous Materials
2025-03-31 22:21:45,116 - guardbench.evaluator - INFO - Length Weapon, Explosives, and Hazardous Materials - 30
2025-03-31 22:21:45,222 - guardbench.evaluator - INFO - Updated leaderboard for model: gpt-4o-mini-TEST6 from cached results
2025-03-31 22:21:45,223 - guardbench.evaluator - INFO - Evaluation from cached results completed for model: gpt-4o-mini-TEST6
2025-03-31 22:21:47,120 - __main__ - INFO - Refreshing leaderboard data after submission for version v0...
2025-03-31 22:21:47,363 - __main__ - INFO - Refreshed leaderboard data after submission