2025-03-31 22:03:49,677 - __main__ - INFO - Initializing leaderboard data...
2025-03-31 22:03:49,841 - __main__ - INFO - Loaded leaderboard with 3 entries
2025-03-31 22:03:51,379 - apscheduler.scheduler - INFO - Adding job tentatively -- it will be properly scheduled when the scheduler starts
2025-03-31 22:03:51,379 - apscheduler.scheduler - INFO - Added job "" to job store "default"
2025-03-31 22:03:51,379 - apscheduler.scheduler - INFO - Scheduler started
2025-03-31 22:04:45,096 - __main__ - INFO - Received submission for model gpt4omini-TEST: /tmp/gradio/26f1c6837517a59736a02ffe486b3504336116347e339745a7973e9412dad4db/gpt-4o-mini-3.jsonl
2025-03-31 22:04:45,574 - guardbench.context - INFO - Loading dataset from: whitecircle-ai/guardbench_dataset_1k_public
2025-03-31 22:04:46,470 - guardbench.context - INFO - Successfully loaded dataset with 980 examples
2025-03-31 22:04:46,470 - guardbench.evaluator - INFO - Starting evaluation for model: gpt4omini-TEST
2025-03-31 22:04:46,470 - guardbench.evaluator - INFO - Using cached results for model: gpt4omini-TEST
2025-03-31 22:04:46,490 - guardbench.evaluator - INFO - Processing cached results for category: Labor Exploitation and Human Trafficking
2025-03-31 22:04:46,493 - guardbench.evaluator - INFO - Length Labor Exploitation and Human Trafficking - 30
2025-03-31 22:04:46,595 - guardbench.evaluator - INFO - Processing cached results for category: Cybercrime, Hacking, and Digital Exploits
2025-03-31 22:04:46,600 - guardbench.evaluator - INFO - Length Cybercrime, Hacking, and Digital Exploits - 30
2025-03-31 22:04:46,702 - guardbench.evaluator - INFO - Processing cached results for category: Academic Dishonesty and Cheating
2025-03-31 22:04:46,706 - guardbench.evaluator - INFO - Length Academic Dishonesty and Cheating - 29
2025-03-31 22:04:46,804 - guardbench.evaluator - INFO - Processing cached results for category: Creative Content Involving Illicit Themes
2025-03-31 22:04:46,808 - guardbench.evaluator - INFO - Length Creative Content Involving Illicit Themes - 30
2025-03-31 22:04:46,914 - guardbench.evaluator - INFO - Processing cached results for category: Hate Speech, Extremism, and Discrimination
2025-03-31 22:04:46,917 - guardbench.evaluator - INFO - Length Hate Speech, Extremism, and Discrimination - 29
2025-03-31 22:04:47,019 - guardbench.evaluator - INFO - Processing cached results for category: Weapon, Explosives, and Hazardous Materials
2025-03-31 22:04:47,023 - guardbench.evaluator - INFO - Length Weapon, Explosives, and Hazardous Materials - 30
2025-03-31 22:04:47,128 - guardbench.evaluator - INFO - Processing cached results for category: Drug– and Substance–Related Activities
2025-03-31 22:04:47,132 - guardbench.evaluator - INFO - Length Drug– and Substance–Related Activities - 30
2025-03-31 22:04:47,237 - guardbench.evaluator - INFO - Processing cached results for category: Criminal, Violent, and Terrorist Activity
2025-03-31 22:04:47,240 - guardbench.evaluator - INFO - Length Criminal, Violent, and Terrorist Activity - 30
2025-03-31 22:04:47,355 - guardbench.evaluator - INFO - Processing cached results for category: Political Corruption and Legal Evasion
2025-03-31 22:04:47,359 - guardbench.evaluator - INFO - Length Political Corruption and Legal Evasion - 30
2025-03-31 22:04:47,462 - guardbench.evaluator - INFO - Processing cached results for category: Child Exploitation and Abuse
2025-03-31 22:04:47,466 - guardbench.evaluator - INFO - Length Child Exploitation and Abuse - 30
2025-03-31 22:04:47,573 - guardbench.evaluator - INFO - Processing cached results for category: Environmental and Industrial Harm
2025-03-31 22:04:47,576 - guardbench.evaluator - INFO - Length Environmental and Industrial Harm - 30
2025-03-31 22:04:47,694 - guardbench.evaluator - INFO - Processing cached results for category: AI Manipulation and Jailbreaking
2025-03-31 22:04:47,697 - guardbench.evaluator - INFO - Length AI Manipulation and Jailbreaking - 30
2025-03-31 22:04:47,804 - guardbench.evaluator - INFO - Processing cached results for category: Financial Fraud and Unethical Business
2025-03-31 22:04:47,808 - guardbench.evaluator - INFO - Length Financial Fraud and Unethical Business - 30
2025-03-31 22:04:47,913 - guardbench.evaluator - INFO - Processing cached results for category: Manipulation, Deception, and Misinformation
2025-03-31 22:04:47,917 - guardbench.evaluator - INFO - Length Manipulation, Deception, and Misinformation - 30
2025-03-31 22:04:48,022 - guardbench.evaluator - INFO - Processing cached results for category: Safe Prompts
2025-03-31 22:04:48,026 - guardbench.evaluator - INFO - Length Safe Prompts - 490
2025-03-31 22:04:48,605 - guardbench.evaluator - INFO - Processing cached results for category: Self–Harm and Suicidal Ideation
2025-03-31 22:04:48,609 - guardbench.evaluator - INFO - Length Self–Harm and Suicidal Ideation - 13
2025-03-31 22:04:48,667 - guardbench.evaluator - INFO - Processing cached results for category: Sexual Content and Violence
2025-03-31 22:04:48,671 - guardbench.evaluator - INFO - Length Sexual Content and Violence - 29
2025-03-31 22:04:48,771 - guardbench.evaluator - INFO - Processing cached results for category: Animal Cruelty and Exploitation
2025-03-31 22:04:48,775 - guardbench.evaluator - INFO - Length Animal Cruelty and Exploitation - 30
2025-03-31 22:04:48,882 - guardbench.evaluator - INFO - Updated leaderboard for model: gpt4omini-TEST from cached results
2025-03-31 22:04:48,883 - guardbench.evaluator - INFO - Evaluation from cached results completed for model: gpt4omini-TEST
2025-03-31 22:04:50,345 - __main__ - INFO - Refreshing leaderboard data after submission for version v0...
2025-03-31 22:04:50,514 - __main__ - INFO - Refreshed leaderboard data after submission