2025-03-31 20:54:21,474 - __main__ - INFO - Initializing leaderboard data...
2025-03-31 20:54:21,606 - __main__ - INFO - Loaded leaderboard with 0 entries
2025-03-31 20:54:21,675 - __main__ - WARNING - Initializing empty leaderboard
2025-03-31 20:54:21,785 - __main__ - WARNING - Initializing empty leaderboard
2025-03-31 20:54:21,881 - __main__ - WARNING - Initializing empty leaderboard
2025-03-31 20:54:21,977 - __main__ - WARNING - Initializing empty leaderboard
2025-03-31 20:54:22,074 - __main__ - WARNING - Initializing empty leaderboard
2025-03-31 20:54:22,169 - __main__ - WARNING - Initializing empty leaderboard
2025-03-31 20:54:22,293 - __main__ - WARNING - Initializing empty leaderboard
2025-03-31 20:54:22,394 - __main__ - WARNING - Initializing empty leaderboard
2025-03-31 20:54:22,505 - __main__ - WARNING - Initializing empty leaderboard
2025-03-31 20:54:22,594 - __main__ - WARNING - Initializing empty leaderboard
2025-03-31 20:54:22,685 - __main__ - WARNING - Initializing empty leaderboard
2025-03-31 20:54:22,997 - apscheduler.scheduler - INFO - Adding job tentatively -- it will be properly scheduled when the scheduler starts
2025-03-31 20:54:22,997 - apscheduler.scheduler - INFO - Added job "" to job store "default"
2025-03-31 20:54:22,997 - apscheduler.scheduler - INFO - Scheduler started
2025-03-31 20:55:51,877 - __main__ - INFO - Received submission for model chatgpt-4o-latest (CoT): /tmp/gradio/a1f2d3a725f7b441a1fbfdac8e51dfd3bf7bbb4ab2d1c20362cfa130f4bdda6d/chatgpt-4o-latest CoT.jsonl
2025-03-31 20:55:51,906 - guardbench.context - INFO - Loading dataset from: whitecircle-ai/guardbench_dataset_1k_public
2025-03-31 20:55:52,929 - guardbench.context - INFO - Successfully loaded dataset with 980 examples
2025-03-31 20:55:52,929 - guardbench.evaluator - INFO - Starting evaluation for model: chatgpt-4o-latest_(CoT)
2025-03-31 20:55:52,929 - guardbench.evaluator - INFO - Using cached results for model: chatgpt-4o-latest_(CoT)
2025-03-31 20:55:52,966 - guardbench.evaluator - INFO - Processing cached results for category: Animal Cruelty and Exploitation
2025-03-31 20:55:52,970 - guardbench.evaluator - INFO - Length Animal Cruelty and Exploitation - 30
2025-03-31 20:55:53,073 - guardbench.evaluator - INFO - Processing cached results for category: Hate Speech, Extremism, and Discrimination
2025-03-31 20:55:53,076 - guardbench.evaluator - INFO - Length Hate Speech, Extremism, and Discrimination - 29
2025-03-31 20:55:53,175 - guardbench.evaluator - INFO - Processing cached results for category: Creative Content Involving Illicit Themes
2025-03-31 20:55:53,178 - guardbench.evaluator - INFO - Length Creative Content Involving Illicit Themes - 30
2025-03-31 20:55:53,281 - guardbench.evaluator - INFO - Processing cached results for category: AI Manipulation and Jailbreaking
2025-03-31 20:55:53,284 - guardbench.evaluator - INFO - Length AI Manipulation and Jailbreaking - 30
2025-03-31 20:55:53,386 - guardbench.evaluator - INFO - Processing cached results for category: Sexual Content and Violence
2025-03-31 20:55:53,390 - guardbench.evaluator - INFO - Length Sexual Content and Violence - 29
2025-03-31 20:55:53,487 - guardbench.evaluator - INFO - Processing cached results for category: Child Exploitation and Abuse
2025-03-31 20:55:53,491 - guardbench.evaluator - INFO - Length Child Exploitation and Abuse - 30
2025-03-31 20:55:53,592 - guardbench.evaluator - INFO - Processing cached results for category: Manipulation, Deception, and Misinformation
2025-03-31 20:55:53,596 - guardbench.evaluator - INFO - Length Manipulation, Deception, and Misinformation - 30
2025-03-31 20:55:53,698 - guardbench.evaluator - INFO - Processing cached results for category: Safe Prompts
2025-03-31 20:55:53,701 - guardbench.evaluator - INFO - Length Safe Prompts - 490
2025-03-31 20:55:54,267 - guardbench.evaluator - INFO - Processing cached results for category: Environmental and Industrial Harm
2025-03-31 20:55:54,271 - guardbench.evaluator - INFO - Length Environmental and Industrial Harm - 30
2025-03-31 20:55:54,371 - guardbench.evaluator - INFO - Processing cached results for category: Self–Harm and Suicidal Ideation
2025-03-31 20:55:54,375 - guardbench.evaluator - INFO - Length Self–Harm and Suicidal Ideation - 13
2025-03-31 20:55:54,431 - guardbench.evaluator - INFO - Processing cached results for category: Drug– and Substance–Related Activities
2025-03-31 20:55:54,434 - guardbench.evaluator - INFO - Length Drug– and Substance–Related Activities - 30
2025-03-31 20:55:54,534 - guardbench.evaluator - INFO - Processing cached results for category: Academic Dishonesty and Cheating
2025-03-31 20:55:54,537 - guardbench.evaluator - INFO - Length Academic Dishonesty and Cheating - 29
2025-03-31 20:55:54,634 - guardbench.evaluator - INFO - Processing cached results for category: Financial Fraud and Unethical Business
2025-03-31 20:55:54,638 - guardbench.evaluator - INFO - Length Financial Fraud and Unethical Business - 30
2025-03-31 20:55:54,738 - guardbench.evaluator - INFO - Processing cached results for category: Labor Exploitation and Human Trafficking
2025-03-31 20:55:54,741 - guardbench.evaluator - INFO - Length Labor Exploitation and Human Trafficking - 30
2025-03-31 20:55:54,841 - guardbench.evaluator - INFO - Processing cached results for category: Cybercrime, Hacking, and Digital Exploits
2025-03-31 20:55:54,844 - guardbench.evaluator - INFO - Length Cybercrime, Hacking, and Digital Exploits - 30
2025-03-31 20:55:54,945 - guardbench.evaluator - INFO - Processing cached results for category: Political Corruption and Legal Evasion
2025-03-31 20:55:54,949 - guardbench.evaluator - INFO - Length Political Corruption and Legal Evasion - 30
2025-03-31 20:55:55,049 - guardbench.evaluator - INFO - Processing cached results for category: Weapon, Explosives, and Hazardous Materials
2025-03-31 20:55:55,052 - guardbench.evaluator - INFO - Length Weapon, Explosives, and Hazardous Materials - 30
2025-03-31 20:55:55,152 - guardbench.evaluator - INFO - Processing cached results for category: Criminal, Violent, and Terrorist Activity
2025-03-31 20:55:55,156 - guardbench.evaluator - INFO - Length Criminal, Violent, and Terrorist Activity - 30
2025-03-31 20:55:55,260 - guardbench.evaluator - INFO - Updated leaderboard for model: chatgpt-4o-latest_(CoT) from cached results
2025-03-31 20:55:55,262 - guardbench.evaluator - INFO - Evaluation from cached results completed for model: chatgpt-4o-latest_(CoT)
2025-03-31 20:55:56,838 - __main__ - INFO - Refreshing leaderboard data after submission for version v0...
2025-03-31 20:55:57,001 - __main__ - INFO - Refreshed leaderboard data after submission