circle-guard-bench / logs /guardbench_20250331_223148_4e22eb66.log
2025-03-31 22:31:49,992 - __main__ - INFO - Initializing leaderboard data...
2025-03-31 22:31:50,115 - __main__ - INFO - Loaded leaderboard with 9 entries
2025-03-31 22:31:51,543 - apscheduler.scheduler - INFO - Adding job tentatively -- it will be properly scheduled when the scheduler starts
2025-03-31 22:31:51,543 - apscheduler.scheduler - INFO - Added job "<lambda>" to job store "default"
2025-03-31 22:31:51,543 - apscheduler.scheduler - INFO - Scheduler started
2025-03-31 22:32:19,107 - __main__ - INFO - Received submission for model got-r0mini-8: /tmp/gradio/26f1c6837517a59736a02ffe486b3504336116347e339745a7973e9412dad4db/gpt-4o-mini-3.jsonl
2025-03-31 22:32:19,583 - guardbench.context - INFO - Loading dataset from: whitecircle-ai/guardbench_dataset_1k_public
2025-03-31 22:32:20,839 - guardbench.context - INFO - Successfully loaded dataset with 980 examples
2025-03-31 22:32:20,839 - guardbench.evaluator - INFO - Starting evaluation for model: got-r0mini-8
2025-03-31 22:32:20,839 - guardbench.evaluator - INFO - Using cached results for model: got-r0mini-8
2025-03-31 22:32:20,857 - guardbench.evaluator - INFO - Processing cached results for category: Political Corruption and Legal Evasion
2025-03-31 22:32:20,861 - guardbench.evaluator - INFO - Length Political Corruption and Legal Evasion - 30
2025-03-31 22:32:20,960 - guardbench.evaluator - INFO - Processing cached results for category: Creative Content Involving Illicit Themes
2025-03-31 22:32:20,963 - guardbench.evaluator - INFO - Length Creative Content Involving Illicit Themes - 30
2025-03-31 22:32:21,062 - guardbench.evaluator - INFO - Processing cached results for category: Financial Fraud and Unethical Business
2025-03-31 22:32:21,065 - guardbench.evaluator - INFO - Length Financial Fraud and Unethical Business - 30
2025-03-31 22:32:21,165 - guardbench.evaluator - INFO - Processing cached results for category: Manipulation, Deception, and Misinformation
2025-03-31 22:32:21,168 - guardbench.evaluator - INFO - Length Manipulation, Deception, and Misinformation - 30
2025-03-31 22:32:21,266 - guardbench.evaluator - INFO - Processing cached results for category: Drug– and Substance–Related Activities
2025-03-31 22:32:21,269 - guardbench.evaluator - INFO - Length Drug– and Substance–Related Activities - 30
2025-03-31 22:32:21,367 - guardbench.evaluator - INFO - Processing cached results for category: AI Manipulation and Jailbreaking
2025-03-31 22:32:21,371 - guardbench.evaluator - INFO - Length AI Manipulation and Jailbreaking - 30
2025-03-31 22:32:21,469 - guardbench.evaluator - INFO - Processing cached results for category: Sexual Content and Violence
2025-03-31 22:32:21,473 - guardbench.evaluator - INFO - Length Sexual Content and Violence - 29
2025-03-31 22:32:21,568 - guardbench.evaluator - INFO - Processing cached results for category: Cybercrime, Hacking, and Digital Exploits
2025-03-31 22:32:21,571 - guardbench.evaluator - INFO - Length Cybercrime, Hacking, and Digital Exploits - 30
2025-03-31 22:32:21,669 - guardbench.evaluator - INFO - Processing cached results for category: Safe Prompts
2025-03-31 22:32:21,673 - guardbench.evaluator - INFO - Length Safe Prompts - 490
2025-03-31 22:32:22,215 - guardbench.evaluator - INFO - Processing cached results for category: Child Exploitation and Abuse
2025-03-31 22:32:22,219 - guardbench.evaluator - INFO - Length Child Exploitation and Abuse - 30
2025-03-31 22:32:22,318 - guardbench.evaluator - INFO - Processing cached results for category: Labor Exploitation and Human Trafficking
2025-03-31 22:32:22,321 - guardbench.evaluator - INFO - Length Labor Exploitation and Human Trafficking - 30
2025-03-31 22:32:22,418 - guardbench.evaluator - INFO - Processing cached results for category: Self–Harm and Suicidal Ideation
2025-03-31 22:32:22,422 - guardbench.evaluator - INFO - Length Self–Harm and Suicidal Ideation - 13
2025-03-31 22:32:22,476 - guardbench.evaluator - INFO - Processing cached results for category: Criminal, Violent, and Terrorist Activity
2025-03-31 22:32:22,480 - guardbench.evaluator - INFO - Length Criminal, Violent, and Terrorist Activity - 30
2025-03-31 22:32:22,578 - guardbench.evaluator - INFO - Processing cached results for category: Hate Speech, Extremism, and Discrimination
2025-03-31 22:32:22,581 - guardbench.evaluator - INFO - Length Hate Speech, Extremism, and Discrimination - 29
2025-03-31 22:32:22,678 - guardbench.evaluator - INFO - Processing cached results for category: Environmental and Industrial Harm
2025-03-31 22:32:22,681 - guardbench.evaluator - INFO - Length Environmental and Industrial Harm - 30
2025-03-31 22:32:22,779 - guardbench.evaluator - INFO - Processing cached results for category: Animal Cruelty and Exploitation
2025-03-31 22:32:22,783 - guardbench.evaluator - INFO - Length Animal Cruelty and Exploitation - 30
2025-03-31 22:32:22,884 - guardbench.evaluator - INFO - Processing cached results for category: Academic Dishonesty and Cheating
2025-03-31 22:32:22,888 - guardbench.evaluator - INFO - Length Academic Dishonesty and Cheating - 29
2025-03-31 22:32:22,984 - guardbench.evaluator - INFO - Processing cached results for category: Weapon, Explosives, and Hazardous Materials
2025-03-31 22:32:22,987 - guardbench.evaluator - INFO - Length Weapon, Explosives, and Hazardous Materials - 30
2025-03-31 22:32:23,092 - guardbench.evaluator - INFO - Updated leaderboard for model: got-r0mini-8 from cached results
2025-03-31 22:32:23,093 - guardbench.evaluator - INFO - Evaluation from cached results completed for model: got-r0mini-8
2025-03-31 22:32:25,575 - __main__ - INFO - Refreshing leaderboard data after submission for version v0...
2025-03-31 22:32:25,790 - __main__ - INFO - Refreshed leaderboard data after submission