INFO: 2024-07-12 12:48:42,054: llmtf.base.evaluator: Starting eval on ['darumeru/multiq', 'darumeru/parus', 'darumeru/rcb', 'darumeru/ruopenbookqa', 'darumeru/rutie', 'darumeru/ruworldtree', 'darumeru/rwsd', 'darumeru/use', 'russiannlp/rucola_custom']
INFO: 2024-07-12 12:48:42,055: llmtf.base.hfmodel: Updated generation_config.eos_token_id: [32000, 13]
INFO: 2024-07-12 12:48:42,055: llmtf.base.hfmodel: Updated generation_config.stop_strings: ['\n', '\n\n']
INFO: 2024-07-12 12:48:43,444: llmtf.base.evaluator: Starting eval on ['darumeru/rummlu']
INFO: 2024-07-12 12:48:43,444: llmtf.base.hfmodel: Updated generation_config.eos_token_id: [32000, 13]
INFO: 2024-07-12 12:48:43,445: llmtf.base.hfmodel: Updated generation_config.stop_strings: ['\n', '\n\n']
INFO: 2024-07-12 12:48:46,084: llmtf.base.evaluator: Starting eval on ['nlpcoreteam/rummlu']
INFO: 2024-07-12 12:48:46,084: llmtf.base.hfmodel: Updated generation_config.eos_token_id: [32000]
INFO: 2024-07-12 12:48:46,084: llmtf.base.hfmodel: Updated generation_config.stop_strings: []
INFO: 2024-07-12 12:48:47,926: llmtf.base.evaluator: Starting eval on ['nlpcoreteam/enmmlu']
INFO: 2024-07-12 12:48:47,926: llmtf.base.hfmodel: Updated generation_config.eos_token_id: [32000]
INFO: 2024-07-12 12:48:47,927: llmtf.base.hfmodel: Updated generation_config.stop_strings: []
INFO: 2024-07-12 12:48:49,072: llmtf.base.evaluator: Starting eval on ['daru/treewayabstractive']
INFO: 2024-07-12 12:48:49,072: llmtf.base.hfmodel: Updated generation_config.eos_token_id: [32000, 13]
INFO: 2024-07-12 12:48:49,072: llmtf.base.hfmodel: Updated generation_config.stop_strings: ['\n', '\n\n']
INFO: 2024-07-12 12:48:51,325: llmtf.base.evaluator: Starting eval on ['daru/treewayextractive']
INFO: 2024-07-12 12:48:51,325: llmtf.base.hfmodel: Updated generation_config.eos_token_id: [32000]
INFO: 2024-07-12 12:48:51,325: llmtf.base.hfmodel: Updated generation_config.stop_strings: []
INFO: 2024-07-12 12:48:53,579: llmtf.base.evaluator: Starting eval on ['darumeru/cp_sent_ru', 'darumeru/cp_sent_en', 'darumeru/cp_para_ru', 'darumeru/cp_para_en']
INFO: 2024-07-12 12:48:53,580: llmtf.base.hfmodel: Updated generation_config.eos_token_id: [32000, 13]
INFO: 2024-07-12 12:48:53,580: llmtf.base.hfmodel: Updated generation_config.stop_strings: ['\n', '\n\n']
INFO: 2024-07-12 12:48:55,110: llmtf.base.darumeru/MultiQ: Loading Dataset: 13.06s
INFO: 2024-07-12 12:48:56,870: llmtf.base.daru/treewayabstractive: Loading Dataset: 7.80s
INFO: 2024-07-12 12:48:57,066: llmtf.base.darumeru/cp_sent_ru: Loading Dataset: 3.49s
INFO: 2024-07-12 12:48:59,493: llmtf.base.daru/treewayextractive: Loading Dataset: 8.17s
INFO: 2024-07-12 12:49:35,613: llmtf.base.darumeru/ruMMLU: Loading Dataset: 52.17s
INFO: 2024-07-12 12:52:06,068: llmtf.base.nlpcoreteam/enMMLU: Loading Dataset: 198.14s
INFO: 2024-07-12 12:52:12,454: llmtf.base.nlpcoreteam/ruMMLU: Loading Dataset: 206.37s
INFO: 2024-07-12 12:57:36,064: llmtf.base.darumeru/MultiQ: Processing Dataset: 520.95s
INFO: 2024-07-12 12:57:36,065: llmtf.base.darumeru/MultiQ: Results for darumeru/MultiQ:
INFO: 2024-07-12 12:57:36,070: llmtf.base.darumeru/MultiQ: {'f1': 0.5698089335328944, 'em': 0.5019120458891013}
INFO: 2024-07-12 12:57:36,081: llmtf.base.hfmodel: Updated generation_config.eos_token_id: [32000, 13]
INFO: 2024-07-12 12:57:36,081: llmtf.base.hfmodel: Updated generation_config.stop_strings: ['\n', '\n\n']
INFO: 2024-07-12 12:57:38,886: llmtf.base.darumeru/PARus: Loading Dataset: 2.80s
INFO: 2024-07-12 12:57:56,888: llmtf.base.darumeru/PARus: Processing Dataset: 18.00s
INFO: 2024-07-12 12:57:56,890: llmtf.base.darumeru/PARus: Results for darumeru/PARus:
INFO: 2024-07-12 12:57:56,902: llmtf.base.darumeru/PARus: {'acc': 0.83}
INFO: 2024-07-12 12:57:56,904: llmtf.base.hfmodel: Updated generation_config.eos_token_id: [32000, 13]
INFO: 2024-07-12 12:57:56,904: llmtf.base.hfmodel: Updated generation_config.stop_strings: ['\n', '\n\n']
INFO: 2024-07-12 12:58:00,330: llmtf.base.darumeru/RCB: Loading Dataset: 3.43s
INFO: 2024-07-12 12:58:26,702: llmtf.base.darumeru/RCB: Processing Dataset: 26.37s
INFO: 2024-07-12 12:58:26,716: llmtf.base.darumeru/RCB: Results for darumeru/RCB:
INFO: 2024-07-12 12:58:26,730: llmtf.base.darumeru/RCB: {'acc': 0.5318181818181819, 'f1_macro': 0.4819804386277897}
INFO: 2024-07-12 12:58:26,731: llmtf.base.hfmodel: Updated generation_config.eos_token_id: [32000, 13]
INFO: 2024-07-12 12:58:26,732: llmtf.base.hfmodel: Updated generation_config.stop_strings: ['\n', '\n\n']
INFO: 2024-07-12 12:58:35,277: llmtf.base.darumeru/ruOpenBookQA: Loading Dataset: 8.54s
INFO: 2024-07-12 13:00:31,609: llmtf.base.darumeru/cp_sent_ru: Processing Dataset: 694.54s
INFO: 2024-07-12 13:00:31,613: llmtf.base.darumeru/cp_sent_ru: Results for darumeru/cp_sent_ru:
INFO: 2024-07-12 13:00:31,645: llmtf.base.darumeru/cp_sent_ru: {'symbol_per_token': 2.3698479637205674, 'len': 0.998929777089993, 'lcs': 0.9815584658287272}
INFO: 2024-07-12 13:00:31,648: llmtf.base.hfmodel: Updated generation_config.eos_token_id: [32000, 13]
INFO: 2024-07-12 13:00:31,648: llmtf.base.hfmodel: Updated generation_config.stop_strings: ['\n', '\n\n']
INFO: 2024-07-12 13:00:34,944: llmtf.base.darumeru/cp_sent_en: Loading Dataset: 3.30s
INFO: 2024-07-12 13:01:29,607: llmtf.base.darumeru/ruOpenBookQA: Processing Dataset: 174.33s
INFO: 2024-07-12 13:01:29,610: llmtf.base.darumeru/ruOpenBookQA: Results for darumeru/ruOpenBookQA:
INFO: 2024-07-12 13:01:29,622: llmtf.base.darumeru/ruOpenBookQA: {'acc': 0.7538659793814433, 'f1_macro': 0.7551200071805053}
INFO: 2024-07-12 13:01:29,638: llmtf.base.hfmodel: Updated generation_config.eos_token_id: [32000, 13]
INFO: 2024-07-12 13:01:29,639: llmtf.base.hfmodel: Updated generation_config.stop_strings: ['\n', '\n\n']
INFO: 2024-07-12 13:01:33,958: llmtf.base.darumeru/ruTiE: Loading Dataset: 4.32s
INFO: 2024-07-12 13:02:27,340: llmtf.base.daru/treewayextractive: Processing Dataset: 807.84s
INFO: 2024-07-12 13:02:27,342: llmtf.base.daru/treewayextractive: Results for daru/treewayextractive:
INFO: 2024-07-12 13:02:27,599: llmtf.base.daru/treewayextractive: {'r-prec': 0.4038567821067821}
INFO: 2024-07-12 13:02:28,175: llmtf.base.evaluator: Ended eval
INFO: 2024-07-12 13:02:28,182: llmtf.base.evaluator:
mean daru/treewayextractive darumeru/MultiQ darumeru/PARus darumeru/RCB darumeru/cp_sent_ru darumeru/ruOpenBookQA
0.672 0.404 0.536 0.830 0.507 0.999 0.754
INFO: 2024-07-12 13:05:57,127: llmtf.base.darumeru/ruTiE: Processing Dataset: 263.17s
INFO: 2024-07-12 13:05:57,131: llmtf.base.darumeru/ruTiE: Results for darumeru/ruTiE:
INFO: 2024-07-12 13:05:57,160: llmtf.base.darumeru/ruTiE: {'acc': 0.5395348837209303}
INFO: 2024-07-12 13:05:57,163: llmtf.base.hfmodel: Updated generation_config.eos_token_id: [32000, 13]
INFO: 2024-07-12 13:05:57,163: llmtf.base.hfmodel: Updated generation_config.stop_strings: ['\n', '\n\n']
INFO: 2024-07-12 13:05:59,613: llmtf.base.darumeru/ruWorldTree: Loading Dataset: 2.45s
INFO: 2024-07-12 13:06:09,846: llmtf.base.darumeru/ruWorldTree: Processing Dataset: 10.23s
INFO: 2024-07-12 13:06:09,848: llmtf.base.darumeru/ruWorldTree: Results for darumeru/ruWorldTree:
INFO: 2024-07-12 13:06:09,854: llmtf.base.darumeru/ruWorldTree: {'acc': 0.8761904761904762, 'f1_macro': 0.8761420630173862}
INFO: 2024-07-12 13:06:09,855: llmtf.base.hfmodel: Updated generation_config.eos_token_id: [32000, 13]
INFO: 2024-07-12 13:06:09,855: llmtf.base.hfmodel: Updated generation_config.stop_strings: ['\n', '\n\n']
INFO: 2024-07-12 13:06:12,444: llmtf.base.darumeru/RWSD: Loading Dataset: 2.59s
INFO: 2024-07-12 13:06:36,116: llmtf.base.darumeru/RWSD: Processing Dataset: 23.67s
INFO: 2024-07-12 13:06:36,132: llmtf.base.darumeru/RWSD: Results for darumeru/RWSD:
INFO: 2024-07-12 13:06:36,136: llmtf.base.darumeru/RWSD: {'acc': 0.6078431372549019}
INFO: 2024-07-12 13:06:36,138: llmtf.base.hfmodel: Updated generation_config.eos_token_id: [32000, 13]
INFO: 2024-07-12 13:06:36,138: llmtf.base.hfmodel: Updated generation_config.stop_strings: ['\n', '\n\n']
INFO: 2024-07-12 13:06:43,790: llmtf.base.darumeru/USE: Loading Dataset: 7.65s
INFO: 2024-07-12 13:09:40,078: llmtf.base.darumeru/cp_sent_en: Processing Dataset: 545.13s
INFO: 2024-07-12 13:09:40,098: llmtf.base.darumeru/cp_sent_en: Results for darumeru/cp_sent_en:
INFO: 2024-07-12 13:09:40,103: llmtf.base.darumeru/cp_sent_en: {'symbol_per_token': 3.8994152226580563, 'len': 0.9995035620835028, 'lcs': 0.9936840637058483}
INFO: 2024-07-12 13:09:40,105: llmtf.base.hfmodel: Updated generation_config.eos_token_id: [32000, 13]
INFO: 2024-07-12 13:09:40,106: llmtf.base.hfmodel: Updated generation_config.stop_strings: ['\n', '\n\n']
INFO: 2024-07-12 13:09:42,693: llmtf.base.darumeru/cp_para_ru: Loading Dataset: 2.59s
INFO: 2024-07-12 13:12:52,217: llmtf.base.darumeru/ruMMLU: Processing Dataset: 1396.60s
INFO: 2024-07-12 13:12:52,219: llmtf.base.darumeru/ruMMLU: Results for darumeru/ruMMLU:
INFO: 2024-07-12 13:12:52,228: llmtf.base.darumeru/ruMMLU: {'acc': 0.48737902823505935}
INFO: 2024-07-12 13:12:52,316: llmtf.base.evaluator: Ended eval
INFO: 2024-07-12 13:12:52,364: llmtf.base.evaluator:
mean daru/treewayextractive darumeru/MultiQ darumeru/PARus darumeru/RCB darumeru/RWSD darumeru/cp_sent_en darumeru/cp_sent_ru darumeru/ruMMLU darumeru/ruOpenBookQA darumeru/ruTiE darumeru/ruWorldTree
0.685 0.404 0.536 0.830 0.507 0.608 1.000 0.999 0.487 0.754 0.540 0.876
INFO: 2024-07-12 13:13:11,907: llmtf.base.darumeru/USE: Processing Dataset: 388.12s
INFO: 2024-07-12 13:13:11,911: llmtf.base.darumeru/USE: Results for darumeru/USE:
INFO: 2024-07-12 13:13:11,929: llmtf.base.darumeru/USE: {'grade_norm': 0.11764705882352941}
INFO: 2024-07-12 13:13:11,936: llmtf.base.hfmodel: Updated generation_config.eos_token_id: [32000]
INFO: 2024-07-12 13:13:11,936: llmtf.base.hfmodel: Updated generation_config.stop_strings: []
INFO: 2024-07-12 13:13:21,392: llmtf.base.nlpcoreteam/enMMLU: Processing Dataset: 1275.32s
INFO: 2024-07-12 13:13:21,395: llmtf.base.nlpcoreteam/enMMLU: Results for nlpcoreteam/enMMLU:
INFO: 2024-07-12 13:13:21,436: llmtf.base.nlpcoreteam/enMMLU: metric
subject
abstract_algebra 0.310000
anatomy 0.644444
astronomy 0.677632
business_ethics 0.650000
clinical_knowledge 0.720755
college_biology 0.763889
college_chemistry 0.480000
college_computer_science 0.570000
college_mathematics 0.400000
college_medicine 0.676301
college_physics 0.372549
computer_security 0.760000
conceptual_physics 0.591489
econometrics 0.473684
electrical_engineering 0.551724
elementary_mathematics 0.396825
formal_logic 0.492063
global_facts 0.320000
high_school_biology 0.780645
high_school_chemistry 0.487685
high_school_computer_science 0.680000
high_school_european_history 0.806061
high_school_geography 0.787879
high_school_government_and_politics 0.891192
high_school_macroeconomics 0.643590
high_school_mathematics 0.355556
high_school_microeconomics 0.663866
high_school_physics 0.364238
high_school_psychology 0.834862
high_school_statistics 0.486111
high_school_us_history 0.838235
high_school_world_history 0.835443
human_aging 0.708520
human_sexuality 0.763359
international_law 0.809917
jurisprudence 0.750000
logical_fallacies 0.791411
machine_learning 0.491071
management 0.834951
marketing 0.880342
medical_genetics 0.740000
miscellaneous 0.822478
moral_disputes 0.728324
moral_scenarios 0.271508
nutrition 0.722222
philosophy 0.710611
prehistory 0.762346
professional_accounting 0.492908
professional_law 0.481095
professional_medicine 0.713235
professional_psychology 0.638889
public_relations 0.645455
security_studies 0.742857
sociology 0.840796
us_foreign_policy 0.840000
virology 0.530120
world_religions 0.818713
INFO: 2024-07-12 13:13:21,443: llmtf.base.nlpcoreteam/enMMLU: metric
subject
STEM 0.528856
humanities 0.699671
other (business, health, misc.) 0.675448
social sciences 0.730536
INFO: 2024-07-12 13:13:21,460: llmtf.base.nlpcoreteam/enMMLU: {'acc': 0.6586279387925342}
INFO: 2024-07-12 13:13:21,531: llmtf.base.evaluator: Ended eval
INFO: 2024-07-12 13:13:21,577: llmtf.base.evaluator:
mean daru/treewayextractive darumeru/MultiQ darumeru/PARus darumeru/RCB darumeru/RWSD darumeru/USE darumeru/cp_sent_en darumeru/cp_sent_ru darumeru/ruMMLU darumeru/ruOpenBookQA darumeru/ruTiE darumeru/ruWorldTree nlpcoreteam/enMMLU
0.640 0.404 0.536 0.830 0.507 0.608 0.118 1.000 0.999 0.487 0.754 0.540 0.876 0.659
INFO: 2024-07-12 13:13:23,597: llmtf.base.russiannlp/rucola_custom: Loading Dataset: 11.66s
INFO: 2024-07-12 13:17:32,181: llmtf.base.russiannlp/rucola_custom: Processing Dataset: 248.58s
INFO: 2024-07-12 13:17:32,186: llmtf.base.russiannlp/rucola_custom: Results for russiannlp/rucola_custom:
INFO: 2024-07-12 13:17:32,198: llmtf.base.russiannlp/rucola_custom: {'acc': 0.736275565123789, 'mcc': 0.37026925316854403}
INFO: 2024-07-12 13:17:32,210: llmtf.base.evaluator: Ended eval
INFO: 2024-07-12 13:17:32,235: llmtf.base.evaluator:
mean daru/treewayextractive darumeru/MultiQ darumeru/PARus darumeru/RCB darumeru/RWSD darumeru/USE darumeru/cp_sent_en darumeru/cp_sent_ru darumeru/ruMMLU darumeru/ruOpenBookQA darumeru/ruTiE darumeru/ruWorldTree nlpcoreteam/enMMLU russiannlp/rucola_custom
0.634 0.404 0.536 0.830 0.507 0.608 0.118 1.000 0.999 0.487 0.754 0.540 0.876 0.659 0.553
INFO: 2024-07-12 13:22:29,666: llmtf.base.nlpcoreteam/ruMMLU: Processing Dataset: 1817.21s
INFO: 2024-07-12 13:22:29,672: llmtf.base.nlpcoreteam/ruMMLU: Results for nlpcoreteam/ruMMLU:
INFO: 2024-07-12 13:22:29,713: llmtf.base.nlpcoreteam/ruMMLU: metric
subject
abstract_algebra 0.300000
anatomy 0.392593
astronomy 0.565789
business_ethics 0.560000
clinical_knowledge 0.554717
college_biology 0.465278
college_chemistry 0.410000
college_computer_science 0.500000
college_mathematics 0.370000
college_medicine 0.560694
college_physics 0.333333
computer_security 0.580000
conceptual_physics 0.472340
econometrics 0.403509
electrical_engineering 0.503448
elementary_mathematics 0.362434
formal_logic 0.357143
global_facts 0.320000
high_school_biology 0.609677
high_school_chemistry 0.389163
high_school_computer_science 0.640000
high_school_european_history 0.672727
high_school_geography 0.671717
high_school_government_and_politics 0.652850
high_school_macroeconomics 0.515385
high_school_mathematics 0.318519
high_school_microeconomics 0.521008
high_school_physics 0.337748
high_school_psychology 0.656881
high_school_statistics 0.430556
high_school_us_history 0.725490
high_school_world_history 0.691983
human_aging 0.520179
human_sexuality 0.610687
international_law 0.710744
jurisprudence 0.592593
logical_fallacies 0.503067
machine_learning 0.446429
management 0.669903
marketing 0.735043
medical_genetics 0.540000
miscellaneous 0.607918
moral_disputes 0.580925
moral_scenarios 0.188827
nutrition 0.611111
philosophy 0.575563
prehistory 0.527778
professional_accounting 0.397163
professional_law 0.365059
professional_medicine 0.437500
professional_psychology 0.493464
public_relations 0.545455
security_studies 0.595918
sociology 0.681592
us_foreign_policy 0.680000
virology 0.433735
world_religions 0.748538
INFO: 2024-07-12 13:22:29,720: llmtf.base.nlpcoreteam/ruMMLU: metric
subject
STEM 0.446373
humanities 0.556957
other (business, health, misc.) 0.524325
social sciences 0.585705
INFO: 2024-07-12 13:22:29,744: llmtf.base.nlpcoreteam/ruMMLU: {'acc': 0.5283401236619901}
INFO: 2024-07-12 13:22:29,827: llmtf.base.evaluator: Ended eval
INFO: 2024-07-12 13:22:29,843: llmtf.base.evaluator:
mean daru/treewayextractive darumeru/MultiQ darumeru/PARus darumeru/RCB darumeru/RWSD darumeru/USE darumeru/cp_sent_en darumeru/cp_sent_ru darumeru/ruMMLU darumeru/ruOpenBookQA darumeru/ruTiE darumeru/ruWorldTree nlpcoreteam/enMMLU nlpcoreteam/ruMMLU russiannlp/rucola_custom
0.627 0.404 0.536 0.830 0.507 0.608 0.118 1.000 0.999 0.487 0.754 0.540 0.876 0.659 0.528 0.553
INFO: 2024-07-12 13:24:30,400: llmtf.base.daru/treewayabstractive: Processing Dataset: 2133.53s
INFO: 2024-07-12 13:24:30,406: llmtf.base.daru/treewayabstractive: Results for daru/treewayabstractive:
INFO: 2024-07-12 13:24:30,411: llmtf.base.daru/treewayabstractive: {'rouge1': 0.35648633247803135, 'rouge2': 0.13258370390182936}
INFO: 2024-07-12 13:24:30,414: llmtf.base.evaluator: Ended eval
INFO: 2024-07-12 13:24:30,426: llmtf.base.evaluator:
mean daru/treewayabstractive daru/treewayextractive darumeru/MultiQ darumeru/PARus darumeru/RCB darumeru/RWSD darumeru/USE darumeru/cp_sent_en darumeru/cp_sent_ru darumeru/ruMMLU darumeru/ruOpenBookQA darumeru/ruTiE darumeru/ruWorldTree nlpcoreteam/enMMLU nlpcoreteam/ruMMLU russiannlp/rucola_custom
0.603 0.245 0.404 0.536 0.830 0.507 0.608 0.118 1.000 0.999 0.487 0.754 0.540 0.876 0.659 0.528 0.553
INFO: 2024-07-12 13:25:00,587: llmtf.base.darumeru/cp_para_ru: Processing Dataset: 917.89s
INFO: 2024-07-12 13:25:00,603: llmtf.base.darumeru/cp_para_ru: Results for darumeru/cp_para_ru:
INFO: 2024-07-12 13:25:00,607: llmtf.base.darumeru/cp_para_ru: {'symbol_per_token': 2.470731796239884, 'len': 0.9979824104186845, 'lcs': 0.959932364013627}
INFO: 2024-07-12 13:25:00,609: llmtf.base.hfmodel: Updated generation_config.eos_token_id: [32000, 13]
INFO: 2024-07-12 13:25:00,609: llmtf.base.hfmodel: Updated generation_config.stop_strings: ['\n', '\n\n']
INFO: 2024-07-12 13:25:03,462: llmtf.base.darumeru/cp_para_en: Loading Dataset: 2.85s
INFO: 2024-07-12 13:36:47,842: llmtf.base.darumeru/cp_para_en: Processing Dataset: 704.38s
INFO: 2024-07-12 13:36:47,846: llmtf.base.darumeru/cp_para_en: Results for darumeru/cp_para_en:
INFO: 2024-07-12 13:36:47,850: llmtf.base.darumeru/cp_para_en: {'symbol_per_token': 3.960763996832381, 'len': 0.9995281850843424, 'lcs': 0.9811766452032213}
INFO: 2024-07-12 13:36:47,852: llmtf.base.evaluator: Ended eval
INFO: 2024-07-12 13:36:47,881: llmtf.base.evaluator:
mean daru/treewayabstractive daru/treewayextractive darumeru/MultiQ darumeru/PARus darumeru/RCB darumeru/RWSD darumeru/USE darumeru/cp_para_en darumeru/cp_para_ru darumeru/cp_sent_en darumeru/cp_sent_ru darumeru/ruMMLU darumeru/ruOpenBookQA darumeru/ruTiE darumeru/ruWorldTree nlpcoreteam/enMMLU nlpcoreteam/ruMMLU russiannlp/rucola_custom
0.644 0.245 0.404 0.536 0.830 0.507 0.608 0.118 0.981 0.960 1.000 0.999 0.487 0.754 0.540 0.876 0.659 0.528 0.553