INFO: 2024-07-12 15:24:01,535: llmtf.base.evaluator: Starting eval on ['darumeru/multiq', 'darumeru/parus', 'darumeru/rcb', 'darumeru/ruopenbookqa', 'darumeru/rutie', 'darumeru/ruworldtree', 'darumeru/rwsd', 'darumeru/use', 'russiannlp/rucola_custom']
INFO: 2024-07-12 15:24:01,537: llmtf.base.hfmodel: Updated generation_config.eos_token_id: [151645, 198, 271]
INFO: 2024-07-12 15:24:01,537: llmtf.base.hfmodel: Updated generation_config.stop_strings: ['\n', '\n\n']
INFO: 2024-07-12 15:24:02,751: llmtf.base.evaluator: Starting eval on ['darumeru/rummlu']
INFO: 2024-07-12 15:24:02,751: llmtf.base.hfmodel: Updated generation_config.eos_token_id: [151645, 198, 271]
INFO: 2024-07-12 15:24:02,751: llmtf.base.hfmodel: Updated generation_config.stop_strings: ['\n', '\n\n']
INFO: 2024-07-12 15:24:04,874: llmtf.base.evaluator: Starting eval on ['nlpcoreteam/rummlu']
INFO: 2024-07-12 15:24:04,875: llmtf.base.hfmodel: Updated generation_config.eos_token_id: [151645]
INFO: 2024-07-12 15:24:04,875: llmtf.base.hfmodel: Updated generation_config.stop_strings: []
INFO: 2024-07-12 15:24:06,815: llmtf.base.evaluator: Starting eval on ['nlpcoreteam/enmmlu']
INFO: 2024-07-12 15:24:06,815: llmtf.base.hfmodel: Updated generation_config.eos_token_id: [151645]
INFO: 2024-07-12 15:24:06,816: llmtf.base.hfmodel: Updated generation_config.stop_strings: []
INFO: 2024-07-12 15:24:08,159: llmtf.base.evaluator: Starting eval on ['daru/treewayabstractive']
INFO: 2024-07-12 15:24:08,160: llmtf.base.hfmodel: Updated generation_config.eos_token_id: [151645, 198, 271]
INFO: 2024-07-12 15:24:08,160: llmtf.base.hfmodel: Updated generation_config.stop_strings: ['\n', '\n\n']
INFO: 2024-07-12 15:24:10,405: llmtf.base.evaluator: Starting eval on ['daru/treewayextractive']
INFO: 2024-07-12 15:24:10,406: llmtf.base.hfmodel: Updated generation_config.eos_token_id: [151645]
INFO: 2024-07-12 15:24:10,406: llmtf.base.hfmodel: Updated generation_config.stop_strings: []
INFO: 2024-07-12 15:24:12,527: llmtf.base.evaluator: Starting eval on ['darumeru/cp_sent_ru', 'darumeru/cp_sent_en', 'darumeru/cp_para_ru', 'darumeru/cp_para_en']
INFO: 2024-07-12 15:24:12,527: llmtf.base.hfmodel: Updated generation_config.eos_token_id: [151645, 198, 271]
INFO: 2024-07-12 15:24:12,527: llmtf.base.hfmodel: Updated generation_config.stop_strings: ['\n', '\n\n']
INFO: 2024-07-12 15:24:16,840: llmtf.base.darumeru/cp_sent_ru: Loading Dataset: 4.31s
INFO: 2024-07-12 15:24:23,439: llmtf.base.daru/treewayextractive: Loading Dataset: 13.03s
INFO: 2024-07-12 15:24:23,853: llmtf.base.daru/treewayabstractive: Loading Dataset: 15.69s
INFO: 2024-07-12 15:24:24,325: llmtf.base.darumeru/MultiQ: Loading Dataset: 22.79s
INFO: 2024-07-12 15:25:39,133: llmtf.base.darumeru/ruMMLU: Loading Dataset: 96.38s
INFO: 2024-07-12 15:27:44,062: llmtf.base.nlpcoreteam/enMMLU: Loading Dataset: 217.25s
INFO: 2024-07-12 15:28:38,363: llmtf.base.nlpcoreteam/ruMMLU: Loading Dataset: 273.49s
INFO: 2024-07-12 15:32:51,873: llmtf.base.darumeru/MultiQ: Processing Dataset: 507.55s
INFO: 2024-07-12 15:32:51,875: llmtf.base.darumeru/MultiQ: Results for darumeru/MultiQ:
INFO: 2024-07-12 15:32:51,894: llmtf.base.darumeru/MultiQ: {'f1': 0.5111874982201721, 'em': 0.4225621414913958}
INFO: 2024-07-12 15:32:51,905: llmtf.base.hfmodel: Updated generation_config.eos_token_id: [151645, 198, 271]
INFO: 2024-07-12 15:32:51,905: llmtf.base.hfmodel: Updated generation_config.stop_strings: ['\n', '\n\n']
INFO: 2024-07-12 15:32:55,373: llmtf.base.darumeru/PARus: Loading Dataset: 3.47s
INFO: 2024-07-12 15:33:10,364: llmtf.base.darumeru/PARus: Processing Dataset: 14.99s
INFO: 2024-07-12 15:33:10,366: llmtf.base.darumeru/PARus: Results for darumeru/PARus:
INFO: 2024-07-12 15:33:10,378: llmtf.base.darumeru/PARus: {'acc': 0.87}
INFO: 2024-07-12 15:33:10,380: llmtf.base.hfmodel: Updated generation_config.eos_token_id: [151645, 198, 271]
INFO: 2024-07-12 15:33:10,380: llmtf.base.hfmodel: Updated generation_config.stop_strings: ['\n', '\n\n']
INFO: 2024-07-12 15:33:14,425: llmtf.base.darumeru/RCB: Loading Dataset: 4.05s
INFO: 2024-07-12 15:33:38,575: llmtf.base.darumeru/RCB: Processing Dataset: 24.15s
INFO: 2024-07-12 15:33:38,577: llmtf.base.darumeru/RCB: Results for darumeru/RCB:
INFO: 2024-07-12 15:33:38,584: llmtf.base.darumeru/RCB: {'acc': 0.5545454545454546, 'f1_macro': 0.4639261439099977}
INFO: 2024-07-12 15:33:38,586: llmtf.base.hfmodel: Updated generation_config.eos_token_id: [151645, 198, 271]
INFO: 2024-07-12 15:33:38,586: llmtf.base.hfmodel: Updated generation_config.stop_strings: ['\n', '\n\n']
INFO: 2024-07-12 15:33:54,452: llmtf.base.darumeru/ruOpenBookQA: Loading Dataset: 15.87s
INFO: 2024-07-12 15:34:38,320: llmtf.base.darumeru/cp_sent_ru: Processing Dataset: 621.48s
INFO: 2024-07-12 15:34:38,323: llmtf.base.darumeru/cp_sent_ru: Results for darumeru/cp_sent_ru:
INFO: 2024-07-12 15:34:38,368: llmtf.base.darumeru/cp_sent_ru: {'symbol_per_token': 2.534645622610045, 'len': 0.9981951213704352, 'lcs': 0.9829681051511814}
INFO: 2024-07-12 15:34:38,372: llmtf.base.hfmodel: Updated generation_config.eos_token_id: [151645, 198, 271]
INFO: 2024-07-12 15:34:38,372: llmtf.base.hfmodel: Updated generation_config.stop_strings: ['\n', '\n\n']
INFO: 2024-07-12 15:34:42,596: llmtf.base.darumeru/cp_sent_en: Loading Dataset: 4.22s
INFO: 2024-07-12 15:36:33,250: llmtf.base.darumeru/ruOpenBookQA: Processing Dataset: 158.80s
INFO: 2024-07-12 15:36:33,252: llmtf.base.darumeru/ruOpenBookQA: Results for darumeru/ruOpenBookQA:
INFO: 2024-07-12 15:36:33,279: llmtf.base.darumeru/ruOpenBookQA: {'acc': 0.8414948453608248, 'f1_macro': 0.8414933293172036}
INFO: 2024-07-12 15:36:33,295: llmtf.base.hfmodel: Updated generation_config.eos_token_id: [151645, 198, 271]
INFO: 2024-07-12 15:36:33,295: llmtf.base.hfmodel: Updated generation_config.stop_strings: ['\n', '\n\n']
INFO: 2024-07-12 15:36:40,911: llmtf.base.darumeru/ruTiE: Loading Dataset: 7.62s
INFO: 2024-07-12 15:41:03,973: llmtf.base.darumeru/ruTiE: Processing Dataset: 263.06s
INFO: 2024-07-12 15:41:03,977: llmtf.base.darumeru/ruTiE: Results for darumeru/ruTiE:
INFO: 2024-07-12 15:41:04,006: llmtf.base.darumeru/ruTiE: {'acc': 0.6348837209302326}
INFO: 2024-07-12 15:41:04,009: llmtf.base.hfmodel: Updated generation_config.eos_token_id: [151645, 198, 271]
INFO: 2024-07-12 15:41:04,009: llmtf.base.hfmodel: Updated generation_config.stop_strings: ['\n', '\n\n']
INFO: 2024-07-12 15:41:06,993: llmtf.base.darumeru/ruWorldTree: Loading Dataset: 2.98s
INFO: 2024-07-12 15:41:15,555: llmtf.base.darumeru/ruWorldTree: Processing Dataset: 8.56s
INFO: 2024-07-12 15:41:15,556: llmtf.base.darumeru/ruWorldTree: Results for darumeru/ruWorldTree:
INFO: 2024-07-12 15:41:15,575: llmtf.base.darumeru/ruWorldTree: {'acc': 0.8857142857142857, 'f1_macro': 0.8834382566585957}
INFO: 2024-07-12 15:41:15,576: llmtf.base.hfmodel: Updated generation_config.eos_token_id: [151645, 198, 271]
INFO: 2024-07-12 15:41:15,576: llmtf.base.hfmodel: Updated generation_config.stop_strings: ['\n', '\n\n']
INFO: 2024-07-12 15:41:19,869: llmtf.base.darumeru/RWSD: Loading Dataset: 4.29s
INFO: 2024-07-12 15:41:40,855: llmtf.base.darumeru/RWSD: Processing Dataset: 20.98s
INFO: 2024-07-12 15:41:40,857: llmtf.base.darumeru/RWSD: Results for darumeru/RWSD:
INFO: 2024-07-12 15:41:40,860: llmtf.base.darumeru/RWSD: {'acc': 0.6078431372549019}
INFO: 2024-07-12 15:41:40,862: llmtf.base.hfmodel: Updated generation_config.eos_token_id: [151645, 198, 271]
INFO: 2024-07-12 15:41:40,862: llmtf.base.hfmodel: Updated generation_config.stop_strings: ['\n', '\n\n']
INFO: 2024-07-12 15:41:56,986: llmtf.base.darumeru/USE: Loading Dataset: 16.12s
INFO: 2024-07-12 15:42:29,978: llmtf.base.darumeru/cp_sent_en: Processing Dataset: 467.38s
INFO: 2024-07-12 15:42:29,981: llmtf.base.darumeru/cp_sent_en: Results for darumeru/cp_sent_en:
INFO: 2024-07-12 15:42:29,988: llmtf.base.darumeru/cp_sent_en: {'symbol_per_token': 4.291694209457502, 'len': 0.9987076617949456, 'lcs': 0.9919123923876758}
INFO: 2024-07-12 15:42:29,991: llmtf.base.hfmodel: Updated generation_config.eos_token_id: [151645, 198, 271]
INFO: 2024-07-12 15:42:29,991: llmtf.base.hfmodel: Updated generation_config.stop_strings: ['\n', '\n\n']
INFO: 2024-07-12 15:42:33,964: llmtf.base.darumeru/cp_para_ru: Loading Dataset: 3.97s
INFO: 2024-07-12 15:46:15,596: llmtf.base.darumeru/ruMMLU: Processing Dataset: 1236.46s
INFO: 2024-07-12 15:46:15,597: llmtf.base.darumeru/ruMMLU: Results for darumeru/ruMMLU:
INFO: 2024-07-12 15:46:15,606: llmtf.base.darumeru/ruMMLU: {'acc': 0.5865509328544348}
INFO: 2024-07-12 15:46:15,688: llmtf.base.evaluator: Ended eval
INFO: 2024-07-12 15:46:15,733: llmtf.base.evaluator:
mean darumeru/MultiQ darumeru/PARus darumeru/RCB darumeru/RWSD darumeru/cp_sent_en darumeru/cp_sent_ru darumeru/ruMMLU darumeru/ruOpenBookQA darumeru/ruTiE darumeru/ruWorldTree
0.740 0.467 0.870 0.509 0.608 0.999 0.998 0.587 0.841 0.635 0.885
INFO: 2024-07-12 15:46:39,430: llmtf.base.darumeru/USE: Processing Dataset: 282.44s
INFO: 2024-07-12 15:46:39,432: llmtf.base.darumeru/USE: Results for darumeru/USE:
INFO: 2024-07-12 15:46:39,451: llmtf.base.darumeru/USE: {'grade_norm': 0.15980392156862747}
INFO: 2024-07-12 15:46:39,457: llmtf.base.hfmodel: Updated generation_config.eos_token_id: [151645]
INFO: 2024-07-12 15:46:39,457: llmtf.base.hfmodel: Updated generation_config.stop_strings: []
INFO: 2024-07-12 15:46:46,553: llmtf.base.nlpcoreteam/enMMLU: Processing Dataset: 1142.49s
INFO: 2024-07-12 15:46:46,555: llmtf.base.nlpcoreteam/enMMLU: Results for nlpcoreteam/enMMLU:
INFO: 2024-07-12 15:46:46,597: llmtf.base.nlpcoreteam/enMMLU: metric
subject
abstract_algebra 0.450000
anatomy 0.614815
astronomy 0.809211
business_ethics 0.800000
clinical_knowledge 0.758491
college_biology 0.798611
college_chemistry 0.460000
college_computer_science 0.650000
college_mathematics 0.370000
college_medicine 0.664740
college_physics 0.411765
computer_security 0.730000
conceptual_physics 0.736170
econometrics 0.596491
electrical_engineering 0.758621
elementary_mathematics 0.621693
formal_logic 0.547619
global_facts 0.500000
high_school_biology 0.838710
high_school_chemistry 0.620690
high_school_computer_science 0.770000
high_school_european_history 0.830303
high_school_geography 0.873737
high_school_government_and_politics 0.943005
high_school_macroeconomics 0.769231
high_school_mathematics 0.488889
high_school_microeconomics 0.827731
high_school_physics 0.456954
high_school_psychology 0.871560
high_school_statistics 0.666667
high_school_us_history 0.848039
high_school_world_history 0.852321
human_aging 0.739910
human_sexuality 0.793893
international_law 0.859504
jurisprudence 0.842593
logical_fallacies 0.803681
machine_learning 0.464286
management 0.786408
marketing 0.910256
medical_genetics 0.810000
miscellaneous 0.855683
moral_disputes 0.757225
moral_scenarios 0.496089
nutrition 0.787582
philosophy 0.762058
prehistory 0.780864
professional_accounting 0.560284
professional_law 0.516949
professional_medicine 0.724265
professional_psychology 0.759804
public_relations 0.700000
security_studies 0.755102
sociology 0.880597
us_foreign_policy 0.870000
virology 0.524096
world_religions 0.807018
INFO: 2024-07-12 15:46:46,604: llmtf.base.nlpcoreteam/enMMLU: metric
subject
STEM 0.616792
humanities 0.746482
other (business, health, misc.) 0.716895
social sciences 0.803429
INFO: 2024-07-12 15:46:46,611: llmtf.base.nlpcoreteam/enMMLU: {'acc': 0.7208996287079178}
INFO: 2024-07-12 15:46:46,682: llmtf.base.evaluator: Ended eval
INFO: 2024-07-12 15:46:46,693: llmtf.base.evaluator:
mean darumeru/MultiQ darumeru/PARus darumeru/RCB darumeru/RWSD darumeru/USE darumeru/cp_sent_en darumeru/cp_sent_ru darumeru/ruMMLU darumeru/ruOpenBookQA darumeru/ruTiE darumeru/ruWorldTree nlpcoreteam/enMMLU
0.690 0.467 0.870 0.509 0.608 0.160 0.999 0.998 0.587 0.841 0.635 0.885 0.721
INFO: 2024-07-12 15:47:01,688: llmtf.base.russiannlp/rucola_custom: Loading Dataset: 22.23s
INFO: 2024-07-12 15:48:38,284: llmtf.base.daru/treewayextractive: Processing Dataset: 1454.84s
INFO: 2024-07-12 15:48:38,287: llmtf.base.daru/treewayextractive: Results for daru/treewayextractive:
INFO: 2024-07-12 15:48:38,541: llmtf.base.daru/treewayextractive: {'r-prec': 0.3917012265512266}
INFO: 2024-07-12 15:48:39,039: llmtf.base.evaluator: Ended eval
INFO: 2024-07-12 15:48:39,049: llmtf.base.evaluator:
mean daru/treewayextractive darumeru/MultiQ darumeru/PARus darumeru/RCB darumeru/RWSD darumeru/USE darumeru/cp_sent_en darumeru/cp_sent_ru darumeru/ruMMLU darumeru/ruOpenBookQA darumeru/ruTiE darumeru/ruWorldTree nlpcoreteam/enMMLU
0.667 0.392 0.467 0.870 0.509 0.608 0.160 0.999 0.998 0.587 0.841 0.635 0.885 0.721
INFO: 2024-07-12 15:50:32,789: llmtf.base.russiannlp/rucola_custom: Processing Dataset: 211.10s
INFO: 2024-07-12 15:50:32,794: llmtf.base.russiannlp/rucola_custom: Results for russiannlp/rucola_custom:
INFO: 2024-07-12 15:50:32,806: llmtf.base.russiannlp/rucola_custom: {'acc': 0.7448869752421959, 'mcc': 0.41288254708577415}
INFO: 2024-07-12 15:50:32,817: llmtf.base.evaluator: Ended eval
INFO: 2024-07-12 15:50:32,828: llmtf.base.evaluator:
mean daru/treewayextractive darumeru/MultiQ darumeru/PARus darumeru/RCB darumeru/RWSD darumeru/USE darumeru/cp_sent_en darumeru/cp_sent_ru darumeru/ruMMLU darumeru/ruOpenBookQA darumeru/ruTiE darumeru/ruWorldTree nlpcoreteam/enMMLU russiannlp/rucola_custom
0.661 0.392 0.467 0.870 0.509 0.608 0.160 0.999 0.998 0.587 0.841 0.635 0.885 0.721 0.579
INFO: 2024-07-12 15:55:27,603: llmtf.base.nlpcoreteam/ruMMLU: Processing Dataset: 1609.24s
INFO: 2024-07-12 15:55:27,606: llmtf.base.nlpcoreteam/ruMMLU: Results for nlpcoreteam/ruMMLU:
INFO: 2024-07-12 15:55:27,649: llmtf.base.nlpcoreteam/ruMMLU: metric
subject
abstract_algebra 0.380000
anatomy 0.562963
astronomy 0.684211
business_ethics 0.680000
clinical_knowledge 0.649057
college_biology 0.659722
college_chemistry 0.410000
college_computer_science 0.600000
college_mathematics 0.430000
college_medicine 0.612717
college_physics 0.362745
computer_security 0.680000
conceptual_physics 0.655319
econometrics 0.491228
electrical_engineering 0.544828
elementary_mathematics 0.587302
formal_logic 0.484127
global_facts 0.440000
high_school_biology 0.751613
high_school_chemistry 0.532020
high_school_computer_science 0.720000
high_school_european_history 0.757576
high_school_geography 0.742424
high_school_government_and_politics 0.746114
high_school_macroeconomics 0.661538
high_school_mathematics 0.429630
high_school_microeconomics 0.697479
high_school_physics 0.417219
high_school_psychology 0.785321
high_school_statistics 0.597222
high_school_us_history 0.715686
high_school_world_history 0.767932
human_aging 0.650224
human_sexuality 0.625954
international_law 0.760331
jurisprudence 0.638889
logical_fallacies 0.644172
machine_learning 0.455357
management 0.747573
marketing 0.811966
medical_genetics 0.640000
miscellaneous 0.692209
moral_disputes 0.667630
moral_scenarios 0.301676
nutrition 0.676471
philosophy 0.678457
prehistory 0.617284
professional_accounting 0.390071
professional_law 0.423729
professional_medicine 0.580882
professional_psychology 0.591503
public_relations 0.636364
security_studies 0.697959
sociology 0.771144
us_foreign_policy 0.800000
virology 0.445783
world_religions 0.748538
INFO: 2024-07-12 15:55:27,657: llmtf.base.nlpcoreteam/ruMMLU: metric
subject
STEM 0.549844
humanities 0.631233
other (business, health, misc.) 0.612851
social sciences 0.687252
INFO: 2024-07-12 15:55:27,664: llmtf.base.nlpcoreteam/ruMMLU: {'acc': 0.6202950081820329}
INFO: 2024-07-12 15:55:27,746: llmtf.base.evaluator: Ended eval
INFO: 2024-07-12 15:55:27,760: llmtf.base.evaluator:
mean daru/treewayextractive darumeru/MultiQ darumeru/PARus darumeru/RCB darumeru/RWSD darumeru/USE darumeru/cp_sent_en darumeru/cp_sent_ru darumeru/ruMMLU darumeru/ruOpenBookQA darumeru/ruTiE darumeru/ruWorldTree nlpcoreteam/enMMLU nlpcoreteam/ruMMLU russiannlp/rucola_custom
0.658 0.392 0.467 0.870 0.509 0.608 0.160 0.999 0.998 0.587 0.841 0.635 0.885 0.721 0.620 0.579
INFO: 2024-07-12 15:56:03,507: llmtf.base.darumeru/cp_para_ru: Processing Dataset: 809.54s
INFO: 2024-07-12 15:56:03,510: llmtf.base.darumeru/cp_para_ru: Results for darumeru/cp_para_ru:
INFO: 2024-07-12 15:56:03,543: llmtf.base.darumeru/cp_para_ru: {'symbol_per_token': 2.6436302612451867, 'len': 0.9982892915927111, 'lcs': 0.9647170300253489}
INFO: 2024-07-12 15:56:03,545: llmtf.base.hfmodel: Updated generation_config.eos_token_id: [151645, 198, 271]
INFO: 2024-07-12 15:56:03,545: llmtf.base.hfmodel: Updated generation_config.stop_strings: ['\n', '\n\n']
INFO: 2024-07-12 15:56:08,301: llmtf.base.darumeru/cp_para_en: Loading Dataset: 4.76s
INFO: 2024-07-12 16:04:07,299: llmtf.base.daru/treewayabstractive: Processing Dataset: 2383.44s
INFO: 2024-07-12 16:04:07,304: llmtf.base.daru/treewayabstractive: Results for daru/treewayabstractive:
INFO: 2024-07-12 16:04:07,339: llmtf.base.daru/treewayabstractive: {'rouge1': 0.34568224366963635, 'rouge2': 0.1241863962379739}
INFO: 2024-07-12 16:04:07,344: llmtf.base.evaluator: Ended eval
INFO: 2024-07-12 16:04:07,353: llmtf.base.evaluator:
mean daru/treewayabstractive daru/treewayextractive darumeru/MultiQ darumeru/PARus darumeru/RCB darumeru/RWSD darumeru/USE darumeru/cp_para_ru darumeru/cp_sent_en darumeru/cp_sent_ru darumeru/ruMMLU darumeru/ruOpenBookQA darumeru/ruTiE darumeru/ruWorldTree nlpcoreteam/enMMLU nlpcoreteam/ruMMLU russiannlp/rucola_custom
0.651 0.235 0.392 0.467 0.870 0.509 0.608 0.160 0.965 0.999 0.998 0.587 0.841 0.635 0.885 0.721 0.620 0.579
INFO: 2024-07-12 16:06:16,094: llmtf.base.darumeru/cp_para_en: Processing Dataset: 607.79s
INFO: 2024-07-12 16:06:16,097: llmtf.base.darumeru/cp_para_en: Results for darumeru/cp_para_en:
INFO: 2024-07-12 16:06:16,101: llmtf.base.darumeru/cp_para_en: {'symbol_per_token': 4.3451848505406785, 'len': 0.9992281640193543, 'lcs': 0.9778434645601852}
INFO: 2024-07-12 16:06:16,103: llmtf.base.evaluator: Ended eval
INFO: 2024-07-12 16:06:16,116: llmtf.base.evaluator:
mean daru/treewayabstractive daru/treewayextractive darumeru/MultiQ darumeru/PARus darumeru/RCB darumeru/RWSD darumeru/USE darumeru/cp_para_en darumeru/cp_para_ru darumeru/cp_sent_en darumeru/cp_sent_ru darumeru/ruMMLU darumeru/ruOpenBookQA darumeru/ruTiE darumeru/ruWorldTree nlpcoreteam/enMMLU nlpcoreteam/ruMMLU russiannlp/rucola_custom
0.669 0.235 0.392 0.467 0.870 0.509 0.608 0.160 0.978 0.965 0.999 0.998 0.587 0.841 0.635 0.885 0.721 0.620 0.579