|
INFO: 2024-07-12 15:24:01,535: llmtf.base.evaluator: Starting eval on ['darumeru/multiq', 'darumeru/parus', 'darumeru/rcb', 'darumeru/ruopenbookqa', 'darumeru/rutie', 'darumeru/ruworldtree', 'darumeru/rwsd', 'darumeru/use', 'russiannlp/rucola_custom'] |
|
INFO: 2024-07-12 15:24:01,537: llmtf.base.hfmodel: Updated generation_config.eos_token_id: [151645, 198, 271] |
|
INFO: 2024-07-12 15:24:01,537: llmtf.base.hfmodel: Updated generation_config.stop_strings: ['\n', '\n\n'] |
|
INFO: 2024-07-12 15:24:02,751: llmtf.base.evaluator: Starting eval on ['darumeru/rummlu'] |
|
INFO: 2024-07-12 15:24:02,751: llmtf.base.hfmodel: Updated generation_config.eos_token_id: [151645, 198, 271] |
|
INFO: 2024-07-12 15:24:02,751: llmtf.base.hfmodel: Updated generation_config.stop_strings: ['\n', '\n\n'] |
|
INFO: 2024-07-12 15:24:04,874: llmtf.base.evaluator: Starting eval on ['nlpcoreteam/rummlu'] |
|
INFO: 2024-07-12 15:24:04,875: llmtf.base.hfmodel: Updated generation_config.eos_token_id: [151645] |
|
INFO: 2024-07-12 15:24:04,875: llmtf.base.hfmodel: Updated generation_config.stop_strings: [] |
|
INFO: 2024-07-12 15:24:06,815: llmtf.base.evaluator: Starting eval on ['nlpcoreteam/enmmlu'] |
|
INFO: 2024-07-12 15:24:06,815: llmtf.base.hfmodel: Updated generation_config.eos_token_id: [151645] |
|
INFO: 2024-07-12 15:24:06,816: llmtf.base.hfmodel: Updated generation_config.stop_strings: [] |
|
INFO: 2024-07-12 15:24:08,159: llmtf.base.evaluator: Starting eval on ['daru/treewayabstractive'] |
|
INFO: 2024-07-12 15:24:08,160: llmtf.base.hfmodel: Updated generation_config.eos_token_id: [151645, 198, 271] |
|
INFO: 2024-07-12 15:24:08,160: llmtf.base.hfmodel: Updated generation_config.stop_strings: ['\n', '\n\n'] |
|
INFO: 2024-07-12 15:24:10,405: llmtf.base.evaluator: Starting eval on ['daru/treewayextractive'] |
|
INFO: 2024-07-12 15:24:10,406: llmtf.base.hfmodel: Updated generation_config.eos_token_id: [151645] |
|
INFO: 2024-07-12 15:24:10,406: llmtf.base.hfmodel: Updated generation_config.stop_strings: [] |
|
INFO: 2024-07-12 15:24:12,527: llmtf.base.evaluator: Starting eval on ['darumeru/cp_sent_ru', 'darumeru/cp_sent_en', 'darumeru/cp_para_ru', 'darumeru/cp_para_en'] |
|
INFO: 2024-07-12 15:24:12,527: llmtf.base.hfmodel: Updated generation_config.eos_token_id: [151645, 198, 271] |
|
INFO: 2024-07-12 15:24:12,527: llmtf.base.hfmodel: Updated generation_config.stop_strings: ['\n', '\n\n'] |
|
INFO: 2024-07-12 15:24:16,840: llmtf.base.darumeru/cp_sent_ru: Loading Dataset: 4.31s |
|
INFO: 2024-07-12 15:24:23,439: llmtf.base.daru/treewayextractive: Loading Dataset: 13.03s |
|
INFO: 2024-07-12 15:24:23,853: llmtf.base.daru/treewayabstractive: Loading Dataset: 15.69s |
|
INFO: 2024-07-12 15:24:24,325: llmtf.base.darumeru/MultiQ: Loading Dataset: 22.79s |
|
INFO: 2024-07-12 15:25:39,133: llmtf.base.darumeru/ruMMLU: Loading Dataset: 96.38s |
|
INFO: 2024-07-12 15:27:44,062: llmtf.base.nlpcoreteam/enMMLU: Loading Dataset: 217.25s |
|
INFO: 2024-07-12 15:28:38,363: llmtf.base.nlpcoreteam/ruMMLU: Loading Dataset: 273.49s |
|
INFO: 2024-07-12 15:32:51,873: llmtf.base.darumeru/MultiQ: Processing Dataset: 507.55s |
|
INFO: 2024-07-12 15:32:51,875: llmtf.base.darumeru/MultiQ: Results for darumeru/MultiQ: |
|
INFO: 2024-07-12 15:32:51,894: llmtf.base.darumeru/MultiQ: {'f1': 0.5111874982201721, 'em': 0.4225621414913958} |
|
INFO: 2024-07-12 15:32:51,905: llmtf.base.hfmodel: Updated generation_config.eos_token_id: [151645, 198, 271] |
|
INFO: 2024-07-12 15:32:51,905: llmtf.base.hfmodel: Updated generation_config.stop_strings: ['\n', '\n\n'] |
|
INFO: 2024-07-12 15:32:55,373: llmtf.base.darumeru/PARus: Loading Dataset: 3.47s |
|
INFO: 2024-07-12 15:33:10,364: llmtf.base.darumeru/PARus: Processing Dataset: 14.99s |
|
INFO: 2024-07-12 15:33:10,366: llmtf.base.darumeru/PARus: Results for darumeru/PARus: |
|
INFO: 2024-07-12 15:33:10,378: llmtf.base.darumeru/PARus: {'acc': 0.87} |
|
INFO: 2024-07-12 15:33:10,380: llmtf.base.hfmodel: Updated generation_config.eos_token_id: [151645, 198, 271] |
|
INFO: 2024-07-12 15:33:10,380: llmtf.base.hfmodel: Updated generation_config.stop_strings: ['\n', '\n\n'] |
|
INFO: 2024-07-12 15:33:14,425: llmtf.base.darumeru/RCB: Loading Dataset: 4.05s |
|
INFO: 2024-07-12 15:33:38,575: llmtf.base.darumeru/RCB: Processing Dataset: 24.15s |
|
INFO: 2024-07-12 15:33:38,577: llmtf.base.darumeru/RCB: Results for darumeru/RCB: |
|
INFO: 2024-07-12 15:33:38,584: llmtf.base.darumeru/RCB: {'acc': 0.5545454545454546, 'f1_macro': 0.4639261439099977} |
|
INFO: 2024-07-12 15:33:38,586: llmtf.base.hfmodel: Updated generation_config.eos_token_id: [151645, 198, 271] |
|
INFO: 2024-07-12 15:33:38,586: llmtf.base.hfmodel: Updated generation_config.stop_strings: ['\n', '\n\n'] |
|
INFO: 2024-07-12 15:33:54,452: llmtf.base.darumeru/ruOpenBookQA: Loading Dataset: 15.87s |
|
INFO: 2024-07-12 15:34:38,320: llmtf.base.darumeru/cp_sent_ru: Processing Dataset: 621.48s |
|
INFO: 2024-07-12 15:34:38,323: llmtf.base.darumeru/cp_sent_ru: Results for darumeru/cp_sent_ru: |
|
INFO: 2024-07-12 15:34:38,368: llmtf.base.darumeru/cp_sent_ru: {'symbol_per_token': 2.534645622610045, 'len': 0.9981951213704352, 'lcs': 0.9829681051511814} |
|
INFO: 2024-07-12 15:34:38,372: llmtf.base.hfmodel: Updated generation_config.eos_token_id: [151645, 198, 271] |
|
INFO: 2024-07-12 15:34:38,372: llmtf.base.hfmodel: Updated generation_config.stop_strings: ['\n', '\n\n'] |
|
INFO: 2024-07-12 15:34:42,596: llmtf.base.darumeru/cp_sent_en: Loading Dataset: 4.22s |
|
INFO: 2024-07-12 15:36:33,250: llmtf.base.darumeru/ruOpenBookQA: Processing Dataset: 158.80s |
|
INFO: 2024-07-12 15:36:33,252: llmtf.base.darumeru/ruOpenBookQA: Results for darumeru/ruOpenBookQA: |
|
INFO: 2024-07-12 15:36:33,279: llmtf.base.darumeru/ruOpenBookQA: {'acc': 0.8414948453608248, 'f1_macro': 0.8414933293172036} |
|
INFO: 2024-07-12 15:36:33,295: llmtf.base.hfmodel: Updated generation_config.eos_token_id: [151645, 198, 271] |
|
INFO: 2024-07-12 15:36:33,295: llmtf.base.hfmodel: Updated generation_config.stop_strings: ['\n', '\n\n'] |
|
INFO: 2024-07-12 15:36:40,911: llmtf.base.darumeru/ruTiE: Loading Dataset: 7.62s |
|
INFO: 2024-07-12 15:41:03,973: llmtf.base.darumeru/ruTiE: Processing Dataset: 263.06s |
|
INFO: 2024-07-12 15:41:03,977: llmtf.base.darumeru/ruTiE: Results for darumeru/ruTiE: |
|
INFO: 2024-07-12 15:41:04,006: llmtf.base.darumeru/ruTiE: {'acc': 0.6348837209302326} |
|
INFO: 2024-07-12 15:41:04,009: llmtf.base.hfmodel: Updated generation_config.eos_token_id: [151645, 198, 271] |
|
INFO: 2024-07-12 15:41:04,009: llmtf.base.hfmodel: Updated generation_config.stop_strings: ['\n', '\n\n'] |
|
INFO: 2024-07-12 15:41:06,993: llmtf.base.darumeru/ruWorldTree: Loading Dataset: 2.98s |
|
INFO: 2024-07-12 15:41:15,555: llmtf.base.darumeru/ruWorldTree: Processing Dataset: 8.56s |
|
INFO: 2024-07-12 15:41:15,556: llmtf.base.darumeru/ruWorldTree: Results for darumeru/ruWorldTree: |
|
INFO: 2024-07-12 15:41:15,575: llmtf.base.darumeru/ruWorldTree: {'acc': 0.8857142857142857, 'f1_macro': 0.8834382566585957} |
|
INFO: 2024-07-12 15:41:15,576: llmtf.base.hfmodel: Updated generation_config.eos_token_id: [151645, 198, 271] |
|
INFO: 2024-07-12 15:41:15,576: llmtf.base.hfmodel: Updated generation_config.stop_strings: ['\n', '\n\n'] |
|
INFO: 2024-07-12 15:41:19,869: llmtf.base.darumeru/RWSD: Loading Dataset: 4.29s |
|
INFO: 2024-07-12 15:41:40,855: llmtf.base.darumeru/RWSD: Processing Dataset: 20.98s |
|
INFO: 2024-07-12 15:41:40,857: llmtf.base.darumeru/RWSD: Results for darumeru/RWSD: |
|
INFO: 2024-07-12 15:41:40,860: llmtf.base.darumeru/RWSD: {'acc': 0.6078431372549019} |
|
INFO: 2024-07-12 15:41:40,862: llmtf.base.hfmodel: Updated generation_config.eos_token_id: [151645, 198, 271] |
|
INFO: 2024-07-12 15:41:40,862: llmtf.base.hfmodel: Updated generation_config.stop_strings: ['\n', '\n\n'] |
|
INFO: 2024-07-12 15:41:56,986: llmtf.base.darumeru/USE: Loading Dataset: 16.12s |
|
INFO: 2024-07-12 15:42:29,978: llmtf.base.darumeru/cp_sent_en: Processing Dataset: 467.38s |
|
INFO: 2024-07-12 15:42:29,981: llmtf.base.darumeru/cp_sent_en: Results for darumeru/cp_sent_en: |
|
INFO: 2024-07-12 15:42:29,988: llmtf.base.darumeru/cp_sent_en: {'symbol_per_token': 4.291694209457502, 'len': 0.9987076617949456, 'lcs': 0.9919123923876758} |
|
INFO: 2024-07-12 15:42:29,991: llmtf.base.hfmodel: Updated generation_config.eos_token_id: [151645, 198, 271] |
|
INFO: 2024-07-12 15:42:29,991: llmtf.base.hfmodel: Updated generation_config.stop_strings: ['\n', '\n\n'] |
|
INFO: 2024-07-12 15:42:33,964: llmtf.base.darumeru/cp_para_ru: Loading Dataset: 3.97s |
|
INFO: 2024-07-12 15:46:15,596: llmtf.base.darumeru/ruMMLU: Processing Dataset: 1236.46s |
|
INFO: 2024-07-12 15:46:15,597: llmtf.base.darumeru/ruMMLU: Results for darumeru/ruMMLU: |
|
INFO: 2024-07-12 15:46:15,606: llmtf.base.darumeru/ruMMLU: {'acc': 0.5865509328544348} |
|
INFO: 2024-07-12 15:46:15,688: llmtf.base.evaluator: Ended eval |
|
INFO: 2024-07-12 15:46:15,733: llmtf.base.evaluator: |
|
mean darumeru/MultiQ darumeru/PARus darumeru/RCB darumeru/RWSD darumeru/cp_sent_en darumeru/cp_sent_ru darumeru/ruMMLU darumeru/ruOpenBookQA darumeru/ruTiE darumeru/ruWorldTree |
|
0.740 0.467 0.870 0.509 0.608 0.999 0.998 0.587 0.841 0.635 0.885 |
|
INFO: 2024-07-12 15:46:39,430: llmtf.base.darumeru/USE: Processing Dataset: 282.44s |
|
INFO: 2024-07-12 15:46:39,432: llmtf.base.darumeru/USE: Results for darumeru/USE: |
|
INFO: 2024-07-12 15:46:39,451: llmtf.base.darumeru/USE: {'grade_norm': 0.15980392156862747} |
|
INFO: 2024-07-12 15:46:39,457: llmtf.base.hfmodel: Updated generation_config.eos_token_id: [151645] |
|
INFO: 2024-07-12 15:46:39,457: llmtf.base.hfmodel: Updated generation_config.stop_strings: [] |
|
INFO: 2024-07-12 15:46:46,553: llmtf.base.nlpcoreteam/enMMLU: Processing Dataset: 1142.49s |
|
INFO: 2024-07-12 15:46:46,555: llmtf.base.nlpcoreteam/enMMLU: Results for nlpcoreteam/enMMLU: |
|
INFO: 2024-07-12 15:46:46,597: llmtf.base.nlpcoreteam/enMMLU: metric |
|
subject |
|
abstract_algebra 0.450000 |
|
anatomy 0.614815 |
|
astronomy 0.809211 |
|
business_ethics 0.800000 |
|
clinical_knowledge 0.758491 |
|
college_biology 0.798611 |
|
college_chemistry 0.460000 |
|
college_computer_science 0.650000 |
|
college_mathematics 0.370000 |
|
college_medicine 0.664740 |
|
college_physics 0.411765 |
|
computer_security 0.730000 |
|
conceptual_physics 0.736170 |
|
econometrics 0.596491 |
|
electrical_engineering 0.758621 |
|
elementary_mathematics 0.621693 |
|
formal_logic 0.547619 |
|
global_facts 0.500000 |
|
high_school_biology 0.838710 |
|
high_school_chemistry 0.620690 |
|
high_school_computer_science 0.770000 |
|
high_school_european_history 0.830303 |
|
high_school_geography 0.873737 |
|
high_school_government_and_politics 0.943005 |
|
high_school_macroeconomics 0.769231 |
|
high_school_mathematics 0.488889 |
|
high_school_microeconomics 0.827731 |
|
high_school_physics 0.456954 |
|
high_school_psychology 0.871560 |
|
high_school_statistics 0.666667 |
|
high_school_us_history 0.848039 |
|
high_school_world_history 0.852321 |
|
human_aging 0.739910 |
|
human_sexuality 0.793893 |
|
international_law 0.859504 |
|
jurisprudence 0.842593 |
|
logical_fallacies 0.803681 |
|
machine_learning 0.464286 |
|
management 0.786408 |
|
marketing 0.910256 |
|
medical_genetics 0.810000 |
|
miscellaneous 0.855683 |
|
moral_disputes 0.757225 |
|
moral_scenarios 0.496089 |
|
nutrition 0.787582 |
|
philosophy 0.762058 |
|
prehistory 0.780864 |
|
professional_accounting 0.560284 |
|
professional_law 0.516949 |
|
professional_medicine 0.724265 |
|
professional_psychology 0.759804 |
|
public_relations 0.700000 |
|
security_studies 0.755102 |
|
sociology 0.880597 |
|
us_foreign_policy 0.870000 |
|
virology 0.524096 |
|
world_religions 0.807018 |
|
INFO: 2024-07-12 15:46:46,604: llmtf.base.nlpcoreteam/enMMLU: metric |
|
subject |
|
STEM 0.616792 |
|
humanities 0.746482 |
|
other (business, health, misc.) 0.716895 |
|
social sciences 0.803429 |
|
INFO: 2024-07-12 15:46:46,611: llmtf.base.nlpcoreteam/enMMLU: {'acc': 0.7208996287079178} |
|
INFO: 2024-07-12 15:46:46,682: llmtf.base.evaluator: Ended eval |
|
INFO: 2024-07-12 15:46:46,693: llmtf.base.evaluator: |
|
mean darumeru/MultiQ darumeru/PARus darumeru/RCB darumeru/RWSD darumeru/USE darumeru/cp_sent_en darumeru/cp_sent_ru darumeru/ruMMLU darumeru/ruOpenBookQA darumeru/ruTiE darumeru/ruWorldTree nlpcoreteam/enMMLU |
|
0.690 0.467 0.870 0.509 0.608 0.160 0.999 0.998 0.587 0.841 0.635 0.885 0.721 |
|
INFO: 2024-07-12 15:47:01,688: llmtf.base.russiannlp/rucola_custom: Loading Dataset: 22.23s |
|
INFO: 2024-07-12 15:48:38,284: llmtf.base.daru/treewayextractive: Processing Dataset: 1454.84s |
|
INFO: 2024-07-12 15:48:38,287: llmtf.base.daru/treewayextractive: Results for daru/treewayextractive: |
|
INFO: 2024-07-12 15:48:38,541: llmtf.base.daru/treewayextractive: {'r-prec': 0.3917012265512266} |
|
INFO: 2024-07-12 15:48:39,039: llmtf.base.evaluator: Ended eval |
|
INFO: 2024-07-12 15:48:39,049: llmtf.base.evaluator: |
|
mean daru/treewayextractive darumeru/MultiQ darumeru/PARus darumeru/RCB darumeru/RWSD darumeru/USE darumeru/cp_sent_en darumeru/cp_sent_ru darumeru/ruMMLU darumeru/ruOpenBookQA darumeru/ruTiE darumeru/ruWorldTree nlpcoreteam/enMMLU |
|
0.667 0.392 0.467 0.870 0.509 0.608 0.160 0.999 0.998 0.587 0.841 0.635 0.885 0.721 |
|
INFO: 2024-07-12 15:50:32,789: llmtf.base.russiannlp/rucola_custom: Processing Dataset: 211.10s |
|
INFO: 2024-07-12 15:50:32,794: llmtf.base.russiannlp/rucola_custom: Results for russiannlp/rucola_custom: |
|
INFO: 2024-07-12 15:50:32,806: llmtf.base.russiannlp/rucola_custom: {'acc': 0.7448869752421959, 'mcc': 0.41288254708577415} |
|
INFO: 2024-07-12 15:50:32,817: llmtf.base.evaluator: Ended eval |
|
INFO: 2024-07-12 15:50:32,828: llmtf.base.evaluator: |
|
mean daru/treewayextractive darumeru/MultiQ darumeru/PARus darumeru/RCB darumeru/RWSD darumeru/USE darumeru/cp_sent_en darumeru/cp_sent_ru darumeru/ruMMLU darumeru/ruOpenBookQA darumeru/ruTiE darumeru/ruWorldTree nlpcoreteam/enMMLU russiannlp/rucola_custom |
|
0.661 0.392 0.467 0.870 0.509 0.608 0.160 0.999 0.998 0.587 0.841 0.635 0.885 0.721 0.579 |
|
INFO: 2024-07-12 15:55:27,603: llmtf.base.nlpcoreteam/ruMMLU: Processing Dataset: 1609.24s |
|
INFO: 2024-07-12 15:55:27,606: llmtf.base.nlpcoreteam/ruMMLU: Results for nlpcoreteam/ruMMLU: |
|
INFO: 2024-07-12 15:55:27,649: llmtf.base.nlpcoreteam/ruMMLU: metric |
|
subject |
|
abstract_algebra 0.380000 |
|
anatomy 0.562963 |
|
astronomy 0.684211 |
|
business_ethics 0.680000 |
|
clinical_knowledge 0.649057 |
|
college_biology 0.659722 |
|
college_chemistry 0.410000 |
|
college_computer_science 0.600000 |
|
college_mathematics 0.430000 |
|
college_medicine 0.612717 |
|
college_physics 0.362745 |
|
computer_security 0.680000 |
|
conceptual_physics 0.655319 |
|
econometrics 0.491228 |
|
electrical_engineering 0.544828 |
|
elementary_mathematics 0.587302 |
|
formal_logic 0.484127 |
|
global_facts 0.440000 |
|
high_school_biology 0.751613 |
|
high_school_chemistry 0.532020 |
|
high_school_computer_science 0.720000 |
|
high_school_european_history 0.757576 |
|
high_school_geography 0.742424 |
|
high_school_government_and_politics 0.746114 |
|
high_school_macroeconomics 0.661538 |
|
high_school_mathematics 0.429630 |
|
high_school_microeconomics 0.697479 |
|
high_school_physics 0.417219 |
|
high_school_psychology 0.785321 |
|
high_school_statistics 0.597222 |
|
high_school_us_history 0.715686 |
|
high_school_world_history 0.767932 |
|
human_aging 0.650224 |
|
human_sexuality 0.625954 |
|
international_law 0.760331 |
|
jurisprudence 0.638889 |
|
logical_fallacies 0.644172 |
|
machine_learning 0.455357 |
|
management 0.747573 |
|
marketing 0.811966 |
|
medical_genetics 0.640000 |
|
miscellaneous 0.692209 |
|
moral_disputes 0.667630 |
|
moral_scenarios 0.301676 |
|
nutrition 0.676471 |
|
philosophy 0.678457 |
|
prehistory 0.617284 |
|
professional_accounting 0.390071 |
|
professional_law 0.423729 |
|
professional_medicine 0.580882 |
|
professional_psychology 0.591503 |
|
public_relations 0.636364 |
|
security_studies 0.697959 |
|
sociology 0.771144 |
|
us_foreign_policy 0.800000 |
|
virology 0.445783 |
|
world_religions 0.748538 |
|
INFO: 2024-07-12 15:55:27,657: llmtf.base.nlpcoreteam/ruMMLU: metric |
|
subject |
|
STEM 0.549844 |
|
humanities 0.631233 |
|
other (business, health, misc.) 0.612851 |
|
social sciences 0.687252 |
|
INFO: 2024-07-12 15:55:27,664: llmtf.base.nlpcoreteam/ruMMLU: {'acc': 0.6202950081820329} |
|
INFO: 2024-07-12 15:55:27,746: llmtf.base.evaluator: Ended eval |
|
INFO: 2024-07-12 15:55:27,760: llmtf.base.evaluator: |
|
mean daru/treewayextractive darumeru/MultiQ darumeru/PARus darumeru/RCB darumeru/RWSD darumeru/USE darumeru/cp_sent_en darumeru/cp_sent_ru darumeru/ruMMLU darumeru/ruOpenBookQA darumeru/ruTiE darumeru/ruWorldTree nlpcoreteam/enMMLU nlpcoreteam/ruMMLU russiannlp/rucola_custom |
|
0.658 0.392 0.467 0.870 0.509 0.608 0.160 0.999 0.998 0.587 0.841 0.635 0.885 0.721 0.620 0.579 |
|
INFO: 2024-07-12 15:56:03,507: llmtf.base.darumeru/cp_para_ru: Processing Dataset: 809.54s |
|
INFO: 2024-07-12 15:56:03,510: llmtf.base.darumeru/cp_para_ru: Results for darumeru/cp_para_ru: |
|
INFO: 2024-07-12 15:56:03,543: llmtf.base.darumeru/cp_para_ru: {'symbol_per_token': 2.6436302612451867, 'len': 0.9982892915927111, 'lcs': 0.9647170300253489} |
|
INFO: 2024-07-12 15:56:03,545: llmtf.base.hfmodel: Updated generation_config.eos_token_id: [151645, 198, 271] |
|
INFO: 2024-07-12 15:56:03,545: llmtf.base.hfmodel: Updated generation_config.stop_strings: ['\n', '\n\n'] |
|
INFO: 2024-07-12 15:56:08,301: llmtf.base.darumeru/cp_para_en: Loading Dataset: 4.76s |
|
INFO: 2024-07-12 16:04:07,299: llmtf.base.daru/treewayabstractive: Processing Dataset: 2383.44s |
|
INFO: 2024-07-12 16:04:07,304: llmtf.base.daru/treewayabstractive: Results for daru/treewayabstractive: |
|
INFO: 2024-07-12 16:04:07,339: llmtf.base.daru/treewayabstractive: {'rouge1': 0.34568224366963635, 'rouge2': 0.1241863962379739} |
|
INFO: 2024-07-12 16:04:07,344: llmtf.base.evaluator: Ended eval |
|
INFO: 2024-07-12 16:04:07,353: llmtf.base.evaluator: |
|
mean daru/treewayabstractive daru/treewayextractive darumeru/MultiQ darumeru/PARus darumeru/RCB darumeru/RWSD darumeru/USE darumeru/cp_para_ru darumeru/cp_sent_en darumeru/cp_sent_ru darumeru/ruMMLU darumeru/ruOpenBookQA darumeru/ruTiE darumeru/ruWorldTree nlpcoreteam/enMMLU nlpcoreteam/ruMMLU russiannlp/rucola_custom |
|
0.651 0.235 0.392 0.467 0.870 0.509 0.608 0.160 0.965 0.999 0.998 0.587 0.841 0.635 0.885 0.721 0.620 0.579 |
|
INFO: 2024-07-12 16:06:16,094: llmtf.base.darumeru/cp_para_en: Processing Dataset: 607.79s |
|
INFO: 2024-07-12 16:06:16,097: llmtf.base.darumeru/cp_para_en: Results for darumeru/cp_para_en: |
|
INFO: 2024-07-12 16:06:16,101: llmtf.base.darumeru/cp_para_en: {'symbol_per_token': 4.3451848505406785, 'len': 0.9992281640193543, 'lcs': 0.9778434645601852} |
|
INFO: 2024-07-12 16:06:16,103: llmtf.base.evaluator: Ended eval |
|
INFO: 2024-07-12 16:06:16,116: llmtf.base.evaluator: |
|
mean daru/treewayabstractive daru/treewayextractive darumeru/MultiQ darumeru/PARus darumeru/RCB darumeru/RWSD darumeru/USE darumeru/cp_para_en darumeru/cp_para_ru darumeru/cp_sent_en darumeru/cp_sent_ru darumeru/ruMMLU darumeru/ruOpenBookQA darumeru/ruTiE darumeru/ruWorldTree nlpcoreteam/enMMLU nlpcoreteam/ruMMLU russiannlp/rucola_custom |
|
0.669 0.235 0.392 0.467 0.870 0.509 0.608 0.160 0.978 0.965 0.999 0.998 0.587 0.841 0.635 0.885 0.721 0.620 0.579 |
|
|