---
license: apache-2.0
datasets:
- IVUL-KAUST/MOLE-plus
metrics:
- f1
base_model:
- Qwen/Qwen2.5-0.5B-Instruct
---

## Model Description

MeXtract 0.5B is a lightweight model for metadata extraction from scientific papers. The model was created by finetuning Qwen2.5 0.5B Instruct on a synthetically generated dataset. Metadata attributes are defined using a schema-based approach: for each attribute we define the type, the minimum and maximum length, and, where applicable, a set of allowed options.

## Usage

Follow the instructions from [MeXtract](https://github.com/IVUL-KAUST/MeXtract) to install all the dependencies, then:

```python
from schema import TextSchema
from type_classes import *
from search import extract

class ExampleSchema(TextSchema):
    Name: Field(Str, 1, 5)
    Hobbies: Field(List[Str], 1, 1, ['Hiking', 'Swimming', 'Reading'])
    Age: Field(Int, 1, 100)
    Married: Field(Bool, 1, 1)

text = """
My name is Zaid. I am 25 years old. I like swimming and reading. I am married.
"""

metadata = extract(
    text,
    "IVUL-KAUST/MeXtract-0.5B",
    schema=ExampleSchema,
    backend="transformers",
)
print(metadata)
## {'Name': 'Zaid', 'Hobbies': ['Swimming'], 'Age': 25, 'Married': True}
```

## Model Details

- Developed by: IVUL at KAUST
- Model type: transformer-based language model, finetuned from Qwen2.5 0.5B Instruct
- Language(s): multilingual; evaluated on Arabic, English, Japanese, French, and Russian papers
- Datasets: a synthetically generated dataset

## Evaluation Results

The model is evaluated on the [MOLE+](https://huggingface.co/IVUL-KAUST/MOLE-plus) benchmark.
| **Model** | **ar** | **en** | **jp** | **fr** | **ru** | **multi** | **model** | **Average** |
| ------------------------ | --------- | --------- | --------- | --------- | --------- | --------- | --------- | ----------- |
| **Falcon3 3B Instruct**  | 20.46     | 16.30     | 20.29     | 17.81     | 17.23     | 16.13     | 15.96     | 17.74       |
| **Llama3.2 3B Instruct** | 28.77     | 25.17     | 33.14     | 27.73     | 22.21     | 22.58     | 33.37     | 27.57       |
| **Gemma 3 4B It**        | 44.88     | 46.50     | 48.46     | 43.85     | 46.06     | 42.05     | 56.04     | 46.83       |
| **Qwen2.5 3B Instruct**  | 49.99     | 56.72     | 61.13     | 57.08     | 64.10     | 52.07     | 59.05     | 57.16       |
| **MOLE 3B**              | 23.03     | 50.88     | 50.83     | 50.05     | 57.72     | 43.34     | 17.17     | 41.86       |
| **Nuextract 2.0 4B**     | 44.61     | 43.57     | 43.82     | 48.96     | 47.78     | 40.14     | 49.90     | 45.54       |
| **Nuextract 2.0 8B**     | 51.93     | 58.93     | 62.11     | 58.41     | 63.21     | 38.21     | 53.70     | 55.21       |
| **MeXtract 0.5B**        | 65.96     | 69.95     | 73.79     | 68.42     | 72.07     | 68.20     | 32.41     | 64.40       |
| **MeXtract 1.5B**        | 67.06     | 73.71     | 75.08     | 71.57     | 76.28     | 71.87     | 52.05     | 69.66       |
| **MeXtract 3B**          | **70.81** | **78.02** | **78.32** | **72.87** | **77.51** | **74.92** | **60.18** | **73.23**   |

## Use and Limitations

### Limitations and Bias

The model is optimized for metadata extraction; it may not perform well on general NLP tasks.

## License

The model is licensed under Apache 2.0.

## Citation

```bibtex
@misc{mextract,
      title={MeXtract: Light-Weight Metadata Extraction from Scientific Papers},
      author={Zaid Alyafeai and Maged S. Al-Shaibani and Bernard Ghanem},
      year={2025},
      eprint={2510.06889},
      archivePrefix={arXiv},
      primaryClass={cs.CL},
      url={https://arxiv.org/abs/2510.06889},
}
```
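## Schema Constraints Illustrated

The schema-based approach described above constrains each attribute by a type, a minimum and maximum length (or value), and an optional set of allowed options. The following is a minimal plain-Python sketch of those constraint semantics only; it is not the MeXtract implementation, and the `validate` helper and its length-versus-value heuristic are assumptions for illustration.

```python
def validate(value, typ, lo, hi, options=None):
    """Check one extracted attribute against (type, min, max, options) constraints.

    Assumed semantics: the bounds constrain the value itself for numbers,
    and the length for strings and lists (mirroring Field(Int, 1, 100) vs
    Field(List[Str], 1, 1, [...]) in the usage example above).
    """
    if not isinstance(value, typ):
        return False
    if isinstance(value, (bool, int, float)):
        measure = value          # numeric: bound the value
    else:
        measure = len(value)     # string/list: bound the length
    if not (lo <= measure <= hi):
        return False
    if options is not None:
        items = value if isinstance(value, list) else [value]
        if any(item not in options for item in items):
            return False
    return True

# Constraints mirroring the ExampleSchema from the usage snippet.
constraints = {
    "Age":     (int,  1, 100, None),
    "Hobbies": (list, 1, 1,   ["Hiking", "Swimming", "Reading"]),
    "Married": (bool, 1, 1,   None),
}

extracted = {"Age": 25, "Hobbies": ["Swimming"], "Married": True}
print(all(validate(v, *constraints[k]) for k, v in extracted.items()))  # prints True
```

Under these assumed semantics, out-of-range or off-list values (e.g. an age of 150, or a hobby outside the allowed options) would fail validation rather than be returned as metadata.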