Model Description
MeXtract 1.5B is a lightweight model for extracting metadata from scientific papers. It was created by fine-tuning Qwen2.5 1.5B Instruct on a synthetically generated dataset. Metadata attributes are defined with a schema-based approach: for each attribute we specify its type, its minimum and maximum length, and, where applicable, a list of allowed options.
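To make the schema idea concrete, here is a minimal, hypothetical Python sketch of an attribute with a type, a min/max length and optional allowed options, plus a checker for extracted values. `Attribute`, `size_of` and `validate` are illustrative names and are not part of MeXtract. How "length" is measured (word count for strings, the value itself for integers, item count for lists) is an assumption inferred from the usage example below, not documented MeXtract behavior.

```python
from dataclasses import dataclass
from typing import Any, Optional, Sequence


@dataclass
class Attribute:
    """Hypothetical attribute spec (not MeXtract's Field): type, min/max length, options."""
    type_: type                              # expected Python type: str, int, bool or list
    min_len: int                             # lower bound on size_of(value)
    max_len: int                             # upper bound on size_of(value)
    options: Optional[Sequence[Any]] = None  # allowed values for categorical attributes


def size_of(value: Any) -> int:
    """Assumed size measure: word count for str, the value itself for int, item count for list."""
    if isinstance(value, bool):  # check bool first, since bool is a subclass of int
        return 1
    if isinstance(value, int):
        return value
    if isinstance(value, str):
        return len(value.split())
    if isinstance(value, list):
        return len(value)
    raise TypeError(f"unsupported type: {type(value).__name__}")


def validate(attr: Attribute, value: Any) -> bool:
    """Return True if an extracted value satisfies the attribute's constraints."""
    if attr.type_ is int and isinstance(value, bool):
        return False  # reject True/False for integer attributes
    if not isinstance(value, attr.type_):
        return False
    if not attr.min_len <= size_of(value) <= attr.max_len:
        return False
    if attr.options is not None:
        items = value if isinstance(value, list) else [value]
        return all(item in attr.options for item in items)
    return True


# Hypothetical mirror of the ExampleSchema constraints used in the usage example.
EXAMPLE_SCHEMA = {
    "Name": Attribute(str, 1, 5),
    "Hobbies": Attribute(list, 1, 1, ["Hiking", "Swimming", "Reading"]),
    "Age": Attribute(int, 1, 100),
    "Married": Attribute(bool, 1, 1),
}
extracted = {"Name": "Zaid", "Hobbies": ["Swimming"], "Age": 25, "Married": True}
print({key: validate(EXAMPLE_SCHEMA[key], value) for key, value in extracted.items()})
# {'Name': True, 'Hobbies': True, 'Age': True, 'Married': True}
print(validate(EXAMPLE_SCHEMA["Hobbies"], ["Swimming", "Reading"]))  # False: exceeds max length 1
```

Under this reading, a max length of 1 on Hobbies rejects a two-item list, which is one plausible explanation for why the usage example returns a single hobby.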
Usage
Follow the instructions from MeXtract to install all the dependencies, then run:
from schema import TextSchema
from type_classes import *  # Field and the type classes (Str, List, Int, Bool) used below
from search import extract

# Each attribute: Field(type, min length, max length[, allowed options])
class ExampleSchema(TextSchema):
    Name: Field(Str, 1, 5)
    Hobbies: Field(List[Str], 1, 1, ['Hiking', 'Swimming', 'Reading'])
    Age: Field(Int, 1, 100)
    Married: Field(Bool, 1, 1)

text = """
My name is Zaid. I am 25 years old. I like swimming and reading. I am married.
"""

# Extract metadata matching ExampleSchema using the "transformers" backend
metadata = extract(
    text, "IVUL-KAUST/MeXtract-1.5B", schema=ExampleSchema, backend="transformers"
)
print(metadata)
# {'Name': 'Zaid', 'Hobbies': ['Swimming'], 'Age': 25, 'Married': True}
Model Details
- Developed by: IVUL at KAUST
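- Model type: Transformer-based language model, fine-tuned from Qwen2.5 1.5B Instruct
- Language(s): evaluated on papers in Arabic (ar), English (en), Japanese (jp), French (fr) and Russian (ru), as well as multilingual papers (multi); see Evaluation Results
- Datasets: a synthetically generated dataset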
Evaluation Results
The model is evaluated on the MOLE+ benchmark.
Model | ar | en | jp | fr | ru | multi | model | Average |
---|---|---|---|---|---|---|---|---|
Falcon3 3B Instruct | 20.46 | 16.30 | 20.29 | 17.81 | 17.23 | 16.13 | 15.96 | 17.74 |
Llama3.2 3B Instruct | 28.77 | 25.17 | 33.14 | 27.73 | 22.21 | 22.58 | 33.37 | 27.57 |
Gemma 3 4B IT | 44.88 | 46.50 | 48.46 | 43.85 | 46.06 | 42.05 | 56.04 | 46.83 |
Qwen2.5 3B Instruct | 49.99 | 56.72 | 61.13 | 57.08 | 64.10 | 52.07 | 59.05 | 57.16 |
MOLE 3B | 23.03 | 50.88 | 50.83 | 50.05 | 57.72 | 43.34 | 17.17 | 41.86 |
NuExtract 2.0 4B | 44.61 | 43.57 | 43.82 | 48.96 | 47.78 | 40.14 | 49.90 | 45.54 |
NuExtract 2.0 8B | 51.93 | 58.93 | 62.11 | 58.41 | 63.21 | 38.21 | 53.70 | 55.21 |
MeXtract 0.5B | 65.96 | 69.95 | 73.79 | 68.42 | 72.07 | 68.20 | 32.41 | 64.40 |
MeXtract 1.5B | 67.06 | 73.71 | 75.08 | 71.57 | 76.28 | 71.87 | 52.05 | 69.66 |
MeXtract 3B | 70.81 | 78.02 | 78.32 | 72.87 | 77.51 | 74.92 | 60.18 | 73.23 |
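The Average column appears to be the unweighted mean of the seven subset scores (ar, en, jp, fr, ru, multi, model). The short Python check below recomputes it for a few rows copied from the table; the row selection and the `recompute_average` helper are illustrative only.

```python
from statistics import mean

# Per-subset scores copied from the table (order: ar, en, jp, fr, ru, multi, model),
# paired with the reported Average.
ROWS = {
    "Qwen2.5 3B Instruct": ([49.99, 56.72, 61.13, 57.08, 64.10, 52.07, 59.05], 57.16),
    "MeXtract 0.5B": ([65.96, 69.95, 73.79, 68.42, 72.07, 68.20, 32.41], 64.40),
    "MeXtract 1.5B": ([67.06, 73.71, 75.08, 71.57, 76.28, 71.87, 52.05], 69.66),
    "MeXtract 3B": ([70.81, 78.02, 78.32, 72.87, 77.51, 74.92, 60.18], 73.23),
}


def recompute_average(scores):
    """Unweighted mean over the seven subsets, rounded to two decimals like the table."""
    return round(mean(scores), 2)


for name, (scores, reported) in ROWS.items():
    print(f"{name}: recomputed {recompute_average(scores):.2f}, reported {reported:.2f}")
```

Because every subset is weighted equally, a weak score on a single subset (for example the model column for MeXtract 0.5B) pulls the Average down noticeably.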
Use and Limitations
Limitations and Bias
The model is optimized for metadata extraction and may not perform well on general NLP tasks.
License
The model is licensed under the Apache 2.0 license.
Citation
@misc{mextract,
  title={MeXtract: Light-Weight Metadata Extraction from Scientific Papers},
  author={Zaid Alyafeai and Maged S. Al-Shaibani and Bernard Ghanem},
  year={2025},
  eprint={2510.06889},
  archivePrefix={arXiv},
  primaryClass={cs.CL},
  url={https://arxiv.org/abs/2510.06889},
}