
Model Description

MeXtract 3B is a light-weight model for metadata extraction from scientific papers. The model was created by finetuning Qwen2.5 3B Instruct on a synthetically generated dataset. Metadata attributes are defined using a schema-based approach: for each attribute we define the type, the minimum and maximum length, and, where applicable, a list of allowed options.

Usage

Follow the instructions from the MeXtract repository to install all the dependencies, then:

from schema import TextSchema
from type_classes import *
from search import extract


# Define the attributes to extract: each Field specifies the type,
# minimum and maximum, and optionally a list of allowed options.
class ExampleSchema(TextSchema):
    Name: Field(Str, 1, 5)
    Hobbies: Field(List[Str], 1, 1, ['Hiking', 'Swimming', 'Reading'])
    Age: Field(Int, 1, 100)
    Married: Field(Bool, 1, 1)

text = """
My name is Zaid. I am 25 years old. I like swimming and reading. I am married.
"""
# Run extraction with the MeXtract 3B checkpoint using the transformers backend.
metadata = extract(
    text, "IVUL-KAUST/MeXtract-3B", schema=ExampleSchema, backend="transformers"
)
print(metadata)

# Output: {'Name': 'Zaid', 'Hobbies': ['Swimming'], 'Age': 25, 'Married': True}
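
The same interface extends naturally to scientific papers. The sketch below is hypothetical: the attribute names, length bounds, and option lists are illustrative only, not the schema MeXtract was trained on; it assumes the same TextSchema, Field, and extract API as above.

from schema import TextSchema
from type_classes import *
from search import extract


# Hypothetical paper-metadata schema; the attribute names, bounds, and
# options below are illustrative, not the schema used to train MeXtract.
class PaperSchema(TextSchema):
    Title: Field(Str, 3, 25)
    Year: Field(Int, 1900, 2030)
    License: Field(List[Str], 1, 1, ['Apache-2.0', 'MIT', 'CC BY 4.0', 'unknown'])
    Multilingual: Field(Bool, 1, 1)

paper_text = open("paper.txt").read()  # plain text of the paper to process
metadata = extract(
    paper_text, "IVUL-KAUST/MeXtract-3B", schema=PaperSchema, backend="transformers"
)
print(metadata)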

Model Details

  • Developed by: IVUL at KAUST
  • Model type: Transformer-based language model, finetuned from Qwen2.5 3B Instruct
  • Language(s): evaluated on Arabic, English, Japanese, French, Russian, and multilingual papers (see Evaluation Results)
  • Datasets: a synthetically generated dataset

Evaluation Results

The model is evaluated on the MOLE+ benchmark.

Model                   ar     en     jp     fr     ru     multi  model  Average
Falcon3 3B Instruct     20.46  16.30  20.29  17.81  17.23  16.13  15.96  17.74
Llama3.2 3B Instruct    28.77  25.17  33.14  27.73  22.21  22.58  33.37  27.57
Gemma 3 4B It           44.88  46.50  48.46  43.85  46.06  42.05  56.04  46.83
Qwen2.5 3B Instruct     49.99  56.72  61.13  57.08  64.10  52.07  59.05  57.16
MOLE 3B                 23.03  50.88  50.83  50.05  57.72  43.34  17.17  41.86
Nuextract 2.0 4B        44.61  43.57  43.82  48.96  47.78  40.14  49.90  45.54
Nuextract 2.0 8B        51.93  58.93  62.11  58.41  63.21  38.21  53.70  55.21
MeXtract 0.5B           65.96  69.95  73.79  68.42  72.07  68.20  32.41  64.40
MeXtract 1.5B           67.06  73.71  75.08  71.57  76.28  71.87  52.05  69.66
MeXtract 3B             70.81  78.02  78.32  72.87  77.51  74.92  60.18  73.23

Use and Limitations

Limitations and Bias

The model is optimized for metadata extraction; it may not perform well on general NLP tasks.

License

The model is licensed under Apache 2.0.

Citation

@misc{mextract,
      title={MeXtract: Light-Weight Metadata Extraction from Scientific Papers}, 
      author={Zaid Alyafeai and Maged S. Al-Shaibani and Bernard Ghanem},
      year={2025},
      eprint={2510.06889},
      archivePrefix={arXiv},
      primaryClass={cs.CL},
      url={https://arxiv.org/abs/2510.06889}, 
}