MAMUT-BERT (Math Mutator BERT)
MAMUT-BERT is a language model based on bert-base-cased that was further pretrained on mathematical texts and formulas. It was introduced in MAMUT: A Novel Framework for Modifying Mathematical Formulas for the Generation of Specialized Datasets for Language Model Training.
The model aims to provide improved mathematical understanding by extending BERT with domain-specific knowledge from mathematical LaTeX formulas and terminology.
Model Details
Overview
MAMUT-BERT was pretrained on four math-specific tasks across four datasets.
- Mathematical Formulas (MF): A Masked Language Modeling (MLM) task on math formulas written in LaTeX.
- Mathematical Texts (MT): An MLM task on natural-language texts containing inline LaTeX math. The masking probability was biased toward mathematical tokens (inside math environments, $...$) and domain-specific terms (e.g., sum, one, ...).
- Named Math Formulas (NMF): A Next-Sentence-Prediction (NSP)-style task: given a formula and the name of a mathematical identity (e.g., Pythagorean Theorem), classify whether they match.
- Math Formula Retrieval (MFR): Another NSP-style task to decide if two formulas describe the same mathematical identity or concept.
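As a rough illustration of the NSP-style input format used by NMF and MFR (the exact pretraining preprocessing is defined in the pretraining code listed under Model Sources), a name/formula pair can be encoded as an ordinary BERT sentence pair; the pair below is purely illustrative:

```python
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("aieng-lab/bert-base-cased-mamut")

# Encode a (name, formula) pair in BERT's standard sentence-pair format,
# analogous to the NMF matching task (illustrative example only).
encoding = tokenizer("Pythagorean Theorem", r"a^2 + b^2 = c^2")
print(tokenizer.convert_ids_to_tokens(encoding["input_ids"]))
```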
To support mathematical syntax, 300 additional mathematical LaTeX-specific tokens were added to the tokenizer, e.g., \sum, \frac, and pmatrix.
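To see the effect of the extended vocabulary, the tokenization of a formula can be compared against the original bert-base-cased tokenizer (a small sketch; the full list of added tokens is defined in the pretraining code):

```python
from transformers import AutoTokenizer

base = AutoTokenizer.from_pretrained("bert-base-cased")
mamut = AutoTokenizer.from_pretrained("aieng-lab/bert-base-cased-mamut")

formula = r"\sum_{i=1}^{n} \frac{1}{i^2}"
print(base.tokenize(formula))   # \sum and \frac are broken into sub-word pieces
print(mamut.tokenize(formula))  # \sum and \frac should appear as single added tokens
```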
Model Sources
- Base Model: bert-base-cased
- Pretraining Code: aieng-lab/transformer-math-pretraining
- MAMUT Repository: aieng-lab/math-mutator
- Paper: MAMUT: A Novel Framework for Modifying Mathematical Formulas for the Generation of Specialized Datasets for Language Model Training
Uses
MAMUT-BERT is intended for downstream tasks that require improved mathematical understanding, such as:
- Formula classification
- Retrieval of semantically similar formulas
- Math-related question answering
Note: This model was saved without the MLM or NSP heads and requires fine-tuning before use in downstream tasks.
Similarly trained models are MAMUT-MathBERT, based on tbs17/MathBERT, and MAMUT-MPBERT, based on AnReu/math_structure_bert (the best of the three models according to our evaluation).
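Because the pretraining heads are not included, the checkpoint is typically loaded with a freshly initialized task head and then fine-tuned. A minimal sketch for a formula-pair classification setup (the example pair and label count are hypothetical) could look like this:

```python
from transformers import AutoModelForSequenceClassification, AutoTokenizer

model_id = "aieng-lab/bert-base-cased-mamut"
tokenizer = AutoTokenizer.from_pretrained(model_id)

# Adds a randomly initialized classification head on top of the pretrained
# encoder; this head must be fine-tuned on task-specific data before use.
model = AutoModelForSequenceClassification.from_pretrained(model_id, num_labels=2)

# Example input for a formula-retrieval-style pair classification task.
inputs = tokenizer(r"a^2 + b^2 = c^2", r"c^2 = a^2 + b^2", return_tensors="pt")
logits = model(**inputs).logits
```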
Training Details
Training configurations are described in Appendix C of the MAMUT paper.
Evaluation
The model is evaluated in Section 7 and Appendix C.4 of the MAMUT paper, where it is referred to as MAMUT-BERT.
Environmental Impact
- Hardware Type: 8xA100
- Hours used: 48
- Compute Region: Germany
Citation
BibTeX:
@article{drechsel2025mamut,
  title={{MAMUT}: A Novel Framework for Modifying Mathematical Formulas for the Generation of Specialized Datasets for Language Model Training},
  author={Jonathan Drechsel and Anja Reusch and Steffen Herbold},
  journal={Transactions on Machine Learning Research},
  issn={2835-8856},
  year={2025},
  url={https://openreview.net/forum?id=khODmRpQEx}
}