
LAMB (LAtin ModernBERT) is an encoder-only Latin language model based on the ModernBERT architecture. It was pre-trained on nearly 24B Latin tokens and is ready for use with any Latin orthography.
## Features

## Usage

### Predicting Masked Tokens
```python
from transformers import AutoTokenizer, AutoModelForMaskedLM

model_id = "aimgo/LAMB"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForMaskedLM.from_pretrained(model_id)

text = "et ecce tu eras [MASK] me et ego foris, et ibi te quaerebam"
inputs = tokenizer(text, return_tensors="pt")
outputs = model(**inputs)

# Locate the [MASK] position and take the highest-scoring token there.
masked_index = inputs["input_ids"][0].tolist().index(tokenizer.mask_token_id)
predicted_token_id = outputs.logits[0, masked_index].argmax(dim=-1).item()
predicted_token = tokenizer.decode(predicted_token_id)

print("Input:", text)
print("Predicted:", predicted_token)
```
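As an encoder-only model, LAMB can also be used as a feature extractor: load it with `AutoModel` and pool its `last_hidden_state` into one vector per sentence. A minimal sketch of attention-mask-aware mean pooling is below; the dummy tensors stand in for the model's actual outputs (in practice, `hidden` would be `AutoModel.from_pretrained("aimgo/LAMB")(**inputs).last_hidden_state` and `mask` would be `inputs["attention_mask"]`).

```python
import torch

def mean_pool(hidden: torch.Tensor, mask: torch.Tensor) -> torch.Tensor:
    """Average token embeddings, ignoring padding positions."""
    mask = mask.unsqueeze(-1).type_as(hidden)   # (batch, seq, 1)
    summed = (hidden * mask).sum(dim=1)         # zero out padding, then sum
    counts = mask.sum(dim=1).clamp(min=1e-9)    # real tokens per sequence
    return summed / counts                      # (batch, hidden_size)

# Dummy batch: one sequence of three positions, the last one padding.
hidden = torch.tensor([[[1.0, 2.0], [3.0, 4.0], [0.0, 0.0]]])
mask = torch.tensor([[1, 1, 0]])
print(mean_pool(hidden, mask))  # tensor([[2., 3.]])
```

The mask-aware average matters because padded positions would otherwise drag the sentence vector toward zero.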
## Citation

If you use this model in your work, please cite:

```bibtex
@misc{mccarthy2025LAMB,
  author       = {McCarthy, A. M.},
  title        = {{LAMB}: A Modern Masked Language Model for Latin},
  year         = {2025},
  howpublished = {\url{https://huggingface.co/aimgo/LAMB}},
  note         = {Model}
}
```
## Model tree

Base model: [answerdotai/ModernBERT-base](https://huggingface.co/answerdotai/ModernBERT-base)