Tigre-xlm-roberta-base

This model is a multilingual masked language model (MLM) based on XLM-RoBERTa, fine-tuned on diverse linguistic data with a strong focus on Tigre (tig_Ethi). It supports multiple languages, including:

  • [tig_Ethi] Tigre
  • [tir_Ethi] Tigrinya
  • [amh_Ethi] Amharic
  • [gez_Ethi] Ge'ez
  • [ara_Arab] Arabic
  • [eng_Latn] English
  • [deu_Latn] German
  • [swe_Latn] Swedish
  • [nno_Latn], [nob_Latn] Norwegian
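
The bracketed codes above also appear as prefixes on input text (see the example below), so the model can be told which language a sentence is in. A tiny illustrative sketch of that convention — `tag_text` is a hypothetical helper, not part of the model's API:

```python
# Prepend a language tag (from the list above) to the input text.
# The tag format "[xxx_Yyyy] " mirrors the usage example in this card.
def tag_text(lang_tag: str, text: str) -> str:
    return f"[{lang_tag}] {text}"

print(tag_text("tig_Ethi", "መርሐበ ብኩም እት <mask>"))
# [tig_Ethi] መርሐበ ብኩም እት <mask>
```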

Usage

The model is designed for masked language modeling — predicting missing words in a sentence. Use it to:

  • Fill in blanks in multilingual text
  • Study language understanding in low-resource languages
  • Build downstream tools for Tigre and related languages

Example (Python)

from transformers import AutoTokenizer, XLMRobertaForMaskedLM
import torch

tokenizer = AutoTokenizer.from_pretrained("BeitTigreAI/tigre-xlm-roberta-base")
model = XLMRobertaForMaskedLM.from_pretrained("BeitTigreAI/tigre-xlm-roberta-base")

# Prefix the input with its language tag; <mask> marks the position to predict.
text = "[tig_Ethi] መርሐበ ብኩም እት <mask>"

inputs = tokenizer(text, return_tensors="pt")
with torch.no_grad():
    logits = model(**inputs).logits

# Locate the mask position and take the highest-scoring vocabulary entry.
mask_token_index = (inputs.input_ids[0] == tokenizer.mask_token_id).nonzero().item()
predicted_token_id = logits[0, mask_token_index].argmax(-1).item()
predicted_word = tokenizer.decode([predicted_token_id])

print(predicted_word)  # Output: likely a word like "ቤት" (home)
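
The example above keeps only the single best prediction. To inspect several candidate fillers, you can rank the mask logits with `torch.topk`. A minimal sketch using a dummy logits tensor and vocabulary as stand-ins for the model's real output (the values and words here are illustrative, not actual model predictions):

```python
import torch

# Dummy scores over a tiny vocabulary, standing in for
# logits[0, mask_token_index] from the example above.
vocab = ["ቤት", "ዓድ", "ባሕር", "ሰማይ", "ምድር"]
mask_logits = torch.tensor([4.2, 3.1, 0.5, -1.0, 2.0])

# Top-3 candidates, highest score first.
top = torch.topk(mask_logits, k=3)
for score, idx in zip(top.values.tolist(), top.indices.tolist()):
    print(f"{vocab[idx]}: {score:.2f}")
```

With the real model, replace `mask_logits` with `logits[0, mask_token_index]` and decode each index via `tokenizer.decode([idx])`.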
Model details

  • Format: Safetensors
  • Model size: 0.3B params
  • Tensor type: F32