---
license: cc-by-sa-4.0
language:
  - tig
metrics:
  - perplexity
base_model:
  - FacebookAI/xlm-roberta-base
---

# Tigre-xlm-roberta-base

This model is a multilingual masked language model (MLM) based on XLM-RoBERTa, fine-tuned on diverse linguistic data with a strong focus on **Tigre** (`tig_Ethi`), reaching a perplexity of 4.58. It supports multiple languages, including:

- `[tig_Ethi]` Tigre
- `[tir_Ethi]` Tigrinya
- `[amh_Ethi]` Amharic
- `[gez_Ethi]` Ge'ez
- `[ara_Arab]` Arabic
- `[eng_Latn]` English
- `[deu_Latn]` German
- `[swe_Latn]` Swedish
- `[nno_Latn]`, `[nob_Latn]` Norwegian (Nynorsk and Bokmål)

## Usage

The model is designed for **masked language modeling** — predicting missing words in a sentence. Use it to:

- Fill in blanks in multilingual text
- Study language understanding in low-resource languages
- Build downstream tools for Tigre and related languages

### Example (Python)

```python
import torch
from transformers import AutoTokenizer, XLMRobertaForMaskedLM

tokenizer = AutoTokenizer.from_pretrained("BeitTigreAI/tigre-xlm-roberta-base")
model = XLMRobertaForMaskedLM.from_pretrained("BeitTigreAI/tigre-xlm-roberta-base")

# Prefix the input with a language tag and mark the position to predict
# with the tokenizer's mask token ("<mask>" for XLM-RoBERTa).
text = f"[tig_Ethi] መርሐበ ብኩም እት {tokenizer.mask_token}"
inputs = tokenizer(text, return_tensors="pt")

with torch.no_grad():
    logits = model(**inputs).logits

# Locate the masked position and take the highest-scoring token there.
mask_token_index = (inputs.input_ids[0] == tokenizer.mask_token_id).nonzero().item()
predicted_token_id = logits[0, mask_token_index].argmax(-1).item()
predicted_word = tokenizer.decode([predicted_token_id])
print(predicted_word)  # Output: likely a word like "ቤት" (home)
```
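
For quick experiments, the same prediction can be done with the `fill-mask` pipeline, which returns several candidate completions at once. The snippet below is a minimal sketch of that approach, reusing the model id from the example above; the `top_k` value of 5 is an arbitrary choice for illustration.

```python
from transformers import pipeline

# Fill-mask pipeline over the same checkpoint; yields the top-k
# candidate tokens for the masked position along with their scores.
fill_mask = pipeline("fill-mask", model="BeitTigreAI/tigre-xlm-roberta-base")

# XLM-RoBERTa's mask token is "<mask>".
for candidate in fill_mask("[tig_Ethi] መርሐበ ብኩም እት <mask>", top_k=5):
    print(f'{candidate["token_str"]}\t{candidate["score"]:.3f}')
```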