---
language:
- fa
library_name: transformers
widget:
- text: "ز سوزناکی گفتار من [MASK] بگریست"
example_title: "Poetry 1"
- text: "نظر از تو برنگیرم همه [MASK] تا بمیرم که تو در دلم نشستی و سر مقام داری"
example_title: "Poetry 2"
- text: "هر ساعتم اندرون بجوشد [MASK] را وآگاهی نیست مردم بیرون را"
example_title: "Poetry 3"
- text: "غلام همت آن رند عافیت سوزم که در گدا صفتی [MASK] داند"
example_title: "Poetry 4"
- text: "این [MASK] اولشه."
example_title: "Informal 1"
- text: "دیگه خسته شدم! [MASK] اینم شد کار؟!"
example_title: "Informal 2"
- text: "فکر نکنم به موقع برسیم. بهتره [MASK] این یکی بشیم."
example_title: "Informal 3"
- text: "تا صبح بیدار موندم و داشتم برای [MASK] آماده می شدم."
example_title: "Informal 4"
- text: "زندگی بدون [MASK] خستهکننده است."
example_title: "Formal 1"
- text: "در حکم اولیه این شرکت مجاز به فعالیت شد ولی پس از بررسی مجدد، مجوز این شرکت [MASK] شد."
example_title: "Formal 2"
---
# FaBERT: Pre-training BERT on Persian Blogs
## Model Details
FaBERT is a Persian BERT-base model pre-trained on the diverse HmBlogs corpus, which covers both casual and formal Persian text. Across a range of Natural Language Understanding (NLU) tasks, FaBERT consistently delivers notable improvements while remaining compact. The model is available on Hugging Face and can be used with the `transformers` library without any extra setup.
## Features
- Pre-trained on the diverse HmBlogs corpus, consisting of more than 50 GB of text from Persian blogs
- Strong performance across various downstream NLP tasks
- BERT architecture with 124 million parameters
## Useful Links
- **Repository:** [FaBERT on Github](https://github.com/SBU-NLP-LAB/FaBERT)
- **Paper:** [ACL Anthology](https://aclanthology.org/2025.wnut-1.10/)
## Usage
### Loading the Model with the MLM Head
```python
from transformers import AutoTokenizer, AutoModelForMaskedLM
tokenizer = AutoTokenizer.from_pretrained("sbunlp/fabert") # make sure to use the default fast tokenizer
model = AutoModelForMaskedLM.from_pretrained("sbunlp/fabert")
```
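As a quick sanity check, the MLM head can be queried through the `fill-mask` pipeline. The snippet below is a minimal sketch that reuses one of the widget examples above; the predicted tokens are whatever the model returns and are not taken from the paper.

```python
from transformers import pipeline

# Minimal sketch: query FaBERT's MLM head via the fill-mask pipeline.
fill_mask = pipeline("fill-mask", model="sbunlp/fabert")

# One of the widget examples above ("Life without [MASK] is tiring.")
for prediction in fill_mask("زندگی بدون [MASK] خسته‌کننده است."):
    print(prediction["token_str"], f"{prediction['score']:.4f}")
```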
### Downstream Tasks
Like the original English BERT, FaBERT can be [fine-tuned](https://huggingface.co/docs/transformers/en/training) on many downstream tasks. Examples on Persian datasets are available in our [GitHub repository](#useful-links); a minimal fine-tuning sketch is shown below.

**Make sure to use the default fast tokenizer.**
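The following is a hedged sketch of fine-tuning FaBERT for binary text classification with the `Trainer` API. The CSV file names, the `text`/`label` column names, and the hyperparameters are placeholders for illustration only, not the settings used in the paper.

```python
from datasets import load_dataset
from transformers import (
    AutoModelForSequenceClassification,
    AutoTokenizer,
    DataCollatorWithPadding,
    Trainer,
    TrainingArguments,
)

# Placeholder data: any CSV with "text" and "label" columns will do.
dataset = load_dataset("csv", data_files={"train": "train.csv", "validation": "dev.csv"})

tokenizer = AutoTokenizer.from_pretrained("sbunlp/fabert")  # default fast tokenizer
model = AutoModelForSequenceClassification.from_pretrained("sbunlp/fabert", num_labels=2)

def tokenize(batch):
    return tokenizer(batch["text"], truncation=True, max_length=256)

tokenized = dataset.map(tokenize, batched=True)

args = TrainingArguments(
    output_dir="fabert-finetuned",    # placeholder output directory
    learning_rate=2e-5,               # illustrative hyperparameters
    per_device_train_batch_size=16,
    num_train_epochs=3,
)

trainer = Trainer(
    model=model,
    args=args,
    train_dataset=tokenized["train"],
    eval_dataset=tokenized["validation"],
    data_collator=DataCollatorWithPadding(tokenizer),  # pad dynamically per batch
)
trainer.train()
```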
## Training Details
FaBERT was pre-trained with the masked language modeling (MLM) objective using whole-word masking (WWM), reaching a perplexity of 7.76 on the validation set.
| Hyperparameter | Value |
|-------------------|:--------------:|
| Batch Size | 32 |
| Optimizer | Adam |
| Learning Rate | 6e-5 |
| Weight Decay | 0.01 |
| Total Steps | 18 Million |
| Warmup Steps | 1.8 Million |
| Precision Format | TF32 |
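For reference, the whole-word-masking objective described above roughly corresponds to the `DataCollatorForWholeWordMask` in `transformers`. The snippet below is only an illustration of how WWM groups sub-tokens; it is not the original pre-training code, and the example sentence is arbitrary.

```python
from transformers import AutoTokenizer, DataCollatorForWholeWordMask

# Illustration of MLM (WWM): all WordPiece sub-tokens of a randomly chosen
# word are masked together, instead of masking sub-tokens independently.
tokenizer = AutoTokenizer.from_pretrained("sbunlp/fabert")  # WordPiece ("##") tokenizer
collator = DataCollatorForWholeWordMask(tokenizer=tokenizer, mlm_probability=0.15)

encoding = tokenizer("زندگی بدون کتاب خسته‌کننده است.")
batch = collator([encoding])
print(batch["input_ids"][0])  # input ids with whole words replaced by [MASK]
print(batch["labels"][0])     # original ids at masked positions, -100 elsewhere
```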
## Evaluation
Here are some key performance results for the FaBERT model:
**Sentiment Analysis**
| Task | FaBERT | ParsBERT | XLM-R |
|:-------------|:------:|:--------:|:-----:|
| MirasOpinion | **87.51** | 86.73 | 84.92 |
| MirasIrony | 74.82 | 71.08 | **75.51** |
| DeepSentiPers | **79.85** | 74.94 | 79.00 |
**Named Entity Recognition**
| Task | FaBERT | ParsBERT | XLM-R |
|:-------------|:------:|:--------:|:-----:|
| PEYMA | **91.39** | 91.24 | 90.91 |
| ParsTwiner | **82.22** | 81.13 | 79.50 |
| MultiCoNER v2 | 57.92 | **58.09** | 51.47 |
**Question Answering**
| Task | FaBERT | ParsBERT | XLM-R |
|:-------------|:------:|:--------:|:-----:|
| ParsiNLU | **55.87** | 44.89 | 42.55 |
| PQuAD | 87.34 | 86.89 | **87.60** |
| PCoQA | **53.51** | 50.96 | 51.12 |
**Natural Language Inference & QQP**
| Task | FaBERT | ParsBERT | XLM-R |
|:-------------|:------:|:--------:|:-----:|
| FarsTail | **84.45** | 82.52 | 83.50 |
| SBU-NLI | **66.65** | 58.41 | 58.85 |
| ParsiNLU QQP | **82.62** | 77.60 | 79.74 |
**Number of Parameters**
| | FaBERT | ParsBERT | XLM-R |
|:-------------|:------:|:--------:|:-----:|
| Parameter Count (M) | 124 | 162 | 278 |
| Vocabulary Size (K) | 50 | 100 | 250 |
For a more detailed performance analysis, refer to the paper.
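The parameter count in the table above can be checked directly from the released checkpoint. This is a small sketch; the exact figure depends on whether task-specific heads are included.

```python
from transformers import AutoModel

# Count encoder parameters of the released checkpoint (expected around 124M).
model = AutoModel.from_pretrained("sbunlp/fabert")
total = sum(p.numel() for p in model.parameters())
print(f"FaBERT encoder parameters: {total / 1e6:.1f}M")
```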
## How to Cite
If you use FaBERT in your research or projects, please cite it using the following BibTeX:
```bibtex
@inproceedings{masumi-etal-2025-fabert,
title = "{F}a{BERT}: Pre-training {BERT} on {P}ersian Blogs",
author = "Masumi, Mostafa and
Majd, Seyed Soroush and
Shamsfard, Mehrnoush and
Beigy, Hamid",
editor = "Bak, JinYeong and
Goot, Rob van der and
Jang, Hyeju and
Buaphet, Weerayut and
Ramponi, Alan and
Xu, Wei and
Ritter, Alan",
booktitle = "Proceedings of the Tenth Workshop on Noisy and User-generated Text",
month = may,
year = "2025",
address = "Albuquerque, New Mexico, USA",
publisher = "Association for Computational Linguistics",
url = "https://aclanthology.org/2025.wnut-1.10/",
doi = "10.18653/v1/2025.wnut-1.10",
pages = "85--96",
ISBN = "979-8-89176-232-9",
}
```