
	---
	library_name: transformers
	language:
	- en
	- fr
	- it
	- es
	- ru
	- uk
	- tt
	- ar
	- hi
	- ja
	- zh
	- he
	- am
	- de
	license: openrail++
	datasets:
	- textdetox/multilingual_toxicity_dataset
	metrics:
	- f1
	base_model:
	- cardiffnlp/twitter-xlm-roberta-large-2022
	pipeline_tag: text-classification
	---

	## Multilingual Toxicity Classifier for 15 Languages (2025)

	This is an instance of [cardiffnlp/twitter-xlm-roberta-large-2022](https://huggingface.co/cardiffnlp/twitter-xlm-roberta-large-2022) fine-tuned for binary toxicity classification on our updated (2025) dataset [textdetox/multilingual_toxicity_dataset](https://huggingface.co/datasets/textdetox/multilingual_toxicity_dataset).

	The model now covers 15 languages from various language families:

	| Language  | Code | F1 Score |
	|-----------|------|----------|
	| English   | en   | 0.9071   |
	| Russian   | ru   | 0.9022   |
	| Ukrainian | uk   | 0.9075   |
	| German    | de   | 0.6528   |
	| Spanish   | es   | 0.7430   |
	| Arabic    | ar   | 0.6207   |
	| Amharic   | am   | 0.6676   |
	| Hindi     | hi   | 0.7171   |
	| Chinese   | zh   | 0.6483   |
	| Italian   | it   | 0.7597   |
	| French    | fr   | 0.9114   |
	| Hinglish  | hin  | 0.7051   |
	| Hebrew    | he   | 0.8911   |
	| Japanese  | ja   | 0.8725   |
	| Tatar     | tt   | 0.6542   |

	## How to use

	```python
	import torch
	from transformers import AutoTokenizer, AutoModelForSequenceClassification

	tokenizer = AutoTokenizer.from_pretrained('textdetox/twitter-xlmr-toxicity-classifier')
	model = AutoModelForSequenceClassification.from_pretrained('textdetox/twitter-xlmr-toxicity-classifier')
	model.eval()  # inference mode: disables dropout

	# Tokenize the input text into a batch of size 1
	batch = tokenizer("You are amazing!", return_tensors="pt")

	# No gradients are needed for inference
	with torch.no_grad():
	    output = model(**batch)

	# idx 0 for neutral, idx 1 for toxic
	predicted_idx = output.logits.argmax(dim=-1).item()
	```
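
The model returns raw logits, so a small post-processing step is needed to get a label and a toxicity probability. The following is a minimal sketch, not part of the released code: it assumes the label order given in the comment above (index 0 neutral, index 1 toxic), and the names `softmax`, `interpret`, `LABELS` and the `threshold` parameter are hypothetical. The logits in the example are made up for illustration; with the real model they would come from `output.logits[0].tolist()`.

```python
import math

# Assumed label order, taken from the comment in the usage snippet:
# index 0 is neutral, index 1 is toxic.
LABELS = ("neutral", "toxic")


def softmax(logits):
    """Convert a list of raw logits into probabilities that sum to 1."""
    peak = max(logits)
    # Subtract the maximum before exponentiating for numerical stability
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def interpret(logits, threshold=0.5):
    """Return (label, toxic_probability) for one pair of logits.

    `threshold` is a hypothetical knob, not something the model card defines.
    """
    toxic_prob = softmax(logits)[1]
    label = LABELS[1] if toxic_prob >= threshold else LABELS[0]
    return label, toxic_prob


# Made-up logits for illustration only
label, toxic_prob = interpret([2.0, -1.0])
print(label, round(toxic_prob, 3))
```

Raising `threshold` trades recall for precision on the toxic class, which may be worth tuning per language given how much the F1 scores vary.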

	## Citation
	The model was prepared for the [TextDetox 2025 Shared Task](https://pan.webis.de/clef25/pan25-web/text-detoxification.html) evaluation.

	The citation will be added soon.