Wrong classification
I tested this model and discovered that it fails with certain rare inputs. Here’s how to reproduce the problem:
Input text: "the man with red box"
Expected output: Neutral
Actual output: Toxic
Does anyone know what might be causing this issue?
Hi!
Thank you for pointing out such an interesting case. Could you please also let us know the probabilities you are getting for both classes? It might indeed be some strange borderline case.
While this model should perform well at least in English, we have another, larger model that has shown better results: https://huggingface.co/textdetox/xlmr-large-toxicity-classifier. If you could also check this bigger model and share your findings with us, it would be highly appreciated.
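If it helps, a quick way to compare the two checkpoints is the transformers text-classification pipeline. This is just a sketch, and the label names it prints depend on each checkpoint's id2label mapping (they may show up as LABEL_0 / LABEL_1 rather than neutral/toxic):
from transformers import pipeline
# Quick comparison sketch; the printed label names come from each checkpoint's
# id2label mapping and may be generic (LABEL_0 / LABEL_1) instead of neutral/toxic.
text = "the man with red box"
for checkpoint in ["textdetox/bert-multilingual-toxicity-classifier",
                   "textdetox/xlmr-large-toxicity-classifier"]:
    clf = pipeline("text-classification", model=checkpoint, top_k=None)
    print(checkpoint, clf(text))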
Best,
Daryna
Hi Daryna,
Thank you for your response. Here are the probabilities for both classes:
probabilities with torch = tensor([[0.0382, 0.9618]], grad_fn=<SoftmaxBackward0>)
Text: the man with red box
Probabilities: [0.03824055194854736, 0.9617594480514526]
Prediction: toxic
Here is the sample code I used to check:
import torch
from transformers import AutoTokenizer, AutoModelForSequenceClassification
tokenizer = AutoTokenizer.from_pretrained('textdetox/bert-multilingual-toxicity-classifier')
model = AutoModelForSequenceClassification.from_pretrained('textdetox/bert-multilingual-toxicity-classifier')
text = "the man with red box"
batch = tokenizer(text, return_tensors="pt")
# Run through the model
output = model(**batch)
# Get the predicted logits
logits = output.logits
# Convert logits to probabilities
probs = torch.softmax(logits, dim=-1)
print(f'probabilities with torch = {probs}')
# Predicted class index
pred_idx = torch.argmax(probs, dim=-1).item()
# Map index to label
labels = ["neutral", "toxic"]
pred_label = labels[pred_idx]
print(f"Text: {text}")
print(f"Probabilities: {probs.tolist()[0]}")
print(f"Prediction: {pred_label}")
I tested the classifier available at https://huggingface.co/textdetox/xlmr-large-toxicity-classifier, and it worked well for this case. Below is the output for the XLM-R model:
probabilities with torch = tensor([[0.9965, 0.0035]], grad_fn=<SoftmaxBackward0>)
Text: the man with red box
Probabilities: [0.9964614510536194, 0.0035385973751544952]
Prediction: neutral
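(The output format above suggests the same script was reused; to reproduce it, only the checkpoint name in the two from_pretrained calls needs to change, assuming the larger model uses the same neutral/toxic label order.)
# Hypothetical reproduction: same script as above, different checkpoint name.
tokenizer = AutoTokenizer.from_pretrained('textdetox/xlmr-large-toxicity-classifier')
model = AutoModelForSequenceClassification.from_pretrained('textdetox/xlmr-large-toxicity-classifier')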
Hi!
Thank you for such a detailed answer.
Indeed, BERT itself is a smaller model than RoBERTa, and the same relationship holds for their multilingual versions, multilingual BERT and XLM-R. This suggests that multilingual BERT was not robust enough after the multilingual fine-tuning and can be 'adversarially' confused even by simple examples like this one. XLM-R, especially in its large version, is much more stable, and one of the reasons is simply its larger number of parameters.
I don't know the details of your application, but it would be better to use the xlmr-large model.
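One more small note: rather than hard-coding labels = ["neutral", "toxic"], it is safer to read the mapping from the model config. Here is a small sketch (the exact names stored in id2label depend on how the checkpoint was exported and may be generic LABEL_0 / LABEL_1):
from transformers import AutoConfig
# id2label maps class indices to the label names stored in the checkpoint's config;
# if no explicit names were saved, it falls back to LABEL_0 / LABEL_1.
config = AutoConfig.from_pretrained('textdetox/xlmr-large-toxicity-classifier')
print(config.id2label)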
Best,
Daryna