Wrong classification
I tested this model and discovered that it fails with certain rare inputs. Here’s how to reproduce the problem:
Input text: "the man with red box"
Expected output: Neutral
Actual output: Toxic
Does anyone know what might be causing this issue?
Hi!
Thank you for pointing out such an interesting case. Could you please also let us know the probabilities you are getting for both classes? It might indeed be some strange borderline case.
While this model should perform well at least in English, we have another, larger model that has shown better results: https://huggingface.co/textdetox/xlmr-large-toxicity-classifier. If you could also check this bigger model and share your findings with us, it would be highly appreciated.
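If it helps, a quick way to compare the two checkpoints is the transformers text-classification pipeline. This is just a sketch, and the label names it prints depend on each checkpoint's id2label mapping (they may show up as LABEL_0 / LABEL_1 rather than neutral/toxic):
from transformers import pipeline
# Quick comparison sketch; the printed label names come from each checkpoint's
# id2label mapping and may be generic (LABEL_0 / LABEL_1) instead of neutral/toxic.
text = "the man with red box"
for checkpoint in ["textdetox/bert-multilingual-toxicity-classifier",
                   "textdetox/xlmr-large-toxicity-classifier"]:
    clf = pipeline("text-classification", model=checkpoint, top_k=None)
    print(checkpoint, clf(text))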
Best,
Daryna
Hi Daryna,
Thank you for your response. Here are the probabilities for both classes:
probabilities with torch = tensor([[0.0382, 0.9618]], grad_fn=<SoftmaxBackward0>)
Text: the man with red box
Probabilities: [0.03824055194854736, 0.9617594480514526]
Prediction: toxic
Here is the sample code I used to check:
import torch
from transformers import AutoTokenizer, AutoModelForSequenceClassification
tokenizer = AutoTokenizer.from_pretrained('textdetox/bert-multilingual-toxicity-classifier')
model = AutoModelForSequenceClassification.from_pretrained('textdetox/bert-multilingual-toxicity-classifier')
text = "the man with red box"
batch = tokenizer(text, return_tensors="pt")
# Run through the model
output = model(**batch)
# Get the predicted logits
logits = output.logits
# Convert logits to probabilities
probs = torch.softmax(logits, dim=-1)
print(f'probabilities with torch = {probs}')
# Predicted class index
pred_idx = torch.argmax(probs, dim=-1).item()
# Map index to label
labels = ["neutral", "toxic"]
pred_label = labels[pred_idx]
print(f"Text: {text}")
print(f"Probabilities: {probs.tolist()[0]}")
print(f"Prediction: {pred_label}")
I tested the classifier available at https://huggingface.co/textdetox/xlmr-large-toxicity-classifier, and it worked well for this case. Below is the output for the XLM-R model:
probabilities with torch = tensor([[0.9965, 0.0035]], grad_fn=<SoftmaxBackward0>)
Text: the man with red box
Probabilities: [0.9964614510536194, 0.0035385973751544952]
Prediction: neutral
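(The output format above suggests the same script was reused; to reproduce it, only the checkpoint name in the two from_pretrained calls needs to change, assuming the larger model uses the same neutral/toxic label order.)
# Hypothetical reproduction: same script as above, different checkpoint name.
tokenizer = AutoTokenizer.from_pretrained('textdetox/xlmr-large-toxicity-classifier')
model = AutoModelForSequenceClassification.from_pretrained('textdetox/xlmr-large-toxicity-classifier')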
Hi!
Thank you for such a detailed answer.
Indeed, BERT itself is a smaller model than RoBERTa, and the same relationship holds for their multilingual versions, multilingual BERT and XLM-R. This suggests that multilingual BERT was not robust enough after the multilingual fine-tuning and can be 'adversarially' confused even by simple examples like this one. XLM-R, especially in its large version, is much more stable, and one of the reasons is simply its larger number of parameters.
I don't know the details of your application, but it would be better to use the xlmr-large model.
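One more small note: rather than hard-coding labels = ["neutral", "toxic"], it is safer to read the mapping from the model config. Here is a small sketch (the exact names stored in id2label depend on how the checkpoint was exported and may be generic LABEL_0 / LABEL_1):
from transformers import AutoConfig
# id2label maps class indices to the label names stored in the checkpoint's config;
# if no explicit names were saved, it falls back to LABEL_0 / LABEL_1.
config = AutoConfig.from_pretrained('textdetox/xlmr-large-toxicity-classifier')
print(config.id2label)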
Best,
Daryna