---
license: mit
datasets:
- Overfit-GM/turkish-toxic-language
language:
- tr
base_model:
- dbmdz/bert-base-turkish-cased
pipeline_tag: text-classification
library_name: transformers
tags:
- text-classification
- toxicity-detection
- turkish
- bert
- nlp
- content-moderation
---
# MeowML/ToxicBERT - Turkish Toxic Language Detection
## Model Description
ToxicBERT is a fine-tuned BERT model specifically designed for detecting toxic language in Turkish text. Built upon the `dbmdz/bert-base-turkish-cased` foundation model, this classifier can identify potentially harmful, offensive, or toxic content in Turkish social media posts, comments, and general text.
## Model Details
- **Model Type**: Text Classification (Binary)
- **Language**: Turkish (tr)
- **Base Model**: `dbmdz/bert-base-turkish-cased`
- **License**: MIT
- **Library**: Transformers
- **Task**: Toxicity Detection
## Intended Use
### Primary Use Cases
- Content moderation for Turkish social media platforms (see the sketch after this list)
- Automated filtering of user-generated content
- Research in Turkish NLP and toxicity detection
- Educational purposes for understanding toxic language patterns
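For the content-moderation use case, a minimal filtering sketch using the `transformers` pipeline API. Loading the tokenizer from the base model mirrors the usage examples later in this card; the `LABEL_1` name for the toxic class is an assumption and should be verified against `model.config.id2label`.

```python
from transformers import pipeline

# Classification pipeline; the tokenizer comes from the base model,
# as in the usage examples later in this card.
classifier = pipeline(
    "text-classification",
    model="MeowML/ToxicBERT",
    tokenizer="dbmdz/bert-base-turkish-cased",
)

comments = ["Merhaba, nasılsın?", "Bugün hava çok güzel."]
for comment, result in zip(comments, classifier(comments, truncation=True)):
    # NOTE: "LABEL_1" == toxic is an assumption; check model.config.id2label.
    flagged = result["label"] == "LABEL_1"
    label = "FLAG" if flagged else "OK"
    print(f"{label:<4} ({result['score']:.2f}) {comment}")
```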
### Out-of-Scope Use
- This model should not be used as the sole decision-maker for content moderation without human oversight
- Not suitable for languages other than Turkish
- Should not be used for sensitive applications without proper validation and testing
## Training Data
The model was trained on the `Overfit-GM/turkish-toxic-language` dataset, which contains Turkish text samples labeled for toxicity. The dataset includes various forms of toxic content commonly found in online Turkish communications.
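To inspect the corpus, a minimal sketch using the `datasets` library; the `train` split and the record layout are assumptions about the dataset schema, so check the dataset page before relying on them.

```python
from datasets import load_dataset

# Download the corpus from the Hugging Face Hub
ds = load_dataset("Overfit-GM/turkish-toxic-language")

# Split and column names are assumptions; print the schema to verify
print(ds)
print(ds["train"][0])
```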
## Model Output
For each input text, the model returns:
- **Binary label**: 0 (non-toxic) or 1 (toxic)
- **Confidence score**: the softmax probability of the predicted class
- **Toxic probability**: the probability assigned to the toxic class
## Usage
### Quick Start
```python
import torch
from transformers import AutoTokenizer, AutoModelForSequenceClassification

# Load model and tokenizer
tokenizer = AutoTokenizer.from_pretrained("dbmdz/bert-base-turkish-cased")
model = AutoModelForSequenceClassification.from_pretrained("MeowML/ToxicBERT")

# Prepare text
text = "Merhaba, nasılsın?"
inputs = tokenizer(text, return_tensors="pt", truncation=True, padding=True, max_length=256)

# Get prediction
with torch.no_grad():
    outputs = model(**inputs)

probabilities = torch.nn.functional.softmax(outputs.logits, dim=-1)
prediction = torch.argmax(probabilities, dim=-1)

toxic_probability = probabilities[0][1].item()
is_toxic = bool(prediction.item())

print(f"Is toxic: {is_toxic}")
print(f"Toxic probability: {toxic_probability:.4f}")
```
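To score many texts at once, a batched variant of the snippet above; it pads to the longest text in the batch rather than a fixed length.

```python
import torch
from transformers import AutoTokenizer, AutoModelForSequenceClassification

tokenizer = AutoTokenizer.from_pretrained("dbmdz/bert-base-turkish-cased")
model = AutoModelForSequenceClassification.from_pretrained("MeowML/ToxicBERT")

texts = ["Merhaba, nasılsın?", "Bugün hava çok güzel."]
inputs = tokenizer(texts, return_tensors="pt", truncation=True, padding=True, max_length=256)

with torch.no_grad():
    logits = model(**inputs).logits

# Column 1 holds the toxic-class probability for each text
toxic_probs = torch.nn.functional.softmax(logits, dim=-1)[:, 1]
for text, p in zip(texts, toxic_probs):
    print(f"{p.item():.4f}  {text}")
```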
### Advanced Usage with Custom Class
```python
import torch
from transformers import AutoTokenizer, AutoModelForSequenceClassification

class ToxicLanguageDetector:
    def __init__(self, model_name="MeowML/ToxicBERT"):
        # The tokenizer is loaded from the Turkish BERT base model,
        # as in the quick-start example
        self.tokenizer = AutoTokenizer.from_pretrained("dbmdz/bert-base-turkish-cased")
        self.model = AutoModelForSequenceClassification.from_pretrained(model_name)
        self.device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')
        self.model.to(self.device)
        self.model.eval()

    def predict(self, text):
        inputs = self.tokenizer(
            text,
            truncation=True,
            padding='max_length',
            max_length=256,
            return_tensors='pt'
        ).to(self.device)

        with torch.no_grad():
            outputs = self.model(**inputs)
            probabilities = torch.nn.functional.softmax(outputs.logits, dim=-1)
            prediction = torch.argmax(probabilities, dim=-1)

        return {
            'text': text,
            'is_toxic': bool(prediction.item()),
            'toxic_probability': probabilities[0][1].item(),
            'confidence': probabilities[0].max().item()
        }

# Usage
detector = ToxicLanguageDetector()
result = detector.predict("Merhaba, nasılsın?")
print(result)
```
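Note that `predict` pads every input to the full 256 tokens (`padding='max_length'`), which keeps tensor shapes fixed but spends compute on padding for short texts; for throughput-sensitive workloads, dynamic padding (`padding=True`, as in the quick-start snippet) is usually faster.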
## Limitations and Biases
### Limitations
- The model's performance depends heavily on the training data quality and coverage
- May have difficulty with context-dependent toxicity (sarcasm, irony)
- Performance may vary across different Turkish dialects or informal language
- Shorter texts might be more challenging to classify accurately
### Potential Biases
- The model may reflect biases present in the training dataset
- Certain topics, demographics, or linguistic patterns might be over- or under-represented
- Regular evaluation and bias testing are recommended for production use; a minimal sketch follows
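As one form of such a periodic check, a hedged sketch that computes accuracy and F1 on a labeled sample. The `train` split and the `text`/`label` column names are assumptions about the dataset schema and must be verified first.

```python
import torch
from datasets import load_dataset
from sklearn.metrics import accuracy_score, f1_score
from transformers import AutoTokenizer, AutoModelForSequenceClassification

tokenizer = AutoTokenizer.from_pretrained("dbmdz/bert-base-turkish-cased")
model = AutoModelForSequenceClassification.from_pretrained("MeowML/ToxicBERT")
model.eval()

# Split and column names ("train", "text", "label") are assumptions;
# verify against the dataset schema before running.
sample = load_dataset("Overfit-GM/turkish-toxic-language", split="train")
sample = sample.shuffle(seed=0).select(range(256))

preds = []
for start in range(0, len(sample), 32):
    batch = sample[start:start + 32]  # slicing returns a dict of columns
    inputs = tokenizer(batch["text"], return_tensors="pt",
                       truncation=True, padding=True, max_length=256)
    with torch.no_grad():
        preds.extend(model(**inputs).logits.argmax(dim=-1).tolist())

print("accuracy:", accuracy_score(sample["label"], preds))
print("f1:", f1_score(sample["label"], preds))
```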
## Ethical Considerations
- This model should be used responsibly with human oversight
- False positives and negatives are expected and should be accounted for
- Consider the impact on freedom of expression when implementing automated moderation
- Regular auditing and updating are recommended to maintain fairness
## Technical Specifications
- **Input**: Text strings (max 256 tokens)
- **Output**: Binary classification with probability scores
- **Model Size**: BERT-base architecture (roughly 110M parameters)
- **Inference**: Runs on both CPU and GPU
- **Memory Requirements**: Weights are well under 1 GB; inference fits on standard hardware
## Citation
If you use this model in your research or applications, please cite:
```bibtex
@misc{meowml_toxicbert_2024,
  title={ToxicBERT: Turkish Toxic Language Detection},
  author={MeowML},
  year={2024},
  publisher={Hugging Face},
  url={https://huggingface.co/MeowML/ToxicBERT}
}
```
## Acknowledgments
- Base model: `dbmdz/bert-base-turkish-cased`
- Training dataset: `Overfit-GM/turkish-toxic-language`
- Built with Hugging Face Transformers library
## Contact
For questions, issues, or suggestions, please open an issue in the model repository or contact the MeowML team.
---
**Disclaimer**: This model is provided for research and educational purposes. Users are responsible for ensuring appropriate and ethical use in their applications.