---
language:
- en
base_model:
- answerdotai/ModernBERT-base
pipeline_tag: text-classification
tags:
- text
- text classification
- LLM
- LLM text detection
- Detection
- detector
---

# LLM_Detector_Preview_model

**Preview release of an LLM-generated text detector.**

## Model Description
This model classifies text as Human, Mixed, or AI-generated. It uses a sequence classification architecture and was trained on a mix of human-written and AI-generated texts. The model can be used for document, sentence, and token-level analysis.

- **Architecture:** ModernBERT with a sequence classification head, fine-tuned from `answerdotai/ModernBERT-base`
- **Labels:**
  - 0: Human
  - 1: Mixed
  - 2: AI

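The card does not say how sentence-level analysis is meant to be run. One possible approach, sketched below under loud assumptions, is to split the text with a naive regex and classify each sentence separately. `split_sentences`, `label_sentences`, and the `classify` callable are hypothetical names introduced for illustration; `classify` stands in for a wrapper around the model call shown under "Example Usage".

```python
import re
from typing import Callable, List, Tuple

# Label ids as listed in this card
LABELS = {0: "Human", 1: "Mixed", 2: "AI"}


def split_sentences(text: str) -> List[str]:
    """Naive splitter: break after '.', '!' or '?' followed by whitespace."""
    parts = re.split(r"(?<=[.!?])\s+", text.strip())
    return [p for p in parts if p]


def label_sentences(text: str, classify: Callable[[str], int]) -> List[Tuple[str, str]]:
    """Pair each sentence with the label name returned for it.

    `classify` is a hypothetical callable (an assumption, not part of this
    repository) that takes a string and returns a label id: 0, 1 or 2.
    """
    return [(s, LABELS[classify(s)]) for s in split_sentences(text)]
```

Short sentences give the classifier very little to work with, so per-sentence labels are probably noisier than a document-level prediction.
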
## Intended Use
- **For research and curiosity only.**
- Not for academic, legal, medical, or high-stakes use.
- Detection is easy to bypass, and results may be unreliable.

## Limitations & Warnings
- This model is **experimental** and its accuracy is not guaranteed.
- It can produce false positives and false negatives.
- Simple paraphrasing or editing can fool the detector.
- Do not use for academic integrity, hiring, or legal decisions.

## How It Works
The model analyzes text and predicts the likelihood of it being human-written, mixed, or AI-generated. It uses statistical patterns learned from training data, but these patterns are not foolproof and can be circumvented.

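To make "likelihood" concrete: a sequence classifier of this kind emits one raw score (logit) per label, and a softmax turns those scores into probabilities that sum to 1. The sketch below shows that step in plain Python. The logits are made up for illustration and are not real output from this model; `softmax` and `to_label_probs` are illustrative helper names.

```python
import math
from typing import Dict, List

LABELS = ["Human", "Mixed", "AI"]  # ids 0, 1, 2 as listed in this card


def softmax(logits: List[float]) -> List[float]:
    """Turn raw scores into probabilities that sum to 1."""
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def to_label_probs(logits: List[float]) -> Dict[str, float]:
    """Map each label name to its softmax probability."""
    return dict(zip(LABELS, softmax(logits)))


# Made-up logits for illustration only, not real model output
scores = to_label_probs([0.2, -1.0, 1.5])
top = max(scores, key=scores.get)  # label with the highest probability
print(top, scores)
```

A high probability only means the text resembles patterns seen in training; it is not evidence of authorship.
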
## Example Usage
```python
from transformers import AutoTokenizer, AutoModelForSequenceClassification
import torch

# Label ids as listed under "Model Description"
LABELS = {0: "Human", 1: "Mixed", 2: "AI"}

tokenizer = AutoTokenizer.from_pretrained('Donnyed/LLM_Detector_Preview_model')
model = AutoModelForSequenceClassification.from_pretrained('Donnyed/LLM_Detector_Preview_model')

text = "Paste your text here."
# Inputs longer than 512 tokens are truncated
inputs = tokenizer(text, return_tensors="pt", truncation=True, max_length=512)

# Inference only, so skip gradient tracking
with torch.no_grad():
    outputs = model(**inputs)

# Convert logits to per-class probabilities and pick the most likely class
probs = torch.softmax(outputs.logits, dim=1)
pred = torch.argmax(probs, dim=1).item()

print('Prediction:', pred, LABELS[pred])
print('Probabilities:', probs)
```

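The example above truncates input at 512 tokens, so anything past that point is ignored. One way to cover longer documents, sketched here as an assumption rather than the author's method, is to split the token ids into overlapping windows, classify each window, and average the resulting probabilities. `token_windows` and `mean_probs` are hypothetical helpers. The sketch assumes token ids produced without special tokens (for example by calling the tokenizer with `add_special_tokens=False`), so `size` would need to leave room for any special tokens added back before inference.

```python
from typing import List


def token_windows(token_ids: List[int], size: int = 512, stride: int = 256) -> List[List[int]]:
    """Split a token id list into overlapping windows of at most `size` ids."""
    if size <= 0 or stride <= 0:
        raise ValueError("size and stride must be positive")
    if not token_ids:
        return []
    windows = []
    start = 0
    while True:
        windows.append(token_ids[start:start + size])
        if start + size >= len(token_ids):
            break  # this window already reaches the end of the text
        start += stride
    return windows


def mean_probs(per_window: List[List[float]]) -> List[float]:
    """Average class probabilities across windows (one simple aggregation choice)."""
    if not per_window:
        raise ValueError("need at least one window")
    n = len(per_window)
    return [sum(col) / n for col in zip(*per_window)]
```

Averaging is only one aggregation choice. Taking the maximum AI probability across windows would flag a document if any part looks generated, at the cost of more false positives.
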
## Files Included
- `model.safetensors`: Model weights
- `config.json`: Model configuration
- `tokenizer.json`, `tokenizer_config.json`, `special_tokens_map.json`: Tokenizer files