---
language:
- en
pipeline_tag: text-classification
library_name: peft
base_model: microsoft/deberta-v3-large
datasets:
- stealthcode/ai-detection
tags:
- lora
- ai-detection
- binary-classification
- deberta-v3-large
metrics:
- accuracy
- f1
- auroc
- average_precision
model-index:
- name: AI Detector LoRA (DeBERTa-v3-large)
  results:
  - task:
      type: text-classification
      name: AI Text Detection
    dataset:
      name: stealthcode/ai-detection
      type: stealthcode/ai-detection
    metrics:
    - type: auroc
      value: 0.9985
    - type: f1
      value: 0.9812
    - type: accuracy
      value: 0.9814
---

# AI Detector LoRA (DeBERTa-v3-large)

LoRA adapter for binary AI-text vs. human-text detection, trained on ~2.7M English samples
(`label: 1 = AI, 0 = Human`) with `microsoft/deberta-v3-large` as the base model.

- **Base model:** `microsoft/deberta-v3-large`
- **Task:** Binary classification (AI vs. human)
- **Head:** Single logit + `BCEWithLogitsLoss` (see the sketch below)
- **Adapter type:** LoRA (`peft`)
- **Hardware:** 8 x RTX 5090, bf16, multi-GPU
- **Final decision threshold:** **0.8697** (max-F1 on the calibration set)
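
A minimal sketch of what the single-logit head implies, for orientation only: training optimizes `BCEWithLogitsLoss` against 0/1 labels, and at inference the logit is passed through a sigmoid and compared to the deployment threshold. Tensor values and shapes below are illustrative, not taken from the training code.

```python
import torch
import torch.nn as nn

# The model emits one logit per text; training optimizes BCE-with-logits
# against labels in {0 = Human, 1 = AI}.
logits = torch.tensor([2.3, -1.1])   # example raw outputs, shape (batch,)
labels = torch.tensor([1.0, 0.0])    # ground-truth labels as floats
loss = nn.BCEWithLogitsLoss()(logits, labels)

# At inference time the logit is squashed to a probability and compared
# to the tuned threshold (0.8697) instead of the default 0.5.
probs = torch.sigmoid(logits)
preds = (probs >= 0.8697).long()     # 1 = AI, 0 = Human
```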

---

## Files in this repo

- `adapter/` – LoRA weights saved with `peft_model.save_pretrained(...)`
- `merged_model/` – fully merged model (base + LoRA) for standalone use
- `threshold.json` – chosen deployment threshold and validation F1
- `calibration.json` – temperature-scaling parameters and calibration metrics
- `results.json` – hyperparameters, validation threshold search, test metrics
- `training_log_history.csv` – raw Trainer log history
- `predictions_calib.csv` – calibration-set probabilities and labels
- `predictions_test.csv` – test-set probabilities and labels
- `figures/` – training and evaluation plots
- `README.md` – this file

---

## Metrics (test set, n=279,241)

Using threshold **0.8697**:

| Metric                 | Value  |
| ---------------------- | ------ |
| AUROC                  | 0.9985 |
| Average Precision (AP) | 0.9985 |
| F1                     | 0.9812 |
| Accuracy               | 0.9814 |
| Precision (AI)         | 0.9902 |
| Recall (AI)            | 0.9724 |
| Precision (Human)      | 0.9728 |
| Recall (Human)         | 0.9904 |

Confusion matrix (test):

- **True negatives (Human correctly classified):** 138,276
- **False positives (Human → AI):** 1,345
- **False negatives (AI → Human):** 3,859
- **True positives (AI correctly classified):** 135,761
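
These figures can be recomputed from `predictions_test.csv`. The sketch below assumes column names `prob` and `label` and that the stored probabilities are the raw sigmoid outputs; inspect the file header and adjust accordingly.

```python
import pandas as pd
from sklearn.metrics import (
    roc_auc_score, average_precision_score, f1_score,
    accuracy_score, confusion_matrix,
)

# Column names are assumptions; check the CSV header and adjust if needed.
df = pd.read_csv("predictions_test.csv")
probs, labels = df["prob"], df["label"]
preds = (probs >= 0.8697).astype(int)

print("AUROC:", roc_auc_score(labels, probs))
print("AP:   ", average_precision_score(labels, probs))
print("F1:   ", f1_score(labels, preds))
print("Acc:  ", accuracy_score(labels, preds))
print(confusion_matrix(labels, preds))  # rows: true Human/AI, cols: predicted
```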

### Calibration

- **Method:** temperature scaling
- **Temperature (T):** 1.4437
- **Calibration set:** held-out calibration split (`predictions_calib.csv`)
- **Test ECE:** 0.0075 → 0.0116 (after calibration)
- **Test Brier score:** 0.0157 → 0.0156 (after calibration)
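
For reference, a single temperature is typically fit by minimizing BCE/NLL over held-out logits. The sketch below is a generic illustration of that procedure, not the exact script used here; data loading and variable names are assumptions.

```python
import torch

def fit_temperature(logits, labels, lr=0.01, steps=200):
    """Fit a scalar temperature T by minimizing BCE on held-out logits."""
    log_t = torch.zeros(1, requires_grad=True)  # optimize log(T) so T stays positive
    opt = torch.optim.Adam([log_t], lr=lr)
    for _ in range(steps):
        opt.zero_grad()
        loss = torch.nn.functional.binary_cross_entropy_with_logits(
            logits / log_t.exp(), labels.float()
        )
        loss.backward()
        opt.step()
    return log_t.exp().item()

# logits/labels would come from the calibration split (predictions_calib.csv).
# T = fit_temperature(calib_logits, calib_labels)  # reported value: 1.4437
```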

---

## Plots

### Training & validation

- Learning curves:



- Eval metrics over time:



### Validation set

- ROC:



- Precision–Recall:



- Calibration curve:



- F1 vs threshold:



### Test set

- ROC:



- Precision–Recall:



- Calibration curve:



- Confusion matrix:


|
| | --- |
| |
|
| | ## Usage |
| |
|
| | ### Load base + LoRA adapter |
| |
|
| | ```python |
| | from transformers import AutoTokenizer, AutoModelForSequenceClassification |
| | from peft import PeftModel |
| | import torch |
| | import json |
| | |
| | base_model_id = "microsoft/deberta-v3-large" |
| | adapter_id = "stealthcode/ai-detection" # or local: "./adapter" |
| | |
| | tokenizer = AutoTokenizer.from_pretrained(base_model_id) |
| | |
| | base_model = AutoModelForSequenceClassification.from_pretrained( |
| | base_model_id, |
| | num_labels=1, # single logit for BCEWithLogitsLoss |
| | ) |
| | model = PeftModel.from_pretrained(base_model, adapter_id) |
| | model.eval() |
| | ``` |

### Inference with threshold

```python
# Load the tuned decision threshold (max-F1 on the calibration set).
with open("threshold.json") as f:
    thr = json.load(f)["threshold"]  # 0.8697

def predict_proba(texts):
    enc = tokenizer(
        texts,
        padding=True,
        truncation=True,
        max_length=512,
        return_tensors="pt",
    )
    with torch.no_grad():
        logits = model(**enc).logits.squeeze(-1)
    probs = torch.sigmoid(logits)
    return probs.cpu().numpy()

def predict_label(texts, threshold=thr):
    probs = predict_proba(texts)
    return (probs >= threshold).astype(int)

# Example
texts = ["Some example text to classify"]
probs = predict_proba(texts)
labels = predict_label(texts)
print(probs, labels)  # label 1 = AI, 0 = Human
```
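
For larger workloads you may want GPU placement and fixed-size batching. A minimal sketch building on the objects defined above; the device handling and batch size are illustrative choices, not part of the released code.

```python
import numpy as np

device = "cuda" if torch.cuda.is_available() else "cpu"
model.to(device)

def predict_proba_batched(texts, batch_size=64):
    out = []
    for i in range(0, len(texts), batch_size):
        enc = tokenizer(
            texts[i : i + batch_size],
            padding=True, truncation=True, max_length=512, return_tensors="pt",
        ).to(device)
        with torch.no_grad():
            logits = model(**enc).logits.squeeze(-1)
        out.append(torch.sigmoid(logits).cpu().numpy())
    return np.concatenate(out)
```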

### Load merged model (no PEFT required)

```python
from transformers import AutoTokenizer, AutoModelForSequenceClassification
import torch, json

model_dir = "./merged_model"
tokenizer = AutoTokenizer.from_pretrained(model_dir)
model = AutoModelForSequenceClassification.from_pretrained(model_dir)
model.eval()

with open("threshold.json") as f:
    thr = json.load(f)["threshold"]  # 0.8697

def predict_proba(texts):
    enc = tokenizer(texts, padding=True, truncation=True, max_length=512, return_tensors="pt")
    with torch.no_grad():
        logits = model(**enc).logits.squeeze(-1)
    probs = torch.sigmoid(logits)
    return probs.cpu().numpy()
```

### Optional: apply temperature scaling to logits

```python
import json

with open("calibration.json") as f:
    T = json.load(f)["temperature"]  # e.g., 1.4437

def predict_proba_calibrated(texts):
    enc = tokenizer(texts, padding=True, truncation=True, max_length=512, return_tensors="pt")
    with torch.no_grad():
        logits = model(**enc).logits.squeeze(-1)
    probs = torch.sigmoid(logits / T)
    return probs.cpu().numpy()
```

---

## Notes

- The classifier head is **trainable** together with the LoRA layers (it is unfrozen after applying PEFT).
- **LoRA config** (see the sketch below):
  - `r=32`, `alpha=128`, `dropout=0.0`
  - Target modules: `query_proj`, `key_proj`, `value_proj`
- **Training config:**
  - `bf16=True`
  - `optim="adamw_torch_fused"`
  - `lr_scheduler_type="cosine_with_restarts"`
  - `num_train_epochs=2`
  - `per_device_train_batch_size=8`, `gradient_accumulation_steps=4`
  - `max_grad_norm=0.5`
- Threshold `0.8697` was chosen as the **max-F1** point on the calibration set.
  You can adjust it if you prefer fewer false positives or fewer false negatives.
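
A minimal `peft` setup matching the listed hyperparameters, reconstructed from this card rather than copied from the training script; `task_type` and the head-unfreezing loop are assumptions about how the description above would translate to code.

```python
from peft import LoraConfig, get_peft_model
from transformers import AutoModelForSequenceClassification

base = AutoModelForSequenceClassification.from_pretrained(
    "microsoft/deberta-v3-large", num_labels=1
)

lora_cfg = LoraConfig(
    r=32,
    lora_alpha=128,
    lora_dropout=0.0,
    target_modules=["query_proj", "key_proj", "value_proj"],
    task_type="SEQ_CLS",  # assumption: sequence-classification task type
)
peft_model = get_peft_model(base, lora_cfg)

# Per the note above, the classifier head is unfrozen so it trains
# alongside the LoRA layers (illustrative; exact parameter names may differ).
for name, param in peft_model.named_parameters():
    if "classifier" in name or "pooler" in name:
        param.requires_grad = True
```

Note that with a sequence-classification task type, `peft` typically also marks the classification head for saving, so the adapter directory contains the trained head as well.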