TinyBERT for URL Phishing Detection
This model is fine-tuned from huawei-noah/TinyBERT_General_4L_312D to detect phishing URLs.
Model description
The model is a fine-tuned version of TinyBERT (the 4-layer, 312-dimensional general checkpoint) trained as a binary classifier that labels a URL as either legitimate or phishing.
Intended uses & limitations
This model is intended for detecting phishing URLs. It takes a single URL string as input and returns a predicted class: phishing (label 1) or legitimate (any other label). The softmax score of the predicted class can be reported as a confidence value.
Training data
The model was trained on a combination of:
- Legitimate URLs from the Majestic Million dataset
- Phishing URLs from phishing-links-ACTIVE.txt and phishing-links-INACTIVE.txt
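A minimal sketch of how the two sources above could be merged into one labeled set. The function name `build_labeled_urls`, the deduplication rule, and the choice to let the phishing label win on conflicts are assumptions for illustration, not the author's documented pipeline. The label convention (1 = phishing, 0 = legitimate) follows the Usage section.

```python
from typing import Iterable, List, Tuple

PHISHING, LEGITIMATE = 1, 0  # label convention taken from the Usage section


def build_labeled_urls(
    legitimate: Iterable[str], phishing: Iterable[str]
) -> List[Tuple[str, int]]:
    """Merge both sources into (url, label) pairs, skipping blanks and duplicates."""
    seen = set()
    examples = []
    # Phishing entries go first so a URL present in both sources
    # keeps the phishing label (an assumed tie-breaking rule).
    for label, source in ((PHISHING, phishing), (LEGITIMATE, legitimate)):
        for raw in source:
            url = raw.strip()
            if not url or url in seen:
                continue
            seen.add(url)
            examples.append((url, label))
    return examples


if __name__ == "__main__":
    # Toy inputs; in practice these would be read from the files listed above.
    demo = build_labeled_urls(
        ["example.com", "example.org"],
        ["http://phish.example/login", "example.org"],
    )
    print(demo)
```

Passing open file handles instead of lists also works, since the function only iterates over lines and strips whitespace.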
Training procedure
The model was fine-tuned using the Hugging Face Transformers library with the following parameters:
- Learning rate: 5e-5
- Batch size: 16
- Number of epochs: 3
- Weight decay: 0.01
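The parameters above could be expressed with the Transformers `TrainingArguments` class roughly as follows. This is a sketch, not the author's training script: the output directory name is hypothetical, and mapping "Batch size: 16" to `per_device_train_batch_size` is an assumption, since the card does not say whether 16 is per device or total.

```python
# Hyperparameters as listed in the model card.
HYPERPARAMS = {
    "learning_rate": 5e-5,
    "per_device_train_batch_size": 16,  # assumption: 16 is the per-device size
    "num_train_epochs": 3,
    "weight_decay": 0.01,
}


def make_training_args(output_dir: str = "tinybert-url-phishing"):
    """Build TrainingArguments; output_dir is a hypothetical path."""
    # Imported lazily so the dict above can be inspected
    # without transformers installed.
    from transformers import TrainingArguments

    return TrainingArguments(output_dir=output_dir, **HYPERPARAMS)
```

The resulting object would then be passed to a `Trainer` together with the model and the tokenized datasets.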
Evaluation results
The model was evaluated on a test set consisting of both legitimate and phishing URLs.
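The card does not report which metrics were computed. As a hedged illustration, the standard binary-classification metrics could be calculated on such a test set as below, treating phishing (label 1) as the positive class. The helper name `binary_metrics` is hypothetical, and the labels in the demo are made-up toy values, not results from this model.

```python
from typing import Dict, Sequence


def binary_metrics(y_true: Sequence[int], y_pred: Sequence[int]) -> Dict[str, float]:
    """Accuracy, precision, recall and F1 with phishing (label 1) as positive."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # phishing caught
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # legitimate flagged
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # phishing missed
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)  # legitimate passed
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    accuracy = (tp + tn) / len(pairs) if pairs else 0.0
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}


if __name__ == "__main__":
    # Toy labels for demonstration only.
    print(binary_metrics([1, 1, 0, 0], [1, 0, 0, 1]))
```

For phishing detection, recall on the phishing class is often the figure to watch, because a missed phishing URL usually costs more than a false alarm.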
Usage
from transformers import AutoTokenizer, AutoModelForSequenceClassification
import torch
# Load model and tokenizer
tokenizer = AutoTokenizer.from_pretrained("songhieng/TinyBERT-URL-Detection-1.0")
model = AutoModelForSequenceClassification.from_pretrained("songhieng/TinyBERT-URL-Detection-1.0")
# Prepare URL for classification
url = "https://example.com"
inputs = tokenizer(url, return_tensors="pt", truncation=True, padding=True, max_length=128)
# Make prediction
with torch.no_grad():
    outputs = model(**inputs)
    predictions = torch.softmax(outputs.logits, dim=1)
    label = torch.argmax(predictions, dim=1).item()
# Output result
result = "phishing" if label == 1 else "legitimate"
confidence = predictions[0][label].item()
print(f"URL: {url}")
print(f"Prediction: {result}")
print(f"Confidence: {confidence:.4f}")