---
language: en
license: mit
tags:
- url-phishing-detection
- tinybert
- sequence-classification
datasets:
- custom
metrics:
- accuracy
- f1
---

# TinyBERT for URL Phishing Detection

This model is fine-tuned from `huawei-noah/TinyBERT_General_4L_312D` to detect phishing URLs.

## Model description

The model is a fine-tuned TinyBERT sequence classifier (the 4-layer, 312-dimensional general checkpoint) trained to classify URLs as either legitimate or phishing.

## Intended uses & limitations

This model is intended for detecting phishing URLs. It takes a single URL string as input and predicts whether that URL is legitimate or phishing.

## Training data

The model was trained on a combination of:

- Legitimate URLs from the Majestic Million dataset
- Phishing URLs from `phishing-links-ACTIVE.txt` and `phishing-links-INACTIVE.txt`

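As a rough illustration of how the two sources could be merged into one labeled set, the sketch below assigns label 0 to legitimate URLs and label 1 to phishing URLs, matching the label convention in the Usage example. The `build_labeled_urls` helper, the deduplication rule, and the toy inputs are assumptions for illustration, not the author's actual preprocessing.

```python
from typing import Iterable, List, Tuple

LEGITIMATE, PHISHING = 0, 1  # label 1 = phishing, as in the Usage example


def build_labeled_urls(
    legitimate: Iterable[str], phishing: Iterable[str]
) -> List[Tuple[str, int]]:
    """Merge two URL sources into (url, label) pairs.

    Blank lines are skipped. If a URL appears more than once, only its
    first occurrence is kept (an assumed rule, chosen for simplicity).
    """
    seen = set()
    rows: List[Tuple[str, int]] = []
    for label, source in ((LEGITIMATE, legitimate), (PHISHING, phishing)):
        for line in source:
            url = line.strip()
            if not url or url in seen:
                continue
            seen.add(url)
            rows.append((url, label))
    return rows


# Toy inputs standing in for lines read from the real files (hypothetical values)
legit_lines = ["example.com\n", "example.org\n", "\n"]
phish_lines = ["http://login.example.test/verify\n", "example.org\n"]
rows = build_labeled_urls(legit_lines, phish_lines)
print(rows)
```

In practice the phishing lines would be read from `phishing-links-ACTIVE.txt` and `phishing-links-INACTIVE.txt`, and the legitimate ones from a Majestic Million export. How that export is parsed depends on its file layout, which this card does not describe.
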
## Training procedure

The model was fine-tuned using the Hugging Face Transformers library with the following parameters:

- Learning rate: 5e-5
- Batch size: 16
- Number of epochs: 3
- Weight decay: 0.01

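The hyperparameters listed above could be passed to the Transformers `TrainingArguments` class roughly as follows. This is a minimal sketch, not the author's training script. It assumes that "batch size" means the per-device training batch size, and both the `build_training_args` helper and the output directory name are hypothetical.

```python
# Hyperparameters from this card, keyed by TrainingArguments field names.
# Mapping "Batch size: 16" to per_device_train_batch_size is an assumption.
HYPERPARAMS = {
    "learning_rate": 5e-5,
    "per_device_train_batch_size": 16,
    "num_train_epochs": 3,
    "weight_decay": 0.01,
}


def build_training_args(output_dir: str):
    """Create TrainingArguments from HYPERPARAMS (hypothetical helper)."""
    # Imported lazily so the config above can be inspected without transformers.
    from transformers import TrainingArguments

    return TrainingArguments(output_dir=output_dir, **HYPERPARAMS)


print(HYPERPARAMS)
```

Calling `build_training_args("tinybert-url-out")` would yield an object that can be handed to `Trainer` together with the model and a tokenized dataset.
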
## Evaluation results

The model was evaluated on a test set consisting of both legitimate and phishing URLs.

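The card metadata names accuracy and F1 as the metrics. The sketch below shows one way to compute both from true and predicted labels. It assumes binary F1 with phishing (label 1) as the positive class, since the card does not state how F1 was averaged. The labels in the example are made up to exercise the function and are not results for this model.

```python
from typing import Sequence, Tuple


def accuracy_and_f1(
    y_true: Sequence[int], y_pred: Sequence[int], positive: int = 1
) -> Tuple[float, float]:
    """Return (accuracy, F1 of the positive class) for equal-length label sequences."""
    if len(y_true) != len(y_pred) or len(y_true) == 0:
        raise ValueError("y_true and y_pred must be non-empty and the same length")
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    accuracy = sum(t == p for t, p in pairs) / len(pairs)
    denom = 2 * tp + fp + fn
    f1 = 2 * tp / denom if denom else 0.0  # define F1 as 0 when there are no positives
    return accuracy, f1


# Made-up labels purely for illustration (1 = phishing, 0 = legitimate)
y_true = [1, 0, 1, 1, 0, 0]
y_pred = [1, 0, 0, 1, 0, 1]
acc, f1 = accuracy_and_f1(y_true, y_pred)
print(f"accuracy={acc:.3f} f1={f1:.3f}")
```
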
## Usage

```python
from transformers import AutoTokenizer, AutoModelForSequenceClassification
import torch

# Load model and tokenizer
tokenizer = AutoTokenizer.from_pretrained("songhieng/TinyBERT-URL-Detection-1.0")
model = AutoModelForSequenceClassification.from_pretrained("songhieng/TinyBERT-URL-Detection-1.0")

# Prepare URL for classification
url = "https://example.com"
inputs = tokenizer(url, return_tensors="pt", truncation=True, padding=True, max_length=128)

# Make prediction (no gradients needed at inference time)
with torch.no_grad():
    outputs = model(**inputs)
    predictions = torch.softmax(outputs.logits, dim=1)  # class probabilities
    label = torch.argmax(predictions, dim=1).item()     # index of the most likely class

# Output result: label 1 is phishing, anything else is treated as legitimate
result = "phishing" if label == 1 else "legitimate"
confidence = predictions[0][label].item()
print(f"URL: {url}")
print(f"Prediction: {result}")
print(f"Confidence: {confidence:.4f}")
```