---
language: en
license: mit
tags:
- url-phishing-detection
- tinybert
- sequence-classification
datasets:
- custom
metrics:
- accuracy
- f1
---
# TinyBERT for URL Phishing Detection
This model is fine-tuned from `huawei-noah/TinyBERT_General_4L_312D` to detect phishing URLs.
## Model description
The base model is TinyBERT (4 layers, hidden size 312). It is fine-tuned with a sequence-classification head to label a URL as either legitimate or phishing.
## Intended uses & limitations
This model is intended for detecting phishing URLs. It takes a URL string as input and predicts whether the URL is legitimate or phishing. The prediction is based on the URL text alone; the model does not fetch or inspect the page behind the URL.
## Training data
The model was trained on a combination of:
- Legitimate URLs from the Majestic Million dataset
- Phishing URLs from phishing-links-ACTIVE.txt and phishing-links-INACTIVE.txt
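
A minimal sketch of how the two sources might be merged into one labeled dataset. The helper name `build_dataset`, the deduplication step, and the shuffling are assumptions for illustration, not the preprocessing actually used for this model. The label convention (1 = phishing, 0 = legitimate) follows the usage example in this card.

```python
import random

def build_dataset(legit_urls, phishing_urls, seed=42):
    """Merge two URL lists into shuffled (url, label) pairs.

    Hypothetical helper: label 0 = legitimate, label 1 = phishing.
    """
    seen = set()
    examples = []
    # Phishing entries go first, so a URL present in both lists keeps label 1.
    for label, urls in ((1, phishing_urls), (0, legit_urls)):
        for url in urls:
            url = url.strip()
            if url and url not in seen:
                seen.add(url)
                examples.append((url, label))
    # A fixed seed keeps the shuffle reproducible.
    random.Random(seed).shuffle(examples)
    return examples

# Placeholder inputs; in practice these would be read from the source files.
examples = build_dataset(
    ["example.com", "example.org"],
    ["http://phish.example/login", "example.org"],
)
```

Resolving duplicates in favor of the phishing label is one possible choice; dropping conflicting URLs entirely would be another.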
## Training procedure
The model was fine-tuned using the Hugging Face Transformers library with the following parameters:
- Learning rate: 5e-5
- Batch size: 16
- Number of epochs: 3
- Weight decay: 0.01
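
The listed values map onto `TrainingArguments` roughly as follows. This is a sketch that assumes the `Trainer` API was used and that the batch size is per device; the card confirms neither, and `output_dir` is a placeholder path.

```python
import math

# Hyperparameters as listed in this card, keyed by TrainingArguments field names.
HYPERPARAMS = {
    "learning_rate": 5e-5,
    "per_device_train_batch_size": 16,  # assumption: the listed batch size is per device
    "num_train_epochs": 3,
    "weight_decay": 0.01,
}

def make_training_args(output_dir="./tinybert-url-out"):
    """Build TrainingArguments from the values above (output_dir is a placeholder)."""
    # Imported lazily so the rest of this sketch runs without transformers installed.
    from transformers import TrainingArguments
    return TrainingArguments(output_dir=output_dir, **HYPERPARAMS)

def total_optimizer_steps(num_examples):
    """Optimizer steps on a single device with no gradient accumulation."""
    batch_size = HYPERPARAMS["per_device_train_batch_size"]
    steps_per_epoch = math.ceil(num_examples / batch_size)
    return steps_per_epoch * HYPERPARAMS["num_train_epochs"]
```

For instance, a hypothetical training set of 1,000 URLs would give 63 steps per epoch and 189 steps over 3 epochs under these assumptions.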
## Evaluation results
The model was evaluated on a test set consisting of both legitimate and phishing URLs.
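
The card's metadata lists accuracy and F1 as metrics. A minimal sketch of how both could be computed from true and predicted labels, treating phishing (label 1) as the positive class; the function name and the toy labels are illustrative only and are not results from this model.

```python
def accuracy_and_f1(y_true, y_pred, positive=1):
    """Return (accuracy, F1), with `positive` as the positive class."""
    if not y_true or len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must be non-empty and equal in length")
    pairs = list(zip(y_true, y_pred))
    correct = sum(t == p for t, p in pairs)
    tp = sum(t == positive and p == positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return correct / len(pairs), f1

# Toy labels for illustration only (not model output).
y_true = [1, 0, 1, 1, 0, 0]
y_pred = [1, 0, 0, 1, 0, 1]
accuracy, f1 = accuracy_and_f1(y_true, y_pred)
```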
## Usage
```python
from transformers import AutoTokenizer, AutoModelForSequenceClassification
import torch
# Load model and tokenizer
tokenizer = AutoTokenizer.from_pretrained("songhieng/TinyBERT-URL-Detection-1.0")
model = AutoModelForSequenceClassification.from_pretrained("songhieng/TinyBERT-URL-Detection-1.0")
# Prepare URL for classification
url = "https://example.com"
inputs = tokenizer(url, return_tensors="pt", truncation=True, padding=True, max_length=128)
# Make prediction
with torch.no_grad():
    outputs = model(**inputs)
    predictions = torch.softmax(outputs.logits, dim=1)
    label = torch.argmax(predictions, dim=1).item()
# Output result
result = "phishing" if label == 1 else "legitimate"
confidence = predictions[0][label].item()
print(f"URL: {url}")
print(f"Prediction: {result}")
print(f"Confidence: {confidence:.4f}")
```
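
The post-processing step in the example above (softmax over the logits, then argmax) can be sketched without PyTorch to show how a pair of logits becomes a label and a confidence. The helper name `interpret_logits` and the sample logits are hypothetical; the label mapping is the one used in the example.

```python
import math

LABELS = {0: "legitimate", 1: "phishing"}  # mapping taken from the usage example

def interpret_logits(logits):
    """Apply softmax to the logits and return (label name, confidence)."""
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    probs = [e / total for e in exps]
    # Index of the most probable class; ties resolve to the lower index.
    label = max(range(len(probs)), key=probs.__getitem__)
    return LABELS[label], probs[label]

# Made-up logits for illustration, not real model output.
result, confidence = interpret_logits([-1.0, 2.0])
```

Because the confidence is just the larger softmax probability, it is always at least 0.5 for a two-class model.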