	---
	language: en
	license: mit
	tags:
	- url-phishing-detection
	- tinybert
	- sequence-classification
	datasets:
	- custom
	metrics:
	- accuracy
	- f1
	---

	# TinyBERT for URL Phishing Detection

	This model is fine-tuned from `huawei-noah/TinyBERT_General_4L_312D` to detect phishing URLs.

	## Model description

	The model is a fine-tuned version of TinyBERT (the general 4-layer, 312-dimensional checkpoint), trained as a binary sequence classifier that labels each URL as either legitimate or phishing.

	## Intended uses & limitations

	This model is intended for detecting phishing URLs. It takes a single URL string as input and predicts whether the URL is legitimate or phishing. The usage example truncates inputs to 128 tokens, so very long URLs are only partially seen by the model.

	## Training data

	The model was trained on a combination of:
	- Legitimate URLs from the Majestic Million dataset
	- Phishing URLs from phishing-links-ACTIVE.txt and phishing-links-INACTIVE.txt
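As a rough illustration of how such a dataset could be assembled, the sketch below merges legitimate and phishing entries into labeled pairs. The card does not describe the actual preprocessing, so everything here is an assumption: the helper name `build_labeled_urls` is hypothetical, the phishing files are assumed to hold one URL per line, and Majestic Million entries are assumed to be bare domains that get an `https://` prefix. The label convention (1 = phishing, 0 = legitimate) matches the usage example in this card.

```python
import random


def build_labeled_urls(legit_domains, phishing_urls, seed=42):
    """Return a shuffled list of (url, label) pairs; 1 = phishing, 0 = legitimate."""
    rows = []
    seen = set()

    for domain in legit_domains:
        domain = domain.strip()
        if not domain:
            continue
        # Assumption: legitimate entries are bare domains, so add a scheme.
        url = "https://" + domain
        if url not in seen:
            seen.add(url)
            rows.append((url, 0))

    for url in phishing_urls:
        url = url.strip()
        # Skip blank lines and exact duplicates.
        if url and url not in seen:
            seen.add(url)
            rows.append((url, 1))

    # Shuffle deterministically so the classes are interleaved.
    random.Random(seed).shuffle(rows)
    return rows


# Tiny in-memory stand-ins for the real files (hypothetical sample values).
legit = ["example.com", "example.org", ""]
phish = ["http://login-example.test/verify", "http://login-example.test/verify"]
dataset = build_labeled_urls(legit, phish)
```

In practice the two input lists would be read from the Majestic Million export and the two phishing text files; deduplicating before the train/test split helps avoid the same URL appearing on both sides.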

	## Training procedure

	The model was fine-tuned using the Hugging Face Transformers library with the following parameters:
	- Learning rate: 5e-5
	- Batch size: 16
	- Number of epochs: 3
	- Weight decay: 0.01
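The parameters above could be expressed with the Transformers `TrainingArguments` class roughly as follows. This is a sketch, not the author's training script: the mapping of "Batch size: 16" to `per_device_train_batch_size` is an assumption (the card does not say whether 16 is per device or global), and the function name `build_training_args` and the output directory are hypothetical.

```python
# Hyperparameters taken from the list above.
HYPERPARAMS = {
    "learning_rate": 5e-5,
    "per_device_train_batch_size": 16,  # assumption: 16 is the per-device batch size
    "num_train_epochs": 3,
    "weight_decay": 0.01,
}


def build_training_args(output_dir="tinybert-url-out"):
    """Create TrainingArguments from the hyperparameters above.

    The import is local so the dictionary can be inspected without
    having transformers installed. `output_dir` is a hypothetical path.
    """
    from transformers import TrainingArguments

    return TrainingArguments(output_dir=output_dir, **HYPERPARAMS)
```

The resulting object would then be passed to a `Trainer` together with the model, tokenizer, and tokenized datasets.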

	## Evaluation results

	The model was evaluated on a test set of both legitimate and phishing URLs, with accuracy and F1 as the metrics.
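The card's metadata lists accuracy and F1 as metrics. A minimal sketch of how those two numbers could be computed from true and predicted labels is shown below. The function name `accuracy_and_f1` is hypothetical, phishing (label 1) is assumed to be the positive class, and the sample labels are made up for illustration; they are not results for this model.

```python
def accuracy_and_f1(y_true, y_pred, positive=1):
    """Compute accuracy and binary F1, treating `positive` as the phishing class."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("y_true and y_pred must be non-empty and the same length")

    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    correct = sum(1 for t, p in pairs if t == p)

    accuracy = correct / len(pairs)
    # F1 = 2TP / (2TP + FP + FN); defined as 0.0 when there are no positives at all.
    denom = 2 * tp + fp + fn
    f1 = (2 * tp / denom) if denom else 0.0
    return accuracy, f1


# Illustrative labels only (not model outputs).
acc, f1 = accuracy_and_f1([1, 0, 1, 1, 0], [1, 0, 0, 1, 1])
```

Because phishing datasets are often imbalanced, F1 on the phishing class tends to be more informative than accuracy alone.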

	## Usage

	```python
	from transformers import AutoTokenizer, AutoModelForSequenceClassification
	import torch

	# Load model and tokenizer
	tokenizer = AutoTokenizer.from_pretrained("songhieng/TinyBERT-URL-Detection-1.0")
	model = AutoModelForSequenceClassification.from_pretrained("songhieng/TinyBERT-URL-Detection-1.0")

	# Prepare URL for classification
	url = "https://example.com"
	inputs = tokenizer(url, return_tensors="pt", truncation=True, padding=True, max_length=128)

	# Make prediction
	with torch.no_grad():
	    outputs = model(**inputs)
	    predictions = torch.softmax(outputs.logits, dim=1)
	label = torch.argmax(predictions, dim=1).item()

	# Output result
	result = "phishing" if label == 1 else "legitimate"
	confidence = predictions[0][label].item()
	print(f"URL: {url}")
	print(f"Prediction: {result}")
	print(f"Confidence: {confidence:.4f}")
	```