---
language: en
license: mit
tags:
- url-phishing-detection
- tinybert
- sequence-classification
datasets:
- custom
metrics:
- accuracy
- f1
---

# TinyBERT for URL Phishing Detection

This model is fine-tuned from `huawei-noah/TinyBERT_General_4L_312D` to detect phishing URLs.

## Model description

This model is a fine-tuned version of TinyBERT, trained specifically to classify URLs as either legitimate or phishing.

## Intended uses & limitations

This model is intended for detecting phishing URLs. It takes a URL string as input and predicts whether the URL is legitimate or phishing.

## Training data

The model was trained on a combination of:

- Legitimate URLs from the Majestic Million dataset
- Phishing URLs from `phishing-links-ACTIVE.txt` and `phishing-links-INACTIVE.txt`

## Training procedure

The model was fine-tuned with the Hugging Face Transformers library using the following hyperparameters:

- Learning rate: 5e-5
- Batch size: 16
- Number of epochs: 3
- Weight decay: 0.01

## Evaluation results

The model was evaluated on a test set containing both legitimate and phishing URLs.

## Usage

```python
from transformers import AutoTokenizer, AutoModelForSequenceClassification
import torch

# Load model and tokenizer from the Hugging Face Hub
tokenizer = AutoTokenizer.from_pretrained("songhieng/TinyBERT-URL-Detection-1.0")
model = AutoModelForSequenceClassification.from_pretrained("songhieng/TinyBERT-URL-Detection-1.0")

# Tokenize the URL; inputs longer than 128 tokens are truncated
url = "https://example.com"
inputs = tokenizer(url, return_tensors="pt", truncation=True, padding=True, max_length=128)

# Run inference without tracking gradients and convert logits to probabilities
with torch.no_grad():
    outputs = model(**inputs)
    predictions = torch.softmax(outputs.logits, dim=1)
    label = torch.argmax(predictions, dim=1).item()

# Class index 1 is treated as phishing, index 0 as legitimate
result = "phishing" if label == 1 else "legitimate"
confidence = predictions[0][label].item()

print(f"URL: {url}")
print(f"Prediction: {result}")
print(f"Confidence: {confidence:.4f}")
```
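The snippet above scores one URL at a time. To score a list of URLs, the following minimal sketch batches them through the same tokenizer and model. It assumes, as the snippet above does, that class index 1 means phishing and index 0 means legitimate; the helper names `classify_urls`, `batched`, and `probs_to_predictions` and the default `batch_size=32` are hypothetical choices for illustration, not part of the model's published API.

```python
from typing import Any, Iterable, Iterator, List, Sequence, Tuple

# Assumption carried over from the snippet above: class index 0 is "legitimate"
# and class index 1 is "phishing". Check model.config.id2label before relying on it.
LABELS = ("legitimate", "phishing")


def batched(items: Sequence[str], batch_size: int) -> Iterator[Sequence[str]]:
    """Yield consecutive slices of at most batch_size items (hypothetical helper)."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    for start in range(0, len(items), batch_size):
        yield items[start:start + batch_size]


def probs_to_predictions(prob_rows: List[List[float]]) -> List[Tuple[str, float]]:
    """Turn per-URL class probabilities into (label, confidence) pairs via argmax."""
    predictions = []
    for probs in prob_rows:
        # On ties, max() keeps the first index, matching torch.argmax
        best = max(range(len(probs)), key=lambda i: probs[i])
        predictions.append((LABELS[best], probs[best]))
    return predictions


def classify_urls(
    urls: Iterable[str],
    tokenizer: Any,
    model: Any,
    batch_size: int = 32,
    max_length: int = 128,
) -> List[Tuple[str, float]]:
    """Classify many URLs with a tokenizer and model loaded as in the snippet above."""
    import torch  # imported here so the pure-Python helpers above do not require torch

    url_list = list(urls)
    results: List[Tuple[str, float]] = []
    for batch in batched(url_list, batch_size):
        # padding=True pads each batch only to its longest URL; truncation caps it at max_length
        inputs = tokenizer(
            list(batch),
            return_tensors="pt",
            truncation=True,
            padding=True,
            max_length=max_length,
        )
        with torch.no_grad():
            logits = model(**inputs).logits
        # Convert logits to probabilities, then to plain Python lists for post-processing
        probs = torch.softmax(logits, dim=-1).tolist()
        results.extend(probs_to_predictions(probs))
    return results
```

With `tokenizer` and `model` loaded as in the snippet above, `classify_urls(["https://example.com", "https://example.org"], tokenizer, model)` returns one `(label, confidence)` pair per URL, in input order. Processing fixed-size batches keeps memory use bounded for long URL lists, and padding per batch avoids padding every URL to the global maximum length.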