Model Card for bert-log-anomaly-detection
Model Summary
bert-log-anomaly-detection is a BERT-based NLP model fine-tuned for single SQL transaction log anomaly detection. The model classifies each database transaction log as either Normal or Anomaly, with the goal of supporting AI-powered fraud detection and cybersecurity monitoring systems. This model was developed as part of the Samsung x KBTG Digital Fraud Cybersecurity Hackathon (Thailand) under the AI-Powered Fraud Detection & Prevention track.
Model Description
This model analyzes individual SQL database transaction logs and detects abnormal patterns that may indicate fraudulent, malicious, or suspicious behavior.
Demo: Hackathon prototype
- Developed by: Aungruk Vanichanai, Napat Wanitwatthakorn, Thanakrit Sriphiphattana
- Shared by: Aungruk Vanichanai
- Model type: Transformer-based binary text classifier
- Language(s) (NLP): English (SQL logs in text format)
- License: Apache 2.0
- Finetuned from model: google-bert/bert-base-uncased
Model Sources
- GitHub Repository: https://github.com/AungMoonLord/AI-Cybersecurity-Hackathon/tree/main/New%20Finetune%20Hackathon
How to Get Started with the Model
Step 1 (Setup)
```python
import torch
from transformers import BertForSequenceClassification, BertTokenizer

MODEL_PATH = "AungMoonLord/bert-log-anomaly-detection"
model = BertForSequenceClassification.from_pretrained(MODEL_PATH)
tokenizer = BertTokenizer.from_pretrained(MODEL_PATH)
model.eval()
```
Step 2 (Clean and Label Logs) - Optional, but may slightly improve accuracy, recall, and F1-score
```python
# Perform log preprocessing
def add_prefix_token(text):  # log data must pass this code before training/inference
    # clean log
    text = text.replace("\t", " ")
    text = text.strip()
    # add token (slices instead of indexing so short/empty strings don't raise IndexError)
    if text[:1].isalpha() or text[3:4].isalpha():
        return "[SQL]\n" + text
    else:
        return "[LOG]\n" + text
```
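As a quick standalone check of the prefixing behavior: a raw SQL statement (which starts with a letter) gets the [SQL] tag, while a timestamped server log (which starts with digits) gets [LOG]. The function is repeated here, with index accesses written as slices so very short strings do not raise IndexError, so the snippet runs on its own:

```python
def add_prefix_token(text):  # same preprocessing step as above
    text = text.replace("\t", " ").strip()
    if text[:1].isalpha() or text[3:4].isalpha():
        return "[SQL]\n" + text
    return "[LOG]\n" + text

sql = add_prefix_token("SELECT * FROM users WHERE id = 1")
log = add_prefix_token("2025-01-06 14:23:45 | User: anonymous | Duration: 0.05s")
print(sql.splitlines()[0])  # [SQL]
print(log.splitlines()[0])  # [LOG]
```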
Step 3 (Create the Function for Log Classification)
```python
def predict_log(log_text):
    log_text = add_prefix_token(log_text)
    inputs = tokenizer(
        log_text,
        return_tensors="pt",
        truncation=True,
        padding=True,  # for cases when the inference contains more than 1 log, i.e., batch size > 1
        max_length=128
    )
    with torch.no_grad():
        logits = model(**inputs).logits
    pred = torch.argmax(logits, dim=1).item()
    prob = torch.softmax(logits, dim=-1).tolist()[0]
    return "Normal" if pred == 1 else "Anomaly", prob
```
Step 4 (Samples of Inferences)
```python
# Example 1
text1 = "SELECT * FROM users WHERE id = 1 OR 1=1"
print(predict_log(text1))

# Example 2
text2 = "2025-01-06 14:23:45 | User: anonymous | IP: 203.154.89.102 | Duration: 0.05s SELECT * FROM users WHERE username = 'admin' OR '1'='1' -- ' AND password = 'x'"
print(predict_log(text2))

# Example 3
text3 = "3051-06-22T07:20:02.296945Z 3 Query select e3mJKDCCY from 7Q8SpG8LLEWhrfpe4s5 where ph4d = 'a1S9hQa92uC1EAyJf2Y';"
print(predict_log(text3))
```
Application in Hackathon Project
- Waris Sripatoomrak integrated this model into an n8n workflow that automates fraud detection within financial transaction logs.
Out-of-Scope Use
- Multi-log sequence anomaly detection
- Non-textual anomaly detection
Training Data
- SQL database transaction logs (1,611 samples) synthetically generated by ChatGPT, Qwen, DeepSeek, Grok, Gemini, and Claude
- Each log labeled as either Normal or Anomaly
- Data prepared for single-log classification
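The exact label encoding used during fine-tuning is not shown in this card; assuming the mapping implied by Step 3 (where a prediction of 1 means Normal, so Anomaly = 0 and Normal = 1), preparing single-log training examples might look like this sketch with hypothetical sample logs:

```python
LABEL2ID = {"Anomaly": 0, "Normal": 1}  # mapping implied by Step 3 (pred == 1 -> "Normal")

# hypothetical examples; the real 1,611-sample dataset is not included in the card
raw_data = [
    ("SELECT name FROM customers WHERE id = 42;", "Normal"),
    ("SELECT * FROM users WHERE username = 'admin' OR '1'='1' --", "Anomaly"),
]

texts = [log for log, _ in raw_data]
labels = [LABEL2ID[label] for _, label in raw_data]
print(labels)  # [1, 0]
```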
Evaluation
Metrics
- Training Set
| Metric | Value |
|---|---|
| Accuracy | 0.8950 |
| Precision | 0.8580 |
| Recall | 0.9026 |
| F1-score | 0.8797 |
| Validation Loss | 0.3279 |
- Test Set (Baseline - No Step 2 Preprocessing)
| Metric | Value |
|---|---|
| Accuracy | 0.6950 |
| Precision | 0.6639 |
| Recall | 0.7900 |
| F1-score | 0.7215 |
| Validation Loss | 0.6251 |
- Test Set (Full Pipeline - With Step 2 Preprocessing)
| Metric | Value |
|---|---|
| Accuracy | 0.7000 |
| Precision | 0.6613 |
| Recall | 0.8200 |
| F1-score | 0.7321 |
| Validation Loss | 0.6344 |
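For reference, the reported accuracy, precision, recall, and F1-score follow the standard binary-classification definitions. A minimal sketch computing them from true and predicted labels (toy values, not the hackathon data; here 1 is treated as the positive class):

```python
def binary_metrics(y_true, y_pred, positive=1):
    # counts for the standard confusion-matrix-based definitions
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == positive and p == positive)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t != positive and p == positive)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == positive and p != positive)
    correct = sum(1 for t, p in zip(y_true, y_pred) if t == p)
    accuracy = correct / len(y_true)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return accuracy, precision, recall, f1

# toy labels only, to illustrate the definitions
y_true = [1, 1, 1, 0, 0, 0, 1, 0]
y_pred = [1, 1, 0, 0, 1, 0, 1, 0]
print(binary_metrics(y_true, y_pred))  # (0.75, 0.75, 0.75, 0.75)
```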
Summary
With Step 2 preprocessing, the model reaches 0.82 recall on the test set, favoring anomaly detection over precision; this recall-oriented behavior suits fraud detection and cybersecurity screening, where a missed anomaly is typically costlier than a false alarm.