Model Card for bert-log-anomaly-detection
Model Summary
bert-log-anomaly-detection is a BERT-based NLP model fine-tuned for single SQL transaction log anomaly detection. The model classifies each database transaction log as either Normal or Anomaly, with the goal of supporting AI-powered fraud detection and cybersecurity monitoring systems. This model was developed as part of the Samsung x KBTG Digital Fraud Cybersecurity Hackathon (Thailand) under the AI-Powered Fraud Detection & Prevention track.
Model Description
This model analyzes individual SQL database transaction logs and detects abnormal patterns that may indicate fraudulent, malicious, or suspicious behavior.
Demo: Hackathon prototype
- Developed by: Aungruk Vanichanai, Napat Wanitwatthakorn, Thanakrit Sriphiphattana
- Shared by: Aungruk Vanichanai
- Model type: Transformer-based binary text classifier
- Language(s) (NLP): English (SQL logs in text format)
- License: Apache 2.0
- Finetuned from model: google-bert/bert-base-uncased
Model Sources
- GitHub Repository: https://github.com/AungMoonLord/AI-Cybersecurity-Hackathon/tree/main/New%20Finetune%20Hackathon
How to Get Started with the Model
Step 1 (Setup)
```python
import torch
from transformers import BertForSequenceClassification, BertTokenizer

MODEL_PATH = "AungMoonLord/bert-log-anomaly-detection"
model = BertForSequenceClassification.from_pretrained(MODEL_PATH)
tokenizer = BertTokenizer.from_pretrained(MODEL_PATH)
model.eval()
```
Step 2 (Clean and Label Logs) - Optional, but may slightly improve accuracy, recall, and F1-score
```python
# Perform log preprocessing
def add_prefix_token(text):  # log data must pass this code before training/inference
    # clean log
    text = text.replace("\t", " ")
    text = text.strip()
    # add token (slices instead of indexing so short/empty strings don't raise IndexError)
    if text[:1].isalpha() or text[3:4].isalpha():
        return "[SQL]\n" + text
    else:
        return "[LOG]\n" + text
```
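As a quick standalone check of the prefixing behavior: a raw SQL statement (which starts with a letter) gets the [SQL] tag, while a timestamped server log (which starts with digits) gets [LOG]. The function is repeated here, with index accesses written as slices so very short strings do not raise IndexError, so the snippet runs on its own:

```python
def add_prefix_token(text):  # same preprocessing step as above
    text = text.replace("\t", " ").strip()
    if text[:1].isalpha() or text[3:4].isalpha():
        return "[SQL]\n" + text
    return "[LOG]\n" + text

sql = add_prefix_token("SELECT * FROM users WHERE id = 1")
log = add_prefix_token("2025-01-06 14:23:45 | User: anonymous | Duration: 0.05s")
print(sql.splitlines()[0])  # [SQL]
print(log.splitlines()[0])  # [LOG]
```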
Step 3 (Create the Function for Log Classification)
```python
def predict_log(log_text):
    log_text = add_prefix_token(log_text)
    inputs = tokenizer(
        log_text,
        return_tensors="pt",
        truncation=True,
        padding=True,  # for cases when the inference contains more than 1 log, i.e., batch size > 1
        max_length=128
    )
    with torch.no_grad():
        logits = model(**inputs).logits
    pred = torch.argmax(logits, dim=1).item()
    prob = torch.softmax(logits, dim=-1).tolist()[0]
    return "Normal" if pred == 1 else "Anomaly", prob
```
Step 4 (Samples of Inferences)
```python
# Example 1
text1 = "SELECT * FROM users WHERE id = 1 OR 1=1"
print(predict_log(text1))

# Example 2
text2 = "2025-01-06 14:23:45 | User: anonymous | IP: 203.154.89.102 | Duration: 0.05s SELECT * FROM users WHERE username = 'admin' OR '1'='1' -- ' AND password = 'x'"
print(predict_log(text2))

# Example 3
text3 = "3051-06-22T07:20:02.296945Z 3 Query select e3mJKDCCY from 7Q8SpG8LLEWhrfpe4s5 where ph4d = 'a1S9hQa92uC1EAyJf2Y';"
print(predict_log(text3))
```
Application in Hackathon Project
- Waris Sripatoomrak integrated this model into an n8n workflow that automates fraud detection within financial transaction logs.
Out-of-Scope Use
- Multi-log sequence anomaly detection
- Non-textual anomaly detection
Training Data
- SQL database transaction logs (1,611 samples) synthetically generated by ChatGPT, Qwen, DeepSeek, Grok, Gemini, and Claude
- Each log labeled as either Normal or Anomaly
- Data prepared for single-log classification
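The exact label encoding used during fine-tuning is not shown in this card; assuming the mapping implied by Step 3 (where a prediction of 1 means Normal, so Anomaly = 0 and Normal = 1), preparing single-log training examples might look like this sketch with hypothetical sample logs:

```python
LABEL2ID = {"Anomaly": 0, "Normal": 1}  # mapping implied by Step 3 (pred == 1 -> "Normal")

# hypothetical examples; the real 1,611-sample dataset is not included in the card
raw_data = [
    ("SELECT name FROM customers WHERE id = 42;", "Normal"),
    ("SELECT * FROM users WHERE username = 'admin' OR '1'='1' --", "Anomaly"),
]

texts = [log for log, _ in raw_data]
labels = [LABEL2ID[label] for _, label in raw_data]
print(labels)  # [1, 0]
```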
Evaluation
Metrics
- Training Set
| Metric | Value |
|---|---|
| Accuracy | 0.8950 |
| Precision | 0.8580 |
| Recall | 0.9026 |
| F1-score | 0.8797 |
| Validation Loss | 0.3279 |
- Test Set (Baseline - No Step 2 Preprocessing)
| Metric | Value |
|---|---|
| Accuracy | 0.6950 |
| Precision | 0.6639 |
| Recall | 0.7900 |
| F1-score | 0.7215 |
| Validation Loss | 0.6251 |
- Test Set (Full Pipeline - With Step 2 Preprocessing)
| Metric | Value |
|---|---|
| Accuracy | 0.7000 |
| Precision | 0.6613 |
| Recall | 0.8200 |
| F1-score | 0.7321 |
| Validation Loss | 0.6344 |
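For reference, the reported accuracy, precision, recall, and F1-score follow the standard binary-classification definitions. A minimal sketch computing them from true and predicted labels (toy values, not the hackathon data; here 1 is treated as the positive class):

```python
def binary_metrics(y_true, y_pred, positive=1):
    # counts for the standard confusion-matrix-based definitions
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == positive and p == positive)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t != positive and p == positive)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == positive and p != positive)
    correct = sum(1 for t, p in zip(y_true, y_pred) if t == p)
    accuracy = correct / len(y_true)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return accuracy, precision, recall, f1

# toy labels only, to illustrate the definitions
y_true = [1, 1, 1, 0, 0, 0, 1, 0]
y_pred = [1, 1, 0, 0, 1, 0, 1, 0]
print(binary_metrics(y_true, y_pred))  # (0.75, 0.75, 0.75, 0.75)
```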
Summary
With Step 2 preprocessing, the model reaches 0.82 recall on the test set, favoring anomaly detection over precision; this recall-oriented behavior suits fraud detection and cybersecurity screening, where a missed anomaly is typically costlier than a false alarm.