Model Card for Model ID

Model Summary

  1. bert-log-anomaly-detection is a BERT-based NLP model fine-tuned to detect anomalies in individual SQL transaction logs.

  2. The model classifies each database transaction log as either Normal or Anomaly, with the goal of supporting AI-powered fraud detection and cybersecurity monitoring systems.

  3. This model was developed as part of the Samsung × KBTG Digital Fraud Cybersecurity Hackathon (Thailand) under the AI-Powered Fraud Detection & Prevention track.

Model Description

This model analyzes individual SQL database transaction logs and detects abnormal patterns that may indicate fraudulent, malicious, or suspicious behavior.

Demo: Hackathon prototype

  • Developed by: Aungruk Vanichanai, Napat Wanitwatthakorn, Thanakrit Sriphiphattana
  • Shared by: Aungruk Vanichanai
  • Model type: Transformer-based binary text classifier
  • Language(s) (NLP): English (SQL logs in text format)
  • License: Apache 2.0
  • Finetuned from model: google-bert/bert-base-uncased

Model Sources

How to Get Started with the Model

Step 1 (Setup)

import torch
from transformers import BertForSequenceClassification, BertTokenizer

MODEL_PATH = "AungMoonLord/bert-log-anomaly-detection"

model = BertForSequenceClassification.from_pretrained(MODEL_PATH)
tokenizer = BertTokenizer.from_pretrained(MODEL_PATH)

model.eval()

Step 2 (Clean and Label Logs): optional, but may slightly improve accuracy, recall, and F1-score

# Perform log preprocessing
def add_prefix_token(text):  # log data must pass through this function before training/inference
    # Clean the log: normalize tabs and trim surrounding whitespace
    text = text.replace("\t", " ")
    text = text.strip()
    # Add a type token: lines starting with a letter are tagged as SQL,
    # timestamped lines (starting with digits) as generic logs.
    # Slices (rather than direct indexing) avoid an IndexError on short inputs.
    if text[:1].isalpha() or text[3:4].isalpha():
        return "[SQL]\n" + text
    else:
        return "[LOG]\n" + text
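To sanity-check the prefix logic on its own, the helper can be exercised standalone (repeated here with slice-based indexing, so that short or empty inputs do not raise an IndexError and the snippet runs by itself):

```python
def add_prefix_token(text):
    # Clean the log, then tag it as SQL or a generic log line
    text = text.replace("\t", " ").strip()
    # Slices avoid IndexError on inputs shorter than 4 characters
    if text[:1].isalpha() or text[3:4].isalpha():
        return "[SQL]\n" + text
    return "[LOG]\n" + text

print(add_prefix_token("SELECT * FROM users WHERE id = 1"))      # tagged [SQL]
print(add_prefix_token("2025-01-06 14:23:45 | User: anonymous")) # tagged [LOG]
```

Timestamped entries begin with digits in both the first and fourth positions, so they fall through to the `[LOG]` branch, while SQL statements start with a letter and get `[SQL]`.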

Step 3 (Create the Function for Log Classification)

def predict_log(log_text):
    log_text = add_prefix_token(log_text)
    inputs = tokenizer(
        log_text,
        return_tensors="pt",
        truncation=True,
        padding=True, # for cases when the inference contains more than 1 log, i.e., batch size > 1
        max_length=128
    )

    with torch.no_grad():
        logits = model(**inputs).logits
        pred = torch.argmax(logits, dim=-1).item()
        prob = torch.softmax(logits, dim=-1).tolist()[0]

    # Label mapping: index 1 = Normal, index 0 = Anomaly
    return ("Normal" if pred == 1 else "Anomaly"), prob
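The label mapping inside predict_log can be illustrated without loading the model: softmax turns the two raw logits into probabilities and argmax picks the class index. The sketch below mirrors that logic in plain Python; the example logits are made up, not real model outputs:

```python
import math

def classify_from_logits(logits):
    # Softmax over the two class logits
    exps = [math.exp(x) for x in logits]
    total = sum(exps)
    probs = [e / total for e in exps]
    # Same mapping as predict_log: index 1 = Normal, index 0 = Anomaly
    pred = probs.index(max(probs))
    return ("Normal" if pred == 1 else "Anomaly"), probs

label, probs = classify_from_logits([2.1, -0.7])  # hypothetical logits
print(label, [round(p, 3) for p in probs])
```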

Step 4 (Samples of Inferences)

# Example 1
text1 = "SELECT * FROM users WHERE id = 1 OR 1=1"
print(predict_log(text1))

# Example 2
text2 = "2025-01-06 14:23:45 | User: anonymous | IP: 203.154.89.102 | Duration: 0.05s SELECT * FROM users WHERE username = 'admin' OR '1'='1' -- ' AND password = 'x'"
print(predict_log(text2))

# Example 3
text3 = "3051-06-22T07:20:02.296945Z 3 Query select e3mJKDCCY from 7Q8SpG8LLEWhrfpe4s5 where ph4d = 'a1S9hQa92uC1EAyJf2Y';"
print(predict_log(text3))

Application in Hackathon Project

  • Developed by Waris Sripatoomrak, this model integrates with an n8n workflow to automate fraud detection within financial transaction logs.

Out-of-Scope Use

  • Multi-log sequence anomaly detection

  • Non-textual anomaly detection

Training Data

  • SQL database transaction logs (1,611 samples) synthetically generated by ChatGPT, Qwen, DeepSeek, Grok, Gemini, and Claude

  • Each log labeled as either Normal or Anomaly

  • Data prepared for single-log classification
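A minimal sketch of how such labeled pairs might be encoded for training, assuming the label mapping implied by the inference code (index 0 = Anomaly, index 1 = Normal); the example rows are illustrative, not taken from the actual dataset:

```python
LABEL2ID = {"Anomaly": 0, "Normal": 1}  # assumed mapping, consistent with predict_log

# Illustrative rows; the real dataset contains 1,611 synthetic logs
raw = [
    ("SELECT * FROM users WHERE id = 1 OR 1=1", "Anomaly"),
    ("SELECT name FROM customers WHERE id = 42", "Normal"),
]

dataset = [{"text": text, "label": LABEL2ID[label]} for text, label in raw]
print(dataset)
```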

Evaluation

Metrics

- Training Set

  Metric           Value
  Accuracy         0.8950
  Precision        0.8580
  Recall           0.9026
  F1-score         0.8797
  Validation Loss  0.3279

- Test Set (Baseline: without Step 2 preprocessing)

  Metric           Value
  Accuracy         0.6950
  Precision        0.6639
  Recall           0.7900
  F1-score         0.7215
  Validation Loss  0.6251

- Test Set (Full Pipeline: with Step 2 preprocessing)

  Metric           Value
  Accuracy         0.7000
  Precision        0.6613
  Recall           0.8200
  F1-score         0.7321
  Validation Loss  0.6344
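The reported test-set figures are mutually consistent. For example, assuming a hypothetical 200-log test set with 100 true anomalies, confusion-matrix counts of TP = 82, FP = 42, FN = 18, TN = 58 reproduce the full-pipeline numbers (these counts are an illustration chosen to match, not published data):

```python
def prf(tp, fp, fn):
    # Precision: of the logs flagged as Anomaly, how many really were
    precision = tp / (tp + fp)
    # Recall: of the true anomalies, how many were caught
    recall = tp / (tp + fn)
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1

tp, fp, fn, tn = 82, 42, 18, 58  # hypothetical confusion-matrix counts
precision, recall, f1 = prf(tp, fp, fn)
accuracy = (tp + tn) / (tp + fp + fn + tn)
print(round(accuracy, 4), round(precision, 4), round(recall, 4), round(f1, 4))
# → 0.7 0.6613 0.82 0.7321
```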

Summary

The model favors recall (0.82 on the full-pipeline test set), which suits fraud detection and cybersecurity monitoring, where missing an anomaly is typically costlier than a false alarm; overall test accuracy (0.70) indicates room for further improvement.

Model size: 0.1B params (safetensors, F32 tensors)
