Model Card for bert-log-anomaly-detection

Model Summary

  1. bert-log-anomaly-detection is a BERT-based NLP model fine-tuned for anomaly detection on single SQL transaction logs.

  2. The model classifies each database transaction log as either Normal or Anomaly, with the goal of supporting AI-powered fraud detection and cybersecurity monitoring systems.

  3. This model was developed as part of the Samsung × KBTG Digital Fraud Cybersecurity Hackathon (Thailand) under the AI-Powered Fraud Detection & Prevention track.

Model Description

This model analyzes individual SQL database transaction logs and detects abnormal patterns that may indicate fraudulent, malicious, or suspicious behavior.

Demo: Hackathon prototype

  • Developed by: Aungruk Vanichanai, Napat Wanitwatthakorn, Thanakrit Sriphiphattana
  • Shared by: Aungruk Vanichanai
  • Model type: Transformer-based binary text classifier
  • Language(s) (NLP): English (SQL logs in text format)
  • License: Apache 2.0
  • Finetuned from model: google-bert/bert-base-uncased

How to Get Started with the Model

Step 1 (Setup)

import torch
from transformers import BertForSequenceClassification, BertTokenizer

MODEL_PATH = "AungMoonLord/bert-log-anomaly-detection"

model = BertForSequenceClassification.from_pretrained(MODEL_PATH)
tokenizer = BertTokenizer.from_pretrained(MODEL_PATH)

model.eval()

Step 2 (Clean and Label Logs): optional, but may slightly improve accuracy, recall, and F1-score

# Perform log preprocessing
def add_prefix_token(text):  # log data must pass through this step before training/inference
    # Clean the log
    text = text.replace("\t", " ")
    text = text.strip()
    # Add a routing token: raw SQL statements start with a letter,
    # while timestamped log lines start with digits
    if text and (text[0].isalpha() or (len(text) > 3 and text[3].isalpha())):
        return "[SQL]\n" + text
    else:
        return "[LOG]\n" + text
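To illustrate the routing behavior, here is a standalone sketch of the helper, including a small length guard (an assumption about the intended behavior) so short inputs do not raise an IndexError. SQL text beginning with a letter receives the [SQL] prefix, while timestamped log lines receive [LOG]:

```python
# Standalone restatement of add_prefix_token, for demonstration only
def add_prefix_token(text):
    # Clean the log: normalize tabs and trim whitespace
    text = text.replace("\t", " ").strip()
    # Route: SQL statements start with a letter; timestamped logs start with digits
    if text and (text[0].isalpha() or (len(text) > 3 and text[3].isalpha())):
        return "[SQL]\n" + text
    return "[LOG]\n" + text

print(add_prefix_token("SELECT * FROM users WHERE id = 1"))
print(add_prefix_token("2025-01-06 14:23:45 | User: anonymous | Duration: 0.05s"))
```

The first input starts with a letter and is prefixed with [SQL]; the second starts with a timestamp digit and is prefixed with [LOG].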

Step 3 (Create the Function for Log Classification)

def predict_log(log_text):
    log_text = add_prefix_token(log_text)
    inputs = tokenizer(
        log_text,
        return_tensors="pt",
        truncation=True,
        padding=True, # for cases when the inference contains more than 1 log, i.e., batch size > 1
        max_length=128
    )

    with torch.no_grad():
        logits = model(**inputs).logits
        pred = torch.argmax(logits, dim=1).item()
        prob = torch.softmax(logits, dim=-1).tolist()[0]

    return "Normal" if pred == 1 else "Anomaly", prob

Step 4 (Samples of Inferences)

# Example 1
text1 = "SELECT * FROM users WHERE id = 1 OR 1=1"
print(predict_log(text1))

# Example 2
text2 = "2025-01-06 14:23:45 | User: anonymous | IP: 203.154.89.102 | Duration: 0.05s SELECT * FROM users WHERE username = 'admin' OR '1'='1' -- ' AND password = 'x'"
print(predict_log(text2))

# Example 3
text3 = "3051-06-22T07:20:02.296945Z 3 Query select e3mJKDCCY from 7Q8SpG8LLEWhrfpe4s5 where ph4d = 'a1S9hQa92uC1EAyJf2Y';"
print(predict_log(text3))

Application in Hackathon Project

  • Waris Sripatoomrak integrated this model into an n8n workflow to automate fraud detection within financial transaction logs.

Out-of-Scope Use

  • Multi-log sequence anomaly detection

  • Non-textual anomaly detection

Training Data

  • SQL database transaction logs (1,611 samples) synthetically generated by ChatGPT, Qwen, DeepSeek, Grok, Gemini, and Claude

  • Each log labeled as either Normal or Anomaly

  • Data prepared for single-log classification

Evaluation

Metrics

- Training Set

  Metric           Value
  Accuracy         0.8950
  Precision        0.8580
  Recall           0.9026
  F1-score         0.8797
  Validation Loss  0.3279

- Test Set (Baseline: No Step 2 Preprocessing)

  Metric           Value
  Accuracy         0.6950
  Precision        0.6639
  Recall           0.7900
  F1-score         0.7215
  Validation Loss  0.6251

- Test Set (Full Pipeline: With Step 2 Preprocessing)

  Metric           Value
  Accuracy         0.7000
  Precision        0.6613
  Recall           0.8200
  F1-score         0.7321
  Validation Loss  0.6344
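As a sanity check, the reported F1-scores follow directly from the precision and recall values via the harmonic-mean formula F1 = 2PR / (P + R). A minimal illustrative snippet (the f1 helper is not part of the model card's code):

```python
def f1(precision, recall):
    # F1 is the harmonic mean of precision and recall
    return 2 * precision * recall / (precision + recall)

# Values taken from the tables above
print(round(f1(0.8580, 0.9026), 4))  # training set
print(round(f1(0.6639, 0.7900), 4))  # test set, baseline
print(round(f1(0.6613, 0.8200), 4))  # test set, full pipeline
```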

Summary

With Step 2 preprocessing, the model reaches 0.82 recall on the test set, favoring anomaly coverage over precision; this makes it suitable as a first-pass screen in fraud detection and cybersecurity monitoring pipelines.

Model size: 0.1B parameters (Safetensors, F32)