TheKavach - AI Cybersecurity Threat Intelligence Model

TheKavach is a fine-tuned MiniLM (all-MiniLM-L6-v2) model for real-time cybersecurity threat classification. It was trained on 6 million synthetic network security logs to classify traffic patterns as benign, suspicious, or malicious.

Model Details

| Property | Value |
|---|---|
| Base Model | sentence-transformers/all-MiniLM-L6-v2 |
| Task | Text Classification (3-class) |
| Labels | benign, suspicious, malicious |
| Training Steps | 204,075 |
| Framework | HuggingFace Transformers + PyTorch |

Labels

The model outputs one of three threat classifications:

| Label | Description | Target Distribution |
|---|---|---|
| LABEL_0 | Benign - Normal network traffic | ~70% |
| LABEL_1 | Suspicious - Anomalous but unconfirmed | ~20% |
| LABEL_2 | Malicious - Confirmed threat behavior | ~10% |

Usage

Python

from transformers import pipeline

classifier = pipeline(
    "text-classification",
    model="OMCHOKSI108/TheKavach",
    tokenizer="OMCHOKSI108/TheKavach"
)

text = "Blocked TCP connection detected by firewall log using nmap scanner targeting high-risk path with small data transfer."
result = classifier(text)
print(result)
# [{'label': 'LABEL_1', 'score': 0.9993}]
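
The pipeline returns generic `LABEL_n` ids rather than threat names. A minimal sketch of a post-processing step is shown below; the mapping comes from the Labels table, while `ID2NAME` and `readable` are hypothetical names introduced only for this illustration.

```python
# Mapping copied from the Labels table; ID2NAME and readable() are
# illustrative helpers, not part of the published model or its repository.
ID2NAME = {
    "LABEL_0": "benign",
    "LABEL_1": "suspicious",
    "LABEL_2": "malicious",
}


def readable(predictions):
    """Replace LABEL_n ids with threat names, keeping scores unchanged."""
    return [
        {"label": ID2NAME.get(p["label"], p["label"]), "score": p["score"]}
        for p in predictions
    ]


# Same shape as the pipeline output shown above.
sample = [{"label": "LABEL_1", "score": 0.9993}]
print(readable(sample))
# [{'label': 'suspicious', 'score': 0.9993}]
```

Unknown ids are passed through unchanged, so the helper keeps working if the model config later carries its own label names.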

Using TheKavach Inference Engine

from models.inference import CybersecurityAI

ai = CybersecurityAI(hf_model="OMCHOKSI108/TheKavach")

raw_log = {
    "protocol": "TCP",
    "action": "blocked",
    "user_agent": "nmap scripting engine",
    "request_path": "/admin/config",
    "bytes_transferred": 5000,
    "log_type": "firewall"
}

result = ai.analyze_log(raw_log)
# Returns: threat, confidence, severity, explanation

REST API

curl -X POST https://thekavach.onrender.com/api/ai/analyze \
  -H "Content-Type: application/json" \
  -d '{
    "protocol": "TCP",
    "action": "blocked",
    "user_agent": "nmap scripting engine",
    "request_path": "/admin/config",
    "bytes_transferred": 5000,
    "log_type": "firewall"
  }'
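
The same request can be built from Python with only the standard library. This is a sketch under assumptions: the URL and payload are copied from the curl example, `build_request` and `analyze` are hypothetical helper names, and the response schema is not documented here, so the decoded JSON is returned as-is.

```python
import json
import urllib.request

# Endpoint taken from the curl example above.
API_URL = "https://thekavach.onrender.com/api/ai/analyze"


def build_request(log: dict) -> urllib.request.Request:
    """Build (but do not send) a JSON POST request for one raw log."""
    body = json.dumps(log).encode("utf-8")
    return urllib.request.Request(
        API_URL,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def analyze(log: dict, timeout: float = 30.0):
    """Send the log and return the decoded JSON response unchanged."""
    with urllib.request.urlopen(build_request(log), timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))


raw_log = {
    "protocol": "TCP",
    "action": "blocked",
    "user_agent": "nmap scripting engine",
    "request_path": "/admin/config",
    "bytes_transferred": 5000,
    "log_type": "firewall",
}

request = build_request(raw_log)
print(request.get_method(), request.full_url)
# Calling analyze(raw_log) would perform the actual network request.
```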

Input Format

The model expects semantic text produced by the LogNormalizer. Raw log fields are converted as follows:

| Raw Fields | Normalized Text |
|---|---|
| TCP, blocked, Nmap, /admin/config | Blocked TCP connection detected by firewall log using nmap scanner targeting high-risk path |
| HTTP, allowed, Chrome, /login | Permitted HTTP request recorded by application log accessing authentication path |
| HTTPS, blocked, SQLMap, /api/login | Blocked HTTPS request detected by IDS using sqlmap scanner targeting authentication path |
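
To make the conversion concrete, here is a rough sketch of a normalizer that reproduces the three rows above. It is not the project's LogNormalizer: the rules are inferred from those examples alone, and the function names, the scanner list, the path categories, and the `application` and `ids` values for `log_type` are all assumptions.

```python
# Hypothetical re-creation of the normalization rules, inferred only from
# the three example rows. The real LogNormalizer may differ.
SCANNERS = ("sqlmap", "nmap")  # assumed scanner signatures
SOURCES = {  # assumed log_type values
    "firewall": "firewall log",
    "application": "application log",
    "ids": "IDS",
}


def path_category(path: str) -> str:
    """Classify a request path into the coarse categories seen in the table."""
    p = path.lower()
    if "login" in p or "auth" in p:
        return "authentication path"
    if p.startswith("/admin"):
        return "high-risk path"
    return "standard path"  # assumed fallback, not shown in the table


def normalize_log(log: dict) -> str:
    """Turn raw log fields into one semantic sentence."""
    blocked = log["action"].lower() == "blocked"
    protocol = log["protocol"].upper()
    noun = "connection" if protocol in ("TCP", "UDP") else "request"
    source = SOURCES.get(log["log_type"].lower(), log["log_type"] + " log")

    text = (
        f"{'Blocked' if blocked else 'Permitted'} {protocol} {noun} "
        f"{'detected' if blocked else 'recorded'} by {source}"
    )
    agent = log.get("user_agent", "").lower()
    scanner = next((s for s in SCANNERS if s in agent), None)
    if scanner:
        text += f" using {scanner} scanner"
    text += f" {'targeting' if blocked else 'accessing'} {path_category(log['request_path'])}"
    return text


print(normalize_log({
    "protocol": "TCP", "action": "blocked", "user_agent": "Nmap",
    "request_path": "/admin/config", "log_type": "firewall",
}))
# Blocked TCP connection detected by firewall log using nmap scanner targeting high-risk path
```

The output string can be passed directly to the `text-classification` pipeline shown under Usage.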

Evaluation Results

Evaluated on 19,511 synthetically generated logs over 20 minutes.

| Metric | Value |
|---|---|
| Accuracy | 79.8% |
| Macro Precision | 0.520 |
| Macro Recall | 0.502 |
| Macro F1-Score | 0.499 |
| Throughput | 16.3 logs/sec |
| Avg Response Time | 11.1 ms |

Per-Class Performance

| Class | Precision | Recall | F1-Score | Support |
|---|---|---|---|---|
| benign | 0.805 | 1.000 | 0.892 | 13,575 |
| suspicious | 0.754 | 0.505 | 0.605 | 3,949 |
| malicious | 0.000 | 0.000 | 0.000 | 1,987 |

Confusion Matrix

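| True \ Pred | benign | suspicious | malicious |
|---|---|---|---|
| benign | 13,575 | 0 | 0 |
| suspicious | 1,955 | 1,994 | 0 |
| malicious | 1,337 | 650 | 0 |

Rows are true classes and columns are predicted classes. The model never predicted the malicious class in this evaluation: of the 1,987 malicious logs, 1,337 were classified as benign and 650 as suspicious.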
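
The reported accuracy, per-class scores, and macro averages can be re-derived from the confusion matrix. The sketch below does this in plain Python. The function and variable names are illustrative, and a class with no predictions is assigned a precision of 0.0, which matches the 0.000 reported for malicious.

```python
LABELS = ["benign", "suspicious", "malicious"]

# Rows are true classes, columns are predicted classes (values from above).
CM = [
    [13575, 0, 0],
    [1955, 1994, 0],
    [1337, 650, 0],
]


def safe_div(num: float, den: float) -> float:
    """Return num / den, or 0.0 when the denominator is zero."""
    return num / den if den else 0.0


def per_class(cm):
    """Compute precision, recall, F1, and support for each class."""
    scores = {}
    for i, name in enumerate(LABELS):
        tp = cm[i][i]
        predicted = sum(row[i] for row in cm)  # column sum
        actual = sum(cm[i])                    # row sum (support)
        precision = safe_div(tp, predicted)
        recall = safe_div(tp, actual)
        f1 = safe_div(2 * precision * recall, precision + recall)
        scores[name] = {
            "precision": precision, "recall": recall,
            "f1": f1, "support": actual,
        }
    return scores


metrics = per_class(CM)
total = sum(sum(row) for row in CM)
accuracy = sum(CM[i][i] for i in range(len(CM))) / total
macro = {
    key: sum(m[key] for m in metrics.values()) / len(metrics)
    for key in ("precision", "recall", "f1")
}

print(f"accuracy={accuracy:.3f}")
for name, m in metrics.items():
    print(f"{name}: P={m['precision']:.3f} R={m['recall']:.3f} F1={m['f1']:.3f}")
print({k: round(v, 3) for k, v in macro.items()})
```

Because benign makes up about 70% of the evaluation set, accuracy stays near 80% even though no malicious log is detected; the macro averages reflect that gap more directly.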

Files

| File | Purpose |
|---|---|
| model.safetensors | Fine-tuned MiniLM weights (90.9 MB) |
| config.json | Model configuration |
| tokenizer.json | Text tokenizer |
| tokenizer_config.json | Tokenizer settings |
| training_args.bin | Training hyperparameters |
| threat_classifier.pkl | Sklearn threat classifier |
| struct_scaler.pkl | Feature scaler |

Training

Full training pipeline available at:

Links

License

MIT License

Model size: 22.7M parameters (Safetensors, F32)