TheKavach - AI Cybersecurity Threat Intelligence Model

A fine-tuned MiniLM (all-MiniLM-L6-v2) model for real-time cybersecurity threat classification. It was trained on 6 million synthetic network security logs to classify traffic patterns as benign, suspicious, or malicious.

Model Details

Property	Value
Base Model	sentence-transformers/all-MiniLM-L6-v2
Task	Text Classification (3-class)
Labels	`benign`, `suspicious`, `malicious`
Training Steps	204,075
Framework	HuggingFace Transformers + PyTorch

Labels

The model outputs one of three threat classifications:

Label	Description	Target Distribution
`LABEL_0`	Benign - Normal network traffic	~70%
`LABEL_1`	Suspicious - Anomalous but unconfirmed	~20%
`LABEL_2`	Malicious - Confirmed threat behavior	~10%
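
Because the pipeline reports generic IDs such as `LABEL_1`, a small lookup makes the output easier to read. The sketch below assumes the mapping from the table above; the helper name `readable` is illustrative and not part of the model or the repository.

```python
# Mapping copied from the label table; adjust it if config.json defines other names.
LABEL_NAMES = {
    "LABEL_0": "benign",
    "LABEL_1": "suspicious",
    "LABEL_2": "malicious",
}


def readable(predictions):
    """Replace generic pipeline label IDs with threat names, keeping the scores."""
    return [
        {"label": LABEL_NAMES.get(p["label"], p["label"]), "score": p["score"]}
        for p in predictions
    ]


# Shape matches what a text-classification pipeline returns for one input.
print(readable([{"label": "LABEL_1", "score": 0.9993}]))
# [{'label': 'suspicious', 'score': 0.9993}]
```

Unknown IDs are passed through unchanged, so the helper stays safe if the label set changes.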

Usage

Python

from transformers import pipeline

classifier = pipeline(
    "text-classification",
    model="OMCHOKSI108/TheKavach",
    tokenizer="OMCHOKSI108/TheKavach"
)

text = "Blocked TCP connection detected by firewall log using nmap scanner targeting high-risk path with small data transfer."
result = classifier(text)
print(result)
# [{'label': 'LABEL_1', 'score': 0.9993}]  -> LABEL_1 is "suspicious"

Using TheKavach Inference Engine

from models.inference import CybersecurityAI

ai = CybersecurityAI(hf_model="OMCHOKSI108/TheKavach")

raw_log = {
    "protocol": "TCP",
    "action": "blocked",
    "user_agent": "nmap scripting engine",
    "request_path": "/admin/config",
    "bytes_transferred": 5000,
    "log_type": "firewall"
}

result = ai.analyze_log(raw_log)
# Returns: threat, confidence, severity, explanation

REST API

curl -X POST https://thekavach.onrender.com/api/ai/analyze \
  -H "Content-Type: application/json" \
  -d '{
    "protocol": "TCP",
    "action": "blocked",
    "user_agent": "nmap scripting engine",
    "request_path": "/admin/config",
    "bytes_transferred": 5000,
    "log_type": "firewall"
  }'
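
The same request can be made from Python with only the standard library. This is a minimal sketch: the endpoint and payload come from the curl example, while the response format is not documented here, so `analyze` simply assumes a JSON reply. The function names and the 30-second timeout are illustrative choices.

```python
import json
import urllib.request

# Endpoint taken from the curl example.
API_URL = "https://thekavach.onrender.com/api/ai/analyze"


def build_request(log: dict) -> urllib.request.Request:
    """Build the same JSON POST that the curl command sends."""
    return urllib.request.Request(
        API_URL,
        data=json.dumps(log).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def analyze(log: dict, timeout: float = 30.0):
    """Send the log and decode the reply (assumes the service answers with JSON)."""
    with urllib.request.urlopen(build_request(log), timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))


raw_log = {
    "protocol": "TCP",
    "action": "blocked",
    "user_agent": "nmap scripting engine",
    "request_path": "/admin/config",
    "bytes_transferred": 5000,
    "log_type": "firewall",
}

# Building the request does not touch the network; call analyze(raw_log) to send it.
request = build_request(raw_log)
print(request.get_method(), request.full_url)
```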

Input Format

The model expects semantic text produced by the LogNormalizer. Raw log fields are converted as follows:

Raw Fields	Normalized Text
TCP, blocked, Nmap, /admin/config	Blocked TCP connection detected by firewall log using nmap scanner targeting high-risk path
HTTP, allowed, Chrome, /login	Permitted HTTP request recorded by application log accessing authentication path
HTTPS, blocked, SQLMap, /api/login	Blocked HTTPS request detected by IDS using sqlmap scanner targeting authentication path
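
To make the conversion concrete, here is a rough sketch of a normalizer that reproduces the three rows above. It is not the project's LogNormalizer: the rules, the `log_type` values (`firewall`, `application`, `ids`), the fallback phrase `standard path`, and every name in the code are assumptions inferred from the table alone. It also ignores `bytes_transferred`, which the real normalizer appears to use (the usage example mentions "small data transfer").

```python
# Hypothetical rules inferred from the example rows; the real LogNormalizer may differ.
SCANNERS = ("nmap", "sqlmap")
SOURCES = {
    "firewall": "firewall log",
    "application": "application log",
    "ids": "IDS",
}


def path_category(path: str) -> str:
    """Map a request path to a coarse semantic category."""
    lowered = path.lower()
    if "login" in lowered:
        return "authentication path"
    if lowered.startswith("/admin"):
        return "high-risk path"
    return "standard path"  # assumed fallback, not shown in the table


def normalize_log(log: dict) -> str:
    """Turn raw log fields into a short English description."""
    blocked = log["action"].lower() == "blocked"
    protocol = log["protocol"].upper()
    noun = "connection" if protocol == "TCP" else "request"
    source = SOURCES.get(log["log_type"].lower(), log["log_type"])
    agent = log.get("user_agent", "").lower()
    scanner = next((name for name in SCANNERS if name in agent), None)

    parts = [
        "Blocked" if blocked else "Permitted",
        protocol,
        noun,
        "detected" if blocked else "recorded",
        "by",
        source,
    ]
    if scanner:
        parts += ["using", scanner, "scanner", "targeting"]
    else:
        parts.append("accessing")
    parts.append(path_category(log["request_path"]))
    return " ".join(parts)


print(normalize_log({
    "protocol": "HTTP",
    "action": "allowed",
    "user_agent": "Chrome",
    "request_path": "/login",
    "log_type": "application",
}))
# Permitted HTTP request recorded by application log accessing authentication path
```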

Evaluation Results

Evaluated on 19,511 synthetically generated logs over a 20-minute run.

Metric	Value
Accuracy	79.8%
Macro Precision	0.520
Macro Recall	0.502
Macro F1-Score	0.499
Throughput	16.3 logs/sec
Avg Response Time	11.1 ms

Per-Class Performance

Class	Precision	Recall	F1-Score	Support
benign	0.805	1.000	0.892	13,575
suspicious	0.754	0.505	0.605	3,949
malicious	0.000	0.000	0.000	1,987

Confusion Matrix

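True \ Pred	benign	suspicious	malicious
benign	13,575	0	0
suspicious	1,955	1,994	0
malicious	1,337	650	0

No log was predicted as malicious, which is why that class scores 0.000 on every metric.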
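
The reported figures can be recomputed from the confusion matrix. The sketch below takes the `malicious` prediction column as all zeros, since each row of the matrix already sums to its class support, and treats a 0/0 precision as 0. Variable and function names are illustrative.

```python
LABELS = ["benign", "suspicious", "malicious"]

# Rows are true classes, columns are predicted classes, both in LABELS order.
# The last column is assumed to be all zeros (each row then sums to its support).
CONFUSION = [
    [13575, 0, 0],
    [1955, 1994, 0],
    [1337, 650, 0],
]


def safe_div(numerator, denominator):
    """Return 0.0 instead of failing when a class is never predicted."""
    return numerator / denominator if denominator else 0.0


def per_class_metrics(matrix):
    """Compute (precision, recall, F1) for every class."""
    metrics = []
    for i in range(len(matrix)):
        true_positives = matrix[i][i]
        predicted = sum(row[i] for row in matrix)  # column total
        actual = sum(matrix[i])                    # row total (support)
        precision = safe_div(true_positives, predicted)
        recall = safe_div(true_positives, actual)
        f1 = safe_div(2 * precision * recall, precision + recall)
        metrics.append((precision, recall, f1))
    return metrics


total = sum(sum(row) for row in CONFUSION)
accuracy = sum(CONFUSION[i][i] for i in range(len(CONFUSION))) / total
scores = per_class_metrics(CONFUSION)
macro = [sum(m[k] for m in scores) / len(scores) for k in range(3)]

for name, (p, r, f) in zip(LABELS, scores):
    print(f"{name:<11} P={p:.3f} R={r:.3f} F1={f:.3f}")
print(f"accuracy={accuracy:.3f} macro P/R/F1={macro[0]:.3f}/{macro[1]:.3f}/{macro[2]:.3f}")
```

Rounded to three decimals, this reproduces the accuracy, macro, and per-class values in the tables above.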

Files

File	Purpose
`model.safetensors`	Fine-tuned MiniLM weights (90.9 MB)
`config.json`	Model configuration
`tokenizer.json`	Text tokenizer
`tokenizer_config.json`	Tokenizer settings
`training_args.bin`	Training hyperparameters
`threat_classifier.pkl`	Sklearn threat classifier
`struct_scaler.pkl`	Feature scaler

Training

Full training pipeline available at:

Kaggle: https://www.kaggle.com/code/omchoksi04/thekavach
GitHub: https://github.com/omchoksi04/TheKavach

License

MIT License

Model Size

22.7M parameters, stored as F32 Safetensors.
