TheKavach - AI Cybersecurity Threat Intelligence Model
A fine-tuned MiniLM (all-MiniLM-L6-v2) model for real-time cybersecurity threat classification. Trained on 6 million synthetic network security logs to detect benign, suspicious, and malicious traffic patterns.
Model Details
| Property | Value |
|---|---|
| Base Model | sentence-transformers/all-MiniLM-L6-v2 |
| Task | Text Classification (3-class) |
| Labels | benign, suspicious, malicious |
| Training Steps | 204,075 |
| Framework | HuggingFace Transformers + PyTorch |
Labels
The model outputs one of three threat classifications:
| Label | Description | Target Distribution |
|---|---|---|
| LABEL_0 | Benign - Normal network traffic | ~70% |
| LABEL_1 | Suspicious - Anomalous but unconfirmed | ~20% |
| LABEL_2 | Malicious - Confirmed threat behavior | ~10% |
Usage
```python
from transformers import pipeline

# Load the fine-tuned classifier and its tokenizer from the Hugging Face Hub
classifier = pipeline(
    "text-classification",
    model="OMCHOKSI108/TheKavach",
    tokenizer="OMCHOKSI108/TheKavach",
)

# The input is normalized log text (see Input Format)
text = "Blocked TCP connection detected by firewall log using nmap scanner targeting high-risk path with small data transfer."
result = classifier(text)
print(result)
# [{'label': 'LABEL_1', 'score': 0.9993}]  (LABEL_1 = suspicious)
```
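By default the pipeline returns only the top label. The sketch below asks for the scores of all three classes, assuming a recent `transformers` release in which passing `top_k=None` to a text-classification pipeline returns every label; `score_logs` and `LABEL_NAMES` are illustrative names, not part of TheKavach.

```python
# Label ids as listed in the Labels table
LABEL_NAMES = {"LABEL_0": "benign", "LABEL_1": "suspicious", "LABEL_2": "malicious"}


def score_logs(classifier, texts):
    """Return one {class name: score} dict per input text.

    `classifier` is expected to be a transformers text-classification pipeline.
    `texts` must be a list; with top_k=None the pipeline then returns one list
    of {"label", "score"} dicts per text, covering every label.
    """
    outputs = classifier(texts, top_k=None)
    return [
        {LABEL_NAMES.get(item["label"], item["label"]): item["score"] for item in per_text}
        for per_text in outputs
    ]
```

For example, `score_logs(classifier, [text])` with the `classifier` and `text` defined above returns a single dictionary covering benign, suspicious, and malicious. The full distribution can be worth inspecting because, in the evaluation reported below, `malicious` was never the top prediction.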
Using TheKavach Inference Engine
```python
# CybersecurityAI comes from the TheKavach project code (see the GitHub link under Training)
from models.inference import CybersecurityAI

ai = CybersecurityAI(hf_model="OMCHOKSI108/TheKavach")

raw_log = {
    "protocol": "TCP",
    "action": "blocked",
    "user_agent": "nmap scripting engine",
    "request_path": "/admin/config",
    "bytes_transferred": 5000,
    "log_type": "firewall",
}

result = ai.analyze_log(raw_log)
# Returns: threat, confidence, severity, explanation
```
REST API
```bash
curl -X POST https://thekavach.onrender.com/api/ai/analyze \
  -H "Content-Type: application/json" \
  -d '{
    "protocol": "TCP",
    "action": "blocked",
    "user_agent": "nmap scripting engine",
    "request_path": "/admin/config",
    "bytes_transferred": 5000,
    "log_type": "firewall"
  }'
```
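The same request can be sent from Python with only the standard library. This is a minimal sketch, assuming the endpoint accepts the JSON body shown in the curl example and replies with JSON (the response schema is not documented here; see the API docs under Links). `build_analyze_request` and `analyze_remote` are hypothetical helper names.

```python
import json
import urllib.request

# Endpoint taken from the curl example
API_URL = "https://thekavach.onrender.com/api/ai/analyze"


def build_analyze_request(raw_log: dict, url: str = API_URL) -> urllib.request.Request:
    """Build the same POST request the curl example sends: a JSON body with a JSON content type."""
    body = json.dumps(raw_log).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def analyze_remote(raw_log: dict, timeout: float = 30.0) -> dict:
    """Send the request and decode the reply, assuming the service answers with JSON."""
    request = build_analyze_request(raw_log)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))
```

Calling `analyze_remote(raw_log)` with the dictionary from the inference-engine example performs the network call. Building the request in a separate function lets you inspect or log it without sending anything.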
Input Format
The model expects semantic text produced by the LogNormalizer. Raw log fields are converted as follows:
| Raw Fields | Normalized Text |
|---|---|
| TCP, blocked, Nmap, /admin/config | Blocked TCP connection detected by firewall log using nmap scanner targeting high-risk path |
| HTTP, allowed, Chrome, /login | Permitted HTTP request recorded by application log accessing authentication path |
| HTTPS, blocked, SQLMap, /api/login | Blocked HTTPS request detected by IDS using sqlmap scanner targeting authentication path |
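To make the conversion concrete, here is a hypothetical rule-based sketch that reproduces the three rows of the table. It is not the project's LogNormalizer: the keyword rules (`admin` paths as high-risk, `login` paths as authentication, `nmap`/`sqlmap` user agents as scanners, "detected" for blocked versus "recorded" for allowed events) are assumptions inferred from these examples only. The transfer-size phrase in the Usage example ("with small data transfer") is omitted because its threshold is not documented.

```python
from typing import Optional

# Hypothetical vocabulary inferred from the table; the real LogNormalizer may differ
ACTION_WORDS = {"blocked": "Blocked", "allowed": "Permitted"}
SOURCE_PHRASES = {"firewall": "firewall log", "application": "application log", "ids": "IDS"}
KNOWN_SCANNERS = ("nmap", "sqlmap")


def path_category(path: str) -> Optional[str]:
    """Map a request path to a coarse category using illustrative keyword rules."""
    lowered = path.lower()
    if "admin" in lowered:
        return "high-risk"
    if "login" in lowered or "auth" in lowered:
        return "authentication"
    return None


def normalize_log(log: dict) -> str:
    """Render a raw log dict as one descriptive sentence in the style of the table."""
    action = log["action"].lower()
    blocked = action == "blocked"
    protocol = log["protocol"].upper()
    words = [
        ACTION_WORDS.get(action, action.capitalize()),
        protocol,
        "connection" if protocol == "TCP" else "request",
        "detected" if blocked else "recorded",
        "by",
        SOURCE_PHRASES.get(log["log_type"].lower(), log["log_type"]),
    ]
    # Mention a scanner only when the user agent names a known tool
    agent = log.get("user_agent", "").lower()
    scanner = next((name for name in KNOWN_SCANNERS if name in agent), None)
    if scanner:
        words += ["using", scanner, "scanner"]
    # Describe the path only when it falls into a known category
    category = path_category(log.get("request_path", ""))
    if category:
        words += ["targeting" if blocked else "accessing", category, "path"]
    return " ".join(words)
```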
Evaluation Results
Evaluated on 19,511 synthetically generated logs during a 20-minute run.
| Metric | Value |
|---|---|
| Accuracy | 79.8% |
| Macro Precision | 0.520 |
| Macro Recall | 0.502 |
| Macro F1-Score | 0.499 |
| Throughput | 16.3 logs/sec |
| Avg Response Time | 11.1 ms |
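A quick check shows the throughput and run length agree: 19,511 logs at 16.3 logs/sec take just under 20 minutes. The same arithmetic gives an average gap of about 61 ms between logs, several times the 11.1 ms average response time, so the 16.3 logs/sec figure likely reflects how fast the evaluation run supplied logs rather than the model's maximum rate. The card does not say how the run was paced, so treat this as an interpretation.

```python
# Values copied from the evaluation table
total_logs = 19_511
throughput_logs_per_sec = 16.3
avg_response_ms = 11.1

run_minutes = total_logs / throughput_logs_per_sec / 60  # implied length of the run
mean_gap_ms = 1000 / throughput_logs_per_sec             # average time between consecutive logs

print(f"run length: about {run_minutes:.1f} min")
print(f"mean gap: about {mean_gap_ms:.1f} ms vs {avg_response_ms} ms average response")
```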
Per-Class Performance
| Class | Precision | Recall | F1-Score | Support |
|---|---|---|---|---|
| benign | 0.805 | 1.000 | 0.892 | 13,575 |
| suspicious | 0.754 | 0.505 | 0.605 | 3,949 |
| malicious | 0.000 | 0.000 | 0.000 | 1,987 |
Confusion Matrix
| True \ Pred | benign | suspicious | malicious |
|---|---|---|---|
| benign | 13,575 | 0 | 0 |
| suspicious | 1,955 | 1,994 | 0 |
| malicious | 1,337 | 650 | 0 |
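Every figure in the metric tables follows from this confusion matrix. The malicious column is all zeros: no evaluation log was predicted as malicious, so that class scores 0.000 on precision, recall, and F1, and its 1,987 logs were split between benign (1,337) and suspicious (650). The plain-Python sketch below recomputes accuracy, per-class precision/recall/F1, and the macro averages; rounded to three decimals, they match the reported values.

```python
# Confusion matrix from the table: rows = true class, columns = predicted class
LABELS = ["benign", "suspicious", "malicious"]
CONFUSION = [
    [13_575, 0, 0],
    [1_955, 1_994, 0],
    [1_337, 650, 0],
]


def safe_div(num: float, den: float) -> float:
    """Return num / den, or 0.0 when den is 0 (how the table reports the malicious class)."""
    return num / den if den else 0.0


def per_class_metrics(matrix):
    """Return {label: (precision, recall, f1, support)} for each class."""
    n = len(matrix)
    results = {}
    for i, label in enumerate(LABELS):
        tp = matrix[i][i]
        predicted = sum(matrix[r][i] for r in range(n))  # column sum: predicted as this class
        actual = sum(matrix[i])                          # row sum: truly this class (support)
        precision = safe_div(tp, predicted)
        recall = safe_div(tp, actual)
        f1 = safe_div(2 * precision * recall, precision + recall)
        results[label] = (precision, recall, f1, actual)
    return results


metrics = per_class_metrics(CONFUSION)
total = sum(map(sum, CONFUSION))
accuracy = sum(CONFUSION[i][i] for i in range(len(LABELS))) / total
# Macro average = unweighted mean over the three classes
macro = [sum(m[k] for m in metrics.values()) / len(LABELS) for k in range(3)]

print(f"accuracy={accuracy:.3f}")
for label, (p, r, f1, support) in metrics.items():
    print(f"{label:<10} P={p:.3f} R={r:.3f} F1={f1:.3f} n={support}")
print("macro P/R/F1:", " ".join(f"{v:.3f}" for v in macro))
```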
Files
| File | Purpose |
|---|---|
| model.safetensors | Fine-tuned MiniLM weights (90.9 MB) |
| config.json | Model configuration |
| tokenizer.json | Text tokenizer |
| tokenizer_config.json | Tokenizer settings |
| training_args.bin | Training hyperparameters |
| threat_classifier.pkl | Sklearn threat classifier |
| struct_scaler.pkl | Feature scaler |
Training
Full training pipeline available at:
- Kaggle: https://www.kaggle.com/code/omchoksi04/thekavach
- GitHub: https://github.com/omchoksi04/TheKavach
Links
- Live API: https://thekavach.onrender.com
- API Docs: https://thekavach.onrender.com/docs
- Live Viewer: https://thekavach.onrender.com/viewer
License
MIT License