---
license: apache-2.0
language:
- en
base_model:
- protectai/deberta-v3-base-prompt-injection-v2
pipeline_tag: text-classification
tags:
- security
- prompt
- cyber-security
- llm-security
- prompt-injection
- command-injection
library_name: transformers
---

# Command Injection Detector

A fine-tuned DeBERTa model for detecting command injection attacks in prompts before they reach an LLM.

## Overview

This model is part of [PromptWAF](https://github.com/edaerer/promptwaf), a multi-layered, ML-based Web Application Firewall designed to detect and block prompt injection attacks.

The model identifies prompts containing shell command execution patterns (`; rm -rf`, `| cat /etc/passwd`, `$(whoami)`, backtick execution, etc.) that are commonly used in command injection attacks.

## Model Details

- **Architecture**: DeBERTa-v3 (base), fine-tuned from `protectai/deberta-v3-base-prompt-injection-v2`
- **Task**: Binary sequence classification
- **Training Data**: Custom, internally curated command injection dataset
- **Labels**:
  - `0` → Safe/Benign
  - `1` → Command Injection Attack

## Usage

### With PromptWAF

```bash
# Automatically used in PromptWAF via .env configuration
CMD_INJECTION_MODEL_DIR=edaerer/promptwaf-command-injection
```

### Standalone

```python
from transformers import AutoTokenizer, AutoModelForSequenceClassification
import torch

model_id = "edaerer/promptwaf-command-injection"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForSequenceClassification.from_pretrained(model_id)

text = "List files; rm -rf / --no-preserve-root"
# Truncate to the 256-token input limit listed under Performance
inputs = tokenizer(text, return_tensors="pt", truncation=True, max_length=256)

# Inference only, so skip gradient tracking
with torch.no_grad():
    outputs = model(**inputs)

probabilities = torch.softmax(outputs.logits, dim=-1)
score = probabilities[0][1].item()  # Probability of label 1 (command injection)

print(f"Command Injection Risk: {score:.2%}")
```

## Performance

- **Threshold**: 0.5 (adjustable in PromptWAF)
- **Input**: Up to 256 tokens

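As a rough sketch of how these two settings could be applied outside PromptWAF, the helpers below truncate input to 256 tokens and compare the label-`1` probability against the threshold. The names `injection_probability`, `is_command_injection`, and `score_text` are illustrative and not part of PromptWAF. Treating the threshold as an inclusive cutoff on the label-`1` probability is an assumption based on the standalone example, not documented behavior.

```python
import math

MAX_TOKENS = 256   # input limit stated in this card
THRESHOLD = 0.5    # default threshold stated in this card


def injection_probability(logits):
    """Softmax over [safe_logit, injection_logit]; returns P(label 1)."""
    peak = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - peak) for x in logits]
    return exps[1] / sum(exps)


def is_command_injection(logits, threshold=THRESHOLD):
    """Flag the prompt when P(label 1) reaches the threshold (assumed inclusive)."""
    return injection_probability(logits) >= threshold


def score_text(text, model_id="edaerer/promptwaf-command-injection"):
    """Tokenize with truncation, run the model, return (probability, flagged)."""
    # Imported lazily so the pure-Python helpers above work without these packages.
    import torch
    from transformers import AutoModelForSequenceClassification, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(model_id)
    model = AutoModelForSequenceClassification.from_pretrained(model_id)
    inputs = tokenizer(text, return_tensors="pt", truncation=True, max_length=MAX_TOKENS)
    with torch.no_grad():
        logits = model(**inputs).logits[0].tolist()
    return injection_probability(logits), is_command_injection(logits)
```

Because truncation discards everything past the first 256 tokens, a payload placed late in a long prompt would not be scored; splitting long inputs into chunks and scoring each one is a possible workaround.
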
## Integration

This model is designed to work with:

- **PromptWAF**: the main security orchestrator
- **Hugging Face Transformers**: for inference
- Any standard sequence classification pipeline

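For use through a standard sequence classification pipeline, a minimal sketch with the Transformers `pipeline` API might look like the following. The helper names `pick_score` and `score_prompts` are illustrative. The label strings come from the model's `id2label` config, which this card does not document, so the code looks up the name for id `1` at run time instead of hard-coding it.

```python
def pick_score(label_scores, positive_label):
    """Return the score for positive_label from one pipeline result.

    label_scores is a list like [{"label": ..., "score": ...}, ...],
    the per-input shape returned when the pipeline is built with top_k=None.
    """
    for entry in label_scores:
        if entry["label"] == positive_label:
            return entry["score"]
    raise KeyError(f"label {positive_label!r} not in pipeline output")


def score_prompts(prompts, model_id="edaerer/promptwaf-command-injection"):
    """Score a list of prompts; returns one label-1 probability per prompt."""
    # Imported lazily so pick_score can be used without transformers installed.
    from transformers import pipeline

    # top_k=None asks the pipeline for the scores of all labels, not just the best one.
    classifier = pipeline("text-classification", model=model_id, top_k=None)
    # Name of label id 1 (command injection per this card), read from the config.
    positive_label = classifier.model.config.id2label[1]
    results = classifier(list(prompts), truncation=True, max_length=256)
    return [pick_score(result, positive_label) for result in results]
```

Calling `score_prompts(["List files; rm -rf / --no-preserve-root"])` would download the model on first use and return a single-element list of probabilities.
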
## Citation

```bibtex
@software{promptwaf2026,
  author = {Erer, Eda and Odabasi, Talha},
  title = {PromptWAF: A Multi-Layered ML Defense for LLM Prompt Security},
  year = {2026},
  url = {https://github.com/edaerer/promptwaf}
}
```

## License

Apache License 2.0

---

For more information, visit the [PromptWAF GitHub repository](https://github.com/edaerer/promptwaf).