---
license: mit
language:
- en
auto_detected: true
datasets:
- Canstralian/pentesting_dataset
- Canstralian/Wordlists
- Canstralian/ShellCommands
- Canstralian/CyberExploitDB
- Chemically-motivated/CyberSecurityDataset
- Chemically-motivated/AI-Agent-Generating-Tool-Debugging-Prompt-Library
metrics:
- accuracy
- precision
- f1
- code_eval
base_model:
- WhiteRabbitNeo/WhiteRabbitNeo-33B-v1.5
library_name: transformers
pipeline_tag: text-classification
tags:
- code
---
# CyberAttackDetection
## Overview
The **CyberAttackDetection** model is a fine-tuned BERT-based sequence classification model designed to identify cyberattacks in textual descriptions. It classifies input data into two categories:
- **Attack (1)**: The text describes a cybersecurity threat or attack.
- **Non-Attack (0)**: The text does not describe a cybersecurity threat.
---
## Model Details
- **License**: [MIT License](LICENSE)
- **Datasets**:
- Custom cybersecurity datasets:
- `Canstralian/pentesting_dataset`
- `Canstralian/Wordlists`
- `Canstralian/ShellCommands`
- `Canstralian/CyberExploitDB`
- `Chemically-motivated/CyberSecurityDataset`
- `Chemically-motivated/AI-Agent-Generating-Tool-Debugging-Prompt-Library`
- **Language**: English
- **Metrics**:
- **Accuracy**: 85%
- **F1 Score**: 0.83
- **Precision**: 0.80
- **Recall**: 0.87
- **Base Model**: `WhiteRabbitNeo/WhiteRabbitNeo-33B-v1.5`
- **Pipeline Tag**: `text-classification`
- **Library Name**: `transformers`
- **Tags**: `cybersecurity`, `text-classification`, `attack-detection`, `BERT`
- **Version**: `v1.0.0`
- **Auto-Detected Features**: True
---
## Model Usage
### Installation
Before using the model, ensure the necessary dependencies are installed:
```bash
pip install transformers torch
```
### Example Code
Use the following Python code to load the model and classify a sample text:
```python
import torch
from transformers import AutoModelForSequenceClassification, AutoTokenizer

# Load the fine-tuned model and tokenizer
model = AutoModelForSequenceClassification.from_pretrained("Canstralian/CyberAttackDetection")
tokenizer = AutoTokenizer.from_pretrained("Canstralian/CyberAttackDetection")
model.eval()  # disable dropout for deterministic inference

# Example input: cyberattack description
text = "A vulnerability was discovered in the server software."

# Tokenize the input
inputs = tokenizer(text, return_tensors="pt")

# Get model predictions without tracking gradients
with torch.no_grad():
    outputs = model(**inputs)

# Predict the label (1 = attack, 0 = non-attack)
prediction = outputs.logits.argmax(dim=-1)
print(f"Prediction: {'Attack' if prediction.item() == 1 else 'Non-Attack'}")
```
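The `argmax` call yields only a hard label. If you also want a confidence score, the logits can be converted to probabilities with a softmax. A minimal plain-Python sketch of that step (the logit values here are made up for illustration):

```python
import math

def softmax(logits):
    """Convert raw logits to probabilities that sum to 1."""
    # Subtract the max logit for numerical stability before exponentiating.
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

# Hypothetical logits for [Non-Attack, Attack] from a single input
logits = [-1.2, 2.3]
probs = softmax(logits)
label = "Attack" if probs[1] > probs[0] else "Non-Attack"
print(f"{label} (confidence: {probs[1]:.2f})")
```

In practice you would pass `outputs.logits[0].tolist()` from the example above into `softmax` (or use `torch.softmax` directly).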
## Prompts
The following prompt templates pair the model with common vulnerability-analysis tasks; replace each bracketed placeholder with your own input:
- Open Ports: "Analyze the following network scan report and identify open ports and their associated vulnerabilities. Suggest best practices to secure these ports: [Insert network scan report]."
- Outdated Software or Services: "Given this list of installed software and services, identify outdated versions and known vulnerabilities. Provide recommendations for updates or patches to mitigate risks: [Insert software and service list]."
- Default Credentials: "Scan the following system configurations for any use of default credentials. Provide a list of affected services and recommendations for securing these credentials: [Insert system configuration details]."
- Misconfigurations: "Evaluate the provided system configuration for potential misconfigurations. Highlight risks and provide recommendations for secure setup: [Insert system configuration details]."
- Injection Flaws: "Review the given web application code or request logs and identify potential injection vulnerabilities such as SQL injection, command injection, or XSS. Provide remediation steps: [Insert code or logs]."
- Unencrypted Services: "Analyze the following network configuration and identify services that are transmitting data without encryption. Suggest strategies to enforce secure transmission: [Insert network configuration details]."
- Known Software Vulnerabilities: "Review the provided software inventory and cross-reference it with known vulnerabilities in the National Vulnerability Database (NVD). Recommend patches or workarounds: [Insert software inventory]."
- Cross-Site Request Forgery (CSRF): "Examine the provided web application code for potential CSRF vulnerabilities. Suggest specific coding or configuration techniques to prevent these attacks: [Insert code]."
- Insecure Direct Object References (IDOR): "Analyze the provided API endpoints and their associated access controls. Identify any IDOR vulnerabilities and suggest secure implementation strategies: [Insert API endpoint details]."
- Security Misconfigurations in Web Servers/Applications: "Assess the given web server configuration for security misconfigurations, such as improper HTTP headers or verbose error messages. Recommend changes to harden the server: [Insert server configuration]."
- Broken Authentication and Session Management: "Review the provided authentication and session management implementation. Identify weaknesses and recommend strategies to prevent compromise: [Insert authentication/session management details]."
- Sensitive Data Exposure: "Analyze the system's data handling processes and storage practices to identify potential sensitive data exposure. Recommend measures to protect sensitive information: [Insert system details]."
- API Vulnerabilities: "Examine the following API documentation and implementation for vulnerabilities, including insecure endpoints and data leakage. Provide recommendations for securing the API: [Insert API documentation]."
- Denial of Service (DoS) Vulnerabilities: "Review the system's architecture and configuration for potential vulnerabilities to DoS attacks. Suggest mitigation strategies such as rate limiting and load balancing: [Insert system architecture]."
- Buffer Overflows: "Analyze the provided code or application for buffer overflow vulnerabilities. Highlight potential weak points and recommend secure coding practices to prevent exploitation: [Insert code]."
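Each prompt above ends with a placeholder such as `[Insert code]`. A small helper can fill those placeholders programmatically before sending the prompt to the model. The template keys and helper below are illustrative, not part of the model's API:

```python
# Two of the templates above, with the bracketed placeholder made a format field
PROMPTS = {
    "open_ports": (
        "Analyze the following network scan report and identify open ports "
        "and their associated vulnerabilities. Suggest best practices to "
        "secure these ports: {artifact}"
    ),
    "injection_flaws": (
        "Review the given web application code or request logs and identify "
        "potential injection vulnerabilities such as SQL injection, command "
        "injection, or XSS. Provide remediation steps: {artifact}"
    ),
}

def build_prompt(task: str, artifact: str) -> str:
    """Fill a prompt template with the artifact to analyze."""
    return PROMPTS[task].format(artifact=artifact)

prompt = build_prompt("open_ports", "22/tcp open ssh; 23/tcp open telnet")
print(prompt)
```

The resulting string can then be tokenized and passed to the model exactly as in the usage example above.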
## Model Training Details
### Training Objective
The model was fine-tuned to classify descriptive text as either an attack or non-attack event. It uses a **binary classification** approach.
### Training Data
- The training data includes cybersecurity-related attack descriptions and non-attack examples from curated datasets.
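Before fine-tuning, each example from the datasets above needs the binary label described in the Overview (1 = attack, 0 = non-attack). A minimal sketch of that labeling step; the example texts and helper are hypothetical:

```python
# Raw examples paired with a boolean flag indicating attack-related text
raw_examples = [
    ("SQL injection detected in the login form", True),
    ("Scheduled maintenance completed successfully", False),
]

def to_training_pairs(examples):
    """Convert (text, is_attack) tuples into (text, label) pairs
    using the model's label convention: 1 = attack, 0 = non-attack."""
    return [(text, 1 if is_attack else 0) for text, is_attack in examples]

pairs = to_training_pairs(raw_examples)
print(pairs[0])  # → ('SQL injection detected in the login form', 1)
```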
---
## Evaluation
The model was evaluated on a balanced test set using the following metrics:
- **Accuracy**: 85%
- **F1 Score**: 0.83
- **Precision**: 0.80
- **Recall**: 0.87
These results indicate strong performance in detecting cyberattacks from text.
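For reference, these four metrics derive from the confusion-matrix counts in the standard way. A small pure-Python check (the counts below are invented for illustration and are not the model's actual test results):

```python
def classification_metrics(tp, fp, fn, tn):
    """Compute accuracy, precision, recall, and F1 from confusion-matrix counts."""
    accuracy = (tp + tn) / (tp + fp + fn + tn)
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    f1 = 2 * precision * recall / (precision + recall)
    return accuracy, precision, recall, f1

# Invented counts, chosen only to illustrate the formulas
acc, prec, rec, f1 = classification_metrics(tp=87, fp=22, fn=13, tn=78)
print(f"accuracy={acc:.2f} precision={prec:.2f} recall={rec:.2f} f1={f1:.2f}")
```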
---
## License
This project is licensed under the **MIT License**. Refer to the [LICENSE](LICENSE) file for details.
---
## How to Contribute
We welcome contributions!
- **Submit Issues**: If you encounter problems, open an issue on the repository.
- **Pull Requests**: Feel free to contribute code improvements or documentation updates.
---
## Contact
For further information or inquiries, contact: **canstralian@cybersecurity.com**