---
license: mit
language:
- en
auto_detected: true
datasets:
- Canstralian/pentesting_dataset
- Canstralian/Wordlists
- Canstralian/ShellCommands
- Canstralian/CyberExploitDB
- Chemically-motivated/CyberSecurityDataset
- Chemically-motivated/AI-Agent-Generating-Tool-Debugging-Prompt-Library
metrics:
- accuracy
- precision
- f1
- code_eval
base_model:
- WhiteRabbitNeo/WhiteRabbitNeo-33B-v1.5
library_name: transformers
pipeline_tag: text-classification
tags:
- code
---
# CyberAttackDetection
## Overview
The **CyberAttackDetection** model is a fine-tuned BERT-based sequence classification model designed to identify cyberattacks in textual descriptions. It classifies input data into two categories:
- **Attack (1)**: The text describes a cybersecurity threat or attack.
- **Non-Attack (0)**: The text does not describe a cybersecurity threat.
---
## Model Details
- **License**: [MIT License](LICENSE)
- **Datasets**:
- Custom cybersecurity datasets:
- `Canstralian/pentesting_dataset`
- `Canstralian/Wordlists`
- `Canstralian/ShellCommands`
- `Canstralian/CyberExploitDB`
- `Chemically-motivated/CyberSecurityDataset`
- `Chemically-motivated/AI-Agent-Generating-Tool-Debugging-Prompt-Library`
- **Language**: English
- **Metrics**:
- **Accuracy**: 85%
- **F1 Score**: 0.83
- **Precision**: 0.80
- **Recall**: 0.87
- **Base Model**: `WhiteRabbitNeo/WhiteRabbitNeo-33B-v1.5`
- **Pipeline Tag**: `text-classification`
- **Library Name**: `transformers`
- **Tags**: `cybersecurity`, `text-classification`, `attack-detection`, `BERT`
- **Version**: `v1.0.0`
- **Auto-Detected Features**: True
---
## Model Usage
### Installation
Before using the model, ensure the necessary dependencies are installed:
```bash
pip install transformers torch
```
### Example Code
Use the following Python code to load the model and classify a sample text:
```python
import torch
from transformers import AutoModelForSequenceClassification, AutoTokenizer

# Load the fine-tuned model and tokenizer
model = AutoModelForSequenceClassification.from_pretrained("Canstralian/CyberAttackDetection")
tokenizer = AutoTokenizer.from_pretrained("Canstralian/CyberAttackDetection")
model.eval()  # disable dropout for deterministic inference

# Example input: cyberattack description
text = "A vulnerability was discovered in the server software."

# Tokenize the input
inputs = tokenizer(text, return_tensors="pt")

# Get model predictions without tracking gradients
with torch.no_grad():
    outputs = model(**inputs)

# Predict the label (1 = attack, 0 = non-attack)
prediction = outputs.logits.argmax(dim=-1)
print(f"Prediction: {'Attack' if prediction.item() == 1 else 'Non-Attack'}")
```
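The `argmax` call yields only a hard label. If you also want a confidence score, the logits can be converted to probabilities with a softmax. A minimal plain-Python sketch of that step (the logit values here are made up for illustration):

```python
import math

def softmax(logits):
    """Convert raw logits to probabilities that sum to 1."""
    # Subtract the max logit for numerical stability before exponentiating.
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

# Hypothetical logits for [Non-Attack, Attack] from a single input
logits = [-1.2, 2.3]
probs = softmax(logits)
label = "Attack" if probs[1] > probs[0] else "Non-Attack"
print(f"{label} (confidence: {probs[1]:.2f})")
```

In practice you would pass `outputs.logits[0].tolist()` from the example above into `softmax` (or use `torch.softmax` directly).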
## Prompts
The following prompt templates pair the model with common vulnerability-analysis tasks; replace each bracketed placeholder with your own input:
- Open Ports: "Analyze the following network scan report and identify open ports and their associated vulnerabilities. Suggest best practices to secure these ports: [Insert network scan report]."
- Outdated Software or Services: "Given this list of installed software and services, identify outdated versions and known vulnerabilities. Provide recommendations for updates or patches to mitigate risks: [Insert software and service list]."
- Default Credentials: "Scan the following system configurations for any use of default credentials. Provide a list of affected services and recommendations for securing these credentials: [Insert system configuration details]."
- Misconfigurations: "Evaluate the provided system configuration for potential misconfigurations. Highlight risks and provide recommendations for secure setup: [Insert system configuration details]."
- Injection Flaws: "Review the given web application code or request logs and identify potential injection vulnerabilities such as SQL injection, command injection, or XSS. Provide remediation steps: [Insert code or logs]."
- Unencrypted Services: "Analyze the following network configuration and identify services that are transmitting data without encryption. Suggest strategies to enforce secure transmission: [Insert network configuration details]."
- Known Software Vulnerabilities: "Review the provided software inventory and cross-reference it with known vulnerabilities in the National Vulnerability Database (NVD). Recommend patches or workarounds: [Insert software inventory]."
- Cross-Site Request Forgery (CSRF): "Examine the provided web application code for potential CSRF vulnerabilities. Suggest specific coding or configuration techniques to prevent these attacks: [Insert code]."
- Insecure Direct Object References (IDOR): "Analyze the provided API endpoints and their associated access controls. Identify any IDOR vulnerabilities and suggest secure implementation strategies: [Insert API endpoint details]."
- Security Misconfigurations in Web Servers/Applications: "Assess the given web server configuration for security misconfigurations, such as improper HTTP headers or verbose error messages. Recommend changes to harden the server: [Insert server configuration]."
- Broken Authentication and Session Management: "Review the provided authentication and session management implementation. Identify weaknesses and recommend strategies to prevent compromise: [Insert authentication/session management details]."
- Sensitive Data Exposure: "Analyze the system's data handling processes and storage practices to identify potential sensitive data exposure. Recommend measures to protect sensitive information: [Insert system details]."
- API Vulnerabilities: "Examine the following API documentation and implementation for vulnerabilities, including insecure endpoints and data leakage. Provide recommendations for securing the API: [Insert API documentation]."
- Denial of Service (DoS) Vulnerabilities: "Review the system's architecture and configuration for potential vulnerabilities to DoS attacks. Suggest mitigation strategies such as rate limiting and load balancing: [Insert system architecture]."
- Buffer Overflows: "Analyze the provided code or application for buffer overflow vulnerabilities. Highlight potential weak points and recommend secure coding practices to prevent exploitation: [Insert code]."
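Each prompt above ends with a placeholder such as `[Insert code]`. A small helper can fill those placeholders programmatically before sending the prompt to the model. The template keys and helper below are illustrative, not part of the model's API:

```python
# Two of the templates above, with the bracketed placeholder made a format field
PROMPTS = {
    "open_ports": (
        "Analyze the following network scan report and identify open ports "
        "and their associated vulnerabilities. Suggest best practices to "
        "secure these ports: {artifact}"
    ),
    "injection_flaws": (
        "Review the given web application code or request logs and identify "
        "potential injection vulnerabilities such as SQL injection, command "
        "injection, or XSS. Provide remediation steps: {artifact}"
    ),
}

def build_prompt(task: str, artifact: str) -> str:
    """Fill a prompt template with the artifact to analyze."""
    return PROMPTS[task].format(artifact=artifact)

prompt = build_prompt("open_ports", "22/tcp open ssh; 23/tcp open telnet")
print(prompt)
```

The resulting string can then be tokenized and passed to the model exactly as in the usage example above.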
## Model Training Details
### Training Objective
The model was fine-tuned to classify descriptive text as either an attack or non-attack event. It uses a **binary classification** approach.
### Training Data
- The training data includes cybersecurity-related attack descriptions and non-attack examples from curated datasets.
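Before fine-tuning, each example from the datasets above needs the binary label described in the Overview (1 = attack, 0 = non-attack). A minimal sketch of that labeling step; the example texts and helper are hypothetical:

```python
# Raw examples paired with a boolean flag indicating attack-related text
raw_examples = [
    ("SQL injection detected in the login form", True),
    ("Scheduled maintenance completed successfully", False),
]

def to_training_pairs(examples):
    """Convert (text, is_attack) tuples into (text, label) pairs
    using the model's label convention: 1 = attack, 0 = non-attack."""
    return [(text, 1 if is_attack else 0) for text, is_attack in examples]

pairs = to_training_pairs(raw_examples)
print(pairs[0])  # → ('SQL injection detected in the login form', 1)
```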
---
## Evaluation
The model was evaluated on a balanced test set using the following metrics:
- **Accuracy**: 85%
- **F1 Score**: 0.83
- **Precision**: 0.80
- **Recall**: 0.87
These results indicate strong performance in detecting cyberattacks from text.
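For reference, these four metrics derive from the confusion-matrix counts in the standard way. A small pure-Python check (the counts below are invented for illustration and are not the model's actual test results):

```python
def classification_metrics(tp, fp, fn, tn):
    """Compute accuracy, precision, recall, and F1 from confusion-matrix counts."""
    accuracy = (tp + tn) / (tp + fp + fn + tn)
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    f1 = 2 * precision * recall / (precision + recall)
    return accuracy, precision, recall, f1

# Invented counts, chosen only to illustrate the formulas
acc, prec, rec, f1 = classification_metrics(tp=87, fp=22, fn=13, tn=78)
print(f"accuracy={acc:.2f} precision={prec:.2f} recall={rec:.2f} f1={f1:.2f}")
```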
---
## License
This project is licensed under the **MIT License**. Refer to the [LICENSE](LICENSE) file for details.
---
## How to Contribute
We welcome contributions!
- **Submit Issues**: If you encounter problems, open an issue on the repository.
- **Pull Requests**: Feel free to contribute code improvements or documentation updates.
---
## Contact
For further information or inquiries, contact: **canstralian@cybersecurity.com**