Update README.md
```python
nlp = pipeline("text-classification", model=model_name)

# Example usage
result = nlp("Example shell command or exploit input")
print(result)
```
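A text-classification pipeline returns a list of dictionaries, one per input, each holding a `label` and a confidence `score`. A minimal post-processing sketch follows; the `MALICIOUS` label, the scores, and the `0.9` threshold are illustrative assumptions, not outputs documented for this model:

```python
# Illustrative pipeline output: one dict per input with a label and a confidence
# score (the label name and values here are hypothetical).
result = [{"label": "MALICIOUS", "score": 0.97}]

ALERT_THRESHOLD = 0.9  # hypothetical cutoff; tune for your deployment
top = result[0]
# Flag inputs whose top label indicates an exploit with high confidence.
is_alert = top["label"] == "MALICIOUS" and top["score"] >= ALERT_THRESHOLD
print(f"label={top['label']} score={top['score']:.2f} alert={is_alert}")
```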
## Training Details

### Training Data

The model was fine-tuned on the following datasets:
- Canstralian/ShellCommands: A collection of shell commands used in cybersecurity contexts.
- Canstralian/CyberExploitDB: A curated set of known exploits and vulnerabilities.

Further details on the preprocessing of these datasets can be found in their respective dataset cards.
## Training Procedure

### Preprocessing

The data was preprocessed to remove any sensitive or personally identifiable information. Text normalization and tokenization were applied to ensure consistency across the datasets.
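The normalization step might look like the following sketch. The IPv4-redaction rule is a hypothetical example of sensitive-data removal, not the card's documented procedure:

```python
import re

def normalize(text: str) -> str:
    """Lowercase, redact IPv4 addresses (a hypothetical PII rule), and collapse whitespace."""
    text = text.lower().strip()
    # Replace anything shaped like an IPv4 address with a placeholder token.
    text = re.sub(r"\b(?:\d{1,3}\.){3}\d{1,3}\b", "<ip>", text)
    # Collapse runs of whitespace to a single space for consistent tokenization.
    return re.sub(r"\s+", " ", text)

print(normalize("  Ping   192.168.0.1  NOW "))  # → "ping <ip> now"
```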
### Training Hyperparameters

- Training regime: fp16 mixed precision
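With the transformers `Trainer` API, fp16 mixed precision is enabled through a single flag on `TrainingArguments`. A hedged configuration sketch; the output directory, batch size, and epoch count are illustrative assumptions, not the values used for this model:

```python
from transformers import TrainingArguments

# Illustrative settings; only fp16=True reflects the training regime stated above.
args = TrainingArguments(
    output_dir="./results",          # hypothetical path
    per_device_train_batch_size=16,  # assumed value
    num_train_epochs=3,              # assumed value
    fp16=True,                       # fp16 mixed precision, as documented
)
```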
## Evaluation

### Testing Data, Factors & Metrics

Testing was performed on both synthetic and real-world shell command and exploit datasets, focusing on the model's ability to correctly parse shell commands and identify exploit signatures.
## Factors

The evaluation factors included:

- Model performance across different types of shell commands and exploits.
- Accuracy, precision, recall, and F1-score in detecting known exploits.
## Metrics

Metrics used for evaluation include:

- Accuracy: The percentage of correct predictions made by the model.
- Precision: The proportion of retrieved instances that are relevant.
- Recall: The proportion of relevant instances that are retrieved.
- F1-score: The harmonic mean of precision and recall.
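All four metrics can be computed directly from paired gold and predicted labels. A minimal sketch assuming a binary `exploit`/`benign` labelling (the label names are illustrative):

```python
def classification_metrics(y_true, y_pred, positive="exploit"):
    """Compute accuracy, precision, recall, and F1 for a binary labelling task."""
    tp = sum(t == positive and p == positive for t, p in zip(y_true, y_pred))
    fp = sum(t != positive and p == positive for t, p in zip(y_true, y_pred))
    fn = sum(t == positive and p != positive for t, p in zip(y_true, y_pred))
    accuracy = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return accuracy, precision, recall, f1
```

For example, with gold labels `["exploit", "benign", "exploit", "benign"]` and predictions `["exploit", "exploit", "benign", "benign"]`, all four metrics come out to 0.5.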
## Results

The model performs well on standard shell command parsing tasks and exploit detection, with high accuracy for common exploits. However, its performance may degrade on newer or less common exploits.
## Summary

The model is well-suited for cybersecurity applications involving shell command parsing and exploit detection. While it excels in these areas, users should monitor its performance on emerging threats and unusual attack patterns.