nanda-rani committed on
Commit abfb4e5 · verified · 1 Parent(s): 0c3cf44

Update README.md

Files changed (1): README.md (+89, −40)
README.md CHANGED
---
library_name: transformers
license: mit
language:
- en
---

# Model Card for TTPXHunter

<!-- Provide a quick summary of what the model is/does. -->
The **TTPXHunter** model automates the extraction of actionable threat intelligence by identifying **Tactics, Techniques, and Procedures (TTPs)** in unstructured narrative threat reports. Using natural language processing (NLP), it detects adversarial tactics and techniques in accordance with established frameworks such as MITRE ATT&CK. Predictions are filtered against a confidence threshold so that only high-confidence TTPs are retained, then mapped to predefined labels that turn them into actionable insights for cybersecurity teams. This automation improves the speed and accuracy of threat intelligence gathering, enabling timely and effective responses to emerging threats.

## Model Details

### Model Description

<!-- Provide a longer summary of what this model is. -->
**TTPXHunter** automates the extraction of actionable threat intelligence from unstructured cybersecurity reports, focusing on **Tactics, Techniques, and Procedures (TTPs)**: the strategies, methods, and activities cyber adversaries use during attacks. Threat reports produced by researchers and intelligence units are dense with information but written in narrative form, which makes manual extraction slow and error-prone. **TTPXHunter** addresses this challenge by applying **natural language processing (NLP)** and **machine learning** to analyze reports automatically and surface the key components of adversary behavior.

At its core, TTPXHunter tokenizes the raw text of a threat report, breaking it into manageable pieces for analysis, and then detects the TTPs embedded in the narrative. These TTPs are crucial for understanding how an attack unfolds, since they align with behaviors catalogued in widely adopted frameworks such as **MITRE ATT&CK**, which organizes adversary behavior into tactics and techniques.

TTPXHunter goes beyond simple text extraction by incorporating a **prediction filtering mechanism**: a confidence threshold is applied to the predicted TTPs so that only those with a high degree of certainty are retained. This filtering reduces noise and focuses the output on the most relevant, actionable insights.

After filtering, the remaining TTPs are mapped to predefined labels through a mapping such as **id2label**, translating the extracted information into structured, actionable intelligence. Because these labels correspond to industry-standard classifications, the findings integrate easily into existing threat-analysis workflows; for example, a detected technique can be mapped directly to its **MITRE ATT&CK** counterpart, letting security teams quickly correlate the intelligence with known adversary activity.

The final output is a set of unique TTP identifiers and their corresponding names, giving a comprehensive view of the adversary's strategies, techniques, and methods. By automating extraction and mapping, **TTPXHunter** significantly reduces the manual effort of analyzing narrative reports, accelerates threat detection, and improves the accuracy of intelligence gathering, making it a valuable asset in the modern cybersecurity landscape.
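The threshold-and-map flow described above can be sketched in a few lines of Python. Everything here is illustrative: the label names, the mock logits, and the 0.9 threshold are assumptions, not the model's real configuration (in practice the label map would come from `model.config.id2label`).

```python
import math

# Hypothetical label map; the real one comes from model.config.id2label.
id2label = {0: "O", 1: "T1566: Phishing", 2: "T1059: Command and Scripting Interpreter"}

def softmax(logits):
    """Convert raw logits into probabilities."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def filter_predictions(sentence_logits, threshold=0.9):
    """Keep only high-confidence TTP predictions and map them to labels."""
    ttps = set()
    for logits in sentence_logits:
        probs = softmax(logits)
        best = max(range(len(probs)), key=probs.__getitem__)
        # Drop low-confidence predictions and the non-TTP ("O") class.
        if probs[best] >= threshold and id2label[best] != "O":
            ttps.add(id2label[best])
    return sorted(ttps)

# Mock per-sentence logits standing in for real model outputs.
logits = [[0.1, 6.0, 0.2],   # confident phishing prediction -> kept
          [2.0, 1.9, 1.8]]   # ambiguous prediction -> filtered out
print(filter_predictions(logits))  # -> ['T1566: Phishing']
```

The same pattern applies regardless of how the logits are produced: only predictions that clear the threshold survive, and the rest are treated as noise.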
This is the model card of a 🤗 transformers model that has been pushed to the Hub. This model card has been automatically generated.

- **Developed by:** Nanda Rani
<!-- - **Funded by [optional]:** [More Information Needed] -->
<!-- - **Shared by [optional]:** [More Information Needed] -->
<!-- - **Model type:** [More Information Needed] -->
<!-- - **Language(s) (NLP):** [More Information Needed] -->
<!-- - **License:** [More Information Needed] -->
<!-- - **Finetuned from model [optional]:** [More Information Needed] -->

### Model Sources [optional]

<!-- Provide the basic links for the model. -->

- **Repository:** [https://github.com/nanda-rani/TTPXHunter-Actionable-Threat-Intelligence-Extraction-as-TTPs-from-Finished-Cyber-Threat-Reports](https://github.com/nanda-rani/TTPXHunter-Actionable-Threat-Intelligence-Extraction-as-TTPs-from-Finished-Cyber-Threat-Reports)
- **Paper [optional]:** [https://doi.org/10.1145/3696427](https://doi.org/10.1145/3696427)
<!-- - **Demo [optional]:** [More Information Needed] -->

<!-- ## Uses -->

<!-- Address questions around how the model is intended to be used, including the foreseeable users of the model and those affected by the model. -->

<!-- ### Direct Use -->

<!-- This section is for the model use without fine-tuning or plugging into a larger ecosystem/app. -->

<!-- [More Information Needed] -->

<!-- ### Downstream Use [optional] -->

<!-- This section is for the model use when fine-tuned for a task, or when plugged into a larger ecosystem/app -->

<!-- ## Model Usage: Fine-Tuning and Integration into Larger Systems -->

### Fine-Tuning TTPXHunter for Specific Tasks

The **TTPXHunter** model can be fine-tuned for specific cybersecurity tasks, making it adaptable to a range of threat intelligence scenarios. Fine-tuning on domain-specific threat reports, or on reports focused on particular threat actors, sectors, or techniques, can significantly improve the accuracy and relevance of TTP extraction.

Fine-tuning may involve retraining TTPXHunter on specialized datasets such as:
- **Industry-specific threat reports**: for example, threat intelligence from telecom, healthcare, or finance, each of which may emphasize different TTPs.
- **Region-specific threats**: reports on regional adversaries or geopolitically motivated cyber attacks.
- **Emerging techniques**: data capturing newly observed attack vectors or novel techniques.

Fine-tuning allows **TTPXHunter** to perform more effectively in niche areas, adapting the model to the nuances of a specific threat landscape. A fine-tuned TTPXHunter provides more targeted intelligence, helping security teams stay one step ahead of adversaries that focus on particular industries or regions.
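Fine-tuning on any of the datasets above requires sentence-level TTP labels. The sketch below shows one plausible way to prepare such data; the example sentences and technique IDs are hypothetical, and the resulting records are in the `{"text": ..., "label": ...}` shape that a 🤗 `Trainer` with `AutoModelForSequenceClassification` would then tokenize and fine-tune on.

```python
# Hypothetical labelled sentences: each maps to a MITRE ATT&CK technique ID.
raw = [
    ("The actor delivered a spearphishing attachment.", "T1566"),
    ("PowerShell was invoked to fetch the second-stage payload.", "T1059"),
    ("Victims received a malicious link via email.", "T1566"),
]

# Build the label2id/id2label maps a sequence-classification config stores.
labels = sorted({ttp for _, ttp in raw})
label2id = {ttp: i for i, ttp in enumerate(labels)}
id2label = {i: ttp for ttp, i in label2id.items()}

# Integer-encoded records ready for tokenization and training.
dataset = [{"text": sentence, "label": label2id[ttp]} for sentence, ttp in raw]
print(label2id)    # -> {'T1059': 0, 'T1566': 1}
print(dataset[0])
```

Keeping `label2id`/`id2label` in the model config ensures the fine-tuned checkpoint maps its outputs back to technique IDs the same way the base model does.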
### Integrating TTPXHunter into Larger Ecosystems or Applications

**TTPXHunter** can also serve as a core component of a larger cybersecurity ecosystem or application. Its ability to automatically extract and map TTPs suits several roles:

- **Threat Intelligence Platforms (TIPs)**: plugged into a TIP, **TTPXHunter** automatically enriches incoming threat reports with actionable intelligence, accelerating the correlation of new information with known attack patterns.
- **Security Information and Event Management (SIEM) systems**: integrated with a SIEM, TTPXHunter can analyze logs, alerts, and threat reports in real time, generating enriched insights that aid threat hunting and incident response.
- **Endpoint Detection and Response (EDR) solutions**: in an EDR context, **TTPXHunter** can enhance detection by mapping endpoint behaviors and suspicious activity to specific TTPs, speeding identification of adversarial behavior and informing mitigation strategies.
- **Automated threat attribution systems**: in an attribution pipeline, **TTPXHunter** helps match TTPs from unstructured reports to known adversaries, improving the accuracy of links between incidents and threat actors.
- **Machine learning pipelines for threat prediction**: coupled with models for anomaly detection or predictive analytics, **TTPXHunter** can act as a feature extractor, contributing TTP-based intelligence that improves prediction accuracy.

Integrating **TTPXHunter** into these systems strengthens an organization's overall security posture, making detection and response more intelligent and actionable. Its outputs can also feed orchestration tools that automate the response to detected threats based on the extracted TTPs, enabling rapid mitigation of adversarial activity.
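As one concrete illustration of the feature-extractor role, a set of extracted TTP identifiers can be encoded as a fixed-length binary vector for a downstream anomaly-detection or attribution model. The technique vocabulary below is a small illustrative assumption, not the model's actual label set.

```python
# Illustrative technique vocabulary; a real system would use the full label set.
VOCAB = ["T1059", "T1071", "T1486", "T1566"]

def ttp_feature_vector(ttps):
    """Encode a set of extracted TTP IDs as a binary feature vector over VOCAB."""
    return [1 if technique in ttps else 0 for technique in VOCAB]

# Example: TTPs extracted from one report become one feature row.
print(ttp_feature_vector({"T1566", "T1059"}))  # -> [1, 0, 0, 1]
```

Stacking one such row per report yields a feature matrix that downstream models can consume directly.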
<!-- ### Out-of-Scope Use -->

<!-- This section addresses misuse, malicious use, and uses that the model will not work well for. -->

<!-- [More Information Needed] -->

## Bias, Risks, and Limitations

<!-- This section is meant to convey both technical and sociotechnical limitations. -->

<!-- [More Information Needed] -->

<!-- ### Recommendations -->

<!-- This section is meant to convey recommendations with respect to the bias, risk, and technical limitations. -->

<!-- Users (both direct and downstream) should be made aware of the risks, biases and limitations of the model. More information needed for further recommendations. -->

## How to Get Started with the Model

Run the notebook **TTPXHunter.ipynb** from the project repository on GitHub: [https://github.com/nanda-rani/TTPXHunter-Actionable-Threat-Intelligence-Extraction-as-TTPs-from-Finished-Cyber-Threat-Reports](https://github.com/nanda-rani/TTPXHunter-Actionable-Threat-Intelligence-Extraction-as-TTPs-from-Finished-Cyber-Threat-Reports)
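Independent of the notebook, the extraction loop can be sketched programmatically. The helper below runs any sentence classifier over a report and keeps unique, high-confidence TTP labels; the commented `pipeline` usage and checkpoint placeholder are assumptions, and the stub classifier stands in for the real model purely for illustration.

```python
def extract_ttps(sentences, classify, threshold=0.9):
    """Keep unique, high-confidence TTP labels in order of first appearance."""
    found = {}  # dict preserves insertion order (Python 3.7+)
    for sentence in sentences:
        pred = classify(sentence)  # expects {"label": ..., "score": ...}
        if pred["score"] >= threshold and pred["label"] != "O":
            found.setdefault(pred["label"])
    return list(found)

# With 🤗 transformers installed, a real classifier could be built along
# these lines (the checkpoint name is a placeholder, not a published id):
#   from transformers import pipeline
#   clf = pipeline("text-classification", model="<TTPXHunter checkpoint>")
#   classify = lambda s: clf(s)[0]

# Stub classifier used here for illustration only.
def classify(sentence):
    if "phishing" in sentence:
        return {"label": "T1566: Phishing", "score": 0.97}
    return {"label": "O", "score": 0.99}

print(extract_ttps(["A phishing email was sent.", "The report ends."], classify))
# -> ['T1566: Phishing']
```

Swapping the stub for a real pipeline leaves the filtering and deduplication logic unchanged.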

<!-- ## Training Details -->

### Training Data

<!-- This should link to a Dataset Card, perhaps with a short stub of information on what the training data is all about as well as documentation related to data pre-processing or additional filtering. -->

<!-- [More Information Needed] -->

### Training Procedure

<!-- This relates heavily to the Technical Specifications. Content here should link to that section when it is relevant to the training procedure. -->

<!-- #### Preprocessing [optional] -->

<!-- [More Information Needed] -->

- **Training regime:** [More Information Needed] <!--fp32, fp16 mixed precision, bf16 mixed precision, bf16 non-mixed precision, fp16 non-mixed precision, fp8 mixed precision -->

<!-- #### Speeds, Sizes, Times [optional] -->

<!-- This section provides information about throughput, start/end time, checkpoint size if relevant, etc. -->

<!-- [More Information Needed] -->

## Evaluation

<!-- This section describes the evaluation protocols and provides the results. -->

<!-- ### Testing Data, Factors & Metrics -->

#### Testing Data

<!-- This should link to a Dataset Card if possible. -->

<!-- [More Information Needed] -->

#### Factors

<!-- These are the things the evaluation is disaggregating by, e.g., subpopulations or domains. -->

<!-- [More Information Needed] -->

#### Metrics

<!-- These are the evaluation metrics being used, ideally with a description of why. -->

<!-- [More Information Needed] -->

### Results

<!-- Relevant interpretability work for the model goes here -->

<!-- [More Information Needed] -->

## Environmental Impact

<!-- Total emissions (in grams of CO2eq) and additional considerations, such as electricity usage, go here. Edit the suggested text below accordingly -->

<!-- Carbon emissions can be estimated using the [Machine Learning Impact calculator](https://mlco2.github.io/impact#compute) presented in [Lacoste et al. (2019)](https://arxiv.org/abs/1910.09700). -->

- **Hardware Type:** [More Information Needed]
- **Hours used:** [More Information Needed]

**BibTeX:**

```bibtex
@article{10.1145/3696427,
  author    = {Rani, Nanda and Saha, Bikash and Maurya, Vikas and Shukla, Sandeep Kumar},
  title     = {TTPXHunter: Actionable Threat Intelligence Extraction as TTPs from Finished Cyber Threat Reports},
  year      = {2024},
  publisher = {Association for Computing Machinery},
  address   = {New York, NY, USA},
  url       = {https://doi.org/10.1145/3696427},
  doi       = {10.1145/3696427},
  abstract  = {Understanding the modus operandi of adversaries aids organizations to employ efficient defensive strategies and share intelligence in the community. This knowledge is often present in unstructured natural language text within threat analysis reports. A translation tool is needed to interpret the modus operandi explained in the sentences of the threat report and convert it into a structured format. This research introduces a methodology named TTPXHunter for automated extraction of threat intelligence in terms of Tactics, Techniques, and Procedures (TTPs) from finished cyber threat reports. It leverages cyber domain-specific state-of-the-art natural language model to augment sentences for minority class TTPs and refine pinpointing the TTPs in threat analysis reports significantly. We create two datasets: an augmented sentence-TTP dataset of 39,296 sentence samples and a real-world cyber threat intelligence report-to-TTP dataset of 149 reports. Further, we evaluate TTPXHunter on the augmented sentence and report datasets. The TTPXHunter achieves the highest performance of 92.42\% f1-score on the augmented dataset, and it also outperforms existing state-of-the-art TTP extraction method by achieving an f1-score of 97.09\% when evaluated over the report dataset. TTPXHunter significantly improves cybersecurity threat intelligence by offering quick, actionable insights into attacker behaviors. This advancement automates threat intelligence analysis and provides a crucial tool for cybersecurity professionals to combat cyber threats.},
  note      = {Just Accepted},
  journal   = {Digital Threats},
  month     = {sep},
  keywords  = {Threat Intelligence, TTP Extraction, MITRE ATT\&CK, Natural Language Processing, Threat Intelligence Extraction, TTP Classification, Cyber Security and AI, Cyber Security Threats, NLP, Cybersecurity}
}
```

**APA:**

Nanda Rani, Bikash Saha, Vikas Maurya, and Sandeep Kumar Shukla. 2024. TTPXHunter: Actionable Threat Intelligence Extraction as TTPs from Finished Cyber Threat Reports. *Digital Threats*, Just Accepted (September 2024). https://doi.org/10.1145/3696427

<!-- ## Glossary [optional] -->

<!-- If relevant, include terms and calculations in this section that can help readers understand the model or model card. -->

<!-- [More Information Needed] -->

## More Information [optional]