nanda-rani committed on
Commit abfb4e5 · verified · 1 Parent(s): 0c3cf44

Update README.md

Files changed (1): README.md (+89, −40)
README.md CHANGED
---
library_name: transformers
license: mit
language:
- en
---

# Model Card for TTPXHunter

<!-- Provide a quick summary of what the model is/does. -->
The **TTPXHunter** model automates the extraction of actionable threat intelligence by identifying **Tactics, Techniques, and Procedures (TTPs)** in unstructured narrative threat reports. Using natural language processing (NLP), it detects adversarial tactics and techniques in accordance with established frameworks such as MITRE ATT&CK. Predictions are filtered against a confidence threshold so that only high-confidence TTPs are retained, then mapped to predefined labels that turn them into actionable insights for cybersecurity teams. This automation improves the speed and accuracy of threat intelligence gathering, enabling timely and effective responses to emerging threats.

## Model Details

### Model Description

<!-- Provide a longer summary of what this model is. -->
**TTPXHunter** automates the extraction of actionable threat intelligence from unstructured cybersecurity reports, focusing on **Tactics, Techniques, and Procedures (TTPs)**: the strategies, methods, and activities cyber adversaries use during attacks. Threat reports produced by researchers and intelligence units are dense with information but written in narrative form, which makes manual extraction slow and error-prone. **TTPXHunter** addresses this challenge by applying **natural language processing (NLP)** and **machine learning** to analyze reports automatically and surface the key components of adversary behavior.

At its core, TTPXHunter tokenizes the raw text of a threat report, breaking it into manageable pieces for analysis, and then detects the TTPs embedded in the narrative. These TTPs are crucial for understanding how an attack unfolds, since they align with behaviors catalogued in widely adopted frameworks such as **MITRE ATT&CK**, which organizes adversary behavior into tactics and techniques.

TTPXHunter goes beyond simple text extraction by incorporating a **prediction filtering mechanism**: a confidence threshold is applied to the predicted TTPs so that only those with a high degree of certainty are retained. This filtering reduces noise and focuses the output on the most relevant, actionable insights.

After filtering, the remaining TTPs are mapped to predefined labels through a mapping such as **id2label**, translating the extracted information into structured, actionable intelligence. Because these labels correspond to industry-standard classifications, the findings integrate easily into existing threat-analysis workflows; for example, a detected technique can be mapped directly to its **MITRE ATT&CK** counterpart, letting security teams quickly correlate the intelligence with known adversary activity.

The final output is a set of unique TTP identifiers and their corresponding names, giving a comprehensive view of the adversary's strategies, techniques, and methods. By automating extraction and mapping, **TTPXHunter** significantly reduces the manual effort of analyzing narrative reports, accelerates threat detection, and improves the accuracy of intelligence gathering, making it a valuable asset in the modern cybersecurity landscape.
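The threshold-and-map flow described above can be sketched in a few lines of Python. Everything here is illustrative: the label names, the mock logits, and the 0.9 threshold are assumptions, not the model's real configuration (in practice the label map would come from `model.config.id2label`).

```python
import math

# Hypothetical label map; the real one comes from model.config.id2label.
id2label = {0: "O", 1: "T1566: Phishing", 2: "T1059: Command and Scripting Interpreter"}

def softmax(logits):
    """Convert raw logits into probabilities."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def filter_predictions(sentence_logits, threshold=0.9):
    """Keep only high-confidence TTP predictions and map them to labels."""
    ttps = set()
    for logits in sentence_logits:
        probs = softmax(logits)
        best = max(range(len(probs)), key=probs.__getitem__)
        # Drop low-confidence predictions and the non-TTP ("O") class.
        if probs[best] >= threshold and id2label[best] != "O":
            ttps.add(id2label[best])
    return sorted(ttps)

# Mock per-sentence logits standing in for real model outputs.
logits = [[0.1, 6.0, 0.2],   # confident phishing prediction -> kept
          [2.0, 1.9, 1.8]]   # ambiguous prediction -> filtered out
print(filter_predictions(logits))  # -> ['T1566: Phishing']
```

The same pattern applies regardless of how the logits are produced: only predictions that clear the threshold survive, and the rest are treated as noise.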
This is the model card of a 🤗 transformers model that has been pushed to the Hub. This model card has been automatically generated.

- **Developed by:** Nanda Rani
<!-- - **Funded by [optional]:** [More Information Needed] -->
<!-- - **Shared by [optional]:** [More Information Needed] -->
<!-- - **Model type:** [More Information Needed] -->
<!-- - **Language(s) (NLP):** [More Information Needed] -->
<!-- - **License:** [More Information Needed] -->
<!-- - **Finetuned from model [optional]:** [More Information Needed] -->

### Model Sources [optional]

<!-- Provide the basic links for the model. -->

- **Repository:** [https://github.com/nanda-rani/TTPXHunter-Actionable-Threat-Intelligence-Extraction-as-TTPs-from-Finished-Cyber-Threat-Reports](https://github.com/nanda-rani/TTPXHunter-Actionable-Threat-Intelligence-Extraction-as-TTPs-from-Finished-Cyber-Threat-Reports)
- **Paper [optional]:** [https://doi.org/10.1145/3696427](https://doi.org/10.1145/3696427)
<!-- - **Demo [optional]:** [More Information Needed] -->

<!-- ## Uses -->

<!-- Address questions around how the model is intended to be used, including the foreseeable users of the model and those affected by the model. -->

<!-- ### Direct Use -->

<!-- This section is for the model use without fine-tuning or plugging into a larger ecosystem/app. -->

<!-- [More Information Needed] -->

<!-- ### Downstream Use [optional] -->

<!-- This section is for the model use when fine-tuned for a task, or when plugged into a larger ecosystem/app -->

<!-- ## Model Usage: Fine-Tuning and Integration into Larger Systems -->

### Fine-Tuning TTPXHunter for Specific Tasks

The **TTPXHunter** model can be fine-tuned for specific cybersecurity tasks, making it adaptable to a range of threat intelligence scenarios. Fine-tuning on domain-specific threat reports, or on reports focused on particular threat actors, sectors, or techniques, can significantly improve the accuracy and relevance of TTP extraction.

Fine-tuning may involve retraining TTPXHunter on specialized datasets such as:
- **Industry-specific threat reports**: for example, threat intelligence from telecom, healthcare, or finance, each of which may emphasize different TTPs.
- **Region-specific threats**: reports on regional adversaries or geopolitically motivated cyber attacks.
- **Emerging techniques**: data capturing newly observed attack vectors or novel techniques.

Fine-tuning allows **TTPXHunter** to perform more effectively in niche areas, adapting the model to the nuances of a specific threat landscape. A fine-tuned TTPXHunter provides more targeted intelligence, helping security teams stay one step ahead of adversaries that focus on particular industries or regions.
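Fine-tuning on any of the datasets above requires sentence-level TTP labels. The sketch below shows one plausible way to prepare such data; the example sentences and technique IDs are hypothetical, and the resulting records are in the `{"text": ..., "label": ...}` shape that a 🤗 `Trainer` with `AutoModelForSequenceClassification` would then tokenize and fine-tune on.

```python
# Hypothetical labelled sentences: each maps to a MITRE ATT&CK technique ID.
raw = [
    ("The actor delivered a spearphishing attachment.", "T1566"),
    ("PowerShell was invoked to fetch the second-stage payload.", "T1059"),
    ("Victims received a malicious link via email.", "T1566"),
]

# Build the label2id/id2label maps a sequence-classification config stores.
labels = sorted({ttp for _, ttp in raw})
label2id = {ttp: i for i, ttp in enumerate(labels)}
id2label = {i: ttp for ttp, i in label2id.items()}

# Integer-encoded records ready for tokenization and training.
dataset = [{"text": sentence, "label": label2id[ttp]} for sentence, ttp in raw]
print(label2id)    # -> {'T1059': 0, 'T1566': 1}
print(dataset[0])
```

Keeping `label2id`/`id2label` in the model config ensures the fine-tuned checkpoint maps its outputs back to technique IDs the same way the base model does.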
### Integrating TTPXHunter into Larger Ecosystems or Applications

**TTPXHunter** can also serve as a core component of a larger cybersecurity ecosystem or application. Its ability to automatically extract and map TTPs suits several roles:

- **Threat Intelligence Platforms (TIPs)**: plugged into a TIP, **TTPXHunter** automatically enriches incoming threat reports with actionable intelligence, accelerating the correlation of new information with known attack patterns.
- **Security Information and Event Management (SIEM) systems**: integrated with a SIEM, TTPXHunter can analyze logs, alerts, and threat reports in real time, generating enriched insights that aid threat hunting and incident response.
- **Endpoint Detection and Response (EDR) solutions**: in an EDR context, **TTPXHunter** can enhance detection by mapping endpoint behaviors and suspicious activity to specific TTPs, speeding identification of adversarial behavior and informing mitigation strategies.
- **Automated threat attribution systems**: in an attribution pipeline, **TTPXHunter** helps match TTPs from unstructured reports to known adversaries, improving the accuracy of links between incidents and threat actors.
- **Machine learning pipelines for threat prediction**: coupled with models for anomaly detection or predictive analytics, **TTPXHunter** can act as a feature extractor, contributing TTP-based intelligence that improves prediction accuracy.

Integrating **TTPXHunter** into these systems strengthens an organization's overall security posture, making detection and response more intelligent and actionable. Its outputs can also feed orchestration tools that automate the response to detected threats based on the extracted TTPs, enabling rapid mitigation of adversarial activity.
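As one concrete illustration of the feature-extractor role, a set of extracted TTP identifiers can be encoded as a fixed-length binary vector for a downstream anomaly-detection or attribution model. The technique vocabulary below is a small illustrative assumption, not the model's actual label set.

```python
# Illustrative technique vocabulary; a real system would use the full label set.
VOCAB = ["T1059", "T1071", "T1486", "T1566"]

def ttp_feature_vector(ttps):
    """Encode a set of extracted TTP IDs as a binary feature vector over VOCAB."""
    return [1 if technique in ttps else 0 for technique in VOCAB]

# Example: TTPs extracted from one report become one feature row.
print(ttp_feature_vector({"T1566", "T1059"}))  # -> [1, 0, 0, 1]
```

Stacking one such row per report yields a feature matrix that downstream models can consume directly.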
<!-- ### Out-of-Scope Use -->

<!-- This section addresses misuse, malicious use, and uses that the model will not work well for. -->

<!-- [More Information Needed] -->

## Bias, Risks, and Limitations

<!-- This section is meant to convey both technical and sociotechnical limitations. -->

<!-- [More Information Needed] -->

<!-- ### Recommendations -->

<!-- This section is meant to convey recommendations with respect to the bias, risk, and technical limitations. -->

<!-- Users (both direct and downstream) should be made aware of the risks, biases and limitations of the model. More information needed for further recommendations. -->

## How to Get Started with the Model

Run the notebook **TTPXHunter.ipynb** from the project repository on GitHub: [https://github.com/nanda-rani/TTPXHunter-Actionable-Threat-Intelligence-Extraction-as-TTPs-from-Finished-Cyber-Threat-Reports](https://github.com/nanda-rani/TTPXHunter-Actionable-Threat-Intelligence-Extraction-as-TTPs-from-Finished-Cyber-Threat-Reports)
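Independent of the notebook, the extraction loop can be sketched programmatically. The helper below runs any sentence classifier over a report and keeps unique, high-confidence TTP labels; the commented `pipeline` usage and checkpoint placeholder are assumptions, and the stub classifier stands in for the real model purely for illustration.

```python
def extract_ttps(sentences, classify, threshold=0.9):
    """Keep unique, high-confidence TTP labels in order of first appearance."""
    found = {}  # dict preserves insertion order (Python 3.7+)
    for sentence in sentences:
        pred = classify(sentence)  # expects {"label": ..., "score": ...}
        if pred["score"] >= threshold and pred["label"] != "O":
            found.setdefault(pred["label"])
    return list(found)

# With 🤗 transformers installed, a real classifier could be built along
# these lines (the checkpoint name is a placeholder, not a published id):
#   from transformers import pipeline
#   clf = pipeline("text-classification", model="<TTPXHunter checkpoint>")
#   classify = lambda s: clf(s)[0]

# Stub classifier used here for illustration only.
def classify(sentence):
    if "phishing" in sentence:
        return {"label": "T1566: Phishing", "score": 0.97}
    return {"label": "O", "score": 0.99}

print(extract_ttps(["A phishing email was sent.", "The report ends."], classify))
# -> ['T1566: Phishing']
```

Swapping the stub for a real pipeline leaves the filtering and deduplication logic unchanged.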

<!-- ## Training Details -->

### Training Data

<!-- This should link to a Dataset Card, perhaps with a short stub of information on what the training data is all about as well as documentation related to data pre-processing or additional filtering. -->

<!-- [More Information Needed] -->

### Training Procedure

<!-- This relates heavily to the Technical Specifications. Content here should link to that section when it is relevant to the training procedure. -->

<!-- #### Preprocessing [optional] -->

<!-- [More Information Needed] -->

- **Training regime:** [More Information Needed] <!--fp32, fp16 mixed precision, bf16 mixed precision, bf16 non-mixed precision, fp16 non-mixed precision, fp8 mixed precision -->

<!-- #### Speeds, Sizes, Times [optional] -->

<!-- This section provides information about throughput, start/end time, checkpoint size if relevant, etc. -->

<!-- [More Information Needed] -->

## Evaluation

<!-- This section describes the evaluation protocols and provides the results. -->

<!-- ### Testing Data, Factors & Metrics -->

#### Testing Data

<!-- This should link to a Dataset Card if possible. -->

<!-- [More Information Needed] -->

#### Factors

<!-- These are the things the evaluation is disaggregating by, e.g., subpopulations or domains. -->

<!-- [More Information Needed] -->

#### Metrics

<!-- These are the evaluation metrics being used, ideally with a description of why. -->

<!-- [More Information Needed] -->

### Results

<!-- Relevant interpretability work for the model goes here -->

<!-- [More Information Needed] -->

## Environmental Impact

<!-- Total emissions (in grams of CO2eq) and additional considerations, such as electricity usage, go here. Edit the suggested text below accordingly -->

<!-- Carbon emissions can be estimated using the [Machine Learning Impact calculator](https://mlco2.github.io/impact#compute) presented in [Lacoste et al. (2019)](https://arxiv.org/abs/1910.09700). -->

- **Hardware Type:** [More Information Needed]
- **Hours used:** [More Information Needed]

**BibTeX:**

```bibtex
@article{10.1145/3696427,
  author    = {Rani, Nanda and Saha, Bikash and Maurya, Vikas and Shukla, Sandeep Kumar},
  title     = {TTPXHunter: Actionable Threat Intelligence Extraction as TTPs from Finished Cyber Threat Reports},
  year      = {2024},
  publisher = {Association for Computing Machinery},
  address   = {New York, NY, USA},
  url       = {https://doi.org/10.1145/3696427},
  doi       = {10.1145/3696427},
  abstract  = {Understanding the modus operandi of adversaries aids organizations to employ efficient defensive strategies and share intelligence in the community. This knowledge is often present in unstructured natural language text within threat analysis reports. A translation tool is needed to interpret the modus operandi explained in the sentences of the threat report and convert it into a structured format. This research introduces a methodology named TTPXHunter for automated extraction of threat intelligence in terms of Tactics, Techniques, and Procedures (TTPs) from finished cyber threat reports. It leverages cyber domain-specific state-of-the-art natural language model to augment sentences for minority class TTPs and refine pinpointing the TTPs in threat analysis reports significantly. We create two datasets: an augmented sentence-TTP dataset of 39,296 sentence samples and a real-world cyber threat intelligence report-to-TTP dataset of 149 reports. Further, we evaluate TTPXHunter on the augmented sentence and report datasets. The TTPXHunter achieves the highest performance of 92.42\% f1-score on the augmented dataset, and it also outperforms existing state-of-the-art TTP extraction method by achieving an f1-score of 97.09\% when evaluated over the report dataset. TTPXHunter significantly improves cybersecurity threat intelligence by offering quick, actionable insights into attacker behaviors. This advancement automates threat intelligence analysis and provides a crucial tool for cybersecurity professionals to combat cyber threats.},
  note      = {Just Accepted},
  journal   = {Digital Threats},
  month     = {sep},
  keywords  = {Threat Intelligence, TTP Extraction, MITRE ATT\&CK, Natural Language Processing, Threat Intelligence Extraction, TTP Classification, Cyber Security and AI, Cyber Security Threats, NLP, Cybersecurity}
}
```

**APA:**

Nanda Rani, Bikash Saha, Vikas Maurya, and Sandeep Kumar Shukla. 2024. TTPXHunter: Actionable Threat Intelligence Extraction as TTPs from Finished Cyber Threat Reports. *Digital Threats*, Just Accepted (September 2024). https://doi.org/10.1145/3696427

<!-- ## Glossary [optional] -->

<!-- If relevant, include terms and calculations in this section that can help readers understand the model or model card. -->

<!-- [More Information Needed] -->

## More Information [optional]