Upload folder using huggingface_hub

Files changed (10)

1_Pooling/config.json +10 -0
README.md +654 -0
config.json +55 -0
config_sentence_transformers.json +14 -0
model.safetensors +3 -0
modules.json +14 -0
sentence_bert_config.json +4 -0
special_tokens_map.json +37 -0
tokenizer.json +0 -0
tokenizer_config.json +952 -0

1_Pooling/config.json ADDED

	@@ -0,0 +1,10 @@

+{
+    "word_embedding_dimension": 768,
+    "pooling_mode_cls_token": false,
+    "pooling_mode_mean_tokens": true,
+    "pooling_mode_max_tokens": false,
+    "pooling_mode_mean_sqrt_len_tokens": false,
+    "pooling_mode_weightedmean_tokens": false,
+    "pooling_mode_lasttoken": false,
+    "include_prompt": true
+}
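
With `pooling_mode_mean_tokens` set to true, the sentence embedding is the average of the token embeddings, and padding positions are normally excluded through the attention mask. The sketch below illustrates that idea in plain Python. It is not the library's implementation: `mean_pool` is a hypothetical helper, and tiny 2-dimensional vectors stand in for the 768-dimensional ones.

```python
from typing import List


def mean_pool(token_embeddings: List[List[float]], attention_mask: List[int]) -> List[float]:
    """Average the token vectors whose mask entry is 1 (hypothetical helper)."""
    dim = len(token_embeddings[0])
    total = [0.0] * dim
    count = 0
    for vec, keep in zip(token_embeddings, attention_mask):
        if keep:
            count += 1
            for i, x in enumerate(vec):
                total[i] += x
    # Guard against an all-padding input to avoid division by zero.
    count = max(count, 1)
    return [t / count for t in total]


# Two real tokens and one padding token (mask 0) that must be ignored.
tokens = [[1.0, 2.0], [3.0, 4.0], [100.0, 100.0]]
mask = [1, 1, 0]
pooled = mean_pool(tokens, mask)
print(pooled)  # [2.0, 3.0]
```

The padding vector does not affect the result, which is why masked mean pooling gives the same embedding regardless of how much a batch is padded.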

README.md ADDED

	@@ -0,0 +1,654 @@

+---
+tags:
+- sentence-transformers
+- sentence-similarity
+- feature-extraction
+- dense
+- generated_from_trainer
+- dataset_size:35705
+- loss:MultipleNegativesRankingLoss
+widget:
+- source_sentence: What is the primary responsibility of the Information Security
+    Oversight Committee in an organization?
+  sentences:
+  - Least privilege
+  - By searching for repeating ciphertext sequences at fixed displacements.
+  - Ensuring and supporting information protection awareness and training programs
+- source_sentence: Which of the following databases are required to be maintained
+    by any system participating in an IPSec VPN?
+  sentences:
+  - 'Gatekeeper bypass through code signing exploitation represents a sophisticated
+    attack vector targeting macOS''s application verification mechanism. Understanding
+    detection indicators requires examining both technical artifacts and behavioral
+    patterns associated with compromised digital signatures.\n\n**Primary Technical
+    Indicators:**\n\nCode signing certificate anomalies constitute the most direct
+    indicator. Legitimate applications possess valid, unexpired certificates from
+    trusted authorities like Apple or recognized developers. Suspicious indicators
+    include self-signed certificates, expired certificates, certificates issued by
+    unrecognized authorities, or certificates with unusual subject alternative names
+    (SANs). The `codesign` command reveals signature validity, while examining certificate
+    chains through Keychain Access exposes potential anomalies.\n\nBinary modification
+    signatures often manifest as \\\"unsigned\\\" status for previously signed applications.
+    Gatekeeper maintains a whitelist of notarized applications; unsigned binaries
+    attempting execution trigger alerts in system logs located at `/var/log/system.log`.
+    Additionally, applications with altered code signing identifiers (CSIDs) or modified
+    entitlements may indicate tampering.\n\n**Behavioral and System-Level Indicators:**\n\nProcess
+    execution from non-standard locations frequently accompanies successful bypasses.
+    Legitimate Gatekeeper-approved applications typically execute from `/Applications`
+    or user-specific application directories. Execution from temporary directories,
+    Downloads folders, or unusual paths warrants investigation.\n\nNetwork behavior
+    analysis reveals additional indicators. Compromised applications may exhibit unexpected
+    network connections, particularly to suspicious domains or IP addresses not associated
+    with the legitimate application''s functionality. DNS queries to newly registered
+    domains (NRDs) or domains with high entropy often indicate command-and-control
+    communications.\n\n**MITRE ATT&CK Framework Alignment:**\n\nThis technique aligns
+    with T1553.002 (Subvert Trust Controls: Code Signing). Adversaries exploit weaknesses
+    in code signing verification processes, potentially through stolen certificates,
+    certificate authority compromise, or exploitation of bypass mechanisms like manual
+    allowlisting.\n\n**Detection and Response Strategies:**\n\nImplement comprehensive
+    logging using the Unified Logging system with custom predicates monitoring `com.apple.securityd`
+    events. Deploy endpoint detection solutions capable of real-time code signing
+    validation and behavioral analysis. Regularly audit installed applications against
+    known-good baselines, focusing on unsigned or suspiciously signed executables.\n\nNIST
+    Cybersecurity Framework alignment emphasizes continuous monitoring (DE.CM) and
+    anomaly detection capabilities within the Detect function, ensuring organizations
+    maintain visibility into potential Gatekeeper bypass attempts through robust logging
+    and behavioral analysis mechanisms.'
+  - Security Policy Database (SPD) and Security Association Database (SAD)
+  - Virus
+- source_sentence: How is a supply chain attack implemented through compromised software
+    development kits (SDKs) and their propagation to thousands of applications?
+  sentences:
+  - 'Detecting security label tampering through extended attributes (xattrs) requires
+    implementing comprehensive monitoring and validation mechanisms aligned with NIST
+    Cybersecurity Framework''s Detect function and MITRE ATT&CK''s Defense Evasion
+    tactics.\n\n**Xattr Monitoring Techniques:**\n\nImplement real-time file system
+    monitoring using tools like `auditd` or Windows Event Tracing to track xattr modifications.
+    Configure audit rules targeting specific security-critical files and directories,
+    focusing on operations like `SETXATTR`, `GETXATTR`, and `LISTXATTR`. This aligns
+    with NIST CSF DE.CM-1 (continuous monitoring) by establishing baseline behaviors
+    for legitimate xattr usage patterns.\n\n**Integrity Validation Methods:**\n\nDeploy
+    cryptographic hashing of security labels stored in xattrs, creating immutable
+    reference values. Implement periodic verification against these baselines using
+    SHA-256 or stronger algorithms. This corresponds to NIST CSF PR.DS-6 (integrity
+    checking mechanisms) and provides detection capabilities for unauthorized modifications.\n\n**Behavioral
+    Analysis:**\n\nEstablish user and process behavior profiling for xattr operations,
+    identifying anomalous patterns that deviate from established baselines. Monitor
+    for unusual privilege escalation attempts modifying security labels, particularly
+    focusing on MITRE ATT&CK technique T1562.008 (Impair Defenses: Disable or Modify
+    Tools) where adversaries manipulate security mechanisms.\n\n**System Integration:**\n\nLeverage
+    SELinux or AppArmor mandatory access controls to restrict unauthorized xattr modifications.
+    Implement centralized logging aggregation correlating xattr changes with process
+    execution and network activities, enabling correlation analysis for sophisticated
+    tampering attempts.\n\n**Detection Signatures:**\n\nDevelop custom detection rules
+    identifying suspicious xattr patterns, including rapid successive modifications,
+    bulk security label changes across multiple files, or modifications from unexpected
+    processes. Integrate these signatures into SIEM platforms for automated alerting
+    and incident response workflows.\n\nThis multi-layered approach provides comprehensive
+    coverage against sophisticated tampering attempts while maintaining operational
+    efficiency through targeted monitoring strategies.'
+  - Supply chain attacks occur when an attacker injects malicious code into trusted
+    components in the software supply chain, such as open source libraries or SDKs.
+    These components are then distributed to many developers and organizations worldwide.
+    Once they integrate these seemingly legitimate tools into their own products,
+    the malware is automatically embedded within them, propagating widely across various
+    applications and devices. Attackers can also compromise update servers that deliver
+    patches to millions of systems simultaneously. The Sunburst attack on SolarWinds
+    was one such supply chain attack where a malicious update was pushed through the
+    Orion update server. In this case, attackers used the compromised SDK from Pulse
+    Secure to propagate the malware. Because Pulse Secure is used by many organizations
+    for secure remote access solutions, their software development kit was distributed
+    as part of legitimate downloads. Attackers then inserted their own malicious code
+    into that SDK, which in turn infected all products built using it. This attack
+    caused massive damage and forced a significant number of companies to implement
+    new policies regarding software updates and vendor trustworthiness. The SolarWinds
+    supply chain attack also demonstrated the importance of monitoring for suspicious
+    network traffic patterns and adopting multi-factor authentication (MFA) to limit
+    access to sensitive systems. Attackers can easily bypass traditional security
+    measures if they manage to compromise a legitimate update server or SDK provider.
+    Thus, it is essential for companies to carefully vet their third-party software
+    providers and implement strict controls around the development lifecycle of critical
+    applications. Furthermore, adopting robust patch management strategies and keeping
+    an eye on suspicious activity in software repositories are important steps toward
+    reducing the risk of supply chain attacks. Finally, implementing continuous monitoring
+    tools like SIEM can provide early warning signs if any unexpected changes occur
+    within a software component or its corresponding repository. Overall, understanding
+    how supply chain attacks work is crucial for building more secure systems and
+    ensuring that organizations do not inadvertently become victims of such sophisticated
+    breaches in the future. Attackers can easily bypass traditional security measures
+    if they manage to compromise a legitimate update server or SDK provider. Thus,
+    it is essential for companies to carefully vet their third-party software providers
+    and implement strict controls around the development lifecycle of critical applications.
+    Furthermore, adopting robust patch management strategies and keeping an eye on
+    suspicious activity in software repositories are important steps toward reducing
+    the risk of supply chain attacks. Finally, implementing continuous monitoring
+    tools like SIEM can provide early warning signs if any unexpected changes occur
+    within a software component or its corresponding repository.
+  - 'An automated response system for isolating compromised systems requires a multi-layered
+    architecture integrating detection capabilities with orchestrated containment
+    actions, aligned with NIST Cybersecurity Framework''s Respond (RS) function and
+    MITRE ATT&CK defensive strategies.\n\n**Core Architecture Components:**\n\nThe
+    system should implement Security Orchestration, Automation, and Response (SOAR)
+    platforms integrated with Security Information and Event Management (SIEM) systems.
+    Central components include: detection engines processing indicators of compromise
+    (IoCs), automated decision matrices for risk assessment, and isolation mechanisms
+    that can quarantine affected assets without disrupting critical operations.\n\n**Detection
+    Integration:**\n\nLeverage MITRE ATT&CK techniques to establish comprehensive
+    monitoring across the attack lifecycle. Implement behavioral analytics detecting
+    tactics like Initial Access (T1566 Phishing), Execution (T1059 Command and Scripting
+    Interpreter), and Defense Evasion (T1027 Obfuscated Files). Deploy endpoint detection
+    and response (EDR) solutions monitoring process execution, network communications,
+    and file system modifications. Integrate threat intelligence feeds correlating
+    observed indicators with known exploitation campaigns.\n\n**Automated Response
+    Logic:**\n\nDesign tiered response capabilities based on confidence levels and
+    asset criticality. Implement network microsegmentation enabling granular isolation
+    through software-defined networking (SDN) controllers. Automated actions should
+    include: DNS sinkholing for malicious domains, firewall rule deployment blocking
+    suspicious traffic patterns, and network switch port isolation. Critical systems
+    require graceful degradation procedures maintaining business continuity.\n\n**Decision
+    Framework:**\n\nEstablish risk scoring algorithms incorporating asset value, threat
+    severity, and exploitation likelihood. Implement approval workflows for high-confidence
+    isolations while enabling rapid containment for confirmed compromises. Integration
+    with Configuration Management Databases (CMDB) ensures accurate asset inventory
+    and dependency mapping before executing isolation procedures.\n\n**Validation
+    and Recovery:**\n\nPost-isolation processes should include automated forensic
+    data collection, incident classification against MITRE ATT&CK framework, and coordinated
+    recovery procedures. Implement continuous monitoring ensuring isolation effectiveness
+    while maintaining operational readiness for subsequent threats.'
+- source_sentence: What are the best practices for SOC teams to enhance their threat
+    hunting capabilities against ScreenConnect vulnerabilities?
+  sentences:
+  - 'The hiberfil.sys file represents a critical artifact in digital forensics for
+    establishing temporal context and system state at specific points in time. This
+    Windows hibernation file contains compressed memory contents when a system enters
+    power-saving mode, preserving volatile data including running processes, loaded
+    drivers, and network connections.\n\n**Timeline Establishment Through Metadata
+    Analysis**\n\nThe creation timestamp of hiberfil.sys provides definitive evidence
+    of the last hibernation event, establishing a concrete temporal anchor point.
+    This timestamp corresponds to the exact moment Windows initiated hibernation mode,
+    typically occurring during system shutdown or power management events. By analyzing
+    this metadata alongside related artifacts like registry entries (HKLM\\\\SYSTEM\\\\CurrentControlSet\\\\Control\\\\Power)
+    and Event Viewer logs (Event ID 4634 for logoff), investigators can reconstruct
+    precise chronological sequences.\n\n**MITRE ATT&CK Framework Integration**\n\nWithin
+    the MITRE ATT&CK framework, hiberfil.sys analysis supports detection of several
+    techniques including T1055 (Process Injection) and T1070.004 (File Deletion).
+    Adversaries may attempt to manipulate or delete this file to eliminate forensic
+    evidence, making its presence or absence significant for timeline reconstruction.\n\n**NIST
+    Cybersecurity Framework Alignment**\n\nFollowing NIST CSF guidelines, particularly
+    the Respond (RS) and Recover (RC) functions, hiberfil.sys analysis supports incident
+    response activities. The metadata provides crucial temporal context for understanding
+    attack progression and system compromise duration.\n\n**Technical Implementation
+    Considerations**\n\nForensic investigators should examine multiple timestamp artifacts:
+    creation time, last modification time, and access time. However, the creation
+    timestamp remains most reliable as it reflects the initial hibernation event rather
+    than subsequent file operations that might alter other timestamps through normal
+    system operations or potential tampering attempts.\n\nThe hiberfil.sys file size
+    typically correlates with installed RAM capacity, providing additional contextual
+    information about system configuration at the time of hibernation. This metadata
+    correlation strengthens timeline accuracy by validating system state consistency
+    across multiple forensic artifacts.'
+  - 'Organizations can effectively leverage log analysis and Endpoint Detection and
+    Response (EDR) systems to monitor Alternate Data Streams (ADS) through a multi-layered
+    approach that addresses both detection capabilities and operational visibility.\n\n**Log
+    Analysis Implementation:**\nWindows Event Logs provide foundational monitoring
+    through Security, System, and Application logs. Critical events include 4663 (object
+    access), 4688 (process creation), and 5140 (file share access). Organizations
+    should configure advanced audit policies for \\\"Audit File System\\\" and \\\"Audit
+    Handle Manipulation\\\" under Local Security Policy. Sysmon configuration becomes
+    essential, particularly Event ID 2 (CreateFile) and Event ID 3 (NetworkConnect),
+    as these capture detailed file system interactions that standard Windows logs
+    might miss.\n\n**EDR System Configuration:**\nModern EDR platforms like CrowdStrike,
+    SentinelOne, or Microsoft Defender for Endpoint offer native ADS detection capabilities.
+    These systems should be configured to monitor:\n- File creation/modification events
+    with stream enumeration\n- Process access to files with multiple data streams\n-
+    Registry modifications associated with ADS-enabled applications\n- Network communications
+    from processes accessing hidden streams\n\n**Critical Directory Monitoring:**\nSystem
+    directories requiring enhanced monitoring include %SystemRoot%, %ProgramFiles%,
+    and user profile directories. Implement baseline integrity monitoring using tools
+    like Microsoft''s Attack Surface Reduction (ASR) rules or custom PowerShell scripts
+    that enumerate ADS presence through Get-ItemProperty -Name \\\"*\\\" commands.\n\n**MITRE
+    ATT&CK Alignment:**\nThis approach addresses T1096 (NTFS File Attributes), T1547.001
+    (Registry Run Keys/Startup Folder), and T1564.002 (Impair Defenses: Disable or
+    Modify Tools). Detection rules should correlate ADS creation with suspicious process
+    ancestry, particularly PowerShell execution or living-off-the-land binaries.\n\n**Operational
+    Integration:**\nEstablish automated response workflows that quarantine systems
+    exhibiting ADS anomalies while preserving forensic evidence. Implement centralized
+    logging aggregation using SIEM platforms configured to detect patterns indicating
+    ADS abuse, such as rapid stream creation followed by executable access attempts.\n\nThis
+    comprehensive monitoring strategy ensures organizations maintain visibility into
+    ADS activities while minimizing false positives through contextual analysis and
+    behavioral correlation.'
+  - SOC teams can enhance their threat hunting capabilities against ScreenConnect
+    vulnerabilities by adopting a proactive and iterative approach to searching for
+    indicators of compromise (IoCs) and anomalous activities that may indicate exploitation.
+    Develop and regularly update threat hunting hypotheses based on the latest threat
+    intelligence, focusing on known TTPs associated with the exploitation of ScreenConnect
+    vulnerabilities. Utilize advanced analytics and machine learning tools to sift
+    through large volumes of data for patterns and anomalies that may signify malicious
+    activity. Leverage endpoint detection and response (EDR) tools to continuously
+    monitor endpoints for signs of exploitation, such as unusual PowerShell command
+    execution, modification of system files, or unexpected network connections. Conduct
+    regular vulnerability scans and penetration tests to identify and remediate potential
+    weaknesses in ScreenConnect and other critical systems before attackers can exploit
+    them. Foster collaboration and information sharing with other organizations and
+    cybersecurity communities to gain insights into emerging threats and effective
+    detection and response strategies. Invest in continuous training and development
+    for SOC team members to keep them abreast of the latest cybersecurity trends,
+    tools, and techniques. By implementing these best practices, SOC teams can significantly
+    improve their ability to detect and respond to threats targeting ScreenConnect
+    vulnerabilities, thereby enhancing the overall security posture of their organization.
+- source_sentence: How would you use Amcache analysis to detect fileless malware that
+    drops temporary components for initial system compromise?
+  sentences:
+  - '# Automated Extraction of Empire Agent Configurations: Defensive Analysis\n\n##
+    NIST Cybersecurity Framework Context\n\nWithin the NIST CSF''s **Detect (DE)**
+    and **Respond (RS)** functions, organizations must implement capabilities to identify
+    and analyze malicious configurations. PowerShell Empire represents a sophisticated
+    post-exploitation framework mapped to MITRE ATT&CK techniques including T1059.001
+    (PowerShell) and T1027 (Obfuscated Files or Information).\n\n## Detection and
+    Analysis Methodology\n\n**Memory Forensics Approach:**\nDevelop automated tools
+    leveraging memory acquisition frameworks like Volatility or Rekall to identify
+    Empire''s in-memory artifacts. Focus on detecting:\n- PowerShell reflection objects
+    characteristic of Empire''s module loading\n- Base64-encoded configuration blobs
+    within process memory spaces\n- Registry keys containing encoded agent parameters
+    (typically HKLM\\\\SOFTWARE\\\\Classes\\\\ms-settings)\n\n**File System Analysis:**\nImplement
+    scanning mechanisms targeting:\n- Temporary directories where Empire extracts
+    configurations\n- PowerShell execution logs revealing obfuscated command patterns\n-
+    Event log analysis for suspicious PowerShell execution contexts\n\n## Technical
+    Implementation Framework\n\n**Automated Extraction Pipeline:**\n1. **Signature-Based
+    Detection:** Develop YARA rules identifying Empire''s distinctive code patterns
+    and configuration structures\n2. **Memory Parsing:** Implement plugins parsing
+    .NET objects and PowerShell runspaces\n3. **Decryption Routines:** Create automated
+    decoding mechanisms for Empire''s XOR-based configuration encryption\n4. **Artifact
+    Correlation:** Cross-reference multiple data sources to validate findings\n\n**MITRE
+    ATT&CK Mapping:**\n- T1083 (File and Directory Discovery)\n- T1057 (Process Discovery)\n-
+    T1005 (Data from Local System)\n\n## Defensive Considerations\n\nTools must incorporate
+    anti-evasion techniques, including detection of common obfuscation methods like
+    string concatenation and variable substitution. Integration with SIEM platforms
+    enables real-time alerting when Empire artifacts are discovered.\n\n**Validation
+    Framework:**\nImplement multi-layered validation ensuring extracted configurations
+    correspond to active threats rather than benign PowerShell activity. This includes
+    behavioral analysis correlating configuration parameters with observed network
+    communications and file system modifications'
+  - To capture and display network traffic
+  - 'Amcache analysis provides critical forensic artifacts for detecting fileless
+    malware employing temporary component deployment during initial system compromise,
+    aligning with MITRE ATT&CK techniques T1055 (Process Injection) and T1620 (Reflective
+    Code Loading).\n\n**Amcache Artifact Analysis Framework:**\n\nThe Amcache.hve
+    registry hive maintains comprehensive application execution metadata, including
+    file paths, hashes, and execution timestamps. For fileless malware detection,
+    focus on:\n\n1. **Temporary File Creation Patterns**: Analyze entries with suspicious
+    temporal clustering in the \\\"Programs\\\" key, particularly executables stored
+    in system directories (C:\\\\Windows\\\\Temp, C:\\\\Users\\\\[User]\\\\AppData\\\\Local\\\\Temp).
+    Legitimate applications typically exhibit predictable installation patterns, while
+    malicious components often manifest as isolated, recently-created executables.\n\n2.
+    **Hash-Based Indicators**: Cross-reference SHA-1 hashes against threat intelligence
+    feeds and known malware signatures. Fileless malware frequently employs legitimate
+    system binaries for process hollowing (T1055.012) or reflective DLL loading (T1620),
+    making hash analysis crucial for identifying repurposed executables.\n\n3. **Execution
+    Chain Analysis**: Examine parent-child relationships within Amcache entries to
+    identify anomalous process spawning patterns. Fileless malware often exhibits
+    unusual execution chains, particularly when temporary components spawn from unexpected
+    parent processes or system services.\n\n**NIST CSF Implementation Strategy:**\n\nUnder
+    the Detect (DE) function, specifically DE.AE-2 (Detected events are analyzed),
+    implement continuous Amcache monitoring through:\n\n- **Baseline Establishment**:
+    Create organizational baselines for normal temporary file creation patterns and
+    execution behaviors\n- **Anomaly Detection**: Deploy automated analysis tools
+    to identify deviations from established baselines\n- **Correlation Analysis**:
+    Integrate Amcache findings with network traffic analysis and endpoint detection
+    systems\n\n**Advanced Detection Methodologies:**\n\nUtilize PowerShell-based parsing
+    scripts or specialized forensic tools like KAPE to extract and analyze Amcache
+    artifacts. Focus on:\n\n- Unusual file extensions in temporary directories\n-
+    Executables created immediately before suspicious network activity\n- Components
+    with execution timestamps correlating with initial access events\n- Hash collisions
+    or similarities between temporary files and known malware families\n\nThis approach
+    enables proactive identification of fileless malware campaigns leveraging temporary
+    components for system compromise, supporting comprehensive threat hunting and
+    incident response activities within enterprise environments.'
+pipeline_tag: sentence-similarity
+library_name: sentence-transformers
+---
+# SentenceTransformer
+This is a trained [sentence-transformers](https://www.SBERT.net) model. It maps sentences and paragraphs to a 768-dimensional dense vector space and can be used for semantic textual similarity, semantic search, paraphrase mining, text classification, clustering, and more.
+## Model Details
+### Model Description
+- **Model Type:** Sentence Transformer
+<!-- - **Base model:** [Unknown](https://huggingface.co/unknown) -->
+- **Maximum Sequence Length:** 1024 tokens
+- **Output Dimensionality:** 768 dimensions
+- **Similarity Function:** Cosine Similarity
+<!-- - **Training Dataset:** Unknown -->
+<!-- - **Language:** Unknown -->
+<!-- - **License:** Unknown -->
+### Model Sources
+- **Documentation:** [Sentence Transformers Documentation](https://sbert.net)
+- **Repository:** [Sentence Transformers on GitHub](https://github.com/UKPLab/sentence-transformers)
+- **Hugging Face:** [Sentence Transformers on Hugging Face](https://huggingface.co/models?library=sentence-transformers)
+### Full Model Architecture
+```
+SentenceTransformer(
+  (0): Transformer({'max_seq_length': 1024, 'do_lower_case': False, 'architecture': 'ModernBertModel'})
+  (1): Pooling({'word_embedding_dimension': 768, 'pooling_mode_cls_token': False, 'pooling_mode_mean_tokens': True, 'pooling_mode_max_tokens': False, 'pooling_mode_mean_sqrt_len_tokens': False, 'pooling_mode_weightedmean_tokens': False, 'pooling_mode_lasttoken': False, 'include_prompt': True})
+)
+```
+## Usage
+### Direct Usage (Sentence Transformers)
+First install the Sentence Transformers library:
+```bash
+pip install -U sentence-transformers
+```
+Then you can load this model and run inference.
+```python
+from sentence_transformers import SentenceTransformer
+# Download from the 🤗 Hub
+model = SentenceTransformer("sentence_transformers_model_id")
+# Run inference
+sentences = [
+    'How would you use Amcache analysis to detect fileless malware that drops temporary components for initial system compromise?',
+    'Amcache analysis provides critical forensic artifacts for detecting fileless malware employing temporary component deployment during initial system compromise, aligning with MITRE ATT&CK techniques T1055 (Process Injection) and T1620 (Reflective Code Loading).\\n\\n**Amcache Artifact Analysis Framework:**\\n\\nThe Amcache.hve registry hive maintains comprehensive application execution metadata, including file paths, hashes, and execution timestamps. For fileless malware detection, focus on:\\n\\n1. **Temporary File Creation Patterns**: Analyze entries with suspicious temporal clustering in the \\\\\\"Programs\\\\\\" key, particularly executables stored in system directories (C:\\\\\\\\Windows\\\\\\\\Temp, C:\\\\\\\\Users\\\\\\\\[User]\\\\\\\\AppData\\\\\\\\Local\\\\\\\\Temp). Legitimate applications typically exhibit predictable installation patterns, while malicious components often manifest as isolated, recently-created executables.\\n\\n2. **Hash-Based Indicators**: Cross-reference SHA-1 hashes against threat intelligence feeds and known malware signatures. Fileless malware frequently employs legitimate system binaries for process hollowing (T1055.012) or reflective DLL loading (T1620), making hash analysis crucial for identifying repurposed executables.\\n\\n3. **Execution Chain Analysis**: Examine parent-child relationships within Amcache entries to identify anomalous process spawning patterns. Fileless malware often exhibits unusual execution chains, particularly when temporary components spawn from unexpected parent processes or system services.\\n\\n**NIST CSF Implementation Strategy:**\\n\\nUnder the Detect (DE) function, specifically DE.AE-2 (Detected events are analyzed), implement continuous Amcache monitoring through:\\n\\n- **Baseline Establishment**: Create organizational baselines for normal temporary file creation patterns and execution behaviors\\n- **Anomaly Detection**: Deploy automated analysis tools to identify deviations from established baselines\\n- **Correlation Analysis**: Integrate Amcache findings with network traffic analysis and endpoint detection systems\\n\\n**Advanced Detection Methodologies:**\\n\\nUtilize PowerShell-based parsing scripts or specialized forensic tools like KAPE to extract and analyze Amcache artifacts. Focus on:\\n\\n- Unusual file extensions in temporary directories\\n- Executables created immediately before suspicious network activity\\n- Components with execution timestamps correlating with initial access events\\n- Hash collisions or similarities between temporary files and known malware families\\n\\nThis approach enables proactive identification of fileless malware campaigns leveraging temporary components for system compromise, supporting comprehensive threat hunting and incident response activities within enterprise environments.',
+    'To capture and display network traffic',
+]
+embeddings = model.encode(sentences)
+print(embeddings.shape)
+# (3, 768)
+# Get the similarity scores for the embeddings
+similarities = model.similarity(embeddings, embeddings)
+print(similarities)
+# tensor([[ 1.0000,  0.8653,  0.0078],
+#         [ 0.8653,  1.0000, -0.0407],
+#         [ 0.0078, -0.0407,  1.0000]])
+```
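
The model card lists cosine similarity as the similarity function, so the matrix printed above holds pairwise cosine scores between the three embeddings. A minimal sketch of that computation in plain Python follows. The `cosine_similarity` helper is hypothetical and the toy vectors are not real model output.

```python
import math
from typing import Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two vectors (hypothetical helper)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    # Treat a zero vector as having no similarity to anything.
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


# Toy stand-ins for embeddings: the first two point the same way, the third is orthogonal.
vectors = [[1.0, 0.0], [2.0, 0.0], [0.0, 3.0]]
matrix = [[cosine_similarity(u, v) for v in vectors] for u in vectors]
for row in matrix:
    print(row)
```

Because cosine similarity ignores vector length, the first two toy vectors score 1.0 against each other despite differing in magnitude, which mirrors how the diagonal of the matrix above is always 1.0000.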
+<!--
+### Direct Usage (Transformers)
+<details><summary>Click to see the direct usage in Transformers</summary>
+</details>
+-->
+<!--
+### Downstream Usage (Sentence Transformers)
+You can finetune this model on your own dataset.
+<details><summary>Click to expand</summary>
+</details>
+-->
+<!--
+### Out-of-Scope Use
+*List how the model may foreseeably be misused and address what users ought not to do with the model.*
+-->
+<!--
+## Bias, Risks and Limitations
+*What are the known or foreseeable issues stemming from this model? You could also flag here known failure cases or weaknesses of the model.*
+-->
+<!--
+### Recommendations
+*What are recommendations with respect to the foreseeable issues? For example, filtering explicit content.*
+-->
+## Training Details
+### Training Dataset
+#### Unnamed Dataset
+* Size: 35,705 training samples
+* Columns: <code>sentence_0</code>, <code>sentence_1</code>, and <code>label</code>
+* Approximate statistics based on the first 1000 samples:
+  |         | sentence_0                                                                         | sentence_1                                                                          | label                                                         |
+  |:--------|:-----------------------------------------------------------------------------------|:------------------------------------------------------------------------------------|:--------------------------------------------------------------|
+  | type    | string                                                                             | string                                                                              | float                                                         |
+  | details | <ul><li>min: 9 tokens</li><li>mean: 20.14 tokens</li><li>max: 103 tokens</li></ul> | <ul><li>min: 3 tokens</li><li>mean: 293.14 tokens</li><li>max: 934 tokens</li></ul> | <ul><li>min: 1.0</li><li>mean: 1.0</li><li>max: 1.0</li></ul> |
+* Samples:
+  | sentence_0                                                                                                                       | sentence_1                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                               | label            |
+  |:---------------------------------------------------------------------------------------------------------------------------------|:---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|:-----------------|
+  | <code>Under what circumstances does attribution bias distort intrusion linking, and what countermeasures can mitigate it?</code> | <code>Attribution bias in intrusion linking occurs when analysts allow preconceived notions, organizational pressures, or cognitive shortcuts to influence their assessment of attack origins and relationships between incidents. This distortion manifests through several mechanisms: confirmation bias leads analysts to interpret ambiguous indicators as evidence supporting their initial hypotheses; availability heuristic causes over-reliance on memorable or recent attacks; and anchoring effect results in disproportionate weight given to early intelligence assessments.\n\nThe MITRE ATT&CK framework's Groups matrix exemplifies how attribution bias emerges when linking tactics, techniques, and procedures (TTPs) across campaigns. Analysts may incorrectly associate similar tools or methodologies without sufficient evidence of operational connection, particularly when adversaries employ publicly available malware or common techniques across multiple threat groups. This challenge intensifies during high...</code> | <code>1.0</code> |
+  | <code>How can you identify store buffer bypass speculation artifacts?</code>                                                     | <code>Store buffer bypass speculation artifacts represent sophisticated side-channel vulnerabilities that exploit microarchitectural features in modern processors. These vulnerabilities occur when speculative execution accesses memory locations before stores are committed, creating observable timing differences that leak sensitive information.\n\n**Detection Methodologies:**\n\nStatic analysis tools can identify potential store buffer bypass vulnerabilities by examining assembly code patterns where load instructions follow store operations without proper memory barriers. Tools like Intel's Pin or LLVM-based analyzers can detect missing lfence/stfence instructions that prevent speculative execution from bypassing stores.\n\nDynamic analysis approaches involve precise timing measurements using high-resolution performance counters. Researchers measure memory access latencies to identify anomalous patterns where load operations complete faster than expected, indicating potential speculation thro...</code> | <code>1.0</code> |
+  | <code>How would you develop methods for analyzing attacks targeting industrial control systems?</code>                           | <code>Developing comprehensive methods for analyzing attacks targeting Industrial Control Systems (ICS) requires a multi-layered approach integrating specialized frameworks and industry standards.\n\n**Framework Integration**: Begin with NIST's Cybersecurity Framework, particularly the Identify function, to catalog ICS assets, data flows, and criticality levels. The Detect function provides guidance on implementing continuous monitoring capabilities tailored for operational technology environments. MITRE ATT&CK for ICS offers crucial tactical intelligence, mapping adversary behaviors specific to industrial contexts through techniques like \\\"Inhibit Response Function\\\" (T0803) or \\\"Modify Parameter\\\" (T0832).\n\n**Technical Analysis Methodology**: Establish baseline behavioral profiles for normal ICS operations using network traffic analysis, protocol inspection, and system state monitoring. Deploy specialized tools capable of deep packet inspection for industrial protocols (Modbus, D...</code> | <code>1.0</code> |
+* Loss: [<code>MultipleNegativesRankingLoss</code>](https://sbert.net/docs/package_reference/sentence_transformer/losses.html#multiplenegativesrankingloss) with these parameters:
+  ```json
+  {
+      "scale": 20.0,
+      "similarity_fct": "cos_sim"
+  }
+  ```
+### Training Hyperparameters
+#### Non-Default Hyperparameters
+- `eval_strategy`: steps
+- `per_device_train_batch_size`: 32
+- `per_device_eval_batch_size`: 32
+- `num_train_epochs`: 20
+- `multi_dataset_batch_sampler`: round_robin
+#### All Hyperparameters
+<details><summary>Click to expand</summary>
+- `overwrite_output_dir`: False
+- `do_predict`: False
+- `eval_strategy`: steps
+- `prediction_loss_only`: True
+- `per_device_train_batch_size`: 32
+- `per_device_eval_batch_size`: 32
+- `per_gpu_train_batch_size`: None
+- `per_gpu_eval_batch_size`: None
+- `gradient_accumulation_steps`: 1
+- `eval_accumulation_steps`: None
+- `torch_empty_cache_steps`: None
+- `learning_rate`: 5e-05
+- `weight_decay`: 0.0
+- `adam_beta1`: 0.9
+- `adam_beta2`: 0.999
+- `adam_epsilon`: 1e-08
+- `max_grad_norm`: 1
+- `num_train_epochs`: 20
+- `max_steps`: -1
+- `lr_scheduler_type`: linear
+- `lr_scheduler_kwargs`: {}
+- `warmup_ratio`: 0.0
+- `warmup_steps`: 0
+- `log_level`: passive
+- `log_level_replica`: warning
+- `log_on_each_node`: True
+- `logging_nan_inf_filter`: True
+- `save_safetensors`: True
+- `save_on_each_node`: False
+- `save_only_model`: False
+- `restore_callback_states_from_checkpoint`: False
+- `no_cuda`: False
+- `use_cpu`: False
+- `use_mps_device`: False
+- `seed`: 42
+- `data_seed`: None
+- `jit_mode_eval`: False
+- `use_ipex`: False
+- `bf16`: False
+- `fp16`: False
+- `fp16_opt_level`: O1
+- `half_precision_backend`: auto
+- `bf16_full_eval`: False
+- `fp16_full_eval`: False
+- `tf32`: None
+- `local_rank`: 0
+- `ddp_backend`: None
+- `tpu_num_cores`: None
+- `tpu_metrics_debug`: False
+- `debug`: []
+- `dataloader_drop_last`: True
+- `dataloader_num_workers`: 0
+- `dataloader_prefetch_factor`: None
+- `past_index`: -1
+- `disable_tqdm`: False
+- `remove_unused_columns`: True
+- `label_names`: None
+- `load_best_model_at_end`: False
+- `ignore_data_skip`: False
+- `fsdp`: []
+- `fsdp_min_num_params`: 0
+- `fsdp_config`: {'min_num_params': 0, 'xla': False, 'xla_fsdp_v2': False, 'xla_fsdp_grad_ckpt': False}
+- `fsdp_transformer_layer_cls_to_wrap`: None
+- `accelerator_config`: {'split_batches': False, 'dispatch_batches': None, 'even_batches': True, 'use_seedable_sampler': True, 'non_blocking': False, 'gradient_accumulation_kwargs': None}
+- `deepspeed`: None
+- `label_smoothing_factor`: 0.0
+- `optim`: adamw_torch
+- `optim_args`: None
+- `adafactor`: False
+- `group_by_length`: False
+- `length_column_name`: length
+- `ddp_find_unused_parameters`: None
+- `ddp_bucket_cap_mb`: None
+- `ddp_broadcast_buffers`: False
+- `dataloader_pin_memory`: True
+- `dataloader_persistent_workers`: False
+- `skip_memory_metrics`: True
+- `use_legacy_prediction_loop`: False
+- `push_to_hub`: False
+- `resume_from_checkpoint`: None
+- `hub_model_id`: None
+- `hub_strategy`: every_save
+- `hub_private_repo`: None
+- `hub_always_push`: False
+- `gradient_checkpointing`: False
+- `gradient_checkpointing_kwargs`: None
+- `include_inputs_for_metrics`: False
+- `include_for_metrics`: []
+- `eval_do_concat_batches`: True
+- `fp16_backend`: auto
+- `push_to_hub_model_id`: None
+- `push_to_hub_organization`: None
+- `mp_parameters`:
+- `auto_find_batch_size`: False
+- `full_determinism`: False
+- `torchdynamo`: None
+- `ray_scope`: last
+- `ddp_timeout`: 1800
+- `torch_compile`: False
+- `torch_compile_backend`: None
+- `torch_compile_mode`: None
+- `include_tokens_per_second`: False
+- `include_num_input_tokens_seen`: False
+- `neftune_noise_alpha`: None
+- `optim_target_modules`: None
+- `batch_eval_metrics`: False
+- `eval_on_start`: False
+- `use_liger_kernel`: False
+- `eval_use_gather_object`: False
+- `average_tokens_across_devices`: False
+- `prompts`: None
+- `batch_sampler`: batch_sampler
+- `multi_dataset_batch_sampler`: round_robin
+- `router_mapping`: {}
+- `learning_rate_mapping`: {}
+</details>
+### Training Logs
+| Epoch   | Step | Training Loss |
+|:-------:|:----:|:-------------:|
+| 1.0     | 139  | -             |
+| 2.0     | 278  | -             |
+| 3.0     | 417  | -             |
+| 3.5971  | 500  | 1.1678        |
+| 4.0     | 556  | -             |
+| 5.0     | 695  | -             |
+| 6.0     | 834  | -             |
+| 7.0     | 973  | -             |
+| 7.1942  | 1000 | 0.0258        |
+| 8.0     | 1112 | -             |
+| 9.0     | 1251 | -             |
+| 10.0    | 1390 | -             |
+| 10.7914 | 1500 | 0.0037        |
+| 11.0    | 1529 | -             |
+| 12.0    | 1668 | -             |
+| 13.0    | 1807 | -             |
+| 14.0    | 1946 | -             |
+| 14.3885 | 2000 | 0.0016        |
+| 15.0    | 2085 | -             |
+| 16.0    | 2224 | -             |
+| 17.0    | 2363 | -             |
+| 17.9856 | 2500 | 0.0009        |
+| 18.0    | 2502 | -             |
+| 19.0    | 2641 | -             |
+| 20.0    | 2780 | -             |
+### Framework Versions
+- Python: 3.10.10
+- Sentence Transformers: 5.0.0
+- Transformers: 4.52.4
+- PyTorch: 2.7.0+cu128
+- Accelerate: 1.9.0
+- Datasets: 3.6.0
+- Tokenizers: 0.21.1
+## Citation
+### BibTeX
+#### Sentence Transformers
+```bibtex
+@inproceedings{reimers-2019-sentence-bert,
+    title = "Sentence-BERT: Sentence Embeddings using Siamese BERT-Networks",
+    author = "Reimers, Nils and Gurevych, Iryna",
+    booktitle = "Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing",
+    month = "11",
+    year = "2019",
+    publisher = "Association for Computational Linguistics",
+    url = "https://arxiv.org/abs/1908.10084",
+}
+```
+#### MultipleNegativesRankingLoss
+```bibtex
+@misc{henderson2017efficient,
+    title={Efficient Natural Language Response Suggestion for Smart Reply},
+    author={Matthew Henderson and Rami Al-Rfou and Brian Strope and Yun-hsuan Sung and Laszlo Lukacs and Ruiqi Guo and Sanjiv Kumar and Balint Miklos and Ray Kurzweil},
+    year={2017},
+    eprint={1705.00652},
+    archivePrefix={arXiv},
+    primaryClass={cs.CL}
+}
+```
+<!--
+## Glossary
+*Clearly define terms in order to be accessible across audiences.*
+-->
+<!--
+## Model Card Authors
+*Lists the people who create the model card, providing recognition and accountability for the detailed work that goes into its construction.*
+-->
+<!--
+## Model Card Contact
+*Provides a way for people who have updates to the Model Card, suggestions, or questions, to contact the Model Card authors.*
+-->
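The epoch and step columns in the training logs can be cross-checked with a little arithmetic. The sketch below assumes an effective batch of 256 samples per optimizer step, for example the listed `per_device_train_batch_size` of 32 across 8 devices; the device count is an assumption inferred from the logs, not something the card states. Combined with `dataloader_drop_last: True`, that assumption reproduces the 139 steps per epoch and the fractional epochs logged at steps 500 through 2500.

```python
# Cross-check the step/epoch columns of the training log.
# ASSUMPTION: 8 devices. The model card does not state this; it is inferred
# from 139 steps per epoch with a per-device batch size of 32.
TRAIN_SAMPLES = 35_705      # "Size: 35,705 training samples"
PER_DEVICE_BATCH = 32       # per_device_train_batch_size
ASSUMED_DEVICES = 8         # hypothetical, see note above
NUM_EPOCHS = 20             # num_train_epochs

effective_batch = PER_DEVICE_BATCH * ASSUMED_DEVICES
# dataloader_drop_last=True discards the final partial batch, hence floor division.
steps_per_epoch = TRAIN_SAMPLES // effective_batch
total_steps = steps_per_epoch * NUM_EPOCHS


def epoch_at(step: int) -> float:
    """Fractional epoch reached after `step` optimizer steps."""
    return round(step / steps_per_epoch, 4)


logged_epochs = {step: epoch_at(step) for step in (500, 1000, 1500, 2000, 2500)}
print(steps_per_epoch, total_steps, logged_epochs)
```

Under the same assumption, the final logged step (2780) is exactly 20 epochs of 139 steps.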

config.json ADDED

	@@ -0,0 +1,55 @@

+{
+  "architectures": [
+    "ModernBertModel"
+  ],
+  "attention_bias": false,
+  "attention_dropout": 0.0,
+  "bos_token_id": 50281,
+  "classifier_activation": "gelu",
+  "classifier_bias": false,
+  "classifier_dropout": 0.0,
+  "classifier_pooling": "mean",
+  "cls_token_id": 50281,
+  "decoder_bias": true,
+  "deterministic_flash_attn": false,
+  "embedding_dropout": 0.0,
+  "eos_token_id": 50282,
+  "global_attn_every_n_layers": 3,
+  "global_rope_theta": 160000.0,
+  "gradient_checkpointing": false,
+  "hidden_activation": "gelu",
+  "hidden_size": 768,
+  "id2label": {
+    "0": "LABEL_0"
+  },
+  "initializer_cutoff_factor": 2.0,
+  "initializer_range": 0.02,
+  "intermediate_size": 1152,
+  "label2id": {
+    "LABEL_0": 0
+  },
+  "layer_norm_eps": 1e-05,
+  "local_attention": 128,
+  "local_rope_theta": 10000.0,
+  "max_position_embeddings": 8192,
+  "mlp_bias": false,
+  "mlp_dropout": 0.0,
+  "model_type": "modernbert",
+  "norm_bias": false,
+  "norm_eps": 1e-05,
+  "num_attention_heads": 12,
+  "num_hidden_layers": 22,
+  "pad_token_id": 50283,
+  "position_embedding_type": "absolute",
+  "repad_logits_with_grad": false,
+  "sentence_transformers": {
+    "activation_fn": "torch.nn.modules.activation.Sigmoid",
+    "version": "5.0.0"
+  },
+  "sep_token_id": 50282,
+  "sparse_pred_ignore_index": -100,
+  "sparse_prediction": false,
+  "torch_dtype": "float32",
+  "transformers_version": "4.52.4",
+  "vocab_size": 50368
+}
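The attention fields in this config interact: `global_attn_every_n_layers` decides which of the `num_hidden_layers` blocks attend globally, while the remaining blocks use a sliding window of `local_attention` tokens. The sketch below assumes that a layer is global when its zero-based index is divisible by `global_attn_every_n_layers`, and that global and local layers use `global_rope_theta` and `local_rope_theta` respectively. That rule is an assumption here and is worth verifying against the ModernBERT source in `transformers`; the helper name `layer_plan` is illustrative.

```python
# Values copied from the config.json diff.
config = {
    "hidden_size": 768,
    "num_attention_heads": 12,
    "num_hidden_layers": 22,
    "global_attn_every_n_layers": 3,
    "local_attention": 128,
    "global_rope_theta": 160000.0,
    "local_rope_theta": 10000.0,
}


def layer_plan(cfg: dict) -> list:
    """Describe each layer's attention type (ASSUMED rule: index % n == 0 is global)."""
    plan = []
    for idx in range(cfg["num_hidden_layers"]):
        is_global = idx % cfg["global_attn_every_n_layers"] == 0
        plan.append({
            "layer": idx,
            "attention": "global" if is_global else "local",
            # Local layers are limited to a window of `local_attention` tokens.
            "window": None if is_global else cfg["local_attention"],
            "rope_theta": cfg["global_rope_theta"] if is_global else cfg["local_rope_theta"],
        })
    return plan


plan = layer_plan(config)
global_layers = [entry["layer"] for entry in plan if entry["attention"] == "global"]
head_dim = config["hidden_size"] // config["num_attention_heads"]
print(global_layers, head_dim)
```

Under that rule, 8 of the 22 layers would be global and each of the 12 heads would have dimension 64.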

config_sentence_transformers.json ADDED

	@@ -0,0 +1,14 @@

+{
+  "model_type": "SentenceTransformer",
+  "__version__": {
+    "sentence_transformers": "5.0.0",
+    "transformers": "4.52.4",
+    "pytorch": "2.7.0+cu128"
+  },
+  "prompts": {
+    "query": "",
+    "document": ""
+  },
+  "default_prompt_name": null,
+  "similarity_fn_name": "cosine"
+}
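`similarity_fn_name` is `cosine`, the same similarity the model card lists for training (`MultipleNegativesRankingLoss` with `scale` 20.0 and `cos_sim`). As a minimal sketch of that objective, assuming the usual in-batch-negatives formulation (cross-entropy over scaled cosine similarities, with each anchor's own positive as the target), a dependency-free version might look like the following. The function names are illustrative and are not the library's API.

```python
import math


def cos_sim(a, b):
    """Cosine similarity between two equal-length, non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def mnr_loss(anchors, positives, scale=20.0):
    """Mean cross-entropy where positives[i] is the target for anchors[i]
    and every other positive in the batch serves as a negative."""
    losses = []
    for i, anchor in enumerate(anchors):
        logits = [scale * cos_sim(anchor, candidate) for candidate in positives]
        # Numerically stable log-sum-exp over the batch.
        peak = max(logits)
        log_denominator = peak + math.log(sum(math.exp(v - peak) for v in logits))
        losses.append(log_denominator - logits[i])
    return sum(losses) / len(losses)


# Toy batch: two orthogonal anchors, first paired correctly, then swapped.
anchors = [[1.0, 0.0], [0.0, 1.0]]
matched = mnr_loss(anchors, [[1.0, 0.0], [0.0, 1.0]])
mismatched = mnr_loss(anchors, [[0.0, 1.0], [1.0, 0.0]])
print(matched, mismatched)
```

With a scale of 20, the correctly paired toy batch gives a loss close to zero, while the swapped batch gives a loss close to 20, which illustrates why the scale sharpens the contrast between positives and in-batch negatives.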

model.safetensors ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:ab97ffc3039e63cde330b9e83b6c359fbbd4cb3eb59c05d804ca29000486ef9e
+size 596070136
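The LFS pointer records only the file size, but together with `torch_dtype: float32` from `config.json` it gives a rough parameter count. The estimate below assumes every tensor is stored as a 4-byte float and counts the safetensors header as if it were weights, so it should be read as a slight overestimate rather than an exact figure.

```python
# Rough parameter count from the LFS pointer size.
# ASSUMPTION: all weights are float32 (4 bytes each); the safetensors header
# is not subtracted, so the result is an upper bound.
FILE_SIZE_BYTES = 596_070_136   # "size" field of the LFS pointer
BYTES_PER_PARAM = 4             # float32, per torch_dtype in config.json

approx_params = FILE_SIZE_BYTES // BYTES_PER_PARAM
approx_millions = round(approx_params / 1e6, 1)
print(approx_params, approx_millions)
```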

modules.json ADDED

	@@ -0,0 +1,14 @@

+[
+  {
+    "idx": 0,
+    "name": "0",
+    "path": "",
+    "type": "sentence_transformers.models.Transformer"
+  },
+  {
+    "idx": 1,
+    "name": "1",
+    "path": "1_Pooling",
+    "type": "sentence_transformers.models.Pooling"
+  }
+]
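`modules.json` chains a Transformer module into a Pooling module, and `1_Pooling/config.json` enables only `pooling_mode_mean_tokens`. A minimal sketch of masked mean pooling follows, assuming token embeddings and an attention mask given as plain Python lists; the helper name `mean_pool` is hypothetical and this is not the library's implementation.

```python
def mean_pool(token_embeddings, attention_mask):
    """Average token vectors where the mask is 1, skipping padding positions."""
    dim = len(token_embeddings[0])
    totals = [0.0] * dim
    count = 0
    for vector, keep in zip(token_embeddings, attention_mask):
        if keep:
            count += 1
            for i, value in enumerate(vector):
                totals[i] += value
    # Guard against an all-padding input to avoid division by zero.
    count = max(count, 1)
    return [total / count for total in totals]


# Three real tokens followed by one padding position (toy 2-dimensional vectors).
tokens = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0], [9.0, 9.0]]
mask = [1, 1, 1, 0]
sentence_embedding = mean_pool(tokens, mask)
print(sentence_embedding)
```

The padded position is excluded from both the sum and the count, so padding does not pull the sentence embedding toward the pad vector.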

sentence_bert_config.json ADDED

	@@ -0,0 +1,4 @@

+{
+    "max_seq_length": 1024,
+    "do_lower_case": false
+}
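`max_seq_length` caps inputs at 1024 tokens, well below the 8192 positions allowed by `max_position_embeddings` in `config.json`, so longer documents are truncated before encoding. The longest `sentence_1` among the first 1000 training samples (934 tokens) fits within that cap. The sketch below illustrates the effect, assuming right-side truncation that keeps `[CLS]` first and `[SEP]` last; the helper `build_input` is hypothetical and the content ids are dummies.

```python
MAX_SEQ_LENGTH = 1024           # sentence_bert_config.json
CLS_ID, SEP_ID = 50281, 50282   # cls_token_id / sep_token_id in config.json


def build_input(content_ids, max_len=MAX_SEQ_LENGTH):
    """Wrap content ids in [CLS] ... [SEP], dropping trailing content beyond max_len.

    ASSUMPTION: truncation removes tokens from the right-hand side.
    """
    budget = max_len - 2  # reserve room for the two special tokens
    return [CLS_ID] + list(content_ids)[:budget] + [SEP_ID]


short_doc = build_input(range(100, 110))   # 10 dummy content ids
long_doc = build_input(range(5000))        # 5000 dummy content ids
print(len(short_doc), len(long_doc))
```

Text that falls beyond the cap does not contribute to the embedding, so long documents may need to be chunked before calling `model.encode`.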

special_tokens_map.json ADDED

	@@ -0,0 +1,37 @@

+{
+  "cls_token": {
+    "content": "[CLS]",
+    "lstrip": false,
+    "normalized": false,
+    "rstrip": false,
+    "single_word": false
+  },
+  "mask_token": {
+    "content": "[MASK]",
+    "lstrip": true,
+    "normalized": false,
+    "rstrip": false,
+    "single_word": false
+  },
+  "pad_token": {
+    "content": "[PAD]",
+    "lstrip": false,
+    "normalized": false,
+    "rstrip": false,
+    "single_word": false
+  },
+  "sep_token": {
+    "content": "[SEP]",
+    "lstrip": false,
+    "normalized": false,
+    "rstrip": false,
+    "single_word": false
+  },
+  "unk_token": {
+    "content": "[UNK]",
+    "lstrip": false,
+    "normalized": false,
+    "rstrip": false,
+    "single_word": false
+  }
+}

tokenizer.json ADDED

The diff for this file is too large to render.

tokenizer_config.json ADDED

	@@ -0,0 +1,952 @@

+{
+  "added_tokens_decoder": {
+    "0": {
+      "content": "|||IP_ADDRESS|||",
+      "lstrip": false,
+      "normalized": true,
+      "rstrip": false,
+      "single_word": false,
+      "special": false
+    },
+    "1": {
+      "content": "<|padding|>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "50254": {
+      "content": "                        ",
+      "lstrip": false,
+      "normalized": true,
+      "rstrip": false,
+      "single_word": false,
+      "special": false
+    },
+    "50255": {
+      "content": "                       ",
+      "lstrip": false,
+      "normalized": true,
+      "rstrip": false,
+      "single_word": false,
+      "special": false
+    },
+    "50256": {
+      "content": "                      ",
+      "lstrip": false,
+      "normalized": true,
+      "rstrip": false,
+      "single_word": false,
+      "special": false
+    },
+    "50257": {
+      "content": "                     ",
+      "lstrip": false,
+      "normalized": true,
+      "rstrip": false,
+      "single_word": false,
+      "special": false
+    },
+    "50258": {
+      "content": "                    ",
+      "lstrip": false,
+      "normalized": true,
+      "rstrip": false,
+      "single_word": false,
+      "special": false
+    },
+    "50259": {
+      "content": "                   ",
+      "lstrip": false,
+      "normalized": true,
+      "rstrip": false,
+      "single_word": false,
+      "special": false
+    },
+    "50260": {
+      "content": "                  ",
+      "lstrip": false,
+      "normalized": true,
+      "rstrip": false,
+      "single_word": false,
+      "special": false
+    },
+    "50261": {
+      "content": "                 ",
+      "lstrip": false,
+      "normalized": true,
+      "rstrip": false,
+      "single_word": false,
+      "special": false
+    },
+    "50262": {
+      "content": "                ",
+      "lstrip": false,
+      "normalized": true,
+      "rstrip": false,
+      "single_word": false,
+      "special": false
+    },
+    "50263": {
+      "content": "               ",
+      "lstrip": false,
+      "normalized": true,
+      "rstrip": false,
+      "single_word": false,
+      "special": false
+    },
+    "50264": {
+      "content": "              ",
+      "lstrip": false,
+      "normalized": true,
+      "rstrip": false,
+      "single_word": false,
+      "special": false
+    },
+    "50265": {
+      "content": "             ",
+      "lstrip": false,
+      "normalized": true,
+      "rstrip": false,
+      "single_word": false,
+      "special": false
+    },
+    "50266": {
+      "content": "            ",
+      "lstrip": false,
+      "normalized": true,
+      "rstrip": false,
+      "single_word": false,
+      "special": false
+    },
+    "50267": {
+      "content": "           ",
+      "lstrip": false,
+      "normalized": true,
+      "rstrip": false,
+      "single_word": false,
+      "special": false
+    },
+    "50268": {
+      "content": "          ",
+      "lstrip": false,
+      "normalized": true,
+      "rstrip": false,
+      "single_word": false,
+      "special": false
+    },
+    "50269": {
+      "content": "         ",
+      "lstrip": false,
+      "normalized": true,
+      "rstrip": false,
+      "single_word": false,
+      "special": false
+    },
+    "50270": {
+      "content": "        ",
+      "lstrip": false,
+      "normalized": true,
+      "rstrip": false,
+      "single_word": false,
+      "special": false
+    },
+    "50271": {
+      "content": "       ",
+      "lstrip": false,
+      "normalized": true,
+      "rstrip": false,
+      "single_word": false,
+      "special": false
+    },
+    "50272": {
+      "content": "      ",
+      "lstrip": false,
+      "normalized": true,
+      "rstrip": false,
+      "single_word": false,
+      "special": false
+    },
+    "50273": {
+      "content": "     ",
+      "lstrip": false,
+      "normalized": true,
+      "rstrip": false,
+      "single_word": false,
+      "special": false
+    },
+    "50274": {
+      "content": "    ",
+      "lstrip": false,
+      "normalized": true,
+      "rstrip": false,
+      "single_word": false,
+      "special": false
+    },
+    "50275": {
+      "content": "   ",
+      "lstrip": false,
+      "normalized": true,
+      "rstrip": false,
+      "single_word": false,
+      "special": false
+    },
+    "50276": {
+      "content": "  ",
+      "lstrip": false,
+      "normalized": true,
+      "rstrip": false,
+      "single_word": false,
+      "special": false
+    },
+    "50277": {
+      "content": "|||EMAIL_ADDRESS|||",
+      "lstrip": false,
+      "normalized": true,
+      "rstrip": false,
+      "single_word": false,
+      "special": false
+    },
+    "50278": {
+      "content": "|||PHONE_NUMBER|||",
+      "lstrip": false,
+      "normalized": true,
+      "rstrip": false,
+      "single_word": false,
+      "special": false
+    },
+    "50279": {
+      "content": "<|endoftext|>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "50280": {
+      "content": "[UNK]",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "50281": {
+      "content": "[CLS]",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "50282": {
+      "content": "[SEP]",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "50283": {
+      "content": "[PAD]",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "50284": {
+      "content": "[MASK]",
+      "lstrip": true,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "50285": {
+      "content": "[unused0]",
+      "lstrip": false,
+      "normalized": true,
+      "rstrip": false,
+      "single_word": false,
+      "special": false
+    },
+    "50286": {
+      "content": "[unused1]",
+      "lstrip": false,
+      "normalized": true,
+      "rstrip": false,
+      "single_word": false,
+      "special": false
+    },
+    "50287": {
+      "content": "[unused2]",
+      "lstrip": false,
+      "normalized": true,
+      "rstrip": false,
+      "single_word": false,
+      "special": false
+    },
+    "50288": {
+      "content": "[unused3]",
+      "lstrip": false,
+      "normalized": true,
+      "rstrip": false,
+      "single_word": false,
+      "special": false
+    },
+    "50289": {
+      "content": "[unused4]",
+      "lstrip": false,
+      "normalized": true,
+      "rstrip": false,
+      "single_word": false,
+      "special": false
+    },
+    "50290": {
+      "content": "[unused5]",
+      "lstrip": false,
+      "normalized": true,
+      "rstrip": false,
+      "single_word": false,
+      "special": false
+    },
+    "50291": {
+      "content": "[unused6]",
+      "lstrip": false,
+      "normalized": true,
+      "rstrip": false,
+      "single_word": false,
+      "special": false
+    },
+    "50292": {
+      "content": "[unused7]",
+      "lstrip": false,
+      "normalized": true,
+      "rstrip": false,
+      "single_word": false,
+      "special": false
+    },
+    "50293": {
+      "content": "[unused8]",
+      "lstrip": false,
+      "normalized": true,
+      "rstrip": false,
+      "single_word": false,
+      "special": false
+    },
+    "50294": {
+      "content": "[unused9]",
+      "lstrip": false,
+      "normalized": true,
+      "rstrip": false,
+      "single_word": false,
+      "special": false
+    },
+    "50295": {
+      "content": "[unused10]",
+      "lstrip": false,
+      "normalized": true,
+      "rstrip": false,
+      "single_word": false,
+      "special": false
+    },
+    "50296": {
+      "content": "[unused11]",
+      "lstrip": false,
+      "normalized": true,
+      "rstrip": false,
+      "single_word": false,
+      "special": false
+    },
+    "50297": {
+      "content": "[unused12]",
+      "lstrip": false,
+      "normalized": true,
+      "rstrip": false,
+      "single_word": false,
+      "special": false
+    },
+    "50298": {
+      "content": "[unused13]",
+      "lstrip": false,
+      "normalized": true,
+      "rstrip": false,
+      "single_word": false,
+      "special": false
+    },
+    "50299": {
+      "content": "[unused14]",
+      "lstrip": false,
+      "normalized": true,
+      "rstrip": false,
+      "single_word": false,
+      "special": false
+    },
+    "50300": {
+      "content": "[unused15]",
+      "lstrip": false,
+      "normalized": true,
+      "rstrip": false,
+      "single_word": false,
+      "special": false
+    },
+    "50301": {
+      "content": "[unused16]",
+      "lstrip": false,
+      "normalized": true,
+      "rstrip": false,
+      "single_word": false,
+      "special": false
+    },
+    "50302": {
+      "content": "[unused17]",
+      "lstrip": false,
+      "normalized": true,
+      "rstrip": false,
+      "single_word": false,
+      "special": false
+    },
+    "50303": {
+      "content": "[unused18]",
+      "lstrip": false,
+      "normalized": true,
+      "rstrip": false,
+      "single_word": false,
+      "special": false
+    },
+    "50304": {
+      "content": "[unused19]",
+      "lstrip": false,
+      "normalized": true,
+      "rstrip": false,
+      "single_word": false,
+      "special": false
+    },
+    "50305": {
+      "content": "[unused20]",
+      "lstrip": false,
+      "normalized": true,
+      "rstrip": false,
+      "single_word": false,
+      "special": false
+    },
+    "50306": {
+      "content": "[unused21]",
+      "lstrip": false,
+      "normalized": true,
+      "rstrip": false,
+      "single_word": false,
+      "special": false
+    },
+    "50307": {
+      "content": "[unused22]",
+      "lstrip": false,
+      "normalized": true,
+      "rstrip": false,
+      "single_word": false,
+      "special": false
+    },
+    "50308": {
+      "content": "[unused23]",
+      "lstrip": false,
+      "normalized": true,
+      "rstrip": false,
+      "single_word": false,
+      "special": false
+    },
+    "50309": {
+      "content": "[unused24]",
+      "lstrip": false,
+      "normalized": true,
+      "rstrip": false,
+      "single_word": false,
+      "special": false
+    },
+    "50310": {
+      "content": "[unused25]",
+      "lstrip": false,
+      "normalized": true,
+      "rstrip": false,
+      "single_word": false,
+      "special": false
+    },
+    "50311": {
+      "content": "[unused26]",
+      "lstrip": false,
+      "normalized": true,
+      "rstrip": false,
+      "single_word": false,
+      "special": false
+    },
+    "50312": {
+      "content": "[unused27]",
+      "lstrip": false,
+      "normalized": true,
+      "rstrip": false,
+      "single_word": false,
+      "special": false
+    },
+    "50313": {
+      "content": "[unused28]",
+      "lstrip": false,
+      "normalized": true,
+      "rstrip": false,
+      "single_word": false,
+      "special": false
+    },
+    "50314": {
+      "content": "[unused29]",
+      "lstrip": false,
+      "normalized": true,
+      "rstrip": false,
+      "single_word": false,
+      "special": false
+    },
+    "50315": {
+      "content": "[unused30]",
+      "lstrip": false,
+      "normalized": true,
+      "rstrip": false,
+      "single_word": false,
+      "special": false
+    },
+    "50316": {
+      "content": "[unused31]",
+      "lstrip": false,
+      "normalized": true,
+      "rstrip": false,
+      "single_word": false,
+      "special": false
+    },
+    "50317": {
+      "content": "[unused32]",
+      "lstrip": false,
+      "normalized": true,
+      "rstrip": false,
+      "single_word": false,
+      "special": false
+    },
+    "50318": {
+      "content": "[unused33]",
+      "lstrip": false,
+      "normalized": true,
+      "rstrip": false,
+      "single_word": false,
+      "special": false
+    },
+    "50319": {
+      "content": "[unused34]",
+      "lstrip": false,
+      "normalized": true,
+      "rstrip": false,
+      "single_word": false,
+      "special": false
+    },
+    "50320": {
+      "content": "[unused35]",
+      "lstrip": false,
+      "normalized": true,
+      "rstrip": false,
+      "single_word": false,
+      "special": false
+    },
+    "50321": {
+      "content": "[unused36]",
+      "lstrip": false,
+      "normalized": true,
+      "rstrip": false,
+      "single_word": false,
+      "special": false
+    },
+    "50322": {
+      "content": "[unused37]",
+      "lstrip": false,
+      "normalized": true,
+      "rstrip": false,
+      "single_word": false,
+      "special": false
+    },
+    "50323": {
+      "content": "[unused38]",
+      "lstrip": false,
+      "normalized": true,
+      "rstrip": false,
+      "single_word": false,
+      "special": false
+    },
+    "50324": {
+      "content": "[unused39]",
+      "lstrip": false,
+      "normalized": true,
+      "rstrip": false,
+      "single_word": false,
+      "special": false
+    },
+    "50325": {
+      "content": "[unused40]",
+      "lstrip": false,
+      "normalized": true,
+      "rstrip": false,
+      "single_word": false,
+      "special": false
+    },
+    "50326": {
+      "content": "[unused41]",
+      "lstrip": false,
+      "normalized": true,
+      "rstrip": false,
+      "single_word": false,
+      "special": false
+    },
+    "50327": {
+      "content": "[unused42]",
+      "lstrip": false,
+      "normalized": true,
+      "rstrip": false,
+      "single_word": false,
+      "special": false
+    },
+    "50328": {
+      "content": "[unused43]",
+      "lstrip": false,
+      "normalized": true,
+      "rstrip": false,
+      "single_word": false,
+      "special": false
+    },
+    "50329": {
+      "content": "[unused44]",
+      "lstrip": false,
+      "normalized": true,
+      "rstrip": false,
+      "single_word": false,
+      "special": false
+    },
+    "50330": {
+      "content": "[unused45]",
+      "lstrip": false,
+      "normalized": true,
+      "rstrip": false,
+      "single_word": false,
+      "special": false
+    },
+    "50331": {
+      "content": "[unused46]",
+      "lstrip": false,
+      "normalized": true,
+      "rstrip": false,
+      "single_word": false,
+      "special": false
+    },
+    "50332": {
+      "content": "[unused47]",
+      "lstrip": false,
+      "normalized": true,
+      "rstrip": false,
+      "single_word": false,
+      "special": false
+    },
+    "50333": {
+      "content": "[unused48]",
+      "lstrip": false,
+      "normalized": true,
+      "rstrip": false,
+      "single_word": false,
+      "special": false
+    },
+    "50334": {
+      "content": "[unused49]",
+      "lstrip": false,
+      "normalized": true,
+      "rstrip": false,
+      "single_word": false,
+      "special": false
+    },
+    "50335": {
+      "content": "[unused50]",
+      "lstrip": false,
+      "normalized": true,
+      "rstrip": false,
+      "single_word": false,
+      "special": false
+    },
+    "50336": {
+      "content": "[unused51]",
+      "lstrip": false,
+      "normalized": true,
+      "rstrip": false,
+      "single_word": false,
+      "special": false
+    },
+    "50337": {
+      "content": "[unused52]",
+      "lstrip": false,
+      "normalized": true,
+      "rstrip": false,
+      "single_word": false,
+      "special": false
+    },
+    "50338": {
+      "content": "[unused53]",
+      "lstrip": false,
+      "normalized": true,
+      "rstrip": false,
+      "single_word": false,
+      "special": false
+    },
+    "50339": {
+      "content": "[unused54]",
+      "lstrip": false,
+      "normalized": true,
+      "rstrip": false,
+      "single_word": false,
+      "special": false
+    },
+    "50340": {
+      "content": "[unused55]",
+      "lstrip": false,
+      "normalized": true,
+      "rstrip": false,
+      "single_word": false,
+      "special": false
+    },
+    "50341": {
+      "content": "[unused56]",
+      "lstrip": false,
+      "normalized": true,
+      "rstrip": false,
+      "single_word": false,
+      "special": false
+    },
+    "50342": {
+      "content": "[unused57]",
+      "lstrip": false,
+      "normalized": true,
+      "rstrip": false,
+      "single_word": false,
+      "special": false
+    },
+    "50343": {
+      "content": "[unused58]",
+      "lstrip": false,
+      "normalized": true,
+      "rstrip": false,
+      "single_word": false,
+      "special": false
+    },
+    "50344": {
+      "content": "[unused59]",
+      "lstrip": false,
+      "normalized": true,
+      "rstrip": false,
+      "single_word": false,
+      "special": false
+    },
+    "50345": {
+      "content": "[unused60]",
+      "lstrip": false,
+      "normalized": true,
+      "rstrip": false,
+      "single_word": false,
+      "special": false
+    },
+    "50346": {
+      "content": "[unused61]",
+      "lstrip": false,
+      "normalized": true,
+      "rstrip": false,
+      "single_word": false,
+      "special": false
+    },
+    "50347": {
+      "content": "[unused62]",
+      "lstrip": false,
+      "normalized": true,
+      "rstrip": false,
+      "single_word": false,
+      "special": false
+    },
+    "50348": {
+      "content": "[unused63]",
+      "lstrip": false,
+      "normalized": true,
+      "rstrip": false,
+      "single_word": false,
+      "special": false
+    },
+    "50349": {
+      "content": "[unused64]",
+      "lstrip": false,
+      "normalized": true,
+      "rstrip": false,
+      "single_word": false,
+      "special": false
+    },
+    "50350": {
+      "content": "[unused65]",
+      "lstrip": false,
+      "normalized": true,
+      "rstrip": false,
+      "single_word": false,
+      "special": false
+    },
+    "50351": {
+      "content": "[unused66]",
+      "lstrip": false,
+      "normalized": true,
+      "rstrip": false,
+      "single_word": false,
+      "special": false
+    },
+    "50352": {
+      "content": "[unused67]",
+      "lstrip": false,
+      "normalized": true,
+      "rstrip": false,
+      "single_word": false,
+      "special": false
+    },
+    "50353": {
+      "content": "[unused68]",
+      "lstrip": false,
+      "normalized": true,
+      "rstrip": false,
+      "single_word": false,
+      "special": false
+    },
+    "50354": {
+      "content": "[unused69]",
+      "lstrip": false,
+      "normalized": true,
+      "rstrip": false,
+      "single_word": false,
+      "special": false
+    },
+    "50355": {
+      "content": "[unused70]",
+      "lstrip": false,
+      "normalized": true,
+      "rstrip": false,
+      "single_word": false,
+      "special": false
+    },
+    "50356": {
+      "content": "[unused71]",
+      "lstrip": false,
+      "normalized": true,
+      "rstrip": false,
+      "single_word": false,
+      "special": false
+    },
+    "50357": {
+      "content": "[unused72]",
+      "lstrip": false,
+      "normalized": true,
+      "rstrip": false,
+      "single_word": false,
+      "special": false
+    },
+    "50358": {
+      "content": "[unused73]",
+      "lstrip": false,
+      "normalized": true,
+      "rstrip": false,
+      "single_word": false,
+      "special": false
+    },
+    "50359": {
+      "content": "[unused74]",
+      "lstrip": false,
+      "normalized": true,
+      "rstrip": false,
+      "single_word": false,
+      "special": false
+    },
+    "50360": {
+      "content": "[unused75]",
+      "lstrip": false,
+      "normalized": true,
+      "rstrip": false,
+      "single_word": false,
+      "special": false
+    },
+    "50361": {
+      "content": "[unused76]",
+      "lstrip": false,
+      "normalized": true,
+      "rstrip": false,
+      "single_word": false,
+      "special": false
+    },
+    "50362": {
+      "content": "[unused77]",
+      "lstrip": false,
+      "normalized": true,
+      "rstrip": false,
+      "single_word": false,
+      "special": false
+    },
+    "50363": {
+      "content": "[unused78]",
+      "lstrip": false,
+      "normalized": true,
+      "rstrip": false,
+      "single_word": false,
+      "special": false
+    },
+    "50364": {
+      "content": "[unused79]",
+      "lstrip": false,
+      "normalized": true,
+      "rstrip": false,
+      "single_word": false,
+      "special": false
+    },
+    "50365": {
+      "content": "[unused80]",
+      "lstrip": false,
+      "normalized": true,
+      "rstrip": false,
+      "single_word": false,
+      "special": false
+    },
+    "50366": {
+      "content": "[unused81]",
+      "lstrip": false,
+      "normalized": true,
+      "rstrip": false,
+      "single_word": false,
+      "special": false
+    },
+    "50367": {
+      "content": "[unused82]",
+      "lstrip": false,
+      "normalized": true,
+      "rstrip": false,
+      "single_word": false,
+      "special": false
+    }
+  },
+  "clean_up_tokenization_spaces": true,
+  "cls_token": "[CLS]",
+  "extra_special_tokens": {},
+  "mask_token": "[MASK]",
+  "max_length": 1024,
+  "model_input_names": [
+    "input_ids",
+    "attention_mask"
+  ],
+  "model_max_length": 1024,
+  "pad_to_multiple_of": null,
+  "pad_token": "[PAD]",
+  "pad_token_type_id": 0,
+  "padding_side": "right",
+  "sep_token": "[SEP]",
+  "stride": 0,
+  "tokenizer_class": "PreTrainedTokenizer",
+  "truncation_side": "right",
+  "truncation_strategy": "longest_first",
+  "unk_token": "[UNK]"
+}