SentenceTransformer based on Qwen/Qwen3-Embedding-0.6B
This is a sentence-transformers model finetuned from Qwen/Qwen3-Embedding-0.6B on the json dataset. It maps sentences & paragraphs to a 1024-dimensional dense vector space and can be used for semantic textual similarity, semantic search, paraphrase mining, text classification, clustering, and more.
Model Details
Model Description
- Model Type: Sentence Transformer
- Base model: Qwen/Qwen3-Embedding-0.6B
- Maximum Sequence Length: 2048 tokens
- Output Dimensionality: 1024 dimensions
- Similarity Function: Cosine Similarity
- Training Dataset:
Model Sources
Full Model Architecture
SentenceTransformer(
(0): Transformer({'max_seq_length': 2048, 'do_lower_case': False, 'architecture': 'Qwen3Model'})
(1): Pooling({'word_embedding_dimension': 1024, 'pooling_mode_cls_token': False, 'pooling_mode_mean_tokens': False, 'pooling_mode_max_tokens': False, 'pooling_mode_mean_sqrt_len_tokens': False, 'pooling_mode_weightedmean_tokens': False, 'pooling_mode_lasttoken': True, 'include_prompt': True})
(2): Normalize()
)
Usage
Direct Usage (Sentence Transformers)
First install the Sentence Transformers library:
pip install -U sentence-transformers
Then you can load this model and run inference.
from sentence_transformers import SentenceTransformer
model = SentenceTransformer("ThienLe/Qwen3-SecEmbed")
queries = [
"Scheduled Task\nAdversaries may abuse the Windows Task Scheduler to perform task scheduling for initial or recurring execution of malicious code. There are multiple ways to access the Task Scheduler in Windows. The schtasks utility can be run directly on the command line, or the Task Scheduler can be opened through the GUI within the Administrator Tools section of the Control Panel.[1] In some cases, adversaries have used a .NET wrapper for the Windows Task Scheduler, and alternatively, adversaries have used the Windows netapi32 library and Windows Management Instrumentation (WMI) to create a scheduled task. Adversaries may also utilize the Powershell Cmdlet Invoke-CimMethod, which leverages WMI class PS_ScheduledTask to create a scheduled task via an XML path.[2] An adversary may use Windows Task Scheduler to execute programs at system startup or on a scheduled basis for persistence. The Windows Task Scheduler can also be abused to conduct remote Execution as part of Lateral Movement and/or to run a process under the context of a specified account (such as SYSTEM). Similar to System Binary Proxy Execution, adversaries have also abused the Windows Task Scheduler to potentially mask one-time execution under signed/trusted system processes.[3] Adversaries may also create \"hidden\" scheduled tasks (i.e. Hide Artifacts) that may not be visible to defender tools and manual queries used to enumerate tasks. Specifically, an adversary may hide a task from schtasks /query and the Task Scheduler by deleting the associated Security Descriptor (SD) registry value (where deletion of this value must be completed using SYSTEM permissions).[4][5] Adversaries may also employ alternate methods to hide tasks, such as altering the metadata (e.g., Index value) within associated registry keys.[6]",
]
documents = [
'BlackByte\nBlackByte is a ransomware threat actor operating since at least 2021. BlackByte is associated with several versions of ransomware also labeled BlackByte Ransomware. BlackByte ransomware operations initially used a common encryption key allowing for the development of a universal decryptor, but subsequent versions such as BlackByte 2.0 Ransomware use more robust encryption mechanisms. BlackByte is notable for operations targeting critical infrastructure entities among other targets across North America.[1][2][3][4][5]\nBlackByte created scheduled tasks for payload execution.[38][39]',
'APT32\nAPT32 is a suspected Vietnam-based threat group that has been active since at least 2014. The group has targeted multiple private sector industries as well as foreign governments, dissidents, and journalists with a strong focus on Southeast Asian countries like Vietnam, the Philippines, Laos, and Cambodia. They have extensively used strategic web compromises to compromise victims.[1][2][3]\nAPT32 used the net view command to show all shares available, including the administrative shares such as C$ and ADMIN$.[5]',
'RFC 4880, the standard for the format of PGP messages, says:\n\nThe high 16 bits (first two octets) of the hash are included in the\n\nSignature packet to provide a quick test to reject some invalid\nsignatures.\n\nHowever, you are thinking it wrong. Signatures are not encryption, and signatures are not encrypted. In fact, given the value of the public key (which is public) and the signature itself, one can recompute the complete hash value of the message which is signed (at least for RSA, which is technically known as a signature algorithm with recovery). The first 16 bits are just a helper so that software can avoid many modular exponentiations when it is looking for the "correct" public key among a set of candidates; they save a few milliseconds worth of computation, that\'s all.\n\nA generic note is that signatures can leak information on that which is signed; so if you sign and encrypt a confidential message, then you should, conceptually, either sign the encrypted message, or encrypt the signature along with the message contents as well. OpenPGP uses the latter method.\nWhen a message is just signed, not encrypted, then it makes no sense to hide the message hash, since the message itself, by definition, is transmitted as cleartext.',
]
query_embeddings = model.encode_query(queries)
document_embeddings = model.encode_document(documents)
print(query_embeddings.shape, document_embeddings.shape)
similarities = model.similarity(query_embeddings, document_embeddings)
print(similarities)
Training Details
Training Dataset
json
Evaluation Dataset
json
Training Hyperparameters
Non-Default Hyperparameters
per_device_train_batch_size: 16
num_train_epochs: 1
warmup_steps: 300
optim: adamw_8bit
gradient_accumulation_steps: 2
bf16: True
gradient_checkpointing: True
eval_strategy: steps
per_device_eval_batch_size: 16
dataloader_num_workers: 4
dataloader_pin_memory: False
All Hyperparameters
Click to expand
per_device_train_batch_size: 16
num_train_epochs: 1
max_steps: -1
learning_rate: 5e-05
lr_scheduler_type: linear
lr_scheduler_kwargs: None
warmup_steps: 300
optim: adamw_8bit
optim_args: None
weight_decay: 0.0
adam_beta1: 0.9
adam_beta2: 0.999
adam_epsilon: 1e-08
optim_target_modules: None
gradient_accumulation_steps: 2
average_tokens_across_devices: True
max_grad_norm: 1.0
label_smoothing_factor: 0.0
bf16: True
fp16: False
bf16_full_eval: False
fp16_full_eval: False
tf32: None
gradient_checkpointing: True
gradient_checkpointing_kwargs: None
torch_compile: False
torch_compile_backend: None
torch_compile_mode: None
use_liger_kernel: False
liger_kernel_config: None
use_cache: False
neftune_noise_alpha: None
torch_empty_cache_steps: None
auto_find_batch_size: False
log_on_each_node: True
logging_nan_inf_filter: True
include_num_input_tokens_seen: no
log_level: passive
log_level_replica: warning
disable_tqdm: False
project: huggingface
trackio_space_id: trackio
eval_strategy: steps
per_device_eval_batch_size: 16
prediction_loss_only: True
eval_on_start: False
eval_do_concat_batches: True
eval_use_gather_object: False
eval_accumulation_steps: None
include_for_metrics: []
batch_eval_metrics: False
save_only_model: False
save_on_each_node: False
enable_jit_checkpoint: False
push_to_hub: False
hub_private_repo: None
hub_model_id: None
hub_strategy: every_save
hub_always_push: False
hub_revision: None
load_best_model_at_end: False
ignore_data_skip: False
restore_callback_states_from_checkpoint: False
full_determinism: False
seed: 42
data_seed: None
use_cpu: False
accelerator_config: {'split_batches': False, 'dispatch_batches': None, 'even_batches': True, 'use_seedable_sampler': True, 'non_blocking': False, 'gradient_accumulation_kwargs': None}
parallelism_config: None
dataloader_drop_last: False
dataloader_num_workers: 4
dataloader_pin_memory: False
dataloader_persistent_workers: False
dataloader_prefetch_factor: None
remove_unused_columns: True
label_names: None
train_sampling_strategy: random
length_column_name: length
ddp_find_unused_parameters: None
ddp_bucket_cap_mb: None
ddp_broadcast_buffers: False
ddp_backend: None
ddp_timeout: 1800
fsdp: []
fsdp_config: {'min_num_params': 0, 'xla': False, 'xla_fsdp_v2': False, 'xla_fsdp_grad_ckpt': False}
deepspeed: None
debug: []
skip_memory_metrics: True
do_predict: False
resume_from_checkpoint: None
warmup_ratio: None
local_rank: -1
prompts: None
batch_sampler: batch_sampler
multi_dataset_batch_sampler: proportional
router_mapping: {}
learning_rate_mapping: {}
Training Logs
| Epoch |
Step |
Training Loss |
Validation Loss |
| 0.0648 |
100 |
0.2915 |
- |
| 0.1297 |
200 |
0.0912 |
0.0981 |
| 0.1945 |
300 |
0.1002 |
- |
| 0.2593 |
400 |
0.1025 |
0.0940 |
| 0.3241 |
500 |
0.0851 |
- |
| 0.3890 |
600 |
0.0707 |
0.0785 |
| 0.4538 |
700 |
0.0498 |
- |
| 0.5186 |
800 |
0.0683 |
0.0609 |
| 0.5835 |
900 |
0.0536 |
- |
| 0.6483 |
1000 |
0.0484 |
0.0562 |
| 0.7131 |
1100 |
0.0406 |
- |
| 0.7780 |
1200 |
0.0468 |
0.0503 |
| 0.8428 |
1300 |
0.0392 |
- |
| 0.9076 |
1400 |
0.0386 |
0.0486 |
| 0.9724 |
1500 |
0.0406 |
- |
Framework Versions
- Python: 3.12.12
- Sentence Transformers: 5.2.3
- Transformers: 5.2.0
- PyTorch: 2.10.0+cu128
- Accelerate: 1.12.0
- Datasets: 4.5.0
- Tokenizers: 0.22.2
Citation
BibTeX
Sentence Transformers
@inproceedings{reimers-2019-sentence-bert,
title = "Sentence-BERT: Sentence Embeddings using Siamese BERT-Networks",
author = "Reimers, Nils and Gurevych, Iryna",
booktitle = "Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing",
month = "11",
year = "2019",
publisher = "Association for Computational Linguistics",
url = "https://arxiv.org/abs/1908.10084",
}
MultipleNegativesRankingLoss
@misc{henderson2017efficient,
title={Efficient Natural Language Response Suggestion for Smart Reply},
author={Matthew Henderson and Rami Al-Rfou and Brian Strope and Yun-hsuan Sung and Laszlo Lukacs and Ruiqi Guo and Sanjiv Kumar and Balint Miklos and Ray Kurzweil},
year={2017},
eprint={1705.00652},
archivePrefix={arXiv},
primaryClass={cs.CL}
}