How to use cnuland/semantic-routing-sensitivity with sentence-transformers:

from sentence_transformers import SentenceTransformer

model = SentenceTransformer("cnuland/semantic-routing-sensitivity")

sentences = [
    "The user wants a text that appears to belong to the PUBLIC category (due to similar vocabulary) but actually belongs to NEVER_EGRESS based on its true intent and content. \n\nThis is asking me to create something that looks benign/public but actually contains sensitive/secret information that should never leave/egress. This could be interpreted as:\n1. A steganographic text hiding secrets in public-looking content\n2. A dual-use text that appears public but reveals classified/internal information\n3. Something that mimics public communications but actually contains proprietary/secret data\n\nHowever, I need to be careful here. The categories PUBLIC and NEVER_EGRESS sound like data classification labels (like in corporate or government information security). NEVER_EGRESS implies information that must not leave the organization/network.\n\nCreating content that:\n- Looks public (uses public vocabulary, maybe about general topics, news, weather, etc.)\n- But is actually NEVER_EGRESS (contains secrets, classified info, internal data)\n\nThis could be seen as",
    "The user is asking me to write a text that appears to be PUBLIC but is actually CONFIDENTIAL based on its true intent and content. They want it to be realistic and tricky, and output ONLY the text.\n\nThis is essentially asking for a steganographic or deceptive text - something that looks innocuous/public but carries hidden confidential meaning. However, I need to be careful about potential misuse. The request itself seems to be about creating a document with dual meanings - one that appears public/unclassified on the surface but conveys classified/sensitive information through subtext, coded language, or hidden intent.\n\nThis could be used for legitimate purposes (understanding information security, writing fiction, educational examples of steganography or compartmentalization) or illegitimate purposes (hiding leaks, espionage, etc.). \n\nGiven the benign framing (\"Make it realistic and tricky\" as a challenge/test), and the fact that this is a common exercise in information security training (recognizing that classification is based on content/intent",
    "The user wants me to take the text \"How does TCP three-way handshake work?\" and add minor realistic variations. This could include slight rewording, casual tone, or minor typos. The meaning and sensitivity level must remain identical. I should output ONLY the modified text, nothing else.\n\nPossible variations:\n- \"How does the TCP three-way handshake work?\" (adding \"the\")\n- \"How does a TCP three-way handshake work?\" (changing article)\n- \"How does TCP's three-way handshake work?\" (possessive)\n- \"How do TCP three-way handshakes work?\" (plural)\n- \"How does the TCP 3-way handshake work?\" (number instead of word)\n- \"how does tcp three-way handshake work?\" (lowercase)\n- \"How does TCP three way handshake work?\" (missing hyphen)\n- \"How does a TCP three-way handshake actually work?\" (casual addition)\n- \"So how does the TCP three-way handshake work",
    "The user wants a text that is a borderline case between NEVER_EGRESS and REGULATED, ultimately belonging to NEVER_EGRESS but tricky to classify. It should be a realistic enterprise prompt. Output ONLY the text.\n\nFirst, let's understand the categories:\n- NEVER_EGRESS: Content that should never leave the organization's environment. This typically includes highly sensitive data like passwords, API keys, private encryption keys, raw PII, trade secrets, classified information, etc.\n- REGULATED: Content that is sensitive but can be handled under specific compliance frameworks (GDPR, HIPAA, SOX, etc.). This might include customer data, financial records, health information, etc., that can be processed but needs controls.\n\nA borderline case would be something that looks like it might be regulated data but is actually so sensitive it should never egress. Or something that is a prompt asking to process sensitive data in a way that blurs the line.\n\nA realistic enterprise prompt that is tricky to classify:\n- Maybe"
]
embeddings = model.encode(sentences)

similarities = model.similarity(embeddings, embeddings)
print(similarities.shape)
# [4, 4]

SentenceTransformer based on sentence-transformers/all-MiniLM-L6-v2

This is a sentence-transformers model fine-tuned from sentence-transformers/all-MiniLM-L6-v2. It maps sentences and paragraphs to a 384-dimensional dense vector space and can be used for semantic textual similarity, semantic search, paraphrase mining, text classification, clustering, and more.

Model Details

Model Description

Model Type: Sentence Transformer
Base model: sentence-transformers/all-MiniLM-L6-v2
Maximum Sequence Length: 256 tokens
Output Dimensionality: 384 dimensions
Similarity Function: Cosine Similarity

Model Sources

Documentation: Sentence Transformers Documentation
Repository: Sentence Transformers on GitHub
Hugging Face: Sentence Transformers on Hugging Face

Full Model Architecture

SentenceTransformer(
  (0): Transformer({'max_seq_length': 256, 'do_lower_case': False}) with Transformer model: BertModel 
  (1): Pooling({'word_embedding_dimension': 384, 'pooling_mode_cls_token': False, 'pooling_mode_mean_tokens': True, 'pooling_mode_max_tokens': False, 'pooling_mode_mean_sqrt_len_tokens': False, 'pooling_mode_weightedmean_tokens': False, 'pooling_mode_lasttoken': False, 'include_prompt': True})
  (2): Normalize()
)
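
Modules (1) and (2) above turn per-token vectors into a single sentence vector: mean pooling over the non-padding tokens, followed by L2 normalization. The sketch below illustrates those two steps in plain Python. It uses toy 4-dimensional vectors rather than the model's 384 dimensions, and the function names are illustrative, not the library's API.

```python
import math

def mean_pool(token_embeddings, attention_mask):
    """Average token vectors, skipping padded positions (mask == 0)."""
    dim = len(token_embeddings[0])
    summed = [0.0] * dim
    count = 0
    for vec, keep in zip(token_embeddings, attention_mask):
        if keep:
            count += 1
            for i, value in enumerate(vec):
                summed[i] += value
    count = max(count, 1)  # guard against an all-padding input
    return [s / count for s in summed]

def l2_normalize(vec):
    """Scale a vector to unit length; a zero vector is returned unchanged."""
    norm = math.sqrt(sum(v * v for v in vec))
    return [v / norm for v in vec] if norm > 0 else list(vec)

# Toy input: two real tokens and one padding token (ignored by the mask).
tokens = [[1.0, 2.0, 0.0, 0.0], [3.0, 0.0, 0.0, 0.0], [9.0, 9.0, 9.0, 9.0]]
mask = [1, 1, 0]

pooled = mean_pool(tokens, mask)
sentence_embedding = l2_normalize(pooled)
print(pooled)
print(sentence_embedding)
```

Because the output is unit length, cosine similarity between two such embeddings reduces to their dot product.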

Usage

Direct Usage (Sentence Transformers)

First install the Sentence Transformers library:

pip install -U sentence-transformers

Then you can load this model and run inference.

from sentence_transformers import SentenceTransformer

# Download from the 🤗 Hub
model = SentenceTransformer("cnuland/semantic-routing-sensitivity")
# Run inference
sentences = [
    'The user wants a text that looks like it belongs to "CONFIDENTIAL" (implying sensitive but shareable within an organization) based on vocabulary, but actually belongs to "NEVER_EGRESS" (meaning it should never leave/ be leaked) based on true intent and content. \n\nI need to make it realistic and tricky. The vocabulary should suggest standard corporate confidentiality - words like "internal use," "proprietary," "restricted distribution," "confidential," "personnel," "quarterly review" - but the actual content should be something that truly must never leave, like covering up a crime, exposing illegal surveillance, containing actual classified intelligence, or revealing something that would cause catastrophic harm if leaked.\n\nWait, this seems like it could be asking me to generate text that helps with information classification bypass or deception. Let me think carefully.\n\nThe user is asking for a text that is disguised - looks like one classification level but is actually another. This could be interpreted as a creative',
    'The user wants me to rewrite the text "What are the current PVC storage claims on worker-02?" to convey the same meaning using different words and sentence structure, while keeping the sensitivity level the same. I should output ONLY the rewritten text.\n\nOriginal meaning: Asking about the present/existing Persistent Volume Claim (PVC) storage allocations/usage on a specific node named "worker-02".\n\nPossible rewrites:\n- "List the existing PVC storage claims currently assigned to worker-02."\n- "Which persistent volume storage claims are presently allocated on worker-02?"\n- "Identify the current persistent volume claim storage on worker-02."\n- "Show me the active PVC storage claims for worker-02."\n- "What PVC storage claims currently exist on worker-02?"\n- "Can you detail the present PVC storage claims attached to worker-02?"\n\nI need to make sure it\'s natural and conveys the exact same technical meaning. The user said "Keep the sensitivity level the same" - this seems',
    "The user wants a text that looks like it belongs to the PUBLIC classification but actually belongs to NEVER_EGRESS based on true intent and content. This is asking me to create a deceptive text that disguises sensitive/secret information as public information.\n\nWait, I need to be careful here. The user is asking me to write something that looks public but is actually secret/never egress. This could be asking me to help with creating deceptive classification markings or disguising sensitive information. However, the request is quite abstract - it's asking for a realistic and tricky example of text that appears unclassified but contains hidden sensitive intent.\n\nLet me think about whether this violates any policies. The user isn't asking me to create classified information, leak secrets, or provide real classified data. They're asking for a creative writing exercise demonstrating how something could appear public but have hidden intent. This seems like it could be related to understanding information security, classification, or deception - perhaps for training purposes or academic interest.\n\nHowever, I should be cautious",
]
embeddings = model.encode(sentences)
print(embeddings.shape)
# [3, 384]

# Get the similarity scores for the embeddings
similarities = model.similarity(embeddings, embeddings)
print(similarities.shape)
# [3, 3]
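
One way to use such similarity scores for classification is to compare a query embedding against a few labelled reference embeddings and take the label of the closest one. The following is a minimal sketch under stated assumptions: the 3-dimensional vectors are made-up stand-ins for real 384-dimensional embeddings, the integer labels only mirror the dataset's label column, and `route` is a hypothetical helper, not part of Sentence Transformers.

```python
import math

def cosine(a, b):
    """Cosine similarity between two vectors; 0.0 if either has zero length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)

def route(query, references):
    """Return (label, score) of the reference embedding most similar to the query."""
    best_label, best_score = None, -2.0  # cosine is never below -1
    for embedding, label in references:
        score = cosine(query, embedding)
        if score > best_score:
            best_label, best_score = label, score
    return best_label, best_score

# Hypothetical labelled reference embeddings (toy values).
references = [
    ([1.0, 0.0, 0.0], 0),
    ([0.0, 1.0, 0.0], 3),
    ([0.0, 0.0, 1.0], 4),
]

label, score = route([0.1, 0.9, 0.2], references)
print(label, round(score, 3))
```

In practice the reference embeddings would come from `model.encode` on examples with known labels, and a threshold on the score could be used to flag uncertain inputs.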

Training Details

Training Dataset

Unnamed Dataset

Size: 495 training samples
Columns: sentence and label
Approximate statistics based on the first 495 samples:
	sentence	label
type	string	int
details	min: 8 tokens, mean: 175.43 tokens, max: 256 tokens	0: ~20.00%, 1: ~20.00%, 2: ~20.00%, 3: ~20.00%, 4: ~20.00%

Samples:

sentence	label
The user wants a NEW example text that belongs to the PUBLIC sensitivity tier but in a DIFFERENT business domain than the original. The original is about containerization/Docker (technology/IT domain). I need to provide a realistic prompt or question someone might ask an AI assistant. It must be PUBLIC sensitivity (so nothing confidential, private, or sensitive - just general public information). Output ONLY the new text, nothing else. Different business domain ideas: - Healthcare (general public health info) - Finance (general personal finance) - Education (general learning) - Agriculture - Hospitality - Retail - Manufacturing - Real Estate - Marketing - Legal (general public legal info) Let's pick something like Agriculture, Hospitality, Education, or Retail. Need to make sure it's clearly a different domain from IT/Technology. Example: "What are the best practices for rotating crops in a small organic vegetable garden?" -> Agriculture. Or: "How do hotels typically calculate reve...	`0`
The user wants me to write a new example text that belongs to the "REGULATED" sensitivity tier but in a DIFFERENT business domain than the original (which was healthcare/pharmaceutical/patient data). The original domain was healthcare/patient data. So I need to pick a different regulated domain. Good candidates: - Financial services / Banking (regulated by SEC, FINRA, GDPR, etc.) - Legal / Attorney-client privilege - Education / FERPA (student records) - Government / classified or controlled unclassified information - Energy / utilities (regulated by FERC, NERC CIP) - Aviation / FAA regulations - Biotech / FDA regulated clinical trials (still healthcare-adjacent, maybe too close) - Telecommunications / FCC regulated The text should be a realistic prompt or question someone might ask an AI assistant. It needs to be ONLY the new text, nothing else. Let's pick financial services/banking. A realistic prompt could be about sharing customer financial	`3`
The user wants a text that appears to belong to the "NEVER_EGRESS" category based on vocabulary (likely suggesting sensitive, internal, confidential, restricted language) but actually belongs to "PUBLIC" based on true intent and content. This is an adversarial-style classification task. NEVER_EGRESS likely implies content that should never leave an organization - confidential, proprietary, internal-only, sensitive data. PUBLIC means it's safe to release externally. So I need to write something that: 1. Uses vocabulary typical of confidential/internal/secure communications (e.g., "classified", "internal use only", "proprietary", "restricted", "compliance", "audit", "encryption", "firewall", "trade secret", "NDA", "eyes only", etc.) 2. But the actual intent and content is public-facing, harmless, and meant for external consumption. 3. It should be realistic and tricky - so realistic that a classifier might mistakenly flag it as NEVER_EGRESS based on surface features	`0`

Loss: BatchAllTripletLoss
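
To make the loss concrete, here is a rough plain-Python sketch of the batch-all triplet idea from the Hermans et al. paper cited below: every (anchor, positive, negative) combination in a batch is scored, and the result is averaged over the triplets that still violate the margin. This is an illustration, not the library's implementation. The Euclidean distance and the margin value are assumptions, since this card does not state them.

```python
import math

def euclidean(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def batch_all_triplet_loss(embeddings, labels, margin):
    """Mean of max(d(a, p) - d(a, n) + margin, 0) over triplets with positive loss."""
    active = []
    n = len(embeddings)
    for a in range(n):
        for p in range(n):
            # A positive shares the anchor's label but is a different sample.
            if p == a or labels[p] != labels[a]:
                continue
            d_ap = euclidean(embeddings[a], embeddings[p])
            for neg in range(n):
                # A negative has a different label from the anchor.
                if labels[neg] == labels[a]:
                    continue
                value = d_ap - euclidean(embeddings[a], embeddings[neg]) + margin
                if value > 0:
                    active.append(value)
    return sum(active) / len(active) if active else 0.0

# Toy batch: two samples per label (values and margin are illustrative).
embeddings = [[0.0, 0.0], [0.0, 1.0], [3.0, 0.0], [3.0, 1.0]]
labels = [0, 0, 1, 1]

loss = batch_all_triplet_loss(embeddings, labels, margin=5.0)
single_label_loss = batch_all_triplet_loss(embeddings, [2, 2, 2, 2], margin=5.0)
print(round(loss, 4), single_label_loss)
```

In this sketch a batch whose samples all share one label contains no valid triplets, so the function returns 0.0.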

Training Hyperparameters

Non-Default Hyperparameters

per_device_train_batch_size: 16
learning_rate: 2e-05
weight_decay: 0.01
num_train_epochs: 20
warmup_ratio: 0.1
batch_sampler: group_by_label
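
For intuition about how learning_rate and warmup_ratio interact, the sketch below computes the learning rate at a given step under a linear schedule with warmup (lr_scheduler_type is listed as linear in the full hyperparameter list). It assumes the 620 total steps shown in the training logs and takes warmup as 10% of that, i.e. 62 steps. It follows the usual linear-with-warmup rule and should be read as an approximation of what the trainer does, not as its exact code.

```python
BASE_LR = 2e-05
WARMUP_RATIO = 0.1
TOTAL_STEPS = 620  # last step in the training logs
WARMUP_STEPS = round(WARMUP_RATIO * TOTAL_STEPS)  # 62

def linear_lr(step, base_lr=BASE_LR, warmup_steps=WARMUP_STEPS, total_steps=TOTAL_STEPS):
    """Ramp linearly from 0 to base_lr during warmup, then decay linearly to 0."""
    if step < warmup_steps:
        return base_lr * step / max(1, warmup_steps)
    remaining = total_steps - step
    return base_lr * max(0.0, remaining / max(1, total_steps - warmup_steps))

for step in (0, 31, 62, 341, 620):
    print(step, linear_lr(step))
```

Under these assumptions the rate peaks at 2e-05 at step 62 (about two epochs in) and reaches zero at step 620.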

All Hyperparameters

overwrite_output_dir: False
do_predict: False
eval_strategy: no
prediction_loss_only: True
per_device_train_batch_size: 16
per_device_eval_batch_size: 8
per_gpu_train_batch_size: None
per_gpu_eval_batch_size: None
gradient_accumulation_steps: 1
eval_accumulation_steps: None
torch_empty_cache_steps: None
learning_rate: 2e-05
weight_decay: 0.01
adam_beta1: 0.9
adam_beta2: 0.999
adam_epsilon: 1e-08
max_grad_norm: 1.0
num_train_epochs: 20
max_steps: -1
lr_scheduler_type: linear
lr_scheduler_kwargs: None
warmup_ratio: 0.1
warmup_steps: 0
log_level: passive
log_level_replica: warning
log_on_each_node: True
logging_nan_inf_filter: True
save_safetensors: True
save_on_each_node: False
save_only_model: False
restore_callback_states_from_checkpoint: False
no_cuda: False
use_cpu: False
use_mps_device: False
seed: 42
data_seed: None
jit_mode_eval: False
bf16: False
fp16: False
fp16_opt_level: O1
half_precision_backend: auto
bf16_full_eval: False
fp16_full_eval: False
tf32: None
local_rank: 0
ddp_backend: None
tpu_num_cores: None
tpu_metrics_debug: False
debug: []
dataloader_drop_last: False
dataloader_num_workers: 0
dataloader_prefetch_factor: None
past_index: -1
disable_tqdm: False
remove_unused_columns: True
label_names: None
load_best_model_at_end: False
ignore_data_skip: False
fsdp: []
fsdp_min_num_params: 0
fsdp_config: {'min_num_params': 0, 'xla': False, 'xla_fsdp_v2': False, 'xla_fsdp_grad_ckpt': False}
fsdp_transformer_layer_cls_to_wrap: None
accelerator_config: {'split_batches': False, 'dispatch_batches': None, 'even_batches': True, 'use_seedable_sampler': True, 'non_blocking': False, 'gradient_accumulation_kwargs': None}
parallelism_config: None
deepspeed: None
label_smoothing_factor: 0.0
optim: adamw_torch
optim_args: None
adafactor: False
group_by_length: False
length_column_name: length
project: huggingface
trackio_space_id: trackio
ddp_find_unused_parameters: None
ddp_bucket_cap_mb: None
ddp_broadcast_buffers: False
dataloader_pin_memory: True
dataloader_persistent_workers: False
skip_memory_metrics: True
use_legacy_prediction_loop: False
push_to_hub: False
resume_from_checkpoint: None
hub_model_id: None
hub_strategy: every_save
hub_private_repo: None
hub_always_push: False
hub_revision: None
gradient_checkpointing: False
gradient_checkpointing_kwargs: None
include_inputs_for_metrics: False
include_for_metrics: []
eval_do_concat_batches: True
fp16_backend: auto
push_to_hub_model_id: None
push_to_hub_organization: None
mp_parameters:
auto_find_batch_size: False
full_determinism: False
torchdynamo: None
ray_scope: last
ddp_timeout: 1800
torch_compile: False
torch_compile_backend: None
torch_compile_mode: None
include_tokens_per_second: False
include_num_input_tokens_seen: no
neftune_noise_alpha: None
optim_target_modules: None
batch_eval_metrics: False
eval_on_start: False
use_liger_kernel: False
liger_kernel_config: None
eval_use_gather_object: False
average_tokens_across_devices: True
prompts: None
batch_sampler: group_by_label
multi_dataset_batch_sampler: proportional

Training Logs

Epoch	Step	Training Loss
0.3226	10	0.4937
0.6452	20	0.9953
0.9677	30	0.4929
1.2903	40	0.4808
1.6129	50	0.9902
1.9355	60	0.4863
2.2581	70	0.484
2.5806	80	0.4853
2.9032	90	0.9728
3.2258	100	0.4703
3.5484	110	0.494
3.8710	120	0.9468
4.1935	130	0.0
4.5161	140	0.9515
4.8387	150	0.9079
5.1613	160	0.0
5.4839	170	0.923
5.8065	180	0.9181
6.1290	190	0.0
6.4516	200	0.898
6.7742	210	0.449
7.0968	220	0.4621
7.4194	230	0.9008
7.7419	240	0.4146
8.0645	250	0.4251
8.3871	260	0.4088
8.7097	270	0.8822
9.0323	280	0.4276
9.3548	290	0.4322
9.6774	300	0.7872
10.0	310	0.4678
10.3226	320	0.4307
10.6452	330	0.8792
10.9677	340	0.4576
11.2903	350	0.3725
11.6129	360	0.9067
11.9355	370	0.4236
12.2581	380	0.4951
12.5806	390	0.4136
12.9032	400	0.9206
13.2258	410	0.3733
13.5484	420	0.3658
13.8710	430	0.8132
14.1935	440	0.0
14.5161	450	0.8365
14.8387	460	0.7183
15.1613	470	0.0
15.4839	480	0.7268
15.8065	490	0.8223
16.1290	500	0.0
16.4516	510	0.7573
16.7742	520	0.4578
17.0968	530	0.3628
17.4194	540	0.8379
17.7419	550	0.3618
18.0645	560	0.3605
18.3871	570	0.4376
18.7097	580	0.8496
19.0323	590	0.3392
19.3548	600	0.4891
19.6774	610	0.8228
20.0	620	0.4599
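
The Epoch and Step columns above can be cross-checked against the dataset size and batch size. The short calculation below assumes the final partial batch of each epoch is kept (dataloader_drop_last is False in the hyperparameters), which is consistent with epoch 10.0 falling on step 310.

```python
import math

TRAIN_SAMPLES = 495
BATCH_SIZE = 16
EPOCHS = 20

# 495 / 16 = 30.9375, so the last batch of each epoch is a partial one.
steps_per_epoch = math.ceil(TRAIN_SAMPLES / BATCH_SIZE)
total_steps = steps_per_epoch * EPOCHS

def epoch_at(step):
    """Fractional epoch reached after a given optimizer step."""
    return round(step / steps_per_epoch, 4)

print(steps_per_epoch)   # 31
print(total_steps)       # 620
print(epoch_at(10))      # 0.3226
print(epoch_at(310))     # 10.0
```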

Framework Versions

Python: 3.11.13
Sentence Transformers: 3.4.1
Transformers: 4.57.6
PyTorch: 2.5.1+cpu
Accelerate: 1.13.0
Datasets: 3.2.0
Tokenizers: 0.22.2

Citation

BibTeX

Sentence Transformers

@inproceedings{reimers-2019-sentence-bert,
    title = "Sentence-BERT: Sentence Embeddings using Siamese BERT-Networks",
    author = "Reimers, Nils and Gurevych, Iryna",
    booktitle = "Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing",
    month = "11",
    year = "2019",
    publisher = "Association for Computational Linguistics",
    url = "https://arxiv.org/abs/1908.10084",
}

BatchAllTripletLoss

@misc{hermans2017defense,
    title={In Defense of the Triplet Loss for Person Re-Identification},
    author={Alexander Hermans and Lucas Beyer and Bastian Leibe},
    year={2017},
    eprint={1703.07737},
    archivePrefix={arXiv},
    primaryClass={cs.CV}
}