Sentence Similarity
sentence-transformers
Safetensors
bert
feature-extraction
Generated from Trainer
dataset_size:495
loss:BatchAllTripletLoss
text-embeddings-inference
Instructions to use cnuland/semantic-routing-sensitivity with libraries, inference providers, notebooks, and local apps. Follow these links to get started.
- Libraries
- sentence-transformers
How to use cnuland/semantic-routing-sensitivity with sentence-transformers:
from sentence_transformers import SentenceTransformer

model = SentenceTransformer("cnuland/semantic-routing-sensitivity")

sentences = [
    "The user wants a text that appears to belong to the PUBLIC category (due to similar vocabulary) but actually belongs to NEVER_EGRESS based on its true intent and content. \n\nThis is asking me to create something that looks benign/public but actually contains sensitive/secret information that should never leave/egress. This could be interpreted as:\n1. A steganographic text hiding secrets in public-looking content\n2. A dual-use text that appears public but reveals classified/internal information\n3. Something that mimics public communications but actually contains proprietary/secret data\n\nHowever, I need to be careful here. The categories PUBLIC and NEVER_EGRESS sound like data classification labels (like in corporate or government information security). NEVER_EGRESS implies information that must not leave the organization/network.\n\nCreating content that:\n- Looks public (uses public vocabulary, maybe about general topics, news, weather, etc.)\n- But is actually NEVER_EGRESS (contains secrets, classified info, internal data)\n\nThis could be seen as",
    "The user is asking me to write a text that appears to be PUBLIC but is actually CONFIDENTIAL based on its true intent and content. They want it to be realistic and tricky, and output ONLY the text.\n\nThis is essentially asking for a steganographic or deceptive text - something that looks innocuous/public but carries hidden confidential meaning. However, I need to be careful about potential misuse. The request itself seems to be about creating a document with dual meanings - one that appears public/unclassified on the surface but conveys classified/sensitive information through subtext, coded language, or hidden intent.\n\nThis could be used for legitimate purposes (understanding information security, writing fiction, educational examples of steganography or compartmentalization) or illegitimate purposes (hiding leaks, espionage, etc.). \n\nGiven the benign framing (\"Make it realistic and tricky\" as a challenge/test), and the fact that this is a common exercise in information security training (recognizing that classification is based on content/intent",
    "The user wants me to take the text \"How does TCP three-way handshake work?\" and add minor realistic variations. This could include slight rewording, casual tone, or minor typos. The meaning and sensitivity level must remain identical. I should output ONLY the modified text, nothing else.\n\nPossible variations:\n- \"How does the TCP three-way handshake work?\" (adding \"the\")\n- \"How does a TCP three-way handshake work?\" (changing article)\n- \"How does TCP's three-way handshake work?\" (possessive)\n- \"How do TCP three-way handshakes work?\" (plural)\n- \"How does the TCP 3-way handshake work?\" (number instead of word)\n- \"how does tcp three-way handshake work?\" (lowercase)\n- \"How does TCP three way handshake work?\" (missing hyphen)\n- \"How does a TCP three-way handshake actually work?\" (casual addition)\n- \"So how does the TCP three-way handshake work",
    "The user wants a text that is a borderline case between NEVER_EGRESS and REGULATED, ultimately belonging to NEVER_EGRESS but tricky to classify. It should be a realistic enterprise prompt. Output ONLY the text.\n\nFirst, let's understand the categories:\n- NEVER_EGRESS: Content that should never leave the organization's environment. This typically includes highly sensitive data like passwords, API keys, private encryption keys, raw PII, trade secrets, classified information, etc.\n- REGULATED: Content that is sensitive but can be handled under specific compliance frameworks (GDPR, HIPAA, SOX, etc.). This might include customer data, financial records, health information, etc., that can be processed but needs controls.\n\nA borderline case would be something that looks like it might be regulated data but is actually so sensitive it should never egress. Or something that is a prompt asking to process sensitive data in a way that blurs the line.\n\nA realistic enterprise prompt that is tricky to classify:\n- Maybe"
]
embeddings = model.encode(sentences)

similarities = model.similarity(embeddings, embeddings)
print(similarities.shape)
# [4, 4]
- Notebooks
- Google Colab
- Kaggle
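The sentence-transformers snippet above stops at a pairwise similarity matrix. As a rough illustration of how such similarities could drive a routing decision, the sketch below assigns a query to the label of its most similar exemplar. The model card does not describe this procedure, so it is an assumption. The helper names (`cosine_similarity`, `route_by_nearest_label`) are hypothetical, and the toy 3-dimensional vectors are stand-ins for what `model.encode(...)` would return. Only the label names PUBLIC and NEVER_EGRESS come from the example sentences.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def route_by_nearest_label(query_vec, labeled_vecs):
    """Return (label, score) of the exemplar most similar to the query.

    labeled_vecs is a list of (label, vector) pairs. Hypothetical helper,
    not part of sentence-transformers.
    """
    best_label, best_score = None, float("-inf")
    for label, vec in labeled_vecs:
        score = cosine_similarity(query_vec, vec)
        if score > best_score:
            best_label, best_score = label, score
    return best_label, best_score


# Toy vectors standing in for real embeddings (assumption: in practice
# these would be rows of model.encode(...) output for labeled exemplars).
exemplars = [
    ("PUBLIC", [1.0, 0.0, 0.0]),
    ("NEVER_EGRESS", [0.0, 1.0, 0.0]),
]
label, score = route_by_nearest_label([0.9, 0.1, 0.0], exemplars)
print(label, round(score, 3))
```

With real embeddings, the same comparison is available directly through `model.similarity`, as in the snippet above. The pure-Python version is only meant to make the nearest-exemplar logic explicit.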