cisco-ai/SecureBERT2.0-cross_encoder

@@ -1,21 +1,22 @@
-```yaml
-language: en
 license: apache-2.0
-tags:
-  - sentence-transformers
-  - cross-encoder
-  - reranker
-dataset_size: 35705
-loss: CachedMultipleNegativesRankingLoss
-pipeline_tag: text-ranking
-library_name: sentence-transformers
 base_model:
-  - CiscoAITeam/SecureBERT2.0-base
-```
-# SecureBERT 2.0 Cross-Encoder Fine-Tuned
-This is a cybersecurity domain-specific [Cross Encoder](https://www.sbert.net/docs/cross_encoder/usage/usage.html) model, fine-tuned on top of [SecureBERT 2.0](CiscoAITeam/SecureBERT2.0-base). It scores pairs of texts for document reranking, semantic similarity, and information retrieval tasks.
 ## Model Details
 - **Model Type:** Cross Encoder
 - **Max Sequence Length:** 1024 tokens
@@ -24,18 +25,67 @@ This is a cybersecurity domain-specific [Cross Encoder](https://www.sbert.net/do
 - **License:** Apache-2.0
 ## Usage
 ```python
 from sentence_transformers import CrossEncoder
-model = CrossEncoder("CiscoAITeam/SecureBERT2.0-cross_encoder")
 pairs = [
-    ["Text A1", "Text A2"],
-    ["Text B1", "Text B2"]
 ]
 scores = model.predict(pairs)
 print(scores)
 ```
 # Reference
 ```
@@ -45,4 +95,4 @@ print(scores)
   journal={arXiv preprint arXiv:2510.00240},
   year={2025}
 }
-```

+---
 license: apache-2.0
+language:
+- en
 base_model:
+- CiscoAITeam/SecureBERT2.0-base
+pipeline_tag: sentence-similarity
+library_name: sentence-transformers
+tags:
+- IR
+- reranking
+- securebert
+- docembedding
+---
+# SecureBERT 2.0 Cross-Encoder Fine-Tuned for Cybersecurity
+This is a Cross Encoder model fine-tuned on top of SecureBERT 2.0, a cybersecurity domain-specific BERT model. It computes similarity scores for pairs of texts, which can be used for text reranking, semantic search, or other cybersecurity-related natural language tasks.
 ## Model Details
 - **Model Type:** Cross Encoder
 - **Max Sequence Length:** 1024 tokens
 - **License:** Apache-2.0
 ## Usage
+### Sentence Transformers API
+Install the library:
+```bash
+pip install -U sentence-transformers
+```
+Load the model and run inference:
 ```python
 from sentence_transformers import CrossEncoder
+# Load the model (replace the placeholder ID below with this model's Hugging Face Hub ID)
+model = CrossEncoder("cross_encoder_model_id")
+# Score pairs of cybersecurity text
 pairs = [
+    ["How does Stealc malware extract browser data?", "Stealc uses Sqlite3 DLL to query browser databases and retrieve cookies, passwords, and history."],
+    ["Best practices for post-acquisition cybersecurity integration?", "Conduct security assessment, align policies, integrate security technologies, and train employees."],
 ]
 scores = model.predict(pairs)
 print(scores)
 ```
+Rank a set of candidate responses based on similarity to a query:
+```python
+query = "How to prevent Kerberoasting attacks?"
+candidates = [
+    "Implement MFA and privileged access management",
+    "Monitor Kerberos tickets for anomalous activity",
+    "Apply zero-trust network segmentation",
+]
+ranking = model.rank(query, candidates)
+print(ranking)
+```
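The `rank` call above returns a list of dictionaries sorted by descending score, each holding a `corpus_id` (the candidate's index in the input list) and a `score`. As a minimal sketch of that sort-by-score step, without loading the model, the helper below reproduces the same shape from precomputed scores. The helper name `rank_by_score`, the extra `text` key, and the score values are hypothetical and chosen only for illustration; real scores come from `model.predict` or `model.rank`.

```python
from typing import Dict, List, Optional, Sequence, Union


def rank_by_score(
    candidates: Sequence[str],
    scores: Sequence[float],
    top_k: Optional[int] = None,
) -> List[Dict[str, Union[int, float, str]]]:
    """Sort candidates by descending score, keeping each original index."""
    if len(candidates) != len(scores):
        raise ValueError("candidates and scores must have the same length")
    ranked = [
        {"corpus_id": idx, "score": float(score), "text": text}
        for idx, (text, score) in enumerate(zip(candidates, scores))
    ]
    # Highest score first, mirroring the order a reranker would return.
    ranked.sort(key=lambda item: item["score"], reverse=True)
    return ranked if top_k is None else ranked[:top_k]


example_candidates = [
    "Implement MFA and privileged access management",
    "Monitor Kerberos tickets for anomalous activity",
    "Apply zero-trust network segmentation",
]
# Hypothetical scores for illustration only; they are not model output.
example_scores = [0.31, 0.87, 0.12]

example_ranking = rank_by_score(example_candidates, example_scores)
for entry in example_ranking:
    print(entry["corpus_id"], entry["score"], entry["text"])
```

Keeping `corpus_id` alongside the score lets downstream code map each ranked entry back to its source document, which matters when the candidates come from a first-stage retriever.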
+## Training Details
+### Training Dataset
+- **Size:** 35,705 samples
+- **Columns:** `sentence1`, `sentence2`, `label`
+- **Approximate statistics (first 1000 samples):**
+  | Field | Sentence1 | Sentence2 | Label |
+  |-------|-----------|-----------|-------|
+  | Type | string | string | float |
+  | Mean Length | 98.46 | 1468.34 | 1.0 |
+- **Loss Function:** [CachedMultipleNegativesRankingLoss](https://sbert.net/docs/package_reference/cross_encoder/losses.html#cachedmultiplenegativesrankingloss)
+```json
+{
+    "scale": 10.0,
+    "num_negatives": 10,
+    "activation_fn": "torch.nn.modules.activation.Sigmoid",
+    "mini_batch_size": 24
+}
+```
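To make the loss parameters above more concrete, here is a simplified, dependency-free sketch of a multiple-negatives ranking objective. It assumes each query has one positive pair followed by its negatives, that the sigmoid activation is applied to the raw logits before multiplying by `scale`, and that the result is a cross-entropy with the positive at index 0. The function names are hypothetical, and the gradient caching controlled by `mini_batch_size` is left out because it is assumed to affect memory use rather than the loss value. This is not the library's implementation.

```python
import math
from typing import List


def sigmoid(x: float) -> float:
    """Logistic function, standing in for the Sigmoid activation."""
    return 1.0 / (1.0 + math.exp(-x))


def ranking_loss(logits_per_query: List[List[float]], scale: float = 10.0) -> float:
    """Mean cross-entropy over queries.

    Each row holds the raw scores for one query: index 0 is the positive
    pair, the remaining entries are negatives (10 per query in the
    configuration above).
    """
    if not logits_per_query:
        raise ValueError("at least one query is required")
    total = 0.0
    for row in logits_per_query:
        scaled = [scale * sigmoid(v) for v in row]
        # Numerically stable log-sum-exp over the scaled scores.
        max_s = max(scaled)
        log_norm = max_s + math.log(sum(math.exp(s - max_s) for s in scaled))
        # Negative log-probability assigned to the positive pair.
        total += log_norm - scaled[0]
    return total / len(logits_per_query)


# One positive followed by 10 negatives, matching num_negatives = 10.
good_row = [4.0] + [-4.0] * 10   # positive scored well above the negatives
bad_row = [-4.0] + [4.0] * 10    # positive scored below the negatives
print(ranking_loss([good_row]) < ranking_loss([bad_row]))
```

Because the sigmoid bounds every score to the interval (0, 1), the `scale` factor sets how sharply the softmax can separate the positive from the negatives.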
 # Reference
 ```
   journal={arXiv preprint arXiv:2510.00240},
   year={2025}
 }
+```