cisco-ehsan committed on
Commit ec6f0ba · verified
1 Parent(s): 7937385

Update README.md

  1. README.md +68 -18
README.md CHANGED
@@ -1,21 +1,22 @@
- ```yaml
- language: en
+ ---
  license: apache-2.0
- tags:
- - sentence-transformers
- - cross-encoder
- - reranker
- dataset_size: 35705
- loss: CachedMultipleNegativesRankingLoss
- pipeline_tag: text-ranking
- library_name: sentence-transformers
+ language:
+ - en
  base_model:
- - CiscoAITeam/SecureBERT2.0-base
- ```
- # SecureBERT 2.0 Cross-Encoder Fine-Tuned
+ - CiscoAITeam/SecureBERT2.0-base
+ pipeline_tag: sentence-similarity
+ library_name: sentence-transformers
+ tags:
+ - IR
+ - reranking
+ - securebert
+ - docembedding
+ ---
 
- This is a cybersecurity domain-specific [Cross Encoder](https://www.sbert.net/docs/cross_encoder/usage/usage.html) model, fine-tuned on top of [SecureBERT 2.0](CiscoAITeam/SecureBERT2.0-base). It scores pairs of texts for document reranking, semantic similarity, and information retrieval tasks.
+ # SecureBERT 2.0 Cross-Encoder Fine-Tuned for Cybersecurity
 
+ This is a Cross Encoder
+ model fine-tuned on top of SecureBERT 2.0, a cybersecurity domain-specific BERT model. It computes similarity scores for pairs of texts, which can be used for text reranking, semantic search, or other cybersecurity-related natural language tasks.
  ## Model Details
  - **Model Type:** Cross Encoder
  - **Max Sequence Length:** 1024 tokens
@@ -24,18 +25,67 @@ This is a cybersecurity domain-specific [Cross Encoder](https://www.sbert.net/do
  - **License:** Apache-2.0
 
  ## Usage
+ Sentence Transformers API
+
+ Install the library:
+
+ ```bash
+ pip install -U sentence-transformers
+ ```
+ Load the model and run inference:
+
  ```python
  from sentence_transformers import CrossEncoder
 
- model = CrossEncoder("CiscoAITeam/SecureBERT2.0-cross_encoder")
+ # Load the model
+ model = CrossEncoder("cross_encoder_model_id")
+
+ # Score pairs of cybersecurity text
  pairs = [
- ["Text A1", "Text A2"],
- ["Text B1", "Text B2"]
+ ["How does Stealc malware extract browser data?", "Stealc uses Sqlite3 DLL to query browser databases and retrieve cookies, passwords, and history."],
+ ["Best practices for post-acquisition cybersecurity integration?", "Conduct security assessment, align policies, integrate security technologies, and train employees."],
  ]
+
  scores = model.predict(pairs)
  print(scores)
  ```
 
+ Rank a set of candidate responses based on similarity to a query:
+ ```python
+ query = "How to prevent Kerberoasting attacks?"
+ candidates = [
+ "Implement MFA and privileged access management",
+ "Monitor Kerberos tickets for anomalous activity",
+ "Apply zero-trust network segmentation",
+ ]
+ ranking = model.rank(query, candidates)
+ print(ranking)
+
+ ```
+
+ ## Training Details
+
+ ### Training Dataset
+
+
+ - **Size:** 35,705 samples
+ - **Columns:** `sentence1`, `sentence2`, `label`
+ - **Approximate statistics (first 1000 samples):**
+ | Field | Sentence1 | Sentence2 | Label |
+ |-------|-----------|-----------|-------|
+ | Type | string | string | float |
+ | Mean Length | 98.46 | 1468.34 | 1.0 |
+
+ - **Loss Function:** [CachedMultipleNegativesRankingLoss](https://sbert.net/docs/package_reference/cross_encoder/losses.html#cachedmultiplenegativesrankingloss)
+ ```json
+ {
+ "scale": 10.0,
+ "num_negatives": 10,
+ "activation_fn": "torch.nn.modules.activation.Sigmoid",
+ "mini_batch_size": 24
+ }
+
+ ```
  # Reference
 
  ```
@@ -45,4 +95,4 @@ print(scores)
  journal={arXiv preprint arXiv:2510.00240},
  year={2025}
  }
- ```
+ ```
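
Reviewer note, not part of the commit: the loss configuration added under Training Details (`scale` 10.0, `num_negatives` 10, sigmoid activation) can be read as a softmax cross-entropy in which each query's positive passage competes against its sampled negatives. The pure-Python sketch below illustrates that reading only. It assumes each raw model score is passed through a sigmoid and multiplied by `scale` before the cross-entropy is taken with the positive as the target. The function name `ranking_loss` and the example scores are hypothetical, and the code does not reproduce the library's implementation or its cached mini-batching (`mini_batch_size`).

```python
import math


def sigmoid(x: float) -> float:
    """Logistic function, mapping a raw score to the range (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))


def ranking_loss(positive_score: float, negative_scores: list[float], scale: float = 10.0) -> float:
    """Cross-entropy of one positive against its negatives (hypothetical helper).

    Assumption: logits are scale * sigmoid(raw_score), and the positive
    sits at index 0 of the candidate list.
    """
    logits = [scale * sigmoid(s) for s in [positive_score, *negative_scores]]
    # Numerically stable log-sum-exp over all candidates.
    max_logit = max(logits)
    log_sum_exp = max_logit + math.log(sum(math.exp(v - max_logit) for v in logits))
    # Negative log-probability assigned to the positive candidate.
    return log_sum_exp - logits[0]


# Hypothetical raw scores: one positive and ten negatives (num_negatives = 10).
good = ranking_loss(4.0, [-4.0] * 10)     # positive clearly ahead of the negatives
uniform = ranking_loss(0.0, [0.0] * 10)   # model cannot tell candidates apart
bad = ranking_loss(-4.0, [4.0] * 10)      # positive ranked below every negative
print(good, uniform, bad)
```

Under these assumptions the sigmoid bounds every logit to the interval (0, `scale`), so `scale` caps how far apart the positive and negative logits can be pushed. When all candidates score the same, the loss equals the log of the candidate count.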