---
license: apache-2.0
base_model: cisco-ai/SecureBERT2.0-cross_encoder
tags:
  - core-ml
  - apple-silicon
  - cross-encoder
  - cybersecurity
  - reranking
  - modernbert
language:
  - en
pipeline_tag: text-classification
---

# SecureBERT 2.0 Cross-Encoder for Core ML

Core ML conversion of [cisco-ai/SecureBERT2.0-cross_encoder](https://huggingface.co/cisco-ai/SecureBERT2.0-cross_encoder),
ready to use on Apple Silicon (macOS / iOS / iPadOS) via the Core ML framework.

The original model is a cybersecurity domain-specific cross-encoder built on
ModernBERT. It takes a pair of texts (query + document) and outputs a similarity
score between 0 and 1, suitable for retrieval reranking, semantic search, and
cybersecurity intelligence applications.

This repository contains pre-converted `.mlpackage` files plus the conversion
script that produced them, allowing direct use in Swift applications without
running Python or Ollama at inference time.

## What's in this repository

| File | Size | Purpose |
|---|---|---|
| `SecureBERT2_CrossEncoder_FP16.mlpackage/` | 286 MB | FP16 Core ML model (recommended) |
| `SecureBERT2_CrossEncoder_FP32.mlpackage/` | 572 MB | FP32 Core ML model (reference precision) |
| `convert_via_torch_export.py` | ~6 KB | The conversion script that produced these files |

For most use cases, use the FP16 version. It is half the size, runs on the
Apple Neural Engine, and shows negligible numerical drift (max diff ~0.0015 vs PyTorch).

## Model specification

Both models share the same input/output specification:

| Tensor | Name | Shape | Dtype |
|---|---|---|---|
| Input 1 | `input_ids` | (1, 512) | INT32 |
| Input 2 | `attention_mask` | (1, 512) | INT32 |
| Output | `score` | (1, 1) | FLOAT16 (FP16 model) / FLOAT32 (FP32 model) |

The model expects standard BERT pair tokenization:

```
[CLS] query tokens [SEP] document tokens [SEP] [PAD] [PAD] ...
```

Special token IDs (from the original tokenizer):

| Token | ID |
|---|---|
| `[CLS]` | 50281 |
| `[SEP]` | 50282 |
| `[PAD]` | 50283 |
| `[UNK]` | 50280 |
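
As a minimal sketch of the layout above, the following helper assembles the two
512-element inputs from token IDs that were produced without special tokens.
The name `buildPairInput` and the truncation policy (shorten the document
before the query) are assumptions made for illustration; they are not taken
from the original tokenizer configuration.

```swift
// Special token IDs from the table above; 512 matches the model's (1, 512) inputs.
let clsID = 50281
let sepID = 50282
let padID = 50283
let maxLength = 512

/// Builds `[CLS] query [SEP] document [SEP] [PAD]...` and the matching attention mask.
/// `queryIDs` and `documentIDs` must NOT already contain special tokens.
/// Assumption: when the pair is too long, the document is truncated before the query.
func buildPairInput(queryIDs: [Int], documentIDs: [Int]) -> (inputIDs: [Int], attentionMask: [Int]) {
    let budget = maxLength - 3  // space left after [CLS] and the two [SEP] tokens
    let query = Array(queryIDs.prefix(budget))
    let document = Array(documentIDs.prefix(budget - query.count))

    var content: [Int] = [clsID]
    content += query
    content.append(sepID)
    content += document
    content.append(sepID)

    let padCount = maxLength - content.count
    let inputIDs = content + [Int](repeating: padID, count: padCount)
    let attentionMask = [Int](repeating: 1, count: content.count)
        + [Int](repeating: 0, count: padCount)
    return (inputIDs, attentionMask)
}
```

The attention mask is 1 for every real token, including the special tokens,
and 0 for padding.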

The output score is already sigmoid-activated (range 0-1). The sigmoid was baked
into the model graph during conversion, so no post-processing is needed in Swift.

## Quick start (Swift)

Install [huggingface/swift-transformers](https://github.com/huggingface/swift-transformers)
for tokenization, then use Core ML directly:

```swift
import CoreML
import Tokenizers

// Load tokenizer (matches Python tokenization exactly)
let tokenizer = try await AutoTokenizer.from(
    pretrained: "cisco-ai/SecureBERT2.0-cross_encoder"
)

// Load model (place .mlpackage in your bundle, Xcode compiles it to .mlmodelc)
let config = MLModelConfiguration()
config.computeUnits = .all  // Use Neural Engine when available

guard let modelURL = Bundle.main.url(
    forResource: "SecureBERT2_CrossEncoder_FP16",
    withExtension: "mlmodelc"
) else { fatalError("Model not found in bundle") }

let model = try MLModel(contentsOf: modelURL, configuration: config)

// Score one pre-tokenized query/document pair.
// inputIds:      [CLS] query [SEP] document [SEP] [PAD]..., exactly 512 IDs
//                (use the tokenizer's pair encoding API, or build manually
//                 using CLS=50281, SEP=50282, PAD=50283)
// attentionMask: 1s for content, 0s for padding, exactly 512 values
func score(inputIds: [Int], attentionMask: [Int]) throws -> Double {
    precondition(inputIds.count == 512 && attentionMask.count == 512,
                 "Model inputs must be exactly 512 tokens long")

    let inputIdsArray = try MLMultiArray(shape: [1, 512], dataType: .int32)
    let attentionMaskArray = try MLMultiArray(shape: [1, 512], dataType: .int32)

    // Copy the token IDs and mask into the Core ML input buffers
    for i in 0..<512 {
        inputIdsArray[i] = NSNumber(value: inputIds[i])
        attentionMaskArray[i] = NSNumber(value: attentionMask[i])
    }

    let inputs = try MLDictionaryFeatureProvider(dictionary: [
        "input_ids": MLFeatureValue(multiArray: inputIdsArray),
        "attention_mask": MLFeatureValue(multiArray: attentionMaskArray)
    ])

    let prediction = try model.prediction(from: inputs)
    guard let scoreArray = prediction.featureValue(for: "score")?.multiArrayValue else {
        fatalError("Model output \"score\" is missing")
    }
    return scoreArray[0].doubleValue  // already sigmoid-activated, range 0-1
}
```
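
For reranking, each candidate document is scored against the query and the
candidates are sorted by score. The sketch below shows one way to do that.
`rerank` and its `scorer` parameter are hypothetical names; `scorer` stands in
for whatever closure tokenizes a pair and calls the model in your app.

```swift
/// Scores every candidate against the query and returns the best `topK`,
/// highest score first. `scorer` is a hypothetical closure wrapping the model call.
func rerank(
    query: String,
    documents: [String],
    topK: Int,
    scorer: (String, String) throws -> Double
) rethrows -> [(document: String, score: Double)] {
    let scored: [(document: String, score: Double)] = try documents.map {
        (document: $0, score: try scorer(query, $0))
    }
    let sorted = scored.sorted { $0.score > $1.score }
    return Array(sorted.prefix(max(topK, 0)))
}
```

Because a cross-encoder scores one pair per forward pass, the cost grows
linearly with the number of candidates, so it is usually applied to a short
list produced by a cheaper first-stage retriever.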

## Verification

Conversion correctness was verified by comparing Core ML output against the
original PyTorch model on three test cases:

| Test case | PyTorch | Core ML FP16 | Diff |
|---|---|---|---|
| Highly relevant (vPC config Q + vPC config A) | 0.9948 | 0.9946 | 0.000132 |
| Same domain, different topic | 0.3406 | 0.3420 | 0.001481 |
| Unrelated content | 0.0160 | 0.0158 | 0.000190 |

Max numerical drift: ~0.0015. Ranking order is identical to PyTorch.
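
The two checks behind that summary (largest absolute difference and identical
ranking order) can be sketched as follows. The helper names are illustrative,
and the inputs are the rounded scores from the table, so the computed drift
(about 0.0014) differs slightly from the Diff column, which appears to have been
computed from unrounded scores.

```swift
/// Largest absolute difference between two score lists of equal length.
func maxAbsDiff(_ a: [Double], _ b: [Double]) -> Double {
    zip(a, b).map { abs($0 - $1) }.max() ?? 0
}

/// Indices sorted by descending score, i.e. the order a reranker would return.
func ranking(_ scores: [Double]) -> [Int] {
    scores.indices.sorted { scores[$0] > scores[$1] }
}

// Rounded scores copied from the verification table.
let pytorchScores = [0.9948, 0.3406, 0.0160]
let coreMLScores = [0.9946, 0.3420, 0.0158]

let drift = maxAbsDiff(pytorchScores, coreMLScores)
let sameOrder = ranking(pytorchScores) == ranking(coreMLScores)
print("max drift:", drift, "same ranking:", sameOrder)
```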

Inference benchmarks on M4 Max (36 GB):

- Model load time: ~0.5 seconds
- First inference (warm-up): ~2300 ms
- Subsequent inferences: ~20 ms per query/document pair
- Throughput after warm-up: ~50 pairs/second

The high first-inference latency is a one-time cost from Neural Engine compilation.
For interactive applications, perform a warm-up inference at app startup.
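
A warm-up could look like the sketch below. It assumes `model` is an `MLModel`
loaded as in the quick start. `warmUpTokens` and `warmUp` are hypothetical
helper names, and the dummy input (`[CLS] [SEP] [SEP]` followed by padding) is
an arbitrary choice; any valid 512-token input should serve the same purpose.

```swift
import CoreML

/// Minimal valid dummy input: [CLS] [SEP] [SEP] followed by [PAD] tokens.
func warmUpTokens(length: Int = 512) -> (ids: [Int32], mask: [Int32]) {
    let content: [Int32] = [50281, 50282, 50282]
    let padCount = length - content.count
    let ids = content + [Int32](repeating: 50283, count: padCount)
    let mask = [Int32](repeating: 1, count: content.count)
        + [Int32](repeating: 0, count: padCount)
    return (ids, mask)
}

/// Runs one throwaway prediction so that later calls skip the first-inference cost.
func warmUp(_ model: MLModel) throws {
    let (ids, mask) = warmUpTokens()
    let idsArray = try MLMultiArray(shape: [1, 512], dataType: .int32)
    let maskArray = try MLMultiArray(shape: [1, 512], dataType: .int32)
    for i in 0..<ids.count {
        idsArray[i] = NSNumber(value: ids[i])
        maskArray[i] = NSNumber(value: mask[i])
    }
    let inputs = try MLDictionaryFeatureProvider(dictionary: [
        "input_ids": MLFeatureValue(multiArray: idsArray),
        "attention_mask": MLFeatureValue(multiArray: maskArray)
    ])
    _ = try model.prediction(from: inputs)  // result is intentionally discarded
}
```

Calling `warmUp` once off the main thread during launch keeps the roughly
two-second compilation cost out of the first user-facing query.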

## Conversion recipe

The conversion from PyTorch to Core ML is non-trivial for ModernBERT-based
models. The standard `torch.jit.trace` path fails on ModernBERT's attention
operations due to int-op handling in coremltools 9.0.

The working recipe:

1. Pin dependency versions: `torch==2.7.0`, `transformers==4.52.4`,
   `sentence-transformers==5.0.0`, `coremltools==9.0`
2. Load model with `attn_implementation="eager"` to avoid SDPA tracing issues
3. Use `torch.export.export(strict=False)` instead of `torch.jit.trace`
4. Call `exported_program.run_decompositions({})` to convert from TRAINING
   dialect to ATEN dialect (required by coremltools 9.0)
5. Pass the resulting `ExportedProgram` to `ct.convert()`

See `convert_via_torch_export.py` for the complete script. This recipe should
generalize to other ModernBERT-based fine-tunes, such as ModernBERT classifiers
used in place of DeBERTa-v2 models.

## Limitations

Inherited from the base model:

- English language only
- Trained primarily on cybersecurity content; performance on other domains
  may vary
- May reflect biases in the training data toward over-represented threats,
  technologies, or vendors

Specific to this conversion:

- Fixed sequence length of 512 tokens (the original model supports up to 1024;
  this conversion uses 512 for faster inference and smaller memory footprint)
- FP16 introduces ~0.0015 numerical drift; unsuitable for tasks requiring
  exact PyTorch-equivalent output, but irrelevant for ranking tasks
- macOS 14 (Sonoma) or newer required (`minimum_deployment_target=ct.target.macOS14`)

## Citation

If you use this model, please cite the original SecureBERT 2.0 paper:

```bibtex
@article{aghaei2025securebert2,
  title={SecureBERT 2.0: Advanced Language Model for Cybersecurity Intelligence},
  author={Aghaei, Ehsan and others},
  journal={arXiv preprint arXiv:2510.00240},
  year={2025}
}
```

## License

Apache 2.0, matching the license of the original model.

## Acknowledgments

- Cisco AI for the original [SecureBERT 2.0](https://github.com/cisco-ai-defense/securebert2)
  model family
- Apple's [coremltools](https://github.com/apple/coremltools) team for ongoing
  ModernBERT support
- Hugging Face's [swift-transformers](https://github.com/huggingface/swift-transformers)
  team for the Swift tokenizer support that makes this practical to use

## Related models

Other SecureBERT 2.0 models from Cisco AI:

- [`cisco-ai/SecureBERT2.0-base`](https://huggingface.co/cisco-ai/SecureBERT2.0-base) — Base encoder
- [`cisco-ai/SecureBERT2.0-biencoder`](https://huggingface.co/cisco-ai/SecureBERT2.0-biencoder) — Bi-encoder for retrieval
- [`cisco-ai/SecureBERT2.0-NER`](https://huggingface.co/cisco-ai/SecureBERT2.0-NER) — Named entity recognition
- [`cisco-ai/SecureBERT2.0-code-vuln-detection`](https://huggingface.co/cisco-ai/SecureBERT2.0-code-vuln-detection) — Vulnerability classification

If you convert any of these to Core ML using a similar recipe, feel free to
open an issue and I'll link your repo here.