---
license: apache-2.0
base_model: cisco-ai/SecureBERT2.0-cross_encoder
tags:
- core-ml
- apple-silicon
- cross-encoder
- cybersecurity
- reranking
- modernbert
language:
- en
pipeline_tag: text-classification
---
# SecureBERT 2.0 Cross-Encoder for Core ML
Core ML conversion of [cisco-ai/SecureBERT2.0-cross_encoder](https://huggingface.co/cisco-ai/SecureBERT2.0-cross_encoder),
ready to use on Apple Silicon (macOS / iOS / iPadOS) via the Core ML framework.

The original model is a cybersecurity-specific cross-encoder built on
ModernBERT. It takes a pair of texts (a query and a document) and outputs a
similarity score between 0 and 1, which makes it suitable for retrieval
reranking, semantic search, and cybersecurity intelligence applications.

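
In a typical reranking setup, a fast first-stage retriever (for example, a bi-encoder) returns candidates, and the cross-encoder rescores each (query, candidate) pair jointly before the list is reordered. Below is a minimal sketch of that reordering step. It assumes a scoring closure shaped like `score(query:document:)` from the Swift quick start. `rerank` is a hypothetical helper and is not part of Core ML or swift-transformers:

```swift
/// Hypothetical helper: reorders candidate documents by cross-encoder score,
/// highest first. `score` is any closure returning a relevance score in [0, 1]
/// (for example, a wrapper around the Core ML model).
func rerank(
    query: String,
    candidates: [String],
    topK: Int,
    score: (String, String) throws -> Double
) rethrows -> [(document: String, score: Double)] {
    // Score every (query, candidate) pair independently
    var scored: [(document: String, score: Double)] = []
    for doc in candidates {
        scored.append((document: doc, score: try score(query, doc)))
    }
    // Sort by descending score and keep the best topK results
    scored.sort { $0.score > $1.score }
    return Array(scored.prefix(max(0, topK)))
}
```

With the Core ML model loaded, a call might look like `let top = try rerank(query: q, candidates: docs, topK: 5, score: score(query:document:))`. Here `q` and `docs` stand for your own query and candidate list.
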
This repository contains pre-converted `.mlpackage` files plus the conversion
script that produced them, allowing direct use in Swift applications without
running Python or Ollama at inference time.
## What's in this repository
| File | Size | Purpose |
|---|---|---|
| `SecureBERT2_CrossEncoder_FP16.mlpackage/` | 286 MB | FP16 Core ML model (recommended) |
| `SecureBERT2_CrossEncoder_FP32.mlpackage/` | 572 MB | FP32 Core ML model (reference precision) |
| `convert_via_torch_export.py` | ~6 KB | The conversion script that produced these files |

For most use cases, use the FP16 version: it is half the size, runs on the
Apple Neural Engine, and shows negligible numerical drift (max diff ~0.0015 vs.
PyTorch).
## Model specification
Both models share the same input/output specification:
| Tensor | Name | Shape | Dtype |
|---|---|---|---|
| Input 1 | `input_ids` | (1, 512) | INT32 |
| Input 2 | `attention_mask` | (1, 512) | INT32 |
| Output | `score` | (1, 1) | FLOAT16 (FP16 model) / FLOAT32 (FP32 model) |

The model expects standard BERT pair tokenization:
```
[CLS] query tokens [SEP] document tokens [SEP] [PAD] [PAD] ...
```
Special token IDs (from the original tokenizer):
| Token | ID |
|---|---|
| `[CLS]` | 50281 |
| `[SEP]` | 50282 |
| `[PAD]` | 50283 |
| `[UNK]` | 50280 |
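
You can also assemble the pair layout from raw token IDs outside the Core ML call, which makes it easy to unit-test. Below is a minimal sketch. It assumes the query and document have already been tokenized *without* special tokens. `buildPairInputs` is a hypothetical helper. Its truncation loop only approximates the Hugging Face tokenizer's default `longest_first` pair truncation, so tie-breaking on long inputs may differ:

```swift
/// Special token IDs from the table above.
let clsTokenId = 50281
let sepTokenId = 50282
let padTokenId = 50283

/// Hypothetical helper: builds `[CLS] query [SEP] document [SEP] [PAD]...`
/// from token IDs produced without special tokens, and returns
/// `input_ids` and `attention_mask` of exactly `maxLength` elements.
func buildPairInputs(
    queryIds: [Int],
    documentIds: [Int],
    maxLength: Int = 512
) -> (inputIds: [Int], attentionMask: [Int]) {
    precondition(maxLength >= 3, "need room for [CLS] and two [SEP] tokens")
    var query = queryIds
    var document = documentIds
    // Longest-first style truncation: drop one token at a time from
    // whichever sequence is currently longer until the pair fits.
    let budget = maxLength - 3
    while query.count + document.count > budget {
        if query.count > document.count {
            query.removeLast()
        } else {
            document.removeLast()
        }
    }
    var inputIds = [clsTokenId] + query + [sepTokenId] + document + [sepTokenId]
    var attentionMask = [Int](repeating: 1, count: inputIds.count)
    // Pad to the fixed length; padded positions are masked out with 0
    let padCount = maxLength - inputIds.count
    inputIds += [Int](repeating: padTokenId, count: padCount)
    attentionMask += [Int](repeating: 0, count: padCount)
    return (inputIds: inputIds, attentionMask: attentionMask)
}
```

Consider a short pair: `buildPairInputs(queryIds: [10, 11], documentIds: [20, 21, 22], maxLength: 10)` yields `[50281, 10, 11, 50282, 20, 21, 22, 50282, 50283, 50283]` with mask `[1, 1, 1, 1, 1, 1, 1, 1, 0, 0]`.
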

The output score is already sigmoid-activated (range 0 to 1). The sigmoid was baked
into the model graph during conversion, so no post-processing is needed in Swift.
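
The sigmoid is already part of the graph. Applying it a second time in Swift would squash every score into roughly 0.5 to 0.73. That preserves order but compresses the differences between documents. If you need the raw logit (for example, to compare against a pipeline that reports logits), you can invert the sigmoid. Below is a small sketch with hypothetical helper names `sigmoid` and `logit(fromScore:)`. The clamping epsilon is an arbitrary choice that keeps the logarithm finite:

```swift
import Foundation

/// Logistic sigmoid, the activation baked into the converted graph.
func sigmoid(_ x: Double) -> Double {
    1.0 / (1.0 + exp(-x))
}

/// Inverse of the sigmoid: recovers the raw logit from a model score.
/// Scores are clamped away from 0 and 1 so the logarithm stays finite.
func logit(fromScore p: Double, epsilon: Double = 1e-7) -> Double {
    let clamped = min(max(p, epsilon), 1.0 - epsilon)
    return log(clamped / (1.0 - clamped))
}
```
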
## Quick start (Swift)
Install [huggingface/swift-transformers](https://github.com/huggingface/swift-transformers)
for tokenization, then use Core ML directly:
```swift
import CoreML
import Tokenizers
// Load tokenizer (matches Python tokenization exactly)
let tokenizer = try await AutoTokenizer.from(
    pretrained: "cisco-ai/SecureBERT2.0-cross_encoder"
)
// Load model (place .mlpackage in your bundle, Xcode compiles it to .mlmodelc)
let config = MLModelConfiguration()
config.computeUnits = .all  // Use Neural Engine when available
guard let modelURL = Bundle.main.url(
    forResource: "SecureBERT2_CrossEncoder_FP16",
    withExtension: "mlmodelc"
) else { fatalError("Model not found in bundle") }
let model = try MLModel(contentsOf: modelURL, configuration: config)
// Special token IDs from the original tokenizer, and the fixed sequence length
let clsId = 50281, sepId = 50282, padId = 50283
let maxLength = 512
// Score a query/document pair
func score(query: String, document: String) throws -> Double {
    // Tokenize each text without special tokens, then build
    // [CLS] query [SEP] document [SEP] [PAD]... (3 slots for special tokens).
    // Simple truncation: an overlong query is cut first, then the document is
    // cut to fit. This may differ from Python's pair truncation on long inputs.
    let queryIds = Array(
        tokenizer.encode(text: query, addSpecialTokens: false).prefix(maxLength - 3)
    )
    let documentIds = Array(
        tokenizer.encode(text: document, addSpecialTokens: false)
            .prefix(maxLength - 3 - queryIds.count)
    )
    var inputIds = [clsId] + queryIds + [sepId] + documentIds + [sepId]
    var attentionMask = [Int](repeating: 1, count: inputIds.count)  // 1s for content
    // Pad to 512 tokens: [PAD] IDs, masked out with 0s
    let padCount = maxLength - inputIds.count
    inputIds += [Int](repeating: padId, count: padCount)
    attentionMask += [Int](repeating: 0, count: padCount)
    let inputIdsArray = try MLMultiArray(shape: [1, 512], dataType: .int32)
    let attentionMaskArray = try MLMultiArray(shape: [1, 512], dataType: .int32)
    for i in 0..<maxLength {
        inputIdsArray[i] = NSNumber(value: inputIds[i])
        attentionMaskArray[i] = NSNumber(value: attentionMask[i])
    }
    let inputs = try MLDictionaryFeatureProvider(dictionary: [
        "input_ids": MLFeatureValue(multiArray: inputIdsArray),
        "attention_mask": MLFeatureValue(multiArray: attentionMaskArray)
    ])
    let prediction = try model.prediction(from: inputs)
    let scoreArray = prediction.featureValue(for: "score")!.multiArrayValue!
    return scoreArray[0].doubleValue
}
```
## Verification
Conversion correctness was verified by comparing Core ML output against the
original PyTorch model on three test cases:
| Test case | PyTorch | Core ML FP16 | Diff |
|---|---|---|---|
| Highly relevant (vPC config Q + vPC config A) | 0.9948 | 0.9946 | 0.000132 |
| Same domain, different topic | 0.3406 | 0.3420 | 0.001481 |
| Unrelated content | 0.0160 | 0.0158 | 0.000190 |

Max numerical drift is ~0.0015, and the ranking order matches PyTorch on all
three test cases.

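
The drift and ranking checks reduce to a few lines. Below is a minimal sketch with hypothetical helpers `maxAbsDiff` and `rankOrder`, fed with the rounded four-decimal values from the table. Because of that rounding, it reports a maximum drift of about 0.0014 rather than the listed 0.001481, which was presumably computed from unrounded scores:

```swift
/// Largest absolute difference between two equally sized score lists.
func maxAbsDiff(_ a: [Double], _ b: [Double]) -> Double {
    precondition(a.count == b.count, "score lists must align")
    return zip(a, b).map { pair in abs(pair.0 - pair.1) }.max() ?? 0
}

/// Indices of the scores, ordered from most to least relevant.
func rankOrder(_ scores: [Double]) -> [Int] {
    scores.indices.sorted { scores[$0] > scores[$1] }
}

// Rounded values from the verification table
let pytorchScores = [0.9948, 0.3406, 0.0160]
let coreMLScores = [0.9946, 0.3420, 0.0158]

let drift = maxAbsDiff(pytorchScores, coreMLScores)
let sameRanking = rankOrder(pytorchScores) == rankOrder(coreMLScores)
print("max drift: \(drift), same ranking: \(sameRanking)")
```
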
Inference benchmarks on M4 Max (36 GB):
- Model load time: ~0.5 seconds
- First inference (warm-up): ~2300 ms
- Subsequent inferences: ~20 ms per query/document pair
- Throughput after warm-up: ~50 pairs/second

The high first-inference latency is a one-time cost of Neural Engine
compilation. For interactive applications, run a warm-up inference at app
startup.
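
You can wrap a warm-up call followed by timed calls in a small helper. The sketch below shows one way such a measurement might be structured. It is not the harness that produced the numbers above. `benchmark(iterations:_:)` is hypothetical and uses wall-clock `Date` timing:

```swift
import Foundation

/// Hypothetical benchmarking helper: runs `body` once as a warm-up, then
/// `iterations` more times, and reports first-call and steady-state timings.
func benchmark(
    iterations: Int,
    _ body: () throws -> Void
) rethrows -> (firstMs: Double, meanMs: Double, pairsPerSecond: Double) {
    precondition(iterations > 0, "need at least one timed iteration")
    // Warm-up call: with Core ML, this is where the one-time compilation cost lands
    var start = Date()
    try body()
    let firstMs = Date().timeIntervalSince(start) * 1000

    // Steady-state calls after the warm-up
    start = Date()
    for _ in 0..<iterations {
        try body()
    }
    let totalSeconds = Date().timeIntervalSince(start)
    let meanMs = totalSeconds * 1000 / Double(iterations)
    // Guard against a zero-duration measurement for trivial closures
    let pairsPerSecond = Double(iterations) / max(totalSeconds, 1e-9)
    return (firstMs: firstMs, meanMs: meanMs, pairsPerSecond: pairsPerSecond)
}
```

For example, `let stats = try benchmark(iterations: 50) { _ = try score(query: q, document: d) }` pays the warm-up cost in its first call and then reports steady-state latency. At ~20 ms per pair, `pairsPerSecond` would come out near 1000 / 20 = 50, in line with the figures above.
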
## Conversion recipe
The conversion from PyTorch to Core ML is non-trivial for ModernBERT-based
models. The standard `torch.jit.trace` path fails on ModernBERT's attention
operations due to int-op handling in coremltools 9.0.
The working recipe:
1. Pin dependency versions: `torch==2.7.0`, `transformers==4.52.4`,
   `sentence-transformers==5.0.0`, `coremltools==9.0`
2. Load the model with `attn_implementation="eager"` to avoid SDPA tracing issues
3. Use `torch.export.export(strict=False)` instead of `torch.jit.trace`
4. Call `exported_program.run_decompositions({})` to convert from the TRAINING
   dialect to the ATEN dialect (required by coremltools 9.0)
5. Pass the resulting `ExportedProgram` to `ct.convert()`

See `convert_via_torch_export.py` for the complete script. This recipe should
generalize to other ModernBERT-based fine-tunes, such as ModernBERT classifiers
or ModernBERT models used as DeBERTa-v2 alternatives.
## Limitations
Inherited from the base model:
- English language only
- Trained primarily on cybersecurity content; performance on other domains
  may vary
- May reflect biases in the training data toward over-represented threats,
  technologies, or vendors

Specific to this conversion:
- Fixed sequence length of 512 tokens (the original model supports up to 1024;
  this conversion uses 512 for faster inference and a smaller memory footprint)
- FP16 introduces ~0.0015 numerical drift, so it is unsuitable where exact
  PyTorch-equivalent output is required; in the verification cases above it did
  not change the ranking order
- macOS 14 (Sonoma) / iOS 17 or newer required
  (`minimum_deployment_target=ct.target.macOS14`)
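
Documents longer than the 512-token window are truncated. One common workaround is not part of the original model or this conversion. It splits the document into overlapping windows, scores each `[CLS] query [SEP] window [SEP]` pair separately, and keeps the maximum window score, at the cost of one inference per window. Below is a sketch of the windowing and aggregation steps. `documentWindows`, its `stride` parameter, and `aggregateMax` are hypothetical:

```swift
/// Splits document token IDs into overlapping windows so that each
/// `[CLS] query [SEP] window [SEP]` pair fits in `maxLength` tokens.
/// A new window starts every `stride` tokens; the stride is clamped to the
/// window size so that no tokens are skipped.
func documentWindows(
    documentIds: [Int],
    queryLength: Int,
    maxLength: Int = 512,
    stride: Int = 256
) -> [[Int]] {
    // Tokens available for the document after the query and 3 special tokens
    let windowSize = maxLength - 3 - queryLength
    precondition(windowSize > 0, "query leaves no room for document tokens")
    precondition(stride > 0, "stride must be positive")
    guard documentIds.count > windowSize else { return [documentIds] }

    let step = min(stride, windowSize)
    var windows: [[Int]] = []
    var start = 0
    while true {
        let end = min(start + windowSize, documentIds.count)
        windows.append(Array(documentIds[start..<end]))
        if end == documentIds.count { break }  // last window reaches the end
        start += step
    }
    return windows
}

/// Combines per-window scores by taking the maximum.
func aggregateMax(_ windowScores: [Double]) -> Double {
    windowScores.max() ?? 0
}
```
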
## Citation
If you use this model, please cite the original SecureBERT 2.0 paper:
```bibtex
@article{aghaei2025securebert2,
title={SecureBERT 2.0: Advanced Language Model for Cybersecurity Intelligence},
author={Aghaei, Ehsan and others},
journal={arXiv preprint arXiv:2510.00240},
year={2025}
}
```
## License
Apache 2.0, matching the license of the original model.
## Acknowledgments
- Cisco AI for the original [SecureBERT 2.0](https://github.com/cisco-ai-defense/securebert2)
model family
- Apple's [coremltools](https://github.com/apple/coremltools) team for ongoing
ModernBERT support
- Hugging Face's [swift-transformers](https://github.com/huggingface/swift-transformers)
team for the Swift tokenizer support that makes this practical to use
## Related models
Other SecureBERT 2.0 models from Cisco AI:
- [`cisco-ai/SecureBERT2.0-base`](https://huggingface.co/cisco-ai/SecureBERT2.0-base) — Base encoder
- [`cisco-ai/SecureBERT2.0-biencoder`](https://huggingface.co/cisco-ai/SecureBERT2.0-biencoder) — Bi-encoder for retrieval
- [`cisco-ai/SecureBERT2.0-NER`](https://huggingface.co/cisco-ai/SecureBERT2.0-NER) — Named entity recognition
- [`cisco-ai/SecureBERT2.0-code-vuln-detection`](https://huggingface.co/cisco-ai/SecureBERT2.0-code-vuln-detection) — Vulnerability classification

If you convert any of these to Core ML using a similar recipe, feel free to
open an issue and I'll link your repo here.