---
license: apache-2.0
base_model: cisco-ai/SecureBERT2.0-cross_encoder
tags:
- core-ml
- apple-silicon
- cross-encoder
- cybersecurity
- reranking
- modernbert
language:
- en
pipeline_tag: text-classification
---
# SecureBERT 2.0 Cross-Encoder for Core ML
Core ML conversion of [cisco-ai/SecureBERT2.0-cross_encoder](https://huggingface.co/cisco-ai/SecureBERT2.0-cross_encoder),
ready to use on Apple Silicon (macOS / iOS / iPadOS) via the Core ML framework.

The original model is a cybersecurity domain-specific cross-encoder built on
ModernBERT. It takes a pair of texts (query + document) and outputs a similarity
score between 0 and 1, suitable for retrieval reranking, semantic search, and
cybersecurity intelligence applications.

This repository contains pre-converted `.mlpackage` files plus the conversion
script that produced them, allowing direct use in Swift applications without
running Python or Ollama at inference time.
## What's in this repository
| File | Size | Purpose |
|---|---|---|
| `SecureBERT2_CrossEncoder_FP16.mlpackage/` | 286 MB | FP16 Core ML model (recommended) |
| `SecureBERT2_CrossEncoder_FP32.mlpackage/` | 572 MB | FP32 Core ML model (reference precision) |
| `convert_via_torch_export.py` | ~6 KB | The conversion script that produced these files |
For most use cases, use the FP16 version. It is half the size and runs on the
Apple Neural Engine with negligible numerical drift (max diff ~0.0015 vs. PyTorch).
## Model specification
Both models share the same input/output specification:
| Tensor | Name | Shape | Dtype |
|---|---|---|---|
| Input 1 | `input_ids` | (1, 512) | INT32 |
| Input 2 | `attention_mask` | (1, 512) | INT32 |
| Output | `score` | (1, 1) | FLOAT16 (FP16 model) / FLOAT32 (FP32 model) |
The model expects standard BERT pair tokenization:
```
[CLS] query tokens [SEP] document tokens [SEP] [PAD] [PAD] ...
```
Special token IDs (from the original tokenizer):
| Token | ID |
|---|---|
| `[CLS]` | 50281 |
| `[SEP]` | 50282 |
| `[PAD]` | 50283 |
| `[UNK]` | 50280 |
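
The layout and IDs above can be assembled by hand when a tokenizer's pair-encoding API is not available. The following is a minimal sketch, not code from this repository: `buildPairInputs` and its truncation policy (keep the query, trim the document) are assumptions made for illustration, and the query and document token IDs are assumed to come from the tokenizer without special tokens added.

```swift
// Special token IDs from the table above.
let clsID = 50281
let sepID = 50282
let padID = 50283

/// Hypothetical helper: lays out [CLS] query [SEP] document [SEP] [PAD]...
/// `queryIDs` and `documentIDs` must not already contain special tokens.
func buildPairInputs(
    queryIDs: [Int],
    documentIDs: [Int],
    maxLength: Int = 512
) -> (inputIds: [Int], attentionMask: [Int]) {
    // Three slots are reserved for [CLS] and the two [SEP] tokens.
    let budget = maxLength - 3
    precondition(budget >= 0, "maxLength must be at least 3")

    // Assumed policy: keep as much of the query as fits, then fill with document.
    let query = Array(queryIDs.prefix(budget))
    let document = Array(documentIDs.prefix(budget - query.count))

    var inputIds: [Int] = [clsID]
    inputIds += query
    inputIds.append(sepID)
    inputIds += document
    inputIds.append(sepID)

    // Content tokens (including special tokens) are attended to.
    var attentionMask = [Int](repeating: 1, count: inputIds.count)

    // Pad both arrays to the fixed sequence length.
    let padCount = maxLength - inputIds.count
    inputIds += [Int](repeating: padID, count: padCount)
    attentionMask += [Int](repeating: 0, count: padCount)
    return (inputIds, attentionMask)
}
```

With `maxLength: 10`, a two-token query and a three-token document produce eight content positions followed by two `[PAD]` positions whose mask entries are 0. At the default length of 512, over-long documents are cut so the final `[SEP]` is always present.
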
The output score is already sigmoid-activated (range 0-1). The sigmoid was baked
into the model graph during conversion, so no post-processing is needed in Swift.
## Quick start (Swift)
Install [huggingface/swift-transformers](https://github.com/huggingface/swift-transformers)
for tokenization, then use Core ML directly:
```swift
import CoreML
import Tokenizers

// Load the tokenizer (matches Python tokenization exactly)
let tokenizer = try await AutoTokenizer.from(
    pretrained: "cisco-ai/SecureBERT2.0-cross_encoder"
)

// Load the model (place the .mlpackage in your bundle; Xcode compiles it to .mlmodelc)
let config = MLModelConfiguration()
config.computeUnits = .all  // Use the Neural Engine when available
guard let modelURL = Bundle.main.url(
    forResource: "SecureBERT2_CrossEncoder_FP16",
    withExtension: "mlmodelc"
) else { fatalError("Model not found in bundle") }
let model = try MLModel(contentsOf: modelURL, configuration: config)

// Score one query/document pair.
// `inputIds` is the pair encoding [CLS] query [SEP] document [SEP] [PAD]...,
// padded to length 512 (use the tokenizer's pair encoding API, or build it
// manually using CLS=50281, SEP=50282, PAD=50283).
// `attentionMask` holds 1s for content and 0s for padding.
func score(inputIds: [Int], attentionMask: [Int]) throws -> Double {
    precondition(inputIds.count == 512 && attentionMask.count == 512,
                 "Both inputs must have length 512")

    // Copy the token IDs and mask into the INT32 tensors the model expects.
    let inputIdsArray = try MLMultiArray(shape: [1, 512], dataType: .int32)
    let attentionMaskArray = try MLMultiArray(shape: [1, 512], dataType: .int32)
    for i in 0..<512 {
        inputIdsArray[i] = NSNumber(value: inputIds[i])
        attentionMaskArray[i] = NSNumber(value: attentionMask[i])
    }

    let inputs = try MLDictionaryFeatureProvider(dictionary: [
        "input_ids": MLFeatureValue(multiArray: inputIdsArray),
        "attention_mask": MLFeatureValue(multiArray: attentionMaskArray)
    ])
    let prediction = try model.prediction(from: inputs)

    // The output is already sigmoid-activated, so it can be returned as-is.
    guard let scoreArray = prediction.featureValue(for: "score")?.multiArrayValue else {
        fatalError("Model output 'score' is missing")
    }
    return scoreArray[0].doubleValue
}
```
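
For reranking, each candidate document is scored against the same query and the candidates are then sorted by descending score. A minimal sketch, assuming you supply your own scoring closure; `rerank` and its `scorer` parameter are hypothetical names, not part of this repository or of Core ML:

```swift
/// Hypothetical helper: scores every document against `query` and returns
/// the documents ordered from most to least relevant.
func rerank(
    query: String,
    documents: [String],
    scorer: (String, String) throws -> Double
) rethrows -> [(document: String, score: Double)] {
    var scored: [(document: String, score: Double)] = []
    scored.reserveCapacity(documents.count)
    for document in documents {
        // One model call per query/document pair.
        let value = try scorer(query, document)
        scored.append((document: document, score: value))
    }
    // Highest score first; the model output is already in the 0-1 range.
    return scored.sorted { $0.score > $1.score }
}
```

Passing the scorer as a closure keeps the sorting logic independent of how tokenization and prediction are wired up in your app.
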
## Verification
Conversion correctness was verified by comparing Core ML output against the
original PyTorch model on three test cases:
| Test case | PyTorch | Core ML FP16 | Diff |
|---|---|---|---|
| Highly relevant (vPC config Q + vPC config A) | 0.9948 | 0.9946 | 0.000132 |
| Same domain, different topic | 0.3406 | 0.3420 | 0.001481 |
| Unrelated content | 0.0160 | 0.0158 | 0.000190 |
Max numerical drift: ~0.0015. Ranking order is identical to PyTorch.
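
The same check can be reproduced from the rounded scores in the table. This is an illustrative sketch rather than the verification script used for this repository. Because it works from four-decimal values, its maximum difference (about 0.0014) differs slightly from the 0.001481 computed on unrounded outputs.

```swift
// Scores copied from the table above (rounded to four decimals).
let pytorchScores = [0.9948, 0.3406, 0.0160]
let coreMLScores = [0.9946, 0.3420, 0.0158]

// Largest absolute difference between the two backends.
let maxDrift = zip(pytorchScores, coreMLScores)
    .map { abs($0 - $1) }
    .max() ?? 0

/// Indices of `scores` ordered from highest to lowest score.
func rankingOrder(_ scores: [Double]) -> [Int] {
    scores.indices.sorted { scores[$0] > scores[$1] }
}

// The conversion preserves ranking if both backends order the cases alike.
let sameRanking = rankingOrder(pytorchScores) == rankingOrder(coreMLScores)
print("max drift:", maxDrift, "same ranking:", sameRanking)
```
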
Inference benchmarks on M4 Max (36 GB):
- Model load time: ~0.5 seconds
- First inference (warm-up): ~2300 ms
- Subsequent inferences: ~20 ms per query/document pair
- Throughput after warm-up: ~50 pairs/second
The high first-inference latency is a one-time cost from Neural Engine compilation.
For interactive applications, perform a warm-up inference at app startup.
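
One way to pay that cost up front is to run a single throwaway prediction as soon as the model is loaded. The sketch below is an assumption about how this could be wired up, not code from this repository: `warmUpInputs` and `warmUp` are hypothetical names, and the dummy input is simply `[CLS] [SEP] [SEP]` followed by padding.

```swift
import Foundation
#if canImport(CoreML)
import CoreML
#endif

/// Hypothetical helper: a minimal valid input, [CLS] [SEP] [SEP] then [PAD].
func warmUpInputs(length: Int = 512) -> (inputIds: [Int32], attentionMask: [Int32]) {
    precondition(length >= 3, "length must be at least 3")
    var inputIds = [Int32](repeating: 50283, count: length)  // [PAD]
    var attentionMask = [Int32](repeating: 0, count: length)
    inputIds[0] = 50281  // [CLS]
    inputIds[1] = 50282  // [SEP]
    inputIds[2] = 50282  // [SEP]
    for i in 0..<3 { attentionMask[i] = 1 }
    return (inputIds, attentionMask)
}

#if canImport(CoreML)
/// Runs one prediction and discards the result so that later calls are fast.
func warmUp(_ model: MLModel) throws {
    let (ids, mask) = warmUpInputs(length: 512)
    let idsArray = try MLMultiArray(shape: [1, 512], dataType: .int32)
    let maskArray = try MLMultiArray(shape: [1, 512], dataType: .int32)
    for i in 0..<512 {
        idsArray[i] = NSNumber(value: ids[i])
        maskArray[i] = NSNumber(value: mask[i])
    }
    let inputs = try MLDictionaryFeatureProvider(dictionary: [
        "input_ids": MLFeatureValue(multiArray: idsArray),
        "attention_mask": MLFeatureValue(multiArray: maskArray)
    ])
    _ = try model.prediction(from: inputs)
}
#endif
```

Calling `try warmUp(model)` from a background task at launch keeps the slow first inference off the first user-visible request.
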
## Conversion recipe
The conversion from PyTorch to Core ML is non-trivial for ModernBERT-based
models. The standard `torch.jit.trace` path fails on ModernBERT's attention
operations due to int-op handling in coremltools 9.0.
The working recipe:
1. Pin dependency versions: `torch==2.7.0`, `transformers==4.52.4`,
`sentence-transformers==5.0.0`, `coremltools==9.0`
2. Load model with `attn_implementation="eager"` to avoid SDPA tracing issues
3. Use `torch.export.export(strict=False)` instead of `torch.jit.trace`
4. Call `exported_program.run_decompositions({})` to convert from TRAINING
dialect to ATEN dialect (required by coremltools 9.0)
5. Pass the resulting `ExportedProgram` to `ct.convert()`
See `convert_via_torch_export.py` for the complete script. This recipe should
generalize to other ModernBERT-based fine-tunes (for example, ModernBERT
classifiers or ModernBERT models used as DeBERTa-v2 alternatives).
## Limitations
Inherited from the base model:
- English language only
- Trained primarily on cybersecurity content; performance on other domains
may vary
- May reflect biases in the training data toward over-represented threats,
technologies, or vendors
Specific to this conversion:
- Fixed sequence length of 512 tokens (the original model supports up to 1024;
this conversion uses 512 for faster inference and smaller memory footprint)
- FP16 introduces ~0.0015 numerical drift; unsuitable for tasks requiring
exact PyTorch-equivalent output but irrelevant for ranking tasks
- macOS 14 (Sonoma) or newer required (`minimum_deployment_target=ct.target.macOS14`)
## Citation
If you use this model, please cite the original SecureBERT 2.0 paper:
```bibtex
@article{aghaei2025securebert2,
title={SecureBERT 2.0: Advanced Language Model for Cybersecurity Intelligence},
author={Aghaei, Ehsan and others},
journal={arXiv preprint arXiv:2510.00240},
year={2025}
}
```
## License
Apache 2.0, matching the license of the original model.
## Acknowledgments
- Cisco AI for the original [SecureBERT 2.0](https://github.com/cisco-ai-defense/securebert2)
model family
- Apple's [coremltools](https://github.com/apple/coremltools) team for ongoing
ModernBERT support
- Hugging Face's [swift-transformers](https://github.com/huggingface/swift-transformers)
team for the Swift tokenizer support that makes this practical to use
## Related models
Other SecureBERT 2.0 models from Cisco AI:
- [`cisco-ai/SecureBERT2.0-base`](https://huggingface.co/cisco-ai/SecureBERT2.0-base) — Base encoder
- [`cisco-ai/SecureBERT2.0-biencoder`](https://huggingface.co/cisco-ai/SecureBERT2.0-biencoder) — Bi-encoder for retrieval
- [`cisco-ai/SecureBERT2.0-NER`](https://huggingface.co/cisco-ai/SecureBERT2.0-NER) — Named entity recognition
- [`cisco-ai/SecureBERT2.0-code-vuln-detection`](https://huggingface.co/cisco-ai/SecureBERT2.0-code-vuln-detection) — Vulnerability classification
If you convert any of these to Core ML using a similar recipe, feel free to
open an issue and I'll link your repo here.