license: apache-2.0
base_model: cisco-ai/SecureBERT2.0-cross_encoder
tags:
- core-ml
- apple-silicon
- cross-encoder
- cybersecurity
- reranking
- modernbert
language:
- en
pipeline_tag: text-classification
SecureBERT 2.0 Cross-Encoder for Core ML
Core ML conversion of cisco-ai/SecureBERT2.0-cross_encoder, ready to use on Apple Silicon (macOS / iOS / iPadOS) via the Core ML framework.
The original model is a cybersecurity domain-specific cross-encoder built on ModernBERT. It takes a pair of texts (query + document) and outputs a similarity score between 0 and 1, suitable for retrieval reranking, semantic search, and cybersecurity intelligence applications.
This repository contains pre-converted .mlpackage files plus the conversion
script that produced them, allowing direct use in Swift applications without
running Python or Ollama at inference time.
What's in this repository
| File | Size | Purpose |
|---|---|---|
| SecureBERT2_CrossEncoder_FP16.mlpackage/ | 286 MB | FP16 Core ML model (recommended) |
| SecureBERT2_CrossEncoder_FP32.mlpackage/ | 572 MB | FP32 Core ML model (reference precision) |
| convert_via_torch_export.py | ~6 KB | The conversion script that produced these files |
For most use cases, use the FP16 version. It is half the size of the FP32 model, runs on the Apple Neural Engine, and shows negligible numerical drift (max diff ~0.0015 vs PyTorch on the test cases in the Verification section).
Model specification
Both models share the same input/output specification:
| Tensor | Name | Shape | Dtype |
|---|---|---|---|
| Input 1 | input_ids | (1, 512) | INT32 |
| Input 2 | attention_mask | (1, 512) | INT32 |
| Output | score | (1, 1) | FLOAT16 (FP16 model) / FLOAT32 (FP32 model) |
The model expects standard BERT pair tokenization:
[CLS] query tokens [SEP] document tokens [SEP] [PAD] [PAD] ...
Special token IDs (from the original tokenizer):
| Token | ID |
|---|---|
| [CLS] | 50281 |
| [SEP] | 50282 |
| [PAD] | 50283 |
| [UNK] | 50280 |
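As a minimal sketch of the pair layout above, the following Python helper assembles input_ids and attention_mask from already-tokenized ID lists. The function name build_pair_inputs and the truncation policy (the query keeps priority, the document gets whatever room remains) are assumptions for illustration, not something the original tokenizer or this repository prescribes. The same logic ports directly to Swift.

```python
from typing import List, Tuple

# Special token IDs from the table above.
CLS_ID, SEP_ID, PAD_ID = 50281, 50282, 50283
MAX_LEN = 512  # fixed sequence length of the converted model


def build_pair_inputs(
    query_ids: List[int], doc_ids: List[int], max_len: int = MAX_LEN
) -> Tuple[List[int], List[int]]:
    """Return (input_ids, attention_mask), each exactly max_len long.

    Assumed truncation policy: the query is kept first and the document
    is trimmed to the remaining room, so [CLS] and both [SEP] tokens
    always survive.
    """
    budget = max_len - 3  # room left after [CLS] and two [SEP]
    if budget < 0:
        raise ValueError("max_len must be at least 3")
    query_ids = query_ids[:budget]
    doc_ids = doc_ids[: budget - len(query_ids)]

    content = [CLS_ID] + query_ids + [SEP_ID] + doc_ids + [SEP_ID]
    padding = max_len - len(content)
    input_ids = content + [PAD_ID] * padding
    attention_mask = [1] * len(content) + [0] * padding
    return input_ids, attention_mask


# Toy token IDs, purely illustrative (not real tokenizer output).
ids, mask = build_pair_inputs([101, 102], [201, 202, 203])
print(ids[:9])    # [50281, 101, 102, 50282, 201, 202, 203, 50282, 50283]
print(sum(mask))  # 8
```

In practice the ID lists would come from the original tokenizer with special tokens disabled, so that [CLS] and [SEP] are not inserted twice.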
The output score is already sigmoid-activated (range 0-1). The sigmoid was baked into the model graph during conversion, so no post-processing is needed in Swift.
Quick start (Swift)
Install huggingface/swift-transformers for tokenization, then use Core ML directly:
import CoreML
import Tokenizers

// Load tokenizer (matches Python tokenization exactly)
let tokenizer = try await AutoTokenizer.from(
    pretrained: "cisco-ai/SecureBERT2.0-cross_encoder"
)

// Load model (place .mlpackage in your bundle, Xcode compiles it to .mlmodelc)
let config = MLModelConfiguration()
config.computeUnits = .all // Use Neural Engine when available
guard let modelURL = Bundle.main.url(
    forResource: "SecureBERT2_CrossEncoder_FP16",
    withExtension: "mlmodelc"
) else { fatalError("Model not found in bundle") }
let model = try MLModel(contentsOf: modelURL, configuration: config)

// Score a query/document pair
func score(query: String, document: String) throws -> Double {
    // Tokenize as pair: [CLS] query [SEP] document [SEP] [PAD]...
    // (Use tokenizer's pair encoding API, or build manually using
    // CLS=50281, SEP=50282, PAD=50283)
    let inputIds: [Int] = /* your tokenization here, length 512 */
    let attentionMask: [Int] = /* 1s for content, 0s for padding */

    let inputIdsArray = try MLMultiArray(shape: [1, 512], dataType: .int32)
    let attentionMaskArray = try MLMultiArray(shape: [1, 512], dataType: .int32)
    for i in 0..<512 {
        inputIdsArray[i] = NSNumber(value: inputIds[i])
        attentionMaskArray[i] = NSNumber(value: attentionMask[i])
    }

    let inputs = try MLDictionaryFeatureProvider(dictionary: [
        "input_ids": MLFeatureValue(multiArray: inputIdsArray),
        "attention_mask": MLFeatureValue(multiArray: attentionMaskArray)
    ])
    let prediction = try model.prediction(from: inputs)
    let scoreArray = prediction.featureValue(for: "score")!.multiArrayValue!
    return scoreArray[0].doubleValue
}
Verification
Conversion correctness was verified by comparing Core ML output against the original PyTorch model on three test cases:
| Test case | PyTorch | Core ML FP16 | Diff |
|---|---|---|---|
| Highly relevant (vPC config Q + vPC config A) | 0.9948 | 0.9946 | 0.000132 |
| Same domain, different topic | 0.3406 | 0.3420 | 0.001481 |
| Unrelated content | 0.0160 | 0.0158 | 0.000190 |
Max numerical drift: ~0.0015. Ranking order is identical to PyTorch.
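The comparison above boils down to two checks: the largest absolute score difference, and whether both backends rank the test pairs in the same order. Here is a minimal sketch using the rounded scores from the table. Because the inputs are rounded to four decimals, the differences come out slightly different from the Diff column, which was presumably computed from unrounded values. The helper names are illustrative.

```python
from typing import List, Sequence

# Rounded scores copied from the table above (PyTorch vs Core ML FP16).
pytorch_scores = [0.9948, 0.3406, 0.0160]
coreml_scores = [0.9946, 0.3420, 0.0158]


def max_abs_diff(a: Sequence[float], b: Sequence[float]) -> float:
    """Largest element-wise absolute difference between two score lists."""
    return max(abs(x - y) for x, y in zip(a, b))


def ranking(scores: Sequence[float]) -> List[int]:
    """Indices of the items sorted from highest to lowest score."""
    return sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)


drift = max_abs_diff(pytorch_scores, coreml_scores)
same_order = ranking(pytorch_scores) == ranking(coreml_scores)
print(f"max drift: {drift:.4f}, same ranking: {same_order}")
# prints: max drift: 0.0014, same ranking: True
```

For a reranker the second check matters more than the first: small score shifts are harmless as long as they do not reorder candidates.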
Inference benchmarks on M4 Max (36 GB):
- Model load time: ~0.5 seconds
- First inference (warm-up): ~2300 ms
- Subsequent inferences: ~20 ms per query/document pair
- Throughput after warm-up: ~50 pairs/second
The high first-inference latency is a one-time cost from Neural Engine compilation. For interactive applications, run a warm-up inference at app startup.
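To get a feel for what these numbers mean for a reranking workload, here is a back-of-the-envelope estimate. It assumes the M4 Max figures above, strictly sequential scoring (one pair per call, matching the (1, 512) input shape), and that the ~2300 ms first call replaces one ordinary ~20 ms call. The function name is hypothetical, and real latency will vary by device.

```python
WARMUP_MS = 2300.0  # first inference, includes Neural Engine compilation
STEADY_MS = 20.0    # each subsequent query/document pair


def estimated_latency_ms(n_pairs: int, warmed_up: bool = True) -> float:
    """Rough wall-clock estimate for scoring n_pairs sequentially."""
    if n_pairs <= 0:
        return 0.0
    if warmed_up:
        return n_pairs * STEADY_MS
    # Cold start: the first pair pays the warm-up cost instead of STEADY_MS.
    return WARMUP_MS + (n_pairs - 1) * STEADY_MS


print(estimated_latency_ms(100))                   # 2000.0
print(estimated_latency_ms(100, warmed_up=False))  # 4280.0
print(1000.0 / STEADY_MS)                          # 50.0 (pairs per second)
```

Under these assumptions, reranking 100 candidates takes about 2 seconds once warmed up, and skipping the warm-up more than doubles that for the first request.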
Conversion recipe
The conversion from PyTorch to Core ML is non-trivial for ModernBERT-based
models. The standard torch.jit.trace path fails on ModernBERT's attention
operations due to int-op handling in coremltools 9.0.
The working recipe:
- Pin dependency versions: torch==2.7.0, transformers==4.52.4, sentence-transformers==5.0.0, coremltools==9.0
- Load model with attn_implementation="eager" to avoid SDPA tracing issues
- Use torch.export.export(strict=False) instead of torch.jit.trace
- Call exported_program.run_decompositions({}) to convert from TRAINING dialect to ATEN dialect (required by coremltools 9.0)
- Pass the resulting ExportedProgram to ct.convert()
See convert_via_torch_export.py for the complete script. This recipe should
generalize to other ModernBERT-based fine-tunes (DeBERTa-v2 alternatives,
ModernBERT classifiers, etc.).
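The recipe steps can be sketched roughly as follows. This is not the repository's script: loading through AutoModelForSequenceClassification, the ScoreWrapper class, the int64 example inputs, and the function name convert are all assumptions made for illustration. Naming the output tensor score is also left out. Heavy imports are deferred into the function so the module can be inspected without the pinned packages installed.

```python
# Pinned versions from the recipe above.
PINNED = {
    "torch": "2.7.0",
    "transformers": "4.52.4",
    "sentence-transformers": "5.0.0",
    "coremltools": "9.0",
}
MODEL_ID = "cisco-ai/SecureBERT2.0-cross_encoder"
SEQ_LEN = 512


def convert(output_path: str = "SecureBERT2_CrossEncoder_FP16.mlpackage") -> None:
    """Sketch of the export path; requires the pinned packages above."""
    import coremltools as ct
    import torch
    from transformers import AutoModelForSequenceClassification

    class ScoreWrapper(torch.nn.Module):
        """Assumed wrapper: applies sigmoid so the output is a 0-1 score."""

        def __init__(self, model: torch.nn.Module) -> None:
            super().__init__()
            self.model = model

        def forward(self, input_ids, attention_mask):
            out = self.model(input_ids=input_ids, attention_mask=attention_mask)
            return torch.sigmoid(out.logits)

    # Eager attention avoids the SDPA tracing issues.
    base = AutoModelForSequenceClassification.from_pretrained(
        MODEL_ID, attn_implementation="eager"
    )
    wrapped = ScoreWrapper(base).eval()

    # Example inputs fix the (1, 512) shape; the dtype here is an assumption.
    example = (
        torch.ones(1, SEQ_LEN, dtype=torch.long),  # input_ids
        torch.ones(1, SEQ_LEN, dtype=torch.long),  # attention_mask
    )

    with torch.no_grad():
        # torch.export instead of torch.jit.trace.
        exported = torch.export.export(wrapped, example, strict=False)
    # TRAINING dialect -> ATEN dialect, as coremltools 9.0 expects.
    exported = exported.run_decompositions({})

    # Hand the ExportedProgram to coremltools.
    mlmodel = ct.convert(
        exported,
        minimum_deployment_target=ct.target.macOS14,
        compute_precision=ct.precision.FLOAT16,
    )
    mlmodel.save(output_path)
```

Calling convert() downloads the original weights. Swapping ct.precision.FLOAT16 for ct.precision.FLOAT32 would be the natural way to produce the reference-precision variant, though the repository script remains the authoritative version.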
Limitations
Inherited from the base model:
- English language only
- Trained primarily on cybersecurity content; performance on other domains may vary
- May reflect biases in the training data toward over-represented threats, technologies, or vendors
Specific to this conversion:
- Fixed sequence length of 512 tokens (the original model supports up to 1024; this conversion uses 512 for faster inference and smaller memory footprint)
- FP16 introduces ~0.0015 numerical drift; unsuitable for tasks that require output exactly matching PyTorch, but negligible for ranking tasks
- macOS 14 (Sonoma) or newer required (minimum_deployment_target=ct.target.macOS14)
Citation
If you use this model, please cite the original SecureBERT 2.0 paper:
@article{aghaei2025securebert2,
  title={SecureBERT 2.0: Advanced Language Model for Cybersecurity Intelligence},
  author={Aghaei, Ehsan and others},
  journal={arXiv preprint arXiv:2510.00240},
  year={2025}
}
License
Apache 2.0, matching the license of the original model.
Acknowledgments
- Cisco AI for the original SecureBERT 2.0 model family
- Apple's coremltools team for ongoing ModernBERT support
- Hugging Face's swift-transformers team for the Swift tokenizer support that makes this practical to use
Related models
Other SecureBERT 2.0 models from Cisco AI:
- cisco-ai/SecureBERT2.0-base: Base encoder
- cisco-ai/SecureBERT2.0-biencoder: Bi-encoder for retrieval
- cisco-ai/SecureBERT2.0-NER: Named entity recognition
- cisco-ai/SecureBERT2.0-code-vuln-detection: Vulnerability classification
If you convert any of these to Core ML using a similar recipe, feel free to open an issue and I'll link your repo here.