SecureBERT 2.0 Cross-Encoder for Core ML

Core ML conversion of cisco-ai/SecureBERT2.0-cross_encoder, ready to use on Apple Silicon (macOS / iOS / iPadOS) via the Core ML framework.

The original model is a cybersecurity domain-specific cross-encoder built on ModernBERT. It takes a pair of texts (query + document) and outputs a similarity score between 0 and 1, suitable for retrieval reranking, semantic search, and cybersecurity intelligence applications.

This repository contains pre-converted .mlpackage files plus the conversion script that produced them, allowing direct use in Swift applications without running Python or Ollama at inference time.

What's in this repository

File	Size	Purpose
`SecureBERT2_CrossEncoder_FP16.mlpackage/`	286 MB	FP16 Core ML model (recommended)
`SecureBERT2_CrossEncoder_FP32.mlpackage/`	572 MB	FP32 Core ML model (reference precision)
`convert_via_torch_export.py`	~6 KB	The conversion script that produced these files

For most use cases, use the FP16 version. It is half the size and runs on the Apple Neural Engine with negligible numerical drift (max diff ~0.0015 vs PyTorch on the cases listed under Verification), which leaves the ranking order unchanged.

Model specification

Both models share the same input/output specification:

Tensor	Name	Shape	Dtype
Input 1	`input_ids`	(1, 512)	INT32
Input 2	`attention_mask`	(1, 512)	INT32
Output	`score`	(1, 1)	FLOAT16 (FP16 model) / FLOAT32 (FP32 model)

The model expects standard BERT pair tokenization:

[CLS] query tokens [SEP] document tokens [SEP] [PAD] [PAD] ...

Special token IDs (from the original tokenizer):

Token	ID
`[CLS]`	50281
`[SEP]`	50282
`[PAD]`	50283
`[UNK]`	50280
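
The pair layout and special-token IDs above can be sketched in a few lines of Python. This is a minimal illustration, not the tokenizer's own implementation: `build_pair_inputs` is a hypothetical helper, it assumes the query and document have already been converted to token IDs without special tokens, and the truncation policy (keep the query, trim the document) is an assumption rather than something the original tokenizer is known to do.

```python
CLS_ID, SEP_ID, PAD_ID = 50281, 50282, 50283
MAX_LEN = 512  # fixed sequence length of the converted models


def build_pair_inputs(query_ids, document_ids, max_len=MAX_LEN):
    """Return (input_ids, attention_mask), each a list of length max_len."""
    # Three slots are reserved for [CLS] and the two [SEP] tokens.
    budget = max_len - 3
    query_ids = list(query_ids)[:budget]
    # Assumed policy: keep the query intact and trim the document to fit.
    document_ids = list(document_ids)[: budget - len(query_ids)]

    content = [CLS_ID] + query_ids + [SEP_ID] + document_ids + [SEP_ID]
    padding = max_len - len(content)

    input_ids = content + [PAD_ID] * padding
    # 1 marks real tokens (including special tokens), 0 marks padding.
    attention_mask = [1] * len(content) + [0] * padding
    return input_ids, attention_mask


# Token IDs 10, 11, 20, 21, 22 are placeholders, not real vocabulary entries.
input_ids, attention_mask = build_pair_inputs([10, 11], [20, 21, 22])
```

The same logic ports directly to Swift if the tokenizer's pair encoding API is not available in your version of swift-transformers.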

The output score is already sigmoid-activated (range 0-1). The sigmoid was baked into the model graph during conversion, so no post-processing is needed in Swift.
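
If a downstream component expects a raw logit rather than a probability-like score, the sigmoid can be inverted after the fact. The sketch below shows the relationship; `logit_from_score` is a hypothetical helper, and the clipping constant `eps` is an assumption chosen only to avoid division by zero at exactly 0 or 1.

```python
import math


def sigmoid(logit):
    """The activation baked into the converted model: maps a logit to (0, 1)."""
    return 1.0 / (1.0 + math.exp(-logit))


def logit_from_score(score, eps=1e-7):
    """Invert the sigmoid to recover an approximate pre-activation logit."""
    # Clip so that scores of exactly 0.0 or 1.0 do not divide by zero.
    clipped = min(max(score, eps), 1.0 - eps)
    return math.log(clipped / (1.0 - clipped))
```

Scores that saturate near 0 or 1 lose resolution, so a recovered logit is only approximate, especially with the FP16 model.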

Quick start (Swift)

Install huggingface/swift-transformers for tokenization, then use Core ML directly:

import CoreML
import Tokenizers

// Load tokenizer (matches Python tokenization exactly)
let tokenizer = try await AutoTokenizer.from(
    pretrained: "cisco-ai/SecureBERT2.0-cross_encoder"
)

// Load model (place .mlpackage in your bundle, Xcode compiles it to .mlmodelc)
let config = MLModelConfiguration()
config.computeUnits = .all  // Use Neural Engine when available

guard let modelURL = Bundle.main.url(
    forResource: "SecureBERT2_CrossEncoder_FP16",
    withExtension: "mlmodelc"
) else { fatalError("Model not found in bundle") }

let model = try MLModel(contentsOf: modelURL, configuration: config)

// Score one query/document pair from pre-tokenized inputs.
// Build inputIds as [CLS] query [SEP] document [SEP] [PAD]... using the
// tokenizer's pair encoding API, or manually with CLS=50281, SEP=50282,
// PAD=50283. attentionMask holds 1 for content tokens and 0 for padding.
func score(inputIds: [Int], attentionMask: [Int]) throws -> Double {
    precondition(inputIds.count == 512 && attentionMask.count == 512,
                 "Both inputs must be padded or truncated to 512 tokens")

    let inputIdsArray = try MLMultiArray(shape: [1, 512], dataType: .int32)
    let attentionMaskArray = try MLMultiArray(shape: [1, 512], dataType: .int32)

    // Copy the token IDs and mask into the Core ML input buffers.
    for i in 0..<512 {
        inputIdsArray[i] = NSNumber(value: inputIds[i])
        attentionMaskArray[i] = NSNumber(value: attentionMask[i])
    }

    let inputs = try MLDictionaryFeatureProvider(dictionary: [
        "input_ids": MLFeatureValue(multiArray: inputIdsArray),
        "attention_mask": MLFeatureValue(multiArray: attentionMaskArray)
    ])

    // The output is already sigmoid-activated, so it is returned as-is.
    let prediction = try model.prediction(from: inputs)
    let scoreArray = prediction.featureValue(for: "score")!.multiArrayValue!
    return scoreArray[0].doubleValue
}

Verification

Conversion correctness was verified by comparing Core ML output against the original PyTorch model on three test cases:

Test case	PyTorch	Core ML FP16	Diff
Highly relevant (vPC config Q + vPC config A)	0.9948	0.9946	0.000132
Same domain, different topic	0.3406	0.3420	0.001481
Unrelated content	0.0160	0.0158	0.000190

Max numerical drift: ~0.0015. Ranking order is identical to PyTorch.
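
The two checks reported above (maximum drift and identical ranking) can be reproduced from the table with a short Python sketch. The helper names are hypothetical, and the inputs are the rounded four-decimal scores from the table, so the computed drift comes out slightly below the 0.001481 figure, which was presumably measured on unrounded outputs.

```python
# Scores copied from the verification table, in row order.
pytorch_scores = [0.9948, 0.3406, 0.0160]
coreml_fp16_scores = [0.9946, 0.3420, 0.0158]


def max_abs_drift(reference, candidate):
    """Largest absolute per-pair difference between two score lists."""
    return max(abs(r - c) for r, c in zip(reference, candidate))


def ranking(scores):
    """Indices of the pairs sorted from most to least relevant."""
    return sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)


drift = max_abs_drift(pytorch_scores, coreml_fp16_scores)
same_order = ranking(pytorch_scores) == ranking(coreml_fp16_scores)
print(f"max drift: {drift:.4f}, ranking preserved: {same_order}")
```

Running the same comparison on your own query/document pairs is a reasonable sanity check before relying on the converted model.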

Inference benchmarks on M4 Max (36 GB):

Model load time: ~0.5 seconds
First inference (warm-up): ~2300 ms
Subsequent inferences: ~20 ms per query/document pair
Throughput after warm-up: ~50 pairs/second

The high first-inference latency is a one-time cost from Neural Engine compilation. For interactive applications, perform a warm-up inference at app startup.
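
To separate the warm-up cost from steady-state latency in your own measurements, time the first call on its own and average the rest. The sketch below is a generic harness rather than the benchmark code used for the numbers above: `predict` is a hypothetical callable that runs one inference (for example, a thin wrapper around the Core ML model), and `example` is whatever input it accepts.

```python
import time


def time_inferences(predict, example, runs=20):
    """Return (first_call_ms, steady_state_ms_per_call) for a predict callable."""
    # The first call includes one-time compilation and warm-up work.
    start = time.perf_counter()
    predict(example)
    first_ms = (time.perf_counter() - start) * 1000.0

    # Later calls reflect the latency seen after warm-up.
    start = time.perf_counter()
    for _ in range(runs):
        predict(example)
    steady_ms = (time.perf_counter() - start) * 1000.0 / runs
    return first_ms, steady_ms


def pairs_per_second(steady_ms):
    """Convert per-pair latency in milliseconds to throughput."""
    return 1000.0 / steady_ms
```

With the ~20 ms steady-state latency reported above, `pairs_per_second(20.0)` gives the quoted ~50 pairs/second, so reranking 100 candidates one at a time would take roughly two seconds.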

Conversion recipe

The conversion from PyTorch to Core ML is non-trivial for ModernBERT-based models. The standard torch.jit.trace path fails on ModernBERT's attention operations due to int-op handling in coremltools 9.0.

The working recipe:

Pin dependency versions: torch==2.7.0, transformers==4.52.4, sentence-transformers==5.0.0, coremltools==9.0
Load model with attn_implementation="eager" to avoid SDPA tracing issues
Use torch.export.export(strict=False) instead of torch.jit.trace
Call exported_program.run_decompositions({}) to convert from TRAINING dialect to ATEN dialect (required by coremltools 9.0)
Pass the resulting ExportedProgram to ct.convert()

See convert_via_torch_export.py for the complete script. This recipe should generalize to other ModernBERT-based fine-tunes (ModernBERT classifiers, rerankers used as alternatives to DeBERTa-v2 models, etc.).
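
The recipe steps can be sketched roughly as follows. This is an illustration written from the steps listed above, not a copy of convert_via_torch_export.py: the `ScoreWrapper` class, the `convert` function, loading through `AutoModelForSequenceClassification`, the example input dtypes, and naming the output through `outputs=` are all assumptions, and it has only been reasoned about against the pinned versions rather than run. Third-party imports sit inside the function so the module can be loaded without them.

```python
MODEL_ID = "cisco-ai/SecureBERT2.0-cross_encoder"
SEQ_LEN = 512  # fixed sequence length used by the converted models


def convert(output_path="SecureBERT2_CrossEncoder_FP16.mlpackage", fp16=True):
    """Export the cross-encoder with torch.export and convert it to Core ML."""
    # Step 1: these packages are expected at the pinned versions listed above.
    import coremltools as ct
    import torch
    from transformers import AutoModelForSequenceClassification

    class ScoreWrapper(torch.nn.Module):
        """Hypothetical wrapper that bakes the sigmoid into the graph."""

        def __init__(self, model):
            super().__init__()
            self.model = model

        def forward(self, input_ids, attention_mask):
            out = self.model(input_ids=input_ids, attention_mask=attention_mask)
            return torch.sigmoid(out.logits)

    # Step 2: eager attention avoids the SDPA tracing issues.
    base = AutoModelForSequenceClassification.from_pretrained(
        MODEL_ID, attn_implementation="eager"
    )
    wrapped = ScoreWrapper(base).eval()

    # Dummy inputs only fix the (1, 512) shapes; the values are irrelevant.
    example_inputs = (
        torch.zeros((1, SEQ_LEN), dtype=torch.long),
        torch.ones((1, SEQ_LEN), dtype=torch.long),
    )

    # Step 3: torch.export in non-strict mode instead of torch.jit.trace.
    with torch.no_grad():
        exported = torch.export.export(wrapped, example_inputs, strict=False)

    # Step 4: decompose from the TRAINING dialect to the ATEN dialect.
    exported = exported.run_decompositions({})

    # Step 5: hand the ExportedProgram to coremltools.
    # Assumption: the output is renamed to "score" via the outputs argument.
    precision = ct.precision.FLOAT16 if fp16 else ct.precision.FLOAT32
    mlmodel = ct.convert(
        exported,
        outputs=[ct.TensorType(name="score")],
        minimum_deployment_target=ct.target.macOS14,
        compute_precision=precision,
    )
    mlmodel.save(output_path)
    return output_path
```

Calling `convert(fp16=False)` with a different `output_path` would produce the FP32 variant under the same assumptions. Treat the repository script as the reference if the two disagree.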

Limitations

Inherited from the base model:

English language only
Trained primarily on cybersecurity content; performance on other domains may vary
May reflect biases in the training data toward over-represented threats, technologies, or vendors

Specific to this conversion:

Fixed sequence length of 512 tokens (the original model supports up to 1024; this conversion uses 512 for faster inference and smaller memory footprint)
FP16 introduces ~0.0015 numerical drift; this makes it unsuitable for tasks requiring exact PyTorch-equivalent output but is negligible for ranking tasks
macOS 14 (Sonoma) or newer required (minimum_deployment_target=ct.target.macOS14)

Citation

If you use this model, please cite the original SecureBERT 2.0 paper:

@article{aghaei2025securebert2,
  title={SecureBERT 2.0: Advanced Language Model for Cybersecurity Intelligence},
  author={Aghaei, Ehsan and others},
  journal={arXiv preprint arXiv:2510.00240},
  year={2025}
}

License

Apache 2.0, matching the license of the original model.

Acknowledgments

Cisco AI for the original SecureBERT 2.0 model family
Apple's coremltools team for ongoing ModernBERT support
Hugging Face's swift-transformers team for the Swift tokenizer support that makes this practical to use