---
license: apache-2.0
base_model: cisco-ai/SecureBERT2.0-cross_encoder
tags:
- core-ml
- apple-silicon
- cross-encoder
- cybersecurity
- reranking
- modernbert
language:
- en
pipeline_tag: text-classification
---

# SecureBERT 2.0 Cross-Encoder for Core ML
|
|
Core ML conversion of [cisco-ai/SecureBERT2.0-cross_encoder](https://huggingface.co/cisco-ai/SecureBERT2.0-cross_encoder),
ready to use on Apple Silicon (macOS / iOS / iPadOS) via the Core ML framework.

The original model is a cybersecurity domain-specific cross-encoder built on
ModernBERT. It takes a pair of texts (query + document) and outputs a similarity
score between 0 and 1, suitable for retrieval reranking, semantic search, and
cybersecurity intelligence applications.

This repository contains pre-converted `.mlpackage` files plus the conversion
script that produced them, allowing direct use in Swift applications without
running Python or Ollama at inference time.
|
|
## What's in this repository

| File | Size | Purpose |
|---|---|---|
| `SecureBERT2_CrossEncoder_FP16.mlpackage/` | 286 MB | FP16 Core ML model (recommended) |
| `SecureBERT2_CrossEncoder_FP32.mlpackage/` | 572 MB | FP32 Core ML model (reference precision) |
| `convert_via_torch_export.py` | ~6 KB | The conversion script that produced these files |
|
|
For most use cases, use the FP16 version. It is half the size, can run on the
Apple Neural Engine, and shows negligible numerical drift (max absolute
difference ~0.0015 vs. PyTorch on the verification cases below).
|
|
## Model specification

Both models share the same input/output specification:

| Tensor | Name | Shape | Dtype |
|---|---|---|---|
| Input 1 | `input_ids` | (1, 512) | INT32 |
| Input 2 | `attention_mask` | (1, 512) | INT32 |
| Output | `score` | (1, 1) | FLOAT16 (FP16 model) / FLOAT32 (FP32 model) |

The model expects standard BERT-style pair tokenization, padded to exactly 512 tokens:

```
[CLS] query tokens [SEP] document tokens [SEP] [PAD] [PAD] ...
```

Special token IDs (from the original tokenizer):

| Token | ID |
|---|---|
| `[CLS]` | 50281 |
| `[SEP]` | 50282 |
| `[PAD]` | 50283 |
| `[UNK]` | 50280 |
|
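The layout above can be sketched in plain Swift as a function that turns already-tokenized query and document IDs (encoded without special tokens) into the two 512-element input arrays. This is a minimal sketch, not the code behind the verification numbers: `buildPairInputs` and `truncateLongestFirst` are hypothetical helper names, and the longest-first truncation (drop tokens from whichever side is currently longer) is an assumption about how the Python `CrossEncoder` handles over-length pairs. Tie-breaking may differ by one token from the Rust `tokenizers` implementation.

```swift
// Hypothetical helpers (not part of this repository): build the fixed-length
// pair input described above from token IDs produced WITHOUT special tokens.

/// Longest-first truncation: repeatedly drop the last token of whichever
/// sequence is currently longer until both fit in `budget` tokens.
func truncateLongestFirst(_ first: [Int], _ second: [Int], budget: Int) -> ([Int], [Int]) {
    precondition(budget >= 0, "budget must be non-negative")
    var a = first, b = second
    while a.count + b.count > budget {
        if a.count > b.count { a.removeLast() } else { b.removeLast() }
    }
    return (a, b)
}

/// Returns `input_ids` and `attention_mask` laid out as
/// [CLS] query [SEP] document [SEP] [PAD]... with exactly `maxLength` entries.
func buildPairInputs(queryIds: [Int], documentIds: [Int], maxLength: Int = 512)
    -> (inputIds: [Int32], attentionMask: [Int32]) {
    let cls = 50281, sep = 50282, pad = 50283  // IDs from the table above
    // 3 positions are reserved for [CLS] and the two [SEP] tokens.
    let (q, d) = truncateLongestFirst(queryIds, documentIds, budget: maxLength - 3)
    let content = [cls] + q + [sep] + d + [sep]
    let padCount = maxLength - content.count
    let ids = content + Array(repeating: pad, count: padCount)
    let mask = Array(repeating: 1, count: content.count) + Array(repeating: 0, count: padCount)
    return (ids.map { Int32($0) }, mask.map { Int32($0) })
}
```

In an app, `queryIds` and `documentIds` would come from a tokenizer such as swift-transformers, encoded without special tokens so they are not added twice.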
|
The output score is already sigmoid-activated (range 0 to 1). The sigmoid was baked
into the model graph during conversion, so no post-processing is needed in Swift.

## Quick start (Swift)

Install [huggingface/swift-transformers](https://github.com/huggingface/swift-transformers)
for tokenization, then use Core ML directly:
|
|
```swift
import CoreML
import Tokenizers

// Score a query/document pair with the converted cross-encoder.
// Assumes a swift-transformers version whose Tokenizer provides
// encode(text:addSpecialTokens:); special tokens are added manually.
func score(query: String, document: String,
           tokenizer: any Tokenizer, model: MLModel) throws -> Double {
    let maxLength = 512
    let cls = 50281, sep = 50282, pad = 50283

    // Reserve 3 positions for [CLS] and the two [SEP] tokens.
    let queryIds = Array(
        tokenizer.encode(text: query, addSpecialTokens: false).prefix(maxLength - 3)
    )
    // Simple truncation policy: keep the query, cut the document to fit.
    let documentIds = Array(
        tokenizer.encode(text: document, addSpecialTokens: false)
            .prefix(maxLength - 3 - queryIds.count)
    )

    // [CLS] query [SEP] document [SEP] [PAD]...
    let content = [cls] + queryIds + [sep] + documentIds + [sep]
    let padCount = maxLength - content.count
    let inputIds = content + Array(repeating: pad, count: padCount)
    // 1s for content, 0s for padding
    let attentionMask = Array(repeating: 1, count: content.count)
        + Array(repeating: 0, count: padCount)

    let shape: [NSNumber] = [1, NSNumber(value: maxLength)]
    let inputIdsArray = try MLMultiArray(shape: shape, dataType: .int32)
    let attentionMaskArray = try MLMultiArray(shape: shape, dataType: .int32)
    for i in 0..<maxLength {
        inputIdsArray[i] = NSNumber(value: inputIds[i])
        attentionMaskArray[i] = NSNumber(value: attentionMask[i])
    }

    let inputs = try MLDictionaryFeatureProvider(dictionary: [
        "input_ids": MLFeatureValue(multiArray: inputIdsArray),
        "attention_mask": MLFeatureValue(multiArray: attentionMaskArray)
    ])
    let prediction = try model.prediction(from: inputs)
    guard let scoreArray = prediction.featureValue(for: "score")?.multiArrayValue else {
        throw NSError(domain: "SecureBERT2CrossEncoder", code: 1,
                      userInfo: [NSLocalizedDescriptionKey: "Missing 'score' output"])
    }
    return scoreArray[0].doubleValue  // already sigmoid-activated, in [0, 1]
}

// Load tokenizer (fetches tokenizer files from the Hugging Face Hub)
let tokenizer = try await AutoTokenizer.from(
    pretrained: "cisco-ai/SecureBERT2.0-cross_encoder"
)

// Load model (place .mlpackage in your bundle, Xcode compiles it to .mlmodelc)
let config = MLModelConfiguration()
config.computeUnits = .all // Use Neural Engine when available

guard let modelURL = Bundle.main.url(
    forResource: "SecureBERT2_CrossEncoder_FP16",
    withExtension: "mlmodelc"
) else { fatalError("Model not found in bundle") }

let model = try MLModel(contentsOf: modelURL, configuration: config)

let relevance = try score(query: "example query", document: "example document",
                          tokenizer: tokenizer, model: model)
print(relevance)
```
|
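Reranking a candidate list then amounts to scoring each (query, document) pair and sorting by score. The sketch below is model-agnostic: `rerank` is a hypothetical helper that takes the scorer as a closure, so in an app you would pass a closure that calls your Core ML scoring function. It scores pairs one at a time, matching the fixed batch size of 1 in the model specification.

```swift
/// Hypothetical helper: scores every candidate against `query` with `scorer`
/// and returns candidates sorted by descending score (optionally top-k only).
func rerank(query: String, documents: [String], topK: Int? = nil,
            scorer: (String, String) throws -> Double) rethrows
    -> [(document: String, score: Double)] {
    var scored: [(document: String, score: Double)] = []
    scored.reserveCapacity(documents.count)
    // One model call per pair: the converted model has a fixed batch size of 1.
    for document in documents {
        let s = try scorer(query, document)
        scored.append((document: document, score: s))
    }
    scored.sort { $0.score > $1.score }
    if let k = topK { return Array(scored.prefix(max(0, k))) }
    return scored
}
```

Because the scorer is injected, the same function can be exercised in unit tests with a stub closure, without loading the model.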
|
## Verification

Conversion correctness was verified by comparing Core ML output against the
original PyTorch model on three test cases:

| Test case | PyTorch | Core ML FP16 | Abs. diff |
|---|---|---|---|
| Highly relevant (vPC config Q + vPC config A) | 0.9948 | 0.9946 | 0.000132 |
| Same domain, different topic | 0.3406 | 0.3420 | 0.001481 |
| Unrelated content | 0.0160 | 0.0158 | 0.000190 |
|
|
Max absolute drift across these cases: ~0.0015. The ranking order of the three cases is identical to PyTorch.
|
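To run the same kind of check on your own query/document pairs, collect the PyTorch and Core ML scores in the same order and compare them. The helper below (`compareScores`, a hypothetical name) reports the maximum absolute difference and whether both score lists induce the same ranking. Fed the rounded table values above, it reports 0.0014 rather than 0.001481, since the table shows scores rounded to four decimals.

```swift
/// Hypothetical helper: compares candidate scores (e.g. Core ML) against
/// reference scores (e.g. PyTorch) for the same pairs in the same order.
func compareScores(reference: [Double], candidate: [Double])
    -> (maxAbsDiff: Double, sameRanking: Bool) {
    precondition(reference.count == candidate.count, "score lists must be aligned")
    let maxAbsDiff = zip(reference, candidate).map { abs($0 - $1) }.max() ?? 0
    // Ranking = pair indices sorted by descending score. Exact ties make the
    // order ambiguous, so treat mismatches between near-equal scores with care.
    func ranking(_ scores: [Double]) -> [Int] {
        scores.indices.sorted { scores[$0] > scores[$1] }
    }
    return (maxAbsDiff, ranking(reference) == ranking(candidate))
}
```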
|
Inference benchmarks on an M4 Max (36 GB):

- Model load time: ~0.5 seconds
- First inference (warm-up): ~2300 ms
- Subsequent inferences: ~20 ms per query/document pair
- Throughput after warm-up: ~50 pairs/second
|
|
The high first-inference latency is a one-time cost from Neural Engine compilation.
For interactive applications, run one warm-up inference at app startup (for example,
on a short dummy pair) so the first user-facing query does not pay this cost.
|
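One way to reproduce the first-call vs. steady-state split above is to time the first call separately from a batch of later calls. `benchmark` below is a hypothetical, framework-independent helper; in an app the closure would wrap a single Core ML prediction, and calling it once at startup doubles as the warm-up.

```swift
import Dispatch

/// Hypothetical helper: times the first (warm-up) call of `body` separately,
/// then reports the median latency of `runs` later calls, in milliseconds
/// (the upper median when the number of runs is even).
func benchmark(runs: Int, _ body: () throws -> Void) rethrows
    -> (firstMs: Double, medianMs: Double) {
    func elapsedMs(_ f: () throws -> Void) rethrows -> Double {
        let start = DispatchTime.now().uptimeNanoseconds
        try f()
        let end = DispatchTime.now().uptimeNanoseconds
        return Double(end - start) / 1_000_000
    }
    let first = try elapsedMs(body)
    var samples: [Double] = []
    for _ in 0..<max(1, runs) {
        let t = try elapsedMs(body)
        samples.append(t)
    }
    samples.sort()
    return (first, samples[samples.count / 2])
}
```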
|
## Conversion recipe

The conversion from PyTorch to Core ML is non-trivial for ModernBERT-based
models. The standard `torch.jit.trace` path fails on ModernBERT's attention
operations due to int-op handling in coremltools 9.0.

The working recipe:

1. Pin dependency versions: `torch==2.7.0`, `transformers==4.52.4`,
   `sentence-transformers==5.0.0`, `coremltools==9.0`
2. Load the model with `attn_implementation="eager"` to avoid SDPA tracing issues
3. Use `torch.export.export(strict=False)` instead of `torch.jit.trace`
4. Call `exported_program.run_decompositions({})` to convert from the TRAINING
   dialect to the ATEN dialect (required by coremltools 9.0)
5. Pass the resulting `ExportedProgram` to `ct.convert()`
|
|
See `convert_via_torch_export.py` for the complete script. This recipe should
generalize to other ModernBERT-based fine-tunes (for example, ModernBERT
classifiers, or ModernBERT models used as alternatives to DeBERTa-v2).
|
|
## Limitations

Inherited from the base model:

- English language only
- Trained primarily on cybersecurity content; performance on other domains
  may vary
- May reflect biases in the training data toward over-represented threats,
  technologies, or vendors
|
|
Specific to this conversion:

- Fixed sequence length of 512 tokens: every input must be padded or truncated
  to exactly 512 tokens (the original model supports up to 1024; this conversion
  uses 512 for faster inference and a smaller memory footprint)
- FP16 introduces ~0.0015 numerical drift; this makes it unsuitable for tasks
  requiring exact PyTorch-equivalent output, but it is unlikely to matter for
  ranking unless two candidates score within that margin of each other
- macOS 14 (Sonoma) or newer required (`minimum_deployment_target=ct.target.macOS14`;
  in coremltools this target corresponds to iOS 17 / iPadOS 17)
|
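If documents regularly exceed the 512-token window, one common workaround (not part of the original model or this conversion) is to split the document into overlapping windows, score each window against the query, and keep the maximum. The sketch below only does the splitting; `documentWindows` and the default 64-token overlap are illustrative assumptions, and max-aggregation is a heuristic whose effect on ranking quality has not been evaluated here.

```swift
/// Hypothetical helper: splits document token IDs into overlapping windows
/// that each fit alongside the query in one input ([CLS] q [SEP] window [SEP]).
func documentWindows(documentIds: [Int], queryLength: Int,
                     maxLength: Int = 512, overlap: Int = 64) -> [[Int]] {
    // 3 positions are reserved for [CLS] and the two [SEP] tokens.
    let windowSize = maxLength - 3 - queryLength
    precondition(windowSize > 0, "query too long to leave room for the document")
    precondition(overlap >= 0 && overlap < windowSize, "overlap must be smaller than the window")
    if documentIds.isEmpty { return [[]] }  // still score the query once
    var windows: [[Int]] = []
    var start = 0
    while start < documentIds.count {
        let end = min(start + windowSize, documentIds.count)
        windows.append(Array(documentIds[start..<end]))
        if end == documentIds.count { break }
        start = end - overlap  // step back so consecutive windows share context
    }
    return windows
}
```

Each window would then be scored as a normal query/document pair, and the document's score taken as the highest window score.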
|
## Citation

If you use this model, please cite the original SecureBERT 2.0 paper:

```bibtex
@article{aghaei2025securebert2,
  title={SecureBERT 2.0: Advanced Language Model for Cybersecurity Intelligence},
  author={Aghaei, Ehsan and others},
  journal={arXiv preprint arXiv:2510.00240},
  year={2025}
}
```

## License

Apache 2.0, matching the license of the original model.

## Acknowledgments

- Cisco AI for the original [SecureBERT 2.0](https://github.com/cisco-ai-defense/securebert2)
  model family
- Apple's [coremltools](https://github.com/apple/coremltools) team for ongoing
  ModernBERT support
- Hugging Face's [swift-transformers](https://github.com/huggingface/swift-transformers)
  team for the Swift tokenizer support that makes this practical to use

## Related models

Other SecureBERT 2.0 models from Cisco AI:

- [`cisco-ai/SecureBERT2.0-base`](https://huggingface.co/cisco-ai/SecureBERT2.0-base): Base encoder
- [`cisco-ai/SecureBERT2.0-biencoder`](https://huggingface.co/cisco-ai/SecureBERT2.0-biencoder): Bi-encoder for retrieval
- [`cisco-ai/SecureBERT2.0-NER`](https://huggingface.co/cisco-ai/SecureBERT2.0-NER): Named entity recognition
- [`cisco-ai/SecureBERT2.0-code-vuln-detection`](https://huggingface.co/cisco-ai/SecureBERT2.0-code-vuln-detection): Vulnerability classification

If you convert any of these to Core ML using a similar recipe, feel free to
open an issue and I'll link your repo here.