---
license: apache-2.0
base_model: cisco-ai/SecureBERT2.0-cross_encoder
tags:
- core-ml
- apple-silicon
- cross-encoder
- cybersecurity
- reranking
- modernbert
language:
- en
pipeline_tag: text-classification
---

# SecureBERT 2.0 Cross-Encoder for Core ML

Core ML conversion of [cisco-ai/SecureBERT2.0-cross_encoder](https://huggingface.co/cisco-ai/SecureBERT2.0-cross_encoder), ready to use on Apple Silicon (macOS / iOS / iPadOS) via the Core ML framework.

The original model is a cybersecurity-specific cross-encoder built on ModernBERT. It takes a pair of texts (query + document) and outputs a similarity score between 0 and 1, suitable for retrieval reranking, semantic search, and cybersecurity intelligence applications.

This repository contains pre-converted `.mlpackage` files plus the conversion script that produced them, allowing direct use in Swift applications without running Python or Ollama at inference time.

## What's in this repository

| File | Size | Purpose |
|---|---|---|
| `SecureBERT2_CrossEncoder_FP16.mlpackage/` | 286 MB | FP16 Core ML model (recommended) |
| `SecureBERT2_CrossEncoder_FP32.mlpackage/` | 572 MB | FP32 Core ML model (reference precision) |
| `convert_via_torch_export.py` | ~6 KB | The conversion script that produced these files |

For most use cases, use the FP16 version. It is half the size, runs on the Apple Neural Engine, and drifts negligibly from PyTorch (max absolute difference ~0.0015).

## Model specification

Both models share the same input/output specification:

| Tensor | Name | Shape | Dtype |
|---|---|---|---|
| Input 1 | `input_ids` | (1, 512) | INT32 |
| Input 2 | `attention_mask` | (1, 512) | INT32 |
| Output | `score` | (1, 1) | FLOAT16 (FP16 model) / FLOAT32 (FP32 model) |

The model expects standard BERT pair tokenization:

```
[CLS] query tokens [SEP] document tokens [SEP] [PAD] [PAD] ...
```

Special token IDs (from the original tokenizer):

| Token | ID |
|---|---|
| `[CLS]` | 50281 |
| `[SEP]` | 50282 |
| `[PAD]` | 50283 |
| `[UNK]` | 50280 |

The output score is already sigmoid-activated (range 0 to 1). The sigmoid was baked into the model graph during conversion, so no post-processing is needed in Swift.

## Quick start (Swift)

Install [huggingface/swift-transformers](https://github.com/huggingface/swift-transformers) for tokenization, then use Core ML directly:

```swift
import CoreML
import Tokenizers

// Load the tokenizer (matches the Python tokenization exactly).
// Use it to produce the token IDs passed to score(inputIds:attentionMask:).
let tokenizer = try await AutoTokenizer.from(
    pretrained: "cisco-ai/SecureBERT2.0-cross_encoder"
)

// Load the model (place the .mlpackage in your bundle; Xcode compiles it to .mlmodelc)
let config = MLModelConfiguration()
config.computeUnits = .all  // Use the Neural Engine when available

guard let modelURL = Bundle.main.url(
    forResource: "SecureBERT2_CrossEncoder_FP16",
    withExtension: "mlmodelc"
) else {
    fatalError("Model not found in bundle")
}
let model = try MLModel(contentsOf: modelURL, configuration: config)

// Thrown when the prediction carries no "score" multi-array.
struct MissingScoreOutput: Error {}

// Score one query/document pair.
// Both arrays must hold exactly 512 values, tokenized as a pair:
// [CLS] query [SEP] document [SEP] [PAD]...
// (Use the tokenizer's pair encoding API, or build manually using
// CLS=50281, SEP=50282, PAD=50283.)
// `attentionMask` holds 1s for content and 0s for padding.
func score(inputIds: [Int], attentionMask: [Int]) throws -> Double {
    precondition(inputIds.count == 512 && attentionMask.count == 512,
                 "The converted model has a fixed sequence length of 512")

    let inputIdsArray = try MLMultiArray(shape: [1, 512], dataType: .int32)
    let attentionMaskArray = try MLMultiArray(shape: [1, 512], dataType: .int32)
    for i in 0..<512 {
        inputIdsArray[i] = NSNumber(value: Int32(inputIds[i]))
        attentionMaskArray[i] = NSNumber(value: Int32(attentionMask[i]))
    }

    let inputs = try MLDictionaryFeatureProvider(dictionary: [
        "input_ids": MLFeatureValue(multiArray: inputIdsArray),
        "attention_mask": MLFeatureValue(multiArray: attentionMaskArray)
    ])

    let prediction = try model.prediction(from: inputs)
    guard let scoreArray = prediction.featureValue(for: "score")?.multiArrayValue else {
        throw MissingScoreOutput()
    }
    return scoreArray[0].doubleValue
}
```

## Verification

Conversion correctness was verified by comparing Core ML output against the original PyTorch model on three test cases:

| Test case | PyTorch | Core ML FP16 | Abs. diff |
|---|---|---|---|
| Highly relevant (vPC config Q + vPC config A) | 0.9948 | 0.9946 | 0.000132 |
| Same domain, different topic | 0.3406 | 0.3420 | 0.001481 |
| Unrelated content | 0.0160 | 0.0158 | 0.000190 |

Max numerical drift: ~0.0015. Ranking order is identical to PyTorch.

Inference benchmarks on M4 Max (36 GB):

- Model load time: ~0.5 seconds
- First inference (warm-up): ~2300 ms
- Subsequent inferences: ~20 ms per query/document pair
- Throughput after warm-up: ~50 pairs/second

The high first-inference latency is a one-time cost of Neural Engine compilation. For interactive applications, perform a warm-up inference at app startup.

## Conversion recipe

The conversion from PyTorch to Core ML is non-trivial for ModernBERT-based models. The standard `torch.jit.trace` path fails on ModernBERT's attention operations due to int-op handling in coremltools 9.0.

The working recipe:

1. Pin dependency versions: `torch==2.7.0`, `transformers==4.52.4`, `sentence-transformers==5.0.0`, `coremltools==9.0`
2. Load the model with `attn_implementation="eager"` to avoid SDPA tracing issues
3. Use `torch.export.export(strict=False)` instead of `torch.jit.trace`
4. Call `exported_program.run_decompositions({})` to convert from TRAINING dialect to ATEN dialect (required by coremltools 9.0)
5. Pass the resulting `ExportedProgram` to `ct.convert()`

See `convert_via_torch_export.py` for the complete script. This recipe should generalize to other ModernBERT-based fine-tunes (DeBERTa-v2 alternatives, ModernBERT classifiers, etc.).

## Limitations

Inherited from the base model:

- English language only
- Trained primarily on cybersecurity content; performance on other domains may vary
- May reflect biases in the training data toward over-represented threats, technologies, or vendors

Specific to this conversion:

- Fixed sequence length of 512 tokens (the original model supports up to 1024; this conversion uses 512 for faster inference and a smaller memory footprint)
- FP16 introduces ~0.0015 numerical drift: unsuitable for tasks that require exact PyTorch-equivalent output, but negligible for ranking tasks
- macOS 14 (Sonoma) or newer required (`minimum_deployment_target=ct.target.macOS14`)

## Citation

If you use this model, please cite the original SecureBERT 2.0 paper:

```bibtex
@article{aghaei2025securebert2,
  title={SecureBERT 2.0: Advanced Language Model for Cybersecurity Intelligence},
  author={Aghaei, Ehsan and others},
  journal={arXiv preprint arXiv:2510.00240},
  year={2025}
}
```

## License

Apache 2.0, matching the license of the original model.
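
## Sketch: building the fixed 512-token pair input

Because this conversion has a fixed sequence length of 512, every query/document pair has to be laid out as `[CLS] query [SEP] document [SEP] [PAD]...` and cut or padded to exactly that length before it reaches the model. The following is a minimal sketch of one way to do that, not code from this repository. `buildPairInput` is a hypothetical helper, it assumes the token IDs it receives contain no special tokens, and its truncation policy (keep as much of the query as fits, give the document the rest) is an assumption; the model card does not prescribe one.

```swift
// Special token IDs listed under "Model specification".
let clsID = 50281, sepID = 50282, padID = 50283

/// Hypothetical helper (not part of this repository).
/// `queryIDs` and `documentIDs` are token IDs WITHOUT special tokens.
/// Assumed truncation policy: the query is kept as far as it fits and the
/// document receives whatever budget remains.
func buildPairInput(queryIDs: [Int], documentIDs: [Int], maxLength: Int = 512)
    -> (inputIDs: [Int], attentionMask: [Int]) {
    precondition(maxLength >= 3, "need room for [CLS] and two [SEP] tokens")

    // Three positions are reserved for [CLS], [SEP], [SEP].
    let budget = maxLength - 3
    let query = Array(queryIDs.prefix(budget))
    let document = Array(documentIDs.prefix(budget - query.count))

    // Content tokens get mask 1.
    var inputIDs = [clsID] + query + [sepID] + document + [sepID]
    var attentionMask = [Int](repeating: 1, count: inputIDs.count)

    // Padding tokens get mask 0.
    let padCount = maxLength - inputIDs.count
    inputIDs += [Int](repeating: padID, count: padCount)
    attentionMask += [Int](repeating: 0, count: padCount)
    return (inputIDs, attentionMask)
}

// Small illustration with made-up token IDs and a short maxLength.
let example = buildPairInput(queryIDs: [10, 11], documentIDs: [20, 21, 22], maxLength: 10)
print(example.inputIDs)       // [50281, 10, 11, 50282, 20, 21, 22, 50282, 50283, 50283]
print(example.attentionMask)  // [1, 1, 1, 1, 1, 1, 1, 1, 0, 0]
```

With the default `maxLength` of 512 the two returned arrays match the `(1, 512)` shape of `input_ids` and `attention_mask`. If long documents matter more than long queries in your application, a different truncation policy may be the better choice.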
## Acknowledgments

- Cisco AI for the original [SecureBERT 2.0](https://github.com/cisco-ai-defense/securebert2) model family
- Apple's [coremltools](https://github.com/apple/coremltools) team for ongoing ModernBERT support
- Hugging Face's [swift-transformers](https://github.com/huggingface/swift-transformers) team for the Swift tokenizer support that makes this practical to use

## Related models

Other SecureBERT 2.0 models from Cisco AI:

- [`cisco-ai/SecureBERT2.0-base`](https://huggingface.co/cisco-ai/SecureBERT2.0-base): Base encoder
- [`cisco-ai/SecureBERT2.0-biencoder`](https://huggingface.co/cisco-ai/SecureBERT2.0-biencoder): Bi-encoder for retrieval
- [`cisco-ai/SecureBERT2.0-NER`](https://huggingface.co/cisco-ai/SecureBERT2.0-NER): Named entity recognition
- [`cisco-ai/SecureBERT2.0-code-vuln-detection`](https://huggingface.co/cisco-ai/SecureBERT2.0-code-vuln-detection): Vulnerability classification

If you convert any of these to Core ML using a similar recipe, feel free to open an issue and I'll link your repo here.
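
## Sketch: reranking a candidate list

Since the model scores one query/document pair at a time, reranking comes down to scoring every candidate against the query and sorting by score. The sketch below illustrates that pattern under stated assumptions and is not part of this repository: `rerank` is a hypothetical helper, and `scorePair` stands in for whatever function wraps the Core ML model in your app. The example call uses a stub scorer with made-up scores so the snippet runs without the model.

```swift
/// Hypothetical helper (not part of this repository).
/// Scores each candidate against the query with `scorePair` and returns the
/// candidates ordered from highest to lowest score, optionally cut to `topK`.
func rerank(query: String,
            candidates: [String],
            topK: Int? = nil,
            scorePair: (String, String) throws -> Double) throws
    -> [(document: String, score: Double)] {
    // One model call per candidate.
    let scored: [(document: String, score: Double)] = try candidates.map { doc in
        (document: doc, score: try scorePair(query, doc))
    }
    // Higher score means more relevant.
    let sorted = scored.sorted { $0.score > $1.score }
    return Array(sorted.prefix(topK ?? sorted.count))
}

// Stub scorer with made-up scores, standing in for the Core ML model.
let stubScores = [
    "vPC configuration guide": 0.99,
    "BGP tuning notes": 0.34,
    "Cafeteria menu": 0.02
]
let ranked = try rerank(query: "How do I configure vPC?",
                        candidates: Array(stubScores.keys)) { _, doc in
    stubScores[doc] ?? 0
}
for item in ranked {
    print(item.document, item.score)
}
```

At roughly 20 ms per pair after warm-up, scoring cost grows linearly with the number of candidates, so it may be worth limiting the candidate list with a cheaper first-stage retriever before reranking.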