---
license: apache-2.0
base_model: cisco-ai/SecureBERT2.0-cross_encoder
tags:
  - core-ml
  - apple-silicon
  - cross-encoder
  - cybersecurity
  - reranking
  - modernbert
language:
  - en
pipeline_tag: text-classification
---

# SecureBERT 2.0 Cross-Encoder for Core ML

Core ML conversion of [cisco-ai/SecureBERT2.0-cross_encoder](https://huggingface.co/cisco-ai/SecureBERT2.0-cross_encoder),
ready to use on Apple Silicon (macOS / iOS / iPadOS) via the Core ML framework.

The original model is a cybersecurity domain-specific cross-encoder built on
ModernBERT. It takes a pair of texts (query + document) and outputs a similarity
score between 0 and 1, suitable for retrieval reranking, semantic search, and
cybersecurity intelligence applications.

This repository contains pre-converted `.mlpackage` files plus the conversion
script that produced them, allowing direct use in Swift applications without
running Python or Ollama at inference time.

## What's in this repository

| File | Size | Purpose |
|---|---|---|
| `SecureBERT2_CrossEncoder_FP16.mlpackage/` | 286 MB | FP16 Core ML model (recommended) |
| `SecureBERT2_CrossEncoder_FP32.mlpackage/` | 572 MB | FP32 Core ML model (reference precision) |
| `convert_via_torch_export.py` | ~6 KB | The conversion script that produced these files |

For most use cases, use the FP16 version. It is half the size, and its scores
stayed within about 0.0015 of the PyTorch reference in the verification tests
below, with no change in ranking order.

## Model specification

Both models share the same input/output specification:

| Tensor | Name | Shape | Dtype |
|---|---|---|---|
| Input 1 | `input_ids` | (1, 512) | INT32 |
| Input 2 | `attention_mask` | (1, 512) | INT32 |
| Output | `score` | (1, 1) | FLOAT16 (FP16 model) / FLOAT32 (FP32 model) |

The model expects standard BERT pair tokenization:

```
[CLS] query tokens [SEP] document tokens [SEP] [PAD] [PAD] ...
```

Special token IDs (from the original tokenizer):

| Token | ID |
|---|---|
| `[CLS]` | 50281 |
| `[SEP]` | 50282 |
| `[PAD]` | 50283 |
| `[UNK]` | 50280 |
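
The layout and IDs above can be assembled by hand once the query and document have been tokenized without special tokens. The following is a minimal sketch, not part of this repository: `PairInput` and `buildPairInput` are hypothetical names, and the truncation policy (keep the query, trim the document tail) is an assumption, since the original tokenizer's truncation settings may differ.

```swift
// Hypothetical helper: lays out pre-tokenized IDs as
// [CLS] query [SEP] document [SEP] [PAD]... for the fixed-length model input.
struct PairInput {
    let inputIds: [Int]
    let attentionMask: [Int]
}

// Special token IDs from the table above.
let clsId = 50281
let sepId = 50282
let padId = 50283

func buildPairInput(queryIds: [Int], documentIds: [Int], maxLength: Int = 512) -> PairInput {
    precondition(maxLength >= 3, "Need room for [CLS] and two [SEP] tokens")

    // Assumed truncation policy: the query is kept first, the document gets what is left.
    let budget = maxLength - 3
    let query = Array(queryIds.prefix(budget))
    let document = Array(documentIds.prefix(budget - query.count))

    var ids = [clsId]
    ids.append(contentsOf: query)
    ids.append(sepId)
    ids.append(contentsOf: document)
    ids.append(sepId)

    let contentCount = ids.count
    let padCount = maxLength - contentCount
    ids.append(contentsOf: [Int](repeating: padId, count: padCount))

    // Mask is 1 over real tokens and 0 over padding.
    let mask = [Int](repeating: 1, count: contentCount) + [Int](repeating: 0, count: padCount)
    return PairInput(inputIds: ids, attentionMask: mask)
}

// Example with made-up token IDs and a short maxLength for readability.
let example = buildPairInput(queryIds: [10, 11], documentIds: [20], maxLength: 8)
print(example.inputIds)      // [50281, 10, 11, 50282, 20, 50282, 50283, 50283]
print(example.attentionMask) // [1, 1, 1, 1, 1, 1, 0, 0]
```

With the default `maxLength` of 512, the two arrays match the `(1, 512)` input shapes listed in the specification table.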

The output score is already sigmoid-activated (range 0-1). The sigmoid was baked
into the model graph during conversion, so no post-processing is needed in Swift.

## Quick start (Swift)

Install [huggingface/swift-transformers](https://github.com/huggingface/swift-transformers)
for tokenization, then use Core ML directly:

```swift
import CoreML
import Tokenizers

// Load tokenizer (matches Python tokenization exactly)
let tokenizer = try await AutoTokenizer.from(
    pretrained: "cisco-ai/SecureBERT2.0-cross_encoder"
)

// Load model (place .mlpackage in your bundle, Xcode compiles it to .mlmodelc)
let config = MLModelConfiguration()
config.computeUnits = .all  // Use Neural Engine when available

guard let modelURL = Bundle.main.url(
    forResource: "SecureBERT2_CrossEncoder_FP16",
    withExtension: "mlmodelc"
) else { fatalError("Model not found in bundle") }

let model = try MLModel(contentsOf: modelURL, configuration: config)

// Score one pre-tokenized query/document pair.
// `inputIds` and `attentionMask` must each hold exactly 512 values, laid out as
// [CLS] query [SEP] document [SEP] [PAD]... (CLS=50281, SEP=50282, PAD=50283),
// with mask 1 for content tokens and 0 for padding. Produce them with the
// tokenizer's pair encoding API, or build them manually from the IDs above.
func score(inputIds: [Int], attentionMask: [Int]) throws -> Double {
    precondition(inputIds.count == 512 && attentionMask.count == 512,
                 "Both inputs must be padded to 512 tokens")

    let inputIdsArray = try MLMultiArray(shape: [1, 512], dataType: .int32)
    let attentionMaskArray = try MLMultiArray(shape: [1, 512], dataType: .int32)

    // Copy the Swift arrays into the Core ML tensors (linear indexing).
    for i in 0..<512 {
        inputIdsArray[i] = NSNumber(value: inputIds[i])
        attentionMaskArray[i] = NSNumber(value: attentionMask[i])
    }

    let inputs = try MLDictionaryFeatureProvider(dictionary: [
        "input_ids": MLFeatureValue(multiArray: inputIdsArray),
        "attention_mask": MLFeatureValue(multiArray: attentionMaskArray)
    ])

    let prediction = try model.prediction(from: inputs)
    guard let scoreArray = prediction.featureValue(for: "score")?.multiArrayValue else {
        fatalError("Model output 'score' is missing")
    }
    // Already sigmoid-activated, so this is the final 0-1 relevance score.
    return scoreArray[0].doubleValue
}
```
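
For reranking, the usual pattern is to score every candidate against the same query and sort by descending score. A minimal sketch follows; `rerank` is a hypothetical helper that accepts any scoring closure, so it can wrap a Core ML scoring function like the one above or, as in the usage lines here, a stand-in scorer with made-up values.

```swift
// Hypothetical helper: scores each candidate against the query and returns
// the candidates ordered from most to least relevant.
func rerank(
    query: String,
    documents: [String],
    scorer: (String, String) throws -> Double
) rethrows -> [(document: String, score: Double)] {
    let scored = try documents.map { document in
        (document: document, score: try scorer(query, document))
    }
    return scored.sorted { $0.score > $1.score }
}

// Stand-in scorer with made-up scores; replace with a call into the Core ML model.
let fakeScores = ["vPC peer-link setup": 0.97, "BGP timers": 0.31, "Office lunch menu": 0.02]
let ranked = rerank(
    query: "How do I configure vPC?",
    documents: ["BGP timers", "Office lunch menu", "vPC peer-link setup"],
    scorer: { _, document in fakeScores[document] ?? 0.0 }
)
for item in ranked {
    print(item.score, item.document) // most relevant first
}
```

Passing the scorer as a closure keeps the sorting logic independent of Core ML, which makes it easy to exercise without a model loaded.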

## Verification

Conversion correctness was verified by comparing Core ML output against the
original PyTorch model on three test cases:

| Test case | PyTorch | Core ML FP16 | Diff |
|---|---|---|---|
| Highly relevant (vPC config Q + vPC config A) | 0.9948 | 0.9946 | 0.000132 |
| Same domain, different topic | 0.3406 | 0.3420 | 0.001481 |
| Unrelated content | 0.0160 | 0.0158 | 0.000190 |

Max numerical drift: ~0.0015. Ranking order is identical to PyTorch.
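
The drift and ranking checks can be reproduced from the table with a few lines of arithmetic. This sketch uses the rounded scores shown above, so its per-case differences differ slightly from the Diff column, which was presumably computed from unrounded outputs (an assumption). `maxAbsDiff` and `ranking` are hypothetical helper names.

```swift
// Scores copied from the verification table (rounded to four decimals).
let pytorchScores: [Double] = [0.9948, 0.3406, 0.0160]
let coreMLScores: [Double] = [0.9946, 0.3420, 0.0158]

// Largest absolute per-case difference between the two score lists.
func maxAbsDiff(_ a: [Double], _ b: [Double]) -> Double {
    precondition(a.count == b.count, "Score lists must have the same length")
    return zip(a, b).map { abs($0 - $1) }.max() ?? 0
}

// Indices of the test cases ordered from highest to lowest score.
func ranking(_ scores: [Double]) -> [Int] {
    scores.indices.sorted { scores[$0] > scores[$1] }
}

let drift = maxAbsDiff(pytorchScores, coreMLScores)
let sameOrder = ranking(pytorchScores) == ranking(coreMLScores)
print("max drift:", drift)
print("ranking preserved:", sameOrder)
```

The same two checks are a reasonable acceptance test when re-running the conversion on your own query/document pairs.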

Inference benchmarks on M4 Max (36 GB):

- Model load time: ~0.5 seconds
- First inference (warm-up): ~2300 ms
- Subsequent inferences: ~20 ms per query/document pair
- Throughput after warm-up: ~50 pairs/second

The high first-inference latency is a one-time cost from Neural Engine compilation.
For interactive applications, perform a warm-up inference at app startup.
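
The steady-state figures translate directly into a candidate budget for reranking. As a rough sketch using the ~20 ms per pair measured above (the figure will differ on other hardware, and `estimatedLatencyMs` and `maxCandidates` are hypothetical helpers):

```swift
// Approximate warm per-pair latency from the M4 Max benchmark above.
let perPairMs = 20.0

// Estimated time to score `pairs` query/document pairs sequentially after warm-up.
func estimatedLatencyMs(pairs: Int, perPairMs: Double) -> Double {
    Double(pairs) * perPairMs
}

// Largest number of candidates that fits within a latency budget.
func maxCandidates(budgetMs: Double, perPairMs: Double) -> Int {
    precondition(perPairMs > 0, "Per-pair latency must be positive")
    return Int((budgetMs / perPairMs).rounded(.down))
}

print(estimatedLatencyMs(pairs: 50, perPairMs: perPairMs)) // 1000.0, i.e. ~50 pairs/second
print(maxCandidates(budgetMs: 200, perPairMs: perPairMs))  // 10
```

These estimates exclude the one-time warm-up cost, so they only hold once the first inference has completed.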

## Conversion recipe

The conversion from PyTorch to Core ML is non-trivial for ModernBERT-based
models. The standard `torch.jit.trace` path fails on ModernBERT's attention
operations due to int-op handling in coremltools 9.0.

The working recipe:

1. Pin dependency versions: `torch==2.7.0`, `transformers==4.52.4`,
   `sentence-transformers==5.0.0`, `coremltools==9.0`
2. Load model with `attn_implementation="eager"` to avoid SDPA tracing issues
3. Use `torch.export.export(strict=False)` instead of `torch.jit.trace`
4. Call `exported_program.run_decompositions({})` to convert from TRAINING
   dialect to ATEN dialect (required by coremltools 9.0)
5. Pass the resulting `ExportedProgram` to `ct.convert()`

See `convert_via_torch_export.py` for the complete script. This recipe should
also apply to other ModernBERT-based fine-tunes (for example, ModernBERT
classifiers used in place of DeBERTa-v2 models).

## Limitations

Inherited from the base model:

- English language only
- Trained primarily on cybersecurity content; performance on other domains
  may vary
- May reflect biases in the training data toward over-represented threats,
  technologies, or vendors

Specific to this conversion:

- Fixed sequence length of 512 tokens (the original model supports up to 1024;
  this conversion uses 512 for faster inference and smaller memory footprint)
- FP16 introduces ~0.0015 numerical drift; unsuitable for tasks requiring
  exact PyTorch-equivalent output, but it did not change the ranking order in
  the verification cases
- macOS 14 (Sonoma) or newer required (`minimum_deployment_target=ct.target.macOS14`)

## Citation

If you use this model, please cite the original SecureBERT 2.0 paper:

```bibtex
@article{aghaei2025securebert2,
  title={SecureBERT 2.0: Advanced Language Model for Cybersecurity Intelligence},
  author={Aghaei, Ehsan and others},
  journal={arXiv preprint arXiv:2510.00240},
  year={2025}
}
```

## License

Apache 2.0, matching the license of the original model.

## Acknowledgments

- Cisco AI for the original [SecureBERT 2.0](https://github.com/cisco-ai-defense/securebert2)
  model family
- Apple's [coremltools](https://github.com/apple/coremltools) team for ongoing
  ModernBERT support
- Hugging Face's [swift-transformers](https://github.com/huggingface/swift-transformers)
  team for the Swift tokenizer support that makes this practical to use

## Related models

Other SecureBERT 2.0 models from Cisco AI:

- [`cisco-ai/SecureBERT2.0-base`](https://huggingface.co/cisco-ai/SecureBERT2.0-base) — Base encoder
- [`cisco-ai/SecureBERT2.0-biencoder`](https://huggingface.co/cisco-ai/SecureBERT2.0-biencoder) — Bi-encoder for retrieval
- [`cisco-ai/SecureBERT2.0-NER`](https://huggingface.co/cisco-ai/SecureBERT2.0-NER) — Named entity recognition
- [`cisco-ai/SecureBERT2.0-code-vuln-detection`](https://huggingface.co/cisco-ai/SecureBERT2.0-code-vuln-detection) — Vulnerability classification

If you convert any of these to Core ML using a similar recipe, feel free to
open an issue and I'll link your repo here.