	---
	license: apache-2.0
	base_model: cisco-ai/SecureBERT2.0-cross_encoder
	tags:
	- core-ml
	- apple-silicon
	- cross-encoder
	- cybersecurity
	- reranking
	- modernbert
	language:
	- en
	pipeline_tag: text-classification
	---

	# SecureBERT 2.0 Cross-Encoder for Core ML

	Core ML conversion of [cisco-ai/SecureBERT2.0-cross_encoder](https://huggingface.co/cisco-ai/SecureBERT2.0-cross_encoder),
	ready to use on Apple Silicon (macOS / iOS / iPadOS) via the Core ML framework.

	The original model is a cybersecurity domain-specific cross-encoder built on
	ModernBERT. It takes a pair of texts (query + document) and outputs a similarity
	score between 0 and 1, suitable for retrieval reranking, semantic search, and
	cybersecurity intelligence applications.

	This repository contains pre-converted `.mlpackage` files plus the conversion
	script that produced them, allowing direct use in Swift applications without
	running Python or Ollama at inference time.

	## What's in this repository

	| File | Size | Purpose |
	|---|---|---|
	| `SecureBERT2_CrossEncoder_FP16.mlpackage/` | 286 MB | FP16 Core ML model (recommended) |
	| `SecureBERT2_CrossEncoder_FP32.mlpackage/` | 572 MB | FP32 Core ML model (reference precision) |
	| `convert_via_torch_export.py` | ~6 KB | The conversion script that produced these files |

	For most use cases, use the FP16 version. It is half the size, runs on the
	Apple Neural Engine, and shows negligible numerical drift (max diff ~0.0015 vs. PyTorch).

	## Model specification

	Both models share the same input/output specification:

	| Tensor | Name | Shape | Dtype |
	|---|---|---|---|
	| Input 1 | `input_ids` | (1, 512) | INT32 |
	| Input 2 | `attention_mask` | (1, 512) | INT32 |
	| Output | `score` | (1, 1) | FLOAT16 (FP16 model) / FLOAT32 (FP32 model) |

	The model expects standard BERT pair tokenization:

	```
	[CLS] query tokens [SEP] document tokens [SEP] [PAD] [PAD] ...
	```

	Special token IDs (from the original tokenizer):

	| Token | ID |
	|---|---|
	| `[CLS]` | 50281 |
	| `[SEP]` | 50282 |
	| `[PAD]` | 50283 |
	| `[UNK]` | 50280 |
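
The pair layout and the fixed 512-token shape can be sketched in a few lines of Python. This is only an illustration of the layout described above, not code from this repository: `build_pair_inputs` is a hypothetical helper, it assumes the query and document have already been tokenized into IDs without special tokens, and its truncation policy (keep the query, trim the document) is an assumption.

```python
# Special token IDs from the table above.
CLS_ID, SEP_ID, PAD_ID = 50281, 50282, 50283
MAX_LEN = 512  # fixed sequence length of the converted model


def build_pair_inputs(query_ids, doc_ids, max_len=MAX_LEN):
    """Return (input_ids, attention_mask), each a list of length max_len.

    Hypothetical helper: query_ids and doc_ids are token IDs WITHOUT
    special tokens. Truncation policy is an assumption: the query is kept
    first and the document is trimmed to whatever room is left.
    """
    budget = max_len - 3  # reserve slots for [CLS] and two [SEP] tokens
    query = list(query_ids)[:budget]
    doc = list(doc_ids)[:budget - len(query)]

    ids = [CLS_ID] + query + [SEP_ID] + doc + [SEP_ID]
    mask = [1] * len(ids)  # 1 for real tokens

    padding = max_len - len(ids)
    return ids + [PAD_ID] * padding, mask + [0] * padding  # 0 for padding


if __name__ == "__main__":
    ids, mask = build_pair_inputs([10, 11], [20, 21, 22])
    print(ids[:9], sum(mask))
```

The same logic carries over to Swift when filling the two `MLMultiArray` inputs: both arrays always have 512 entries, and the attention mask marks where the real tokens end.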

	The output score is already sigmoid-activated (range 0-1). The sigmoid was baked
	into the model graph during conversion, so no post-processing is needed in Swift.

	## Quick start (Swift)

	Install [huggingface/swift-transformers](https://github.com/huggingface/swift-transformers)
	for tokenization, then use Core ML directly:

	```swift
	import CoreML
	import Tokenizers

	// Load tokenizer (matches Python tokenization exactly)
	let tokenizer = try await AutoTokenizer.from(
	    pretrained: "cisco-ai/SecureBERT2.0-cross_encoder"
	)

	// Load model (place .mlpackage in your bundle, Xcode compiles it to .mlmodelc)
	let config = MLModelConfiguration()
	config.computeUnits = .all // Use Neural Engine when available

	guard let modelURL = Bundle.main.url(
	    forResource: "SecureBERT2_CrossEncoder_FP16",
	    withExtension: "mlmodelc"
	) else { fatalError("Model not found in bundle") }

	let model = try MLModel(contentsOf: modelURL, configuration: config)

	// Score a query/document pair
	func score(query: String, document: String) throws -> Double {
	    // Tokenize as pair: [CLS] query [SEP] document [SEP] [PAD]...
	    // (Use tokenizer's pair encoding API, or build manually using
	    // CLS=50281, SEP=50282, PAD=50283)
	    let inputIds: [Int] = /* your tokenization here, length 512 */
	    let attentionMask: [Int] = /* 1s for content, 0s for padding */

	    let inputIdsArray = try MLMultiArray(shape: [1, 512], dataType: .int32)
	    let attentionMaskArray = try MLMultiArray(shape: [1, 512], dataType: .int32)

	    for i in 0..<512 {
	        inputIdsArray[i] = NSNumber(value: inputIds[i])
	        attentionMaskArray[i] = NSNumber(value: attentionMask[i])
	    }

	    let inputs = try MLDictionaryFeatureProvider(dictionary: [
	        "input_ids": MLFeatureValue(multiArray: inputIdsArray),
	        "attention_mask": MLFeatureValue(multiArray: attentionMaskArray)
	    ])

	    let prediction = try model.prediction(from: inputs)
	    let scoreArray = prediction.featureValue(for: "score")!.multiArrayValue!
	    return scoreArray[0].doubleValue
	}
	```

	## Verification

	Conversion correctness was verified by comparing Core ML output against the
	original PyTorch model on three test cases:

	| Test case | PyTorch | Core ML FP16 | Diff |
	|---|---|---|---|
	| Highly relevant (vPC config Q + vPC config A) | 0.9948 | 0.9946 | 0.000132 |
	| Same domain, different topic | 0.3406 | 0.3420 | 0.001481 |
	| Unrelated content | 0.0160 | 0.0158 | 0.000190 |

	Max numerical drift: ~0.0015. Ranking order is identical to PyTorch.
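
The two checks reported above (maximum drift and identical ranking order) can be reproduced from the table with a short Python sketch. The scores are copied from the table as rounded four-decimal values, so the computed maximum difference is about 0.0014 rather than the 0.001481 measured on the unrounded outputs. The function names and the 0.002 tolerance are illustrative assumptions, not part of the conversion script.

```python
# (label, PyTorch score, Core ML FP16 score), copied from the table above.
CASES = [
    ("highly relevant", 0.9948, 0.9946),
    ("same domain, different topic", 0.3406, 0.3420),
    ("unrelated content", 0.0160, 0.0158),
]

TOLERANCE = 0.002  # assumed acceptance threshold, not from the source


def max_abs_diff(cases):
    """Largest absolute score difference between the two backends."""
    return max(abs(pytorch - coreml) for _, pytorch, coreml in cases)


def ranking(cases, column):
    """Labels sorted by the score in the given column, best first."""
    ordered = sorted(cases, key=lambda case: case[column], reverse=True)
    return [case[0] for case in ordered]


if __name__ == "__main__":
    print("max diff:", max_abs_diff(CASES))
    print("within tolerance:", max_abs_diff(CASES) < TOLERANCE)
    print("same ranking:", ranking(CASES, 1) == ranking(CASES, 2))
```

For a reranker, the ranking comparison is the more meaningful of the two checks, since small absolute differences only matter if they swap the order of candidates.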

	Inference benchmarks on M4 Max (36 GB):

	- Model load time: ~0.5 seconds
	- First inference (warm-up): ~2300 ms
	- Subsequent inferences: ~20 ms per query/document pair
	- Throughput after warm-up: ~50 pairs/second

	The high first-inference latency is a one-time cost from Neural Engine compilation.
	For interactive applications, perform a warm-up inference at app startup.
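
To see what these figures mean for a reranking workload, here is a small back-of-the-envelope calculation in Python. It assumes pairs are scored one at a time (the model takes a single `(1, 512)` input per call) and that the ~20 ms and ~2300 ms figures above hold on your hardware. The function names are illustrative.

```python
WARMUP_MS = 2300.0  # first inference, from the benchmark above
STEADY_MS = 20.0    # each subsequent inference, from the benchmark above


def throughput_pairs_per_second(steady_ms=STEADY_MS):
    """Pairs scored per second once the model is warm."""
    return 1000.0 / steady_ms


def rerank_latency_ms(num_candidates, warmed_up=True):
    """Estimated time to score num_candidates documents against one query.

    Assumes sequential scoring, one query/document pair per prediction.
    """
    if num_candidates <= 0:
        return 0.0
    first = STEADY_MS if warmed_up else WARMUP_MS
    return first + (num_candidates - 1) * STEADY_MS


if __name__ == "__main__":
    print(throughput_pairs_per_second())              # 50.0
    print(rerank_latency_ms(100))                     # 2000.0
    print(rerank_latency_ms(100, warmed_up=False))    # 4280.0
```

Under these assumptions, reranking 100 candidates takes about 2 seconds when warm, and a cold start roughly doubles that for the first query, which is why the warm-up call at startup is worthwhile.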

	## Conversion recipe

	The conversion from PyTorch to Core ML is non-trivial for ModernBERT-based
	models. The standard `torch.jit.trace` path fails on ModernBERT's attention
	operations due to int-op handling in coremltools 9.0.

	The working recipe:

	1. Pin dependency versions: `torch==2.7.0`, `transformers==4.52.4`,
	`sentence-transformers==5.0.0`, `coremltools==9.0`
	2. Load model with `attn_implementation="eager"` to avoid SDPA tracing issues
	3. Use `torch.export.export(strict=False)` instead of `torch.jit.trace`
	4. Call `exported_program.run_decompositions({})` to convert from TRAINING
	dialect to ATEN dialect (required by coremltools 9.0)
	5. Pass the resulting `ExportedProgram` to `ct.convert()`

	See `convert_via_torch_export.py` for the complete script. This recipe should
	generalize to other ModernBERT-based fine-tunes (ModernBERT classifiers,
	ModernBERT models used in place of DeBERTa-v2, etc.).
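
Because the recipe depends on exact versions (step 1), it can help to check the environment before running the conversion. The following Python sketch uses only the standard library. The helper names are hypothetical and not part of `convert_via_torch_export.py`, and ignoring local build suffixes such as `+cpu` is an assumption.

```python
from importlib import metadata

# Version pins from step 1 of the recipe above.
PINNED = {
    "torch": "2.7.0",
    "transformers": "4.52.4",
    "sentence-transformers": "5.0.0",
    "coremltools": "9.0",
}


def installed_version(package):
    """Return the installed version string, or None if not installed."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


def find_mismatches(pinned=PINNED, lookup=installed_version):
    """Map each mismatched package to (wanted, found)."""
    mismatches = {}
    for package, wanted in pinned.items():
        found = lookup(package)
        # Assumption: a local build suffix like "+cpu" does not matter here.
        if found is None or found.split("+")[0] != wanted:
            mismatches[package] = (wanted, found)
    return mismatches


if __name__ == "__main__":
    for package, (wanted, found) in find_mismatches().items():
        print(f"{package}: want {wanted}, found {found}")
```

An empty result means the environment matches the pins. Any reported mismatch is worth fixing before debugging conversion errors.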

	## Limitations

	Inherited from the base model:

	- English language only
	- Trained primarily on cybersecurity content; performance on other domains
	may vary
	- May reflect biases in the training data toward over-represented threats,
	technologies, or vendors

	Specific to this conversion:

	- Fixed sequence length of 512 tokens (the original model supports up to 1024;
	this conversion uses 512 for faster inference and smaller memory footprint)
	- FP16 introduces ~0.0015 numerical drift; unsuitable for tasks requiring
	exact PyTorch-equivalent output, but negligible for ranking tasks
	- macOS 14 (Sonoma) or newer required (`minimum_deployment_target=ct.target.macOS14`)

	## Citation

	If you use this model, please cite the original SecureBERT 2.0 paper:

	```bibtex
	@article{aghaei2025securebert2,
	title={SecureBERT 2.0: Advanced Language Model for Cybersecurity Intelligence},
	author={Aghaei, Ehsan and others},
	journal={arXiv preprint arXiv:2510.00240},
	year={2025}
	}
	```

	## License

	Apache 2.0, matching the license of the original model.

	## Acknowledgments

	- Cisco AI for the original [SecureBERT 2.0](https://github.com/cisco-ai-defense/securebert2)
	model family
	- Apple's [coremltools](https://github.com/apple/coremltools) team for ongoing
	ModernBERT support
	- Hugging Face's [swift-transformers](https://github.com/huggingface/swift-transformers)
	team for the Swift tokenizer support that makes this practical to use

	## Related models

	Other SecureBERT 2.0 models from Cisco AI:

	- [`cisco-ai/SecureBERT2.0-base`](https://huggingface.co/cisco-ai/SecureBERT2.0-base) — Base encoder
	- [`cisco-ai/SecureBERT2.0-biencoder`](https://huggingface.co/cisco-ai/SecureBERT2.0-biencoder) — Bi-encoder for retrieval
	- [`cisco-ai/SecureBERT2.0-NER`](https://huggingface.co/cisco-ai/SecureBERT2.0-NER) — Named entity recognition
	- [`cisco-ai/SecureBERT2.0-code-vuln-detection`](https://huggingface.co/cisco-ai/SecureBERT2.0-code-vuln-detection) — Vulnerability classification

	If you convert any of these to Core ML using a similar recipe, feel free to
	open an issue and I'll link your repo here.