---
license: mit
tags:
- coreml
- sentence-transformers
- embedding
- code
- roberta
base_model: microsoft/codebert-base
library_name: coremltools
pipeline_tag: feature-extraction
---

# codebert-base — CoreML (.mlpackage)

CoreML conversion of [microsoft/codebert-base](https://huggingface.co/microsoft/codebert-base) for native Apple Neural Engine / GPU inference on macOS and iOS.

## Files

| File | Description |
|------|-------------|
| `model.mlpackage/` | CoreML model (FP16, flexible shapes) |
| `tokenizer.json` | HF fast tokenizer |

## Details

- **Architecture**: RoBERTa (encoder-only, no token_type_ids)
- **Precision**: FP16 (native ANE precision)
- **Compute units**: `.all` (CoreML schedules across ANE, GPU, and CPU)
- **Input shapes**: batch=1..512, seq_len=1..512 (flexible range)
- **Embedding dimension**: 768

## Usage with cai

```bash
cai index --embed-backend swift --embed-model "rsvalerio/codebert-base-coreml"
```

The Swift backend downloads the `.mlpackage` from this repo, compiles it to `.mlmodelc` on first run (~30-60s), and caches the compiled model for subsequent runs.

## Conversion

Converted using [rsvalerio/models](https://github.com/rsvalerio/models) CI pipeline with `coremltools`.

```bash
pip install coremltools transformers torch
python convert.py
```
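
## Preparing inputs (sketch)

The input constraints listed under Details (batch and seq_len each in 1..512, no token_type_ids) can be illustrated with a small batching helper. This is a minimal sketch, not the code used by cai or by the conversion pipeline: the function name `pad_batch`, the dictionary keys `input_ids` and `attention_mask`, and the pad id of 1 (the usual RoBERTa `<pad>` id) are assumptions, so check them against `tokenizer.json` and the model's actual input names before relying on them.

```python
from typing import Dict, List, Sequence

MAX_BATCH = 512    # upper bound of the flexible batch range
MAX_SEQ_LEN = 512  # upper bound of the flexible seq_len range
PAD_TOKEN_ID = 1   # assumption: RoBERTa's <pad> id; confirm in tokenizer.json


def pad_batch(
    token_ids: Sequence[Sequence[int]],
    pad_id: int = PAD_TOKEN_ID,
    max_seq_len: int = MAX_SEQ_LEN,
) -> Dict[str, List[List[int]]]:
    """Truncate and right-pad already-tokenized sequences to one common length.

    Returns only input_ids and attention_mask; RoBERTa-style models take no
    token_type_ids. The key names are hypothetical.
    """
    if not 1 <= len(token_ids) <= MAX_BATCH:
        raise ValueError(f"batch size must be in 1..{MAX_BATCH}")

    # Naive truncation: a real tokenizer would also keep the closing </s> token.
    truncated = [list(ids[:max_seq_len]) for ids in token_ids]
    if any(len(ids) == 0 for ids in truncated):
        raise ValueError("every sequence needs at least one token")

    # Pad to the longest sequence in the batch, not to the 512 maximum,
    # so short batches stay small within the flexible shape range.
    width = max(len(ids) for ids in truncated)
    input_ids = [ids + [pad_id] * (width - len(ids)) for ids in truncated]
    attention_mask = [
        [1] * len(ids) + [0] * (width - len(ids)) for ids in truncated
    ]
    return {"input_ids": input_ids, "attention_mask": attention_mask}


if __name__ == "__main__":
    batch = pad_batch([[0, 10, 2], [0, 10, 11, 12, 2]])
    print(batch["input_ids"])
    print(batch["attention_mask"])
```

Padding to the longest sequence in the batch rather than to 512 is a design choice that takes advantage of the flexible shapes; a fixed-shape model would need every batch padded to its full length.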