fullstop-punctuation-coreml-fp16
oliverguhr/fullstop-punctuation-multilang-large (XLM-RoBERTa-large punctuation restoration, trained on Europarl speech transcripts) converted to Core ML for on-device use.
Contents
- punctuation.mlmodelc/: compiled Core ML model, fp16 activations, fp16 weights (~1.0 GB)
- sentencepiece.bpe.model: the XLM-RoBERTa SentencePiece tokenizer model
Model details
- Inputs: input_ids [1, 256] int32, attention_mask [1, 256] int32 (HF XLM-RoBERTa id scheme, pad=1)
- Outputs: label_preds [1, 256] int32 (argmax label per subtoken), label_logits [1, 256, 6] (raw scores)
- Labels: "0" (none), ".", ",", "?", "-", ":" (the mark to append after the word ending at each subtoken)
- Conversion verified by exact-label parity with the PyTorch fp32 reference
- ~54 ms per 256-token window on Apple Silicon GPU; loads in ~0.3 to 3 s (the compile is cached by macOS after the first launch)
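
The input and output conventions above can be sketched in a few lines of Python. This is a minimal illustration, not the app's actual code: the helper names `pad_window` and `restore` are hypothetical, and the class indices 0 to 5 are assumed to follow the order in which the labels are listed. It also assumes the caller has already tokenized the text with sentencepiece.bpe.model, run the Core ML model, and dropped the positions belonging to special tokens and padding, so that pieces and labels line up one to one.

```python
from typing import List, Tuple

WINDOW = 256  # fixed sequence length stated in the model details
PAD_ID = 1    # pad id in the HF XLM-RoBERTa id scheme
# Assumption: class indices follow the order the labels are listed in.
LABELS = ["0", ".", ",", "?", "-", ":"]
WORD_START = "\u2581"  # SentencePiece marker for the first piece of a word


def pad_window(ids: List[int]) -> Tuple[List[int], List[int]]:
    """Pad one window of token ids to WINDOW; return (input_ids, attention_mask)."""
    if len(ids) > WINDOW:
        raise ValueError(f"window has {len(ids)} ids, max is {WINDOW}")
    n_pad = WINDOW - len(ids)
    return ids + [PAD_ID] * n_pad, [1] * len(ids) + [0] * n_pad


def restore(pieces: List[str], label_preds: List[int]) -> str:
    """Rebuild text from SentencePiece pieces and one predicted label per piece."""
    words: List[str] = []
    current, mark = "", ""
    for piece, label in zip(pieces, label_preds):
        # A new word starts here, so flush the previous one with its mark.
        if piece.startswith(WORD_START) and current:
            words.append(current + mark)
            current, mark = "", ""
        current += piece.lstrip(WORD_START)
        # The label on the last subtoken of a word decides the mark.
        punct = LABELS[label]
        mark = "" if punct == "0" else punct
    if current:
        words.append(current + mark)
    return " ".join(words)
```

For example, `restore(["\u2581hello", "\u2581wor", "ld", "\u2581how", "\u2581are", "\u2581you"], [2, 0, 1, 0, 0, 3])` returns `"hello, world. how are you?"`. Because each word keeps only the label of its final subtoken, this matches the "word ending at each subtoken" convention described in the list.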
Plain fp16 was chosen deliberately: int8 per-block weight compression halves the size, but its inline dequantize ops cost ~12 s of uncached GPU shader compilation on every process launch.
Converted for the Babble dictation app.
License
MIT, following the original model.