fullstop-punctuation-coreml-fp16

oliverguhr/fullstop-punctuation-multilang-large (XLM-RoBERTa-large punctuation restoration, trained on Europarl speech transcripts) converted to Core ML for on-device use.

Contents

  • punctuation.mlmodelc/: compiled Core ML model, fp16 activations, fp16 weights (~1.0 GB)
  • sentencepiece.bpe.model: the XLM-RoBERTa SentencePiece tokenizer model

Model details

  • Inputs: input_ids [1, 256] int32 and attention_mask [1, 256] int32; token ids follow the Hugging Face XLM-RoBERTa scheme (<s> = 0, pad = 1, </s> = 2)
  • Outputs: label_preds [1, 256] int32 (argmax label per subtoken), label_logits [1, 256, 6] (raw scores)
  • Labels: 0 (no mark), . (period), , (comma), ? (question mark), - (hyphen), : (colon); each label is the mark to append after the word that ends at that subtoken
  • Conversion verified by exact-label parity with the PyTorch fp32 reference
  • ~54 ms per 256-token window on an Apple Silicon GPU; loads in ~0.3–3 s (macOS caches the compile after the first launch, so later loads are faster)
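The input and output conventions listed above can be sketched in Python. This is a minimal sketch, not the Babble app's code. It assumes the standard Hugging Face XLM-RoBERTa id mapping (SentencePiece <unk>, id 0, becomes 3, and every other SentencePiece id is shifted by +1). It also assumes that the label order listed above matches label ids 0 to 5 and that each word's mark is read from its last subtoken. The helper names `to_hf_ids`, `build_window` and `punctuate` are hypothetical, and the sketch handles only text that fits in a single 256-token window.

```python
from typing import List, Sequence, Tuple

WINDOW = 256
BOS, PAD, EOS, UNK = 0, 1, 2, 3               # HF XLM-RoBERTa special-token ids
LABEL_MARKS = ["", ".", ",", "?", "-", ":"]   # assumed order of label ids 0-5
WORD_START = "\u2581"                         # SentencePiece word-boundary marker


def to_hf_ids(spm_ids: Sequence[int]) -> List[int]:
    """Shift raw SentencePiece ids into the HF XLM-RoBERTa id space."""
    # SentencePiece id 0 is <unk>, which HF maps to 3; all other ids move up by 1.
    return [UNK if i == 0 else i + 1 for i in spm_ids]


def build_window(spm_ids: Sequence[int]) -> Tuple[List[int], List[int]]:
    """Wrap one text's ids in <s> ... </s> and pad to the fixed model length."""
    body = to_hf_ids(spm_ids)
    if len(body) + 2 > WINDOW:
        raise ValueError("text does not fit in one window; split it first")
    ids = [BOS] + body + [EOS]
    mask = [1] * len(ids)
    padding = WINDOW - len(ids)
    return ids + [PAD] * padding, mask + [0] * padding


def punctuate(pieces: Sequence[str], label_preds: Sequence[int]) -> str:
    """Append each word's predicted mark, taken from the word's last subtoken.

    `pieces` are the SentencePiece pieces of the text (without <s>/</s>);
    `label_preds` holds the model's labels for the same positions, i.e.
    output positions 1..n of the window.
    """
    out = []
    for i, piece in enumerate(pieces):
        out.append(piece.replace(WORD_START, " "))
        # A word ends here if this is the last piece or the next piece starts a new word.
        if i + 1 == len(pieces) or pieces[i + 1].startswith(WORD_START):
            out.append(LABEL_MARKS[label_preds[i]])
    return "".join(out).strip()
```

In use, the ids would presumably come from `sentencepiece.SentencePieceProcessor(model_file="sentencepiece.bpe.model").encode(text)` and the pieces from the same call with `out_type=str`. The labels to pass to `punctuate` would then be `label_preds[0][1 : n + 1]`, where n is the number of text tokens.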

Plain fp16 was chosen deliberately: int8 per-block weight compression halves the size but its inline dequantize ops cost ~12 s of uncached GPU shader compilation on every process launch.
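Any alternative precision or compression setting would need to pass the same exact-label parity check described under Model details. A minimal sketch of that comparison follows. It assumes `label_preds` have already been computed for the same padded windows by the fp32 PyTorch reference and by the candidate Core ML build. `label_parity` is a hypothetical helper, and it compares only positions where attention_mask is 1, because predictions on padding carry no meaning.

```python
from typing import List, Sequence


def label_parity(
    reference: Sequence[Sequence[int]],
    candidate: Sequence[Sequence[int]],
    masks: Sequence[Sequence[int]],
) -> List[int]:
    """Return the indices of windows whose predicted labels differ on real tokens."""
    if not (len(reference) == len(candidate) == len(masks)):
        raise ValueError("reference, candidate and masks must cover the same windows")
    mismatched = []
    for w, (ref, cand, mask) in enumerate(zip(reference, candidate, masks)):
        # Compare only attended positions; padded positions are ignored.
        if any(m and r != c for r, c, m in zip(ref, cand, mask)):
            mismatched.append(w)
    return mismatched
```

An empty result means every window matches the reference label for label.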

Converted for the Babble dictation app.

License

MIT, following the original model.

Repository: soloish90/fullstop-punctuation-coreml-fp16