Cohere Transcribe — CoreML FP16

Corrected CoreML FP16 release of CohereLabs/cohere-transcribe-03-2026 for Apple Silicon inference.

This release uses:

  • a length-aware encoder with inputs mel and length
  • decoder prefill and decode models with explicit encoder_mask inputs
  • verified token IDs: pad=2, eos=3, bos=4

The March 28 encoder fix preserves the real mel length at inference time, which avoids the padding-related repetition and hallucination issues seen with the older static encoder export.
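To use the length-aware encoder correctly, downstream code has to turn the real mel length into the `encoder_mask` the decoder models expect. The sketch below is an assumption-laden illustration: the mask convention (1.0 = valid frame, 0.0 = padding) and the mel-to-encoder downsampling factor of 8 are inferred from the I/O shapes (3500 mel frames → 438 encoder frames), not documented by the release.

```python
import math
import numpy as np

MEL_FRAMES = 3500   # padded mel length (encoder input shape [1, 128, 3500])
ENC_FRAMES = 438    # encoder output frames (encoder_hidden shape [1, 438, 1024])
DOWNSAMPLE = 8      # assumed mel->encoder downsampling factor (ceil(3500 / 8) = 438)

def make_encoder_mask(valid_mel_len: int) -> np.ndarray:
    """Build a [1, 438] float32 mask: 1.0 for real encoder frames, 0.0 for padding."""
    valid_frames = min(math.ceil(valid_mel_len / DOWNSAMPLE), ENC_FRAMES)
    mask = np.zeros((1, ENC_FRAMES), dtype=np.float32)
    mask[0, :valid_frames] = 1.0
    return mask
```

Passing a mask built from the true audio length (rather than all-ones over the padded 3500 frames) is what prevents the decoder from attending to padded encoder frames.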

Contents

| Artifact | Precision | Notes |
|---|---|---|
| `cohere_encoder.mlpackage` | FP16 | Length-aware encoder (`mel`, `length`) |
| `cohere_decoder_prefill.mlpackage` | FP16 | Prompt prefill with `encoder_mask` |
| `cohere_decoder_decode.mlpackage` | FP16 | Single-token decode with `encoder_mask` |

Inputs / Outputs

Encoder

| Input | Shape | Type |
|---|---|---|
| `mel` | [1, 128, 3500] | float32 |
| `length` | [1] | int32 |

| Output | Shape | Type |
|---|---|---|
| `encoder_hidden` | [1, 438, 1024] | float16 |

Decoder Prefill

| Input | Shape | Type |
|---|---|---|
| `encoder_hidden` | [1, 438, 1024] | float32 |
| `input_ids` | [1, 10] | int32 |
| `encoder_mask` | [1, 438] | float32 |

Decoder Decode

| Input | Shape | Type |
|---|---|---|
| `input_ids` | [1, 1] | int32 |
| `cache_update_mask` | [1, 512] | float32 |
| `cache_valid_mask` | [1, 512] | float32 |
| `encoder_mask` | [1, 438] | float32 |
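The prefill/decode split maps onto a standard greedy loop: prefill once over the prompt, then feed one token at a time while advancing the KV-cache masks. This is a sketch, not the release's reference code: `prefill_logits` and `decode_logits` are hypothetical wrappers around `predict()` on the two decoder mlpackages, and the cache-mask convention (one-hot write slot in `cache_update_mask`, prefix of ones in `cache_valid_mask`) is an assumption inferred from the [1, 512] shapes.

```python
import numpy as np

EOS_ID = 3       # verified eos token id
CACHE_LEN = 512  # KV-cache length (cache_*_mask shape [1, 512])

def greedy_decode(prefill_logits, decode_logits, prompt_ids, max_new_tokens=128):
    """Greedy driver for a prefill/decode model pair.

    prefill_logits(input_ids) and decode_logits(input_ids, update_mask, valid_mask)
    are hypothetical wrappers around predict() on the two decoder mlpackages; each
    returns a next-token logits vector.
    """
    pos = len(prompt_ids)  # next KV-cache slot to write after the prompt
    logits = prefill_logits(np.asarray([prompt_ids], dtype=np.int32))
    out = []
    for _ in range(max_new_tokens):
        next_id = int(np.argmax(logits))
        if next_id == EOS_ID:
            break
        out.append(next_id)
        update = np.zeros((1, CACHE_LEN), dtype=np.float32)
        update[0, pos] = 1.0                # write the new token's KV at this slot
        valid = np.zeros((1, CACHE_LEN), dtype=np.float32)
        valid[0, : pos + 1] = 1.0           # attend to all slots filled so far
        logits = decode_logits(np.asarray([[next_id]], dtype=np.int32), update, valid)
        pos += 1
    return out
```

Both callables would also need the `encoder_hidden` and `encoder_mask` inputs in a real pipeline; they are omitted here to keep the loop's control flow visible.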

Prompt

English transcription with punctuation uses the token IDs:

[13764, 7, 4, 16, 62, 62, 5, 9, 11, 13]

Relevant tokenizer IDs:

  • pad_token_id = 2
  • eos_token_id = 3
  • bos_token_id = 4
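Note that the prompt is exactly 10 tokens, which matches the prefill model's fixed `input_ids` shape of [1, 10], so no prompt padding is needed:

```python
import numpy as np

# English transcription-with-punctuation prompt (token ids from the model card)
PROMPT_IDS = [13764, 7, 4, 16, 62, 62, 5, 9, 11, 13]

# Batch it into the exact shape the prefill model expects
input_ids = np.asarray([PROMPT_IDS], dtype=np.int32)
```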

Validation

The corrected encoder was validated in the Apple CoreML runtime by running the same padded mel input with different `length` values:

  • same-length repeat max diff: 0.0
  • full-vs-short length max diff: 2.988
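Both figures are max-absolute-difference comparisons of encoder outputs. A minimal helper for reproducing that kind of check (the function name is illustrative, not from the release):

```python
import numpy as np

def max_abs_diff(a, b) -> float:
    """Largest elementwise absolute difference between two arrays."""
    a = np.asarray(a, dtype=np.float32)
    b = np.asarray(b, dtype=np.float32)
    return float(np.max(np.abs(a - b)))
```

A 0.0 diff for repeated identical inputs confirms the encoder is deterministic; a large diff between full and short `length` values confirms the `length` input actually changes the encoder output instead of being ignored.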

License

Apache 2.0 (same as the base model)
