Cohere Transcribe — CoreML FP16

Corrected CoreML FP16 release of CohereLabs/cohere-transcribe-03-2026 for Apple Silicon inference.

This release uses:

  • a length-aware encoder with inputs mel and length
  • decoder prefill and decode models with explicit encoder_mask inputs
  • verified token IDs: pad=2, eos=3, bos=4

The March 28 encoder fix preserves the real mel length at inference time, which avoids the padding-related repetition and hallucination issues seen with the older static encoder export.
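To use the length-aware encoder correctly, downstream code has to turn the real mel length into the `encoder_mask` the decoder models expect. The sketch below is an assumption-laden illustration: the mask convention (1.0 = valid frame, 0.0 = padding) and the mel-to-encoder downsampling factor of 8 are inferred from the I/O shapes (3500 mel frames → 438 encoder frames), not documented by the release.

```python
import math
import numpy as np

MEL_FRAMES = 3500   # padded mel length (encoder input shape [1, 128, 3500])
ENC_FRAMES = 438    # encoder output frames (encoder_hidden shape [1, 438, 1024])
DOWNSAMPLE = 8      # assumed mel->encoder downsampling factor (ceil(3500 / 8) = 438)

def make_encoder_mask(valid_mel_len: int) -> np.ndarray:
    """Build a [1, 438] float32 mask: 1.0 for real encoder frames, 0.0 for padding."""
    valid_frames = min(math.ceil(valid_mel_len / DOWNSAMPLE), ENC_FRAMES)
    mask = np.zeros((1, ENC_FRAMES), dtype=np.float32)
    mask[0, :valid_frames] = 1.0
    return mask
```

Passing a mask built from the true audio length (rather than all-ones over the padded 3500 frames) is what prevents the decoder from attending to padded encoder frames.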

Contents

| Artifact | Precision | Notes |
|---|---|---|
| `cohere_encoder.mlpackage` | FP16 | Length-aware encoder (`mel`, `length`) |
| `cohere_decoder_prefill.mlpackage` | FP16 | Prompt prefill with `encoder_mask` |
| `cohere_decoder_decode.mlpackage` | FP16 | Single-token decode with `encoder_mask` |

Inputs / Outputs

Encoder

| Input | Shape | Type |
|---|---|---|
| `mel` | [1, 128, 3500] | float32 |
| `length` | [1] | int32 |

| Output | Shape | Type |
|---|---|---|
| `encoder_hidden` | [1, 438, 1024] | float16 |

Decoder Prefill

| Input | Shape | Type |
|---|---|---|
| `encoder_hidden` | [1, 438, 1024] | float32 |
| `input_ids` | [1, 10] | int32 |
| `encoder_mask` | [1, 438] | float32 |

Decoder Decode

| Input | Shape | Type |
|---|---|---|
| `input_ids` | [1, 1] | int32 |
| `cache_update_mask` | [1, 512] | float32 |
| `cache_valid_mask` | [1, 512] | float32 |
| `encoder_mask` | [1, 438] | float32 |
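The prefill/decode split maps onto a standard greedy loop: prefill once over the prompt, then feed one token at a time while advancing the KV-cache masks. This is a sketch, not the release's reference code: `prefill_logits` and `decode_logits` are hypothetical wrappers around `predict()` on the two decoder mlpackages, and the cache-mask convention (one-hot write slot in `cache_update_mask`, prefix of ones in `cache_valid_mask`) is an assumption inferred from the [1, 512] shapes.

```python
import numpy as np

EOS_ID = 3       # verified eos token id
CACHE_LEN = 512  # KV-cache length (cache_*_mask shape [1, 512])

def greedy_decode(prefill_logits, decode_logits, prompt_ids, max_new_tokens=128):
    """Greedy driver for a prefill/decode model pair.

    prefill_logits(input_ids) and decode_logits(input_ids, update_mask, valid_mask)
    are hypothetical wrappers around predict() on the two decoder mlpackages; each
    returns a next-token logits vector.
    """
    pos = len(prompt_ids)  # next KV-cache slot to write after the prompt
    logits = prefill_logits(np.asarray([prompt_ids], dtype=np.int32))
    out = []
    for _ in range(max_new_tokens):
        next_id = int(np.argmax(logits))
        if next_id == EOS_ID:
            break
        out.append(next_id)
        update = np.zeros((1, CACHE_LEN), dtype=np.float32)
        update[0, pos] = 1.0                # write the new token's KV at this slot
        valid = np.zeros((1, CACHE_LEN), dtype=np.float32)
        valid[0, : pos + 1] = 1.0           # attend to all slots filled so far
        logits = decode_logits(np.asarray([[next_id]], dtype=np.int32), update, valid)
        pos += 1
    return out
```

Both callables would also need the `encoder_hidden` and `encoder_mask` inputs in a real pipeline; they are omitted here to keep the loop's control flow visible.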

Prompt

English transcription with punctuation uses the token IDs:

[13764, 7, 4, 16, 62, 62, 5, 9, 11, 13]

Relevant tokenizer IDs:

  • pad_token_id = 2
  • eos_token_id = 3
  • bos_token_id = 4
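Note that the prompt is exactly 10 tokens, which matches the prefill model's fixed `input_ids` shape of [1, 10], so no prompt padding is needed:

```python
import numpy as np

# English transcription-with-punctuation prompt (token ids from the model card)
PROMPT_IDS = [13764, 7, 4, 16, 62, 62, 5, 9, 11, 13]

# Batch it into the exact shape the prefill model expects
input_ids = np.asarray([PROMPT_IDS], dtype=np.int32)
```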

Validation

The corrected encoder was validated in the Apple CoreML runtime by running the same padded mel input with different `length` values:

  • same-length repeat max diff: 0.0
  • full-vs-short length max diff: 2.988
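Both figures are max-absolute-difference comparisons of encoder outputs. A minimal helper for reproducing that kind of check (the function name is illustrative, not from the release):

```python
import numpy as np

def max_abs_diff(a, b) -> float:
    """Largest elementwise absolute difference between two arrays."""
    a = np.asarray(a, dtype=np.float32)
    b = np.asarray(b, dtype=np.float32)
    return float(np.max(np.abs(a - b)))
```

A 0.0 diff for repeated identical inputs confirms the encoder is deterministic; a large diff between full and short `length` values confirms the `length` input actually changes the encoder output instead of being ignored.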

License

Apache 2.0 (same as the base model)
