Cohere Transcribe — CoreML INT8

Corrected CoreML release of CohereLabs/cohere-transcribe-03-2026 for Apple Silicon inference.

This release uses:

a length-aware encoder with inputs `mel` and `length`
decoder prefill and decode models with explicit `encoder_mask` inputs
verified token IDs: `pad=2`, `eos=3`, `bos=4`

The key change in the March 28 fix is the encoder contract. The earlier static encoder baked padding behavior into the traced graph, which could cause repetition or hallucination on shorter utterances. The corrected encoder keeps the fixed `mel` shape but accepts the real mel length as a second input, `length`.

| Artifact | Precision | Notes |
| --- | --- | --- |
| `cohere_encoder_int8.mlpackage` | INT8 weights | Length-aware encoder (`mel`, `length`) |
| `cohere_decoder_prefill_int8.mlpackage` | INT8 weights | Prompt prefill with `encoder_mask` |
| `cohere_decoder_decode_int8.mlpackage` | INT8 weights | Single-token decode with `encoder_mask` |
| `tokenizer.model` | n/a | SentencePiece tokenizer |
| `cohere_mel_filterbank.bin` | n/a | Mel frontend weights |
| `cohere_mel_window.bin` | n/a | Mel frontend window |

Inputs / Outputs

Encoder

| Input | Shape | Type |
| --- | --- | --- |
| `mel` | `[1, 128, 3500]` | float32 |
| `length` | `[1]` | int32 |

| Output | Shape | Type |
| --- | --- | --- |
| `encoder_hidden` | `[1, 438, 1024]` | float16 |
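As an illustration of the encoder contract, the sketch below pads a variable-length mel array to the fixed `[1, 128, 3500]` shape and records the real frame count in `length`. The helper name `prepare_encoder_inputs` and the zero padding value are assumptions for illustration; the padding value the model expects depends on the mel frontend and is not stated here.

```python
import numpy as np

N_MELS = 128        # mel bins, from the encoder input shape
MAX_FRAMES = 3500   # fixed frame count, from the encoder input shape


def prepare_encoder_inputs(mel):
    """Pad a [128, T] mel array to [1, 128, 3500] and record the real T.

    Hypothetical helper. Zero padding is an assumption, not a documented
    requirement of this release.
    """
    mel = np.asarray(mel, dtype=np.float32)
    if mel.ndim != 2 or mel.shape[0] != N_MELS:
        raise ValueError(f"expected shape [{N_MELS}, T], got {mel.shape}")
    n_frames = mel.shape[1]
    if n_frames == 0 or n_frames > MAX_FRAMES:
        raise ValueError(f"frame count {n_frames} is outside 1..{MAX_FRAMES}")

    padded = np.zeros((1, N_MELS, MAX_FRAMES), dtype=np.float32)
    padded[0, :, :n_frames] = mel  # real frames first, padding after
    return {
        "mel": padded,
        "length": np.array([n_frames], dtype=np.int32),
    }
```

The returned dictionary uses the input names from the table above, so it could be passed to a Core ML prediction call such as coremltools' `MLModel.predict` on macOS.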

Decoder Prefill

| Input | Shape | Type |
| --- | --- | --- |
| `encoder_hidden` | `[1, 438, 1024]` | float32 |
| `input_ids` | `[1, 10]` | int32 |
| `encoder_mask` | `[1, 438]` | float32 |

| Output | Shape | Type |
| --- | --- | --- |
| `logits` | `[1, 10, 16384]` | float32 |
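The card does not say how `encoder_mask` is derived from `length`, so the following is only a rough sketch. It assumes an 8x time subsampling (inferred from ceil(3500 / 8) = 438) and assumes that 1.0 marks a real encoder frame and 0.0 marks padding. Both assumptions, and the helper name `build_encoder_mask`, should be checked against the conversion code before use.

```python
import math

import numpy as np

ENC_FRAMES = 438   # encoder time steps, from the encoder_hidden shape
MAX_FRAMES = 3500  # mel frames, from the mel shape


def build_encoder_mask(length, subsample=8):
    """Return a [1, 438] float32 mask over encoder frames.

    ASSUMPTIONS: subsample=8 is inferred from ceil(3500 / 8) == 438, and
    the polarity (1.0 = attend, 0.0 = padding) is a guess.
    """
    if not 0 < length <= MAX_FRAMES:
        raise ValueError(f"length {length} is outside 1..{MAX_FRAMES}")
    valid = min(ENC_FRAMES, math.ceil(length / subsample))
    mask = np.zeros((1, ENC_FRAMES), dtype=np.float32)
    mask[0, :valid] = 1.0
    return mask
```

Under these assumptions a full-length input yields an all-ones mask, and a 1200-frame input marks the first 150 encoder frames as valid.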

Decoder Decode

| Input | Shape | Type |
| --- | --- | --- |
| `input_ids` | `[1, 1]` | int32 |
| `cache_update_mask` | `[1, 512]` | float32 |
| `cache_valid_mask` | `[1, 512]` | float32 |
| `encoder_mask` | `[1, 438]` | float32 |

| Output | Shape | Type |
| --- | --- | --- |
| `logits` | `[1, 1, 16384]` | float32 |
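The semantics of the two cache masks are not documented here. One plausible reading, used in the sketch below, is that `cache_update_mask` is one-hot at the slot being written and `cache_valid_mask` covers every slot written so far. The greedy loop takes a hypothetical `step_fn` callable that wraps the decode model, so the example stays runnable without Core ML; `cache_masks` and `greedy_decode` are illustrative names, not part of this release.

```python
import numpy as np

MAX_CACHE = 512  # cache length, from the mask shapes
EOS_ID = 3       # eos_token_id from this model card


def cache_masks(position):
    """Build both cache masks for one decode step.

    ASSUMED semantics: the update mask is one-hot at `position`, and the
    valid mask is 1.0 for slots 0..position inclusive.
    """
    if not 0 <= position < MAX_CACHE:
        raise ValueError(f"position {position} is outside 0..{MAX_CACHE - 1}")
    update = np.zeros((1, MAX_CACHE), dtype=np.float32)
    update[0, position] = 1.0
    valid = np.zeros((1, MAX_CACHE), dtype=np.float32)
    valid[0, : position + 1] = 1.0
    return update, valid


def greedy_decode(step_fn, first_token, start_position, encoder_mask,
                  max_new_tokens=200):
    """Greedy loop: feed one token per step, stop at EOS or a limit.

    `step_fn` is a hypothetical wrapper that takes the decode inputs as a
    dict and returns logits of shape [1, 1, vocab].
    """
    tokens = []
    token = first_token
    position = start_position
    while (token != EOS_ID and len(tokens) < max_new_tokens
           and position < MAX_CACHE):
        tokens.append(token)
        update, valid = cache_masks(position)
        logits = step_fn({
            "input_ids": np.array([[token]], dtype=np.int32),
            "cache_update_mask": update,
            "cache_valid_mask": valid,
            "encoder_mask": encoder_mask,
        })
        token = int(np.argmax(logits[0, -1]))
        position += 1
    return tokens
```

In this sketch `first_token` would be the argmax of the last prefill position, and `start_position` would be 10 if the prefill model fills cache slots 0 to 9 with the prompt, which is also an assumption.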

Prompt

English transcription with punctuation uses the following 10 prompt token IDs, which fill the prefill `input_ids` shape `[1, 10]` exactly:

[13764, 7, 4, 16, 62, 62, 5, 9, 11, 13]

Relevant tokenizer IDs:

pad_token_id = 2
eos_token_id = 3
bos_token_id = 4
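A minimal sketch of how the prompt and special-token IDs above might be used: packing the prompt into the prefill `input_ids` tensor, and trimming generated IDs before detokenization. The helper names are hypothetical, and dropping `pad` and `bos` from the output is an assumption about the desired post-processing.

```python
import numpy as np

# Values copied from this model card.
PROMPT_IDS = [13764, 7, 4, 16, 62, 62, 5, 9, 11, 13]
PAD_ID, EOS_ID, BOS_ID = 2, 3, 4


def prefill_input_ids():
    """Pack the fixed prompt into the [1, 10] int32 prefill input."""
    return np.array([PROMPT_IDS], dtype=np.int32)


def strip_special(token_ids):
    """Cut at the first EOS and drop pad/bos tokens (assumed cleanup)."""
    kept = []
    for token in token_ids:
        if token == EOS_ID:
            break
        if token in (PAD_ID, BOS_ID):
            continue
        kept.append(token)
    return kept
```

The cleaned IDs could then be turned into text with the SentencePiece Python package, for example `SentencePieceProcessor(model_file="tokenizer.model").decode(ids)`.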

Validation

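The corrected encoder was validated in the Apple CoreML runtime by running the same padded mel input with different `length` values:

same-length repeat max diff: 0.0
full-vs-short length max diff: 3.01220703125

Repeating a run with the same `length` reproduces the output exactly, while changing only `length` changes it, which confirms that the published encoder is not ignoring `length`.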
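The check described above can be sketched as follows. This is not the script that produced the numbers in this card; `encode_fn` is a hypothetical wrapper around the encoder that takes the input dict and returns the `encoder_hidden` array, and the two helper names are illustrative.

```python
import numpy as np


def max_abs_diff(a, b):
    """Largest element-wise absolute difference, computed in float32."""
    a = np.asarray(a, dtype=np.float32)
    b = np.asarray(b, dtype=np.float32)
    return float(np.max(np.abs(a - b)))


def check_length_sensitivity(encode_fn, mel, full_length, short_length):
    """Run the same padded mel with different `length` values.

    Returns (repeat_diff, length_diff). A length-aware encoder should give
    repeat_diff == 0.0 and length_diff > 0.0.
    """
    def run(n_frames):
        return encode_fn({
            "mel": mel,
            "length": np.array([n_frames], dtype=np.int32),
        })

    repeat_diff = max_abs_diff(run(full_length), run(full_length))
    length_diff = max_abs_diff(run(full_length), run(short_length))
    return repeat_diff, length_diff
```

A non-zero `repeat_diff` would point to nondeterminism in the runtime, and a zero `length_diff` would suggest that `length` is being ignored.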

Notes

The encoder still uses a fixed mel tensor shape of `[1, 128, 3500]`; `length` tells the encoder how many frames are real.
Audio longer than the 3500-frame window should still be chunked upstream.
Timestamps and speaker diarization are not included.
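As a rough illustration of upstream chunking, the sketch below splits a long recording into frame spans that each fit the 3500-frame window. It chunks by mel frame index, which is an assumption (chunking could equally be done on raw audio samples), and both the `overlap` parameter and the name `chunk_spans` are hypothetical.

```python
def chunk_spans(total_frames, max_frames=3500, overlap=0):
    """Return (start, end) frame spans, each at most `max_frames` long.

    `overlap` is a hypothetical knob: consecutive spans share that many
    frames. Merging the transcripts of overlapping chunks is not covered.
    """
    if not 0 <= overlap < max_frames:
        raise ValueError("overlap must be in 0..max_frames - 1")
    spans = []
    start = 0
    step = max_frames - overlap
    while start < total_frames:
        end = min(start + max_frames, total_frames)
        spans.append((start, end))
        if end == total_frames:
            break
        start += step
    return spans
```

Each span's frame count would become the `length` input for that chunk, with the remainder of the fixed tensor left as padding.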

License

Apache 2.0 (same as the base model)
