Cohere Transcribe - CoreML INT8
Corrected CoreML release of CohereLabs/cohere-transcribe-03-2026 for Apple Silicon inference.
This release uses:
- a length-aware encoder with inputs `mel` and `length`
- decoder prefill and decode models with explicit `encoder_mask` inputs
- verified token IDs: `pad=2`, `eos=3`, `bos=4`
The important March 28 fix is the encoder contract. The earlier static encoder baked padding behavior into traced inference and could cause repetition or hallucination on shorter utterances. The corrected encoder keeps the fixed mel shape but accepts the real mel length as a second input.
Contents
| Artifact | Precision | Notes |
|---|---|---|
| `cohere_encoder_int8.mlpackage` | INT8 weights | Length-aware encoder (`mel`, `length`) |
| `cohere_decoder_prefill_int8.mlpackage` | INT8 weights | Prompt prefill with `encoder_mask` |
| `cohere_decoder_decode_int8.mlpackage` | INT8 weights | Single-token decode with `encoder_mask` |
| `tokenizer.model` | n/a | SentencePiece tokenizer |
| `cohere_mel_filterbank.bin` | n/a | Mel frontend weights |
| `cohere_mel_window.bin` | n/a | Mel frontend window |
Inputs / Outputs
Encoder
| Input | Shape | Type |
|---|---|---|
| `mel` | `[1, 128, 3500]` | float32 |
| `length` | `[1]` | int32 |

| Output | Shape | Type |
|---|---|---|
| `encoder_hidden` | `[1, 438, 1024]` | float16 |
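
The encoder contract above (fixed-shape `mel` plus the real frame count in `length`) can be sketched as a small padding helper. This is a minimal sketch, not the release's own preprocessing: `pad_mel` is a hypothetical name, the mel is represented as plain nested lists, and padding with `0.0` is an assumption, since the pad value used by the original frontend is not stated here.

```python
MEL_BINS = 128     # from the encoder input shape [1, 128, 3500]
MAX_FRAMES = 3500  # fixed time dimension of the mel input


def pad_mel(mel, max_frames=MAX_FRAMES, pad_value=0.0):
    """Pad a [128][T] mel (list of rows) to [128][max_frames].

    Returns (padded_mel, length), where length is the number of real frames
    to pass as the encoder's second input. pad_value=0.0 is an assumption.
    """
    if len(mel) != MEL_BINS:
        raise ValueError(f"expected {MEL_BINS} mel bins, got {len(mel)}")
    n_frames = len(mel[0])
    if any(len(row) != n_frames for row in mel):
        raise ValueError("all mel rows must have the same number of frames")
    if n_frames > max_frames:
        raise ValueError("audio too long for one pass; chunk it upstream")
    padded = [list(row) + [pad_value] * (max_frames - n_frames) for row in mel]
    return padded, n_frames
```

The returned `length` would be wrapped in an int32 array of shape `[1]`, and the padded mel in a float32 array of shape `[1, 128, 3500]`, before calling the model.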
Decoder Prefill
| Input | Shape | Type |
|---|---|---|
| `encoder_hidden` | `[1, 438, 1024]` | float32 |
| `input_ids` | `[1, 10]` | int32 |
| `encoder_mask` | `[1, 438]` | float32 |

| Output | Shape | Type |
|---|---|---|
| `logits` | `[1, 10, 16384]` | float32 |
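
One way to build `encoder_mask` from the real mel length is sketched below. Everything about the mask semantics here is an assumption rather than something this card states: the sketch assumes the encoder reduces the time axis by a factor of 8 with rounding up (which matches 3500 frames mapping to 438), and that `1.0` marks valid encoder frames while `0.0` marks padding. `build_encoder_mask` is a hypothetical helper name.

```python
import math

ENC_FRAMES = 438  # from the encoder_mask shape [1, 438]


def build_encoder_mask(length, reduction=8):
    """Return a [1][438] mask with 1.0 for valid encoder frames.

    Assumptions: time reduction of 8 with ceiling, and 1.0 = valid,
    0.0 = padded. Verify both against the exported model before relying
    on this.
    """
    valid = min(ENC_FRAMES, math.ceil(length / reduction))
    return [[1.0] * valid + [0.0] * (ENC_FRAMES - valid)]
```

If the exported model uses the opposite convention or an additive mask, only the two fill values need to change.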
Decoder Decode
| Input | Shape | Type |
|---|---|---|
| `input_ids` | `[1, 1]` | int32 |
| `cache_update_mask` | `[1, 512]` | float32 |
| `cache_valid_mask` | `[1, 512]` | float32 |
| `encoder_mask` | `[1, 438]` | float32 |

| Output | Shape | Type |
|---|---|---|
| `logits` | `[1, 1, 16384]` | float32 |
Prompt
English transcription with punctuation uses the token IDs:
[13764, 7, 4, 16, 62, 62, 5, 9, 11, 13]
Relevant tokenizer IDs:
- `pad_token_id = 2`
- `eos_token_id = 3`
- `bos_token_id = 4`
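
A greedy decoding loop around the prompt and EOS ID above might look like the following sketch. It is illustrative only: `prefill_fn` and `decode_fn` are hypothetical callables that would wrap the two decoder models (including `encoder_mask` and any cache state), and the mask semantics are assumptions, namely that `cache_update_mask` is one-hot at the position being written and `cache_valid_mask` covers every position up to and including it.

```python
PROMPT_IDS = [13764, 7, 4, 16, 62, 62, 5, 9, 11, 13]
EOS_ID = 3
CACHE_LEN = 512  # from the cache mask shape [1, 512]


def argmax(values):
    """Index of the largest value in a flat list of logits."""
    return max(range(len(values)), key=values.__getitem__)


def cache_masks(position, cache_len=CACHE_LEN):
    """Assumed mask layout: one-hot update, prefix-valid up to position."""
    update = [[1.0 if i == position else 0.0 for i in range(cache_len)]]
    valid = [[1.0 if i <= position else 0.0 for i in range(cache_len)]]
    return update, valid


def greedy_decode(prefill_fn, decode_fn, max_new_tokens=64):
    """Greedy decoding with hypothetical model wrappers.

    prefill_fn(prompt_ids) -> one row of logits per prompt token
    decode_fn(token_id, update_mask, valid_mask) -> logits for the next token
    """
    logits = prefill_fn(PROMPT_IDS)
    token = argmax(logits[-1])  # first token comes from the last prompt slot
    generated = []
    position = len(PROMPT_IDS)
    while (token != EOS_ID and len(generated) < max_new_tokens
           and position < CACHE_LEN):
        generated.append(token)
        update, valid = cache_masks(position)
        token = argmax(decode_fn(token, update, valid))
        position += 1
    return generated
```

The returned IDs would then be passed to the SentencePiece tokenizer in `tokenizer.model` to obtain text.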
Validation
The corrected encoder was validated in the Apple CoreML runtime by feeding the same padded mel input with different `length` values:
- same-length repeat max diff: `0.0`
- full-vs-short length max diff: `3.01220703125`

Repeated runs with the same `length` are identical, while changing only `length` changes the output. That confirms the published encoder is not ignoring `length`.
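
The check above reduces to a maximum absolute difference between two encoder outputs. A minimal sketch of that metric follows, assuming the outputs have been converted to nested lists; `max_abs_diff` is a hypothetical helper and is not taken from the validation script used for this release.

```python
def _flatten(x):
    """Yield every scalar in an arbitrarily nested list or tuple."""
    if isinstance(x, (list, tuple)):
        for item in x:
            yield from _flatten(item)
    else:
        yield float(x)


def max_abs_diff(a, b):
    """Largest element-wise absolute difference between two nested lists."""
    flat_a, flat_b = list(_flatten(a)), list(_flatten(b))
    if len(flat_a) != len(flat_b):
        raise ValueError("outputs must have the same number of elements")
    return max((abs(x - y) for x, y in zip(flat_a, flat_b)), default=0.0)
```

Running the encoder twice with the same `length` should give a difference of `0.0`, and running it with the full versus a shorter `length` on the same padded mel should give a clearly nonzero value.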
Notes
- The encoder still uses a fixed mel tensor shape of `[1, 128, 3500]`; `length` tells the encoder how many frames are real.
- Longer audio should still be chunked upstream.
- No timestamps or speaker diarization are included.
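
For the upstream chunking mentioned in the notes, one simple option is to split the mel frames into consecutive windows of at most 3500 frames. The sketch below assumes non-overlapping windows, which this card does not prescribe; overlapping windows or silence-based splitting may work better in practice. `chunk_bounds` is a hypothetical helper name.

```python
MAX_FRAMES = 3500  # fixed time dimension of the encoder's mel input


def chunk_bounds(n_frames, max_frames=MAX_FRAMES):
    """Return (start, end) frame ranges covering n_frames without overlap."""
    return [(start, min(start + max_frames, n_frames))
            for start in range(0, n_frames, max_frames)]
```

Each `(start, end)` slice would be padded to 3500 frames and passed to the encoder with `length = end - start`.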
License
Apache 2.0 (same as the base model)