Cohere Transcribe - CoreML FP16
Corrected CoreML FP16 release of CohereLabs/cohere-transcribe-03-2026 for Apple Silicon inference.
This release uses:
- a length-aware encoder with inputs `mel` and `length`
- decoder prefill and decode models with explicit `encoder_mask` inputs
- verified token IDs: `pad=2`, `eos=3`, `bos=4`
The March 28 encoder fix preserves the real mel length at inference time, which avoids the padding-related repetition and hallucination issues seen with the older static encoder export.
Contents
| Artifact | Precision | Notes |
|---|---|---|
| `cohere_encoder.mlpackage` | FP16 | Length-aware encoder (`mel`, `length`) |
| `cohere_decoder_prefill.mlpackage` | FP16 | Prompt prefill with `encoder_mask` |
| `cohere_decoder_decode.mlpackage` | FP16 | Single-token decode with `encoder_mask` |
Inputs / Outputs
Encoder
| Input | Shape | Type |
|---|---|---|
| `mel` | [1, 128, 3500] | float32 |
| `length` | [1] | int32 |

| Output | Shape | Type |
|---|---|---|
| `encoder_hidden` | [1, 438, 1024] | float16 |
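The encoder takes a fixed-size `mel` tensor plus the real frame count in `length`. A minimal sketch of preparing those two inputs with NumPy follows. The helper name `prepare_encoder_inputs` is hypothetical, and zero padding is an assumption; the original preprocessing may use a different pad value.

```python
import numpy as np

N_MELS = 128       # mel bins, from the encoder input shape [1, 128, 3500]
MAX_FRAMES = 3500  # fixed frame count, from the same shape


def prepare_encoder_inputs(mel: np.ndarray) -> dict:
    """Pad a [128, T] mel spectrogram to [1, 128, 3500] and record T.

    ASSUMPTION: padding frames are filled with zeros.
    """
    if mel.ndim != 2 or mel.shape[0] != N_MELS:
        raise ValueError(f"expected shape [{N_MELS}, T], got {mel.shape}")
    n_frames = mel.shape[1]
    if n_frames > MAX_FRAMES:
        raise ValueError(f"{n_frames} frames exceeds the maximum of {MAX_FRAMES}")

    padded = np.zeros((1, N_MELS, MAX_FRAMES), dtype=np.float32)
    padded[0, :, :n_frames] = mel
    # `length` carries the real (unpadded) frame count to the encoder.
    return {"mel": padded, "length": np.array([n_frames], dtype=np.int32)}
```

With coremltools installed, a dict like this could be passed to `coremltools.models.MLModel("cohere_encoder.mlpackage").predict(...)`; that call is not part of the sketch and has not been checked against this package.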
Decoder Prefill
| Input | Shape | Type |
|---|---|---|
| `encoder_hidden` | [1, 438, 1024] | float32 |
| `input_ids` | [1, 10] | int32 |
| `encoder_mask` | [1, 438] | float32 |
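The `encoder_mask` input has one entry per encoder output frame. The sketch below shows one way such a mask might be derived from the real mel length. Two things here are assumptions, not facts from this release: the 8x temporal downsampling (inferred only from ceil(3500 / 8) = 438) and the convention that 1.0 marks valid frames and 0.0 marks padding. The helper name `build_encoder_mask` is hypothetical.

```python
import numpy as np

ENC_FRAMES = 438   # encoder output frames, from encoder_hidden [1, 438, 1024]
MAX_FRAMES = 3500  # mel frames, from mel [1, 128, 3500]
# ASSUMPTION: 8x temporal downsampling, inferred from ceil(3500 / 8) == 438.
DOWNSAMPLE = 8


def build_encoder_mask(mel_length: int) -> np.ndarray:
    """Return a [1, 438] float32 mask.

    ASSUMPTION: 1.0 marks valid encoder frames, 0.0 marks padding.
    """
    if not 0 < mel_length <= MAX_FRAMES:
        raise ValueError(f"mel_length must be in 1..{MAX_FRAMES}, got {mel_length}")
    valid_frames = -(-mel_length // DOWNSAMPLE)  # ceiling division
    mask = np.zeros((1, ENC_FRAMES), dtype=np.float32)
    mask[0, :valid_frames] = 1.0
    return mask
```

If the exported decoder expects a different convention (for example an additive mask), only the fill values would change; the shape and dtype come from the table above.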
Decoder Decode
| Input | Shape | Type |
|---|---|---|
| `input_ids` | [1, 1] | int32 |
| `cache_update_mask` | [1, 512] | float32 |
| `cache_valid_mask` | [1, 512] | float32 |
| `encoder_mask` | [1, 438] | float32 |
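The two cache masks are not described beyond their shapes. The sketch below illustrates one plausible reading, stated here as an assumption: `cache_update_mask` is one-hot at the position being written, and `cache_valid_mask` is 1.0 for every position filled so far, including the current one. The helper name `build_cache_masks` is hypothetical.

```python
import numpy as np

CACHE_LEN = 512  # from the cache_update_mask / cache_valid_mask shape [1, 512]


def build_cache_masks(position: int):
    """Build (cache_update_mask, cache_valid_mask) for one decode step.

    ASSUMED semantics, not confirmed by the release notes:
      - update mask: 1.0 only at `position` (the cache slot written this step)
      - valid mask:  1.0 for slots 0..position (the slots attention may read)
    """
    if not 0 <= position < CACHE_LEN:
        raise ValueError(f"position must be in 0..{CACHE_LEN - 1}, got {position}")
    update = np.zeros((1, CACHE_LEN), dtype=np.float32)
    update[0, position] = 1.0
    valid = np.zeros((1, CACHE_LEN), dtype=np.float32)
    valid[0, : position + 1] = 1.0
    return update, valid
```

Under this reading, the first decode step after a 10-token prefill would use `position = 10`.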
Prompt
English transcription with punctuation uses the token IDs:
[13764, 7, 4, 16, 62, 62, 5, 9, 11, 13]
Relevant tokenizer IDs:
- `pad_token_id = 2`
- `eos_token_id = 3`
- `bos_token_id = 4`
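As a small sketch, the prompt and special token IDs above can be packed into the `[1, 10]` int32 `input_ids` tensor that the prefill model expects. The helper names are hypothetical, and stopping generation at the first EOS token is an assumption about how decoding would be driven.

```python
import numpy as np

# Prompt for English transcription with punctuation, as listed above.
PROMPT_IDS = [13764, 7, 4, 16, 62, 62, 5, 9, 11, 13]
PAD_TOKEN_ID = 2
EOS_TOKEN_ID = 3
BOS_TOKEN_ID = 4


def build_prefill_input_ids() -> np.ndarray:
    """Return the prompt as a [1, 10] int32 array for the prefill model."""
    return np.array([PROMPT_IDS], dtype=np.int32)


def trim_generated(token_ids):
    """Cut a generated sequence at the first EOS and drop PAD tokens.

    ASSUMPTION: decoding stops at EOS; this is illustrative post-processing.
    """
    out = []
    for tok in token_ids:
        if tok == EOS_TOKEN_ID:
            break
        if tok != PAD_TOKEN_ID:
            out.append(tok)
    return out
```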
Validation
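The corrected encoder was validated in the Apple CoreML runtime using the same padded mel input with different `length` values:
- same-length repeat max diff: `0.0`
- full-vs-short length max diff: `2.988`

The nonzero difference in the second case indicates that the encoder output depends on `length`, as intended.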
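A comparison of this kind can be expressed as a maximum absolute difference between two encoder outputs. The sketch below shows that metric only; the helper name `max_abs_diff` is hypothetical, and the exact procedure used for the reported numbers is not documented here.

```python
import numpy as np


def max_abs_diff(a, b) -> float:
    """Largest element-wise absolute difference between two same-shaped arrays."""
    a = np.asarray(a, dtype=np.float32)
    b = np.asarray(b, dtype=np.float32)
    if a.shape != b.shape:
        raise ValueError(f"shape mismatch: {a.shape} vs {b.shape}")
    return float(np.max(np.abs(a - b)))
```

Applied to two `encoder_hidden` outputs from the same padded `mel`, a result of `0.0` for identical `length` values and a clearly nonzero result for different `length` values would match the pattern reported above.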
License
Apache 2.0 (same as the base model)