cohere-transcribe-03-2026-mlx-4bit

Quantized MLX weights derived from beshkenadze/cohere-transcribe-03-2026-mlx-fp16.

Variant

  • Precision: 4-bit
  • Quantization mode: affine
  • Group size: 64
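To make the variant settings concrete, here is a minimal, self-contained sketch of affine group quantization with a group size of 64 and 4-bit precision. This is illustrative only, not the converter's actual code: each group of 64 weights shares one scale and one bias, and values are rounded to the 16 representable levels.

```python
# Hedged sketch of affine 4-bit group quantization (illustrative,
# not the MLX converter's real implementation).

def quantize_group(values, bits=4):
    """Quantize one group of floats to integers in [0, 2**bits - 1]."""
    lo, hi = min(values), max(values)
    levels = (1 << bits) - 1              # 15 levels for 4-bit
    scale = (hi - lo) / levels or 1.0     # avoid div-by-zero for flat groups
    q = [round((v - lo) / scale) for v in values]
    return q, scale, lo                   # ints, shared scale, shared bias

def dequantize_group(q, scale, bias):
    """Reconstruct approximate floats from quantized integers."""
    return [v * scale + bias for v in q]

def quantize(weights, group_size=64, bits=4):
    """Split a flat weight list into groups and quantize each one."""
    out = []
    for i in range(0, len(weights), group_size):
        out.append(quantize_group(weights[i:i + group_size], bits))
    return out
```

With per-group scales, the worst-case reconstruction error is half a quantization step within each group, which is why small group sizes trade extra metadata for accuracy.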

Files

  • model.safetensors
  • config.json
  • tokenizer.model
  • tokenizer_config.json
  • preprocessor_config.json
  • special_tokens_map.json
  • key_map.json
  • conversion_summary.json

Repo-sample benchmark

Sample: Tests/media/conversational_a.wav

  • Generation TPS: 394.6
  • Peak memory: 1.96 GB
  • Output: Coffee's story likely begins in Ethiopia, where legend tells of a goat herder named Khaldi, who noticed his goats became energetic after eating red berries from a particular bush; curious, he tried them himself and felt invigorated.

Parity note

This checkpoint has been re-validated against the current Swift and Python MLX runtimes.

Verified semantic parity on an English fixture:

This is a test recording in English. I am speaking clearly at a normal speed. Please transcribe this sentence exactly as I said.

Matched across:

  • Swift MLX fp16
  • Swift MLX 8-bit
  • Swift MLX 4-bit
  • Python MLX fp16
  • Python MLX 4-bit
  • official CUDA reference path (transformers native Cohere ASR)
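A semantic-parity check like the one described above typically normalizes each runtime's transcript (case, punctuation, whitespace) before comparing, so cosmetic differences are not counted as mismatches. The sketch below is a hypothetical version of such a check; the runtime names in the usage are illustrative.

```python
# Hedged sketch of a transcript parity check (illustrative, not the
# repo's actual validation harness).
import re

def normalize(text):
    """Lowercase, strip punctuation, and collapse whitespace."""
    text = text.lower()
    text = re.sub(r"[^\w\s]", "", text)
    return " ".join(text.split())

def all_match(transcripts):
    """True if every runtime's transcript normalizes to the same string."""
    return len({normalize(t) for t in transcripts.values()}) == 1
```

For example, a fp16 transcript with full punctuation and a 4-bit transcript with slightly different casing would still count as matching, while a genuine word substitution (such as Kaldi vs. Khaldi) would not.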

Quality note

Fastest and smallest, but introduces a lexical regression on the repo sample (Kaldi → Khaldi).

Notes

  • Generated from the Swift-compatible fp16 checkpoint beshkenadze/cohere-transcribe-03-2026-mlx-fp16.
  • This repository contains inference artifacts only. Refer to the upstream Cohere model card and license for original model details.