cohere-transcribe-03-2026-mlx-4bit
Quantized MLX weights for beshkenadze/cohere-transcribe-03-2026-mlx-fp16.
Variant
- Precision: 4-bit
- Quantization mode: affine
- Group size: 64
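In affine mode, each group of 64 weights is mapped to 4-bit integers with its own scale and offset. A minimal NumPy sketch of that scheme (an illustration of the math, not MLX's packed on-disk layout):

```python
import numpy as np

def affine_quantize(w, bits=4, group_size=64):
    """Group-wise affine quantization: each group of `group_size`
    values gets its own scale and offset (min of the group)."""
    qmax = 2 ** bits - 1
    g = w.reshape(-1, group_size)
    lo = g.min(axis=1, keepdims=True)
    hi = g.max(axis=1, keepdims=True)
    scale = (hi - lo) / qmax
    scale[scale == 0] = 1.0  # constant groups: avoid division by zero
    q = np.clip(np.round((g - lo) / scale), 0, qmax).astype(np.uint8)
    return q, scale, lo

def affine_dequantize(q, scale, lo, shape):
    """Reconstruct approximate weights from codes, scales, and offsets."""
    return (q.astype(np.float32) * scale + lo).reshape(shape)
```

Round-trip error is bounded by half a scale step per group, which is what makes the small group size of 64 effective.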
Files
- model.safetensors
- config.json
- tokenizer.model
- tokenizer_config.json
- preprocessor_config.json
- special_tokens_map.json
- key_map.json
- conversion_summary.json
Repo-sample benchmark
Sample: Tests/media/conversational_a.wav
- Generation TPS: 394.6
- Peak memory: 1.96 GB
- Output:
Coffee's story likely begins in Ethiopia, where legend tells of a goat herder named Khaldi, who noticed his goats became energetic after eating red berries from a particular bush; curious, he tried them himself and felt invigorated.
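For context on the memory figure: with 4-bit weights and (assuming MLX's usual layout, which this card does not state) one fp16 scale and one fp16 bias per 64-weight group, the effective storage cost is 4.5 bits per weight, roughly 3.5x smaller than fp16. A back-of-envelope helper:

```python
def effective_bits_per_weight(bits=4, group_size=64, meta_bits=16):
    """Storage cost per weight: the quantized code plus the amortized
    share of one scale and one bias per group (assumed fp16)."""
    return bits + 2 * meta_bits / group_size

print(effective_bits_per_weight())       # 4.5
print(16 / effective_bits_per_weight())  # compression ratio vs fp16
```

Peak memory at inference also includes activations and KV state, so it will sit above the raw weight footprint.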
Parity note
This checkpoint has been re-validated against the current Swift and Python MLX runtimes.
Verified semantic parity on an English fixture:
This is a test recording in English. I am speaking clearly at a normal speed. Please transcribe this sentence exactly as I said.
Matched across:
- Swift MLX fp16
- Swift MLX 8-bit
- Swift MLX 4-bit
- Python MLX fp16
- Python MLX 4-bit
- official CUDA reference path (transformers native Cohere ASR)
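The exact comparison harness is not published here; a semantic-parity check along these lines (helper names are hypothetical) would treat two transcripts as matching once case and punctuation differences are normalized away:

```python
import re

def normalize(text):
    """Lowercase, strip punctuation, and collapse whitespace so that
    cosmetic differences between runtimes don't count as mismatches."""
    text = re.sub(r"[^\w\s]", "", text.lower())
    return " ".join(text.split())

def semantic_parity(a, b):
    """True when two transcripts agree after normalization."""
    return normalize(a) == normalize(b)
```

Note that a lexical regression such as the one below (Kaldi vs. Khaldi) would still be flagged as a mismatch by this check, since normalization only removes formatting differences, not word-level ones.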
Quality note
Fastest and smallest, but introduces a lexical regression on the repo sample (Kaldi → Khaldi).
Notes
- Generated from the Swift-compatible fp16 checkpoint beshkenadze/cohere-transcribe-03-2026-mlx-fp16.
- This repository contains inference artifacts only. Refer to the upstream Cohere model card and license for original model details.
Base model
- CohereLabs/cohere-transcribe-03-2026