cohere-transcribe-03-2026-mlx-4bit

Quantized MLX weights derived from beshkenadze/cohere-transcribe-03-2026-mlx-fp16.

Variant

  • Precision: 4-bit
  • Quantization mode: affine
  • Group size: 64
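To make the variant settings concrete, here is a minimal, self-contained sketch of affine group quantization with a group size of 64 and 4-bit precision. This is illustrative only, not the converter's actual code: each group of 64 weights shares one scale and one bias, and values are rounded to the 16 representable levels.

```python
# Hedged sketch of affine 4-bit group quantization (illustrative,
# not the MLX converter's real implementation).

def quantize_group(values, bits=4):
    """Quantize one group of floats to integers in [0, 2**bits - 1]."""
    lo, hi = min(values), max(values)
    levels = (1 << bits) - 1              # 15 levels for 4-bit
    scale = (hi - lo) / levels or 1.0     # avoid div-by-zero for flat groups
    q = [round((v - lo) / scale) for v in values]
    return q, scale, lo                   # ints, shared scale, shared bias

def dequantize_group(q, scale, bias):
    """Reconstruct approximate floats from quantized integers."""
    return [v * scale + bias for v in q]

def quantize(weights, group_size=64, bits=4):
    """Split a flat weight list into groups and quantize each one."""
    out = []
    for i in range(0, len(weights), group_size):
        out.append(quantize_group(weights[i:i + group_size], bits))
    return out
```

With per-group scales, the worst-case reconstruction error is half a quantization step within each group, which is why small group sizes trade extra metadata for accuracy.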

Files

  • model.safetensors
  • config.json
  • tokenizer.model
  • tokenizer_config.json
  • preprocessor_config.json
  • special_tokens_map.json
  • key_map.json
  • conversion_summary.json

Repo-sample benchmark

Sample: Tests/media/conversational_a.wav

  • Generation TPS: 394.6
  • Peak memory: 1.96 GB
  • Output: Coffee's story likely begins in Ethiopia, where legend tells of a goat herder named Khaldi, who noticed his goats became energetic after eating red berries from a particular bush; curious, he tried them himself and felt invigorated.

Parity note

This checkpoint has been re-validated against the current Swift and Python MLX runtimes.

Verified semantic parity on an English fixture:

This is a test recording in English. I am speaking clearly at a normal speed. Please transcribe this sentence exactly as I said.

Matched across:

  • Swift MLX fp16
  • Swift MLX 8-bit
  • Swift MLX 4-bit
  • Python MLX fp16
  • Python MLX 4-bit
  • official CUDA reference path (transformers native Cohere ASR)
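A semantic-parity check like the one described above typically normalizes each runtime's transcript (case, punctuation, whitespace) before comparing, so cosmetic differences are not counted as mismatches. The sketch below is a hypothetical version of such a check; the runtime names in the usage are illustrative.

```python
# Hedged sketch of a transcript parity check (illustrative, not the
# repo's actual validation harness).
import re

def normalize(text):
    """Lowercase, strip punctuation, and collapse whitespace."""
    text = text.lower()
    text = re.sub(r"[^\w\s]", "", text)
    return " ".join(text.split())

def all_match(transcripts):
    """True if every runtime's transcript normalizes to the same string."""
    return len({normalize(t) for t in transcripts.values()}) == 1
```

For example, a fp16 transcript with full punctuation and a 4-bit transcript with slightly different casing would still count as matching, while a genuine word substitution (such as Kaldi vs. Khaldi) would not.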

Quality note

Fastest and smallest, but introduces a lexical regression on the repo sample (Kaldi → Khaldi).

Notes

  • Generated from the Swift-compatible fp16 checkpoint beshkenadze/cohere-transcribe-03-2026-mlx-fp16.
  • This repository contains inference artifacts only. Refer to the upstream Cohere model card and license for original model details.