How to use rasgaard/hviske-v5.3-mlx-int8 with MLX:
# Download the model from the Hub
pip install "huggingface_hub[hf_xet]"
huggingface-cli download --local-dir hviske-v5.3-mlx-int8 rasgaard/hviske-v5.3-mlx-int8
hviske-v5.3 — MLX INT8
Apple Silicon (MLX) INT8-quantized version of syvai/hviske-v5.3, the state-of-the-art Danish ASR model. Weights are quantized to 8-bit (affine, group_size=64) using mlx-speech.
Performance on Apple Silicon
Measured across the full CoRal v3 test set (17,560 clips, 25 hours of audio) on Apple M-series:
| Split | Avg clip | RTFx | Latency p50 | Latency p95 |
|---|---|---|---|---|
| read_aloud | 6.8 s | 36× | 177 ms | 338 ms |
| conversation | 3.4 s | 25× | 115 ms | 281 ms |
RTFx is the inverse real-time factor: seconds of audio transcribed per second of processing time (higher is faster). The read_aloud split scores higher because each clip carries a fixed overhead, which amortizes better over longer clips.
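To make the amortization argument concrete, here is a minimal sketch of how RTFx behaves under a simple cost model of fixed per-clip overhead plus time proportional to clip length. The overhead and per-second cost below are hypothetical values chosen for illustration, not measurements from this model.

```python
def rtfx(audio_seconds: float, processing_seconds: float) -> float:
    """Inverse real-time factor: seconds of audio per second of processing."""
    if processing_seconds <= 0:
        raise ValueError("processing_seconds must be positive")
    return audio_seconds / processing_seconds


def modeled_rtfx(audio_seconds: float,
                 overhead_s: float = 0.05,        # hypothetical fixed cost per clip
                 cost_per_audio_s: float = 0.02,  # hypothetical cost per second of audio
                 ) -> float:
    """RTFx under an assumed 'fixed overhead + linear cost' latency model."""
    latency = overhead_s + cost_per_audio_s * audio_seconds
    return rtfx(audio_seconds, latency)


if __name__ == "__main__":
    # Average clip lengths taken from the table above
    for name, seconds in [("read_aloud", 6.8), ("conversation", 3.4)]:
        print(f"{name}: modeled RTFx = {modeled_rtfx(seconds):.1f}x")
```

Under this assumed model the longer clip always has the higher RTFx, because the fixed overhead is a smaller share of its total latency.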
Accuracy vs. base model
Evaluated on the full CoRal v3 test sets (17,560 samples) with greedy decoding and strict normalization (lowercase + punctuation strip + Danish digit-to-word), matching the methodology on the base model card.
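For readers who want to reproduce the metric, the following is a rough sketch of per-utterance WER and CER with a simplified normalizer. It is not the evaluation script used for the numbers here: the Danish digit-to-word step is omitted, `string.punctuation` covers ASCII punctuation only, and the function names are hypothetical.

```python
import string

_PUNCT_TABLE = str.maketrans("", "", string.punctuation)


def normalize(text: str) -> str:
    """Lowercase and strip ASCII punctuation (digit-to-word step omitted)."""
    return " ".join(text.lower().translate(_PUNCT_TABLE).split())


def edit_distance(ref, hyp) -> int:
    """Levenshtein distance between two sequences (words or characters)."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, 1):
        cur = [i]
        for j, h in enumerate(hyp, 1):
            cur.append(min(prev[j] + 1,              # deletion
                           cur[j - 1] + 1,           # insertion
                           prev[j - 1] + (r != h)))  # substitution or match
        prev = cur
    return prev[-1]


def wer(reference: str, hypothesis: str) -> float:
    """Word error rate of one utterance after normalization."""
    ref, hyp = normalize(reference).split(), normalize(hypothesis).split()
    if not ref:
        raise ValueError("reference is empty after normalization")
    return edit_distance(ref, hyp) / len(ref)


def cer(reference: str, hypothesis: str) -> float:
    """Character error rate of one utterance after normalization."""
    ref, hyp = normalize(reference), normalize(hypothesis)
    if not ref:
        raise ValueError("reference is empty after normalization")
    return edit_distance(ref, hyp) / len(ref)
```

Corpus-level scores are usually computed by summing edit distances and reference lengths over all utterances rather than averaging per-utterance rates.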
| Split | N | BF16 WER | INT8 WER | Δ WER | BF16 CER | INT8 CER | Δ CER |
|---|---|---|---|---|---|---|---|
| read_aloud | 9,122 | 9.37% | 10.19% | +0.82 pp | 3.80% | 4.07% | +0.27 pp |
| conversation | 8,438 | 19.63% | 25.08% | +5.45 pp | 11.56% | 16.32% | +4.76 pp |
| weighted avg | 17,560 | 14.30% | 17.35% | +3.05 pp | 7.53% | 9.96% | +2.43 pp |
Read-aloud quality is largely preserved (+0.8 pp WER). Conversation degrades more noticeably (+5.5 pp WER), which is typical of aggressive quantization on harder, more varied speech. If conversation accuracy is critical, use the full BF16 model.
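As a sanity check, the weighted-average row can be recomputed from the per-split rows, assuming it is weighted by sample count N. The short sketch below uses only figures from the table above.

```python
# Per-split results copied from the table above (percentages)
SPLITS = {
    "read_aloud":   {"n": 9122, "bf16_wer": 9.37,  "int8_wer": 10.19,
                     "bf16_cer": 3.80,  "int8_cer": 4.07},
    "conversation": {"n": 8438, "bf16_wer": 19.63, "int8_wer": 25.08,
                     "bf16_cer": 11.56, "int8_cer": 16.32},
}


def weighted_avg(metric: str) -> float:
    """Average a metric over splits, weighted by each split's sample count."""
    total_n = sum(s["n"] for s in SPLITS.values())
    return sum(s["n"] * s[metric] for s in SPLITS.values()) / total_n


if __name__ == "__main__":
    for metric in ("bf16_wer", "int8_wer", "bf16_cer", "int8_cer"):
        print(f"{metric}: {weighted_avg(metric):.2f}%")
```

Rounded to two decimals, this reproduces the 14.30% / 17.35% WER and 7.53% / 9.96% CER in the weighted avg row.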
Requirements
uv add mlx-speech soundfile scipy
Requires Python ≥ 3.10 and an Apple Silicon Mac (M1 or later).
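If you want to fail early on an unsupported machine, a check along these lines may help. It is a sketch using only the standard library; `meets_requirements` is a hypothetical helper, not part of mlx-speech, and an x86 Python build running under Rosetta would be expected to report `x86_64` and fail the check.

```python
import platform
import sys


def meets_requirements(version_info=sys.version_info,
                       system=None,
                       machine=None) -> bool:
    """Return True on Python >= 3.10 running natively on Apple Silicon macOS."""
    system = platform.system() if system is None else system
    machine = platform.machine() if machine is None else machine
    return (tuple(version_info[:2]) >= (3, 10)
            and system == "Darwin"
            and machine == "arm64")


if __name__ == "__main__":
    if not meets_requirements():
        print("Warning: this model expects Python >= 3.10 on an Apple Silicon Mac.")
```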
Usage
Quick start
import soundfile as sf
from scipy.signal import resample_poly
from math import gcd
from mlx_speech.generation.cohere_asr import CohereAsrModel

asr = CohereAsrModel.from_pretrained("rasgaard/hviske-v5.3-mlx-int8")

# Load audio and resample to 16 kHz
audio, sr = sf.read("your_audio.wav", dtype="float32", always_2d=False)
if audio.ndim > 1:
    # Downmix multi-channel audio to mono
    audio = audio.mean(axis=1)
if sr != 16000:
    # Reduce 16000/sr to lowest terms to get the up/down resampling factors
    g = gcd(16000, sr)
    audio = resample_poly(audio, 16000 // g, sr // g).astype("float32")

result = asr.transcribe(audio, sample_rate=16000)
print(result.text)
# → "Jeg er subjekt A og jeg hedder Veronica"
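The `gcd` step in the quick start reduces the ratio 16000/sr to lowest terms so that `resample_poly` receives small integer up/down factors. The sketch below works through that arithmetic for a few common sample rates using only the standard library. The helper names are hypothetical, and the output-length formula reflects my understanding of `resample_poly` (ceil of n·up/down), so treat it as an assumption.

```python
from math import gcd


def resample_ratio(sr: int, target: int = 16000) -> tuple[int, int]:
    """Return (up, down) in lowest terms such that sr * up / down == target."""
    g = gcd(target, sr)
    return target // g, sr // g


def resampled_length(n_samples: int, sr: int, target: int = 16000) -> int:
    """Expected output length: ceil(n_samples * up / down), in integer math."""
    up, down = resample_ratio(sr, target)
    return -(-n_samples * up // down)


if __name__ == "__main__":
    for sr in (8000, 44100, 48000):
        up, down = resample_ratio(sr)
        one_second = resampled_length(sr, sr)
        print(f"{sr} Hz -> up={up}, down={down}, 1 s of audio -> {one_second} samples")
```

For example, 44.1 kHz audio resolves to up=160, down=441, and 48 kHz audio to up=1, down=3.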
Load from a local directory
from mlx_speech.generation.cohere_asr import CohereAsrModel
asr = CohereAsrModel.from_dir("/path/to/hviske-v5.3-mlx-int8")
Transcribe multiple files
import soundfile as sf
from scipy.signal import resample_poly
from math import gcd
from mlx_speech.generation.cohere_asr import CohereAsrModel

def load_16k(path):
    """Read an audio file as mono float32 at 16 kHz."""
    audio, sr = sf.read(path, dtype="float32", always_2d=False)
    if audio.ndim > 1:
        # Downmix multi-channel audio to mono
        audio = audio.mean(axis=1)
    if sr != 16000:
        g = gcd(16000, sr)
        audio = resample_poly(audio, 16000 // g, sr // g).astype("float32")
    return audio

# Load the model once and reuse it for every file
asr = CohereAsrModel.from_pretrained("rasgaard/hviske-v5.3-mlx-int8")

for path in ["clip_a.wav", "clip_b.wav", "clip_c.wav"]:
    result = asr.transcribe(load_16k(path), sample_rate=16000)
    print(f"{path}: {result.text}")
Quantization details
Converted from syvai/hviske-v5.3 using mlx-speech/scripts/convert/cohere_asr.py:
python scripts/convert/cohere_asr.py \
  --input-dir models/hviske-v5.3 \
  --output-dir models/hviske-v5.3-mlx-int8 \
  --bits 8 --group-size 64 --mode affine
Linear layers whose output dimension is divisible by 64 are quantized to 8-bit affine; embeddings, norms, and conv layers remain in BF16.
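To illustrate what "8-bit affine, group_size=64" means, here is a conceptual sketch in pure Python: each group of 64 weights gets its own scale and bias, and each weight is stored as an integer code in 0..255. This is an illustration of the general technique under my own assumptions, not the mlx-speech or MLX implementation, which operates on packed arrays and may differ in rounding and storage details.

```python
import math


def quantize_group(values, bits=8):
    """Affine-quantize one group: value ~= code * scale + bias."""
    levels = (1 << bits) - 1              # 255 for 8-bit
    lo, hi = min(values), max(values)
    scale = (hi - lo) / levels or 1.0     # avoid a zero scale for constant groups
    codes = [min(levels, max(0, round((v - lo) / scale))) for v in values]
    return codes, scale, lo


def dequantize_group(codes, scale, bias):
    """Reconstruct approximate float weights from integer codes."""
    return [c * scale + bias for c in codes]


def quantize_row(row, group_size=64, bits=8):
    """Split a weight row into groups and quantize each one independently."""
    if len(row) % group_size:
        raise ValueError("row length must be divisible by group_size")
    return [quantize_group(row[i:i + group_size], bits)
            for i in range(0, len(row), group_size)]


if __name__ == "__main__":
    row = [math.sin(i) for i in range(128)]   # toy weights, two groups of 64
    restored = []
    for codes, scale, bias in quantize_row(row):
        restored.extend(dequantize_group(codes, scale, bias))
    worst = max(abs(a - b) for a, b in zip(row, restored))
    print(f"max reconstruction error: {worst:.6f}")
```

Because each group has its own scale, the rounding error per weight is bounded by half that group's scale, which is why smaller groups generally preserve accuracy better at the cost of storing more scales and biases.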
License
CC BY-NC 4.0: non-commercial use only. See the base model card (syvai/hviske-v5.3) for commercial licensing.