Fluister (turbo)

Fluister is a South African Whisper. ("Fluister" is Afrikaans for "to whisper".) This turbo build is optimised for Afrikaans and South African English, including the Afrikaans/English code-switching that is everyday SA speech. It is a fine-tune of OpenAI whisper-large-v3-turbo, merged into the base weights and converted to CTranslate2 (int8) for faster-whisper. By DigiPhyte (Pty) Ltd, South Africa.

v2 (current): now covers Afrikaans and SA English. The previous release was Afrikaans-only; this version keeps the clean Afrikaans output and adds robust SA English and code-switching support.

On real South African audio it produces clean Afrikaans where stock Whisper drifts to Dutch spellings ("gebou" not "gebouw", "mense" not "mensen", "eintlik" not "eindelijk"), keeps Afrikaans/English code-switching intact, and transcribes SA English accurately.
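
One way to spot-check a transcript for the Dutch drift described above is a small word-list scan. The sketch below is illustrative only: the function name `find_dutch_drift` and the `DUTCH_DRIFT` table are hypothetical helpers, not part of Fluister or faster-whisper, and the word list contains only the three pairs quoted above.

```python
import re

# Afrikaans spelling -> Dutch spelling, taken from the examples quoted above.
# Illustrative only: this is not an exhaustive lexicon.
DUTCH_DRIFT = {
    "gebou": "gebouw",
    "mense": "mensen",
    "eintlik": "eindelijk",
}


def find_dutch_drift(transcript: str) -> list[tuple[str, str]]:
    """Return (dutch_word, afrikaans_word) pairs for Dutch spellings found."""
    # Lower-case and split into runs of letters so punctuation is ignored.
    words = set(re.findall(r"[^\W\d_]+", transcript.lower()))
    return [(dutch, afr) for afr, dutch in DUTCH_DRIFT.items() if dutch in words]


if __name__ == "__main__":
    # "mensen" is the Dutch spelling; "gebou" is already correct Afrikaans.
    print(find_dutch_drift("Die mensen staan voor die gebou."))
```

A check like this only flags known pairs, so an empty result does not prove a transcript is free of Dutch spellings.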

Use (faster-whisper)

from faster_whisper import WhisperModel
model = WhisperModel("digiphyte/fluister-turbo", device="cuda", compute_type="int8_float16")  # CPU: device="cpu", compute_type="int8"
segments, info = model.transcribe("audio.wav", language="af", beam_size=5)  # or language="en"
for s in segments:
    print(s.text)

Tell it the language (language="af" or "en") rather than relying on auto-detect. For mixed Afrikaans/English conversations, language="af" handles the code-switch well.
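
The advice above can be wrapped in a small helper so that callers never fall back to auto-detect by accident. This is a minimal sketch under stated assumptions: `pick_language`, `transcribe_file`, and the `"mixed"` label are hypothetical names introduced here for illustration; only `WhisperModel` and `transcribe` come from faster-whisper, called the same way as in the snippet above.

```python
SUPPORTED = ("af", "en")  # the two languages this model card covers


def pick_language(hint: str) -> str:
    """Validate a caller-supplied language hint.

    "mixed" is a hypothetical label for Afrikaans/English code-switched
    audio; it maps to "af", as recommended above.
    """
    code = hint.strip().lower()
    if code == "mixed":
        return "af"
    if code not in SUPPORTED:
        raise ValueError(f"unsupported language hint: {hint!r}")
    return code


def transcribe_file(path: str, hint: str) -> str:
    """Transcribe one file with an explicit language (never auto-detect)."""
    # Imported lazily so pick_language can be used without faster-whisper.
    from faster_whisper import WhisperModel

    model = WhisperModel("digiphyte/fluister-turbo", device="cpu", compute_type="int8")
    segments, _info = model.transcribe(path, language=pick_language(hint), beam_size=5)
    return " ".join(s.text.strip() for s in segments)
```

For example, `transcribe_file("audio.wav", "mixed")` would transcribe a code-switched recording with `language="af"`. In a real pipeline the model would be loaded once and reused rather than created per file.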

Evaluation

NCHLT read-speech test sets: Afrikaans WER 0.086 (8.6%), English WER 0.017 (1.7%). Also validated on real SA audio (a physiotherapy intake in Afrikaans, an English project meeting, and an Afrikaans/English code-switched barber-shop conversation): clean Afrikaans, intact code-switching, accurate English.
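
For readers unfamiliar with the metric, WER is the word-level edit distance (substitutions, insertions, deletions) divided by the number of reference words. The sketch below shows that standard definition only. It assumes simple lower-casing and whitespace tokenisation, which may differ from the text normalisation used for the figures above.

```python
def wer(reference: str, hypothesis: str) -> float:
    """Word error rate: word-level edit distance / reference word count."""
    ref = reference.lower().split()
    hyp = hypothesis.lower().split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # Levenshtein distance over words, keeping only one previous row.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        cur = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            cur.append(min(
                prev[j] + 1,         # deletion (reference word missing)
                cur[j - 1] + 1,      # insertion (extra hypothesis word)
                prev[j - 1] + cost,  # match or substitution
            ))
        prev = cur
    return prev[-1] / len(ref)


if __name__ == "__main__":
    # One Dutch-style substitution in four reference words.
    print(wer("die gebou is groot", "die gebouw is groot"))  # 0.25
```

A single Dutch spelling in a four-word sentence already costs 0.25 WER on that sentence, which is why spelling drift weighs heavily on Afrikaans scores.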

Limitations

Fluister narrows specific failures: stock Whisper spelling Afrikaans as Dutch, and degraded accuracy on SA English. It does not change the base model size. Language auto-detect can still mislabel audio, so specify the language explicitly. Proper nouns, numbers, and rare or technical terms can still be wrong -- South African place names and surnames in particular are a known gap we are still improving.

Licence and attribution

MIT (see LICENSE). This is a derivative work; the base model (OpenAI Whisper, Apache-2.0) and the training data (andreoosthuizen/afrikaans-30s, CC-BY-4.0; NCHLT afr/eng, CC-BY-3.0) are credited in NOTICE.
