STT 1B EN/FR β€” Q4 WebGPU

Q4-quantized weights for kyutai/stt-1b-en_fr, packaged for client-side browser inference via WASM + WebGPU.

Runs entirely in the browser β€” no server required. English + French, streaming, ~1B parameters.

Try the demo β†’

Files

| File | Size | Description |
|------|------|-------------|
| `stt-1b-en_fr-q4_0.gguf` | 531 MB | STT transformer weights (Q4_0 quantized) |
| `mimi-encoder-f16.safetensors` | 107 MB | Mimi audio codec encoder (f16) |
| `tokenizer.model` | 118 KB | SentencePiece tokenizer (32k vocab, EN+FR) |

Usage

These weights are consumed by stt-web, a Rust/WASM + WebGPU speech-to-text engine built with Burn.

```js
import { SttClient } from './stt-client.js';

const stt = new SttClient({
    onTranscript: (text, isFinal) => console.log(text),
    onStatus: (text, ready) => console.log(text),
});

await stt.init();
await stt.startRecording();
```

Model weights are fetched from this repo automatically and cached by the browser.

Requirements

  • Chrome 113+ or Edge 113+ (WebGPU required)
  • HTTPS (required for WebGPU)
  • ~640 MB download on first load (cached afterward)
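Since the full weight download is ~640 MB, it is worth probing for WebGPU before fetching anything. A minimal sketch of such a check; the function name and return shape here are illustrative, not part of the stt-web API:

```javascript
// Sketch: probe for WebGPU support before attempting to load the weights.
// `checkWebGPU` and its return shape are illustrative, not part of stt-web.
async function checkWebGPU(nav = globalThis.navigator) {
  if (!nav?.gpu) {
    // `navigator.gpu` is missing on non-Chromium browsers and on plain-HTTP pages.
    return { ok: false, reason: 'WebGPU unavailable (need Chrome/Edge 113+ over HTTPS)' };
  }
  const adapter = await nav.gpu.requestAdapter();
  if (!adapter) {
    // The API exists but no usable GPU adapter was found.
    return { ok: false, reason: 'No suitable GPU adapter' };
  }
  return { ok: true };
}
```

Calling this once at page load lets the app show a clear error message instead of failing mid-download.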

Pipeline

Microphone β†’ AudioWorklet (24kHz mono)
  β†’ Mimi codec [WASM, CPU] β†’ 32 codebook tokens/frame at 12.5Hz
    β†’ STT transformer [WASM, WebGPU] β†’ text tokens
      β†’ SentencePiece detokenizer β†’ transcript
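At 24 kHz with 12.5 frames/s, each Mimi frame covers 24000 / 12.5 = 1920 samples, while AudioWorklet callbacks deliver buffers of arbitrary size. The capture stage therefore has to re-chunk incoming audio into fixed 1920-sample frames; a sketch of that buffering (the class name is illustrative, not stt-web's actual implementation):

```javascript
// 24000 samples/s at 12.5 frames/s => 1920 samples per Mimi frame.
const SAMPLES_PER_FRAME = 24000 / 12.5;

// Illustrative re-chunker: accumulates arbitrary-sized Float32 buffers
// (as delivered by an AudioWorklet) and emits exact fixed-size frames.
class FrameChunker {
  constructor(frameSize = SAMPLES_PER_FRAME) {
    this.frameSize = frameSize;
    this.pending = new Float32Array(0); // leftover samples between calls
  }

  // Append new samples; return an array of complete frames (possibly empty).
  push(samples) {
    const merged = new Float32Array(this.pending.length + samples.length);
    merged.set(this.pending);
    merged.set(samples, this.pending.length);

    const frames = [];
    let offset = 0;
    while (merged.length - offset >= this.frameSize) {
      frames.push(merged.slice(offset, offset + this.frameSize));
      offset += this.frameSize;
    }
    this.pending = merged.slice(offset); // carry the remainder forward
    return frames;
  }
}
```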

Model Details

  • Base model: kyutai/stt-1b-en_fr by Kyutai
  • Architecture: Decoder-only transformer with delayed-streams modeling
  • Parameters: ~1B (STT) + ~25M (Mimi codec encoder)
  • Quantization: Q4_0 (4-bit) for STT transformer, f16 for Mimi codec
  • Languages: English, French
  • Streaming latency: ~500ms text delay (6 frames at 12.5Hz)
  • License: CC-BY 4.0 (same as original)

Quantization

The STT transformer weights were quantized from f32 to Q4_0 using a custom GGUF packer. Dequantization happens on-GPU via WGSL compute shaders at inference time. The Mimi codec encoder is stored at f16 as it runs on CPU via WASM.
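For reference, a standard GGUF Q4_0 block is 18 bytes: a little-endian f16 scale followed by 16 bytes holding 32 packed 4-bit weights, where each weight decodes as (nibble − 8) × scale. A CPU-side JavaScript sketch of the same decode the WGSL compute shaders perform on-GPU (function names are illustrative):

```javascript
// Decode an IEEE 754 half-precision value from its 16-bit representation.
function f16ToF32(h) {
  const sign = (h & 0x8000) ? -1 : 1;
  const exp = (h >> 10) & 0x1f;
  const frac = h & 0x3ff;
  if (exp === 0) return sign * frac * 2 ** -24;            // subnormal
  if (exp === 0x1f) return frac ? NaN : sign * Infinity;   // inf / NaN
  return sign * (1 + frac / 1024) * 2 ** (exp - 15);
}

// Dequantize one Q4_0 block (18 bytes => 32 f32 weights).
// Layout: bytes 0-1 = little-endian f16 scale, bytes 2-17 = packed nibbles,
// with low nibbles holding weights 0-15 and high nibbles weights 16-31.
function dequantizeQ4_0(block) {
  const d = f16ToF32(block[0] | (block[1] << 8));
  const out = new Float32Array(32);
  for (let i = 0; i < 16; i++) {
    const b = block[2 + i];
    out[i]      = ((b & 0x0f) - 8) * d; // low nibble
    out[i + 16] = ((b >> 4) - 8) * d;   // high nibble
  }
  return out;
}
```

On the GPU the same arithmetic runs per-block inside the matmul shaders, so the full f32 weight tensor is never materialized in memory.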

Citation

If you use this model, please cite the original authors:

@techreport{kyutai2024stt,
    author = {Kyutai},
    title = {Speech-To-Text models},
    institution = {Kyutai},
    year = {2024},
    url = {https://huggingface.co/kyutai/stt-1b-en_fr},
}

Disclaimer

This is an independent port by idle intelligence, not affiliated with or endorsed by Kyutai Labs. Transcription quality may differ from the original PyTorch implementation due to quantization.
