# STT 1B EN/FR – Q4 WebGPU

Q4-quantized weights for kyutai/stt-1b-en_fr, packaged for in-browser inference via WASM + WebGPU.

Runs entirely in the browser – no server required. English and French, streaming, ~1B parameters.
## Files

| File | Size | Description |
|---|---|---|
| `stt-1b-en_fr-q4_0.gguf` | 531 MB | STT transformer weights (Q4_0 quantized) |
| `mimi-encoder-f16.safetensors` | 107 MB | Mimi audio codec encoder (f16) |
| `tokenizer.model` | 118 KB | SentencePiece tokenizer (32k vocab, EN+FR) |
## Usage
These weights are consumed by stt-web, a Rust/WASM + WebGPU speech-to-text engine built with Burn.
```js
import { SttClient } from './stt-client.js';

const stt = new SttClient({
  onTranscript: (text, isFinal) => console.log(text),
  onStatus: (text, ready) => console.log(text),
});

await stt.init();           // fetches model weights and prepares the engine
await stt.startRecording(); // starts the microphone and streams transcripts
```
Model weights are fetched from this repo automatically and cached by the browser.
## Requirements
- Chrome 113+ or Edge 113+ (WebGPU required)
- HTTPS (required for WebGPU)
- ~640 MB download on first load (cached afterward)
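Since WebGPU is a hard requirement, it is worth failing fast with a clear message rather than letting initialization error out mid-download. A minimal sketch of such a check follows; `webGpuSupport` is a name invented here, not part of the `stt-web` API, and it only tests for the presence of `navigator.gpu`:

```javascript
// Returns { ok, reason? } given a navigator-like object.
// Checking for the `gpu` property is the standard WebGPU feature test.
function webGpuSupport(nav) {
  if (!nav || !('gpu' in nav)) {
    return {
      ok: false,
      reason: 'WebGPU unavailable: use Chrome/Edge 113+ over HTTPS',
    };
  }
  return { ok: true };
}

// In the browser you would pass the real global:
//   const { ok, reason } = webGpuSupport(navigator);
```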
## Pipeline

```
Microphone → AudioWorklet (24 kHz mono)
  → Mimi codec [WASM, CPU] → 32 codebook tokens/frame at 12.5 Hz
  → STT transformer [WASM, WebGPU] → text tokens
  → SentencePiece detokenizer → transcript
```
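The two rates above pin down the frame size: at 24 kHz input and 12.5 Hz frames, each Mimi frame covers 1 920 samples, i.e. 80 ms of audio. The variable names below are just for the arithmetic:

```javascript
// Derived from the pipeline figures: 24 kHz input, 12.5 Hz frame rate.
const sampleRate = 24000;                       // Hz, AudioWorklet output
const frameRate = 12.5;                         // Hz, Mimi codec frames
const samplesPerFrame = sampleRate / frameRate; // 1920 samples per frame
const msPerFrame = 1000 / frameRate;            // 80 ms of audio per frame
```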
## Model Details
- Base model: kyutai/stt-1b-en_fr by Kyutai
- Architecture: Decoder-only transformer with delayed-streams modeling
- Parameters: ~1B (STT) + ~25M (Mimi codec encoder)
- Quantization: Q4_0 (4-bit) for STT transformer, f16 for Mimi codec
- Languages: English, French
- Streaming latency: ~500 ms text delay (6 frames at 12.5 Hz)
- License: CC-BY 4.0 (same as original)
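The stated latency follows directly from the frame figures: 6 frames at 12.5 Hz is 480 ms, which the card rounds to ~500 ms. As a one-line check:

```javascript
// Text delay implied by the model details above.
const frameRate = 12.5; // Hz, Mimi codec frame rate
const delayFrames = 6;  // frames of text delay
const delayMs = (delayFrames * 1000) / frameRate; // 480 ms, rounded to ~500 ms
```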
## Quantization
The STT transformer weights were quantized from f32 to Q4_0 using a custom GGUF packer. Dequantization happens on-GPU via WGSL compute shaders at inference time. The Mimi codec encoder is stored at f16 as it runs on CPU via WASM.
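For reference, the standard GGUF Q4_0 layout packs 32 weights into an 18-byte block: a 16-bit float scale followed by 16 bytes of packed 4-bit values, with each weight recovered as `scale * (q - 8)`. The CPU-side JS sketch below illustrates that layout; it is not the repo's WGSL shader, and assumes the llama.cpp convention that low nibbles hold elements 0–15 and high nibbles elements 16–31:

```javascript
// Decode an IEEE 754 half-precision float from its 16-bit integer encoding.
function f16ToF32(h) {
  const sign = (h & 0x8000) ? -1 : 1;
  const exp = (h >> 10) & 0x1f;
  const frac = h & 0x3ff;
  if (exp === 0) return sign * frac * 2 ** -24;        // subnormal
  if (exp === 31) return frac ? NaN : sign * Infinity; // inf / NaN
  return sign * (1 + frac / 1024) * 2 ** (exp - 15);
}

// Dequantize one Q4_0 block: 2-byte little-endian f16 scale,
// then 16 bytes packing 32 unsigned 4-bit values (offset by 8).
function dequantizeQ4_0Block(bytes /* Uint8Array, length 18 */) {
  const scale = f16ToF32(bytes[0] | (bytes[1] << 8));
  const out = new Float32Array(32);
  for (let i = 0; i < 16; i++) {
    const b = bytes[2 + i];
    out[i]      = scale * ((b & 0x0f) - 8); // low nibble: elements 0..15
    out[i + 16] = scale * ((b >> 4) - 8);   // high nibble: elements 16..31
  }
  return out;
}
```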
## Citation
If you use this model, please cite the original authors:
```bibtex
@techreport{kyutai2024stt,
  author      = {Kyutai},
  title       = {Speech-To-Text models},
  institution = {Kyutai},
  year        = {2024},
  url         = {https://huggingface.co/kyutai/stt-1b-en_fr},
}
```
## Disclaimer
This is an independent port by idle intelligence, not affiliated with or endorsed by Kyutai Labs. Transcription quality may differ from the original PyTorch implementation due to quantization.