TuKoResearch/WavCochV8192

Feature Extraction

WavCoch.WavCoch

gretatuckute committed on Aug 19, 2025

Commit 277ea3a · verified · 1 Parent(s): e86c976

Update README.md

Files changed (1)

README.md +1 -1

@@ -16,7 +16,7 @@ pretty_name: WavCoch (8192-code speech tokenizer)
 # WavCochV8192 — 8,192-code speech tokenizer (cochlear tokens)
-**WavCochV8192** is a biologically-inspired, learned **audio quantizer** that maps a raw waveform to **discrete "cochlear tokens".** It is used as the tokenizer for the AuriStream autoregressive speech/language model (e.g., [TuKoResearch/AuriStream1B_librilight_ckpt500k](https://huggingface.co/TuKoResearch/AuriStream1B_librilight_ckpt500k)). The model is trained on LibriSpeech960 and encodes audio into a time–frequency representation ([Cochleagram](https://github.com/jenellefeather/chcochleagram)) and reads out **8,192-way discrete codes** through a low-bit latent bottleneck (LFQ). These tokens can be fed to a transformer LM for **representation learning** and **next-token prediction** (speech continuation).
+**WavCochV8192** is a biologically-inspired, learned **audio quantizer** that maps a raw waveform to **discrete "cochlear tokens".** It is used as the tokenizer for the AuriStream autoregressive speech/language model (e.g., [TuKoResearch/AuriStream1B_librilight_ckpt500k](https://huggingface.co/TuKoResearch/AuriStream1B_librilight_ckpt500k)). The model is trained on LibriSpeech960 and encodes audio into a time–frequency representation ([Cochleagram; Feather et al., 2023 Nat Neuro](https://github.com/jenellefeather/chcochleagram)) and reads out **8,192-way discrete codes** through a low-bit latent bottleneck (LFQ). These tokens can be fed to a transformer LM for **representation learning** and **next-token prediction** (speech continuation).
 > **API at a glance**
 > - **Input:** mono waveform at 16 kHz (pytorch tensor float32), shape **(B, 1, T)**
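
The README text above mentions **8,192-way discrete codes** read out through a low-bit latent bottleneck (LFQ). As a rough illustration only: 8,192 = 2^13, so if the bottleneck is binary (each latent dimension quantized to its sign, as in lookup-free quantization), 13 dimensions are enough to index every code. The sketch below shows that mapping in plain Python. The function name `lfq_token`, the bit ordering (first dimension is the least significant bit), and the 13-dimensional latent are assumptions made for illustration; they are not taken from the WavCoch code.

```python
NUM_CODES = 8192                          # vocabulary size stated in the README
NUM_BITS = NUM_CODES.bit_length() - 1     # 13, because 8192 == 2 ** 13


def lfq_token(latent):
    """Map one latent vector to an integer token id (hypothetical helper).

    Assumption: binary LFQ, where each dimension is quantized to its sign
    and the resulting bits are read as a binary number, lowest bit first.
    """
    if len(latent) != NUM_BITS:
        raise ValueError(f"expected {NUM_BITS} latent values, got {len(latent)}")
    token = 0
    for i, value in enumerate(latent):
        bit = 1 if value > 0 else 0       # sign quantization: positive -> 1
        token |= bit << i                 # place the bit at position i
    return token


if __name__ == "__main__":
    all_negative = [-0.7] * NUM_BITS
    all_positive = [0.3] * NUM_BITS
    print(lfq_token(all_negative))        # lowest token id
    print(lfq_token(all_positive))        # highest token id
```

Under these assumptions, every token id falls in the range 0 to 8,191, which matches the vocabulary size a downstream transformer LM would need for next-token prediction.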