gretatuckute committed
Commit e86c976 · verified · 1 Parent(s): f2a8343

Update README.md

Files changed (1): README.md (+4 −4)
README.md CHANGED
@@ -16,7 +16,7 @@ pretty_name: WavCoch (8192-code speech tokenizer)
 
 # WavCochV8192 — 8,192-code speech tokenizer (cochlear tokens)
 
-**WavCochV8192** is a biologically-inspired, learned **audio quantizer** that maps a raw waveform to **discrete "cochlear tokens".** It is used as the tokenizer for the AuriStream autoregressive Speech Language Model (e.g., [TuKoResearch/AuriStream1B_librilight_ckpt500k](https://huggingface.co/TuKoResearch/AuriStream1B_librilight_ckpt500k)). The model is trained on LibriSpeech960 and encodes audio into a time–frequency representation and reads out **8,192-way discrete codes** through a low-bit latent bottleneck (LFQ). These tokens can be fed to a transformer LM for **representation learning** and **next-token prediction** (speech continuation).
+**WavCochV8192** is a biologically-inspired, learned **audio quantizer** that maps a raw waveform to **discrete "cochlear tokens".** It is used as the tokenizer for the AuriStream autoregressive speech/language model (e.g., [TuKoResearch/AuriStream1B_librilight_ckpt500k](https://huggingface.co/TuKoResearch/AuriStream1B_librilight_ckpt500k)). The model is trained on LibriSpeech960 and encodes audio into a time–frequency representation ([Cochleagram](https://github.com/jenellefeather/chcochleagram)) and reads out **8,192-way discrete codes** through a low-bit latent bottleneck (LFQ). These tokens can be fed to a transformer LM for **representation learning** and **next-token prediction** (speech continuation).
 
 > **API at a glance**
 > - **Input:** mono waveform at 16 kHz (pytorch tensor float32), shape **(B, 1, T)**
@@ -33,7 +33,7 @@ pip install -U torch torchaudio transformers
 
 ---
 
-## Quickstart — Quantize a WAV into cochlear tokens
+## Quickstart — Quantize a waveform into cochlear tokens
 
 ```python
 import torch, torchaudio
@@ -65,8 +65,8 @@ print("Token IDs shape:", token_ids.shape)
 ---
 
 ## Intended uses & limitations
-- **Uses:** tokenization for speech LM training; compact storage/streaming of speech as discrete IDs in a biologically plausible manner.
-- **Limitations:** trained only on spoken english, so might not perform as well for other languages and non-speech sounds.
+- **Uses:** tokenization for speech LM training; compact storage/streaming of speech as discrete IDs, loosely inspired by human biology.
+- **Limitations:** trained only on spoken English, so might not perform as well for other languages and non-speech sounds.
 
 ---
72