# WavCochV8192 — 8,192-code speech tokenizer (cochlear tokens)
**WavCochV8192** is a biologically inspired, learned **audio quantizer** that maps a raw waveform to **discrete "cochlear tokens"**. It serves as the tokenizer for the AuriStream autoregressive speech/language model (e.g., [TuKoResearch/AuriStream1B_librilight_ckpt500k](https://huggingface.co/TuKoResearch/AuriStream1B_librilight_ckpt500k)). Trained on LibriSpeech960, the model encodes audio into a time–frequency representation (a [Cochleagram](https://github.com/jenellefeather/chcochleagram)) and reads out **8,192-way discrete codes** through a low-bit latent bottleneck (lookup-free quantization, LFQ). These tokens can be fed to a transformer LM for **representation learning** and **next-token prediction** (speech continuation).
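The LFQ bottleneck can be pictured with a toy sketch. This illustrates the general lookup-free-quantization idea, not WavCochV8192's actual code: since 8,192 = 2^13, binarizing a 13-dimensional latent by sign yields a bit pattern that indexes every code.

```python
import torch

def lfq_index(latent: torch.Tensor) -> torch.Tensor:
    """Map a (..., 13)-dim latent to a code index in [0, 8192) via sign bits.

    Schematic LFQ: each latent dimension contributes one bit (its sign),
    so a 13-bit pattern selects one of 2**13 = 8,192 codebook entries.
    """
    bits = (latent > 0).long()                    # (..., 13) binary pattern
    weights = 2 ** torch.arange(bits.shape[-1])   # 1, 2, 4, ..., 4096
    return (bits * weights).sum(dim=-1)           # one integer code per frame

latents = torch.randn(4, 13)   # e.g., 4 frames of 13-dim bottleneck latents
ids = lfq_index(latents)
print(ids.shape, int(ids.max()) < 8192)   # torch.Size([4]) True
```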
> **API at a glance**
> - **Input:** mono waveform at 16 kHz (PyTorch `float32` tensor), shape **(B, 1, T)**
> - **Output:** discrete token IDs drawn from an 8,192-entry codebook
---
## Quickstart — Quantize a waveform into cochlear tokens
```python
# Dependencies: pip install -U torch torchaudio transformers
import torch, torchaudio
from transformers import AutoModel

# NOTE: the body of this quickstart is a sketch; the repo id and the
# forward-call signature below are assumptions, not the verified API.
model = AutoModel.from_pretrained(
    "TuKoResearch/WavCochV8192", trust_remote_code=True
).eval()

# Load a waveform and shape it to the expected mono 16 kHz (B, 1, T) input.
wav, sr = torchaudio.load("speech.wav")
wav = wav.mean(dim=0, keepdim=True)                   # mix down to mono
wav = torchaudio.functional.resample(wav, sr, 16000)  # resample to 16 kHz
wav = wav.unsqueeze(0)                                # (1, 1, T)

with torch.no_grad():
    token_ids = model(wav)  # discrete cochlear token IDs

print("Token IDs shape:", token_ids.shape)
```
---
## Intended uses & limitations
- **Uses:** tokenization for speech LM training; compact storage/streaming of speech as discrete IDs. The representation is loosely inspired by human biology.
- **Limitations:** trained only on spoken English, so it may perform worse on other languages and on non-speech audio.
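Because each token ID fits in 13 bits (8,192 = 2^13), sequences can be stored compactly for the storage/streaming use above. A small sketch, where the file name and the choice of `uint16` are illustrative, not part of the model:

```python
import numpy as np

token_ids = np.array([17, 4095, 8191, 0, 2048])  # example cochlear token IDs
assert token_ids.max() < 8192                    # every ID fits in 13 bits

# uint16 is the smallest standard dtype that holds 13-bit IDs:
# half the footprint of int32, a quarter of int64.
compact = token_ids.astype(np.uint16)
compact.tofile("utterance.tokens")               # 2 bytes per token on disk

restored = np.fromfile("utterance.tokens", dtype=np.uint16)
print(np.array_equal(restored, token_ids))       # True
```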
---