# WavCochCausalV64000100M
WavCoch is a causal waveform-to-cochleagram tokenizer by Greta Tuckute and Klemen Kotar.
## Model Details
| Parameter | Value |
|---|---|
| Parameters | ~93.05M |
| Window Size | 1001 |
| Hop Length | 80 |
| Encoder Dim | 1536 |
| Vocabulary Size | 64000 |
| Includes Vocoder | False |
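
The table values translate into frame timing with a little arithmetic. The sketch below assumes 16 kHz input (taken from `sampling_rate=16000` in the usage snippet) and assumes the tokenizer emits one code per hop; neither is stated in the table itself, so treat the results as rough estimates rather than a specification.

```python
import math

# Values copied from the Model Details table.
WINDOW_SIZE = 1001   # samples per analysis window
HOP_LENGTH = 80      # samples between consecutive frames
VOCAB_SIZE = 64000   # number of distinct codes

# Assumption: 16 kHz audio, as suggested by sampling_rate=16000 in the usage snippet.
SAMPLE_RATE = 16000

# Assumption: one code is produced per hop.
frames_per_second = SAMPLE_RATE / HOP_LENGTH      # frames (codes) per second of audio
hop_ms = 1000 * HOP_LENGTH / SAMPLE_RATE          # time step between frames, in ms
window_ms = 1000 * WINDOW_SIZE / SAMPLE_RATE      # span of one analysis window, in ms
bits_per_code = math.log2(VOCAB_SIZE)             # upper bound on information per code

print(f"{frames_per_second:.0f} frames/s, hop {hop_ms:.1f} ms, window {window_ms:.2f} ms")
print(f"at most {bits_per_code:.2f} bits per code")
```

Under these assumptions, each code covers a 5 ms step, and each analysis window spans roughly 62.6 ms of audio.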
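## Usage

    from transformers import AutoModel

    # trust_remote_code=True allows loading the custom model code hosted in the repo.
    wavcoch = AutoModel.from_pretrained(
        "TuKoResearch/WavCochCausalV64000100M",
        trust_remote_code=True,
    )

    # waveform_tensor is not defined here: supply your own audio tensor,
    # sampled at 16 kHz to match sampling_rate below.
    codes = wavcoch.quantize(waveform_tensor)  # waveform -> discrete codes
    coch = wavcoch.decode(codes)               # codes -> cochleagram

    # Continuous embeddings for probing.
    embeddings = wavcoch(
        input_values=waveform_tensor,
        output_hidden_states=True,
        sampling_rate=16000,
    ).hidden_states[0]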
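
The usage code leaves `waveform_tensor` undefined. As a minimal sketch of one way to prepare it, the helper below reads a mono 16-bit PCM WAV file with the Python standard library and verifies the 16 kHz rate. The function name `load_wav_mono_16k` is hypothetical, and the assumption that the model wants mono float samples in [-1, 1) is not confirmed by this card.

```python
import array
import sys
import wave


def load_wav_mono_16k(path, expected_rate=16000):
    """Read a mono 16-bit PCM WAV file; return (samples, rate).

    Samples are Python floats scaled to [-1, 1). Hypothetical helper:
    the input format WavCoch expects is an assumption, not documented here.
    """
    with wave.open(path, "rb") as wf:
        if wf.getnchannels() != 1:
            raise ValueError("expected mono audio")
        if wf.getsampwidth() != 2:
            raise ValueError("expected 16-bit PCM")
        rate = wf.getframerate()
        if rate != expected_rate:
            raise ValueError(f"expected {expected_rate} Hz, got {rate} Hz")
        raw = wf.readframes(wf.getnframes())

    pcm = array.array("h")       # signed 16-bit integers
    pcm.frombytes(raw)
    if sys.byteorder == "big":
        pcm.byteswap()           # WAV sample data is little-endian
    return [s / 32768.0 for s in pcm], rate
```

The returned list can then be wrapped in a tensor (for example with `torch.tensor(samples)`). The exact shape the model expects, such as whether a batch dimension is required, is not stated in this card and should be checked against the repository code.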
## Notes
This repo contains the WavCoch tokenizer/autoencoder only. Audio decoding requires a vocoder-enabled checkpoint.
When called with `output_hidden_states=True`, WavCoch exposes a single hidden-state layer: the post-FSQ projected embedding sequence used for direct probing.