WavCochCausalV64000100M

WavCoch is a causal waveform-to-cochleagram tokenizer by Greta Tuckute and Klemen Kotar.

Model Details

| Parameter | Value |
| --- | --- |
| Parameters | ~93.05M |
| Window Size | 1001 |
| Hop Length | 80 |
| Encoder Dim | 1536 |
| Vocabulary Size | 64000 |
| Includes Vocoder | False |
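
To make the numbers above more concrete, here is a rough back-of-the-envelope sketch. It assumes that Window Size and Hop Length are measured in samples, that the input is 16 kHz audio (the `sampling_rate` used in the usage example), and that the tokenizer emits one code per hop. The card confirms none of these, so treat the results as estimates rather than specifications.

```python
import math

# Values from the Model Details table. Units of "samples" are an assumption.
SAMPLE_RATE_HZ = 16_000  # sampling_rate from the usage example
WINDOW_SIZE = 1001       # assumed to be in samples
HOP_LENGTH = 80          # assumed to be in samples
VOCAB_SIZE = 64_000

# If one code is emitted per hop (an assumption), this is the token rate.
frames_per_second = SAMPLE_RATE_HZ / HOP_LENGTH

# Duration of audio covered by one analysis window, in milliseconds.
window_ms = 1000 * WINDOW_SIZE / SAMPLE_RATE_HZ

# Upper bound on the information carried by a single code.
bits_per_token = math.log2(VOCAB_SIZE)

print(f"{frames_per_second:.0f} frames/s, {window_ms:.2f} ms window, "
      f"{bits_per_token:.2f} bits/token")
```

Under these assumptions, one second of audio maps to roughly 200 codes, each drawn from the 64000-entry vocabulary.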

Usage

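from transformers import AutoModel

# trust_remote_code=True lets transformers run the custom model code hosted in the repo.
wavcoch = AutoModel.from_pretrained(
    "TuKoResearch/WavCochCausalV64000100M",
    trust_remote_code=True,
)

# waveform_tensor is your own audio tensor (not defined in this snippet).
# Discrete codes for the waveform.
codes = wavcoch.quantize(waveform_tensor)
# Cochleagram decoded from the codes.
coch = wavcoch.decode(codes)
# Embedding sequence for probing; hidden_states holds a single layer (see Notes).
embeddings = wavcoch(
    input_values=waveform_tensor,
    output_hidden_states=True,
    sampling_rate=16000,
).hidden_states[0]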

Notes

This repo contains the WavCoch tokenizer/autoencoder only, with no vocoder. Decoding codes yields a cochleagram; converting that back to an audio waveform requires a vocoder-enabled checkpoint.

When called with output_hidden_states=True, WavCoch exposes a single hidden-state layer: the post-FSQ projected embedding sequence used for direct probing.
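
For readers unfamiliar with the quantizer, the following is a simplified sketch of how a finite-scalar-quantization style codebook can map a small latent vector to a single integer code. It assumes "FSQ" here means finite scalar quantization. The `LEVELS` list is hypothetical: it was chosen only because its product equals the 64000 vocabulary size, and the card does not state the levels, latent size, or rounding scheme this checkpoint actually uses.

```python
import math

# Hypothetical per-dimension level counts (not taken from the model card).
# Their product is 8 * 8 * 8 * 5 * 5 * 5 = 64000, matching the vocabulary size.
LEVELS = [8, 8, 8, 5, 5, 5]


def quantize_dim(x: float, levels: int) -> int:
    """Squash x into [-1, 1] with tanh, then round to one of `levels` bins."""
    bounded = math.tanh(x)
    # Map [-1, 1] onto [0, levels - 1] and round to the nearest bin index.
    return round((bounded + 1) / 2 * (levels - 1))


def fsq_code(vector: list[float]) -> int:
    """Combine the per-dimension bin indices into one mixed-radix integer code."""
    code = 0
    for x, levels in zip(vector, LEVELS):
        code = code * levels + quantize_dim(x, levels)
    return code


print(math.prod(LEVELS))                 # size of the implied codebook
print(fsq_code([0.3, -1.2, 0.0, 2.0, -0.5, 0.1]))
```

The point of the sketch is only that the codebook is implicit: every combination of per-dimension bins is a valid code, so no learned lookup table is needed to enumerate the vocabulary.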
