license: apache-2.0
tags:
  - audio
  - speech
  - tokenizer
  - vocoder
  - wavcoch
library_name: transformers

WavCochCausalV8192-vocoder

WavCoch is a causal waveform-to-cochleagram tokenizer by Greta Tuckute and Klemen Kotar.

Model Details

| Parameter | Value |
|---|---|
| Parameters | ~24.42M |
| Window Size | 1001 |
| Hop Length | 80 |
| Encoder Dim | 512 |
| Vocabulary Size | 8192 |
| Includes Vocoder | True |
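
As a rough guide to what these numbers imply, the sketch below derives the token rate and a nominal bitrate. It assumes that Hop Length and Window Size are measured in samples and that the input is 16 kHz audio (the `sampling_rate=16000` value from the usage example); neither unit is stated in the table, so treat the results as estimates.

```python
import math

# Values taken from the table and the usage example.
SAMPLE_RATE_HZ = 16_000  # sampling_rate=16000 in the usage example
HOP_LENGTH = 80          # assumed to be in samples
WINDOW_SIZE = 1001       # assumed to be in samples
VOCAB_SIZE = 8192

# One token per hop (assumption): tokens emitted per second of audio.
tokens_per_second = SAMPLE_RATE_HZ / HOP_LENGTH

# Duration covered by one hop and by one analysis window, in milliseconds.
ms_per_token = 1000 * HOP_LENGTH / SAMPLE_RATE_HZ
window_ms = 1000 * WINDOW_SIZE / SAMPLE_RATE_HZ

# Upper bound on information per token, and the resulting nominal bitrate.
bits_per_token = math.log2(VOCAB_SIZE)
bits_per_second = tokens_per_second * bits_per_token

print(f"{tokens_per_second:.0f} tokens/s, {ms_per_token:.1f} ms per token")
print(f"window spans {window_ms:.4f} ms")
print(f"{bits_per_token:.0f} bits/token, {bits_per_second:.0f} bits/s")
```

Under these assumptions the tokenizer emits 200 tokens per second, one every 5 ms, and each token carries at most 13 bits.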

Usage

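from transformers import AutoModel

# trust_remote_code=True lets transformers run the custom model code hosted in the repo.
wavcoch = AutoModel.from_pretrained(
    "TuKoResearch/WavCochCausalV8192-vocoder",
    trust_remote_code=True,
)

# waveform_tensor is not defined here: supply your own tensor of 16 kHz audio samples.

# Waveform -> discrete token ids (8192-entry vocabulary).
codes = wavcoch.quantize(waveform_tensor)
# Token ids -> cochleagram.
coch = wavcoch.decode(codes)

# Embeddings for probing: the single hidden-state layer (post-FSQ projection).
embeddings = wavcoch(
    input_values=waveform_tensor,
    output_hidden_states=True,
    sampling_rate=16000,
).hidden_states[0]

# Token ids -> waveform, using the bundled vocoder.
audio = wavcoch.decode_audio(codes)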

Notes

This repo includes a bundled vocoder and supports decode_audio(...) for end-to-end waveform synthesis.

When called with output_hidden_states=True, WavCoch exposes a single hidden-state layer: the post-FSQ projected embedding sequence used for direct probing.
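
Frame-level probing usually needs one label per embedding frame. The sketch below shows one way to build those labels from annotated intervals. It is not part of WavCoch: `frame_labels` is a hypothetical helper, and it assumes the embedding sequence has one frame per hop of 80 samples, with frame `i` starting at sample `i * 80`.

```python
HOP_LENGTH = 80  # from the Model Details table; assumed to be in samples


def frame_labels(num_frames, intervals, default=None):
    """Label each frame by the interval that contains its first sample.

    intervals: iterable of (start_sample, end_sample, label), end exclusive.
    Frames not covered by any interval keep the default label.
    """
    labels = [default] * num_frames
    for start, end, label in intervals:
        # First frame whose start sample is >= start (ceiling division).
        first = max(0, -(-start // HOP_LENGTH))
        # First frame whose start sample is >= end, clamped to the sequence.
        stop = min(num_frames, -(-end // HOP_LENGTH))
        for i in range(first, stop):
            labels[i] = label
    return labels


# Hypothetical annotation: "a" covers samples 0-159, "b" covers 160-399.
example = frame_labels(6, [(0, 160, "a"), (160, 400, "b")])
print(example)
```

The resulting list can be paired index by index with the embedding frames when fitting a probe, provided the one-frame-per-hop assumption holds for your input.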