---
license: mit
pipeline_tag: audio-classification
---
# Model Card for SW2V (60k)
SW2V is a pure Transformer-decoder speech representation model introduced in the paper *Reconstruct! Don't Encode: Self-Supervised Representation Reconstruction Loss for High-Intelligibility and Low-Latency Streaming Neural Audio Codec*.
This specific checkpoint (60k) is trained via distillation of W2V-BERT 2.0.
- GitHub Repository: https://github.com/jhcodec843/jhcodec
- Demo: https://jhcodec843.github.io/jhcodec/
- License: MIT
## Model Details

### Model Description
SW2V (Streaming wav2vec) is designed for high-intelligibility and low-latency speech representation. It utilizes Self-Supervised Representation Reconstruction (SSRR) loss, which fundamentally improves codec training by reconstructing distilled self-supervised representations from codec outputs.
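The core idea behind an SSRR-style objective can be sketched as a regression loss between SSL-teacher features of the clean input and features recomputed from the codec reconstruction. Below is a minimal NumPy illustration; the frame-wise MSE form and the function name are assumptions for exposition, not the paper's exact formulation:

```python
import numpy as np

def ssrr_loss(teacher_feats: np.ndarray, student_feats: np.ndarray) -> float:
    """Mean-squared error between distilled SSL features (e.g. from
    W2V-BERT 2.0) of the clean audio and the same features recomputed
    from the codec output. Both arrays have shape (T, D):
    T frames by D feature dimensions."""
    assert teacher_feats.shape == student_feats.shape
    return float(np.mean((teacher_feats - student_feats) ** 2))

# Toy example: identical features yield zero loss.
T, D = 50, 1024
feats = np.random.randn(T, D)
print(ssrr_loss(feats, feats))  # 0.0
```

In practice the teacher features are frozen (distilled from W2V-BERT 2.0, per this checkpoint), so the gradient flows only through the codec branch.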
Flash-Attention is required for optimal performance.
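Flash-Attention is typically installed from PyPI in a CUDA-enabled PyTorch environment; the command below follows the flash-attn project's usual recommendation and is not taken from this repository:

```shell
pip install flash-attn --no-build-isolation
```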
## Uses
JHCodec and the SW2V extractor can be used for research and practical applications requiring lossy audio compression or high-quality speech representations.
### Intended Use
- Real-time low-latency audio codecs for speech-to-speech models
- Research into neural codecs and generative modeling
- Preprocessing for downstream speech and audio ML models (e.g., ASR or TTS)
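As a toy illustration of the preprocessing use case, frame-level speech representations (such as those an SW2V-style extractor produces; the random `(T, D)` array here is a stand-in, not real extractor output) can be mean-pooled into a single utterance vector for a downstream classifier:

```python
import numpy as np

def pool_utterance(frame_feats: np.ndarray) -> np.ndarray:
    """Mean-pool (T, D) frame-level features into one (D,) utterance
    embedding, suitable for a downstream model such as a linear probe."""
    return frame_feats.mean(axis=0)

frames = np.random.randn(120, 1024)  # stand-in for extractor output
embedding = pool_utterance(frames)
print(embedding.shape)  # (1024,)
```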
## Sample Usage
The following snippet from the official repository shows how to load data using the `AudioDataset` class and wrap it in a standard PyTorch `DataLoader` with the provided `collate_fn`:

```python
from jhcodec.dataloader import AudioDataset, collate_fn
from torch.utils.data import DataLoader

dataset = AudioDataset(
    audio_dir='./data',                   # Path to your data
    sample_rate=16000,
    segment_duration=10.24,
    training=True,
    init_dataset=False,                   # True: scan files initially (slow); False: load from cache
    cache_dir='cache_dir/dataloader/v9',  # Location of the cache
    use_mel=False,                        # True: also return Mel features
)

loader = DataLoader(dataset, batch_size=8, collate_fn=collate_fn)  # batch_size is illustrative
```
## Citation

```bibtex
@article{ssrr_codec2026,
  title={Reconstruct! Don't Encode: Self-Supervised Representation Reconstruction Loss for High-Intelligibility and Low-Latency Streaming Neural Audio Codec},
  author={Anonymous},
  journal={arXiv preprint arXiv:2603.05887},
  year={2026}
}
```