jhcodec/sw2v_60k



Add pipeline tag and link to paper

by nielsr HF Staff - opened Mar 9

base: refs/heads/main ← from: refs/pr/1


README.md +62 -46


@@ -1,46 +1,62 @@
----
-license: mit
----
-# Model Card for SW2V
-*Reconstruct! Don't Encode: Self-Supervised Representation Reconstruction Loss for High-Intelligibility and Low-Latency Streaming Neural Audio Codec*
-SW2V is a pure Transformer decoder based speech representation model. This model is trained via distillation of [W2V-Bert-2.0](https://huggingface.co/facebook/w2v-bert-2.0)
-- **GitHub Repository:** [https://github.com/jhcodec843/jhcodec](https://github.com/jhcodec843/jhcodec)
-- **Demo:** [https://jhcodec843.github.io/jhcodec/](https://jhcodec843.github.io/jhcodec/)
-- **License:** MIT
-## Model Details
-### Model Description
-This is corresponding to the paper's SW2V model (60k).
-To ensure the performance Flash-Attention is required.
-## Uses
-JHCodec can be used for research and practical applications that require lossy audio compression. It is particularly well-suited for streaming speech, compressing large audio datasets, and serving as a neural front-end for speech recognition or synthesis pipelines.
-### Intended Use
-- Real-time low-latency audio codecs for speech-to-speech models
-- Research into neural codecs and generative modeling
-- Preprocessing for downstream speech and audio ML models
-### Out-of-Scope Use
-- Any malicious, deceptive, or privacy-violating applications
-## How to Get Started with JHCodec
-For programmatic usage, please refer to the [GitHub repository](https://github.com/jhcodec843/jhcodec) for installation, API documentation, and practical examples.
-## Training Details
-Please refer to the GitHub repository README.
-## Authors
-Anonymous, Submitted to Interspeech2026

+---
+license: mit
+pipeline_tag: audio-classification
+---
+# Model Card for SW2V (60k)
+SW2V is a pure Transformer decoder-based speech representation model introduced in the paper [Reconstruct! Don't Encode: Self-Supervised Representation Reconstruction Loss for High-Intelligibility and Low-Latency Streaming Neural Audio Codec](https://huggingface.co/papers/2603.05887).
+This specific checkpoint (60k) is trained via distillation of [W2V-BERT 2.0](https://huggingface.co/facebook/w2v-bert-2.0).
+- **GitHub Repository:** [https://github.com/jhcodec843/jhcodec](https://github.com/jhcodec843/jhcodec)
+- **Demo:** [https://jhcodec843.github.io/jhcodec/](https://jhcodec843.github.io/jhcodec/)
+- **License:** MIT
+## Model Details
+### Model Description
+SW2V (Streaming wav2vec) is designed for high-intelligibility and low-latency speech representation. It utilizes **Self-Supervised Representation Reconstruction (SSRR)** loss, which fundamentally improves codec training by reconstructing distilled self-supervised representations from codec outputs.
+To ensure optimal performance, **Flash-Attention** is required.
+## Uses
+JHCodec and the SW2V extractor can be used for research and practical applications requiring lossy audio compression or high-quality speech representations.
+### Intended Use
+- Real-time low-latency audio codecs for speech-to-speech models
+- Research into neural codecs and generative modeling
+- Preprocessing for downstream speech and audio ML models (e.g., ASR or TTS)
+## Sample Usage
+The following snippet from the [official repository](https://github.com/jhcodec843/jhcodec) shows how to load data using the `AudioDataset` class:
+```python
+from jhcodec.dataloader import AudioDataset, collate_fn
+from torch.utils.data import DataLoader
+dataset = AudioDataset(
+    audio_dir='./data',                  # Path to your data
+    sample_rate=16000,
+    segment_duration=10.24,
+    training=True,
+    init_dataset=False,                  # Use True to scan files initially (slow), or False to load from cache
+    cache_dir='cache_dir/dataloader/v9', # location of the cache
+    use_mel=False,                       # Set True to return also Mel features
+)
+```
+## Citation
+```bibtex
+@article{ssrr_codec2026,
+  title={Reconstruct! Don't Encode: Self-Supervised Representation Reconstruction Loss for High-Intelligibility and Low-Latency Streaming Neural Audio Codec},
+  author={Anonymous},
+  journal={arXiv preprint arXiv:2603.05887},
+  year={2026}
+}
+```
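
As a reading aid for the Sample Usage snippet in the diff above, the sketch below works out what `sample_rate=16000` and `segment_duration=10.24` imply in samples. It uses only the Python standard library. Both helper functions are hypothetical and are not part of the `jhcodec` package, and the actual `AudioDataset` may crop, pad, or sample segments differently from the simple non-overlapping split assumed here.

```python
# Values copied from the Sample Usage snippet in the diff above.
SAMPLE_RATE = 16000        # Hz
SEGMENT_DURATION = 10.24   # seconds


def segment_length_in_samples(sample_rate: int, duration_s: float) -> int:
    """Number of samples in one fixed-length segment (hypothetical helper)."""
    # round() guards against floating-point error in the product.
    return round(sample_rate * duration_s)


def count_full_segments(num_samples: int, segment_len: int) -> int:
    """Non-overlapping full segments that fit in a clip (hypothetical helper)."""
    return num_samples // segment_len


segment_len = segment_length_in_samples(SAMPLE_RATE, SEGMENT_DURATION)
print(segment_len)  # 163840

# Assumption for illustration: a 60-second clip at 16 kHz (960000 samples).
print(count_full_segments(60 * SAMPLE_RATE, segment_len))  # 5
```

Under these assumptions each training segment holds 163840 samples, and a one-minute clip yields five full segments with a remainder shorter than one segment.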