Add pipeline tag and link to paper

#1
by nielsr HF Staff - opened
Files changed (1)
  1. README.md +62 -46
README.md CHANGED
@@ -1,46 +1,62 @@
- ---
- license: mit
- ---
- # Model Card for SW2V
-
- *Reconstruct! Don't Encode: Self-Supervised Representation Reconstruction Loss for High-Intelligibility and Low-Latency Streaming Neural Audio Codec*
-
- SW2V is a pure Transformer decoder based speech representation model. This model is trained via distillation of [W2V-Bert-2.0](https://huggingface.co/facebook/w2v-bert-2.0)
-
- - **GitHub Repository:** [https://github.com/jhcodec843/jhcodec](https://github.com/jhcodec843/jhcodec)
- - **Demo:** [https://jhcodec843.github.io/jhcodec/](https://jhcodec843.github.io/jhcodec/)
- - **License:** MIT
-
- ## Model Details
-
- ### Model Description
-
- This is corresponding to the paper's SW2V model (60k).
- To ensure the performance Flash-Attention is required.
-
- ## Uses
-
- JHCodec can be used for research and practical applications that require lossy audio compression. It is particularly well-suited for streaming speech, compressing large audio datasets, and serving as a neural front-end for speech recognition or synthesis pipelines.
-
- ### Intended Use
-
- - Real-time low-latency audio codecs for speech-to-speech models
- - Research into neural codecs and generative modeling
- - Preprocessing for downstream speech and audio ML models
-
- ### Out-of-Scope Use
-
- - Any malicious, deceptive, or privacy-violating applications
-
- ## How to Get Started with JHCodec
-
- For programmatic usage, please refer to the [GitHub repository](https://github.com/jhcodec843/jhcodec) for installation, API documentation, and practical examples.
-
- ## Training Details
-
- Please refer to the GitHub repository README.
-
- ## Authors
-
- Anonymous, Submitted to Interspeech2026
-
+ ---
+ license: mit
+ pipeline_tag: audio-classification
+ ---
+
+ # Model Card for SW2V (60k)
+
+ SW2V is a pure Transformer decoder-based speech representation model introduced in the paper [Reconstruct! Don't Encode: Self-Supervised Representation Reconstruction Loss for High-Intelligibility and Low-Latency Streaming Neural Audio Codec](https://huggingface.co/papers/2603.05887).
+
+ This specific checkpoint (60k) is trained via distillation of [W2V-BERT 2.0](https://huggingface.co/facebook/w2v-bert-2.0).
+
+ - **GitHub Repository:** [https://github.com/jhcodec843/jhcodec](https://github.com/jhcodec843/jhcodec)
+ - **Demo:** [https://jhcodec843.github.io/jhcodec/](https://jhcodec843.github.io/jhcodec/)
+ - **License:** MIT
+
+ ## Model Details
+
+ ### Model Description
+
+ SW2V (Streaming wav2vec) is designed for high-intelligibility and low-latency speech representation. It utilizes **Self-Supervised Representation Reconstruction (SSRR)** loss, which fundamentally improves codec training by reconstructing distilled self-supervised representations from codec outputs.
+
+ To ensure optimal performance, **Flash-Attention** is required.
+
+ ## Uses
+
+ JHCodec and the SW2V extractor can be used for research and practical applications requiring lossy audio compression or high-quality speech representations.
+
+ ### Intended Use
+
+ - Real-time low-latency audio codecs for speech-to-speech models
+ - Research into neural codecs and generative modeling
+ - Preprocessing for downstream speech and audio ML models (e.g., ASR or TTS)
+
+ ## Sample Usage
+
+ The following snippet from the [official repository](https://github.com/jhcodec843/jhcodec) shows how to load data using the `AudioDataset` class:
+
+ ```python
+ from jhcodec.dataloader import AudioDataset, collate_fn
+ from torch.utils.data import DataLoader
+
+ dataset = AudioDataset(
+     audio_dir='./data',  # Path to your data
+     sample_rate=16000,
+     segment_duration=10.24,
+     training=True,
+     init_dataset=False,  # Use True to scan files initially (slow), or False to load from cache
+     cache_dir='cache_dir/dataloader/v9',  # Location of the cache
+     use_mel=False,  # Set True to also return Mel features
+ )
+ ```
+
+ ## Citation
+
+ ```bibtex
+ @article{ssrr_codec2026,
+   title={Reconstruct! Don't Encode: Self-Supervised Representation Reconstruction Loss for High-Intelligibility and Low-Latency Streaming Neural Audio Codec},
+   author={Anonymous},
+   journal={arXiv preprint arXiv:2603.05887},
+   year={2026}
+ }
+ ```
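
The `AudioDataset` configuration in the new Sample Usage section fixes each training segment's length in samples. A quick pure-Python sanity check of that arithmetic (values taken from the snippet; the 20 ms frame hop used below is an assumption for illustration, not something the diff states):

```python
# Values copied from the AudioDataset configuration shown above.
sample_rate = 16000       # Hz
segment_duration = 10.24  # seconds

# Samples per training segment: 16000 * 10.24 = 163840 (= 2**15 * 5).
segment_samples = round(sample_rate * segment_duration)

# Assuming a 20 ms frame hop (a common choice, not stated in the repo snippet),
# each segment would yield an even 512 frames.
hop_samples = round(sample_rate * 0.020)  # 320 samples
frames_per_segment = segment_samples // hop_samples

print(segment_samples, frames_per_segment)  # 163840 512
```

The round numbers suggest `segment_duration=10.24` was chosen so segments divide evenly into fixed-stride frames, which is convenient for batched training.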