Add pipeline tag and link to paper

#1
by nielsr HF Staff - opened
Files changed (1)
  1. README.md +71 -46
README.md CHANGED
@@ -1,46 +1,71 @@
- ---
- license: mit
- ---
- # Model Card for SW2V-120k
-
- *Reconstruct! Don't Encode: Self-Supervised Representation Reconstruction Loss for High-Intelligibility and Low-Latency Streaming Neural Audio Codec*
-
- SW2V is a pure Transformer decoder based speech representation model. This model is trained via distillation of [W2V-Bert-2.0](https://huggingface.co/facebook/w2v-bert-2.0)
-
- - **GitHub Repository:** [https://github.com/jhcodec843/jhcodec](https://github.com/jhcodec843/jhcodec)
- - **Demo:** [https://jhcodec843.github.io/jhcodec/](https://jhcodec843.github.io/jhcodec/)
- - **License:** MIT
-
- ## Model Details
-
- ### Model Description
-
- To enhance noise robustness for future applications, we incorporated noise augmentation during SW2V training.
- To ensure the performance Flash-Attention is required.
-
- ## Uses
-
- JHCodec can be used for research and practical applications that require lossy audio compression. It is particularly well-suited for streaming speech, compressing large audio datasets, and serving as a neural front-end for speech recognition or synthesis pipelines.
-
- ### Intended Use
-
- - Real-time low-latency audio codecs for speech-to-speech models
- - Research into neural codecs and generative modeling
- - Preprocessing for downstream speech and audio ML models
-
- ### Out-of-Scope Use
-
- - Any malicious, deceptive, or privacy-violating applications
-
- ## How to Get Started with JHCodec
-
- For programmatic usage, please refer to the [GitHub repository](https://github.com/jhcodec843/jhcodec) for installation, API documentation, and practical examples.
-
- ## Training Details
-
- Please refer to the GitHub repository README.
-
- ## Authors
-
- Anonymous, Submitted to Interspeech2026
-
+ ---
+ license: mit
+ pipeline_tag: audio-classification
+ ---
+
+ # Model Card for SW2V-120k
+
+ SW2V (Streaming Speech-to-Vector) is a pure Transformer decoder-based speech representation model. This specific checkpoint (120k) is trained with noise augmentation to enhance robustness for real-world speech applications.
+
+ The model was introduced in the paper [Reconstruct! Don't Encode: Self-Supervised Representation Reconstruction Loss for High-Intelligibility and Low-Latency Streaming Neural Audio Codec](https://huggingface.co/papers/2603.05887).
+
+ - **GitHub Repository:** [https://github.com/jhcodec843/jhcodec](https://github.com/jhcodec843/jhcodec)
+ - **Demo:** [https://jhcodec843.github.io/jhcodec/](https://jhcodec843.github.io/jhcodec/)
+ - **License:** MIT
+
+ ## Model Details
+
+ ### Model Description
+
+ SW2V-120k is a streaming speech representation extractor trained via distillation of [W2V-Bert-2.0](https://huggingface.co/facebook/w2v-bert-2.0). It leverages a self-supervised representation reconstruction (SSRR) loss to improve codec training, ensuring high intelligibility and content preservation with zero lookahead. This variant incorporates noise augmentation during training for improved performance in noisy environments.
+
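The SSRR loss itself is specified in the paper; purely as an illustrative sketch (hypothetical names, not the official implementation), a representation reconstruction objective against frozen teacher features can be written as a mean-squared error between time-aligned feature sequences:

```python
import numpy as np

def reconstruction_loss(student_feats, teacher_feats):
    """MSE between student-reconstructed and frozen-teacher feature
    sequences of shape (time, dim). Hypothetical sketch of an
    SSRR-style distillation target, not the official loss."""
    assert student_feats.shape == teacher_feats.shape
    return float(np.mean((student_feats - teacher_feats) ** 2))

rng = np.random.default_rng(0)
teacher = rng.standard_normal((50, 1024))  # stand-in for W2V-Bert-2.0 features
student = teacher + 0.1 * rng.standard_normal((50, 1024))

print(reconstruction_loss(teacher, teacher))  # 0.0 for a perfect reconstruction
print(reconstruction_loss(student, teacher))  # small positive noise floor
```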
+ **Note:** Flash-Attention is required for optimal performance.
+
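"Zero lookahead" means each frame's representation depends only on the current and past frames. As a generic illustration of this streaming constraint (not the model's actual attention code), a causal attention mask looks like:

```python
import numpy as np

def causal_mask(t):
    """Zero-lookahead (causal) attention mask: position i may attend
    only to positions <= i. Generic sketch of the streaming constraint."""
    return np.tril(np.ones((t, t), dtype=bool))

m = causal_mask(4)
print(m.astype(int))  # lower-triangular: no position sees the future
```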
+ ## Uses
+
+ JHCodec and SW2V can be used for research and practical applications requiring:
+ - Real-time low-latency audio codecs for speech-to-speech models.
+ - Neural front-ends for speech recognition or synthesis pipelines.
+ - Lossy audio compression and speech representation extraction.
+
+ ### Out-of-Scope Use
+
+ - Any malicious, deceptive, or privacy-violating applications.
+
+ ## How to Get Started
+
+ For programmatic usage, please refer to the [GitHub repository](https://github.com/jhcodec843/jhcodec) for installation and environment setup.
+
+ ### Sample Usage
+
+ You can use the `AudioDataset` class from the official implementation to load data for the model:
+
+ ```python
+ from jhcodec.dataloader import AudioDataset, collate_fn
+ from torch.utils.data import DataLoader
+
+ dataset = AudioDataset(
+     audio_dir='./data',                   # Path to your data
+     sample_rate=16000,
+     segment_duration=10.24,
+     training=True,
+     init_dataset=False,                   # True to scan files on first run (slow); False to load from cache
+     cache_dir='cache_dir/dataloader/v9',  # Location of the cache
+     use_mel=False,                        # Set True to also return Mel features
+ )
+ ```
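The imported `collate_fn` batches variable-length items; its exact behavior is documented in the repository. As a generic, self-contained illustration of what such a collate step typically does (zero-padding waveforms to the longest item in the batch), here is a NumPy sketch, not the repository's implementation:

```python
import numpy as np

def pad_collate(batch):
    """Zero-pad a list of 1-D waveforms to the longest length, stack them
    into a (batch, max_len) array, and return the original lengths.
    Generic sketch, not jhcodec's collate_fn."""
    lengths = np.array([len(x) for x in batch])
    padded = np.zeros((len(batch), lengths.max()), dtype=np.float32)
    for i, x in enumerate(batch):
        padded[i, : len(x)] = x
    return padded, lengths

waves = [np.ones(16000, dtype=np.float32), np.ones(8000, dtype=np.float32)]
batch, lengths = pad_collate(waves)
print(batch.shape)       # (2, 16000)
print(lengths.tolist())  # [16000, 8000]
```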
+
+ ## Citation
+
+ ```bibtex
+ @article{sw2v2026ssrr,
+   title={Reconstruct! Don't Encode: Self-Supervised Representation Reconstruction Loss for High-Intelligibility and Low-Latency Streaming Neural Audio Codec},
+   author={Anonymous},
+   journal={arXiv preprint arXiv:2603.05887},
+   year={2026}
+ }
+ ```
+
+ ## Authors
+
+ Anonymous, Submitted to Interspeech 2026.