ortal1602 committed
Commit 2a14d63 · verified · 1 Parent(s): 2fe5519

Update README.md

Files changed (1)
  1. README.md +9 -9
README.md CHANGED
@@ -1,4 +1,4 @@
- # 📘 PAST: Phonetic-Acoustic Speech Tokenizer
+ # PAST: Phonetic-Acoustic Speech Tokenizer

  **Authors:** Nadav Har-Tuv, Or Tal, Yossi Adi
  **Affiliation:** The Hebrew University of Jerusalem
@@ -8,7 +8,7 @@
  ![Schematic of the PAST pipeline. The auxiliary heads use the output of the first vector quantization module as input.](PAST_figure.png)


- 🧠 **Abstract:**
+ ## Abstract

  We present PAST, a novel end-to-end framework that jointly models phonetic information alongside signal reconstruction, eliminating the need for external pretrained models. Unlike previous approaches that rely on pretrained self-supervised models, PAST employs supervised phonetic data, directly integrating domain knowledge into the tokenization process via auxiliary tasks. Additionally, we introduce a streamable, causal variant of PAST, enabling real-time speech applications. Results demonstrate that PAST surpasses existing evaluated baseline tokenizers across common evaluation metrics, including phonetic representation and speech reconstruction. Notably, PAST also achieves superior performance when serving as a speech representation for speech language models, further highlighting its effectiveness as a foundation for spoken language generation.

@@ -26,7 +26,7 @@ Audio samples are available on our [project demo page](https://pages.cs.huji.ac.

  ## Usage

- ### 📥 Pre-requisites
+ ### Pre-requisites

  Install

@@ -47,7 +47,7 @@ conda activate past_env
  pip install -r requirements.txt
  ```

- ### 🚀 Inference
+ ### Inference

  ```python
  # ---------------
@@ -88,9 +88,9 @@ See [Eval README](https://github.com/slp-rl/PAST/eval_readme.md)

  ---

- ## 🧪 Results (from the paper)
+ ## Results (from the paper)

- ### 🧠 Phonetic Information
+ ### Phonetic Information

  | **Tokenizer** | **PNMI ↑** | **ABX ↓ Within** | **ABX ↓ Across** | **WER ↓ Clean** | **WER ↓ Other** |
  |------------------------|------------|------------------|------------------|------------------|------------------|
@@ -100,7 +100,7 @@ See [Eval README](https://github.com/slp-rl/PAST/eval_readme.md)
  | **PAST** | **0.75** | **2.82** | **3.54** | 15.7 | 36.8 |
  | **PAST - Streamable** | 0.74 | 3.05 | 3.89 | **14.3** | **32.3** |

- ### 🔊 Reconstruction Quality
+ ### Reconstruction Quality

  | **Tokenizer** | **SISNR ↑** | **VISQOL ↑** | **PESQ ↑** |
  |-------------------------|-------------|--------------|------------|
@@ -110,7 +110,7 @@ See [Eval README](https://github.com/slp-rl/PAST/eval_readme.md)
  | **PAST** | **4.84** | 4.40 | **3.55** |
  | **PAST - Streamable** | 3.90 | 4.37 | 3.40 |

- ### 📖 Speech Language Modeling (sWUGGY)
+ ### Speech Language Modeling (sWUGGY)

  | **Tokenizer** | **sWUGGY ↑ Inter** | **sWUGGY ↑ OOV** |
  |-------------------------|--------------------|------------------|
@@ -123,7 +123,7 @@ See [Eval README](https://github.com/slp-rl/PAST/eval_readme.md)

  ---

- ## 📝 Citation
+ ## Citation

  > If you use PAST in your work, please cite:
