slprl/PAST

ortal1602 committed on Jul 6, 2025

Commit 2a14d63 (verified) · 1 Parent(s): 2fe5519

Update README.md

README.md +9 -9

@@ -1,4 +1,4 @@
-# 📘 PAST: Phonetic-Acoustic Speech Tokenizer
+# PAST: Phonetic-Acoustic Speech Tokenizer
 **Authors:** Nadav Har-Tuv, Or Tal, Yossi Adi
 **Affiliation:** The Hebrew University of Jerusalem
@@ -8,7 +8,7 @@
 ![Schematic of the PAST pipeline. The auxiliary heads use the output of the first vector quantization module as input.](PAST_figure.png)
-🧠 **Abstract:**
+## Abstract
 We present PAST, a novel end-to-end framework that jointly models phonetic information alongside signal reconstruction, eliminating the need for external pretrained models. Unlike previous approaches that rely on pretrained self-supervised models, PAST employs supervised phonetic data, directly integrating domain knowledge into the tokenization process via auxiliary tasks. Additionally, we introduce a streamable, causal variant of PAST, enabling real-time speech applications. Results demonstrate that PAST surpasses existing evaluated baseline tokenizers across common evaluation metrics, including phonetic representation and speech reconstruction. Notably, PAST also achieves superior performance when serving as a speech representation for speech language models, further highlighting its effectiveness as a foundation for spoken language generation.
@@ -26,7 +26,7 @@ Audio samples are available on our [project demo page](https://pages.cs.huji.ac.
 ## Usage
-### 📥 Pre-requisites
+### Pre-requisites
 Install
@@ -47,7 +47,7 @@ conda activate past_env
 pip install -r requirements.txt
 ```
-### 🚀 Inference
+### Inference
 ```python
 # ---------------
@@ -88,9 +88,9 @@ See [Eval README](https://github.com/slp-rl/PAST/eval_readme.md)
 ---
-## 🧪 Results (from the paper)
+## Results (from the paper)
-### 🧠 Phonetic Information
+### Phonetic Information
 | **Tokenizer**          | **PNMI ↑** | **ABX ↓ Within** | **ABX ↓ Across** | **WER ↓ Clean** | **WER ↓ Other** |
 |------------------------|------------|------------------|------------------|------------------|------------------|
@@ -100,7 +100,7 @@ See [Eval README](https://github.com/slp-rl/PAST/eval_readme.md)
 | **PAST**               | **0.75**   | **2.82**         | **3.54**         | 15.7             | 36.8             |
 | **PAST - Streamable**  | 0.74       | 3.05             | 3.89             | **14.3**         |  **32.3**        |
-### 🔊 Reconstruction Quality
+### Reconstruction Quality
 | **Tokenizer**           | **SISNR ↑** | **VISQOL ↑** | **PESQ ↑** |
 |-------------------------|-------------|--------------|------------|
@@ -110,7 +110,7 @@ See [Eval README](https://github.com/slp-rl/PAST/eval_readme.md)
 | **PAST**                | **4.84**    | 4.40         | **3.55**   |
 | **PAST - Streamable**   | 3.90        | 4.37         | 3.40       |
-### 📖 Speech Language Modeling (sWUGGY)
+### Speech Language Modeling (sWUGGY)
 | **Tokenizer**           | **sWUGGY ↑ Inter** | **sWUGGY ↑ OOV** |
 |-------------------------|--------------------|------------------|
@@ -123,7 +123,7 @@ See [Eval README](https://github.com/slp-rl/PAST/eval_readme.md)
 ---
-## 📝 Citation
+## Citation
 > If you use PAST in your work, please cite: