mispeech/dashengtokenizer

Heinrich Dinkel committed 19 days ago

Commit f8e3b40 · 1 Parent(s): 3f1e105

Added README

Files changed (1)

README.md +7 -0

@@ -12,6 +12,13 @@ license: apache-2.0
 DashengTokenizer is a high-performance continuous audio tokenizer designed for audio understanding and generation tasks.
 Compared to previous works, our framework simply trains a single linear layer to enable audio generation for semantically strong encoders.
+Achievements:
+* State-of-the-Art Audio Understanding: DashengTokenizer consistently outperforms most previous self-supervised and supervised audio encoders.
+* High-Fidelity Signal Reconstruction: Maintains exceptional signal integrity, ensuring that audio remains crisp and accurate after processing.
+* Accelerated Audio Generation Training: Achieves optimal performance significantly faster than standard VAE models, reducing training time and costs.
+* Superior Speech Enhancement: Provides a more robust encoding foundation for isolating and clarifying speech in noisy environments.
 ![Framework](./figures/framework.png)