Update README.md
README.md
CHANGED
@@ -18,6 +18,10 @@ This repository contains the **Swaram (mal)** text-to-speech (TTS) model checkpo
Swaram's text encoder is built on top of the **Wav2Vec2 decoder**, and a **VAE** is used as the decoder. A **flow-based module**, composed of the **Transformer-based Contextualizer** and cascaded dense layers, predicts **spectrogram-based acoustic features**. A stack of **transposed convolutional layers** then transforms the spectrogram into a speech waveform. To capture the one-to-many nature of TTS, where the same text can be spoken in multiple ways, the model also includes a stochastic duration predictor, which allows varied speech rhythms from the same text input.
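
To make the transposed-convolution step concrete, here is a minimal sketch of how a stack of such layers stretches a spectrogram of `num_frames` frames into a longer waveform. It uses only the standard output-length formula for a 1-D transposed convolution (dilation 1, no output padding). The `(kernel_size, stride)` pairs in `ASSUMED_LAYERS` and the helper names are hypothetical illustrations, not values read from the Swaram checkpoint.

```python
def conv_transpose1d_length(length_in, kernel_size, stride, padding):
    """Output length of a 1-D transposed convolution (dilation 1, no output padding)."""
    return (length_in - 1) * stride - 2 * padding + kernel_size


def upsampled_length(num_frames, layers):
    """Apply the length formula layer by layer over (kernel_size, stride) pairs."""
    length = num_frames
    for kernel_size, stride in layers:
        # With kernel_size - stride even, this padding makes the output exactly length * stride.
        padding = (kernel_size - stride) // 2
        length = conv_transpose1d_length(length, kernel_size, stride, padding)
    return length


# Hypothetical layer settings for illustration only; not taken from the Swaram config.
ASSUMED_LAYERS = [(16, 8), (16, 8), (4, 2), (4, 2)]

# Each spectrogram frame expands by the product of the strides (8 * 8 * 2 * 2 = 256).
num_samples = upsampled_length(100, ASSUMED_LAYERS)
print(num_samples)
```

Under these assumed settings, the number of waveform samples per spectrogram frame equals the product of the strides, which is why the stack can be read as a learned upsampler.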
+## Architecture
+
+
+
## Usage

```