The project adapts a SpeechT5 TTS backbone and injects **two conditioning signals**:

- **Fusion**: a trainable **StyleSpeakerFusion** merges both vectors into the **512‑D** `speaker_embeddings` tensor expected by SpeechT5 during generation. The official **SpeechT5 HiFi‑GAN** vocoder renders the waveform.
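The fusion step can be sketched as follows. This is an illustrative PyTorch module, not the repository's actual `StyleSpeakerFusion` implementation; the input dimensions (a 192‑D ECAPA‑TDNN speaker vector and a 128‑D style vector) are assumptions for the sketch, and only the 512‑D output matches what SpeechT5 requires:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class StyleSpeakerFusionSketch(nn.Module):
    """Illustrative fusion of a speaker vector and a style vector into the
    512-D `speaker_embeddings` tensor that SpeechT5 conditions on."""

    def __init__(self, speaker_dim: int = 192, style_dim: int = 128, out_dim: int = 512):
        super().__init__()
        self.proj = nn.Sequential(
            nn.Linear(speaker_dim + style_dim, out_dim),
            nn.Tanh(),
            nn.Linear(out_dim, out_dim),
        )

    def forward(self, speaker_vec: torch.Tensor, style_vec: torch.Tensor) -> torch.Tensor:
        # Concatenate the two conditioning vectors and project to 512-D.
        fused = self.proj(torch.cat([speaker_vec, style_vec], dim=-1))
        # SpeechT5 examples typically L2-normalize speaker embeddings.
        return F.normalize(fused, dim=-1)

fusion = StyleSpeakerFusionSketch()
emb = fusion(torch.randn(1, 192), torch.randn(1, 128))
print(emb.shape)  # torch.Size([1, 512])
```

The resulting tensor takes the place of a plain x‑vector in `SpeechT5ForTextToSpeech.generate_speech(input_ids, speaker_embeddings=emb, vocoder=vocoder)`, with the SpeechT5 HiFi‑GAN vocoder rendering the waveform.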
- **Developed by:** Amirhossein Yousefiramandi (GitHub: `amirhossein-yousefi`)
- **Model type:** TTS with emotion‑style transfer (recipe + training/inference code)
- **Language(s):** Primarily **English**
- **License:** Repository currently has **no LICENSE file**; treat code as “all rights reserved” unless the author adds a license. Base model licenses are listed in the **License** section below.

Use the [MLCO2 Impact calculator](https://mlco2.github.io/impact#compute) for your own estimates.

## Glossary
- **Style transfer (speech):** Conditioning TTS on reference audio to transfer prosodic/emotional characteristics.
- **Speaker embeddings:** Numeric vectors capturing speaker timbre (here from ECAPA‑TDNN).
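As a toy illustration of the speaker‑embedding idea: two embeddings of the same voice should score higher under cosine similarity than embeddings of different voices. The vectors below are synthetic stand‑ins, not real ECAPA‑TDNN output, and the 192‑D size is an assumption:

```python
import numpy as np

def cosine(a: np.ndarray, b: np.ndarray) -> float:
    """Cosine similarity, the usual comparison metric for speaker embeddings."""
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

rng = np.random.default_rng(0)
# Synthetic stand-ins for 192-D speaker embeddings (dimension assumed).
speaker_a = rng.normal(size=192)
speaker_a_again = speaker_a + rng.normal(scale=0.1, size=192)  # same voice, new utterance
speaker_b = rng.normal(size=192)                               # a different voice

same_pair = cosine(speaker_a, speaker_a_again)
cross_pair = cosine(speaker_a, speaker_b)
print(same_pair > cross_pair)  # the same-speaker pair scores higher
```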
- **SageMaker utilities:** The repo includes scripts for launching training jobs and deploying real‑time/async inference endpoints.
## Model Card Authors
- Repository & implementation: **Amirhossein Yousefiramandi** (`@amirhossein-yousefi`).