update readme
README.md
CHANGED
@@ -27,12 +27,14 @@ When a single piece of audio needs to **sound like a real person**, **pronounce
 
 | Model | Architecture | Size | Model Card | Hugging Face |
 |---|---|---:|---|---|
-| **MOSS-TTS** | MossTTSDelay | 8B | [moss_tts_model_card.md](https://github.com/OpenMOSS/MOSS-TTS/blob/main/moss_tts_model_card.md) | 🤗 [Huggingface](https://huggingface.co/OpenMOSS-Team/MOSS-TTS) |
-| | MossTTSLocal | 1.7B | [moss_tts_model_card.md](https://github.com/OpenMOSS/MOSS-TTS/blob/main/moss_tts_model_card.md) | 🤗 [Huggingface](https://huggingface.co/OpenMOSS-Team/MOSS-TTS-Local-Transformer) |
-| **MOSS‑TTSD‑V1.0** | MossTTSDelay | 8B | [moss_ttsd_model_card.md](https://github.com/OpenMOSS/MOSS-TTS/blob/main/moss_ttsd_model_card.md) | 🤗 [Huggingface](https://huggingface.co/OpenMOSS-Team/MOSS-TTSD-v1.0) |
-| **MOSS‑VoiceGenerator** | MossTTSDelay | 1.7B | [moss_voice_generator_model_card.md](https://github.com/OpenMOSS/MOSS-TTS/blob/main/moss_voice_generator_model_card.md) | 🤗 [Huggingface](https://huggingface.co/OpenMOSS-Team/MOSS-Voice-Generator) |
-| **MOSS‑SoundEffect** | MossTTSDelay | 8B | [moss_sound_effect_model_card.md](https://github.com/OpenMOSS/MOSS-TTS/blob/main/moss_sound_effect_model_card.md) | 🤗 [Huggingface](https://huggingface.co/OpenMOSS-Team/MOSS-SoundEffect) |
-| **MOSS‑TTS‑Realtime** | MossTTSRealtime | 1.7B | [moss_tts_realtime_model_card.md](https://github.com/OpenMOSS/MOSS-TTS/blob/main/moss_tts_realtime_model_card.md) | 🤗 [Huggingface](https://huggingface.co/OpenMOSS-Team/MOSS-TTS-Realtime) |
 
 # MOSS Voice Generator Model Card
 
@@ -66,6 +68,48 @@ When a single piece of audio needs to **sound like a real person**, **pronounce
 
 ## 2. Quick Start
 
 ```python
 import os
 from pathlib import Path
 
 | Model | Architecture | Size | Model Card | Hugging Face |
 |---|---|---:|---|---|
+| **MOSS-TTS** | MossTTSDelay | 8B | [moss_tts_model_card.md](https://github.com/OpenMOSS/MOSS-TTS/blob/main/docs/moss_tts_model_card.md) | 🤗 [Huggingface](https://huggingface.co/OpenMOSS-Team/MOSS-TTS) |
+| | MossTTSLocal | 1.7B | [moss_tts_model_card.md](https://github.com/OpenMOSS/MOSS-TTS/blob/main/docs/moss_tts_model_card.md) | 🤗 [Huggingface](https://huggingface.co/OpenMOSS-Team/MOSS-TTS-Local-Transformer) |
+| **MOSS‑TTSD‑V1.0** | MossTTSDelay | 8B | [moss_ttsd_model_card.md](https://github.com/OpenMOSS/MOSS-TTS/blob/main/docs/moss_ttsd_model_card.md) | 🤗 [Huggingface](https://huggingface.co/OpenMOSS-Team/MOSS-TTSD-v1.0) |
+| **MOSS‑VoiceGenerator** | MossTTSDelay | 1.7B | [moss_voice_generator_model_card.md](https://github.com/OpenMOSS/MOSS-TTS/blob/main/docs/moss_voice_generator_model_card.md) | 🤗 [Huggingface](https://huggingface.co/OpenMOSS-Team/MOSS-Voice-Generator) |
+| **MOSS‑SoundEffect** | MossTTSDelay | 8B | [moss_sound_effect_model_card.md](https://github.com/OpenMOSS/MOSS-TTS/blob/main/docs/moss_sound_effect_model_card.md) | 🤗 [Huggingface](https://huggingface.co/OpenMOSS-Team/MOSS-SoundEffect) |
+| **MOSS‑TTS‑Realtime** | MossTTSRealtime | 1.7B | [moss_tts_realtime_model_card.md](https://github.com/OpenMOSS/MOSS-TTS/blob/main/docs/moss_tts_realtime_model_card.md) | 🤗 [Huggingface](https://huggingface.co/OpenMOSS-Team/MOSS-TTS-Realtime) |
+
+
 
 # MOSS Voice Generator Model Card
 
 
 ## 2. Quick Start
 
+
+
+### Environment Setup
+
+We recommend a clean, isolated Python environment with **Transformers 5.0.0** to avoid dependency conflicts.
+
+```bash
+conda create -n moss-tts python=3.12 -y
+conda activate moss-tts
+```
+
+Install all required dependencies:
+
+```bash
+git clone https://github.com/OpenMOSS/MOSS-TTS.git
+cd MOSS-TTS
+pip install --extra-index-url https://download.pytorch.org/whl/cu128 -e .
+```
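
After the install step, it can be useful to confirm which package versions actually landed in the environment. The following is a minimal sketch using only the Python standard library. The helper name `installed_version` is hypothetical and not part of this repository, and the list of distributions to check is an assumption based on the packages this section mentions.

```python
from importlib.metadata import PackageNotFoundError, version
from typing import Optional


def installed_version(dist_name: str) -> Optional[str]:
    """Return the installed version of a distribution, or None if it is absent."""
    try:
        return version(dist_name)
    except PackageNotFoundError:
        return None


if __name__ == "__main__":
    # Distribution names are assumptions; adjust them to whatever pyproject.toml lists.
    for name in ("transformers", "torch", "torchaudio"):
        print(f"{name}: {installed_version(name) or 'not installed'}")
```

Run inside the activated `moss-tts` environment, this prints one line per package, which makes a mismatched or missing install easy to spot before loading a model.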
+
+#### (Optional) Install FlashAttention 2
+
+For better speed and lower GPU memory usage, you can install FlashAttention 2 if your hardware supports it.
+
+```bash
+pip install --extra-index-url https://download.pytorch.org/whl/cu128 -e ".[flash-attn]"
+```
+
+If your machine has limited RAM and many CPU cores, you can cap build parallelism:
+
+```bash
+MAX_JOBS=4 pip install --extra-index-url https://download.pytorch.org/whl/cu128 -e ".[flash-attn]"
+```
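
One way to choose a `MAX_JOBS` value is to bound it by both the CPU count and the available memory. The sketch below is illustrative only: `suggest_max_jobs` is a hypothetical helper, and the default of 4 GiB of RAM per parallel compile job is an assumed figure, not one taken from this README or from the FlashAttention project.

```python
import os
from typing import Optional


def suggest_max_jobs(
    ram_gib: float,
    cpu_count: Optional[int] = None,
    gib_per_job: float = 4.0,  # assumed memory budget per compile job
) -> int:
    """Suggest a build parallelism cap limited by CPU count and RAM."""
    cpus = cpu_count if cpu_count is not None else (os.cpu_count() or 1)
    by_memory = int(ram_gib // gib_per_job)
    # Never go below one job, even on very small machines.
    return max(1, min(cpus, by_memory))


if __name__ == "__main__":
    # A machine with 16 GiB of RAM and 32 cores is memory-bound here.
    print(suggest_max_jobs(ram_gib=16, cpu_count=32))  # 4
```

The result can be exported as `MAX_JOBS` before running the `pip install` command above. If the build still runs out of memory, lowering the value further is the safer choice.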
+
+Notes:
+- Dependencies are managed in `pyproject.toml`, which currently pins `torch==2.9.1+cu128` and `torchaudio==2.9.1+cu128`.
+- If FlashAttention 2 fails to build on your machine, you can skip it and use the default attention backend.
+- FlashAttention 2 is only available on supported GPUs and is typically used with `torch.float16` or `torch.bfloat16`.
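
The fallback described in the notes can be sketched as a small selection function. This assumes the model is loaded through Transformers with an `attn_implementation` argument, which this section does not show. `pick_attn_implementation` is a hypothetical helper, and `"sdpa"` is used here as a stand-in for "the default attention backend".

```python
import importlib.util


def pick_attn_implementation(dtype_name: str, cuda_available: bool) -> str:
    """Return "flash_attention_2" only when its preconditions appear to hold."""
    # find_spec returns None when the flash_attn package is not installed.
    has_flash_attn = importlib.util.find_spec("flash_attn") is not None
    half_precision = dtype_name in ("float16", "bfloat16")
    if has_flash_attn and cuda_available and half_precision:
        return "flash_attention_2"
    # Assumed fallback; substitute whichever backend your setup treats as default.
    return "sdpa"


if __name__ == "__main__":
    print(pick_attn_implementation("bfloat16", cuda_available=False))
```

The returned string could then be passed as `attn_implementation=` when loading the model, provided the loading code accepts that argument. This check does not verify that the GPU itself is supported by FlashAttention 2.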
+
+
+### Basic Usage
+
+
 ```python
 import os
 from pathlib import Path