YWMditto commited on
Commit
0617044
·
1 Parent(s): d48f44d

update readme

Browse files
Files changed (1) hide show
  1. README.md +49 -7
README.md CHANGED
@@ -23,15 +23,15 @@ When a single piece of audio needs to **sound like a real person**, **pronounce
23
  - **MOSS‑TTS‑Realtime**: MOSS-TTS-Realtime is a context-aware, multi-turn streaming TTS model for real-time voice agents. By conditioning on dialogue history across both text and prior user acoustics, it delivers low-latency synthesis with coherent, consistent voice responses across turns.
24
 
25
  ## Released Models
26
-
27
  | Model | Architecture | Size | Model Card | Hugging Face |
28
  |---|---|---:|---|---|
29
- | **MOSS-TTS** | MossTTSDelay | 8B | [moss_tts_model_card.md](https://github.com/OpenMOSS/MOSS-TTS/blob/main/moss_tts_model_card.md) | 🤗 [Huggingface](https://huggingface.co/OpenMOSS-Team/MOSS-TTS) |
30
- | | MossTTSLocal | 1.7B | [moss_tts_model_card.md](https://github.com/OpenMOSS/MOSS-TTS/blob/main/moss_tts_model_card.md) | 🤗 [Huggingface](https://huggingface.co/OpenMOSS-Team/MOSS-TTS-Local-Transformer) |
31
- | **MOSS‑TTSD‑V1.0** | MossTTSDelay | 8B | [moss_ttsd_model_card.md](https://github.com/OpenMOSS/MOSS-TTS/blob/main/moss_ttsd_model_card.md) | 🤗 [Huggingface](https://huggingface.co/OpenMOSS-Team/MOSS-TTSD-v1.0) |
32
- | **MOSS‑VoiceGenerator** | MossTTSDelay | 1.7B | [moss_voice_generator_model_card.md](https://github.com/OpenMOSS/MOSS-TTS/blob/main/moss_voice_generator_model_card.md) | 🤗 [Huggingface](https://huggingface.co/OpenMOSS-Team/MOSS-Voice-Generator) |
33
- | **MOSS‑SoundEffect** | MossTTSDelay | 8B | [moss_sound_effect_model_card.md](https://github.com/OpenMOSS/MOSS-TTS/blob/main/moss_sound_effect_model_card.md) | 🤗 [Huggingface](https://huggingface.co/OpenMOSS-Team/MOSS-SoundEffect) |
34
- | **MOSS‑TTS‑Realtime** | MossTTSRealtime | 1.7B | [moss_tts_realtime_model_card.md](https://github.com/OpenMOSS/MOSS-TTS/blob/main/moss_tts_realtime_model_card.md) | 🤗 [Huggingface](https://huggingface.co/OpenMOSS-Team/MOSS-TTS-Realtime) |
 
35
 
36
  <br>
37
 
@@ -83,6 +83,48 @@ MOSS-TTSD is built on top of **Architecture A: Delay Pattern (MossTTSDelay)** fr
83
 
84
  ## 2. Quick Start
85
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
86
  MOSS-TTSD uses a **continuation** workflow: provide reference audio for each speaker, their transcripts as a prefix, and the dialogue text to generate. The model continues in each speaker's identity.
87
 
88
  ```python
 
23
  - **MOSS‑TTS‑Realtime**: MOSS-TTS-Realtime is a context-aware, multi-turn streaming TTS model for real-time voice agents. By conditioning on dialogue history across both text and prior user acoustics, it delivers low-latency synthesis with coherent, consistent voice responses across turns.
24
 
25
  ## Released Models
 
26
  | Model | Architecture | Size | Model Card | Hugging Face |
27
  |---|---|---:|---|---|
28
+ | **MOSS-TTS** | MossTTSDelay | 8B | [moss_tts_model_card.md](https://github.com/OpenMOSS/MOSS-TTS/blob/main/docs/moss_tts_model_card.md) | 🤗 [Huggingface](https://huggingface.co/OpenMOSS-Team/MOSS-TTS) |
29
+ | | MossTTSLocal | 1.7B | [moss_tts_model_card.md](https://github.com/OpenMOSS/MOSS-TTS/blob/main/docs/moss_tts_model_card.md) | 🤗 [Huggingface](https://huggingface.co/OpenMOSS-Team/MOSS-TTS-Local-Transformer) |
30
+ | **MOSS‑TTSD‑V1.0** | MossTTSDelay | 8B | [moss_ttsd_model_card.md](https://github.com/OpenMOSS/MOSS-TTS/blob/main/docs/moss_ttsd_model_card.md) | 🤗 [Huggingface](https://huggingface.co/OpenMOSS-Team/MOSS-TTSD-v1.0) |
31
+ | **MOSS‑VoiceGenerator** | MossTTSDelay | 1.7B | [moss_voice_generator_model_card.md](https://github.com/OpenMOSS/MOSS-TTS/blob/main/docs/moss_voice_generator_model_card.md) | 🤗 [Huggingface](https://huggingface.co/OpenMOSS-Team/MOSS-Voice-Generator) |
32
+ | **MOSS‑SoundEffect** | MossTTSDelay | 8B | [moss_sound_effect_model_card.md](https://github.com/OpenMOSS/MOSS-TTS/blob/main/docs/moss_sound_effect_model_card.md) | 🤗 [Huggingface](https://huggingface.co/OpenMOSS-Team/MOSS-SoundEffect) |
33
+ | **MOSS‑TTS‑Realtime** | MossTTSRealtime | 1.7B | [moss_tts_realtime_model_card.md](https://github.com/OpenMOSS/MOSS-TTS/blob/main/docs/moss_tts_realtime_model_card.md) | 🤗 [Huggingface](https://huggingface.co/OpenMOSS-Team/MOSS-TTS-Realtime) |
34
+
35
 
36
  <br>
37
 
 
83
 
84
  ## 2. Quick Start
85
 
86
+
87
+
88
+ ### Environment Setup
89
+
90
+ We recommend a clean, isolated Python environment with **Transformers 5.0.0** to avoid dependency conflicts.
91
+
92
+ ```bash
93
+ conda create -n moss-tts python=3.12 -y
94
+ conda activate moss-tts
95
+ ```
96
+
97
+ Install all required dependencies:
98
+
99
+ ```bash
100
+ git clone https://github.com/OpenMOSS/MOSS-TTS.git
101
+ cd MOSS-TTS
102
+ pip install --extra-index-url https://download.pytorch.org/whl/cu128 -e .
103
+ ```
104
+
105
+ #### (Optional) Install FlashAttention 2
106
+
107
+ For better speed and lower GPU memory usage, you can install FlashAttention 2 if your hardware supports it.
108
+
109
+ ```bash
110
+ pip install --extra-index-url https://download.pytorch.org/whl/cu128 -e ".[flash-attn]"
111
+ ```
112
+
113
+ If your machine has limited RAM and many CPU cores, you can cap build parallelism:
114
+
115
+ ```bash
116
+ MAX_JOBS=4 pip install --extra-index-url https://download.pytorch.org/whl/cu128 -e ".[flash-attn]"
117
+ ```
118
+
119
+ Notes:
120
+ - Dependencies are managed in `pyproject.toml`, which currently pins `torch==2.9.1+cu128` and `torchaudio==2.9.1+cu128`.
121
+ - If FlashAttention 2 fails to build on your machine, you can skip it and use the default attention backend.
122
+ - FlashAttention 2 is only available on supported GPUs and is typically used with `torch.float16` or `torch.bfloat16`.
123
+
124
+
125
+ ### Basic Usage
126
+
127
+
128
  MOSS-TTSD uses a **continuation** workflow: provide reference audio for each speaker, their transcripts as a prefix, and the dialogue text to generate. The model continues in each speaker's identity.
129
 
130
  ```python