# Voice Design
Voice Design mode lets you describe the desired speaker as a set of attributes passed through the `instruct` parameter, with no reference audio needed. The model
generates a matching voice on the fly.
## Quick Example
```python
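import torch
from omnivoice import OmniVoice

# Load the pretrained model onto the first GPU in half precision.
model = OmniVoice.from_pretrained(
    "k2-fsa/OmniVoice",
    device_map="cuda:0",
    dtype=torch.float16,
)

# Describe the target voice with speaker attributes instead of reference audio.
audio = model.generate(
    text="This is a test for voice design.",
    instruct="female, young adult, high pitch, british accent",
)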
```
## How It Works
The `instruct` parameter accepts a comma-separated string of speaker attributes.
Each attribute belongs to a **category** (gender, age, pitch, style, accent,
or dialect). Within a category, only one attribute may be selected at a time.
Attributes from different categories can be freely combined.
The model auto-detects the language of the instruct text and normalises it
internally — you can write in English, Chinese, or a mix of both.
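The one-attribute-per-category rule can be checked before calling the model. The sketch below is a hypothetical helper, not part of the OmniVoice package: `CATEGORIES`, `group_by_category`, and `find_conflicts` are illustrative names, and the attribute sets are copied from the tables in this document. It only mirrors the documented rule and does not claim to reproduce how the model normalises the string internally.

```python
import re

# Attribute sets copied from the tables in this document.
# This helper is an illustration only; it is NOT part of the OmniVoice package.
CATEGORIES = {
    "gender": {"male", "female", "男", "女"},
    "age": {"child", "teenager", "young adult", "middle-aged", "elderly",
            "儿童", "少年", "青年", "中年", "老年"},
    "pitch": {"very low pitch", "low pitch", "moderate pitch", "high pitch",
              "very high pitch", "极低音调", "低音调", "中音调", "高音调", "极高音调"},
    "style": {"whisper", "耳语"},
    "accent": {"american accent", "british accent", "australian accent",
               "canadian accent", "indian accent", "chinese accent",
               "korean accent", "japanese accent", "portuguese accent",
               "russian accent"},
    "dialect": {"河南话", "陕西话", "四川话", "贵州话", "云南话", "桂林话",
                "济南话", "石家庄话", "甘肃话", "宁夏话", "青岛话", "东北话"},
}


def group_by_category(instruct: str) -> dict[str, list[str]]:
    """Split an instruct string and group its attributes by category."""
    found: dict[str, list[str]] = {}
    # Accept both half-width and full-width commas as separators.
    for raw in re.split(r"[,,]", instruct):
        attr = raw.strip().lower()
        if not attr:
            continue
        for category, values in CATEGORIES.items():
            if attr in values:
                found.setdefault(category, []).append(attr)
                break
        else:
            # Attribute does not appear in any table above.
            found.setdefault("unknown", []).append(attr)
    return found


def find_conflicts(instruct: str) -> dict[str, list[str]]:
    """Return the categories that contain more than one attribute."""
    grouped = group_by_category(instruct)
    return {c: a for c, a in grouped.items() if c != "unknown" and len(a) > 1}


print(find_conflicts("male, female, low pitch"))
# {'gender': ['male', 'female']}
```

An empty result from `find_conflicts` means the string selects at most one attribute per category.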
## Supported Attributes
### Gender
| English | Chinese |
|---------|---------|
| male | 男 |
| female | 女 |
### Age
| English | Chinese |
|---------|---------|
| child | 儿童 |
| teenager | 少年 |
| young adult | 青年 |
| middle-aged | 中年 |
| elderly | 老年 |
### Pitch
| English | Chinese |
|---------|---------|
| very low pitch | 极低音调 |
| low pitch | 低音调 |
| moderate pitch | 中音调 |
| high pitch | 高音调 |
| very high pitch | 极高音调 |
### Style
| English | Chinese |
|---------|---------|
| whisper | 耳语 |
### English Accent
Only effective when the synthesis text is in English.
| Accent |
|--------|
| american accent |
| british accent |
| australian accent |
| canadian accent |
| indian accent |
| chinese accent |
| korean accent |
| japanese accent |
| portuguese accent |
| russian accent |
### Chinese Dialect
Only effective when the synthesis text is in Chinese.
| Dialect |
|---------|
| 河南话 |
| 陕西话 |
| 四川话 |
| 贵州话 |
| 云南话 |
| 桂林话 |
| 济南话 |
| 石家庄话 |
| 甘肃话 |
| 宁夏话 |
| 青岛话 |
| 东北话 |
## Writing Instruct Strings
Separate attributes with commas: half-width `,` for English and full-width `,`
for Chinese. The model fixes mismatches automatically.
```
# English
"female, young adult, high pitch, british accent"
# Chinese
"女,青年,高音调,四川话"
# Mixed (auto-normalised)
"female, young adult, 四川话"
```
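If attributes come from separate settings (for example, a form or a config file), the string can be assembled programmatically. The following is a minimal sketch, assuming a hypothetical helper named `build_instruct` that is not part of OmniVoice. It uses the full-width comma only when every attribute is Chinese and the half-width comma otherwise, matching the three examples above. The Chinese check relies on the basic CJK Unified Ideographs range, which is a simplifying assumption.

```python
def is_chinese(attr: str) -> bool:
    """True if the attribute contains at least one basic CJK ideograph."""
    return any("\u4e00" <= ch <= "\u9fff" for ch in attr)


def build_instruct(*attributes: str) -> str:
    """Join attributes into an instruct string (hypothetical helper)."""
    # Drop empty entries so optional attributes can be passed as "".
    cleaned = [a.strip() for a in attributes if a and a.strip()]
    # Full-width comma for all-Chinese strings, half-width otherwise.
    all_chinese = bool(cleaned) and all(is_chinese(a) for a in cleaned)
    separator = "," if all_chinese else ", "
    return separator.join(cleaned)


print(build_instruct("female", "young adult", "high pitch", "british accent"))
# female, young adult, high pitch, british accent
print(build_instruct("女", "青年", "高音调", "四川话"))
# 女,青年,高音调,四川话
```

The result can be passed as the `instruct` argument of `model.generate`, as in the Quick Example.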
### Tips
- **Combine freely** across categories: `"male, elderly, low pitch, whisper"`.
- **Leave it to the model**: omit attributes you don't care about — the model
fills in the rest. For example `"female"` alone is valid.
- **Case-insensitive**: `"Male"`, `"MALE"`, and `"male"` are all accepted; the code normalises them to lower case.
- **Accent vs Dialect**: English accents are only applied to English speech, and Chinese dialects are only applied to Chinese speech.
- **Attribute combinations**: Due to training data limitations, some attribute combinations may not work well, and the model may ignore certain attributes in a combination. If the output doesn't match your expectations, try simplifying the instruct string.
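One way to simplify an instruct string systematically is to generate progressively shorter variants and listen to the result of each. The sketch below assumes a hypothetical helper, `simplified_variants`, that drops attributes from the end of the string. That ordering is an arbitrary choice for illustration, so list the attributes you care about most first.

```python
def simplified_variants(instruct: str) -> list[str]:
    """Return the instruct string followed by shorter variants of it.

    Hypothetical helper: attributes are dropped one at a time from the end.
    """
    # Treat full-width commas like half-width ones, then lower-case each part.
    parts = instruct.replace(",", ",").split(",")
    attrs = [p.strip().lower() for p in parts if p.strip()]
    return [", ".join(attrs[:n]) for n in range(len(attrs), 0, -1)]


for variant in simplified_variants("Male, elderly, low pitch, whisper"):
    print(variant)
# male, elderly, low pitch, whisper
# male, elderly, low pitch
# male, elderly
# male
```

Each variant can be passed as `instruct` to `model.generate` until the output matches the intended voice.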