# Voice Design Guide
## Overview
This guide explains how to write high-quality **voice descriptions (`voice_prompt`)** that produce voices matching specific requirements. The voice description is the blueprint for voice design and directly determines the quality of the generated voice.
---
## Technical Constraints
| Item | Description |
| ----------------------- | ------------------------------------------------------------------------------------------------- |
| **Length Limit** | Each voice_prompt ≤ 200 characters |
| **Supported Languages** | Chinese only. English is not supported in the current version and will be added in future updates |
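
The two constraints above can be checked mechanically before a prompt is submitted. The sketch below is illustrative only: `check_constraints` is a hypothetical helper, not part of VoiceSculptor. It assumes that "characters" means Unicode code points, and that "Chinese" can be approximated by the presence of CJK ideographs and the absence of Latin-script words.

```python
import re

MAX_PROMPT_CHARS = 200  # limit taken from the constraints table

# Assumption: "Chinese" is approximated by the CJK Unified Ideographs block.
_CJK = re.compile(r"[\u4e00-\u9fff]")
# Assumption: two or more consecutive Latin letters indicate non-Chinese text.
_LATIN_WORD = re.compile(r"[A-Za-z]{2,}")


def check_constraints(voice_prompt: str) -> list[str]:
    """Return a list of problems; an empty list means the prompt passes."""
    problems = []
    n = len(voice_prompt)  # counts Unicode code points, not bytes
    if n > MAX_PROMPT_CHARS:
        problems.append(f"too long: {n} > {MAX_PROMPT_CHARS} characters")
    if not _CJK.search(voice_prompt):
        problems.append("no Chinese characters found")
    if _LATIN_WORD.search(voice_prompt):
        problems.append("contains Latin-script words")
    return problems
```

An empty result means both constraints appear to be met; the language check is a heuristic and may need tuning for prompts that legitimately contain brand names or abbreviations.
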
---
## Five Core Principles
### 1️⃣ Be Specific, Not Vague
**✅ Recommended**: Use perceptible and concrete voice attributes

* Pitch: low, high, bright, rich
* Speaking rate: fast, slow, rapid, steady
* Timbre: magnetic, husky, smooth, clear

**❌ Avoid**:
“Nice”, “normal”, “good” (too subjective and uninformative)

---
### 2️⃣ Multi-Dimensional, Not Single-Attribute
**✅ Recommended**: Combine at least 3 dimensions, and preferably 4 or more, to create a vivid voice profile

* Persona (usage scenario) + gender + age + pitch + speaking rate + volume + timbre + emotion

**❌ Avoid**:
Only “female voice” or only “low-pitched” (too generic, lacks distinctiveness)

---
### 3️⃣ Objective, Not Subjective
**✅ Recommended**: Describe physical and acoustic characteristics

* “Slightly high-pitched with energetic delivery”
* “Slow speaking rate with clear articulation”

**❌ Avoid**:
“My favorite voice”, “This voice sounds great”

---
### 4️⃣ Original, Not Imitative
**⚠️ Copyright Notice**: Descriptions such as “sounds like XX celebrity” or “imitates XX actor” are prohibited.

**✅ Recommended**: Describe voice characteristics directly rather than referencing specific individuals.

---
### 5️⃣ Concise, Not Redundant
**✅ Recommended**: Ensure every word conveys meaningful information

**❌ Avoid**:
“Very, very good voice”, “Extremely, extremely gentle”
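
Back-to-back repetition of the kind shown above is easy to flag with a regular expression. The following is a rough sketch, not a project tool: `find_repetitions` is a hypothetical name, and the patterns assume that redundancy shows up either as a repeated Latin-script word or as a run of 2 to 4 Chinese characters repeated immediately (optionally separated by a comma or space). Legitimate reduplicated expressions may be flagged as false positives.

```python
import re

# Latin-script words repeated back to back, e.g. "very, very".
_REPEATED_WORD = re.compile(r"\b([A-Za-z]+)[\s,]+\1\b", re.IGNORECASE)
# Runs of 2-4 CJK characters repeated back to back, e.g. "非常非常".
_REPEATED_CJK = re.compile(r"([\u4e00-\u9fff]{2,4})[\s,，、]*\1")


def find_repetitions(voice_prompt: str) -> list[str]:
    """Return every immediately repeated word or character run found."""
    hits = [m.group(0) for m in _REPEATED_WORD.finditer(voice_prompt)]
    hits += [m.group(0) for m in _REPEATED_CJK.finditer(voice_prompt)]
    return hits
```
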
---
## Reference Dimensions for Voice Description
Based on high-quality examples, we recommend composing voice prompts using the following dimensions:
| Dimension | Example Options |
| ---------------------------- | ------------------------------------------------------------------------------------------------- |
| **Persona (Usage Scenario)** | News broadcasting, advertising voice-over, audiobooks, animated characters, documentary narration |
| **Gender** | Male, Female |
| **Age** | Child (~8 years), Young adult (20–30), Middle-aged (40–50), Elderly |
| **Personality Traits** | Lively, calm, gentle, intellectual, cute, serious |
| **Speaking Rate & Rhythm** | Fast, slow, moderate, urgent, steady |
| **Intonation Style** | Rising, neutral, passionate, relaxed |
| **Timbre** | Deep and magnetic, crisp and bright, husky and warm, youthful |
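
One way to keep prompts multi-dimensional is to fill in the dimensions as separate fields and join them afterwards. The sketch below assumes a hypothetical `VoiceProfile` class whose fields mirror the table above; the field names, the Chinese comma separator, and the sample Chinese phrases are all illustrative choices, not part of any VoiceSculptor API.

```python
from dataclasses import dataclass, fields


@dataclass
class VoiceProfile:
    """One field per dimension in the reference table (names are hypothetical)."""
    persona: str = ""
    gender: str = ""
    age: str = ""
    personality: str = ""
    rate_rhythm: str = ""
    intonation: str = ""
    timbre: str = ""

    def filled(self) -> list[str]:
        """Return the non-empty dimension values in field order."""
        values = (getattr(self, f.name).strip() for f in fields(self))
        return [v for v in values if v]

    def to_prompt(self, sep: str = "，") -> str:
        """Join the filled dimensions; require at least 3 of them."""
        parts = self.filled()
        if len(parts) < 3:
            raise ValueError(f"only {len(parts)} dimension(s) filled; need at least 3")
        return sep.join(parts) + "。"


# Illustrative values: a middle-aged male documentary narrator.
profile = VoiceProfile(
    persona="纪录片旁白",
    gender="男性",
    age="中年",
    rate_rhythm="语速缓慢",
    timbre="音色低沉浑厚",
)
prompt = profile.to_prompt()
```

Joining short phrases produces a terse, list-like prompt; the examples in this guide read as full sentences, so the output is best treated as a starting draft to polish by hand.
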
---
## High-Quality Examples
### ✅ Recommended Templates
**Example 1: Poetry Recitation**

> “A male reciter of modern poetry with a deep, magnetic, low voice, reading with strong rhythmic pauses, powerful volume, and intense emotional expression.”

**Example 2: News Style**

> “A female news anchor speaking standard Mandarin with a clear and bright mid-to-high pitch, steady professional pacing, strong volume, and a neutral, objective tone.”

**Example 3: Advertising Voice-Over**

> “A male voice for liquor brand advertising, featuring a rich and weathered timbre, slow and bold delivery, and strong volume, conveying a sense of history and masculinity.”
---
## Common Mistakes and Improvements
| Type | ❌ Not Recommended | ✅ Improved Version |
| ------------------------- | -------------------------- | ---------------------------------------------------------------------------- |
| **Too Generic** | “Female voice, nice” | “Young female voice with a clear pitch and moderate speaking rate” |
| **Subjective Evaluation** | “A great-sounding voice” | “Bright timbre with strong expressiveness” |
| **Single Dimension** | “Low-pitched male voice” | “Middle-aged male voice with a low pitch and slow pacing, suited to documentaries” |
| **Redundant Wording** | “Very, very gentle voice” | “Gentle and intellectual female voice” |
| **Imitation Request** | “Sounds like XX celebrity” | **Prohibited**: describe objective voice traits instead |
---
## Quick Checklist
Before submitting a voice_prompt, make sure that:
* [ ] Length ≤ 200 characters
* [ ] At least 3 different descriptive dimensions are included
* [ ] No subjective evaluation words (e.g., “nice”, “great”, “favorite”)
* [ ] No references to real individuals or imitation requests
* [ ] No repetitive or exaggerated wording
* [ ] Usage scenario is clearly defined
* [ ] All descriptors are perceptible and concrete
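
Several items on this checklist can be pre-screened automatically. The sketch below is a minimal, assumption-heavy example: `run_checklist` and both word lists are hypothetical and deliberately short, the number of dimensions is supplied by the caller rather than inferred from the text, and simple substring matching will miss paraphrases. Items such as "usage scenario is clearly defined" still need human review.

```python
# Hypothetical word lists; extend them for real use.
SUBJECTIVE_WORDS = ("好听", "不错", "最喜欢", "nice", "great", "favorite")
IMITATION_MARKERS = ("听起来像", "模仿", "sounds like", "imitates")


def run_checklist(voice_prompt: str, dimensions_used: int) -> dict[str, bool]:
    """Map each automatable checklist item to True (pass) or False (fail)."""
    text = voice_prompt.lower()  # case-insensitive match for Latin words
    return {
        "length_ok": len(voice_prompt) <= 200,
        "enough_dimensions": dimensions_used >= 3,
        "no_subjective_words": not any(w in text for w in SUBJECTIVE_WORDS),
        "no_imitation_request": not any(m in text for m in IMITATION_MARKERS),
    }


# A prompt that fails two items: one dimension only, plus a subjective word.
report = run_checklist("女声，好听", dimensions_used=1)
failed = [item for item, ok in report.items() if not ok]
```
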
---