# 🎵 Genie TTS API Complete Usage Guide

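## 🎯 Overview

Genie TTS API is a Japanese text-to-speech service built on the **GPT-SoVITS V2** architecture. It is now fully modular and provides the following core features:

- **🎤 Basic TTS synthesis**: synthesize speech with pretrained characters
- **🎭 Voice cloning**: quickly create a personalized voice from reference audio
- **📊 Audio analysis**: analyze audio quality and characteristics
- **🔄 Batch processing**: efficient batch text-to-speech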

## 🏗️ Modular Architecture

```
genie/GENIE/
├── api.py              # Main API application
├── api_routes.py       # Route definitions
├── voice_cloning.py    # Core voice cloning logic
├── models.py           # Data model definitions
├── tts_engine.py       # TTS engine
├── config.py           # Configuration
└── requirements.txt    # Dependencies
```

## 🚀 Quick Start

### Basics

- **API base URL**: `http://localhost:8000` (local development)
- **API docs**: `http://localhost:8000/docs` (Swagger UI)
- **ReDoc docs**: `http://localhost:8000/redoc`

### Starting the Service

```bash
# Option 1: run the API service directly
python api.py

# Option 2: run together with Gradio
python app.py  # only if app.py integrates the API
```

## 📚 API Endpoint Reference

### 1. Basics

#### Root Info
```http
GET /
```

**Example response**:
```json
{
  "name": "Genie TTS API",
  "version": "1.0.0",
  "description": "High-quality Japanese text-to-speech API service",
  "engine": "GPT-SoVITS V2 (ONNX)",
  "supported_languages": ["ja"],
  "features": ["TTS synthesis", "Voice cloning", "Character management"],
  "docs": "/docs"
}
```

#### Health Check
```http
GET /health
```

**Example response**:
```json
{
  "status": "healthy",
  "version": "1.0.0",
  "engine_status": "ready",
  "available_characters": ["misono_mika"],
  "predefined_characters": 1,
  "custom_characters": 0,
  "cloned_voices": 2,
  "timestamp": "2024-01-01T12:00:00"
}
```
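
A client may want to wait until the engine reports `ready` before sending synthesis requests. The following is a minimal sketch, assuming the `/health` response has the shape shown above. `wait_until_ready` and its parameters are hypothetical helpers, not part of the API. The fetch function is passed in so the loop can be exercised without a running server.

```python
import time
from typing import Callable


def wait_until_ready(
    fetch_health: Callable[[], dict],
    attempts: int = 10,
    delay: float = 1.0,
) -> bool:
    """Poll a health-check callable until engine_status is "ready".

    fetch_health should return the parsed JSON of GET /health, for example:
    lambda: requests.get("http://localhost:8000/health", timeout=5).json()
    """
    for attempt in range(attempts):
        try:
            health = fetch_health()
        except Exception:
            health = {}  # server not reachable yet; treat as not ready
        if health.get("engine_status") == "ready":
            return True
        if attempt < attempts - 1:
            time.sleep(delay)  # wait before the next poll
    return False
```

Since the first start may involve a model download, a generous `attempts * delay` budget is probably sensible.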

### 2. Text-to-Speech (TTS)

#### Basic Synthesis (POST)
```http
POST /tts/synthesize
Content-Type: application/json

{
  "text": "こんにちは、世界！",
  "character": "misono_mika",
  "speed": 1.0,
  "format": "wav"
}
```

#### Quick Synthesis (GET)
```http
GET /tts/synthesize?text=こんにちは&character=misono_mika
```
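
The request line above shows the Japanese text unescaped for readability. On the wire, query parameters need to be percent-encoded as UTF-8. Here is a small sketch using only the standard library; `build_synthesize_url` is a hypothetical helper name.

```python
from urllib.parse import urlencode


def build_synthesize_url(base_url: str, text: str, character: str = "misono_mika") -> str:
    """Return a GET /tts/synthesize URL with UTF-8 percent-encoded parameters."""
    query = urlencode({"text": text, "character": character})
    return f"{base_url.rstrip('/')}/tts/synthesize?{query}"


print(build_synthesize_url("http://localhost:8000", "こんにちは"))
```

HTTP clients such as `requests` do this encoding automatically when the values are passed via `params=`.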

#### Batch Synthesis
```http
POST /tts/batch-synthesize
Content-Type: application/json

[
  {"text": "おはようございます", "character": "misono_mika"},
  {"text": "こんばんは", "character": "misono_mika"}
]
```
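
The batch body is a plain JSON array of synthesis requests. The sketch below builds that array from a list of strings. `build_batch_payload` is a hypothetical helper, and skipping blank entries is a design choice of this example rather than documented API behavior.

```python
import json


def build_batch_payload(texts, character="misono_mika"):
    """Turn a list of strings into the JSON array expected by /tts/batch-synthesize.

    Blank entries are skipped so empty lines in an input file do not become requests.
    """
    return [
        {"text": text.strip(), "character": character}
        for text in texts
        if text.strip()
    ]


payload = build_batch_payload(["おはようございます", "", "こんばんは"])
print(json.dumps(payload, ensure_ascii=False, indent=2))
# To send it (requires the `requests` package and a running server):
# requests.post("http://localhost:8000/tts/batch-synthesize", json=payload, timeout=60)
```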

### 3. Voice Cloning

#### Analyze Reference Audio
```http
POST /voice-clone/analyze-audio?audio_url=/path/to/audio.wav
```

**Example response**:
```json
{
  "success": true,
  "analysis": {
    "duration": 5.2,
    "sample_rate": 22050,
    "quality_score": 0.85,
    "recommendations": []
  },
  "timestamp": "2024-01-01T12:00:00"
}
```
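
Before creating a cloned voice, a client can inspect the analysis result. The following is a sketch under assumptions: the response has the shape shown above, and a `quality_score` below `0.7` is treated as too low. That cutoff is an illustrative value, not one documented by the API, and `summarize_analysis` is a hypothetical helper.

```python
def summarize_analysis(response: dict, min_quality: float = 0.7) -> list:
    """Return a list of human-readable problems found in an analyze-audio response.

    An empty list means nothing was flagged. min_quality is an assumed cutoff.
    """
    if not response.get("success"):
        return ["analysis request did not succeed"]
    analysis = response.get("analysis", {})
    problems = []
    score = analysis.get("quality_score")
    if score is None:
        problems.append("no quality_score in response")
    elif score < min_quality:
        problems.append(f"quality_score {score} is below {min_quality}")
    # Pass through whatever the server itself recommends.
    problems.extend(str(item) for item in analysis.get("recommendations", []))
    return problems
```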

#### Create a Cloned Voice
```http
POST /voice-clone/create
Content-Type: application/json

{
  "voice_name": "my_voice",
  "reference_audio_url": "/path/to/reference.wav",
  "reference_text": "Japanese text spoken in the reference audio",
  "description": "My personal voice clone"
}
```

#### List Cloned Voices
```http
GET /voice-clone/list
```

**Example response**:
```json
{
  "success": true,
  "message": "Retrieved 1 cloned voice",
  "cloned_voices": {
    "my_voice": {
      "name": "my_voice",
      "description": "My personal voice clone",
      "reference_text": "Japanese text spoken in the reference audio...",
      "quality_score": 0.85,
      "duration": 5.2,
      "created_time": "2024-01-01T12:00:00"
    }
  },
  "count": 1,
  "timestamp": "2024-01-01T12:00:00"
}
```

#### Synthesize with a Cloned Voice
```http
POST /voice-clone/synthesize?voice_name=my_voice&text=こんにちは、今日はいい天気ですね
```

#### Delete a Cloned Voice
```http
DELETE /voice-clone/my_voice
```

### 4. Character Management

#### List All Characters
```http
GET /characters/
```

### 5. Audio File Download

#### Get Generated Audio
```http
GET /audio/{filename}
```

## 💻 Code Examples

### Complete Python Example

```python
import requests
from pathlib import Path

class GenieTTSClient:
    def __init__(self, base_url="http://localhost:8000"):
        self.base_url = base_url.rstrip('/')

    def health_check(self):
        """Check the API health status."""
        response = requests.get(f"{self.base_url}/health")
        return response.json()
    
    def basic_tts(self, text, character="misono_mika"):
        """Basic text-to-speech."""
        data = {
            "text": text,
            "character": character,
            "speed": 1.0,
            "format": "wav"
        }

        response = requests.post(f"{self.base_url}/tts/synthesize", json=data)
        result = response.json()

        # .get() avoids a KeyError if the server returns an error payload
        if result.get("success"):
            # Download the audio file
            audio_url = result["audio_url"]
            audio_response = requests.get(f"{self.base_url}{audio_url}")

            filename = f"tts_output_{result['timestamp'][:19].replace(':', '-')}.wav"
            with open(filename, "wb") as f:
                f.write(audio_response.content)

            print(f"✅ TTS synthesis succeeded, saved as: {filename}")
            return filename
        else:
            print(f"❌ TTS synthesis failed: {result.get('error', 'unknown error')}")
            return None
    
    def analyze_audio(self, audio_path):
        """Analyze audio quality."""
        params = {"audio_url": str(Path(audio_path).absolute())}
        response = requests.post(f"{self.base_url}/voice-clone/analyze-audio", params=params)
        return response.json()
    
    def create_cloned_voice(self, voice_name, audio_path, reference_text, description=""):
        """Create a cloned voice."""
        data = {
            "voice_name": voice_name,
            "reference_audio_url": str(Path(audio_path).absolute()),
            "reference_text": reference_text,
            "description": description
        }

        response = requests.post(f"{self.base_url}/voice-clone/create", json=data)
        result = response.json()

        if result.get("success"):
            print(f"✅ Cloned voice created: {voice_name}")
        else:
            print(f"❌ Failed to create cloned voice: {result.get('error', 'unknown error')}")

        return result
    
    def clone_voice_tts(self, voice_name, text):
        """Synthesize speech with a cloned voice."""
        params = {"voice_name": voice_name, "text": text}
        response = requests.post(f"{self.base_url}/voice-clone/synthesize", params=params)
        result = response.json()

        if result.get("success"):
            # Download the audio file
            audio_url = result["audio_url"]
            audio_response = requests.get(f"{self.base_url}{audio_url}")

            filename = f"cloned_voice_{voice_name}_{result['timestamp'][:19].replace(':', '-')}.wav"
            with open(filename, "wb") as f:
                f.write(audio_response.content)

            print(f"✅ Cloned voice synthesis succeeded, saved as: {filename}")
            return filename
        else:
            print(f"❌ Cloned voice synthesis failed: {result.get('error', 'unknown error')}")
            return None
    
    def list_cloned_voices(self):
        """List all cloned voices."""
        response = requests.get(f"{self.base_url}/voice-clone/list")
        return response.json()


# Usage example
def main():
    client = GenieTTSClient()

    # 1. Health check
    print("1. Checking API status...")
    health = client.health_check()
    print(f"   Status: {health['status']}")
    print(f"   Engine: {health['engine_status']}")
    print(f"   Available characters: {health['available_characters']}")

    # 2. Basic TTS synthesis
    print("\n2. Basic TTS synthesis...")
    tts_file = client.basic_tts("こんにちは、今日はいい天気ですね。")

    # 3. Voice cloning (requires a prepared reference audio file)
    reference_audio = "reference_voice.wav"  # replace with the path to your audio file
    reference_text = "Japanese text spoken in the reference audio"  # replace with the actual transcript

    if Path(reference_audio).exists():
        print(f"\n3. Analyzing reference audio: {reference_audio}")
        analysis = client.analyze_audio(reference_audio)
        print(f"   Audio quality score: {analysis.get('analysis', {}).get('quality_score', 'N/A')}")

        print("\n4. Creating cloned voice...")
        clone_result = client.create_cloned_voice(
            voice_name="test_voice",
            audio_path=reference_audio,
            reference_text=reference_text,
            description="Cloned voice for testing"
        )

        if clone_result.get("success"):
            print("\n5. Synthesizing with the cloned voice...")
            cloned_file = client.clone_voice_tts(
                "test_voice",
                "これはクローン音声のテストです。"
            )

    # 6. List all cloned voices
    print("\n6. Listing all cloned voices...")
    voices = client.list_cloned_voices()
    if voices.get("success"):
        print(f"   {voices['count']} cloned voice(s) found:")
        for name, info in voices.get("cloned_voices", {}).items():
            print(f"   - {name}: {info['description']}")


if __name__ == "__main__":
    main()
```

### JavaScript/Node.js Example

```javascript
const axios = require('axios');

class GenieTTSClient {
    constructor(baseUrl = 'http://localhost:8000') {
        this.baseUrl = baseUrl.replace(/\/+$/, '');
    }
    
    async healthCheck() {
        const response = await axios.get(`${this.baseUrl}/health`);
        return response.data;
    }
    
    async basicTTS(text, character = 'misono_mika') {
        try {
            const response = await axios.post(`${this.baseUrl}/tts/synthesize`, {
                text: text,
                character: character,
                speed: 1.0,
                format: 'wav'
            });
            
            if (response.data.success) {
                console.log('✅ TTS synthesis succeeded:', response.data.audio_url);
                return response.data;
            } else {
                console.error('❌ TTS synthesis failed:', response.data.error);
            }
        } catch (error) {
            console.error('Request failed:', error.message);
        }
    }
    
    async createClonedVoice(voiceName, audioPath, referenceText, description = '') {
        try {
            const response = await axios.post(`${this.baseUrl}/voice-clone/create`, {
                voice_name: voiceName,
                reference_audio_url: audioPath,
                reference_text: referenceText,
                description: description
            });
            
            if (response.data.success) {
                console.log('✅ Cloned voice created:', voiceName);
            } else {
                console.error('❌ Failed to create cloned voice:', response.data.error);
            }
            
            return response.data;
        } catch (error) {
            console.error('Request failed:', error.message);
        }
    }
    
    async synthesizeWithClonedVoice(voiceName, text) {
        try {
            const response = await axios.post(`${this.baseUrl}/voice-clone/synthesize`, null, {
                params: { voice_name: voiceName, text: text }
            });
            
            if (response.data.success) {
                console.log('✅ Cloned voice synthesis succeeded:', response.data.audio_url);
                return response.data;
            } else {
                console.error('❌ Cloned voice synthesis failed:', response.data.error);
            }
        } catch (error) {
            console.error('Request failed:', error.message);
        }
    }
}

// Usage example
async function main() {
    const client = new GenieTTSClient();

    // 1. Health check
    console.log('1. Checking API status...');
    const health = await client.healthCheck();
    console.log('   Status:', health.status);

    // 2. Basic TTS
    console.log('\n2. Basic TTS synthesis...');
    await client.basicTTS('こんにちは、世界！');

    // 3. Voice cloning
    console.log('\n3. Creating cloned voice...');
    await client.createClonedVoice(
        'test_voice',
        '/path/to/reference.wav',
        'Japanese text spoken in the reference audio',
        'Test cloned voice'
    );

    // 4. Use the cloned voice
    console.log('\n4. Synthesizing with the cloned voice...');
    await client.synthesizeWithClonedVoice('test_voice', 'テストです。');
}

main().catch(console.error);
```

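## ⚠️ Important Notes

### Voice Cloning Requirements
1. **Audio quality**: clear, noise-free, sample rate of 22 kHz or higher
2. **Audio length**: 3-30 seconds works best
3. **Language**: Japanese only
4. **Text match**: the reference text must exactly match what is spoken in the audio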
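
The sample-rate and length requirements above can be checked locally before a file is submitted. The following is a minimal sketch using the standard `wave` module. `check_reference_wav` is a hypothetical helper, `22050` Hz is an assumed reading of "22 kHz or higher", and only PCM WAV input is handled.

```python
import wave


def check_reference_wav(path, min_rate=22050, min_seconds=3.0, max_seconds=30.0):
    """Check a WAV file against the sample-rate and length guidance above.

    Returns (ok, problems). Only PCM WAV files readable by the standard
    `wave` module are supported; noise and clarity are not measured here.
    """
    with wave.open(str(path), "rb") as wav:
        rate = wav.getframerate()
        duration = wav.getnframes() / float(rate)
    problems = []
    if rate < min_rate:
        problems.append(f"sample rate {rate} Hz is below {min_rate} Hz")
    if not min_seconds <= duration <= max_seconds:
        problems.append(f"duration {duration:.1f}s is outside {min_seconds}-{max_seconds}s")
    return (not problems, problems)
```

Language and transcript accuracy still have to be verified by hand.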

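### Performance Considerations
- **First use**: the model has to be downloaded (about 30 seconds)
- **Speech synthesis**: 5-10 seconds per request
- **Voice cloning**: 2-5 seconds to create a voice
- **Concurrency**: keep to 5 or fewer concurrent requests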
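
On the client side, the concurrency limit can be enforced with a bounded thread pool. The sketch below assumes the caller supplies a `synthesize_one` function (for example, a method that posts to `/tts/synthesize`). `synthesize_many` is a hypothetical helper, and it never uses more than five workers, matching the recommendation above.

```python
from concurrent.futures import ThreadPoolExecutor

MAX_CONCURRENT_REQUESTS = 5  # upper bound suggested in the notes above


def synthesize_many(texts, synthesize_one, max_workers=MAX_CONCURRENT_REQUESTS):
    """Run synthesize_one(text) for every text with bounded concurrency.

    Results come back in the same order as the input texts.
    """
    workers = max(1, min(max_workers, MAX_CONCURRENT_REQUESTS))
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(synthesize_one, texts))
```

With the Python client from the code examples, usage could look like `synthesize_many(lines, client.basic_tts)`.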

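### Storage Notes
- Audio files are stored temporarily in the system temp directory
- Cloned voice configurations are persisted
- Clean up expired audio files periodically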
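
If cleanup is not already handled in your deployment, a small scheduled job can remove old files. This is a sketch under assumptions: the exact directory the service writes to is not stated here, so it is passed in explicitly, and the one-hour default age and the `*.wav` pattern are illustrative. `remove_expired_audio` is a hypothetical helper.

```python
import time
from pathlib import Path


def remove_expired_audio(directory, max_age_seconds=3600, pattern="*.wav"):
    """Delete files matching pattern whose modification time is older than max_age_seconds.

    Returns the sorted names of the files that were removed.
    """
    cutoff = time.time() - max_age_seconds
    removed = []
    for path in Path(directory).glob(pattern):
        if path.is_file() and path.stat().st_mtime < cutoff:
            path.unlink()
            removed.append(path.name)
    return sorted(removed)
```

Point it only at a directory dedicated to generated audio, since every matching file older than the cutoff is deleted.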

## 🔧 Development and Deployment

### Local Development
```bash
pip install -r requirements.txt
python api.py
```

### Production Deployment
```bash
# Using uvicorn
uvicorn api:api_app --host 0.0.0.0 --port 8000 --workers 1

# Using gunicorn (with uvicorn's worker class)
pip install gunicorn uvicorn
gunicorn api:api_app -w 1 -k uvicorn.workers.UvicornWorker --bind 0.0.0.0:8000
```

### Docker Deployment
```dockerfile
FROM python:3.10
WORKDIR /app
COPY requirements.txt .
RUN pip install -r requirements.txt
COPY . .
EXPOSE 8000
CMD ["python", "api.py"]
```

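## 🐛 Troubleshooting

### Common Issues

1. **Engine initialization fails**
   - Check that onnxruntime is installed correctly
   - Confirm the network connection works (the model has to be downloaded)

2. **Audio analysis fails**
   - Check the audio file format (wav, flac, ogg, and others are supported)
   - Confirm the file path is correct and the file is readable

3. **Poor voice cloning quality**
   - Improve the quality of the reference audio
   - Make sure the reference text matches the audio exactly
   - Try reference audio of different lengths

### Debug Mode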
```python
import logging
logging.basicConfig(level=logging.DEBUG)
```

## 📞 Support

- **Official repository**: [Genie TTS](https://github.com/High-Logic/Genie)
- **Bug reports**: open a GitHub Issue
- **Discussion**: GitHub Discussions

---

**Version**: 1.0.0 | **Last updated**: January 2024