liumaolin committed on Jun 4, 2025

Commit

8f823b0

1 Parent(s): d91a26b

Introduce initial API structure for VoiceDialogue: add dependencies, middleware, and routes for ASR, TTS, system, and voice modules.

Files changed (20)

src/VoiceDialogue/api/__init__.py +0 -0
src/VoiceDialogue/api/app.py +127 -0
src/VoiceDialogue/api/dependencies/__init__.py +7 -0
src/VoiceDialogue/api/dependencies/audio_deps.py +61 -0
src/VoiceDialogue/api/dependencies/model_deps.py +79 -0
src/VoiceDialogue/api/middleware/__init__.py +4 -0
src/VoiceDialogue/api/middleware/logging.py +38 -0
src/VoiceDialogue/api/middleware/rate_limit.py +39 -0
src/VoiceDialogue/api/routes/__init__.py +3 -0
src/VoiceDialogue/api/routes/asr_routes.py +193 -0
src/VoiceDialogue/api/routes/system_routes.py +163 -0
src/VoiceDialogue/api/routes/tts_routes.py +172 -0
src/VoiceDialogue/api/routes/voice_routes.py +163 -0
src/VoiceDialogue/api/schemas/__init__.py +17 -0
src/VoiceDialogue/api/schemas/asr_schemas.py +72 -0
src/VoiceDialogue/api/schemas/system_schemas.py +31 -0
src/VoiceDialogue/api/schemas/tts_schemas.py +52 -0
src/VoiceDialogue/api/schemas/voice_schemas.py +49 -0
src/VoiceDialogue/api/server.py +53 -0
src/VoiceDialogue/main.py +129 -17

src/VoiceDialogue/api/__init__.py ADDED

File without changes

src/VoiceDialogue/api/app.py ADDED

	@@ -0,0 +1,127 @@

+import logging
+from contextlib import asynccontextmanager
+from typing import Dict, Any
+from fastapi import FastAPI, APIRouter
+from fastapi.middleware.cors import CORSMiddleware
+from fastapi.responses import JSONResponse
+from .middleware.logging import LoggingMiddleware
+from .middleware.rate_limit import RateLimitMiddleware
+from .routes import tts_routes, asr_routes
+# Configure logging
+logging.basicConfig(
+    level=logging.INFO,
+    format='%(asctime)s - %(name)s - %(levelname)s - %(message)s'
+)
+logger = logging.getLogger(__name__)
+# Global state store
+app_state: Dict[str, Any] = {}
+@asynccontextmanager
+async def lifespan(app: FastAPI):
+    """Manage the application's startup and shutdown lifecycle."""
+    # Initialization at startup
+    logger.info("Starting the VoiceDialogue API service...")
+    # Initialize the TTS config registry
+    try:
+        from services.audio.audio_generator import tts_config_registry
+        logger.info(f"Loaded {len(tts_config_registry.get_all_configs())} TTS configs")
+        app_state["tts_configs_loaded"] = True
+    except Exception as e:
+        logger.error(f"Failed to load TTS configs: {e}")
+        app_state["tts_configs_loaded"] = False
+    app_state["system_running"] = True
+    logger.info("VoiceDialogue API service started")
+    yield
+    # Cleanup at shutdown
+    logger.info("Shutting down the VoiceDialogue API service...")
+    app_state["system_running"] = False
+    logger.info("VoiceDialogue API service stopped")
+# Create the FastAPI application
+app = FastAPI(
+    title="VoiceDialogue API",
+    description="""
+    HTTP API for the voice dialogue system
+    ## Features
+    * **TTS model management**: list, load, and delete TTS models
+    * **Model status monitoring**: track model download and load status in real time
+    * **RESTful API**: standard REST interface design
+    * **Automatic docs**: auto-generated API documentation and test UI
+    ## Usage
+    1. List all available TTS models: `GET /api/v1/tts/models`
+    2. Load a specific model: `POST /api/v1/tts/models/load`
+    3. Check a model's status: `GET /api/v1/tts/models/{model_id}/status`
+    4. Delete a model: `DELETE /api/v1/tts/models/{model_id}`
+    """,
+    version="1.0.0",
+    docs_url="/docs",
+    redoc_url="/redoc",
+    lifespan=lifespan,
+)
+# Add CORS middleware
+app.add_middleware(
+    CORSMiddleware,
+    allow_origins=["*"],  # In production, list specific domains instead
+    allow_credentials=True,
+    allow_methods=["*"],
+    allow_headers=["*"],
+)
+# Add custom middleware
+app.add_middleware(LoggingMiddleware)
+app.add_middleware(RateLimitMiddleware)
+# Register routes
+v1_router = APIRouter(prefix="/api/v1")
+# v1_router.include_router(voice_routes.router, prefix="/voice", tags=["Voice processing"])
+# v1_router.include_router(system_routes.router, prefix="/system", tags=["System control"])
+v1_router.include_router(tts_routes.router, prefix="/tts", tags=["TTS model management"])
+v1_router.include_router(asr_routes.router, prefix="/asr", tags=["ASR model management"])
+app.include_router(v1_router)
+@app.get("/")
+async def root():
+    """Root-path health check"""
+    return {
+        "message": "Welcome to the VoiceDialogue API",
+        "status": "running",
+        "version": "1.0.0",
+        "docs_url": "/docs",
+        "redoc_url": "/redoc"
+    }
+@app.get("/health")
+async def health_check():
+    """Health check endpoint"""
+    return {
+        "status": "healthy",
+        "tts_configs_loaded": app_state.get("tts_configs_loaded", False),
+        "system_running": app_state.get("system_running", False),
+        "available_models": len(app_state.get("available_models", []))
+    }
+# Global exception handler
+@app.exception_handler(Exception)
+async def global_exception_handler(request, exc):
+    logger.error(f"Unhandled exception: {exc}", exc_info=True)
+    # A handler must return a response object; HTTPException is an exception, not a response
+    return JSONResponse(
+        status_code=500,
+        content={"detail": "Internal server error"}
+    )

src/VoiceDialogue/api/dependencies/__init__.py ADDED

	@@ -0,0 +1,7 @@

+from .audio_deps import decode_audio_data, encode_audio_data, validate_audio_format
+from .model_deps import get_language_model, get_voice_model, ensure_model_loaded
+__all__ = [
+    "decode_audio_data", "encode_audio_data", "validate_audio_format",
+    "get_language_model", "get_voice_model", "ensure_model_loaded"
+]

src/VoiceDialogue/api/dependencies/audio_deps.py ADDED

	@@ -0,0 +1,61 @@

+import base64
+import numpy as np
+from fastapi import HTTPException, Depends
+def decode_audio_data(audio_data: str) -> np.ndarray:
+    """Decode Base64 audio data"""
+    try:
+        # Decode the Base64 payload
+        decoded_data = base64.b64decode(audio_data)
+        # Convert to a numpy array (assumes 16-bit PCM)
+        audio_array = np.frombuffer(decoded_data, dtype=np.int16)
+        # Convert to float32 in the range [-1, 1]
+        audio_array = audio_array.astype(np.float32) / 32768.0
+        return audio_array
+    except Exception as e:
+        raise HTTPException(
+            status_code=400,
+            detail=f"Failed to decode audio data: {str(e)}"
+        )
+def encode_audio_data(audio_array: np.ndarray, sample_rate: int = 16000) -> str:
+    """Encode audio data as Base64"""
+    try:
+        # Convert to 16-bit PCM; clip first so out-of-range samples do not overflow int16
+        audio_int16 = (np.clip(audio_array, -1.0, 1.0) * 32767).astype(np.int16)
+        # Convert to bytes
+        audio_bytes = audio_int16.tobytes()
+        # Base64-encode
+        encoded_data = base64.b64encode(audio_bytes).decode('utf-8')
+        return encoded_data
+    except Exception as e:
+        raise HTTPException(
+            status_code=500,
+            detail=f"Failed to encode audio data: {str(e)}"
+        )
+def validate_audio_format(audio_array: np.ndarray) -> bool:
+    """Validate the audio format"""
+    if len(audio_array) == 0:
+        raise HTTPException(
+            status_code=400,
+            detail="Audio data is empty"
+        )
+    if len(audio_array) > 16000 * 30:  # 30-second limit at 16 kHz
+        raise HTTPException(
+            status_code=400,
+            detail="Audio duration exceeds the 30-second limit"
+        )
+    return True
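
The two helpers above amount to a Base64 round trip over 16-bit PCM samples. A minimal standalone sketch of that round trip follows, so the quantization error can be seen in isolation. The function names `encode_pcm16` and `decode_pcm16` are hypothetical and not part of the commit, and the explicit little-endian dtype is an assumption (the committed code uses the platform's native `np.int16`).

```python
import base64

import numpy as np


def encode_pcm16(samples: np.ndarray) -> str:
    """Encode float samples in [-1, 1] as Base64 16-bit PCM (hypothetical helper)."""
    # Clip so that out-of-range samples cannot overflow the int16 range.
    clipped = np.clip(samples, -1.0, 1.0)
    pcm = (clipped * 32767).astype("<i2")  # assumption: little-endian byte order
    return base64.b64encode(pcm.tobytes()).decode("ascii")


def decode_pcm16(payload: str) -> np.ndarray:
    """Decode Base64 16-bit PCM back to float32 samples in [-1, 1)."""
    raw = base64.b64decode(payload)
    pcm = np.frombuffer(raw, dtype="<i2")
    return pcm.astype(np.float32) / 32768.0


# One second of a 440 Hz tone at 16 kHz, the rate the validator above assumes.
tone = np.sin(2 * np.pi * 440 * np.arange(16000) / 16000).astype(np.float32)
restored = decode_pcm16(encode_pcm16(tone))
max_error = float(np.max(np.abs(restored - tone)))
```

The round trip is lossy by design: samples are quantized to 16 bits, and the encoder scales by 32767 while the decoder divides by 32768, so `max_error` is small but nonzero.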

src/VoiceDialogue/api/dependencies/model_deps.py ADDED

	@@ -0,0 +1,79 @@

+import logging
+from typing import Optional, Dict, Any
+from fastapi import HTTPException, Depends
+logger = logging.getLogger(__name__)
+# Mock global model state
+_loaded_models: Dict[str, Any] = {}
+def get_language_model(model_name: Optional[str] = None):
+    """Language model dependency"""
+    try:
+        # This should come from the actual model registry
+        from ...models.language_model import language_model_registry
+        if model_name:
+            # Look up a specific model by name
+            for model in language_model_registry:
+                if model.name == model_name:
+                    return model
+            raise HTTPException(
+                status_code=404,
+                detail=f"No language model named {model_name} was found"
+            )
+        else:
+            # Return the default model (14B)
+            return language_model_registry[-2]
+    except ImportError:
+        raise HTTPException(
+            status_code=500,
+            detail="Failed to import the language model module"
+        )
+def get_voice_model(speaker_name: str = "沈逸"):
+    """Voice model dependency"""
+    try:
+        from services.audio.audio_generator.voice_model import voice_model_registry
+        speaker_mapping = {
+            '罗翔': 0,
+            '马保国': 1,
+            '沈逸': 2,
+            '杨幂': 3,
+            '周杰伦': 4,
+            '马云': 5,
+        }
+        index = speaker_mapping.get(speaker_name, 2)  # Defaults to 沈逸
+        if index < len(voice_model_registry):
+            return voice_model_registry[index]
+        else:
+            raise HTTPException(
+                status_code=404,
+                detail=f"Voice character not found: {speaker_name}"
+            )
+    except ImportError:
+        raise HTTPException(
+            status_code=500,
+            detail="Failed to import the voice model module"
+        )
+def ensure_model_loaded(model):
+    """Ensure the model is loaded"""
+    try:
+        if not hasattr(model, 'is_loaded') or not model.is_loaded:
+            model.download_model()
+            _loaded_models[model.name] = model
+        return model
+    except Exception as e:
+        logger.error(f"Model loading failed: {e}")
+        raise HTTPException(
+            status_code=500,
+            detail=f"Model loading failed: {str(e)}"
+        )

src/VoiceDialogue/api/middleware/__init__.py ADDED

	@@ -0,0 +1,4 @@

+from .logging import LoggingMiddleware
+from .rate_limit import RateLimitMiddleware
+__all__ = ["LoggingMiddleware", "RateLimitMiddleware"]

src/VoiceDialogue/api/middleware/logging.py ADDED

	@@ -0,0 +1,38 @@

+import logging
+import time
+from fastapi import Request, Response
+from starlette.middleware.base import BaseHTTPMiddleware
+logger = logging.getLogger(__name__)
+class LoggingMiddleware(BaseHTTPMiddleware):
+    """Request logging middleware"""
+    async def dispatch(self, request: Request, call_next):
+        start_time = time.time()
+        # Log the incoming request
+        logger.info(
+            f"Request started: {request.method} {request.url.path} "
+            f"client: {request.client.host if request.client else 'unknown'}"
+        )
+        # Handle the request
+        response = await call_next(request)
+        # Compute the processing time
+        process_time = time.time() - start_time
+        # Log the response
+        logger.info(
+            f"Request completed: {request.method} {request.url.path} "
+            f"status: {response.status_code} "
+            f"duration: {process_time:.3f}s"
+        )
+        # Expose the processing time in a response header
+        response.headers["X-Process-Time"] = str(process_time)
+        return response

src/VoiceDialogue/api/middleware/rate_limit.py ADDED

	@@ -0,0 +1,39 @@

+import time
+from collections import defaultdict
+from fastapi import Request
+from fastapi.responses import JSONResponse
+from starlette.middleware.base import BaseHTTPMiddleware
+class RateLimitMiddleware(BaseHTTPMiddleware):
+    """API rate-limiting middleware"""
+    def __init__(self, app, calls_per_minute: int = 60):
+        super().__init__(app)
+        self.calls_per_minute = calls_per_minute
+        self.calls = defaultdict(list)
+    async def dispatch(self, request: Request, call_next):
+        client_ip = request.client.host if request.client else "unknown"
+        current_time = time.time()
+        # Drop call records older than one minute
+        minute_ago = current_time - 60
+        self.calls[client_ip] = [
+            call_time for call_time in self.calls[client_ip]
+            if call_time > minute_ago
+        ]
+        # Check whether the limit has been reached
+        if len(self.calls[client_ip]) >= self.calls_per_minute:
+            # Return the 429 directly: an HTTPException raised inside
+            # BaseHTTPMiddleware is not routed to FastAPI's exception handlers
+            return JSONResponse(
+                status_code=429,
+                content={"detail": f"Too many requests; at most {self.calls_per_minute} requests per minute are allowed"}
+            )
+        # Record this call
+        self.calls[client_ip].append(current_time)
+        # Handle the request
+        response = await call_next(request)
+        return response
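
The middleware above keeps a per-client list of call timestamps and rejects a request once the last 60 seconds already contain `calls_per_minute` entries. The same sliding-window logic can be sketched without any web framework, which makes it easy to test with a fake clock. The class name `SlidingWindowLimiter` and its `clock` parameter are hypothetical and not part of the commit.

```python
import time
from collections import defaultdict, deque
from typing import Callable, DefaultDict, Deque


class SlidingWindowLimiter:
    """Framework-free sketch of the per-client sliding-window check."""

    def __init__(
        self,
        calls_per_minute: int = 60,
        window_seconds: float = 60.0,
        clock: Callable[[], float] = time.monotonic,  # injectable for tests
    ) -> None:
        self.calls_per_minute = calls_per_minute
        self.window_seconds = window_seconds
        self._clock = clock
        self._calls: DefaultDict[str, Deque[float]] = defaultdict(deque)

    def allow(self, client_id: str) -> bool:
        """Return True and record the call if the client is under its limit."""
        now = self._clock()
        calls = self._calls[client_id]
        # Timestamps are appended in order, so expired ones sit at the left end.
        while calls and calls[0] <= now - self.window_seconds:
            calls.popleft()
        if len(calls) >= self.calls_per_minute:
            return False
        calls.append(now)
        return True


limiter = SlidingWindowLimiter(calls_per_minute=2)
decisions = [limiter.allow("203.0.113.7") for _ in range(3)]
```

A deque lets expired entries be dropped from the front instead of rebuilding the list on every request. Like the middleware, this sketch keeps state in process memory, so limits are not shared across worker processes.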

src/VoiceDialogue/api/routes/__init__.py ADDED

	@@ -0,0 +1,3 @@


+from . import tts_routes, asr_routes
+
+__all__ = ["tts_routes", "asr_routes"]

src/VoiceDialogue/api/routes/asr_routes.py ADDED

	@@ -0,0 +1,193 @@

+import logging
+from fastapi import APIRouter, HTTPException
+from services.speech.asr import asr_manager
+from ..schemas.asr_schemas import (
+    SupportedLanguagesResponse, ASRInstanceRequest, ASRInstanceResponse, LanguageMappingRequest,
+    LanguageMappingResponse, ASRValidationRequest, ASRValidationResponse, CleanupResponse
+)
+logger = logging.getLogger(__name__)
+router = APIRouter()
+@router.get("/languages", response_model=SupportedLanguagesResponse, summary="Get supported recognition languages")
+async def get_supported_languages():
+    """
+    Return the speech recognition languages the system supports, including language mappings and available engines
+    """
+    try:
+        available_languages = asr_manager.get_available_languages()
+        language_mappings = asr_manager._language_to_asr_mapping
+        asr_engines = list(asr_manager.list_registered_asr().keys())
+        return SupportedLanguagesResponse(
+            languages=available_languages,
+            language_mappings=language_mappings,
+            asr_engines=asr_engines
+        )
+    except Exception as e:
+        logger.error(f"Failed to get supported languages: {e}", exc_info=True)
+        raise HTTPException(status_code=500, detail=f"Failed to get supported languages: {str(e)}")
+@router.post("/instance/create", response_model=ASRInstanceResponse, summary="Create ASR instance")
+async def create_asr_instance(request: ASRInstanceRequest):
+    """
+    Create a new ASR instance for the given language
+    """
+    try:
+        # Pick the optimal ASR engine
+        asr_type = asr_manager._get_asr_type_for_language(request.language)
+        # Create the instance
+        instance = asr_manager.create_asr(request.language)
+        return ASRInstanceResponse(
+            success=True,
+            message="ASR instance created successfully",
+            language=request.language,
+            asr_type=asr_type,
+            instance_id=f"{asr_type}_{request.language}"
+        )
+    except ValueError as e:
+        logger.warning(f"Failed to create ASR instance - invalid argument: {e}")
+        raise HTTPException(status_code=400, detail=str(e))
+    except Exception as e:
+        logger.error(f"Failed to create ASR instance: {e}", exc_info=True)
+        raise HTTPException(status_code=500, detail=f"Failed to create ASR instance: {str(e)}")
+@router.post("/instance/get-or-create", response_model=ASRInstanceResponse, summary="Get or create ASR instance")
+async def get_or_create_asr_instance(request: ASRInstanceRequest):
+    """
+    Return the existing ASR instance, creating a new one if none exists (singleton pattern)
+    """
+    try:
+        # Pick the optimal ASR engine
+        asr_type = asr_manager._get_asr_type_for_language(request.language)
+        # Get or create the instance
+        instance = asr_manager.get_or_create_asr(request.language)
+        return ASRInstanceResponse(
+            success=True,
+            message="ASR instance retrieved successfully",
+            language=request.language,
+            asr_type=asr_type,
+            instance_id=f"{asr_type}_{request.language}"
+        )
+    except ValueError as e:
+        logger.warning(f"Failed to get ASR instance - invalid argument: {e}")
+        raise HTTPException(status_code=400, detail=str(e))
+    except Exception as e:
+        logger.error(f"Failed to get ASR instance: {e}", exc_info=True)
+        raise HTTPException(status_code=500, detail=f"Failed to get ASR instance: {str(e)}")
+@router.post("/mapping", response_model=LanguageMappingResponse, summary="Configure language mapping")
+async def set_language_mapping(request: LanguageMappingRequest):
+    """
+    Set the ASR engine used for a given language
+    """
+    try:
+        asr_manager.set_language_mapping(request.language, request.asr_type)
+        return LanguageMappingResponse(
+            success=True,
+            message=f"Language mapping set: {request.language} -> {request.asr_type}",
+            updated_mapping=asr_manager._language_to_asr_mapping.copy()
+        )
+    except ValueError as e:
+        logger.warning(f"Failed to set language mapping - invalid argument: {e}")
+        raise HTTPException(status_code=400, detail=str(e))
+    except Exception as e:
+        logger.error(f"Failed to set language mapping: {e}", exc_info=True)
+        raise HTTPException(status_code=500, detail=f"Failed to set language mapping: {str(e)}")
+@router.post("/validate", response_model=ASRValidationResponse, summary="Validate language support")
+async def validate_language_support(request: ASRValidationRequest):
+    """
+    Check whether the given language is supported and return related information
+    """
+    try:
+        is_supported = asr_manager.validate_language_support(request.language)
+        optimal_asr = asr_manager.get_optimal_asr_for_language(request.language)
+        # Find every ASR engine that supports this language
+        available_asrs = []
+        supported_langs = asr_manager.get_supported_languages()
+        for asr_key, languages in supported_langs.items():
+            if request.language in languages:
+                available_asrs.append(asr_key)
+        return ASRValidationResponse(
+            language=request.language,
+            is_supported=is_supported,
+            optimal_asr=optimal_asr,
+            available_asrs=available_asrs
+        )
+    except Exception as e:
+        logger.error(f"Failed to validate language support: {e}", exc_info=True)
+        raise HTTPException(status_code=500, detail=f"Failed to validate language support: {str(e)}")
+@router.get("/validate/{language}", response_model=ASRValidationResponse, summary="Validate language support (GET)")
+async def validate_language_support_get(language: str):
+    """
+    Check via GET whether the given language is supported
+    """
+    request = ASRValidationRequest(language=language)
+    return await validate_language_support(request)
+@router.delete("/cleanup", response_model=CleanupResponse, summary="Clean up ASR instances")
+async def cleanup_asr_instances():
+    """
+    Clean up all active ASR instances and release their resources
+    """
+    try:
+        # Record the instance count before cleanup
+        stats = asr_manager.get_asr_statistics()
+        instances_count = stats['active_instances_count']
+        # Run the cleanup
+        asr_manager.cleanup()
+        return CleanupResponse(
+            success=True,
+            message="All ASR instances cleaned up successfully",
+            cleared_instances_count=instances_count
+        )
+    except Exception as e:
+        logger.error(f"Failed to clean up ASR instances: {e}", exc_info=True)
+        raise HTTPException(status_code=500, detail=f"Failed to clean up ASR instances: {str(e)}")
+@router.get("/health", summary="ASR service health check")
+async def asr_health_check():
+    """
+    Health check endpoint for the ASR service
+    """
+    try:
+        stats = asr_manager.get_asr_statistics()
+        # Check whether any ASR engine is registered
+        is_healthy = stats['registered_asr_count'] > 0
+        return {
+            "healthy": is_healthy,
+            "message": "ASR service is healthy" if is_healthy else "No ASR engine is available",
+            "registered_engines": stats['registered_asr_count'],
+            "active_instances": stats['active_instances_count'],
+            "supported_languages": len(stats['supported_languages'])
+        }
+    except Exception as e:
+        logger.error(f"ASR health check failed: {e}", exc_info=True)
+        return {
+            "healthy": False,
+            "message": f"ASR service error: {str(e)}",
+            "error": str(e)
+        }

src/VoiceDialogue/api/routes/system_routes.py ADDED

	@@ -0,0 +1,163 @@

+import asyncio
+import logging
+import time
+from fastapi import APIRouter, HTTPException, BackgroundTasks
+from ..schemas.system_schemas import (
+    SystemStatusResponse, SystemConfig,
+    SystemStartRequest, SystemResponse
+)
+logger = logging.getLogger(__name__)
+router = APIRouter()
+# Global system state
+_system_status = {
+    "status": "stopped",
+    "start_time": None,
+    "config": None,
+    "active_sessions": 0
+}
+# @router.get("/status", response_model=SystemStatusResponse, summary="Get system status")
+# async def get_system_status():
+#     """
+#     Return the overall system status, excluding language model information
+#     """
+#     try:
+#         # Gather TTS model statistics
+#         all_configs = tts_config_registry.get_all_configs()
+#         downloaded_count = sum(1 for config in all_configs if config.is_model_complete())
+#
+#         # Get the TTS engine status
+#         available_engines = list(tts_manager.list_registered_tts().keys())
+#
+#         status = SystemStatusResponse(
+#             system_status="running",
+#             tts_models_total=len(all_configs),
+#             tts_models_downloaded=downloaded_count,
+#             available_tts_engines=available_engines,
+#             memory_usage=_get_memory_usage(),
+#             disk_usage=_get_disk_usage()
+#         )
+#
+#         return status
+#
+#     except Exception as e:
+#         logger.error(f"Failed to get system status: {e}", exc_info=True)
+#         raise HTTPException(status_code=500, detail=f"Failed to get system status: {str(e)}")
+@router.post("/start", response_model=SystemResponse, summary="Start the system")
+async def start_system(
+        request: SystemStartRequest,
+        background_tasks: BackgroundTasks
+):
+    """
+    Start the voice dialogue system
+    """
+    try:
+        if _system_status["status"] in ["running", "starting"]:
+            return SystemResponse(
+                success=False,
+                message="The system is already running or starting"
+            )
+        # Update the status
+        _system_status["status"] = "starting"
+        _system_status["config"] = request.config
+        # Start the system in the background
+        background_tasks.add_task(
+            _start_system_background,
+            request.config
+        )
+        return SystemResponse(
+            success=True,
+            message="Start request submitted; the system is starting in the background..."
+        )
+    except Exception as e:
+        logger.error(f"System start failed: {e}", exc_info=True)
+        _system_status["status"] = "stopped"
+        raise HTTPException(status_code=500, detail=f"System start failed: {str(e)}")
+@router.post("/stop", response_model=SystemResponse, summary="Stop the system")
+async def stop_system():
+    """
+    Stop the voice dialogue system
+    """
+    try:
+        if _system_status["status"] == "stopped":
+            return SystemResponse(
+                success=False,
+                message="The system is already stopped"
+            )
+        # Update the status
+        _system_status["status"] = "stopping"
+        # Simulate the shutdown process
+        await asyncio.sleep(1)
+        _system_status["status"] = "stopped"
+        _system_status["start_time"] = None
+        _system_status["config"] = None
+        _system_status["active_sessions"] = 0
+        return SystemResponse(
+            success=True,
+            message="The system stopped successfully"
+        )
+    except Exception as e:
+        logger.error(f"System stop failed: {e}", exc_info=True)
+        raise HTTPException(status_code=500, detail=f"System stop failed: {str(e)}")
+@router.post("/restart", response_model=SystemResponse, summary="Restart the system")
+async def restart_system(
+        request: SystemStartRequest,
+        background_tasks: BackgroundTasks
+):
+    """
+    Restart the voice dialogue system
+    """
+    try:
+        # Stop first
+        if _system_status["status"] != "stopped":
+            await stop_system()
+        # Then start again
+        return await start_system(request, background_tasks)
+    except Exception as e:
+        logger.error(f"System restart failed: {e}", exc_info=True)
+        raise HTTPException(status_code=500, detail=f"System restart failed: {str(e)}")
+async def _start_system_background(config: SystemConfig):
+    """
+    The actual logic for starting the system in the background
+    """
+    try:
+        logger.info("Starting the voice dialogue system...")
+        # Simulate the startup process
+        await asyncio.sleep(2)
+        # The real system startup logic should be called here,
+        # similar to the launch_system function in the original main.py
+        _system_status["status"] = "running"
+        _system_status["start_time"] = time.time()
+        logger.info("Voice dialogue system started successfully")
+    except Exception as e:
+        logger.error(f"Background system start failed: {e}", exc_info=True)
+        _system_status["status"] = "stopped"

src/VoiceDialogue/api/routes/tts_routes.py ADDED

	@@ -0,0 +1,172 @@

+import logging
+from typing import Optional
+from fastapi import APIRouter, HTTPException, BackgroundTasks
+from services.audio.audio_generator import tts_config_registry
+from ..schemas.tts_schemas import (
+    TTSModelInfo, TTSModelListResponse, TTSModelLoadRequest,
+    TTSModelLoadResponse, TTSModelStatusResponse, TTSModelDeleteResponse,
+    generate_model_id
+)
+logger = logging.getLogger(__name__)
+router = APIRouter()
+@router.get("/models", response_model=TTSModelListResponse, summary="List TTS models")
+async def list_tts_models():
+    """
+    Return all available TTS models
+    Only the base fields of BaseTTSConfig are returned; each model is assigned a unique ID
+    """
+    try:
+        all_configs = tts_config_registry.get_all_configs()
+        models = []
+        for config in all_configs:
+            # Generate a unique ID without exposing the specific TTS type
+            model_id = generate_model_id(config.tts_type.value, config.character_name)
+            # Check the model status
+            if config.is_model_complete():
+                status = "downloaded"
+            else:
+                status = "not_downloaded"
+            model_info = TTSModelInfo(
+                id=model_id,
+                character_name=config.character_name,
+                cover_image=config.cover_image,
+                description=config.description,
+                file_size=config.file_size,
+                is_chinese_voice=config.is_chinese_voice,
+                status=status
+            )
+            models.append(model_info)
+        return TTSModelListResponse(
+            models=models,
+            total=len(models)
+        )
+    except Exception as e:
+        logger.error(f"Failed to list TTS models: {e}", exc_info=True)
+        raise HTTPException(status_code=500, detail=f"Failed to list TTS models: {str(e)}")
+@router.post("/models/load", response_model=TTSModelLoadResponse, summary="Load a TTS model")
+async def load_tts_model(request: TTSModelLoadRequest, background_tasks: BackgroundTasks):
+    """
+    Load a TTS model by model ID without exposing the specific TTS type
+    """
+    try:
+        # Find the config that matches the ID
+        config = _find_config_by_id(request.model_id)
+        if not config:
+            raise HTTPException(status_code=404, detail="Model ID not found")
+        # Check whether the model is already present
+        if config.is_model_complete():
+            return TTSModelLoadResponse(
+                success=True,
+                message=f"Model {config.character_name} is already loaded",
+                model_id=request.model_id
+            )
+        # Download the model in the background
+        background_tasks.add_task(_download_model_task, config, request.model_id)
+        return TTSModelLoadResponse(
+            success=True,
+            message=f"Model {config.character_name} download started",
+            model_id=request.model_id
+        )
+    except HTTPException:
+        raise
+    except Exception as e:
+        logger.error(f"Failed to load TTS model: {e}", exc_info=True)
+        return TTSModelLoadResponse(
+            success=False,
+            message=f"Failed to load model: {str(e)}",
+            model_id=request.model_id
+        )
+@router.get("/models/{model_id}/status", response_model=TTSModelStatusResponse, summary="Get TTS model status")
+async def get_tts_model_status(model_id: str):
+    """
+    Return the status of the given TTS model
+    """
+    try:
+        config = _find_config_by_id(model_id)
+        if not config:
+            raise HTTPException(status_code=404, detail="Model ID not found")
+        # Check the model status
+        if config.is_model_complete():
+            status = "downloaded"
+        else:
+            status = "not_downloaded"
+        return TTSModelStatusResponse(
+            model_id=model_id,
+            status=status
+        )
+    except HTTPException:
+        raise
+    except Exception as e:
+        logger.error(f"Failed to get TTS model status: {e}", exc_info=True)
+        raise HTTPException(status_code=500, detail=f"Failed to get model status: {str(e)}")
+@router.delete("/models/{model_id}", response_model=TTSModelDeleteResponse, summary="Delete a TTS model")
+async def delete_tts_model(model_id: str):
+    """
+    Delete the given TTS model
+    """
+    try:
+        config = _find_config_by_id(model_id)
+        if not config:
+            raise HTTPException(status_code=404, detail="Model ID not found")
+        # Delete the model files
+        config.delete_model()
+        return TTSModelDeleteResponse(
+            success=True,
+            message=f"Model {config.character_name} deleted successfully",
+            model_id=model_id
+        )
+    except HTTPException:
+        raise
+    except Exception as e:
+        logger.error(f"Failed to delete TTS model: {e}", exc_info=True)
+        return TTSModelDeleteResponse(
+            success=False,
+            message=f"Failed to delete model: {str(e)}",
+            model_id=model_id
+        )
+def _find_config_by_id(model_id: str) -> Optional:
+    """Find the configuration that matches the given model ID"""
+    all_configs = tts_config_registry.get_all_configs()
+    for config in all_configs:
+        config_id = generate_model_id(config.tts_type.value, config.character_name)
+        if config_id == model_id:
+            return config
+    return None
+async def _download_model_task(config, model_id: str):
+    """Background task that downloads a model"""
+    try:
+        logger.info(f"Starting model download: {config.character_name}")
+        config.download_model()
+        logger.info(f"Model download finished: {config.character_name}")
+    except Exception as e:
+        logger.error(f"Model download failed: {e}", exc_info=True)
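`_download_model_task` is declared `async` but calls `config.download_model()` synchronously. If that call blocks (which its synchronous invocation suggests, though the method body is not shown in this diff), the event loop cannot serve other requests until the download finishes. A minimal sketch of one way to avoid that, assuming Python 3.9+ and using a hypothetical `blocking_download` function as a stand-in for `config.download_model()`:

```python
import asyncio
import time


def blocking_download(name: str) -> str:
    """Hypothetical stand-in for config.download_model(): blocks the calling thread."""
    time.sleep(0.05)
    return f"{name}: downloaded"


async def download_in_thread(name: str) -> str:
    # asyncio.to_thread (Python 3.9+) runs the blocking call in a worker thread,
    # so the event loop stays free to handle other coroutines in the meantime.
    return await asyncio.to_thread(blocking_download, name)


async def main() -> list:
    # The two downloads overlap instead of running back to back.
    return await asyncio.gather(
        download_in_thread("voice-a"),
        download_in_thread("voice-b"),
    )


results = asyncio.run(main())
print(results)
```

`asyncio.gather` returns results in the order the awaitables were passed, regardless of which worker thread finishes first.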

src/VoiceDialogue/api/routes/voice_routes.py ADDED

	@@ -0,0 +1,163 @@

+import asyncio
+import logging
+import time
+import numpy as np
+from fastapi import APIRouter, Depends, HTTPException, BackgroundTasks
+from ..dependencies.audio_deps import decode_audio_data, encode_audio_data, validate_audio_format
+from ..dependencies.model_deps import get_language_model, get_voice_model
+from ..schemas.voice_schemas import (
+    VoiceInput, TextInput, VoiceResponse,
+    TTSRequest, TTSResponse, ASRRequest, ASRResponse
+)
+logger = logging.getLogger(__name__)
+router = APIRouter()
+@router.post("/chat", response_model=VoiceResponse, summary="Voice chat")
+async def voice_chat(
+        voice_input: VoiceInput,
+        language_model=Depends(get_language_model),
+        voice_model=Depends(get_voice_model)
+):
+    """
+    Full voice-chat pipeline: speech recognition -> text generation -> speech synthesis
+    """
+    start_time = time.time()
+    try:
+        # 1. Decode the audio data
+        audio_array = decode_audio_data(voice_input.audio_data)
+        validate_audio_format(audio_array)
+        # 2. Speech recognition (ASR)
+        # A real ASR service should be integrated here
+        transcribed_text = await perform_asr(audio_array, voice_input.language)
+        # 3. Text generation (LLM)
+        generated_text = await generate_response(transcribed_text, language_model)
+        # 4. Speech synthesis (TTS)
+        audio_response = await synthesize_speech(generated_text, voice_model)
+        processing_time = time.time() - start_time
+        return VoiceResponse(
+            transcribed_text=transcribed_text,
+            generated_text=generated_text,
+            audio_data=audio_response,
+            processing_time=processing_time
+        )
+    except HTTPException:
+        # Re-raise deliberate HTTP errors instead of masking them as 500
+        raise
+    except Exception as e:
+        logger.error(f"Voice chat processing failed: {e}", exc_info=True)
+        raise HTTPException(status_code=500, detail=f"Voice chat processing failed: {str(e)}")
+@router.post("/text-chat", response_model=VoiceResponse, summary="Text chat")
+async def text_chat(
+        text_input: TextInput,
+        language_model=Depends(get_language_model),
+        voice_model=Depends(get_voice_model)
+):
+    """
+    Text-chat pipeline: text generation -> speech synthesis
+    """
+    start_time = time.time()
+    try:
+        # 1. Text generation (LLM)
+        generated_text = await generate_response(text_input.text, language_model)
+        # 2. Speech synthesis (TTS)
+        audio_response = await synthesize_speech(generated_text, voice_model)
+        processing_time = time.time() - start_time
+        return VoiceResponse(
+            transcribed_text=text_input.text,
+            generated_text=generated_text,
+            audio_data=audio_response,
+            processing_time=processing_time
+        )
+    except Exception as e:
+        logger.error(f"Text chat processing failed: {e}", exc_info=True)
+        raise HTTPException(status_code=500, detail=f"Text chat processing failed: {str(e)}")
+@router.post("/asr", response_model=ASRResponse, summary="Speech recognition")
+async def speech_to_text(asr_request: ASRRequest):
+    """
+    Speech recognition service
+    """
+    try:
+        # Decode the audio data
+        audio_array = decode_audio_data(asr_request.audio_data)
+        validate_audio_format(audio_array)
+        # Run speech recognition
+        transcribed_text = await perform_asr(audio_array, asr_request.language)
+        return ASRResponse(
+            transcribed_text=transcribed_text,
+            confidence=0.95  # Placeholder; the actual confidence should be returned here
+        )
+    except HTTPException:
+        # Re-raise deliberate HTTP errors instead of masking them as 500
+        raise
+    except Exception as e:
+        logger.error(f"Speech recognition failed: {e}", exc_info=True)
+        raise HTTPException(status_code=500, detail=f"Speech recognition failed: {str(e)}")
+@router.post("/tts", response_model=TTSResponse, summary="Text to speech")
+async def text_to_speech(
+        tts_request: TTSRequest,
+        voice_model=Depends(get_voice_model)
+):
+    """
+    Text-to-speech service
+    """
+    try:
+        # Run speech synthesis
+        audio_data = await synthesize_speech(tts_request.text, voice_model)
+        # Compute the audio duration (an estimate, not measured from the audio)
+        duration = len(tts_request.text) * 0.1  # Roughly 0.1 seconds per character
+        return TTSResponse(
+            audio_data=audio_data,
+            duration=duration
+        )
+    except Exception as e:
+        logger.error(f"Speech synthesis failed: {e}", exc_info=True)
+        raise HTTPException(status_code=500, detail=f"Speech synthesis failed: {str(e)}")
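+# Helper functions
+async def perform_asr(audio_array: np.ndarray, language: str) -> str:
+    """Run speech recognition"""
+    # A real ASR service should be integrated here
+    # Simulated processing
+    await asyncio.sleep(0.1)
+    return "This is the recognized text content"
+async def generate_response(text: str, language_model) -> str:
+    """Generate a text response"""
+    # A real LLM service should be integrated here
+    # Simulated processing
+    await asyncio.sleep(0.5)
+    return f"AI answer to \"{text}\": That is a great question, let me explain it in detail..."
+async def synthesize_speech(text: str, voice_model) -> str:
+    """Synthesize speech"""
+    # A real TTS service should be integrated here
+    # Simulate returning Base64-encoded audio data
+    await asyncio.sleep(0.3)
+    # Create a simple audio array as an example
+    dummy_audio = np.random.randn(16000).astype(np.float32) * 0.1  # 1 second of random audio
+    return encode_audio_data(dummy_audio)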
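The routes above exchange audio as Base64 text, and `/tts` estimates duration from character count rather than from the audio itself. The real `encode_audio_data` and `decode_audio_data` helpers are not shown in this diff, so the sketch below is only an assumption about what such a round trip could look like: raw float32 samples, Base64-encoded, at an assumed 16 kHz sample rate (chosen because the stub treats 16000 samples as one second). `encode_samples` and `decode_samples` are hypothetical names.

```python
import base64
from array import array

SAMPLE_RATE = 16000  # assumed; the stub above treats 16000 samples as 1 second


def encode_samples(samples: array) -> str:
    """Pack float32 samples as raw bytes, then Base64-encode them to ASCII text."""
    return base64.b64encode(samples.tobytes()).decode("ascii")


def decode_samples(payload: str) -> array:
    """Inverse of encode_samples: Base64 text back to float32 samples."""
    samples = array("f")
    samples.frombytes(base64.b64decode(payload))
    return samples


# 16000 samples whose values are exactly representable in float32
clip = array("f", [0.0, 0.25, -0.5, 1.0] * 4000)
payload = encode_samples(clip)
restored = decode_samples(payload)

# Duration derived from the sample count instead of the text length
duration = len(restored) / SAMPLE_RATE
print(len(payload), duration)
```

Deriving the duration from the decoded sample count gives the true clip length once a real TTS backend replaces the stub, whereas the per-character estimate depends on language and speaking rate.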

src/VoiceDialogue/api/schemas/__init__.py ADDED

	@@ -0,0 +1,17 @@

+from .system_schemas import (
+    SystemStatusResponse, SystemConfig,
+    SystemStartRequest, SystemResponse
+)
+from .voice_schemas import (
+    VoiceInput, TextInput, VoiceResponse,
+    TTSRequest, TTSResponse, ASRRequest, ASRResponse
+)
+__all__ = [
+    "VoiceInput", "TextInput", "VoiceResponse",
+    "TTSRequest", "TTSResponse", "ASRRequest", "ASRResponse",
+    "SystemStatusResponse", "SystemConfig",
+    "SystemStartRequest", "SystemResponse"
+]

src/VoiceDialogue/api/schemas/asr_schemas.py ADDED

	@@ -0,0 +1,72 @@

+from typing import Literal, List, Dict, Optional
+from pydantic import BaseModel, Field
+class SupportedLanguagesResponse(BaseModel):
+    """Response schema for supported languages"""
+    languages: List[str] = Field(..., description="List of supported languages")
+    language_mappings: Dict[str, str] = Field(..., description="Mapping from language to ASR engine")
+    asr_engines: List[str] = Field(..., description="List of available ASR engines")
+class ASRRegistryResponse(BaseModel):
+    """Response schema for the ASR registry"""
+    registered_asr_types: List[str] = Field(..., description="Registered ASR types")
+    supported_languages_by_engine: Dict[str, List[str]] = Field(..., description="Languages supported by each engine")
+    total_registered_count: int = Field(..., description="Total number of registered ASR engines")
+class ASRStatisticsResponse(BaseModel):
+    """Response schema for ASR statistics"""
+    registered_asr_count: int = Field(..., description="Number of registered ASR engines")
+    active_instances_count: int = Field(..., description="Number of active instances")
+    supported_languages: List[str] = Field(..., description="List of supported languages")
+    language_mappings: Dict[str, str] = Field(..., description="Language mapping configuration")
+    registered_asr_types: List[str] = Field(..., description="Registered ASR types")
+class ASRInstanceRequest(BaseModel):
+    """Request schema for an ASR instance"""
+    language: Literal["zh", "en", "auto"] = Field(..., description="Target language")
+class ASRInstanceResponse(BaseModel):
+    """Response schema for an ASR instance"""
+    success: bool = Field(..., description="Whether the operation succeeded")
+    message: str = Field(..., description="Operation result message")
+    language: str = Field(..., description="Language type")
+    asr_type: str = Field(..., description="ASR engine type in use")
+    instance_id: Optional[str] = Field(None, description="Instance identifier")
+class LanguageMappingRequest(BaseModel):
+    """Request schema for language mapping configuration"""
+    language: str = Field(..., description="Language code")
+    asr_type: str = Field(..., description="ASR engine type")
+class LanguageMappingResponse(BaseModel):
+    """Response schema for language mapping configuration"""
+    success: bool = Field(..., description="Whether the operation succeeded")
+    message: str = Field(..., description="Operation result message")
+    updated_mapping: Dict[str, str] = Field(..., description="Mapping after the update")
+class ASRValidationRequest(BaseModel):
+    """Request schema for ASR language validation"""
+    language: str = Field(..., description="Language code to validate")
+class ASRValidationResponse(BaseModel):
+    """Response schema for ASR language validation"""
+    language: str = Field(..., description="Language code")
+    is_supported: bool = Field(..., description="Whether the language is supported")
+    optimal_asr: Optional[str] = Field(None, description="Best-suited ASR engine")
+    available_asrs: List[str] = Field(default_factory=list, description="ASR engines that support this language")
+class CleanupResponse(BaseModel):
+    """Response schema for cleanup operations"""
+    success: bool = Field(..., description="Whether the cleanup succeeded")
+    message: str = Field(..., description="Cleanup result message")
+    cleared_instances_count: int = Field(..., description="Number of instances cleaned up")

src/VoiceDialogue/api/schemas/system_schemas.py ADDED

	@@ -0,0 +1,31 @@

+from typing import Optional, Literal
+from pydantic import BaseModel, Field
+class SystemStatusResponse(BaseModel):
+    """System status"""
+    status: Literal['running', 'stopped', 'starting', 'stopping'] = Field(..., description="System status")
+    uptime: Optional[float] = Field(None, description="Uptime (seconds)")
+    active_sessions: int = Field(default=0, description="Number of active sessions")
+    memory_usage: Optional[float] = Field(None, description="Memory usage ratio")
+class SystemConfig(BaseModel):
+    """System configuration"""
+    user_language: Literal['zh', 'en'] = Field(default='zh', description="User language")
+    system_prompt: str = Field(..., description="System prompt")
+    tts_speaker: str = Field(default='沈逸', description="TTS voice character")
+    llm_model: Literal['7B', '14B'] = Field(default='14B', description="Language model size")
+class SystemStartRequest(BaseModel):
+    """System start request"""
+    config: SystemConfig = Field(..., description="System configuration")
+class SystemResponse(BaseModel):
+    """System response"""
+    success: bool = Field(..., description="Whether the operation succeeded")
+    message: str = Field(..., description="Response message")
+    status: Optional[SystemStatusResponse] = Field(None, description="System status")

src/VoiceDialogue/api/schemas/tts_schemas.py ADDED

	@@ -0,0 +1,52 @@

+from typing import List, Optional, Literal
+from pydantic import BaseModel, Field
+import hashlib
+class TTSModelInfo(BaseModel):
+    """Basic information about a TTS model"""
+    id: str = Field(..., description="Unique model identifier")
+    character_name: str = Field(..., description="Character name")
+    cover_image: str = Field(..., description="Cover image URL")
+    description: str = Field(..., description="Model description")
+    file_size: str = Field(..., description="File size")
+    is_chinese_voice: bool = Field(..., description="Whether this is a Chinese voice")
+    status: Literal['not_downloaded', 'downloading', 'downloaded', 'failed'] = Field(..., description="Model status")
+class TTSModelListResponse(BaseModel):
+    """TTS model list response"""
+    models: List[TTSModelInfo] = Field(..., description="List of TTS models")
+    total: int = Field(..., description="Total number of models")
+class TTSModelLoadRequest(BaseModel):
+    """TTS model load request"""
+    model_id: str = Field(..., description="ID of the model to load")
+class TTSModelLoadResponse(BaseModel):
+    """TTS model load response"""
+    success: bool = Field(..., description="Whether loading succeeded")
+    message: str = Field(..., description="Response message")
+    model_id: str = Field(..., description="Model ID")
+class TTSModelStatusResponse(BaseModel):
+    """TTS model status response"""
+    model_id: str = Field(..., description="Model ID")
+    status: Literal['not_downloaded', 'downloading', 'downloaded', 'failed'] = Field(..., description="Model status")
+    progress: Optional[float] = Field(None, description="Download progress (0-100)")
+class TTSModelDeleteResponse(BaseModel):
+    """TTS model delete response"""
+    success: bool = Field(..., description="Whether deletion succeeded")
+    message: str = Field(..., description="Response message")
+    model_id: str = Field(..., description="Model ID")
+def generate_model_id(tts_type: str, character_name: str) -> str:
+    """Generate a unique model ID"""
+    combined = f"{tts_type}:{character_name}"
+    return hashlib.md5(combined.encode()).hexdigest()[:16]
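`generate_model_id` derives a stable 16-character ID from the TTS type and character name, and `_find_config_by_id` in the routes file recomputes that hash for every registered config on each lookup. The sketch below repeats the same derivation and shows one possible alternative, a precomputed ID-to-config index. The `(tts_type, character_name)` tuples and the `engine_a`/`engine_b` type names are hypothetical stand-ins for real registry entries, which are not shown in this diff.

```python
import hashlib


def generate_model_id(tts_type: str, character_name: str) -> str:
    """Same derivation as in tts_schemas.py: first 16 hex chars of an MD5 digest."""
    combined = f"{tts_type}:{character_name}"
    return hashlib.md5(combined.encode()).hexdigest()[:16]


# Hypothetical (tts_type, character_name) pairs standing in for registry entries.
configs = [("engine_a", "沈逸"), ("engine_a", "罗翔"), ("engine_b", "沈逸")]

# Build the ID -> config index once instead of rehashing on every lookup.
index = {generate_model_id(t, name): (t, name) for t, name in configs}

model_id = generate_model_id("engine_a", "沈逸")
found = index.get(model_id)      # the matching config tuple
missing = index.get("0" * 16)    # None in practice, which the routes map to a 404
print(model_id, found, missing)
```

Because the ID depends only on its two inputs, the same character under two TTS types gets two distinct IDs, and the ID stays valid across restarts. MD5 is used here purely as a fingerprint, not for security.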

src/VoiceDialogue/api/schemas/voice_schemas.py ADDED

	@@ -0,0 +1,49 @@

+from datetime import datetime
+from typing import Optional, Literal
+from pydantic import BaseModel, Field
+class VoiceInput(BaseModel):
+    """Request schema for voice input"""
+    audio_data: str = Field(..., description="Base64-encoded audio data")
+    language: Literal['zh', 'en'] = Field(default='zh', description="Speech language")
+class TextInput(BaseModel):
+    """Request schema for text input"""
+    text: str = Field(..., description="Input text", min_length=1, max_length=1000)
+    language: Literal['zh', 'en'] = Field(default='zh', description="Text language")
+class VoiceResponse(BaseModel):
+    """Response schema for voice output"""
+    transcribed_text: Optional[str] = Field(None, description="Transcribed text")
+    generated_text: str = Field(..., description="Generated answer text")
+    audio_data: str = Field(..., description="Base64-encoded audio response")
+    processing_time: float = Field(..., description="Processing time (seconds)")
+    timestamp: datetime = Field(default_factory=datetime.now, description="Response timestamp")
+class TTSRequest(BaseModel):
+    """Request schema for text to speech"""
+    text: str = Field(..., description="Text to convert", min_length=1, max_length=1000)
+    speaker: str = Field(default='沈逸', description="Voice character")
+class TTSResponse(BaseModel):
+    """Response schema for text to speech"""
+    audio_data: str = Field(..., description="Base64-encoded audio data")
+    duration: float = Field(..., description="Audio duration (seconds)")
+class ASRRequest(BaseModel):
+    """Request schema for speech recognition"""
+    audio_data: str = Field(..., description="Base64-encoded audio data")
+    language: Literal['zh', 'en'] = Field(default='zh', description="Speech language")
+class ASRResponse(BaseModel):
+    """Response schema for speech recognition"""
+    transcribed_text: str = Field(..., description="Recognized text")
+    confidence: float = Field(..., description="Recognition confidence")

src/VoiceDialogue/api/server.py ADDED

	@@ -0,0 +1,53 @@

+"""
+Standalone API server launch script
+Run this script directly to start the API server without going through main.py
+"""
+import sys
+from pathlib import Path
+import uvicorn
+# Add the project root directory to the Python path
+project_root = Path(__file__).parent.parent
+sys.path.insert(0, str(project_root))
+# Load third-party libraries
+from config.paths import load_third_party
+load_third_party()
+def run_server(host: str = "0.0.0.0", port: int = 8000, reload: bool = False):
+    """Run the API server"""
+    print(f"""
+{"=" * 80}
+VoiceDialogue API Server
+{"=" * 80}
+Server address: http://{host}:{port}
+API docs: http://{host}:{port}/docs
+ReDoc docs: http://{host}:{port}/redoc
+Hot reload: {'enabled' if reload else 'disabled'}
+{"=" * 80}
+    """)
+    uvicorn.run(
+        "api.app:app",
+        host=host,
+        port=port,
+        reload=reload,
+        log_level="info",
+        access_log=True
+    )
+if __name__ == "__main__":
+    import argparse
+    parser = argparse.ArgumentParser(description="VoiceDialogue API server")
+    parser.add_argument("--host", default="0.0.0.0", help="Server host address")
+    parser.add_argument("--port", "-p", type=int, default=8000, help="Server port")
+    parser.add_argument("--reload", action="store_true", help="Enable hot reload")
+    args = parser.parse_args()
+    run_server(args.host, args.port, args.reload)
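The command-line handling above can be exercised without starting uvicorn by passing an explicit argument list to `parse_args`. The following is a small sketch with the same three options; `build_parser` is a hypothetical helper name introduced only for this illustration.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Hypothetical helper wrapping the same three options as server.py."""
    parser = argparse.ArgumentParser(description="VoiceDialogue API server")
    parser.add_argument("--host", default="0.0.0.0", help="Server host address")
    parser.add_argument("--port", "-p", type=int, default=8000, help="Server port")
    parser.add_argument("--reload", action="store_true", help="Enable hot reload")
    return parser


# An empty list yields the defaults; a list of strings simulates the command line.
defaults = build_parser().parse_args([])
custom = build_parser().parse_args(["-p", "9000", "--reload"])

print(defaults.host, defaults.port, defaults.reload)
print(custom.host, custom.port, custom.reload)
```

Note that the default host `0.0.0.0` binds to all network interfaces; passing `--host 127.0.0.1` restricts the server to local connections.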

src/VoiceDialogue/main.py CHANGED

@@ -1,8 +1,11 @@
 import time
 import typing
 from multiprocessing import Queue
 from pathlib import Path
 from config.paths import load_third_party
 load_third_party()
@@ -121,32 +124,141 @@ def launch_system(
         thread.join()
-def main():
     """
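-    Main program entry point
-    Configures and starts the voice dialogue system with default settings. Current configuration:
-    - User language: Chinese ('zh')
-    - TTS speaker: 沈逸
-    The defaults can be changed as needed, or the function extended to accept command-line arguments.
-    Returns:
-        None
-    Example:
-        Run the script directly:
-        $ python main.py
-        The system starts the voice dialogue service with the default configuration
-    """
-    user_language: typing.Literal['zh', 'en'] = 'zh'
-    # '罗翔', '马保国', '沈逸', '杨幂', '周杰伦', '马云'
-    tts_speaker = '沈逸'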
-    launch_system(user_language, tts_speaker)
 if __name__ == '__main__':

+import argparse
 import time
 import typing
 from multiprocessing import Queue
 from pathlib import Path
+import uvicorn
 from config.paths import load_third_party
 load_third_party()
         thread.join()
+def launch_api_server(host: str = "0.0.0.0", port: int = 8000, reload: bool = False):
     """
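+    Start the API server
+    Args:
+        host (str): Server host address, defaults to "0.0.0.0"
+        port (int): Server port, defaults to 8000
+        reload (bool): Whether to enable hot reload, defaults to False
+    """
+    print(f'{"=" * 80}\nStarting the API server...\n{"=" * 80}')
+    print(f"Server address: http://{host}:{port}")
+    print(f"API docs: http://{host}:{port}/docs")
+    print(f"Hot reload: {'enabled' if reload else 'disabled'}")
+    print(f'{"=" * 80}')
+    # Start the FastAPI application (uvicorn imports it from the "api.app:app" string)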
+    uvicorn.run(
+        "api.app:app",
+        host=host,
+        port=port,
+        reload=reload,
+        log_level="info"
+    )
+def create_argument_parser():
+    """Create the command-line argument parser"""
+    parser = argparse.ArgumentParser(
+        description="VoiceDialogue - voice dialogue system",
+        formatter_class=argparse.RawDescriptionHelpFormatter,
+        epilog="""
+Example usage:
+  # Start in CLI mode (default)
+  python main.py
+  # Start in CLI mode with explicit options
+  python main.py --mode cli --language zh --speaker 沈逸
+  # Start the API server
+  python main.py --mode api
+  # Start the API server on a specific port
+  python main.py --mode api --port 9000
+  # Start the API server with hot reload (development mode)
+  python main.py --mode api --port 8000 --reload
+Supported speakers:
+  罗翔, 马保国, 沈逸, 杨幂, 周杰伦, 马云
+        """
+    )
+    # Run mode selection
+    parser.add_argument(
+        '--mode', '-m',
+        choices=['cli', 'api'],
+        default='cli',
+        help='Run mode: cli=command-line mode, api=API server mode (default: cli)'
+    )
+    # CLI mode options
+    cli_group = parser.add_argument_group('CLI mode options')
+    cli_group.add_argument(
+        '--language', '-l',
+        choices=['zh', 'en'],
+        default='zh',
+        help='User language: zh=Chinese, en=English (default: zh)'
+    )
+    cli_group.add_argument(
+        '--speaker', '-s',
+        choices=['罗翔', '马保国', '沈逸', '杨幂', '周杰伦', '马云'],
+        default='沈逸',
+        help='TTS speaker (default: 沈逸)'
+    )
+    # API server mode options
+    api_group = parser.add_argument_group('API server mode options')
+    api_group.add_argument(
+        '--host',
+        default='0.0.0.0',
+        help='Server host address (default: 0.0.0.0)'
+    )
+    api_group.add_argument(
+        '--port', '-p',
+        type=int,
+        default=8000,
+        help='Server port (default: 8000)'
+    )
+    api_group.add_argument(
+        '--reload',
+        action='store_true',
+        help='Enable hot reload (development mode)'
+    )
+    return parser
+def main():
+    """
+    Main program entry point
+    Selects the launch mode from the command-line arguments:
+    - cli: start the command-line voice dialogue system
+    - api: start the HTTP API server
+    """
+    parser = create_argument_parser()
+    args = parser.parse_args()
+    print(f"""
+{"=" * 80}
+VoiceDialogue - voice dialogue system
+{"=" * 80}
+Run mode: {args.mode.upper()}
+{"=" * 80}
+    """)
+    try:
+        if args.mode == 'cli':
+            print(f"Language: {args.language}")
+            print(f"Speaker: {args.speaker}")
+            print("Starting the command-line voice dialogue system...")
+            launch_system(args.language, args.speaker)
+        elif args.mode == 'api':
+            launch_api_server(
+                host=args.host,
+                port=args.port,
+                reload=args.reload
+            )
+    except KeyboardInterrupt:
+        print("\nProgram interrupted by user")
+    except Exception as e:
+        print(f"Program error: {e}")
+        raise
 if __name__ == '__main__':