liumaolin committed on Jun 1, 2025

Commit

025ca3f

1 Parent(s): 7b86866

Refactor voice model structure: extract MoYoYo-specific configurations and introduce universal TTS registry.

Files changed (6)

src/VoiceDialogue/main.py +14 -13
src/VoiceDialogue/models/voice_model/__init__.py +19 -0
src/VoiceDialogue/models/voice_model/base.py +119 -0
src/VoiceDialogue/models/{voice_model.py → voice_model/moyoyo_configs.py} +7 -181
src/VoiceDialogue/models/voice_model/moyoyo_tts.py +159 -0
src/VoiceDialogue/services/audio/audio_answer.py +11 -22

src/VoiceDialogue/main.py CHANGED

@@ -7,7 +7,7 @@ from config.paths import load_third_party
 load_third_party()
-from models.voice_model import voice_model_registry
 from services.audio.aec_audio_capture import EchoCancellingAudioCapture
 from services.audio.audio_answer import TTSAudioGenerator
 from services.audio.audio_player import AudioStreamPlayer
@@ -21,7 +21,7 @@ language: typing.Literal['zh', 'en'] = 'en'
 def launch_system(
         user_language: str,
-        tts_speaker: str
 ):
     audio_frames_queue = Queue()
     user_voice_queue = Queue()
@@ -58,21 +58,22 @@ def launch_system(
     threads.append(answer_generator_worker)
     speaker_mapping = {
-        '罗翔': 0,
-        '马保国': 1,
-        '沈逸': 2,
-        '杨幂': 3,
-        '周杰伦': 4,
-        '马云': 5,
     }
-    speaker = tts_speaker
-    index = speaker_mapping.get(speaker, 0)
-    supported_audio_model = voice_model_registry[index]
-    supported_audio_model.download_model()
     audio_generator_worker = TTSAudioGenerator(
         processed_answer_queue=generated_answer_queue,
         tts_generated_audio_queue=tts_generated_audio_queue,
-        voice_role=supported_audio_model
     )
     audio_generator_worker.start()
     threads.append(audio_generator_worker)

 load_third_party()
+from models.voice_model import tts_config_registry, TTSConfigType
 from services.audio.aec_audio_capture import EchoCancellingAudioCapture
 from services.audio.audio_answer import TTSAudioGenerator
 from services.audio.audio_player import AudioStreamPlayer
 def launch_system(
         user_language: str,
+        speaker: str
 ):
     audio_frames_queue = Queue()
     user_voice_queue = Queue()
     threads.append(answer_generator_worker)
     speaker_mapping = {
+        '罗翔': 'Luo Xiang',
+        '马保国': 'Ma Baoguo',
+        '沈逸': 'Shen Yi',
+        '杨幂': 'Yang Mi',
+        '周杰伦': 'Jay Zhou',
+        '马云': 'Ma Yun',
     }
+    role = speaker_mapping.get(speaker)
+    if role is None:
+        raise ValueError(f"Unsupported TTS configuration: {speaker}")
+    tts_speaker_config = tts_config_registry.get_config(TTSConfigType.MOYOYO, role)
     audio_generator_worker = TTSAudioGenerator(
         processed_answer_queue=generated_answer_queue,
         tts_generated_audio_queue=tts_generated_audio_queue,
+        voice_role=tts_speaker_config
     )
     audio_generator_worker.start()
     threads.append(audio_generator_worker)
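
The new `launch_system` rejects an unknown speaker, but `get_config` in `base.py` uses `dict.get`, so it can still return `None` when a role has no registered configuration. A minimal sketch of checking both lookups, assuming hypothetical stand-ins (`SPEAKER_MAPPING`, `REGISTERED_CONFIGS`, `resolve_config`) rather than the project's actual registry:

```python
# Hypothetical stand-ins: two mapping entries and a plain dict in place of
# tts_config_registry, keyed the same way as TTSConfigRegistry ("type:name").
SPEAKER_MAPPING = {'罗翔': 'Luo Xiang', '周杰伦': 'Jay Zhou'}
REGISTERED_CONFIGS = {'moyoyo:Luo Xiang': {'character_name': 'Luo Xiang'}}


def resolve_config(speaker: str) -> dict:
    """Map a display name to a registered config, failing loudly at each step."""
    role = SPEAKER_MAPPING.get(speaker)
    if role is None:
        raise ValueError(f"Unsupported speaker: {speaker}")
    config = REGISTERED_CONFIGS.get(f"moyoyo:{role}")
    if config is None:
        # dict.get returns None for a missing key, so check before use.
        raise ValueError(f"No registered config for role: {role}")
    return config
```

Here `'周杰伦'` passes the first lookup but fails the second, because no configuration is registered under that role in this sketch.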

src/VoiceDialogue/models/voice_model/__init__.py ADDED

@@ -0,0 +1,19 @@

+from .base import TTSConfigType, VoiceModelStatus, tts_config_registry
+from .moyoyo_configs import get_moyoyo_configs
+from .moyoyo_tts import MoYoYoTTSConfig, MoYoYoTTSInference
+# Register the MoYoYo TTS inference engine
+moyoyo_inference = MoYoYoTTSInference()
+tts_config_registry.register_inference_engine(TTSConfigType.MOYOYO, moyoyo_inference)
+# Register all MoYoYo configurations
+for config in get_moyoyo_configs():
+    tts_config_registry.register_config(config)
+__all__ = [
+    'TTSConfigType',
+    'VoiceModelStatus',
+    'tts_config_registry',
+    'MoYoYoTTSConfig',
+    'MoYoYoTTSInference',
+]

src/VoiceDialogue/models/voice_model/base.py ADDED

@@ -0,0 +1,119 @@

+import typing
+from abc import ABC, abstractmethod
+from enum import Enum
+from pathlib import Path
+from pydantic import BaseModel
+class TTSConfigType(Enum):
+    """TTS engine type enum"""
+    MOYOYO = 'moyoyo'
+    EDGE_TTS = 'edge_tts'
+    BARK = 'bark'
+    # More TTS engines can be added here
+class VoiceModelStatus(Enum):
+    """Voice model status enum"""
+    NOT_DOWNLOADED = 'not_downloaded'
+    DOWNLOADING = 'downloading'
+    DOWNLOADED = 'downloaded'
+    FAILED = 'failed'
+class BaseTTSConfig(BaseModel, ABC):
+    """Base class for TTS configurations"""
+    tts_type: TTSConfigType
+    character_name: str
+    cover_image: str
+    description: str
+    file_size: str
+    is_chinese_voice: bool
+    @abstractmethod
+    def get_model_storage_path(self) -> Path:
+        """Get the model storage path"""
+        pass
+    @abstractmethod
+    def is_model_complete(self) -> bool:
+        """Check whether all model files are present"""
+        pass
+    @abstractmethod
+    def download_model(self, progress_callback: typing.Callable = None):
+        """Download the model"""
+        pass
+    @abstractmethod
+    def delete_model(self):
+        """Delete the model"""
+        pass
+class BaseTTSInference(ABC):
+    """Base class for TTS inference engines"""
+    @abstractmethod
+    def generate_speech(self, text: str, config: BaseTTSConfig, **kwargs) -> bytes:
+        """Generate speech"""
+        pass
+    @abstractmethod
+    def is_supported_config(self, config: BaseTTSConfig) -> bool:
+        """Check whether this configuration is supported"""
+        pass
+class TTSConfigRegistry:
+    """TTS registry that manages all TTS engines and configurations"""
+    def __init__(self):
+        self._configs: dict[str, BaseTTSConfig] = {}
+        self._inference_engines: dict[TTSConfigType, BaseTTSInference] = {}
+    def register_config(self, config: BaseTTSConfig):
+        """Register a TTS configuration"""
+        key = f"{config.tts_type.value}:{config.character_name}"
+        self._configs[key] = config
+    def register_inference_engine(self, tts_type: TTSConfigType, engine: BaseTTSInference):
+        """Register a TTS inference engine"""
+        self._inference_engines[tts_type] = engine
+    def get_config(self, tts_type: TTSConfigType, character_name: str) -> BaseTTSConfig:
+        """Get the specified configuration"""
+        key = f"{tts_type.value}:{character_name}"
+        return self._configs.get(key)
+    def get_configs_by_type(self, tts_type: TTSConfigType) -> list[BaseTTSConfig]:
+        """Get all configurations of the given type"""
+        return [config for config in self._configs.values()
+                if config.tts_type == tts_type]
+    def get_all_configs(self) -> list[BaseTTSConfig]:
+        """Get all configurations"""
+        return list(self._configs.values())
+    def get_inference_engine(self, tts_type: TTSConfigType) -> BaseTTSInference:
+        """Get the inference engine"""
+        return self._inference_engines.get(tts_type)
+    def generate_speech(self, tts_type: TTSConfigType, character_name: str,
+                        text: str, **kwargs) -> bytes:
+        """Unified interface for generating speech"""
+        config = self.get_config(tts_type, character_name)
+        engine = self.get_inference_engine(tts_type)
+        if not config or not engine:
+            raise ValueError(f"TTS configuration or engine not found: {tts_type.value}:{character_name}")
+        if not engine.is_supported_config(config):
+            raise ValueError(f"Inference engine does not support this configuration: {tts_type.value}:{character_name}")
+        return engine.generate_speech(text, config, **kwargs)
+# Global TTS registry instance
+tts_config_registry = TTSConfigRegistry()
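
The registry above pairs configurations (keyed by `"type:character_name"`) with one inference engine per type. A minimal standard-library sketch of the same pattern follows; `EngineType`, `EchoConfig`, `EchoEngine`, and `Registry` are hypothetical names for illustration, and dataclasses stand in for the pydantic models used in the commit:

```python
from abc import ABC, abstractmethod
from dataclasses import dataclass
from enum import Enum


class EngineType(Enum):
    ECHO = 'echo'  # hypothetical engine type, not part of the commit


@dataclass
class EchoConfig:
    character_name: str
    tts_type: EngineType = EngineType.ECHO


class Engine(ABC):
    @abstractmethod
    def generate_speech(self, text: str, config: EchoConfig) -> bytes:
        """Turn text into audio bytes for the given configuration."""


class EchoEngine(Engine):
    def generate_speech(self, text: str, config: EchoConfig) -> bytes:
        # Placeholder "audio": the character name and text, UTF-8 encoded.
        return f"{config.character_name}:{text}".encode("utf-8")


class Registry:
    def __init__(self):
        self._configs: dict[str, EchoConfig] = {}
        self._engines: dict[EngineType, Engine] = {}

    def register_config(self, config: EchoConfig) -> None:
        self._configs[f"{config.tts_type.value}:{config.character_name}"] = config

    def register_engine(self, tts_type: EngineType, engine: Engine) -> None:
        self._engines[tts_type] = engine

    def generate_speech(self, tts_type: EngineType, character_name: str, text: str) -> bytes:
        config = self._configs.get(f"{tts_type.value}:{character_name}")
        engine = self._engines.get(tts_type)
        if config is None or engine is None:
            raise ValueError(f"No config or engine for {tts_type.value}:{character_name}")
        return engine.generate_speech(text, config)


registry = Registry()
registry.register_engine(EngineType.ECHO, EchoEngine())
registry.register_config(EchoConfig(character_name="demo"))
audio = registry.generate_speech(EngineType.ECHO, "demo", "hello")
```

Callers only need an engine type and a character name; adding another engine means registering one more engine object and its configurations, without touching the call sites.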

src/VoiceDialogue/models/{voice_model.py → voice_model/moyoyo_configs.py} RENAMED

@@ -1,12 +1,4 @@
-import enum
-import typing
-from concurrent.futures.thread import ThreadPoolExecutor
-from pathlib import Path
-from pydantic import BaseModel
-from config.settings import settings
-from utils.download_utils import download_file_from_huggingface
 # Mapping of base pretrained model files
 BASE_PRETRAINED_FILES = {
@@ -18,8 +10,8 @@ BASE_PRETRAINED_FILES = {
     'chinese-roberta-wwm-ext-large/tokenizer.json': 'chinese-roberta-wwm-ext-large/tokenizer.json',
 }
-# Voice model configurations
-VOICE_MODEL_CONFIGS = (
     {
         'repository': 'MoYoYoTech/tone-models',
         'character_name': 'Luo Xiang',
@@ -184,7 +176,6 @@ VOICE_MODEL_CONFIGS = (
         'inference_parameters': {
             'text_lang': "zh",
             'prompt_text': "这是我们最大的希望能招聘的到人。所以今天阿里巴巴公司内部，我自己这么觉得，人才梯队的建设非常之好。",
-            # 'prompt_text': "",
             'prompt_lang': "zh",
             'top_k': 5,
             'top_p': 1,
@@ -198,174 +189,9 @@ VOICE_MODEL_CONFIGS = (
             'seed': 233333,
         },
     },
-)
-class VoiceModelStatus(enum.Enum):
-    """Voice model status enum"""
-    NOT_DOWNLOADED = 'not_downloaded'
-    DOWNLOADING = 'downloading'
-    DOWNLOADED = 'downloaded'
-    FAILED = 'failed'
-class ConversationTemplates(BaseModel):
-    """Conversation templates"""
-    opening_remarks: list[str]
-    mid_responses: list[str]
-class VoiceModel(BaseModel):
-    """Voice model configuration class"""
-    repository: str
-    character_name: str
-    cover_image: str
-    description: str
-    file_size: str
-    is_chinese_voice: bool
-    model_files: dict[str, str]
-    inference_parameters: dict[str, typing.Any]
-    # conversation_templates: ConversationTemplates
-    _download_status: VoiceModelStatus = VoiceModelStatus.NOT_DOWNLOADED
-    @property
-    def download_status(self) -> VoiceModelStatus:
-        """Get the download status"""
-        if self.is_model_complete:
-            return VoiceModelStatus.DOWNLOADED
-        return self._download_status
-    @download_status.setter
-    def download_status(self, status: VoiceModelStatus):
-        """Set the download status"""
-        self._download_status = status
-    @property
-    def model_storage_path(self) -> Path:
-        """Get the model storage path"""
-        storage_path = settings.paths.AUDIO_MODELS_DIR / self.repository
-        storage_path.mkdir(parents=True, exist_ok=True)
-        return storage_path
-    @property
-    def is_model_complete(self) -> bool:
-        """Check whether all model files are present"""
-        for model_file in self.model_files.values():
-            file_path = self.model_storage_path / model_file
-            if not file_path.exists():
-                return False
-        return True
-    def download_model(self, progress_callback: typing.Callable = None):
-        """Download the model"""
-        self.download_status = VoiceModelStatus.DOWNLOADING
-        try:
-            self._download_model_files(progress_callback)
-            self.download_status = VoiceModelStatus.DOWNLOADED
-        except Exception:
-            self.download_status = VoiceModelStatus.FAILED
-            raise
-    def _download_model_files(self, progress_callback: typing.Callable = None):
-        """Download model files from Hugging Face"""
-        with ThreadPoolExecutor() as executor:
-            for model_file in self.model_files.values():
-                executor.submit(
-                    download_file_from_huggingface,
-                    self.model_storage_path,
-                    self.repository,
-                    model_file
-                )
-        if progress_callback:
-            progress_callback()
-    def delete_model(self):
-        """Delete the core model files"""
-        core_files = ['gpt-weights', 'sovits-weights']
-        for file_key in core_files:
-            file_path = self.model_storage_path / self.model_files.get(file_key, '')
-            if file_path.is_file():
-                file_path.unlink()
-            elif file_path.is_dir():
-                file_path.rmdir()
-        self.download_status = VoiceModelStatus.NOT_DOWNLOADED
-    # Model file path properties
-    @property
-    def gpt_weights_path(self) -> Path:
-        """Path to the GPT weights file"""
-        return self.model_storage_path / self.model_files.get('gpt-weights', '')
-    @property
-    def sovits_weights_path(self) -> Path:
-        """Path to the SoVITS weights file"""
-        return self.model_storage_path / self.model_files.get('sovits-weights', '')
-    @property
-    def hubert_model_path(self) -> Path:
-        """Path to the Chinese HuBERT model"""
-        return self.model_storage_path / 'chinese-hubert-base'
-    @property
-    def bert_model_path(self) -> Path:
-        """Path to the Chinese BERT model"""
-        return self.model_storage_path / 'chinese-roberta-wwm-ext-large'
-    @property
-    def reference_audio_path(self) -> Path:
-        """Path to the reference audio file"""
-        return self.model_storage_path / self.model_files.get('reference_audio', '')
-    @property
-    def prompt_semantic_path(self) -> Path:
-        """Path to the prompt semantic file"""
-        return self.model_storage_path / self.model_files.get('prompt_semantic', '')
-    @property
-    def reference_spec_path(self) -> Path:
-        """Path to the reference spectrogram file"""
-        return self.model_storage_path / self.model_files.get('reference_spec', '')
-class VoiceModelRegistry:
-    """Voice model registry"""
-    _registered_models: dict[str, VoiceModel] = {}
-    @classmethod
-    def register_models(cls, model_configs: list[dict]) -> list[VoiceModel]:
-        """Register models from configurations"""
-        registered_models = []
-        for config in model_configs:
-            repository = config.get('repository', '')
-            character_name = config.get('character_name', '')
-            model_key = f'{repository}:{character_name}'
-            voice_model = VoiceModel(**config)
-            cls._registered_models[model_key] = voice_model
-            registered_models.append(voice_model)
-        return registered_models
-    @classmethod
-    def get_model(cls, repository: str, character_name: str) -> VoiceModel:
-        """Get the specified model"""
-        model_key = f'{repository}:{character_name}'
-        return cls._registered_models.get(model_key)
-    @classmethod
-    def get_all_models(cls) -> list[VoiceModel]:
-        """获取所有注册的模型"""
-        return list(cls._registered_models.values())
-    @classmethod
-    def get_version(cls) -> str:
-        """Get the model version"""
-        return 'v2'
-# Global voice model registry instance
-voice_model_registry = VoiceModelRegistry.register_models(VOICE_MODEL_CONFIGS)

+from .moyoyo_tts import MoYoYoTTSConfig
 # Mapping of base pretrained model files
 BASE_PRETRAINED_FILES = {
     'chinese-roberta-wwm-ext-large/tokenizer.json': 'chinese-roberta-wwm-ext-large/tokenizer.json',
 }
+# List of MoYoYo TTS configurations
+MOYOYO_TTS_CONFIGS = [
     {
         'repository': 'MoYoYoTech/tone-models',
         'character_name': 'Luo Xiang',
         'inference_parameters': {
             'text_lang': "zh",
             'prompt_text': "这是我们最大的希望能招聘的到人。所以今天阿里巴巴公司内部，我自己这么觉得，人才梯队的建设非常之好。",
             'prompt_lang': "zh",
             'top_k': 5,
             'top_p': 1,
             'seed': 233333,
         },
     },
+]
+def get_moyoyo_configs() -> list[MoYoYoTTSConfig]:
+    """Get all MoYoYo TTS configurations"""
+    return [MoYoYoTTSConfig(**config) for config in MOYOYO_TTS_CONFIGS]

src/VoiceDialogue/models/voice_model/moyoyo_tts.py ADDED

@@ -0,0 +1,159 @@

+import typing
+from concurrent.futures.thread import ThreadPoolExecutor
+from pathlib import Path
+from pydantic import BaseModel, Field
+from config.settings import settings
+from utils.download_utils import download_file_from_huggingface
+from .base import BaseTTSConfig, BaseTTSInference, TTSConfigType, VoiceModelStatus
+class InferenceParameters(BaseModel):
+    """TTS inference parameters"""
+    text_lang: str = Field(default="zh", description="Text language")
+    prompt_text: str = Field(default="", description="Prompt text")
+    prompt_lang: str = Field(default="zh", description="Prompt language")
+    top_k: int = Field(default=5, ge=1, le=100, description="Top-K sampling")
+    top_p: float = Field(default=1.0, ge=0.0, le=1.0, description="Top-P sampling")
+    temperature: float = Field(default=1.0, ge=0.0, description="Temperature")
+    text_split_method: str = Field(default="cut3", description="Text split method")
+    batch_size: int = Field(default=100, ge=1, description="Batch size")
+    speed_factor: float = Field(default=1.1, ge=0.1, le=3.0, description="Speed factor")
+    split_bucket: bool = Field(default=True, description="Whether to split into buckets")
+    return_fragment: bool = Field(default=False, description="Whether to return fragments")
+    fragment_interval: float = Field(default=0.07, ge=0.0, description="Fragment interval")
+    seed: int = Field(default=233333, description="Random seed")
+    # parallel_infer: bool = Field(default=False, description="Whether to run inference in parallel")
+class MoYoYoTTSConfig(BaseTTSConfig):
+    """MoYoYo TTS configuration class"""
+    tts_type: TTSConfigType = TTSConfigType.MOYOYO
+    repository: str
+    model_files: dict[str, str]
+    inference_parameters: InferenceParameters
+    _download_status: VoiceModelStatus = VoiceModelStatus.NOT_DOWNLOADED
+    @property
+    def download_status(self) -> VoiceModelStatus:
+        """Get the download status"""
+        if self.is_model_complete():
+            return VoiceModelStatus.DOWNLOADED
+        return self._download_status
+    @download_status.setter
+    def download_status(self, status: VoiceModelStatus):
+        """Set the download status"""
+        self._download_status = status
+    def get_model_storage_path(self) -> Path:
+        """Get the model storage path"""
+        storage_path = settings.paths.AUDIO_MODELS_DIR / self.repository
+        storage_path.mkdir(parents=True, exist_ok=True)
+        return storage_path
+    def is_model_complete(self) -> bool:
+        """Check whether all model files are present"""
+        storage_path = self.get_model_storage_path()
+        for model_file in self.model_files.values():
+            file_path = storage_path / model_file
+            if not file_path.exists():
+                return False
+        return True
+    def download_model(self, progress_callback: typing.Callable = None):
+        """Download the model"""
+        self.download_status = VoiceModelStatus.DOWNLOADING
+        try:
+            self._download_model_files(progress_callback)
+            self.download_status = VoiceModelStatus.DOWNLOADED
+        except Exception:
+            self.download_status = VoiceModelStatus.FAILED
+            raise
+    def _download_model_files(self, progress_callback: typing.Callable = None):
+        """Download model files from Hugging Face"""
+        storage_path = self.get_model_storage_path()
+        with ThreadPoolExecutor() as executor:
+            for model_file in self.model_files.values():
+                executor.submit(
+                    download_file_from_huggingface,
+                    storage_path,
+                    self.repository,
+                    model_file
+                )
+        if progress_callback:
+            progress_callback()
+    def delete_model(self):
+        """Delete the core model files"""
+        storage_path = self.get_model_storage_path()
+        core_files = ['gpt-weights', 'sovits-weights']
+        for file_key in core_files:
+            file_path = storage_path / self.model_files.get(file_key, '')
+            if file_path.is_file():
+                file_path.unlink()
+            elif file_path.is_dir():
+                file_path.rmdir()
+        self.download_status = VoiceModelStatus.NOT_DOWNLOADED
+    # Model file path properties
+    @property
+    def gpt_weights_path(self) -> Path:
+        """Path to the GPT weights file"""
+        return self.get_model_storage_path() / self.model_files.get('gpt-weights', '')
+    @property
+    def sovits_weights_path(self) -> Path:
+        """Path to the SoVITS weights file"""
+        return self.get_model_storage_path() / self.model_files.get('sovits-weights', '')
+    @property
+    def hubert_model_path(self) -> Path:
+        """Path to the Chinese HuBERT model"""
+        return self.get_model_storage_path() / 'chinese-hubert-base'
+    @property
+    def bert_model_path(self) -> Path:
+        """Path to the Chinese BERT model"""
+        return self.get_model_storage_path() / 'chinese-roberta-wwm-ext-large'
+    @property
+    def reference_audio_path(self) -> Path:
+        """Path to the reference audio file"""
+        return self.get_model_storage_path() / self.model_files.get('reference_audio', '')
+    def get_runtime_config(self) -> typing.Dict[str, typing.Any]:
+        """Get the MoYoYo runtime configuration"""
+        return {
+            'default_v2': {
+                'version': 'v2',
+                'device': 'cpu',
+                'is_half': False,
+                't2s_weights_path': self.gpt_weights_path,
+                'vits_weights_path': self.sovits_weights_path,
+                'cnhuhbert_base_path': self.hubert_model_path,
+                'bert_base_path': self.bert_model_path,
+            }
+        }
+class MoYoYoTTSInference(BaseTTSInference):
+    """MoYoYo TTS inference engine"""
+    def generate_speech(self, text: str, config: BaseTTSConfig, **kwargs) -> bytes:
+        """Generate speech"""
+        if not isinstance(config, MoYoYoTTSConfig):
+            raise ValueError("Configuration type mismatch: MoYoYoTTSConfig required")
+        # Implement the actual MoYoYo TTS inference logic here
+        # Return empty bytes for now; a real implementation needs to call the corresponding TTS model
+        return b""
+    def is_supported_config(self, config: BaseTTSConfig) -> bool:
+        """Check whether this configuration is supported"""
+        return isinstance(config, MoYoYoTTSConfig)
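
In `_download_model_files`, the futures returned by `executor.submit` are discarded. With `concurrent.futures`, an exception raised inside a worker is stored on its future and only re-raised when `result()` is called, so a failed download would not reach the `except` branch in `download_model`. A minimal sketch of collecting the futures, assuming a hypothetical `fetch` function in place of `download_file_from_huggingface`:

```python
from concurrent.futures import ThreadPoolExecutor


def fetch(name: str) -> str:
    """Hypothetical stand-in for a single-file download."""
    if name.startswith("missing"):
        raise FileNotFoundError(name)
    return name


def download_all(names: list[str]) -> list[str]:
    """Run fetch for every name and re-raise the first worker failure."""
    with ThreadPoolExecutor() as executor:
        futures = [executor.submit(fetch, name) for name in names]
    # The executor has shut down here, so every future is finished.
    # result() returns the value or re-raises the worker's exception.
    return [future.result() for future in futures]
```

With this shape, a caller that wraps `download_all` in `try`/`except` can mark the download as failed when any file raises.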

src/VoiceDialogue/services/audio/audio_answer.py CHANGED

@@ -1,4 +1,5 @@
 import time
 from multiprocessing import Queue
 from queue import Empty
@@ -8,39 +9,27 @@ load_third_party()
 from moyoyo_tts import TTSModule, TTS_Config
-from models.voice_model import VoiceModel
 from models.voice_task import VoiceTask
 from services.core.base import BaseThread
 from services.core.constants import dropped_audio_cache, user_still_speaking_event, voice_state_manager
 class TTSAudioGenerator(BaseThread):
     """TTS audio generator - converts text to audio"""
     def __init__(self, group=None, target=None, name=None, args=(), kwargs={}, *, daemon=None,
-                 processed_answer_queue, tts_generated_audio_queue, voice_role: VoiceModel):
         super().__init__(group, target, name, args, kwargs, daemon=daemon)
         self.processed_answer_queue: Queue = processed_answer_queue
         self.tts_generated_audio_queue: Queue = tts_generated_audio_queue
         self._device = "cpu"  # mps slower 11.66(cpu) vs 39.42(mps)
-        self._voice_role = voice_role
-    def setup_tts_config(self, device, voice_role: VoiceModel):
-        config = {
-            'default_v2': {
-                'version': 'v2',
-                'device': f'{device}',
-                'is_half': False,
-                't2s_weights_path': voice_role.gpt_weights_path,
-                'vits_weights_path': voice_role.sovits_weights_path,
-                'cnhuhbert_base_path': voice_role.hubert_model_path,
-                'bert_base_path': voice_role.bert_model_path,
-                # 'prompt_semantic_path': voice_role.prompt_semantic_path,
-                # 'refer_spec_path': voice_role.reference_spec_path,
-            }
-        }
-        tts_config = TTS_Config(config)
         return tts_config
     def warmup(self, warmup_steps=1):
@@ -53,13 +42,13 @@ class TTSAudioGenerator(BaseThread):
     def run(self):
-        tts_config = self.setup_tts_config(self._device, self._voice_role)
         self.tts_module = TTSModule(tts_config)
         self.tts_module.setup_inference_params(
-            ref_audio=self._voice_role.reference_audio_path,
             parallel_infer=False,
-            **self._voice_role.inference_parameters
         )
         self.warmup()

 import time
+import typing
 from multiprocessing import Queue
 from queue import Empty
 from moyoyo_tts import TTSModule, TTS_Config
 from models.voice_task import VoiceTask
 from services.core.base import BaseThread
 from services.core.constants import dropped_audio_cache, user_still_speaking_event, voice_state_manager
+from models.voice_model import MoYoYoTTSConfig
 class TTSAudioGenerator(BaseThread):
     """TTS audio generator - converts text to audio"""
     def __init__(self, group=None, target=None, name=None, args=(), kwargs={}, *, daemon=None,
+                 processed_answer_queue, tts_generated_audio_queue, voice_role: MoYoYoTTSConfig):
         super().__init__(group, target, name, args, kwargs, daemon=daemon)
         self.processed_answer_queue: Queue = processed_answer_queue
         self.tts_generated_audio_queue: Queue = tts_generated_audio_queue
         self._device = "cpu"  # mps slower 11.66(cpu) vs 39.42(mps)
+        self._tts_config = voice_role
+        self.tts_module: typing.Optional[TTSModule] = None
+    def setup_tts_config(self, voice_role: MoYoYoTTSConfig):
+        tts_config = TTS_Config(voice_role.get_runtime_config())
         return tts_config
     def warmup(self, warmup_steps=1):
     def run(self):
+        tts_config = self.setup_tts_config(self._tts_config)
         self.tts_module = TTSModule(tts_config)
         self.tts_module.setup_inference_params(
+            ref_audio=self._tts_config.reference_audio_path,
             parallel_infer=False,
+            **self._tts_config.inference_parameters.model_dump()
         )
         self.warmup()
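
The new `run` method unpacks the typed parameters into keyword arguments with `**...model_dump()`. The sketch below shows the same unpacking with `dataclasses.asdict` as a standard-library stand-in for pydantic's `model_dump()`; `Params` and this `setup_inference_params` are hypothetical and only record what they receive:

```python
from dataclasses import asdict, dataclass


@dataclass
class Params:  # hypothetical subset of InferenceParameters
    text_lang: str = "zh"
    top_k: int = 5
    speed_factor: float = 1.1


def setup_inference_params(ref_audio, parallel_infer, **params):
    """Hypothetical receiver that just records what it was given."""
    return {"ref_audio": ref_audio, "parallel_infer": parallel_infer, **params}


received = setup_inference_params(
    ref_audio="ref.wav",
    parallel_infer=False,
    **asdict(Params(top_k=10)),  # fields become keyword arguments
)
```

One consequence of this style: if the unpacked mapping also contained a `parallel_infer` key, Python would raise a `TypeError` for a duplicate keyword argument. In the diff, `parallel_infer` is passed explicitly and the matching field in `InferenceParameters` is commented out.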