feat: Implement MCP integration for tool discovery and execution

by heyong4725 - opened Aug 8, 2025

base: refs/heads/main ← from: refs/pr/1

+555 -2709

This PR is in draft mode

Files changed (43)

README.md +20 -52
assets/www/assets/{index-CGlMbARk.js → index-ByqsFGbw.js} +2 -2
assets/www/assets/{index-BPAUWo8W.css → index-CCuJ1lip.css} +1 -1
assets/www/index.html +2 -2
build/pyinstaller/hooks/hook-voice_dialogue.py +1 -24
electron-app/main.js +2 -2
frontend/src/App.vue +2 -13
frontend/src/assets/ball.json +2 -2
frontend/src/config/client_config.ts +1 -1
frontend/src/i18n/index.ts +0 -35
frontend/src/i18n/locales/en.ts +0 -74
frontend/src/i18n/locales/zh.ts +0 -74
frontend/src/main.ts +0 -2
frontend/src/stores/config.ts +0 -3
frontend/src/style.scss +0 -65
frontend/src/views/Home/Components/ChatText.vue +7 -15
frontend/src/views/Home/index.vue +1 -12
frontend/src/views/Welcome/Components/SettingsModal.vue +0 -581
frontend/src/views/Welcome/index.vue +418 -72
main.py +1 -16
pyproject.toml +4 -5
scripts/convert_tts_weights_to_safetensors.py +0 -47
src/voice_dialogue/api/app.py +1 -2
src/voice_dialogue/api/core/lifespan.py +2 -2
src/voice_dialogue/api/core/service_factories.py +4 -11
src/voice_dialogue/api/routes/system_routes.py +5 -107
src/voice_dialogue/api/schemas/system_schemas.py +2 -43
src/voice_dialogue/asr/manager.py +5 -24
src/voice_dialogue/asr/models/__init__.py +0 -9
src/voice_dialogue/asr/models/qwen.py +0 -76
src/voice_dialogue/audio/capture/__init__.py +7 -53
src/voice_dialogue/audio/capture/pyaudio_capture.py +10 -104
src/voice_dialogue/audio/devices.py +0 -167
src/voice_dialogue/audio/player.py +1 -69
src/voice_dialogue/cli/args.py +0 -14
src/voice_dialogue/config/audio_config.py +0 -77
src/voice_dialogue/config/paths.py +0 -1
src/voice_dialogue/core/launcher.py +4 -8
src/voice_dialogue/services/asr_service.py +1 -1
src/voice_dialogue/services/audio_player_service.py +1 -3
src/voice_dialogue/tts/runtime/moyoyo.py +0 -3
src/voice_dialogue/tts/weights_migration.py +0 -45
uv.lock +0 -0
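
The per-file entries above follow a `<path> +<added> -<removed>` shape, so they can be tallied mechanically. The following is a minimal sketch, not tooling from the repository: `FileChange` and `parse_changes` are hypothetical names, and the sample reuses only four entries from the list, so its totals are not the PR-wide totals.

```python
import re
from typing import NamedTuple


class FileChange(NamedTuple):
    path: str
    added: int
    removed: int


# "<path> +<added> -<removed>"; the path may contain spaces because
# renames are rendered as "{old → new}".
_LINE = re.compile(r"^(?P<path>.+?)\s+\+(?P<added>\d+)\s+-(?P<removed>\d+)$")


def parse_changes(text: str) -> list[FileChange]:
    """Parse a changed-files summary into (path, added, removed) records."""
    changes = []
    for raw in text.splitlines():
        match = _LINE.match(raw.strip())
        if match:  # silently skip lines that are not file entries
            changes.append(
                FileChange(match["path"], int(match["added"]), int(match["removed"]))
            )
    return changes


# Four entries copied from the list above (a subset, for illustration only).
sample = """\
README.md +20 -52
assets/www/assets/{index-CGlMbARk.js → index-ByqsFGbw.js} +2 -2
frontend/src/views/Welcome/index.vue +418 -72
uv.lock +0 -0
"""

changes = parse_changes(sample)
total_added = sum(c.added for c in changes)
total_removed = sum(c.removed for c in changes)
print(total_added, total_removed)
```

Anchoring the pattern at the end of the line is what keeps rename entries intact: everything before the trailing `+N -M` pair is treated as the path.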

README.md CHANGED

@@ -26,7 +26,7 @@ library_name: transformers
 ![Python](https://img.shields.io/badge/Python-3.9+-blue.svg)
 ![License](https://img.shields.io/badge/License-MIT-green.svg)
 ![Platform](https://img.shields.io/badge/Platform-macOS-lightgrey.svg)
-![Version](https://img.shields.io/badge/Version-1.2.0-orange.svg)
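 A real-time voice dialogue system that integrates automatic speech recognition (ASR), a large language model (LLM), and text-to-speech (TTS)
@@ -38,9 +38,8 @@ library_name: transformers
 VoiceDialogue is a complete Python-based voice dialogue system that delivers an end-to-end voice interaction experience. It uses a modular design and is real-time, highly accurate, and multi-speaker.
-- 🖥️ **Graphical interface**: Built-in web UI that runs in the browser (choose a voice, switch languages, view live subtitles)
-- 🎤 **Real-time speech recognition**: High-accuracy Chinese and English transcription based on Qwen3-ASR (with punctuation, supports 52 languages)
-- 🤖 **Intelligent dialogue generation**: Integrates large language models such as Qwen3
 - 🔊 **High-quality speech synthesis**: Supports multi-speaker, multi-style voice output
 - 🌐 **Web API service**: Provides an HTTP interface for easy integration
 - ⚡ **Low-latency processing**: Optimized audio streaming pipeline
@@ -49,78 +48,47 @@ VoiceDialogue is a complete Python-based voice dialogue system,
 ## 🚀 Quick Start
-> **The simplest way**: clone the repository → install dependencies → launch → open the graphical interface in a browser and start talking.
-> Currently only **macOS (Apple Silicon)** is supported.
-### 1. Clone and install
-> **The models come in two parts**:
-> - **Downloaded with the repository (about 12GB, Git LFS)**: the large language model, speech synthesis, reference voices, and so on.
-> - **Downloaded automatically on first launch (about 4.4GB)**: the speech recognition engine **Qwen3-ASR**, which the program pulls from HuggingFace on its first run and caches in `~/.cache/huggingface`; it does not need to be downloaded again afterwards.
->
-> ⚠️ **You must install [Git LFS](https://git-lfs.com) first**; otherwise the cloned models are only placeholder pointers of a few hundred bytes and the application cannot start.
 ```bash
-# 1) Install and initialize Git LFS (one time only)
-brew install git-lfs        # If Homebrew is not installed, see https://git-lfs.com
-git lfs install
-# 2) Clone the project (includes about 12GB of models; the download is large, so please be patient)
 git clone https://huggingface.co/MoYoYoTech/VoiceDialogue
 cd VoiceDialogue
-# 3) Verify that the model was actually fetched (it should show a size in GB, not 100+ bytes)
-#    If it is very small, Git LFS did not take effect; run: git lfs pull
-ls -lh assets/models/llm/qwen/Qwen3-8B-Q6_K.gguf
-# 4) Install dependencies (uv is recommended)
 pip install uv
 uv venv
 source .venv/bin/activate
 WHISPER_COREML=1 CMAKE_ARGS="-DGGML_METAL=on" uv sync
-# 5) Install extra dependencies
-uv pip install kokoro-onnx        # kokoro-onnx (English TTS)
-uv pip install numpy==1.26.4      # pin the numpy version
 ```
 > 📖 Need more detailed steps? See the [installation guide](docs/installation.md), which covers system requirements and common problems.
-### 2. Launch the graphical interface (recommended)
-```bash
-python main.py --mode api
-```
-After launch, open in a browser: **http://localhost:8000/app/**
-Everything can be done from the interface:
-- Click **⚙️ Settings** in the lower-right corner to choose the **microphone, echo cancellation, recognition language, and voice**; you can also switch the **Chinese / English interface language**;
-- Click **"Start conversation"** to talk with the AI in real time, with **subtitles displayed live**.
-> **A slow first launch is normal**: the program automatically downloads the Qwen3-ASR model (about 4.4GB, requires a network connection, with download progress printed in the terminal) and converts the TTS weight format once. It is ready only after all of this finishes, which takes a few minutes (depending on network speed); each later launch takes only tens of seconds.
-> If the terminal stays on the download step for a long time, check whether the network can reach `huggingface.co`.
-### 3. Command-line mode (CLI)
-If you do not need the graphical interface, you can also run the voice dialogue directly in the terminal:
 ```bash
-# Start the voice dialogue (Chinese by default)
 python main.py
-# Specify the language and voice
 python main.py --language en --speaker Heart
-# List available audio input devices (such as an external microphone array)
-python main.py --list-audio-devices
-# Specify the input device
-python main.py --input-device <device index>
 ```
 > For detailed usage, see the [configuration guide](docs/configuration.md) and the [API service guide](docs/api-guide.md).
 ## 📚 Documentation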
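
The removed step 3 asks the reader to confirm by eye that the model file is gigabytes in size rather than a tiny Git LFS placeholder. The same check can be sketched in Python by looking for the LFS pointer header at the start of the file. This is an illustration under stated assumptions, not code from the repository: `is_lfs_pointer` and `check_model` are hypothetical helper names, and the demo runs on throwaway files rather than the real model.

```python
import tempfile
from pathlib import Path

# Every Git LFS pointer file starts with this line.
LFS_HEADER = b"version https://git-lfs.github.com/spec/v1"


def is_lfs_pointer(path: Path) -> bool:
    """Return True if the file begins with the Git LFS pointer header."""
    with path.open("rb") as fh:
        return fh.read(len(LFS_HEADER)) == LFS_HEADER


def check_model(path: Path) -> str:
    """Classify a model path as 'missing', 'pointer', or 'ok'."""
    if not path.is_file():
        return "missing"
    if is_lfs_pointer(path):
        return "pointer"  # Git LFS did not fetch the content; run `git lfs pull`
    return "ok"


# Demo on temporary stand-in files (not the real 12GB assets).
with tempfile.TemporaryDirectory() as tmp:
    pointer_file = Path(tmp) / "pointer.gguf"
    pointer_file.write_bytes(LFS_HEADER + b"\noid sha256:" + b"0" * 64 + b"\nsize 123\n")
    real_file = Path(tmp) / "real.gguf"
    real_file.write_bytes(b"GGUF" + bytes(1024))  # arbitrary non-pointer bytes
    results = [
        check_model(pointer_file),
        check_model(real_file),
        check_model(Path(tmp) / "absent.gguf"),
    ]

print(results)
```

In a real checkout the call would be `check_model(Path("assets/models/llm/qwen/Qwen3-8B-Q6_K.gguf"))`, using the path named in the README.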

 ![Python](https://img.shields.io/badge/Python-3.9+-blue.svg)
 ![License](https://img.shields.io/badge/License-MIT-green.svg)
 ![Platform](https://img.shields.io/badge/Platform-macOS-lightgrey.svg)
+![Version](https://img.shields.io/badge/Version-1.0.0-orange.svg)
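 A real-time voice dialogue system that integrates automatic speech recognition (ASR), a large language model (LLM), and text-to-speech (TTS)
 VoiceDialogue is a complete Python-based voice dialogue system that delivers an end-to-end voice interaction experience. It uses a modular design and is real-time, highly accurate, and multi-speaker.
+- 🎤 **Real-time speech recognition**: High-accuracy Chinese and English speech transcription
+- 🤖 **Intelligent dialogue generation**: Integrates large language models such as Qwen2.5
 - 🔊 **High-quality speech synthesis**: Supports multi-speaker, multi-style voice output
 - 🌐 **Web API service**: Provides an HTTP interface for easy integration
 - ⚡ **Low-latency processing**: Optimized audio streaming pipeline
 ## 🚀 Quick Start
+### 1. Installation
 ```bash
+# Clone the project
 git clone https://huggingface.co/MoYoYoTech/VoiceDialogue
 cd VoiceDialogue
+# Install dependencies (uv is recommended)
 pip install uv
 uv venv
 source .venv/bin/activate
 WHISPER_COREML=1 CMAKE_ARGS="-DGGML_METAL=on" uv sync
+# Install extra dependencies
+## 1. Install kokoro-onnx
+uv pip install kokoro-onnx
+## 2. Reinstall a pinned version of numpy
+uv pip install numpy==1.26.4
 ```
 > 📖 Need more detailed steps? See the [installation guide](docs/installation.md), which covers system requirements and common problems.
+### 2. Running
+#### Command-line mode (CLI)
 ```bash
+# Start the voice dialogue (Chinese by default)
 python main.py
+# Start with a specific language and speaker
 python main.py --language en --speaker Heart
+```
+#### API service mode
+```bash
+# Start the API server
+python main.py --mode api
 ```
 > For detailed usage, see the [configuration guide](docs/configuration.md) and the [API service guide](docs/api-guide.md).
 ## 📚 Documentation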
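
After `python main.py --mode api`, a client usually has to wait until the server answers before sending requests. The sketch below polls a URL until it responds. It is a hedged illustration using only the standard library: `http_ok` and `wait_until_ready` are hypothetical helpers, and the address `http://localhost:8000/app/` is taken from the README lines removed in this diff, so whether that port or path is served on this branch is an assumption.

```python
import time
import urllib.error
import urllib.request
from typing import Callable


def http_ok(url: str, timeout: float = 2.0) -> bool:
    """Return True if a GET on url succeeds with a 2xx/3xx status."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return 200 <= resp.status < 400
    except (urllib.error.URLError, OSError):
        # Connection refused, timeout, or an HTTP error status: not ready yet.
        return False


def wait_until_ready(
    url: str,
    probe: Callable[[str], bool] = http_ok,
    attempts: int = 60,
    delay: float = 5.0,
    sleep: Callable[[float], None] = time.sleep,
) -> bool:
    """Call probe(url) up to `attempts` times, sleeping `delay` seconds between tries."""
    for _ in range(attempts):
        if probe(url):
            return True
        sleep(delay)
    return False


# Example call (assumed address, see the note above):
# wait_until_ready("http://localhost:8000/app/")
```

Passing `probe` and `sleep` as parameters keeps the loop testable without a running server or real delays.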

assets/www/assets/{index-CGlMbARk.js → index-ByqsFGbw.js} RENAMED

@@ -1,3 +1,3 @@
 version https://git-lfs.github.com/spec/v1
-oid sha256:2cccc544e9f2c32632c81cf9a5e8ed3cc9c5b0476ec8e595a180deb6fde095c8
-size 2307855

 version https://git-lfs.github.com/spec/v1
+oid sha256:215e0b4a6eee243715941860012a0d3bbee778f8880df45b0ddc8b090993405b
+size 2215701
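
The three lines in this diff are a Git LFS pointer: a spec version, an object id (`sha256:` plus a hex digest), and the size in bytes of the real file. As a minimal sketch of how such a pointer can be read and checked against downloaded bytes, assuming the hypothetical names `LfsPointer`, `parse_pointer`, and `matches` (none of which come from the repository):

```python
import hashlib
from dataclasses import dataclass


@dataclass(frozen=True)
class LfsPointer:
    version: str
    algo: str
    digest: str
    size: int


def parse_pointer(text: str) -> LfsPointer:
    """Parse 'key value' lines of a Git LFS pointer file."""
    fields = dict(line.split(" ", 1) for line in text.splitlines() if line.strip())
    algo, digest = fields["oid"].split(":", 1)
    return LfsPointer(fields["version"], algo, digest, int(fields["size"]))


def matches(pointer: LfsPointer, data: bytes) -> bool:
    """Check that data has the size and SHA-256 digest the pointer records."""
    if pointer.algo != "sha256":
        raise ValueError(f"unsupported hash algorithm: {pointer.algo}")
    return (
        len(data) == pointer.size
        and hashlib.sha256(data).hexdigest() == pointer.digest
    )


# The '+' side of the diff above, without the diff markers.
pointer = parse_pointer(
    "version https://git-lfs.github.com/spec/v1\n"
    "oid sha256:215e0b4a6eee243715941860012a0d3bbee778f8880df45b0ddc8b090993405b\n"
    "size 2215701\n"
)
print(pointer.algo, pointer.size)
```

For a multi-gigabyte model file, hashing in chunks from disk would be preferable to loading the whole file into memory; `matches` takes bytes only to keep the sketch short.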

assets/www/assets/{index-BPAUWo8W.css → index-CCuJ1lip.css} RENAMED

@@ -1 +1 @@

- @charset "UTF-8";html,body{width:100%;height:100%}input::-ms-clear,input::-ms-reveal{display:none}*,*:before,*:after{box-sizing:border-box}html{font-family:sans-serif;line-height:1.15;-webkit-text-size-adjust:100%;-ms-text-size-adjust:100%;-ms-overflow-style:scrollbar;-webkit-tap-highlight-color:rgba(0,0,0,0)}body{margin:0}[tabindex="-1"]:focus{outline:none}hr{box-sizing:content-box;height:0;overflow:visible}h1,h2,h3,h4,h5,h6{margin-top:0;margin-bottom:.5em;font-weight:500}p{margin-top:0;margin-bottom:1em}abbr[title],abbr[data-original-title]{-webkit-text-decoration:underline dotted;text-decoration:underline;text-decoration:underline dotted;border-bottom:0;cursor:help}address{margin-bottom:1em;font-style:normal;line-height:inherit}input[type=text],input[type=password],input[type=number],textarea{-webkit-appearance:none}ol,ul,dl{margin-top:0;margin-bottom:1em}ol ol,ul ul,ol ul,ul ol{margin-bottom:0}dt{font-weight:500}dd{margin-bottom:.5em;margin-left:0}blockquote{margin:0 0 1em}dfn{font-style:italic}b,strong{font-weight:bolder}small{font-size:80%}sub,sup{position:relative;font-size:75%;line-height:0;vertical-align:baseline}sub{bottom:-.25em}sup{top:-.5em}pre,code,kbd,samp{font-size:1em;font-family:SFMono-Regular,Consolas,Liberation Mono,Menlo,Courier,monospace}pre{margin-top:0;margin-bottom:1em;overflow:auto}figure{margin:0 0 1em}img{vertical-align:middle;border-style:none}a,area,button,[role=button],input:not([type=range]),label,select,summary,textarea{touch-action:manipulation}table{border-collapse:collapse}caption{padding-top:.75em;padding-bottom:.3em;text-align:left;caption-side:bottom}input,button,select,optgroup,textarea{margin:0;color:inherit;font-size:inherit;font-family:inherit;line-height:inherit}button,input{overflow:visible}button,select{text-transform:none}button,html 
[type=button],[type=reset],[type=submit]{-webkit-appearance:button}button::-moz-focus-inner,[type=button]::-moz-focus-inner,[type=reset]::-moz-focus-inner,[type=submit]::-moz-focus-inner{padding:0;border-style:none}input[type=radio],input[type=checkbox]{box-sizing:border-box;padding:0}input[type=date],input[type=time],input[type=datetime-local],input[type=month]{-webkit-appearance:listbox}textarea{overflow:auto;resize:vertical}fieldset{min-width:0;margin:0;padding:0;border:0}legend{display:block;width:100%;max-width:100%;margin-bottom:.5em;padding:0;color:inherit;font-size:1.5em;line-height:inherit;white-space:normal}progress{vertical-align:baseline}[type=number]::-webkit-inner-spin-button,[type=number]::-webkit-outer-spin-button{height:auto}[type=search]{outline-offset:-2px;-webkit-appearance:none}[type=search]::-webkit-search-cancel-button,[type=search]::-webkit-search-decoration{-webkit-appearance:none}::-webkit-file-upload-button{font:inherit;-webkit-appearance:button}output{display:inline-block}summary{display:list-item}template{display:none}[hidden]{display:none!important}mark{padding:.2em;background-color:#feffe6}:root{font-family:Inter,system-ui,Avenir,Helvetica,Arial,sans-serif;line-height:1.5;font-weight:400;color-scheme:light dark;color:#ffffffde;background-color:#242424;font-synthesis:none;text-rendering:optimizeLegibility;-webkit-font-smoothing:antialiased;-moz-osx-font-smoothing:grayscale;-webkit-text-size-adjust:100%}a{font-weight:500;color:#646cff;text-decoration:inherit}a:hover{color:#535bf2}body{margin:0;display:flex;place-items:center;min-width:320px;height:100%;min-height:auto;color:#333;background:#fff}h1{font-size:3.2em;line-height:1.1}button{border-radius:8px;border:1px solid transparent;padding:.6em 1.2em;font-size:1em;font-weight:500;font-family:inherit;background-color:#1a1a1a;cursor:pointer;transition:border-color .25s}.card{border-bottom:solid 2px 
lightgray;align-items:center;justify-content:center;margin-top:40px;display:flex;max-width:1024px;width:100%}.seg-title{margin:24px 0;font-size:20px;font-weight:500}.seg-co{width:1022px;text-align:left;border-left:solid 6px midnightblue;padding-left:8px;margin-left:2px;margin-top:36px;line-height:24px}#app{margin:0 auto;padding:0;text-align:center;width:100%;height:100%}.ant-btn{padding:4px 12px}@media (prefers-color-scheme: light){:root{color:#213547;background-color:#fff}a:hover{color:#747bff}button{background-color:#f9f9f9}}.ant-card{background:#f5f6fa;height:100%}.ant-card-body{padding:24px 36px 12px!important;border-radius:0 0 8px 8px}.ant-card .ant-card-actions{background-color:#e8e8f8cc!important}.ant-popover{max-width:800px!important}.ant-form-item{background:transparent;margin-bottom:40px!important}.ant-form-item .ant-form-item-explain-error{color:#ff4d4f;text-align:left!important}.ant-form-item-label label{font-size:18px!important;color:#1a1a1a!important;font-weight:500!important}.ant-tooltip{max-width:1022px!important}.ant-page-header-heading{width:1022px!important}.highlight{background:#f8f8ff}.ant-layout-sider-collapsed{width:0!important;min-width:0!important;overflow:hidden}.ant-layout-sider-collapsed .ant-menu-item,.ant-layout-sider-collapsed .ant-menu-submenu-title{display:none}.ant-modal .ant-modal-content{background:#ffffff9e!important;backdrop-filter:blur(28px) saturate(140%);-webkit-backdrop-filter:blur(28px) saturate(140%);border:1px solid rgba(255,255,255,.6);border-radius:22px!important;box-shadow:0 16px 48px #1f26872e}.ant-modal .ant-modal-header{background:transparent!important}.ant-modal-mask{background:#14161e1f!important;backdrop-filter:blur(14px) saturate(120%);-webkit-backdrop-filter:blur(14px) saturate(120%)}.ant-select .ant-select-selector,.ant-input,textarea.ant-input,.ant-input-affix-wrapper{background:#ffffff73!important;backdrop-filter:blur(8px);-webkit-backdrop-filter:blur(8px);border:1px solid 
rgba(255,255,255,.7)!important}.ant-btn:not(.ant-btn-text):not(.ant-btn-link){box-shadow:0 2px 10px #1f26871a}.ant-btn-default{background:#ffffff80!important;border:1px solid rgba(255,255,255,.75)!important;backdrop-filter:blur(8px);-webkit-backdrop-filter:blur(8px)}.ant-btn-text{box-shadow:none!important;background:transparent!important}.ant-radio-group-solid .ant-radio-button-wrapper:first-child{border-top-left-radius:12px;border-bottom-left-radius:12px}.ant-radio-group-solid .ant-radio-button-wrapper:last-child{border-top-right-radius:12px;border-bottom-right-radius:12px}.header-nav[data-v-07594418]{display:flex;align-items:flex-start;justify-content:space-between;width:100vw;height:40px;align-items:center;position:absolute;top:0;left:0;z-index:99;-webkit-app-region:drag;cursor:move}.header-nav .window-controls[data-v-07594418],.header-nav button[data-v-07594418],.header-nav .ant-input-search[data-v-07594418],.header-nav img[data-v-07594418],.header-nav .anticon[data-v-07594418]{-webkit-app-region:no-drag;cursor:pointer}.header-nav .window-controls[data-v-07594418]{top:0;right:0;display:flex;z-index:1000;margin-left:12px}.header-nav .window-controls .window-control-btn[data-v-07594418]{width:46px;height:32px;border:none;background:transparent;color:#666;font-size:16px;cursor:pointer;display:flex;align-items:center;justify-content:center;transition:background-color .2s}.header-nav .window-controls .window-control-btn[data-v-07594418]:hover{background-color:#0000001a}.header-nav .window-controls .window-control-btn.close[data-v-07594418]:hover{background-color:#e81123;color:#fff}.header-nav .window-controls .close-icon.focus[data-v-07594418]{display:none}.header-nav .window-controls:hover .close-icon.default[data-v-07594418],.header-nav .window-controls:focus-within .close-icon.default[data-v-07594418]{display:none}.header-nav .window-controls:hover .close-icon.focus[data-v-07594418],.header-nav .window-controls:focus-within 
.close-icon.focus[data-v-07594418]{display:inline}.content[data-v-b8a456cb]{background-color:#fff;margin:0 auto;display:flex;flex-direction:column;align-items:center;justify-content:space-between}.not-found-wrapper[data-v-aef52a59]{height:calc(100vh - 104px)}.tab-body[data-v-a48e843b]{height:360px;overflow-y:auto;padding:4px 8px 4px 2px}.setting-row[data-v-a48e843b]{margin-bottom:20px}.setting-row>label[data-v-a48e843b]{display:block;font-size:15px;font-weight:500;margin-bottom:8px}.setting-row>label .label-icon[data-v-a48e843b]{margin-right:6px;color:#1890ff}.setting-row .hint[data-v-a48e843b]{font-size:12px;color:#999;margin:8px 0 0}.setting-row .row-inline[data-v-a48e843b]{display:flex;align-items:center;justify-content:space-between}.voice-group[data-v-a48e843b]{display:flex;flex-direction:column;margin-top:8px}.about .about-head[data-v-a48e843b]{text-align:center;margin-bottom:24px}.about .about-head .about-name[data-v-a48e843b]{font-size:20px;font-weight:600}.about .about-head .about-ver[data-v-a48e843b]{font-size:13px;color:#888;margin-top:2px}.about .about-head .about-tagline[data-v-a48e843b]{font-size:12px;color:#999;margin-top:4px}.about .about-section[data-v-a48e843b]{margin-bottom:20px}.about .about-section .about-section-title[data-v-a48e843b]{font-size:13px;font-weight:600;color:#666;margin-bottom:10px}.about .about-item[data-v-a48e843b]{margin-bottom:12px}.about .about-item .about-item-label[data-v-a48e843b]{font-size:14px;font-weight:500}.about .about-item .about-item-desc[data-v-a48e843b]{font-size:12px;color:#777;margin-top:2px;line-height:1.6}.about .about-item .about-item-desc a[data-v-a48e843b]{margin-left:6px}.about a[data-v-a48e843b]{color:#1677ff;text-decoration:none}.about a[data-v-a48e843b]:hover{text-decoration:underline}.about .about-link[data-v-a48e843b]{font-size:13px;word-break:break-all}.about 
.about-copyright[data-v-a48e843b]{margin-top:16px;font-size:11px;color:#aaa;text-align:center}.voice-radio[data-v-a48e843b]{display:flex;align-items:center;height:40px;line-height:40px}.voice-radio .voice-name[data-v-a48e843b]{margin-right:8px}.audio-play-btn[data-v-a48e843b]{padding:0 6px;border-radius:4px}.audio-play-btn.playing[data-v-a48e843b]{background-color:#f6ffed}.asr-chip[data-v-ca2e1f17]{display:flex;align-items:center;gap:8px;height:38px;padding:0 18px;margin-right:16px;border-radius:19px;color:#000000a6;font-size:13px;background:#ffffff80;border:1px solid rgba(255,255,255,.7);backdrop-filter:blur(10px);-webkit-backdrop-filter:blur(10px);box-shadow:0 4px 16px #1f26871f}.settings-btn[data-v-ca2e1f17]{width:60px;height:60px;margin-right:24px;border-radius:50%!important;background:#ffffff80!important;border:1px solid rgba(255,255,255,.7)!important;backdrop-filter:blur(10px);-webkit-backdrop-filter:blur(10px);box-shadow:0 4px 16px #1f26871f;display:flex;align-items:center;justify-content:center}.welcome-wrapper[data-v-ca2e1f17]{width:100%;height:100%;background-image:url(./bg-BmnA8p_e.png);background-repeat:no-repeat;background-attachment:fixed;background-size:cover;background-position:center;display:flex;flex-direction:column;align-items:center;justify-content:space-between;color:#fff}.welcome-wrapper .content[data-v-ca2e1f17]{width:100%;height:80vh;display:flex;flex-direction:column;justify-content:space-around;margin-top:64px}.welcome-wrapper .content .inner-content[data-v-ca2e1f17]{display:flex;flex-direction:column;align-items:center;justify-content:center;text-align:center;padding:20px}.welcome-wrapper .content .inner-content .text-box[data-v-ca2e1f17]{color:#000;margin-bottom:36px}.welcome-wrapper .content .inner-content .text-box .title[data-v-ca2e1f17]{font-size:24px;font-weight:600;margin-bottom:24px}.welcome-wrapper .content .inner-content .text-box .sub-title[data-v-ca2e1f17]{font-size:15px;margin-top:10px}.welcome-wrapper .content 
.inner-content .btn-box[data-v-ca2e1f17]{width:224px;height:80px}.welcome-wrapper .actions[data-v-ca2e1f17]{width:100%;height:100px;margin-bottom:32px;display:flex;align-items:center;justify-content:flex-end}.ball-wrapper[data-v-34c8e583]{width:100%;height:calc(100vh - 100px);display:flex;flex-direction:column;align-items:center;justify-content:space-around}.talk-wrapper[data-v-05da84ae]{width:auto;width:100%;max-width:1000px;margin:0 auto;box-sizing:border-box;height:calc(100vh - 150px);overflow-y:auto;padding:20px 32px 0;display:flex;flex-direction:column;align-items:flex-start;justify-content:flex-start}.talk-wrapper .cont-left[data-v-05da84ae]{width:100%;margin:24px 0;display:flex;justify-content:flex-start;align-items:flex-start}.talk-wrapper .cont-left .text-left[data-v-05da84ae]{max-width:88%;color:#222;font-size:16px;font-weight:400;text-align:left;line-height:1.8;margin-left:12px;margin-top:6px;word-break:break-word}.talk-wrapper .cont-right[data-v-05da84ae]{width:100%;margin:24px 0;display:flex;justify-content:flex-end;align-items:flex-start}.talk-wrapper .cont-right .text-right[data-v-05da84ae]{max-width:80%;color:#444;font-size:16px;font-weight:400;text-align:start;line-height:1.8;margin-right:12px;background:#ccc;border-radius:8px 0 8px 8px;padding:8px 12px;word-break:break-word}.chat-wrapper[data-v-8b035bf4]{width:100%;height:100%;background-image:url(./bg-BmnA8p_e.png);background-repeat:no-repeat;background-attachment:fixed;background-size:cover;background-position:center;display:flex;flex-direction:column;align-items:center;justify-content:space-between;color:#fff}.chat-wrapper .content[data-v-8b035bf4]{width:100%;height:auto;display:flex;flex-direction:column;justify-content:space-around}.chat-wrapper .content .inner-content[data-v-8b035bf4]{display:flex;flex-direction:column;align-items:center;justify-content:center;text-align:center;padding:20px}.chat-wrapper .content .inner-content 
.text-box[data-v-8b035bf4]{color:#000;margin-bottom:36px}.chat-wrapper .content .inner-content .text-box .title[data-v-8b035bf4]{font-size:24px;font-weight:600;margin-bottom:24px}.chat-wrapper .content .inner-content .text-box .sub-title[data-v-8b035bf4]{font-size:15px;margin-top:10px}.chat-wrapper .content .inner-content .btn-box[data-v-8b035bf4]{width:224px;height:80px}.chat-wrapper .actions[data-v-8b035bf4]{width:100%;height:100px;margin-bottom:32px;display:flex;justify-content:space-between;align-items:center}.chat-wrapper .actions .holder[data-v-8b035bf4]{width:64px;height:48px}.chat-wrapper .actions .btns[data-v-8b035bf4]{width:450px;height:96px;display:flex;justify-content:space-around;align-items:center}.chat-wrapper .actions .btns[data-v-8b035bf4] .ant-btn{border-radius:50%!important;background:#ffffff80!important;border:1px solid rgba(255,255,255,.7)!important;backdrop-filter:blur(10px);-webkit-backdrop-filter:blur(10px);box-shadow:0 4px 16px #1f26871f}.chat-wrapper .actions .download-wrapper[data-v-8b035bf4]{width:64px;height:64px;display:flex;justify-content:flex-start;align-items:center;margin-right:0}.chat-wrapper .actions .download-wrapper img[data-v-8b035bf4]{width:24px;height:24px}.content-wrapper[data-v-d41c9ce7]{text-align:left;max-width:800px;min-width:320px;margin-bottom:64px;min-height:calc(100vh - 438px)}.content-wrapper .content-box[data-v-d41c9ce7]{padding:24px;height:240px;background-color:#e8e8e8;border-radius:16px;width:50%;margin:48px auto;min-width:300px}.content-wrapper .video-box[data-v-d41c9ce7]{max-width:800px;min-width:320px;width:90vw;height:auto}

+ @charset "UTF-8";html,body{width:100%;height:100%}input::-ms-clear,input::-ms-reveal{display:none}*,*:before,*:after{box-sizing:border-box}html{font-family:sans-serif;line-height:1.15;-webkit-text-size-adjust:100%;-ms-text-size-adjust:100%;-ms-overflow-style:scrollbar;-webkit-tap-highlight-color:rgba(0,0,0,0)}body{margin:0}[tabindex="-1"]:focus{outline:none}hr{box-sizing:content-box;height:0;overflow:visible}h1,h2,h3,h4,h5,h6{margin-top:0;margin-bottom:.5em;font-weight:500}p{margin-top:0;margin-bottom:1em}abbr[title],abbr[data-original-title]{-webkit-text-decoration:underline dotted;text-decoration:underline;text-decoration:underline dotted;border-bottom:0;cursor:help}address{margin-bottom:1em;font-style:normal;line-height:inherit}input[type=text],input[type=password],input[type=number],textarea{-webkit-appearance:none}ol,ul,dl{margin-top:0;margin-bottom:1em}ol ol,ul ul,ol ul,ul ol{margin-bottom:0}dt{font-weight:500}dd{margin-bottom:.5em;margin-left:0}blockquote{margin:0 0 1em}dfn{font-style:italic}b,strong{font-weight:bolder}small{font-size:80%}sub,sup{position:relative;font-size:75%;line-height:0;vertical-align:baseline}sub{bottom:-.25em}sup{top:-.5em}pre,code,kbd,samp{font-size:1em;font-family:SFMono-Regular,Consolas,Liberation Mono,Menlo,Courier,monospace}pre{margin-top:0;margin-bottom:1em;overflow:auto}figure{margin:0 0 1em}img{vertical-align:middle;border-style:none}a,area,button,[role=button],input:not([type=range]),label,select,summary,textarea{touch-action:manipulation}table{border-collapse:collapse}caption{padding-top:.75em;padding-bottom:.3em;text-align:left;caption-side:bottom}input,button,select,optgroup,textarea{margin:0;color:inherit;font-size:inherit;font-family:inherit;line-height:inherit}button,input{overflow:visible}button,select{text-transform:none}button,html 
[type=button],[type=reset],[type=submit]{-webkit-appearance:button}button::-moz-focus-inner,[type=button]::-moz-focus-inner,[type=reset]::-moz-focus-inner,[type=submit]::-moz-focus-inner{padding:0;border-style:none}input[type=radio],input[type=checkbox]{box-sizing:border-box;padding:0}input[type=date],input[type=time],input[type=datetime-local],input[type=month]{-webkit-appearance:listbox}textarea{overflow:auto;resize:vertical}fieldset{min-width:0;margin:0;padding:0;border:0}legend{display:block;width:100%;max-width:100%;margin-bottom:.5em;padding:0;color:inherit;font-size:1.5em;line-height:inherit;white-space:normal}progress{vertical-align:baseline}[type=number]::-webkit-inner-spin-button,[type=number]::-webkit-outer-spin-button{height:auto}[type=search]{outline-offset:-2px;-webkit-appearance:none}[type=search]::-webkit-search-cancel-button,[type=search]::-webkit-search-decoration{-webkit-appearance:none}::-webkit-file-upload-button{font:inherit;-webkit-appearance:button}output{display:inline-block}summary{display:list-item}template{display:none}[hidden]{display:none!important}mark{padding:.2em;background-color:#feffe6}:root{font-family:Inter,system-ui,Avenir,Helvetica,Arial,sans-serif;line-height:1.5;font-weight:400;color-scheme:light dark;color:#ffffffde;background-color:#242424;font-synthesis:none;text-rendering:optimizeLegibility;-webkit-font-smoothing:antialiased;-moz-osx-font-smoothing:grayscale;-webkit-text-size-adjust:100%}a{font-weight:500;color:#646cff;text-decoration:inherit}a:hover{color:#535bf2}body{margin:0;display:flex;place-items:center;min-width:320px;height:100%;min-height:auto;color:#333;background:#fff}h1{font-size:3.2em;line-height:1.1}button{border-radius:8px;border:1px solid transparent;padding:.6em 1.2em;font-size:1em;font-weight:500;font-family:inherit;background-color:#1a1a1a;cursor:pointer;transition:border-color .25s}.card{border-bottom:solid 2px 
lightgray;align-items:center;justify-content:center;margin-top:40px;display:flex;max-width:1024px;width:100%}.seg-title{margin:24px 0;font-size:20px;font-weight:500}.seg-co{width:1022px;text-align:left;border-left:solid 6px midnightblue;padding-left:8px;margin-left:2px;margin-top:36px;line-height:24px}#app{margin:0 auto;padding:0;text-align:center;width:100%;height:100%}.ant-btn{padding:4px 12px}@media (prefers-color-scheme: light){:root{color:#213547;background-color:#fff}a:hover{color:#747bff}button{background-color:#f9f9f9}}.ant-card{background:#f5f6fa;height:100%}.ant-card-body{padding:24px 36px 12px!important;border-radius:0 0 8px 8px}.ant-card .ant-card-actions{background-color:#e8e8f8cc!important}.ant-popover{max-width:800px!important}.ant-form-item{background:transparent;margin-bottom:40px!important}.ant-form-item .ant-form-item-explain-error{color:#ff4d4f;text-align:left!important}.ant-form-item-label label{font-size:18px!important;color:#1a1a1a!important;font-weight:500!important}.ant-tooltip{max-width:1022px!important}.ant-page-header-heading{width:1022px!important}.highlight{background:#f8f8ff}.ant-layout-sider-collapsed{width:0!important;min-width:0!important;overflow:hidden}.ant-layout-sider-collapsed .ant-menu-item,.ant-layout-sider-collapsed .ant-menu-submenu-title{display:none}.header-nav[data-v-07594418]{display:flex;align-items:flex-start;justify-content:space-between;width:100vw;height:40px;align-items:center;position:absolute;top:0;left:0;z-index:99;-webkit-app-region:drag;cursor:move}.header-nav .window-controls[data-v-07594418],.header-nav button[data-v-07594418],.header-nav .ant-input-search[data-v-07594418],.header-nav img[data-v-07594418],.header-nav .anticon[data-v-07594418]{-webkit-app-region:no-drag;cursor:pointer}.header-nav .window-controls[data-v-07594418]{top:0;right:0;display:flex;z-index:1000;margin-left:12px}.header-nav .window-controls 
.window-control-btn[data-v-07594418]{width:46px;height:32px;border:none;background:transparent;color:#666;font-size:16px;cursor:pointer;display:flex;align-items:center;justify-content:center;transition:background-color .2s}.header-nav .window-controls .window-control-btn[data-v-07594418]:hover{background-color:#0000001a}.header-nav .window-controls .window-control-btn.close[data-v-07594418]:hover{background-color:#e81123;color:#fff}.header-nav .window-controls .close-icon.focus[data-v-07594418]{display:none}.header-nav .window-controls:hover .close-icon.default[data-v-07594418],.header-nav .window-controls:focus-within .close-icon.default[data-v-07594418]{display:none}.header-nav .window-controls:hover .close-icon.focus[data-v-07594418],.header-nav .window-controls:focus-within .close-icon.focus[data-v-07594418]{display:inline}.content[data-v-874ca48f]{background-color:#fff;margin:0 auto;display:flex;flex-direction:column;align-items:center;justify-content:space-between}.not-found-wrapper[data-v-aef52a59]{height:calc(100vh - 104px)}.btn-groups[data-v-839398ff]{margin-top:36px;display:flex;justify-content:flex-end;align-items:center}.prompt-title p[data-v-839398ff]{margin:0;font-size:16px;font-weight:500}.prompt-content[data-v-839398ff]{margin-top:16px}.prompt-content .prompt-title[data-v-839398ff]{margin-bottom:24px;font-size:22px;font-weight:500;text-align:center}.prompt-content .language-segment[data-v-839398ff]{display:flex;justify-content:center;margin-bottom:16px}.prompt-content .prompt-item[data-v-839398ff]{margin-top:16px}.languages[data-v-cd713caa]{margin-top:24px;margin-bottom:24px}.languages p[data-v-cd713caa]{font-size:16px;font-weight:500;margin-bottom:8px}.audio-play-btn[data-v-cd713caa]{padding:2px 8px 0;border-radius:4px;transition:all .2s;height:40px}.audio-play-btn[data-v-cd713caa]:hover{background-color:#f0f0f0}.audio-play-btn.playing[data-v-cd713caa]{background-color:#f6ffed;border-color:#1890ff}.audio-play-btn.playing 
.playing-icon[data-v-cd713caa]{animation:pulse-cd713caa 1.5s infinite}@keyframes pulse-cd713caa{0%{opacity:1;transform:scale(1)}50%{opacity:.7;transform:scale(1.1)}to{opacity:1;transform:scale(1)}}.btn-groups[data-v-cd713caa]{margin-top:36px;display:flex;justify-content:space-between;align-items:center}.custom-popover-list[data-v-cd713caa]{width:92px;margin:0}.custom-popover-list .custom-popover-item[data-v-cd713caa]{font-size:14px;line-height:36px;font-weight:500;color:#1e1e1e;cursor:pointer;border-radius:4px;padding:0 8px;margin:0 -8px;transition:background .2s}.custom-popover-list .custom-popover-item[data-v-cd713caa]:hover,.custom-popover-list .custom-popover-item[data-v-cd713caa]:focus{background:#e5e7eb}.welcome-wrapper[data-v-cd713caa]{width:100%;height:100%;background-image:url(./bg-BmnA8p_e.png);background-repeat:no-repeat;background-attachment:fixed;background-size:cover;background-position:center;display:flex;flex-direction:column;align-items:center;justify-content:space-between;color:#fff}.welcome-wrapper .content[data-v-cd713caa]{width:100%;height:80vh;display:flex;flex-direction:column;justify-content:space-around;margin-top:64px}.welcome-wrapper .content .inner-content[data-v-cd713caa]{display:flex;flex-direction:column;align-items:center;justify-content:center;text-align:center;padding:20px}.welcome-wrapper .content .inner-content .text-box[data-v-cd713caa]{color:#000;margin-bottom:36px}.welcome-wrapper .content .inner-content .text-box .title[data-v-cd713caa]{font-size:24px;font-weight:600;margin-bottom:24px}.welcome-wrapper .content .inner-content .text-box .sub-title[data-v-cd713caa]{font-size:15px;margin-top:10px}.welcome-wrapper .content .inner-content .btn-box[data-v-cd713caa]{width:224px;height:80px}.welcome-wrapper .actions[data-v-cd713caa]{width:100%;height:64px;display:flex;justify-content:flex-end}.ball-wrapper[data-v-34c8e583]{width:100%;height:calc(100vh - 
100px);display:flex;flex-direction:column;align-items:center;justify-content:space-around}.talk-wrapper[data-v-1f502814]{width:auto;height:calc(100vh - 100px);overflow-y:scroll;padding:20px 240px 0;display:flex;flex-direction:column;align-items:flex-start;justify-content:flex-start}.talk-wrapper .cont-left[data-v-1f502814]{width:100%;margin:24px 0;display:flex;justify-content:flex-start;align-items:flex-start}.talk-wrapper .cont-left .text-left[data-v-1f502814]{color:#222;font-size:16px;font-weight:400;text-align:left;line-height:2;margin-left:12px;margin-top:6px}.talk-wrapper .cont-right[data-v-1f502814]{width:100%;margin:24px 0;display:flex;justify-content:flex-end;align-items:flex-start}.talk-wrapper .cont-right .text-right[data-v-1f502814]{color:#444;font-size:16px;font-weight:400;text-align:end;line-height:2;margin-right:12px;background:#ccc;border-radius:8px 0 8px 8px;padding:8px}.chat-wrapper[data-v-803600aa]{width:100%;height:100%;background-image:url(./bg-BmnA8p_e.png);background-repeat:no-repeat;background-attachment:fixed;background-size:cover;background-position:center;display:flex;flex-direction:column;align-items:center;justify-content:space-between;color:#fff}.chat-wrapper .content[data-v-803600aa]{width:100%;height:auto;display:flex;flex-direction:column;justify-content:space-around}.chat-wrapper .content .inner-content[data-v-803600aa]{display:flex;flex-direction:column;align-items:center;justify-content:center;text-align:center;padding:20px}.chat-wrapper .content .inner-content .text-box[data-v-803600aa]{color:#000;margin-bottom:36px}.chat-wrapper .content .inner-content .text-box .title[data-v-803600aa]{font-size:24px;font-weight:600;margin-bottom:24px}.chat-wrapper .content .inner-content .text-box .sub-title[data-v-803600aa]{font-size:15px;margin-top:10px}.chat-wrapper .content .inner-content .btn-box[data-v-803600aa]{width:224px;height:80px}.chat-wrapper 
.actions[data-v-803600aa]{width:100%;height:100px;display:flex;justify-content:space-between;align-items:center}.chat-wrapper .actions .holder[data-v-803600aa]{width:64px;height:48px}.chat-wrapper .actions .btns[data-v-803600aa]{width:450px;height:96px;display:flex;justify-content:space-around;align-items:flex-start}.chat-wrapper .actions .download-wrapper[data-v-803600aa]{width:64px;height:64px;display:flex;justify-content:flex-start;align-items:center;margin-right:0}.chat-wrapper .actions .download-wrapper img[data-v-803600aa]{width:24px;height:24px}.content-wrapper[data-v-d41c9ce7]{text-align:left;max-width:800px;min-width:320px;margin-bottom:64px;min-height:calc(100vh - 438px)}.content-wrapper .content-box[data-v-d41c9ce7]{padding:24px;height:240px;background-color:#e8e8e8;border-radius:16px;width:50%;margin:48px auto;min-width:300px}.content-wrapper .video-box[data-v-d41c9ce7]{max-width:800px;min-width:320px;width:90vw;height:auto}

assets/www/index.html CHANGED

@@ -5,8 +5,8 @@
     <link rel="icon" type="image/svg+xml" href="./favicon.ico" />
     <meta name="viewport" content="width=device-width, initial-scale=1.0" />
     <title>VoiceDialogue</title>
-    <script type="module" crossorigin src="./assets/index-CGlMbARk.js"></script>
-    <link rel="stylesheet" crossorigin href="./assets/index-BPAUWo8W.css">
   </head>
   <body>
     <div id="app"></div>

     <link rel="icon" type="image/svg+xml" href="./favicon.ico" />
     <meta name="viewport" content="width=device-width, initial-scale=1.0" />
     <title>VoiceDialogue</title>
+    <script type="module" crossorigin src="./assets/index-ByqsFGbw.js"></script>
+    <link rel="stylesheet" crossorigin href="./assets/index-CCuJ1lip.css">
   </head>
   <body>
     <div id="app"></div>

build/pyinstaller/hooks/hook-voice_dialogue.py CHANGED

@@ -24,29 +24,8 @@ ASSETS_ROOT = PROJECT_ROOT / "assets"
 # Collect all submodules of the main module
 hiddenimports = collect_submodules('voice_dialogue')
 datas = collect_data_files('moyoyo_tts', include_py_files=True)
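-# Resources excluded from the bundle:
-# - Legacy FunASR/Whisper models (the default engine is the built-in Qwen3-ASR)
-# - The TTS pretrained-weight .bin files (an equivalent model.safetensors is already bundled)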
-EXCLUDED_ASSET_PATTERNS = [
-    "assets/models/asr/funasr/",
-    "assets/models/asr/whisper/",
-    "chinese-roberta-wwm-ext-large/pytorch_model.bin",
-    "chinese-hubert-base/pytorch_model.bin",
-]
-def _is_excluded(source_path: str) -> bool:
-    normalized = source_path.replace("\\", "/")
-    return any(pattern in normalized for pattern in EXCLUDED_ASSET_PATTERNS)
 # Collect system resource files
-datas += [
-    (source, dest)
-    for source, dest in collect_system_data_files(ASSETS_ROOT.as_posix(), "assets")
-    if not _is_excluded(source)
-]
 # ============================================================================
 # Third-party dependency configuration
@@ -60,7 +39,6 @@ ML_DEPENDENCIES = [
     "pytorch_lightning",
     "huggingface_hub",
     "einops",
-    "qwen_asr",
 ]
 # Speech-processing dependencies
@@ -139,7 +117,6 @@ DATA_PACKAGES = [
     ("spacy", {"include_py_files": True}),
     ("misaki", {}),
     ("silero_vad", {}),
-    ("qwen_asr", {}),
 ]
 # Collect data files

 # Collect all submodules of the main module
 hiddenimports = collect_submodules('voice_dialogue')
 datas = collect_data_files('moyoyo_tts', include_py_files=True)
 # Collect system resource files
+datas += collect_system_data_files(ASSETS_ROOT.as_posix(), "assets")
 # ============================================================================
 # Third-party dependency configuration
     "pytorch_lightning",
     "huggingface_hub",
     "einops",
 ]
 # Speech-processing dependencies
     ("spacy", {"include_py_files": True}),
     ("misaki", {}),
     ("silero_vad", {}),
 ]
 # Collect data files

electron-app/main.js CHANGED

@@ -1,3 +1,3 @@
 version https://git-lfs.github.com/spec/v1
-oid sha256:b9aeffbae7b9d83ea4abd63ef6c9465878c146bead1dea9339e087b85f2adccd
-size 7392

 version https://git-lfs.github.com/spec/v1
+oid sha256:4b10113a08513b026f9207f21db221f368a1f242a821dd54d2d002354a6a2ec2
+size 7039

frontend/src/App.vue CHANGED

@@ -1,8 +1,6 @@
 <template>
-  <a-config-provider :theme="appTheme">
-    <Header/>
-    <router-view class="content" />
-  </a-config-provider>
   <!-- <Footer/> -->
   <!-- <a-layout>
@@ -21,15 +19,6 @@
 import Header from "@/views/Header.vue";
 import Footer from "@/views/Footer.vue";
-// Global theme: unify border radius and control height to match the glassmorphism (Liquid Glass) style
-const appTheme = {
-  token: {
-    colorPrimary: '#1677ff',
-    borderRadius: 14,
-    controlHeight: 38,
-  },
-};
 // import * as api from "@/client";
 import { onBeforeMount, onMounted, watch, CSSProperties, ref} from "vue";
 import {useSettingsStore} from "@/stores/config.ts";

 <template>
+  <Header/>
+  <router-view class="content" />
   <!-- <Footer/> -->
   <!-- <a-layout>
 import Header from "@/views/Header.vue";
 import Footer from "@/views/Footer.vue";
 // import * as api from "@/client";
 import { onBeforeMount, onMounted, watch, CSSProperties, ref} from "vue";
 import {useSettingsStore} from "@/stores/config.ts";

frontend/src/assets/ball.json CHANGED

@@ -1,3 +1,3 @@
 version https://git-lfs.github.com/spec/v1
-oid sha256:91eaffeec742a30f475cf5e777e1605e62d3c1547b64a891c15f9a5431460b8a
-size 22455

 version https://git-lfs.github.com/spec/v1
+oid sha256:edd650ec984e26b5fde217f273e6758d0862fc856b5333e678fa0b578374e8b9
+size 23084

frontend/src/config/client_config.ts CHANGED

@@ -5,7 +5,7 @@ import router from "@/router";
 const { wsCache } = useCache();
-export const test_server = '127.0.0.1:8000'
 // export const test_server = '59.110.18.232:19001'
 axios.defaults.baseURL = import.meta.env.PROD  ?  '/api/v1' : `http://${test_server}/api/v1`;

 const { wsCache } = useCache();
+export const test_server = '127.0.0.1:8848'
 // export const test_server = '59.110.18.232:19001'
 axios.defaults.baseURL = import.meta.env.PROD  ?  '/api/v1' : `http://${test_server}/api/v1`;
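
The `axios.defaults.baseURL` line above selects a relative path for production builds and an absolute dev-server URL otherwise. As a rough sketch of that selection, the ternary can be written as a pure function. The helper name `resolveBaseUrl` is hypothetical and not part of the repository, and the production flag is passed in explicitly rather than read from `import.meta.env.PROD`:

```typescript
// Hypothetical helper (not in the repository): mirrors the ternary used for
// axios.defaults.baseURL, but takes the production flag as a parameter so it
// can be exercised outside a Vite build.
function resolveBaseUrl(isProd: boolean, devServer: string): string {
    // A relative path is resolved by the browser against the page origin;
    // the absolute form targets the local development backend.
    return isProd ? "/api/v1" : `http://${devServer}/api/v1`;
}

// With this change the dev backend is addressed on port 8848 instead of 8000.
const devUrl = resolveBaseUrl(false, "127.0.0.1:8848");
const prodUrl = resolveBaseUrl(true, "127.0.0.1:8848");
console.log(devUrl, prodUrl);
```

Only the development URL depends on `test_server`, so the port change from 8000 to 8848 should not affect production builds, assuming `import.meta.env.PROD` is set as usual by the bundler.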

frontend/src/i18n/index.ts DELETED

@@ -1,35 +0,0 @@
-import { createI18n } from 'vue-i18n'
-import en from './locales/en'
-import zh from './locales/zh'
-export type UiLocale = 'en' | 'zh'
-// Read the UI language from the persisted pinia settings; defaults to English
-function getInitialLocale(): UiLocale {
-    try {
-        const raw = localStorage.getItem('settings')
-        if (raw) {
-            const parsed = JSON.parse(raw)
-            const ui = parsed?.uiLanguage
-            if (ui === 'en' || ui === 'zh') return ui
-        }
-    } catch (e) {
-        // ignore
-    }
-    return 'zh'
-}
-const i18n = createI18n({
-    legacy: false,
-    globalInjection: true,
-    locale: getInitialLocale(),
-    fallbackLocale: 'en',
-    messages: { en, zh },
-})
-export function setUiLocale(locale: UiLocale) {
-    i18n.global.locale.value = locale
-}
-export default i18n
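
For reference, the locale lookup in the deleted `getInitialLocale` can be sketched as a pure function. Note that in the removed code the function returns `'zh'` when nothing valid is stored, while `fallbackLocale` is `'en'`. The name `resolveInitialLocale` and the idea of passing the raw string in (instead of calling `localStorage.getItem('settings')`) are assumptions made here so the logic can run outside a browser:

```typescript
type UiLocale = "en" | "zh";

// Hypothetical variant of getInitialLocale: the raw persisted JSON is passed
// in, so the logic does not depend on a browser localStorage being present.
function resolveInitialLocale(raw: string | null, fallback: UiLocale = "zh"): UiLocale {
    if (raw === null) return fallback;
    try {
        const parsed: unknown = JSON.parse(raw);
        if (typeof parsed === "object" && parsed !== null) {
            const ui = (parsed as { uiLanguage?: unknown }).uiLanguage;
            // Accept only the two locales that ship with the app.
            if (ui === "en" || ui === "zh") return ui;
        }
    } catch {
        // Malformed JSON: ignore and use the fallback, as the deleted code did.
    }
    return fallback;
}

const fromSaved = resolveInitialLocale('{"uiLanguage":"en"}');
const fromGarbage = resolveInitialLocale("not json");
const fromMissing = resolveInitialLocale(null);
const fromUnknown = resolveInitialLocale('{"uiLanguage":"fr"}');
console.log(fromSaved, fromGarbage, fromMissing, fromUnknown);
```

Keeping the storage read separate from the validation makes the fallback behaviour easy to check in isolation; whether the original intent was an English or a Chinese default is not clear from the diff.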

frontend/src/i18n/locales/en.ts DELETED

@@ -1,74 +0,0 @@
-export default {
-    common: {
-        cancel: 'Cancel',
-        confirm: 'Confirm',
-        reset: 'Reset',
-        save: 'Save',
-        error: 'Error',
-    },
-    lang: {
-        zh: 'Chinese',
-        en: 'English',
-        auto: 'Auto',
-    },
-    welcome: {
-        title: 'Welcome',
-        subtitle: 'Click the button below to start a conversation',
-        start: 'Start Conversation',
-        startFailed: 'Failed to start the voice dialogue system',
-    },
-    settings: {
-        title: 'Settings',
-        entry: 'Settings',
-        tabs: {
-            main: 'Main',
-            language: 'Language',
-            advanced: 'Prompt',
-            about: 'About',
-        },
-        about: {
-            tagline: 'A real-time AI voice dialogue system',
-            version: 'Version',
-            modelsTitle: 'Models',
-            llm: 'Language Model (LLM)',
-            llmDesc: 'Qwen3-8B (Q6_K, GGUF) · via llama.cpp',
-            asr: 'Speech Recognition (ASR)',
-            asrDesc: 'Whisper medium (English) · FunASR SeACo-Paraformer + CT-Transformer (Chinese)',
-            tts: 'Speech Synthesis (TTS)',
-            ttsDesc: 'MoYoYo TTS (GPT-SoVITS) · Kokoro (English)',
-            linksTitle: 'Repositories',
-            repoApp: 'App & source code',
-            repoVoices: 'Voice (tone) models',
-            copyright: '© 2025 MoYoYo · Models belong to their respective owners',
-        },
-        general: {
-            interfaceLanguage: 'Interface Language',
-            interfaceLanguageHint: 'Language of the application interface.',
-        },
-        audio: {
-            microphone: 'Microphone (Input Device)',
-            microphoneHint: 'Choose the input device, e.g. an external microphone array.',
-            systemDefault: 'System Default',
-            channelsSuffix: 'ch',
-            defaultSuffix: 'default',
-            speaker: 'Speaker (Output Device)',
-            speakerHint: 'Choose the output device for voice playback, e.g. an external speaker.',
-            echoCancellation: 'Echo Cancellation',
-            echoCancellationHint: 'Uses the system AEC on the default device. For an external array, echo is handled by the array hardware.',
-        },
-        recognition: {
-            language: 'Recognition Language',
-            languageHint: 'Language used for speech recognition (ASR).',
-        },
-        voice: {
-            role: 'Voice',
-            roleHint: 'The voice used for speech synthesis (TTS).',
-            playSample: 'Play sample',
-        },
-        prompt: {
-            title: 'System Prompt',
-            hint: 'Customize the system prompt for each language.',
-        },
-        applyFailed: 'Failed to apply settings',
-    },
-}

frontend/src/i18n/locales/zh.ts DELETED

@@ -1,74 +0,0 @@
-export default {
-    common: {
-        cancel: '取消',
-        confirm: '确认',
-        reset: '重置',
-        save: '保存',
-        error: '错误',
-    },
-    lang: {
-        zh: '中文',
-        en: '英文',
-        auto: '自动',
-    },
-    welcome: {
-        title: '欢迎使用',
-        subtitle: '点击下方按钮开始对话',
-        start: '开始对话',
-        startFailed: '启动语音对话系统失败',
-    },
-    settings: {
-        title: '设置',
-        entry: '设置',
-        tabs: {
-            main: '常用',
-            language: '语言',
-            advanced: 'Prompt',
-            about: '关于',
-        },
-        about: {
-            tagline: '实时 AI 语音对话系统',
-            version: '版本',
-            modelsTitle: '使用的模型',
-            llm: '大语言模型 (LLM)',
-            llmDesc: 'Qwen3-8B（Q6_K，GGUF）· 基于 llama.cpp',
-            asr: '语音识别 (ASR)',
-            asrDesc: 'Whisper medium（英文）· FunASR SeACo-Paraformer + CT-Transformer（中文）',
-            tts: '语音合成 (TTS)',
-            ttsDesc: 'MoYoYo TTS（GPT-SoVITS）· Kokoro（英文）',
-            linksTitle: '开源仓库',
-            repoApp: '应用与源码',
-            repoVoices: '音色模型',
-            copyright: '© 2025 MoYoYo · 各模型版权归原作者所有',
-        },
-        general: {
-            interfaceLanguage: '界面语言',
-            interfaceLanguageHint: '应用界面所使用的语言。',
-        },
-        audio: {
-            microphone: '麦克风（输入设备）',
-            microphoneHint: '选择输入设备，例如外置麦克风阵列。',
-            systemDefault: '系统默认',
-            channelsSuffix: '声道',
-            defaultSuffix: '默认',
-            speaker: '扬声器（输出设备）',
-            speakerHint: '选择语音播放的输出设备，例如外置扬声器。',
-            echoCancellation: '回音消除',
-            echoCancellationHint: '默认设备使用系统 AEC；选择外置阵列时，回音由阵列硬件处理。',
-        },
-        recognition: {
-            language: '识别语言',
-            languageHint: '语音识别（ASR）所使用的语言。',
-        },
-        voice: {
-            role: '音色',
-            roleHint: '语音合成（TTS）所使用的音色。',
-            playSample: '试听',
-        },
-        prompt: {
-            title: '系统提示词',
-            hint: '为每种语言自定义系统提示词。',
-        },
-        applyFailed: '应用设置失败',
-    },
-}

frontend/src/main.ts CHANGED

@@ -9,7 +9,6 @@ import './style.scss'
 import App from './App.vue'
 import router from './router'
-import i18n from './i18n'
 // import * as Sentry from "@sentry/browser";
@@ -29,5 +28,4 @@ createApp(App)
     .use(router)
     .use(Antd)
     .use(Vue3Lottie)
-    .use(i18n)
     .mount('#app')

 import App from './App.vue'
 import router from './router'
 // import * as Sentry from "@sentry/browser";
     .use(router)
     .use(Antd)
     .use(Vue3Lottie)
     .mount('#app')

frontend/src/stores/config.ts CHANGED

@@ -8,11 +8,8 @@ export const useSettingsStore  = defineStore({
         return {
             role: '',
             language: 'zh',
-            uiLanguage: 'zh' as 'en' | 'zh',
             sider_open: true,
             echoCancel: true,
-            inputDeviceIndex: null as number | null,
-            outputDeviceIndex: null as number | null,
         }
     },
     actions: {

         return {
             role: '',
             language: 'zh',
             sider_open: true,
             echoCancel: true,
         }
     },
     actions: {

frontend/src/style.scss CHANGED

@@ -173,68 +173,3 @@ $FormItemWidth: 1022px;
 .ant-layout-sider-collapsed .ant-menu-submenu-title {
   display: none;
 }
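-/* ============================================================
-   Liquid Glass: Apple-style glassmorphism (global)
-   Translucency + background blur + soft borders/shadows; border radius is unified by the theme token
-   ============================================================ */
-/* Modals use Ant's built-in fade transition (a pure opacity animation, no transform),
-   which avoids the flicker caused by backdrop-filter failing during a transform animation; the panel and its blur fade in smoothly together */
-/* Modal: frosted-glass panel */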
-.ant-modal .ant-modal-content {
-  background: rgba(255, 255, 255, 0.62) !important;
-  backdrop-filter: blur(28px) saturate(140%);
-  -webkit-backdrop-filter: blur(28px) saturate(140%);
-  border: 1px solid rgba(255, 255, 255, 0.6);
-  border-radius: 22px !important;
-  box-shadow: 0 16px 48px rgba(31, 38, 135, 0.18);
-}
-.ant-modal .ant-modal-header {
-  background: transparent !important;
-}
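-/* Mask: full-screen frosted effect, with slight dimming plus background blur.
-   The mask fades in with ant-fade (opacity), so the blur appears smoothly and the background text blurs along with the rest of the view instead of "flashing out" */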
-.ant-modal-mask {
-  background: rgba(20, 22, 30, 0.12) !important;
-  backdrop-filter: blur(14px) saturate(120%);
-  -webkit-backdrop-filter: blur(14px) saturate(120%);
-}
-/* Input controls: translucent glass */
-.ant-select .ant-select-selector,
-.ant-input,
-textarea.ant-input,
-.ant-input-affix-wrapper {
-  background: rgba(255, 255, 255, 0.45) !important;
-  backdrop-filter: blur(8px);
-  -webkit-backdrop-filter: blur(8px);
-  border: 1px solid rgba(255, 255, 255, 0.7) !important;
-}
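-/* Buttons: unified shape (radius comes from the token) + soft shadow; default buttons get a glass look, primary buttons stay solid.
-   Text/link buttons (such as the small speaker icon for voice preview) stay transparent with no shadow */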
-.ant-btn:not(.ant-btn-text):not(.ant-btn-link) {
-  box-shadow: 0 2px 10px rgba(31, 38, 135, 0.10);
-}
-.ant-btn-default {
-  background: rgba(255, 255, 255, 0.5) !important;
-  border: 1px solid rgba(255, 255, 255, 0.75) !important;
-  backdrop-filter: blur(8px);
-  -webkit-backdrop-filter: blur(8px);
-}
-.ant-btn-text {
-  box-shadow: none !important;
-  background: transparent !important;
-}
-/* Segmented radio (Chinese/English, etc.): round both ends to remove the boxy look */
-.ant-radio-group-solid .ant-radio-button-wrapper:first-child {
-  border-top-left-radius: 12px;
-  border-bottom-left-radius: 12px;
-}
-.ant-radio-group-solid .ant-radio-button-wrapper:last-child {
-  border-top-right-radius: 12px;
-  border-bottom-right-radius: 12px;
-}

 .ant-layout-sider-collapsed .ant-menu-submenu-title {
   display: none;
 }

frontend/src/views/Home/Components/ChatText.vue CHANGED

@@ -69,13 +69,9 @@ watch(() => props.chatContent, (newVal, oldVal) => {
 <style lang="scss" scoped>
 .talk-wrapper {
     width: auto;
-    width: 100%;
-    max-width: 1000px;
-    margin: 0 auto;
-    box-sizing: border-box;
-    height: calc(100vh - 150px);
-    overflow-y: auto;
-    padding: 20px 32px 0;
     display: flex;
     flex-direction: column;
     align-items: flex-start;
@@ -89,15 +85,13 @@ watch(() => props.chatContent, (newVal, oldVal) => {
         justify-content: flex-start;
         align-items: flex-start;
         .text-left {
-            max-width: 88%;
             color: #222;
             font-size: 16px;
             font-weight: 400;
             text-align: left;
-            line-height: 1.8;
             margin-left: 12px;
             margin-top: 6px;
-            word-break: break-word;
         }
     }
@@ -109,18 +103,16 @@ watch(() => props.chatContent, (newVal, oldVal) => {
         align-items: flex-start;
         .text-right {
-            max-width: 80%;
             color: #444;
             font-size: 16px;
             font-weight: 400;
-            text-align: start;
-            line-height: 1.8;
             margin-right: 12px;
             background: #ccc;
             border-radius: 8px;
             border-top-right-radius: 0;
-            padding: 8px 12px;
-            word-break: break-word;
         }
     }
 }

 <style lang="scss" scoped>
 .talk-wrapper {
     width: auto;
+    height: calc(100vh - 100px);
+    overflow-y: scroll;
+    padding: 20px 240px 0 240px;
     display: flex;
     flex-direction: column;
     align-items: flex-start;
         justify-content: flex-start;
         align-items: flex-start;
         .text-left {
             color: #222;
             font-size: 16px;
             font-weight: 400;
             text-align: left;
+            line-height: 2;
             margin-left: 12px;
             margin-top: 6px;
         }
     }
         align-items: flex-start;
         .text-right {
             color: #444;
             font-size: 16px;
             font-weight: 400;
+            text-align: end;
+            line-height: 2;
             margin-right: 12px;
             background: #ccc;
             border-radius: 8px;
             border-top-right-radius: 0;
+            padding: 8px;
         }
     }
 }

frontend/src/views/Home/index.vue CHANGED

@@ -387,7 +387,6 @@ const toggleText = () => {
     .actions {
         width: 100%;
         height: 100px;
-        margin-bottom: 32px;
         display: flex;
         justify-content: space-between;
@@ -402,17 +401,7 @@ const toggleText = () => {
             height: 96px;
             display: flex;
             justify-content: space-around;
-            align-items: center;
-            // Liquid Glass circular buttons (consistent with the Welcome settings button)
-            :deep(.ant-btn) {
-                border-radius: 50% !important;
-                background: rgba(255, 255, 255, 0.5) !important;
-                border: 1px solid rgba(255, 255, 255, 0.7) !important;
-                backdrop-filter: blur(10px);
-                -webkit-backdrop-filter: blur(10px);
-                box-shadow: 0 4px 16px rgba(31, 38, 135, 0.12);
-            }
         }
         .download-wrapper {
             width: 64px;

     .actions {
         width: 100%;
         height: 100px;
         display: flex;
         justify-content: space-between;
             height: 96px;
             display: flex;
             justify-content: space-around;
+            align-items: flex-start;
         }
         .download-wrapper {
             width: 64px;

frontend/src/views/Welcome/Components/SettingsModal.vue DELETED

@@ -1,581 +0,0 @@
-<script setup lang="ts">
-import { ref, reactive, computed, watch, onUnmounted } from "vue";
-import { Modal } from "ant-design-vue";
-import { SoundTwoTone, SoundOutlined, TranslationOutlined, AudioOutlined } from "@ant-design/icons-vue";
-import { useI18n } from "vue-i18n";
-import axios from "axios";
-import { useSettingsStore } from "@/stores/config.ts";
-import { setUiLocale, UiLocale } from "@/i18n";
-const props = defineProps({
-    open: { type: Boolean, default: false },
-});
-const emit = defineEmits(["update:open"]);
-const { t } = useI18n();
-const base_url = axios.defaults.baseURL;
-const settingsStore = useSettingsStore();
-const activeTab = ref<string>("main");
-const loading = ref<boolean>(false);
-const appVersion = "1.2.0";
-// ---- Local state for each setting (synced from the store / backend on open) ----
-const uiLanguage = ref<UiLocale>((settingsStore.$state.uiLanguage as UiLocale) ?? "en");
-const recognitionLanguage = ref<string>(settingsStore.$state.language || "zh");
-const echoCancel = ref<boolean>(settingsStore.$state.echoCancel ?? true);
-const inputDeviceIndex = ref<number | null>(settingsStore.$state.inputDeviceIndex ?? null);
-const outputDeviceIndex = ref<number | null>(settingsStore.$state.outputDeviceIndex ?? null);
-const role = ref<string>(settingsStore.$state.role || "");
-const languages = reactive<string[]>([]);
-const inputDevices = reactive<any[]>([]);
-const outputDevices = reactive<any[]>([]);
-const roles = reactive<any[]>([]);
-// ---- Prompt ----
-const promptLang = ref<string>("zh");
-const default_prompt_en = ref<string>("");
-const default_prompt_zh = ref<string>("");
-const current_prompt_en = ref<string>("");
-const current_prompt_zh = ref<string>("");
-const filteredRoles = computed(() => {
-    const is_chinese = recognitionLanguage.value === "zh";
-    return roles.filter((r) => r["is_chinese_voice"] === is_chinese);
-});
-// After the recognition language changes, automatically select the first matching voice
-watch(
-    () => recognitionLanguage.value,
-    () => {
-        if (filteredRoles.value.length > 0) {
-            const exists = filteredRoles.value.find((r) => r["id"] === role.value);
-            role.value = exists ? role.value : filteredRoles.value[0]["id"];
-        } else {
-            role.value = "";
-        }
-    }
-);
-// The UI language takes effect immediately (so the user sees the switch right away)
-watch(uiLanguage, (v) => setUiLocale(v));
-// ---- Data loading ----
-const fetchASRLanguages = async () => {
-    try {
-        const res = await fetch(`${base_url}/asr/languages`);
-        const data = await res.json();
-        if (data?.languages) {
-            languages.splice(0, languages.length, ...data.languages);
-            // Prefer the locally saved/default recognition language (default Chinese) rather than letting the backend's current value override it
-            const saved = settingsStore.$state.language;
-            recognitionLanguage.value = saved && data.languages.includes(saved)
-                ? saved
-                : (data.languages.includes('zh') ? 'zh' : data.languages[0]);
-        }
-    } catch (e) {
-        console.error("Error fetching ASR languages:", e);
-    }
-};
-const fetchTTSRoles = async () => {
-    try {
-        const res = await fetch(`${base_url}/tts/models`);
-        const data = await res.json();
-        if (data?.models) {
-            roles.splice(0, roles.length, ...data.models);
-            if (data.current_model_id) role.value = data.current_model_id;
-        }
-    } catch (e) {
-        console.error("Error fetching TTS roles:", e);
-    }
-};
-const fetchInputDevices = async () => {
-    try {
-        const res = await fetch(`${base_url}/system/audio-devices`);
-        const data = await res.json();
-        if (data?.devices) {
-            inputDevices.splice(0, inputDevices.length, ...data.devices);
-            const saved = settingsStore.$state.inputDeviceIndex;
-            const exists = saved != null && data.devices.some((d: any) => d.index === saved);
-            inputDeviceIndex.value = exists ? saved : (data.current_device_index ?? null);
-        }
-        if (data?.output_devices) {
-            outputDevices.splice(0, outputDevices.length, ...data.output_devices);
-            const saved = settingsStore.$state.outputDeviceIndex;
-            const exists = saved != null && data.output_devices.some((d: any) => d.index === saved);
-            outputDeviceIndex.value = exists ? saved : (data.current_output_device_index ?? null);
-        }
-    } catch (e) {
-        console.error("Error fetching input devices:", e);
-    }
-};
-// The ASR engine actually in effect (returned by the backend; distinguishes Qwen / FunASR+Whisper, etc.)
-const asrEngineName = ref<string>("");
-const asrEngineKeys = ref<string[]>([]);
-const ASR_ENGINE_LINKS: Record<string, { name: string; url: string }> = {
-    qwen: { name: "Qwen3-ASR", url: "https://huggingface.co/Qwen/Qwen3-ASR-1.7B" },
-    whisper: { name: "whisper.cpp", url: "https://github.com/ggerganov/whisper.cpp" },
-    funasr: { name: "FunASR", url: "https://github.com/modelscope/FunASR" },
-};
-const asrEngineLinks = computed(() => {
-    const keys = asrEngineKeys.value.length ? asrEngineKeys.value : ["whisper", "funasr"];
-    return keys.map((k) => ASR_ENGINE_LINKS[k]).filter(Boolean);
-});
-const fetchAsrEngine = async () => {
-    try {
-        const res = await fetch(`${base_url}/system/asr-engine`);
-        const data = await res.json();
-        if (data?.display_name) asrEngineName.value = data.display_name;
-        if (data?.mappings) asrEngineKeys.value = [...new Set(Object.values(data.mappings) as string[])].sort();
-    } catch (e) {
-        console.error("Error fetching ASR engine:", e);
-    }
-};
-const fetchPrompts = async () => {
-    try {
-        const [cur, def] = await Promise.all([
-            fetch(`${base_url}/settings/settings/prompts`).then((r) => r.json()),
-            fetch(`${base_url}/settings/settings/prompts/default`).then((r) => r.json()),
-        ]);
-        if (cur) {
-            current_prompt_en.value = cur.english_prompt;
-            current_prompt_zh.value = cur.chinese_prompt;
-        }
-        if (def) {
-            default_prompt_en.value = def.english_prompt;
-            default_prompt_zh.value = def.chinese_prompt;
-        }
-    } catch (e) {
-        console.error("Error fetching prompts:", e);
-    }
-};
-const resetPrompt = (lang: string) => {
-    if (lang === "en") current_prompt_en.value = default_prompt_en.value;
-    else current_prompt_zh.value = default_prompt_zh.value;
-};
-// ---- Apply / cancel ----
-const applySettings = async () => {
-    loading.value = true;
-    try {
-        // 1. Persist to the local store
-        settingsStore.$state.uiLanguage = uiLanguage.value;
-        settingsStore.$state.language = recognitionLanguage.value;
-        settingsStore.$state.role = role.value || "";
-        settingsStore.$state.echoCancel = echoCancel.value;
-        settingsStore.$state.inputDeviceIndex = inputDeviceIndex.value;
-        settingsStore.$state.outputDeviceIndex = outputDeviceIndex.value;
-        // The output device takes effect as soon as it is saved (a change made mid-session applies from the next utterance)
-        await fetch(`${base_url}/system/audio-output-device`, {
-            method: "POST",
-            headers: { "Content-Type": "application/json" },
-            body: JSON.stringify({ output_device_index: outputDeviceIndex.value }),
-        });
-        // 2. Push the TTS voice + ASR language to the backend
-        if (role.value) {
-            const r1 = await fetch(`${base_url}/tts/models/load`, {
-                method: "POST",
-                headers: { "Content-Type": "application/json" },
-                body: JSON.stringify({ model_id: role.value }),
-            });
-            if (!r1.ok) throw new Error(`TTS load failed: ${r1.status}`);
-        }
-        const r2 = await fetch(`${base_url}/asr/instance/create`, {
-            method: "POST",
-            headers: { "Content-Type": "application/json" },
-            body: JSON.stringify({ language: recognitionLanguage.value }),
-        });
-        if (!r2.ok) throw new Error(`ASR set failed: ${r2.status}`);
-        // 3. Save the prompts
-        await fetch(`${base_url}/settings/settings/prompts`, {
-            method: "POST",
-            headers: { "Content-Type": "application/json" },
-            body: JSON.stringify({
-                chinese_prompt: current_prompt_zh.value,
-                english_prompt: current_prompt_en.value,
-            }),
-        });
-        emit("update:open", false);
-    } catch (err) {
-        console.error("Error applying settings:", err);
-        Modal.error({ title: t("common.error"), content: t("settings.applyFailed") });
-    } finally {
-        loading.value = false;
-    }
-};
-const handleCancel = () => {
-    // Restore the local state and the UI language
-    uiLanguage.value = (settingsStore.$state.uiLanguage as UiLocale) ?? "en";
-    setUiLocale(uiLanguage.value);
-    recognitionLanguage.value = settingsStore.$state.language || "zh";
-    echoCancel.value = settingsStore.$state.echoCancel ?? true;
-    inputDeviceIndex.value = settingsStore.$state.inputDeviceIndex ?? null;
-    outputDeviceIndex.value = settingsStore.$state.outputDeviceIndex ?? null;
-    role.value = settingsStore.$state.role || "";
-    emit("update:open", false);
-};
-watch(
-    () => props.open,
-    (isOpen) => {
-        if (isOpen) {
-            activeTab.value = "main";
-            uiLanguage.value = (settingsStore.$state.uiLanguage as UiLocale) ?? "en";
-            fetchASRLanguages();
-            fetchTTSRoles();
-            fetchInputDevices();
-            fetchPrompts();
-            fetchAsrEngine();
-        }
-    }
-);
-// ---- Voice preview ----
-const currentPlayingId = ref<string | null>(null);
-const currentAudio = ref<HTMLAudioElement | null>(null);
-const isPlaying = (id: string) => currentPlayingId.value === id;
-const playRefAudio = async (id: string, e: Event) => {
-    e.stopPropagation();
-    e.preventDefault();
-    try {
-        if (currentPlayingId.value === id && currentAudio.value) {
-            currentAudio.value.pause();
-            currentAudio.value = null;
-            currentPlayingId.value = null;
-            return;
-        }
-        if (currentAudio.value) {
-            currentAudio.value.pause();
-            currentAudio.value = null;
-        }
-        const audio = new Audio(`${base_url}/tts/models/${id}/reference-audio`);
-        audio.addEventListener("ended", () => {
-            currentPlayingId.value = null;
-            currentAudio.value = null;
-        });
-        await audio.play();
-        currentPlayingId.value = id;
-        currentAudio.value = audio;
-    } catch (err) {
-        currentPlayingId.value = null;
-        currentAudio.value = null;
-    }
-};
-onUnmounted(() => {
-    if (currentAudio.value) currentAudio.value.pause();
-});
-</script>
-<template>
-    <a-modal
-        :open="props.open"
-        :title="t('settings.title')"
-        :mask-closable="false"
-        :closable="true"
-        :width="600"
-        centered
-        transition-name="ant-fade"
-        @cancel="handleCancel"
-        @update:open="(v: boolean) => emit('update:open', v)"
-    >
-        <template #footer>
-            <a-button key="back" @click="handleCancel">{{ t('common.cancel') }}</a-button>
-            <a-button key="confirm" type="primary" :loading="loading" @click="applySettings">
-                {{ t('common.confirm') }}
-            </a-button>
-        </template>
-        <a-tabs v-model:activeKey="activeTab" class="settings-tabs">
-            <!-- Common: input source + echo cancellation + voice (what users care about most) -->
-            <a-tab-pane key="main" :tab="t('settings.tabs.main')">
-                <div class="tab-body">
-                    <div class="setting-row">
-                        <label>{{ t('settings.audio.microphone') }}</label>
-                        <a-select v-model:value="inputDeviceIndex" style="width: 100%;">
-                            <a-select-option :value="null">{{ t('settings.audio.systemDefault') }}</a-select-option>
-                            <a-select-option v-for="dev in inputDevices" :value="dev.index" :key="dev.index">
-                                {{ dev.name }}<template v-if="dev.max_input_channels > 1"> ({{ dev.max_input_channels }}{{ t('settings.audio.channelsSuffix') }})</template><template v-if="dev.is_default"> · {{ t('settings.audio.defaultSuffix') }}</template>
-                            </a-select-option>
-                        </a-select>
-                    </div>
-                    <div class="setting-row">
-                        <label>{{ t('settings.audio.speaker') }}</label>
-                        <a-select v-model:value="outputDeviceIndex" style="width: 100%;">
-                            <a-select-option :value="null">{{ t('settings.audio.systemDefault') }}</a-select-option>
-                            <a-select-option v-for="dev in outputDevices" :value="dev.index" :key="dev.index">
-                                {{ dev.name }}<template v-if="dev.is_default"> · {{ t('settings.audio.defaultSuffix') }}</template>
-                            </a-select-option>
-                        </a-select>
-                    </div>
-                    <div class="setting-row">
-                        <div class="row-inline">
-                            <label>{{ t('settings.audio.echoCancellation') }}</label>
-                            <a-switch v-model:checked="echoCancel" />
-                        </div>
-                    </div>
-                    <div class="setting-row">
-                        <label>{{ t('settings.voice.role') }}</label>
-                        <a-radio-group v-model:value="role" class="voice-group">
-                            <a-radio v-for="r in filteredRoles" :value="r['id']" :key="r['id']" class="voice-radio">
-                                <span class="voice-name">{{ r['character_name'] }}</span>
-                                <a-button
-                                    type="text"
-                                    class="audio-play-btn"
-                                    :class="{ playing: isPlaying(r['id']) }"
-                                    @click="playRefAudio(r['id'], $event)"
-                                >
-                                    <SoundTwoTone v-if="isPlaying(r['id'])" style="font-size: 16px; color: #52c41a;" />
-                                    <SoundOutlined v-else style="font-size: 16px; color: #1890ff;" />
-                                </a-button>
-                            </a-radio>
-                        </a-radio-group>
-                    </div>
-                </div>
-            </a-tab-pane>
-            <!-- Language: UI language + recognition language -->
-            <a-tab-pane key="language" :tab="t('settings.tabs.language')">
-                <div class="tab-body">
-                    <div class="setting-row">
-                        <label><TranslationOutlined class="label-icon" />{{ t('settings.general.interfaceLanguage') }}</label>
-                        <a-select v-model:value="uiLanguage" style="width: 100%;">
-                            <a-select-option value="zh">{{ t('lang.zh') }}</a-select-option>
-                            <a-select-option value="en">{{ t('lang.en') }}</a-select-option>
-                        </a-select>
-                        <p class="hint">{{ t('settings.general.interfaceLanguageHint') }}</p>
-                    </div>
-                    <div class="setting-row">
-                        <label><AudioOutlined class="label-icon" />{{ t('settings.recognition.language') }}</label>
-                        <a-select v-model:value="recognitionLanguage" style="width: 100%;">
-                            <a-select-option v-for="lan in languages" :value="lan" :key="lan">
-                                {{ t('lang.' + lan) }}
-                            </a-select-option>
-                        </a-select>
-                        <p class="hint">{{ t('settings.recognition.languageHint') }}</p>
-                    </div>
-                </div>
-            </a-tab-pane>
-            <!-- Advanced: system prompt -->
-            <a-tab-pane key="advanced" :tab="t('settings.tabs.advanced')">
-                <div class="tab-body">
-                    <div class="setting-row">
-                        <label>{{ t('settings.prompt.title') }}</label>
-                        <a-radio-group button-style="solid" size="small" v-model:value="promptLang" style="margin-bottom: 12px;">
-                            <a-radio-button value="zh">{{ t('lang.zh') }}</a-radio-button>
-                            <a-radio-button value="en">{{ t('lang.en') }}</a-radio-button>
-                        </a-radio-group>
-                        <div v-show="promptLang === 'zh'">
-                            <a-textarea v-model:value="current_prompt_zh" :placeholder="default_prompt_zh"
-                                :auto-size="{ minRows: 6, maxRows: 10 }" show-count :maxlength="2000" allow-clear />
-                            <a-button size="small" @click="resetPrompt('zh')" style="margin-top: 12px;">{{ t('common.reset') }}</a-button>
-                        </div>
-                        <div v-show="promptLang === 'en'">
-                            <a-textarea v-model:value="current_prompt_en" :placeholder="default_prompt_en"
-                                :auto-size="{ minRows: 6, maxRows: 10 }" show-count :maxlength="2000" allow-clear />
-                            <a-button size="small" @click="resetPrompt('en')" style="margin-top: 12px;">{{ t('common.reset') }}</a-button>
-                        </div>
-                    </div>
-                </div>
-            </a-tab-pane>
-            <!-- About -->
-            <a-tab-pane key="about" :tab="t('settings.tabs.about')">
-                <div class="tab-body about">
-                    <div class="about-head">
-                        <div class="about-name">Voice Dialogue</div>
-                        <div class="about-ver">{{ t('settings.about.version') }} {{ appVersion }}</div>
-                        <div class="about-tagline">{{ t('settings.about.tagline') }}</div>
-                    </div>
-                    <div class="about-section">
-                        <div class="about-section-title">{{ t('settings.about.modelsTitle') }}</div>
-                        <div class="about-item">
-                            <div class="about-item-label">{{ t('settings.about.llm') }}</div>
-                            <div class="about-item-desc">
-                                {{ t('settings.about.llmDesc') }}
-                                <a href="https://huggingface.co/Qwen/Qwen3-8B" target="_blank" rel="noopener">Qwen3 ↗</a>
-                            </div>
-                        </div>
-                        <div class="about-item">
-                            <div class="about-item-label">{{ t('settings.about.asr') }}</div>
-                            <div class="about-item-desc">
-                                {{ asrEngineName || t('settings.about.asrDesc') }}
-                                <a v-for="link in asrEngineLinks" :key="link.url" :href="link.url" target="_blank"
-                                    rel="noopener">{{ link.name }} ↗</a>
-                            </div>
-                        </div>
-                        <div class="about-item">
-                            <div class="about-item-label">{{ t('settings.about.tts') }}</div>
-                            <div class="about-item-desc">
-                                {{ t('settings.about.ttsDesc') }}
-                                <a href="https://github.com/RVC-Boss/GPT-SoVITS" target="_blank" rel="noopener">GPT-SoVITS ↗</a>
-                                <a href="https://huggingface.co/hexgrad/Kokoro-82M" target="_blank" rel="noopener">Kokoro ↗</a>
-                            </div>
-                        </div>
-                    </div>
-                    <div class="about-section">
-                        <div class="about-section-title">{{ t('settings.about.linksTitle') }}</div>
-                        <div class="about-item">
-                            <div class="about-item-label">{{ t('settings.about.repoApp') }}</div>
-                            <a class="about-link" href="https://huggingface.co/MoYoYoTech/VoiceDialogue" target="_blank" rel="noopener">huggingface.co/MoYoYoTech/VoiceDialogue</a>
-                        </div>
-                        <div class="about-item">
-                            <div class="about-item-label">{{ t('settings.about.repoVoices') }}</div>
-                            <a class="about-link" href="https://huggingface.co/MoYoYoTech/tone-models" target="_blank" rel="noopener">huggingface.co/MoYoYoTech/tone-models</a>
-                        </div>
-                    </div>
-                    <div class="about-copyright">{{ t('settings.about.copyright') }}</div>
-                </div>
-            </a-tab-pane>
-        </a-tabs>
-    </a-modal>
-</template>
-<style lang="scss" scoped>
-// Fix the content area height so the tab bar no longer jumps when switching tabs
-.tab-body {
-    height: 360px;
-    overflow-y: auto;
-    padding: 4px 8px 4px 2px;
-}
-.setting-row {
-    margin-bottom: 20px;
-    // Apply only to field titles (direct child labels) so nested <label> elements such as radio buttons are unaffected
-    > label {
-        display: block;
-        font-size: 15px;
-        font-weight: 500;
-        margin-bottom: 8px;
-        .label-icon {
-            margin-right: 6px;
-            color: #1890ff;
-        }
-    }
-    .hint {
-        font-size: 12px;
-        color: #999;
-        margin: 8px 0 0;
-    }
-    .row-inline {
-        display: flex;
-        align-items: center;
-        justify-content: space-between;
-    }
-}
-.voice-group {
-    display: flex;
-    flex-direction: column;
-    margin-top: 8px;
-}
-/* About page */
-.about {
-    .about-head {
-        text-align: center;
-        margin-bottom: 24px;
-        .about-name {
-            font-size: 20px;
-            font-weight: 600;
-        }
-        .about-ver {
-            font-size: 13px;
-            color: #888;
-            margin-top: 2px;
-        }
-        .about-tagline {
-            font-size: 12px;
-            color: #999;
-            margin-top: 4px;
-        }
-    }
-    .about-section {
-        margin-bottom: 20px;
-        .about-section-title {
-            font-size: 13px;
-            font-weight: 600;
-            color: #666;
-            margin-bottom: 10px;
-        }
-    }
-    .about-item {
-        margin-bottom: 12px;
-        .about-item-label {
-            font-size: 14px;
-            font-weight: 500;
-        }
-        .about-item-desc {
-            font-size: 12px;
-            color: #777;
-            margin-top: 2px;
-            line-height: 1.6;
-            a { margin-left: 6px; }
-        }
-    }
-    a {
-        color: #1677ff;
-        text-decoration: none;
-        &:hover { text-decoration: underline; }
-    }
-    .about-link {
-        font-size: 13px;
-        word-break: break-all;
-    }
-    .about-copyright {
-        margin-top: 16px;
-        font-size: 11px;
-        color: #aaa;
-        text-align: center;
-    }
-}
-.voice-radio {
-    display: flex;
-    align-items: center;
-    height: 40px;
-    line-height: 40px;
-    .voice-name {
-        margin-right: 8px;
-    }
-}
-.audio-play-btn {
-    padding: 0 6px;
-    border-radius: 4px;
-    &.playing {
-        background-color: #f6ffed;
-    }
-}
-</style>

frontend/src/views/Welcome/index.vue CHANGED

@@ -2,66 +2,303 @@
 import router from "@/router.ts";
 import { useSettingsStore } from "@/stores/config.ts";
-import { ref, onMounted } from "vue";
 import { Modal } from 'ant-design-vue';
-import { AudioOutlined } from "@ant-design/icons-vue";
-import { useI18n } from "vue-i18n";
 import axios from "axios";
-import SettingsModal from "./Components/SettingsModal.vue";
-import setting from "@/assets/setting.png";
-const { t } = useI18n();
-const base_url = axios.defaults.baseURL;
-const settingsStore = useSettingsStore();
-const settingsOpen = ref<boolean>(false);
-const chatLoading = ref<boolean>(false);
-// The ASR engine currently in effect, shown to the left of the settings button
-const asrEngineName = ref<string>("");
 onMounted(async () => {
-    try {
-        const res = await fetch(`${base_url}/system/asr-engine`);
-        const data = await res.json();
-        if (data?.display_name) asrEngineName.value = data.display_name;
-    } catch (e) {
-        console.error("Error fetching ASR engine:", e);
-    }
 });
 const startAudioChat = async () => {
     try {
         chatLoading.value = true;
         const response = await fetch(`${base_url}/system/start`, {
             method: 'POST',
-            headers: { 'Content-Type': 'application/json' },
             body: JSON.stringify({
-                enable_echo_cancellation: settingsStore.$state.echoCancel ?? true,
-                input_device_index: settingsStore.$state.inputDeviceIndex ?? null,
-                output_device_index: settingsStore.$state.outputDeviceIndex ?? null
             })
         });
         if (!response.ok) {
             throw new Error(`HTTP error! status: ${response.status}`);
         }
-        await response.json();
         return true;
     } catch (error) {
-        console.error('Error starting audio chat:', error);
         return false;
     } finally {
         chatLoading.value = false;
     }
 };
-const chatAction = async () => {
-    const ok = await startAudioChat();
-    if (!ok) {
-        Modal.error({ title: t('common.error'), content: t('welcome.startFailed') });
-        return;
     }
-    router.replace('/home');
 };
 </script>
 <template>
@@ -69,67 +306,178 @@ const chatAction = async () => {
         <div class="content">
             <div class="inner-content">
                 <div class="text-box">
-                    <div class="title">{{ t('welcome.title') }}</div>
-                    <div class="sub-title">{{ t('welcome.subtitle') }}</div>
                 </div>
                 <div class="btn-box">
                     <a-button @click="chatAction" block :loading="chatLoading" type="primary" size="large">
-                        <span>{{ t('welcome.start') }}</span>
                     </a-button>
                 </div>
             </div>
         </div>
         <div class="actions">
-            <div v-if="asrEngineName" class="asr-chip" :title="t('settings.about.asr')">
-                <AudioOutlined />
-                <span>{{ asrEngineName }}</span>
-            </div>
-            <a-button type="text" @click="settingsOpen = true" class="settings-btn"
-                :title="t('settings.entry')">
                 <template #icon>
                     <img :src="setting" width="28" height="28" alt="settings" />
                 </template>
             </a-button>
         </div>
-        <SettingsModal v-model:open="settingsOpen" />
     </div>
 </template>
 <style lang="scss" scoped>
-.asr-chip {
-    display: flex;
-    align-items: center;
-    gap: 8px;
-    height: 38px;
-    padding: 0 18px;
-    margin-right: 16px;
-    border-radius: 19px;
-    color: rgba(0, 0, 0, 0.65);
-    font-size: 13px;
-    background: rgba(255, 255, 255, 0.5);
-    border: 1px solid rgba(255, 255, 255, 0.7);
-    backdrop-filter: blur(10px);
-    -webkit-backdrop-filter: blur(10px);
-    box-shadow: 0 4px 16px rgba(31, 38, 135, 0.12);
 }
-.settings-btn {
-    width: 60px;
-    height: 60px;
-    margin-right: 24px;
-    border-radius: 50% !important;
-    background: rgba(255, 255, 255, 0.5) !important;
-    border: 1px solid rgba(255, 255, 255, 0.7) !important;
-    backdrop-filter: blur(10px);
-    -webkit-backdrop-filter: blur(10px);
-    box-shadow: 0 4px 16px rgba(31, 38, 135, 0.12);
     display: flex;
     align-items: center;
-    justify-content: center;
 }
 .welcome-wrapper {
     width: 100%;
     height: 100%;
@@ -175,7 +523,6 @@ const chatAction = async () => {
                     margin-top: 10px;
                 }
             }
             .btn-box {
                 width: 224px;
                 height: 80px;
@@ -184,11 +531,10 @@ const chatAction = async () => {
     }
     .actions {
-        width: 100%;
-        height: 100px;
-        margin-bottom: 32px;
         display: flex;
-        align-items: center;
         justify-content: flex-end;
     }
 }

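Both fetch helpers in the added code for this file refresh a reactive array in place with `splice` rather than reassigning it. The following is a minimal sketch of that pattern, not code from the commit: `replaceContents` is a hypothetical helper name, and the sample values are made up. The point it illustrates is that the delete count should be the target array's own length, otherwise entries from an earlier fetch stay in the list.

```typescript
// Hypothetical helper (not part of the diff): replace an array's contents in place,
// so anything holding a reference to the same array object sees the new items.
function replaceContents<T>(target: T[], next: readonly T[]): T[] {
    // Delete every existing element (target.length of them), then insert the new ones.
    target.splice(0, target.length, ...next);
    return target;
}

// Usage: a second refresh replaces the earlier entries instead of piling up.
const voices: string[] = [];
replaceContents(voices, ["voice-a", "voice-b"]);
replaceContents(voices, ["voice-c"]);
// voices now holds only "voice-c"
```

Mutating in place keeps the same array object that `reactive([])` wraps, which is presumably why the commit uses `splice` instead of assignment.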
 import router from "@/router.ts";
 import { useSettingsStore } from "@/stores/config.ts";
+import { onMounted, onUnmounted, ref, reactive, computed, watch, h } from "vue";
 import { Modal } from 'ant-design-vue';
+import { SoundTwoTone, SoundOutlined } from "@ant-design/icons-vue";
 import axios from "axios";
+import PromptText from "./Components/PromptText.vue";
+const base_url = axios.defaults.baseURL
+const settingsStore = useSettingsStore()
+import setting from "@/assets/setting.png"
 onMounted(async () => {
+    await fetchASRLanguages();
+    await fetchTTSRoles();
 });
+const chatAction = async () => {
+    const state = await startAudioChat();
+    if (!state) {
+        console.error('Failed to start audio chat system service');
+        Modal.error({
+            title: 'Error',
+            content: 'Failed to start audio chat system service',
+        });
+        return;
+    }
+    router.replace('/home')
+}
+const chatLoading = ref<boolean>(false);
 const startAudioChat = async () => {
     try {
         chatLoading.value = true;
         const response = await fetch(`${base_url}/system/start`, {
             method: 'POST',
+            headers: {
+                'Content-Type': 'application/json',
+            },
             body: JSON.stringify({
+                enable_echo_cancellation: echoCancel.value
             })
         });
         if (!response.ok) {
             throw new Error(`HTTP error! status: ${response.status}`);
         }
+        const data = await response.json();
+        console.log('ASR Instance started successfully:', data);
         return true;
     } catch (error) {
+        console.error('Error starting ASR instance:', error);
         return false;
     } finally {
         chatLoading.value = false;
     }
+}
+const voiceModelOpen = ref<boolean>(false);
+const modalLoading = ref<boolean>(false);
+const handleVoiceModalCancel = () => {
+    voiceModelOpen.value = false;
+    role.value = settingsStore.$state.role;
+    language.value = settingsStore.$state.language;
 };
+const handleVoiceModalSubmit = async () => {
+    console.log('Selected Language:', language.value);
+    console.log('Selected Role:', role.value);
+    console.log('Echo Cancel:', echoCancel.value);
+    settingsStore.$state.language = language.value;
+    settingsStore.$state.role = role.value || '';
+    settingsStore.$state.echoCancel = echoCancel.value;
+    await pushConfig(settingsStore.$state.role);
+};
+const pushConfig = async (model_id: string) => {
+    try {
+        modalLoading.value = true;
+        const response = await fetch(`${base_url}/tts/models/load`, {
+            method: 'POST',
+            headers: {
+                'Content-Type': 'application/json',
+            },
+            body: JSON.stringify({
+                "model_id": model_id,
+            })
+        });
+        if (!response.ok) {
+            throw new Error(`HTTP error! status: ${response.status}`);
+        }
+        const data = await response.json();
+        console.log('Config pushed successfully:', data);
+        const response2 = await fetch(`${base_url}/asr/instance/create`, {
+            method: 'POST',
+            headers: {
+                'Content-Type': 'application/json',
+            },
+            body: JSON.stringify({
+                "language": language.value,
+            })
+        });
+        if (!response2.ok) {
+            throw new Error(`HTTP error! status: ${response2.status}`);
+        }
+        const data2 = await response2.json();
+        console.log('ASR Language set successfully:', data2);
+    } catch (err) {
+        console.error('Error pushing config:', err);
+        Modal.error({
+            title: 'Error',
+            content: "Error config: " + (err instanceof Error ? err.message : String(err)),
+        });
+    } finally {
+        modalLoading.value = false;
+        voiceModelOpen.value = false;
+    }
+    console.log('Selected Language:', language.value);
+    console.log('Selected Role:', role.value);
+}
+const language = ref<string>(settingsStore.$state.language || 'zh');
+const languages = reactive([]);
+const languageOptions = {
+    'zh': 'Chinese',
+    'en': 'English',
+    'auto': 'Auto',
+};
+const role = ref<string>(settingsStore.$state.role || '');
+const roles = reactive([])
+const echoCancel = ref<boolean>(settingsStore.$state.echoCancel ?? true);
+const radioStyle = reactive({
+    display: 'flex',
+    height: '40px',
+    lineHeight: '40px',
+    fontSize: '16px',
+    marginBottom: '8px',
+});
+const filteredRoles = computed(() => {
+    const is_chinese = language.value == 'zh';
+    return roles.filter(ro => ro['is_chinese_voice'] == is_chinese);
+});
+watch(
+  () => language.value,
+  (newLang) => {
+    // After the language changes, keep the saved role if it is still available; otherwise select the first available role
+    if (filteredRoles.value.length > 0) {
+      const current_role_id = settingsStore.$state.role;
+      const current_role = filteredRoles.value.find(ro => ro['id'] == current_role_id);
+      if (current_role) {
+        role.value = current_role_id;
+      } else {
+        role.value = filteredRoles.value[0]['id'];
+      }
+    } else {
+      role.value = "";
+    }
+  }
+);
+const fetchTTSRoles = async () => {
+    try {
+        const response = await fetch(`${base_url}/tts/models`);
+        const data = await response.json()
+        if (data && data.models) {
+            // @ts-ignore
+            roles.splice(0, roles.length, ...data.models)
+            console.log('Fetched TTS Roles:', roles);
+            if (data.current_model_id) {
+                role.value = data.current_model_id;
+            }
+        }
+    } catch (error) {
+        console.error('Error fetching TTS roles:', error);
+    }
+};
+const fetchASRLanguages = async () => {
+    try {
+        const response = await fetch(`${base_url}/asr/languages`);
+        const data = await response.json();
+        if (data && data.languages) {
+            // @ts-ignore
+            languages.splice(0, languages.length, ...data.languages);
+            console.log('Fetched ASR Languages:', data.languages);
+            if (data.current_asr_language) {
+                language.value = data.current_asr_language;
+            }
+        }
+    } catch (error) {
+        console.error('Error fetching ASR languages:', error);
+    }
+};
+const togglePopover = (item: string) => {
+    popoverVisible.value = !popoverVisible.value;
+    if (item == 'voice') {
+        voiceModelOpen.value = true;
+    } else if (item == 'prompt') {
+        promptModelOpen.value = true;
+    }
+};
+const popoverVisible = ref<boolean>(false);
+const promptModelOpen = ref<boolean>(false);
+// Audio playback state management
+const currentPlayingId = ref<string | null>(null);
+const currentAudio = ref<HTMLAudioElement | null>(null);
+// Revised audio playback logic
+const playRefAudio = async (id: string, e: Event) => {
+    console.log('Playing reference audio for role:', id);
+    e.stopPropagation();
+    e.preventDefault();
+    try {
+        // If the clicked audio is the one currently playing, stop playback
+        if (currentPlayingId.value === id && currentAudio.value) {
+            currentAudio.value.pause();
+            currentAudio.value = null;
+            currentPlayingId.value = null;
+            console.log('Audio stopped');
+            return;
+        }
+        // If another audio is playing, stop it first
+        if (currentAudio.value) {
+            currentAudio.value.pause();
+            currentAudio.value = null;
+        }
+        // Create a new audio instance
+        const audio = new Audio(`${base_url}/tts/models/${id}/reference-audio`);
+        // Set up audio event listeners
+        audio.addEventListener('ended', () => {
+            currentPlayingId.value = null;
+            currentAudio.value = null;
+        });
+        audio.addEventListener('error', (error) => {
+            console.error('Audio playback error:', error);
+            currentPlayingId.value = null;
+            currentAudio.value = null;
+            Modal.error({
+                title: 'Error',
+                content: 'Failed to play reference audio',
+            });
+        });
+        // Start playback
+        await audio.play();
+        currentPlayingId.value = id;
+        currentAudio.value = audio;
+        console.log('Audio played successfully');
+    } catch (error) {
+        console.error('Error playing audio:', error);
+        currentPlayingId.value = null;
+        currentAudio.value = null;
+        Modal.error({
+            title: 'Error',
+            content: 'Failed to play reference audio',
+        });
     }
 };
+// Clean up audio when the component unmounts
+onUnmounted(() => {
+    if (currentAudio.value) {
+        currentAudio.value.pause();
+        currentAudio.value = null;
+    }
+    currentPlayingId.value = null;
+});
+// Check whether the given role's reference audio is currently playing
+const isPlaying = (id: string) => {
+    return currentPlayingId.value === id;
+};
 </script>
 <template>
         <div class="content">
             <div class="inner-content">
                 <div class="text-box">
+                    <div class="title">
+                        欢迎使用
+                    </div>
+                    <div class="sub-title">
+                        点击下方按钮开始对话
+                    </div>
                 </div>
                 <div class="btn-box">
                     <a-button @click="chatAction" block :loading="chatLoading" type="primary" size="large">
+                        <span>开始对话</span>
                     </a-button>
                 </div>
             </div>
         </div>
         <div class="actions">
+            <!-- <a-button type="text" @click="toggleSider">sider</a-button> -->
+             <a-button v-if="false" type="text" @click="voiceModelOpen = true"
+                style="width:44px; height: 44px; margin-right:24px;margin-bottom: 24px;">
                 <template #icon>
                     <img :src="setting" width="28" height="28" alt="settings" />
                 </template>
             </a-button>
+             <a-popover v-if="true" v-model:open="popoverVisible" trigger="click" ok-text="Yes" cancel-text="No" placement="bottomRight">
+                <template #content>
+                    <div class="custom-popover-list">
+                        <div class="custom-popover-item" @click="togglePopover('voice')">
+                            选择音色</div>
+                        <div class="custom-popover-item" @click="togglePopover('prompt')">Prompt调试</div>
+                    </div>
+                </template>
+                <img :src="setting" alt="item actions" style="width: 28px; height: 28px; margin-right:24px;margin-top: 16px;">
+            </a-popover>
         </div>
+        <a-modal v-model:open="voiceModelOpen" :title="null" :mask-closable="false" :closable="false" centered>
+            <template #footer>
+                <a-button key="back" @click="handleVoiceModalCancel">Cancel</a-button>
+                <a-button key="submit" type="primary" :loading="modalLoading" @click="handleVoiceModalSubmit">Submit</a-button>
+            </template>
+            <div class="languages">
+                <div class="echo-cancel-item">
+                    <div style="display: flex; justify-content: space-between; align-items: center;">
+                        <p style="margin: 0;">Enable Echo Cancellation:</p>
+                        <a-switch v-model:checked="echoCancel" />
+                    </div>
+                </div>
+            </div>
+            <div class="languages">
+                <div class="language-item">
+                    <p>Select Language:</p>
+                    <a-select v-model:value="language" style="width: 100%;">
+                        <a-select-option v-for="lan in languages" :value="lan" :key="lan">
+                            {{ languageOptions[lan] }}
+                        </a-select-option>
+                    </a-select>
+                </div>
+            </div>
+            <div class="languages">
+                <div class="role-item">
+                    <p>Select voice Role:</p>
+                    <a-radio-group size="large" v-model:value="role">
+                        <a-radio v-for="r in filteredRoles" :style="radioStyle" :value="r['id']" :key="r['id']">
+                            <div style="display: flex; justify-content: space-between; align-items: center; width:450px;">
+                            {{ r['character_name'] }}
+                            <a-button
+                                :key="r['id']"
+                                type="text"
+                                @click="playRefAudio(r['id'], $event)"
+                                class="audio-play-btn"
+                                :class="{ 'playing': isPlaying(r['id']) }"
+                            >
+                                <SoundTwoTone
+                                    v-if="isPlaying(r['id'])"
+                                    style="font-size: 18px; color: #52c41a;"
+                                    class="playing-icon"
+                                />
+                                <SoundOutlined
+                                    v-else
+                                    style="font-size: 18px; color: #1890ff;"
+                                />
+                            </a-button>
+                        </div>
+                        </a-radio>
+                    </a-radio-group>
+                </div>
+            </div>
+        </a-modal>
+        <PromptText v-model:open="promptModelOpen" />
     </div>
 </template>
 <style lang="scss" scoped>
+.languages {
+    margin-top: 24px;
+    margin-bottom: 24px;
+    p {
+        font-size: 16px;
+        font-weight: 500;
+        margin-bottom: 8px;
+    }
 }
+.audio-play-btn {
+    padding: 0px 8px;
+    padding-top:2px;
+    border-radius: 4px;
+    transition: all 0.2s;
+    height: 40px;
+    &:hover {
+        background-color: #f0f0f0;
+    }
+    &.playing {
+        background-color: #f6ffed;
+        border-color: #1890ff;
+        .playing-icon {
+            animation: pulse 1.5s infinite;
+        }
+    }
+}
+@keyframes pulse {
+    0% {
+        opacity: 1;
+        transform: scale(1);
+    }
+    50% {
+        opacity: 0.7;
+        transform: scale(1.1);
+    }
+    100% {
+        opacity: 1;
+        transform: scale(1);
+    }
+}
+.btn-groups {
+    margin-top: 36px;
     display: flex;
+    justify-content: space-between;
     align-items: center;
 }
+.custom-popover-list {
+    width: 92px;
+    margin: 0;
+    .custom-popover-item {
+        font-size: 14px;
+        line-height: 36px;
+        font-weight: 500;
+        color: #1e1e1e;
+        cursor: pointer;
+        border-radius: 4px;
+        padding: 0 8px;
+        margin: 0px -8px;
+        transition: background 0.2s;
+    }
+    .custom-popover-item:hover, .custom-popover-item:focus {
+        background: #e5e7eb;
+    }
+}
 .welcome-wrapper {
     width: 100%;
     height: 100%;
                     margin-top: 10px;
                 }
             }
             .btn-box {
                 width: 224px;
                 height: 80px;
     }
     .actions {
+        width: 100%;
+        height: 64px;
         display: flex;
         justify-content: flex-end;
     }
 }

main.py CHANGED

@@ -63,19 +63,6 @@ def main():
     parser = create_argument_parser()
     args = parser.parse_args()
-    # List the audio input devices, then exit
-    if getattr(args, 'list_audio_devices', False):
-        from voice_dialogue.audio.devices import list_input_devices
-        devices = list_input_devices()
-        print(f"\n可用音频输入设备 ({len(devices)}):")
-        print(f"{'索引':>4}  {'通道':>4}  {'采样率':>7}  {'默认':>4}  名称")
-        for d in devices:
-            default_mark = '✓' if d['is_default'] else ''
-            print(f"{d['index']:>4}  {d['max_input_channels']:>4}  "
-                  f"{d['default_sample_rate']:>7}  {default_mark:>4}  {d['name']}")
-        print("\n使用 --input-device <索引> 选择设备。")
-        sys.exit(0)
     set_debug_mode(args.debug)
     print(f"""
@@ -91,10 +78,8 @@ VoiceDialogue - 语音对话系统
         if args.mode == 'cli':
             print(f"语言设置: {args.language}")
             print(f"说话人: {args.speaker}")
-            if args.input_device is not None:
-                print(f"输入设备索引: {args.input_device}")
             print("正在启动命令行语音对话系统...")
-            launch_system(args.language, args.speaker, args.disable_echo_cancellation, args.input_device)
         elif args.mode == 'api':
             launch_api_server(

     parser = create_argument_parser()
     args = parser.parse_args()
     set_debug_mode(args.debug)
     print(f"""
         if args.mode == 'cli':
             print(f"语言设置: {args.language}")
             print(f"说话人: {args.speaker}")
             print("正在启动命令行语音对话系统...")
+            launch_system(args.language, args.speaker, args.disable_echo_cancellation)
         elif args.mode == 'api':
             launch_api_server(

pyproject.toml CHANGED

@@ -1,6 +1,6 @@
 [project]
 name = "voice_dialogue"
-version = "1.2.0"
 description = "一个基于AI的智能语音对话系统，支持实时语音识别、自然语言处理和语音合成"
 readme = "README.md"
 requires-python = ">=3.11"
@@ -8,11 +8,11 @@ dependencies = [
     "cn2an>=0.5.23",
     "einops>=0.8.1",
     "en-core-web-sm",
-    "fastapi==0.136.3",
     "ffmpeg-python>=0.2.0",
     "funasr-onnx==0.4.1",
     "g2p-en>=2.1.0",
-    "huggingface-hub==0.36.2",
     "jieba>=0.42.1",
     "jieba-fast>=0.53",
     "langchain==0.2.17",
@@ -29,11 +29,10 @@ dependencies = [
     "pypinyin>=0.54.0",
     "pytorch-lightning==2.3.1",
     "pywhispercpp",
-    "qwen-asr>=0.0.6",
     "silero-vad==5.1.2",
     "soundfile==0.13.1",
     "torch==2.3.1",
-    "transformers==4.57.6",
     "uvicorn==0.34.3",
     "websockets>=15.0.1",
     "wordsegment>=1.3.1",

 [project]
 name = "voice_dialogue"
+version = "1.0.0"
 description = "一个基于AI的智能语音对话系统，支持实时语音识别、自然语言处理和语音合成"
 readme = "README.md"
 requires-python = ">=3.11"
     "cn2an>=0.5.23",
     "einops>=0.8.1",
     "en-core-web-sm",
+    "fastapi==0.115.12",
     "ffmpeg-python>=0.2.0",
     "funasr-onnx==0.4.1",
     "g2p-en>=2.1.0",
+    "huggingface-hub==0.32.4",
     "jieba>=0.42.1",
     "jieba-fast>=0.53",
     "langchain==0.2.17",
     "pypinyin>=0.54.0",
     "pytorch-lightning==2.3.1",
     "pywhispercpp",
     "silero-vad==5.1.2",
     "soundfile==0.13.1",
     "torch==2.3.1",
+    "transformers==4.41.2",
     "uvicorn==0.34.3",
     "websockets>=15.0.1",
     "wordsegment>=1.3.1",

scripts/convert_tts_weights_to_safetensors.py DELETED

@@ -1,47 +0,0 @@
-"""Convert TTS pretrained weights (.bin) to safetensors.
-The qwen-asr branch upgrades transformers to 4.57+, whose security policy (CVE-2025-32434)
-refuses to load pytorch_model.bin on torch < 2.6. transformers prefers
-model.safetensors when loading, so a one-time local conversion suffices and torch need not be upgraded.
-Usage: python scripts/convert_tts_weights_to_safetensors.py
-"""
-from pathlib import Path
-import torch
-from safetensors.torch import save_file
-MOYOYO_PRETRAINED_PATH = Path(__file__).parent.parent / "assets" / "models" / "tts" / "moyoyo"
-PRETRAINED_DIRS = [
-    "chinese-roberta-wwm-ext-large",
-    "chinese-hubert-base",
-]
-def main():
-    for dirname in PRETRAINED_DIRS:
-        model_dir = MOYOYO_PRETRAINED_PATH / dirname
-        bin_path = model_dir / "pytorch_model.bin"
-        st_path = model_dir / "model.safetensors"
-        if st_path.exists():
-            print(f"已存在，跳过: {st_path}")
-            continue
-        if not bin_path.exists():
-            print(f"找不到权重文件: {bin_path}")
-            continue
-        state_dict = torch.load(bin_path, map_location="cpu", weights_only=True)
-        # clone() breaks shared memory; safetensors does not allow tensors to share storage
-        state_dict = {
-            key: value.clone().contiguous()
-            for key, value in state_dict.items()
-            if isinstance(value, torch.Tensor)
-        }
-        save_file(state_dict, st_path, metadata={"format": "pt"})
-        print(f"{dirname}: {len(state_dict)} tensors -> {st_path.stat().st_size // 1024 ** 2} MB")
-if __name__ == "__main__":
-    main()

src/voice_dialogue/api/app.py CHANGED

@@ -59,8 +59,7 @@ def _register_routes(app: FastAPI):
     v1_router.include_router(settings_routes.router, prefix="/settings", tags=["设置管理"])
     app.include_router(v1_router)
-    # starlette >= 1.0 removed add_websocket_route; the ws router already carries its full path, so include it directly
-    app.include_router(websocket_routes.ws)
     # Root path and health check
     _register_health_routes(app)

     v1_router.include_router(settings_routes.router, prefix="/settings", tags=["设置管理"])
     app.include_router(v1_router)
+    app.add_websocket_route("/api/v1/ws", websocket_routes.ws)
     # Root path and health check
     _register_health_routes(app)

src/voice_dialogue/api/core/lifespan.py CHANGED

@@ -24,8 +24,8 @@ class LifespanManager:
         startup_start_time = time.time()
         try:
-            # Initialize the system language: the product defaults to Chinese (independent of the OS language)
-            system_language = 'zh'
             logger.info(f"系统默认语言: {system_language}")
             # Initialize the TTS configuration

         startup_start_time = time.time()
         try:
+            # Initialize the system language
+            system_language = get_system_language()
             logger.info(f"系统默认语言: {system_language}")
             # Initialize the TTS configuration

src/voice_dialogue/api/core/service_factories.py CHANGED

@@ -12,15 +12,11 @@ class ServiceFactories:
     """Service factory class that encapsulates the creation logic for all services"""
     @staticmethod
-    def create_audio_capture(
-            enable_echo_cancellation: bool = True,
-            input_device_index: int = None,
-    ) -> AudioCapture:
         """Create the audio capture service"""
         return AudioCapture(
             audio_frames_queue=audio_frames_queue,
-            enable_echo_cancellation=enable_echo_cancellation,
-            input_device_index=input_device_index,
         )
     @staticmethod
@@ -134,14 +130,11 @@ def get_core_voice_service_definitions(system_language: str, tts_config: BaseTTS
     ]
-def get_audio_capture_service_definition(
-        enable_echo_cancellation: bool = True,
-        input_device_index: int = None,
-) -> ServiceDefinition:
     """Get the audio capture service definition"""
     return ServiceDefinition(
         name="audio_capture",
-        factory=lambda: ServiceFactories.create_audio_capture(enable_echo_cancellation, input_device_index),
         dependencies=[],
         health_check=lambda service: hasattr(service, 'is_ready') and service.is_ready
     )

     """Service factory class that encapsulates the creation logic for all services"""
     @staticmethod
+    def create_audio_capture(enable_echo_cancellation: bool = True) -> AudioCapture:
         """Create the audio capture service"""
         return AudioCapture(
             audio_frames_queue=audio_frames_queue,
+            enable_echo_cancellation=enable_echo_cancellation
         )
     @staticmethod
     ]
+def get_audio_capture_service_definition(enable_echo_cancellation: bool = True) -> ServiceDefinition:
     """Get the audio capture service definition"""
     return ServiceDefinition(
         name="audio_capture",
+        factory=lambda: ServiceFactories.create_audio_capture(enable_echo_cancellation),
         dependencies=[],
         health_check=lambda service: hasattr(service, 'is_ready') and service.is_ready
     )

src/voice_dialogue/api/routes/system_routes.py CHANGED

@@ -3,33 +3,15 @@ import time
 from fastapi import APIRouter, HTTPException, BackgroundTasks, Request
-from voice_dialogue.audio.capture import resolves_to_native_aec
-from voice_dialogue.audio.devices import (
-    list_input_devices, get_default_input_device_index, is_valid_input_device,
-    list_output_devices, get_default_output_device_index, is_valid_output_device,
-)
-from voice_dialogue.config.audio_config import (
-    get_input_device_index, save_input_device_index,
-    get_output_device_index, save_output_device_index,
-)
 from voice_dialogue.core.constants import session_manager
 from voice_dialogue.utils.logger import logger
 from ..core.service_factories import get_audio_capture_service_definition, get_speech_monitor_service_definition
 from ..schemas.system_schemas import (
-    SystemStatusResponse, SystemResponse, SystemStartRequest,
-    AudioInputDevicesResponse, AudioInputDevice, AudioOutputDevice, ASREngineResponse,
-    OutputDeviceRequest
 )
 router = APIRouter()
-# ASR engine registry name -> display name
-ASR_ENGINE_DISPLAY_NAMES = {
-    'qwen': 'Qwen3-ASR-1.7B',
-    'funasr': 'FunASR Paraformer',
-    'whisper': 'Whisper medium',
-}
 # Global system status
 _system_status = {
     "status": "stopped",
@@ -78,59 +60,6 @@ async def get_system_status(request: Request):
         raise HTTPException(status_code=500, detail=f"获取系统状态失败: {str(e)}")
-@router.get("/audio-devices", response_model=AudioInputDevicesResponse, summary="获取可用音频输入设备")
-async def get_audio_devices():
-    """
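-    List all audio input devices available on the system (including external microphones / microphone arrays),
-    so the frontend can choose a capture device.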
-    """
-    try:
-        devices = [AudioInputDevice(**d) for d in list_input_devices()]
-        output_devices = [AudioOutputDevice(**d) for d in list_output_devices()]
-        return AudioInputDevicesResponse(
-            devices=devices,
-            current_device_index=get_input_device_index(),
-            default_device_index=get_default_input_device_index(),
-            output_devices=output_devices,
-            current_output_device_index=get_output_device_index(),
-            default_output_device_index=get_default_output_device_index(),
-        )
-    except Exception as e:
-        logger.error(f"获取音频输入设备失败: {e}", exc_info=True)
-        raise HTTPException(status_code=500, detail=f"获取音频输入设备失败: {str(e)}")
-@router.post("/audio-output-device", response_model=SystemResponse, summary="设置音频输出设备")
-async def set_audio_output_device(request: OutputDeviceRequest):
-    """
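-    Save the selected output device. The playback service reads this setting on every playback,
-    so a change made mid-session takes effect on the next sentence without a restart.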
-    """
-    output_device_index = request.output_device_index
-    if not is_valid_output_device(output_device_index):
-        raise HTTPException(status_code=400, detail=f"无效的输出设备索引: {output_device_index}")
-    if not save_output_device_index(output_device_index):
-        raise HTTPException(status_code=500, detail="保存输出设备设置失败")
-    return SystemResponse(success=True, message="输出设备已更新")
-@router.get("/asr-engine", response_model=ASREngineResponse, summary="获取当前 ASR 引擎")
-async def get_asr_engine():
-    """
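-    Return the ASR engine currently in effect (language mapping + display name),
-    so the frontend can show the recognition model actually in use on the home and About pages.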
-    """
-    try:
-        from voice_dialogue.asr import asr_manager
-        mappings = asr_manager.get_asr_statistics()['language_mappings']
-        engines = sorted(set(mappings.values()))
-        display_name = ' + '.join(ASR_ENGINE_DISPLAY_NAMES.get(engine, engine) for engine in engines)
-        return ASREngineResponse(mappings=mappings, display_name=display_name)
-    except Exception as e:
-        logger.error(f"获取ASR引擎信息失败: {e}", exc_info=True)
-        raise HTTPException(status_code=500, detail=f"获取ASR引擎信息失败: {str(e)}")
 @router.post("/start", response_model=SystemResponse, summary="启动系统")
 async def start_system(
         request: SystemStartRequest,
@@ -147,30 +76,6 @@ async def start_system(
                 message="系统已经在运行中或正在启动"
             )
-        # Resolve the input device: fall back to the saved device when the request does not specify one
-        input_device_index = request.input_device_index
-        if input_device_index is None:
-            input_device_index = get_input_device_index()
-        if not is_valid_input_device(input_device_index):
-            logger.warning(f"请求的输入设备 {input_device_index} 无效，回退到系统默认设备")
-            input_device_index = None
-        # Persist the user's choice so the next start can reuse it
-        save_input_device_index(input_device_index)
-        # Resolve the output device: fall back to the saved device when the request does not specify one
-        output_device_index = request.output_device_index
-        if output_device_index is None:
-            output_device_index = get_output_device_index()
-        if not is_valid_output_device(output_device_index):
-            logger.warning(f"请求的输出设备 {output_device_index} 无效，回退到系统默认设备")
-            output_device_index = None
-        # The playback service reads this setting on every playback, so saving it takes effect immediately
-        save_output_device_index(output_device_index)
         # Update the status
         _system_status["status"] = "starting"
         session_manager.reset_id()
@@ -179,8 +84,7 @@ async def start_system(
         background_tasks.add_task(
             _start_system_background,
             fastapi_request,
-            request.enable_echo_cancellation,
-            input_device_index,
         )
         return SystemResponse(
@@ -310,11 +214,7 @@ async def restart_system(
         raise HTTPException(status_code=500, detail=f"系统重启失败: {str(e)}")
-async def _start_system_background(
-        request: Request,
-        enable_echo_cancellation: bool = True,
-        input_device_index: int = None,
-):
     """
     Actual logic for starting the system in the background: create and start the audio_capture service
     """
@@ -357,9 +257,7 @@ async def _start_system_background(
             logger.info("语音监控服务已在运行")
         else:
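             # Create the speech monitor service definition
-            # Disable software VAD only when macOS native AEC (which has built-in VAD) is used;
-            # when an external device is selected and PyAudio is used, software VAD must be enabled.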
-            enable_vad = not resolves_to_native_aec(enable_echo_cancellation, input_device_index)
             speech_monitor_def = get_speech_monitor_service_definition(enable_vad)
             # Start the speech monitor service
@@ -373,7 +271,7 @@ async def _start_system_background(
             logger.info("音频捕获服务已在运行")
         else:
             # Create the audio_capture service definition
-            audio_capture_def = get_audio_capture_service_definition(enable_echo_cancellation, input_device_index)
             # Start the audio_capture service
             success = service_manager.start_service(audio_capture_def)

 from fastapi import APIRouter, HTTPException, BackgroundTasks, Request
 from voice_dialogue.core.constants import session_manager
 from voice_dialogue.utils.logger import logger
 from ..core.service_factories import get_audio_capture_service_definition, get_speech_monitor_service_definition
 from ..schemas.system_schemas import (
+    SystemStatusResponse, SystemResponse, SystemStartRequest
 )
 router = APIRouter()
 # Global system status
 _system_status = {
     "status": "stopped",
         raise HTTPException(status_code=500, detail=f"获取系统状态失败: {str(e)}")
 @router.post("/start", response_model=SystemResponse, summary="启动系统")
 async def start_system(
         request: SystemStartRequest,
                 message="系统已经在运行中或正在启动"
             )
         # Update the status
         _system_status["status"] = "starting"
         session_manager.reset_id()
         background_tasks.add_task(
             _start_system_background,
             fastapi_request,
+            request.enable_echo_cancellation
         )
         return SystemResponse(
         raise HTTPException(status_code=500, detail=f"系统重启失败: {str(e)}")
+async def _start_system_background(request: Request, enable_echo_cancellation: bool = True):
     """
     Actual logic for starting the system in the background: create and start the audio_capture service
     """
             logger.info("语音监控服务已在运行")
         else:
             # Create the speech monitor service definition
+            enable_vad = not enable_echo_cancellation
             speech_monitor_def = get_speech_monitor_service_definition(enable_vad)
             # Start the speech monitor service
             logger.info("音频捕获服务已在运行")
         else:
             # Create the audio_capture service definition
+            audio_capture_def = get_audio_capture_service_definition(enable_echo_cancellation)
             # Start the audio_capture service
             success = service_manager.start_service(audio_capture_def)
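The `start_system` logic removed in this file resolves each device in three steps: take the index from the request, fall back to the saved index, and use the system default (`None`) when the result is invalid. A minimal sketch of that order is given below. The name `resolve_device_index` and the `is_valid` callback are hypothetical stand-ins for the repository's `get_input_device_index` / `is_valid_input_device` helpers, whose exact behaviour is not shown in this diff.

```python
from typing import Callable, Optional


def resolve_device_index(
    requested: Optional[int],
    saved: Optional[int],
    is_valid: Callable[[Optional[int]], bool],
) -> Optional[int]:
    """Pick a device index: request first, then saved value, else system default (None)."""
    # Step 1 and 2: prefer the request, otherwise reuse the persisted choice.
    index = requested if requested is not None else saved
    # Step 3: an invalid index degrades to None, meaning "system default device".
    return index if is_valid(index) else None
```

With a validity check that accepts only indices 0 and 2, `resolve_device_index(None, 2, lambda i: i in {0, 2})` keeps the saved device, while a stale index such as 7 resolves to `None`.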

src/voice_dialogue/api/schemas/system_schemas.py CHANGED

@@ -1,4 +1,4 @@
-from typing import Optional, Literal, Dict, Any, List
 from pydantic import BaseModel, Field
@@ -17,51 +17,10 @@ class SystemStatusResponse(BaseModel):
 class SystemStartRequest(BaseModel):
     """System start request"""
-    enable_echo_cancellation: bool = Field(default=True, description="是否启用回声消除（仅在未指定输入设备时使用 macOS 原生 AEC）")
-    input_device_index: Optional[int] = Field(default=None, description="输入设备索引（如外置麦克风阵列）；为空则使用系统默认设备")
-    output_device_index: Optional[int] = Field(default=None, description="输出设备索引（如外置扬声器）；为空则使用系统默认设备")
-class AudioInputDevice(BaseModel):
-    """Audio input device information"""
-    index: int = Field(..., description="设备索引")
-    name: str = Field(..., description="设备名称")
-    max_input_channels: int = Field(..., description="最大输入通道数")
-    default_sample_rate: int = Field(..., description="设备默认采样率")
-    is_default: bool = Field(default=False, description="是否为系统默认输入设备")
-class AudioOutputDevice(BaseModel):
-    """Audio output device information"""
-    index: int = Field(..., description="设备索引")
-    name: str = Field(..., description="设备名称")
-    max_output_channels: int = Field(..., description="最大输出通道数")
-    default_sample_rate: int = Field(..., description="设备默认采样率")
-    is_default: bool = Field(default=False, description="是否为系统默认输出设备")
-class AudioInputDevicesResponse(BaseModel):
-    """Audio device list response (input and output devices)"""
-    devices: List[AudioInputDevice] = Field(default_factory=list, description="可用输入设备列表")
-    current_device_index: Optional[int] = Field(default=None, description="当前已选择/保存的输入设备索引")
-    default_device_index: Optional[int] = Field(default=None, description="系统默认输入设备索引")
-    output_devices: List[AudioOutputDevice] = Field(default_factory=list, description="可用输出设备列表")
-    current_output_device_index: Optional[int] = Field(default=None, description="当前已选择/保存的输出设备索引")
-    default_output_device_index: Optional[int] = Field(default=None, description="系统默认输出设备索引")
 class SystemResponse(BaseModel):
     """System operation response"""
     success: bool = Field(..., description="操作是否成功")
     message: str = Field(..., description="响应消息")
-class OutputDeviceRequest(BaseModel):
-    """Request to set the output device"""
-    output_device_index: Optional[int] = Field(default=None, description="输出设备索引；为空则使用系统默认设备")
-class ASREngineResponse(BaseModel):
-    """Current ASR engine information"""
-    mappings: Dict[str, str] = Field(default_factory=dict, description="语言到 ASR 引擎的映射，如 {'zh': 'qwen'}")
-    display_name: str = Field(..., description="当前 ASR 引擎的展示名称")

+from typing import Optional, Literal, Dict, Any
 from pydantic import BaseModel, Field
 class SystemStartRequest(BaseModel):
     """System start request"""
+    enable_echo_cancellation: bool = Field(default=True, description="是否启用回声消除")
 class SystemResponse(BaseModel):
     """System operation response"""
     success: bool = Field(..., description="操作是否成功")
     message: str = Field(..., description="响应消息")

src/voice_dialogue/asr/manager.py CHANGED

@@ -1,6 +1,5 @@
 import importlib.util
 import inspect
-import os
 import re
 from dataclasses import dataclass
 from typing import Dict, Type, List, Literal, Optional
@@ -93,26 +92,11 @@ class ASRManager:
     def __init__(self):
         self._asr_instances: Dict[str, ASRInterface] = {}
-        # Use Qwen3-ASR by default; set VOICE_DIALOGUE_ASR=legacy to switch back to the original engines for A/B comparison
-        if os.environ.get('VOICE_DIALOGUE_ASR', 'qwen') == 'legacy':
-            self._language_to_asr_mapping = {
-                'zh': 'funasr',  # Prefer FunASR for Chinese
-                'en': 'whisper',  # Prefer Whisper for English
-            }
-        else:
-            self._language_to_asr_mapping = {
-                'zh': 'qwen',
-                'en': 'qwen',
-            }
-    def _resolve_unregistered(self, language: str, asr_type: str) -> str:
-        """Fall back to the legacy engine when the selected engine is not registered (e.g. qwen-asr is not installed)."""
-        fallback = {'zh': 'funasr', 'en': 'whisper'}.get(language)
-        if fallback and fallback in asr_tables.asr_classes:
-            logger.warning(f"ASR引擎 '{asr_type}' 未注册，回退到 '{fallback}'")
-            self._language_to_asr_mapping[language] = fallback
-            return fallback
-        return asr_type
     def create_asr(self, language: Literal['auto', 'zh', 'en']) -> ASRInterface:
         """
@@ -131,9 +115,6 @@ class ASRManager:
             # Choose the appropriate ASR engine for the language
             asr_type = self._get_asr_type_for_language(language)
-            if asr_type not in asr_tables.asr_classes:
-                asr_type = self._resolve_unregistered(language, asr_type)
             if asr_type not in asr_tables.asr_classes:
                 raise ValueError(f"ASR类型 '{asr_type}' 未注册")

 import importlib.util
 import inspect
 import re
 from dataclasses import dataclass
 from typing import Dict, Type, List, Literal, Optional
     def __init__(self):
         self._asr_instances: Dict[str, ASRInterface] = {}
+        self._language_to_asr_mapping = {
+            'zh': 'funasr',  # Prefer FunASR for Chinese
+            'en': 'whisper',  # Prefer Whisper for English
+            # 'auto': 'whisper',  # Default to Whisper for auto-detection
+        }
     def create_asr(self, language: Literal['auto', 'zh', 'en']) -> ASRInterface:
         """
             # Choose the appropriate ASR engine for the language
             asr_type = self._get_asr_type_for_language(language)
             if asr_type not in asr_tables.asr_classes:
                 raise ValueError(f"ASR类型 '{asr_type}' 未注册")
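The engine lookup removed by this change can be summarised as: use the engine mapped to the language, and if it is not registered, fall back to the legacy engine for that language before giving up. The sketch below illustrates that flow in isolation. It assumes a plain `set` in place of `asr_tables.asr_classes`, and the name `pick_engine` is hypothetical. It also folds the fallback and the final "not registered" check into one function, whereas the original splits them between `_resolve_unregistered` and `create_asr`.

```python
from typing import Dict, Set

# Legacy engines per language, as listed in the diff above.
LEGACY_FALLBACK: Dict[str, str] = {"zh": "funasr", "en": "whisper"}


def pick_engine(language: str, mapping: Dict[str, str], registered: Set[str]) -> str:
    """Return the engine for `language`, falling back to the legacy engine if needed."""
    engine = mapping[language]
    if engine in registered:
        return engine
    fallback = LEGACY_FALLBACK.get(language)
    if fallback and fallback in registered:
        # Remember the fallback so later lookups skip the unregistered engine.
        mapping[language] = fallback
        return fallback
    raise ValueError(f"ASR type '{engine}' is not registered")
```

Updating `mapping` in place mirrors the removed helper, which rewrote `_language_to_asr_mapping` so that the warning is logged only once per language.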

src/voice_dialogue/asr/models/__init__.py CHANGED

@@ -19,12 +19,3 @@ except ImportError as e:
     from voice_dialogue.utils.logger import logger
     logger.warning(f"Failed to import some Whisper implementations: {e}")
-try:
-    from .qwen import QwenASRClient
-    __all__.append('QwenASRClient')
-except ImportError as e:
-    from voice_dialogue.utils.logger import logger
-    logger.warning(f"Failed to import some Qwen ASR implementations: {e}")


     from voice_dialogue.utils.logger import logger
     logger.warning(f"Failed to import some Whisper implementations: {e}")

src/voice_dialogue/asr/models/qwen.py DELETED

@@ -1,76 +0,0 @@
-import os
-import typing
-import numpy as np
-import torch
-from qwen_asr import Qwen3ASRModel
-from voice_dialogue.asr.manager import asr_tables
-from voice_dialogue.asr.models.base import ASRInterface
-from voice_dialogue.asr.utils import ensure_minimum_audio_duration
-from voice_dialogue.config import paths
-from voice_dialogue.utils.logger import logger
-# Built-in model directory (shipped with the app in packaged distributions; loaded offline if present)
-BUILTIN_QWEN_ASR_MODEL_PATH = paths.ASR_MODELS_PATH / 'qwen3-asr-1.7b'
-TARGET_SAMPLE_RATE = 16000
-def resolve_model_path() -> str:
-    """Model source priority: environment variable > built-in directory > automatic Hugging Face download."""
-    env_model = os.environ.get('QWEN_ASR_MODEL')
-    if env_model:
-        return env_model
-    if (BUILTIN_QWEN_ASR_MODEL_PATH / 'config.json').exists():
-        return BUILTIN_QWEN_ASR_MODEL_PATH.as_posix()
-    return 'Qwen/Qwen3-ASR-1.7B'
-@asr_tables.register('asr_classes', 'qwen')
-class QwenASRClient(ASRInterface):
-    """Qwen3-ASR client (transformers backend, MPS-accelerated on macOS)"""
-    supported_langs = ['zh', 'en']
-    def __init__(self):
-        super().__init__()
-        self.model: typing.Optional[Qwen3ASRModel] = None
-    def setup(self, **kwargs) -> None:
-        model_name = kwargs.get('model') or resolve_model_path()
-        if torch.backends.mps.is_available():
-            device_map, dtype = 'mps', torch.bfloat16
-        elif torch.cuda.is_available():
-            device_map, dtype = 'cuda:0', torch.bfloat16
-        else:
-            device_map, dtype = 'cpu', torch.float32
-        logger.info(f'[INFO] Loading Qwen3-ASR model: {model_name} (device={device_map}, dtype={dtype})')
-        self.model = Qwen3ASRModel.from_pretrained(
-            model_name,
-            dtype=dtype,
-            device_map=device_map,
-            max_inference_batch_size=1,
-            max_new_tokens=256,
-        )
-    def warmup(self) -> None:
-        logger.info('[INFO] Warming up Qwen3-ASR model...')
-        try:
-            self.transcribe(self.warmup_audiodata)
-            logger.info('[INFO] Qwen3-ASR model warmed up.')
-        except Exception as e:
-            logger.warning(f'[WARNING] Qwen3-ASR model warmup failed: {e}')
-    def transcribe(self, audio_array: np.ndarray, language: str = None) -> str:
-        audio_array = ensure_minimum_audio_duration(audio_array)
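-        # Always use automatic language detection: specifying a language forces the model to "output only the transcription",
-        # so silent/noisy segments are forced into hallucinated text; in auto mode non-speech segments return an empty string,
-        # which is discarded upstream, eliminating hallucinations at the root.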
-        results = self.model.transcribe(
-            audio=(audio_array, TARGET_SAMPLE_RATE),
-            language=None,
-        )
-        return ' '.join(result.text for result in results).strip()

src/voice_dialogue/audio/capture/__init__.py CHANGED

@@ -4,43 +4,12 @@
 Select and manage the concrete audio capture strategy according to the configuration.
 """
 from multiprocessing import Queue
-from typing import Optional
 from voice_dialogue.utils.logger import logger
 from .aec_capture import AecCapture
 from .pyaudio_capture import PyAudioCapture
-def resolves_to_native_aec(
-        enable_echo_cancellation: bool,
-        input_device_index: Optional[int] = None,
-) -> bool:
-    """
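-    Determine whether the macOS native AEC capture strategy is used under the given configuration.
-    The native AEC library operates on the system default input device and has built-in VAD, so native AEC is used when either of the following holds:
-      - echo cancellation is enabled and no specific input device is given (the default device is used implicitly);
-      - echo cancellation is enabled and the selected device happens to be the system default input device
-        (native AEC captures the default device anyway, so the result is equivalent).
-    Only when a non-default input device (such as an external microphone array) is selected does it degrade to the PyAudio strategy;
-    echo cancellation then relies on the device's own hardware, and voice activity detection switches to software VAD.
-    The caller uses this to decide whether SpeechStateMonitor needs software VAD
-    (enable_vad = not resolves_to_native_aec(...)).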
-    """
-    if not enable_echo_cancellation:
-        return False
-    if input_device_index is None:
-        return True
-    # Native AEC can still be used when the selected device is the system default device
-    try:
-        from voice_dialogue.audio.devices import get_default_input_device_index
-        return input_device_index == get_default_input_device_index()
-    except Exception:
-        return False
 class AudioCapture:
     """
     Audio capture facade.
@@ -54,44 +23,29 @@ class AudioCapture:
             self,
             audio_frames_queue: Queue,
             enable_echo_cancellation: bool = True,
-            input_device_index: Optional[int] = None,
-            channels: Optional[int] = None,
     ):
         """
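         Initialize the audio capturer.
         Args:
             audio_frames_queue (Queue): Queue that stores the captured audio frames.
-            enable_echo_cancellation (bool): Whether to enable echo cancellation. Only takes effect when
-                                             input_device_index is not given (uses the macOS
-                                             native AEC library, which operates on the system default input device).
-            input_device_index (Optional[int]): Index of the chosen input device (such as an external microphone array).
-                                                Once given, the PyAudio strategy captures from that device,
-                                                and echo cancellation relies on the device hardware.
-            channels (Optional[int]): Number of capture channels (PyAudio strategy only; multi-channel input is downmixed to mono).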
         """
-        use_native_aec = resolves_to_native_aec(enable_echo_cancellation, input_device_index)
         self._strategy = None
         try:
-            if use_native_aec:
                 self._strategy = AecCapture(audio_frames_queue=audio_frames_queue)
             else:
-                self._strategy = PyAudioCapture(
-                    audio_frames_queue=audio_frames_queue,
-                    input_device_index=input_device_index,
-                    channels=channels,
-                )
             logger.info(f"音频捕获策略已选择: {self._strategy.__class__.__name__}")
         except Exception as e:
             logger.error(
-                f"初始化 {AecCapture.__name__ if use_native_aec else PyAudioCapture.__name__} 失败: {e}, 将回退到 PyAudio。")
             # Fall back only when the AEC attempt failed
             if not isinstance(self._strategy, PyAudioCapture):
-                self._strategy = PyAudioCapture(
-                    audio_frames_queue=audio_frames_queue,
-                    input_device_index=input_device_index,
-                    channels=channels,
-                )
                 logger.info(f"已回退到音频捕获策略: {self._strategy.__class__.__name__}")
     def start(self):

 Select and manage the concrete audio capture strategy according to the configuration.
 """
 from multiprocessing import Queue
 from voice_dialogue.utils.logger import logger
 from .aec_capture import AecCapture
 from .pyaudio_capture import PyAudioCapture
 class AudioCapture:
     """
     Audio capture facade.
             self,
             audio_frames_queue: Queue,
             enable_echo_cancellation: bool = True,
     ):
         """
         初始化音频捕获器。
         Args:
             audio_frames_queue (Queue): 用于存放捕获的音频帧的队列。
+            enable_echo_cancellation (bool): 是否启用回声消除功能。
+                                             若为 True，则使用 AEC 原生库；
+                                             否则，使用 PyAudio。
         """
         self._strategy = None
         try:
+            if enable_echo_cancellation:
                 self._strategy = AecCapture(audio_frames_queue=audio_frames_queue)
             else:
+                self._strategy = PyAudioCapture(audio_frames_queue=audio_frames_queue)
             logger.info(f"音频捕获策略已选择: {self._strategy.__class__.__name__}")
         except Exception as e:
             logger.error(
+                f"初始化 {AecCapture.__name__ if enable_echo_cancellation else PyAudioCapture.__name__} 失败: {e}, 将回退到 PyAudio。")
             # Fall back only when the AEC attempt failed
             if not isinstance(self._strategy, PyAudioCapture):
+                self._strategy = PyAudioCapture(audio_frames_queue=audio_frames_queue)
                 logger.info(f"已回退到音频捕获策略: {self._strategy.__class__.__name__}")
     def start(self):
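This change replaces the device-aware rule (`resolves_to_native_aec`) with the bare `enable_echo_cancellation` flag, which in turn changes when software VAD is switched on (`enable_vad = not ...`). The sketch below puts the two rules side by side so the difference can be checked. The function names are hypothetical, and the default device index is passed in as a parameter rather than queried through `get_default_input_device_index`, so the original's "lookup failed, return False" branch is not modelled.

```python
from typing import Optional


def native_aec_before(
    enable_echo_cancellation: bool,
    input_device_index: Optional[int],
    default_device_index: Optional[int],
) -> bool:
    """Removed rule: native AEC only when capturing from the default input device."""
    if not enable_echo_cancellation:
        return False
    if input_device_index is None:
        return True
    return input_device_index == default_device_index


def native_aec_after(enable_echo_cancellation: bool) -> bool:
    """Rule after this change: the flag alone selects AecCapture."""
    return enable_echo_cancellation


def software_vad_enabled(uses_native_aec: bool) -> bool:
    """Software VAD is the complement of native AEC in both versions."""
    return not uses_native_aec
```

The two rules differ only when echo cancellation is enabled and a non-default device is selected: the removed rule turned software VAD on for that case, while the rule after this change has no device parameter and keeps it off.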

src/voice_dialogue/audio/capture/pyaudio_capture.py CHANGED

@@ -1,130 +1,41 @@
 from multiprocessing import Queue
-from typing import Optional
-import numpy as np
 import pyaudio
 from voice_dialogue.utils.logger import logger
 from .base_capture import BaseCapture
-# Downstream ASR / VAD uniformly require 16 kHz mono int16 audio
-TARGET_SAMPLE_RATE = 16000
 class PyAudioCapture(BaseCapture):
     """
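     Standard audio capture strategy using PyAudio.
-    Supports selecting a specific input device (such as an external microphone array) and automatically downmixes
-    and resamples multi-channel, non-16 kHz input into the 16 kHz mono int16 data required downstream.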
     """
-    def __init__(
-            self,
-            audio_frames_queue: Queue,
-            input_device_index: Optional[int] = None,
-            channels: Optional[int] = None,
-            **kwargs
-    ):
-        """
-        Args:
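-            audio_frames_queue (Queue): Queue that receives captured audio frames.
-            input_device_index (Optional[int]): Input device index; None means the system default device.
-            channels (Optional[int]): Number of capture channels; None means use the maximum channel count the device supports
-                                      (microphone arrays are usually multi-channel and are downmixed to mono after capture).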
-        """
         super().__init__(audio_frames_queue=audio_frames_queue, **kwargs)
-        self.input_device_index = input_device_index
-        self.requested_channels = channels
-    def _resolve_device_params(self, p: pyaudio.PyAudio):
-        """Resolve the capture channel count and capture sample rate for the selected device."""
-        # Default parameters (system default device, mono, 16 kHz)
-        device_index = self.input_device_index
-        channels = self.requested_channels or 1
-        sample_rate = TARGET_SAMPLE_RATE
-        try:
-            if device_index is None:
-                device_index = int(p.get_default_input_device_info().get("index"))
-            info = p.get_device_info_by_index(device_index)
-            max_channels = int(info.get("maxInputChannels", 1)) or 1
-            # When no channel count is given explicitly, capture all device channels and then downmix (suits microphone arrays)
-            if self.requested_channels is None:
-                channels = max_channels
-            else:
-                channels = min(self.requested_channels, max_channels)
-            # Prefer 16 kHz; if the device does not support it, use the device's default sample rate and resample later
-            device_rate = int(info.get("defaultSampleRate", TARGET_SAMPLE_RATE))
-            if not p.is_format_supported(
-                    rate=TARGET_SAMPLE_RATE,
-                    input_device=device_index,
-                    input_channels=channels,
-                    input_format=pyaudio.paInt16,
-            ):
-                sample_rate = device_rate
-        except Exception as e:
-            logger.warning(f"Failed to resolve input device parameters; falling back to default device/mono/16 kHz: {e}")
-            device_index = self.input_device_index
-            channels = 1
-            sample_rate = TARGET_SAMPLE_RATE
-        return device_index, channels, sample_rate
     def _init_pyaudio(self):
         """Initialize PyAudio and return the instance and configuration."""
         p = pyaudio.PyAudio()
-        device_index, channels, sample_rate = self._resolve_device_params(p)
-        # Size the capture chunk to about 64 ms at the capture sample rate so that frames stay long enough for VAD after resampling
-        chunk = max(1024, int(sample_rate * 0.064))
-        logger.info(
-            f"PyAudio capture config: device_index={device_index}, channels={channels}, "
-            f"sample_rate={sample_rate} -> {TARGET_SAMPLE_RATE}, chunk={chunk}"
-        )
-        return p, chunk, sample_rate, channels, device_index
-    def _open_stream(self, p, chunk, sample_rate, channels, device_index):
         """Open the PyAudio audio stream."""
         return p.open(
             format=pyaudio.paInt16,
-            channels=channels,
             rate=sample_rate,
             input=True,
-            input_device_index=device_index,
             frames_per_buffer=chunk,
         )
-    def _to_mono_16k(self, data: bytes, channels: int, sample_rate: int) -> Optional[bytes]:
-        """Downmix and resample raw multi-channel / arbitrary-rate int16 data to 16 kHz mono int16."""
-        samples = np.frombuffer(data, dtype=np.int16)
-        if samples.size == 0:
-            return None
-        # Downmix multi-channel audio to mono (average across channels)
-        if channels > 1:
-            frame_count = samples.size // channels
-            if frame_count == 0:
-                return None
-            samples = samples[:frame_count * channels].reshape(-1, channels)
-            mono = samples.astype(np.float32).mean(axis=1)
-        else:
-            mono = samples.astype(np.float32)
-        # Resample to 16 kHz
-        if sample_rate != TARGET_SAMPLE_RATE:
-            import soxr
-            mono = soxr.resample(mono, sample_rate, TARGET_SAMPLE_RATE)
-        return np.clip(mono, -32768, 32767).astype(np.int16).tobytes()
-    def _capture_loop(self, stream, chunk, channels, sample_rate):
         """Main loop for PyAudio audio capture."""
         logger.info("Starting audio capture with PyAudio...")
         self.is_ready = True
-        needs_processing = channels > 1 or sample_rate != TARGET_SAMPLE_RATE
         while not self.is_exited:
             data = stream.read(chunk, exception_on_overflow=False)
             if data is None:
@@ -133,11 +44,6 @@ class PyAudioCapture(BaseCapture):
             if self.is_paused:
                 continue
-            if needs_processing:
-                data = self._to_mono_16k(data, channels, sample_rate)
-                if data is None:
-                    continue
             self.audio_frames_queue.put(data)
     def _cleanup(self, stream, p):
@@ -151,11 +57,11 @@ class PyAudioCapture(BaseCapture):
         """
         Thread main loop; runs PyAudio audio capture.
         """
-        p, chunk, sample_rate, channels, device_index = self._init_pyaudio()
         stream = None
         try:
-            stream = self._open_stream(p, chunk, sample_rate, channels, device_index)
-            self._capture_loop(stream, chunk, channels, sample_rate)
         except Exception as e:
             logger.error(f'PyAudio audio capturer hit a runtime error: {e}')
         finally:

 from multiprocessing import Queue
 import pyaudio
 from voice_dialogue.utils.logger import logger
 from .base_capture import BaseCapture
 class PyAudioCapture(BaseCapture):
     """
     Standard audio capture strategy using PyAudio.
     """
+    def __init__(self, audio_frames_queue: Queue, **kwargs):
         super().__init__(audio_frames_queue=audio_frames_queue, **kwargs)
     def _init_pyaudio(self):
         """Initialize PyAudio and return the instance and configuration."""
         p = pyaudio.PyAudio()
+        chunk = 1024
+        sample_rate = 16000
+        return p, chunk, sample_rate
+    def _open_stream(self, p, chunk, sample_rate):
         """Open the PyAudio audio stream."""
         return p.open(
             format=pyaudio.paInt16,
+            channels=1,
             rate=sample_rate,
             input=True,
             frames_per_buffer=chunk,
         )
+    def _capture_loop(self, stream, chunk):
         """Main loop for PyAudio audio capture."""
         logger.info("Starting audio capture with PyAudio...")
         self.is_ready = True
         while not self.is_exited:
             data = stream.read(chunk, exception_on_overflow=False)
             if data is None:
             if self.is_paused:
                 continue
             self.audio_frames_queue.put(data)
     def _cleanup(self, stream, p):
         """
         Thread main loop; runs PyAudio audio capture.
         """
+        p, chunk, sample_rate = self._init_pyaudio()
         stream = None
         try:
+            stream = self._open_stream(p, chunk, sample_rate)
+            self._capture_loop(stream, chunk)
         except Exception as e:
             logger.error(f'PyAudio audio capturer hit a runtime error: {e}')
         finally:
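
The removed `_to_mono_16k` helper averaged interleaved channels and then resampled to 16 kHz with `soxr`. The sketch below illustrates the same two steps using only the standard library. Linear interpolation is swapped in for `soxr` here, so the resampling quality is lower than in the original code, and the function names are assumptions made for this example.

```python
from array import array


def to_mono(data: bytes, channels: int) -> list:
    """Average interleaved int16 channels into a list of float samples."""
    usable = len(data) - len(data) % (2 * channels)  # drop any partial frame
    samples = array("h")
    samples.frombytes(data[:usable])
    return [
        sum(samples[i:i + channels]) / channels
        for i in range(0, len(samples), channels)
    ]


def resample_linear(mono: list, src_rate: int, dst_rate: int) -> list:
    """Resample by linear interpolation (a simple substitute for soxr)."""
    if src_rate == dst_rate or not mono:
        return list(mono)
    out_len = max(1, round(len(mono) * dst_rate / src_rate))
    out = []
    for n in range(out_len):
        pos = n * src_rate / dst_rate
        i = min(int(pos), len(mono) - 1)
        j = min(i + 1, len(mono) - 1)
        frac = pos - i
        out.append(mono[i] * (1 - frac) + mono[j] * frac)
    return out


def to_mono_16k(data: bytes, channels: int, sample_rate: int,
                target_rate: int = 16000) -> bytes:
    """Downmix, resample, clip to the int16 range, and return raw bytes."""
    mono = resample_linear(to_mono(data, channels), sample_rate, target_rate)
    clipped = [max(-32768, min(32767, int(round(s)))) for s in mono]
    return array("h", clipped).tobytes()


# Two stereo frames captured at 32 kHz: (100, 300) and (-200, 200).
raw = array("h", [100, 300, -200, 200]).tobytes()
out = array("h")
out.frombytes(to_mono_16k(raw, channels=2, sample_rate=32000))
print(list(out))
```

Halving the rate keeps one of the two averaged frames, so the stereo pair (100, 300) becomes the single mono sample 200.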

src/voice_dialogue/audio/devices.py DELETED

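@@ -1,167 +0,0 @@
-"""
-Audio device enumeration utilities.
-Provides the ability to list the system's available input/output devices (including external microphone arrays
-and external speakers) for device selection by the CLI, the API, and the frontend.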
-"""
-from typing import List, Optional, TypedDict
-import pyaudio
-from voice_dialogue.utils.logger import logger
-class InputDeviceInfo(TypedDict):
-    """Input device information."""
-    index: int
-    name: str
-    max_input_channels: int
-    default_sample_rate: int
-    is_default: bool
-class OutputDeviceInfo(TypedDict):
-    """Output device information."""
-    index: int
-    name: str
-    max_output_channels: int
-    default_sample_rate: int
-    is_default: bool
-def _get_default_input_index(p: pyaudio.PyAudio) -> Optional[int]:
-    """Return the system default input device index, or None on failure."""
-    try:
-        return int(p.get_default_input_device_info().get("index"))
-    except Exception:
-        return None
-def list_input_devices() -> List[InputDeviceInfo]:
-    """
-    List all available audio input devices.
-    Returns:
-        List[InputDeviceInfo]: List of input devices (only devices with maxInputChannels > 0).
-    """
-    devices: List[InputDeviceInfo] = []
-    p = pyaudio.PyAudio()
-    try:
-        default_index = _get_default_input_index(p)
-        for i in range(p.get_device_count()):
-            try:
-                info = p.get_device_info_by_index(i)
-            except Exception as e:
-                logger.warning(f"Failed to read info for audio device {i}: {e}")
-                continue
-            max_input_channels = int(info.get("maxInputChannels", 0))
-            if max_input_channels <= 0:
-                continue
-            devices.append(
-                InputDeviceInfo(
-                    index=int(info.get("index", i)),
-                    name=str(info.get("name", f"device-{i}")),
-                    max_input_channels=max_input_channels,
-                    default_sample_rate=int(info.get("defaultSampleRate", 16000)),
-                    is_default=(int(info.get("index", i)) == default_index),
-                )
-            )
-    finally:
-        p.terminate()
-    return devices
-def get_default_input_device_index() -> Optional[int]:
-    """Return the system default input device index."""
-    p = pyaudio.PyAudio()
-    try:
-        return _get_default_input_index(p)
-    finally:
-        p.terminate()
-def is_valid_input_device(index: Optional[int]) -> bool:
-    """
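-    Check whether the given index is a valid input device.
-    Args:
-        index: Device index; None means the system default device and is treated as valid.
-    Returns:
-        bool: Whether it is valid.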
-    """
-    if index is None:
-        return True
-    return any(d["index"] == index for d in list_input_devices())
-def _get_default_output_index(p: pyaudio.PyAudio) -> Optional[int]:
-    """Return the system default output device index, or None on failure."""
-    try:
-        return int(p.get_default_output_device_info().get("index"))
-    except Exception:
-        return None
-def list_output_devices() -> List[OutputDeviceInfo]:
-    """
-    List all available audio output devices.
-    Returns:
-        List[OutputDeviceInfo]: List of output devices (only devices with maxOutputChannels > 0).
-    """
-    devices: List[OutputDeviceInfo] = []
-    p = pyaudio.PyAudio()
-    try:
-        default_index = _get_default_output_index(p)
-        for i in range(p.get_device_count()):
-            try:
-                info = p.get_device_info_by_index(i)
-            except Exception as e:
-                logger.warning(f"Failed to read info for audio device {i}: {e}")
-                continue
-            max_output_channels = int(info.get("maxOutputChannels", 0))
-            if max_output_channels <= 0:
-                continue
-            devices.append(
-                OutputDeviceInfo(
-                    index=int(info.get("index", i)),
-                    name=str(info.get("name", f"device-{i}")),
-                    max_output_channels=max_output_channels,
-                    default_sample_rate=int(info.get("defaultSampleRate", 48000)),
-                    is_default=(int(info.get("index", i)) == default_index),
-                )
-            )
-    finally:
-        p.terminate()
-    return devices
-def get_default_output_device_index() -> Optional[int]:
-    """Return the system default output device index."""
-    p = pyaudio.PyAudio()
-    try:
-        return _get_default_output_index(p)
-    finally:
-        p.terminate()
-def is_valid_output_device(index: Optional[int]) -> bool:
-    """
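-    Check whether the given index is a valid output device.
-    Args:
-        index: Device index; None means the system default device and is treated as valid.
-    Returns:
-        bool: Whether it is valid.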
-    """
-    if index is None:
-        return True
-    return any(d["index"] == index for d in list_output_devices())
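
The filtering rule in the deleted `list_input_devices` (keep only entries with `maxInputChannels > 0`) can be exercised without audio hardware by separating it from the PyAudio calls. The sketch below assumes the raw device dictionaries have already been fetched. `select_input_devices` is a hypothetical helper and the sample entries are made up for illustration.

```python
from typing import List, Optional, TypedDict


class InputDeviceInfo(TypedDict):
    index: int
    name: str
    max_input_channels: int
    default_sample_rate: int
    is_default: bool


def select_input_devices(raw_infos: List[dict],
                         default_index: Optional[int]) -> List[InputDeviceInfo]:
    """Keep devices that expose at least one input channel."""
    devices: List[InputDeviceInfo] = []
    for i, info in enumerate(raw_infos):
        max_in = int(info.get("maxInputChannels", 0))
        if max_in <= 0:
            continue  # output-only device
        index = int(info.get("index", i))
        devices.append(InputDeviceInfo(
            index=index,
            name=str(info.get("name", f"device-{i}")),
            max_input_channels=max_in,
            default_sample_rate=int(info.get("defaultSampleRate", 16000)),
            is_default=(index == default_index),
        ))
    return devices


# Made-up device entries shaped like PyAudio device-info dictionaries.
sample = [
    {"index": 0, "name": "Built-in Microphone", "maxInputChannels": 1, "defaultSampleRate": 48000.0},
    {"index": 1, "name": "Built-in Output", "maxInputChannels": 0, "defaultSampleRate": 48000.0},
    {"index": 2, "name": "USB Mic Array", "maxInputChannels": 6, "defaultSampleRate": 16000.0},
]
devices = select_input_devices(sample, default_index=0)
print([d["name"] for d in devices])
```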

src/voice_dialogue/audio/player.py CHANGED

@@ -1,78 +1,10 @@
 import tempfile
-from typing import Optional
-import numpy as np
 import soundfile as sf
 from playsound import playsound
-from voice_dialogue.utils.logger import logger
-def _to_int16(audio_data) -> np.ndarray:
-    """Normalize audio data to one-dimensional int16."""
-    audio = np.asarray(audio_data)
-    if audio.ndim > 1:
-        audio = audio.mean(axis=-1)
-    if audio.dtype != np.int16:
-        audio = np.clip(audio, -1.0, 1.0)
-        audio = (audio * 32767.0).astype(np.int16)
-    return audio
-def _play_via_pyaudio(audio_data, sample_rate: int, output_device_index: int):
-    """Play through a PyAudio output stream, which supports choosing the output device."""
-    import pyaudio
-    audio = _to_int16(audio_data)
-    p = pyaudio.PyAudio()
-    try:
-        # If the device does not support this sample rate, resample to the device's default sample rate
-        try:
-            p.is_format_supported(
-                rate=sample_rate,
-                output_device=output_device_index,
-                output_channels=1,
-                output_format=pyaudio.paInt16,
-            )
-        except Exception:
-            device_rate = int(p.get_device_info_by_index(output_device_index).get("defaultSampleRate", 48000))
-            logger.info(f"Output device does not support {sample_rate} Hz; resampling to {device_rate} Hz")
-            import soxr
-            audio = soxr.resample(audio, sample_rate, device_rate).astype(np.int16)
-            sample_rate = device_rate
-        stream = p.open(
-            format=pyaudio.paInt16,
-            channels=1,
-            rate=sample_rate,
-            output=True,
-            output_device_index=output_device_index,
-        )
-        try:
-            stream.write(audio.tobytes())
-        finally:
-            stream.stop_stream()
-            stream.close()
-    finally:
-        p.terminate()
-def play_audio(audio_data, sample_rate=16000, output_device_index: Optional[int] = None):
-    """Play audio.
-    Args:
-        audio_data: Audio data
-        sample_rate: Sample rate
-        output_device_index: Output device index; None means the system default device
-    """
-    if output_device_index is not None:
-        try:
-            _play_via_pyaudio(audio_data, sample_rate, output_device_index)
-            return
-        except Exception as e:
-            logger.warning(f"Playback on output device {output_device_index} failed; falling back to the system default device: {e}")
     with tempfile.NamedTemporaryFile('w+b', suffix='.wav') as soundfile:
         sf.write(soundfile, audio_data, samplerate=sample_rate, subtype='PCM_16', closefd=False)
         playsound(soundfile.name, block=True)

 import tempfile
 import soundfile as sf
 from playsound import playsound
+def play_audio(audio_data, sample_rate=16000):
     with tempfile.NamedTemporaryFile('w+b', suffix='.wav') as soundfile:
         sf.write(soundfile, audio_data, samplerate=sample_rate, subtype='PCM_16', closefd=False)
         playsound(soundfile.name, block=True)
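
Both versions of `play_audio` end up writing 16-bit PCM, and the removed `_to_int16` helper clipped float samples to [-1.0, 1.0] before scaling by 32767. The following sketch shows that conversion and a PCM_16 WAV encode using only the standard library. The `wave` module is swapped in for `soundfile`, and `float_to_int16` and `wav_bytes` are hypothetical names.

```python
import io
import sys
import wave
from array import array


def float_to_int16(samples):
    """Clip float samples to [-1.0, 1.0] and scale them to int16 values."""
    return array("h", (int(max(-1.0, min(1.0, s)) * 32767.0) for s in samples))


def wav_bytes(pcm, sample_rate=16000):
    """Encode mono int16 samples as an in-memory PCM_16 WAV file."""
    if sys.byteorder == "big":
        pcm = array("h", pcm)
        pcm.byteswap()  # WAV stores samples little-endian
    buf = io.BytesIO()
    with wave.open(buf, "wb") as wav:
        wav.setnchannels(1)
        wav.setsampwidth(2)  # 2 bytes per sample = 16-bit PCM
        wav.setframerate(sample_rate)
        wav.writeframes(pcm.tobytes())
    return buf.getvalue()


pcm = float_to_int16([0.0, 1.5, -1.0, 0.5])
encoded = wav_bytes(pcm, sample_rate=16000)
with wave.open(io.BytesIO(encoded), "rb") as wav:
    frame_count, frame_rate = wav.getnframes(), wav.getframerate()
print(list(pcm), frame_count, frame_rate)
```

The out-of-range input 1.5 is clipped to full scale rather than wrapping around, which is the reason for clipping before the integer cast.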

src/voice_dialogue/cli/args.py CHANGED

@@ -74,20 +74,6 @@ def create_argument_parser():
         default=False,
         help='Disable echo cancellation (default: not disabled)'
     )
-    cli_group.add_argument(
-        '--input-device', '-i',
-        type=int,
-        default=None,
-        metavar='INDEX',
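-        help='Input device index (e.g. an external microphone array). Multi-channel input is automatically downmixed to mono; '
-             'when set, echo cancellation relies on the device hardware. Use --list-audio-devices to see available indexes.'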
-    )
-    cli_group.add_argument(
-        '--list-audio-devices',
-        action='store_true',
-        default=False,
-        help='List the available audio input devices and their indexes, then exit'
-    )
     # API server mode arguments
     api_group = parser.add_argument_group('API server mode arguments')

         default=False,
         help='Disable echo cancellation (default: not disabled)'
     )
     # API server mode arguments
     api_group = parser.add_argument_group('API server mode arguments')
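
For reference, the two flags removed in this hunk can be reproduced in isolation as below. This is a minimal sketch: the flag names, types, and defaults are taken from the removed lines, while the program name and group title are placeholders chosen for the example.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Build a parser containing only the two removed audio-device flags."""
    parser = argparse.ArgumentParser(prog="voice-dialogue-sketch")  # placeholder name
    cli_group = parser.add_argument_group("CLI mode arguments")  # placeholder title
    cli_group.add_argument(
        "--input-device", "-i",
        type=int,
        default=None,
        metavar="INDEX",
        help="Input device index; omit to use the system default device",
    )
    cli_group.add_argument(
        "--list-audio-devices",
        action="store_true",
        default=False,
        help="List the available audio input devices and their indexes, then exit",
    )
    return parser


args = build_parser().parse_args(["-i", "2"])
defaults = build_parser().parse_args([])
print(args.input_device, defaults.input_device, defaults.list_audio_devices)
```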

src/voice_dialogue/config/audio_config.py DELETED

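@@ -1,77 +0,0 @@
-"""Audio device configuration management module.
-Persists the user's chosen input device (such as an external microphone array) so it is reused automatically after a restart.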
-"""
-import json
-from typing import Optional, TypedDict
-from voice_dialogue.utils.logger import logger
-from .paths import AUDIO_SETTINGS_PATH
-class AudioSettings(TypedDict, total=False):
-    """Audio settings."""
-    input_device_index: Optional[int]
-    output_device_index: Optional[int]
-_audio_settings_cache: Optional[AudioSettings] = None
-def get_audio_settings() -> AudioSettings:
-    """Load the user's audio settings (with an in-memory cache)."""
-    global _audio_settings_cache
-    if _audio_settings_cache is not None:
-        return _audio_settings_cache
-    if not AUDIO_SETTINGS_PATH.exists():
-        _audio_settings_cache = {}
-        return _audio_settings_cache
-    try:
-        with open(AUDIO_SETTINGS_PATH, "r", encoding="utf-8") as f:
-            _audio_settings_cache = json.load(f)
-    except (json.JSONDecodeError, IOError) as e:
-        logger.error(f"Could not load audio settings; using an empty configuration: {e}")
-        _audio_settings_cache = {}
-    return _audio_settings_cache
-def get_input_device_index() -> Optional[int]:
-    """Return the saved input device index, or None (system default device) when unset."""
-    value = get_audio_settings().get("input_device_index")
-    return int(value) if value is not None else None
-def _save_audio_setting(key: str, value: Optional[int]) -> bool:
-    """Save a single audio setting and refresh the cache."""
-    global _audio_settings_cache
-    settings = dict(get_audio_settings())
-    settings[key] = value
-    try:
-        if not AUDIO_SETTINGS_PATH.parent.exists():
-            AUDIO_SETTINGS_PATH.parent.mkdir(parents=True, exist_ok=True)
-        with open(AUDIO_SETTINGS_PATH, "w", encoding="utf-8") as f:
-            json.dump(settings, f, ensure_ascii=False, indent=4)
-        _audio_settings_cache = settings  # type: ignore[assignment]
-        logger.info(f"Audio setting saved: {key}={value}")
-        return True
-    except IOError as e:
-        logger.error(f"Could not save audio settings: {e}")
-        return False
-def save_input_device_index(input_device_index: Optional[int]) -> bool:
-    """Save the user's chosen input device index."""
-    return _save_audio_setting("input_device_index", input_device_index)
-def get_output_device_index() -> Optional[int]:
-    """Return the saved output device index, or None (system default device) when unset."""
-    value = get_audio_settings().get("output_device_index")
-    return int(value) if value is not None else None
-def save_output_device_index(output_device_index: Optional[int]) -> bool:
-    """Save the user's chosen output device index."""
-    return _save_audio_setting("output_device_index", output_device_index)
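
The deleted module cached settings in a module-level global and rewrote the JSON file on every save. The same load, cache, and save cycle can be sketched with the state held on an object, which makes it easy to point at a temporary path. `AudioSettingsStore` is a hypothetical class introduced for this example, not part of the project.

```python
import json
import tempfile
from pathlib import Path
from typing import Optional


class AudioSettingsStore:
    """JSON-backed settings with an in-memory cache (illustrative only)."""

    def __init__(self, path: Path):
        self._path = path
        self._cache: Optional[dict] = None

    def load(self) -> dict:
        if self._cache is None:
            try:
                self._cache = json.loads(self._path.read_text(encoding="utf-8"))
            except (FileNotFoundError, json.JSONDecodeError):
                self._cache = {}  # a missing or corrupt file means no settings
        return self._cache

    def save(self, key: str, value: Optional[int]) -> None:
        settings = dict(self.load())
        settings[key] = value
        self._path.parent.mkdir(parents=True, exist_ok=True)
        self._path.write_text(
            json.dumps(settings, ensure_ascii=False, indent=4), encoding="utf-8"
        )
        self._cache = settings  # refresh the cache only after a successful write


with tempfile.TemporaryDirectory() as tmp:
    path = Path(tmp) / "nested" / "audio_settings.json"
    store = AudioSettingsStore(path)
    before = store.load().get("input_device_index")
    store.save("input_device_index", 2)
    reloaded = AudioSettingsStore(path).load()
print(before, reloaded)
```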

src/voice_dialogue/config/paths.py CHANGED

@@ -46,7 +46,6 @@ APP_DATA_PATH = get_app_data_path()
 if not APP_DATA_PATH.exists():
     APP_DATA_PATH.mkdir(parents=True, exist_ok=True)
 USER_PROMPTS_PATH = APP_DATA_PATH / "user_prompts.json"
-AUDIO_SETTINGS_PATH = APP_DATA_PATH / "audio_settings.json"
 def load_third_party():

 if not APP_DATA_PATH.exists():
     APP_DATA_PATH.mkdir(parents=True, exist_ok=True)
 USER_PROMPTS_PATH = APP_DATA_PATH / "user_prompts.json"
 def load_third_party():

src/voice_dialogue/core/launcher.py CHANGED

@@ -6,7 +6,7 @@
 import time
-from voice_dialogue.audio.capture import AudioCapture, resolves_to_native_aec
 from voice_dialogue.config.speaker_config import get_tts_config_by_speaker_name, get_available_speaker_names
 from voice_dialogue.core.constants import (
     audio_frames_queue,
@@ -23,7 +23,6 @@ def launch_system(
         user_language: str,
         speaker: str,
         disable_echo_cancellation: bool = False,
-        input_device_index: int = None,
 ) -> None:
     """
     Launch the complete voice dialogue system
@@ -101,10 +100,7 @@ def launch_system(
     threads.append(audio_player)
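     # Speech state monitoring
-    # Disable software VAD only when using macOS native AEC (which has built-in VAD);
-    # when an external device is selected and capture goes through PyAudio, software VAD must be enabled.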
-    enable_echo_cancellation = not disable_echo_cancellation
-    enable_vad = not resolves_to_native_aec(enable_echo_cancellation, input_device_index)
     speech_monitor = SpeechStateMonitor(
         audio_frame_queue=audio_frames_queue,
         user_voice_queue=user_voice_queue,
@@ -115,10 +111,10 @@ def launch_system(
     threads.append(speech_monitor)
     # Audio capture
     audio_capture = AudioCapture(
         audio_frames_queue=audio_frames_queue,
-        enable_echo_cancellation=enable_echo_cancellation,
-        input_device_index=input_device_index,
     )
     audio_capture.daemon = True
     audio_capture.start()

 import time
+from voice_dialogue.audio.capture import AudioCapture
 from voice_dialogue.config.speaker_config import get_tts_config_by_speaker_name, get_available_speaker_names
 from voice_dialogue.core.constants import (
     audio_frames_queue,
         user_language: str,
         speaker: str,
         disable_echo_cancellation: bool = False,
 ) -> None:
     """
     Launch the complete voice dialogue system
     threads.append(audio_player)
     # Speech state monitoring
+    enable_vad = disable_echo_cancellation
     speech_monitor = SpeechStateMonitor(
         audio_frame_queue=audio_frames_queue,
         user_voice_queue=user_voice_queue,
     threads.append(speech_monitor)
     # Audio capture
+    enable_echo_cancellation = not disable_echo_cancellation
     audio_capture = AudioCapture(
         audio_frames_queue=audio_frames_queue,
+        enable_echo_cancellation=enable_echo_cancellation
     )
     audio_capture.daemon = True
     audio_capture.start()
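
The change to `enable_vad` can be compared side by side. After this commit software VAD is enabled exactly when echo cancellation is disabled. Before it, the decision went through `resolves_to_native_aec`, whose body is not shown in this diff. The sketch below therefore assumes, based only on the removed comment, that native AEC applies when echo cancellation is on and no explicit input device is selected, and it ignores any platform check the real function may perform.

```python
from typing import Optional


def enable_vad_after(disable_echo_cancellation: bool) -> bool:
    """New behaviour: software VAD is on only when AEC is disabled."""
    return disable_echo_cancellation


def enable_vad_before(disable_echo_cancellation: bool,
                      input_device_index: Optional[int]) -> bool:
    """Assumed old behaviour; the real resolves_to_native_aec is not shown."""
    uses_native_aec = (not disable_echo_cancellation) and input_device_index is None
    return not uses_native_aec


for disabled in (False, True):
    for device in (None, 3):
        print(disabled, device,
              enable_vad_before(disabled, device), enable_vad_after(disabled))
```

Under this assumption the two versions differ only when echo cancellation is enabled and a device index is given, which is a case the new signature can no longer express.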

src/voice_dialogue/services/asr_service.py CHANGED

@@ -42,7 +42,7 @@ class ASRService(BaseThread, PerformanceLogMixin):
             voice_task.whisper_start_time = time.time()
             user_voice: np.array = voice_task.user_voice
-            transcribed_text = self.client.transcribe(user_voice, language=self.language)
             if not transcribed_text.strip():
                 voice_state_manager.reset_task_id()
                 continue

             voice_task.whisper_start_time = time.time()
             user_voice: np.array = voice_task.user_voice
+            transcribed_text = self.client.transcribe(user_voice)
             if not transcribed_text.strip():
                 voice_state_manager.reset_task_id()
                 continue

src/voice_dialogue/services/audio_player_service.py CHANGED

@@ -4,7 +4,6 @@ from queue import Empty
 from typing import Optional
 from voice_dialogue.audio.player import play_audio
-from voice_dialogue.config.audio_config import get_output_device_index
 from voice_dialogue.core.base import BaseThread
 from voice_dialogue.core.constants import voice_state_manager, silence_over_threshold_event
 from voice_dialogue.models.voice_task import VoiceTask, AnswerDisplayMessage
@@ -65,8 +64,7 @@ class AudioPlayerService(BaseThread, TaskStatusMixin, HistoryMixin, PerformanceL
             if not self.is_stopped:
                 audio_data, sample_rate = voice_task.tts_generated_sentence_audio
-                # Read the saved output device on every playback so a settings change takes effect from the next sentence
-                play_audio(audio_data, sample_rate, output_device_index=get_output_device_index())
             # Task finished; break out of the inner loop
             break

 from typing import Optional
 from voice_dialogue.audio.player import play_audio
 from voice_dialogue.core.base import BaseThread
 from voice_dialogue.core.constants import voice_state_manager, silence_over_threshold_event
 from voice_dialogue.models.voice_task import VoiceTask, AnswerDisplayMessage
             if not self.is_stopped:
                 audio_data, sample_rate = voice_task.tts_generated_sentence_audio
+                play_audio(audio_data, sample_rate)
             # Task finished; break out of the inner loop
             break

src/voice_dialogue/tts/runtime/moyoyo.py CHANGED

@@ -34,9 +34,6 @@ class MoYoYoTTS(TTSInterface):
     def setup(self, **kwargs) -> None:
         """Set up the TTS module"""
-        from voice_dialogue.tts.weights_migration import ensure_safetensors_weights
-        ensure_safetensors_weights()
         tts_config = TTS_Config(self.config.get_runtime_config())
         self.tts_module = TTSModule(tts_config)
         self.tts_module.setup_inference_params(

     def setup(self, **kwargs) -> None:
         """Set up the TTS module"""
         tts_config = TTS_Config(self.config.get_runtime_config())
         self.tts_module = TTSModule(tts_config)
         self.tts_module.setup_inference_params(

src/voice_dialogue/tts/weights_migration.py DELETED

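@@ -1,45 +0,0 @@
-"""Safetensors migration for TTS pretrained weights.
-The security policy in transformers >= 4.56 (CVE-2025-32434) refuses to load
-pytorch_model.bin on torch < 2.6. transformers prefers model.safetensors when loading, so converting
-the .bin once on first launch is enough, with no need to upgrade torch.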
-"""
-from pathlib import Path
-from voice_dialogue.config import paths
-from voice_dialogue.utils.logger import logger
-PRETRAINED_DIRS = [
-    "chinese-roberta-wwm-ext-large",
-    "chinese-hubert-base",
-]
-def ensure_safetensors_weights() -> None:
-    """Ensure safetensors versions of the MoYoYo TTS pretrained weights exist, converting from .bin when missing."""
-    moyoyo_path = Path(paths.TTS_MODELS_PATH) / "moyoyo"
-    for dirname in PRETRAINED_DIRS:
-        model_dir = moyoyo_path / dirname
-        bin_path = model_dir / "pytorch_model.bin"
-        st_path = model_dir / "model.safetensors"
-        if st_path.exists() or not bin_path.exists():
-            continue
-        logger.info(f"[INFO] First launch: converting {dirname} weights to safetensors...")
-        try:
-            import torch
-            from safetensors.torch import save_file
-            state_dict = torch.load(bin_path, map_location="cpu", weights_only=True)
-            # clone breaks shared memory; safetensors does not allow tensors to share storage
-            state_dict = {
-                key: value.clone().contiguous()
-                for key, value in state_dict.items()
-                if hasattr(value, "clone")
-            }
-            save_file(state_dict, st_path, metadata={"format": "pt"})
-            logger.info(f"[INFO] {dirname} conversion complete: {st_path.stat().st_size // 1024 ** 2} MB")
-        except Exception as e:
-            logger.error(f"[ERROR] Failed to convert {dirname} weights: {e}")
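
The deleted migration skipped a directory when `model.safetensors` already existed or when there was no `pytorch_model.bin` to convert. That decision can be checked without `torch` installed. The sketch below covers only the "which directories still need converting" step. `dirs_needing_conversion` is a hypothetical helper, and the temporary directory holds empty placeholder files rather than real weights.

```python
import tempfile
from pathlib import Path
from typing import Iterable, List


def dirs_needing_conversion(root: Path, dirnames: Iterable[str]) -> List[str]:
    """Return directories that have a .bin checkpoint but no safetensors file."""
    pending = []
    for dirname in dirnames:
        model_dir = root / dirname
        has_bin = (model_dir / "pytorch_model.bin").exists()
        has_safetensors = (model_dir / "model.safetensors").exists()
        if has_bin and not has_safetensors:
            pending.append(dirname)
    return pending


layout = {
    "chinese-roberta-wwm-ext-large": ["pytorch_model.bin"],
    "chinese-hubert-base": ["pytorch_model.bin", "model.safetensors"],
}
with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    for name, files in layout.items():
        (root / name).mkdir()
        for filename in files:
            (root / name / filename).touch()  # empty placeholder, not real weights
    pending = dirs_needing_conversion(root, list(layout) + ["missing-dir"])
print(pending)
```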

uv.lock CHANGED

The diff for this file is too large to render. See raw diff