feat: Implement MCP integration for tool discovery and execution

#1
by heyong4725 - opened
Files changed (43)
  1. README.md +20 -52
  2. assets/www/assets/{index-CGlMbARk.js → index-ByqsFGbw.js} +2 -2
  3. assets/www/assets/{index-BPAUWo8W.css → index-CCuJ1lip.css} +1 -1
  4. assets/www/index.html +2 -2
  5. build/pyinstaller/hooks/hook-voice_dialogue.py +1 -24
  6. electron-app/main.js +2 -2
  7. frontend/src/App.vue +2 -13
  8. frontend/src/assets/ball.json +2 -2
  9. frontend/src/config/client_config.ts +1 -1
  10. frontend/src/i18n/index.ts +0 -35
  11. frontend/src/i18n/locales/en.ts +0 -74
  12. frontend/src/i18n/locales/zh.ts +0 -74
  13. frontend/src/main.ts +0 -2
  14. frontend/src/stores/config.ts +0 -3
  15. frontend/src/style.scss +0 -65
  16. frontend/src/views/Home/Components/ChatText.vue +7 -15
  17. frontend/src/views/Home/index.vue +1 -12
  18. frontend/src/views/Welcome/Components/SettingsModal.vue +0 -581
  19. frontend/src/views/Welcome/index.vue +418 -72
  20. main.py +1 -16
  21. pyproject.toml +4 -5
  22. scripts/convert_tts_weights_to_safetensors.py +0 -47
  23. src/voice_dialogue/api/app.py +1 -2
  24. src/voice_dialogue/api/core/lifespan.py +2 -2
  25. src/voice_dialogue/api/core/service_factories.py +4 -11
  26. src/voice_dialogue/api/routes/system_routes.py +5 -107
  27. src/voice_dialogue/api/schemas/system_schemas.py +2 -43
  28. src/voice_dialogue/asr/manager.py +5 -24
  29. src/voice_dialogue/asr/models/__init__.py +0 -9
  30. src/voice_dialogue/asr/models/qwen.py +0 -76
  31. src/voice_dialogue/audio/capture/__init__.py +7 -53
  32. src/voice_dialogue/audio/capture/pyaudio_capture.py +10 -104
  33. src/voice_dialogue/audio/devices.py +0 -167
  34. src/voice_dialogue/audio/player.py +1 -69
  35. src/voice_dialogue/cli/args.py +0 -14
  36. src/voice_dialogue/config/audio_config.py +0 -77
  37. src/voice_dialogue/config/paths.py +0 -1
  38. src/voice_dialogue/core/launcher.py +4 -8
  39. src/voice_dialogue/services/asr_service.py +1 -1
  40. src/voice_dialogue/services/audio_player_service.py +1 -3
  41. src/voice_dialogue/tts/runtime/moyoyo.py +0 -3
  42. src/voice_dialogue/tts/weights_migration.py +0 -45
  43. uv.lock +0 -0
README.md CHANGED
@@ -26,7 +26,7 @@ library_name: transformers
26
  ![Python](https://img.shields.io/badge/Python-3.9+-blue.svg)
27
  ![License](https://img.shields.io/badge/License-MIT-green.svg)
28
  ![Platform](https://img.shields.io/badge/Platform-macOS-lightgrey.svg)
29
- ![Version](https://img.shields.io/badge/Version-1.2.0-orange.svg)
30
 
31
  A real-time voice dialogue system that integrates speech recognition (ASR), a large language model (LLM), and text-to-speech (TTS)
32
 
@@ -38,9 +38,8 @@ library_name: transformers
38
 
39
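  VoiceDialogue is a complete Python-based voice dialogue system that delivers an end-to-end voice interaction experience. It uses a modular design and offers real-time, high-accuracy, multi-role operation.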
40
 
41
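- - 🖥️ **Graphical interface**: Built-in web GUI that runs in any browser (pick a voice, switch languages, view live subtitles)
42
- - 🎤 **Real-time speech recognition**: High-accuracy Chinese and English transcription based on Qwen3-ASR (built-in punctuation, supports 52 languages)
43
- - 🤖 **Intelligent dialogue generation**: Integrates large language models such as Qwen3
44
  - 🔊 **High-quality speech synthesis**: Supports multi-role, multi-style voice output
45
  - 🌐 **Web API service**: Provides an HTTP interface for easy integration
46
  - ⚡ **Low-latency processing**: Optimized audio stream processing pipeline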
@@ -49,78 +48,47 @@ VoiceDialogue 是一个基于 Python 的完整语音对话系统,实现了端
49
 
50
  ## 🚀 Quick Start
51
 
52
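- > **The simplest way**: clone the repository → install dependencies → launch → open the graphical interface in a browser and start talking.
53
- > Currently only **macOS (Apple Silicon)** is supported.
54
-
55
- ### 1. Clone and install
56
-
57
- > **The models come in two parts**:
58
- > - **Downloaded with the repository (about 12GB, Git LFS)**: the large language model, speech synthesis, reference voices, and so on.
59
- > - **Downloaded automatically on first launch (about 4.4GB)**: the speech recognition engine **Qwen3-ASR**, which the program pulls from HuggingFace on its first run and caches in `~/.cache/huggingface`; it does not need to be downloaded again afterwards.
60
- >
61
- > ⚠️ **You must install [Git LFS](https://git-lfs.com) first**; otherwise the cloned models are only placeholder pointers a few hundred bytes in size, and the application cannot start.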
62
 
63
  ```bash
64
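- # 1) Install and initialize Git LFS (only needed once)
65
- brew install git-lfs # If Homebrew is not installed, see https://git-lfs.com
66
- git lfs install
67
-
68
- # 2) Clone the project (includes about 12GB of models, so the download is large; please be patient)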
69
  git clone https://huggingface.co/MoYoYoTech/VoiceDialogue
70
  cd VoiceDialogue
71
 
72
- # 3) Verify that the models were actually fetched (the size should be in GB, not 100+ bytes)
73
- # If the size is tiny, Git LFS did not take effect; run: git lfs pull
74
- ls -lh assets/models/llm/qwen/Qwen3-8B-Q6_K.gguf
75
-
76
- # 4) Install dependencies (uv recommended)
77
  pip install uv
78
  uv venv
79
  source .venv/bin/activate
80
 
81
  WHISPER_COREML=1 CMAKE_ARGS="-DGGML_METAL=on" uv sync
82
 
83
- # 5) Install extra dependencies
84
- uv pip install kokoro-onnx # kokoro-onnx (English TTS)
85
- uv pip install numpy==1.26.4 # Pin the numpy version
 
 
86
  ```
87
 
88
  > 📖 Need more detailed steps? See the [installation guide](docs/installation.md), which covers system requirements and common issues.
89
 
90
- ### 2. Launch the graphical interface (recommended)
91
-
92
- ```bash
93
- python main.py --mode api
94
- ```
95
-
96
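- After launch, open this address in your browser: **http://localhost:8000/app/**
97
 
98
- Everything can be done from the interface:
99
-
100
- - Click **⚙️ Settings** in the bottom-right corner to choose the **microphone, echo cancellation, recognition language, and voice**; you can also switch the **interface language between Chinese and English**;
101
- - Click **"Start conversation"** to talk with the AI by voice in real time; **subtitles are shown live**.
102
-
103
- > **A slow first launch is normal**: the program automatically downloads the Qwen3-ASR model (about 4.4GB, requires network access; download progress is printed in the terminal) and converts the TTS weight format once. It is ready only after both steps finish, which takes a few minutes (depending on network speed); later launches take only tens of seconds.
104
- > If the terminal stays on the download step for a long time, check whether your network can reach `huggingface.co`.
105
-
106
- ### 3. Command-line mode (CLI)
107
-
108
- If you do not need the graphical interface, you can run the voice dialogue directly in the terminal: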
109
 
110
  ```bash
111
- # Start a voice dialogue (Chinese by default)
112
  python main.py
113
 
114
- # Specify the language and voice
115
  python main.py --language en --speaker Heart
 
116
 
117
- # List available audio input devices (such as an external microphone array)
118
- python main.py --list-audio-devices
119
 
120
- # Specify the input device
121
- python main.py --input-device <device index>
 
122
  ```
123
-
124
  > For detailed usage, see the [configuration guide](docs/configuration.md) and the [API service guide](docs/api-guide.md).
125
 
126
  ## 📚 Documentation
 
26
  ![Python](https://img.shields.io/badge/Python-3.9+-blue.svg)
27
  ![License](https://img.shields.io/badge/License-MIT-green.svg)
28
  ![Platform](https://img.shields.io/badge/Platform-macOS-lightgrey.svg)
29
+ ![Version](https://img.shields.io/badge/Version-1.0.0-orange.svg)
30
 
31
  A real-time voice dialogue system that integrates speech recognition (ASR), a large language model (LLM), and text-to-speech (TTS)
32
 
 
38
 
39
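  VoiceDialogue is a complete Python-based voice dialogue system that delivers an end-to-end voice interaction experience. It uses a modular design and offers real-time, high-accuracy, multi-role operation.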
40
 
41
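+ - 🎤 **Real-time speech recognition**: High-accuracy Chinese and English speech transcription
42
+ - 🤖 **Intelligent dialogue generation**: Integrates large language models such as Qwen2.5
 
43
  - 🔊 **High-quality speech synthesis**: Supports multi-role, multi-style voice output
44
  - 🌐 **Web API service**: Provides an HTTP interface for easy integration
45
  - ⚡ **Low-latency processing**: Optimized audio stream processing pipeline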
 
48
 
49
  ## 🚀 Quick Start
50
 
51
+ ### 1. Installation
 
 
 
 
 
 
 
 
 
52
 
53
  ```bash
54
+ # Clone the project
 
 
 
 
55
  git clone https://huggingface.co/MoYoYoTech/VoiceDialogue
56
  cd VoiceDialogue
57
 
58
+ # Install dependencies (uv recommended)
 
 
 
 
59
  pip install uv
60
  uv venv
61
  source .venv/bin/activate
62
 
63
  WHISPER_COREML=1 CMAKE_ARGS="-DGGML_METAL=on" uv sync
64
 
65
+ # Install extra dependencies
66
+ ## 1. Install kokoro-onnx
67
+ uv pip install kokoro-onnx
68
+ ## 2. Reinstall the pinned numpy version
69
+ uv pip install numpy==1.26.4
70
  ```
71
 
72
  > 📖 Need more detailed steps? See the [installation guide](docs/installation.md), which covers system requirements and common issues.
73
 
74
+ ### 2. Run
 
 
 
 
 
 
75
 
76
+ #### Command-line mode (CLI)
 
 
 
 
 
 
 
 
 
 
77
 
78
  ```bash
79
+ # Start a voice dialogue (Chinese by default)
80
  python main.py
81
 
82
+ # Start with a specific language and speaker
83
  python main.py --language en --speaker Heart
84
+ ```
85
 
86
+ #### API service mode
 
87
 
88
+ ```bash
89
+ # Start the API server
90
+ python main.py --mode api
91
  ```
 
92
  > For detailed usage, see the [configuration guide](docs/configuration.md) and the [API service guide](docs/api-guide.md).
93
 
94
  ## 📚 Documentation
assets/www/assets/{index-CGlMbARk.js → index-ByqsFGbw.js} RENAMED
@@ -1,3 +1,3 @@
1
  version https://git-lfs.github.com/spec/v1
2
- oid sha256:2cccc544e9f2c32632c81cf9a5e8ed3cc9c5b0476ec8e595a180deb6fde095c8
3
- size 2307855
 
1
  version https://git-lfs.github.com/spec/v1
2
+ oid sha256:215e0b4a6eee243715941860012a0d3bbee778f8880df45b0ddc8b090993405b
3
+ size 2215701
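Reviewer note: the renamed bundle above is stored as a Git LFS pointer (a spec line, an `oid`, and a `size`) rather than as the JavaScript itself. The following is a minimal sketch, not code from this repository, of how a script might tell such a pointer apart from real file content. The helper name `parse_lfs_pointer` is hypothetical, and the sketch assumes only the three-line layout visible in this diff.

```python
from typing import Optional

# First line of every pointer file shown in this diff.
LFS_SPEC_LINE = "version https://git-lfs.github.com/spec/v1"


def parse_lfs_pointer(text: str) -> Optional[dict]:
    """Return {'oid': ..., 'size': ...} if text looks like a Git LFS pointer, else None.

    Hypothetical helper for illustration; it only understands the
    version/oid/size layout seen in the diff above.
    """
    lines = text.strip().splitlines()
    if not lines or lines[0].strip() != LFS_SPEC_LINE:
        return None
    fields = {}
    for line in lines[1:]:
        # Each remaining line is "<key> <value>".
        key, _, value = line.strip().partition(" ")
        if value:
            fields[key] = value
    size = fields.get("size", "")
    if "oid" not in fields or not size.isdigit():
        return None
    return {"oid": fields["oid"], "size": int(size)}


# The new-side pointer from this diff, without the leading "+ " markers.
pointer = (
    "version https://git-lfs.github.com/spec/v1\n"
    "oid sha256:215e0b4a6eee243715941860012a0d3bbee778f8880df45b0ddc8b090993405b\n"
    "size 2215701\n"
)
print(parse_lfs_pointer(pointer))
```

If a large asset in a checkout parses as a pointer this way, its LFS object has not been fetched yet; `git lfs pull` retrieves it.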
assets/www/assets/{index-BPAUWo8W.css → index-CCuJ1lip.css} RENAMED
@@ -1 +1 @@
1
- @charset "UTF-8";html,body{width:100%;height:100%}input::-ms-clear,input::-ms-reveal{display:none}*,*:before,*:after{box-sizing:border-box}html{font-family:sans-serif;line-height:1.15;-webkit-text-size-adjust:100%;-ms-text-size-adjust:100%;-ms-overflow-style:scrollbar;-webkit-tap-highlight-color:rgba(0,0,0,0)}body{margin:0}[tabindex="-1"]:focus{outline:none}hr{box-sizing:content-box;height:0;overflow:visible}h1,h2,h3,h4,h5,h6{margin-top:0;margin-bottom:.5em;font-weight:500}p{margin-top:0;margin-bottom:1em}abbr[title],abbr[data-original-title]{-webkit-text-decoration:underline dotted;text-decoration:underline;text-decoration:underline dotted;border-bottom:0;cursor:help}address{margin-bottom:1em;font-style:normal;line-height:inherit}input[type=text],input[type=password],input[type=number],textarea{-webkit-appearance:none}ol,ul,dl{margin-top:0;margin-bottom:1em}ol ol,ul ul,ol ul,ul ol{margin-bottom:0}dt{font-weight:500}dd{margin-bottom:.5em;margin-left:0}blockquote{margin:0 0 1em}dfn{font-style:italic}b,strong{font-weight:bolder}small{font-size:80%}sub,sup{position:relative;font-size:75%;line-height:0;vertical-align:baseline}sub{bottom:-.25em}sup{top:-.5em}pre,code,kbd,samp{font-size:1em;font-family:SFMono-Regular,Consolas,Liberation Mono,Menlo,Courier,monospace}pre{margin-top:0;margin-bottom:1em;overflow:auto}figure{margin:0 0 1em}img{vertical-align:middle;border-style:none}a,area,button,[role=button],input:not([type=range]),label,select,summary,textarea{touch-action:manipulation}table{border-collapse:collapse}caption{padding-top:.75em;padding-bottom:.3em;text-align:left;caption-side:bottom}input,button,select,optgroup,textarea{margin:0;color:inherit;font-size:inherit;font-family:inherit;line-height:inherit}button,input{overflow:visible}button,select{text-transform:none}button,html 
[type=button],[type=reset],[type=submit]{-webkit-appearance:button}button::-moz-focus-inner,[type=button]::-moz-focus-inner,[type=reset]::-moz-focus-inner,[type=submit]::-moz-focus-inner{padding:0;border-style:none}input[type=radio],input[type=checkbox]{box-sizing:border-box;padding:0}input[type=date],input[type=time],input[type=datetime-local],input[type=month]{-webkit-appearance:listbox}textarea{overflow:auto;resize:vertical}fieldset{min-width:0;margin:0;padding:0;border:0}legend{display:block;width:100%;max-width:100%;margin-bottom:.5em;padding:0;color:inherit;font-size:1.5em;line-height:inherit;white-space:normal}progress{vertical-align:baseline}[type=number]::-webkit-inner-spin-button,[type=number]::-webkit-outer-spin-button{height:auto}[type=search]{outline-offset:-2px;-webkit-appearance:none}[type=search]::-webkit-search-cancel-button,[type=search]::-webkit-search-decoration{-webkit-appearance:none}::-webkit-file-upload-button{font:inherit;-webkit-appearance:button}output{display:inline-block}summary{display:list-item}template{display:none}[hidden]{display:none!important}mark{padding:.2em;background-color:#feffe6}:root{font-family:Inter,system-ui,Avenir,Helvetica,Arial,sans-serif;line-height:1.5;font-weight:400;color-scheme:light dark;color:#ffffffde;background-color:#242424;font-synthesis:none;text-rendering:optimizeLegibility;-webkit-font-smoothing:antialiased;-moz-osx-font-smoothing:grayscale;-webkit-text-size-adjust:100%}a{font-weight:500;color:#646cff;text-decoration:inherit}a:hover{color:#535bf2}body{margin:0;display:flex;place-items:center;min-width:320px;height:100%;min-height:auto;color:#333;background:#fff}h1{font-size:3.2em;line-height:1.1}button{border-radius:8px;border:1px solid transparent;padding:.6em 1.2em;font-size:1em;font-weight:500;font-family:inherit;background-color:#1a1a1a;cursor:pointer;transition:border-color .25s}.card{border-bottom:solid 2px 
lightgray;align-items:center;justify-content:center;margin-top:40px;display:flex;max-width:1024px;width:100%}.seg-title{margin:24px 0;font-size:20px;font-weight:500}.seg-co{width:1022px;text-align:left;border-left:solid 6px midnightblue;padding-left:8px;margin-left:2px;margin-top:36px;line-height:24px}#app{margin:0 auto;padding:0;text-align:center;width:100%;height:100%}.ant-btn{padding:4px 12px}@media (prefers-color-scheme: light){:root{color:#213547;background-color:#fff}a:hover{color:#747bff}button{background-color:#f9f9f9}}.ant-card{background:#f5f6fa;height:100%}.ant-card-body{padding:24px 36px 12px!important;border-radius:0 0 8px 8px}.ant-card .ant-card-actions{background-color:#e8e8f8cc!important}.ant-popover{max-width:800px!important}.ant-form-item{background:transparent;margin-bottom:40px!important}.ant-form-item .ant-form-item-explain-error{color:#ff4d4f;text-align:left!important}.ant-form-item-label label{font-size:18px!important;color:#1a1a1a!important;font-weight:500!important}.ant-tooltip{max-width:1022px!important}.ant-page-header-heading{width:1022px!important}.highlight{background:#f8f8ff}.ant-layout-sider-collapsed{width:0!important;min-width:0!important;overflow:hidden}.ant-layout-sider-collapsed .ant-menu-item,.ant-layout-sider-collapsed .ant-menu-submenu-title{display:none}.ant-modal .ant-modal-content{background:#ffffff9e!important;backdrop-filter:blur(28px) saturate(140%);-webkit-backdrop-filter:blur(28px) saturate(140%);border:1px solid rgba(255,255,255,.6);border-radius:22px!important;box-shadow:0 16px 48px #1f26872e}.ant-modal .ant-modal-header{background:transparent!important}.ant-modal-mask{background:#14161e1f!important;backdrop-filter:blur(14px) saturate(120%);-webkit-backdrop-filter:blur(14px) saturate(120%)}.ant-select .ant-select-selector,.ant-input,textarea.ant-input,.ant-input-affix-wrapper{background:#ffffff73!important;backdrop-filter:blur(8px);-webkit-backdrop-filter:blur(8px);border:1px solid 
rgba(255,255,255,.7)!important}.ant-btn:not(.ant-btn-text):not(.ant-btn-link){box-shadow:0 2px 10px #1f26871a}.ant-btn-default{background:#ffffff80!important;border:1px solid rgba(255,255,255,.75)!important;backdrop-filter:blur(8px);-webkit-backdrop-filter:blur(8px)}.ant-btn-text{box-shadow:none!important;background:transparent!important}.ant-radio-group-solid .ant-radio-button-wrapper:first-child{border-top-left-radius:12px;border-bottom-left-radius:12px}.ant-radio-group-solid .ant-radio-button-wrapper:last-child{border-top-right-radius:12px;border-bottom-right-radius:12px}.header-nav[data-v-07594418]{display:flex;align-items:flex-start;justify-content:space-between;width:100vw;height:40px;align-items:center;position:absolute;top:0;left:0;z-index:99;-webkit-app-region:drag;cursor:move}.header-nav .window-controls[data-v-07594418],.header-nav button[data-v-07594418],.header-nav .ant-input-search[data-v-07594418],.header-nav img[data-v-07594418],.header-nav .anticon[data-v-07594418]{-webkit-app-region:no-drag;cursor:pointer}.header-nav .window-controls[data-v-07594418]{top:0;right:0;display:flex;z-index:1000;margin-left:12px}.header-nav .window-controls .window-control-btn[data-v-07594418]{width:46px;height:32px;border:none;background:transparent;color:#666;font-size:16px;cursor:pointer;display:flex;align-items:center;justify-content:center;transition:background-color .2s}.header-nav .window-controls .window-control-btn[data-v-07594418]:hover{background-color:#0000001a}.header-nav .window-controls .window-control-btn.close[data-v-07594418]:hover{background-color:#e81123;color:#fff}.header-nav .window-controls .close-icon.focus[data-v-07594418]{display:none}.header-nav .window-controls:hover .close-icon.default[data-v-07594418],.header-nav .window-controls:focus-within .close-icon.default[data-v-07594418]{display:none}.header-nav .window-controls:hover .close-icon.focus[data-v-07594418],.header-nav .window-controls:focus-within 
.close-icon.focus[data-v-07594418]{display:inline}.content[data-v-b8a456cb]{background-color:#fff;margin:0 auto;display:flex;flex-direction:column;align-items:center;justify-content:space-between}.not-found-wrapper[data-v-aef52a59]{height:calc(100vh - 104px)}.tab-body[data-v-a48e843b]{height:360px;overflow-y:auto;padding:4px 8px 4px 2px}.setting-row[data-v-a48e843b]{margin-bottom:20px}.setting-row>label[data-v-a48e843b]{display:block;font-size:15px;font-weight:500;margin-bottom:8px}.setting-row>label .label-icon[data-v-a48e843b]{margin-right:6px;color:#1890ff}.setting-row .hint[data-v-a48e843b]{font-size:12px;color:#999;margin:8px 0 0}.setting-row .row-inline[data-v-a48e843b]{display:flex;align-items:center;justify-content:space-between}.voice-group[data-v-a48e843b]{display:flex;flex-direction:column;margin-top:8px}.about .about-head[data-v-a48e843b]{text-align:center;margin-bottom:24px}.about .about-head .about-name[data-v-a48e843b]{font-size:20px;font-weight:600}.about .about-head .about-ver[data-v-a48e843b]{font-size:13px;color:#888;margin-top:2px}.about .about-head .about-tagline[data-v-a48e843b]{font-size:12px;color:#999;margin-top:4px}.about .about-section[data-v-a48e843b]{margin-bottom:20px}.about .about-section .about-section-title[data-v-a48e843b]{font-size:13px;font-weight:600;color:#666;margin-bottom:10px}.about .about-item[data-v-a48e843b]{margin-bottom:12px}.about .about-item .about-item-label[data-v-a48e843b]{font-size:14px;font-weight:500}.about .about-item .about-item-desc[data-v-a48e843b]{font-size:12px;color:#777;margin-top:2px;line-height:1.6}.about .about-item .about-item-desc a[data-v-a48e843b]{margin-left:6px}.about a[data-v-a48e843b]{color:#1677ff;text-decoration:none}.about a[data-v-a48e843b]:hover{text-decoration:underline}.about .about-link[data-v-a48e843b]{font-size:13px;word-break:break-all}.about 
.about-copyright[data-v-a48e843b]{margin-top:16px;font-size:11px;color:#aaa;text-align:center}.voice-radio[data-v-a48e843b]{display:flex;align-items:center;height:40px;line-height:40px}.voice-radio .voice-name[data-v-a48e843b]{margin-right:8px}.audio-play-btn[data-v-a48e843b]{padding:0 6px;border-radius:4px}.audio-play-btn.playing[data-v-a48e843b]{background-color:#f6ffed}.asr-chip[data-v-ca2e1f17]{display:flex;align-items:center;gap:8px;height:38px;padding:0 18px;margin-right:16px;border-radius:19px;color:#000000a6;font-size:13px;background:#ffffff80;border:1px solid rgba(255,255,255,.7);backdrop-filter:blur(10px);-webkit-backdrop-filter:blur(10px);box-shadow:0 4px 16px #1f26871f}.settings-btn[data-v-ca2e1f17]{width:60px;height:60px;margin-right:24px;border-radius:50%!important;background:#ffffff80!important;border:1px solid rgba(255,255,255,.7)!important;backdrop-filter:blur(10px);-webkit-backdrop-filter:blur(10px);box-shadow:0 4px 16px #1f26871f;display:flex;align-items:center;justify-content:center}.welcome-wrapper[data-v-ca2e1f17]{width:100%;height:100%;background-image:url(./bg-BmnA8p_e.png);background-repeat:no-repeat;background-attachment:fixed;background-size:cover;background-position:center;display:flex;flex-direction:column;align-items:center;justify-content:space-between;color:#fff}.welcome-wrapper .content[data-v-ca2e1f17]{width:100%;height:80vh;display:flex;flex-direction:column;justify-content:space-around;margin-top:64px}.welcome-wrapper .content .inner-content[data-v-ca2e1f17]{display:flex;flex-direction:column;align-items:center;justify-content:center;text-align:center;padding:20px}.welcome-wrapper .content .inner-content .text-box[data-v-ca2e1f17]{color:#000;margin-bottom:36px}.welcome-wrapper .content .inner-content .text-box .title[data-v-ca2e1f17]{font-size:24px;font-weight:600;margin-bottom:24px}.welcome-wrapper .content .inner-content .text-box .sub-title[data-v-ca2e1f17]{font-size:15px;margin-top:10px}.welcome-wrapper .content 
.inner-content .btn-box[data-v-ca2e1f17]{width:224px;height:80px}.welcome-wrapper .actions[data-v-ca2e1f17]{width:100%;height:100px;margin-bottom:32px;display:flex;align-items:center;justify-content:flex-end}.ball-wrapper[data-v-34c8e583]{width:100%;height:calc(100vh - 100px);display:flex;flex-direction:column;align-items:center;justify-content:space-around}.talk-wrapper[data-v-05da84ae]{width:auto;width:100%;max-width:1000px;margin:0 auto;box-sizing:border-box;height:calc(100vh - 150px);overflow-y:auto;padding:20px 32px 0;display:flex;flex-direction:column;align-items:flex-start;justify-content:flex-start}.talk-wrapper .cont-left[data-v-05da84ae]{width:100%;margin:24px 0;display:flex;justify-content:flex-start;align-items:flex-start}.talk-wrapper .cont-left .text-left[data-v-05da84ae]{max-width:88%;color:#222;font-size:16px;font-weight:400;text-align:left;line-height:1.8;margin-left:12px;margin-top:6px;word-break:break-word}.talk-wrapper .cont-right[data-v-05da84ae]{width:100%;margin:24px 0;display:flex;justify-content:flex-end;align-items:flex-start}.talk-wrapper .cont-right .text-right[data-v-05da84ae]{max-width:80%;color:#444;font-size:16px;font-weight:400;text-align:start;line-height:1.8;margin-right:12px;background:#ccc;border-radius:8px 0 8px 8px;padding:8px 12px;word-break:break-word}.chat-wrapper[data-v-8b035bf4]{width:100%;height:100%;background-image:url(./bg-BmnA8p_e.png);background-repeat:no-repeat;background-attachment:fixed;background-size:cover;background-position:center;display:flex;flex-direction:column;align-items:center;justify-content:space-between;color:#fff}.chat-wrapper .content[data-v-8b035bf4]{width:100%;height:auto;display:flex;flex-direction:column;justify-content:space-around}.chat-wrapper .content .inner-content[data-v-8b035bf4]{display:flex;flex-direction:column;align-items:center;justify-content:center;text-align:center;padding:20px}.chat-wrapper .content .inner-content 
.text-box[data-v-8b035bf4]{color:#000;margin-bottom:36px}.chat-wrapper .content .inner-content .text-box .title[data-v-8b035bf4]{font-size:24px;font-weight:600;margin-bottom:24px}.chat-wrapper .content .inner-content .text-box .sub-title[data-v-8b035bf4]{font-size:15px;margin-top:10px}.chat-wrapper .content .inner-content .btn-box[data-v-8b035bf4]{width:224px;height:80px}.chat-wrapper .actions[data-v-8b035bf4]{width:100%;height:100px;margin-bottom:32px;display:flex;justify-content:space-between;align-items:center}.chat-wrapper .actions .holder[data-v-8b035bf4]{width:64px;height:48px}.chat-wrapper .actions .btns[data-v-8b035bf4]{width:450px;height:96px;display:flex;justify-content:space-around;align-items:center}.chat-wrapper .actions .btns[data-v-8b035bf4] .ant-btn{border-radius:50%!important;background:#ffffff80!important;border:1px solid rgba(255,255,255,.7)!important;backdrop-filter:blur(10px);-webkit-backdrop-filter:blur(10px);box-shadow:0 4px 16px #1f26871f}.chat-wrapper .actions .download-wrapper[data-v-8b035bf4]{width:64px;height:64px;display:flex;justify-content:flex-start;align-items:center;margin-right:0}.chat-wrapper .actions .download-wrapper img[data-v-8b035bf4]{width:24px;height:24px}.content-wrapper[data-v-d41c9ce7]{text-align:left;max-width:800px;min-width:320px;margin-bottom:64px;min-height:calc(100vh - 438px)}.content-wrapper .content-box[data-v-d41c9ce7]{padding:24px;height:240px;background-color:#e8e8e8;border-radius:16px;width:50%;margin:48px auto;min-width:300px}.content-wrapper .video-box[data-v-d41c9ce7]{max-width:800px;min-width:320px;width:90vw;height:auto}
 
1
+ @charset "UTF-8";html,body{width:100%;height:100%}input::-ms-clear,input::-ms-reveal{display:none}*,*:before,*:after{box-sizing:border-box}html{font-family:sans-serif;line-height:1.15;-webkit-text-size-adjust:100%;-ms-text-size-adjust:100%;-ms-overflow-style:scrollbar;-webkit-tap-highlight-color:rgba(0,0,0,0)}body{margin:0}[tabindex="-1"]:focus{outline:none}hr{box-sizing:content-box;height:0;overflow:visible}h1,h2,h3,h4,h5,h6{margin-top:0;margin-bottom:.5em;font-weight:500}p{margin-top:0;margin-bottom:1em}abbr[title],abbr[data-original-title]{-webkit-text-decoration:underline dotted;text-decoration:underline;text-decoration:underline dotted;border-bottom:0;cursor:help}address{margin-bottom:1em;font-style:normal;line-height:inherit}input[type=text],input[type=password],input[type=number],textarea{-webkit-appearance:none}ol,ul,dl{margin-top:0;margin-bottom:1em}ol ol,ul ul,ol ul,ul ol{margin-bottom:0}dt{font-weight:500}dd{margin-bottom:.5em;margin-left:0}blockquote{margin:0 0 1em}dfn{font-style:italic}b,strong{font-weight:bolder}small{font-size:80%}sub,sup{position:relative;font-size:75%;line-height:0;vertical-align:baseline}sub{bottom:-.25em}sup{top:-.5em}pre,code,kbd,samp{font-size:1em;font-family:SFMono-Regular,Consolas,Liberation Mono,Menlo,Courier,monospace}pre{margin-top:0;margin-bottom:1em;overflow:auto}figure{margin:0 0 1em}img{vertical-align:middle;border-style:none}a,area,button,[role=button],input:not([type=range]),label,select,summary,textarea{touch-action:manipulation}table{border-collapse:collapse}caption{padding-top:.75em;padding-bottom:.3em;text-align:left;caption-side:bottom}input,button,select,optgroup,textarea{margin:0;color:inherit;font-size:inherit;font-family:inherit;line-height:inherit}button,input{overflow:visible}button,select{text-transform:none}button,html 
[type=button],[type=reset],[type=submit]{-webkit-appearance:button}button::-moz-focus-inner,[type=button]::-moz-focus-inner,[type=reset]::-moz-focus-inner,[type=submit]::-moz-focus-inner{padding:0;border-style:none}input[type=radio],input[type=checkbox]{box-sizing:border-box;padding:0}input[type=date],input[type=time],input[type=datetime-local],input[type=month]{-webkit-appearance:listbox}textarea{overflow:auto;resize:vertical}fieldset{min-width:0;margin:0;padding:0;border:0}legend{display:block;width:100%;max-width:100%;margin-bottom:.5em;padding:0;color:inherit;font-size:1.5em;line-height:inherit;white-space:normal}progress{vertical-align:baseline}[type=number]::-webkit-inner-spin-button,[type=number]::-webkit-outer-spin-button{height:auto}[type=search]{outline-offset:-2px;-webkit-appearance:none}[type=search]::-webkit-search-cancel-button,[type=search]::-webkit-search-decoration{-webkit-appearance:none}::-webkit-file-upload-button{font:inherit;-webkit-appearance:button}output{display:inline-block}summary{display:list-item}template{display:none}[hidden]{display:none!important}mark{padding:.2em;background-color:#feffe6}:root{font-family:Inter,system-ui,Avenir,Helvetica,Arial,sans-serif;line-height:1.5;font-weight:400;color-scheme:light dark;color:#ffffffde;background-color:#242424;font-synthesis:none;text-rendering:optimizeLegibility;-webkit-font-smoothing:antialiased;-moz-osx-font-smoothing:grayscale;-webkit-text-size-adjust:100%}a{font-weight:500;color:#646cff;text-decoration:inherit}a:hover{color:#535bf2}body{margin:0;display:flex;place-items:center;min-width:320px;height:100%;min-height:auto;color:#333;background:#fff}h1{font-size:3.2em;line-height:1.1}button{border-radius:8px;border:1px solid transparent;padding:.6em 1.2em;font-size:1em;font-weight:500;font-family:inherit;background-color:#1a1a1a;cursor:pointer;transition:border-color .25s}.card{border-bottom:solid 2px 
lightgray;align-items:center;justify-content:center;margin-top:40px;display:flex;max-width:1024px;width:100%}.seg-title{margin:24px 0;font-size:20px;font-weight:500}.seg-co{width:1022px;text-align:left;border-left:solid 6px midnightblue;padding-left:8px;margin-left:2px;margin-top:36px;line-height:24px}#app{margin:0 auto;padding:0;text-align:center;width:100%;height:100%}.ant-btn{padding:4px 12px}@media (prefers-color-scheme: light){:root{color:#213547;background-color:#fff}a:hover{color:#747bff}button{background-color:#f9f9f9}}.ant-card{background:#f5f6fa;height:100%}.ant-card-body{padding:24px 36px 12px!important;border-radius:0 0 8px 8px}.ant-card .ant-card-actions{background-color:#e8e8f8cc!important}.ant-popover{max-width:800px!important}.ant-form-item{background:transparent;margin-bottom:40px!important}.ant-form-item .ant-form-item-explain-error{color:#ff4d4f;text-align:left!important}.ant-form-item-label label{font-size:18px!important;color:#1a1a1a!important;font-weight:500!important}.ant-tooltip{max-width:1022px!important}.ant-page-header-heading{width:1022px!important}.highlight{background:#f8f8ff}.ant-layout-sider-collapsed{width:0!important;min-width:0!important;overflow:hidden}.ant-layout-sider-collapsed .ant-menu-item,.ant-layout-sider-collapsed .ant-menu-submenu-title{display:none}.header-nav[data-v-07594418]{display:flex;align-items:flex-start;justify-content:space-between;width:100vw;height:40px;align-items:center;position:absolute;top:0;left:0;z-index:99;-webkit-app-region:drag;cursor:move}.header-nav .window-controls[data-v-07594418],.header-nav button[data-v-07594418],.header-nav .ant-input-search[data-v-07594418],.header-nav img[data-v-07594418],.header-nav .anticon[data-v-07594418]{-webkit-app-region:no-drag;cursor:pointer}.header-nav .window-controls[data-v-07594418]{top:0;right:0;display:flex;z-index:1000;margin-left:12px}.header-nav .window-controls 
.window-control-btn[data-v-07594418]{width:46px;height:32px;border:none;background:transparent;color:#666;font-size:16px;cursor:pointer;display:flex;align-items:center;justify-content:center;transition:background-color .2s}.header-nav .window-controls .window-control-btn[data-v-07594418]:hover{background-color:#0000001a}.header-nav .window-controls .window-control-btn.close[data-v-07594418]:hover{background-color:#e81123;color:#fff}.header-nav .window-controls .close-icon.focus[data-v-07594418]{display:none}.header-nav .window-controls:hover .close-icon.default[data-v-07594418],.header-nav .window-controls:focus-within .close-icon.default[data-v-07594418]{display:none}.header-nav .window-controls:hover .close-icon.focus[data-v-07594418],.header-nav .window-controls:focus-within .close-icon.focus[data-v-07594418]{display:inline}.content[data-v-874ca48f]{background-color:#fff;margin:0 auto;display:flex;flex-direction:column;align-items:center;justify-content:space-between}.not-found-wrapper[data-v-aef52a59]{height:calc(100vh - 104px)}.btn-groups[data-v-839398ff]{margin-top:36px;display:flex;justify-content:flex-end;align-items:center}.prompt-title p[data-v-839398ff]{margin:0;font-size:16px;font-weight:500}.prompt-content[data-v-839398ff]{margin-top:16px}.prompt-content .prompt-title[data-v-839398ff]{margin-bottom:24px;font-size:22px;font-weight:500;text-align:center}.prompt-content .language-segment[data-v-839398ff]{display:flex;justify-content:center;margin-bottom:16px}.prompt-content .prompt-item[data-v-839398ff]{margin-top:16px}.languages[data-v-cd713caa]{margin-top:24px;margin-bottom:24px}.languages p[data-v-cd713caa]{font-size:16px;font-weight:500;margin-bottom:8px}.audio-play-btn[data-v-cd713caa]{padding:2px 8px 0;border-radius:4px;transition:all .2s;height:40px}.audio-play-btn[data-v-cd713caa]:hover{background-color:#f0f0f0}.audio-play-btn.playing[data-v-cd713caa]{background-color:#f6ffed;border-color:#1890ff}.audio-play-btn.playing 
.playing-icon[data-v-cd713caa]{animation:pulse-cd713caa 1.5s infinite}@keyframes pulse-cd713caa{0%{opacity:1;transform:scale(1)}50%{opacity:.7;transform:scale(1.1)}to{opacity:1;transform:scale(1)}}.btn-groups[data-v-cd713caa]{margin-top:36px;display:flex;justify-content:space-between;align-items:center}.custom-popover-list[data-v-cd713caa]{width:92px;margin:0}.custom-popover-list .custom-popover-item[data-v-cd713caa]{font-size:14px;line-height:36px;font-weight:500;color:#1e1e1e;cursor:pointer;border-radius:4px;padding:0 8px;margin:0 -8px;transition:background .2s}.custom-popover-list .custom-popover-item[data-v-cd713caa]:hover,.custom-popover-list .custom-popover-item[data-v-cd713caa]:focus{background:#e5e7eb}.welcome-wrapper[data-v-cd713caa]{width:100%;height:100%;background-image:url(./bg-BmnA8p_e.png);background-repeat:no-repeat;background-attachment:fixed;background-size:cover;background-position:center;display:flex;flex-direction:column;align-items:center;justify-content:space-between;color:#fff}.welcome-wrapper .content[data-v-cd713caa]{width:100%;height:80vh;display:flex;flex-direction:column;justify-content:space-around;margin-top:64px}.welcome-wrapper .content .inner-content[data-v-cd713caa]{display:flex;flex-direction:column;align-items:center;justify-content:center;text-align:center;padding:20px}.welcome-wrapper .content .inner-content .text-box[data-v-cd713caa]{color:#000;margin-bottom:36px}.welcome-wrapper .content .inner-content .text-box .title[data-v-cd713caa]{font-size:24px;font-weight:600;margin-bottom:24px}.welcome-wrapper .content .inner-content .text-box .sub-title[data-v-cd713caa]{font-size:15px;margin-top:10px}.welcome-wrapper .content .inner-content .btn-box[data-v-cd713caa]{width:224px;height:80px}.welcome-wrapper .actions[data-v-cd713caa]{width:100%;height:64px;display:flex;justify-content:flex-end}.ball-wrapper[data-v-34c8e583]{width:100%;height:calc(100vh - 
100px);display:flex;flex-direction:column;align-items:center;justify-content:space-around}.talk-wrapper[data-v-1f502814]{width:auto;height:calc(100vh - 100px);overflow-y:scroll;padding:20px 240px 0;display:flex;flex-direction:column;align-items:flex-start;justify-content:flex-start}.talk-wrapper .cont-left[data-v-1f502814]{width:100%;margin:24px 0;display:flex;justify-content:flex-start;align-items:flex-start}.talk-wrapper .cont-left .text-left[data-v-1f502814]{color:#222;font-size:16px;font-weight:400;text-align:left;line-height:2;margin-left:12px;margin-top:6px}.talk-wrapper .cont-right[data-v-1f502814]{width:100%;margin:24px 0;display:flex;justify-content:flex-end;align-items:flex-start}.talk-wrapper .cont-right .text-right[data-v-1f502814]{color:#444;font-size:16px;font-weight:400;text-align:end;line-height:2;margin-right:12px;background:#ccc;border-radius:8px 0 8px 8px;padding:8px}.chat-wrapper[data-v-803600aa]{width:100%;height:100%;background-image:url(./bg-BmnA8p_e.png);background-repeat:no-repeat;background-attachment:fixed;background-size:cover;background-position:center;display:flex;flex-direction:column;align-items:center;justify-content:space-between;color:#fff}.chat-wrapper .content[data-v-803600aa]{width:100%;height:auto;display:flex;flex-direction:column;justify-content:space-around}.chat-wrapper .content .inner-content[data-v-803600aa]{display:flex;flex-direction:column;align-items:center;justify-content:center;text-align:center;padding:20px}.chat-wrapper .content .inner-content .text-box[data-v-803600aa]{color:#000;margin-bottom:36px}.chat-wrapper .content .inner-content .text-box .title[data-v-803600aa]{font-size:24px;font-weight:600;margin-bottom:24px}.chat-wrapper .content .inner-content .text-box .sub-title[data-v-803600aa]{font-size:15px;margin-top:10px}.chat-wrapper .content .inner-content .btn-box[data-v-803600aa]{width:224px;height:80px}.chat-wrapper 
.actions[data-v-803600aa]{width:100%;height:100px;display:flex;justify-content:space-between;align-items:center}.chat-wrapper .actions .holder[data-v-803600aa]{width:64px;height:48px}.chat-wrapper .actions .btns[data-v-803600aa]{width:450px;height:96px;display:flex;justify-content:space-around;align-items:flex-start}.chat-wrapper .actions .download-wrapper[data-v-803600aa]{width:64px;height:64px;display:flex;justify-content:flex-start;align-items:center;margin-right:0}.chat-wrapper .actions .download-wrapper img[data-v-803600aa]{width:24px;height:24px}.content-wrapper[data-v-d41c9ce7]{text-align:left;max-width:800px;min-width:320px;margin-bottom:64px;min-height:calc(100vh - 438px)}.content-wrapper .content-box[data-v-d41c9ce7]{padding:24px;height:240px;background-color:#e8e8e8;border-radius:16px;width:50%;margin:48px auto;min-width:300px}.content-wrapper .video-box[data-v-d41c9ce7]{max-width:800px;min-width:320px;width:90vw;height:auto}
assets/www/index.html CHANGED
@@ -5,8 +5,8 @@
5
  <link rel="icon" type="image/svg+xml" href="./favicon.ico" />
6
  <meta name="viewport" content="width=device-width, initial-scale=1.0" />
7
  <title>VoiceDialogue</title>
8
- <script type="module" crossorigin src="./assets/index-CGlMbARk.js"></script>
9
- <link rel="stylesheet" crossorigin href="./assets/index-BPAUWo8W.css">
10
  </head>
11
  <body>
12
  <div id="app"></div>
 
5
  <link rel="icon" type="image/svg+xml" href="./favicon.ico" />
6
  <meta name="viewport" content="width=device-width, initial-scale=1.0" />
7
  <title>VoiceDialogue</title>
8
+ <script type="module" crossorigin src="./assets/index-ByqsFGbw.js"></script>
9
+ <link rel="stylesheet" crossorigin href="./assets/index-CCuJ1lip.css">
10
  </head>
11
  <body>
12
  <div id="app"></div>
build/pyinstaller/hooks/hook-voice_dialogue.py CHANGED
@@ -24,29 +24,8 @@ ASSETS_ROOT = PROJECT_ROOT / "assets"
24
  # Collect all submodules of the main module
25
  hiddenimports = collect_submodules('voice_dialogue')
26
  datas = collect_data_files('moyoyo_tts', include_py_files=True)
27
-
28
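- # Assets excluded from the bundle:
29
- # - Legacy FunASR/Whisper models (the default engine is the built-in Qwen3-ASR)
30
- # - .bin files for the TTS pretrained weights (an equivalent model.safetensors is already bundled)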
31
- EXCLUDED_ASSET_PATTERNS = [
32
- "assets/models/asr/funasr/",
33
- "assets/models/asr/whisper/",
34
- "chinese-roberta-wwm-ext-large/pytorch_model.bin",
35
- "chinese-hubert-base/pytorch_model.bin",
36
- ]
37
-
38
-
39
- def _is_excluded(source_path: str) -> bool:
40
- normalized = source_path.replace("\\", "/")
41
- return any(pattern in normalized for pattern in EXCLUDED_ASSET_PATTERNS)
42
-
43
-
44
  # Collect system asset files
45
- datas += [
46
- (source, dest)
47
- for source, dest in collect_system_data_files(ASSETS_ROOT.as_posix(), "assets")
48
- if not _is_excluded(source)
49
- ]
50
 
51
  # ============================================================================
52
  # Third-party dependency configuration
@@ -60,7 +39,6 @@ ML_DEPENDENCIES = [
60
  "pytorch_lightning",
61
  "huggingface_hub",
62
  "einops",
63
- "qwen_asr",
64
  ]
65
 
66
  # Speech-processing dependencies
@@ -139,7 +117,6 @@ DATA_PACKAGES = [
139
  ("spacy", {"include_py_files": True}),
140
  ("misaki", {}),
141
  ("silero_vad", {}),
142
- ("qwen_asr", {}),
143
  ]
144
 
145
  # Collect data files
 
24
  # Collect all submodules of the main module
25
  hiddenimports = collect_submodules('voice_dialogue')
26
  datas = collect_data_files('moyoyo_tts', include_py_files=True)
27
  # Collect system asset files
28
+ datas += collect_system_data_files(ASSETS_ROOT.as_posix(), "assets")
 
 
 
 
29
 
30
  # ============================================================================
31
  # Third-party dependency configuration
 
39
  "pytorch_lightning",
40
  "huggingface_hub",
41
  "einops",
 
42
  ]
43
 
44
  # Speech-processing dependencies
 
117
  ("spacy", {"include_py_files": True}),
118
  ("misaki", {}),
119
  ("silero_vad", {}),
 
120
  ]
121
 
122
  # Collect data files
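Review note on the hook change above: the removed `_is_excluded` filter can be reproduced in isolation to see which assets will now be bundled again. The sketch below copies the patterns from the removed hunk; `filter_datas` and the sample paths are hypothetical names used only for illustration, and the real `collect_system_data_files` helper is not called here.

```python
from typing import Iterable, List, Tuple

# Patterns copied from the removed hunk; a path containing any of them was skipped.
EXCLUDED_ASSET_PATTERNS = [
    "assets/models/asr/funasr/",
    "assets/models/asr/whisper/",
    "chinese-roberta-wwm-ext-large/pytorch_model.bin",
    "chinese-hubert-base/pytorch_model.bin",
]


def is_excluded(source_path: str) -> bool:
    """Return True when the path contains any excluded pattern (substring match)."""
    # Normalize Windows separators so the same patterns work on every platform.
    normalized = source_path.replace("\\", "/")
    return any(pattern in normalized for pattern in EXCLUDED_ASSET_PATTERNS)


def filter_datas(pairs: Iterable[Tuple[str, str]]) -> List[Tuple[str, str]]:
    """Hypothetical helper: keep only the (source, dest) pairs that are not excluded."""
    return [(source, dest) for source, dest in pairs if not is_excluded(source)]


if __name__ == "__main__":
    # Made-up sample paths, for illustration only.
    sample = [
        ("assets/models/asr/funasr/model.pt", "assets/models/asr/funasr"),
        ("assets/www/index.html", "assets/www"),
    ]
    for source, dest in filter_datas(sample):
        print(source, "->", dest)
```

With the filter gone, the new `datas += collect_system_data_files(...)` line adds every pair unconditionally, so any files under those paths that exist in the assets directory would be packaged.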
electron-app/main.js CHANGED
@@ -1,3 +1,3 @@
1
  version https://git-lfs.github.com/spec/v1
2
- oid sha256:b9aeffbae7b9d83ea4abd63ef6c9465878c146bead1dea9339e087b85f2adccd
3
- size 7392
 
1
  version https://git-lfs.github.com/spec/v1
2
+ oid sha256:4b10113a08513b026f9207f21db221f368a1f242a821dd54d2d002354a6a2ec2
3
+ size 7039
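Review note: the `electron-app/main.js` diff above shows only a Git LFS pointer, so the actual code change is not visible here. A minimal sketch, assuming the pointer is the plain `key value` text shown in the hunk, of how the two pointers can be compared; `parse_lfs_pointer` and `describe_change` are hypothetical helper names, not part of any Git LFS tooling.

```python
from typing import Dict

OLD_POINTER = """version https://git-lfs.github.com/spec/v1
oid sha256:b9aeffbae7b9d83ea4abd63ef6c9465878c146bead1dea9339e087b85f2adccd
size 7392
"""

NEW_POINTER = """version https://git-lfs.github.com/spec/v1
oid sha256:4b10113a08513b026f9207f21db221f368a1f242a821dd54d2d002354a6a2ec2
size 7039
"""


def parse_lfs_pointer(text: str) -> Dict[str, str]:
    """Split each non-empty 'key value' line of a pointer file into a dict entry."""
    fields: Dict[str, str] = {}
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        key, _, value = line.partition(" ")
        fields[key] = value
    return fields


def describe_change(old_text: str, new_text: str) -> str:
    """Summarize whether the object id changed and by how many bytes the size moved."""
    old, new = parse_lfs_pointer(old_text), parse_lfs_pointer(new_text)
    delta = int(new["size"]) - int(old["size"])
    return f"oid changed: {old['oid'] != new['oid']}, size delta: {delta:+d} bytes"


if __name__ == "__main__":
    print(describe_change(OLD_POINTER, NEW_POINTER))
```

The pointer only records a hash and a byte count, so reviewing the real `main.js` change requires fetching the LFS object itself.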
frontend/src/App.vue CHANGED
@@ -1,8 +1,6 @@
1
  <template>
2
- <a-config-provider :theme="appTheme">
3
- <Header/>
4
- <router-view class="content" />
5
- </a-config-provider>
6
  <!-- <Footer/> -->
7
 
8
  <!-- <a-layout>
@@ -21,15 +19,6 @@
21
  import Header from "@/views/Header.vue";
22
  import Footer from "@/views/Footer.vue";
23
 
24
- // Global theme: unified corner radius and control height, matching the glassmorphism (Liquid Glass) style
25
- const appTheme = {
26
- token: {
27
- colorPrimary: '#1677ff',
28
- borderRadius: 14,
29
- controlHeight: 38,
30
- },
31
- };
32
-
33
  // import * as api from "@/client";
34
  import { onBeforeMount, onMounted, watch, CSSProperties, ref} from "vue";
35
  import {useSettingsStore} from "@/stores/config.ts";
 
1
  <template>
2
+ <Header/>
3
+ <router-view class="content" />
 
 
4
  <!-- <Footer/> -->
5
 
6
  <!-- <a-layout>
 
19
  import Header from "@/views/Header.vue";
20
  import Footer from "@/views/Footer.vue";
21
 
 
 
 
 
 
 
 
 
 
22
  // import * as api from "@/client";
23
  import { onBeforeMount, onMounted, watch, CSSProperties, ref} from "vue";
24
  import {useSettingsStore} from "@/stores/config.ts";
frontend/src/assets/ball.json CHANGED
@@ -1,3 +1,3 @@
1
  version https://git-lfs.github.com/spec/v1
2
- oid sha256:91eaffeec742a30f475cf5e777e1605e62d3c1547b64a891c15f9a5431460b8a
3
- size 22455
 
1
  version https://git-lfs.github.com/spec/v1
2
+ oid sha256:edd650ec984e26b5fde217f273e6758d0862fc856b5333e678fa0b578374e8b9
3
+ size 23084
frontend/src/config/client_config.ts CHANGED
@@ -5,7 +5,7 @@ import router from "@/router";
5
 
6
  const { wsCache } = useCache();
7
 
8
- export const test_server = '127.0.0.1:8000'
9
  // export const test_server = '59.110.18.232:19001'
10
 
11
  axios.defaults.baseURL = import.meta.env.PROD ? '/api/v1' : `http://${test_server}/api/v1`;
 
5
 
6
  const { wsCache } = useCache();
7
 
8
+ export const test_server = '127.0.0.1:8848'
9
  // export const test_server = '59.110.18.232:19001'
10
 
11
  axios.defaults.baseURL = import.meta.env.PROD ? '/api/v1' : `http://${test_server}/api/v1`;
frontend/src/i18n/index.ts DELETED
@@ -1,35 +0,0 @@
1
- import { createI18n } from 'vue-i18n'
2
-
3
- import en from './locales/en'
4
- import zh from './locales/zh'
5
-
6
- export type UiLocale = 'en' | 'zh'
7
-
8
- // Read the UI language from the persisted pinia settings; defaults to English
9
- function getInitialLocale(): UiLocale {
10
- try {
11
- const raw = localStorage.getItem('settings')
12
- if (raw) {
13
- const parsed = JSON.parse(raw)
14
- const ui = parsed?.uiLanguage
15
- if (ui === 'en' || ui === 'zh') return ui
16
- }
17
- } catch (e) {
18
- // ignore
19
- }
20
- return 'zh'
21
- }
22
-
23
- const i18n = createI18n({
24
- legacy: false,
25
- globalInjection: true,
26
- locale: getInitialLocale(),
27
- fallbackLocale: 'en',
28
- messages: { en, zh },
29
- })
30
-
31
- export function setUiLocale(locale: UiLocale) {
32
- i18n.global.locale.value = locale
33
- }
34
-
35
- export default i18n
frontend/src/i18n/locales/en.ts DELETED
@@ -1,74 +0,0 @@
1
- export default {
2
- common: {
3
- cancel: 'Cancel',
4
- confirm: 'Confirm',
5
- reset: 'Reset',
6
- save: 'Save',
7
- error: 'Error',
8
- },
9
- lang: {
10
- zh: 'Chinese',
11
- en: 'English',
12
- auto: 'Auto',
13
- },
14
- welcome: {
15
- title: 'Welcome',
16
- subtitle: 'Click the button below to start a conversation',
17
- start: 'Start Conversation',
18
- startFailed: 'Failed to start the voice dialogue system',
19
- },
20
- settings: {
21
- title: 'Settings',
22
- entry: 'Settings',
23
- tabs: {
24
- main: 'Main',
25
- language: 'Language',
26
- advanced: 'Prompt',
27
- about: 'About',
28
- },
29
- about: {
30
- tagline: 'A real-time AI voice dialogue system',
31
- version: 'Version',
32
- modelsTitle: 'Models',
33
- llm: 'Language Model (LLM)',
34
- llmDesc: 'Qwen3-8B (Q6_K, GGUF) · via llama.cpp',
35
- asr: 'Speech Recognition (ASR)',
36
- asrDesc: 'Whisper medium (English) · FunASR SeACo-Paraformer + CT-Transformer (Chinese)',
37
- tts: 'Speech Synthesis (TTS)',
38
- ttsDesc: 'MoYoYo TTS (GPT-SoVITS) · Kokoro (English)',
39
- linksTitle: 'Repositories',
40
- repoApp: 'App & source code',
41
- repoVoices: 'Voice (tone) models',
42
- copyright: '© 2025 MoYoYo · Models belong to their respective owners',
43
- },
44
- general: {
45
- interfaceLanguage: 'Interface Language',
46
- interfaceLanguageHint: 'Language of the application interface.',
47
- },
48
- audio: {
49
- microphone: 'Microphone (Input Device)',
50
- microphoneHint: 'Choose the input device, e.g. an external microphone array.',
51
- systemDefault: 'System Default',
52
- channelsSuffix: 'ch',
53
- defaultSuffix: 'default',
54
- speaker: 'Speaker (Output Device)',
55
- speakerHint: 'Choose the output device for voice playback, e.g. an external speaker.',
56
- echoCancellation: 'Echo Cancellation',
57
- echoCancellationHint: 'Uses the system AEC on the default device. For an external array, echo is handled by the array hardware.',
58
- },
59
- recognition: {
60
- language: 'Recognition Language',
61
- languageHint: 'Language used for speech recognition (ASR).',
62
- },
63
- voice: {
64
- role: 'Voice',
65
- roleHint: 'The voice used for speech synthesis (TTS).',
66
- playSample: 'Play sample',
67
- },
68
- prompt: {
69
- title: 'System Prompt',
70
- hint: 'Customize the system prompt for each language.',
71
- },
72
- applyFailed: 'Failed to apply settings',
73
- },
74
- }
frontend/src/i18n/locales/zh.ts DELETED
@@ -1,74 +0,0 @@
1
- export default {
2
- common: {
3
- cancel: '取消',
4
- confirm: '确认',
5
- reset: '重置',
6
- save: '保存',
7
- error: '错误',
8
- },
9
- lang: {
10
- zh: '中文',
11
- en: '英文',
12
- auto: '自动',
13
- },
14
- welcome: {
15
- title: '欢迎使用',
16
- subtitle: '点击下方按钮开始对话',
17
- start: '开始对话',
18
- startFailed: '启动语音对话系统失败',
19
- },
20
- settings: {
21
- title: '设置',
22
- entry: '设置',
23
- tabs: {
24
- main: '常用',
25
- language: '语言',
26
- advanced: 'Prompt',
27
- about: '关于',
28
- },
29
- about: {
30
- tagline: '实时 AI 语音对话系统',
31
- version: '版本',
32
- modelsTitle: '使用的模型',
33
- llm: '大语言模型 (LLM)',
34
- llmDesc: 'Qwen3-8B(Q6_K,GGUF)· 基于 llama.cpp',
35
- asr: '语音识别 (ASR)',
36
- asrDesc: 'Whisper medium(英文)· FunASR SeACo-Paraformer + CT-Transformer(中文)',
37
- tts: '语音合成 (TTS)',
38
- ttsDesc: 'MoYoYo TTS(GPT-SoVITS)· Kokoro(英文)',
39
- linksTitle: '开源仓库',
40
- repoApp: '应用与源码',
41
- repoVoices: '音色模型',
42
- copyright: '© 2025 MoYoYo · 各模型版权归原作者所有',
43
- },
44
- general: {
45
- interfaceLanguage: '界面语言',
46
- interfaceLanguageHint: '应用界面所使用的语言。',
47
- },
48
- audio: {
49
- microphone: '麦克风(输入设备)',
50
- microphoneHint: '选择输入设备,例如外置麦克风阵列。',
51
- systemDefault: '系统默认',
52
- channelsSuffix: '声道',
53
- defaultSuffix: '默认',
54
- speaker: '扬声器(输出设备)',
55
- speakerHint: '选择语音播放的输出设备,例如外置扬声器。',
56
- echoCancellation: '回音消除',
57
- echoCancellationHint: '默认设备使用系统 AEC;选择外置阵列时,回音由阵列硬件处理。',
58
- },
59
- recognition: {
60
- language: '识别语言',
61
- languageHint: '语音识别(ASR)所使用的语言。',
62
- },
63
- voice: {
64
- role: '音色',
65
- roleHint: '语音合成(TTS)所使用的音色。',
66
- playSample: '试听',
67
- },
68
- prompt: {
69
- title: '系统提示词',
70
- hint: '为每种语言自定义系统提示词。',
71
- },
72
- applyFailed: '应用设置失败',
73
- },
74
- }
frontend/src/main.ts CHANGED
@@ -9,7 +9,6 @@ import './style.scss'
9
 
10
  import App from './App.vue'
11
  import router from './router'
12
- import i18n from './i18n'
13
 
14
 
15
  // import * as Sentry from "@sentry/browser";
@@ -29,5 +28,4 @@ createApp(App)
29
  .use(router)
30
  .use(Antd)
31
  .use(Vue3Lottie)
32
- .use(i18n)
33
  .mount('#app')
 
9
 
10
  import App from './App.vue'
11
  import router from './router'
 
12
 
13
 
14
  // import * as Sentry from "@sentry/browser";
 
28
  .use(router)
29
  .use(Antd)
30
  .use(Vue3Lottie)
 
31
  .mount('#app')
frontend/src/stores/config.ts CHANGED
@@ -8,11 +8,8 @@ export const useSettingsStore = defineStore({
8
  return {
9
  role: '',
10
  language: 'zh',
11
- uiLanguage: 'zh' as 'en' | 'zh',
12
  sider_open: true,
13
  echoCancel: true,
14
- inputDeviceIndex: null as number | null,
15
- outputDeviceIndex: null as number | null,
16
  }
17
  },
18
  actions: {
 
8
  return {
9
  role: '',
10
  language: 'zh',
 
11
  sider_open: true,
12
  echoCancel: true,
 
 
13
  }
14
  },
15
  actions: {
frontend/src/style.scss CHANGED
@@ -173,68 +173,3 @@ $FormItemWidth: 1022px;
173
  .ant-layout-sider-collapsed .ant-menu-submenu-title {
174
  display: none;
175
  }
176
-
177
- /* ============================================================
178
- Liquid Glass: Apple-style glassmorphism (global)
179
- Translucency + background blur + soft border/shadow; corner radius is unified by the theme token
180
- ============================================================ */
181
-
182
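- /* Modals use Ant's built-in fade transition (opacity-only animation, no transform),
183
- which avoids the flicker caused by backdrop-filter failing during transform animations; the panel and its blur fade in smoothly together */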
184
-
185
- /* Modal: frosted-glass panel */
186
- .ant-modal .ant-modal-content {
187
- background: rgba(255, 255, 255, 0.62) !important;
188
- backdrop-filter: blur(28px) saturate(140%);
189
- -webkit-backdrop-filter: blur(28px) saturate(140%);
190
- border: 1px solid rgba(255, 255, 255, 0.6);
191
- border-radius: 22px !important;
192
- box-shadow: 0 16px 48px rgba(31, 38, 135, 0.18);
193
- }
194
- .ant-modal .ant-modal-header {
195
- background: transparent !important;
196
- }
197
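- /* Mask: full-screen frosted effect, slight dimming + background blur.
198
- The mask fades in with ant-fade (opacity), so the blur appears smoothly and the background text blurs with the rest of the screen instead of "flashing out" */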
199
- .ant-modal-mask {
200
- background: rgba(20, 22, 30, 0.12) !important;
201
- backdrop-filter: blur(14px) saturate(120%);
202
- -webkit-backdrop-filter: blur(14px) saturate(120%);
203
- }
204
-
205
- /* Input controls: translucent glass */
206
- .ant-select .ant-select-selector,
207
- .ant-input,
208
- textarea.ant-input,
209
- .ant-input-affix-wrapper {
210
- background: rgba(255, 255, 255, 0.45) !important;
211
- backdrop-filter: blur(8px);
212
- -webkit-backdrop-filter: blur(8px);
213
- border: 1px solid rgba(255, 255, 255, 0.7) !important;
214
- }
215
-
216
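- /* Buttons: unified shape (corner radius from the token) + soft shadow; default buttons get a glass look, primary buttons stay solid
217
- Text/link buttons (such as the small speaker icon for voice preview) stay transparent with no shadow */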
218
- .ant-btn:not(.ant-btn-text):not(.ant-btn-link) {
219
- box-shadow: 0 2px 10px rgba(31, 38, 135, 0.10);
220
- }
221
- .ant-btn-default {
222
- background: rgba(255, 255, 255, 0.5) !important;
223
- border: 1px solid rgba(255, 255, 255, 0.75) !important;
224
- backdrop-filter: blur(8px);
225
- -webkit-backdrop-filter: blur(8px);
226
- }
227
- .ant-btn-text {
228
- box-shadow: none !important;
229
- background: transparent !important;
230
- }
231
-
232
- /* Segmented radio group (Chinese/English, etc.): round both ends to soften the boxy look */
233
- .ant-radio-group-solid .ant-radio-button-wrapper:first-child {
234
- border-top-left-radius: 12px;
235
- border-bottom-left-radius: 12px;
236
- }
237
- .ant-radio-group-solid .ant-radio-button-wrapper:last-child {
238
- border-top-right-radius: 12px;
239
- border-bottom-right-radius: 12px;
240
- }
 
173
  .ant-layout-sider-collapsed .ant-menu-submenu-title {
174
  display: none;
175
  }
frontend/src/views/Home/Components/ChatText.vue CHANGED
@@ -69,13 +69,9 @@ watch(() => props.chatContent, (newVal, oldVal) => {
69
  <style lang="scss" scoped>
70
  .talk-wrapper {
71
  width: auto;
72
- width: 100%;
73
- max-width: 1000px;
74
- margin: 0 auto;
75
- box-sizing: border-box;
76
- height: calc(100vh - 150px);
77
- overflow-y: auto;
78
- padding: 20px 32px 0;
79
  display: flex;
80
  flex-direction: column;
81
  align-items: flex-start;
@@ -89,15 +85,13 @@ watch(() => props.chatContent, (newVal, oldVal) => {
89
  justify-content: flex-start;
90
  align-items: flex-start;
91
  .text-left {
92
- max-width: 88%;
93
  color: #222;
94
  font-size: 16px;
95
  font-weight: 400;
96
  text-align: left;
97
- line-height: 1.8;
98
  margin-left: 12px;
99
  margin-top: 6px;
100
- word-break: break-word;
101
  }
102
  }
103
 
@@ -109,18 +103,16 @@ watch(() => props.chatContent, (newVal, oldVal) => {
109
  align-items: flex-start;
110
 
111
  .text-right {
112
- max-width: 80%;
113
  color: #444;
114
  font-size: 16px;
115
  font-weight: 400;
116
- text-align: start;
117
- line-height: 1.8;
118
  margin-right: 12px;
119
  background: #ccc;
120
  border-radius: 8px;
121
  border-top-right-radius: 0;
122
- padding: 8px 12px;
123
- word-break: break-word;
124
  }
125
  }
126
  }
 
69
  <style lang="scss" scoped>
70
  .talk-wrapper {
71
  width: auto;
72
+ height: calc(100vh - 100px);
73
+ overflow-y: scroll;
74
+ padding: 20px 240px 0 240px;
 
 
 
 
75
  display: flex;
76
  flex-direction: column;
77
  align-items: flex-start;
 
85
  justify-content: flex-start;
86
  align-items: flex-start;
87
  .text-left {
 
88
  color: #222;
89
  font-size: 16px;
90
  font-weight: 400;
91
  text-align: left;
92
+ line-height: 2;
93
  margin-left: 12px;
94
  margin-top: 6px;
 
95
  }
96
  }
97
 
 
103
  align-items: flex-start;
104
 
105
  .text-right {
 
106
  color: #444;
107
  font-size: 16px;
108
  font-weight: 400;
109
+ text-align: end;
110
+ line-height: 2;
111
  margin-right: 12px;
112
  background: #ccc;
113
  border-radius: 8px;
114
  border-top-right-radius: 0;
115
+ padding: 8px;
 
116
  }
117
  }
118
  }
frontend/src/views/Home/index.vue CHANGED
@@ -387,7 +387,6 @@ const toggleText = () => {
387
  .actions {
388
  width: 100%;
389
  height: 100px;
390
- margin-bottom: 32px;
391
 
392
  display: flex;
393
  justify-content: space-between;
@@ -402,17 +401,7 @@ const toggleText = () => {
402
  height: 96px;
403
  display: flex;
404
  justify-content: space-around;
405
- align-items: center;
406
-
407
- // Liquid Glass round buttons (consistent with the Welcome settings button)
408
- :deep(.ant-btn) {
409
- border-radius: 50% !important;
410
- background: rgba(255, 255, 255, 0.5) !important;
411
- border: 1px solid rgba(255, 255, 255, 0.7) !important;
412
- backdrop-filter: blur(10px);
413
- -webkit-backdrop-filter: blur(10px);
414
- box-shadow: 0 4px 16px rgba(31, 38, 135, 0.12);
415
- }
416
  }
417
  .download-wrapper {
418
  width: 64px;
 
387
  .actions {
388
  width: 100%;
389
  height: 100px;
 
390
 
391
  display: flex;
392
  justify-content: space-between;
 
401
  height: 96px;
402
  display: flex;
403
  justify-content: space-around;
404
+ align-items: flex-start;
405
  }
406
  .download-wrapper {
407
  width: 64px;
frontend/src/views/Welcome/Components/SettingsModal.vue DELETED
@@ -1,581 +0,0 @@
1
- <script setup lang="ts">
2
- import { ref, reactive, computed, watch, onUnmounted } from "vue";
3
- import { Modal } from "ant-design-vue";
4
- import { SoundTwoTone, SoundOutlined, TranslationOutlined, AudioOutlined } from "@ant-design/icons-vue";
5
- import { useI18n } from "vue-i18n";
6
- import axios from "axios";
7
- import { useSettingsStore } from "@/stores/config.ts";
8
- import { setUiLocale, UiLocale } from "@/i18n";
9
-
10
- const props = defineProps({
11
- open: { type: Boolean, default: false },
12
- });
13
- const emit = defineEmits(["update:open"]);
14
-
15
- const { t } = useI18n();
16
- const base_url = axios.defaults.baseURL;
17
- const settingsStore = useSettingsStore();
18
-
19
- const activeTab = ref<string>("main");
20
- const loading = ref<boolean>(false);
21
- const appVersion = "1.2.0";
22
-
23
- // ---- Local state for each setting (synced from the store / backend on open) ----
24
- const uiLanguage = ref<UiLocale>((settingsStore.$state.uiLanguage as UiLocale) ?? "en");
25
- const recognitionLanguage = ref<string>(settingsStore.$state.language || "zh");
26
- const echoCancel = ref<boolean>(settingsStore.$state.echoCancel ?? true);
27
- const inputDeviceIndex = ref<number | null>(settingsStore.$state.inputDeviceIndex ?? null);
28
- const outputDeviceIndex = ref<number | null>(settingsStore.$state.outputDeviceIndex ?? null);
29
- const role = ref<string>(settingsStore.$state.role || "");
30
-
31
- const languages = reactive<string[]>([]);
32
- const inputDevices = reactive<any[]>([]);
33
- const outputDevices = reactive<any[]>([]);
34
- const roles = reactive<any[]>([]);
35
-
36
- // ---- Prompt ----
37
- const promptLang = ref<string>("zh");
38
- const default_prompt_en = ref<string>("");
39
- const default_prompt_zh = ref<string>("");
40
- const current_prompt_en = ref<string>("");
41
- const current_prompt_zh = ref<string>("");
42
-
43
- const filteredRoles = computed(() => {
44
- const is_chinese = recognitionLanguage.value === "zh";
45
- return roles.filter((r) => r["is_chinese_voice"] === is_chinese);
46
- });
47
-
48
- // After the recognition language changes, automatically select the first matching voice
49
- watch(
50
- () => recognitionLanguage.value,
51
- () => {
52
- if (filteredRoles.value.length > 0) {
53
- const exists = filteredRoles.value.find((r) => r["id"] === role.value);
54
- role.value = exists ? role.value : filteredRoles.value[0]["id"];
55
- } else {
56
- role.value = "";
57
- }
58
- }
59
- );
60
-
61
- // The UI language takes effect immediately (so the user sees the switch right away)
62
- watch(uiLanguage, (v) => setUiLocale(v));
63
-
64
- // ---- Data loading ----
65
- const fetchASRLanguages = async () => {
66
- try {
67
- const res = await fetch(`${base_url}/asr/languages`);
68
- const data = await res.json();
69
- if (data?.languages) {
70
- languages.splice(0, languages.length, ...data.languages);
71
- // Prefer the locally saved/default recognition language (default Chinese) rather than letting the backend's current value override it
72
- const saved = settingsStore.$state.language;
73
- recognitionLanguage.value = saved && data.languages.includes(saved)
74
- ? saved
75
- : (data.languages.includes('zh') ? 'zh' : data.languages[0]);
76
- }
77
- } catch (e) {
78
- console.error("Error fetching ASR languages:", e);
79
- }
80
- };
81
-
82
- const fetchTTSRoles = async () => {
83
- try {
84
- const res = await fetch(`${base_url}/tts/models`);
85
- const data = await res.json();
86
- if (data?.models) {
87
- roles.splice(0, roles.length, ...data.models);
88
- if (data.current_model_id) role.value = data.current_model_id;
89
- }
90
- } catch (e) {
91
- console.error("Error fetching TTS roles:", e);
92
- }
93
- };
94
-
95
- const fetchInputDevices = async () => {
96
- try {
97
- const res = await fetch(`${base_url}/system/audio-devices`);
98
- const data = await res.json();
99
- if (data?.devices) {
100
- inputDevices.splice(0, inputDevices.length, ...data.devices);
101
- const saved = settingsStore.$state.inputDeviceIndex;
102
- const exists = saved != null && data.devices.some((d: any) => d.index === saved);
103
- inputDeviceIndex.value = exists ? saved : (data.current_device_index ?? null);
104
- }
105
- if (data?.output_devices) {
106
- outputDevices.splice(0, outputDevices.length, ...data.output_devices);
107
- const saved = settingsStore.$state.outputDeviceIndex;
108
- const exists = saved != null && data.output_devices.some((d: any) => d.index === saved);
109
- outputDeviceIndex.value = exists ? saved : (data.current_output_device_index ?? null);
110
- }
111
- } catch (e) {
112
- console.error("Error fetching input devices:", e);
113
- }
114
- };
115
-
116
- // The ASR engine currently in effect (returned by the backend; distinguishes Qwen / FunASR+Whisper, etc.)
117
- const asrEngineName = ref<string>("");
118
- const asrEngineKeys = ref<string[]>([]);
119
- const ASR_ENGINE_LINKS: Record<string, { name: string; url: string }> = {
120
- qwen: { name: "Qwen3-ASR", url: "https://huggingface.co/Qwen/Qwen3-ASR-1.7B" },
121
- whisper: { name: "whisper.cpp", url: "https://github.com/ggerganov/whisper.cpp" },
122
- funasr: { name: "FunASR", url: "https://github.com/modelscope/FunASR" },
123
- };
124
- const asrEngineLinks = computed(() => {
125
- const keys = asrEngineKeys.value.length ? asrEngineKeys.value : ["whisper", "funasr"];
126
- return keys.map((k) => ASR_ENGINE_LINKS[k]).filter(Boolean);
127
- });
128
- const fetchAsrEngine = async () => {
129
- try {
130
- const res = await fetch(`${base_url}/system/asr-engine`);
131
- const data = await res.json();
132
- if (data?.display_name) asrEngineName.value = data.display_name;
133
- if (data?.mappings) asrEngineKeys.value = [...new Set(Object.values(data.mappings) as string[])].sort();
134
- } catch (e) {
135
- console.error("Error fetching ASR engine:", e);
136
- }
137
- };
138
-
139
- const fetchPrompts = async () => {
140
- try {
141
- const [cur, def] = await Promise.all([
142
- fetch(`${base_url}/settings/settings/prompts`).then((r) => r.json()),
143
- fetch(`${base_url}/settings/settings/prompts/default`).then((r) => r.json()),
144
- ]);
145
- if (cur) {
146
- current_prompt_en.value = cur.english_prompt;
147
- current_prompt_zh.value = cur.chinese_prompt;
148
- }
149
- if (def) {
150
- default_prompt_en.value = def.english_prompt;
151
- default_prompt_zh.value = def.chinese_prompt;
152
- }
153
- } catch (e) {
154
- console.error("Error fetching prompts:", e);
155
- }
156
- };
157
-
158
- const resetPrompt = (lang: string) => {
159
- if (lang === "en") current_prompt_en.value = default_prompt_en.value;
160
- else current_prompt_zh.value = default_prompt_zh.value;
161
- };
162
-
163
- // ---- Apply / Cancel ----
164
- const applySettings = async () => {
165
- loading.value = true;
166
- try {
167
- // 1. Persist to the local store
168
- settingsStore.$state.uiLanguage = uiLanguage.value;
169
- settingsStore.$state.language = recognitionLanguage.value;
170
- settingsStore.$state.role = role.value || "";
171
- settingsStore.$state.echoCancel = echoCancel.value;
172
- settingsStore.$state.inputDeviceIndex = inputDeviceIndex.value;
173
- settingsStore.$state.outputDeviceIndex = outputDeviceIndex.value;
174
-
175
- // The output device takes effect on save (a change made mid-session applies from the next utterance)
176
- await fetch(`${base_url}/system/audio-output-device`, {
177
- method: "POST",
178
- headers: { "Content-Type": "application/json" },
179
- body: JSON.stringify({ output_device_index: outputDeviceIndex.value }),
180
- });
181
-
182
- // 2. Push the TTS voice + ASR language to the backend
183
- if (role.value) {
184
- const r1 = await fetch(`${base_url}/tts/models/load`, {
185
- method: "POST",
186
- headers: { "Content-Type": "application/json" },
187
- body: JSON.stringify({ model_id: role.value }),
188
- });
189
- if (!r1.ok) throw new Error(`TTS load failed: ${r1.status}`);
190
- }
191
- const r2 = await fetch(`${base_url}/asr/instance/create`, {
192
- method: "POST",
193
- headers: { "Content-Type": "application/json" },
194
- body: JSON.stringify({ language: recognitionLanguage.value }),
195
- });
196
- if (!r2.ok) throw new Error(`ASR set failed: ${r2.status}`);
197
-
198
- // 3. Save the prompts
199
- await fetch(`${base_url}/settings/settings/prompts`, {
200
- method: "POST",
201
- headers: { "Content-Type": "application/json" },
202
- body: JSON.stringify({
203
- chinese_prompt: current_prompt_zh.value,
204
- english_prompt: current_prompt_en.value,
205
- }),
206
- });
207
-
208
- emit("update:open", false);
209
- } catch (err) {
210
- console.error("Error applying settings:", err);
211
- Modal.error({ title: t("common.error"), content: t("settings.applyFailed") });
212
- } finally {
213
- loading.value = false;
214
- }
215
- };
216
-
217
- const handleCancel = () => {
218
- // Restore local state and the UI language
219
- uiLanguage.value = (settingsStore.$state.uiLanguage as UiLocale) ?? "en";
220
- setUiLocale(uiLanguage.value);
221
- recognitionLanguage.value = settingsStore.$state.language || "zh";
222
- echoCancel.value = settingsStore.$state.echoCancel ?? true;
223
- inputDeviceIndex.value = settingsStore.$state.inputDeviceIndex ?? null;
224
- outputDeviceIndex.value = settingsStore.$state.outputDeviceIndex ?? null;
225
- role.value = settingsStore.$state.role || "";
226
- emit("update:open", false);
227
- };
228
-
229
- watch(
230
- () => props.open,
231
- (isOpen) => {
232
- if (isOpen) {
233
- activeTab.value = "main";
234
- uiLanguage.value = (settingsStore.$state.uiLanguage as UiLocale) ?? "en";
235
- fetchASRLanguages();
236
- fetchTTSRoles();
237
- fetchInputDevices();
238
- fetchPrompts();
239
- fetchAsrEngine();
240
- }
241
- }
242
- );
243
-
244
- // ---- Voice preview ----
245
- const currentPlayingId = ref<string | null>(null);
246
- const currentAudio = ref<HTMLAudioElement | null>(null);
247
- const isPlaying = (id: string) => currentPlayingId.value === id;
248
-
249
- const playRefAudio = async (id: string, e: Event) => {
250
- e.stopPropagation();
251
- e.preventDefault();
252
- try {
253
- if (currentPlayingId.value === id && currentAudio.value) {
254
- currentAudio.value.pause();
255
- currentAudio.value = null;
256
- currentPlayingId.value = null;
257
- return;
258
- }
259
- if (currentAudio.value) {
260
- currentAudio.value.pause();
261
- currentAudio.value = null;
262
- }
263
- const audio = new Audio(`${base_url}/tts/models/${id}/reference-audio`);
264
- audio.addEventListener("ended", () => {
265
- currentPlayingId.value = null;
266
- currentAudio.value = null;
267
- });
268
- await audio.play();
269
- currentPlayingId.value = id;
270
- currentAudio.value = audio;
271
- } catch (err) {
272
- currentPlayingId.value = null;
273
- currentAudio.value = null;
274
- }
275
- };
276
-
277
- onUnmounted(() => {
278
- if (currentAudio.value) currentAudio.value.pause();
279
- });
280
- </script>
281
-
282
- <template>
283
- <a-modal
284
- :open="props.open"
285
- :title="t('settings.title')"
286
- :mask-closable="false"
287
- :closable="true"
288
- :width="600"
289
- centered
290
- transition-name="ant-fade"
291
- @cancel="handleCancel"
292
- @update:open="(v: boolean) => emit('update:open', v)"
293
- >
294
- <template #footer>
295
- <a-button key="back" @click="handleCancel">{{ t('common.cancel') }}</a-button>
296
- <a-button key="confirm" type="primary" :loading="loading" @click="applySettings">
297
- {{ t('common.confirm') }}
298
- </a-button>
299
- </template>
300
-
301
- <a-tabs v-model:activeKey="activeTab" class="settings-tabs">
302
- <!-- Main: input source + echo cancellation + voice (what users care about most) -->
303
- <a-tab-pane key="main" :tab="t('settings.tabs.main')">
304
- <div class="tab-body">
305
- <div class="setting-row">
306
- <label>{{ t('settings.audio.microphone') }}</label>
307
- <a-select v-model:value="inputDeviceIndex" style="width: 100%;">
308
- <a-select-option :value="null">{{ t('settings.audio.systemDefault') }}</a-select-option>
309
- <a-select-option v-for="dev in inputDevices" :value="dev.index" :key="dev.index">
310
- {{ dev.name }}<template v-if="dev.max_input_channels > 1"> ({{ dev.max_input_channels }}{{ t('settings.audio.channelsSuffix') }})</template><template v-if="dev.is_default"> · {{ t('settings.audio.defaultSuffix') }}</template>
311
- </a-select-option>
312
- </a-select>
313
- </div>
314
- <div class="setting-row">
315
- <label>{{ t('settings.audio.speaker') }}</label>
316
- <a-select v-model:value="outputDeviceIndex" style="width: 100%;">
317
- <a-select-option :value="null">{{ t('settings.audio.systemDefault') }}</a-select-option>
318
- <a-select-option v-for="dev in outputDevices" :value="dev.index" :key="dev.index">
319
- {{ dev.name }}<template v-if="dev.is_default"> · {{ t('settings.audio.defaultSuffix') }}</template>
320
- </a-select-option>
321
- </a-select>
322
- </div>
323
- <div class="setting-row">
324
- <div class="row-inline">
325
- <label>{{ t('settings.audio.echoCancellation') }}</label>
326
- <a-switch v-model:checked="echoCancel" />
327
- </div>
328
- </div>
329
- <div class="setting-row">
330
- <label>{{ t('settings.voice.role') }}</label>
331
- <a-radio-group v-model:value="role" class="voice-group">
332
- <a-radio v-for="r in filteredRoles" :value="r['id']" :key="r['id']" class="voice-radio">
333
- <span class="voice-name">{{ r['character_name'] }}</span>
334
- <a-button
335
- type="text"
336
- class="audio-play-btn"
337
- :class="{ playing: isPlaying(r['id']) }"
338
- @click="playRefAudio(r['id'], $event)"
339
- >
340
- <SoundTwoTone v-if="isPlaying(r['id'])" style="font-size: 16px; color: #52c41a;" />
341
- <SoundOutlined v-else style="font-size: 16px; color: #1890ff;" />
342
- </a-button>
343
- </a-radio>
344
- </a-radio-group>
345
- </div>
346
- </div>
347
- </a-tab-pane>
348
-
349
- <!-- 语言:界面语言 + 识别语言 -->
350
- <a-tab-pane key="language" :tab="t('settings.tabs.language')">
351
- <div class="tab-body">
352
- <div class="setting-row">
353
- <label><TranslationOutlined class="label-icon" />{{ t('settings.general.interfaceLanguage') }}</label>
354
- <a-select v-model:value="uiLanguage" style="width: 100%;">
355
- <a-select-option value="zh">{{ t('lang.zh') }}</a-select-option>
356
- <a-select-option value="en">{{ t('lang.en') }}</a-select-option>
357
- </a-select>
358
- <p class="hint">{{ t('settings.general.interfaceLanguageHint') }}</p>
359
- </div>
360
- <div class="setting-row">
361
- <label><AudioOutlined class="label-icon" />{{ t('settings.recognition.language') }}</label>
362
- <a-select v-model:value="recognitionLanguage" style="width: 100%;">
363
- <a-select-option v-for="lan in languages" :value="lan" :key="lan">
364
- {{ t('lang.' + lan) }}
365
- </a-select-option>
366
- </a-select>
367
- <p class="hint">{{ t('settings.recognition.languageHint') }}</p>
368
- </div>
369
- </div>
370
- </a-tab-pane>
371
-
372
- <!-- 高级:系统提示词 -->
373
- <a-tab-pane key="advanced" :tab="t('settings.tabs.advanced')">
374
- <div class="tab-body">
375
- <div class="setting-row">
376
- <label>{{ t('settings.prompt.title') }}</label>
377
- <a-radio-group button-style="solid" size="small" v-model:value="promptLang" style="margin-bottom: 12px;">
378
- <a-radio-button value="zh">{{ t('lang.zh') }}</a-radio-button>
379
- <a-radio-button value="en">{{ t('lang.en') }}</a-radio-button>
380
- </a-radio-group>
381
- <div v-show="promptLang === 'zh'">
382
- <a-textarea v-model:value="current_prompt_zh" :placeholder="default_prompt_zh"
383
- :auto-size="{ minRows: 6, maxRows: 10 }" show-count :maxlength="2000" allow-clear />
384
- <a-button size="small" @click="resetPrompt('zh')" style="margin-top: 12px;">{{ t('common.reset') }}</a-button>
385
- </div>
386
- <div v-show="promptLang === 'en'">
387
- <a-textarea v-model:value="current_prompt_en" :placeholder="default_prompt_en"
388
- :auto-size="{ minRows: 6, maxRows: 10 }" show-count :maxlength="2000" allow-clear />
389
- <a-button size="small" @click="resetPrompt('en')" style="margin-top: 12px;">{{ t('common.reset') }}</a-button>
390
- </div>
391
- </div>
392
- </div>
393
- </a-tab-pane>
394
-
395
- <!-- 关于 -->
396
- <a-tab-pane key="about" :tab="t('settings.tabs.about')">
397
- <div class="tab-body about">
398
- <div class="about-head">
399
- <div class="about-name">Voice Dialogue</div>
400
- <div class="about-ver">{{ t('settings.about.version') }} {{ appVersion }}</div>
401
- <div class="about-tagline">{{ t('settings.about.tagline') }}</div>
402
- </div>
403
-
404
- <div class="about-section">
405
- <div class="about-section-title">{{ t('settings.about.modelsTitle') }}</div>
406
- <div class="about-item">
407
- <div class="about-item-label">{{ t('settings.about.llm') }}</div>
408
- <div class="about-item-desc">
409
- {{ t('settings.about.llmDesc') }}
410
- <a href="https://huggingface.co/Qwen/Qwen3-8B" target="_blank" rel="noopener">Qwen3 ↗</a>
411
- </div>
412
- </div>
413
- <div class="about-item">
414
- <div class="about-item-label">{{ t('settings.about.asr') }}</div>
415
- <div class="about-item-desc">
416
- {{ asrEngineName || t('settings.about.asrDesc') }}
417
- <a v-for="link in asrEngineLinks" :key="link.url" :href="link.url" target="_blank"
418
- rel="noopener">{{ link.name }} ↗</a>
419
- </div>
420
- </div>
421
- <div class="about-item">
422
- <div class="about-item-label">{{ t('settings.about.tts') }}</div>
423
- <div class="about-item-desc">
424
- {{ t('settings.about.ttsDesc') }}
425
- <a href="https://github.com/RVC-Boss/GPT-SoVITS" target="_blank" rel="noopener">GPT-SoVITS ↗</a>
426
- <a href="https://huggingface.co/hexgrad/Kokoro-82M" target="_blank" rel="noopener">Kokoro ↗</a>
427
- </div>
428
- </div>
429
- </div>
430
-
431
- <div class="about-section">
432
- <div class="about-section-title">{{ t('settings.about.linksTitle') }}</div>
433
- <div class="about-item">
434
- <div class="about-item-label">{{ t('settings.about.repoApp') }}</div>
435
- <a class="about-link" href="https://huggingface.co/MoYoYoTech/VoiceDialogue" target="_blank" rel="noopener">huggingface.co/MoYoYoTech/VoiceDialogue</a>
436
- </div>
437
- <div class="about-item">
438
- <div class="about-item-label">{{ t('settings.about.repoVoices') }}</div>
439
- <a class="about-link" href="https://huggingface.co/MoYoYoTech/tone-models" target="_blank" rel="noopener">huggingface.co/MoYoYoTech/tone-models</a>
440
- </div>
441
- </div>
442
-
443
- <div class="about-copyright">{{ t('settings.about.copyright') }}</div>
444
- </div>
445
- </a-tab-pane>
446
- </a-tabs>
447
- </a-modal>
448
- </template>
449
-
450
- <style lang="scss" scoped>
451
- // 固定内容区高度,切换 Tab 时横条不再跳动
452
- .tab-body {
453
- height: 360px;
454
- overflow-y: auto;
455
- padding: 4px 8px 4px 2px;
456
- }
457
-
458
- .setting-row {
459
- margin-bottom: 20px;
460
-
461
- // 仅作用于字段标题(直接子 label),避免影响嵌套的 radio-button 等 <label>
462
- > label {
463
- display: block;
464
- font-size: 15px;
465
- font-weight: 500;
466
- margin-bottom: 8px;
467
-
468
- .label-icon {
469
- margin-right: 6px;
470
- color: #1890ff;
471
- }
472
- }
473
-
474
- .hint {
475
- font-size: 12px;
476
- color: #999;
477
- margin: 8px 0 0;
478
- }
479
-
480
- .row-inline {
481
- display: flex;
482
- align-items: center;
483
- justify-content: space-between;
484
- }
485
- }
486
-
487
- .voice-group {
488
- display: flex;
489
- flex-direction: column;
490
- margin-top: 8px;
491
- }
492
-
493
- /* 关于页 */
494
- .about {
495
- .about-head {
496
- text-align: center;
497
- margin-bottom: 24px;
498
-
499
- .about-name {
500
- font-size: 20px;
501
- font-weight: 600;
502
- }
503
- .about-ver {
504
- font-size: 13px;
505
- color: #888;
506
- margin-top: 2px;
507
- }
508
- .about-tagline {
509
- font-size: 12px;
510
- color: #999;
511
- margin-top: 4px;
512
- }
513
- }
514
-
515
- .about-section {
516
- margin-bottom: 20px;
517
-
518
- .about-section-title {
519
- font-size: 13px;
520
- font-weight: 600;
521
- color: #666;
522
- margin-bottom: 10px;
523
- }
524
- }
525
-
526
- .about-item {
527
- margin-bottom: 12px;
528
-
529
- .about-item-label {
530
- font-size: 14px;
531
- font-weight: 500;
532
- }
533
- .about-item-desc {
534
- font-size: 12px;
535
- color: #777;
536
- margin-top: 2px;
537
- line-height: 1.6;
538
-
539
- a { margin-left: 6px; }
540
- }
541
- }
542
-
543
- a {
544
- color: #1677ff;
545
- text-decoration: none;
546
- &:hover { text-decoration: underline; }
547
- }
548
-
549
- .about-link {
550
- font-size: 13px;
551
- word-break: break-all;
552
- }
553
-
554
- .about-copyright {
555
- margin-top: 16px;
556
- font-size: 11px;
557
- color: #aaa;
558
- text-align: center;
559
- }
560
- }
561
-
562
- .voice-radio {
563
- display: flex;
564
- align-items: center;
565
- height: 40px;
566
- line-height: 40px;
567
-
568
- .voice-name {
569
- margin-right: 8px;
570
- }
571
- }
572
-
573
- .audio-play-btn {
574
- padding: 0 6px;
575
- border-radius: 4px;
576
-
577
- &.playing {
578
- background-color: #f6ffed;
579
- }
580
- }
581
- </style>
frontend/src/views/Welcome/index.vue CHANGED
@@ -2,66 +2,303 @@
2
 
3
  import router from "@/router.ts";
4
  import { useSettingsStore } from "@/stores/config.ts";
5
- import { ref, onMounted } from "vue";
6
  import { Modal } from 'ant-design-vue';
7
- import { AudioOutlined } from "@ant-design/icons-vue";
8
- import { useI18n } from "vue-i18n";
9
  import axios from "axios";
10
- import SettingsModal from "./Components/SettingsModal.vue";
11
- import setting from "@/assets/setting.png";
12
 
13
- const { t } = useI18n();
14
- const base_url = axios.defaults.baseURL;
15
- const settingsStore = useSettingsStore();
 
 
16
 
17
- const settingsOpen = ref<boolean>(false);
18
- const chatLoading = ref<boolean>(false);
19
 
20
- // 当前实际生效的 ASR 引擎,显示在设置按钮左侧
21
- const asrEngineName = ref<string>("");
22
  onMounted(async () => {
23
- try {
24
- const res = await fetch(`${base_url}/system/asr-engine`);
25
- const data = await res.json();
26
- if (data?.display_name) asrEngineName.value = data.display_name;
27
- } catch (e) {
28
- console.error("Error fetching ASR engine:", e);
29
- }
30
  });
31
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
32
  const startAudioChat = async () => {
33
  try {
34
  chatLoading.value = true;
35
  const response = await fetch(`${base_url}/system/start`, {
36
  method: 'POST',
37
- headers: { 'Content-Type': 'application/json' },
 
 
38
  body: JSON.stringify({
39
- enable_echo_cancellation: settingsStore.$state.echoCancel ?? true,
40
- input_device_index: settingsStore.$state.inputDeviceIndex ?? null,
41
- output_device_index: settingsStore.$state.outputDeviceIndex ?? null
42
  })
43
  });
44
  if (!response.ok) {
45
  throw new Error(`HTTP error! status: ${response.status}`);
46
  }
47
- await response.json();
 
48
  return true;
49
  } catch (error) {
50
- console.error('Error starting audio chat:', error);
51
  return false;
52
  } finally {
53
  chatLoading.value = false;
54
  }
 
 
 
 
 
 
 
 
 
 
55
  };
56
 
57
- const chatAction = async () => {
58
- const ok = await startAudioChat();
59
- if (!ok) {
60
- Modal.error({ title: t('common.error'), content: t('welcome.startFailed') });
61
- return;
62
  }
63
- router.replace('/home');
64
  };
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
65
  </script>
66
 
67
  <template>
@@ -69,67 +306,178 @@ const chatAction = async () => {
69
  <div class="content">
70
  <div class="inner-content">
71
  <div class="text-box">
72
- <div class="title">{{ t('welcome.title') }}</div>
73
- <div class="sub-title">{{ t('welcome.subtitle') }}</div>
 
 
 
 
74
  </div>
75
  <div class="btn-box">
76
  <a-button @click="chatAction" block :loading="chatLoading" type="primary" size="large">
77
- <span>{{ t('welcome.start') }}</span>
78
  </a-button>
79
  </div>
80
  </div>
81
  </div>
82
 
83
  <div class="actions">
84
- <div v-if="asrEngineName" class="asr-chip" :title="t('settings.about.asr')">
85
- <AudioOutlined />
86
- <span>{{ asrEngineName }}</span>
87
- </div>
88
- <a-button type="text" @click="settingsOpen = true" class="settings-btn"
89
- :title="t('settings.entry')">
90
  <template #icon>
91
  <img :src="setting" width="28" height="28" alt="settings" />
92
  </template>
93
  </a-button>
 
 
 
 
 
 
 
 
 
 
94
  </div>
95
 
96
- <SettingsModal v-model:open="settingsOpen" />
97
  </div>
98
  </template>
99
 
100
  <style lang="scss" scoped>
101
- .asr-chip {
102
- display: flex;
103
- align-items: center;
104
- gap: 8px;
105
- height: 38px;
106
- padding: 0 18px;
107
- margin-right: 16px;
108
- border-radius: 19px;
109
- color: rgba(0, 0, 0, 0.65);
110
- font-size: 13px;
111
- background: rgba(255, 255, 255, 0.5);
112
- border: 1px solid rgba(255, 255, 255, 0.7);
113
- backdrop-filter: blur(10px);
114
- -webkit-backdrop-filter: blur(10px);
115
- box-shadow: 0 4px 16px rgba(31, 38, 135, 0.12);
116
  }
117
 
118
- .settings-btn {
119
- width: 60px;
120
- height: 60px;
121
- margin-right: 24px;
122
- border-radius: 50% !important;
123
- background: rgba(255, 255, 255, 0.5) !important;
124
- border: 1px solid rgba(255, 255, 255, 0.7) !important;
125
- backdrop-filter: blur(10px);
126
- -webkit-backdrop-filter: blur(10px);
127
- box-shadow: 0 4px 16px rgba(31, 38, 135, 0.12);
128
  display: flex;
 
129
  align-items: center;
130
- justify-content: center;
131
  }
132
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
133
  .welcome-wrapper {
134
  width: 100%;
135
  height: 100%;
@@ -175,7 +523,6 @@ const chatAction = async () => {
175
  margin-top: 10px;
176
  }
177
  }
178
-
179
  .btn-box {
180
  width: 224px;
181
  height: 80px;
@@ -184,11 +531,10 @@ const chatAction = async () => {
184
  }
185
 
186
  .actions {
187
- width: 100%;
188
- height: 100px;
189
- margin-bottom: 32px;
190
  display: flex;
191
- align-items: center;
192
  justify-content: flex-end;
193
  }
194
  }
 
2
 
3
  import router from "@/router.ts";
4
  import { useSettingsStore } from "@/stores/config.ts";
5
+ import { onMounted, onUnmounted, ref, reactive, computed, watch, h } from "vue";
6
  import { Modal } from 'ant-design-vue';
7
+ import { SoundTwoTone, SoundOutlined } from "@ant-design/icons-vue";
 
8
  import axios from "axios";
9
+ import PromptText from "./Components/PromptText.vue";
 
10
 
11
+ const base_url = axios.defaults.baseURL
12
+
13
+ const settingsStore = useSettingsStore()
14
+
15
+ import setting from "@/assets/setting.png"
16
 
 
 
17
 
 
 
18
  onMounted(async () => {
19
+ await fetchASRLanguages();
20
+ await fetchTTSRoles();
 
 
 
 
 
21
  });
22
 
23
+ const chatAction = async () => {
24
+ const state = await startAudioChat();
25
+ if (!state) {
26
+ console.error('Failed to start audio chat system service');
27
+
28
+ Modal.error({
29
+ title: 'Error',
30
+ content: 'Failed to start audio chat system service',
31
+ });
32
+ return;
33
+ }
34
+ router.replace('/home')
35
+ }
36
+ const chatLoading = ref<boolean>(false);
37
+
38
  const startAudioChat = async () => {
39
  try {
40
  chatLoading.value = true;
41
  const response = await fetch(`${base_url}/system/start`, {
42
  method: 'POST',
43
+ headers: {
44
+ 'Content-Type': 'application/json',
45
+ },
46
  body: JSON.stringify({
47
+ enable_echo_cancellation: echoCancel.value
 
 
48
  })
49
  });
50
  if (!response.ok) {
51
  throw new Error(`HTTP error! status: ${response.status}`);
52
  }
53
+ const data = await response.json();
54
+ console.log('ASR Instance started successfully:', data);
55
  return true;
56
  } catch (error) {
57
+ console.error('Error starting ASR instance:', error);
58
  return false;
59
  } finally {
60
  chatLoading.value = false;
61
  }
62
+ }
63
+
64
+
65
+ const voiceModelOpen = ref<boolean>(false);
66
+ const modalLoading = ref<boolean>(false);
67
+
68
+ const handleVoiceModalCancel = () => {
69
+ voiceModelOpen.value = false;
70
+ role.value = settingsStore.$state.role;
71
+ language.value = settingsStore.$state.language;
72
  };
73
 
74
+ const handleVoiceModalSubmit = async () => {
75
+ console.log('Selected Language:', language.value);
76
+ console.log('Selected Role:', role.value);
77
+ console.log('Echo Cancel:', echoCancel.value);
78
+ settingsStore.$state.language = language.value;
79
+ settingsStore.$state.role = role.value || '';
80
+ settingsStore.$state.echoCancel = echoCancel.value;
81
+
82
+ await pushConfig(settingsStore.$state.role);
83
+ };
84
+
85
+ const pushConfig = async (model_id: string) => {
86
+ try {
87
+ modalLoading.value = true;
88
+ const response = await fetch(`${base_url}/tts/models/load`, {
89
+ method: 'POST',
90
+ headers: {
91
+ 'Content-Type': 'application/json',
92
+ },
93
+ body: JSON.stringify({
94
+ "model_id": model_id,
95
+ })
96
+ });
97
+ if (!response.ok) {
98
+ throw new Error(`HTTP error! status: ${response.status}`);
99
+ }
100
+ const data = await response.json();
101
+ console.log('Config pushed successfully:', data);
102
+
103
+ const response2 = await fetch(`${base_url}/asr/instance/create`, {
104
+ method: 'POST',
105
+ headers: {
106
+ 'Content-Type': 'application/json',
107
+ },
108
+ body: JSON.stringify({
109
+ "language": language.value,
110
+ })
111
+ });
112
+ if (!response2.ok) {
113
+ throw new Error(`HTTP error! status: ${response2.status}`);
114
+ }
115
+ const data2 = await response2.json();
116
+ console.log('ASR Language set successfully:', data2);
117
+
118
+ } catch (err) {
119
+ console.error('Error pushing config:', err);
120
+ Modal.error({
121
+ title: 'Error',
122
+ content: "Error config: " + JSON.stringify(err),
123
+ });
124
+ } finally {
125
+ modalLoading.value = false;
126
+ voiceModelOpen.value = false;
127
+ }
128
+
129
+ console.log('Selected Language:', language.value);
130
+ console.log('Selected Role:', role.value);
131
+ }
132
+
133
+
134
+ const language = ref<string>(settingsStore.$state.language || 'zh');
135
+ const languages = reactive([]);
136
+ const languageOptions = {
137
+ 'zh': 'Chinese',
138
+ 'en': 'English',
139
+ 'auto': 'Auto',
140
+ };
141
+ const role = ref<string>(settingsStore.$state.role || '');
142
+ const roles = reactive([])
143
+ const echoCancel = ref<boolean>(settingsStore.$state.echoCancel ?? true);
144
+
145
+ const radioStyle = reactive({
146
+ display: 'flex',
147
+ height: '40px',
148
+ lineHeight: '40px',
149
+ fontSize: '16px',
150
+ marginBottom: '8px',
151
+ });
152
+
153
+ const filteredRoles = computed(() => {
154
+ const is_chinese = language.value == 'zh';
155
+ return roles.filter(ro => ro['is_chinese_voice'] == is_chinese);
156
+ });
157
+
158
+ watch(
159
+ () => language.value,
160
+ (newLang) => {
161
+ // After the language changes, automatically select the first available role
162
+ if (filteredRoles.value.length > 0) {
163
+ const current_role_id = settingsStore.$state.role;
164
+ const current_role = filteredRoles.value.find(ro => ro['id'] == current_role_id);
165
+ if (current_role) {
166
+ role.value = current_role_id;
167
+ } else {
168
+ role.value = filteredRoles.value[0]['id'];
169
+ }
170
+ } else {
171
+ role.value = "";
172
+ }
173
+ }
174
+ );
175
+
176
+
177
+ const fetchTTSRoles = async () => {
178
+ try {
179
+ const response = await fetch(`${base_url}/tts/models`);
180
+ const data = await response.json()
181
+ if (data && data.models) {
182
+ // @ts-ignore
183
+ roles.splice(0, roles.length, ...data.models)
184
+ console.log('Fetched TTS Roles:', roles);
185
+
186
+ if (data.current_model_id) {
187
+ role.value = data.current_model_id;
188
+ }
189
+ }
190
+ } catch (error) {
191
+ console.error('Error fetching TTS roles:', error);
192
+ }
193
+ };
194
+
195
+ const fetchASRLanguages = async () => {
196
+ try {
197
+ const response = await fetch(`${base_url}/asr/languages`);
198
+ const data = await response.json();
199
+ if (data && data.languages) {
200
+ // @ts-ignore
201
+ languages.splice(0, languages.length, ...data.languages);
202
+ console.log('Fetched ASR Languages:', data.languages);
203
+
204
+ if (data.current_asr_language) {
205
+ language.value = data.current_asr_language;
206
+ }
207
+ }
208
+ } catch (error) {
209
+ console.error('Error fetching ASR languages:', error);
210
+ }
211
+ };
212
+
213
+ const togglePopover = (item: string) => {
214
+ popoverVisible.value = !popoverVisible.value;
215
+ if (item == 'voice') {
216
+ voiceModelOpen.value = true;
217
+ } else if (item == 'prompt') {
218
+ promptModelOpen.value = true;
219
+ }
220
+ };
221
+
222
+ const popoverVisible = ref<boolean>(false);
223
+ const promptModelOpen = ref<boolean>(false);
224
+
225
+ // Audio playback state
226
+ const currentPlayingId = ref<string | null>(null);
227
+ const currentAudio = ref<HTMLAudioElement | null>(null);
228
+
229
+ // Revised audio playback logic
230
+ const playRefAudio = async (id: string, e: Event) => {
231
+ console.log('Playing reference audio for role:', id);
232
+
233
+ e.stopPropagation();
234
+ e.preventDefault();
235
+
236
+ try {
237
+ // If the clicked audio is the one currently playing, stop it
238
+ if (currentPlayingId.value === id && currentAudio.value) {
239
+ currentAudio.value.pause();
240
+ currentAudio.value = null;
241
+ currentPlayingId.value = null;
242
+ console.log('Audio stopped');
243
+ return;
244
+ }
245
+
246
+ // If another audio is playing, stop it first
247
+ if (currentAudio.value) {
248
+ currentAudio.value.pause();
249
+ currentAudio.value = null;
250
+ }
251
+
252
+ // Create a new audio instance
253
+ const audio = new Audio(`${base_url}/tts/models/${id}/reference-audio`);
254
+
255
+ // Register audio event listeners
256
+ audio.addEventListener('ended', () => {
257
+ currentPlayingId.value = null;
258
+ currentAudio.value = null;
259
+ });
260
+
261
+ audio.addEventListener('error', (error) => {
262
+ console.error('Audio playback error:', error);
263
+ currentPlayingId.value = null;
264
+ currentAudio.value = null;
265
+ Modal.error({
266
+ title: 'Error',
267
+ content: 'Failed to play reference audio',
268
+ });
269
+ });
270
+
271
+ // Start playback
272
+ await audio.play();
273
+ currentPlayingId.value = id;
274
+ currentAudio.value = audio;
275
+ console.log('Audio played successfully');
276
+
277
+ } catch (error) {
278
+ console.error('Error playing audio:', error);
279
+ currentPlayingId.value = null;
280
+ currentAudio.value = null;
281
+ Modal.error({
282
+ title: 'Error',
283
+ content: 'Failed to play reference audio',
284
+ });
285
  }
 
286
  };
287
+
288
+ // Clean up audio when the component unmounts
289
+ onUnmounted(() => {
290
+ if (currentAudio.value) {
291
+ currentAudio.value.pause();
292
+ currentAudio.value = null;
293
+ }
294
+ currentPlayingId.value = null;
295
+ });
296
+
297
+ // Helper: whether the given role's reference audio is currently playing
298
+ const isPlaying = (id: string) => {
299
+ return currentPlayingId.value === id;
300
+ };
301
+
302
  </script>
303
 
304
  <template>
 
306
  <div class="content">
307
  <div class="inner-content">
308
  <div class="text-box">
309
+ <div class="title">
310
+ 欢迎使用
311
+ </div>
312
+ <div class="sub-title">
313
+ 点击下方按钮开始对话
314
+ </div>
315
  </div>
316
  <div class="btn-box">
317
  <a-button @click="chatAction" block :loading="chatLoading" type="primary" size="large">
318
+ <span>开始对话</span>
319
  </a-button>
320
  </div>
321
  </div>
322
  </div>
323
 
324
  <div class="actions">
325
+ <!-- <a-button type="text" @click="toggleSider">sider</a-button> -->
326
+
327
+ <a-button v-if="false" type="text" @click="voiceModelOpen = true"
328
+ style="width:44px; height: 44px; margin-right:24px;margin-bottom: 24px;">
 
 
329
  <template #icon>
330
  <img :src="setting" width="28" height="28" alt="settings" />
331
  </template>
332
  </a-button>
333
+ <a-popover v-if="true" v-model:open="popoverVisible" trigger="click" ok-text="Yes" cancel-text="No" placement="bottomRight">
334
+ <template #content>
335
+ <div class="custom-popover-list">
336
+ <div class="custom-popover-item" @click="togglePopover('voice')">
337
+ 选择音色</div>
338
+ <div class="custom-popover-item" @click="togglePopover('prompt')">Prompt调试</div>
339
+ </div>
340
+ </template>
341
+ <img :src="setting" alt="item actions" style="width: 28px; height: 28px; margin-right:24px;margin-top: 16px;">
342
+ </a-popover>
343
  </div>
344
 
345
+ <a-modal v-model:open="voiceModelOpen" :title="null" :mask-closable="false" :closable="false" centered>
346
+ <template #footer>
347
+ <a-button key="back" @click="handleVoiceModalCancel">Cancel</a-button>
348
+ <a-button key="submit" type="primary" :loading="modalLoading" @click="handleVoiceModalSubmit">Submit</a-button>
349
+ </template>
350
+ <div class="languages">
351
+ <div class="echo-cancel-item">
352
+ <div style="display: flex; justify-content: space-between; align-items: center;">
353
+ <p style="margin: 0;">Enable Echo Cancellation:</p>
354
+ <a-switch v-model:checked="echoCancel" />
355
+ </div>
356
+ </div>
357
+ </div>
358
+ <div class="languages">
359
+ <div class="language-item">
360
+ <p>Select Language:</p>
361
+ <a-select v-model:value="language" style="width: 100%;">
362
+ <a-select-option v-for="lan in languages" :value="lan" :key="lan">
363
+ {{ languageOptions[lan] }}
364
+ </a-select-option>
365
+ </a-select>
366
+ </div>
367
+ </div>
368
+ <div class="languages">
369
+ <div class="role-item">
370
+ <p>Select voice Role:</p>
371
+ <a-radio-group size="large" v-model:value="role">
372
+ <a-radio v-for="r in filteredRoles" :style="radioStyle" :value="r['id']" :key="r['id']">
373
+ <div style="display: flex; justify-content: space-between; align-items: center; width:450px;">
374
+ {{ r['character_name'] }}
375
+ <a-button
376
+ :key="r['id']"
377
+ type="text"
378
+ @click="playRefAudio(r['id'], $event)"
379
+ class="audio-play-btn"
380
+ :class="{ 'playing': isPlaying(r['id']) }"
381
+ >
382
+ <SoundTwoTone
383
+ v-if="isPlaying(r['id'])"
384
+ style="font-size: 18px; color: #52c41a;"
385
+ class="playing-icon"
386
+ />
387
+ <SoundOutlined
388
+ v-else
389
+ style="font-size: 18px; color: #1890ff;"
390
+ />
391
+ </a-button>
392
+ </div>
393
+
394
+ </a-radio>
395
+ </a-radio-group>
396
+
397
+ </div>
398
+ </div>
399
+ </a-modal>
400
+
401
+ <PromptText v-model:open="promptModelOpen" />
402
  </div>
403
  </template>
404
 
405
  <style lang="scss" scoped>
406
+
407
+ .languages {
408
+ margin-top: 24px;
409
+ margin-bottom: 24px;
410
+
411
+ p {
412
+ font-size: 16px;
413
+ font-weight: 500;
414
+ margin-bottom: 8px;
415
+ }
 
 
 
 
 
416
  }
417
 
418
+ .audio-play-btn {
419
+ padding: 0px 8px;
420
+ padding-top:2px;
421
+ border-radius: 4px;
422
+ transition: all 0.2s;
423
+ height: 40px;
424
+
425
+ &:hover {
426
+ background-color: #f0f0f0;
427
+ }
428
+
429
+ &.playing {
430
+ background-color: #f6ffed;
431
+ border-color: #1890ff;
432
+
433
+ .playing-icon {
434
+ animation: pulse 1.5s infinite;
435
+ }
436
+ }
437
+ }
438
+
439
+ @keyframes pulse {
440
+ 0% {
441
+ opacity: 1;
442
+ transform: scale(1);
443
+ }
444
+ 50% {
445
+ opacity: 0.7;
446
+ transform: scale(1.1);
447
+ }
448
+ 100% {
449
+ opacity: 1;
450
+ transform: scale(1);
451
+ }
452
+ }
453
+
454
+ .btn-groups {
455
+ margin-top: 36px;
456
  display: flex;
457
+ justify-content: space-between;
458
  align-items: center;
 
459
  }
460
 
461
+ .custom-popover-list {
462
+ width: 92px;
463
+ margin: 0;
464
+ .custom-popover-item {
465
+ font-size: 14px;
466
+ line-height: 36px;
467
+ font-weight: 500;
468
+ color: #1e1e1e;
469
+ cursor: pointer;
470
+ border-radius: 4px;
471
+ padding: 0 8px;
472
+ margin: 0px -8px;
473
+ transition: background 0.2s;
474
+ }
475
+ .custom-popover-item:hover, .custom-popover-item:focus {
476
+ background: #e5e7eb;
477
+ }
478
+ }
479
+
480
+
481
  .welcome-wrapper {
482
  width: 100%;
483
  height: 100%;
 
523
  margin-top: 10px;
524
  }
525
  }
 
526
  .btn-box {
527
  width: 224px;
528
  height: 80px;
 
531
  }
532
 
533
  .actions {
534
+ width: 100%;
535
+ height: 64px;
536
+
537
  display: flex;
 
538
  justify-content: flex-end;
539
  }
540
  }
main.py CHANGED
@@ -63,19 +63,6 @@ def main():
63
  parser = create_argument_parser()
64
  args = parser.parse_args()
65
 
66
- # List audio input devices, then exit
67
- if getattr(args, 'list_audio_devices', False):
68
- from voice_dialogue.audio.devices import list_input_devices
69
- devices = list_input_devices()
70
- print(f"\n可用音频输入设备 ({len(devices)}):")
71
- print(f"{'索引':>4} {'通道':>4} {'采样率':>7} {'默认':>4} 名称")
72
- for d in devices:
73
- default_mark = '✓' if d['is_default'] else ''
74
- print(f"{d['index']:>4} {d['max_input_channels']:>4} "
75
- f"{d['default_sample_rate']:>7} {default_mark:>4} {d['name']}")
76
- print("\n使用 --input-device <索引> 选择设备。")
77
- sys.exit(0)
78
-
79
  set_debug_mode(args.debug)
80
 
81
  print(f"""
@@ -91,10 +78,8 @@ VoiceDialogue - 语音对话系统
91
  if args.mode == 'cli':
92
  print(f"语言设置: {args.language}")
93
  print(f"说话人: {args.speaker}")
94
- if args.input_device is not None:
95
- print(f"输入设备索引: {args.input_device}")
96
  print("正在启动命令行语音对话系统...")
97
- launch_system(args.language, args.speaker, args.disable_echo_cancellation, args.input_device)
98
 
99
  elif args.mode == 'api':
100
  launch_api_server(
 
63
  parser = create_argument_parser()
64
  args = parser.parse_args()
65
 
 
 
 
 
 
 
 
 
 
 
 
 
 
66
  set_debug_mode(args.debug)
67
 
68
  print(f"""
 
78
  if args.mode == 'cli':
79
  print(f"语言设置: {args.language}")
80
  print(f"说话人: {args.speaker}")
 
 
81
  print("正在启动命令行语音对话系统...")
82
+ launch_system(args.language, args.speaker, args.disable_echo_cancellation)
83
 
84
  elif args.mode == 'api':
85
  launch_api_server(
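The removed `--list-audio-devices` branch printed one aligned row per input device. A minimal sketch of that formatting, assuming each device is a dict with the keys the removed code reads (`index`, `max_input_channels`, `default_sample_rate`, `is_default`, `name`). `format_device_table` and the sample devices are hypothetical, and English column headers stand in for the original Chinese ones:

```python
def format_device_table(devices):
    """Render input devices as aligned text rows (hypothetical helper)."""
    lines = [f"{'Idx':>4} {'Ch':>4} {'Rate':>7} {'Def':>4} Name"]
    for d in devices:
        # Mark the system default device, leave the column empty otherwise.
        default_mark = "*" if d["is_default"] else ""
        lines.append(
            f"{d['index']:>4} {d['max_input_channels']:>4} "
            f"{d['default_sample_rate']:>7} {default_mark:>4} {d['name']}"
        )
    return "\n".join(lines)


# Made-up sample data, not output from a real machine.
sample_devices = [
    {"index": 0, "max_input_channels": 1, "default_sample_rate": 44100,
     "is_default": True, "name": "Built-in Microphone"},
    {"index": 3, "max_input_channels": 2, "default_sample_rate": 48000,
     "is_default": False, "name": "USB Audio"},
]
print(format_device_table(sample_devices))
```

With this PR applied, the CLI no longer offers this listing, and `launch_system` no longer receives an input device index.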
pyproject.toml CHANGED
@@ -1,6 +1,6 @@
1
  [project]
2
  name = "voice_dialogue"
3
- version = "1.2.0"
4
  description = "一个基于AI的智能语音对话系统,支持实时语音识别、自然语言处理和语音合成"
5
  readme = "README.md"
6
  requires-python = ">=3.11"
@@ -8,11 +8,11 @@ dependencies = [
8
  "cn2an>=0.5.23",
9
  "einops>=0.8.1",
10
  "en-core-web-sm",
11
- "fastapi==0.136.3",
12
  "ffmpeg-python>=0.2.0",
13
  "funasr-onnx==0.4.1",
14
  "g2p-en>=2.1.0",
15
- "huggingface-hub==0.36.2",
16
  "jieba>=0.42.1",
17
  "jieba-fast>=0.53",
18
  "langchain==0.2.17",
@@ -29,11 +29,10 @@ dependencies = [
29
  "pypinyin>=0.54.0",
30
  "pytorch-lightning==2.3.1",
31
  "pywhispercpp",
32
- "qwen-asr>=0.0.6",
33
  "silero-vad==5.1.2",
34
  "soundfile==0.13.1",
35
  "torch==2.3.1",
36
- "transformers==4.57.6",
37
  "uvicorn==0.34.3",
38
  "websockets>=15.0.1",
39
  "wordsegment>=1.3.1",
 
1
  [project]
2
  name = "voice_dialogue"
3
+ version = "1.0.0"
4
  description = "一个基于AI的智能语音对话系统,支持实时语音识别、自然语言处理和语音合成"
5
  readme = "README.md"
6
  requires-python = ">=3.11"
 
8
  "cn2an>=0.5.23",
9
  "einops>=0.8.1",
10
  "en-core-web-sm",
11
+ "fastapi==0.115.12",
12
  "ffmpeg-python>=0.2.0",
13
  "funasr-onnx==0.4.1",
14
  "g2p-en>=2.1.0",
15
+ "huggingface-hub==0.32.4",
16
  "jieba>=0.42.1",
17
  "jieba-fast>=0.53",
18
  "langchain==0.2.17",
 
29
  "pypinyin>=0.54.0",
30
  "pytorch-lightning==2.3.1",
31
  "pywhispercpp",
 
32
  "silero-vad==5.1.2",
33
  "soundfile==0.13.1",
34
  "torch==2.3.1",
35
+ "transformers==4.41.2",
36
  "uvicorn==0.34.3",
37
  "websockets>=15.0.1",
38
  "wordsegment>=1.3.1",
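Read in the direction shown, this hunk lowers every pin it touches. A small sketch that checks this, assuming plain dotted numeric versions with no pre-release tags. `parse_version` is a hypothetical helper, and the version strings are copied from the hunk above:

```python
def parse_version(text):
    """Split a dotted numeric version into a tuple of ints for comparison."""
    return tuple(int(part) for part in text.split("."))


# (old pin, new pin) pairs taken from the pyproject.toml diff.
pins = {
    "voice_dialogue": ("1.2.0", "1.0.0"),
    "fastapi": ("0.136.3", "0.115.12"),
    "huggingface-hub": ("0.36.2", "0.32.4"),
    "transformers": ("4.57.6", "4.41.2"),
}

# Tuples compare numerically, so 0.115.12 sorts below 0.136.3.
downgraded = sorted(
    name for name, (old, new) in pins.items()
    if parse_version(new) < parse_version(old)
)
print(downgraded)
```

Comparing integer tuples rather than raw strings avoids ordering mistakes when a component has more than one digit.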
scripts/convert_tts_weights_to_safetensors.py DELETED
@@ -1,47 +0,0 @@
1
- """Convert TTS pretrained weights (.bin) to safetensors.
2
-
3
- The qwen-asr branch upgrades transformers to 4.57+, whose security policy (CVE-2025-32434)
4
- refuses to load pytorch_model.bin on torch < 2.6. transformers prefers
5
- model.safetensors when loading, so converting once locally is enough and torch does not need upgrading.
6
-
7
- Usage: python scripts/convert_tts_weights_to_safetensors.py
8
- """
9
- from pathlib import Path
10
-
11
- import torch
12
- from safetensors.torch import save_file
13
-
14
- MOYOYO_PRETRAINED_PATH = Path(__file__).parent.parent / "assets" / "models" / "tts" / "moyoyo"
15
-
16
- PRETRAINED_DIRS = [
17
- "chinese-roberta-wwm-ext-large",
18
- "chinese-hubert-base",
19
- ]
20
-
21
-
22
- def main():
23
- for dirname in PRETRAINED_DIRS:
24
- model_dir = MOYOYO_PRETRAINED_PATH / dirname
25
- bin_path = model_dir / "pytorch_model.bin"
26
- st_path = model_dir / "model.safetensors"
27
-
28
- if st_path.exists():
29
- print(f"已存在,跳过: {st_path}")
30
- continue
31
- if not bin_path.exists():
32
- print(f"找不到权重文件: {bin_path}")
33
- continue
34
-
35
- state_dict = torch.load(bin_path, map_location="cpu", weights_only=True)
36
- # clone() breaks shared memory; safetensors does not allow tensors to share storage
37
- state_dict = {
38
- key: value.clone().contiguous()
39
- for key, value in state_dict.items()
40
- if isinstance(value, torch.Tensor)
41
- }
42
- save_file(state_dict, st_path, metadata={"format": "pt"})
43
- print(f"{dirname}: {len(state_dict)} tensors -> {st_path.stat().st_size // 1024 ** 2} MB")
44
-
45
-
46
- if __name__ == "__main__":
47
- main()
src/voice_dialogue/api/app.py CHANGED
@@ -59,8 +59,7 @@ def _register_routes(app: FastAPI):
     v1_router.include_router(settings_routes.router, prefix="/settings", tags=["设置管理"])
     app.include_router(v1_router)
 
-    # starlette >= 1.0 removed add_websocket_route; the ws router already carries its full path, so include it directly
-    app.include_router(websocket_routes.ws)
+    app.add_websocket_route("/api/v1/ws", websocket_routes.ws)
 
     # Root path and health check
     _register_health_routes(app)
src/voice_dialogue/api/core/lifespan.py CHANGED
@@ -24,8 +24,8 @@ class LifespanManager:
         startup_start_time = time.time()
 
         try:
-            # Initialize the system language: the product defaults to Chinese (independent of the OS language)
-            system_language = 'zh'
+            # Initialize the system language
+            system_language = get_system_language()
             logger.info(f"系统默认语言: {system_language}")
 
             # Initialize the TTS configuration
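Reviewer note: this hunk switches the startup language from a hard-coded `'zh'` back to `get_system_language()`, whose body is not part of this diff. As a rough sketch only, a helper of that kind could map the OS locale to one of the two languages the ASR mapping knows about. The names `normalize_language`, `detect_system_language`, and `SUPPORTED_LANGUAGES` are hypothetical, and the default of `'en'` is an assumption rather than the project's behavior.

```python
import locale

# Assumption: only the two languages that appear in this PR's ASR mapping.
SUPPORTED_LANGUAGES = ("zh", "en")


def normalize_language(locale_name, default="en"):
    """Map a locale string such as 'zh_CN' or 'en-US' to a two-letter code."""
    if not locale_name:
        return default
    code = locale_name.replace("-", "_").split("_")[0].lower()
    return code if code in SUPPORTED_LANGUAGES else default


def detect_system_language(default="en"):
    """Best-effort OS language lookup; a hypothetical stand-in for get_system_language()."""
    try:
        locale_name, _encoding = locale.getlocale()
    except (ValueError, TypeError):
        # Some platforms report locale names that the locale module cannot parse.
        locale_name = None
    return normalize_language(locale_name, default)
```

With this shape, an unrecognized or missing locale falls through to the default instead of raising during startup.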
src/voice_dialogue/api/core/service_factories.py CHANGED
@@ -12,15 +12,11 @@ class ServiceFactories:
     """Service factory class that encapsulates the creation logic for all services"""
 
     @staticmethod
-    def create_audio_capture(
-            enable_echo_cancellation: bool = True,
-            input_device_index: int = None,
-    ) -> AudioCapture:
+    def create_audio_capture(enable_echo_cancellation: bool = True) -> AudioCapture:
         """Create the audio capture service"""
         return AudioCapture(
             audio_frames_queue=audio_frames_queue,
-            enable_echo_cancellation=enable_echo_cancellation,
-            input_device_index=input_device_index,
+            enable_echo_cancellation=enable_echo_cancellation
         )
 
     @staticmethod
@@ -134,14 +130,11 @@ def get_core_voice_service_definitions(system_language: str, tts_config: BaseTTS
     ]
 
 
-def get_audio_capture_service_definition(
-        enable_echo_cancellation: bool = True,
-        input_device_index: int = None,
-) -> ServiceDefinition:
+def get_audio_capture_service_definition(enable_echo_cancellation: bool = True) -> ServiceDefinition:
     """Get the audio capture service definition"""
     return ServiceDefinition(
         name="audio_capture",
-        factory=lambda: ServiceFactories.create_audio_capture(enable_echo_cancellation, input_device_index),
+        factory=lambda: ServiceFactories.create_audio_capture(enable_echo_cancellation),
         dependencies=[],
         health_check=lambda service: hasattr(service, 'is_ready') and service.is_ready
     )
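Reviewer note: the service definition defers construction through a zero-argument `factory` lambda that closes over the start-up flag. The sketch below illustrates that pattern in isolation. `ServiceDefinition` here is a simplified stand-in that only mirrors the four fields visible in the diff, and `FakeCapture` is a hypothetical placeholder for `AudioCapture`.

```python
from dataclasses import dataclass
from typing import Any, Callable, List


@dataclass
class ServiceDefinition:
    """Simplified stand-in; only the fields visible in the diff are modelled."""
    name: str
    factory: Callable[[], Any]
    dependencies: List[str]
    health_check: Callable[[Any], bool]


class FakeCapture:
    """Hypothetical placeholder for AudioCapture."""

    def __init__(self, enable_echo_cancellation=True):
        self.enable_echo_cancellation = enable_echo_cancellation
        self.is_ready = True


def get_audio_capture_service_definition(enable_echo_cancellation=True):
    return ServiceDefinition(
        name="audio_capture",
        # The lambda captures the flag now; the service is only built when
        # the service manager eventually calls factory().
        factory=lambda: FakeCapture(enable_echo_cancellation),
        dependencies=[],
        health_check=lambda service: hasattr(service, "is_ready") and service.is_ready,
    )


definition = get_audio_capture_service_definition(enable_echo_cancellation=False)
service = definition.factory()
```

Because the flag is bound at definition time, the manager can call `factory()` later without knowing which arguments the service needs.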
src/voice_dialogue/api/routes/system_routes.py CHANGED
@@ -3,33 +3,15 @@ import time
 
 from fastapi import APIRouter, HTTPException, BackgroundTasks, Request
 
-from voice_dialogue.audio.capture import resolves_to_native_aec
-from voice_dialogue.audio.devices import (
-    list_input_devices, get_default_input_device_index, is_valid_input_device,
-    list_output_devices, get_default_output_device_index, is_valid_output_device,
-)
-from voice_dialogue.config.audio_config import (
-    get_input_device_index, save_input_device_index,
-    get_output_device_index, save_output_device_index,
-)
 from voice_dialogue.core.constants import session_manager
 from voice_dialogue.utils.logger import logger
 from ..core.service_factories import get_audio_capture_service_definition, get_speech_monitor_service_definition
 from ..schemas.system_schemas import (
-    SystemStatusResponse, SystemResponse, SystemStartRequest,
-    AudioInputDevicesResponse, AudioInputDevice, AudioOutputDevice, ASREngineResponse,
-    OutputDeviceRequest
+    SystemStatusResponse, SystemResponse, SystemStartRequest
 )
 
 router = APIRouter()
 
-# ASR engine registry name -> display name
-ASR_ENGINE_DISPLAY_NAMES = {
-    'qwen': 'Qwen3-ASR-1.7B',
-    'funasr': 'FunASR Paraformer',
-    'whisper': 'Whisper medium',
-}
-
 # Global system status
 _system_status = {
     "status": "stopped",
@@ -78,59 +60,6 @@ async def get_system_status(request: Request):
         raise HTTPException(status_code=500, detail=f"获取系统状态失败: {str(e)}")
 
 
-@router.get("/audio-devices", response_model=AudioInputDevicesResponse, summary="获取可用音频输入设备")
-async def get_audio_devices():
-    """
-    List all available audio input devices on the system (including external microphones and microphone arrays)
-    so that the frontend can choose a capture device.
-    """
-    try:
-        devices = [AudioInputDevice(**d) for d in list_input_devices()]
-        output_devices = [AudioOutputDevice(**d) for d in list_output_devices()]
-        return AudioInputDevicesResponse(
-            devices=devices,
-            current_device_index=get_input_device_index(),
-            default_device_index=get_default_input_device_index(),
-            output_devices=output_devices,
-            current_output_device_index=get_output_device_index(),
-            default_output_device_index=get_default_output_device_index(),
-        )
-    except Exception as e:
-        logger.error(f"获取音频输入设备失败: {e}", exc_info=True)
-        raise HTTPException(status_code=500, detail=f"获取音频输入设备失败: {str(e)}")
-
-
-@router.post("/audio-output-device", response_model=SystemResponse, summary="设置音频输出设备")
-async def set_audio_output_device(request: OutputDeviceRequest):
-    """
-    Save the output device selection. The playback service reads this setting on every playback,
-    so a change made mid-session takes effect on the next sentence without a restart.
-    """
-    output_device_index = request.output_device_index
-    if not is_valid_output_device(output_device_index):
-        raise HTTPException(status_code=400, detail=f"无效的输出设备索引: {output_device_index}")
-    if not save_output_device_index(output_device_index):
-        raise HTTPException(status_code=500, detail="保存输出设备设置失败")
-    return SystemResponse(success=True, message="输出设备已更新")
-
-
-@router.get("/asr-engine", response_model=ASREngineResponse, summary="获取当前 ASR 引擎")
-async def get_asr_engine():
-    """
-    Return the ASR engine currently in effect (language mapping + display name)
-    so that the frontend can show the recognition model actually in use on the home and about pages.
-    """
-    try:
-        from voice_dialogue.asr import asr_manager
-        mappings = asr_manager.get_asr_statistics()['language_mappings']
-        engines = sorted(set(mappings.values()))
-        display_name = ' + '.join(ASR_ENGINE_DISPLAY_NAMES.get(engine, engine) for engine in engines)
-        return ASREngineResponse(mappings=mappings, display_name=display_name)
-    except Exception as e:
-        logger.error(f"获取ASR引擎信息失败: {e}", exc_info=True)
-        raise HTTPException(status_code=500, detail=f"获取ASR引擎信息失败: {str(e)}")
-
-
 @router.post("/start", response_model=SystemResponse, summary="启动系统")
 async def start_system(
         request: SystemStartRequest,
@@ -147,30 +76,6 @@ async def start_system(
                 message="系统已经在运行中或正在启动"
             )
 
-        # Resolve the input device: fall back to the saved device when the request does not specify one
-        input_device_index = request.input_device_index
-        if input_device_index is None:
-            input_device_index = get_input_device_index()
-
-        if not is_valid_input_device(input_device_index):
-            logger.warning(f"请求的输入设备 {input_device_index} 无效,回退到系统默认设备")
-            input_device_index = None
-
-        # Persist the user's choice for reuse on the next start
-        save_input_device_index(input_device_index)
-
-        # Resolve the output device: fall back to the saved device when the request does not specify one
-        output_device_index = request.output_device_index
-        if output_device_index is None:
-            output_device_index = get_output_device_index()
-
-        if not is_valid_output_device(output_device_index):
-            logger.warning(f"请求的输出设备 {output_device_index} 无效,回退到系统默认设备")
-            output_device_index = None
-
-        # The playback service reads this setting on every playback, so saving it takes effect immediately
-        save_output_device_index(output_device_index)
-
         # Update the status
         _system_status["status"] = "starting"
         session_manager.reset_id()
@@ -179,8 +84,7 @@ async def start_system(
         background_tasks.add_task(
             _start_system_background,
             fastapi_request,
-            request.enable_echo_cancellation,
-            input_device_index,
+            request.enable_echo_cancellation
         )
 
         return SystemResponse(
@@ -310,11 +214,7 @@ async def restart_system(
         raise HTTPException(status_code=500, detail=f"系统重启失败: {str(e)}")
 
 
-async def _start_system_background(
-        request: Request,
-        enable_echo_cancellation: bool = True,
-        input_device_index: int = None,
-):
+async def _start_system_background(request: Request, enable_echo_cancellation: bool = True):
     """
     The actual logic for starting the system in the background: create and start the audio_capture service
     """
@@ -357,9 +257,7 @@ async def _start_system_background(
             logger.info("语音监控服务已在运行")
         else:
             # Create the speech monitor service definition
-            # Disable software VAD only when capture goes through macOS native AEC (which has built-in VAD);
-            # when an external device is selected and capture goes through PyAudio, software VAD must be enabled.
-            enable_vad = not resolves_to_native_aec(enable_echo_cancellation, input_device_index)
+            enable_vad = not enable_echo_cancellation
             speech_monitor_def = get_speech_monitor_service_definition(enable_vad)
 
             # Start the speech monitor service
@@ -373,7 +271,7 @@ async def _start_system_background(
             logger.info("音频捕获服务已在运行")
         else:
             # Create the audio_capture service definition
-            audio_capture_def = get_audio_capture_service_definition(enable_echo_cancellation, input_device_index)
+            audio_capture_def = get_audio_capture_service_definition(enable_echo_cancellation)
 
             # Start the audio_capture service
             success = service_manager.start_service(audio_capture_def)
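Reviewer note: the removed block in `start_system` resolved each device in three steps: take the requested index, fall back to the saved one, and drop to the system default (`None`) when the result is not a valid device. A minimal sketch of that rule as a pure function may make the removed behavior easier to reason about. `resolve_device_index` and its `valid_indices` parameter are hypothetical; the original code called `is_valid_input_device` / `is_valid_output_device` instead.

```python
from typing import Iterable, Optional


def resolve_device_index(
    requested: Optional[int],
    saved: Optional[int],
    valid_indices: Iterable[int],
) -> Optional[int]:
    """Return the device index to use, or None for the system default device."""
    # Compare against None explicitly: index 0 is a legitimate device.
    candidate = requested if requested is not None else saved
    if candidate is None:
        return None
    if candidate not in set(valid_indices):
        # Invalid or unplugged device: fall back to the system default.
        return None
    return candidate
```

After this PR none of this runs: the endpoint forwards only `enable_echo_cancellation`, and software VAD is enabled exactly when echo cancellation is off.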
src/voice_dialogue/api/schemas/system_schemas.py CHANGED
@@ -1,4 +1,4 @@
-from typing import Optional, Literal, Dict, Any, List
+from typing import Optional, Literal, Dict, Any
 
 from pydantic import BaseModel, Field
 
@@ -17,51 +17,10 @@ class SystemStatusResponse(BaseModel):
 
 class SystemStartRequest(BaseModel):
     """System start request"""
-    enable_echo_cancellation: bool = Field(default=True, description="是否启用回声消除(仅在未指定输入设备时使用 macOS 原生 AEC)")
-    input_device_index: Optional[int] = Field(default=None, description="输入设备索引(如外置麦克风阵列);为空则使用系统默认设备")
-    output_device_index: Optional[int] = Field(default=None, description="输出设备索引(如外置扬声器);为空则使用系统默认设备")
-
-
-class AudioInputDevice(BaseModel):
-    """Audio input device information"""
-    index: int = Field(..., description="设备索引")
-    name: str = Field(..., description="设备名称")
-    max_input_channels: int = Field(..., description="最大输入通道数")
-    default_sample_rate: int = Field(..., description="设备默认采样率")
-    is_default: bool = Field(default=False, description="是否为系统默认输入设备")
-
-
-class AudioOutputDevice(BaseModel):
-    """Audio output device information"""
-    index: int = Field(..., description="设备索引")
-    name: str = Field(..., description="设备名称")
-    max_output_channels: int = Field(..., description="最大输出通道数")
-    default_sample_rate: int = Field(..., description="设备默认采样率")
-    is_default: bool = Field(default=False, description="是否为系统默认输出设备")
-
-
-class AudioInputDevicesResponse(BaseModel):
-    """Audio device list response (covers both input and output devices)"""
-    devices: List[AudioInputDevice] = Field(default_factory=list, description="可用输入设备列表")
-    current_device_index: Optional[int] = Field(default=None, description="当前已选择/保存的输入设备索引")
-    default_device_index: Optional[int] = Field(default=None, description="系统默认输入设备索引")
-    output_devices: List[AudioOutputDevice] = Field(default_factory=list, description="可用输出设备列表")
-    current_output_device_index: Optional[int] = Field(default=None, description="当前已选择/保存的输出设备索引")
-    default_output_device_index: Optional[int] = Field(default=None, description="系统默认输出设备索引")
+    enable_echo_cancellation: bool = Field(default=True, description="是否启用回声消除")
 
 
 class SystemResponse(BaseModel):
     """System operation response"""
     success: bool = Field(..., description="操作是否成功")
     message: str = Field(..., description="响应消息")
-
-
-class OutputDeviceRequest(BaseModel):
-    """Request to set the output device"""
-    output_device_index: Optional[int] = Field(default=None, description="输出设备索引;为空则使用系统默认设备")
-
-
-class ASREngineResponse(BaseModel):
-    """Information about the current ASR engine"""
-    mappings: Dict[str, str] = Field(default_factory=dict, description="语言到 ASR 引擎的映射,如 {'zh': 'qwen'}")
-    display_name: str = Field(..., description="当前 ASR 引擎的展示名称")
src/voice_dialogue/asr/manager.py CHANGED
@@ -1,6 +1,5 @@
 import importlib.util
 import inspect
-import os
 import re
 from dataclasses import dataclass
 from typing import Dict, Type, List, Literal, Optional
@@ -93,26 +92,11 @@ class ASRManager:
 
     def __init__(self):
         self._asr_instances: Dict[str, ASRInterface] = {}
-        # Default to Qwen3-ASR; set VOICE_DIALOGUE_ASR=legacy to switch back to the original engines for A/B comparison
-        if os.environ.get('VOICE_DIALOGUE_ASR', 'qwen') == 'legacy':
-            self._language_to_asr_mapping = {
-                'zh': 'funasr',  # prefer FunASR for Chinese
-                'en': 'whisper',  # prefer Whisper for English
-            }
-        else:
-            self._language_to_asr_mapping = {
-                'zh': 'qwen',
-                'en': 'qwen',
-            }
-
-    def _resolve_unregistered(self, language: str, asr_type: str) -> str:
-        """Fall back to the legacy engine when the selected engine is not registered (e.g. qwen-asr is not installed)."""
-        fallback = {'zh': 'funasr', 'en': 'whisper'}.get(language)
-        if fallback and fallback in asr_tables.asr_classes:
-            logger.warning(f"ASR引擎 '{asr_type}' 未注册,回退到 '{fallback}'")
-            self._language_to_asr_mapping[language] = fallback
-            return fallback
-        return asr_type
+        self._language_to_asr_mapping = {
+            'zh': 'funasr',  # prefer FunASR for Chinese
+            'en': 'whisper',  # prefer Whisper for English
+            # 'auto': 'whisper',  # auto-detection defaults to Whisper
+        }
 
     def create_asr(self, language: Literal['auto', 'zh', 'en']) -> ASRInterface:
         """
@@ -131,9 +115,6 @@ class ASRManager:
         # Choose the appropriate ASR engine for the language
         asr_type = self._get_asr_type_for_language(language)
 
-        if asr_type not in asr_tables.asr_classes:
-            asr_type = self._resolve_unregistered(language, asr_type)
-
         if asr_type not in asr_tables.asr_classes:
             raise ValueError(f"ASR类型 '{asr_type}' 未注册")
 
src/voice_dialogue/asr/models/__init__.py CHANGED
@@ -19,12 +19,3 @@ except ImportError as e:
     from voice_dialogue.utils.logger import logger
 
     logger.warning(f"Failed to import some Whisper implementations: {e}")
-
-try:
-    from .qwen import QwenASRClient
-
-    __all__.append('QwenASRClient')
-except ImportError as e:
-    from voice_dialogue.utils.logger import logger
-
-    logger.warning(f"Failed to import some Qwen ASR implementations: {e}")
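Reviewer note: the two removals in `asr/manager.py` and `asr/models/__init__.py` worked together. The guarded import only registered `qwen` when the optional package was importable, and `_resolve_unregistered` mapped the language back to a legacy engine when it was not. The sketch below reproduces that interaction with a plain dictionary registry. `ASR_CLASSES`, `register`, `import_optional_engine`, `resolve_engine`, and the stub classes are hypothetical simplifications of `asr_tables` and the real clients.

```python
import importlib

ASR_CLASSES = {}  # hypothetical stand-in for asr_tables.asr_classes
LEGACY_FALLBACK = {"zh": "funasr", "en": "whisper"}


def register(name):
    """Class decorator that records an engine under its registry name."""
    def decorator(cls):
        ASR_CLASSES[name] = cls
        return cls
    return decorator


@register("funasr")
class FunASRStub:
    pass


@register("whisper")
class WhisperStub:
    pass


def import_optional_engine(module_name):
    """Import a module for its registration side effect; report success."""
    try:
        importlib.import_module(module_name)
        return True
    except ImportError:
        return False


def resolve_engine(language, preferred):
    """Use the preferred engine if registered, else the legacy one for the language."""
    if preferred in ASR_CLASSES:
        return preferred
    fallback = LEGACY_FALLBACK.get(language)
    if fallback in ASR_CLASSES:
        return fallback
    raise ValueError(f"ASR type '{preferred}' is not registered")
```

With both pieces removed, an unregistered engine name now goes straight to the `ValueError` in `create_asr`.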
src/voice_dialogue/asr/models/qwen.py DELETED
@@ -1,76 +0,0 @@
-import os
-import typing
-
-import numpy as np
-import torch
-from qwen_asr import Qwen3ASRModel
-
-from voice_dialogue.asr.manager import asr_tables
-from voice_dialogue.asr.models.base import ASRInterface
-from voice_dialogue.asr.utils import ensure_minimum_audio_duration
-from voice_dialogue.config import paths
-from voice_dialogue.utils.logger import logger
-
-# Built-in model directory (shipped with the app when packaged; loaded offline if present)
-BUILTIN_QWEN_ASR_MODEL_PATH = paths.ASR_MODELS_PATH / 'qwen3-asr-1.7b'
-
-TARGET_SAMPLE_RATE = 16000
-
-
-def resolve_model_path() -> str:
-    """Model source priority: environment variable > built-in directory > automatic HuggingFace download."""
-    env_model = os.environ.get('QWEN_ASR_MODEL')
-    if env_model:
-        return env_model
-    if (BUILTIN_QWEN_ASR_MODEL_PATH / 'config.json').exists():
-        return BUILTIN_QWEN_ASR_MODEL_PATH.as_posix()
-    return 'Qwen/Qwen3-ASR-1.7B'
-
-
-@asr_tables.register('asr_classes', 'qwen')
-class QwenASRClient(ASRInterface):
-    """Qwen3-ASR client (transformers backend; uses MPS acceleration on macOS)"""
-    supported_langs = ['zh', 'en']
-
-    def __init__(self):
-        super().__init__()
-        self.model: typing.Optional[Qwen3ASRModel] = None
-
-    def setup(self, **kwargs) -> None:
-        model_name = kwargs.get('model') or resolve_model_path()
-
-        if torch.backends.mps.is_available():
-            device_map, dtype = 'mps', torch.bfloat16
-        elif torch.cuda.is_available():
-            device_map, dtype = 'cuda:0', torch.bfloat16
-        else:
-            device_map, dtype = 'cpu', torch.float32
-
-        logger.info(f'[INFO] Loading Qwen3-ASR model: {model_name} (device={device_map}, dtype={dtype})')
-        self.model = Qwen3ASRModel.from_pretrained(
-            model_name,
-            dtype=dtype,
-            device_map=device_map,
-            max_inference_batch_size=1,
-            max_new_tokens=256,
-        )
-
-    def warmup(self) -> None:
-        logger.info('[INFO] Warming up Qwen3-ASR model...')
-        try:
-            self.transcribe(self.warmup_audiodata)
-            logger.info('[INFO] Qwen3-ASR model warmed up.')
-        except Exception as e:
-            logger.warning(f'[WARNING] Qwen3-ASR model warmup failed: {e}')
-
-    def transcribe(self, audio_array: np.ndarray, language: str = None) -> str:
-        audio_array = ensure_minimum_audio_duration(audio_array)
-
-        # Always use automatic language detection: specifying a language forces the model to "output only the
-        # transcription", so silent or noisy segments get hallucinated text; in auto mode non-speech segments
-        # return an empty string, which upstream discards, eliminating hallucinations at the root.
-        results = self.model.transcribe(
-            audio=(audio_array, TARGET_SAMPLE_RATE),
-            language=None,
-        )
-        return ' '.join(result.text for result in results).strip()
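Reviewer note: the deleted `resolve_model_path()` picks the model source in a fixed order: environment variable, then a bundled directory that contains `config.json`, then the hub identifier. The sketch below restates that order with the inputs passed in explicitly so it can be exercised without the project's `paths` module. The explicit parameters (`builtin_dir`, `hub_id`, `environ`) and the example path `/models/custom` are assumptions added for illustration.

```python
import os
import tempfile
from pathlib import Path


def resolve_model_path(builtin_dir, hub_id, env_var="QWEN_ASR_MODEL", environ=None):
    """Priority: environment variable > bundled directory > hub identifier."""
    environ = os.environ if environ is None else environ
    env_model = environ.get(env_var)
    if env_model:
        return env_model
    builtin_dir = Path(builtin_dir)
    # A bundled copy only counts if it looks like a complete model directory.
    if (builtin_dir / "config.json").exists():
        return builtin_dir.as_posix()
    return hub_id


HUB_ID = "Qwen/Qwen3-ASR-1.7B"

with tempfile.TemporaryDirectory() as tmp:
    # 1) The environment variable wins over everything else.
    from_env = resolve_model_path(tmp, HUB_ID, environ={"QWEN_ASR_MODEL": "/models/custom"})
    # 2) Empty bundled directory: fall through to the hub identifier.
    from_hub = resolve_model_path(tmp, HUB_ID, environ={})
    # 3) Bundled directory with config.json: load offline from disk.
    (Path(tmp) / "config.json").write_text("{}")
    from_builtin = resolve_model_path(tmp, HUB_ID, environ={})
```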
src/voice_dialogue/audio/capture/__init__.py CHANGED
@@ -4,43 +4,12 @@
 Selects and manages the concrete audio capture strategy based on the configuration.
 """
 from multiprocessing import Queue
-from typing import Optional
 
 from voice_dialogue.utils.logger import logger
 from .aec_capture import AecCapture
 from .pyaudio_capture import PyAudioCapture
 
 
-def resolves_to_native_aec(
-        enable_echo_cancellation: bool,
-        input_device_index: Optional[int] = None,
-) -> bool:
-    """
-    Decide whether the macOS native AEC capture strategy is used under the given configuration.
-
-    The native AEC library works on the system default input device and has built-in VAD. Native AEC is therefore used in either of these cases:
-    - echo cancellation is enabled and no specific input device is given (the default device is used implicitly);
-    - echo cancellation is enabled and the selected device happens to be the system default input device
-      (native AEC captures the default device anyway, so the result is equivalent).
-
-    Only when a "non-default" input device (such as an external microphone array) is selected does capture degrade to the PyAudio strategy.
-    In that case echo cancellation relies on the device's own hardware, and voice activity detection switches to software VAD.
-
-    The caller uses this to decide whether SpeechStateMonitor needs software VAD
-    (enable_vad = not resolves_to_native_aec(...)).
-    """
-    if not enable_echo_cancellation:
-        return False
-    if input_device_index is None:
-        return True
-    # When the selected device is the system default device, native AEC can still be used
-    try:
-        from voice_dialogue.audio.devices import get_default_input_device_index
-        return input_device_index == get_default_input_device_index()
-    except Exception:
-        return False
-
-
 class AudioCapture:
     """
     Audio capture facade (Facade).
@@ -54,44 +23,29 @@ class AudioCapture:
             self,
             audio_frames_queue: Queue,
             enable_echo_cancellation: bool = True,
-            input_device_index: Optional[int] = None,
-            channels: Optional[int] = None,
     ):
         """
         Initialize the audio capturer.
 
         Args:
             audio_frames_queue (Queue): Queue that receives the captured audio frames.
-            enable_echo_cancellation (bool): Whether to enable echo cancellation. Only takes effect when
-                                             input_device_index is not given (uses the macOS
-                                             native AEC library on the system default input device).
-            input_device_index (Optional[int]): Index of the input device to use (such as an external microphone array).
-                                                Once given, the PyAudio strategy captures from that device,
-                                                and echo cancellation relies on the device hardware.
-            channels (Optional[int]): Number of capture channels (PyAudio strategy only; multi-channel input is downmixed to mono).
+            enable_echo_cancellation (bool): Whether to enable echo cancellation.
+                                             If True, use the native AEC library;
+                                             otherwise use PyAudio.
         """
-        use_native_aec = resolves_to_native_aec(enable_echo_cancellation, input_device_index)
         self._strategy = None
         try:
-            if use_native_aec:
+            if enable_echo_cancellation:
                 self._strategy = AecCapture(audio_frames_queue=audio_frames_queue)
             else:
-                self._strategy = PyAudioCapture(
-                    audio_frames_queue=audio_frames_queue,
-                    input_device_index=input_device_index,
-                    channels=channels,
-                )
+                self._strategy = PyAudioCapture(audio_frames_queue=audio_frames_queue)
             logger.info(f"音频捕获策略已选择: {self._strategy.__class__.__name__}")
         except Exception as e:
             logger.error(
-                f"初始化 {AecCapture.__name__ if use_native_aec else PyAudioCapture.__name__} 失败: {e}, 将回退到 PyAudio。")
+                f"初始化 {AecCapture.__name__ if enable_echo_cancellation else PyAudioCapture.__name__} 失败: {e}, 将回退到 PyAudio。")
             # Fall back only when the AEC attempt failed
             if not isinstance(self._strategy, PyAudioCapture):
-                self._strategy = PyAudioCapture(
-                    audio_frames_queue=audio_frames_queue,
-                    input_device_index=input_device_index,
-                    channels=channels,
-                )
+                self._strategy = PyAudioCapture(audio_frames_queue=audio_frames_queue)
             logger.info(f"已回退到音频捕获策略: {self._strategy.__class__.__name__}")
 
     def start(self):
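Reviewer note: the removed `resolves_to_native_aec` helper encodes a small decision table, and the facade adds a fallback to PyAudio when the first strategy fails to initialize. The sketch below restates both as pure functions. Passing `default_device_index` as an argument (instead of querying PyAudio) and the `choose_strategy` helper with its factory callables are assumptions made so the logic can run without audio hardware.

```python
def resolves_to_native_aec(enable_echo_cancellation, input_device_index=None,
                           default_device_index=None):
    """True when capture would go through the native AEC strategy."""
    if not enable_echo_cancellation:
        return False
    if input_device_index is None:
        return True  # no explicit device: the default device is used implicitly
    # An explicitly chosen device still qualifies if it is the default one.
    return input_device_index == default_device_index


def choose_strategy(use_native_aec, make_aec, make_pyaudio):
    """Build the preferred strategy, falling back to PyAudio if that fails."""
    try:
        return make_aec() if use_native_aec else make_pyaudio()
    except Exception:
        return make_pyaudio()


def failing_aec():
    # Simulates the native AEC library being unavailable.
    raise RuntimeError("native AEC library unavailable")
```

After this PR the table collapses to a single input: native AEC is attempted whenever `enable_echo_cancellation` is true, and software VAD is enabled whenever it is false.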
src/voice_dialogue/audio/capture/pyaudio_capture.py CHANGED
@@ -1,130 +1,41 @@
 from multiprocessing import Queue
-from typing import Optional
 
-import numpy as np
 import pyaudio
 
 from voice_dialogue.utils.logger import logger
 from .base_capture import BaseCapture
 
-# Downstream ASR / VAD uniformly require 16 kHz mono int16 audio
-TARGET_SAMPLE_RATE = 16000
-
 
 class PyAudioCapture(BaseCapture):
     """
     Standard audio capture strategy using PyAudio.
-
-    Supports selecting a specific input device (such as an external microphone array) and automatically downmixes
-    and resamples multi-channel, non-16 kHz input into the 16 kHz mono int16 data required downstream.
     """
 
-    def __init__(
-            self,
-            audio_frames_queue: Queue,
-            input_device_index: Optional[int] = None,
-            channels: Optional[int] = None,
-            **kwargs
-    ):
-        """
-        Args:
-            audio_frames_queue (Queue): Queue that receives the captured audio frames.
-            input_device_index (Optional[int]): Input device index; None means the system default device.
-            channels (Optional[int]): Number of capture channels; None means use the maximum the device supports
-                                      (microphone arrays are usually multi-channel and are downmixed to mono after capture).
-        """
+    def __init__(self, audio_frames_queue: Queue, **kwargs):
         super().__init__(audio_frames_queue=audio_frames_queue, **kwargs)
-        self.input_device_index = input_device_index
-        self.requested_channels = channels
-
-    def _resolve_device_params(self, p: pyaudio.PyAudio):
-        """Resolve the capture channel count and capture sample rate for the selected device."""
-        # Default parameters (system default device, mono, 16 kHz)
-        device_index = self.input_device_index
-        channels = self.requested_channels or 1
-        sample_rate = TARGET_SAMPLE_RATE
-
-        try:
-            if device_index is None:
-                device_index = int(p.get_default_input_device_info().get("index"))
-            info = p.get_device_info_by_index(device_index)
-            max_channels = int(info.get("maxInputChannels", 1)) or 1
-            # When no channel count is given, capture all device channels and downmix (suits microphone arrays)
-            if self.requested_channels is None:
-                channels = max_channels
-            else:
-                channels = min(self.requested_channels, max_channels)
-
-            # Try 16 kHz first; if the device does not support it, use the device default rate and resample later
-            device_rate = int(info.get("defaultSampleRate", TARGET_SAMPLE_RATE))
-            if not p.is_format_supported(
-                    rate=TARGET_SAMPLE_RATE,
-                    input_device=device_index,
-                    input_channels=channels,
-                    input_format=pyaudio.paInt16,
-            ):
-                sample_rate = device_rate
-        except Exception as e:
-            logger.warning(f"解析输入设备参数失败,回退到默认设备/单声道/16kHz: {e}")
-            device_index = self.input_device_index
-            channels = 1
-            sample_rate = TARGET_SAMPLE_RATE
-
-        return device_index, channels, sample_rate
 
     def _init_pyaudio(self):
         """Initialize PyAudio and return the instance and configuration."""
         p = pyaudio.PyAudio()
-        device_index, channels, sample_rate = self._resolve_device_params(p)
-        # Capture chunk size is about 64 ms at the capture sample rate, so frames stay long enough for VAD after resampling
-        chunk = max(1024, int(sample_rate * 0.064))
-        logger.info(
-            f"PyAudio 采集配置: device_index={device_index}, channels={channels}, "
-            f"sample_rate={sample_rate} -> {TARGET_SAMPLE_RATE}, chunk={chunk}"
-        )
-        return p, chunk, sample_rate, channels, device_index
+        chunk = 1024
+        sample_rate = 16000
+        return p, chunk, sample_rate
 
-    def _open_stream(self, p, chunk, sample_rate, channels, device_index):
+    def _open_stream(self, p, chunk, sample_rate):
         """Open the PyAudio audio stream."""
         return p.open(
             format=pyaudio.paInt16,
-            channels=channels,
+            channels=1,
             rate=sample_rate,
             input=True,
-            input_device_index=device_index,
             frames_per_buffer=chunk,
         )
 
-    def _to_mono_16k(self, data: bytes, channels: int, sample_rate: int) -> Optional[bytes]:
-        """Downmix and resample raw multi-channel / arbitrary-rate int16 data to 16 kHz mono int16."""
-        samples = np.frombuffer(data, dtype=np.int16)
-        if samples.size == 0:
-            return None
-
-        # Downmix multiple channels to mono (average across channels)
-        if channels > 1:
-            frame_count = samples.size // channels
-            if frame_count == 0:
-                return None
-            samples = samples[:frame_count * channels].reshape(-1, channels)
-            mono = samples.astype(np.float32).mean(axis=1)
-        else:
-            mono = samples.astype(np.float32)
-
-        # Resample to 16 kHz
-        if sample_rate != TARGET_SAMPLE_RATE:
-            import soxr
-            mono = soxr.resample(mono, sample_rate, TARGET_SAMPLE_RATE)
-
-        return np.clip(mono, -32768, 32767).astype(np.int16).tobytes()
-
-    def _capture_loop(self, stream, chunk, channels, sample_rate):
+    def _capture_loop(self, stream, chunk):
         """Main loop of PyAudio audio capture."""
         logger.info("使用 PyAudio 开始音频采集...")
         self.is_ready = True
 
-        needs_processing = channels > 1 or sample_rate != TARGET_SAMPLE_RATE
-
         while not self.is_exited:
             data = stream.read(chunk, exception_on_overflow=False)
             if data is None:
@@ -133,11 +44,6 @@ class PyAudioCapture(BaseCapture):
             if self.is_paused:
                 continue
 
-            if needs_processing:
-                data = self._to_mono_16k(data, channels, sample_rate)
-                if data is None:
-                    continue
-
             self.audio_frames_queue.put(data)
 
     def _cleanup(self, stream, p):
@@ -151,11 +57,11 @@ class PyAudioCapture(BaseCapture):
         """
         Thread main loop; performs PyAudio audio capture.
         """
-        p, chunk, sample_rate, channels, device_index = self._init_pyaudio()
+        p, chunk, sample_rate = self._init_pyaudio()
         stream = None
         try:
-            stream = self._open_stream(p, chunk, sample_rate, channels, device_index)
-            self._capture_loop(stream, chunk, channels, sample_rate)
+            stream = self._open_stream(p, chunk, sample_rate)
+            self._capture_loop(stream, chunk)
         except Exception as e:
             logger.error(f'PyAudio 音频捕获器运行时发生错误: {e}')
         finally:
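Reviewer note: the removed `_to_mono_16k` averaged interleaved int16 channels into one channel before resampling. For readers who want to see the downmix step on its own, here is a dependency-free sketch using the standard-library `array` module instead of NumPy. It covers only the channel averaging; the resampling that the original delegated to `soxr` is out of scope. The function name `downmix_to_mono` is hypothetical.

```python
from array import array


def downmix_to_mono(data: bytes, channels: int) -> bytes:
    """Average interleaved native-endian int16 frames into a single channel."""
    samples = array("h")  # signed 16-bit samples
    # Drop any trailing partial frame so every frame has all its channels.
    usable = len(data) - len(data) % (samples.itemsize * channels)
    samples.frombytes(data[:usable])
    if channels == 1:
        return samples.tobytes()
    mono = array(
        "h",
        (
            # int() truncates toward zero; the mean of int16 values stays in range.
            int(sum(samples[i:i + channels]) / channels)
            for i in range(0, len(samples), channels)
        ),
    )
    return mono.tobytes()


# Two stereo frames: (100, 300) and (-200, -400).
stereo = array("h", [100, 300, -200, -400]).tobytes()
mono = array("h")
mono.frombytes(downmix_to_mono(stereo, channels=2))
```

After this PR the stream is opened with `channels=1` at 16000 Hz, so no downmix or resampling is applied to captured frames.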
src/voice_dialogue/audio/devices.py DELETED
@@ -1,167 +0,0 @@
1
- """
2
- 音频设备枚举工具。
3
-
4
- 提供列出系统可用输入/输出设备(包括外置麦克风阵列、外置扬声器)的能力,
5
- 供 CLI、API 以及前端进行设备选择。
6
- """
7
- from typing import List, Optional, TypedDict
8
-
9
- import pyaudio
10
-
11
- from voice_dialogue.utils.logger import logger
12
-
13
-
14
- class InputDeviceInfo(TypedDict):
15
- """输入设备信息。"""
16
- index: int
17
- name: str
18
- max_input_channels: int
19
- default_sample_rate: int
20
- is_default: bool
21
-
22
-
23
- class OutputDeviceInfo(TypedDict):
24
- """输出设备信息。"""
25
- index: int
26
- name: str
27
- max_output_channels: int
28
- default_sample_rate: int
29
- is_default: bool
30
-
31
-
32
- def _get_default_input_index(p: pyaudio.PyAudio) -> Optional[int]:
33
- """获取系统默认输入设备索引,失败时返回 None。"""
34
- try:
35
- return int(p.get_default_input_device_info().get("index"))
36
- except Exception:
37
- return None
38
-
39
-
40
- def list_input_devices() -> List[InputDeviceInfo]:
41
- """
42
- List all available audio input devices.
43
-
44
- Returns:
45
- List[InputDeviceInfo]: Input devices (only those with maxInputChannels > 0).
46
- """
47
- devices: List[InputDeviceInfo] = []
48
- p = pyaudio.PyAudio()
49
- try:
50
- default_index = _get_default_input_index(p)
51
- for i in range(p.get_device_count()):
52
- try:
53
- info = p.get_device_info_by_index(i)
54
- except Exception as e:
55
- logger.warning(f"Failed to read info for audio device {i}: {e}")
56
- continue
57
-
58
- max_input_channels = int(info.get("maxInputChannels", 0))
59
- if max_input_channels <= 0:
60
- continue
61
-
62
- devices.append(
63
- InputDeviceInfo(
64
- index=int(info.get("index", i)),
65
- name=str(info.get("name", f"device-{i}")),
66
- max_input_channels=max_input_channels,
67
- default_sample_rate=int(info.get("defaultSampleRate", 16000)),
68
- is_default=(int(info.get("index", i)) == default_index),
69
- )
70
- )
71
- finally:
72
- p.terminate()
73
-
74
- return devices
75
-
76
-
77
- def get_default_input_device_index() -> Optional[int]:
78
- """Return the system default input device index."""
79
- p = pyaudio.PyAudio()
80
- try:
81
- return _get_default_input_index(p)
82
- finally:
83
- p.terminate()
84
-
85
-
86
- def is_valid_input_device(index: Optional[int]) -> bool:
87
- """
88
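- Check whether the given index is a valid input device.
89
-
90
- Args:
91
- index: Device index; None means the system default device and is treated as valid.
92
-
93
- Returns:
94
- bool: Whether the index is valid.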
95
- """
96
- if index is None:
97
- return True
98
- return any(d["index"] == index for d in list_input_devices())
99
-
100
-
101
- def _get_default_output_index(p: pyaudio.PyAudio) -> Optional[int]:
102
- """Return the system default output device index, or None on failure."""
103
- try:
104
- return int(p.get_default_output_device_info().get("index"))
105
- except Exception:
106
- return None
107
-
108
-
109
- def list_output_devices() -> List[OutputDeviceInfo]:
110
- """
111
- List all available audio output devices.
112
-
113
- Returns:
114
- List[OutputDeviceInfo]: Output devices (only those with maxOutputChannels > 0).
115
- """
116
- devices: List[OutputDeviceInfo] = []
117
- p = pyaudio.PyAudio()
118
- try:
119
- default_index = _get_default_output_index(p)
120
- for i in range(p.get_device_count()):
121
- try:
122
- info = p.get_device_info_by_index(i)
123
- except Exception as e:
124
- logger.warning(f"Failed to read info for audio device {i}: {e}")
125
- continue
126
-
127
- max_output_channels = int(info.get("maxOutputChannels", 0))
128
- if max_output_channels <= 0:
129
- continue
130
-
131
- devices.append(
132
- OutputDeviceInfo(
133
- index=int(info.get("index", i)),
134
- name=str(info.get("name", f"device-{i}")),
135
- max_output_channels=max_output_channels,
136
- default_sample_rate=int(info.get("defaultSampleRate", 48000)),
137
- is_default=(int(info.get("index", i)) == default_index),
138
- )
139
- )
140
- finally:
141
- p.terminate()
142
-
143
- return devices
144
-
145
-
146
- def get_default_output_device_index() -> Optional[int]:
147
- """Return the system default output device index."""
148
- p = pyaudio.PyAudio()
149
- try:
150
- return _get_default_output_index(p)
151
- finally:
152
- p.terminate()
153
-
154
-
155
- def is_valid_output_device(index: Optional[int]) -> bool:
156
- """
157
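- Check whether the given index is a valid output device.
158
-
159
- Args:
160
- index: Device index; None means the system default device and is treated as valid.
161
-
162
- Returns:
163
- bool: Whether the index is valid.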
164
- """
165
- if index is None:
166
- return True
167
- return any(d["index"] == index for d in list_output_devices())
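The deleted module filtered PyAudio's device table down to entries with at least one input (or output) channel. A minimal sketch of that filtering step, run against hand-written device dictionaries instead of a live `pyaudio.PyAudio()` instance, is shown below. The function name `filter_input_devices` and the sample device entries are illustrative assumptions; the field names and defaults follow the removed code.

```python
from typing import List, Optional, TypedDict


class InputDeviceInfo(TypedDict):
    index: int
    name: str
    max_input_channels: int
    default_sample_rate: int
    is_default: bool


def filter_input_devices(raw_infos: List[dict], default_index: Optional[int]) -> List[InputDeviceInfo]:
    """Keep only devices that expose at least one input channel."""
    devices: List[InputDeviceInfo] = []
    for i, info in enumerate(raw_infos):
        max_input_channels = int(info.get("maxInputChannels", 0))
        if max_input_channels <= 0:
            continue  # output-only device, skip it
        index = int(info.get("index", i))
        devices.append(
            InputDeviceInfo(
                index=index,
                name=str(info.get("name", f"device-{i}")),
                max_input_channels=max_input_channels,
                default_sample_rate=int(info.get("defaultSampleRate", 16000)),
                is_default=(index == default_index),
            )
        )
    return devices


# Hypothetical device table shaped like PyAudio's device info dictionaries.
sample_infos = [
    {"index": 0, "name": "Built-in Microphone", "maxInputChannels": 1, "defaultSampleRate": 48000.0},
    {"index": 1, "name": "Built-in Output", "maxInputChannels": 0, "defaultSampleRate": 48000.0},
    {"index": 2, "name": "USB Mic Array", "maxInputChannels": 4, "defaultSampleRate": 16000.0},
]
input_devices = filter_input_devices(sample_infos, default_index=0)
```

Separating the filter from the PyAudio calls makes the selection logic testable on machines without audio hardware.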
src/voice_dialogue/audio/player.py CHANGED
@@ -1,78 +1,10 @@
1
  import tempfile
2
- from typing import Optional
3
 
4
- import numpy as np
5
  import soundfile as sf
6
  from playsound import playsound
7
 
8
- from voice_dialogue.utils.logger import logger
9
-
10
-
11
- def _to_int16(audio_data) -> np.ndarray:
12
- """Normalize audio data to one-dimensional int16."""
13
- audio = np.asarray(audio_data)
14
- if audio.ndim > 1:
15
- audio = audio.mean(axis=-1)
16
- if audio.dtype != np.int16:
17
- audio = np.clip(audio, -1.0, 1.0)
18
- audio = (audio * 32767.0).astype(np.int16)
19
- return audio
20
-
21
-
22
- def _play_via_pyaudio(audio_data, sample_rate: int, output_device_index: int):
23
- """Play through a PyAudio output stream; supports selecting the output device."""
24
- import pyaudio
25
-
26
- audio = _to_int16(audio_data)
27
-
28
- p = pyaudio.PyAudio()
29
- try:
30
- # If the device does not support this sample rate, resample to the device's default rate
31
- try:
32
- p.is_format_supported(
33
- rate=sample_rate,
34
- output_device=output_device_index,
35
- output_channels=1,
36
- output_format=pyaudio.paInt16,
37
- )
38
- except Exception:
39
- device_rate = int(p.get_device_info_by_index(output_device_index).get("defaultSampleRate", 48000))
40
- logger.info(f"Output device does not support {sample_rate}Hz; resampling to {device_rate}Hz")
41
- import soxr
42
- audio = soxr.resample(audio, sample_rate, device_rate).astype(np.int16)
43
- sample_rate = device_rate
44
-
45
- stream = p.open(
46
- format=pyaudio.paInt16,
47
- channels=1,
48
- rate=sample_rate,
49
- output=True,
50
- output_device_index=output_device_index,
51
- )
52
- try:
53
- stream.write(audio.tobytes())
54
- finally:
55
- stream.stop_stream()
56
- stream.close()
57
- finally:
58
- p.terminate()
59
-
60
-
61
- def play_audio(audio_data, sample_rate=16000, output_device_index: Optional[int] = None):
62
- """Play audio.
63
-
64
- Args:
65
- audio_data: Audio data
66
- sample_rate: Sample rate
67
- output_device_index: Output device index; None means the system default device
68
- """
69
- if output_device_index is not None:
70
- try:
71
- _play_via_pyaudio(audio_data, sample_rate, output_device_index)
72
- return
73
- except Exception as e:
74
- logger.warning(f"Playback on output device {output_device_index} failed; falling back to the system default device: {e}")
75
 
 
76
  with tempfile.NamedTemporaryFile('w+b', suffix='.wav') as soundfile:
77
  sf.write(soundfile, audio_data, samplerate=sample_rate, subtype='PCM_16', closefd=False)
78
  playsound(soundfile.name, block=True)
 
1
  import tempfile
 
2
 
 
3
  import soundfile as sf
4
  from playsound import playsound
5
 
6
 
7
+ def play_audio(audio_data, sample_rate=16000):
8
  with tempfile.NamedTemporaryFile('w+b', suffix='.wav') as soundfile:
9
  sf.write(soundfile, audio_data, samplerate=sample_rate, subtype='PCM_16', closefd=False)
10
  playsound(soundfile.name, block=True)
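The removed `_to_int16` helper clipped float samples to [-1.0, 1.0] and scaled them by 32767 before casting to 16-bit integers. The arithmetic can be sketched with only the standard library as below. `to_int16_sketch` is a hypothetical name, and the sketch covers mono float input only, not the multi-channel averaging or the int16 passthrough of the original.

```python
from array import array
from typing import Iterable


def to_int16_sketch(samples: Iterable[float]) -> array:
    """Clip float samples to [-1.0, 1.0] and scale them to signed 16-bit PCM."""
    out = array("h")  # "h" is a signed 16-bit integer typecode
    for sample in samples:
        clipped = max(-1.0, min(1.0, sample))
        # int() truncates toward zero, as a float-to-int16 cast does.
        out.append(int(clipped * 32767.0))
    return out


pcm = to_int16_sketch([0.0, 0.5, 1.5, -2.0])
```

Clipping before scaling is what keeps out-of-range samples from overflowing the 16-bit range.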
src/voice_dialogue/cli/args.py CHANGED
@@ -74,20 +74,6 @@ def create_argument_parser():
74
  default=False,
75
  help='Disable echo cancellation (default: not disabled)'
76
  )
77
- cli_group.add_argument(
78
- '--input-device', '-i',
79
- type=int,
80
- default=None,
81
- metavar='INDEX',
82
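- help='Input device index (e.g. an external microphone array). Multi-channel input is automatically downmixed to mono; '
83
- 'once a device is specified, echo cancellation depends on the device hardware. Use --list-audio-devices to see available indices.'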
84
- )
85
- cli_group.add_argument(
86
- '--list-audio-devices',
87
- action='store_true',
88
- default=False,
89
- help='List the available audio input devices and their indices, then exit'
90
- )
91
 
92
  # API server mode arguments
93
  api_group = parser.add_argument_group('API server mode arguments')
 
74
  default=False,
75
  help='Disable echo cancellation (default: not disabled)'
76
  )
77
 
78
  # API server mode arguments
79
  api_group = parser.add_argument_group('API server mode arguments')
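For reference, the two options this hunk removes can be reproduced in isolation with `argparse`. The sketch below is an assumption-laden reconstruction: the `prog` name and the `build_parser` helper are hypothetical, and only the option names, types, and defaults come from the removed lines.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    # Hypothetical program name; the real parser lives in create_argument_parser().
    parser = argparse.ArgumentParser(prog="voice-dialogue-sketch")
    cli_group = parser.add_argument_group("CLI mode arguments")
    cli_group.add_argument(
        "--input-device", "-i",
        type=int,
        default=None,
        metavar="INDEX",
        help="Input device index; None means the system default device.",
    )
    cli_group.add_argument(
        "--list-audio-devices",
        action="store_true",
        default=False,
        help="List available audio input devices and exit.",
    )
    return parser


parser = build_parser()
with_device = parser.parse_args(["--input-device", "2"])
defaults = parser.parse_args([])
```

After this PR neither flag exists, so callers that still pass `--input-device` would get an argparse "unrecognized arguments" error.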
src/voice_dialogue/config/audio_config.py DELETED
@@ -1,77 +0,0 @@
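1
- """Audio device configuration management module.
2
-
3
- Persists the user's selected input device (e.g. an external microphone array) so it is reused automatically after a restart.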
4
- """
5
- import json
6
- from typing import Optional, TypedDict
7
-
8
- from voice_dialogue.utils.logger import logger
9
- from .paths import AUDIO_SETTINGS_PATH
10
-
11
-
12
- class AudioSettings(TypedDict, total=False):
13
- """Audio settings."""
14
- input_device_index: Optional[int]
15
- output_device_index: Optional[int]
16
-
17
-
18
- _audio_settings_cache: Optional[AudioSettings] = None
19
-
20
-
21
- def get_audio_settings() -> AudioSettings:
22
- """Load the user's audio settings (with an in-memory cache)."""
23
- global _audio_settings_cache
24
- if _audio_settings_cache is not None:
25
- return _audio_settings_cache
26
-
27
- if not AUDIO_SETTINGS_PATH.exists():
28
- _audio_settings_cache = {}
29
- return _audio_settings_cache
30
-
31
- try:
32
- with open(AUDIO_SETTINGS_PATH, "r", encoding="utf-8") as f:
33
- _audio_settings_cache = json.load(f)
34
- except (json.JSONDecodeError, IOError) as e:
35
- logger.error(f"Could not load audio settings; using an empty configuration: {e}")
36
- _audio_settings_cache = {}
37
- return _audio_settings_cache
38
-
39
-
40
- def get_input_device_index() -> Optional[int]:
41
- """Return the saved input device index, or None (system default device) if not configured."""
42
- value = get_audio_settings().get("input_device_index")
43
- return int(value) if value is not None else None
44
-
45
-
46
- def _save_audio_setting(key: str, value: Optional[int]) -> bool:
47
- """Save a single audio setting and refresh the cache."""
48
- global _audio_settings_cache
49
- settings = dict(get_audio_settings())
50
- settings[key] = value
51
- try:
52
- if not AUDIO_SETTINGS_PATH.parent.exists():
53
- AUDIO_SETTINGS_PATH.parent.mkdir(parents=True, exist_ok=True)
54
- with open(AUDIO_SETTINGS_PATH, "w", encoding="utf-8") as f:
55
- json.dump(settings, f, ensure_ascii=False, indent=4)
56
- _audio_settings_cache = settings # type: ignore[assignment]
57
- logger.info(f"Audio setting saved: {key}={value}")
58
- return True
59
- except IOError as e:
60
- logger.error(f"Could not save audio settings: {e}")
61
- return False
62
-
63
-
64
- def save_input_device_index(input_device_index: Optional[int]) -> bool:
65
- """Save the user's selected input device index."""
66
- return _save_audio_setting("input_device_index", input_device_index)
67
-
68
-
69
- def get_output_device_index() -> Optional[int]:
70
- """Return the saved output device index, or None (system default device) if not configured."""
71
- value = get_audio_settings().get("output_device_index")
72
- return int(value) if value is not None else None
73
-
74
-
75
- def save_output_device_index(output_device_index: Optional[int]) -> bool:
76
- """Save the user's selected output device index."""
77
- return _save_audio_setting("output_device_index", output_device_index)
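The deleted module combined a JSON file with an in-memory cache: reads hit the cache after the first load, and writes update the cache only once the file write succeeds. A minimal sketch of that pattern follows. It swaps the module-level global for a small class (`AudioSettingsStore`, a hypothetical name) and writes to a temporary directory rather than `AUDIO_SETTINGS_PATH`.

```python
import json
import tempfile
from pathlib import Path
from typing import Optional


class AudioSettingsStore:
    """Hypothetical class-based variant of the removed cache-plus-JSON logic."""

    def __init__(self, path: Path):
        self._path = path
        self._cache: Optional[dict] = None

    def load(self) -> dict:
        if self._cache is not None:
            return self._cache  # served from memory after the first load
        if not self._path.exists():
            self._cache = {}
            return self._cache
        try:
            with open(self._path, "r", encoding="utf-8") as f:
                self._cache = json.load(f)
        except (json.JSONDecodeError, OSError):
            self._cache = {}  # fall back to an empty configuration
        return self._cache

    def save(self, key: str, value: Optional[int]) -> bool:
        settings = dict(self.load())
        settings[key] = value
        try:
            self._path.parent.mkdir(parents=True, exist_ok=True)
            with open(self._path, "w", encoding="utf-8") as f:
                json.dump(settings, f, ensure_ascii=False, indent=4)
        except OSError:
            return False
        self._cache = settings  # refresh the cache only after a successful write
        return True


with tempfile.TemporaryDirectory() as tmp_dir:
    settings_path = Path(tmp_dir) / "audio_settings.json"
    store = AudioSettingsStore(settings_path)
    before = store.load().get("input_device_index")
    saved = store.save("input_device_index", 2)
    # A fresh store instance simulates reading the file after a restart.
    reloaded = AudioSettingsStore(settings_path).load()
```

Updating the cache after the write, as the removed `_save_audio_setting` did, avoids the in-memory state drifting ahead of what is on disk when the write fails.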
src/voice_dialogue/config/paths.py CHANGED
@@ -46,7 +46,6 @@ APP_DATA_PATH = get_app_data_path()
46
  if not APP_DATA_PATH.exists():
47
  APP_DATA_PATH.mkdir(parents=True, exist_ok=True)
48
  USER_PROMPTS_PATH = APP_DATA_PATH / "user_prompts.json"
49
- AUDIO_SETTINGS_PATH = APP_DATA_PATH / "audio_settings.json"
50
 
51
 
52
  def load_third_party():
 
46
  if not APP_DATA_PATH.exists():
47
  APP_DATA_PATH.mkdir(parents=True, exist_ok=True)
48
  USER_PROMPTS_PATH = APP_DATA_PATH / "user_prompts.json"
 
49
 
50
 
51
  def load_third_party():
src/voice_dialogue/core/launcher.py CHANGED
@@ -6,7 +6,7 @@
6
 
7
  import time
8
 
9
- from voice_dialogue.audio.capture import AudioCapture, resolves_to_native_aec
10
  from voice_dialogue.config.speaker_config import get_tts_config_by_speaker_name, get_available_speaker_names
11
  from voice_dialogue.core.constants import (
12
  audio_frames_queue,
@@ -23,7 +23,6 @@ def launch_system(
23
  user_language: str,
24
  speaker: str,
25
  disable_echo_cancellation: bool = False,
26
- input_device_index: int = None,
27
  ) -> None:
28
  """
29
  Launch the complete voice dialogue system
@@ -101,10 +100,7 @@ def launch_system(
101
  threads.append(audio_player)
102
 
103
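  # Speech state monitoring
104
- # Disable software VAD only when using macOS native AEC (which has built-in VAD);
105
- # when an explicit external device goes through PyAudio, software VAD must be enabled.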
106
- enable_echo_cancellation = not disable_echo_cancellation
107
- enable_vad = not resolves_to_native_aec(enable_echo_cancellation, input_device_index)
108
  speech_monitor = SpeechStateMonitor(
109
  audio_frame_queue=audio_frames_queue,
110
  user_voice_queue=user_voice_queue,
@@ -115,10 +111,10 @@ def launch_system(
115
  threads.append(speech_monitor)
116
 
117
  # Audio capture
 
118
  audio_capture = AudioCapture(
119
  audio_frames_queue=audio_frames_queue,
120
- enable_echo_cancellation=enable_echo_cancellation,
121
- input_device_index=input_device_index,
122
  )
123
  audio_capture.daemon = True
124
  audio_capture.start()
 
6
 
7
  import time
8
 
9
+ from voice_dialogue.audio.capture import AudioCapture
10
  from voice_dialogue.config.speaker_config import get_tts_config_by_speaker_name, get_available_speaker_names
11
  from voice_dialogue.core.constants import (
12
  audio_frames_queue,
 
23
  user_language: str,
24
  speaker: str,
25
  disable_echo_cancellation: bool = False,
 
26
  ) -> None:
27
  """
28
  Launch the complete voice dialogue system
 
100
  threads.append(audio_player)
101
 
102
  # Speech state monitoring
103
+ enable_vad = disable_echo_cancellation
 
 
 
104
  speech_monitor = SpeechStateMonitor(
105
  audio_frame_queue=audio_frames_queue,
106
  user_voice_queue=user_voice_queue,
 
111
  threads.append(speech_monitor)
112
 
113
  # Audio capture
114
+ enable_echo_cancellation = not disable_echo_cancellation
115
  audio_capture = AudioCapture(
116
  audio_frames_queue=audio_frames_queue,
117
+ enable_echo_cancellation=enable_echo_cancellation
 
118
  )
119
  audio_capture.daemon = True
120
  audio_capture.start()
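The hunk replaces `enable_vad = not resolves_to_native_aec(...)` with `enable_vad = disable_echo_cancellation`. The sketch below tabulates both rules side by side. The body of `resolves_to_native_aec_sketch` is an assumption inferred from the removed comment (native AEC only when echo cancellation is on and no explicit input device is given); the actual helper is not shown in this diff.

```python
from typing import Optional


def resolves_to_native_aec_sketch(enable_echo_cancellation: bool, input_device_index: Optional[int]) -> bool:
    # Assumption inferred from the removed comment, not the real implementation:
    # native AEC is used only with echo cancellation on and no explicit device.
    return enable_echo_cancellation and input_device_index is None


def enable_vad_before(disable_echo_cancellation: bool, input_device_index: Optional[int]) -> bool:
    enable_echo_cancellation = not disable_echo_cancellation
    return not resolves_to_native_aec_sketch(enable_echo_cancellation, input_device_index)


def enable_vad_after(disable_echo_cancellation: bool) -> bool:
    # The rule introduced by this PR: software VAD is on exactly when AEC is disabled.
    return disable_echo_cancellation


# Each row: (disable_echo_cancellation, input_device_index, VAD before, VAD after)
rows = [
    (disable, device, enable_vad_before(disable, device), enable_vad_after(disable))
    for disable in (False, True)
    for device in (None, 2)
]
```

Under that assumption the two rules differ only when an explicit input device is passed with echo cancellation left on, a case that no longer exists once `input_device_index` is removed from `launch_system`.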
src/voice_dialogue/services/asr_service.py CHANGED
@@ -42,7 +42,7 @@ class ASRService(BaseThread, PerformanceLogMixin):
42
  voice_task.whisper_start_time = time.time()
43
 
44
  user_voice: np.array = voice_task.user_voice
45
- transcribed_text = self.client.transcribe(user_voice, language=self.language)
46
  if not transcribed_text.strip():
47
  voice_state_manager.reset_task_id()
48
  continue
 
42
  voice_task.whisper_start_time = time.time()
43
 
44
  user_voice: np.array = voice_task.user_voice
45
+ transcribed_text = self.client.transcribe(user_voice)
46
  if not transcribed_text.strip():
47
  voice_state_manager.reset_task_id()
48
  continue
src/voice_dialogue/services/audio_player_service.py CHANGED
@@ -4,7 +4,6 @@ from queue import Empty
4
  from typing import Optional
5
 
6
  from voice_dialogue.audio.player import play_audio
7
- from voice_dialogue.config.audio_config import get_output_device_index
8
  from voice_dialogue.core.base import BaseThread
9
  from voice_dialogue.core.constants import voice_state_manager, silence_over_threshold_event
10
  from voice_dialogue.models.voice_task import VoiceTask, AnswerDisplayMessage
@@ -65,8 +64,7 @@ class AudioPlayerService(BaseThread, TaskStatusMixin, HistoryMixin, PerformanceL
65
 
66
  if not self.is_stopped:
67
  audio_data, sample_rate = voice_task.tts_generated_sentence_audio
68
- # 每次播放时读取保存的输出设备,设置变更后下一句即生效
69
- play_audio(audio_data, sample_rate, output_device_index=get_output_device_index())
70
 
71
  # Task finished; break out of the inner loop
72
  break
 
4
  from typing import Optional
5
 
6
  from voice_dialogue.audio.player import play_audio
 
7
  from voice_dialogue.core.base import BaseThread
8
  from voice_dialogue.core.constants import voice_state_manager, silence_over_threshold_event
9
  from voice_dialogue.models.voice_task import VoiceTask, AnswerDisplayMessage
 
64
 
65
  if not self.is_stopped:
66
  audio_data, sample_rate = voice_task.tts_generated_sentence_audio
67
+ play_audio(audio_data, sample_rate)
 
68
 
69
  # Task finished; break out of the inner loop
70
  break
src/voice_dialogue/tts/runtime/moyoyo.py CHANGED
@@ -34,9 +34,6 @@ class MoYoYoTTS(TTSInterface):
34
 
35
  def setup(self, **kwargs) -> None:
36
  """Set up the TTS module."""
37
- from voice_dialogue.tts.weights_migration import ensure_safetensors_weights
38
- ensure_safetensors_weights()
39
-
40
  tts_config = TTS_Config(self.config.get_runtime_config())
41
  self.tts_module = TTSModule(tts_config)
42
  self.tts_module.setup_inference_params(
 
34
 
35
  def setup(self, **kwargs) -> None:
36
  """Set up the TTS module."""
 
 
 
37
  tts_config = TTS_Config(self.config.get_runtime_config())
38
  self.tts_module = TTSModule(tts_config)
39
  self.tts_module.setup_inference_params(
src/voice_dialogue/tts/weights_migration.py DELETED
@@ -1,45 +0,0 @@
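1
- """Safetensors migration for the TTS pretrained weights.
2
-
3
- The security policy in transformers >= 4.56 (CVE-2025-32434) refuses to load
4
- pytorch_model.bin on torch < 2.6. transformers prefers model.safetensors when loading, so
5
- converting the .bin once on first launch is enough; no torch upgrade is needed.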
6
- """
7
- from pathlib import Path
8
-
9
- from voice_dialogue.config import paths
10
- from voice_dialogue.utils.logger import logger
11
-
12
- PRETRAINED_DIRS = [
13
- "chinese-roberta-wwm-ext-large",
14
- "chinese-hubert-base",
15
- ]
16
-
17
-
18
- def ensure_safetensors_weights() -> None:
19
- """Ensure the MoYoYo TTS pretrained weights have a safetensors version; convert from .bin if missing."""
20
- moyoyo_path = Path(paths.TTS_MODELS_PATH) / "moyoyo"
21
-
22
- for dirname in PRETRAINED_DIRS:
23
- model_dir = moyoyo_path / dirname
24
- bin_path = model_dir / "pytorch_model.bin"
25
- st_path = model_dir / "model.safetensors"
26
-
27
- if st_path.exists() or not bin_path.exists():
28
- continue
29
-
30
- logger.info(f"[INFO] First launch: converting {dirname} weights to safetensors...")
31
- try:
32
- import torch
33
- from safetensors.torch import save_file
34
-
35
- state_dict = torch.load(bin_path, map_location="cpu", weights_only=True)
36
- # clone breaks shared memory; safetensors does not allow tensors to share storage
37
- state_dict = {
38
- key: value.clone().contiguous()
39
- for key, value in state_dict.items()
40
- if hasattr(value, "clone")
41
- }
42
- save_file(state_dict, st_path, metadata={"format": "pt"})
43
- logger.info(f"[INFO] {dirname} conversion finished: {st_path.stat().st_size // 1024 ** 2} MB")
44
- except Exception as e:
45
- logger.error(f"[ERROR] Failed to convert {dirname} weights: {e}")
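The removed migration ran only for model directories that had a `pytorch_model.bin` but no `model.safetensors` yet. That selection rule can be checked without `torch` or `safetensors` installed, as in the sketch below. `dirs_needing_conversion` is a hypothetical helper, and the temporary directory with empty placeholder files stands in for the real `TTS_MODELS_PATH / "moyoyo"` tree.

```python
import tempfile
from pathlib import Path
from typing import List

PRETRAINED_DIRS = ["chinese-roberta-wwm-ext-large", "chinese-hubert-base"]


def dirs_needing_conversion(moyoyo_path: Path) -> List[str]:
    """Return the model directories that still need a .bin to safetensors conversion."""
    pending = []
    for dirname in PRETRAINED_DIRS:
        model_dir = moyoyo_path / dirname
        bin_path = model_dir / "pytorch_model.bin"
        st_path = model_dir / "model.safetensors"
        # Same skip rule as the removed code: already converted, or nothing to convert.
        if st_path.exists() or not bin_path.exists():
            continue
        pending.append(dirname)
    return pending


with tempfile.TemporaryDirectory() as tmp_dir:
    root = Path(tmp_dir)
    for dirname in PRETRAINED_DIRS:
        (root / dirname).mkdir()
        (root / dirname / "pytorch_model.bin").write_bytes(b"")
    # Pretend the second model was converted on an earlier launch.
    (root / "chinese-hubert-base" / "model.safetensors").write_bytes(b"")
    pending = dirs_needing_conversion(root)
```

Because the check keys on the presence of `model.safetensors`, the conversion is idempotent: a second launch finds nothing pending.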
uv.lock CHANGED
The diff for this file is too large to render. See raw diff