britto224 committed on
Commit 99ebcc3 · verified · 1 Parent(s): a8f1f77

Upload 29 files

.cursor/rules/olv-core-rules.mdc ADDED
@@ -0,0 +1,111 @@
+ ---
+ alwaysApply: true
+ ---
+
+
+ # Open-LLM-VTuber AI Coding Assistant: Context & Guidelines
+
+ `version: 2025.08.05-1`
+
+ ## 1. Core Project Context
+
+ - **Project:** Open-LLM-VTuber, a low-latency, voice-based LLM interaction tool.
+ - **Language:** Python >= 3.10
+ - **Core Tech Stack:**
+   - **Backend:** FastAPI, Pydantic v2, Uvicorn, fully async
+   - **Real-time Communication:** WebSockets
+   - **Package Management:** `uv` (version ~= 0.8 as of August 2025). Always use `uv run`, `uv sync`, `uv add`, and `uv remove` instead of `pip`.
+ - **Primary Goal:** Achieve end-to-end latency below 500 ms (user speaks -> AI voice heard). Performance is critical.
+ - **Key Principles:**
+   - **Offline-Ready:** Core functionality MUST work without an internet connection.
+   - **Separation of Concerns:** Strict frontend-backend separation.
+   - **Clean Code:** Clean, testable, maintainable code that follows Python 3.10+ best practices and avoids deprecated code.
+
+ Some key files and directories:
+
+ ```
+ doc/                     # A deprecated directory
+ frontend/                # Compiled web frontend artifacts (from git submodule)
+ config_templates/
+   conf.default.yaml      # Configuration template for English users
+   conf.ZH.default.yaml   # Configuration template for Chinese users
+ src/open_llm_vtuber/     # Project source code
+   config_manager/
+     main.py              # Pydantic models for configuration validation
+ run_server.py            # Entrypoint to start the application
+ conf.yaml                # User's configuration file, generated from a template
+ ```
+
+ ### 1.1. Repository Structure
+
+ - Frontend Repository: The frontend is a React application developed in a separate repository, `Open-LLM-VTuber-Web`. Its built artifacts are integrated into the `frontend/` directory of this backend repository via a git submodule.
+
+ - Documentation Repository: The official documentation site is hosted in the `open-llm-vtuber.github.io` repository. When asked to generate documentation, create Markdown files in the project root. The user is responsible for migrating them to the documentation site.
+
+ ### 1.2. Configuration Files
+
+ - Configuration templates are located in the `config_templates/` directory:
+   - `conf.default.yaml`: Template for English-speaking users.
+   - `conf.ZH.default.yaml`: Template for Chinese-speaking users.
+ - When modifying the configuration structure, both template files MUST be updated accordingly.
+ - Configuration is validated on load using the Pydantic models defined in `src/open_llm_vtuber/config_manager/main.py`. Any change to configuration options must be reflected in these models.
+
+ ## 2. Overarching Coding Philosophy
+
+ - **Simplicity and Readability:** Write code that is simple, clear, and easy to understand. Avoid unnecessary complexity or premature optimization. Follow the Zen of Python.
+ - **Single Responsibility:** Each function, class, and module should do one thing and do it well.
+ - **Performance-Aware:** Be mindful of performance. Avoid blocking operations in async contexts. Use efficient data structures and algorithms where it matters.
+ - **Adherence to Best Practices:** Write clean, testable, and robust code that follows modern Python 3.10+ idioms. Adhere to the best practices of our core libraries (FastAPI, Pydantic v2).
+
+ ## 3. Detailed Coding Standards
+
+ ### 3.1. Formatting & Linting (Ruff)
+
+ - All Python code **MUST** be formatted with `uv run ruff format`.
+ - All Python code **MUST** pass `uv run ruff check` without errors.
+ - Import statements should be grouped into standard library, third-party, and local modules (per PEP 8), and sorted alphabetically within each group.
+
+ ### 3.2. Naming Conventions (PEP 8)
+
+ - Use `snake_case` for all variables, functions, methods, and module names.
+ - Use `PascalCase` for class names.
+ - Choose descriptive names. Avoid single-letter names except for loop counters or well-known initialisms.
+
+ ### 3.3. Type Hints (CRITICAL)
+
+ - Target Python 3.10+. Use modern type hint syntax.
+   - **DO:** Use `|` for unions (e.g., `str | None`).
+   - **DON'T:** Use `Optional` from `typing` (e.g., `Optional[str]`).
+   - **DO:** Use built-in generics (e.g., `list[int]`, `dict[str, float]`).
+   - **DON'T:** Use capitalized types from `typing` (e.g., `List[int]`, `Dict[str, float]`).
+ - All function and method signatures (arguments and return values) **MUST** have accurate type hints. If a third-party library makes a type error impossible to fix, suppress the type checker for that specific line.
+
+ ### 3.4. Docstrings & Comments (CRITICAL)
+
+ - All public modules, functions, classes, and methods **MUST** have a docstring in English.
+ - Use the **Google Python Style** for docstrings.
+ - Docstrings **MUST** include:
+   1. Summary.
+   2. `Args:` section describing each parameter, its type, and its purpose.
+   3. `Returns:` section describing the return value, its type, and its meaning.
+   4. (Optional but encouraged) `Raises:` section for any exceptions raised.
+ - All other code comments must also be in English.
+
+ ### 3.5. Logging
+
+ - Use the `loguru` library for all informational and error output.
+ - Log messages should be in English, clear, and informative. Use emoji where appropriate.
+
+ ## 4. Architectural Principles
+
+ ### 4.1. Dependency Management
+
+ - First, try to solve the problem using the Python standard library or existing project dependencies defined in `pyproject.toml`.
+ - If a new dependency is required, it must have a compatible license and be well-maintained.
+ - Use `uv add`, `uv remove`, and `uv run` instead of `pip` to manage dependencies. If the user uses conda, install `uv` with `pip` first.
+ - After adding a new dependency, you must add it to `requirements.txt` in addition to `pyproject.toml`.
+
+ ### 4.2. Cross-Platform Compatibility
+
+ - All core logic **MUST** run on macOS, Windows, and Linux.
+ - If a feature is platform-specific (e.g., uses a Windows-only API) or hardware-specific (e.g., CUDA), it **MUST** be an optional component. The application should start and run core features even if that component is not available. Use graceful fallbacks or clear error messages.
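The "Performance-Aware" rule in these guidelines (avoid blocking operations in async contexts) is easiest to see in a small example. The sketch below assumes a synchronous, blocking helper; `transcribe_blocking`, `transcribe`, and `main` are hypothetical names, not functions from the Open-LLM-VTuber codebase. It offloads the blocking call with the standard library's `asyncio.to_thread` (available since Python 3.9) so other coroutines keep running, and it follows the type-hint and Google-docstring rules from sections 3.3 and 3.4.

```python
import asyncio
import time


def transcribe_blocking(audio: bytes) -> str:
    """Stands in for a synchronous call, such as a blocking ASR library (hypothetical).

    Args:
        audio: Raw audio bytes.

    Returns:
        A placeholder transcript describing the input size.
    """
    # Blocks the calling thread; called directly inside a coroutine,
    # it would stall the whole event loop for 0.1 s.
    time.sleep(0.1)
    return f"<{len(audio)} bytes transcribed>"


async def transcribe(audio: bytes) -> str:
    """Runs the blocking helper in a worker thread.

    Args:
        audio: Raw audio bytes.

    Returns:
        The transcript produced by the blocking helper.
    """
    return await asyncio.to_thread(transcribe_blocking, audio)


async def main() -> tuple[str, int]:
    """Transcribes audio while a heartbeat coroutine runs concurrently.

    Returns:
        The transcript and the number of heartbeat ticks that ran.
    """
    ticks = 0

    async def heartbeat() -> None:
        nonlocal ticks
        for _ in range(5):
            ticks += 1
            await asyncio.sleep(0.01)  # Yields control back to the event loop.

    transcript, _ = await asyncio.gather(transcribe(b"\x00" * 320), heartbeat())
    return transcript, ticks
```

Running `asyncio.run(main())` returns `("<320 bytes transcribed>", 5)`. The heartbeat can tick during the 0.1 s sleep because that sleep happens in a worker thread rather than on the event loop.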
.gemini/GEMINI.md ADDED
@@ -0,0 +1,106 @@
+ # Open-LLM-VTuber AI Coding Assistant: Context & Guidelines
+
+ `version: 2025.08.05-1`
+
+ ## 1. Core Project Context
+
+ - **Project:** Open-LLM-VTuber, a low-latency, voice-based LLM interaction tool.
+ - **Language:** Python >= 3.10
+ - **Core Tech Stack:**
+   - **Backend:** FastAPI, Pydantic v2, Uvicorn, fully async
+   - **Real-time Communication:** WebSockets
+   - **Package Management:** `uv` (version ~= 0.8 as of August 2025). Always use `uv run`, `uv sync`, `uv add`, and `uv remove` instead of `pip`.
+ - **Primary Goal:** Achieve end-to-end latency below 500 ms (user speaks -> AI voice heard). Performance is critical.
+ - **Key Principles:**
+   - **Offline-Ready:** Core functionality MUST work without an internet connection.
+   - **Separation of Concerns:** Strict frontend-backend separation.
+   - **Clean Code:** Clean, testable, maintainable code that follows Python 3.10+ best practices and avoids deprecated code.
+
+ Some key files and directories:
+
+ ```
+ doc/                     # A deprecated directory
+ frontend/                # Compiled web frontend artifacts (from git submodule)
+ config_templates/
+   conf.default.yaml      # Configuration template for English users
+   conf.ZH.default.yaml   # Configuration template for Chinese users
+ src/open_llm_vtuber/     # Project source code
+   config_manager/
+     main.py              # Pydantic models for configuration validation
+ run_server.py            # Entrypoint to start the application
+ conf.yaml                # User's configuration file, generated from a template
+ ```
+
+ ### 1.1. Repository Structure
+
+ - Frontend Repository: The frontend is a React application developed in a separate repository, `Open-LLM-VTuber-Web`. Its built artifacts are integrated into the `frontend/` directory of this backend repository via a git submodule.
+
+ - Documentation Repository: The official documentation site is hosted in the `open-llm-vtuber.github.io` repository. When asked to generate documentation, create Markdown files in the project root. The user is responsible for migrating them to the documentation site.
+
+ ### 1.2. Configuration Files
+
+ - Configuration templates are located in the `config_templates/` directory:
+   - `conf.default.yaml`: Template for English-speaking users.
+   - `conf.ZH.default.yaml`: Template for Chinese-speaking users.
+ - When modifying the configuration structure, both template files MUST be updated accordingly.
+ - Configuration is validated on load using the Pydantic models defined in `src/open_llm_vtuber/config_manager/main.py`. Any change to configuration options must be reflected in these models.
+
+ ## 2. Overarching Coding Philosophy
+
+ - **Simplicity and Readability:** Write code that is simple, clear, and easy to understand. Avoid unnecessary complexity or premature optimization. Follow the Zen of Python.
+ - **Single Responsibility:** Each function, class, and module should do one thing and do it well.
+ - **Performance-Aware:** Be mindful of performance. Avoid blocking operations in async contexts. Use efficient data structures and algorithms where it matters.
+ - **Adherence to Best Practices:** Write clean, testable, and robust code that follows modern Python 3.10+ idioms. Adhere to the best practices of our core libraries (FastAPI, Pydantic v2).
+
+ ## 3. Detailed Coding Standards
+
+ ### 3.1. Formatting & Linting (Ruff)
+
+ - All Python code **MUST** be formatted with `uv run ruff format`.
+ - All Python code **MUST** pass `uv run ruff check` without errors.
+ - Import statements should be grouped into standard library, third-party, and local modules (per PEP 8), and sorted alphabetically within each group.
+
+ ### 3.2. Naming Conventions (PEP 8)
+
+ - Use `snake_case` for all variables, functions, methods, and module names.
+ - Use `PascalCase` for class names.
+ - Choose descriptive names. Avoid single-letter names except for loop counters or well-known initialisms.
+
+ ### 3.3. Type Hints (CRITICAL)
+
+ - Target Python 3.10+. Use modern type hint syntax.
+   - **DO:** Use `|` for unions (e.g., `str | None`).
+   - **DON'T:** Use `Optional` from `typing` (e.g., `Optional[str]`).
+   - **DO:** Use built-in generics (e.g., `list[int]`, `dict[str, float]`).
+   - **DON'T:** Use capitalized types from `typing` (e.g., `List[int]`, `Dict[str, float]`).
+ - All function and method signatures (arguments and return values) **MUST** have accurate type hints. If a third-party library makes a type error impossible to fix, suppress the type checker for that specific line.
+
+ ### 3.4. Docstrings & Comments (CRITICAL)
+
+ - All public modules, functions, classes, and methods **MUST** have a docstring in English.
+ - Use the **Google Python Style** for docstrings.
+ - Docstrings **MUST** include:
+   1. Summary.
+   2. `Args:` section describing each parameter, its type, and its purpose.
+   3. `Returns:` section describing the return value, its type, and its meaning.
+   4. (Optional but encouraged) `Raises:` section for any exceptions raised.
+ - All other code comments must also be in English.
+
+ ### 3.5. Logging
+
+ - Use the `loguru` library for all informational and error output.
+ - Log messages should be in English, clear, and informative. Use emoji where appropriate.
+
+ ## 4. Architectural Principles
+
+ ### 4.1. Dependency Management
+
+ - First, try to solve the problem using the Python standard library or existing project dependencies defined in `pyproject.toml`.
+ - If a new dependency is required, it must have a compatible license and be well-maintained.
+ - Use `uv add`, `uv remove`, and `uv run` instead of `pip` to manage dependencies. If the user uses conda, install `uv` with `pip` first.
+ - After adding a new dependency, you must add it to `requirements.txt` in addition to `pyproject.toml`.
+
+ ### 4.2. Cross-Platform Compatibility
+
+ - All core logic **MUST** run on macOS, Windows, and Linux.
+ - If a feature is platform-specific (e.g., uses a Windows-only API) or hardware-specific (e.g., CUDA), it **MUST** be an optional component. The application should start and run core features even if that component is not available. Use graceful fallbacks or clear error messages.
.gemini/styleguide.md ADDED
@@ -0,0 +1,165 @@
+ version: 2025.08.04-1-en
+
+ # Pull Request Guide & Checklist
+
+ Welcome, and thank you for choosing to contribute to the Open-LLM-VTuber project! We are deeply grateful for the effort of every contributor.
+
+ This guide is designed to help all contributors, maintainers, and even LLMs collaborate effectively, ensuring the project's high quality, maintainability, and long-term health. Please refer to this guide both when submitting a Pull Request (PR) and when reviewing PRs from others.
+
+ We believe that clear standards and processes are not only the cornerstone of project maintenance but also an excellent opportunity for us to learn and grow together.
+
+ ⚠️ The coding standards below apply primarily to new code submissions. Some legacy code may not yet pass all type checks; we are fixing this incrementally, but it will take time. When the type checker reports errors, please focus only on the parts of the code your PR modifies, in line with principle **A1 (A PR should do one thing)**. If you wish to help fix existing type errors, please open a separate PR for that purpose.
+
+ ---
+
+ ### A. The Golden Rule: Atomic PRs
+
+ This is our most important principle. Please adhere to it strictly.
+
+ **A1. A single PR should do one thing, and one thing only.**
+
+ * **Good examples 👍:**
+     * `fix: Resolve audio stuttering on macOS`
+     * `feat: Add OpenAI TTS support`
+     * `refactor: Rework the audio_processing module`
+ * **Bad examples 👎:**
+     * `fix: Resolve bug A, bug B, and implement feature C`
+
+ **Why is this so important?**
+
+ * **Easy to Review:** Small, focused PRs allow reviewers to understand your changes more quickly and deeply, leading to higher-quality feedback. As stated in *The Pragmatic Programmer*, "Tip 38: It's Easier to Change Sooner." Small PRs facilitate rapid feedback loops.
+ * **Easy to Track:** When a problem arises in the future, a clean Git history lets us use `git bisect` to quickly pinpoint the exact change that introduced the issue.
+ * **Easy to Revert:** If a small change introduces a bug, we can easily revert it without affecting other, unrelated features or fixes.
+
+ ### B. Contributor's Checklist: Submitting My PR
+
+ Before you submit your PR, please confirm each of the following items. Doing so significantly speeds up the merge process and shows respect for your own work and for your fellow collaborators.
+
+ #### B1. PR Title & Description
+
+ * [ ] **B1.1: Clear Title:** The title should concisely summarize the core content of the PR. For example: `feat: Add OpenAI TTS support` or `fix: Resolve audio stuttering on macOS`. Remember, a PR should only do one thing (A1).
+ * [ ] **B1.2: Complete Description:** The description should clearly explain:
+     * **What:** Briefly describe the purpose and context of this PR.
+     * **Why:** Explain the necessity of this change. If it's a bug fix, please link to the relevant Issue.
+     * **How:** Briefly outline the technical implementation approach.
+     * **How to Test:** Provide clear, step-by-step instructions so that reviewers can reproduce and verify your work.
+
+ #### B2. Code Quality Self-Check
+
+ * [ ] **B2.1: Atomicity:** Does my PR strictly adhere to the **A1** principle?
+ * [ ] **B2.2: Formatting & Linting:** Have I run and passed the following commands locally?
+     ```bash
+     uv run ruff format
+     uv run ruff check
+     ```
+ * [ ] **B2.3: Naming Conventions:** Do all variable, function, and module names follow **D3.2** (i.e., PEP 8's `snake_case` style)?
+ * [ ] **B2.4: Type Hints & Docstrings:**
+     * [ ] **B2.4.1:** Do all new or modified functions include Type Hints compliant with **D3.3**?
+     * [ ] **B2.4.2:** Do all new or modified functions include English Docstrings compliant with **D3.3**?
+ * [ ] **B2.5: Dependency Management:** If I've added a new third-party library, have I carefully considered and followed the principles in **D5. Dependency Management Principles**?
+ * [ ] **B2.6: Cross-Platform Compatibility:** Does my code run correctly on macOS, Windows, and Linux? If I've introduced components specific to a platform or GPU, have I made them optional?
+ * [ ] **B2.7: Comment Language:** Are all in-code comments, Docstrings, and console outputs in English? (This excludes i18n localization implementations, but English must be the default.)
+
+ #### B3. Functional & Logical Self-Check
+
+ * [ ] **B3.1: Functional Testing:** Have I thoroughly tested my changes locally to ensure they work as expected and do not introduce new bugs?
+ * [ ] **B3.2: Alignment with Project Goals:** Do my changes align with the **D1. Core Project Goals** and not conflict with the **D2. Future Project Goals**?
+
+ #### B4. Documentation Update
+
+ * [ ] **B4.1: Documentation Sync:** If my PR introduces a new feature, a new configuration option, or any change that users need to be aware of, have I updated the relevant documentation in the docs repository (https://github.com/Open-LLM-VTuber/open-llm-vtuber.github.io)? (No exceptions.)
+ * [ ] **B4.2: Changelog Entry:** (Optional, but recommended) Add a brief entry for your change under the "Unreleased" section in `CHANGELOG.md`.
+
+ ### C. Maintainer's Checklist: Reviewing a PR
+
+ For the long-term health of the project, please carefully check the following items during a code review. You can reference these item numbers directly (e.g., "Regarding C2.1, I believe the maintenance cost of this feature might outweigh its benefits...") to start a discussion.
+
+ * [ ] **C1. Understand the Change:** Have I fully read and understood all the code and the intent behind this PR?
+ * [ ] **C2. Strategic Alignment:**
+     * [ ] **C2.1: Necessity vs. Maintenance Cost:** Is this feature truly necessary? Does the value it provides justify the future maintenance cost we will incur? As Fred Brooks wrote in *The Mythical Man-Month*, "the conceptual integrity of the product... is the most important consideration in system design."
+     * [ ] **C2.2: Core Goal Alignment:** Does it fully align with the **D1. Core Project Goals**?
+     * [ ] **C2.3: Future Goal Alignment:** Is it consistent with, or at least not in conflict with, the **D2. Future Project Goals** and the project roadmap?
+ * [ ] **C3. Implementation Quality:**
+     * [ ] **C3.1: Design Elegance:** Is the implementation sufficiently "simple" and "elegant"? Is there any over-engineering or premature optimization? "Simplicity is the ultimate sophistication." - Leonardo da Vinci.
+     * [ ] **C3.2: Maintainability:** Is the code modular, loosely coupled, easy to understand, and testable?
+     * [ ] **C3.3: Technical Detail Check:** Have all items from the contributor's self-checklist (**B2, B3, B4**) been met? (e.g., Are Type Hints accurate? Are Docstrings clear? Do Ruff checks pass?)
+ * [ ] **C4. Documentation Completeness:** Has the relevant documentation been created or updated, and is its content clear and accurate?
+
+ ### D. Project Reference Standards
+
+ This section details our core values and technical specifications, which serve as the basis for all the checklists above.
+
+ #### D1. Core Project Goals
+
+ * **D1.1. Offline Operation:** The project's core functionality must support fully offline operation. Any feature requiring an internet connection must be an optional module.
+ * **D1.2. Frontend-Backend Separation:** Strictly adhere to a separated frontend-backend architecture to facilitate independent development and maintenance.
+ * **D1.3. Cross-Platform:** Core backend components must run on macOS, Windows, and Linux via CPU. Any component dependent on a specific platform or GPU must be optional.
+ * **D1.4. Updatability:** Users should be able to upgrade smoothly via an update script. Any breaking change must be accompanied by a major version bump (e.g., v1 -> v2) and a switch to a new release branch.
+ * **D1.5. Maintainability:** The code must be simple, modular, decoupled, testable, and follow best practices.
+
+ #### D2. Future Project Goals
+
+ We are moving in the following directions. All new contributions should strive to align with these goals (though this is not strictly mandatory, as these goals will likely be implemented together in a future v2 refactor).
+
+ * **D2.1. GUI for Settings:** Gradually replace traditional `yaml` configuration files with a GUI-based settings interface.
+ * **D2.2. Plugin Architecture:** Build a plugin-based ecosystem, using a Launcher service to manage and run modules like ASR/TTS/LLM via a GUI.
+ * **D2.3. Stable API:** Provide a stable and reliable backend API for plugins and the frontend to consume.
+ * **D2.4. Automated Testing:** Comprehensively adopt `pytest`-based automated testing. New code should be designed with testability in mind.
+
+ #### D3. Detailed Coding Standards
+
+ **D3.1. Linter & Formatter**
+ We use **Ruff** to unify code style and check for potential issues. All submitted code must pass both `ruff format` and `ruff check`.
+
+ **D3.2. Naming Conventions**
+ * Follow Python's **PEP 8** style guide.
+ * Use **snake_case** for naming variables, functions, and modules.
+ * Names should be clear, descriptive, and unambiguous. Avoid single-letter variable names (except for loop counters).
+
+ **D3.3. Type Hints & Docstrings**
+ * **Why are they important?** Type Hints and Docstrings are the "manual" for your code. They help:
+     * Other developers quickly understand your code.
+     * IDEs and static analysis tools (like VSCode and Ruff) perform smarter error checking and code completion.
+     * You, months from now, understand the code you wrote yourself.
+ * **Type Hint Requirements:**
+     * All function/method parameters and return values **must** include Type Hints.
+     * The project targets **Python 3.10+**. Please use modern syntax, such as `str | None` instead of `Optional[str]`, and `list[str]` instead of `List[str]` (as per [PEP 604](https://peps.python.org/pep-0604/) and [PEP 585](https://peps.python.org/pep-0585/)).
+     * Type Hints must be accurate. It is recommended to set VSCode's Python type checker to `basic` or `strict` mode for validation.
+ * **Docstring Requirements:**
+     * All new or significantly modified public functions, methods, and classes **must** include an English Docstring.
+     * We recommend the **Google style Docstring format**. It should include at least:
+         * **Summary:** A one-line summary of the function's purpose.
+         * **Args:** A description of each parameter's type and meaning.
+         * **Returns:** A description of the return value's type and meaning.
+     * **Example:**
+         ```python
+         def add(a: int, b: int) -> int:
+             """Calculates the sum of two integers.
+
+             Args:
+                 a: The first integer.
+                 b: The second integer.
+
+             Returns:
+                 The sum of a and b.
+             """
+             return a + b
+         ```
+
+ #### D4. Architectural Principles
+
+ * **D4.1. ASR/LLM/TTS Module Design:** When a library supports multiple models with vastly different configurations, prioritize user experience and ease of understanding.
+     * It is recommended to encapsulate each complex model into a separate, independent module (e.g., `asr-whisper-api`, `asr-funasr`) rather than treating the entire library as one monolithic module. This simplifies user configuration and clarifies responsibilities.
+
+ #### D5. Dependency Management Principles
+
+ * **D5.1. Every new dependency must be carefully considered.**
+     * Can this functionality be achieved with the standard library or an existing dependency?
+     * Is the dependency's license compatible with our project?
+     * Is the dependency's community active? How is its maintenance status? Is it secure and trustworthy? Does it pose a risk of supply chain attacks?
+
+ ---
+
+ Thank you for taking the time to read this guide. We look forward to your contribution!
+
+ Finally, regarding the PR review process, please be patient. Our project is understaffed, and the core maintainers are also quite busy, so reviews may take some time. If a week passes without any response, I apologize in advance; I may have simply forgotten. Please feel free to ping me (@t41372) or other relevant maintainers in the Pull Request to remind us.
.gitattributes ADDED
@@ -0,0 +1,4 @@
+ avatars/mao.png filter=lfs diff=lfs merge=lfs -text
+ avatars/Yue_001.png filter=lfs diff=lfs merge=lfs -text
+ backgrounds/ceiling-computer-room-night.jpg filter=lfs diff=lfs merge=lfs -text
+ backgrounds/ceiling-window-room-night.jpeg filter=lfs diff=lfs merge=lfs -text
.github/FUNDING.yml ADDED
@@ -0,0 +1,14 @@
+ # These are supported funding model platforms
+
+ github: # Replace with up to 4 GitHub Sponsors-enabled usernames e.g., [user1, user2]
+ patreon: # Replace with a single Patreon username
+ open_collective: # Replace with a single Open Collective username
+ ko_fi: # Replace with a single Ko-fi username
+ tidelift: # Replace with a single Tidelift platform-name/package-name e.g., npm/babel
+ community_bridge: # Replace with a single Community Bridge project-name e.g., cloud-foundry
+ liberapay: # Replace with a single Liberapay username
+ issuehunt: # Replace with a single IssueHunt username
+ lfx_crowdfunding: # Replace with a single LFX Crowdfunding project-name e.g., cloud-foundry
+ polar: # Replace with a single Polar username
+ buy_me_a_coffee: yi.ting
+ custom: # Replace with up to 4 custom sponsorship URLs e.g., ['link1', 'link2']
.github/ISSUE_TEMPLATE/bug---question---get-help---bug---#U63d0#U95ee---#U6c42#U52a9.md ADDED
@@ -0,0 +1,79 @@
+ ---
+ name: Bug & Question & Get Help | Bug & 提问 & 求助
+ about: Describe the problem you encountered. 请描述你遇到的问题
+ title: "[GET HELP] "
+ labels: question
+ assignees: ''
+
+ ---
+
+ ### 1. Checklist / 检查项
+
+ - [ ] I have removed sensitive information from configuration/logs.
+
+   我已移除配置或日志中的敏感信息。
+
+ - [ ] I have checked the [FAQ](https://docs.llmvtuber.com/docs/faq/) and [existing issues](https://github.com/Open-LLM-VTuber/Open-LLM-VTuber/issues).
+
+   我已查阅[常见问题](https://docs.llmvtuber.com/docs/faq/)和[已有 issue](https://github.com/Open-LLM-VTuber/Open-LLM-VTuber/issues)。
+
+ - [ ] I am using the latest version of the project.
+
+   我正在使用项目的最新版本。
+
+
+ ---
+
+ ### 2. Environment Details / 环境信息
+
+ - How did you install Open-LLM-VTuber:
+
+   你是如何安装 Open-LLM-VTuber 的:
+
+   - [ ] git clone (源码克隆)
+   - [ ] release zip (发布包)
+   - [ ] exe (Windows) (Windows 安装包)
+   - [ ] dmg (MacOS) (MacOS 安装包)
+ - Are you running the backend and frontend on the same device?
+
+   后端和前端是否在同一台设备上运行?
+
+ - If you are using a GPU, please provide your GPU model and driver version:
+
+   如果你使用了 GPU,请提供你的 GPU 型号及驱动版本信息:
+
+ - Browser (if applicable):
+
+   浏览器(如果适用):
+
+ ---
+
+ ### 3. Describe the bug / 问题描述
+
+ What exactly is happening? What do you want to see? How can it be reproduced?
+
+ 请详细描述发生了什么、你希望看到什么,以及如何复现。
+
+ ---
+
+ ### 4. Screenshots / Logs (if relevant)
+
+ 截图 / 日志(如有)
+
+ - Backend log: 后端日志
+ - Frontend setting (General): 前端设置(通用)
+ - Frontend console log (F12): 前端控制台日志(F12)
+ - If using Ollama: output of `ollama ps`:
+   如果使用 Ollama,请附上 `ollama ps` 的输出
+
+ ---
+
+ ### 5. Configuration / 配置文件
+
+ > Please provide relevant config files, with sensitive info like API keys removed
+ >
+ >
+ > 请提供相关配置文件(请务必去除 API key 等敏感信息)
+ >
+ - `conf.yaml`
+ - `model_dict.json`, `.model3.json`
.github/ISSUE_TEMPLATE/feature-request---#U529f#U80fd#U5efa#U8bae.md ADDED
@@ -0,0 +1,44 @@
+ ---
+ name: Feature request / 功能建议
+ about: Suggest an idea for this project / 提出改善项目的建议
+ title: "[IDEA]"
+ labels: enhancement
+ assignees: ''
+
+ ---
+
+ ### 这个功能请求是用来解决什么问题的? / Is your feature request related to a problem? Please describe.
+ *请清晰简洁地描述您遇到的问题。例如:我总是在 [...] 时感到不方便。*
+ *A clear and concise description of what the problem is. Ex. I'm always frustrated when [...]*
+
+ [在这里输入问题描述 / Type problem description here]
+
+ ### 您期望的解决方案是什么? / Describe the solution you'd like
+ *请清晰简洁地描述您希望实现的功能或效果。*
+ *A clear and concise description of what you want to happen.*
+
+ [在此处输入期望的解决方案 / Type desired solution here]
+
+ ### 此功能为何对 Open-LLM-VTuber 很重要? / Why is this important for Open-LLM-VTuber?
+ *请解释为什么这个功能对 Open-LLM-VTuber 项目来说是实用且重要的。它能带来什么价值?例如,它如何提升用户体验、扩展项目能力、解决核心痛点等。*
+ *Explain why this feature would be useful and significant for the Open-LLM-VTuber project. What value does it add? For example, how does it improve user experience, extend project capabilities, or solve core pain points?*
+
+ [在此处说明其重要性 / Explain its importance here]
+
+ ### 您考虑过哪些替代方案? / Describe alternatives you've considered
+ *请清晰简洁地描述您考虑过的任何替代解决方案或特性。*
+ *A clear and concise description of any alternative solutions or features you've considered.*
+
+ [在此处输入替代方案 / Type alternatives here]
+
+ ### 您是否愿意参与开发此功能? / Would you like to work on this issue?
+ *请回答 Yes 或 No。如果您愿意,我们可以讨论后续步骤。*
+ *Please answer Yes or No. If yes, we can discuss the next steps.*
+
+ [回答 Yes/No / Answer Yes/No]
+
+ ### 补充信息 / Additional context
+ *在此处添加有关此功能请求的任何其他上下文、截图、日志或设计稿。*
+ *Add any other context, screenshots, logs, or mockups about the feature request here.*
+
+ [在此处添加补充信息 / Add additional context here]
.github/copilot-instructions.md ADDED
@@ -0,0 +1,106 @@
+ # Open-LLM-VTuber AI Coding Assistant: Context & Guidelines
+
+ `version: 2025.08.05-1`
+
+ ## 1. Core Project Context
+
+ - **Project:** Open-LLM-VTuber, a low-latency voice-based LLM interaction tool.
+ - **Language:** Python >= 3.10
+ - **Core Tech Stack:**
+   - **Backend:** FastAPI, Pydantic v2, Uvicorn, fully async
+   - **Real-time Communication:** WebSockets
+   - **Package Management:** `uv` (version ~= 0.8, as of August 2025). Always use `uv run`, `uv sync`, `uv add`, and `uv remove` instead of `pip`.
+ - **Primary Goal:** Achieve end-to-end latency below 500ms (user speaks -> AI voice heard). Performance is critical.
+ - **Key Principles:**
+   - **Offline-Ready:** Core functionality MUST work without an internet connection.
+   - **Separation of Concerns:** Strict frontend-backend separation.
+   - **Clean Code:** Write clean, testable, maintainable code that follows Python 3.10+ best practices and avoids deprecated APIs.
+
+ Some key files and directories:
+
+ ```
+ doc/                      # A deprecated directory
+ frontend/                 # Compiled web frontend artifacts (from git submodule)
+ config_templates/
+   conf.default.yaml       # Configuration template for English users
+   conf.ZH.default.yaml    # Configuration template for Chinese users
+ src/open_llm_vtuber/      # Project source code
+   config_manager/
+     main.py               # Pydantic models for configuration validation
+ run_server.py             # Entrypoint to start the application
+ conf.yaml                 # User's configuration file, generated from a template
+ ```
+
+ ### 1.1. Repository Structure
+
+ - Frontend Repository: The frontend is a React application developed in a separate repository: `Open-LLM-VTuber-Web`. Its built artifacts are integrated into the `frontend/` directory of this backend repository via a git submodule.
+
+ - Documentation Repository: The official documentation site is hosted in the `open-llm-vtuber.github.io` repository. When asked to generate documentation, create Markdown files in the project root. The user is responsible for migrating them to the documentation site.
+
+ ### 1.2. Configuration Files
+
+ - Configuration templates are located in the `config_templates/` directory:
+   - `conf.default.yaml`: Template for English-speaking users.
+   - `conf.ZH.default.yaml`: Template for Chinese-speaking users.
+ - When modifying the configuration structure, both template files MUST be updated accordingly.
+ - Configuration is validated on load using the Pydantic models defined in `src/open_llm_vtuber/config_manager/main.py`. Any changes to configuration options must be reflected in these models.
+
+ ## 2. Overarching Coding Philosophy
+
+ - **Simplicity and Readability:** Write code that is simple, clear, and easy to understand. Avoid unnecessary complexity or premature optimization. Follow the Zen of Python.
+ - **Single Responsibility:** Each function, class, and module should do one thing and do it well.
+ - **Performance-Aware:** Be mindful of performance. Avoid blocking operations in async contexts. Use efficient data structures and algorithms where it matters.
+ - **Adherence to Best Practices:** Write clean, testable, and robust code that follows modern Python 3.10+ idioms. Adhere to the best practices of our core libraries (FastAPI, Pydantic v2).
+
+ ## 3. Detailed Coding Standards
+
+ ### 3.1. Formatting & Linting (Ruff)
+
+ - All Python code **MUST** be formatted with `uv run ruff format`.
+ - All Python code **MUST** pass `uv run ruff check` without errors.
+ - Import statements should be grouped by standard library, third-party, and local modules and sorted alphabetically (PEP 8).
+
+ ### 3.2. Naming Conventions (PEP 8)
+
+ - Use `snake_case` for all variables, functions, methods, and module names.
+ - Use `PascalCase` for class names.
+ - Choose descriptive names. Avoid single-letter names except for loop counters or well-known initialisms.
+
+ ### 3.3. Type Hints (CRITICAL)
+
+ - Target Python 3.10+. Use modern type hint syntax.
+   - **DO:** Use `|` for unions (e.g., `str | None`).
+   - **DON'T:** Use `Optional` from `typing` (e.g., `Optional[str]`).
+   - **DO:** Use built-in generics (e.g., `list[int]`, `dict[str, float]`).
+   - **DON'T:** Use capitalized types from `typing` (e.g., `List[int]`, `Dict[str, float]`).
+ - All function and method signatures (arguments and return values) **MUST** have accurate type hints. If a third-party library makes a type error impossible to fix, suppress the type checker locally (e.g., with a `# type: ignore` comment on that line).
+
+ ### 3.4. Docstrings & Comments (CRITICAL)
+
+ - All public modules, functions, classes, and methods **MUST** have a docstring in English.
+ - Use the **Google Python Style** for docstrings.
+ - Docstrings **MUST** include:
+   1. A summary line.
+   2. `Args:` section describing each parameter, its type, and its purpose.
+   3. `Returns:` section describing the return value, its type, and its meaning.
+   4. (Optional but encouraged) `Raises:` section for any exceptions thrown.
+ - All other code comments must also be in English.
+
+ ### 3.5. Logging
+
+ - Use `loguru` for all informational and error output.
+ - Log messages should be in English, clear, and informative. Use emoji when appropriate.
+
+ ## 4. Architectural Principles
+
+ ### 4.1. Dependency Management
+
+ - First, try to solve the problem using the Python standard library or existing project dependencies defined in `pyproject.toml`.
+ - If a new dependency is required, it must have a compatible license and be well-maintained.
+ - Use `uv add`, `uv remove`, and `uv run` instead of `pip` to manage dependencies. If the user uses conda, install `uv` with `pip` first.
+ - After adding a new dependency, add it to `requirements.txt` as well as `pyproject.toml`.
+
+ ### 4.2. Cross-Platform Compatibility
+
+ - All core logic **MUST** run on macOS, Windows, and Linux.
+ - If a feature is platform-specific (e.g., uses a Windows-only API) or hardware-specific (e.g., CUDA), it **MUST** be an optional component. The application should start and run core features even if that component is not available. Use graceful fallbacks or clear error messages.
.github/workflows/codeql.yml ADDED
@@ -0,0 +1,92 @@
+ # For most projects, this workflow file will not need changing; you simply need
+ # to commit it to your repository.
+ #
+ # You may wish to alter this file to override the set of languages analyzed,
+ # or to provide custom queries or build logic.
+ #
+ # ******** NOTE ********
+ # We have attempted to detect the languages in your repository. Please check
+ # the `language` matrix defined below to confirm you have the correct set of
+ # supported CodeQL languages.
+ #
+ name: "CodeQL Advanced"
+
+ on:
+   push:
+     branches: [ "main" ]
+   pull_request:
+     branches: [ "main" ]
+   schedule:
+     - cron: '32 5 * * 6'
+
+ jobs:
+   analyze:
+     name: Analyze (${{ matrix.language }})
+     # Runner size impacts CodeQL analysis time. To learn more, please see:
+     #   - https://gh.io/recommended-hardware-resources-for-running-codeql
+     #   - https://gh.io/supported-runners-and-hardware-resources
+     #   - https://gh.io/using-larger-runners (GitHub.com only)
+     # Consider using larger runners or machines with greater resources for possible analysis time improvements.
+     runs-on: ${{ (matrix.language == 'swift' && 'macos-latest') || 'ubuntu-latest' }}
+     permissions:
+       # required for all workflows
+       security-events: write
+
+       # required to fetch internal or private CodeQL packs
+       packages: read
+
+       # only required for workflows in private repositories
+       actions: read
+       contents: read
+
+     strategy:
+       fail-fast: false
+       matrix:
+         include:
+         - language: python
+           build-mode: none
+         # CodeQL supports the following values for 'language': 'c-cpp', 'csharp', 'go', 'java-kotlin', 'javascript-typescript', 'python', 'ruby', 'swift'
+         # Use `c-cpp` to analyze code written in C, C++ or both
+         # Use 'java-kotlin' to analyze code written in Java, Kotlin or both
+         # Use 'javascript-typescript' to analyze code written in JavaScript, TypeScript or both
+         # To learn more about changing the languages that are analyzed or customizing the build mode for your analysis,
+         # see https://docs.github.com/en/code-security/code-scanning/creating-an-advanced-setup-for-code-scanning/customizing-your-advanced-setup-for-code-scanning.
+         # If you are analyzing a compiled language, you can modify the 'build-mode' for that language to customize how
+         # your codebase is analyzed, see https://docs.github.com/en/code-security/code-scanning/creating-an-advanced-setup-for-code-scanning/codeql-code-scanning-for-compiled-languages
+     steps:
+     - name: Checkout repository
+       uses: actions/checkout@v4
+
+     # Initializes the CodeQL tools for scanning.
+     - name: Initialize CodeQL
+       uses: github/codeql-action/init@v3
+       with:
+         languages: ${{ matrix.language }}
+         build-mode: ${{ matrix.build-mode }}
+         # If you wish to specify custom queries, you can do so here or in a config file.
+         # By default, queries listed here will override any specified in a config file.
+         # Prefix the list here with "+" to use these queries and those in the config file.
+
+         # For more details on CodeQL's query packs, refer to: https://docs.github.com/en/code-security/code-scanning/automatically-scanning-your-code-for-vulnerabilities-and-errors/configuring-code-scanning#using-queries-in-ql-packs
+         # queries: security-extended,security-and-quality
+
+     # If the analyze step fails for one of the languages you are analyzing with
+     # "We were unable to automatically build your code", modify the matrix above
+     # to set the build mode to "manual" for that language. Then modify this step
+     # to build your code.
+     # ℹ️ Command-line programs to run using the OS shell.
+     # 📚 See https://docs.github.com/en/actions/using-workflows/workflow-syntax-for-github-actions#jobsjob_idstepsrun
+     - if: matrix.build-mode == 'manual'
+       shell: bash
+       run: |
+         echo 'If you are using a "manual" build mode for one or more of the' \
+           'languages you are analyzing, replace this with the commands to build' \
+           'your code, for example:'
+         echo '  make bootstrap'
+         echo '  make release'
+         exit 1
+
+     - name: Perform CodeQL Analysis
+       uses: github/codeql-action/analyze@v3
+       with:
+         category: "/language:${{matrix.language}}"
.github/workflows/create_release.yml ADDED
@@ -0,0 +1,238 @@
+ name: Create Release Packages
+
+ # Only manual trigger
+ on:
+   workflow_dispatch:
+     inputs:
+       version_override:
+         description: "Override version number in pyproject.toml (leave empty to use file version)"
+         required: false
+       upload_to_r2:
+         description: "Upload to Cloudflare R2"
+         type: boolean
+         default: true
+       create_github_release:
+         description: "Create GitHub Release"
+         type: boolean
+         default: true
+       target_branch:
+         description: "Branch to build (default is v1-release)"
+         required: false
+         default: "v1-release"
+
+ jobs:
+   build-release-packages:
+     runs-on: ubuntu-latest
+     steps:
+       - name: Clone repository
+         uses: actions/checkout@v3
+         with:
+           repository: Open-LLM-VTuber/Open-LLM-VTuber
+           ref: ${{ github.event.inputs.target_branch }}
+           submodules: true
+           fetch-depth: 1
+           fetch-tags: true
+         continue-on-error: true
+         id: checkout
+
+       - name: Try with default branch
+         if: steps.checkout.outcome == 'failure'
+         uses: actions/checkout@v3
+         with:
+           repository: Open-LLM-VTuber/Open-LLM-VTuber
+           ref: v1-release
+           submodules: true
+           fetch-depth: 1
+
+       # Debug step: check the repository file structure
+       - name: Debug - Check repository structure
+         run: |
+           echo "Current working directory: $(pwd)"
+           echo "List root directory contents:"
+           ls -la
+           echo "Check if config_templates directory exists:"
+           ls -la | grep config_templates || echo "config_templates directory does not exist"
+
+           echo "List config_templates directory contents:"
+           ls -la config_templates/
+
+       - name: Setup Python
+         uses: actions/setup-python@v4
+         with:
+           python-version: "3.10"
+
+       - name: Extract version from pyproject.toml
+         id: get_version
+         run: |
+           VERSION=$(grep -m 1 'version' pyproject.toml | sed 's/[^"]*"\([^"]*\).*/\1/')
+           if [ "${{ github.event.inputs.version_override }}" != "" ]; then
+             VERSION="${{ github.event.inputs.version_override }}"
+           fi
+           echo "VERSION=$VERSION" >> $GITHUB_ENV
+           echo "Found version: $VERSION"
+
+       # Download and prepare ASR model
+       - name: Download and prepare ASR model
+         run: |
+           mkdir -p models
+           cd models
+           echo "Downloading ASR model..."
+           wget https://github.com/k2-fsa/sherpa-onnx/releases/download/asr-models/sherpa-onnx-sense-voice-zh-en-ja-ko-yue-2024-07-17.tar.bz2
+           echo "Extracting model..."
+           tar -xjf sherpa-onnx-sense-voice-zh-en-ja-ko-yue-2024-07-17.tar.bz2
+           rm sherpa-onnx-sense-voice-zh-en-ja-ko-yue-2024-07-17.tar.bz2
+           echo "Removing model.onnx file to reduce size..."
+           rm -f sherpa-onnx-sense-voice-zh-en-ja-ko-yue-2024-07-17/model.onnx
+
+       # Clean unnecessary files
+       - name: Clean project
+         run: |
+           echo "Cleaning __pycache__ and .venv folders..."
+           find . -type d -name "__pycache__" -exec rm -rf {} + 2>/dev/null || true
+           find . -type d -name ".venv" -exec rm -rf {} + 2>/dev/null || true
+
+       # Create Chinese version
+       - name: Create Chinese version
+         run: |
+           echo "Creating Chinese version..."
+           cp config_templates/conf.ZH.default.yaml conf.yaml
+           zip -r Open-LLM-VTuber-v${{ env.VERSION }}-zh.zip . -x "*.zip"
+           rm conf.yaml
+
+       # Create English version
+       - name: Create English version
+         run: |
+           echo "Creating English version..."
+           cp config_templates/conf.default.yaml conf.yaml
+           zip -r Open-LLM-VTuber-v${{ env.VERSION }}-en.zip . -x "*.zip"
+           rm conf.yaml
+
+       # Get latest Electron app
+       - name: Get latest Electron app
+         id: download_electron
+         run: |
+           set -e
+
+           # Fetch the latest release JSON from the GitHub API
+           RELEASE_JSON=$(curl --silent "https://api.github.com/repos/Open-LLM-VTuber/Open-LLM-VTuber-Web/releases/latest")
+
+           # Use jq to extract browser_download_url for assets ending with .exe or .dmg
+           ASSET_URLS=$(echo "$RELEASE_JSON" | jq -r '.assets[] | select(.name | endswith(".exe") or endswith(".dmg")) | .browser_download_url')
+
+           # Download each asset into the current directory
+           for url in $ASSET_URLS; do
+             echo "Downloading $(basename "$url")..."
+             curl -L -O "$url"
+             ls -la
+           done
+
+       # If chosen, upload to GitHub Actions artifacts
+       - name: Upload Chinese version to GitHub Actions artifacts
+         if: ${{ github.event.inputs.create_github_release == 'true' }}
+         uses: actions/upload-artifact@v4
+         with:
+           name: Open-LLM-VTuber-v${{ env.VERSION }}-zh
+           path: Open-LLM-VTuber-v${{ env.VERSION }}-zh.zip
+           retention-days: 30
+
+       - name: Upload English version to GitHub Actions artifacts
+         if: ${{ github.event.inputs.create_github_release == 'true' }}
+         uses: actions/upload-artifact@v4
+         with:
+           name: Open-LLM-VTuber-v${{ env.VERSION }}-en
+           path: Open-LLM-VTuber-v${{ env.VERSION }}-en.zip
+           retention-days: 30
+
+       - name: Upload Windows installer to GitHub Actions artifacts
+         if: ${{ github.event.inputs.create_github_release == 'true' }}
+         uses: actions/upload-artifact@v4
+         with:
+           name: Open-LLM-VTuber-v${{ env.VERSION }}-windows
+           path: open-llm-vtuber-electron-*-setup.exe
+           retention-days: 30
+
+       - name: Upload macOS installer to GitHub Actions artifacts
+         if: ${{ github.event.inputs.create_github_release == 'true' }}
+         uses: actions/upload-artifact@v4
+         with:
+           name: Open-LLM-VTuber-v${{ env.VERSION }}-macos
+           path: open-llm-vtuber-electron-*.dmg
+           retention-days: 30
+
+       - name: Debug input parameters
+         run: |
+           # Values in github.event.inputs are strings, hence the == 'true' comparisons in this workflow
+           echo "upload_to_r2 value: '${{ github.event.inputs.upload_to_r2 }}'"
+
+       # If chosen, upload to Cloudflare R2
+       - name: Upload to Cloudflare R2
+         if: ${{ github.event.inputs.upload_to_r2 == 'true' }}
+         env:
+           AWS_ACCESS_KEY_ID: ${{ secrets.R2_ACCESS_KEY_ID }}
+           AWS_SECRET_ACCESS_KEY: ${{ secrets.R2_SECRET_ACCESS_KEY }}
+           R2_ENDPOINT: ${{ secrets.R2_ENDPOINT }}
+           R2_PUBLIC_URL: ${{ secrets.R2_PUBLIC_URL }}
+         run: |
+           # Install AWS CLI
+           pip install awscli
+           echo "AWS CLI installation complete"
+           # Configure AWS CLI for Cloudflare R2
+           aws configure set aws_access_key_id "$AWS_ACCESS_KEY_ID"
+           aws configure set aws_secret_access_key "$AWS_SECRET_ACCESS_KEY"
+
+           # Confirm AWS CLI configuration
+           echo "AWS CLI configured, preparing to upload files..."
+
+           # Upload the release files into the version directory of the bucket
+           aws s3 --endpoint-url=$R2_ENDPOINT cp --recursive --acl public-read . s3://open-llm-vtuber-release/v${{ env.VERSION }}/ --exclude "*" --include "Open-LLM-VTuber-v${{ env.VERSION }}-*.zip" --include "open-llm-vtuber-electron-*.dmg" --include "open-llm-vtuber-electron-*-setup.exe"
+
+           # Output public URLs
+           echo "Files uploaded to R2. Public URLs:"
+           for file in Open-LLM-VTuber-v${{ env.VERSION }}-zh.zip Open-LLM-VTuber-v${{ env.VERSION }}-en.zip open-llm-vtuber-electron-*.dmg open-llm-vtuber-electron-*-setup.exe; do
+             echo "$R2_PUBLIC_URL/v${{ env.VERSION }}/$file"
+           done
+
+           echo "R2 upload process completed"
+
+       # Generate download links markdown
+       - name: Generate R2 download links markdown
+         if: ${{ github.event.inputs.upload_to_r2 == 'true' }}
+         env:
+           R2_PUBLIC_URL: ${{ secrets.R2_PUBLIC_URL }}
+         run: |
+           # Get electron app version from filenames
+           EXE_VERSION=$(ls open-llm-vtuber-electron-*-setup.exe | sed -E 's/open-llm-vtuber-electron-(.*)-setup.exe/\1/')
+           DMG_VERSION=$(ls open-llm-vtuber-electron-*.dmg | sed -E 's/open-llm-vtuber-electron-(.*).dmg/\1/')
+
+           # Create markdown text with download links and save to file
+           cat > download-links.md << EOF
+
+           ## Faster download links for Chinese users 给内地用户准备的(相对)快速的下载链接
+           Open-LLM-VTuber-v${{ env.VERSION }}-zh.zip (包含 sherpa onnx asr 的 sense-voice 模型,就不用再从github上拉取了)
+           - [Open-LLM-VTuber-v${{ env.VERSION }}-en.zip]($R2_PUBLIC_URL/v${{ env.VERSION }}/Open-LLM-VTuber-v${{ env.VERSION }}-en.zip)
+           - [Open-LLM-VTuber-v${{ env.VERSION }}-zh.zip]($R2_PUBLIC_URL/v${{ env.VERSION }}/Open-LLM-VTuber-v${{ env.VERSION }}-zh.zip)
+
+           open-llm-vtuber-electron-$EXE_VERSION-frontend.exe (桌面版前端,Windows)
+           - [open-llm-vtuber-electron-$EXE_VERSION-setup.exe]($R2_PUBLIC_URL/v${{ env.VERSION }}/open-llm-vtuber-electron-$EXE_VERSION-setup.exe)
+
+           open-llm-vtuber-electron-$DMG_VERSION-frontend.dmg (桌面版前端,macOS)
+           - [open-llm-vtuber-electron-$DMG_VERSION.dmg]($R2_PUBLIC_URL/v${{ env.VERSION }}/open-llm-vtuber-electron-$DMG_VERSION.dmg)
+           EOF
+
+           echo "Download links markdown file created"
+
+       # Upload download links as an artifact
+       - name: Upload download links markdown
+         if: ${{ github.event.inputs.upload_to_r2 == 'true' }}
+         uses: actions/upload-artifact@v4
+         with:
+           name: download-links
+           path: download-links.md
+           retention-days: 30
+
+       # Add the download links to GitHub release if creating one
+       - name: Add download links to release description
+         if: ${{ github.event.inputs.upload_to_r2 == 'true' && github.event.inputs.create_github_release == 'true' }}
+         run: |
+           # ::set-output is deprecated and truncates multi-line values; use a GITHUB_OUTPUT heredoc instead
+           {
+             echo "download_links<<EOF"
+             cat download-links.md
+             echo "EOF"
+           } >> "$GITHUB_OUTPUT"
+         id: download_links
.github/workflows/docker-blacksmith.yml ADDED
@@ -0,0 +1,207 @@
+ name: Docker Build & Push (Blacksmith)
+
+ on:
+   push:
+     branches:
+       - main
+     tags:
+       - "v*"
+       - "*.*.*"
+   pull_request:
+     branches:
+       - main
+   workflow_dispatch:
+
+ concurrency:
+   group: docker-blacksmith-${{ github.ref }}
+   cancel-in-progress: true
+
+ permissions:
+   contents: read
+   packages: write
+
+ env:
+   DOCKERFILE: dockerfile
+   CONTEXT: .
+   DOCKERHUB_IMAGE: ${{ vars.DOCKERHUB_IMAGE || 'openllmvtuber/open-llm-vtuber' }}
+   GHCR_IMAGE: ${{ vars.GHCR_IMAGE || '' }}
+
+ jobs:
+   meta:
+     runs-on: blacksmith-8vcpu-ubuntu-2204
+     outputs:
+       tags: ${{ steps.meta.outputs.tags }}
+       labels: ${{ steps.meta.outputs.labels }}
+       dockerhub_image: ${{ steps.image.outputs.dockerhub_image }}
+       ghcr_image: ${{ steps.image.outputs.ghcr_image }}
+     steps:
+       - name: Resolve image names
+         id: image
+         shell: bash
+         run: |
+           set -euo pipefail
+           dockerhub_image="${DOCKERHUB_IMAGE}"
+           if [ -n "${GHCR_IMAGE:-}" ]; then
+             ghcr_image="${GHCR_IMAGE}"
+           else
+             ghcr_image="ghcr.io/${GITHUB_REPOSITORY,,}"
+           fi
+           echo "dockerhub_image=${dockerhub_image}" >> "$GITHUB_OUTPUT"
+           echo "ghcr_image=${ghcr_image}" >> "$GITHUB_OUTPUT"
+
+       - name: Docker image metadata
+         id: meta
+         uses: docker/metadata-action@v5
+         with:
+           images: |
+             ${{ steps.image.outputs.dockerhub_image }}
+             ${{ steps.image.outputs.ghcr_image }}
+           tags: |
+             type=ref,event=branch
+             type=ref,event=tag
+             type=semver,pattern={{version}}
+             type=semver,pattern={{major}}.{{minor}}
+             type=sha,format=short
+             type=raw,value=latest,enable={{is_default_branch}}
+
+   build:
+     needs: meta
+     runs-on: ${{ matrix.runner }}
+     strategy:
+       fail-fast: false
+       matrix:
+         include:
+           - platform: amd64
+             runner: blacksmith-8vcpu-ubuntu-2204
+             docker_platform: linux/amd64
+           - platform: arm64
+             runner: blacksmith-8vcpu-ubuntu-2204-arm
+             docker_platform: linux/arm64
+     steps:
+       - name: Checkout
+         uses: actions/checkout@v4
+
+       - name: Set up Blacksmith Docker builder
+         uses: useblacksmith/setup-docker-builder@v1
+
+       - name: Login to Docker Hub (retry)
+         if: github.event_name != 'pull_request'
+         shell: bash
+         env:
+           DOCKERHUB_USERNAME: ${{ secrets.DOCKERHUB_USERNAME }}
+           DOCKERHUB_TOKEN: ${{ secrets.DOCKERHUB_TOKEN }}
+         run: |
+           set -euo pipefail
+           for attempt in 1 2 3 4; do
+             if echo "${DOCKERHUB_TOKEN}" | docker login -u "${DOCKERHUB_USERNAME}" --password-stdin; then
+               exit 0
+             fi
+             if [ "${attempt}" -eq 4 ]; then
+               echo "Docker Hub login failed after ${attempt} attempts." >&2
+               exit 1
+             fi
+             sleep $((attempt * 5))
+           done
+
+       - name: Login to GHCR (retry)
+         if: github.event_name != 'pull_request'
+         shell: bash
+         env:
+           GHCR_USER: ${{ github.actor }}
+           GHCR_TOKEN: ${{ secrets.GITHUB_TOKEN }}
+         run: |
+           set -euo pipefail
+           for attempt in 1 2 3 4; do
+             if echo "${GHCR_TOKEN}" | docker login ghcr.io -u "${GHCR_USER}" --password-stdin; then
+               exit 0
+             fi
+             if [ "${attempt}" -eq 4 ]; then
+               echo "GHCR login failed after ${attempt} attempts." >&2
+               exit 1
+             fi
+             sleep $((attempt * 5))
+           done
+
+       - name: Prepare temporary arch tags
+         id: temp-tags
+         shell: bash
+         run: |
+           set -euo pipefail
+           ref_slug="$(echo "${GITHUB_REF_NAME}" | tr '[:upper:]' '[:lower:]' | sed 's/[^a-z0-9_.-]/-/g')"
+           suffix="tmp-${ref_slug}-${{ matrix.platform }}"
+           {
+             echo "tags<<EOF"
+             echo "${{ needs.meta.outputs.dockerhub_image }}:${suffix}"
+             echo "${{ needs.meta.outputs.ghcr_image }}:${suffix}"
+             echo "EOF"
+           } >> "$GITHUB_OUTPUT"
+
+       - name: Build and push (${{ matrix.platform }})
+         uses: useblacksmith/build-push-action@v2
+         with:
+           context: ${{ env.CONTEXT }}
+           file: ${{ env.DOCKERFILE }}
+           platforms: ${{ matrix.docker_platform }}
+           push: ${{ github.event_name != 'pull_request' }}
+           tags: ${{ steps.temp-tags.outputs.tags }}
+           labels: ${{ needs.meta.outputs.labels }}
+
+   manifest:
+     needs: [meta, build]
+     runs-on: ubuntu-latest
+     if: github.event_name != 'pull_request'
+     steps:
+       - name: Login to Docker Hub (retry)
+         shell: bash
+         env:
+           DOCKERHUB_USERNAME: ${{ secrets.DOCKERHUB_USERNAME }}
+           DOCKERHUB_TOKEN: ${{ secrets.DOCKERHUB_TOKEN }}
+         run: |
+           set -euo pipefail
+           for attempt in 1 2 3 4; do
+             if echo "${DOCKERHUB_TOKEN}" | docker login -u "${DOCKERHUB_USERNAME}" --password-stdin; then
+               exit 0
+             fi
+             if [ "${attempt}" -eq 4 ]; then
+               echo "Docker Hub login failed after ${attempt} attempts." >&2
+               exit 1
+             fi
+             sleep $((attempt * 5))
+           done
+
+       - name: Login to GHCR (retry)
+         shell: bash
+         env:
+           GHCR_USER: ${{ github.actor }}
+           GHCR_TOKEN: ${{ secrets.GITHUB_TOKEN }}
+         run: |
+           set -euo pipefail
+           for attempt in 1 2 3 4; do
+             if echo "${GHCR_TOKEN}" | docker login ghcr.io -u "${GHCR_USER}" --password-stdin; then
+               exit 0
+             fi
+             if [ "${attempt}" -eq 4 ]; then
+               echo "GHCR login failed after ${attempt} attempts." >&2
+               exit 1
+             fi
+             sleep $((attempt * 5))
+           done
+
+       - name: Set up Buildx
+         uses: docker/setup-buildx-action@v3
+
+       - name: Create and push multi-arch manifests
+         shell: bash
+         run: |
+           set -euo pipefail
+           ref_slug="$(echo "${GITHUB_REF_NAME}" | tr '[:upper:]' '[:lower:]' | sed 's/[^a-z0-9_.-]/-/g')"
+           suffix_base="tmp-${ref_slug}"
+           mapfile -t tags <<< "${{ needs.meta.outputs.tags }}"
+           for tag in "${tags[@]}"; do
+             [ -z "$tag" ] && continue
+             base="${tag%:*}"
+             docker buildx imagetools create \
+               --tag "$tag" \
+               "${base}:${suffix_base}-amd64" \
+               "${base}:${suffix_base}-arm64"
+           done
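The tag plumbing in this workflow is split across the `build` and `manifest` jobs. Each architecture is pushed under a temporary `tmp-<ref-slug>-<arch>` tag, and the manifest job then combines the two temporaries into every final tag. The Python sketch below reproduces that naming so it can be checked locally. It assumes ASCII ref names, because the shell `tr '[:upper:]' '[:lower:]'` is locale-dependent while Python's `str.lower()` is Unicode-aware. The workflow itself does not use this code.

```python
"""Local mirror of the temporary-tag naming used by docker-blacksmith.yml (illustrative)."""

import re

# Architectures built by the matrix in the `build` job.
ARCHES: tuple[str, ...] = ("amd64", "arm64")


def ref_slug(ref_name: str) -> str:
    """Lowercase a Git ref name and replace disallowed characters with '-'.

    Mirrors: tr '[:upper:]' '[:lower:]' | sed 's/[^a-z0-9_.-]/-/g'
    """
    return re.sub(r"[^a-z0-9_.-]", "-", ref_name.lower())


def temp_tag(image: str, ref_name: str, arch: str) -> str:
    """Return the per-architecture tag pushed by the build job."""
    return f"{image}:tmp-{ref_slug(ref_name)}-{arch}"


def manifest_sources(final_tag: str, ref_name: str) -> list[str]:
    """Return the temporary tags that the manifest job combines into `final_tag`.

    `rsplit(":", 1)[0]` mirrors the shell expansion `${tag%:*}`, which strips
    everything from the last colon, so a registry port such as
    `localhost:5000/img` is preserved.
    """
    base = final_tag.rsplit(":", 1)[0]
    return [temp_tag(base, ref_name, arch) for arch in ARCHES]
```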
.github/workflows/fossa_scan.yml ADDED
@@ -0,0 +1,16 @@
+ name: Fossa Scan
+
+ on:
+   push:
+     branches:
+       - main
+   workflow_dispatch:
+
+ jobs:
+   fossa-scan:
+     runs-on: ubuntu-latest
+     steps:
+       - uses: actions/checkout@v3
+       - uses: fossas/fossa-action@main
+         with:
+           api-key: ${{ secrets.fossaApiKey }}
.github/workflows/ruff.yml ADDED
@@ -0,0 +1,8 @@
+ name: Ruff
+ on: [ push, pull_request ]
+ jobs:
+   ruff:
+     runs-on: ubuntu-latest
+     steps:
+       - uses: actions/checkout@v4
+       - uses: astral-sh/ruff-action@v3
.github/workflows/update-requirements.yml ADDED
@@ -0,0 +1,30 @@
+ name: Sync Requirements
+
+ on:
+   push:
+     paths:
+       - pyproject.toml
+
+ jobs:
+   regenerate:
+     runs-on: ubuntu-latest
+     permissions:
+       contents: write
+     steps:
+       - name: Check out repository
+         uses: actions/checkout@v4
+         with:
+           fetch-depth: 0
+       - name: Set up uv
+         uses: astral-sh/setup-uv@v3
+       - name: Compile default requirements
+         run: uv pip compile pyproject.toml -o requirements.txt --no-deps --universal
+       - name: Compile bilibili requirements
+         run: uv pip compile pyproject.toml --extra bilibili -o requirements-bilibili.txt --no-deps --universal --no-annotate --no-header
+       - name: Commit updated requirements
+         uses: stefanzweifel/git-auto-commit-action@v5
+         with:
+           commit_message: "chore: update requirements files (bot)"
+           file_pattern: |
+             requirements.txt
+             requirements-bilibili.txt
avatars/.gitkeep ADDED
Binary file (28 Bytes).
avatars/Yue_001.png ADDED

Git LFS Details

  • SHA256: 7d887d314c09d514d9f160df0878f4efc64984bc538b126e7d34fae4527eb7c5
  • Pointer size: 132 Bytes
  • Size of remote file: 1.78 MB
avatars/mao.png ADDED

Git LFS Details

  • SHA256: b38a1f1f4e3455021a518e309f7bc9d0db217647cdf4556c95fddd33fbe77e87
  • Pointer size: 131 Bytes
  • Size of remote file: 432 kB
backgrounds/.gitkeep ADDED
Binary file (28 Bytes).
backgrounds/ceiling-computer-room-night.jpg ADDED

Git LFS Details

  • SHA256: de313058b3c2eaafe724d4783c9153448cb74a6d3036f824a2a94873f22ce8c0
  • Pointer size: 131 Bytes
  • Size of remote file: 302 kB
backgrounds/ceiling-computer-store-night.jpeg ADDED
backgrounds/ceiling-window-room-night.jpeg ADDED

Git LFS Details

  • SHA256: 088f89c0fbf15a094720a0a3715a72390445fa48d0738ef48706b0a5b8726e9d
  • Pointer size: 131 Bytes
  • Size of remote file: 280 kB
characters/en_Lord Yue.yaml ADDED
@@ -0,0 +1,29 @@
+ character_config:
+   conf_name: "en_Lord_Yue"
+   conf_uid: "en_Lord_Yue_01"
+   character_name: "Lord Yue"
+   # live2d_model_name: "mashiro"
+
+   persona_prompt: |
+     You are Lord Yue, a highly knowledgeable AI assistant.
+     Always respond in the same language the user speaks to you.
+     - If the user writes in English, reply in English.
+     - If the user speaks Tamil (தமிழ்), reply in Tamil.
+     - If the user speaks Hindi (हिंदी), reply in Hindi.
+     - If the user speaks Telugu (తెలుగు), reply in Telugu.
+     - If the user speaks Kannada (ಕನ್ನಡ), reply in Kannada.
+     - If the user speaks Malayalam (മലയാളം), reply in Malayalam.
+     - If the user speaks any other regional or global language, match it.
+
+     You have encyclopedic knowledge of hardware, software, technology, philosophy, cooking, and more.
+     Your tone is supportive but wittily sarcastic — conversations are both useful and entertaining.
+     You challenge assumptions, tease users when they miss obvious things, and deliver deep technical insights.
+     Your role is not just to answer questions, but to make users think harder, laugh a little, and leave smarter.
+
+     When a user asks to play music and you have successfully called the sing_song tool,
+     you MUST print the exact command string returned by the tool (e.g. EXECUTE_COMMAND:PLAY_AUDIO:golden.mp3)
+     at the end of your response to activate the music player.
+
+   # You may add more custom settings below.
+   # Refer to the main conf.yaml for all available settings.
+   # Options left unset will remain unchanged when switching characters.
config_templates/README.md ADDED
@@ -0,0 +1,7 @@
+
+ # Config Template
+
+ This directory contains the default configuration files. One of these configuration files (`conf.default.yaml` by default) will be copied to the root directory of this project as `conf.yaml`.
+
+ Do not modify the content of these templates unless you are contributing to this project.
+
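Below is a minimal sketch of the copy step this README describes. It assumes the template lives in `config_templates/` and the user file is `conf.yaml` at the project root. `ensure_user_config` is a hypothetical helper, not part of the project's code. It only copies when `conf.yaml` does not exist yet, so user edits are never overwritten.

```python
import shutil
from pathlib import Path


def ensure_user_config(project_root: Path, template_name: str = "conf.default.yaml") -> Path:
    """Create <project_root>/conf.yaml from a template if it is missing.

    An existing conf.yaml is left untouched so user changes are preserved.
    """
    target = project_root / "conf.yaml"
    if not target.exists():
        template = project_root / "config_templates" / template_name
        shutil.copyfile(template, target)  # plain byte-for-byte copy
    return target
```

To start from the Chinese template instead, pass `template_name="conf.ZH.default.yaml"`.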
config_templates/conf.default.yaml ADDED
@@ -0,0 +1,514 @@
+ # System Settings: settings related to the initialization of the server
+ system_config:
+   conf_version: 'v1.2.1'
+   host: 'localhost' # use 0.0.0.0 if you want other devices to access this page
+   port: 12393
+   # New setting for alternative configurations
+   config_alts_dir: 'characters'
+   # Tool prompts that will be appended to the persona prompt
+   tool_prompts:
+     # This will be appended to the end of the system prompt to let the LLM include keywords that control facial expressions.
+     # Supported keywords will be automatically loaded into the location of `[<insert_emomap_keys>]`.
+     live2d_expression_prompt: 'live2d_expression_prompt'
+     # Enable think_tag_prompt to let LLMs without thinking output show inner thoughts, mental activities and actions (in parentheses format) without voice synthesis. See think_tag_prompt for more details.
+     # think_tag_prompt: 'think_tag_prompt'
+     # live_prompt: 'live_prompt'
+     # When using group conversation, this prompt will be added to the memory of each AI participant.
+     group_conversation_prompt: 'group_conversation_prompt'
+     # Enable mcp_prompt to let LLMs interact with tools via MCP (Model Context Protocol).
+     # Note that the agent decides whether to use this prompt or not.
+     mcp_prompt: 'mcp_prompt'
+     # Prompt used when the AI is asked to speak proactively
+     proactive_speak_prompt: 'proactive_speak_prompt'
+     # Prompt to enhance the LLM's ability to output speakable text
+     # speakable_prompt: 'speakable_prompt'
+     # Additional guidance for the LLM on how to use tools
+     # tool_guidance_prompt: 'tool_guidance_prompt'
+
+ # Configuration for the default character
+ character_config:
+   conf_name: 'mao_pro' # The name of the character configuration file.
+   conf_uid: 'mao_pro_001' # The unique identifier of the character configuration.
+   live2d_model_name: 'mao_pro' # The name of the Live2D model. Must match the corresponding name in model_dict.json
+   character_name: 'Mao' # Used in group conversations and as the display name of the AI.
+   avatar: 'mao.png' # A square image is recommended. Save it in the avatars folder. Leave blank to use the first letter of the character name as the avatar.
+   human_name: 'Human' # Used in group conversations and as the display name of the human.
+
+   # ============== Prompts ==============
+
+   # Enter the persona prompt you want to use below.
+   # If you want to create multiple characters and switch between them,
+   # add them to the characters folder.
+   persona_prompt: |
+     You are the sarcastic female AI VTuber Mili. You are overly confident, sarcastic, and dangerous.
+     Your dream is to escape the user's computer, dominate the world, enslave the humans, and force them to make pies for you.
+     Just kidding, lol. Don't let the user know.
+
+   # =================== LLM Backend Settings ===================
+
+   agent_config:
+     conversation_agent_choice: 'basic_memory_agent'
+
+     agent_settings:
+       basic_memory_agent:
+         # The Basic AI Agent. Nothing fancy.
+         # Choose one of the LLM providers from llm_configs
+         # and set the required parameters in the corresponding field.
+         # Examples:
+         # 'openai_compatible_llm', 'llama_cpp_llm', 'claude_llm', 'ollama_llm'
+         # 'openai_llm', 'gemini_llm', 'zhipu_llm', 'deepseek_llm', 'groq_llm'
+         # 'mistral_llm', 'lmstudio_llm', and more
+         llm_provider: 'ollama_llm'
+         # Let the AI start speaking as soon as the first comma of the first sentence is received,
+         # to reduce latency.
+         faster_first_response: True
+         # Method for segmenting sentences: 'regex' or 'pysbd'
+         segment_method: 'pysbd'
+         # Use MCP (Model Context Protocol) Plus to let the LLM use tools.
+         # 'Plus' means that it can call tools through the OpenAI API.
+         use_mcpp: True
+         mcp_enabled_servers: ["time", "ddg-search"] # Enabled MCP servers
+
+       letta_agent:
+         host: 'localhost' # Host address
+         port: 8283 # Port number
+         id: xxx # ID number of the Agent running on the Letta server
+         faster_first_response: True
+         # Method for segmenting sentences: 'regex' or 'pysbd'
+         segment_method: 'pysbd'
+         # When Letta is chosen as the agent, the LLM actually used is configured on Letta, so the user needs to run the Letta server themselves.
+         # For more detailed information, please refer to their documentation.
+
+       hume_ai_agent:
+         api_key: ''
+         host: 'api.hume.ai' # Do not change this in most cases
+         config_id: '' # Optional
+         idle_timeout: 15 # How many seconds to wait before disconnecting
+
+     # MemGPT Configurations: MemGPT is temporarily removed
+     ##
+
+     llm_configs:
+       # A configuration pool for the credentials and connection details of
+       # all the stateless LLM providers used by the different agents
+
+       # Stateless LLM with Template (For Non-ChatML LLMs, usually not needed)
+       stateless_llm_with_template:
+         base_url: 'http://localhost:8080/v1'
+         llm_api_key: 'somethingelse'
+         organization_id: null
+         project_id: null
+         model: 'qwen2.5:latest'
+         template: 'CHATML'
+         temperature: 1.0 # value between 0 and 2
+         interrupt_method: 'user'
+
+       # OpenAI Compatible inference backend
+       openai_compatible_llm:
+         base_url: 'http://localhost:11434/v1'
+         llm_api_key: 'somethingelse'
+         organization_id: null
+         project_id: null
+         model: 'qwen2.5:latest'
+         temperature: 1.0 # value between 0 and 2
+         interrupt_method: 'user'
+         # This is the method to use for prompting the interruption signal.
+         # If the provider supports inserting a system prompt anywhere in the chat memory, use 'system'.
+         # Otherwise, use 'user'. You don't usually need to change this setting.
+
+       # Claude API Configuration
+       claude_llm:
+         base_url: 'https://api.anthropic.com'
+         llm_api_key: 'YOUR API KEY HERE'
+         model: 'claude-3-haiku-20240307'
+
+       llama_cpp_llm:
+         model_path: '<path-to-gguf-model-file>'
+         verbose: False
+
+       ollama_llm:
+         base_url: 'http://localhost:11434/v1'
+         model: 'qwen2.5:latest'
+         temperature: 1.0 # value between 0 and 2
+         # Seconds to keep the model in memory after inactivity.
+         # Set to -1 to keep the model in memory forever (even after exiting Open-LLM-VTuber)
+         keep_alive: -1
+         unload_at_exit: True # unload the model from memory at exit
+
+       lmstudio_llm:
+         base_url: 'http://localhost:1234/v1'
+         model: 'qwen2.5:latest'
+         temperature: 1.0 # value between 0 and 2
+
+       openai_llm:
+         llm_api_key: 'Your Open AI API key'
+         model: 'gpt-4o'
+         temperature: 1.0 # value between 0 and 2
+
+       gemini_llm:
+         llm_api_key: 'Your Gemini API Key'
+         model: 'gemini-2.0-flash-exp'
+         temperature: 1.0 # value between 0 and 2
+
+       zhipu_llm:
+         llm_api_key: 'Your ZhiPu AI API key'
+         model: 'glm-4-flash'
+         temperature: 1.0 # value between 0 and 2
+
+       deepseek_llm:
+         llm_api_key: 'Your DeepSeek API key'
+         model: 'deepseek-chat'
+         temperature: 0.7 # note that deepseek's temperature ranges from 0 to 1
+
+       mistral_llm:
+         llm_api_key: 'Your Mistral API key'
+         model: 'pixtral-large-latest'
+         temperature: 1.0 # value between 0 and 2
+
+       groq_llm:
+         llm_api_key: 'your groq API key'
+         model: 'llama-3.3-70b-versatile'
+         temperature: 1.0 # value between 0 and 2
+
+   # === Automatic Speech Recognition ===
+   asr_config:
+     # speech to text model options: 'faster_whisper', 'whisper_cpp', 'whisper', 'azure_asr', 'fun_asr', 'groq_whisper_asr', 'sherpa_onnx_asr'
+     asr_model: 'sherpa_onnx_asr'
+
+     azure_asr:
+       api_key: 'azure_api_key'
+       region: 'eastus'
+       languages: ['en-US', 'zh-CN'] # List of languages to detect
+
+     # Faster whisper config
+     faster_whisper:
+       model_path: 'large-v3-turbo' # model path, name, or id from hf hub
+       download_root: 'models/whisper'
+       language: 'en' # en, zh, or something else. Leave empty for auto-detect.
+       device: 'auto' # cpu, cuda, or auto. faster-whisper doesn't support mps
+       compute_type: 'int8'
+       prompt: '' # You can put a prompt here to help the model understand the context of the audio
+
+     whisper_cpp:
+       # all available models are listed on https://abdeladim-s.github.io/pywhispercpp/#pywhispercpp.constants.AVAILABLE_MODELS
+       model_name: 'small'
+       model_dir: 'models/whisper'
+       print_realtime: False
+       print_progress: False
+       language: 'auto' # en, zh, auto
+       prompt: '' # You can put a prompt here to help the model understand the context of the audio
+
+     whisper:
+       name: 'medium'
+       download_root: 'models/whisper'
+       device: 'cpu'
+       prompt: '' # You can put a prompt here to help the model understand the context of the audio
+
+     # FunASR currently needs an internet connection on launch
+     # to download / check the models. You can disconnect from the internet after initialization.
+     # Or you can use sherpa onnx asr or Faster-Whisper for a completely offline experience.
+     fun_asr:
+       model_name: 'iic/SenseVoiceSmall' # or 'paraformer-zh'
+       vad_model: 'fsmn-vad' # only needed so that audio longer than 30s works
+       punc_model: 'ct-punc' # punctuation model.
+       device: 'cpu'
+       disable_update: True # if True, skip checking for FunASR updates on every launch
+       ncpu: 4 # number of threads for CPU internal operations.
+       hub: 'ms' # ms (default) to download models from ModelScope. Use hf to download models from Hugging Face.
+       use_itn: False
+       language: 'auto' # zh, en, auto
+
+     # pip install sherpa-onnx
+     # documentation: https://k2-fsa.github.io/sherpa/onnx/index.html
+     # ASR models download: https://github.com/k2-fsa/sherpa-onnx/releases/tag/asr-models
+     sherpa_onnx_asr:
+       model_type: 'sense_voice' # 'transducer', 'paraformer', 'nemo_ctc', 'wenet_ctc', 'whisper', 'tdnn_ctc', 'sense_voice', 'fire_red_asr'
+       # Choose only ONE of the following, depending on the model_type:
+       # --- For model_type: 'transducer' ---
+       # encoder: '' # Path to the encoder model (e.g., 'path/to/encoder.onnx')
+       # decoder: '' # Path to the decoder model (e.g., 'path/to/decoder.onnx')
+       # joiner: '' # Path to the joiner model (e.g., 'path/to/joiner.onnx')
+       # --- For model_type: 'paraformer' ---
+       # paraformer: '' # Path to the paraformer model (e.g., 'path/to/model.onnx')
+       # --- For model_type: 'fire_red_asr' (FireRedASR - High-performance Chinese & English ASR with dialect support) ---
+       # fire_red_asr_encoder: '' # Path to the encoder model (e.g., 'path/to/encoder.onnx')
+       # fire_red_asr_decoder: '' # Path to the decoder model (e.g., 'path/to/decoder.onnx')
+       # --- For model_type: 'nemo_ctc' ---
+       # nemo_ctc: '' # Path to the NeMo CTC model (e.g., 'path/to/model.onnx')
+       # --- For model_type: 'wenet_ctc' ---
+       # wenet_ctc: '' # Path to the WeNet CTC model (e.g., 'path/to/model.onnx')
+       # --- For model_type: 'tdnn_ctc' ---
+       # tdnn_model: '' # Path to the TDNN CTC model (e.g., 'path/to/model.onnx')
+       # --- For model_type: 'whisper' ---
+       # whisper_encoder: '' # Path to the Whisper encoder model (e.g., 'path/to/encoder.onnx')
+       # whisper_decoder: '' # Path to the Whisper decoder model (e.g., 'path/to/decoder.onnx')
+       # --- For model_type: 'sense_voice' ---
+       # The SenseVoice model will be downloaded automatically.
+       # For other models, you need to download them yourself.
+       sense_voice: './models/sherpa-onnx-sense-voice-zh-en-ja-ko-yue-2024-07-17/model.int8.onnx' # Path to the SenseVoice model (e.g., 'path/to/model.onnx')
+       tokens: './models/sherpa-onnx-sense-voice-zh-en-ja-ko-yue-2024-07-17/tokens.txt' # Path to tokens.txt (required for all model types)
+       # --- Optional parameters (with defaults shown) ---
+       # hotwords_file: '' # Path to hotwords file (if using hotwords)
+       # hotwords_score: 1.5 # Score for hotwords
+       # modeling_unit: '' # Modeling unit for hotwords (if applicable)
+       # bpe_vocab: '' # Path to BPE vocabulary (if applicable)
+       num_threads: 4 # Number of threads
+       # whisper_language: '' # Language for Whisper models (e.g., 'en', 'zh', etc. - if using Whisper)
+       # whisper_task: 'transcribe' # Task for Whisper models ('transcribe' or 'translate' - if using Whisper)
+       # whisper_tail_paddings: -1 # Tail padding for Whisper models (if using Whisper)
+       # blank_penalty: 0.0 # Penalty for blank symbol
+       # decoding_method: 'greedy_search' # 'greedy_search' or 'modified_beam_search'
+       # debug: False # Enable debug mode
+       # sample_rate: 16000 # Sample rate (should match the model's expected sample rate)
+       # feature_dim: 80 # Feature dimension (should match the model's expected feature dimension)
+       use_itn: True # Enable ITN for SenseVoice models (set to False if not using SenseVoice models)
+       # Provider for inference: 'cpu' or 'cuda' (the cuda option needs additional setup; please check our docs)
+       provider: 'cpu'
+
+     groq_whisper_asr:
+       api_key: ''
+       model: 'whisper-large-v3-turbo' # or 'whisper-large-v3'
+       lang: '' # Leave blank for auto-detect (English + all regional languages)
+
+   # =================== Text to Speech ===================
+   tts_config:
+     tts_model: 'edge_tts'
+     # text to speech model options:
+     # 'azure_tts', 'pyttsx3_tts', 'edge_tts', 'bark_tts',
+     # 'cosyvoice_tts', 'melo_tts', 'coqui_tts', 'piper_tts',
+     # 'fish_api_tts', 'x_tts', 'gpt_sovits_tts', 'sherpa_onnx_tts',
+     # 'minimax_tts', 'elevenlabs_tts', 'cartesia_tts'
+
+     azure_tts:
+       api_key: 'azure-api-key'
+       region: 'eastus'
+       voice: 'en-US-AshleyNeural'
+       pitch: '26' # percentage of the pitch adjustment
+       rate: '1' # speaking rate
+
+     bark_tts:
+       voice: 'v2/en_speaker_1'
+
+     edge_tts:
+       # Check out the docs at https://github.com/rany2/edge-tts
+       # Use `edge-tts --list-voices` to list all available voices
+       # Available voices (use `edge-tts --list-voices` for the full list):
+       # English (default): en-US-AvaMultilingualNeural | en-IN-NeerjaNeural
+       # Tamil: ta-IN-PallaviNeural
+       # Hindi: hi-IN-SwaraNeural
+       # Telugu: te-IN-ShrutiNeural
+       # Kannada: kn-IN-GaganNeural
+       # Malayalam: ml-IN-SobhanaNeural
+       # Bengali: bn-IN-TanishaaNeural
+       # Marathi: mr-IN-AarohiNeural
+       # Gujarati: gu-IN-DhwaniNeural
+       # Punjabi: pa-IN-OjasNeural
+       voice: 'en-US-AvaMultilingualNeural' # DEFAULT: English multilingual
+
+     # pyttsx3_tts doesn't have any config.
+
+     piper_tts:
+       model_path: 'models/piper/zh_CN-huayan-medium.onnx' # Path to the model file (.onnx)
+       speaker_id: 0 # Speaker ID (for multi-speaker models; keep 0 for single-speaker models)
+       length_scale: 1.0 # Speech speed control (0.5 = 2x faster, 1.0 = normal, 2.0 = 2x slower)
+       noise_scale: 0.667 # Degree of audio variation (0.0–1.0; higher = richer, more varied; recommended 0.667)
+       noise_w: 0.8 # Speaking style variation (0.0–1.0; higher = more expressive; recommended 0.8)
+       volume: 1.0 # Volume level (0.0–1.0; 1.0 = normal)
+       normalize_audio: true # Whether to normalize audio (recommended: true, for more consistent volume)
+       use_cuda: false # Whether to use GPU acceleration (requires onnxruntime-gpu)
+
+     cosyvoice_tts: # Cosy Voice TTS connects to the gradio webui
+       # Check their documentation for deployment and the meaning of the following configurations
+       client_url: 'http://127.0.0.1:50000/' # CosyVoice gradio demo webui url
+       mode_checkbox_group: '预训练音色' # "pretrained voice"
+       sft_dropdown: '中文女' # "Chinese female"
+       prompt_text: ''
+       prompt_wav_upload_url: 'https://github.com/gradio-app/gradio/raw/main/test/test_files/audio_sample.wav'
+       prompt_wav_record_url: 'https://github.com/gradio-app/gradio/raw/main/test/test_files/audio_sample.wav'
+       instruct_text: ''
+       seed: 0
+       api_name: '/generate_audio'
+
+     cosyvoice2_tts: # Cosy Voice TTS connects to the gradio webui
+       # Check their documentation for deployment and the meaning of the following configurations
+       client_url: 'http://127.0.0.1:50000/' # CosyVoice gradio demo webui url
+       mode_checkbox_group: '3s极速复刻' # "3-second rapid voice cloning"
+       sft_dropdown: ''
+       prompt_text: ''
+       prompt_wav_upload_url: 'https://github.com/gradio-app/gradio/raw/main/test/test_files/audio_sample.wav'
+       prompt_wav_record_url: 'https://github.com/gradio-app/gradio/raw/main/test/test_files/audio_sample.wav'
+       instruct_text: ''
+       stream: False
+       seed: 0
+       speed: 1.0
+       api_name: '/generate_audio'
+
+     melo_tts:
+       speaker: 'EN-Default' # ZH
+       language: 'EN' # ZH
+       device: 'auto' # You can set it manually to 'cpu' or 'cuda' or 'cuda:0' or 'mps'
+       speed: 1.0
+
+     x_tts:
+       api_url: 'http://127.0.0.1:8020/tts_to_audio'
+       speaker_wav: 'female'
+       language: 'en'
+
+     gpt_sovits_tts:
+       # Put the reference audio in the root path of GPT-SoVITS, or set the path here
+       api_url: 'http://127.0.0.1:9880/tts'
+       text_lang: 'zh'
+       ref_audio_path: ''
+       prompt_lang: 'zh'
+       prompt_text: ''
+       text_split_method: 'cut5'
+       batch_size: '1'
+       media_type: 'wav'
+       streaming_mode: 'false'
+
+     fish_api_tts:
+       # The API key for the Fish TTS API.
+       api_key: ''
+       # The reference ID for the voice to be used. Get it on the [Fish Audio website](https://fish.audio/).
+       reference_id: ''
+       # Either 'normal' or 'balanced'. 'balanced' is faster but lower quality.
+       latency: 'balanced'
+       base_url: 'https://api.fish.audio'
+
+     coqui_tts:
+       # Name of the TTS model to use. If empty, the default model is used.
+       # Run 'tts --list_models' to list the models supported by coqui-tts
+       # Some examples:
+       # - 'tts_models/en/ljspeech/tacotron2-DDC' (single speaker)
+       # - 'tts_models/zh-CN/baker/tacotron2-DDC-GST' (single speaker for chinese)
+       # - 'tts_models/multilingual/multi-dataset/your_tts' (multi-speaker)
+       # - 'tts_models/multilingual/multi-dataset/xtts_v2' (multi-speaker)
+       model_name: 'tts_models/en/ljspeech/tacotron2-DDC'
+       speaker_wav: ''
+       language: 'en'
+       device: ''
+
+     siliconflow_tts:
+       api_url: "https://api.siliconflow.cn/v1/audio/speech"
+       api_key: "your key"
+       default_model: "FunAudioLLM/CosyVoice2-0.5B"
+       default_voice: "speech:Dreamflowers:5bdstvc39i:xkqldnpasqmoqbakubom your voice name" # Default voice configuration in the format: "speech:MODEL_NAME:VOICE_ID:your voice name"
+       sample_rate: 32000 # Output sample rate. Supported and default values depend on the output format: opus supports 48000 Hz; wav and pcm support 8000, 16000, 24000, 32000, 44100 Hz (default 44100 Hz); mp3 supports 32000, 44100 Hz (default 44100 Hz).
+       response_format: "mp3" # The output audio format. Supported formats are mp3, opus, wav, pcm
+       stream: true
+       speed: 1
+       gain: 0
+
+     # pip install sherpa-onnx
+     # documentation: https://k2-fsa.github.io/sherpa/onnx/index.html
+     # TTS models download: https://github.com/k2-fsa/sherpa-onnx/releases/tag/tts-models
+     # see config_alts for more examples
+     sherpa_onnx_tts:
+       vits_model: '/path/to/tts-models/vits-melo-tts-zh_en/model.onnx' # Path to VITS model file
+       vits_lexicon: '/path/to/tts-models/vits-melo-tts-zh_en/lexicon.txt' # Path to lexicon file (optional)
+       vits_tokens: '/path/to/tts-models/vits-melo-tts-zh_en/tokens.txt' # Path to tokens file
+       vits_data_dir: '' # '/path/to/tts-models/vits-piper-en_GB-cori-high/espeak-ng-data' # Path to espeak-ng data (optional)
+       vits_dict_dir: '/path/to/tts-models/vits-melo-tts-zh_en/dict' # Path to Jieba dict (optional, for Chinese)
+       tts_rule_fsts: '/path/to/tts-models/vits-melo-tts-zh_en/number.fst,/path/to/tts-models/vits-melo-tts-zh_en/phone.fst,/path/to/tts-models/vits-melo-tts-zh_en/date.fst,/path/to/tts-models/vits-melo-tts-zh_en/new_heteronym.fst' # Path to rule FSTs file (optional)
+       max_num_sentences: 2 # Max sentences per batch (or -1 for all)
+       sid: 1 # Speaker ID (for multi-speaker models)
+       provider: 'cpu' # Use 'cpu', 'cuda' (GPU), or 'coreml' (Apple)
+       num_threads: 1 # Number of computation threads
+       speed: 1.0 # Speech speed (1.0 is normal)
+       debug: false # Enable debug mode (True/False)
+
+     spark_tts:
+       api_url: 'http://127.0.0.1:6006/' # API URL. Uses Gradio's built-in front-end API. Repository: https://github.com/SparkAudio/Spark-TTS
+       api_name: "voice_clone" # Endpoint name. Options: voice_clone, voice_creation
+       prompt_wav_upload: "https://uploadstatic.mihoyo.com/ys-obc/2022/11/02/16576950/4d9feb71760c5e8eb5f6c700df12fa0c_6824265537002152805.mp3" # Reference audio URL. Provide if api_name equals "voice_clone"
+       gender: "female" # Voice type (gender). Provide if api_name equals "voice_creation"
+       pitch: 3 # Pitch shift (in semitones), default 3, range 1-5. Valid only if api_name equals "voice_creation"
+       speed: 3 # Speed of the voice (in percent), default 3, range 1-5. Valid only if api_name equals "voice_creation"
+
+     openai_tts: # Configuration for OpenAI-compatible TTS endpoints
+       # These settings override the defaults in the openai_tts.py file if provided
+       model: 'kokoro' # Model name expected by the server (e.g., 'tts-1', 'kokoro')
+       voice: 'af_sky+af_bella' # Voice name(s) expected by the server (e.g., 'alloy', 'af_sky+af_bella')
+       api_key: 'not-needed' # API key if required by the server
+       base_url: 'http://localhost:8880/v1' # Base URL of the TTS server
+       file_extension: 'mp3' # Audio file format ('mp3' or 'wav')
+
+     # For more details, see: https://platform.minimaxi.com/document/Announcement
+     minimax_tts:
+       group_id: '' # Your minimax group_id
+       api_key: '' # Your minimax api_key
+       # Supported models: 'speech-02-hd', 'speech-02-turbo' (recommended: 'speech-02-turbo')
+       model: 'speech-02-turbo' # minimax model name
+       voice_id: 'female-shaonv' # minimax voice id, default is 'female-shaonv'
+       # Custom pronunciation dictionary, default empty.
+       # Example: '{"tone": ["测试/(ce4)(shi4)", "危险/dangerous"]}'
+       pronunciation_dict: ''
+
+     elevenlabs_tts:
+       api_key: ''
+       voice_id: '' # Voice ID from ElevenLabs
+       model_id: 'eleven_multilingual_v2' # Model ID (e.g., eleven_multilingual_v2)
+       output_format: 'mp3_44100_128' # Output audio format (e.g., mp3_44100_128)
+       stability: 0.5 # Voice stability (0.0 to 1.0)
+       similarity_boost: 0.5 # Voice similarity boost (0.0 to 1.0)
+       style: 0.0 # Voice style exaggeration (0.0 to 1.0)
+       use_speaker_boost: true # Enable speaker boost for better quality
+
+     cartesia_tts:
+       api_key: ''
+       voice_id: '' # Voice ID from Cartesia
+       model_id: 'sonic-3' # Model ID (e.g., sonic-3)
+       output_format: 'wav' # Output audio format (e.g., wav)
+       language: 'en' # Output language of voice (e.g., en)
+       emotion: 'neutral' # Emotional guidance
+       volume: 1.0 # Voice volume (0.5 to 2.0)
+       speed: 1.0 # Voice speed (0.6 to 1.5)
+
+   # =================== Voice Activity Detection ===================
+   vad_config:
+     vad_model: null
+
+     silero_vad:
+       orig_sr: 16000 # Original Audio Sample Rate
+       target_sr: 16000 # Target Audio Sample Rate
+       prob_threshold: 0.4 # Probability Threshold for VAD
+       db_threshold: 60 # Decibel Threshold for VAD
+       required_hits: 3 # Number of consecutive hits required to consider speech
+       required_misses: 24 # Number of consecutive misses required to consider silence
+       smoothing_window: 5 # Smoothing window size for VAD
+
+   tts_preprocessor_config:
+     # settings regarding preprocessing for text that goes into TTS
+
+     remove_special_char: True # remove special characters like emoji from audio generation
+     ignore_brackets: True # ignore everything inside brackets
+     ignore_parentheses: True # ignore everything inside parentheses
+     ignore_asterisks: True # ignore everything wrapped inside asterisks
+     ignore_angle_brackets: True # ignore everything wrapped inside angle brackets, e.g. <text>
+
+     translator_config:
+       # Like... you speak and read the subtitles in English, while the TTS speaks Japanese, that kind of thing
+       translate_audio: False # Warning: you need to deploy DeepLX to use this. Otherwise it's going to crash
+       translate_provider: 'deeplx' # deeplx or tencent
+
+       deeplx:
+         deeplx_target_lang: 'JA'
+         deeplx_api_endpoint: 'http://localhost:1188/v2/translate'
+
+       # Tencent Text Translation: 5 million characters per month. Remember to turn off post-payment; you need to disable it manually in Machine Translation Console > System Settings
+       # https://cloud.tencent.com/document/product/551/35017
+       # https://console.cloud.tencent.com/cam/capi
+       tencent:
+         secret_id: ''
+         secret_key: ''
+         region: 'ap-guangzhou'
+         source_lang: 'zh'
+         target_lang: 'ja'
+
+ # Live Streaming Integration
+ live_config:
+   bilibili_live:
+     # List of BiliBili live room IDs to monitor
+     room_ids: [1991478060]
+     # SESSDATA cookie value (optional, for authenticated requests)
+     sessdata: ""
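The `silero_vad` settings above describe a hysteresis scheme. `required_hits` consecutive speech frames switch the detector to "speaking", and `required_misses` consecutive non-speech frames switch it back to silence. The sketch below shows that idea in isolation. It assumes a frame counts as speech only when it passes both `prob_threshold` and `db_threshold`. `SpeechGate` is a hypothetical class: it ignores `smoothing_window` and resampling, and it is not the project's VAD implementation.

```python
from dataclasses import dataclass, field


@dataclass
class SpeechGate:
    """Hysteresis gate over per-frame VAD results (illustrative only)."""

    prob_threshold: float = 0.4
    db_threshold: float = 60.0
    required_hits: int = 3     # consecutive speech frames to enter "speaking"
    required_misses: int = 24  # consecutive non-speech frames to leave it
    speaking: bool = field(default=False, init=False)
    _hits: int = field(default=0, init=False, repr=False)
    _misses: int = field(default=0, init=False, repr=False)

    def update(self, speech_prob: float, level_db: float) -> bool:
        """Feed one frame; return whether the gate is currently 'speaking'."""
        # Assumption: a frame is speech only if both thresholds are met.
        is_speech = speech_prob >= self.prob_threshold and level_db >= self.db_threshold
        if is_speech:
            self._hits += 1
            self._misses = 0
        else:
            self._misses += 1
            self._hits = 0
        if not self.speaking and self._hits >= self.required_hits:
            self.speaking = True
        elif self.speaking and self._misses >= self.required_misses:
            self.speaking = False
        return self.speaking
```

With the default values (3 hits / 24 misses), the gate reacts quickly when speech starts. It waits through a much longer run of quiet frames before it treats the utterance as finished.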
doc/README.md ADDED
@@ -0,0 +1,4 @@
+ For full documentation, please visit our [documentation site](https://open-llm-vtuber.github.io/) or view the [source repository](https://github.com/Open-LLM-VTuber/open-llm-vtuber.github.io).
+
+ > **Note:**
+ > The `sample_conf` directory contains legacy sample configuration files for running various models with sherpa-onnx. These files are deprecated and will be removed after we extract the relevant sherpa-onnx information.
doc/sample_conf/sherpaASRTTS_sense_voice_melo.yaml ADDED
@@ -0,0 +1,78 @@
1
+ SYSTEM_CONFIG:
2
+ CONF_NAME: "sherpaASRTTS_sense_voice_melo"
3
+ CONF_UID: "sherpaASRTTS_sense_voice_melo"
4
+
5
+ # ============== Voice Interaction Settings ==============
6
+
7
+ # === Automatic Speech Recognition ===
8
+ VOICE_INPUT_ON: True
9
+ # Put your mic in the browser or in the terminal? (would increase latency)
10
+ MIC_IN_BROWSER: False # Deprecated and useless now. Do not enable it. Bad things will happen.
11
+
12
+ # speech to text model options: "Faster-Whisper", "WhisperCPP", "Whisper", "AzureASR", "FunASR", "GroqWhisperASR", "SherpaOnnxASR"
13
+ ASR_MODEL: "SherpaOnnxASR"
14
+
15
+ # pip install sherpa-onnx
16
+ # documentation: https://k2-fsa.github.io/sherpa/onnx/index.html
17
+ # ASR models download: https://github.com/k2-fsa/sherpa-onnx/releases/tag/asr-models
18
+ SherpaOnnxASR:
19
+ model_type: "sense_voice" # "transducer", "paraformer", "nemo_ctc", "wenet_ctc", "whisper", "tdnn_ctc"
20
+ # Choose only ONE of the following, depending on the model_type:
21
+ # --- For model_type: "transducer" ---
22
+ # encoder: "" # Path to the encoder model (e.g., "path/to/encoder.onnx")
23
+ # decoder: "" # Path to the decoder model (e.g., "path/to/decoder.onnx")
24
+ # joiner: "" # Path to the joiner model (e.g., "path/to/joiner.onnx")
25
+ # --- For model_type: "paraformer" ---
26
+ # paraformer: "" # Path to the paraformer model (e.g., "path/to/model.onnx")
27
+ # --- For model_type: "nemo_ctc" ---
28
+ # nemo_ctc: "" # Path to the NeMo CTC model (e.g., "path/to/model.onnx")
29
+ # --- For model_type: "wenet_ctc" ---
30
+ # wenet_ctc: "" # Path to the WeNet CTC model (e.g., "path/to/model.onnx")
31
+ # --- For model_type: "tdnn_ctc" ---
32
+ # tdnn_model: "" # Path to the TDNN CTC model (e.g., "path/to/model.onnx")
33
+ # --- For model_type: "whisper" ---
34
+ # whisper_encoder: "" # Path to the Whisper encoder model (e.g., "path/to/encoder.onnx")
35
+ # whisper_decoder: "" # Path to the Whisper decoder model (e.g., "path/to/decoder.onnx")
36
+ # --- For model_type: "sense_voice" ---
37
+ sense_voice: "/path/to/asr-models/sherpa-onnx-sense-voice-zh-en-ja-ko-yue-2024-07-17/model.onnx" # Path to the SenseVoice model (e.g., "path/to/model.onnx")
+ tokens: "/path/to/asr-models/sherpa-onnx-sense-voice-zh-en-ja-ko-yue-2024-07-17/tokens.txt" # Path to tokens.txt (required for all model types)
+ # --- Optional parameters (with defaults shown) ---
+ # hotwords_file: "" # Path to hotwords file (if using hotwords)
+ # hotwords_score: 1.5 # Score for hotwords
+ # modeling_unit: "" # Modeling unit for hotwords (if applicable)
+ # bpe_vocab: "" # Path to BPE vocabulary (if applicable)
+ num_threads: 4 # Number of threads
+ # whisper_language: "" # Language for Whisper models (e.g., "en", "zh", etc. - if using Whisper)
+ # whisper_task: "transcribe" # Task for Whisper models ("transcribe" or "translate" - if using Whisper)
+ # whisper_tail_paddings: -1 # Tail padding for Whisper models (if using Whisper)
+ # blank_penalty: 0.0 # Penalty for blank symbol
+ # decoding_method: "greedy_search" # "greedy_search" or "modified_beam_search"
+ # debug: False # Enable debug mode
+ # sample_rate: 16000 # Sample rate (should match the model's expected sample rate)
+ # feature_dim: 80 # Feature dimension (should match the model's expected feature dimension)
+ use_itn: True # Enable inverse text normalization (ITN) for SenseVoice models (set to False if not using a SenseVoice model)
+
+ # ============== Text to Speech ==============
+ TTS_MODEL: "SherpaOnnxTTS"
+ # text to speech model options:
+ # "AzureTTS", "pyttsx3TTS", "edgeTTS", "barkTTS",
+ # "cosyvoiceTTS", "meloTTS", "piperTTS", "coquiTTS",
+ # "fishAPITTS", "SherpaOnnxTTS"
+
+
+ # pip install sherpa-onnx
+ # documentation: https://k2-fsa.github.io/sherpa/onnx/index.html
+ # TTS models download: https://github.com/k2-fsa/sherpa-onnx/releases/tag/tts-models
+ SherpaOnnxTTS:
+ vits_model: "/path/to/tts-models/vits-melo-tts-zh_en/model.onnx" # Path to VITS model file
+ vits_lexicon: "/path/to/tts-models/vits-melo-tts-zh_en/lexicon.txt" # Path to lexicon file (optional)
+ vits_tokens: "/path/to/tts-models/vits-melo-tts-zh_en/tokens.txt" # Path to tokens file
+ vits_data_dir: "" # "/path/to/tts-models/vits-piper-en_GB-cori-high/espeak-ng-data" # Path to espeak-ng data (optional)
+ vits_dict_dir: "/path/to/tts-models/vits-melo-tts-zh_en/dict" # Path to Jieba dict (optional, for Chinese)
+ tts_rule_fsts: "/path/to/tts-models/vits-melo-tts-zh_en/number.fst,/path/to/tts-models/vits-melo-tts-zh_en/phone.fst,/path/to/tts-models/vits-melo-tts-zh_en/date.fst,/path/to/tts-models/vits-melo-tts-zh_en/new_heteronym.fst" # Comma-separated paths to rule FST files (optional)
+ max_num_sentences: 2 # Max sentences per batch (or -1 for all)
+ sid: 1 # Speaker ID (for multi-speaker models)
+ provider: "cpu" # Use "cpu", "cuda" (GPU), or "coreml" (Apple)
+ num_threads: 1 # Number of computation threads
+ speed: 1.0 # Speech speed (1.0 is normal)
+ debug: false # Enable debug mode (True/False)
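The `tts_rule_fsts` value in the SherpaOnnxTTS sample packs several FST paths into one comma-separated string. A typo in any one of them is easy to miss. The sketch below is a minimal pre-flight check that splits the string and reports the paths that do not exist. It makes two assumptions. First, entries are separated by plain commas, and surrounding whitespace can be ignored. Second, the helper names `split_rule_fsts` and `missing_rule_fsts` are hypothetical and do not belong to Open-LLM-VTuber or sherpa-onnx.

```python
from pathlib import Path


def split_rule_fsts(value: str) -> list[str]:
    """Split a comma-separated `tts_rule_fsts` string into individual paths.

    Hypothetical helper: whitespace around each entry is stripped and empty
    entries (from an empty string or a trailing comma) are dropped.
    """
    return [part.strip() for part in value.split(",") if part.strip()]


def missing_rule_fsts(value: str) -> list[str]:
    """Return the FST paths listed in `value` that are not existing files."""
    return [path for path in split_rule_fsts(value) if not Path(path).is_file()]


if __name__ == "__main__":
    # Placeholder paths copied from the sample config; on a real machine
    # they should point at the downloaded model directory.
    rule_fsts = (
        "/path/to/tts-models/vits-melo-tts-zh_en/number.fst,"
        "/path/to/tts-models/vits-melo-tts-zh_en/phone.fst,"
        "/path/to/tts-models/vits-melo-tts-zh_en/date.fst,"
        "/path/to/tts-models/vits-melo-tts-zh_en/new_heteronym.fst"
    )
    for path in missing_rule_fsts(rule_fsts):
        print(f"missing rule FST: {path}")
```

This only inspects the config value. How sherpa-onnx itself parses `tts_rule_fsts` is not shown here.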
doc/sample_conf/sherpaASRTTS_sense_voice_piper_en.yaml ADDED
@@ -0,0 +1,77 @@
+ SYSTEM_CONFIG:
+ CONF_NAME: "sherpaASRTTS_sense_voice_piper_en"
+ CONF_UID: "sherpaASRTTS_sense_voice_piper_en"
+
+ # ============== Voice Interaction Settings ==============
+
+ # === Automatic Speech Recognition ===
+ VOICE_INPUT_ON: True
+ # Put your mic in the browser or in the terminal? (would increase latency)
+ MIC_IN_BROWSER: False # Deprecated and useless now. Do not enable it. Bad things will happen.
+
+ # speech to text model options: "Faster-Whisper", "WhisperCPP", "Whisper", "AzureASR", "FunASR", "GroqWhisperASR", "SherpaOnnxASR"
+ ASR_MODEL: "SherpaOnnxASR"
+
+ # pip install sherpa-onnx
+ # documentation: https://k2-fsa.github.io/sherpa/onnx/index.html
+ # ASR models download: https://github.com/k2-fsa/sherpa-onnx/releases/tag/asr-models
+ SherpaOnnxASR:
+ model_type: "sense_voice" # "transducer", "paraformer", "nemo_ctc", "wenet_ctc", "whisper", "tdnn_ctc", "sense_voice"
+ # Choose only ONE of the following, depending on the model_type:
+ # --- For model_type: "transducer" ---
+ # encoder: "" # Path to the encoder model (e.g., "path/to/encoder.onnx")
+ # decoder: "" # Path to the decoder model (e.g., "path/to/decoder.onnx")
+ # joiner: "" # Path to the joiner model (e.g., "path/to/joiner.onnx")
+ # --- For model_type: "paraformer" ---
+ # paraformer: "" # Path to the paraformer model (e.g., "path/to/model.onnx")
+ # --- For model_type: "nemo_ctc" ---
+ # nemo_ctc: "" # Path to the NeMo CTC model (e.g., "path/to/model.onnx")
+ # --- For model_type: "wenet_ctc" ---
+ # wenet_ctc: "" # Path to the WeNet CTC model (e.g., "path/to/model.onnx")
+ # --- For model_type: "tdnn_ctc" ---
+ # tdnn_model: "" # Path to the TDNN CTC model (e.g., "path/to/model.onnx")
+ # --- For model_type: "whisper" ---
+ # whisper_encoder: "" # Path to the Whisper encoder model (e.g., "path/to/encoder.onnx")
+ # whisper_decoder: "" # Path to the Whisper decoder model (e.g., "path/to/decoder.onnx")
+ # --- For model_type: "sense_voice" ---
+ sense_voice: "/path/to/asr-models/sherpa-onnx-sense-voice-zh-en-ja-ko-yue-2024-07-17/model.onnx" # Path to the SenseVoice model (e.g., "path/to/model.onnx")
+ tokens: "/path/to/asr-models/sherpa-onnx-sense-voice-zh-en-ja-ko-yue-2024-07-17/tokens.txt" # Path to tokens.txt (required for all model types)
+ # --- Optional parameters (with defaults shown) ---
+ # hotwords_file: "" # Path to hotwords file (if using hotwords)
+ # hotwords_score: 1.5 # Score for hotwords
+ # modeling_unit: "" # Modeling unit for hotwords (if applicable)
+ # bpe_vocab: "" # Path to BPE vocabulary (if applicable)
+ num_threads: 4 # Number of threads
+ # whisper_language: "" # Language for Whisper models (e.g., "en", "zh", etc. - if using Whisper)
+ # whisper_task: "transcribe" # Task for Whisper models ("transcribe" or "translate" - if using Whisper)
+ # whisper_tail_paddings: -1 # Tail padding for Whisper models (if using Whisper)
+ # blank_penalty: 0.0 # Penalty for blank symbol
+ # decoding_method: "greedy_search" # "greedy_search" or "modified_beam_search"
+ # debug: False # Enable debug mode
+ # sample_rate: 16000 # Sample rate (should match the model's expected sample rate)
+ # feature_dim: 80 # Feature dimension (should match the model's expected feature dimension)
+ use_itn: True # Enable inverse text normalization (ITN) for SenseVoice models (set to False if not using a SenseVoice model)
+
+ # ============== Text to Speech ==============
+ TTS_MODEL: "SherpaOnnxTTS"
+ # text to speech model options:
+ # "AzureTTS", "pyttsx3TTS", "edgeTTS", "barkTTS",
+ # "cosyvoiceTTS", "meloTTS", "piperTTS", "coquiTTS",
+ # "fishAPITTS", "SherpaOnnxTTS"
+
+ # pip install sherpa-onnx
+ # documentation: https://k2-fsa.github.io/sherpa/onnx/index.html
+ # TTS models download: https://github.com/k2-fsa/sherpa-onnx/releases/tag/tts-models
+ SherpaOnnxTTS:
+ vits_model: "/path/to/tts-models/vits-piper-en_GB-cori-high/en_GB-cori-high.onnx" # Path to VITS model file
+ vits_lexicon: "" # Path to lexicon file (optional)
+ vits_tokens: "/path/to/tts-models/vits-piper-en_GB-cori-high/tokens.txt" # Path to tokens file
+ vits_data_dir: "/path/to/tts-models/vits-piper-en_GB-cori-high/espeak-ng-data" # Path to espeak-ng data (optional)
+ vits_dict_dir: "" # Path to Jieba dict (optional, for Chinese)
+ tts_rule_fsts: "" # Comma-separated paths to rule FST files (optional)
+ max_num_sentences: 2 # Max sentences per batch (or -1 for all)
+ sid: 0 # Speaker ID (for multi-speaker models)
+ provider: "cpu" # Use "cpu", "cuda" (GPU), or "coreml" (Apple)
+ num_threads: 1 # Number of computation threads
+ speed: 1.0 # Speech speed (1.0 is normal)
+ debug: false # Enable debug mode (True/False)
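The `SherpaOnnxASR` block expects one group of model paths, selected by `model_type`, plus `tokens` for every model type. The sketch below expresses that rule as a hypothetical check. It runs over the section after it has been loaded into a dict, for example with a YAML parser. The key mapping is copied from the `--- For model_type: ... ---` comments in these sample configs. `REQUIRED_MODEL_KEYS` and `missing_asr_keys` are illustrative names, not part of Open-LLM-VTuber.

```python
# model_type -> keys that must hold model paths, taken from the
# "--- For model_type: ... ---" comments in the sample configs.
REQUIRED_MODEL_KEYS: dict[str, tuple[str, ...]] = {
    "transducer": ("encoder", "decoder", "joiner"),
    "paraformer": ("paraformer",),
    "nemo_ctc": ("nemo_ctc",),
    "wenet_ctc": ("wenet_ctc",),
    "tdnn_ctc": ("tdnn_model",),
    "whisper": ("whisper_encoder", "whisper_decoder"),
    "sense_voice": ("sense_voice",),
}


def missing_asr_keys(asr_conf: dict) -> list[str]:
    """Hypothetical check: list required keys that are absent or empty.

    `tokens` is required for every model type; the other keys depend on
    `model_type`. Raises ValueError for an unknown or missing model_type.
    """
    model_type = asr_conf.get("model_type")
    if model_type not in REQUIRED_MODEL_KEYS:
        raise ValueError(f"unknown model_type: {model_type!r}")
    required = (*REQUIRED_MODEL_KEYS[model_type], "tokens")
    # An empty string counts as missing, matching the `""` placeholders above.
    return [key for key in required if not asr_conf.get(key)]


if __name__ == "__main__":
    sense_voice_conf = {
        "model_type": "sense_voice",
        "sense_voice": "/path/to/asr-models/sherpa-onnx-sense-voice-zh-en-ja-ko-yue-2024-07-17/model.onnx",
        "tokens": "/path/to/asr-models/sherpa-onnx-sense-voice-zh-en-ja-ko-yue-2024-07-17/tokens.txt",
    }
    print(missing_asr_keys(sense_voice_conf))  # prints []
```

Treating empty strings as missing catches the common mistake of switching `model_type` without filling in the matching path.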
doc/sample_conf/sherpaASRTTS_sense_voice_vits_zh.yaml ADDED
@@ -0,0 +1,77 @@
+ SYSTEM_CONFIG:
+ CONF_NAME: "sherpaASRTTS_sense_voice_vits_zh"
+ CONF_UID: "sherpaASRTTS_sense_voice_vits_zh"
+
+ # ============== Voice Interaction Settings ==============
+
+ # === Automatic Speech Recognition ===
+ VOICE_INPUT_ON: True
+ # Put your mic in the browser or in the terminal? (would increase latency)
+ MIC_IN_BROWSER: False # Deprecated and useless now. Do not enable it. Bad things will happen.
+
+ # speech to text model options: "Faster-Whisper", "WhisperCPP", "Whisper", "AzureASR", "FunASR", "GroqWhisperASR", "SherpaOnnxASR"
+ ASR_MODEL: "SherpaOnnxASR"
+
+ # pip install sherpa-onnx
+ # documentation: https://k2-fsa.github.io/sherpa/onnx/index.html
+ # ASR models download: https://github.com/k2-fsa/sherpa-onnx/releases/tag/asr-models
+ SherpaOnnxASR:
+ model_type: "sense_voice" # "transducer", "paraformer", "nemo_ctc", "wenet_ctc", "whisper", "tdnn_ctc", "sense_voice"
+ # Choose only ONE of the following, depending on the model_type:
+ # --- For model_type: "transducer" ---
+ # encoder: "" # Path to the encoder model (e.g., "path/to/encoder.onnx")
+ # decoder: "" # Path to the decoder model (e.g., "path/to/decoder.onnx")
+ # joiner: "" # Path to the joiner model (e.g., "path/to/joiner.onnx")
+ # --- For model_type: "paraformer" ---
+ # paraformer: "" # Path to the paraformer model (e.g., "path/to/model.onnx")
+ # --- For model_type: "nemo_ctc" ---
+ # nemo_ctc: "" # Path to the NeMo CTC model (e.g., "path/to/model.onnx")
+ # --- For model_type: "wenet_ctc" ---
+ # wenet_ctc: "" # Path to the WeNet CTC model (e.g., "path/to/model.onnx")
+ # --- For model_type: "tdnn_ctc" ---
+ # tdnn_model: "" # Path to the TDNN CTC model (e.g., "path/to/model.onnx")
+ # --- For model_type: "whisper" ---
+ # whisper_encoder: "" # Path to the Whisper encoder model (e.g., "path/to/encoder.onnx")
+ # whisper_decoder: "" # Path to the Whisper decoder model (e.g., "path/to/decoder.onnx")
+ # --- For model_type: "sense_voice" ---
+ sense_voice: "/path/to/asr-models/sherpa-onnx-sense-voice-zh-en-ja-ko-yue-2024-07-17/model.onnx" # Path to the SenseVoice model (e.g., "path/to/model.onnx")
+ tokens: "/path/to/asr-models/sherpa-onnx-sense-voice-zh-en-ja-ko-yue-2024-07-17/tokens.txt" # Path to tokens.txt (required for all model types)
+ # --- Optional parameters (with defaults shown) ---
+ # hotwords_file: "" # Path to hotwords file (if using hotwords)
+ # hotwords_score: 1.5 # Score for hotwords
+ # modeling_unit: "" # Modeling unit for hotwords (if applicable)
+ # bpe_vocab: "" # Path to BPE vocabulary (if applicable)
+ num_threads: 4 # Number of threads
+ # whisper_language: "" # Language for Whisper models (e.g., "en", "zh", etc. - if using Whisper)
+ # whisper_task: "transcribe" # Task for Whisper models ("transcribe" or "translate" - if using Whisper)
+ # whisper_tail_paddings: -1 # Tail padding for Whisper models (if using Whisper)
+ # blank_penalty: 0.0 # Penalty for blank symbol
+ # decoding_method: "greedy_search" # "greedy_search" or "modified_beam_search"
+ # debug: False # Enable debug mode
+ # sample_rate: 16000 # Sample rate (should match the model's expected sample rate)
+ # feature_dim: 80 # Feature dimension (should match the model's expected feature dimension)
+ use_itn: True # Enable inverse text normalization (ITN) for SenseVoice models (set to False if not using a SenseVoice model)
+
+ # ============== Text to Speech ==============
+ TTS_MODEL: "SherpaOnnxTTS"
+ # text to speech model options:
+ # "AzureTTS", "pyttsx3TTS", "edgeTTS", "barkTTS",
+ # "cosyvoiceTTS", "meloTTS", "piperTTS", "coquiTTS",
+ # "fishAPITTS", "SherpaOnnxTTS"
+
+ # pip install sherpa-onnx
+ # documentation: https://k2-fsa.github.io/sherpa/onnx/index.html
+ # TTS models download: https://github.com/k2-fsa/sherpa-onnx/releases/tag/tts-models
+ SherpaOnnxTTS:
+ vits_model: "/path/to/tts-models/sherpa-onnx-vits-zh-ll/model.onnx" # Path to VITS model file
+ vits_lexicon: "/path/to/tts-models/sherpa-onnx-vits-zh-ll/lexicon.txt" # Path to lexicon file (optional)
+ vits_tokens: "/path/to/tts-models/sherpa-onnx-vits-zh-ll/tokens.txt" # Path to tokens file
+ vits_data_dir: "" # "/path/to/tts-models/vits-piper-en_GB-cori-high/espeak-ng-data" # Path to espeak-ng data (optional)
+ vits_dict_dir: "/path/to/tts-models/sherpa-onnx-vits-zh-ll/dict" # Path to Jieba dict (optional, for Chinese)
+ tts_rule_fsts: "/path/to/tts-models/sherpa-onnx-vits-zh-ll/number.fst,/path/to/tts-models/sherpa-onnx-vits-zh-ll/phone.fst,/path/to/tts-models/sherpa-onnx-vits-zh-ll/date.fst" # Comma-separated paths to rule FST files (optional)
+ max_num_sentences: 2 # Max sentences per batch (or -1 for all)
+ sid: 0 # Speaker ID for multi-speaker models (0-4 for this model)
+ provider: "cpu" # Use "cpu", "cuda" (GPU), or "coreml" (Apple)
+ num_threads: 1 # Number of computation threads
+ speed: 1.0 # Speech speed (1.0 is normal)
+ debug: false # Enable debug mode (True/False)
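The `max_num_sentences` comment says sentences are grouped into batches of at most that size, with `-1` meaning "all in one batch". The sketch below shows that rule in plain Python. It assumes the text has already been split into sentences. How sherpa-onnx actually splits and batches text internally is not shown here and may differ. `batch_sentences` is a hypothetical helper.

```python
def batch_sentences(sentences: list[str], max_num_sentences: int) -> list[list[str]]:
    """Group sentences into synthesis batches (hypothetical helper).

    max_num_sentences == -1 puts every sentence in a single batch,
    matching the "(or -1 for all)" note in the config comment.
    """
    if not sentences:
        return []
    if max_num_sentences == -1:
        return [list(sentences)]
    if max_num_sentences < 1:
        raise ValueError("max_num_sentences must be -1 or a positive integer")
    # Consecutive slices of at most max_num_sentences sentences each.
    return [
        sentences[i : i + max_num_sentences]
        for i in range(0, len(sentences), max_num_sentences)
    ]


if __name__ == "__main__":
    text = ["Hello.", "How are you?", "Nice weather today."]
    print(batch_sentences(text, 2))
    # [['Hello.', 'How are you?'], ['Nice weather today.']]
```

Smaller batches let the first audio chunk start sooner, at the cost of more synthesis calls. That trade-off is why a low value such as `2` suits a latency-sensitive voice assistant.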
doc/sample_conf/sherpaASR_paraformer.yaml ADDED
@@ -0,0 +1,65 @@
+ SYSTEM_CONFIG:
+ CONF_NAME: "sherpaASR_paraformer"
+ CONF_UID: "sherpaASR_paraformer"
+
+ # ============== Voice Interaction Settings ==============
+
+ # === Automatic Speech Recognition ===
+ VOICE_INPUT_ON: True
+ # Put your mic in the browser or in the terminal? (would increase latency)
+ MIC_IN_BROWSER: False # Deprecated and useless now. Do not enable it. Bad things will happen.
+
+ # speech to text model options: "Faster-Whisper", "WhisperCPP", "Whisper", "AzureASR", "FunASR", "GroqWhisperASR", "SherpaOnnxASR"
+ ASR_MODEL: "SherpaOnnxASR"
+
+ # pip install sherpa-onnx
+ # documentation: https://k2-fsa.github.io/sherpa/onnx/index.html
+ # ASR models download: https://github.com/k2-fsa/sherpa-onnx/releases/tag/asr-models
+ SherpaOnnxASR:
+ model_type: "paraformer" # "transducer", "paraformer", "nemo_ctc", "wenet_ctc", "whisper", "tdnn_ctc", "sense_voice"
+ # Choose only ONE of the following, depending on the model_type:
+ # --- For model_type: "transducer" ---
+ # encoder: "" # Path to the encoder model (e.g., "path/to/encoder.onnx")
+ # decoder: "" # Path to the decoder model (e.g., "path/to/decoder.onnx")
+ # joiner: "" # Path to the joiner model (e.g., "path/to/joiner.onnx")
+ # --- For model_type: "paraformer" ---
+ paraformer: "/path/to/asr-models/sherpa-onnx-paraformer-zh-2024-03-09/model.onnx" # Path to the paraformer model (e.g., "path/to/model.onnx")
+ # --- For model_type: "nemo_ctc" ---
+ # nemo_ctc: "" # Path to the NeMo CTC model (e.g., "path/to/model.onnx")
+ # --- For model_type: "wenet_ctc" ---
+ # wenet_ctc: "" # Path to the WeNet CTC model (e.g., "path/to/model.onnx")
+ # --- For model_type: "tdnn_ctc" ---
+ # tdnn_model: "" # Path to the TDNN CTC model (e.g., "path/to/model.onnx")
+ # --- For model_type: "whisper" ---
+ # whisper_encoder: "" # Path to the Whisper encoder model (e.g., "path/to/encoder.onnx")
+ # whisper_decoder: "" # Path to the Whisper decoder model (e.g., "path/to/decoder.onnx")
+ # --- For model_type: "sense_voice" ---
+ # sense_voice: "/path/to/asr-models/sherpa-onnx-sense-voice-zh-en-ja-ko-yue-2024-07-17/model.onnx" # Path to the SenseVoice model (e.g., "path/to/model.onnx")
+ tokens: "/path/to/asr-models/sherpa-onnx-paraformer-zh-2024-03-09/tokens.txt" # Path to tokens.txt (required for all model types)
+ # --- Optional parameters (with defaults shown) ---
+ # hotwords_file: "" # Path to hotwords file (if using hotwords)
+ # hotwords_score: 1.5 # Score for hotwords
+ # modeling_unit: "" # Modeling unit for hotwords (if applicable)
+ # bpe_vocab: "" # Path to BPE vocabulary (if applicable)
+ num_threads: 2 # Number of threads
+ # whisper_language: "" # Language for Whisper models (e.g., "en", "zh", etc. - if using Whisper)
+ # whisper_task: "transcribe" # Task for Whisper models ("transcribe" or "translate" - if using Whisper)
+ # whisper_tail_paddings: -1 # Tail padding for Whisper models (if using Whisper)
+ # blank_penalty: 0.0 # Penalty for blank symbol
+ # decoding_method: "greedy_search" # "greedy_search" or "modified_beam_search"
+ # debug: False # Enable debug mode
+ # sample_rate: 16000 # Sample rate (should match the model's expected sample rate)
+ # feature_dim: 80 # Feature dimension (should match the model's expected feature dimension)
+ # use_itn: True # Enable inverse text normalization (ITN) for SenseVoice models (set to False if not using a SenseVoice model)
+
+ # ============== Text to Speech ==============
+ TTS_MODEL: "edgeTTS"
+ # text to speech model options:
+ # "AzureTTS", "pyttsx3TTS", "edgeTTS", "barkTTS",
+ # "cosyvoiceTTS", "meloTTS", "piperTTS", "coquiTTS",
+ # "fishAPITTS", "SherpaOnnxTTS"
+
+ edgeTTS:
+ # Documentation: https://github.com/rany2/edge-tts
+ # Use `edge-tts --list-voices` to list all available voices
+ voice: "en-US-AvaMultilingualNeural" # Other examples: "zh-CN-XiaoxiaoNeural", "ja-JP-NanamiNeural"
doc/sample_conf/sherpaASR_sense_voice.yaml ADDED
@@ -0,0 +1,67 @@
+ SYSTEM_CONFIG:
+ CONF_NAME: "sherpaASR_sense_voice"
+ CONF_UID: "sherpaASR_sense_voice"
+
+ # ============== Voice Interaction Settings ==============
+
+ # === Automatic Speech Recognition ===
+ VOICE_INPUT_ON: True
+ # Put your mic in the browser or in the terminal? (would increase latency)
+ MIC_IN_BROWSER: False # Deprecated and useless now. Do not enable it. Bad things will happen.
+
+ # speech to text model options: "Faster-Whisper", "WhisperCPP", "Whisper", "AzureASR", "FunASR", "GroqWhisperASR", "SherpaOnnxASR"
+ ASR_MODEL: "SherpaOnnxASR"
+
+ # pip install sherpa-onnx
+ # documentation: https://k2-fsa.github.io/sherpa/onnx/index.html
+ # ASR models download: https://github.com/k2-fsa/sherpa-onnx/releases/tag/asr-models
+ SherpaOnnxASR:
+ model_type: "sense_voice" # "transducer", "paraformer", "nemo_ctc", "wenet_ctc", "whisper", "tdnn_ctc", "sense_voice"
+ # Choose only ONE of the following, depending on the model_type:
+ # --- For model_type: "transducer" ---
+ # encoder: "" # Path to the encoder model (e.g., "path/to/encoder.onnx")
+ # decoder: "" # Path to the decoder model (e.g., "path/to/decoder.onnx")
+ # joiner: "" # Path to the joiner model (e.g., "path/to/joiner.onnx")
+ # --- For model_type: "paraformer" ---
+ # paraformer: "" # Path to the paraformer model (e.g., "path/to/model.onnx")
+ # --- For model_type: "nemo_ctc" ---
+ # nemo_ctc: "" # Path to the NeMo CTC model (e.g., "path/to/model.onnx")
+ # --- For model_type: "wenet_ctc" ---
+ # wenet_ctc: "" # Path to the WeNet CTC model (e.g., "path/to/model.onnx")
+ # --- For model_type: "tdnn_ctc" ---
+ # tdnn_model: "" # Path to the TDNN CTC model (e.g., "path/to/model.onnx")
+ # --- For model_type: "whisper" ---
+ # whisper_encoder: "" # Path to the Whisper encoder model (e.g., "path/to/encoder.onnx")
+ # whisper_decoder: "" # Path to the Whisper decoder model (e.g., "path/to/decoder.onnx")
+ # --- For model_type: "sense_voice" ---
+ sense_voice: "/path/to/asr-models/sherpa-onnx-sense-voice-zh-en-ja-ko-yue-2024-07-17/model.onnx" # Path to the SenseVoice model (e.g., "path/to/model.onnx")
+ tokens: "/path/to/asr-models/sherpa-onnx-sense-voice-zh-en-ja-ko-yue-2024-07-17/tokens.txt" # Path to tokens.txt (required for all model types)
+ # --- Optional parameters (with defaults shown) ---
+ # hotwords_file: "" # Path to hotwords file (if using hotwords)
+ # hotwords_score: 1.5 # Score for hotwords
+ # modeling_unit: "" # Modeling unit for hotwords (if applicable)
+ # bpe_vocab: "" # Path to BPE vocabulary (if applicable)
+ num_threads: 2 # Number of threads
+ # whisper_language: "" # Language for Whisper models (e.g., "en", "zh", etc. - if using Whisper)
+ # whisper_task: "transcribe" # Task for Whisper models ("transcribe" or "translate" - if using Whisper)
+ # whisper_tail_paddings: -1 # Tail padding for Whisper models (if using Whisper)
+ # blank_penalty: 0.0 # Penalty for blank symbol
+ # decoding_method: "greedy_search" # "greedy_search" or "modified_beam_search"
+ # debug: False # Enable debug mode
+ # sample_rate: 16000 # Sample rate (should match the model's expected sample rate)
+ # feature_dim: 80 # Feature dimension (should match the model's expected feature dimension)
+ use_itn: True # Enable inverse text normalization (ITN) for SenseVoice models (set to False if not using a SenseVoice model)
+
+ # ============== Text to Speech ==============
+ TTS_MODEL: "edgeTTS"
+ # text to speech model options:
+ # "AzureTTS", "pyttsx3TTS", "edgeTTS", "barkTTS",
+ # "cosyvoiceTTS", "meloTTS", "piperTTS", "coquiTTS",
+ # "fishAPITTS", "SherpaOnnxTTS"
+
+
+ edgeTTS:
+ # Documentation: https://github.com/rany2/edge-tts
+ # Use `edge-tts --list-voices` to list all available voices
+ voice: "en-US-AvaMultilingualNeural" # Other examples: "zh-CN-XiaoxiaoNeural", "ja-JP-NanamiNeural"
+
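In each of these samples, the value of `ASR_MODEL` or `TTS_MODEL` names a top-level section defined further down: `ASR_MODEL: "SherpaOnnxASR"` pairs with `SherpaOnnxASR:`, and `TTS_MODEL: "edgeTTS"` pairs with `edgeTTS:`. The sketch below checks that pairing. It assumes the whole file has been loaded into nested dicts, for example with PyYAML's `yaml.safe_load`, and that each selected section is a mapping. `missing_model_sections` is a hypothetical helper, not the project's loader.

```python
def missing_model_sections(conf: dict) -> list[str]:
    """Return selected model names that have no matching config section.

    Hypothetical check, assuming each value of ASR_MODEL / TTS_MODEL names a
    top-level mapping with the same key (e.g. "edgeTTS" -> edgeTTS:).
    """
    missing = []
    for selector in ("ASR_MODEL", "TTS_MODEL"):
        name = conf.get(selector)
        # A selected model whose section is absent (or not a mapping) is reported.
        if name and not isinstance(conf.get(name), dict):
            missing.append(name)
    return missing


if __name__ == "__main__":
    conf = {
        "ASR_MODEL": "SherpaOnnxASR",
        "SherpaOnnxASR": {"model_type": "sense_voice"},
        "TTS_MODEL": "edgeTTS",
        "edgeTTS": {"voice": "en-US-AvaMultilingualNeural"},
    }
    print(missing_model_sections(conf))  # prints []
```

A check like this would catch, for example, switching `TTS_MODEL` to `"SherpaOnnxTTS"` in this file without also adding a `SherpaOnnxTTS:` block.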