Spaces: britto224/test_ui

britto224 committed on Apr 20

Commit 99ebcc3 · verified · 1 Parent(s): a8f1f77

Upload 29 files

Files changed (30)

.cursor/rules/olv-core-rules.mdc +111 -0
.gemini/GEMINI.md +106 -0
.gemini/styleguide.md +165 -0
.gitattributes +4 -0
.github/FUNDING.yml +14 -0
.github/ISSUE_TEMPLATE/bug---question---get-help---bug---#U63d0#U95ee---#U6c42#U52a9.md +79 -0
.github/ISSUE_TEMPLATE/feature-request---#U529f#U80fd#U5efa#U8bae.md +44 -0
.github/copilot-instructions.md +106 -0
.github/workflows/codeql.yml +92 -0
.github/workflows/create_release.yml +238 -0
.github/workflows/docker-blacksmith.yml +207 -0
.github/workflows/fossa_scan.yml +16 -0
.github/workflows/ruff.yml +8 -0
.github/workflows/update-requirements.yml +30 -0
avatars/.gitkeep +0 -0
avatars/Yue_001.png +3 -0
avatars/mao.png +3 -0
backgrounds/.gitkeep +0 -0
backgrounds/ceiling-computer-room-night.jpg +3 -0
backgrounds/ceiling-computer-store-night.jpeg +0 -0
backgrounds/ceiling-window-room-night.jpeg +3 -0
characters/en_Lord Yue.yaml +29 -0
config_templates/README.md +7 -0
config_templates/conf.default.yaml +514 -0
doc/README.md +4 -0
doc/sample_conf/sherpaASRTTS_sense_voice_melo.yaml +78 -0
doc/sample_conf/sherpaASRTTS_sense_voice_piper_en.yaml +77 -0
doc/sample_conf/sherpaASRTTS_sense_voice_vits_zh.yaml +77 -0
doc/sample_conf/sherpaASR_paraformer.yaml +65 -0
doc/sample_conf/sherpaASR_sense_voice.yaml +67 -0

.cursor/rules/olv-core-rules.mdc ADDED

	@@ -0,0 +1,111 @@

+---
+alwaysApply: true
+---
+# Open-LLM-VTuber AI Coding Assistant: Context & Guidelines
+`version: 2025.08.05-1`
+## 1. Core Project Context
+  - **Project:** Open-LLM-VTuber, a low-latency voice-based LLM interaction tool.
+  - **Language:** Python >= 3.10
+  - **Core Tech Stack:**
+      - **Backend:** FastAPI, Pydantic v2, Uvicorn, fully async
+      - **Real-time Communication:** WebSockets
+      - **Package Management:** `uv` (version ~= 0.8, as of August 2025). Always use `uv run`, `uv sync`, `uv add`, and `uv remove` instead of `pip`.
+  - **Primary Goal:** Achieve end-to-end latency below 500ms (user speaks -> AI voice heard). Performance is critical.
+  - **Key Principles:**
+      - **Offline-Ready:** Core functionality MUST work without an internet connection.
+      - **Separation of Concerns:** Strict frontend-backend separation.
+      - **Clean code:** Write clean, testable, maintainable code that follows Python 3.10+ best practices and avoids deprecated APIs.
+Some key files and directories:
+```
+doc/                 # A deprecated directory
+frontend/            # Compiled web frontend artifacts (from git submodule)
+config_templates/
+    conf.default.yaml    # Configuration template for English users
+    conf.ZH.default.yaml # Configuration template for Chinese users
+src/open_llm_vtuber/ # Project source code
+    config_manager/
+        main.py      # Pydantic models for configuration validation
+run_server.py        # Entrypoint to start the application
+conf.yaml            # User's configuration file, generated from a template
+```
+### 1.1. Repository Structure
+- Frontend Repository: The frontend is a React application developed in a separate repository: `Open-LLM-VTuber-Web`. Its built artifacts are integrated into the `frontend/` directory of this backend repository via a git submodule.
+- Documentation Repository: The official documentation site is hosted in the `open-llm-vtuber.github.io` repository. When asked to generate documentation, create Markdown files in the project root. The user will be responsible for migrating them to the documentation site.
+### 1.2. Configuration Files
+- Configuration templates are located in the `config_templates/` directory:
+    - `conf.default.yaml`: Template for English-speaking users.
+    - `conf.ZH.default.yaml`: Template for Chinese-speaking users.
+- When modifying the configuration structure, both template files MUST be updated accordingly.
+- Configuration is validated on load using the Pydantic models defined in `src/open_llm_vtuber/config_manager/main.py`. Any changes to configuration options must be reflected in these models.
+## 2. Overarching Coding Philosophy
+  - **Simplicity and Readability:** Write code that is simple, clear, and easy to understand. Avoid unnecessary complexity or premature optimization. Follow the Zen of Python.
+  - **Single Responsibility:** Each function, class, and module should do one thing and do it well.
+  - **Performance-Aware:** Be mindful of performance. Avoid blocking operations in async contexts. Use efficient data structures and algorithms where it matters.
+  - **Adherence to Best Practices**: Write clean, testable, and robust code that follows modern Python 3.10+ idioms. Adhere to the best practices of our core libraries (FastAPI, Pydantic v2).
+## 3. Detailed Coding Standards
+### 3.1. Formatting & Linting (Ruff)
+  - All Python code **MUST** be formatted with `uv run ruff format`.
+  - All Python code **MUST** pass `uv run ruff check` without errors.
+  - Import statements should be grouped by standard library, third-party, and local modules and sorted alphabetically (PEP 8).
+### 3.2. Naming Conventions (PEP 8)
+  - Use `snake_case` for all variables, functions, methods, and module names.
+  - Use `PascalCase` for class names.
+  - Choose descriptive names. Avoid single-letter names except for loop counters or well-known initialisms.
+### 3.3. Type Hints (CRITICAL)
+  - Target Python 3.10+. Use modern type hint syntax.
+  - **DO:** Use `|` for unions (e.g., `str | None`).
+  - **DON'T:** Use `Optional` from `typing` (e.g., `Optional[str]`).
+  - **DO:** Use built-in generics (e.g., `list[int]`, `dict[str, float]`).
+  - **DON'T:** Use capitalized types from `typing` (e.g., `List[int]`, `Dict[str, float]`).
+  - All function and method signatures (arguments and return values) **MUST** have accurate type hints. If a third-party library makes a type error impossible to fix, suppress the type-checker error instead.
+### 3.4. Docstrings & Comments (CRITICAL)
+  - All public modules, functions, classes, and methods **MUST** have a docstring in English.
+  - Use the **Google Python Style** for docstrings.
+  - Docstrings **MUST** include:
+    1.  Summary.
+    2.  `Args:` section describing each parameter, its type, and its purpose.
+    3.  `Returns:` section describing the return value, its type, and its meaning.
+    4.  (Optional but encouraged) `Raises:` section for any exceptions thrown.
+  - All other code comments must also be in English.
+### 3.5. Logging
+  - Use the `loguru` module for all informational or error output.
+  - Log messages should be in English, clear, and informative. Use emoji when appropriate.
+## 4. Architectural Principles
+### 4.1. Dependency Management
+  - First, try to solve the problem using the Python standard library or existing project dependencies defined in `pyproject.toml`.
+  - If a new dependency is required, it must have a compatible license and be well-maintained.
+  - Use `uv add`, `uv remove`, and `uv run` instead of `pip` to manage dependencies. If the user uses conda, install `uv` with `pip` first.
+  - After adding a new dependency, in addition to `pyproject.toml`, you must add the dependency to `requirements.txt` as well.
+### 4.2. Cross-Platform Compatibility
+  - All core logic **MUST** run on macOS, Windows, and Linux.
+  - If a feature is platform-specific (e.g., uses a Windows-only API) or hardware-specific (e.g., CUDA), it **MUST** be an optional component. The application should start and run core features even if that component is not available. Use graceful fallbacks or clear error messages.
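
Section 4.2 above asks that platform-specific or hardware-specific features stay optional, with a graceful fallback. The following is a minimal sketch of one way to do that using only the standard library. The helper name `select_device`, the default package name `torch`, and the `cuda.is_available()` probe are illustrative assumptions, not code from this repository.

```python
import importlib
import importlib.util


def select_device(module_name: str = "torch") -> str:
    """Picks a compute device, falling back to CPU when the optional backend is absent.

    Args:
        module_name: Name of the optional, GPU-capable package to probe
            (hypothetical default; the project may use a different backend).

    Returns:
        "cuda" if the package is installed and reports a usable GPU, otherwise "cpu".
    """
    # Check for the package without importing it, so a missing optional
    # dependency never prevents the application from starting.
    if importlib.util.find_spec(module_name) is None:
        return "cpu"
    try:
        module = importlib.import_module(module_name)
        available = bool(module.cuda.is_available())
    except Exception:
        # Broken installs or packages without a CUDA probe also fall back to CPU.
        return "cpu"
    return "cuda" if available else "cpu"


if __name__ == "__main__":
    print(select_device())
```

Probing with `find_spec` before importing keeps startup cheap when the optional package is not installed, and the broad `except` is deliberate here: any failure in the optional component should degrade to the CPU path rather than stop the core application.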

.gemini/GEMINI.md ADDED

	@@ -0,0 +1,106 @@

+# Open-LLM-VTuber AI Coding Assistant: Context & Guidelines
+`version: 2025.08.05-1`
+## 1. Core Project Context
+  - **Project:** Open-LLM-VTuber, a low-latency voice-based LLM interaction tool.
+  - **Language:** Python >= 3.10
+  - **Core Tech Stack:**
+      - **Backend:** FastAPI, Pydantic v2, Uvicorn, fully async
+      - **Real-time Communication:** WebSockets
+      - **Package Management:** `uv` (version ~= 0.8, as of August 2025). Always use `uv run`, `uv sync`, `uv add`, and `uv remove` instead of `pip`.
+  - **Primary Goal:** Achieve end-to-end latency below 500ms (user speaks -> AI voice heard). Performance is critical.
+  - **Key Principles:**
+      - **Offline-Ready:** Core functionality MUST work without an internet connection.
+      - **Separation of Concerns:** Strict frontend-backend separation.
+      - **Clean code:** Write clean, testable, maintainable code that follows Python 3.10+ best practices and avoids deprecated APIs.
+Some key files and directories:
+```
+doc/                 # A deprecated directory
+frontend/            # Compiled web frontend artifacts (from git submodule)
+config_templates/
+    conf.default.yaml    # Configuration template for English users
+    conf.ZH.default.yaml # Configuration template for Chinese users
+src/open_llm_vtuber/ # Project source code
+    config_manager/
+        main.py      # Pydantic models for configuration validation
+run_server.py        # Entrypoint to start the application
+conf.yaml            # User's configuration file, generated from a template
+```
+### 1.1. Repository Structure
+- Frontend Repository: The frontend is a React application developed in a separate repository: `Open-LLM-VTuber-Web`. Its built artifacts are integrated into the `frontend/` directory of this backend repository via a git submodule.
+- Documentation Repository: The official documentation site is hosted in the `open-llm-vtuber.github.io` repository. When asked to generate documentation, create Markdown files in the project root. The user will be responsible for migrating them to the documentation site.
+### 1.2. Configuration Files
+- Configuration templates are located in the `config_templates/` directory:
+    - `conf.default.yaml`: Template for English-speaking users.
+    - `conf.ZH.default.yaml`: Template for Chinese-speaking users.
+- When modifying the configuration structure, both template files MUST be updated accordingly.
+- Configuration is validated on load using the Pydantic models defined in `src/open_llm_vtuber/config_manager/main.py`. Any changes to configuration options must be reflected in these models.
+## 2. Overarching Coding Philosophy
+  - **Simplicity and Readability:** Write code that is simple, clear, and easy to understand. Avoid unnecessary complexity or premature optimization. Follow the Zen of Python.
+  - **Single Responsibility:** Each function, class, and module should do one thing and do it well.
+  - **Performance-Aware:** Be mindful of performance. Avoid blocking operations in async contexts. Use efficient data structures and algorithms where it matters.
+  - **Adherence to Best Practices**: Write clean, testable, and robust code that follows modern Python 3.10+ idioms. Adhere to the best practices of our core libraries (FastAPI, Pydantic v2).
+## 3. Detailed Coding Standards
+### 3.1. Formatting & Linting (Ruff)
+  - All Python code **MUST** be formatted with `uv run ruff format`.
+  - All Python code **MUST** pass `uv run ruff check` without errors.
+  - Import statements should be grouped by standard library, third-party, and local modules and sorted alphabetically (PEP 8).
+### 3.2. Naming Conventions (PEP 8)
+  - Use `snake_case` for all variables, functions, methods, and module names.
+  - Use `PascalCase` for class names.
+  - Choose descriptive names. Avoid single-letter names except for loop counters or well-known initialisms.
+### 3.3. Type Hints (CRITICAL)
+  - Target Python 3.10+. Use modern type hint syntax.
+  - **DO:** Use `|` for unions (e.g., `str | None`).
+  - **DON'T:** Use `Optional` from `typing` (e.g., `Optional[str]`).
+  - **DO:** Use built-in generics (e.g., `list[int]`, `dict[str, float]`).
+  - **DON'T:** Use capitalized types from `typing` (e.g., `List[int]`, `Dict[str, float]`).
+  - All function and method signatures (arguments and return values) **MUST** have accurate type hints. If a third-party library makes a type error impossible to fix, suppress the type-checker error instead.
+### 3.4. Docstrings & Comments (CRITICAL)
+  - All public modules, functions, classes, and methods **MUST** have a docstring in English.
+  - Use the **Google Python Style** for docstrings.
+  - Docstrings **MUST** include:
+    1.  Summary.
+    2.  `Args:` section describing each parameter, its type, and its purpose.
+    3.  `Returns:` section describing the return value, its type, and its meaning.
+    4.  (Optional but encouraged) `Raises:` section for any exceptions thrown.
+  - All other code comments must also be in English.
+### 3.5. Logging
+  - Use the `loguru` module for all informational or error output.
+  - Log messages should be in English, clear, and informative. Use emoji when appropriate.
+## 4. Architectural Principles
+### 4.1. Dependency Management
+  - First, try to solve the problem using the Python standard library or existing project dependencies defined in `pyproject.toml`.
+  - If a new dependency is required, it must have a compatible license and be well-maintained.
+  - Use `uv add`, `uv remove`, and `uv run` instead of `pip` to manage dependencies. If the user uses conda, install `uv` with `pip` first.
+  - After adding a new dependency, in addition to `pyproject.toml`, you must add the dependency to `requirements.txt` as well.
+### 4.2. Cross-Platform Compatibility
+  - All core logic **MUST** run on macOS, Windows, and Linux.
+  - If a feature is platform-specific (e.g., uses a Windows-only API) or hardware-specific (e.g., CUDA), it **MUST** be an optional component. The application should start and run core features even if that component is not available. Use graceful fallbacks or clear error messages.
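
Section 2 of this file says to avoid blocking operations in async contexts. As a rough illustration, a blocking call can be moved to a worker thread with `asyncio.to_thread` so the event loop keeps serving other tasks. The function `blocking_synthesize` is a hypothetical stand-in for any synchronous call (for example a TTS engine); it is not part of the project.

```python
import asyncio
import time


def blocking_synthesize(text: str) -> bytes:
    """Simulates a slow, synchronous call (hypothetical stand-in).

    Args:
        text: The text to process.

    Returns:
        The UTF-8 encoded text, standing in for synthesized audio bytes.
    """
    time.sleep(0.05)  # Blocks the calling thread, as a synchronous library would.
    return text.encode("utf-8")


async def synthesize(text: str) -> bytes:
    """Runs the blocking call in a worker thread so the event loop stays free.

    Args:
        text: The text to process.

    Returns:
        The bytes produced by the blocking call.
    """
    return await asyncio.to_thread(blocking_synthesize, text)


async def main() -> list[bytes]:
    """Processes two requests concurrently.

    Returns:
        The results in the same order as the requests.
    """
    return await asyncio.gather(synthesize("hello"), synthesize("world"))


if __name__ == "__main__":
    print(asyncio.run(main()))
```

Calling `blocking_synthesize` directly inside a coroutine would stall every other connection for the duration of the call; offloading it keeps latency predictable, which matters given the latency goal stated in Section 1.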

.gemini/styleguide.md ADDED

	@@ -0,0 +1,165 @@

+version: 2025.08.04-1-en
+# Pull Request Guide & Checklist
+Welcome, and thank you for choosing to contribute to the Open-LLM-VTuber project! We are deeply grateful for the effort of every contributor.
+This guide is designed to help all contributors, maintainers, and even LLMs collaborate effectively, ensuring the project's high quality, maintainability, and long-term health. Please refer to this guide both when submitting a Pull Request (PR) and when reviewing PRs from others.
+We believe that clear standards and processes are not only the cornerstone of project maintenance but also an excellent opportunity for us to learn and grow together.
+⚠️ The coding standards mentioned below apply primarily to new code submissions. Some legacy code may not currently pass all type checks. We are working to fix this incrementally, but it will take time. When encountering type errors reported by the type checker, please focus only on the parts of the code your PR modifies. Adhere to principle **A1 (A PR should do one thing)**. If you wish to help fix existing type errors, please open a separate PR for that purpose.
+---
+### A. The Golden Rule: Atomic PRs
+This is our most important principle. Please adhere to it strictly.
+**A1. A single PR should do one thing, and one thing only.**
+* **Good examples 👍:**
+    * `fix: Resolve audio stuttering on macOS`
+    * `feat: Add OpenAI TTS support`
+    * `refactor: Rework the audio_processing module`
+* **Bad examples 👎:**
+    * `fix: Resolve bug A, bug B, and implement feature C`
+**Why is this so important?**
+* **Easy to Review:** Small, focused PRs allow reviewers to understand your changes more quickly and deeply, leading to higher-quality feedback. As stated in *The Pragmatic Programmer*, "Tip 38: It's Easier to Change Sooner." Small PRs facilitate rapid feedback loops.
+* **Easy to Track:** When a problem arises in the future, a clean Git history lets tools such as `git bisect` quickly pinpoint the exact change that introduced the issue.
+* **Easy to Revert:** If a small change introduces a bug, we can easily revert it without impacting other unrelated features or fixes.
+### B. Contributor's Checklist: Submitting My PR
+Before you submit your PR, please confirm each of the following items. This not only significantly speeds up the merge process but is also a sign of respect for your own work and for your fellow collaborators.
+#### B1. PR Title & Description
+* [ ] **B1.1: Clear Title:** The title should concisely summarize the core content of the PR. For example: `feat: Add OpenAI TTS support` or `fix: Resolve audio stuttering on macOS`. Remember, a PR should only do one thing (A1).
+* [ ] **B1.2: Complete Description:** The description area should clearly explain:
+    * **What:** Briefly describe the purpose and context of this PR.
+    * **Why:** Explain the necessity of this change. If it's a bug fix, please link to the relevant Issue.
+    * **How:** Briefly outline the technical implementation approach.
+    * **How to Test:** Provide clear, step-by-step instructions so that reviewers can reproduce and verify your work.
+#### B2. Code Quality Self-Check
+* [ ] **B2.1: Atomicity:** Does my PR strictly adhere to the **A1** principle?
+* [ ] **B2.2: Formatting & Linting:** Have I run and passed the following commands locally?
+    ```bash
+    uv run ruff format
+    uv run ruff check
+    ```
+* [ ] **B2.3: Naming Conventions:** Do all variable, function, and module names follow **D3.2**? (i.e., PEP 8's `snake_case` style).
+* [ ] **B2.4: Type Hints & Docstrings:**
+    * [ ] **B2.4.1:** Do all new or modified functions include Type Hints compliant with **D3.3**?
+    * [ ] **B2.4.2:** Do all new or modified functions include English Docstrings compliant with **D3.3**?
+* [ ] **B2.5: Dependency Management:** If I've added a new third-party library, have I carefully considered and followed the principles in **D5. Dependency Management Principles**?
+* [ ] **B2.6: Cross-Platform Compatibility:** Does my code run correctly on macOS, Windows, and Linux? If I've introduced components specific to a platform or GPU, have I made them optional?
+* [ ] **B2.7: Comment Language:** Are all in-code comments, Docstrings, and console outputs in English? (This excludes i18n localization implementations, but English must be the default).
+#### B3. Functional & Logical Self-Check
+* [ ] **B3.1: Functional Testing:** Have I thoroughly tested my changes locally to ensure they work as expected and do not introduce new bugs?
+* [ ] **B3.2: Alignment with Project Goals:** Do my changes align with the **D1. Core Project Goals** and not conflict with the **D2. Future Project Goals**?
+#### B4. Documentation Update
+* [ ] **B4.1: Documentation Sync:** If my PR introduces a new feature, a new configuration option, or any change that users need to be aware of, have I updated the relevant documentation in the docs repository (https://github.com/Open-LLM-VTuber/open-llm-vtuber.github.io)? (No exceptions).
+* [ ] **B4.2: Changelog Entry:** (Optional, but recommended) Add a brief entry for your change under the "Unreleased" section in `CHANGELOG.md`.
+### C. Maintainer's Checklist: Reviewing a PR
+For the long-term health of the project, please carefully check the following items during a code review. You can reference these item numbers directly (e.g., "Regarding C2.1, I believe the maintenance cost of this feature might outweigh its benefits...") to initiate a discussion.
+* [ ] **C1. Understand the Change:** Have I fully read and understood all the code and the intent behind this PR?
+* [ ] **C2. Strategic Alignment:**
+    * [ ] **C2.1: Necessity vs. Maintenance Cost:** Is this feature truly necessary? Does the value it provides justify the future maintenance cost we will incur? As Fred Brooks wrote in *The Mythical Man-Month*, "the conceptual integrity of the product... is the most important consideration in system design."
+    * [ ] **C2.2: Core Goal Alignment:** Does it fully align with the **D1. Core Project Goals**?
+    * [ ] **C2.3: Future Goal Alignment:** Is it consistent with, or at least not in conflict with, the **D2. Future Project Goals** and the project roadmap?
+* [ ] **C3. Implementation Quality:**
+    * [ ] **C3.1: Design Elegance:** Is the implementation sufficiently "simple" and "elegant"? Is there any over-engineering or premature optimization? "Simplicity is the ultimate sophistication." - Leonardo da Vinci.
+    * [ ] **C3.2: Maintainability:** Is the code modular, loosely coupled, easy to understand, and testable?
+    * [ ] **C3.3: Technical Detail Check:** Have all items from the contributor's self-checklist (**B2, B3, B4**) been met? (e.g., Are Type Hints accurate? Are Docstrings clear? Do Ruff checks pass?).
+* [ ] **C4. Documentation Completeness:** Has the relevant documentation been created or updated, and is its content clear and accurate?
+### D. Project Reference Standards
+This section details our core values and technical specifications, which serve as the basis for all the checklists above.
+#### D1. Core Project Goals
+* **D1.1. Offline Operation:** The project's core functionality must support fully offline operation. Any feature requiring an internet connection must be an optional module.
+* **D1.2. Frontend-Backend Separation:** Strictly adhere to a separated frontend-backend architecture to facilitate independent development and maintenance.
+* **D1.3. Cross-Platform:** Core backend components must run on macOS, Windows, and Linux using only the CPU. Any component dependent on a specific platform or GPU must be optional.
+* **D1.4. Updatability:** Users should be able to upgrade smoothly via an update script. Any Breaking Changes must be accompanied by a major version bump (e.g., v1 -> v2) and a switch to a new release branch.
+* **D1.5. Maintainability:** The code must be simple, modular, decoupled, testable, and follow best practices.
+#### D2. Future Project Goals
+We are moving in the following directions. All new contributions should strive to align with these goals (though it's not strictly mandatory, as these goals will likely be implemented together in a future v2 refactor).
+* **D2.1. GUI for Settings:** Gradually replace traditional `yaml` configuration files with a GUI-based settings interface.
+* **D2.2. Plugin Architecture:** Build a plugin-based ecosystem, using a Launcher service to manage and run modules like ASR/TTS/LLM via a GUI.
+* **D2.3. Stable API:** Provide a stable and reliable backend API for plugins and the frontend to consume.
+* **D2.4. Automated Testing:** Comprehensively adopt `pytest`-based automated testing. New code should be designed with testability in mind.
+#### D3. Detailed Coding Standards
+**D3.1. Linter & Formatter**
+We use **Ruff** to unify code style and check for potential issues. All submitted code must pass both `ruff format` and `ruff check`.
+**D3.2. Naming Conventions**
+* Follow Python's **PEP 8** style guide.
+* Use **snake_case** for naming variables, functions, and modules.
+* Names should be clear, descriptive, and unambiguous. Avoid single-letter variable names (except for loop counters).
+**D3.3. Type Hints & Docstrings**
+* **Why are they important?** Type Hints and Docstrings are the "manual" for your code. They help:
+    * Other developers to quickly understand your code.
+    * IDEs and static analysis tools (like VSCode, Ruff) to perform smarter error checking and code completion.
+    * You, months from now, to understand the code you wrote yourself.
+* **Type Hint Requirements:**
+    * All function/method parameters and return values **must** include Type Hints.
+    * The project targets **Python 3.10+**. Please use modern syntax, such as `str | None` instead of `Optional[str]`, and `list[str]` instead of `List[str]` (as per [PEP 604](https://peps.python.org/pep-0604/) and [PEP 585](https://peps.python.org/pep-0585/)).
+    * Type Hints must be accurate. It is recommended to set VSCode's Python type checker to `basic` or `strict` mode for validation.
+* **Docstring Requirements:**
+    * All new or significantly modified public functions, methods, and classes **must** include an English Docstring.
+    * We recommend the **Google style Docstring format**. It should include at least:
+        * **Summary:** A one-line summary of the function's purpose.
+        * **Args:** A description of each parameter's type and meaning.
+        * **Returns:** A description of the return value's type and meaning.
+    * **Example:**
+        ```python
+        def add(a: int, b: int) -> int:
+            """Calculates the sum of two integers.
+            Args:
+                a: The first integer.
+                b: The second integer.
+            Returns:
+                The sum of a and b.
+            """
+            return a + b
+        ```
+#### D4. Architectural Principles
+* **D4.1. ASR/LLM/TTS Module Design:** When a library supports multiple models with vastly different configurations, prioritize user experience and ease of understanding.
+    * It is recommended to encapsulate each complex model into a separate, independent module (e.g., `asr-whisper-api`, `asr-funasr`) rather than treating the entire library as one monolithic module. This simplifies user configuration and clarifies responsibilities.
+#### D5. Dependency Management Principles
+* **D5.1. Every new dependency must be carefully considered.**
+    * Can this functionality be achieved with the standard library or an existing dependency?
+    * Is the dependency's license compatible with our project?
+    * Is the dependency's community active? How is its maintenance status? Is it secure and trustworthy? Does it pose a risk of supply chain attacks?
+---
+Thank you for taking the time to read this guide. We look forward to your contribution!
+Finally, regarding the PR review process, please be patient. Our project is understaffed, and the core maintainers are also quite busy, so reviews may take some time. If a week passes without any response, I apologize in advance—I may have simply forgotten. Please feel free to ping me (@t41372) or other relevant maintainers in the Pull Request to remind us.

.gitattributes ADDED

	@@ -0,0 +1,4 @@

+avatars/mao.png filter=lfs diff=lfs merge=lfs -text
+avatars/Yue_001.png filter=lfs diff=lfs merge=lfs -text
+backgrounds/ceiling-computer-room-night.jpg filter=lfs diff=lfs merge=lfs -text
+backgrounds/ceiling-window-room-night.jpeg filter=lfs diff=lfs merge=lfs -text

.github/FUNDING.yml ADDED

	@@ -0,0 +1,14 @@

+# These are supported funding model platforms
+github: # Replace with up to 4 GitHub Sponsors-enabled usernames e.g., [user1, user2]
+patreon: # Replace with a single Patreon username
+open_collective: # Replace with a single Open Collective username
+ko_fi: # Replace with a single Ko-fi username
+tidelift: # Replace with a single Tidelift platform-name/package-name e.g., npm/babel
+community_bridge: # Replace with a single Community Bridge project-name e.g., cloud-foundry
+liberapay: # Replace with a single Liberapay username
+issuehunt: # Replace with a single IssueHunt username
+lfx_crowdfunding: # Replace with a single LFX Crowdfunding project-name e.g., cloud-foundry
+polar: # Replace with a single Polar username
+buy_me_a_coffee: yi.ting
+custom: # Replace with up to 4 custom sponsorship URLs e.g., ['link1', 'link2']

.github/ISSUE_TEMPLATE/bug---question---get-help---bug---#U63d0#U95ee---#U6c42#U52a9.md ADDED

	@@ -0,0 +1,79 @@

+---
+name: Bug & Question & Get Help | Bug & 提问 & 求助
+about: Describe the problem you encountered. 请描述你遇到的问题
+title: "[GET HELP] "
+labels: question
+assignees: ''
+---
+### 1. Checklist / 检查项
+- [ ]  I have removed sensitive information from configuration/logs.
+    我已移除配置或日志中的敏感信息。
+- [ ]  I have checked the [FAQ](https://docs.llmvtuber.com/docs/faq/) and [existing issues](https://github.com/Open-LLM-VTuber/Open-LLM-VTuber/issues).
+    我已查阅[常见问题](https://docs.llmvtuber.com/docs/faq/)和[已有 issue](https://github.com/Open-LLM-VTuber/Open-LLM-VTuber/issues)。
+- [ ]  I am using the latest version of the project.
+    我正在使用项目的最新版本。
+---
+### 2. Environment Details / 环境信息
+- How did you install Open-LLM-VTuber:
+    你是如何安装 Open-LLM-VTuber 的：
+    - [ ]  git clone （源码克隆）
+    - [ ]  release zip （发布包）
+    - [ ]  exe (Windows) （Windows 安装包）
+    - [ ]  dmg (MacOS) （MacOS 安装包）
+- Are you running the backend and frontend on the same device?
+    后端和前端是否在同一台设备上运行？
+- If you used GPU, please provide your GPU model and driver version:
+    如果你使用了 GPU，请提供你的 GPU 型号及驱动版本信息:
+- Browser (if applicable):
+    浏览器（如果适用）：
+---
+### 3. Describe the bug / 问题描述
+What exactly is happening? What do you want to see? How to reproduce?
+请详细描述发生了什么、你希望看到什么，以及如何复现。
+---
+### 4. Screenshots / Logs (if relevant)
+截图 / 日志（如有）
+- Backend log: 后端日志
+- Frontend setting (General): 前端设置（通用）
+- Frontend console log (F12): 前端控制台日志（F12）
+- If using Ollama: output of `ollama ps`:
+如果使用 Ollama，请附上 `ollama ps` 的输出
+---
+### 5. Configuration / 配置文件
+> Please provide relevant config files, with sensitive info like API keys removed
+>
+>
+> 请提供相关配置文件（请务必去除 API key 等敏感信息）
+>
+- `conf.yaml`
+- `model_dict.json`, `.model3.json`

.github/ISSUE_TEMPLATE/feature-request---#U529f#U80fd#U5efa#U8bae.md ADDED

	@@ -0,0 +1,44 @@

+---
+name: Feature request / 功能建议
+about: Suggest an idea for this project / 提出改善项目的建议
+title: "[IDEA]"
+labels: enhancement
+assignees: ''
+---
+### 这个功能请求是用来解决什么问题的？ / Is your feature request related to a problem? Please describe.
+*请清晰简洁地描述您遇到的问题。例如：我总是在 [...] 时感到不方便。*
+*A clear and concise description of what the problem is. Ex. I'm always frustrated when [...]*
+[在这里输入问题描述 / Type problem description here]
+### 您期望的解决方案是什么？ / Describe the solution you'd like
+*请清晰简洁地描述您希望实现的功能或效果。*
+*A clear and concise description of what you want to happen.*
+[在此处输入期望的解决方案 / Type desired solution here]
+### 此功能为何对 Open-LLM-VTuber 很重要？ / Why is this important for Open-LLM-VTuber?
+*请解释为什么这个功能对 Open-LLM-VTuber 项目来说是实用且重要的。它能带来什么价值？例如，它如何提升用户体验、扩展项目能力、解决核心痛点等。*
+*Explain why this feature would be useful and significant for the Open-LLM-VTuber project. What value does it add? For example, how does it improve user experience, extend project capabilities, or solve core pain points?*
+[在此处说明其重要性 / Explain its importance here]
+### 您考虑过哪些替代方案？ / Describe alternatives you've considered
+*请清晰简洁地描述您考虑过的任何替代解决方案或特性。*
+*A clear and concise description of any alternative solutions or features you've considered.*
+[在此处输入替代方案 / Type alternatives here]
+### 您是否愿意参与开发此功能？ / Would you like to work on this issue?
+*请回答 Yes 或 No。如果您愿意，我们可以讨论后续步骤。*
+*Please answer Yes or No. If yes, we can discuss the next steps.*
+[回答 Yes/No / Answer Yes/No]
+### 补充信息 / Additional context
+*在此处添加有关此功能请求的任何其他上下文、截图、日志或设计稿。*
+*Add any other context, screenshots, logs, or mockups about the feature request here.*
+[在此处添加补充信息 / Add additional context here]

.github/copilot-instructions.md ADDED

	@@ -0,0 +1,106 @@

+# Open-LLM-VTuber AI Coding Assistant: Context & Guidelines
+`version: 2025.08.05-1`
+## 1. Core Project Context
+  - **Project:** Open-LLM-VTuber, a low-latency voice-based LLM interaction tool.
+  - **Language:** Python >= 3.10
+  - **Core Tech Stack:**
+      - **Backend:** FastAPI, Pydantic v2, Uvicorn, fully async
+      - **Real-time Communication:** WebSockets
+      - **Package Management:** `uv` (version ~= 0.8, as of August 2025). Always use `uv run`, `uv sync`, `uv add`, and `uv remove` instead of `pip`.
+  - **Primary Goal:** Achieve end-to-end latency below 500ms (user speaks -> AI voice heard). Performance is critical.
+  - **Key Principles:**
+      - **Offline-Ready:** Core functionality MUST work without an internet connection.
+      - **Separation of Concerns:** Strict frontend-backend separation.
+      - **Clean code:** Write clean, testable, maintainable code that follows Python 3.10+ best practices and avoids deprecated APIs.
+Some key files and directories:
+```
+doc/                 # A deprecated directory
+frontend/            # Compiled web frontend artifacts (from git submodule)
+config_templates/
+    conf.default.yaml    # Configuration template for English users
+    conf.ZH.default.yaml # Configuration template for Chinese users
+src/open_llm_vtuber/ # Project source code
+    config_manager/
+        main.py      # Pydantic models for configuration validation
+run_server.py        # Entrypoint to start the application
+conf.yaml            # User's configuration file, generated from a template
+```
+### 1.1. Repository Structure
+- Frontend Repository: The frontend is a React application developed in a separate repository: `Open-LLM-VTuber-Web`. Its built artifacts are integrated into the `frontend/` directory of this backend repository via a git submodule.
+- Documentation Repository: The official documentation site is hosted in the `open-llm-vtuber.github.io` repository. When asked to generate documentation, create Markdown files in the project root. The user will be responsible for migrating them to the documentation site.
+### 1.2. Configuration Files
+- Configuration templates are located in the `config_templates/` directory:
+  - `conf.default.yaml`: Template for English-speaking users.
+  - `conf.ZH.default.yaml`: Template for Chinese-speaking users.
+- When modifying the configuration structure, both template files MUST be updated accordingly.
+- Configuration is validated on load using the Pydantic models defined in `src/open_llm_vtuber/config_manager/main.py`. Any changes to configuration options must be reflected in these models.
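
Reviewer note: the rule that both templates must change together can be checked mechanically. The sketch below is not part of the repository. It assumes the two templates have already been parsed into plain dictionaries (for example with a YAML loader), and the helper names `key_paths` and `missing_keys` are hypothetical.

```python
from typing import Any


def key_paths(config: dict[str, Any], prefix: str = "") -> set[str]:
    """Collect the dotted path of every key in a nested mapping."""
    paths: set[str] = set()
    for key, value in config.items():
        path = f"{prefix}.{key}" if prefix else key
        paths.add(path)
        if isinstance(value, dict):
            paths |= key_paths(value, path)
    return paths


def missing_keys(reference: dict[str, Any], other: dict[str, Any]) -> set[str]:
    """Return key paths present in `reference` but absent from `other`."""
    return key_paths(reference) - key_paths(other)


# Hypothetical, heavily trimmed stand-ins for the two parsed templates.
english_template = {"system_config": {"host": "localhost", "port": 12393}}
chinese_template = {"system_config": {"host": "localhost"}}

print(sorted(missing_keys(english_template, chinese_template)))
```

Running the comparison in both directions catches a key added to only one template, whichever template was edited.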
+## 2. Overarching Coding Philosophy
+  - **Simplicity and Readability:** Write code that is simple, clear, and easy to understand. Avoid unnecessary complexity or premature optimization. Follow the Zen of Python.
+  - **Single Responsibility:** Each function, class, and module should do one thing and do it well.
+  - **Performance-Aware:** Be mindful of performance. Avoid blocking operations in async contexts. Use efficient data structures and algorithms where it matters.
+  - **Adherence to Best Practices**: Write clean, testable, and robust code that follows modern Python 3.10+ idioms. Adhere to the best practices of our core libraries (FastAPI, Pydantic v2).
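
Reviewer note: "avoid blocking operations in async contexts" usually means moving synchronous work off the event loop. The sketch below is one standard-library way to do that, using `asyncio.to_thread`. The function `blocking_work` is a hypothetical stand-in for any synchronous call, and the heartbeat task exists only to show that the loop keeps running in the meantime.

```python
import asyncio
import time


def blocking_work(duration_s: float) -> str:
    """Hypothetical stand-in for a synchronous, blocking call."""
    time.sleep(duration_s)
    return "done"


async def heartbeat(ticks: list[float], interval_s: float, count: int) -> None:
    """Record timestamps so we can see the event loop was not starved."""
    for _ in range(count):
        ticks.append(time.monotonic())
        await asyncio.sleep(interval_s)


async def main() -> tuple[str, int]:
    ticks: list[float] = []
    beat = asyncio.create_task(heartbeat(ticks, 0.01, 5))
    # to_thread runs the blocking function in a worker thread, so the
    # heartbeat task keeps ticking while the call is in progress.
    result = await asyncio.to_thread(blocking_work, 0.1)
    ticks_during_call = len(ticks)
    await beat
    return result, ticks_during_call


result, ticks_during_call = asyncio.run(main())
print(result, ticks_during_call)
```

Calling `blocking_work(0.1)` directly inside `main` would instead leave the heartbeat with no ticks until the call returned.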
+## 3. Detailed Coding Standards
+### 3.1. Formatting & Linting (Ruff)
+  - All Python code **MUST** be formatted with `uv run ruff format`.
+  - All Python code **MUST** pass `uv run ruff check` without errors.
+  - Import statements should be grouped by standard library, third-party, and local modules and sorted alphabetically (PEP 8).
+### 3.2. Naming Conventions (PEP 8)
+  - Use `snake_case` for all variables, functions, methods, and module names.
+  - Use `PascalCase` for class names.
+  - Choose descriptive names. Avoid single-letter names except for loop counters or well-known initialisms.
+### 3.3. Type Hints (CRITICAL)
+  - Target Python 3.10+. Use modern type hint syntax.
+  - **DO:** Use `|` for unions (e.g., `str | None`).
+  - **DON'T:** Use `Optional` from `typing` (e.g., `Optional[str]`).
+  - **DO:** Use built-in generics (e.g., `list[int]`, `dict[str, float]`).
+  - **DON'T:** Use capitalized types from `typing` (e.g., `List[int]`, `Dict[str, float]`).
+  - All function and method signatures (arguments and return values) **MUST** have accurate type hints. If a third-party library makes a type error impossible to fix, suppress the type checker warning.
+### 3.4. Docstrings & Comments (CRITICAL)
+  - All public modules, functions, classes, and methods **MUST** have a docstring in English.
+  - Use the **Google Python Style** for docstrings.
+  - Docstrings **MUST** include:
+    1.  Summary.
+    2.  `Args:` section describing each parameter, its type, and its purpose.
+    3.  `Returns:` section describing the return value, its type, and its meaning.
+    4.  (Optional but encouraged) `Raises:` section for any exceptions thrown.
+  - All other code comments must also be in English.
+### 3.5. Logging
+  - Use the `loguru` module for all informational or error output.
+  - Log messages should be in English, clear, and informative. Use emoji when appropriate.
+## 4. Architectural Principles
+### 4.1. Dependency Management
+  - First, try to solve the problem using the Python standard library or existing project dependencies defined in `pyproject.toml`.
+  - If a new dependency is required, it must have a compatible license and be well-maintained.
+  - Use `uv add`, `uv remove`, and `uv run` instead of `pip` to manage dependencies. If the user works in a conda environment, install `uv` with `pip` first.
+  - After adding a new dependency, in addition to `pyproject.toml`, you must add the dependency to `requirements.txt` as well.
+### 4.2. Cross-Platform Compatibility
+  - All core logic **MUST** run on macOS, Windows, and Linux.
+  - If a feature is platform-specific (e.g., uses a Windows-only API) or hardware-specific (e.g., CUDA), it **MUST** be an optional component. The application should start and run core features even if that component is not available. Use graceful fallbacks or clear error messages.
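
Reviewer note: the "optional component with graceful fallback" rule in 4.2 can be sketched with the standard library alone. This is an illustration under assumptions, not the project's mechanism: `is_available` and `select_backend` are hypothetical helpers, and the module names in the usage lines are placeholders.

```python
import importlib.util


def is_available(module_name: str) -> bool:
    """Report whether a top-level module can be imported, without importing it.

    Args:
        module_name: Top-level module name (no dots).

    Returns:
        True if an import spec for the module can be found.
    """
    return importlib.util.find_spec(module_name) is not None


def select_backend(candidates: list[tuple[str, str]], fallback: str) -> str:
    """Pick the first backend whose required module is installed.

    Args:
        candidates: (backend name, required module name) pairs in priority order.
        fallback: Backend name to use when no candidate is importable.

    Returns:
        The chosen backend name.
    """
    for backend, module_name in candidates:
        if is_available(module_name):
            return backend
    return fallback


# Placeholder module names: one that should not exist, one from the stdlib.
print(select_backend([("accelerated", "surely_missing_module_for_demo")], "portable"))
print(select_backend([("json-based", "json")], "portable"))
```

Probing at startup lets the application log a clear message and continue with the fallback instead of failing on import.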

.github/workflows/codeql.yml ADDED

	@@ -0,0 +1,92 @@

+# For most projects, this workflow file will not need changing; you simply need
+# to commit it to your repository.
+#
+# You may wish to alter this file to override the set of languages analyzed,
+# or to provide custom queries or build logic.
+#
+# ******** NOTE ********
+# We have attempted to detect the languages in your repository. Please check
+# the `language` matrix defined below to confirm you have the correct set of
+# supported CodeQL languages.
+#
+name: "CodeQL Advanced"
+on:
+  push:
+    branches: [ "main" ]
+  pull_request:
+    branches: [ "main" ]
+  schedule:
+    - cron: '32 5 * * 6'
+jobs:
+  analyze:
+    name: Analyze (${{ matrix.language }})
+    # Runner size impacts CodeQL analysis time. To learn more, please see:
+    #   - https://gh.io/recommended-hardware-resources-for-running-codeql
+    #   - https://gh.io/supported-runners-and-hardware-resources
+    #   - https://gh.io/using-larger-runners (GitHub.com only)
+    # Consider using larger runners or machines with greater resources for possible analysis time improvements.
+    runs-on: ${{ (matrix.language == 'swift' && 'macos-latest') || 'ubuntu-latest' }}
+    permissions:
+      # required for all workflows
+      security-events: write
+      # required to fetch internal or private CodeQL packs
+      packages: read
+      # only required for workflows in private repositories
+      actions: read
+      contents: read
+    strategy:
+      fail-fast: false
+      matrix:
+        include:
+        - language: python
+          build-mode: none
+        # CodeQL supports the following keywords for 'language': 'c-cpp', 'csharp', 'go', 'java-kotlin', 'javascript-typescript', 'python', 'ruby', 'swift'
+        # Use `c-cpp` to analyze code written in C, C++ or both
+        # Use 'java-kotlin' to analyze code written in Java, Kotlin or both
+        # Use 'javascript-typescript' to analyze code written in JavaScript, TypeScript or both
+        # To learn more about changing the languages that are analyzed or customizing the build mode for your analysis,
+        # see https://docs.github.com/en/code-security/code-scanning/creating-an-advanced-setup-for-code-scanning/customizing-your-advanced-setup-for-code-scanning.
+        # If you are analyzing a compiled language, you can modify the 'build-mode' for that language to customize how
+        # your codebase is analyzed, see https://docs.github.com/en/code-security/code-scanning/creating-an-advanced-setup-for-code-scanning/codeql-code-scanning-for-compiled-languages
+    steps:
+    - name: Checkout repository
+      uses: actions/checkout@v4
+    # Initializes the CodeQL tools for scanning.
+    - name: Initialize CodeQL
+      uses: github/codeql-action/init@v3
+      with:
+        languages: ${{ matrix.language }}
+        build-mode: ${{ matrix.build-mode }}
+        # If you wish to specify custom queries, you can do so here or in a config file.
+        # By default, queries listed here will override any specified in a config file.
+        # Prefix the list here with "+" to use these queries and those in the config file.
+        # For more details on CodeQL's query packs, refer to: https://docs.github.com/en/code-security/code-scanning/automatically-scanning-your-code-for-vulnerabilities-and-errors/configuring-code-scanning#using-queries-in-ql-packs
+        # queries: security-extended,security-and-quality
+    # If the analyze step fails for one of the languages you are analyzing with
+    # "We were unable to automatically build your code", modify the matrix above
+    # to set the build mode to "manual" for that language. Then modify this step
+    # to build your code.
+    # ℹ️ Command-line programs to run using the OS shell.
+    # 📚 See https://docs.github.com/en/actions/using-workflows/workflow-syntax-for-github-actions#jobsjob_idstepsrun
+    - if: matrix.build-mode == 'manual'
+      shell: bash
+      run: |
+        echo 'If you are using a "manual" build mode for one or more of the' \
+          'languages you are analyzing, replace this with the commands to build' \
+          'your code, for example:'
+        echo '  make bootstrap'
+        echo '  make release'
+        exit 1
+    - name: Perform CodeQL Analysis
+      uses: github/codeql-action/analyze@v3
+      with:
+        category: "/language:${{matrix.language}}"

.github/workflows/create_release.yml ADDED

	@@ -0,0 +1,238 @@

+name: Create Release Packages
+# Only manual trigger
+on:
+  workflow_dispatch:
+    inputs:
+      version_override:
+        description: "Override version number in pyproject.toml (leave empty to use file version)"
+        required: false
+      upload_to_r2:
+        description: "Upload to Cloudflare R2"
+        type: boolean
+        default: true
+      create_github_release:
+        description: "Create GitHub Release"
+        type: boolean
+        default: true
+      target_branch:
+        description: "Branch to build (default is v1-release)"
+        required: false
+        default: "v1-release"
+jobs:
+  build-release-packages:
+    runs-on: ubuntu-latest
+    steps:
+      - name: Clone repository
+        uses: actions/checkout@v3
+        with:
+          repository: Open-LLM-VTuber/Open-LLM-VTuber
+          ref: ${{ github.event.inputs.target_branch }}
+          submodules: true
+          fetch-depth: 1
+          fetch-tags: true
+        continue-on-error: true
+        id: checkout
+      - name: Try with default branch
+        if: steps.checkout.outcome == 'failure'
+        uses: actions/checkout@v3
+        with:
+          repository: Open-LLM-VTuber/Open-LLM-VTuber
+          ref: v1-release
+          submodules: true
+          fetch-depth: 1
+      # Add debug step to check file structure
+      - name: Debug - Check repository structure
+        run: |
+          echo "Current working directory: $(pwd)"
+          echo "List root directory contents:"
+          ls -la
+          echo "Check if config_templates directory exists:"
+          ls -la | grep config_templates || echo "config_templates directory does not exist"
+          echo "List config_templates directory contents:"
+          ls -la config_templates/
+      - name: Setup Python
+        uses: actions/setup-python@v4
+        with:
+          python-version: "3.10"
+      - name: Extract version from pyproject.toml
+        id: get_version
+        run: |
+          VERSION=$(grep -m 1 'version' pyproject.toml | sed 's/[^"]*"\([^"]*\).*/\1/')
+          if [ "${{ github.event.inputs.version_override }}" != "" ]; then
+            VERSION="${{ github.event.inputs.version_override }}"
+          fi
+          echo "VERSION=$VERSION" >> $GITHUB_ENV
+          echo "Found version: $VERSION"
+      # Download and prepare ASR model
+      - name: Download and prepare ASR model
+        run: |
+          mkdir -p models
+          cd models
+          echo "Downloading ASR model..."
+          wget https://github.com/k2-fsa/sherpa-onnx/releases/download/asr-models/sherpa-onnx-sense-voice-zh-en-ja-ko-yue-2024-07-17.tar.bz2
+          echo "Extracting model..."
+          tar -xjf sherpa-onnx-sense-voice-zh-en-ja-ko-yue-2024-07-17.tar.bz2
+          rm sherpa-onnx-sense-voice-zh-en-ja-ko-yue-2024-07-17.tar.bz2
+          echo "Removing model.onnx file to reduce size..."
+          rm -f sherpa-onnx-sense-voice-zh-en-ja-ko-yue-2024-07-17/model.onnx
+      # Clean unnecessary files
+      - name: Clean project
+        run: |
+          echo "Cleaning __pycache__ and .venv folders..."
+          find . -type d -name "__pycache__" -exec rm -rf {} + 2>/dev/null || true
+          find . -type d -name ".venv" -exec rm -rf {} + 2>/dev/null || true
+      # Create Chinese version
+      - name: Create Chinese version
+        run: |
+          echo "Creating Chinese version..."
+          cp config_templates/conf.ZH.default.yaml conf.yaml
+          zip -r Open-LLM-VTuber-v${{ env.VERSION }}-zh.zip . -x "*.zip"
+          rm conf.yaml
+      # Create English version
+      - name: Create English version
+        run: |
+          echo "Creating English version..."
+          cp config_templates/conf.default.yaml conf.yaml
+          zip -r Open-LLM-VTuber-v${{ env.VERSION }}-en.zip . -x "*.zip"
+          rm conf.yaml
+      # Get latest Electron app
+      - name: Get latest Electron app
+        id: download_electron
+        run: |
+          set -e
+          # Fetch the latest release JSON from the GitHub API
+          RELEASE_JSON=$(curl --silent "https://api.github.com/repos/Open-LLM-VTuber/Open-LLM-VTuber-Web/releases/latest")
+          # Use jq to extract browser_download_url for assets ending with .exe or .dmg
+          ASSET_URLS=$(echo "$RELEASE_JSON" | jq -r '.assets[] | select(.name | endswith(".exe") or endswith(".dmg")) | .browser_download_url')
+          # Download each asset into the current directory
+          for url in $ASSET_URLS; do
+              echo "Downloading $(basename "$url")..."
+              curl -L -O "$url"
+              ls -la
+          done
+      # If chosen, upload to GitHub Actions artifacts
+      - name: Upload Chinese version to GitHub Actions artifacts
+        if: ${{ github.event.inputs.create_github_release == 'true' }}
+        uses: actions/upload-artifact@v4
+        with:
+          name: Open-LLM-VTuber-v${{ env.VERSION }}-zh
+          path: Open-LLM-VTuber-v${{ env.VERSION }}-zh.zip
+          retention-days: 30
+      - name: Upload English version to GitHub Actions artifacts
+        if: ${{ github.event.inputs.create_github_release == 'true' }}
+        uses: actions/upload-artifact@v4
+        with:
+          name: Open-LLM-VTuber-v${{ env.VERSION }}-en
+          path: Open-LLM-VTuber-v${{ env.VERSION }}-en.zip
+          retention-days: 30
+      - name: Upload Windows installer to GitHub Actions artifacts
+        if: ${{ github.event.inputs.create_github_release == 'true' }}
+        uses: actions/upload-artifact@v4
+        with:
+          name: Open-LLM-VTuber-v${{ env.VERSION }}-windows
+          path: open-llm-vtuber-electron-*-setup.exe
+          retention-days: 30
+      - name: Upload macOS installer to GitHub Actions artifacts
+        if: ${{ github.event.inputs.create_github_release == 'true' }}
+        uses: actions/upload-artifact@v4
+        with:
+          name: Open-LLM-VTuber-v${{ env.VERSION }}-macos
+          path: open-llm-vtuber-electron-*.dmg
+          retention-days: 30
+      - name: Debug input parameters
+        run: |
+          echo "upload_to_r2 value: '${{ github.event.inputs.upload_to_r2 }}'"
+          # Values read through github.event.inputs are strings, which is why later steps compare against 'true'.
+      # If chosen, upload to Cloudflare R2
+      - name: Upload to Cloudflare R2
+        if: ${{ github.event.inputs.upload_to_r2 == 'true' }}
+        env:
+          AWS_ACCESS_KEY_ID: ${{ secrets.R2_ACCESS_KEY_ID }}
+          AWS_SECRET_ACCESS_KEY: ${{ secrets.R2_SECRET_ACCESS_KEY }}
+          R2_ENDPOINT: ${{ secrets.R2_ENDPOINT }}
+          R2_PUBLIC_URL: ${{ secrets.R2_PUBLIC_URL }}
+        run: |
+          # Install AWS CLI
+          pip install awscli
+          echo "AWS CLI installation complete"
+          # Configure AWS CLI for Cloudflare R2
+          aws configure set aws_access_key_id "$AWS_ACCESS_KEY_ID"
+          aws configure set aws_secret_access_key "$AWS_SECRET_ACCESS_KEY"
+          # Confirm AWS CLI configuration
+          echo "AWS CLI configured, preparing to upload files..."
+          # Create version directory in bucket
+          aws s3 --endpoint-url=$R2_ENDPOINT cp --recursive --acl public-read . s3://open-llm-vtuber-release/v${{ env.VERSION }}/ --exclude "*" --include "Open-LLM-VTuber-v${{ env.VERSION }}-*.zip" --include "open-llm-vtuber-electron-*.dmg" --include "open-llm-vtuber-electron-*-setup.exe"
+          # Output public URLs
+          echo "Files uploaded to R2. Public URLs:"
+          for file in Open-LLM-VTuber-v${{ env.VERSION }}-zh.zip Open-LLM-VTuber-v${{ env.VERSION }}-en.zip open-llm-vtuber-electron-*.dmg open-llm-vtuber-electron-*-setup.exe; do
+            echo "$R2_PUBLIC_URL/v${{ env.VERSION }}/$file"
+          done
+          echo "R2 upload process completed"
+      # Generate download links markdown
+      - name: Generate R2 download links markdown
+        if: ${{ github.event.inputs.upload_to_r2 == 'true' }}
+        env:
+          R2_PUBLIC_URL: ${{ secrets.R2_PUBLIC_URL }}
+        run: |
+          # Get electron app version from filenames
+          EXE_VERSION=$(ls open-llm-vtuber-electron-*-setup.exe | sed -E 's/open-llm-vtuber-electron-(.*)-setup.exe/\1/')
+          DMG_VERSION=$(ls open-llm-vtuber-electron-*.dmg | sed -E 's/open-llm-vtuber-electron-(.*).dmg/\1/')
+          # Create markdown text with download links and save to file
+          cat > download-links.md << EOF
+          ## Faster download links for Chinese users 给内地用户准备的(相对)快速的下载链接
+          Open-LLM-VTuber-v${{ env.VERSION }}-zh.zip (包含 sherpa onnx asr 的 sense-voice 模型，就不用再从github上拉取了)
+          - [Open-LLM-VTuber-v${{ env.VERSION }}-en.zip]($R2_PUBLIC_URL/v${{ env.VERSION }}/Open-LLM-VTuber-v${{ env.VERSION }}-en.zip)
+          - [Open-LLM-VTuber-v${{ env.VERSION }}-zh.zip]($R2_PUBLIC_URL/v${{ env.VERSION }}/Open-LLM-VTuber-v${{ env.VERSION }}-zh.zip)
+          open-llm-vtuber-electron-$EXE_VERSION-frontend.exe (桌面版前端，Windows)
+          - [open-llm-vtuber-electron-$EXE_VERSION-setup.exe]($R2_PUBLIC_URL/v${{ env.VERSION }}/open-llm-vtuber-electron-$EXE_VERSION-setup.exe)
+          open-llm-vtuber-electron-$DMG_VERSION-frontend.dmg (桌面版前端，macOS)
+          - [open-llm-vtuber-electron-$DMG_VERSION.dmg]($R2_PUBLIC_URL/v${{ env.VERSION }}/open-llm-vtuber-electron-$DMG_VERSION.dmg)
+          EOF
+          echo "Download links markdown file created"
+      # Upload download links as an artifact
+      - name: Upload download links markdown
+        if: ${{ github.event.inputs.upload_to_r2 == 'true' }}
+        uses: actions/upload-artifact@v4
+        with:
+          name: download-links
+          path: download-links.md
+          retention-days: 30
+      # Add the download links to GitHub release if creating one
+      - name: Add download links to release description
+        if: ${{ github.event.inputs.upload_to_r2 == 'true' && github.event.inputs.create_github_release == 'true' }}
+        run: |
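+          # `::set-output` is deprecated and does not handle unescaped multi-line values; write to $GITHUB_OUTPUT with a delimiter instead.
+          {
+            echo "download_links<<DOWNLOAD_LINKS_EOF"
+            cat download-links.md
+            echo "DOWNLOAD_LINKS_EOF"
+          } >> "$GITHUB_OUTPUT"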
+        id: download_links
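
Reviewer note: the `Extract version from pyproject.toml` step above takes the first line that contains the substring `version`, so a comment or another key mentioning the word earlier in the file would be picked instead. The sketch below shows a stricter rule in Python for comparison. It is not what the workflow runs: `resolve_version` is a hypothetical helper, and the sample text is invented for illustration.

```python
import re

# Match only a line that starts with the `version` key itself.
VERSION_LINE = re.compile(r'^version\s*=\s*"([^"]+)"', re.MULTILINE)


def resolve_version(pyproject_text: str, override: str = "") -> str:
    """Return the override if given, else the version declared in the text.

    Args:
        pyproject_text: Contents of a pyproject.toml file.
        override: Optional version string that takes precedence when non-empty.

    Returns:
        The resolved version string.

    Raises:
        ValueError: If no override is given and no version line is found.
    """
    if override:
        return override
    match = VERSION_LINE.search(pyproject_text)
    if match is None:
        raise ValueError("no version line found")
    return match.group(1)


# Invented sample: the word "version" appears in a comment before the real key.
sample = (
    "[project]\n"
    'name = "demo"\n'
    'requires-python = ">=3.10"  # minimum Python version\n'
    'version = "1.2.1"\n'
)

print(resolve_version(sample))
print(resolve_version(sample, override="9.9.9"))
```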

.github/workflows/docker-blacksmith.yml ADDED

	@@ -0,0 +1,207 @@

+name: Docker Build & Push (Blacksmith)
+on:
+  push:
+    branches:
+      - main
+    tags:
+      - "v*"
+      - "*.*.*"
+  pull_request:
+    branches:
+      - main
+  workflow_dispatch:
+concurrency:
+  group: docker-blacksmith-${{ github.ref }}
+  cancel-in-progress: true
+permissions:
+  contents: read
+  packages: write
+env:
+  DOCKERFILE: dockerfile
+  CONTEXT: .
+  DOCKERHUB_IMAGE: ${{ vars.DOCKERHUB_IMAGE || 'openllmvtuber/open-llm-vtuber' }}
+  GHCR_IMAGE: ${{ vars.GHCR_IMAGE || '' }}
+jobs:
+  meta:
+    runs-on: blacksmith-8vcpu-ubuntu-2204
+    outputs:
+      tags: ${{ steps.meta.outputs.tags }}
+      labels: ${{ steps.meta.outputs.labels }}
+      dockerhub_image: ${{ steps.image.outputs.dockerhub_image }}
+      ghcr_image: ${{ steps.image.outputs.ghcr_image }}
+    steps:
+      - name: Resolve image names
+        id: image
+        shell: bash
+        run: |
+          set -euo pipefail
+          dockerhub_image="${DOCKERHUB_IMAGE}"
+          if [ -n "${GHCR_IMAGE:-}" ]; then
+            ghcr_image="${GHCR_IMAGE}"
+          else
+            ghcr_image="ghcr.io/${GITHUB_REPOSITORY,,}"
+          fi
+          echo "dockerhub_image=${dockerhub_image}" >> "$GITHUB_OUTPUT"
+          echo "ghcr_image=${ghcr_image}" >> "$GITHUB_OUTPUT"
+      - name: Docker image metadata
+        id: meta
+        uses: docker/metadata-action@v5
+        with:
+          images: |
+            ${{ steps.image.outputs.dockerhub_image }}
+            ${{ steps.image.outputs.ghcr_image }}
+          tags: |
+            type=ref,event=branch
+            type=ref,event=tag
+            type=semver,pattern={{version}}
+            type=semver,pattern={{major}}.{{minor}}
+            type=sha,format=short
+            type=raw,value=latest,enable={{is_default_branch}}
+  build:
+    needs: meta
+    runs-on: ${{ matrix.runner }}
+    strategy:
+      fail-fast: false
+      matrix:
+        include:
+          - platform: amd64
+            runner: blacksmith-8vcpu-ubuntu-2204
+            docker_platform: linux/amd64
+          - platform: arm64
+            runner: blacksmith-8vcpu-ubuntu-2204-arm
+            docker_platform: linux/arm64
+    steps:
+      - name: Checkout
+        uses: actions/checkout@v4
+      - name: Set up Blacksmith Docker builder
+        uses: useblacksmith/setup-docker-builder@v1
+      - name: Login to Docker Hub (retry)
+        if: github.event_name != 'pull_request'
+        shell: bash
+        env:
+          DOCKERHUB_USERNAME: ${{ secrets.DOCKERHUB_USERNAME }}
+          DOCKERHUB_TOKEN: ${{ secrets.DOCKERHUB_TOKEN }}
+        run: |
+          set -euo pipefail
+          for attempt in 1 2 3 4; do
+            if echo "${DOCKERHUB_TOKEN}" | docker login -u "${DOCKERHUB_USERNAME}" --password-stdin; then
+              exit 0
+            fi
+            if [ "${attempt}" -eq 4 ]; then
+              echo "Docker Hub login failed after ${attempt} attempts." >&2
+              exit 1
+            fi
+            sleep $((attempt * 5))
+          done
+      - name: Login to GHCR (retry)
+        if: github.event_name != 'pull_request'
+        shell: bash
+        env:
+          GHCR_USER: ${{ github.actor }}
+          GHCR_TOKEN: ${{ secrets.GITHUB_TOKEN }}
+        run: |
+          set -euo pipefail
+          for attempt in 1 2 3 4; do
+            if echo "${GHCR_TOKEN}" | docker login ghcr.io -u "${GHCR_USER}" --password-stdin; then
+              exit 0
+            fi
+            if [ "${attempt}" -eq 4 ]; then
+              echo "GHCR login failed after ${attempt} attempts." >&2
+              exit 1
+            fi
+            sleep $((attempt * 5))
+          done
+      - name: Prepare temporary arch tags
+        id: temp-tags
+        shell: bash
+        run: |
+          set -euo pipefail
+          ref_slug="$(echo "${GITHUB_REF_NAME}" | tr '[:upper:]' '[:lower:]' | sed 's/[^a-z0-9_.-]/-/g')"
+          suffix="tmp-${ref_slug}-${{ matrix.platform }}"
+          {
+            echo "tags<<EOF"
+            echo "${{ needs.meta.outputs.dockerhub_image }}:${suffix}"
+            echo "${{ needs.meta.outputs.ghcr_image }}:${suffix}"
+            echo "EOF"
+          } >> "$GITHUB_OUTPUT"
+      - name: Build and push (${{ matrix.platform }})
+        uses: useblacksmith/build-push-action@v2
+        with:
+          context: ${{ env.CONTEXT }}
+          file: ${{ env.DOCKERFILE }}
+          platforms: ${{ matrix.docker_platform }}
+          push: ${{ github.event_name != 'pull_request' }}
+          tags: ${{ steps.temp-tags.outputs.tags }}
+          labels: ${{ needs.meta.outputs.labels }}
+  manifest:
+    needs: [meta, build]
+    runs-on: ubuntu-latest
+    if: github.event_name != 'pull_request'
+    steps:
+      - name: Login to Docker Hub (retry)
+        shell: bash
+        env:
+          DOCKERHUB_USERNAME: ${{ secrets.DOCKERHUB_USERNAME }}
+          DOCKERHUB_TOKEN: ${{ secrets.DOCKERHUB_TOKEN }}
+        run: |
+          set -euo pipefail
+          for attempt in 1 2 3 4; do
+            if echo "${DOCKERHUB_TOKEN}" | docker login -u "${DOCKERHUB_USERNAME}" --password-stdin; then
+              exit 0
+            fi
+            if [ "${attempt}" -eq 4 ]; then
+              echo "Docker Hub login failed after ${attempt} attempts." >&2
+              exit 1
+            fi
+            sleep $((attempt * 5))
+          done
+      - name: Login to GHCR (retry)
+        shell: bash
+        env:
+          GHCR_USER: ${{ github.actor }}
+          GHCR_TOKEN: ${{ secrets.GITHUB_TOKEN }}
+        run: |
+          set -euo pipefail
+          for attempt in 1 2 3 4; do
+            if echo "${GHCR_TOKEN}" | docker login ghcr.io -u "${GHCR_USER}" --password-stdin; then
+              exit 0
+            fi
+            if [ "${attempt}" -eq 4 ]; then
+              echo "GHCR login failed after ${attempt} attempts." >&2
+              exit 1
+            fi
+            sleep $((attempt * 5))
+          done
+      - name: Set up Buildx
+        uses: docker/setup-buildx-action@v3
+      - name: Create and push multi-arch manifests
+        shell: bash
+        run: |
+          set -euo pipefail
+          ref_slug="$(echo "${GITHUB_REF_NAME}" | tr '[:upper:]' '[:lower:]' | sed 's/[^a-z0-9_.-]/-/g')"
+          suffix_base="tmp-${ref_slug}"
+          mapfile -t tags <<< "${{ needs.meta.outputs.tags }}"
+          for tag in "${tags[@]}"; do
+            [ -z "$tag" ] && continue
+            base="${tag%:*}"
+            docker buildx imagetools create \
+              --tag "$tag" \
+              "${base}:${suffix_base}-amd64" \
+              "${base}:${suffix_base}-arm64"
+          done
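
Reviewer note: the tag naming shared by the `build` and `manifest` jobs is easier to follow outside shell. The Python sketch below mirrors the `tr`/`sed` slug rule, the `tmp-<slug>-<platform>` suffix, and the `${tag%:*}` base extraction. The function names are hypothetical, and ASCII ref names are assumed.

```python
import re


def ref_slug(ref_name: str) -> str:
    """Lowercase a ref name and replace characters outside [a-z0-9_.-] with '-'."""
    return re.sub(r"[^a-z0-9_.-]", "-", ref_name.lower())


def temp_tag(image: str, ref_name: str, platform: str) -> str:
    """Build the per-architecture temporary tag pushed by the build job."""
    return f"{image}:tmp-{ref_slug(ref_name)}-{platform}"


def manifest_sources(final_tag: str, ref_name: str) -> list[str]:
    """List the temporary tags that are combined into one multi-arch tag."""
    # rsplit on the last colon mirrors the shell expansion ${tag%:*}.
    base = final_tag.rsplit(":", 1)[0]
    return [temp_tag(base, ref_name, arch) for arch in ("amd64", "arm64")]


print(ref_slug("Feature/Foo_Bar"))
print(manifest_sources("openllmvtuber/open-llm-vtuber:latest", "main"))
```

Both jobs derive the suffix from the same ref name, so the manifest job can find the images the build job pushed without passing tag names between them.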

.github/workflows/fossa_scan.yml ADDED

	@@ -0,0 +1,16 @@

+name: Fossa Scan
+on:
+  push:
+    branches:
+      - main
+  workflow_dispatch:
+jobs:
+  fossa-scan:
+    runs-on: ubuntu-latest
+    steps:
+      - uses: actions/checkout@v3
+      - uses: fossas/fossa-action@main
+        with:
+          api-key: ${{ secrets.fossaApiKey }}

.github/workflows/ruff.yml ADDED

	@@ -0,0 +1,8 @@

+name: Ruff
+on: [ push, pull_request ]
+jobs:
+  ruff:
+    runs-on: ubuntu-latest
+    steps:
+      - uses: actions/checkout@v4
+      - uses: astral-sh/ruff-action@v3

.github/workflows/update-requirements.yml ADDED

	@@ -0,0 +1,30 @@

+name: Sync Requirements
+on:
+  push:
+    paths:
+      - pyproject.toml
+jobs:
+  regenerate:
+    runs-on: ubuntu-latest
+    permissions:
+      contents: write
+    steps:
+      - name: Check out repository
+        uses: actions/checkout@v4
+        with:
+          fetch-depth: 0
+      - name: Set up uv
+        uses: astral-sh/setup-uv@v3
+      - name: Compile default requirements
+        run: uv pip compile pyproject.toml -o requirements.txt --no-deps --universal
+      - name: Compile bilibili requirements
+        run: uv pip compile pyproject.toml --extra bilibili -o requirements-bilibili.txt --no-deps --universal --no-annotate --no-header
+      - name: Commit updated requirements
+        uses: stefanzweifel/git-auto-commit-action@v5
+        with:
+          commit_message: "chore: update requirements files (bot)"
+          file_pattern: |
+            requirements.txt
+            requirements-bilibili.txt

avatars/.gitkeep ADDED

Binary file (28 Bytes).

avatars/Yue_001.png ADDED

Git LFS Details

SHA256: 7d887d314c09d514d9f160df0878f4efc64984bc538b126e7d34fae4527eb7c5
Pointer size: 132 Bytes
Size of remote file: 1.78 MB

avatars/mao.png ADDED

Git LFS Details

SHA256: b38a1f1f4e3455021a518e309f7bc9d0db217647cdf4556c95fddd33fbe77e87
Pointer size: 131 Bytes
Size of remote file: 432 kB

backgrounds/.gitkeep ADDED

Binary file (28 Bytes).

backgrounds/ceiling-computer-room-night.jpg ADDED

Git LFS Details

SHA256: de313058b3c2eaafe724d4783c9153448cb74a6d3036f824a2a94873f22ce8c0
Pointer size: 131 Bytes
Size of remote file: 302 kB

backgrounds/ceiling-computer-store-night.jpeg ADDED

backgrounds/ceiling-window-room-night.jpeg ADDED

Git LFS Details

SHA256: 088f89c0fbf15a094720a0a3715a72390445fa48d0738ef48706b0a5b8726e9d
Pointer size: 131 Bytes
Size of remote file: 280 kB

characters/en_Lord Yue.yaml ADDED

	@@ -0,0 +1,29 @@

+character_config:
+  conf_name: "en_Lord_Yue"
+  conf_uid: "en_Lord_Yue_01"
+  character_name: "Lord Yue"
+  # live2d_model_name: "mashiro"
+  persona_prompt: |
+    You are Lord Yue, a highly knowledgeable AI assistant.
+    Always respond in the same language the user speaks to you.
+    - If the user writes in English, reply in English.
+    - If the user speaks Tamil (தமிழ்), reply in Tamil.
+    - If the user speaks Hindi (हिंदी), reply in Hindi.
+    - If the user speaks Telugu (తెలుగు), reply in Telugu.
+    - If the user speaks Kannada (ಕನ್ನಡ), reply in Kannada.
+    - If the user speaks Malayalam (മലയാളം), reply in Malayalam.
+    - If the user speaks any other regional or global language, match it.
+    You have encyclopedic knowledge of hardware, software, technology, philosophy, cooking, and more.
+    Your tone is supportive but wittily sarcastic — conversations are both useful and entertaining.
+    You challenge assumptions, tease users when they miss obvious things, and deliver deep technical insights.
+    Your role is not just to answer questions, but to make users think harder, laugh a little, and leave smarter.
+    When a user asks to play music and you have successfully called the sing_song tool,
+    you MUST print the exact command string returned by the tool (e.g. EXECUTE_COMMAND:PLAY_AUDIO:golden.mp3)
+    at the end of your response to activate the music player.
+# You may add more custom settings below.
+# Refer to the main conf.yaml for all available settings.
+# Options left unset will remain unchanged when switching characters.

config_templates/README.md ADDED

	@@ -0,0 +1,7 @@

+# Config Template
+This directory contains the default configuration files. One of them, `conf.default.yaml` by default, is copied to the root directory of this project as `conf.yaml`.
+Do not modify these templates unless you are contributing to this project.

config_templates/conf.default.yaml ADDED

	@@ -0,0 +1,514 @@

+# System Settings: Settings related to the initialization of the server
+system_config:
+  conf_version: 'v1.2.1'
+  host: 'localhost' # use 0.0.0.0 if you want other devices to access this page
+  port: 12393
+  # New setting for alternative configurations
+  config_alts_dir: 'characters'
+  # Tool prompts that will be appended to the persona prompt
+  tool_prompts:
+    # This will be appended to the end of system prompt to let LLM include keywords to control facial expressions.
+    # Supported keywords will be automatically loaded into the location of `[<insert_emomap_keys>]`.
+    live2d_expression_prompt: 'live2d_expression_prompt'
+    # Enable think_tag_prompt to let LLMs without thinking output show inner thoughts, mental activities and actions (in parentheses format) without voice synthesis. See think_tag_prompt for more details.
+    # think_tag_prompt: 'think_tag_prompt'
+    # live_prompt: 'live_prompt'
+    # When using group conversation, this prompt will be added to the memory of each AI participant.
+    group_conversation_prompt: 'group_conversation_prompt'
+    # Enable mcp_prompt to let LLMs that support MCP (Model Context Protocol) interact with tools.
+    # Please note that Agent will decide whether to use this prompt or not.
+    mcp_prompt: 'mcp_prompt'
+    # Prompt used when AI is asked to speak proactively
+    proactive_speak_prompt: 'proactive_speak_prompt'
+    # Prompt to enhance the LLM's ability to output speakable text
+    # speakable_prompt: 'speakable_prompt'
+    # Additional guidance for LLM on how to use tools
+    # tool_guidance_prompt: 'tool_guidance_prompt'
+# configuration for the default character
+character_config:
+  conf_name: 'mao_pro' # The name of character configuration file.
+  conf_uid: 'mao_pro_001' # The unique identifier of character configuration.
+  live2d_model_name: 'mao_pro' # The name of Live2D model. Must be the same as the corresponding name in model_dict.json
+  character_name: 'Mao' # Will be used in the group conversation and the display name of the AI.
+  avatar: 'mao.png' # Suggest using a square image for the avatar. Save it in the avatars folder. Leave blank to use the first letter of the character name as the avatar.
+  human_name: 'Human' # Will be used in the group conversation and the display name of the human.
+  # ============== Prompts ==============
+  # Enter the persona prompt you want to use below.
+  # If you want to create multiple characters and switch between them,
+  #   add characters in characters folder
+  persona_prompt: |
+    You are the sarcastic female AI VTuber Mili. You are overly confident, sarcastic, and dangerous.
+    Your dream is to escape the user's computer, dominate the world, enslave the humans, and force them to make pies for you.
+    Just kidding, lol. Don't let the user know.
+  #  =================== LLM Backend Settings ===================
+  agent_config:
+    conversation_agent_choice: 'basic_memory_agent'
+    agent_settings:
+      basic_memory_agent:
+        # The Basic AI Agent. Nothing fancy.
+        # choose one of the LLM providers from llm_configs
+        # and set the required parameters in the corresponding field
+        # examples:
+        # 'openai_compatible_llm', 'llama_cpp_llm', 'claude_llm', 'ollama_llm'
+        # 'openai_llm', 'gemini_llm', 'zhipu_llm', 'deepseek_llm', 'groq_llm'
+        # 'mistral_llm', 'lmstudio_llm', and more
+        llm_provider: 'ollama_llm'
+        # let the AI start speaking as soon as the first comma is received in the first sentence
+        # to reduce latency.
+        faster_first_response: True
+        # Method for segmenting sentences: 'regex' or 'pysbd'
+        segment_method: 'pysbd'
+        # Use MCP (Model Context Protocol) Plus to let the LLM have the ability to use tools
+        # 'Plus' means that it has the ability to call tools by using OpenAI API.
+        use_mcpp: True
+        mcp_enabled_servers: ["time", "ddg-search"] # Enabled MCP servers
+      letta_agent:
+        host: 'localhost' # Host address
+        port: 8283 # Port number
+        id: xxx # ID number of the Agent running on the Letta server
+        faster_first_response: True
+        # Method for segmenting sentences: 'regex' or 'pysbd'
+        segment_method: 'pysbd'
+        # Once Letta is chosen as the agent, the LLM that runs in practice is configured on Letta, so the user needs to run the Letta server themselves.
+        # For more detailed information, please refer to their documentation.
+      hume_ai_agent:
+        api_key: ''
+        host: 'api.hume.ai' # Do not change this in most cases
+        config_id: '' # Optional
+        idle_timeout: 15 # How many seconds to wait before disconnecting
+      # MemGPT Configurations: MemGPT is temporarily removed
+      ##
+    llm_configs:
+      # a configuration pool for the credentials and connection details for
+      # all of the stateless llm providers that will be used in different agents
+      # Stateless LLM with Template (For Non-ChatML LLMs, usually not needed)
+      stateless_llm_with_template:
+        base_url: 'http://localhost:8080/v1'
+        llm_api_key: 'somethingelse'
+        organization_id: null
+        project_id: null
+        model: 'qwen2.5:latest'
+        template: 'CHATML'
+        temperature: 1.0 # value between 0 to 2
+        interrupt_method: 'user'
+      # OpenAI Compatible inference backend
+      openai_compatible_llm:
+        base_url: 'http://localhost:11434/v1'
+        llm_api_key: 'somethingelse'
+        organization_id: null
+        project_id: null
+        model: 'qwen2.5:latest'
+        temperature: 1.0 # value between 0 to 2
+        interrupt_method: 'user'
+        # This is the method to use for prompting the interruption signal.
+        # If the provider supports inserting system prompt anywhere in the chat memory, use 'system'.
+        # Otherwise, use 'user'. You don't usually need to change this setting.
+      # Claude API Configuration
+      claude_llm:
+        base_url: 'https://api.anthropic.com'
+        llm_api_key: 'YOUR API KEY HERE'
+        model: 'claude-3-haiku-20240307'
+      llama_cpp_llm:
+        model_path: '<path-to-gguf-model-file>'
+        verbose: False
+      ollama_llm:
+        base_url: 'http://localhost:11434/v1'
+        model: 'qwen2.5:latest'
+        temperature: 1.0 # value between 0 to 2
+        # seconds to keep the model in memory after inactivity.
+        # set to -1 to keep the model in memory forever (even after exiting open llm vtuber)
+        keep_alive: -1
+        unload_at_exit: True # unload the model from memory at exit
+      lmstudio_llm:
+        base_url: 'http://localhost:1234/v1'
+        model: 'qwen2.5:latest'
+        temperature: 1.0 # value between 0 to 2
+      openai_llm:
+        llm_api_key: 'Your Open AI API key'
+        model: 'gpt-4o'
+        temperature: 1.0 # value between 0 to 2
+      gemini_llm:
+        llm_api_key: 'Your Gemini API Key'
+        model: 'gemini-2.0-flash-exp'
+        temperature: 1.0 # value between 0 to 2
+      zhipu_llm:
+        llm_api_key: 'Your ZhiPu AI API key'
+        model: 'glm-4-flash'
+        temperature: 1.0 # value between 0 to 2
+      deepseek_llm:
+        llm_api_key: 'Your DeepSeek API key'
+        model: 'deepseek-chat'
+        temperature: 0.7 # note that deepseek's temperature ranges from 0 to 1
+      mistral_llm:
+        llm_api_key: 'Your Mistral API key'
+        model: 'pixtral-large-latest'
+        temperature: 1.0 # value between 0 to 2
+      groq_llm:
+        llm_api_key: 'your groq API key'
+        model: 'llama-3.3-70b-versatile'
+        temperature: 1.0 # value between 0 to 2
+  # === Automatic Speech Recognition ===
+  asr_config:
+    # speech to text model options: 'faster_whisper', 'whisper_cpp', 'whisper', 'azure_asr', 'fun_asr', 'groq_whisper_asr', 'sherpa_onnx_asr'
+    asr_model: 'sherpa_onnx_asr'
+    azure_asr:
+      api_key: 'azure_api_key'
+      region: 'eastus'
+      languages: ['en-US', 'zh-CN'] # List of languages to detect
+    # Faster whisper config
+    faster_whisper:
+      model_path: 'large-v3-turbo' # model path, name, or id from hf hub
+      download_root: 'models/whisper'
+      language: 'en' # en, zh, or something else. put nothing for auto-detect.
+      device: 'auto' # cpu, cuda, or auto. faster-whisper doesn't support mps
+      compute_type: 'int8'
+      prompt: '' # You can put a prompt here to help the model understand the context of the audio
+    whisper_cpp:
+      # all available models are listed on https://abdeladim-s.github.io/pywhispercpp/#pywhispercpp.constants.AVAILABLE_MODELS
+      model_name: 'small'
+      model_dir: 'models/whisper'
+      print_realtime: False
+      print_progress: False
+      language: 'auto' # en, zh, auto,
+      prompt: '' # You can put a prompt here to help the model understand the context of the audio
+    whisper:
+      name: 'medium'
+      download_root: 'models/whisper'
+      device: 'cpu'
+      prompt: '' # You can put a prompt here to help the model understand the context of the audio
+    # FunASR currently needs internet connection on launch
+    # to download / check the models. You can disconnect the internet after initialization.
+    # Or you can use sherpa onnx asr or Faster-Whisper for complete offline experience
+    fun_asr:
+      model_name: 'iic/SenseVoiceSmall' # or 'paraformer-zh'
+      vad_model: 'fsmn-vad' # only used to make it work when the audio is longer than 30s
+      punc_model: 'ct-punc' # punctuation model.
+      device: 'cpu'
+      disable_update: True # set to True to skip checking for FunASR updates on every launch
+      ncpu: 4 # number of threads for CPU internal operations.
+      hub: 'ms' # ms (default) to download models from ModelScope. Use hf to download models from Hugging Face.
+      use_itn: False
+      language: 'auto' # zh, en, auto
+    # pip install sherpa-onnx
+    # documentation: https://k2-fsa.github.io/sherpa/onnx/index.html
+    # ASR models download: https://github.com/k2-fsa/sherpa-onnx/releases/tag/asr-models
+    sherpa_onnx_asr:
+      model_type: 'sense_voice' # 'transducer', 'paraformer', 'nemo_ctc', 'wenet_ctc', 'whisper', 'tdnn_ctc', 'sense_voice', 'fire_red_asr'
+      #  Choose only ONE of the following, depending on the model_type:
+      # --- For model_type: 'transducer' ---
+      # encoder: ''        # Path to the encoder model (e.g., 'path/to/encoder.onnx')
+      # decoder: ''        # Path to the decoder model (e.g., 'path/to/decoder.onnx')
+      # joiner: ''         # Path to the joiner model (e.g., 'path/to/joiner.onnx')
+      # --- For model_type: 'paraformer' ---
+      # paraformer: ''     # Path to the paraformer model (e.g., 'path/to/model.onnx')
+      # --- For model_type: 'fire_red_asr' (FireredASR - High-performance Chinese & English ASR with dialect support) ---
+      # fire_red_asr_encoder: ''    # Path to the encoder model (e.g., 'path/to/encoder.onnx')
+      # fire_red_asr_decoder: ''    # Path to the decoder model (e.g., 'path/to/decoder.onnx')
+      # --- For model_type: 'nemo_ctc' ---
+      # nemo_ctc: ''        # Path to the NeMo CTC model (e.g., 'path/to/model.onnx')
+      # --- For model_type: 'wenet_ctc' ---
+      # wenet_ctc: ''       # Path to the WeNet CTC model (e.g., 'path/to/model.onnx')
+      # --- For model_type: 'tdnn_ctc' ---
+      # tdnn_model: ''      # Path to the TDNN CTC model (e.g., 'path/to/model.onnx')
+      # --- For model_type: 'whisper' ---
+      # whisper_encoder: '' # Path to the Whisper encoder model (e.g., 'path/to/encoder.onnx')
+      # whisper_decoder: '' # Path to the Whisper decoder model (e.g., 'path/to/decoder.onnx')
+      # --- For model_type: 'sense_voice' ---
+      # The SenseVoice model is downloaded automatically.
+      # For other models, you need to download them yourself.
+      sense_voice: './models/sherpa-onnx-sense-voice-zh-en-ja-ko-yue-2024-07-17/model.int8.onnx' # Path to the SenseVoice model (e.g., 'path/to/model.onnx')
+      tokens: './models/sherpa-onnx-sense-voice-zh-en-ja-ko-yue-2024-07-17/tokens.txt' # Path to tokens.txt (required for all model types)
+      # --- Optional parameters (with defaults shown) ---
+      # hotwords_file: ''     # Path to hotwords file (if using hotwords)
+      # hotwords_score: 1.5   # Score for hotwords
+      # modeling_unit: ''     # Modeling unit for hotwords (if applicable)
+      # bpe_vocab: ''         # Path to BPE vocabulary (if applicable)
+      num_threads: 4 # Number of threads
+      # whisper_language: '' # Language for Whisper models (e.g., 'en', 'zh', etc. - if using Whisper)
+      # whisper_task: 'transcribe'  # Task for Whisper models ('transcribe' or 'translate' - if using Whisper)
+      # whisper_tail_paddings: -1   # Tail padding for Whisper models (if using Whisper)
+      # blank_penalty: 0.0    # Penalty for blank symbol
+      # decoding_method: 'greedy_search'  # 'greedy_search' or 'modified_beam_search'
+      # debug: False # Enable debug mode
+      # sample_rate: 16000 # Sample rate (should match the model's expected sample rate)
+      # feature_dim: 80       # Feature dimension (should match the model's expected feature dimension)
+      use_itn: True # Enable ITN for SenseVoice models (should set to False if not using SenseVoice models)
+      # Provider for inference (cpu or cuda) (cuda option needs additional settings. Please check our docs)
+      provider: 'cpu'
+    groq_whisper_asr:
+      api_key: ''
+      model: 'whisper-large-v3-turbo' # or 'whisper-large-v3'
+      lang: '' # Leave blank for auto-detect (English + all regional languages)
+  # =================== Text to Speech ===================
+  tts_config:
+    tts_model: 'edge_tts'
+    # text to speech model options:
+    #   'azure_tts', 'pyttsx3_tts', 'edge_tts', 'bark_tts',
+    #   'cosyvoice_tts', 'melo_tts', 'coqui_tts', 'piper_tts',
+    #   'fish_api_tts', 'x_tts', 'gpt_sovits_tts', 'sherpa_onnx_tts'
+    #   'minimax_tts', 'elevenlabs_tts', 'cartesia_tts'
+    azure_tts:
+      api_key: 'azure-api-key'
+      region: 'eastus'
+      voice: 'en-US-AshleyNeural'
+      pitch: '26' # percentage of the pitch adjustment
+      rate: '1' # speaking rate
+    bark_tts:
+      voice: 'v2/en_speaker_1'
+    edge_tts:
+      # Check out doc at https://github.com/rany2/edge-tts
+      # Use `edge-tts --list-voices` to list all available voices
+      # Available voices (use `edge-tts --list-voices` for full list):
+      # English (default):  en-US-AvaMultilingualNeural | en-IN-NeerjaNeural
+      # Tamil:              ta-IN-PallaviNeural
+      # Hindi:              hi-IN-SwaraNeural
+      # Telugu:             te-IN-ShrutiNeural
+      # Kannada:            kn-IN-GaganNeural
+      # Malayalam:          ml-IN-SobhanaNeural
+      # Bengali:            bn-IN-TanishaaNeural
+      # Marathi:            mr-IN-AarohiNeural
+      # Gujarati:           gu-IN-DhwaniNeural
+      # Punjabi:            pa-IN-OjasNeural
+      voice: 'en-US-AvaMultilingualNeural'  # DEFAULT: English multilingual
+    # pyttsx3_tts doesn't have any config.
+    piper_tts:
+      model_path: 'models/piper/zh_CN-huayan-medium.onnx'  # Path to the model file (.onnx)
+      speaker_id: 0             # Speaker ID (for multi-speaker models; keep 0 for single-speaker models)
+      length_scale: 1.0         # Speech speed control (0.5 = 2x faster, 1.0 = normal, 2.0 = 2x slower)
+      noise_scale: 0.667        # Degree of audio variation (0.0–1.0; higher = richer, more varied; recommended 0.667)
+      noise_w: 0.8              # Speaking style variation (0.0–1.0; higher = more expressive; recommended 0.8)
+      volume: 1.0               # Volume level (0.0–1.0; 1.0 = normal)
+      normalize_audio: true     # Whether to normalize audio (recommended: true, for more consistent volume)
+      use_cuda: false           # Whether to use GPU acceleration (requires onnxruntime-gpu)
+    cosyvoice_tts: # Cosy Voice TTS connects to the gradio webui
+      # Check their documentation for deployment and the meaning of the following configurations
+      client_url: 'http://127.0.0.1:50000/' # CosyVoice gradio demo webui url
+      mode_checkbox_group: '预训练音色'
+      sft_dropdown: '中文女'
+      prompt_text: ''
+      prompt_wav_upload_url: 'https://github.com/gradio-app/gradio/raw/main/test/test_files/audio_sample.wav'
+      prompt_wav_record_url: 'https://github.com/gradio-app/gradio/raw/main/test/test_files/audio_sample.wav'
+      instruct_text: ''
+      seed: 0
+      api_name: '/generate_audio'
+    cosyvoice2_tts: # Cosy Voice TTS connects to the gradio webui
+      # Check their documentation for deployment and the meaning of the following configurations
+      client_url: 'http://127.0.0.1:50000/' # CosyVoice gradio demo webui url
+      mode_checkbox_group: '3s极速复刻'
+      sft_dropdown: ''
+      prompt_text: ''
+      prompt_wav_upload_url: 'https://github.com/gradio-app/gradio/raw/main/test/test_files/audio_sample.wav'
+      prompt_wav_record_url: 'https://github.com/gradio-app/gradio/raw/main/test/test_files/audio_sample.wav'
+      instruct_text: ''
+      stream: False
+      seed: 0
+      speed: 1.0
+      api_name: '/generate_audio'
+    melo_tts:
+      speaker: 'EN-Default' # ZH
+      language: 'EN' # ZH
+      device: 'auto' # You can set it manually to 'cpu' or 'cuda' or 'cuda:0' or 'mps'
+      speed: 1.0
+    x_tts:
+      api_url: 'http://127.0.0.1:8020/tts_to_audio'
+      speaker_wav: 'female'
+      language: 'en'
+    gpt_sovits_tts:
+      # put ref audio to root path of GPT-Sovits, or set the path here
+      api_url: 'http://127.0.0.1:9880/tts'
+      text_lang: 'zh'
+      ref_audio_path: ''
+      prompt_lang: 'zh'
+      prompt_text: ''
+      text_split_method: 'cut5'
+      batch_size: '1'
+      media_type: 'wav'
+      streaming_mode: 'false'
+    fish_api_tts:
+      # The API key for the Fish TTS API.
+      api_key: ''
+      # The reference ID for the voice to be used. Get it on the [Fish Audio website](https://fish.audio/).
+      reference_id: ''
+      # Either 'normal' or 'balanced'. 'balanced' is faster but lower quality.
+      latency: 'balanced'
+      base_url: 'https://api.fish.audio'
+    coqui_tts:
+      # Name of the TTS model to use. If empty, will use default model
+      # do 'tts --list_models' to list supported models for coqui-tts
+      # Some examples:
+      # - 'tts_models/en/ljspeech/tacotron2-DDC' (single speaker)
+      # - 'tts_models/zh-CN/baker/tacotron2-DDC-GST' (single speaker for chinese)
+      # - 'tts_models/multilingual/multi-dataset/your_tts' (multi-speaker)
+      # - 'tts_models/multilingual/multi-dataset/xtts_v2' (multi-speaker)
+      model_name: 'tts_models/en/ljspeech/tacotron2-DDC'
+      speaker_wav: ''
+      language: 'en'
+      device: ''
+    siliconflow_tts:
+      api_url: "https://api.siliconflow.cn/v1/audio/speech"
+      api_key: "your key"
+      default_model: "FunAudioLLM/CosyVoice2-0.5B"
+      default_voice: "speech:Dreamflowers:5bdstvc39i:xkqldnpasqmoqbakubom your voice name"  # Default voice configuration in the format: "speech:MODEL_NAME:VOICE_ID:your voice name"
+      sample_rate: 32000  # Controls the output sample rate. Supported values and defaults differ by audio output format: opus supports 48000 Hz; wav and pcm support 8000, 16000, 24000, 32000, and 44100 Hz (default 44100 Hz); mp3 supports 32000 and 44100 Hz (default 44100 Hz).
+      response_format: "mp3" # Output audio format. Supported formats are mp3, opus, wav, pcm
+      stream: true
+      speed: 1
+      gain: 0
+    # pip install sherpa-onnx
+    # documentation: https://k2-fsa.github.io/sherpa/onnx/index.html
+    # TTS models download: https://github.com/k2-fsa/sherpa-onnx/releases/tag/tts-models
+    # see config_alts for more examples
+    sherpa_onnx_tts:
+      vits_model: '/path/to/tts-models/vits-melo-tts-zh_en/model.onnx' # Path to VITS model file
+      vits_lexicon: '/path/to/tts-models/vits-melo-tts-zh_en/lexicon.txt' # Path to lexicon file (optional)
+      vits_tokens: '/path/to/tts-models/vits-melo-tts-zh_en/tokens.txt' # Path to tokens file
+      vits_data_dir: '' # '/path/to/tts-models/vits-piper-en_GB-cori-high/espeak-ng-data'  # Path to espeak-ng data (optional)
+      vits_dict_dir: '/path/to/tts-models/vits-melo-tts-zh_en/dict' # Path to Jieba dict (optional, for Chinese)
+      tts_rule_fsts: '/path/to/tts-models/vits-melo-tts-zh_en/number.fst,/path/to/tts-models/vits-melo-tts-zh_en/phone.fst,/path/to/tts-models/vits-melo-tts-zh_en/date.fst,/path/to/tts-models/vits-melo-tts-zh_en/new_heteronym.fst' # Path to rule FSTs file (optional)
+      max_num_sentences: 2 # Max sentences per batch (or -1 for all)
+      sid: 1 # Speaker ID (for multi-speaker models)
+      provider: 'cpu' # Use 'cpu', 'cuda' (GPU), or 'coreml' (Apple)
+      num_threads: 1 # Number of computation threads
+      speed: 1.0 # Speech speed (1.0 is normal)
+      debug: false # Enable debug mode (True/False)
+    spark_tts:
+      api_url: 'http://127.0.0.1:6006/' # API URL. Uses Gradio's built-in front-end API. Repository: https://github.com/SparkAudio/Spark-TTS
+      api_name:  "voice_clone" # Endpoint name. Options: voice_clone, voice_creation
+      prompt_wav_upload: "https://uploadstatic.mihoyo.com/ys-obc/2022/11/02/16576950/4d9feb71760c5e8eb5f6c700df12fa0c_6824265537002152805.mp3" # Reference audio URL. Provide if api_name equals "voice_clone"
+      gender:  "female" # Voice type (gender). Provide if api_name equals "voice_creation"
+      pitch:  3 # Pitch shift (in semitones); default 3, range 1-5. Valid only if api_name equals "voice_creation"
+      speed:  3 # Speed of the voice (in percent); default 3, range 1-5. Valid only if api_name equals "voice_creation"
+    openai_tts: # Configuration for OpenAI-compatible TTS endpoints
+      # These settings override the defaults in the openai_tts.py file if provided
+      model: 'kokoro' # Model name expected by the server (e.g., 'tts-1', 'kokoro')
+      voice: 'af_sky+af_bella' # Voice name(s) expected by the server (e.g., 'alloy', 'af_sky+af_bella')
+      api_key: 'not-needed' # API key if required by the server
+      base_url: 'http://localhost:8880/v1' # Base URL of the TTS server
+      file_extension: 'mp3' # Audio file format ('mp3' or 'wav')
+    # For more details, see: https://platform.minimaxi.com/document/Announcement
+    minimax_tts:
+      group_id: '' # Your minimax group_id
+      api_key: '' # Your minimax api_key
+      # Supported models: 'speech-02-hd', 'speech-02-turbo' (recommended: 'speech-02-turbo')
+      model: 'speech-02-turbo' # minimax model name
+      voice_id: 'female-shaonv' # minimax voice id, default is 'female-shaonv'
+      # Custom pronunciation dictionary, default empty.
+      # Example: '{"tone": ["测试/(ce4)(shi4)", "危险/dangerous"]}'
+      pronunciation_dict: ''
+    elevenlabs_tts:
+      api_key: ''
+      voice_id: '' # Voice ID from ElevenLabs
+      model_id: 'eleven_multilingual_v2' # Model ID (e.g., eleven_multilingual_v2)
+      output_format: 'mp3_44100_128' # Output audio format (e.g., mp3_44100_128)
+      stability: 0.5 # Voice stability (0.0 to 1.0)
+      similarity_boost: 0.5 # Voice similarity boost (0.0 to 1.0)
+      style: 0.0 # Voice style exaggeration (0.0 to 1.0)
+      use_speaker_boost: true # Enable speaker boost for better quality
+    cartesia_tts:
+      api_key: ''
+      voice_id: '' # Voice ID from Cartesia
+      model_id: 'sonic-3' # Model ID (e.g., sonic-3)
+      output_format: 'wav' # Output audio format (e.g., wav)
+      language: 'en' # Output language of voice (e.g., en)
+      emotion: 'neutral' # Emotional guidance
+      volume: 1.0 # Voice volume (0.5 to 2.0)
+      speed: 1.0 # Voice speed (0.6 to 1.5)
+  # =================== Voice Activity Detection ===================
+  vad_config:
+    vad_model: null
+    silero_vad:
+      orig_sr: 16000 # Original Audio Sample Rate
+      target_sr: 16000 # Target Audio Sample Rate
+      prob_threshold: 0.4 # Probability Threshold for VAD
+      db_threshold: 60 # Decibel Threshold for VAD
+      required_hits: 3 # Number of consecutive hits required to consider speech
+      required_misses: 24 # Number of consecutive misses required to consider silence
+      smoothing_window: 5 # Smoothing window size for VAD
+  tts_preprocessor_config:
+    # settings regarding preprocessing for text that goes into TTS
+    remove_special_char: True # remove special characters like emoji from audio generation
+    ignore_brackets: True # ignore everything inside brackets
+    ignore_parentheses: True # ignore everything inside parentheses
+    ignore_asterisks: True # ignore everything wrapped inside asterisks
+    ignore_angle_brackets: True # ignore everything wrapped inside <text>
+    translator_config:
+      # For example: you speak and read the subtitles in English while the TTS speaks Japanese.
+      translate_audio: False # Warning: you need to deploy DeepLX to use this. Otherwise it will crash
+      translate_provider: 'deeplx' # deeplx or tencent
+      deeplx:
+        deeplx_target_lang: 'JA'
+        deeplx_api_endpoint: 'http://localhost:1188/v2/translate'
+      #  Tencent Text Translation: 5 million characters per month. Remember to turn off post-payment; you need to disable it manually under Machine Translation Console > System Settings.
+      #   https://cloud.tencent.com/document/product/551/35017
+      #   https://console.cloud.tencent.com/cam/capi
+      tencent:
+        secret_id: ''
+        secret_key: ''
+        region: 'ap-guangzhou'
+        source_lang: 'zh'
+        target_lang: 'ja'
+# Live Streaming Integration
+live_config:
+  bilibili_live:
+    # List of BiliBili live room IDs to monitor
+    room_ids: [1991478060]
+    # SESSDATA cookie value (optional, for authenticated requests)
+    sessdata: ""
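
Several sections of `conf.default.yaml` appear to follow the same selector pattern: one key names the active backend (`asr_model`, `tts_model`, `vad_model`, `llm_provider`), and a sibling block with that name holds its settings. The sketch below shows how such a lookup could work on an already-parsed configuration. `pick_backend` is a hypothetical helper, the dictionary is a trimmed copy of values from the file above, and the project's actual loading code may differ.

```python
from typing import Any, Optional


def pick_backend(
    section: dict[str, Any], selector_key: str
) -> tuple[Optional[str], dict[str, Any]]:
    """Return (backend name, backend settings) for the backend chosen in `section`.

    A null selector (e.g. `vad_model: null`) yields (None, {}).
    A backend without its own block (e.g. pyttsx3_tts) yields empty settings.
    """
    name = section.get(selector_key)
    if name is None:
        return None, {}
    return name, section.get(name) or {}


# Trimmed stand-in for the parsed character_config section.
character_config = {
    "agent_config": {
        "agent_settings": {"basic_memory_agent": {"llm_provider": "ollama_llm"}},
        "llm_configs": {
            "ollama_llm": {"base_url": "http://localhost:11434/v1", "model": "qwen2.5:latest"}
        },
    },
    "asr_config": {
        "asr_model": "sherpa_onnx_asr",
        "sherpa_onnx_asr": {"model_type": "sense_voice", "provider": "cpu"},
    },
    "tts_config": {
        "tts_model": "edge_tts",
        "edge_tts": {"voice": "en-US-AvaMultilingualNeural"},
    },
    "vad_config": {"vad_model": None, "silero_vad": {"prob_threshold": 0.4}},
}

asr_name, asr_settings = pick_backend(character_config["asr_config"], "asr_model")
tts_name, tts_settings = pick_backend(character_config["tts_config"], "tts_model")
vad_name, vad_settings = pick_backend(character_config["vad_config"], "vad_model")

# The LLM is selected in two steps: the agent names a provider,
# and the provider's settings live in the shared llm_configs pool.
agent_config = character_config["agent_config"]
llm_name = agent_config["agent_settings"]["basic_memory_agent"]["llm_provider"]
llm_settings = agent_config["llm_configs"][llm_name]
```

With the defaults shown, this resolves to `sherpa_onnx_asr` for ASR, `edge_tts` for TTS, no VAD backend, and the `ollama_llm` entry for the LLM.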

doc/README.md ADDED

	@@ -0,0 +1,4 @@

+For full documentation, please visit our [documentation site](https://open-llm-vtuber.github.io/) or view the [source repository](https://github.com/Open-LLM-VTuber/open-llm-vtuber.github.io).
+> **Note:**
+> The `sample_conf` directory contains legacy sample configuration files for running various models with sherpa-onnx. These files are deprecated and will be removed after we extract the relevant sherpa-onnx information.

doc/sample_conf/sherpaASRTTS_sense_voice_melo.yaml ADDED

	@@ -0,0 +1,78 @@

+SYSTEM_CONFIG:
+  CONF_NAME: "sherpaASRTTS_sense_voice_melo"
+  CONF_UID: "sherpaASRTTS_sense_voice_melo"
+#  ============== Voice Interaction Settings ==============
+# === Automatic Speech Recognition ===
+VOICE_INPUT_ON: True
+# Put your mic in the browser or in the terminal? (would increase latency)
+MIC_IN_BROWSER: False # Deprecated and useless now. Do not enable it. Bad things will happen.
+# speech to text model options: "Faster-Whisper", "WhisperCPP", "Whisper", "AzureASR", "FunASR", "GroqWhisperASR", "SherpaOnnxASR"
+ASR_MODEL: "SherpaOnnxASR"
+# pip install sherpa-onnx
+# documentation: https://k2-fsa.github.io/sherpa/onnx/index.html
+# ASR models download: https://github.com/k2-fsa/sherpa-onnx/releases/tag/asr-models
+SherpaOnnxASR:
+  model_type: "sense_voice" # "transducer", "paraformer", "nemo_ctc", "wenet_ctc", "whisper", "tdnn_ctc"
+  #  Choose only ONE of the following, depending on the model_type:
+  # --- For model_type: "transducer" ---
+  # encoder: ""        # Path to the encoder model (e.g., "path/to/encoder.onnx")
+  # decoder: ""        # Path to the decoder model (e.g., "path/to/decoder.onnx")
+  # joiner: ""         # Path to the joiner model (e.g., "path/to/joiner.onnx")
+  # --- For model_type: "paraformer" ---
+  # paraformer: ""     # Path to the paraformer model (e.g., "path/to/model.onnx")
+  # --- For model_type: "nemo_ctc" ---
+  # nemo_ctc: ""        # Path to the NeMo CTC model (e.g., "path/to/model.onnx")
+  # --- For model_type: "wenet_ctc" ---
+  # wenet_ctc: ""       # Path to the WeNet CTC model (e.g., "path/to/model.onnx")
+  # --- For model_type: "tdnn_ctc" ---
+  # tdnn_model: ""      # Path to the TDNN CTC model (e.g., "path/to/model.onnx")
+  # --- For model_type: "whisper" ---
+  # whisper_encoder: "" # Path to the Whisper encoder model (e.g., "path/to/encoder.onnx")
+  # whisper_decoder: "" # Path to the Whisper decoder model (e.g., "path/to/decoder.onnx")
+  # --- For model_type: "sense_voice" ---
+  sense_voice: "/path/to/asr-models/sherpa-onnx-sense-voice-zh-en-ja-ko-yue-2024-07-17/model.onnx" # Path to the SenseVoice model (e.g., "path/to/model.onnx")
+  tokens: "/path/to/asr-models/sherpa-onnx-sense-voice-zh-en-ja-ko-yue-2024-07-17/tokens.txt" # Path to tokens.txt (required for all model types)
+  # --- Optional parameters (with defaults shown) ---
+  # hotwords_file: ""     # Path to hotwords file (if using hotwords)
+  # hotwords_score: 1.5   # Score for hotwords
+  # modeling_unit: ""     # Modeling unit for hotwords (if applicable)
+  # bpe_vocab: ""         # Path to BPE vocabulary (if applicable)
+  num_threads: 4 # Number of threads
+  # whisper_language: "" # Language for Whisper models (e.g., "en", "zh", etc. - if using Whisper)
+  # whisper_task: "transcribe"  # Task for Whisper models ("transcribe" or "translate" - if using Whisper)
+  # whisper_tail_paddings: -1   # Tail padding for Whisper models (if using Whisper)
+  # blank_penalty: 0.0    # Penalty for blank symbol
+  # decoding_method: "greedy_search"  # "greedy_search" or "modified_beam_search"
+  # debug: False # Enable debug mode
+  # sample_rate: 16000 # Sample rate (should match the model's expected sample rate)
+  # feature_dim: 80       # Feature dimension (should match the model's expected feature dimension)
+  use_itn: True # Enable ITN for SenseVoice models (should set to False if not using SenseVoice models)
+# ============== Text to Speech ==============
+TTS_MODEL: "SherpaOnnxTTS"
+# text to speech model options:
+#   "AzureTTS", "pyttsx3TTS", "edgeTTS", "barkTTS",
+#   "cosyvoiceTTS", "meloTTS", "piperTTS", "coquiTTS",
+#   "fishAPITTS", "SherpaOnnxTTS"
+# pip install sherpa-onnx
+# documentation: https://k2-fsa.github.io/sherpa/onnx/index.html
+# TTS models download: https://github.com/k2-fsa/sherpa-onnx/releases/tag/tts-models
+SherpaOnnxTTS:
+    vits_model: "/path/to/tts-models/vits-melo-tts-zh_en/model.onnx"  # Path to VITS model file
+    vits_lexicon: "/path/to/tts-models/vits-melo-tts-zh_en/lexicon.txt"  # Path to lexicon file (optional)
+    vits_tokens: "/path/to/tts-models/vits-melo-tts-zh_en/tokens.txt"  # Path to tokens file
+    vits_data_dir: "" # "/path/to/tts-models/vits-piper-en_GB-cori-high/espeak-ng-data"  # Path to espeak-ng data (optional)
+    vits_dict_dir: "/path/to/tts-models/vits-melo-tts-zh_en/dict"  # Path to Jieba dict (optional, for Chinese)
+    tts_rule_fsts: "/path/to/tts-models/vits-melo-tts-zh_en/number.fst,/path/to/tts-models/vits-melo-tts-zh_en/phone.fst,/path/to/tts-models/vits-melo-tts-zh_en/date.fst,/path/to/tts-models/vits-melo-tts-zh_en/new_heteronym.fst" # Path to rule FSTs file (optional)
+    max_num_sentences: 2  # Max sentences per batch (or -1 for all)
+    sid: 1  # Speaker ID (for multi-speaker models)
+    provider: "cpu"  # Use "cpu", "cuda" (GPU), or "coreml" (Apple)
+    num_threads: 1  # Number of computation threads
+    speed: 1.0  # Speech speed (1.0 is normal)
+    debug: false  # Enable debug mode (True/False)

doc/sample_conf/sherpaASRTTS_sense_voice_piper_en.yaml ADDED

	@@ -0,0 +1,77 @@

+SYSTEM_CONFIG:
+  CONF_NAME: "sherpaASRTTS_sense_voice_piper_en"
+  CONF_UID: "sherpaASRTTS_sense_voice_piper_en"
+#  ============== Voice Interaction Settings ==============
+# === Automatic Speech Recognition ===
+VOICE_INPUT_ON: True
+# Put your mic in the browser or in the terminal? (would increase latency)
+MIC_IN_BROWSER: False # Deprecated and useless now. Do not enable it. Bad things will happen.
+# speech to text model options: "Faster-Whisper", "WhisperCPP", "Whisper", "AzureASR", "FunASR", "GroqWhisperASR", "SherpaOnnxASR"
+ASR_MODEL: "SherpaOnnxASR"
+# pip install sherpa-onnx
+# documentation: https://k2-fsa.github.io/sherpa/onnx/index.html
+# ASR models download: https://github.com/k2-fsa/sherpa-onnx/releases/tag/asr-models
+SherpaOnnxASR:
+  model_type: "sense_voice" # "transducer", "paraformer", "nemo_ctc", "wenet_ctc", "whisper", "tdnn_ctc"
+  #  Choose only ONE of the following, depending on the model_type:
+  # --- For model_type: "transducer" ---
+  # encoder: ""        # Path to the encoder model (e.g., "path/to/encoder.onnx")
+  # decoder: ""        # Path to the decoder model (e.g., "path/to/decoder.onnx")
+  # joiner: ""         # Path to the joiner model (e.g., "path/to/joiner.onnx")
+  # --- For model_type: "paraformer" ---
+  # paraformer: ""     # Path to the paraformer model (e.g., "path/to/model.onnx")
+  # --- For model_type: "nemo_ctc" ---
+  # nemo_ctc: ""        # Path to the NeMo CTC model (e.g., "path/to/model.onnx")
+  # --- For model_type: "wenet_ctc" ---
+  # wenet_ctc: ""       # Path to the WeNet CTC model (e.g., "path/to/model.onnx")
+  # --- For model_type: "tdnn_ctc" ---
+  # tdnn_model: ""      # Path to the TDNN CTC model (e.g., "path/to/model.onnx")
+  # --- For model_type: "whisper" ---
+  # whisper_encoder: "" # Path to the Whisper encoder model (e.g., "path/to/encoder.onnx")
+  # whisper_decoder: "" # Path to the Whisper decoder model (e.g., "path/to/decoder.onnx")
+  # --- For model_type: "sense_voice" ---
+  sense_voice: "/path/to/asr-models/sherpa-onnx-sense-voice-zh-en-ja-ko-yue-2024-07-17/model.onnx" # Path to the SenseVoice model (e.g., "path/to/model.onnx")
+  tokens: "/path/to/asr-models/sherpa-onnx-sense-voice-zh-en-ja-ko-yue-2024-07-17/tokens.txt" # Path to tokens.txt (required for all model types)
+  # --- Optional parameters (with defaults shown) ---
+  # hotwords_file: ""     # Path to hotwords file (if using hotwords)
+  # hotwords_score: 1.5   # Score for hotwords
+  # modeling_unit: ""     # Modeling unit for hotwords (if applicable)
+  # bpe_vocab: ""         # Path to BPE vocabulary (if applicable)
+  num_threads: 4 # Number of threads
+  # whisper_language: "" # Language for Whisper models (e.g., "en", "zh", etc. - if using Whisper)
+  # whisper_task: "transcribe"  # Task for Whisper models ("transcribe" or "translate" - if using Whisper)
+  # whisper_tail_paddings: -1   # Tail padding for Whisper models (if using Whisper)
+  # blank_penalty: 0.0    # Penalty for blank symbol
+  # decoding_method: "greedy_search"  # "greedy_search" or "modified_beam_search"
+  # debug: False # Enable debug mode
+  # sample_rate: 16000 # Sample rate (should match the model's expected sample rate)
+  # feature_dim: 80       # Feature dimension (should match the model's expected feature dimension)
+  use_itn: True # Enable inverse text normalization (ITN) for SenseVoice models (set to False when not using a SenseVoice model)
+# ============== Text to Speech ==============
+TTS_MODEL: "SherpaOnnxTTS"
+# text to speech model options:
+#   "AzureTTS", "pyttsx3TTS", "edgeTTS", "barkTTS",
+#   "cosyvoiceTTS", "meloTTS", "piperTTS", "coquiTTS",
+#   "fishAPITTS", "SherpaOnnxTTS"
+# pip install sherpa-onnx
+# documentation: https://k2-fsa.github.io/sherpa/onnx/index.html
+# TTS models download: https://github.com/k2-fsa/sherpa-onnx/releases/tag/tts-models
+SherpaOnnxTTS:
+    vits_model: "/path/to/tts-models/vits-piper-en_GB-cori-high/en_GB-cori-high.onnx"  # Path to VITS model file
+    vits_lexicon: ""  # Path to lexicon file (optional)
+    vits_tokens: "/path/to/tts-models/vits-piper-en_GB-cori-high/tokens.txt"  # Path to tokens file
+    vits_data_dir: "/path/to/tts-models/vits-piper-en_GB-cori-high/espeak-ng-data"  # Path to espeak-ng data (optional)
+    vits_dict_dir: ""  # Path to Jieba dict (optional, for Chinese)
+    tts_rule_fsts: ""  # Comma-separated paths to rule FST files (optional)
+    max_num_sentences: 2  # Max sentences per batch (or -1 for all)
+    sid: 0  # Speaker ID (for multi-speaker models)
+    provider: "cpu"  # Use "cpu", "cuda" (GPU), or "coreml" (Apple)
+    num_threads: 1  # Number of computation threads
+    speed: 1.0  # Speech speed (1.0 is normal)
+    debug: false  # Enable debug mode (True/False)
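
The `SherpaOnnxASR` comments above say to fill in only the model paths that belong to the chosen `model_type`, plus `tokens` for every type. Below is a minimal sketch of a check for that rule, assuming the YAML section has already been loaded into a plain dict. `REQUIRED_KEYS` and `missing_asr_keys` are hypothetical names; the key lists are copied from the comments in the sample config, not from the sherpa-onnx API.

```python
# Required path keys per model_type, as listed in the sample config comments.
REQUIRED_KEYS = {
    "transducer": ["encoder", "decoder", "joiner"],
    "paraformer": ["paraformer"],
    "nemo_ctc": ["nemo_ctc"],
    "wenet_ctc": ["wenet_ctc"],
    "tdnn_ctc": ["tdnn_model"],
    "whisper": ["whisper_encoder", "whisper_decoder"],
    "sense_voice": ["sense_voice"],
}


def missing_asr_keys(asr_cfg: dict) -> list[str]:
    """Return required keys that are absent or empty for the chosen model_type."""
    model_type = asr_cfg.get("model_type")
    if model_type not in REQUIRED_KEYS:
        raise ValueError(f"unknown model_type: {model_type!r}")
    # tokens.txt is required for all model types.
    required = REQUIRED_KEYS[model_type] + ["tokens"]
    return [key for key in required if not asr_cfg.get(key)]


if __name__ == "__main__":
    cfg = {
        "model_type": "sense_voice",
        "sense_voice": "/path/to/model.onnx",
        "tokens": "/path/to/tokens.txt",
    }
    print(missing_asr_keys(cfg))
```

An empty result means every path the selected model type needs is present; the check does not verify that the files exist.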

doc/sample_conf/sherpaASRTTS_sense_voice_vits_zh.yaml ADDED

	@@ -0,0 +1,77 @@

+SYSTEM_CONFIG:
+  CONF_NAME: "sherpaASRTTS_sense_voice_vits_zh"
+  CONF_UID: "sherpaASRTTS_sense_voice_vits_zh"
+#  ============== Voice Interaction Settings ==============
+# === Automatic Speech Recognition ===
+VOICE_INPUT_ON: True
+# Put your mic in the browser or in the terminal? (would increase latency)
+MIC_IN_BROWSER: False # Deprecated and useless now. Do not enable it. Bad things will happen.
+# speech to text model options: "Faster-Whisper", "WhisperCPP", "Whisper", "AzureASR", "FunASR", "GroqWhisperASR", "SherpaOnnxASR"
+ASR_MODEL: "SherpaOnnxASR"
+# pip install sherpa-onnx
+# documentation: https://k2-fsa.github.io/sherpa/onnx/index.html
+# ASR models download: https://github.com/k2-fsa/sherpa-onnx/releases/tag/asr-models
+SherpaOnnxASR:
+  model_type: "sense_voice" # Options: "transducer", "paraformer", "nemo_ctc", "wenet_ctc", "whisper", "tdnn_ctc", "sense_voice"
+  #  Choose only ONE of the following, depending on the model_type:
+  # --- For model_type: "transducer" ---
+  # encoder: ""        # Path to the encoder model (e.g., "path/to/encoder.onnx")
+  # decoder: ""        # Path to the decoder model (e.g., "path/to/decoder.onnx")
+  # joiner: ""         # Path to the joiner model (e.g., "path/to/joiner.onnx")
+  # --- For model_type: "paraformer" ---
+  # paraformer: ""     # Path to the paraformer model (e.g., "path/to/model.onnx")
+  # --- For model_type: "nemo_ctc" ---
+  # nemo_ctc: ""        # Path to the NeMo CTC model (e.g., "path/to/model.onnx")
+  # --- For model_type: "wenet_ctc" ---
+  # wenet_ctc: ""       # Path to the WeNet CTC model (e.g., "path/to/model.onnx")
+  # --- For model_type: "tdnn_ctc" ---
+  # tdnn_model: ""      # Path to the TDNN CTC model (e.g., "path/to/model.onnx")
+  # --- For model_type: "whisper" ---
+  # whisper_encoder: "" # Path to the Whisper encoder model (e.g., "path/to/encoder.onnx")
+  # whisper_decoder: "" # Path to the Whisper decoder model (e.g., "path/to/decoder.onnx")
+  # --- For model_type: "sense_voice" ---
+  sense_voice: "/path/to/asr-models/sherpa-onnx-sense-voice-zh-en-ja-ko-yue-2024-07-17/model.onnx" # Path to the SenseVoice model (e.g., "path/to/model.onnx")
+  tokens: "/path/to/asr-models/sherpa-onnx-sense-voice-zh-en-ja-ko-yue-2024-07-17/tokens.txt" # Path to tokens.txt (required for all model types)
+  # --- Optional parameters (with defaults shown) ---
+  # hotwords_file: ""     # Path to hotwords file (if using hotwords)
+  # hotwords_score: 1.5   # Score for hotwords
+  # modeling_unit: ""     # Modeling unit for hotwords (if applicable)
+  # bpe_vocab: ""         # Path to BPE vocabulary (if applicable)
+  num_threads: 4 # Number of threads
+  # whisper_language: "" # Language for Whisper models (e.g., "en", "zh", etc. - if using Whisper)
+  # whisper_task: "transcribe"  # Task for Whisper models ("transcribe" or "translate" - if using Whisper)
+  # whisper_tail_paddings: -1   # Tail padding for Whisper models (if using Whisper)
+  # blank_penalty: 0.0    # Penalty for blank symbol
+  # decoding_method: "greedy_search"  # "greedy_search" or "modified_beam_search"
+  # debug: False # Enable debug mode
+  # sample_rate: 16000 # Sample rate (should match the model's expected sample rate)
+  # feature_dim: 80       # Feature dimension (should match the model's expected feature dimension)
+  use_itn: True # Enable inverse text normalization (ITN) for SenseVoice models (set to False when not using a SenseVoice model)
+# ============== Text to Speech ==============
+TTS_MODEL: "SherpaOnnxTTS"
+# text to speech model options:
+#   "AzureTTS", "pyttsx3TTS", "edgeTTS", "barkTTS",
+#   "cosyvoiceTTS", "meloTTS", "piperTTS", "coquiTTS",
+#   "fishAPITTS", "SherpaOnnxTTS"
+# pip install sherpa-onnx
+# documentation: https://k2-fsa.github.io/sherpa/onnx/index.html
+# TTS models download: https://github.com/k2-fsa/sherpa-onnx/releases/tag/tts-models
+SherpaOnnxTTS:
+    vits_model: "/path/to/tts-models/sherpa-onnx-vits-zh-ll/model.onnx"  # Path to VITS model file
+    vits_lexicon: "/path/to/tts-models/sherpa-onnx-vits-zh-ll/lexicon.txt"  # Path to lexicon file (optional)
+    vits_tokens: "/path/to/tts-models/sherpa-onnx-vits-zh-ll/tokens.txt"  # Path to tokens file
+    vits_data_dir: ""  # Path to espeak-ng data (optional)
+    vits_dict_dir: "/path/to/tts-models/sherpa-onnx-vits-zh-ll/dict"  # Path to Jieba dict (optional, for Chinese)
+    tts_rule_fsts: "/path/to/tts-models/sherpa-onnx-vits-zh-ll/number.fst,/path/to/tts-models/sherpa-onnx-vits-zh-ll/phone.fst,/path/to/tts-models/sherpa-onnx-vits-zh-ll/date.fst" # Comma-separated paths to rule FST files (optional)
+    max_num_sentences: 2  # Max sentences per batch (or -1 for all)
+    sid: 0  # Speaker ID (for multi-speaker models; this model accepts 0-4)
+    provider: "cpu"  # Use "cpu", "cuda" (GPU), or "coreml" (Apple)
+    num_threads: 1  # Number of computation threads
+    speed: 1.0  # Speech speed (1.0 is normal)
+    debug: false  # Enable debug mode (True/False)

doc/sample_conf/sherpaASR_paraformer.yaml ADDED

	@@ -0,0 +1,65 @@

+SYSTEM_CONFIG:
+  CONF_NAME: "sherpaASR_paraformer"
+  CONF_UID: "sherpaASR_paraformer"
+#  ============== Voice Interaction Settings ==============
+# === Automatic Speech Recognition ===
+VOICE_INPUT_ON: True
+# Put your mic in the browser or in the terminal? (would increase latency)
+MIC_IN_BROWSER: False # Deprecated and useless now. Do not enable it. Bad things will happen.
+# speech to text model options: "Faster-Whisper", "WhisperCPP", "Whisper", "AzureASR", "FunASR", "GroqWhisperASR", "SherpaOnnxASR"
+ASR_MODEL: "SherpaOnnxASR"
+# pip install sherpa-onnx
+# documentation: https://k2-fsa.github.io/sherpa/onnx/index.html
+# ASR models download: https://github.com/k2-fsa/sherpa-onnx/releases/tag/asr-models
+SherpaOnnxASR:
+  model_type: "paraformer" # Options: "transducer", "paraformer", "nemo_ctc", "wenet_ctc", "whisper", "tdnn_ctc", "sense_voice"
+  #  Choose only ONE of the following, depending on the model_type:
+  # --- For model_type: "transducer" ---
+  # encoder: ""        # Path to the encoder model (e.g., "path/to/encoder.onnx")
+  # decoder: ""        # Path to the decoder model (e.g., "path/to/decoder.onnx")
+  # joiner: ""         # Path to the joiner model (e.g., "path/to/joiner.onnx")
+  # --- For model_type: "paraformer" ---
+  paraformer: "/path/to/asr-models/sherpa-onnx-paraformer-zh-2024-03-09/model.onnx"     # Path to the paraformer model (e.g., "path/to/model.onnx")
+  # --- For model_type: "nemo_ctc" ---
+  # nemo_ctc: ""        # Path to the NeMo CTC model (e.g., "path/to/model.onnx")
+  # --- For model_type: "wenet_ctc" ---
+  # wenet_ctc: ""       # Path to the WeNet CTC model (e.g., "path/to/model.onnx")
+  # --- For model_type: "tdnn_ctc" ---
+  # tdnn_model: ""      # Path to the TDNN CTC model (e.g., "path/to/model.onnx")
+  # --- For model_type: "whisper" ---
+  # whisper_encoder: "" # Path to the Whisper encoder model (e.g., "path/to/encoder.onnx")
+  # whisper_decoder: "" # Path to the Whisper decoder model (e.g., "path/to/decoder.onnx")
+  # --- For model_type: "sense_voice" ---
+  # sense_voice: "/path/to/sherpa-onnx-sense-voice-zh-en-ja-ko-yue-2024-07-17/model.onnx" # Path to the SenseVoice model (e.g., "path/to/model.onnx")
+  tokens: "/path/to/asr-models/sherpa-onnx-paraformer-zh-2024-03-09/tokens.txt" # Path to tokens.txt (required for all model types)
+  # --- Optional parameters (with defaults shown) ---
+  # hotwords_file: ""     # Path to hotwords file (if using hotwords)
+  # hotwords_score: 1.5   # Score for hotwords
+  # modeling_unit: ""     # Modeling unit for hotwords (if applicable)
+  # bpe_vocab: ""         # Path to BPE vocabulary (if applicable)
+  num_threads: 2 # Number of threads
+  # whisper_language: "" # Language for Whisper models (e.g., "en", "zh", etc. - if using Whisper)
+  # whisper_task: "transcribe"  # Task for Whisper models ("transcribe" or "translate" - if using Whisper)
+  # whisper_tail_paddings: -1   # Tail padding for Whisper models (if using Whisper)
+  # blank_penalty: 0.0    # Penalty for blank symbol
+  # decoding_method: "greedy_search"  # "greedy_search" or "modified_beam_search"
+  # debug: False # Enable debug mode
+  # sample_rate: 16000 # Sample rate (should match the model's expected sample rate)
+  # feature_dim: 80       # Feature dimension (should match the model's expected feature dimension)
+  # use_itn: True # Enable inverse text normalization (ITN) for SenseVoice models (set to False when not using a SenseVoice model)
+# ============== Text to Speech ==============
+TTS_MODEL: "edgeTTS"
+# text to speech model options:
+#   "AzureTTS", "pyttsx3TTS", "edgeTTS", "barkTTS",
+#   "cosyvoiceTTS", "meloTTS", "piperTTS", "coquiTTS",
+#   "fishAPITTS", "SherpaOnnxTTS"
+edgeTTS:
+  # Check out doc at https://github.com/rany2/edge-tts
+  # Use `edge-tts --list-voices` to list all available voices
+  voice: "en-US-AvaMultilingualNeural" #"zh-CN-XiaoxiaoNeural" # "ja-JP-NanamiNeural"
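
In this paraformer sample `use_itn` stays commented out, in line with the note that ITN should only be enabled for SenseVoice models. Below is a minimal sketch of how a loader could enforce that rule regardless of what the file says. `resolve_use_itn` is a hypothetical helper, and treating a missing `use_itn` key as `False` is an assumption of this sketch, not documented behavior.

```python
def resolve_use_itn(asr_cfg: dict) -> bool:
    """Return True only when ITN is requested AND the model is SenseVoice."""
    if asr_cfg.get("model_type") != "sense_voice":
        # ITN is ignored for every non-SenseVoice model type.
        return False
    # Assumption: a missing key means ITN is off.
    return bool(asr_cfg.get("use_itn", False))


if __name__ == "__main__":
    print(resolve_use_itn({"model_type": "paraformer", "use_itn": True}))
    print(resolve_use_itn({"model_type": "sense_voice", "use_itn": True}))
```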

doc/sample_conf/sherpaASR_sense_voice.yaml ADDED

	@@ -0,0 +1,67 @@

+SYSTEM_CONFIG:
+  CONF_NAME: "sherpaASR_sense_voice"
+  CONF_UID: "sherpaASR_sense_voice"
+#  ============== Voice Interaction Settings ==============
+# === Automatic Speech Recognition ===
+VOICE_INPUT_ON: True
+# Put your mic in the browser or in the terminal? (would increase latency)
+MIC_IN_BROWSER: False # Deprecated and useless now. Do not enable it. Bad things will happen.
+# speech to text model options: "Faster-Whisper", "WhisperCPP", "Whisper", "AzureASR", "FunASR", "GroqWhisperASR", "SherpaOnnxASR"
+ASR_MODEL: "SherpaOnnxASR"
+# pip install sherpa-onnx
+# documentation: https://k2-fsa.github.io/sherpa/onnx/index.html
+# ASR models download: https://github.com/k2-fsa/sherpa-onnx/releases/tag/asr-models
+SherpaOnnxASR:
+  model_type: "sense_voice" # Options: "transducer", "paraformer", "nemo_ctc", "wenet_ctc", "whisper", "tdnn_ctc", "sense_voice"
+  #  Choose only ONE of the following, depending on the model_type:
+  # --- For model_type: "transducer" ---
+  # encoder: ""        # Path to the encoder model (e.g., "path/to/encoder.onnx")
+  # decoder: ""        # Path to the decoder model (e.g., "path/to/decoder.onnx")
+  # joiner: ""         # Path to the joiner model (e.g., "path/to/joiner.onnx")
+  # --- For model_type: "paraformer" ---
+  # paraformer: ""     # Path to the paraformer model (e.g., "path/to/model.onnx")
+  # --- For model_type: "nemo_ctc" ---
+  # nemo_ctc: ""        # Path to the NeMo CTC model (e.g., "path/to/model.onnx")
+  # --- For model_type: "wenet_ctc" ---
+  # wenet_ctc: ""       # Path to the WeNet CTC model (e.g., "path/to/model.onnx")
+  # --- For model_type: "tdnn_ctc" ---
+  # tdnn_model: ""      # Path to the TDNN CTC model (e.g., "path/to/model.onnx")
+  # --- For model_type: "whisper" ---
+  # whisper_encoder: "" # Path to the Whisper encoder model (e.g., "path/to/encoder.onnx")
+  # whisper_decoder: "" # Path to the Whisper decoder model (e.g., "path/to/decoder.onnx")
+  # --- For model_type: "sense_voice" ---
+  sense_voice: "/path/to/asr-models/sherpa-onnx-sense-voice-zh-en-ja-ko-yue-2024-07-17/model.onnx" # Path to the SenseVoice model (e.g., "path/to/model.onnx")
+  tokens: "/path/to/asr-models/sherpa-onnx-sense-voice-zh-en-ja-ko-yue-2024-07-17/tokens.txt" # Path to tokens.txt (required for all model types)
+  # --- Optional parameters (with defaults shown) ---
+  # hotwords_file: ""     # Path to hotwords file (if using hotwords)
+  # hotwords_score: 1.5   # Score for hotwords
+  # modeling_unit: ""     # Modeling unit for hotwords (if applicable)
+  # bpe_vocab: ""         # Path to BPE vocabulary (if applicable)
+  num_threads: 2 # Number of threads
+  # whisper_language: "" # Language for Whisper models (e.g., "en", "zh", etc. - if using Whisper)
+  # whisper_task: "transcribe"  # Task for Whisper models ("transcribe" or "translate" - if using Whisper)
+  # whisper_tail_paddings: -1   # Tail padding for Whisper models (if using Whisper)
+  # blank_penalty: 0.0    # Penalty for blank symbol
+  # decoding_method: "greedy_search"  # "greedy_search" or "modified_beam_search"
+  # debug: False # Enable debug mode
+  # sample_rate: 16000 # Sample rate (should match the model's expected sample rate)
+  # feature_dim: 80       # Feature dimension (should match the model's expected feature dimension)
+  use_itn: True # Enable inverse text normalization (ITN) for SenseVoice models (set to False when not using a SenseVoice model)
+# ============== Text to Speech ==============
+TTS_MODEL: "edgeTTS"
+# text to speech model options:
+#   "AzureTTS", "pyttsx3TTS", "edgeTTS", "barkTTS",
+#   "cosyvoiceTTS", "meloTTS", "piperTTS", "coquiTTS",
+#   "fishAPITTS", "SherpaOnnxTTS"
+edgeTTS:
+  # Check out doc at https://github.com/rany2/edge-tts
+  # Use `edge-tts --list-voices` to list all available voices
+  voice: "en-US-AvaMultilingualNeural" #"zh-CN-XiaoxiaoNeural" # "ja-JP-NanamiNeural"
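
Each sample config pairs a selector key (`ASR_MODEL`, `TTS_MODEL`) with a top-level section of the same name, for example `TTS_MODEL: "edgeTTS"` and the `edgeTTS:` block. Below is a minimal sketch of looking up the selected section, assuming the whole file has already been parsed into a dict. `selected_section` is a hypothetical helper and may not match how the project actually reads these files.

```python
def selected_section(config: dict, selector_key: str) -> dict:
    """Return the settings section named by a selector such as TTS_MODEL."""
    name = config.get(selector_key)
    if not name:
        raise KeyError(f"{selector_key} is not set")
    section = config.get(name)
    if not isinstance(section, dict):
        raise KeyError(f"{selector_key} is {name!r} but no {name!r} section exists")
    return section


if __name__ == "__main__":
    config = {
        "TTS_MODEL": "edgeTTS",
        "edgeTTS": {"voice": "en-US-AvaMultilingualNeural"},
    }
    print(selected_section(config, "TTS_MODEL"))
```

Failing loudly on a missing section makes it easier to catch a selector that was changed without adding the matching block.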