Upload 4 files

Files changed (4)

README.md +285 -0
pyproject.toml +70 -0
requirements.txt +162 -0
uv.lock +0 -0

README.md ADDED

	@@ -0,0 +1,285 @@

+# VieNeu-TTS
+[![GitHub](https://img.shields.io/badge/GitHub-Repository-blue)](https://github.com/pnnbao97/VieNeu-TTS)
+[![Hugging Face](https://img.shields.io/badge/Hugging%20Face-0.5B-yellow)](https://huggingface.co/pnnbao-ump/VieNeu-TTS)
+[![Hugging Face](https://img.shields.io/badge/Hugging%20Face-0.3B-orange)](https://huggingface.co/pnnbao-ump/VieNeu-TTS-0.3B)
+[![Hugging Face](https://img.shields.io/badge/Hugging%20Face-0.3B--GGUF-green)](https://huggingface.co/pnnbao-ump/VieNeu-TTS-0.3B-q8-gguf)
+[![Discord](https://img.shields.io/badge/Discord-Join%20Us-5865F2?logo=discord&logoColor=white)](https://discord.gg/mQWr4cp3)
+[![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/drive/1V1DjG-KdmurCAhvXrxxTLsa9tteDxSVO?usp=sharing)
+<img width="899" height="615" alt="Untitled" src="https://github.com/user-attachments/assets/7eb9b816-6ab7-4049-866f-f85e36cb9c6f" />
+
+**VieNeu-TTS** is an advanced on-device Vietnamese Text-to-Speech (TTS) model with **instant voice cloning**.
+> [!TIP]
+> **Voice Cloning:** All model variants (including GGUF) support instant voice cloning with just **3-5 seconds** of reference audio.
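Because cloning quality depends on the reference clip, it can help to check a clip's length before using it. The sketch below is a minimal pre-flight check, not part of VieNeu-TTS: it assumes the reference is an uncompressed PCM `.wav` file (the format Python's standard `wave` module reads), takes the 3-5 second window from the tip above, and uses hypothetical function names.

```python
import wave


def clip_seconds(path: str) -> float:
    """Return the duration of a PCM WAV file in seconds."""
    with wave.open(path, "rb") as wav:
        return wav.getnframes() / wav.getframerate()


def is_good_reference(path: str, min_s: float = 3.0, max_s: float = 5.0) -> bool:
    """Hypothetical check: does the clip fall inside the suggested window?"""
    return min_s <= clip_seconds(path) <= max_s
```

The bounds are parameters so they can be adjusted; keep in mind that a longer reference likely also uses more of the shared context window.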
+
+This project features two core architectures trained on the [VieNeu-TTS-1000h](https://huggingface.co/datasets/pnnbao-ump/VieNeu-TTS-1000h) dataset:
+- **VieNeu-TTS (0.5B):** An enhanced model fine-tuned from the NeuTTS Air architecture for maximum stability.
+- **VieNeu-TTS-0.3B:** A specialized model **trained from scratch**, delivering about 2x faster inference than the 0.5B model, with ultra-low latency.
+
+These models are a significant upgrade over the previous VieNeu-TTS-140h, with the following improvements:
+- **Enhanced pronunciation**: More accurate and stable Vietnamese pronunciation
+- **Code-switching support**: Seamless transitions between Vietnamese and English
+- **Better voice cloning**: Higher fidelity and speaker consistency
+- **Real-time synthesis**: 24 kHz waveform generation on CPU or GPU
+- **Multiple model formats**: Support for PyTorch, GGUF Q4/Q8 (CPU optimized), and ONNX codec
+
+VieNeu-TTS delivers production-ready speech synthesis fully offline.
+
+**Author:** Phạm Nguyễn Ngọc Bảo
+
+---
+[<img width="600" height="595" alt="VieNeu-TTS" src="https://github.com/user-attachments/assets/6b32df9d-7e2e-474f-94c8-43d6fa586d15" />](https://github.com/user-attachments/assets/6b32df9d-7e2e-474f-94c8-43d6fa586d15)
+
+---
+## 🔬 Model Overview
+- **Backbone:**
+  - **VieNeu-TTS (0.5B):** Qwen-0.5B fine-tuned from [NeuTTS Air](https://huggingface.co/neuphonic/neutts-air).
+  - **VieNeu-TTS-0.3B:** Custom 0.3B model **trained from scratch**, optimized for extreme speed (2x faster).
+- **Audio codec:** NeuCodec (torch implementation; ONNX & quantized variants supported)
+- **Context window:** 2,048 tokens shared by prompt text and speech tokens
+- **Output watermark:** Enabled by default
+- **Training data:** [VieNeu-TTS-1000h](https://huggingface.co/datasets/pnnbao-ump/VieNeu-TTS-1000h) — 443,641 curated Vietnamese samples (Used for both versions).
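Because prompt text and speech tokens share one 2,048-token window, both the reference clip and the input text reduce how much audio a single call can generate. The arithmetic below is a rough sketch: it assumes a codec rate of 50 speech tokens per second of audio (our assumption about NeuCodec's frame rate; verify it against the codec's documentation), ignores special or control tokens unless you pass them as `overhead`, and uses a hypothetical function name.

```python
CONTEXT_TOKENS = 2048   # shared window from the model overview
TOKENS_PER_SECOND = 50  # ASSUMPTION: codec frame rate; check NeuCodec docs


def max_output_seconds(
    ref_text_tokens: int,
    ref_audio_seconds: float,
    target_text_tokens: int,
    overhead: int = 0,
    context: int = CONTEXT_TOKENS,
    tokens_per_second: float = TOKENS_PER_SECOND,
) -> float:
    """Estimate how many seconds of speech still fit in the context window."""
    ref_audio_tokens = ref_audio_seconds * tokens_per_second
    used = ref_text_tokens + ref_audio_tokens + target_text_tokens + overhead
    remaining = context - used
    return max(remaining, 0) / tokens_per_second


# Example: 40 reference-text tokens, a 4 s reference clip, 100 target-text tokens.
print(max_output_seconds(40, 4.0, 100))  # 34.16
```

If the estimate is shorter than the text you want spoken, split the text into chunks, which is what the long-text script in `examples/` is for.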
+### Model Variants
+| Model                   | Format  | Device  | Quality    | Speed                   |
+| ----------------------- | ------- | ------- | ---------- | ----------------------- |
+| VieNeu-TTS              | PyTorch | GPU/CPU | ⭐⭐⭐⭐⭐ | Very Fast with lmdeploy |
+| VieNeu-TTS-0.3B         | PyTorch | GPU/CPU | ⭐⭐⭐⭐   | **Ultra Fast (2x)**     |
+| VieNeu-TTS-q8-gguf      | GGUF Q8 | CPU/GPU | ⭐⭐⭐⭐   | Fast                    |
+| VieNeu-TTS-q4-gguf      | GGUF Q4 | CPU/GPU | ⭐⭐⭐     | Very Fast               |
+| VieNeu-TTS-0.3B-q8-gguf | GGUF Q8 | CPU/GPU | ⭐⭐⭐⭐   | **Ultra Fast (1.5x)**   |
+| VieNeu-TTS-0.3B-q4-gguf | GGUF Q4 | CPU/GPU | ⭐⭐⭐     | **Extreme Speed (2x)**  |
+
+**Recommendations:**
+- **GPU users**: Use `VieNeu-TTS` (PyTorch) for best quality
+- **CPU users**: Use `VieNeu-TTS-0.3B-q4-gguf` for fastest inference or `VieNeu-TTS-0.3B-q8-gguf` for best CPU quality.
+- **Streaming**: Only GGUF models support streaming inference (Requires `llama-cpp-python >= 0.3.16`)
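The recommendations above can be encoded as a small helper, for example in a launcher script that picks a checkpoint automatically. This is a hypothetical sketch, not part of VieNeu-TTS: the function name is invented, the returned strings are the variant names from the table, and choosing the 0.3B GGUF builds when streaming is needed is our assumption, since any GGUF variant can stream.

```python
def recommend_variant(has_gpu: bool, need_streaming: bool = False, fastest: bool = True) -> str:
    """Map the README's recommendations onto a variant name from the table."""
    cpu_pick = "VieNeu-TTS-0.3B-q4-gguf" if fastest else "VieNeu-TTS-0.3B-q8-gguf"
    if need_streaming:
        # Only GGUF models stream (llama-cpp-python >= 0.3.16), so never return
        # the PyTorch checkpoint here. Which GGUF build to use is an assumption.
        return cpu_pick
    if has_gpu:
        return "VieNeu-TTS"  # PyTorch checkpoint, best quality on GPU
    return cpu_pick


print(recommend_variant(has_gpu=True))                  # VieNeu-TTS
print(recommend_variant(has_gpu=False, fastest=False))  # VieNeu-TTS-0.3B-q8-gguf
```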
+---
+## ✅ Todo & Status
+- [x] Publish safetensors artifacts
+- [x] Release GGUF Q4 / Q8 models
+- [x] Release datasets (1000h and 140h)
+- [x] Enable streaming on GPU
+- [x] Provide Dockerized setup
+- [ ] Release fine-tuning code
+---
+## 🏁 Getting Started
+### 1. Clone the repository
+```bash
+git clone https://github.com/pnnbao97/VieNeu-TTS.git
+cd VieNeu-TTS
+```
+### 2. Install eSpeak NG (Required)
+Phonemizer requires eSpeak NG to function.
+- **Windows:** Download the installer from [eSpeak NG Releases](https://github.com/espeak-ng/espeak-ng/releases) (recommended: `.msi`).
+- **macOS:** `brew install espeak`
+- **Ubuntu/Debian:** `sudo apt install espeak-ng`
+- **Arch Linux:** `paru -S aur/espeak-ng`
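Before moving on, you can confirm that eSpeak NG is reachable. The sketch below is a rough pre-flight check, not something the project ships: it looks for the `espeak-ng` (or legacy `espeak`) executable on `PATH` and also honors `PHONEMIZER_ESPEAK_LIBRARY`, the environment variable phonemizer reads for an explicit library path (useful on Windows, where the installer may not add eSpeak NG to `PATH`). The function name is hypothetical; the authoritative test is running the app itself.

```python
import os
import shutil
from typing import Optional


def find_espeak() -> Optional[str]:
    """Return where eSpeak NG appears to be installed, or None if not found."""
    # An explicit library path for phonemizer wins if it points at a real file.
    library = os.environ.get("PHONEMIZER_ESPEAK_LIBRARY")
    if library and os.path.isfile(library):
        return library
    # Otherwise look for the command-line programs on PATH.
    for program in ("espeak-ng", "espeak"):
        found = shutil.which(program)
        if found:
            return found
    return None


location = find_espeak()
print(f"eSpeak found: {location}" if location else "eSpeak NG not found; install it first.")
```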
+---
+### 3. Environment Setup (Choose ONE method)
+#### Method 1: Standard with `uv` (Recommended)
+This is the fastest and most reliable way to manage dependencies.
+
+**A. Install `uv`** (if you haven't already):
+- **Windows:** `powershell -ExecutionPolicy ByPass -c "irm https://astral.sh/uv/install.ps1 | iex"`
+- **Linux/macOS:** `curl -LsSf https://astral.sh/uv/install.sh | sh`
+
+**B. Choose your hardware:**
+
+**Option A: For GPU Users (NVIDIA 30xx/40xx/50xx)**
+> [!IMPORTANT]
+> **Update your NVIDIA Drivers & Install CUDA Toolkit!**
+> This project uses **CUDA 12.8**. Make sure your NVIDIA driver is up to date (it must support CUDA 12.8 or newer) to avoid compatibility issues, especially on RTX 30-series cards.
+>
+> To use `lmdeploy`, you **MUST** install the **NVIDIA GPU Computing Toolkit**: [https://developer.nvidia.com/cuda-downloads](https://developer.nvidia.com/cuda-downloads).
+```bash
+uv sync
+```
+**Option B: For CPU-only Users**
+1. Switch to the CPU configuration:
+   ```bat
+   rem Windows (Command Prompt)
+   ren pyproject.toml pyproject.toml.bak
+   copy pyproject.toml.cpu pyproject.toml
+   ```
+   ```bash
+   # Linux/macOS
+   mv pyproject.toml pyproject.toml.bak
+   cp pyproject.toml.cpu pyproject.toml
+   ```
+2. Install dependencies:
+   ```bash
+   uv sync
+   ```
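If you prefer one command that behaves the same on every OS, the configuration swap in step 1 can also be done from Python. This is a sketch under the assumption that it runs from the repository root; the function names are hypothetical, it refuses to overwrite an existing `pyproject.toml.bak`, and it leaves `uv.lock` alone (a plain `uv sync` afterwards re-locks if the lockfile is out of date).

```python
import shutil
from pathlib import Path


def use_cpu_config(root: Path = Path(".")) -> None:
    """Back up pyproject.toml and replace it with pyproject.toml.cpu."""
    current = root / "pyproject.toml"
    backup = root / "pyproject.toml.bak"
    cpu = root / "pyproject.toml.cpu"
    if not cpu.exists():
        raise FileNotFoundError(cpu)
    if backup.exists():
        raise FileExistsError(f"{backup} already exists; not overwriting it")
    current.rename(backup)         # equivalent of `mv` / `ren`
    shutil.copyfile(cpu, current)  # equivalent of `cp` / `copy`


def restore_gpu_config(root: Path = Path(".")) -> None:
    """Undo use_cpu_config by moving the backup back into place."""
    backup = root / "pyproject.toml.bak"
    if not backup.exists():
        raise FileNotFoundError(backup)
    # Path.replace overwrites the CPU copy on every platform.
    backup.replace(root / "pyproject.toml")
```

Call `use_cpu_config()` from the repository root and then run `uv sync`; `restore_gpu_config()` reverses the swap.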
+**C. Run the Application:**
+```bash
+uv run gradio_app.py
+```
+Then access the Web UI at `http://127.0.0.1:7860`.
+
+---
+#### Method 2: Automatic with Makefile (Alternative)
+Best if you have `make` installed (standard on Linux/macOS, or via Git Bash on Windows). It handles configuration swaps automatically.
+- **Setup GPU:** `make setup-gpu`
+- **Setup CPU:** `make setup-cpu`
+- **Run Demo:** `make demo`
+
+Then access the Web UI at `http://127.0.0.1:7860`.
+
+---
+## 🐋 Docker Deployment
+For a quick start or production deployment without manually installing dependencies, use Docker.
+### Quick Start
+Copy `.env.example` to `.env`:
+```bash
+cp .env.example .env
+```
+Build and start the container:
+```bash
+# Run with CPU
+docker compose --profile cpu up
+# Run with GPU (requires NVIDIA Container Toolkit)
+docker compose --profile gpu up
+```
+Access the Web UI at `http://localhost:7860`.
+For detailed deployment instructions, including production setup, see [docs/Deploy.md](docs/Deploy.md).
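The container may need some time to load the model before the Web UI answers, so scripts that drive the UI may want to wait for it first. A minimal polling sketch using only the standard library, assuming the default port 7860 from above; the function name and timeout values are hypothetical choices, not settings from the project.

```python
import time
import urllib.request


def wait_for_ui(url: str = "http://localhost:7860", timeout: float = 300.0, interval: float = 2.0) -> bool:
    """Poll url until it returns HTTP 200 or timeout seconds pass."""
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        try:
            with urllib.request.urlopen(url, timeout=interval) as resp:
                if resp.status == 200:
                    return True
        except OSError:
            # Refused or reset connections, HTTP errors (URLError is an
            # OSError subclass) and timeouts all mean "not ready yet".
            pass
        time.sleep(interval)
    return False
```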
+
+---
+## 📦 Project Structure
+```text
+VieNeu-TTS/
+├── examples/
+│   ├── infer_long_text.py     # CLI for long-form synthesis (chunked)
+│   └── sample_long_text.txt   # Example paragraph for testing
+├── gradio_app.py              # Local Gradio web demo with LMDeploy support
+├── main.py                    # Basic batch inference script
+├── config.yaml                # Configuration for models, codecs, and voices
+├── output_audio/              # Generated audio (created when running scripts)
+├── sample/                    # Reference voices (audio + transcript + codes); nam/nữ = male/female, miền Bắc/Nam = Northern/Southern
+│   ├── Bình (nam miền Bắc).wav/txt/pt
+│   ├── Đoan (nữ miền Nam).wav/txt/pt
+│   ├── Dung (nữ miền Nam).wav/txt/pt
+│   ├── Hương (nữ miền Bắc).wav/txt/pt
+│   ├── Ly (nữ miền Bắc).wav/txt/pt
+│   ├── Ngọc (nữ miền Bắc).wav/txt/pt
+│   ├── Nguyên (nam miền Nam).wav/txt/pt
+│   ├── Sơn (nam miền Nam).wav/txt/pt
+│   ├── Tuyên (nam miền Bắc).wav/txt/pt
+│   └── Vĩnh (nam miền Nam).wav/txt/pt
+├── utils/
+│   ├── __init__.py
+│   ├── core_utils.py          # Text chunking utilities
+│   ├── normalize_text.py      # Vietnamese text normalization pipeline
+│   ├── phonemize_text.py      # Text to phoneme conversion
+│   └── phoneme_dict.json      # Phoneme dictionary
+├── vieneu_tts/
+│   ├── __init__.py            # Exports VieNeuTTS and FastVieNeuTTS
+│   └── vieneu_tts.py          # Core VieNeuTTS implementation (VieNeuTTS & FastVieNeuTTS)
+├── README.md
+├── requirements.txt           # Basic dependencies (legacy)
+├── pyproject.toml             # Project configuration with full dependencies (uv)
+└── uv.lock                    # uv lock file for dependency management
+```
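Long-form synthesis splits the input into pieces that each fit the context window, which is the job of `examples/infer_long_text.py` and the chunking utilities in `utils/core_utils.py`. The project's own chunker is not shown in this diff, so the following is an independent, hypothetical sketch of one common approach: split on sentence-ending punctuation, then pack sentences greedily into chunks of at most `max_chars` characters, falling back to word boundaries for overlong sentences.

```python
import re
from typing import List

# Split after ., !, ? or an ellipsis character when followed by whitespace.
_SENTENCE_END = re.compile(r"(?<=[.!?…])\s+")


def _split_words(sentence: str, max_chars: int) -> List[str]:
    """Break an overlong sentence on spaces; a single huge word stays whole."""
    pieces: List[str] = []
    current = ""
    for word in sentence.split():
        candidate = f"{current} {word}".strip()
        if current and len(candidate) > max_chars:
            pieces.append(current)
            current = word
        else:
            current = candidate
    if current:
        pieces.append(current)
    return pieces


def chunk_text(text: str, max_chars: int = 256) -> List[str]:
    """Greedily pack whole sentences into chunks of at most max_chars characters."""
    sentences = [s.strip() for s in _SENTENCE_END.split(text.strip()) if s.strip()]
    chunks: List[str] = []
    current = ""
    for sentence in sentences:
        pieces = [sentence] if len(sentence) <= max_chars else _split_words(sentence, max_chars)
        for piece in pieces:
            candidate = f"{current} {piece}".strip()
            if current and len(candidate) > max_chars:
                chunks.append(current)
                current = piece
            else:
                current = candidate
    if current:
        chunks.append(current)
    return chunks
```

Character counts are only a rough proxy for tokens, so leave headroom below the 2,048-token window.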
+---
+## 📚 References
+- [GitHub Repository](https://github.com/pnnbao97/VieNeu-TTS)
+- [Hugging Face Model (0.5B)](https://huggingface.co/pnnbao-ump/VieNeu-TTS)
+- [Hugging Face Model (0.3B)](https://huggingface.co/pnnbao-ump/VieNeu-TTS-0.3B)
+- [VieNeuTTS Fine-tuning Guide](https://github.com/pnnbao-ump/VieNeuTTS/blob/main/finetune.ipynb)
+- [VieNeuCodec dataset](https://huggingface.co/datasets/pnnbao-ump/VieNeuCodec-dataset)
+---
+## 📄 License
+- **VieNeu-TTS (0.5B):** Original terms (Apache 2.0).
+- **VieNeu-TTS-0.3B:** Released under **CC BY-NC 4.0** (Non-Commercial).
+  - This version is currently **experimental**.
+  - **Commercial use is prohibited** without authorization. Please contact the author for commercial licensing.
+---
+## 📑 Citation
+```bibtex
+@misc{vieneutts2026,
+  title        = {VieNeu-TTS: Vietnamese Text-to-Speech with Instant Voice Cloning},
+  author       = {Pham, Nguyen Ngoc Bao},
+  year         = {2026},
+  publisher    = {Hugging Face},
+  howpublished = {\url{https://huggingface.co/pnnbao-ump/VieNeu-TTS}}
+}
+```
+## 🤝 Contributing
+Contributions are welcome!
+1. Fork the repository
+2. Create a feature branch: `git checkout -b feature/amazing-feature`
+3. Commit your changes: `git commit -m "Add amazing feature"`
+4. Push the branch: `git push origin feature/amazing-feature`
+5. Open a pull request
+---
+## 📞 Support
+- GitHub Issues: [github.com/pnnbao97/VieNeu-TTS/issues](https://github.com/pnnbao97/VieNeu-TTS/issues)
+- Hugging Face: [huggingface.co/pnnbao-ump](https://huggingface.co/pnnbao-ump)
+- Discord: [Join us](https://discord.gg/mQWr4cp3)
+- Facebook: [Phạm Nguyễn Ngọc Bảo](https://www.facebook.com/bao.phamnguyenngoc.5)
+---
+## 🙏 Acknowledgements
+This project builds upon [NeuTTS Air](https://huggingface.co/neuphonic/neutts-air) for the original 0.5B model. The 0.3B version is a custom architecture trained from scratch using the [VieNeu-TTS-1000h](https://huggingface.co/datasets/pnnbao-ump/VieNeu-TTS-1000h) dataset.
+
+---
+**Made with ❤️ for the Vietnamese TTS community**

pyproject.toml ADDED

	@@ -0,0 +1,70 @@

+[tool.uv]
+index-strategy = "unsafe-best-match"
+required-environments = [
+    "sys_platform == 'win32' and platform_machine == 'AMD64'",
+    "sys_platform == 'linux' and platform_machine == 'x86_64'",
+    "sys_platform == 'darwin' and platform_machine == 'arm64'",
+]
+override-dependencies = [
+    "nvidia-nccl-cu12; sys_platform == 'linux'",
+]
+[[tool.uv.index]]
+name = "pytorch"
+url = "https://download.pytorch.org/whl/cu128"
+explicit = true
+[[tool.uv.index]]
+name = "pypi"
+url = "https://pypi.org/simple"
+[project]
+name = "VieNeu-TTS"
+version = "0.1.0"
+description = "Advanced on-device Vietnamese TTS with instant voice cloning"
+readme = "README.md"
+requires-python = "==3.12.*"
+dependencies = [
+    "phonemizer>=3.3.0",
+    "neucodec>=0.0.4",
+    "librosa>=0.11.0",
+    "gradio>=5.49.1",
+    "onnxruntime>=1.23.2",
+    "datasets>=3.2.0",
+    "lmdeploy; sys_platform != 'darwin'",
+    "triton-windows; sys_platform == 'win32'",
+    "triton; sys_platform == 'linux'",
+    "transformers; sys_platform == 'darwin'",
+    "accelerate; sys_platform == 'darwin'",
+    "torch",
+    "torchvision",
+    "torchaudio",
+    "perth>=0.2.0",
+    "llama-cpp-python==0.3.16",
+]
+[tool.uv.sources]
+torch = [
+    { index = "pytorch", marker = "sys_platform != 'darwin'" },
+    { index = "pypi", marker = "sys_platform == 'darwin'" }
+]
+torchvision = [
+    { index = "pytorch", marker = "sys_platform != 'darwin'" },
+    { index = "pypi", marker = "sys_platform == 'darwin'" }
+]
+torchaudio = [
+    { index = "pytorch", marker = "sys_platform != 'darwin'" },
+    { index = "pypi", marker = "sys_platform == 'darwin'" }
+]
+lmdeploy = [
+    { url = "https://github.com/InternLM/lmdeploy/releases/download/v0.11.0/lmdeploy-0.11.0+cu128-cp312-cp312-win_amd64.whl", marker = "sys_platform == 'win32' and python_version == '3.12'" },
+    { url = "https://github.com/InternLM/lmdeploy/releases/download/v0.11.0/lmdeploy-0.11.0+cu128-cp312-cp312-manylinux2014_x86_64.whl", marker = "sys_platform == 'linux' and python_version == '3.12'" },
+    { index = "pypi", marker = "sys_platform == 'darwin'" }
+]
+llama-cpp-python = [
+    { url = "https://github.com/pnnbao97/VieNeu-TTS/releases/download/wheels-v0.3.16/llama_cpp_python-0.3.16-cp312-cp312-win_amd64.whl", marker = "sys_platform == 'win32' and python_version == '3.12'" },
+    { index = "pypi", marker = "sys_platform != 'win32'" }
+]
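The `required-environments` and `[tool.uv.sources]` tables above describe the three platforms this configuration is written to cover and which index each platform's PyTorch wheels come from. As a rough, hypothetical illustration (not a uv API), the same rules can be restated in plain Python to see, before running `uv sync`, whether the current machine is one of those targets and where torch would be pulled from.

```python
import platform
import sys
from typing import Optional, Tuple

# (sys_platform, platform_machine) pairs from [tool.uv] required-environments.
SUPPORTED = {("win32", "AMD64"), ("linux", "x86_64"), ("darwin", "arm64")}

PYTORCH_CU128 = "https://download.pytorch.org/whl/cu128"
PYPI = "https://pypi.org/simple"


def is_supported(
    sys_platform: Optional[str] = None,
    machine: Optional[str] = None,
    python: Optional[Tuple[int, int]] = None,
) -> bool:
    """True if the target matches required-environments and requires-python ==3.12.*."""
    sys_platform = sys_platform or sys.platform
    machine = machine or platform.machine()
    python = python or (sys.version_info.major, sys.version_info.minor)
    return (sys_platform, machine) in SUPPORTED and tuple(python) == (3, 12)


def torch_index(sys_platform: Optional[str] = None) -> str:
    """Index that [tool.uv.sources] assigns to torch, torchvision and torchaudio."""
    sys_platform = sys_platform or sys.platform
    return PYPI if sys_platform == "darwin" else PYTORCH_CU128


print(is_supported(), torch_index())
```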

requirements.txt ADDED

	@@ -0,0 +1,162 @@

+accelerate==1.12.0
+addict==2.4.0
+aiofiles==24.1.0
+aiohappyeyeballs==2.6.1
+aiohttp==3.13.1
+aiosignal==1.4.0
+annotated-doc==0.0.3
+annotated-types==0.7.0
+antlr4-python3-runtime==4.9.3
+anyio==4.11.0
+attrs==25.4.0
+audioread==3.1.0
+babel==2.17.0
+blobfile==3.1.0
+brotli==1.1.0
+certifi==2025.10.5
+cffi==2.0.0
+charset-normalizer==3.4.4
+click==8.3.0
+cloudpickle==3.1.2
+colorama==0.4.6
+coloredlogs==15.0.1
+csvw==3.7.0
+datasets==4.3.0
+decorator==5.2.1
+dill==0.4.0
+distro==1.9.0
+dlinfo==2.0.0
+einops==0.8.1
+einx==0.3.0
+fastapi==0.120.2
+ffmpy==0.6.4
+filelock==3.20.0
+fire==0.7.1
+flatbuffers==25.9.23
+frozendict==2.4.6
+frozenlist==1.8.0
+fsspec==2025.9.0
+gradio==5.49.1
+gradio-client==1.13.3
+groovy==0.1.2
+h11==0.16.0
+hf-transfer==0.1.9
+httpcore==1.0.9
+httpx==0.28.1
+huggingface-hub==0.36.0
+humanfriendly==10.0
+hyper-connections==0.2.1
+idna==3.11
+inquirerpy==0.3.4
+isodate==0.7.2
+jinja2==3.1.6
+jiter==0.12.0
+joblib==1.5.2
+jsonschema==4.25.1
+jsonschema-specifications==2025.9.1
+kagglehub==0.3.13
+language-tags==1.2.0
+lazy-loader==0.4
+librosa==0.11.0
+llvmlite==0.45.1
+lmdeploy==0.11.0
+local-attention==1.11.2
+lxml==6.0.2
+markdown-it-py==4.0.0
+markupsafe==3.0.3
+mdurl==0.1.2
+mmengine-lite==0.10.7
+mpmath==1.3.0
+msgpack==1.1.2
+multidict==6.7.0
+multiprocess==0.70.16
+networkx==3.5
+neucodec==0.0.4
+numba==0.62.1
+numpy==2.3.4
+omegaconf==2.3.0
+onnxruntime==1.23.2
+openai==2.12.0
+openai-harmony==0.0.8
+orjson==3.11.4
+packaging==25.0
+pandas==2.3.3
+partial-json-parser==0.2.1.1.post7
+peft==0.14.0
+pfzy==0.3.4
+phonemizer==3.3.0
+pillow==11.3.0
+platformdirs==4.5.0
+pooch==1.8.2
+prometheus-client==0.23.1
+prompt-toolkit==3.0.52
+propcache==0.4.1
+protobuf==6.33.2
+psutil==7.1.2
+pyarrow==22.0.0
+pycparser==2.23
+pycryptodomex==3.23.0
+pydantic==2.11.10
+pydantic-core==2.33.2
+pydub==0.25.1
+pygments==2.19.2
+pyparsing==3.2.5
+pyreadline3==3.5.4
+python-dateutil==2.9.0.post0
+python-multipart==0.0.20
+pytz==2025.2
+pyyaml==6.0.3
+pyzmq==27.1.0
+ray==2.52.1
+rdflib==7.3.0
+referencing==0.37.0
+regex==2025.10.23
+requests==2.32.5
+rfc3986==1.5.0
+rich==14.2.0
+rpds-py==0.28.0
+ruff==0.14.2
+safehttpx==0.1.7
+safetensors==0.6.2
+scikit-learn==1.7.2
+scipy==1.16.2
+segments==2.3.0
+semantic-version==2.10.0
+sentencepiece==0.2.1
+setuptools==80.9.0
+shellingham==1.5.4
+shortuuid==1.0.13
+six==1.17.0
+sniffio==1.3.1
+soundfile==0.13.1
+soxr==1.0.0
+starlette==0.49.1
+sympy==1.14.0
+termcolor==3.2.0
+threadpoolctl==3.6.0
+tiktoken==0.12.0
+tokenizers==0.22.1
+tomlkit==0.13.3
+torch==2.7.1+cu118
+torchao==0.14.1
+torchaudio==2.7.1+cu118
+torchdata==0.11.0
+torchtune==0.6.1
+torchvision==0.22.1+cu118
+tqdm==4.67.1
+transformers==4.57.1
+triton-windows==3.5.1.post22
+typer==0.20.0
+typing-extensions==4.15.0
+typing-inspection==0.4.2
+tzdata==2025.2
+uritemplate==4.2.0
+urllib3==2.5.0
+uvicorn==0.38.0
+vector-quantize-pytorch==1.17.8
+wcwidth==0.2.14
+websockets==15.0.1
+xgrammar==0.1.28
+xxhash==3.6.0
+yapf==0.43.0
+yarl==1.22.0

uv.lock ADDED

The diff for this file is too large to render.