	---
	language:
	- en
	- es
	- fr
	- de
	- it
	- pt
	- pl
	- tr
	- ru
	- nl
	- cs
	- ar
	- zh
	- ja
	- hu
	- ko
	- hi
	pipeline_tag: text-to-speech
	tags:
	- text-to-speech
	- tts
	- ggml
	- vulkan
	- c++
	- on-device
	license: other
	license_name: coqui-public-model-license
	license_link: https://coqui.ai/cpml
	base_model: coqui/XTTS-v2
	---


	# ATTS1HG1: High-Performance GGML Implementation of XTTS-v2

	ATTS1HG1 is a high-speed, native C++ implementation of the Coqui XTTS-v2 model, utilizing the GGML tensor library. It features a custom integrated HiFiGAN vocoder optimized for Vulkan and CPU inference.

	<div align="center">

	| Source Code & GUI | Base Model | Backend |
	|:---:|:---:|:---:|
	| [GitHub: ATTS1HG1](https://github.com/abbndz/ATTS1HG1) | [Coqui XTTS-v2](https://huggingface.co/coqui/XTTS-v2) | GGML / Vulkan |

	</div>

	## 🚀 Key Features

	* Blazing Fast: Generates audio in under 0.5 s on a consumer GPU (NVIDIA RTX 3090, Vulkan) and in about 1.0 s on CPU.
	* Vulkan Support: Fully optimized HiFiGAN vocoder running on Vulkan (compatible with NVIDIA, AMD, Intel iGPUs).
	* Lightweight: Native C++ application, no heavy Python dependencies (PyTorch/TensorFlow not required at runtime).
	* Multi-Language: Supports 17 languages.
	* Voices: Supports 58 speakers (similar to XTTS).

	## 🌍 Supported Languages

	The model supports the following 17 languages:

	| Code | Language | Native Name |
	| :--- | :--- | :--- |
	| en | English | English |
	| es | Spanish | Español |
	| fr | French | Français |
	| de | German | Deutsch |
	| it | Italian | Italiano |
	| pt | Portuguese | Português |
	| pl | Polish | Polski |
	| tr | Turkish | Türkçe |
	| ru | Russian | Русский |
	| nl | Dutch | Nederlands |
	| cs | Czech | Čeština |
	| ar | Arabic | العربية |
	| zh | Chinese | 中文 |
	| ja | Japanese | 日本語 |
	| hu | Hungarian | Magyar |
	| ko | Korean | 한국어 |
	| hi | Hindi | हिन्दी |

	## ⚡ Performance

	Benchmarks measure the total latency of synthesizing a short phrase ("Bonjour le monde") with the C++ client:

	| Device | Backend | Latency (Total) | Note |
	| :--- | :--- | :--- | :--- |
	| NVIDIA RTX 3090 | Vulkan | ~0.47 s | 🚀 Recommended |
	| Intel iGPU | Vulkan | ~1.40 s | Good for laptops |
	| CPU (Ryzen/Intel) | CPU (AVX2) | ~1.02 s | Solid fallback |
	| NVIDIA RTX 3090 | CUDA | ~1.45 s | Slower on HiFiGAN due to kernel overhead |

	> Note: The Vulkan backend runs the HiFiGAN stage of the pipeline significantly faster than CUDA, owing to optimized command buffers and reduced kernel launch overhead for small convolutions.
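
To put the table in relative terms, the small shell sketch below divides each latency by the RTX 3090 Vulkan figure. The `speedup` helper is a hypothetical name introduced here for illustration, and the numbers are copied from the table rather than newly measured.

```bash
# speedup SLOWER FASTER: print how many times longer SLOWER takes than FASTER.
# Hypothetical helper for illustration; it is not part of ATTS1HG1.
speedup() {
  awk -v slow="$1" -v fast="$2" 'BEGIN { printf "%.2f\n", slow / fast }'
}

# Latencies in seconds, copied from the benchmark table above.
speedup 1.45 0.47   # CUDA on RTX 3090 vs. Vulkan on RTX 3090
speedup 1.02 0.47   # CPU (AVX2) vs. Vulkan on RTX 3090
speedup 1.40 0.47   # Intel iGPU (Vulkan) vs. Vulkan on RTX 3090
```

On these figures, Vulkan on the RTX 3090 comes out roughly 3.1x faster than CUDA on the same card, 2.2x faster than the CPU path, and 3.0x faster than the Intel iGPU. The ratios apply only to this short test phrase and may differ for longer inputs.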

	## 🛠️ Usage

	This repository contains the converted `.bin` / `.gguf` weights required by the ATTS1HG1 software.

	1. Download the model files from this repository.
	2. Clone and compile the software from GitHub:
	```bash
	git clone https://github.com/abbndz/ATTS1HG1
	```
	3. Load the model in the GUI or CLI and select Vulkan for best performance.
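
Steps 1 and 2 can be sketched as a small shell helper. This is a minimal sketch under assumptions, not the project's official setup script. It assumes the `huggingface_hub` command-line tool (`huggingface-cli`) is installed. `HF_REPO` is a placeholder you must replace with this repository's actual Hugging Face ID. The names `step`, `setup_atts`, `MODEL_DIR`, and `RUN` are hypothetical. The build commands are not sketched because this README does not specify them; follow the instructions in the GitHub repository for compiling.

```bash
# Placeholder: replace with this model repository's real Hugging Face ID.
HF_REPO="${HF_REPO:-your-username/ATTS1HG1}"
# Hypothetical local directory for the downloaded weights.
MODEL_DIR="${MODEL_DIR:-models}"

# Print each command; execute it only when RUN=1 (dry run by default).
step() {
  echo "+ $*"
  if [ "${RUN:-0}" = "1" ]; then "$@"; fi
}

setup_atts() {
  # Step 1: download the converted weights from the Hugging Face Hub.
  step huggingface-cli download "$HF_REPO" --local-dir "$MODEL_DIR"
  # Step 2 (first half): fetch the inference engine source.
  step git clone https://github.com/abbndz/ATTS1HG1
}
```

Calling `setup_atts` on its own only prints the commands it would run. Calling `RUN=1 setup_atts` in bash executes them, after which the software can be compiled as described in the GitHub repository.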

	## 📜 License

	This project uses the weights from Coqui XTTS-v2, which is licensed under the Coqui Public Model License (CPML).
	* Non-commercial use: You can use this model for personal, educational, and non-commercial projects.
	* Commercial use: Requires a license from Coqui (check their repository for details).

	The C++ code (inference engine) is available under the MIT License (see GitHub).

	---
	Credits: Based on the excellent work by Coqui.ai and the GGML library by ggerganov.