---
language:
- ja
- en
library_name: transformers
license: apache-2.0
datasets:
- nvidia/hifitts-2
- amphion/Emilia-Dataset
base_model:
- Qwen/Qwen3-0.6B-Base
pipeline_tag: text-to-speech
tags:
- speech
- tts
- voice
---

# MioTTS-0.6B: Lightweight & Fast LLM-based TTS

[![Hugging Face Collection](https://img.shields.io/badge/Collection-HuggingFace-yellow)](https://huggingface.co/collections/Aratako/miotts)
[![Inference Code](https://img.shields.io/badge/Inference-GitHub-black)](https://github.com/Aratako/MioTTS-Inference)

**MioTTS-0.6B** is a lightweight, high-speed Text-to-Speech (TTS) model based on an LLM architecture. It is designed to generate high-quality speech in **English and Japanese** while maintaining low latency and minimal resource usage.

This model supports zero-shot voice cloning and is built on top of the efficient neural audio codec **[MioCodec-25Hz-24kHz](https://huggingface.co/Aratako/MioCodec-25Hz-24kHz)**.

## 📊 MioTTS Family

We offer a range of model sizes to suit different performance and resource requirements.

| Model Name | Parameters | Base Model | License |
| :--- | :---: | :--- | :--- |
| [MioTTS-0.1B](https://huggingface.co/Aratako/MioTTS-0.1B) | 0.1B | [tiiuae/Falcon-H1-Tiny-Multilingual-100M-Base](https://huggingface.co/tiiuae/Falcon-H1-Tiny-Multilingual-100M-Base) | [Falcon-LLM License](https://falconllm.tii.ae/falcon-terms-and-conditions.html) |
| [MioTTS-0.4B](https://huggingface.co/Aratako/MioTTS-0.4B) | 0.4B | [LiquidAI/LFM2-350M](https://huggingface.co/LiquidAI/LFM2-350M) | [LFM Open License v1.0](https://huggingface.co/LiquidAI/LFM2-350M/blob/main/LICENSE) |
| **MioTTS-0.6B** | **0.6B** | **[Qwen/Qwen3-0.6B-Base](https://huggingface.co/Qwen/Qwen3-0.6B-Base)** | **[Apache 2.0](https://choosealicense.com/licenses/apache-2.0/)** |
| [MioTTS-1.2B](https://huggingface.co/Aratako/MioTTS-1.2B) | 1.2B | [LiquidAI/LFM2.5-1.2B-Base](https://huggingface.co/LiquidAI/LFM2.5-1.2B-Base) | [LFM Open License v1.0](https://huggingface.co/LiquidAI/LFM2.5-1.2B-Base/blob/main/LICENSE) |
| [MioTTS-1.7B](https://huggingface.co/Aratako/MioTTS-1.7B) | 1.7B | [Qwen/Qwen3-1.7B-Base](https://huggingface.co/Qwen/Qwen3-1.7B-Base) | [Apache 2.0](https://choosealicense.com/licenses/apache-2.0/) |
| [MioTTS-2.6B](https://huggingface.co/Aratako/MioTTS-2.6B) | 2.6B | [LiquidAI/LFM2-2.6B](https://huggingface.co/LiquidAI/LFM2-2.6B) | [LFM Open License v1.0](https://huggingface.co/LiquidAI/LFM2-2.6B/blob/main/LICENSE) |

## 🌟 Key Features

* **Lightweight & Fast:** Optimized for speed, making it suitable for consumer-grade GPUs and edge deployment.
* **Bilingual Support:** Trained on approximately **100,000 hours** of English and Japanese data.
* **Voice Cloning:** Supports high-fidelity zero-shot voice cloning from a short reference audio clip.
* **Efficient Codec:** Uses [Aratako/MioCodec-25Hz-24kHz](https://huggingface.co/Aratako/MioCodec-25Hz-24kHz), which operates at a low framerate (25 Hz) for faster generation without sacrificing quality.

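To make the framerate point concrete: at 25 codec frames per second, each second of audio costs only 25 autoregressive decoding steps. The back-of-the-envelope sketch below is plain Python arithmetic; the constants come from the codec name (25 Hz, 24 kHz), and the helper functions are illustrative, not MioTTS API calls.

```python
# Rough cost model for LLM-based TTS with a 25 Hz codec.
# Constants taken from the codec name (MioCodec-25Hz-24kHz);
# the functions are illustrative arithmetic, not MioTTS APIs.
FRAME_RATE_HZ = 25      # codec frames (= LLM tokens) per second of audio
SAMPLE_RATE = 24_000    # output waveform sample rate

def tokens_needed(seconds: float) -> int:
    """Codec tokens the LLM must generate for `seconds` of audio."""
    return round(seconds * FRAME_RATE_HZ)

def samples_decoded(num_tokens: int) -> int:
    """Waveform samples produced when the codec decodes `num_tokens` frames."""
    return num_tokens * SAMPLE_RATE // FRAME_RATE_HZ

print(tokens_needed(10))     # 250 tokens for a 10-second clip
print(samples_decoded(250))  # 240000 samples = 10 s at 24 kHz
```

Because generation cost scales with the number of autoregressive steps, a lower codec framerate directly reduces the LLM forward passes needed per second of speech, which is the intuition behind the latency claim above.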
## 🚀 Inference

We provide a dedicated repository for inference, including installation instructions and an example WebUI.

👉 **[GitHub: Aratako/MioTTS-Inference](https://github.com/Aratako/MioTTS-Inference)**

## 🎧 Audio Samples

Below are some samples generated by **MioTTS-0.6B**.

> **Note:** The reference audio samples below were generated using **[Aratako/T5Gemma-TTS-2b-2b](https://huggingface.co/Aratako/T5Gemma-TTS-2b-2b)** and **[gemini-2.5-pro-tts](https://cloud.google.com/text-to-speech/docs/gemini-tts)**.

| Case | Text | Reference Audio | Generated Audio |
| :--- | :--- | :--- | :--- |
| **English 1** | "The old library was silent, save for the gentle ticking of a clock somewhere in the shadows. As I ran my fingers along the dusty spines of the books, I felt a strange sense of nostalgia, as if I had lived a thousand lives within these walls." | <audio controls src="https://huggingface.co/Aratako/MioTTS-0.6B/resolve/main/samples/en_ref1.wav"></audio> | <audio controls src="https://huggingface.co/Aratako/MioTTS-0.6B/resolve/main/samples/0.6B_en_sample1.wav"></audio> |
| **English 2** | "Hey! I haven't seen you in ages. Do you want to grab some coffee later? I've got so much to tell you!" | <audio controls src="https://huggingface.co/Aratako/MioTTS-0.6B/resolve/main/samples/en_ref2.wav"></audio> | <audio controls src="https://huggingface.co/Aratako/MioTTS-0.6B/resolve/main/samples/0.6B_en_sample2.wav"></audio> |
| **Japanese 1** | "気象庁によりますと、大型の台風10号は、明日の明け方にかけて関東地方に接近する見込みです。沿岸部では高波に警戒が必要です。" | <audio controls src="https://huggingface.co/Aratako/MioTTS-0.6B/resolve/main/samples/jp_ref1.wav"></audio> | <audio controls src="https://huggingface.co/Aratako/MioTTS-0.6B/resolve/main/samples/0.6B_jp_sample1.wav"></audio> |
| **Japanese 2** | "その森には、古い言い伝えがありました。月が最も高く昇る夜、静かに耳を澄ませば、風の歌声が聞こえるというのです。私は半信半疑でしたが、その夜、確かに誰かが私を呼ぶ声を聞いたのです。" | <audio controls src="https://huggingface.co/Aratako/MioTTS-0.6B/resolve/main/samples/jp_ref2.wav"></audio> | <audio controls src="https://huggingface.co/Aratako/MioTTS-0.6B/resolve/main/samples/0.6B_jp_sample2.wav"></audio> |

## 🏗️ Training Details

* **Data:** ~100k hours of speech data (English & Japanese).
* **Codec:** [MioCodec-25Hz-24kHz](https://huggingface.co/Aratako/MioCodec-25Hz-24kHz)
* **Base Model:** Initialized from [Qwen/Qwen3-0.6B-Base](https://huggingface.co/Qwen/Qwen3-0.6B-Base).

## 📜 License & Ethical Restrictions

### License

This model is released under the **[Apache License 2.0](https://choosealicense.com/licenses/apache-2.0/)**.

### Ethical Considerations & Limitations

While this model is released under a permissive license, we aim to promote responsible AI development and urge users to respect the rights of others.

1. **Voice Cloning:** Please respect the privacy and rights of individuals. We strongly discourage using this model to clone the voices of real people (especially non-consenting individuals) for deceptive or harmful purposes.
2. **No Misinformation:** This model should not be used to generate deepfakes intended to mislead others or spread misinformation.
3. **Disclaimer:** The developers assume no liability for any misuse of this model. Users are solely responsible for ensuring that their use of the generated content complies with applicable laws and regulations in their jurisdiction.

## 🙏 Acknowledgments

* **Compute Support:** Part of the compute resources for this project were provided by **Saldra, Witness and Lumina Logic Minds**. We deeply appreciate their support.
* **Base Model:** We thank the developers of the base LLM for their open-source contributions.
* **Community:** Thanks to the open-source community for the datasets and tools that made this project possible.

## 🖊️ Citation

If you use MioTTS in your research or project, please cite it as follows:

```bibtex
@misc{miotts,
  author       = {Chihiro Arata},
  title        = {MioTTS: Lightweight and Fast LLM-based Text-to-Speech},
  year         = {2026},
  publisher    = {Hugging Face},
  journal      = {Hugging Face repository},
  howpublished = {\url{https://huggingface.co/collections/Aratako/miotts}}
}
```