Spaces: Akjava/AIGamingVoice-Japanese

Akjava committed on Jan 9

Commit 0aa6a4a

1 Parent(s): 1cee964

add images


Files changed (8)

README.md +69 -2
app.py +82 -18
imgs/0.webp +0 -0
imgs/1.webp +0 -0
imgs/2.webp +0 -0
imgs/3.webp +0 -0
imgs/4.webp +0 -0
imgs/5.webp +0 -0

README.md CHANGED

@@ -8,7 +8,74 @@ sdk_version: 6.2.0
 app_file: app.py
 pinned: false
 license: mit
-short_description: TTS voice for AI (Creently Matcha-TTS)
 ---
-Check out the configuration reference at https://huggingface.co/docs/hub/spaces-config-reference

 app_file: app.py
 pinned: false
 license: mit
+short_description: TTS voice for AI (Currently Matcha-TTS)
 ---
+# AIGamingVoice - Japanese / 日本語
+High-quality, lightweight Japanese Text-to-Speech specifically tuned for AI gaming characters.
+Runs on ONNX Runtime for fast inference.
+AIゲームキャラクター向けに調整された高品質・軽量な日本語音声合成システムです。
+ONNX Runtime上で動作します。
+## 🌟 Features / 特徴
+- **⚡ Fast & Lightweight**: Pure ONNX Runtime implementation (PyTorch-free).
+  - **高速・軽量**: 純粋なONNX Runtime実装です。
+- **🖼️ Visual Speaker Selection**: Select speakers intuitively from an image gallery.
+  - **視覚的な話者選択**: 画像ギャラリーから直感的にキャラクター（話者）を選択できます。
+- **🇯🇵 Japanese Optimization**: Uses `pyopenjtalk` for accurate Japanese phoneme generation.
+  - **日本語最適化**: `pyopenjtalk` を使用し、正確な日本語読み上げを実現しています。
+## 🛠️ Installation & Local Usage / インストールとローカルでの使用方法
+1. **Clone the repository / リポジトリをクローン**
+   ```bash
+   git clone https://huggingface.co/spaces/YOUR_USERNAME/AIGamingVoice-Japanese
+   cd AIGamingVoice-Japanese
+   ```
+2. **Install dependencies / 依存関係のインストール**
+   ```bash
+   pip install -r requirements.txt
+   ```
+   *Note: You need `cmake` installed for pyopenjtalk.*
+   *注: pyopenjtalkのインストールには `cmake` が必要です。*
+3. **Prepare Models / モデルの準備**
+   Place your `.onnx` models in the `models/` directory.
+   `models/` ディレクトリに `.onnx` モデルファイルを配置してください。
+4. **Prepare Speaker Images (Optional) / 話者画像の準備（オプション）**
+   Place images (`0.webp`, `1.webp`, ...) in the `imgs/` directory to enable the visual selector.
+   `imgs/` ディレクトリに画像ファイル（`0.webp`, `1.webp` ...）を配置すると、画像による話者選択機能が有効になります。
+5. **Run the application / アプリケーションの実行**
+   ```bash
+   python app.py
+   ```
+   Open http://localhost:7860 in your browser.
+   ブラウザで http://localhost:7860 にアクセスしてください。
+## 🎮 How to Use / 使い方
+1. **Select Model**: Choose a voice model from the dropdown.
+   - **モデル選択**: ドロップダウンから音声モデルを選択します。
+2. **Select Speaker**: Click on a character image or enter the Speaker ID.
+   - **話者選択**: キャラクター画像をクリックするか、Speaker IDを入力します。
+3. **Input Text**: Enter Japanese text to synthesize.
+   - **テキスト入力**: 読み上げたい日本語テキストを入力します。
+4. **Adjust Settings**: Tweak Temperature (randomness) and Speaking Rate (speed).
+   - **設定調整**: Temperature（ランダム性）やSpeaking Rate（話速）を調整できます。
+5. **Synthesize**: Click the button to generate audio.
+   - **音声生成**: ボタンをクリックして音声を生成します。
+## 🤝 Credits / クレジット
+- **Matcha-TTS**: Architecture based on Matcha-TTS.
+- **ONNX Runtime**: Inference engine.
+- **pyopenjtalk**: Japanese text processing frontend.
+---
+*Created for AI Gaming Voice Project*
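Step 3 of the README expects `.onnx` files in `models/`, and the "Select Model" dropdown is filled by `get_available_models()`, whose body is not part of this diff. A minimal sketch of that kind of discovery, assuming the models sit directly in `models/`; the helper name `list_onnx_models` and the case-insensitive extension match are assumptions, not the app's actual code:

```python
import os
from typing import List


def list_onnx_models(models_dir: str) -> List[str]:
    """Return sorted names of .onnx files directly inside models_dir.

    Hypothetical stand-in for the app's get_available_models(); returns an
    empty list when the directory does not exist.
    """
    if not os.path.isdir(models_dir):
        return []
    return sorted(
        name
        for name in os.listdir(models_dir)
        # Case-insensitive extension check (an assumption); regular files only.
        if name.lower().endswith(".onnx")
        and os.path.isfile(os.path.join(models_dir, name))
    )
```

Run from the repository root, `list_onnx_models("models")` would include `g003_ep5709.onnx` (the model the examples reference) if that file has been placed there.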

app.py CHANGED

@@ -319,14 +319,25 @@ def create_gradio_interface():
     # Get available models
     available_models = get_available_models()
     with gr.Blocks(
-        title="🍵 Matcha-TTS ONNX (Japanese)",
     ) as demo:
         gr.Markdown(
             """
-            # 🍵 Matcha-TTS ONNX - Japanese Text-to-Speech
-            ### PyTorch-free implementation using ONNX Runtime
             """
         )
@@ -347,6 +358,22 @@ def create_gradio_interface():
                     placeholder="日本語のテキストを入力してください..."
                 )
                 # Speaker ID
                 speaker_id = gr.Number(
                     label="Speaker ID (スピーカーID)",
@@ -354,7 +381,7 @@ def create_gradio_interface():
                     minimum=0,
                     maximum=99,
                     precision=0,
-                    info="単一スピーカーモデルでは無視されます"
                 )
                 with gr.Row():
@@ -403,15 +430,35 @@ def create_gradio_interface():
         gr.Examples(
             examples=[
                 ["こんにちは、世界！", "g003_ep5709.onnx", 0, 0.667, 1.0],
-                ["本日は晴天なり。", "g003_ep5709.onnx", 0, 0.667, 1.0],
-                ["日本語の音声合成をテストしています。", "g003_ep5709.onnx", 0, 0.667, 1.0],
-                ["人工知能の進化は目覚ましいものがあります。", "g003_ep5709.onnx", 0, 0.667, 1.0],
             ],
             inputs=[text_input, model_dropdown, speaker_id, temperature, speaking_rate],
             label="例文 / Examples"
         )
         # Event handlers
         synthesise_btn.click(
             fn=synthesise,
             inputs=[text_input, model_dropdown, speaker_id, temperature, speaking_rate],
@@ -426,18 +473,35 @@ def create_gradio_interface():
         gr.Markdown(
             """
             ---
-            ### 情報 / Information
-            - **モデル**: ONNX (PyTorch-free)
-            - **サンプルレート**: 22050 Hz
-            - **音素化**: pyopenjtalk
-            - **推論**: ONNX Runtime
-            - **モデル自動切り替え**: 選択したモデルを自動的にロード
-            ### Speaker ID について
-            - **単一スピーカーモデル**: Speaker ID は無視されます
-            - **マルチスピーカーモデル**: Speaker ID で話者を切り替え
-            """
         )
     return demo
@@ -454,4 +518,4 @@ if __name__ == "__main__":
         server_port=7860,
         share=False,
         show_error=True
-    )

     # Get available models
     available_models = get_available_models()
+    # Load speaker images
+    imgs_dir = os.path.join(SCRIPT_DIR, "imgs")
+    speaker_images = []
+    if os.path.exists(imgs_dir):
+        # Sort by numerical filename (0.webp, 1.webp, ...)
+        image_files = sorted(glob.glob(os.path.join(imgs_dir, "*.webp")),
+                           key=lambda x: int(os.path.splitext(os.path.basename(x))[0]))
+        speaker_images = [(img, f"Speaker {os.path.splitext(os.path.basename(img))[0]}") for img in image_files]
     with gr.Blocks(
+        title="AI Gaming Voice",
     ) as demo:
         gr.Markdown(
             """
+            # AI Gaming Voice - 🍵 Matcha-TTS ONNX (Japanese) / 日本語
+            ### 6 Voices - 140 MB, or 42 MB (qint8, slower)
+            Japanese text-to-speech. (Half-width alphanumeric characters are not supported; please rewrite them before synthesizing.)
+            日本語音声合成です。(半角・英数字は未対応・直してください。)
             """
         )
                     placeholder="日本語のテキストを入力してください..."
                 )
+                # Speaker Selection Gallery
+                if speaker_images:
+                    gr.Markdown("### 話者選択 / Select Speaker")
+                    speaker_gallery = gr.Gallery(
+                        value=speaker_images,
+                        label="話者 / Speakers",
+                        show_label=False,
+                        columns=6,
+                        rows=1,
+                        height=160,
+                        allow_preview=False,
+                        interactive=False,
+                        object_fit="cover",
+                        elem_id="speaker_gallery"
+                    )
                 # Speaker ID
                 speaker_id = gr.Number(
                     label="Speaker ID (スピーカーID)",
                     minimum=0,
                     maximum=99,
                     precision=0,
+                    info="上の画像をタップするか、数値を入力してください"
                 )
                 with gr.Row():
         gr.Examples(
             examples=[
                 ["こんにちは、世界！", "g003_ep5709.onnx", 0, 0.667, 1.0],
+                ["エイアイゲーミングボイス", "g003_ep5709.onnx", 0, 0.667, 0.8],
+                ["わたくしの名前はストラよ", "g003_ep5709.onnx", 0, 0.667, 1.0],
+                ["わたしの名前はシムですよ", "g003_ep5709.onnx", 1, 0.667, 1.0],
+                ["わたしはナラともうします", "g003_ep5709.onnx", 2, 0.667, 1.0],
+                ["わたし、ロールプリンよ!", "g003_ep5709.onnx", 3, 0.667, 1.0],
+                ["僕の名前はショーンだよ", "g003_ep5709.onnx", 4, 0.667, 1.0],
+                ["私の名前はありません", "g003_ep5709.onnx", 5, 0.667, 1.0],
             ],
             inputs=[text_input, model_dropdown, speaker_id, temperature, speaking_rate],
             label="例文 / Examples"
         )
         # Event handlers
+        # Gallery click handler
+        if speaker_images:
+            def on_gallery_select(evt: gr.SelectData):
+                return evt.index
+            speaker_gallery.select(
+                fn=on_gallery_select,
+                inputs=None,
+                outputs=speaker_id
+            ).then(
+                fn=synthesise,
+                inputs=[text_input, model_dropdown, speaker_id, temperature, speaking_rate],
+                outputs=[audio_output, info_output]
+            )
         synthesise_btn.click(
             fn=synthesise,
             inputs=[text_input, model_dropdown, speaker_id, temperature, speaking_rate],
         gr.Markdown(
             """
             ---
+            ### ℹ️ Information / 情報
+            - **Model / モデル**: Matcha-TTS (ONNX)
+            - **Inference / 推論**: ONNX Runtime
+            - **Phonemizer / 音素化**: `pyopenjtalk`
+            - **ZeroGPU**: Optimized for fast startup & inference / 高速起動・推論に最適化
+            ### 🗣️ Speaker Selection / 話者選択
+            - **Click Image / 画像クリック**: Selects speaker & generates audio / 話者を選択して音声を生成
+            - **Speaker ID**: Manual input also supported / 手動入力も可能
+            ### FAQ
+            **Why AI Gaming Voice?**
+            - I plan to support other ONNX models as well.
+            **Model Differences**
+            - **qint8**: about one-third the size, but slower.
+            **How can I create my own voice?**
+            - [GitHub](https://github.com/akjava/Matcha-TTS-Japanese): I'll add instructions there.
+            **Model**
+            - [Hugging Face: matcha-tts_ja_100speakers_group003f-CL-V2](https://huggingface.co/Akjava/matcha-tts_ja_100speakers_group003f-CL-V2)
+            **Who are they?**
+            - [YouTube: four of them are members of AI Gaming Circle](https://www.youtube.com/@ai-gaming-circle)
+            """
         )
     return demo
         server_port=7860,
         share=False,
         show_error=True
+    )
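The new header warns that half-width alphanumeric characters are not supported, and the examples work around it by spelling "AI Gaming Voice" as `エイアイゲーミングボイス`. A sketch of a pre-check that flags such characters before synthesis; the function names and the choice to raise `ValueError` are hypothetical, since the diff itself adds no input validation:

```python
import re
from typing import List

# ASCII letters and digits only; full-width forms such as "Ａ" or "１" and all
# Japanese script are left alone.
_HALF_WIDTH_ALNUM = re.compile(r"[A-Za-z0-9]")


def find_half_width_alnum(text: str) -> List[str]:
    """Return every half-width alphanumeric character in text, in order."""
    return _HALF_WIDTH_ALNUM.findall(text)


def check_input(text: str) -> str:
    """Return text unchanged, or raise ValueError listing the characters
    that should be rewritten before synthesis."""
    bad = find_half_width_alnum(text)
    if bad:
        offending = "".join(sorted(set(bad)))
        raise ValueError(f"Half-width alphanumerics are not supported: {offending}")
    return text
```

Inside a Gradio handler, the `ValueError` could be caught and re-raised as `gr.Error` so the message is shown in the UI instead of producing garbled audio.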

imgs/0.webp ADDED

imgs/1.webp ADDED

imgs/2.webp ADDED

imgs/3.webp ADDED

imgs/4.webp ADDED

imgs/5.webp ADDED
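The gallery code in app.py sorts `imgs/*.webp` by the integer in each file name and, on click, uses `evt.index` directly as the speaker ID. That holds for the six files added here (`0.webp` to `5.webp`), but a non-numeric name would make the `int(...)` sort key raise `ValueError`, and a gap in the numbering would shift every later speaker. A sketch, assuming the same `<N>.webp` naming, that skips non-numeric files and maps a gallery position back to the ID in the file name; `load_speaker_images` and `speaker_id_for_index` are hypothetical helpers, not part of the commit:

```python
import glob
import os
from typing import List, Optional, Tuple


def _speaker_number(path: str) -> Optional[int]:
    """Return the integer stem of e.g. 'imgs/3.webp', or None if not numeric."""
    stem = os.path.splitext(os.path.basename(path))[0]
    try:
        return int(stem)
    except ValueError:
        return None


def load_speaker_images(imgs_dir: str) -> List[Tuple[str, str]]:
    """Collect (path, caption) pairs for imgs_dir/<N>.webp, sorted by N."""
    numbered = []
    for path in glob.glob(os.path.join(imgs_dir, "*.webp")):
        n = _speaker_number(path)
        if n is not None:  # skip names like cover.webp instead of crashing
            numbered.append((n, path))
    numbered.sort()
    return [(path, f"Speaker {n}") for n, path in numbered]


def speaker_id_for_index(speaker_images: List[Tuple[str, str]], index: int) -> int:
    """Map a gallery position (as reported by gr.SelectData.index) to the
    speaker ID encoded in that image's file name."""
    path = speaker_images[index][0]
    n = _speaker_number(path)
    if n is None:
        raise ValueError(f"not a numbered speaker image: {path}")
    return n
```

In the app, `on_gallery_select` could then return `speaker_id_for_index(speaker_images, evt.index)` instead of `evt.index`, which keeps the ID correct even if, say, `2.webp` were removed.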