| title: TextSyncMimi Speech Editing | |
| emoji: 🎙️ | |
| colorFrom: blue | |
| colorTo: purple | |
| sdk: gradio | |
| sdk_version: 4.44.0 | |
| app_file: app.py | |
| pinned: false | |
| license: cc-by-4.0 | |
| # TextSyncMimi Speech Editing Demo | |
| Interactive demo for **TextSyncMimi**, a text-synchronous neural audio codec that enables token-level speech editing. | |
| ## What This Demo Does | |
| 1. **Generate Speech**: Use OpenAI TTS to create two audio samples with different voices and speaking styles | |
| 2. **Token-Level Analysis**: See how the text is split into tokens by the LLaMA-3.1 tokenizer | |
| 3. **Speech Embedding Swapping**: Swap speech characteristics at specific token positions | |
| 4. **Real-time Editing**: Hear the results instantly | |
| ## How to Use | |
| ### Step 1: Configure Voices | |
| - Enter your text transcript | |
| - Select two different OpenAI TTS voices (e.g., "alloy" and "echo") | |
| - (Optional) Add style instructions like "speak slowly" or "sound excited" | |
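As a rough illustration of Step 1, the sketch below assembles the parameters for one TTS request. It assumes that `gpt-4o-mini-tts` is chosen when style instructions are given and `tts-1` otherwise; that selection rule and the helper name `build_tts_request` are hypothetical and not taken from `app.py`.

```python
def build_tts_request(text, voice, instructions=""):
    """Assemble keyword arguments for one OpenAI TTS request (hypothetical helper)."""
    # Assumption: fall back to tts-1 when no style instructions are supplied.
    request = {"model": "tts-1", "voice": voice, "input": text}
    if instructions.strip():
        # Style instructions are sent with gpt-4o-mini-tts.
        request["model"] = "gpt-4o-mini-tts"
        request["instructions"] = instructions.strip()
    return request


# One request per voice, each with its own speaking style.
request_1 = build_tts_request("Hello, how are you today?", "alloy", "speak slowly")
request_2 = build_tts_request("Hello, how are you today?", "echo", "sound excited")
```

With the official `openai` Python package, each dictionary could then be passed as `client.audio.speech.create(**request_1)`; the call is left out here so the sketch runs without network access.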
| ### Step 2: Generate Audio | |
| - Click "Generate & Process" to create both audio samples | |
| - The model will show you the tokenization and generate a baseline reconstruction | |
| ### Step 3: Swap Embeddings | |
| - Enter the token indices to swap as a comma-separated list (e.g., "0,2,5") | |
| - Click "Perform Swap" to hear Voice 1 with Voice 2's characteristics at those positions | |
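The swap in Step 3 can be pictured with a minimal sketch. It assumes the model produces one speech embedding per text token, so that both voices yield sequences of equal length for the same transcript. The function names and the toy two-dimensional vectors are hypothetical stand-ins, not the model's actual API or embedding size.

```python
def parse_indices(text, num_tokens):
    """Parse a comma-separated index string such as "0,2,5" into a sorted list."""
    indices = set()
    for part in text.split(","):
        part = part.strip()
        if not part:
            continue  # tolerate trailing commas and stray spaces
        idx = int(part)
        if not 0 <= idx < num_tokens:
            raise ValueError(f"token index {idx} is outside 0..{num_tokens - 1}")
        indices.add(idx)
    return sorted(indices)


def swap_embeddings(voice1, voice2, indices):
    """Return a copy of voice1 with voice2's entries at the given positions."""
    if len(voice1) != len(voice2):
        raise ValueError("both voices must cover the same number of tokens")
    edited = list(voice1)  # shallow copy, so voice1 itself is left unchanged
    for idx in indices:
        edited[idx] = voice2[idx]
    return edited


# Toy stand-ins: one short vector per text token (assumption, not real model output).
voice1 = [[1.0, 1.0] for _ in range(6)]
voice2 = [[2.0, 2.0] for _ in range(6)]
edited = swap_embeddings(voice1, voice2, parse_indices("0,2,5", num_tokens=6))
```

Positions 0, 2, and 5 of `edited` now hold Voice 2's vectors, while the remaining positions keep Voice 1's.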
| ## Examples | |
| ### Example 1: Word-Level Swapping | |
| **Text**: "Hello, how are you today?" | |
| - Tokens 0-1: "Hello" (swap these) | |
| - Result: The first word has Voice 2's style; the rest keeps Voice 1's style | |
| ### Example 2: Prosody Transfer | |
| **Voice 1**: "speak slowly and calmly" | |
| **Voice 2**: "speak quickly with excitement" | |
| **Swap indices**: Middle of sentence | |
| **Result**: Sentence starts calm, becomes excited mid-way | |
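One way to get the "starts calm, becomes excited mid-way" effect is to swap every token from the midpoint onward. The helper below builds such an index string for the swap field; reading "middle of sentence" as "midpoint to end" is an assumption, and `second_half_indices` is a hypothetical name.

```python
def second_half_indices(num_tokens):
    """Return the indices from the midpoint to the end as a comma-separated string."""
    if num_tokens < 1:
        raise ValueError("num_tokens must be positive")
    start = num_tokens // 2  # first token that takes Voice 2's style
    return ",".join(str(i) for i in range(start, num_tokens))


print(second_half_indices(6))  # prints 3,4,5
```

The token count comes from the tokenization shown after Step 2.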
| ## For Users | |
| Just try the demo! The OpenAI API key is already configured. Enter text, select voices, and experiment with speech editing. | |
| ## For Developers (Running Your Own Copy) | |
| Want to run your own version? Here's how: | |
| 1. **Duplicate this Space** or create a new one | |
| 2. Copy the files (`app.py`, `requirements.txt`, `README.md`) | |
| 3. **Add your OpenAI API key as a Secret**: | |
| - Go to Space Settings → Repository secrets | |
| - Click "New secret" | |
| - Name: `OPENAI_API_KEY` | |
| - Value: Your OpenAI API key | |
| - Click "Add secret" | |
| 4. The Space will automatically restart with your key (securely stored, never exposed) | |
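Space secrets are exposed to the running app as environment variables, so the key can be read without ever appearing in the repository. The sketch below shows one way to do that; `load_openai_key` is a hypothetical helper and may differ from what `app.py` actually does.

```python
import os


def load_openai_key(env=None):
    """Read OPENAI_API_KEY from the environment, failing early if it is missing."""
    env = os.environ if env is None else env
    key = env.get("OPENAI_API_KEY", "").strip()
    if not key:
        raise RuntimeError(
            "OPENAI_API_KEY is not set; add it under Repository secrets"
        )
    return key
```

Failing at startup with a clear message is usually easier to debug than an authentication error on the first TTS request.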
| ## Technical Details | |
| - **Model**: TextSyncMimi-v1 (loaded from the [Hugging Face Hub](https://huggingface.co/potsawee/TextSyncMimi-v1)) | |
| - **Tokenizer**: LLaMA-3.1 (128K vocabulary, loaded from the Hugging Face Hub) | |
| - **Text Embeddings**: Built into the model (4096-dimensional) | |
| - **Audio Codec**: Mimi (24 kHz sample rate, 12.5 frames per second) | |
| - **TTS Provider**: OpenAI (gpt-4o-mini-tts with instructions, or tts-1) | |
| - **Security**: API keys stored securely in Space secrets | |
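The codec figures above imply a fixed frame size, which the following arithmetic works out. It reads "12.5 fps" as 12.5 codec frames per second of audio; rounding partial frames up in `num_frames` is an assumption of this sketch, not a documented behaviour of Mimi.

```python
import math

SAMPLE_RATE_HZ = 24_000  # Mimi sample rate
FRAME_RATE_HZ = 12.5     # codec frames per second

samples_per_frame = SAMPLE_RATE_HZ / FRAME_RATE_HZ  # 1920.0 samples
frame_duration_ms = 1000 / FRAME_RATE_HZ            # 80.0 ms


def num_frames(duration_s):
    """Frames needed to cover a clip, rounding a partial frame up (assumption)."""
    return math.ceil(duration_s * FRAME_RATE_HZ)
```

For example, a 2-second clip spans 25 frames of 80 ms each.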
| ## Links | |
| - 🤗 [Model Card](https://huggingface.co/potsawee/TextSyncMimi-v1) | |