---
title: TextSyncMimi Speech Editing
emoji: 🎙️
colorFrom: blue
colorTo: purple
sdk: gradio
sdk_version: 4.44.0
app_file: app.py
pinned: false
license: cc-by-4.0
---
# TextSyncMimi Speech Editing Demo
Interactive demo for **TextSyncMimi**, a text-synchronous neural audio codec that enables token-level speech editing.
## What This Demo Does
1. **Generate Speech**: Use OpenAI TTS to create two audio samples with different voices and speaking styles
2. **Token-Level Analysis**: See how the text is split into tokens by the LLaMA-3.1 tokenizer
3. **Speech Embedding Swapping**: Swap speech characteristics at specific token positions
4. **Real-time Editing**: Hear the results instantly
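As a rough mental model of the swap in item 3, the sketch below assumes that TextSyncMimi yields one speech embedding per text token for each voice, so two recordings of the same transcript give sequences of equal length. `swap_embeddings` is a hypothetical helper written for illustration, not part of the model's API, and embeddings are plain Python lists rather than tensors.

```python
from typing import List, Sequence

Embedding = List[float]


def swap_embeddings(
    voice1: Sequence[Embedding],
    voice2: Sequence[Embedding],
    indices: Sequence[int],
) -> List[Embedding]:
    """Return Voice 1's per-token embeddings, with the positions in `indices`
    taken from Voice 2 instead (hypothetical helper, illustration only)."""
    if len(voice1) != len(voice2):
        raise ValueError("both voices must cover the same tokens")
    swap = set(indices)
    for i in swap:
        if not 0 <= i < len(voice1):
            raise IndexError(f"token index {i} is out of range")
    # Copy each vector so neither input sequence is modified.
    return [list(voice2[i]) if i in swap else list(voice1[i]) for i in range(len(voice1))]


v1 = [[0.0, 0.0], [0.1, 0.1], [0.2, 0.2]]
v2 = [[1.0, 1.0], [1.1, 1.1], [1.2, 1.2]]
print(swap_embeddings(v1, v2, [0, 2]))  # [[1.0, 1.0], [0.1, 0.1], [1.2, 1.2]]
```

In the demo, the edited sequence is then turned back into audio by the model; that decoding step is not shown here.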
## How to Use
### Step 1: Configure Voices
- Enter your text transcript
- Select two different OpenAI TTS voices (e.g., "alloy" and "echo")
- (Optional) Add style instructions like "speak slowly" or "sound excited"
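Both samples share one transcript and differ only in voice and style. The sketch below builds the keyword arguments for each TTS request; `build_tts_request` is a hypothetical helper, not necessarily how this Space's `app.py` does it. It sends style instructions only to `gpt-4o-mini-tts`, since `tts-1` does not accept them.

```python
from typing import Dict, Optional


def build_tts_request(
    text: str,
    voice: str,
    instructions: Optional[str] = None,
    model: str = "gpt-4o-mini-tts",
) -> Dict[str, str]:
    """Build keyword arguments for one OpenAI TTS call (hypothetical helper)."""
    if not text.strip():
        raise ValueError("transcript must not be empty")
    request = {"model": model, "voice": voice, "input": text}
    # Style instructions only apply to gpt-4o-mini-tts; tts-1 ignores them here.
    if instructions and model == "gpt-4o-mini-tts":
        request["instructions"] = instructions
    return request


transcript = "Hello, how are you today?"
voice1_request = build_tts_request(transcript, "alloy", "speak slowly and calmly")
voice2_request = build_tts_request(transcript, "echo", "speak quickly with excitement")
```

With the official `openai` Python SDK, each dictionary could be passed as `client.audio.speech.create(**voice1_request)`.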
### Step 2: Generate Audio
- Click "Generate & Process" to create both audio samples
- The demo displays the tokenization and a baseline reconstruction made before any swap
### Step 3: Swap Embeddings
- Enter token indices to swap (e.g., "0,2,5")
- Click "Perform Swap" to hear Voice 1 with Voice 2's characteristics at those positions
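The index box takes a comma-separated list. The sketch below shows one way such input could be validated, assuming zero-based indices (as the "0" in the example suggests) and ignoring duplicates and stray commas. `parse_swap_indices` is hypothetical; the app's own parsing may differ.

```python
from typing import List


def parse_swap_indices(raw: str, num_tokens: int) -> List[int]:
    """Turn input like "0,2,5" into sorted, unique, zero-based token indices."""
    indices = set()
    for part in raw.split(","):
        part = part.strip()
        if not part:
            continue  # tolerate stray or trailing commas
        idx = int(part)  # raises ValueError on non-numeric input
        if not 0 <= idx < num_tokens:
            raise ValueError(f"index {idx} is outside 0..{num_tokens - 1}")
        indices.add(idx)
    return sorted(indices)


print(parse_swap_indices(" 5, 0,,2,2 ", num_tokens=7))  # [0, 2, 5]
```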
## Examples
### Example 1: Word-Level Swapping
**Text**: "Hello, how are you today?"
- **Swap indices**: `0,1` (the tokens for "Hello")
- **Result**: The first word takes on Voice 2's style; the rest of the sentence keeps Voice 1's style
### Example 2: Prosody Transfer
- **Voice 1**: "speak slowly and calmly"
- **Voice 2**: "speak quickly with excitement"
- **Swap indices**: Tokens from the middle of the sentence to the end
- **Result**: The sentence starts calm and becomes excited midway
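Example 2 does not fix exact positions. Assuming the goal is for Voice 2's excited style to take over from the midpoint to the end, one option is to generate every index from the middle of the token sequence onward and paste the resulting string into the swap box. `indices_from_midpoint` is a hypothetical helper; the token count would come from the tokenization the demo displays.

```python
def indices_from_midpoint(num_tokens: int) -> str:
    """Comma-separated indices covering the second half of the tokens."""
    if num_tokens <= 0:
        raise ValueError("need at least one token")
    start = num_tokens // 2
    return ",".join(str(i) for i in range(start, num_tokens))


print(indices_from_midpoint(7))  # 3,4,5,6
```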
## For Users
Just try the demo! The OpenAI API key is already configured. Enter text, select voices, and experiment with speech editing.
## For Developers (Running Your Own Copy)
Want to run your own version? Here's how:
1. **Duplicate this Space** or create a new one
2. If you created a new Space instead, copy the files (`app.py`, `requirements.txt`, `README.md`) into it
3. **Add your OpenAI API key as a Secret**:
   - Go to Space Settings → Repository secrets
   - Click "New secret"
   - Name: `OPENAI_API_KEY`
   - Value: Your OpenAI API key
   - Click "Add secret"
4. The Space will automatically restart with your key (securely stored, never exposed)
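Space secrets are exposed to the running app as environment variables, so the key can be read with `os.environ`. The sketch below is a hypothetical helper, not necessarily what this Space's `app.py` does. Note that the official `openai` SDK's `OpenAI()` client also reads `OPENAI_API_KEY` from the environment on its own.

```python
import os


def get_openai_api_key(var_name: str = "OPENAI_API_KEY") -> str:
    """Read the API key from the environment, failing with a clear message."""
    key = os.environ.get(var_name, "").strip()
    if not key:
        raise RuntimeError(
            f"{var_name} is not set; add it as a Space secret or export it locally"
        )
    return key
```

For a local run outside Spaces, exporting `OPENAI_API_KEY` in the shell before starting `app.py` has the same effect.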
## Technical Details
- **Model**: TextSyncMimi-v1 (loaded from [HuggingFace Hub](https://huggingface.co/potsawee/TextSyncMimi-v1))
- **Tokenizer**: LLaMA-3.1 (128K vocabulary, loaded from HuggingFace)
- **Text Embeddings**: 4096-dimensional, built into the model
- **Audio Codec**: Mimi (24 kHz audio, 12.5 frames per second)
- **TTS Provider**: OpenAI (`gpt-4o-mini-tts`, which supports style instructions, or `tts-1`)
- **Security**: API keys stored securely in Space secrets
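For a feel for the codec numbers above, a 24 kHz sample rate at 12.5 frames per second means each Mimi frame spans 1920 samples, or 80 ms, so a clip's frame count depends on its duration rather than on its transcript. The arithmetic below uses only those two figures; the constant and function names are illustrative, and exact frame counts can differ slightly depending on padding.

```python
SAMPLE_RATE_HZ = 24_000  # Mimi operates on 24 kHz audio
FRAME_RATE_HZ = 12.5     # Mimi frames per second of audio

# Each frame covers 24000 / 12.5 = 1920 samples, i.e. 80 ms of audio.
SAMPLES_PER_FRAME = int(SAMPLE_RATE_HZ / FRAME_RATE_HZ)
FRAME_DURATION_MS = 1000 / FRAME_RATE_HZ


def approx_num_frames(duration_s: float) -> float:
    """Approximate Mimi frame count for a clip; padding can shift it slightly."""
    return duration_s * FRAME_RATE_HZ


print(SAMPLES_PER_FRAME)       # 1920
print(FRAME_DURATION_MS)       # 80.0
print(approx_num_frames(2.0))  # 25.0
```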
## Links
- 🤗 [Model Card](https://huggingface.co/potsawee/TextSyncMimi-v1)