
	---
	title: TextSyncMimi Speech Editing
	emoji: 🎙️
	colorFrom: blue
	colorTo: purple
	sdk: gradio
	sdk_version: 4.44.0
	app_file: app.py
	pinned: false
	license: cc-by-4.0
	---

	# TextSyncMimi Speech Editing Demo

	Interactive demo for TextSyncMimi, a text-synchronous neural audio codec that enables token-level speech editing.

	## What This Demo Does

	1. Generate Speech: Use OpenAI TTS to create two audio samples with different voices and speaking styles
	2. Token-Level Analysis: See how the text is tokenized (LLaMA-3.1 tokenizer)
	3. Speech Embedding Swapping: Swap speech characteristics at specific token positions
	4. Real-time Editing: Hear the results instantly

	## How to Use

	### Step 1: Configure Voices
	- Enter your text transcript
	- Select two different OpenAI TTS voices (e.g., "alloy" and "echo")
	- (Optional) Add style instructions like "speak slowly" or "sound excited"
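
The settings above map naturally onto a request to OpenAI's text-to-speech endpoint. The sketch below is an assumption about how such a request might be assembled, not the code in `app.py`: the helper names `build_tts_request` and `synthesize` are hypothetical, and falling back to `tts-1` when no style instruction is given is a guess based on the models listed under Technical Details.

```python
from typing import Optional


def build_tts_request(text: str, voice: str, instructions: Optional[str] = None) -> dict:
    """Assemble keyword arguments for one OpenAI text-to-speech request."""
    if not text.strip():
        raise ValueError("transcript must not be empty")
    request = {"input": text, "voice": voice}
    if instructions and instructions.strip():
        # Style instructions are only supported by gpt-4o-mini-tts, not tts-1.
        request["model"] = "gpt-4o-mini-tts"
        request["instructions"] = instructions.strip()
    else:
        # Assumption: use the plain tts-1 model when no style is requested.
        request["model"] = "tts-1"
    return request


def synthesize(text: str, voice: str, instructions: Optional[str] = None) -> bytes:
    """Send the request; needs the `openai` package and OPENAI_API_KEY to be set."""
    from openai import OpenAI  # imported lazily so build_tts_request has no dependencies

    client = OpenAI()
    response = client.audio.speech.create(**build_tts_request(text, voice, instructions))
    return response.read()
```

Calling `synthesize` once per voice with the same transcript would yield the two audio samples that the later steps compare.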

	### Step 2: Generate Audio
	- Click "Generate & Process" to create both audio samples
	- The demo then shows the tokenization and generates a baseline reconstruction

	### Step 3: Swap Embeddings
	- Enter token indices to swap (e.g., "0,2,5")
	- Click "Perform Swap" to hear Voice 1 with Voice 2's characteristics at those positions
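
Conceptually, the swap replaces Voice 1's speech embedding with Voice 2's at each chosen token position. The following is a minimal sketch of that idea, assuming each utterance is represented as one embedding vector per text token and that both voices share the same transcript. Plain Python lists stand in for the model's tensors, and the helper names `parse_indices` and `swap_embeddings` are hypothetical; the model's real interface may differ.

```python
def parse_indices(spec: str, num_tokens: int) -> list[int]:
    """Turn a string such as "0,2,5" into a sorted list of valid token indices."""
    indices = sorted({int(part) for part in spec.split(",") if part.strip()})
    for i in indices:
        if not 0 <= i < num_tokens:
            raise ValueError(f"token index {i} is outside 0..{num_tokens - 1}")
    return indices


def swap_embeddings(voice1: list, voice2: list, indices: list[int]) -> list:
    """Return a copy of voice1 whose vectors at `indices` come from voice2."""
    if len(voice1) != len(voice2):
        raise ValueError("both voices must cover the same tokens")
    edited = [list(vec) for vec in voice1]  # copy so voice1 stays untouched
    for i in indices:
        edited[i] = list(voice2[i])
    return edited


# Toy data: 6 tokens with 2-dimensional embeddings (real embeddings are far larger).
voice1 = [[0.0, 0.0] for _ in range(6)]
voice2 = [[1.0, 1.0] for _ in range(6)]
edited = swap_embeddings(voice1, voice2, parse_indices("0,2,5", num_tokens=6))
print([vec[0] for vec in edited])  # positions 0, 2 and 5 now come from voice2
```

Decoding the edited sequence back to audio is the model's job and is not shown here.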

	## Examples

	### Example 1: Word-Level Swapping
	Text: "Hello, how are you today?"
	- Tokens 0-1: "Hello" (enter `0,1` to swap these)
	- Result: The first word has Voice 2's style; the rest keeps Voice 1's style

	### Example 2: Prosody Transfer
	- Voice 1 instructions: "speak slowly and calmly"
	- Voice 2 instructions: "speak quickly with excitement"
	- Swap indices: the tokens from the middle of the sentence onward
	- Result: The sentence starts calm and becomes excited midway

	## For Users

	Just try the demo! The OpenAI API key is already configured. Enter text, select voices, and experiment with speech editing.

	## For Developers (Running Your Own Copy)

	Want to run your own version? Here's how:

	1. Duplicate this Space or create a new one
	2. If you created a new Space, copy the files into it (`app.py`, `requirements.txt`, `README.md`)
	3. Add your OpenAI API key as a Secret:
	   - Go to Space Settings → Repository secrets
	   - Click "New secret"
	   - Name: `OPENAI_API_KEY`
	   - Value: Your OpenAI API key
	   - Click "Add secret"
	4. The Space will automatically restart with your key (securely stored, never exposed)
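
Hugging Face exposes Space secrets to the running app as environment variables, so the key can be read at startup without ever appearing in the repository. A minimal sketch of that lookup (the helper name `load_openai_key` is hypothetical and not taken from `app.py`):

```python
import os


def load_openai_key(env=os.environ) -> str:
    """Return the OpenAI API key from the environment, or fail with a clear message."""
    key = env.get("OPENAI_API_KEY", "").strip()
    if not key:
        raise RuntimeError(
            "OPENAI_API_KEY is not set. Add it under Space Settings → Repository secrets."
        )
    return key
```

Failing early with an explicit message makes a missing or misnamed secret easy to spot in the Space logs.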

	## Technical Details

	- Model: TextSyncMimi-v1 (loaded from [Hugging Face Hub](https://huggingface.co/potsawee/TextSyncMimi-v1))
	- Tokenizer: LLaMA-3.1 (128K vocabulary, loaded from Hugging Face)
	- Text Embeddings: Embeddings built into the model (4096-dim)
	- Audio Codec: Mimi (24 kHz sample rate, 12.5 frames per second)
	- TTS Provider: OpenAI (gpt-4o-mini-tts with instructions, or tts-1)
	- Security: API keys stored securely in Space secrets
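
The codec figures above imply some useful arithmetic: at 24 kHz and 12.5 frames per second, each Mimi frame covers 24000 / 12.5 = 1920 audio samples, or 80 ms. The sketch below works this out. The function name is hypothetical, and rounding the frame count up is an assumption about how a trailing partial frame is handled.

```python
import math

SAMPLE_RATE_HZ = 24_000   # Mimi sample rate
FRAME_RATE_HZ = 12.5      # Mimi frames per second

SAMPLES_PER_FRAME = int(SAMPLE_RATE_HZ / FRAME_RATE_HZ)  # 1920 samples per frame
FRAME_DURATION_MS = 1000 / FRAME_RATE_HZ                  # 80.0 ms per frame


def num_frames(num_samples: int) -> int:
    """Number of codec frames needed to cover `num_samples` audio samples."""
    return math.ceil(num_samples / SAMPLES_PER_FRAME)


print(SAMPLES_PER_FRAME, FRAME_DURATION_MS)  # 1920 80.0
print(num_frames(3 * SAMPLE_RATE_HZ))        # 3 s of audio -> 38 frames
```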

	## Links

	- 🤗 [Model Card](https://huggingface.co/potsawee/TextSyncMimi-v1)