Prompt_Edit_Demo

Sleeping

Upload: Click "Upload or Record Audio File" and select a pre-recorded audio file (WAV, MP3, etc.)
Record: Click the microphone icon to record audio directly

Step 2: Configure LLM Prompt

Edit the "LLM prompt" textbox to customize the system behavior:

Default: General purpose conversation assistant
Translation: "You are a translator. Translate user text into English."
Summarization: "You are summarizer. Summarize user's utterance."
Custom: Write your own prompt for specific use cases

Step 3: Select Models (Optional)

Choose the models for each component:

ASR (Automatic Speech Recognition): Transcribes your audio
- Default: pyf98/owsm_ctc_v3.1_1B
LLM (Language Model): Generates the response
- Default: meta-llama/Llama-3.2-1B-Instruct
TTS (Text-to-Speech): Creates audio output
- Default: espnet/kan-bayashi_ljspeech_vits

Step 4: Process

Click the "Process Audio" button

Step 5: View Results

The system will display:

ASR Transcription: What was transcribed from your audio
LLM Response: The generated text response
TTS Output: Audio playback of the response (auto-plays)

Example Use Cases

1. Voice Translation

Audio: "Bonjour, comment allez-vous?"
LLM Prompt: "You are a translator. Translate user text into English."
Output: "Hello, how are you?"

2. Voice Summarization

Audio: "Today I went to the store and bought apples, oranges, bananas, and some milk. Then I went to the park..."
LLM Prompt: "You are summarizer. Summarize user's utterance."
Output: "User went shopping and to the park."

3. Voice Assistant

Audio: "What's the weather like today?"
LLM Prompt: "You are a helpful and friendly AI assistant..."
Output: "I don't have access to real-time weather data..."

Technical Details

Audio Processing Pipeline

Audio File → ASR → Transcription → LLM → Response → TTS → Audio Output

Supported Audio Formats

WAV
MP3
FLAC
OGG
Any format supported by Gradio's Audio component

Processing Time

Depends on audio length and selected models
Typically 2-10 seconds for 5-second audio clips
GPU acceleration enabled via @spaces.GPU decorator

Troubleshooting

"Please upload an audio file" message

Ensure you've either uploaded or recorded audio before clicking "Process Audio"

No audio output

Check that TTS model loaded correctly
Check browser audio settings

Long processing time

Longer audio files take more time to process
First run may be slower due to model loading

Model loading errors

Check HF_TOKEN environment variable for Hugging Face authentication
Verify internet connection for model downloads

Differences from Streaming Mode

Feature	Streaming Mode (Old)	Offline Mode (New)
Input	Real-time microphone	Recorded files
Processing	Chunk-by-chunk	Complete file
Time Limit	5 minutes	None
Use Case	Live conversation	Batch processing
Complexity	High (state management)	Low (single pass)

Tips for Best Results

Audio Quality: Use clear audio with minimal background noise
Prompt Engineering: Craft specific prompts for better LLM responses
Model Selection: Experiment with different models for quality vs. speed tradeoffs
Audio Length: Start with shorter clips (5-15 seconds) for faster results