---
title: LFM2-Audio Real-time Speech-to-Speech
emoji: πŸŽ™οΈ
colorFrom: purple
colorTo: pink
sdk: docker
app_port: 7860
pinned: false
license: other
---

# LFM2-Audio Real-time Speech-to-Speech Chat

Real-time WebRTC streaming demo of LFM2-Audio-1.5B, Liquid AI's first end-to-end audio foundation model.

## ✨ Features

- **πŸ”΄ Real-time WebRTC streaming** - Responses begin playing as they are generated
- **πŸŽ™οΈ Continuous listening** - Natural conversation flow with automatic pause detection
- **πŸ’¬ Interleaved output** - Simultaneous text and audio generation
- **πŸ”„ Multi-turn memory** - Context-aware conversations
- **⚑ Low latency** - Optimized for real-time interaction

## πŸš€ How to Use

1. **Grant microphone access** when prompted by your browser
2. **Start speaking** - The model listens continuously
3. **Pause briefly** - The model detects pauses and responds automatically
4. **Continue conversation** - Build multi-turn dialogues naturally
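In the demo, pause detection is handled by fastrtc's voice activity detection, but the underlying idea can be sketched with a simple energy threshold. This is an illustrative sketch, not the demo's actual detector; the frame length and threshold values are assumptions:

```python
import numpy as np

def ends_with_pause(audio, sample_rate=24_000, pause_ms=400, threshold=0.01):
    """Return True if the last `pause_ms` of audio is quieter than `threshold`.

    `audio` is a 1-D array of samples; a low root-mean-square (RMS) energy
    over the trailing window is treated as a pause.
    """
    tail = audio[-int(sample_rate * pause_ms / 1000):]
    rms = np.sqrt(np.mean(tail.astype(np.float64) ** 2))  # RMS energy of the tail
    return rms < threshold
```

A real VAD (like the one fastrtc uses) is more robust than a fixed threshold, but the contract is the same: once the trailing audio looks silent, the model's turn begins.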

## πŸŽ›οΈ Parameters

### Temperature
- **0**: Greedy decoding (most deterministic)
- **1.0**: Default (balanced creativity and coherence)
- **2.0**: Upper end of the range (more diverse, less predictable outputs)

### Top-k
- **0**: No filtering (full vocabulary)
- **4**: Default (conservative, high quality)
- **Higher values**: More diverse but potentially less coherent
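The two parameters combine in a standard top-k / temperature sampling step. A minimal sketch of that step (not the demo's actual decoding loop; the function name and defaults here are illustrative):

```python
import math
import random

def sample_token(logits, temperature=1.0, top_k=4, rng=random):
    """Pick a token index from raw logits using temperature + top-k sampling."""
    # temperature == 0 -> greedy decoding: always take the most likely token
    if temperature == 0:
        return max(range(len(logits)), key=lambda i: logits[i])

    # top_k == 0 -> no filtering; otherwise keep only the k most likely tokens
    candidates = list(range(len(logits)))
    if top_k > 0:
        candidates = sorted(candidates, key=lambda i: logits[i], reverse=True)[:top_k]

    # Softmax over the surviving candidates, scaled by temperature
    scaled = [logits[i] / temperature for i in candidates]
    m = max(scaled)
    weights = [math.exp(s - m) for s in scaled]
    return rng.choices(candidates, weights=weights, k=1)[0]
```

With `temperature=0` the output is fully deterministic; raising the temperature flattens the distribution over the `top_k` survivors, which is why higher values read as "more creative".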

## πŸ—οΈ Technical Details

- **Model**: LFM2-Audio-1.5B
- **Generation Mode**: Interleaved (optimized for real-time)
- **Audio Codec**: Mimi (24 kHz)
- **Streaming**: WebRTC via fastrtc
- **Backend**: PyTorch with CUDA acceleration

## πŸ”§ Differences from Standard Demo

This demo uses **fastrtc** for WebRTC streaming, enabling:
- Continuous audio streaming without manual recording
- Automatic voice activity detection (VAD)
- Lower latency through chunked processing
- More natural conversation flow
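fastrtc handlers are plain Python generators: they receive a `(sample_rate, samples)` tuple and yield audio back in small chunks, which is what makes chunked, low-latency playback possible. A hedged sketch of that handler shape, where the real demo would run model inference this example just echoes the input; the fastrtc wiring in the trailing comment is assumed from its documentation and is not executed here:

```python
import numpy as np

def handler(audio, chunk_ms=200):
    """Echo handler in the shape fastrtc expects: yield audio in small
    chunks so playback can start before the full response is ready."""
    sample_rate, samples = audio
    chunk = int(sample_rate * chunk_ms / 1000)
    for start in range(0, len(samples), chunk):
        # In the real demo, each yield would carry a chunk of generated speech.
        yield (sample_rate, samples[start:start + chunk])

# Wiring it up (requires fastrtc; not run here):
# from fastrtc import Stream, ReplyOnPause
# Stream(handler=ReplyOnPause(handler), modality="audio", mode="send-receive").ui.launch()
```

Because each yielded chunk is sent over WebRTC immediately, the listener hears the start of the response while the rest is still being generated.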

## πŸ“š Resources

- [Liquid AI Website](https://www.liquid.ai/)
- [GitHub Repository](https://github.com/Liquid4All/liquid-audio/)
- [Model on Hugging Face](https://huggingface.co/LiquidAI/LFM2-Audio-1.5B)
- [fastrtc Documentation](https://github.com/freddyaboulton/fastrtc)

## πŸ“ License

Licensed under the LFM Open License v1.0

## πŸ’‘ Tips

- Speak clearly and pause briefly between thoughts
- Use a good quality microphone for best results
- Adjust temperature for different creativity levels
- Lower top-k values produce more consistent responses
- GPU acceleration is recommended for real-time performance