metadata
title: LFM2-Audio Real-time Speech-to-Speech
emoji: 🎙️
colorFrom: purple
colorTo: pink
sdk: docker
app_port: 7860
pinned: false
license: other

LFM2-Audio Real-time Speech-to-Speech Chat

Real-time WebRTC streaming demo of LFM2-Audio-1.5B, Liquid AI's first end-to-end audio foundation model.

✨ Features

  • 🔴 Real-time WebRTC streaming - Instant response with minimal latency
  • 🎙️ Continuous listening - Natural conversation flow with automatic pause detection
  • 💬 Interleaved output - Simultaneous text and audio generation
  • 🔄 Multi-turn memory - Context-aware conversations
  • ⚡ Low latency - Optimized for real-time interaction

🚀 How to Use

  1. Grant microphone access when prompted by your browser
  2. Start speaking - The model listens continuously
  3. Pause briefly - The model detects pauses and responds automatically
  4. Continue conversation - Build multi-turn dialogues naturally

🎛️ Parameters

Temperature

  • 0: Greedy decoding (deterministic; always picks the most likely token)
  • 1.0: Default (balanced creativity and coherence)
  • 2.0: Maximum creativity (more diverse, less predictable outputs)
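
To make the effect of these values concrete, here is a minimal sketch of temperature scaling in plain Python. It is an illustration of the general technique, not the demo's actual sampling code; the function names and the example logits are hypothetical.

```python
import math
import random


def temperature_probs(logits, temperature):
    """Softmax over logits divided by temperature (temperature must be > 0)."""
    scaled = [x / temperature for x in logits]
    peak = max(scaled)  # subtract the max so exp() cannot overflow
    exps = [math.exp(x - peak) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def sample_with_temperature(logits, temperature, rng=random):
    """Return a token index; temperature 0 falls back to greedy argmax."""
    if temperature <= 0:
        return max(range(len(logits)), key=lambda i: logits[i])
    probs = temperature_probs(logits, temperature)
    return rng.choices(range(len(logits)), weights=probs, k=1)[0]


logits = [1.0, 3.0, 2.0]  # made-up scores for a 3-token vocabulary
for t in (0.5, 1.0, 2.0):
    # Higher temperatures spread probability more evenly across tokens.
    print(t, [round(p, 3) for p in temperature_probs(logits, t)])
```

Raising the temperature flattens the distribution, which is why outputs become more varied; at 0 the sketch skips sampling entirely and returns the top-scoring token.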

Top-k

  • 0: No filtering (full vocabulary)
  • 4: Default (conservative, high quality)
  • Higher values: More diverse but potentially less coherent
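
As a rough illustration of what the top-k setting does, the sketch below masks every logit outside the k largest before sampling. It assumes the common convention, matching the list above, that k = 0 disables filtering; `top_k_filter` and the example scores are hypothetical and not taken from the demo's code.

```python
def top_k_filter(logits, k):
    """Keep the k largest logits; set the rest to -inf so they are never sampled."""
    if k <= 0 or k >= len(logits):
        return list(logits)  # k = 0 (or k >= vocabulary size) means no filtering
    ranked = sorted(range(len(logits)), key=lambda i: logits[i], reverse=True)
    keep = set(ranked[:k])
    return [x if i in keep else float("-inf") for i, x in enumerate(logits)]


scores = [0.1, 2.0, 1.5, -1.0, 0.7]  # made-up scores for a 5-token vocabulary
print(top_k_filter(scores, 2))  # only the two highest-scoring tokens survive
print(top_k_filter(scores, 0))  # unchanged: full vocabulary
```

A small k such as the default of 4 restricts sampling to a handful of likely tokens, which tends to trade diversity for consistency.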

🏗️ Technical Details

  • Model: LFM2-Audio-1.5B
  • Generation Mode: Interleaved (optimized for real-time)
  • Audio Codec: Mimi (24 kHz)
  • Streaming: WebRTC via fastrtc
  • Backend: PyTorch with CUDA acceleration

🔧 Differences from Standard Demo

This demo uses fastrtc for WebRTC streaming, enabling:

  • Continuous audio streaming without manual recording
  • Automatic voice activity detection (VAD)
  • Lower latency through chunked processing
  • More natural conversation flow
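
The idea of chunked processing with pause detection can be sketched as follows. This is a simplified energy-threshold stand-in written for illustration: it is not fastrtc's VAD, and the chunk length, threshold, pause duration, and the assumption that input arrives at 24 kHz are all hypothetical choices.

```python
import math

SAMPLE_RATE = 24_000  # Hz; assumed input rate, borrowed from the 24 kHz codec rate
CHUNK_MS = 20         # assumed chunk duration, chosen for illustration
CHUNK_SAMPLES = SAMPLE_RATE * CHUNK_MS // 1000  # samples per chunk


def rms(chunk):
    """Root-mean-square energy of one chunk of float samples."""
    return math.sqrt(sum(s * s for s in chunk) / len(chunk)) if chunk else 0.0


def find_pause(samples, threshold=0.01, pause_ms=300):
    """Return the sample index at which a pause is confirmed, or None.

    A pause is confirmed once some speech has been heard and then
    pause_ms worth of consecutive chunks stay below the threshold.
    """
    needed = pause_ms // CHUNK_MS
    quiet = 0
    heard_speech = False
    for start in range(0, len(samples) - CHUNK_SAMPLES + 1, CHUNK_SAMPLES):
        chunk = samples[start:start + CHUNK_SAMPLES]
        if rms(chunk) >= threshold:
            heard_speech = True
            quiet = 0  # any loud chunk resets the silence counter
        elif heard_speech:
            quiet += 1
            if quiet >= needed:
                return start + CHUNK_SAMPLES
    return None


# Half a second of synthetic "speech" followed by half a second of silence.
speech = [0.5, -0.5] * (SAMPLE_RATE // 4)
silence = [0.0] * (SAMPLE_RATE // 2)
print(find_pause(speech + silence))
```

Working on short chunks lets a system react as soon as the silence window has elapsed, instead of waiting for the user to stop a recording manually.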

📚 Resources

📝 License

Licensed under the LFM Open License v1.0

💡 Tips

  • Speak clearly and pause briefly between thoughts
  • Use a good quality microphone for best results
  • Adjust temperature for different creativity levels
  • Lower top-k values produce more consistent responses
  • GPU acceleration is recommended for real-time performance