Linly-Talker Gemini Live - Directory Structure
Linly-Talker/
β
βββ π Core Application Files
β βββ webui.py # Main Gradio WebUI (Gemini Live only)
β βββ app_gemini_live.py # Standalone Gemini Live app
β βββ app.py # Original multi-feature app
β βββ app_musetalk.py # MuseTalk-specific app
β βββ app_talk.py # SadTalker app
β βββ app_vits.py # VITS voice cloning app
β βββ app_multi.py # Multi-turn conversation app
β βββ app_img.py # Image-based app
β βββ configs.py # Configuration settings
β
βββ π€ LLM/ (Large Language Models)
β βββ GeminiLive.py # β WebSocket client for Gemini Live
β βββ Gemini.py # Standard Gemini API
β βββ Linly-api-fast.py # FastAPI LLM server
β βββ template.py # LLM template class
β βββ __init__.py # LLM module initialization
β βββ README.md # LLM documentation
β
βββ π TFG/ (Talking Face Generation)
β βββ MuseTalk.py # β MuseTalk real-time inference
β βββ MuseV.py # MuseV variant
β βββ SadTalker.py # SadTalker implementation
β βββ Wav2Lip.py # Wav2Lip lip-sync
β βββ Wav2Lipv2.py # Wav2Lip v2
β βββ NeRFTalk.py # NeRF-based talking face
β βββ Streamer.py # β Audio buffer for streaming
β βββ __init__.py # TFG module initialization
β βββ requirements_musetalk.txt # MuseTalk dependencies
β βββ requirements_nerf.txt # NeRF dependencies
β βββ README.md # TFG documentation
β
βββ π€ ASR/ (Automatic Speech Recognition)
β βββ Whisper.py # OpenAI Whisper
β βββ FunASR.py # FunASR implementation
β βββ OmniSenseVoice.py # OmniSenseVoice
β βββ __init__.py # ASR module initialization
β βββ requirements_funasr.txt # FunASR dependencies
β βββ requirements_OmniSenseVoice.txt
β βββ README.md # ASR documentation
β
βββ π TTS/ (Text-to-Speech)
β βββ EdgeTTS.py # Microsoft Edge TTS
β βββ PaddleTTS.py # PaddlePaddle TTS
β βββ XTTS.py # XTTS implementation
β βββ edge_app.py # EdgeTTS demo app
β βββ paddletts_app.py # PaddleTTS demo app
β βββ __init__.py # TTS module initialization
β βββ requirements_paddle.txt # PaddleTTS dependencies
β βββ README.md # TTS documentation
β
βββ π΅ Voice Cloning Models
β βββ GPT_SoVITS/ # GPT-SoVITS voice cloning (86 files)
β βββ VITS/ # VITS voice synthesis (8 files)
β βββ CosyVoice/ # CosyVoice model
β βββ ChatTTS/ # ChatTTS model
β
βββ π¬ Avatar Models & Data
β βββ Musetalk/ # MuseTalk models & data (57 files)
β β βββ models/ # Model weights
β β β βββ musetalk/ # Core MuseTalk models
β β β βββ dwpose/ # Pose detection models
β β β βββ face-parse-bisent/ # Face parsing models
β β βββ data/
β β βββ video/ # Avatar video sources
β β βββ yongen_musev.mp4 # Default avatar
β β
β βββ NeRF/ # NeRF models (59 files)
β βββ checkpoints/ # SadTalker checkpoints
β β βββ mapping_00109-model.pth.tar # 149MB
β β βββ mapping_00229-model.pth.tar # 149MB
β β βββ ...
β βββ face_detection/ # Face detection models (12 files)
β
βββ π API & Server
β βββ api/ # API implementations (8 files)
β
βββ π¦ Dependencies & Scripts
β βββ requirements.txt # Basic requirements
β βββ requirements_app.txt # App-specific requirements
β βββ requirements_webui.txt # β WebUI requirements (main)
β βββ scripts/ # Utility scripts (5 files)
β βββ download_models.sh # Auto-download models
β βββ modelscope_download.py # ModelScope downloader
β
βββ π Documentation
β βββ README.md # Main README (English)
β βββ README_zh.md # Chinese README
β βββ FAQ.md # β English FAQ (Gemini Live)
β βββ AutoDLι¨η½².md # AutoDL deployment guide
β βββ SECURITY.md # Security policy
β βββ docs/ # Additional documentation
β
βββ πΌοΈ Assets
β βββ inputs/ # Input files (4 files)
β βββ examples/ # Example files
β
βββ π§ Configuration
β βββ .gitignore # Git ignore rules
β βββ .gitmodules # Git submodules
β βββ configs.py # β Main configuration
β βββ https_cert/ # HTTPS certificates (2 files)
β
βββ π Notebooks
β βββ colab_webui.ipynb # Google Colab notebook
β
βββ π License & Source
βββ LICENSE # Apache 2.0 License
βββ src/ # Source code (151 files)
Key Files for Gemini Live Integration
Essential Components (⭐)
- webui.py: Main application entry point
- LLM/GeminiLive.py: WebSocket client for Gemini API
- TFG/MuseTalk.py: Real-time avatar rendering
- TFG/Streamer.py: Audio buffer management
- FAQ.md: Troubleshooting guide
- requirements_webui.txt: All dependencies
Model Weights (Must Download)
checkpoints/
├── mapping_00109-model.pth.tar    # 149 MB - SadTalker
├── mapping_00229-model.pth.tar    # 149 MB - SadTalker
└── ...
Musetalk/models/
├── musetalk/
│   ├── pytorch_model.bin          # Main MuseTalk model
│   └── ...
├── dwpose/
│   └── dw-ll_ucoco_384.pth        # Pose detection
└── face-parse-bisent/
    └── 79999_iter.pth             # Face parsing
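Before launching the app, it can help to confirm that the weight files listed above are actually in place. The following is a minimal sketch, not part of the repository: `REQUIRED_WEIGHTS` and `missing_weights` are hypothetical names, and the paths are copied from the listing above on the assumption that they are relative to the repository root. Files hidden behind `...` in the listing are not checked.

```python
from pathlib import Path

# Paths copied from the listing above (assumed relative to the repo root).
# Entries elided as "..." in the listing are intentionally not guessed here.
REQUIRED_WEIGHTS = [
    "checkpoints/mapping_00109-model.pth.tar",
    "checkpoints/mapping_00229-model.pth.tar",
    "Musetalk/models/musetalk/pytorch_model.bin",
    "Musetalk/models/dwpose/dw-ll_ucoco_384.pth",
    "Musetalk/models/face-parse-bisent/79999_iter.pth",
]


def missing_weights(repo_root="."):
    """Return the listed weight files that do not exist under repo_root."""
    root = Path(repo_root)
    return [rel for rel in REQUIRED_WEIGHTS if not (root / rel).is_file()]


if __name__ == "__main__":
    missing = missing_weights()
    if missing:
        print("Missing model weights:")
        for rel in missing:
            print(f"  {rel}")
    else:
        print("All listed model weights are present.")
```

Run it from the repository root; any path it prints still needs to be downloaded, for example with the scripts under `scripts/`.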
File Count Summary
| Category | Count |
|---|---|
| Core Apps | 8 files |
| LLM Module | 6 files |
| TFG Module | 11 files |
| ASR Module | 7 files |
| TTS Module | 8 files |
| Voice Cloning | ~100 files |
| Avatar Models | ~120 files |
| Documentation | 6 files |
| Total | ~260+ files |
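The counts in this table are approximate and may drift as the repository changes. As a rough way to recompute them for a local checkout, the sketch below tallies regular files per top-level directory. The function name `count_files_by_top_level` is hypothetical, and the grouping is by directory on disk, which does not map one-to-one onto the categories in the table.

```python
from collections import Counter
from pathlib import Path


def count_files_by_top_level(repo_root="."):
    """Count regular files under each top-level entry of repo_root.

    Files directly in repo_root are grouped under "(root)".
    Anything inside .git is skipped.
    """
    root = Path(repo_root)
    counts = Counter()
    for path in root.rglob("*"):
        if not path.is_file():
            continue
        parts = path.relative_to(root).parts
        if parts[0] == ".git":
            continue
        key = parts[0] + "/" if len(parts) > 1 else "(root)"
        counts[key] += 1
    return counts


if __name__ == "__main__":
    for name, n in sorted(count_files_by_top_level().items()):
        print(f"{name:30s} {n}")
```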
Disk Space Requirements
| Component | Size |
|---|---|
| Code & Scripts | ~50 MB |
| MuseTalk Models | ~2.5 GB |
| SadTalker Checkpoints | ~1.5 GB |
| Face Detection | ~500 MB |
| GPT-SoVITS (optional) | ~1 GB |
| Total (Minimum) | ~5.5 GB |
| Total (Full) | ~8 GB |
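To check whether a machine has room for the download before starting, the free space on the target drive can be compared against the two totals in the table. This is a small sketch using the standard library's `shutil.disk_usage`; the names `MINIMUM_GB`, `FULL_GB`, `free_space_gb`, and `install_tier` are hypothetical, and the thresholds are simply the table's approximate totals.

```python
import shutil

# Approximate totals taken from the table above, in GB.
MINIMUM_GB = 5.5
FULL_GB = 8.0


def free_space_gb(path="."):
    """Free space on the filesystem containing path, in GB (10**9 bytes)."""
    return shutil.disk_usage(path).free / 1e9


def install_tier(free_gb):
    """Classify an amount of free space against the table's totals."""
    if free_gb >= FULL_GB:
        return "full"
    if free_gb >= MINIMUM_GB:
        return "minimum"
    return "insufficient"


if __name__ == "__main__":
    free = free_space_gb()
    print(f"Free space: {free:.1f} GB ({install_tier(free)})")
```

The check assumes nothing has been downloaded yet; on a partially populated checkout it will overstate how much space is still needed.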
Quick Navigation
- Start Here: webui.py
- Configuration: configs.py
- Gemini Integration: LLM/GeminiLive.py
- Avatar Rendering: TFG/MuseTalk.py
- Audio Streaming: TFG/Streamer.py
- Troubleshooting: FAQ.md
- Installation: requirements_webui.txt
Last Updated: February 2026
Repository: Kedreamix/Linly-Talker