Linly-Talker Gemini Live - Directory Structure

Linly-Talker/
│
├── 📄 Core Application Files
│   ├── webui.py                    # Main Gradio WebUI (Gemini Live only)
│   ├── app_gemini_live.py          # Standalone Gemini Live app
│   ├── app.py                      # Original multi-feature app
│   ├── app_musetalk.py             # MuseTalk-specific app
│   ├── app_talk.py                 # SadTalker app
│   ├── app_vits.py                 # VITS voice cloning app
│   ├── app_multi.py                # Multi-turn conversation app
│   ├── app_img.py                  # Image-based app
│   └── configs.py                  # Configuration settings
│
├── 🤖 LLM/ (Large Language Models)
│   ├── GeminiLive.py              # ⭐ WebSocket client for Gemini Live
│   ├── Gemini.py                  # Standard Gemini API
│   ├── Linly-api-fast.py          # FastAPI LLM server
│   ├── template.py                # LLM template class
│   ├── __init__.py                # LLM module initialization
│   └── README.md                  # LLM documentation
│
├── 🎭 TFG/ (Talking Face Generation)
│   ├── MuseTalk.py                # ⭐ MuseTalk real-time inference
│   ├── MuseV.py                   # MuseV variant
│   ├── SadTalker.py               # SadTalker implementation
│   ├── Wav2Lip.py                 # Wav2Lip lip-sync
│   ├── Wav2Lipv2.py               # Wav2Lip v2
│   ├── NeRFTalk.py                # NeRF-based talking face
│   ├── Streamer.py                # ⭐ Audio buffer for streaming
│   ├── __init__.py                # TFG module initialization
│   ├── requirements_musetalk.txt  # MuseTalk dependencies
│   ├── requirements_nerf.txt      # NeRF dependencies
│   └── README.md                  # TFG documentation
│
├── 🎤 ASR/ (Automatic Speech Recognition)
│   ├── Whisper.py                 # OpenAI Whisper
│   ├── FunASR.py                  # FunASR implementation
│   ├── OmniSenseVoice.py          # OmniSenseVoice
│   ├── __init__.py                # ASR module initialization
│   ├── requirements_funasr.txt    # FunASR dependencies
│   ├── requirements_OmniSenseVoice.txt
│   └── README.md                  # ASR documentation
│
├── 🔊 TTS/ (Text-to-Speech)
│   ├── EdgeTTS.py                 # Microsoft Edge TTS
│   ├── PaddleTTS.py               # PaddlePaddle TTS
│   ├── XTTS.py                    # XTTS implementation
│   ├── edge_app.py                # EdgeTTS demo app
│   ├── paddletts_app.py           # PaddleTTS demo app
│   ├── __init__.py                # TTS module initialization
│   ├── requirements_paddle.txt    # PaddleTTS dependencies
│   └── README.md                  # TTS documentation
│
├── 🎵 Voice Cloning Models
│   ├── GPT_SoVITS/                # GPT-SoVITS voice cloning (86 files)
│   ├── VITS/                      # VITS voice synthesis (8 files)
│   ├── CosyVoice/                 # CosyVoice model
│   └── ChatTTS/                   # ChatTTS model
│
├── 🎬 Avatar Models & Data
│   ├── Musetalk/                  # MuseTalk models & data (57 files)
│   │   ├── models/                # Model weights
│   │   │   ├── musetalk/          # Core MuseTalk models
│   │   │   ├── dwpose/            # Pose detection models
│   │   │   └── face-parse-bisent/ # Face parsing models
│   │   └── data/
│   │       └── video/             # Avatar video sources
│   │           └── yongen_musev.mp4  # Default avatar
│   │
│   ├── NeRF/                      # NeRF models (59 files)
│   ├── checkpoints/               # SadTalker checkpoints
│   │   ├── mapping_00109-model.pth.tar  # 149MB
│   │   ├── mapping_00229-model.pth.tar  # 149MB
│   │   └── ...
│   └── face_detection/            # Face detection models (12 files)
│
├── 🌐 API & Server
│   └── api/                       # API implementations (8 files)
│
├── 📦 Dependencies & Scripts
│   ├── requirements.txt           # Basic requirements
│   ├── requirements_app.txt       # App-specific requirements
│   ├── requirements_webui.txt     # ⭐ WebUI requirements (main)
│   └── scripts/                   # Utility scripts (5 files)
│       ├── download_models.sh     # Auto-download models
│       └── modelscope_download.py # ModelScope downloader
│
├── 📚 Documentation
│   ├── README.md                  # Main README (English)
│   ├── README_zh.md               # Chinese README
│   ├── FAQ.md                     # ⭐ English FAQ (Gemini Live)
│   ├── AutoDL部署.md              # AutoDL deployment guide
│   ├── SECURITY.md                # Security policy
│   └── docs/                      # Additional documentation
│
├── 🖼️ Assets
│   ├── inputs/                    # Input files (4 files)
│   └── examples/                  # Example files
│
├── 🔧 Configuration
│   ├── .gitignore                 # Git ignore rules
│   ├── .gitmodules                # Git submodules
│   ├── configs.py                 # ⭐ Main configuration
│   └── https_cert/                # HTTPS certificates (2 files)
│
├── 📓 Notebooks
│   └── colab_webui.ipynb          # Google Colab notebook
│
└── 📜 License & Source
    ├── LICENSE                    # Apache 2.0 License
    └── src/                       # Source code (151 files)

Key Files for Gemini Live Integration

Essential Components (⭐)

  1. webui.py - Main application entry point
  2. LLM/GeminiLive.py - WebSocket client for Gemini API
  3. TFG/MuseTalk.py - Real-time avatar rendering
  4. TFG/Streamer.py - Audio buffer management
  5. FAQ.md - Troubleshooting guide
  6. requirements_webui.txt - All dependencies
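
As a quick sanity check before launching, the six essential paths above can be verified with a short script. This is a minimal sketch, not part of the repository: the helper name `find_missing` is hypothetical, and it assumes the script is run from the repository root.

```python
from pathlib import Path

# Paths copied from the "Essential Components" list above.
ESSENTIAL_FILES = [
    "webui.py",
    "LLM/GeminiLive.py",
    "TFG/MuseTalk.py",
    "TFG/Streamer.py",
    "FAQ.md",
    "requirements_webui.txt",
]


def find_missing(repo_root, relative_paths=ESSENTIAL_FILES):
    """Return the relative paths that are not regular files under repo_root."""
    root = Path(repo_root)
    return [rel for rel in relative_paths if not (root / rel).is_file()]


if __name__ == "__main__":
    missing = find_missing(".")
    if missing:
        print("Missing essential files:")
        for rel in missing:
            print(f"  {rel}")
    else:
        print("All essential files are present.")
```

An empty result only confirms that the files exist; it says nothing about whether their contents are current.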

Model Weights (Must Download)

checkpoints/
├── mapping_00109-model.pth.tar    # 149MB - SadTalker
├── mapping_00229-model.pth.tar    # 149MB - SadTalker
└── ...

Musetalk/models/
├── musetalk/
│   ├── pytorch_model.bin          # Main MuseTalk model
│   └── ...
├── dwpose/
│   └── dw-ll_ucoco_384.pth        # Pose detection
└── face-parse-bisent/
    └── 79999_iter.pth             # Face parsing
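
To see which of the weights named above have already been downloaded, a small size report can help. The sketch below is illustrative only: `weight_report` is a hypothetical helper, and the list covers just the files spelled out above (the entries elided with `...` are not included, so a clean report does not prove the download is complete).

```python
from pathlib import Path

# Only the weight files explicitly named in the listing above.
MODEL_WEIGHTS = [
    "checkpoints/mapping_00109-model.pth.tar",
    "checkpoints/mapping_00229-model.pth.tar",
    "Musetalk/models/musetalk/pytorch_model.bin",
    "Musetalk/models/dwpose/dw-ll_ucoco_384.pth",
    "Musetalk/models/face-parse-bisent/79999_iter.pth",
]


def weight_report(repo_root, relative_paths=MODEL_WEIGHTS):
    """Map each path to its size in MB (10**6 bytes), or None if it is missing."""
    root = Path(repo_root)
    report = {}
    for rel in relative_paths:
        path = root / rel
        if path.is_file():
            report[rel] = round(path.stat().st_size / 1_000_000, 1)
        else:
            report[rel] = None
    return report


if __name__ == "__main__":
    for rel, size_mb in weight_report(".").items():
        status = "MISSING" if size_mb is None else f"{size_mb} MB"
        print(f"{rel}: {status}")
```

A file that is present but far smaller than expected (for example, well under the 149MB noted for the SadTalker mapping checkpoints) may indicate an interrupted download.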

File Count Summary

| Category | Count |
| --- | --- |
| Core Apps | 8 files |
| LLM Module | 6 files |
| TFG Module | 11 files |
| ASR Module | 7 files |
| TTS Module | 8 files |
| Voice Cloning | ~100 files |
| Avatar Models | ~120 files |
| Documentation | 6 files |
| Total | ~260+ files |
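
The counts above are approximate, so it may be useful to recount a local checkout. The following sketch tallies files by top-level directory; the function name `count_files_by_top_level` and the `(root)` label are illustrative choices, and the grouping will not match the categories above exactly because those combine several directories.

```python
from collections import Counter
from pathlib import Path


def count_files_by_top_level(repo_root):
    """Count regular files under repo_root, grouped by top-level directory."""
    root = Path(repo_root)
    counts = Counter()
    for path in root.rglob("*"):
        if not path.is_file():
            continue
        parts = path.relative_to(root).parts
        if ".git" in parts:
            continue  # skip Git metadata
        # Files directly in the root are grouped under "(root)".
        key = parts[0] + "/" if len(parts) > 1 else "(root)"
        counts[key] += 1
    return dict(counts)


if __name__ == "__main__":
    for name, count in sorted(count_files_by_top_level(".").items()):
        print(f"{name:<20} {count}")
```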

Disk Space Requirements

| Component | Size |
| --- | --- |
| Code & Scripts | ~50 MB |
| MuseTalk Models | ~2.5 GB |
| SadTalker Checkpoints | ~1.5 GB |
| Face Detection | ~500 MB |
| GPT-SoVITS (optional) | ~1 GB |
| Total (Minimum) | ~5.5 GB |
| Total (Full) | ~8 GB |
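
Before downloading the models, the free space on the target drive can be compared against the totals above. This is a rough sketch using the standard library's `shutil.disk_usage`; the constants are taken from the table, the helper names are hypothetical, and 1 GB is treated as 10**9 bytes.

```python
import shutil

# Approximate totals from the table above.
MINIMUM_GB = 5.5
FULL_GB = 8.0


def free_space_gb(path="."):
    """Return free space on the filesystem containing path, in GB (10**9 bytes)."""
    return shutil.disk_usage(path).free / 1e9


def has_room_for(required_gb, path="."):
    """Return True if at least required_gb of free space is available at path."""
    return free_space_gb(path) >= required_gb


if __name__ == "__main__":
    free = free_space_gb(".")
    print(f"Free space: {free:.1f} GB")
    print(f"Enough for minimum install (~{MINIMUM_GB} GB): {has_room_for(MINIMUM_GB)}")
    print(f"Enough for full install (~{FULL_GB} GB): {has_room_for(FULL_GB)}")
```

Since the listed sizes are estimates, leaving some headroom beyond the stated totals is a sensible precaution.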

Quick Navigation

  • Start Here: webui.py
  • Configuration: configs.py
  • Gemini Integration: LLM/GeminiLive.py
  • Avatar Rendering: TFG/MuseTalk.py
  • Audio Streaming: TFG/Streamer.py
  • Troubleshooting: FAQ.md
  • Installation: requirements_webui.txt

Last Updated: February 2026
Repository: Kedreamix/Linly-Talker