AudioDubb Project - Development Instructions

Project Overview

AudioDubb is an AI-powered multilingual audio dubbing engine optimized for Hugging Face Spaces deployment. It transcribes, translates, and generates dubbed audio while preserving speaker identity and emotional expression.

Checklist

  • Verify copilot-instructions.md file created
  • Clarify Project Requirements (Hugging Face Spaces audio dubbing app)
  • Scaffold the Project
  • Customize the Project
  • Install Required Extensions (Gradio built-in, no VS Code extensions needed)
  • Compile/Verify the Project
  • Create and Run Task
  • Launch the Project
  • Ensure Documentation is Complete

Project Structure Created

AudioDubb/
├── app.py                      # Main Gradio application
├── requirements.txt            # Python dependencies
├── .gitignore                  # Git ignore rules
├── README.md                   # Complete documentation
├── README_HF.md                # Hugging Face Spaces metadata
├── DEPLOYMENT.md               # HF Spaces deployment guide
├── .github/
│   └── copilot-instructions.md # Development instructions
└── src/
    ├── __init__.py
    ├── models/
    │   ├── __init__.py
    │   └── model_manager.py    # Model loading and caching
    └── core/
        ├── __init__.py
        ├── transcriber.py      # Whisper Large v3 integration
        ├── translator.py       # NLLB-200 Distilled integration
        ├── voice_cloner.py     # XTTS-v2 voice cloning
        ├── audio_processor.py  # Audio I/O and processing
        └── pipeline.py         # Complete dubbing workflow

Implementation Details

Components Implemented

  1. Model Manager: Singleton for efficient model caching with GPU support
  2. Transcriber: Whisper Large v3 with language detection
  3. Translator: NLLB-200 Distilled with batch processing
  4. Voice Cloner: XTTS-v2 with speaker identity preservation
  5. Audio Processor: File I/O, normalization, and cleanup
  6. Dubbing Pipeline: Orchestrates the complete transcribe, translate, and synthesize workflow
  7. Web Interface: Gradio app with advanced options
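
The Model Manager's singleton caching can be sketched as follows, assuming a thread-safe, load-once-per-process design. `ModelManager`, `get`, and `clear` are hypothetical names rather than the project's confirmed API, and each `loader` callable stands in for whatever actually constructs Whisper, NLLB-200, or XTTS-v2 (including moving the model to the GPU).

```python
import threading
from typing import Any, Callable, Dict


class ModelManager:
    """Process-wide model cache: each model loads once, then is reused (hypothetical API)."""

    _instance = None
    _instance_lock = threading.Lock()

    def __new__(cls):
        # Double-checked locking so concurrent Gradio requests share one instance.
        if cls._instance is None:
            with cls._instance_lock:
                if cls._instance is None:
                    instance = super().__new__(cls)
                    instance._models: Dict[str, Any] = {}
                    instance._load_lock = threading.Lock()
                    cls._instance = instance
        return cls._instance

    def get(self, name: str, loader: Callable[[], Any]) -> Any:
        """Return the cached model `name`, calling `loader` only on first use."""
        with self._load_lock:
            if name not in self._models:
                # In the real app the loader would build the model and move it to the GPU.
                self._models[name] = loader()
            return self._models[name]

    def clear(self) -> None:
        """Drop every cached model, e.g. to free memory."""
        with self._load_lock:
            self._models.clear()
```

Holding one lock while a loader runs serializes first-time loads. That is simple and prevents two concurrent requests from loading the same multi-gigabyte model twice.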

Key Features

  • ✅ In-memory only processing (no permanent storage)
  • ✅ GPU acceleration and model caching
  • ✅ 20+ language support
  • ✅ Speaker identity and emotion preservation
  • ✅ Privacy-first architecture (no logging)
  • ✅ Educational/personal use disclaimer
  • ✅ Hugging Face Spaces optimized
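
Supporting many languages across the three models means converting between language-code conventions: Whisper's language detection reports ISO 639-1 style codes such as `en`, while NLLB-200 expects FLORES-200 codes such as `eng_Latn`. A minimal lookup, assuming the transcriber's detected language is fed straight to the translator, might look like this. The table is only a subset, and `WHISPER_TO_NLLB` and `to_nllb_code` are hypothetical names.

```python
from typing import Dict

# Subset only: ISO 639-1 codes (as reported by Whisper) -> NLLB-200 FLORES-200 codes.
WHISPER_TO_NLLB: Dict[str, str] = {
    "en": "eng_Latn",
    "es": "spa_Latn",
    "fr": "fra_Latn",
    "de": "deu_Latn",
    "it": "ita_Latn",
    "pt": "por_Latn",
    "ru": "rus_Cyrl",
    "zh": "zho_Hans",
    "ja": "jpn_Jpan",
    "ko": "kor_Hang",
    "hi": "hin_Deva",
    "ar": "arb_Arab",
}


def to_nllb_code(whisper_lang: str) -> str:
    """Map a Whisper language code to the code the NLLB-200 tokenizer expects."""
    code = WHISPER_TO_NLLB.get(whisper_lang.strip().lower())
    if code is None:
        # Fail loudly instead of translating with the wrong source language.
        raise ValueError(f"Unsupported language for translation: {whisper_lang!r}")
    return code
```

Raising on unknown codes surfaces unsupported languages in the UI rather than silently producing a mistranslation.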

Key Requirements

  • ✅ Deployment to Hugging Face Spaces only (no local storage)
  • ✅ In-memory only processing (no permanent storage)
  • ✅ Models: Whisper Large v3, NLLB-200 Distilled, XTTS-v2
  • ✅ GPU acceleration and model caching
  • ✅ Gradio web interface for HF Spaces
  • ✅ Support for 20+ languages
  • ✅ Privacy: no logging, immediate cleanup
  • ✅ Educational/personal use disclaimer
  • ✅ Complete documentation for HF Spaces deployment
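
Some audio and model libraries accept only file paths, which sits awkwardly with the no-permanent-storage and immediate-cleanup requirements. One option, sketched here with the standard library only, is a context manager that writes a transient file and deletes it however the block exits. `ephemeral_audio_file` is a hypothetical helper name, not part of the project.

```python
import contextlib
import os
import tempfile
from typing import Iterator


@contextlib.contextmanager
def ephemeral_audio_file(data: bytes, suffix: str = ".wav") -> Iterator[str]:
    """Write bytes to a private temp file for path-only libraries, then always delete it."""
    # mkstemp creates the file readable and writable only by the current user.
    fd, path = tempfile.mkstemp(suffix=suffix)
    try:
        with os.fdopen(fd, "wb") as fh:
            fh.write(data)
        yield path
    finally:
        # Runs on success or on an exception inside the with-block.
        with contextlib.suppress(FileNotFoundError):
            os.remove(path)
```

Usage would be `with ephemeral_audio_file(upload_bytes) as path: ...`, passing `path` to the library; the file is gone as soon as the block ends.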

Files Created

  • app.py - Gradio web interface (HF Spaces only)
  • requirements.txt - Python dependencies
  • README.md - Complete documentation
  • DEPLOYMENT.md - HF Spaces deployment guide
  • README_HF.md - HF Spaces metadata
  • .gitignore - Git ignore rules
  • src/models/model_manager.py - Model caching and management
  • src/core/transcriber.py - Whisper transcription
  • src/core/translator.py - NLLB-200 translation
  • src/core/voice_cloner.py - XTTS-v2 voice cloning
  • src/core/audio_processor.py - Audio I/O and processing
  • src/core/pipeline.py - Dubbing workflow orchestration
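
The audio processor's normalization step can run entirely on in-memory buffers. This sketch peak-normalizes 16-bit PCM WAV bytes using Python's standard `wave`, `array`, and `io` modules; the function name and the 0.95 default target peak are assumptions, and a real implementation would more likely use NumPy or a dedicated audio library.

```python
import array
import io
import sys
import wave


def normalize_wav_bytes(wav_bytes: bytes, target_peak: float = 0.95) -> bytes:
    """Peak-normalize 16-bit PCM WAV data without touching the filesystem."""
    with wave.open(io.BytesIO(wav_bytes), "rb") as reader:
        params = reader.getparams()
        if params.sampwidth != 2:
            raise ValueError("this sketch only handles 16-bit PCM")
        frames = reader.readframes(params.nframes)

    samples = array.array("h")
    samples.frombytes(frames)
    if sys.byteorder == "big":
        samples.byteswap()  # WAV samples are little-endian

    # Scale so the loudest sample (any channel) hits target_peak of full scale.
    peak = max((abs(s) for s in samples), default=0)
    if peak > 0:
        gain = target_peak * 32767 / peak
        samples = array.array(
            "h", (max(-32768, min(32767, round(s * gain))) for s in samples)
        )

    if sys.byteorder == "big":
        samples.byteswap()  # back to little-endian for writing
    out = io.BytesIO()
    with wave.open(out, "wb") as writer:
        writer.setparams(params)
        writer.writeframes(samples.tobytes())
    return out.getvalue()
```

Silent input (peak of zero) is returned unchanged rather than divided by zero.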

Deployment Instructions

  1. Create a new Space at huggingface.co/spaces
  2. Select the Gradio SDK
  3. Upload all project files
  4. The Space builds and deploys the application automatically on HF infrastructure
  5. See DEPLOYMENT.md for detailed steps