AudioDubb Project - Development Instructions

Project Overview

AudioDubb is an AI-powered multilingual audio dubbing engine optimized for deployment on Hugging Face Spaces. It transcribes speech with Whisper Large v3, translates it with NLLB-200 Distilled, and generates dubbed audio with XTTS-v2 voice cloning while preserving the original speaker's identity and emotional expression.
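The transcribe, translate, and synthesize flow described above can be sketched as a single orchestration function. This is a minimal sketch, not the project's pipeline.py: the `Segment` type, the `dub` function, and the three stage signatures are hypothetical. The stages are passed in as callables so that real implementations (Whisper in transcriber.py, NLLB-200 in translator.py, XTTS-v2 in voice_cloner.py) could be plugged in, and so the flow can be tested with stubs.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Segment:
    """One transcribed span of speech; times are in seconds."""
    start: float
    end: float
    text: str


# Hypothetical stage signatures; the real modules may differ.
Transcribe = Callable[[bytes], List[Segment]]
Translate = Callable[[List[str], str], List[str]]
Synthesize = Callable[[str, bytes, str], bytes]


def dub(audio: bytes, target_lang: str, transcribe: Transcribe,
        translate: Translate, synthesize: Synthesize) -> List[bytes]:
    """Transcribe, translate, then re-voice each segment in the target language."""
    segments = transcribe(audio)
    translated = translate([seg.text for seg in segments], target_lang)
    if len(translated) != len(segments):
        raise ValueError("translator must return one string per segment")
    # The original audio is passed to the synthesizer as the speaker
    # reference, so a voice-cloning model can reproduce the source speaker.
    return [synthesize(text, audio, target_lang) for text in translated]
```

The segment timings are kept so that synthesized clips could later be aligned with the original timeline; that alignment step is not shown here.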

Checklist

Verify copilot-instructions.md Has Been Created
Clarify Project Requirements (Hugging Face Spaces audio dubbing app)
Scaffold the Project
Customize the Project
Install Required Extensions (Gradio built-in, no VS Code extensions needed)
Compile/Verify the Project
Create and Run a Task
Launch the Project
Ensure Documentation is Complete

Project Structure Created

AudioDubb/
├── app.py                      # Main Gradio application
├── requirements.txt            # Python dependencies
├── .gitignore                  # Git ignore rules
├── README.md                   # Complete documentation
├── README_HF.md                # Hugging Face Spaces metadata
├── DEPLOYMENT.md               # HF Spaces deployment guide
├── .github/
│   └── copilot-instructions.md # Development instructions
└── src/
    ├── __init__.py
    ├── models/
    │   ├── __init__.py
    │   └── model_manager.py    # Model loading and caching
    └── core/
        ├── __init__.py
        ├── transcriber.py      # Whisper Large v3 integration
        ├── translator.py       # NLLB-200 Distilled integration
        ├── voice_cloner.py     # XTTS-v2 voice cloning
        ├── audio_processor.py  # Audio I/O and processing
        └── pipeline.py         # Complete dubbing workflow

Implementation Details

Components Implemented

Model Manager: Singleton for efficient model caching with GPU support
Transcriber: Whisper Large v3 with language detection
Translator: NLLB-200 Distilled with batch processing
Voice Cloner: XTTS-v2 with speaker identity preservation
Audio Processor: File I/O, normalization, and cleanup
Dubbing Pipeline: Orchestrates complete workflow
Web Interface: Gradio app with advanced options
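The Model Manager's role as a caching singleton can be illustrated with a small thread-safe class. This is a sketch under assumptions, not the contents of model_manager.py: the `ModelManager.get(name, loader)` interface is hypothetical, and `loader` stands in for whatever code actually constructs a model (for example Whisper Large v3) and moves it to the GPU.

```python
import threading
from typing import Any, Callable, Dict


class ModelManager:
    """Process-wide cache: each model is loaded once, then reused."""

    _instance = None
    _instance_lock = threading.Lock()

    def __new__(cls):
        # Double-checked locking so concurrent requests share one instance.
        if cls._instance is None:
            with cls._instance_lock:
                if cls._instance is None:
                    inst = super().__new__(cls)
                    inst._models: Dict[str, Any] = {}
                    inst._load_lock = threading.Lock()
                    cls._instance = inst
        return cls._instance

    def get(self, name: str, loader: Callable[[], Any]) -> Any:
        """Return the cached model `name`, calling `loader` only on first use."""
        # Holding the lock while loading prevents two requests from
        # loading the same large model twice.
        with self._load_lock:
            if name not in self._models:
                self._models[name] = loader()
            return self._models[name]
```

Because every request goes through the same instance, the expensive load happens once per Space process rather than once per request.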

Key Features

✅ In-memory only processing (no permanent storage)
✅ GPU acceleration and model caching
✅ 20+ language support
✅ Speaker identity and emotion preservation
✅ Privacy-first architecture (no logging)
✅ Educational/personal use disclaimer
✅ Hugging Face Spaces optimized
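In-memory-only processing means audio can be decoded, transformed, and re-encoded without ever being written to disk. The sketch below shows the idea with the standard-library `wave`, `io`, and `array` modules by peak-normalizing a WAV file held in a `bytes` object. It assumes 16-bit PCM WAV input; the function name and the 0.95 target peak are illustrative choices, not taken from audio_processor.py.

```python
import array
import io
import sys
import wave


def normalize_wav_bytes(data: bytes, peak: float = 0.95) -> bytes:
    """Peak-normalize 16-bit PCM WAV bytes entirely in memory (hypothetical helper)."""
    with wave.open(io.BytesIO(data), "rb") as src:
        params = src.getparams()
        frames = src.readframes(src.getnframes())
    if params.sampwidth != 2:
        raise ValueError("this sketch handles 16-bit PCM only")

    samples = array.array("h", frames)
    if sys.byteorder == "big":
        samples.byteswap()  # WAV sample data is little-endian

    current = max((abs(s) for s in samples), default=0)
    if current:  # leave pure silence untouched
        gain = peak * 32767 / current
        samples = array.array(
            "h", (max(-32768, min(32767, round(s * gain))) for s in samples)
        )

    if sys.byteorder == "big":
        samples.byteswap()
    out = io.BytesIO()  # the result also stays in memory
    with wave.open(out, "wb") as dst:
        dst.setparams(params)
        dst.writeframes(samples.tobytes())
    return out.getvalue()
```

Since neither the input nor the output touches the filesystem, there is nothing to clean up afterwards beyond letting the `bytes` objects go out of scope.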

Key Requirements

✅ Hugging Face Spaces ONLY deployment (no local storage)
✅ In-memory only processing (no permanent storage)
✅ Models: Whisper Large v3, NLLB-200 Distilled, XTTS-v2
✅ GPU acceleration and model caching
✅ Gradio web interface for HF Spaces
✅ Support 20+ languages
✅ Privacy: No logging, immediate cleanup
✅ Educational/personal use disclaimer
✅ Complete documentation for HF Spaces deployment
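Supporting many languages across Whisper and NLLB-200 requires translating between their language identifiers: Whisper reports languages as short codes such as "en", while NLLB-200 expects FLORES-200 codes such as "eng_Latn". The table below is a partial, illustrative subset, and the names `WHISPER_TO_NLLB` and `to_nllb_code` are hypothetical; a real table would need an entry for every language the Space offers.

```python
# Illustrative subset: Whisper language code -> NLLB-200 (FLORES-200) code.
WHISPER_TO_NLLB = {
    "en": "eng_Latn",
    "es": "spa_Latn",
    "fr": "fra_Latn",
    "de": "deu_Latn",
    "it": "ita_Latn",
    "pt": "por_Latn",
    "ru": "rus_Cyrl",
    "hi": "hin_Deva",
    "ja": "jpn_Jpan",
    "zh": "zho_Hans",
    "ar": "arb_Arab",
    "ko": "kor_Hang",
}


def to_nllb_code(whisper_code: str) -> str:
    """Map a Whisper language code (e.g. "en") to its NLLB-200 code."""
    try:
        return WHISPER_TO_NLLB[whisper_code.strip().lower()]
    except KeyError:
        raise ValueError(f"no NLLB-200 code mapped for {whisper_code!r}") from None
```

Failing loudly on an unmapped code is preferable to silently translating into the wrong language.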

Files Created

app.py - Gradio web interface (HF Spaces only)
requirements.txt - Python dependencies
README.md - Complete documentation
DEPLOYMENT.md - HF Spaces deployment guide
README_HF.md - HF Spaces metadata
.gitignore - Git ignore rules
src/models/model_manager.py - Model caching and management
src/core/transcriber.py - Whisper transcription
src/core/translator.py - NLLB-200 translation
src/core/voice_cloner.py - XTTS-v2 voice cloning
src/core/audio_processor.py - Audio I/O and processing
src/core/pipeline.py - Dubbing workflow orchestration
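The component list notes that the NLLB-200 translator in src/core/translator.py uses batch processing. The general pattern is to split the transcribed segments into fixed-size batches, translate each batch in one model call, and concatenate the results in order. In this sketch, `chunk_texts`, `translate_in_batches`, and the `translate_batch` callable are hypothetical, and the default batch size of 8 is arbitrary.

```python
from typing import Callable, Iterator, List, Sequence


def chunk_texts(texts: Sequence[str], batch_size: int) -> Iterator[List[str]]:
    """Yield consecutive slices of at most batch_size texts."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    for start in range(0, len(texts), batch_size):
        yield list(texts[start:start + batch_size])


def translate_in_batches(texts: Sequence[str],
                         translate_batch: Callable[[List[str]], List[str]],
                         batch_size: int = 8) -> List[str]:
    """Translate texts batch by batch, preserving input order."""
    results: List[str] = []
    for batch in chunk_texts(texts, batch_size):
        translated = translate_batch(batch)
        if len(translated) != len(batch):
            raise ValueError("translate_batch must return one output per input")
        results.extend(translated)
    return results
```

Larger batches generally use the GPU more efficiently but need more memory, so the batch size is a tuning knob for the Space's hardware.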

Deployment Instructions

Create a new Space at huggingface.co/spaces
Select the Gradio SDK
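Upload all project files (Spaces reads its configuration from the YAML header of README.md, so the metadata in README_HF.md must be at the top of the uploaded README.md)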
The application builds and deploys automatically on HF infrastructure
See DEPLOYMENT.md for detailed steps
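Because the requirements restrict deployment to Hugging Face Spaces, app.py could check at startup that it is actually running on a Space. Running Spaces expose a `SPACE_ID` environment variable holding the "owner/name" identifier. The helper names below are hypothetical, and refusing to start elsewhere is an assumed policy, not something the project is confirmed to do.

```python
import os
from typing import Mapping, Optional


def current_space_id(env: Optional[Mapping[str, str]] = None) -> Optional[str]:
    """Return the "owner/name" id of the running Space, or None when not on Spaces."""
    env = os.environ if env is None else env
    return env.get("SPACE_ID") or None


def require_spaces(env: Optional[Mapping[str, str]] = None) -> str:
    """Fail fast when the app is launched outside Hugging Face Spaces."""
    space_id = current_space_id(env)
    if space_id is None:
        raise RuntimeError("AudioDubb is intended to run on Hugging Face Spaces only")
    return space_id
```

Accepting an optional `env` mapping keeps the check testable; the trade-off of enforcing it is that the app can no longer be started locally for debugging without setting `SPACE_ID` by hand.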