Ason Studio - Media Processing & Dubbing Pipeline
This repository contains tools developed for automated media processing, including:
1. Video Dubbing & Speaker Diarization (speaker_diarization.py)
A comprehensive tool to extract audio, separate speakers, and automatically map them to synthetic voices using CapCut.
- State-of-the-art Diarization: Integrated with pyannote.audio 3.1 (and segmentation-3.0) for highly accurate multi-speaker tracking.
- Smart Timeline Construction: Overlap detection and precision SRT-to-Speaker mapping.
- CapCut Integration: Automatically generates CapCut drafts with correctly spaced and pitch-adjusted dubbed audio blocks.
- Clean UI: Built on PyQt5 with an optimized 3-panel light-mode workflow.
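The SRT-to-Speaker mapping mentioned above can be illustrated with a small sketch. This is not the code in speaker_diarization.py; it only assumes that diarization yields speaker turns with start/end times and that each subtitle cue goes to the speaker whose turns overlap it the most. The names `Turn`, `Cue`, `overlap`, and `assign_speakers` are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Turn:
    """One diarized speaker turn, times in seconds (hypothetical structure)."""
    start: float
    end: float
    speaker: str


@dataclass
class Cue:
    """One subtitle cue parsed from an SRT file (hypothetical structure)."""
    start: float
    end: float
    text: str


def overlap(a_start, a_end, b_start, b_end):
    """Length in seconds of the intersection of two intervals (0.0 if disjoint)."""
    return max(0.0, min(a_end, b_end) - max(a_start, b_start))


def assign_speakers(cues, turns):
    """Pair each cue with the speaker that overlaps it longest, or None."""
    result = []
    for cue in cues:
        totals = {}
        for turn in turns:
            shared = overlap(cue.start, cue.end, turn.start, turn.end)
            if shared > 0:
                totals[turn.speaker] = totals.get(turn.speaker, 0.0) + shared
        speaker = max(totals, key=totals.get) if totals else None
        result.append((cue, speaker))
    return result


turns = [Turn(0.0, 2.5, "SPEAKER_00"), Turn(2.5, 6.0, "SPEAKER_01")]
cues = [Cue(0.2, 2.0, "Hello."), Cue(2.2, 4.0, "Hi there."), Cue(7.0, 8.0, "(music)")]
labels = [speaker for _, speaker in assign_speakers(cues, turns)]
print(labels)
```

Summing overlap per speaker, rather than taking the first overlapping turn, keeps a cue that straddles a speaker change attached to whoever speaks for most of it. Cues with no overlapping turn are left unassigned (`None`) so they can be handled separately.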
2. Manga Editor (manga_editor.py / manga_editor-vipfinal.py)
A pipeline tool for processing Manga videos/images.
- Bubble Extraction: Extracts dialogue bubbles automatically.
- Inpainting & Cleaning: Prepares backgrounds for translated text.
- Timeline Export: Seamlessly imports the edited frames back into video editing software.
3. Local TTS Engine (chumtts2/)
A dedicated text-to-speech inference module tailored for Vietnamese multi-voice generation.
- Supports numerous distinct speaker profiles.
- Integrated directly with the dubbing pipeline to produce high-quality audio segments.
Installation
Step 1: Create a Python environment
# Use Miniconda (recommended)
conda create -n ason python=3.10 -y
conda activate ason
Step 2: Install PyTorch + CUDA
Go to https://pytorch.org/get-started/locally/ and pick the build that matches your GPU, for example:
pip install torch torchaudio --index-url https://download.pytorch.org/whl/cu121
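After installing, it may be worth confirming that the CUDA build was actually picked up. The following is a minimal sketch, not part of this repository; `describe_torch` is a hypothetical helper name, and it only uses standard `torch.cuda` calls.

```python
def describe_torch():
    """Return a one-line summary of the local PyTorch / CUDA setup."""
    try:
        import torch
    except ImportError:
        return "torch is not installed"
    if not torch.cuda.is_available():
        return f"torch {torch.__version__} installed, CUDA not available (CPU only)"
    return (
        f"torch {torch.__version__}, CUDA {torch.version.cuda}, "
        f"GPU: {torch.cuda.get_device_name(0)}"
    )


print(describe_torch())
```

If this reports CPU only on a machine with an NVIDIA GPU, the installed wheel probably does not match the driver, and a different index URL from the PyTorch selector is needed.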
Step 3: Install all dependencies
pip install pyannote.audio soundfile numpy opencv-python Pillow PyQt5 requests manga-ocr huggingface_hub
Step 4: Log in to HuggingFace
huggingface-cli login
Then accept the terms of use (click Accept) on the following 3 repos:
- https://huggingface.co/pyannote/speaker-diarization-3.1
- https://huggingface.co/pyannote/segmentation-3.0
- https://huggingface.co/pyannote/speaker-diarization-community-1
Step 5: Install FFmpeg (required for audio/video processing)
# Windows - quickest via conda:
conda install -c conda-forge ffmpeg -y
# Or download it manually from https://ffmpeg.org/download.html and add it to PATH
Package summary
| Package | Purpose |
|---|---|
| torch | Deep learning framework (GPU) |
| torchaudio | Audio processing for PyTorch |
| pyannote.audio | Speaker diarization (telling speakers apart) |
| soundfile | Read/write audio files (.wav, .flac) |
| numpy | Numerical array computation |
| opencv-python | Image/video processing (cv2) |
| Pillow | Image processing (PIL) |
| PyQt5 | Graphical user interface (GUI) |
| requests | HTTP API calls |
| manga-ocr | OCR for Japanese text in manga speech bubbles |
| huggingface_hub | Download/upload models from HuggingFace |
| ffmpeg | Encode/decode audio & video (system tool) |
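Before launching any of the tools, a quick check that everything in the list above is present can save a failed start. This is a hedged sketch, not a script shipped with the repository: `IMPORT_NAMES`, `is_importable`, and `missing_requirements` are hypothetical names, and the mapping reflects the usual import name of each pip package.

```python
import importlib.util
import shutil

# pip package name -> module name used in `import ...`
IMPORT_NAMES = {
    "torch": "torch",
    "torchaudio": "torchaudio",
    "pyannote.audio": "pyannote.audio",
    "soundfile": "soundfile",
    "numpy": "numpy",
    "opencv-python": "cv2",
    "Pillow": "PIL",
    "PyQt5": "PyQt5",
    "requests": "requests",
    "manga-ocr": "manga_ocr",
    "huggingface_hub": "huggingface_hub",
}


def is_importable(module_name):
    """True if the module can be located, without importing the module itself."""
    try:
        return importlib.util.find_spec(module_name) is not None
    except (ImportError, ValueError):
        # Raised when a parent package of a dotted name is missing or malformed.
        return False


def missing_requirements():
    """List pip packages that cannot be found, plus 'ffmpeg' if it is not on PATH."""
    missing = [pkg for pkg, mod in IMPORT_NAMES.items() if not is_importable(mod)]
    if shutil.which("ffmpeg") is None:
        missing.append("ffmpeg")
    return missing


print(missing_requirements() or "all requirements found")
```

`find_spec` only locates a module rather than importing it, so the check stays fast even for heavy packages such as torch.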
Running the programs
# Dubbing & Diarization
python speaker_diarization.py
# Manga Editor
python manga_editor.py
# TTS GUI
python chumtts2/gui.py