Ason Studio - Media Processing & Dubbing Pipeline

This repository contains tools developed for automated media processing, including:

1. Video Dubbing & Speaker Diarization (`speaker_diarization.py`)

A comprehensive tool to extract audio, separate speakers, and automatically map them to synthetic voices using CapCut.

State-of-the-art Diarization: Integrated with pyannote.audio 3.1 (and segmentation-3.0) for highly accurate multi-speaker tracking.
Smart Timeline Construction: Overlap detection and precision SRT-to-Speaker mapping.
CapCut Integration: Automatically generates CapCut drafts with correctly spaced and pitch-adjusted dubbed audio blocks.
Clean UI: Built on PyQt5 with an optimized 3-panel light-mode workflow.
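
The SRT-to-speaker mapping mentioned above can be illustrated with a small sketch. It is not the code in `speaker_diarization.py`. It assumes diarization turns are already available as `(start, end, speaker)` tuples (for example collected from a pyannote annotation via `itertracks(yield_label=True)`), and it labels each subtitle cue with the speaker whose turns overlap it the longest. The names `parse_srt`, `assign_speakers`, and the sample data are hypothetical.

```python
import re
from collections import defaultdict

# Matches an SRT timing line such as "00:00:01,000 --> 00:00:03,500".
TIMING = re.compile(
    r"(\d+):(\d{2}):(\d{2})[,.](\d{3})\s*-->\s*(\d+):(\d{2}):(\d{2})[,.](\d{3})"
)


def parse_srt(text):
    """Return a list of (start_sec, end_sec, subtitle_text) cues."""
    cues = []
    for block in re.split(r"\n\s*\n", text.strip()):
        lines = block.strip().splitlines()
        for i, line in enumerate(lines):
            m = TIMING.search(line)
            if m:
                h1, m1, s1, ms1, h2, m2, s2, ms2 = (int(g) for g in m.groups())
                start = h1 * 3600 + m1 * 60 + s1 + ms1 / 1000
                end = h2 * 3600 + m2 * 60 + s2 + ms2 / 1000
                # Everything after the timing line is the cue text.
                cues.append((start, end, " ".join(lines[i + 1:]).strip()))
                break
    return cues


def assign_speakers(cues, turns):
    """Label each cue with the speaker whose turns overlap it the longest.

    `turns` is a list of (start_sec, end_sec, speaker) tuples. A cue that no
    turn overlaps gets speaker None.
    """
    labeled = []
    for start, end, text in cues:
        overlap = defaultdict(float)
        for t_start, t_end, speaker in turns:
            shared = min(end, t_end) - max(start, t_start)
            if shared > 0:
                overlap[speaker] += shared
        speaker = max(overlap, key=overlap.get) if overlap else None
        labeled.append((start, end, speaker, text))
    return labeled


# Hypothetical sample data: two cues and two slightly overlapping turns.
SAMPLE_SRT = """1
00:00:00,000 --> 00:00:02,000
Hello there.

2
00:00:02,000 --> 00:00:05,000
Hi, how are you?
"""

SAMPLE_TURNS = [
    (0.0, 2.2, "SPEAKER_00"),
    (2.1, 5.0, "SPEAKER_01"),
]

labeled_cues = assign_speakers(parse_srt(SAMPLE_SRT), SAMPLE_TURNS)
for cue in labeled_cues:
    print(cue)
```

Summing overlap per speaker, rather than taking the first turn that touches a cue, keeps a short interjection from another speaker from stealing the label of a longer line.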

2. Manga Editor (`manga_editor.py`)

A pipeline tool for processing manga videos and images.

Bubble Extraction: Extracts dialogue bubbles automatically.
Inpainting & Cleaning: Prepares backgrounds for translated text.
Timeline Export: Exports the edited frames so they can be imported back into video editing software.

3. TTS GUI (`chumtts2/gui.py`)

A dedicated text-to-speech inference module tailored for Vietnamese multi-voice generation.

Supports numerous distinct speaker profiles.
Integrated directly with the dubbing pipeline to produce high-quality audio segments.

# Use Miniconda (recommended)
conda create -n ason python=3.10 -y
conda activate ason

Go to https://pytorch.org/get-started/locally/ and pick the build that matches your GPU, for example:

pip install torch torchaudio --index-url https://download.pytorch.org/whl/cu121

pip install pyannote.audio soundfile numpy opencv-python Pillow PyQt5 requests manga-ocr huggingface_hub

huggingface-cli login

Then accept the terms of use (click Accept) on the following 3 repos:

# Windows - conda is the quickest way:
conda install -c conda-forge ffmpeg -y

# Or download it manually from https://ffmpeg.org/download.html and add it to PATH
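
To confirm that ffmpeg is reachable after installation, and to show how the audio track of a video could be pulled out for diarization, here is a minimal sketch. It is an illustration, not the repository's own extraction code: the function names, the file names, and the choice of 16 kHz mono 16-bit WAV are all assumptions.

```python
import shutil
import subprocess


def ffmpeg_path():
    """Return the full path of the ffmpeg executable, or None if not on PATH."""
    return shutil.which("ffmpeg")


def build_extract_command(video_path, wav_path, sample_rate=16000):
    """Build an ffmpeg command that writes the audio track as mono 16-bit WAV."""
    return [
        "ffmpeg",
        "-y",                      # overwrite the output file if it exists
        "-i", str(video_path),     # input video
        "-vn",                     # ignore the video stream
        "-ac", "1",                # downmix to one channel (assumed setting)
        "-ar", str(sample_rate),   # output sample rate in Hz (assumed setting)
        "-c:a", "pcm_s16le",       # 16-bit PCM audio
        str(wav_path),
    ]


def extract_audio(video_path, wav_path):
    """Run ffmpeg; raises if ffmpeg is missing or exits with an error."""
    if ffmpeg_path() is None:
        raise RuntimeError("ffmpeg was not found on PATH")
    subprocess.run(build_extract_command(video_path, wav_path), check=True)


print("ffmpeg found at:", ffmpeg_path())
```

Calling `extract_audio("episode.mp4", "episode.wav")` (hypothetical file names) would then produce a WAV file that audio tools such as `soundfile` can read.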

| Package | Purpose |
|---|---|
| `torch` | Deep learning framework (GPU) |
| `torchaudio` | Audio processing for PyTorch |
| `pyannote.audio` | Speaker diarization (telling voices apart) |
| `soundfile` | Read/write audio files (.wav, .flac) |
| `numpy` | Array computation |
| `opencv-python` | Image/video processing (cv2) |
| `Pillow` | Image processing (PIL) |
| `PyQt5` | Graphical user interface (GUI) |
| `requests` | HTTP API calls |
| `manga-ocr` | OCR for Japanese text in manga speech bubbles |
| `huggingface_hub` | Download/upload models from Hugging Face |
| `ffmpeg` | Encode/decode audio & video (system tool) |
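
Several of the packages above are imported under a different name than the one used with `pip` (for example `opencv-python` is imported as `cv2`). The following sketch checks whether each dependency can be found in the active environment without actually importing the heavy ones. The `MODULES` mapping and the function names are hypothetical helpers written for this illustration, not part of the repository.

```python
import importlib.util
import shutil

# pip package name -> module name used in `import` (the two often differ).
MODULES = {
    "torch": "torch",
    "torchaudio": "torchaudio",
    "pyannote.audio": "pyannote.audio",
    "soundfile": "soundfile",
    "numpy": "numpy",
    "opencv-python": "cv2",
    "Pillow": "PIL",
    "PyQt5": "PyQt5",
    "requests": "requests",
    "manga-ocr": "manga_ocr",
    "huggingface_hub": "huggingface_hub",
}


def is_importable(module_name):
    """Return True if the module can be located in the current environment."""
    try:
        return importlib.util.find_spec(module_name) is not None
    except (ImportError, ValueError):
        # Raised when the parent package of a dotted name is missing,
        # or when a module has no usable spec.
        return False


def check_environment():
    """Return {name: bool} for every Python package plus the ffmpeg binary."""
    status = {pkg: is_importable(mod) for pkg, mod in MODULES.items()}
    status["ffmpeg"] = shutil.which("ffmpeg") is not None
    return status


for name, ok in check_environment().items():
    print(f"{name:16} {'OK' if ok else 'MISSING'}")
```

Running it inside the `ason` conda environment prints one line per dependency, which makes a missing package or a missing ffmpeg binary easy to spot before launching the GUI tools.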

# Dubbing & Diarization
python speaker_diarization.py

# Manga Editor
python manga_editor.py

# TTS GUI
python chumtts2/gui.py
