
Ason Studio - Media Processing & Dubbing Pipeline

This repository contains tools developed for automated media processing, including:

1. Video Dubbing & Speaker Diarization (speaker_diarization.py)

A comprehensive tool to extract audio, separate speakers, and automatically map them to synthetic voices using CapCut.

  • State-of-the-art Diarization: Integrated with pyannote.audio (the speaker-diarization-3.1 pipeline and the segmentation-3.0 model) for multi-speaker tracking.
  • Smart Timeline Construction: Detects overlapping speech and maps each SRT subtitle cue to its speaker.
  • CapCut Integration: Automatically generates CapCut drafts with correctly spaced and pitch-adjusted dubbed audio blocks.
  • Clean UI: Built on PyQt5 with an optimized 3-panel light-mode workflow.
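The overlap detection and SRT-to-speaker mapping described above can be sketched as a maximum-overlap assignment: each subtitle cue is labelled with the speaker whose diarization turns cover most of its time span. This is a minimal illustration, not the code in speaker_diarization.py; the `Turn`/`Cue` classes and all function names are hypothetical. With pyannote.audio 3.1, the turns would typically come from `diarization.itertracks(yield_label=True)`.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple


@dataclass
class Turn:
    """One diarization turn (hypothetical type). With pyannote.audio 3.1 these
    could be built from: for seg, _, spk in diarization.itertracks(yield_label=True)."""
    start: float  # seconds
    end: float    # seconds
    speaker: str


@dataclass
class Cue:
    """One SRT subtitle cue (hypothetical type), times in seconds."""
    start: float
    end: float
    text: str


def overlap(a_start: float, a_end: float, b_start: float, b_end: float) -> float:
    """Length in seconds of the intersection of two intervals (0 if disjoint)."""
    return max(0.0, min(a_end, b_end) - max(a_start, b_start))


def assign_speakers(cues: List[Cue], turns: List[Turn]) -> List[Tuple[Cue, Optional[str]]]:
    """Label each cue with the speaker whose turns overlap it the longest.
    Cues that no turn overlaps get None."""
    labelled = []
    for cue in cues:
        totals: Dict[str, float] = {}
        for t in turns:
            d = overlap(cue.start, cue.end, t.start, t.end)
            if d > 0:
                totals[t.speaker] = totals.get(t.speaker, 0.0) + d
        best = max(totals, key=totals.get) if totals else None
        labelled.append((cue, best))
    return labelled


def overlapping_turns(turns: List[Turn]) -> List[Tuple[Turn, Turn]]:
    """Return pairs of turns from different speakers that overlap in time."""
    ordered = sorted(turns, key=lambda t: t.start)
    pairs = []
    for i, a in enumerate(ordered):
        for b in ordered[i + 1:]:
            if b.start >= a.end:
                break  # sorted by start, so no later turn can overlap `a`
            if b.speaker != a.speaker:
                pairs.append((a, b))
    return pairs


if __name__ == "__main__":
    turns = [Turn(0.0, 2.5, "SPEAKER_00"), Turn(2.0, 5.0, "SPEAKER_01")]
    cues = [Cue(0.1, 2.2, "Hello"), Cue(2.6, 4.8, "Hi there")]
    for cue, spk in assign_speakers(cues, turns):
        print(f"{cue.start:.1f}-{cue.end:.1f} {spk}: {cue.text}")
    print(len(overlapping_turns(turns)), "overlap(s)")
```

Summing overlap per speaker (rather than taking the single longest turn) keeps a cue with the speaker who talks through most of it even when diarization splits that speaker into several short turns.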

2. Manga Editor (manga_editor.py / manga_editor-vipfinal.py)

A pipeline tool for processing manga videos and images.

  • Bubble Extraction: Extracts dialogue bubbles automatically.
  • Inpainting & Cleaning: Prepares backgrounds for translated text.
  • Timeline Export: Exports the edited frames so they can be imported back into video editing software.
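As a rough illustration of the cleaning step, the sketch below blanks out a text box inside a speech bubble by filling it with the median colour of a thin ring of pixels around it. This is a deliberately crude stand-in and an assumption, not the method used in manga_editor.py; real inpainting would more likely use something like OpenCV's `cv2.inpaint`. The function name and the `(x0, y0, x1, y1)` box convention are hypothetical.

```python
import numpy as np


def clean_text_box(page: np.ndarray, box, ring: int = 4) -> np.ndarray:
    """Return a copy of `page` with box (x0, y0, x1, y1) filled by the median
    colour of the surrounding `ring`-pixel border. Works for grayscale (H, W)
    and colour (H, W, C) images; the box is assumed to lie inside the page."""
    x0, y0, x1, y1 = box
    h, w = page.shape[:2]
    # Padded box, clipped to the page bounds
    px0, py0 = max(0, x0 - ring), max(0, y0 - ring)
    px1, py1 = min(w, x1 + ring), min(h, y1 + ring)
    outer = page[py0:py1, px0:px1]
    # Mask selecting only the ring: inside the padded box, outside the text box
    mask = np.ones(outer.shape[:2], dtype=bool)
    mask[y0 - py0:y1 - py0, x0 - px0:x1 - px0] = False
    ring_pixels = outer[mask]  # shape (N,) or (N, C)
    cleaned = page.copy()
    if ring_pixels.size == 0:
        return cleaned  # no border to sample from; leave the page unchanged
    fill = np.median(ring_pixels, axis=0)  # scalar or per-channel median
    cleaned[y0:y1, x0:x1] = fill.astype(page.dtype)
    return cleaned


if __name__ == "__main__":
    page = np.full((50, 50), 255, dtype=np.uint8)  # white "bubble"
    page[22:28, 22:28] = 0                          # dark "text"
    out = clean_text_box(page, (20, 20, 30, 30))
    print(int(out[25, 25]))
```

A flat fill like this only suits plain white or flat-coloured bubbles; textured backgrounds need genuine inpainting.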

3. Local TTS Engine (chumtts2/)

A dedicated text-to-speech inference module tailored for Vietnamese multi-voice generation.

  • Supports numerous distinct speaker profiles.
  • Integrated directly with the dubbing pipeline to produce high-quality audio segments.
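To show what correctly spaced dubbed segments can mean in practice, here is a minimal sketch that lays synthesized clips onto a silent track at their subtitle start times, plus a helper reporting the speed-up a clip would need to fit its slot. It assumes mono float32 audio at a single sample rate and non-negative start times; `place_segments` and `fit_speed` are hypothetical helpers, not part of chumtts2/, and the real pipeline may stretch or pitch-adjust audio differently (for example inside CapCut).

```python
from typing import Iterable, Tuple

import numpy as np


def fit_speed(clip_seconds: float, slot_seconds: float, max_speed: float = 1.5) -> float:
    """Playback speed a clip needs to fit its subtitle slot (1.0 = unchanged).
    Clips that already fit stay at 1.0; the speed-up is capped at max_speed."""
    if slot_seconds <= 0:
        return max_speed
    return min(max(clip_seconds / slot_seconds, 1.0), max_speed)


def place_segments(segments: Iterable[Tuple[float, np.ndarray]], sr: int,
                   total_seconds: float) -> np.ndarray:
    """Mix (start_seconds, mono_clip) pairs into one silent float32 track.
    Overlapping clips are summed; audio past the end of the track is cut off."""
    track = np.zeros(int(round(total_seconds * sr)), dtype=np.float32)
    for start, clip in segments:
        offset = int(round(start * sr))  # assumes start >= 0
        if offset >= len(track):
            continue
        n = min(len(clip), len(track) - offset)
        track[offset:offset + n] += clip[:n].astype(np.float32)
    return track


if __name__ == "__main__":
    sr = 16000
    beep = np.full(sr // 2, 0.1, dtype=np.float32)  # 0.5 s placeholder clip
    track = place_segments([(0.5, beep), (2.0, beep)], sr, 3.0)
    print(track.shape, fit_speed(2.4, 2.0))
    # The result could then be saved with soundfile, e.g. sf.write("dub.wav", track, sr)
```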

Installation

Step 1: Create a Python environment

# Using Miniconda (recommended)
conda create -n ason python=3.10 -y
conda activate ason

Step 2: Install PyTorch + CUDA

Go to https://pytorch.org/get-started/locally/ and pick the build that matches your GPU, for example:

pip install torch torchaudio --index-url https://download.pytorch.org/whl/cu121
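Once installed, a quick check that the CUDA build of PyTorch actually sees your GPU can save debugging later. This snippet is a suggestion, not part of the repository; `gpu_status` is a hypothetical helper name.

```python
def gpu_status() -> str:
    """Describe the PyTorch/CUDA setup without crashing if torch is missing."""
    try:
        import torch
    except ImportError:
        return "torch is not installed"
    if not torch.cuda.is_available():
        return f"torch {torch.__version__}: CUDA not available (CPU only)"
    name = torch.cuda.get_device_name(0)
    return f"torch {torch.__version__}, CUDA {torch.version.cuda}: {name}"


if __name__ == "__main__":
    print(gpu_status())
```

If this reports CPU only on a machine with an NVIDIA GPU, the installed wheel most likely does not match the CUDA index URL chosen above.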

Step 3: Install all dependencies

pip install pyannote.audio soundfile numpy opencv-python Pillow PyQt5 requests manga-ocr huggingface_hub

Step 4: Log in to HuggingFace

huggingface-cli login
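In Python, the token saved by `huggingface-cli login` (or an `HF_TOKEN` environment variable) can then be handed to pyannote.audio when loading the diarization pipeline, once the model terms mentioned next have been accepted. A minimal sketch, assuming pyannote.audio 3.1's `Pipeline.from_pretrained(..., use_auth_token=...)` signature (newer releases may name this argument differently); `resolve_hf_token` and `load_diarization_pipeline` are hypothetical helper names:

```python
import os
from typing import Optional


def resolve_hf_token() -> Optional[str]:
    """Prefer the HF_TOKEN environment variable, else the token cached by
    `huggingface-cli login` (via huggingface_hub.get_token, when available)."""
    token = os.environ.get("HF_TOKEN")
    if token:
        return token
    try:
        from huggingface_hub import get_token
    except ImportError:
        return None
    return get_token()


def load_diarization_pipeline(device: str = "cuda"):
    """Load pyannote's speaker-diarization-3.1 pipeline (gated: terms must be accepted)."""
    import torch
    from pyannote.audio import Pipeline

    pipeline = Pipeline.from_pretrained(
        "pyannote/speaker-diarization-3.1",
        use_auth_token=resolve_hf_token(),  # assumption: pyannote.audio 3.1 keyword
    )
    pipeline.to(torch.device(device))
    return pipeline
```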

Then accept the terms of use (Accept) on the following 3 repos:

Step 5: Install FFmpeg (required for audio/video processing)

# Windows: conda is the quickest option:
conda install -c conda-forge ffmpeg -y

# Or download it manually from https://ffmpeg.org/download.html and add it to PATH
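The pipeline's first step, pulling the audio track out of a video, can be done with FFmpeg from Python roughly as follows. The 16 kHz mono WAV target is an assumption chosen because pyannote models work on 16 kHz mono audio (the pipeline can also resample on its own); the helper and file names are hypothetical.

```python
import shutil
import subprocess
from typing import List


def build_extract_cmd(video_path: str, wav_path: str, sample_rate: int = 16000) -> List[str]:
    """ffmpeg arguments: drop video (-vn), downmix to mono (-ac 1), resample (-ar)."""
    return [
        "ffmpeg", "-y",  # overwrite the output file without asking
        "-i", video_path,
        "-vn",
        "-ac", "1",
        "-ar", str(sample_rate),
        wav_path,
    ]


def extract_audio(video_path: str, wav_path: str) -> None:
    """Run ffmpeg; raises if it is missing from PATH or exits with an error."""
    if shutil.which("ffmpeg") is None:
        raise FileNotFoundError("ffmpeg not found on PATH")
    subprocess.run(build_extract_cmd(video_path, wav_path), check=True)


if __name__ == "__main__":
    print(" ".join(build_extract_cmd("input.mp4", "audio.wav")))
```

Passing the arguments as a list (rather than one shell string) avoids quoting problems with file names that contain spaces.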

Package summary

Package           Purpose
torch             Deep learning framework (GPU)
torchaudio        Audio processing for PyTorch
pyannote.audio    Speaker diarization (telling speakers apart)
soundfile         Reading/writing audio files (.wav, .flac)
numpy             Numerical array computation
opencv-python     Image/video processing (cv2)
Pillow            Image processing (PIL)
PyQt5             Graphical user interface (GUI)
requests          HTTP API calls
manga-ocr         OCR for Japanese text in manga speech bubbles
huggingface_hub   Downloading/uploading models from HuggingFace
ffmpeg            Audio & video encoding/decoding (system tool)
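To confirm that everything in the table is installed, a small stdlib-only check can look up each module and the ffmpeg binary. The pip-name to import-name mapping (for example opencv-python to cv2, Pillow to PIL) is standard; the script itself is an illustrative helper, not shipped with the repository.

```python
import importlib.util
import shutil
from typing import Dict

# pip package name -> module name used in `import`
MODULES = {
    "torch": "torch",
    "torchaudio": "torchaudio",
    "pyannote.audio": "pyannote.audio",
    "soundfile": "soundfile",
    "numpy": "numpy",
    "opencv-python": "cv2",
    "Pillow": "PIL",
    "PyQt5": "PyQt5",
    "requests": "requests",
    "manga-ocr": "manga_ocr",
    "huggingface_hub": "huggingface_hub",
}


def is_importable(module: str) -> bool:
    """True if the module can be located. find_spec does not import the module
    itself, but for a dotted name it imports the parent package, which may fail."""
    try:
        return importlib.util.find_spec(module) is not None
    except (ImportError, ValueError):
        return False


def check_environment() -> Dict[str, bool]:
    """Map each package (plus the ffmpeg binary) to whether it was found."""
    status = {pkg: is_importable(mod) for pkg, mod in MODULES.items()}
    status["ffmpeg"] = shutil.which("ffmpeg") is not None
    return status


if __name__ == "__main__":
    for name, ok in check_environment().items():
        print(f"{'OK     ' if ok else 'MISSING'} {name}")
```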

Running the programs

# Dubbing & Diarization
python speaker_diarization.py

# Manga Editor
python manga_editor.py

# TTS GUI
python chumtts2/gui.py