---
license: mit
datasets:
  - BharathK333/VoxMorph-Dataset
language:
  - en
base_model:
  - ResembleAI/chatterbox
pipeline_tag: text-to-speech
tags:
  - ICASSP
  - Audio-to-Audio
  - Zero-shot-tts
  - Voice-Morphing
  - Security
papers:
  - https://huggingface.co/papers/2601.20883
---

# VoxMorph: Scalable Zero-shot Voice Identity Morphing via Disentangled Embeddings

Project Page | Paper | GitHub | Demo | Dataset

VoxMorph is a zero-shot framework that produces high-fidelity voice morphs from as little as five seconds of audio per subject without model retraining. The method disentangles vocal traits into prosody and timbre embeddings, enabling fine-grained interpolation of speaking style and identity. These embeddings are fused via Spherical Linear Interpolation (Slerp) and synthesized using an autoregressive language model coupled with a Conditional Flow Matching network.
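The Slerp fusion step described above can be sketched as follows. This is a generic spherical-linear-interpolation routine, not the paper's implementation; the embedding dimension (256) and the 50/50 morph ratio are illustrative assumptions.

```python
import numpy as np

def slerp(a: np.ndarray, b: np.ndarray, t: float) -> np.ndarray:
    """Spherical linear interpolation between two embeddings.

    t = 0 returns (normalised) a, t = 1 returns (normalised) b;
    intermediate values trace the great-circle arc between them.
    """
    a = a / np.linalg.norm(a)
    b = b / np.linalg.norm(b)
    dot = np.clip(np.dot(a, b), -1.0, 1.0)
    theta = np.arccos(dot)
    if np.isclose(theta, 0.0):  # nearly parallel: fall back to linear blend
        return (1.0 - t) * a + t * b
    return (np.sin((1.0 - t) * theta) * a + np.sin(t * theta) * b) / np.sin(theta)

# Morph two stand-in timbre embeddings at a 50/50 ratio.
emb_a = np.random.default_rng(0).standard_normal(256)
emb_b = np.random.default_rng(1).standard_normal(256)
morphed = slerp(emb_a, emb_b, 0.5)
```

Unlike plain linear interpolation, Slerp keeps the blended embedding on the unit hypersphere, which is why it is commonly preferred for mixing normalised speaker embeddings.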

This repository hosts the official model checkpoints for *VoxMorph: Scalable Zero-shot Voice Identity Morphing via Disentangled Embeddings* (ICASSP 2026): `s3gen.pt` and `t3_cfg.pt`. VoxMorph is a zero-shot TTS framework built on top of Resemble AI's frozen Chatterbox-TTS backbone.