---
license: mit
datasets:
- BharathK333/VoxMorph-Dataset
language:
- en
base_model:
- ResembleAI/chatterbox
pipeline_tag: text-to-speech
tags:
- ICASSP
- Audio-to-Audio
- Zero-shot-tts
- Voice-Morphing
- Security
papers:
- https://huggingface.co/papers/2601.20883
---
# VoxMorph: Scalable Zero-shot Voice Identity Morphing via Disentangled Embeddings

[**Project Page**](https://vcbsl.github.io/VoxMorph/) | [**Paper**](https://huggingface.co/papers/2601.20883) | [**GitHub**](https://github.com/Bharath-K3/VoxMorph) | [**Demo**](https://huggingface.co/spaces/Bharath-K3/VoxMorph) | [**Dataset**](https://huggingface.co/datasets/BharathK333/VoxMorph-Dataset)

VoxMorph is a zero-shot framework that produces high-fidelity voice morphs from as little as five seconds of audio per subject without model retraining. The method disentangles vocal traits into prosody and timbre embeddings, enabling fine-grained interpolation of speaking style and identity. These embeddings are fused via Spherical Linear Interpolation (Slerp) and synthesized using an autoregressive language model coupled with a Conditional Flow Matching network.

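For readers unfamiliar with Slerp, the fusion step can be illustrated with a minimal, dependency-free sketch. This is not the VoxMorph implementation: the function name `slerp`, the morphing weight `alpha`, and the toy two-dimensional vectors are assumptions made purely for illustration, and real prosody or timbre embeddings would have far more dimensions.

```python
import math
from typing import List, Sequence


def slerp(a: Sequence[float], b: Sequence[float], alpha: float) -> List[float]:
    """Spherical linear interpolation between two embedding vectors.

    alpha=0 returns a, alpha=1 returns b; values in between move along
    the arc between the two directions instead of the straight chord.
    """
    if len(a) != len(b):
        raise ValueError("embeddings must have the same dimension")
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    if norm_a == 0.0 or norm_b == 0.0:
        raise ValueError("embeddings must be non-zero")

    # Angle between the two vectors, from their normalized dot product.
    cos_omega = sum(x * y for x, y in zip(a, b)) / (norm_a * norm_b)
    cos_omega = max(-1.0, min(1.0, cos_omega))  # guard against rounding error
    omega = math.acos(cos_omega)
    sin_omega = math.sin(omega)

    if sin_omega < 1e-8:
        # Nearly parallel or antiparallel vectors: the Slerp weights are
        # ill-conditioned, so fall back to plain linear interpolation.
        return [(1.0 - alpha) * x + alpha * y for x, y in zip(a, b)]

    # Slerp weights, applied to the original (not re-normalized) vectors.
    w_a = math.sin((1.0 - alpha) * omega) / sin_omega
    w_b = math.sin(alpha * omega) / sin_omega
    return [w_a * x + w_b * y for x, y in zip(a, b)]


# Toy example (hypothetical data): two orthogonal unit "speaker" embeddings.
speaker_a = [1.0, 0.0]
speaker_b = [0.0, 1.0]
morph = slerp(speaker_a, speaker_b, alpha=0.5)
```

For unit-length inputs the result also has unit length, which is the usual motivation for preferring Slerp over a straight linear blend, whose midpoint shrinks toward the origin. Under the same assumptions, the function could be applied separately to the prosody and timbre embeddings, optionally with a different `alpha` for each.
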
This repository hosts the official model checkpoints (`s3gen.pt` and `t3_cfg.pt`) for **VoxMorph: Scalable Zero-shot Voice Identity Morphing via Disentangled Embeddings** (ICASSP 2026), a zero-shot TTS framework built on top of Resemble AI's frozen Chatterbox-TTS backbone.