	---
	license: mit
	datasets:
	- BharathK333/VoxMorph-Dataset
	language:
	- en
	base_model:
	- ResembleAI/chatterbox
	pipeline_tag: text-to-speech
	tags:
	- ICASSP
	- Audio-to-Audio
	- Zero-shot-tts
	- Voice-Morphing
	- Security
	papers:
	- https://huggingface.co/papers/2601.20883
	---
	# VoxMorph: Scalable Zero-shot Voice Identity Morphing via Disentangled Embeddings

	[Project Page](https://vcbsl.github.io/VoxMorph/) \| [Paper](https://huggingface.co/papers/2601.20883) \| [GitHub](https://github.com/Bharath-K3/VoxMorph) \| [Demo](https://huggingface.co/spaces/Bharath-K3/VoxMorph) \| [Dataset](https://huggingface.co/datasets/BharathK333/VoxMorph-Dataset)

	VoxMorph is a zero-shot framework that produces high-fidelity voice morphs from as little as five seconds of audio per subject without model retraining. The method disentangles vocal traits into prosody and timbre embeddings, enabling fine-grained interpolation of speaking style and identity. These embeddings are fused via Spherical Linear Interpolation (Slerp) and synthesized using an autoregressive language model coupled with a Conditional Flow Matching network.
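The Slerp fusion step mentioned above can be illustrated with a minimal NumPy sketch. This is not the VoxMorph implementation: the function name `slerp`, the `alpha` mixing weight, and the toy two-dimensional vectors are assumptions made for illustration, and the actual embedding shapes and normalization used by the authors may differ.

```python
import numpy as np


def slerp(a, b, alpha, eps=1e-8):
    """Spherical linear interpolation between two embedding vectors.

    alpha=0 returns `a`, alpha=1 returns `b`. Hypothetical helper, not
    taken from the VoxMorph codebase.
    """
    a = np.asarray(a, dtype=np.float64)
    b = np.asarray(b, dtype=np.float64)

    # The angle between the vectors is measured on their unit-length copies.
    a_unit = a / np.linalg.norm(a)
    b_unit = b / np.linalg.norm(b)
    dot = np.clip(np.dot(a_unit, b_unit), -1.0, 1.0)
    omega = np.arccos(dot)
    sin_omega = np.sin(omega)

    # Nearly parallel (or antiparallel) vectors: the Slerp weights become
    # numerically unstable, so fall back to plain linear interpolation.
    if sin_omega < eps:
        return (1.0 - alpha) * a + alpha * b

    w_a = np.sin((1.0 - alpha) * omega) / sin_omega
    w_b = np.sin(alpha * omega) / sin_omega
    return w_a * a + w_b * b


if __name__ == "__main__":
    # Toy stand-ins for two speakers' embeddings (assumed, not real data).
    speaker_a = np.array([1.0, 0.0])
    speaker_b = np.array([0.0, 1.0])
    print(slerp(speaker_a, speaker_b, alpha=0.5))
```

Unlike a plain weighted average, Slerp moves along the arc between the two vectors, so for unit-length inputs the interpolated embedding stays on the unit sphere rather than shrinking toward the origin.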

	This repository hosts the official model checkpoints for the ICASSP 2026 paper "VoxMorph: Scalable Zero-shot Voice Identity Morphing via Disentangled Embeddings". It contains two checkpoint files, `s3gen.pt` and `t3_cfg.pt`. VoxMorph is a zero-shot TTS framework built on top of Resemble AI's frozen Chatterbox-TTS backbone.
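One way to fetch the two checkpoint files is through the `huggingface_hub` client, sketched below. The repository id `BharathK333/VoxMorph-Models` is assumed from this model card's location on the Hub, and `fetch_checkpoints` is a hypothetical helper name. How the downloaded files are then loaded into the Chatterbox pipeline is not covered here; see the GitHub repository for that.

```python
from pathlib import Path

# Assumed repository id (taken from this model card's location on the Hub).
REPO_ID = "BharathK333/VoxMorph-Models"
# Checkpoint file names as listed in this README.
CHECKPOINT_FILES = ("s3gen.pt", "t3_cfg.pt")


def fetch_checkpoints(repo_id=REPO_ID, filenames=CHECKPOINT_FILES):
    """Download each checkpoint into the local Hub cache and return its path."""
    # Imported lazily so the module can be inspected without the package;
    # install it with `pip install huggingface_hub`.
    from huggingface_hub import hf_hub_download

    return {
        name: Path(hf_hub_download(repo_id=repo_id, filename=name))
        for name in filenames
    }


if __name__ == "__main__":
    for name, path in fetch_checkpoints().items():
        print(f"{name} -> {path}")
```

Files downloaded this way land in the local Hugging Face cache, so repeated calls reuse the cached copies instead of downloading again.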