VoxMorph: Scalable Zero-shot Voice Identity Morphing via Disentangled Embeddings

VoxMorph is a zero-shot framework that produces high-fidelity voice morphs from as little as five seconds of audio per subject, without model retraining. The method disentangles vocal traits into prosody and timbre embeddings, enabling fine-grained interpolation of speaking style and identity. The embeddings of the two subjects are fused via Spherical Linear Interpolation (Slerp), and the morphed speech is synthesized by an autoregressive language model coupled with a Conditional Flow Matching network.
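The Slerp fusion step can be illustrated with a minimal, dependency-free sketch. It assumes each subject's prosody and timbre embeddings are plain fixed-length float vectors. The function names `slerp` and `morph_embeddings`, the `"prosody"`/`"timbre"` dictionary keys, the rescaling of the result by the linearly interpolated norm, and the linear fallback for near-parallel vectors are all illustrative assumptions, not VoxMorph's actual API. Using a separate weight for each embedding mirrors the idea of interpolating speaking style and identity independently.

```python
import math
from typing import Dict, List, Sequence


def _norm(v: Sequence[float]) -> float:
    """Euclidean (L2) length of a vector."""
    return math.sqrt(sum(x * x for x in v))


def slerp(a: Sequence[float], b: Sequence[float], t: float,
          eps: float = 1e-7) -> List[float]:
    """Spherical linear interpolation from embedding a (t=0) to b (t=1).

    With unit vectors u = a/|a|, v = b/|b| and angle omega = arccos(u . v):
        slerp(u, v; t) = sin((1 - t) * omega) / sin(omega) * u
                       + sin(t * omega) / sin(omega) * v
    The unit direction is rescaled by the linearly interpolated norm
    (1 - t)|a| + t|b|, which is an assumption of this sketch.
    """
    if len(a) != len(b):
        raise ValueError("embeddings must have the same dimension")
    na, nb = _norm(a), _norm(b)
    if na == 0.0 or nb == 0.0:
        raise ValueError("cannot interpolate a zero-length embedding")
    u = [x / na for x in a]
    v = [x / nb for x in b]
    # Clamp the dot product so rounding error cannot push acos out of its domain.
    dot = max(-1.0, min(1.0, sum(x * y for x, y in zip(u, v))))
    omega = math.acos(dot)
    scale = (1.0 - t) * na + t * nb
    if math.sin(omega) < eps:
        if dot < 0.0:
            # Opposite directions: the great-circle path is not unique.
            raise ValueError("embeddings are antiparallel; slerp is undefined")
        # Nearly identical directions: linear interpolation is a safe fallback.
        mix = [(1.0 - t) * x + t * y for x, y in zip(u, v)]
        n = _norm(mix)
        return [scale * x / n for x in mix]
    wa = math.sin((1.0 - t) * omega) / math.sin(omega)
    wb = math.sin(t * omega) / math.sin(omega)
    return [scale * (wa * x + wb * y) for x, y in zip(u, v)]


def morph_embeddings(subject_a: Dict[str, Sequence[float]],
                     subject_b: Dict[str, Sequence[float]],
                     alpha_prosody: float,
                     alpha_timbre: float) -> Dict[str, List[float]]:
    """Blend two subjects' (hypothetical) prosody and timbre embeddings.

    Separate weights let speaking style and identity be mixed independently;
    alpha = 0 keeps subject A and alpha = 1 takes subject B.
    """
    return {
        "prosody": slerp(subject_a["prosody"], subject_b["prosody"], alpha_prosody),
        "timbre": slerp(subject_a["timbre"], subject_b["timbre"], alpha_timbre),
    }
```

For instance, `alpha_prosody=0.3` with `alpha_timbre=0.5` would request a morph whose speaking style stays closer to subject A while the identity is an even blend. Slerp is preferred over plain linear averaging here because it follows the arc between the two directions instead of cutting through the sphere, so the blended vector does not shrink toward the origin. How the morphed embeddings then condition the language model and flow-matching network is not covered by this sketch.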

This repository hosts the official model checkpoints (s3gen.pt and t3_cfg.pt) for VoxMorph: Scalable Zero-shot Voice Identity Morphing via Disentangled Embeddings (ICASSP 2026). VoxMorph is a zero-shot TTS-based framework built on top of Resemble AI's frozen Chatterbox-TTS backbone.

Citation

If you find this work useful in your research, please consider citing the ICASSP 2026 paper or its arXiv preprint:

@INPROCEEDINGS{11462383,
  author={Krishnamurthy, Bharath and Rattani, Ajita},
  booktitle={ICASSP 2026 - 2026 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)}, 
  title={VoxMorph: Scalable Zero-Shot Voice Identity Morphing via Disentangled Embeddings}, 
  year={2026},
  volume={},
  number={},
  pages={13332-13336},
  keywords={Filtering;Filters;Deepfakes;Vocoders;Videos;Protocols;HTTP;Wide area networks;Communication equipment;Communication systems;Voice morphing;text-to-speech;zero-shot learning;speaker embedding;interpolation;speech synthesis},
  doi={10.1109/ICASSP55912.2026.11462383}
}

@article{krishnamurthy2026voxmorph_arxiv,
  title={VoxMorph: Scalable Zero-Shot Voice Identity Morphing via Disentangled Embeddings},
  author={Krishnamurthy, Bharath and Rattani, Ajita},
  journal={arXiv preprint arXiv:2601.20883},
  year={2026}
}

Model tree for BharathK333/VoxMorph-Models

Base model: ResembleAI/chatterbox

Paper for BharathK333/VoxMorph-Models

VoxMorph: Scalable Zero-shot Voice Identity Morphing via Disentangled Embeddings (arXiv 2601.20883, published Jan 27)