---
license: cc-by-nc-sa-4.0
---

# 🌸 PitchFlower

[arXiv](https://arxiv.org/abs/2510.25566) | [GitHub](https://github.com/diegotg2000/PitchFlower)

Official pretrained checkpoint of the paper *PitchFlower: A flow-based neural audio codec with pitch controllability*.

## 🧠 Overview

PitchFlower achieves pitch controllability through a perturbation strategy. During training, pitch information is removed from the input by flattening the pitch contour and applying a random shift. The model is then trained on a reconstruction task in which the original pitch is provided explicitly as conditioning.
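The flattening and shifting perturbation described above can be sketched at the level of an F0 contour. This is a minimal illustration and not the code from the repository: the function name `flatten_and_shift_f0`, the convention that unvoiced frames are stored as 0 Hz, the use of the geometric mean as the flat value, and the `max_shift_semitones` range are all assumptions made for the example. It also does not show how the perturbed contour is applied to the audio itself.

```python
import numpy as np


def flatten_and_shift_f0(f0_hz, rng, max_shift_semitones=6.0):
    """Replace the voiced part of an F0 contour with one constant, randomly shifted value.

    Hypothetical helper for illustration only; names and defaults are assumptions.
    """
    f0_hz = np.asarray(f0_hz, dtype=np.float64)
    voiced = f0_hz > 0  # assumption: unvoiced frames are encoded as 0 Hz
    perturbed = np.zeros_like(f0_hz)
    if not voiced.any():
        return perturbed

    # Flatten: collapse the contour to its geometric mean (mean in the log domain).
    flat_hz = np.exp(np.mean(np.log(f0_hz[voiced])))

    # Shift: draw a random transposition in semitones and apply it as a frequency ratio.
    shift = rng.uniform(-max_shift_semitones, max_shift_semitones)
    perturbed[voiced] = flat_hz * 2.0 ** (shift / 12.0)
    return perturbed


rng = np.random.default_rng(0)
f0 = np.array([0.0, 110.0, 220.0, 440.0, 0.0])  # toy contour in Hz
perturbed = flatten_and_shift_f0(f0, rng)
print(perturbed)
```

After this operation every voiced frame carries the same value, so the contour no longer reveals the original melody or its absolute register; unvoiced frames are left at 0.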

*Figure: PitchFlower architecture.*

We use an autoencoder with an RVQ bottleneck and a flow-based decoder to produce high-quality audio. More details can be found in the paper.

## 📦 Installation and Usage

Check out our GitHub repo to learn how to use PitchFlower: https://github.com/diegotg2000/PitchFlower

## 🙌 Acknowledgements

We'd like to acknowledge the repositories from which we draw inspiration and parts of the code:

- Vocos: https://github.com/gemelo-ai/vocos
- WavTokenizer: https://github.com/jishengpeng/WavTokenizer
- Encodec: https://github.com/facebookresearch/encodec

This work has been done in the [Analysis/Synthesis team of the STMS laboratory](https://www.stms-lab.fr/team/analyse-et-synthese-des-sons/) at IRCAM. It has been funded by the [ANR project EVA](https://anr.fr/Project-ANR-23-CE23-0018).

## 📫 Contact

For questions or collaboration opportunities, feel free to reach out: dtorres@ircam.fr

## 🧩 Citation

```bibtex
@misc{pitchflower,
  title={PitchFlower: A flow-based neural audio codec with pitch controllability},
  author={Diego Torres and Axel Roebel and Nicolas Obin},
  year={2025},
  eprint={2510.25566},
  archivePrefix={arXiv},
  url={https://arxiv.org/abs/2510.25566},
}
```

## 📜 License

This project is licensed under the [CC BY-NC-SA 4.0](https://creativecommons.org/licenses/by-nc-sa/4.0/) license.