---
license: cc-by-nc-sa-4.0
---

# 🌸 PitchFlower

<p align="left">
<a href="https://arxiv.org/abs/2510.25566">
<img src="https://img.shields.io/badge/arXiv-PitchFlower-b31b1b?logo=arxiv&logoColor=white" alt="arXiv">
</a>
<a href="https://github.com/diegotg2000/PitchFlower">
<img src="https://img.shields.io/badge/GitHub-PitchFlower-181717?logo=github" alt="GitHub">
</a>
</p>

Official pretrained checkpoint of the paper *PitchFlower: A flow-based neural audio codec with pitch controllability*.

## 🧠 Overview

PitchFlower achieves pitch controllability through a perturbation strategy. During training, pitch information is removed from the input by applying a random flattening and shifting operation. The model is then trained on a reconstruction task in which the original pitch is provided explicitly as conditioning, so the output pitch can be controlled through that conditioning at inference time.

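
As a rough illustration of the perturbation described above, the following Python sketch flattens an F0 contour and applies a random shift. It is only an assumption about what "flattening and shifting" could look like on a frame-level F0 track: the function name `flatten_and_shift_f0`, the `max_shift_semitones` parameter and its default value, and the convention that 0 Hz marks unvoiced frames are all hypothetical and are not taken from the PitchFlower code. Refer to the paper and the repository for the actual procedure.

```python
import numpy as np


def flatten_and_shift_f0(f0_hz, rng, max_shift_semitones=4.0):
    """Return a flattened, randomly shifted copy of an F0 contour.

    Hypothetical helper for illustration only (not the PitchFlower API).
    f0_hz: 1-D array of F0 values in Hz, where 0 marks unvoiced frames.
    rng:   a numpy.random.Generator used to draw the random shift.
    """
    f0_hz = np.asarray(f0_hz, dtype=np.float64)
    voiced = f0_hz > 0
    out = np.zeros_like(f0_hz)
    if not voiced.any():
        return out  # nothing to perturb in a fully unvoiced contour

    # Flatten: collapse all voiced frames onto their mean in the log2 domain,
    # i.e. the geometric mean of the voiced F0 values.
    flat_log2_f0 = np.log2(f0_hz[voiced]).mean()

    # Shift: draw one random offset in semitones (12 semitones per octave).
    shift = rng.uniform(-max_shift_semitones, max_shift_semitones)
    out[voiced] = 2.0 ** (flat_log2_f0 + shift / 12.0)
    return out


rng = np.random.default_rng(0)
f0 = np.array([0.0, 110.0, 220.0, 440.0, 0.0])  # toy contour in Hz
perturbed = flatten_and_shift_f0(f0, rng)
```

In this sketch the perturbed contour is constant over voiced frames and no longer carries the original melody, which is why the true pitch has to be supplied separately for the reconstruction task.
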
<p align="center">
<img src="pitchflower_diagram.png" alt="PitchFlower architecture" width="600">
</p>

We use an autoencoder with a residual vector quantization (RVQ) bottleneck and a flow-based decoder to produce high-quality audio. More details can be found in the [paper](https://arxiv.org/abs/2510.25566).

## 📦 Installation and Usage

See our GitHub repository for installation and usage instructions: https://github.com/diegotg2000/PitchFlower

## 🙏 Acknowledgements

We would like to acknowledge the repositories from which we drew inspiration and parts of the code:

- Vocos: https://github.com/gemelo-ai/vocos
- WavTokenizer: https://github.com/jishengpeng/WavTokenizer
- EnCodec: https://github.com/facebookresearch/encodec

This work was carried out in the [Analysis/Synthesis team of the STMS laboratory](https://www.stms-lab.fr/team/analyse-et-synthese-des-sons/) at IRCAM. It was funded by the [ANR project EVA](https://anr.fr/Project-ANR-23-CE23-0018).

## 📫 Contact

For questions or collaboration opportunities, feel free to reach out: dtorres@ircam.fr

## 🧩 Citation

```bibtex
@misc{pitchflower,
  title={PitchFlower: A flow-based neural audio codec with pitch controllability},
  author={Diego Torres and Axel Roebel and Nicolas Obin},
  year={2025},
  eprint={2510.25566},
  archivePrefix={arXiv},
  url={https://arxiv.org/abs/2510.25566},
}
```

## 📜 License

This project is licensed under the [CC BY-NC-SA 4.0](https://creativecommons.org/licenses/by-nc-sa/4.0/) license.