# F5-TTS Model Inference Guide

Welcome! This guide will walk you through the steps to load and run the **F5-TTS** model for text-to-speech synthesis using reference audio and text inputs.

---

### Did You Know?

*Text-to-speech models like F5-TTS can mimic voice characteristics by analyzing just a few seconds of audio input. This adaptability is paving the way for personalized, AI-driven audio content.*

---

## Steps to Run the F5-TTS Model

### 1. Clone the Repository

Start by cloning the F5-TTS repository to your local environment:

```bash
git clone https://github.com/SWivid/F5-TTS.git
cd F5-TTS
```
### 2. Download the Model Weights

Copy the download link of the model file and fetch it with `wget`. The `-P ckpts/` option saves the file into the `ckpts/` directory:

```bash
wget https://hf.rst.im/ModelsLab/F5-tts-brazilian/resolve/main/Brazilian_Portuguese/model_2600000.pt -P ckpts/
```
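An interrupted download can leave a missing or empty checkpoint that only fails later, at load time. As a minimal sketch, the check below confirms that the file exists and is not empty. The helper name `checkpoint_status` is hypothetical, and the path is assumed from the `-P ckpts/` option in the `wget` command above.

```python
from pathlib import Path


def checkpoint_status(path: str, min_bytes: int = 1) -> str:
    """Return "missing", "too small", or "ok" for a downloaded checkpoint file."""
    ckpt = Path(path)
    if not ckpt.is_file():
        return "missing"
    if ckpt.stat().st_size < min_bytes:
        return "too small"
    return "ok"


if __name__ == "__main__":
    # Assumed location: wget -P ckpts/ stores the file under ckpts/.
    print(checkpoint_status("ckpts/model_2600000.pt"))
```

Raising `min_bytes` to a value close to the expected file size gives a stricter check against truncated downloads.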
### 3. Install PyTorch with CUDA Support

Install PyTorch and TorchAudio builds that match a CUDA version supported by your GPU driver, so that inference can run on the GPU. The commands below install version 2.3.0 built for CUDA 11.8:

```bash
pip install torch==2.3.0+cu118 --extra-index-url https://download.pytorch.org/whl/cu118
pip install torchaudio==2.3.0+cu118 --extra-index-url https://download.pytorch.org/whl/cu118
```
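After installing, it can help to confirm that the PyTorch build actually sees a GPU before moving on. The sketch below is one way to do that. The helper name `describe_torch` is hypothetical, and the function simply reports that PyTorch is missing when the import fails.

```python
def describe_torch() -> str:
    """Summarise the installed PyTorch build, or report that it is missing."""
    try:
        import torch
    except ImportError:
        return "torch is not installed"
    cuda_build = torch.version.cuda  # None for CPU-only builds
    gpu_available = torch.cuda.is_available()
    return (
        f"torch {torch.__version__}, built for CUDA {cuda_build}, "
        f"GPU available: {gpu_available}"
    )


if __name__ == "__main__":
    print(describe_torch())
```

If the output reports `GPU available: False` on a machine with a GPU, the installed wheel and the GPU driver are probably mismatched.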
### 4. Install Required Python Packages

Install the dependencies listed in `requirements.txt` to set up your environment:

```bash
pip install -r requirements.txt
```
### 5. System Setup: APT Update and FFmpeg

Before running inference, make sure the required system packages are installed.

#### Update APT Packages and Install FFmpeg

FFmpeg is needed for audio processing. Update the APT package index and install `ffmpeg` with the following commands (prefix them with `sudo` if you are not running as root):

```bash
apt update
apt install -y ffmpeg
```
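To confirm that FFmpeg is reachable from the environment that will run inference, a small check like the following may be useful. It is a sketch only: the helper name `tool_version_line` is hypothetical, and it relies on the standard `ffmpeg -version` flag.

```python
import shutil
import subprocess
from typing import Optional


def tool_version_line(tool: str = "ffmpeg") -> Optional[str]:
    """Return the first line of `<tool> -version`, or None if the tool
    is not on PATH or prints nothing to stdout."""
    exe = shutil.which(tool)
    if exe is None:
        return None
    result = subprocess.run(
        [exe, "-version"], capture_output=True, text=True, check=False
    )
    lines = result.stdout.splitlines()
    return lines[0] if lines else None


if __name__ == "__main__":
    line = tool_version_line()
    print(line if line is not None else "ffmpeg not found on PATH")
```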
### 6. Run Inference with the F5-TTS Model

With the environment ready, you can now run the inference script. Adjust the paths as needed:

```bash
# --model      name of the model to use for inference
# --ckpt_file  path to the checkpoint file with the saved model weights
#              (step 2 saves it as ckpts/model_2600000.pt)
# --ref_audio  reference audio file; the model tries to mimic its speaking style
#              and voice characteristics
# --ref_text   transcript of the reference audio
# --gen_text   text to synthesize, spoken in the style derived from the reference
python inference-cli.py \
  --model "F5-TTS" \
  --ckpt_file "path/to/model.pt" \
  --ref_audio "wavs/sample_audio.wav" \
  --ref_text "levantara a mão contra ele e o oficial então arrancara da espada e atravessara o de lado a lado estava direito ah" \
  --gen_text "O Brasil, oficialmente República Federativa do Brasil, é o maior país da América do Sul e da América Latina."
```
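When the reference or generated text contains quotes, commas, or accented characters, building the command as an argument list avoids shell-quoting mistakes. The sketch below assembles the same flags used in the command above. The helper name `build_inference_command` is hypothetical, and the script is assumed to be run from the repository root.

```python
import shlex
import sys
from typing import List


def build_inference_command(
    ckpt_file: str,
    ref_audio: str,
    ref_text: str,
    gen_text: str,
    model: str = "F5-TTS",
    script: str = "inference-cli.py",
) -> List[str]:
    """Build the argument list for the inference script shown in this guide."""
    return [
        sys.executable, script,
        "--model", model,
        "--ckpt_file", ckpt_file,
        "--ref_audio", ref_audio,
        "--ref_text", ref_text,
        "--gen_text", gen_text,
    ]


if __name__ == "__main__":
    cmd = build_inference_command(
        ckpt_file="path/to/model.pt",
        ref_audio="wavs/sample_audio.wav",
        ref_text="levantara a mão contra ele e o oficial então arrancara da espada e atravessara o de lado a lado estava direito ah",
        gen_text="O Brasil, oficialmente República Federativa do Brasil, é o maior país da América do Sul e da América Latina.",
    )
    # Print a copy-pasteable shell command. To execute it from Python instead,
    # pass the list to subprocess.run(cmd, check=True).
    print(shlex.join(cmd))
```

Because each flag value is a separate list element, no extra escaping is needed even if the text contains spaces or quotation marks.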