ModelsLab/F5-tts-brazilian

AbhishekTiwariAKT committed on Nov 2, 2024

Commit ad2b94a (verified) · 1 parent: cee2aea

Upload /README.md with huggingface_hub

Files changed (1)

README.md +73 -0

README.md ADDED

	@@ -0,0 +1,73 @@

+# F5-TTS Model Inference Guide
+This guide walks through loading and running the **F5-TTS** model for text-to-speech synthesis from a reference audio clip and its transcript.
+---
+### Did You Know?
+*Text-to-speech models like F5-TTS can mimic voice characteristics by analyzing just a few seconds of audio input. This adaptability is paving the way for personalized, AI-driven audio content.*
+---
+## Steps to Run the F5-TTS Model
+### 1. Clone the Repository
+Start by cloning the F5-TTS repository to your local environment:
+```bash
+git clone https://github.com/SWivid/F5-TTS.git
+cd F5-TTS
+```
+### 2. Install PyTorch and TorchAudio with CUDA Support
+Install PyTorch and TorchAudio builds that match your CUDA version to enable GPU support. The commands below install the CUDA 11.8 builds:
+```bash
+pip install torch==2.3.0+cu118 --extra-index-url https://download.pytorch.org/whl/cu118
+pip install torchaudio==2.3.0+cu118 --extra-index-url https://download.pytorch.org/whl/cu118
+```
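To confirm that the installed build can see the GPU, a quick check along these lines may help. This is a minimal sketch, not part of the F5-TTS repository: the function name `check_torch_cuda` is hypothetical, and it assumes `python3` is the interpreter of the environment where PyTorch was installed.

```bash
# Hypothetical helper: report the installed PyTorch version and whether CUDA is usable.
check_torch_cuda() {
  if ! command -v python3 >/dev/null 2>&1; then
    echo "python3 not found on PATH"
    return 1
  fi
  python3 - <<'PY'
try:
    import torch
except Exception as exc:  # missing package or a broken install
    print("torch could not be imported:", exc)
else:
    print("torch", torch.__version__, "| CUDA available:", torch.cuda.is_available())
PY
}

check_torch_cuda || echo "GPU check could not run"
```

If the last field reads `False`, the CPU-only wheel was probably installed, or the driver does not match the CUDA build.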
+### 3. Install Required Python Packages
+Install the remaining dependencies listed in `requirements.txt`:
+```bash
+pip install -r requirements.txt
+```
+### 4. System Setup: Install FFmpeg
+FFmpeg is required for audio processing. On Debian or Ubuntu, update the APT package index and install it (prefix the commands with `sudo` if you are not running as root):
+```bash
+apt update
+apt install -y ffmpeg
+```
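After installation, it can be worth confirming that `ffmpeg` is actually on the `PATH`. The snippet below is only a sketch; `have_cmd` is a hypothetical helper name, not something provided by F5-TTS or FFmpeg.

```bash
# Hypothetical helper: succeed only if the named command is available on PATH.
have_cmd() {
  command -v "$1" >/dev/null 2>&1
}

if have_cmd ffmpeg; then
  # Print only the first line of the version banner.
  ffmpeg -version | head -n 1
else
  echo "ffmpeg not found; install it before running inference" >&2
fi
```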
+### 5. Run Inference with the F5-TTS Model
+With the environment ready, you can now run the inference script. Adjust the paths as needed:
+```bash
+# --model:     name of the model to use for inference
+# --ckpt_file: path to the checkpoint file containing the saved model weights
+# --ref_audio: reference audio whose voice and speaking style the model mimics
+# --ref_text:  transcript of the reference audio
+# --gen_text:  text to synthesize in the voice of the reference audio
+python inference-cli.py \
+  --model "F5-TTS" \
+  --ckpt_file "path/to/model.pt" \
+  --ref_audio "wavs/sample_audio.wav" \
+  --ref_text "levantara a mão contra ele e o oficial então arrancara da espada e atravessara o de lado a lado estava direito ah" \
+  --gen_text "O Brasil, oficialmente República Federativa do Brasil, é o maior país da América do Sul e da América Latina."
+```
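Because the checkpoint and reference audio paths above are placeholders, a small pre-flight check can catch a wrong path before the model is loaded. This is a sketch under that assumption; `missing_files` is a hypothetical helper, not part of `inference-cli.py`.

```bash
# Hypothetical helper: print every given path that is not a regular file
# and return non-zero if at least one is missing.
missing_files() {
  rc=0
  for f in "$@"; do
    if [ ! -f "$f" ]; then
      echo "missing: $f"
      rc=1
    fi
  done
  return "$rc"
}

# Placeholder paths copied from the command above; adjust them to your setup.
if missing_files "path/to/model.pt" "wavs/sample_audio.wav"; then
  echo "inputs found; ready to run inference-cli.py"
else
  echo "fix the paths listed above before running inference" >&2
fi
```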