---
license: cc-by-4.0
language:
- hi
- as
- bn
- bo
- en
- gu
- kn
- ml
- mr
- or
- pa
- ta
- te
- ur
tags:
- tts
- indictts
- fs2
- mfa
- HS
- hybrid_segmentation
- fastspeech2
---

# Latest Fastspeech2 Models using Flat Start

This repository contains new, high-quality Fastspeech2 models for Indian languages, implemented using flat start for speech synthesis. The models generate mel-spectrograms from text input and can be used to synthesize speech.

The repository is large; the new models are in the `<language>_latest` folders.

Supported languages: Assamese, Bengali, Bodo, Dogri, Gujarati, Hindi, Kannada, Konkani (Maharashtrian), Maithili, Malayalam, Manipuri, Nepali, Punjabi, Rajasthani, Sanskrit, Tamil, Telugu.

NOTE: I do not own any rights to these models; all rights belong to the original owners. This repository is meant to make installation of the speech models easier.

## Model Files

The model for each language includes the following files:

- `config.yaml`: Configuration file for the Fastspeech2 model.
- `energy_stats.npz`: Energy statistics for normalization during synthesis.
- `feats_stats.npz`: Feature statistics for normalization during synthesis.
- `feats_type`: Feature type information.
- `pitch_stats.npz`: Pitch statistics for normalization during synthesis.
- `model.pth`: Pre-trained Fastspeech2 model weights.

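If you want a quick sanity check of what the normalization statistics contain before running inference, you can inspect the `.npz` files with NumPy. The path below is only an illustrative assumption; point it at wherever the model folder sits in your checkout:

```shell
# Hypothetical path -- adjust to the actual <language>_latest folder in your clone
python -c "import numpy as np; s = np.load('hindi_latest/energy_stats.npz'); print({k: s[k].shape for k in s.files})"
```
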
## Installation

1. Install [Miniconda](https://docs.conda.io/projects/miniconda/en/latest/) first. Create a conda environment using the provided `environment.yml` file:

```shell
conda env create -f environment.yml
```

2. Activate the conda environment (the environment name is defined inside `environment.yml`):

```shell
conda activate tts-hs-hifigan
```

3. Install PyTorch separately (you can install the specific version based on your requirements):

```shell
conda install pytorch cudatoolkit
pip install torchaudio
```

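Optionally, verify that PyTorch is importable and check whether a GPU is visible. This is just a convenience check, not something the models require:

```shell
# Prints the installed PyTorch version and whether CUDA is available (False on CPU-only machines)
python -c "import torch; print(torch.__version__, torch.cuda.is_available())"
```
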
## Vocoder

For generating WAV files from mel-spectrograms, you can use a vocoder of your choice. One popular option is the [HIFIGAN](https://github.com/jik876/hifi-gan) vocoder (clone that repo and put it in the current working directory). Please refer to the documentation of the vocoder you choose for installation and usage instructions.

(**We have used the HIFIGAN V1 vocoder and provide vocoders for a few languages in the Vocoder folder. If needed, adjust the path in the inference file.**)

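If you go with HiFi-GAN, cloning it into the current working directory is one command. The folder name below is simply git's default; make sure it matches whatever path the inference file expects:

```shell
# Clone HiFi-GAN into the current working directory (default folder name: hifi-gan)
git clone https://github.com/jik876/hifi-gan.git
```
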
## Usage

The directory paths are relative. (If needed, update the folder/file paths in **text_preprocess_for_inference.py** and **inference.py**.)

**Please give the language and gender in lowercase, and put the sample text between quotes. Adjust the output speed using the alpha parameter (a higher value gives slower output and vice versa). The output argument is optional; the provided name will be used for the output file.**

Use the inference file to synthesize speech from text inputs:

```shell
python inference.py --sample_text "Your input text here" --language <language>_latest --gender <gender> --alpha <alpha> --output_file <file_name.wav OR path/to/file_name.wav>
```

**Example:**

```shell
python inference.py --sample_text "श्रीलंका और पाकिस्तान में खेला जा रहा एशिया कप अब तक का सबसे विवादित टूर्नामेंट होता जा रहा है।" --language hindi_latest --gender male --alpha 1 --output_file male_hindi_output.wav
```

The file will be stored as `male_hindi_output.wav` in the current working directory. If the **--output_file** argument is not given, the output is stored as `<language>_<gender>_output.wav` in the current working directory.

**Use `<language>_latest` in --language to use the latest models.**

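As a further illustration of the alpha parameter described above, the same command can be rerun with a higher alpha for slower speech; the output file name here is arbitrary:

```shell
# alpha > 1 slows the output, alpha < 1 speeds it up
python inference.py --sample_text "Your input text here" --language hindi_latest --gender male --alpha 1.2 --output_file male_hindi_slow.wav
```
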
### Citation

If you use this Fastspeech2 model in your research or work, please consider citing:

"COPYRIGHT 2025, Speech Technology Consortium, Bhashini, MeitY and by Hema A Murthy & S Umesh, DEPARTMENT OF COMPUTER SCIENCE AND ENGINEERING and ELECTRICAL ENGINEERING, IIT MADRAS. ALL RIGHTS RESERVED."

Shield: [![CC BY 4.0][cc-by-shield]][cc-by]

This work is licensed under a
[Creative Commons Attribution 4.0 International License][cc-by].

[![CC BY 4.0][cc-by-image]][cc-by]

[cc-by]: http://creativecommons.org/licenses/by/4.0/
[cc-by-image]: https://i.creativecommons.org/l/by/4.0/88x31.png
[cc-by-shield]: https://img.shields.io/badge/License-CC%20BY%204.0-lightgrey.svg