	# SYSPIN Hackathon TTS API Documentation

	## Overview

	This API provides a Text-to-Speech (TTS) service that converts input text into speech audio. It supports multiple Indian languages and offers voice customization through predefined male and female speaker references.

	---

	## Endpoint: `/Get_Inference`

	* Method: `GET`
	* Description: Generates speech audio from the provided text using the specified language and speaker.

	### Query Parameters

	| Parameter | Type | Required | Description |
	| --------- | ------ | -------- | ----------- |
	| `text` | string | Yes | The input text to be converted into speech. |
	| `lang` | string | Yes | The language of the input text. Acceptable values include: `bhojpuri`, `bengali`, `english`, `gujarati`, `hindi`, `chhattisgarhi`, `kannada`, `magahi`, `maithili`, `marathi`, `telugu`. |
	| `speaker` | string | Yes | The desired speaker's voice. Format: `<language>_<gender>`. For example: `hindi_male`, `english_female`. Refer to the available speakers below. |
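As a rough illustration of how the three query parameters combine into a request URL, the sketch below uses only Python's standard library. The host and port (`localhost:8080`) are assumed from the run instructions later in this document, and the helper name `build_inference_url` is hypothetical, not part of the API.

```python
from urllib.parse import urlencode

# Assumption: the server is reachable locally on port 8080.
BASE_URL = "http://localhost:8080/Get_Inference"


def build_inference_url(text, lang, speaker, base_url=BASE_URL):
    """Return the full GET URL for a TTS request (hypothetical helper)."""
    # urlencode percent-encodes spaces and non-ASCII text, so input in
    # any of the supported scripts can be passed safely in the query string.
    query = urlencode({"text": text, "lang": lang, "speaker": speaker})
    return f"{base_url}?{query}"


print(build_inference_url("Hello world", "english", "english_female"))
```

Encoding the query this way matters mostly for non-English input, where raw characters in a URL can be mangled by intermediate tools.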

	### Available Speakers

	| Language | Language code | Male Speaker | Female Speaker |
	| ------------- | -------- | ------------------- | --------------------- |
	| chhattisgarhi | hne | `chhattisgarhi_male` | `chhattisgarhi_female` |
	| kannada | kn | `kannada_male` | `kannada_female` |
	| maithili | mai | `maithili_male` | `maithili_female` |
	| telugu | te | `telugu_male` | `telugu_female` |
	| bengali | bn | `bengali_male` | `bengali_female` |
	| bhojpuri | bho | `bhojpuri_male` | `bhojpuri_female` |
	| marathi | mr | `marathi_male` | `marathi_female` |
	| gujarati | gu | `gujarati_male` | `gujarati_female` |
	| hindi | hi | `hindi_male` | `hindi_female` |
	| magahi | mag | `magahi_male` | `magahi_female` |
	| english | en | `english_male` | `english_female` |
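Because every speaker ID follows the `<language>_<gender>` pattern, a client could build and check IDs locally before calling the API. The following is a minimal sketch under that assumption; the names `LANGUAGE_CODES`, `GENDERS`, and `speaker_id` are hypothetical, and the data is copied from the table above.

```python
# Language names and codes copied from the speaker table.
LANGUAGE_CODES = {
    "chhattisgarhi": "hne",
    "kannada": "kn",
    "maithili": "mai",
    "telugu": "te",
    "bengali": "bn",
    "bhojpuri": "bho",
    "marathi": "mr",
    "gujarati": "gu",
    "hindi": "hi",
    "magahi": "mag",
    "english": "en",
}
GENDERS = ("male", "female")


def speaker_id(lang, gender):
    """Build a speaker ID such as 'hindi_male' (hypothetical helper)."""
    lang = lang.strip().lower()
    gender = gender.strip().lower()
    if lang not in LANGUAGE_CODES:
        raise ValueError(f"unsupported language: {lang!r}")
    if gender not in GENDERS:
        raise ValueError(f"unsupported gender: {gender!r}")
    return f"{lang}_{gender}"


print(speaker_id("Hindi", "male"))
```

Validating on the client side avoids a round trip that would otherwise end in a 422 response.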

	### Responses

	* 200 OK: Returns a WAV audio file as a streaming response containing the synthesized speech.
	* 422 Unprocessable Entity: Returned when:
	    * Any of the required query parameters (`text`, `lang`, `speaker`) are missing.
	    * The specified `lang` is not supported.
	    * The specified `speaker` is not available.
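A client may want to sanity-check a 200 OK body before saving it. The sketch below uses Python's standard `wave` module and assumes the server returns uncompressed PCM WAV data, which this documentation does not confirm. The helper names and the silent 16 kHz clip used as a stand-in for a real response are illustrative only.

```python
import io
import wave


def describe_wav(payload):
    """Return basic properties of WAV bytes, e.g. a 200 OK response body."""
    with wave.open(io.BytesIO(payload), "rb") as wav:
        frames = wav.getnframes()
        rate = wav.getframerate()
        return {
            "channels": wav.getnchannels(),
            "sample_rate": rate,
            "duration_s": frames / rate,
        }


def make_silence(seconds=1, rate=16000):
    """Create a silent mono 16-bit WAV in memory (stand-in for a response)."""
    buf = io.BytesIO()
    with wave.open(buf, "wb") as wav:
        wav.setnchannels(1)   # mono
        wav.setsampwidth(2)   # 16-bit samples
        wav.setframerate(rate)
        wav.writeframes(b"\x00\x00" * rate * seconds)
    return buf.getvalue()


print(describe_wav(make_silence()))
```

With a real response, the bytes returned by the endpoint would replace the stand-in. A streamed WAV may carry placeholder sizes in its header, in which case the reported duration may be unreliable.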



	## Running the Server

	To start the FastAPI server, build the Docker image and run a container that publishes port 8080:

	```bash
	docker build -t your_image_name ./
	docker run -d -p 8080:8080 your_image_name
	```
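Since the container starts in the background (`-d`), a script that calls the API right away may fail until the server is listening. One possible way to wait, sketched with the standard library only; the function name `wait_for_port` and the default timeout values are assumptions, not part of this project.

```python
import socket
import time


def wait_for_port(host="localhost", port=8080, timeout=60.0, interval=1.0):
    """Return True once a TCP connection to host:port succeeds, else False."""
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        try:
            # A successful connect means something is listening on the port.
            with socket.create_connection((host, port), timeout=interval):
                return True
        except OSError:
            # Not listening yet (or unreachable); retry after a short pause.
            time.sleep(interval)
    return False
```

Calling `wait_for_port()` with the defaults blocks for up to a minute. An open port only shows that the server process is accepting connections; it does not guarantee that the TTS models have finished loading.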

	## Hosting on a GPU

	To run your FastAPI-based Text-to-Speech (TTS) server inside a Docker container with GPU support, follow these steps:

	---

	## Prerequisites

	1. NVIDIA GPU: Ensure your system has an NVIDIA GPU installed.

	2. NVIDIA Drivers: Install the appropriate NVIDIA drivers for your GPU.

	3. Docker: Install Docker on your system.

	4. NVIDIA Container Toolkit: Install the NVIDIA Container Toolkit to enable GPU support in Docker containers.

	---

	## Installation Steps

	### 1. Install NVIDIA Drivers

	Ensure that the NVIDIA drivers compatible with your GPU are installed on your system.

	### 2. Install Docker

	If Docker is not already installed, you can install it by following the official Docker installation guide for your operating system.

	### 3. Install NVIDIA Container Toolkit

	The NVIDIA Container Toolkit allows Docker containers to utilize the GPU.

	For Ubuntu:

	```bash
	# Add the package repositories
	# (apt-key is deprecated on recent Ubuntu releases; if this step fails,
	# follow the NVIDIA installation guide linked below instead)
	distribution=$(. /etc/os-release; echo "$ID$VERSION_ID")
	curl -s -L https://nvidia.github.io/nvidia-docker/gpgkey | sudo apt-key add -
	curl -s -L "https://nvidia.github.io/nvidia-docker/$distribution/nvidia-docker.list" | \
	  sudo tee /etc/apt/sources.list.d/nvidia-docker.list

	# Update the package lists
	sudo apt-get update

	# Install the NVIDIA Container Toolkit
	sudo apt-get install -y nvidia-container-toolkit

	# Restart the Docker daemon to apply changes
	sudo systemctl restart docker
	```

	For other operating systems: Refer to the [NVIDIA Container Toolkit installation guide](https://docs.nvidia.com/datacenter/cloud-native/container-toolkit/install-guide.html) for detailed instructions.

	### 4. Verify GPU Access in Docker

	To confirm that Docker can access your GPU, run the following command. If the toolkit is set up correctly, it prints the usual `nvidia-smi` table listing your GPU:

	```bash
	docker run --rm --gpus all nvidia/cuda:12.4.0-base-ubuntu22.04 nvidia-smi
	```
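The same check can be scripted, for example from a startup script inside the container. This is a minimal sketch that assumes `nvidia-smi` is on the `PATH` when a GPU is exposed; the function name `visible_gpus` is hypothetical.

```python
import shutil
import subprocess


def visible_gpus():
    """Return the GPU lines reported by `nvidia-smi -L`, or [] if none."""
    exe = shutil.which("nvidia-smi")
    if exe is None:
        # No NVIDIA tooling found in this environment.
        return []
    try:
        result = subprocess.run(
            [exe, "-L"], capture_output=True, text=True, timeout=10
        )
    except (OSError, subprocess.TimeoutExpired):
        return []
    if result.returncode != 0:
        return []
    # `nvidia-smi -L` prints one line per device, each starting with "GPU ".
    return [
        line.strip()
        for line in result.stdout.splitlines()
        if line.strip().startswith("GPU ")
    ]


gpus = visible_gpus()
print(f"{len(gpus)} GPU(s) visible")
```

An empty list inside a container started with `--gpus all` usually points to a driver or toolkit problem on the host.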


	## Running Your FastAPI TTS Server with GPU Support

	Assuming your FastAPI TTS application is containerized and ready to run:

	1. Build Your Docker Image

	Navigate to the directory containing your `Dockerfile` and build the Docker image:

	```bash
	docker build -t your_image_name .
	```


	2. Run the Docker Container with GPU Support

	Start the container with GPU access enabled:

	```bash
	docker run --gpus all -p 8080:8080 your_image_name
	```

	## Example API Call

	```python
	import requests

	# Define the base URL of your API
	base_url = 'http://localhost:8080/Get_Inference'

	# Set up the query parameters
	params = {
	    'text': 'Hello world',
	    'lang': 'english',
	    'speaker': 'english_female'
	}

	# Send the GET request
	response = requests.get(base_url, params=params)

	# Check if the request was successful
	if response.status_code == 200:
	    # Save the audio content to a file
	    with open('output.wav', 'wb') as f:
	        f.write(response.content)
	    print("Audio saved as 'output.wav'")
	else:
	    # Print the error message
	    print(f"Request failed with status code {response.status_code}")
	    print("Response:", response.text)
	```