# SYSPIN Hackathon TTS API Documentation
## Overview
This API provides a Text-to-Speech (TTS) service that converts input text into speech audio. It supports multiple Indian languages and offers voice customization through predefined male and female speaker references.
---
## Endpoint: `/Get_Inference`
* **Method**: `GET`
* **Description**: Generates speech audio from the provided text using the specified language and speaker.
### Query Parameters
| Parameter | Type | Required | Description |
| --------- | ------ | -------- | ----------- |
| `text` | string | Yes | The input text to be converted into speech. |
| `lang` | string | Yes | The language of the input text. Accepted values: `bhojpuri`, `bengali`, `english`, `gujarati`, `hindi`, `chhattisgarhi`, `kannada`, `magahi`, `maithili`, `marathi`, `telugu`. |
| `speaker` | string | Yes | The desired speaker's voice. Format: `<language>_<gender>`. For example: `hindi_male`, `english_female`. Refer to the available speakers below. |
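
As a rough illustration of how the three required parameters combine into a request URL, the sketch below uses only the Python standard library. The base URL assumes the server is reachable at `localhost:8080`, as in the Docker commands in this document, and the helper name `build_inference_url` is hypothetical.

```python
from urllib.parse import urlencode

# Assumption: the server runs locally on port 8080.
BASE_URL = "http://localhost:8080/Get_Inference"


def build_inference_url(text: str, lang: str, speaker: str) -> str:
    """Return the full GET URL with all three required query parameters."""
    # urlencode percent-encodes the values, so non-ASCII text is safe to pass.
    query = urlencode({"text": text, "lang": lang, "speaker": speaker})
    return f"{BASE_URL}?{query}"


url = build_inference_url("Hello world", "english", "english_female")
print(url)
# http://localhost:8080/Get_Inference?text=Hello+world&lang=english&speaker=english_female
```

Encoding the query string this way matters mostly for `text`, which will often contain spaces or characters from Indic scripts.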
### Available Speakers
| Language | Language code | Male Speaker | Female Speaker |
| ------------- | ------------- | ------------------- | --------------------- |
| chhattisgarhi | hne | chhattisgarhi\_male | chhattisgarhi\_female |
| kannada | kn | kannada\_male | kannada\_female |
| maithili | mai | maithili\_male | maithili\_female |
| telugu | te | telugu\_male | telugu\_female |
| bengali | bn | bengali\_male | bengali\_female |
| bhojpuri | bho | bhojpuri\_male | bhojpuri\_female |
| marathi | mr | marathi\_male | marathi\_female |
| gujarati | gu | gujarati\_male | gujarati\_female |
| hindi | hi | hindi\_male | hindi\_female |
| magahi | mag | magahi\_male | magahi\_female |
| english | en | english\_male | english\_female |
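
A client may want to check a speaker name before sending a request. The following is a minimal sketch that mirrors the speaker table; the names `LANGUAGES`, `GENDERS`, and `is_known_speaker` are hypothetical, and the server's own validation logic may differ.

```python
# Languages listed in the speaker table of this document.
LANGUAGES = {
    "chhattisgarhi", "kannada", "maithili", "telugu", "bengali", "bhojpuri",
    "marathi", "gujarati", "hindi", "magahi", "english",
}
GENDERS = {"male", "female"}


def is_known_speaker(speaker: str) -> bool:
    """Return True if speaker has the form '<language>_<gender>' from the table."""
    # rpartition splits on the last underscore; sep is "" when none is present.
    language, sep, gender = speaker.rpartition("_")
    return sep == "_" and language in LANGUAGES and gender in GENDERS


print(is_known_speaker("hindi_male"))   # True
print(is_known_speaker("tamil_male"))   # False: tamil is not in the table
```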
### Responses
* **200 OK**: Returns a WAV audio file as a streaming response containing the synthesized speech.
* **422 Unprocessable Entity**: Returned when:
  * Any of the required query parameters (`text`, `lang`, `speaker`) is missing.
  * The specified `lang` is not supported.
  * The specified `speaker` is not available.
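
The two documented outcomes can be told apart on the client side roughly as follows. This is a sketch under assumptions: the function name `classify_response` and its return labels are hypothetical, and the body check relies only on the standard RIFF/WAVE header layout of WAV files rather than on anything specific to this server.

```python
def classify_response(status_code: int, body: bytes) -> str:
    """Map a status code and response body to a coarse outcome label."""
    if status_code == 200:
        # A WAV file starts with b"RIFF", four size bytes, then b"WAVE".
        is_wav = len(body) >= 12 and body[:4] == b"RIFF" and body[8:12] == b"WAVE"
        return "audio" if is_wav else "unexpected-body"
    if status_code == 422:
        # Missing parameter, unsupported lang, or unavailable speaker.
        return "invalid-request"
    return "other-error"
```

Checking the header before writing the file avoids saving an error page under a `.wav` name.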
## Running the Server
To start the FastAPI server, build the Docker image and run it as a container:
```bash
docker build -t your_image_name ./
docker run -d -p 8080:8080 your_image_name
```
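
Because the container starts in detached mode (`-d`), the server may not accept requests immediately. One possible way to wait for it is to poll the published port; the helper below is a sketch, and the name `wait_for_port` and its default timeout are assumptions rather than part of this project.

```python
import socket
import time


def wait_for_port(host: str, port: int, timeout: float = 60.0) -> bool:
    """Poll until a TCP connection to host:port succeeds or timeout elapses."""
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        try:
            # A successful connect means something is listening on the port.
            with socket.create_connection((host, port), timeout=1.0):
                return True
        except OSError:
            # Not listening yet (or unreachable); retry after a short pause.
            time.sleep(0.5)
    return False
```

Calling `wait_for_port("localhost", 8080)` after `docker run` returns `True` once the port accepts connections. An open port does not guarantee that the model has finished loading, so the first request may still be slow.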
## Hosting on a GPU
To run your FastAPI-based Text-to-Speech (TTS) server inside a Docker container with GPU support, follow these steps:
---
## Prerequisites
1. **NVIDIA GPU**: Ensure your system has an NVIDIA GPU installed.
2. **NVIDIA Drivers**: Install the appropriate NVIDIA drivers for your GPU.
3. **Docker**: Install Docker on your system.
4. **NVIDIA Container Toolkit**: Install the NVIDIA Container Toolkit to enable GPU support in Docker containers.
---
## Installation Steps
### 1. Install NVIDIA Drivers
Ensure that the NVIDIA drivers compatible with your GPU are installed on your system.
### 2. Install Docker
If Docker is not already installed, you can install it by following the official Docker installation guide for your operating system.
### 3. Install NVIDIA Container Toolkit
The NVIDIA Container Toolkit allows Docker containers to utilize the GPU.
**For Ubuntu:**
```bash
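# Add the NVIDIA Container Toolkit repository and its signing key
# (the older nvidia-docker repository and apt-key are deprecated)
curl -fsSL https://nvidia.github.io/libnvidia-container/gpgkey | \
sudo gpg --dearmor -o /usr/share/keyrings/nvidia-container-toolkit-keyring.gpg
curl -s -L https://nvidia.github.io/libnvidia-container/stable/deb/nvidia-container-toolkit.list | \
sed 's#deb https://#deb [signed-by=/usr/share/keyrings/nvidia-container-toolkit-keyring.gpg] https://#g' | \
sudo tee /etc/apt/sources.list.d/nvidia-container-toolkit.list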
# Update the package lists
sudo apt-get update
# Install the NVIDIA Container Toolkit
sudo apt-get install -y nvidia-container-toolkit
# Register the NVIDIA runtime with Docker, then restart the daemon to apply changes
sudo nvidia-ctk runtime configure --runtime=docker
sudo systemctl restart docker
```
**For other operating systems:** Refer to the [NVIDIA Container Toolkit installation guide](https://docs.nvidia.com/datacenter/cloud-native/container-toolkit/install-guide.html) for detailed instructions.
### 4. Verify GPU Access in Docker
To confirm that Docker can access your GPU, run the following command:
```bash
docker run --rm --gpus all nvidia/cuda:12.4.0-base-ubuntu22.04 nvidia-smi
```
## Running Your FastAPI TTS Server with GPU Support
Assuming your FastAPI TTS application is containerized and ready to run:
1. **Build Your Docker Image**

   Navigate to the directory containing your `Dockerfile` and build the Docker image:

   ```bash
   docker build -t your_image_name .
   ```

2. **Run the Docker Container with GPU Support**

   Start the container with GPU access enabled:

   ```bash
   docker run --gpus all -p 8080:8080 your_image_name
   ```
## Example API Call
```python
import requests

# Define the base URL of your API
base_url = 'http://localhost:8080/Get_Inference'

# Set up the query parameters
params = {
    'text': 'Hello world',
    'lang': 'english',
    'speaker': 'english_female'
}

# Send the GET request
response = requests.get(base_url, params=params)

# Check if the request was successful
if response.status_code == 200:
    # Save the audio content to a file
    with open('output.wav', 'wb') as f:
        f.write(response.content)
    print("Audio saved as 'output.wav'")
else:
    # Print the error message
    print(f"Request failed with status code {response.status_code}")
    print("Response:", response.text)
```