---
title: TTS API
emoji: 🏆
colorFrom: green
colorTo: purple
sdk: docker
pinned: false
---



# Text-to-Speech API 🎤

A public Text-to-Speech API built with FastAPI and Microsoft Edge TTS, optimized for Hugging Face Spaces deployment.

## 🚀 Features

- **Convert text to natural-sounding speech** using Microsoft Edge TTS
- **Multiple voice options** with different languages and accents
- **Customizable speech parameters** (pitch and rate adjustment)
- **RESTful API** with automatic OpenAPI documentation
- **Public access** with CORS enabled
- **Real-time audio generation** and streaming

## 📖 API Documentation

Once deployed, visit the root URL for API information and links to the interactive API documentation (Swagger UI).

## 🔧 API Endpoints

### Core Endpoints

- `GET /` - API information and documentation links
- `GET /health` - Health check endpoint
- `GET /voices` - List all available voices
- `POST /synthesize` - Convert text to speech (JSON)
- `POST /synthesize-form` - Convert text to speech (Form data)
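
The read-only endpoints can also be queried from Python. The following is a minimal sketch using only the standard library. It assumes the placeholder URL from the examples in this README and that `/health` and `/voices` return JSON; the exact response shape is not documented here, so the result is returned as-is. `endpoint` and `get_json` are hypothetical helper names, not part of the API.

```python
import json
from urllib.parse import urljoin
from urllib.request import urlopen

BASE_URL = "https://your-space-url"  # placeholder, replace with your Space URL


def endpoint(base_url: str, path: str) -> str:
    """Join the Space URL and an endpoint path without doubling slashes."""
    return urljoin(base_url.rstrip("/") + "/", path.lstrip("/"))


def get_json(base_url: str, path: str, timeout: float = 30.0):
    """GET an endpoint and decode its JSON body (assumed to be JSON)."""
    with urlopen(endpoint(base_url, path), timeout=timeout) as resp:
        return json.load(resp)
```

With a deployed Space, `get_json(BASE_URL, "health")` and `get_json(BASE_URL, "voices")` would fetch the health status and the voice list respectively.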

### Example Usage

#### Using cURL with JSON:
```bash
curl -X POST 'https://your-space-url/synthesize' \
  -H 'Content-Type: application/json' \
  -d '{
    "text": "Hello from Hugging Face Spaces!",
    "voice": "en-GB-SoniaNeural",
    "pitch": "-10Hz",
    "rate": "+15%"
  }' \
  --output speech.mp3
```

#### Using cURL with Form Data:
```bash
curl -X POST 'https://your-space-url/synthesize-form' \
  -F 'text=Hello World!' \
  -F 'voice=en-US-AriaNeural' \
  -F 'pitch=+5Hz' \
  -F 'rate=+10%' \
  --output speech.mp3
```

#### Using Python requests:
```python
import requests

response = requests.post(
    'https://your-space-url/synthesize',
    json={
        'text': 'Hello from Python!',
        'voice': 'en-US-AriaNeural',
        'pitch': '+0Hz',
        'rate': '+0%'
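    },
    timeout=60,  # long texts can take a while to synthesize
)
response.raise_for_status()  # avoid saving a JSON error body as speech.mp3

# Write the returned MP3 bytes to disk
with open('speech.mp3', 'wb') as f:
    f.write(response.content)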
```

## 📝 Parameters

### Request Parameters

| Parameter | Type | Default | Description | Example |
|-----------|------|---------|-------------|---------|
| `text` | string | required | Text to convert to speech | "Hello World!" |
| `voice` | string | "en-US-AriaNeural" | Voice identifier | "en-GB-SoniaNeural" |
| `pitch` | string | "+0Hz" | Pitch adjustment | "+10Hz", "-15Hz" |
| `rate` | string | "+0%" | Rate adjustment | "+20%", "-10%" |

### Voice Examples

- `en-US-AriaNeural` - US English, Female
- `en-GB-SoniaNeural` - UK English, Female
- `en-AU-NatashaNeural` - Australian English, Female
- `de-DE-KatjaNeural` - German, Female
- `fr-FR-DeniseNeural` - French, Female
- `es-ES-ElviraNeural` - Spanish, Female

*Use the `/voices` endpoint to get the complete list of available voices.*

### Parameter Ranges

- **Pitch**: -50Hz to +50Hz (e.g., "-25Hz", "+0Hz", "+30Hz")
- **Rate**: -50% to +50% (e.g., "-20%", "+0%", "+25%")
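
Because both parameters are signed strings with a unit suffix, it can help to build them from integers on the client. A minimal sketch, assuming the ranges listed above; `signed`, `pitch`, and `rate` are hypothetical helper names, and the server may validate differently.

```python
def signed(value: int, unit: str, limit: int = 50) -> str:
    """Format an integer as a signed string with a unit, e.g. 10 -> '+10Hz'."""
    if not -limit <= value <= limit:
        raise ValueError(f"{value}{unit} is outside -{limit}{unit}..+{limit}{unit}")
    return f"{value:+d}{unit}"  # '+d' always emits the sign, including '+0'


def pitch(hz: int) -> str:
    """Pitch adjustment string for the `pitch` parameter."""
    return signed(hz, "Hz")


def rate(percent: int) -> str:
    """Rate adjustment string for the `rate` parameter."""
    return signed(percent, "%")
```

For example, `pitch(-25)` produces `"-25Hz"` and `rate(0)` produces `"+0%"`, matching the formats shown in the parameter table.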

## 🛠️ Local Development

### Installation

1. Clone the repository
2. Install dependencies:
   ```bash
   pip install -r requirements.txt
   ```
3. Run the server:
   ```bash
   python app.py
   ```
4. Open http://localhost:7860 for API documentation

### Docker Deployment

```bash
# Build the image
docker build -t tts-api .

# Run the container
docker run -p 7860:7860 tts-api
```

## 🌐 Hugging Face Spaces Deployment

1. Create a new Space on Hugging Face
2. Choose "Docker" as the SDK
3. Upload the following files:
   - `app.py` (main application)
   - `requirements.txt` (dependencies)
   - `Dockerfile` (container configuration)
   - `README.md` (this file)
4. Your API will be publicly accessible once deployed!

## 📋 Response Format

### Successful Response
- **Content-Type**: `audio/mpeg`
- **Body**: MP3 audio file

### Error Response
```json
{
  "detail": "Error description"
}
```
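
A client can tell the two response types apart before writing anything to disk. A minimal sketch, assuming success is signalled by status 200 with `audio/mpeg` and that errors carry the `detail` field shown above; `SynthesisError` and `audio_or_error` are hypothetical names.

```python
import json


class SynthesisError(Exception):
    """Raised when the API answers with something other than MP3 audio."""


def audio_or_error(status: int, content_type: str, body: bytes) -> bytes:
    """Return the MP3 bytes on success, otherwise raise with the error detail."""
    media_type = content_type.split(";")[0].strip().lower()
    if status == 200 and media_type == "audio/mpeg":
        return body
    try:
        detail = json.loads(body)["detail"]
    except (ValueError, KeyError, TypeError):
        # Body was not the documented JSON error; fall back to raw text.
        detail = body[:200].decode("utf-8", errors="replace")
    raise SynthesisError(f"HTTP {status}: {detail}")
```

With `requests`, this could be called as `audio_or_error(response.status_code, response.headers.get('Content-Type', ''), response.content)`.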

## 🔒 Rate Limiting & Usage

This is a public API, but please use it responsibly:
- Maximum text length: 5,000 characters
- Recommended: Don't exceed 100 requests per minute
- For production use, consider implementing authentication
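
Texts longer than the limit have to be split on the client and sent as separate requests. A minimal sketch of one way to do that, preferring sentence boundaries and hard-wrapping only when a single sentence is too long; `split_text` is a hypothetical helper, and how the resulting audio files are joined is left to the caller.

```python
import re

MAX_CHARS = 5000  # maximum text length stated above


def split_text(text: str, max_chars: int = MAX_CHARS) -> list[str]:
    """Split text into chunks of at most max_chars, preferring sentence ends."""
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())
    chunks, current = [], ""
    for sentence in sentences:
        # Hard-wrap any single sentence that is itself too long.
        while len(sentence) > max_chars:
            if current:
                chunks.append(current)
                current = ""
            chunks.append(sentence[:max_chars])
            sentence = sentence[max_chars:]
        candidate = f"{current} {sentence}".strip()
        if len(candidate) <= max_chars:
            current = candidate
        else:
            chunks.append(current)
            current = sentence.strip()
    if current:
        chunks.append(current)
    return chunks
```

Each chunk can then be posted to `/synthesize` in turn, ideally with a short pause between requests to stay within the recommended request rate.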

## 🐛 Troubleshooting

### Common Issues

1. **Voice not found**: Use the `/voices` endpoint to check available voices
2. **Invalid parameters**: Check pitch/rate format (signed value with Hz/% suffix, e.g. "+10Hz", "-10%")
3. **Text too long**: Maximum 5,000 characters per request
4. **Network timeout**: Long texts take longer to synthesize; raise the client timeout or send the text in smaller pieces
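
For transient timeouts, retrying with a growing delay is a common client-side remedy. A minimal sketch, not something the API itself provides; `with_retries` is a hypothetical helper, and the default exception types are Python built-ins that may need to be swapped for those of your HTTP library.

```python
import time


def with_retries(call, attempts=3, base_delay=1.0,
                 retry_on=(TimeoutError, ConnectionError), sleep=time.sleep):
    """Run call(); on a retryable error wait base_delay * 2**n and try again."""
    for attempt in range(attempts):
        try:
            return call()
        except retry_on:
            if attempt == attempts - 1:
                raise  # out of attempts: surface the last error
            sleep(base_delay * 2 ** attempt)
```

When using `requests`, pass `retry_on=(requests.exceptions.Timeout, requests.exceptions.ConnectionError)` and wrap the `requests.post(...)` call in a function or lambda.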

## 📄 License

This project uses the Microsoft Edge TTS service. Please review Microsoft's terms of service for usage guidelines.

## 🤝 Contributing

Feel free to open issues or submit pull requests to improve this API!

---

**Made with ❤️ for the Hugging Face community**