---
title: TTS API
emoji: 🔊
colorFrom: green
colorTo: purple
sdk: docker
pinned: false
---
# Text-to-Speech API
A public Text-to-Speech API built with FastAPI and Microsoft Edge TTS, optimized for Hugging Face Spaces deployment.
## Features
- **Convert text to natural-sounding speech** using Microsoft Edge TTS
- **Multiple voice options** with different languages and accents
- **Customizable speech parameters** (pitch and rate adjustment)
- **RESTful API** with automatic OpenAPI documentation
- **Public access** with CORS enabled
- **Real-time audio generation** and streaming
## API Documentation
Once deployed, visit the root URL to access the interactive API documentation (Swagger UI).
## API Endpoints
### Core Endpoints
- `GET /` - API information and documentation links
- `GET /health` - Health check endpoint
- `GET /voices` - List all available voices
- `POST /synthesize` - Convert text to speech (JSON)
- `POST /synthesize-form` - Convert text to speech (Form data)
### Example Usage
#### Using cURL with JSON:
```bash
curl -X POST 'https://your-space-url/synthesize' \
  -H 'Content-Type: application/json' \
  -d '{
    "text": "Hello from Hugging Face Spaces!",
    "voice": "en-GB-SoniaNeural",
    "pitch": "-10Hz",
    "rate": "+15%"
  }' \
  --output speech.mp3
```
#### Using cURL with Form Data:
```bash
curl -X POST 'https://your-space-url/synthesize-form' \
  -F 'text=Hello World!' \
  -F 'voice=en-US-AriaNeural' \
  -F 'pitch=+5Hz' \
  -F 'rate=+10%' \
  --output speech.mp3
```
#### Using Python requests:
```python
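import requests

response = requests.post(
    'https://your-space-url/synthesize',
    json={
        'text': 'Hello from Python!',
        'voice': 'en-US-AriaNeural',
        'pitch': '+0Hz',
        'rate': '+0%'
    },
    timeout=60,
)
# Fail loudly on an error response instead of saving its JSON body as audio
response.raise_for_status()

# Save the returned MP3 bytes
with open('speech.mp3', 'wb') as f:
    f.write(response.content)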
```
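#### Querying the GET endpoints with Python:
The `GET` endpoints can be called in a similar way. The sketch below uses only the standard library. `endpoint` and `get_json` are hypothetical helper names, and it assumes that `/health` and `/voices` return JSON, since their response schemas are not documented here.

```python
import json
from urllib.request import urlopen

BASE_URL = "https://your-space-url"  # placeholder, as in the examples above


def endpoint(path: str, base_url: str = BASE_URL) -> str:
    """Join the base URL and an endpoint path with exactly one slash."""
    return base_url.rstrip("/") + "/" + path.lstrip("/")


def get_json(path: str, base_url: str = BASE_URL, timeout: float = 10.0):
    """GET an endpoint and parse the body as JSON (assumed response format)."""
    with urlopen(endpoint(path, base_url), timeout=timeout) as response:
        return json.load(response)
```

With a deployed Space, `get_json("/health")` and `get_json("/voices")` would return whatever JSON the server sends; inspect the result before relying on specific keys.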
## Parameters
### Request Parameters
| Parameter | Type | Default | Description | Example |
|-----------|------|---------|-------------|---------|
| `text` | string | required | Text to convert to speech | "Hello World!" |
| `voice` | string | "en-US-AriaNeural" | Voice identifier | "en-GB-SoniaNeural" |
| `pitch` | string | "+0Hz" | Pitch adjustment | "+10Hz", "-15Hz" |
| `rate` | string | "+0%" | Rate adjustment | "+20%", "-10%" |
### Voice Examples
- `en-US-AriaNeural` - US English, Female
- `en-GB-SoniaNeural` - UK English, Female
- `en-AU-NatashaNeural` - Australian English, Female
- `de-DE-KatjaNeural` - German, Female
- `fr-FR-DeniseNeural` - French, Female
- `es-ES-ElviraNeural` - Spanish, Female
*Use the `/voices` endpoint to get the complete list of available voices.*
### Parameter Ranges
- **Pitch**: -50Hz to +50Hz (e.g., "-25Hz", "+0Hz", "+30Hz")
- **Rate**: -50% to +50% (e.g., "-20%", "+0%", "+25%")
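
Checking these values on the client before sending a request avoids a round trip for malformed input. The sketch below is a hypothetical helper, not part of the API. It assumes a mandatory sign, an integer amount, and the ranges listed above; the server's own validation may differ.

```python
import re

# Assumed formats: a signed integer followed by "Hz" (pitch) or "%" (rate).
_PITCH_RE = re.compile(r"([+-]\d+)Hz")
_RATE_RE = re.compile(r"([+-]\d+)%")


def _check(value: str, pattern: re.Pattern, name: str, low: int = -50, high: int = 50) -> int:
    """Return the numeric amount, or raise ValueError if format or range is wrong."""
    match = pattern.fullmatch(value)
    if match is None:
        raise ValueError(f"{name} has an invalid format: {value!r}")
    amount = int(match.group(1))
    if not low <= amount <= high:
        raise ValueError(f"{name} must be between {low} and {high}, got {amount}")
    return amount


def validate_pitch(value: str) -> int:
    """Validate a pitch string such as '+10Hz' and return the amount in Hz."""
    return _check(value, _PITCH_RE, "pitch")


def validate_rate(value: str) -> int:
    """Validate a rate string such as '-20%' and return the percentage."""
    return _check(value, _RATE_RE, "rate")
```

For instance, `validate_pitch("+10Hz")` returns `10`, while `validate_rate("+60%")` raises `ValueError` because it falls outside the documented range.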
## Local Development
### Installation
1. Clone the repository
2. Install dependencies:
   ```bash
   pip install -r requirements.txt
   ```
3. Run the server:
   ```bash
   python app.py
   ```
4. Open http://localhost:7860 for API documentation
### Docker Deployment
```bash
# Build the image
docker build -t tts-api .
# Run the container
docker run -p 7860:7860 tts-api
```
## Hugging Face Spaces Deployment
1. Create a new Space on Hugging Face
2. Choose "Docker" as the SDK
3. Upload the following files:
   - `app.py` (main application)
   - `requirements.txt` (dependencies)
   - `Dockerfile` (container configuration)
   - `README.md` (this file)
4. Your API will be publicly accessible once deployed!
## Response Format
### Successful Response
- **Content-Type**: `audio/mpeg`
- **Body**: MP3 audio file
### Error Response
```json
{
"detail": "Error description"
}
```
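
A client can tell the two response types apart by status code and `Content-Type`. The following is a minimal sketch under that assumption; `SynthesisError` and `extract_audio` are hypothetical names, and the status codes the server actually uses for errors are not documented here.

```python
import json


class SynthesisError(Exception):
    """Raised when the API returns an error body instead of audio."""


def extract_audio(status_code: int, content_type: str, body: bytes) -> bytes:
    """Return the MP3 bytes, or raise SynthesisError carrying the `detail` field."""
    media_type = content_type.split(";")[0].strip().lower()
    if status_code == 200 and media_type == "audio/mpeg":
        return body
    try:
        detail = json.loads(body.decode("utf-8"))["detail"]
    except (ValueError, KeyError, TypeError):
        # Body was not the documented JSON shape; fall back to a raw excerpt.
        detail = body[:200].decode("utf-8", errors="replace")
    raise SynthesisError(f"HTTP {status_code}: {detail}")
```

With `requests`, this could be called as `extract_audio(response.status_code, response.headers.get("Content-Type", ""), response.content)`.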
## Rate Limiting & Usage
This is a public API, but please use it responsibly:
- Maximum text length: 5,000 characters
- Recommended: Don't exceed 100 requests per minute
- For production use, consider implementing authentication
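
Text longer than the 5,000-character limit has to be split into several requests. The sketch below shows one possible client-side approach: it breaks text at sentence ends where it can and hard-wraps any single sentence that is still too long. `split_text` is a hypothetical helper, and joining the resulting MP3 files is left out.

```python
import re

MAX_CHARS = 5000  # limit stated above


def split_text(text: str, limit: int = MAX_CHARS) -> list[str]:
    """Split text into chunks of at most `limit` characters, preferring sentence ends."""
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())
    chunks: list[str] = []
    current = ""
    for sentence in sentences:
        # A single sentence longer than the limit is cut into fixed-size pieces.
        while len(sentence) > limit:
            if current:
                chunks.append(current)
                current = ""
            chunks.append(sentence[:limit])
            sentence = sentence[limit:]
        candidate = f"{current} {sentence}" if current else sentence
        if len(candidate) <= limit:
            current = candidate
        else:
            chunks.append(current)
            current = sentence
    if current:
        chunks.append(current)
    return chunks
```

Each chunk can then be sent to `/synthesize` as its own request, ideally with a pause between requests to stay under the recommended rate.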
## Troubleshooting
### Common Issues
1. **Voice not found**: Use the `/voices` endpoint to check available voices
2. **Invalid parameters**: Check pitch/rate format (must include Hz/% suffix)
3. **Text too long**: Maximum 5,000 characters per request
4. **Network timeout**: Large texts may take longer to process
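
For timeouts in particular, a generous client timeout combined with a few retries is often enough. The sketch below is a generic retry helper with exponential backoff. `with_retries` is a hypothetical name, and the attempt count and delays are illustrative values, not recommendations from the API.

```python
import time


def with_retries(call, attempts=3, base_delay=1.0,
                 retry_on=(TimeoutError, ConnectionError), sleep=time.sleep):
    """Run `call()`; on a retryable error wait base_delay * 2**attempt and try again."""
    for attempt in range(attempts):
        try:
            return call()
        except retry_on:
            if attempt == attempts - 1:
                raise  # out of attempts: surface the last error
            sleep(base_delay * 2 ** attempt)
```

With `requests`, one might pass `lambda: requests.post(url, json=payload, timeout=60)` as `call` and `retry_on=(requests.Timeout, requests.ConnectionError)`.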
## License
This project uses the Microsoft Edge TTS service. Please review Microsoft's terms of service for usage guidelines.
## Contributing
Feel free to open issues or submit pull requests to improve this API!
---
**Made with ❤️ for the Hugging Face community**