---
title: TTS API
emoji: 🏆
colorFrom: green
colorTo: purple
sdk: docker
pinned: false
---



# Text-to-Speech API 🎤

A public Text-to-Speech API built with FastAPI and Microsoft Edge TTS, optimized for Hugging Face Spaces deployment.

## 🚀 Features

- **Convert text to natural-sounding speech** using Microsoft Edge TTS
- **Multiple voice options** with different languages and accents
- **Customizable speech parameters** (pitch and rate adjustment)
- **RESTful API** with automatic OpenAPI documentation
- **Public access** with CORS enabled
- **Real-time audio generation** and streaming

## 📖 API Documentation

Once deployed, visit the root URL to access the interactive API documentation (Swagger UI).

## 🔧 API Endpoints

### Core Endpoints

- `GET /` - API information and documentation links
- `GET /health` - Health check endpoint
- `GET /voices` - List all available voices
- `POST /synthesize` - Convert text to speech (JSON)
- `POST /synthesize-form` - Convert text to speech (Form data)
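
The read-only endpoints can be called from Python with the standard library alone. The following is a minimal sketch, not the project's own client: `endpoint_url` and `get_json` are hypothetical helper names, and it assumes `/health` and `/voices` return JSON, which this README does not state explicitly.

```python
import json
from urllib.parse import urljoin
from urllib.request import urlopen

# Placeholder host used throughout this README; replace it with your Space URL.
BASE_URL = "https://your-space-url"


def endpoint_url(base_url: str, path: str) -> str:
    """Join the base URL and an endpoint path such as '/health' or '/voices'."""
    return urljoin(base_url.rstrip("/") + "/", path.lstrip("/"))


def get_json(base_url: str, path: str, timeout: float = 10.0):
    """GET an endpoint and decode the body as JSON (assumed response format)."""
    with urlopen(endpoint_url(base_url, path), timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))
```

For example, `get_json(BASE_URL, "/health")` would fetch the health check and `get_json(BASE_URL, "/voices")` the voice list once `BASE_URL` points at a running deployment.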

### Example Usage

#### Using cURL with JSON:
```bash
curl -X POST 'https://your-space-url/synthesize' \
  -H 'Content-Type: application/json' \
  -d '{
    "text": "Hello from Hugging Face Spaces!",
    "voice": "en-GB-SoniaNeural",
    "pitch": "-10Hz",
    "rate": "+15%"
  }' \
  --output speech.mp3
```

#### Using cURL with Form Data:
```bash
curl -X POST 'https://your-space-url/synthesize-form' \
  -F 'text=Hello World!' \
  -F 'voice=en-US-AriaNeural' \
  -F 'pitch=+5Hz' \
  -F 'rate=+10%' \
  --output speech.mp3
```

#### Using Python requests:
```python
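import requests

# Replace the placeholder host with your Space URL.
response = requests.post(
    'https://your-space-url/synthesize',
    json={
        'text': 'Hello from Python!',
        'voice': 'en-US-AriaNeural',
        'pitch': '+0Hz',
        'rate': '+0%'
    },
    timeout=60,  # large texts may take longer to process
)
# Fail loudly on an error response instead of saving a JSON error body as MP3.
response.raise_for_status()

with open('speech.mp3', 'wb') as f:
    f.write(response.content)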
```

## 📝 Parameters

### Request Parameters

| Parameter | Type | Default | Description | Example |
|-----------|------|---------|-------------|---------|
| `text` | string | required | Text to convert to speech | "Hello World!" |
| `voice` | string | "en-US-AriaNeural" | Voice identifier | "en-GB-SoniaNeural" |
| `pitch` | string | "+0Hz" | Pitch adjustment | "+10Hz", "-15Hz" |
| `rate` | string | "+0%" | Rate adjustment | "+20%", "-10%" |
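
The defaults in the table can be mirrored on the client side so that only `text` has to be supplied. This is a small sketch; `build_payload` is a hypothetical helper, not part of the API, and the empty-text check is an assumption about what the server would reject.

```python
from typing import Dict

# Defaults copied from the parameter table above.
DEFAULT_VOICE = "en-US-AriaNeural"
DEFAULT_PITCH = "+0Hz"
DEFAULT_RATE = "+0%"


def build_payload(text: str, voice: str = DEFAULT_VOICE,
                  pitch: str = DEFAULT_PITCH,
                  rate: str = DEFAULT_RATE) -> Dict[str, str]:
    """Return a JSON body for POST /synthesize; `text` is the only required field."""
    if not text or not text.strip():
        raise ValueError("text is required and must not be empty")
    return {"text": text, "voice": voice, "pitch": pitch, "rate": rate}
```

The returned dictionary can be passed as the `json=` argument of `requests.post`.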

### Voice Examples

- `en-US-AriaNeural` - US English, Female
- `en-GB-SoniaNeural` - UK English, Female
- `en-AU-NatashaNeural` - Australian English, Female
- `de-DE-KatjaNeural` - German, Female
- `fr-FR-DeniseNeural` - French, Female
- `es-ES-ElviraNeural` - Spanish, Female

*Use the `/voices` endpoint to get the complete list of available voices.*
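
The identifiers listed above follow a `language-REGION-Name` pattern, which makes it easy to filter a list of voices by language. The sketch below assumes that three-part pattern; identifiers returned by `/voices` may not all follow it, and `parse_voice_id` and `voices_for_language` are hypothetical helpers.

```python
from typing import List, NamedTuple


class VoiceId(NamedTuple):
    language: str  # e.g. "en"
    region: str    # e.g. "GB"
    name: str      # e.g. "SoniaNeural"


def parse_voice_id(voice: str) -> VoiceId:
    """Split an identifier such as 'en-GB-SoniaNeural' into its three parts."""
    parts = voice.split("-", 2)
    if len(parts) != 3 or not all(parts):
        raise ValueError(f"unexpected voice identifier: {voice!r}")
    return VoiceId(*parts)


def voices_for_language(voices: List[str], language: str) -> List[str]:
    """Keep only the identifiers whose language code matches, e.g. 'en'."""
    return [v for v in voices if parse_voice_id(v).language == language]
```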

### Parameter Ranges

- **Pitch**: -50Hz to +50Hz (e.g., "-25Hz", "+0Hz", "+30Hz")
- **Rate**: -50% to +50% (e.g., "-20%", "+0%", "+25%")
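
Checking these ranges before sending a request avoids a round trip for malformed values. The following sketch assumes the format shown in the examples (an explicit sign, an integer, then the `Hz` or `%` suffix); the server's actual validation may be stricter or looser, and the function names are hypothetical.

```python
import re

# Assumed formats, based on the examples above: sign, integer, unit suffix.
_PITCH_RE = re.compile(r"^[+-](\d+)Hz$")
_RATE_RE = re.compile(r"^[+-](\d+)%$")
MAX_PITCH_HZ = 50
MAX_RATE_PERCENT = 50


def _check(value: str, pattern: "re.Pattern[str]", limit: int, label: str) -> str:
    match = pattern.match(value)
    if match is None:
        raise ValueError(f"{label} {value!r} must look like '+10' or '-10' plus its unit suffix")
    if int(match.group(1)) > limit:
        raise ValueError(f"{label} {value!r} is outside the range -{limit} to +{limit}")
    return value


def validate_pitch(pitch: str) -> str:
    """Return the pitch unchanged if it is within -50Hz to +50Hz, else raise ValueError."""
    return _check(pitch, _PITCH_RE, MAX_PITCH_HZ, "pitch")


def validate_rate(rate: str) -> str:
    """Return the rate unchanged if it is within -50% to +50%, else raise ValueError."""
    return _check(rate, _RATE_RE, MAX_RATE_PERCENT, "rate")
```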

## 🛠️ Local Development

### Installation

1. Clone the repository
2. Install dependencies:
   ```bash
   pip install -r requirements.txt
   ```
3. Run the server:
   ```bash
   python app.py
   ```
4. Open http://localhost:7860 for API documentation

### Docker Deployment

```bash
# Build the image
docker build -t tts-api .

# Run the container
docker run -p 7860:7860 tts-api
```

## 🌐 Hugging Face Spaces Deployment

1. Create a new Space on Hugging Face
2. Choose "Docker" as the SDK
3. Upload the following files:
   - `app.py` (main application)
   - `requirements.txt` (dependencies)
   - `Dockerfile` (container configuration)
   - `README.md` (this file)
4. Your API will be publicly accessible once deployed!

## 📋 Response Format

### Successful Response
- **Content-Type**: `audio/mpeg`
- **Body**: MP3 audio file

### Error Response
```json
{
  "detail": "Error description"
}
```
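
A client can tell the two response shapes apart by status code and `Content-Type`. The sketch below works on plain values rather than a specific HTTP library; `TTSError` and `extract_audio` are hypothetical names, and treating any non-200 status as an error is an assumption.

```python
import json


class TTSError(Exception):
    """Raised when the API answers with an error body instead of audio."""


def extract_audio(status_code: int, content_type: str, body: bytes) -> bytes:
    """Return the MP3 bytes on success, or raise TTSError carrying the `detail` field."""
    media_type = content_type.split(";")[0].strip().lower()
    if status_code == 200 and media_type == "audio/mpeg":
        return body
    try:
        detail = json.loads(body.decode("utf-8"))["detail"]
    except (ValueError, KeyError, TypeError):
        # Not the documented error shape; fall back to a short raw excerpt.
        detail = body[:200].decode("utf-8", errors="replace")
    raise TTSError(f"HTTP {status_code}: {detail}")
```

With `requests`, this could be called as `extract_audio(r.status_code, r.headers.get("Content-Type", ""), r.content)`.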

## 🔒 Rate Limiting & Usage

This is a public API, but please use it responsibly:
- Maximum text length: 5,000 characters
- Recommended: Don't exceed 100 requests per minute
- For production use, consider implementing authentication
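
Longer documents need to be split before they are sent. One possible approach is to cut on whitespace so that every chunk stays within the 5,000-character limit; `split_text` is a hypothetical helper, and splitting on spaces rather than sentence boundaries is a simplification.

```python
from typing import List

MAX_CHARS = 5000  # maximum text length stated above


def split_text(text: str, max_chars: int = MAX_CHARS) -> List[str]:
    """Split text into chunks of at most max_chars, preferring breaks at spaces."""
    chunks = []
    remaining = text.strip()
    while len(remaining) > max_chars:
        # Last space inside the window; fall back to a hard cut if there is none.
        cut = remaining.rfind(" ", 0, max_chars + 1)
        if cut <= 0:
            cut = max_chars
        chunks.append(remaining[:cut].rstrip())
        remaining = remaining[cut:].lstrip()
    if remaining:
        chunks.append(remaining)
    return chunks
```

Each chunk can then be sent as its own `/synthesize` request, ideally spaced out to stay below the recommended request rate.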

## 🐛 Troubleshooting

### Common Issues

1. **Voice not found**: Use the `/voices` endpoint to check available voices
2. **Invalid parameters**: Check pitch/rate format (must include Hz/% suffix)
3. **Text too long**: Maximum 5,000 characters per request
4. **Network timeout**: Large texts may take longer to process
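
For the timeout case, retrying with a growing delay is a common client-side mitigation. The helper below is a generic sketch, not something the API provides: `call_with_retries` and its parameters are hypothetical, and the set of exceptions worth retrying depends on the HTTP library in use.

```python
import time
from typing import Callable, Tuple, Type, TypeVar

T = TypeVar("T")


def call_with_retries(
    func: Callable[[], T],
    attempts: int = 3,
    base_delay: float = 1.0,
    retry_on: Tuple[Type[BaseException], ...] = (TimeoutError, ConnectionError),
    sleep: Callable[[float], object] = time.sleep,
) -> T:
    """Call func(); on a retryable error wait base_delay * 2**n seconds and try again."""
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    for attempt in range(attempts):
        try:
            return func()
        except retry_on:
            if attempt == attempts - 1:
                raise  # out of attempts: surface the last error
            sleep(base_delay * (2 ** attempt))
    raise AssertionError("unreachable")
```

When using `requests`, pass `retry_on=(requests.exceptions.Timeout, requests.exceptions.ConnectionError)` and wrap the `requests.post(...)` call in a function or lambda.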

## 📄 License

This project uses the Microsoft Edge TTS service. Please review Microsoft's terms of service for usage guidelines.

## 🤝 Contributing

Feel free to open issues or submit pull requests to improve this API!

---

**Made with ❤️ for the Hugging Face community**