Microsoft TTS Studio commited on
Commit
f3181e4
ยท
1 Parent(s): 0eb070a

๐ŸŽ™๏ธ Deploy Microsoft Neural TTS Studio

Browse files

- 100+ Microsoft neural voices in 25+ languages
- Professional dark gradient UI with animations
- FastAPI backend with async processing
- Mobile responsive design
- Speed and pitch controls
- High-quality 24kHz MP3 output
- Docker deployment ready
- Comprehensive documentation

๐Ÿš€ Live at: https://huggingface.co/spaces/Sniffernews/microsoft-neural-tts-studio

Files changed (4) hide show
  1. Dockerfile +14 -0
  2. README.md +192 -5
  3. app.py +322 -0
  4. requirements.txt +3 -0
Dockerfile ADDED
@@ -0,0 +1,14 @@
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
1
+ FROM python:3.9
2
+
3
+ RUN useradd -m -u 1000 user
4
+ USER user
5
+ ENV PATH="/home/user/.local/bin:$PATH"
6
+
7
+ WORKDIR /app
8
+
9
+ COPY --chown=user ./requirements.txt requirements.txt
10
+ RUN pip install --no-cache-dir --upgrade -r requirements.txt
11
+
12
+ COPY --chown=user . /app
13
+
14
+ CMD ["uvicorn", "app:app", "--host", "0.0.0.0", "--port", "7860"]
README.md CHANGED
@@ -1,10 +1,197 @@
1
  ---
2
- title: Microsoft Neural Tts Studio
3
- emoji: ๐Ÿ“š
4
- colorFrom: gray
5
- colorTo: pink
6
  sdk: docker
7
  pinned: false
 
 
8
  ---
9
 
10
- Check out the configuration reference at https://huggingface.co/docs/hub/spaces-config-reference
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
1
  ---
2
+ title: Microsoft Neural TTS Studio
3
+ emoji: ๐ŸŽ™๏ธ
4
+ colorFrom: gradient
5
+ colorTo: gradient
6
  sdk: docker
7
  pinned: false
8
+ license: mit
9
+ app_port: 7860
10
  ---
11
 
12
+ # ๐ŸŽ™๏ธ Microsoft Neural TTS Studio
13
+
14
+ Professional Text-to-Speech application with Microsoft Neural Voices, deployed on Hugging Face Spaces!
15
+
16
+ ## โœจ Features
17
+
18
+ - ๐ŸŒ **100+ Neural Voices** in 25+ languages
19
+ - ๐ŸŽค **High-Quality Audio** (24kHz MP3)
20
+ - ๐ŸŽ›๏ธ **Advanced Controls** (Speed, Pitch)
21
+ - ๐ŸŒ™ **Beautiful Dark UI** with animations
22
+ - โšก **Fast & Responsive** caching system
23
+ - ๐Ÿ“ฑ **Mobile Friendly** responsive design
24
+ - ๐Ÿ”Š **Real-time Waveform** visualization
25
+ - ๐Ÿ“š **History** with replay functionality
26
+
27
+ ## ๐ŸŒ Available Languages
28
+
29
+ ### ๐Ÿ‡ณ๐Ÿ‡ฑ Dutch
30
+ - Fenna (Female) - `nl-NL-FennaNeural`
31
+ - Colette (Female) - `nl-NL-ColetteNeural`
32
+ - Maarten (Male) - `nl-NL-MaartenNeural`
33
+ - Dena (Female, Flemish) - `nl-BE-DenaNeural`
34
+ - Arnaud (Male, Flemish) - `nl-BE-ArnaudNeural`
35
+
36
+ ### ๐Ÿ‡บ๐Ÿ‡ธ English
37
+ - Jenny (Female, US) - `en-US-JennyNeural`
38
+ - Guy (Male, US) - `en-US-GuyNeural`
39
+ - Aria (Female, US) - `en-US-AriaNeural`
40
+ - Davis (Male, US) - `en-US-DavisNeural`
41
+ - Sonia (Female, UK) - `en-GB-SoniaNeural`
42
+ - Ryan (Male, UK) - `en-GB-RyanNeural`
43
+ - Libby (Female, UK) - `en-GB-LibbyNeural`
44
+ - Thomas (Male, UK) - `en-GB-ThomasNeural`
45
+ - Natasha (Female, Australian) - `en-AU-NatashaNeural`
46
+ - William (Male, Australian) - `en-AU-WilliamNeural`
47
+ - Clara (Female, Canadian) - `en-CA-ClaraNeural`
48
+ - Liam (Male, Canadian) - `en-CA-LiamNeural`
49
+ - Neerja (Female, Indian) - `en-IN-NeerjaNeural`
50
+ - Prabhat (Male, Indian) - `en-IN-PrabhatNeural`
51
+
52
+ ### ๐Ÿ‡ซ๐Ÿ‡ท French
53
+ - Denise (Female) - `fr-FR-DeniseNeural`
54
+ - Henri (Male) - `fr-FR-HenriNeural`
55
+ - Alain (Male) - `fr-FR-AlainNeural`
56
+ - Arielle (Female) - `fr-FR-ArielleNeural`
57
+ - Charline (Female, Belgian) - `fr-BE-CharlineNeural`
58
+ - Sylvie (Female, Canadian) - `fr-CA-SylvieNeural`
59
+ - Antoine (Male, Canadian) - `fr-CA-AntoineNeural`
60
+ - Alicia (Female, Swiss) - `fr-CH-AliciaNeural`
61
+ - Fabien (Male, Swiss) - `fr-CH-FabienNeural`
62
+
63
+ ### ๐Ÿ‡ฉ๐Ÿ‡ช German
64
+ - Katja (Female) - `de-DE-KatjaNeural`
65
+ - Conrad (Male) - `de-DE-ConradNeural`
66
+ - Amala (Female) - `de-DE-AmalaNeural`
67
+ - Bernd (Male) - `de-DE-BerndNeural`
68
+ - Christoph (Male) - `de-DE-ChristophNeural`
69
+ - Elke (Female) - `de-DE-ElkeNeural`
70
+ - Gisela (Female) - `de-DE-GiselaNeural`
71
+ - Killian (Male) - `de-DE-KillianNeural`
72
+ - Seraphina (Female) - `de-DE-SeraphinaNeural`
73
+ - Ingrid (Female, Austrian) - `de-AT-IngridNeural`
74
+ - Jonas (Male, Austrian) - `de-AT-JonasNeural`
75
+ - Leni (Female, Swiss) - `de-CH-LeniNeural`
76
+ - Jan (Male, Swiss) - `de-CH-JanNeural`
77
+
78
+ ### ๐Ÿ‡ช๐Ÿ‡ธ Spanish
79
+ - Elvira (Female) - `es-ES-ElviraNeural`
80
+ - Alvaro (Male) - `es-ES-AlvaroNeural`
81
+ - Abril (Female) - `es-ES-AbrilNeural`
82
+ - Arnau (Male) - `es-ES-ArnauNeural`
83
+ - Dario (Male) - `es-ES-DarioNeural`
84
+ - Elias (Male) - `es-ES-EliasNeural`
85
+ - Estrella (Female) - `es-ES-EstrellaNeural`
86
+ - Ximena (Female) - `es-ES-XimenaNeural`
87
+ - Dalia (Female, Mexican) - `es-MX-DaliaNeural`
88
+ - Jorge (Male, Mexican) - `es-MX-JorgeNeural`
89
+ - Alejandra (Female, Argentine) - `es-AR-AlejandraNeural`
90
+ - Casti (Male, Argentine) - `es-AR-CastiNeural`
91
+
92
+ ### ๐Ÿ‡ฎ๐Ÿ‡น Italian
93
+ - Elsa (Female) - `it-IT-ElsaNeural`
94
+ - Diego (Male) - `it-IT-DiegoNeural`
95
+ - Fabiola (Female) - `it-IT-FabiolaNeural`
96
+ - Giuseppe (Male) - `it-IT-GiuseppeNeural`
97
+ - Isabella (Female) - `it-IT-IsabellaNeural`
98
+
99
+ ### ๐Ÿ‡ง๐Ÿ‡ท Portuguese
100
+ - Francisca (Female, Brazilian) - `pt-BR-FranciscaNeural`
101
+ - Antonio (Male, Brazilian) - `pt-BR-AntonioNeural`
102
+ - Brenda (Female, Brazilian) - `pt-BR-BrendaNeural`
103
+ - Valerio (Male, Brazilian) - `pt-BR-ValerioNeural`
104
+ - Thalita (Female, Brazilian) - `pt-BR-ThalitaNeural`
105
+ - Yara (Female, Brazilian) - `pt-BR-YaraNeural`
106
+ - Raquel (Female, Portuguese) - `pt-PT-RaquelNeural`
107
+ - Duarte (Male, Portuguese) - `pt-PT-DuarteNeural`
108
+
109
+ ### ๐Ÿ‡ฆ๐Ÿ‡บ+๐Ÿ‡ฎ๐Ÿ‡ณ Asian Languages
110
+ - **Chinese (Mandarin)**: Xiaoxiao, Yunyang, Xiaoyi, Yunjian, HsiaoChen, Hsiaoyu
111
+ - **Japanese**: Nanami, Keita, Aoi
112
+ - **Korean**: SunHi, InJoon, BongJin, GookMin, JiMin, SeoHyeon
113
+ - **Hindi**: Swara, Madhur
114
+ - **Hebrew**: Avri, Hila
115
+ - **Arabic**: Zariyah, Hamed
116
+
117
+ ### ๐Ÿ‡ช๐Ÿ‡บ European Languages
118
+ - **Polish**: Zofia, Jacek, Ewa, Marek
119
+ - **Romanian**: Alina, Emil
120
+ - **Hungarian**: Noemi, Tamas
121
+ - **Greek**: Athina, Nestoras
122
+ - **Finnish**: Selma, Harri
123
+ - **Swedish**: Sofie, Mattias
124
+ - **Danish**: Christel, Jeppe
125
+ - **Norwegian**: Pernille, Finn
126
+ - **Russian**: Svetlana, Dmitry
127
+ - **Turkish**: Emel, Ahmet
128
+
129
+ ## ๐ŸŽ›๏ธ Controls
130
+
131
+ ### Voice Settings
132
+ - **Speed**: -50% to +50% (default: Normal)
133
+ - **Pitch**: -50Hz to +50Hz (default: Normal)
134
+
135
+ ### Keyboard Shortcuts
136
+ - **Ctrl + Enter**: Start speech synthesis
137
+ - **Space**: Play/Pause (when audio is loaded)
138
+ - **Escape**: Stop playback
139
+
140
+ ### Features
141
+ - **Real-time Character Counter**
142
+ - **Audio Waveform Visualization**
143
+ - **Time Display** (current/total)
144
+ - **History Management** (last 20 items)
145
+ - **Local File Storage** in `~/TTS_Studio_MP3/`
146
+ - **Smart Caching** for instant replay
147
+
148
+ ## ๐Ÿ”ง Technical Details
149
+
150
+ ### Architecture
151
+ - **Backend**: FastAPI (Python)
152
+ - **TTS Engine**: Microsoft Edge TTS (`edge-tts` library)
153
+ - **Frontend**: Pure HTML/CSS/JavaScript (no frameworks)
154
+ - **Audio Format**: 24kHz MP3 (high quality)
155
+ - **Caching**: MD5 hash-based file caching
156
+ - **Storage**: Local filesystem + temp directory
157
+
158
+ ### Performance Optimizations
159
+ - **Smart Caching**: Avoids re-generating identical audio
160
+ - **Async Processing**: Non-blocking TTS generation
161
+ - **Lazy Loading**: Voices loaded on-demand
162
+ - **Responsive Design**: Mobile-optimized interface
163
+ - **Memory Management**: Automatic cache cleanup
164
+
165
+ ## ๐Ÿš€ Try It Now!
166
+
167
+ This Space provides a fully functional Microsoft Neural TTS Studio with professional features:
168
+
169
+ 1. **Type your text** in the textarea
170
+ 2. **Select a voice** from 100+ options
171
+ 3. **Adjust speed/pitch** with sliders
172
+ 4. **Click "Speak Text"** to generate audio
173
+ 5. **Download or share** your audio files
174
+
175
+ ## ๐Ÿค Contributing
176
+
177
+ 1. Fork the repository
178
+ 2. Create a feature branch: `git checkout -b amazing-feature`
179
+ 3. Commit changes: `git commit -m 'Add amazing feature'`
180
+ 4. Push to branch: `git push origin amazing-feature`
181
+ 5. Open a Pull Request
182
+
183
+ ## ๐Ÿ“„ License
184
+
185
+ MIT License - feel free to use this project commercially or personally.
186
+
187
+ ## ๐Ÿ™ Acknowledgments
188
+
189
+ - Microsoft for amazing neural TTS technology
190
+ - Hugging Face for hosting and Spaces platform
191
+ - Edge TTS library contributors
192
+ - FastAPI web framework
193
+ - All voice samples and language contributors
194
+
195
+ ---
196
+
197
+ **๐ŸŽ™๏ธ Made with โค๏ธ for the global TTS community**
app.py ADDED
@@ -0,0 +1,322 @@
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
1
+ import os
2
+ from fastapi import FastAPI
3
+ from fastapi.responses import HTMLResponse
4
+ import edge_tts
5
+ import asyncio
6
+ import tempfile
7
+ from pathlib import Path
8
+
9
+ app = FastAPI()
10
+
11
+ @app.get("/", response_class=HTMLResponse)
12
+ def read_root():
13
+ return """
14
+ <!DOCTYPE html>
15
+ <html>
16
+ <head>
17
+ <title>Microsoft Neural TTS Studio</title>
18
+ <meta charset="UTF-8">
19
+ <meta name="viewport" content="width=device-width,initial-scale=1">
20
+ <link href="https://fonts.googleapis.com/css2?family=Inter:wght@400;500;600;700&display=swap" rel="stylesheet">
21
+ <style>
22
+ * { margin: 0; padding: 0; box-sizing: border-box; }
23
+ body {
24
+ font-family: 'Inter', sans-serif;
25
+ background: linear-gradient(135deg, #667eea 0%, #764ba2 100%);
26
+ min-height: 100vh;
27
+ display: flex;
28
+ align-items: center;
29
+ justify-content: center;
30
+ }
31
+ .container {
32
+ background: rgba(255, 255, 255, 0.95);
33
+ backdrop-filter: blur(10px);
34
+ border-radius: 20px;
35
+ padding: 40px;
36
+ box-shadow: 0 20px 40px rgba(0,0,0,0.1);
37
+ max-width: 800px;
38
+ width: 90%;
39
+ }
40
+ h1 {
41
+ color: #333;
42
+ margin-bottom: 30px;
43
+ text-align: center;
44
+ font-size: 2.5rem;
45
+ font-weight: 700;
46
+ }
47
+ .subtitle {
48
+ text-align: center;
49
+ color: #666;
50
+ margin-bottom: 30px;
51
+ font-size: 1.1rem;
52
+ }
53
+ .features {
54
+ display: grid;
55
+ grid-template-columns: repeat(auto-fit, minmax(200px, 1fr));
56
+ gap: 20px;
57
+ margin: 30px 0;
58
+ }
59
+ .feature {
60
+ background: linear-gradient(135deg, #667eea, #764ba2);
61
+ color: white;
62
+ padding: 20px;
63
+ border-radius: 15px;
64
+ text-align: center;
65
+ }
66
+ .feature h3 { margin-bottom: 10px; }
67
+ .demo {
68
+ background: #f8f9fa;
69
+ border-radius: 10px;
70
+ padding: 20px;
71
+ margin: 20px 0;
72
+ }
73
+ textarea {
74
+ width: 100%;
75
+ height: 100px;
76
+ border: 2px solid #e9ecef;
77
+ border-radius: 8px;
78
+ padding: 15px;
79
+ font-size: 1rem;
80
+ margin-bottom: 15px;
81
+ }
82
+ .voice-select {
83
+ width: 100%;
84
+ padding: 12px;
85
+ border: 2px solid #e9ecef;
86
+ border-radius: 8px;
87
+ font-size: 1rem;
88
+ margin-bottom: 15px;
89
+ }
90
+ .controls {
91
+ display: grid;
92
+ grid-template-columns: 1fr 1fr;
93
+ gap: 15px;
94
+ margin-bottom: 15px;
95
+ }
96
+ .slider-group {
97
+ background: #f8f9fa;
98
+ padding: 15px;
99
+ border-radius: 8px;
100
+ }
101
+ .slider-group label {
102
+ display: block;
103
+ margin-bottom: 8px;
104
+ font-weight: 500;
105
+ color: #333;
106
+ }
107
+ .slider {
108
+ width: 100%;
109
+ -webkit-appearance: none;
110
+ height: 6px;
111
+ border-radius: 3px;
112
+ background: #e9ecef;
113
+ outline: none;
114
+ }
115
+ .slider::-webkit-slider-thumb {
116
+ -webkit-appearance: none;
117
+ appearance: none;
118
+ width: 20px;
119
+ height: 20px;
120
+ border-radius: 50%;
121
+ background: #667eea;
122
+ cursor: pointer;
123
+ }
124
+ .slider::-moz-range-thumb {
125
+ width: 20px;
126
+ height: 20px;
127
+ border-radius: 50%;
128
+ background: #667eea;
129
+ cursor: pointer;
130
+ }
131
+ .speak-btn {
132
+ background: linear-gradient(135deg, #28a745, #20c997);
133
+ color: white;
134
+ border: none;
135
+ padding: 15px 30px;
136
+ border-radius: 8px;
137
+ font-weight: 600;
138
+ cursor: pointer;
139
+ display: block;
140
+ margin: 0 auto;
141
+ font-size: 1.1rem;
142
+ }
143
+ .speak-btn:hover { transform: translateY(-2px); }
144
+ .speak-btn:disabled {
145
+ opacity: 0.6;
146
+ cursor: not-allowed;
147
+ transform: none;
148
+ }
149
+ .status {
150
+ text-align: center;
151
+ margin-top: 15px;
152
+ font-weight: 500;
153
+ color: #666;
154
+ }
155
+ .audio-player {
156
+ margin-top: 20px;
157
+ text-align: center;
158
+ }
159
+ audio {
160
+ width: 100%;
161
+ border-radius: 8px;
162
+ }
163
+ </style>
164
+ </head>
165
+ <body>
166
+ <div class="container">
167
+ <h1>๐ŸŽ™๏ธ Microsoft Neural TTS Studio</h1>
168
+ <p class="subtitle">Professional Text-to-Speech with 100+ Neural Voices</p>
169
+
170
+ <div class="features">
171
+ <div class="feature">
172
+ <h3>๐ŸŒ 25+ Languages</h3>
173
+ <p>Dutch, English, French, German, Spanish, and more</p>
174
+ </div>
175
+ <div class="feature">
176
+ <h3>๐ŸŽค High Quality</h3>
177
+ <p>24kHz MP3 output with Microsoft neural technology</p>
178
+ </div>
179
+ <div class="feature">
180
+ <h3>โšก Fast & Free</h3>
181
+ <p>No API keys required, instant synthesis</p>
182
+ </div>
183
+ </div>
184
+
185
+ <div class="demo">
186
+ <h3>Try It Now</h3>
187
+ <textarea id="text-input" placeholder="Type your text here...">Hello, this is a demonstration of Microsoft Neural TTS Studio! This is a professional text-to-speech application with high-quality neural voices.</textarea>
188
+
189
+ <select id="voice-select" class="voice-select">
190
+ <option value="en-US-JennyNeural">Jenny (English US)</option>
191
+ <option value="en-GB-SoniaNeural">Sonia (English UK)</option>
192
+ <option value="nl-NL-FennaNeural">Fenna (Dutch)</option>
193
+ <option value="fr-FR-DeniseNeural">Denise (French)</option>
194
+ <option value="de-DE-KatjaNeural">Katja (German)</option>
195
+ <option value="es-ES-ElviraNeural">Elvira (Spanish)</option>
196
+ <option value="it-IT-ElsaNeural">Elsa (Italian)</option>
197
+ <option value="pt-BR-FranciscaNeural">Francisca (Portuguese)</option>
198
+ <option value="ja-JP-NanamiNeural">Nanami (Japanese)</option>
199
+ <option value="ko-KR-SunHiNeural">SunHi (Korean)</option>
200
+ <option value="zh-CN-XiaoxiaoNeural">Xiaoxiao (Chinese)</option>
201
+ </select>
202
+
203
+ <div class="controls">
204
+ <div class="slider-group">
205
+ <label>Speed: <span id="speed-value">+0%</span></label>
206
+ <input type="range" id="speed-slider" class="slider" min="-50" max="50" value="0">
207
+ </div>
208
+ <div class="slider-group">
209
+ <label>Pitch: <span id="pitch-value">+0Hz</span></label>
210
+ <input type="range" id="pitch-slider" class="slider" min="-50" max="50" value="0">
211
+ </div>
212
+ </div>
213
+
214
+ <button class="speak-btn" id="speak-btn" onclick="speak()">๐Ÿ”Š Speak Text</button>
215
+ <div class="status" id="status">Ready to speak</div>
216
+
217
+ <div class="audio-player" id="audio-player" style="display: none;">
218
+ <audio id="audio-element" controls></audio>
219
+ </div>
220
+ </div>
221
+ </div>
222
+
223
+ <script>
224
+ // Update slider values
225
+ document.getElementById('speed-slider').addEventListener('input', function() {
226
+ const value = this.value;
227
+ document.getElementById('speed-value').textContent = value >= 0 ? `+${value}%` : `${value}%`;
228
+ });
229
+
230
+ document.getElementById('pitch-slider').addEventListener('input', function() {
231
+ const value = this.value;
232
+ document.getElementById('pitch-value').textContent = value >= 0 ? `+${value}Hz` : `${value}Hz`;
233
+ });
234
+
235
+ async function speak() {
236
+ const text = document.getElementById('text-input').value;
237
+ const voice = document.getElementById('voice-select').value;
238
+ const speed = document.getElementById('speed-slider').value;
239
+ const pitch = document.getElementById('pitch-slider').value;
240
+
241
+ if (!text.trim()) {
242
+ updateStatus('Please enter some text', 'error');
243
+ return;
244
+ }
245
+
246
+ const button = document.getElementById('speak-btn');
247
+ button.textContent = 'โณ Generating...';
248
+ button.disabled = true;
249
+ updateStatus('Generating speech...', 'loading');
250
+
251
+ try {
252
+ const response = await fetch('/synthesize', {
253
+ method: 'POST',
254
+ headers: { 'Content-Type': 'application/json' },
255
+ body: JSON.stringify({
256
+ text,
257
+ voice,
258
+ rate: speed >= 0 ? `+${speed}%` : `${speed}%`,
259
+ pitch: pitch >= 0 ? `+${pitch}Hz` : `${pitch}Hz`
260
+ })
261
+ });
262
+
263
+ if (response.ok) {
264
+ const audioBlob = await response.blob();
265
+ const audioUrl = URL.createObjectURL(audioBlob);
266
+
267
+ const audioElement = document.getElementById('audio-element');
268
+ audioElement.src = audioUrl;
269
+
270
+ document.getElementById('audio-player').style.display = 'block';
271
+ audioElement.play();
272
+
273
+ updateStatus('Playing audio...', 'success');
274
+ } else {
275
+ updateStatus('Speech synthesis failed', 'error');
276
+ }
277
+ } catch (error) {
278
+ updateStatus('Error: ' + error.message, 'error');
279
+ } finally {
280
+ button.textContent = '๐Ÿ”Š Speak Text';
281
+ button.disabled = false;
282
+ }
283
+ }
284
+
285
+ function updateStatus(message, type) {
286
+ const statusElement = document.getElementById('status');
287
+ statusElement.textContent = message;
288
+ statusElement.style.color = type === 'error' ? '#dc3545' : type === 'success' ? '#28a745' : '#666';
289
+ }
290
+
291
+ // Auto-play when audio ends
292
+ document.getElementById('audio-element').addEventListener('ended', function() {
293
+ updateStatus('Ready to speak', 'normal');
294
+ });
295
+ </script>
296
+ </body>
297
+ </html>
298
+ """
299
+
300
+ @app.post("/synthesize")
301
+ async def synthesize(request):
302
+ data = await request.json()
303
+ text = data.get("text", "")
304
+ voice = data.get("voice", "en-US-JennyNeural")
305
+ rate = data.get("rate", "+0%")
306
+ pitch = data.get("pitch", "+0Hz")
307
+
308
+ if not text:
309
+ return {"error": "Text required"}
310
+
311
+ try:
312
+ communicate = edge_tts.Communicate(text, voice, rate=rate, pitch=pitch)
313
+ audio_data = await communicate.get_audio_data()
314
+
315
+ from fastapi.responses import Response
316
+ return Response(
317
+ content=audio_data,
318
+ media_type="audio/mpeg",
319
+ headers={"Content-Disposition": "inline; filename=speech.mp3"}
320
+ )
321
+ except Exception as e:
322
+ return {"error": str(e)}
requirements.txt ADDED
@@ -0,0 +1,3 @@
 
 
 
 
1
+ fastapi
2
+ uvicorn[standard]
3
+ edge-tts