NitinBot001 committed
Commit 332ab08 · verified · 1 Parent(s): ec6a5b1

Upload 4 files

Files changed (4)
  1. Dockerfile +20 -0
  2. README.md +171 -8
  3. app.py +262 -0
  4. requirements.txt +6 -0
Dockerfile ADDED
@@ -0,0 +1,20 @@
+ FROM python:3.11-slim
+
+ WORKDIR /app
+
+ # Install system dependencies
+ RUN apt-get update && apt-get install -y \
+     && rm -rf /var/lib/apt/lists/*
+
+ # Copy requirements and install Python dependencies
+ COPY requirements.txt .
+ RUN pip install --no-cache-dir -r requirements.txt
+
+ # Copy application code
+ COPY . .
+
+ # Expose port
+ EXPOSE 7860
+
+ # Run the application
+ CMD ["uvicorn", "app:app", "--host", "0.0.0.0", "--port", "7860"]
README.md CHANGED
@@ -1,10 +1,173 @@
- ---
- title: TTS API
- emoji: 🏆
- colorFrom: green
- colorTo: purple
- sdk: docker
- pinned: false
  ---
 
- Check out the configuration reference at https://huggingface.co/docs/hub/spaces-config-reference
+ # Text-to-Speech API 🎤
+
+ A public Text-to-Speech API built with FastAPI and Microsoft Edge TTS, optimized for Hugging Face Spaces deployment.
+
+ ## 🚀 Features
+
+ - **Convert text to natural-sounding speech** using Microsoft Edge TTS
+ - **Multiple voice options** with different languages and accents
+ - **Customizable speech parameters** (pitch and rate adjustment)
+ - **RESTful API** with automatic OpenAPI documentation
+ - **Public access** with CORS enabled
+ - **Real-time audio generation** and streaming
+
+ ## 📖 API Documentation
+
+ Once deployed, visit the root URL to access the interactive API documentation (Swagger UI).
+
+ ## 🔧 API Endpoints
+
+ ### Core Endpoints
+
+ - `GET /` - Interactive API documentation (Swagger UI)
+ - `GET /health` - Health check endpoint
+ - `GET /voices` - List all available voices
+ - `POST /synthesize` - Convert text to speech (JSON)
+ - `POST /synthesize-form` - Convert text to speech (Form data)
+
+ ### Example Usage
+
+ #### Using cURL with JSON:
+ ```bash
+ curl -X POST 'https://your-space-url/synthesize' \
+   -H 'Content-Type: application/json' \
+   -d '{
+     "text": "Hello from Hugging Face Spaces!",
+     "voice": "en-GB-SoniaNeural",
+     "pitch": "-10Hz",
+     "rate": "+15%"
+   }' \
+   --output speech.mp3
+ ```
+
+ #### Using cURL with Form Data:
+ ```bash
+ curl -X POST 'https://your-space-url/synthesize-form' \
+   -F 'text=Hello World!' \
+   -F 'voice=en-US-AriaNeural' \
+   -F 'pitch=+5Hz' \
+   -F 'rate=+10%' \
+   --output speech.mp3
+ ```
+
+ #### Using Python requests:
+ ```python
+ import requests
+
+ response = requests.post(
+     'https://your-space-url/synthesize',
+     json={
+         'text': 'Hello from Python!',
+         'voice': 'en-US-AriaNeural',
+         'pitch': '+0Hz',
+         'rate': '+0%'
+     }
+ )
+ # Stop here on a 4xx/5xx so a JSON error body is not saved as audio
+ response.raise_for_status()
+
+ with open('speech.mp3', 'wb') as f:
+     f.write(response.content)
+ ```
+
+ ## 📝 Parameters
+
+ ### Request Parameters
+
+ | Parameter | Type | Default | Description | Example |
+ |-----------|------|---------|-------------|---------|
+ | `text` | string | required | Text to convert to speech | "Hello World!" |
+ | `voice` | string | "en-US-AriaNeural" | Voice identifier | "en-GB-SoniaNeural" |
+ | `pitch` | string | "+0Hz" | Pitch adjustment | "+10Hz", "-15Hz" |
+ | `rate` | string | "+0%" | Rate adjustment | "+20%", "-10%" |
+
+ ### Voice Examples
+
+ - `en-US-AriaNeural` - US English, Female
+ - `en-GB-SoniaNeural` - UK English, Female
+ - `en-AU-NatashaNeural` - Australian English, Female
+ - `de-DE-KatjaNeural` - German, Female
+ - `fr-FR-DeniseNeural` - French, Female
+ - `es-ES-ElviraNeural` - Spanish, Female
+
+ *Use the `/voices` endpoint to get the complete list of available voices.*
+
+ ### Parameter Ranges
+
+ - **Pitch**: -50Hz to +50Hz (e.g., "-25Hz", "+0Hz", "+30Hz")
+ - **Rate**: -50% to +50% (e.g., "-20%", "+0%", "+25%")
+
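The Parameter Ranges section constrains `pitch` and `rate` to signed values between -50 and +50 with an `Hz` or `%` suffix. A minimal client-side sketch of producing such strings, assuming callers hold these settings as plain integers; the helper names `format_pitch` and `format_rate` are hypothetical and not part of this commit.

```python
def _format_signed(value: int, suffix: str, limit: int = 50) -> str:
    """Render an integer as a signed string such as '+10Hz' or '-20%'."""
    if not -limit <= value <= limit:
        raise ValueError(f"value must be between -{limit} and +{limit}, got {value}")
    # The ':+d' format spec always emits an explicit sign, including '+0'.
    return f"{value:+d}{suffix}"


def format_pitch(hz: int) -> str:
    """Hypothetical helper: build the `pitch` request field."""
    return _format_signed(hz, "Hz")


def format_rate(percent: int) -> str:
    """Hypothetical helper: build the `rate` request field."""
    return _format_signed(percent, "%")
```

Formatting on the client keeps malformed values from reaching the API, where they would come back as a validation error.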
+ ## 🛠️ Local Development
+
+ ### Installation
+
+ 1. Clone the repository
+ 2. Install dependencies:
+    ```bash
+    pip install -r requirements.txt
+    ```
+ 3. Run the server:
+    ```bash
+    python app.py
+    ```
+ 4. Open http://localhost:7860 for API documentation
+
+ ### Docker Deployment
+
+ ```bash
+ # Build the image
+ docker build -t tts-api .
+
+ # Run the container
+ docker run -p 7860:7860 tts-api
+ ```
+
+ ## 🌐 Hugging Face Spaces Deployment
+
+ 1. Create a new Space on Hugging Face
+ 2. Choose "Docker" as the SDK
+ 3. Upload the following files:
+    - `app.py` (main application)
+    - `requirements.txt` (dependencies)
+    - `Dockerfile` (container configuration)
+    - `README.md` (this file)
+ 4. Your API will be publicly accessible once deployed!
+
+ ## 📋 Response Format
+
+ ### Successful Response
+ - **Content-Type**: `audio/mpeg`
+ - **Body**: MP3 audio file
+
+ ### Error Response
+ ```json
+ {
+   "detail": "Error description"
+ }
+ ```
+
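Given the response format described here, a client can tell success from failure by the status code and `Content-Type`. A small sketch of that decision, assuming the caller already has the status, header value and raw body in hand; `extract_audio` and `TTSError` are hypothetical names, not part of this commit.

```python
import json


class TTSError(Exception):
    """Raised when the API answers with an error payload instead of audio."""


def extract_audio(status: int, content_type: str, body: bytes) -> bytes:
    """Return MP3 bytes on success, otherwise raise TTSError with the server's detail."""
    # Ignore any parameters after ';' in the Content-Type header.
    if status == 200 and content_type.split(";")[0].strip() == "audio/mpeg":
        return body
    try:
        detail = json.loads(body.decode("utf-8")).get("detail", "Unknown error")
    except (ValueError, UnicodeDecodeError, AttributeError):
        # Body was not the documented JSON object; fall back to a generic message.
        detail = "Unexpected response"
    raise TTSError(f"HTTP {status}: {detail}")
```

Checking the content type as well as the status avoids writing a JSON error body to disk as if it were an MP3.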
+ ## 🔒 Rate Limiting & Usage
+
+ This is a public API, but please use it responsibly:
+ - Maximum text length: 5,000 characters
+ - Recommended: Don't exceed 100 requests per minute
+ - For production use, consider implementing authentication
+
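Because requests are capped at 5,000 characters, longer inputs have to be split before they are sent. A rough sketch of one way to do that on the client, assuming sentence boundaries are acceptable split points; `split_text` is a hypothetical helper, and a single sentence longer than the limit is simply cut at the limit.

```python
import re

MAX_TEXT_LENGTH = 5000  # mirrors the documented per-request limit


def split_text(text: str, limit: int = MAX_TEXT_LENGTH) -> list[str]:
    """Split text into chunks of at most `limit` characters, preferring sentence ends."""
    # Split after '.', '!' or '?' followed by whitespace; punctuation stays with its sentence.
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())
    chunks: list[str] = []
    current = ""
    for sentence in sentences:
        # Hard-cut any single sentence that exceeds the limit on its own.
        while len(sentence) > limit:
            if current:
                chunks.append(current)
                current = ""
            chunks.append(sentence[:limit])
            sentence = sentence[limit:]
        if not sentence:
            continue
        candidate = f"{current} {sentence}" if current else sentence
        if len(candidate) <= limit:
            current = candidate
        else:
            chunks.append(current)
            current = sentence
    if current:
        chunks.append(current)
    return chunks
```

Each chunk can then be sent as its own request and the resulting audio files played or concatenated in order.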
+ ## 🐛 Troubleshooting
+
+ ### Common Issues
+
+ 1. **Voice not found**: Use the `/voices` endpoint to check available voices
+ 2. **Invalid parameters**: Check pitch/rate format (must include Hz/% suffix)
+ 3. **Text too long**: Maximum 5,000 characters per request
+ 4. **Network timeout**: Large texts may take longer to process
+
+ ## 📄 License
+
+ This project uses Microsoft Edge TTS service. Please review Microsoft's terms of service for usage guidelines.
+
+ ## 🤝 Contributing
+
+ Feel free to open issues or submit pull requests to improve this API!
+
  ---
 
+ **Made with ❤️ for the Hugging Face community**
app.py ADDED
@@ -0,0 +1,262 @@
+ #!/usr/bin/env python3
+
+ """
+ Text-to-Speech API using Edge-TTS with FastAPI
+ Optimized for Hugging Face Spaces deployment
+ """
+
+ import edge_tts
+ import asyncio
+ import os
+ import tempfile
+ import uuid
+ import re
+ from fastapi import FastAPI, HTTPException, Form, UploadFile
+ from fastapi.responses import FileResponse, JSONResponse
+ from fastapi.middleware.cors import CORSMiddleware
+ from pydantic import BaseModel, Field, validator
+ import logging
+ from typing import Optional
+ import aiofiles
+
+ # Configure logging
+ logging.basicConfig(level=logging.INFO)
+ logger = logging.getLogger(__name__)
+
+ # FastAPI app initialization
+ app = FastAPI(
+     title="Text-to-Speech API",
+     description="Convert text to speech using Microsoft Edge TTS with customizable voice, pitch, and rate",
+     version="1.0.0",
+     docs_url="/",  # Swagger UI at root for easy access
+     redoc_url="/redoc"
+ )
+
+ # Add CORS middleware for public API access
+ app.add_middleware(
+     CORSMiddleware,
+     allow_origins=["*"],  # Allow all origins for public API
+     allow_credentials=True,
+     allow_methods=["*"],
+     allow_headers=["*"],
+ )
+
+ # Configuration
+ TEMP_DIR = tempfile.gettempdir()
+ MAX_TEXT_LENGTH = 5000
+
+ # Pydantic models for request validation
+ class TTSRequest(BaseModel):
+     text: str = Field(..., min_length=1, max_length=MAX_TEXT_LENGTH, description="Text to convert to speech")
+     voice: str = Field(default="en-US-AriaNeural", description="Voice identifier (e.g., 'en-GB-SoniaNeural')")
+     pitch: str = Field(default="+0Hz", description="Pitch adjustment (e.g., '+10Hz', '-15Hz')")
+     rate: str = Field(default="+0%", description="Rate adjustment (e.g., '+20%', '-10%')")
+
+     @validator('pitch')
+     def validate_pitch(cls, v):
+         if not re.match(r'^[+-]?\d+Hz$', v):
+             raise ValueError("Pitch must be in format like '+10Hz' or '-15Hz'")
+         pitch_value = int(v.replace('Hz', '').replace('+', ''))
+         if not -50 <= pitch_value <= 50:
+             raise ValueError("Pitch value must be between -50 and 50")
+         return v
+
+     @validator('rate')
+     def validate_rate(cls, v):
+         if not re.match(r'^[+-]?\d+%$', v):
+             raise ValueError("Rate must be in format like '+15%' or '-20%'")
+         rate_value = int(v.replace('%', '').replace('+', ''))
+         if not -50 <= rate_value <= 50:
+             raise ValueError("Rate value must be between -50 and 50")
+         return v
+
+ class VoiceInfo(BaseModel):
+     name: str
+     short_name: str
+     gender: str
+     locale: str
+     language: str
+     display_name: str
+
+ class HealthResponse(BaseModel):
+     status: str
+     service: str
+     version: str
+
+ class VoicesResponse(BaseModel):
+     voices: list[VoiceInfo]
+     count: int
+
+ # Utility functions
+ async def generate_speech_async(text: str, voice: str, pitch: str, rate: str, output_file: str) -> bool:
+     """Generate speech asynchronously"""
+     try:
+         # edge-tts escapes its input text, so pitch and rate are passed as
+         # keyword arguments rather than wrapped in inline SSML
+
+         communicate = edge_tts.Communicate(text, voice, pitch=pitch, rate=rate)
+         await communicate.save(output_file)
+         return True
+     except Exception as e:
+         logger.error(f"Error generating speech: {str(e)}")
+         return False
+
+ def cleanup_file(file_path: str):
+     """Clean up temporary file"""
+     try:
+         if os.path.exists(file_path):
+             os.remove(file_path)
+             logger.info(f"Cleaned up temporary file: {file_path}")
+     except Exception as e:
+         logger.warning(f"Failed to clean up temp file {file_path}: {str(e)}")
+
+ # API Endpoints
+ @app.get("/health", response_model=HealthResponse, tags=["Health"])
+ async def health_check():
+     """Health check endpoint"""
+     return HealthResponse(
+         status="healthy",
+         service="TTS API",
+         version="1.0.0"
+     )
+
+ @app.get("/voices", response_model=VoicesResponse, tags=["Voices"])
+ async def get_voices():
+     """Get list of available voices"""
+     try:
+         voices = await edge_tts.list_voices()
+
+         voice_list = [
+             VoiceInfo(
+                 name=voice["Name"],
+                 short_name=voice["ShortName"],
+                 gender=voice["Gender"],
+                 locale=voice["Locale"],
+                 language=voice.get("Language", ""),
+                 display_name=voice.get("DisplayName", "")
+             )
+             for voice in voices
+         ]
+
+         return VoicesResponse(voices=voice_list, count=len(voice_list))
+     except Exception as e:
+         logger.error(f"Error fetching voices: {str(e)}")
+         raise HTTPException(status_code=500, detail="Failed to fetch voices")
+
+ @app.post("/synthesize", tags=["TTS"])
+ async def synthesize_speech(request: TTSRequest):
+     """
+     Convert text to speech and return audio file
+
+     - **text**: Text to convert to speech (required)
+     - **voice**: Voice identifier (default: en-US-AriaNeural)
+     - **pitch**: Pitch adjustment like '+10Hz' or '-15Hz' (default: +0Hz)
+     - **rate**: Rate adjustment like '+20%' or '-10%' (default: +0%)
+     """
+     output_file = None
+     try:
+         # Generate unique filename
+         file_id = str(uuid.uuid4())
+         output_file = os.path.join(TEMP_DIR, f"tts_{file_id}.mp3")
+
+         # Generate speech
+         success = await generate_speech_async(
+             request.text, request.voice, request.pitch, request.rate, output_file
+         )
+
+         if not success:
+             raise HTTPException(status_code=500, detail="Failed to generate speech")
+
+         if not os.path.exists(output_file):
+             raise HTTPException(status_code=500, detail="Audio file was not generated")
+
+         # Return the audio file; the temp file is removed after the response is sent.
+         # Passing cleanup_file(output_file) directly would delete it before sending.
+         from starlette.background import BackgroundTask
+         return FileResponse(
+             output_file,
+             media_type="audio/mpeg",
+             filename=f"speech_{file_id}.mp3",
+             background=BackgroundTask(cleanup_file, output_file)
+         )
+
+     except HTTPException:
+         if output_file:
+             cleanup_file(output_file)
+         raise
+     except Exception as e:
+         if output_file:
+             cleanup_file(output_file)
+         logger.error(f"Error in synthesize_speech: {str(e)}")
+         raise HTTPException(status_code=500, detail="Internal server error")
+
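The `/synthesize` handler hands temp-file cleanup to the response's `background` argument, which is meant to receive a task that runs after the body has been sent (in Starlette, a `BackgroundTask`). The sketch below uses only the standard library to illustrate why that argument needs a deferred callable rather than the result of calling the cleanup function; `FakeResponse` and `make_temp_audio` are hypothetical stand-ins, not Starlette or FastAPI APIs.

```python
import os
import tempfile
from functools import partial
from typing import Callable, Optional


def cleanup_file(file_path: str) -> None:
    """Remove a temporary file if it still exists."""
    if os.path.exists(file_path):
        os.remove(file_path)


class FakeResponse:
    """Hypothetical stand-in for a file response with a post-send hook."""

    def __init__(self, path: str, background: Optional[Callable[[], None]] = None):
        self.path = path
        self.background = background

    def send(self) -> bytes:
        # Read the body first, then run the deferred cleanup, as a server would.
        with open(self.path, "rb") as f:
            body = f.read()
        if self.background is not None:
            self.background()
        return body


def make_temp_audio() -> str:
    """Create a small placeholder file standing in for generated audio."""
    fd, path = tempfile.mkstemp(suffix=".mp3")
    with os.fdopen(fd, "wb") as f:
        f.write(b"fake-audio")
    return path


# Deferred: the callable runs only after the body has been read.
path = make_temp_audio()
response = FakeResponse(path, background=partial(cleanup_file, path))
body = response.send()
file_removed_after_send = not os.path.exists(path)

# Eager: cleanup_file(...) runs immediately and returns None,
# so the file is already gone before any response could read it.
eager_path = make_temp_audio()
eager_result = cleanup_file(eager_path)
file_gone_before_send = not os.path.exists(eager_path)
```

In the FastAPI app itself, the deferred form would be `BackgroundTask(cleanup_file, output_file)` from `starlette.background`.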
+ @app.post("/synthesize-form", tags=["TTS"])
+ async def synthesize_speech_form(
+     text: str = Form(..., description="Text to convert to speech"),
+     voice: str = Form(default="en-US-AriaNeural", description="Voice identifier"),
+     pitch: str = Form(default="+0Hz", description="Pitch adjustment (e.g., '+10Hz')"),
+     rate: str = Form(default="+0%", description="Rate adjustment (e.g., '+20%')")
+ ):
+     """
+     Convert text to speech using form data (alternative endpoint)
+     Useful for HTML forms or when JSON is not preferred
+     """
+     # Create request object and validate
+     try:
+         request = TTSRequest(text=text, voice=voice, pitch=pitch, rate=rate)
+         return await synthesize_speech(request)
+     except ValueError as e:
+         raise HTTPException(status_code=422, detail=str(e))
+
+ @app.get("/", include_in_schema=False)
+ async def root():
+     """Root endpoint redirects to API documentation"""
+     return JSONResponse({
+         "message": "Welcome to Text-to-Speech API",
+         "documentation": "/",
+         "health": "/health",
+         "voices": "/voices",
+         "synthesize": "/synthesize"
+     })
+
+ # Exception handlers
+ @app.exception_handler(422)
+ async def validation_exception_handler(request, exc):
+     return JSONResponse(
+         status_code=422,
+         content={"detail": "Validation error", "errors": exc.detail}
+     )
+
+ @app.exception_handler(500)
+ async def internal_exception_handler(request, exc):
+     return JSONResponse(
+         status_code=500,
+         content={"detail": "Internal server error"}
+     )
+
+ # Startup event
+ @app.on_event("startup")
+ async def startup_event():
+     logger.info("TTS API is starting up...")
+     # Test edge-tts functionality
+     try:
+         voices = await edge_tts.list_voices()
+         logger.info(f"Successfully loaded {len(voices)} voices")
+     except Exception as e:
+         logger.error(f"Failed to load voices: {e}")
+
+ @app.on_event("shutdown")
+ async def shutdown_event():
+     logger.info("TTS API is shutting down...")
+
+ if __name__ == "__main__":
+     import uvicorn
+     print("Starting TTS API Server with FastAPI...")
+     print("API Documentation will be available at: http://localhost:7860/")
+     print("Health check: http://localhost:7860/health")
+     print("Available voices: http://localhost:7860/voices")
+     print("\nExample usage:")
+     print("curl -X POST 'http://localhost:7860/synthesize' \\")
+     print(" -H 'Content-Type: application/json' \\")
+     print(" -d '{\"text\":\"Hello from Hugging Face!\",\"voice\":\"en-GB-SoniaNeural\",\"pitch\":\"-10Hz\",\"rate\":\"+15%\"}' \\")
+     print(" --output speech.mp3")
+
+     uvicorn.run(app, host="0.0.0.0", port=7860)
requirements.txt ADDED
@@ -0,0 +1,6 @@
+ fastapi==0.104.1
+ uvicorn[standard]==0.24.0
+ edge-tts==6.1.9
+ python-multipart==0.0.6
+ aiofiles==23.2.1
+ pydantic==2.5.0