Space: NitinBot001/TTS-API

NitinBot001 committed on Jun 18, 2025

Commit 332ab08 (verified) · 1 parent: ec6a5b1

Upload 4 files

Files changed (4)

Dockerfile +20 -0
README.md +171 -8
app.py +262 -0
requirements.txt +6 -0

Dockerfile ADDED

	@@ -0,0 +1,20 @@

+FROM python:3.11-slim
+WORKDIR /app
+# Install system dependencies
+RUN apt-get update && apt-get install -y \
+    && rm -rf /var/lib/apt/lists/*
+# Copy requirements and install Python dependencies
+COPY requirements.txt .
+RUN pip install --no-cache-dir -r requirements.txt
+# Copy application code
+COPY . .
+# Expose port
+EXPOSE 7860
+# Run the application
+CMD ["uvicorn", "app:app", "--host", "0.0.0.0", "--port", "7860"]
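After the container starts (for example with `docker run -p 7860:7860 tts-api`), uvicorn may need a few seconds before it accepts connections. The sketch below polls the `/health` endpoint defined in `app.py` until it reports `"status": "healthy"`. It uses only the standard library; `wait_for_health`, `fetch_json`, and the timeout defaults are illustrative assumptions, not part of this commit.

```python
import json
import time
import urllib.request
from typing import Callable


def fetch_json(url: str, timeout: float = 5.0) -> dict:
    """GET `url` and decode the JSON body; network and HTTP errors raise OSError subclasses."""
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))


def wait_for_health(
    base_url: str,
    timeout: float = 30.0,
    interval: float = 1.0,
    fetch: Callable[[str], dict] = fetch_json,
) -> bool:
    """Poll GET <base_url>/health until it reports "healthy"; return False once `timeout` expires."""
    url = base_url.rstrip("/") + "/health"
    deadline = time.monotonic() + timeout
    while True:
        try:
            if fetch(url).get("status") == "healthy":
                return True
        except (OSError, ValueError):
            # URLError, HTTPError and socket timeouts are OSError; malformed JSON is ValueError.
            pass
        if time.monotonic() >= deadline:
            return False
        time.sleep(interval)
```

Calling `wait_for_health("http://localhost:7860")` returns `True` as soon as the API answers. The injectable `fetch` parameter lets the polling logic be exercised without a live server.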

README.md CHANGED

@@ -1,10 +1,173 @@
----
-title: TTS API
-emoji: 🏆
-colorFrom: green
-colorTo: purple
-sdk: docker
-pinned: false
 ---
-Check out the configuration reference at https://huggingface.co/docs/hub/spaces-config-reference

+# Text-to-Speech API 🎤
+A public Text-to-Speech API built with FastAPI and Microsoft Edge TTS, optimized for Hugging Face Spaces deployment.
+## 🚀 Features
+- **Convert text to natural-sounding speech** using Microsoft Edge TTS
+- **Multiple voice options** with different languages and accents
+- **Customizable speech parameters** (pitch and rate adjustment)
+- **RESTful API** with automatic OpenAPI documentation
+- **Public access** with CORS enabled
+- **On-demand audio generation**, returned as a downloadable MP3 file
+## 📖 API Documentation
+Once deployed, visit the root URL to access the interactive API documentation (Swagger UI).
+## 🔧 API Endpoints
+### Core Endpoints
+- `GET /` - Interactive API documentation (Swagger UI); ReDoc is served at `/redoc`
+- `GET /health` - Health check endpoint
+- `GET /voices` - List all available voices
+- `POST /synthesize` - Convert text to speech (JSON)
+- `POST /synthesize-form` - Convert text to speech (Form data)
+### Example Usage
+#### Using cURL with JSON:
+```bash
+curl -X POST 'https://your-space-url/synthesize' \
+  -H 'Content-Type: application/json' \
+  -d '{
+    "text": "Hello from Hugging Face Spaces!",
+    "voice": "en-GB-SoniaNeural",
+    "pitch": "-10Hz",
+    "rate": "+15%"
+  }' \
+  --output speech.mp3
+```
+#### Using cURL with Form Data:
+```bash
+curl -X POST 'https://your-space-url/synthesize-form' \
+  -F 'text=Hello World!' \
+  -F 'voice=en-US-AriaNeural' \
+  -F 'pitch=+5Hz' \
+  -F 'rate=+10%' \
+  --output speech.mp3
+```
+#### Using Python requests:
+```python
+import requests
+response = requests.post(
+    'https://your-space-url/synthesize',
+    json={
+        'text': 'Hello from Python!',
+        'voice': 'en-US-AriaNeural',
+        'pitch': '+0Hz',
+        'rate': '+0%'
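+    },
+    timeout=60,  # long texts can take a while to synthesize
+)
+# Raise on 4xx/5xx instead of saving a JSON error body as speech.mp3
+response.raise_for_status()
+with open('speech.mp3', 'wb') as f:
+    f.write(response.content)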
+```
+## 📝 Parameters
+### Request Parameters
+| Parameter | Type | Default | Description | Example |
+|-----------|------|---------|-------------|---------|
+| `text` | string | required | Text to convert to speech | "Hello World!" |
+| `voice` | string | "en-US-AriaNeural" | Voice identifier | "en-GB-SoniaNeural" |
+| `pitch` | string | "+0Hz" | Pitch adjustment | "+10Hz", "-15Hz" |
+| `rate` | string | "+0%" | Rate adjustment | "+20%", "-10%" |
+### Voice Examples
+- `en-US-AriaNeural` - US English, Female
+- `en-GB-SoniaNeural` - UK English, Female
+- `en-AU-NatashaNeural` - Australian English, Female
+- `de-DE-KatjaNeural` - German, Female
+- `fr-FR-DeniseNeural` - French, Female
+- `es-ES-ElviraNeural` - Spanish, Female
+*Use the `/voices` endpoint to get the complete list of available voices.*
+### Parameter Ranges
+- **Pitch**: -50Hz to +50Hz (e.g., "-25Hz", "+0Hz", "+30Hz")
+- **Rate**: -50% to +50% (e.g., "-20%", "+0%", "+25%")
+## 🛠️ Local Development
+### Installation
+1. Clone the repository
+2. Install dependencies:
+   ```bash
+   pip install -r requirements.txt
+   ```
+3. Run the server:
+   ```bash
+   python app.py
+   ```
+4. Open http://localhost:7860 for API documentation
+### Docker Deployment
+```bash
+# Build the image
+docker build -t tts-api .
+# Run the container
+docker run -p 7860:7860 tts-api
+```
+## 🌐 Hugging Face Spaces Deployment
+1. Create a new Space on Hugging Face
+2. Choose "Docker" as the SDK
+3. Upload the following files:
+   - `app.py` (main application)
+   - `requirements.txt` (dependencies)
+   - `Dockerfile` (container configuration)
+   - `README.md` (this file)
+4. Your API will be publicly accessible once deployed!
+## 📋 Response Format
+### Successful Response
+- **Content-Type**: `audio/mpeg`
+- **Body**: MP3 audio file
+### Error Response
+```json
+{
+  "detail": "Error description"
+}
+```
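+## 🔒 Usage Limits
+This is a public API with no built-in rate limiting or authentication, so please use it responsibly:
+- Maximum text length: 5,000 characters (longer requests are rejected with HTTP 422)
+- Recommended: stay under 100 requests per minute (this is not enforced by the server)
+- For production use, consider adding authentication and rate limiting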
+## 🐛 Troubleshooting
+### Common Issues
+1. **Voice not found**: Use the `/voices` endpoint to check available voices
+2. **Invalid parameters**: Pitch and rate need a sign, a whole number, and the Hz/% suffix (e.g. `+10Hz`, `-20%`), within ±50
+3. **Text too long**: Maximum 5,000 characters per request
+4. **Network timeout**: Large texts may take longer to process
+## 📄 License
+This project uses Microsoft Edge TTS service. Please review Microsoft's terms of service for usage guidelines.
+## 🤝 Contributing
+Feel free to open issues or submit pull requests to improve this API!
 ---
+**Made with ❤️ for the Hugging Face community**
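The README's `requests` example can also be written with only the standard library. This sketch checks the documented parameter rules locally before anything is sent: 1 to 5,000 characters of text, and pitch/rate values between -50 and +50 with an `Hz` or `%` suffix. As an assumption, it also requires the explicit `+`/`-` sign used in every README example. `check_adjustment`, `build_tts_request`, and `synthesize_to_file` are hypothetical helper names, and `https://your-space-url` remains a placeholder.

```python
import json
import re
import urllib.request

# Suffix expected for each adjustable parameter, per the README's parameter table.
_SUFFIXES = {"pitch": "Hz", "rate": "%"}


def check_adjustment(kind: str, value: str) -> int:
    """Parse a signed pitch/rate string such as '-10Hz' or '+15%' and enforce the +/-50 range."""
    match = re.fullmatch(r"([+-])(\d+)" + re.escape(_SUFFIXES[kind]), value)
    if match is None:
        raise ValueError(f"{kind} must look like '+10{_SUFFIXES[kind]}' or '-10{_SUFFIXES[kind]}'")
    number = int(match.group(2))
    if match.group(1) == "-":
        number = -number
    if not -50 <= number <= 50:
        raise ValueError(f"{kind} must be between -50 and +50, got {value!r}")
    return number


def build_tts_request(base_url: str, text: str, voice: str = "en-US-AriaNeural",
                      pitch: str = "+0Hz", rate: str = "+0%") -> urllib.request.Request:
    """Validate locally, then build (without sending) a JSON POST to /synthesize."""
    if not 1 <= len(text) <= 5000:
        raise ValueError("text must be between 1 and 5,000 characters")
    check_adjustment("pitch", pitch)
    check_adjustment("rate", rate)
    payload = {"text": text, "voice": voice, "pitch": pitch, "rate": rate}
    return urllib.request.Request(
        base_url.rstrip("/") + "/synthesize",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def synthesize_to_file(request: urllib.request.Request, path: str, timeout: float = 60.0) -> None:
    """Send the request and save the MP3 body; 4xx/5xx responses raise urllib.error.HTTPError."""
    with urllib.request.urlopen(request, timeout=timeout) as resp, open(path, "wb") as out:
        out.write(resp.read())
```

For example, `synthesize_to_file(build_tts_request("https://your-space-url", "Hello!", voice="en-GB-SoniaNeural", pitch="-10Hz", rate="+15%"), "speech.mp3")` mirrors the first cURL example. Splitting request construction from sending keeps the validation testable offline.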

app.py ADDED

	@@ -0,0 +1,262 @@

+#!/usr/bin/env python3
+"""
+Text-to-Speech API using Edge-TTS with FastAPI
+Optimized for Hugging Face Spaces deployment
+"""
+import logging
+import os
+import re
+import tempfile
+import uuid
+import edge_tts
+from fastapi import FastAPI, Form, HTTPException
+from fastapi.middleware.cors import CORSMiddleware
+from fastapi.responses import FileResponse, JSONResponse
+from pydantic import BaseModel, Field, validator
+# Configure logging
+logging.basicConfig(level=logging.INFO)
+logger = logging.getLogger(__name__)
+# FastAPI app initialization
+app = FastAPI(
+    title="Text-to-Speech API",
+    description="Convert text to speech using Microsoft Edge TTS with customizable voice, pitch, and rate",
+    version="1.0.0",
+    docs_url="/",  # Swagger UI at root for easy access
+    redoc_url="/redoc"
+)
+# Add CORS middleware for public API access
+app.add_middleware(
+    CORSMiddleware,
+    allow_origins=["*"],  # Allow all origins for public API
+    allow_credentials=False,  # No cookies or auth headers are used, so credentialed CORS isn't needed
+    allow_methods=["*"],
+    allow_headers=["*"],
+)
+# Configuration
+TEMP_DIR = tempfile.gettempdir()
+MAX_TEXT_LENGTH = 5000
+# Pydantic models for request validation
+class TTSRequest(BaseModel):
+    text: str = Field(..., min_length=1, max_length=MAX_TEXT_LENGTH, description="Text to convert to speech")
+    voice: str = Field(default="en-US-AriaNeural", description="Voice identifier (e.g., 'en-GB-SoniaNeural')")
+    pitch: str = Field(default="+0Hz", description="Pitch adjustment (e.g., '+10Hz', '-15Hz')")
+    rate: str = Field(default="+0%", description="Rate adjustment (e.g., '+20%', '-10%')")
+    @validator('pitch')
+    def validate_pitch(cls, v):
+        # Require an explicit sign ('+0Hz', not '0Hz'), the format edge-tts accepts
+        if not re.match(r'^[+-]\d+Hz$', v):
+            raise ValueError("Pitch must be in format like '+10Hz' or '-15Hz'")
+        pitch_value = int(v.replace('Hz', '').replace('+', ''))
+        if not -50 <= pitch_value <= 50:
+            raise ValueError("Pitch value must be between -50 and 50")
+        return v
+    @validator('rate')
+    def validate_rate(cls, v):
+        # Require an explicit sign ('+0%', not '0%'), the format edge-tts accepts
+        if not re.match(r'^[+-]\d+%$', v):
+            raise ValueError("Rate must be in format like '+15%' or '-20%'")
+        rate_value = int(v.replace('%', '').replace('+', ''))
+        if not -50 <= rate_value <= 50:
+            raise ValueError("Rate value must be between -50 and 50")
+        return v
+class VoiceInfo(BaseModel):
+    name: str
+    short_name: str
+    gender: str
+    locale: str
+    language: str
+    display_name: str
+class HealthResponse(BaseModel):
+    status: str
+    service: str
+    version: str
+class VoicesResponse(BaseModel):
+    voices: list[VoiceInfo]
+    count: int
+# Utility functions
+async def generate_speech_async(text: str, voice: str, pitch: str, rate: str, output_file: str) -> bool:
+    """Generate speech asynchronously"""
+    try:
+        # Pass pitch and rate as edge-tts parameters; edge-tts XML-escapes the input text,
+        # so SSML markup embedded in it would be spoken rather than interpreted
+        communicate = edge_tts.Communicate(text, voice, rate=rate, pitch=pitch)
+        await communicate.save(output_file)
+        return True
+    except Exception as e:
+        logger.error(f"Error generating speech: {str(e)}")
+        return False
+def cleanup_file(file_path: str):
+    """Clean up temporary file"""
+    try:
+        if os.path.exists(file_path):
+            os.remove(file_path)
+            logger.info(f"Cleaned up temporary file: {file_path}")
+    except Exception as e:
+        logger.warning(f"Failed to clean up temp file {file_path}: {str(e)}")
+# API Endpoints
+@app.get("/health", response_model=HealthResponse, tags=["Health"])
+async def health_check():
+    """Health check endpoint"""
+    return HealthResponse(
+        status="healthy",
+        service="TTS API",
+        version="1.0.0"
+    )
+@app.get("/voices", response_model=VoicesResponse, tags=["Voices"])
+async def get_voices():
+    """Get list of available voices"""
+    try:
+        voices = await edge_tts.list_voices()
+        voice_list = [
+            VoiceInfo(
+                name=voice["Name"],
+                short_name=voice["ShortName"],
+                gender=voice["Gender"],
+                locale=voice["Locale"],
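+                # Fall back to the locale's language code and FriendlyName when these keys are absent
+                language=voice.get("Language", voice["Locale"].split("-")[0]),
+                display_name=voice.get("DisplayName", voice.get("FriendlyName", ""))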
+            )
+            for voice in voices
+        ]
+        return VoicesResponse(voices=voice_list, count=len(voice_list))
+    except Exception as e:
+        logger.error(f"Error fetching voices: {str(e)}")
+        raise HTTPException(status_code=500, detail="Failed to fetch voices")
+@app.post("/synthesize", tags=["TTS"])
+async def synthesize_speech(request: TTSRequest):
+    """
+    Convert text to speech and return audio file
+    - **text**: Text to convert to speech (required)
+    - **voice**: Voice identifier (default: en-US-AriaNeural)
+    - **pitch**: Pitch adjustment like '+10Hz' or '-15Hz' (default: +0Hz)
+    - **rate**: Rate adjustment like '+20%' or '-10%' (default: +0%)
+    """
+    output_file = None
+    try:
+        # Generate unique filename
+        file_id = str(uuid.uuid4())
+        output_file = os.path.join(TEMP_DIR, f"tts_{file_id}.mp3")
+        # Generate speech
+        success = await generate_speech_async(
+            request.text, request.voice, request.pitch, request.rate, output_file
+        )
+        if not success:
+            raise HTTPException(status_code=500, detail="Failed to generate speech")
+        if not os.path.exists(output_file):
+            raise HTTPException(status_code=500, detail="Audio file was not generated")
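+        # Return the audio file; BackgroundTask delays deletion until the response has been sent
+        from starlette.background import BackgroundTask
+        return FileResponse(
+            output_file,
+            media_type="audio/mpeg",
+            filename=f"speech_{file_id}.mp3",
+            background=BackgroundTask(cleanup_file, output_file),
+        )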
+    except HTTPException:
+        if output_file:
+            cleanup_file(output_file)
+        raise
+    except Exception as e:
+        if output_file:
+            cleanup_file(output_file)
+        logger.error(f"Error in synthesize_speech: {str(e)}")
+        raise HTTPException(status_code=500, detail="Internal server error")
+@app.post("/synthesize-form", tags=["TTS"])
+async def synthesize_speech_form(
+    text: str = Form(..., description="Text to convert to speech"),
+    voice: str = Form(default="en-US-AriaNeural", description="Voice identifier"),
+    pitch: str = Form(default="+0Hz", description="Pitch adjustment (e.g., '+10Hz')"),
+    rate: str = Form(default="+0%", description="Rate adjustment (e.g., '+20%')")
+):
+    """
+    Convert text to speech using form data (alternative endpoint)
+    Useful for HTML forms or when JSON is not preferred
+    """
+    # Build and validate the request model; pydantic's ValidationError subclasses ValueError
+    try:
+        request = TTSRequest(text=text, voice=voice, pitch=pitch, rate=rate)
+    except ValueError as e:
+        raise HTTPException(status_code=422, detail=str(e))
+    return await synthesize_speech(request)
+@app.get("/", include_in_schema=False)
+async def root():
+    """API information and links.
+    Because docs_url='/', FastAPI registers the Swagger UI route first and it answers GET /,
+    so this handler is shadowed unless docs_url is moved elsewhere.
+    """
+    return JSONResponse({
+        "message": "Welcome to Text-to-Speech API",
+        "documentation": "/",
+        "health": "/health",
+        "voices": "/voices",
+        "synthesize": "/synthesize"
+    })
+# Exception handlers
+@app.exception_handler(422)
+async def validation_exception_handler(request, exc):
+    return JSONResponse(
+        status_code=422,
+        content={"detail": "Validation error", "errors": exc.detail}
+    )
+@app.exception_handler(500)
+async def internal_exception_handler(request, exc):
+    return JSONResponse(
+        status_code=500,
+        content={"detail": "Internal server error"}
+    )
+# Startup event
+@app.on_event("startup")
+async def startup_event():
+    logger.info("TTS API is starting up...")
+    # Test edge-tts functionality
+    try:
+        voices = await edge_tts.list_voices()
+        logger.info(f"Successfully loaded {len(voices)} voices")
+    except Exception as e:
+        logger.error(f"Failed to load voices: {e}")
+@app.on_event("shutdown")
+async def shutdown_event():
+    logger.info("TTS API is shutting down...")
+if __name__ == "__main__":
+    import uvicorn
+    print("Starting TTS API Server with FastAPI...")
+    print("API Documentation will be available at: http://localhost:7860/")
+    print("Health check: http://localhost:7860/health")
+    print("Available voices: http://localhost:7860/voices")
+    print("\nExample usage:")
+    print("curl -X POST 'http://localhost:7860/synthesize' \\")
+    print("  -H 'Content-Type: application/json' \\")
+    print("  -d '{\"text\":\"Hello from Hugging Face!\",\"voice\":\"en-GB-SoniaNeural\",\"pitch\":\"-10Hz\",\"rate\":\"+15%\"}' \\")
+    print("  --output speech.mp3")
+    uvicorn.run(app, host="0.0.0.0", port=7860)
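In `synthesize_speech`, the temporary MP3 must outlive the handler. Starlette's `FileResponse` reads the file while sending the body, and its `background` argument takes a task object (for example `starlette.background.BackgroundTask(cleanup_file, output_file)`) that runs only after the response is complete. An expression such as `cleanup_file(output_file)` is evaluated immediately instead, so the file is deleted before it can be served. The standard-library sketch below isolates that ordering; `DeferredTask` and `send_file` are hypothetical stand-ins, not Starlette's actual implementation.

```python
import os
import tempfile
import uuid
from typing import Any, Callable, Optional


class DeferredTask:
    """Hypothetical stand-in for starlette.background.BackgroundTask: records a call to run later."""

    def __init__(self, func: Callable[..., Any], *args: Any) -> None:
        self.func = func
        self.args = args

    def __call__(self) -> Any:
        return self.func(*self.args)


def cleanup_file(file_path: str) -> None:
    """Remove the temporary file if it still exists (simplified from app.py)."""
    if os.path.exists(file_path):
        os.remove(file_path)


def make_temp_mp3_path() -> str:
    """Unique temp path following app.py's tts_<uuid4>.mp3 naming."""
    return os.path.join(tempfile.gettempdir(), f"tts_{uuid.uuid4()}.mp3")


def send_file(path: str, background: Optional[DeferredTask] = None) -> bytes:
    """Simulate a file response: read the whole body first, then run the background task."""
    with open(path, "rb") as f:
        body = f.read()
    if background is not None:
        background()
    return body
```

Passing `DeferredTask(cleanup_file, path)` keeps the file until `send_file` has read it. Passing `cleanup_file(path)` deletes the file up front and hands `None` to `background`, so the subsequent read fails with `FileNotFoundError`.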

requirements.txt ADDED

	@@ -0,0 +1,6 @@

+fastapi==0.104.1
+uvicorn[standard]==0.24.0
+edge-tts==6.1.9
+python-multipart==0.0.6
+aiofiles==23.2.1
+pydantic==2.5.0
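To confirm that an environment such as the built image actually has these exact versions, the pins can be compared with what is installed. This standard-library sketch handles only exact `name==version` lines with optional extras, like the six above. `parse_pins` and `check_pins` are hypothetical helpers, and the comparison is plain string equality rather than full PEP 440 version semantics.

```python
import re
from importlib import metadata
from typing import Dict, Optional, Tuple

# "name[extras]==version"; extras such as [standard] are dropped for the lookup.
_PIN = re.compile(r"^\s*([A-Za-z0-9][A-Za-z0-9._-]*)(?:\[[^\]]*\])?\s*==\s*([^\s;#]+)")


def parse_pins(requirements_text: str) -> Dict[str, str]:
    """Map package name -> pinned version for every exact '==' pin; other lines are skipped."""
    pins: Dict[str, str] = {}
    for line in requirements_text.splitlines():
        match = _PIN.match(line)
        if match:
            pins[match.group(1)] = match.group(2)
    return pins


def check_pins(pins: Dict[str, str]) -> Dict[str, Tuple[str, Optional[str]]]:
    """Return {name: (pinned, installed)} for mismatches; installed is None if the package is absent."""
    mismatches: Dict[str, Tuple[str, Optional[str]]] = {}
    for name, pinned in pins.items():
        try:
            installed: Optional[str] = metadata.version(name)
        except metadata.PackageNotFoundError:
            installed = None
        if installed != pinned:
            mismatches[name] = (pinned, installed)
    return mismatches
```

`check_pins(parse_pins(open("requirements.txt").read()))` returns an empty dict when every pinned package is installed at exactly its pinned version.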