---
title: Kokoro TTS API - Simple & Fast
emoji: 🎤
colorFrom: blue
colorTo: purple
sdk: docker
app_file: app.py
pinned: false
---

🎤 Kokoro TTS API - Simple & Fast

High-speed text-to-speech service powered by Kokoro (82M parameters). No authentication, no database - just pure TTS!

⚡ Features

  • Lightning Fast: 10x faster than XTTS
  • Emotional Voices: 6 expressive voices
  • CPU Optimized: Runs smoothly on CPU
  • No Auth Required: Simple HTTP API
  • Long Audio: Up to 5 minutes per generation

🎵 Available Voices

| Voice | Description | Best For |
|---|---|---|
| bf_isabella | British Female ⭐ | Storytelling, audiobooks |
| af_heart | American Female | Warm, conversational |
| af_bella | American Female | Professional narration |
| bf_emma | British Female | Elegant, formal |
| am_adam | American Male | Confident, clear |
| am_michael | American Male | Friendly, casual |

🚀 Quick Start

Using curl:

curl -X POST https://your-space.hf.space/api/generate \
  -F "text=Once upon a time, in a distant kingdom..." \
  -F "voice=bf_isabella" \
  -F "speed=1.0" \
  --output story.wav

Using Python:

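import requests

response = requests.post(
    "https://your-space.hf.space/api/generate",
    data={
        "text": "Hello world! This is Kokoro TTS.",
        "voice": "bf_isabella",
        "speed": 1.0,
    },
    timeout=120,  # generation on CPU can take tens of seconds
)
response.raise_for_status()  # fail loudly instead of saving an error body as audio

# The response body is the WAV file itself
with open("audio.wav", "wb") as f:
    f.write(response.content)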
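
To sanity-check what came back, the WAV bytes can be inspected with Python's standard `wave` module before saving them. This is a minimal sketch: `describe_wav` is a hypothetical helper, not part of this service, and it assumes the payload is PCM-encoded WAV (the `wave` module rejects other encodings).

```python
import io
import wave


def describe_wav(wav_bytes: bytes) -> dict:
    """Return basic facts about a WAV payload held in memory.

    Assumes PCM-encoded WAV; wave.open raises wave.Error otherwise.
    """
    with wave.open(io.BytesIO(wav_bytes), "rb") as wav:
        frames = wav.getnframes()
        rate = wav.getframerate()
        return {
            "channels": wav.getnchannels(),
            "sample_width_bytes": wav.getsampwidth(),
            "sample_rate_hz": rate,
            "duration_seconds": frames / rate,
        }
```

With the request above, `describe_wav(response.content)["sample_rate_hz"]` would be expected to match the 24 kHz output format listed under Technical Specs.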

Using Flutter/Dart:

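import 'dart:io'; // File, Directory, HttpException

import 'package:http/http.dart' as http;

Future<File> generateTTS(String text) async {
  final response = await http.post(
    Uri.parse('https://your-space.hf.space/api/generate'),
    body: {
      'text': text,
      'voice': 'bf_isabella',
      'speed': '1.0',
    },
  );

  // Don't write an error body to disk as if it were audio
  if (response.statusCode != 200) {
    throw HttpException('TTS request failed with status ${response.statusCode}');
  }

  final file = File('${Directory.systemTemp.path}/tts_${DateTime.now().millisecondsSinceEpoch}.wav');
  await file.writeAsBytes(response.bodyBytes);
  return file;
}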

📊 API Endpoints

POST /api/generate

Generate TTS audio from text.

Parameters:

  • text (required): Text to convert (5-4500 characters)
  • voice (optional): Voice to use (default: bf_isabella)
  • speed (optional): Speech speed 0.5-2.0 (default: 1.0)

Response: WAV audio file
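
The documented limits can be checked on the client before a request is sent, which avoids a round trip for input that is obviously out of range. The sketch below is illustrative only: `build_form` and the constants are hypothetical names, the limits are copied from the parameter list above, and the server may count characters or validate voices differently.

```python
# Limits and voice names copied from this README; the server is the source of truth.
KNOWN_VOICES = {
    "bf_isabella", "af_heart", "af_bella",
    "bf_emma", "am_adam", "am_michael",
}
MIN_TEXT_CHARS, MAX_TEXT_CHARS = 5, 4500
MIN_SPEED, MAX_SPEED = 0.5, 2.0


def build_form(text: str, voice: str = "bf_isabella", speed: float = 1.0) -> dict:
    """Validate the parameters and return form fields for POST /api/generate."""
    if not MIN_TEXT_CHARS <= len(text) <= MAX_TEXT_CHARS:
        raise ValueError(
            f"text must be {MIN_TEXT_CHARS}-{MAX_TEXT_CHARS} characters, got {len(text)}"
        )
    if voice not in KNOWN_VOICES:
        raise ValueError(f"unknown voice: {voice!r}")
    if not MIN_SPEED <= speed <= MAX_SPEED:
        raise ValueError(f"speed must be {MIN_SPEED}-{MAX_SPEED}, got {speed}")
    # Form fields are sent as strings
    return {"text": text, "voice": voice, "speed": str(speed)}
```

The returned dictionary can be passed as `data=` to `requests.post`, as in the Quick Start example.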

GET /health

Check service health and available voices.

GET /docs

Interactive API documentation (Swagger UI).

βš™οΈ Technical Specs

  • Model: Kokoro-82M (ONNX)
  • Max Characters: 4500 (~5 minutes audio)
  • Generation Time: ~20-30 seconds (CPU)
  • Speech Rate: ~900 characters/minute
  • Output Format: WAV, 24 kHz
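
The figures above fit together: 4500 characters at about 900 characters per minute gives roughly 5 minutes of audio. The sketch below turns that arithmetic into a duration estimate and shows one way to split longer text into requests that stay under the limit. Both functions are hypothetical helpers, not part of the service. The assumption that `speed` scales duration linearly is ours, and the splitter does not enforce the 5-character minimum.

```python
import re

CHARS_PER_MINUTE = 900  # approximate speech rate from the specs above
MAX_CHARS = 4500        # per-request limit from the specs above


def estimate_duration_seconds(text: str, speed: float = 1.0) -> float:
    """Rough audio length, assuming speed scales the speech rate linearly."""
    return len(text) / CHARS_PER_MINUTE * 60 / speed


def split_text(text: str, max_chars: int = MAX_CHARS) -> list[str]:
    """Split text into chunks of at most max_chars, preferring sentence ends."""
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())
    chunks: list[str] = []
    current = ""
    for sentence in sentences:
        # A single sentence longer than the limit is cut at the limit
        while len(sentence) > max_chars:
            if current:
                chunks.append(current)
                current = ""
            chunks.append(sentence[:max_chars])
            sentence = sentence[max_chars:]
        candidate = f"{current} {sentence}".strip()
        if len(candidate) <= max_chars:
            current = candidate
        else:
            chunks.append(current)
            current = sentence
    if current:
        chunks.append(current)
    return chunks
```

Each chunk can then be sent as its own `/api/generate` request and the resulting WAV files played or joined in order.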

🛠️ Deployment

This Space runs on Docker with automatic model download on startup.

No environment variables needed!
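
Because the model is downloaded on startup, a freshly started Space may not answer requests right away. A client can poll `GET /health` until the service responds. This is a sketch under one assumption: that `/health` returns HTTP 200 once the service is ready, which this README does not state explicitly. `fetch_status` and `wait_until_healthy` are hypothetical helper names.

```python
import time
import urllib.error
import urllib.request
from typing import Callable


def fetch_status(url: str, timeout: float = 10.0) -> int:
    """Return the HTTP status of a GET request, or 0 if the host is unreachable."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return resp.status
    except urllib.error.HTTPError as err:
        return err.code  # the server answered, but with an error status
    except OSError:
        return 0  # connection refused, DNS failure, timeout, ...


def wait_until_healthy(
    health_url: str,
    fetch: Callable[[str], int] = fetch_status,
    attempts: int = 30,
    delay_seconds: float = 10.0,
    sleep: Callable[[float], None] = time.sleep,
) -> bool:
    """Poll health_url until it returns 200 (assumed to mean ready) or attempts run out."""
    for attempt in range(attempts):
        if fetch(health_url) == 200:
            return True
        if attempt < attempts - 1:
            sleep(delay_seconds)
    return False


# Example (performs real network requests):
# wait_until_healthy("https://your-space.hf.space/health")
```

`fetch` and `sleep` are parameters so the polling logic can be exercised without a network connection.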

📝 License

Model: Kokoro TTS (Kokoro-82M by hexgrad; ONNX port by thewh1teagle)
Service: MIT License

🔗 Links