HFStudio Technical Specifications
Project Overview
HFStudio is a web-based text-to-speech application that provides both local and API-based TTS capabilities, inspired by ElevenLabs Studio but with support for local model execution.
Core Features
1. Text-to-Speech Engine
- Input: Multi-line text area for user input
- Output: Generated audio playback with download capability
- Models: Support for multiple TTS models (local and API-based)
- Voice Selection: Dropdown/list for available voices
- Audio Controls: Play, pause, download generated audio
2. Execution Modes
- API Mode: Connect to remote TTS services (HuggingFace, OpenAI, etc.)
- Local Mode: Run TTS models locally using downloaded models
- Mode Toggle: Clear UI toggle between API and Local execution
- Local Setup Instructions: Display the installation command when local mode is selected
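The two execution modes can share one interface so the rest of the backend does not care where audio comes from. The following is a minimal sketch under that assumption; the class names `TTSBackend`, `ApiBackend`, `LocalBackend` and the function `select_backend` are hypothetical and not part of the specification, and the `synthesize` bodies are placeholders.

```python
from abc import ABC, abstractmethod

# Installation command shown to the user when local mode is not set up.
INSTALL_HINT = "pip install hfstudio"


class TTSBackend(ABC):
    """Hypothetical common interface for both execution modes."""

    @abstractmethod
    def synthesize(self, text: str, voice_id: str) -> bytes:
        """Return encoded audio for the given text."""


class ApiBackend(TTSBackend):
    def synthesize(self, text: str, voice_id: str) -> bytes:
        # Placeholder: a real backend would call a remote TTS service here.
        raise NotImplementedError("remote call is out of scope for this sketch")


class LocalBackend(TTSBackend):
    def synthesize(self, text: str, voice_id: str) -> bytes:
        # Placeholder: a real backend would run a downloaded model here.
        raise NotImplementedError("local inference is out of scope for this sketch")


def select_backend(mode: str, local_available: bool) -> TTSBackend:
    """Pick a backend for the requested mode, or explain how to enable it."""
    if mode == "api":
        return ApiBackend()
    if mode == "local":
        if not local_available:
            raise RuntimeError(f"Local mode is not set up. Run: {INSTALL_HINT}")
        return LocalBackend()
    raise ValueError(f"unknown mode: {mode!r}")
```

Raising an error that carries the installation command lets the UI surface the hint directly when a user toggles to local mode without the package installed.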
3. Voice Configuration
- Speed Control: Slider (0.5x - 2.0x speed)
- Stability: Slider for voice consistency (when applicable)
- Similarity: Slider for voice matching (when applicable)
- Style/Emotion: Optional controls for voice style
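A small sketch of how slider values might be normalized before they reach a model. The speed range (0.5 to 2.0) comes from the list above; the 0.0 to 1.0 range for stability and similarity, the defaults, and the names `VoiceParameters` and `from_sliders` are assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import Optional


def _clamp(value: float, low: float, high: float) -> float:
    """Restrict value to the closed interval [low, high]."""
    return max(low, min(high, value))


@dataclass(frozen=True)
class VoiceParameters:
    speed: float = 1.0          # playback speed multiplier, 0.5x to 2.0x
    stability: float = 0.5      # assumed range 0.0 to 1.0
    similarity: float = 0.5     # assumed range 0.0 to 1.0
    style: Optional[str] = None  # optional style/emotion label

    @classmethod
    def from_sliders(
        cls,
        speed: float,
        stability: float,
        similarity: float,
        style: Optional[str] = None,
    ) -> "VoiceParameters":
        """Build parameters from raw UI values, clamping out-of-range input."""
        return cls(
            speed=_clamp(speed, 0.5, 2.0),
            stability=_clamp(stability, 0.0, 1.0),
            similarity=_clamp(similarity, 0.0, 1.0),
            style=style,
        )
```

For example, `VoiceParameters.from_sliders(3.0, -1.0, 0.7)` yields a speed of 2.0 and a stability of 0.0, leaving similarity at 0.7.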
4. User Interface Layout
- Left Sidebar: Navigation and feature selection
  - Home/Text-to-Speech (default)
  - Settings
  - History (future feature)
- Main Content Area: Text input and controls
- Right Panel: Voice/model selection and parameters
Technology Stack
Frontend
- Framework: SvelteKit
- Styling: TailwindCSS
- Components:
  - Shadcn-svelte for UI components
  - Audio player: Native HTML5 or Wavesurfer.js
- State Management: Svelte stores
- Build Tool: Vite
Backend (Python Package)
- Framework: FastAPI for API server
- TTS Libraries:
  - Transformers (HuggingFace models)
  - Coqui TTS
  - Optional: Piper, Bark
- Audio Processing: librosa, soundfile
- CLI: Click or Typer for command-line interface
API Integration
- HuggingFace Inference API
- OpenAI TTS API (optional)
- Custom model endpoints
Project Structure
hfstudio/
├── frontend/                  # Svelte frontend
│   ├── src/
│   │   ├── routes/
│   │   │   ├── +layout.svelte
│   │   │   ├── +page.svelte
│   │   │   └── api/
│   │   ├── lib/
│   │   │   ├── components/
│   │   │   │   ├── Sidebar.svelte
│   │   │   │   ├── TextInput.svelte
│   │   │   │   ├── VoiceSelector.svelte
│   │   │   │   ├── AudioPlayer.svelte
│   │   │   │   ├── ModeToggle.svelte
│   │   │   │   └── ParameterControls.svelte
│   │   │   ├── stores/
│   │   │   │   ├── app.js
│   │   │   │   └── audio.js
│   │   │   └── api/
│   │   │       └── client.js
│   │   └── app.html
│   ├── package.json
│   ├── vite.config.js
│   └── tailwind.config.js
│
├── backend/                   # Python backend
│   ├── hfstudio/
│   │   ├── __init__.py
│   │   ├── __main__.py
│   │   ├── server.py          # FastAPI app
│   │   ├── cli.py             # CLI interface
│   │   ├── models/
│   │   │   ├── __init__.py
│   │   │   ├── base.py
│   │   │   ├── local.py
│   │   │   └── api.py
│   │   ├── voices/
│   │   │   ├── __init__.py
│   │   │   └── manager.py
│   │   └── utils/
│   │       ├── __init__.py
│   │       └── audio.py
│   ├── requirements.txt
│   └── setup.py
│
├── README.md
└── docker-compose.yml         # Optional containerization
API Endpoints
REST API
POST /api/tts/generate
Body: {
  text: string,
  voice_id: string,
  model_id: string,
  parameters: {
    speed: float,
    stability: float,
    similarity: float,
    style: string
  },
  mode: "api" | "local"
}
Response: {
  audio_url: string,
  duration: float,
  format: string
}

GET /api/voices
Response: {
  voices: [{
    id: string,
    name: string,
    preview_url: string,
    supported_models: string[]
  }]
}

GET /api/models
Response: {
  models: [{
    id: string,
    name: string,
    type: "local" | "api",
    status: "available" | "downloadable" | "api-only"
  }]
}

GET /api/status
Response: {
  mode: "api" | "local",
  local_available: boolean,
  api_configured: boolean
}
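To illustrate the request shape for POST /api/tts/generate, here is a standard-library sketch that builds (but does not send) the HTTP request. The helper name `build_generate_request` is hypothetical; the field names mirror the body described above, and the base URL is whatever the server is configured to listen on.

```python
import json
import urllib.request


def build_generate_request(
    base_url: str,
    text: str,
    voice_id: str,
    model_id: str,
    mode: str = "api",
    **parameters: object,
) -> urllib.request.Request:
    """Assemble a POST request matching the /api/tts/generate body."""
    if mode not in ("api", "local"):
        raise ValueError(f"mode must be 'api' or 'local', got {mode!r}")
    body = {
        "text": text,
        "voice_id": voice_id,
        "model_id": model_id,
        "parameters": parameters,  # e.g. speed, stability, similarity, style
        "mode": mode,
    }
    return urllib.request.Request(
        url=base_url.rstrip("/") + "/api/tts/generate",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
```

With a running server, the request could be sent with `urllib.request.urlopen(req)` and the JSON response decoded to read `audio_url`, `duration`, and `format`.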
Component Specifications
1. ModeToggle Component
Props:
- mode: "api" | "local"
- onModeChange: function
Features:
- Visual toggle switch
- Installation hint for local mode
- Status indicator (green/yellow/red)
2. TextInput Component
Props:
- value: string
- maxLength: number (default: 5000)
- placeholder: string
Features:
- Character counter
- Auto-resize
- Clear button
3. VoiceSelector Component
Props:
- voices: Voice[]
- selectedVoice: string
- onSelect: function
Features:
- Search/filter
- Voice preview
- Favorite voices
4. AudioPlayer Component
Props:
- audioUrl: string
- duration: number
Features:
- Play/pause
- Progress bar
- Volume control
- Download button
- Waveform visualization (optional)
Local Package (hfstudio)
Installation
pip install hfstudio
CLI Usage
# Start the server
hfstudio
# Start with custom port
hfstudio --port 8080
# Download models for offline use
hfstudio download-models
# List available models
hfstudio list-models
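The commands above could be wired up roughly as follows. The specification names Click or Typer for the CLI; this sketch swaps in the standard-library `argparse` so it runs without extra dependencies. The default port of 8000 is taken from the backend configuration, and `main` only returns a description of the chosen action rather than performing it.

```python
import argparse
from typing import List, Optional


def build_parser() -> argparse.ArgumentParser:
    """Define the top-level options and the two subcommands."""
    parser = argparse.ArgumentParser(
        prog="hfstudio", description="Run the HFStudio server or manage models."
    )
    parser.add_argument("--port", type=int, default=8000, help="port to listen on")
    sub = parser.add_subparsers(dest="command")  # subcommand is optional
    sub.add_parser("download-models", help="download models for offline use")
    sub.add_parser("list-models", help="list available models")
    return parser


def main(argv: Optional[List[str]] = None) -> str:
    """Parse arguments and report which action would run (sketch only)."""
    args = build_parser().parse_args(argv)
    if args.command is None:
        # No subcommand given: the default action is to start the server.
        return f"serve on port {args.port}"
    return args.command
```

Calling `main(["--port", "8080"])` corresponds to `hfstudio --port 8080`, and `main(["list-models"])` to `hfstudio list-models`.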
Python API
from hfstudio import TTSEngine

# Initialize engine
engine = TTSEngine(mode="local")

# Generate speech
audio = engine.generate(
    text="Hello, world!",
    voice="default",
    model="coqui/tts-vits"
)

# Save audio
audio.save("output.wav")
Configuration
Frontend (.env)
PUBLIC_API_URL=http://localhost:8000
PUBLIC_DEFAULT_MODE=api
Backend (config.yaml)
server:
  host: 0.0.0.0
  port: 8000
  cors_origins:
    - http://localhost:5173
    - http://localhost:3000
models:
  local:
    cache_dir: ~/.hfstudio/models
    default: "coqui/tts-vits"
  api:
    huggingface_token: ${HF_TOKEN}
    openai_key: ${OPENAI_API_KEY}
audio:
  output_format: "wav"
  sample_rate: 22050
  bitrate: 128
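The `${HF_TOKEN}` and `${OPENAI_API_KEY}` placeholders and the `~` in `cache_dir` need to be resolved at load time, since YAML itself does not expand them. One possible approach, assuming the file has already been parsed into nested dicts and lists by a YAML library, is sketched below; `expand_config` is a hypothetical helper name.

```python
import os
from typing import Any


def expand_config(node: Any) -> Any:
    """Recursively expand ${VAR} and ~ in every string of a parsed config."""
    if isinstance(node, dict):
        return {key: expand_config(value) for key, value in node.items()}
    if isinstance(node, list):
        return [expand_config(item) for item in node]
    if isinstance(node, str):
        # expandvars substitutes environment variables; expanduser resolves "~".
        return os.path.expanduser(os.path.expandvars(node))
    return node  # numbers, booleans, and None pass through unchanged
```

Note that `os.path.expandvars` leaves a placeholder untouched when the variable is not set, so a loader would probably want to treat a remaining `${...}` in a token field as "not configured".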
Development Workflow
Phase 1: MVP
- Basic Svelte frontend with text input and generate button
- FastAPI backend with single TTS model support
- Mode toggle (UI only; local mode shows an installation message)
- Basic audio playback
Phase 2: Core Features
- Multiple voice support
- Parameter controls (speed, stability, similarity)
- Local model execution
- Audio download functionality
Phase 3: Enhanced Features
- History/saved generations
- Voice cloning (if supported by models)
- Batch processing
- Audio format options
Phase 4: Polish
- Waveform visualization
- Real-time generation (streaming)
- Voice preview
- Keyboard shortcuts
Performance Requirements
- API Response Time: < 2s for typical requests
- Local Generation: < 5s for 100 words
- Frontend Load Time: < 1s
- Audio Streaming: Start playback within 500ms
Security Considerations
- API key management (environment variables)
- CORS configuration
- Rate limiting
- Input sanitization
- File size limits for audio generation
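Two of the items above, rate limiting and input checks, can be sketched with the standard library alone. The 5000-character cap reuses the TextInput `maxLength` default; the sliding-window strategy and the names `validate_text` and `SlidingWindowLimiter` are assumptions chosen for illustration, and actual limits would need to be decided per deployment.

```python
import time
from collections import defaultdict, deque
from typing import Callable, Deque, Dict

MAX_TEXT_LENGTH = 5000  # mirrors the TextInput maxLength default


def validate_text(text: str) -> str:
    """Trim whitespace and reject empty or oversized input."""
    cleaned = text.strip()
    if not cleaned:
        raise ValueError("text must not be empty")
    if len(cleaned) > MAX_TEXT_LENGTH:
        raise ValueError(f"text exceeds {MAX_TEXT_LENGTH} characters")
    return cleaned


class SlidingWindowLimiter:
    """Allow at most max_requests per client within window_seconds."""

    def __init__(
        self,
        max_requests: int,
        window_seconds: float,
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self._max = max_requests
        self._window = window_seconds
        self._clock = clock  # injectable so tests can control time
        self._hits: Dict[str, Deque[float]] = defaultdict(deque)

    def allow(self, client_id: str) -> bool:
        now = self._clock()
        hits = self._hits[client_id]
        # Drop timestamps that have aged out of the window.
        while hits and now - hits[0] >= self._window:
            hits.popleft()
        if len(hits) >= self._max:
            return False
        hits.append(now)
        return True
```

A request handler would call `limiter.allow(client_id)` first and `validate_text(body_text)` second, returning an error response if either check fails. This in-memory limiter only works within a single process; a multi-worker deployment would need shared state.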
Testing Strategy
- Frontend: Vitest for unit tests, Playwright for E2E
- Backend: Pytest for unit and integration tests
- Load testing: Locust or k6
- Audio quality: Manual testing with various inputs
Deployment Options
- Standalone: User runs both frontend and backend locally
- Docker: Containerized deployment
- Cloud: Separate frontend (Vercel/Netlify) and backend (Railway/Fly.io)
- Desktop: Electron wrapper (future consideration)