VoiceForge Architecture
Overview
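VoiceForge is a production-grade Speech-to-Text and Text-to-Speech application built on a Python stack: Streamlit for the frontend, FastAPI for the backend, and Celery for background work, backed by PostgreSQL and Redis. This document describes the system architecture and the key design decisions behind it.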
System Architecture
┌─────────────────────────────────────────────────────────────────┐
│                          Load Balancer                          │
│                       (Nginx / Cloud LB)                        │
└───────────────────────────────┬─────────────────────────────────┘
                                │
                ┌───────────────┼───────────────┐
                │               │               │
          ┌─────▼─────┐   ┌─────▼─────┐   ┌─────▼─────┐
          │ Frontend  │   │  Backend  │   │  Worker   │
          │ Streamlit │   │  FastAPI  │   │  Celery   │
          │   :8501   │   │   :8000   │   │           │
          └─────┬─────┘   └─────┬─────┘   └─────┬─────┘
                │               │               │
        ┌───────▼───────────────▼───────────────▼───────┐
        │                 Service Layer                 │
        │  ┌─────────┐  ┌─────────┐  ┌───────────────┐  │
        │  │   STT   │  │   TTS   │  │ File Service  │  │
        │  │ Service │  │ Service │  │               │  │
        │  └────┬────┘  └────┬────┘  └───────┬───────┘  │
        │  ┌─────────┐  ┌─────────┐          │          │
        │  │   NLP   │  │ Export  │          │          │
        │  │ Service │  │ Service │          │          │
        │  └────┬────┘  └────┬────┘          │          │
        └───────┼────────────┼───────────────┼──────────┘
                │            │               │
        ┌───────▼────────────▼───────────────▼──────────┐
        │                  Data Layer                   │
        │  ┌──────────┐  ┌───────┐  ┌───────────────┐   │
        │  │PostgreSQL│  │ Redis │  │ File Storage  │   │
        │  │  :5432   │  │ :6379 │  │   /uploads    │   │
        │  └──────────┘  └───────┘  └───────────────┘   │
        └───────┬───────────────────────────────────────┘
                │
        ┌───────▼───────────────────────────────────────┐
        │                 External APIs                 │
        │    ┌────────────────┐   ┌────────────────┐    │
        │    │  Google Cloud  │   │  Google Cloud  │    │
        │    │ Speech-to-Text │   │ Text-to-Speech │    │
        │    └────────────────┘   └────────────────┘    │
        └───────────────────────────────────────────────┘
Components
Frontend (Streamlit)
- Purpose: Web interface for users
- Technology: Streamlit 1.31+
- Key Features:
- Real-time microphone recording (WebRTC)
- File upload with drag-and-drop
- Audio waveform visualization
- Transcript editing and export
- Voice selection and preview
Backend (FastAPI)
- Purpose: REST API server
- Technology: FastAPI 0.109+
- Key Features:
- OpenAPI documentation
- CORS middleware
- JWT authentication (Phase 3)
- Request validation
- Error handling
Worker (Celery)
- Purpose: Background task processing
- Technology: Celery 5.3+ with Redis broker
- Key Features:
- Long audio file processing
- Batch transcription
- NLP analysis tasks
Database (PostgreSQL)
- Purpose: Persistent data storage
- Technology: PostgreSQL 15+
- Tables:
- users: User accounts
- audio_files: Uploaded audio metadata
- transcripts: Transcription results
- user_preferences: User settings
- usage_events: Analytics data
- api_keys: Enterprise API keys
Cache (Redis)
- Purpose: Caching and task queue
- Technology: Redis 7+
- Use Cases:
- Voice list caching
- Transcription result caching
- Celery task queue
- Session storage
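One way to picture transcription result caching is to derive the cache key from the audio content itself, so that re-uploading the same file in the same language hits the cache. The sketch below is an assumption for illustration: the function name, the `stt:` key prefix, and the choice of SHA-256 are hypothetical and not taken from the VoiceForge code.

```python
import hashlib


def transcription_cache_key(audio: bytes, language: str) -> str:
    """Build a deterministic cache key from the audio bytes and language code.

    The "stt:" prefix and the key layout are hypothetical; they only show
    how identical uploads could map to the same Redis key.
    """
    digest = hashlib.sha256(audio).hexdigest()
    return f"stt:{language}:{digest}"
```

A key built this way depends only on the file content and language, not on the filename, so renamed copies of the same audio would share one cached transcript.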
Observability (Prometheus)
- Purpose: Application monitoring
- Technology: prometheus-fastapi-instrumentator
- Key Metrics:
- Request latency and throughput
- Error rates
- Endpoint usage statistics
Data Flow
Speech-to-Text Flow
1. User uploads audio file
2. Frontend sends the file to /api/v1/stt/upload
3. Backend validates file format and size
4. File saved to storage
5. STT Service calls Google Cloud Speech API
6. Results processed (words, segments, timestamps)
7. Transcript saved to database
8. Response returned to frontend
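Step 3 of the flow above (format and size validation) can be sketched as a small pure function. The allowed extensions and the size limit are assumptions chosen for illustration, since this document does not state the real values, and `validate_upload` and `UploadValidationError` are hypothetical names.

```python
from pathlib import Path

# Hypothetical limits; the real values are not stated in this document.
ALLOWED_EXTENSIONS = {".wav", ".mp3", ".flac", ".ogg"}
MAX_UPLOAD_BYTES = 10 * 1024 * 1024  # 10 MiB, assumed


class UploadValidationError(ValueError):
    """Raised when an uploaded file fails the format or size checks."""


def validate_upload(filename: str, size_bytes: int) -> str:
    """Return the lowercased file extension if the upload is acceptable."""
    extension = Path(filename).suffix.lower()
    if extension not in ALLOWED_EXTENSIONS:
        raise UploadValidationError(f"unsupported format: {extension or '(none)'}")
    if size_bytes <= 0:
        raise UploadValidationError("file is empty")
    if size_bytes > MAX_UPLOAD_BYTES:
        raise UploadValidationError("file exceeds the size limit")
    return extension
```

Running a check like this before the file is written to storage means rejected uploads never reach the disk or the Speech API.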
Text-to-Speech Flow
1. User enters text
2. Frontend sends the text and selected voice to /api/v1/tts/synthesize
3. Backend validates text and voice
4. TTS Service calls Google Cloud TTS API
5. Audio is returned to the frontend as a base64-encoded string
6. Frontend plays/downloads audio
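On the frontend side, steps 5 and 6 amount to decoding the base64 payload back into raw audio bytes. The sketch below assumes a JSON response body with an `audio_content` field; that field name and the helper `decode_tts_response` are hypothetical, as the actual response schema is not shown in this document.

```python
import base64
import json


def decode_tts_response(body: str) -> bytes:
    """Extract raw audio bytes from a JSON response body.

    Assumes a hypothetical shape: {"audio_content": "<base64 string>"}.
    """
    payload = json.loads(body)
    # validate=True rejects characters outside the base64 alphabet
    # instead of silently skipping them.
    return base64.b64decode(payload["audio_content"], validate=True)
```

The resulting bytes could then be passed to Streamlit's `st.audio` for playback or written to a file for download.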
Design Decisions
Why PostgreSQL with JSONB?
- Single database simplifies deployment
- JSONB supports flexible document storage for segments
- SQL for relational queries (users, files)
- Full-text search capability
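To illustrate the kind of document a JSONB column might hold for segments, here is a minimal sketch that serializes segments with word-level timestamps to JSON. The `Word` and `Segment` field names are assumptions for illustration and may not match the actual `transcripts` schema.

```python
import json
from dataclasses import asdict, dataclass


@dataclass
class Word:
    text: str
    start: float  # seconds from the start of the audio
    end: float


@dataclass
class Segment:
    text: str
    start: float
    end: float
    words: list[Word]


def segments_to_json(segments: list[Segment]) -> str:
    """Serialize segments to a JSON string that a JSONB column could store."""
    # asdict() recurses into the nested Word dataclasses.
    return json.dumps([asdict(segment) for segment in segments])
```

Because the nesting depth and per-word fields can vary between transcripts, storing this as one JSONB value avoids a separate table for words while the relational columns still cover users and files.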
Why Streamlit?
- Rapid development for data apps
- Built-in components for audio
- Easy deployment
- Python-native (no JS required)
Why Google Cloud APIs?
- Industry-leading accuracy
- 100+ languages supported
- 200+ voice options
- Generous free tier
Security Considerations
- Secrets via environment variables
- HTTPS in production
- JWT for authentication
- Per-user data isolation
- Temporary file cleanup
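The first point, secrets via environment variables, might look like the following sketch, which fails fast at startup when a required variable is missing. The variable names `DATABASE_URL`, `REDIS_URL`, and `JWT_SECRET` and the `Settings` class are hypothetical; the actual configuration keys are not listed in this document.

```python
import os
from collections.abc import Mapping
from dataclasses import dataclass

# Hypothetical variable names, used only for illustration.
REQUIRED_VARS = ("DATABASE_URL", "REDIS_URL", "JWT_SECRET")


@dataclass(frozen=True)
class Settings:
    database_url: str
    redis_url: str
    jwt_secret: str


def load_settings(env: Mapping[str, str] = os.environ) -> Settings:
    """Read required settings from the environment, failing fast if any are missing."""
    missing = [name for name in REQUIRED_VARS if not env.get(name)]
    if missing:
        raise RuntimeError("missing environment variables: " + ", ".join(missing))
    return Settings(
        database_url=env["DATABASE_URL"],
        redis_url=env["REDIS_URL"],
        jwt_secret=env["JWT_SECRET"],
    )
```

Accepting the environment as a parameter keeps the function easy to exercise with a plain dictionary, without touching the real process environment.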
Deployment Options
Local Development
# Backend
cd backend
uvicorn app.main:app --reload
# Frontend
cd frontend
streamlit run streamlit_app.py
Docker Compose
docker-compose -f deploy/docker/docker-compose.dev.yml up
Production
- Deploy to any container orchestrator
- Use managed PostgreSQL (Cloud SQL, RDS)
- Use managed Redis (Memorystore, ElastiCache)
- Load balance with Nginx/Cloud LB