# AudioForge Architecture

## Overview

AudioForge is a production-ready, open-source music generation platform inspired by Suno. It uses a multi-stage pipeline to generate music from text descriptions.

## System Architecture

```
┌─────────────┐
│  Frontend   │  (Next.js + React)
│  Port 3000  │
└──────┬──────┘
       │ HTTP/REST
       ▼
┌─────────────┐
│   Backend   │  (FastAPI)
│  Port 8000  │
└──────┬──────┘
       │
       ├──► PostgreSQL (Metadata Storage)
       ├──► Redis (Caching)
       └──► Storage (Audio Files)
```

## Generation Pipeline

### Stage 1: Prompt Understanding

- **Service**: `PromptUnderstandingService`
- **Purpose**: Analyze user prompt to extract:
  - Musical style/genre
  - Tempo/BPM
  - Mood
  - Instrumentation hints
  - Lyrics (if provided)
  - Duration preferences
- **Output**: Enriched prompt with metadata

### Stage 2: Music Generation

- **Service**: `MusicGenerationService`
- **Model**: Meta MusicGen (via AudioCraft)
- **Purpose**: Generate instrumental music track
- **Output**: WAV file with instrumental track

### Stage 3: Vocal Generation (Optional)

- **Service**: `VocalGenerationService`
- **Model**: Bark or XTTS
- **Purpose**: Generate vocals from lyrics
- **Output**: WAV file with vocals

### Stage 4: Mixing

- **Service**: `PostProcessingService`
- **Purpose**: Mix instrumental and vocal tracks
- **Output**: Mixed audio file

### Stage 5: Post-Processing/Mastering

- **Service**: `PostProcessingService`
- **Purpose**: Apply compression, EQ, normalization
- **Output**: Final mastered audio file

### Stage 6: Metadata Storage

- **Service**: Database layer
- **Purpose**: Store generation metadata, paths, status
- **Output**: Database record

## Technology Stack

### Backend

- **Framework**: FastAPI (async Python)
- **Database**: PostgreSQL with SQLAlchemy async
- **Caching**: Redis
- **ML Framework**: PyTorch
- **Music Models**:
  - MusicGen (Meta AudioCraft)
  - Bark (for vocals)
- **Audio Processing**: librosa, soundfile, scipy

### Frontend

- **Framework**: Next.js 14+ (App Router)
- **Language**: TypeScript (strict mode)
- **Styling**: Tailwind CSS
- **UI Components**: Radix UI primitives
- **State Management**: React Query + Zustand
- **Forms**: React Hook Form + Zod

### Observability

- **Logging**: structlog (structured JSON logs)
- **Metrics**: Prometheus
- **Tracing**: OpenTelemetry (optional)

## Data Flow

1. User submits prompt via frontend
2. Frontend sends POST to `/api/v1/generations`
3. Backend creates generation record (status: pending)
4. Background task starts processing
5. Pipeline executes stages 1-6
6. Frontend polls `/api/v1/generations/{id}` for status
7. On completion, audio available at `/api/v1/generations/{id}/audio`

## Database Schema

### Generations Table

- `id`: UUID (primary key)
- `prompt`: Text (user input)
- `lyrics`: Text (optional)
- `style`: String (extracted style)
- `duration`: Integer (seconds)
- `status`: String (pending/processing/completed/failed)
- `audio_path`: String (final audio file path)
- `instrumental_path`: String (instrumental track path)
- `vocal_path`: String (vocal track path, if applicable)
- `metadata`: JSON (analysis results, etc.)
- `created_at`, `updated_at`, `completed_at`: Timestamps
- `error_message`: Text (if failed)
- `processing_time_seconds`: Float

## API Endpoints

### Generations

- `POST /api/v1/generations` - Create generation
- `GET /api/v1/generations/{id}` - Get generation status
- `GET /api/v1/generations/{id}/audio` - Download audio
- `GET /api/v1/generations` - List generations (paginated)

## Configuration

All configuration via environment variables (see `.env.example`):

- Database connection
- Redis connection
- Model paths and devices (CPU/CUDA)
- Storage paths
- Logging levels
- Feature flags

## Scalability Considerations

- **Horizontal Scaling**: Stateless API, can run multiple instances
- **Queue System**: Background tasks can be moved to Celery/RQ
- **Model Serving**: Models can be served separately via TorchServe
- **Storage**: Audio files can be stored in S3/object storage
- **Caching**: Redis caches prompt analysis results

## Security

- Input validation via Pydantic schemas
- SQL injection prevention via SQLAlchemy ORM
- CORS configuration
- Rate limiting (to be added)
- Authentication (to be added)

## Performance Optimizations

- Async/await throughout
- Model lazy loading
- Background task processing
- Connection pooling (database, Redis)
- Audio file streaming

## Future Enhancements

- User authentication & authorization
- Rate limiting
- WebSocket for real-time updates
- Advanced post-processing (reverb, delay, etc.)
- Multiple model support (switch between MusicGen variants)
- Batch generation
- Playlist creation
- Social features (sharing, likes)
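
## Appendix: Status Lifecycle Sketch

The Data Flow section describes a generation record that starts as `pending`, moves to `processing` when the background task picks it up, and ends as `completed` or `failed`. The following is a minimal, standard-library-only sketch of that lifecycle, not the project's actual implementation. The `Generation` dataclass, the `run_pipeline` helper, the two placeholder stage functions, and the `storage/<id>.wav` path layout are all assumptions made for illustration. Only the status values and the field names `audio_path`, `error_message`, `processing_time_seconds`, and `metadata` come from the schema above.

```python
from dataclasses import dataclass, field
from typing import Callable, Optional
import time
import uuid


@dataclass
class Generation:
    """In-memory stand-in for a row in the generations table (hypothetical)."""
    prompt: str
    id: str = field(default_factory=lambda: str(uuid.uuid4()))
    status: str = "pending"
    audio_path: Optional[str] = None
    error_message: Optional[str] = None
    processing_time_seconds: Optional[float] = None
    metadata: dict = field(default_factory=dict)


# A stage receives the record and mutates it in place.
Stage = Callable[[Generation], None]


def understand_prompt(gen: Generation) -> None:
    # Placeholder for Stage 1; a real service would extract style, tempo, mood.
    gen.metadata["analysis"] = {"word_count": len(gen.prompt.split())}


def generate_music(gen: Generation) -> None:
    # Placeholder for Stages 2-5; no model is invoked here.
    if not gen.prompt.strip():
        raise ValueError("prompt is empty")
    gen.audio_path = f"storage/{gen.id}.wav"  # assumed path layout


def run_pipeline(gen: Generation, stages: list[Stage]) -> Generation:
    """Run stages in order, recording status, timing, and any failure."""
    gen.status = "processing"
    started = time.perf_counter()
    try:
        for stage in stages:
            stage(gen)
        gen.status = "completed"
    except Exception as exc:
        # Any stage failure marks the record as failed and keeps the reason.
        gen.status = "failed"
        gen.error_message = str(exc)
    finally:
        gen.processing_time_seconds = time.perf_counter() - started
    return gen


stages = [understand_prompt, generate_music]
ok = run_pipeline(Generation(prompt="upbeat synthwave, 120 BPM"), stages)
bad = run_pipeline(Generation(prompt="   "), stages)
print(ok.status, ok.audio_path is not None)
print(bad.status, bad.error_message)
```

Catching the exception inside `run_pipeline` rather than letting it propagate means a polling client always sees a terminal status with an `error_message`, instead of a record stuck in `processing`. In a real deployment the same shape would presumably wrap the actual services and persist each status change to the database.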