# AudioForge Architecture

## Overview

AudioForge is a production-ready, open-source music generation platform inspired by Suno. It uses a multi-stage pipeline to generate music from text descriptions.

## System Architecture

```
┌─────────────┐
│  Frontend   │ (Next.js + React)
│  Port 3000  │
└──────┬──────┘
       │ HTTP/REST
       ▼
┌─────────────┐
│  Backend    │ (FastAPI)
│  Port 8000  │
└──────┬──────┘
       │
       ├──► PostgreSQL (Metadata Storage)
       ├──► Redis (Caching)
       └──► Storage (Audio Files)
```
## Generation Pipeline

### Stage 1: Prompt Understanding

- **Service**: `PromptUnderstandingService`
- **Purpose**: Analyze the user prompt to extract:
  - Musical style/genre
  - Tempo/BPM
  - Mood
  - Instrumentation hints
  - Lyrics (if provided)
  - Duration preferences
- **Output**: Enriched prompt with metadata
### Stage 2: Music Generation

- **Service**: `MusicGenerationService`
- **Model**: Meta MusicGen (via AudioCraft)
- **Purpose**: Generate instrumental music track
- **Output**: WAV file with instrumental track

### Stage 3: Vocal Generation (Optional)

- **Service**: `VocalGenerationService`
- **Model**: Bark or XTTS
- **Purpose**: Generate vocals from lyrics
- **Output**: WAV file with vocals

### Stage 4: Mixing

- **Service**: `PostProcessingService`
- **Purpose**: Mix instrumental and vocal tracks
- **Output**: Mixed audio file

### Stage 5: Post-Processing/Mastering

- **Service**: `PostProcessingService`
- **Purpose**: Apply compression, EQ, normalization
- **Output**: Final mastered audio file
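Conceptually, stages 4 and 5 reduce to per-sample arithmetic. A minimal pure-Python sketch of the two core operations (the real `PostProcessingService` presumably works on numpy arrays via librosa/soundfile; the function names and the 0.95 peak target here are illustrative assumptions):

```python
def mix(instrumental: list[float], vocals: list[float],
        vocal_gain: float = 0.8) -> list[float]:
    """Sum two sample streams, padding the shorter one with silence."""
    n = max(len(instrumental), len(vocals))
    inst = instrumental + [0.0] * (n - len(instrumental))
    voc = vocals + [0.0] * (n - len(vocals))
    return [i + vocal_gain * v for i, v in zip(inst, voc)]

def normalize(samples: list[float], peak: float = 0.95) -> list[float]:
    """Scale so the loudest sample hits the target peak (a simple mastering step)."""
    loudest = max(abs(s) for s in samples)
    if loudest == 0.0:
        return samples  # pure silence: nothing to scale
    scale = peak / loudest
    return [s * scale for s in samples]
```

Real mastering adds compression and EQ on top, but normalization is the step that guarantees the final file neither clips nor comes out too quiet.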
### Stage 6: Metadata Storage

- **Service**: Database layer
- **Purpose**: Store generation metadata, paths, status
- **Output**: Database record

## Technology Stack

### Backend

- **Framework**: FastAPI (async Python)
- **Database**: PostgreSQL with async SQLAlchemy
- **Caching**: Redis
- **ML Framework**: PyTorch
- **Music Models**:
  - MusicGen (Meta AudioCraft)
  - Bark (for vocals)
- **Audio Processing**: librosa, soundfile, scipy

### Frontend

- **Framework**: Next.js 14+ (App Router)
- **Language**: TypeScript (strict mode)
- **Styling**: Tailwind CSS
- **UI Components**: Radix UI primitives
- **State Management**: React Query + Zustand
- **Forms**: React Hook Form + Zod

### Observability

- **Logging**: structlog (structured JSON logs)
- **Metrics**: Prometheus
- **Tracing**: OpenTelemetry (optional)

## Data Flow

1. User submits a prompt via the frontend
2. Frontend sends a POST to `/api/v1/generations`
3. Backend creates a generation record (status: pending)
4. A background task starts processing
5. The pipeline executes stages 1-6
6. Frontend polls `/api/v1/generations/{id}` for status
7. On completion, audio is available at `/api/v1/generations/{id}/audio`
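The `status` field driving the polling flow follows a small state machine. The transition table below is an assumption inferred from the four status values, not something the source spells out, but enforcing it keeps a record from ever moving backwards (e.g. completed back to pending):

```python
# Assumed legal lifecycle for a generation's `status` field.
TRANSITIONS = {
    "pending": {"processing", "failed"},
    "processing": {"completed", "failed"},
    "completed": set(),   # terminal
    "failed": set(),      # terminal
}

def advance(current: str, new: str) -> str:
    """Validate a status transition before persisting it."""
    if new not in TRANSITIONS.get(current, set()):
        raise ValueError(f"illegal status transition {current!r} -> {new!r}")
    return new
```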
## Database Schema

### Generations Table

- `id`: UUID (primary key)
- `prompt`: Text (user input)
- `lyrics`: Text (optional)
- `style`: String (extracted style)
- `duration`: Integer (seconds)
- `status`: String (pending/processing/completed/failed)
- `audio_path`: String (final audio file path)
- `instrumental_path`: String (instrumental track path)
- `vocal_path`: String (vocal track path, if applicable)
- `metadata`: JSON (analysis results, etc.)
- `created_at`, `updated_at`, `completed_at`: Timestamps
- `error_message`: Text (if failed)
- `processing_time_seconds`: Float
## API Endpoints

### Generations

- `POST /api/v1/generations` - Create generation
- `GET /api/v1/generations/{id}` - Get generation status
- `GET /api/v1/generations/{id}/audio` - Download audio
- `GET /api/v1/generations` - List generations (paginated)
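The paginated list endpoint implies an offset/limit scheme. A sketch of the slicing and response shape (the parameter names `page`/`size` and the response keys are assumptions, not confirmed by the source):

```python
from typing import Any, Sequence

def paginate(items: Sequence[Any], page: int = 1, size: int = 20) -> dict[str, Any]:
    """Slice a result set into the envelope a list endpoint might return."""
    if page < 1 or size < 1:
        raise ValueError("page and size must be >= 1")
    start = (page - 1) * size
    return {
        "items": list(items[start:start + size]),
        "page": page,
        "size": size,
        "total": len(items),
    }
```

In production the slicing happens in SQL (`LIMIT`/`OFFSET`), not in Python, but the response envelope is the same.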
## Configuration

All configuration is via environment variables (see `.env.example`):

- Database connection
- Redis connection
- Model paths and devices (CPU/CUDA)
- Storage paths
- Logging levels
- Feature flags
## Scalability Considerations

- **Horizontal Scaling**: Stateless API, can run multiple instances
- **Queue System**: Background tasks can be moved to Celery/RQ
- **Model Serving**: Models can be served separately via TorchServe
- **Storage**: Audio files can be stored in S3/object storage
- **Caching**: Redis caches prompt analysis results
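Caching prompt-analysis results requires a key that is stable across trivially different spellings of the same prompt. One common approach is hashing a normalized prompt; the key prefix and normalization rules here are assumptions:

```python
import hashlib

def analysis_cache_key(prompt: str) -> str:
    """Derive a stable Redis key from a prompt, ignoring case and extra whitespace."""
    normalized = " ".join(prompt.lower().split())
    digest = hashlib.sha256(normalized.encode("utf-8")).hexdigest()
    return f"prompt-analysis:{digest}"
```

Hashing keeps keys a fixed length regardless of prompt size and avoids putting user text verbatim into Redis key names.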
## Security

- Input validation via Pydantic schemas
- SQL injection prevention via SQLAlchemy ORM
- CORS configuration
- Rate limiting (to be added)
- Authentication (to be added)

## Performance Optimizations

- Async/await throughout
- Model lazy loading
- Background task processing
- Connection pooling (database, Redis)
- Audio file streaming
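Audio file streaming avoids loading an entire WAV into memory when serving downloads. The core idea is a generator of fixed-size chunks, which something like FastAPI's `StreamingResponse` can consume; the chunk size below is an arbitrary choice:

```python
from pathlib import Path
from typing import Iterator

def iter_audio(path: Path, chunk_size: int = 64 * 1024) -> Iterator[bytes]:
    """Yield a file in fixed-size chunks so large audio never sits fully in memory."""
    with path.open("rb") as f:
        while chunk := f.read(chunk_size):
            yield chunk
```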
## Future Enhancements

- User authentication & authorization
- Rate limiting
- WebSocket for real-time updates
- Advanced post-processing (reverb, delay, etc.)
- Multiple model support (switch between MusicGen variants)
- Batch generation
- Playlist creation
- Social features (sharing, likes)