# AudioForge Architecture
## Overview
AudioForge is a production-ready, open-source music generation platform inspired by Suno. It uses a multi-stage pipeline to generate music from text descriptions.
## System Architecture
```
┌─────────────┐
│  Frontend   │  (Next.js + React)
│  Port 3000  │
└──────┬──────┘
       │ HTTP/REST
       ▼
┌─────────────┐
│   Backend   │  (FastAPI)
│  Port 8000  │
└──────┬──────┘
       │
       ├──► PostgreSQL (Metadata Storage)
       ├──► Redis (Caching)
       └──► Storage (Audio Files)
```
## Generation Pipeline
### Stage 1: Prompt Understanding
- **Service**: `PromptUnderstandingService`
- **Purpose**: Analyze user prompt to extract:
- Musical style/genre
- Tempo/BPM
- Mood
- Instrumentation hints
- Lyrics (if provided)
- Duration preferences
- **Output**: Enriched prompt with metadata
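A minimal sketch of the kind of enriched metadata Stage 1 produces. The keyword tables and the `analyze_prompt` helper are illustrative stand-ins; the actual `PromptUnderstandingService` presumably uses a richer analysis than keyword matching.

```python
import re

# Illustrative keyword tables -- assumptions, not the real service's vocabulary.
GENRES = {"jazz", "rock", "lo-fi", "techno", "ambient", "pop"}
MOODS = {"happy", "sad", "dreamy", "energetic", "calm"}

def analyze_prompt(prompt: str) -> dict:
    """Extract style, mood, and BPM hints from a free-text prompt."""
    words = set(prompt.lower().split())
    bpm_match = re.search(r"(\d{2,3})\s*bpm", prompt.lower())
    return {
        "style": next(iter(words & GENRES), None),
        "mood": next(iter(words & MOODS), None),
        "bpm": int(bpm_match.group(1)) if bpm_match else None,
        "prompt": prompt,
    }
```

The returned dict is the "enriched prompt" handed to Stage 2; fields that cannot be extracted stay `None` rather than being guessed.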
### Stage 2: Music Generation
- **Service**: `MusicGenerationService`
- **Model**: Meta MusicGen (via AudioCraft)
- **Purpose**: Generate instrumental music track
- **Output**: WAV file with instrumental track
### Stage 3: Vocal Generation (Optional)
- **Service**: `VocalGenerationService`
- **Model**: Bark or XTTS
- **Purpose**: Generate vocals from lyrics
- **Output**: WAV file with vocals
### Stage 4: Mixing
- **Service**: `PostProcessingService`
- **Purpose**: Mix instrumental and vocal tracks
- **Output**: Mixed audio file
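The core of Stage 4 can be sketched as a simple overlay mix. This is a sketch under assumptions (mono float arrays in [-1, 1], a fixed vocal gain); the real `PostProcessingService` likely handles stereo, resampling, and alignment as well.

```python
import numpy as np

def mix_tracks(instrumental: np.ndarray, vocals: np.ndarray,
               vocal_gain: float = 0.8) -> np.ndarray:
    """Overlay vocals onto the instrumental, padding the shorter track."""
    n = max(len(instrumental), len(vocals))
    inst = np.pad(instrumental, (0, n - len(instrumental)))
    voc = np.pad(vocals, (0, n - len(vocals)))
    mixed = inst + vocal_gain * voc
    # Clip to the valid float-audio range to avoid wrap-around distortion.
    return np.clip(mixed, -1.0, 1.0)
```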
### Stage 5: Post-Processing/Mastering
- **Service**: `PostProcessingService`
- **Purpose**: Apply compression, EQ, normalization
- **Output**: Final mastered audio file
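Of the Stage 5 operations, normalization is the simplest to illustrate. A peak-normalization sketch (the 0.95 target headroom is an assumption; compression and EQ would be additional passes over the same array):

```python
import numpy as np

def peak_normalize(audio: np.ndarray, target_peak: float = 0.95) -> np.ndarray:
    """Scale audio so its absolute peak hits target_peak, leaving headroom."""
    peak = np.max(np.abs(audio))
    if peak == 0:
        return audio  # silence: nothing to scale
    return audio * (target_peak / peak)
```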
### Stage 6: Metadata Storage
- **Service**: Database layer
- **Purpose**: Store generation metadata, paths, status
- **Output**: Database record
## Technology Stack
### Backend
- **Framework**: FastAPI (async Python)
- **Database**: PostgreSQL with SQLAlchemy async
- **Caching**: Redis
- **ML Framework**: PyTorch
- **Music Models**:
- MusicGen (Meta AudioCraft)
- Bark (for vocals)
- **Audio Processing**: librosa, soundfile, scipy
### Frontend
- **Framework**: Next.js 14+ (App Router)
- **Language**: TypeScript (strict mode)
- **Styling**: Tailwind CSS
- **UI Components**: Radix UI primitives
- **State Management**: React Query + Zustand
- **Forms**: React Hook Form + Zod
### Observability
- **Logging**: structlog (structured JSON logs)
- **Metrics**: Prometheus
- **Tracing**: OpenTelemetry (optional)
## Data Flow
1. User submits prompt via frontend
2. Frontend sends POST to `/api/v1/generations`
3. Backend creates generation record (status: pending)
4. Background task starts processing
5. Pipeline executes stages 1-6
6. Frontend polls `/api/v1/generations/{id}` for status
7. On completion, audio available at `/api/v1/generations/{id}/audio`
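The polling loop in steps 6–7 can be sketched as follows. `get_status` is an injected callable standing in for the HTTP GET to `/api/v1/generations/{id}`, which keeps the loop testable without a server.

```python
import time

def poll_generation(get_status, interval: float = 1.0, max_attempts: int = 60) -> str:
    """Poll until the generation reaches a terminal status."""
    for _ in range(max_attempts):
        status = get_status()
        if status in ("completed", "failed"):
            return status
        time.sleep(interval)
    raise TimeoutError("generation did not finish in time")
```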
## Database Schema
### Generations Table
- `id`: UUID (primary key)
- `prompt`: Text (user input)
- `lyrics`: Text (optional)
- `style`: String (extracted style)
- `duration`: Integer (seconds)
- `status`: String (pending/processing/completed/failed)
- `audio_path`: String (final audio file path)
- `instrumental_path`: String (instrumental track path)
- `vocal_path`: String (vocal track path, if applicable)
- `metadata`: JSON (analysis results, etc.)
- `created_at`, `updated_at`, `completed_at`: Timestamps
- `error_message`: Text (if failed)
- `processing_time_seconds`: Float
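The `status` column implies a small state machine. A sketch of the transitions it allows (the transition table is inferred from the four listed values, not taken from the codebase):

```python
# Allowed transitions for the `status` column, inferred from the schema above.
TRANSITIONS = {
    "pending": {"processing"},
    "processing": {"completed", "failed"},
    "completed": set(),   # terminal
    "failed": set(),      # terminal
}

def advance(current: str, new: str) -> str:
    """Validate and apply a status transition."""
    if new not in TRANSITIONS.get(current, set()):
        raise ValueError(f"illegal transition {current} -> {new}")
    return new
```

Enforcing this in the service layer prevents a failed record from silently flipping back to `pending`.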
## API Endpoints
### Generations
- `POST /api/v1/generations` - Create generation
- `GET /api/v1/generations/{id}` - Get generation status
- `GET /api/v1/generations/{id}/audio` - Download audio
- `GET /api/v1/generations` - List generations (paginated)
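A thin client wrapper over these endpoints might look like this. The `send` callable is an injected stand-in for a real HTTP client (e.g. httpx), which keeps the wrapper testable offline; the class and method names are illustrative.

```python
class GenerationsClient:
    """Hypothetical client for the generations endpoints."""

    def __init__(self, send, base="/api/v1"):
        self.send = send  # callable: (method, path, json_body) -> dict
        self.base = base

    def create(self, prompt, lyrics=None):
        # POST /api/v1/generations
        return self.send("POST", f"{self.base}/generations",
                         {"prompt": prompt, "lyrics": lyrics})

    def status(self, gen_id):
        # GET /api/v1/generations/{id}
        return self.send("GET", f"{self.base}/generations/{gen_id}", None)

    def audio_url(self, gen_id):
        # GET /api/v1/generations/{id}/audio
        return f"{self.base}/generations/{gen_id}/audio"
```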
## Configuration
All configuration via environment variables (see `.env.example`):
- Database connection
- Redis connection
- Model paths and devices (CPU/CUDA)
- Storage paths
- Logging levels
- Feature flags
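A sketch of reading this configuration, with safe defaults. The variable names here are assumptions for illustration; `.env.example` remains the authoritative list.

```python
import os

def load_settings(env=os.environ) -> dict:
    """Read configuration from environment variables with defaults.

    Variable names are illustrative; see .env.example for the real ones.
    """
    return {
        "database_url": env.get("DATABASE_URL",
                                "postgresql+asyncpg://localhost/audioforge"),
        "redis_url": env.get("REDIS_URL", "redis://localhost:6379/0"),
        "device": env.get("MODEL_DEVICE",
                          "cuda" if env.get("CUDA_VISIBLE_DEVICES") else "cpu"),
        "log_level": env.get("LOG_LEVEL", "INFO"),
    }
```

Passing `env` as a parameter (rather than reading `os.environ` directly) makes the loader easy to test and to override per-instance.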
## Scalability Considerations
- **Horizontal Scaling**: Stateless API, can run multiple instances
- **Queue System**: Background tasks can be moved to Celery/RQ
- **Model Serving**: Models can be served separately via TorchServe
- **Storage**: Audio files can be stored in S3/object storage
- **Caching**: Redis caches prompt analysis results
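The prompt-analysis cache mentioned above can be sketched with a hash-based key. A plain dict stands in for Redis here; swapping in a `redis` client with the same get/set pattern is the intended production shape.

```python
import hashlib
import json

class PromptCache:
    """Cache prompt-analysis results keyed by a hash of the prompt text."""

    def __init__(self):
        self.store = {}  # stand-in for Redis

    def key(self, prompt: str) -> str:
        return "prompt:" + hashlib.sha256(prompt.encode()).hexdigest()

    def get_or_compute(self, prompt: str, analyze):
        """Return cached analysis, running `analyze` only on a miss."""
        k = self.key(prompt)
        if k not in self.store:
            self.store[k] = json.dumps(analyze(prompt))  # Redis stores strings
        return json.loads(self.store[k])
```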
## Security
- Input validation via Pydantic schemas
- SQL injection prevention via SQLAlchemy ORM
- CORS configuration
- Rate limiting (to be added)
- Authentication (to be added)
## Performance Optimizations
- Async/await throughout
- Model lazy loading
- Background task processing
- Connection pooling (database, Redis)
- Audio file streaming
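Audio file streaming boils down to yielding fixed-size chunks, which is what FastAPI's `StreamingResponse` consumes. A minimal sketch (the 64 KiB chunk size is an arbitrary choice):

```python
def iter_file_chunks(fileobj, chunk_size: int = 64 * 1024):
    """Yield a file in fixed-size chunks so large audio never loads whole."""
    while True:
        chunk = fileobj.read(chunk_size)
        if not chunk:
            break
        yield chunk
```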
## Future Enhancements
- User authentication & authorization
- Rate limiting
- WebSocket for real-time updates
- Advanced post-processing (reverb, delay, etc.)
- Multiple model support (switch between MusicGen variants)
- Batch generation
- Playlist creation
- Social features (sharing, likes)