# AudioForge Architecture

## Overview

AudioForge is a production-ready, open-source music generation platform inspired by Suno. It uses a multi-stage pipeline to generate music from text descriptions.

## System Architecture

```
┌──────────────┐
│   Frontend   │ (Next.js + React)
│  Port 3000   │
└──────┬───────┘
       │ HTTP/REST
       ▼
┌──────────────┐
│   Backend    │ (FastAPI)
│  Port 8000   │
└──────┬───────┘
       │
       ├──► PostgreSQL (Metadata Storage)
       ├──► Redis (Caching)
       └──► Storage (Audio Files)
```

## Generation Pipeline

### Stage 1: Prompt Understanding
- **Service**: `PromptUnderstandingService`
- **Purpose**: Analyze user prompt to extract:
  - Musical style/genre
  - Tempo/BPM
  - Mood
  - Instrumentation hints
  - Lyrics (if provided)
  - Duration preferences
- **Output**: Enriched prompt with metadata
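
The extraction step above can be sketched as a function from free text to structured metadata. This is a toy keyword-based version for illustration only — the names `PromptAnalysis` and `analyze_prompt`, and the tiny vocabularies, are assumptions, not the real `PromptUnderstandingService` API:

```python
from __future__ import annotations

import re
from dataclasses import dataclass, field


@dataclass
class PromptAnalysis:
    """Metadata extracted from a user prompt (Stage 1 output)."""
    genre: str | None = None
    bpm: int | None = None
    mood: str | None = None
    instruments: list[str] = field(default_factory=list)


# Toy vocabularies; the real service would use a far richer analysis.
_GENRES = {"jazz", "rock", "lofi", "techno", "ambient"}
_MOODS = {"upbeat", "melancholic", "calm", "energetic"}
_INSTRUMENTS = {"piano", "guitar", "drums", "saxophone", "synth"}


def analyze_prompt(prompt: str) -> PromptAnalysis:
    """Extract style/tempo/mood/instrumentation hints from free text."""
    words = set(re.findall(r"[a-z]+", prompt.lower()))
    bpm_match = re.search(r"(\d{2,3})\s*bpm", prompt.lower())
    return PromptAnalysis(
        genre=next(iter(words & _GENRES), None),
        bpm=int(bpm_match.group(1)) if bpm_match else None,
        mood=next(iter(words & _MOODS), None),
        instruments=sorted(words & _INSTRUMENTS),
    )
```

The enriched result is what downstream stages consume, so even a failed extraction (all fields `None`) still yields a well-formed object.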

### Stage 2: Music Generation
- **Service**: `MusicGenerationService`
- **Model**: Meta MusicGen (via AudioCraft)
- **Purpose**: Generate instrumental music track
- **Output**: WAV file with instrumental track

### Stage 3: Vocal Generation (Optional)
- **Service**: `VocalGenerationService`
- **Model**: Bark or XTTS
- **Purpose**: Generate vocals from lyrics
- **Output**: WAV file with vocals

### Stage 4: Mixing
- **Service**: `PostProcessingService`
- **Purpose**: Mix instrumental and vocal tracks
- **Output**: Mixed audio file

### Stage 5: Post-Processing/Mastering
- **Service**: `PostProcessingService`
- **Purpose**: Apply compression, EQ, normalization
- **Output**: Final mastered audio file

### Stage 6: Metadata Storage
- **Service**: Database layer
- **Purpose**: Store generation metadata, paths, status
- **Output**: Database record
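
The six stages above compose into a linear pipeline. A minimal orchestration sketch, with stub stages standing in for the real services (the `GenerationContext` shape and status values mirror the schema below, but the function names are assumptions):

```python
from __future__ import annotations

from dataclasses import dataclass, field
from typing import Callable


@dataclass
class GenerationContext:
    """Mutable state threaded through the pipeline stages."""
    prompt: str
    lyrics: str | None = None
    status: str = "pending"
    artifacts: dict = field(default_factory=dict)


def run_pipeline(ctx: GenerationContext,
                 stages: list[Callable[[GenerationContext], None]]) -> GenerationContext:
    """Run stages in order; mark the record failed on the first exception."""
    ctx.status = "processing"
    try:
        for stage in stages:
            stage(ctx)
        ctx.status = "completed"
    except Exception as exc:
        ctx.status = "failed"
        ctx.artifacts["error_message"] = str(exc)
    return ctx


# Stub stages standing in for the real services.
def understand(ctx):
    ctx.artifacts["analysis"] = {"genre": "demo"}

def generate_music(ctx):
    ctx.artifacts["instrumental_path"] = "/tmp/inst.wav"

def mix_and_master(ctx):
    ctx.artifacts["audio_path"] = "/tmp/final.wav"
```

Keeping the stages as plain callables makes it easy to skip Stage 3 when no lyrics are provided, or to move the whole loop into a Celery task later.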

## Technology Stack

### Backend
- **Framework**: FastAPI (async Python)
- **Database**: PostgreSQL with async SQLAlchemy
- **Caching**: Redis
- **ML Framework**: PyTorch
- **Music Models**: 
  - MusicGen (Meta AudioCraft)
  - Bark (for vocals)
- **Audio Processing**: librosa, soundfile, scipy

### Frontend
- **Framework**: Next.js 14+ (App Router)
- **Language**: TypeScript (strict mode)
- **Styling**: Tailwind CSS
- **UI Components**: Radix UI primitives
- **State Management**: React Query + Zustand
- **Forms**: React Hook Form + Zod

### Observability
- **Logging**: structlog (structured JSON logs)
- **Metrics**: Prometheus
- **Tracing**: OpenTelemetry (optional)
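
Structured JSON logs keep log lines machine-parseable. A minimal stdlib stand-in for the structlog JSON renderer (the field names here are illustrative, not the project's actual log schema):

```python
import json
import logging


class JsonFormatter(logging.Formatter):
    """Render each log record as a single JSON object per line."""

    def format(self, record: logging.LogRecord) -> str:
        return json.dumps({
            "level": record.levelname.lower(),
            "logger": record.name,
            "event": record.getMessage(),
        })
```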

## Data Flow

1. User submits prompt via frontend
2. Frontend sends POST to `/api/v1/generations`
3. Backend creates generation record (status: pending)
4. Background task starts processing
5. Pipeline executes stages 1-6
6. Frontend polls `/api/v1/generations/{id}` for status
7. On completion, the audio is available at `/api/v1/generations/{id}/audio`
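
Step 6, the polling loop, can be sketched as below. The fetch function is injected so the same helper works with any HTTP client; `poll_generation` and its parameters are assumptions for illustration, not part of the codebase:

```python
import time
from typing import Callable

TERMINAL = {"completed", "failed"}


def poll_generation(fetch_status: Callable[[], dict],
                    interval: float = 2.0,
                    timeout: float = 600.0,
                    sleep=time.sleep) -> dict:
    """Call GET /api/v1/generations/{id} (via `fetch_status`) until the
    record reaches a terminal status or the timeout elapses."""
    waited = 0.0
    while True:
        record = fetch_status()
        if record.get("status") in TERMINAL:
            return record
        if waited >= timeout:
            raise TimeoutError("generation did not finish in time")
        sleep(interval)
        waited += interval
```

Injecting `sleep` also makes the loop trivially testable without real delays; a WebSocket channel (listed under Future Enhancements) would replace this polling entirely.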

## Database Schema

### Generations Table
- `id`: UUID (primary key)
- `prompt`: Text (user input)
- `lyrics`: Text (optional)
- `style`: String (extracted style)
- `duration`: Integer (seconds)
- `status`: String (pending/processing/completed/failed)
- `audio_path`: String (final audio file path)
- `instrumental_path`: String (instrumental track path)
- `vocal_path`: String (vocal track path, if applicable)
- `metadata`: JSON (analysis results, etc.)
- `created_at`, `updated_at`, `completed_at`: Timestamps
- `error_message`: Text (if failed)
- `processing_time_seconds`: Float
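
The record shape can be mirrored as a plain dataclass — note this is an illustration of the columns above, not the actual async SQLAlchemy model:

```python
from __future__ import annotations

import uuid
from dataclasses import dataclass, field
from datetime import datetime, timezone
from typing import Any, Optional


@dataclass
class Generation:
    """Plain-Python mirror of the `generations` table."""
    prompt: str
    lyrics: Optional[str] = None
    style: Optional[str] = None
    duration: Optional[int] = None             # seconds
    status: str = "pending"                    # pending/processing/completed/failed
    audio_path: Optional[str] = None
    instrumental_path: Optional[str] = None
    vocal_path: Optional[str] = None
    metadata: dict[str, Any] = field(default_factory=dict)
    error_message: Optional[str] = None
    processing_time_seconds: Optional[float] = None
    id: uuid.UUID = field(default_factory=uuid.uuid4)
    created_at: datetime = field(default_factory=lambda: datetime.now(timezone.utc))
    updated_at: Optional[datetime] = None
    completed_at: Optional[datetime] = None
```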

## API Endpoints

### Generations
- `POST /api/v1/generations` - Create generation
- `GET /api/v1/generations/{id}` - Get generation status
- `GET /api/v1/generations/{id}/audio` - Download audio
- `GET /api/v1/generations` - List generations (paginated)
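
A client-side sketch of the create call. The request field names are assumptions inferred from the schema above — check the Pydantic schemas for the authoritative contract:

```python
from __future__ import annotations

import json


def build_generation_request(prompt: str, lyrics: str | None = None,
                             duration: int = 30) -> dict:
    """Assemble the JSON body for POST /api/v1/generations."""
    body = {"prompt": prompt, "duration": duration}
    if lyrics is not None:
        body["lyrics"] = lyrics
    return body


# Sending it with the stdlib (any HTTP client works equally well):
# import urllib.request
# req = urllib.request.Request(
#     "http://localhost:8000/api/v1/generations",
#     data=json.dumps(build_generation_request("dreamy synthwave")).encode(),
#     headers={"Content-Type": "application/json"},
# )
# with urllib.request.urlopen(req) as resp:
#     generation = json.loads(resp.read())  # contains the new record's `id`
```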

## Configuration

All configuration via environment variables (see `.env.example`):

- Database connection
- Redis connection
- Model paths and devices (CPU/CUDA)
- Storage paths
- Logging levels
- Feature flags
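
A minimal sketch of environment-driven settings, assuming variable names like `DATABASE_URL` and `MODEL_DEVICE` (see `.env.example` for the real ones). A FastAPI app would typically use pydantic-settings for this; plain `os.environ` is used here to keep the sketch dependency-free:

```python
import os
from dataclasses import dataclass


@dataclass(frozen=True)
class Settings:
    """Illustrative subset of the environment-driven configuration."""
    database_url: str
    redis_url: str
    model_device: str   # "cpu" or "cuda"
    storage_path: str
    log_level: str


def load_settings(env=os.environ) -> Settings:
    return Settings(
        database_url=env.get("DATABASE_URL", "postgresql+asyncpg://localhost/audioforge"),
        redis_url=env.get("REDIS_URL", "redis://localhost:6379/0"),
        model_device=env.get("MODEL_DEVICE", "cpu"),
        storage_path=env.get("STORAGE_PATH", "./storage"),
        log_level=env.get("LOG_LEVEL", "INFO"),
    )
```

Passing `env` explicitly keeps the loader testable and lets feature-flagged code read a frozen snapshot instead of the live environment.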

## Scalability Considerations

- **Horizontal Scaling**: Stateless API, can run multiple instances
- **Queue System**: Background tasks can be moved to Celery/RQ
- **Model Serving**: Models can be served separately via TorchServe
- **Storage**: Audio files can be stored in S3/object storage
- **Caching**: Redis caches prompt analysis results
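
The caching point above is a classic cache-aside pattern: hash the prompt, look it up, and only run the analysis on a miss. A sketch with a plain dict standing in for the Redis client (the key scheme and function names are assumptions):

```python
import hashlib
import json
from typing import Callable


def cached_analysis(prompt: str, cache: dict,
                    analyze: Callable[[str], dict]) -> dict:
    """Cache-aside: reuse a prior analysis for an identical prompt.
    `cache` stands in for a Redis client; values are stored as JSON."""
    key = "analysis:" + hashlib.sha256(prompt.encode()).hexdigest()
    if key in cache:
        return json.loads(cache[key])
    result = analyze(prompt)
    cache[key] = json.dumps(result)  # with Redis: set an expiry, e.g. ex=3600
    return result
```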

## Security

- Input validation via Pydantic schemas
- SQL injection prevention via SQLAlchemy ORM
- CORS configuration
- Rate limiting (to be added)
- Authentication (to be added)

## Performance Optimizations

- Async/await throughout
- Model lazy loading
- Background task processing
- Connection pooling (database, Redis)
- Audio file streaming
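
Model lazy loading, mentioned above, means a worker only pays the (slow, memory-heavy) load cost the first time it actually generates audio. A sketch using `functools.lru_cache` as the memoizer; the AudioCraft call in the comment is illustrative:

```python
from functools import lru_cache


@lru_cache(maxsize=None)
def get_music_model(name: str = "facebook/musicgen-small"):
    """Load the model on first use and reuse the same instance afterwards."""
    # Real code would do something like:
    #   from audiocraft.models import MusicGen
    #   return MusicGen.get_pretrained(name)
    return object()  # placeholder standing in for the loaded model
```

Because `lru_cache` keys on the arguments, switching between MusicGen variants (a listed future enhancement) keeps one cached instance per model name.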

## Future Enhancements

- User authentication & authorization
- Rate limiting
- WebSocket for real-time updates
- Advanced post-processing (reverb, delay, etc.)
- Multiple model support (switch between MusicGen variants)
- Batch generation
- Playlist creation
- Social features (sharing, likes)