# AudioForge Architecture
## Overview
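AudioForge is a production-ready, open-source music generation platform inspired by Suno. It generates music from text descriptions through a six-stage pipeline: prompt understanding, music generation, optional vocal generation, mixing, mastering, and metadata storage.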
## System Architecture
```
┌─────────────┐
│  Frontend   │  (Next.js + React)
│  Port 3000  │
└──────┬──────┘
       │ HTTP/REST
┌──────▼──────┐
│   Backend   │  (FastAPI)
│  Port 8000  │
└──────┬──────┘
       │
       ├──► PostgreSQL (Metadata Storage)
       ├──► Redis (Caching)
       └──► Storage (Audio Files)
```
## Generation Pipeline
### Stage 1: Prompt Understanding
- **Service**: `PromptUnderstandingService`
- **Purpose**: Analyze user prompt to extract:
  - Musical style/genre
  - Tempo/BPM
  - Mood
  - Instrumentation hints
  - Lyrics (if provided)
  - Duration preferences
- **Output**: Enriched prompt with metadata
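
To make the extraction step concrete, here is a minimal keyword-based sketch of what Stage 1 could look like. It is not the actual `PromptUnderstandingService`: the `PromptAnalysis` container, the `analyze_prompt` and `find_terms` functions, and the small vocabularies are all assumptions made for illustration, and lyrics and duration extraction are left out.

```python
import re
from dataclasses import dataclass, field
from typing import Optional

# Illustrative vocabularies only (assumption); not taken from AudioForge.
GENRES = ("rock", "jazz", "techno", "ambient", "hip hop")
MOODS = ("happy", "sad", "calm", "energetic", "dark")
INSTRUMENTS = ("piano", "guitar", "drums", "violin", "synth")


@dataclass
class PromptAnalysis:
    """Hypothetical container for the metadata Stage 1 extracts."""
    prompt: str
    genre: Optional[str] = None
    bpm: Optional[int] = None
    mood: Optional[str] = None
    instruments: list = field(default_factory=list)


def find_terms(text: str, vocabulary: tuple) -> list:
    """Return every vocabulary term that appears in text as a whole word."""
    return [t for t in vocabulary if re.search(rf"\b{re.escape(t)}\b", text)]


def analyze_prompt(prompt: str) -> PromptAnalysis:
    """Extract genre, tempo, mood, and instruments with simple keyword rules."""
    text = prompt.lower()
    genres = find_terms(text, GENRES)
    moods = find_terms(text, MOODS)
    # Matches tempo hints such as "90 bpm" or "120bpm".
    bpm_match = re.search(r"\b(\d{2,3})\s*bpm\b", text)
    return PromptAnalysis(
        prompt=prompt,
        genre=genres[0] if genres else None,
        bpm=int(bpm_match.group(1)) if bpm_match else None,
        mood=moods[0] if moods else None,
        instruments=find_terms(text, INSTRUMENTS),
    )
```

For example, `analyze_prompt("A calm jazz tune at 90 BPM with piano and drums")` yields genre `jazz`, BPM `90`, mood `calm`, and instruments `["piano", "drums"]`. A real service would likely replace the keyword rules with a language model or a larger taxonomy.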
### Stage 2: Music Generation
- **Service**: `MusicGenerationService`
- **Model**: Meta MusicGen (via AudioCraft)
- **Purpose**: Generate instrumental music track
- **Output**: WAV file with instrumental track
### Stage 3: Vocal Generation (Optional)
- **Service**: `VocalGenerationService`
- **Model**: Bark or XTTS
- **Purpose**: Generate vocals from lyrics
- **Output**: WAV file with vocals
### Stage 4: Mixing
- **Service**: `PostProcessingService`
- **Purpose**: Mix instrumental and vocal tracks
- **Output**: Mixed audio file
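
As a rough illustration of the mixing step, the sketch below sums two mono tracks and rescales the result if it would clip. It uses plain Python lists to stay dependency-free; the function name `mix_tracks` and the `vocal_gain` parameter are hypothetical, and the real `PostProcessingService` would more plausibly operate on NumPy arrays loaded via soundfile or librosa.

```python
def mix_tracks(instrumental, vocals, vocal_gain=0.8):
    """Sum two mono tracks sample by sample, padding the shorter with silence.

    Both inputs are sequences of float samples in [-1.0, 1.0] and are assumed
    to share the same sample rate.
    """
    length = max(len(instrumental), len(vocals))
    inst = list(instrumental) + [0.0] * (length - len(instrumental))
    vox = list(vocals) + [0.0] * (length - len(vocals))
    mixed = [i + vocal_gain * v for i, v in zip(inst, vox)]

    # Scale the whole mix down if the summed signal would exceed full scale.
    peak = max((abs(s) for s in mixed), default=0.0)
    if peak > 1.0:
        mixed = [s / peak for s in mixed]
    return mixed
```

Padding with silence keeps the full length of the longer track, which matters when the vocals end before the instrumental does.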
### Stage 5: Post-Processing/Mastering
- **Service**: `PostProcessingService`
- **Purpose**: Apply compression, EQ, normalization
- **Output**: Final mastered audio file
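
Of the three mastering operations listed, normalization is the simplest to sketch. The example below performs peak normalization to a target level in dBFS; the function name `normalize_peak` and the default target of -1 dBFS are assumptions, and compression and EQ are not shown.

```python
def normalize_peak(samples, target_dbfs=-1.0):
    """Scale samples so the loudest one sits at target_dbfs (0 dBFS = full scale)."""
    peak = max((abs(s) for s in samples), default=0.0)
    if peak == 0.0:
        return list(samples)  # silence or empty input: nothing to scale
    # Convert the decibel target to a linear amplitude: 10^(dB / 20).
    target_linear = 10 ** (target_dbfs / 20)
    gain = target_linear / peak
    return [s * gain for s in samples]
```

Peak normalization only guarantees headroom; matching perceived loudness across tracks would need a loudness measure such as LUFS instead.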
### Stage 6: Metadata Storage
- **Service**: Database layer
- **Purpose**: Store generation metadata, paths, status
- **Output**: Database record
## Technology Stack
### Backend
- **Framework**: FastAPI (async Python)
- **Database**: PostgreSQL with SQLAlchemy async
- **Caching**: Redis
- **ML Framework**: PyTorch
- **Music Models**:
  - MusicGen (Meta AudioCraft)
  - Bark or XTTS (for vocals)
- **Audio Processing**: librosa, soundfile, scipy
### Frontend
- **Framework**: Next.js 14+ (App Router)
- **Language**: TypeScript (strict mode)
- **Styling**: Tailwind CSS
- **UI Components**: Radix UI primitives
- **State Management**: React Query + Zustand
- **Forms**: React Hook Form + Zod
### Observability
- **Logging**: structlog (structured JSON logs)
- **Metrics**: Prometheus
- **Tracing**: OpenTelemetry (optional)
## Data Flow
1. User submits prompt via frontend
2. Frontend sends POST to `/api/v1/generations`
3. Backend creates generation record (status: pending)
4. Background task starts processing
5. Pipeline executes stages 1-6
6. Frontend polls `/api/v1/generations/{id}` for status
7. On completion, audio available at `/api/v1/generations/{id}/audio`
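
The polling in step 6 can be sketched as a small loop. The actual frontend is written in TypeScript; this Python version only illustrates the logic. The `fetch_status` callable stands in for a `GET /api/v1/generations/{id}` request, and the assumption that the response is a JSON object with a `status` field comes from the schema's status values, not from a documented response format.

```python
import time
from typing import Callable

# Statuses after which polling should stop (taken from the schema's status values).
TERMINAL_STATUSES = {"completed", "failed"}


def wait_for_generation(
    fetch_status: Callable[[], dict],
    interval_seconds: float = 2.0,
    timeout_seconds: float = 600.0,
    sleep: Callable[[float], None] = time.sleep,
) -> dict:
    """Poll until the generation reaches a terminal status or the timeout expires."""
    deadline = time.monotonic() + timeout_seconds
    while True:
        record = fetch_status()
        if record["status"] in TERMINAL_STATUSES:
            return record
        if time.monotonic() >= deadline:
            raise TimeoutError("generation did not finish before the timeout")
        sleep(interval_seconds)
```

Passing `sleep` as a parameter keeps the loop testable without real delays. The caller still has to check whether the returned status is `completed` or `failed` before requesting the audio.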
## Database Schema
### Generations Table
- `id`: UUID (primary key)
- `prompt`: Text (user input)
- `lyrics`: Text (optional)
- `style`: String (extracted style)
- `duration`: Integer (seconds)
- `status`: String (pending/processing/completed/failed)
- `audio_path`: String (final audio file path)
- `instrumental_path`: String (instrumental track path)
- `vocal_path`: String (vocal track path, if applicable)
- `metadata`: JSON (analysis results, etc.)
- `created_at`, `updated_at`, `completed_at`: Timestamps
- `error_message`: Text (if failed)
- `processing_time_seconds`: Float
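
The table can be mirrored as a plain dataclass to show how the fields relate to the status lifecycle. This is a sketch, not the project's SQLAlchemy model: which columns are nullable, the default values, and the `mark_completed` / `mark_failed` helpers are assumptions. One point worth knowing if this is mapped with SQLAlchemy's Declarative API: `metadata` is a reserved attribute name there, so the Python attribute usually needs a different name mapped onto the `metadata` column.

```python
import uuid
from dataclasses import dataclass, field
from datetime import datetime, timezone
from enum import Enum
from typing import Optional


class GenerationStatus(str, Enum):
    PENDING = "pending"
    PROCESSING = "processing"
    COMPLETED = "completed"
    FAILED = "failed"


def utc_now() -> datetime:
    return datetime.now(timezone.utc)


@dataclass
class Generation:
    prompt: str
    id: uuid.UUID = field(default_factory=uuid.uuid4)
    lyrics: Optional[str] = None
    style: Optional[str] = None
    duration: Optional[int] = None  # seconds
    status: GenerationStatus = GenerationStatus.PENDING
    audio_path: Optional[str] = None
    instrumental_path: Optional[str] = None
    vocal_path: Optional[str] = None
    metadata: dict = field(default_factory=dict)
    created_at: datetime = field(default_factory=utc_now)
    updated_at: datetime = field(default_factory=utc_now)
    completed_at: Optional[datetime] = None
    error_message: Optional[str] = None
    processing_time_seconds: Optional[float] = None

    def mark_completed(self, audio_path: str, processing_time_seconds: float) -> None:
        """Record a successful run: final path, timing, and completion time."""
        self.status = GenerationStatus.COMPLETED
        self.audio_path = audio_path
        self.processing_time_seconds = processing_time_seconds
        self.completed_at = self.updated_at = utc_now()

    def mark_failed(self, error_message: str) -> None:
        """Record a failed run; completed_at stays unset."""
        self.status = GenerationStatus.FAILED
        self.error_message = error_message
        self.updated_at = utc_now()
```

Because `GenerationStatus` subclasses `str`, its members compare equal to the raw strings stored in the `status` column.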
## API Endpoints
### Generations
- `POST /api/v1/generations` - Create generation
- `GET /api/v1/generations/{id}` - Get generation status
- `GET /api/v1/generations/{id}/audio` - Download audio
- `GET /api/v1/generations` - List generations (paginated)
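
The list endpoint is described as paginated, but the pagination scheme is not specified here. Assuming 1-based `page` and `page_size` query parameters (hypothetical names), the conversion to a database offset and limit could look like this:

```python
def page_to_offset(page: int, page_size: int = 20, max_page_size: int = 100) -> tuple:
    """Convert a 1-based page number into (offset, limit) for a database query."""
    if page < 1:
        raise ValueError("page must be >= 1")
    if not 1 <= page_size <= max_page_size:
        raise ValueError(f"page_size must be between 1 and {max_page_size}")
    return (page - 1) * page_size, page_size


def total_pages(total_items: int, page_size: int) -> int:
    """Number of pages needed to show total_items (ceiling division)."""
    return -(-total_items // page_size)
```

Capping `page_size` keeps a single request from pulling the whole table. Cursor-based pagination is a common alternative when rows are inserted frequently.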
## Configuration
All configuration is supplied via environment variables (see `.env.example`):
- Database connection
- Redis connection
- Model paths and devices (CPU/CUDA)
- Storage paths
- Logging levels
- Feature flags
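
A minimal sketch of loading such settings with the standard library follows. The variable names (`DATABASE_URL`, `REDIS_URL`, `MODEL_DEVICE`, `STORAGE_PATH`, `LOG_LEVEL`, `ENABLE_VOCALS`) and every default value are placeholders chosen for illustration; the authoritative names are whatever `.env.example` defines.

```python
import os
from dataclasses import dataclass
from typing import Mapping

TRUE_VALUES = {"1", "true", "yes", "on"}


@dataclass(frozen=True)
class Settings:
    database_url: str
    redis_url: str
    model_device: str
    storage_path: str
    log_level: str
    enable_vocals: bool


def load_settings(env: Mapping[str, str] = os.environ) -> Settings:
    """Build Settings from an environment mapping (all names are hypothetical)."""
    device = env.get("MODEL_DEVICE", "cpu").lower()
    if device not in {"cpu", "cuda"}:
        raise ValueError(f"MODEL_DEVICE must be 'cpu' or 'cuda', got {device!r}")
    return Settings(
        # Placeholder defaults for local development only.
        database_url=env.get("DATABASE_URL", "postgresql://localhost/audioforge"),
        redis_url=env.get("REDIS_URL", "redis://localhost:6379/0"),
        model_device=device,
        storage_path=env.get("STORAGE_PATH", "./storage"),
        log_level=env.get("LOG_LEVEL", "INFO").upper(),
        # Feature flag: parse common truthy spellings.
        enable_vocals=env.get("ENABLE_VOCALS", "true").strip().lower() in TRUE_VALUES,
    )
```

Validating values once at startup means a typo such as `MODEL_DEVICE=gpu` fails immediately rather than midway through a generation.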
## Scalability Considerations
- **Horizontal Scaling**: The API is stateless, so multiple instances can run in parallel
- **Queue System**: Background tasks can be moved to Celery/RQ
- **Model Serving**: Models can be served separately via TorchServe
- **Storage**: Audio files can be stored in S3/object storage
- **Caching**: Redis caches prompt analysis results
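
To illustrate the caching point, the sketch below derives a stable cache key from a normalized prompt and wraps the analysis call in a get-or-compute helper. The key format, the function names, and the use of a plain mapping in place of a Redis client are all assumptions; a real Redis client would use its own get/set calls and probably an expiry time.

```python
import hashlib
import json
from typing import Callable, MutableMapping


def prompt_cache_key(prompt: str, namespace: str = "prompt_analysis:v1") -> str:
    """Hash a whitespace- and case-normalized prompt into a fixed-length key."""
    normalized = " ".join(prompt.lower().split())
    digest = hashlib.sha256(normalized.encode("utf-8")).hexdigest()
    return f"{namespace}:{digest}"


def cached_analysis(
    prompt: str,
    cache: MutableMapping[str, str],
    analyze: Callable[[str], dict],
) -> dict:
    """Return the cached analysis if present; otherwise compute and store it."""
    key = prompt_cache_key(prompt)
    hit = cache.get(key)
    if hit is not None:
        return json.loads(hit)
    result = analyze(prompt)
    # Stored as JSON text, matching how values are typically kept in Redis.
    cache[key] = json.dumps(result)
    return result
```

Including a version in the namespace makes it easy to invalidate old entries when the analysis logic changes.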
## Security
- Input validation via Pydantic schemas
- SQL injection prevention via SQLAlchemy ORM
- CORS configuration
- Rate limiting (to be added)
- Authentication (to be added)
## Performance Optimizations
- Async/await throughout
- Model lazy loading
- Background task processing
- Connection pooling (database, Redis)
- Audio file streaming
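
Lazy model loading can be sketched as a small thread-safe wrapper that defers an expensive load until first use. The `LazyModel` class is hypothetical and the loader is an arbitrary callable; in AudioForge it would presumably wrap whatever call loads MusicGen or Bark onto the configured device.

```python
import threading
from typing import Callable, Generic, Optional, TypeVar

T = TypeVar("T")


class LazyModel(Generic[T]):
    """Load a model on first access and reuse it afterwards."""

    def __init__(self, loader: Callable[[], T]) -> None:
        self._loader = loader
        self._model: Optional[T] = None
        self._lock = threading.Lock()

    def get(self) -> T:
        # Double-checked locking: skip the lock once the model is loaded,
        # and make sure concurrent first calls trigger only one load.
        if self._model is None:
            with self._lock:
                if self._model is None:
                    self._model = self._loader()
        return self._model
```

This keeps API startup fast and avoids holding model memory in processes that never run a generation. Note that a loader returning `None` would be called again on every access.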
## Future Enhancements
- User authentication & authorization
- Rate limiting
- WebSocket for real-time updates
- Advanced post-processing (reverb, delay, etc.)
- Multiple model support (switch between MusicGen variants)
- Batch generation
- Playlist creation
- Social features (sharing, likes)