# AudioForge Architecture

## Overview

AudioForge is a production-ready, open-source music generation platform inspired by Suno. It uses a multi-stage pipeline to generate music from text descriptions.

## System Architecture

```
┌──────────────┐
│   Frontend   │ (Next.js + React)
│  Port 3000   │
└──────┬───────┘
       │ HTTP/REST
       ▼
┌──────────────┐
│   Backend    │ (FastAPI)
│  Port 8000   │
└──────┬───────┘
       │
       ├──► PostgreSQL (Metadata Storage)
       ├──► Redis (Caching)
       └──► Storage (Audio Files)
```

## Generation Pipeline

### Stage 1: Prompt Understanding
- **Service**: `PromptUnderstandingService`
- **Purpose**: Analyze user prompt to extract:
  - Musical style/genre
  - Tempo/BPM
  - Mood
  - Instrumentation hints
  - Lyrics (if provided)
  - Duration preferences
- **Output**: Enriched prompt with metadata
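
The extraction step above can be sketched as a function from free text to structured metadata. This is a toy keyword-based version for illustration only — the names `PromptAnalysis` and `analyze_prompt`, and the tiny vocabularies, are assumptions, not the real `PromptUnderstandingService` API:

```python
from __future__ import annotations

import re
from dataclasses import dataclass, field


@dataclass
class PromptAnalysis:
    """Metadata extracted from a user prompt (Stage 1 output)."""
    genre: str | None = None
    bpm: int | None = None
    mood: str | None = None
    instruments: list[str] = field(default_factory=list)


# Toy vocabularies; the real service would use a far richer analysis.
_GENRES = {"jazz", "rock", "lofi", "techno", "ambient"}
_MOODS = {"upbeat", "melancholic", "calm", "energetic"}
_INSTRUMENTS = {"piano", "guitar", "drums", "saxophone", "synth"}


def analyze_prompt(prompt: str) -> PromptAnalysis:
    """Extract style/tempo/mood/instrumentation hints from free text."""
    words = set(re.findall(r"[a-z]+", prompt.lower()))
    bpm_match = re.search(r"(\d{2,3})\s*bpm", prompt.lower())
    return PromptAnalysis(
        genre=next(iter(words & _GENRES), None),
        bpm=int(bpm_match.group(1)) if bpm_match else None,
        mood=next(iter(words & _MOODS), None),
        instruments=sorted(words & _INSTRUMENTS),
    )
```

The enriched result is what downstream stages consume, so even a failed extraction (all fields `None`) still yields a well-formed object.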

### Stage 2: Music Generation
- **Service**: `MusicGenerationService`
- **Model**: Meta MusicGen (via AudioCraft)
- **Purpose**: Generate instrumental music track
- **Output**: WAV file with instrumental track

### Stage 3: Vocal Generation (Optional)
- **Service**: `VocalGenerationService`
- **Model**: Bark or XTTS
- **Purpose**: Generate vocals from lyrics
- **Output**: WAV file with vocals

### Stage 4: Mixing
- **Service**: `PostProcessingService`
- **Purpose**: Mix instrumental and vocal tracks
- **Output**: Mixed audio file

### Stage 5: Post-Processing/Mastering
- **Service**: `PostProcessingService`
- **Purpose**: Apply compression, EQ, normalization
- **Output**: Final mastered audio file

### Stage 6: Metadata Storage
- **Service**: Database layer
- **Purpose**: Store generation metadata, paths, status
- **Output**: Database record
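
The six stages above compose into a linear pipeline. A minimal orchestration sketch, with stub stages standing in for the real services (the `GenerationContext` shape and status values mirror the schema below, but the function names are assumptions):

```python
from __future__ import annotations

from dataclasses import dataclass, field
from typing import Callable


@dataclass
class GenerationContext:
    """Mutable state threaded through the pipeline stages."""
    prompt: str
    lyrics: str | None = None
    status: str = "pending"
    artifacts: dict = field(default_factory=dict)


def run_pipeline(ctx: GenerationContext,
                 stages: list[Callable[[GenerationContext], None]]) -> GenerationContext:
    """Run stages in order; mark the record failed on the first exception."""
    ctx.status = "processing"
    try:
        for stage in stages:
            stage(ctx)
        ctx.status = "completed"
    except Exception as exc:
        ctx.status = "failed"
        ctx.artifacts["error_message"] = str(exc)
    return ctx


# Stub stages standing in for the real services.
def understand(ctx):
    ctx.artifacts["analysis"] = {"genre": "demo"}

def generate_music(ctx):
    ctx.artifacts["instrumental_path"] = "/tmp/inst.wav"

def mix_and_master(ctx):
    ctx.artifacts["audio_path"] = "/tmp/final.wav"
```

Keeping the stages as plain callables makes it easy to skip Stage 3 when no lyrics are provided, or to move the whole loop into a Celery task later.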

## Technology Stack

### Backend
- **Framework**: FastAPI (async Python)
- **Database**: PostgreSQL with async SQLAlchemy
- **Caching**: Redis
- **ML Framework**: PyTorch
- **Music Models**: 
  - MusicGen (Meta AudioCraft)
  - Bark (for vocals)
- **Audio Processing**: librosa, soundfile, scipy

### Frontend
- **Framework**: Next.js 14+ (App Router)
- **Language**: TypeScript (strict mode)
- **Styling**: Tailwind CSS
- **UI Components**: Radix UI primitives
- **State Management**: React Query + Zustand
- **Forms**: React Hook Form + Zod

### Observability
- **Logging**: structlog (structured JSON logs)
- **Metrics**: Prometheus
- **Tracing**: OpenTelemetry (optional)
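
Structured JSON logs keep log lines machine-parseable. A minimal stdlib stand-in for the structlog JSON renderer (the field names here are illustrative, not the project's actual log schema):

```python
import json
import logging


class JsonFormatter(logging.Formatter):
    """Render each log record as a single JSON object per line."""

    def format(self, record: logging.LogRecord) -> str:
        return json.dumps({
            "level": record.levelname.lower(),
            "logger": record.name,
            "event": record.getMessage(),
        })
```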

## Data Flow

1. User submits prompt via frontend
2. Frontend sends POST to `/api/v1/generations`
3. Backend creates generation record (status: pending)
4. Background task starts processing
5. Pipeline executes stages 1-6
6. Frontend polls `/api/v1/generations/{id}` for status
7. On completion, the audio is available at `/api/v1/generations/{id}/audio`
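
Step 6, the polling loop, can be sketched as below. The fetch function is injected so the same helper works with any HTTP client; `poll_generation` and its parameters are assumptions for illustration, not part of the codebase:

```python
import time
from typing import Callable

TERMINAL = {"completed", "failed"}


def poll_generation(fetch_status: Callable[[], dict],
                    interval: float = 2.0,
                    timeout: float = 600.0,
                    sleep=time.sleep) -> dict:
    """Call GET /api/v1/generations/{id} (via `fetch_status`) until the
    record reaches a terminal status or the timeout elapses."""
    waited = 0.0
    while True:
        record = fetch_status()
        if record.get("status") in TERMINAL:
            return record
        if waited >= timeout:
            raise TimeoutError("generation did not finish in time")
        sleep(interval)
        waited += interval
```

Injecting `sleep` also makes the loop trivially testable without real delays; a WebSocket channel (listed under Future Enhancements) would replace this polling entirely.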

## Database Schema

### Generations Table
- `id`: UUID (primary key)
- `prompt`: Text (user input)
- `lyrics`: Text (optional)
- `style`: String (extracted style)
- `duration`: Integer (seconds)
- `status`: String (pending/processing/completed/failed)
- `audio_path`: String (final audio file path)
- `instrumental_path`: String (instrumental track path)
- `vocal_path`: String (vocal track path, if applicable)
- `metadata`: JSON (analysis results, etc.)
- `created_at`, `updated_at`, `completed_at`: Timestamps
- `error_message`: Text (if failed)
- `processing_time_seconds`: Float
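
The record shape can be mirrored as a plain dataclass — note this is an illustration of the columns above, not the actual async SQLAlchemy model:

```python
from __future__ import annotations

import uuid
from dataclasses import dataclass, field
from datetime import datetime, timezone
from typing import Any, Optional


@dataclass
class Generation:
    """Plain-Python mirror of the `generations` table."""
    prompt: str
    lyrics: Optional[str] = None
    style: Optional[str] = None
    duration: Optional[int] = None             # seconds
    status: str = "pending"                    # pending/processing/completed/failed
    audio_path: Optional[str] = None
    instrumental_path: Optional[str] = None
    vocal_path: Optional[str] = None
    metadata: dict[str, Any] = field(default_factory=dict)
    error_message: Optional[str] = None
    processing_time_seconds: Optional[float] = None
    id: uuid.UUID = field(default_factory=uuid.uuid4)
    created_at: datetime = field(default_factory=lambda: datetime.now(timezone.utc))
    updated_at: Optional[datetime] = None
    completed_at: Optional[datetime] = None
```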

## API Endpoints

### Generations
- `POST /api/v1/generations` - Create generation
- `GET /api/v1/generations/{id}` - Get generation status
- `GET /api/v1/generations/{id}/audio` - Download audio
- `GET /api/v1/generations` - List generations (paginated)
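
A client-side sketch of the create call. The request field names are assumptions inferred from the schema above — check the Pydantic schemas for the authoritative contract:

```python
from __future__ import annotations

import json


def build_generation_request(prompt: str, lyrics: str | None = None,
                             duration: int = 30) -> dict:
    """Assemble the JSON body for POST /api/v1/generations."""
    body = {"prompt": prompt, "duration": duration}
    if lyrics is not None:
        body["lyrics"] = lyrics
    return body


# Sending it with the stdlib (any HTTP client works equally well):
# import urllib.request
# req = urllib.request.Request(
#     "http://localhost:8000/api/v1/generations",
#     data=json.dumps(build_generation_request("dreamy synthwave")).encode(),
#     headers={"Content-Type": "application/json"},
# )
# with urllib.request.urlopen(req) as resp:
#     generation = json.loads(resp.read())  # contains the new record's `id`
```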

## Configuration

All configuration via environment variables (see `.env.example`):

- Database connection
- Redis connection
- Model paths and devices (CPU/CUDA)
- Storage paths
- Logging levels
- Feature flags
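
A minimal sketch of environment-driven settings, assuming variable names like `DATABASE_URL` and `MODEL_DEVICE` (see `.env.example` for the real ones). A FastAPI app would typically use pydantic-settings for this; plain `os.environ` is used here to keep the sketch dependency-free:

```python
import os
from dataclasses import dataclass


@dataclass(frozen=True)
class Settings:
    """Illustrative subset of the environment-driven configuration."""
    database_url: str
    redis_url: str
    model_device: str   # "cpu" or "cuda"
    storage_path: str
    log_level: str


def load_settings(env=os.environ) -> Settings:
    return Settings(
        database_url=env.get("DATABASE_URL", "postgresql+asyncpg://localhost/audioforge"),
        redis_url=env.get("REDIS_URL", "redis://localhost:6379/0"),
        model_device=env.get("MODEL_DEVICE", "cpu"),
        storage_path=env.get("STORAGE_PATH", "./storage"),
        log_level=env.get("LOG_LEVEL", "INFO"),
    )
```

Passing `env` explicitly keeps the loader testable and lets feature-flagged code read a frozen snapshot instead of the live environment.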

## Scalability Considerations

- **Horizontal Scaling**: Stateless API, can run multiple instances
- **Queue System**: Background tasks can be moved to Celery/RQ
- **Model Serving**: Models can be served separately via TorchServe
- **Storage**: Audio files can be stored in S3/object storage
- **Caching**: Redis caches prompt analysis results
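
The caching point above is a classic cache-aside pattern: hash the prompt, look it up, and only run the analysis on a miss. A sketch with a plain dict standing in for the Redis client (the key scheme and function names are assumptions):

```python
import hashlib
import json
from typing import Callable


def cached_analysis(prompt: str, cache: dict,
                    analyze: Callable[[str], dict]) -> dict:
    """Cache-aside: reuse a prior analysis for an identical prompt.
    `cache` stands in for a Redis client; values are stored as JSON."""
    key = "analysis:" + hashlib.sha256(prompt.encode()).hexdigest()
    if key in cache:
        return json.loads(cache[key])
    result = analyze(prompt)
    cache[key] = json.dumps(result)  # with Redis: set an expiry, e.g. ex=3600
    return result
```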

## Security

- Input validation via Pydantic schemas
- SQL injection prevention via SQLAlchemy ORM
- CORS configuration
- Rate limiting (to be added)
- Authentication (to be added)

## Performance Optimizations

- Async/await throughout
- Model lazy loading
- Background task processing
- Connection pooling (database, Redis)
- Audio file streaming
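
Model lazy loading, mentioned above, means a worker only pays the (slow, memory-heavy) load cost the first time it actually generates audio. A sketch using `functools.lru_cache` as the memoizer; the AudioCraft call in the comment is illustrative:

```python
from functools import lru_cache


@lru_cache(maxsize=None)
def get_music_model(name: str = "facebook/musicgen-small"):
    """Load the model on first use and reuse the same instance afterwards."""
    # Real code would do something like:
    #   from audiocraft.models import MusicGen
    #   return MusicGen.get_pretrained(name)
    return object()  # placeholder standing in for the loaded model
```

Because `lru_cache` keys on the arguments, switching between MusicGen variants (a listed future enhancement) keeps one cached instance per model name.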

## Future Enhancements

- User authentication & authorization
- Rate limiting
- WebSocket for real-time updates
- Advanced post-processing (reverb, delay, etc.)
- Multiple model support (switch between MusicGen variants)
- Batch generation
- Playlist creation
- Social features (sharing, likes)