AudioForge Architecture

Overview

AudioForge is a production-ready, open-source music generation platform inspired by Suno. It uses a multi-stage pipeline to generate music from text descriptions.

System Architecture

β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”
β”‚   Frontend   β”‚ (Next.js + React)
β”‚  Port 3000   β”‚
β””β”€β”€β”€β”€β”€β”€β”¬β”€β”€β”€β”€β”€β”€β”€β”˜
       β”‚ HTTP/REST
       β–Ό
β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”
β”‚   Backend    β”‚ (FastAPI)
β”‚  Port 8000   β”‚
β””β”€β”€β”€β”€β”€β”€β”¬β”€β”€β”€β”€β”€β”€β”€β”˜
       β”‚
       β”œβ”€β”€β–Ί PostgreSQL (Metadata Storage)
       β”œβ”€β”€β–Ί Redis (Caching)
       └──► Storage (Audio Files)

Generation Pipeline

Stage 1: Prompt Understanding

  • Service: PromptUnderstandingService
  • Purpose: Analyze user prompt to extract:
    • Musical style/genre
    • Tempo/BPM
    • Mood
    • Instrumentation hints
    • Lyrics (if provided)
    • Duration preferences
  • Output: Enriched prompt with metadata
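
The kind of extraction Stage 1 performs can be sketched with simple keyword and regex heuristics. This is illustrative only: the keyword sets, function name, and output keys are assumptions, not the actual PromptUnderstandingService API.

```python
import re

# Hypothetical keyword sets; the real service's vocabulary is much larger.
STYLE_KEYWORDS = {"jazz", "rock", "lofi", "classical", "edm", "ambient"}
MOOD_KEYWORDS = {"happy", "sad", "dark", "uplifting", "calm", "energetic"}

def analyze_prompt(prompt: str) -> dict:
    """Extract style, mood, and BPM hints from a free-text prompt."""
    words = set(re.findall(r"[a-z-]+", prompt.lower()))
    bpm_match = re.search(r"(\d{2,3})\s*bpm", prompt.lower())
    return {
        "style": next(iter(words & STYLE_KEYWORDS), None),
        "mood": next(iter(words & MOOD_KEYWORDS), None),
        "bpm": int(bpm_match.group(1)) if bpm_match else None,
    }
```

For example, `analyze_prompt("an uplifting lofi beat at 85 bpm")` yields a style of "lofi", a mood of "uplifting", and a BPM of 85, which downstream stages can use as generation parameters.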

Stage 2: Music Generation

  • Service: MusicGenerationService
  • Model: Meta MusicGen (via AudioCraft)
  • Purpose: Generate instrumental music track
  • Output: WAV file with instrumental track

Stage 3: Vocal Generation (Optional)

  • Service: VocalGenerationService
  • Model: Bark or XTTS
  • Purpose: Generate vocals from lyrics
  • Output: WAV file with vocals

Stage 4: Mixing

  • Service: PostProcessingService
  • Purpose: Mix instrumental and vocal tracks
  • Output: Mixed audio file
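
At its core, mixing is a gain-weighted sum of the two tracks. A minimal sketch on plain sample lists (the real service works on WAV files via the audio libraries listed below; the function name and default gains are illustrative):

```python
def mix_tracks(instrumental, vocals, inst_gain=0.8, vocal_gain=1.0):
    """Mix two mono sample sequences; the shorter is padded with silence."""
    n = max(len(instrumental), len(vocals))
    inst = list(instrumental) + [0.0] * (n - len(instrumental))
    voc = list(vocals) + [0.0] * (n - len(vocals))
    mixed = [i * inst_gain + v * vocal_gain for i, v in zip(inst, voc)]
    # Clamp to [-1.0, 1.0] so the sum cannot clip when written to disk.
    return [max(-1.0, min(1.0, s)) for s in mixed]
```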

Stage 5: Post-Processing/Mastering

  • Service: PostProcessingService
  • Purpose: Apply compression, EQ, normalization
  • Output: Final mastered audio file
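
Of these steps, peak normalization is the simplest to illustrate: scale the signal so its loudest sample hits a target level. A dependency-free sketch (compression and EQ in the real service would go through librosa/scipy; the 0.95 target is an assumption):

```python
def peak_normalize(samples, target_peak=0.95):
    """Scale samples so the loudest absolute value equals target_peak."""
    peak = max((abs(s) for s in samples), default=0.0)
    if peak == 0.0:
        return list(samples)  # pure silence: nothing to scale
    scale = target_peak / peak
    return [s * scale for s in samples]
```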

Stage 6: Metadata Storage

  • Service: Database layer
  • Purpose: Store generation metadata, paths, status
  • Output: Database record

Technology Stack

Backend

  • Framework: FastAPI (async Python)
  • Database: PostgreSQL with SQLAlchemy async
  • Caching: Redis
  • ML Framework: PyTorch
  • Music Models:
    • MusicGen (Meta AudioCraft)
    • Bark (for vocals)
  • Audio Processing: librosa, soundfile, scipy

Frontend

  • Framework: Next.js 14+ (App Router)
  • Language: TypeScript (strict mode)
  • Styling: Tailwind CSS
  • UI Components: Radix UI primitives
  • State Management: React Query + Zustand
  • Forms: React Hook Form + Zod

Observability

  • Logging: structlog (structured JSON logs)
  • Metrics: Prometheus
  • Tracing: OpenTelemetry (optional)

Data Flow

  1. User submits prompt via frontend
  2. Frontend sends POST to /api/v1/generations
  3. Backend creates generation record (status: pending)
  4. Background task starts processing
  5. Pipeline executes stages 1-6
  6. Frontend polls /api/v1/generations/{id} for status
  7. On completion, audio available at /api/v1/generations/{id}/audio
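
The polling step (6) can be sketched as a small client loop. Here `fetch_status` is injected so the example stays self-contained; in practice it would be an HTTP GET against /api/v1/generations/{id}:

```python
import time

def poll_until_done(generation_id, fetch_status, interval=1.0, max_attempts=60):
    """Poll a generation until it reaches a terminal status."""
    for _ in range(max_attempts):
        record = fetch_status(generation_id)
        if record["status"] in ("completed", "failed"):
            return record
        time.sleep(interval)
    raise TimeoutError(f"generation {generation_id} did not finish in time")
```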

Database Schema

Generations Table

  • id: UUID (primary key)
  • prompt: Text (user input)
  • lyrics: Text (optional)
  • style: String (extracted style)
  • duration: Integer (seconds)
  • status: String (pending/processing/completed/failed)
  • audio_path: String (final audio file path)
  • instrumental_path: String (instrumental track path)
  • vocal_path: String (vocal track path, if applicable)
  • metadata: JSON (analysis results, etc.)
  • created_at, updated_at, completed_at: Timestamps
  • error_message: Text (if failed)
  • processing_time_seconds: Float

API Endpoints

Generations

  • POST /api/v1/generations - Create generation
  • GET /api/v1/generations/{id} - Get generation status
  • GET /api/v1/generations/{id}/audio - Download audio
  • GET /api/v1/generations - List generations (paginated)
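
A hypothetical request/response exchange (field names follow the schema above; exact payloads may differ):

```
POST /api/v1/generations
{"prompt": "an uplifting lofi beat", "lyrics": null, "duration": 30}
→ 201 {"id": "3fa85f64-...", "status": "pending"}

GET /api/v1/generations/3fa85f64-...
→ 200 {"id": "3fa85f64-...", "status": "processing", "audio_path": null}
```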

Configuration

All configuration is supplied via environment variables (see .env.example):

  • Database connection
  • Redis connection
  • Model paths and devices (CPU/CUDA)
  • Storage paths
  • Logging levels
  • Feature flags
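
Loading these settings can be sketched with plain os.environ lookups and safe defaults. The variable names below are assumptions; .env.example is the authoritative list:

```python
import os

def load_settings(env=os.environ) -> dict:
    """Read configuration from environment variables with defaults."""
    return {
        "database_url": env.get("DATABASE_URL", "postgresql+asyncpg://localhost/audioforge"),
        "redis_url": env.get("REDIS_URL", "redis://localhost:6379/0"),
        "model_device": env.get("MODEL_DEVICE", "cpu"),  # "cpu" or "cuda"
        "storage_path": env.get("STORAGE_PATH", "./storage"),
        "log_level": env.get("LOG_LEVEL", "INFO"),
    }
```

In the real codebase this role would typically be filled by a Pydantic settings class, which adds type coercion and validation on top of the same lookups.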

Scalability Considerations

  • Horizontal Scaling: Stateless API, can run multiple instances
  • Queue System: Background tasks can be moved to Celery/RQ
  • Model Serving: Models can be served separately via TorchServe
  • Storage: Audio files can be stored in S3/object storage
  • Caching: Redis caches prompt analysis results
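
The prompt-analysis cache can be keyed on a hash of the prompt text. A sketch with a dict standing in for the Redis client (the key scheme and TTL handling are illustrative):

```python
import hashlib
import json

def cached_analysis(prompt: str, analyze, cache: dict) -> dict:
    """Return cached analysis for a prompt, computing and storing it on a miss."""
    key = "analysis:" + hashlib.sha256(prompt.encode()).hexdigest()
    if key in cache:
        return json.loads(cache[key])
    result = analyze(prompt)
    cache[key] = json.dumps(result)  # with real Redis: redis.set(key, ..., ex=ttl)
    return result
```

Hashing the prompt keeps keys a fixed length and avoids storing raw user input in key names.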

Security

  • Input validation via Pydantic schemas
  • SQL injection prevention via SQLAlchemy ORM
  • CORS configuration
  • Rate limiting (to be added)
  • Authentication (to be added)
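
The request validation is done with Pydantic schemas; a dependency-free stand-in using a dataclass shows the idea (the field names and the 5-300 second duration bounds are assumptions, not the real schema):

```python
from dataclasses import dataclass

@dataclass
class GenerationRequest:
    """Illustrative stand-in for the Pydantic request schema."""
    prompt: str
    duration: int = 30

    def __post_init__(self):
        if not self.prompt.strip():
            raise ValueError("prompt must not be empty")
        if not 5 <= self.duration <= 300:  # hypothetical bounds
            raise ValueError("duration must be between 5 and 300 seconds")
```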

Performance Optimizations

  • Async/await throughout
  • Model lazy loading
  • Background task processing
  • Connection pooling (database, Redis)
  • Audio file streaming

Future Enhancements

  • User authentication & authorization
  • Rate limiting
  • WebSocket for real-time updates
  • Advanced post-processing (reverb, delay, etc.)
  • Multiple model support (switch between MusicGen variants)
  • Batch generation
  • Playlist creation
  • Social features (sharing, likes)