AudioForge Architecture

Overview

AudioForge is a production-ready, open-source music generation platform inspired by Suno. It uses a multi-stage pipeline to generate music from text descriptions.

System Architecture

β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”
β”‚   Frontend   β”‚ (Next.js + React)
β”‚  Port 3000   β”‚
β””β”€β”€β”€β”€β”€β”€β”¬β”€β”€β”€β”€β”€β”€β”€β”˜
       β”‚ HTTP/REST
       β–Ό
β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”
β”‚   Backend    β”‚ (FastAPI)
β”‚  Port 8000   β”‚
β””β”€β”€β”€β”€β”€β”€β”¬β”€β”€β”€β”€β”€β”€β”€β”˜
       β”‚
       β”œβ”€β”€β–Ί PostgreSQL (Metadata Storage)
       β”œβ”€β”€β–Ί Redis (Caching)
       └──► Storage (Audio Files)

Generation Pipeline

Stage 1: Prompt Understanding

  • Service: PromptUnderstandingService
  • Purpose: Analyze user prompt to extract:
    • Musical style/genre
    • Tempo/BPM
    • Mood
    • Instrumentation hints
    • Lyrics (if provided)
    • Duration preferences
  • Output: Enriched prompt with metadata
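
The kind of extraction Stage 1 performs can be sketched with simple keyword and regex heuristics. This is illustrative only: the keyword sets, function name, and output keys are assumptions, not the actual PromptUnderstandingService API.

```python
import re

# Hypothetical keyword sets; the real service's vocabulary is much larger.
STYLE_KEYWORDS = {"jazz", "rock", "lofi", "classical", "edm", "ambient"}
MOOD_KEYWORDS = {"happy", "sad", "dark", "uplifting", "calm", "energetic"}

def analyze_prompt(prompt: str) -> dict:
    """Extract style, mood, and BPM hints from a free-text prompt."""
    words = set(re.findall(r"[a-z-]+", prompt.lower()))
    bpm_match = re.search(r"(\d{2,3})\s*bpm", prompt.lower())
    return {
        "style": next(iter(words & STYLE_KEYWORDS), None),
        "mood": next(iter(words & MOOD_KEYWORDS), None),
        "bpm": int(bpm_match.group(1)) if bpm_match else None,
    }
```

For example, `analyze_prompt("an uplifting lofi beat at 85 bpm")` yields a style of "lofi", a mood of "uplifting", and a BPM of 85, which downstream stages can use as generation parameters.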

Stage 2: Music Generation

  • Service: MusicGenerationService
  • Model: Meta MusicGen (via AudioCraft)
  • Purpose: Generate instrumental music track
  • Output: WAV file with instrumental track

Stage 3: Vocal Generation (Optional)

  • Service: VocalGenerationService
  • Model: Bark or XTTS
  • Purpose: Generate vocals from lyrics
  • Output: WAV file with vocals

Stage 4: Mixing

  • Service: PostProcessingService
  • Purpose: Mix instrumental and vocal tracks
  • Output: Mixed audio file
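
At its core, mixing is a gain-weighted sum of the two tracks. A minimal sketch on plain sample lists (the real service works on WAV files via the audio libraries listed below; the function name and default gains are illustrative):

```python
def mix_tracks(instrumental, vocals, inst_gain=0.8, vocal_gain=1.0):
    """Mix two mono sample sequences; the shorter is padded with silence."""
    n = max(len(instrumental), len(vocals))
    inst = list(instrumental) + [0.0] * (n - len(instrumental))
    voc = list(vocals) + [0.0] * (n - len(vocals))
    mixed = [i * inst_gain + v * vocal_gain for i, v in zip(inst, voc)]
    # Clamp to [-1.0, 1.0] so the sum cannot clip when written to disk.
    return [max(-1.0, min(1.0, s)) for s in mixed]
```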

Stage 5: Post-Processing/Mastering

  • Service: PostProcessingService
  • Purpose: Apply compression, EQ, normalization
  • Output: Final mastered audio file
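
Of these steps, peak normalization is the simplest to illustrate: scale the signal so its loudest sample hits a target level. A dependency-free sketch (compression and EQ in the real service would go through librosa/scipy; the 0.95 target is an assumption):

```python
def peak_normalize(samples, target_peak=0.95):
    """Scale samples so the loudest absolute value equals target_peak."""
    peak = max((abs(s) for s in samples), default=0.0)
    if peak == 0.0:
        return list(samples)  # pure silence: nothing to scale
    scale = target_peak / peak
    return [s * scale for s in samples]
```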

Stage 6: Metadata Storage

  • Service: Database layer
  • Purpose: Store generation metadata, paths, status
  • Output: Database record

Technology Stack

Backend

  • Framework: FastAPI (async Python)
  • Database: PostgreSQL with SQLAlchemy async
  • Caching: Redis
  • ML Framework: PyTorch
  • Music Models:
    • MusicGen (Meta AudioCraft)
    • Bark (for vocals)
  • Audio Processing: librosa, soundfile, scipy

Frontend

  • Framework: Next.js 14+ (App Router)
  • Language: TypeScript (strict mode)
  • Styling: Tailwind CSS
  • UI Components: Radix UI primitives
  • State Management: React Query + Zustand
  • Forms: React Hook Form + Zod

Observability

  • Logging: structlog (structured JSON logs)
  • Metrics: Prometheus
  • Tracing: OpenTelemetry (optional)

Data Flow

  1. User submits prompt via frontend
  2. Frontend sends POST to /api/v1/generations
  3. Backend creates generation record (status: pending)
  4. Background task starts processing
  5. Pipeline executes stages 1-6
  6. Frontend polls /api/v1/generations/{id} for status
  7. On completion, audio available at /api/v1/generations/{id}/audio
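
The polling step (6) can be sketched as a small client loop. Here `fetch_status` is injected so the example stays self-contained; in practice it would be an HTTP GET against /api/v1/generations/{id}:

```python
import time

def poll_until_done(generation_id, fetch_status, interval=1.0, max_attempts=60):
    """Poll a generation until it reaches a terminal status."""
    for _ in range(max_attempts):
        record = fetch_status(generation_id)
        if record["status"] in ("completed", "failed"):
            return record
        time.sleep(interval)
    raise TimeoutError(f"generation {generation_id} did not finish in time")
```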

Database Schema

Generations Table

  • id: UUID (primary key)
  • prompt: Text (user input)
  • lyrics: Text (optional)
  • style: String (extracted style)
  • duration: Integer (seconds)
  • status: String (pending/processing/completed/failed)
  • audio_path: String (final audio file path)
  • instrumental_path: String (instrumental track path)
  • vocal_path: String (vocal track path, if applicable)
  • metadata: JSON (analysis results, etc.)
  • created_at, updated_at, completed_at: Timestamps
  • error_message: Text (if failed)
  • processing_time_seconds: Float

API Endpoints

Generations

  • POST /api/v1/generations - Create generation
  • GET /api/v1/generations/{id} - Get generation status
  • GET /api/v1/generations/{id}/audio - Download audio
  • GET /api/v1/generations - List generations (paginated)
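
A hypothetical request/response exchange (field names follow the schema above; exact payloads may differ):

```
POST /api/v1/generations
{"prompt": "an uplifting lofi beat", "lyrics": null, "duration": 30}
→ 201 {"id": "3fa85f64-...", "status": "pending"}

GET /api/v1/generations/3fa85f64-...
→ 200 {"id": "3fa85f64-...", "status": "processing", "audio_path": null}
```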

Configuration

All configuration is supplied via environment variables (see .env.example):

  • Database connection
  • Redis connection
  • Model paths and devices (CPU/CUDA)
  • Storage paths
  • Logging levels
  • Feature flags
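
Loading these settings can be sketched with plain os.environ lookups and safe defaults. The variable names below are assumptions; .env.example is the authoritative list:

```python
import os

def load_settings(env=os.environ) -> dict:
    """Read configuration from environment variables with defaults."""
    return {
        "database_url": env.get("DATABASE_URL", "postgresql+asyncpg://localhost/audioforge"),
        "redis_url": env.get("REDIS_URL", "redis://localhost:6379/0"),
        "model_device": env.get("MODEL_DEVICE", "cpu"),  # "cpu" or "cuda"
        "storage_path": env.get("STORAGE_PATH", "./storage"),
        "log_level": env.get("LOG_LEVEL", "INFO"),
    }
```

In the real codebase this role would typically be filled by a Pydantic settings class, which adds type coercion and validation on top of the same lookups.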

Scalability Considerations

  • Horizontal Scaling: Stateless API, can run multiple instances
  • Queue System: Background tasks can be moved to Celery/RQ
  • Model Serving: Models can be served separately via TorchServe
  • Storage: Audio files can be stored in S3/object storage
  • Caching: Redis caches prompt analysis results
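
The prompt-analysis cache can be keyed on a hash of the prompt text. A sketch with a dict standing in for the Redis client (the key scheme and TTL handling are illustrative):

```python
import hashlib
import json

def cached_analysis(prompt: str, analyze, cache: dict) -> dict:
    """Return cached analysis for a prompt, computing and storing it on a miss."""
    key = "analysis:" + hashlib.sha256(prompt.encode()).hexdigest()
    if key in cache:
        return json.loads(cache[key])
    result = analyze(prompt)
    cache[key] = json.dumps(result)  # with real Redis: redis.set(key, ..., ex=ttl)
    return result
```

Hashing the prompt keeps keys a fixed length and avoids storing raw user input in key names.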

Security

  • Input validation via Pydantic schemas
  • SQL injection prevention via SQLAlchemy ORM
  • CORS configuration
  • Rate limiting (to be added)
  • Authentication (to be added)
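
The request validation is done with Pydantic schemas; a dependency-free stand-in using a dataclass shows the idea (the field names and the 5-300 second duration bounds are assumptions, not the real schema):

```python
from dataclasses import dataclass

@dataclass
class GenerationRequest:
    """Illustrative stand-in for the Pydantic request schema."""
    prompt: str
    duration: int = 30

    def __post_init__(self):
        if not self.prompt.strip():
            raise ValueError("prompt must not be empty")
        if not 5 <= self.duration <= 300:  # hypothetical bounds
            raise ValueError("duration must be between 5 and 300 seconds")
```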

Performance Optimizations

  • Async/await throughout
  • Model lazy loading
  • Background task processing
  • Connection pooling (database, Redis)
  • Audio file streaming

Future Enhancements

  • User authentication & authorization
  • Rate limiting
  • WebSocket for real-time updates
  • Advanced post-processing (reverb, delay, etc.)
  • Multiple model support (switch between MusicGen variants)
  • Batch generation
  • Playlist creation
  • Social features (sharing, likes)