# AudioForge Architecture

## Overview

AudioForge is a production-ready, open-source music generation platform inspired by Suno. It uses a multi-stage pipeline to generate music from text descriptions.
## System Architecture

```
┌───────────────┐
│   Frontend    │  (Next.js + React)
│   Port 3000   │
└───────┬───────┘
        │ HTTP/REST
        ▼
┌───────────────┐
│    Backend    │  (FastAPI)
│   Port 8000   │
└───────┬───────┘
        │
        ├──► PostgreSQL (Metadata Storage)
        ├──► Redis      (Caching)
        └──► Storage    (Audio Files)
```
## Generation Pipeline

### Stage 1: Prompt Understanding

- Service: `PromptUnderstandingService`
- Purpose: Analyze the user prompt to extract:
  - Musical style/genre
  - Tempo/BPM
  - Mood
  - Instrumentation hints
  - Lyrics (if provided)
  - Duration preferences
- Output: Enriched prompt with metadata
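As an illustration, the enriched output of Stage 1 might resemble the following dataclass. The field names and the keyword matching are hypothetical stand-ins, not AudioForge's actual schema or analysis logic:

```python
from dataclasses import dataclass, field
from typing import Optional

# Hypothetical sketch of Stage 1 output; field names are illustrative only.
@dataclass
class PromptAnalysis:
    style: Optional[str] = None
    bpm: Optional[int] = None
    mood: Optional[str] = None
    instruments: list = field(default_factory=list)
    lyrics: Optional[str] = None
    duration_seconds: int = 30

def analyze_prompt(prompt: str) -> PromptAnalysis:
    """Toy keyword-based analysis standing in for the real service."""
    text = prompt.lower()
    style = next((g for g in ("jazz", "rock", "lo-fi", "synthpop") if g in text), None)
    mood = next((m for m in ("upbeat", "melancholic", "calm") if m in text), None)
    return PromptAnalysis(style=style, mood=mood)

result = analyze_prompt("An upbeat synthpop track")
```

The real service would likely combine heuristics with an LLM or classifier, but the output shape — a structured record consumed by Stage 2 — is the important part.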
### Stage 2: Music Generation

- Service: `MusicGenerationService`
- Model: Meta MusicGen (via AudioCraft)
- Purpose: Generate the instrumental music track
- Output: WAV file with instrumental track
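Stage 2 can be sketched with AudioCraft's MusicGen API. The checkpoint name, duration, and description below are illustrative choices, not AudioForge's actual configuration:

```python
from audiocraft.models import MusicGen
from audiocraft.data.audio import audio_write

# Load a pretrained checkpoint (the "small" variant fits on modest hardware).
model = MusicGen.get_pretrained("facebook/musicgen-small")
model.set_generation_params(duration=30)  # seconds of audio to generate

# One text description per batch element; returns a tensor of shape [B, C, T].
wav = model.generate(["upbeat synthpop, 120 BPM, bright synth leads"])

# Write the first batch element to disk with loudness normalization.
audio_write("instrumental", wav[0].cpu(), model.sample_rate, strategy="loudness")
```

In the service, the description would come from the Stage 1 analysis and the duration from the user's preference.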
### Stage 3: Vocal Generation (Optional)

- Service: `VocalGenerationService`
- Model: Bark or XTTS
- Purpose: Generate vocals from lyrics
- Output: WAV file with vocals
### Stage 4: Mixing

- Service: `PostProcessingService`
- Purpose: Mix the instrumental and vocal tracks
- Output: Mixed audio file
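The mixing step can be sketched in pure Python (production code would operate on NumPy arrays loaded via soundfile); the gain value is an illustrative assumption:

```python
def mix_tracks(instrumental, vocals, vocal_gain=0.8):
    """Mix two mono sample sequences; pads the shorter track with silence."""
    n = max(len(instrumental), len(vocals))
    inst = list(instrumental) + [0.0] * (n - len(instrumental))
    voc = list(vocals) + [0.0] * (n - len(vocals))
    mixed = [i + vocal_gain * v for i, v in zip(inst, voc)]
    peak = max((abs(s) for s in mixed), default=0.0)
    if peak > 1.0:  # avoid clipping: scale down only when needed
        mixed = [s / peak for s in mixed]
    return mixed

mixed = mix_tracks([0.5, 0.5, 0.5], [0.5, -0.5])
```

Padding rather than truncating keeps the instrumental outro intact when the vocal track ends early.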
### Stage 5: Post-Processing/Mastering

- Service: `PostProcessingService`
- Purpose: Apply compression, EQ, and normalization
- Output: Final mastered audio file
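The normalization part of Stage 5 might look like this peak-normalization helper; the target level is an assumption, and the real service would use librosa/scipy on NumPy arrays:

```python
def normalize_peak(samples, target_dbfs=-1.0):
    """Scale samples so the peak sits at target_dbfs (dB relative to full scale)."""
    peak = max((abs(s) for s in samples), default=0.0)
    if peak == 0.0:
        return list(samples)  # silence: nothing to scale
    target_linear = 10 ** (target_dbfs / 20)  # e.g. -1 dBFS ~= 0.891
    gain = target_linear / peak
    return [s * gain for s in samples]
```

Leaving ~1 dB of headroom below full scale is a common mastering convention that guards against inter-sample clipping on playback.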
### Stage 6: Metadata Storage

- Service: Database layer
- Purpose: Store generation metadata, file paths, and status
- Output: Database record
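Taken together, the six stages suggest an async orchestrator along these lines. The stage functions here are hypothetical stand-ins for the services named above:

```python
import asyncio

# Hypothetical stage functions; each returns the artifact the next stage consumes.
async def understand(prompt):            return {"prompt": prompt, "style": "pop"}
async def generate_music(analysis):      return "instrumental.wav"
async def generate_vocals(analysis):     return "vocals.wav" if analysis.get("lyrics") else None
async def mix(instrumental, vocals):     return "mixed.wav" if vocals else instrumental
async def master(path):                  return "final.wav"

async def run_pipeline(prompt: str) -> str:
    analysis = await understand(prompt)             # Stage 1
    instrumental = await generate_music(analysis)   # Stage 2
    vocals = await generate_vocals(analysis)        # Stage 3 (optional)
    mixed = await mix(instrumental, vocals)         # Stage 4
    final = await master(mixed)                     # Stage 5
    # Stage 6: persist paths/status to the database (omitted in this sketch)
    return final

result = asyncio.run(run_pipeline("calm piano piece"))
```

Each stage awaits the previous one's artifact, so a failure at any point can set the generation record to `failed` with an error message before later stages run.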
## Technology Stack

### Backend

- Framework: FastAPI (async Python)
- Database: PostgreSQL with async SQLAlchemy
- Caching: Redis
- ML Framework: PyTorch
- Music Models:
  - MusicGen (Meta AudioCraft)
  - Bark (for vocals)
- Audio Processing: librosa, soundfile, scipy
### Frontend

- Framework: Next.js 14+ (App Router)
- Language: TypeScript (strict mode)
- Styling: Tailwind CSS
- UI Components: Radix UI primitives
- State Management: React Query + Zustand
- Forms: React Hook Form + Zod
### Observability

- Logging: structlog (structured JSON logs)
- Metrics: Prometheus
- Tracing: OpenTelemetry (optional)
## Data Flow

1. User submits a prompt via the frontend.
2. Frontend sends `POST /api/v1/generations`.
3. Backend creates a generation record (status: `pending`).
4. A background task starts processing.
5. The pipeline executes stages 1-6.
6. Frontend polls `GET /api/v1/generations/{id}` for status.
7. On completion, the audio is available at `GET /api/v1/generations/{id}/audio`.
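The polling in steps 6-7 can be sketched as follows; `fetch_status` is a stand-in for an HTTP GET against `/api/v1/generations/{id}`, and the interval is an illustrative choice:

```python
import time

def poll_generation(fetch_status, generation_id, interval=2.0, max_attempts=150):
    """Poll until the generation reaches a terminal status, or time out."""
    for _ in range(max_attempts):
        record = fetch_status(generation_id)  # stand-in for the HTTP request
        if record["status"] in ("completed", "failed"):
            return record
        time.sleep(interval)
    raise TimeoutError(f"generation {generation_id} did not finish")

# Stubbed status source: pending, processing, then completed.
responses = iter(["pending", "processing", "completed"])
record = poll_generation(lambda _id: {"status": next(responses)}, "abc123", interval=0.0)
```

In the real frontend this loop is handled by React Query's refetch interval rather than hand-written, but the state machine is the same.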
## Database Schema

### Generations Table

- `id`: UUID (primary key)
- `prompt`: Text (user input)
- `lyrics`: Text (optional)
- `style`: String (extracted style)
- `duration`: Integer (seconds)
- `status`: String (pending/processing/completed/failed)
- `audio_path`: String (final audio file path)
- `instrumental_path`: String (instrumental track path)
- `vocal_path`: String (vocal track path, if applicable)
- `metadata`: JSON (analysis results, etc.)
- `created_at`, `updated_at`, `completed_at`: Timestamps
- `error_message`: Text (if failed)
- `processing_time_seconds`: Float
## API Endpoints

### Generations

- `POST /api/v1/generations` - Create a generation
- `GET /api/v1/generations/{id}` - Get generation status
- `GET /api/v1/generations/{id}/audio` - Download audio
- `GET /api/v1/generations` - List generations (paginated)
## Configuration

All configuration is via environment variables (see `.env.example`):

- Database connection
- Redis connection
- Model paths and devices (CPU/CUDA)
- Storage paths
- Logging levels
- Feature flags
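A minimal sketch of environment-driven settings; the variable names (`AUDIOFORGE_*`) and defaults are assumptions, and the real names live in `.env.example`:

```python
import os
from dataclasses import dataclass

# Variable names here are illustrative; consult .env.example for the real ones.
@dataclass(frozen=True)
class Settings:
    database_url: str
    redis_url: str
    model_device: str
    log_level: str

def load_settings(env=None) -> Settings:
    env = os.environ if env is None else env
    return Settings(
        database_url=env.get("AUDIOFORGE_DATABASE_URL", "postgresql://localhost/audioforge"),
        redis_url=env.get("AUDIOFORGE_REDIS_URL", "redis://localhost:6379/0"),
        model_device=env.get("AUDIOFORGE_MODEL_DEVICE", "cpu"),
        log_level=env.get("AUDIOFORGE_LOG_LEVEL", "INFO"),
    )

settings = load_settings({"AUDIOFORGE_MODEL_DEVICE": "cuda"})
```

In practice a FastAPI project would typically express this with `pydantic-settings`, which adds type coercion and validation on top of the same environment-variable convention.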
## Scalability Considerations

- Horizontal Scaling: the API is stateless, so multiple instances can run behind a load balancer
- Queue System: background tasks can be moved to Celery/RQ
- Model Serving: models can be served separately via TorchServe
- Storage: audio files can be stored in S3/object storage
- Caching: Redis caches prompt analysis results
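The Redis caching of prompt analysis results follows a standard cache-aside pattern, sketched here with a dict standing in for the Redis client:

```python
import hashlib
import json

def cached_analysis(prompt, analyze, cache):
    """Cache-aside lookup keyed on a hash of the prompt text."""
    key = "analysis:" + hashlib.sha256(prompt.encode()).hexdigest()
    hit = cache.get(key)
    if hit is not None:
        return json.loads(hit)        # cache hit: skip re-analysis
    result = analyze(prompt)
    cache[key] = json.dumps(result)   # against real Redis this would be SET with a TTL
    return result

cache = {}
calls = []
def analyze(p):
    calls.append(p)
    return {"style": "jazz"}

cached_analysis("smooth jazz", analyze, cache)
cached_analysis("smooth jazz", analyze, cache)  # second call served from cache
```

Hashing the prompt keeps keys bounded in length and avoids putting raw user input into key names.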
## Security

- Input validation via Pydantic schemas
- SQL injection prevention via the SQLAlchemy ORM
- CORS configuration
- Rate limiting (to be added)
- Authentication (to be added)
## Performance Optimizations

- Async/await throughout
- Lazy model loading
- Background task processing
- Connection pooling (database, Redis)
- Audio file streaming
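Audio file streaming can be sketched as a chunked generator; in FastAPI this generator would typically be wrapped in a `StreamingResponse`. The chunk size is an illustrative choice:

```python
import io

CHUNK_SIZE = 64 * 1024  # 64 KiB per chunk keeps memory flat for long tracks

def iter_audio(fileobj, chunk_size=CHUNK_SIZE):
    """Yield an audio file in fixed-size chunks instead of loading it whole."""
    while True:
        chunk = fileobj.read(chunk_size)
        if not chunk:
            break
        yield chunk

# Demonstrate with an in-memory buffer standing in for a WAV file on disk.
data = b"\x00" * (130 * 1024)
chunks = list(iter_audio(io.BytesIO(data)))
```

Streaming also lets clients begin playback before the whole file has transferred, which matters for multi-minute tracks.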
## Future Enhancements

- User authentication & authorization
- Rate limiting
- WebSocket support for real-time updates
- Advanced post-processing (reverb, delay, etc.)
- Multiple model support (switching between MusicGen variants)
- Batch generation
- Playlist creation
- Social features (sharing, likes)