	# AudioForge Architecture

	## Overview

	AudioForge is a production-ready, open-source music generation platform inspired by Suno. It uses a six-stage pipeline to turn text descriptions into music.

	## System Architecture

	```
	┌─────────────┐
	│  Frontend   │ (Next.js + React)
	│  Port 3000  │
	└──────┬──────┘
	       │ HTTP/REST
	       ▼
	┌─────────────┐
	│   Backend   │ (FastAPI)
	│  Port 8000  │
	└──────┬──────┘
	       │
	       ├──► PostgreSQL (Metadata Storage)
	       ├──► Redis (Caching)
	       └──► Storage (Audio Files)
	```

	## Generation Pipeline

	### Stage 1: Prompt Understanding
	- Service: `PromptUnderstandingService`
	- Purpose: Analyze the user prompt to extract:
	  - Musical style/genre
	  - Tempo/BPM
	  - Mood
	  - Instrumentation hints
	  - Lyrics (if provided)
	  - Duration preferences
	- Output: Enriched prompt with metadata
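
As a rough illustration of the kind of output Stage 1 produces, the sketch below extracts a few of the fields listed above with simple keyword and regex matching. It is not the actual `PromptUnderstandingService`: the `PromptAnalysis` class, the `analyze_prompt` function, and the keyword tables are all hypothetical names chosen for this example, and the real service may use a very different approach.

```python
import re
from dataclasses import dataclass, field
from typing import Optional

# Hypothetical keyword tables; the real service may rely on a richer model.
GENRES = ("rock", "jazz", "lo-fi", "techno", "classical", "hip hop")
MOODS = ("calm", "energetic", "sad", "happy", "dark", "uplifting")
INSTRUMENTS = ("piano", "guitar", "drums", "violin", "synth", "bass")


@dataclass
class PromptAnalysis:
    """Illustrative stand-in for the enriched prompt produced by Stage 1."""
    prompt: str
    style: Optional[str] = None
    bpm: Optional[int] = None
    mood: Optional[str] = None
    instruments: list[str] = field(default_factory=list)
    duration_seconds: Optional[int] = None


def _first_match(text: str, vocabulary: tuple[str, ...]) -> Optional[str]:
    """Return the first vocabulary entry that occurs in text, if any."""
    return next((word for word in vocabulary if word in text), None)


def analyze_prompt(prompt: str) -> PromptAnalysis:
    """Extract style, tempo, mood, instruments, and duration by substring/regex search."""
    text = prompt.lower()
    bpm = re.search(r"(\d{2,3})\s*bpm", text)
    duration = re.search(r"(\d+)\s*(?:s|sec|seconds?)\b", text)
    return PromptAnalysis(
        prompt=prompt,
        style=_first_match(text, GENRES),
        bpm=int(bpm.group(1)) if bpm else None,
        mood=_first_match(text, MOODS),
        instruments=[name for name in INSTRUMENTS if name in text],
        duration_seconds=int(duration.group(1)) if duration else None,
    )
```

Fields that cannot be detected stay `None` (or an empty list), which lets later stages fall back to their own defaults.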

	### Stage 2: Music Generation
	- Service: `MusicGenerationService`
	- Model: Meta MusicGen (via AudioCraft)
	- Purpose: Generate instrumental music track
	- Output: WAV file with instrumental track

	### Stage 3: Vocal Generation (Optional)
	- Service: `VocalGenerationService`
	- Model: Bark or XTTS
	- Purpose: Generate vocals from lyrics
	- Output: WAV file with vocals

	### Stage 4: Mixing
	- Service: `PostProcessingService`
	- Purpose: Mix instrumental and vocal tracks
	- Output: Mixed audio file

	### Stage 5: Post-Processing/Mastering
	- Service: `PostProcessingService`
	- Purpose: Apply compression, EQ, normalization
	- Output: Final mastered audio file
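
To make Stages 4 and 5 concrete, here is a minimal sketch of mixing two mono tracks and peak-normalizing the result. It uses plain Python lists so it runs without dependencies; the real `PostProcessingService` presumably works on arrays loaded with the audio libraries listed in the technology stack. The function names and the `vocal_gain` and `target_peak` parameters are assumptions for illustration, and compression and EQ are not shown.

```python
def mix_tracks(instrumental, vocals, vocal_gain=0.8):
    """Sum two mono tracks sample by sample, padding the shorter one with silence."""
    length = max(len(instrumental), len(vocals))
    padded_inst = list(instrumental) + [0.0] * (length - len(instrumental))
    padded_voc = list(vocals) + [0.0] * (length - len(vocals))
    return [i + vocal_gain * v for i, v in zip(padded_inst, padded_voc)]


def peak_normalize(samples, target_peak=0.9):
    """Scale samples so the largest absolute value equals target_peak."""
    peak = max((abs(s) for s in samples), default=0.0)
    if peak == 0.0:
        return list(samples)  # silence: nothing to scale
    scale = target_peak / peak
    return [s * scale for s in samples]
```

Normalizing after mixing matters because summing two tracks can push samples beyond the [-1.0, 1.0] range and cause clipping when the file is written.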

	### Stage 6: Metadata Storage
	- Service: Database layer
	- Purpose: Store generation metadata, paths, status
	- Output: Database record

	## Technology Stack

	### Backend
	- Framework: FastAPI (async Python)
	- Database: PostgreSQL with SQLAlchemy async
	- Caching: Redis
	- ML Framework: PyTorch
	- Music Models:
	  - MusicGen (Meta AudioCraft)
	  - Bark (for vocals)
	- Audio Processing: librosa, soundfile, scipy

	### Frontend
	- Framework: Next.js 14+ (App Router)
	- Language: TypeScript (strict mode)
	- Styling: Tailwind CSS
	- UI Components: Radix UI primitives
	- State Management: React Query + Zustand
	- Forms: React Hook Form + Zod

	### Observability
	- Logging: structlog (structured JSON logs)
	- Metrics: Prometheus
	- Tracing: OpenTelemetry (optional)

	## Data Flow

	1. User submits prompt via frontend
	2. Frontend sends POST to `/api/v1/generations`
	3. Backend creates generation record (status: pending)
	4. Background task starts processing
	5. Pipeline executes stages 1-6
	6. Frontend polls `/api/v1/generations/{id}` for status
	7. On completion, audio available at `/api/v1/generations/{id}/audio`
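
The status lifecycle in steps 3 to 6 can be sketched with plain `asyncio`. This is an illustration only: the in-memory `GENERATIONS` dict stands in for the database, `run_pipeline` stands in for the six stages, and all function names and the storage path are hypothetical rather than taken from the AudioForge code.

```python
import asyncio
import uuid

# In-memory stand-in for the generations table (assumption for illustration).
GENERATIONS: dict[str, dict] = {}


async def run_pipeline(generation_id: str) -> None:
    """Placeholder for stages 1-6; only the status transitions are modeled."""
    record = GENERATIONS[generation_id]
    record["status"] = "processing"
    try:
        await asyncio.sleep(0.01)  # stands in for the actual model work
        record["audio_path"] = f"storage/{generation_id}.wav"  # hypothetical path
        record["status"] = "completed"
    except Exception as exc:
        record["status"] = "failed"
        record["error_message"] = str(exc)


def create_generation(prompt: str) -> tuple[str, asyncio.Task]:
    """Steps 3-4: store a pending record, then start background processing."""
    generation_id = str(uuid.uuid4())
    GENERATIONS[generation_id] = {"prompt": prompt, "status": "pending"}
    task = asyncio.create_task(run_pipeline(generation_id))
    return generation_id, task


async def poll_until_done(generation_id: str, interval: float = 0.005) -> dict:
    """Step 6: poll the record until it reaches a terminal status."""
    while GENERATIONS[generation_id]["status"] not in ("completed", "failed"):
        await asyncio.sleep(interval)
    return GENERATIONS[generation_id]


async def demo() -> dict:
    generation_id, task = create_generation("upbeat synthwave")
    # The record is visible as "pending" before the background task has run.
    assert GENERATIONS[generation_id]["status"] == "pending"
    record = await poll_until_done(generation_id)
    await task
    return record
```

Running `asyncio.run(demo())` returns the finished record. Because the create call returns before any processing happens, the HTTP request stays short and the client learns about progress only through polling.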

	## Database Schema

	### Generations Table
	- `id`: UUID (primary key)
	- `prompt`: Text (user input)
	- `lyrics`: Text (optional)
	- `style`: String (extracted style)
	- `duration`: Integer (seconds)
	- `status`: String (pending/processing/completed/failed)
	- `audio_path`: String (final audio file path)
	- `instrumental_path`: String (instrumental track path)
	- `vocal_path`: String (vocal track path, if applicable)
	- `metadata`: JSON (analysis results, etc.)
	- `created_at`, `updated_at`, `completed_at`: Timestamps
	- `error_message`: Text (if failed)
	- `processing_time_seconds`: Float
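
For reference, the columns above map onto a record type roughly like the following. This is a plain dataclass sketch, not the project's SQLAlchemy model; the class names and the `mark_completed` helper are hypothetical. One detail worth noting: SQLAlchemy's declarative API reserves the attribute name `metadata` on mapped classes, so a model backing this table would need a differently named Python attribute for the `metadata` column. The sketch uses `generation_metadata` as one possible choice.

```python
import uuid
from dataclasses import dataclass, field
from datetime import datetime, timezone
from enum import Enum
from typing import Any, Optional


class GenerationStatus(str, Enum):
    PENDING = "pending"
    PROCESSING = "processing"
    COMPLETED = "completed"
    FAILED = "failed"


def _utcnow() -> datetime:
    return datetime.now(timezone.utc)


@dataclass
class GenerationRecord:
    prompt: str
    id: uuid.UUID = field(default_factory=uuid.uuid4)
    lyrics: Optional[str] = None
    style: Optional[str] = None
    duration: Optional[int] = None  # seconds
    status: GenerationStatus = GenerationStatus.PENDING
    audio_path: Optional[str] = None
    instrumental_path: Optional[str] = None
    vocal_path: Optional[str] = None
    # Hypothetical attribute name; see the note on "metadata" above.
    generation_metadata: dict[str, Any] = field(default_factory=dict)
    created_at: datetime = field(default_factory=_utcnow)
    updated_at: datetime = field(default_factory=_utcnow)
    completed_at: Optional[datetime] = None
    error_message: Optional[str] = None
    processing_time_seconds: Optional[float] = None

    def mark_completed(self, audio_path: str) -> None:
        """Record a successful run and derive the processing time from created_at."""
        now = _utcnow()
        self.status = GenerationStatus.COMPLETED
        self.audio_path = audio_path
        self.completed_at = now
        self.updated_at = now
        self.processing_time_seconds = (now - self.created_at).total_seconds()
```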

	## API Endpoints

	### Generations
	- `POST /api/v1/generations` - Create generation
	- `GET /api/v1/generations/{id}` - Get generation status
	- `GET /api/v1/generations/{id}/audio` - Download audio
	- `GET /api/v1/generations` - List generations (paginated)
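
The list endpoint is described as paginated, but the query parameters and response shape are not specified here. The sketch below shows one common page-based scheme; the `page` and `page_size` parameters and the returned field names are assumptions, not the documented AudioForge contract.

```python
import math
from typing import Any, Sequence


def paginate(items: Sequence[Any], page: int = 1, page_size: int = 20) -> dict:
    """Return one page of items plus the totals a client needs to navigate."""
    if page < 1 or page_size < 1:
        raise ValueError("page and page_size must be positive")
    start = (page - 1) * page_size
    return {
        "items": list(items[start:start + page_size]),
        "total": len(items),
        "page": page,
        "page_size": page_size,
        "pages": math.ceil(len(items) / page_size),
    }
```

In a database-backed implementation the slice would typically become a `LIMIT`/`OFFSET` query rather than slicing an in-memory sequence.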

	## Configuration

	All configuration is supplied through environment variables (see `.env.example`):

	- Database connection
	- Redis connection
	- Model paths and devices (CPU/CUDA)
	- Storage paths
	- Logging levels
	- Feature flags
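
A minimal sketch of environment-driven settings, using only the standard library, might look like this. The variable names (`DATABASE_URL`, `REDIS_URL`, `MODEL_DEVICE`, `STORAGE_PATH`, `LOG_LEVEL`, `ENABLE_VOCALS`) and their defaults are assumptions for illustration; the authoritative names are the ones in `.env.example`.

```python
import os
from dataclasses import dataclass
from typing import Mapping


@dataclass(frozen=True)
class Settings:
    database_url: str
    redis_url: str
    model_device: str
    storage_path: str
    log_level: str
    enable_vocals: bool


def _as_bool(value: str) -> bool:
    """Interpret common truthy strings; everything else is False."""
    return value.strip().lower() in {"1", "true", "yes", "on"}


def load_settings(env: Mapping[str, str] = os.environ) -> Settings:
    """Build Settings from an environment mapping, falling back to defaults."""
    # All variable names and defaults below are hypothetical.
    return Settings(
        database_url=env.get("DATABASE_URL", "postgresql+asyncpg://localhost/audioforge"),
        redis_url=env.get("REDIS_URL", "redis://localhost:6379/0"),
        model_device=env.get("MODEL_DEVICE", "cpu"),
        storage_path=env.get("STORAGE_PATH", "./storage"),
        log_level=env.get("LOG_LEVEL", "INFO").upper(),
        enable_vocals=_as_bool(env.get("ENABLE_VOCALS", "true")),
    )
```

Accepting the environment as a parameter keeps the loader easy to test: a plain dict can be passed in place of `os.environ`.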

	## Scalability Considerations

	- Horizontal Scaling: The API is stateless, so multiple instances can run in parallel
	- Queue System: Background tasks can be moved to Celery/RQ
	- Model Serving: Models can be served separately via TorchServe
	- Storage: Audio files can be stored in S3/object storage
	- Caching: Redis caches prompt analysis results
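
The prompt-analysis cache mentioned in the last bullet could be keyed on a hash of the normalized prompt, as in the sketch below. A plain dict stands in for Redis so the example runs on its own; the key prefix and function names are assumptions, and a real deployment would more likely call the Redis client's `get` and `set` (with an expiry) instead of dict operations.

```python
import hashlib
import json
from typing import Any, Callable, MutableMapping


def prompt_cache_key(prompt: str) -> str:
    """Derive a stable key: lowercase, collapse whitespace, then hash."""
    normalized = " ".join(prompt.lower().split())
    digest = hashlib.sha256(normalized.encode("utf-8")).hexdigest()
    return f"prompt-analysis:{digest}"  # hypothetical key prefix


def cached_analysis(
    prompt: str,
    analyze: Callable[[str], dict[str, Any]],
    cache: MutableMapping[str, str],
) -> dict[str, Any]:
    """Return the cached analysis if present, otherwise compute and store it."""
    key = prompt_cache_key(prompt)
    hit = cache.get(key)
    if hit is not None:
        return json.loads(hit)
    result = analyze(prompt)
    cache[key] = json.dumps(result)  # stored as JSON text, as Redis holds strings
    return result
```

Normalizing before hashing means prompts that differ only in case or spacing share one cache entry.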

	## Security

	- Input validation via Pydantic schemas
	- SQL injection prevention via SQLAlchemy ORM
	- CORS configuration
	- Rate limiting (to be added)
	- Authentication (to be added)

	## Performance Optimizations

	- Async/await throughout
	- Model lazy loading
	- Background task processing
	- Connection pooling (database, Redis)
	- Audio file streaming

	## Future Enhancements

	- User authentication & authorization
	- Rate limiting
	- WebSocket for real-time updates
	- Advanced post-processing (reverb, delay, etc.)
	- Multiple model support (switch between MusicGen variants)
	- Batch generation
	- Playlist creation
	- Social features (sharing, likes)