AudioForge Agent Architecture
Problem Statement
The main API runs on Python 3.13, but the ML libraries it depends on (PyTorch, AudioCraft, xformers) only support Python 3.11/3.12, so they cannot be installed in the same environment.
Solution: Microservices Agent Architecture
Instead of a monolithic deployment, split the system into independent agent services, each with its own Python environment and dependencies.
Architecture Overview
┌─────────────────────────────────────────────────────────────────┐
│  Client (Browser)                                               │
└───────────────────────┬─────────────────────────────────────────┘
                        │
                        ▼
┌─────────────────────────────────────────────────────────────────┐
│  Frontend (Next.js - Port 3000)                                 │
└───────────────────────┬─────────────────────────────────────────┘
                        │
                        ▼
┌─────────────────────────────────────────────────────────────────┐
│  Main API Service (FastAPI - Python 3.13)                       │
│  - User management, authentication                              │
│  - Database operations (PostgreSQL)                             │
│  - Job orchestration                                            │
│  - WebSocket for real-time updates                              │
│  Port: 8001                                                     │
└───────────────────────┬─────────────────────────────────────────┘
                        │
                        ▼
┌─────────────────────────────────────────────────────────────────┐
│  Message Queue (Redis/Celery)                                   │
│  - Task distribution                                            │
│  - Job status tracking                                          │
│  - Result aggregation                                           │
└───────────────────────┬─────────────────────────────────────────┘
                        │
       ┌────────────────┼────────────────┬────────────────┐
       ▼                ▼                ▼                ▼
┌──────────────┐ ┌──────────────┐ ┌──────────────┐ ┌──────────────┐
│ Music Agent  │ │ Vocal Agent  │ │ Mixing Agent │ │ Master Agent │
│ Python 3.11  │ │ Python 3.11  │ │ Python 3.11  │ │ Python 3.11  │
│ Port: 8002   │ │ Port: 8003   │ │ Port: 8004   │ │ Port: 8005   │
│              │ │              │ │              │ │              │
│ - MusicGen   │ │ - Bark       │ │ - Demucs     │ │ - Mastering  │
│ - AudioCraft │ │ - RVC        │ │ - Mixing     │ │ - Effects    │
│ - Encodec    │ │ - TTS        │ │ - Stems      │ │ - Normalize  │
└──────────────┘ └──────────────┘ └──────────────┘ └──────────────┘
Benefits
1. Dependency Isolation
- Each agent has its own Python version
- No version conflicts between packages
- Easy to update individual components
2. Scalability
- Scale agents independently based on load
- If music generation is the bottleneck, run more music agent instances
- Horizontal scaling per service
3. Fault Tolerance
- If one agent crashes, others continue working
- Retry failed tasks automatically
- Graceful degradation
4. Development Velocity
- Teams can work on different agents independently
- Deploy agents separately
- Test in isolation
5. Resource Optimization
- GPU allocation per agent type
- CPU-only agents for lightweight tasks
- Memory limits per service
Implementation Plan
Phase 1: Create Agent Services (Week 1)
Music Generation Agent (agents/music/)
- Python 3.11 environment
- FastAPI service on port 8002
- Endpoints: /generate, /status, /health
- Dependencies: torch, audiocraft, transformers
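The following is a framework-agnostic sketch of the state behind the music agent's /generate, /status and /health endpoints. It assumes an in-memory task store and a single agent process, whereas a real deployment would probably keep tasks in Redis. The class and method names are hypothetical. In the FastAPI service, each method would back one route handler.

```python
import uuid
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class MusicTask:
    # Hypothetical record for one generation request
    task_id: str
    prompt: str
    duration: int
    status: str = "processing"
    audio_path: Optional[str] = None


@dataclass
class MusicAgentState:
    # In-memory store; assumption: one agent process, no persistence
    tasks: dict = field(default_factory=dict)

    def generate(self, prompt: str, duration: int) -> dict:
        # Backs POST /generate: register the task and return immediately
        if duration <= 0:
            raise ValueError("duration must be positive")
        task_id = f"music_gen_{uuid.uuid4().hex[:6]}"
        self.tasks[task_id] = MusicTask(task_id, prompt, duration)
        return {"task_id": task_id, "status": "processing"}

    def complete(self, task_id: str, audio_path: str) -> None:
        # Called by the worker once MusicGen has written the audio file
        task = self.tasks[task_id]
        task.status = "completed"
        task.audio_path = audio_path

    def status(self, task_id: str) -> dict:
        # Backs GET /status; unknown IDs report "not_found"
        task = self.tasks.get(task_id)
        if task is None:
            return {"task_id": task_id, "status": "not_found"}
        return {"task_id": task_id, "status": task.status, "audio_path": task.audio_path}

    def health(self) -> dict:
        # Backs GET /health; reports how many tasks are still running
        running = sum(1 for t in self.tasks.values() if t.status == "processing")
        return {"status": "ok", "running_tasks": running}
```

Because /generate returns before any audio exists, the main API has to poll /status or wait for a callback.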
Vocal Generation Agent (agents/vocal/)
- Python 3.11 environment
- FastAPI service on port 8003
- Endpoints: /generate, /status, /health
- Dependencies: bark, RVC, TTS libraries
Post-Processing Agent (agents/processing/)
- Python 3.11 environment
- FastAPI service on port 8004
- Endpoints: /mix, /separate, /master, /health
- Dependencies: demucs, librosa, pydub
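The three Phase 1 agents can be described in one registry that the main API uses to build request URLs. This sketch assumes the hostnames match the Docker Compose service names used in Phase 4 (music-agent, vocal-agent, processing-agent). The AgentSpec type and the agent_url helper are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AgentSpec:
    # Hypothetical description of one agent service from Phase 1
    host: str
    port: int
    endpoints: tuple


AGENTS = {
    "music": AgentSpec("music-agent", 8002, ("/generate", "/status", "/health")),
    "vocal": AgentSpec("vocal-agent", 8003, ("/generate", "/status", "/health")),
    "processing": AgentSpec(
        "processing-agent", 8004, ("/mix", "/separate", "/master", "/health")
    ),
}


def agent_url(agent: str, endpoint: str) -> str:
    # Build an agent URL, rejecting endpoints that agent does not expose
    spec = AGENTS[agent]
    if endpoint not in spec.endpoints:
        raise ValueError(f"{agent} agent has no {endpoint} endpoint")
    return f"http://{spec.host}:{spec.port}{endpoint}"
```

Keeping ports and endpoints in one place means a port change touches a single line instead of every client.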
Phase 2: Update Main API (Week 1-2)
Orchestrator Service (backend/app/services/orchestrator.py)
- Manages workflow across agents
- Handles task distribution
- Aggregates results
- Error handling and retries
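Here is a sketch of the orchestrator's workflow with retries. Each agent call is assumed to be an async function, passed in as a parameter so the example needs no HTTP library. Retries use exponential backoff. The names run_pipeline and call_with_retries, the retry defaults, and the two-step order (music generation, then mixing) are illustrative assumptions.

```python
import asyncio
from typing import Awaitable, Callable

AgentCall = Callable[[dict], Awaitable[dict]]


async def call_with_retries(call: AgentCall, payload: dict,
                            attempts: int = 3, base_delay: float = 0.5) -> dict:
    # Retry a failing agent call with exponential backoff: base_delay, 2x, 4x, ...
    for attempt in range(attempts):
        try:
            return await call(payload)
        except Exception:
            if attempt == attempts - 1:
                raise  # out of attempts: surface the last error to the caller
            await asyncio.sleep(base_delay * 2 ** attempt)
    raise RuntimeError("attempts must be >= 1")


async def run_pipeline(prompt: str, duration: int,
                       music: AgentCall, mixing: AgentCall) -> dict:
    # Step 1: generate the instrumental track on the music agent
    music_result = await call_with_retries(music, {"prompt": prompt, "duration": duration})
    # Step 2: hand the generated file to the post-processing agent
    mix_result = await call_with_retries(mixing, {"audio_path": music_result["audio_path"]})
    # Aggregate results from both steps for the job record
    return {"music": music_result, "mix": mix_result}
```

Passing the agent calls in as arguments keeps the workflow testable with fakes. In production they would wrap the HTTP clients from backend/app/clients/.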
Agent Communication (backend/app/clients/)
- HTTP clients for each agent
- Async/await for non-blocking calls
- Circuit breaker pattern
- Health checks
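The circuit breaker pattern can be sketched with three states. While closed, calls go through. After a run of consecutive failures the breaker opens and callers fail fast. Once a timeout has elapsed it becomes half-open and lets trial calls through again. This is a minimal single-threaded version under assumed defaults (3 failures, 30 s timeout). The clock is injectable so the behavior can be tested without sleeping.

```python
import time
from typing import Callable, Optional


class CircuitBreaker:
    """Minimal circuit breaker guarding calls to one agent (illustrative sketch)."""

    def __init__(self, failure_threshold: int = 3, reset_timeout: float = 30.0,
                 clock: Callable[[], float] = time.monotonic):
        self.failure_threshold = failure_threshold
        self.reset_timeout = reset_timeout
        self.clock = clock
        self.failures = 0
        self.opened_at: Optional[float] = None

    @property
    def state(self) -> str:
        if self.opened_at is None:
            return "closed"
        # After the timeout, let trial calls through ("half-open")
        if self.clock() - self.opened_at >= self.reset_timeout:
            return "half-open"
        return "open"

    def allow_request(self) -> bool:
        # Callers skip the agent (and fail fast) while the circuit is open
        return self.state != "open"

    def record_success(self) -> None:
        self.failures = 0
        self.opened_at = None

    def record_failure(self) -> None:
        # failures is only reset on success, so a failed half-open trial
        # is still over the threshold and re-opens the circuit
        self.failures += 1
        if self.failures >= self.failure_threshold:
            self.opened_at = self.clock()
```

A client would check allow_request() before each HTTP call and report the outcome with record_success() or record_failure(). One breaker per agent keeps a crashed vocal agent from slowing down music requests.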
Phase 3: Message Queue Integration (Week 2)
Celery Tasks (backend/app/tasks/)
- Background job processing
- Task routing to appropriate agents
- Result callbacks
- Progress tracking
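Celery accepts a plain function as a task router through its task_routes setting, which is one way to send tasks to the right agent. This sketch assumes hypothetical task names under app.tasks.music, app.tasks.vocal and app.tasks.processing, with one queue per agent type. The function itself does not depend on Celery.

```python
# Hypothetical mapping from task-name prefix to a per-agent queue
QUEUE_BY_PREFIX = {
    "app.tasks.music.": "music",
    "app.tasks.vocal.": "vocal",
    "app.tasks.processing.": "processing",
}


def route_task(name, args, kwargs, options, task=None, **kw):
    # Celery calls each router with the task name; returning None falls
    # back to the next router or the default queue
    for prefix, queue in QUEUE_BY_PREFIX.items():
        if name.startswith(prefix):
            return {"queue": queue}
    return None


# With Celery installed, this would be registered as:
#   app.conf.task_routes = (route_task,)
```

A worker that serves only the music queue could then be started with `celery -A app.tasks worker -Q music`, so GPU workers pick up only generation tasks.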
Redis Integration
- Job queue management
- Status updates
- Caching
- Pub/Sub for real-time updates
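For status updates over pub/sub, the main API and the agents need a shared message shape. This stdlib-only sketch assumes one channel per job, named job:<id>, and a JSON payload with a 0 to 1 progress fraction. The channel scheme and field names are hypothetical. With redis-py, the encoded string would be passed to publish(channel, message).

```python
import json
from dataclasses import asdict, dataclass
from typing import Optional


@dataclass
class JobStatusEvent:
    # Hypothetical payload published whenever a job changes state
    job_id: str
    agent: str
    status: str            # e.g. "processing", "completed", "failed"
    progress: float        # 0.0 to 1.0
    audio_path: Optional[str] = None


def channel_for(job_id: str) -> str:
    # One channel per job so a WebSocket handler can subscribe narrowly
    return f"job:{job_id}"


def encode(event: JobStatusEvent) -> str:
    if not 0.0 <= event.progress <= 1.0:
        raise ValueError("progress must be between 0 and 1")
    return json.dumps(asdict(event))


def decode(message: str) -> JobStatusEvent:
    return JobStatusEvent(**json.loads(message))
```

The WebSocket handler in the main API would subscribe to job:<id> for each connected client and forward decoded events.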
Phase 4: Docker Compose (Week 2-3)
version: '3.8'
services:
  # Main API - Python 3.13
  api:
    build: ./backend
    ports: ["8001:8001"]
    depends_on: [postgres, redis]

  # Music Agent - Python 3.11
  music-agent:
    build: ./agents/music
    ports: ["8002:8002"]
    environment:
      # Runtime variables only: they do not select the interpreter. The Python
      # version comes from the base image in the agent's Dockerfile (e.g. python:3.11-slim)
      - PYTHON_VERSION=3.11
      - TORCH_VERSION=2.1.0
    deploy:
      resources:
        reservations:
          devices:
            - driver: nvidia
              count: 1
              capabilities: [gpu]

  # Vocal Agent - Python 3.11
  vocal-agent:
    build: ./agents/vocal
    ports: ["8003:8003"]

  # Processing Agent - Python 3.11
  processing-agent:
    build: ./agents/processing
    ports: ["8004:8004"]

  # Infrastructure
  postgres:
    image: postgres:16-alpine
    environment:
      # The official image refuses to initialize without a superuser password
      - POSTGRES_PASSWORD=change-me
  redis:
    image: redis:7-alpine

  # Celery Workers
  celery-worker:
    build: ./backend
    command: celery -A app.tasks worker --loglevel=info
    depends_on: [redis]
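The short form of depends_on only controls container start order. It does not wait for an agent to finish loading its model. The following stdlib-only sketch polls each agent's /health endpoint before the main API or a test script sends work. The function name and timeouts are illustrative. It also assumes /health returns HTTP 200 once the agent is ready.

```python
import time
import urllib.request


def wait_until_healthy(url: str, timeout: float = 60.0, interval: float = 1.0) -> bool:
    # Poll an agent's /health endpoint until it returns HTTP 200 or time runs out
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        try:
            with urllib.request.urlopen(url, timeout=interval) as resp:
                if resp.status == 200:
                    return True
        except OSError:
            # URLError, HTTPError (e.g. 503 while loading) and socket timeouts
            # are all OSError subclasses: the agent is not ready yet
            pass
        time.sleep(interval)
    return False


# Example, using the Compose service names as hostnames:
# for url in ("http://music-agent:8002/health", "http://vocal-agent:8003/health"):
#     if not wait_until_healthy(url):
#         raise SystemExit(f"{url} did not become healthy")
```

Compose can also do this itself with a healthcheck on each agent plus `condition: service_healthy` in the long form of depends_on.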
API Contract Example
Main API → Music Agent
Request:
POST http://localhost:8002/generate
{
  "prompt": "Epic orchestral soundtrack",
  "duration": 30,
  "model": "facebook/musicgen-medium",
  "temperature": 1.0,
  "top_k": 250,
  "callback_url": "http://api:8001/callbacks/generation/123"
}
Response:
{
  "task_id": "music_gen_abc123",
  "status": "processing",
  "estimated_time": 45
}
Callback (when complete):
POST http://api:8001/callbacks/generation/123
{
  "task_id": "music_gen_abc123",
  "status": "completed",
  "audio_path": "/storage/audio/music/abc123.wav",
  "metadata": {
    "duration": 30.5,
    "sample_rate": 32000,
    "model": "facebook/musicgen-medium"
  }
}
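The contract above can be mirrored in typed models on the main API side, so that malformed agent messages fail early. This stdlib sketch uses dataclasses; in the FastAPI services, Pydantic models would be the more natural fit. The class names and validation rules are assumptions and are not part of the contract as written. The rules are a positive duration and a fixed set of status values, including a "failed" status.

```python
import json
from dataclasses import dataclass
from typing import Optional

# Assumption: agents report one of these statuses ("failed" is not shown above)
VALID_STATUSES = {"processing", "completed", "failed"}


@dataclass
class GenerateRequest:
    prompt: str
    duration: int
    model: str
    temperature: float
    top_k: int
    callback_url: str

    def __post_init__(self):
        if self.duration <= 0:
            raise ValueError("duration must be positive")


@dataclass
class CallbackPayload:
    task_id: str
    status: str
    audio_path: Optional[str] = None
    metadata: Optional[dict] = None

    def __post_init__(self):
        if self.status not in VALID_STATUSES:
            raise ValueError(f"unknown status: {self.status}")


def parse_callback(body: str) -> CallbackPayload:
    # Unexpected keys raise TypeError, which flags contract drift between services
    return CallbackPayload(**json.loads(body))
```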
Migration Path
Option A: Gradual Migration (Recommended)
- Keep existing monolithic service running
- Deploy music agent alongside
- Route new requests to agent
- Monitor and validate
- Migrate other services one by one
- Deprecate monolithic service
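The "route new requests to agent" step can be made gradual with a percentage rollout. This sketch hashes each job ID into a stable bucket from 0 to 99. The same job always goes to the same backend, and raising the percentage only moves jobs from the monolith to the agent, never back. The function name and the choice of SHA-256 are illustrative assumptions.

```python
import hashlib


def route_to_agent(job_id: str, rollout_percent: int) -> bool:
    """Return True if this job should go to the new music agent (illustrative)."""
    if not 0 <= rollout_percent <= 100:
        raise ValueError("rollout_percent must be in [0, 100]")
    # Stable bucket 0-99 derived from the job ID, so retries of the same
    # job hit the same backend while the percentage is unchanged
    digest = hashlib.sha256(job_id.encode("utf-8")).digest()
    bucket = int.from_bytes(digest[:2], "big") % 100
    return bucket < rollout_percent
```

Starting at a small percentage and raising it while watching the agent's error rate covers the "monitor and validate" step before other services are migrated.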
Option B: Big Bang Migration
- Build all agents
- Test thoroughly in staging
- Switch over in one deployment
- Higher risk, faster completion
Monitoring & Observability
Metrics to Track
- Request latency per agent
- Success/failure rates
- Queue depth
- Agent health status
- Resource utilization (CPU/GPU/Memory)
- Generation time per model
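Before Prometheus is wired up, request latency and success/failure rate per agent can be prototyped in-process. AgentMetrics below is a hypothetical stdlib sketch. In production the same numbers would typically be exported through the Prometheus client's Histogram and Counter types rather than kept in lists.

```python
import time
from collections import defaultdict
from contextlib import contextmanager


class AgentMetrics:
    """In-process latency and success/failure tracking per agent (sketch)."""

    def __init__(self):
        self.latencies = defaultdict(list)   # agent -> list of seconds
        self.successes = defaultdict(int)
        self.failures = defaultdict(int)

    @contextmanager
    def track(self, agent: str):
        # Time one call; count it as a failure if the body raises
        start = time.perf_counter()
        try:
            yield
        except Exception:
            self.failures[agent] += 1
            raise
        else:
            self.successes[agent] += 1
        finally:
            self.latencies[agent].append(time.perf_counter() - start)

    def success_rate(self, agent: str) -> float:
        total = self.successes[agent] + self.failures[agent]
        return self.successes[agent] / total if total else 0.0

    def p95_latency(self, agent: str) -> float:
        # Nearest-rank 95th percentile, with ceil(0.95 * n) done in integers
        values = sorted(self.latencies[agent])
        if not values:
            return 0.0
        rank = max(1, (95 * len(values) + 99) // 100)
        return values[rank - 1]
```

Usage: wrap each agent call in `with metrics.track("music"): ...`.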
Tools
- Prometheus for metrics
- Grafana for dashboards
- Jaeger for distributed tracing
- structlog for structured logging across services
Cost Considerations
Infrastructure
- Current: 1 server with all dependencies
- Agent: Multiple smaller services
- Savings: Scale only what you need
Development
- Initial: Higher (build agents)
- Ongoing: Lower (easier maintenance)
- Team: Can parallelize work
Alternative: Subprocess Approach
If full microservices are too heavy, the Python 3.13 API can run the ML code under a separate Python 3.11 interpreter as a subprocess:
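# backend/app/services/music_generation.py
import asyncio
import json

class MusicGenerationService:
    def __init__(self):
        # Separate Python 3.11 install that has the ML dependencies
        self.python311 = "C:/Python311/python.exe"
        self.agent_script = "./agents/music_agent.py"

    async def generate(self, prompt: str, duration: int) -> dict:
        # Run the Python 3.11 agent script without blocking the event loop
        proc = await asyncio.create_subprocess_exec(
            self.python311,
            self.agent_script,
            "--prompt", prompt,
            "--duration", str(duration),
            stdout=asyncio.subprocess.PIPE,
            stderr=asyncio.subprocess.PIPE,
        )
        stdout, stderr = await proc.communicate()
        if proc.returncode != 0:
            raise RuntimeError(f"music agent failed: {stderr.decode()}")
        # The agent script is expected to print a single JSON object to stdout
        return json.loads(stdout.decode())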
Pros: simpler, no network overhead.
Cons: harder to scale, less fault-tolerant.
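The subprocess approach depends on the agent script printing exactly one JSON object to stdout. This is a sketch of that Python 3.11 side, assuming the --prompt and --duration flags used above. The MusicGen call is left as a stub, and the stub's output path and fields are placeholders. Errors go to stderr with a non-zero exit code so the caller can detect them.

```python
# agents/music_agent.py (runs under the Python 3.11 interpreter)
import argparse
import json
import sys


def generate_music(prompt: str, duration: int) -> dict:
    # Stub: a real agent would load MusicGen here and write a WAV file.
    # The path and fields below are placeholders, not real output.
    return {"status": "completed", "audio_path": "/storage/audio/music/output.wav",
            "prompt": prompt, "duration": duration}


def main(argv=None) -> int:
    parser = argparse.ArgumentParser(description="Music generation agent")
    parser.add_argument("--prompt", required=True)
    parser.add_argument("--duration", type=int, default=30)
    args = parser.parse_args(argv)
    try:
        result = generate_music(args.prompt, args.duration)
    except Exception as exc:
        # Keep stdout clean for JSON; report errors on stderr instead
        print(str(exc), file=sys.stderr)
        return 1
    print(json.dumps(result))  # the only line written to stdout
    return 0

# As a standalone script, the file would end with:
#   if __name__ == "__main__":
#       sys.exit(main())
```

Any library that logs to stdout would break json.loads on the caller's side, which is why diagnostics should be routed to stderr.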
Recommendation
Start with the agent architecture because:
- ✅ Solves the Python version conflict by giving each agent its own interpreter
- ✅ Better scalability for future growth
- ✅ A common pattern for serving ML models
- ✅ Easier to add new models/features
- ✅ Better resource utilization
- ✅ Aligns with modern cloud-native patterns
Next Steps
- Create the agents/ directory structure
- Build the Music Agent first (highest priority)
- Update orchestrator to call agent
- Test end-to-end workflow
- Deploy to staging
- Monitor and iterate
Timeline Estimate
- Week 1: Music Agent + Orchestrator updates
- Week 2: Vocal & Processing Agents + Celery
- Week 3: Docker Compose + Testing
- Week 4: Production deployment + Monitoring
Total: 3-4 weeks for full implementation