
# AudioForge Agent Architecture

## Problem Statement

The main API targets Python 3.13, but the ML libraries it relies on (PyTorch, AudioCraft, xformers) only support Python 3.11/3.12.

## Solution: Microservices Agent Architecture

Instead of deploying a single monolith, separate concerns into independent agents, each running in its own environment.

## Architecture Overview

```
  ┌─────────────────────────────────────────────────────────────┐
  │                      Client (Browser)                       │
  └─────────────────────┬───────────────────────────────────────┘
                        │
                        ▼
  ┌─────────────────────────────────────────────────────────────┐
  │               Frontend (Next.js - Port 3000)                │
  └─────────────────────┬───────────────────────────────────────┘
                        │
                        ▼
  ┌─────────────────────────────────────────────────────────────┐
  │          Main API Service (FastAPI - Python 3.13)           │
  │  - User management, authentication                          │
  │  - Database operations (PostgreSQL)                         │
  │  - Job orchestration                                        │
  │  - WebSocket for real-time updates                          │
  │  Port: 8001                                                 │
  └─────────────────────┬───────────────────────────────────────┘
                        │
                        ▼
  ┌─────────────────────────────────────────────────────────────┐
  │                Message Queue (Redis/Celery)                 │
  │  - Task distribution                                        │
  │  - Job status tracking                                      │
  │  - Result aggregation                                       │
  └─────────────────────┬───────────────────────────────────────┘
                        │
       ┌────────────────┼────────────────┬────────────────┐
       ▼                ▼                ▼                ▼
┌──────────────┐ ┌──────────────┐ ┌──────────────┐ ┌──────────────┐
│ Music Agent  │ │ Vocal Agent  │ │ Mixing Agent │ │ Master Agent │
│ Python 3.11  │ │ Python 3.11  │ │ Python 3.11  │ │ Python 3.11  │
│ Port: 8002   │ │ Port: 8003   │ │ Port: 8004   │ │ Port: 8005   │
│              │ │              │ │              │ │              │
│ - MusicGen   │ │ - Bark       │ │ - Demucs     │ │ - Mastering  │
│ - AudioCraft │ │ - RVC        │ │ - Mixing     │ │ - Effects    │
│ - Encodec    │ │ - TTS        │ │ - Stems      │ │ - Normalize  │
└──────────────┘ └──────────────┘ └──────────────┘ └──────────────┘
```

## Benefits

### 1. **Dependency Isolation**

- Each agent has its own Python version
- No version conflicts between packages
- Easy to update individual components

### 2. **Scalability**

- Scale agents independently based on load
- Music generation heavy? Spin up more music agents
- Horizontal scaling per service

### 3. **Fault Tolerance**

- If one agent crashes, others continue working
- Retry failed tasks automatically
- Graceful degradation
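
As a rough illustration of "retry failed tasks automatically", the sketch below retries a call with exponential backoff. The function name `retry_with_backoff` and its parameters are assumptions made for this example, not part of the AudioForge codebase; in a Celery-based setup the queue's own retry options would likely take this role.

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")


def retry_with_backoff(
    task: Callable[[], T],
    max_attempts: int = 3,
    base_delay: float = 0.5,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Run `task`, retrying on any exception with exponentially growing delays."""
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    for attempt in range(1, max_attempts + 1):
        try:
            return task()
        except Exception:
            if attempt == max_attempts:
                raise  # out of attempts: surface the last error to the caller
            # Delays grow as base_delay, 2 * base_delay, 4 * base_delay, ...
            sleep(base_delay * 2 ** (attempt - 1))
    raise AssertionError("unreachable")
```

Passing `sleep` as a parameter keeps the helper easy to test without real waiting.
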

### 4. **Development Velocity**

- Teams can work on different agents independently
- Deploy agents separately
- Test in isolation

### 5. **Resource Optimization**

- GPU allocation per agent type
- CPU-only agents for lightweight tasks
- Memory limits per service

## Implementation Plan

### Phase 1: Create Agent Services (Week 1)

1. **Music Generation Agent** (`agents/music/`)
   - Python 3.11 environment
   - FastAPI service on port 8002
   - Endpoints: `/generate`, `/status`, `/health`
   - Dependencies: torch, audiocraft, transformers
2. **Vocal Generation Agent** (`agents/vocal/`)
   - Python 3.11 environment
   - FastAPI service on port 8003
   - Endpoints: `/generate`, `/status`, `/health`
   - Dependencies: bark, RVC, TTS libraries
3. **Post-Processing Agent** (`agents/processing/`)
   - Python 3.11 environment
   - FastAPI service on port 8004
   - Endpoints: `/mix`, `/separate`, `/master`, `/health`
   - Dependencies: demucs, librosa, pydub
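
The state behind an agent's `/generate`, `/status`, and `/health` endpoints could be as simple as an in-memory task table. The sketch below is framework-free and purely illustrative: `Task`, `TaskStore`, and their methods are hypothetical names, and a real agent would wrap something like this in FastAPI routes and persist state outside the process.

```python
import uuid
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass
class Task:
    task_id: str
    prompt: str
    duration: int
    status: str = "processing"
    audio_path: Optional[str] = None


class TaskStore:
    """In-memory bookkeeping for one agent (hypothetical helper)."""

    def __init__(self) -> None:
        self._tasks: Dict[str, Task] = {}

    def submit(self, prompt: str, duration: int) -> Task:
        # What a /generate handler might do before starting the model.
        task = Task(f"music_gen_{uuid.uuid4().hex[:6]}", prompt, duration)
        self._tasks[task.task_id] = task
        return task

    def complete(self, task_id: str, audio_path: str) -> None:
        # Called once generation has finished and the file is written.
        task = self._tasks[task_id]
        task.status = "completed"
        task.audio_path = audio_path

    def status(self, task_id: str) -> dict:
        # What a /status handler might return.
        task = self._tasks[task_id]
        return {
            "task_id": task.task_id,
            "status": task.status,
            "audio_path": task.audio_path,
        }

    def health(self) -> dict:
        # What a /health handler might return.
        active = sum(1 for t in self._tasks.values() if t.status == "processing")
        return {"status": "ok", "active_tasks": active}
```
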

### Phase 2: Update Main API (Week 1-2)

1. **Orchestrator Service** (`backend/app/services/orchestrator.py`)
   - Manages workflow across agents
   - Handles task distribution
   - Aggregates results
   - Error handling and retries
2. **Agent Communication** (`backend/app/clients/`)
   - HTTP clients for each agent
   - Async/await for non-blocking calls
   - Circuit breaker pattern
   - Health checks
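
The circuit breaker mentioned above can be sketched in a few lines. This is a minimal, synchronous version under stated assumptions: the class names, the threshold of consecutive failures, and the fixed reset timeout are all illustrative choices, and an async client would need an awaitable variant of `call`.

```python
import time
from typing import Callable, Optional, TypeVar

T = TypeVar("T")


class CircuitOpenError(RuntimeError):
    """Raised when calls are being rejected without reaching the agent."""


class CircuitBreaker:
    def __init__(
        self,
        failure_threshold: int = 3,
        reset_timeout: float = 30.0,
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self._failure_threshold = failure_threshold
        self._reset_timeout = reset_timeout
        self._clock = clock
        self._failures = 0
        self._opened_at: Optional[float] = None

    @property
    def is_open(self) -> bool:
        if self._opened_at is None:
            return False
        # Once reset_timeout has elapsed, let a trial call through (half-open).
        return self._clock() - self._opened_at < self._reset_timeout

    def call(self, func: Callable[[], T]) -> T:
        if self.is_open:
            raise CircuitOpenError("agent temporarily unavailable")
        try:
            result = func()
        except Exception:
            self._failures += 1
            if self._failures >= self._failure_threshold:
                self._opened_at = self._clock()  # open, or re-open after a failed trial
            raise
        # Any success closes the circuit and clears the failure count.
        self._failures = 0
        self._opened_at = None
        return result
```

One breaker instance per agent keeps a failing vocal agent from blocking calls to the music agent.
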

### Phase 3: Message Queue Integration (Week 2)

1. **Celery Tasks** (`backend/app/tasks/`)
   - Background job processing
   - Task routing to appropriate agents
   - Result callbacks
   - Progress tracking
2. **Redis Integration**
   - Job queue management
   - Status updates
   - Caching
   - Pub/Sub for real-time updates
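
Task routing to the appropriate agent can start as a plain lookup table. The sketch below reuses the ports and endpoints from Phase 1; the job-type keys, queue names, and container hostnames are assumptions for illustration only. With Celery, the queue name could be handed to `apply_async(queue=...)`.

```python
from typing import Dict, NamedTuple


class Route(NamedTuple):
    queue: str  # hypothetical queue name
    url: str    # agent endpoint that handles the job


# Hostnames are assumed; ports and paths follow the Phase 1 plan.
ROUTES: Dict[str, Route] = {
    "music": Route("music", "http://music-agent:8002/generate"),
    "vocal": Route("vocal", "http://vocal-agent:8003/generate"),
    "mix": Route("processing", "http://processing-agent:8004/mix"),
    "separate": Route("processing", "http://processing-agent:8004/separate"),
    "master": Route("processing", "http://processing-agent:8004/master"),
}


def route_job(job_type: str) -> Route:
    """Return the queue and endpoint for a job type, or fail loudly."""
    try:
        return ROUTES[job_type]
    except KeyError:
        raise ValueError(f"no agent registered for job type {job_type!r}") from None
```
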

### Phase 4: Docker Compose (Week 2-3)

```yaml
version: '3.8'

services:
  # Main API - Python 3.13
  api:
    build: ./backend
    ports: ["8001:8001"]
    depends_on: [postgres, redis]

  # Music Agent - Python 3.11
  music-agent:
    build: ./agents/music
    ports: ["8002:8002"]
    environment:
      - PYTHON_VERSION=3.11
      - TORCH_VERSION=2.1.0
    deploy:
      resources:
        reservations:
          devices:
            - driver: nvidia
              count: 1
              capabilities: [gpu]

  # Vocal Agent - Python 3.11
  vocal-agent:
    build: ./agents/vocal
    ports: ["8003:8003"]

  # Processing Agent - Python 3.11
  processing-agent:
    build: ./agents/processing
    ports: ["8004:8004"]

  # Infrastructure
  postgres:
    image: postgres:16-alpine

  redis:
    image: redis:7-alpine

  # Celery Workers
  celery-worker:
    build: ./backend
    command: celery -A app.tasks worker --loglevel=info
    depends_on: [redis]
```

## API Contract Example

### Main API → Music Agent

**Request:** `POST http://localhost:8002/generate`

```json
{
  "prompt": "Epic orchestral soundtrack",
  "duration": 30,
  "model": "facebook/musicgen-medium",
  "temperature": 1.0,
  "top_k": 250,
  "callback_url": "http://api:8001/callbacks/generation/123"
}
```

**Response:**

```json
{
  "task_id": "music_gen_abc123",
  "status": "processing",
  "estimated_time": 45
}
```

**Callback (when complete):** `POST http://api:8001/callbacks/generation/123`

```json
{
  "task_id": "music_gen_abc123",
  "status": "completed",
  "audio_path": "/storage/audio/music/abc123.wav",
  "metadata": {
    "duration": 30.5,
    "sample_rate": 32000,
    "model": "facebook/musicgen-medium"
  }
}
```
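
On the Main API side, the contract above could be mirrored by a small typed request object and a callback parser. This is a sketch only: `GenerateRequest`, `parse_callback`, the default values, and the validation rules are assumptions layered on the example payloads, not a confirmed schema.

```python
import json
from dataclasses import asdict, dataclass


@dataclass(frozen=True)
class GenerateRequest:
    prompt: str
    duration: int
    callback_url: str
    model: str = "facebook/musicgen-medium"
    temperature: float = 1.0
    top_k: int = 250

    def __post_init__(self) -> None:
        # Assumed sanity checks; the real limits are not specified.
        if not self.prompt.strip():
            raise ValueError("prompt must not be empty")
        if self.duration <= 0:
            raise ValueError("duration must be positive")

    def to_json(self) -> str:
        return json.dumps(asdict(self))


def parse_callback(body: str) -> dict:
    """Parse a callback body and check the fields used in the example."""
    payload = json.loads(body)
    missing = {"task_id", "status"} - payload.keys()
    if missing:
        raise ValueError(f"callback missing fields: {sorted(missing)}")
    if payload["status"] == "completed" and "audio_path" not in payload:
        raise ValueError("completed callback must include audio_path")
    return payload
```
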

## Migration Path

### Option A: Gradual Migration (Recommended)

1. Keep existing monolithic service running
2. Deploy music agent alongside
3. Route new requests to agent
4. Monitor and validate
5. Migrate other services one by one
6. Deprecate monolithic service
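
One way to route requests to the agent while the monolith keeps running is a percentage-based rollout keyed on the job ID. The plan above does not specify a rollout mechanism, so the approach, the function name `use_agent`, and the bucket scheme below are assumptions for illustration.

```python
import hashlib


def use_agent(job_id: str, rollout_percent: int) -> bool:
    """Return True when this job should be sent to the new agent."""
    if not 0 <= rollout_percent <= 100:
        raise ValueError("rollout_percent must be between 0 and 100")
    # Hash the job ID so the same job always lands in the same bucket.
    digest = hashlib.sha256(job_id.encode("utf-8")).digest()
    bucket = int.from_bytes(digest[:4], "big") % 100  # stable bucket in 0..99
    return bucket < rollout_percent
```

Because the bucket is stable, raising the percentage only adds jobs to the agent path and never moves a job back to the monolith.
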

### Option B: Big Bang Migration

1. Build all agents
2. Test thoroughly in staging
3. Switch over in one deployment

Trade-off: higher risk, but faster completion.

## Monitoring & Observability

### Metrics to Track

- Request latency per agent
- Success/failure rates
- Queue depth
- Agent health status
- Resource utilization (CPU/GPU/Memory)
- Generation time per model
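
Before a full metrics stack is in place, per-agent latency and success rates can be tracked with a small in-process recorder. The sketch below is illustrative only: `AgentStats` and `MetricsRecorder` are hypothetical names, and a production setup would export these values to a metrics system rather than hold them in memory.

```python
from collections import defaultdict
from dataclasses import dataclass, field
from typing import DefaultDict, List


@dataclass
class AgentStats:
    latencies: List[float] = field(default_factory=list)
    failures: int = 0

    @property
    def requests(self) -> int:
        return len(self.latencies)

    @property
    def success_rate(self) -> float:
        if not self.latencies:
            return 0.0
        return (self.requests - self.failures) / self.requests

    @property
    def mean_latency(self) -> float:
        if not self.latencies:
            return 0.0
        return sum(self.latencies) / self.requests


class MetricsRecorder:
    """Collects one AgentStats entry per agent name."""

    def __init__(self) -> None:
        self.agents: DefaultDict[str, AgentStats] = defaultdict(AgentStats)

    def record(self, agent: str, seconds: float, ok: bool) -> None:
        stats = self.agents[agent]
        stats.latencies.append(seconds)
        if not ok:
            stats.failures += 1
```
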

### Tools

- Prometheus for metrics
- Grafana for dashboards
- Jaeger for distributed tracing
- structlog for structured logs, forwarded to a central log store

## Cost Considerations

### Infrastructure

- **Current:** 1 server with all dependencies
- **Agent:** Multiple smaller services
- **Savings:** Scale only what you need

### Development

- **Initial:** Higher (build agents)
- **Ongoing:** Lower (easier maintenance)
- **Team:** Can parallelize work

## Alternative: Subprocess Approach

If full microservices is too heavy, consider:

```python
# backend/app/services/music_generation.py
import asyncio
import json


class MusicGenerationService:
    def __init__(self):
        # Path to a Python 3.11 interpreter; adjust for the host platform.
        self.python311 = "C:/Python311/python.exe"
        self.agent_script = "./agents/music_agent.py"

    async def generate(self, prompt: str, duration: int):
        # Run the agent under Python 3.11 without blocking the event loop.
        proc = await asyncio.create_subprocess_exec(
            self.python311,
            self.agent_script,
            "--prompt", prompt,
            "--duration", str(duration),
            stdout=asyncio.subprocess.PIPE,
            stderr=asyncio.subprocess.PIPE,
        )
        stdout, stderr = await proc.communicate()
        if proc.returncode != 0:
            raise RuntimeError(f"Music agent failed: {stderr.decode()}")
        # The agent script is expected to print a single JSON document.
        return json.loads(stdout)
```

**Pros:** Simpler, no network overhead

**Cons:** Harder to scale, less fault-tolerant
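
For the subprocess approach to work, the script run under Python 3.11 has to accept `--prompt` and `--duration` and print one JSON document to stdout. A possible skeleton for `agents/music_agent.py` is sketched below; the function names and the shape of the returned JSON are assumptions, and the model call is left as a placeholder.

```python
import argparse
import json
from typing import List, Optional


def parse_args(argv: Optional[List[str]] = None) -> argparse.Namespace:
    parser = argparse.ArgumentParser(description="Music agent (sketch)")
    parser.add_argument("--prompt", required=True)
    parser.add_argument("--duration", type=int, required=True)
    return parser.parse_args(argv)


def run(prompt: str, duration: int) -> dict:
    # Placeholder: the real script would load the model and write audio here.
    return {"status": "completed", "prompt": prompt, "duration": duration}


def main(argv: Optional[List[str]] = None) -> int:
    args = parse_args(argv)
    # Print exactly one JSON document; send any logging to stderr instead,
    # so the parent process can parse stdout directly.
    print(json.dumps(run(args.prompt, args.duration)))
    return 0
```

The real script would end by calling `main()` under the usual `if __name__ == "__main__":` guard.
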

## Recommendation

**Start with Agent Architecture** because:

1. ✅ Solves Python version issues permanently
2. ✅ Better scalability for future growth
3. ✅ Industry standard for ML services
4. ✅ Easier to add new models/features
5. ✅ Better resource utilization
6. ✅ Aligns with modern cloud-native patterns

## Next Steps

1. Create `agents/` directory structure
2. Build Music Agent first (highest priority)
3. Update orchestrator to call agent
4. Test end-to-end workflow
5. Deploy to staging
6. Monitor and iterate

## Timeline Estimate

- **Week 1:** Music Agent + Orchestrator updates
- **Week 2:** Vocal & Processing Agents + Celery
- **Week 3:** Docker Compose + Testing
- **Week 4:** Production deployment + Monitoring

**Total:** 3-4 weeks for full implementation