# Deployment Guide

## Launching DeepBoner: Gradio, MCP, & Modal

---

## Overview

DeepBoner uses a multi-platform deployment strategy to maximize hackathon impact:

1. **HuggingFace Spaces**: Hosts the Gradio UI.
2. **MCP Server**: Exposes the research tools to Claude Desktop and other agents.
3. **Modal (Optional)**: Runs heavy inference or self-hosted LLMs if API costs are prohibitive.

---

## 1. HuggingFace Spaces (Gradio UI)

**Goal**: A public URL where judges/users can try the research agent.

### Prerequisites

- HuggingFace account
- `gradio` installed (`uv add gradio`)

### Steps

1. **Create Space**:
   - Go to HF Spaces -> Create New Space.
   - SDK: **Gradio**.
   - Hardware: **CPU Basic** (Free) is sufficient because inference runs through external APIs.
2. **Prepare Files**:
   - Ensure `app.py` constructs the Gradio interface.
   - Ensure `requirements.txt` lists all dependencies. Spaces installs Python packages from this file, so keep it in sync with `pyproject.toml`.
3. **Secrets**:
   - Go to Space Settings -> **Repository secrets**.
   - Add `ANTHROPIC_API_KEY` (or the key for your chosen LLM provider).
   - Add `BRAVE_API_KEY` (for web search).
4. **Deploy**:
   - Push code to the Space's git repo.
   - Watch the "Build" logs for errors.

### Streaming Optimization

Ensure `app.py` uses a generator function for the chat interface, so partial results stream to the browser instead of the request timing out:

```python
# app.py
# ResearchAgent is the project's agent class; import it from its module.
def predict(message, history):
    """Generator handler: Gradio streams each yielded value to the chat UI."""
    agent = ResearchAgent()
    for update in agent.research_stream(message):
        yield update
```

---

## 2. MCP Server Deployment

**Goal**: Allow other agents (like Claude Desktop) to use our PubMed/Research tools directly.

### Local Usage (Claude Desktop)

1. **Install**:

   ```bash
   uv sync
   ```

2. **Configure Claude Desktop**: Edit `~/Library/Application Support/Claude/claude_desktop_config.json` (the macOS location; on Windows the file is `%APPDATA%\Claude\claude_desktop_config.json`):

   ```json
   {
     "mcpServers": {
       "deepboner": {
         "command": "uv",
         "args": ["run", "fastmcp", "run", "src/mcp_servers/pubmed_server.py"],
         "cwd": "/absolute/path/to/DeepBoner"
       }
     }
   }
   ```

   If your Claude Desktop version ignores `cwd`, pass the project path to `uv` instead by starting `args` with `"--directory", "/absolute/path/to/DeepBoner"`.

3. **Restart Claude**: You should see a 🔌 icon indicating connected tools.

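The Claude Desktop configuration step above can also be scripted, which avoids overwriting other servers already present in the file. The following is a minimal sketch, not part of the project: the helper names `add_mcp_server` and `update_config_file` are hypothetical, and the entry it writes simply mirrors the JSON shown in the guide (including the `cwd` key, whose handling may depend on the Claude Desktop version).

```python
import json
from pathlib import Path


def add_mcp_server(config: dict, name: str, project_dir: str, server_path: str) -> dict:
    """Return a copy of `config` with one MCP server entry added or replaced."""
    merged = dict(config)
    # Copy the existing server table so other entries are preserved.
    servers = dict(merged.get("mcpServers", {}))
    servers[name] = {
        "command": "uv",
        "args": ["run", "fastmcp", "run", server_path],
        "cwd": project_dir,
    }
    merged["mcpServers"] = servers
    return merged


def update_config_file(config_file: Path, project_dir: str) -> None:
    """Read the config file if it exists, add the DeepBoner entry, write it back."""
    existing = json.loads(config_file.read_text()) if config_file.exists() else {}
    updated = add_mcp_server(
        existing, "deepboner", project_dir, "src/mcp_servers/pubmed_server.py"
    )
    config_file.write_text(json.dumps(updated, indent=2))
```

Calling `update_config_file` with the config path from step 2 and the absolute path to the checkout would add the `deepboner` entry; Claude Desktop still needs a restart afterwards.
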
### Remote Deployment (Smithery/Glama)

*Target for "MCP Track" bonus points.*

1. **Dockerize**: Create a `Dockerfile` for the MCP server.

   ```dockerfile
   FROM python:3.11-slim
   # Run from /app so the relative server path in CMD resolves.
   WORKDIR /app
   COPY . .
   RUN pip install fastmcp httpx
   CMD ["fastmcp", "run", "src/mcp_servers/pubmed_server.py", "--transport", "sse"]
   ```

   *Note: Use SSE transport for remote/HTTP servers.* The server must also listen on an address reachable from outside the container; check `fastmcp run --help` for the host and port options in your installed version.

2. **Deploy**: Host on Fly.io or Railway.

---

## 3. Modal (GPU/Heavy Compute)

**Goal**: Run a self-hosted open-weight LLM (e.g., Llama-3-70B) or handle massive parallel searches if APIs are too slow/expensive.

### Setup

1. **Install**: `uv add modal`
2. **Auth**: `modal token new`

### Logic

Instead of calling the Anthropic API, we call a Modal function:

```python
# src/llm/modal_client.py
import modal

# modal.App replaces the older modal.Stub name, which was deprecated and later removed.
app = modal.App("deepboner-inference")


@app.function(gpu="A100")
def generate_text(prompt: str):
    # Load vLLM or similar
    ...
```

A 70B-parameter model needs roughly 140 GB for 16-bit weights, which is more than a single A100 provides, so plan for quantization or multiple GPUs.

### When to use?

- **Hackathon Demo**: Stick to Anthropic/OpenAI APIs for speed/reliability.
- **Production/Stretch**: Use Modal if you hit rate limits or want to show off "Open Source Models" capability.

---

## Deployment Checklist

### Pre-Flight

- [ ] Run `pytest -m unit` to ensure logic is sound.
- [ ] Run `pytest -m e2e` (one pass) to verify APIs connect.
- [ ] Check `requirements.txt` matches `pyproject.toml`.

### Secrets Management

- [ ] **NEVER** commit `.env` files.
- [ ] Verify keys are added to HF Space settings.

### Post-Launch

- [ ] Test the live URL.
- [ ] Verify the "Stop" button in Gradio works (interrupts the agent).
- [ ] Record a walkthrough video (crucial for hackathon submission).

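The secrets items in the checklist can be verified automatically before a push or at app startup. The following is a minimal sketch under stated assumptions: the helper name `missing_secrets` is hypothetical, and the required keys are the two named in the HuggingFace Spaces section (`ANTHROPIC_API_KEY` and `BRAVE_API_KEY`); adjust the tuple if you use a different LLM provider.

```python
from typing import Iterable, List, Mapping

# Assumed from the Spaces section of this guide; change to match your provider.
REQUIRED_SECRETS = ("ANTHROPIC_API_KEY", "BRAVE_API_KEY")


def missing_secrets(
    env: Mapping[str, str], required: Iterable[str] = REQUIRED_SECRETS
) -> List[str]:
    """Return the names in `required` that are unset or blank in `env`."""
    return [name for name in required if not env.get(name, "").strip()]
```

Passing `os.environ` as `env` checks the running process, for example at the top of `app.py`, so a Space with a missing key fails with a clear message instead of an opaque API error on the first request.
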