# DocGenie Architecture & Dependency Resolution
## Package Structure
```
docgenie/                          ← Root monorepo
├── docgenie/                      ← Core package (importable)
│   ├── __init__.py
│   ├── generation/                ← Used by API
│   │   ├── pipeline_01/
│   │   │   └── claude_batching.py ← ClaudeBatchedClient
│   │   ├── pipeline_03/
│   │   ├── pipeline_04/
│   │   └── utils/
│   ├── evaluation/
│   └── utils/
│
├── api/                           ← API Service (imports docgenie.*)
│   ├── main.py                    from docgenie import ENV
│   ├── worker.py                  from docgenie.generation.pipeline_01...
│   ├── utils.py                   from docgenie.generation...
│   └── requirements.txt           Extra: Redis, Supabase, Google
│
├── handwriting_service/           ← GPU Service (NO docgenie imports!)
│   ├── main.py                    ← Self-contained
│   ├── inference.py               ← No external deps
│   └── models.py
│
└── WordStylist/                   ← Model code (used by handwriting)
```
## Dependency Graph
```
┌─────────────────────────────────────────────────────────────┐
│                         API Service                         │
│  ┌──────────────────────────────────────────────────────┐   │
│  │ api/main.py                                          │   │
│  │   ↓ imports                                          │   │
│  │ api/utils.py (call_claude_api_direct)                │   │
│  └──────────────────────────────────────────────────────┘   │
│                                                             │
│  ┌──────────────────────────────────────────────────────┐   │
│  │ api/worker.py                                        │   │
│  │   ↓ imports                                          │   │
│  │ from docgenie.generation.pipeline_01.claude_batching │   │
│  │ from docgenie.generation.constants                   │   │
│  │ from docgenie.generation.pipeline_03_process_response│   │
│  │ from docgenie.generation.pipeline_04_render_pdf...   │   │
│  │ from docgenie import ENV                             │   │
│  └──────────────────────────────────────────────────────┘   │
│                              ↓                              │
│                          REQUIRES                           │
│  ┌──────────────────────────────────────────────────────┐   │
│  │                  docgenie/ package                   │   │
│  │              (entire generation module)              │   │
│  └──────────────────────────────────────────────────────┘   │
└─────────────────────────────────────────────────────────────┘

┌─────────────────────────────────────────────────────────────┐
│                     Handwriting Service                     │
│  ┌──────────────────────────────────────────────────────┐   │
│  │ handwriting_service/main.py                          │   │
│  │   ↓ imports                                          │   │
│  │ from handwriting_service.inference import ...        │   │
│  │ from handwriting_service.models import ...           │   │
│  └──────────────────────────────────────────────────────┘   │
│                              ↓                              │
│                          REQUIRES                           │
│  ┌──────────────────────────────────────────────────────┐   │
│  │                  WordStylist/ model                  │   │
│  │                (diffusion model code)                │   │
│  └──────────────────────────────────────────────────────┘   │
│                                                             │
│  NO docgenie imports - completely independent!              │
└─────────────────────────────────────────────────────────────┘
```
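The independence shown in the graph can be checked mechanically. The sketch below uses the standard-library `ast` module to list the top-level packages a source file imports. The helper name `top_level_imports` and the two sample source strings are illustrative assumptions, not code from the repository.

```python
import ast


def top_level_imports(source: str) -> set[str]:
    """Return the top-level package names imported by a piece of Python source."""
    names = set()
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Import):
            # "import a.b.c" -> "a"
            names.update(alias.name.split(".")[0] for alias in node.names)
        elif isinstance(node, ast.ImportFrom) and node.level == 0 and node.module:
            # "from a.b import c" -> "a"; relative imports (level > 0) are skipped
            names.add(node.module.split(".")[0])
    return names


# Illustrative stand-ins for api/worker.py and handwriting_service/main.py.
worker_src = "from docgenie.generation.constants import X\nfrom docgenie import ENV\n"
handwriting_src = "from handwriting_service.inference import HandwritingGenerator\n"

print("docgenie" in top_level_imports(worker_src))       # True
print("docgenie" in top_level_imports(handwriting_src))  # False
```

Running the same helper over every file in `handwriting_service/` would be one way to keep the "no docgenie imports" rule from regressing.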
## Docker Build Strategy
### ❌ What Doesn't Work
```dockerfile
# ❌ WRONG: Can't copy just api/ folder
FROM python:3.11
# Missing docgenie package!
COPY api/ /app/api/
RUN pip install -r requirements.txt
# Fails at startup with ImportError: No module named 'docgenie'
CMD ["uvicorn", "main:app"]
```
### ✅ What Works
```dockerfile
# ✅ CORRECT: Copy entire monorepo
FROM python:3.11
WORKDIR /app
# Copy everything
COPY . .
# Install docgenie as package (makes docgenie.* importable)
RUN pip install -e .
# Install API requirements
RUN pip install -r api/requirements.txt
WORKDIR /app/api
# docgenie imports work here
CMD ["uvicorn", "main:app"]
```
## Deployment Strategy Comparison
### Option 1: Separate Deployments (❌ Won't Work)
```
API Deployment:
├── api/ folder only
└── ❌ Missing docgenie package → ImportError

Handwriting Deployment:
├── handwriting_service/ folder
└── WordStylist/
```
**Problem:** The API deployment contains no `docgenie` package, so every `docgenie.*` import fails!
### Option 2: Monorepo Deployment (✅ Works)
```
API Deployment:
├── docgenie/ package (core)
├── api/ service (imports docgenie)
├── setup.py
└── requirements.txt

Handwriting Deployment:
├── handwriting_service/
└── WordStylist/
```
**Solution:** Deploy the entire repo for the API, and deploy the handwriting service standalone!
## File Structure in Containers
### API Container (Railway/EC2)
```
/app/
├── docgenie/          ← Installed as Python package
│   ├── __init__.py
│   ├── generation/
│   └── utils/
├── api/               ← Working directory
│   ├── main.py
│   ├── worker.py
│   └── utils.py
├── setup.py
└── pyproject.toml

Python can import:
✓ from docgenie.generation.pipeline_01 import ...
✓ from docgenie import ENV
```
### Handwriting Container (RunPod)
```
/app/
├── handwriting_service/
│   ├── main.py        ← No docgenie imports!
│   ├── inference.py
│   └── models.py
└── WordStylist/       ← Model code
    ├── ldm/
    └── wordstylist_inference.py

Python can import:
✓ from handwriting_service.inference import ...
✓ No docgenie dependencies needed!
```
## Import Resolution Flow
### API Service Import Chain
1. **FastAPI starts:**
```bash
uvicorn main:app
```
2. **main.py imports utils:**
```python
from api.utils import call_claude_api_direct
```
3. **utils.py imports docgenie:**
```python
from docgenie.generation.pipeline_01.claude_batching import ClaudeBatchedClient
```
4. **Python looks for docgenie:**
- Checks sys.path
- Finds `/app`, the project directory that `pip install -e .` registered on the import path
- Loads `docgenie/__init__.py`
- ✓ Import succeeds!
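To see where a package would be loaded from without importing it, the lookup in step 4 can be reproduced with `importlib.util.find_spec`. This is a minimal sketch: the helper name `locate` is an assumption, and `json` stands in for `docgenie` so the snippet runs outside the container.

```python
import importlib.util


def locate(module_name: str):
    """Return the file a top-level module would load from, or None if unresolvable."""
    spec = importlib.util.find_spec(module_name)
    if spec is None:
        return None
    # origin is a file path for regular packages; it can be None for namespace packages.
    return spec.origin


# "json" stands in for "docgenie" so the sketch runs anywhere.
print(locate("json") is not None)                 # True
print(locate("package_that_does_not_exist_xyz"))  # None
```

Inside the API container, `locate("docgenie")` would be expected to return a path under `/app`. A result of `None` would point to a missing or failed `pip install -e .` step.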
### Handwriting Service Import Chain
1. **FastAPI starts:**
```bash
uvicorn main:app
```
2. **main.py imports local modules:**
```python
from handwriting_service.inference import HandwritingGenerator
```
3. **inference.py imports WordStylist:**
```python
import sys
from pathlib import Path
sys.path.insert(0, str(Path(__file__).parent.parent / "WordStylist"))
from ldm.models.diffusion.ddpm import LatentDiffusion
```
4. **Python loads local modules:**
- No external package dependencies
- ✓ Completely self-contained!
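The `sys.path.insert` pattern from step 3 can be tried in isolation. The sketch below builds a throwaway directory with a tiny package and imports it the same way. The names `VendoredModel` and `fake_ldm` are hypothetical stand-ins for `WordStylist/` and `ldm`.

```python
import importlib
import sys
import tempfile
from pathlib import Path

with tempfile.TemporaryDirectory() as tmp:
    # Hypothetical stand-in for the WordStylist/ directory and its ldm package.
    vendored = Path(tmp) / "VendoredModel"
    (vendored / "fake_ldm").mkdir(parents=True)
    (vendored / "fake_ldm" / "__init__.py").write_text("NAME = 'fake_ldm'\n")

    # Same pattern as inference.py: put the directory at the front of sys.path.
    sys.path.insert(0, str(vendored))
    try:
        importlib.invalidate_caches()
        module = importlib.import_module("fake_ldm")
        loaded_name = module.NAME
    finally:
        # Undo the path change so later imports are unaffected.
        sys.path.remove(str(vendored))

print(loaded_name)  # fake_ldm
```

Inserting at index 0 means the vendored directory takes precedence over any installed package with the same name. That is convenient for bundled model code, but it can shadow other packages.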
## Verifying Imports
### Test API Imports
```bash
# Inside API container
python3 -c "from docgenie.generation.pipeline_01.claude_batching import ClaudeBatchedClient; print('✓ Import works!')"
```
### Test Handwriting Imports
```bash
# Inside handwriting container
python3 -c "from handwriting_service.inference import HandwritingGenerator; print('✓ Import works!')"
```
## Key Insights
1. **API needs monorepo:** Must deploy entire `docgenie/` folder structure
2. **Handwriting is independent:** Can deploy just `handwriting_service/` + `WordStylist/`
3. **Docker layer caching:** Install docgenie package first, then API requirements
4. **Working directory matters:** Set WORKDIR to /app/api for API service
5. **Python package installation:** `pip install -e .` makes `docgenie` importable from any working directory
## Deployment Size Comparison
| Deployment | Size | Contents |
|------------|------|----------|
| API (Railway) | ~2GB | Python 3.11 + docgenie + API deps + Playwright |
| Worker (Railway) | ~2GB | Same as API (shares image) |
| Handwriting (RunPod) | ~8GB | CUDA 11.8 + PyTorch + Diffusers + WordStylist |
**Total:** ~12GB counted per deployment (the API and Worker share one image, and each image is cached independently)
## ✅ Checklist for Successful Deployment
- [ ] Dockerfile copies **entire monorepo** for API
- [ ] `pip install -e .` runs before API requirements
- [ ] WORKDIR set to /app/api for runtime
- [ ] Handwriting Dockerfile copies only handwriting_service/ + WordStylist/
- [ ] .dockerignore excludes data/ folders (too large)
- [ ] Environment variables set in Railway/EC2
- [ ] Redis URL points to Upstash
- [ ] HANDWRITING_SERVICE_URL points to RunPod endpoint
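The environment-variable items on the checklist could be verified at startup. This is a minimal sketch: `HANDWRITING_SERVICE_URL` comes from the checklist, while `REDIS_URL` and the helper name `missing_vars` are assumptions. The actual variable names depend on how the services read their configuration.

```python
# HANDWRITING_SERVICE_URL comes from the checklist; REDIS_URL is an assumed name.
REQUIRED_VARS = ("REDIS_URL", "HANDWRITING_SERVICE_URL")


def missing_vars(env, required=REQUIRED_VARS):
    """Return the required variable names that are unset or blank in env."""
    return [name for name in required if not env.get(name, "").strip()]


# Sample mapping in place of os.environ so the sketch is deterministic.
sample = {"REDIS_URL": "rediss://example.invalid:6379", "HANDWRITING_SERVICE_URL": ""}
print(missing_vars(sample))  # ['HANDWRITING_SERVICE_URL']
```

In a real service the call would be `missing_vars(os.environ)`, aborting startup when the returned list is non-empty.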
## Result
```
✓ API can import from docgenie package
✓ Worker can use ClaudeBatchedClient
✓ Handwriting service runs independently
✓ All services communicate via HTTP
✓ No more ImportError!
```
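Because the services talk over HTTP rather than through shared imports, the API only needs the handwriting service's URL. The sketch below builds such a request with the standard library but does not send it. The `/generate` path, the `{"text": ...}` payload, and the helper name are hypothetical placeholders, since the real endpoint is not described here.

```python
import json
import urllib.request


def build_handwriting_request(base_url: str, text: str) -> urllib.request.Request:
    """Build (but do not send) a JSON POST to the handwriting service."""
    # "/generate" and the {"text": ...} payload are hypothetical placeholders.
    url = base_url.rstrip("/") + "/generate"
    body = json.dumps({"text": text}).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


request = build_handwriting_request("https://handwriting.example.invalid/", "hello")
print(request.get_method(), request.full_url)
# POST https://handwriting.example.invalid/generate
```

Sending it would be a call to `urllib.request.urlopen(request, timeout=...)`, with the base URL taken from `HANDWRITING_SERVICE_URL`.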