# Medium-MCP Architecture

## System Overview

```
┌─────────────────────────────────────────────────────────────┐
│                        Entry Points                         │
├─────────────────┬─────────────────┬─────────────────────────┤
│   server.py     │    app.py       │         CLI             │
│  (MCP Server)   │  (Gradio UI)    │      (Commands)         │
└────────┬────────┴────────┬────────┴────────┬────────────────┘
         │                 │                 │
         ▼                 ▼                 ▼
┌─────────────────────────────────────────────────────────────┐
│                       ScraperService                        │
│  ┌─────────────────────────────────────────────────────┐    │
│  │                     Tier Chain                      │    │
│  │  ┌────────┐ ┌────────┐ ┌────────┐ ┌────────┐        │    │
│  │  │ Cache  │→│GraphQL │→│ HTTPX  │→│Browser │→...    │    │
│  │  │  (T0)  │ │ (T10)  │ │ (T20)  │ │ (T30)  │        │    │
│  │  └────────┘ └────────┘ └────────┘ └────────┘        │    │
│  └─────────────────────────────────────────────────────┘    │
└─────────────────────────────────────────────────────────────┘
         │                 │                 │
         ▼                 ▼                 ▼
  ┌─────────────┐   ┌─────────────┐   ┌─────────────┐
  │  database   │   │  extractor  │   │   parser    │
  │  (SQLite)   │   │ (Apollo/LD) │   │   (HTML)    │
  └─────────────┘   └─────────────┘   └─────────────┘
```

---

## Core Modules

### Entry Points

| Module | Purpose |
|--------|---------|
| `server.py` | MCP server (FastMCP) |
| `app.py` | Gradio web UI |

### Service Layer

| Module | Purpose |
|--------|---------|
| `src/service.py` | ScraperService orchestration |
| `src/tiers/` | Tier chain pattern |

### Data Layer

| Module | Purpose |
|--------|---------|
| `src/database.py` | SQLite caching |
| `src/extractor.py` | Content extraction |
| `src/parser.py` | HTML parsing |

### Infrastructure

| Module | Purpose |
|--------|---------|
| `src/http_pool.py` | Connection pooling |
| `src/resilience.py` | Circuit breaker |
| `src/validation.py` | Input validation |
| `src/security.py` | Rate limiting |

---

## Tier Chain

| Priority | Tier | Description |
|----------|------|-------------|
| 0 | Cache | SQLite lookup |
| 10 | GraphQL | Medium API |
| 20 | HTTPX | Fast HTTP |
| 30 | Browser | Playwright |
| 40 | Wayback | Archive.org |

Each tier returns `TierResult(success, data, error)`.

---

## Data Flow

```
URL → validate → resolve_id → tier_chain → extract → cache → render
```

1. **Validate**: Check URL format
2. **Resolve**: Extract post ID
3. **Tier Chain**: Try each tier until success
4. **Extract**: Parse Apollo/JSON-LD
5. **Cache**: Store in SQLite
6. **Render**: Output markdown/HTML

---

## Package Structure

```
Medium-MCP/
├── src/
│   ├── tiers/           # Tier implementations
│   ├── validation.py    # Input validation
│   ├── security.py      # Rate limiting
│   ├── exceptions.py    # Error handling
│   └── metrics.py       # Performance metrics
├── mcp/
│   ├── schemas.py       # Pydantic models
│   └── tools/           # Tool implementations
├── ui/
│   └── styles/          # CSS modules
├── tests/
│   ├── unit/
│   ├── integration/
│   └── e2e/
└── docs/
```
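
---

## Tier Chain Sketch

The tier chain (priorities 0 through 40, each tier returning `TierResult(success, data, error)`) can be illustrated with a minimal Python sketch. This is not the project's actual code: only the `TierResult` field names come from this document. The `Tier` descriptor, `run_tier_chain`, and the two stand-in fetch functions are hypothetical names introduced for illustration, and the sketch assumes that a lower priority number means the tier is tried earlier, as the diagram suggests.

```python
from dataclasses import dataclass
from typing import Any, Callable, Optional, Sequence


@dataclass
class TierResult:
    """Outcome of one tier attempt; field names follow the document."""
    success: bool
    data: Optional[Any] = None
    error: Optional[str] = None


@dataclass
class Tier:
    """Hypothetical descriptor: a name, a priority, and a fetch callable."""
    name: str
    priority: int
    fetch: Callable[[str], TierResult]


def run_tier_chain(tiers: Sequence[Tier], post_id: str) -> TierResult:
    """Try tiers in ascending priority order and return the first success."""
    errors = []
    for tier in sorted(tiers, key=lambda t: t.priority):
        try:
            result = tier.fetch(post_id)
        except Exception as exc:
            # Assumption: a crashing tier should not abort the whole chain.
            result = TierResult(success=False, error=str(exc))
        if result.success:
            return result
        errors.append(f"{tier.name}: {result.error}")
    # Every tier failed: report all collected errors in one result.
    return TierResult(success=False, error="; ".join(errors))


# Stand-in tiers for illustration only; real tiers would query SQLite,
# the Medium API, and so on.
def cache_miss(post_id: str) -> TierResult:
    return TierResult(False, error="not cached")


def graphql_ok(post_id: str) -> TierResult:
    return TierResult(True, data={"id": post_id, "source": "graphql"})


# Registration order does not matter; the chain sorts by priority.
chain = [Tier("GraphQL", 10, graphql_ok), Tier("Cache", 0, cache_miss)]
result = run_tier_chain(chain, "abc123")
print(result.success, result.data)
```

Sorting by priority at call time keeps tier registration order irrelevant, and collecting per-tier errors makes a total failure easier to diagnose than returning only the last error.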