# Complete System Architecture

> **Document purpose:** System-wide architecture covering backend, frontend, database, deployment, and all component interactions. Read this to understand how all pieces fit together.

---

## Table of Contents

1. [High-Level System Diagram](#high-level-system-diagram)
2. [Backend Architecture (src/)](#backend-architecture-src--agent-based-design)
3. [Frontend Architecture (frontend/)](#frontend-architecture-frontend)
4. [API Contract](#api-contract)
5. [Data Flow](#data-flow)
6. [Deployment Architecture](#deployment-architecture)

---

## High-Level System Diagram

```
┌────────────────────────────────────────────────────────────────────────────────┐
│                           EXTERNAL DATA SOURCES                                │
│  ┌────────┐ ┌────────────┐ ┌────────┐ ┌───────┐ ┌──────┐ ┌──────────────────┐  │
│  │ Notion │ │ Confluence │ │ GitHub │ │ Slack │ │ Jira │ │ URLs + Firecrawl │  │
│  └────────┘ └────────────┘ └────────┘ └───────┘ └──────┘ └──────────────────┘  │
└────────────────────┬───────────────────────────────────────────────────────────┘
                     │ (Webhooks + Polling)
                     ▼
┌────────────────────────────────────────────────────────────────────────────────┐
│                            BACKEND (Python/FastAPI)                            │
│  ┌────────────────┐  ┌──────────────────┐  ┌──────────────────────────────┐   │
│  │  Data Ingestion │  │ RAG + Retrieval  │  │  Analytics & Intelligence   │   │
│  │  ├─ Adapters    │  │ ├─ Hybrid search │  │  ├─ Query events           │   │
│  │  ├─ Docling     │  │ ├─ BGE-M3        │  │  ├─ Knowledge graph        │   │
│  │  ├─ GLiNER PII  │  │ ├─ Qdrant        │  │  └─ Anomaly detection      │   │
│  │  └─ Chunking    │  │ └─ LLM agents    │  └──────────────────────────────┘   │
│  └────────────────┘  └──────────────────┘                                      │
│         ▲                      ▲                          ▲                     │
│         │                      │                          │                     │
│  ┌──────┴──────────────────────┴──────────────────────────┴────────────┐      │
│  │              FastAPI Backend (Uvicorn)                             │      │
│  │              ├─ /api/query/* (search + follow-up)                 │      │
│  │              ├─ /api/analytics/* (dashboards)                    │      │
│  │              ├─ /api/admin/* (data source management)            │      │
│  │              └─ /ws (WebSocket for real-time alerts)             │      │
│  └──────┬──────────────────────────────────────────────────────────┘      │
│         │                                                                   │
│  ┌──────▼────────────────────────────────────────────────────────┐        │
│  │     Data Layer (PostgreSQL, Qdrant, Neo4j, Redis, S3)         │        │
│  │  ├─ PostgreSQL: Metadata, RBAC, audit trails, queries        │        │
│  │  ├─ Qdrant: Vector embeddings (dense + sparse)              │        │
│  │  ├─ Neo4j: Knowledge graph (Service/Library/Incident/Team)  │        │
│  │  ├─ Redis: Cache, session state, pub/sub, task queues       │        │
│  │  └─ S3: PDFs, user uploads, exports                          │        │
│  └──────┬────────────────────────────────────────────────────────┘        │
└─────────┼────────────────────────────────────────────────────────────────────┘
          │
          │ (REST API + WebSocket)
          ▼
┌────────────────────────────────────────────────────────────────────────────────┐
│                        FRONTEND (React/TypeScript)                             │
│  ┌───────────────────────┐  ┌──────────────────┐  ┌────────────────────────┐  │
│  │  Query Interface      │  │  Dashboards      │  │  Admin UI              │  │
│  │  ├─ Search box        │  │  ├─ Query trends │  │  ├─ Data source mgmt   │  │
│  │  ├─ Results display   │  │  ├─ Knowledge    │  │  ├─ User management    │  │
│  │  ├─ Citations         │  │  │   health      │  │  ├─ RBAC editor        │  │
│  │  ├─ Follow-ups        │  │  ├─ Dependencies │  │  ├─ API keys           │  │
│  │  └─ Knowledge graph   │  │  └─ Alerts       │  │  └─ System health      │  │
│  └───────────────────────┘  └──────────────────┘  └────────────────────────┘  │
│  ┌──────────────────────────────────────────────────────────────────────────┐  │
│  │  Component Layer (shadcn/ui + Tailwind)                                 │  │
│  │  ├─ Query & Search components                                           │  │
│  │  ├─ Chart & data table components (Recharts, TanStack Table)           │  │
│  │  ├─ Knowledge graph visualizer (Force-Graph)                           │  │
│  │  ├─ Authentication flow (JWT)                                          │  │
│  │  └─ Real-time notifications (WebSocket)                               │  │
│  └──────────────────────────────────────────────────────────────────────────┘  │
│  ┌──────────────────────────────────────────────────────────────────────────┐  │
│  │  State Management (TanStack Query + Zustand)                            │  │
│  │  ├─ Server state: Queries, analytics, user data (TanStack Query)      │  │
│  │  └─ Client state: UI state, theme, filters (Zustand)                  │  │
│  └──────────────────────────────────────────────────────────────────────────┘  │
└────────────────────────────────────────────────────────────────────────────────┘
```

---

## Backend Architecture (src/) — Agent-Based Design

### Core Principle: Per-Source Agents

Rather than generic adapters flowing through a single pipeline, each data source is an **independent agent** with:
- Source-specific authentication & adapters
- Source-optimized chunking (preserves context like Confluence breadcrumbs, Jira comment threading)
- Independent Celery tasks (different polling cadences, priorities)
- Independent FastAPI routers (explicit webhooks like `/webhooks/jira`)
- Self-contained testing (`test_run.py` per agent)

This design lets each source **scale independently**, keeps queues and failures **attributable to a single source**, and keeps each agent **testable and maintainable in isolation**.
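To make the source-optimized chunking concrete, the sketch below mirrors the behaviour described for the agent chunkers: one chunk for the Jira issue body plus one per comment, and a `[Space > Ancestor > Page]` breadcrumb prefixed to every Confluence chunk. The `Chunk` dataclass, the function names `jira_chunks` and `confluence_chunks`, and all field names are hypothetical; they are not taken from the real `chunker.py` modules.

```python
from dataclasses import dataclass, field


@dataclass
class Chunk:
    """Hypothetical chunk record; the real models live in shared/models.py."""
    source: str
    doc_id: str
    index: int
    text: str
    metadata: dict = field(default_factory=dict)


def jira_chunks(issue_key: str, body: str, comments: list[str]) -> list[Chunk]:
    """Chunk 0 is the issue body; chunks 1..N are the comments in thread order."""
    chunks = [Chunk("jira", issue_key, 0, body, {"kind": "issue_body"})]
    for position, comment in enumerate(comments, start=1):
        # parent_index ties each comment back to the issue body chunk.
        chunks.append(
            Chunk("jira", issue_key, position, comment,
                  {"kind": "comment", "parent_index": 0})
        )
    return chunks


def confluence_chunks(page_id: str, breadcrumb: list[str], sections: list[str]) -> list[Chunk]:
    """Prefix every section with its [Space > Ancestor > Page] breadcrumb."""
    prefix = "[" + " > ".join(breadcrumb) + "]"
    return [
        Chunk("confluence", page_id, i, f"{prefix}\n{section}", {"breadcrumb": breadcrumb})
        for i, section in enumerate(sections)
    ]
```

Keeping the thread position and breadcrumb on each chunk is what allows later stages to link a comment to its issue, or a section to its page hierarchy, without re-fetching the source.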

### Directory Structure

> **Note:** The actual repo layout diverges from early plans. The implemented structure is below. `src/query_engine/` and `src/retrieval/` referenced in earlier design docs do not exist — that logic lives in `agent/`. Graph endpoints live in `graph_store/`, not `src/api/graph.py`.

```
agent/                          # LangGraph multi-agent query engine (IMPLEMENTED)
├── api.py                      # POST /agent/query — SSE streaming endpoint
├── graph.py                    # LangGraph build: planner → [doc_search|ticket_lookup|live_docs|sql_query] → join → synthesiser → guardrail
├── models.py                   # KnowledgeGraphState, QueryInput, ExecutionPlan, AgentResult, RetrievedChunk
├── config.py                   # LLM + agent config
├── prompts.py                  # Prompt templates
├── agents/
│   ├── planner.py              # Breaks query into AgentTask list
│   ├── synthesiser.py          # Streams answer tokens from top chunks
│   ├── guardrail.py            # Validates answer against sources; sets escalate flag
│   └── _gemini.py              # Gemini client helper (used in planner/synthesiser)
└── tools/
    ├── doc_search.py           # Qdrant hybrid dense+sparse search
    ├── ticket_lookup.py        # Jira-specific retrieval
    ├── live_docs.py            # Firecrawl real-time doc fetching
    ├── sql_query.py            # NL-to-SQL: translates query → validated SELECT → asyncpg execution
    └── summariser.py           # Context compression before synthesis

graph_store/                    # Neo4j knowledge graph (IMPLEMENTED)
├── api.py                      # GET /graph/nodes, POST /graph/ingest, GET /graph/traverse
├── stream.py                   # WS /graph/stream — streams nodes+edges with 50ms delay
├── extractor.py                # Gemini 2.5 Pro entity+relationship extraction (4 types, whitelist rels)
├── writer.py                   # Async Neo4j MERGE upserts, index creation
├── reader.py                   # Cypher traversal: incident→service→library→chunks
├── models.py                   # ExtractedEntity, ExtractedRelationship, ExtractionResult
└── config.py                   # Neo4j connection settings

src/
├── agents_app.py               # Combined FastAPI app: all agent routers + Qdrant/Redis init
│
├── jira_agent/                 # JIRA ingestion agent (IMPLEMENTED)
│   ├── __init__.py
│   ├── config.py               # JiraAgentConfig — JIRA_BASE_URL, JIRA_EMAIL, JIRA_API_TOKEN,
│   │                           #   JIRA_PROJECT_KEYS (csv), JIRA_WEBHOOK_SECRET, TEAM_ID
│   ├── adapter.py              # JiraAdapter — fetch_issue, fetch_all (JQL), fetch_incremental
│   │                           #   Basic auth (base64 email:api_token), ADF text extraction
│   ├── chunker.py              # chunk_jira_issue → chunk 0: issue body, chunks 1..N: comments
│   │                           #   Preserves thread structure for relation extraction
│   ├── pipeline.py             # ingest_issue / ingest_project → chunk → PII mask → embed → Qdrant
│   │                           #   Returns entity graph nodes for real-time streaming
│   ├── tasks.py                # Celery: jira_process_issue (queue=critical), 
│   │                           #   jira_sync_project (queue=polling)
│   ├── router.py               # FastAPI: POST /webhooks/jira, POST /jira/sync/{project_key}
│   └── test_run.py             # Mock + real runthrough; works without credentials
│
├── confluence_agent/           # Confluence ingestion agent (IMPLEMENTED)
│   ├── __init__.py
│   ├── config.py               # ConfluenceAgentConfig — BASE_URL, TOKEN, EMAIL,
│   │                           #   CONFLUENCE_SPACES (csv), CONFLUENCE_WEBHOOK_SECRET, TEAM_ID
│   ├── adapter.py              # ConfluenceAdapter — fetch_page, fetch_space, fetch_incremental (CQL)
│   │                           #   REST v2 API with pagination
│   ├── chunker.py              # chunk_confluence_page — BeautifulSoup heading-split + breadcrumbs
│   │                           #   [Space > Ancestor > Page] prefix on every chunk; tables = 1 chunk each
│   │                           #   Preserves hierarchy for entity linking
│   ├── pipeline.py             # ingest_page / ingest_space → chunk → PII mask → embed → Qdrant
│   │                           #   Returns entity graph nodes
│   ├── tasks.py                # Celery: confluence_process_page (queue=critical), 
│   │                           #   confluence_sync_space (queue=polling),
│   │                           #   confluence_periodic_sync (beat, 60 min incremental sync)
│   ├── router.py               # FastAPI: POST /webhooks/confluence, POST /confluence/sync/{space_key}
│   │                           #   POST /confluence/search (for admin dashboard)
│   └── test_run.py             # Mock + real runthrough; works without credentials
│
├── file_agent/                 # File ingestion agent (IMPLEMENTED)
│   ├── __init__.py
│   ├── config.py               # FileAgentConfig — UPLOAD_DIR, MAX_FILE_SIZE, ALLOWED_TYPES
│   ├── adapter.py              # FileAdapter — handle PDFs, DOCX, PPTX, TXT
│   │                           #   Uses docling for multi-format parsing
│   ├── chunker.py              # chunk_file_document — respects document structure (sections, pages)
│   ├── pipeline.py             # ingest_file → chunk → PII mask → embed → Qdrant
│   ├── tasks.py                # Celery: file_process_upload (queue=critical)
│   ├── router.py               # FastAPI: POST /files/upload, GET /files/{file_id}
│   └── test_run.py
│
├── shared/                     # Shared utilities (used by all agents)
│   ├── __init__.py
│   ├── pii_masker.py           # GLiNER-based PII detection (local, zero egress)
│   ├── embedder.py             # BGE-M3 embeddings (local inference)
│   ├── qdrant_client.py        # Qdrant connection + upsert helpers
│   ├── entity_extractor.py     # Extract entities/relationships from chunks (used per-agent)
│   ├── models.py               # Pydantic models (RawDocument, ChunkedDocument, Entity, Graph)
│   └── config.py               # Shared config (QDRANT_URL, REDIS_URL, etc.)
│
├── retrieval/                  # T1, T2, T3 retrieval layers (from earlier design docs; not in repo, logic lives in agent/)
│   ├── __init__.py
│   ├── hybrid_search.py        # T1: Dense + Sparse (RRF fusion) — queries Qdrant
│   ├── reranker.py             # BGE-reranker-v2-m3 integration
│   ├── context_compressor.py   # Compress top-5 into LLM context
│   ├── cag_agent.py            # T2: Cache-Augmented Generation (recent syncs)
│   ├── live_doc_agent.py       # T3: Real-time doc fetching (Firecrawl)
│   └── models.py               # Pydantic models for retrieval
│
├── query_engine/               # Query execution, LangGraph-based (from earlier design docs; not in repo, logic lives in agent/)
│   ├── __init__.py
│   ├── generator_agent.py      # Generator LLM agent (creates answer from context)
│   ├── critic_agent.py         # Critic LLM agent (validates against sources)
│   ├── orchestrator.py         # LangGraph: routes query through retrieval → generation → validation
│   ├── streaming.py            # Stream answer chunks + citations + graph to frontend
│   └── models.py               # Pydantic models for query responses
│
├── redis/                      # Redis utilities (shared)
│   ├── __init__.py
│   ├── cache.py                # Caching layer (with TTL)
│   ├── queues.py               # Task queues (per-agent ingestion, webhook events)
│   ├── session_state.py        # Query session state
│   ├── locks.py                # Distributed locks (prevent concurrent agent syncs)
│   └── pubsub.py               # Pub/sub for real-time graph updates to frontend (query_id → node)
│
├── api/                        # FastAPI main app + shared endpoints
│   ├── __init__.py
│   ├── auth.py                 # POST /auth/login, /auth/logout, /auth/refresh
│   ├── query.py                # POST /api/query (streaming), /api/query/{id}/follow-up
│   ├── workspace.py            # GET/POST /api/workspace/queries, /saved
│   ├── admin.py                # GET /api/admin/agents (show all agent statuses)
│   └── graph.py                # GET /api/graph/entities, /api/graph/query/{query_id} (from earlier design docs; implemented graph endpoints live in graph_store/)
│
├── db/                         # Database models & utilities
│   ├── __init__.py
│   ├── models.py               # SQLAlchemy models (User, Query, Document, Entity, Graph)
│   ├── session.py              # Database session management
│   └── init_db.py              # Schema initialization
│
├── auth/                       # Authentication & authorization
│   ├── __init__.py
│   ├── jwt_handler.py          # JWT encode/decode, token refresh
│   ├── oauth.py                # OAuth2 + SSO integration (phase 2)
│   ├── rbac.py                 # Role-based access control decorator
│   ├── permissions.py          # Permission checks
│   └── models.py               # User, Role, Permission models
│
├── utils/                      # Shared utilities
│   ├── __init__.py
│   ├── logger.py               # Structured logging (JSON)
│   ├── metrics.py              # Prometheus metrics
│   ├── telemetry.py            # OpenTelemetry (phase 2)
│   └── exceptions.py           # Custom exceptions
│
└── tests/                      # Comprehensive test suite
    ├── __init__.py
    ├── agents/                 # Per-agent tests (JIRA, Confluence, File)
    ├── retrieval/              # Retrieval pipeline tests
    ├── query_engine/           # Query generation + validation tests
    ├── fixtures/               # Pytest fixtures (mock data)
    └── integration/            # End-to-end scenarios
```
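The `pipeline.py` entries in the tree above all describe the same stage order: chunk → PII mask → embed → Qdrant. The following is a minimal sketch of that ordering with each stage injected as a callable. `run_ingest` and its parameter names are hypothetical; the real stages are described as using GLiNER, BGE-M3, and Qdrant, none of which are called here.

```python
from typing import Callable, Sequence

Vector = list[float]


def run_ingest(
    text: str,
    chunk: Callable[[str], list[str]],
    mask_pii: Callable[[str], str],
    embed: Callable[[Sequence[str]], list[Vector]],
    upsert: Callable[[list[tuple[str, Vector]]], None],
) -> int:
    """Run chunk -> PII mask -> embed -> upsert; return the number of chunks stored."""
    chunks = chunk(text)
    # Mask before embedding so raw PII never reaches the embedder or the vector store.
    masked = [mask_pii(c) for c in chunks]
    vectors = embed(masked)
    upsert(list(zip(masked, vectors)))
    return len(masked)
```

Injecting the stages keeps the ordering testable with simple stand-ins (for example a regex masker and a fake embedder), which is roughly what a credential-free `test_run.py` would need.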

### Key Backend Design Decisions

1. **Per-Source Agents:** Each source (Jira, Confluence, File) is an independent module with its own adapter, chunker, pipeline, and Celery tasks. This enables source-specific optimization and independent scaling.

2. **Source-Optimized Chunking:** 
   - Confluence: Preserves `[Space > Ancestor > Page]` hierarchy for entity linking
   - Jira: Preserves comment threading for relation extraction
   - File: Respects document structure (sections, pages)
   - Each source extracts its own entity relationships

3. **Independent Celery Scheduling:**
   - `jira_sync_project` → configurable interval (often 1 hour)
   - `confluence_periodic_sync` → beat scheduler (60 min incremental)
   - `file_process_upload` → immediate (queue=critical)
   - Each agent controls its own cadence

4. **PII Masking First:** GLiNER runs in `shared/pii_masker.py`, locally and with zero egress. Masking happens after chunking and before embedding, so unmasked text is never indexed in Qdrant.

5. **Entity Extraction Per-Agent:** Each pipeline returns a graph of entities + relationships (e.g., Jira: issue→linked_issue, Confluence: page→linked_page). These nodes are streamed to the frontend as they are extracted.

6. **Real-Time Graph Streaming:** Graph updates are published over Redis pub/sub (`query_id → {nodes, edges}`), so the frontend can render incrementally instead of waiting for the full result.

7. **Redis Everywhere:** Cache, queues, session state, distributed locks, and pub/sub all via Redis.

8. **Hybrid Retrieval (T1):** Dense (BGE-M3) + Sparse (BM25) via RRF — queries Qdrant.
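
For readers unfamiliar with RRF (Reciprocal Rank Fusion), named in decision 8, the sketch below shows the standard formula: each document scores the sum of `1 / (k + rank)` over the ranked lists it appears in. The function name `rrf_fuse` is hypothetical, and `k = 60` is the conventional default from the RRF literature, not a value confirmed for this system. This document also does not state whether fusion runs in application code or inside Qdrant.

```python
from collections import defaultdict


def rrf_fuse(rankings: list[list[str]], k: int = 60) -> list[tuple[str, float]]:
    """Fuse ranked ID lists: score(d) = sum of 1 / (k + rank), with rank starting at 1."""
    scores: dict[str, float] = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Highest fused score first; ties are broken by ID so the output is deterministic.
    return sorted(scores.items(), key=lambda item: (-item[1], item[0]))


dense_hits = ["a", "b", "c"]   # e.g. ranked IDs from the dense search
sparse_hits = ["b", "d", "a"]  # e.g. ranked IDs from the sparse search
fused = rrf_fuse([dense_hits, sparse_hits])
```

Because RRF uses only ranks, it needs no score normalisation between the dense and sparse result lists, which is the usual reason for choosing it over a weighted score sum.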

---

## Frontend Architecture (frontend/)

### Directory Structure

```
frontend/
├── index.html                   # Entry HTML (Vite serves this)
├── vite.config.ts              # Vite build config
├── tsconfig.json               # TypeScript config
├── tailwind.config.ts          # Tailwind design tokens + dark mode
├── postcss.config.js           # PostCSS + Tailwind plugins
├── package.json                # Dependencies + scripts
├── .env.example                # Required environment variables
│
├── src/
│   ├── main.tsx                # React app entry point
│   ├── App.tsx                 # Root component + routing
│   │
│   ├── components/
│   │   ├── common/             # Reusable components
│   │   │   ├── Header.tsx      # Top nav bar
│   │   │   ├── Sidebar.tsx     # Left navigation
│   │   │   ├── Footer.tsx      # Footer
│   │   │   ├── Button.tsx      # Button variants (from shadcn)
│   │   │   ├── Input.tsx       # Text input (from shadcn)
│   │   │   ├── Card.tsx        # Card container
│   │   │   ├── Modal.tsx       # Modal/dialog
│   │   │   ├── Badge.tsx       # Status badges
│   │   │   ├── Tooltip.tsx     # Tooltips
│   │   │   ├── Toast.tsx       # Toast notifications
│   │   │   └── Loading.tsx     # Loading skeleton
│   │   │
│   │   ├── query/              # Query interface (Engineer primary)
│   │   │   ├── SearchBox.tsx   # Main search input (Cmd+K support)
│   │   │   ├── QueryModal.tsx  # Modal for new query
│   │   │   ├── QueryHistory.tsx # Query history panel
│   │   │   ├── SuggestedTopics.tsx # Related queries
│   │   │   └── QueryFeedback.tsx # Thumbs up/down
│   │   │
│   │   ├── results/            # Results display + knowledge graph
│   │   │   ├── ResultsPage.tsx # Main results container
│   │   │   ├── Answer.tsx      # Generated answer with citations
│   │   │   ├── Citations.tsx   # Cited source chunks
│   │   │   ├── FollowUp.tsx    # Follow-up prompt
│   │   │   ├── KnowledgeGraph.tsx # Knowledge graph visualization
│   │   │   ├── GraphNode.tsx   # Individual node component
│   │   │   ├── RelatedDocs.tsx # Related document snippets
│   │   │   └── ShareResults.tsx # Share/export options
│   │   │
│   │   ├── analytics/          # Dashboards (Manager primary)
│   │   │   ├── AnalyticsDashboard.tsx # Main analytics page
│   │   │   ├── QueryTrendChart.tsx # Line chart for query volume
│   │   │   ├── TopicsChart.tsx # Bar chart for topics
│   │   │   ├── SuccessRateGauge.tsx # Gauge chart
│   │   │   ├── KnowledgeHealthDashboard.tsx # Health metrics
│   │   │   ├── DependencyTracker.tsx # Breaking changes table
│   │   │   ├── EscalationTable.tsx # Unresolved queries
│   │   │   ├── TeamSettings.tsx # Team configuration
│   │   │   └── AnalyticsExport.tsx # Export reports
│   │   │
│   │   ├── admin/              # Admin UI (Admin primary)
│   │   │   ├── AdminDashboard.tsx # Main admin page
│   │   │   ├── SystemHealth.tsx # Health status cards
│   │   │   ├── DataSourceManager.tsx # Add/edit sources
│   │   │   ├── DataSourceForm.tsx # Source configuration wizard
│   │   │   ├── UserManager.tsx # User list + invite
│   │   │   ├── RBACEditor.tsx  # RBAC policy editor
│   │   │   ├── APIKeyManager.tsx # Generate/revoke keys
│   │   │   └── SystemLogs.tsx  # View logs + alerts
│   │   │
│   │   └── auth/               # Authentication UI
│   │       ├── LoginPage.tsx   # Login form (SSO + fallback)
│   │       ├── SSORedirect.tsx # OAuth callback handler
│   │       └── ProtectedRoute.tsx # Route guard
│   │
│   ├── pages/                  # Route pages (using TanStack Router)
│   │   ├── Home.tsx            # Dashboard home
│   │   ├── QueryPage.tsx       # Query results page
│   │   ├── AnalyticsPage.tsx   # Analytics dashboards
│   │   ├── AdminPage.tsx       # Admin dashboards
│   │   ├── WorkspacePage.tsx   # Personal/team workspace
│   │   ├── NotFoundPage.tsx    # 404 page
│   │   └── ErrorPage.tsx       # Error boundary
│   │
│   ├── hooks/                  # Custom React hooks
│   │   ├── useSSEStream.ts     # SSE consumer for POST /agent/query — manages fetch + ReadableStream parsing
│   │   ├── useGraphStream.ts   # WebSocket consumer for WS /graph/stream — feeds Force-Graph 2D progressively
│   │   ├── useNotifications.ts # WebSocket consumer for WS /ws system notifications (future)
│   │   ├── useAnalytics.ts     # Fetch analytics data
│   │   ├── useAuth.ts          # Authentication state
│   │   ├── useTheme.ts         # Dark mode toggle
│   │   ├── useLocalStorage.ts  # Persist state to localStorage
│   │   ├── usePagination.ts    # Pagination logic
│   │   └── useDebounce.ts      # Debounce search input
│   │
│   ├── stores/                 # Zustand state management
│   │   ├── authStore.ts        # User + auth state
│   │   ├── uiStore.ts          # UI state (theme, sidebar open, etc.)
│   │   ├── filterStore.ts      # Dashboard filters
│   │   └── workspaceStore.ts   # Workspace selections
│   │
│   ├── lib/
│   │   ├── api.ts              # TanStack Query setup + HTTP client
│   │   ├── http.ts             # HTTP client wrapper (JWT refresh)
│   │   ├── auth.ts             # JWT helpers, localStorage auth
│   │   ├── websocket.ts        # WebSocket manager for alerts
│   │   ├── utils.ts            # General utilities (debounce, etc.)
│   │   ├── validators.ts       # Input validation (Zod)
│   │   ├── constants.ts        # App-wide constants
│   │   ├── error-handler.ts    # Centralized error handling
│   │   └── date.ts             # Date formatting helpers
│   │
│   ├── types/
│   │   ├── index.ts            # Re-export all types
│   │   ├── api.ts              # API response types
│   │   ├── user.ts             # User + auth types
│   │   ├── query.ts            # Query + results types
│   │   ├── analytics.ts        # Analytics types
│   │   ├── components.ts       # Component prop types
│   │   └── errors.ts           # Error types
│   │
│   ├── styles/
│   │   ├── globals.css         # Global styles + Tailwind imports
│   │   ├── design-tokens.css   # Design tokens (terracotta, white, dark mode)
│   │   ├── animations.css      # Custom animations (optional)
│   │   └── responsive.css      # Responsive utility classes
│   │
│   └── config/
│       ├── routes.ts           # TanStack Router configuration
│       ├── env.ts              # Environment variables + validation
│       └── queryClient.ts      # TanStack Query client config
│
├── public/                     # Static assets
│   ├── logo.svg                # Logo
│   ├── favicon.ico             # Favicon
│   └── assets/                 # Images, icons
│
├── tests/
│   ├── __mocks__/              # Mock data + API responses
│   ├── components/             # Component tests (Vitest + RTL)
│   ├── hooks/                  # Hook tests
│   ├── utils/                  # Utility tests
│   └── setup.ts                # Vitest + RTL setup
│
├── .eslintrc.json              # ESLint config
├── .prettierrc                 # Prettier config
└── README.md                   # Frontend development guide
```

### Frontend Design Decisions

1. **Vite + React 18:** Fast dev, instant HMR, minimal config. No SSR needed for SPA.
2. **TanStack Router:** Fully typed routing; better DX than React Router v6.
3. **TanStack Query:** Server state management with automatic caching/refetching.
4. **Zustand:** Lightweight client state (theme, UI, filters); no Redux boilerplate.
5. **shadcn/ui + Tailwind:** Copy-paste components, full control, design tokens system.
6. **Responsive Design:** Mobile (320px), Tablet (768px), Desktop (1024px+).
7. **WebSocket:** Native API for real-time alerts; no Socket.io overhead.
8. **JWT + httpOnly Cookies:** Secure auth; backend validates on every request.

---

## API Contract

### Authentication

```
POST /api/auth/login
├─ Request: { email, password } or { sso_provider, sso_token }
├─ Response: { access_token, refresh_token, user: { id, email, role, team_id } }
├─ Sets httpOnly cookie: __auth_token
└─ Bearer token in Authorization header for all subsequent requests

POST /api/auth/refresh
├─ Request: { refresh_token }
├─ Response: { access_token }
└─ Auto-called by frontend before token expires

POST /api/auth/logout
├─ Clears httpOnly cookie
└─ Backend invalidates refresh token in Redis
```

### Agent Webhook Endpoints (Per-Source)

```
POST /webhooks/jira
├─ Validates Jira webhook signature (X-Atlassian-Webhook-Signature)
├─ Extracts issue_created, issue_updated, comment_created events
├─ Routes to jira_process_issue Celery task (queue=critical)
└─ Returns immediately (202 Accepted)

POST /webhooks/confluence
├─ Validates Confluence webhook signature
├─ Extracts page_created, page_updated, page_trashed events
├─ Routes to confluence_process_page Celery task (queue=critical)
└─ Returns immediately (202 Accepted)

POST /files/upload
├─ Accepts multipart/form-data with file + team_id
├─ Routes to file_process_upload Celery task (queue=critical)
├─ Returns file_id immediately; processing async
└─ Frontend polls /files/{file_id} for status

POST /jira/sync/{project_key}
├─ Manual trigger; requires admin role
├─ Routes to jira_sync_project Celery task (queue=polling)
└─ Returns job_id for polling

POST /confluence/sync/{space_key}
├─ Manual trigger; requires admin role
├─ Routes to confluence_sync_space Celery task (queue=polling)
└─ Returns job_id for polling
```
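
As a rough illustration of the signature validation step, the sketch below checks an HMAC-SHA256 signature in the `sha256=<hex digest>` form that the Jira webhook test script in this document sends. The function names are hypothetical, and the exact scheme used by `router.py` is an assumption inferred from that test script.

```python
import hashlib
import hmac
from typing import Optional


def sign_body(secret: str, body: bytes) -> str:
    """Return the signature header value for a raw request body."""
    digest = hmac.new(secret.encode("utf-8"), body, hashlib.sha256).hexdigest()
    return f"sha256={digest}"


def verify_signature(secret: str, body: bytes, header_value: Optional[str]) -> bool:
    """Return True only if the header matches the HMAC of the exact bytes received."""
    if not header_value:
        return False
    expected = sign_body(secret, body)
    # compare_digest avoids leaking how many leading characters matched via timing.
    return hmac.compare_digest(expected.encode("utf-8"), header_value.encode("utf-8"))
```

The check has to run on the raw request bytes, before any JSON parsing, because re-serialising the payload can change whitespace or key order and invalidate the digest.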

### Query API (Streaming SSE)

```
POST /agent/query
├─ Request: { query: string, team_id: string, session_id: string }
├─ Response: Content-Type: text/event-stream
│  ├─ event: plan_ready        → { tasks: [AgentTask], reasoning: string }
│  ├─ event: agent_started     → { agent: "doc_search"|"ticket_lookup"|"live_docs"|"sql_query"|"summariser" }
│  ├─ event: agent_done        → { agent: string, chunks: [RetrievedChunk], confidence: "high"|"medium"|"low" }
│  ├─ event: synthesis_started → {}
│  ├─ event: answer_chunk      → { chunk: string }   (repeats, one per token)
│  ├─ event: guardrail_result  → { score: float, escalate: bool }
│  ├─ event: done              → {}
│  └─ event: error             → { message: string }
└─ Headers: Cache-Control: no-cache, X-Accel-Buffering: no

POST /api/query/{query_id}/feedback
├─ Request: { sentiment: "helpful"|"not_helpful"|"hallucinated", text?: string }
└─ Response: { success: true }
```
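
A client consuming this stream has to split it into events before it can act on them. The sketch below is a simplified parser for the `event:` / `data:` framing plus a helper that assembles the answer from `answer_chunk` events. `parse_sse` and `collect_answer` are hypothetical names, and the code assumes every `data:` payload is a JSON object, as the event list above suggests.

```python
import json
from typing import Iterable, Iterator


def parse_sse(lines: Iterable[str]) -> Iterator[tuple[str, dict]]:
    """Yield (event, data) pairs from SSE lines; a blank line terminates each event."""
    event, data_lines = "message", []
    for raw in lines:
        line = raw.rstrip("\r\n")
        if line == "":
            if data_lines:
                yield event, json.loads("\n".join(data_lines))
            event, data_lines = "message", []
        elif line.startswith(":"):
            continue  # comment or keep-alive line
        elif line.startswith("event:"):
            event = line[len("event:"):].strip()
        elif line.startswith("data:"):
            data_lines.append(line[len("data:"):].lstrip(" "))


def collect_answer(events: Iterable[tuple[str, dict]]) -> str:
    """Concatenate answer_chunk payloads until done; raise on an error event."""
    parts = []
    for event, data in events:
        if event == "answer_chunk":
            parts.append(data["chunk"])
        elif event == "error":
            raise RuntimeError(data["message"])
        elif event == "done":
            break
    return "".join(parts)
```

In practice `lines` would come from a streaming HTTP response iterated line by line; any iterable of strings works, which keeps the parser easy to test offline.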

### Knowledge Graph API

```
GET /graph/nodes?limit=50
└─ Response: { count: int, nodes: [{ label: string, name: string }] }
   (excludes Chunk and Document nodes — returns Service/Library/Incident/Team only)

POST /graph/ingest
├─ Request: { chunk_ids: [string], team_id: string }
└─ Response: { ingested: int }
   (fetches chunks from Supabase, runs Gemini extraction, upserts to Neo4j)

GET /graph/traverse?type=incident|service|library&name=string&team_id=string
└─ Response: { type, name, team_id, chunks: [string] }
   (multi-hop Cypher traversal — returns text chunks for context augmentation)

WS /graph/stream
└─ Streams: node events, edge events, then done event (see Real-Time API below)
```

### Analytics API

```
GET /api/analytics/queries?date_range=30d&team_id=...
├─ Response: {
│    query_count: 1243,
│    unique_users: 243,
│    avg_response_time_ms: 1200,
│    success_rate: 0.76,
│    trend: { data: [{date, count}] }
│  }

GET /api/analytics/knowledge-health
├─ Response: {
│    overall_score: 7.2,
│    coverage: 0.68,
│    freshness: 0.82,
│    accuracy: 0.76,
│    accessibility: 0.71,
│    gaps: [{ topic: "ORM patterns", queries: 12, solutions: 0 }]
│  }

GET /api/analytics/dependencies
├─ Response: {
│    dependencies: [{name, current_version, latest_version, breaking_changes}],
│    alerts: 3
│  }
```

### Admin API

```
POST /api/admin/sources
├─ Request: { type, config, rbac_level }
├─ Response: { id, status, test_result }
└─ Triggers background sync

GET /api/admin/sources
├─ Response: [{ id, type, status, last_sync, record_count }]

PATCH /api/admin/sources/{id}
├─ Request: { name, config, rbac_level }
├─ Response: { updated_source }

DELETE /api/admin/sources/{id}
├─ Soft delete; preserves audit trail

---

POST /api/admin/users/invite
├─ Request: { emails: ["alice@..."], role, team_id }
├─ Response: { invitations: [{ email, invitation_id, expires_at }] }
└─ Sends email invite

GET /api/admin/users
├─ Response: [{ id, email, role, team_id, status, created_at }]

DELETE /api/admin/users/{user_id}
├─ Deactivates user (no hard delete for compliance)

---

POST /api/admin/rbac
├─ Request: { name, description, teams, sources, filters }
├─ Response: { id, policy }
└─ Returns doc count matching policy

GET /api/admin/rbac
├─ Response: [{ id, name, doc_count }]

PATCH /api/admin/rbac/{id}
├─ Update existing policy

---

POST /api/admin/api-keys
├─ Request: { name, permissions, rate_limits, expiry }
├─ Response: { key: "sk_...", created_at }
└─ Only returned once

GET /api/admin/api-keys
├─ Response: [{ name, created_at, last_used, permissions }]
```
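
The "only returned once" behaviour for `POST /api/admin/api-keys` is commonly achieved by persisting only a hash of the key. The sketch below shows that pattern with an in-memory dict standing in for the database. It is one plausible approach, not a description of the backend's actual implementation; `issue_api_key`, `lookup_api_key`, and the record shape are hypothetical.

```python
import hashlib
import secrets
from typing import Optional


def issue_api_key(store: dict[str, dict], name: str) -> str:
    """Create a key, persist only its SHA-256 digest, and return the plaintext once."""
    key = "sk_" + secrets.token_urlsafe(32)
    digest = hashlib.sha256(key.encode("utf-8")).hexdigest()
    store[digest] = {"name": name}
    return key


def lookup_api_key(store: dict[str, dict], presented: str) -> Optional[dict]:
    """Hash the presented key and look up its record; None means unknown key."""
    digest = hashlib.sha256(presented.encode("utf-8")).hexdigest()
    return store.get(digest)
```

Since the plaintext is never stored, a lost key cannot be recovered and must be revoked and reissued, which matches the generate/revoke model of the admin UI.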

### Bash Development Testing

Use these instead of Swagger UI when you need to test streaming behaviour from the terminal.

**Test SSE query stream (replaces Swagger — Swagger can't stream SSE):**
```bash
#!/usr/bin/env bash
# test_query.sh — streams the SSE response token-by-token to stdout

BASE_URL="${GODSPEED_API:-http://localhost:8000}"

curl -N -s \
  -X POST "${BASE_URL}/agent/query" \
  -H "Content-Type: application/json" \
  -d '{"query":"What is the auth service?","team_id":"team-1","session_id":"test-001"}' \
| while IFS= read -r line; do
    echo "$line"
  done
```

**Test graph REST endpoints:**
```bash
BASE_URL="${GODSPEED_API:-http://localhost:8000}"

# List all graph nodes
curl -s "${BASE_URL}/graph/nodes?limit=20" | python3 -m json.tool

# Traverse from a service
curl -s "${BASE_URL}/graph/traverse?type=service&name=auth-service&team_id=team-1" \
  | python3 -m json.tool

# Ingest chunks into graph
curl -s -X POST "${BASE_URL}/graph/ingest" \
  -H "Content-Type: application/json" \
  -d '{"chunk_ids":["chunk-abc123"],"team_id":"team-1"}' \
  | python3 -m json.tool
```

**Test WebSocket graph stream (requires `wscat` — install with `npm i -g wscat`):**
```bash
BASE_URL="${GODSPEED_WS:-ws://localhost:8000}"
wscat -c "${BASE_URL}/graph/stream"
# Prints node/edge/done events as they arrive
```

**Test Jira webhook signature (bash + openssl):**
```bash
BASE_URL="${GODSPEED_API:-http://localhost:8000}"
BODY='{"webhookEvent":"jira:issue_created","issue":{"id":"TEST-1","fields":{"summary":"Auth service down"}}}'
SECRET="your_jira_webhook_secret"
SIG="sha256=$(printf '%s' "${BODY}" | openssl dgst -sha256 -hmac "${SECRET}" | awk '{print $NF}')"

curl -s -X POST "${BASE_URL}/webhooks/jira" \
  -H "Content-Type: application/json" \
  -H "X-Atlassian-Webhook-Signature: ${SIG}" \
  -d "${BODY}"
```

**Test file upload:**
```bash
BASE_URL="${GODSPEED_API:-http://localhost:8000}"
curl -s -X POST "${BASE_URL}/files/upload" \
  -F "file=@/path/to/doc.pdf" \
  -F "team_id=team-1"
```

---

### Real-Time API

There are three distinct real-time channels (two implemented, one planned); do not conflate them:

**Channel 1: Query streaming (SSE)**
```
POST /agent/query   →   Content-Type: text/event-stream

Emits events in order:
  event: plan_ready        data: { tasks: [...], reasoning: "..." }
  event: agent_started     data: { agent: "doc_search" }
  event: agent_done        data: { agent: "doc_search", chunks: [...], confidence: "high" }
  event: synthesis_started data: {}
  event: answer_chunk      data: { chunk: "token text" }   ← repeats per token
  event: guardrail_result  data: { score: 0.92, escalate: false }
  event: done              data: {}
  event: error             data: { message: "..." }        ← on failure

Request body: { query: string, team_id: string, session_id: string }
```
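
A client for this channel has to split the stream into messages itself. The sketch below parses SSE text lines into `(event, payload)` pairs. It assumes each `data:` field carries valid JSON on the wire (the listing above uses shorthand), and the helper names are hypothetical.

```python
import json
from typing import Iterable, Iterator, Tuple


def _field_value(line: str, name: str) -> str:
    """Return the text after 'name:', dropping one optional leading space."""
    value = line[len(name) + 1:]
    return value[1:] if value.startswith(" ") else value


def parse_sse(lines: Iterable[str]) -> Iterator[Tuple[str, dict]]:
    """Yield (event_name, payload) for every complete SSE message."""
    event, data = "message", []
    for raw in lines:
        line = raw.rstrip("\r\n")
        if not line:                          # blank line ends one message
            if data:
                yield event, json.loads("\n".join(data))
            event, data = "message", []
        elif line.startswith(":"):            # comment / keep-alive line
            continue
        elif line.startswith("event:"):
            event = _field_value(line, "event")
        elif line.startswith("data:"):
            data.append(_field_value(line, "data"))


def collect_answer(lines: Iterable[str]) -> str:
    """Concatenate answer_chunk payloads until done or error arrives."""
    parts = []
    for event, payload in parse_sse(lines):
        if event == "answer_chunk":
            parts.append(payload["chunk"])
        elif event in ("done", "error"):
            break
    return "".join(parts)
```

Any iterable of text lines works as input, for example the decoded line iterator of a streaming HTTP response.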

**Channel 2: Knowledge graph visualization (WebSocket)**
```
WS /graph/stream

Emits in order (50ms delay between each):
  { event: "node", id: "...", label: "Service", name: "auth-service" }
  { event: "edge", from: "...", to: "...", rel: "DEPENDS_ON" }
  ...
  { event: "done", nodes_count: 42, edges_count: 87 }
```
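
On the consuming side, the messages can be folded into a small in-memory graph and checked against the counts in the final `done` message. This is a sketch under the assumption that every message is a JSON object shaped like the listing above. `GraphBuffer` is a hypothetical name.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple


@dataclass
class GraphBuffer:
    nodes: Dict[str, dict] = field(default_factory=dict)             # node id -> message
    edges: List[Tuple[str, str, str]] = field(default_factory=list)  # (from, to, rel)
    complete: bool = False

    def apply(self, msg: dict) -> None:
        """Fold one WebSocket message into the buffer."""
        kind = msg.get("event")
        if kind == "node":
            self.nodes[msg["id"]] = msg
        elif kind == "edge":
            self.edges.append((msg["from"], msg["to"], msg["rel"]))
        elif kind == "done":
            # The stream is complete only if nothing was dropped on the way.
            self.complete = (
                msg.get("nodes_count") == len(self.nodes)
                and msg.get("edges_count") == len(self.edges)
            )
```

Keying nodes by `id` makes a replayed node message idempotent, which is convenient if the client reconnects and the stream restarts.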

**Channel 3: System notifications (WebSocket)**
```
WS /ws   (future — not yet implemented)

Will emit:
  event: "query_answered"  → { query_id, new_docs_count }
  event: "escalation_spike" → { topic, spike_rate }         (manager-only)
  event: "breaking_change"  → { dependency, version, url }  (admin-only)
  event: "data_sync_failed" → { source, error }             (admin-only)
  event: "knowledge_gap"    → { topic, query_count }        (all users)
```

---

## Data Flow

### Flow 1: Engineer Query → Answer

```
1. Engineer types query in SearchBox
   ├─ frontend sends POST /agent/query { query, team_id, session_id }
   └─ frontend simultaneously opens WS /graph/stream for parallel graph rendering

2. Backend receives the POST and streams its response as SSE
   ├─ LangGraph planner breaks query into AgentTask list → emits plan_ready
   ├─ Each agent runs (doc_search / ticket_lookup / live_docs) → emits agent_started + agent_done
   ├─ doc_search: BGE-M3 embed → Qdrant hybrid search (dense+sparse RRF) → top 50 → BGE reranker → top 5
   ├─ Synthesiser streams answer tokens → emits answer_chunk per token
   └─ Guardrail validates answer against source chunks → emits guardrail_result

3. Guardrail result
   ├─ guardrail_passed=true → done event
   ├─ guardrail_passed=false + escalate=true → warning banner shown in frontend
   └─ Citations come from agent_done chunks (already streamed in step 2)

4. Frontend connects to graph stream (parallel to query SSE)
   ├─ WS /graph/stream streams the pre-built Neo4j graph (query-scoped subgraph)
   ├─ Nodes arrive one-by-one with 50ms delays: { event:"node", label, name }
   ├─ Edges arrive after nodes: { event:"edge", from, to, rel }
   └─ { event:"done" } signals completion
   Note: The knowledge graph is pre-built at ingestion time by Gemini 2.5 Pro
   (graph_store/extractor.py), not extracted from the answer at query time.

5. Frontend receives stream
   ├─ Renders answer tokens as answer_chunk events arrive (no waiting for the full answer)
   ├─ Renders citations from agent_done chunks as they arrive
   ├─ Knowledge graph starts rendering when the first node event arrives on WS /graph/stream
   ├─ Related docs populate as the backend returns them
   └─ Full page interactive once final "done" event received

6. Feedback recorded
   ├─ Engineer clicks thumbs up/down
   ├─ Frontend POSTs /api/query/{id}/feedback
   ├─ Backend records sentiment + triggers analytics update
   └─ Feedback visible in query history + aggregated for managers
```
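
The "dense+sparse RRF" step in the doc_search agent refers to reciprocal rank fusion. Qdrant performs this fusion server-side. The sketch below only illustrates the scoring rule on two ranked lists of chunk ids. The constant `k = 60` is the value commonly used in the RRF literature and is an assumption here, not a setting read from this system.

```python
from collections import defaultdict
from typing import Dict, List, Sequence


def reciprocal_rank_fusion(
    rankings: Sequence[Sequence[str]], k: int = 60, top_n: int = 50
) -> List[str]:
    """Fuse ranked id lists: score(id) = sum over lists of 1 / (k + rank)."""
    scores: Dict[str, float] = defaultdict(float)
    for ranking in rankings:
        for rank, chunk_id in enumerate(ranking, start=1):
            scores[chunk_id] += 1.0 / (k + rank)
    # Highest fused score first; ties broken by id so output is deterministic.
    return sorted(scores, key=lambda cid: (-scores[cid], cid))[:top_n]


dense_hits = ["a", "b", "c"]    # ids ranked by dense-vector similarity
sparse_hits = ["c", "a", "d"]   # ids ranked by sparse-vector match
fused = reciprocal_rank_fusion([dense_hits, sparse_hits])
```

An id that appears in both lists accumulates two terms, so `a` and `c` rank ahead of `b` and `d`, which each appear in only one list.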

### Flow 2: Data Ingestion (Daily/Polling)

```
1. Ingestion task triggered
   └─ Webhook from source (e.g., Notion) OR Celery periodic task

2. Fetch stage
   ├─ Adapter queries source API
   ├─ Detects new/updated items (via timestamps or ETags)
   └─ Downloads content

3. Normalize stage (Docling)
   ├─ Converts PDF/HTML/markdown to clean markdown
   ├─ Extracts tables as markdown tables
   └─ Detects code blocks + language

4. PII Mask stage (GLiNER, local)
   ├─ Scans text for PII (names, emails, IDs, etc.)
   ├─ Replaces PII with placeholders (e.g., [REDACTED_EMAIL])
   └─ Logs redaction for audit trail

5. Chunk stage (Semantic)
   ├─ Splits by paragraph/sentence boundaries
   ├─ Never splits code blocks or lists
   ├─ 15% overlap between chunks
   └─ 256–512 tokens per chunk

6. Tag stage (Metadata)
   ├─ Adds source_uri, source_type, ingested_at
   ├─ Adds RBAC tag (public / team / restricted)
   ├─ Computes content_hash (for change detection)
   └─ Detects doc_type (SOP, API doc, PR, etc.)

7. Embed stage
   ├─ Sends chunks to BGE-M3
   ├─ Gets 1024-dim dense vectors
   └─ Extracts sparse lexical-weight vectors (BM25-like term matching)

8. Index stage
   ├─ Uploads dense vectors to Qdrant HNSW index
   ├─ Uploads sparse vectors to Qdrant sparse index
   ├─ Upserts metadata (PostgreSQL)
   └─ Updates Redis cache (last_sync_timestamp)

9. Complete
   ├─ Backend records sync success in PostgreSQL
   ├─ Triggers webhook for frontend real-time update
   └─ Notifies admins if errors
```
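
The overlap and change-detection rules in stages 5 and 6 can be illustrated with a deliberately simplified sketch. It uses a fixed-size token window rather than the semantic (paragraph/sentence-aware) splitter described above, treats a pre-tokenised list as input, and hashes chunk text with SHA-256. The function names and the choice of SHA-256 are assumptions for illustration.

```python
import hashlib
from typing import Dict, List


def chunk_tokens(
    tokens: List[str], max_tokens: int = 512, overlap_ratio: float = 0.15
) -> List[List[str]]:
    """Slide a max_tokens window over tokens, repeating ~15% between neighbours."""
    if not 0 <= overlap_ratio < 1:
        raise ValueError("overlap_ratio must be in [0, 1)")
    overlap = int(max_tokens * overlap_ratio)
    step = max_tokens - overlap               # how far the window advances
    chunks: List[List[str]] = []
    for start in range(0, len(tokens), step):
        chunks.append(tokens[start:start + max_tokens])
        if start + max_tokens >= len(tokens):  # window already reached the end
            break
    return chunks


def tag_chunk(text: str, source_uri: str) -> Dict[str, str]:
    """Attach the metadata needed to detect unchanged content on re-ingestion."""
    content_hash = hashlib.sha256(text.encode("utf-8")).hexdigest()
    return {"source_uri": source_uri, "content_hash": content_hash, "text": text}
```

With the defaults, consecutive chunks share 76 tokens. The final window can fall below the 256-token floor, so a real chunker would need to merge a short tail into its predecessor or otherwise handle it.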

### Flow 3: Manager Views Analytics

```
1. Manager navigates to /analytics

2. Frontend loads analytics data
   ├─ POST /api/analytics/queries?date_range=30d
   ├─ POST /api/analytics/knowledge-health
   └─ POST /api/analytics/dependencies

3. Backend aggregates from event logs
   ├─ Queries PostgreSQL (query_events table)
   ├─ Aggregates by date, team, topic
   ├─ Computes trends, success rates
   └─ Identifies gaps from failed queries

4. Frontend renders dashboards
   ├─ Query trends (Recharts line chart)
   ├─ Topics (bar chart)
   ├─ Success rate (gauge)
   ├─ Escalations table
   └─ Knowledge health heatmap

5. Optional: Manager exports report
   ├─ Frontend POSTs /api/analytics/export?format=pdf
   ├─ Backend generates PDF via ReportLab
   └─ Streams PDF download to browser
```
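
The aggregation in step 3 would normally be a SQL `GROUP BY` over `query_events`. The sketch below shows the same computation in plain Python so the success-rate metric is unambiguous. The row fields `created_at` (ISO-8601 string) and `succeeded` (bool) are hypothetical column names, not confirmed by the schema.

```python
from collections import defaultdict
from typing import Dict, Iterable, List


def success_rate_by_day(events: Iterable[dict]) -> Dict[str, float]:
    """Return {YYYY-MM-DD: fraction of that day's queries that succeeded}."""
    totals: Dict[str, List[int]] = defaultdict(lambda: [0, 0])  # day -> [ok, total]
    for ev in events:
        day = ev["created_at"][:10]           # date part of the ISO timestamp
        totals[day][1] += 1
        if ev["succeeded"]:
            totals[day][0] += 1
    return {day: ok / total for day, (ok, total) in sorted(totals.items())}


sample_events = [
    {"created_at": "2024-05-01T09:00:00Z", "succeeded": True},
    {"created_at": "2024-05-01T10:30:00Z", "succeeded": False},
    {"created_at": "2024-05-02T08:15:00Z", "succeeded": True},
]
rates = success_rate_by_day(sample_events)
```

Days with no events are absent from the result, so a chart layer has to decide whether to plot them as gaps or as zero.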

---

## Deployment Architecture

### Development (docker-compose)

```yaml
services:
  postgres:
    image: postgres:15
    volumes: [./data/postgres:/var/lib/postgresql/data]
    ports: ["5432:5432"]

  redis:
    image: redis:7-alpine
    ports: ["6379:6379"]

  qdrant:
    image: qdrant/qdrant:latest
    volumes: [./data/qdrant:/qdrant/storage]
    ports: ["6333:6333"]

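  backend:
    build: ./backend
    ports: ["8000:8000"]
    depends_on: [postgres, redis, qdrant, neo4j]
    environment:
      SQLALCHEMY_DATABASE_URL: postgresql://user:pass@postgres:5432/godspeed
      REDIS_URL: redis://redis:6379
      QDRANT_URL: http://qdrant:6333
      # The backend serves WS /graph/stream from Neo4j, so it needs the same connection settings as celery
      NEO4J_URI: bolt://neo4j:7687
      NEO4J_USERNAME: neo4j
      NEO4J_PASSWORD: godspeed_dev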

  frontend:
    build: ./frontend
    ports: ["3000:3000"]
    depends_on: [backend]
    environment:
      VITE_API_BASE_URL: http://localhost:8000

  neo4j:
    image: neo4j:5
    ports: ["7474:7474", "7687:7687"]
    volumes: [./data/neo4j:/data]
    environment:
      NEO4J_AUTH: neo4j/godspeed_dev
      NEO4J_PLUGINS: '["apoc"]'

  celery:
    build: ./backend
    command: celery -A src.celery_app worker -Q critical,default,polling -l info
    depends_on: [postgres, redis, qdrant, neo4j]
    environment:
      SQLALCHEMY_DATABASE_URL: postgresql://user:pass@postgres:5432/godspeed
      REDIS_URL: redis://redis:6379
      # Ingestion tasks write vectors to Qdrant in the index stage
      QDRANT_URL: http://qdrant:6333
      NEO4J_URI: bolt://neo4j:7687
      NEO4J_USERNAME: neo4j
      NEO4J_PASSWORD: godspeed_dev
```
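
On the application side, the environment variables above can be read once at startup and validated so that a missing value fails fast rather than surfacing as a connection error later. This is a minimal sketch using only the standard library. The `Settings` class and `load_settings` function are hypothetical and may not match how the backend actually loads configuration.

```python
import os
from dataclasses import dataclass
from typing import Mapping

# Variable names as they appear in the compose file above.
REQUIRED = ("SQLALCHEMY_DATABASE_URL", "REDIS_URL", "QDRANT_URL")


@dataclass(frozen=True)
class Settings:
    database_url: str
    redis_url: str
    qdrant_url: str


def load_settings(env: Mapping[str, str] = os.environ) -> Settings:
    """Build Settings from the environment, naming every missing variable."""
    missing = [name for name in REQUIRED if not env.get(name)]
    if missing:
        raise RuntimeError("missing environment variables: " + ", ".join(missing))
    return Settings(
        database_url=env["SQLALCHEMY_DATABASE_URL"],
        redis_url=env["REDIS_URL"],
        qdrant_url=env["QDRANT_URL"],
    )
```

Passing the mapping as a parameter keeps the function testable without mutating the real process environment.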

### Production (Kubernetes)

```yaml
# Deployments
- backend (FastAPI, 3 replicas, HPA)
- frontend (Nginx, 2 replicas, CDN)
- celery-worker (5 replicas, autoscaling on queue depth)

# StatefulSets
- postgres (with backup via S3)
- redis (cluster mode)
- qdrant (with persistence)

# Services
- backend-svc (ClusterIP)
- frontend-svc (LoadBalancer)
- postgres-svc (ClusterIP)
- redis-svc (ClusterIP)
- qdrant-svc (ClusterIP)

# ConfigMaps & Secrets
- app-config (env vars)
- api-keys (AWS S3, Notion OAuth, etc.)
- tls-certs (HTTPS)

# Ingress
- Routes /api/* to backend
- Routes /* to frontend
- TLS termination
```

### Self-Hosted (Single Server)

```
nginx (reverse proxy, static frontend)
  ├─ localhost:8000 (FastAPI backend)
  ├─ localhost:5432 (PostgreSQL)
  ├─ localhost:6379 (Redis)
  └─ localhost:6333 (Qdrant)

All services in systemd or Docker containers
Automated backups via Cron + S3
Monitoring via Prometheus + Grafana (optional)
```

---

## Key Architectural Principles

1. **Separation of Concerns:** Each layer (adapter, ingestion, retrieval, agent, API) has one responsibility.
2. **Stateless Backend:** FastAPI scales horizontally; state lives in PostgreSQL/Redis.
3. **Async Everywhere:** Celery for long-running tasks; FastAPI with asyncio for I/O.
4. **RBAC First:** All queries filtered by user's team/permissions at retrieval time.
5. **Streaming Results:** Don't wait for complete answer; stream chunks to frontend progressively.
6. **Local PII:** GLiNER runs locally, so PII is masked before any text leaves the deployment (no raw PII egress).
7. **Cacheable at Every Layer:** Embeddings cached, searches cached, answers cached (with refresh policy).
8. **Observable:** Structured logging and metrics; distributed traces via OpenTelemetry are planned for phase 2.
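
Principle 4 can be made concrete with a small filter over the RBAC tags assigned at ingestion (`public` / `team` / `restricted`). In the running system this would more likely be expressed as a payload filter on the vector search itself. The sketch below applies the same rule to already-fetched chunks. The field names `rbac` and `team_id`, and the reading of `restricted` as "same team plus an explicit grant", are assumptions.

```python
from typing import Iterable, List


def visible_chunks(
    chunks: Iterable[dict], user_team: str, can_read_restricted: bool = False
) -> List[dict]:
    """Keep only the chunks this user is allowed to see."""
    allowed = []
    for chunk in chunks:
        tag = chunk["rbac"]
        same_team = chunk.get("team_id") == user_team
        if tag == "public":
            allowed.append(chunk)
        elif tag == "team" and same_team:
            allowed.append(chunk)
        elif tag == "restricted" and same_team and can_read_restricted:
            allowed.append(chunk)
        # Unknown tags fall through and are dropped (deny by default).
    return allowed
```

Filtering before reranking and synthesis, rather than after, keeps restricted text out of the LLM prompt entirely.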