Complete System Architecture
Document purpose: System-wide architecture covering backend, frontend, database, deployment, and all component interactions. Read this to understand how all pieces fit together.
Table of Contents
- High-Level System Diagram
- Backend Architecture (src/)
- Frontend Architecture (frontend/)
- API Contract
- Data Flow
- Deployment Architecture
High-Level System Diagram
┌──────────────────────────────────────────────────────────────────────────────┐
│                            EXTERNAL DATA SOURCES                             │
│ ┌────────┐ ┌────────────┐ ┌────────┐ ┌───────┐ ┌──────┐ ┌──────────────────┐ │
│ │ Notion │ │ Confluence │ │ GitHub │ │ Slack │ │ Jira │ │ URLs + Firecrawl │ │
│ └────────┘ └────────────┘ └────────┘ └───────┘ └──────┘ └──────────────────┘ │
└────────────────────┬─────────────────────────────────────────────────────────┘
                     │ (Webhooks + Polling)
                     ▼
┌──────────────────────────────────────────────────────────────────────────────┐
│                           BACKEND (Python/FastAPI)                           │
│  ┌────────────────┐  ┌──────────────────┐  ┌──────────────────────────────┐  │
│  │ Data Ingestion │  │ RAG + Retrieval  │  │ Analytics & Intelligence     │  │
│  │ ├─ Adapters    │  │ ├─ Hybrid search │  │ ├─ Query events              │  │
│  │ ├─ Docling     │  │ ├─ BGE-M3        │  │ ├─ Knowledge graph           │  │
│  │ ├─ GLiNER PII  │  │ ├─ Qdrant        │  │ └─ Anomaly detection         │  │
│  │ └─ Chunking    │  │ └─ LLM agents    │  └──────────────────────────────┘  │
│  └────────────────┘  └──────────────────┘                                    │
│          ▲                    ▲                           ▲                  │
│          │                    │                           │                  │
│  ┌───────┴────────────────────┴───────────────────────────┴───────────────┐  │
│  │ FastAPI Backend (Uvicorn)                                              │  │
│  │ ├─ /api/query/*      (search + follow-up)                              │  │
│  │ ├─ /api/analytics/*  (dashboards)                                      │  │
│  │ ├─ /api/admin/*      (data source management)                          │  │
│  │ └─ /ws               (WebSocket for real-time alerts)                  │  │
│  └───────┬────────────────────────────────────────────────────────────────┘  │
│          │                                                                   │
│  ┌───────▼────────────────────────────────────────────────────────────────┐  │
│  │ Data Layer (PostgreSQL, Qdrant, Neo4j, Redis, S3)                      │  │
│  │ ├─ PostgreSQL: Metadata, RBAC, audit trails, queries                   │  │
│  │ ├─ Qdrant: Vector embeddings (dense + sparse)                          │  │
│  │ ├─ Neo4j: Knowledge graph (Service/Library/Incident/Team)              │  │
│  │ ├─ Redis: Cache, session state, pub/sub, task queues                   │  │
│  │ └─ S3: PDFs, user uploads, exports                                     │  │
│  └───────┬────────────────────────────────────────────────────────────────┘  │
└──────────┼───────────────────────────────────────────────────────────────────┘
           │
           │ (REST API + WebSocket)
           ▼
┌──────────────────────────────────────────────────────────────────────────────┐
│                         FRONTEND (React/TypeScript)                          │
│  ┌──────────────────────┐  ┌──────────────────┐  ┌────────────────────────┐  │
│  │ Query Interface      │  │ Dashboards       │  │ Admin UI               │  │
│  │ ├─ Search box        │  │ ├─ Query trends  │  │ ├─ Data source mgmt    │  │
│  │ ├─ Results display   │  │ ├─ Knowledge     │  │ ├─ User management     │  │
│  │ ├─ Citations         │  │ │  health        │  │ ├─ RBAC editor         │  │
│  │ ├─ Follow-ups        │  │ ├─ Dependencies  │  │ ├─ API keys            │  │
│  │ └─ Knowledge graph   │  │ └─ Alerts        │  │ └─ System health       │  │
│  └──────────────────────┘  └──────────────────┘  └────────────────────────┘  │
│  ┌────────────────────────────────────────────────────────────────────────┐  │
│  │ Component Layer (shadcn/ui + Tailwind)                                 │  │
│  │ ├─ Query & Search components                                           │  │
│  │ ├─ Chart & data table components (Recharts, TanStack Table)            │  │
│  │ ├─ Knowledge graph visualizer (Force-Graph)                            │  │
│  │ ├─ Authentication flow (JWT)                                           │  │
│  │ └─ Real-time notifications (WebSocket)                                 │  │
│  └────────────────────────────────────────────────────────────────────────┘  │
│  ┌────────────────────────────────────────────────────────────────────────┐  │
│  │ State Management (TanStack Query + Zustand)                            │  │
│  │ ├─ Server state: Queries, analytics, user data (TanStack Query)        │  │
│  │ └─ Client state: UI state, theme, filters (Zustand)                    │  │
│  └────────────────────────────────────────────────────────────────────────┘  │
└──────────────────────────────────────────────────────────────────────────────┘
Backend Architecture (src/): Agent-Based Design
Core Principle: Per-Source Agents
Rather than generic adapters flowing through a single pipeline, each data source is an independent agent with:
- Source-specific authentication & adapters
- Source-optimized chunking (preserves context like Confluence breadcrumbs, Jira comment threading)
- Independent Celery tasks (different polling cadences, priorities)
- Independent FastAPI routers (explicit webhooks like /webhooks/jira)
- Self-contained testing (test_run.py per agent)
This design lets each source be scaled, scheduled, and tested on its own, and keeps failures in one source from blocking ingestion from the others.
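To make the per-source split concrete, here is a minimal sketch of how each agent's tasks and queues could be declared as data and merged into one routing table. `AgentSpec`, `AGENTS`, and `build_task_routes` are hypothetical names, not taken from the repo; the task and queue names are the ones listed in this document. The output has the `{task_name: {"queue": ...}}` shape that Celery accepts for its `task_routes` setting, although real Celery task names are usually module-qualified.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass(frozen=True)
class AgentSpec:
    """Hypothetical descriptor for one ingestion agent (not a class from the repo)."""

    name: str
    webhook_path: Optional[str]
    # Task name -> Celery queue, as listed in this document.
    task_queues: Dict[str, str] = field(default_factory=dict)


AGENTS: List[AgentSpec] = [
    AgentSpec("jira", "/webhooks/jira",
              {"jira_process_issue": "critical", "jira_sync_project": "polling"}),
    AgentSpec("confluence", "/webhooks/confluence",
              {"confluence_process_page": "critical", "confluence_sync_space": "polling"}),
    AgentSpec("file", None, {"file_process_upload": "critical"}),
]


def build_task_routes(agents: List[AgentSpec]) -> Dict[str, Dict[str, str]]:
    """Merge every agent's tasks into one {task_name: {"queue": ...}} mapping."""
    routes: Dict[str, Dict[str, str]] = {}
    for agent in agents:
        for task_name, queue in agent.task_queues.items():
            if task_name in routes:
                # Two agents must never claim the same task name.
                raise ValueError(f"duplicate task name: {task_name}")
            routes[task_name] = {"queue": queue}
    return routes
```

Adding a new source then means adding one `AgentSpec` entry rather than editing a shared pipeline.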
Directory Structure
Note: The actual repo layout diverges from early plans. The implemented structure is below.
src/query_engine/ and src/retrieval/, referenced in earlier design docs, do not exist; that logic lives in agent/. Graph endpoints live in graph_store/, not src/api/graph.py.
agent/  # LangGraph multi-agent query engine (IMPLEMENTED)
├── api.py  # POST /agent/query: SSE streaming endpoint
├── graph.py  # LangGraph build: planner → [doc_search|ticket_lookup|live_docs|sql_query] → join → synthesiser → guardrail
├── models.py  # KnowledgeGraphState, QueryInput, ExecutionPlan, AgentResult, RetrievedChunk
├── config.py  # LLM + agent config
├── prompts.py  # Prompt templates
├── agents/
│   ├── planner.py  # Breaks query into AgentTask list
│   ├── synthesiser.py  # Streams answer tokens from top chunks
│   ├── guardrail.py  # Validates answer against sources; sets escalate flag
│   └── _gemini.py  # Gemini client helper (used in planner/synthesiser)
└── tools/
    ├── doc_search.py  # Qdrant hybrid dense+sparse search
    ├── ticket_lookup.py  # Jira-specific retrieval
    ├── live_docs.py  # Firecrawl real-time doc fetching
    ├── sql_query.py  # NL-to-SQL: translates query → validated SELECT → asyncpg execution
    └── summariser.py  # Context compression before synthesis
graph_store/  # Neo4j knowledge graph (IMPLEMENTED)
├── api.py  # GET /graph/nodes, POST /graph/ingest, GET /graph/traverse
├── stream.py  # WS /graph/stream: streams nodes+edges with 50ms delay
├── extractor.py  # Gemini 2.5 Pro entity+relationship extraction (4 types, whitelist rels)
├── writer.py  # Async Neo4j MERGE upserts, index creation
├── reader.py  # Cypher traversal: incident→service→library→chunks
├── models.py  # ExtractedEntity, ExtractedRelationship, ExtractionResult
└── config.py  # Neo4j connection settings
src/
├── agents_app.py  # Combined FastAPI app: all agent routers + Qdrant/Redis init
│
├── jira_agent/  # Jira ingestion agent (IMPLEMENTED)
│   ├── __init__.py
│   ├── config.py  # JiraAgentConfig: JIRA_BASE_URL, JIRA_EMAIL, JIRA_API_TOKEN,
│   │              # JIRA_PROJECT_KEYS (csv), JIRA_WEBHOOK_SECRET, TEAM_ID
│   ├── adapter.py  # JiraAdapter: fetch_issue, fetch_all (JQL), fetch_incremental
│   │               # Basic auth (base64 email:api_token), ADF text extraction
│   ├── chunker.py  # chunk_jira_issue: chunk 0 is the issue body, chunks 1..N are comments
│   │               # Preserves thread structure for relation extraction
│   ├── pipeline.py  # ingest_issue / ingest_project: chunk → PII mask → embed → Qdrant
│   │                # Returns entity graph nodes for real-time streaming
│   ├── tasks.py  # Celery: jira_process_issue (queue=critical),
│   │             # jira_sync_project (queue=polling)
│   ├── router.py  # FastAPI: POST /webhooks/jira, POST /jira/sync/{project_key}
│   └── test_run.py  # Mock + real runthrough; works without credentials
│
├── confluence_agent/  # Confluence ingestion agent (IMPLEMENTED)
│   ├── __init__.py
│   ├── config.py  # ConfluenceAgentConfig: BASE_URL, TOKEN, EMAIL,
│   │              # CONFLUENCE_SPACES (csv), CONFLUENCE_WEBHOOK_SECRET, TEAM_ID
│   ├── adapter.py  # ConfluenceAdapter: fetch_page, fetch_space, fetch_incremental (CQL)
│   │               # REST v2 API with pagination
│   ├── chunker.py  # chunk_confluence_page: BeautifulSoup heading-split + breadcrumbs
│   │               # [Space > Ancestor > Page] prefix on every chunk; tables = 1 chunk each
│   │               # Preserves hierarchy for entity linking
│   ├── pipeline.py  # ingest_page / ingest_space: chunk → PII mask → embed → Qdrant
│   │                # Returns entity graph nodes
│   ├── tasks.py  # Celery: confluence_process_page (queue=critical),
│   │             # confluence_sync_space (queue=polling),
│   │             # confluence_periodic_sync (beat, 60 min incremental sync)
│   ├── router.py  # FastAPI: POST /webhooks/confluence, POST /confluence/sync/{space_key}
│   │              # POST /confluence/search (for admin dashboard)
│   └── test_run.py  # Mock + real runthrough; works without credentials
│
├── file_agent/  # File ingestion agent (IMPLEMENTED)
│   ├── __init__.py
│   ├── config.py  # FileAgentConfig: UPLOAD_DIR, MAX_FILE_SIZE, ALLOWED_TYPES
│   ├── adapter.py  # FileAdapter: handles PDF, DOCX, PPTX, TXT
│   │               # Uses docling for multi-format parsing
│   ├── chunker.py  # chunk_file_document: respects document structure (sections, pages)
│   ├── pipeline.py  # ingest_file: chunk → PII mask → embed → Qdrant
│   ├── tasks.py  # Celery: file_process_upload (queue=critical)
│   ├── router.py  # FastAPI: POST /files/upload, GET /files/{file_id}
│   └── test_run.py
│
├── shared/  # Shared utilities (used by all agents)
│   ├── __init__.py
│   ├── pii_masker.py  # GLiNER-based PII detection (local, zero egress)
│   ├── embedder.py  # BGE-M3 embeddings (local inference)
│   ├── qdrant_client.py  # Qdrant connection + upsert helpers
│   ├── entity_extractor.py  # Extract entities/relationships from chunks (used per-agent)
│   ├── models.py  # Pydantic models (RawDocument, ChunkedDocument, Entity, Graph)
│   └── config.py  # Shared config (QDRANT_URL, REDIS_URL, etc.)
│
├── retrieval/  # T1, T2, T3 retrieval layers (from the earlier design; per the note above, this logic lives in agent/)
│   ├── __init__.py
│   ├── hybrid_search.py  # T1: Dense + Sparse (RRF fusion), queries Qdrant
│   ├── reranker.py  # BGE-reranker-v2-m3 integration
│   ├── context_compressor.py  # Compress top-5 into LLM context
│   ├── cag_agent.py  # T2: Cache-Augmented Generation (recent syncs)
│   ├── live_doc_agent.py  # T3: Real-time doc fetching (Firecrawl)
│   └── models.py  # Pydantic models for retrieval
│
├── query_engine/  # Query execution, LangGraph-based (from the earlier design; per the note above, this logic lives in agent/)
│   ├── __init__.py
│   ├── generator_agent.py  # Generator LLM agent (creates answer from context)
│   ├── critic_agent.py  # Critic LLM agent (validates against sources)
│   ├── orchestrator.py  # LangGraph: routes query through retrieval → generation → validation
│   ├── streaming.py  # Stream answer chunks + citations + graph to frontend
│   └── models.py  # Pydantic models for query responses
│
├── redis/  # Redis utilities (shared)
│   ├── __init__.py
│   ├── cache.py  # Caching layer (with TTL)
│   ├── queues.py  # Task queues (per-agent ingestion, webhook events)
│   ├── session_state.py  # Query session state
│   ├── locks.py  # Distributed locks (prevent concurrent agent syncs)
│   └── pubsub.py  # Pub/sub for real-time graph updates to frontend (query_id → node)
│
├── api/  # FastAPI main app + shared endpoints
│   ├── __init__.py
│   ├── auth.py  # POST /auth/login, /auth/logout, /auth/refresh
│   ├── query.py  # POST /api/query (streaming), /api/query/{id}/follow-up
│   ├── workspace.py  # GET/POST /api/workspace/queries, /saved
│   ├── admin.py  # GET /api/admin/agents (show all agent statuses)
│   └── graph.py  # GET /api/graph/entities, /api/graph/query/{query_id} (per the note above, graph endpoints live in graph_store/)
│
├── db/  # Database models & utilities
│   ├── __init__.py
│   ├── models.py  # SQLAlchemy models (User, Query, Document, Entity, Graph)
│   ├── session.py  # Database session management
│   └── init_db.py  # Schema initialization
│
├── auth/  # Authentication & authorization
│   ├── __init__.py
│   ├── jwt_handler.py  # JWT encode/decode, token refresh
│   ├── oauth.py  # OAuth2 + SSO integration (phase 2)
│   ├── rbac.py  # Role-based access control decorator
│   ├── permissions.py  # Permission checks
│   └── models.py  # User, Role, Permission models
│
├── utils/  # Shared utilities
│   ├── __init__.py
│   ├── logger.py  # Structured logging (JSON)
│   ├── metrics.py  # Prometheus metrics
│   ├── telemetry.py  # OpenTelemetry (phase 2)
│   └── exceptions.py  # Custom exceptions
│
└── tests/  # Comprehensive test suite
    ├── __init__.py
    ├── agents/  # Per-agent tests (Jira, Confluence, File)
    ├── retrieval/  # Retrieval pipeline tests
    ├── query_engine/  # Query generation + validation tests
    ├── fixtures/  # Pytest fixtures (mock data)
    └── integration/  # End-to-end scenarios
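The Jira chunking rule described in the tree (chunk 0 is the issue body, chunks 1..N are the comments) can be sketched as follows. This is an illustration only: `Chunk` and `chunk_issue_sketch` are hypothetical names, and the real `chunk_jira_issue` may carry more metadata. Repeating the issue key and summary on every chunk is an assumption about how thread context could be preserved.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Chunk:
    issue_key: str
    index: int  # 0 for the issue body, 1..N for comments
    kind: str   # "body" or "comment"
    text: str


def chunk_issue_sketch(issue_key: str, summary: str, description: str,
                       comments: List[str]) -> List[Chunk]:
    """Split one issue into a body chunk followed by one chunk per comment."""
    # Every chunk repeats the issue header so it stays meaningful on its own.
    header = f"[{issue_key}] {summary}"
    chunks = [Chunk(issue_key, 0, "body", f"{header}\n{description}".strip())]
    for comment in comments:
        if not comment.strip():
            continue  # skip empty comments so indices stay dense
        position = len(chunks)  # next comment index, starting at 1
        chunks.append(Chunk(issue_key, position, "comment",
                            f"{header}\nComment {position}: {comment.strip()}"))
    return chunks
```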
Key Backend Design Decisions
- Per-Source Agents: Each source (Jira, Confluence, File) is an independent module with its own adapter, chunker, pipeline, and Celery tasks. This enables source-specific optimization and independent scaling.
- Source-Optimized Chunking:
  - Confluence: Preserves [Space > Ancestor > Page] hierarchy for entity linking
  - Jira: Preserves comment threading for relation extraction
  - File: Respects document structure (sections, pages)
  - Each source extracts its own entity relationships
- Independent Celery Scheduling:
  - jira_sync_project: configurable interval (often 1 hour)
  - confluence_periodic_sync: beat scheduler (60 min incremental)
  - file_process_upload: immediate (queue=critical)
  - Each agent controls its own cadence
- PII Masking First: GLiNER runs in shared/pii_masker.py. It is local and zero-egress, and it runs before Qdrant indexing.
- Entity Extraction Per-Agent: Each pipeline returns a graph of entities + relationships (e.g., Jira: issue→linked_issue, Confluence: page→linked_page). Frontend streams these nodes as they're extracted.
- Real-Time Graph Streaming: Via Redis pub/sub (query_id → {nodes, edges}), so the frontend doesn't wait for full completion.
- Redis Everywhere: Cache, queues, session state, distributed locks, and pub/sub all via Redis.
- Hybrid Retrieval (T1): Dense (BGE-M3) + Sparse (BM25-like) result lists fused via Reciprocal Rank Fusion (RRF), both queried from Qdrant.
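Reciprocal Rank Fusion combines ranked lists using only rank positions: each document scores the sum of 1 / (k + rank) over the lists it appears in. The sketch below shows the arithmetic in plain Python; `rrf_fuse` is a hypothetical helper, k = 60 is the commonly used default rather than a value confirmed by this project, and the repo may instead rely on fusion performed inside Qdrant.

```python
from typing import Dict, List, Optional, Sequence


def rrf_fuse(rankings: Sequence[Sequence[str]], k: int = 60,
             top_n: Optional[int] = None) -> List[str]:
    """Fuse ranked ID lists with Reciprocal Rank Fusion.

    Each list is ordered best-first. A document's fused score is the sum of
    1 / (k + rank) over every list that contains it (rank starts at 1).
    """
    scores: Dict[str, float] = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    # Sort by descending score; break ties by ID so the output is deterministic.
    fused = sorted(scores, key=lambda doc_id: (-scores[doc_id], doc_id))
    return fused if top_n is None else fused[:top_n]


dense_hits = ["a", "b", "c"]   # e.g. dense-vector ranking
sparse_hits = ["b", "d", "a"]  # e.g. sparse-vector ranking
print(rrf_fuse([dense_hits, sparse_hits]))  # ['b', 'a', 'd', 'c']
```

Documents found by both retrievers ("a" and "b") outrank those found by only one, without any need to normalise the raw dense and sparse scores against each other.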
Frontend Architecture (frontend/)
Directory Structure
frontend/
├── index.html  # Entry HTML (Vite serves this)
├── vite.config.ts  # Vite build config
├── tsconfig.json  # TypeScript config
├── tailwind.config.ts  # Tailwind design tokens + dark mode
├── postcss.config.js  # PostCSS + Tailwind plugins
├── package.json  # Dependencies + scripts
├── .env.example  # Required environment variables
│
├── src/
│   ├── main.tsx  # React app entry point
│   ├── App.tsx  # Root component + routing
│   │
│   ├── components/
│   │   ├── common/  # Reusable components
│   │   │   ├── Header.tsx  # Top nav bar
│   │   │   ├── Sidebar.tsx  # Left navigation
│   │   │   ├── Footer.tsx  # Footer
│   │   │   ├── Button.tsx  # Button variants (from shadcn)
│   │   │   ├── Input.tsx  # Text input (from shadcn)
│   │   │   ├── Card.tsx  # Card container
│   │   │   ├── Modal.tsx  # Modal/dialog
│   │   │   ├── Badge.tsx  # Status badges
│   │   │   ├── Tooltip.tsx  # Tooltips
│   │   │   ├── Toast.tsx  # Toast notifications
│   │   │   └── Loading.tsx  # Loading skeleton
│   │   │
│   │   ├── query/  # Query interface (Engineer primary)
│   │   │   ├── SearchBox.tsx  # Main search input (Cmd+K support)
│   │   │   ├── QueryModal.tsx  # Modal for new query
│   │   │   ├── QueryHistory.tsx  # Query history panel
│   │   │   ├── SuggestedTopics.tsx  # Related queries
│   │   │   └── QueryFeedback.tsx  # Thumbs up/down
│   │   │
│   │   ├── results/  # Results display + knowledge graph
│   │   │   ├── ResultsPage.tsx  # Main results container
│   │   │   ├── Answer.tsx  # Generated answer with citations
│   │   │   ├── Citations.tsx  # Cited source chunks
│   │   │   ├── FollowUp.tsx  # Follow-up prompt
│   │   │   ├── KnowledgeGraph.tsx  # Knowledge graph visualization
│   │   │   ├── GraphNode.tsx  # Individual node component
│   │   │   ├── RelatedDocs.tsx  # Related document snippets
│   │   │   └── ShareResults.tsx  # Share/export options
│   │   │
│   │   ├── analytics/  # Dashboards (Manager primary)
│   │   │   ├── AnalyticsDashboard.tsx  # Main analytics page
│   │   │   ├── QueryTrendChart.tsx  # Line chart for query volume
│   │   │   ├── TopicsChart.tsx  # Bar chart for topics
│   │   │   ├── SuccessRateGauge.tsx  # Gauge chart
│   │   │   ├── KnowledgeHealthDashboard.tsx  # Health metrics
│   │   │   ├── DependencyTracker.tsx  # Breaking changes table
│   │   │   ├── EscalationTable.tsx  # Unresolved queries
│   │   │   ├── TeamSettings.tsx  # Team configuration
│   │   │   └── AnalyticsExport.tsx  # Export reports
│   │   │
│   │   ├── admin/  # Admin UI (Admin primary)
│   │   │   ├── AdminDashboard.tsx  # Main admin page
│   │   │   ├── SystemHealth.tsx  # Health status cards
│   │   │   ├── DataSourceManager.tsx  # Add/edit sources
│   │   │   ├── DataSourceForm.tsx  # Source configuration wizard
│   │   │   ├── UserManager.tsx  # User list + invite
│   │   │   ├── RBACEditor.tsx  # RBAC policy editor
│   │   │   ├── APIKeyManager.tsx  # Generate/revoke keys
│   │   │   └── SystemLogs.tsx  # View logs + alerts
│   │   │
│   │   └── auth/  # Authentication UI
│   │       ├── LoginPage.tsx  # Login form (SSO + fallback)
│   │       ├── SSORedirect.tsx  # OAuth callback handler
│   │       └── ProtectedRoute.tsx  # Route guard
│   │
│   ├── pages/  # Route pages (using TanStack Router)
│   │   ├── Home.tsx  # Dashboard home
│   │   ├── QueryPage.tsx  # Query results page
│   │   ├── AnalyticsPage.tsx  # Analytics dashboards
│   │   ├── AdminPage.tsx  # Admin dashboards
│   │   ├── WorkspacePage.tsx  # Personal/team workspace
│   │   ├── NotFoundPage.tsx  # 404 page
│   │   └── ErrorPage.tsx  # Error boundary
│   │
│   ├── hooks/  # Custom React hooks
│   │   ├── useSSEStream.ts  # SSE consumer for POST /agent/query: manages fetch + ReadableStream parsing
│   │   ├── useGraphStream.ts  # WebSocket consumer for WS /graph/stream: feeds Force-Graph 2D progressively
│   │   ├── useNotifications.ts  # WebSocket consumer for WS /ws system notifications (future)
│   │   ├── useAnalytics.ts  # Fetch analytics data
│   │   ├── useAuth.ts  # Authentication state
│   │   ├── useTheme.ts  # Dark mode toggle
│   │   ├── useLocalStorage.ts  # Persist state to localStorage
│   │   ├── usePagination.ts  # Pagination logic
│   │   └── useDebounce.ts  # Debounce search input
│   │
│   ├── stores/  # Zustand state management
│   │   ├── authStore.ts  # User + auth state
│   │   ├── uiStore.ts  # UI state (theme, sidebar open, etc.)
│   │   ├── filterStore.ts  # Dashboard filters
│   │   └── workspaceStore.ts  # Workspace selections
│   │
│   ├── lib/
│   │   ├── api.ts  # TanStack Query setup + HTTP client
│   │   ├── http.ts  # HTTP client wrapper (JWT refresh)
│   │   ├── auth.ts  # JWT helpers, localStorage auth
│   │   ├── websocket.ts  # WebSocket manager for alerts
│   │   ├── utils.ts  # General utilities (debounce, etc.)
│   │   ├── validators.ts  # Input validation (Zod)
│   │   ├── constants.ts  # App-wide constants
│   │   ├── error-handler.ts  # Centralized error handling
│   │   └── date.ts  # Date formatting helpers
│   │
│   ├── types/
│   │   ├── index.ts  # Re-export all types
│   │   ├── api.ts  # API response types
│   │   ├── user.ts  # User + auth types
│   │   ├── query.ts  # Query + results types
│   │   ├── analytics.ts  # Analytics types
│   │   ├── components.ts  # Component prop types
│   │   └── errors.ts  # Error types
│   │
│   ├── styles/
│   │   ├── globals.css  # Global styles + Tailwind imports
│   │   ├── design-tokens.css  # Design tokens (terracotta, white, dark mode)
│   │   ├── animations.css  # Custom animations (optional)
│   │   └── responsive.css  # Responsive utility classes
│   │
│   └── config/
│       ├── routes.ts  # TanStack Router configuration
│       ├── env.ts  # Environment variables + validation
│       └── queryClient.ts  # TanStack Query client config
│
├── public/  # Static assets
│   ├── logo.svg  # Logo
│   ├── favicon.ico  # Favicon
│   └── assets/  # Images, icons
│
├── tests/
│   ├── __mocks__/  # Mock data + API responses
│   ├── components/  # Component tests (Vitest + RTL)
│   ├── hooks/  # Hook tests
│   ├── utils/  # Utility tests
│   └── setup.ts  # Vitest + RTL setup
│
├── .eslintrc.json  # ESLint config
├── .prettierrc  # Prettier config
└── README.md  # Frontend development guide
Frontend Design Decisions
- Vite + React 18: Fast dev, instant HMR, minimal config. No SSR needed for SPA.
- TanStack Router: Fully typed routing; better DX than React Router v6.
- TanStack Query: Server state management with automatic caching/refetching.
- Zustand: Lightweight client state (theme, UI, filters); no Redux boilerplate.
- shadcn/ui + Tailwind: Copy-paste components, full control, design tokens system.
- Responsive Design: Mobile (320px), Tablet (768px), Desktop (1024px+).
- WebSocket: Native API for real-time alerts; no Socket.io overhead.
- JWT + httpOnly Cookies: Secure auth; backend validates on every request.
API Contract
Authentication
POST /api/auth/login
├─ Request: { email, password } or { sso_provider, sso_token }
├─ Response: { access_token, refresh_token, user: { id, email, role, team_id } }
├─ Sets httpOnly cookie: __auth_token
└─ Bearer token in Authorization header for all subsequent requests
POST /api/auth/refresh
├─ Request: { refresh_token }
├─ Response: { access_token }
└─ Auto-called by frontend before token expires
POST /api/auth/logout
├─ Clears httpOnly cookie
└─ Backend invalidates refresh token in Redis
Agent Webhook Endpoints (Per-Source)
POST /webhooks/jira
├─ Validates Jira webhook signature (X-Atlassian-Webhook-Signature)
├─ Extracts issue_created, issue_updated, comment_created events
├─ Routes to jira_process_issue Celery task (queue=critical)
└─ Returns immediately (202 Accepted)
POST /webhooks/confluence
├─ Validates Confluence webhook signature
├─ Extracts page_created, page_updated, page_trashed events
├─ Routes to confluence_process_page Celery task (queue=critical)
└─ Returns immediately (202 Accepted)
POST /files/upload
├─ Accepts multipart/form-data with file + team_id
├─ Routes to file_process_upload Celery task (queue=critical)
├─ Returns file_id immediately; processing is async
└─ Frontend polls /files/{file_id} for status
POST /jira/sync/{project_key}
├─ Manual trigger; requires admin role
├─ Routes to jira_sync_project Celery task (queue=polling)
└─ Returns job_id for polling
POST /confluence/sync/{space_key}
├─ Manual trigger; requires admin role
├─ Routes to confluence_sync_space Celery task (queue=polling)
└─ Returns job_id for polling
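Webhook signature validation can be sketched with the standard library alone. The example assumes the header carries `sha256=<hex HMAC-SHA256 of the raw body>`, which is the format used by the curl test script in this document; `verify_signature` is a hypothetical helper, and the actual router may differ in details.

```python
import hashlib
import hmac


def verify_signature(secret: str, raw_body: bytes, header_value: str) -> bool:
    """Return True when header_value matches the HMAC-SHA256 of the raw body.

    Assumes the "sha256=<hex>" header format used by the test script in this
    document. The comparison is constant-time to avoid leaking match position.
    """
    digest = hmac.new(secret.encode("utf-8"), raw_body, hashlib.sha256).hexdigest()
    expected = f"sha256={digest}"
    return hmac.compare_digest(expected, header_value)


body = b'{"webhookEvent":"jira:issue_created"}'
good = "sha256=" + hmac.new(b"s3cret", body, hashlib.sha256).hexdigest()
print(verify_signature("s3cret", body, good))         # True
print(verify_signature("s3cret", body + b" ", good))  # False
```

The check has to run on the raw request bytes, before any JSON parsing, because re-serialising the payload can change whitespace and invalidate the signature.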
Query API (Streaming SSE)
POST /agent/query
├─ Request: { query: string, team_id: string, session_id: string }
├─ Response: Content-Type: text/event-stream
│  ├─ event: plan_ready → { tasks: [AgentTask], reasoning: string }
│  ├─ event: agent_started → { agent: "doc_search"|"ticket_lookup"|"live_docs"|"sql_query"|"summariser" }
│  ├─ event: agent_done → { agent: string, chunks: [RetrievedChunk], confidence: "high"|"medium"|"low" }
│  ├─ event: synthesis_started → {}
│  ├─ event: answer_chunk → { chunk: string } (repeats, one per token)
│  ├─ event: guardrail_result → { score: float, escalate: bool }
│  ├─ event: done → {}
│  └─ event: error → { message: string }
└─ Headers: Cache-Control: no-cache, X-Accel-Buffering: no
POST /api/query/{query_id}/feedback
├─ Request: { sentiment: "helpful"|"not_helpful"|"hallucinated", text?: string }
└─ Response: { success: true }
Knowledge Graph API
GET /graph/nodes?limit=50
└─ Response: { count: int, nodes: [{ label: string, name: string }] }
   (excludes Chunk and Document nodes; returns Service/Library/Incident/Team only)
POST /graph/ingest
├─ Request: { chunk_ids: [string], team_id: string }
└─ Response: { ingested: int }
   (fetches chunks from Supabase, runs Gemini extraction, upserts to Neo4j)
GET /graph/traverse?type=incident|service|library&name=string&team_id=string
└─ Response: { type, name, team_id, chunks: [string] }
   (multi-hop Cypher traversal; returns text chunks for context augmentation)
WS /graph/stream
└─ Streams: node events, edge events, then done event (see Real-Time API below)
Analytics API
GET /api/analytics/queries?date_range=30d&team_id=...
└─ Response: {
     query_count: 1243,
     unique_users: 243,
     avg_response_time_ms: 1200,
     success_rate: 0.76,
     trend: { data: [{date, count}] }
   }
GET /api/analytics/knowledge-health
└─ Response: {
     overall_score: 7.2,
     coverage: 0.68,
     freshness: 0.82,
     accuracy: 0.76,
     accessibility: 0.71,
     gaps: [{ topic: "ORM patterns", queries: 12, solutions: 0 }]
   }
GET /api/analytics/dependencies
└─ Response: {
     dependencies: [{name, current_version, latest_version, breaking_changes}],
     alerts: 3
   }
Admin API
POST /api/admin/sources
├─ Request: { type, config, rbac_level }
├─ Response: { id, status, test_result }
└─ Triggers background sync
GET /api/admin/sources
└─ Response: [{ id, type, status, last_sync, record_count }]
PATCH /api/admin/sources/{id}
├─ Request: { name, config, rbac_level }
└─ Response: { updated_source }
DELETE /api/admin/sources/{id}
└─ Soft delete; preserves audit trail
---
POST /api/admin/users/invite
├─ Request: { emails: ["alice@..."], role, team_id }
├─ Response: { invitations: [{ email, invitation_id, expires_at }] }
└─ Sends email invite
GET /api/admin/users
└─ Response: [{ id, email, role, team_id, status, created_at }]
DELETE /api/admin/users/{user_id}
└─ Deactivates user (no hard delete for compliance)
---
POST /api/admin/rbac
├─ Request: { name, description, teams, sources, filters }
├─ Response: { id, policy }
└─ Returns doc count matching policy
GET /api/admin/rbac
└─ Response: [{ id, name, doc_count }]
PATCH /api/admin/rbac/{id}
└─ Update existing policy
---
POST /api/admin/api-keys
├─ Request: { name, permissions, rate_limits, expiry }
├─ Response: { key: "sk_...", created_at }
└─ Only returned once
GET /api/admin/api-keys
└─ Response: [{ name, created_at, last_used, permissions }]
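A key that is "only returned once" usually implies the server keeps just a digest of it. The sketch below illustrates that pattern under stated assumptions: `create_api_key` and `key_matches` are hypothetical helpers, the `sk_` prefix comes from the response shape above, and storing a SHA-256 digest is an assumption about the implementation rather than something this document specifies.

```python
import hashlib
import hmac
import secrets
from typing import Tuple


def create_api_key() -> Tuple[str, str]:
    """Return (plaintext_key, stored_digest).

    The plaintext is shown to the caller once; only the digest is persisted
    (an assumed storage scheme, not confirmed by this document).
    """
    plaintext = "sk_" + secrets.token_urlsafe(32)
    digest = hashlib.sha256(plaintext.encode("utf-8")).hexdigest()
    return plaintext, digest


def key_matches(presented_key: str, stored_digest: str) -> bool:
    """Check a presented key against the stored digest in constant time."""
    presented_digest = hashlib.sha256(presented_key.encode("utf-8")).hexdigest()
    return hmac.compare_digest(presented_digest, stored_digest)
```

Because the plaintext cannot be recovered from the digest, GET /api/admin/api-keys can list metadata such as name and last_used but never the key itself.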
Bash Development Testing
Use these instead of Swagger UI when you need to test streaming behaviour from the terminal.
Test SSE query stream (replaces Swagger, which can't stream SSE):
#!/usr/bin/env bash
# test_query.sh: streams the SSE response token-by-token to stdout
BASE_URL="${GODSPEED_API:-http://localhost:8000}"
curl -N -s \
  -X POST "${BASE_URL}/agent/query" \
  -H "Content-Type: application/json" \
  -d '{"query":"What is the auth service?","team_id":"team-1","session_id":"test-001"}' \
  | while IFS= read -r line; do
      echo "$line"
    done
Test graph REST endpoints:
BASE_URL="${GODSPEED_API:-http://localhost:8000}"
# List all graph nodes
curl -s "${BASE_URL}/graph/nodes?limit=20" | python3 -m json.tool
# Traverse from a service
curl -s "${BASE_URL}/graph/traverse?type=service&name=auth-service&team_id=team-1" \
| python3 -m json.tool
# Ingest chunks into graph
curl -s -X POST "${BASE_URL}/graph/ingest" \
-H "Content-Type: application/json" \
-d '{"chunk_ids":["chunk-abc123"],"team_id":"team-1"}' \
| python3 -m json.tool
Test WebSocket graph stream (requires wscat; install with npm i -g wscat):
BASE_URL="${GODSPEED_WS:-ws://localhost:8000}"
wscat -c "${BASE_URL}/graph/stream"
# Prints node/edge/done events as they arrive
Test Jira webhook signature (bash + openssl):
BASE_URL="${GODSPEED_API:-http://localhost:8000}"
BODY='{"webhookEvent":"jira:issue_created","issue":{"id":"TEST-1","fields":{"summary":"Auth service down"}}}'
SECRET="your_jira_webhook_secret"
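# printf avoids the non-portable `echo -n`; $NF picks the hex digest regardless of the openssl output prefix
SIG="sha256=$(printf '%s' "${BODY}" | openssl dgst -sha256 -hmac "${SECRET}" | awk '{print $NF}')"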
curl -s -X POST "${BASE_URL}/webhooks/jira" \
-H "Content-Type: application/json" \
-H "X-Atlassian-Webhook-Signature: ${SIG}" \
-d "${BODY}"
Test file upload:
BASE_URL="${GODSPEED_API:-http://localhost:8000}"
curl -s -X POST "${BASE_URL}/files/upload" \
-F "file=@/path/to/doc.pdf" \
-F "team_id=team-1"
Real-Time API
There are three distinct real-time channels (two implemented, one planned). Do not conflate them:
Channel 1: Query streaming (SSE)
POST /agent/query → Content-Type: text/event-stream
Emits events in order:
event: plan_ready         data: { tasks: [...], reasoning: "..." }
event: agent_started      data: { agent: "doc_search" }
event: agent_done         data: { agent: "doc_search", chunks: [...], confidence: "high" }
event: synthesis_started  data: {}
event: answer_chunk       data: { chunk: "token text" }   ← repeats per token
event: guardrail_result   data: { score: 0.92, escalate: false }
event: done               data: {}
event: error              data: { message: "..." }        ← on failure
Request body: { query: string, team_id: string, session_id: string }
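A client has to split this stream into events before it can act on them. The following sketch parses decoded SSE lines into (event, payload) pairs; it assumes standard SSE framing (an `event:` line, one or more `data:` lines holding JSON, then a blank line), and `parse_sse` is a hypothetical helper rather than code from the repo.

```python
import json
from typing import Any, Iterable, Iterator, List, Tuple


def parse_sse(lines: Iterable[str]) -> Iterator[Tuple[str, Any]]:
    """Yield (event_name, decoded_json) pairs from decoded SSE lines."""
    event = "message"  # SSE default when no "event:" line is present
    data_lines: List[str] = []
    for raw in lines:
        line = raw.rstrip("\r\n")
        if line == "":
            # A blank line terminates the current event.
            if data_lines:
                yield event, json.loads("\n".join(data_lines))
            event, data_lines = "message", []
        elif line.startswith(":"):
            continue  # comment / keep-alive line
        elif line.startswith("event:"):
            event = line[len("event:"):].strip()
        elif line.startswith("data:"):
            value = line[len("data:"):]
            # The SSE format allows one optional space after the colon.
            data_lines.append(value[1:] if value.startswith(" ") else value)
    if data_lines:
        # Flush a final event that was not followed by a blank line.
        yield event, json.loads("\n".join(data_lines))


stream = [
    "event: plan_ready", 'data: {"tasks": [], "reasoning": "r"}', "",
    "event: answer_chunk", 'data: {"chunk": "Hel"}', "",
    "event: answer_chunk", 'data: {"chunk": "lo"}', "",
    "event: done", "data: {}", "",
]
answer = "".join(p["chunk"] for e, p in parse_sse(stream) if e == "answer_chunk")
print(answer)  # Hello
```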
Channel 2: Knowledge graph visualization (WebSocket)
WS /graph/stream
Emits in order (50ms delay between each):
{ event: "node", id: "...", label: "Service", name: "auth-service" }
{ event: "edge", from: "...", to: "...", rel: "DEPENDS_ON" }
...
{ event: "done", nodes_count: 42, edges_count: 87 }
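On the consuming side, the node, edge, and done messages can be folded into a graph structure as they arrive. The sketch below uses the message fields shown above; `collect_graph` is a hypothetical helper, and treating the counts in the done message as a completeness check is an assumption about how a client might use them.

```python
from typing import Dict, Iterable, List, Tuple


def collect_graph(messages: Iterable[dict]) -> Tuple[Dict[str, dict], List[tuple], bool]:
    """Accumulate streamed graph messages into (nodes, edges, complete)."""
    nodes: Dict[str, dict] = {}
    edges: List[tuple] = []
    complete = False
    for msg in messages:
        kind = msg.get("event")
        if kind == "node":
            nodes[msg["id"]] = {"label": msg["label"], "name": msg["name"]}
        elif kind == "edge":
            edges.append((msg["from"], msg["to"], msg["rel"]))
        elif kind == "done":
            # Compare the announced totals with what was actually received.
            complete = (msg.get("nodes_count") == len(nodes)
                        and msg.get("edges_count") == len(edges))
            break
    return nodes, edges, complete


sample = [
    {"event": "node", "id": "n1", "label": "Service", "name": "auth-service"},
    {"event": "node", "id": "n2", "label": "Library", "name": "pyjwt"},
    {"event": "edge", "from": "n1", "to": "n2", "rel": "DEPENDS_ON"},
    {"event": "done", "nodes_count": 2, "edges_count": 1},
]
nodes, edges, complete = collect_graph(sample)
print(len(nodes), len(edges), complete)  # 2 1 True
```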
Channel 3: System notifications (WebSocket)
WS /ws (future; not yet implemented)
Will emit:
event: "query_answered" → { query_id, new_docs_count }
event: "escalation_spike" → { topic, spike_rate } (manager-only)
event: "breaking_change" → { dependency, version, url } (admin-only)
event: "data_sync_failed" → { source, error } (admin-only)
event: "knowledge_gap" → { topic, query_count } (all users)
Data Flow
Flow 1: Engineer Query → Answer
1. Engineer types query in SearchBox
   ├─ Frontend sends POST /agent/query { query, team_id, session_id }
   └─ Frontend simultaneously opens WS /graph/stream for parallel graph rendering
2. Backend receives the query and answers over an SSE stream
   ├─ LangGraph planner breaks query into AgentTask list → emits plan_ready
   ├─ Each agent runs (doc_search / ticket_lookup / live_docs / sql_query) → emits agent_started + agent_done
   │  └─ doc_search: BGE-M3 embed → Qdrant hybrid search (dense+sparse RRF) → top 50 → BGE reranker → top 5
   ├─ Synthesiser streams answer tokens → emits answer_chunk per token
   └─ Guardrail validates answer against source chunks → emits guardrail_result
3. Guardrail result
   ├─ guardrail_passed=true → done event
   ├─ guardrail_passed=false + escalate=true → warning banner shown in frontend
   └─ Citations come from agent_done chunks (already streamed in step 2)
4. Frontend connects to graph stream (parallel to query SSE)
   ├─ WS /graph/stream streams the pre-built Neo4j graph (query-scoped subgraph)
   ├─ Nodes arrive one-by-one with 50ms delays: { event:"node", label, name }
   ├─ Edges arrive after nodes: { event:"edge", from, to, rel }
   └─ { event:"done" } signals completion
   Note: The knowledge graph is pre-built at ingestion time by Gemini 2.5 Pro
   (graph_store/extractor.py), not extracted from the answer at query time.
5. Frontend receives stream
   ├─ Displays answer immediately (no waiting)
   ├─ Renders citations as they arrive
   ├─ Knowledge graph appears once first connection established
   ├─ Related docs populate as backend fetches
   └─ Full page interactive once final "done" event received
6. Feedback recorded
   ├─ Engineer clicks thumbs up/down
   ├─ Frontend POSTs /api/query/{id}/feedback
   ├─ Backend records sentiment + triggers analytics update
   └─ Feedback visible in query history + aggregated for managers
Flow 2: Data Ingestion (Daily/Polling)
1. Ingestion task triggered
   └─ Webhook from source (e.g., Notion) OR Celery periodic task
2. Fetch stage
   ├─ Adapter queries source API
   ├─ Detects new/updated items (via timestamps or ETags)
   └─ Downloads content
3. Normalize stage (Docling)
   ├─ Converts PDF/HTML/markdown to clean markdown
   ├─ Extracts tables as markdown tables
   └─ Detects code blocks + language
4. PII Mask stage (GLiNER, local)
   ├─ Scans text for PII (names, emails, IDs, etc.)
   ├─ Replaces PII with placeholders (e.g., [REDACTED_EMAIL])
   └─ Logs redaction for audit trail
5. Chunk stage (Semantic)
   ├─ Splits by paragraph/sentence boundaries
   ├─ Never splits code blocks or lists
   ├─ 15% overlap between chunks
   └─ 256-512 tokens per chunk
6. Tag stage (Metadata)
   ├─ Adds source_uri, source_type, ingested_at
   ├─ Adds RBAC tag (public / team / restricted)
   ├─ Computes content_hash (for change detection)
   └─ Detects doc_type (SOP, API doc, PR, etc.)
7. Embed stage
   ├─ Sends chunks to BGE-M3
   ├─ Gets 1024-dim dense vectors
   └─ Extracts sparse BM25-like vectors
8. Index stage
   ├─ Uploads dense vectors to Qdrant HNSW index
   ├─ Uploads sparse vectors to Qdrant sparse index
   ├─ Upserts metadata (PostgreSQL)
   └─ Updates Redis cache (last_sync_timestamp)
9. Complete
   ├─ Backend records sync success in PostgreSQL
   ├─ Triggers webhook for frontend real-time update
   └─ Notifies admins if errors
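Two of the stages above reduce to simple arithmetic: the 15% overlap between chunks and the content hash used for change detection. The sketch below shows only those two pieces. `chunk_tokens` and `content_hash` are hypothetical helpers; the sliding window ignores the semantic rules (paragraph boundaries, unsplit code blocks), and SHA-256 over whitespace-normalised text is an assumed hashing scheme.

```python
import hashlib
from typing import List, Sequence


def chunk_tokens(tokens: Sequence[str], max_tokens: int = 512,
                 overlap_ratio: float = 0.15) -> List[List[str]]:
    """Split tokens into windows of max_tokens that overlap by overlap_ratio."""
    if max_tokens <= 0 or not 0 <= overlap_ratio < 1:
        raise ValueError("max_tokens must be > 0 and 0 <= overlap_ratio < 1")
    overlap = round(max_tokens * overlap_ratio)
    step = max(1, max_tokens - overlap)  # how far the window advances each time
    chunks: List[List[str]] = []
    start = 0
    while start < len(tokens):
        chunks.append(list(tokens[start:start + max_tokens]))
        if start + max_tokens >= len(tokens):
            break  # this window already reached the end
        start += step
    return chunks


def content_hash(text: str) -> str:
    """Hash whitespace-normalised text so cosmetic edits don't force a re-index."""
    normalised = " ".join(text.split())
    return hashlib.sha256(normalised.encode("utf-8")).hexdigest()


words = [str(i) for i in range(100)]
parts = chunk_tokens(words, max_tokens=20, overlap_ratio=0.15)
print(len(parts), parts[0][-3:], parts[1][:3])  # 6 ['17', '18', '19'] ['17', '18', '19']
```

With 20-token windows, a 15% overlap means the last three tokens of each chunk are repeated at the start of the next one.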
Flow 3: Manager Views Analytics
1. Manager navigates to /analytics
2. Frontend loads analytics data
   ├─ GET /api/analytics/queries?date_range=30d
   ├─ GET /api/analytics/knowledge-health
   └─ GET /api/analytics/dependencies
3. Backend aggregates from event logs
   ├─ Queries PostgreSQL (query_events table)
   ├─ Aggregates by date, team, topic
   ├─ Computes trends, success rates
   └─ Identifies gaps from failed queries
4. Frontend renders dashboards
   ├─ Query trends (Recharts line chart)
   ├─ Topics (bar chart)
   ├─ Success rate (gauge)
   ├─ Escalations table
   └─ Knowledge health heatmap
5. Optional: Manager exports report
   ├─ Frontend POSTs /api/analytics/export?format=pdf
   ├─ Backend generates PDF via ReportLab
   └─ Streams PDF download to browser
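The aggregation in step 3 can be illustrated in memory. This is a sketch, not the backend's query: in the real system the work would presumably be done in SQL against the query_events table, and the event fields used here (`user_id`, `ts`, `success`) are hypothetical. The output keys follow the GET /api/analytics/queries response shown earlier.

```python
from collections import Counter
from datetime import datetime
from typing import Dict, List


def summarise_queries(events: List[dict]) -> Dict[str, object]:
    """Aggregate query events into the shape of the analytics response."""
    total = len(events)
    successes = sum(1 for e in events if e["success"])
    per_day = Counter(e["ts"].date().isoformat() for e in events)
    return {
        "query_count": total,
        "unique_users": len({e["user_id"] for e in events}),
        "success_rate": round(successes / total, 2) if total else 0.0,
        # ISO dates sort chronologically, so sorting the keys orders the trend.
        "trend": {"data": [{"date": d, "count": c} for d, c in sorted(per_day.items())]},
    }


events = [
    {"user_id": "u1", "ts": datetime(2024, 5, 1, 9, 0), "success": True},
    {"user_id": "u2", "ts": datetime(2024, 5, 1, 9, 5), "success": False},
    {"user_id": "u1", "ts": datetime(2024, 5, 2, 14, 0), "success": True},
    {"user_id": "u3", "ts": datetime(2024, 5, 2, 16, 30), "success": True},
]
summary = summarise_queries(events)
print(summary["query_count"], summary["unique_users"], summary["success_rate"])  # 4 3 0.75
```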
Deployment Architecture
Development (docker-compose)
services:
  postgres:
    image: postgres:15
    volumes: ["./data/postgres:/var/lib/postgresql/data"]
    ports: ["5432:5432"]
    environment:
      # The official image will not start without a password; these values
      # match the credentials in SQLALCHEMY_DATABASE_URL below.
      POSTGRES_USER: user
      POSTGRES_PASSWORD: pass
      POSTGRES_DB: godspeed
  redis:
    image: redis:7-alpine
    ports: ["6379:6379"]
  qdrant:
    image: qdrant/qdrant:latest
    volumes: ["./data/qdrant:/qdrant/storage"]
    ports: ["6333:6333"]
  backend:
    build: ./backend
    ports: ["8000:8000"]
    depends_on: [postgres, redis, qdrant]
    environment:
      SQLALCHEMY_DATABASE_URL: postgresql://user:pass@postgres:5432/godspeed
      REDIS_URL: redis://redis:6379
      QDRANT_URL: http://qdrant:6333
  frontend:
    build: ./frontend
    ports: ["3000:3000"]
    depends_on: [backend]
    environment:
      VITE_API_BASE_URL: http://localhost:8000
  neo4j:
    image: neo4j:5
    ports: ["7474:7474", "7687:7687"]
    volumes: ["./data/neo4j:/data"]
    environment:
      NEO4J_AUTH: neo4j/godspeed_dev
      NEO4J_PLUGINS: '["apoc"]'
  celery:
    build: ./backend
    command: celery -A src.celery_app worker -Q critical,default,polling -l info
    depends_on: [postgres, redis, qdrant, neo4j]
    environment:
      SQLALCHEMY_DATABASE_URL: postgresql://user:pass@postgres:5432/godspeed
      REDIS_URL: redis://redis:6379
      NEO4J_URI: bolt://neo4j:7687
      NEO4J_USERNAME: neo4j
      NEO4J_PASSWORD: godspeed_dev
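On the application side, these environment variables have to be read and validated at startup. A minimal sketch, assuming a hypothetical `Settings` dataclass and `load_settings` helper (the repo's config modules may use a different mechanism); the variable names are the ones in the compose file above, and only the two that appear on both the backend and celery services are treated as required.

```python
from dataclasses import dataclass
from typing import Mapping, Optional


@dataclass(frozen=True)
class Settings:
    database_url: str
    redis_url: str
    qdrant_url: Optional[str]
    neo4j_uri: Optional[str]


# Only variables set on both the backend and celery services are required here.
REQUIRED = ("SQLALCHEMY_DATABASE_URL", "REDIS_URL")


def load_settings(env: Mapping[str, str]) -> Settings:
    """Build Settings from an environment mapping, failing fast on gaps."""
    missing = [name for name in REQUIRED if not env.get(name)]
    if missing:
        raise RuntimeError("missing environment variables: " + ", ".join(missing))
    return Settings(
        database_url=env["SQLALCHEMY_DATABASE_URL"],
        redis_url=env["REDIS_URL"],
        qdrant_url=env.get("QDRANT_URL"),
        neo4j_uri=env.get("NEO4J_URI"),
    )
```

In a running service this would be called as `load_settings(os.environ)`; passing a plain mapping keeps the function easy to test.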
Production (Kubernetes)
# Deployments
- backend (FastAPI, 3 replicas, HPA)
- frontend (Nginx, 2 replicas, CDN)
- celery-worker (5 replicas, autoscaling on queue depth)
# StatefulSets
- postgres (with backup via S3)
- redis (cluster mode)
- qdrant (with persistence)
# Services
- backend-svc (ClusterIP)
- frontend-svc (LoadBalancer)
- postgres-svc (ClusterIP)
- redis-svc (ClusterIP)
- qdrant-svc (ClusterIP)
# ConfigMaps & Secrets
- app-config (env vars)
- api-keys (AWS S3, Notion OAuth, etc.)
- tls-certs (HTTPS)
# Ingress
- Routes /api/* to backend
- Routes /* to frontend
- TLS termination
Self-Hosted (Single Server)
nginx (reverse proxy, static frontend)
├─ localhost:8000 (FastAPI backend)
├─ localhost:5432 (PostgreSQL)
├─ localhost:6379 (Redis)
└─ localhost:6333 (Qdrant)
All services in systemd or Docker containers
Automated backups via Cron + S3
Monitoring via Prometheus + Grafana (optional)
Key Architectural Principles
- Separation of Concerns: Each layer (adapter, ingestion, retrieval, agent, API) has one responsibility.
- Stateless Backend: FastAPI scales horizontally; state lives in PostgreSQL/Redis.
- Async Everywhere: Celery for long-running tasks; FastAPI with asyncio for I/O.
- RBAC First: All queries filtered by user's team/permissions at retrieval time.
- Streaming Results: Don't wait for complete answer; stream chunks to frontend progressively.
- Local PII: GLiNER runs on-premises; zero data egress for compliance.
- Cacheable at Every Layer: Embeddings cached, searches cached, answers cached (with refresh policy).
- Observable: Structured logging, metrics, traces (OpenTelemetry phase 2).
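The "RBAC First" principle can be illustrated with a predicate applied to retrieved chunks. The three tags (public / team / restricted) come from the ingestion flow in this document; everything else is an assumption made for the example, including the `allowed_user_ids` field, the meaning given to "restricted", and the `is_visible` and `filter_chunks` helpers. In a deployed system the same condition would more likely be pushed down to the vector store as a query filter than applied after retrieval.

```python
from typing import Dict, Iterable, List


def is_visible(chunk: Dict, user: Dict) -> bool:
    """Decide whether a user may see a chunk, based on its RBAC tag."""
    tag = chunk.get("rbac")
    if tag == "public":
        return True
    if tag == "team":
        return chunk.get("team_id") == user["team_id"]
    if tag == "restricted":
        # Assumed semantics: an explicit allow-list of user IDs.
        return user["id"] in chunk.get("allowed_user_ids", ())
    return False  # unknown or missing tag: deny by default


def filter_chunks(chunks: Iterable[Dict], user: Dict) -> List[Dict]:
    """Keep only the chunks the user is allowed to see."""
    return [chunk for chunk in chunks if is_visible(chunk, user)]


user = {"id": "u1", "team_id": "team-1"}
chunks = [
    {"id": "c1", "rbac": "public"},
    {"id": "c2", "rbac": "team", "team_id": "team-1"},
    {"id": "c3", "rbac": "team", "team_id": "team-2"},
    {"id": "c4", "rbac": "restricted", "allowed_user_ids": ["u9"]},
    {"id": "c5"},
]
print([c["id"] for c in filter_chunks(chunks, user)])  # ['c1', 'c2']
```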