GodSpeed / Docs /TECHSTACK.md
Ananth Shyam
feat: implement NL-to-SQL agent with PostgreSQL integration and enhance related documentation
825e852
# Technical Stack
> **Document purpose:** Complete tech stack decisions for backend, frontend, infrastructure, and deployment. Rationale for each choice included.
---
## Table of Contents
1. [Backend Stack](#backend-stack)
2. [Frontend Stack](#frontend-stack)
3. [Infrastructure & DevOps](#infrastructure--devops)
4. [Data Storage & Retrieval](#data-storage--retrieval)
5. [ML / NLP Components](#ml--nlp-components)
6. [Deployment Targets](#deployment-targets)
---
## Backend Stack
### Core Runtime & API Framework
| Component | Technology | Rationale | Version |
|---|---|---|---|
| **Runtime** | Python 3.11+ | Standard for ML/data pipelines; excellent async support (asyncio); rich data science ecosystem | 3.11+ |
| **API Framework** | FastAPI + Uvicorn | Native async, automatic OpenAPI docs, built-in dependency injection, <100ms request overhead | 0.104+ |
| **Async Task Queue** | Celery + Redis broker | Battle-tested for background jobs; Redis for pub/sub and state; supports task retries and rate limiting | Celery 5.3+ |
| **HTTP Client** | httpx | Async-first, mirrors requests API; used for external doc fetching (Firecrawl, Tavily) | 0.25+ |
| **Type Hints** | Pydantic v2 | Runtime validation, JSON schema generation, strong typing for API contracts | 2.5+ |
### Data Processing & Ingestion
| Component | Technology | Rationale | Purpose |
|---|---|---|---|
| **Document Parsing** | Docling | Handles multi-format docs (PDF, HTML, markdown); preserves tables and code blocks | PDF/HTML/markdown normalization |
| **PII Detection** | GLiNER (local) | Zero egress; runs on-device; detects names, emails, IDs, addresses; GDPR/DPDP compliant | Pre-ingestion masking |
| **Chunking** | Custom semantic chunker (src/ingestion/) | Respects paragraph/code block boundaries; 15% overlap; prevents mid-function splits | Semantic chunking |
| **Embedding Model** | BGE-M3 (BAAI) | Multilingual dense + sparse in one pass; 8,192 token context; 384-dim output | Dense embeddings |
| **Reranker** | BGE-reranker-v2-m3 | Reranks top 50 to top 5; multilingual; efficient | Re-ranking after fusion |
### Vector Store & Semantic Search
| Component | Technology | Rationale | Notes |
|---|---|---|---|
| **Vector Database** | Qdrant (self-hosted or cloud) | Horizontal scalable; batch upload API; RBAC support; integrated sparse vector index (Rust backend) | Production-grade RAG |
| **Sparse Indexing** | Qdrant's native sparse index | BM25-equivalent without separate Elasticsearch; reduces infrastructure | Hybrid search |
| **Full-Text Search** | Qdrant sparse vectors + RRF | Reciprocal Rank Fusion merges dense + sparse scores; deterministic, no tuning needed | Hybrid RAG |
### Agentic System & Orchestration
| Component | Technology | Rationale | Purpose |
|---|---|---|---|
| **Agent Framework** | LangGraph | Stateful graph execution; explicit node definitions; ReAct pattern built-in | Multi-agent coordination |
| **LLM Integration** | langchain-google-genai | Unified Gemini interface via LangChain; all agents use Gemini models β€” no OpenAI dependency in production | Generator + Critic agents |
| **Primary LLM** | Gemini 2.5 Pro (Google) | Used for planner, synthesiser, and CAG; long context window; best reasoning quality | Planner, Synthesiser (`PLANNER_MODEL`, `SYNTHESISER_MODEL`) |
| **Fast LLM** | Gemini 2.5 Flash (Google) | Cheap, low-latency inference for light tasks | Summariser, Guardrail, Graph extraction, NL-to-SQL translation (`SUMMARISER_MODEL`, `GUARDRAIL_MODEL`, `GRAPH_EXTRACTION_MODEL`) |
| **Extraction LLM** | Gemini 2.5 Flash (Google) | Entity/relationship extraction from chunks at ingestion time; strict output schema | graph_store/extractor.py |
| **Local LLM Fallback** | Ollama + Mistral-7B | For air-gapped deployments; lower quality but zero API dependency | HIPAA/FedRAMP compliance |
### Data Storage
| Component | Technology | Rationale | Purpose |
|---|---|---|---|
| **Primary DB** | PostgreSQL 15+ (Supabase) | ACID guarantees; full-text search via pg_trgm; JSON support; RBAC audit tables; direct connection via asyncpg for NL-to-SQL agent | Metadata, audit trails, RBAC, sql_query tool |
| **PostgreSQL Driver** | asyncpg 0.29+ | Native async PostgreSQL driver; used by sql_query tool for direct parameterized query execution | NL-to-SQL agent (`DATABASE_URL`) |
| **Cache Layer** | Redis (self-hosted or Upstash) | Sub-millisecond reads; sorted sets for leaderboards; pub/sub for real-time notifications | Session state, caching |
| **Knowledge Graph DB** | Neo4j (self-hosted) | Native graph traversal; Cypher query language; MERGE for idempotent upserts; indexes on chunk_id, doc_id, team_id | Entity/relationship graph, multi-hop traversal for context augmentation |
| **Document Storage** | S3-compatible (AWS S3 or MinIO) | Durable long-term storage for PDFs, uploaded files, exports | PDFs, user uploads |
| **Time-Series DB** (Phase 2) | TimescaleDB (PostgreSQL extension) | Query volume trends, latency metrics, user activity patterns; built on PostgreSQL | Analytics time-series |
### Integrations & Connectors
| Component | Technology | Rationale | Purpose |
|---|---|---|---|
| **Notion** | notion-sdk-py | Official SDK; handles pagination, auth refresh | Notion workspace sync |
| **Confluence** | atlassian-python-api | Community-maintained; REST v2 support | Confluence space sync |
| **GitHub** | PyGithub | Community standard; handles GraphQL fallback; rate limit management | GitHub repo + PR sync |
| **Slack** | slack-sdk | Official SDK; event subscriptions, bolt framework for webhooks | Slack DM queries, alerts |
| **Jira** | atlassian-python-api | Same maintainer as Confluence; REST v3 | Jira issue sync |
| **External Docs** | Firecrawl (primary) + BeautifulSoup4 (fallback) | Firecrawl handles JS-rendered SPAs; BS4 fallback for static sites | Live doc fetching |
| **Web Search** | Tavily API | Specialized for RAG; better results than Google for documentation | Web search fallback |
### Development & Testing
| Component | Technology | Rationale | Purpose |
|---|---|---|---|
| **Testing Framework** | pytest + pytest-asyncio | De-facto standard; strong fixture support; async test support | Unit + integration tests |
| **Test Coverage** | Coverage.py | Track line and branch coverage; integrates with pytest | Coverage reporting |
| **Linting** | Ruff | Ultra-fast (Rust-based); replaces flake8 + isort | Code quality |
| **Type Checking** | pyright | Strict mode by default; catches more errors than mypy | Type safety |
| **Code Formatting** | Black | Deterministic; removes style debates | Code style |
---
## Frontend Stack
### Core Framework & Build
| Component | Technology | Rationale | Version |
|---|---|---|---|
| **Framework** | React 18 + TypeScript | Component-driven; strong typing; ecosystem; learning curve justified by team size | 18.2+ |
| **Build Tool** | Vite | 10x faster dev startup than webpack; near-instant HMR; minimal config | 5.0+ |
| **Type Safety** | TypeScript 5 | End-to-end type safety; integrates with Vite; strict mode enforced | 5.3+ |
| **Routing** | TanStack Router | Fully typed routing; nested routes; TypeScript-first; smaller bundle than React Router v6 | 1.28+ |
| **HTTP Client** | TanStack Query (React Query) | Server state management; automatic caching/invalidation; background refetching | 5.28+ |
| **Client State** | Zustand | Lightweight; no boilerplate; TypeScript-first; DevTools support | 4.4+ |
| **Forms** | React Hook Form + Zod | Minimal re-renders; schema-driven validation; TypeScript support; integrates with shadcn | 7.0+ / 3.22+ |
### UI Components & Styling
| Component | Technology | Rationale | Purpose |
|---|---|---|---|
| **Component Library** | shadcn/ui (Radix primitives) | Unstyled accessibility-first primitives; copy-paste model (full control); Tailwind-based | UI components |
| **Styling** | Tailwind CSS 3 | Utility-first; design tokens system; dark mode support; minimal CSS bloat | Styling system |
| **Design Tokens** | Tailwind config + CSS variables | Terracotta/White primary palette; dark mode support; custom spacing/colors | Design consistency |
| **Icons** | Lucide React | 1,500+ modern icons; tree-shakeable; TypeScript support | Icon set |
| **Animations** | Framer Motion (opt-in) | Smooth transitions where needed; not required for MVP; zero-config for basics | Micro-interactions |
### Data Visualization & Tables
| Component | Technology | Rationale | Purpose |
|---|---|---|---|
| **Charts** | Recharts | React-first; responsive; TypeScript support; integrates well with Tailwind | Query analytics graphs |
| **Data Tables** | TanStack React Table (headless) | Headless table library; sorting/filtering/pagination built-in; zero styling opinions; composable | Query history, data tables |
| **Real-Time Graph** | Force-Graph 2D (canvas-based) | WebGL rendering; 100k+ nodes performant; knowledge graph visualization | Knowledge graph display |
| **Time-Series** | Recharts (LineChart) | Sufficient for phase 1; can swap to Apache ECharts if needed | Query volume trends |
### Authentication & Security
| Component | Technology | Rationale | Purpose |
|---|---|---|---|
| **Auth Flow** | JWT + refresh tokens (backend-issued) | Stateless; supports mobile/SPA; backend validates on each request | User authentication |
| **Storage** | httpOnly cookies (primary) + sessionStorage fallback | Protects against XSS (httpOnly); CORS-aware | Token storage |
| **RBAC Enforcement** | Client-side + backend validation | Client-side for UX (show/hide features); backend for security (always validate) | Role-based access |
| **HTTPS** | TLS 1.3+ | Enforced in production; self-signed in dev (certbot for self-hosted) | Transport security |
### Real-Time Communication
| Component | Technology | Rationale | Purpose |
|---|---|---|---|
| **SSE (Server-Sent Events)** | Native `EventSource` / `fetch` with `text/event-stream` | Used for query streaming: `plan_ready`, `agent_started`, `agent_done`, `answer_chunk`, `done`, `error` β€” each event arrives as tokens stream from the LLM | Query result streaming from `POST /agent/query` |
| **WebSocket Client** | Native WebSocket API (no Socket.io for MVP) | Two uses: (1) knowledge graph visualization β€” streams nodes/edges from `WS /graph/stream` as Neo4j traversal results arrive; (2) real-time system notifications | Graph visualization + alerts |
| **Message Format** | JSON over SSE/WebSocket | SSE events: `data: {...}\n\n`; WebSocket: `{event, id, label, name}` / `{event, from, to, rel}` | Real-time updates |
| **Reconnection** | Exponential backoff (client-side) | Handles network hiccups; prevents thundering herd on backend | Connection resilience |
### Build & Development
| Component | Technology | Rationale | Purpose |
|---|---|---|---|
| **Package Manager** | pnpm | Faster than npm; lower disk usage; monorepo-friendly | Dependency management |
| **Linting** | ESLint + @typescript-eslint | Enforce code quality; catch common mistakes; TypeScript-aware | Code quality |
| **Formatting** | Prettier | Opinionated; removes style debates; integrates with ESLint | Code formatting |
| **Testing** | Vitest + React Testing Library | Vitest is Vite-native (instant); RTL tests real behavior, not implementation | Unit + integration tests |
| **E2E Testing** | Playwright | No JavaScript setup required; cross-browser; fast; integrates with CI | End-to-end tests |
| **Environment Management** | .env.local + .env.example | Vite built-in support; clear documentation of required vars | Config management |
### Deployment
| Component | Technology | Rationale | Purpose |
|---|---|---|---|
| **Build Output** | Static SPA (dist/) | No SSR needed; served by CDN or simple HTTP server; cacheable assets | Production build |
| **Hosting (Self-Hosted)** | Nginx reverse proxy + Docker | Minimal footprint; serves static assets + proxies /api to backend | Self-hosted deployment |
| **Hosting (Cloud)** | Vercel or Netlify (optional) | One-command deploy; auto-staging; analytics; functions for edge logic | Cloud deployment |
| **Docker** | Node.js 20-alpine (builder) + nginx:alpine (runtime) | Multi-stage build; minimal final image size; nginx for efficient static serving | Containerized deployment |
---
## Infrastructure & DevOps
### Containerization
| Component | Technology | Rationale | Purpose |
|---|---|---|---|
| **Container Runtime** | Docker | Industry standard; Dockerfile for reproducible builds | Containerization |
| **Orchestration** | Kubernetes (optional for scale) | Docker Compose for dev/single-server; K8s for multi-region scale | Container orchestration |
| **Image Registry** | Docker Hub or private ECR | Public for OSS; private for enterprise images | Image storage |
### CI/CD Pipeline
| Component | Technology | Rationale | Purpose |
|---|---|---|---|
| **CI Platform** | GitHub Actions | Free for public repos; integrated with GitHub; sufficient for MVP | Automation |
| **Stages** | Lint β†’ Test β†’ Build β†’ Deploy | Standard pipeline; catches issues early; reproducible builds | Code quality gates |
| **Artifact Storage** | GitHub Artifacts (CI) / Docker Hub (images) | Temporary CI artifacts; persistent Docker image registry | Build artifacts |
### Monitoring & Observability (Phase 2)
| Component | Technology | Rationale | Purpose |
|---|---|---|---|
| **Logs** | Structured JSON logs (backend) + browser console (frontend) | ELK stack optional; JSON enables machine parsing | Log aggregation |
| **Metrics** | Prometheus-compatible endpoint (backend) | Counterpart Grafana dashboards; industry standard | Metrics collection |
| **Tracing** | OpenTelemetry (optional) | Distributed tracing for multi-service requests | Request tracing |
| **Alerts** | Prometheus AlertManager + PagerDuty (optional) | Route critical alerts to on-call | Alert routing |
---
## Data Storage & Retrieval
### Vector Embeddings Pipeline
| Stage | Technology | Input | Output |
|---|---|---|---|
| **Encode Dense** | BGE-M3 (BAAI) | Text chunks (up to 8,192 tokens) | 384-dim dense vectors |
| **Encode Sparse** | BGE-M3 built-in | Text chunks | Sparse BM25-like vectors |
| **Index Dense** | Qdrant HNSW | 384-dim vectors | HNSW graph (fast ANN search) |
| **Index Sparse** | Qdrant sparse index | Sparse vectors | Inverted index |
| **Fuse Rankings** | RRF (Reciprocal Rank Fusion) | Top 50 dense + Top 50 sparse | Top 50 merged by score |
| **Rerank** | BGE-reranker-v2-m3 | Top 50 fused results + query | Top 5 re-ranked |
| **Compress Context** | LLMContextCompress (custom) | Top 5 chunks (2–3 pages) | Summarised, deduplicated context |
### Similarity Metrics
| Query Type | Metric | Notes |
|---|---|---|
| **Semantic search** (dense) | Cosine similarity | Default; works well for queries with paraphrasing |
| **Keyword search** (sparse) | BM25-equivalent | Catches exact phrase matches |
| **Hybrid** (both) | RRF | Merges dense + sparse scores deterministically |
---
## ML / NLP Components
### Transformer Models (Hosted Locally or via API)
| Model | Purpose | Provider | Size | Cost |
|---|---|---|---|---|
| **BGE-M3** | Dense + Sparse embedding | Hugging Face (self-host) | 568M | Free |
| **BGE-reranker-v2-m3** | Re-ranking | Hugging Face (self-host) | 1.2B | Free |
| **GLiNER (base)** | PII detection | Hugging Face (self-host) | 333M | Free |
| **Gemini 2.5 Pro** | Planner + Synthesiser LLM | Google AI API (`GOOGLE_API_KEY`) | Hosted | Pay-per-token |
| **Gemini 2.5 Flash** | Summariser + Guardrail + Graph extraction | Google AI API (`GOOGLE_API_KEY`) | Hosted | Pay-per-token (cheaper) |
| **Mistral-7B** | Air-gapped fallback | Ollama (self-host) | 7B | Free (inference) |
### NLP Pipelines (Python)
| Component | Library | Purpose |
|---|---|---|
| **Tokenization** | spaCy v3 | Sentence tokenization, POS tagging for chunking logic |
| **Text Preprocessing** | NLTK / spaCy | Stop word removal, lemmatization for sparse indexing |
| **Named Entity Recognition** | GLiNER + spaCy | PII detection (GLiNER), domain entities (spaCy) |
| **Semantic Similarity** | BGE-M3 locally | Chunk-to-chunk similarity for deduplication |
---
## Deployment Targets
### Target 1: Self-Hosted (Single Server)
**Stack:**
- PostgreSQL 15 (primary DB)
- Redis (cache + queues)
- Qdrant (vector DB)
- FastAPI backend (single process + Gunicorn)
- React SPA frontend (Nginx)
- Ollama (optional, for air-gapped)
**Deployment:**
```bash
docker-compose up -d # Prod-grade docker-compose.yml with all services
```
**Hardware:** 8 vCPU, 32GB RAM, 500GB SSD (scales to 4TB for large deployments)
### Target 2: Kubernetes Multi-Region
**Services:**
- PostgreSQL (managed RDS / Cloud SQL) + read replicas
- Redis Cluster (Upstash / ElastiCache)
- Qdrant cluster (Helm chart, S3 backup)
- FastAPI service (HPA auto-scaling)
- React frontend (CDN + load balancer)
**Deployment:** Helm charts (phase 2)
### Target 3: Hybrid Air-Gapped (No External APIs)
**Fallbacks:**
- Ollama (Mistral-7B) instead of Gemini API
- Local embeddings (BGE-M3 on CPU or GPU)
- Self-hosted Firecrawl instance (optional)
- No external search (Tavily fallback disabled)
**Data Compliance:** GDPR/HIPAA/FedRAMP ready
---
## Summary: MVP Tech Stack Checklist
- [x] Backend: FastAPI + PostgreSQL + Redis + Qdrant
- [x] LLM: Gemini 2.5 Pro (planner + synthesiser) + Gemini 2.5 Flash (summariser + guardrail + graph extraction)
- [x] Frontend: React 18 + TypeScript + Vite + shadcn/ui
- [x] Ingestion: Docling + GLiNER + semantic chunking
- [x] Retrieval: BGE-M3 + RRF + BGE-reranker-v2-m3
- [x] NL-to-SQL: sql_query agent (Gemini Flash β†’ validated SELECT β†’ asyncpg)
- [x] Validation: Generator + Critic (LangGraph)
- [x] Real-Time: SSE (query streaming) + WebSocket (graph visualization + notifications) + Redis pub/sub
- [x] Knowledge Graph: Neo4j + Gemini 2.5 Pro extraction (graph_store/)
- [x] Auth: JWT + httpOnly cookies
- [x] Deployment: Docker + docker-compose (MVP) + Kubernetes (scale)
---
## Phase 2 Additions
- [ ] Monitoring: Prometheus + Grafana
- [ ] Tracing: OpenTelemetry + Jaeger
- [ ] Advanced Auth: OAuth2 + SSO (OIDC)
- [ ] Caching Strategy: Edge caching via CDN
- [ ] Rate Limiting: Redis-based token bucket
- [x] Webhook Signing: HMAC-SHA256 (Slack, GitHub, Jira) β€” already implemented in src/integrations/webhooks.py