Spaces:

Aaryan17
/

MAC

Running

App Files Files Community

MAC / docs /ARCHITECTURE.md

Aaryan17

chore: upload MAC codebase to HF Space

0e76632 verified 20 days ago

preview code

raw

history blame contribute delete

32.2 kB

	![1777301208961](image/ARCHITECTURE/1777301208961.png)![1777301226270](image/ARCHITECTURE/1777301226270.png)# MAC — Architecture Reference

	> Audience: an AI coding agent (or new engineer) dropped into this repo with no prior context.
	> Goal: understand the system end-to-end — every subsystem, the data flow, where state lives, and how the pieces secure and observe each other.
	> Read [README.md](README.md) for the elevator pitch and [MAC-PROGRESS.md](MAC-PROGRESS.md) for the build log. This file is the map.

	---

	## 0. Identity in one paragraph

	MAC (MBM AI Cloud) is a self-hosted, on-prem AI platform for MBM University Jodhpur. It gives students/faculty a private ChatGPT-style chat, a notebook IDE, RAG over college docs, an attendance system using face capture, an exam copy-check workflow with AI vision + plagiarism detection, and an admin/cluster console — all powered by open-source LLMs running on the college's own GPUs. There are no external API calls; vLLM serves models locally, and worker GPUs are added by enrolling them into the cluster.

	---

	## 1. Top-level topology

	```
	┌────────────────────────────────────────────────────────────────┐
	│ CLIENTS │
	│ • Web (SvelteKit PWA, served by Nginx in prod) │
	│ • API consumers (curl / Python SDK / scripts) │
	└──────────────────────────┬─────────────────────────────────────┘
	│ HTTPS
	▼
	┌──────────────────────┐
	│ NGINX │ ← TLS, gzip, /api → mac, / → static
	└──────────┬───────────┘
	│
	┌──────────────────┴──────────────────┐
	▼ ▼
	┌─────────────────┐ ┌──────────────────────┐
	│ SvelteKit │ │ FastAPI (mac.main) │
	│ static build │ │ /api/v1/* │
	└─────────────────┘ └──────────┬───────────┘
	│
	┌───────────────────────┬───────────────────┼─────────────────────────┐
	▼ ▼ ▼ ▼
	┌────────────┐ ┌────────────┐ ┌──────────────┐ ┌────────────────┐
	│ PostgreSQL │ │ Redis │ │ Qdrant │ │ SearXNG │
	│ (primary) │ │ cache / │ │ (RAG vec) │ │ (web search) │
	│ Alembic │ │ bl / rl │ └──────────────┘ └────────────────┘
	└────────────┘ └────────────┘

	▲ load_balancer.get_best_worker()
	│
	┌───────────────────────┴───────────────────────────────────────────────┐
	│ MAC CLUSTER (GPU workers, any LAN PC) │
	│ ┌─────────────────┐ ┌─────────────────┐ ┌────────────────┐ │
	│ │ vLLM (OpenAI │ │ Jupyter kernel │ │ worker_agent.py│ │
	│ │ compatible) │ │ gateway (opt.) │ │ (heartbeat) │ │
	│ └─────────────────┘ └─────────────────┘ └────────────────┘ │
	└────────────────────────────────────────────────────────────────────────┘
	```

	- Master node runs FastAPI + Postgres + Redis + Nginx + Qdrant + SearXNG.
	- Worker nodes run vLLM + an optional Jupyter kernel gateway, plus [worker_agent.py](worker_agent.py) which self-registers via an enrollment token and sends a heartbeat every 10s (GPU util, VRAM, RAM, CPU).
	- Routing is master-side: every user request hits the master API, which uses [mac/services/load_balancer.py](mac/services/load_balancer.py) to score-pick the best worker for an LLM call or notebook kernel.

	---

	## 2. Repository map (what lives where)

	```
	mac/
	main.py FastAPI app, lifespan (DB init, dev seeds, bg tasks),
	router mounts under /api/v1, root SPA fallback.
	config.py Pydantic Settings — every env var + .env loader.
	database.py Async SQLAlchemy engine + session factory; `Base`.
	utils/security.py JWT encode/decode + jti generation; password hash.
	middleware/
	auth_middleware.py Bearer extractor → JWT \| legacy-key \| scoped-key → User.
	rate_limit.py Per-user req/hour + token/day; injects X-RateLimit-*.
	feature_gate.py feature_required("ai_chat") dependency.
	models/ SQLAlchemy ORM models (one file per domain).
	schemas/ Pydantic request/response schemas.
	services/ Pure business logic, no HTTP — called by routers.
	routers/ FastAPI routers, thin: validate → call service → return.

	frontend/ SvelteKit 2 + Svelte 5 PWA.
	src/routes/ File-system routing: login, setup, chat, dashboard,
	admin, cluster, keys, settings, notifications, rag.
	src/lib/api.js Single fetch wrapper; one export per backend domain.
	src/lib/stores.js Svelte stores (auth, setup, features, chat, toast).
	src/lib/i18n.js 19 Indian languages, lazy-loaded strings, RTL support.
	static/manifest.json PWA manifest; static/sw.js is a no-cache worker.

	alembic/ Migration env + versioned revisions.
	nginx/ nginx.conf (HTTP) + nginx.https.conf (TLS).
	docker-compose.yml Master stack.
	docker-compose.worker.yml Worker stack (vLLM + worker_agent).
	worker_agent.py Enrollment + heartbeat agent for a GPU node.
	installer/ Windows installer (PyInstaller) + branding assets.
	tests/ pytest suite.
	```

	---

	## 3. Request lifecycle (the universal path)

	Every authenticated `/api/v1/*` request goes through these layers in order. Knowing this map means you can audit any new endpoint quickly.

	```
	HTTP request
	│
	▼
	[1] CORS middleware (mac/main.py — allow_origins from settings)
	│
	▼
	[2] Route handler (FastAPI) (mac/routers/*.py)
	│ Depends(get_current_user)
	▼
	[3] Auth resolver (mac/middleware/auth_middleware.py)
	│ Bearer token → branch:
	│ • mac_sk_live_* → legacy API key (User.api_key)
	│ • mac_sk_* → scoped API key (hashed, scopes, expiry, revocable)
	│ • else → JWT (verify sig, check exp, check jti blacklist)
	│ → returns User or raises 401
	│
	▼
	[4] Role guard (optional) require_admin / require_faculty_or_admin
	│
	▼
	[5] Feature gate (optional) feature_required("ai_chat")
	│ → reads system_config / feature_flags table → 403 if disabled for role
	│
	▼
	[6] Rate limit (optional) check_rate_limit
	│ • requests/hour from usage_log (per-user)
	│ • tokens/day from usage_log (per-user)
	│ • injects X-RateLimit-* into request.state
	│
	▼
	[7] Service layer mac/services/*.py
	│ Business logic — never imports FastAPI; takes db: AsyncSession.
	│
	▼
	[8] Response → HTTP middleware inject_rate_limit_headers reads request.state
	and stamps headers onto the response
	```

	This separation is the single most important design rule:
	routers do parsing + auth + I/O orchestration; services do business logic; models do persistence. Anything calling FastAPI types from a service is a smell.

	---

	## 4. Identity & access — auth, sessions, keys

	There are three ways a request authenticates, all collapsed to a `User` by `get_current_user`:

	### 4.1 JWT (interactive users)
	- Login: `POST /api/v1/auth/login` with `{roll_number, password}` → `{access_token, refresh_token, user}`.
	- Access token lifetime: `JWT_ACCESS_TOKEN_EXPIRE_MINUTES` (default 1440 = 24h).
	- Every access token carries a `jti` (random UUID) baked into the JWT claims by [mac/utils/security.py](mac/utils/security.py).
	- `POST /api/v1/auth/logout` blacklists the current `jti` in Redis with a TTL equal to remaining token life ([token_blacklist_service.py](mac/services/token_blacklist_service.py)). Refresh tokens are also revoked. Falls back to an in-process set if Redis is unreachable (dev only).
	- The JWT signing secret is not read from env in production — it's stored in `system_config` and seeded on first boot by [setup_service.get_or_generate_jwt_secret](mac/services/setup_service.py). This means restarting the app does not invalidate everyone's sessions.

	### 4.2 Legacy API keys
	- Format: `mac_sk_live_<48 hex chars>`. Stored on `users.api_key`. One per user.
	- Use case: scripts that need a stable long-lived credential.
	- Resolved before JWT in `auth_middleware` because of the prefix check.

	### 4.3 Scoped API keys
	- Format: `mac_sk_<random>`, hashed at rest. Created via `/api/v1/scoped-keys`.
	- Carry: scopes (list of allowed endpoints), optional expiry, label, revoke flag.
	- Resolved by [scoped_key_service.get_key_by_hash](mac/services/scoped_key_service.py).
	- Attached to `user._scoped_key` for downstream scope enforcement.

	### 4.4 Roles
	- `admin` \| `faculty` \| `student`. Enforced at the router layer via `require_admin` / `require_faculty_or_admin` dependencies.
	- Feature flags layer on top: a feature can be enabled globally but restricted to specific roles (see `feature_flags.roles`).

	### 4.5 First-run onboarding
	- `GET /api/v1/setup/status` → `{is_first_run, has_jwt_secret, version}`. Frontend uses this to decide whether to show the setup wizard or login.
	- `POST /api/v1/setup/create-admin` provisions the first admin and seals the system.

	---

	## 5. LLM serving & cluster routing

	### 5.1 Model registry — three layers of override
	`mac/services/llm_service.py::_BUILTIN_MODELS` holds the defaults (Qwen2.5 7B, Qwen2.5-Coder 7B/AWQ, DeepSeek-R1, etc.). Each entry knows its `served_name` (HF repo), `category` (`speed \| code \| reasoning \| intelligence`), `capabilities`, and `url_key` pointing at one of `vllm_speed_url \| vllm_code_url \| …` in `Settings`.

	Override priority:
	1. `MAC_MODELS_JSON` env var (a full JSON array) — replaces the registry entirely.
	2. `MAC_ENABLED_MODELS` env var (comma-separated IDs) — filters which built-ins are exposed.
	3. `MAC_AUTO_FALLBACK` — what `model="auto"` resolves to.

	### 5.2 The system prompt is forced
	`_inject_system_prompt` in `llm_service` prepends a hard-coded MAC identity prompt to every chat completion. This prevents the underlying Qwen/DeepSeek model from claiming to be "Qwen made by Alibaba" — it always says it is MAC, built by MBM University. If the user supplied a system message, MAC's identity is concatenated in front of theirs.

	### 5.3 Routing decision (where does this call go?)
	```
	chat request
	│
	▼
	llm_service._resolve_model_cluster(model_id)
	│
	▼
	load_balancer.get_best_worker(db, model_id)
	│ SELECT WorkerNode JOIN NodeModelDeployment
	│ WHERE node.status='active' AND deployment.status='ready'
	│ AND last_heartbeat within 30s
	│ ORDER BY gpu_util0.5 + (vram_used/total)0.3
	│
	├── candidate found → POST http://{node.ip}:{deployment.port}/v1/chat/completions
	│
	└── none → fall back to local config (settings.vllm_<category>_url)
	```

	vLLM speaks the OpenAI-compatible API, so the proxy is a near-pass-through with SSE streaming preserved end-to-end.

	### 5.4 Cluster lifecycle
	\| Event \| Endpoint \| Auth \| Effect \|
	\|---\|---\|---\|---\|
	\| Admin mints token \| `POST /cluster/enroll-token` \| admin JWT \| Single-use, expiring `EnrollmentToken` row \|
	\| Worker registers \| `POST /cluster/register` \| enroll token \| Creates `WorkerNode` (status `pending`) + reports IP, GPU specs \|
	\| Admin approves \| `POST /cluster/nodes/{id}/action {action:"approve"}` \| admin \| `status → active` \|
	\| Worker heartbeats \| `POST /cluster/heartbeat` \| node token \| Updates `last_heartbeat`, GPU util, VRAM, CPU, RAM, queue depth — also append-only into `cluster_heartbeats` (time-series for charts) \|
	\| Worker reports models \| (in heartbeat payload) \| — \| Upserts `NodeModelDeployment` rows \|
	\| Drain / remove \| `POST /cluster/nodes/{id}/action` \| admin \| Stops new traffic; allows in-flight to finish \|

	Workers older than 30s without a heartbeat are silently skipped by the balancer — no manual intervention needed if a worker dies.

	---

	## 6. Notebooks — multi-language code execution

	This is the most operationally complex subsystem. The design supports two backends and distributed execution.

	### 6.1 Architecture
	```
	Client (browser)
	│ WebSocket /ws/notebook/{notebook_id}?token=JWT
	▼
	mac/routers/notebook_ws.py
	│ • verifies JWT (decode_access_token, no DB hit on hot path)
	│ • registers connection in _connections[notebook_id]
	▼
	kernel_manager (mac/services/kernel_manager.py)
	│ Backend selection at startup:
	│ _docker_available() → Docker mode
	│ else → subprocess mode (dev)
	│
	├── DOCKER MODE (production)
	│ • spawns mac-kernel-{lang} container (image_prefix in config)
	│ • applies memory + CPU limits from settings
	│ • optionally attaches GPU (--gpus all) for ML kernels
	│ • streams stdout/stderr back as JSONL events
	│
	├── SUBPROCESS MODE (dev)
	│ • runs the language interpreter directly on the host
	│ • no isolation; only safe for trusted local dev
	│
	└── REMOTE WORKER MODE
	• load_balancer.get_notebook_worker(db) picks a worker with notebook_port
	• forwards the execute via the worker's Jupyter kernel gateway
	• output streams back to the master, then to the client
	```

	### 6.2 WebSocket protocol
	Defined at the top of [notebook_ws.py](mac/routers/notebook_ws.py):

	\| Direction \| Type \| Payload \|
	\|---\|---\|---\|
	\| C→S \| `execute` \| `{cell_id, code, language}` \|
	\| C→S \| `interrupt` \| `{kernel_id}` \|
	\| C→S \| `ping` \| — \|
	\| S→C \| `status` \| `{cell_id, execution_state: busy\\|idle}` \|
	\| S→C \| `stream` \| `{cell_id, name: stdout\\|stderr, text}` \|
	\| S→C \| `error` \| `{cell_id, ename, evalue, traceback[]}` \|
	\| S→C \| `pong` \| — \|

	### 6.3 State & limits
	- `KernelInstance` per session: `id`, `language`, `node_id`, `container_id`, `status`, `last_activity`, `execution_count`.
	- Idle kernels are reaped after `kernel_timeout` seconds (default 120).
	- Max concurrent kernels per node: `kernel_max_per_node` (default 10).
	- Persistent notebook content: `notebooks` table; cells stored as JSON, ordered.

	### 6.4 Why a custom protocol and not raw Jupyter?
	Three reasons: (a) we need user-scoped auth via our JWT; (b) we need to fan-out execution across the cluster, not just one local kernel; (c) we want the option to swap kernels for sandboxed runners later without changing the wire format.

	---

	## 7. RAG — private document search

	Pipeline: upload → chunk → embed → store → retrieve → augment.

	```
	PDF/MD/TXT upload (POST /rag/upload)
	│
	▼
	rag_service.ingest_document
	│ • text extraction (pypdf for PDF, plain read otherwise)
	│ • chunk_text(words=512, overlap=50) ← simple word-window
	│ • for each chunk:
	│ emb = await llm_service.embed(text) ← uses EMBEDDING_URL or vLLM
	│ qdrant.upsert(point=(uuid, emb, payload))
	│ • RAGDocument row in Postgres with chunk count & status
	▼
	QUERY TIME (chat with rag context)
	│
	▼
	rag_service.query(question, top_k=5)
	│ • emb_q = embed(question)
	│ • qdrant.search(collection, emb_q, top_k)
	│ • returns chunks + source metadata
	▼
	llm_service.chat with messages = [
	{role:"system", content: MAC_PROMPT + "\n\nContext:\n" + chunks},
	*user_messages,
	]
	```

	Collections (`RAGCollection`) namespace documents — e.g. one per subject. Documents (`RAGDocument`) track ownership and indexing status so the UI can show "Indexing 42/120 chunks…".

	---

	## 8. Attendance — face-based check-in

	### 8.1 Models
	- `FaceTemplate` — one per user, holds a face encoding (64-byte hash in dev; pluggable to `face_recognition`/`dlib` for production).
	- `AttendanceSession` — created by faculty: `{branch, section, subject, date, window_minutes}`.
	- `AttendanceRecord` — one per (session, student): `present \| absent \| late`, captured selfie hash, confidence, timestamp.

	### 8.2 Flow
	```
	1. Faculty: POST /attendance/sessions → creates session, returns join token + QR
	2. Student: GET /attendance/active → returns currently open sessions for them
	3. Student: POST /attendance/check-in → uploads base64 selfie
	server:
	• decodes image
	• hashes (sha256) — dedupe replay
	• computes encoding
	• compares to stored FaceTemplate
	• if (match && within window) → AttendanceRecord(present)
	• else → 401 with reason
	4. Faculty: GET /attendance/sessions/{id}/report → CSV / PDF roster
	```

	### 8.3 Anti-cheat heuristics
	- Session has a strict `window_minutes` — late arrivals are recorded as `late`, not `present`.
	- Same selfie hash twice in a session → rejected (replay block).
	- One record per (session, student) — UPSERT prevents stuffing.
	- Production: swap `_compute_face_encoding` for the real `face_recognition.face_encodings()` (the call sites already accept it; only the function body changes).

	---

	## 9. Copy Check — exam paper evaluation

	A faculty workflow that grades scanned answer sheets using vision-capable LLMs and runs cross-paper plagiarism detection. Models in `mac/models/copy_check.py`:

	\| Model \| Role \|
	\|---\|---\|
	\| `CopyCheckSession` \| One exam: subject, class, total_marks, syllabus_text \|
	\| `CopyCheckSheet` \| One student's submission: roll, scanned pages, AI score, feedback \|
	\| `CopyCheckPlagiarism` \| Pairwise similarity between two sheets in the same session \|

	### 9.1 Flow
	```
	Faculty creates session → uploads syllabus / answer key
	│
	▼
	For each student answer sheet (PDF or image bundle):
	• file saved under uploads/copy_check/{session_id}/{roll}/
	• AI vision model reads each page (multimodal LLM)
	• Service builds a structured prompt: syllabus + answer key + student answer
	• LLM returns { per_question_marks, total, weakness_summary, suggestions }
	• CopyCheckSheet upserted with score + JSON feedback
	│
	▼
	Plagiarism pass:
	• difflib.SequenceMatcher on extracted text per pair within session
	• CopyCheckPlagiarism row written for (sheet_a, sheet_b, similarity, flagged_passages)
	│
	▼
	Faculty reviews:
	• per-student PDF report (fpdf2)
	• plagiarism heatmap
	• can override AI marks before "publish"
	```

	### 9.2 Why the AI doesn't have final authority
	The faculty UI explicitly requires a "Reviewed & Approved" flag before any score becomes visible to students. The AI is graded as a recommendation — the audit trail records both the AI suggestion and the faculty's override. This is the legal/academic-integrity boundary.

	---

	## 10. Other domain modules (one-paragraph each)

	- Doubts forum ([doubts.py](mac/routers/doubts.py)): students post questions; faculty/peers answer; AI generates a draft answer that the asker can accept or replace. Threaded, taggable.
	- File sharing ([file_share.py](mac/routers/file_share.py)): admin/faculty upload class materials; per-file access scoping; per-download analytics in `file_downloads`.
	- Notifications ([notifications.py](mac/routers/notifications.py)): in-app + Web Push (`pywebpush`); endpoints registered via VAPID; one row per user-notification with read/unread state.
	- Academic ([academic.py](mac/routers/academic.py)): branches & sections — used to scope attendance, file sharing, and admin lists.
	- Doubt copy-check submissions ([model_submission_service.py](mac/services/model_submission_service.py)): community-trained adapter / LoRA submissions queued for admin review before being published as model registry entries.
	- Search ([search.py](mac/routers/search.py) + SearXNG): private metasearch, no Google, no telemetry, returned to the chat as a tool result.
	- Hardware / Network / System ([hardware.py](mac/routers/hardware.py), [network.py](mac/routers/network.py), [system.py](mac/routers/system.py)): admin diagnostics — local CPU/GPU/RAM, recommended models for the detected GPU, LAN discovery (`mac/services/discovery.py` UDP broadcast on port 7700), version & update status (`mac/services/updater.py` polls GitHub releases).
	- Quota ([quota.py](mac/routers/quota.py)): per-user requests/hour and tokens/day; admin can override per user; default from `RATE_LIMIT_*` env.
	- Guardrails ([guardrails.py](mac/routers/guardrails.py) + `guardrail_service`): admin-editable ruleset (banned terms, forbidden topics) applied as a pre-check on chat input and a post-check on model output.

	---

	## 11. Cross-cutting concerns

	### 11.1 Configuration
	One source of truth: [mac/config.py](mac/config.py) `Settings(BaseSettings)`. Every value reads from env or `.env`. `_fix_database_url` auto-promotes `postgres://` and `postgresql://` to `postgresql+asyncpg://` and strips `sslmode=` (it's handled in `connect_args` separately for Neon/Supabase). Adding a new tunable means: add a field to `Settings`, document it in `.env.example`, use `settings.your_field` everywhere — never read `os.environ` directly.

	### 11.2 Migrations
	Alembic-managed. Two revisions today:
	- `20260426_0001_initial_schema.py` — full original schema.
	- `20260427_0002_session1_tables.py` — feature flags, system_config, branches, sections, cluster_heartbeats, shared_files, file_downloads, video_projects, video_jobs.

	In dev (`MAC_ENV=development`), `init_db()` in `lifespan` creates tables idempotently from `Base.metadata`. In prod, you must run `alembic upgrade head` before serving traffic; tables are not auto-created. Whenever you add a column to a model, write a new revision.

	### 11.3 Background tasks
	Started in `lifespan` and cancelled on shutdown:
	- [updater.background_check_loop](mac/services/updater.py) — polls GitHub for new releases every `MAC_UPDATE_CHECK_INTERVAL_HOURS`.
	- [discovery.start_discovery_server](mac/services/discovery.py) — UDP broadcast listener so worker PCs on the LAN can find the master without manual IP entry.

	### 11.4 Caching, blacklisting, rate limits
	All Redis-backed with graceful in-process fallback:
	- JWT blacklist → `mac:bl:{jti}` keys with TTL = remaining token life.
	- Rate-limit counters → derived from `usage_log` rows (no Redis needed for counts).
	- Session/feature caches → not implemented yet; designed to live under `mac:cache:*`.

	### 11.5 Observability
	Every chat call is logged to `usage_log`: user_id, model_id, tokens_in, tokens_out, latency_ms, status, request_id (`generate_request_id` in `utils/security`). The dashboard route reads these for per-user charts. Cluster heartbeats are append-only into `cluster_heartbeats` so node history charts are just `SELECT … ORDER BY ts`.

	---

	## 12. Frontend — SvelteKit PWA

	### 12.1 Stack
	SvelteKit 2 + Svelte 5 + Tailwind 3 + Vite 6. Built as a static site (`@sveltejs/adapter-static` with `fallback: 'index.html'`) and served by Nginx in production, by Vite dev server with `/api` proxy to the FastAPI port in development.

	### 12.2 SPA mode
	The root has `+layout.js` with `export const ssr = false; export const prerender = false;` so the entire app is rendered client-side. This is intentional — it sidesteps hydration issues, and there is no SEO need for an internal college tool.

	### 12.3 State
	[src/lib/stores.js](frontend/src/lib/stores.js) holds Svelte stores:
	- `authStore` — `{user, token, refreshToken}`, with `init()` that re-hydrates from `localStorage` and re-fetches `/auth/me`, plus `login`/`logout`.
	- `setupStore` — `is_first_run` flag.
	- `featureStore` — feature flag map for conditional UI.
	- `chatStore` — local conversation history (per-session, not yet server-persisted).
	- `toast` — single-message notifier.

	### 12.4 API client
	[src/lib/api.js](frontend/src/lib/api.js) is the only place that talks HTTP. One `headers()` helper attaches the bearer token from `localStorage`. Each backend domain (`auth`, `query`, `models`, `cluster`, `rag`, `files`, …) is its own export with named methods. Adding a new endpoint = add a method here, never `fetch()` from a component directly.

	### 12.5 Auth/setup gate
	[+layout.svelte](frontend/src/routes/+layout.svelte) boots the app on first paint:
	1. `initLocale()` — detect language from `localStorage` / browser.
	2. `authStore.init()` — restore session.
	3. `checkSetup()` — first-run check.
	4. `loadFeatures()` — fetch flags.
	5. Redirect: first-run → `/setup`, no user on protected route → `/login`, root → `/chat` or `/login`.
	6. Render either `Sidebar + slot` (logged in) or bare `slot` (login/setup).

	### 12.6 Internationalisation
	[src/lib/i18n.js](frontend/src/lib/i18n.js) ships 19 Indian languages with lazy-loaded string maps and an `RTL_LOCALES` set (Urdu) that flips the layout direction. Adding a new locale = add to `SUPPORTED_LOCALES`, drop a translation map, no other file changes.

	### 12.7 PWA + service worker
	[static/manifest.json](frontend/static/manifest.json) declares the installable app + shortcuts. [static/sw.js](frontend/static/sw.js) is intentionally caching-disabled — every install/activate wipes all caches and there is no `fetch` handler. This was a deliberate decision: caching the SPA shell caused stale-build problems during rapid dev. Re-introduce caching only behind a versioned cache name with a clear invalidation strategy.

	---

	## 13. Deployment

	### 13.1 Master node (single command)
	```bash
	cd frontend && npm install && npm run build && cd ..
	cp .env.example .env # edit secrets
	docker compose up postgres -d
	docker compose run --rm mac alembic upgrade head
	docker compose up -d
	```
	Compose brings up: `mac` (FastAPI), `postgres`, `redis`, `qdrant`, `searxng`, `vllm-speed`, `nginx`. (Whisper/TTS commented out by default.)

	### 13.2 Adding a worker
	On master:
	```bash
	curl -X POST http://MASTER:8000/api/v1/cluster/enroll-token \
	-H "Authorization: Bearer ADMIN_JWT" -d '{"label":"Lab PC 1","expires_hours":24}'
	```
	On the worker PC:
	```bash
	MAC_MASTER_URL=http://MASTER:8000 \
	MAC_ENROLL_TOKEN=<token> \
	MAC_VLLM_PORT=8001 \
	docker compose -f docker-compose.worker.yml up -d
	```
	Then approve in admin → Cluster.

	### 13.3 HTTPS
	Drop certs into `nginx/ssl/`, swap the bind-mounted config to `nginx/nginx.https.conf` in `docker-compose.yml`, restart Nginx.

	### 13.4 Windows installer
	[installer/build_installer.ps1](installer/build_installer.ps1) builds a one-shot `dist/MAC-Installer.exe` (PyInstaller) that bootstraps Docker Desktop checks, clones/updates the repo, writes a sane `.env` with detected host IP, and starts the master stack. Branding assets are embedded base64 in [installer/embedded_assets.py](installer/embedded_assets.py) so the binary works even if image files are missing at runtime.

	---

	## 14. Security checklist (what every reviewer should verify)

	1. No external API calls. `grep -r "openai.com\\|api.anthropic\\|googleapis" mac/` should be empty. All inference is local.
	2. JWT secret is not in env in production. It's seeded in `system_config` on first boot and re-used across restarts.
	3. JWT carries `jti` and the auth middleware checks blacklist on every request.
	4. Every router requiring auth uses `Depends(get_current_user)` — search for any `@router.*` that doesn't and justify it.
	5. Role guards on admin-only operations: `Depends(require_admin)` on token mints, user list, cluster mutations, feature toggles, system restart.
	6. Rate limits on user-facing inference endpoints (`/query/*`, `/rag/query`).
	7. Scoped keys never logged in full; only the prefix is shown after creation.
	8. Worker enrollment tokens are single-use and time-limited (`expires_at` checked on register).
	9. Heartbeats authenticate by `node_token`, not by JWT — rotated on every approve/reactivate.
	10. CORS: `MAC_CORS_ORIGINS` defaults to `[""]` for ease of dev; set explicit origins in prod*.
	11. Uploads: `uploads/` is outside the static mount; copy-check sheets and RAG docs are served via authenticated endpoints, never directly.
	12. WebSocket auth: `notebook_ws` validates the JWT in the query string before `accept()`. Don't move the accept above the validation.

	---

	## 15. How to add a new feature (the recipe)

	1. Model: add a SQLAlchemy class in `mac/models/<domain>.py`, import it in `mac/main.py::lifespan` so `Base.metadata` knows.
	2. Migration: `alembic revision --autogenerate -m "add <thing>"` → review → commit.
	3. Schema: Pydantic request/response in `mac/schemas/<domain>.py`.
	4. Service: pure logic in `mac/services/<domain>_service.py`. Takes `db: AsyncSession` and primitive args. No FastAPI types.
	5. Router: thin handler in `mac/routers/<domain>.py`. Order of `Depends`: `get_db` → `get_current_user` → `require_*` → `feature_required("…")` → `check_rate_limit` (only if user-driven inference). Mount in `mac/main.py`.
	6. Feature flag: add a default to `feature_seeder.DEFAULT_FLAGS` so it can be toggled per role from admin.
	7. API client: add a method to `frontend/src/lib/api.js` under the matching export.
	8. Store (if it has UI state): add to `frontend/src/lib/stores.js`.
	9. Route: new directory under `frontend/src/routes/<feature>/+page.svelte`.
	10. Sidebar entry: edit `frontend/src/lib/components/Sidebar.svelte`.
	11. i18n: add new strings to `BASE` in `frontend/src/lib/i18n.js`.
	12. Test: at least one happy-path + one auth-failure pytest in `tests/`.

	Follow this and the system stays consistent. Skip steps and you'll end up with a feature that's invisible to the admin, untranslated, untested, or worse — bypassing the auth chain.

	---

	Last updated: 2026-04-27. If you change a subsystem and this file no longer matches reality, update it in the same PR.