![1777301208961](image/ARCHITECTURE/1777301208961.png)
![1777301226270](image/ARCHITECTURE/1777301226270.png)

# MAC — Architecture Reference

> **Audience:** an AI coding agent (or new engineer) dropped into this repo with no prior context.
>
> **Goal:** understand the system end-to-end — every subsystem, the data flow, where state lives, and how the pieces secure and observe each other.
>
> Read [README.md](README.md) for the elevator pitch and [MAC-PROGRESS.md](MAC-PROGRESS.md) for the build log. This file is the *map*.

---

## 0. Identity in one paragraph

MAC (MBM AI Cloud) is a **self-hosted, on-prem AI platform** for MBM University Jodhpur. It gives students/faculty a private ChatGPT-style chat, a notebook IDE, RAG over college docs, an attendance system using face capture, an exam copy-check workflow with AI vision + plagiarism detection, and an admin/cluster console — all powered by **open-source LLMs** running on the college's own GPUs. There are no external API calls; vLLM serves models locally, and worker GPUs are added by enrolling them into the cluster.

---

## 1. Top-level topology

```
┌────────────────────────────────────────────────────────────────┐
│                            CLIENTS                             │
│  • Web (SvelteKit PWA, served by Nginx in prod)                │
│  • API consumers (curl / Python SDK / scripts)                 │
└──────────────────────────┬─────────────────────────────────────┘
                           │ HTTPS
                           ▼
                ┌──────────────────────┐
                │        NGINX         │  ← TLS, gzip, /api → mac, / → static
                └──────────┬───────────┘
                           │
        ┌──────────────────┴──────────────────┐
        ▼                                     ▼
┌─────────────────┐                ┌──────────────────────┐
│ SvelteKit       │                │ FastAPI (mac.main)   │
│ static build    │                │ /api/v1/*            │
└─────────────────┘                └──────────┬───────────┘
                                              │
  ┌───────────────────────┬───────────────────┼─────────────────────────┐
  ▼                       ▼                   ▼                         ▼
┌────────────┐      ┌────────────┐     ┌──────────────┐       ┌────────────────┐
│ PostgreSQL │      │ Redis      │     │ Qdrant       │       │ SearXNG        │
│ (primary)  │      │ cache /    │     │ (RAG vec)    │       │ (web search)   │
│ Alembic    │      │ bl / rl    │     └──────────────┘       └────────────────┘
└────────────┘      └────────────┘
                                  ▲
  load_balancer.get_best_worker() │
          ┌───────────────────────┴──────────────────────────────────────┐
          │            MAC CLUSTER (GPU workers, any LAN PC)             │
          │ ┌─────────────────┐  ┌─────────────────┐  ┌────────────────┐ │
          │ │ vLLM (OpenAI    │  │ Jupyter kernel  │  │ worker_agent.py│ │
          │ │ compatible)     │  │ gateway (opt.)  │  │ (heartbeat)    │ │
          │ └─────────────────┘  └─────────────────┘  └────────────────┘ │
          └──────────────────────────────────────────────────────────────┘
```

- **Master node** runs FastAPI + Postgres + Redis + Nginx + Qdrant + SearXNG.
- **Worker nodes** run vLLM + an optional Jupyter kernel gateway, plus [worker_agent.py](worker_agent.py) which self-registers via an enrollment token and sends a heartbeat every 10s (GPU util, VRAM, RAM, CPU).
- **Routing** is master-side: every user request hits the master API, which uses [mac/services/load_balancer.py](mac/services/load_balancer.py) to score-pick the best worker for an LLM call or notebook kernel.

---

## 2. Repository map (what lives where)

```
mac/
  main.py                  FastAPI app, lifespan (DB init, dev seeds, bg tasks),
                           router mounts under /api/v1, root SPA fallback.
  config.py                Pydantic Settings — every env var + .env loader.
  database.py              Async SQLAlchemy engine + session factory; `Base`.
  utils/security.py        JWT encode/decode + jti generation; password hash.
  middleware/
    auth_middleware.py     Bearer extractor → JWT | legacy-key | scoped-key → User.
    rate_limit.py          Per-user req/hour + token/day; injects X-RateLimit-*.
    feature_gate.py        feature_required("ai_chat") dependency.
  models/                  SQLAlchemy ORM models (one file per domain).
  schemas/                 Pydantic request/response schemas.
  services/                Pure business logic, no HTTP — called by routers.
  routers/                 FastAPI routers, thin: validate → call service → return.
frontend/                  SvelteKit 2 + Svelte 5 PWA.
  src/routes/              File-system routing: login, setup, chat, dashboard,
                           admin, cluster, keys, settings, notifications, rag.
  src/lib/api.js           Single fetch wrapper; one export per backend domain.
  src/lib/stores.js        Svelte stores (auth, setup, features, chat, toast).
  src/lib/i18n.js          19 Indian languages, lazy-loaded strings, RTL support.
  static/manifest.json     PWA manifest; static/sw.js is a no-cache worker.
alembic/                   Migration env + versioned revisions.
nginx/                     nginx.conf (HTTP) + nginx.https.conf (TLS).
docker-compose.yml         Master stack.
docker-compose.worker.yml  Worker stack (vLLM + worker_agent).
worker_agent.py            Enrollment + heartbeat agent for a GPU node.
installer/                 Windows installer (PyInstaller) + branding assets.
tests/                     pytest suite.
```

---

## 3. Request lifecycle (the universal path)

Every authenticated `/api/v1/*` request goes through these layers in order. Knowing this map means you can audit any new endpoint quickly.
```
HTTP request
   │
   ▼
[1] CORS middleware              (mac/main.py — allow_origins from settings)
   │
   ▼
[2] Route handler (FastAPI)      (mac/routers/*.py)
   │     Depends(get_current_user)
   ▼
[3] Auth resolver                (mac/middleware/auth_middleware.py)
   │     Bearer token → branch:
   │       • mac_sk_live_*  → legacy API key (User.api_key)
   │       • mac_sk_*       → scoped API key (hashed, scopes, expiry, revocable)
   │       • else           → JWT (verify sig, check exp, check jti blacklist)
   │     → returns User or raises 401
   │
   ▼
[4] Role guard (optional)        require_admin / require_faculty_or_admin
   │
   ▼
[5] Feature gate (optional)      feature_required("ai_chat")
   │     → reads system_config / feature_flags table → 403 if disabled for role
   │
   ▼
[6] Rate limit (optional)        check_rate_limit
   │     • requests/hour from usage_log (per-user)
   │     • tokens/day from usage_log (per-user)
   │     • injects X-RateLimit-* into request.state
   │
   ▼
[7] Service layer                mac/services/*.py
   │     Business logic — never imports FastAPI; takes db: AsyncSession.
   │
   ▼
[8] Response → HTTP middleware inject_rate_limit_headers reads request.state
               and stamps headers onto the response
```

This separation is the single most important design rule: **routers do parsing + auth + I/O orchestration; services do business logic; models do persistence.** Anything calling FastAPI types from a service is a smell.

---

## 4. Identity & access — auth, sessions, keys

There are **three** ways a request authenticates, all collapsed to a `User` by `get_current_user`:

### 4.1 JWT (interactive users)

- Login: `POST /api/v1/auth/login` with `{roll_number, password}` → `{access_token, refresh_token, user}`.
- Access token lifetime: `JWT_ACCESS_TOKEN_EXPIRE_MINUTES` (default 1440 = 24h).
- Every access token carries a `jti` (random UUID) baked into the JWT claims by [mac/utils/security.py](mac/utils/security.py).
- `POST /api/v1/auth/logout` blacklists the current `jti` in Redis with a TTL equal to remaining token life ([token_blacklist_service.py](mac/services/token_blacklist_service.py)). Refresh tokens are also revoked. The blacklist falls back to an in-process set if Redis is unreachable (dev only).
- The JWT signing secret is **not** read from env in production — it's stored in `system_config` and seeded on first boot by [setup_service.get_or_generate_jwt_secret](mac/services/setup_service.py). This means restarting the app does not invalidate everyone's sessions.

### 4.2 Legacy API keys

- Format: `mac_sk_live_<48 hex chars>`. Stored on `users.api_key`. One per user.
- Use case: scripts that need a stable long-lived credential.
- Resolved before JWT in `auth_middleware` because of the prefix check.

### 4.3 Scoped API keys

- Format: `mac_sk_`, hashed at rest. Created via `/api/v1/scoped-keys`.
- Carry: scopes (list of allowed endpoints), optional expiry, label, revoke flag.
- Resolved by [scoped_key_service.get_key_by_hash](mac/services/scoped_key_service.py).
- Attached to `user._scoped_key` for downstream scope enforcement.

### 4.4 Roles

- `admin` | `faculty` | `student`. Enforced at the router layer via `require_admin` / `require_faculty_or_admin` dependencies.
- Feature flags layer on top: a feature can be enabled globally but restricted to specific roles (see `feature_flags.roles`).

### 4.5 First-run onboarding

- `GET /api/v1/setup/status` → `{is_first_run, has_jwt_secret, version}`. Frontend uses this to decide whether to show the setup wizard or login.
- `POST /api/v1/setup/create-admin` provisions the first admin and seals the system.

---

## 5. LLM serving & cluster routing

### 5.1 Model registry — three layers of override

`mac/services/llm_service.py::_BUILTIN_MODELS` holds the defaults (Qwen2.5 7B, Qwen2.5-Coder 7B/AWQ, DeepSeek-R1, etc.). Each entry knows its `served_name` (HF repo), `category` (`speed | code | reasoning | intelligence`), `capabilities`, and `url_key` pointing at one of `vllm_speed_url | vllm_code_url | …` in `Settings`.

Override priority:

1. `MAC_MODELS_JSON` env var (a full JSON array) — replaces the registry entirely.
2. `MAC_ENABLED_MODELS` env var (comma-separated IDs) — filters which built-ins are exposed.
3. `MAC_AUTO_FALLBACK` — what `model="auto"` resolves to.

### 5.2 The system prompt is forced

`_inject_system_prompt` in `llm_service` prepends a hard-coded MAC identity prompt to **every** chat completion. This prevents the underlying Qwen/DeepSeek model from claiming to be "Qwen made by Alibaba" — it always says it is MAC, built by MBM University. If the user supplied a system message, MAC's identity is concatenated in front of theirs.

### 5.3 Routing decision (where does this call go?)

```
chat request
   │
   ▼
llm_service._resolve_model_cluster(model_id)
   │
   ▼
load_balancer.get_best_worker(db, model_id)
   │     SELECT WorkerNode JOIN NodeModelDeployment
   │     WHERE node.status='active' AND deployment.status='ready'
   │       AND last_heartbeat within 30s
   │     ORDER BY gpu_util*0.5 + (vram_used/total)*0.3
   │
   ├── candidate found → POST http://{node.ip}:{deployment.port}/v1/chat/completions
   │
   └── none            → fall back to local config (settings.vllm__url)
```

vLLM speaks the **OpenAI-compatible** API, so the proxy is a near-pass-through with SSE streaming preserved end-to-end.
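The selection step in the routing decision above can be sketched in plain Python. This is a minimal illustration, not the code in `load_balancer.py`: `WorkerSnapshot`, `load_score`, and `pick_best_worker` are hypothetical names, and the sketch assumes `gpu_util` is a 0 to 1 fraction and that the lowest score wins (ascending order).

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone
from typing import Iterable, Optional

# Staleness cutoff taken from the "last_heartbeat within 30s" filter.
HEARTBEAT_MAX_AGE = timedelta(seconds=30)


@dataclass
class WorkerSnapshot:
    """Hypothetical flattened view of a WorkerNode joined with one deployment."""
    name: str
    node_status: str         # e.g. "active", "pending"
    deployment_status: str   # e.g. "ready"
    gpu_util: float          # assumed to be a 0.0-1.0 fraction
    vram_used_mb: float
    vram_total_mb: float
    last_heartbeat: datetime


def load_score(w: WorkerSnapshot) -> float:
    """Weighted load: gpu_util*0.5 + (vram_used/total)*0.3. Lower is better."""
    return w.gpu_util * 0.5 + (w.vram_used_mb / w.vram_total_mb) * 0.3


def pick_best_worker(
    workers: Iterable[WorkerSnapshot], now: Optional[datetime] = None
) -> Optional[WorkerSnapshot]:
    """Return the least-loaded eligible worker, or None to signal local fallback."""
    now = now or datetime.now(timezone.utc)
    eligible = [
        w for w in workers
        if w.node_status == "active"
        and w.deployment_status == "ready"
        and now - w.last_heartbeat <= HEARTBEAT_MAX_AGE
    ]
    return min(eligible, key=load_score, default=None)
```

A `None` result corresponds to the "none" branch in the diagram, where the caller falls back to the locally configured vLLM URL.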
### 5.4 Cluster lifecycle

| Event | Endpoint | Auth | Effect |
|---|---|---|---|
| Admin mints token | `POST /cluster/enroll-token` | admin JWT | Single-use, expiring `EnrollmentToken` row |
| Worker registers | `POST /cluster/register` | enroll token | Creates `WorkerNode` (status `pending`) + reports IP, GPU specs |
| Admin approves | `POST /cluster/nodes/{id}/action {action:"approve"}` | admin | `status → active` |
| Worker heartbeats | `POST /cluster/heartbeat` | node token | Updates `last_heartbeat`, GPU util, VRAM, CPU, RAM, queue depth — also append-only into `cluster_heartbeats` (time-series for charts) |
| Worker reports models | (in heartbeat payload) | — | Upserts `NodeModelDeployment` rows |
| Drain / remove | `POST /cluster/nodes/{id}/action` | admin | Stops new traffic; allows in-flight to finish |

Workers whose last heartbeat is more than 30s old are silently skipped by the balancer, so no manual intervention is needed if a worker dies.

---

## 6. Notebooks — multi-language code execution

This is the most operationally complex subsystem. The design supports **two backends** and **distributed execution**.
### 6.1 Architecture

```
Client (browser)
   │  WebSocket /ws/notebook/{notebook_id}?token=JWT
   ▼
mac/routers/notebook_ws.py
   │  • verifies JWT (decode_access_token, no DB hit on hot path)
   │  • registers connection in _connections[notebook_id]
   ▼
kernel_manager (mac/services/kernel_manager.py)
   │  Backend selection at startup:
   │     _docker_available() → Docker mode
   │     else                → subprocess mode (dev)
   │
   ├── DOCKER MODE (production)
   │     • spawns mac-kernel-{lang} container (image_prefix in config)
   │     • applies memory + CPU limits from settings
   │     • optionally attaches GPU (--gpus all) for ML kernels
   │     • streams stdout/stderr back as JSONL events
   │
   ├── SUBPROCESS MODE (dev)
   │     • runs the language interpreter directly on the host
   │     • no isolation; only safe for trusted local dev
   │
   └── REMOTE WORKER MODE
         • load_balancer.get_notebook_worker(db) picks a worker with notebook_port
         • forwards the execute via the worker's Jupyter kernel gateway
         • output streams back to the master, then to the client
```

### 6.2 WebSocket protocol

Defined at the top of [notebook_ws.py](mac/routers/notebook_ws.py):

| Direction | Type | Payload |
|---|---|---|
| C→S | `execute` | `{cell_id, code, language}` |
| C→S | `interrupt` | `{kernel_id}` |
| C→S | `ping` | — |
| S→C | `status` | `{cell_id, execution_state: busy\|idle}` |
| S→C | `stream` | `{cell_id, name: stdout\|stderr, text}` |
| S→C | `error` | `{cell_id, ename, evalue, traceback[]}` |
| S→C | `pong` | — |

### 6.3 State & limits

- `KernelInstance` per session: `id`, `language`, `node_id`, `container_id`, `status`, `last_activity`, `execution_count`.
- Idle kernels are reaped after `kernel_timeout` seconds (default 120).
- Max concurrent kernels per node: `kernel_max_per_node` (default 10).
- Persistent notebook content: `notebooks` table; cells stored as JSON, ordered.

### 6.4 Why a custom protocol and not raw Jupyter?
Three reasons: (a) we need user-scoped auth via our JWT; (b) we need to fan-out execution across the cluster, not just one local kernel; (c) we want the option to swap kernels for sandboxed runners later without changing the wire format.

---

## 7. RAG — private document search

Pipeline: **upload → chunk → embed → store → retrieve → augment**.

```
PDF/MD/TXT upload (POST /rag/upload)
   │
   ▼
rag_service.ingest_document
   │  • text extraction (pypdf for PDF, plain read otherwise)
   │  • chunk_text(words=512, overlap=50)        ← simple word-window
   │  • for each chunk:
   │       emb = await llm_service.embed(text)   ← uses EMBEDDING_URL or vLLM
   │       qdrant.upsert(point=(uuid, emb, payload))
   │  • RAGDocument row in Postgres with chunk count & status
   ▼
QUERY TIME (chat with rag context)
   │
   ▼
rag_service.query(question, top_k=5)
   │  • emb_q = embed(question)
   │  • qdrant.search(collection, emb_q, top_k)
   │  • returns chunks + source metadata
   ▼
llm_service.chat with messages = [
    {role:"system", content: MAC_PROMPT + "\n\nContext:\n" + chunks},
    *user_messages,
]
```

Collections (`RAGCollection`) namespace documents — e.g. one per subject. Documents (`RAGDocument`) track ownership and indexing status so the UI can show "Indexing 42/120 chunks…".

---

## 8. Attendance — face-based check-in

### 8.1 Models

- `FaceTemplate` — one per user, holds a face encoding (64-byte hash in dev; pluggable to `face_recognition`/`dlib` for production).
- `AttendanceSession` — created by faculty: `{branch, section, subject, date, window_minutes}`.
- `AttendanceRecord` — one per (session, student): `present | absent | late`, captured selfie hash, confidence, timestamp.

### 8.2 Flow

```
1. Faculty: POST /attendance/sessions → creates session, returns join token + QR
2. Student: GET /attendance/active → returns currently open sessions for them
3. Student: POST /attendance/check-in → uploads base64 selfie
     server:
       • decodes image
       • hashes (sha256) — dedupe replay
       • computes encoding
       • compares to stored FaceTemplate
       • if (match && within window) → AttendanceRecord(present)
       • else → 401 with reason
4. Faculty: GET /attendance/sessions/{id}/report → CSV / PDF roster
```

### 8.3 Anti-cheat heuristics

- Session has a strict `window_minutes` — late arrivals are recorded as `late`, not `present`.
- Same selfie hash twice in a session → rejected (replay block).
- One record per (session, student) — UPSERT prevents stuffing.
- Production: swap `_compute_face_encoding` for the real `face_recognition.face_encodings()` (the call sites already accept it; only the function body changes).

---

## 9. Copy Check — exam paper evaluation

A faculty workflow that grades scanned answer sheets using vision-capable LLMs and runs cross-paper plagiarism detection. Models in `mac/models/copy_check.py`:

| Model | Role |
|---|---|
| `CopyCheckSession` | One exam: subject, class, total_marks, syllabus_text |
| `CopyCheckSheet` | One student's submission: roll, scanned pages, AI score, feedback |
| `CopyCheckPlagiarism` | Pairwise similarity between two sheets in the same session |

### 9.1 Flow

```
Faculty creates session → uploads syllabus / answer key
   │
   ▼
For each student answer sheet (PDF or image bundle):
   • file saved under uploads/copy_check/{session_id}/{roll}/
   • AI vision model reads each page (multimodal LLM)
   • Service builds a structured prompt: syllabus + answer key + student answer
   • LLM returns { per_question_marks, total, weakness_summary, suggestions }
   • CopyCheckSheet upserted with score + JSON feedback
   │
   ▼
Plagiarism pass:
   • difflib.SequenceMatcher on extracted text per pair within session
   • CopyCheckPlagiarism row written for (sheet_a, sheet_b, similarity, flagged_passages)
   │
   ▼
Faculty reviews:
   • per-student PDF report (fpdf2)
   • plagiarism heatmap
   • can override AI marks before "publish"
```

### 9.2 Why the AI doesn't have final authority

The faculty UI explicitly requires a **"Reviewed & Approved"** flag before any score becomes visible to students. The AI's grade is treated as a *recommendation*: the audit trail records both the AI suggestion and the faculty's override. This is the legal/academic-integrity boundary.

---

## 10. Other domain modules (one-paragraph each)

- **Doubts forum** ([doubts.py](mac/routers/doubts.py)): students post questions; faculty/peers answer; AI generates a draft answer that the asker can accept or replace. Threaded, taggable.
- **File sharing** ([file_share.py](mac/routers/file_share.py)): admin/faculty upload class materials; per-file access scoping; per-download analytics in `file_downloads`.
- **Notifications** ([notifications.py](mac/routers/notifications.py)): in-app + Web Push (`pywebpush`); endpoints registered via VAPID; one row per user-notification with read/unread state.
- **Academic** ([academic.py](mac/routers/academic.py)): branches & sections — used to scope attendance, file sharing, and admin lists.
- **Model submissions** ([model_submission_service.py](mac/services/model_submission_service.py)): community-trained adapter / LoRA submissions queued for admin review before being published as model registry entries.
- **Search** ([search.py](mac/routers/search.py) + SearXNG): private metasearch, no Google, no telemetry, returned to the chat as a tool result.
- **Hardware / Network / System** ([hardware.py](mac/routers/hardware.py), [network.py](mac/routers/network.py), [system.py](mac/routers/system.py)): admin diagnostics — local CPU/GPU/RAM, recommended models for the detected GPU, LAN discovery (`mac/services/discovery.py` UDP broadcast on port 7700), version & update status (`mac/services/updater.py` polls GitHub releases).
- **Quota** ([quota.py](mac/routers/quota.py)): per-user requests/hour and tokens/day; admin can override per user; default from `RATE_LIMIT_*` env.
- **Guardrails** ([guardrails.py](mac/routers/guardrails.py) + `guardrail_service`): admin-editable ruleset (banned terms, forbidden topics) applied as a pre-check on chat input and a post-check on model output.

---

## 11. Cross-cutting concerns

### 11.1 Configuration

**One source of truth:** [mac/config.py](mac/config.py) `Settings(BaseSettings)`. Every value reads from env or `.env`. `_fix_database_url` auto-promotes `postgres://` and `postgresql://` to `postgresql+asyncpg://` and strips `sslmode=` (it's handled in `connect_args` separately for Neon/Supabase).

Adding a new tunable means: add a field to `Settings`, document it in `.env.example`, use `settings.your_field` everywhere — never read `os.environ` directly.

### 11.2 Migrations

Alembic-managed. Two revisions today:

- `20260426_0001_initial_schema.py` — full original schema.
- `20260427_0002_session1_tables.py` — feature flags, system_config, branches, sections, cluster_heartbeats, shared_files, file_downloads, video_projects, video_jobs.

In dev (`MAC_ENV=development`), `init_db()` in `lifespan` creates tables idempotently from `Base.metadata`. In prod, you **must** run `alembic upgrade head` before serving traffic; tables are not auto-created. Whenever you add a column to a model, write a new revision.

### 11.3 Background tasks

Started in `lifespan` and cancelled on shutdown:

- [updater.background_check_loop](mac/services/updater.py) — polls GitHub for new releases every `MAC_UPDATE_CHECK_INTERVAL_HOURS`.
- [discovery.start_discovery_server](mac/services/discovery.py) — UDP broadcast listener so worker PCs on the LAN can find the master without manual IP entry.

### 11.4 Caching, blacklisting, rate limits

Redis-backed where noted, with **graceful in-process fallback**:

- JWT blacklist → `mac:bl:{jti}` keys with TTL = remaining token life.
- Rate-limit counters → derived from `usage_log` rows (no Redis needed for counts).
- Session/feature caches → not implemented yet; designed to live under `mac:cache:*`.
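The blacklist entry above (a `mac:bl:{jti}` key whose TTL equals the remaining token life) can be illustrated with an in-process stand-in. This is a sketch under assumptions, not the contents of `token_blacklist_service.py`: the class `InProcessBlacklist` and its method names are hypothetical, and times are plain Unix timestamps so the behaviour is easy to test.

```python
import time
from typing import Dict, Optional


class InProcessBlacklist:
    """Hypothetical dev-style fallback: maps 'mac:bl:{jti}' to an expiry timestamp."""

    def __init__(self) -> None:
        self._expiry: Dict[str, float] = {}

    @staticmethod
    def _key(jti: str) -> str:
        # Key layout taken from the document; everything else is illustrative.
        return f"mac:bl:{jti}"

    def revoke(self, jti: str, token_exp: float, now: Optional[float] = None) -> int:
        """Blacklist a token until its own expiry; returns the TTL in seconds."""
        now = time.time() if now is None else now
        ttl = int(token_exp - now)
        if ttl <= 0:
            return 0  # token already expired, nothing worth storing
        self._expiry[self._key(jti)] = now + ttl
        return ttl

    def is_revoked(self, jti: str, now: Optional[float] = None) -> bool:
        """True while the entry is still within its TTL; expired entries are dropped."""
        now = time.time() if now is None else now
        key = self._key(jti)
        expires_at = self._expiry.get(key)
        if expires_at is None:
            return False
        if expires_at <= now:
            del self._expiry[key]  # lazy cleanup, mimicking key expiry
            return False
        return True
```

With Redis available, the same idea maps to a `SET mac:bl:<jti> 1 EX <ttl>`, so the key disappears on its own once the token would have expired anyway and the blacklist never grows without bound.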
### 11.5 Observability

Every chat call is logged to `usage_log`: user_id, model_id, tokens_in, tokens_out, latency_ms, status, request_id (`generate_request_id` in `utils/security`). The dashboard route reads these for per-user charts. Cluster heartbeats are append-only into `cluster_heartbeats` so node history charts are just `SELECT … ORDER BY ts`.

---

## 12. Frontend — SvelteKit PWA

### 12.1 Stack

SvelteKit 2 + Svelte 5 + Tailwind 3 + Vite 6. Built as a static site (`@sveltejs/adapter-static` with `fallback: 'index.html'`) and served by Nginx in production, by Vite dev server with `/api` proxy to the FastAPI port in development.

### 12.2 SPA mode

The root has `+layout.js` with `export const ssr = false; export const prerender = false;` so the entire app is rendered client-side. This is intentional — it sidesteps hydration issues, and there is no SEO need for an internal college tool.

### 12.3 State

[src/lib/stores.js](frontend/src/lib/stores.js) holds Svelte stores:

- `authStore` — `{user, token, refreshToken}`, with `init()` that re-hydrates from `localStorage` and re-fetches `/auth/me`, plus `login`/`logout`.
- `setupStore` — `is_first_run` flag.
- `featureStore` — feature flag map for conditional UI.
- `chatStore` — local conversation history (per-session, not yet server-persisted).
- `toast` — single-message notifier.

### 12.4 API client

[src/lib/api.js](frontend/src/lib/api.js) is the *only* place that talks HTTP. One `headers()` helper attaches the bearer token from `localStorage`. Each backend domain (`auth`, `query`, `models`, `cluster`, `rag`, `files`, …) is its own export with named methods. Adding a new endpoint = add a method here, never `fetch()` from a component directly.

### 12.5 Auth/setup gate

[+layout.svelte](frontend/src/routes/+layout.svelte) boots the app on first paint:

1. `initLocale()` — detect language from `localStorage` / browser.
2. `authStore.init()` — restore session.
3. `checkSetup()` — first-run check.
4. `loadFeatures()` — fetch flags.
5. Redirect: first-run → `/setup`, no user on protected route → `/login`, root → `/chat` or `/login`.
6. Render either `Sidebar + slot` (logged in) or bare `slot` (login/setup).

### 12.6 Internationalisation

[src/lib/i18n.js](frontend/src/lib/i18n.js) ships **19 Indian languages** with lazy-loaded string maps and an `RTL_LOCALES` set (Urdu) that flips the layout direction. Adding a new locale = add to `SUPPORTED_LOCALES`, drop a translation map, no other file changes.

### 12.7 PWA + service worker

[static/manifest.json](frontend/static/manifest.json) declares the installable app + shortcuts. [static/sw.js](frontend/static/sw.js) is intentionally **caching-disabled** — every install/activate wipes all caches and there is no `fetch` handler. This was a deliberate decision: caching the SPA shell caused stale-build problems during rapid dev. Re-introduce caching only behind a versioned cache name with a clear invalidation strategy.

---

## 13. Deployment

### 13.1 Master node (single command)

```bash
cd frontend && npm install && npm run build && cd ..
cp .env.example .env          # edit secrets
docker compose up postgres -d
docker compose run --rm mac alembic upgrade head
docker compose up -d
```

Compose brings up: `mac` (FastAPI), `postgres`, `redis`, `qdrant`, `searxng`, `vllm-speed`, `nginx`. (Whisper/TTS commented out by default.)

### 13.2 Adding a worker

On master:

```bash
# The JSON body needs an explicit Content-Type; curl -d alone sends form encoding.
curl -X POST http://MASTER:8000/api/v1/cluster/enroll-token \
  -H "Authorization: Bearer ADMIN_JWT" \
  -H "Content-Type: application/json" \
  -d '{"label":"Lab PC 1","expires_hours":24}'
```

On the worker PC:

```bash
MAC_MASTER_URL=http://MASTER:8000 \
MAC_ENROLL_TOKEN= \
MAC_VLLM_PORT=8001 \
docker compose -f docker-compose.worker.yml up -d
```

Then approve in admin → Cluster.

### 13.3 HTTPS

Drop certs into `nginx/ssl/`, swap the bind-mounted config to `nginx/nginx.https.conf` in `docker-compose.yml`, restart Nginx.
### 13.4 Windows installer

[installer/build_installer.ps1](installer/build_installer.ps1) builds a one-shot `dist/MAC-Installer.exe` (PyInstaller) that bootstraps Docker Desktop checks, clones/updates the repo, writes a sane `.env` with detected host IP, and starts the master stack. Branding assets are embedded base64 in [installer/embedded_assets.py](installer/embedded_assets.py) so the binary works even if image files are missing at runtime.

---

## 14. Security checklist (what every reviewer should verify)

1. **No external API calls.** `grep -r "openai.com\|api.anthropic\|googleapis" mac/` should be empty. All inference is local.
2. **JWT secret is not in env in production.** It's seeded in `system_config` on first boot and re-used across restarts.
3. **JWT carries `jti`** and the auth middleware checks blacklist on every request.
4. **Every router** requiring auth uses `Depends(get_current_user)` — search for any `@router.*` that doesn't and justify it.
5. **Role guards** on admin-only operations: `Depends(require_admin)` on token mints, user list, cluster mutations, feature toggles, system restart.
6. **Rate limits** on user-facing inference endpoints (`/query/*`, `/rag/query`).
7. **Scoped keys** never logged in full; only the prefix is shown after creation.
8. **Worker enrollment tokens** are single-use and time-limited (`expires_at` checked on register).
9. **Heartbeats authenticate by `node_token`**, not by JWT — rotated on every approve/reactivate.
10. **CORS:** `MAC_CORS_ORIGINS` defaults to `["*"]` for ease of dev; **set explicit origins in prod**.
11. **Uploads:** `uploads/` is outside the static mount; copy-check sheets and RAG docs are served via authenticated endpoints, never directly.
12. **WebSocket auth:** `notebook_ws` validates the JWT in the query string before `accept()`. Don't move the accept above the validation.

---

## 15. How to add a new feature (the recipe)

1. **Model:** add a SQLAlchemy class in `mac/models/.py`, import it in `mac/main.py::lifespan` so `Base.metadata` knows.
2. **Migration:** `alembic revision --autogenerate -m "add "` → review → commit.
3. **Schema:** Pydantic request/response in `mac/schemas/.py`.
4. **Service:** pure logic in `mac/services/_service.py`. Takes `db: AsyncSession` and primitive args. No FastAPI types.
5. **Router:** thin handler in `mac/routers/.py`. Order of `Depends`: `get_db` → `get_current_user` → `require_*` → `feature_required("…")` → `check_rate_limit` (only if user-driven inference). Mount in `mac/main.py`.
6. **Feature flag:** add a default to `feature_seeder.DEFAULT_FLAGS` so it can be toggled per role from admin.
7. **API client:** add a method to `frontend/src/lib/api.js` under the matching export.
8. **Store (if it has UI state):** add to `frontend/src/lib/stores.js`.
9. **Route:** new directory under `frontend/src/routes//+page.svelte`.
10. **Sidebar entry:** edit `frontend/src/lib/components/Sidebar.svelte`.
11. **i18n:** add new strings to `BASE` in `frontend/src/lib/i18n.js`.
12. **Test:** at least one happy-path + one auth-failure pytest in `tests/`.

Follow this and the system stays consistent. Skip steps and you'll end up with a feature that's invisible to the admin, untranslated, untested, or worse — bypassing the auth chain.

---

*Last updated: 2026-04-27. If you change a subsystem and this file no longer matches reality, update it in the same PR.*