![1777301208961](image/ARCHITECTURE/1777301208961.png)![1777301226270](image/ARCHITECTURE/1777301226270.png)# MAC — Architecture Reference

> **Audience:** an AI coding agent (or new engineer) dropped into this repo with no prior context.
> **Goal:** understand the system end-to-end — every subsystem, the data flow, where state lives, and how the pieces secure and observe each other.
> Read [README.md](README.md) for the elevator pitch and [MAC-PROGRESS.md](MAC-PROGRESS.md) for the build log. This file is the *map*.

---

## 0. Identity in one paragraph

MAC (MBM AI Cloud) is a **self-hosted, on-prem AI platform** for MBM University Jodhpur. It gives students/faculty a private ChatGPT-style chat, a notebook IDE, RAG over college docs, an attendance system using face capture, an exam copy-check workflow with AI vision + plagiarism detection, and an admin/cluster console — all powered by **open-source LLMs** running on the college's own GPUs. There are no external API calls; vLLM serves models locally, and worker GPUs are added by enrolling them into the cluster.

---

## 1. Top-level topology

```
┌────────────────────────────────────────────────────────────────┐
│  CLIENTS                                                        │
│   • Web (SvelteKit PWA, served by Nginx in prod)                │
│   • API consumers (curl / Python SDK / scripts)                 │
└──────────────────────────┬─────────────────────────────────────┘
                           │ HTTPS
                           ▼
                ┌──────────────────────┐
                │  NGINX               │ ← TLS, gzip, /api → mac, / → static
                └──────────┬───────────┘
                           │
        ┌──────────────────┴──────────────────┐
        ▼                                     ▼
┌─────────────────┐                 ┌──────────────────────┐
│  SvelteKit      │                 │  FastAPI  (mac.main) │
│  static build   │                 │  /api/v1/*           │
└─────────────────┘                 └──────────┬───────────┘
                                               │
   ┌───────────────────────┬───────────────────┼─────────────────────────┐
   ▼                       ▼                   ▼                         ▼
┌────────────┐      ┌────────────┐     ┌──────────────┐         ┌────────────────┐
│ PostgreSQL │      │  Redis     │     │  Qdrant      │         │  SearXNG       │
│  (primary) │      │  cache /   │     │  (RAG vec)   │         │  (web search)  │
│   Alembic  │      │  bl / rl   │     └──────────────┘         └────────────────┘
└────────────┘      └────────────┘
                                                                                 
                           ▲   load_balancer.get_best_worker()
                           │
   ┌───────────────────────┴───────────────────────────────────────────────┐
   │                          MAC CLUSTER (GPU workers, any LAN PC)         │
   │   ┌─────────────────┐    ┌─────────────────┐   ┌────────────────┐     │
   │   │ vLLM (OpenAI    │    │ Jupyter kernel  │   │ worker_agent.py│     │
   │   │ compatible)     │    │ gateway (opt.)  │   │ (heartbeat)    │     │
   │   └─────────────────┘    └─────────────────┘   └────────────────┘     │
   └────────────────────────────────────────────────────────────────────────┘
```

- **Master node** runs FastAPI + Postgres + Redis + Nginx + Qdrant + SearXNG.
- **Worker nodes** run vLLM + an optional Jupyter kernel gateway, plus [worker_agent.py](worker_agent.py) which self-registers via an enrollment token and sends a heartbeat every 10s (GPU util, VRAM, RAM, CPU).
- **Routing** is master-side: every user request hits the master API, which uses [mac/services/load_balancer.py](mac/services/load_balancer.py) to score-pick the best worker for an LLM call or notebook kernel.

---

## 2. Repository map (what lives where)

```
mac/
  main.py                    FastAPI app, lifespan (DB init, dev seeds, bg tasks),
                             router mounts under /api/v1, root SPA fallback.
  config.py                  Pydantic Settings — every env var + .env loader.
  database.py                Async SQLAlchemy engine + session factory; `Base`.
  utils/security.py          JWT encode/decode + jti generation; password hash.
  middleware/
    auth_middleware.py       Bearer extractor → JWT | legacy-key | scoped-key → User.
    rate_limit.py            Per-user req/hour + token/day; injects X-RateLimit-*.
    feature_gate.py          feature_required("ai_chat") dependency.
  models/                    SQLAlchemy ORM models (one file per domain).
  schemas/                   Pydantic request/response schemas.
  services/                  Pure business logic, no HTTP — called by routers.
  routers/                   FastAPI routers, thin: validate → call service → return.

frontend/                    SvelteKit 2 + Svelte 5 PWA.
  src/routes/                File-system routing: login, setup, chat, dashboard,
                             admin, cluster, keys, settings, notifications, rag.
  src/lib/api.js             Single fetch wrapper; one export per backend domain.
  src/lib/stores.js          Svelte stores (auth, setup, features, chat, toast).
  src/lib/i18n.js            19 Indian languages, lazy-loaded strings, RTL support.
  static/manifest.json       PWA manifest; static/sw.js is a no-cache worker.

alembic/                     Migration env + versioned revisions.
nginx/                       nginx.conf (HTTP) + nginx.https.conf (TLS).
docker-compose.yml           Master stack.
docker-compose.worker.yml    Worker stack (vLLM + worker_agent).
worker_agent.py              Enrollment + heartbeat agent for a GPU node.
installer/                   Windows installer (PyInstaller) + branding assets.
tests/                       pytest suite.
```

---

## 3. Request lifecycle (the universal path)

Every authenticated `/api/v1/*` request goes through these layers in order. Knowing this map means you can audit any new endpoint quickly.

```
HTTP request
  │
  ▼
[1] CORS middleware                  (mac/main.py — allow_origins from settings)
  │
  ▼
[2] Route handler (FastAPI)          (mac/routers/*.py)
  │   Depends(get_current_user)
  ▼
[3] Auth resolver                    (mac/middleware/auth_middleware.py)
  │   Bearer token → branch:
  │     • mac_sk_live_*  → legacy API key (User.api_key)
  │     • mac_sk_*       → scoped API key (hashed, scopes, expiry, revocable)
  │     • else           → JWT (verify sig, check exp, check jti blacklist)
  │   → returns User or raises 401
  │
  ▼
[4] Role guard (optional)            require_admin / require_faculty_or_admin
  │
  ▼
[5] Feature gate (optional)          feature_required("ai_chat")
  │   → reads system_config / feature_flags table → 403 if disabled for role
  │
  ▼
[6] Rate limit (optional)            check_rate_limit
  │   • requests/hour from usage_log (per-user)
  │   • tokens/day from usage_log (per-user)
  │   • injects X-RateLimit-* into request.state
  │
  ▼
[7] Service layer                    mac/services/*.py
  │   Business logic — never imports FastAPI; takes db: AsyncSession.
  │
  ▼
[8] Response → HTTP middleware       inject_rate_limit_headers reads request.state
                                     and stamps headers onto the response
```

This separation is the single most important design rule:
**routers do parsing + auth + I/O orchestration; services do business logic; models do persistence.** Anything calling FastAPI types from a service is a smell.

---

## 4. Identity & access — auth, sessions, keys

There are **three** ways a request authenticates, all collapsed to a `User` by `get_current_user`:

### 4.1 JWT (interactive users)
- Login: `POST /api/v1/auth/login` with `{roll_number, password}` → `{access_token, refresh_token, user}`.
- Access token lifetime: `JWT_ACCESS_TOKEN_EXPIRE_MINUTES` (default 1440 = 24h).
- Every access token carries a `jti` (random UUID) baked into the JWT claims by [mac/utils/security.py](mac/utils/security.py).
- `POST /api/v1/auth/logout` blacklists the current `jti` in Redis with a TTL equal to remaining token life ([token_blacklist_service.py](mac/services/token_blacklist_service.py)). Refresh tokens are also revoked. Falls back to an in-process set if Redis is unreachable (dev only).
- The JWT signing secret is **not** read from env in production — it's stored in `system_config` and seeded on first boot by [setup_service.get_or_generate_jwt_secret](mac/services/setup_service.py). This means restarting the app does not invalidate everyone's sessions.

### 4.2 Legacy API keys
- Format: `mac_sk_live_<48 hex chars>`. Stored on `users.api_key`. One per user.
- Use case: scripts that need a stable long-lived credential.
- Resolved before JWT in `auth_middleware` because of the prefix check.

### 4.3 Scoped API keys
- Format: `mac_sk_<random>`, hashed at rest. Created via `/api/v1/scoped-keys`.
- Carry: scopes (list of allowed endpoints), optional expiry, label, revoke flag.
- Resolved by [scoped_key_service.get_key_by_hash](mac/services/scoped_key_service.py).
- Attached to `user._scoped_key` for downstream scope enforcement.

### 4.4 Roles
- `admin` | `faculty` | `student`. Enforced at the router layer via `require_admin` / `require_faculty_or_admin` dependencies.
- Feature flags layer on top: a feature can be enabled globally but restricted to specific roles (see `feature_flags.roles`).

### 4.5 First-run onboarding
- `GET /api/v1/setup/status` → `{is_first_run, has_jwt_secret, version}`. Frontend uses this to decide whether to show the setup wizard or login.
- `POST /api/v1/setup/create-admin` provisions the first admin and seals the system.

---

## 5. LLM serving & cluster routing

### 5.1 Model registry — three layers of override
`mac/services/llm_service.py::_BUILTIN_MODELS` holds the defaults (Qwen2.5 7B, Qwen2.5-Coder 7B/AWQ, DeepSeek-R1, etc.). Each entry knows its `served_name` (HF repo), `category` (`speed | code | reasoning | intelligence`), `capabilities`, and `url_key` pointing at one of `vllm_speed_url | vllm_code_url | …` in `Settings`.

Override priority:
1. `MAC_MODELS_JSON` env var (a full JSON array) — replaces the registry entirely.
2. `MAC_ENABLED_MODELS` env var (comma-separated IDs) — filters which built-ins are exposed.
3. `MAC_AUTO_FALLBACK` — what `model="auto"` resolves to.

### 5.2 The system prompt is forced
`_inject_system_prompt` in `llm_service` prepends a hard-coded MAC identity prompt to **every** chat completion. This prevents the underlying Qwen/DeepSeek model from claiming to be "Qwen made by Alibaba" — it always says it is MAC, built by MBM University. If the user supplied a system message, MAC's identity is concatenated in front of theirs.

### 5.3 Routing decision (where does this call go?)
```
chat request
  │
  ▼
llm_service._resolve_model_cluster(model_id)
  │
  ▼
load_balancer.get_best_worker(db, model_id)
  │   SELECT WorkerNode JOIN NodeModelDeployment
  │   WHERE node.status='active' AND deployment.status='ready'
  │   AND last_heartbeat within 30s
  │   ORDER BY  gpu_util*0.5 + (vram_used/total)*0.3
  │
  ├── candidate found → POST http://{node.ip}:{deployment.port}/v1/chat/completions
  │
  └── none → fall back to local config (settings.vllm_<category>_url)
```

vLLM speaks the **OpenAI-compatible** API, so the proxy is a near-pass-through with SSE streaming preserved end-to-end.

### 5.4 Cluster lifecycle
| Event | Endpoint | Auth | Effect |
|---|---|---|---|
| Admin mints token | `POST /cluster/enroll-token` | admin JWT | Single-use, expiring `EnrollmentToken` row |
| Worker registers | `POST /cluster/register` | enroll token | Creates `WorkerNode` (status `pending`) + reports IP, GPU specs |
| Admin approves | `POST /cluster/nodes/{id}/action {action:"approve"}` | admin | `status → active` |
| Worker heartbeats | `POST /cluster/heartbeat` | node token | Updates `last_heartbeat`, GPU util, VRAM, CPU, RAM, queue depth — also append-only into `cluster_heartbeats` (time-series for charts) |
| Worker reports models | (in heartbeat payload) | — | Upserts `NodeModelDeployment` rows |
| Drain / remove | `POST /cluster/nodes/{id}/action` | admin | Stops new traffic; allows in-flight to finish |

Workers older than 30s without a heartbeat are silently skipped by the balancer — no manual intervention needed if a worker dies.

---

## 6. Notebooks — multi-language code execution

This is the most operationally complex subsystem. The design supports **two backends** and **distributed execution**.

### 6.1 Architecture
```
Client (browser)
  │ WebSocket /ws/notebook/{notebook_id}?token=JWT
  ▼
mac/routers/notebook_ws.py
  │ • verifies JWT (decode_access_token, no DB hit on hot path)
  │ • registers connection in _connections[notebook_id]
  ▼
kernel_manager (mac/services/kernel_manager.py)
  │ Backend selection at startup:
  │   _docker_available()  →  Docker mode
  │   else                 →  subprocess mode (dev)
  │
  ├── DOCKER MODE (production)
  │     • spawns mac-kernel-{lang} container (image_prefix in config)
  │     • applies memory + CPU limits from settings
  │     • optionally attaches GPU (--gpus all) for ML kernels
  │     • streams stdout/stderr back as JSONL events
  │
  ├── SUBPROCESS MODE (dev)
  │     • runs the language interpreter directly on the host
  │     • no isolation; only safe for trusted local dev
  │
  └── REMOTE WORKER MODE
        • load_balancer.get_notebook_worker(db) picks a worker with notebook_port
        • forwards the execute via the worker's Jupyter kernel gateway
        • output streams back to the master, then to the client
```

### 6.2 WebSocket protocol
Defined at the top of [notebook_ws.py](mac/routers/notebook_ws.py):

| Direction | Type | Payload |
|---|---|---|
| C→S | `execute` | `{cell_id, code, language}` |
| C→S | `interrupt` | `{kernel_id}` |
| C→S | `ping` | — |
| S→C | `status` | `{cell_id, execution_state: busy\|idle}` |
| S→C | `stream` | `{cell_id, name: stdout\|stderr, text}` |
| S→C | `error` | `{cell_id, ename, evalue, traceback[]}` |
| S→C | `pong` | — |

### 6.3 State & limits
- `KernelInstance` per session: `id`, `language`, `node_id`, `container_id`, `status`, `last_activity`, `execution_count`.
- Idle kernels are reaped after `kernel_timeout` seconds (default 120).
- Max concurrent kernels per node: `kernel_max_per_node` (default 10).
- Persistent notebook content: `notebooks` table; cells stored as JSON, ordered.

### 6.4 Why a custom protocol and not raw Jupyter?
Three reasons: (a) we need user-scoped auth via our JWT; (b) we need to fan-out execution across the cluster, not just one local kernel; (c) we want the option to swap kernels for sandboxed runners later without changing the wire format.

---

## 7. RAG — private document search

Pipeline: **upload → chunk → embed → store → retrieve → augment**.

```
PDF/MD/TXT upload (POST /rag/upload)
  │
  ▼
rag_service.ingest_document
  │ • text extraction (pypdf for PDF, plain read otherwise)
  │ • chunk_text(words=512, overlap=50)         ← simple word-window
  │ • for each chunk:
  │     emb = await llm_service.embed(text)     ← uses EMBEDDING_URL or vLLM
  │     qdrant.upsert(point=(uuid, emb, payload))
  │ • RAGDocument row in Postgres with chunk count & status
  ▼
QUERY TIME (chat with rag context)
  │
  ▼
rag_service.query(question, top_k=5)
  │ • emb_q = embed(question)
  │ • qdrant.search(collection, emb_q, top_k)
  │ • returns chunks + source metadata
  ▼
llm_service.chat with messages = [
    {role:"system", content: MAC_PROMPT + "\n\nContext:\n" + chunks},
    *user_messages,
  ]
```

Collections (`RAGCollection`) namespace documents — e.g. one per subject. Documents (`RAGDocument`) track ownership and indexing status so the UI can show "Indexing 42/120 chunks…".

---

## 8. Attendance — face-based check-in

### 8.1 Models
- `FaceTemplate` — one per user, holds a face encoding (64-byte hash in dev; pluggable to `face_recognition`/`dlib` for production).
- `AttendanceSession` — created by faculty: `{branch, section, subject, date, window_minutes}`.
- `AttendanceRecord` — one per (session, student): `present | absent | late`, captured selfie hash, confidence, timestamp.

### 8.2 Flow
```
1. Faculty: POST /attendance/sessions   → creates session, returns join token + QR
2. Student: GET  /attendance/active     → returns currently open sessions for them
3. Student: POST /attendance/check-in   → uploads base64 selfie
                                          server:
                                            • decodes image
                                            • hashes (sha256) — dedupe replay
                                            • computes encoding
                                            • compares to stored FaceTemplate
                                            • if (match && within window) → AttendanceRecord(present)
                                            • else → 401 with reason
4. Faculty: GET  /attendance/sessions/{id}/report  → CSV / PDF roster
```

### 8.3 Anti-cheat heuristics
- Session has a strict `window_minutes` — late arrivals are recorded as `late`, not `present`.
- Same selfie hash twice in a session → rejected (replay block).
- One record per (session, student) — UPSERT prevents stuffing.
- Production: swap `_compute_face_encoding` for the real `face_recognition.face_encodings()` (the call sites already accept it; only the function body changes).

---

## 9. Copy Check — exam paper evaluation

A faculty workflow that grades scanned answer sheets using vision-capable LLMs and runs cross-paper plagiarism detection. Models in `mac/models/copy_check.py`:

| Model | Role |
|---|---|
| `CopyCheckSession` | One exam: subject, class, total_marks, syllabus_text |
| `CopyCheckSheet` | One student's submission: roll, scanned pages, AI score, feedback |
| `CopyCheckPlagiarism` | Pairwise similarity between two sheets in the same session |

### 9.1 Flow
```
Faculty creates session → uploads syllabus / answer key
   │
   ▼
For each student answer sheet (PDF or image bundle):
   • file saved under uploads/copy_check/{session_id}/{roll}/
   • AI vision model reads each page (multimodal LLM)
   • Service builds a structured prompt: syllabus + answer key + student answer
   • LLM returns { per_question_marks, total, weakness_summary, suggestions }
   • CopyCheckSheet upserted with score + JSON feedback
   │
   ▼
Plagiarism pass:
   • difflib.SequenceMatcher on extracted text per pair within session
   • CopyCheckPlagiarism row written for (sheet_a, sheet_b, similarity, flagged_passages)
   │
   ▼
Faculty reviews:
   • per-student PDF report (fpdf2)
   • plagiarism heatmap
   • can override AI marks before "publish"
```

### 9.2 Why the AI doesn't have final authority
The faculty UI explicitly requires a **"Reviewed & Approved"** flag before any score becomes visible to students. The AI is graded as a *recommendation* — the audit trail records both the AI suggestion and the faculty's override. This is the legal/academic-integrity boundary.

---

## 10. Other domain modules (one-paragraph each)

- **Doubts forum** ([doubts.py](mac/routers/doubts.py)): students post questions; faculty/peers answer; AI generates a draft answer that the asker can accept or replace. Threaded, taggable.
- **File sharing** ([file_share.py](mac/routers/file_share.py)): admin/faculty upload class materials; per-file access scoping; per-download analytics in `file_downloads`.
- **Notifications** ([notifications.py](mac/routers/notifications.py)): in-app + Web Push (`pywebpush`); endpoints registered via VAPID; one row per user-notification with read/unread state.
- **Academic** ([academic.py](mac/routers/academic.py)): branches & sections — used to scope attendance, file sharing, and admin lists.
- **Doubt copy-check submissions** ([model_submission_service.py](mac/services/model_submission_service.py)): community-trained adapter / LoRA submissions queued for admin review before being published as model registry entries.
- **Search** ([search.py](mac/routers/search.py) + SearXNG): private metasearch, no Google, no telemetry, returned to the chat as a tool result.
- **Hardware / Network / System** ([hardware.py](mac/routers/hardware.py), [network.py](mac/routers/network.py), [system.py](mac/routers/system.py)): admin diagnostics — local CPU/GPU/RAM, recommended models for the detected GPU, LAN discovery (`mac/services/discovery.py` UDP broadcast on port 7700), version & update status (`mac/services/updater.py` polls GitHub releases).
- **Quota** ([quota.py](mac/routers/quota.py)): per-user requests/hour and tokens/day; admin can override per user; default from `RATE_LIMIT_*` env.
- **Guardrails** ([guardrails.py](mac/routers/guardrails.py) + `guardrail_service`): admin-editable ruleset (banned terms, forbidden topics) applied as a pre-check on chat input and a post-check on model output.

---

## 11. Cross-cutting concerns

### 11.1 Configuration
**One source of truth:** [mac/config.py](mac/config.py) `Settings(BaseSettings)`. Every value reads from env or `.env`. `_fix_database_url` auto-promotes `postgres://` and `postgresql://` to `postgresql+asyncpg://` and strips `sslmode=` (it's handled in `connect_args` separately for Neon/Supabase). Adding a new tunable means: add a field to `Settings`, document it in `.env.example`, use `settings.your_field` everywhere — never read `os.environ` directly.

### 11.2 Migrations
Alembic-managed. Two revisions today:
- `20260426_0001_initial_schema.py` — full original schema.
- `20260427_0002_session1_tables.py` — feature flags, system_config, branches, sections, cluster_heartbeats, shared_files, file_downloads, video_projects, video_jobs.

In dev (`MAC_ENV=development`), `init_db()` in `lifespan` creates tables idempotently from `Base.metadata`. In prod, you **must** run `alembic upgrade head` before serving traffic; tables are not auto-created. Whenever you add a column to a model, write a new revision.

### 11.3 Background tasks
Started in `lifespan` and cancelled on shutdown:
- [updater.background_check_loop](mac/services/updater.py) — polls GitHub for new releases every `MAC_UPDATE_CHECK_INTERVAL_HOURS`.
- [discovery.start_discovery_server](mac/services/discovery.py) — UDP broadcast listener so worker PCs on the LAN can find the master without manual IP entry.

### 11.4 Caching, blacklisting, rate limits
All Redis-backed with **graceful in-process fallback**:
- JWT blacklist → `mac:bl:{jti}` keys with TTL = remaining token life.
- Rate-limit counters → derived from `usage_log` rows (no Redis needed for counts).
- Session/feature caches → not implemented yet; designed to live under `mac:cache:*`.

### 11.5 Observability
Every chat call is logged to `usage_log`: user_id, model_id, tokens_in, tokens_out, latency_ms, status, request_id (`generate_request_id` in `utils/security`). The dashboard route reads these for per-user charts. Cluster heartbeats are append-only into `cluster_heartbeats` so node history charts are just `SELECT … ORDER BY ts`.

---

## 12. Frontend — SvelteKit PWA

### 12.1 Stack
SvelteKit 2 + Svelte 5 + Tailwind 3 + Vite 6. Built as a static site (`@sveltejs/adapter-static` with `fallback: 'index.html'`) and served by Nginx in production, by Vite dev server with `/api` proxy to the FastAPI port in development.

### 12.2 SPA mode
The root has `+layout.js` with `export const ssr = false; export const prerender = false;` so the entire app is rendered client-side. This is intentional — it sidesteps hydration issues, and there is no SEO need for an internal college tool.

### 12.3 State
[src/lib/stores.js](frontend/src/lib/stores.js) holds Svelte stores:
- `authStore` — `{user, token, refreshToken}`, with `init()` that re-hydrates from `localStorage` and re-fetches `/auth/me`, plus `login`/`logout`.
- `setupStore` — `is_first_run` flag.
- `featureStore` — feature flag map for conditional UI.
- `chatStore` — local conversation history (per-session, not yet server-persisted).
- `toast` — single-message notifier.

### 12.4 API client
[src/lib/api.js](frontend/src/lib/api.js) is the *only* place that talks HTTP. One `headers()` helper attaches the bearer token from `localStorage`. Each backend domain (`auth`, `query`, `models`, `cluster`, `rag`, `files`, …) is its own export with named methods. Adding a new endpoint = add a method here, never `fetch()` from a component directly.

### 12.5 Auth/setup gate
[+layout.svelte](frontend/src/routes/+layout.svelte) boots the app on first paint:
1. `initLocale()` — detect language from `localStorage` / browser.
2. `authStore.init()` — restore session.
3. `checkSetup()` — first-run check.
4. `loadFeatures()` — fetch flags.
5. Redirect: first-run → `/setup`, no user on protected route → `/login`, root → `/chat` or `/login`.
6. Render either `Sidebar + slot` (logged in) or bare `slot` (login/setup).

### 12.6 Internationalisation
[src/lib/i18n.js](frontend/src/lib/i18n.js) ships **19 Indian languages** with lazy-loaded string maps and an `RTL_LOCALES` set (Urdu) that flips the layout direction. Adding a new locale = add to `SUPPORTED_LOCALES`, drop a translation map, no other file changes.

### 12.7 PWA + service worker
[static/manifest.json](frontend/static/manifest.json) declares the installable app + shortcuts. [static/sw.js](frontend/static/sw.js) is intentionally **caching-disabled** — every install/activate wipes all caches and there is no `fetch` handler. This was a deliberate decision: caching the SPA shell caused stale-build problems during rapid dev. Re-introduce caching only behind a versioned cache name with a clear invalidation strategy.

---

## 13. Deployment

### 13.1 Master node (single command)
```bash
cd frontend && npm install && npm run build && cd ..
cp .env.example .env  # edit secrets
docker compose up postgres -d
docker compose run --rm mac alembic upgrade head
docker compose up -d
```
Compose brings up: `mac` (FastAPI), `postgres`, `redis`, `qdrant`, `searxng`, `vllm-speed`, `nginx`. (Whisper/TTS commented out by default.)

### 13.2 Adding a worker
On master:
```bash
curl -X POST http://MASTER:8000/api/v1/cluster/enroll-token \
  -H "Authorization: Bearer ADMIN_JWT" -d '{"label":"Lab PC 1","expires_hours":24}'
```
On the worker PC:
```bash
MAC_MASTER_URL=http://MASTER:8000 \
MAC_ENROLL_TOKEN=<token> \
MAC_VLLM_PORT=8001 \
docker compose -f docker-compose.worker.yml up -d
```
Then approve in admin → Cluster.

### 13.3 HTTPS
Drop certs into `nginx/ssl/`, swap the bind-mounted config to `nginx/nginx.https.conf` in `docker-compose.yml`, restart Nginx.

### 13.4 Windows installer
[installer/build_installer.ps1](installer/build_installer.ps1) builds a one-shot `dist/MAC-Installer.exe` (PyInstaller) that bootstraps Docker Desktop checks, clones/updates the repo, writes a sane `.env` with detected host IP, and starts the master stack. Branding assets are embedded base64 in [installer/embedded_assets.py](installer/embedded_assets.py) so the binary works even if image files are missing at runtime.

---

## 14. Security checklist (what every reviewer should verify)

1. **No external API calls.** `grep -r "openai.com\|api.anthropic\|googleapis" mac/` should be empty. All inference is local.
2. **JWT secret is not in env in production.** It's seeded in `system_config` on first boot and re-used across restarts.
3. **JWT carries `jti`** and the auth middleware checks blacklist on every request.
4. **Every router** requiring auth uses `Depends(get_current_user)` — search for any `@router.*` that doesn't and justify it.
5. **Role guards** on admin-only operations: `Depends(require_admin)` on token mints, user list, cluster mutations, feature toggles, system restart.
6. **Rate limits** on user-facing inference endpoints (`/query/*`, `/rag/query`).
7. **Scoped keys** never logged in full; only the prefix is shown after creation.
8. **Worker enrollment tokens** are single-use and time-limited (`expires_at` checked on register).
9. **Heartbeats authenticate by `node_token`**, not by JWT — rotated on every approve/reactivate.
10. **CORS:** `MAC_CORS_ORIGINS` defaults to `["*"]` for ease of dev; **set explicit origins in prod**.
11. **Uploads:** `uploads/` is outside the static mount; copy-check sheets and RAG docs are served via authenticated endpoints, never directly.
12. **WebSocket auth:** `notebook_ws` validates the JWT in the query string before `accept()`. Don't move the accept above the validation.

---

## 15. How to add a new feature (the recipe)

1. **Model:** add a SQLAlchemy class in `mac/models/<domain>.py`, import it in `mac/main.py::lifespan` so `Base.metadata` knows.
2. **Migration:** `alembic revision --autogenerate -m "add <thing>"` → review → commit.
3. **Schema:** Pydantic request/response in `mac/schemas/<domain>.py`.
4. **Service:** pure logic in `mac/services/<domain>_service.py`. Takes `db: AsyncSession` and primitive args. No FastAPI types.
5. **Router:** thin handler in `mac/routers/<domain>.py`. Order of `Depends`: `get_db` → `get_current_user` → `require_*` → `feature_required("…")` → `check_rate_limit` (only if user-driven inference). Mount in `mac/main.py`.
6. **Feature flag:** add a default to `feature_seeder.DEFAULT_FLAGS` so it can be toggled per role from admin.
7. **API client:** add a method to `frontend/src/lib/api.js` under the matching export.
8. **Store (if it has UI state):** add to `frontend/src/lib/stores.js`.
9. **Route:** new directory under `frontend/src/routes/<feature>/+page.svelte`.
10. **Sidebar entry:** edit `frontend/src/lib/components/Sidebar.svelte`.
11. **i18n:** add new strings to `BASE` in `frontend/src/lib/i18n.js`.
12. **Test:** at least one happy-path + one auth-failure pytest in `tests/`.

Follow this and the system stays consistent. Skip steps and you'll end up with a feature that's invisible to the admin, untranslated, untested, or worse — bypassing the auth chain.

---

*Last updated: 2026-04-27. If you change a subsystem and this file no longer matches reality, update it in the same PR.*