MAC / docs /ARCHITECTURE.md
Aaryan17's picture
chore: upload MAC codebase to HF Space
0e76632 verified
![1777301208961](image/ARCHITECTURE/1777301208961.png)![1777301226270](image/ARCHITECTURE/1777301226270.png)# MAC β€” Architecture Reference
> **Audience:** an AI coding agent (or new engineer) dropped into this repo with no prior context.
> **Goal:** understand the system end-to-end β€” every subsystem, the data flow, where state lives, and how the pieces secure and observe each other.
> Read [README.md](README.md) for the elevator pitch and [MAC-PROGRESS.md](MAC-PROGRESS.md) for the build log. This file is the *map*.
---
## 0. Identity in one paragraph
MAC (MBM AI Cloud) is a **self-hosted, on-prem AI platform** for MBM University Jodhpur. It gives students/faculty a private ChatGPT-style chat, a notebook IDE, RAG over college docs, an attendance system using face capture, an exam copy-check workflow with AI vision + plagiarism detection, and an admin/cluster console β€” all powered by **open-source LLMs** running on the college's own GPUs. There are no external API calls; vLLM serves models locally, and worker GPUs are added by enrolling them into the cluster.
---
## 1. Top-level topology
```
β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”
β”‚ CLIENTS β”‚
β”‚ β€’ Web (SvelteKit PWA, served by Nginx in prod) β”‚
β”‚ β€’ API consumers (curl / Python SDK / scripts) β”‚
β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”¬β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜
β”‚ HTTPS
β–Ό
β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”
β”‚ NGINX β”‚ ← TLS, gzip, /api β†’ mac, / β†’ static
β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”¬β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜
β”‚
β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”΄β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”
β–Ό β–Ό
β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β” β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”
β”‚ SvelteKit β”‚ β”‚ FastAPI (mac.main) β”‚
β”‚ static build β”‚ β”‚ /api/v1/* β”‚
β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜ β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”¬β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜
β”‚
β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”¬β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”Όβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”
β–Ό β–Ό β–Ό β–Ό
β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β” β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β” β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β” β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”
β”‚ PostgreSQL β”‚ β”‚ Redis β”‚ β”‚ Qdrant β”‚ β”‚ SearXNG β”‚
β”‚ (primary) β”‚ β”‚ cache / β”‚ β”‚ (RAG vec) β”‚ β”‚ (web search) β”‚
β”‚ Alembic β”‚ β”‚ bl / rl β”‚ β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜ β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜
β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜ β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜
β–² load_balancer.get_best_worker()
β”‚
β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”΄β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”
β”‚ MAC CLUSTER (GPU workers, any LAN PC) β”‚
β”‚ β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β” β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β” β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β” β”‚
β”‚ β”‚ vLLM (OpenAI β”‚ β”‚ Jupyter kernel β”‚ β”‚ worker_agent.pyβ”‚ β”‚
β”‚ β”‚ compatible) β”‚ β”‚ gateway (opt.) β”‚ β”‚ (heartbeat) β”‚ β”‚
β”‚ β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜ β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜ β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜ β”‚
β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜
```
- **Master node** runs FastAPI + Postgres + Redis + Nginx + Qdrant + SearXNG.
- **Worker nodes** run vLLM + an optional Jupyter kernel gateway, plus [worker_agent.py](worker_agent.py) which self-registers via an enrollment token and sends a heartbeat every 10s (GPU util, VRAM, RAM, CPU).
- **Routing** is master-side: every user request hits the master API, which uses [mac/services/load_balancer.py](mac/services/load_balancer.py) to score-pick the best worker for an LLM call or notebook kernel.
---
## 2. Repository map (what lives where)
```
mac/
main.py FastAPI app, lifespan (DB init, dev seeds, bg tasks),
router mounts under /api/v1, root SPA fallback.
config.py Pydantic Settings β€” every env var + .env loader.
database.py Async SQLAlchemy engine + session factory; `Base`.
utils/security.py JWT encode/decode + jti generation; password hash.
middleware/
auth_middleware.py Bearer extractor β†’ JWT | legacy-key | scoped-key β†’ User.
rate_limit.py Per-user req/hour + token/day; injects X-RateLimit-*.
feature_gate.py feature_required("ai_chat") dependency.
models/ SQLAlchemy ORM models (one file per domain).
schemas/ Pydantic request/response schemas.
services/ Pure business logic, no HTTP β€” called by routers.
routers/ FastAPI routers, thin: validate β†’ call service β†’ return.
frontend/ SvelteKit 2 + Svelte 5 PWA.
src/routes/ File-system routing: login, setup, chat, dashboard,
admin, cluster, keys, settings, notifications, rag.
src/lib/api.js Single fetch wrapper; one export per backend domain.
src/lib/stores.js Svelte stores (auth, setup, features, chat, toast).
src/lib/i18n.js 19 Indian languages, lazy-loaded strings, RTL support.
static/manifest.json PWA manifest; static/sw.js is a no-cache worker.
alembic/ Migration env + versioned revisions.
nginx/ nginx.conf (HTTP) + nginx.https.conf (TLS).
docker-compose.yml Master stack.
docker-compose.worker.yml Worker stack (vLLM + worker_agent).
worker_agent.py Enrollment + heartbeat agent for a GPU node.
installer/ Windows installer (PyInstaller) + branding assets.
tests/ pytest suite.
```
---
## 3. Request lifecycle (the universal path)
Every authenticated `/api/v1/*` request goes through these layers in order. Knowing this map means you can audit any new endpoint quickly.
```
HTTP request
β”‚
β–Ό
[1] CORS middleware (mac/main.py β€” allow_origins from settings)
β”‚
β–Ό
[2] Route handler (FastAPI) (mac/routers/*.py)
β”‚ Depends(get_current_user)
β–Ό
[3] Auth resolver (mac/middleware/auth_middleware.py)
β”‚ Bearer token β†’ branch:
β”‚ β€’ mac_sk_live_* β†’ legacy API key (User.api_key)
β”‚ β€’ mac_sk_* β†’ scoped API key (hashed, scopes, expiry, revocable)
β”‚ β€’ else β†’ JWT (verify sig, check exp, check jti blacklist)
β”‚ β†’ returns User or raises 401
β”‚
β–Ό
[4] Role guard (optional) require_admin / require_faculty_or_admin
β”‚
β–Ό
[5] Feature gate (optional) feature_required("ai_chat")
β”‚ β†’ reads system_config / feature_flags table β†’ 403 if disabled for role
β”‚
β–Ό
[6] Rate limit (optional) check_rate_limit
β”‚ β€’ requests/hour from usage_log (per-user)
β”‚ β€’ tokens/day from usage_log (per-user)
β”‚ β€’ injects X-RateLimit-* into request.state
β”‚
β–Ό
[7] Service layer mac/services/*.py
β”‚ Business logic β€” never imports FastAPI; takes db: AsyncSession.
β”‚
β–Ό
[8] Response β†’ HTTP middleware inject_rate_limit_headers reads request.state
and stamps headers onto the response
```
This separation is the single most important design rule:
**routers do parsing + auth + I/O orchestration; services do business logic; models do persistence.** Anything calling FastAPI types from a service is a smell.
---
## 4. Identity & access β€” auth, sessions, keys
There are **three** ways a request authenticates, all collapsed to a `User` by `get_current_user`:
### 4.1 JWT (interactive users)
- Login: `POST /api/v1/auth/login` with `{roll_number, password}` β†’ `{access_token, refresh_token, user}`.
- Access token lifetime: `JWT_ACCESS_TOKEN_EXPIRE_MINUTES` (default 1440 = 24h).
- Every access token carries a `jti` (random UUID) baked into the JWT claims by [mac/utils/security.py](mac/utils/security.py).
- `POST /api/v1/auth/logout` blacklists the current `jti` in Redis with a TTL equal to remaining token life ([token_blacklist_service.py](mac/services/token_blacklist_service.py)). Refresh tokens are also revoked. Falls back to an in-process set if Redis is unreachable (dev only).
- The JWT signing secret is **not** read from env in production β€” it's stored in `system_config` and seeded on first boot by [setup_service.get_or_generate_jwt_secret](mac/services/setup_service.py). This means restarting the app does not invalidate everyone's sessions.
### 4.2 Legacy API keys
- Format: `mac_sk_live_<48 hex chars>`. Stored on `users.api_key`. One per user.
- Use case: scripts that need a stable long-lived credential.
- Resolved before JWT in `auth_middleware` because of the prefix check.
### 4.3 Scoped API keys
- Format: `mac_sk_<random>`, hashed at rest. Created via `/api/v1/scoped-keys`.
- Carry: scopes (list of allowed endpoints), optional expiry, label, revoke flag.
- Resolved by [scoped_key_service.get_key_by_hash](mac/services/scoped_key_service.py).
- Attached to `user._scoped_key` for downstream scope enforcement.
### 4.4 Roles
- `admin` | `faculty` | `student`. Enforced at the router layer via `require_admin` / `require_faculty_or_admin` dependencies.
- Feature flags layer on top: a feature can be enabled globally but restricted to specific roles (see `feature_flags.roles`).
### 4.5 First-run onboarding
- `GET /api/v1/setup/status` β†’ `{is_first_run, has_jwt_secret, version}`. Frontend uses this to decide whether to show the setup wizard or login.
- `POST /api/v1/setup/create-admin` provisions the first admin and seals the system.
---
## 5. LLM serving & cluster routing
### 5.1 Model registry β€” three layers of override
`mac/services/llm_service.py::_BUILTIN_MODELS` holds the defaults (Qwen2.5 7B, Qwen2.5-Coder 7B/AWQ, DeepSeek-R1, etc.). Each entry knows its `served_name` (HF repo), `category` (`speed | code | reasoning | intelligence`), `capabilities`, and `url_key` pointing at one of `vllm_speed_url | vllm_code_url | …` in `Settings`.
Override priority:
1. `MAC_MODELS_JSON` env var (a full JSON array) β€” replaces the registry entirely.
2. `MAC_ENABLED_MODELS` env var (comma-separated IDs) β€” filters which built-ins are exposed.
3. `MAC_AUTO_FALLBACK` β€” what `model="auto"` resolves to.
### 5.2 The system prompt is forced
`_inject_system_prompt` in `llm_service` prepends a hard-coded MAC identity prompt to **every** chat completion. This prevents the underlying Qwen/DeepSeek model from claiming to be "Qwen made by Alibaba" β€” it always says it is MAC, built by MBM University. If the user supplied a system message, MAC's identity is concatenated in front of theirs.
### 5.3 Routing decision (where does this call go?)
```
chat request
β”‚
β–Ό
llm_service._resolve_model_cluster(model_id)
β”‚
β–Ό
load_balancer.get_best_worker(db, model_id)
β”‚ SELECT WorkerNode JOIN NodeModelDeployment
β”‚ WHERE node.status='active' AND deployment.status='ready'
β”‚ AND last_heartbeat within 30s
β”‚ ORDER BY gpu_util*0.5 + (vram_used/total)*0.3
β”‚
β”œβ”€β”€ candidate found β†’ POST http://{node.ip}:{deployment.port}/v1/chat/completions
β”‚
└── none β†’ fall back to local config (settings.vllm_<category>_url)
```
vLLM speaks the **OpenAI-compatible** API, so the proxy is a near-pass-through with SSE streaming preserved end-to-end.
### 5.4 Cluster lifecycle
| Event | Endpoint | Auth | Effect |
|---|---|---|---|
| Admin mints token | `POST /cluster/enroll-token` | admin JWT | Single-use, expiring `EnrollmentToken` row |
| Worker registers | `POST /cluster/register` | enroll token | Creates `WorkerNode` (status `pending`) + reports IP, GPU specs |
| Admin approves | `POST /cluster/nodes/{id}/action {action:"approve"}` | admin | `status β†’ active` |
| Worker heartbeats | `POST /cluster/heartbeat` | node token | Updates `last_heartbeat`, GPU util, VRAM, CPU, RAM, queue depth β€” also append-only into `cluster_heartbeats` (time-series for charts) |
| Worker reports models | (in heartbeat payload) | β€” | Upserts `NodeModelDeployment` rows |
| Drain / remove | `POST /cluster/nodes/{id}/action` | admin | Stops new traffic; allows in-flight to finish |
Workers older than 30s without a heartbeat are silently skipped by the balancer β€” no manual intervention needed if a worker dies.
---
## 6. Notebooks β€” multi-language code execution
This is the most operationally complex subsystem. The design supports **two backends** and **distributed execution**.
### 6.1 Architecture
```
Client (browser)
β”‚ WebSocket /ws/notebook/{notebook_id}?token=JWT
β–Ό
mac/routers/notebook_ws.py
β”‚ β€’ verifies JWT (decode_access_token, no DB hit on hot path)
β”‚ β€’ registers connection in _connections[notebook_id]
β–Ό
kernel_manager (mac/services/kernel_manager.py)
β”‚ Backend selection at startup:
β”‚ _docker_available() β†’ Docker mode
β”‚ else β†’ subprocess mode (dev)
β”‚
β”œβ”€β”€ DOCKER MODE (production)
β”‚ β€’ spawns mac-kernel-{lang} container (image_prefix in config)
β”‚ β€’ applies memory + CPU limits from settings
β”‚ β€’ optionally attaches GPU (--gpus all) for ML kernels
β”‚ β€’ streams stdout/stderr back as JSONL events
β”‚
β”œβ”€β”€ SUBPROCESS MODE (dev)
β”‚ β€’ runs the language interpreter directly on the host
β”‚ β€’ no isolation; only safe for trusted local dev
β”‚
└── REMOTE WORKER MODE
β€’ load_balancer.get_notebook_worker(db) picks a worker with notebook_port
β€’ forwards the execute via the worker's Jupyter kernel gateway
β€’ output streams back to the master, then to the client
```
### 6.2 WebSocket protocol
Defined at the top of [notebook_ws.py](mac/routers/notebook_ws.py):
| Direction | Type | Payload |
|---|---|---|
| C→S | `execute` | `{cell_id, code, language}` |
| C→S | `interrupt` | `{kernel_id}` |
| C→S | `ping` | — |
| S→C | `status` | `{cell_id, execution_state: busy\|idle}` |
| S→C | `stream` | `{cell_id, name: stdout\|stderr, text}` |
| S→C | `error` | `{cell_id, ename, evalue, traceback[]}` |
| S→C | `pong` | — |
### 6.3 State & limits
- `KernelInstance` per session: `id`, `language`, `node_id`, `container_id`, `status`, `last_activity`, `execution_count`.
- Idle kernels are reaped after `kernel_timeout` seconds (default 120).
- Max concurrent kernels per node: `kernel_max_per_node` (default 10).
- Persistent notebook content: `notebooks` table; cells stored as JSON, ordered.
### 6.4 Why a custom protocol and not raw Jupyter?
Three reasons: (a) we need user-scoped auth via our JWT; (b) we need to fan-out execution across the cluster, not just one local kernel; (c) we want the option to swap kernels for sandboxed runners later without changing the wire format.
---
## 7. RAG β€” private document search
Pipeline: **upload β†’ chunk β†’ embed β†’ store β†’ retrieve β†’ augment**.
```
PDF/MD/TXT upload (POST /rag/upload)
β”‚
β–Ό
rag_service.ingest_document
β”‚ β€’ text extraction (pypdf for PDF, plain read otherwise)
β”‚ β€’ chunk_text(words=512, overlap=50) ← simple word-window
β”‚ β€’ for each chunk:
β”‚ emb = await llm_service.embed(text) ← uses EMBEDDING_URL or vLLM
β”‚ qdrant.upsert(point=(uuid, emb, payload))
β”‚ β€’ RAGDocument row in Postgres with chunk count & status
β–Ό
QUERY TIME (chat with rag context)
β”‚
β–Ό
rag_service.query(question, top_k=5)
β”‚ β€’ emb_q = embed(question)
β”‚ β€’ qdrant.search(collection, emb_q, top_k)
β”‚ β€’ returns chunks + source metadata
β–Ό
llm_service.chat with messages = [
{role:"system", content: MAC_PROMPT + "\n\nContext:\n" + chunks},
*user_messages,
]
```
Collections (`RAGCollection`) namespace documents β€” e.g. one per subject. Documents (`RAGDocument`) track ownership and indexing status so the UI can show "Indexing 42/120 chunks…".
---
## 8. Attendance β€” face-based check-in
### 8.1 Models
- `FaceTemplate` β€” one per user, holds a face encoding (64-byte hash in dev; pluggable to `face_recognition`/`dlib` for production).
- `AttendanceSession` β€” created by faculty: `{branch, section, subject, date, window_minutes}`.
- `AttendanceRecord` β€” one per (session, student): `present | absent | late`, captured selfie hash, confidence, timestamp.
### 8.2 Flow
```
1. Faculty: POST /attendance/sessions β†’ creates session, returns join token + QR
2. Student: GET /attendance/active β†’ returns currently open sessions for them
3. Student: POST /attendance/check-in β†’ uploads base64 selfie
server:
β€’ decodes image
β€’ hashes (sha256) β€” dedupe replay
β€’ computes encoding
β€’ compares to stored FaceTemplate
β€’ if (match && within window) β†’ AttendanceRecord(present)
β€’ else β†’ 401 with reason
4. Faculty: GET /attendance/sessions/{id}/report β†’ CSV / PDF roster
```
### 8.3 Anti-cheat heuristics
- Session has a strict `window_minutes` β€” late arrivals are recorded as `late`, not `present`.
- Same selfie hash twice in a session β†’ rejected (replay block).
- One record per (session, student) β€” UPSERT prevents stuffing.
- Production: swap `_compute_face_encoding` for the real `face_recognition.face_encodings()` (the call sites already accept it; only the function body changes).
---
## 9. Copy Check β€” exam paper evaluation
A faculty workflow that grades scanned answer sheets using vision-capable LLMs and runs cross-paper plagiarism detection. Models in `mac/models/copy_check.py`:
| Model | Role |
|---|---|
| `CopyCheckSession` | One exam: subject, class, total_marks, syllabus_text |
| `CopyCheckSheet` | One student's submission: roll, scanned pages, AI score, feedback |
| `CopyCheckPlagiarism` | Pairwise similarity between two sheets in the same session |
### 9.1 Flow
```
Faculty creates session β†’ uploads syllabus / answer key
β”‚
β–Ό
For each student answer sheet (PDF or image bundle):
β€’ file saved under uploads/copy_check/{session_id}/{roll}/
β€’ AI vision model reads each page (multimodal LLM)
β€’ Service builds a structured prompt: syllabus + answer key + student answer
β€’ LLM returns { per_question_marks, total, weakness_summary, suggestions }
β€’ CopyCheckSheet upserted with score + JSON feedback
β”‚
β–Ό
Plagiarism pass:
β€’ difflib.SequenceMatcher on extracted text per pair within session
β€’ CopyCheckPlagiarism row written for (sheet_a, sheet_b, similarity, flagged_passages)
β”‚
β–Ό
Faculty reviews:
β€’ per-student PDF report (fpdf2)
β€’ plagiarism heatmap
β€’ can override AI marks before "publish"
```
### 9.2 Why the AI doesn't have final authority
The faculty UI explicitly requires a **"Reviewed & Approved"** flag before any score becomes visible to students. The AI is graded as a *recommendation* β€” the audit trail records both the AI suggestion and the faculty's override. This is the legal/academic-integrity boundary.
---
## 10. Other domain modules (one-paragraph each)
- **Doubts forum** ([doubts.py](mac/routers/doubts.py)): students post questions; faculty/peers answer; AI generates a draft answer that the asker can accept or replace. Threaded, taggable.
- **File sharing** ([file_share.py](mac/routers/file_share.py)): admin/faculty upload class materials; per-file access scoping; per-download analytics in `file_downloads`.
- **Notifications** ([notifications.py](mac/routers/notifications.py)): in-app + Web Push (`pywebpush`); endpoints registered via VAPID; one row per user-notification with read/unread state.
- **Academic** ([academic.py](mac/routers/academic.py)): branches & sections β€” used to scope attendance, file sharing, and admin lists.
- **Doubt copy-check submissions** ([model_submission_service.py](mac/services/model_submission_service.py)): community-trained adapter / LoRA submissions queued for admin review before being published as model registry entries.
- **Search** ([search.py](mac/routers/search.py) + SearXNG): private metasearch, no Google, no telemetry, returned to the chat as a tool result.
- **Hardware / Network / System** ([hardware.py](mac/routers/hardware.py), [network.py](mac/routers/network.py), [system.py](mac/routers/system.py)): admin diagnostics β€” local CPU/GPU/RAM, recommended models for the detected GPU, LAN discovery (`mac/services/discovery.py` UDP broadcast on port 7700), version & update status (`mac/services/updater.py` polls GitHub releases).
- **Quota** ([quota.py](mac/routers/quota.py)): per-user requests/hour and tokens/day; admin can override per user; default from `RATE_LIMIT_*` env.
- **Guardrails** ([guardrails.py](mac/routers/guardrails.py) + `guardrail_service`): admin-editable ruleset (banned terms, forbidden topics) applied as a pre-check on chat input and a post-check on model output.
---
## 11. Cross-cutting concerns
### 11.1 Configuration
**One source of truth:** [mac/config.py](mac/config.py) `Settings(BaseSettings)`. Every value reads from env or `.env`. `_fix_database_url` auto-promotes `postgres://` and `postgresql://` to `postgresql+asyncpg://` and strips `sslmode=` (it's handled in `connect_args` separately for Neon/Supabase). Adding a new tunable means: add a field to `Settings`, document it in `.env.example`, use `settings.your_field` everywhere β€” never read `os.environ` directly.
### 11.2 Migrations
Alembic-managed. Two revisions today:
- `20260426_0001_initial_schema.py` β€” full original schema.
- `20260427_0002_session1_tables.py` β€” feature flags, system_config, branches, sections, cluster_heartbeats, shared_files, file_downloads, video_projects, video_jobs.
In dev (`MAC_ENV=development`), `init_db()` in `lifespan` creates tables idempotently from `Base.metadata`. In prod, you **must** run `alembic upgrade head` before serving traffic; tables are not auto-created. Whenever you add a column to a model, write a new revision.
### 11.3 Background tasks
Started in `lifespan` and cancelled on shutdown:
- [updater.background_check_loop](mac/services/updater.py) β€” polls GitHub for new releases every `MAC_UPDATE_CHECK_INTERVAL_HOURS`.
- [discovery.start_discovery_server](mac/services/discovery.py) β€” UDP broadcast listener so worker PCs on the LAN can find the master without manual IP entry.
### 11.4 Caching, blacklisting, rate limits
All Redis-backed with **graceful in-process fallback**:
- JWT blacklist β†’ `mac:bl:{jti}` keys with TTL = remaining token life.
- Rate-limit counters β†’ derived from `usage_log` rows (no Redis needed for counts).
- Session/feature caches β†’ not implemented yet; designed to live under `mac:cache:*`.
### 11.5 Observability
Every chat call is logged to `usage_log`: user_id, model_id, tokens_in, tokens_out, latency_ms, status, request_id (`generate_request_id` in `utils/security`). The dashboard route reads these for per-user charts. Cluster heartbeats are append-only into `cluster_heartbeats` so node history charts are just `SELECT … ORDER BY ts`.
---
## 12. Frontend β€” SvelteKit PWA
### 12.1 Stack
SvelteKit 2 + Svelte 5 + Tailwind 3 + Vite 6. Built as a static site (`@sveltejs/adapter-static` with `fallback: 'index.html'`) and served by Nginx in production, by Vite dev server with `/api` proxy to the FastAPI port in development.
### 12.2 SPA mode
The root has `+layout.js` with `export const ssr = false; export const prerender = false;` so the entire app is rendered client-side. This is intentional β€” it sidesteps hydration issues, and there is no SEO need for an internal college tool.
### 12.3 State
[src/lib/stores.js](frontend/src/lib/stores.js) holds Svelte stores:
- `authStore` β€” `{user, token, refreshToken}`, with `init()` that re-hydrates from `localStorage` and re-fetches `/auth/me`, plus `login`/`logout`.
- `setupStore` β€” `is_first_run` flag.
- `featureStore` β€” feature flag map for conditional UI.
- `chatStore` β€” local conversation history (per-session, not yet server-persisted).
- `toast` β€” single-message notifier.
### 12.4 API client
[src/lib/api.js](frontend/src/lib/api.js) is the *only* place that talks HTTP. One `headers()` helper attaches the bearer token from `localStorage`. Each backend domain (`auth`, `query`, `models`, `cluster`, `rag`, `files`, …) is its own export with named methods. Adding a new endpoint = add a method here, never `fetch()` from a component directly.
### 12.5 Auth/setup gate
[+layout.svelte](frontend/src/routes/+layout.svelte) boots the app on first paint:
1. `initLocale()` β€” detect language from `localStorage` / browser.
2. `authStore.init()` β€” restore session.
3. `checkSetup()` β€” first-run check.
4. `loadFeatures()` β€” fetch flags.
5. Redirect: first-run β†’ `/setup`, no user on protected route β†’ `/login`, root β†’ `/chat` or `/login`.
6. Render either `Sidebar + slot` (logged in) or bare `slot` (login/setup).
### 12.6 Internationalisation
[src/lib/i18n.js](frontend/src/lib/i18n.js) ships **19 Indian languages** with lazy-loaded string maps and an `RTL_LOCALES` set (Urdu) that flips the layout direction. Adding a new locale = add to `SUPPORTED_LOCALES`, drop a translation map, no other file changes.
### 12.7 PWA + service worker
[static/manifest.json](frontend/static/manifest.json) declares the installable app + shortcuts. [static/sw.js](frontend/static/sw.js) is intentionally **caching-disabled** β€” every install/activate wipes all caches and there is no `fetch` handler. This was a deliberate decision: caching the SPA shell caused stale-build problems during rapid dev. Re-introduce caching only behind a versioned cache name with a clear invalidation strategy.
---
## 13. Deployment
### 13.1 Master node (single command)
```bash
cd frontend && npm install && npm run build && cd ..
cp .env.example .env # edit secrets
docker compose up postgres -d
docker compose run --rm mac alembic upgrade head
docker compose up -d
```
Compose brings up: `mac` (FastAPI), `postgres`, `redis`, `qdrant`, `searxng`, `vllm-speed`, `nginx`. (Whisper/TTS commented out by default.)
### 13.2 Adding a worker
On master:
```bash
curl -X POST http://MASTER:8000/api/v1/cluster/enroll-token \
-H "Authorization: Bearer ADMIN_JWT" -d '{"label":"Lab PC 1","expires_hours":24}'
```
On the worker PC:
```bash
MAC_MASTER_URL=http://MASTER:8000 \
MAC_ENROLL_TOKEN=<token> \
MAC_VLLM_PORT=8001 \
docker compose -f docker-compose.worker.yml up -d
```
Then approve in admin β†’ Cluster.
### 13.3 HTTPS
Drop certs into `nginx/ssl/`, swap the bind-mounted config to `nginx/nginx.https.conf` in `docker-compose.yml`, restart Nginx.
### 13.4 Windows installer
[installer/build_installer.ps1](installer/build_installer.ps1) builds a one-shot `dist/MAC-Installer.exe` (PyInstaller) that bootstraps Docker Desktop checks, clones/updates the repo, writes a sane `.env` with detected host IP, and starts the master stack. Branding assets are embedded base64 in [installer/embedded_assets.py](installer/embedded_assets.py) so the binary works even if image files are missing at runtime.
---
## 14. Security checklist (what every reviewer should verify)
1. **No external API calls.** `grep -r "openai.com\|api.anthropic\|googleapis" mac/` should be empty. All inference is local.
2. **JWT secret is not in env in production.** It's seeded in `system_config` on first boot and re-used across restarts.
3. **JWT carries `jti`** and the auth middleware checks blacklist on every request.
4. **Every router** requiring auth uses `Depends(get_current_user)` β€” search for any `@router.*` that doesn't and justify it.
5. **Role guards** on admin-only operations: `Depends(require_admin)` on token mints, user list, cluster mutations, feature toggles, system restart.
6. **Rate limits** on user-facing inference endpoints (`/query/*`, `/rag/query`).
7. **Scoped keys** never logged in full; only the prefix is shown after creation.
8. **Worker enrollment tokens** are single-use and time-limited (`expires_at` checked on register).
9. **Heartbeats authenticate by `node_token`**, not by JWT β€” rotated on every approve/reactivate.
10. **CORS:** `MAC_CORS_ORIGINS` defaults to `["*"]` for ease of dev; **set explicit origins in prod**.
11. **Uploads:** `uploads/` is outside the static mount; copy-check sheets and RAG docs are served via authenticated endpoints, never directly.
12. **WebSocket auth:** `notebook_ws` validates the JWT in the query string before `accept()`. Don't move the accept above the validation.
---
## 15. How to add a new feature (the recipe)
1. **Model:** add a SQLAlchemy class in `mac/models/<domain>.py`, import it in `mac/main.py::lifespan` so `Base.metadata` knows.
2. **Migration:** `alembic revision --autogenerate -m "add <thing>"` β†’ review β†’ commit.
3. **Schema:** Pydantic request/response in `mac/schemas/<domain>.py`.
4. **Service:** pure logic in `mac/services/<domain>_service.py`. Takes `db: AsyncSession` and primitive args. No FastAPI types.
5. **Router:** thin handler in `mac/routers/<domain>.py`. Order of `Depends`: `get_db` β†’ `get_current_user` β†’ `require_*` β†’ `feature_required("…")` β†’ `check_rate_limit` (only if user-driven inference). Mount in `mac/main.py`.
6. **Feature flag:** add a default to `feature_seeder.DEFAULT_FLAGS` so it can be toggled per role from admin.
7. **API client:** add a method to `frontend/src/lib/api.js` under the matching export.
8. **Store (if it has UI state):** add to `frontend/src/lib/stores.js`.
9. **Route:** new directory under `frontend/src/routes/<feature>/+page.svelte`.
10. **Sidebar entry:** edit `frontend/src/lib/components/Sidebar.svelte`.
11. **i18n:** add new strings to `BASE` in `frontend/src/lib/i18n.js`.
12. **Test:** at least one happy-path + one auth-failure pytest in `tests/`.
Follow this and the system stays consistent. Skip steps and you'll end up with a feature that's invisible to the admin, untranslated, untested, or worse β€” bypassing the auth chain.
---
*Last updated: 2026-04-27. If you change a subsystem and this file no longer matches reality, update it in the same PR.*