| # MAC β Architecture Reference |
|
|
| > **Audience:** an AI coding agent (or new engineer) dropped into this repo with no prior context. |
| > **Goal:** understand the system end-to-end β every subsystem, the data flow, where state lives, and how the pieces secure and observe each other. |
| > Read [README.md](README.md) for the elevator pitch and [MAC-PROGRESS.md](MAC-PROGRESS.md) for the build log. This file is the *map*. |
|
|
| --- |
|
|
| ## 0. Identity in one paragraph |
|
|
| MAC (MBM AI Cloud) is a **self-hosted, on-prem AI platform** for MBM University Jodhpur. It gives students/faculty a private ChatGPT-style chat, a notebook IDE, RAG over college docs, an attendance system using face capture, an exam copy-check workflow with AI vision + plagiarism detection, and an admin/cluster console β all powered by **open-source LLMs** running on the college's own GPUs. There are no external API calls; vLLM serves models locally, and worker GPUs are added by enrolling them into the cluster. |
|
|
| --- |
|
|
| ## 1. Top-level topology |
|
|
| ``` |
| ββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ |
| β CLIENTS β |
| β β’ Web (SvelteKit PWA, served by Nginx in prod) β |
| β β’ API consumers (curl / Python SDK / scripts) β |
| ββββββββββββββββββββββββββββ¬ββββββββββββββββββββββββββββββββββββββ |
| β HTTPS |
| βΌ |
| ββββββββββββββββββββββββ |
| β NGINX β β TLS, gzip, /api β mac, / β static |
| ββββββββββββ¬ββββββββββββ |
| β |
| ββββββββββββββββββββ΄βββββββββββββββββββ |
| βΌ βΌ |
| βββββββββββββββββββ ββββββββββββββββββββββββ |
| β SvelteKit β β FastAPI (mac.main) β |
| β static build β β /api/v1/* β |
| βββββββββββββββββββ ββββββββββββ¬ββββββββββββ |
| β |
| βββββββββββββββββββββββββ¬ββββββββββββββββββββΌββββββββββββββββββββββββββ |
| βΌ βΌ βΌ βΌ |
| ββββββββββββββ ββββββββββββββ ββββββββββββββββ ββββββββββββββββββ |
| β PostgreSQL β β Redis β β Qdrant β β SearXNG β |
| β (primary) β β cache / β β (RAG vec) β β (web search) β |
| β Alembic β β bl / rl β ββββββββββββββββ ββββββββββββββββββ |
| ββββββββββββββ ββββββββββββββ |
| |
| β² load_balancer.get_best_worker() |
| β |
| βββββββββββββββββββββββββ΄ββββββββββββββββββββββββββββββββββββββββββββββββ |
| β MAC CLUSTER (GPU workers, any LAN PC) β |
| β βββββββββββββββββββ βββββββββββββββββββ ββββββββββββββββββ β |
| β β vLLM (OpenAI β β Jupyter kernel β β worker_agent.pyβ β |
| β β compatible) β β gateway (opt.) β β (heartbeat) β β |
| β βββββββββββββββββββ βββββββββββββββββββ ββββββββββββββββββ β |
| ββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ |
| ``` |
|
|
| - **Master node** runs FastAPI + Postgres + Redis + Nginx + Qdrant + SearXNG. |
| - **Worker nodes** run vLLM + an optional Jupyter kernel gateway, plus [worker_agent.py](worker_agent.py) which self-registers via an enrollment token and sends a heartbeat every 10s (GPU util, VRAM, RAM, CPU). |
| - **Routing** is master-side: every user request hits the master API, which uses [mac/services/load_balancer.py](mac/services/load_balancer.py) to score-pick the best worker for an LLM call or notebook kernel. |
|
|
| --- |
|
|
| ## 2. Repository map (what lives where) |
|
|
| ``` |
| mac/ |
| main.py FastAPI app, lifespan (DB init, dev seeds, bg tasks), |
| router mounts under /api/v1, root SPA fallback. |
| config.py Pydantic Settings β every env var + .env loader. |
| database.py Async SQLAlchemy engine + session factory; `Base`. |
| utils/security.py JWT encode/decode + jti generation; password hash. |
| middleware/ |
| auth_middleware.py Bearer extractor β JWT | legacy-key | scoped-key β User. |
| rate_limit.py Per-user req/hour + token/day; injects X-RateLimit-*. |
| feature_gate.py feature_required("ai_chat") dependency. |
| models/ SQLAlchemy ORM models (one file per domain). |
| schemas/ Pydantic request/response schemas. |
| services/ Pure business logic, no HTTP β called by routers. |
| routers/ FastAPI routers, thin: validate β call service β return. |
| |
| frontend/ SvelteKit 2 + Svelte 5 PWA. |
| src/routes/ File-system routing: login, setup, chat, dashboard, |
| admin, cluster, keys, settings, notifications, rag. |
| src/lib/api.js Single fetch wrapper; one export per backend domain. |
| src/lib/stores.js Svelte stores (auth, setup, features, chat, toast). |
| src/lib/i18n.js 19 Indian languages, lazy-loaded strings, RTL support. |
| static/manifest.json PWA manifest; static/sw.js is a no-cache worker. |
| |
| alembic/ Migration env + versioned revisions. |
| nginx/ nginx.conf (HTTP) + nginx.https.conf (TLS). |
| docker-compose.yml Master stack. |
| docker-compose.worker.yml Worker stack (vLLM + worker_agent). |
| worker_agent.py Enrollment + heartbeat agent for a GPU node. |
| installer/ Windows installer (PyInstaller) + branding assets. |
| tests/ pytest suite. |
| ``` |
|
|
| --- |
|
|
| ## 3. Request lifecycle (the universal path) |
|
|
| Every authenticated `/api/v1/*` request goes through these layers in order. Knowing this map means you can audit any new endpoint quickly. |
|
|
| ``` |
| HTTP request |
| β |
| βΌ |
| [1] CORS middleware (mac/main.py β allow_origins from settings) |
| β |
| βΌ |
| [2] Route handler (FastAPI) (mac/routers/*.py) |
| β Depends(get_current_user) |
| βΌ |
| [3] Auth resolver (mac/middleware/auth_middleware.py) |
| β Bearer token β branch: |
| β β’ mac_sk_live_* β legacy API key (User.api_key) |
| β β’ mac_sk_* β scoped API key (hashed, scopes, expiry, revocable) |
| β β’ else β JWT (verify sig, check exp, check jti blacklist) |
| β β returns User or raises 401 |
| β |
| βΌ |
| [4] Role guard (optional) require_admin / require_faculty_or_admin |
| β |
| βΌ |
| [5] Feature gate (optional) feature_required("ai_chat") |
| β β reads system_config / feature_flags table β 403 if disabled for role |
| β |
| βΌ |
| [6] Rate limit (optional) check_rate_limit |
| β β’ requests/hour from usage_log (per-user) |
| β β’ tokens/day from usage_log (per-user) |
| β β’ injects X-RateLimit-* into request.state |
| β |
| βΌ |
| [7] Service layer mac/services/*.py |
| β Business logic β never imports FastAPI; takes db: AsyncSession. |
| β |
| βΌ |
| [8] Response β HTTP middleware inject_rate_limit_headers reads request.state |
| and stamps headers onto the response |
| ``` |
|
|
| This separation is the single most important design rule: |
| **routers do parsing + auth + I/O orchestration; services do business logic; models do persistence.** Anything calling FastAPI types from a service is a smell. |
|
|
| --- |
|
|
| ## 4. Identity & access β auth, sessions, keys |
|
|
| There are **three** ways a request authenticates, all collapsed to a `User` by `get_current_user`: |
|
|
| ### 4.1 JWT (interactive users) |
| - Login: `POST /api/v1/auth/login` with `{roll_number, password}` β `{access_token, refresh_token, user}`. |
| - Access token lifetime: `JWT_ACCESS_TOKEN_EXPIRE_MINUTES` (default 1440 = 24h). |
| - Every access token carries a `jti` (random UUID) baked into the JWT claims by [mac/utils/security.py](mac/utils/security.py). |
| - `POST /api/v1/auth/logout` blacklists the current `jti` in Redis with a TTL equal to remaining token life ([token_blacklist_service.py](mac/services/token_blacklist_service.py)). Refresh tokens are also revoked. Falls back to an in-process set if Redis is unreachable (dev only). |
| - The JWT signing secret is **not** read from env in production β it's stored in `system_config` and seeded on first boot by [setup_service.get_or_generate_jwt_secret](mac/services/setup_service.py). This means restarting the app does not invalidate everyone's sessions. |
|
|
| ### 4.2 Legacy API keys |
| - Format: `mac_sk_live_<48 hex chars>`. Stored on `users.api_key`. One per user. |
| - Use case: scripts that need a stable long-lived credential. |
| - Resolved before JWT in `auth_middleware` because of the prefix check. |
|
|
| ### 4.3 Scoped API keys |
| - Format: `mac_sk_<random>`, hashed at rest. Created via `/api/v1/scoped-keys`. |
| - Carry: scopes (list of allowed endpoints), optional expiry, label, revoke flag. |
| - Resolved by [scoped_key_service.get_key_by_hash](mac/services/scoped_key_service.py). |
| - Attached to `user._scoped_key` for downstream scope enforcement. |
|
|
| ### 4.4 Roles |
| - `admin` | `faculty` | `student`. Enforced at the router layer via `require_admin` / `require_faculty_or_admin` dependencies. |
| - Feature flags layer on top: a feature can be enabled globally but restricted to specific roles (see `feature_flags.roles`). |
|
|
| ### 4.5 First-run onboarding |
| - `GET /api/v1/setup/status` β `{is_first_run, has_jwt_secret, version}`. Frontend uses this to decide whether to show the setup wizard or login. |
| - `POST /api/v1/setup/create-admin` provisions the first admin and seals the system. |
|
|
| --- |
|
|
| ## 5. LLM serving & cluster routing |
|
|
| ### 5.1 Model registry β three layers of override |
| `mac/services/llm_service.py::_BUILTIN_MODELS` holds the defaults (Qwen2.5 7B, Qwen2.5-Coder 7B/AWQ, DeepSeek-R1, etc.). Each entry knows its `served_name` (HF repo), `category` (`speed | code | reasoning | intelligence`), `capabilities`, and `url_key` pointing at one of `vllm_speed_url | vllm_code_url | β¦` in `Settings`. |
|
|
| Override priority: |
| 1. `MAC_MODELS_JSON` env var (a full JSON array) β replaces the registry entirely. |
| 2. `MAC_ENABLED_MODELS` env var (comma-separated IDs) β filters which built-ins are exposed. |
| 3. `MAC_AUTO_FALLBACK` β what `model="auto"` resolves to. |
|
|
| ### 5.2 The system prompt is forced |
| `_inject_system_prompt` in `llm_service` prepends a hard-coded MAC identity prompt to **every** chat completion. This prevents the underlying Qwen/DeepSeek model from claiming to be "Qwen made by Alibaba" β it always says it is MAC, built by MBM University. If the user supplied a system message, MAC's identity is concatenated in front of theirs. |
|
|
| ### 5.3 Routing decision (where does this call go?) |
| ``` |
| chat request |
| β |
| βΌ |
| llm_service._resolve_model_cluster(model_id) |
| β |
| βΌ |
| load_balancer.get_best_worker(db, model_id) |
| β SELECT WorkerNode JOIN NodeModelDeployment |
| β WHERE node.status='active' AND deployment.status='ready' |
| β AND last_heartbeat within 30s |
| β ORDER BY gpu_util*0.5 + (vram_used/total)*0.3 |
| β |
| βββ candidate found β POST http://{node.ip}:{deployment.port}/v1/chat/completions |
| β |
| βββ none β fall back to local config (settings.vllm_<category>_url) |
| ``` |
|
|
| vLLM speaks the **OpenAI-compatible** API, so the proxy is a near-pass-through with SSE streaming preserved end-to-end. |
|
|
| ### 5.4 Cluster lifecycle |
| | Event | Endpoint | Auth | Effect | |
| |---|---|---|---| |
| | Admin mints token | `POST /cluster/enroll-token` | admin JWT | Single-use, expiring `EnrollmentToken` row | |
| | Worker registers | `POST /cluster/register` | enroll token | Creates `WorkerNode` (status `pending`) + reports IP, GPU specs | |
| | Admin approves | `POST /cluster/nodes/{id}/action {action:"approve"}` | admin | `status β active` | |
| | Worker heartbeats | `POST /cluster/heartbeat` | node token | Updates `last_heartbeat`, GPU util, VRAM, CPU, RAM, queue depth β also append-only into `cluster_heartbeats` (time-series for charts) | |
| | Worker reports models | (in heartbeat payload) | β | Upserts `NodeModelDeployment` rows | |
| | Drain / remove | `POST /cluster/nodes/{id}/action` | admin | Stops new traffic; allows in-flight to finish | |
|
|
| Workers older than 30s without a heartbeat are silently skipped by the balancer β no manual intervention needed if a worker dies. |
|
|
| --- |
|
|
| ## 6. Notebooks β multi-language code execution |
|
|
| This is the most operationally complex subsystem. The design supports **two backends** and **distributed execution**. |
|
|
| ### 6.1 Architecture |
| ``` |
| Client (browser) |
| β WebSocket /ws/notebook/{notebook_id}?token=JWT |
| βΌ |
| mac/routers/notebook_ws.py |
| β β’ verifies JWT (decode_access_token, no DB hit on hot path) |
| β β’ registers connection in _connections[notebook_id] |
| βΌ |
| kernel_manager (mac/services/kernel_manager.py) |
| β Backend selection at startup: |
| β _docker_available() β Docker mode |
| β else β subprocess mode (dev) |
| β |
| βββ DOCKER MODE (production) |
| β β’ spawns mac-kernel-{lang} container (image_prefix in config) |
| β β’ applies memory + CPU limits from settings |
| β β’ optionally attaches GPU (--gpus all) for ML kernels |
| β β’ streams stdout/stderr back as JSONL events |
| β |
| βββ SUBPROCESS MODE (dev) |
| β β’ runs the language interpreter directly on the host |
| β β’ no isolation; only safe for trusted local dev |
| β |
| βββ REMOTE WORKER MODE |
| β’ load_balancer.get_notebook_worker(db) picks a worker with notebook_port |
| β’ forwards the execute via the worker's Jupyter kernel gateway |
| β’ output streams back to the master, then to the client |
| ``` |
|
|
| ### 6.2 WebSocket protocol |
| Defined at the top of [notebook_ws.py](mac/routers/notebook_ws.py): |
|
|
| | Direction | Type | Payload | |
| |---|---|---| |
| | CβS | `execute` | `{cell_id, code, language}` | |
| | CβS | `interrupt` | `{kernel_id}` | |
| | CβS | `ping` | β | |
| | SβC | `status` | `{cell_id, execution_state: busy\|idle}` | |
| | SβC | `stream` | `{cell_id, name: stdout\|stderr, text}` | |
| | SβC | `error` | `{cell_id, ename, evalue, traceback[]}` | |
| | SβC | `pong` | β | |
|
|
| ### 6.3 State & limits |
| - `KernelInstance` per session: `id`, `language`, `node_id`, `container_id`, `status`, `last_activity`, `execution_count`. |
| - Idle kernels are reaped after `kernel_timeout` seconds (default 120). |
| - Max concurrent kernels per node: `kernel_max_per_node` (default 10). |
| - Persistent notebook content: `notebooks` table; cells stored as JSON, ordered. |
|
|
| ### 6.4 Why a custom protocol and not raw Jupyter? |
| Three reasons: (a) we need user-scoped auth via our JWT; (b) we need to fan-out execution across the cluster, not just one local kernel; (c) we want the option to swap kernels for sandboxed runners later without changing the wire format. |
|
|
| --- |
|
|
| ## 7. RAG β private document search |
|
|
| Pipeline: **upload β chunk β embed β store β retrieve β augment**. |
|
|
| ``` |
| PDF/MD/TXT upload (POST /rag/upload) |
| β |
| βΌ |
| rag_service.ingest_document |
| β β’ text extraction (pypdf for PDF, plain read otherwise) |
| β β’ chunk_text(words=512, overlap=50) β simple word-window |
| β β’ for each chunk: |
| β emb = await llm_service.embed(text) β uses EMBEDDING_URL or vLLM |
| β qdrant.upsert(point=(uuid, emb, payload)) |
| β β’ RAGDocument row in Postgres with chunk count & status |
| βΌ |
| QUERY TIME (chat with rag context) |
| β |
| βΌ |
| rag_service.query(question, top_k=5) |
| β β’ emb_q = embed(question) |
| β β’ qdrant.search(collection, emb_q, top_k) |
| β β’ returns chunks + source metadata |
| βΌ |
| llm_service.chat with messages = [ |
| {role:"system", content: MAC_PROMPT + "\n\nContext:\n" + chunks}, |
| *user_messages, |
| ] |
| ``` |
|
|
| Collections (`RAGCollection`) namespace documents β e.g. one per subject. Documents (`RAGDocument`) track ownership and indexing status so the UI can show "Indexing 42/120 chunksβ¦". |
|
|
| --- |
|
|
| ## 8. Attendance β face-based check-in |
|
|
| ### 8.1 Models |
| - `FaceTemplate` β one per user, holds a face encoding (64-byte hash in dev; pluggable to `face_recognition`/`dlib` for production). |
| - `AttendanceSession` β created by faculty: `{branch, section, subject, date, window_minutes}`. |
| - `AttendanceRecord` β one per (session, student): `present | absent | late`, captured selfie hash, confidence, timestamp. |
|
|
| ### 8.2 Flow |
| ``` |
| 1. Faculty: POST /attendance/sessions β creates session, returns join token + QR |
| 2. Student: GET /attendance/active β returns currently open sessions for them |
| 3. Student: POST /attendance/check-in β uploads base64 selfie |
| server: |
| β’ decodes image |
| β’ hashes (sha256) β dedupe replay |
| β’ computes encoding |
| β’ compares to stored FaceTemplate |
| β’ if (match && within window) β AttendanceRecord(present) |
| β’ else β 401 with reason |
| 4. Faculty: GET /attendance/sessions/{id}/report β CSV / PDF roster |
| ``` |
|
|
| ### 8.3 Anti-cheat heuristics |
| - Session has a strict `window_minutes` β late arrivals are recorded as `late`, not `present`. |
| - Same selfie hash twice in a session β rejected (replay block). |
| - One record per (session, student) β UPSERT prevents stuffing. |
| - Production: swap `_compute_face_encoding` for the real `face_recognition.face_encodings()` (the call sites already accept it; only the function body changes). |
|
|
| --- |
|
|
| ## 9. Copy Check β exam paper evaluation |
|
|
| A faculty workflow that grades scanned answer sheets using vision-capable LLMs and runs cross-paper plagiarism detection. Models in `mac/models/copy_check.py`: |
|
|
| | Model | Role | |
| |---|---| |
| | `CopyCheckSession` | One exam: subject, class, total_marks, syllabus_text | |
| | `CopyCheckSheet` | One student's submission: roll, scanned pages, AI score, feedback | |
| | `CopyCheckPlagiarism` | Pairwise similarity between two sheets in the same session | |
|
|
| ### 9.1 Flow |
| ``` |
| Faculty creates session β uploads syllabus / answer key |
| β |
| βΌ |
| For each student answer sheet (PDF or image bundle): |
| β’ file saved under uploads/copy_check/{session_id}/{roll}/ |
| β’ AI vision model reads each page (multimodal LLM) |
| β’ Service builds a structured prompt: syllabus + answer key + student answer |
| β’ LLM returns { per_question_marks, total, weakness_summary, suggestions } |
| β’ CopyCheckSheet upserted with score + JSON feedback |
| β |
| βΌ |
| Plagiarism pass: |
| β’ difflib.SequenceMatcher on extracted text per pair within session |
| β’ CopyCheckPlagiarism row written for (sheet_a, sheet_b, similarity, flagged_passages) |
| β |
| βΌ |
| Faculty reviews: |
| β’ per-student PDF report (fpdf2) |
| β’ plagiarism heatmap |
| β’ can override AI marks before "publish" |
| ``` |
|
|
| ### 9.2 Why the AI doesn't have final authority |
| The faculty UI explicitly requires a **"Reviewed & Approved"** flag before any score becomes visible to students. The AI is graded as a *recommendation* β the audit trail records both the AI suggestion and the faculty's override. This is the legal/academic-integrity boundary. |
|
|
| --- |
|
|
| ## 10. Other domain modules (one-paragraph each) |
|
|
| - **Doubts forum** ([doubts.py](mac/routers/doubts.py)): students post questions; faculty/peers answer; AI generates a draft answer that the asker can accept or replace. Threaded, taggable. |
| - **File sharing** ([file_share.py](mac/routers/file_share.py)): admin/faculty upload class materials; per-file access scoping; per-download analytics in `file_downloads`. |
| - **Notifications** ([notifications.py](mac/routers/notifications.py)): in-app + Web Push (`pywebpush`); endpoints registered via VAPID; one row per user-notification with read/unread state. |
| - **Academic** ([academic.py](mac/routers/academic.py)): branches & sections β used to scope attendance, file sharing, and admin lists. |
| - **Doubt copy-check submissions** ([model_submission_service.py](mac/services/model_submission_service.py)): community-trained adapter / LoRA submissions queued for admin review before being published as model registry entries. |
| - **Search** ([search.py](mac/routers/search.py) + SearXNG): private metasearch, no Google, no telemetry, returned to the chat as a tool result. |
| - **Hardware / Network / System** ([hardware.py](mac/routers/hardware.py), [network.py](mac/routers/network.py), [system.py](mac/routers/system.py)): admin diagnostics β local CPU/GPU/RAM, recommended models for the detected GPU, LAN discovery (`mac/services/discovery.py` UDP broadcast on port 7700), version & update status (`mac/services/updater.py` polls GitHub releases). |
| - **Quota** ([quota.py](mac/routers/quota.py)): per-user requests/hour and tokens/day; admin can override per user; default from `RATE_LIMIT_*` env. |
| - **Guardrails** ([guardrails.py](mac/routers/guardrails.py) + `guardrail_service`): admin-editable ruleset (banned terms, forbidden topics) applied as a pre-check on chat input and a post-check on model output. |
|
|
| --- |
|
|
| ## 11. Cross-cutting concerns |
|
|
| ### 11.1 Configuration |
| **One source of truth:** [mac/config.py](mac/config.py) `Settings(BaseSettings)`. Every value reads from env or `.env`. `_fix_database_url` auto-promotes `postgres://` and `postgresql://` to `postgresql+asyncpg://` and strips `sslmode=` (it's handled in `connect_args` separately for Neon/Supabase). Adding a new tunable means: add a field to `Settings`, document it in `.env.example`, use `settings.your_field` everywhere β never read `os.environ` directly. |
|
|
| ### 11.2 Migrations |
| Alembic-managed. Two revisions today: |
| - `20260426_0001_initial_schema.py` β full original schema. |
| - `20260427_0002_session1_tables.py` β feature flags, system_config, branches, sections, cluster_heartbeats, shared_files, file_downloads, video_projects, video_jobs. |
|
|
| In dev (`MAC_ENV=development`), `init_db()` in `lifespan` creates tables idempotently from `Base.metadata`. In prod, you **must** run `alembic upgrade head` before serving traffic; tables are not auto-created. Whenever you add a column to a model, write a new revision. |
|
|
| ### 11.3 Background tasks |
| Started in `lifespan` and cancelled on shutdown: |
| - [updater.background_check_loop](mac/services/updater.py) β polls GitHub for new releases every `MAC_UPDATE_CHECK_INTERVAL_HOURS`. |
| - [discovery.start_discovery_server](mac/services/discovery.py) β UDP broadcast listener so worker PCs on the LAN can find the master without manual IP entry. |
|
|
| ### 11.4 Caching, blacklisting, rate limits |
| All Redis-backed with **graceful in-process fallback**: |
| - JWT blacklist β `mac:bl:{jti}` keys with TTL = remaining token life. |
| - Rate-limit counters β derived from `usage_log` rows (no Redis needed for counts). |
| - Session/feature caches β not implemented yet; designed to live under `mac:cache:*`. |
|
|
| ### 11.5 Observability |
| Every chat call is logged to `usage_log`: user_id, model_id, tokens_in, tokens_out, latency_ms, status, request_id (`generate_request_id` in `utils/security`). The dashboard route reads these for per-user charts. Cluster heartbeats are append-only into `cluster_heartbeats` so node history charts are just `SELECT β¦ ORDER BY ts`. |
|
|
| --- |
|
|
| ## 12. Frontend β SvelteKit PWA |
|
|
| ### 12.1 Stack |
| SvelteKit 2 + Svelte 5 + Tailwind 3 + Vite 6. Built as a static site (`@sveltejs/adapter-static` with `fallback: 'index.html'`) and served by Nginx in production, by Vite dev server with `/api` proxy to the FastAPI port in development. |
|
|
| ### 12.2 SPA mode |
| The root has `+layout.js` with `export const ssr = false; export const prerender = false;` so the entire app is rendered client-side. This is intentional β it sidesteps hydration issues, and there is no SEO need for an internal college tool. |
|
|
| ### 12.3 State |
| [src/lib/stores.js](frontend/src/lib/stores.js) holds Svelte stores: |
| - `authStore` β `{user, token, refreshToken}`, with `init()` that re-hydrates from `localStorage` and re-fetches `/auth/me`, plus `login`/`logout`. |
| - `setupStore` β `is_first_run` flag. |
| - `featureStore` β feature flag map for conditional UI. |
| - `chatStore` β local conversation history (per-session, not yet server-persisted). |
| - `toast` β single-message notifier. |
|
|
| ### 12.4 API client |
| [src/lib/api.js](frontend/src/lib/api.js) is the *only* place that talks HTTP. One `headers()` helper attaches the bearer token from `localStorage`. Each backend domain (`auth`, `query`, `models`, `cluster`, `rag`, `files`, β¦) is its own export with named methods. Adding a new endpoint = add a method here, never `fetch()` from a component directly. |
|
|
| ### 12.5 Auth/setup gate |
| [+layout.svelte](frontend/src/routes/+layout.svelte) boots the app on first paint: |
| 1. `initLocale()` β detect language from `localStorage` / browser. |
| 2. `authStore.init()` β restore session. |
| 3. `checkSetup()` β first-run check. |
| 4. `loadFeatures()` β fetch flags. |
| 5. Redirect: first-run β `/setup`, no user on protected route β `/login`, root β `/chat` or `/login`. |
| 6. Render either `Sidebar + slot` (logged in) or bare `slot` (login/setup). |
|
|
| ### 12.6 Internationalisation |
| [src/lib/i18n.js](frontend/src/lib/i18n.js) ships **19 Indian languages** with lazy-loaded string maps and an `RTL_LOCALES` set (Urdu) that flips the layout direction. Adding a new locale = add to `SUPPORTED_LOCALES`, drop a translation map, no other file changes. |
|
|
| ### 12.7 PWA + service worker |
| [static/manifest.json](frontend/static/manifest.json) declares the installable app + shortcuts. [static/sw.js](frontend/static/sw.js) is intentionally **caching-disabled** β every install/activate wipes all caches and there is no `fetch` handler. This was a deliberate decision: caching the SPA shell caused stale-build problems during rapid dev. Re-introduce caching only behind a versioned cache name with a clear invalidation strategy. |
|
|
| --- |
|
|
| ## 13. Deployment |
|
|
| ### 13.1 Master node (single command) |
| ```bash |
| cd frontend && npm install && npm run build && cd .. |
| cp .env.example .env # edit secrets |
| docker compose up postgres -d |
| docker compose run --rm mac alembic upgrade head |
| docker compose up -d |
| ``` |
| Compose brings up: `mac` (FastAPI), `postgres`, `redis`, `qdrant`, `searxng`, `vllm-speed`, `nginx`. (Whisper/TTS commented out by default.) |
|
|
| ### 13.2 Adding a worker |
| On master: |
| ```bash |
| curl -X POST http://MASTER:8000/api/v1/cluster/enroll-token \ |
| -H "Authorization: Bearer ADMIN_JWT" -d '{"label":"Lab PC 1","expires_hours":24}' |
| ``` |
| On the worker PC: |
| ```bash |
| MAC_MASTER_URL=http://MASTER:8000 \ |
| MAC_ENROLL_TOKEN=<token> \ |
| MAC_VLLM_PORT=8001 \ |
| docker compose -f docker-compose.worker.yml up -d |
| ``` |
| Then approve in admin β Cluster. |
|
|
| ### 13.3 HTTPS |
| Drop certs into `nginx/ssl/`, swap the bind-mounted config to `nginx/nginx.https.conf` in `docker-compose.yml`, restart Nginx. |
|
|
| ### 13.4 Windows installer |
| [installer/build_installer.ps1](installer/build_installer.ps1) builds a one-shot `dist/MAC-Installer.exe` (PyInstaller) that bootstraps Docker Desktop checks, clones/updates the repo, writes a sane `.env` with detected host IP, and starts the master stack. Branding assets are embedded base64 in [installer/embedded_assets.py](installer/embedded_assets.py) so the binary works even if image files are missing at runtime. |
|
|
| --- |
|
|
| ## 14. Security checklist (what every reviewer should verify) |
|
|
| 1. **No external API calls.** `grep -r "openai.com\|api.anthropic\|googleapis" mac/` should be empty. All inference is local. |
| 2. **JWT secret is not in env in production.** It's seeded in `system_config` on first boot and re-used across restarts. |
| 3. **JWT carries `jti`** and the auth middleware checks blacklist on every request. |
| 4. **Every router** requiring auth uses `Depends(get_current_user)` β search for any `@router.*` that doesn't and justify it. |
| 5. **Role guards** on admin-only operations: `Depends(require_admin)` on token mints, user list, cluster mutations, feature toggles, system restart. |
| 6. **Rate limits** on user-facing inference endpoints (`/query/*`, `/rag/query`). |
| 7. **Scoped keys** never logged in full; only the prefix is shown after creation. |
| 8. **Worker enrollment tokens** are single-use and time-limited (`expires_at` checked on register). |
| 9. **Heartbeats authenticate by `node_token`**, not by JWT β rotated on every approve/reactivate. |
| 10. **CORS:** `MAC_CORS_ORIGINS` defaults to `["*"]` for ease of dev; **set explicit origins in prod**. |
| 11. **Uploads:** `uploads/` is outside the static mount; copy-check sheets and RAG docs are served via authenticated endpoints, never directly. |
| 12. **WebSocket auth:** `notebook_ws` validates the JWT in the query string before `accept()`. Don't move the accept above the validation. |
| |
| --- |
| |
| ## 15. How to add a new feature (the recipe) |
| |
| 1. **Model:** add a SQLAlchemy class in `mac/models/<domain>.py`, import it in `mac/main.py::lifespan` so `Base.metadata` knows. |
| 2. **Migration:** `alembic revision --autogenerate -m "add <thing>"` β review β commit. |
| 3. **Schema:** Pydantic request/response in `mac/schemas/<domain>.py`. |
| 4. **Service:** pure logic in `mac/services/<domain>_service.py`. Takes `db: AsyncSession` and primitive args. No FastAPI types. |
| 5. **Router:** thin handler in `mac/routers/<domain>.py`. Order of `Depends`: `get_db` β `get_current_user` β `require_*` β `feature_required("β¦")` β `check_rate_limit` (only if user-driven inference). Mount in `mac/main.py`. |
| 6. **Feature flag:** add a default to `feature_seeder.DEFAULT_FLAGS` so it can be toggled per role from admin. |
| 7. **API client:** add a method to `frontend/src/lib/api.js` under the matching export. |
| 8. **Store (if it has UI state):** add to `frontend/src/lib/stores.js`. |
| 9. **Route:** new directory under `frontend/src/routes/<feature>/+page.svelte`. |
| 10. **Sidebar entry:** edit `frontend/src/lib/components/Sidebar.svelte`. |
| 11. **i18n:** add new strings to `BASE` in `frontend/src/lib/i18n.js`. |
| 12. **Test:** at least one happy-path + one auth-failure pytest in `tests/`. |
|
|
| Follow this and the system stays consistent. Skip steps and you'll end up with a feature that's invisible to the admin, untranslated, untested, or worse β bypassing the auth chain. |
|
|
| --- |
|
|
| *Last updated: 2026-04-27. If you change a subsystem and this file no longer matches reality, update it in the same PR.* |
|
|