# Collab Editor - Project Specification

## 1. Overview

A collaborative, real-time scientific article editor deployed as a Hugging Face Space. Users write rich content (math, citations, custom components) in a TipTap-based editor synced via Yjs/Hocuspocus, then publish a self-contained static HTML article. An AI assistant helps with writing and editing.

**Stack**: React 18 + TipTap 3 + Yjs (frontend) / Express + Hocuspocus + Node 20 (backend) / Docker on HF Spaces. No CSS-in-JS - all styling via vanilla CSS custom properties.

**Relationship to `research-article-template`**: the CSS foundation, design tokens, and visual language come from the [research-article-template](https://huggingface.co/spaces/tfrere/research-article-template) project. The editor imports the template's CSS files (`_variables.css`, `_reset.css`, `_base.css`, `_layout.css`, component partials) and the publisher injects them inline into published HTML. The published output is designed to look identical to articles built with the Astro-based template.

---

## 2. Architecture

```mermaid
graph TB
  subgraph browser [Browser]
    SPA["React SPA<br/>TipTap + Yjs"]
  end

  subgraph server [Node Backend - port 8080]
    Express["Express HTTP"]
    Hocuspocus["Hocuspocus<br/>WebSocket"]
    Publisher["Publisher Pipeline<br/>HTML + PDF"]
    Agent["AI Agent<br/>HF Inference"]
    Auth["HF OAuth"]
  end

  subgraph storage [Persistence]
    LocalFS["Local FS<br/>data/*.yjs"]
    HFDataset["HF Dataset<br/>articles/ published/"]
  end

  SPA -->|"WebSocket /collab"| Hocuspocus
  SPA -->|"REST /api/*"| Express
  Hocuspocus -->|"Database ext"| LocalFS
  LocalFS -->|"schedulePush"| HFDataset
  HFDataset -->|"pullDocument"| LocalFS
  Express --> Publisher
  Express --> Agent
  Express --> Auth
  Publisher --> LocalFS
  Publisher -->|"uploadPublishedAssets"| HFDataset
```

**Single process in production**: the backend serves the Vite-built frontend, all REST APIs, the WebSocket collab channel, and static published articles. No reverse proxy needed.

---

## 3. Data Model (Yjs Shared Types)

The entire collaborative state lives in a single `Y.Doc`:

- **`Y.XmlFragment("default")`** - TipTap document content (ProseMirror nodes synced via Collaboration extension)
- **`Y.Map("frontmatter")`** - scalar metadata: `title`, `subtitle`, `description`, `published`, `doi`, `template`, `licence`
- **`Y.Array("frontmatter.authors")`** - `{ name, url?, affiliations: number[] }[]`
- **`Y.Array("frontmatter.affiliations")`** - `{ name, url? }[]`
- **`Y.Map("citations")`** - CSL-JSON entries keyed by citation ID
- **`Y.Map("settings")`** - `citationStyle`, `primaryHue`, and future editor preferences
- **`Y.Map("comments")`** - comment threads keyed by `commentId`, each with author/text/resolved

All types are concurrently editable by multiple users and persist to `data/default.yjs`.
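
As an illustration, the shapes above can be written down as plain TypeScript types. This is a sketch rather than the project's actual type definitions: the type names and the `affiliationNamesFor` helper are hypothetical, and treating `affiliations` as 1-based indices into the affiliations array is an assumption.

```typescript
// Hypothetical plain-object view of the shared types listed above.
interface Affiliation {
  name: string;
  url?: string;
}

interface Author {
  name: string;
  url?: string;
  affiliations: number[]; // ASSUMPTION: 1-based indices into the affiliations array
}

interface CommentThread {
  author: string;
  text: string;
  resolved: boolean;
}

interface ArticleSnapshot {
  frontmatter: Record<string, string>; // title, subtitle, description, ...
  authors: Author[];
  affiliations: Affiliation[];
  citations: Record<string, unknown>; // CSL-JSON entries keyed by citation ID
  settings: { citationStyle?: string; primaryHue?: number };
  comments: Record<string, CommentThread>; // keyed by commentId
}

// Resolve an author's affiliation indices to names, skipping dangling indices.
function affiliationNamesFor(author: Author, all: Affiliation[]): string[] {
  return author.affiliations
    .map((index) => all[index - 1])
    .filter((aff): aff is Affiliation => aff !== undefined)
    .map((aff) => aff.name);
}
```

Keeping affiliations in their own array and referencing them by index lets two authors share an affiliation without duplicating it, at the cost of having to tolerate dangling indices after a concurrent delete.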

---

## 4. Backend Components

### 4.1 HTTP Routes

| Method | Path | Auth | Purpose |
|--------|------|------|---------|
| `GET` | `/oauth/authorize` | Public | Redirect to HF OAuth |
| `GET` | `/auth/callback` | Public (CSRF state) | Exchange code, set cookie, redirect to `/editor` |
| `GET` | `/api/auth/status` | Cookie | Return `{ authenticated, canEdit, user }` |
| `POST` | `/api/chat` | OAuth (optional) | Stream AI agent responses (HF Inference Providers) |
| `POST` | `/api/publish` | OAuth (canEdit) | Run publish pipeline, generate HTML/PDF |
| `POST` | `/api/admin/reset-document` | OAuth (canEdit) | Delete local `.yjs`, close connections |
| `POST` | `/api/upload` | None (uses cookie for HF) | Upload image (multipart, max 10MB) |
| `POST` | `/api/citations/resolve` | None | Resolve DOI/URL to CSL-JSON |
| `POST` | `/api/citations/format` | None | Format entries to HTML bibliography |
| `POST` | `/api/citations/import-bib` | None | Parse BibTeX to CSL-JSON |
| `GET` | `/editor` | OAuth (canEdit) | Serve SPA (or login page) |
| `GET` | `*` | Public | Serve published article (or login page) |

### 4.2 WebSocket Collaboration

- Upgrade on `/collab` only; all other paths rejected
- Single document: `DEFAULT_DOC_NAME = "default"`
- Hocuspocus `onAuthenticate`: validates OAuth token if enabled, checks `canEdit`
- `Database` extension: `fetch` reads local `.yjs` or pulls from HF; `store` writes local + schedules HF push (10s debounce)
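
The path gate on upgrade requests can be sketched as a small pure function. The helper name `isCollabUpgradePath` is hypothetical, and matching only the exact pathname `/collab` (with any query string) is an assumption about how strict the real check is.

```typescript
// Decide whether an HTTP upgrade request should be handed to the collab server.
// ASSUMPTION: only the exact pathname "/collab" qualifies; query strings are ignored.
function isCollabUpgradePath(rawUrl: string | undefined): boolean {
  if (!rawUrl) return false;
  try {
    // Request URLs are relative ("/collab?x=1"), so a dummy base is needed to parse them.
    const { pathname } = new URL(rawUrl, "http://localhost");
    return pathname === "/collab";
  } catch {
    // Malformed URLs are rejected rather than crashing the upgrade handler.
    return false;
  }
}
```

In an HTTP server's `upgrade` handler, a `false` result would typically lead to destroying the socket before any WebSocket handshake takes place.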

### 4.3 HF Storage

- Dataset ID: `HF_DATASET_ID` or `{SPACE_ID}-data`
- Dataset is created **private by default** (`createRepo({ private: true })`). The OAuth grant needs `manage-repos` on the user's first write; subsequent containers reuse the cached token.
- Token: `HF_TOKEN` (env) or cached OAuth token from last authenticated user
- **Documents**: `articles/<name>.yjs` - debounced push on every Hocuspocus store
- **Published assets**: `published/<name>/{index.html, article.pdf, thumb.jpg, meta.json, llms.txt}`
- **Images**: `images/<uuid-filename>` referenced from articles via `/d/images/...` proxy URLs
- `flushAll()` on `SIGTERM`/`SIGINT` to push pending changes
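
The debounced push and the shutdown flush can be sketched together as follows. This is a simplified, synchronous model: the class name `DebouncedPusher` and its callback signature are hypothetical, and the real push is an asynchronous upload whose errors must be handled.

```typescript
type TimerHandle = ReturnType<typeof setTimeout>;

// Minimal per-document debouncer (hypothetical sketch, not the project's code).
class DebouncedPusher {
  private readonly timers = new Map<string, TimerHandle>();
  private readonly push: (docName: string) => void;
  private readonly delayMs: number;

  constructor(push: (docName: string) => void, delayMs = 10_000) {
    this.push = push;
    this.delayMs = delayMs;
  }

  // Called on every store: restart the countdown so a burst of edits yields one push.
  schedule(docName: string): void {
    const existing = this.timers.get(docName);
    if (existing !== undefined) clearTimeout(existing);
    const handle = setTimeout(() => {
      this.timers.delete(docName);
      this.push(docName);
    }, this.delayMs);
    this.timers.set(docName, handle);
  }

  // Called on shutdown: cancel the timers and push everything still pending.
  flushAll(): void {
    this.timers.forEach((handle, docName) => {
      clearTimeout(handle);
      this.push(docName);
    });
    this.timers.clear();
  }
}
```

A `SIGTERM`/`SIGINT` handler would call `flushAll()` before exiting, so edits made during the last debounce window are not lost when the container stops.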

#### 4.3.1 Storage Status & Recovery

The persistence pipeline used to fail silently in several places (`createRepo` returning 403 on a missing scope, `uploadFile` returning 5xx mid-debounce, `writeFileSync` on a read-only FS, ...) while the editor kept showing "Saved". To make storage health visible and data loss recoverable:

- **In-memory tracker** in `hf-storage.ts` records `datasetReady`, `lastLocalSaveAt`, `lastCloudPushAt`, `pendingPush`, `lastError {stage, message, statusCode, at, docName}`. Every write path updates it; every error path records the failure.
- **`GET /api/storage/status`** exposes the tracker (canEdit-gated). The frontend `SyncIndicator` polls it every 5s and displays a three-state badge: green "Saved" / amber "Saving..." / **red "Storage error"** (pulsing, with the exact reason in the tooltip + actionable hint for the 403 / missing-scope case).
- **Eager `ensureDatasetExists`** on first `/api/auth/status` for a canEdit user. A misconfigured fork now surfaces its error within ~10s of login instead of waiting for a first edit plus a full push debounce cycle.
- **`beforeunload` guard** on the editor: if a local edit is in flight, a push is armed, the WS is offline, or the tracker reports an error, the browser pops the standard "Leave site?" confirm.
- **`GET /api/admin/export-doc`** (canEdit-gated) streams the on-disk `.yjs` snapshot as a download. The escape hatch for disaster recovery: when the cloud push has been failing and the container is about to rebuild, an admin can grab the doc bytes manually.
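
One way the three-state badge could be derived from the tracker is sketched below. The field names come from the list above; the field types (epoch-millisecond timestamps, optional `statusCode`), the type names, and the exact precedence of the states are assumptions.

```typescript
interface TrackerError {
  stage: string;
  message: string;
  statusCode?: number;
  at: number; // ASSUMPTION: epoch milliseconds
  docName?: string;
}

interface TrackerStatus {
  datasetReady: boolean;
  lastLocalSaveAt: number | null;
  lastCloudPushAt: number | null;
  pendingPush: boolean;
  lastError: TrackerError | null;
}

interface SyncBadge {
  state: "saved" | "saving" | "error";
  tooltip: string;
}

// Errors win over pending pushes, which win over the idle "Saved" state.
function badgeFor(tracker: TrackerStatus): SyncBadge {
  if (tracker.lastError) {
    const hint =
      tracker.lastError.statusCode === 403
        ? " (the OAuth grant may be missing the manage-repos scope)"
        : "";
    return {
      state: "error",
      tooltip: `${tracker.lastError.stage}: ${tracker.lastError.message}${hint}`,
    };
  }
  if (tracker.pendingPush) return { state: "saving", tooltip: "Saving..." };
  return { state: "saved", tooltip: "Saved" };
}
```

Checking the error first means a failed push never hides behind an optimistic "Saving..." label while the next debounce cycle is armed.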

#### 4.3.2 Dataset Reverse Proxy (`/d/*`)

Since the dataset is private, anonymous viewers of a published article can't fetch its images / PDF / og:image directly from `huggingface.co/datasets/...`. The editor server exposes `GET /d/:path*` as an authenticated forward-proxy:

- **Whitelist**: only `images/` and `published/` are reachable; `articles/` (raw `.yjs` drafts) is **always 404** regardless of caller.
- **Token cascade**: request cookie → cached user token → `HF_TOKEN` env → anonymous fetch. The cookie token is also promoted into the cache opportunistically, so the first signed-in viewer warms the proxy for subsequent anonymous viewers within the same container lifetime.
- **Streaming**: WHATWG body piped straight to the Express response - no buffering of full PDFs in Node memory.
- **Caching**: `images/*` is served as `immutable, max-age=1y` (UUID names, never overwritten); `published/*` as `max-age=300, stale-while-revalidate=60` (re-published in place).
- **Error mapping**: upstream 401/403 collapse to 502 so the browser never gets prompted for credentials it can't supply; upstream 404 passes through.
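
The proxy rules above reduce to a few pure helpers, sketched here under stated assumptions. All function names are hypothetical, the `public` directive and the one-year value `31536000` are one plausible spelling of the cache policy, and the `..` traversal guard is an extra precaution not stated in the rules.

```typescript
const PROXY_ALLOWED_PREFIXES = ["images/", "published/"];

// Whitelist check on the dataset-relative path (the part after "/d/").
function isProxyPathAllowed(path: string): boolean {
  // ASSUMPTION: traversal segments are rejected outright as an extra guard.
  if (path.split("/").some((segment) => segment === "..")) return false;
  return PROXY_ALLOWED_PREFIXES.some((prefix) => path.startsWith(prefix));
}

// Images have UUID names and never change; published assets are rewritten in place.
function proxyCacheControl(path: string): string {
  return path.startsWith("images/")
    ? "public, max-age=31536000, immutable"
    : "public, max-age=300, stale-while-revalidate=60";
}

// Upstream auth failures become 502 so browsers never show a credential prompt.
function mapUpstreamStatus(upstream: number): number {
  return upstream === 401 || upstream === 403 ? 502 : upstream;
}

// Token cascade: cookie, then cached user token, then env; undefined means anonymous.
function pickProxyToken(sources: { cookie?: string; cached?: string; env?: string }): string | undefined {
  return sources.cookie || sources.cached || sources.env || undefined;
}
```

A route handler would return 404 when `isProxyPathAllowed` is false, then fetch upstream with the token from `pickProxyToken` and stream the body through with the computed headers.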

### 4.4 Publisher Pipeline

```mermaid
flowchart LR
  YDoc["Y.Doc (.yjs)"] --> Extract["extractFromYDoc<br/>frontmatter + JSON"]
  Extract --> GenHTML["generateHTML<br/>@tiptap/html"]
  GenHTML --> PostProc["postProcess<br/>accordion, biblio,<br/>mermaid, htmlEmbed"]
  PostProc --> Render["renderArticleHTML<br/>full HTML page"]
  CSS["loadCSS<br/>template styles"] --> Render
  Render --> LocalWrite["Write local<br/>index.html"]
  Render --> PDF["Playwright<br/>PDF + thumbnail"]
  LocalWrite --> HFUpload["uploadPublishedAssets<br/>HF dataset"]
```

- **CSS loading**: reads template CSS files, resolves `@custom-media` queries via `resolveCustomMedia()`, splits into variables/reset/base/layout/components/article/print
- **Post-processing**: accordion divs to `<details>`, bibliography injection, mermaid to `<pre>`, htmlEmbed to `<iframe>`
- **HTML output**: self-contained page with inline CSS, CDN assets (KaTeX, highlight.js, Mermaid), theme toggle (SVG sun/moon), TOC generation (scroll-based, collapsible), lightbox, footer with citation/BibTeX/DOI
- **PDF**: optional Playwright Chromium headless (1200x630 thumbnail + full PDF)
- **Server extensions**: mirror of frontend TipTap extensions for server-side HTML generation
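
To show what expanding `@custom-media` involves, here is a regex-based sketch. It is not the project's `resolveCustomMedia()` implementation: the name `resolveCustomMediaSketch` is hypothetical, and it only handles simple `@custom-media --name (query);` definitions referenced as `(--name)` inside `@media` preludes.

```typescript
// Expand @custom-media definitions into plain @media rules (simplified sketch).
function resolveCustomMediaSketch(css: string): string {
  const definitions = new Map<string, string>();

  // Pass 1: collect "@custom-media --name <query>;" and drop the declarations.
  const withoutDefinitions = css.replace(
    /@custom-media\s+(--[\w-]+)\s+([^;]+);/g,
    (_match: string, name: string, query: string) => {
      definitions.set(name, query.trim());
      return "";
    },
  );

  // Pass 2: inside each @media prelude, swap "(--name)" for the recorded query.
  return withoutDefinitions.replace(/@media([^{]+)\{/g, (_match: string, prelude: string) => {
    const expanded = prelude.replace(
      /\((--[\w-]+)\)/g,
      (whole: string, name: string) => definitions.get(name) ?? whole,
    );
    return `@media${expanded}{`;
  });
}
```

Unknown names are left untouched, so a typo in a breakpoint name shows up as an invalid media query in the output instead of being silently dropped.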

### 4.5 Auth

- Enabled when `SPACE_ID` + `OAUTH_CLIENT_ID` are set
- OAuth 2.0 flow with HF as provider; cookie `hf_access_token` (httpOnly, secure, sameSite: none)
- `resolveUser`: `whoAmI` via `@huggingface/hub`, then `checkWriteAccess` (Space owner or org member with write/admin role)
- In-memory state map with 10-min TTL for CSRF protection
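
The in-memory CSRF state map can be sketched as a small class with single-use entries and a 10-minute TTL. The class name `OAuthStateStore` and the injectable clock are hypothetical, and a periodic sweep of never-consumed entries is omitted for brevity.

```typescript
import { randomUUID } from "node:crypto";

// Hypothetical sketch of a TTL-bound, single-use OAuth state store.
class OAuthStateStore {
  private readonly expiries = new Map<string, number>();
  private readonly ttlMs: number;
  private readonly now: () => number;

  constructor(ttlMs = 10 * 60 * 1000, now: () => number = () => Date.now()) {
    this.ttlMs = ttlMs;
    this.now = now;
  }

  // Create a fresh state value to send along with the authorize redirect.
  issue(): string {
    const state = randomUUID();
    this.expiries.set(state, this.now() + this.ttlMs);
    return state;
  }

  // Validate the state echoed back on the callback; each value works only once.
  consume(state: string): boolean {
    const expiresAt = this.expiries.get(state);
    this.expiries.delete(state);
    return expiresAt !== undefined && this.now() < expiresAt;
  }
}
```

Deleting the entry on every `consume` call, valid or not, means a replayed callback URL fails even within the TTL window.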

### 4.6 AI Agent

- Provider: Hugging Face Inference Providers (`https://router.huggingface.co/v1`), default model `openai/gpt-oss-120b`. Model ids may be suffixed with `:<provider>` (e.g. `meta-llama/Llama-3.3-70B-Instruct:together`) to bypass providers that enforce overly strict tool-call validation (notably Groq) or that don't support the `tools` parameter (Nscale, etc.).
- Auth: per-request bearer token resolved from the editor's OAuth cookie when available, falling back to the server-side `HF_TOKEN`. On a HF Space with `inference-api` scope, no extra secret is needed - the logged-in user pays for their own inference under their HF quota.
- Streaming via Vercel AI SDK `streamText` over `@ai-sdk/openai-compatible`
- Reasoning parts from prior assistant turns are stripped before re-sending the history: providers like Cerebras reject `reasoning_content` on round-trip, and the model doesn't need to see its own past reasoning to continue the conversation.
- **Context**: document text, current selection, frontmatter (sent by frontend with each message)
- **Tools** (declarative, executed client-side by the frontend):
  - `replaceSelection` - replace selected text
  - `insertAtCursor` - insert at cursor position
  - `applyDiff` - search/replace in document
  - `updateFrontmatter` - modify metadata fields
  - `addAuthor` / `removeAuthor` - manage author list
- Agent edits are grouped in a single Yjs `UndoManager` batch for Cmd+Z
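
Stripping reasoning from prior turns can be sketched as a pure function over the chat history. The `ChatMessage` and `ChatPart` shapes below are a hypothetical simplification, not the AI SDK's actual message types.

```typescript
// Hypothetical, simplified message shape (not the AI SDK's real types).
type ChatPart = { type: "text"; text: string } | { type: "reasoning"; text: string };

interface ChatMessage {
  role: "user" | "assistant";
  parts: ChatPart[];
}

// Drop reasoning parts from assistant turns before re-sending the history.
// Returns new objects for assistant turns and leaves the input untouched.
function stripReasoning(turns: ChatMessage[]): ChatMessage[] {
  return turns.map((message) =>
    message.role === "assistant"
      ? { ...message, parts: message.parts.filter((part) => part.type !== "reasoning") }
      : message,
  );
}
```

Because the function does not mutate its input, the UI can keep showing the original reasoning to the user while the provider only receives the trimmed history.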

### 4.7 Citations

- Uses `@citation-js/core` with bibtex, doi, csl plugins
- Resolve: DOI URL or identifier to CSL-JSON entries
- Format: entries + style + locale to HTML bibliography
- Import: BibTeX string to CSL-JSON

---

## 5. Frontend Components

### 5.1 App Shell

- **No router** - single view with conditional rendering
- **Theme**: CSS custom properties with dynamic primary color from `settings.primaryHue` (OKLCH color model, synced via Yjs settings)
- **Layout**: top bar (undo/redo, settings, publish, user chip) + 3-column CSS grid (TOC / editor / comments)
- **Chat**: floating button bottom-left, `ChatPanel` overlay
- **Modals**: comment dialog, settings drawer, publish confirmation

### 5.2 Editor

- Creates `Y.Doc` + `HocuspocusProvider` (WebSocket to `/collab`)
- **Seeding**: after provider `synced` event only, if `Y.XmlFragment("default")` is empty, inserts `DEFAULT_CONTENT` + `seedFrontmatter` + `SEED_CITATIONS`
- **Yjs Maps**: `citations`, `settings`, `comments`, `frontmatter` (via dedicated stores)
- **Image handling**: paste/drop with upload to `/api/upload`

### 5.3 TipTap Extensions

**Built-in (configured)**:
- StarterKit (no codeBlock, no undo), CodeBlockLowlight (all languages), Placeholder, Collaboration, CollaborationCursorV3, Mathematics (KaTeX), Image, Table/Row/Cell/Header

**Custom**:
- `CollaborationUndo` - bridges Yjs UndoManager for agent batch edits
- `Comment` - inline mark with `commentId` + `resolved`
- `SlashCommands` - `/` trigger with suggestion popup
- `ImageUpload` - drag-drop upload node with progress
- `Citation` - inline atomic node (key + label), links to `citationsMap`
- `Bibliography` - block node with rendered HTML from citations
- `Glossary` - inline atomic (term + definition tooltip)
- `Footnote` - inline atomic (content shown in footer)
- `Stack` + `StackColumn` - multi-column layout (2/3/4 cols)

### 5.4 Component System

Registry-based system for MDX-like custom components:

| Component | Kind | Purpose |
|-----------|------|---------|
| `accordion` | wrapper | Collapsible section (details/summary) |
| `note` | wrapper | Info/warning/danger/success callout |
| `quoteBlock` | wrapper | Styled blockquote |
| `wide` | wrapper | Content wider than column |
| `fullWidth` | wrapper | Full viewport width |
| `sidenote` | wrapper | Marginal note |
| `reference` | wrapper | Reference container |
| `htmlEmbed` | atomic | External HTML embed (iframe) |
| `hfUser` | atomic | HF user card |
| `rawHtml` | atomic | Raw HTML injection |
| `mermaid` | atomic | Mermaid diagram (live preview) |

- **Factory**: `createComponentExtension(def)` generates TipTap nodes from registry definitions (handles both wrapper and atomic kinds)
- **NodeViews**: `WrapperView` (editable content area + chrome), `AtomicView` (placeholder + field editor), `MermaidView` (textarea + SVG preview)
- **Slash menu integration**: each component generates a slash menu item via `getComponentSlashItems()`
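
The registry-to-extension idea can be sketched without TipTap itself. Everything named below (`ComponentDef`, `nodeSpecFor`, `slashItemsFrom`, the `label` field) is hypothetical; the only borrowed convention is ProseMirror's, where wrapper nodes declare a `content` expression and atomic nodes set `atom: true`.

```typescript
type ComponentKind = "wrapper" | "atomic";

interface ComponentDef {
  name: string;
  kind: ComponentKind;
  label: string; // hypothetical: human-readable title for the slash menu
}

interface NodeSpecSketch {
  name: string;
  group: "block";
  content?: string;
  atom: boolean;
}

interface SlashItem {
  title: string;
  nodeName: string;
}

// Wrappers hold editable block content; atomic nodes are leaf nodes edited via fields.
function nodeSpecFor(def: ComponentDef): NodeSpecSketch {
  return def.kind === "wrapper"
    ? { name: def.name, group: "block", content: "block+", atom: false }
    : { name: def.name, group: "block", atom: true };
}

// One slash menu entry per registered component.
function slashItemsFrom(registry: ComponentDef[]): SlashItem[] {
  return registry.map((def) => ({ title: def.label, nodeName: def.name }));
}
```

Deriving both the node spec and the slash item from one definition means adding a component is a single registry entry, with no separate menu wiring to forget.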

### 5.5 Frontmatter System

- `FrontmatterStore`: wraps `Y.Map` + `Y.Array` for real-time collaborative metadata editing
- `useFrontmatter` hook: React state synced with Yjs observations
- `FrontmatterHero`: WYSIWYG editable hero section (title, subtitle, authors, affiliations, date, DOI)
- `SettingsDrawer`: template variant, SEO, banner, citation style, primary color hue slider, PDF/TOC/licence toggles
- `HueSlider`: OKLCH hue picker (0-360) with live preview, synced to `settingsMap.primaryHue`
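
Turning the stored hue into a color value might look like the sketch below. The helper name `primaryColorFromHue` and the default lightness and chroma values are assumptions; only the hue comes from `settingsMap.primaryHue`.

```typescript
// Build an OKLCH color string from a hue angle, wrapping it into [0, 360).
// ASSUMPTION: lightness 0.65 and chroma 0.17 are illustrative defaults.
function primaryColorFromHue(hue: number, lightness = 0.65, chroma = 0.17): string {
  const wrapped = ((hue % 360) + 360) % 360;
  return `oklch(${lightness} ${chroma} ${wrapped})`;
}
```

The result can be assigned to a custom property such as `--primary-color` on the document root, so every rule that reads the property updates as the slider moves.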

### 5.6 Other UI

- **`TableOfContents`**: extracts headings from TipTap doc, scroll-based active state, collapsible sub-sections
- **`ChatPanel`**: message list + quick actions on selection + input with streaming
- **`CommentPopover`**: positioned comment popover anchored to the active thread (resolve/delete inline)
- **`BubbleToolbar`**: floating toolbar on text selection (bold, italic, link, comment, etc.)
- **`BlockHandle`**: drag handle for block-level nodes

### 5.7 CSS Architecture

```
styles/
  _variables.css       # Template tokens: --primary-color, breakpoints, @custom-media
  _reset.css           # Scoped reset for article content
  _base.css            # Typography, scoped to article content
  _layout.css          # 3-column grid, .wide/.full-width helpers
  _print.css           # Print styles
  _ui.css              # Editor chrome: buttons, dialogs, drawers, spinner
  tokens.css           # Design tokens (light/dark): text, bg, accent, code, danger, shadows
  article.css          # .tiptap content styles (shared editor/published)
  toc.css              # Editor TOC overrides
  editing.css          # Editor-only: layout, cursors, slash menu
  _publisher.css       # Published-only: theme toggle, wide/fullWidth, footer, lightbox
  components/
    _code.css          # Code blocks + syntax highlighting
    _table.css         # Tables
    _tag.css           # Tags
    _card.css          # Cards
    _mermaid.css       # Mermaid diagrams
    _embed.css         # Embed containers
    _embed-studio.css  # Embed studio overlay
    _hero.css          # Hero section (from template)
    _toc.css           # Base TOC styles (from template)
    _button.css        # Buttons (template)
    _form.css          # Form elements (template)
    _footer.css        # Footer (template)
```

The publisher reads these same CSS files server-side and injects them inline into published HTML, using `resolveCustomMedia()` to expand `@custom-media` queries into standard `@media` rules.

---

## 6. Deployment

### 6.1 Docker Build (3-stage)

1. **frontend-build**: `npm install` + `npm run build` (Vite)
2. **backend-build**: `npm install` + `npx tsc`
3. **runtime**: `node:20-slim` + Chromium system deps + `npm install --omit=dev` + Playwright Chromium + copy `frontend-dist/` + copy `frontend/src/styles/` to `frontend-styles/`

**CMD**: `node dist/server.js` on port 8080.

### 6.2 HF Space Configuration (README.md frontmatter)

- SDK: `docker`, port `8080`
- OAuth: `hf_oauth: true`, scopes: `manage-repos`, `inference-api`
- Two git remotes: `space` (tfrere/collab-editor, dev) and `prod` (tfrere/research-article-template-editor, production)

### 6.3 Environment Variables

| Variable | Required | Purpose |
|----------|----------|---------|
| `PORT` | No (default 8080) | HTTP listen port |
| `NODE_ENV` | No | `production` switches to `frontend-dist` path |
| `SPACE_ID` | For OAuth/HF | HF Space identifier, enables OAuth + dataset |
| `SPACE_HOST` | For OAuth | HTTPS callback URL host |
| `OAUTH_CLIENT_ID` | For OAuth | HF OAuth client |
| `OAUTH_CLIENT_SECRET` | For OAuth | HF OAuth secret |
| `OAUTH_SCOPES` | No (default `openid profile`) | OAuth scopes. Add `manage-repos` for dataset persistence and `inference-api` to power AI features with the user's token |
| `HF_DATASET_ID` | No | Override dataset name (default: `{SPACE_ID}-data`) |
| `HF_TOKEN` | For AI chat in local dev | Fallback Hub token for HF API + Inference Providers. Needs the "Make calls to Inference Providers" permission |
| `HF_INFERENCE_MODEL` | No (default `openai/gpt-oss-120b`) | Default chat-completion model id served by HF Inference Providers. May be suffixed with `:<provider>` to pin a specific routing |
| `ENABLE_PDF` | No (default true) | Toggle PDF/thumbnail generation |
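
The defaults in this table can be summarized as a small config loader. This is a sketch, not the server's actual startup code: `loadConfigSketch` and `AppConfig` are hypothetical names, and treating any `ENABLE_PDF` value other than `false` as enabled is an assumption.

```typescript
interface AppConfig {
  port: number;
  oauthEnabled: boolean;
  oauthScopes: string[];
  datasetId: string | undefined;
  inferenceModel: string;
  enablePdf: boolean;
}

// Read the documented variables from an env-like object and apply the defaults.
function loadConfigSketch(env: Record<string, string | undefined>): AppConfig {
  const spaceId = env.SPACE_ID;
  return {
    port: Number(env.PORT ?? 8080),
    // Auth is enabled only when both SPACE_ID and OAUTH_CLIENT_ID are set.
    oauthEnabled: Boolean(spaceId && env.OAUTH_CLIENT_ID),
    oauthScopes: (env.OAUTH_SCOPES ?? "openid profile").split(/\s+/).filter(Boolean),
    // Without an override, the dataset name is derived from the Space id.
    datasetId: env.HF_DATASET_ID ?? (spaceId ? `${spaceId}-data` : undefined),
    inferenceModel: env.HF_INFERENCE_MODEL ?? "openai/gpt-oss-120b",
    // ASSUMPTION: only the literal string "false" (any case) disables PDF output.
    enablePdf: (env.ENABLE_PDF ?? "true").toLowerCase() !== "false",
  };
}
```

Calling `loadConfigSketch(process.env)` once at startup keeps the parsing rules in one place and makes the defaults easy to unit-test with a plain object.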

### 6.4 Local Development

```bash
# Terminal 1 - Backend
cd backend && npm install && npm run dev
# Starts on http://localhost:8080

# Terminal 2 - Frontend
cd frontend && npm install && npm run dev
# Starts on http://localhost:5678 (proxies /api and /collab to :8080)
```

Create a `.env` file in `backend/` with at minimum `HF_TOKEN` for AI chat (must have the "Make calls to Inference Providers" permission). Without `SPACE_ID`, OAuth is disabled and all users can edit.

---

## 7. Key Data Flows

### 7.1 Collaborative Editing

```mermaid
sequenceDiagram
  participant ClientA as Client A
  participant Server as Hocuspocus
  participant ClientB as Client B
  participant Disk as Local FS
  participant HF as HF Dataset

  ClientA->>Server: WebSocket connect /collab
  Server->>Disk: Database.fetch (load .yjs)
  Server-->>ClientA: sync Y.Doc state
  ClientA->>Server: Y.Doc update (edit)
  Server->>ClientB: broadcast update
  Server->>Disk: Database.store (write .yjs)
  Disk-->>HF: schedulePush (10s debounce)
```

### 7.2 Publish Flow

```mermaid
sequenceDiagram
  participant User as Editor UI
  participant API as POST /api/publish
  participant HP as Hocuspocus
  participant Pub as Publisher
  participant FS as Local FS
  participant HF as HF Dataset

  User->>API: Click Publish
  API->>HP: openDirectConnection
  HP-->>API: Y.Doc snapshot
  API->>FS: Write .yjs snapshot
  API->>Pub: publishDocument()
  Pub->>Pub: extractFromYDoc + loadCSS
  Pub->>Pub: renderArticleHTML + PDF
  Pub->>FS: Write index.html locally
  Pub->>HF: uploadPublishedAssets
  Pub-->>API: { htmlUrl, pdfUrl, success }
  API-->>User: Publish result
```

### 7.3 Published Article Lifecycle (Container Restarts)

HF Spaces containers are ephemeral. The local filesystem is wiped on every restart (git push, Space rebuild, idle timeout). The published article survives via this restore flow:

```mermaid
sequenceDiagram
  participant Container as New Container
  participant FS as Local FS
  participant HF as HF Dataset
  participant Visitor as GET /

  Container->>Container: Server starts
  Container->>HF: ensurePublishedRestored()
  HF-->>FS: Pull index.html, PDF, meta.json
  Note over FS: data/published/default/index.html

  Visitor->>Container: GET /
  Container->>FS: Check published path
  FS-->>Container: index.html exists
  Container-->>Visitor: Serve published article
```

On publish, HTML is **always written locally first** (so `GET /` serves the new version immediately), then uploaded to HF dataset for persistence across restarts.

### 7.4 AI Agent Chat

```mermaid
sequenceDiagram
  participant User as Chat Panel
  participant Hook as useAgentChat
  participant API as POST /api/chat
  participant LLM as HF Inference

  User->>Hook: sendMessage(text)
  Hook->>Hook: Build context (doc, selection, frontmatter)
  Hook->>API: { messages, context }
  API->>LLM: streamText (system prompt + tools)
  LLM-->>API: Stream (text + tool_calls)
  API-->>Hook: SSE stream
  Hook->>Hook: Execute tool calls client-side
  Note over Hook: replaceSelection, applyDiff,<br/>updateFrontmatter, etc.
  Hook->>Hook: UndoManager batch for Cmd+Z
```

---

## 8. Current Limitations and Known Issues

- **Test suite in progress**: P0 tests being added (see `docs/TESTS.md`)
- **Single document**: only `"default"` document supported; no multi-doc
- **Single-user token**: last OAuth token cached globally for all HF API calls
- **No rate limiting** on `/api/chat` or `/api/citations/*`
- **XSS surface**: `meta.licence` and `biblioHtml` not escaped in published HTML
- **WS debug logging**: every WebSocket message logged in production
- **No `.env.example`**: environment variables are documented only in this specification (section 6.3) and in code