# Collab Editor - Project Specification

## 1. Overview

A collaborative, real-time scientific article editor deployed as a Hugging Face Space. Users write rich content (math, citations, custom components) in a TipTap-based editor synced via Yjs/Hocuspocus, then publish a self-contained static HTML article. An AI assistant helps with writing and editing.

**Stack**: React 18 + TipTap 3 + Yjs (frontend) / Express + Hocuspocus + Node 20 (backend) / Docker on HF Spaces. No CSS-in-JS - all styling via vanilla CSS custom properties.

**Relationship to `research-article-template`**: the CSS foundation, design tokens, and visual language come from the [research-article-template](https://huggingface.co/spaces/tfrere/research-article-template) project. The editor imports the template's CSS files (`_variables.css`, `_reset.css`, `_base.css`, `_layout.css`, component partials) and the publisher injects them inline into published HTML. The published output is designed to look identical to articles built with the Astro-based template.

---

## 2. Architecture

```mermaid
graph TB
    subgraph browser [Browser]
        SPA["React SPA<br/>TipTap + Yjs"]
    end

    subgraph server [Node Backend - port 8080]
        Express["Express HTTP"]
        Hocuspocus["Hocuspocus<br/>WebSocket"]
        Publisher["Publisher Pipeline<br/>HTML + PDF"]
        Agent["AI Agent<br/>OpenRouter"]
        Auth["HF OAuth"]
    end

    subgraph storage [Persistence]
        LocalFS["Local FS<br/>data/*.yjs"]
        HFDataset["HF Dataset<br/>articles/ published/"]
    end

    SPA -->|"WebSocket /collab"| Hocuspocus
    SPA -->|"REST /api/*"| Express
    Hocuspocus -->|"Database ext"| LocalFS
    LocalFS -->|"schedulePush"| HFDataset
    HFDataset -->|"pullDocument"| LocalFS
    Express --> Publisher
    Express --> Agent
    Express --> Auth
    Publisher --> LocalFS
    Publisher -->|"uploadPublishedAssets"| HFDataset
```

**Single process in production**: the backend serves the Vite-built frontend, all REST APIs, the WebSocket collab channel, and static published articles. No reverse proxy needed.

---

## 3. Data Model (Yjs Shared Types)

The entire collaborative state lives in a single `Y.Doc`:

- **`Y.XmlFragment("default")`** - TipTap document content (ProseMirror nodes synced via Collaboration extension)
- **`Y.Map("frontmatter")`** - scalar metadata: `title`, `subtitle`, `description`, `published`, `doi`, `template`, `licence`
- **`Y.Array("frontmatter.authors")`** - `{ name, url?, affiliations: number[] }[]`
- **`Y.Array("frontmatter.affiliations")`** - `{ name, url? }[]`
- **`Y.Map("citations")`** - CSL-JSON entries keyed by citation ID
- **`Y.Map("settings")`** - `citationStyle`, `primaryHue`, and future editor preferences
- **`Y.Map("comments")`** - comment threads keyed by `commentId`, each with author/text/resolved

All types are concurrently editable by multiple users and persist to `data/default.yjs`.

---

## 4. Backend Components

### 4.1 HTTP Routes

| Method | Path | Auth | Purpose |
|--------|------|------|---------|
| `GET` | `/oauth/authorize` | Public | Redirect to HF OAuth |
| `GET` | `/auth/callback` | Public (CSRF state) | Exchange code, set cookie, redirect to `/editor` |
| `GET` | `/api/auth/status` | Cookie | Return `{ authenticated, canEdit, user }` |
| `POST` | `/api/chat` | **None** | Stream AI agent responses (OpenRouter) |
| `POST` | `/api/publish` | OAuth (canEdit) | Run publish pipeline, generate HTML/PDF |
| `POST` | `/api/admin/reset-document` | OAuth (canEdit) | Delete local `.yjs`, close connections |
| `POST` | `/api/upload` | None (uses cookie for HF) | Upload image (multipart, max 10MB) |
| `POST` | `/api/citations/resolve` | None | Resolve DOI/URL to CSL-JSON |
| `POST` | `/api/citations/format` | None | Format entries to HTML bibliography |
| `POST` | `/api/citations/import-bib` | None | Parse BibTeX to CSL-JSON |
| `GET` | `/editor` | OAuth (canEdit) | Serve SPA (or login page) |
| `GET` | `*` | Public | Serve published article (or login page) |

### 4.2 WebSocket Collaboration

- Upgrade on `/collab` only; all other paths rejected
- Single document: `DEFAULT_DOC_NAME = "default"`
- Hocuspocus `onAuthenticate`: validates OAuth token if enabled, checks `canEdit`
- `Database` extension: `fetch` reads local `.yjs` or pulls from HF; `store` writes local + schedules HF push (10s debounce)

### 4.3 HF Storage

- Dataset ID: `HF_DATASET_ID` or `{SPACE_ID}-data`
- Token: `HF_TOKEN` (env) or cached OAuth token from last authenticated user
- **Documents**: `articles/.yjs` - debounced push on every Hocuspocus store
- **Published assets**: `published//{index.html, article.pdf, thumb.jpg, meta.json}`
- **Images**: `images/` with public resolve URL
- `flushAll()` on `SIGTERM`/`SIGINT` to push pending changes

### 4.4 Publisher Pipeline

```mermaid
flowchart LR
    YDoc["Y.Doc (.yjs)"] --> Extract["extractFromYDoc<br/>frontmatter + JSON"]
    Extract --> GenHTML["generateHTML<br/>@tiptap/html"]
    GenHTML --> PostProc["postProcess<br/>accordion, biblio,<br/>mermaid, htmlEmbed"]
    PostProc --> Render["renderArticleHTML<br/>full HTML page"]
    CSS["loadCSS<br/>template styles"] --> Render
    Render --> LocalWrite["Write local<br/>index.html"]
    Render --> PDF["Playwright<br/>PDF + thumbnail"]
    LocalWrite --> HFUpload["uploadPublishedAssets<br/>HF dataset"]
```

- **CSS loading**: reads template CSS files, resolves `@custom-media` queries via `resolveCustomMedia()`, splits into variables/reset/base/layout/components/article/print
- **Post-processing**: accordion divs to ``, bibliography injection, mermaid to ``, htmlEmbed to ``
- **HTML output**: self-contained page with inline CSS, CDN assets (KaTeX, highlight.js, Mermaid), theme toggle (SVG sun/moon), TOC generation (scroll-based, collapsible), lightbox, footer with citation/BibTeX/DOI
- **PDF**: optional Playwright Chromium headless (1200x630 thumbnail + full PDF)
- **Server extensions**: mirror of frontend TipTap extensions for server-side HTML generation

### 4.5 Auth

- Enabled when `SPACE_ID` + `OAUTH_CLIENT_ID` are set
- OAuth 2.0 flow with HF as provider; cookie `hf_access_token` (httpOnly, secure, sameSite: none)
- `resolveUser`: `whoAmI` via `@huggingface/hub`, then `checkWriteAccess` (Space owner or org member with write/admin role)
- In-memory state map with 10-min TTL for CSRF protection

### 4.6 AI Agent

- Provider: OpenRouter (`OPENROUTER_API_KEY`), default model `anthropic/claude-sonnet-4`
- Streaming via Vercel AI SDK `streamText`
- **Context**: document text, current selection, frontmatter (sent by frontend with each message)
- **Tools** (declarative, executed client-side by the frontend):
  - `replaceSelection` - replace selected text
  - `insertAtCursor` - insert at cursor position
  - `applyDiff` - search/replace in document
  - `updateFrontmatter` - modify metadata fields
  - `addAuthor` / `removeAuthor` - manage author list
- Agent edits are grouped in a single Yjs `UndoManager` batch for Cmd+Z

### 4.7 Citations

- Uses `@citation-js/core` with bibtex, doi, csl plugins
- Resolve: DOI URL or identifier to CSL-JSON entries
- Format: entries + style + locale to HTML bibliography
- Import: BibTeX string to CSL-JSON

---

## 5. Frontend Components

### 5.1 App Shell

- **No router** - single view with conditional rendering
- **Theme**: CSS custom properties with dynamic primary color from `settings.primaryHue` (OKLCH color model, synced via Yjs settings)
- **Layout**: top bar (undo/redo, settings, publish, user chip) + 3-column CSS grid (TOC / editor / comments)
- **Chat**: floating button bottom-left, `ChatPanel` overlay
- **Modals**: comment dialog, settings drawer, publish confirmation

### 5.2 Editor

- Creates `Y.Doc` + `HocuspocusProvider` (WebSocket to `/collab`)
- **Seeding**: after provider `synced` event only, if `Y.XmlFragment("default")` is empty, inserts `DEFAULT_CONTENT` + `seedFrontmatter` + `SEED_CITATIONS`
- **Yjs Maps**: `citations`, `settings`, `comments`, `frontmatter` (via dedicated stores)
- **Image handling**: paste/drop with upload to `/api/upload`

### 5.3 TipTap Extensions

**Built-in (configured)**:

- StarterKit (no codeBlock, no undo), CodeBlockLowlight (all languages), Placeholder, Collaboration, CollaborationCursorV3, Mathematics (KaTeX), Image, Table/Row/Cell/Header

**Custom**:

- `CollaborationUndo` - bridges Yjs UndoManager for agent batch edits
- `Comment` - inline mark with `commentId` + `resolved`
- `SlashCommands` - `/` trigger with suggestion popup
- `ImageUpload` - drag-drop upload node with progress
- `Citation` - inline atomic node (key + label), links to `citationsMap`
- `Bibliography` - block node with rendered HTML from citations
- `Glossary` - inline atomic (term + definition tooltip)
- `Footnote` - inline atomic (content shown in footer)
- `Stack` + `StackColumn` - multi-column layout (2/3/4 cols)

### 5.4 Component System

Registry-based system for MDX-like custom components:

| Component | Kind | Purpose |
|-----------|------|---------|
| `accordion` | wrapper | Collapsible section (details/summary) |
| `note` | wrapper | Info/warning/danger/success callout |
| `quoteBlock` | wrapper | Styled blockquote |
| `wide` | wrapper | Content wider than column |
| `fullWidth` | wrapper | Full viewport width |
| `sidenote` | wrapper | Marginal note |
| `reference` | wrapper | Reference container |
| `htmlEmbed` | atomic | External HTML embed (iframe) |
| `hfUser` | atomic | HF user card |
| `rawHtml` | atomic | Raw HTML injection |
| `mermaid` | atomic | Mermaid diagram (live preview) |

- **Factory**: `createComponentExtension(def)` generates TipTap nodes from registry definitions (handles both wrapper and atomic kinds)
- **NodeViews**: `WrapperView` (editable content area + chrome), `AtomicView` (placeholder + field editor), `MermaidView` (textarea + SVG preview)
- **Slash menu integration**: each component generates a slash menu item via `getComponentSlashItems()`

### 5.5 Frontmatter System

- `FrontmatterStore`: wraps `Y.Map` + `Y.Array` for real-time collaborative metadata editing
- `useFrontmatter` hook: React state synced with Yjs observations
- `FrontmatterHero`: WYSIWYG editable hero section (title, subtitle, authors, affiliations, date, DOI)
- `SettingsDrawer`: template variant, SEO, banner, citation style, primary color hue slider, PDF/TOC/licence toggles
- `HueSlider`: OKLCH hue picker (0-360) with live preview, synced to `settingsMap.primaryHue`

### 5.6 Other UI

- **`TableOfContents`**: extracts headings from TipTap doc, scroll-based active state, collapsible sub-sections
- **`ChatPanel`**: message list + quick actions on selection + input with streaming
- **`CommentPopover`**: positioned comment popover anchored to the active thread (resolve/delete inline)
- **`BubbleToolbar`**: floating toolbar on text selection (bold, italic, link, comment, etc.)
- **`BlockHandle`**: drag handle for block-level nodes

### 5.7 CSS Architecture

```
styles/
  _variables.css        # Template tokens: --primary-color, breakpoints, @custom-media
  _reset.css            # Scoped reset for article content
  _base.css             # Typography, scoped to article content
  _layout.css           # 3-column grid, .wide/.full-width helpers
  _print.css            # Print styles
  _ui.css               # Editor chrome: buttons, dialogs, drawers, spinner
  tokens.css            # Design tokens (light/dark): text, bg, accent, code, danger, shadows
  article.css           # .tiptap content styles (shared editor/published)
  toc.css               # Editor TOC overrides
  editing.css           # Editor-only: layout, cursors, slash menu
  _publisher.css        # Published-only: theme toggle, wide/fullWidth, footer, lightbox
  components/
    _code.css           # Code blocks + syntax highlighting
    _table.css          # Tables
    _tag.css            # Tags
    _card.css           # Cards
    _mermaid.css        # Mermaid diagrams
    _embed.css          # Embed containers
    _embed-studio.css   # Embed studio overlay
    _hero.css           # Hero section (from template)
    _toc.css            # Base TOC styles (from template)
    _button.css         # Buttons (template)
    _form.css           # Form elements (template)
    _footer.css         # Footer (template)
```

The publisher reads these same CSS files server-side and injects them inline into published HTML, using `resolveCustomMedia()` to expand `@custom-media` queries into standard `@media` rules.

---

## 6. Deployment

### 6.1 Docker Build (3-stage)

1. **frontend-build**: `npm install` + `npm run build` (Vite)
2. **backend-build**: `npm install` + `npx tsc`
3. **runtime**: `node:20-slim` + Chromium system deps + `npm install --omit=dev` + Playwright Chromium + copy `frontend-dist/` + copy `frontend/src/styles/` to `frontend-styles/`

**CMD**: `node dist/server.js` on port 8080.

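
The runtime image carries a copy of the styles in `frontend-styles/` because the publisher expands `@custom-media` server-side before inlining the CSS (Section 5.7). The following is a minimal sketch of what such an expansion step could look like, not the project's actual `resolveCustomMedia()`: the regex-based approach and the `--bp-md` query name in the usage note are assumptions made for illustration.

```typescript
// Hypothetical sketch of a resolveCustomMedia()-style helper.
// Assumption: declarations have the form `@custom-media --name (query);`
// and are referenced as `@media (--name) { ... }`.
function resolveCustomMedia(css: string): string {
  const definitions = new Map<string, string>();

  // 1. Collect every @custom-media declaration and strip it from the output.
  const withoutDefinitions = css.replace(
    /@custom-media\s+(--[\w-]+)\s+([^;]+);/g,
    (_match: string, name: string, query: string) => {
      definitions.set(name, query.trim());
      return "";
    },
  );

  // 2. Expand references, but only inside @media preludes, so that
  //    ordinary `var(--token)` usages elsewhere are never touched.
  return withoutDefinitions.replace(/@media[^{]+/g, (prelude: string) =>
    prelude.replace(
      /\((--[\w-]+)\)/g,
      (reference: string, name: string) => definitions.get(name) ?? reference,
    ),
  );
}
```

For example, given `@custom-media --bp-md (max-width: 768px);` followed by `@media (--bp-md) { ... }`, this sketch emits a plain `@media (max-width: 768px) { ... }` rule that browsers understand without a PostCSS step. Unknown names are left untouched rather than dropped.
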
### 6.2 HF Space Configuration (README.md frontmatter)

- SDK: `docker`, port `8080`
- OAuth: `hf_oauth: true`, scopes: `manage-repos`
- Two git remotes: `space` (tfrere/collab-editor, dev) and `prod` (tfrere/research-article-template-editor, production)

### 6.3 Environment Variables

| Variable | Required | Purpose |
|----------|----------|---------|
| `PORT` | No (default 8080) | HTTP listen port |
| `NODE_ENV` | No | `production` switches to `frontend-dist` path |
| `SPACE_ID` | For OAuth/HF | HF Space identifier, enables OAuth + dataset |
| `SPACE_HOST` | For OAuth | HTTPS callback URL host |
| `OAUTH_CLIENT_ID` | For OAuth | HF OAuth client |
| `OAUTH_CLIENT_SECRET` | For OAuth | HF OAuth secret |
| `OAUTH_SCOPES` | No (default `openid profile`) | OAuth scopes |
| `HF_DATASET_ID` | No | Override dataset name (default: `{SPACE_ID}-data`) |
| `HF_TOKEN` | No | Fallback Hub token for HF API |
| `OPENROUTER_API_KEY` | For AI chat | OpenRouter API key |
| `OPENROUTER_MODEL` | No | Default AI model |
| `ENABLE_PDF` | No (default true) | Toggle PDF/thumbnail generation |

### 6.4 Local Development

```bash
# Terminal 1 - Backend
cd backend && npm install && npm run dev
# Starts on http://localhost:8080

# Terminal 2 - Frontend
cd frontend && npm install && npm run dev
# Starts on http://localhost:5678 (proxies /api and /collab to :8080)
```

Create a `.env` file in `backend/` with at minimum `OPENROUTER_API_KEY` for AI chat. Without `SPACE_ID`, OAuth is disabled and all users can edit.

---

## 7. Key Data Flows

### 7.1 Collaborative Editing

```mermaid
sequenceDiagram
    participant ClientA as Client A
    participant Server as Hocuspocus
    participant ClientB as Client B
    participant Disk as Local FS
    participant HF as HF Dataset

    ClientA->>Server: WebSocket connect /collab
    Server->>Disk: Database.fetch (load .yjs)
    Server-->>ClientA: sync Y.Doc state
    ClientA->>Server: Y.Doc update (edit)
    Server->>ClientB: broadcast update
    Server->>Disk: Database.store (write .yjs)
    Disk-->>HF: schedulePush (10s debounce)
```

### 7.2 Publish Flow

```mermaid
sequenceDiagram
    participant User as Editor UI
    participant API as POST /api/publish
    participant HP as Hocuspocus
    participant Pub as Publisher
    participant FS as Local FS
    participant HF as HF Dataset

    User->>API: Click Publish
    API->>HP: openDirectConnection
    HP-->>API: Y.Doc snapshot
    API->>FS: Write .yjs snapshot
    API->>Pub: publishDocument()
    Pub->>Pub: extractFromYDoc + loadCSS
    Pub->>Pub: renderArticleHTML + PDF
    Pub->>FS: Write index.html locally
    Pub->>HF: uploadPublishedAssets
    Pub-->>API: { htmlUrl, pdfUrl, success }
    API-->>User: Publish result
```

### 7.3 Published Article Lifecycle (Container Restarts)

HF Spaces containers are ephemeral. The local filesystem is wiped on every restart (git push, Space rebuild, idle timeout). The published article survives via this restore flow:

```mermaid
sequenceDiagram
    participant Container as New Container
    participant FS as Local FS
    participant HF as HF Dataset
    participant Visitor as GET /

    Container->>Container: Server starts
    Container->>HF: ensurePublishedRestored()
    HF-->>FS: Pull index.html, PDF, meta.json
    Note over FS: data/published/default/index.html
    Visitor->>Container: GET /
    Container->>FS: Check published path
    FS-->>Container: index.html exists
    Container-->>Visitor: Serve published article
```

On publish, HTML is **always written locally first** (so `GET /` serves the new version immediately), then uploaded to HF dataset for persistence across restarts.

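
The local-first ordering described above can be sketched as follows. This is an illustrative outline under stated assumptions, not the publisher's actual code: `PublishTargets`, `PublishOutcome`, and `publishLocalFirst` are hypothetical names, and the choice to report an upload failure instead of rolling back the local write is an assumption about the intended behaviour.

```typescript
// Hypothetical dependency shape: neither function name comes from the project.
interface PublishTargets {
  // Synchronous disk write, e.g. a thin wrapper around fs.writeFileSync.
  writeLocal: (path: string, html: string) => void;
  // Remote persistence, e.g. an upload to the HF dataset.
  uploadRemote: (path: string, html: string) => Promise<void>;
}

interface PublishOutcome {
  servedLocally: boolean;
  persisted: boolean;
  error?: string;
}

async function publishLocalFirst(
  html: string,
  targets: PublishTargets,
  path = "data/published/default/index.html",
): Promise<PublishOutcome> {
  // Step 1: the local write runs before any await, so the new version is
  // on disk (and servable from GET /) as soon as this function is called.
  targets.writeLocal(path, html);

  // Step 2: persist remotely. A failure here is reported to the caller but
  // does not undo step 1 (assumption: stale remote beats no article at all).
  try {
    await targets.uploadRemote(path, html);
    return { servedLocally: true, persisted: true };
  } catch (err) {
    return { servedLocally: true, persisted: false, error: String(err) };
  }
}
```

With this shape, a caller can surface `persisted: false` in the publish result, so the editor could warn that the article would not survive the next container restart until the upload is retried.
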
### 7.4 AI Agent Chat

```mermaid
sequenceDiagram
    participant User as Chat Panel
    participant Hook as useAgentChat
    participant API as POST /api/chat
    participant LLM as OpenRouter

    User->>Hook: sendMessage(text)
    Hook->>Hook: Build context (doc, selection, frontmatter)
    Hook->>API: { messages, context }
    API->>LLM: streamText (system prompt + tools)
    LLM-->>API: Stream (text + tool_calls)
    API-->>Hook: SSE stream
    Hook->>Hook: Execute tool calls client-side
    Note over Hook: replaceSelection, applyDiff,<br/>updateFrontmatter, etc.
    Hook->>Hook: UndoManager batch for Cmd+Z
```

---

## 8. Current Limitations and Known Issues

- **Test suite in progress**: P0 tests being added (see `docs/TESTS.md`)
- **Single document**: only `"default"` document supported; no multi-doc
- **Single-user token**: last OAuth token cached globally for all HF API calls
- **No rate limiting** on `/api/chat` or `/api/citations/*`
- **XSS surface**: `meta.licence` and `biblioHtml` not escaped in published HTML
- **WS debug logging**: every WebSocket message logged in production
- **No `.env.example`**: environment variables documented only in code
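
As an illustration of the XSS item above, plain-text fields such as `meta.licence` could be passed through an escaping helper before interpolation into the published page. The sketch below is one possible mitigation, not code from the project: `escapeHtml`, `renderLicence`, and the `licence` class name are hypothetical.

```typescript
// Characters that must be escaped when plain text is placed in HTML
// element content or in a quoted attribute value.
const HTML_ESCAPES: Record<string, string> = {
  "&": "&amp;",
  "<": "&lt;",
  ">": "&gt;",
  '"': "&quot;",
  "'": "&#39;",
};

// Replace each special character with its entity; other characters pass through.
function escapeHtml(value: string): string {
  return value.replace(/[&<>"']/g, (ch: string) => HTML_ESCAPES[ch] ?? ch);
}

// Hypothetical footer fragment: the licence string is user-editable
// frontmatter, so it is escaped before being interpolated.
function renderLicence(licence: string): string {
  return `<p class="licence">${escapeHtml(licence)}</p>`;
}
```

Escaping fits `meta.licence` because it is meant to be plain text. `biblioHtml` is intended to contain markup, so escaping it would break the bibliography; that field would more likely need an allow-list HTML sanitizer, which is outside the scope of this sketch.
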