# Collab Editor - Project Specification

## 1. Overview

A collaborative, real-time scientific article editor deployed as a Hugging Face Space. Users write rich content (math, citations, custom components) in a TipTap-based editor synced via Yjs/Hocuspocus, then publish a self-contained static HTML article. An AI assistant helps with writing and editing.

**Stack**: React 18 + TipTap 3 + Yjs (frontend) / Express + Hocuspocus + Node 20 (backend) / Docker on HF Spaces. No CSS-in-JS - all styling via vanilla CSS custom properties.

**Relationship to `research-article-template`**: the CSS foundation, design tokens, and visual language come from the [research-article-template](https://huggingface.co/spaces/tfrere/research-article-template) project. The editor imports the template's CSS files (`_variables.css`, `_reset.css`, `_base.css`, `_layout.css`, component partials) and the publisher injects them inline into published HTML. The published output is designed to look identical to articles built with the Astro-based template.

---

## 2. Architecture

```mermaid
graph TB
    subgraph browser [Browser]
        SPA["React SPA<br/>TipTap + Yjs"]
    end

    subgraph server [Node Backend - port 8080]
        Express["Express HTTP"]
        Hocuspocus["Hocuspocus<br/>WebSocket"]
        Publisher["Publisher Pipeline<br/>HTML + PDF"]
        Agent["AI Agent<br/>OpenRouter"]
        Auth["HF OAuth"]
    end

    subgraph storage [Persistence]
        LocalFS["Local FS<br/>data/*.yjs"]
        HFDataset["HF Dataset<br/>articles/ published/"]
    end

    SPA -->|"WebSocket /collab"| Hocuspocus
    SPA -->|"REST /api/*"| Express
    Hocuspocus -->|"Database ext"| LocalFS
    LocalFS -->|"schedulePush"| HFDataset
    HFDataset -->|"pullDocument"| LocalFS
    Express --> Publisher
    Express --> Agent
    Express --> Auth
    Publisher --> LocalFS
    Publisher -->|"uploadPublishedAssets"| HFDataset
```

**Single process in production**: the backend serves the Vite-built frontend, all REST APIs, the WebSocket collab channel, and static published articles. No reverse proxy needed.

---

## 3. Data Model (Yjs Shared Types)

The entire collaborative state lives in a single `Y.Doc`:

- **`Y.XmlFragment("default")`** - TipTap document content (ProseMirror nodes synced via Collaboration extension)
- **`Y.Map("frontmatter")`** - scalar metadata: `title`, `subtitle`, `description`, `published`, `doi`, `template`, `licence`
- **`Y.Array("frontmatter.authors")`** - `{ name, url?, affiliations: number[] }[]`
- **`Y.Array("frontmatter.affiliations")`** - `{ name, url? }[]`
- **`Y.Map("citations")`** - CSL-JSON entries keyed by citation ID
- **`Y.Map("settings")`** - `citationStyle`, `primaryHue`, and future editor preferences
- **`Y.Map("comments")`** - comment threads keyed by `commentId`, each with author/text/resolved

All types are concurrently editable by multiple users and persist to `data/default.yjs`.
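
As a rough illustration of the author and affiliation shapes listed in the data model, the sketch below mirrors them as plain TypeScript types and resolves an author's `affiliations` indices to affiliation names. The type names, the `resolveAffiliations` helper, the sample data, and the default of 1-based indices are all assumptions made for illustration; the specification does not say how the indices are based.

```typescript
// Plain-object shapes mirroring the entries stored in the two Yjs arrays.
// All names in this block are hypothetical, not taken from the project.
interface Affiliation {
  name: string;
  url?: string;
}

interface Author {
  name: string;
  url?: string;
  affiliations: number[]; // indices into the affiliations list
}

// ASSUMPTION: indices are 1-based by default; pass base = 0 for zero-based data.
function resolveAffiliations(
  author: Author,
  affiliations: Affiliation[],
  base: 0 | 1 = 1,
): string[] {
  return author.affiliations
    .map((i) => affiliations[i - base])
    // Drop indices that point outside the list instead of throwing.
    .filter((a): a is Affiliation => a !== undefined)
    .map((a) => a.name);
}

const sampleAffiliations: Affiliation[] = [{ name: "Lab A" }, { name: "Lab B" }];
const sampleAuthor: Author = { name: "Ada", affiliations: [1, 2] };

console.log(resolveAffiliations(sampleAuthor, sampleAffiliations));
// prints [ 'Lab A', 'Lab B' ]
```

Keeping the index base as a parameter avoids baking an unverified convention into the helper.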
---

## 4. Backend Components

### 4.1 HTTP Routes

| Method | Path | Auth | Purpose |
|--------|------|------|---------|
| `GET` | `/oauth/authorize` | Public | Redirect to HF OAuth |
| `GET` | `/auth/callback` | Public (CSRF state) | Exchange code, set cookie, redirect to `/editor` |
| `GET` | `/api/auth/status` | Cookie | Return `{ authenticated, canEdit, user }` |
| `POST` | `/api/chat` | **None** | Stream AI agent responses (OpenRouter) |
| `POST` | `/api/publish` | OAuth (canEdit) | Run publish pipeline, generate HTML/PDF |
| `POST` | `/api/admin/reset-document` | OAuth (canEdit) | Delete local `.yjs`, close connections |
| `POST` | `/api/upload` | None (uses cookie for HF) | Upload image (multipart, max 10MB) |
| `POST` | `/api/citations/resolve` | None | Resolve DOI/URL to CSL-JSON |
| `POST` | `/api/citations/format` | None | Format entries to HTML bibliography |
| `POST` | `/api/citations/import-bib` | None | Parse BibTeX to CSL-JSON |
| `GET` | `/editor` | OAuth (canEdit) | Serve SPA (or login page) |
| `GET` | `*` | Public | Serve published article (or login page) |

### 4.2 WebSocket Collaboration

- Upgrade on `/collab` only; all other paths rejected
- Single document: `DEFAULT_DOC_NAME = "default"`
- Hocuspocus `onAuthenticate`: validates OAuth token if enabled, checks `canEdit`
- `Database` extension: `fetch` reads local `.yjs` or pulls from HF; `store` writes local + schedules HF push (10s debounce)

### 4.3 HF Storage

- Dataset ID: `HF_DATASET_ID` or `{SPACE_ID}-data`
- Token: `HF_TOKEN` (env) or cached OAuth token from last authenticated user
- **Documents**: `articles/.yjs` - debounced push on every Hocuspocus store
- **Published assets**: `published//{index.html, article.pdf, thumb.jpg, meta.json}`
- **Images**: `images/` with public resolve URL
- `flushAll()` on `SIGTERM`/`SIGINT` to push pending changes

### 4.4 Publisher Pipeline

```mermaid
flowchart LR
    YDoc["Y.Doc (.yjs)"] --> Extract["extractFromYDoc<br/>frontmatter + JSON"]
    Extract --> GenHTML["generateHTML<br/>@tiptap/html"]
    GenHTML --> PostProc["postProcess<br/>accordion, biblio,<br/>mermaid, htmlEmbed"]
    PostProc --> Render["renderArticleHTML<br/>full HTML page"]
    CSS["loadCSS<br/>template styles"] --> Render
    Render --> LocalWrite["Write local<br/>index.html"]
    Render --> PDF["Playwright<br/>PDF + thumbnail"]
    LocalWrite --> HFUpload["uploadPublishedAssets<br/>HF dataset"]
```

- **CSS loading**: reads template CSS files, resolves `@custom-media` queries via `resolveCustomMedia()`, splits into variables/reset/base/layout/components/article/print
- **Post-processing**: rewrites accordion divs, injects the bibliography, and converts mermaid and htmlEmbed nodes to their published HTML elements
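
The `@custom-media` resolution mentioned under CSS loading can be pictured with a small TypeScript sketch. This is not the project's `resolveCustomMedia()`: the function name `resolveCustomMediaSketch`, the regex-based approach, and the sample CSS are assumptions for illustration. It collects `@custom-media --name <query>;` declarations, strips them, and substitutes `(--name)` references inside `@media` preludes, so the published CSS does not rely on `@custom-media` support.

```typescript
// Hypothetical sketch of inlining @custom-media aliases with regular expressions.
function resolveCustomMediaSketch(css: string): string {
  const aliases = new Map<string, string>();

  // 1. Collect and strip declarations such as:
  //    @custom-media --narrow (max-width: 768px);
  const declaration = /@custom-media\s+(--[\w-]+)\s+([^;]+);\s*/g;
  const stripped = css.replace(declaration, (_match, name: string, query: string) => {
    aliases.set(name, query.trim());
    return "";
  });

  // 2. Substitute "(--name)" only inside @media preludes (the text before "{"),
  //    so var(--name) usages elsewhere are never touched. Unknown names stay as-is.
  return stripped.replace(/@media[^{]+/g, (prelude) =>
    prelude.replace(/\((--[\w-]+)\)/g, (ref, name: string) => aliases.get(name) ?? ref),
  );
}

const sampleCss = [
  "@custom-media --narrow (max-width: 768px);",
  "@media (--narrow) { .toc { display: none; } }",
].join("\n");

console.log(resolveCustomMediaSketch(sampleCss));
// prints: @media (max-width: 768px) { .toc { display: none; } }
```

Collecting every alias in a first pass means a reference may appear before its declaration in the concatenated CSS. A regex pass like this does not handle comments or nested aliases; a real implementation may need a CSS parser.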