tfrere (HF Staff) and Cursor committed
Commit 7a42df5 · 1 parent: c362999

feat(storage): first-class data - no silent failures in the persistence pipeline

The "Saved" label in the top bar used to be a comforting lie:
several failure modes (createRepo 403 on a missing manage-repos
scope, writeFileSync on a readonly FS, uploadFile mid-debounce
network blip) all collapsed into a console.error that the editor
never surfaced. A user could type for an hour, see "Saved" the
whole time, lose their tab, and discover only after the container
rebuilt that nothing had ever made it to the dataset.

Make every link in the chain observable, and make the worst signal
always win in the UI.

Backend
-------
- In-memory `StorageStatus` tracker in hf-storage.ts:
  - `datasetReady` (createRepo succeeded or returned 409)
  - `lastLocalSaveAt` / `lastCloudPushAt` (epoch ms)
  - `pendingPush` (a debounce timer is armed)
  - `lastError {stage, message, statusCode, at, docName}`, where
    `stage` is one of `dataset-create | local-save | cloud-push`
  Every write path records success; every error path records the
  failure. Success on one stage clears the error for THAT stage
  only (and, for `local-save` / `cloud-push`, only for the same
  doc), so a cloud-push recovery doesn't paper over an underlying
  dataset-create issue.
- `recordLocalSaveError` is called from persistence.ts so a
failed writeFileSync (disk full, readonly FS, perms) doesn't
evaporate. The previous catch logged and moved on - the new
one logs AND surfaces.
- `pushDocument` now decorates the error log with the HTTP status
for triage, AND pushes a structured record into the tracker.
- New route `GET /api/storage/status` returns the tracker. Gated
to canEdit (when OAuth is enabled) so anonymous viewers never see
dataset names / error details. Cheap (in-memory object, no I/O).
- New route `GET /api/admin/export-doc?name=<doc>` streams the
on-disk `.yjs` snapshot to the caller as a date-stamped
download. Same canEdit gate. This is the disaster-recovery
escape hatch: when the cloud push has been failing and the
container is about to rebuild, an admin grabs the bytes
manually instead of relying on Dev Mode SSH.
- Eager `ensureDatasetExists` in `/api/auth/status` for canEdit
users. Before: a misconfigured fork looked perfectly healthy
for 12+ seconds (until the first edit's debounce window
closed). Now: the error lands in the tracker within ~1s of
login, and the next SyncIndicator poll (5s) flashes the red
badge with the exact reason in the tooltip.
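The stage-scoped clearing rule is easiest to see in isolation. Below is a minimal standalone sketch, assuming a single `lastError` slot as in the diff; `createTracker`, `recordError` and `recordSuccess` are hypothetical helper names used for illustration, not the exported API of `hf-storage.ts`.

```typescript
type Stage = "dataset-create" | "local-save" | "cloud-push";

interface StorageError {
  stage: Stage;
  message: string;
  statusCode?: number;
  at: number;
  docName?: string; // set for per-doc stages (local-save, cloud-push)
}

interface Tracker {
  datasetReady: boolean;
  lastLocalSaveAt: number | null;
  lastCloudPushAt: number | null;
  lastError: StorageError | null;
}

// Hypothetical helper: a fresh, all-clear tracker.
function createTracker(): Tracker {
  return { datasetReady: false, lastLocalSaveAt: null, lastCloudPushAt: null, lastError: null };
}

// Any failure overwrites the single lastError slot.
function recordError(t: Tracker, stage: Stage, err: unknown, docName?: string, statusCode?: number): void {
  t.lastError = {
    stage,
    message: err instanceof Error ? err.message : String(err),
    statusCode,
    at: Date.now(),
    docName,
  };
}

// A success updates its own timestamp/flag and clears lastError only when
// that error came from the SAME stage (and, for per-doc stages, the same doc).
function recordSuccess(t: Tracker, stage: Stage, docName?: string): void {
  const now = Date.now();
  if (stage === "dataset-create") t.datasetReady = true;
  if (stage === "local-save") t.lastLocalSaveAt = now;
  if (stage === "cloud-push") t.lastCloudPushAt = now;

  const e = t.lastError;
  const sameStage = e !== null && e.stage === stage;
  const sameDoc = stage === "dataset-create" || e?.docName === docName;
  if (sameStage && sameDoc) t.lastError = null;
}

// Usage: a cloud-push success must not hide a dataset-create failure.
const tracker = createTracker();
recordError(tracker, "dataset-create", new Error("Forbidden"), undefined, 403);
recordSuccess(tracker, "cloud-push", "default");
console.log(tracker.lastError?.stage); // dataset-create
```

Keeping one slot (rather than one per stage) means the newest failure always wins the tooltip, while the per-stage check on success stops an unrelated recovery from erasing it.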

Frontend
--------
- SyncIndicator rebuilt as a four-state badge:
  - 🟢 saved:   WS connected + tracker green
  - 🟡 pending: local edit in flight OR backend push armed
  - ⚫ offline: WS disconnected (network or container restart)
  - 🔴 error:   tracker reports lastError
  Severity ordering: error > offline > pending > saved. The
  worst applicable signal always wins, so a green "Saved" can
  never override a red "Sync failed".
- Polls `/api/storage/status` every 5s. It's a cheap GET: the
status fits in a few hundred bytes. We chose polling over SSE
because the payload is small, the interval is generous, and
another long-lived connection would add more failure modes than
it's worth.
- Tooltip on the error state shows the exact backend message
plus an actionable hint for the most common cause (`statusCode
=== 403` => "your OAuth grant may be missing manage-repos,
sign out and back in").
- `beforeunload` guard pops the browser's "Leave site?" prompt
when state is pending OR error OR offline. The user can no
longer close the tab without being warned that their data
isn't safely synced yet.
- The error badge gets a subtle pulse (background fade between
12% and 22% red) to draw the eye without being obnoxious.
Pending and offline stay flat to avoid attention-seeking noise
during normal use.
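The severity ordering and the `beforeunload` rule can be expressed as small pure functions. This is a sketch under assumptions, not the shipped `SyncIndicator`: `deriveDisplayStatus`, `errorTooltip` and `shouldWarnOnUnload` are hypothetical names, a `null` server status (not polled yet) is assumed to mean "no error", and the tooltip wording only paraphrases the 403 hint described above.

```typescript
type DisplayStatus = "saved" | "pending" | "offline" | "error";

// The subset of the /api/storage/status payload this sketch needs.
interface StatusSnapshot {
  pendingPush: boolean;
  lastError: { stage: string; message: string; statusCode?: number } | null;
}

// Worst signal wins: error > offline > pending > saved.
function deriveDisplayStatus(
  wsConnected: boolean,
  hasLocalEdit: boolean,
  server: StatusSnapshot | null,
): DisplayStatus {
  if (server?.lastError) return "error";
  if (!wsConnected) return "offline";
  if (hasLocalEdit || server?.pendingPush) return "pending";
  return "saved";
}

// Exact backend message, plus an actionable hint for the 403 / missing-scope case.
function errorTooltip(err: NonNullable<StatusSnapshot["lastError"]>): string {
  const hint =
    err.statusCode === 403
      ? " Your OAuth grant may be missing manage-repos; sign out and back in."
      : "";
  return `${err.stage} failed: ${err.message}.${hint}`;
}

// Anything other than a clean "saved" should trigger the "Leave site?" prompt.
function shouldWarnOnUnload(status: DisplayStatus): boolean {
  return status !== "saved";
}

// In the browser this would be wired up roughly as:
//   window.addEventListener("beforeunload", (e) => {
//     if (shouldWarnOnUnload(current)) e.preventDefault();
//   });
console.log(deriveDisplayStatus(false, true, { pendingPush: true, lastError: null })); // offline
```

Keeping the derivation pure makes the "green never beats red" guarantee a one-line ordering of `if`s that can be unit-tested without a browser.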

Docs
----
- SPECIFICATION.md gains §4.3.1 (Storage Status & Recovery)
describing the tracker, the routes, the SyncIndicator
behaviour, and the recovery escape hatch. The dataset reverse
proxy section moves from §4.3.1 to §4.3.2.
- TESTS.md gains rows 2.4.1 - 2.4.6 covering the error
surface, auth gates and admin export contract.

Co-authored-by: Cursor <cursoragent@cursor.com>

backend/src/create-app.ts CHANGED
@@ -16,6 +16,7 @@ import { createChatRouter } from "./routes/chat.js";
  import { createPublishRouter } from "./routes/publish.js";
  import { createUploadRouter } from "./routes/upload.js";
  import { createDatasetProxyRouter } from "./routes/dataset-proxy.js";
+ import { createStorageRouter } from "./routes/storage.js";

  export { debouncedSave, resetSaveTimers, resetPublishedRestored } from "./persistence.js";

@@ -409,6 +410,7 @@ export function createApp() {
    app.use("/api/citations", citationsRouter);
    app.use(createPublishRouter({ oauthEnabled, hocuspocus }));
    app.use(createUploadRouter());
+   app.use(createStorageRouter({ oauthEnabled }));
    // Reverse proxy for private-dataset assets. Mounted before any
    // static serving so `/d/*` always wins, never falls through to a
    // 404 from express.static.
backend/src/hf-storage.ts CHANGED
@@ -42,6 +42,122 @@ export function isHfStorageEnabled(): boolean {
    return Boolean(HF_DATASET_ID && (_cachedToken || ENV_TOKEN));
  }

+ // ============================================================
+ // Storage status tracker
+ // ============================================================
+ //
+ // Every silent failure point in the persistence pipeline used to
+ // disappear into `console.error` and the editor would happily
+ // keep showing "Saved". This tracker is the single source of
+ // truth for "what's the actual state of my data right now":
+ // every write/error path updates it, and the /api/storage/status
+ // endpoint surfaces it to the frontend SyncIndicator so the user
+ // sees the truth, not a comforting lie.
+ //
+ // Stages we care about:
+ //   - `dataset-create` : ensureDatasetExists couldn't create the
+ //     backing repo (most common: missing manage-repos scope)
+ //   - `local-save` : writeFileSync into data/<name>.yjs failed
+ //     (disk full, readonly FS, permission denied)
+ //   - `cloud-push` : uploadFile into the dataset failed (HF
+ //     down, token expired, network blip)
+ //
+ // We deliberately keep this in-memory: it's per-container,
+ // non-critical, and the frontend polls every few seconds anyway.
+
+ export type StorageErrorStage = "dataset-create" | "local-save" | "cloud-push";
+
+ export interface StorageError {
+   stage: StorageErrorStage;
+   message: string;
+   statusCode?: number;
+   at: number;
+   /** Doc name when the failure is per-doc (local-save, cloud-push). */
+   docName?: string;
+ }
+
+ export interface StorageStatus {
+   enabled: boolean;
+   datasetId: string;
+   datasetReady: boolean;
+   /** ms epoch of the last successful local writeFileSync. */
+   lastLocalSaveAt: number | null;
+   /** ms epoch of the last successful HF dataset push. */
+   lastCloudPushAt: number | null;
+   /** True while a debounced push timer is armed. */
+   pendingPush: boolean;
+   /** Last error in the pipeline, or null if everything's fine. */
+   lastError: StorageError | null;
+ }
+
+ const status: StorageStatus = {
+   enabled: false,
+   datasetId: HF_DATASET_ID,
+   datasetReady: false,
+   lastLocalSaveAt: null,
+   lastCloudPushAt: null,
+   pendingPush: false,
+   lastError: null,
+ };
+
+ export function getStorageStatus(): StorageStatus {
+   // Refresh `enabled` on every read - it depends on whether we've
+   // received a user token yet, which can flip mid-session.
+   return { ...status, enabled: isHfStorageEnabled() };
+ }
+
+ export function recordLocalSave(docName: string): void {
+   status.lastLocalSaveAt = Date.now();
+   // Local save success clears a prior local-save error for the
+   // same doc, but leaves cloud-push / dataset-create errors alone
+   // since those are independent.
+   if (status.lastError?.stage === "local-save" && status.lastError.docName === docName) {
+     status.lastError = null;
+   }
+ }
+
+ export function recordLocalSaveError(docName: string, err: unknown): void {
+   status.lastError = {
+     stage: "local-save",
+     message: (err as Error)?.message || String(err),
+     at: Date.now(),
+     docName,
+   };
+ }
+
+ function recordCloudPush(docName: string): void {
+   status.lastCloudPushAt = Date.now();
+   if (status.lastError?.stage === "cloud-push" && status.lastError.docName === docName) {
+     status.lastError = null;
+   }
+ }
+
+ function recordCloudPushError(docName: string, err: unknown, statusCode?: number): void {
+   status.lastError = {
+     stage: "cloud-push",
+     message: (err as Error)?.message || String(err),
+     statusCode,
+     at: Date.now(),
+     docName,
+   };
+ }
+
+ function recordDatasetReady(): void {
+   status.datasetReady = true;
+   if (status.lastError?.stage === "dataset-create") {
+     status.lastError = null;
+   }
+ }
+
+ function recordDatasetError(err: unknown, statusCode?: number): void {
+   status.lastError = {
+     stage: "dataset-create",
+     message: (err as Error)?.message || String(err),
+     statusCode,
+     at: Date.now(),
+   };
+ }
+
  /**
   * Public-facing base URL the editor is reachable at, used to build
   * absolute proxy URLs for assets that need to render outside the
@@ -97,10 +213,12 @@ export async function ensureDatasetExists(token?: string): Promise<void> {
          `[hf-storage] failed to create dataset ${HF_DATASET_ID}` +
          ` (status=${statusCode ?? "unknown"}): ${message}`,
        );
+       recordDatasetError(err, statusCode);
        return;
      }
    }
    _datasetReady = true;
+   recordDatasetReady();
  }

  // ---------- Images ----------
@@ -139,12 +257,18 @@ export function schedulePush(docName: string, state: Buffer): void {

    const timer = setTimeout(() => pushDocument(docName), DEBOUNCE_MS);
    dirtyDocs.set(docName, { state, timer });
+   status.pendingPush = true;
  }

  async function pushDocument(docName: string): Promise<void> {
    const entry = dirtyDocs.get(docName);
    if (!entry) return;
    dirtyDocs.delete(docName);
+   // Update `pendingPush` based on whether OTHER docs still have a
+   // timer armed. A single editor only ever touches one doc, so in
+   // practice this is always false after `delete`, but the multi-doc
+   // case must not lie either.
+   status.pendingPush = dirtyDocs.size > 0;

    await ensureDatasetExists();

@@ -159,8 +283,14 @@ async function pushDocument(docName: string): Promise<void> {
        commitTitle: `save ${safeName}`,
      });
      console.log(`[hf-storage] pushed ${path}`);
-   } catch (err) {
-     console.error(`[hf-storage] failed to push ${path}:`, err);
+     recordCloudPush(docName);
+   } catch (err: any) {
+     const statusCode = err?.statusCode ?? err?.status;
+     console.error(
+       `[hf-storage] failed to push ${path}` +
+         ` (status=${statusCode ?? "unknown"}): ${(err as Error)?.message || err}`,
+     );
+     recordCloudPushError(docName, err, statusCode);
    }
  }
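The `pendingPush` bookkeeping in `schedulePush` / `pushDocument` amounts to a per-doc debounce map whose size drives the flag. A minimal sketch, assuming an injected `push` callback in place of the real `ensureDatasetExists` + `uploadFile` calls and a hypothetical `createPushScheduler` factory; the sketch also clears any previously armed timer for the same doc before re-arming it.

```typescript
type PushFn = (docName: string, state: Uint8Array) => Promise<void>;

// Hypothetical factory: debounce pushes per doc and expose a pendingPush flag.
function createPushScheduler(push: PushFn, debounceMs: number) {
  const dirtyDocs = new Map<string, { state: Uint8Array; timer: ReturnType<typeof setTimeout> }>();
  const status = { pendingPush: false };

  async function flush(docName: string): Promise<void> {
    const entry = dirtyDocs.get(docName);
    if (!entry) return;
    clearTimeout(entry.timer); // no-op when called from the timer itself
    dirtyDocs.delete(docName);
    // Only true while OTHER docs still have a timer armed.
    status.pendingPush = dirtyDocs.size > 0;
    await push(docName, entry.state);
  }

  function schedule(docName: string, state: Uint8Array): void {
    const prev = dirtyDocs.get(docName);
    if (prev) clearTimeout(prev.timer); // re-arm: the latest state wins
    const timer = setTimeout(() => {
      flush(docName).catch((err) => console.error("[push] failed:", err));
    }, debounceMs);
    dirtyDocs.set(docName, { state, timer });
    status.pendingPush = true;
  }

  return { schedule, flush, status };
}

// Usage: flush() updates pendingPush before its first await, so the flag
// can be read synchronously right after calling it.
const pushed: string[] = [];
const scheduler = createPushScheduler(async (name) => { pushed.push(name); }, 12_000);
scheduler.schedule("a", new Uint8Array([1]));
scheduler.schedule("b", new Uint8Array([2]));
void scheduler.flush("a");
console.log(scheduler.status.pendingPush); // true ("b" is still armed)
void scheduler.flush("b");
console.log(scheduler.status.pendingPush); // false
```

Deriving the flag from `dirtyDocs.size` rather than toggling it per call is what keeps the multi-doc case honest.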
 
backend/src/persistence.ts CHANGED
@@ -8,6 +8,8 @@ import {
    setUserToken,
    pullPublishedAssets,
    schedulePush,
+   recordLocalSave,
+   recordLocalSaveError,
  } from "./hf-storage.js";

  const DEFAULT_DOC_NAME = "default";
@@ -27,12 +29,19 @@ export function debouncedSave(documentName: string, ydoc: Y.Doc) {
        const buf = Buffer.from(state);
        writeFileSync(docPath(documentName), buf);
        lastSaveTimestamp.set(documentName, Date.now());
+       recordLocalSave(documentName);
        console.log(`[persist] saved "${documentName}": ${buf.length} bytes`);

        if (isHfStorageEnabled()) {
          schedulePush(documentName, buf);
        }
      } catch (err) {
+       // A failed local save is the most dangerous silent failure
+       // because the WS layer already ack'd the edit to the client,
+       // who therefore sees "Saved" while nothing is on disk. Push
+       // the error into the status tracker so the SyncIndicator can
+       // flip to "Local save failed" within a few seconds.
+       recordLocalSaveError(documentName, err);
        console.error(`[persist] failed to save "${documentName}":`, (err as Error).message);
      }
    }, SAVE_DEBOUNCE_MS));
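The shape of the persistence.ts change (record success right after the write, record the failure in the catch) can be tried in isolation. A minimal sketch using Node's `fs.writeFileSync`; `saveLocally` and the `onSaved` / `onError` hooks are hypothetical stand-ins for `recordLocalSave` / `recordLocalSaveError`, and the debounce timer is left out.

```typescript
import { writeFileSync, mkdtempSync } from "fs";
import { tmpdir } from "os";
import { join } from "path";

interface LocalSaveHooks {
  onSaved: (docName: string) => void;
  onError: (docName: string, err: unknown) => void;
}

// Write the snapshot and report the outcome instead of only logging it.
// Returns true on success so a caller can decide whether to schedule a cloud push.
function saveLocally(path: string, docName: string, buf: Buffer, hooks: LocalSaveHooks): boolean {
  try {
    writeFileSync(path, buf);
    hooks.onSaved(docName);
    return true;
  } catch (err) {
    hooks.onError(docName, err);
    console.error(`[persist] failed to save "${docName}":`, (err as Error).message);
    return false;
  }
}

// Usage: one write into a fresh temp dir succeeds, one into a missing dir fails.
const events: string[] = [];
const hooks: LocalSaveHooks = {
  onSaved: (name) => events.push(`saved:${name}`),
  onError: (name) => events.push(`error:${name}`),
};
const dir = mkdtempSync(join(tmpdir(), "yjs-"));
saveLocally(join(dir, "default.yjs"), "default", Buffer.from([1, 2, 3]), hooks);
saveLocally(join(dir, "missing", "other.yjs"), "other", Buffer.from([4]), hooks);
console.log(events); // [ 'saved:default', 'error:other' ]
```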
backend/src/routes/auth.ts CHANGED
@@ -7,7 +7,7 @@ import {
    handleOAuthCallback,
    handleOAuthLogout,
  } from "../auth.js";
- import { setUserToken } from "../hf-storage.js";
+ import { setUserToken, ensureDatasetExists } from "../hf-storage.js";
  import { ensurePublishedRestored } from "../persistence.js";

  export interface AuthContext {
@@ -61,6 +61,19 @@ export function createAuthRouter(ctx: AuthContext): Router {
      if (user.canEdit && token) {
        setUserToken(token);
        ensurePublishedRestored(token).catch(() => {});
+       // Eagerly try to create the backing dataset on first login
+       // (instead of waiting for the first edit + 12s debounce). If
+       // creation fails - missing manage-repos scope, org policy
+       // blocking the create, etc. - the error lands in the
+       // storage-status tracker right away, and the editor's
+       // SyncIndicator can flash a red "Cloud storage error" on the
+       // first poll. Without this, a misconfigured fork looks
+       // perfectly healthy until the user has typed something and
+       // waited half a minute, by which point they've already
+       // assumed everything is fine.
+       ensureDatasetExists(token).catch((err) => {
+         console.warn("[auth] eager ensureDatasetExists failed:", (err as Error).message);
+       });
      }

      res.json({
backend/src/routes/storage.ts ADDED
@@ -0,0 +1,87 @@
+ import { Router, type Request, type Response } from "express";
+ import { createReadStream, existsSync } from "fs";
+ import { extractToken, resolveUser } from "../auth.js";
+ import { getStorageStatus } from "../hf-storage.js";
+ import { docPath, sanitizeName } from "../utils.js";
+
+ interface StorageContext {
+   oauthEnabled: boolean;
+ }
+
+ /**
+  * Routes that surface the state of the persistence pipeline and
+  * provide manual escape hatches for disaster recovery.
+  *
+  * GET /api/storage/status
+  *   Returns the current pipeline state (datasetReady,
+  *   lastLocalSaveAt, lastCloudPushAt, pendingPush, lastError).
+  *   Polled by the SyncIndicator every few seconds. Auth-gated to
+  *   canEdit so we don't leak error details (which may include
+  *   dataset ids) to anonymous viewers.
+  *
+  * GET /api/admin/export-doc?name=<docName>
+  *   Streams the raw `.yjs` file off disk so an editor can
+  *   download a snapshot manually. The cardinal use case is
+  *   "the cloud push has been failing for hours, the container
+  *   is about to rebuild, I need to get my data out NOW". Same
+  *   canEdit gate.
+  */
+ export function createStorageRouter(ctx: StorageContext): Router {
+   const router = Router();
+
+   router.get("/api/storage/status", async (req, res) => {
+     if (ctx.oauthEnabled) {
+       const token = extractToken(req.headers.cookie);
+       const user = await resolveUser(token);
+       if (!user || !user.canEdit) {
+         res.status(403).json({ error: "Unauthorized" });
+         return;
+       }
+     }
+     res.json(getStorageStatus());
+   });
+
+   router.get("/api/admin/export-doc", async (req: Request, res: Response) => {
+     if (ctx.oauthEnabled) {
+       const token = extractToken(req.headers.cookie);
+       const user = await resolveUser(token);
+       if (!user || !user.canEdit) {
+         res.status(403).json({ error: "Unauthorized" });
+         return;
+       }
+     }
+
+     // `name` is the doc id; default to the only doc the editor
+     // currently supports ("default"). Sanitised before touching FS.
+     const rawName = typeof req.query.name === "string" ? req.query.name : "default";
+     const safeName = sanitizeName(rawName);
+     const path = docPath(safeName);
+
+     if (!existsSync(path)) {
+       res.status(404).json({ error: "No on-disk snapshot for that doc" });
+       return;
+     }
+
+     // Force a download with a date-stamped filename so multiple
+     // exports don't overwrite each other in the user's downloads
+     // folder. ISO date without colons (Windows / macOS Downloads
+     // are happier without them).
+     const stamp = new Date().toISOString().replace(/[:.]/g, "-");
+     res.setHeader("Content-Type", "application/octet-stream");
+     res.setHeader(
+       "Content-Disposition",
+       `attachment; filename="${safeName}-${stamp}.yjs"`,
+     );
+     res.setHeader("Cache-Control", "no-store");
+
+     const stream = createReadStream(path);
+     stream.on("error", (err) => {
+       console.error(`[admin-export] stream error for ${safeName}:`, err.message);
+       if (!res.headersSent) res.status(500).end();
+       else res.end();
+     });
+     stream.pipe(res);
+   });
+
+   return router;
+ }
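The date-stamped filename in the export route is plain string work and can be checked on its own. A small sketch; `exportFilename` and `contentDisposition` are hypothetical helpers that reproduce the expressions used in the route above.

```typescript
// Build the download name the export route uses: <doc>-<ISO timestamp>.yjs,
// with ':' and '.' replaced by '-' so the name is safe in Windows / macOS folders.
function exportFilename(safeName: string, now: Date = new Date()): string {
  const stamp = now.toISOString().replace(/[:.]/g, "-");
  return `${safeName}-${stamp}.yjs`;
}

// Header value that forces a download instead of inline display.
function contentDisposition(filename: string): string {
  return `attachment; filename="${filename}"`;
}

const when = new Date(Date.UTC(2024, 0, 2, 3, 4, 5, 678));
console.log(exportFilename("default", when));
// default-2024-01-02T03-04-05-678Z.yjs
```

Stamping to the millisecond means repeated exports of the same doc land as separate files rather than overwriting one another.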
docs/SPECIFICATION.md CHANGED
@@ -99,7 +99,17 @@ All types are concurrently editable by multiple users and persist to `data/defau
  - **Images**: `images/<uuid-filename>` referenced from articles via `/d/images/...` proxy URLs
  - `flushAll()` on `SIGTERM`/`SIGINT` to push pending changes

- ### 4.3.1 Dataset Reverse Proxy (`/d/*`)
+ ### 4.3.1 Storage Status & Recovery
+
+ The persistence pipeline used to fail silently in multiple places (`createRepo` 403 on a missing scope, `uploadFile` 5xx mid-debounce, `writeFileSync` on a readonly FS, ...) and the editor would happily keep showing "Saved". To make data first-class:
+
+ - **In-memory tracker** in `hf-storage.ts` records `datasetReady`, `lastLocalSaveAt`, `lastCloudPushAt`, `pendingPush`, `lastError {stage, message, statusCode, at, docName}`. Every write path updates it; every error path records the failure.
+ - **`GET /api/storage/status`** exposes the tracker (canEdit-gated). The frontend `SyncIndicator` polls it every 5s and displays a three-state badge: green "Saved" / amber "Saving..." / **red "Storage error"** (pulsing, with the exact reason in the tooltip + actionable hint for the 403 / missing-scope case).
+ - **Eager `ensureDatasetExists`** on first `/api/auth/status` for a canEdit user. A misconfigured fork now surfaces its error within ~10s of login instead of waiting for an edit + 12s debounce cycle.
+ - **`beforeunload` guard** on the editor: if a local edit is in flight, a push is armed, the WS is offline, or the tracker reports an error, the browser pops the standard "Leave site?" confirm.
+ - **`GET /api/admin/export-doc`** (canEdit-gated) streams the on-disk `.yjs` snapshot as a download. The escape hatch for disaster recovery: when the cloud push has been failing and the container is about to rebuild, an admin can grab the doc bytes manually.
+
+ ### 4.3.2 Dataset Reverse Proxy (`/d/*`)

  Since the dataset is private, anonymous viewers of a published article can't fetch its images / PDF / og:image directly from `huggingface.co/datasets/...`. The editor server exposes `GET /d/:path*` as an authenticated forward-proxy:
 
docs/TESTS.md CHANGED
@@ -90,6 +90,17 @@ Data loss is game over for a collaborative editor.
  | 2.3.3 | Proxy serves images | Given an uploaded image / When GET `/d/images/<file>` / Then 200 + image bytes (proxy attaches a server-side token to fetch the private HF dataset) | P1 |
  | 2.3.4 | Proxy whitelist | Given any path under `/d/articles/...` (raw Y.js drafts) / When GET / Then 404 - never expose drafts via the proxy | P0 |

+ ### 2.4 Storage status & disaster recovery
+
+ | # | Test | Given / When / Then | Priority |
+ |---|------|---------------------|----------|
+ | 2.4.1 | Status surfaces dataset error | Given `createRepo` returns 403 / When GET /api/storage/status / Then response `lastError.stage === "dataset-create"` with `statusCode: 403` | P0 |
+ | 2.4.2 | Status clears on recovery | Given a previous push error / When the next push succeeds / Then `lastError` is null and `lastCloudPushAt` is updated | P1 |
+ | 2.4.3 | Status auth-gated | Given an anonymous request (oauthEnabled) / When GET /api/storage/status / Then 403 - don't leak dataset error details | P1 |
+ | 2.4.4 | Eager creation on login | Given a successful /api/auth/status with canEdit / When the request completes / Then `ensureDatasetExists` has been attempted (success surfaces within one storage-status poll, failure surfaces too) | P0 |
+ | 2.4.5 | Admin export streams .yjs | Given an editor user / When GET /api/admin/export-doc?name=default / Then 200 + `Content-Disposition: attachment` + raw .yjs body | P1 |
+ | 2.4.6 | Admin export auth-gated | Given a non-canEdit request / When GET /api/admin/export-doc / Then 403 | P0 |
+
  ---

  ## 3. API Routes - HTTP Contracts (P1)
frontend/src/components/SyncIndicator.tsx CHANGED
@@ -1,94 +1,97 @@
1
  import { useEffect, useRef, useState } from "react";
2
  import type { Editor as TiptapEditor } from "@tiptap/core";
3
  import type { HocuspocusProvider } from "@hocuspocus/provider";
4
- import { Cloud, CloudOff, Loader2 } from "lucide-react";
5
  import { Tooltip } from "./Tooltip";
6
 
7
- type SyncStatus = "connected" | "syncing" | "saved" | "disconnected";
8
 
9
  interface Props {
10
  editorInstance: TiptapEditor | null;
11
  providerRef: { current: HocuspocusProvider | null };
12
  }
13
 
14
- /**
15
- * Tiny connection/saving indicator for the editor top bar. Listens to the
16
- * Hocuspocus provider (`connect` / `disconnect` / `synced`) for connection
17
- * status, and to the underlying Yjs document (`update`) for change
18
- * activity.
19
- *
20
- * Why ydoc.on('update') and not editor.on('update'):
21
- * TipTap's `update` event only fires for changes to the prosemirror
22
- * document (the article body). The hero (title/subtitle/authors/...)
23
- * and the article settings live in sibling Y.Maps on the same ydoc,
24
- * and edits there go straight to Yjs without round-tripping through
25
- * TipTap. Listening on the prosemirror editor would therefore miss
26
- * every hero / settings / citation / embed change and make the user
27
- * believe nothing is being saved - even though the backend's
28
- * Hocuspocus `onChange` handler is happily persisting the full ydoc
29
- * on every Yjs update. Subscribing at the Yjs layer fixes that: we
30
- * see every change the backend sees, in the same order it sees them.
31
- */
32
  export function SyncIndicator({ editorInstance: _editorInstance, providerRef }: Props) {
33
- const [status, setStatus] = useState<SyncStatus>("disconnected");
34
- const timerRef = useRef<ReturnType<typeof setTimeout> | null>(null);
35
-
36
- // The provider is created lazily by <Editor> and assigned to
37
- // providerRef AFTER this component mounts. A ref doesn't trigger
38
- // re-renders, so a useEffect with [providerRef] only fires once -
39
- // typically before the provider exists - and would never re-run to
40
- // attach listeners. We poll providerRef.current here until it
41
- // populates, then subscribe and stop polling. Once we hold the
42
- // provider reference, we also seed the initial status from the
43
- // underlying ws layer so we don't get stuck on "disconnected" if
44
- // we missed the very first `connect` event.
45
  useEffect(() => {
46
- let provider: HocuspocusProvider | null = providerRef.current;
47
  let pollId: ReturnType<typeof setInterval> | null = null;
48
 
49
  const attach = (p: HocuspocusProvider) => {
50
- const onConnect = () => setStatus("saved");
51
- const onDisconnect = () => setStatus("disconnected");
52
- const onSynced = () => setStatus("saved");
53
  p.on("connect", onConnect);
54
  p.on("disconnect", onDisconnect);
55
  p.on("synced", onSynced);
56
 
57
- const seedStatus = () => {
58
- const wsProvider = (p as any).configuration?.websocketProvider;
59
- const s = wsProvider?.status;
60
- if (s === "connected") setStatus("saved");
61
- else if (s === "connecting") setStatus("disconnected");
62
- else setStatus("disconnected");
63
  };
64
- seedStatus();
65
- // Connection may still be in flight when we first attach; the
66
- // `connect` event fires once it lands. As a safety net (some
67
- // browsers / HF proxy flows have been observed to skip the
68
- // first event), poll the underlying ws status every second
69
- // and reconcile. Cheap and self-correcting.
70
- const reconcileId = setInterval(seedStatus, 1000);
71
-
72
- // Listen at the Yjs layer for change activity. Hocuspocus
73
- // stores the ydoc on `provider.document` (it's the same Y.Doc
74
- // we passed in when constructing the provider). Filtering out
75
- // updates whose origin is the provider itself avoids flashing
76
- // "Saving..." every time we just RECEIVED collab updates from
77
- // someone else - those don't need to be saved by us, they were
78
- // saved by their author.
79
  const ydoc = (p as any).document as
80
  | { on: Function; off: Function }
81
  | undefined;
82
- const onYUpdate = (_update: Uint8Array, origin: unknown) => {
83
- if (origin === p) return;
84
- setStatus("syncing");
85
- if (timerRef.current) clearTimeout(timerRef.current);
86
- timerRef.current = setTimeout(() => {
87
- const wsProvider = (p as any).configuration?.websocketProvider;
88
- setStatus(
89
- wsProvider?.status === "connected" ? "saved" : "disconnected",
90
- );
91
- }, 1500);
92
  };
93
  ydoc?.on?.("update", onYUpdate);
94
 
@@ -97,26 +100,24 @@ export function SyncIndicator({ editorInstance: _editorInstance, providerRef }:
97
  p.off("disconnect", onDisconnect);
98
  p.off("synced", onSynced);
99
  ydoc?.off?.("update", onYUpdate);
100
- clearInterval(reconcileId);
101
  };
102
  };
103
 
104
- if (provider) {
105
- return attach(provider);
106
- }
107
-
108
- // Provider not ready yet - wait for it.
109
  let cleanup: (() => void) | null = null;
110
- pollId = setInterval(() => {
111
- const p = providerRef.current;
112
- if (!p) return;
113
- if (pollId) {
114
- clearInterval(pollId);
115
- pollId = null;
116
- }
117
- provider = p;
118
- cleanup = attach(p);
119
- }, 100);
120
 
121
  return () => {
122
  if (pollId) clearInterval(pollId);
@@ -124,27 +125,150 @@ export function SyncIndicator({ editorInstance: _editorInstance, providerRef }:
124
  };
125
  }, [providerRef]);
126
 
127
- const label =
128
- status === "saved" ? "Saved" :
129
- status === "syncing" ? "Saving..." :
130
- status === "connected" ? "Connected" :
131
- "Offline";
132
 
133
- const tooltip =
134
- status === "saved" ? "All changes saved" :
135
- status === "syncing" ? "Saving..." :
136
- status === "connected" ? "Connected" :
137
- "Disconnected";
138
 
139
  return (
140
  <Tooltip title={tooltip}>
141
- <span className={`sync-indicator sync-indicator--${status}`}>
142
- {status === "syncing" && <Loader2 size={14} className="spin" />}
143
- {status === "saved" && <Cloud size={14} />}
144
- {status === "connected" && <Cloud size={14} />}
145
- {status === "disconnected" && <CloudOff size={14} />}
146
  <span className="sync-indicator__label">{label}</span>
147
  </span>
148
  </Tooltip>
149
  );
150
  }
  import { useEffect, useRef, useState } from "react";
  import type { Editor as TiptapEditor } from "@tiptap/core";
  import type { HocuspocusProvider } from "@hocuspocus/provider";
+ import { Cloud, CloudOff, AlertTriangle, Loader2 } from "lucide-react";
  import { Tooltip } from "./Tooltip";

+ /**
+ * Server-side persistence pipeline state, as returned by
+ * `GET /api/storage/status`. Mirrors the shape of `StorageStatus`
+ * in `backend/src/hf-storage.ts` - if you add a field there, add
+ * it here too.
+ */
+ interface StorageStatus {
+ enabled: boolean;
+ datasetId: string;
+ datasetReady: boolean;
+ lastLocalSaveAt: number | null;
+ lastCloudPushAt: number | null;
+ pendingPush: boolean;
+ lastError: {
+ stage: "dataset-create" | "local-save" | "cloud-push";
+ message: string;
+ statusCode?: number;
+ at: number;
+ docName?: string;
+ } | null;
+ }
+
+ /**
+ * What the user sees in the top bar, derived from BOTH the WS
+ * connection state AND the server-side persistence pipeline.
+ *
+ * Severity ordering (worst first): error > offline > pending > saved.
+ * The displayed status is always the worst applicable signal, so
+ * a green "Saved" never wins over a red "Sync failed".
+ */
+ type DisplayStatus =
+ | "saved" // WS connected + no error + nothing pending
+ | "pending" // edit in flight or push timer armed
+ | "offline" // WS disconnected (network or container restart)
+ | "error"; // backend reports lastError in the pipeline

  interface Props {
  editorInstance: TiptapEditor | null;
  providerRef: { current: HocuspocusProvider | null };
  }

+ const POLL_MS = 5_000;
+
  export function SyncIndicator({ editorInstance: _editorInstance, providerRef }: Props) {
+ const [wsConnected, setWsConnected] = useState(false);
+ const [hasLocalEdit, setHasLocalEdit] = useState(false);
+ const [serverStatus, setServerStatus] = useState<StorageStatus | null>(null);
+ const editTimerRef = useRef<ReturnType<typeof setTimeout> | null>(null);
+
+ // ---- WS layer: connection + edit activity ------------------------------
+ // Same lazy-provider polling pattern as the previous version: the
+ // provider is created by <Editor> AFTER this component mounts, so a
+ // useEffect([providerRef]) alone would never re-fire when it lands.
  useEffect(() => {
  let pollId: ReturnType<typeof setInterval> | null = null;

  const attach = (p: HocuspocusProvider) => {
+ const onConnect = () => setWsConnected(true);
+ const onDisconnect = () => setWsConnected(false);
+ const onSynced = () => setWsConnected(true);
  p.on("connect", onConnect);
  p.on("disconnect", onDisconnect);
  p.on("synced", onSynced);

+ // Seed + 1s reconcile loop (some HF proxies eat the first
+ // `connect` event; without this we'd stay "Offline" forever).
+ const seed = () => {
+ const ws = (p as any).configuration?.websocketProvider;
+ setWsConnected(ws?.status === "connected");
  };
+ seed();
+ const reconcile = setInterval(seed, 1000);
+
+ // Listen at the Yjs layer so we see EVERY change (TipTap's
+ // own `update` event misses hero/settings/citation edits
+ // because those bypass prosemirror).
  const ydoc = (p as any).document as
  | { on: Function; off: Function }
  | undefined;
+ const onYUpdate = (_u: Uint8Array, origin: unknown) => {
+ if (origin === p) return; // remote update, not ours
+ setHasLocalEdit(true);
+ if (editTimerRef.current) clearTimeout(editTimerRef.current);
+ // Local edit "settles" 1.5s after the last keystroke; this
+ // is also roughly when the backend's `debouncedSave` fires
+ // (2s), so the indicator briefly flashes pending then
+ // recovers to saved/error based on the next poll.
+ editTimerRef.current = setTimeout(() => setHasLocalEdit(false), 1500);
  };
  ydoc?.on?.("update", onYUpdate);

  p.off("disconnect", onDisconnect);
  p.off("synced", onSynced);
  ydoc?.off?.("update", onYUpdate);
+ clearInterval(reconcile);
  };
  };

  let cleanup: (() => void) | null = null;
+ if (providerRef.current) {
+ cleanup = attach(providerRef.current);
+ } else {
+ pollId = setInterval(() => {
+ const p = providerRef.current;
+ if (!p) return;
+ if (pollId) {
+ clearInterval(pollId);
+ pollId = null;
+ }
+ cleanup = attach(p);
+ }, 100);
+ }

  return () => {
  if (pollId) clearInterval(pollId);
  };
  }, [providerRef]);

+ // ---- Server pipeline polling -------------------------------------------
+ // Cheap GET every 5s. The backend tracker updates in-process on
+ // every save/push/error so the worst-case latency for surfacing
+ // a problem is one poll interval. We don't use SSE/WS for this
+ // because the data is tiny, the polling interval is generous,
+ // and another long-lived connection would add more failure modes
+ // than it's worth.
+ useEffect(() => {
+ let cancelled = false;
+ let timer: ReturnType<typeof setTimeout> | null = null;

+ const poll = async () => {
+ try {
+ const res = await fetch("/api/storage/status", {
+ credentials: "include",
+ });
+ if (cancelled) return;
+ if (res.ok) {
+ const data = (await res.json()) as StorageStatus;
+ setServerStatus(data);
+ } else if (res.status === 403) {
+ // Viewer (not an editor) - storage status isn't relevant
+ // to them. Stop polling.
+ return;
+ }
+ } catch {
+ // Network blip - keep trying. The WS disconnection will
+ // dominate the UI anyway in that case.
+ }
+ if (!cancelled) {
+ timer = setTimeout(poll, POLL_MS);
+ }
+ };
+ poll();
+
+ return () => {
+ cancelled = true;
+ if (timer) clearTimeout(timer);
+ };
+ }, []);
+
+ // ---- Derive the displayed status ---------------------------------------
+ // Worst-applicable wins (see DisplayStatus jsdoc).
+ const status: DisplayStatus = (() => {
+ if (serverStatus?.lastError) return "error";
+ if (!wsConnected) return "offline";
+ if (hasLocalEdit || serverStatus?.pendingPush) return "pending";
+ return "saved";
+ })();
+
+ // ---- beforeunload guard ------------------------------------------------
+ // If there's an unsynced local edit, a pending push, a known sync
+ // error, OR a dropped WS connection, the browser should show its
+ // standard "Leave site?" confirmation. The exact message is ignored
+ // by modern browsers (Chrome/Safari/Firefox show their own generic
+ // copy); calling `preventDefault()` is what triggers the prompt in
+ // current browsers, and setting `returnValue` covers older ones.
+ useEffect(() => {
+ const needsGuard = status === "pending" || status === "error" || status === "offline";
+ if (!needsGuard) return;
+
+ const handler = (e: BeforeUnloadEvent) => {
+ e.preventDefault();
+ // Legacy browsers (and the TS typings, which still declare it) expect a string.
+ e.returnValue = "";
+ return "";
+ };
+ window.addEventListener("beforeunload", handler);
+ return () => window.removeEventListener("beforeunload", handler);
+ }, [status]);
+
+ // ---- Render ------------------------------------------------------------
+ const { icon, label, tooltip } = renderState(status, serverStatus);

  return (
  <Tooltip title={tooltip}>
+ <span
+ className={`sync-indicator sync-indicator--${status}`}
+ role="status"
+ aria-live="polite"
+ >
+ {icon}
  <span className="sync-indicator__label">{label}</span>
  </span>
  </Tooltip>
  );
  }
+
+ function renderState(status: DisplayStatus, server: StorageStatus | null) {
+ switch (status) {
+ case "error": {
+ const err = server?.lastError;
+ const stageLabel: Record<string, string> = {
+ "dataset-create": "Cloud storage setup failed",
+ "local-save": "Local save failed",
+ "cloud-push": "Cloud sync failed",
+ };
+ const label = err ? stageLabel[err.stage] ?? "Storage error" : "Storage error";
+ const hint = err?.statusCode === 403
+ ? " - your OAuth grant may be missing the `manage-repos` scope. Sign out and back in."
+ : "";
+ const tooltip = err
+ ? `${label}: ${err.message}${hint}`
+ : "Storage error";
+ return {
+ icon: <AlertTriangle size={14} />,
+ label,
+ tooltip,
+ };
+ }
+ case "offline":
+ return {
+ icon: <CloudOff size={14} />,
+ label: "Offline",
+ tooltip: "Disconnected - reconnecting...",
+ };
+ case "pending":
+ return {
+ icon: <Loader2 size={14} className="spin" />,
+ label: "Saving...",
+ tooltip: server?.datasetReady === false
+ ? "Saving locally - cloud sync starts after first successful dataset creation"
+ : "Saving to cloud...",
+ };
+ case "saved":
+ default: {
+ const last = server?.lastCloudPushAt;
+ const tooltip = last
+ ? `All changes saved · last cloud sync ${formatRelative(last)}`
+ : server?.datasetReady
+ ? "All changes saved"
+ : "Saved locally - cloud sync will start on first change";
+ return {
+ icon: <Cloud size={14} />,
+ label: "Saved",
+ tooltip,
+ };
+ }
+ }
+ }
+
+ function formatRelative(ts: number): string {
+ const diff = Date.now() - ts;
+ if (diff < 5_000) return "just now";
+ if (diff < 60_000) return `${Math.round(diff / 1000)}s ago`;
+ if (diff < 3_600_000) return `${Math.round(diff / 60_000)}min ago`;
+ return new Date(ts).toLocaleTimeString();
+ }
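The worst-applicable-wins derivation in `SyncIndicator` is a chain of early returns checked in severity order, which makes it easy to lift into a pure function and unit test. The sketch below assumes that refactor; `deriveStatus`, `needsUnloadGuard` and `StatusInputs` are hypothetical names that do not appear in the commit, and `server` is narrowed to the two polled fields the derivation actually reads.

```typescript
type DisplayStatus = "saved" | "pending" | "offline" | "error";

// Hypothetical input bundle: the two React state flags plus the subset of
// the polled StorageStatus that the derivation reads (null before the first
// successful poll, or for viewers who got a 403).
interface StatusInputs {
  wsConnected: boolean;
  hasLocalEdit: boolean;
  server: { lastError: object | null; pendingPush: boolean } | null;
}

// Checks run worst-first (error > offline > pending > saved), so the
// first match is always the most severe applicable signal.
function deriveStatus({ wsConnected, hasLocalEdit, server }: StatusInputs): DisplayStatus {
  if (server?.lastError) return "error";
  if (!wsConnected) return "offline";
  if (hasLocalEdit || server?.pendingPush) return "pending";
  return "saved";
}

// Same rule as the component's beforeunload effect: every state except
// "saved" should make the browser confirm before leaving.
function needsUnloadGuard(status: DisplayStatus): boolean {
  return status === "pending" || status === "error" || status === "offline";
}

// Usage: a reported pipeline error outranks a dropped socket and a pending edit.
console.log(
  deriveStatus({
    wsConnected: false,
    hasLocalEdit: true,
    server: { lastError: { stage: "cloud-push" }, pendingPush: true },
  }),
); // prints: error
```

Because the checks run worst-first, neither a stale `pendingPush` nor a disconnected socket can mask a reported `lastError`, which is the "worst signal always wins" property the commit is after.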
frontend/src/styles/_ui.css CHANGED
@@ -1359,9 +1359,21 @@ dialog.ed-dialog.ed-dialog--author { max-width: 480px; }
  }
  .sync-indicator__label { white-space: nowrap; }
  .sync-indicator--saved { color: var(--ed-text-disabled); opacity: 0.7; }
- .sync-indicator--syncing { color: var(--ed-accent, #958df1); opacity: 1; }
- .sync-indicator--connected { color: var(--ed-text-disabled); opacity: 0.5; }
- .sync-indicator--disconnected { color: #e15759; opacity: 1; }
  .sync-indicator .spin {
  animation: sync-spin 1s linear infinite;
  }
 
  }
  .sync-indicator__label { white-space: nowrap; }
  .sync-indicator--saved { color: var(--ed-text-disabled); opacity: 0.7; }
+ .sync-indicator--pending { color: var(--ed-accent, #958df1); opacity: 1; }
+ .sync-indicator--offline { color: #e15759; opacity: 1; }
+ /* `error` is the loudest state - bright red with a background tint
+ * that pulses, to scream "your data isn't safe". The other
+ * states stay flat to avoid attention-seeking noise. */
+ .sync-indicator--error {
+ color: #e15759;
+ background: color-mix(in srgb, #e15759 12%, transparent);
+ opacity: 1;
+ animation: sync-indicator-pulse 2s ease-in-out infinite;
+ }
+ @keyframes sync-indicator-pulse {
+ 0%, 100% { background: color-mix(in srgb, #e15759 12%, transparent); }
+ 50% { background: color-mix(in srgb, #e15759 22%, transparent); }
+ }
  .sync-indicator .spin {
  animation: sync-spin 1s linear infinite;
  }
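The component only reads the single `lastError` field; the tracker that fills it lives in `backend/src/hf-storage.ts`, which is not part of this diff. The commit message states that success on one stage clears the error for that stage only. One way to keep that guarantee even when two stages fail in overlap is sketched below: keep one error slot per stage and expose the newest remaining one as `lastError`. `StorageTracker`, `recordSuccess` and `recordError` are hypothetical names, and the per-stage map is an assumption, not a description of the real tracker.

```typescript
type Stage = "dataset-create" | "local-save" | "cloud-push";

interface StageError {
  stage: Stage;
  message: string;
  statusCode?: number;
  at: number; // epoch ms
  docName?: string;
}

// Hypothetical in-memory tracker. Field names follow the StorageStatus
// interface in the diff; storing one error per stage is an assumption.
class StorageTracker {
  datasetReady = false;
  lastLocalSaveAt: number | null = null;
  lastCloudPushAt: number | null = null;
  private readonly errors = new Map<Stage, StageError>();

  recordSuccess(stage: Stage, now: number = Date.now()): void {
    if (stage === "dataset-create") this.datasetReady = true;
    else if (stage === "local-save") this.lastLocalSaveAt = now;
    else this.lastCloudPushAt = now;
    // Clear the error for THIS stage only; other stages keep theirs.
    this.errors.delete(stage);
  }

  recordError(
    stage: Stage,
    message: string,
    statusCode?: number,
    docName?: string,
    now: number = Date.now(),
  ): void {
    this.errors.set(stage, { stage, message, statusCode, at: now, docName });
  }

  // The single field the frontend polls: the newest error still standing.
  get lastError(): StageError | null {
    let newest: StageError | null = null;
    this.errors.forEach((e) => {
      if (newest === null || e.at > newest.at) newest = e;
    });
    return newest;
  }
}
```

With a single overwritable slot, a `cloud-push` error recorded after a `dataset-create` failure would replace it, and the next successful push would then hide the setup problem; the per-stage map avoids that at the cost of three slots instead of one.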