tfrere HF Staff committed on
Commit 3557bec · verified · 1 Parent(s): 50f7a20

Update README.md

Files changed (1)
  1. README.md +1 -175
README.md CHANGED
@@ -1,5 +1,5 @@
  ---
- title: Research Article Template Editor
+ title: Carbon tokenization
  emoji: ✏️
  colorFrom: purple
  colorTo: blue
@@ -11,177 +11,3 @@ hf_oauth_scopes:
  - manage-repos
  - inference-api
  ---
-
- # Research Article Template Editor
-
- A collaborative, real-time editor for web-native scientific articles. It lets multiple authors co-write a paper with rich text, math, citations, figures and interactive D3 embeds, then publishes the result as a static HTML page (or a PDF) aligned with the [research-article-template](https://github.com/huggingface/research-article-template).
-
- ## What it gives you
-
- - **Real-time collaboration** over WebSocket (Y.js + Hocuspocus), with visible cursors and per-user selection colors
- - **Rich article authoring**: headings, lists, tables, code blocks with syntax highlighting, LaTeX math (KaTeX), footnotes, sidenotes, block quotes, callouts
- - **Research-specific blocks**: citations + bibliography (BibTeX), figures with captions, stacks / wide / full-width layouts, glossary terms, Mermaid/Wardley/architecture diagrams
- - **Interactive D3 embeds** authored inline: each embed is a self-contained HTML file the editor can generate and iterate on via an **AI-assisted "embed studio"**
- - **Comments & discussion** anchored on any selection
- - **Slash menu** (`/`) and drag/drop block handles, in the spirit of Notion
- - **Click-to-edit frontmatter**: title, subtitle, authors, affiliations, links, banner color
- - **Publishing pipeline**: one-click export to a standalone static HTML bundle, plus PDF generation (Puppeteer) and an `llms.txt` Markdown twin for LLM agents/crawlers (served at `/llms.txt`, advertised in `/robots.txt`)
- - **Persistence**:
-   - Local mode: documents stored on disk under `DATA_DIR`
-   - HF mode: documents pushed/pulled from a Hugging Face dataset via OAuth
- - **Dark mode**, responsive layout (TOC drawer on mobile), live table of contents with scroll-spy
- - **AI chat side-panel** that can edit the article via structured tool calls (agent loop over the current TipTap doc)
-
- ## Stack
-
- | Layer | Tech |
- |---|---|
- | Editor | React 18, TypeScript, TipTap v3, ProseMirror |
- | Collaboration | Y.js, Hocuspocus (WebSocket), y-tiptap |
- | Backend | Node.js, Express, Vite (dev proxy), Hocuspocus server |
- | Publishing | Custom TipTap-JSON → HTML renderer, Puppeteer for PDF |
- | AI | Vercel AI SDK v6 (`ai`, `@ai-sdk/react`) → Hugging Face Inference Providers (OpenAI-compatible router) |
- | Styling | Plain CSS with custom properties, no framework |
- | Storage | Local FS or Hugging Face datasets (via `@huggingface/hub`) |
- | Container | Single-image Docker build, runs on port 8080 |
-
- Around **3.6k LOC backend** and **9.5k LOC frontend** (TypeScript/TSX, excluding generated code).
-
- ## Repo layout
-
- ```
- collab-editor/
- ├── backend/ # Express + Hocuspocus server, publisher, AI agent routes
- │   └── src/
- │       ├── server.ts # Entry point
- │       ├── create-app.ts # App factory (routes, middleware, Hocuspocus)
- │       ├── publisher/ # TipTap-JSON → HTML + PDF
- │       ├── agent/ # LLM agent (tool calls over the doc)
- │       ├── shared/ # Component defs shared with the frontend
- │       └── hf-storage.ts # HF dataset sync
- ├── frontend/ # Vite + React + TipTap editor
- │   └── src/
- │       ├── App.tsx # Top-level shell
- │       ├── editor/ # TipTap editor + extensions + components
- │       ├── components/ # Shared UI pieces (TOC, Chat, Dialog, ...)
- │       ├── hooks/ # React hooks (agent chat, selection, ...)
- │       ├── styles/ # CSS layers (see docs/ARCHITECTURE.md)
- │       └── utils/
- ├── docs/
- │   ├── ARCHITECTURE.md # Deep dive on layers, data flow, CSS
- │   ├── SPECIFICATION.md # Feature spec and contracts
- │   ├── TESTS.md # Testing strategy
- │   └── embed-studio.md # How the AI-authored embeds pipeline works
- └── Dockerfile # Production multi-stage build
- ```
-
- See [`docs/ARCHITECTURE.md`](docs/ARCHITECTURE.md) for a diagram and the full tour.
-
- ## Getting started
-
- ### Prerequisites
-
- - Node.js 20+
- - A Hugging Face token with the `Make calls to Inference Providers` permission for the AI features (embed studio, chat agent). Generate one at https://huggingface.co/settings/tokens. On a HF Space the logged-in user's OAuth token is used instead - no manual setup needed.
- - A Hugging Face OAuth app (client id/secret) if you want login + HF dataset persistence
-
- ### Local development
-
- Backend and frontend run as two separate processes in dev (Vite proxies `/api`, `/collab`, `/uploads`, `/published`, `/oauth`, `/auth` to the backend).
-
- ```bash
- # terminal 1 — backend (Express + Hocuspocus on :8080)
- cd backend
- cp .env.example .env # set HF_TOKEN, optional OAUTH_* and HF_DATASET_ID
- npm install
- npm run dev
-
- # terminal 2 — frontend (Vite on :5678)
- cd frontend
- npm install
- npm run dev
- ```
-
- Then open http://localhost:5678. Open a second tab or browser to see collaboration in action.
-
- ### Production (Docker / HF Spaces)
-
- The `Dockerfile` builds both frontend and backend into a single image listening on port 8080. This is the image used by the Hugging Face Space.
-
- ```bash
- docker build -t collab-editor .
- docker run -p 8080:8080 --env-file backend/.env collab-editor
- ```
-
- Then open http://localhost:8080.
-
- ### Run your own copy on a Hugging Face Space
-
- Want your own editor? One step:
-
- 1. **Duplicate the Space.** On https://huggingface.co/spaces/tfrere/research-article-template-editor, click `⋯ → Duplicate this Space`. Pick your namespace and visibility. HF copies the Dockerfile, the OAuth wiring and rebuilds the image automatically.
-
- That's it. No API key to wire up. The AI features (chat agent + embed studio) call **Hugging Face Inference Providers** at `https://router.huggingface.co/v1` using the OAuth token of whoever is currently logged in. As long as your duplicated Space requests the `inference-api` scope (already declared in the README frontmatter as `hf_oauth_scopes`), every editor gets AI for free under their own Inference Providers quota.
-
- Optional public variable: `HF_INFERENCE_MODEL` (e.g. `meta-llama/Llama-3.3-70B-Instruct`) to override the default model id. The full list of supported chat-completion models lives at https://huggingface.co/models?inference_provider=all&other=conversational.
-
- ## Scripts
-
- ### Backend (`cd backend`)
-
- | Command | What it does |
- |---|---|
- | `npm run dev` | Start Express + Hocuspocus in watch mode |
- | `npm run build` | Compile TypeScript to `dist/` |
- | `npm start` | Run the compiled server |
- | `npm run test` | Unit + integration tests (Vitest) |
- | `npm run test:e2e` | End-to-end tests (Playwright) |
-
- ### Frontend (`cd frontend`)
-
- | Command | What it does |
- |---|---|
- | `npm run dev` | Start Vite dev server on :5678 |
- | `npm run build` | Production bundle to `dist/` |
- | `npm run preview` | Preview the built bundle |
- | `npm run test` | Unit tests (Vitest) |
- | `npm run typecheck` | `tsc --noEmit` on the whole frontend |
-
- ## Environment variables
-
- Copy `backend/.env.example` to `backend/.env` and fill the relevant values. Key ones:
-
- | Variable | Purpose |
- |---|---|
- | `OAUTH_CLIENT_ID` / `OAUTH_CLIENT_SECRET` | HF OAuth app for user login (required to edit when running on a Space) |
- | `OAUTH_SCOPES` | OAuth scopes (default `openid profile`). Add `manage-repos` for dataset persistence and `inference-api` to power the AI features with the user's token |
- | `HF_TOKEN` | Server-side Hugging Face token. Used as a fallback when no user OAuth token is present (e.g. local dev). Needs the `Make calls to Inference Providers` permission to enable the chat agent + embed studio |
- | `HF_INFERENCE_MODEL` | Override the default chat-completion model id (defaults to `openai/gpt-oss-120b`). Any tool-calling-capable model exposed by HF Inference Providers works |
- | `HF_DATASET_ID` | Target HF dataset repo for document persistence (when not running on a Space) |
- | `SPACE_ID` / `SPACE_HOST` | Auto-set by HF Spaces; drive dataset id + secure cookies in production |
- | `DATA_DIR` | Where documents, uploads and published bundles are stored on disk (default: `./data`) |
- | `PUBLISH_BASE_URL` | Absolute base URL used when publishing (defaults to `http://127.0.0.1:${PORT}`) |
- | `ENABLE_PDF` | Set to `false` to disable Playwright-based PDF export |
- | `PORT` | Server port (default 8080) |
-
- ## Testing
-
- - **Backend unit tests**: Vitest covers the publisher (HTML renderer, frontmatter, bibliography), storage, auth utilities.
- - **Backend E2E**: Playwright drives the full editor against a real backend.
- - **Frontend unit tests**: Vitest covers chat persistence and a handful of utilities.
- - **Type checking**: `npm run typecheck` in both workspaces.
-
- See [`docs/TESTS.md`](docs/TESTS.md) for the current strategy and gaps.
-
- ## Known technical debt
-
- These are tracked explicitly so new contributors don't trip on them:
-
- - **`useEmbedChat` still lacks dedicated unit tests**; the rest of the stores (frontmatter, comments, embeds) and the agent undo batching primitive are now covered.
- - **Bundle size warning**: the frontend bundle is over the 500 kB Vite warning threshold. Code-splitting the Mermaid / KaTeX / D3 stacks via dynamic imports would help.
- - **`addToolOutput` typing**: the ai-sdk v6 `ChatAddToolOutputFunction` is a generic over the tool name union. We currently cast to a plain signature at the two call sites because we don't export a typed tool registry yet.
- - **`backend/src/publisher/html-renderer.ts` is ~1000 LOC**: a per-node-type registry would make it more maintainable.
-
- ## License
-
- Follow the upstream [research-article-template](https://github.com/huggingface/research-article-template) license.