tfrere HF Staff Cursor committed on
Commit
6b6afea
·
1 Parent(s): 3de227d

feat(agent): switch AI backend from OpenRouter to HF Inference Providers


The chat panel and embed studio now call Hugging Face Inference Providers
(OpenAI-compatible router at https://router.huggingface.co/v1) via
@ai-sdk/openai-compatible, using the logged-in editor's OAuth token when
available and falling back to HF_TOKEN. Forking the Space no longer
requires wiring an OpenRouter API key - the user's own HF token funds
their own inference.

- Replace @openrouter/ai-sdk-provider with @ai-sdk/openai-compatible
- Forward the OAuth cookie token per request, fallback to HF_TOKEN
- Refresh AVAILABLE_MODELS to HF-served, tool-calling-capable models
(gpt-oss 120B/20B, Llama 3.3 70B, Qwen3 Coder 480B, DeepSeek V3.1)
- Add inference-api to hf_oauth_scopes and document the new env vars
(HF_INFERENCE_MODEL, expanded HF_TOKEN role) in README + SPECIFICATION
- Update agent-chat tests to expect the new "Hugging Face token" error

Co-authored-by: Cursor <cursoragent@cursor.com>
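
The token priority described in this commit message (cookie token first, `HF_TOKEN` second) can be sketched as follows. This is a minimal illustration, not the project's code: the real cookie parsing lives in `extractToken`, which this commit imports but does not show. `readCookie` and `resolveToken` are hypothetical names, and `env` is passed in as a plain object so the sketch stays free of Node globals. The cookie name `hf_access_token` is taken from the comment in `stream-handler.ts`.

```typescript
// Hypothetical stand-in for the project's `extractToken` helper.
// Returns the value of the named cookie, or undefined if absent or empty.
function readCookie(header: string | undefined, name: string): string | undefined {
  if (!header) return undefined;
  for (const pair of header.split(";")) {
    const eq = pair.indexOf("=");
    if (eq === -1) continue;
    if (pair.slice(0, eq).trim() === name) {
      const value = pair.slice(eq + 1).trim();
      return value.length > 0 ? value : undefined;
    }
  }
  return undefined;
}

// Priority: the logged-in user's OAuth token, then the server-side HF_TOKEN.
function resolveToken(
  cookieHeader: string | undefined,
  env: Record<string, string | undefined>,
): string | undefined {
  const userToken = readCookie(cookieHeader, "hf_access_token");
  if (userToken) return userToken;
  return env["HF_TOKEN"] || undefined;
}
```

With a cookie header of `theme=dark; hf_access_token=user123` the user token wins even when `HF_TOKEN` is set. With no cookie and no `HF_TOKEN` the result is `undefined`, which is the case the handler turns into the "No Hugging Face token available" error.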

README.md CHANGED
@@ -9,6 +9,7 @@ pinned: false
9
  hf_oauth: true
10
  hf_oauth_scopes:
11
  - manage-repos
 
12
  ---
13
 
14
  # Research Article Template Editor
@@ -41,7 +42,7 @@ A collaborative, real-time editor for web-native scientific articles. It lets mu
41
  | Collaboration | Y.js, Hocuspocus (WebSocket), y-tiptap |
42
  | Backend | Node.js, Express, Vite (dev proxy), Hocuspocus server |
43
  | Publishing | Custom TipTap-JSON → HTML renderer, Puppeteer for PDF |
44
- | AI | Vercel AI SDK v6 (`ai`, `@ai-sdk/react`), streaming tool calls |
45
  | Styling | Plain CSS with custom properties, no framework |
46
  | Storage | Local FS or Hugging Face datasets (via `@huggingface/hub`) |
47
  | Container | Single-image Docker build, runs on port 8080 |
@@ -83,7 +84,7 @@ See [`docs/ARCHITECTURE.md`](docs/ARCHITECTURE.md) for a diagram and the full to
83
  ### Prerequisites
84
 
85
  - Node.js 20+
86
- - An OpenRouter API key if you want the AI features (embed studio, chat agent)
87
  - A Hugging Face OAuth app (client id/secret) if you want login + HF dataset persistence
88
 
89
  ### Local development
@@ -93,7 +94,7 @@ Backend and frontend run as two separate processes in dev (Vite proxies `/api`,
93
  ```bash
94
  # terminal 1 — backend (Express + Hocuspocus on :8080)
95
  cd backend
96
- cp .env.example .env # fill in OPENROUTER_API_KEY, OAUTH_*, HF_TOKEN, ...
97
  npm install
98
  npm run dev
99
 
@@ -118,13 +119,13 @@ Then open http://localhost:8080.
118
 
119
  ### Run your own copy on a Hugging Face Space
120
 
121
- Want your own editor with your own AI key? Three steps:
122
 
123
  1. **Duplicate the Space.** On https://huggingface.co/spaces/tfrere/research-article-template-editor, click `⋯ → Duplicate this Space`. Pick your namespace and visibility. HF copies the Dockerfile, the OAuth wiring and rebuilds the image automatically.
124
- 2. **Get an OpenRouter API key.** Sign up at https://openrouter.ai and create a key under https://openrouter.ai/keys. The chat agent and the embed studio call OpenRouter through the [Vercel AI SDK](https://ai-sdk.dev/), so any model exposed by OpenRouter works (defaults to `anthropic/claude-sonnet-4`).
125
- 3. **Add the key as a Space secret.** In your duplicated Space, go to `Settings → Variables and secrets → New secret`, name it `OPENROUTER_API_KEY` and paste the value. Optional: add `OPENROUTER_MODEL` as a public variable to override the default model. Save - the Space restarts and the AI features light up.
126
 
127
- That's it. The HF OAuth app, the persistence dataset (`<your-space>-data`) and the Docker build are wired automatically by HF Spaces; you only need the OpenRouter key to unlock the AI side.
 
 
128
 
129
  ## Scripts
130
 
@@ -155,10 +156,9 @@ Copy `backend/.env.example` to `backend/.env` and fill the relevant values. Key
155
  | Variable | Purpose |
156
  |---|---|
157
  | `OAUTH_CLIENT_ID` / `OAUTH_CLIENT_SECRET` | HF OAuth app for user login (required to edit when running on a Space) |
158
- | `OAUTH_SCOPES` | OAuth scopes (default `openid profile`) |
159
- | `OPENROUTER_API_KEY` | API key used by the AI agent (chat panel + embed studio) |
160
- | `OPENROUTER_MODEL` | Override the default OpenRouter model id |
161
- | `HF_TOKEN` | Server-side Hugging Face token (fallback when no user OAuth token is present) |
162
  | `HF_DATASET_ID` | Target HF dataset repo for document persistence (when not running on a Space) |
163
  | `SPACE_ID` / `SPACE_HOST` | Auto-set by HF Spaces; drive dataset id + secure cookies in production |
164
  | `DATA_DIR` | Where documents, uploads and published bundles are stored on disk (default: `./data`) |
 
9
  hf_oauth: true
10
  hf_oauth_scopes:
11
  - manage-repos
12
+ - inference-api
13
  ---
14
 
15
  # Research Article Template Editor
 
42
  | Collaboration | Y.js, Hocuspocus (WebSocket), y-tiptap |
43
  | Backend | Node.js, Express, Vite (dev proxy), Hocuspocus server |
44
  | Publishing | Custom TipTap-JSON → HTML renderer, Puppeteer for PDF |
45
+ | AI | Vercel AI SDK v6 (`ai`, `@ai-sdk/react`) with Hugging Face Inference Providers (OpenAI-compatible router) |
46
  | Styling | Plain CSS with custom properties, no framework |
47
  | Storage | Local FS or Hugging Face datasets (via `@huggingface/hub`) |
48
  | Container | Single-image Docker build, runs on port 8080 |
 
84
  ### Prerequisites
85
 
86
  - Node.js 20+
87
+ - A Hugging Face token with the `Make calls to Inference Providers` permission for the AI features (embed studio, chat agent). Generate one at https://huggingface.co/settings/tokens. On a HF Space the logged-in user's OAuth token is used instead - no manual setup needed.
88
  - A Hugging Face OAuth app (client id/secret) if you want login + HF dataset persistence
89
 
90
  ### Local development
 
94
  ```bash
95
  # terminal 1 — backend (Express + Hocuspocus on :8080)
96
  cd backend
97
+ cp .env.example .env # set HF_TOKEN, optional OAUTH_* and HF_DATASET_ID
98
  npm install
99
  npm run dev
100
 
 
119
 
120
  ### Run your own copy on a Hugging Face Space
121
 
122
+ Want your own editor? One step:
123
 
124
  1. **Duplicate the Space.** On https://huggingface.co/spaces/tfrere/research-article-template-editor, click `⋯ → Duplicate this Space`. Pick your namespace and visibility. HF copies the Dockerfile, the OAuth wiring and rebuilds the image automatically.
 
 
125
 
126
+ That's it. No API key to wire up. The AI features (chat agent + embed studio) call **Hugging Face Inference Providers** at `https://router.huggingface.co/v1` using the OAuth token of whoever is currently logged in. As long as your duplicated Space requests the `inference-api` scope (already declared under `hf_oauth_scopes` in the README frontmatter), each editor's AI usage draws on their own Inference Providers quota.
127
+
128
+ Optional public variable: `HF_INFERENCE_MODEL` (e.g. `meta-llama/Llama-3.3-70B-Instruct`) to override the default model id. The full list of supported chat-completion models lives at https://huggingface.co/models?inference_provider=all&other=conversational.
129
 
130
  ## Scripts
131
 
 
156
  | Variable | Purpose |
157
  |---|---|
158
  | `OAUTH_CLIENT_ID` / `OAUTH_CLIENT_SECRET` | HF OAuth app for user login (required to edit when running on a Space) |
159
+ | `OAUTH_SCOPES` | OAuth scopes (default `openid profile`). Add `manage-repos` for dataset persistence and `inference-api` to power the AI features with the user's token |
160
+ | `HF_TOKEN` | Server-side Hugging Face token. Used as a fallback when no user OAuth token is present (e.g. local dev). Needs the `Make calls to Inference Providers` permission to enable the chat agent + embed studio |
161
+ | `HF_INFERENCE_MODEL` | Override the default chat-completion model id (defaults to `openai/gpt-oss-120b`). Any tool-calling-capable model exposed by HF Inference Providers works |
 
162
  | `HF_DATASET_ID` | Target HF dataset repo for document persistence (when not running on a Space) |
163
  | `SPACE_ID` / `SPACE_HOST` | Auto-set by HF Spaces; drive dataset id + secure cookies in production |
164
  | `DATA_DIR` | Where documents, uploads and published bundles are stored on disk (default: `./data`) |
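
The override order for the model id can be sketched in a few lines. It mirrors the expression used in the stream handler changed by this commit (`model || process.env.HF_INFERENCE_MODEL || DEFAULT_MODEL`). The function name `resolveModelId` and the injected `env` object are assumptions made for illustration.

```typescript
const DEFAULT_MODEL = "openai/gpt-oss-120b";

// Per-request model first, then the HF_INFERENCE_MODEL variable, then the default.
// Empty strings are treated as "not set" because `||` skips falsy values.
function resolveModelId(
  requested: string | undefined,
  env: Record<string, string | undefined>,
): string {
  return requested || env["HF_INFERENCE_MODEL"] || DEFAULT_MODEL;
}
```

For example, `resolveModelId(undefined, { HF_INFERENCE_MODEL: "meta-llama/Llama-3.3-70B-Instruct" })` selects the Llama model, while `resolveModelId(undefined, {})` falls back to `openai/gpt-oss-120b`.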
backend/.env.example CHANGED
@@ -35,10 +35,13 @@
35
  OAUTH_CLIENT_ID=
36
  OAUTH_CLIENT_SECRET=
37
 
38
- # Space-scoped OAuth requires "manage-repos" to read/write the dataset that
39
- # backs persistence. Defaults to "openid profile" when unset, which is enough
40
- # for login-only flows but cannot persist documents to a HF dataset.
41
- # OAUTH_SCOPES=openid profile manage-repos
 
 
 
42
 
43
  # -----------------------------------------------------------------------------
44
  # HF Space context (auto-injected by HF Spaces, set manually for local dev)
@@ -64,24 +67,28 @@ OAUTH_CLIENT_SECRET=
64
  # HF_DATASET_ID=
65
 
66
  # Server-side fallback HF token. Used when no user OAuth token is present yet
67
- # (e.g. before the first login). Optional - the editor caches the last
68
- # authenticated user's OAuth token for background dataset writes.
69
- # Generate one at https://huggingface.co/settings/tokens with "Write" scope.
 
 
 
 
 
70
  # HF_TOKEN=
71
 
72
  # -----------------------------------------------------------------------------
73
  # AI features (chat panel + embed studio)
74
  # -----------------------------------------------------------------------------
75
- # Required to enable the AI assistant. The chat panel and embed studio are
76
- # disabled silently when OPENROUTER_API_KEY is unset.
77
- # Get a key at https://openrouter.ai/keys
78
-
79
- OPENROUTER_API_KEY=
80
-
81
- # Override the default model id used by the chat agent. The list of supported
82
- # models is in backend/src/agent/chat.ts (AVAILABLE_MODELS).
83
- # Defaults to "anthropic/claude-sonnet-4".
84
- # OPENROUTER_MODEL=anthropic/claude-sonnet-4
85
 
86
  # -----------------------------------------------------------------------------
87
  # Publishing
 
35
  OAUTH_CLIENT_ID=
36
  OAUTH_CLIENT_SECRET=
37
 
38
+ # Space-scoped OAuth needs:
39
+ # - "manage-repos" to read/write the dataset that backs persistence
40
+ # - "inference-api" so the user's OAuth token can call HF Inference
41
+ # Providers (powers the chat panel + embed studio)
42
+ # Defaults to "openid profile" when unset, which is enough for login-only
43
+ # flows but disables AI features and dataset persistence.
44
+ # OAUTH_SCOPES=openid profile manage-repos inference-api
45
 
46
  # -----------------------------------------------------------------------------
47
  # HF Space context (auto-injected by HF Spaces, set manually for local dev)
 
67
  # HF_DATASET_ID=
68
 
69
  # Server-side fallback HF token. Used when no user OAuth token is present yet
70
+ # (e.g. before the first login, or during local dev without OAuth).
71
+ #
72
+ # The chat panel and embed studio call Hugging Face Inference Providers
73
+ # (https://router.huggingface.co/v1) with this token when no OAuth token is
74
+ # available. Generate one at https://huggingface.co/settings/tokens with
75
+ # "Write" scope (or a fine-grained token with both repo + inference
76
+ # permissions). Optional on a HF Space with OAuth configured - the logged-in
77
+ # user's token is used instead.
78
  # HF_TOKEN=
79
 
80
  # -----------------------------------------------------------------------------
81
  # AI features (chat panel + embed studio)
82
  # -----------------------------------------------------------------------------
83
+ # The AI assistant calls Hugging Face Inference Providers with either the
84
+ # logged-in user's OAuth token or HF_TOKEN above. No extra API key needed -
85
+ # this is the whole point of moving off OpenRouter.
86
+
87
+ # Override the default model id used by the chat agent. The list of
88
+ # supported models is in backend/src/agent/chat.ts (AVAILABLE_MODELS), but
89
+ # any model exposed by HF Inference Providers with tool-calling support
90
+ # works. Defaults to "openai/gpt-oss-120b".
91
+ # HF_INFERENCE_MODEL=openai/gpt-oss-120b
 
92
 
93
  # -----------------------------------------------------------------------------
94
  # Publishing
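
As a rough illustration of the `OAUTH_SCOPES` comments above, the sketch below maps a scope string to the features it unlocks for the user's OAuth token. `featuresForScopes` and the `ScopeFeatures` shape are hypothetical and do not appear in the project. The only facts taken from the source are the default (`openid profile`) and the roles of `manage-repos` and `inference-api`. A server-side `HF_TOKEN` can still enable AI features independently of these scopes.

```typescript
interface ScopeFeatures {
  datasetPersistence: boolean; // requires "manage-repos"
  userTokenInference: boolean; // requires "inference-api"
}

// Scopes are space-separated; an unset or blank value falls back to the default.
function featuresForScopes(raw: string | undefined): ScopeFeatures {
  const effective = raw?.trim() || "openid profile";
  const scopes = new Set(effective.split(/\s+/));
  return {
    datasetPersistence: scopes.has("manage-repos"),
    userTokenInference: scopes.has("inference-api"),
  };
}
```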
backend/package-lock.json CHANGED
@@ -19,7 +19,6 @@
19
  "@hocuspocus/server": "^3.4.4",
20
  "@hocuspocus/transformer": "^3.4.4",
21
  "@huggingface/hub": "^2.11.0",
22
- "@openrouter/ai-sdk-provider": "^2.5.1",
23
  "@tiptap/core": "^3.22.3",
24
  "@tiptap/extension-image": "^3.22.3",
25
  "@tiptap/extension-link": "^3.22.3",
@@ -817,19 +816,6 @@
817
  "url": "https://paulmillr.com/funding/"
818
  }
819
  },
820
- "node_modules/@openrouter/ai-sdk-provider": {
821
- "version": "2.5.1",
822
- "resolved": "https://registry.npmjs.org/@openrouter/ai-sdk-provider/-/ai-sdk-provider-2.5.1.tgz",
823
- "integrity": "sha512-r1fJL1Cb3gQDa2MpWH/sfx1BsEW0uzlRriJM6eihaKqbtKDmZoBisF32VcVaQYassighX7NGCkF68EsrZA43uQ==",
824
- "license": "Apache-2.0",
825
- "engines": {
826
- "node": ">=18"
827
- },
828
- "peerDependencies": {
829
- "ai": "^6.0.0",
830
- "zod": "^3.25.0 || ^4.0.0"
831
- }
832
- },
833
  "node_modules/@opentelemetry/api": {
834
  "version": "1.9.0",
835
  "resolved": "https://registry.npmjs.org/@opentelemetry/api/-/api-1.9.0.tgz",
 
19
  "@hocuspocus/server": "^3.4.4",
20
  "@hocuspocus/transformer": "^3.4.4",
21
  "@huggingface/hub": "^2.11.0",
 
22
  "@tiptap/core": "^3.22.3",
23
  "@tiptap/extension-image": "^3.22.3",
24
  "@tiptap/extension-link": "^3.22.3",
 
816
  "url": "https://paulmillr.com/funding/"
817
  }
818
  },
 
 
 
 
 
 
 
 
 
 
 
 
 
819
  "node_modules/@opentelemetry/api": {
820
  "version": "1.9.0",
821
  "resolved": "https://registry.npmjs.org/@opentelemetry/api/-/api-1.9.0.tgz",
backend/package.json CHANGED
@@ -23,7 +23,6 @@
23
  "@hocuspocus/server": "^3.4.4",
24
  "@hocuspocus/transformer": "^3.4.4",
25
  "@huggingface/hub": "^2.11.0",
26
- "@openrouter/ai-sdk-provider": "^2.5.1",
27
  "@tiptap/core": "^3.22.3",
28
  "@tiptap/extension-image": "^3.22.3",
29
  "@tiptap/extension-link": "^3.22.3",
 
23
  "@hocuspocus/server": "^3.4.4",
24
  "@hocuspocus/transformer": "^3.4.4",
25
  "@huggingface/hub": "^2.11.0",
 
26
  "@tiptap/core": "^3.22.3",
27
  "@tiptap/extension-image": "^3.22.3",
28
  "@tiptap/extension-link": "^3.22.3",
backend/src/agent/chat.ts CHANGED
@@ -3,13 +3,24 @@ import { SYSTEM_PROMPT, buildMessages } from "./system-prompt.js";
3
  import { streamChatResponse } from "./stream-handler.js";
4
  import type { Request, Response } from "express";
5
 
 
 
 
 
 
 
 
 
 
 
 
 
6
  export const AVAILABLE_MODELS = [
7
- { id: "google/gemini-2.5-flash", label: "Gemini 2.5 Flash", context: "1M", cost: "$" },
8
- { id: "google/gemini-2.5-pro", label: "Gemini 2.5 Pro", context: "1M", cost: "$$" },
9
- { id: "anthropic/claude-sonnet-4", label: "Claude Sonnet 4", context: "200K", cost: "$$$" },
10
- { id: "anthropic/claude-3.5-haiku", label: "Claude 3.5 Haiku", context: "200K", cost: "$" },
11
- { id: "openai/gpt-4.1-mini", label: "GPT-4.1 Mini", context: "1M", cost: "$" },
12
- { id: "openai/gpt-4.1", label: "GPT-4.1", context: "1M", cost: "$$" },
13
  ];
14
 
15
  export async function handleChat(req: Request, res: Response) {
 
3
  import { streamChatResponse } from "./stream-handler.js";
4
  import type { Request, Response } from "express";
5
 
6
+ /**
7
+ * Models exposed in the UI picker. All ids must be served by Hugging
8
+ * Face Inference Providers (`https://router.huggingface.co/v1`) and
9
+ * support function/tool calling - the agent loop won't work without it.
10
+ *
11
+ * Discover more conversational models here:
12
+ * https://huggingface.co/models?inference_provider=all&other=conversational
13
+ *
14
+ * `context` is the advertised context window; `cost` is a rough
15
+ * relative price tag ($, $$, $$$) - inference providers charge their
16
+ * own rates, see the docs for the source of truth.
17
+ */
18
  export const AVAILABLE_MODELS = [
19
+ { id: "openai/gpt-oss-120b", label: "GPT-OSS 120B", context: "131K", cost: "$$" },
20
+ { id: "openai/gpt-oss-20b", label: "GPT-OSS 20B", context: "131K", cost: "$" },
21
+ { id: "meta-llama/Llama-3.3-70B-Instruct", label: "Llama 3.3 70B", context: "128K", cost: "$" },
22
+ { id: "Qwen/Qwen3-Coder-480B-A35B-Instruct", label: "Qwen3 Coder 480B", context: "262K", cost: "$$" },
23
+ { id: "deepseek-ai/DeepSeek-V3.1", label: "DeepSeek V3.1", context: "128K", cost: "$$" },
 
24
  ];
25
 
26
  export async function handleChat(req: Request, res: Response) {
backend/src/agent/stream-handler.ts CHANGED
@@ -1,15 +1,47 @@
1
  import { streamText, convertToModelMessages } from "ai";
2
- import { createOpenRouter } from "@openrouter/ai-sdk-provider";
3
  import type { Request, Response } from "express";
 
4
 
5
- export const DEFAULT_MODEL = "google/gemini-2.5-flash";
6
 
7
- export function getProvider() {
8
- const apiKey = process.env.OPENROUTER_API_KEY;
9
- if (!apiKey) {
10
- throw new Error("OPENROUTER_API_KEY environment variable is required");
11
- }
12
- return createOpenRouter({ apiKey });
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
13
  }
14
 
15
  interface StreamChatOptions {
@@ -24,19 +56,30 @@ export async function streamChatResponse(
24
  { systemPrompt, tools, logPrefix }: StreamChatOptions,
25
  ) {
26
  try {
27
- const { messages, context, model } = req.body;
28
 
29
  if (!messages || !Array.isArray(messages)) {
30
  res.status(400).json({ error: "messages array is required" });
31
  return;
32
  }
33
 
34
- const provider = getProvider();
35
- const modelId = model || process.env.OPENROUTER_MODEL || DEFAULT_MODEL;
 
 
 
 
 
 
 
 
 
 
 
36
  const modelMessages = await convertToModelMessages(messages);
37
 
38
  const result = streamText({
39
- model: provider.chat(modelId),
40
  system: systemPrompt,
41
  messages: modelMessages,
42
  tools,
 
1
  import { streamText, convertToModelMessages } from "ai";
2
+ import { createOpenAICompatible } from "@ai-sdk/openai-compatible";
3
  import type { Request, Response } from "express";
4
+ import { extractToken } from "../auth.js";
5
 
6
+ export const DEFAULT_MODEL = "openai/gpt-oss-120b";
7
 
8
+ /**
9
+ * Hugging Face Inference Providers exposes an OpenAI-compatible chat
10
+ * completions endpoint at `https://router.huggingface.co/v1` that routes
11
+ * to a fleet of providers (Cerebras, Together, Fireworks, ...). The
12
+ * upside: any HF user token with the `inference-api` scope can call it,
13
+ * so a forked Space gets AI features for free as soon as the user logs
14
+ * in - no extra API key to wire up.
15
+ *
16
+ * See https://huggingface.co/docs/inference-providers
17
+ */
18
+ const HF_INFERENCE_BASE_URL = "https://router.huggingface.co/v1";
19
+
20
+ /**
21
+ * Resolve the HF token used to authenticate inference calls.
22
+ *
23
+ * Priority:
24
+ * 1. The currently logged-in editor's OAuth token (forwarded from the
25
+ * `hf_access_token` cookie). This is the production path on a HF
26
+ * Space - no environment secret needed.
27
+ * 2. The `HF_TOKEN` env var fallback. Useful for local dev when OAuth
28
+ * isn't configured, or as a server-side default when the OAuth
29
+ * scope doesn't include `inference-api` yet.
30
+ */
31
+ function resolveHfToken(req: Request): string | undefined {
32
+ const userToken = extractToken(req.headers.cookie);
33
+ if (userToken) return userToken;
34
+ const envToken = process.env.HF_TOKEN;
35
+ if (envToken) return envToken;
36
+ return undefined;
37
+ }
38
+
39
+ function createProvider(apiKey: string) {
40
+ return createOpenAICompatible({
41
+ name: "huggingface",
42
+ baseURL: HF_INFERENCE_BASE_URL,
43
+ apiKey,
44
+ });
45
  }
46
 
47
  interface StreamChatOptions {
 
56
  { systemPrompt, tools, logPrefix }: StreamChatOptions,
57
  ) {
58
  try {
59
+ const { messages, model } = req.body;
60
 
61
  if (!messages || !Array.isArray(messages)) {
62
  res.status(400).json({ error: "messages array is required" });
63
  return;
64
  }
65
 
66
+ const apiKey = resolveHfToken(req);
67
+ if (!apiKey) {
68
+ res.status(500).json({
69
+ error:
70
+ "No Hugging Face token available. Sign in with your HF account " +
71
+ "(the OAuth token is used to call Inference Providers) or set " +
72
+ "HF_TOKEN in the backend environment.",
73
+ });
74
+ return;
75
+ }
76
+
77
+ const provider = createProvider(apiKey);
78
+ const modelId = model || process.env.HF_INFERENCE_MODEL || DEFAULT_MODEL;
79
  const modelMessages = await convertToModelMessages(messages);
80
 
81
  const result = streamText({
82
+ model: provider.chatModel(modelId),
83
  system: systemPrompt,
84
  messages: modelMessages,
85
  tools,
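
For readers unfamiliar with OpenAI-compatible endpoints, the sketch below shows roughly what a chat-completions request to the router looks like once a token and model id have been resolved. It is not the project's code: the project delegates this to `@ai-sdk/openai-compatible`, and `buildChatRequest`, `ChatMessage` and `PreparedRequest` are hypothetical names. The request is only built, not sent, so the example has no network dependency.

```typescript
interface ChatMessage {
  role: "system" | "user" | "assistant";
  content: string;
}

interface PreparedRequest {
  url: string;
  method: "POST";
  headers: Record<string, string>;
  body: string;
}

const HF_INFERENCE_BASE_URL = "https://router.huggingface.co/v1";

// Builds (but does not send) a streaming chat-completions request.
// The bearer token is either the user's OAuth token or the HF_TOKEN fallback.
function buildChatRequest(
  token: string,
  model: string,
  messages: ChatMessage[],
): PreparedRequest {
  return {
    url: `${HF_INFERENCE_BASE_URL}/chat/completions`,
    method: "POST",
    headers: {
      Authorization: `Bearer ${token}`,
      "Content-Type": "application/json",
    },
    body: JSON.stringify({ model, messages, stream: true }),
  };
}
```

Because the token travels with each request, two editors using the same Space at the same time each authenticate with their own credentials.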
backend/tests/agent-chat.test.ts CHANGED
@@ -3,7 +3,7 @@
3
  *
4
  * Tests for /api/chat and /api/embed-chat routes:
5
  * - Input validation (missing messages)
6
- * - Missing API key handling
7
  * - Model list endpoint
8
  */
9
  import { describe, it, expect, beforeEach, afterEach, vi } from "vitest";
@@ -71,9 +71,9 @@ describe("/api/chat - validation", () => {
71
  expect(res.body).toHaveProperty("error");
72
  });
73
 
74
- it("returns 500 when OPENROUTER_API_KEY is missing", async () => {
75
- const original = process.env.OPENROUTER_API_KEY;
76
- delete process.env.OPENROUTER_API_KEY;
77
 
78
  const res = await request(app)
79
  .post("/api/chat")
@@ -83,9 +83,9 @@ describe("/api/chat - validation", () => {
83
  .expect(500);
84
 
85
  expect(res.body).toHaveProperty("error");
86
- expect(res.body.error).toContain("OPENROUTER_API_KEY");
87
 
88
- if (original) process.env.OPENROUTER_API_KEY = original;
89
  });
90
  });
91
 
@@ -109,9 +109,9 @@ describe("/api/embed-chat - validation", () => {
109
  expect(res.body).toHaveProperty("error");
110
  });
111
 
112
- it("returns 500 when OPENROUTER_API_KEY is missing", async () => {
113
- const original = process.env.OPENROUTER_API_KEY;
114
- delete process.env.OPENROUTER_API_KEY;
115
 
116
  const res = await request(app)
117
  .post("/api/embed-chat")
@@ -121,8 +121,8 @@ describe("/api/embed-chat - validation", () => {
121
  .expect(500);
122
 
123
  expect(res.body).toHaveProperty("error");
124
- expect(res.body.error).toContain("OPENROUTER_API_KEY");
125
 
126
- if (original) process.env.OPENROUTER_API_KEY = original;
127
  });
128
  });
 
3
  *
4
  * Tests for /api/chat and /api/embed-chat routes:
5
  * - Input validation (missing messages)
6
+ * - Missing HF token handling
7
  * - Model list endpoint
8
  */
9
  import { describe, it, expect, beforeEach, afterEach, vi } from "vitest";
 
71
  expect(res.body).toHaveProperty("error");
72
  });
73
 
74
+ it("returns 500 when no HF token is available", async () => {
75
+ const original = process.env.HF_TOKEN;
76
+ delete process.env.HF_TOKEN;
77
 
78
  const res = await request(app)
79
  .post("/api/chat")
 
83
  .expect(500);
84
 
85
  expect(res.body).toHaveProperty("error");
86
+ expect(res.body.error).toContain("Hugging Face token");
87
 
88
+ if (original) process.env.HF_TOKEN = original;
89
  });
90
  });
91
 
 
109
  expect(res.body).toHaveProperty("error");
110
  });
111
 
112
+ it("returns 500 when no HF token is available", async () => {
113
+ const original = process.env.HF_TOKEN;
114
+ delete process.env.HF_TOKEN;
115
 
116
  const res = await request(app)
117
  .post("/api/embed-chat")
 
121
  .expect(500);
122
 
123
  expect(res.body).toHaveProperty("error");
124
+ expect(res.body.error).toContain("Hugging Face token");
125
 
126
+ if (original) process.env.HF_TOKEN = original;
127
  });
128
  });
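
The tests above restore `HF_TOKEN` after the assertions, so a failing assertion would skip the restore and leak the deleted variable into later tests. One possible way to harden this, sketched under assumptions, is a small helper that restores the variable in a `finally` block. `withEnvRemoved` is a hypothetical name, not part of the project or of Vitest. The sketch is synchronous for brevity, and an async variant would `await fn()` inside the same `try`.

```typescript
// Temporarily removes `key` from `env`, runs `fn`, and always restores the
// previous state, even if `fn` throws.
function withEnvRemoved<T>(
  env: Record<string, string | undefined>,
  key: string,
  fn: () => T,
): T {
  const had = Object.prototype.hasOwnProperty.call(env, key);
  const original = env[key];
  delete env[key];
  try {
    return fn();
  } finally {
    if (had) env[key] = original;
    else delete env[key];
  }
}
```

In a test this could wrap the request and assertions, with `process.env` passed as the `env` argument.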
docs/SPECIFICATION.md CHANGED
@@ -127,8 +127,9 @@ flowchart LR
127
 
128
  ### 4.6 AI Agent
129
 
130
- - Provider: OpenRouter (`OPENROUTER_API_KEY`), default model `anthropic/claude-sonnet-4`
131
- - Streaming via Vercel AI SDK `streamText`
 
132
  - **Context**: document text, current selection, frontmatter (sent by frontend with each message)
133
  - **Tools** (declarative, executed client-side by the frontend):
134
  - `replaceSelection` - replace selected text
@@ -265,7 +266,7 @@ The publisher reads these same CSS files server-side and injects them inline int
265
  ### 6.2 HF Space Configuration (README.md frontmatter)
266
 
267
  - SDK: `docker`, port `8080`
268
- - OAuth: `hf_oauth: true`, scopes: `manage-repos`
269
  - Two git remotes: `space` (tfrere/collab-editor, dev) and `prod` (tfrere/research-article-template-editor, production)
270
 
271
  ### 6.3 Environment Variables
@@ -278,11 +279,10 @@ The publisher reads these same CSS files server-side and injects them inline int
278
  | `SPACE_HOST` | For OAuth | HTTPS callback URL host |
279
  | `OAUTH_CLIENT_ID` | For OAuth | HF OAuth client |
280
  | `OAUTH_CLIENT_SECRET` | For OAuth | HF OAuth secret |
281
- | `OAUTH_SCOPES` | No (default `openid profile`) | OAuth scopes |
282
  | `HF_DATASET_ID` | No | Override dataset name (default: `{SPACE_ID}-data`) |
283
- | `HF_TOKEN` | No | Fallback Hub token for HF API |
284
- | `OPENROUTER_API_KEY` | For AI chat | OpenRouter API key |
285
- | `OPENROUTER_MODEL` | No | Default AI model |
286
  | `ENABLE_PDF` | No (default true) | Toggle PDF/thumbnail generation |
287
 
288
  ### 6.4 Local Development
@@ -297,7 +297,7 @@ cd frontend && npm install && npm run dev
297
  # Starts on http://localhost:5678 (proxies /api and /collab to :8080)
298
  ```
299
 
300
- Create a `.env` file in `backend/` with at minimum `OPENROUTER_API_KEY` for AI chat. Without `SPACE_ID`, OAuth is disabled and all users can edit.
301
 
302
  ---
303
 
 
127
 
128
  ### 4.6 AI Agent
129
 
130
+ - Provider: Hugging Face Inference Providers (`https://router.huggingface.co/v1`), default model `openai/gpt-oss-120b`
131
+ - Auth: per-request bearer token resolved from the editor's OAuth cookie when available, falling back to the server-side `HF_TOKEN`. On a HF Space with `inference-api` scope, no extra secret is needed - the logged-in user pays for their own inference under their HF quota.
132
+ - Streaming via Vercel AI SDK `streamText` over `@ai-sdk/openai-compatible`
133
  - **Context**: document text, current selection, frontmatter (sent by frontend with each message)
134
  - **Tools** (declarative, executed client-side by the frontend):
135
  - `replaceSelection` - replace selected text
 
266
  ### 6.2 HF Space Configuration (README.md frontmatter)
267
 
268
  - SDK: `docker`, port `8080`
269
+ - OAuth: `hf_oauth: true`, scopes: `manage-repos`, `inference-api`
270
  - Two git remotes: `space` (tfrere/collab-editor, dev) and `prod` (tfrere/research-article-template-editor, production)
271
 
272
  ### 6.3 Environment Variables
 
279
  | `SPACE_HOST` | For OAuth | HTTPS callback URL host |
280
  | `OAUTH_CLIENT_ID` | For OAuth | HF OAuth client |
281
  | `OAUTH_CLIENT_SECRET` | For OAuth | HF OAuth secret |
282
+ | `OAUTH_SCOPES` | No (default `openid profile`) | OAuth scopes. Add `manage-repos` for dataset persistence and `inference-api` to power AI features with the user's token |
283
  | `HF_DATASET_ID` | No | Override dataset name (default: `{SPACE_ID}-data`) |
284
+ | `HF_TOKEN` | For AI chat in local dev | Fallback Hub token for HF API + Inference Providers. Needs the "Make calls to Inference Providers" permission |
285
+ | `HF_INFERENCE_MODEL` | No (default `openai/gpt-oss-120b`) | Default chat-completion model id served by HF Inference Providers |
 
286
  | `ENABLE_PDF` | No (default true) | Toggle PDF/thumbnail generation |
287
 
288
  ### 6.4 Local Development
 
297
  # Starts on http://localhost:5678 (proxies /api and /collab to :8080)
298
  ```
299
 
300
+ Create a `.env` file in `backend/` with at minimum `HF_TOKEN` for AI chat (must have the "Make calls to Inference Providers" permission). Without `SPACE_ID`, OAuth is disabled and all users can edit.
301
 
302
  ---
303