carbon-tokenization


tfrere HF Staff Cursor committed on May 20

Commit

6b6afea

1 Parent(s): 3de227d

feat(agent): switch AI backend from OpenRouter to HF Inference Providers

The chat panel and embed studio now call Hugging Face Inference Providers
(OpenAI-compatible router at https://router.huggingface.co/v1) via
@ai-sdk/openai-compatible, using the logged-in editor's OAuth token when
available and falling back to HF_TOKEN. Forking the Space no longer
requires wiring an OpenRouter API key - the user's own HF token funds
their own inference.

- Replace @openrouter/ai-sdk-provider with @ai-sdk/openai-compatible
- Forward the OAuth cookie token per request, fallback to HF_TOKEN
- Refresh AVAILABLE_MODELS to HF-served, tool-calling-capable models
(gpt-oss 120B/20B, Llama 3.3 70B, Qwen3 Coder 480B, DeepSeek V3.1)
- Add inference-api to hf_oauth_scopes and document the new env vars
(HF_INFERENCE_MODEL, expanded HF_TOKEN role) in README + SPECIFICATION
- Update agent-chat tests to expect the new "Hugging Face token" error

Co-authored-by: Cursor <cursoragent@cursor.com>
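
The token precedence described in the commit message (user OAuth token first, `HF_TOKEN` as fallback) can be sketched as below. This is a minimal illustration, not the code from this commit: `readCookie` and `resolveToken` are hypothetical helpers, and the cookie name `hf_access_token` is taken from the doc comment in `stream-handler.ts`. The real `extractToken` in `../auth.js` is not shown in this diff and may behave differently.

```typescript
// Hypothetical helper: pull one cookie value out of a raw Cookie header.
function readCookie(header: string | undefined, name: string): string | undefined {
  if (!header) return undefined;
  for (const part of header.split(";")) {
    const eq = part.indexOf("=");
    if (eq === -1) continue; // skip malformed segments without "="
    const key = part.slice(0, eq).trim();
    if (key === name) {
      const value = part.slice(eq + 1).trim();
      return value ? value : undefined; // treat an empty value as "no token"
    }
  }
  return undefined;
}

// Assumed precedence: per-request OAuth cookie first, then the server-side HF_TOKEN.
function resolveToken(
  cookieHeader: string | undefined,
  env: Record<string, string | undefined>,
): string | undefined {
  return readCookie(cookieHeader, "hf_access_token") ?? (env.HF_TOKEN || undefined);
}
```

Passing `env` as a parameter instead of reading `process.env` directly keeps the sketch easy to test; a caller in the backend would presumably pass `req.headers.cookie` and `process.env`.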

Files changed (8)

README.md +11 -11
backend/.env.example +24 -17
backend/package-lock.json +0 -14
backend/package.json +0 -1
backend/src/agent/chat.ts +17 -6
backend/src/agent/stream-handler.ts +55 -12
backend/tests/agent-chat.test.ts +11 -11
docs/SPECIFICATION.md +8 -8

README.md CHANGED

@@ -9,6 +9,7 @@ pinned: false
 hf_oauth: true
 hf_oauth_scopes:
   - manage-repos
 ---
 # Research Article Template Editor
@@ -41,7 +42,7 @@ A collaborative, real-time editor for web-native scientific articles. It lets mu
 | Collaboration | Y.js, Hocuspocus (WebSocket), y-tiptap |
 | Backend | Node.js, Express, Vite (dev proxy), Hocuspocus server |
 | Publishing | Custom TipTap-JSON → HTML renderer, Puppeteer for PDF |
-| AI | Vercel AI SDK v6 (`ai`, `@ai-sdk/react`), streaming tool calls |
 | Styling | Plain CSS with custom properties, no framework |
 | Storage | Local FS or Hugging Face datasets (via `@huggingface/hub`) |
 | Container | Single-image Docker build, runs on port 8080 |
@@ -83,7 +84,7 @@ See [`docs/ARCHITECTURE.md`](docs/ARCHITECTURE.md) for a diagram and the full to
 ### Prerequisites
 - Node.js 20+
-- An OpenRouter API key if you want the AI features (embed studio, chat agent)
 - A Hugging Face OAuth app (client id/secret) if you want login + HF dataset persistence
 ### Local development
@@ -93,7 +94,7 @@ Backend and frontend run as two separate processes in dev (Vite proxies `/api`,
 ```bash
 # terminal 1 — backend (Express + Hocuspocus on :8080)
 cd backend
-cp .env.example .env          # fill in OPENROUTER_API_KEY, OAUTH_*, HF_TOKEN, ...
 npm install
 npm run dev
@@ -118,13 +119,13 @@ Then open http://localhost:8080.
 ### Run your own copy on a Hugging Face Space
-Want your own editor with your own AI key? Three steps:
 1. **Duplicate the Space.** On https://huggingface.co/spaces/tfrere/research-article-template-editor, click `⋯ → Duplicate this Space`. Pick your namespace and visibility. HF copies the Dockerfile, the OAuth wiring and rebuilds the image automatically.
-2. **Get an OpenRouter API key.** Sign up at https://openrouter.ai and create a key under https://openrouter.ai/keys. The chat agent and the embed studio call OpenRouter through the [Vercel AI SDK](https://ai-sdk.dev/), so any model exposed by OpenRouter works (defaults to `anthropic/claude-sonnet-4`).
-3. **Add the key as a Space secret.** In your duplicated Space, go to `Settings → Variables and secrets → New secret`, name it `OPENROUTER_API_KEY` and paste the value. Optional: add `OPENROUTER_MODEL` as a public variable to override the default model. Save - the Space restarts and the AI features light up.
-That's it. The HF OAuth app, the persistence dataset (`<your-space>-data`) and the Docker build are wired automatically by HF Spaces; you only need the OpenRouter key to unlock the AI side.
 ## Scripts
@@ -155,10 +156,9 @@ Copy `backend/.env.example` to `backend/.env` and fill the relevant values. Key
 | Variable | Purpose |
 |---|---|
 | `OAUTH_CLIENT_ID` / `OAUTH_CLIENT_SECRET` | HF OAuth app for user login (required to edit when running on a Space) |
-| `OAUTH_SCOPES` | OAuth scopes (default `openid profile`) |
-| `OPENROUTER_API_KEY` | API key used by the AI agent (chat panel + embed studio) |
-| `OPENROUTER_MODEL` | Override the default OpenRouter model id |
-| `HF_TOKEN` | Server-side Hugging Face token (fallback when no user OAuth token is present) |
 | `HF_DATASET_ID` | Target HF dataset repo for document persistence (when not running on a Space) |
 | `SPACE_ID` / `SPACE_HOST` | Auto-set by HF Spaces; drive dataset id + secure cookies in production |
 | `DATA_DIR` | Where documents, uploads and published bundles are stored on disk (default: `./data`) |

 hf_oauth: true
 hf_oauth_scopes:
   - manage-repos
+  - inference-api
 ---
 # Research Article Template Editor
 | Collaboration | Y.js, Hocuspocus (WebSocket), y-tiptap |
 | Backend | Node.js, Express, Vite (dev proxy), Hocuspocus server |
 | Publishing | Custom TipTap-JSON → HTML renderer, Puppeteer for PDF |
+| AI | Vercel AI SDK v6 (`ai`, `@ai-sdk/react`) → Hugging Face Inference Providers (OpenAI-compatible router) |
 | Styling | Plain CSS with custom properties, no framework |
 | Storage | Local FS or Hugging Face datasets (via `@huggingface/hub`) |
 | Container | Single-image Docker build, runs on port 8080 |
 ### Prerequisites
 - Node.js 20+
+- A Hugging Face token with the `Make calls to Inference Providers` permission for the AI features (embed studio, chat agent). Generate one at https://huggingface.co/settings/tokens. On a HF Space the logged-in user's OAuth token is used instead - no manual setup needed.
 - A Hugging Face OAuth app (client id/secret) if you want login + HF dataset persistence
 ### Local development
 ```bash
 # terminal 1 — backend (Express + Hocuspocus on :8080)
 cd backend
+cp .env.example .env          # set HF_TOKEN, optional OAUTH_* and HF_DATASET_ID
 npm install
 npm run dev
 ### Run your own copy on a Hugging Face Space
+Want your own editor? One step:
 1. **Duplicate the Space.** On https://huggingface.co/spaces/tfrere/research-article-template-editor, click `⋯ → Duplicate this Space`. Pick your namespace and visibility. HF copies the Dockerfile, the OAuth wiring and rebuilds the image automatically.
+That's it. No API key to wire up. The AI features (chat agent + embed studio) call **Hugging Face Inference Providers** at `https://router.huggingface.co/v1` using the OAuth token of whoever is currently logged in. As long as your duplicated Space requests the `inference-api` scope (already declared in the README frontmatter as `hf_oauth_scopes`), every editor gets AI for free under their own Inference Providers quota.
+Optional public variable: `HF_INFERENCE_MODEL` (e.g. `meta-llama/Llama-3.3-70B-Instruct`) to override the default model id. The full list of supported chat-completion models lives at https://huggingface.co/models?inference_provider=all&other=conversational.
 ## Scripts
 | Variable | Purpose |
 |---|---|
 | `OAUTH_CLIENT_ID` / `OAUTH_CLIENT_SECRET` | HF OAuth app for user login (required to edit when running on a Space) |
+| `OAUTH_SCOPES` | OAuth scopes (default `openid profile`). Add `manage-repos` for dataset persistence and `inference-api` to power the AI features with the user's token |
+| `HF_TOKEN` | Server-side Hugging Face token. Used as a fallback when no user OAuth token is present (e.g. local dev). Needs the `Make calls to Inference Providers` permission to enable the chat agent + embed studio |
+| `HF_INFERENCE_MODEL` | Override the default chat-completion model id (defaults to `openai/gpt-oss-120b`). Any tool-calling-capable model exposed by HF Inference Providers works |
 | `HF_DATASET_ID` | Target HF dataset repo for document persistence (when not running on a Space) |
 | `SPACE_ID` / `SPACE_HOST` | Auto-set by HF Spaces; drive dataset id + secure cookies in production |
 | `DATA_DIR` | Where documents, uploads and published bundles are stored on disk (default: `./data`) |

backend/.env.example CHANGED

@@ -35,10 +35,13 @@
 OAUTH_CLIENT_ID=
 OAUTH_CLIENT_SECRET=
-# Space-scoped OAuth requires "manage-repos" to read/write the dataset that
-# backs persistence. Defaults to "openid profile" when unset, which is enough
-# for login-only flows but cannot persist documents to a HF dataset.
-# OAUTH_SCOPES=openid profile manage-repos
 # -----------------------------------------------------------------------------
 # HF Space context (auto-injected by HF Spaces, set manually for local dev)
@@ -64,24 +67,28 @@ OAUTH_CLIENT_SECRET=
 # HF_DATASET_ID=
 # Server-side fallback HF token. Used when no user OAuth token is present yet
-# (e.g. before the first login). Optional - the editor caches the last
-# authenticated user's OAuth token for background dataset writes.
-# Generate one at https://huggingface.co/settings/tokens with "Write" scope.
 # HF_TOKEN=
 # -----------------------------------------------------------------------------
 # AI features (chat panel + embed studio)
 # -----------------------------------------------------------------------------
-# Required to enable the AI assistant. The chat panel and embed studio are
-# disabled silently when OPENROUTER_API_KEY is unset.
-# Get a key at https://openrouter.ai/keys
-OPENROUTER_API_KEY=
-# Override the default model id used by the chat agent. The list of supported
-# models is in backend/src/agent/chat.ts (AVAILABLE_MODELS).
-# Defaults to "anthropic/claude-sonnet-4".
-# OPENROUTER_MODEL=anthropic/claude-sonnet-4
 # -----------------------------------------------------------------------------
 # Publishing

 OAUTH_CLIENT_ID=
 OAUTH_CLIENT_SECRET=
+# Space-scoped OAuth needs:
+#   - "manage-repos"  to read/write the dataset that backs persistence
+#   - "inference-api" so the user's OAuth token can call HF Inference
+#                     Providers (powers the chat panel + embed studio)
+# Defaults to "openid profile" when unset, which is enough for login-only
+# flows but disables AI features and dataset persistence.
+# OAUTH_SCOPES=openid profile manage-repos inference-api
 # -----------------------------------------------------------------------------
 # HF Space context (auto-injected by HF Spaces, set manually for local dev)
 # HF_DATASET_ID=
 # Server-side fallback HF token. Used when no user OAuth token is present yet
+# (e.g. before the first login, or during local dev without OAuth).
+#
+# The chat panel and embed studio call Hugging Face Inference Providers
+# (https://router.huggingface.co/v1) with this token when no OAuth token is
+# available. Generate one at https://huggingface.co/settings/tokens with
+# "Write" scope (or a fine-grained token with both repo + inference
+# permissions). Optional on a HF Space with OAuth configured - the logged-in
+# user's token is used instead.
 # HF_TOKEN=
 # -----------------------------------------------------------------------------
 # AI features (chat panel + embed studio)
 # -----------------------------------------------------------------------------
+# The AI assistant calls Hugging Face Inference Providers with either the
+# logged-in user's OAuth token or HF_TOKEN above. No extra API key needed -
+# this is the whole point of moving off OpenRouter.
+# Override the default model id used by the chat agent. The list of
+# supported models is in backend/src/agent/chat.ts (AVAILABLE_MODELS), but
+# any model exposed by HF Inference Providers with tool-calling support
+# works. Defaults to "openai/gpt-oss-120b".
+# HF_INFERENCE_MODEL=openai/gpt-oss-120b
 # -----------------------------------------------------------------------------
 # Publishing

backend/package-lock.json CHANGED

@@ -19,7 +19,6 @@
         "@hocuspocus/server": "^3.4.4",
         "@hocuspocus/transformer": "^3.4.4",
         "@huggingface/hub": "^2.11.0",
-        "@openrouter/ai-sdk-provider": "^2.5.1",
         "@tiptap/core": "^3.22.3",
         "@tiptap/extension-image": "^3.22.3",
         "@tiptap/extension-link": "^3.22.3",
@@ -817,19 +816,6 @@
         "url": "https://paulmillr.com/funding/"
       }
     },
-    "node_modules/@openrouter/ai-sdk-provider": {
-      "version": "2.5.1",
-      "resolved": "https://registry.npmjs.org/@openrouter/ai-sdk-provider/-/ai-sdk-provider-2.5.1.tgz",
-      "integrity": "sha512-r1fJL1Cb3gQDa2MpWH/sfx1BsEW0uzlRriJM6eihaKqbtKDmZoBisF32VcVaQYassighX7NGCkF68EsrZA43uQ==",
-      "license": "Apache-2.0",
-      "engines": {
-        "node": ">=18"
-      },
-      "peerDependencies": {
-        "ai": "^6.0.0",
-        "zod": "^3.25.0 || ^4.0.0"
-      }
-    },
     "node_modules/@opentelemetry/api": {
       "version": "1.9.0",
       "resolved": "https://registry.npmjs.org/@opentelemetry/api/-/api-1.9.0.tgz",

         "@hocuspocus/server": "^3.4.4",
         "@hocuspocus/transformer": "^3.4.4",
         "@huggingface/hub": "^2.11.0",
         "@tiptap/core": "^3.22.3",
         "@tiptap/extension-image": "^3.22.3",
         "@tiptap/extension-link": "^3.22.3",
         "url": "https://paulmillr.com/funding/"
       }
     },
     "node_modules/@opentelemetry/api": {
       "version": "1.9.0",
       "resolved": "https://registry.npmjs.org/@opentelemetry/api/-/api-1.9.0.tgz",

backend/package.json CHANGED

@@ -23,7 +23,6 @@
     "@hocuspocus/server": "^3.4.4",
     "@hocuspocus/transformer": "^3.4.4",
     "@huggingface/hub": "^2.11.0",
-    "@openrouter/ai-sdk-provider": "^2.5.1",
     "@tiptap/core": "^3.22.3",
     "@tiptap/extension-image": "^3.22.3",
     "@tiptap/extension-link": "^3.22.3",

     "@hocuspocus/server": "^3.4.4",
     "@hocuspocus/transformer": "^3.4.4",
     "@huggingface/hub": "^2.11.0",
     "@tiptap/core": "^3.22.3",
     "@tiptap/extension-image": "^3.22.3",
     "@tiptap/extension-link": "^3.22.3",

backend/src/agent/chat.ts CHANGED

@@ -3,13 +3,24 @@ import { SYSTEM_PROMPT, buildMessages } from "./system-prompt.js";
 import { streamChatResponse } from "./stream-handler.js";
 import type { Request, Response } from "express";
 export const AVAILABLE_MODELS = [
-  { id: "google/gemini-2.5-flash", label: "Gemini 2.5 Flash", context: "1M", cost: "$" },
-  { id: "google/gemini-2.5-pro", label: "Gemini 2.5 Pro", context: "1M", cost: "$$" },
-  { id: "anthropic/claude-sonnet-4", label: "Claude Sonnet 4", context: "200K", cost: "$$$" },
-  { id: "anthropic/claude-3.5-haiku", label: "Claude 3.5 Haiku", context: "200K", cost: "$" },
-  { id: "openai/gpt-4.1-mini", label: "GPT-4.1 Mini", context: "1M", cost: "$" },
-  { id: "openai/gpt-4.1", label: "GPT-4.1", context: "1M", cost: "$$" },
 ];
 export async function handleChat(req: Request, res: Response) {

 import { streamChatResponse } from "./stream-handler.js";
 import type { Request, Response } from "express";
+/**
+ * Models exposed in the UI picker. All ids must be served by Hugging
+ * Face Inference Providers (`https://router.huggingface.co/v1`) and
+ * support function/tool calling - the agent loop won't work without it.
+ *
+ * Discover more conversational models here:
+ *   https://huggingface.co/models?inference_provider=all&other=conversational
+ *
+ * `context` is the advertised context window; `cost` is a rough
+ * relative price tag ($, $$, $$$) - inference providers charge their
+ * own rates, see the docs for the source of truth.
+ */
 export const AVAILABLE_MODELS = [
+  { id: "openai/gpt-oss-120b", label: "GPT-OSS 120B", context: "131K", cost: "$$" },
+  { id: "openai/gpt-oss-20b", label: "GPT-OSS 20B", context: "131K", cost: "$" },
+  { id: "meta-llama/Llama-3.3-70B-Instruct", label: "Llama 3.3 70B", context: "128K", cost: "$" },
+  { id: "Qwen/Qwen3-Coder-480B-A35B-Instruct", label: "Qwen3 Coder 480B", context: "262K", cost: "$$" },
+  { id: "deepseek-ai/DeepSeek-V3.1", label: "DeepSeek V3.1", context: "128K", cost: "$$" },
 ];
 export async function handleChat(req: Request, res: Response) {

backend/src/agent/stream-handler.ts CHANGED

@@ -1,15 +1,47 @@
 import { streamText, convertToModelMessages } from "ai";
-import { createOpenRouter } from "@openrouter/ai-sdk-provider";
 import type { Request, Response } from "express";
-export const DEFAULT_MODEL = "google/gemini-2.5-flash";
-export function getProvider() {
-  const apiKey = process.env.OPENROUTER_API_KEY;
-  if (!apiKey) {
-    throw new Error("OPENROUTER_API_KEY environment variable is required");
-  }
-  return createOpenRouter({ apiKey });
 }
 interface StreamChatOptions {
@@ -24,19 +56,30 @@ export async function streamChatResponse(
   { systemPrompt, tools, logPrefix }: StreamChatOptions,
 ) {
   try {
-    const { messages, context, model } = req.body;
     if (!messages || !Array.isArray(messages)) {
       res.status(400).json({ error: "messages array is required" });
       return;
     }
-    const provider = getProvider();
-    const modelId = model || process.env.OPENROUTER_MODEL || DEFAULT_MODEL;
     const modelMessages = await convertToModelMessages(messages);
     const result = streamText({
-      model: provider.chat(modelId),
       system: systemPrompt,
       messages: modelMessages,
       tools,

 import { streamText, convertToModelMessages } from "ai";
+import { createOpenAICompatible } from "@ai-sdk/openai-compatible";
 import type { Request, Response } from "express";
+import { extractToken } from "../auth.js";
+export const DEFAULT_MODEL = "openai/gpt-oss-120b";
+/**
+ * Hugging Face Inference Providers exposes an OpenAI-compatible chat
+ * completions endpoint at `https://router.huggingface.co/v1` that routes
+ * to a fleet of providers (Cerebras, Together, Fireworks, ...). The
+ * upside: any HF user token with the `inference-api` scope can call it,
+ * so a forked Space gets AI features for free as soon as the user logs
+ * in - no extra API key to wire up.
+ *
+ * See https://huggingface.co/docs/inference-providers
+ */
+const HF_INFERENCE_BASE_URL = "https://router.huggingface.co/v1";
+/**
+ * Resolve the HF token used to authenticate inference calls.
+ *
+ * Priority:
+ *   1. The currently logged-in editor's OAuth token (forwarded from the
+ *      `hf_access_token` cookie). This is the production path on a HF
+ *      Space - no environment secret needed.
+ *   2. The `HF_TOKEN` env var fallback. Useful for local dev when OAuth
+ *      isn't configured, or as a server-side default when the OAuth
+ *      scope doesn't include `inference-api` yet.
+ */
+function resolveHfToken(req: Request): string | undefined {
+  const userToken = extractToken(req.headers.cookie);
+  if (userToken) return userToken;
+  const envToken = process.env.HF_TOKEN;
+  if (envToken) return envToken;
+  return undefined;
+}
+function createProvider(apiKey: string) {
+  return createOpenAICompatible({
+    name: "huggingface",
+    baseURL: HF_INFERENCE_BASE_URL,
+    apiKey,
+  });
 }
 interface StreamChatOptions {
   { systemPrompt, tools, logPrefix }: StreamChatOptions,
 ) {
   try {
+    const { messages, model } = req.body;
     if (!messages || !Array.isArray(messages)) {
       res.status(400).json({ error: "messages array is required" });
       return;
     }
+    const apiKey = resolveHfToken(req);
+    if (!apiKey) {
+      res.status(500).json({
+        error:
+          "No Hugging Face token available. Sign in with your HF account " +
+          "(the OAuth token is used to call Inference Providers) or set " +
+          "HF_TOKEN in the backend environment.",
+      });
+      return;
+    }
+    const provider = createProvider(apiKey);
+    const modelId = model || process.env.HF_INFERENCE_MODEL || DEFAULT_MODEL;
     const modelMessages = await convertToModelMessages(messages);
     const result = streamText({
+      model: provider.chatModel(modelId),
       system: systemPrompt,
       messages: modelMessages,
       tools,

backend/tests/agent-chat.test.ts CHANGED

@@ -3,7 +3,7 @@
  *
  * Tests for /api/chat and /api/embed-chat routes:
  * - Input validation (missing messages)
- * - Missing API key handling
  * - Model list endpoint
  */
 import { describe, it, expect, beforeEach, afterEach, vi } from "vitest";
@@ -71,9 +71,9 @@ describe("/api/chat - validation", () => {
     expect(res.body).toHaveProperty("error");
   });
-  it("returns 500 when OPENROUTER_API_KEY is missing", async () => {
-    const original = process.env.OPENROUTER_API_KEY;
-    delete process.env.OPENROUTER_API_KEY;
     const res = await request(app)
       .post("/api/chat")
@@ -83,9 +83,9 @@ describe("/api/chat - validation", () => {
       .expect(500);
     expect(res.body).toHaveProperty("error");
-    expect(res.body.error).toContain("OPENROUTER_API_KEY");
-    if (original) process.env.OPENROUTER_API_KEY = original;
   });
 });
@@ -109,9 +109,9 @@ describe("/api/embed-chat - validation", () => {
     expect(res.body).toHaveProperty("error");
   });
-  it("returns 500 when OPENROUTER_API_KEY is missing", async () => {
-    const original = process.env.OPENROUTER_API_KEY;
-    delete process.env.OPENROUTER_API_KEY;
     const res = await request(app)
       .post("/api/embed-chat")
@@ -121,8 +121,8 @@ describe("/api/embed-chat - validation", () => {
       .expect(500);
     expect(res.body).toHaveProperty("error");
-    expect(res.body.error).toContain("OPENROUTER_API_KEY");
-    if (original) process.env.OPENROUTER_API_KEY = original;
   });
 });

  *
  * Tests for /api/chat and /api/embed-chat routes:
  * - Input validation (missing messages)
+ * - Missing HF token handling
  * - Model list endpoint
  */
 import { describe, it, expect, beforeEach, afterEach, vi } from "vitest";
     expect(res.body).toHaveProperty("error");
   });
+  it("returns 500 when no HF token is available", async () => {
+    const original = process.env.HF_TOKEN;
+    delete process.env.HF_TOKEN;
     const res = await request(app)
       .post("/api/chat")
       .expect(500);
     expect(res.body).toHaveProperty("error");
+    expect(res.body.error).toContain("Hugging Face token");
+    if (original) process.env.HF_TOKEN = original;
   });
 });
     expect(res.body).toHaveProperty("error");
   });
+  it("returns 500 when no HF token is available", async () => {
+    const original = process.env.HF_TOKEN;
+    delete process.env.HF_TOKEN;
     const res = await request(app)
       .post("/api/embed-chat")
       .expect(500);
     expect(res.body).toHaveProperty("error");
+    expect(res.body.error).toContain("Hugging Face token");
+    if (original) process.env.HF_TOKEN = original;
   });
 });

docs/SPECIFICATION.md CHANGED

@@ -127,8 +127,9 @@ flowchart LR
 ### 4.6 AI Agent
-- Provider: OpenRouter (`OPENROUTER_API_KEY`), default model `anthropic/claude-sonnet-4`
-- Streaming via Vercel AI SDK `streamText`
 - **Context**: document text, current selection, frontmatter (sent by frontend with each message)
 - **Tools** (declarative, executed client-side by the frontend):
   - `replaceSelection` - replace selected text
@@ -265,7 +266,7 @@ The publisher reads these same CSS files server-side and injects them inline int
 ### 6.2 HF Space Configuration (README.md frontmatter)
 - SDK: `docker`, port `8080`
-- OAuth: `hf_oauth: true`, scopes: `manage-repos`
 - Two git remotes: `space` (tfrere/collab-editor, dev) and `prod` (tfrere/research-article-template-editor, production)
 ### 6.3 Environment Variables
@@ -278,11 +279,10 @@ The publisher reads these same CSS files server-side and injects them inline int
 | `SPACE_HOST` | For OAuth | HTTPS callback URL host |
 | `OAUTH_CLIENT_ID` | For OAuth | HF OAuth client |
 | `OAUTH_CLIENT_SECRET` | For OAuth | HF OAuth secret |
-| `OAUTH_SCOPES` | No (default `openid profile`) | OAuth scopes |
 | `HF_DATASET_ID` | No | Override dataset name (default: `{SPACE_ID}-data`) |
-| `HF_TOKEN` | No | Fallback Hub token for HF API |
-| `OPENROUTER_API_KEY` | For AI chat | OpenRouter API key |
-| `OPENROUTER_MODEL` | No | Default AI model |
 | `ENABLE_PDF` | No (default true) | Toggle PDF/thumbnail generation |
 ### 6.4 Local Development
@@ -297,7 +297,7 @@ cd frontend && npm install && npm run dev
 # Starts on http://localhost:5678 (proxies /api and /collab to :8080)
 ```
-Create a `.env` file in `backend/` with at minimum `OPENROUTER_API_KEY` for AI chat. Without `SPACE_ID`, OAuth is disabled and all users can edit.
 ---

 ### 4.6 AI Agent
+- Provider: Hugging Face Inference Providers (`https://router.huggingface.co/v1`), default model `openai/gpt-oss-120b`
+- Auth: per-request bearer token resolved from the editor's OAuth cookie when available, falling back to the server-side `HF_TOKEN`. On a HF Space with `inference-api` scope, no extra secret is needed - the logged-in user pays for their own inference under their HF quota.
+- Streaming via Vercel AI SDK `streamText` over `@ai-sdk/openai-compatible`
 - **Context**: document text, current selection, frontmatter (sent by frontend with each message)
 - **Tools** (declarative, executed client-side by the frontend):
   - `replaceSelection` - replace selected text
 ### 6.2 HF Space Configuration (README.md frontmatter)
 - SDK: `docker`, port `8080`
+- OAuth: `hf_oauth: true`, scopes: `manage-repos`, `inference-api`
 - Two git remotes: `space` (tfrere/collab-editor, dev) and `prod` (tfrere/research-article-template-editor, production)
 ### 6.3 Environment Variables
 | `SPACE_HOST` | For OAuth | HTTPS callback URL host |
 | `OAUTH_CLIENT_ID` | For OAuth | HF OAuth client |
 | `OAUTH_CLIENT_SECRET` | For OAuth | HF OAuth secret |
+| `OAUTH_SCOPES` | No (default `openid profile`) | OAuth scopes. Add `manage-repos` for dataset persistence and `inference-api` to power AI features with the user's token |
 | `HF_DATASET_ID` | No | Override dataset name (default: `{SPACE_ID}-data`) |
+| `HF_TOKEN` | For AI chat in local dev | Fallback Hub token for HF API + Inference Providers. Needs the "Make calls to Inference Providers" permission |
+| `HF_INFERENCE_MODEL` | No (default `openai/gpt-oss-120b`) | Default chat-completion model id served by HF Inference Providers |
 | `ENABLE_PDF` | No (default true) | Toggle PDF/thumbnail generation |
 ### 6.4 Local Development
 # Starts on http://localhost:5678 (proxies /api and /collab to :8080)
 ```
+Create a `.env` file in `backend/` with at minimum `HF_TOKEN` for AI chat (must have the "Make calls to Inference Providers" permission). Without `SPACE_ID`, OAuth is disabled and all users can edit.
 ---
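
For local development it can help to see what a raw call to the router looks like without the AI SDK. The sketch below only builds the request. It assumes the standard OpenAI-compatible `/chat/completions` path under `https://router.huggingface.co/v1`, and `pickModel` and `buildChatRequest` are hypothetical names that are not part of this commit. The model precedence mirrors the `model || process.env.HF_INFERENCE_MODEL || DEFAULT_MODEL` line in `stream-handler.ts`.

```typescript
const HF_INFERENCE_BASE_URL = "https://router.huggingface.co/v1";
const DEFAULT_MODEL = "openai/gpt-oss-120b";

interface ChatRequest {
  url: string;
  init: { method: "POST"; headers: Record<string, string>; body: string };
}

// Hypothetical helper: request body value first, then the env override, then the default.
function pickModel(
  requested: string | undefined,
  env: Record<string, string | undefined>,
): string {
  return requested || env.HF_INFERENCE_MODEL || DEFAULT_MODEL;
}

// Hypothetical helper: assemble a single-turn chat-completions request.
function buildChatRequest(token: string, model: string, prompt: string): ChatRequest {
  return {
    url: `${HF_INFERENCE_BASE_URL}/chat/completions`,
    init: {
      method: "POST",
      headers: {
        Authorization: `Bearer ${token}`, // HF user or server token
        "Content-Type": "application/json",
      },
      body: JSON.stringify({
        model,
        messages: [{ role: "user", content: prompt }],
      }),
    },
  };
}
```

The result can be passed to `fetch(url, init)` with the token from `backend/.env`. An authentication error at that point would suggest the token lacks the "Make calls to Inference Providers" permission, though the exact status code is not something this diff specifies.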