---
title: HTTP API Reference
---

# HTTP API Reference

Koharu exposes a local HTTP API under:

```text
http://127.0.0.1:<PORT>/api/v1
```

This is the same API used by the desktop UI and the headless Web UI.

## Runtime model

Important current behavior:

- the API is served by the same process as the GUI or headless runtime
- the server binds to `127.0.0.1` by default; use `--host` to bind elsewhere
- the API and MCP server share the same loaded project, models, and pipeline state
- when no `--port` is provided, Koharu chooses a random local port
- everything except `/api/v1/downloads`, `/api/v1/operations`, and `/api/v1/events` returns `503 Service Unavailable` until the app finishes bootstrapping
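Because most routes answer `503` during bootstrap, a client may want to poll before issuing real requests. The following is a minimal sketch using only Python's standard library; the helper names `api_url` and `wait_until_ready`, the choice of `/meta` as the probe route, and the timeout values are illustrative assumptions, not part of Koharu.

```python
import time
import urllib.error
import urllib.request


def api_url(port: int, path: str, host: str = "127.0.0.1") -> str:
    """Join the documented base URL with an endpoint path such as '/meta'."""
    return f"http://{host}:{port}/api/v1/{path.lstrip('/')}"


def wait_until_ready(port: int, timeout: float = 60.0, interval: float = 0.5) -> bool:
    """Poll GET /meta until it stops answering 503 (hypothetical helper)."""
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        try:
            with urllib.request.urlopen(api_url(port, "/meta"), timeout=5) as resp:
                return resp.status == 200
        except urllib.error.HTTPError as err:
            if err.code != 503:
                raise  # any other HTTP error is not a bootstrapping signal
        except urllib.error.URLError:
            pass  # the server socket is not accepting connections yet
        time.sleep(interval)
    return False
```

When Koharu picks a random port, the caller still has to learn that port some other way; passing `--port` explicitly keeps scripts like this simple.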

## Resource model

The API is project-centric. A single project is open at a time and contains:

- a list of `Pages` indexed by `PageId`
- per-page `Nodes` (image layers, masks, text blocks) referenced by `NodeId`
- a content-addressed `Blob` store that holds raw image bytes by Blake3 hash
- a `Scene` snapshot built from those pieces, advanced by an `epoch` counter
- a history of `Op` mutations that can be undone or redone

Mutations always go through the history layer (`POST /history/apply`) so the scene, autosave, and event subscribers stay in sync.

## Common response shapes

Frequently used response types include:

- `MetaInfo`: app version and ML device label
- `EngineCatalog`: installable engine ids per pipeline stage
- `ProjectSummary`: id, name, path, page count, last opened
- `SceneSnapshot`: `{ epoch, scene }`
- `LlmState`: current LLM load state (status, target, error)
- `LlmCatalog`: local + provider models grouped by family
- `JobSummary`: `{ id, kind, status, error }`
- `DownloadProgress`: package id, byte counts, status

## Endpoints

### Meta

| Method | Path        | Purpose                                    |
| ------ | ----------- | ------------------------------------------ |
| `GET`  | `/meta`     | get app version and active ML backend      |
| `GET`  | `/engines`  | list registered pipeline engines per stage |

### Fonts

| Method | Path                                  | Purpose                                              |
| ------ | ------------------------------------- | ---------------------------------------------------- |
| `GET`  | `/fonts`                              | combined system + Google Fonts catalog for rendering |
| `GET`  | `/google-fonts`                       | Google Fonts catalog as a standalone list            |
| `POST` | `/google-fonts/{family}/fetch`        | download and cache one Google Fonts family           |
| `GET`  | `/google-fonts/{family}/{file}`       | serve the cached TTF/WOFF file                       |

### Projects

Every project lives under the managed `{data.path}/projects/` directory; clients never supply filesystem paths.

| Method   | Path                              | Purpose                                                   |
| -------- | --------------------------------- | --------------------------------------------------------- |
| `GET`    | `/projects`                       | list managed projects                                     |
| `POST`   | `/projects`                       | create a new project (body `{ name }`)                    |
| `POST`   | `/projects/import`                | extract a `.khr` archive into a fresh dir and open it     |
| `PUT`    | `/projects/current`               | open a managed project by `id`                            |
| `DELETE` | `/projects/current`               | close the current session                                 |
| `POST`   | `/projects/current/export`        | export the current project; returns binary bytes          |

`POST /projects/current/export` accepts `{ format, pages? }` where `format` is one of `khr`, `psd`, `rendered`, `inpainted`. When the format produces multiple files, the response is `application/zip`.
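An export call could be prepared as follows. This is a sketch under assumptions: the body is sent as JSON, `pages` is a list of page ids, and the helper names `build_export_request` and `is_zip_response` are hypothetical.

```python
import json
import urllib.request

# Formats listed in this reference.
EXPORT_FORMATS = {"khr", "psd", "rendered", "inpainted"}


def build_export_request(base_url, fmt, pages=None):
    """Build (but do not send) a POST /projects/current/export request."""
    if fmt not in EXPORT_FORMATS:
        raise ValueError(f"unsupported export format: {fmt!r}")
    body = {"format": fmt}
    if pages is not None:
        body["pages"] = list(pages)  # assumed to be a list of PageId values
    return urllib.request.Request(
        f"{base_url}/projects/current/export",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def is_zip_response(content_type):
    """True when the response is a multi-file archive rather than one file."""
    return content_type.split(";")[0].strip().lower() == "application/zip"
```

Checking the `Content-Type` of the response before writing it to disk lets the caller pick a `.zip` extension only when the export actually produced multiple files.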

### Pages

| Method | Path                                    | Purpose                                              |
| ------ | --------------------------------------- | ---------------------------------------------------- |
| `POST` | `/pages`                                | create pages from N uploaded image files (multipart) |
| `POST` | `/pages/from-paths`                     | Tauri-only fast path that imports by absolute path   |
| `POST` | `/pages/{id}/image-layers`              | add a Custom image node from an uploaded file        |
| `PUT`  | `/pages/{id}/masks/{role}`              | upsert a mask node from raw PNG bytes                |
| `GET`  | `/pages/{id}/thumbnail`                 | get the page thumbnail (cached as WebP)              |

`role` is `segment` or `brushInpaint`. `POST /pages` accepts an optional `replace=true` field; the import is filename-sorted using natural order.
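Natural order means embedded numbers compare by value, so `page2.png` sorts before `page10.png`. The sketch below shows one common way to get that ordering in Python; it illustrates the idea and is not necessarily the exact algorithm the server uses (case handling in particular is an assumption).

```python
import re


def natural_key(name: str):
    """Split a filename into text and number runs so digits compare numerically."""
    parts = re.split(r"(\d+)", name)
    # With a capturing group, odd indexes are always the digit runs.
    return [int(p) if i % 2 else p.lower() for i, p in enumerate(parts)]


files = ["page10.png", "page2.png", "page1.png"]
print(sorted(files, key=natural_key))
```

Pre-sorting uploads with a key like this on the client side can help predict the resulting page order before the import runs.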

### Scene and blobs

| Method | Path                | Purpose                                                       |
| ------ | ------------------- | ------------------------------------------------------------- |
| `GET`  | `/scene.json`       | full scene snapshot for web/UI clients                        |
| `GET`  | `/scene.bin`        | postcard-encoded `Snapshot { epoch, scene }` for Tauri client |
| `GET`  | `/blobs/{hash}`     | raw blob bytes by Blake3 hash                                 |

`/scene.bin` includes the current epoch in the `x-koharu-epoch` response header.

### History (mutations)

All scene mutations go through here. Each response returns `{ epoch }`.

| Method | Path                | Purpose                                  |
| ------ | ------------------- | ---------------------------------------- |
| `POST` | `/history/apply`    | apply an `Op` (including `Op::Batch`)    |
| `POST` | `/history/undo`     | revert the last applied op               |
| `POST` | `/history/redo`     | re-apply the last undone op              |

`Op` is the discriminated union that covers add/remove/update node, add/remove page, batch, and other scene transitions. The body is the JSON-tagged variant.

### Pipelines

| Method | Path          | Purpose                                |
| ------ | ------------- | -------------------------------------- |
| `POST` | `/pipelines`  | start a pipeline run as an operation   |

Body fields:

- `steps`: engine ids to run in order (validated against the registry)
- `pages`: optional subset of `PageId`s; omit to process the whole project
- `region`: optional bounding box for the inpainter (repair-brush flow)
- `targetLanguage`, `systemPrompt`, `defaultFont`: optional per-run overrides

The response carries an `operationId`. Progress and completion arrive on `/events` as `JobStarted`, `JobProgress`, `JobWarning`, and `JobFinished`.
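A request body for `POST /pipelines` might be assembled as shown below. This is a sketch: the function name is hypothetical, the engine ids in the usage lines are placeholders (real ids come from `GET /engines`), and `region` is left out because its exact shape is not described here.

```python
def build_pipeline_body(steps, pages=None, target_language=None,
                        system_prompt=None, default_font=None):
    """Assemble the JSON body for POST /pipelines, omitting unset options."""
    if not steps:
        raise ValueError("at least one engine id is required in 'steps'")
    body = {"steps": list(steps)}
    optional = {
        "pages": list(pages) if pages is not None else None,
        "targetLanguage": target_language,
        "systemPrompt": system_prompt,
        "defaultFont": default_font,
    }
    # Only send overrides the caller actually provided.
    body.update({key: value for key, value in optional.items() if value is not None})
    return body


# Placeholder engine ids, for illustration only.
body = build_pipeline_body(["<detector-id>", "<ocr-id>"], target_language="en")
print(body)
```

Omitting `pages` entirely, rather than sending an empty list, matches the documented "omit to process the whole project" behavior.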

### Operations

`/operations` is the unified registry for in-flight and recently completed jobs (pipelines and downloads).

| Method   | Path                  | Purpose                                                    |
| -------- | --------------------- | ---------------------------------------------------------- |
| `GET`    | `/operations`         | snapshot of every in-flight or recent operation            |
| `DELETE` | `/operations/{id}`    | cancel a pipeline run; best-effort eviction for downloads  |

### Downloads

| Method | Path         | Purpose                                        |
| ------ | ------------ | ---------------------------------------------- |
| `GET`  | `/downloads` | snapshot of every active or recent download    |
| `POST` | `/downloads` | start a model-package download (`{ modelId }`) |

`modelId` is a package id declared via `declare_hf_model_package!` (e.g. `"model:comic-text-detector:yolo-v5"`). The response is `{ operationId }`, where the operation id reuses the package id.

### LLM control

The loaded model is a singleton resource at `/llm/current`.

| Method   | Path             | Purpose                                       |
| -------- | ---------------- | --------------------------------------------- |
| `GET`    | `/llm/current`   | current state (status, target, error)         |
| `PUT`    | `/llm/current`   | load the given target (local or provider)     |
| `DELETE` | `/llm/current`   | unload / release the model                    |
| `GET`    | `/llm/catalog`   | list available local + provider-backed models |

`PUT /llm/current` accepts an `LlmLoadRequest`:

- provider targets: `{ kind: "provider", providerId, modelId }`
- local targets: `{ kind: "local", modelId }`
- optional `options { temperature, maxTokens, customSystemPrompt }`

`PUT /llm/current` returns `204` once the load task is queued. The actual ready state is published as `LlmLoaded` on `/events`.
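The two target shapes can be built with a small helper. This is a sketch assuming `options` sits as a top-level key next to `kind`; the function name and the model ids in the usage lines are hypothetical.

```python
ALLOWED_OPTIONS = {"temperature", "maxTokens", "customSystemPrompt"}


def llm_load_request(model_id, provider_id=None, options=None):
    """Build an LlmLoadRequest body for PUT /llm/current."""
    if provider_id is None:
        body = {"kind": "local", "modelId": model_id}
    else:
        body = {"kind": "provider", "providerId": provider_id, "modelId": model_id}
    if options:
        unknown = set(options) - ALLOWED_OPTIONS
        if unknown:
            raise ValueError(f"unknown LLM options: {sorted(unknown)}")
        body["options"] = dict(options)
    return body


# Hypothetical model ids, for illustration only.
print(llm_load_request("some-local-model"))
print(llm_load_request("some-hosted-model", provider_id="openai",
                       options={"temperature": 0.2}))
```

Since the `204` only confirms that the load was queued, a caller that needs the model immediately should wait for `LlmLoaded` on `/events` or poll `GET /llm/current` before starting a pipeline.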

### Config

| Method   | Path                                    | Purpose                                         |
| -------- | --------------------------------------- | ----------------------------------------------- |
| `GET`    | `/config`                               | read the current `AppConfig`                    |
| `PATCH`  | `/config`                               | apply a `ConfigPatch`; persists and broadcasts  |
| `PUT`    | `/config/providers/{id}/secret`         | save (or overwrite) a provider's API key        |
| `DELETE` | `/config/providers/{id}/secret`         | clear a provider's stored API key               |

`AppConfig` exposes top-level `data`, `http`, `pipeline`, and `providers`:

- `data.path`: local data directory used for runtime, model cache, and projects
- `http { connectTimeout, readTimeout, maxRetries }`: shared HTTP client used by downloads and provider-backed requests
- `pipeline { detector, fontDetector, segmenter, bubbleSegmenter, ocr, translator, inpainter, renderer }`: engine id selected for each stage
- `providers[] { id, baseUrl?, apiKey? }`: saved API keys round-trip as the redacted placeholder `"[REDACTED]"`, never the raw secret

Built-in provider ids:

- `openai`
- `gemini`
- `claude`
- `deepseek`
- `deepl`
- `google-translate`
- `caiyun`
- `openai-compatible`

API keys are stored in the platform credential store, not in `config.toml`. PATCHing `apiKey: ""` clears the saved key; PATCHing `"[REDACTED]"` leaves it unchanged. The dedicated `/config/providers/{id}/secret` routes are the explicit, non-PATCH way to manage one provider's secret.
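The `apiKey` rules above can be modeled as a tiny classifier, which is useful when a client echoes a fetched config back in a PATCH. This is a sketch of the documented semantics only; the function name is hypothetical, and treating any other non-empty string as a new key to store is an assumption.

```python
REDACTED_PLACEHOLDER = "[REDACTED]"


def classify_api_key_patch(value: str) -> str:
    """Describe what PATCHing this apiKey value is expected to do."""
    if value == "":
        return "clear"      # documented: an empty string clears the saved key
    if value == REDACTED_PLACEHOLDER:
        return "unchanged"  # documented: the placeholder leaves the key as-is
    return "set"            # assumption: any other value is stored as the new key


for candidate in ("", "[REDACTED]", "sk-example"):
    print(repr(candidate), "->", classify_api_key_patch(candidate))
```

The practical consequence is that a client can send back a provider entry exactly as it received it from `GET /config` without wiping the stored secret.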

## Events stream

Koharu exposes a Server-Sent Events stream at:

```text
GET /events
```

Behavior:

- a fresh connection (no `Last-Event-ID` header) starts with a `Snapshot` event holding the current jobs and downloads registries
- on reconnect, the server replays buffered events with `seq > Last-Event-ID` in order; if the requested id has scrolled out of the ring, the server re-sends a `Snapshot`
- each live event is emitted with its `seq` as the SSE `id:` field
- the stream is kept open with a 15-second keep-alive interval

Event variants currently include:

- `Snapshot`: full state seed for fresh and lag-recovery clients
- `JobStarted`, `JobProgress`, `JobWarning`, `JobFinished`: pipeline job lifecycle
- `DownloadProgress`: package download progress ticks
- `ConfigChanged`: config was applied via `PATCH /config` or a secret route
- `LlmLoaded`, `LlmUnloaded`: LLM lifecycle transitions
- `SceneAdvanced`: emitted when a scene mutation advances the epoch
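To resume after a dropped connection, a client needs to remember the last `id:` it saw and send it back as `Last-Event-ID`. The sketch below parses generic SSE framing from an iterable of text lines and builds the reconnect headers. It leaves `data` as a raw string because the event payload encoding is not specified here, and it assumes keep-alives arrive as SSE comment lines (lines starting with `:`). Both function names are hypothetical.

```python
from typing import Dict, Iterable, Iterator, Optional, Tuple


def parse_sse(lines: Iterable[str]) -> Iterator[Tuple[Optional[str], str]]:
    """Yield (last_event_id, data) for each event in an SSE line stream."""
    event_id: Optional[str] = None
    data = []
    for raw in lines:
        line = raw.rstrip("\r\n")
        if line == "":
            # A blank line ends the current event; skip empty dispatches.
            if data:
                yield event_id, "\n".join(data)
            data = []
            continue
        if line.startswith(":"):
            continue  # comment line, assumed to be how keep-alives show up
        field, _, value = line.partition(":")
        if value.startswith(" "):
            value = value[1:]
        if field == "id":
            event_id = value
        elif field == "data":
            data.append(value)


def reconnect_headers(last_event_id: Optional[str]) -> Dict[str, str]:
    """Headers for GET /events; include Last-Event-ID only when resuming."""
    headers = {"Accept": "text/event-stream"}
    if last_event_id:
        headers["Last-Event-ID"] = last_event_id
    return headers
```

Omitting `Last-Event-ID` on the first connection is what triggers the initial `Snapshot`; sending it afterwards lets the server replay only the missed events.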

## Typical workflow

The typical call order for a new project is:

1. `POST /projects`: create the project
2. `POST /pages` (or `/pages/from-paths` from Tauri): import images
3. `PUT /llm/current`: load a translation model (local or provider)
4. `POST /pipelines`: kick off `detect → ocr → translate → inpaint → render`
5. tail `GET /events` until `JobFinished`
6. `POST /projects/current/export` with `format = "rendered"` or `"psd"`
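The sequence above can be sketched as a small orchestration function. To stay transport-agnostic, it takes three caller-supplied callables, all hypothetical: `send(method, path, body)` performs one JSON request and returns the parsed response, `upload_pages()` performs the multipart `POST /pages` upload, and `wait_for_job(operation_id)` blocks on `/events` until `JobFinished`. It also assumes the `/pipelines` response is an object with an `operationId` key.

```python
def run_workflow(send, upload_pages, wait_for_job, *, name, steps,
                 llm_target, export_format="rendered"):
    """Run the documented call order once and return the export response."""
    send("POST", "/projects", {"name": name})           # 1. create the project
    upload_pages()                                      # 2. import images
    send("PUT", "/llm/current", llm_target)             # 3. queue the model load
    started = send("POST", "/pipelines", {"steps": list(steps)})  # 4. start run
    wait_for_job(started["operationId"])                # 5. wait for JobFinished
    return send("POST", "/projects/current/export",     # 6. export the result
                {"format": export_format})
```

Injecting the transport keeps the ordering logic easy to test with fakes and leaves the choice of HTTP client to the caller.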

For finer control, send `POST /history/apply` requests with explicit `Op` payloads instead of running a full pipeline.

If you want agent-oriented access instead of HTTP endpoint orchestration, see [MCP Tools Reference](mcp-tools.md).