Mayo commited on
Commit
15ccd87
·
unverified ·
1 Parent(s): 9e4d84c

docs: more content

Browse files
.github/CONTRIBUTING.md CHANGED
@@ -1,13 +1,34 @@
1
  # Contributing
2
 
3
- We welcome contributions! Please ensure your code is:
4
 
5
- - Well-structured and follows existing conventions
6
- - Tested and passing all checks
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
7
 
8
  ## AI-Generated PRs
9
 
10
  AI-generated contributions are welcome, provided:
11
 
12
- 1. A human has reviewed the code before opening the PR
13
- 2. The submitter understands the changes being made
 
1
  # Contributing
2
 
3
+ Thanks for contributing to Koharu.
4
 
5
+ For the full contributor guide, including local setup, validation commands, and docs workflow, see:
6
+
7
+ - [`docs/how-to/contributing.md`](../docs/how-to/contributing.md)
8
+
9
+ In short, contributors should:
10
+
11
+ - follow existing code and UI patterns
12
+ - run the checks that match the area they changed
13
+ - explain what changed and how they verified it in the PR
14
+
15
+ Useful local commands:
16
+
17
+ ```bash
18
+ bun install
19
+ bun run build
20
+ bun cargo fmt -- --check
21
+ bun cargo check
22
+ bun cargo clippy -- -D warnings
23
+ bun cargo test --workspace --tests
24
+ bun run format
25
+ bun run test:e2e
26
+ zensical build -c
27
+ ```
28
 
29
  ## AI-Generated PRs
30
 
31
  AI-generated contributions are welcome, provided:
32
 
33
+ 1. A human has reviewed the code before opening the PR.
34
+ 2. The submitter understands the changes being made.
README.md CHANGED
@@ -1,209 +1,231 @@
1
- # Koharu
2
-
3
- [Documentation](https://koharu.rs)
4
-
5
- ML-powered manga translator, written in **Rust**.
6
-
7
- Koharu introduces a new workflow for manga translation, utilizing the power of ML to automate the process. It combines the capabilities of object detection, OCR, inpainting, and LLMs to create a seamless translation experience.
8
-
9
- Under the hood, Koharu uses [candle](https://github.com/huggingface/candle) and [llama.cpp](https://github.com/ggml-org/llama.cpp) for high-performance inference, and uses [Tauri](https://github.com/tauri-apps/tauri) for the GUI. All components are written in Rust, ensuring safety and speed.
10
-
11
- > [!NOTE]
12
- > Koharu runs its vision models and local LLMs **locally** on your machine by default. If you choose a remote LLM provider, Koharu sends translation text only to the provider you configured. Koharu itself does not collect user data.
13
-
14
- ---
15
-
16
- ![screenshot](docs/assets/koharu-screenshot-en.png)
17
-
18
- > [!NOTE]
19
- > For help and support, please join our [Discord server](https://discord.gg/mHvHkxGnUY).
20
-
21
- ## Features
22
-
23
- - Automatic speech bubble detection and segmentation
24
- - OCR for manga text recognition
25
- - Inpainting to remove original text from images
26
- - LLM-powered translation
27
- - Vertical text layout for CJK languages
28
- - Export to layered PSD with editable text
29
- - MCP server for AI agents
30
-
31
- ## Usage
32
-
33
- ### Hot keys
34
-
35
- - <kbd>Ctrl</kbd> + Mouse Wheel: Zoom in/out
36
- - <kbd>Ctrl</kbd> + Drag: Pan the canvas
37
- - <kbd>Del</kbd>: Delete selected text block
38
-
39
- ### Export
40
-
41
- Koharu can export the current page as a rendered image or as a layered Photoshop PSD. PSD export preserves helper layers and writes translated text as editable text layers for further cleanup in Photoshop.
42
-
43
- ### MCP Server
44
-
45
- Koharu has a built-in MCP server that can be used to integrate with AI agents. By default, the MCP server will listen on a random port, but you can specify the port using the `--port` flag.
46
-
47
- ```bash
48
- # macOS / Linux
49
- koharu --port 9999
50
- # Windows
51
- koharu.exe --port 9999
52
- ```
53
-
54
- You can input `http://localhost:9999/mcp` into the MCP server URL field in your AI agent.
55
-
56
- ### Headless Mode
57
-
58
- Koharu can be run in headless mode via command line.
59
-
60
- ```bash
61
- # macOS / Linux
62
- koharu --port 4000 --headless
63
- # Windows
64
- koharu.exe --port 4000 --headless
65
- ```
66
-
67
- You can now access Koharu Web UI at `http://localhost:4000`.
68
-
69
- ## GPU acceleration
70
-
71
- CUDA, Metal and Vulkan are supported for GPU acceleration, significantly improving performance on supported hardware.
72
-
73
- ### CUDA (NVIDIA GPUs on Windows)
74
-
75
- Koharu is built with CUDA support on Windows, allowing it to leverage the power of NVIDIA GPUs for faster processing.
76
-
77
- Koharu bundles CUDA toolkit 13.1, dylibs will be automatically extracted to the application data directory on first run.
78
-
79
- > [!NOTE]
80
- > Please ensure that your system has the latest NVIDIA drivers installed. You can download the latest drivers via [NVIDIA App](https://www.nvidia.com/en-us/software/nvidia-app/).
81
-
82
- #### Supported NVIDIA GPUs
83
-
84
- Koharu supports NVIDIA GPUs with compute capability 7.5 or higher.
85
-
86
- Please make sure your GPU is supported by checking the [CUDA GPU Compute Capability](https://developer.nvidia.com/cuda-gpus) and the [cuDNN Support Matrix](https://docs.nvidia.com/deeplearning/cudnn/backend/latest/reference/support-matrix.html).
87
-
88
- ### Metal (Apple Silicon on macOS)
89
-
90
- Koharu supports Metal for GPU acceleration on macOS with Apple Silicon (M1, M2, etc.). This allows Koharu to run efficiently on a wide range of Apple devices.
91
-
92
- ### Vulkan (Windows and Linux)
93
-
94
- Koharu also supports Vulkan for GPU acceleration on Windows and Linux. Vulkan is a cross-platform graphics and compute API that provides high performance and low overhead.
95
-
96
- Note that Vulkan support only applies to the OCR and LLM inference, while the detection and inpainting models still rely on CUDA or Metal. AMD and Intel GPUs can use Vulkan for acceleration, but for the best experience with all features enabled, a CUDA-compatible NVIDIA GPU or Apple Silicon device is recommended.
97
-
98
- ### CPU fallback
99
-
100
- You can always force Koharu to use CPU for inference:
101
-
102
- ```bash
103
- # macOS / Linux
104
- koharu --cpu
105
- # Windows
106
- koharu.exe --cpu
107
- ```
108
-
109
- ## ML Models
110
-
111
- Koharu relies on a mixin of computer vision and natural language processing models to perform its tasks.
112
-
113
- ### Computer Vision Models
114
-
115
- Koharu uses several pre-trained models for different tasks:
116
-
117
- - [PP-DocLayoutV3](https://huggingface.co/PaddlePaddle/PP-DocLayoutV3_safetensors) for text detection and layout analysis
118
- - [comic-text-detector](https://huggingface.co/mayocream/comic-text-detector) for text segmentation
119
- - [PaddleOCR-VL-1.5](https://huggingface.co/PaddlePaddle/PaddleOCR-VL-1.5) for OCR text recognition
120
- - [lama-manga](https://huggingface.co/mayocream/lama-manga) for inpainting
121
- - [YuzuMarker.FontDetection](https://huggingface.co/fffonion/yuzumarker-font-detection) for font and color detection
122
-
123
- The models will be automatically downloaded when you run Koharu for the first time.
124
-
125
- We convert the original models to safetensors format for better performance and compatibility with Rust. The converted models are hosted on [Hugging Face](https://huggingface.co/mayocream).
126
-
127
- ### Large Language Models
128
-
129
- Koharu supports both local and remote LLM backends, and preselects a model based on your system locale when possible.
130
-
131
- #### Local LLMs
132
-
133
- Koharu supports various quantized LLMs in GGUF format via [llama.cpp](https://github.com/ggml-org/llama.cpp). These models run on your machine and are downloaded on demand when you select them in Settings. Supported models and suggested usage:
134
-
135
- For translating to English:
136
-
137
- - [vntl-llama3-8b-v2](https://huggingface.co/lmg-anon/vntl-llama3-8b-v2-gguf): ~8.5 GB Q8_0 weight size and suggests >=10 GB VRAM or plenty of system RAM for CPU inference, best when accuracy matters most.
138
- - [lfm2-350m-enjp-mt](https://huggingface.co/LiquidAI/LFM2-350M-ENJP-MT-GGUF): ultra-light (≈350M, Q8_0); runs comfortably on CPUs and low-memory GPUs, ideal for quick previews or low-spec machines at the cost of quality.
139
-
140
- For translating to Chinese:
141
-
142
- - [sakura-galtransl-7b-v3.7](https://huggingface.co/SakuraLLM/Sakura-GalTransl-7B-v3.7): ~6.3 GB and fits on 8 GB VRAM, good balance of quality and speed.
143
- - [sakura-1.5b-qwen2.5-v1.0](https://huggingface.co/shing3232/Sakura-1.5B-Qwen2.5-v1.0-GGUF-IMX): lightweight (≈1.5B, Q5KS); fits on mid-range GPUs (4–6 GB VRAM) or CPU-only setups with moderate RAM, faster than 7B/8B while keeping Qwen-style tokenizer behavior.
144
-
145
- For other languages, you may use:
146
-
147
- - [hunyuan-7b-mt-v1.0](https://huggingface.co/Mungert/Hunyuan-MT-7B-GGUF): ~6.3GB and fits on 8 GB VRAM, decent multi-language translation quality.
148
-
149
- LLMs will be automatically downloaded on demand when you select a model in the settings. Choose the smallest model that meets your quality needs if you are memory-bound; prefer the 7B/8B variants when you have sufficient VRAM/RAM for better translations.
150
-
151
- #### Remote LLMs
152
-
153
- Koharu can also translate through remote or self-hosted API providers instead of a downloaded local model. Supported remote providers:
154
-
155
- - OpenAI
156
- - Gemini
157
- - Claude
158
- - DeepSeek
159
- - OpenAI Compatible, including tools and services such as LM Studio, OpenRouter, or any endpoint that exposes the OpenAI-style `/v1/models` and `/v1/chat/completions` APIs
160
-
161
- Remote providers are configured in **Settings > API Keys**. For OpenAI Compatible, you also set a custom base URL. API keys are optional for local servers like LM Studio, but typically required for hosted services like OpenRouter.
162
-
163
- Use remote providers when you want to avoid local model downloads, reduce local VRAM/RAM usage, or connect Koharu to a hosted model. Keep in mind that OCR text selected for translation is sent to the configured provider.
164
-
165
- ## Installation
166
-
167
- You can download the latest release of Koharu from the [releases page](https://github.com/mayocream/koharu/releases/latest).
168
-
169
- We provide pre-built binaries for Windows, macOS, and Linux. For other platforms, you may need to build from source, see the [Development](#development) section below.
170
-
171
- ## Development
172
-
173
- To build Koharu from source, follow the steps below.
174
-
175
- ### Prerequisites
176
-
177
- - [Rust](https://www.rust-lang.org/tools/install) (1.92 or later)
178
- - [Bun](https://bun.sh/) (1.0 or later)
179
-
180
- ### Install dependencies
181
-
182
- ```bash
183
- bun install
184
- ```
185
-
186
- ### Build
187
-
188
- ```bash
189
- bun run build
190
- ```
191
-
192
- The built binaries will be located in the `target/release` directory.
193
-
194
- ## Sponsorship
195
-
196
- If you find Koharu useful, consider sponsoring the project to support its development!
197
-
198
- - [GitHub Sponsors](https://github.com/sponsors/mayocream)
199
- - [Patreon](https://www.patreon.com/mayocream)
200
-
201
- ## Contributors
202
-
203
- <a href="https://github.com/mayocream/koharu/graphs/contributors">
204
- <img src="https://contrib.rocks/image?repo=mayocream/koharu" />
205
- </a>
206
-
207
- ## License
208
-
209
- Koharu is licensed under the [GNU General Public License v3.0](LICENSE).
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
1
+ # Koharu
2
+
3
+ [Documentation](https://koharu.rs)
4
+
5
+ ML-powered manga translator, written in **Rust**.
6
+
7
+ Koharu introduces a new workflow for manga translation, utilizing the power of ML to automate the process. It combines the capabilities of object detection, OCR, inpainting, and LLMs to create a seamless translation experience.
8
+
9
+ Under the hood, Koharu uses [candle](https://github.com/huggingface/candle) and [llama.cpp](https://github.com/ggml-org/llama.cpp) for high-performance inference, and uses [Tauri](https://github.com/tauri-apps/tauri) for the GUI. All components are written in Rust, ensuring safety and speed.
10
+
11
+ > [!NOTE]
12
+ > Koharu runs its vision models and local LLMs **locally** on your machine by default. If you choose a remote LLM provider, Koharu sends translation text only to the provider you configured. Koharu itself does not collect user data.
13
+
14
+ ---
15
+
16
+ ![screenshot](docs/assets/koharu-screenshot-en.png)
17
+
18
+ > [!NOTE]
19
+ > For help and support, join the [Discord server](https://discord.gg/mHvHkxGnUY).
20
+
21
+ ## Features
22
+
23
+ - Automatic speech bubble detection and segmentation
24
+ - OCR for manga text recognition
25
+ - Inpainting to remove original text from images
26
+ - LLM-powered translation
27
+ - Vertical text layout for CJK languages
28
+ - Export to layered PSD with editable text
29
+ - Local HTTP API and MCP server for automation
30
+
31
+ If you just want to get started, see [Install Koharu](https://koharu.rs/how-to/install-koharu/) and [Translate Your First Page](https://koharu.rs/tutorials/translate-your-first-page/).
32
+
33
+ ## Usage
34
+
35
+ ### Hot keys
36
+
37
+ - <kbd>Ctrl</kbd> + Mouse Wheel: Zoom in/out
38
+ - <kbd>Ctrl</kbd> + Drag: Pan the canvas
39
+ - <kbd>Del</kbd>: Delete selected text block
40
+
41
+ ### Export
42
+
43
+ Koharu can export the current page as a rendered image or as a layered Photoshop PSD. PSD export preserves helper layers and writes translated text as editable text layers, which makes manual cleanup much easier when the automatic pass gets most of the way there.
44
+
45
+ For export behavior, PSD contents, and file naming, see [Export Pages and Manage Projects](https://koharu.rs/how-to/export-and-manage-projects/).
46
+
47
+ ### MCP Server
48
+
49
+ Koharu has a built-in MCP server for AI agents. By default it listens on a random port, but you can pin it with the `--port` flag.
50
+
51
+ ```bash
52
+ # macOS / Linux
53
+ koharu --port 9999
54
+ # Windows
55
+ koharu.exe --port 9999
56
+ ```
57
+
58
+ Then point your client at `http://localhost:9999/mcp`.
59
+
60
+ For local setup and the available tools, see [Run GUI, Headless, and MCP Modes](https://koharu.rs/how-to/run-gui-headless-and-mcp/), [Configure MCP Clients](https://koharu.rs/how-to/configure-mcp-clients/), and [MCP Tools Reference](https://koharu.rs/reference/mcp-tools/).
61
+
62
+ ### Headless Mode
63
+
64
+ Koharu can also run without the desktop window.
65
+
66
+ ```bash
67
+ # macOS / Linux
68
+ koharu --port 4000 --headless
69
+ # Windows
70
+ koharu.exe --port 4000 --headless
71
+ ```
72
+
73
+ You can then open the web UI at `http://localhost:4000`.
74
+
75
+ For runtime modes, ports, and local endpoints, see [Run GUI, Headless, and MCP Modes](https://koharu.rs/how-to/run-gui-headless-and-mcp/).
76
+
77
+ ## GPU acceleration
78
+
79
+ Koharu supports CUDA, Metal, and Vulkan for acceleration. CPU fallback is always available if the accelerated path is unavailable or not worth the trouble on your system.
80
+
81
+ ### CUDA (NVIDIA GPUs on Windows)
82
+
83
+ Koharu is built with CUDA support on Windows so it can use NVIDIA GPUs for the full local pipeline.
84
+
85
+ Koharu bundles CUDA Toolkit 13.1. The required DLLs are extracted to the application data directory on first run.
86
+
87
+ > [!NOTE]
88
+ > Make sure you have current NVIDIA drivers installed. You can update them through [NVIDIA App](https://www.nvidia.com/en-us/software/nvidia-app/).
89
+
90
+ #### Supported NVIDIA GPUs
91
+
92
+ Koharu supports NVIDIA GPUs with compute capability 7.5 or higher.
93
+
94
+ If you want to confirm GPU support, see [CUDA GPU Compute Capability](https://developer.nvidia.com/cuda-gpus) and the [cuDNN Support Matrix](https://docs.nvidia.com/deeplearning/cudnn/backend/latest/reference/support-matrix.html).
95
+
96
+ ### Metal (Apple Silicon on macOS)
97
+
98
+ Koharu supports Metal on Apple Silicon Macs. That gives you local acceleration without any extra setup beyond the normal app install.
99
+
100
+ ### Vulkan (Windows and Linux)
101
+
102
+ Koharu also supports Vulkan on Windows and Linux. This path is mainly used for OCR and local LLM inference.
103
+
104
+ Detection and inpainting still depend on CUDA or Metal, so Vulkan is helpful but not a full replacement for the main accelerated path. AMD and Intel GPUs can still benefit from it, but the best all-around experience is still NVIDIA on Windows or Apple Silicon on macOS.
105
+
106
+ ### CPU fallback
107
+
108
+ You can always force Koharu to use CPU for inference:
109
+
110
+ ```bash
111
+ # macOS / Linux
112
+ koharu --cpu
113
+ # Windows
114
+ koharu.exe --cpu
115
+ ```
116
+
117
+ For backend selection, fallback behavior, and model runtime support, see [Acceleration and Runtime](https://koharu.rs/explanation/acceleration-and-runtime/).
118
+
119
+ ## ML Models
120
+
121
+ Koharu uses a mix of computer vision and language models rather than trying to solve the whole page with one model.
122
+
123
+ ### Computer Vision Models
124
+
125
+ Koharu uses several pre-trained models for different parts of the pipeline:
126
+
127
+ - [PP-DocLayoutV3](https://huggingface.co/PaddlePaddle/PP-DocLayoutV3_safetensors) for text detection and layout analysis
128
+ - [comic-text-detector](https://huggingface.co/mayocream/comic-text-detector) for text segmentation
129
+ - [PaddleOCR-VL-1.5](https://huggingface.co/PaddlePaddle/PaddleOCR-VL-1.5) for OCR text recognition
130
+ - [lama-manga](https://huggingface.co/mayocream/lama-manga) for inpainting
131
+ - [YuzuMarker.FontDetection](https://huggingface.co/fffonion/yuzumarker-font-detection) for font and color detection
132
+
133
+ The models are downloaded automatically when you run Koharu for the first time.
134
+
135
+ We convert the upstream weights to safetensors format for better compatibility and runtime behavior in Rust. The converted weights are hosted on [Hugging Face](https://huggingface.co/mayocream).
136
+
137
+ For a closer look at the pipeline, see [Models and Providers](https://koharu.rs/explanation/models-and-providers/) and the [Technical Deep Dive](https://koharu.rs/explanation/technical-deep-dive/).
138
+
139
+ ### Large Language Models
140
+
141
+ Koharu supports both local and remote LLM backends, and it tries to preselect a sensible model based on your system locale when possible.
142
+
143
+ #### Local LLMs
144
+
145
+ Koharu supports quantized GGUF models through [llama.cpp](https://github.com/ggml-org/llama.cpp). These models run on your machine and are downloaded on demand when you select them in Settings. Supported models and suggested usage:
146
+
147
+ For translating to English:
148
+
149
+ - [vntl-llama3-8b-v2](https://huggingface.co/lmg-anon/vntl-llama3-8b-v2-gguf): around 8.5 GB in Q8_0, best when translation quality matters more than speed or memory use
150
+ - [lfm2-350m-enjp-mt](https://huggingface.co/LiquidAI/LFM2-350M-ENJP-MT-GGUF): very small and easy to run on CPUs or low-memory GPUs, good for quick previews and low-spec machines
151
+
152
+ For translating to Chinese:
153
+
154
+ - [sakura-galtransl-7b-v3.7](https://huggingface.co/SakuraLLM/Sakura-GalTransl-7B-v3.7): around 6.3 GB, a good balance of quality and speed on 8 GB GPUs
155
+ - [sakura-1.5b-qwen2.5-v1.0](https://huggingface.co/shing3232/Sakura-1.5B-Qwen2.5-v1.0-GGUF-IMX): lighter and faster, useful on mid-range GPUs or CPU-only setups
156
+
157
+ For other languages, you can use:
158
+
159
+ - [hunyuan-7b-mt-v1.0](https://huggingface.co/Mungert/Hunyuan-MT-7B-GGUF): around 6.3 GB, with decent multilingual translation quality
160
+
161
+ LLMs are downloaded on demand when you pick a model in Settings. If you are memory-bound, start small. If you have enough VRAM or RAM, the 7B and 8B models usually produce better translations.
162
+
163
+ #### Remote LLMs
164
+
165
+ Koharu can also translate through remote or self-hosted API providers instead of a downloaded local model. Supported remote providers:
166
+
167
+ - OpenAI
168
+ - Gemini
169
+ - Claude
170
+ - DeepSeek
171
+ - OpenAI Compatible, including LM Studio, OpenRouter, or any endpoint that exposes the OpenAI-style `/v1/models` and `/v1/chat/completions` APIs
172
+
173
+ Remote providers are configured in **Settings > API Keys**. OpenAI-compatible providers also need a custom base URL. API keys are optional for local servers such as LM Studio, but usually required for hosted services such as OpenRouter.
174
+
175
+ Use a remote provider if you do not want to download local models, if you want to keep VRAM and RAM usage down, or if you already have a hosted model endpoint. Keep in mind that the OCR text selected for translation is sent to the provider you configured.
176
+
177
+ For LM Studio, OpenRouter, and other OpenAI-style endpoints, see [Use OpenAI-Compatible APIs](https://koharu.rs/how-to/use-openai-compatible-api/). For provider configuration, see [Settings Reference](https://koharu.rs/reference/settings/).
178
+
179
+ ## Installation
180
+
181
+ You can download the latest release of Koharu from the [releases page](https://github.com/mayocream/koharu/releases/latest).
182
+
183
+ We provide pre-built binaries for Windows, macOS, and Linux. For the normal install flow, see [Install Koharu](https://koharu.rs/how-to/install-koharu/). If something goes wrong, see [Troubleshooting](https://koharu.rs/how-to/troubleshooting/).
184
+
185
+ ## Development
186
+
187
+ To build Koharu from source, follow the steps below.
188
+
189
+ ### Prerequisites
190
+
191
+ - [Rust](https://www.rust-lang.org/tools/install) 1.92 or later
192
+ - [Bun](https://bun.sh/) 1.0 or later
193
+
194
+ ### Install dependencies
195
+
196
+ ```bash
197
+ bun install
198
+ ```
199
+
200
+ ### Build
201
+
202
+ ```bash
203
+ bun run build
204
+ ```
205
+
206
+ If you want more direct control over the Tauri build:
207
+
208
+ ```bash
209
+ bun tauri build --release --no-bundle
210
+ ```
211
+
212
+ The built binaries will be located in `target/release`.
213
+
214
+ For platform-specific build notes, see [Build From Source](https://koharu.rs/how-to/build-from-source/). For the local development workflow, see [Contributing](https://koharu.rs/how-to/contributing/).
215
+
216
+ ## Sponsorship
217
+
218
+ If you find Koharu useful, consider sponsoring the project to support its development.
219
+
220
+ - [GitHub Sponsors](https://github.com/sponsors/mayocream)
221
+ - [Patreon](https://www.patreon.com/mayocream)
222
+
223
+ ## Contributors
224
+
225
+ <a href="https://github.com/mayocream/koharu/graphs/contributors">
226
+ <img src="https://contrib.rocks/image?repo=mayocream/koharu" />
227
+ </a>
228
+
229
+ ## License
230
+
231
+ Koharu is licensed under the [GNU General Public License v3.0](LICENSE).
docs/explanation/how-koharu-works.md CHANGED
@@ -4,20 +4,73 @@ title: How Koharu Works
4
 
5
  # How Koharu Works
6
 
7
- Koharu is built around a translation pipeline for manga pages.
8
 
9
- ## The core workflow
10
 
11
- For a typical page, Koharu combines several stages:
 
 
 
 
 
 
 
 
 
 
 
 
12
 
13
- 1. Text detection and layout analysis
14
- 2. Text region segmentation
15
- 3. OCR text recognition
16
- 4. Inpainting to remove original text
17
- 5. LLM-based translation
18
- 6. Text rendering and export
19
 
20
- This lets one application handle both the language work and much of the visual cleanup.
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
21
 
22
  ## Why the stack matters
23
 
@@ -36,3 +89,7 @@ By default, Koharu runs:
36
  - local LLMs locally
37
 
38
  If you configure a remote LLM provider, Koharu sends only the text selected for translation to that provider.
 
 
 
 
 
4
 
5
  # How Koharu Works
6
 
7
+ Koharu is built around a page pipeline for manga translation. The user-facing workflow is simple, but the implementation intentionally separates layout, segmentation, OCR, inpainting, translation, and rendering into different stages.
8
 
9
+ ## The pipeline at a glance
10
 
11
+ ```mermaid
12
+ flowchart LR
13
+ A[Input manga page] --> B[Detect stage]
14
+ B --> B1[Layout analysis]
15
+ B --> B2[Segmentation mask]
16
+ B --> B3[Font hints]
17
+ B1 --> C[OCR stage]
18
+ B2 --> D[Inpaint stage]
19
+ C --> E[LLM translation stage]
20
+ D --> F[Render stage]
21
+ E --> F
22
+ F --> G[Localized page or PSD export]
23
+ ```
24
 
25
+ At the public pipeline level, Koharu runs:
 
 
 
 
 
26
 
27
+ 1. `Detect`
28
+ 2. `OCR`
29
+ 3. `Inpaint`
30
+ 4. `LLM Generate`
31
+ 5. `Render`
32
+
33
+ The important implementation detail is that `Detect` is already a multi-model stage:
34
+
35
+ - `PP-DocLayoutV3` finds text-like layout regions and reading order.
36
+ - `comic-text-detector` produces a per-pixel text probability map.
37
+ - `YuzuMarker.FontDetection` estimates font and color hints for later rendering.
38
+
39
+ That split is why Koharu can use one model to decide where text belongs on the page and another to decide which exact pixels should be removed.
40
+
41
+ ## What each stage produces
42
+
43
+ | Stage | Main models | Main output |
44
+ | --- | --- | --- |
45
+ | Detect | `PP-DocLayoutV3`, `comic-text-detector`, `YuzuMarker.FontDetection` | text blocks, segmentation mask, font hints |
46
+ | OCR | `PaddleOCR-VL-1.5` | recognized source text for each block |
47
+ | Inpaint | `lama-manga` | page with original text removed |
48
+ | LLM Generate | local GGUF LLM or remote provider | translated text |
49
+ | Render | Koharu renderer | final localized page or export |
50
+
51
+ ## Why the stages are separate
52
+
53
+ Manga pages are harder than plain document OCR:
54
+
55
+ - speech bubbles are irregular and often curved
56
+ - Japanese text may be vertical while captions or SFX may be horizontal
57
+ - text can overlap artwork, screentones, speed lines, and panel borders
58
+ - reading order is part of the page structure, not just the raw pixels
59
+
60
+ Because of that, one model is usually not enough. Koharu first estimates layout, then runs OCR on cropped regions, then uses a segmentation mask for cleanup, and only then asks an LLM to translate the text.
61
+
62
+ ## The implementation shape
63
+
64
+ In source terms, the pipeline entrypoint runs in `koharu-pipeline/src/pipeline.rs`, while the vision stack is coordinated in `koharu-ml/src/facade.rs`.
65
+
66
+ Some implementation details that matter:
67
+
68
+ - the detect stage uses `PP-DocLayoutV3` first and converts text-like layout labels into `TextBlock` objects
69
+ - overlapping boxes are deduplicated before OCR
70
+ - text direction is inferred from region aspect ratio so vertical manga text can be handled earlier
71
+ - OCR runs on cropped text regions, not on the full page
72
+ - inpainting consumes the current segmentation mask, not just a rectangular box
73
+ - when you choose a remote LLM provider, Koharu sends OCR text for translation, not the full page image
74
 
75
  ## Why the stack matters
76
 
 
89
  - local LLMs locally
90
 
91
  If you configure a remote LLM provider, Koharu sends only the text selected for translation to that provider.
92
+
93
+ ## Want the deep technical version?
94
+
95
+ See [Technical Deep Dive](technical-deep-dive.md) for model types, segmentation mask theory, FFT-based inpainting, and background references to Wikipedia diagrams plus official model cards.
docs/explanation/index.md CHANGED
@@ -9,5 +9,6 @@ Explanation pages describe how Koharu is put together and why it behaves the way
9
  ## Topics
10
 
11
  - [How Koharu Works](how-koharu-works.md)
 
12
  - [Acceleration and Runtime](acceleration-and-runtime.md)
13
  - [Models and Providers](models-and-providers.md)
 
9
  ## Topics
10
 
11
  - [How Koharu Works](how-koharu-works.md)
12
+ - [Technical Deep Dive](technical-deep-dive.md)
13
  - [Acceleration and Runtime](acceleration-and-runtime.md)
14
  - [Models and Providers](models-and-providers.md)
docs/explanation/models-and-providers.md CHANGED
@@ -6,6 +6,8 @@ title: Models and Providers
6
 
7
  Koharu uses both vision models and language models. The vision stack prepares the page; the language stack handles translation.
8
 
 
 
9
  ## Vision models
10
 
11
  Koharu automatically downloads the required vision models when you use them for the first time.
@@ -20,10 +22,29 @@ The default stack includes:
20
 
21
  Converted model weights are hosted on [Hugging Face](https://huggingface.co/mayocream) in safetensors format for Rust compatibility and performance.
22
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
23
  ## Local LLMs
24
 
25
  Koharu supports local GGUF models through [llama.cpp](https://github.com/ggml-org/llama.cpp). These models run on your machine and are downloaded on demand when you select them in the LLM picker.
26
 
 
 
27
  ### Suggested local models for English output
28
 
29
  - [vntl-llama3-8b-v2](https://huggingface.co/lmg-anon/vntl-llama3-8b-v2-gguf): around 8.5 GB in Q8_0 form, best when translation quality matters most
@@ -52,6 +73,8 @@ Supported providers include:
52
 
53
  Remote providers are configured in **Settings > API Keys**.
54
 
 
 
55
  ## Choosing between local and remote
56
 
57
  Use local models when you want:
@@ -69,3 +92,13 @@ Use remote providers when you want:
69
  !!! note
70
 
71
  When you use a remote provider, Koharu sends OCR text selected for translation to the provider you configured.
 
 
 
 
 
 
 
 
 
 
 
6
 
7
  Koharu uses both vision models and language models. The vision stack prepares the page; the language stack handles translation.
8
 
9
+ If you want the architecture-level explanation of how these pieces fit together, read [Technical Deep Dive](technical-deep-dive.md) after this page.
10
+
11
  ## Vision models
12
 
13
  Koharu automatically downloads the required vision models when you use them for the first time.
 
22
 
23
  Converted model weights are hosted on [Hugging Face](https://huggingface.co/mayocream) in safetensors format for Rust compatibility and performance.
24
 
25
+ ### What each vision model is
26
+
27
+ | Model | Model type | Why Koharu uses it |
28
+ | --- | --- | --- |
29
+ | `PP-DocLayoutV3` | layout detector | finds text-like regions and reading order |
30
+ | `comic-text-detector` | segmentation network | produces a text mask for cleanup |
31
+ | `PaddleOCR-VL-1.5` | vision-language model | reads cropped text into text tokens |
32
+ | `lama-manga` | inpainting network | reconstructs the image after text removal |
33
+ | `YuzuMarker.FontDetection` | classifier / regressor | estimates font and style hints for rendering |
34
+
35
+ The important design choice is that Koharu does not use a single model for every page task. Layout, segmentation, OCR, and inpainting all need different output shapes:
36
+
37
+ - layout wants regions and order
38
+ - segmentation wants per-pixel masks
39
+ - OCR wants text
40
+ - inpainting wants restored pixels
41
+
42
  ## Local LLMs
43
 
44
  Koharu supports local GGUF models through [llama.cpp](https://github.com/ggml-org/llama.cpp). These models run on your machine and are downloaded on demand when you select them in the LLM picker.
45
 
46
+ In practice, the local models are usually quantized decoder-only transformers. GGUF is the file format; `llama.cpp` is the inference runtime.
47
+
48
  ### Suggested local models for English output
49
 
50
  - [vntl-llama3-8b-v2](https://huggingface.co/lmg-anon/vntl-llama3-8b-v2-gguf): around 8.5 GB in Q8_0 form, best when translation quality matters most
 
73
 
74
  Remote providers are configured in **Settings > API Keys**.
75
 
76
+ For a step-by-step setup guide for LM Studio, OpenRouter, and similar endpoints, see [Use OpenAI-Compatible APIs](../how-to/use-openai-compatible-api.md).
77
+
78
  ## Choosing between local and remote
79
 
80
  Use local models when you want:
 
92
  !!! note
93
 
94
  When you use a remote provider, Koharu sends OCR text selected for translation to the provider you configured.
95
+
96
+ ## Background reading
97
+
98
+ For theory and diagrams behind the model categories on this page, see:
99
+
100
+ - [Technical Deep Dive](technical-deep-dive.md)
101
+ - [Fourier transform on Wikipedia](https://en.wikipedia.org/wiki/Fourier_transform)
102
+ - [Image segmentation on Wikipedia](https://en.wikipedia.org/wiki/Image_segmentation)
103
+ - [OCR on Wikipedia](https://en.wikipedia.org/wiki/Optical_character_recognition)
104
+ - [Transformer architecture on Wikipedia](https://en.wikipedia.org/wiki/Transformer_(deep_learning_architecture))
docs/explanation/technical-deep-dive.md ADDED
@@ -0,0 +1,235 @@
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
1
+ ---
2
+ title: Technical Deep Dive
3
+ ---
4
+
5
+ # Technical Deep Dive
6
+
7
+ This page explains the technical side of Koharu's manga pipeline: what each model does, how the stages fit together, and why layout analysis, segmentation masks, OCR, inpainting, and translation are handled separately.
8
+
9
+ ## The page pipeline in implementation terms
10
+
11
+ ```mermaid
12
+ flowchart TD
13
+ A[Input page] --> B[PP-DocLayoutV3]
14
+ A --> C[comic-text-detector]
15
+ B --> D[Text blocks]
16
+ C --> E[Segmentation mask]
17
+ A --> F[YuzuMarker font detector]
18
+ D --> G[PaddleOCR-VL crop OCR]
19
+ E --> H[LaMa inpainting]
20
+ G --> I[Local or remote LLM]
21
+ F --> J[Renderer style hints]
22
+ H --> K[Renderer]
23
+ I --> K
24
+ J --> K
25
+ K --> L[Rendered page / PSD]
26
+ ```
27
+
28
+ At the code level, the public pipeline steps are `Detect -> OCR -> Inpaint -> LLM Generate -> Render`, but the detect stage is already doing three distinct jobs:
29
+
30
+ - page layout analysis
31
+ - text foreground segmentation
32
+ - font and color estimation
33
+
34
+ That design is deliberate. A manga translation tool needs both page structure and pixel precision.
35
+
36
+ ## Model types at a glance
37
+
38
+ | Component | Default model | Model type | Main job in Koharu |
39
+ | --- | --- | --- | --- |
40
+ | Layout analysis | [PP-DocLayoutV3](https://huggingface.co/PaddlePaddle/PP-DocLayoutV3_safetensors) | document layout detector | find text-like regions, labels, confidence, and reading order |
41
+ | Segmentation | [comic-text-detector](https://github.com/dmMaze/comic-text-detector) | text segmentation network | produce a dense text mask for cleanup |
42
+ | OCR | [PaddleOCR-VL-1.5](https://huggingface.co/PaddlePaddle/PaddleOCR-VL-1.5) | vision-language model | read cropped text regions into Unicode text |
43
+ | Inpainting | [lama-manga](https://huggingface.co/mayocream/lama-manga) / [LaMa](https://github.com/advimman/lama) | image inpainting network | fill masked regions after text removal |
44
+ | Font hints | [YuzuMarker.FontDetection](https://huggingface.co/fffonion/yuzumarker-font-detection) | image classifier / regressor | estimate font family, colors, and stroke hints |
45
+ | Translation | local GGUF model via [llama.cpp](https://github.com/ggml-org/llama.cpp) or remote API | decoder-only LLM in most local setups | translate OCR text into the target language |
46
+
47
+ ## Why layout analysis matters on manga pages
48
+
49
+ Layout analysis is not just "find boxes around text". On manga pages it has to answer several structural questions:
50
+
51
+ - which regions are text-like at all
52
+ - where the reading order probably is
53
+ - whether a block is tall enough to behave like vertical text
54
+ - which boxes should be deduplicated before OCR
55
+ - which parts of the page are captions, bubble text, titles, or other layout categories
56
+
57
+ This matters because manga is visually dense:
58
+
59
+ - speech bubbles are often curved or skewed
60
+ - text may sit on top of screentones and action lines
61
+ - vertical Japanese and horizontal Latin text can coexist on the same page
62
+ - the region that should be read is not always the same shape as the pixels that should be erased
63
+
64
+ Koharu uses layout output to create `TextBlock` records first, then uses those blocks to drive OCR and later rendering.
65
+
66
+ In the current implementation, the layout stage:
67
+
68
+ - runs `PP-DocLayoutV3::inference_one_fast(...)`
69
+ - keeps regions whose labels look text-like
70
+ - converts them into `TextBlock` values
71
+ - deduplicates heavily overlapping regions
72
+ - infers vertical vs horizontal source direction from aspect ratio
73
+
74
+ So layout analysis is the structural backbone of the rest of the pipeline.
75
+
76
+ ## What a segmentation mask is
77
+
78
+ A segmentation mask is an image-sized map where each pixel says whether it belongs to a target class. In Koharu's case, the target class is effectively "text foreground that should later be removed during cleanup".
79
+
80
+ This is different from a bounding box:
81
+
82
+ | Representation | What it means | Best used for |
83
+ | --- | --- | --- |
84
+ | Bounding box | coarse rectangular region | OCR crop selection, ordering, UI editing |
85
+ | Polygon | tighter geometric outline | line-level geometry |
86
+ | Segmentation mask | per-pixel foreground map | inpainting and precise cleanup |
87
+
88
+ ```mermaid
89
+ flowchart LR
90
+ A[Speech bubble] --> B[Layout box]
91
+ A --> C[Segmentation mask]
92
+ B --> D[Crop for OCR]
93
+ C --> E[Erase exact text pixels]
94
+ ```
95
+
96
+ In Koharu, the segmentation path is intentionally separate from layout:
97
+
98
+ - `comic-text-detector` produces a grayscale probability map
99
+ - Koharu refines that map with post-processing
100
+ - the refined result becomes `doc.segment`
101
+ - LaMa then uses `doc.segment` as the erase and fill mask for inpainting
102
+
103
+ The refinement step matters because raw segmentation probabilities are usually soft and noisy. Koharu thresholds the prediction, tries block-aware refinement, and dilates the final binary mask so the cleanup covers text edges and outlines instead of leaving halos behind.
104
+
105
+ ## How the vision models work in theory
106
+
107
+ ### Layout analysis: detector plus reading-order reasoning
108
+
109
+ [PP-DocLayoutV3](https://huggingface.co/PaddlePaddle/PP-DocLayoutV3) is a layout model built for document parsing under skew, warping, and other non-planar distortions. Its model card highlights two properties that are especially relevant to manga-style pages:
110
+
111
+ - it predicts multi-point geometry instead of only axis-aligned two-point boxes
112
+ - it predicts logical reading order in the same forward pass
113
+
114
+ Koharu's Rust port mirrors that shape: the `pp_doclayout_v3` module contains an `HGNetV2` backbone plus attention-based encoder and decoder blocks, and the inference result exposes `label`, `score`, `bbox`, `polygon_points`, and `order`.
115
+
116
+ Conceptually, this is closer to object detection plus layout parsing than to OCR itself.
117
+
118
+ ### Segmentation: dense per-pixel text prediction
119
+
120
+ Koharu's `comic-text-detector` path is a segmentation-first design. The Rust port loads:
121
+
122
+ - a YOLOv5-style backbone
123
+ - a U-Net decoder for mask prediction
124
+ - an optional DBNet head for full detection mode
125
+
126
+ The default page pipeline uses the segmentation-only path because Koharu already gets layout boxes from `PP-DocLayoutV3`. That means Koharu combines:
127
+
128
+ - one model that is good at page structure
129
+ - one model that is good at pixel-level text foreground
130
+
131
+ This is a better fit for cleanup than relying on boxes alone.
132
+
133
+ ### OCR: multimodal decoding from image crops to text tokens
134
+
135
+ [PaddleOCR-VL](https://huggingface.co/docs/transformers/en/model_doc/paddleocr_vl) is a compact vision-language model. The official architecture description says it combines:
136
+
137
+ - a NaViT-style dynamic-resolution visual encoder
138
+ - the ERNIE-4.5-0.3B language model
139
+
140
+ In theory, OCR here works like a multimodal sequence generation problem:
141
+
142
+ 1. the image crop is encoded into visual tokens
143
+ 2. a text prompt such as `OCR:` conditions the task
144
+ 3. the decoder autoregressively emits the recognized text tokens
145
+
146
+ Koharu's implementation follows that pattern closely:
147
+
148
+ - it loads `PaddleOCR-VL-1.5.gguf` and a separate multimodal projector
149
+ - it injects the image through the llama.cpp multimodal path
150
+ - it prompts with `OCR:`
151
+ - it greedily decodes text for each crop
152
+
153
+ So OCR in Koharu is not a classic CTC-only recognizer. It is a small document-oriented VLM being used in a tightly scoped OCR task.
154
+
155
+ ### Inpainting: why LaMa uses Fourier convolutions
156
+
157
+ [LaMa](https://github.com/advimman/lama) is an inpainting model designed for large masked regions. Its paper title is explicit about the key idea: *Resolution-robust Large Mask Inpainting with Fourier Convolutions*.
158
+
159
+ The important intuition is:
160
+
161
+ - ordinary convolutions are local
162
+ - text removal often needs long-range context from the rest of the bubble or background
163
+ - frequency-domain operations can capture wider context efficiently
164
+
165
+ This is where FFT comes in.
166
+
167
+ #### What FFT means here
168
+
169
+ FFT stands for **Fast Fourier Transform**. It is a fast algorithm for moving between:
170
+
171
+ - the spatial domain, where pixels live
172
+ - the frequency domain, where repeating patterns and large-scale structure are easier to manipulate
173
+
174
+ In Koharu's LaMa port, the `FourierUnit` does exactly that:
175
+
176
+ 1. apply `rfft2` to feature maps
177
+ 2. process the real and imaginary channels with learned `1x1` convolutions
178
+ 3. apply `irfft2` to return to image space
179
+
180
+ Koharu even implements custom `rfft2` and `irfft2` ops for CPU, CUDA, and Metal backends so the same spectral block can run across hardware targets.
181
+
182
+ For manga cleanup, this matters because the missing region is often not just a tiny scratch. It may be an entire speech bubble interior with gradients, screentones, and inked edges. Fourier-style global mixing helps the model preserve larger structures while filling the hole.
183
+
184
+ ## Local LLMs and model type
185
+
186
+ Koharu's local translation path uses GGUF models through `llama.cpp`. In practice, these are usually quantized decoder-only transformers.
187
+
188
+ The theory is standard modern LLM inference:
189
+
190
+ - tokenize the OCR text
191
+ - run masked self-attention over the growing token sequence
192
+ - predict the next token repeatedly until the output is complete
193
+
194
+ The practical trade-off is also standard:
195
+
196
+ - larger models usually translate better
197
+ - smaller quantized models use less VRAM and RAM
198
+ - remote providers trade local privacy for easier access to larger hosted models
199
+
200
+ Koharu keeps the image understanding steps local even when you choose a remote text-generation provider. The remote side only needs the OCR text.
201
+
202
+ ## Koharu-specific implementation notes
203
+
204
+ Some details that are easy to miss if you only read the high-level docs:
205
+
206
+ - the detect stage currently loads `ComicTextDetector::load_segmentation_only(...)`, not the full DBNet-backed detection mode
207
+ - the segmentation mask is refined against the current detected text blocks before inpainting
208
+ - OCR runs on cropped text-block images, not the original whole page
209
+ - the OCR wrapper uses the multimodal llama.cpp path and the task prompt `OCR:`
210
+ - inpainting consumes `doc.segment`, so bad masks lead directly to bad cleanup
211
+ - font prediction is normalized before rendering so near-black and near-white colors snap to cleaner values
212
+
213
+ ## Recommended reading
214
+
215
+ ### Official model and project references
216
+
217
+ - [PP-DocLayoutV3 model card](https://huggingface.co/PaddlePaddle/PP-DocLayoutV3)
218
+ - [PaddleOCR-VL-1.5 model card](https://huggingface.co/PaddlePaddle/PaddleOCR-VL-1.5)
219
+ - [PaddleOCR-VL architecture docs in Hugging Face Transformers](https://huggingface.co/docs/transformers/en/model_doc/paddleocr_vl)
220
+ - [comic-text-detector repository](https://github.com/dmMaze/comic-text-detector)
221
+ - [LaMa repository](https://github.com/advimman/lama)
222
+ - [llama.cpp](https://github.com/ggml-org/llama.cpp)
223
+
224
+ ### Background theory and Wikipedia diagrams
225
+
226
+ These pages are useful when you want the general theory and the overview diagrams before diving into model cards:
227
+
228
+ - [Fourier transform](https://en.wikipedia.org/wiki/Fourier_transform)
229
+ - [Image segmentation](https://en.wikipedia.org/wiki/Image_segmentation)
230
+ - [Optical character recognition](https://en.wikipedia.org/wiki/Optical_character_recognition)
231
+ - [Transformer (deep learning architecture)](https://en.wikipedia.org/wiki/Transformer_(deep_learning_architecture))
232
+ - [Object detection](https://en.wikipedia.org/wiki/Object_detection)
233
+ - [Inpainting](https://en.wikipedia.org/wiki/Inpainting)
234
+
235
+ Those Wikipedia links are background references. For Koharu-specific behavior and the actual model architecture choices, prefer the official model cards and the source tree.
docs/how-to/build-from-source.md CHANGED
@@ -4,23 +4,101 @@ title: Build From Source
4
 
5
  # Build From Source
6
 
7
- If you do not want to use a release build, you can compile Koharu locally.
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
8
 
9
  ## Prerequisites
10
 
11
  - [Rust](https://www.rust-lang.org/tools/install) 1.92 or later
12
  - [Bun](https://bun.sh/) 1.0 or later
13
 
 
 
 
 
 
 
 
14
  ## Install dependencies
15
 
16
  ```bash
17
  bun install
18
  ```
19
 
20
- ## Build the project
21
 
22
  ```bash
23
  bun run build
24
  ```
25
 
26
- The built binaries will be placed in `target/release`.
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
4
 
5
  # Build From Source
6
 
7
+ If you want to compile Koharu locally instead of using a prebuilt release, start with the repository's Bun wrapper. It matches the normal developer workflow and handles platform-specific setup that a direct Tauri call does not.
8
+
9
+ ## What the build includes
10
+
11
+ A full desktop build includes:
12
+
13
+ - the Rust application in `koharu/`
14
+ - the embedded UI from `ui/`
15
+ - the local HTTP, RPC, and MCP server used by both GUI and headless modes
16
+
17
+ The default desktop build is platform-aware:
18
+
19
+ | Platform | Desktop feature path |
20
+ | --- | --- |
21
+ | Windows | `cuda` |
22
+ | Linux | `cuda` |
23
+ | macOS on Apple Silicon | `metal` |
24
 
25
  ## Prerequisites
26
 
27
  - [Rust](https://www.rust-lang.org/tools/install) 1.92 or later
28
  - [Bun](https://bun.sh/) 1.0 or later
29
 
30
+ For Windows source builds, install:
31
+
32
+ - Visual Studio C++ build tools
33
+ - the CUDA Toolkit if you want the default CUDA-enabled desktop build
34
+
35
+ The repository's `scripts/dev.ts` helper tries to discover `nvcc` and `cl.exe` automatically on Windows before launching Tauri.
36
+
37
  ## Install dependencies
38
 
39
  ```bash
40
  bun install
41
  ```
42
 
43
+ ## Recommended desktop build
44
 
45
  ```bash
46
  bun run build
47
  ```
48
 
49
+ This is the normal source-build path for most users. It runs the repository's Bun helper, which then launches Tauri with the project's expected build flow.
50
+
51
+ On Windows, that wrapper also tries to discover `nvcc` and `cl.exe` automatically before starting the build.
52
+
53
+ The main binaries are written to `target/release`:
54
+
55
+ - `target/release/koharu`
56
+ - `target/release/koharu.exe` on Windows
57
+
58
+ ## Development build
59
+
60
+ If you are actively working on the app instead of producing a release-style binary, use:
61
+
62
+ ```bash
63
+ bun run dev
64
+ ```
65
+
66
+ The dev script launches `tauri dev` and starts the local server on a fixed port so the desktop shell and UI can talk to the same runtime during development.
67
+
68
+ ## Detailed Tauri control
69
+
70
+ If you want to control the Tauri invocation directly instead of going through the wrapper, use:
71
+
72
+ ```bash
73
+ bun tauri build --release --no-bundle
74
+ ```
75
+
76
+ This is closer to the underlying Tauri command and is useful when you want more explicit control over the build invocation.
77
+
78
+ Unlike `bun run build`, this path does not go through the repository's Windows helper that tries to configure CUDA and Visual Studio tooling for you first.
79
+
80
+ ## Direct Rust builds
81
+
82
+ If you only want to build the Rust crate directly and intentionally bypass the Bun and Tauri wrapper, use `bun cargo` rather than calling `cargo` yourself.
83
+
84
+ Examples:
85
+
86
+ ```bash
87
+ # Windows / Linux
88
+ bun cargo build --release -p koharu --features=cuda
89
+
90
+ # macOS Apple Silicon
91
+ bun cargo build --release -p koharu --features=metal
92
+ ```
93
+
94
+ This is useful for lower-level Rust work, but `bun run build` remains the better choice for a normal desktop app build because it preserves the full Tauri packaging flow.
95
+
96
+ ## What happens at runtime after the build
97
+
98
+ Building the app does not bundle every model weight. On first launch, Koharu still needs to:
99
+
100
+ - initialize runtime libraries under the local app data directory
101
+ - download the default vision and OCR models
102
+ - download optional local translation LLMs later when you choose them in Settings
103
+
104
+ If you want to prefetch those dependencies without starting the app, see [Run GUI, Headless, and MCP Modes](run-gui-headless-and-mcp.md).
docs/how-to/configure-mcp-clients.md ADDED
@@ -0,0 +1,255 @@
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
1
+ ---
2
+ title: Configure MCP Clients
3
+ ---
4
+
5
+ # Configure MCP Clients
6
+
7
+ Koharu exposes a built-in MCP server over local Streamable HTTP. This page shows how to connect MCP clients to it, with detailed setup for Antigravity, Claude Desktop, and Claude Code.
8
+
9
+ ## What Koharu exposes over MCP
10
+
11
+ Koharu's MCP server is the same local runtime used by the desktop app and headless Web UI. In practice, the MCP tools cover:
12
+
13
+ - document loading and inspection
14
+ - image previews for original, segment, inpainted, and rendered layers
15
+ - detect, OCR, inpaint, render, and full pipeline processing
16
+ - LLM model listing, loading, unloading, and translation
17
+ - text-block editing and export
18
+
19
+ That means an MCP client can drive the same manga workflow that Koharu's GUI uses.
20
+
21
+ ## 1. Start Koharu on a stable port
22
+
23
+ Use a fixed port so your MCP client always has the same URL.
24
+
25
+ ```bash
26
+ # macOS / Linux
27
+ koharu --port 9999 --headless
28
+
29
+ # Windows
30
+ koharu.exe --port 9999 --headless
31
+ ```
32
+
33
+ You can also keep the desktop window and still expose MCP:
34
+
35
+ ```bash
36
+ # macOS / Linux
37
+ koharu --port 9999
38
+
39
+ # Windows
40
+ koharu.exe --port 9999
41
+ ```
42
+
43
+ Koharu's MCP endpoint will then be:
44
+
45
+ ```text
46
+ http://127.0.0.1:9999/mcp
47
+ ```
48
+
49
+ Important details:
50
+
51
+ - keep Koharu running while the MCP client is connected
52
+ - Koharu binds to `127.0.0.1` by default, so these examples assume the MCP client is on the same machine
53
+ - no authentication headers are required for the default local setup
54
+
55
+ ## 2. Quick endpoint check
56
+
57
+ Before editing any client config, make sure Koharu is actually running on the expected port.
58
+
59
+ Open:
60
+
61
+ ```text
62
+ http://127.0.0.1:9999/
63
+ ```
64
+
65
+ If the Web UI loads, the local server is up and the MCP endpoint should also exist at `/mcp`.
66
+
67
+ ## Antigravity
68
+
69
+ Antigravity can point directly at Koharu's local MCP URL through its raw MCP config.
70
+
71
+ ### Steps
72
+
73
+ 1. Start Koharu with `--port 9999`.
74
+ 2. Open Antigravity.
75
+ 3. Open the `...` menu at the top of the editor's agent panel.
76
+ 4. Click **Manage MCP Servers**.
77
+ 5. Click **View raw config**.
78
+ 6. Add a `koharu` entry under `mcpServers`.
79
+ 7. Save the config.
80
+ 8. Restart Antigravity if it does not reload the MCP server automatically.
81
+
82
+ ### Example config
83
+
84
+ ```json
85
+ {
86
+ "mcpServers": {
87
+ "koharu": {
88
+ "serverUrl": "http://127.0.0.1:9999/mcp"
89
+ }
90
+ }
91
+ }
92
+ ```
93
+
94
+ If you already have other MCP servers configured, add `koharu` alongside them instead of replacing the whole `mcpServers` object.
95
+
96
+ ### After setup
97
+
98
+ Ask Antigravity something simple first:
99
+
100
+ - `What tools are available from Koharu?`
101
+ - `How many documents are currently loaded in Koharu?`
102
+
103
+ If that works, move on to page actions such as:
104
+
105
+ - `Open C:\\manga\\page-01.png in Koharu and run detect and OCR.`
106
+ - `Show me the segment mask for document 0.`
107
+ - `Run the full pipeline on document 0 and export the rendered page.`
108
+
109
+ ## Claude Desktop
110
+
111
+ Claude Desktop's current local MCP config is command-based. Because Koharu exposes a local HTTP MCP endpoint rather than a packaged desktop extension, the practical config-file path is to use a small bridge process that connects Claude Desktop to `http://127.0.0.1:9999/mcp`.
112
+
113
+ This guide uses `mcp-remote` for that bridge.
114
+
115
+ ### Before you start
116
+
117
+ Make sure one of these is true:
118
+
119
+ - `npx` is already available on your machine
120
+ - Node.js is installed so `npx` can run
121
+
122
+ ### Steps
123
+
124
+ 1. Start Koharu with `--port 9999`.
125
+ 2. Open Claude Desktop.
126
+ 3. Open **Settings**.
127
+ 4. Open the **Developer** section.
128
+ 5. Open the MCP config file from Claude Desktop's built-in editor entry.
129
+ 6. Add a `koharu` server entry.
130
+ 7. Save the file.
131
+ 8. Fully restart Claude Desktop.
132
+
133
+ ### Windows config
134
+
135
+ ```json
136
+ {
137
+ "mcpServers": {
138
+ "koharu": {
139
+ "command": "C:\\Progra~1\\nodejs\\npx.cmd",
140
+ "args": [
141
+ "-y",
142
+ "mcp-remote@latest",
143
+ "http://127.0.0.1:9999/mcp"
144
+ ],
145
+ "env": {}
146
+ }
147
+ }
148
+ }
149
+ ```
150
+
151
+ ### macOS / Linux config
152
+
153
+ ```json
154
+ {
155
+ "mcpServers": {
156
+ "koharu": {
157
+ "command": "npx",
158
+ "args": [
159
+ "-y",
160
+ "mcp-remote@latest",
161
+ "http://127.0.0.1:9999/mcp"
162
+ ],
163
+ "env": {}
164
+ }
165
+ }
166
+ }
167
+ ```
168
+
169
+ Notes:
170
+
171
+ - if you already have other entries in `mcpServers`, add `koharu` without deleting them
172
+ - `mcp-remote@latest` is fetched on first use, so the first startup may need internet access
173
+ - if your Windows Node install is not under `C:\\Program Files\\nodejs`, update the `command` path accordingly
174
+ - Anthropic's current remote-MCP connector flow for Claude Desktop is managed through **Settings > Connectors** for actual remote servers; this page intentionally covers the config-file bridge pattern for Koharu's local `127.0.0.1` endpoint
175
+
176
+ ### After setup
177
+
178
+ Open a new Claude Desktop chat and ask:
179
+
180
+ - `What Koharu MCP tools do you have available?`
181
+ - `Check whether Koharu has any loaded documents.`
182
+
183
+ Then move to actual page work:
184
+
185
+ - `Open D:\\manga\\page-01.png in Koharu.`
186
+ - `Run detect, OCR, inpaint, translate, and render for document 0.`
187
+ - `Show me the rendered output for document 0.`
188
+
189
+ ## Claude Code
190
+
191
+ If by "Claude" you mean Claude Code, the safest setup for Koharu's local `http://127.0.0.1` MCP endpoint is also to use the same stdio bridge pattern.
192
+
193
+ ### Add it to your user config
194
+
195
+ macOS / Linux:
196
+
197
+ ```bash
198
+ claude mcp add-json koharu "{\"type\":\"stdio\",\"command\":\"npx\",\"args\":[\"-y\",\"mcp-remote@latest\",\"http://127.0.0.1:9999/mcp\"],\"env\":{}}" --scope user
199
+ ```
200
+
201
+ This writes the server into Claude Code's MCP configuration for your user account.
202
+
203
+ Windows:
204
+
205
+ ```bash
206
+ claude mcp add-json koharu "{\"type\":\"stdio\",\"command\":\"cmd\",\"args\":[\"/c\",\"npx\",\"-y\",\"mcp-remote@latest\",\"http://127.0.0.1:9999/mcp\"],\"env\":{}}" --scope user
207
+ ```
208
+
209
+ On native Windows, Claude Code's docs explicitly recommend the `cmd /c npx` wrapper for local stdio MCP servers that use `npx`.
210
+
211
+ ### Verify it
212
+
213
+ ```bash
214
+ claude mcp get koharu
215
+ claude mcp list
216
+ ```
217
+
218
+ If you already configured Koharu in Claude Desktop, Claude Code can also import compatible entries from Claude Desktop on supported platforms:
219
+
220
+ ```bash
221
+ claude mcp add-from-claude-desktop --scope user
222
+ ```
223
+
224
+ ## First tasks to try
225
+
226
+ Once the client is connected, these are good first tasks:
227
+
228
+ - ask Koharu for the loaded document count
229
+ - open one page image from disk
230
+ - run detect and OCR only first
231
+ - inspect the segment or rendered layer before running a full export
232
+
233
+ This makes failures easier to diagnose than jumping straight into a full batch pipeline.
234
+
235
+ ## Common mistakes
236
+
237
+ - starting Koharu without `--port`, then trying to connect a client to the wrong port
238
+ - using `http://127.0.0.1:9999/` instead of `http://127.0.0.1:9999/mcp`
239
+ - closing Koharu after adding the client config
240
+ - replacing your entire client config instead of merging a new `koharu` entry
241
+ - expecting Claude Desktop to connect directly to Koharu's HTTP URL through a plain command-less config entry
242
+ - forgetting that Koharu's default local server is only reachable from the same machine
243
+
244
+ ## Related pages
245
+
246
+ - [Run GUI, Headless, and MCP Modes](run-gui-headless-and-mcp.md)
247
+ - [MCP Tools Reference](../reference/mcp-tools.md)
248
+ - [CLI Reference](../reference/cli.md)
249
+ - [Troubleshooting](troubleshooting.md)
250
+
251
+ ## External references
252
+
253
+ - [Claude Code MCP docs](https://code.claude.com/docs/en/mcp)
254
+ - [Claude Help: Building custom connectors via remote MCP servers](https://support.claude.com/en/articles/11503834-building-custom-connectors-via-remote-mcp-servers)
255
+ - [Wolfram support article with current Antigravity and Claude Desktop MCP config examples](https://support.wolfram.com/73463/)
docs/how-to/contributing.md ADDED
@@ -0,0 +1,174 @@
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
1
+ ---
2
+ title: Contributing
3
+ ---
4
+
5
+ # Contributing
6
+
7
+ Koharu accepts contributions to the Rust workspace, the Tauri app shell, the Next.js UI, the ML pipeline, MCP integrations, tests, and documentation.
8
+
9
+ This guide focuses on the current repository workflow so you can make changes that match CI and are easy to review.
10
+
11
+ ## Before you start
12
+
13
+ You should have:
14
+
15
+ - [Rust](https://www.rust-lang.org/tools/install) 1.92 or later
16
+ - [Bun](https://bun.sh/) 1.0 or later
17
+
18
+ On Windows, source builds also expect:
19
+
20
+ - Visual Studio C++ build tools
21
+ - the CUDA Toolkit for the normal CUDA-enabled local build path
22
+
23
+ If you have not built Koharu locally before, read [Build From Source](build-from-source.md) first.
24
+
25
+ ## Repository layout
26
+
27
+ The main top-level areas are:
28
+
29
+ - `koharu/`: the Tauri desktop application shell
30
+ - `koharu-*`: Rust workspace crates for runtime, ML, pipeline, RPC, rendering, PSD export, and types
31
+ - `ui/`: the web UI used inside the desktop shell and headless mode
32
+ - `e2e/`: Playwright end-to-end tests and fixtures
33
+ - `docs/`: the documentation site content
34
+
35
+ If you are not sure where a change belongs:
36
+
37
+ - UI interaction and panels usually live in `ui/`
38
+ - backend APIs, MCP tools, and orchestration usually live in `koharu-rpc/` or `koharu-pipeline/`
39
+ - rendering, OCR, model runtime, and ML-specific logic live in the Rust workspace crates
40
+
41
+ ## Set up the repository
42
+
43
+ Install JS dependencies first:
44
+
45
+ ```bash
46
+ bun install
47
+ ```
48
+
49
+ For a normal local desktop build, use:
50
+
51
+ ```bash
52
+ bun run build
53
+ ```
54
+
55
+ For active development, use:
56
+
57
+ ```bash
58
+ bun run dev
59
+ ```
60
+
61
+ The dev command runs the Tauri app in dev mode and keeps the local server on a fixed port for UI development and e2e tests.
62
+
63
+ ## Use the repo's preferred local commands
64
+
65
+ For local Rust commands, prefer `bun cargo` instead of calling `cargo` directly.
66
+
67
+ Examples:
68
+
69
+ ```bash
70
+ bun cargo fmt -- --check
71
+ bun cargo check
72
+ bun cargo clippy -- -D warnings
73
+ bun cargo test --workspace --tests
74
+ ```
75
+
76
+ For UI formatting, use:
77
+
78
+ ```bash
79
+ bun run format
80
+ ```
81
+
82
+ For docs validation, use:
83
+
84
+ ```bash
85
+ zensical build -c
86
+ ```
87
+
88
+ ## What to run before opening a PR
89
+
90
+ Run the checks that match the area you changed.
91
+
92
+ If you changed Rust code:
93
+
94
+ - `bun cargo fmt -- --check`
95
+ - `bun cargo check`
96
+ - `bun cargo clippy -- -D warnings`
97
+ - `bun cargo test --workspace --tests`
98
+
99
+ If you changed the desktop app or full integration flow:
100
+
101
+ - `bun run build`
102
+
103
+ If you changed the UI or interaction flow:
104
+
105
+ - `bun run format`
106
+ - `bun run test:e2e`
107
+
108
+ If you changed docs:
109
+
110
+ - `zensical build -c`
111
+
112
+ You do not always need to run every command in this list for every PR, but you should run enough to cover the code paths you touched.
113
+
114
+ ## E2E tests
115
+
116
+ Koharu includes Playwright tests under `e2e/`.
117
+
118
+ Run them with:
119
+
120
+ ```bash
121
+ bun run test:e2e
122
+ ```
123
+
124
+ The current Playwright setup starts Koharu through:
125
+
126
+ ```bash
127
+ bun run dev -- --headless
128
+ ```
129
+
130
+ and waits for the local API to come up before running the browser tests.
131
+
132
+ ## Docs changes
133
+
134
+ Docs live in `docs/` and are built by Zensical.
135
+
136
+ When updating docs:
137
+
138
+ - keep instructions aligned with the current implementation
139
+ - prefer concrete commands and real paths over generic advice
140
+ - update navigation in `zensical.toml` if you add a new page
141
+ - build the docs locally with `zensical build -c`
142
+
143
+ ## Pull request expectations
144
+
145
+ A good contribution usually has:
146
+
147
+ - one clear goal
148
+ - code that follows existing patterns instead of introducing a new style unnecessarily
149
+ - tests or validation steps that match the change
150
+ - a PR description that explains what changed and how you verified it
151
+
152
+ Small, focused PRs are easier to review than large mixed changes.
153
+
154
+ If your change affects user-visible behavior, mention:
155
+
156
+ - what the old behavior was
157
+ - what the new behavior is
158
+ - how you tested it
159
+
160
+ ## AI-generated PRs
161
+
162
+ AI-generated contributions are welcome, provided:
163
+
164
+ 1. A human has reviewed the code before opening the PR.
165
+ 2. The submitter understands the changes being made.
166
+
167
+ That rule already exists in the repository's GitHub contribution guidance and remains in effect here as well.
168
+
169
+ ## Related pages
170
+
171
+ - [Build From Source](build-from-source.md)
172
+ - [Run GUI, Headless, and MCP Modes](run-gui-headless-and-mcp.md)
173
+ - [Configure MCP Clients](configure-mcp-clients.md)
174
+ - [Troubleshooting](troubleshooting.md)
docs/how-to/export-and-manage-projects.md CHANGED
@@ -4,26 +4,103 @@ title: Export Pages and Manage Projects
4
 
5
  # Export Pages and Manage Projects
6
 
 
 
 
 
 
 
 
 
 
 
 
 
 
7
  ## Export rendered output
8
 
9
  Koharu can export the current page as a rendered image.
10
 
11
  Use this when you want a final flattened result for reading, sharing, or publishing.
12
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
13
  ## Export layered PSD files
14
 
15
  Koharu can also export a layered Photoshop PSD.
16
 
17
- PSD export preserves helper layers and writes translated text as editable text layers, which makes final cleanup in Photoshop much easier.
18
 
19
- ## Work with `.khr` project files
20
 
21
- Koharu stores project data in `.khr` files.
 
 
 
 
 
22
 
23
- On Windows, Koharu automatically associates `.khr` files so they can be opened by double-clicking. These files can also be viewed in ways that expose the thumbnails of their contained images.
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
24
 
25
  ## When to use each format
26
 
27
- - Rendered image: best for final delivery
28
- - PSD: best for manual cleanup and touch-up work
29
- - `.khr`: best for saving in-progress Koharu projects
 
 
 
 
 
 
 
 
 
 
 
4
 
5
  # Export Pages and Manage Projects
6
 
7
+ Koharu's workflow is page-based. You import image pages, run the pipeline, review text blocks, and then export either flattened output or a layered handoff file for manual finishing.
8
+
9
+ ## Supported page inputs
10
+
11
+ The current import flow is image-based. Koharu accepts:
12
+
13
+ - `.png`
14
+ - `.jpg`
15
+ - `.jpeg`
16
+ - `.webp`
17
+
18
+ Folder import recursively scans for supported image files and ignores everything else.
19
+
20
  ## Export rendered output
21
 
22
  Koharu can export the current page as a rendered image.
23
 
24
  Use this when you want a final flattened result for reading, sharing, or publishing.
25
 
26
+ Implementation details:
27
+
28
+ - rendered export uses the page's original image extension when possible
29
+ - Koharu names the exported file with a `_koharu` suffix
30
+ - rendered export requires the page to already have a rendered layer
31
+
32
+ Example output names:
33
+
34
+ - `page-001_koharu.png`
35
+ - `chapter-03_koharu.jpg`
36
+
37
+ ## Export inpainted output
38
+
39
+ Koharu also keeps an inpainted layer in the pipeline, which is useful when you want a cleaned page without translated lettering.
40
+
41
+ This is most useful for:
42
+
43
+ - external lettering workflows
44
+ - cleanup review
45
+ - batch export of text-removed pages
46
+
47
+ When exported, Koharu uses an `_inpainted` filename suffix.
48
+
49
  ## Export layered PSD files
50
 
51
  Koharu can also export a layered Photoshop PSD.
52
 
53
+ PSD export is the handoff format for users who want to keep working in Photoshop or a PSD-compatible editor after the ML pipeline is done.
54
 
55
+ In the current implementation, PSD export uses editable text layers by default and can include:
56
 
57
+ - the original image
58
+ - the inpainted image
59
+ - the segmentation mask
60
+ - the brush layer
61
+ - translated text layers
62
+ - a merged composite image
63
 
64
+ That makes the PSD much more useful than a flat image when you still need to:
65
+
66
+ - tweak wording
67
+ - adjust bubble fit
68
+ - repaint artifacts
69
+ - hide or inspect helper layers
70
+
71
+ Koharu names PSD exports with a `_koharu.psd` suffix.
72
+
73
+ ## PSD export limitations
74
+
75
+ Koharu currently writes classic PSD files, not PSB files. That means very large pages can fail to export.
76
+
77
+ The implementation rejects dimensions above `30000 x 30000`.
78
+
79
+ ## Manage loaded page sets
80
+
81
+ Koharu lets you work with multiple loaded pages in one session.
82
+
83
+ The practical choices are:
84
+
85
+ - open images and replace the current set
86
+ - append more images to the current set
87
+ - open a folder and load its supported image files
88
+ - append a folder to the current set
89
+
90
+ This is the main way to manage a chapter or batch job inside the app today.
91
 
92
  ## When to use each format
93
 
94
+ | Output | Best for |
95
+ | --- | --- |
96
+ | Rendered image | final delivery, reading copies, simple sharing |
97
+ | Inpainted image | external lettering, cleanup review, text-removal workflows |
98
+ | PSD | manual cleanup, touch-up, editable translated text |
99
+
100
+ ## Recommended workflow
101
+
102
+ If you care about polish, a good pattern is:
103
+
104
+ 1. run detection, OCR, translation, and render in Koharu
105
+ 2. export a rendered image for quick review
106
+ 3. export a PSD when you want editable text and helper layers for final cleanup
docs/how-to/index.md CHANGED
@@ -8,7 +8,11 @@ How-to guides focus on specific jobs you may want to complete with Koharu.
8
 
9
  ## Common tasks
10
 
11
- - [Install Koharu](install-koharu.md)
12
- - [Run GUI, Headless, and MCP Modes](run-gui-headless-and-mcp.md)
13
- - [Export Pages and Manage Projects](export-and-manage-projects.md)
14
- - [Build From Source](build-from-source.md)
 
 
 
 
 
8
 
9
  ## Common tasks
10
 
11
+ - [Install Koharu](install-koharu.md): release setup, first-run downloads, and acceleration expectations
12
+ - [Contributing](contributing.md): repository layout, local commands, validation steps, and PR expectations
13
+ - [Run GUI, Headless, and MCP Modes](run-gui-headless-and-mcp.md): local deployment patterns and runtime flags
14
+ - [Configure MCP Clients](configure-mcp-clients.md): connect Antigravity, Claude Desktop, or Claude Code to Koharu's local MCP endpoint
15
+ - [Use OpenAI-Compatible APIs](use-openai-compatible-api.md): connect LM Studio, OpenRouter, and other OpenAI-style chat-completions endpoints
16
+ - [Export Pages and Manage Projects](export-and-manage-projects.md): rendered images, PSD handoff, and page-set management
17
+ - [Build From Source](build-from-source.md): local developer build flow with Bun, Tauri, and platform features
18
+ - [Troubleshooting](troubleshooting.md): common startup, download, GPU, pipeline, and connectivity failures
docs/how-to/install-koharu.md CHANGED
@@ -16,16 +16,28 @@ Koharu provides prebuilt binaries for:
16
 
17
  If your platform is not covered by a release build, use [Build From Source](build-from-source.md).
18
 
 
 
 
 
 
 
 
 
 
 
19
  ## First launch expectations
20
 
21
  On first run, Koharu may:
22
 
23
- - extract bundled runtime libraries
24
- - download required vision models
25
- - download local LLMs later when you select them in Settings
26
 
27
  This is normal and can take time depending on your connection and hardware.
28
 
 
 
29
  ## GPU acceleration notes
30
 
31
  Koharu supports:
@@ -35,12 +47,33 @@ Koharu supports:
35
  - Vulkan on Windows and Linux for OCR and LLM inference
36
  - CPU fallback on all platforms
37
 
38
- For CUDA, Koharu bundles CUDA toolkit 13.1, then extracts the required dynamic libraries into the app data directory on first run.
 
 
 
 
 
 
39
 
40
  !!! note
41
 
42
  Keep your NVIDIA driver up to date. Koharu checks for CUDA 13.1 support and falls back to CPU if the driver is too old.
43
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
44
  ## Need help?
45
 
46
  For support, join the [Discord server](https://discord.gg/mHvHkxGnUY).
 
16
 
17
  If your platform is not covered by a release build, use [Build From Source](build-from-source.md).
18
 
19
+ ## What gets installed locally
20
+
21
+ Koharu is a local-first app. In practice, the desktop binary is only part of the installation footprint. The first real run also creates a per-user local data directory for:
22
+
23
+ - runtime libraries used by llama.cpp and GPU backends
24
+ - downloaded vision and OCR models
25
+ - optional local translation models you select later
26
+
27
+ Koharu keeps its own files under a `Koharu` app-data root and stores model weights separately from the application binary.
28
+
29
  ## First launch expectations
30
 
31
  On first run, Koharu may:
32
 
33
+ - extract or download runtime libraries required by the local inference stack
34
+ - download the default vision and OCR models used by detection, segmentation, OCR, inpainting, and font estimation
35
+ - wait to download local translation LLMs until you actually select them in Settings
36
 
37
  This is normal and can take time depending on your connection and hardware.
38
 
39
+ If you want to prefetch those runtime dependencies ahead of time, run Koharu once with `--download`. That path initializes the runtime packages and default vision stack, then exits without opening the GUI.
40
+
41
  ## GPU acceleration notes
42
 
43
  Koharu supports:
 
47
  - Vulkan on Windows and Linux for OCR and LLM inference
48
  - CPU fallback on all platforms
49
 
50
+ Some practical details matter:
51
+
52
+ - detection and inpainting benefit most from CUDA or Metal
53
+ - Vulkan is mainly the fallback GPU path for OCR and local LLM inference
54
+ - if Koharu cannot verify that your NVIDIA driver supports CUDA 13.1, it falls back to CPU
55
+
56
+ For CUDA-capable systems, Koharu bundles and initializes the runtime pieces it needs instead of requiring you to wire every library path by hand.
57
 
58
  !!! note
59
 
60
  Keep your NVIDIA driver up to date. Koharu checks for CUDA 13.1 support and falls back to CPU if the driver is too old.
61
 
62
+ ## After installation
63
+
64
+ Once Koharu launches successfully, the next decisions are usually:
65
+
66
+ - desktop GUI vs headless mode
67
+ - local translation model vs remote provider
68
+ - rendered export vs layered PSD export
69
+
70
+ See:
71
+
72
+ - [Run GUI, Headless, and MCP Modes](run-gui-headless-and-mcp.md)
73
+ - [Models and Providers](../explanation/models-and-providers.md)
74
+ - [Export Pages and Manage Projects](export-and-manage-projects.md)
75
+ - [Troubleshooting](troubleshooting.md)
76
+
77
  ## Need help?
78
 
79
  For support, join the [Discord server](https://discord.gg/mHvHkxGnUY).
docs/how-to/run-gui-headless-and-mcp.md CHANGED
@@ -4,17 +4,37 @@ title: Run GUI, Headless, and MCP Modes
4
 
5
  # Run GUI, Headless, and MCP Modes
6
 
7
- Koharu can run as a normal desktop app, a headless local server with a Web UI, or an MCP server for AI agents.
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
8
 
9
  ## Run the desktop app
10
 
11
  Launch Koharu normally from your installed application.
12
 
 
 
13
  This is the default mode and is the best choice for most users.
14
 
15
  ## Run headless mode
16
 
17
- Headless mode starts the local HTTP server without opening the desktop GUI.
18
 
19
  ```bash
20
  # macOS / Linux
@@ -26,9 +46,11 @@ koharu.exe --port 4000 --headless
26
 
27
  After startup, open the Web UI at `http://localhost:4000`.
28
 
 
 
29
  ## Run with a fixed port
30
 
31
- By default, Koharu uses a random local port. Use `--port` when you need a stable address.
32
 
33
  ```bash
34
  # macOS / Linux
@@ -38,13 +60,40 @@ koharu --port 9999
38
  koharu.exe --port 9999
39
  ```
40
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
41
  ## Connect to the MCP server
42
 
43
- Koharu includes a built-in MCP server. When you run Koharu on a fixed port, point your AI agent at:
 
 
44
 
45
  `http://localhost:9999/mcp`
46
 
47
- Replace `9999` with the port you chose.
 
 
 
 
 
 
 
 
 
48
 
49
  ## Force CPU mode
50
 
@@ -58,9 +107,11 @@ koharu --cpu
58
  koharu.exe --cpu
59
  ```
60
 
 
 
61
  ## Download runtime dependencies only
62
 
63
- Use `--download` if you want Koharu to fetch runtime packages and exit without starting the app.
64
 
65
  ```bash
66
  # macOS / Linux
@@ -69,3 +120,24 @@ koharu --download
69
  # Windows
70
  koharu.exe --download
71
  ```
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
4
 
5
  # Run GUI, Headless, and MCP Modes
6
 
7
+ Koharu can run as a normal desktop app, a headless local server with a Web UI, or an MCP server for AI agents. These are not separate backends. They all sit on top of the same local runtime and HTTP server.
8
+
9
+ ## What stays the same across modes
10
+
11
+ No matter how you launch Koharu, the runtime model is the same:
12
+
13
+ - the server binds to `127.0.0.1`
14
+ - the UI and API are served by the same local process
15
+ - the page pipeline, model loading, and exports use the same internal code paths
16
+
17
+ That is why desktop editing, headless automation, and MCP tooling stay aligned.
18
+
19
+ ## Mode summary
20
+
21
+ | Mode | Desktop window | Local server | Typical use |
22
+ | --- | --- | --- | --- |
23
+ | Desktop | yes | yes | normal interactive editing |
24
+ | Headless | no | yes | local Web UI, scripting, automation |
25
+ | MCP | optional | yes | agent tooling through `/mcp` |
26
 
27
  ## Run the desktop app
28
 
29
  Launch Koharu normally from your installed application.
30
 
31
+ Even in desktop mode, Koharu still starts a local HTTP server internally. The embedded window talks to that local server rather than calling the pipeline directly.
32
+
33
  This is the default mode and is the best choice for most users.
34
 
35
  ## Run headless mode
36
 
37
+ Headless mode starts the local server without opening the desktop GUI.
38
 
39
  ```bash
40
  # macOS / Linux
 
46
 
47
  After startup, open the Web UI at `http://localhost:4000`.
48
 
49
+ Headless mode stays in the foreground until you stop it, typically with `Ctrl+C`.
50
+
51
  ## Run with a fixed port
52
 
53
+ By default, Koharu uses a random local port. Use `--port` when you need a stable address for bookmarks, scripts, reverse proxies, or MCP clients.
54
 
55
  ```bash
56
  # macOS / Linux
 
60
  koharu.exe --port 9999
61
  ```
62
 
63
+ If you do not specify `--port`, Koharu still starts the server, but the chosen port is dynamic.
64
+
65
+ ## Connect to the local API
66
+
67
+ When Koharu is running on a fixed port, the main endpoints are:
68
+
69
+ - Web UI: `http://localhost:9999/`
70
+ - RPC / HTTP API: `http://localhost:9999/api/v1`
71
+ - MCP server: `http://localhost:9999/mcp`
72
+
73
+ Replace `9999` with the port you chose.
74
+
75
+ Because Koharu binds to loopback, these endpoints are local by default. If you want remote access from another machine, you need to expose that port yourself through your own network setup.
76
+
77
+ For endpoint-level details, see [HTTP API Reference](../reference/http-api.md).
78
+
79
  ## Connect to the MCP server
80
 
81
+ Koharu includes a built-in MCP server using the same loaded documents, models, and page pipeline as the rest of the app.
82
+
83
+ Point your MCP client or agent at:
84
 
85
  `http://localhost:9999/mcp`
86
 
87
+ This is useful when you want an agent to:
88
+
89
+ - inspect text blocks
90
+ - run OCR or translation
91
+ - export rendered pages
92
+ - automate review or batch workflows
93
+
94
+ For client-specific setup examples, see [Configure MCP Clients](configure-mcp-clients.md).
95
+
96
+ For the built-in tool list itself, see [MCP Tools Reference](../reference/mcp-tools.md).
97
 
98
  ## Force CPU mode
99
 
 
107
  koharu.exe --cpu
108
  ```
109
 
110
+ This is useful for compatibility testing, driver issues, or low-risk debugging when GPU setup is uncertain.
111
+
112
  ## Download runtime dependencies only
113
 
114
+ Use `--download` if you want Koharu to prefetch runtime dependencies and exit without starting the app.
115
 
116
  ```bash
117
  # macOS / Linux
 
120
  # Windows
121
  koharu.exe --download
122
  ```
123
+
124
+ In the current implementation, this path initializes:
125
+
126
+ - runtime libraries used by the local inference stack
127
+ - the default vision and OCR models
128
+
129
+ It does not predownload every optional local translation LLM. Those are still fetched when you select them in Settings.
130
+
131
+ ## Enable debug output
132
+
133
+ Use `--debug` when you want console-oriented startup with log output.
134
+
135
+ ```bash
136
+ # macOS / Linux
137
+ koharu --debug
138
+
139
+ # Windows
140
+ koharu.exe --debug
141
+ ```
142
+
143
+ On Windows, debug and headless runs also influence how Koharu attaches to or creates a console window.
docs/how-to/troubleshooting.md ADDED
@@ -0,0 +1,276 @@
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
1
+ ---
2
+ title: Troubleshooting
3
+ ---
4
+
5
+ # Troubleshooting
6
+
7
+ This page covers the most common Koharu problems that follow from the current implementation: first-run downloads, runtime initialization, GPU fallback, headless and MCP access, pipeline-stage ordering, and source-build setup.
8
+
9
+ ## Before you start
10
+
11
+ When troubleshooting, first identify which layer is failing:
12
+
13
+ - application startup
14
+ - runtime or model downloads
15
+ - GPU acceleration
16
+ - page pipeline stages such as detect, OCR, inpaint, or render
17
+ - headless or MCP connectivity
18
+ - source build and local development
19
+
20
+ That usually narrows the problem quickly.
21
+
22
+ ## Koharu does not start cleanly on first launch
23
+
24
+ Possible causes:
25
+
26
+ - runtime libraries have not finished downloading or extracting yet
27
+ - the first-run model downloads are still in progress
28
+ - the machine is missing local permissions for its app-data directory
29
+ - GPU initialization failed and the app is trying to fall back
30
+
31
+ Try this:
32
+
33
+ 1. wait longer on the very first launch, especially on slower disks or networks
34
+ 2. start Koharu once with `--download` to prefetch runtime dependencies without opening the GUI
35
+ 3. start once with `--cpu` to check whether the problem is GPU-related
36
+ 4. start once with `--debug` to get console-oriented logs
37
+
38
+ ```bash
39
+ # macOS / Linux
40
+ koharu --download
41
+ koharu --cpu
42
+ koharu --debug
43
+
44
+ # Windows
45
+ koharu.exe --download
46
+ koharu.exe --cpu
47
+ koharu.exe --debug
48
+ ```
49
+
50
+ If `--cpu` works and the normal launch does not, the problem is usually in the GPU path rather than the general app startup path.
51
+
52
+ ## Model or runtime downloads fail
53
+
54
+ Koharu needs network access on first use for:
55
+
56
+ - llama.cpp runtime packages
57
+ - GPU runtime support files where applicable
58
+ - the default vision and OCR model stack
59
+ - optional local translation models when selected later
60
+
61
+ Likely causes:
62
+
63
+ - intermittent network failures
64
+ - blocked access to GitHub release assets or model hosting
65
+ - local filesystem permission issues in the app-data directory
66
+
67
+ What to check:
68
+
69
+ - whether GitHub and Hugging Face downloads are reachable from the machine
70
+ - whether retrying `--download` succeeds
71
+ - whether another process or security tool is locking files in the local runtime directory
72
+
73
+ If downloads keep failing, test on a different network first. That quickly distinguishes a machine-local problem from an upstream reachability issue.
74
+
75
+ ## Koharu falls back to CPU even though you have an NVIDIA GPU
76
+
77
+ This is expected when Koharu cannot confirm support for CUDA 13.1.
78
+
79
+ The current runtime behavior is:
80
+
81
+ - detect an NVIDIA driver
82
+ - query driver compatibility
83
+ - continue on CUDA only when the driver reports CUDA 13.1 support
84
+ - otherwise fall back to CPU
85
+
86
+ Try this:
87
+
88
+ 1. update the NVIDIA driver
89
+ 2. restart Koharu after the update
90
+ 3. verify behavior with `--debug`
91
+
92
+ If the driver is old or the CUDA check fails, Koharu deliberately prefers CPU over a partially working CUDA configuration.
93
+
94
+ ## OCR, inpainting, or export says something is missing
95
+
96
+ Some errors are just pipeline ordering problems.
97
+
98
+ Common examples from the current API and MCP layer:
99
+
100
+ - `No segment mask available. Run detect first.`
101
+ - `No rendered image found`
102
+ - `No inpainted image found`
103
+
104
+ These usually mean a required earlier stage has not produced its output yet.
105
+
106
+ Use this order:
107
+
108
+ 1. Detect
109
+ 2. OCR
110
+ 3. Inpaint
111
+ 4. LLM Generate
112
+ 5. Render
113
+ 6. Export
114
+
115
+ If export fails because there is no rendered or inpainted layer, rerun the missing stage instead of retrying export repeatedly.
116
+
117
+ ## Detection or OCR quality is poor on a page
118
+
119
+ Common causes:
120
+
121
+ - low-resolution source images
122
+ - unusual page crops
123
+ - heavy screentones or noisy scans
124
+ - vertical text mixed with difficult artwork
125
+ - badly placed or duplicated text blocks after detection
126
+
127
+ Try this:
128
+
129
+ 1. start from a cleaner page image if possible
130
+ 2. inspect the detected text blocks before translating
131
+ 3. fix obvious bad blocks before running the rest of the pipeline
132
+ 4. rerun later stages after the structural fixes
133
+
134
+ If the structure is wrong, translation quality usually gets worse downstream because OCR and rendering both depend on the block geometry.
135
+
136
+ ## Headless mode starts, but you cannot open the Web UI
137
+
138
+ Check the basics first:
139
+
140
+ - did you pass `--headless`
141
+ - did you choose a fixed port
142
+ - is the process still running
143
+
144
+ Example:
145
+
146
+ ```bash
147
+ koharu --port 4000 --headless
148
+ ```
149
+
150
+ Then open:
151
+
152
+ ```text
153
+ http://localhost:4000
154
+ ```
155
+
156
+ Important implementation detail:
157
+
158
+ - Koharu binds to `127.0.0.1`
159
+
160
+ That means the local Web UI is only available on the same machine unless you expose it yourself through your own networking setup.
161
+
162
+ Also verify that another process is not already using the selected port.
163
+
164
+ ## The MCP client cannot connect
165
+
166
+ Use a fixed port and point the client to:
167
+
168
+ ```text
169
+ http://localhost:9999/mcp
170
+ ```
171
+
172
+ Common mistakes:
173
+
174
+ - using the root URL instead of `/mcp`
175
+ - forgetting `--port`
176
+ - trying to connect after the Koharu process has already exited
177
+ - trying to reach the service from another machine without explicitly exposing the port
178
+
179
+ If normal headless Web UI access works but MCP does not, check the exact URL first. Wrong path selection is more common than server failure.
180
+
181
+ If the client is Antigravity, Claude Desktop, or Claude Code, follow the client-specific setup in [Configure MCP Clients](configure-mcp-clients.md).
182
+
183
+ ## Import appears to do nothing
184
+
185
+ The current documented import flow is image-based. Koharu accepts:
186
+
187
+ - `.png`
188
+ - `.jpg`
189
+ - `.jpeg`
190
+ - `.webp`
191
+
192
+ Folder import recursively filters files to those extensions only.
193
+
194
+ If a folder import seems empty, check whether the folder actually contains supported image files instead of archives, PSDs, or other formats.
195
+
196
+ ## Export fails or gives you the wrong kind of output
197
+
198
+ Use the output type that matches the current pipeline state:
199
+
200
+ - rendered export requires a rendered layer
201
+ - inpainted export requires an inpainted layer
202
+ - PSD export is the best choice when you still want editable text and helper layers
203
+
204
+ Also remember:
205
+
206
+ - rendered exports use a `_koharu` suffix
207
+ - inpainted exports use an `_inpainted` suffix
208
+ - PSD export uses `_koharu.psd`
209
+ - classic PSD export rejects images above `30000 x 30000`
210
+
211
+ If the page is extremely large, resize or split it before expecting PSD export to succeed.
212
+
213
+ ## Source build fails on Windows
214
+
215
+ The Windows build helper expects:
216
+
217
+ - `nvcc` for the default CUDA build path
218
+ - `cl.exe` from Visual Studio C++ tools
219
+
220
+ The Bun wrapper script tries to discover both automatically, but if either one is missing the build can fail before Tauri finishes launching.
221
+
222
+ Use the project wrapper commands:
223
+
224
+ ```bash
225
+ bun install
226
+ bun run build
227
+ ```
228
+
229
+ If you want direct control over the Tauri command, try:
230
+
231
+ ```bash
232
+ bun tauri build --release --no-bundle
233
+ ```
234
+
235
+ If you want lower-level Rust builds, prefer:
236
+
237
+ ```bash
238
+ bun cargo build --release -p koharu --features=cuda
239
+ ```
240
+
241
+ If you only need to confirm that the app works at all, try a CPU-only runtime launch first instead of debugging the full CUDA toolchain immediately.
242
+
243
+ ## Source build fails because of the chosen feature path
244
+
245
+ The desktop build is platform-aware:
246
+
247
+ - Windows and Linux use `cuda`
248
+ - macOS on Apple Silicon uses `metal`
249
+
250
+ If you manually invoke lower-level cargo commands with the wrong feature set for your platform, the build can fail or produce a mismatched binary. Follow the platform examples in [Build From Source](build-from-source.md).
251
+
252
+ ## When to stop debugging locally
253
+
254
+ You have probably isolated the issue enough to report it when:
255
+
256
+ - `--cpu` works but GPU mode does not
257
+ - `--download` consistently fails on a healthy network
258
+ - the same page repeatedly triggers a reproducible pipeline failure
259
+ - headless mode starts but a correct `localhost` URL still fails
260
+
261
+ At that point, collect:
262
+
263
+ - your OS and hardware
264
+ - the exact command you ran
265
+ - whether `--cpu` changes the result
266
+ - the exact error message
267
+ - whether the issue happens on one page or every page
268
+
269
+ ## Related pages
270
+
271
+ - [Install Koharu](install-koharu.md)
272
+ - [Run GUI, Headless, and MCP Modes](run-gui-headless-and-mcp.md)
273
+ - [Configure MCP Clients](configure-mcp-clients.md)
274
+ - [Build From Source](build-from-source.md)
275
+ - [CLI Reference](../reference/cli.md)
276
+ - [Technical Deep Dive](../explanation/technical-deep-dive.md)
docs/how-to/use-openai-compatible-api.md ADDED
@@ -0,0 +1,139 @@
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
1
+ ---
2
+ title: Use OpenAI-Compatible APIs
3
+ ---
4
+
5
+ # Use OpenAI-Compatible APIs
6
+
7
+ Koharu can translate through APIs that follow the OpenAI Chat Completions shape. That includes local servers such as LM Studio and hosted routers such as OpenRouter.
8
+
9
+ This page is specifically about the current OpenAI-compatible path in Koharu. It is different from Koharu's built-in OpenAI, Gemini, Claude, and DeepSeek provider presets.
10
+
11
+ ## What Koharu expects from a compatible endpoint
12
+
13
+ In the current implementation, Koharu expects:
14
+
15
+ - a base URL that points at the API root, usually ending in `/v1`
16
+ - `GET /models` for connection testing
17
+ - `POST /chat/completions` for translation
18
+ - a response that includes `choices[0].message.content`
19
+ - bearer-token authentication when an API key is provided
20
+
21
+ Some implementation details matter:
22
+
23
+ - Koharu trims whitespace and a trailing slash from the base URL before appending `/models` or `/chat/completions`
24
+ - an empty API key is omitted entirely instead of sending an empty `Authorization` header
25
+ - a compatible model only appears in Koharu's LLM picker after both `Base URL` and `Model name` are filled in
26
+ - each configured preset shows up as its own selectable source in the LLM picker
27
+
28
+ That means OpenAI-compatible here really means OpenAI API-compatible, not just "can be used with OpenAI tools in general."
29
+
30
+ ## Where to configure it in Koharu
31
+
32
+ Open **Settings** and scroll to **Local LLM & OpenAI Compatible Providers**.
33
+
34
+ The current UI exposes:
35
+
36
+ - a preset selector: `Ollama`, `LM Studio`, `Preset 1`, `Preset 2`
37
+ - `Base URL`
38
+ - `API Key (optional)`
39
+ - `Model name`
40
+ - `Test Connection`
41
+ - advanced fields for `Temperature`, `Max tokens`, and a custom system prompt
42
+
43
+ `Test Connection` currently calls `/models` with a 5-second timeout and reports whether Koharu connected successfully, how many model IDs the endpoint returned, and the measured latency.
44
+
45
+ ## LM Studio
46
+
47
+ Use the built-in `LM Studio` preset when you want a local model server on the same machine.
48
+
49
+ 1. Start LM Studio's local server.
50
+ 2. In Koharu, open **Settings**.
51
+ 3. Choose the `LM Studio` preset.
52
+ 4. Set `Base URL` to `http://127.0.0.1:1234/v1`.
53
+ 5. Leave `API Key` empty unless you configured authentication in front of LM Studio.
54
+ 6. Enter the exact LM Studio model identifier in `Model name`.
55
+ 7. Click `Test Connection`.
56
+ 8. Open Koharu's LLM picker and select the LM Studio-backed model entry.
57
+
58
+ Notes:
59
+
60
+ - Koharu's default LM Studio preset already uses `http://127.0.0.1:1234/v1`
61
+ - LM Studio's official docs use the same OpenAI-compatible base path on port `1234`
62
+ - Koharu's connection test only shows the model count, not the full model names, so you still need to know the exact model ID you want to use
63
+
64
+ If you are unsure about the model identifier, query LM Studio directly:
65
+
66
+ ```bash
67
+ curl http://127.0.0.1:1234/v1/models
68
+ ```
69
+
70
+ Then copy the `id` field for the model you want.
71
+
72
+ Official references:
73
+
74
+ - [LM Studio OpenAI compatibility docs](https://lmstudio.ai/docs/developer/openai-compat)
75
+ - [LM Studio list models endpoint](https://lmstudio.ai/docs/developer/openai-compat/models)
76
+
77
+ ## OpenRouter
78
+
79
+ Use `Preset 1` or `Preset 2` for hosted OpenAI-compatible services such as OpenRouter. That avoids overwriting the local LM Studio preset.
80
+
81
+ 1. Create an API key in OpenRouter.
82
+ 2. In Koharu, open **Settings**.
83
+ 3. Choose `Preset 1` or `Preset 2`.
84
+ 4. Set `Base URL` to `https://openrouter.ai/api/v1`.
85
+ 5. Paste your OpenRouter API key into `API Key`.
86
+ 6. Enter the exact OpenRouter model ID in `Model name`.
87
+ 7. Click `Test Connection`.
88
+ 8. Select that preset-backed model from Koharu's LLM picker.
89
+
90
+ Important details:
91
+
92
+ - OpenRouter model IDs should include the organization prefix, not just a display name
93
+ - Koharu currently sends standard bearer auth and a normal OpenAI-style chat-completions request body
94
+ - OpenRouter supports extra headers such as `HTTP-Referer` and `X-OpenRouter-Title`, but Koharu does not currently expose fields for those optional headers
95
+
96
+ Official references:
97
+
98
+ - [OpenRouter API overview](https://openrouter.ai/docs/api/reference/overview)
99
+ - [OpenRouter authentication](https://openrouter.ai/docs/api/reference/authentication)
100
+ - [OpenRouter models](https://openrouter.ai/models)
101
+
102
+ ## Other compatible endpoints
103
+
104
+ For other self-hosted or routed APIs, use the same checklist:
105
+
106
+ - use the API root as `Base URL`, not the full `/chat/completions` URL
107
+ - make sure the endpoint supports `GET /models`
108
+ - make sure it supports `POST /chat/completions`
109
+ - use the exact model `id`, not just a marketing name
110
+ - provide an API key if the server requires bearer authentication
111
+
112
+ If the server only implements `Responses` or some custom schema, Koharu's current OpenAI-compatible integration will not work without an adapter or proxy because Koharu currently talks to `chat/completions`.
113
+
114
+ ## How model selection works in practice
115
+
116
+ Koharu does not treat these endpoints as one generic remote bucket. Instead, each configured preset becomes its own LLM entry source.
117
+
118
+ For example:
119
+
120
+ - `LM Studio` can point at a local server
121
+ - `Preset 1` can point at OpenRouter
122
+ - `Preset 2` can point at another self-hosted OpenAI-compatible API
123
+
124
+ That lets you keep multiple compatible backends configured and switch between them from the normal LLM picker.
125
+
126
+ ## Common mistakes
127
+
128
+ - using a base URL without `/v1`
129
+ - pasting the full `/chat/completions` URL into `Base URL`
130
+ - leaving `Model name` empty and expecting the model to appear anyway
131
+ - using a display label instead of the exact API model ID
132
+ - assuming `Test Connection` loads or selects a model for you
133
+ - trying to use an endpoint that only supports the newer `Responses` API
134
+
135
+ ## Related pages
136
+
137
+ - [Models and Providers](../explanation/models-and-providers.md)
138
+ - [Translate Your First Page](../tutorials/translate-your-first-page.md)
139
+ - [Troubleshooting](troubleshooting.md)
docs/reference/cli.md CHANGED
@@ -6,6 +6,13 @@ title: CLI Reference
6
 
7
  This page covers the command-line options exposed by Koharu's desktop binary.
8
 
 
 
 
 
 
 
 
9
  ## Common usage
10
 
11
  ```bash
@@ -20,11 +27,26 @@ koharu.exe [OPTIONS]
20
 
21
  | Option | Meaning |
22
  | --- | --- |
23
- | `-d`, `--download` | Download runtime libraries and exit |
24
  | `--cpu` | Force CPU mode even when a GPU is available |
25
- | `-p`, `--port <PORT>` | Bind the local HTTP server to a specific port |
26
  | `--headless` | Run without starting the desktop GUI |
27
- | `--debug` | Enable debug mode with console output |
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
28
 
29
  ## Common patterns
30
 
@@ -45,3 +67,21 @@ Download runtime packages ahead of time:
45
  ```bash
46
  koharu --download
47
  ```
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
6
 
7
  This page covers the command-line options exposed by Koharu's desktop binary.
8
 
9
+ Koharu uses the same binary for:
10
+
11
+ - desktop startup
12
+ - headless local Web UI
13
+ - the local HTTP API
14
+ - the built-in MCP server
15
+
16
  ## Common usage
17
 
18
  ```bash
 
27
 
28
  | Option | Meaning |
29
  | --- | --- |
30
+ | `-d`, `--download` | Prefetch runtime libraries and the default vision and OCR stack, then exit |
31
  | `--cpu` | Force CPU mode even when a GPU is available |
32
+ | `-p`, `--port <PORT>` | Bind the local HTTP server to a specific `127.0.0.1` port instead of a random one |
33
  | `--headless` | Run without starting the desktop GUI |
34
+ | `--debug` | Enable debug-oriented console output |
35
+
36
+ ## Behavior notes
37
+
38
+ Some flags change more than just startup appearance:
39
+
40
+ - without `--port`, Koharu chooses a random local port
41
+ - with `--headless`, Koharu skips the Tauri window but still serves the Web UI and API
42
+ - with `--download`, Koharu exits after dependency prefetch and does not stay running
43
+ - with `--cpu`, both the vision stack and local LLM path avoid GPU acceleration
44
+
45
+ When a fixed port is set, the main local endpoints are:
46
+
47
+ - `http://localhost:<PORT>/`
48
+ - `http://localhost:<PORT>/api/v1`
49
+ - `http://localhost:<PORT>/mcp`
50
 
51
  ## Common patterns
52
 
 
67
  ```bash
68
  koharu --download
69
  ```
70
+
71
+ Run a local MCP endpoint on a stable port:
72
+
73
+ ```bash
74
+ koharu --port 9999
75
+ ```
76
+
77
+ Then connect your MCP client to:
78
+
79
+ ```text
80
+ http://localhost:9999/mcp
81
+ ```
82
+
83
+ Start with explicit debug logging:
84
+
85
+ ```bash
86
+ koharu --debug
87
+ ```
docs/reference/http-api.md ADDED
@@ -0,0 +1,196 @@
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
1
+ ---
2
+ title: HTTP API Reference
3
+ ---
4
+
5
+ # HTTP API Reference
6
+
7
+ Koharu exposes a local HTTP API under:
8
+
9
+ ```text
10
+ http://127.0.0.1:<PORT>/api/v1
11
+ ```
12
+
13
+ This is the same API used by the desktop UI and headless Web UI.
14
+
15
+ ## Runtime model
16
+
17
+ Important behavior from the current implementation:
18
+
19
+ - the API is served by the same process as the GUI or headless runtime
20
+ - the server binds to `127.0.0.1` by default
21
+ - the API and MCP server share the same loaded documents, models, and pipeline state
22
+ - when no `--port` is provided, Koharu chooses a random local port
23
+
24
+ ## Common response shapes
25
+
26
+ Frequently used types include:
27
+
28
+ - `MetaInfo`: app version and ML device
29
+ - `DocumentSummary`: document id, name, size, revision, layer availability, and text-block count
30
+ - `DocumentDetail`: full document metadata plus text blocks
31
+ - `JobState`: current pipeline job progress
32
+ - `LlmState`: current LLM load state
33
+ - `ImportResult`: imported document count and summaries
34
+ - `ExportResult`: count of exported files
35
+
36
+ ## Endpoints
37
+
38
+ ### Meta and fonts
39
+
40
+ | Method | Path | Purpose |
41
+ | --- | --- | --- |
42
+ | `GET` | `/meta` | get app version and active ML backend |
43
+ | `GET` | `/fonts` | list font families available for rendering |
44
+
45
+ ### Documents
46
+
47
+ | Method | Path | Purpose |
48
+ | --- | --- | --- |
49
+ | `GET` | `/documents` | list loaded documents |
50
+ | `POST` | `/documents/import?mode=replace` | replace the current document set with uploaded images |
51
+ | `POST` | `/documents/import?mode=append` | append uploaded images to the current document set |
52
+ | `GET` | `/documents/{documentId}` | get one document and all text-block metadata |
53
+ | `GET` | `/documents/{documentId}/thumbnail` | get a thumbnail image |
54
+ | `GET` | `/documents/{documentId}/layers/{layer}` | fetch one image layer |
55
+
56
+ The import endpoint uses multipart form data with repeated `files` fields.
57
+
58
+ Document layers currently exposed by the implementation include:
59
+
60
+ - `original`
61
+ - `segment`
62
+ - `inpainted`
63
+ - `brush`
64
+ - `rendered`
65
+
66
+ ### Page pipeline
67
+
68
+ | Method | Path | Purpose |
69
+ | --- | --- | --- |
70
+ | `POST` | `/documents/{documentId}/detect` | detect text blocks and layout |
71
+ | `POST` | `/documents/{documentId}/ocr` | run OCR on detected text blocks |
72
+ | `POST` | `/documents/{documentId}/inpaint` | remove original text using the current mask |
73
+ | `POST` | `/documents/{documentId}/render` | render translated text |
74
+ | `POST` | `/documents/{documentId}/translate` | generate translations for one block or the full page |
75
+ | `PUT` | `/documents/{documentId}/mask-region` | replace or update part of the segmentation mask |
76
+ | `PUT` | `/documents/{documentId}/brush-region` | write a patch into the brush layer |
77
+ | `POST` | `/documents/{documentId}/inpaint-region` | re-inpaint a rectangular region only |
78
+
79
+ Useful request details:
80
+
81
+ - `/render` accepts `textBlockId`, `shaderEffect`, `shaderStroke`, and `fontFamily`
82
+ - `/translate` accepts `textBlockId` and `language`
83
+ - `/mask-region` accepts `data` plus an optional `region`
84
+ - `/brush-region` accepts `data` plus a required `region`
85
+ - `/inpaint-region` accepts a rectangular `region`
86
+
87
+ ## Text blocks
88
+
89
+ | Method | Path | Purpose |
90
+ | --- | --- | --- |
91
+ | `POST` | `/documents/{documentId}/text-blocks` | create a new text block from `x`, `y`, `width`, `height` |
92
+ | `PATCH` | `/documents/{documentId}/text-blocks/{textBlockId}` | patch text, translation, box geometry, or style |
93
+ | `DELETE` | `/documents/{documentId}/text-blocks/{textBlockId}` | remove a text block |
94
+
95
+ The text-block patch shape currently includes:
96
+
97
+ - `text`
98
+ - `translation`
99
+ - `x`
100
+ - `y`
101
+ - `width`
102
+ - `height`
103
+ - `style`
104
+
105
+ `style` can include font families, font size, RGBA color, text alignment, italic and bold flags, and stroke configuration.
106
+
107
+ ## Export
108
+
109
+ | Method | Path | Purpose |
110
+ | --- | --- | --- |
111
+ | `GET` | `/documents/{documentId}/export?layer=rendered` | export one rendered image |
112
+ | `GET` | `/documents/{documentId}/export?layer=inpainted` | export one inpainted image |
113
+ | `GET` | `/documents/{documentId}/export/psd` | export one layered PSD |
114
+ | `POST` | `/exports?layer=rendered` | export all rendered pages |
115
+ | `POST` | `/exports?layer=inpainted` | export all inpainted pages |
116
+
117
+ Single-document export endpoints return binary file content. Bulk export returns JSON with the number of files written.
118
+
119
+ ## LLM control
120
+
121
+ | Method | Path | Purpose |
122
+ | --- | --- | --- |
123
+ | `GET` | `/llm/models` | list local and API-backed translation models |
124
+ | `GET` | `/llm/state` | get the current LLM status |
125
+ | `POST` | `/llm/load` | load a local or API-backed model |
126
+ | `POST` | `/llm/offload` | unload the current model |
127
+ | `POST` | `/llm/ping` | test an OpenAI-compatible base URL |
128
+
129
+ Useful request details:
130
+
131
+ - `/llm/models` accepts optional `language` and `openaiCompatibleBaseUrl` query parameters
132
+ - `/llm/load` accepts `id`, `apiKey`, `baseUrl`, `temperature`, `maxTokens`, and `customSystemPrompt`
133
+ - `/llm/ping` accepts `baseUrl` and optional `apiKey`
134
+
135
+ ## Provider API keys
136
+
137
+ | Method | Path | Purpose |
138
+ | --- | --- | --- |
139
+ | `GET` | `/providers/{provider}/api-key` | read a saved API key for a provider |
140
+ | `PUT` | `/providers/{provider}/api-key` | store or overwrite a provider API key |
141
+
142
+ Current built-in provider ids include:
143
+
144
+ - `openai`
145
+ - `gemini`
146
+ - `claude`
147
+ - `deepseek`
148
+ - `openai-compatible`
149
+
150
+ ## Pipeline jobs
151
+
152
+ | Method | Path | Purpose |
153
+ | --- | --- | --- |
154
+ | `POST` | `/jobs/pipeline` | start a full processing job |
155
+ | `DELETE` | `/jobs/{jobId}` | cancel a running pipeline job |
156
+
157
+ The pipeline job request can include:
158
+
159
+ - `documentId` to target one page, or omit it to process all loaded pages
160
+ - LLM settings such as `llmModelId`, `llmApiKey`, `llmBaseUrl`, `llmTemperature`, `llmMaxTokens`, and `llmCustomSystemPrompt`
161
+ - render settings such as `shaderEffect`, `shaderStroke`, and `fontFamily`
162
+ - `language`
163
+
164
+ ## Events stream
165
+
166
+ Koharu also exposes server-sent events at:
167
+
168
+ ```text
169
+ GET /events
170
+ ```
171
+
172
+ Current event names are:
173
+
174
+ - `snapshot`
175
+ - `documents.changed`
176
+ - `document.changed`
177
+ - `job.changed`
178
+ - `download.changed`
179
+ - `llm.changed`
180
+
181
+ The stream sends an initial `snapshot` event and uses a 15-second keepalive.
182
+
183
+ ## Typical workflow
184
+
185
+ The normal API order for one page is:
186
+
187
+ 1. `POST /documents/import?mode=replace`
188
+ 2. `POST /documents/{documentId}/detect`
189
+ 3. `POST /documents/{documentId}/ocr`
190
+ 4. `POST /llm/load`
191
+ 5. `POST /documents/{documentId}/translate`
192
+ 6. `POST /documents/{documentId}/inpaint`
193
+ 7. `POST /documents/{documentId}/render`
194
+ 8. `GET /documents/{documentId}/export?layer=rendered`
195
+
196
+ If you want agent-oriented access instead of HTTP endpoint orchestration, see [MCP Tools Reference](mcp-tools.md).
docs/reference/index.md CHANGED
@@ -8,5 +8,8 @@ Reference pages collect factual details you may want to look up quickly.
8
 
9
  ## Available references
10
 
11
- - [CLI Reference](cli.md)
12
- - [Keyboard Shortcuts](keyboard-shortcuts.md)
 
 
 
 
8
 
9
  ## Available references
10
 
11
+ - [CLI Reference](cli.md): startup flags, local server behavior, and common runtime patterns
12
+ - [HTTP API Reference](http-api.md): local REST endpoints, event stream names, payloads, and workflow order
13
+ - [MCP Tools Reference](mcp-tools.md): built-in MCP tool names, parameters, and suggested usage flow
14
+ - [Settings Reference](settings.md): appearance, language, provider keys, local-LLM presets, and About page behavior
15
+ - [Keyboard Shortcuts](keyboard-shortcuts.md): the default editor shortcuts currently documented in the UI
docs/reference/mcp-tools.md ADDED
@@ -0,0 +1,140 @@
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
1
+ ---
2
+ title: MCP Tools Reference
3
+ ---
4
+
5
+ # MCP Tools Reference
6
+
7
+ Koharu exposes MCP tools at:
8
+
9
+ ```text
10
+ http://127.0.0.1:<PORT>/mcp
11
+ ```
12
+
13
+ These tools operate on the same runtime state as the GUI and HTTP API.
14
+
15
+ ## General behavior
16
+
17
+ Important implementation details:
18
+
19
+ - image-based tools can return text plus inline image content
20
+ - `open_documents` replaces the current document set rather than appending
21
+ - `process` starts the full pipeline but does not itself stream progress
22
+ - `llm_load` and `process` currently accept local-model-style parameters and do not expose every HTTP API field
23
+
24
+ ## Inspection tools
25
+
26
+ | Tool | What it does | Key parameters |
27
+ | --- | --- | --- |
28
+ | `app_version` | get the application version | none |
29
+ | `device` | get ML device and GPU-related info | none |
30
+ | `get_documents` | get the number of loaded documents | none |
31
+ | `get_document` | get one document's metadata and text blocks | `index` |
32
+ | `list_font_families` | list available render fonts | none |
33
+ | `llm_list` | list translation models | none |
34
+ | `llm_ready` | check whether an LLM is currently loaded | none |
35
+
36
+ ## Image and block preview tools
37
+
38
+ | Tool | What it does | Key parameters |
39
+ | --- | --- | --- |
40
+ | `view_image` | preview a whole document layer | `index`, `layer`, optional `max_size` |
41
+ | `view_text_block` | preview one cropped text block | `index`, `text_block_index`, optional `layer` |
42
+
43
+ Valid `view_image` layers:
44
+
45
+ - `original`
46
+ - `segment`
47
+ - `inpainted`
48
+ - `rendered`
49
+
50
+ Valid `view_text_block` layers:
51
+
52
+ - `original`
53
+ - `rendered`
54
+
55
+ ## Document and export tools
56
+
57
+ | Tool | What it does | Key parameters |
58
+ | --- | --- | --- |
59
+ | `open_documents` | load image files from disk and replace the current set | `paths` |
60
+ | `export_document` | write the rendered document to disk | `index`, `output_path` |
61
+
62
+ `open_documents` expects filesystem paths, not uploaded file blobs.
63
+
64
+ `export_document` currently exports the rendered image path only. PSD export is available through the HTTP API but does not currently have a dedicated MCP tool.
65
+
66
+ ## Pipeline tools
67
+
68
+ | Tool | What it does | Key parameters |
69
+ | --- | --- | --- |
70
+ | `detect` | run text detection and font prediction | `index` |
71
+ | `ocr` | run OCR on detected blocks | `index` |
72
+ | `inpaint` | remove text using the current mask | `index` |
73
+ | `render` | draw translated text back onto the page | `index`, optional `text_block_index`, `shader_effect`, `font_family` |
74
+ | `process` | start detect -> OCR -> inpaint -> translate -> render | optional `index`, `llm_model_id`, `language`, `shader_effect`, `font_family` |
75
+
76
+ `process` is the coarse-grained convenience tool. If you need more control or easier debugging, use the stage tools separately.
77
+
78
+ ## LLM tools
79
+
80
+ | Tool | What it does | Key parameters |
81
+ | --- | --- | --- |
82
+ | `llm_load` | load a translation model | `id`, optional `temperature`, `max_tokens`, `custom_system_prompt` |
83
+ | `llm_offload` | unload the current model | none |
84
+ | `llm_generate` | translate one block or all blocks | `index`, optional `text_block_index`, `language` |
85
+
86
+ `llm_generate` expects an LLM to already be loaded.
87
+
88
+ ## Text-block editing tools
89
+
90
+ | Tool | What it does | Key parameters |
91
+ | --- | --- | --- |
92
+ | `update_text_block` | patch text, translation, box geometry, or style | `index`, `text_block_index`, optional text and style fields |
93
+ | `add_text_block` | add a new empty text block | `index`, `x`, `y`, `width`, `height` |
94
+ | `remove_text_block` | remove one text block | `index`, `text_block_index` |
95
+
96
+ The current update tool can change:
97
+
98
+ - `translation`
99
+ - `x`
100
+ - `y`
101
+ - `width`
102
+ - `height`
103
+ - `font_families`
104
+ - `font_size`
105
+ - `color`
106
+ - `shader_effect`
107
+
108
+ ## Mask and cleanup tools
109
+
110
+ | Tool | What it does | Key parameters |
111
+ | --- | --- | --- |
112
+ | `dilate_mask` | expand the current text mask | `index`, `radius` |
113
+ | `erode_mask` | shrink the current text mask | `index`, `radius` |
114
+ | `inpaint_region` | re-inpaint a specific rectangle only | `index`, `x`, `y`, `width`, `height` |
115
+
116
+ These are useful when the automatic segmentation mask is close but still needs manual cleanup.
117
+
118
+ ## Suggested prompt flow
119
+
120
+ For reliable agent behavior, this sequence works well:
121
+
122
+ 1. `open_documents`
123
+ 2. `get_documents`
124
+ 3. `detect`
125
+ 4. `ocr`
126
+ 5. `get_document`
127
+ 6. `llm_load`
128
+ 7. `llm_generate`
129
+ 8. `inpaint`
130
+ 9. `render`
131
+ 10. `view_image`
132
+ 11. `export_document`
133
+
134
+ If you need to inspect a problem block, use `view_text_block` before asking the agent to patch layout or translation.
135
+
136
+ ## Related pages
137
+
138
+ - [Configure MCP Clients](../how-to/configure-mcp-clients.md)
139
+ - [Run GUI, Headless, and MCP Modes](../how-to/run-gui-headless-and-mcp.md)
140
+ - [HTTP API Reference](http-api.md)
docs/reference/settings.md ADDED
@@ -0,0 +1,151 @@
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
1
+ ---
2
+ title: Settings Reference
3
+ ---
4
+
5
+ # Settings Reference
6
+
7
+ Koharu's Settings screen exposes appearance, language, device, provider, and local-LLM configuration. This page documents the current settings surface as implemented in the app.
8
+
9
+ ## Appearance
10
+
11
+ Theme options:
12
+
13
+ - `Light`
14
+ - `Dark`
15
+ - `System`
16
+
17
+ The app uses the selected theme immediately through the frontend theme provider.
18
+
19
+ ## Language
20
+
21
+ The current UI locale list comes from the bundled translation resources.
22
+
23
+ Currently shipped locales are:
24
+
25
+ - `en-US`
26
+ - `es-ES`
27
+ - `ja-JP`
28
+ - `ru-RU`
29
+ - `zh-CN`
30
+ - `zh-TW`
31
+
32
+ Changing the UI language updates the frontend locale and also influences language-aware LLM model listing in the current implementation.
33
+
34
+ ## Device
35
+
36
+ The Settings screen shows the current ML compute backend as `ML Compute`.
37
+
38
+ This value comes from the app metadata endpoint and reflects the runtime backend Koharu is currently using, such as CPU or a GPU-backed path.
39
+
40
+ ## API Keys
41
+
42
+ The current built-in provider key section covers:
43
+
44
+ - `OpenAI`
45
+ - `Gemini`
46
+ - `Claude`
47
+ - `DeepSeek`
48
+
49
+ Important behavior:
50
+
51
+ - API keys are stored through the local keyring integration rather than plain frontend storage
52
+ - Gemini is marked as a free-tier provider in the current UI
53
+ - the password-style input is only a visibility toggle in the UI, not a different storage mode
54
+
55
+ ## Local LLM and OpenAI-compatible providers
56
+
57
+ This section is used for local servers such as Ollama and LM Studio, and for custom OpenAI-compatible endpoints.
58
+
59
+ ### Presets
60
+
61
+ Current presets:
62
+
63
+ - `Ollama`
64
+ - `LM Studio`
65
+ - `Preset 1`
66
+ - `Preset 2`
67
+
68
+ Default base URLs:
69
+
70
+ - Ollama: `http://localhost:11434/v1`
71
+ - LM Studio: `http://127.0.0.1:1234/v1`
72
+ - Preset 1: empty until configured
73
+ - Preset 2: empty until configured
74
+
75
+ Each preset stores its own:
76
+
77
+ - `Base URL`
78
+ - `API Key`
79
+ - `Model name`
80
+ - `Temperature`
81
+ - `Max tokens`
82
+ - `Custom system prompt`
83
+
84
+ That lets you keep several compatible backends configured and switch between them from the same settings screen.
85
+
86
+ ### Required fields for the model picker
87
+
88
+ In the current implementation, a preset-backed OpenAI-compatible model only becomes selectable when both of these are filled in:
89
+
90
+ - `Base URL`
91
+ - `Model name`
92
+
93
+ An empty preset does not appear as a usable model entry.
94
+
95
+ ### Advanced fields
96
+
97
+ The expandable advanced section currently exposes:
98
+
99
+ - `Temperature`
100
+ - `Max tokens`
101
+ - `Custom system prompt`
102
+
103
+ Behavior notes:
104
+
105
+ - leaving `Temperature` or `Max tokens` empty sends no override
106
+ - leaving `Custom system prompt` empty uses Koharu's default manga translation system prompt
107
+ - the reset button clears only the custom prompt override for the current preset
108
+
109
+ ### Test Connection
110
+
111
+ `Test Connection` is a connectivity check for the current preset.
112
+
113
+ The current implementation:
114
+
115
+ - sends a request to Koharu's `/llm/ping` path
116
+ - checks the preset `Base URL`
117
+ - optionally includes the preset API key
118
+ - reports success or failure inline
119
+ - shows model count and latency on success
120
+ - uses a 5-second timeout for the underlying compatible-model listing
121
+
122
+ This is a connectivity test, not a model load.
123
+
124
+ ## About page
125
+
126
+ Settings links to a separate About page.
127
+
128
+ The About screen currently shows:
129
+
130
+ - the current app version
131
+ - whether a newer GitHub release exists
132
+ - the author link
133
+ - the repository link
134
+
135
+ In packaged app mode, the version check compares the local app version against the latest GitHub release for `mayocream/koharu`.
136
+
137
+ ## Persistence model
138
+
139
+ The current settings behavior is split across storage layers:
140
+
141
+ - provider API keys are stored through the system keyring
142
+ - local LLM preset config is persisted in Koharu's frontend preferences store
143
+ - theme and other UI preferences also persist locally
144
+
145
+ That means clearing frontend preferences is not the same as clearing saved provider API keys.
146
+
147
+ ## Related pages
148
+
149
+ - [Use OpenAI-Compatible APIs](../how-to/use-openai-compatible-api.md)
150
+ - [Models and Providers](../explanation/models-and-providers.md)
151
+ - [HTTP API Reference](http-api.md)
docs/tutorials/translate-your-first-page.md CHANGED
@@ -4,13 +4,13 @@ title: Translate Your First Page
4
 
5
  # Translate Your First Page
6
 
7
- This tutorial covers the normal Koharu workflow for a single manga page: import, detect, recognize, translate, review, and export.
8
 
9
  ## Before you begin
10
 
11
  - Install Koharu from the latest GitHub release
12
- - Start with a clear manga page image
13
- - Make sure you have enough local VRAM/RAM for your preferred model, or plan to use a remote provider
14
 
15
  If you have not installed Koharu yet, start with [Install Koharu](../how-to/install-koharu.md).
16
 
@@ -18,23 +18,40 @@ If you have not installed Koharu yet, start with [Install Koharu](../how-to/inst
18
 
19
  Open the desktop application normally.
20
 
21
- On the first run, Koharu may download required runtime packages and ML models. This is expected.
22
 
23
  ## 2. Import a page
24
 
25
- Load your manga page into the app.
26
 
27
- Koharu keeps your work inside a project, and on Windows it can associate `.khr` project files so you can reopen them by double-clicking.
 
 
 
 
 
 
 
28
 
29
  ## 3. Detect text and run OCR
30
 
31
  Use Koharu's built-in vision pipeline to:
32
 
33
- - detect speech bubbles and text regions
34
- - segment text areas
35
- - recognize the original text with OCR
 
 
 
 
 
36
 
37
- At this point, review the detected blocks and clean up anything obvious before translation.
 
 
 
 
 
38
 
39
  ## 4. Choose a translation backend
40
 
@@ -45,28 +62,56 @@ Pick either:
45
 
46
  Koharu can use OpenAI, Gemini, Claude, DeepSeek, and OpenAI-compatible endpoints such as LM Studio or OpenRouter.
47
 
 
 
 
 
 
 
 
 
48
  ## 5. Translate and review
49
 
50
  Run translation on the page, then inspect the result carefully.
51
 
52
- Koharu helps with text layout and vertical CJK rendering, but you should still review:
53
 
54
  - names and terminology
55
- - line breaks
56
- - font choices
57
- - bubble fit
 
 
 
58
 
59
  ## 6. Export the result
60
 
61
- When the page looks right, export it as either:
 
 
 
 
 
 
 
 
 
 
 
 
 
 
62
 
63
- - a rendered image
64
- - a layered Photoshop PSD with editable text layers
 
 
65
 
66
- PSD export is useful when you want to do final cleanup in Photoshop without rebuilding the page structure by hand.
67
 
68
  ## Next steps
69
 
70
  - Learn export options: [Export Pages and Manage Projects](../how-to/export-and-manage-projects.md)
71
  - Compare runtime choices: [Acceleration and Runtime](../explanation/acceleration-and-runtime.md)
72
- - Choose a model: [Models and Providers](../explanation/models-and-providers.md)
 
 
4
 
5
  # Translate Your First Page
6
 
7
+ This tutorial walks through the normal Koharu workflow for a single manga page: import, detect, recognize, translate, review, and export.
8
 
9
  ## Before you begin
10
 
11
  - Install Koharu from the latest GitHub release
12
+ - Start with a clear page image in `.png`, `.jpg`, `.jpeg`, or `.webp`
13
+ - Make sure you have enough local VRAM or RAM for your preferred model, or plan to use a remote provider
14
 
15
  If you have not installed Koharu yet, start with [Install Koharu](../how-to/install-koharu.md).
16
 
 
18
 
19
  Open the desktop application normally.
20
 
21
+ On the first run, Koharu may spend time initializing local runtime packages and downloading the default vision stack. This is expected and usually only happens once per machine or runtime update.
22
 
23
  ## 2. Import a page
24
 
25
+ Load your page image into the app.
26
 
27
+ At the moment, the documented import flow is image-based rather than project-file based. If you import a folder instead of a single file, Koharu recursively filters it down to supported image files.
28
+
29
+ For a first pass, use one clean page so it is easy to judge:
30
+
31
+ - text detection quality
32
+ - OCR quality
33
+ - translation quality
34
+ - final bubble fit
35
 
36
  ## 3. Detect text and run OCR
37
 
38
  Use Koharu's built-in vision pipeline to:
39
 
40
+ - detect text-like layout regions
41
+ - build a segmentation mask for cleanup
42
+ - estimate font and color hints
43
+ - recognize the source text with OCR
44
+
45
+ Under the hood, Koharu does not just run OCR on the full page. It first creates text blocks, crops those regions, and then runs OCR on the cropped areas.
46
+
47
+ After detection and OCR, review the page before you translate. Look for:
48
 
49
+ - missed bubbles or captions
50
+ - duplicate or badly placed text blocks
51
+ - obvious OCR errors
52
+ - vertical text that should stay vertical
53
+
54
+ Fixing structural issues before translation usually saves time later.
55
 
56
  ## 4. Choose a translation backend
57
 
 
62
 
63
  Koharu can use OpenAI, Gemini, Claude, DeepSeek, and OpenAI-compatible endpoints such as LM Studio or OpenRouter.
64
 
65
+ If you want to wire up LM Studio, OpenRouter, or another OpenAI-style endpoint, follow [Use OpenAI-Compatible APIs](../how-to/use-openai-compatible-api.md).
66
+
67
+ In practice:
68
+
69
+ - local models are better when privacy and offline use matter most
70
+ - remote models are easier when your machine is memory-constrained
71
+ - when you use a remote provider, Koharu sends OCR text for translation rather than the whole page image
72
+
73
  ## 5. Translate and review
74
 
75
  Run translation on the page, then inspect the result carefully.
76
 
77
+ Koharu helps with text layout and vertical CJK rendering, but the final page still benefits from manual review. Focus on:
78
 
79
  - names and terminology
80
+ - tone and character voice
81
+ - line breaks and bubble fit
82
+ - font choice and stroke readability
83
+ - blocks whose source OCR looked uncertain
84
+
85
+ If a translation reads correctly but still looks cramped, adjust the text block or styling before exporting.
86
 
87
  ## 6. Export the result
88
 
89
+ When the page looks right, export it in the format that matches your next step:
90
+
91
+ - rendered image for a flattened final page
92
+ - PSD for editable text and helper layers
93
+
94
+ Rendered exports are best when the page is finished. PSD export is better when you still want to:
95
+
96
+ - make small wording edits
97
+ - repaint artifacts
98
+ - hide or inspect helper layers
99
+ - finish the page in Photoshop
100
+
101
+ ## 7. If the first result is not good enough
102
+
103
+ The usual fixes are:
104
 
105
+ - rerun detection after adjusting page selection or replacing bad blocks
106
+ - correct OCR or translation text manually
107
+ - switch to a stronger translation model
108
+ - export PSD and finish the page with manual lettering cleanup
109
 
110
+ Koharu works best when you treat the pipeline as a fast first pass, then use manual review where the page needs it.
111
 
112
  ## Next steps
113
 
114
  - Learn export options: [Export Pages and Manage Projects](../how-to/export-and-manage-projects.md)
115
  - Compare runtime choices: [Acceleration and Runtime](../explanation/acceleration-and-runtime.md)
116
+ - Understand the model stack: [Technical Deep Dive](../explanation/technical-deep-dive.md)
117
+ - Choose a translation backend: [Models and Providers](../explanation/models-and-providers.md)
zensical.toml CHANGED
@@ -13,24 +13,32 @@ nav = [
13
  "tutorials/index.md",
14
  "tutorials/translate-your-first-page.md",
15
  ]},
16
- {"How-To Guides" = [
17
- "how-to/index.md",
18
- "how-to/install-koharu.md",
19
- "how-to/run-gui-headless-and-mcp.md",
20
- "how-to/export-and-manage-projects.md",
21
- "how-to/build-from-source.md",
22
- ]},
23
- {"Explanation" = [
24
- "explanation/index.md",
25
- "explanation/how-koharu-works.md",
26
- "explanation/acceleration-and-runtime.md",
27
- "explanation/models-and-providers.md",
28
- ]},
29
- {"Reference" = [
30
- "reference/index.md",
31
- "reference/cli.md",
32
- "reference/keyboard-shortcuts.md",
33
- ]},
 
 
 
 
 
 
 
 
34
  ]
35
 
36
  [project.extra]
 
13
  "tutorials/index.md",
14
  "tutorials/translate-your-first-page.md",
15
  ]},
16
+ {"How-To Guides" = [
17
+ "how-to/index.md",
18
+ "how-to/install-koharu.md",
19
+ "how-to/contributing.md",
20
+ "how-to/run-gui-headless-and-mcp.md",
21
+ "how-to/configure-mcp-clients.md",
22
+ "how-to/use-openai-compatible-api.md",
23
+ "how-to/export-and-manage-projects.md",
24
+ "how-to/build-from-source.md",
25
+ "how-to/troubleshooting.md",
26
+ ]},
27
+ {"Explanation" = [
28
+ "explanation/index.md",
29
+ "explanation/how-koharu-works.md",
30
+ "explanation/technical-deep-dive.md",
31
+ "explanation/acceleration-and-runtime.md",
32
+ "explanation/models-and-providers.md",
33
+ ]},
34
+ {"Reference" = [
35
+ "reference/index.md",
36
+ "reference/cli.md",
37
+ "reference/http-api.md",
38
+ "reference/mcp-tools.md",
39
+ "reference/settings.md",
40
+ "reference/keyboard-shortcuts.md",
41
+ ]},
42
  ]
43
 
44
  [project.extra]