Mayo commited on
docs: more content
Browse files- .github/CONTRIBUTING.md +26 -5
- README.md +231 -209
- docs/explanation/how-koharu-works.md +67 -10
- docs/explanation/index.md +1 -0
- docs/explanation/models-and-providers.md +33 -0
- docs/explanation/technical-deep-dive.md +235 -0
- docs/how-to/build-from-source.md +81 -3
- docs/how-to/configure-mcp-clients.md +255 -0
- docs/how-to/contributing.md +174 -0
- docs/how-to/export-and-manage-projects.md +84 -7
- docs/how-to/index.md +8 -4
- docs/how-to/install-koharu.md +37 -4
- docs/how-to/run-gui-headless-and-mcp.md +78 -6
- docs/how-to/troubleshooting.md +276 -0
- docs/how-to/use-openai-compatible-api.md +139 -0
- docs/reference/cli.md +43 -3
- docs/reference/http-api.md +196 -0
- docs/reference/index.md +5 -2
- docs/reference/mcp-tools.md +140 -0
- docs/reference/settings.md +151 -0
- docs/tutorials/translate-your-first-page.md +64 -19
- zensical.toml +26 -18
.github/CONTRIBUTING.md
CHANGED
|
@@ -1,13 +1,34 @@
|
|
| 1 |
# Contributing
|
| 2 |
|
| 3 |
-
|
| 4 |
|
| 5 |
-
|
| 6 |
-
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| 7 |
|
| 8 |
## AI-Generated PRs
|
| 9 |
|
| 10 |
AI-generated contributions are welcome, provided:
|
| 11 |
|
| 12 |
-
1. A human has reviewed the code before opening the PR
|
| 13 |
-
2. The submitter understands the changes being made
|
|
|
|
| 1 |
# Contributing
|
| 2 |
|
| 3 |
+
Thanks for contributing to Koharu.
|
| 4 |
|
| 5 |
+
For the full contributor guide, including local setup, validation commands, and docs workflow, see:
|
| 6 |
+
|
| 7 |
+
- [`docs/how-to/contributing.md`](../docs/how-to/contributing.md)
|
| 8 |
+
|
| 9 |
+
In short, contributors should:
|
| 10 |
+
|
| 11 |
+
- follow existing code and UI patterns
|
| 12 |
+
- run the checks that match the area they changed
|
| 13 |
+
- explain what changed and how they verified it in the PR
|
| 14 |
+
|
| 15 |
+
Useful local commands:
|
| 16 |
+
|
| 17 |
+
```bash
|
| 18 |
+
bun install
|
| 19 |
+
bun run build
|
| 20 |
+
bun cargo fmt -- --check
|
| 21 |
+
bun cargo check
|
| 22 |
+
bun cargo clippy -- -D warnings
|
| 23 |
+
bun cargo test --workspace --tests
|
| 24 |
+
bun run format
|
| 25 |
+
bun run test:e2e
|
| 26 |
+
zensical build -c
|
| 27 |
+
```
|
| 28 |
|
| 29 |
## AI-Generated PRs
|
| 30 |
|
| 31 |
AI-generated contributions are welcome, provided:
|
| 32 |
|
| 33 |
+
1. A human has reviewed the code before opening the PR.
|
| 34 |
+
2. The submitter understands the changes being made.
|
README.md
CHANGED
|
@@ -1,209 +1,231 @@
|
|
| 1 |
-
# Koharu
|
| 2 |
-
|
| 3 |
-
[Documentation](https://koharu.rs)
|
| 4 |
-
|
| 5 |
-
ML-powered manga translator, written in **Rust**.
|
| 6 |
-
|
| 7 |
-
Koharu introduces a new workflow for manga translation, utilizing the power of ML to automate the process. It combines the capabilities of object detection, OCR, inpainting, and LLMs to create a seamless translation experience.
|
| 8 |
-
|
| 9 |
-
Under the hood, Koharu uses [candle](https://github.com/huggingface/candle) and [llama.cpp](https://github.com/ggml-org/llama.cpp) for high-performance inference, and uses [Tauri](https://github.com/tauri-apps/tauri) for the GUI. All components are written in Rust, ensuring safety and speed.
|
| 10 |
-
|
| 11 |
-
> [!NOTE]
|
| 12 |
-
> Koharu runs its vision models and local LLMs **locally** on your machine by default. If you choose a remote LLM provider, Koharu sends translation text only to the provider you configured. Koharu itself does not collect user data.
|
| 13 |
-
|
| 14 |
-
---
|
| 15 |
-
|
| 16 |
-

|
| 17 |
-
|
| 18 |
-
> [!NOTE]
|
| 19 |
-
> For help and support,
|
| 20 |
-
|
| 21 |
-
## Features
|
| 22 |
-
|
| 23 |
-
- Automatic speech bubble detection and segmentation
|
| 24 |
-
- OCR for manga text recognition
|
| 25 |
-
- Inpainting to remove original text from images
|
| 26 |
-
- LLM-powered translation
|
| 27 |
-
- Vertical text layout for CJK languages
|
| 28 |
-
- Export to layered PSD with editable text
|
| 29 |
-
- MCP server for
|
| 30 |
-
|
| 31 |
-
|
| 32 |
-
|
| 33 |
-
##
|
| 34 |
-
|
| 35 |
-
|
| 36 |
-
|
| 37 |
-
- <kbd>
|
| 38 |
-
|
| 39 |
-
|
| 40 |
-
|
| 41 |
-
|
| 42 |
-
|
| 43 |
-
|
| 44 |
-
|
| 45 |
-
|
| 46 |
-
|
| 47 |
-
|
| 48 |
-
|
| 49 |
-
|
| 50 |
-
|
| 51 |
-
|
| 52 |
-
|
| 53 |
-
|
| 54 |
-
|
| 55 |
-
|
| 56 |
-
|
| 57 |
-
|
| 58 |
-
|
| 59 |
-
|
| 60 |
-
|
| 61 |
-
|
| 62 |
-
|
| 63 |
-
|
| 64 |
-
|
| 65 |
-
|
| 66 |
-
|
| 67 |
-
|
| 68 |
-
|
| 69 |
-
#
|
| 70 |
-
|
| 71 |
-
|
| 72 |
-
|
| 73 |
-
|
| 74 |
-
|
| 75 |
-
|
| 76 |
-
|
| 77 |
-
|
| 78 |
-
|
| 79 |
-
|
| 80 |
-
|
| 81 |
-
|
| 82 |
-
|
| 83 |
-
|
| 84 |
-
|
| 85 |
-
|
| 86 |
-
|
| 87 |
-
|
| 88 |
-
|
| 89 |
-
|
| 90 |
-
|
| 91 |
-
|
| 92 |
-
|
| 93 |
-
|
| 94 |
-
|
| 95 |
-
|
| 96 |
-
|
| 97 |
-
|
| 98 |
-
|
| 99 |
-
|
| 100 |
-
|
| 101 |
-
|
| 102 |
-
|
| 103 |
-
|
| 104 |
-
|
| 105 |
-
|
| 106 |
-
|
| 107 |
-
|
| 108 |
-
|
| 109 |
-
|
| 110 |
-
|
| 111 |
-
|
| 112 |
-
|
| 113 |
-
#
|
| 114 |
-
|
| 115 |
-
|
| 116 |
-
|
| 117 |
-
|
| 118 |
-
|
| 119 |
-
|
| 120 |
-
|
| 121 |
-
|
| 122 |
-
|
| 123 |
-
|
| 124 |
-
|
| 125 |
-
|
| 126 |
-
|
| 127 |
-
|
| 128 |
-
|
| 129 |
-
|
| 130 |
-
|
| 131 |
-
|
| 132 |
-
|
| 133 |
-
|
| 134 |
-
|
| 135 |
-
|
| 136 |
-
|
| 137 |
-
|
| 138 |
-
|
| 139 |
-
|
| 140 |
-
|
| 141 |
-
|
| 142 |
-
|
| 143 |
-
|
| 144 |
-
|
| 145 |
-
|
| 146 |
-
|
| 147 |
-
|
| 148 |
-
|
| 149 |
-
|
| 150 |
-
|
| 151 |
-
|
| 152 |
-
|
| 153 |
-
|
| 154 |
-
|
| 155 |
-
-
|
| 156 |
-
|
| 157 |
-
|
| 158 |
-
|
| 159 |
-
-
|
| 160 |
-
|
| 161 |
-
|
| 162 |
-
|
| 163 |
-
|
| 164 |
-
|
| 165 |
-
|
| 166 |
-
|
| 167 |
-
|
| 168 |
-
|
| 169 |
-
|
| 170 |
-
|
| 171 |
-
|
| 172 |
-
|
| 173 |
-
|
| 174 |
-
|
| 175 |
-
|
| 176 |
-
|
| 177 |
-
- [
|
| 178 |
-
|
| 179 |
-
|
| 180 |
-
|
| 181 |
-
|
| 182 |
-
|
| 183 |
-
|
| 184 |
-
|
| 185 |
-
|
| 186 |
-
|
| 187 |
-
|
| 188 |
-
|
| 189 |
-
|
| 190 |
-
|
| 191 |
-
|
| 192 |
-
|
| 193 |
-
|
| 194 |
-
##
|
| 195 |
-
|
| 196 |
-
|
| 197 |
-
|
| 198 |
-
|
| 199 |
-
|
| 200 |
-
|
| 201 |
-
|
| 202 |
-
|
| 203 |
-
|
| 204 |
-
|
| 205 |
-
|
| 206 |
-
|
| 207 |
-
|
| 208 |
-
|
| 209 |
-
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| 1 |
+
# Koharu
|
| 2 |
+
|
| 3 |
+
[Documentation](https://koharu.rs)
|
| 4 |
+
|
| 5 |
+
ML-powered manga translator, written in **Rust**.
|
| 6 |
+
|
| 7 |
+
Koharu introduces a new workflow for manga translation, utilizing the power of ML to automate the process. It combines the capabilities of object detection, OCR, inpainting, and LLMs to create a seamless translation experience.
|
| 8 |
+
|
| 9 |
+
Under the hood, Koharu uses [candle](https://github.com/huggingface/candle) and [llama.cpp](https://github.com/ggml-org/llama.cpp) for high-performance inference, and uses [Tauri](https://github.com/tauri-apps/tauri) for the GUI. All components are written in Rust, ensuring safety and speed.
|
| 10 |
+
|
| 11 |
+
> [!NOTE]
|
| 12 |
+
> Koharu runs its vision models and local LLMs **locally** on your machine by default. If you choose a remote LLM provider, Koharu sends translation text only to the provider you configured. Koharu itself does not collect user data.
|
| 13 |
+
|
| 14 |
+
---
|
| 15 |
+
|
| 16 |
+

|
| 17 |
+
|
| 18 |
+
> [!NOTE]
|
| 19 |
+
> For help and support, join the [Discord server](https://discord.gg/mHvHkxGnUY).
|
| 20 |
+
|
| 21 |
+
## Features
|
| 22 |
+
|
| 23 |
+
- Automatic speech bubble detection and segmentation
|
| 24 |
+
- OCR for manga text recognition
|
| 25 |
+
- Inpainting to remove original text from images
|
| 26 |
+
- LLM-powered translation
|
| 27 |
+
- Vertical text layout for CJK languages
|
| 28 |
+
- Export to layered PSD with editable text
|
| 29 |
+
- Local HTTP API and MCP server for automation
|
| 30 |
+
|
| 31 |
+
If you just want to get started, see [Install Koharu](https://koharu.rs/how-to/install-koharu/) and [Translate Your First Page](https://koharu.rs/tutorials/translate-your-first-page/).
|
| 32 |
+
|
| 33 |
+
## Usage
|
| 34 |
+
|
| 35 |
+
### Hot keys
|
| 36 |
+
|
| 37 |
+
- <kbd>Ctrl</kbd> + Mouse Wheel: Zoom in/out
|
| 38 |
+
- <kbd>Ctrl</kbd> + Drag: Pan the canvas
|
| 39 |
+
- <kbd>Del</kbd>: Delete selected text block
|
| 40 |
+
|
| 41 |
+
### Export
|
| 42 |
+
|
| 43 |
+
Koharu can export the current page as a rendered image or as a layered Photoshop PSD. PSD export preserves helper layers and writes translated text as editable text layers, which makes manual cleanup much easier when the automatic pass gets most of the way there.
|
| 44 |
+
|
| 45 |
+
For export behavior, PSD contents, and file naming, see [Export Pages and Manage Projects](https://koharu.rs/how-to/export-and-manage-projects/).
|
| 46 |
+
|
| 47 |
+
### MCP Server
|
| 48 |
+
|
| 49 |
+
Koharu has a built-in MCP server for AI agents. By default it listens on a random port, but you can pin it with the `--port` flag.
|
| 50 |
+
|
| 51 |
+
```bash
|
| 52 |
+
# macOS / Linux
|
| 53 |
+
koharu --port 9999
|
| 54 |
+
# Windows
|
| 55 |
+
koharu.exe --port 9999
|
| 56 |
+
```
|
| 57 |
+
|
| 58 |
+
Then point your client at `http://localhost:9999/mcp`.
|
| 59 |
+
|
| 60 |
+
For local setup and the available tools, see [Run GUI, Headless, and MCP Modes](https://koharu.rs/how-to/run-gui-headless-and-mcp/), [Configure MCP Clients](https://koharu.rs/how-to/configure-mcp-clients/), and [MCP Tools Reference](https://koharu.rs/reference/mcp-tools/).
|
| 61 |
+
|
| 62 |
+
### Headless Mode
|
| 63 |
+
|
| 64 |
+
Koharu can also run without the desktop window.
|
| 65 |
+
|
| 66 |
+
```bash
|
| 67 |
+
# macOS / Linux
|
| 68 |
+
koharu --port 4000 --headless
|
| 69 |
+
# Windows
|
| 70 |
+
koharu.exe --port 4000 --headless
|
| 71 |
+
```
|
| 72 |
+
|
| 73 |
+
You can then open the web UI at `http://localhost:4000`.
|
| 74 |
+
|
| 75 |
+
For runtime modes, ports, and local endpoints, see [Run GUI, Headless, and MCP Modes](https://koharu.rs/how-to/run-gui-headless-and-mcp/).
|
| 76 |
+
|
| 77 |
+
## GPU acceleration
|
| 78 |
+
|
| 79 |
+
Koharu supports CUDA, Metal, and Vulkan for acceleration. CPU fallback is always available if the accelerated path is unavailable or not worth the trouble on your system.
|
| 80 |
+
|
| 81 |
+
### CUDA (NVIDIA GPUs on Windows)
|
| 82 |
+
|
| 83 |
+
Koharu is built with CUDA support on Windows so it can use NVIDIA GPUs for the full local pipeline.
|
| 84 |
+
|
| 85 |
+
Koharu bundles CUDA Toolkit 13.1. The required DLLs are extracted to the application data directory on first run.
|
| 86 |
+
|
| 87 |
+
> [!NOTE]
|
| 88 |
+
> Make sure you have current NVIDIA drivers installed. You can update them through [NVIDIA App](https://www.nvidia.com/en-us/software/nvidia-app/).
|
| 89 |
+
|
| 90 |
+
#### Supported NVIDIA GPUs
|
| 91 |
+
|
| 92 |
+
Koharu supports NVIDIA GPUs with compute capability 7.5 or higher.
|
| 93 |
+
|
| 94 |
+
If you want to confirm GPU support, see [CUDA GPU Compute Capability](https://developer.nvidia.com/cuda-gpus) and the [cuDNN Support Matrix](https://docs.nvidia.com/deeplearning/cudnn/backend/latest/reference/support-matrix.html).
|
| 95 |
+
|
| 96 |
+
### Metal (Apple Silicon on macOS)
|
| 97 |
+
|
| 98 |
+
Koharu supports Metal on Apple Silicon Macs. That gives you local acceleration without any extra setup beyond the normal app install.
|
| 99 |
+
|
| 100 |
+
### Vulkan (Windows and Linux)
|
| 101 |
+
|
| 102 |
+
Koharu also supports Vulkan on Windows and Linux. This path is mainly used for OCR and local LLM inference.
|
| 103 |
+
|
| 104 |
+
Detection and inpainting still depend on CUDA or Metal, so Vulkan is helpful but not a full replacement for the main accelerated path. AMD and Intel GPUs can still benefit from it, but the best all-around experience is still NVIDIA on Windows or Apple Silicon on macOS.
|
| 105 |
+
|
| 106 |
+
### CPU fallback
|
| 107 |
+
|
| 108 |
+
You can always force Koharu to use CPU for inference:
|
| 109 |
+
|
| 110 |
+
```bash
|
| 111 |
+
# macOS / Linux
|
| 112 |
+
koharu --cpu
|
| 113 |
+
# Windows
|
| 114 |
+
koharu.exe --cpu
|
| 115 |
+
```
|
| 116 |
+
|
| 117 |
+
For backend selection, fallback behavior, and model runtime support, see [Acceleration and Runtime](https://koharu.rs/explanation/acceleration-and-runtime/).
|
| 118 |
+
|
| 119 |
+
## ML Models
|
| 120 |
+
|
| 121 |
+
Koharu uses a mix of computer vision and language models rather than trying to solve the whole page with one model.
|
| 122 |
+
|
| 123 |
+
### Computer Vision Models
|
| 124 |
+
|
| 125 |
+
Koharu uses several pre-trained models for different parts of the pipeline:
|
| 126 |
+
|
| 127 |
+
- [PP-DocLayoutV3](https://huggingface.co/PaddlePaddle/PP-DocLayoutV3_safetensors) for text detection and layout analysis
|
| 128 |
+
- [comic-text-detector](https://huggingface.co/mayocream/comic-text-detector) for text segmentation
|
| 129 |
+
- [PaddleOCR-VL-1.5](https://huggingface.co/PaddlePaddle/PaddleOCR-VL-1.5) for OCR text recognition
|
| 130 |
+
- [lama-manga](https://huggingface.co/mayocream/lama-manga) for inpainting
|
| 131 |
+
- [YuzuMarker.FontDetection](https://huggingface.co/fffonion/yuzumarker-font-detection) for font and color detection
|
| 132 |
+
|
| 133 |
+
The models are downloaded automatically when you run Koharu for the first time.
|
| 134 |
+
|
| 135 |
+
We convert the upstream weights to safetensors format for better compatibility and runtime behavior in Rust. The converted weights are hosted on [Hugging Face](https://huggingface.co/mayocream).
|
| 136 |
+
|
| 137 |
+
For a closer look at the pipeline, see [Models and Providers](https://koharu.rs/explanation/models-and-providers/) and the [Technical Deep Dive](https://koharu.rs/explanation/technical-deep-dive/).
|
| 138 |
+
|
| 139 |
+
### Large Language Models
|
| 140 |
+
|
| 141 |
+
Koharu supports both local and remote LLM backends, and it tries to preselect a sensible model based on your system locale when possible.
|
| 142 |
+
|
| 143 |
+
#### Local LLMs
|
| 144 |
+
|
| 145 |
+
Koharu supports quantized GGUF models through [llama.cpp](https://github.com/ggml-org/llama.cpp). These models run on your machine and are downloaded on demand when you select them in Settings. Supported models and suggested usage:
|
| 146 |
+
|
| 147 |
+
For translating to English:
|
| 148 |
+
|
| 149 |
+
- [vntl-llama3-8b-v2](https://huggingface.co/lmg-anon/vntl-llama3-8b-v2-gguf): around 8.5 GB in Q8_0, best when translation quality matters more than speed or memory use
|
| 150 |
+
- [lfm2-350m-enjp-mt](https://huggingface.co/LiquidAI/LFM2-350M-ENJP-MT-GGUF): very small and easy to run on CPUs or low-memory GPUs, good for quick previews and low-spec machines
|
| 151 |
+
|
| 152 |
+
For translating to Chinese:
|
| 153 |
+
|
| 154 |
+
- [sakura-galtransl-7b-v3.7](https://huggingface.co/SakuraLLM/Sakura-GalTransl-7B-v3.7): around 6.3 GB, a good balance of quality and speed on 8 GB GPUs
|
| 155 |
+
- [sakura-1.5b-qwen2.5-v1.0](https://huggingface.co/shing3232/Sakura-1.5B-Qwen2.5-v1.0-GGUF-IMX): lighter and faster, useful on mid-range GPUs or CPU-only setups
|
| 156 |
+
|
| 157 |
+
For other languages, you can use:
|
| 158 |
+
|
| 159 |
+
- [hunyuan-7b-mt-v1.0](https://huggingface.co/Mungert/Hunyuan-MT-7B-GGUF): around 6.3 GB, with decent multilingual translation quality
|
| 160 |
+
|
| 161 |
+
LLMs are downloaded on demand when you pick a model in Settings. If you are memory-bound, start small. If you have enough VRAM or RAM, the 7B and 8B models usually produce better translations.
|
| 162 |
+
|
| 163 |
+
#### Remote LLMs
|
| 164 |
+
|
| 165 |
+
Koharu can also translate through remote or self-hosted API providers instead of a downloaded local model. Supported remote providers:
|
| 166 |
+
|
| 167 |
+
- OpenAI
|
| 168 |
+
- Gemini
|
| 169 |
+
- Claude
|
| 170 |
+
- DeepSeek
|
| 171 |
+
- OpenAI Compatible, including LM Studio, OpenRouter, or any endpoint that exposes the OpenAI-style `/v1/models` and `/v1/chat/completions` APIs
|
| 172 |
+
|
| 173 |
+
Remote providers are configured in **Settings > API Keys**. OpenAI-compatible providers also need a custom base URL. API keys are optional for local servers such as LM Studio, but usually required for hosted services such as OpenRouter.
|
| 174 |
+
|
| 175 |
+
Use a remote provider if you do not want to download local models, if you want to keep VRAM and RAM usage down, or if you already have a hosted model endpoint. Keep in mind that the OCR text selected for translation is sent to the provider you configured.
|
| 176 |
+
|
| 177 |
+
For LM Studio, OpenRouter, and other OpenAI-style endpoints, see [Use OpenAI-Compatible APIs](https://koharu.rs/how-to/use-openai-compatible-api/). For provider configuration, see [Settings Reference](https://koharu.rs/reference/settings/).
|
| 178 |
+
|
| 179 |
+
## Installation
|
| 180 |
+
|
| 181 |
+
You can download the latest release of Koharu from the [releases page](https://github.com/mayocream/koharu/releases/latest).
|
| 182 |
+
|
| 183 |
+
We provide pre-built binaries for Windows, macOS, and Linux. For the normal install flow, see [Install Koharu](https://koharu.rs/how-to/install-koharu/). If something goes wrong, see [Troubleshooting](https://koharu.rs/how-to/troubleshooting/).
|
| 184 |
+
|
| 185 |
+
## Development
|
| 186 |
+
|
| 187 |
+
To build Koharu from source, follow the steps below.
|
| 188 |
+
|
| 189 |
+
### Prerequisites
|
| 190 |
+
|
| 191 |
+
- [Rust](https://www.rust-lang.org/tools/install) 1.92 or later
|
| 192 |
+
- [Bun](https://bun.sh/) 1.0 or later
|
| 193 |
+
|
| 194 |
+
### Install dependencies
|
| 195 |
+
|
| 196 |
+
```bash
|
| 197 |
+
bun install
|
| 198 |
+
```
|
| 199 |
+
|
| 200 |
+
### Build
|
| 201 |
+
|
| 202 |
+
```bash
|
| 203 |
+
bun run build
|
| 204 |
+
```
|
| 205 |
+
|
| 206 |
+
If you want more direct control over the Tauri build:
|
| 207 |
+
|
| 208 |
+
```bash
|
| 209 |
+
bun tauri build --release --no-bundle
|
| 210 |
+
```
|
| 211 |
+
|
| 212 |
+
The built binaries will be located in `target/release`.
|
| 213 |
+
|
| 214 |
+
For platform-specific build notes, see [Build From Source](https://koharu.rs/how-to/build-from-source/). For the local development workflow, see [Contributing](https://koharu.rs/how-to/contributing/).
|
| 215 |
+
|
| 216 |
+
## Sponsorship
|
| 217 |
+
|
| 218 |
+
If you find Koharu useful, consider sponsoring the project to support its development.
|
| 219 |
+
|
| 220 |
+
- [GitHub Sponsors](https://github.com/sponsors/mayocream)
|
| 221 |
+
- [Patreon](https://www.patreon.com/mayocream)
|
| 222 |
+
|
| 223 |
+
## Contributors
|
| 224 |
+
|
| 225 |
+
<a href="https://github.com/mayocream/koharu/graphs/contributors">
|
| 226 |
+
<img src="https://contrib.rocks/image?repo=mayocream/koharu" />
|
| 227 |
+
</a>
|
| 228 |
+
|
| 229 |
+
## License
|
| 230 |
+
|
| 231 |
+
Koharu is licensed under the [GNU General Public License v3.0](LICENSE).
|
docs/explanation/how-koharu-works.md
CHANGED
|
@@ -4,20 +4,73 @@ title: How Koharu Works
|
|
| 4 |
|
| 5 |
# How Koharu Works
|
| 6 |
|
| 7 |
-
Koharu is built around a
|
| 8 |
|
| 9 |
-
## The
|
| 10 |
|
| 11 |
-
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| 12 |
|
| 13 |
-
|
| 14 |
-
2. Text region segmentation
|
| 15 |
-
3. OCR text recognition
|
| 16 |
-
4. Inpainting to remove original text
|
| 17 |
-
5. LLM-based translation
|
| 18 |
-
6. Text rendering and export
|
| 19 |
|
| 20 |
-
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| 21 |
|
| 22 |
## Why the stack matters
|
| 23 |
|
|
@@ -36,3 +89,7 @@ By default, Koharu runs:
|
|
| 36 |
- local LLMs locally
|
| 37 |
|
| 38 |
If you configure a remote LLM provider, Koharu sends only the text selected for translation to that provider.
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| 4 |
|
| 5 |
# How Koharu Works
|
| 6 |
|
| 7 |
+
Koharu is built around a page pipeline for manga translation. The user-facing workflow is simple, but the implementation intentionally separates layout, segmentation, OCR, inpainting, translation, and rendering into different stages.
|
| 8 |
|
| 9 |
+
## The pipeline at a glance
|
| 10 |
|
| 11 |
+
```mermaid
|
| 12 |
+
flowchart LR
|
| 13 |
+
A[Input manga page] --> B[Detect stage]
|
| 14 |
+
B --> B1[Layout analysis]
|
| 15 |
+
B --> B2[Segmentation mask]
|
| 16 |
+
B --> B3[Font hints]
|
| 17 |
+
B1 --> C[OCR stage]
|
| 18 |
+
B2 --> D[Inpaint stage]
|
| 19 |
+
C --> E[LLM translation stage]
|
| 20 |
+
D --> F[Render stage]
|
| 21 |
+
E --> F
|
| 22 |
+
F --> G[Localized page or PSD export]
|
| 23 |
+
```
|
| 24 |
|
| 25 |
+
At the public pipeline level, Koharu runs:
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| 26 |
|
| 27 |
+
1. `Detect`
|
| 28 |
+
2. `OCR`
|
| 29 |
+
3. `Inpaint`
|
| 30 |
+
4. `LLM Generate`
|
| 31 |
+
5. `Render`
|
| 32 |
+
|
| 33 |
+
The important implementation detail is that `Detect` is already a multi-model stage:
|
| 34 |
+
|
| 35 |
+
- `PP-DocLayoutV3` finds text-like layout regions and reading order.
|
| 36 |
+
- `comic-text-detector` produces a per-pixel text probability map.
|
| 37 |
+
- `YuzuMarker.FontDetection` estimates font and color hints for later rendering.
|
| 38 |
+
|
| 39 |
+
That split is why Koharu can use one model to decide where text belongs on the page and another to decide which exact pixels should be removed.
|
| 40 |
+
|
| 41 |
+
## What each stage produces
|
| 42 |
+
|
| 43 |
+
| Stage | Main models | Main output |
|
| 44 |
+
| --- | --- | --- |
|
| 45 |
+
| Detect | `PP-DocLayoutV3`, `comic-text-detector`, `YuzuMarker.FontDetection` | text blocks, segmentation mask, font hints |
|
| 46 |
+
| OCR | `PaddleOCR-VL-1.5` | recognized source text for each block |
|
| 47 |
+
| Inpaint | `lama-manga` | page with original text removed |
|
| 48 |
+
| LLM Generate | local GGUF LLM or remote provider | translated text |
|
| 49 |
+
| Render | Koharu renderer | final localized page or export |
|
| 50 |
+
|
| 51 |
+
## Why the stages are separate
|
| 52 |
+
|
| 53 |
+
Manga pages are harder than plain document OCR:
|
| 54 |
+
|
| 55 |
+
- speech bubbles are irregular and often curved
|
| 56 |
+
- Japanese text may be vertical while captions or SFX may be horizontal
|
| 57 |
+
- text can overlap artwork, screentones, speed lines, and panel borders
|
| 58 |
+
- reading order is part of the page structure, not just the raw pixels
|
| 59 |
+
|
| 60 |
+
Because of that, one model is usually not enough. Koharu first estimates layout, then runs OCR on cropped regions, then uses a segmentation mask for cleanup, and only then asks an LLM to translate the text.
|
| 61 |
+
|
| 62 |
+
## The implementation shape
|
| 63 |
+
|
| 64 |
+
In source terms, the pipeline entrypoint runs in `koharu-pipeline/src/pipeline.rs`, while the vision stack is coordinated in `koharu-ml/src/facade.rs`.
|
| 65 |
+
|
| 66 |
+
Some implementation details that matter:
|
| 67 |
+
|
| 68 |
+
- the detect stage uses `PP-DocLayoutV3` first and converts text-like layout labels into `TextBlock` objects
|
| 69 |
+
- overlapping boxes are deduplicated before OCR
|
| 70 |
+
- text direction is inferred from region aspect ratio so vertical manga text can be handled earlier
|
| 71 |
+
- OCR runs on cropped text regions, not on the full page
|
| 72 |
+
- inpainting consumes the current segmentation mask, not just a rectangular box
|
| 73 |
+
- when you choose a remote LLM provider, Koharu sends OCR text for translation, not the full page image
|
| 74 |
|
| 75 |
## Why the stack matters
|
| 76 |
|
|
|
|
| 89 |
- local LLMs locally
|
| 90 |
|
| 91 |
If you configure a remote LLM provider, Koharu sends only the text selected for translation to that provider.
|
| 92 |
+
|
| 93 |
+
## Want the deep technical version?
|
| 94 |
+
|
| 95 |
+
See [Technical Deep Dive](technical-deep-dive.md) for model types, segmentation mask theory, FFT-based inpainting, and background references to Wikipedia diagrams plus official model cards.
|
docs/explanation/index.md
CHANGED
|
@@ -9,5 +9,6 @@ Explanation pages describe how Koharu is put together and why it behaves the way
|
|
| 9 |
## Topics
|
| 10 |
|
| 11 |
- [How Koharu Works](how-koharu-works.md)
|
|
|
|
| 12 |
- [Acceleration and Runtime](acceleration-and-runtime.md)
|
| 13 |
- [Models and Providers](models-and-providers.md)
|
|
|
|
| 9 |
## Topics
|
| 10 |
|
| 11 |
- [How Koharu Works](how-koharu-works.md)
|
| 12 |
+
- [Technical Deep Dive](technical-deep-dive.md)
|
| 13 |
- [Acceleration and Runtime](acceleration-and-runtime.md)
|
| 14 |
- [Models and Providers](models-and-providers.md)
|
docs/explanation/models-and-providers.md
CHANGED
|
@@ -6,6 +6,8 @@ title: Models and Providers
|
|
| 6 |
|
| 7 |
Koharu uses both vision models and language models. The vision stack prepares the page; the language stack handles translation.
|
| 8 |
|
|
|
|
|
|
|
| 9 |
## Vision models
|
| 10 |
|
| 11 |
Koharu automatically downloads the required vision models when you use them for the first time.
|
|
@@ -20,10 +22,29 @@ The default stack includes:
|
|
| 20 |
|
| 21 |
Converted model weights are hosted on [Hugging Face](https://huggingface.co/mayocream) in safetensors format for Rust compatibility and performance.
|
| 22 |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| 23 |
## Local LLMs
|
| 24 |
|
| 25 |
Koharu supports local GGUF models through [llama.cpp](https://github.com/ggml-org/llama.cpp). These models run on your machine and are downloaded on demand when you select them in the LLM picker.
|
| 26 |
|
|
|
|
|
|
|
| 27 |
### Suggested local models for English output
|
| 28 |
|
| 29 |
- [vntl-llama3-8b-v2](https://huggingface.co/lmg-anon/vntl-llama3-8b-v2-gguf): around 8.5 GB in Q8_0 form, best when translation quality matters most
|
|
@@ -52,6 +73,8 @@ Supported providers include:
|
|
| 52 |
|
| 53 |
Remote providers are configured in **Settings > API Keys**.
|
| 54 |
|
|
|
|
|
|
|
| 55 |
## Choosing between local and remote
|
| 56 |
|
| 57 |
Use local models when you want:
|
|
@@ -69,3 +92,13 @@ Use remote providers when you want:
|
|
| 69 |
!!! note
|
| 70 |
|
| 71 |
When you use a remote provider, Koharu sends OCR text selected for translation to the provider you configured.
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| 6 |
|
| 7 |
Koharu uses both vision models and language models. The vision stack prepares the page; the language stack handles translation.
|
| 8 |
|
| 9 |
+
If you want the architecture-level explanation of how these pieces fit together, read [Technical Deep Dive](technical-deep-dive.md) after this page.
|
| 10 |
+
|
| 11 |
## Vision models
|
| 12 |
|
| 13 |
Koharu automatically downloads the required vision models when you use them for the first time.
|
|
|
|
| 22 |
|
| 23 |
Converted model weights are hosted on [Hugging Face](https://huggingface.co/mayocream) in safetensors format for Rust compatibility and performance.
|
| 24 |
|
| 25 |
+
### What each vision model is
|
| 26 |
+
|
| 27 |
+
| Model | Model type | Why Koharu uses it |
|
| 28 |
+
| --- | --- | --- |
|
| 29 |
+
| `PP-DocLayoutV3` | layout detector | finds text-like regions and reading order |
|
| 30 |
+
| `comic-text-detector` | segmentation network | produces a text mask for cleanup |
|
| 31 |
+
| `PaddleOCR-VL-1.5` | vision-language model | reads cropped text into text tokens |
|
| 32 |
+
| `lama-manga` | inpainting network | reconstructs the image after text removal |
|
| 33 |
+
| `YuzuMarker.FontDetection` | classifier / regressor | estimates font and style hints for rendering |
|
| 34 |
+
|
| 35 |
+
The important design choice is that Koharu does not use a single model for every page task. Layout, segmentation, OCR, and inpainting all need different output shapes:
|
| 36 |
+
|
| 37 |
+
- layout wants regions and order
|
| 38 |
+
- segmentation wants per-pixel masks
|
| 39 |
+
- OCR wants text
|
| 40 |
+
- inpainting wants restored pixels
|
| 41 |
+
|
| 42 |
## Local LLMs
|
| 43 |
|
| 44 |
Koharu supports local GGUF models through [llama.cpp](https://github.com/ggml-org/llama.cpp). These models run on your machine and are downloaded on demand when you select them in the LLM picker.
|
| 45 |
|
| 46 |
+
In practice, the local models are usually quantized decoder-only transformers. GGUF is the file format; `llama.cpp` is the inference runtime.
|
| 47 |
+
|
| 48 |
### Suggested local models for English output
|
| 49 |
|
| 50 |
- [vntl-llama3-8b-v2](https://huggingface.co/lmg-anon/vntl-llama3-8b-v2-gguf): around 8.5 GB in Q8_0 form, best when translation quality matters most
|
|
|
|
| 73 |
|
| 74 |
Remote providers are configured in **Settings > API Keys**.
|
| 75 |
|
| 76 |
+
For a step-by-step setup guide for LM Studio, OpenRouter, and similar endpoints, see [Use OpenAI-Compatible APIs](../how-to/use-openai-compatible-api.md).
|
| 77 |
+
|
| 78 |
## Choosing between local and remote
|
| 79 |
|
| 80 |
Use local models when you want:
|
|
|
|
| 92 |
!!! note
|
| 93 |
|
| 94 |
When you use a remote provider, Koharu sends OCR text selected for translation to the provider you configured.
|
| 95 |
+
|
| 96 |
+
## Background reading
|
| 97 |
+
|
| 98 |
+
For theory and diagrams behind the model categories on this page, see:
|
| 99 |
+
|
| 100 |
+
- [Technical Deep Dive](technical-deep-dive.md)
|
| 101 |
+
- [Fourier transform on Wikipedia](https://en.wikipedia.org/wiki/Fourier_transform)
|
| 102 |
+
- [Image segmentation on Wikipedia](https://en.wikipedia.org/wiki/Image_segmentation)
|
| 103 |
+
- [OCR on Wikipedia](https://en.wikipedia.org/wiki/Optical_character_recognition)
|
| 104 |
+
- [Transformer architecture on Wikipedia](https://en.wikipedia.org/wiki/Transformer_(deep_learning_architecture))
|
docs/explanation/technical-deep-dive.md
ADDED
|
@@ -0,0 +1,235 @@
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| 1 |
+
---
|
| 2 |
+
title: Technical Deep Dive
|
| 3 |
+
---
|
| 4 |
+
|
| 5 |
+
# Technical Deep Dive
|
| 6 |
+
|
| 7 |
+
This page explains the technical side of Koharu's manga pipeline: what each model does, how the stages fit together, and why layout analysis, segmentation masks, OCR, inpainting, and translation are handled separately.
|
| 8 |
+
|
| 9 |
+
## The page pipeline in implementation terms
|
| 10 |
+
|
| 11 |
+
```mermaid
|
| 12 |
+
flowchart TD
|
| 13 |
+
A[Input page] --> B[PP-DocLayoutV3]
|
| 14 |
+
A --> C[comic-text-detector]
|
| 15 |
+
B --> D[Text blocks]
|
| 16 |
+
C --> E[Segmentation mask]
|
| 17 |
+
A --> F[YuzuMarker font detector]
|
| 18 |
+
D --> G[PaddleOCR-VL crop OCR]
|
| 19 |
+
E --> H[LaMa inpainting]
|
| 20 |
+
G --> I[Local or remote LLM]
|
| 21 |
+
F --> J[Renderer style hints]
|
| 22 |
+
H --> K[Renderer]
|
| 23 |
+
I --> K
|
| 24 |
+
J --> K
|
| 25 |
+
K --> L[Rendered page / PSD]
|
| 26 |
+
```
|
| 27 |
+
|
| 28 |
+
At the code level, the public pipeline steps are `Detect -> OCR -> Inpaint -> LLM Generate -> Render`, but the detect stage is already doing three distinct jobs:
|
| 29 |
+
|
| 30 |
+
- page layout analysis
|
| 31 |
+
- text foreground segmentation
|
| 32 |
+
- font and color estimation
|
| 33 |
+
|
| 34 |
+
That design is deliberate. A manga translation tool needs both page structure and pixel precision.
|
| 35 |
+
|
| 36 |
+
## Model types at a glance
|
| 37 |
+
|
| 38 |
+
| Component | Default model | Model type | Main job in Koharu |
|
| 39 |
+
| --- | --- | --- | --- |
|
| 40 |
+
| Layout analysis | [PP-DocLayoutV3](https://huggingface.co/PaddlePaddle/PP-DocLayoutV3_safetensors) | document layout detector | find text-like regions, labels, confidence, and reading order |
|
| 41 |
+
| Segmentation | [comic-text-detector](https://github.com/dmMaze/comic-text-detector) | text segmentation network | produce a dense text mask for cleanup |
|
| 42 |
+
| OCR | [PaddleOCR-VL-1.5](https://huggingface.co/PaddlePaddle/PaddleOCR-VL-1.5) | vision-language model | read cropped text regions into Unicode text |
|
| 43 |
+
| Inpainting | [lama-manga](https://huggingface.co/mayocream/lama-manga) / [LaMa](https://github.com/advimman/lama) | image inpainting network | fill masked regions after text removal |
|
| 44 |
+
| Font hints | [YuzuMarker.FontDetection](https://huggingface.co/fffonion/yuzumarker-font-detection) | image classifier / regressor | estimate font family, colors, and stroke hints |
|
| 45 |
+
| Translation | local GGUF model via [llama.cpp](https://github.com/ggml-org/llama.cpp) or remote API | decoder-only LLM in most local setups | translate OCR text into the target language |
|
| 46 |
+
|
| 47 |
+
## Why layout analysis matters on manga pages
|
| 48 |
+
|
| 49 |
+
Layout analysis is not just "find boxes around text". On manga pages it has to answer several structural questions:
|
| 50 |
+
|
| 51 |
+
- which regions are text-like at all
|
| 52 |
+
- where the reading order probably is
|
| 53 |
+
- whether a block is tall enough to behave like vertical text
|
| 54 |
+
- which boxes should be deduplicated before OCR
|
| 55 |
+
- which parts of the page are captions, bubble text, titles, or other layout categories
|
| 56 |
+
|
| 57 |
+
This matters because manga is visually dense:
|
| 58 |
+
|
| 59 |
+
- speech bubbles are often curved or skewed
|
| 60 |
+
- text may sit on top of screentones and action lines
|
| 61 |
+
- vertical Japanese and horizontal Latin text can coexist on the same page
|
| 62 |
+
- the region that should be read is not always the same shape as the pixels that should be erased
|
| 63 |
+
|
| 64 |
+
Koharu uses layout output to create `TextBlock` records first, then uses those blocks to drive OCR and later rendering.
|
| 65 |
+
|
| 66 |
+
In the current implementation, the layout stage:
|
| 67 |
+
|
| 68 |
+
- runs `PP-DocLayoutV3::inference_one_fast(...)`
|
| 69 |
+
- keeps regions whose labels look text-like
|
| 70 |
+
- converts them into `TextBlock` values
|
| 71 |
+
- deduplicates heavily overlapping regions
|
| 72 |
+
- infers vertical vs horizontal source direction from aspect ratio
|
| 73 |
+
|
| 74 |
+
So layout analysis is the structural backbone of the rest of the pipeline.
|
| 75 |
+
|
| 76 |
+
## What a segmentation mask is
|
| 77 |
+
|
| 78 |
+
A segmentation mask is an image-sized map where each pixel says whether it belongs to a target class. In Koharu's case, the target class is effectively "text foreground that should later be removed during cleanup".
|
| 79 |
+
|
| 80 |
+
This is different from a bounding box:
|
| 81 |
+
|
| 82 |
+
| Representation | What it means | Best used for |
|
| 83 |
+
| --- | --- | --- |
|
| 84 |
+
| Bounding box | coarse rectangular region | OCR crop selection, ordering, UI editing |
|
| 85 |
+
| Polygon | tighter geometric outline | line-level geometry |
|
| 86 |
+
| Segmentation mask | per-pixel foreground map | inpainting and precise cleanup |
|
| 87 |
+
|
| 88 |
+
```mermaid
|
| 89 |
+
flowchart LR
|
| 90 |
+
A[Speech bubble] --> B[Layout box]
|
| 91 |
+
A --> C[Segmentation mask]
|
| 92 |
+
B --> D[Crop for OCR]
|
| 93 |
+
C --> E[Erase exact text pixels]
|
| 94 |
+
```
|
| 95 |
+
|
| 96 |
+
In Koharu, the segmentation path is intentionally separate from layout:
|
| 97 |
+
|
| 98 |
+
- `comic-text-detector` produces a grayscale probability map
|
| 99 |
+
- Koharu refines that map with post-processing
|
| 100 |
+
- the refined result becomes `doc.segment`
|
| 101 |
+
- LaMa then uses `doc.segment` as the erase and fill mask for inpainting
|
| 102 |
+
|
| 103 |
+
The refinement step matters because raw segmentation probabilities are usually soft and noisy. Koharu thresholds the prediction, tries block-aware refinement, and dilates the final binary mask so the cleanup covers text edges and outlines instead of leaving halos behind.
|
| 104 |
+
|
| 105 |
+
## How the vision models work in theory
|
| 106 |
+
|
| 107 |
+
### Layout analysis: detector plus reading-order reasoning
|
| 108 |
+
|
| 109 |
+
[PP-DocLayoutV3](https://huggingface.co/PaddlePaddle/PP-DocLayoutV3) is a layout model built for document parsing under skew, warping, and other non-planar distortions. Its model card highlights two properties that are especially relevant to manga-style pages:
|
| 110 |
+
|
| 111 |
+
- it predicts multi-point geometry instead of only axis-aligned two-point boxes
|
| 112 |
+
- it predicts logical reading order in the same forward pass
|
| 113 |
+
|
| 114 |
+
Koharu's Rust port mirrors that shape: the `pp_doclayout_v3` module contains an `HGNetV2` backbone plus attention-based encoder and decoder blocks, and the inference result exposes `label`, `score`, `bbox`, `polygon_points`, and `order`.
|
| 115 |
+
|
| 116 |
+
Conceptually, this is closer to object detection plus layout parsing than to OCR itself.
|
| 117 |
+
|
| 118 |
+
### Segmentation: dense per-pixel text prediction
|
| 119 |
+
|
| 120 |
+
Koharu's `comic-text-detector` path is a segmentation-first design. The Rust port loads:
|
| 121 |
+
|
| 122 |
+
- a YOLOv5-style backbone
|
| 123 |
+
- a U-Net decoder for mask prediction
|
| 124 |
+
- an optional DBNet head for full detection mode
|
| 125 |
+
|
| 126 |
+
The default page pipeline uses the segmentation-only path because Koharu already gets layout boxes from `PP-DocLayoutV3`. That means Koharu combines:
|
| 127 |
+
|
| 128 |
+
- one model that is good at page structure
|
| 129 |
+
- one model that is good at pixel-level text foreground
|
| 130 |
+
|
| 131 |
+
This is a better fit for cleanup than relying on boxes alone.
|
| 132 |
+
|
| 133 |
+
### OCR: multimodal decoding from image crops to text tokens
|
| 134 |
+
|
| 135 |
+
[PaddleOCR-VL](https://huggingface.co/docs/transformers/en/model_doc/paddleocr_vl) is a compact vision-language model. The official architecture description says it combines:
|
| 136 |
+
|
| 137 |
+
- a NaViT-style dynamic-resolution visual encoder
|
| 138 |
+
- the ERNIE-4.5-0.3B language model
|
| 139 |
+
|
| 140 |
+
In theory, OCR here works like a multimodal sequence generation problem:
|
| 141 |
+
|
| 142 |
+
1. the image crop is encoded into visual tokens
|
| 143 |
+
2. a text prompt such as `OCR:` conditions the task
|
| 144 |
+
3. the decoder autoregressively emits the recognized text tokens
|
| 145 |
+
|
| 146 |
+
Koharu's implementation follows that pattern closely:
|
| 147 |
+
|
| 148 |
+
- it loads `PaddleOCR-VL-1.5.gguf` and a separate multimodal projector
|
| 149 |
+
- it injects the image through the llama.cpp multimodal path
|
| 150 |
+
- it prompts with `OCR:`
|
| 151 |
+
- it greedily decodes text for each crop
|
| 152 |
+
|
| 153 |
+
So OCR in Koharu is not a classic CTC-only recognizer. It is a small document-oriented VLM being used in a tightly scoped OCR task.
|
| 154 |
+
|
| 155 |
+
### Inpainting: why LaMa uses Fourier convolutions
|
| 156 |
+
|
| 157 |
+
[LaMa](https://github.com/advimman/lama) is an inpainting model designed for large masked regions. Its paper title is explicit about the key idea: *Resolution-robust Large Mask Inpainting with Fourier Convolutions*.
|
| 158 |
+
|
| 159 |
+
The important intuition is:
|
| 160 |
+
|
| 161 |
+
- ordinary convolutions are local
|
| 162 |
+
- text removal often needs long-range context from the rest of the bubble or background
|
| 163 |
+
- frequency-domain operations can capture wider context efficiently
|
| 164 |
+
|
| 165 |
+
This is where FFT comes in.
|
| 166 |
+
|
| 167 |
+
#### What FFT means here
|
| 168 |
+
|
| 169 |
+
FFT stands for **Fast Fourier Transform**. It is a fast algorithm for moving between:
|
| 170 |
+
|
| 171 |
+
- the spatial domain, where pixels live
|
| 172 |
+
- the frequency domain, where repeating patterns and large-scale structure are easier to manipulate
|
| 173 |
+
|
| 174 |
+
In Koharu's LaMa port, the `FourierUnit` does exactly that:
|
| 175 |
+
|
| 176 |
+
1. apply `rfft2` to feature maps
|
| 177 |
+
2. process the real and imaginary channels with learned `1x1` convolutions
|
| 178 |
+
3. apply `irfft2` to return to image space
|
| 179 |
+
|
| 180 |
+
Koharu even implements custom `rfft2` and `irfft2` ops for CPU, CUDA, and Metal backends so the same spectral block can run across hardware targets.
|
| 181 |
+
|
| 182 |
+
For manga cleanup, this matters because the missing region is often not just a tiny scratch. It may be an entire speech bubble interior with gradients, screentones, and inked edges. Fourier-style global mixing helps the model preserve larger structures while filling the hole.
|
| 183 |
+
|
| 184 |
+
## Local LLMs and model type
|
| 185 |
+
|
| 186 |
+
Koharu's local translation path uses GGUF models through `llama.cpp`. In practice, these are usually quantized decoder-only transformers.
|
| 187 |
+
|
| 188 |
+
The theory is standard modern LLM inference:
|
| 189 |
+
|
| 190 |
+
- tokenize the OCR text
|
| 191 |
+
- run masked self-attention over the growing token sequence
|
| 192 |
+
- predict the next token repeatedly until the output is complete
|
| 193 |
+
|
| 194 |
+
The practical trade-off is also standard:
|
| 195 |
+
|
| 196 |
+
- larger models usually translate better
|
| 197 |
+
- smaller quantized models use less VRAM and RAM
|
| 198 |
+
- remote providers trade local privacy for easier access to larger hosted models
|
| 199 |
+
|
| 200 |
+
Koharu keeps the image understanding steps local even when you choose a remote text-generation provider. The remote side only needs the OCR text.
|
| 201 |
+
|
| 202 |
+
## Koharu-specific implementation notes
|
| 203 |
+
|
| 204 |
+
Some details that are easy to miss if you only read the high-level docs:
|
| 205 |
+
|
| 206 |
+
- the detect stage currently loads `ComicTextDetector::load_segmentation_only(...)`, not the full DBNet-backed detection mode
|
| 207 |
+
- the segmentation mask is refined against the current detected text blocks before inpainting
|
| 208 |
+
- OCR runs on cropped text-block images, not the original whole page
|
| 209 |
+
- the OCR wrapper uses the multimodal llama.cpp path and the task prompt `OCR:`
|
| 210 |
+
- inpainting consumes `doc.segment`, so bad masks lead directly to bad cleanup
|
| 211 |
+
- font prediction is normalized before rendering so near-black and near-white colors snap to cleaner values
|
| 212 |
+
|
| 213 |
+
## Recommended reading
|
| 214 |
+
|
| 215 |
+
### Official model and project references
|
| 216 |
+
|
| 217 |
+
- [PP-DocLayoutV3 model card](https://huggingface.co/PaddlePaddle/PP-DocLayoutV3)
|
| 218 |
+
- [PaddleOCR-VL-1.5 model card](https://huggingface.co/PaddlePaddle/PaddleOCR-VL-1.5)
|
| 219 |
+
- [PaddleOCR-VL architecture docs in Hugging Face Transformers](https://huggingface.co/docs/transformers/en/model_doc/paddleocr_vl)
|
| 220 |
+
- [comic-text-detector repository](https://github.com/dmMaze/comic-text-detector)
|
| 221 |
+
- [LaMa repository](https://github.com/advimman/lama)
|
| 222 |
+
- [llama.cpp](https://github.com/ggml-org/llama.cpp)
|
| 223 |
+
|
| 224 |
+
### Background theory and Wikipedia diagrams
|
| 225 |
+
|
| 226 |
+
These pages are useful when you want the general theory and the overview diagrams before diving into model cards:
|
| 227 |
+
|
| 228 |
+
- [Fourier transform](https://en.wikipedia.org/wiki/Fourier_transform)
|
| 229 |
+
- [Image segmentation](https://en.wikipedia.org/wiki/Image_segmentation)
|
| 230 |
+
- [Optical character recognition](https://en.wikipedia.org/wiki/Optical_character_recognition)
|
| 231 |
+
- [Transformer (deep learning architecture)](https://en.wikipedia.org/wiki/Transformer_(deep_learning_architecture))
|
| 232 |
+
- [Object detection](https://en.wikipedia.org/wiki/Object_detection)
|
| 233 |
+
- [Inpainting](https://en.wikipedia.org/wiki/Inpainting)
|
| 234 |
+
|
| 235 |
+
Those Wikipedia links are background references. For Koharu-specific behavior and the actual model architecture choices, prefer the official model cards and the source tree.
|
docs/how-to/build-from-source.md
CHANGED
|
@@ -4,23 +4,101 @@ title: Build From Source
|
|
| 4 |
|
| 5 |
# Build From Source
|
| 6 |
|
| 7 |
-
If you
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| 8 |
|
| 9 |
## Prerequisites
|
| 10 |
|
| 11 |
- [Rust](https://www.rust-lang.org/tools/install) 1.92 or later
|
| 12 |
- [Bun](https://bun.sh/) 1.0 or later
|
| 13 |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| 14 |
## Install dependencies
|
| 15 |
|
| 16 |
```bash
|
| 17 |
bun install
|
| 18 |
```
|
| 19 |
|
| 20 |
-
##
|
| 21 |
|
| 22 |
```bash
|
| 23 |
bun run build
|
| 24 |
```
|
| 25 |
|
| 26 |
-
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| 4 |
|
| 5 |
# Build From Source
|
| 6 |
|
| 7 |
+
If you want to compile Koharu locally instead of using a prebuilt release, start with the repository's Bun wrapper. It matches the normal developer workflow and handles platform-specific setup that a direct Tauri call does not.
|
| 8 |
+
|
| 9 |
+
## What the build includes
|
| 10 |
+
|
| 11 |
+
A full desktop build includes:
|
| 12 |
+
|
| 13 |
+
- the Rust application in `koharu/`
|
| 14 |
+
- the embedded UI from `ui/`
|
| 15 |
+
- the local HTTP, RPC, and MCP server used by both GUI and headless modes
|
| 16 |
+
|
| 17 |
+
The default desktop build is platform-aware:
|
| 18 |
+
|
| 19 |
+
| Platform | Desktop feature path |
|
| 20 |
+
| --- | --- |
|
| 21 |
+
| Windows | `cuda` |
|
| 22 |
+
| Linux | `cuda` |
|
| 23 |
+
| macOS on Apple Silicon | `metal` |
|
| 24 |
|
| 25 |
## Prerequisites
|
| 26 |
|
| 27 |
- [Rust](https://www.rust-lang.org/tools/install) 1.92 or later
|
| 28 |
- [Bun](https://bun.sh/) 1.0 or later
|
| 29 |
|
| 30 |
+
For Windows source builds, install:
|
| 31 |
+
|
| 32 |
+
- Visual Studio C++ build tools
|
| 33 |
+
- the CUDA Toolkit if you want the default CUDA-enabled desktop build
|
| 34 |
+
|
| 35 |
+
The repository's `scripts/dev.ts` helper tries to discover `nvcc` and `cl.exe` automatically on Windows before launching Tauri.
|
| 36 |
+
|
| 37 |
## Install dependencies
|
| 38 |
|
| 39 |
```bash
|
| 40 |
bun install
|
| 41 |
```
|
| 42 |
|
| 43 |
+
## Recommended desktop build
|
| 44 |
|
| 45 |
```bash
|
| 46 |
bun run build
|
| 47 |
```
|
| 48 |
|
| 49 |
+
This is the normal source-build path for most users. It runs the repository's Bun helper, which then launches Tauri with the project's expected build flow.
|
| 50 |
+
|
| 51 |
+
On Windows, that wrapper also tries to discover `nvcc` and `cl.exe` automatically before starting the build.
|
| 52 |
+
|
| 53 |
+
The main binaries are written to `target/release`:
|
| 54 |
+
|
| 55 |
+
- `target/release/koharu`
|
| 56 |
+
- `target/release/koharu.exe` on Windows
|
| 57 |
+
|
| 58 |
+
## Development build
|
| 59 |
+
|
| 60 |
+
If you are actively working on the app instead of producing a release-style binary, use:
|
| 61 |
+
|
| 62 |
+
```bash
|
| 63 |
+
bun run dev
|
| 64 |
+
```
|
| 65 |
+
|
| 66 |
+
The dev script launches `tauri dev` and starts the local server on a fixed port so the desktop shell and UI can talk to the same runtime during development.
|
| 67 |
+
|
| 68 |
+
## Detailed Tauri control
|
| 69 |
+
|
| 70 |
+
If you want to control the Tauri invocation directly instead of going through the wrapper, use:
|
| 71 |
+
|
| 72 |
+
```bash
|
| 73 |
+
bun tauri build --release --no-bundle
|
| 74 |
+
```
|
| 75 |
+
|
| 76 |
+
This is closer to the underlying Tauri command and is useful when you want more explicit control over the build invocation.
|
| 77 |
+
|
| 78 |
+
Unlike `bun run build`, this path does not go through the repository's Windows helper that tries to configure CUDA and Visual Studio tooling for you first.
|
| 79 |
+
|
| 80 |
+
## Direct Rust builds
|
| 81 |
+
|
| 82 |
+
If you only want to build the Rust crate directly and intentionally bypass the Bun and Tauri wrapper, use `bun cargo` rather than calling `cargo` yourself.
|
| 83 |
+
|
| 84 |
+
Examples:
|
| 85 |
+
|
| 86 |
+
```bash
|
| 87 |
+
# Windows / Linux
|
| 88 |
+
bun cargo build --release -p koharu --features=cuda
|
| 89 |
+
|
| 90 |
+
# macOS Apple Silicon
|
| 91 |
+
bun cargo build --release -p koharu --features=metal
|
| 92 |
+
```
|
| 93 |
+
|
| 94 |
+
This is useful for lower-level Rust work, but `bun run build` remains the better choice for a normal desktop app build because it preserves the full Tauri packaging flow.
|
| 95 |
+
|
| 96 |
+
## What happens at runtime after the build
|
| 97 |
+
|
| 98 |
+
Building the app does not bundle every model weight. On first launch, Koharu still needs to:
|
| 99 |
+
|
| 100 |
+
- initialize runtime libraries under the local app data directory
|
| 101 |
+
- download the default vision and OCR models
|
| 102 |
+
- download optional local translation LLMs later when you choose them in Settings
|
| 103 |
+
|
| 104 |
+
If you want to prefetch those dependencies without starting the app, see [Run GUI, Headless, and MCP Modes](run-gui-headless-and-mcp.md).
|
docs/how-to/configure-mcp-clients.md
ADDED
|
@@ -0,0 +1,255 @@
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| 1 |
+
---
|
| 2 |
+
title: Configure MCP Clients
|
| 3 |
+
---
|
| 4 |
+
|
| 5 |
+
# Configure MCP Clients
|
| 6 |
+
|
| 7 |
+
Koharu exposes a built-in MCP server over local Streamable HTTP. This page shows how to connect MCP clients to it, with detailed setup for Antigravity, Claude Desktop, and Claude Code.
|
| 8 |
+
|
| 9 |
+
## What Koharu exposes over MCP
|
| 10 |
+
|
| 11 |
+
Koharu's MCP server is the same local runtime used by the desktop app and headless Web UI. In practice, the MCP tools cover:
|
| 12 |
+
|
| 13 |
+
- document loading and inspection
|
| 14 |
+
- image previews for original, segment, inpainted, and rendered layers
|
| 15 |
+
- detect, OCR, inpaint, render, and full pipeline processing
|
| 16 |
+
- LLM model listing, loading, unloading, and translation
|
| 17 |
+
- text-block editing and export
|
| 18 |
+
|
| 19 |
+
That means an MCP client can drive the same manga workflow that Koharu's GUI uses.
|
| 20 |
+
|
| 21 |
+
## 1. Start Koharu on a stable port
|
| 22 |
+
|
| 23 |
+
Use a fixed port so your MCP client always has the same URL.
|
| 24 |
+
|
| 25 |
+
```bash
|
| 26 |
+
# macOS / Linux
|
| 27 |
+
koharu --port 9999 --headless
|
| 28 |
+
|
| 29 |
+
# Windows
|
| 30 |
+
koharu.exe --port 9999 --headless
|
| 31 |
+
```
|
| 32 |
+
|
| 33 |
+
You can also keep the desktop window and still expose MCP:
|
| 34 |
+
|
| 35 |
+
```bash
|
| 36 |
+
# macOS / Linux
|
| 37 |
+
koharu --port 9999
|
| 38 |
+
|
| 39 |
+
# Windows
|
| 40 |
+
koharu.exe --port 9999
|
| 41 |
+
```
|
| 42 |
+
|
| 43 |
+
Koharu's MCP endpoint will then be:
|
| 44 |
+
|
| 45 |
+
```text
|
| 46 |
+
http://127.0.0.1:9999/mcp
|
| 47 |
+
```
|
| 48 |
+
|
| 49 |
+
Important details:
|
| 50 |
+
|
| 51 |
+
- keep Koharu running while the MCP client is connected
|
| 52 |
+
- Koharu binds to `127.0.0.1` by default, so these examples assume the MCP client is on the same machine
|
| 53 |
+
- no authentication headers are required for the default local setup
|
| 54 |
+
|
| 55 |
+
## 2. Quick endpoint check
|
| 56 |
+
|
| 57 |
+
Before editing any client config, make sure Koharu is actually running on the expected port.
|
| 58 |
+
|
| 59 |
+
Open:
|
| 60 |
+
|
| 61 |
+
```text
|
| 62 |
+
http://127.0.0.1:9999/
|
| 63 |
+
```
|
| 64 |
+
|
| 65 |
+
If the Web UI loads, the local server is up and the MCP endpoint should also exist at `/mcp`.
|
| 66 |
+
|
| 67 |
+
## Antigravity
|
| 68 |
+
|
| 69 |
+
Antigravity can point directly at Koharu's local MCP URL through its raw MCP config.
|
| 70 |
+
|
| 71 |
+
### Steps
|
| 72 |
+
|
| 73 |
+
1. Start Koharu with `--port 9999`.
|
| 74 |
+
2. Open Antigravity.
|
| 75 |
+
3. Open the `...` menu at the top of the editor's agent panel.
|
| 76 |
+
4. Click **Manage MCP Servers**.
|
| 77 |
+
5. Click **View raw config**.
|
| 78 |
+
6. Add a `koharu` entry under `mcpServers`.
|
| 79 |
+
7. Save the config.
|
| 80 |
+
8. Restart Antigravity if it does not reload the MCP server automatically.
|
| 81 |
+
|
| 82 |
+
### Example config
|
| 83 |
+
|
| 84 |
+
```json
|
| 85 |
+
{
|
| 86 |
+
"mcpServers": {
|
| 87 |
+
"koharu": {
|
| 88 |
+
"serverUrl": "http://127.0.0.1:9999/mcp"
|
| 89 |
+
}
|
| 90 |
+
}
|
| 91 |
+
}
|
| 92 |
+
```
|
| 93 |
+
|
| 94 |
+
If you already have other MCP servers configured, add `koharu` alongside them instead of replacing the whole `mcpServers` object.
|
| 95 |
+
|
| 96 |
+
### After setup
|
| 97 |
+
|
| 98 |
+
Ask Antigravity something simple first:
|
| 99 |
+
|
| 100 |
+
- `What tools are available from Koharu?`
|
| 101 |
+
- `How many documents are currently loaded in Koharu?`
|
| 102 |
+
|
| 103 |
+
If that works, move on to page actions such as:
|
| 104 |
+
|
| 105 |
+
- `Open C:\\manga\\page-01.png in Koharu and run detect and OCR.`
|
| 106 |
+
- `Show me the segment mask for document 0.`
|
| 107 |
+
- `Run the full pipeline on document 0 and export the rendered page.`
|
| 108 |
+
|
| 109 |
+
## Claude Desktop
|
| 110 |
+
|
| 111 |
+
Claude Desktop's current local MCP config is command-based. Because Koharu exposes a local HTTP MCP endpoint rather than a packaged desktop extension, the practical config-file path is to use a small bridge process that connects Claude Desktop to `http://127.0.0.1:9999/mcp`.
|
| 112 |
+
|
| 113 |
+
This guide uses `mcp-remote` for that bridge.
|
| 114 |
+
|
| 115 |
+
### Before you start
|
| 116 |
+
|
| 117 |
+
Make sure one of these is true:
|
| 118 |
+
|
| 119 |
+
- `npx` is already available on your machine
|
| 120 |
+
- Node.js is installed so `npx` can run
|
| 121 |
+
|
| 122 |
+
### Steps
|
| 123 |
+
|
| 124 |
+
1. Start Koharu with `--port 9999`.
|
| 125 |
+
2. Open Claude Desktop.
|
| 126 |
+
3. Open **Settings**.
|
| 127 |
+
4. Open the **Developer** section.
|
| 128 |
+
5. Open the MCP config file from Claude Desktop's built-in editor entry.
|
| 129 |
+
6. Add a `koharu` server entry.
|
| 130 |
+
7. Save the file.
|
| 131 |
+
8. Fully restart Claude Desktop.
|
| 132 |
+
|
| 133 |
+
### Windows config
|
| 134 |
+
|
| 135 |
+
```json
|
| 136 |
+
{
|
| 137 |
+
"mcpServers": {
|
| 138 |
+
"koharu": {
|
| 139 |
+
"command": "C:\\Progra~1\\nodejs\\npx.cmd",
|
| 140 |
+
"args": [
|
| 141 |
+
"-y",
|
| 142 |
+
"mcp-remote@latest",
|
| 143 |
+
"http://127.0.0.1:9999/mcp"
|
| 144 |
+
],
|
| 145 |
+
"env": {}
|
| 146 |
+
}
|
| 147 |
+
}
|
| 148 |
+
}
|
| 149 |
+
```
|
| 150 |
+
|
| 151 |
+
### macOS / Linux config
|
| 152 |
+
|
| 153 |
+
```json
|
| 154 |
+
{
|
| 155 |
+
"mcpServers": {
|
| 156 |
+
"koharu": {
|
| 157 |
+
"command": "npx",
|
| 158 |
+
"args": [
|
| 159 |
+
"-y",
|
| 160 |
+
"mcp-remote@latest",
|
| 161 |
+
"http://127.0.0.1:9999/mcp"
|
| 162 |
+
],
|
| 163 |
+
"env": {}
|
| 164 |
+
}
|
| 165 |
+
}
|
| 166 |
+
}
|
| 167 |
+
```
|
| 168 |
+
|
| 169 |
+
Notes:
|
| 170 |
+
|
| 171 |
+
- if you already have other entries in `mcpServers`, add `koharu` without deleting them
|
| 172 |
+
- `mcp-remote@latest` is fetched on first use, so the first startup may need internet access
|
| 173 |
+
- if your Windows Node install is not under `C:\\Program Files\\nodejs`, update the `command` path accordingly
|
| 174 |
+
- Anthropic's current remote-MCP connector flow for Claude Desktop is managed through **Settings > Connectors** for actual remote servers; this page intentionally covers the config-file bridge pattern for Koharu's local `127.0.0.1` endpoint
|
| 175 |
+
|
| 176 |
+
### After setup
|
| 177 |
+
|
| 178 |
+
Open a new Claude Desktop chat and ask:
|
| 179 |
+
|
| 180 |
+
- `What Koharu MCP tools do you have available?`
|
| 181 |
+
- `Check whether Koharu has any loaded documents.`
|
| 182 |
+
|
| 183 |
+
Then move to actual page work:
|
| 184 |
+
|
| 185 |
+
- `Open D:\\manga\\page-01.png in Koharu.`
|
| 186 |
+
- `Run detect, OCR, inpaint, translate, and render for document 0.`
|
| 187 |
+
- `Show me the rendered output for document 0.`
|
| 188 |
+
|
| 189 |
+
## Claude Code
|
| 190 |
+
|
| 191 |
+
If by "Claude" you mean Claude Code, the safest setup for Koharu's local `http://127.0.0.1` MCP endpoint is also to use the same stdio bridge pattern.
|
| 192 |
+
|
| 193 |
+
### Add it to your user config
|
| 194 |
+
|
| 195 |
+
macOS / Linux:
|
| 196 |
+
|
| 197 |
+
```bash
|
| 198 |
+
claude mcp add-json koharu "{\"type\":\"stdio\",\"command\":\"npx\",\"args\":[\"-y\",\"mcp-remote@latest\",\"http://127.0.0.1:9999/mcp\"],\"env\":{}}" --scope user
|
| 199 |
+
```
|
| 200 |
+
|
| 201 |
+
This writes the server into Claude Code's MCP configuration for your user account.
|
| 202 |
+
|
| 203 |
+
Windows:
|
| 204 |
+
|
| 205 |
+
```bash
|
| 206 |
+
claude mcp add-json koharu "{\"type\":\"stdio\",\"command\":\"cmd\",\"args\":[\"/c\",\"npx\",\"-y\",\"mcp-remote@latest\",\"http://127.0.0.1:9999/mcp\"],\"env\":{}}" --scope user
|
| 207 |
+
```
|
| 208 |
+
|
| 209 |
+
On native Windows, Claude Code's docs explicitly recommend the `cmd /c npx` wrapper for local stdio MCP servers that use `npx`.
|
| 210 |
+
|
| 211 |
+
### Verify it
|
| 212 |
+
|
| 213 |
+
```bash
|
| 214 |
+
claude mcp get koharu
|
| 215 |
+
claude mcp list
|
| 216 |
+
```
|
| 217 |
+
|
| 218 |
+
If you already configured Koharu in Claude Desktop, Claude Code can also import compatible entries from Claude Desktop on supported platforms:
|
| 219 |
+
|
| 220 |
+
```bash
|
| 221 |
+
claude mcp add-from-claude-desktop --scope user
|
| 222 |
+
```
|
| 223 |
+
|
| 224 |
+
## First tasks to try
|
| 225 |
+
|
| 226 |
+
Once the client is connected, these are good first tasks:
|
| 227 |
+
|
| 228 |
+
- ask Koharu for the loaded document count
|
| 229 |
+
- open one page image from disk
|
| 230 |
+
- run detect and OCR only first
|
| 231 |
+
- inspect the segment or rendered layer before running a full export
|
| 232 |
+
|
| 233 |
+
This makes failures easier to diagnose than jumping straight into a full batch pipeline.
|
| 234 |
+
|
| 235 |
+
## Common mistakes
|
| 236 |
+
|
| 237 |
+
- starting Koharu without `--port`, then trying to connect a client to the wrong port
|
| 238 |
+
- using `http://127.0.0.1:9999/` instead of `http://127.0.0.1:9999/mcp`
|
| 239 |
+
- closing Koharu after adding the client config
|
| 240 |
+
- replacing your entire client config instead of merging a new `koharu` entry
|
| 241 |
+
- expecting Claude Desktop to connect directly to Koharu's HTTP URL through a plain command-less config entry
|
| 242 |
+
- forgetting that Koharu's default local server is only reachable from the same machine
|
| 243 |
+
|
| 244 |
+
## Related pages
|
| 245 |
+
|
| 246 |
+
- [Run GUI, Headless, and MCP Modes](run-gui-headless-and-mcp.md)
|
| 247 |
+
- [MCP Tools Reference](../reference/mcp-tools.md)
|
| 248 |
+
- [CLI Reference](../reference/cli.md)
|
| 249 |
+
- [Troubleshooting](troubleshooting.md)
|
| 250 |
+
|
| 251 |
+
## External references
|
| 252 |
+
|
| 253 |
+
- [Claude Code MCP docs](https://code.claude.com/docs/en/mcp)
|
| 254 |
+
- [Claude Help: Building custom connectors via remote MCP servers](https://support.claude.com/en/articles/11503834-building-custom-connectors-via-remote-mcp-servers)
|
| 255 |
+
- [Wolfram support article with current Antigravity and Claude Desktop MCP config examples](https://support.wolfram.com/73463/)
|
docs/how-to/contributing.md
ADDED
|
@@ -0,0 +1,174 @@
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| 1 |
+
---
|
| 2 |
+
title: Contributing
|
| 3 |
+
---
|
| 4 |
+
|
| 5 |
+
# Contributing
|
| 6 |
+
|
| 7 |
+
Koharu accepts contributions to the Rust workspace, the Tauri app shell, the Next.js UI, the ML pipeline, MCP integrations, tests, and documentation.
|
| 8 |
+
|
| 9 |
+
This guide focuses on the current repository workflow so you can make changes that match CI and are easy to review.
|
| 10 |
+
|
| 11 |
+
## Before you start
|
| 12 |
+
|
| 13 |
+
You should have:
|
| 14 |
+
|
| 15 |
+
- [Rust](https://www.rust-lang.org/tools/install) 1.92 or later
|
| 16 |
+
- [Bun](https://bun.sh/) 1.0 or later
|
| 17 |
+
|
| 18 |
+
On Windows, source builds also expect:
|
| 19 |
+
|
| 20 |
+
- Visual Studio C++ build tools
|
| 21 |
+
- the CUDA Toolkit for the normal CUDA-enabled local build path
|
| 22 |
+
|
| 23 |
+
If you have not built Koharu locally before, read [Build From Source](build-from-source.md) first.
|
| 24 |
+
|
| 25 |
+
## Repository layout
|
| 26 |
+
|
| 27 |
+
The main top-level areas are:
|
| 28 |
+
|
| 29 |
+
- `koharu/`: the Tauri desktop application shell
|
| 30 |
+
- `koharu-*`: Rust workspace crates for runtime, ML, pipeline, RPC, rendering, PSD export, and types
|
| 31 |
+
- `ui/`: the web UI used inside the desktop shell and headless mode
|
| 32 |
+
- `e2e/`: Playwright end-to-end tests and fixtures
|
| 33 |
+
- `docs/`: the documentation site content
|
| 34 |
+
|
| 35 |
+
If you are not sure where a change belongs:
|
| 36 |
+
|
| 37 |
+
- UI interaction and panels usually live in `ui/`
|
| 38 |
+
- backend APIs, MCP tools, and orchestration usually live in `koharu-rpc/` or `koharu-pipeline/`
|
| 39 |
+
- rendering, OCR, model runtime, and ML-specific logic live in the Rust workspace crates
|
| 40 |
+
|
| 41 |
+
## Set up the repository
|
| 42 |
+
|
| 43 |
+
Install JS dependencies first:
|
| 44 |
+
|
| 45 |
+
```bash
|
| 46 |
+
bun install
|
| 47 |
+
```
|
| 48 |
+
|
| 49 |
+
For a normal local desktop build, use:
|
| 50 |
+
|
| 51 |
+
```bash
|
| 52 |
+
bun run build
|
| 53 |
+
```
|
| 54 |
+
|
| 55 |
+
For active development, use:
|
| 56 |
+
|
| 57 |
+
```bash
|
| 58 |
+
bun run dev
|
| 59 |
+
```
|
| 60 |
+
|
| 61 |
+
The dev command runs the Tauri app in dev mode and keeps the local server on a fixed port for UI development and e2e tests.
|
| 62 |
+
|
| 63 |
+
## Use the repo's preferred local commands
|
| 64 |
+
|
| 65 |
+
For local Rust commands, prefer `bun cargo` instead of calling `cargo` directly.
|
| 66 |
+
|
| 67 |
+
Examples:
|
| 68 |
+
|
| 69 |
+
```bash
|
| 70 |
+
bun cargo fmt -- --check
|
| 71 |
+
bun cargo check
|
| 72 |
+
bun cargo clippy -- -D warnings
|
| 73 |
+
bun cargo test --workspace --tests
|
| 74 |
+
```
|
| 75 |
+
|
| 76 |
+
For UI formatting, use:
|
| 77 |
+
|
| 78 |
+
```bash
|
| 79 |
+
bun run format
|
| 80 |
+
```
|
| 81 |
+
|
| 82 |
+
For docs validation, use:
|
| 83 |
+
|
| 84 |
+
```bash
|
| 85 |
+
zensical build -c
|
| 86 |
+
```
|
| 87 |
+
|
| 88 |
+
## What to run before opening a PR
|
| 89 |
+
|
| 90 |
+
Run the checks that match the area you changed.
|
| 91 |
+
|
| 92 |
+
If you changed Rust code:
|
| 93 |
+
|
| 94 |
+
- `bun cargo fmt -- --check`
|
| 95 |
+
- `bun cargo check`
|
| 96 |
+
- `bun cargo clippy -- -D warnings`
|
| 97 |
+
- `bun cargo test --workspace --tests`
|
| 98 |
+
|
| 99 |
+
If you changed the desktop app or full integration flow:
|
| 100 |
+
|
| 101 |
+
- `bun run build`
|
| 102 |
+
|
| 103 |
+
If you changed the UI or interaction flow:
|
| 104 |
+
|
| 105 |
+
- `bun run format`
|
| 106 |
+
- `bun run test:e2e`
|
| 107 |
+
|
| 108 |
+
If you changed docs:
|
| 109 |
+
|
| 110 |
+
- `zensical build -c`
|
| 111 |
+
|
| 112 |
+
You do not always need to run every command in this list for every PR, but you should run enough to cover the code paths you touched.
|
| 113 |
+
|
| 114 |
+
## E2E tests
|
| 115 |
+
|
| 116 |
+
Koharu includes Playwright tests under `e2e/`.
|
| 117 |
+
|
| 118 |
+
Run them with:
|
| 119 |
+
|
| 120 |
+
```bash
|
| 121 |
+
bun run test:e2e
|
| 122 |
+
```
|
| 123 |
+
|
| 124 |
+
The current Playwright setup starts Koharu through:
|
| 125 |
+
|
| 126 |
+
```bash
|
| 127 |
+
bun run dev -- --headless
|
| 128 |
+
```
|
| 129 |
+
|
| 130 |
+
and waits for the local API to come up before running the browser tests.
|
| 131 |
+
|
| 132 |
+
## Docs changes
|
| 133 |
+
|
| 134 |
+
Docs live in `docs/` and are built by Zensical.
|
| 135 |
+
|
| 136 |
+
When updating docs:
|
| 137 |
+
|
| 138 |
+
- keep instructions aligned with the current implementation
|
| 139 |
+
- prefer concrete commands and real paths over generic advice
|
| 140 |
+
- update navigation in `zensical.toml` if you add a new page
|
| 141 |
+
- build the docs locally with `zensical build -c`
|
| 142 |
+
|
| 143 |
+
## Pull request expectations
|
| 144 |
+
|
| 145 |
+
A good contribution usually has:
|
| 146 |
+
|
| 147 |
+
- one clear goal
|
| 148 |
+
- code that follows existing patterns instead of introducing a new style unnecessarily
|
| 149 |
+
- tests or validation steps that match the change
|
| 150 |
+
- a PR description that explains what changed and how you verified it
|
| 151 |
+
|
| 152 |
+
Small, focused PRs are easier to review than large mixed changes.
|
| 153 |
+
|
| 154 |
+
If your change affects user-visible behavior, mention:
|
| 155 |
+
|
| 156 |
+
- what the old behavior was
|
| 157 |
+
- what the new behavior is
|
| 158 |
+
- how you tested it
|
| 159 |
+
|
| 160 |
+
## AI-generated PRs
|
| 161 |
+
|
| 162 |
+
AI-generated contributions are welcome, provided:
|
| 163 |
+
|
| 164 |
+
1. A human has reviewed the code before opening the PR.
|
| 165 |
+
2. The submitter understands the changes being made.
|
| 166 |
+
|
| 167 |
+
That rule already exists in the repository's GitHub contribution guidance and remains in effect here as well.
|
| 168 |
+
|
| 169 |
+
## Related pages
|
| 170 |
+
|
| 171 |
+
- [Build From Source](build-from-source.md)
|
| 172 |
+
- [Run GUI, Headless, and MCP Modes](run-gui-headless-and-mcp.md)
|
| 173 |
+
- [Configure MCP Clients](configure-mcp-clients.md)
|
| 174 |
+
- [Troubleshooting](troubleshooting.md)
|
docs/how-to/export-and-manage-projects.md
CHANGED
|
@@ -4,26 +4,103 @@ title: Export Pages and Manage Projects
|
|
| 4 |
|
| 5 |
# Export Pages and Manage Projects
|
| 6 |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| 7 |
## Export rendered output
|
| 8 |
|
| 9 |
Koharu can export the current page as a rendered image.
|
| 10 |
|
| 11 |
Use this when you want a final flattened result for reading, sharing, or publishing.
|
| 12 |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| 13 |
## Export layered PSD files
|
| 14 |
|
| 15 |
Koharu can also export a layered Photoshop PSD.
|
| 16 |
|
| 17 |
-
PSD export
|
| 18 |
|
| 19 |
-
|
| 20 |
|
| 21 |
-
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| 22 |
|
| 23 |
-
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| 24 |
|
| 25 |
## When to use each format
|
| 26 |
|
| 27 |
-
|
| 28 |
-
-
|
| 29 |
-
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| 4 |
|
| 5 |
# Export Pages and Manage Projects
|
| 6 |
|
| 7 |
+
Koharu's workflow is page-based. You import image pages, run the pipeline, review text blocks, and then export either flattened output or a layered handoff file for manual finishing.
|
| 8 |
+
|
| 9 |
+
## Supported page inputs
|
| 10 |
+
|
| 11 |
+
The current import flow is image-based. Koharu accepts:
|
| 12 |
+
|
| 13 |
+
- `.png`
|
| 14 |
+
- `.jpg`
|
| 15 |
+
- `.jpeg`
|
| 16 |
+
- `.webp`
|
| 17 |
+
|
| 18 |
+
Folder import recursively scans for supported image files and ignores everything else.
|
| 19 |
+
|
| 20 |
## Export rendered output
|
| 21 |
|
| 22 |
Koharu can export the current page as a rendered image.
|
| 23 |
|
| 24 |
Use this when you want a final flattened result for reading, sharing, or publishing.
|
| 25 |
|
| 26 |
+
Implementation details:
|
| 27 |
+
|
| 28 |
+
- rendered export uses the page's original image extension when possible
|
| 29 |
+
- Koharu names the exported file with a `_koharu` suffix
|
| 30 |
+
- rendered export requires the page to already have a rendered layer
|
| 31 |
+
|
| 32 |
+
Example output names:
|
| 33 |
+
|
| 34 |
+
- `page-001_koharu.png`
|
| 35 |
+
- `chapter-03_koharu.jpg`
|
| 36 |
+
|
| 37 |
+
## Export inpainted output
|
| 38 |
+
|
| 39 |
+
Koharu also keeps an inpainted layer in the pipeline, which is useful when you want a cleaned page without translated lettering.
|
| 40 |
+
|
| 41 |
+
This is most useful for:
|
| 42 |
+
|
| 43 |
+
- external lettering workflows
|
| 44 |
+
- cleanup review
|
| 45 |
+
- batch export of text-removed pages
|
| 46 |
+
|
| 47 |
+
When exported, Koharu uses an `_inpainted` filename suffix.
|
| 48 |
+
|
| 49 |
## Export layered PSD files
|
| 50 |
|
| 51 |
Koharu can also export a layered Photoshop PSD.
|
| 52 |
|
| 53 |
+
PSD export is the handoff format for users who want to keep working in Photoshop or a PSD-compatible editor after the ML pipeline is done.
|
| 54 |
|
| 55 |
+
In the current implementation, PSD export uses editable text layers by default and can include:
|
| 56 |
|
| 57 |
+
- the original image
|
| 58 |
+
- the inpainted image
|
| 59 |
+
- the segmentation mask
|
| 60 |
+
- the brush layer
|
| 61 |
+
- translated text layers
|
| 62 |
+
- a merged composite image
|
| 63 |
|
| 64 |
+
That makes the PSD much more useful than a flat image when you still need to:
|
| 65 |
+
|
| 66 |
+
- tweak wording
|
| 67 |
+
- adjust bubble fit
|
| 68 |
+
- repaint artifacts
|
| 69 |
+
- hide or inspect helper layers
|
| 70 |
+
|
| 71 |
+
Koharu names PSD exports with a `_koharu.psd` suffix.
|
| 72 |
+
|
| 73 |
+
## PSD export limitations
|
| 74 |
+
|
| 75 |
+
Koharu currently writes classic PSD files, not PSB files. That means very large pages can fail to export.
|
| 76 |
+
|
| 77 |
+
The implementation rejects dimensions above `30000 x 30000`.
|
| 78 |
+
|
| 79 |
+
## Manage loaded page sets
|
| 80 |
+
|
| 81 |
+
Koharu lets you work with multiple loaded pages in one session.
|
| 82 |
+
|
| 83 |
+
The practical choices are:
|
| 84 |
+
|
| 85 |
+
- open images and replace the current set
|
| 86 |
+
- append more images to the current set
|
| 87 |
+
- open a folder and load its supported image files
|
| 88 |
+
- append a folder to the current set
|
| 89 |
+
|
| 90 |
+
This is the main way to manage a chapter or batch job inside the app today.
|
| 91 |
|
| 92 |
## When to use each format
|
| 93 |
|
| 94 |
+
| Output | Best for |
|
| 95 |
+
| --- | --- |
|
| 96 |
+
| Rendered image | final delivery, reading copies, simple sharing |
|
| 97 |
+
| Inpainted image | external lettering, cleanup review, text-removal workflows |
|
| 98 |
+
| PSD | manual cleanup, touch-up, editable translated text |
|
| 99 |
+
|
| 100 |
+
## Recommended workflow
|
| 101 |
+
|
| 102 |
+
If you care about polish, a good pattern is:
|
| 103 |
+
|
| 104 |
+
1. run detection, OCR, translation, and render in Koharu
|
| 105 |
+
2. export a rendered image for quick review
|
| 106 |
+
3. export a PSD when you want editable text and helper layers for final cleanup
|
docs/how-to/index.md
CHANGED
|
@@ -8,7 +8,11 @@ How-to guides focus on specific jobs you may want to complete with Koharu.
|
|
| 8 |
|
| 9 |
## Common tasks
|
| 10 |
|
| 11 |
-
- [Install Koharu](install-koharu.md)
|
| 12 |
-
- [
|
| 13 |
-
- [
|
| 14 |
-
- [
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| 8 |
|
| 9 |
## Common tasks
|
| 10 |
|
| 11 |
+
- [Install Koharu](install-koharu.md): release setup, first-run downloads, and acceleration expectations
|
| 12 |
+
- [Contributing](contributing.md): repository layout, local commands, validation steps, and PR expectations
|
| 13 |
+
- [Run GUI, Headless, and MCP Modes](run-gui-headless-and-mcp.md): local deployment patterns and runtime flags
|
| 14 |
+
- [Configure MCP Clients](configure-mcp-clients.md): connect Antigravity, Claude Desktop, or Claude Code to Koharu's local MCP endpoint
|
| 15 |
+
- [Use OpenAI-Compatible APIs](use-openai-compatible-api.md): connect LM Studio, OpenRouter, and other OpenAI-style chat-completions endpoints
|
| 16 |
+
- [Export Pages and Manage Projects](export-and-manage-projects.md): rendered images, PSD handoff, and page-set management
|
| 17 |
+
- [Build From Source](build-from-source.md): local developer build flow with Bun, Tauri, and platform features
|
| 18 |
+
- [Troubleshooting](troubleshooting.md): common startup, download, GPU, pipeline, and connectivity failures
|
docs/how-to/install-koharu.md
CHANGED
|
@@ -16,16 +16,28 @@ Koharu provides prebuilt binaries for:
|
|
| 16 |
|
| 17 |
If your platform is not covered by a release build, use [Build From Source](build-from-source.md).
|
| 18 |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| 19 |
## First launch expectations
|
| 20 |
|
| 21 |
On first run, Koharu may:
|
| 22 |
|
| 23 |
-
- extract
|
| 24 |
-
- download
|
| 25 |
-
- download local LLMs
|
| 26 |
|
| 27 |
This is normal and can take time depending on your connection and hardware.
|
| 28 |
|
|
|
|
|
|
|
| 29 |
## GPU acceleration notes
|
| 30 |
|
| 31 |
Koharu supports:
|
|
@@ -35,12 +47,33 @@ Koharu supports:
|
|
| 35 |
- Vulkan on Windows and Linux for OCR and LLM inference
|
| 36 |
- CPU fallback on all platforms
|
| 37 |
|
| 38 |
-
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| 39 |
|
| 40 |
!!! note
|
| 41 |
|
| 42 |
Keep your NVIDIA driver up to date. Koharu checks for CUDA 13.1 support and falls back to CPU if the driver is too old.
|
| 43 |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| 44 |
## Need help?
|
| 45 |
|
| 46 |
For support, join the [Discord server](https://discord.gg/mHvHkxGnUY).
|
|
|
|
| 16 |
|
| 17 |
If your platform is not covered by a release build, use [Build From Source](build-from-source.md).
|
| 18 |
|
| 19 |
+
## What gets installed locally
|
| 20 |
+
|
| 21 |
+
Koharu is a local-first app. In practice, the desktop binary is only part of the installation footprint. The first real run also creates a per-user local data directory for:
|
| 22 |
+
|
| 23 |
+
- runtime libraries used by llama.cpp and GPU backends
|
| 24 |
+
- downloaded vision and OCR models
|
| 25 |
+
- optional local translation models you select later
|
| 26 |
+
|
| 27 |
+
Koharu keeps its own files under a `Koharu` app-data root and stores model weights separately from the application binary.
|
| 28 |
+
|
| 29 |
## First launch expectations
|
| 30 |
|
| 31 |
On first run, Koharu may:
|
| 32 |
|
| 33 |
+
- extract or download runtime libraries required by the local inference stack
|
| 34 |
+
- download the default vision and OCR models used by detection, segmentation, OCR, inpainting, and font estimation
|
| 35 |
+
- wait to download local translation LLMs until you actually select them in Settings
|
| 36 |
|
| 37 |
This is normal and can take time depending on your connection and hardware.
|
| 38 |
|
| 39 |
+
If you want to prefetch those runtime dependencies ahead of time, run Koharu once with `--download`. That path initializes the runtime packages and default vision stack, then exits without opening the GUI.
|
| 40 |
+
|
| 41 |
## GPU acceleration notes
|
| 42 |
|
| 43 |
Koharu supports:
|
|
|
|
| 47 |
- Vulkan on Windows and Linux for OCR and LLM inference
|
| 48 |
- CPU fallback on all platforms
|
| 49 |
|
| 50 |
+
Some practical details matter:
|
| 51 |
+
|
| 52 |
+
- detection and inpainting benefit most from CUDA or Metal
|
| 53 |
+
- Vulkan is mainly the fallback GPU path for OCR and local LLM inference
|
| 54 |
+
- if Koharu cannot verify that your NVIDIA driver supports CUDA 13.1, it falls back to CPU
|
| 55 |
+
|
| 56 |
+
For CUDA-capable systems, Koharu bundles and initializes the runtime pieces it needs instead of requiring you to wire every library path by hand.
|
| 57 |
|
| 58 |
!!! note
|
| 59 |
|
| 60 |
Keep your NVIDIA driver up to date. Koharu checks for CUDA 13.1 support and falls back to CPU if the driver is too old.
|
| 61 |
|
| 62 |
+
## After installation
|
| 63 |
+
|
| 64 |
+
Once Koharu launches successfully, the next decisions are usually:
|
| 65 |
+
|
| 66 |
+
- desktop GUI vs headless mode
|
| 67 |
+
- local translation model vs remote provider
|
| 68 |
+
- rendered export vs layered PSD export
|
| 69 |
+
|
| 70 |
+
See:
|
| 71 |
+
|
| 72 |
+
- [Run GUI, Headless, and MCP Modes](run-gui-headless-and-mcp.md)
|
| 73 |
+
- [Models and Providers](../explanation/models-and-providers.md)
|
| 74 |
+
- [Export Pages and Manage Projects](export-and-manage-projects.md)
|
| 75 |
+
- [Troubleshooting](troubleshooting.md)
|
| 76 |
+
|
| 77 |
## Need help?
|
| 78 |
|
| 79 |
For support, join the [Discord server](https://discord.gg/mHvHkxGnUY).
|
docs/how-to/run-gui-headless-and-mcp.md
CHANGED
|
@@ -4,17 +4,37 @@ title: Run GUI, Headless, and MCP Modes
|
|
| 4 |
|
| 5 |
# Run GUI, Headless, and MCP Modes
|
| 6 |
|
| 7 |
-
Koharu can run as a normal desktop app, a headless local server with a Web UI, or an MCP server for AI agents.
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| 8 |
|
| 9 |
## Run the desktop app
|
| 10 |
|
| 11 |
Launch Koharu normally from your installed application.
|
| 12 |
|
|
|
|
|
|
|
| 13 |
This is the default mode and is the best choice for most users.
|
| 14 |
|
| 15 |
## Run headless mode
|
| 16 |
|
| 17 |
-
Headless mode starts the local
|
| 18 |
|
| 19 |
```bash
|
| 20 |
# macOS / Linux
|
|
@@ -26,9 +46,11 @@ koharu.exe --port 4000 --headless
|
|
| 26 |
|
| 27 |
After startup, open the Web UI at `http://localhost:4000`.
|
| 28 |
|
|
|
|
|
|
|
| 29 |
## Run with a fixed port
|
| 30 |
|
| 31 |
-
By default, Koharu uses a random local port. Use `--port` when you need a stable address.
|
| 32 |
|
| 33 |
```bash
|
| 34 |
# macOS / Linux
|
|
@@ -38,13 +60,40 @@ koharu --port 9999
|
|
| 38 |
koharu.exe --port 9999
|
| 39 |
```
|
| 40 |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| 41 |
## Connect to the MCP server
|
| 42 |
|
| 43 |
-
Koharu includes a built-in MCP server
|
|
|
|
|
|
|
| 44 |
|
| 45 |
`http://localhost:9999/mcp`
|
| 46 |
|
| 47 |
-
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| 48 |
|
| 49 |
## Force CPU mode
|
| 50 |
|
|
@@ -58,9 +107,11 @@ koharu --cpu
|
|
| 58 |
koharu.exe --cpu
|
| 59 |
```
|
| 60 |
|
|
|
|
|
|
|
| 61 |
## Download runtime dependencies only
|
| 62 |
|
| 63 |
-
Use `--download` if you want Koharu to
|
| 64 |
|
| 65 |
```bash
|
| 66 |
# macOS / Linux
|
|
@@ -69,3 +120,24 @@ koharu --download
|
|
| 69 |
# Windows
|
| 70 |
koharu.exe --download
|
| 71 |
```
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| 4 |
|
| 5 |
# Run GUI, Headless, and MCP Modes
|
| 6 |
|
| 7 |
+
Koharu can run as a normal desktop app, a headless local server with a Web UI, or an MCP server for AI agents. These are not separate backends. They all sit on top of the same local runtime and HTTP server.
|
| 8 |
+
|
| 9 |
+
## What stays the same across modes
|
| 10 |
+
|
| 11 |
+
No matter how you launch Koharu, the runtime model is the same:
|
| 12 |
+
|
| 13 |
+
- the server binds to `127.0.0.1`
|
| 14 |
+
- the UI and API are served by the same local process
|
| 15 |
+
- the page pipeline, model loading, and exports use the same internal code paths
|
| 16 |
+
|
| 17 |
+
That is why desktop editing, headless automation, and MCP tooling stay aligned.
|
| 18 |
+
|
| 19 |
+
## Mode summary
|
| 20 |
+
|
| 21 |
+
| Mode | Desktop window | Local server | Typical use |
|
| 22 |
+
| --- | --- | --- | --- |
|
| 23 |
+
| Desktop | yes | yes | normal interactive editing |
|
| 24 |
+
| Headless | no | yes | local Web UI, scripting, automation |
|
| 25 |
+
| MCP | optional | yes | agent tooling through `/mcp` |
|
| 26 |
|
| 27 |
## Run the desktop app
|
| 28 |
|
| 29 |
Launch Koharu normally from your installed application.
|
| 30 |
|
| 31 |
+
Even in desktop mode, Koharu still starts a local HTTP server internally. The embedded window talks to that local server rather than calling the pipeline directly.
|
| 32 |
+
|
| 33 |
This is the default mode and is the best choice for most users.
|
| 34 |
|
| 35 |
## Run headless mode
|
| 36 |
|
| 37 |
+
Headless mode starts the local server without opening the desktop GUI.
|
| 38 |
|
| 39 |
```bash
|
| 40 |
# macOS / Linux
|
|
|
|
| 46 |
|
| 47 |
After startup, open the Web UI at `http://localhost:4000`.
|
| 48 |
|
| 49 |
+
Headless mode stays in the foreground until you stop it, typically with `Ctrl+C`.
|
| 50 |
+
|
| 51 |
## Run with a fixed port
|
| 52 |
|
| 53 |
+
By default, Koharu uses a random local port. Use `--port` when you need a stable address for bookmarks, scripts, reverse proxies, or MCP clients.
|
| 54 |
|
| 55 |
```bash
|
| 56 |
# macOS / Linux
|
|
|
|
| 60 |
koharu.exe --port 9999
|
| 61 |
```
|
| 62 |
|
| 63 |
+
If you do not specify `--port`, Koharu still starts the server, but the chosen port is dynamic.
|
| 64 |
+
|
| 65 |
+
## Connect to the local API
|
| 66 |
+
|
| 67 |
+
When Koharu is running on a fixed port, the main endpoints are:
|
| 68 |
+
|
| 69 |
+
- Web UI: `http://localhost:9999/`
|
| 70 |
+
- RPC / HTTP API: `http://localhost:9999/api/v1`
|
| 71 |
+
- MCP server: `http://localhost:9999/mcp`
|
| 72 |
+
|
| 73 |
+
Replace `9999` with the port you chose.
|
| 74 |
+
|
| 75 |
+
Because Koharu binds to loopback, these endpoints are local by default. If you want remote access from another machine, you need to expose that port yourself through your own network setup.
|
| 76 |
+
|
| 77 |
+
For endpoint-level details, see [HTTP API Reference](../reference/http-api.md).
|
| 78 |
+
|
| 79 |
## Connect to the MCP server
|
| 80 |
|
| 81 |
+
Koharu includes a built-in MCP server using the same loaded documents, models, and page pipeline as the rest of the app.
|
| 82 |
+
|
| 83 |
+
Point your MCP client or agent at:
|
| 84 |
|
| 85 |
`http://localhost:9999/mcp`
|
| 86 |
|
| 87 |
+
This is useful when you want an agent to:
|
| 88 |
+
|
| 89 |
+
- inspect text blocks
|
| 90 |
+
- run OCR or translation
|
| 91 |
+
- export rendered pages
|
| 92 |
+
- automate review or batch workflows
|
| 93 |
+
|
| 94 |
+
For client-specific setup examples, see [Configure MCP Clients](configure-mcp-clients.md).
|
| 95 |
+
|
| 96 |
+
For the built-in tool list itself, see [MCP Tools Reference](../reference/mcp-tools.md).
|
| 97 |
|
| 98 |
## Force CPU mode
|
| 99 |
|
|
|
|
| 107 |
koharu.exe --cpu
|
| 108 |
```
|
| 109 |
|
| 110 |
+
This is useful for compatibility testing, driver issues, or low-risk debugging when GPU setup is uncertain.
|
| 111 |
+
|
| 112 |
## Download runtime dependencies only
|
| 113 |
|
| 114 |
+
Use `--download` if you want Koharu to prefetch runtime dependencies and exit without starting the app.
|
| 115 |
|
| 116 |
```bash
|
| 117 |
# macOS / Linux
|
|
|
|
| 120 |
# Windows
|
| 121 |
koharu.exe --download
|
| 122 |
```
|
| 123 |
+
|
| 124 |
+
In the current implementation, this path initializes:
|
| 125 |
+
|
| 126 |
+
- runtime libraries used by the local inference stack
|
| 127 |
+
- the default vision and OCR models
|
| 128 |
+
|
| 129 |
+
It does not predownload every optional local translation LLM. Those are still fetched when you select them in Settings.
|
| 130 |
+
|
| 131 |
+
## Enable debug output
|
| 132 |
+
|
| 133 |
+
Use `--debug` when you want console-oriented startup with log output.
|
| 134 |
+
|
| 135 |
+
```bash
|
| 136 |
+
# macOS / Linux
|
| 137 |
+
koharu --debug
|
| 138 |
+
|
| 139 |
+
# Windows
|
| 140 |
+
koharu.exe --debug
|
| 141 |
+
```
|
| 142 |
+
|
| 143 |
+
On Windows, debug and headless runs also influence how Koharu attaches to or creates a console window.
|
docs/how-to/troubleshooting.md
ADDED
|
@@ -0,0 +1,276 @@
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| 1 |
+
---
|
| 2 |
+
title: Troubleshooting
|
| 3 |
+
---
|
| 4 |
+
|
| 5 |
+
# Troubleshooting
|
| 6 |
+
|
| 7 |
+
This page covers the most common Koharu problems that follow from the current implementation: first-run downloads, runtime initialization, GPU fallback, headless and MCP access, pipeline-stage ordering, and source-build setup.
|
| 8 |
+
|
| 9 |
+
## Before you start
|
| 10 |
+
|
| 11 |
+
When troubleshooting, first identify which layer is failing:
|
| 12 |
+
|
| 13 |
+
- application startup
|
| 14 |
+
- runtime or model downloads
|
| 15 |
+
- GPU acceleration
|
| 16 |
+
- page pipeline stages such as detect, OCR, inpaint, or render
|
| 17 |
+
- headless or MCP connectivity
|
| 18 |
+
- source build and local development
|
| 19 |
+
|
| 20 |
+
That usually narrows the problem quickly.
|
| 21 |
+
|
| 22 |
+
## Koharu does not start cleanly on first launch
|
| 23 |
+
|
| 24 |
+
Possible causes:
|
| 25 |
+
|
| 26 |
+
- runtime libraries have not finished downloading or extracting yet
|
| 27 |
+
- the first-run model downloads are still in progress
|
| 28 |
+
- the machine is missing local permissions for its app-data directory
|
| 29 |
+
- GPU initialization failed and the app is trying to fall back
|
| 30 |
+
|
| 31 |
+
Try this:
|
| 32 |
+
|
| 33 |
+
1. wait longer on the very first launch, especially on slower disks or networks
|
| 34 |
+
2. start Koharu once with `--download` to prefetch runtime dependencies without opening the GUI
|
| 35 |
+
3. start once with `--cpu` to check whether the problem is GPU-related
|
| 36 |
+
4. start once with `--debug` to get console-oriented logs
|
| 37 |
+
|
| 38 |
+
```bash
|
| 39 |
+
# macOS / Linux
|
| 40 |
+
koharu --download
|
| 41 |
+
koharu --cpu
|
| 42 |
+
koharu --debug
|
| 43 |
+
|
| 44 |
+
# Windows
|
| 45 |
+
koharu.exe --download
|
| 46 |
+
koharu.exe --cpu
|
| 47 |
+
koharu.exe --debug
|
| 48 |
+
```
|
| 49 |
+
|
| 50 |
+
If `--cpu` works and the normal launch does not, the problem is usually in the GPU path rather than the general app startup path.
|
| 51 |
+
|
| 52 |
+
## Model or runtime downloads fail
|
| 53 |
+
|
| 54 |
+
Koharu needs network access on first use for:
|
| 55 |
+
|
| 56 |
+
- llama.cpp runtime packages
|
| 57 |
+
- GPU runtime support files where applicable
|
| 58 |
+
- the default vision and OCR model stack
|
| 59 |
+
- optional local translation models when selected later
|
| 60 |
+
|
| 61 |
+
Likely causes:
|
| 62 |
+
|
| 63 |
+
- intermittent network failures
|
| 64 |
+
- blocked access to GitHub release assets or model hosting
|
| 65 |
+
- local filesystem permission issues in the app-data directory
|
| 66 |
+
|
| 67 |
+
What to check:
|
| 68 |
+
|
| 69 |
+
- whether GitHub and Hugging Face downloads are reachable from the machine
|
| 70 |
+
- whether retrying `--download` succeeds
|
| 71 |
+
- whether another process or security tool is locking files in the local runtime directory
|
| 72 |
+
|
| 73 |
+
If downloads keep failing, test on a different network first. That quickly distinguishes a machine-local problem from an upstream reachability issue.
|
| 74 |
+
|
| 75 |
+
## Koharu falls back to CPU even though you have an NVIDIA GPU
|
| 76 |
+
|
| 77 |
+
This is expected when Koharu cannot confirm support for CUDA 13.1.
|
| 78 |
+
|
| 79 |
+
The current runtime behavior is:
|
| 80 |
+
|
| 81 |
+
- detect an NVIDIA driver
|
| 82 |
+
- query driver compatibility
|
| 83 |
+
- continue on CUDA only when the driver reports CUDA 13.1 support
|
| 84 |
+
- otherwise fall back to CPU
|
| 85 |
+
|
| 86 |
+
Try this:
|
| 87 |
+
|
| 88 |
+
1. update the NVIDIA driver
|
| 89 |
+
2. restart Koharu after the update
|
| 90 |
+
3. verify behavior with `--debug`
|
| 91 |
+
|
| 92 |
+
If the driver is old or the CUDA check fails, Koharu deliberately prefers CPU over a partially working CUDA configuration.
|
| 93 |
+
|
| 94 |
+
## OCR, inpainting, or export says something is missing
|
| 95 |
+
|
| 96 |
+
Some errors are just pipeline ordering problems.
|
| 97 |
+
|
| 98 |
+
Common examples from the current API and MCP layer:
|
| 99 |
+
|
| 100 |
+
- `No segment mask available. Run detect first.`
|
| 101 |
+
- `No rendered image found`
|
| 102 |
+
- `No inpainted image found`
|
| 103 |
+
|
| 104 |
+
These usually mean a required earlier stage has not produced its output yet.
|
| 105 |
+
|
| 106 |
+
Use this order:
|
| 107 |
+
|
| 108 |
+
1. Detect
|
| 109 |
+
2. OCR
|
| 110 |
+
3. Inpaint
|
| 111 |
+
4. LLM Generate
|
| 112 |
+
5. Render
|
| 113 |
+
6. Export
|
| 114 |
+
|
| 115 |
+
If export fails because there is no rendered or inpainted layer, rerun the missing stage instead of retrying export repeatedly.
|
| 116 |
+
|
| 117 |
+
## Detection or OCR quality is poor on a page
|
| 118 |
+
|
| 119 |
+
Common causes:
|
| 120 |
+
|
| 121 |
+
- low-resolution source images
|
| 122 |
+
- unusual page crops
|
| 123 |
+
- heavy screentones or noisy scans
|
| 124 |
+
- vertical text mixed with difficult artwork
|
| 125 |
+
- badly placed or duplicated text blocks after detection
|
| 126 |
+
|
| 127 |
+
Try this:
|
| 128 |
+
|
| 129 |
+
1. start from a cleaner page image if possible
|
| 130 |
+
2. inspect the detected text blocks before translating
|
| 131 |
+
3. fix obvious bad blocks before running the rest of the pipeline
|
| 132 |
+
4. rerun later stages after the structural fixes
|
| 133 |
+
|
| 134 |
+
If the structure is wrong, translation quality usually gets worse downstream because OCR and rendering both depend on the block geometry.
|
| 135 |
+
|
| 136 |
+
## Headless mode starts, but you cannot open the Web UI
|
| 137 |
+
|
| 138 |
+
Check the basics first:
|
| 139 |
+
|
| 140 |
+
- did you pass `--headless`
|
| 141 |
+
- did you choose a fixed port
|
| 142 |
+
- is the process still running
|
| 143 |
+
|
| 144 |
+
Example:
|
| 145 |
+
|
| 146 |
+
```bash
|
| 147 |
+
koharu --port 4000 --headless
|
| 148 |
+
```
|
| 149 |
+
|
| 150 |
+
Then open:
|
| 151 |
+
|
| 152 |
+
```text
|
| 153 |
+
http://localhost:4000
|
| 154 |
+
```
|
| 155 |
+
|
| 156 |
+
Important implementation detail:
|
| 157 |
+
|
| 158 |
+
- Koharu binds to `127.0.0.1`
|
| 159 |
+
|
| 160 |
+
That means the local Web UI is only available on the same machine unless you expose it yourself through your own networking setup.
|
| 161 |
+
|
| 162 |
+
Also verify that another process is not already using the selected port.
|
| 163 |
+
|
| 164 |
+
## The MCP client cannot connect
|
| 165 |
+
|
| 166 |
+
Use a fixed port and point the client to:
|
| 167 |
+
|
| 168 |
+
```text
|
| 169 |
+
http://localhost:9999/mcp
|
| 170 |
+
```
|
| 171 |
+
|
| 172 |
+
Common mistakes:
|
| 173 |
+
|
| 174 |
+
- using the root URL instead of `/mcp`
|
| 175 |
+
- forgetting `--port`
|
| 176 |
+
- trying to connect after the Koharu process has already exited
|
| 177 |
+
- trying to reach the service from another machine without explicitly exposing the port
|
| 178 |
+
|
| 179 |
+
If normal headless Web UI access works but MCP does not, check the exact URL first. Wrong path selection is more common than server failure.
|
| 180 |
+
|
| 181 |
+
If the client is Antigravity, Claude Desktop, or Claude Code, follow the client-specific setup in [Configure MCP Clients](configure-mcp-clients.md).
|
| 182 |
+
|
| 183 |
+
## Import appears to do nothing
|
| 184 |
+
|
| 185 |
+
The current documented import flow is image-based. Koharu accepts:
|
| 186 |
+
|
| 187 |
+
- `.png`
|
| 188 |
+
- `.jpg`
|
| 189 |
+
- `.jpeg`
|
| 190 |
+
- `.webp`
|
| 191 |
+
|
| 192 |
+
Folder import recursively filters files to those extensions only.
|
| 193 |
+
|
| 194 |
+
If a folder import seems empty, check whether the folder actually contains supported image files instead of archives, PSDs, or other formats.
|
| 195 |
+
|
| 196 |
+
## Export fails or gives you the wrong kind of output
|
| 197 |
+
|
| 198 |
+
Use the output type that matches the current pipeline state:
|
| 199 |
+
|
| 200 |
+
- rendered export requires a rendered layer
|
| 201 |
+
- inpainted export requires an inpainted layer
|
| 202 |
+
- PSD export is the best choice when you still want editable text and helper layers
|
| 203 |
+
|
| 204 |
+
Also remember:
|
| 205 |
+
|
| 206 |
+
- rendered exports use a `_koharu` suffix
|
| 207 |
+
- inpainted exports use an `_inpainted` suffix
|
| 208 |
+
- PSD export uses `_koharu.psd`
|
| 209 |
+
- classic PSD export rejects images above `30000 x 30000`
|
| 210 |
+
|
| 211 |
+
If the page is extremely large, resize or split it before expecting PSD export to succeed.
|
| 212 |
+
|
| 213 |
+
## Source build fails on Windows
|
| 214 |
+
|
| 215 |
+
The Windows build helper expects:
|
| 216 |
+
|
| 217 |
+
- `nvcc` for the default CUDA build path
|
| 218 |
+
- `cl.exe` from Visual Studio C++ tools
|
| 219 |
+
|
| 220 |
+
The Bun wrapper script tries to discover both automatically, but if either one is missing the build can fail before Tauri finishes launching.
|
| 221 |
+
|
| 222 |
+
Use the project wrapper commands:
|
| 223 |
+
|
| 224 |
+
```bash
|
| 225 |
+
bun install
|
| 226 |
+
bun run build
|
| 227 |
+
```
|
| 228 |
+
|
| 229 |
+
If you want direct control over the Tauri command, try:
|
| 230 |
+
|
| 231 |
+
```bash
|
| 232 |
+
bun tauri build --release --no-bundle
|
| 233 |
+
```
|
| 234 |
+
|
| 235 |
+
If you want lower-level Rust builds, prefer:
|
| 236 |
+
|
| 237 |
+
```bash
|
| 238 |
+
bun cargo build --release -p koharu --features=cuda
|
| 239 |
+
```
|
| 240 |
+
|
| 241 |
+
If you only need to confirm that the app works at all, try a CPU-only runtime launch first instead of debugging the full CUDA toolchain immediately.
|
| 242 |
+
|
| 243 |
+
## Source build fails because of the chosen feature path
|
| 244 |
+
|
| 245 |
+
The desktop build is platform-aware:
|
| 246 |
+
|
| 247 |
+
- Windows and Linux use `cuda`
|
| 248 |
+
- macOS on Apple Silicon uses `metal`
|
| 249 |
+
|
| 250 |
+
If you manually invoke lower-level cargo commands with the wrong feature set for your platform, the build can fail or produce a mismatched binary. Follow the platform examples in [Build From Source](build-from-source.md).
|
| 251 |
+
|
| 252 |
+
## When to stop debugging locally
|
| 253 |
+
|
| 254 |
+
You have probably isolated the issue enough to report it when:
|
| 255 |
+
|
| 256 |
+
- `--cpu` works but GPU mode does not
|
| 257 |
+
- `--download` consistently fails on a healthy network
|
| 258 |
+
- the same page repeatedly triggers a reproducible pipeline failure
|
| 259 |
+
- headless mode starts but a correct `localhost` URL still fails
|
| 260 |
+
|
| 261 |
+
At that point, collect:
|
| 262 |
+
|
| 263 |
+
- your OS and hardware
|
| 264 |
+
- the exact command you ran
|
| 265 |
+
- whether `--cpu` changes the result
|
| 266 |
+
- the exact error message
|
| 267 |
+
- whether the issue happens on one page or every page
|
| 268 |
+
|
| 269 |
+
## Related pages
|
| 270 |
+
|
| 271 |
+
- [Install Koharu](install-koharu.md)
|
| 272 |
+
- [Run GUI, Headless, and MCP Modes](run-gui-headless-and-mcp.md)
|
| 273 |
+
- [Configure MCP Clients](configure-mcp-clients.md)
|
| 274 |
+
- [Build From Source](build-from-source.md)
|
| 275 |
+
- [CLI Reference](../reference/cli.md)
|
| 276 |
+
- [Technical Deep Dive](../explanation/technical-deep-dive.md)
|
docs/how-to/use-openai-compatible-api.md
ADDED
|
@@ -0,0 +1,139 @@
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| 1 |
+
---
|
| 2 |
+
title: Use OpenAI-Compatible APIs
|
| 3 |
+
---
|
| 4 |
+
|
| 5 |
+
# Use OpenAI-Compatible APIs
|
| 6 |
+
|
| 7 |
+
Koharu can translate through APIs that follow the OpenAI Chat Completions shape. That includes local servers such as LM Studio and hosted routers such as OpenRouter.
|
| 8 |
+
|
| 9 |
+
This page is specifically about the current OpenAI-compatible path in Koharu. It is different from Koharu's built-in OpenAI, Gemini, Claude, and DeepSeek provider presets.
|
| 10 |
+
|
| 11 |
+
## What Koharu expects from a compatible endpoint
|
| 12 |
+
|
| 13 |
+
In the current implementation, Koharu expects:
|
| 14 |
+
|
| 15 |
+
- a base URL that points at the API root, usually ending in `/v1`
|
| 16 |
+
- `GET /models` for connection testing
|
| 17 |
+
- `POST /chat/completions` for translation
|
| 18 |
+
- a response that includes `choices[0].message.content`
|
| 19 |
+
- bearer-token authentication when an API key is provided
|
| 20 |
+
|
| 21 |
+
Some implementation details matter:
|
| 22 |
+
|
| 23 |
+
- Koharu trims whitespace and a trailing slash from the base URL before appending `/models` or `/chat/completions`
|
| 24 |
+
- an empty API key is omitted entirely instead of sending an empty `Authorization` header
|
| 25 |
+
- a compatible model only appears in Koharu's LLM picker after both `Base URL` and `Model name` are filled in
|
| 26 |
+
- each configured preset shows up as its own selectable source in the LLM picker
|
| 27 |
+
|
| 28 |
+
That means OpenAI-compatible here really means OpenAI API-compatible, not just "can be used with OpenAI tools in general."
|
| 29 |
+
|
| 30 |
+
## Where to configure it in Koharu
|
| 31 |
+
|
| 32 |
+
Open **Settings** and scroll to **Local LLM & OpenAI Compatible Providers**.
|
| 33 |
+
|
| 34 |
+
The current UI exposes:
|
| 35 |
+
|
| 36 |
+
- a preset selector: `Ollama`, `LM Studio`, `Preset 1`, `Preset 2`
|
| 37 |
+
- `Base URL`
|
| 38 |
+
- `API Key (optional)`
|
| 39 |
+
- `Model name`
|
| 40 |
+
- `Test Connection`
|
| 41 |
+
- advanced fields for `Temperature`, `Max tokens`, and a custom system prompt
|
| 42 |
+
|
| 43 |
+
`Test Connection` currently calls `/models` with a 5-second timeout and reports whether Koharu connected successfully, how many model IDs the endpoint returned, and the measured latency.
|
| 44 |
+
|
| 45 |
+
## LM Studio
|
| 46 |
+
|
| 47 |
+
Use the built-in `LM Studio` preset when you want a local model server on the same machine.
|
| 48 |
+
|
| 49 |
+
1. Start LM Studio's local server.
|
| 50 |
+
2. In Koharu, open **Settings**.
|
| 51 |
+
3. Choose the `LM Studio` preset.
|
| 52 |
+
4. Set `Base URL` to `http://127.0.0.1:1234/v1`.
|
| 53 |
+
5. Leave `API Key` empty unless you configured authentication in front of LM Studio.
|
| 54 |
+
6. Enter the exact LM Studio model identifier in `Model name`.
|
| 55 |
+
7. Click `Test Connection`.
|
| 56 |
+
8. Open Koharu's LLM picker and select the LM Studio-backed model entry.
|
| 57 |
+
|
| 58 |
+
Notes:
|
| 59 |
+
|
| 60 |
+
- Koharu's default LM Studio preset already uses `http://127.0.0.1:1234/v1`
|
| 61 |
+
- LM Studio's official docs use the same OpenAI-compatible base path on port `1234`
|
| 62 |
+
- Koharu's connection test only shows the model count, not the full model names, so you still need to know the exact model ID you want to use
|
| 63 |
+
|
| 64 |
+
If you are unsure about the model identifier, query LM Studio directly:
|
| 65 |
+
|
| 66 |
+
```bash
|
| 67 |
+
curl http://127.0.0.1:1234/v1/models
|
| 68 |
+
```
|
| 69 |
+
|
| 70 |
+
Then copy the `id` field for the model you want.
|
| 71 |
+
|
| 72 |
+
Official references:
|
| 73 |
+
|
| 74 |
+
- [LM Studio OpenAI compatibility docs](https://lmstudio.ai/docs/developer/openai-compat)
|
| 75 |
+
- [LM Studio list models endpoint](https://lmstudio.ai/docs/developer/openai-compat/models)
|
| 76 |
+
|
| 77 |
+
## OpenRouter
|
| 78 |
+
|
| 79 |
+
Use `Preset 1` or `Preset 2` for hosted OpenAI-compatible services such as OpenRouter. That avoids overwriting the local LM Studio preset.
|
| 80 |
+
|
| 81 |
+
1. Create an API key in OpenRouter.
|
| 82 |
+
2. In Koharu, open **Settings**.
|
| 83 |
+
3. Choose `Preset 1` or `Preset 2`.
|
| 84 |
+
4. Set `Base URL` to `https://openrouter.ai/api/v1`.
|
| 85 |
+
5. Paste your OpenRouter API key into `API Key`.
|
| 86 |
+
6. Enter the exact OpenRouter model ID in `Model name`.
|
| 87 |
+
7. Click `Test Connection`.
|
| 88 |
+
8. Select that preset-backed model from Koharu's LLM picker.
|
| 89 |
+
|
| 90 |
+
Important details:
|
| 91 |
+
|
| 92 |
+
- OpenRouter model IDs should include the organization prefix, not just a display name
|
| 93 |
+
- Koharu currently sends standard bearer auth and a normal OpenAI-style chat-completions request body
|
| 94 |
+
- OpenRouter supports extra headers such as `HTTP-Referer` and `X-OpenRouter-Title`, but Koharu does not currently expose fields for those optional headers
|
| 95 |
+
|
| 96 |
+
Official references:
|
| 97 |
+
|
| 98 |
+
- [OpenRouter API overview](https://openrouter.ai/docs/api/reference/overview)
|
| 99 |
+
- [OpenRouter authentication](https://openrouter.ai/docs/api/reference/authentication)
|
| 100 |
+
- [OpenRouter models](https://openrouter.ai/models)
|
| 101 |
+
|
| 102 |
+
## Other compatible endpoints
|
| 103 |
+
|
| 104 |
+
For other self-hosted or routed APIs, use the same checklist:
|
| 105 |
+
|
| 106 |
+
- use the API root as `Base URL`, not the full `/chat/completions` URL
|
| 107 |
+
- make sure the endpoint supports `GET /models`
|
| 108 |
+
- make sure it supports `POST /chat/completions`
|
| 109 |
+
- use the exact model `id`, not just a marketing name
|
| 110 |
+
- provide an API key if the server requires bearer authentication
|
| 111 |
+
|
| 112 |
+
If the server only implements `Responses` or some custom schema, Koharu's current OpenAI-compatible integration will not work without an adapter or proxy because Koharu currently talks to `chat/completions`.
|
| 113 |
+
|
| 114 |
+
## How model selection works in practice
|
| 115 |
+
|
| 116 |
+
Koharu does not treat these endpoints as one generic remote bucket. Instead, each configured preset becomes its own LLM entry source.
|
| 117 |
+
|
| 118 |
+
For example:
|
| 119 |
+
|
| 120 |
+
- `LM Studio` can point at a local server
|
| 121 |
+
- `Preset 1` can point at OpenRouter
|
| 122 |
+
- `Preset 2` can point at another self-hosted OpenAI-compatible API
|
| 123 |
+
|
| 124 |
+
That lets you keep multiple compatible backends configured and switch between them from the normal LLM picker.
|
| 125 |
+
|
| 126 |
+
## Common mistakes
|
| 127 |
+
|
| 128 |
+
- using a base URL without `/v1`
|
| 129 |
+
- pasting the full `/chat/completions` URL into `Base URL`
|
| 130 |
+
- leaving `Model name` empty and expecting the model to appear anyway
|
| 131 |
+
- using a display label instead of the exact API model ID
|
| 132 |
+
- assuming `Test Connection` loads or selects a model for you
|
| 133 |
+
- trying to use an endpoint that only supports the newer `Responses` API
|
| 134 |
+
|
| 135 |
+
## Related pages
|
| 136 |
+
|
| 137 |
+
- [Models and Providers](../explanation/models-and-providers.md)
|
| 138 |
+
- [Translate Your First Page](../tutorials/translate-your-first-page.md)
|
| 139 |
+
- [Troubleshooting](troubleshooting.md)
|
docs/reference/cli.md
CHANGED
|
@@ -6,6 +6,13 @@ title: CLI Reference
|
|
| 6 |
|
| 7 |
This page covers the command-line options exposed by Koharu's desktop binary.
|
| 8 |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| 9 |
## Common usage
|
| 10 |
|
| 11 |
```bash
|
|
@@ -20,11 +27,26 @@ koharu.exe [OPTIONS]
|
|
| 20 |
|
| 21 |
| Option | Meaning |
|
| 22 |
| --- | --- |
|
| 23 |
-
| `-d`, `--download` |
|
| 24 |
| `--cpu` | Force CPU mode even when a GPU is available |
|
| 25 |
-
| `-p`, `--port <PORT>` | Bind the local HTTP server to a specific port |
|
| 26 |
| `--headless` | Run without starting the desktop GUI |
|
| 27 |
-
| `--debug` | Enable debug
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| 28 |
|
| 29 |
## Common patterns
|
| 30 |
|
|
@@ -45,3 +67,21 @@ Download runtime packages ahead of time:
|
|
| 45 |
```bash
|
| 46 |
koharu --download
|
| 47 |
```
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| 6 |
|
| 7 |
This page covers the command-line options exposed by Koharu's desktop binary.
|
| 8 |
|
| 9 |
+
Koharu uses the same binary for:
|
| 10 |
+
|
| 11 |
+
- desktop startup
|
| 12 |
+
- headless local Web UI
|
| 13 |
+
- the local HTTP API
|
| 14 |
+
- the built-in MCP server
|
| 15 |
+
|
| 16 |
## Common usage
|
| 17 |
|
| 18 |
```bash
|
|
|
|
| 27 |
|
| 28 |
| Option | Meaning |
|
| 29 |
| --- | --- |
|
| 30 |
+
| `-d`, `--download` | Prefetch runtime libraries and the default vision and OCR stack, then exit |
|
| 31 |
| `--cpu` | Force CPU mode even when a GPU is available |
|
| 32 |
+
| `-p`, `--port <PORT>` | Bind the local HTTP server to a specific `127.0.0.1` port instead of a random one |
|
| 33 |
| `--headless` | Run without starting the desktop GUI |
|
| 34 |
+
| `--debug` | Enable debug-oriented console output |
|
| 35 |
+
|
| 36 |
+
## Behavior notes
|
| 37 |
+
|
| 38 |
+
Some flags change more than just startup appearance:
|
| 39 |
+
|
| 40 |
+
- without `--port`, Koharu chooses a random local port
|
| 41 |
+
- with `--headless`, Koharu skips the Tauri window but still serves the Web UI and API
|
| 42 |
+
- with `--download`, Koharu exits after dependency prefetch and does not stay running
|
| 43 |
+
- with `--cpu`, both the vision stack and local LLM path avoid GPU acceleration
|
| 44 |
+
|
| 45 |
+
When a fixed port is set, the main local endpoints are:
|
| 46 |
+
|
| 47 |
+
- `http://localhost:<PORT>/`
|
| 48 |
+
- `http://localhost:<PORT>/api/v1`
|
| 49 |
+
- `http://localhost:<PORT>/mcp`
|
| 50 |
|
| 51 |
## Common patterns
|
| 52 |
|
|
|
|
| 67 |
```bash
|
| 68 |
koharu --download
|
| 69 |
```
|
| 70 |
+
|
| 71 |
+
Run a local MCP endpoint on a stable port:
|
| 72 |
+
|
| 73 |
+
```bash
|
| 74 |
+
koharu --port 9999
|
| 75 |
+
```
|
| 76 |
+
|
| 77 |
+
Then connect your MCP client to:
|
| 78 |
+
|
| 79 |
+
```text
|
| 80 |
+
http://localhost:9999/mcp
|
| 81 |
+
```
|
| 82 |
+
|
| 83 |
+
Start with explicit debug logging:
|
| 84 |
+
|
| 85 |
+
```bash
|
| 86 |
+
koharu --debug
|
| 87 |
+
```
|
docs/reference/http-api.md
ADDED
|
@@ -0,0 +1,196 @@
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| 1 |
+
---
|
| 2 |
+
title: HTTP API Reference
|
| 3 |
+
---
|
| 4 |
+
|
| 5 |
+
# HTTP API Reference
|
| 6 |
+
|
| 7 |
+
Koharu exposes a local HTTP API under:
|
| 8 |
+
|
| 9 |
+
```text
|
| 10 |
+
http://127.0.0.1:<PORT>/api/v1
|
| 11 |
+
```
|
| 12 |
+
|
| 13 |
+
This is the same API used by the desktop UI and headless Web UI.
|
| 14 |
+
|
| 15 |
+
## Runtime model
|
| 16 |
+
|
| 17 |
+
Important behavior from the current implementation:
|
| 18 |
+
|
| 19 |
+
- the API is served by the same process as the GUI or headless runtime
|
| 20 |
+
- the server binds to `127.0.0.1` by default
|
| 21 |
+
- the API and MCP server share the same loaded documents, models, and pipeline state
|
| 22 |
+
- when no `--port` is provided, Koharu chooses a random local port
|
| 23 |
+
|
| 24 |
+
## Common response shapes
|
| 25 |
+
|
| 26 |
+
Frequently used types include:
|
| 27 |
+
|
| 28 |
+
- `MetaInfo`: app version and ML device
|
| 29 |
+
- `DocumentSummary`: document id, name, size, revision, layer availability, and text-block count
|
| 30 |
+
- `DocumentDetail`: full document metadata plus text blocks
|
| 31 |
+
- `JobState`: current pipeline job progress
|
| 32 |
+
- `LlmState`: current LLM load state
|
| 33 |
+
- `ImportResult`: imported document count and summaries
|
| 34 |
+
- `ExportResult`: count of exported files
|
| 35 |
+
|
| 36 |
+
## Endpoints
|
| 37 |
+
|
| 38 |
+
### Meta and fonts
|
| 39 |
+
|
| 40 |
+
| Method | Path | Purpose |
|
| 41 |
+
| --- | --- | --- |
|
| 42 |
+
| `GET` | `/meta` | get app version and active ML backend |
|
| 43 |
+
| `GET` | `/fonts` | list font families available for rendering |
|
| 44 |
+
|
| 45 |
+
### Documents
|
| 46 |
+
|
| 47 |
+
| Method | Path | Purpose |
|
| 48 |
+
| --- | --- | --- |
|
| 49 |
+
| `GET` | `/documents` | list loaded documents |
|
| 50 |
+
| `POST` | `/documents/import?mode=replace` | replace the current document set with uploaded images |
|
| 51 |
+
| `POST` | `/documents/import?mode=append` | append uploaded images to the current document set |
|
| 52 |
+
| `GET` | `/documents/{documentId}` | get one document and all text-block metadata |
|
| 53 |
+
| `GET` | `/documents/{documentId}/thumbnail` | get a thumbnail image |
|
| 54 |
+
| `GET` | `/documents/{documentId}/layers/{layer}` | fetch one image layer |
|
| 55 |
+
|
| 56 |
+
The import endpoint uses multipart form data with repeated `files` fields.
|
| 57 |
+
|
| 58 |
+
Document layers currently exposed by the implementation include:
|
| 59 |
+
|
| 60 |
+
- `original`
|
| 61 |
+
- `segment`
|
| 62 |
+
- `inpainted`
|
| 63 |
+
- `brush`
|
| 64 |
+
- `rendered`
|
| 65 |
+
|
| 66 |
+
### Page pipeline
|
| 67 |
+
|
| 68 |
+
| Method | Path | Purpose |
|
| 69 |
+
| --- | --- | --- |
|
| 70 |
+
| `POST` | `/documents/{documentId}/detect` | detect text blocks and layout |
|
| 71 |
+
| `POST` | `/documents/{documentId}/ocr` | run OCR on detected text blocks |
|
| 72 |
+
| `POST` | `/documents/{documentId}/inpaint` | remove original text using the current mask |
|
| 73 |
+
| `POST` | `/documents/{documentId}/render` | render translated text |
|
| 74 |
+
| `POST` | `/documents/{documentId}/translate` | generate translations for one block or the full page |
|
| 75 |
+
| `PUT` | `/documents/{documentId}/mask-region` | replace or update part of the segmentation mask |
|
| 76 |
+
| `PUT` | `/documents/{documentId}/brush-region` | write a patch into the brush layer |
|
| 77 |
+
| `POST` | `/documents/{documentId}/inpaint-region` | re-inpaint a rectangular region only |
|
| 78 |
+
|
| 79 |
+
Useful request details:
|
| 80 |
+
|
| 81 |
+
- `/render` accepts `textBlockId`, `shaderEffect`, `shaderStroke`, and `fontFamily`
|
| 82 |
+
- `/translate` accepts `textBlockId` and `language`
|
| 83 |
+
- `/mask-region` accepts `data` plus an optional `region`
|
| 84 |
+
- `/brush-region` accepts `data` plus a required `region`
|
| 85 |
+
- `/inpaint-region` accepts a rectangular `region`
|
| 86 |
+
|
| 87 |
+
## Text blocks
|
| 88 |
+
|
| 89 |
+
| Method | Path | Purpose |
|
| 90 |
+
| --- | --- | --- |
|
| 91 |
+
| `POST` | `/documents/{documentId}/text-blocks` | create a new text block from `x`, `y`, `width`, `height` |
|
| 92 |
+
| `PATCH` | `/documents/{documentId}/text-blocks/{textBlockId}` | patch text, translation, box geometry, or style |
|
| 93 |
+
| `DELETE` | `/documents/{documentId}/text-blocks/{textBlockId}` | remove a text block |
|
| 94 |
+
|
| 95 |
+
The text-block patch shape currently includes:
|
| 96 |
+
|
| 97 |
+
- `text`
|
| 98 |
+
- `translation`
|
| 99 |
+
- `x`
|
| 100 |
+
- `y`
|
| 101 |
+
- `width`
|
| 102 |
+
- `height`
|
| 103 |
+
- `style`
|
| 104 |
+
|
| 105 |
+
`style` can include font families, font size, RGBA color, text alignment, italic and bold flags, and stroke configuration.
|
| 106 |
+
|
| 107 |
+
## Export
|
| 108 |
+
|
| 109 |
+
| Method | Path | Purpose |
|
| 110 |
+
| --- | --- | --- |
|
| 111 |
+
| `GET` | `/documents/{documentId}/export?layer=rendered` | export one rendered image |
|
| 112 |
+
| `GET` | `/documents/{documentId}/export?layer=inpainted` | export one inpainted image |
|
| 113 |
+
| `GET` | `/documents/{documentId}/export/psd` | export one layered PSD |
|
| 114 |
+
| `POST` | `/exports?layer=rendered` | export all rendered pages |
|
| 115 |
+
| `POST` | `/exports?layer=inpainted` | export all inpainted pages |
|
| 116 |
+
|
| 117 |
+
Single-document export endpoints return binary file content. Bulk export returns JSON with the number of files written.
|
| 118 |
+
|
| 119 |
+
## LLM control
|
| 120 |
+
|
| 121 |
+
| Method | Path | Purpose |
|
| 122 |
+
| --- | --- | --- |
|
| 123 |
+
| `GET` | `/llm/models` | list local and API-backed translation models |
|
| 124 |
+
| `GET` | `/llm/state` | get the current LLM status |
|
| 125 |
+
| `POST` | `/llm/load` | load a local or API-backed model |
|
| 126 |
+
| `POST` | `/llm/offload` | unload the current model |
|
| 127 |
+
| `POST` | `/llm/ping` | test an OpenAI-compatible base URL |
|
| 128 |
+
|
| 129 |
+
Useful request details:
|
| 130 |
+
|
| 131 |
+
- `/llm/models` accepts optional `language` and `openaiCompatibleBaseUrl` query parameters
|
| 132 |
+
- `/llm/load` accepts `id`, `apiKey`, `baseUrl`, `temperature`, `maxTokens`, and `customSystemPrompt`
|
| 133 |
+
- `/llm/ping` accepts `baseUrl` and optional `apiKey`
|
| 134 |
+
|
| 135 |
+
## Provider API keys
|
| 136 |
+
|
| 137 |
+
| Method | Path | Purpose |
|
| 138 |
+
| --- | --- | --- |
|
| 139 |
+
| `GET` | `/providers/{provider}/api-key` | read a saved API key for a provider |
|
| 140 |
+
| `PUT` | `/providers/{provider}/api-key` | store or overwrite a provider API key |
|
| 141 |
+
|
| 142 |
+
Current built-in provider ids include:
|
| 143 |
+
|
| 144 |
+
- `openai`
|
| 145 |
+
- `gemini`
|
| 146 |
+
- `claude`
|
| 147 |
+
- `deepseek`
|
| 148 |
+
- `openai-compatible`
|
| 149 |
+
|
| 150 |
+
## Pipeline jobs
|
| 151 |
+
|
| 152 |
+
| Method | Path | Purpose |
|
| 153 |
+
| --- | --- | --- |
|
| 154 |
+
| `POST` | `/jobs/pipeline` | start a full processing job |
|
| 155 |
+
| `DELETE` | `/jobs/{jobId}` | cancel a running pipeline job |
|
| 156 |
+
|
| 157 |
+
The pipeline job request can include:
|
| 158 |
+
|
| 159 |
+
- `documentId` to target one page, or omit it to process all loaded pages
|
| 160 |
+
- LLM settings such as `llmModelId`, `llmApiKey`, `llmBaseUrl`, `llmTemperature`, `llmMaxTokens`, and `llmCustomSystemPrompt`
|
| 161 |
+
- render settings such as `shaderEffect`, `shaderStroke`, and `fontFamily`
|
| 162 |
+
- `language`
|
| 163 |
+
|
| 164 |
+
## Events stream
|
| 165 |
+
|
| 166 |
+
Koharu also exposes server-sent events at:
|
| 167 |
+
|
| 168 |
+
```text
|
| 169 |
+
GET /events
|
| 170 |
+
```
|
| 171 |
+
|
| 172 |
+
Current event names are:
|
| 173 |
+
|
| 174 |
+
- `snapshot`
|
| 175 |
+
- `documents.changed`
|
| 176 |
+
- `document.changed`
|
| 177 |
+
- `job.changed`
|
| 178 |
+
- `download.changed`
|
| 179 |
+
- `llm.changed`
|
| 180 |
+
|
| 181 |
+
The stream sends an initial `snapshot` event and uses a 15-second keepalive.
|
| 182 |
+
|
| 183 |
+
## Typical workflow
|
| 184 |
+
|
| 185 |
+
The normal API order for one page is:
|
| 186 |
+
|
| 187 |
+
1. `POST /documents/import?mode=replace`
|
| 188 |
+
2. `POST /documents/{documentId}/detect`
|
| 189 |
+
3. `POST /documents/{documentId}/ocr`
|
| 190 |
+
4. `POST /llm/load`
|
| 191 |
+
5. `POST /documents/{documentId}/translate`
|
| 192 |
+
6. `POST /documents/{documentId}/inpaint`
|
| 193 |
+
7. `POST /documents/{documentId}/render`
|
| 194 |
+
8. `GET /documents/{documentId}/export?layer=rendered`
|
| 195 |
+
|
| 196 |
+
If you want agent-oriented access instead of HTTP endpoint orchestration, see [MCP Tools Reference](mcp-tools.md).
|
docs/reference/index.md
CHANGED
|
@@ -8,5 +8,8 @@ Reference pages collect factual details you may want to look up quickly.
|
|
| 8 |
|
| 9 |
## Available references
|
| 10 |
|
| 11 |
-
- [CLI Reference](cli.md)
|
| 12 |
-
- [
|
|
|
|
|
|
|
|
|
|
|
|
| 8 |
|
| 9 |
## Available references
|
| 10 |
|
| 11 |
+
- [CLI Reference](cli.md): startup flags, local server behavior, and common runtime patterns
|
| 12 |
+
- [HTTP API Reference](http-api.md): local REST endpoints, event stream names, payloads, and workflow order
|
| 13 |
+
- [MCP Tools Reference](mcp-tools.md): built-in MCP tool names, parameters, and suggested usage flow
|
| 14 |
+
- [Settings Reference](settings.md): appearance, language, provider keys, local-LLM presets, and About page behavior
|
| 15 |
+
- [Keyboard Shortcuts](keyboard-shortcuts.md): the default editor shortcuts currently documented in the UI
|
docs/reference/mcp-tools.md
ADDED
|
@@ -0,0 +1,140 @@
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| 1 |
+
---
|
| 2 |
+
title: MCP Tools Reference
|
| 3 |
+
---
|
| 4 |
+
|
| 5 |
+
# MCP Tools Reference
|
| 6 |
+
|
| 7 |
+
Koharu exposes MCP tools at:
|
| 8 |
+
|
| 9 |
+
```text
|
| 10 |
+
http://127.0.0.1:<PORT>/mcp
|
| 11 |
+
```
|
| 12 |
+
|
| 13 |
+
These tools operate on the same runtime state as the GUI and HTTP API.
|
| 14 |
+
|
| 15 |
+
## General behavior
|
| 16 |
+
|
| 17 |
+
Important implementation details:
|
| 18 |
+
|
| 19 |
+
- image-based tools can return text plus inline image content
|
| 20 |
+
- `open_documents` replaces the current document set rather than appending
|
| 21 |
+
- `process` starts the full pipeline but does not itself stream progress
|
| 22 |
+
- `llm_load` and `process` currently accept local-model-style parameters and do not expose every HTTP API field
|
| 23 |
+
|
| 24 |
+
## Inspection tools
|
| 25 |
+
|
| 26 |
+
| Tool | What it does | Key parameters |
|
| 27 |
+
| --- | --- | --- |
|
| 28 |
+
| `app_version` | get the application version | none |
|
| 29 |
+
| `device` | get ML device and GPU-related info | none |
|
| 30 |
+
| `get_documents` | get the number of loaded documents | none |
|
| 31 |
+
| `get_document` | get one document's metadata and text blocks | `index` |
|
| 32 |
+
| `list_font_families` | list available render fonts | none |
|
| 33 |
+
| `llm_list` | list translation models | none |
|
| 34 |
+
| `llm_ready` | check whether an LLM is currently loaded | none |
|
| 35 |
+
|
| 36 |
+
## Image and block preview tools
|
| 37 |
+
|
| 38 |
+
| Tool | What it does | Key parameters |
|
| 39 |
+
| --- | --- | --- |
|
| 40 |
+
| `view_image` | preview a whole document layer | `index`, `layer`, optional `max_size` |
|
| 41 |
+
| `view_text_block` | preview one cropped text block | `index`, `text_block_index`, optional `layer` |
|
| 42 |
+
|
| 43 |
+
Valid `view_image` layers:
|
| 44 |
+
|
| 45 |
+
- `original`
|
| 46 |
+
- `segment`
|
| 47 |
+
- `inpainted`
|
| 48 |
+
- `rendered`
|
| 49 |
+
|
| 50 |
+
Valid `view_text_block` layers:
|
| 51 |
+
|
| 52 |
+
- `original`
|
| 53 |
+
- `rendered`
|
| 54 |
+
|
| 55 |
+
## Document and export tools
|
| 56 |
+
|
| 57 |
+
| Tool | What it does | Key parameters |
|
| 58 |
+
| --- | --- | --- |
|
| 59 |
+
| `open_documents` | load image files from disk and replace the current set | `paths` |
|
| 60 |
+
| `export_document` | write the rendered document to disk | `index`, `output_path` |
|
| 61 |
+
|
| 62 |
+
`open_documents` expects filesystem paths, not uploaded file blobs.
|
| 63 |
+
|
| 64 |
+
`export_document` currently exports the rendered image path only. PSD export is available through the HTTP API but does not currently have a dedicated MCP tool.
|
| 65 |
+
|
| 66 |
+
## Pipeline tools
|
| 67 |
+
|
| 68 |
+
| Tool | What it does | Key parameters |
|
| 69 |
+
| --- | --- | --- |
|
| 70 |
+
| `detect` | run text detection and font prediction | `index` |
|
| 71 |
+
| `ocr` | run OCR on detected blocks | `index` |
|
| 72 |
+
| `inpaint` | remove text using the current mask | `index` |
|
| 73 |
+
| `render` | draw translated text back onto the page | `index`, optional `text_block_index`, `shader_effect`, `font_family` |
|
| 74 |
+
| `process` | start detect -> OCR -> inpaint -> translate -> render | optional `index`, `llm_model_id`, `language`, `shader_effect`, `font_family` |
|
| 75 |
+
|
| 76 |
+
`process` is the coarse-grained convenience tool. If you need more control or easier debugging, use the stage tools separately.
|
| 77 |
+
|
| 78 |
+
## LLM tools
|
| 79 |
+
|
| 80 |
+
| Tool | What it does | Key parameters |
|
| 81 |
+
| --- | --- | --- |
|
| 82 |
+
| `llm_load` | load a translation model | `id`, optional `temperature`, `max_tokens`, `custom_system_prompt` |
|
| 83 |
+
| `llm_offload` | unload the current model | none |
|
| 84 |
+
| `llm_generate` | translate one block or all blocks | `index`, optional `text_block_index`, `language` |
|
| 85 |
+
|
| 86 |
+
`llm_generate` expects an LLM to already be loaded.
|
| 87 |
+
|
| 88 |
+
## Text-block editing tools
|
| 89 |
+
|
| 90 |
+
| Tool | What it does | Key parameters |
|
| 91 |
+
| --- | --- | --- |
|
| 92 |
+
| `update_text_block` | patch text, translation, box geometry, or style | `index`, `text_block_index`, optional text and style fields |
|
| 93 |
+
| `add_text_block` | add a new empty text block | `index`, `x`, `y`, `width`, `height` |
|
| 94 |
+
| `remove_text_block` | remove one text block | `index`, `text_block_index` |
|
| 95 |
+
|
| 96 |
+
The current update tool can change:
|
| 97 |
+
|
| 98 |
+
- `translation`
|
| 99 |
+
- `x`
|
| 100 |
+
- `y`
|
| 101 |
+
- `width`
|
| 102 |
+
- `height`
|
| 103 |
+
- `font_families`
|
| 104 |
+
- `font_size`
|
| 105 |
+
- `color`
|
| 106 |
+
- `shader_effect`
|
| 107 |
+
|
| 108 |
+
## Mask and cleanup tools
|
| 109 |
+
|
| 110 |
+
| Tool | What it does | Key parameters |
|
| 111 |
+
| --- | --- | --- |
|
| 112 |
+
| `dilate_mask` | expand the current text mask | `index`, `radius` |
|
| 113 |
+
| `erode_mask` | shrink the current text mask | `index`, `radius` |
|
| 114 |
+
| `inpaint_region` | re-inpaint a specific rectangle only | `index`, `x`, `y`, `width`, `height` |
|
| 115 |
+
|
| 116 |
+
These are useful when the automatic segmentation mask is close but still needs manual cleanup.
|
| 117 |
+
|
| 118 |
+
## Suggested prompt flow
|
| 119 |
+
|
| 120 |
+
For reliable agent behavior, this sequence works well:
|
| 121 |
+
|
| 122 |
+
1. `open_documents`
|
| 123 |
+
2. `get_documents`
|
| 124 |
+
3. `detect`
|
| 125 |
+
4. `ocr`
|
| 126 |
+
5. `get_document`
|
| 127 |
+
6. `llm_load`
|
| 128 |
+
7. `llm_generate`
|
| 129 |
+
8. `inpaint`
|
| 130 |
+
9. `render`
|
| 131 |
+
10. `view_image`
|
| 132 |
+
11. `export_document`
|
| 133 |
+
|
| 134 |
+
If you need to inspect a problem block, use `view_text_block` before asking the agent to patch layout or translation.
|
| 135 |
+
|
| 136 |
+
## Related pages
|
| 137 |
+
|
| 138 |
+
- [Configure MCP Clients](../how-to/configure-mcp-clients.md)
|
| 139 |
+
- [Run GUI, Headless, and MCP Modes](../how-to/run-gui-headless-and-mcp.md)
|
| 140 |
+
- [HTTP API Reference](http-api.md)
|
docs/reference/settings.md
ADDED
|
@@ -0,0 +1,151 @@
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| 1 |
+
---
|
| 2 |
+
title: Settings Reference
|
| 3 |
+
---
|
| 4 |
+
|
| 5 |
+
# Settings Reference
|
| 6 |
+
|
| 7 |
+
Koharu's Settings screen exposes appearance, language, device, provider, and local-LLM configuration. This page documents the current settings surface as implemented in the app.
|
| 8 |
+
|
| 9 |
+
## Appearance
|
| 10 |
+
|
| 11 |
+
Theme options:
|
| 12 |
+
|
| 13 |
+
- `Light`
|
| 14 |
+
- `Dark`
|
| 15 |
+
- `System`
|
| 16 |
+
|
| 17 |
+
The app uses the selected theme immediately through the frontend theme provider.
|
| 18 |
+
|
| 19 |
+
## Language
|
| 20 |
+
|
| 21 |
+
The current UI locale list comes from the bundled translation resources.
|
| 22 |
+
|
| 23 |
+
Currently shipped locales are:
|
| 24 |
+
|
| 25 |
+
- `en-US`
|
| 26 |
+
- `es-ES`
|
| 27 |
+
- `ja-JP`
|
| 28 |
+
- `ru-RU`
|
| 29 |
+
- `zh-CN`
|
| 30 |
+
- `zh-TW`
|
| 31 |
+
|
| 32 |
+
Changing the UI language updates the frontend locale and also influences language-aware LLM model listing in the current implementation.
|
| 33 |
+
|
| 34 |
+
## Device
|
| 35 |
+
|
| 36 |
+
The Settings screen shows the current ML compute backend as `ML Compute`.
|
| 37 |
+
|
| 38 |
+
This value comes from the app metadata endpoint and reflects the runtime backend Koharu is currently using, such as CPU or a GPU-backed path.
|
| 39 |
+
|
| 40 |
+
## API Keys
|
| 41 |
+
|
| 42 |
+
The current built-in provider key section covers:
|
| 43 |
+
|
| 44 |
+
- `OpenAI`
|
| 45 |
+
- `Gemini`
|
| 46 |
+
- `Claude`
|
| 47 |
+
- `DeepSeek`
|
| 48 |
+
|
| 49 |
+
Important behavior:
|
| 50 |
+
|
| 51 |
+
- API keys are stored through the local keyring integration rather than plain frontend storage
|
| 52 |
+
- Gemini is marked as a free-tier provider in the current UI
|
| 53 |
+
- the password-style input is only a visibility toggle in the UI, not a different storage mode
|
| 54 |
+
|
| 55 |
+
## Local LLM and OpenAI-compatible providers
|
| 56 |
+
|
| 57 |
+
This section is used for local servers such as Ollama and LM Studio, and for custom OpenAI-compatible endpoints.
|
| 58 |
+
|
| 59 |
+
### Presets
|
| 60 |
+
|
| 61 |
+
Current presets:
|
| 62 |
+
|
| 63 |
+
- `Ollama`
|
| 64 |
+
- `LM Studio`
|
| 65 |
+
- `Preset 1`
|
| 66 |
+
- `Preset 2`
|
| 67 |
+
|
| 68 |
+
Default base URLs:
|
| 69 |
+
|
| 70 |
+
- Ollama: `http://localhost:11434/v1`
|
| 71 |
+
- LM Studio: `http://127.0.0.1:1234/v1`
|
| 72 |
+
- Preset 1: empty until configured
|
| 73 |
+
- Preset 2: empty until configured
|
| 74 |
+
|
| 75 |
+
Each preset stores its own:
|
| 76 |
+
|
| 77 |
+
- `Base URL`
|
| 78 |
+
- `API Key`
|
| 79 |
+
- `Model name`
|
| 80 |
+
- `Temperature`
|
| 81 |
+
- `Max tokens`
|
| 82 |
+
- `Custom system prompt`
|
| 83 |
+
|
| 84 |
+
That lets you keep several compatible backends configured and switch between them from the same settings screen.
|
| 85 |
+
|
| 86 |
+
### Required fields for the model picker
|
| 87 |
+
|
| 88 |
+
In the current implementation, a preset-backed OpenAI-compatible model only becomes selectable when both of these are filled in:
|
| 89 |
+
|
| 90 |
+
- `Base URL`
|
| 91 |
+
- `Model name`
|
| 92 |
+
|
| 93 |
+
An empty preset does not appear as a usable model entry.
|
| 94 |
+
|
| 95 |
+
### Advanced fields
|
| 96 |
+
|
| 97 |
+
The expandable advanced section currently exposes:
|
| 98 |
+
|
| 99 |
+
- `Temperature`
|
| 100 |
+
- `Max tokens`
|
| 101 |
+
- `Custom system prompt`
|
| 102 |
+
|
| 103 |
+
Behavior notes:
|
| 104 |
+
|
| 105 |
+
- leaving `Temperature` or `Max tokens` empty sends no override
|
| 106 |
+
- leaving `Custom system prompt` empty uses Koharu's default manga translation system prompt
|
| 107 |
+
- the reset button clears only the custom prompt override for the current preset
|
| 108 |
+
|
| 109 |
+
### Test Connection
|
| 110 |
+
|
| 111 |
+
`Test Connection` is a connectivity check for the current preset.
|
| 112 |
+
|
| 113 |
+
The current implementation:
|
| 114 |
+
|
| 115 |
+
- sends a request to Koharu's `/llm/ping` path
|
| 116 |
+
- checks the preset `Base URL`
|
| 117 |
+
- optionally includes the preset API key
|
| 118 |
+
- reports success or failure inline
|
| 119 |
+
- shows model count and latency on success
|
| 120 |
+
- uses a 5-second timeout for the underlying compatible-model listing
|
| 121 |
+
|
| 122 |
+
This is a connectivity test, not a model load.
|
| 123 |
+
|
| 124 |
+
## About page
|
| 125 |
+
|
| 126 |
+
Settings links to a separate About page.
|
| 127 |
+
|
| 128 |
+
The About screen currently shows:
|
| 129 |
+
|
| 130 |
+
- the current app version
|
| 131 |
+
- whether a newer GitHub release exists
|
| 132 |
+
- the author link
|
| 133 |
+
- the repository link
|
| 134 |
+
|
| 135 |
+
In packaged app mode, the version check compares the local app version against the latest GitHub release for `mayocream/koharu`.
|
| 136 |
+
|
| 137 |
+
## Persistence model
|
| 138 |
+
|
| 139 |
+
The current settings behavior is split across storage layers:
|
| 140 |
+
|
| 141 |
+
- provider API keys are stored through the system keyring
|
| 142 |
+
- local LLM preset config is persisted in Koharu's frontend preferences store
|
| 143 |
+
- theme and other UI preferences also persist locally
|
| 144 |
+
|
| 145 |
+
That means clearing frontend preferences is not the same as clearing saved provider API keys.
|
| 146 |
+
|
| 147 |
+
## Related pages
|
| 148 |
+
|
| 149 |
+
- [Use OpenAI-Compatible APIs](../how-to/use-openai-compatible-api.md)
|
| 150 |
+
- [Models and Providers](../explanation/models-and-providers.md)
|
| 151 |
+
- [HTTP API Reference](http-api.md)
|
docs/tutorials/translate-your-first-page.md
CHANGED
|
@@ -4,13 +4,13 @@ title: Translate Your First Page
|
|
| 4 |
|
| 5 |
# Translate Your First Page
|
| 6 |
|
| 7 |
-
This tutorial
|
| 8 |
|
| 9 |
## Before you begin
|
| 10 |
|
| 11 |
- Install Koharu from the latest GitHub release
|
| 12 |
-
- Start with a clear
|
| 13 |
-
- Make sure you have enough local VRAM
|
| 14 |
|
| 15 |
If you have not installed Koharu yet, start with [Install Koharu](../how-to/install-koharu.md).
|
| 16 |
|
|
@@ -18,23 +18,40 @@ If you have not installed Koharu yet, start with [Install Koharu](../how-to/inst
|
|
| 18 |
|
| 19 |
Open the desktop application normally.
|
| 20 |
|
| 21 |
-
On the first run, Koharu may
|
| 22 |
|
| 23 |
## 2. Import a page
|
| 24 |
|
| 25 |
-
Load your
|
| 26 |
|
| 27 |
-
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| 28 |
|
| 29 |
## 3. Detect text and run OCR
|
| 30 |
|
| 31 |
Use Koharu's built-in vision pipeline to:
|
| 32 |
|
| 33 |
-
- detect
|
| 34 |
-
-
|
| 35 |
-
-
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| 36 |
|
| 37 |
-
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| 38 |
|
| 39 |
## 4. Choose a translation backend
|
| 40 |
|
|
@@ -45,28 +62,56 @@ Pick either:
|
|
| 45 |
|
| 46 |
Koharu can use OpenAI, Gemini, Claude, DeepSeek, and OpenAI-compatible endpoints such as LM Studio or OpenRouter.
|
| 47 |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| 48 |
## 5. Translate and review
|
| 49 |
|
| 50 |
Run translation on the page, then inspect the result carefully.
|
| 51 |
|
| 52 |
-
Koharu helps with text layout and vertical CJK rendering, but
|
| 53 |
|
| 54 |
- names and terminology
|
| 55 |
-
-
|
| 56 |
-
-
|
| 57 |
-
-
|
|
|
|
|
|
|
|
|
|
| 58 |
|
| 59 |
## 6. Export the result
|
| 60 |
|
| 61 |
-
When the page looks right, export it
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| 62 |
|
| 63 |
-
-
|
| 64 |
-
-
|
|
|
|
|
|
|
| 65 |
|
| 66 |
-
|
| 67 |
|
| 68 |
## Next steps
|
| 69 |
|
| 70 |
- Learn export options: [Export Pages and Manage Projects](../how-to/export-and-manage-projects.md)
|
| 71 |
- Compare runtime choices: [Acceleration and Runtime](../explanation/acceleration-and-runtime.md)
|
| 72 |
-
-
|
|
|
|
|
|
| 4 |
|
| 5 |
# Translate Your First Page
|
| 6 |
|
| 7 |
+
This tutorial walks through the normal Koharu workflow for a single manga page: import, detect, recognize, translate, review, and export.
|
| 8 |
|
| 9 |
## Before you begin
|
| 10 |
|
| 11 |
- Install Koharu from the latest GitHub release
|
| 12 |
+
- Start with a clear page image in `.png`, `.jpg`, `.jpeg`, or `.webp`
|
| 13 |
+
- Make sure you have enough local VRAM or RAM for your preferred model, or plan to use a remote provider
|
| 14 |
|
| 15 |
If you have not installed Koharu yet, start with [Install Koharu](../how-to/install-koharu.md).
|
| 16 |
|
|
|
|
| 18 |
|
| 19 |
Open the desktop application normally.
|
| 20 |
|
| 21 |
+
On the first run, Koharu may spend time initializing local runtime packages and downloading the default vision stack. This is expected and usually only happens once per machine or runtime update.
|
| 22 |
|
| 23 |
## 2. Import a page
|
| 24 |
|
| 25 |
+
Load your page image into the app.
|
| 26 |
|
| 27 |
+
At the moment, the documented import flow is image-based rather than project-file based. If you import a folder instead of a single file, Koharu recursively filters it down to supported image files.
|
| 28 |
+
|
| 29 |
+
For a first pass, use one clean page so it is easy to judge:
|
| 30 |
+
|
| 31 |
+
- text detection quality
|
| 32 |
+
- OCR quality
|
| 33 |
+
- translation quality
|
| 34 |
+
- final bubble fit
|
| 35 |
|
| 36 |
## 3. Detect text and run OCR
|
| 37 |
|
| 38 |
Use Koharu's built-in vision pipeline to:
|
| 39 |
|
| 40 |
+
- detect text-like layout regions
|
| 41 |
+
- build a segmentation mask for cleanup
|
| 42 |
+
- estimate font and color hints
|
| 43 |
+
- recognize the source text with OCR
|
| 44 |
+
|
| 45 |
+
Under the hood, Koharu does not just run OCR on the full page. It first creates text blocks, crops those regions, and then runs OCR on the cropped areas.
|
| 46 |
+
|
| 47 |
+
After detection and OCR, review the page before you translate. Look for:
|
| 48 |
|
| 49 |
+
- missed bubbles or captions
|
| 50 |
+
- duplicate or badly placed text blocks
|
| 51 |
+
- obvious OCR errors
|
| 52 |
+
- vertical text that should stay vertical
|
| 53 |
+
|
| 54 |
+
Fixing structural issues before translation usually saves time later.
|
| 55 |
|
| 56 |
## 4. Choose a translation backend
|
| 57 |
|
|
|
|
| 62 |
|
| 63 |
Koharu can use OpenAI, Gemini, Claude, DeepSeek, and OpenAI-compatible endpoints such as LM Studio or OpenRouter.
|
| 64 |
|
| 65 |
+
If you want to wire up LM Studio, OpenRouter, or another OpenAI-style endpoint, follow [Use OpenAI-Compatible APIs](../how-to/use-openai-compatible-api.md).
|
| 66 |
+
|
| 67 |
+
In practice:
|
| 68 |
+
|
| 69 |
+
- local models are better when privacy and offline use matter most
|
| 70 |
+
- remote models are easier when your machine is memory-constrained
|
| 71 |
+
- when you use a remote provider, Koharu sends OCR text for translation rather than the whole page image
|
| 72 |
+
|
| 73 |
## 5. Translate and review
|
| 74 |
|
| 75 |
Run translation on the page, then inspect the result carefully.
|
| 76 |
|
| 77 |
+
Koharu helps with text layout and vertical CJK rendering, but the final page still benefits from manual review. Focus on:
|
| 78 |
|
| 79 |
- names and terminology
|
| 80 |
+
- tone and character voice
|
| 81 |
+
- line breaks and bubble fit
|
| 82 |
+
- font choice and stroke readability
|
| 83 |
+
- blocks whose source OCR looked uncertain
|
| 84 |
+
|
| 85 |
+
If a translation reads correctly but still looks cramped, adjust the text block or styling before exporting.
|
| 86 |
|
| 87 |
## 6. Export the result
|
| 88 |
|
| 89 |
+
When the page looks right, export it in the format that matches your next step:
|
| 90 |
+
|
| 91 |
+
- rendered image for a flattened final page
|
| 92 |
+
- PSD for editable text and helper layers
|
| 93 |
+
|
| 94 |
+
Rendered exports are best when the page is finished. PSD export is better when you still want to:
|
| 95 |
+
|
| 96 |
+
- make small wording edits
|
| 97 |
+
- repaint artifacts
|
| 98 |
+
- hide or inspect helper layers
|
| 99 |
+
- finish the page in Photoshop
|
| 100 |
+
|
| 101 |
+
## 7. If the first result is not good enough
|
| 102 |
+
|
| 103 |
+
The usual fixes are:
|
| 104 |
|
| 105 |
+
- rerun detection after adjusting page selection or replacing bad blocks
|
| 106 |
+
- correct OCR or translation text manually
|
| 107 |
+
- switch to a stronger translation model
|
| 108 |
+
- export PSD and finish the page with manual lettering cleanup
|
| 109 |
|
| 110 |
+
Koharu works best when you treat the pipeline as a fast first pass, then use manual review where the page needs it.
|
| 111 |
|
| 112 |
## Next steps
|
| 113 |
|
| 114 |
- Learn export options: [Export Pages and Manage Projects](../how-to/export-and-manage-projects.md)
|
| 115 |
- Compare runtime choices: [Acceleration and Runtime](../explanation/acceleration-and-runtime.md)
|
| 116 |
+
- Understand the model stack: [Technical Deep Dive](../explanation/technical-deep-dive.md)
|
| 117 |
+
- Choose a translation backend: [Models and Providers](../explanation/models-and-providers.md)
|
zensical.toml
CHANGED
|
@@ -13,24 +13,32 @@ nav = [
|
|
| 13 |
"tutorials/index.md",
|
| 14 |
"tutorials/translate-your-first-page.md",
|
| 15 |
]},
|
| 16 |
-
{"How-To Guides" = [
|
| 17 |
-
"how-to/index.md",
|
| 18 |
-
"how-to/install-koharu.md",
|
| 19 |
-
"how-to/
|
| 20 |
-
"how-to/
|
| 21 |
-
"how-to/
|
| 22 |
-
|
| 23 |
-
|
| 24 |
-
"
|
| 25 |
-
"
|
| 26 |
-
|
| 27 |
-
|
| 28 |
-
|
| 29 |
-
|
| 30 |
-
"
|
| 31 |
-
"
|
| 32 |
-
"
|
| 33 |
-
]},
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| 34 |
]
|
| 35 |
|
| 36 |
[project.extra]
|
|
|
|
| 13 |
"tutorials/index.md",
|
| 14 |
"tutorials/translate-your-first-page.md",
|
| 15 |
]},
|
| 16 |
+
{"How-To Guides" = [
|
| 17 |
+
"how-to/index.md",
|
| 18 |
+
"how-to/install-koharu.md",
|
| 19 |
+
"how-to/contributing.md",
|
| 20 |
+
"how-to/run-gui-headless-and-mcp.md",
|
| 21 |
+
"how-to/configure-mcp-clients.md",
|
| 22 |
+
"how-to/use-openai-compatible-api.md",
|
| 23 |
+
"how-to/export-and-manage-projects.md",
|
| 24 |
+
"how-to/build-from-source.md",
|
| 25 |
+
"how-to/troubleshooting.md",
|
| 26 |
+
]},
|
| 27 |
+
{"Explanation" = [
|
| 28 |
+
"explanation/index.md",
|
| 29 |
+
"explanation/how-koharu-works.md",
|
| 30 |
+
"explanation/technical-deep-dive.md",
|
| 31 |
+
"explanation/acceleration-and-runtime.md",
|
| 32 |
+
"explanation/models-and-providers.md",
|
| 33 |
+
]},
|
| 34 |
+
{"Reference" = [
|
| 35 |
+
"reference/index.md",
|
| 36 |
+
"reference/cli.md",
|
| 37 |
+
"reference/http-api.md",
|
| 38 |
+
"reference/mcp-tools.md",
|
| 39 |
+
"reference/settings.md",
|
| 40 |
+
"reference/keyboard-shortcuts.md",
|
| 41 |
+
]},
|
| 42 |
]
|
| 43 |
|
| 44 |
[project.extra]
|