---
license: other
license_name: ideogram-non-commercial-model-agreement
license_link: https://huggingface.co/ideogram-ai/ideogram-4-fp8/blob/main/LICENSE.md
tags:
- comfyui
- ai-toolkit
- lora
---

<img src='resources/MERIDIQ_before_after_50_50.png' width='512'>

This is an early experimental LoRA that adds bbox-guided inpainting/editing to the Ideogram 4
model. It is a work in progress, so the files here are snapshots from different
points in time while I adjust training parameters and build a better dataset.

~~I currently get the most stable results with the [checkpoint at step 4000](https://huggingface.co/BitPoet/Ideogram4-Inpaint-LoRA/blob/main/IdoInpaint_2_00004000.safetensors) of the second training run.~~

~~The dataset is very small, so do not expect any magic or precision. It is a starting point that hopefully evolves over the next weeks as I prepare a bigger dataset and start training over with a larger rank and fine-tuned parameters.~~

__June 21, 2026:__

V3 of the LoRA was trained on an expanded dataset of 1,000 image pairs whose edits exclusively add, remove, or replace objects and people.

The most stable [checkpoint so far is at step 6400](https://huggingface.co/BitPoet/Ideogram4-Inpaint-LoRA/blob/main/V3/Ido4Inpaint_V3_1k_000006400.safetensors).

## Prerequisites

### Custom Node

You can find my custom node set on GitHub at [ComfyUI-bitpoet-IG4Inpaint](https://github.com/BitPoet/ComfyUI-bitpoet-IG4Inpaint).
The necessary workflow is included in the node or can be downloaded [here](https://github.com/BitPoet/ComfyUI-bitpoet-IG4Inpaint/blob/main/workflows/ideogram4_reference_workflow.json).

### ComfyUI Changes

Check out or download the [dev-ideogram4-inpaint branch](https://github.com/BitPoet/ComfyUI/tree/dev-ideogram4-inpaint) of my Comfy fork.

## Training

To train with reference images, you currently need to use a slightly adapted fork of AI-Toolkit.
You can find my bitpoet-ideogram4-refimages branch [here on GitHub](https://github.com/BitPoet/ai-toolkit/tree/bitpoet-ideogram4-refimages).

It also includes a fix for the UTF-8/ANSI encoding error that has recently been making jobs fail at startup on Windows.

Note that this AI-Toolkit adaptation has a switch for reference image support at the top of the dataset editor.
You have to turn it on every time you open a dataset with reference images.

An example training config for AI-Toolkit is also [in this repository](https://huggingface.co/BitPoet/Ideogram4-Inpaint-LoRA/blob/main/ai-toolkit_example_job_config.json).

I will add a small example dataset at some point.

If you want to assemble your own dataset, you might find my simple [Node.js-based dataset editor IdeoInCap](https://github.com/BitPoet/IdeoInCap) handy (short
for Ideogram4 Inpaint Captioning; I know, not my most creative moment). It's tailored to Ideogram 4 image-reference-prompt datasets, with a graphical bbox
editor and completion indicators.
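When building a dataset by hand, each edit ends up as a prompt that pairs an instruction with a bounding box. The snippet below is a minimal sketch of that serialization step, mainly to show bbox validation and normalization. IdeoInCap's actual output format and the JSON schema the LoRA was trained on are not documented here, so the function `bbox_prompt`, the field names `instruction` and `bbox`, and the normalized 0 to 1 coordinates are all hypothetical.

```python
import json
from typing import Tuple

BBox = Tuple[int, int, int, int]  # (x0, y0, x1, y1) in pixels


def bbox_prompt(instruction: str, bbox_px: BBox, image_size: Tuple[int, int],
                precision: int = 3) -> str:
    """Serialize an edit instruction and a pixel bbox into a JSON prompt.

    Hypothetical schema: {"instruction": ..., "bbox": [x0, y0, x1, y1]} with
    coordinates normalized to the 0..1 range of the image width/height.
    """
    x0, y0, x1, y1 = bbox_px
    width, height = image_size
    # Reject boxes that leave the image or have zero area.
    if not (0 <= x0 < x1 <= width and 0 <= y0 < y1 <= height):
        raise ValueError(f"bbox {bbox_px} is invalid for a {width}x{height} image")
    normalized = [round(x0 / width, precision), round(y0 / height, precision),
                  round(x1 / width, precision), round(y1 / height, precision)]
    return json.dumps({"instruction": instruction, "bbox": normalized})


print(bbox_prompt("remove the red car", (256, 512, 768, 1024), (1024, 1024)))
# {"instruction": "remove the red car", "bbox": [0.25, 0.5, 0.75, 1.0]}
```

Normalizing the box keeps it valid when the image is later resized to the training resolution; if the real prompts use pixel or integer grid coordinates instead, only the `normalized` line would need to change.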

### Buzzwords (technical details)

What we changed in AI-Toolkit besides the dataset editor:

We added reference-latent token concatenation for Ideogram 4: each clean reference image is VAE-encoded and appended to the packed sequence as
`[text | noisy target | clean reference]`, with its own indicator, MRoPE time coordinate, and clean timestep. The transformer output and
diffusion loss are sliced to target tokens only, while bounding-box JSON prompts provide spatial edit conditioning.
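To make the sequence layout concrete, here is a minimal pure-Python sketch of the packing and the target-only loss, with tokens as plain lists of floats instead of tensors. It is not the fork's code: the indicator values (0/1/2), the MRoPE time coordinates (0 for text and target, 1 for the reference), the convention that timestep 0.0 means "clean", and giving text tokens the sampled timestep are all assumptions chosen for illustration.

```python
from dataclasses import dataclass
from typing import List

Vector = List[float]

# Hypothetical segment indicator values; the fork's real values may differ.
TEXT, TARGET, REFERENCE = 0, 1, 2


@dataclass
class PackedSequence:
    tokens: List[Vector]   # one feature vector per token
    indicator: List[int]   # segment id per token
    time_coord: List[int]  # MRoPE time coordinate per token (assumed values)
    timestep: List[float]  # per-token diffusion timestep
    target: slice          # position of the noisy target tokens


def pack(text: List[Vector], noisy_target: List[Vector],
         clean_reference: List[Vector], t: float) -> PackedSequence:
    """Concatenate [text | noisy target | clean reference] with per-token metadata."""
    n_txt, n_tgt, n_ref = len(text), len(noisy_target), len(clean_reference)
    return PackedSequence(
        tokens=text + noisy_target + clean_reference,
        indicator=[TEXT] * n_txt + [TARGET] * n_tgt + [REFERENCE] * n_ref,
        # Assumption: the reference gets its own time coordinate (1) so its
        # positions do not collide with those of the target (0).
        time_coord=[0] * (n_txt + n_tgt) + [1] * n_ref,
        # Text and target carry the sampled timestep t; the reference carries
        # the "clean" timestep, assumed here to be 0.0.
        timestep=[t] * (n_txt + n_tgt) + [0.0] * n_ref,
        target=slice(n_txt, n_txt + n_tgt),
    )


def target_only_mse(prediction: List[Vector], regression_target: List[Vector],
                    packed: PackedSequence) -> float:
    """Mean squared error over the noisy-target tokens only."""
    pred = prediction[packed.target]
    if len(pred) != len(regression_target):
        raise ValueError("prediction and regression target lengths differ")
    errors = [(p - y) ** 2
              for pv, yv in zip(pred, regression_target)
              for p, y in zip(pv, yv)]
    return sum(errors) / len(errors)


packed = pack([[0.1, 0.2]] * 2, [[1.0, 1.0]] * 3, [[5.0, 5.0]] * 3, t=0.7)
# Use the input tokens as a stand-in for the transformer output.
print(target_only_mse(packed.tokens, [[1.0, 1.0]] * 3, packed))  # 0.0
```

The reference tokens hold very different values, yet the loss is exactly zero, which is the point of slicing: the reference only steers attention and never becomes a regression target.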

These changes have to be mirrored in ComfyUI as well:

ComfyUI core: Extended the native Ideogram 4 model to accept reference latents and reproduce the training sequence `[text | noisy target | clean reference]`,
including the separate indicator, MRoPE coordinate, clean timestep, and output-only prediction slicing.
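On the inference side, the key property is that the sampler keeps seeing a prediction with the same shape as the noisy latent, whether or not a reference is attached. A minimal sketch of that wrapper, assuming a model callable that returns one prediction per input token (the name `predict_target` and this calling convention are hypothetical, not ComfyUI's API):

```python
from typing import Callable, List, Optional

Vector = List[float]
Model = Callable[[List[Vector]], List[Vector]]


def predict_target(model: Model, text: List[Vector], noisy_target: List[Vector],
                   reference: Optional[List[Vector]] = None) -> List[Vector]:
    """Run the model on [text | noisy target | clean reference], return the target slice.

    With no reference (e.g. an unconditional pass), the sequence is the plain
    [text | noisy target] layout, so behaviour without a reference is unchanged.
    """
    sequence = text + noisy_target + (reference or [])
    output = model(sequence)  # assumed: one prediction per input token
    start = len(text)
    return output[start:start + len(noisy_target)]


seen_lengths = []


def fake_model(seq: List[Vector]) -> List[Vector]:
    # Stand-in transformer: records the sequence length and doubles every value.
    seen_lengths.append(len(seq))
    return [[2.0 * x for x in tok] for tok in seq]


cond = predict_target(fake_model, [[0.0]], [[1.0], [2.0]], reference=[[9.0], [9.0]])
uncond = predict_target(fake_model, [[0.0]], [[1.0], [2.0]])
print(cond, uncond, seen_lengths)  # [[2.0], [4.0]] [[2.0], [4.0]] [5, 3]
```

Both calls return two target predictions even though the conditioned pass ran on a longer sequence, so classifier-free guidance can combine them directly.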

Custom node: Ideogram4ReferenceConditioning resizes and VAE-encodes a reference image to match the target latent, then attaches it only to positive
conditioning so the separate unconditional model remains unchanged.
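The node's two jobs can be sketched without ComfyUI itself: pick the pixel size that encodes to the target latent's size, then attach the encoded reference to the positive conditioning only. ComfyUI conditioning is a list of `[embedding, extras_dict]` pairs; the extras key `reference_latents`, the helper names, and the VAE downscale factor of 8 are assumptions for this sketch (the Ideogram 4 VAE's real factor is not stated here), and the string values stand in for tensors.

```python
from typing import Any, Dict, List, Tuple

# ComfyUI-style conditioning: a list of [embedding, extras-dict] pairs.
Conditioning = List[List[Any]]


def pixel_size_for_latent(latent_h: int, latent_w: int, downscale: int = 8) -> Tuple[int, int]:
    """Pixel (height, width) to resize the reference to before VAE encoding.

    `downscale` is the VAE's spatial compression factor; 8 is a placeholder.
    """
    return latent_h * downscale, latent_w * downscale


def attach_reference(conditioning: Conditioning, reference_latent: Any,
                     key: str = "reference_latents") -> Conditioning:
    """Return a copy of `conditioning` with `reference_latent` appended under `key`.

    The input list and its dicts are not mutated, so any other conditioning
    that shares them (such as the negative prompt) stays untouched.
    """
    result = []
    for embedding, extras in conditioning:
        new_extras: Dict[str, Any] = dict(extras)
        new_extras[key] = list(extras.get(key, [])) + [reference_latent]
        result.append([embedding, new_extras])
    return result


positive = [["pos_embedding", {"pooled_output": "p"}]]
negative = [["neg_embedding", {"pooled_output": "n"}]]
print(pixel_size_for_latent(128, 96))  # (1024, 768)
positive = attach_reference(positive, "encoded_reference")
# `negative` goes to the sampler unchanged, so the unconditional pass sees no reference.
```

Copying instead of mutating matters in ComfyUI because the same conditioning object can feed several downstream nodes.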

## Credits

Credits go to:
- [ideogram-ai](https://huggingface.co/ideogram-ai) for releasing a highly interesting, high-quality new image model.
- Ostris for [AI-Toolkit](https://github.com/ostris/ai-toolkit)
- [Comfy-Org](https://github.com/Comfy-Org) and [Kijai](https://huggingface.co/Kijai) for [ComfyUI](https://github.com/comfy-org/ComfyUI) itself and zero-day support for Ideogram 4


## Disclaimer

I am in no way affiliated with Ideogram, Inc.  
The LoRAs provided here are my own experimental work.  
Please see the license linked above.