Tags: Image-Text-to-Image · Diffusers · IP-adapter · SDXL
How to use from the Diffusers library
pip install -U diffusers transformers accelerate

import torch
from diffusers import StableDiffusionXLPipeline
from diffusers.utils import load_image

# Load Mugen, the SDXL-family base model this adapter was trained for
# (path below is a placeholder); switch "cuda" to "mps" for Apple devices
pipe = StableDiffusionXLPipeline.from_pretrained(
    "path/to/mugen", torch_dtype=torch.bfloat16
).to("cuda")

# Load this repo's IP-Adapter weights; pick an epoch checkpoint via weight_name
# (placeholder below -- see the repo's file list). NOTE: this adapter uses
# Jina-clip-v2's vision tower, so the standard CLIP image-encoder path may not
# apply -- see the workflow included in this repo.
pipe.load_ip_adapter("TheRemixer/Mugen-Jina-IP-Adapter", subfolder="", weight_name="<checkpoint filename>")

prompt = "Astronaut in a jungle, cold color palette, muted colors, detailed, 8k"
style_image = load_image("path/to/style_reference.png")  # placeholder style reference
image = pipe(prompt, ip_adapter_image=style_image).images[0]

An IP-Adapter built on Jina-clip-v2's vision tower, intended for use with Mugen.

Quick Notes:

  • 368M parameters
  • Trained on Mugen using Jina-clip-v2 (plus the Jina-clip-v2 adapter) as the text encoder.
    • It also works on Mugen with the original CLIP-G + CLIP-L text encoders, though the outputs differ slightly.
  • Trained for 5 epochs, 208,000 unbatched steps in total, at a base resolution of 1024x1024.
    • Results started looking promising from the 2nd epoch; the earlier epoch checkpoints are included in this repo if you want to test them.
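As a quick sanity check on the numbers above, 208,000 unbatched steps spread evenly over 5 epochs works out to 41,600 steps per epoch (assuming "unbatched" means one image per step):

```python
total_steps = 208_000  # unbatched steps, from the notes above
epochs = 5

# Steps (and, at batch size 1, training images seen) per epoch
steps_per_epoch = total_steps // epochs
print(steps_per_epoch)  # 41600
```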

Installation:

Usage:

  • See the workflow included in this repo for how to use the IP-adapter.
  • I mainly tested the IP-adapter for style transfer, but it can do other things too.
  • Try different weights if the default of 1.0 is too strong; 1.0 isn't recommended in all cases.
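One practical way to act on the weighting note above is to sweep a few adapter scales and compare outputs side by side. The helper below is a minimal sketch; the commented loop assumes a diffusers pipeline `pipe` with this adapter already loaded, plus a style image `ref` and a `prompt` (all hypothetical names):

```python
def scale_grid(lo=0.4, hi=1.0, n=4):
    """Evenly spaced IP-Adapter scales to try, from subtle to full strength."""
    step = (hi - lo) / (n - 1)
    return [round(lo + i * step, 2) for i in range(n)]

print(scale_grid())  # [0.4, 0.6, 0.8, 1.0]

# Hypothetical sweep with a loaded diffusers pipeline:
# for s in scale_grid():
#     pipe.set_ip_adapter_scale(s)  # diffusers API for adapter weighting
#     pipe(prompt, ip_adapter_image=ref).images[0].save(f"scale_{s}.png")
```

Comparing the saved images makes it easy to spot the point where the style reference starts overpowering the prompt.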

Credits:

