
Using ZwZ and better VLMs alongside DeepGen

#6
by TomLucidor - opened

There are better VLMs out there than just Qwen3, so would swapping one of them in lead to better performance (understanding)?

deepgen org

Yep, but they would need to be trained for alignment with the DiT.

As long as there are cheap ways to do re-alignment across different sized Qwen3 VLM derivatives (and maybe also other VLMs with "linear attention") that would be really sweet. We need the speed to go along with the modality. Ditto for Pony v7 or Chroma+Kontext or Qwen-Image/Z-Image finetunes or some other comprehensive diffusion models. Robust alignment + finetuned knowledge transfer.
Delta-Sampling and architecture transfer both look like good ideas: https://www.alphaxiv.org/abs/2512.03056 https://alphaxiv.org/abs/2506.18999

deepgen org

Thanks for sharing. We will consider it and merge Qwen3-VL into DeepGen 1.5. Stay tuned~
