Using ZwZ and a better VLM alongside DeepGen
There are better VLMs out there than Qwen3, so would swapping one in lead to better performance (understanding)?
Yep, but it would need to be trained for alignment with the DiT.
If there were cheap ways to re-align across different-sized Qwen3 VLM derivatives (and maybe also other VLMs with "linear attention"), that would be really sweet. We need the speed to go along with the modality. Ditto for Pony v7, Chroma+Kontext, Qwen-Image/Z-Image finetunes, or other comprehensive diffusion models: robust alignment plus finetuned knowledge transfer.
Delta-Sampling and architecture transfer both look like good ideas: https://www.alphaxiv.org/abs/2512.03056 https://alphaxiv.org/abs/2506.18999
Thanks for sharing! We will consider it and merge Qwen3-VL into DeepGen 1.5. Stay tuned~