Using ZwZ and better VLMs alongside DeepGen

by TomLucidor - opened Feb 14

Discussion

TomLucidor

Feb 14

There are better VLMs out there than just Qwen3, so would swapping them out lead to better performance (understanding)?

Alex11556666

deepgen org Feb 15

Yep, but they would need to be trained for alignment with the DiT.

TomLucidor

Feb 15

edited Feb 15

As long as there are cheap ways to do re-alignment across different-sized Qwen3 VLM derivatives (and maybe also other VLMs with "linear attention"), that would be really sweet. We need the speed to go along with the modality. Ditto for Pony v7, Chroma+Kontext, Qwen-Image/Z-Image finetunes, or other comprehensive diffusion models. Robust alignment + finetuned knowledge transfer.
Delta-Sampling and architecture transfer both look like good ideas: https://www.alphaxiv.org/abs/2512.03056 https://alphaxiv.org/abs/2506.18999

Alex11556666

deepgen org Feb 15

Thanks for sharing. We will consider it and merge Qwen3-VL into DeepGen 1.5. Stay tuned~

Alex11556666 changed discussion status to closed Feb 18

TomLucidor

Feb 18

@Alex11556666 if there is one thing I would like to see revamped, it is X-Adapter, so that more efficient models can be swapped in and knowledge can aggregate: https://www.alphaxiv.org/abs/2312.02238
