Issue with MLLM Planner / MLPConnector zero initialization in config.json?

by Relven - opened Jun 14

Hi Bernini Team,

First of all, thank you for open-sourcing this incredible framework and releasing the paper!

We are currently trying to implement and test the full pipeline using the Qwen2.5-VL planner as specified in the repository's configuration. We have successfully set up the mllm/ folder by pulling the official Qwen2.5-VL-7B-Instruct weights to match the architecture layout.

However, in our tests the model behaves identically to standard T5-only inference. We see no noticeable difference in prompt following, scene physics, or complex logic between runs with the MLLM path active and runs with it bypassed entirely.

Upon deeper inspection of the root config.json, we found the following configuration for the projector:

```json
"connector_cfg": {
  "model_type": "MLPConnector",
  "out_dim_for_gen": 4096,
  "enable_gen_branch": true,
  "out_dim_for_vit": 3584,
  "enable_vit_branch": true,
  "gen_head_type": "zerolinear",
  "zero_init_proj_gen_last": true
},
"t5_combine_type": "concat_with_zero_init"
```

When we inspect the actual tensors in the bernini checkpoint subfolder, the RMSNorm/projection weights for this connector appear to be zero or near zero. If that reading is correct, any latent semantic guidance coming from the Qwen MLLM is nullified before it reaches the DiT renderer.
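For anyone who wants to reproduce the check, here is a minimal sketch of how the tensors can be inspected. It assumes the checkpoint is stored as a `.safetensors` shard and that the connector tensors have "connector" somewhere in their names; both are assumptions on our side, and the function names and the `1e-6` tolerance are our own choices, not anything from the Bernini repository.

```python
from typing import Dict, List, Tuple


def checkpoint_stats(path: str) -> Dict[str, Tuple[float, float]]:
    """Return {tensor name: (max |w|, mean |w|)} for one .safetensors shard.

    Assumption: the shard is readable with the safetensors library and torch
    is installed. Both are imported lazily so the helper below works without them.
    """
    from safetensors import safe_open

    stats: Dict[str, Tuple[float, float]] = {}
    with safe_open(path, framework="pt") as f:
        for name in f.keys():
            t = f.get_tensor(name).float().abs()  # upcast bf16/fp16 first
            if t.numel() > 0:
                stats[name] = (t.max().item(), t.mean().item())
    return stats


def flag_near_zero(
    stats: Dict[str, Tuple[float, float]],
    keyword: str = "connector",  # hypothetical name fragment, adjust to the real keys
    tol: float = 1e-6,
) -> List[str]:
    """List tensors whose name contains `keyword` and whose max |w| is <= tol."""
    return sorted(
        name
        for name, (max_abs, _mean_abs) in stats.items()
        if keyword in name and max_abs <= tol
    )
```

Usage would be something like `flag_near_zero(checkpoint_stats("<path to shard>"))`. One caveat on interpretation: with `zero_init_proj_gen_last` set, a zero final projection is the expected state before training, so a flagged tensor only shows that the stored weights are zero, not why.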

Our questions are:

1. Is the currently released checkpoint in bernini/ a preview snapshot that lacks the fully trained weights from Stage III (Joint Training) for the MLPConnector?

2. If the Stage III weights are fully trained and present, is there a specific script, inference flag, or custom framework routing required to prevent the model from force-initializing or overriding this connector to zero during loading?
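To help separate these two cases on our side, this is a rough sketch of the comparison we have in mind: take the per-tensor maximum absolute value from the file on disk and from the model after loading, then list tensors that are non-zero on disk but zero in memory. The function names and tolerance are our own, and the sketch assumes the tensor names match between the file and the loaded model, which may not hold if the loader adds or strips a prefix.

```python
from typing import Dict, List


def max_abs_from_state_dict(state_dict) -> Dict[str, float]:
    """Map each tensor name to max |w|.

    Expects torch tensors, e.g. the result of `model.state_dict()`.
    Empty tensors are skipped because max() is undefined for them.
    """
    return {
        name: t.detach().float().abs().max().item()
        for name, t in state_dict.items()
        if t.numel() > 0
    }


def zeroed_on_load(
    disk_max_abs: Dict[str, float],
    loaded_max_abs: Dict[str, float],
    tol: float = 1e-6,
) -> List[str]:
    """Names that are non-zero on disk but (near-)zero after loading.

    Names missing from the loaded model are not reported here; they would
    point to a key mismatch rather than an override.
    """
    return sorted(
        name
        for name, disk_value in disk_max_abs.items()
        if disk_value > tol and loaded_max_abs.get(name, float("inf")) <= tol
    )
```

An empty result together with zero values on disk would point toward the first question (weights not trained or not included), while a non-empty result would point toward the second (something resets the connector at load time).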

Any clarification or code snippets regarding the proper inference setup for the full MLLM-planner + DiT-renderer pipeline would be greatly appreciated by the open-source community!

Best regards