Align text_config.hidden_size with the vision encoder output size: set both to 128.
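A minimal sketch of the consistency this change enforces follows. It assumes that the vision encoder's output width is stored as `vision_config.hidden_size` in the same `config.json`. That key name is an assumption, since the change does not say where the vision size lives. The sketch also assumes that vision features enter the text model without a projection layer, which is why the two widths must be equal. `check_hidden_sizes` is a hypothetical helper, not a Transformers API.

```python
import json

# Hypothetical minimal config.json contents after this change.
# ASSUMPTION: the vision encoder output width is stored as vision_config.hidden_size.
config = json.loads("""
{
  "text_config": {"hidden_size": 128},
  "vision_config": {"hidden_size": 128}
}
""")


def check_hidden_sizes(cfg: dict) -> int:
    """Return the shared hidden size, or raise ValueError if text and vision sizes differ."""
    text_size = cfg["text_config"]["hidden_size"]
    vision_size = cfg["vision_config"]["hidden_size"]  # assumed key name
    if text_size != vision_size:
        raise ValueError(
            f"text_config.hidden_size={text_size} != "
            f"vision_config.hidden_size={vision_size}"
        )
    return text_size


print(check_hidden_sizes(config))  # prints 128
```

A check like this fails early when config.json is loaded. Without it, a width mismatch would only surface later as a shape error inside the forward pass.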