The multimodal model class QWenVLChatModel is missing

#27

by jerry-lion - opened May 10, 2025

May 10, 2025

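config.json only contains "auto_map": {
"AutoConfig": "configuration_qwen.QWenConfig",
"AutoModelForCausalLM": "modeling_qwen.QWenLMHeadModel"
}
This is a "pure language model" configuration that does not support image input: the model class it loads by default is QWenLMHeadModel, not the multimodal QWenVLChatModel. In addition, modeling_qwen.py does not contain a QWenVLChatModel class.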
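For anyone who wants to check this on their own copy, here is a minimal sketch that parses the auto_map quoted above and prints which class each Auto* entry resolves to. The helper name describe_auto_map and the inline CONFIG_TEXT are my own, for illustration only; in practice you would read the repo's real config.json instead.

```python
import json


def describe_auto_map(config_text: str) -> dict:
    """Return the auto_map entries of a config.json string (empty dict if absent)."""
    config = json.loads(config_text)
    return dict(config.get("auto_map", {}))


# The auto_map quoted in this thread, wrapped in a minimal config document.
# Assumption: a real config.json has many more keys; only auto_map matters here.
CONFIG_TEXT = """
{
  "auto_map": {
    "AutoConfig": "configuration_qwen.QWenConfig",
    "AutoModelForCausalLM": "modeling_qwen.QWenLMHeadModel"
  }
}
"""

auto_map = describe_auto_map(CONFIG_TEXT)
for auto_class, target in auto_map.items():
    # Each value has the form "<module>.<ClassName>".
    module_name, class_name = target.rsplit(".", 1)
    print(f"{auto_class} -> {class_name} (defined in {module_name}.py)")

# Check whether any entry points at the class named in the thread title.
references_vl_class = any("QWenVLChatModel" in t for t in auto_map.values())
print("references QWenVLChatModel:", references_vl_class)
```

This only shows what the config maps to; it says nothing about whether the mapped class can handle image input.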

jerry-lion

May 10, 2025

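QWenLMHeadModel does support image input, provided the input is passed as encoded special tokens, such as ... byte arrays (byte image).
The current problem is not that the model class lacks image support. It is that you passed an image= argument to .chat(), and that function does not accept image= as a parameter (it raises an error).
The official way to handle images is to save the image as a .... base64 or path-encoded string and pass it in as part of the prompt; the transformer module then recognizes and encodes it automatically.
So the problem was in how the vision input was handled from the start.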
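A quick way to catch this kind of mistake before calling a function is to inspect its signature. The sketch below is only an illustration: chat here is a hypothetical stand-in I wrote with a guessed parameter list, not the real method from modeling_qwen.py, and accepts_keyword is my own helper. Pointing accepts_keyword at the real bound method should work the same way.

```python
import inspect


def accepts_keyword(func, name: str) -> bool:
    """Return True if func can be called with name=... as a keyword argument."""
    params = inspect.signature(func).parameters
    # A **kwargs parameter accepts any keyword.
    if any(p.kind is inspect.Parameter.VAR_KEYWORD for p in params.values()):
        return True
    param = params.get(name)
    return param is not None and param.kind in (
        inspect.Parameter.POSITIONAL_OR_KEYWORD,
        inspect.Parameter.KEYWORD_ONLY,
    )


def chat(tokenizer, query, history=None):
    """Hypothetical stand-in for a chat method; NOT the real implementation."""
    return query, (history or [])


print("accepts image=:", accepts_keyword(chat, "image"))
print("accepts query=:", accepts_keyword(chat, "query"))
```

If the check comes back False for image, the image reference has to go into the prompt text itself, as described above.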
