fix(chat_template): Emit multimodal placeholders in tool response content-parts

### What does this PR do?

→ When a tool message contains multimodal content parts (e.g. `[{"type": "text", ...}, {"type": "image"}]`), the template extracts only the text parts and silently drops the image/audio/video placeholders. This causes downstream multimodal processors (e.g. vLLM) to fail with:

`Failed to apply prompt replacement for mm_items['image'][0]`

→ Images in user messages work fine because the `captured_content` block handles all content types. The tool message branch was missing the same handling 🤗
→ Bug reported in: https://github.com/vllm-project/vllm/issues/41452

### Fixed by?

→ After rendering the tool response text block, emit `<|image|>`, `<|audio|>`, and `<|video|>` placeholders for any multimodal parts in the content array. This matches the pattern already used for regular message content in the `captured_content` block later in the template.
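To make the before/after behavior concrete, here is a minimal Python sketch that mirrors the loop logic of the fix. It is not the template itself: the function names, the `PLACEHOLDERS` table, and the sample `tool_body` (including its `"text"` value) are hypothetical, and the `format_tool_response_block` wrapper around the text is left out for brevity.

```python
# Hypothetical Python mirror of the template logic; names are illustrative only.
PLACEHOLDERS = {"image": "<|image|>", "audio": "<|audio|>", "video": "<|video|>"}


def text_only(tool_body):
    """Pre-fix behavior: keep text parts, ignore every other part type."""
    return "".join(p.get("text", "") for p in tool_body if p.get("type") == "text")


def multimodal_placeholders(tool_body):
    """Post-fix addition: one placeholder per image/audio/video part, in order."""
    return "".join(PLACEHOLDERS.get(p.get("type"), "") for p in tool_body)


# Assumed sample input: one text part followed by one image part.
tool_body = [{"type": "text", "text": "screenshot taken"}, {"type": "image"}]

before = text_only(tool_body)
after = text_only(tool_body) + multimodal_placeholders(tool_body)

print(repr(before))  # 'screenshot taken'
print(repr(after))   # 'screenshot taken<|image|>'
```

With the placeholder present, the number of `<|image|>` tokens in the rendered prompt matches the number of image parts in the tool message, which is presumably what the downstream prompt replacement step needs in order to succeed.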

chat_template.jinja +9 -0

@@ -295,6 +295,15 @@
                                 {%- endif -%}
                             {%- endfor -%}
                             {{- format_tool_response_block(ns_tname.name, ns_txt.s) -}}
+                            {%- for part in tool_body -%}
+                                {%- if part.get('type') == 'image' -%}
+                                    {{- '<|image|>' -}}
+                                {%- elif part.get('type') == 'audio' -%}
+                                    {{- '<|audio|>' -}}
+                                {%- elif part.get('type') == 'video' -%}
+                                    {{- '<|video|>' -}}
+                                {%- endif -%}
+                            {%- endfor -%}
                         {%- else -%}
                             {{- format_tool_response_block(ns_tname.name, tool_body) -}}
                         {%- endif -%}