Fix a bug under Transformers v4.49.0, per https://huggingface.co/microsoft/Phi-3.5-vision-instruct/discussions/39/files; add a note to README.md indicating that this repo is a fork.

Files changed (2)

README.md CHANGED

@@ -19,6 +19,8 @@ library_name: transformers
 ---
 ## Model Summary
+n.b., This is a fork of [microsoft/Phi-3.5-vision-instruct](https://huggingface.co/microsoft/Phi-3.5-vision-instruct) that fixes the bug in modeling_phi3_v.py per [this Discussion here.](https://huggingface.co/microsoft/Phi-3.5-vision-instruct/discussions/39/files)
 Phi-3.5-vision is a lightweight, state-of-the-art open multimodal model built upon datasets which include - synthetic data and filtered publicly available websites - with a focus on very high-quality, reasoning dense data both on text and vision. The model belongs to the Phi-3 model family, and the multimodal version comes with 128K context length (in tokens) it can support. The model underwent a rigorous enhancement process, incorporating both supervised fine-tuning and direct preference optimization to ensure precise instruction adherence and robust safety measures.
 🏡 [Phi-3 Portal](https://azure.microsoft.com/en-us/products/phi-3) <br>

modeling_phi3_v.py CHANGED

@@ -1658,7 +1658,7 @@ class Phi3VForCausalLM(Phi3VPreTrainedModel):
             if isinstance(past_key_values, Cache):
                 cache_length = past_key_values.get_seq_length()
                 past_length = past_key_values.seen_tokens
-                max_cache_length = past_key_values.get_max_length()
+                max_cache_length = past_key_values.get_max_cache_shape()
             else:
                 cache_length = past_length = past_key_values[0][0].shape[2]
                 max_cache_length = None
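
The change above swaps `get_max_length()` for `get_max_cache_shape()` on the cache object. If the same modeling file needs to run against both older and newer Transformers releases, one possible approach is to look the method up by name and fall back. The sketch below is only an illustration: `max_cache_length_compat` is a hypothetical helper that is not part of `modeling_phi3_v.py`, and the two stand-in classes are not Transformers classes. It assumes the older method name may be missing on newer cache objects, as the fix implies.

```python
from typing import Any, Optional


def max_cache_length_compat(past_key_values: Any) -> Optional[int]:
    """Return the cache's maximum length, or None if neither accessor exists.

    Hypothetical helper: tries the newer method name first, then the older one.
    """
    for name in ("get_max_cache_shape", "get_max_length"):
        method = getattr(past_key_values, name, None)
        if callable(method):
            return method()
    return None


# Stand-in classes for illustration only; they mimic the two method names.
class NewStyleCache:
    def get_max_cache_shape(self) -> int:
        return 128


class OldStyleCache:
    def get_max_length(self) -> int:
        return 64


if __name__ == "__main__":
    print(max_cache_length_compat(NewStyleCache()))
    print(max_cache_length_compat(OldStyleCache()))
    print(max_cache_length_compat(object()))
```

With a helper like this, the patched line could read `max_cache_length = max_cache_length_compat(past_key_values)`. The direct rename shown in the diff is simpler when only recent Transformers versions need to be supported.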