fix: compatibility patches for newer transformers (4.50+ / 5.x)
Summary
Compatibility patches for newer transformers versions (4.50+ / 5.x) that broke Florence-2's custom code. Each fix is a minimal getattr/wrapper-style change that preserves behavior on older transformers while not crashing on newer ones.
Changes
1. configuration_florence2.py: forced_bos_token_id access
Newer transformers no longer auto-creates forced_bos_token_id on PretrainedConfig subclasses, causing AttributeError during Florence2LanguageConfig.__init__. Wrap with getattr(self, "forced_bos_token_id", None) for safe access.
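A minimal sketch of the pattern, assuming the access happens inside Florence2LanguageConfig.__init__ (the class body is trimmed here and the exact surrounding code in configuration_florence2.py may differ):

```python
from transformers import PretrainedConfig

class Florence2LanguageConfig(PretrainedConfig):  # trimmed stand-in for the real class
    model_type = "florence2_language"

    def __init__(self, **kwargs):
        super().__init__(**kwargs)
        # Before (AttributeError on 4.50+ / 5.x when the attribute was never set):
        #     forced_bos_token_id = self.forced_bos_token_id
        # After: safe access; same value when present, None otherwise
        self.forced_bos_token_id = getattr(self, "forced_bos_token_id", None)
```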
2. modeling_florence2.py: torch.linspace().item() on meta tensors
Newer transformers instantiates the model on the meta device first (lazy loading), then materializes the weights. The DaViT vision tower's __init__ calls [x.item() for x in torch.linspace(...)], which fails on meta tensors. Replaced with a pure-Python list comprehension that doesn't touch any tensors during __init__.
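A sketch of the change, assuming the schedule is the usual evenly spaced stochastic-depth (drop path) ramp; the variable names (depths, drop_path_rate, dpr) are illustrative and may differ from the actual file:

```python
# modeling_florence2.py, DaViT vision tower __init__ (sketch)
depths = (1, 1, 9, 1)      # example per-stage block counts
drop_path_rate = 0.1

# Before: fails when the model is first built on the meta device,
# because .item() cannot read data out of a meta tensor:
#     dpr = [x.item() for x in torch.linspace(0, drop_path_rate, sum(depths))]

# After: the same evenly spaced schedule, computed without creating any tensor
steps = sum(depths)
dpr = [drop_path_rate * i / max(steps - 1, 1) for i in range(steps)]
```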
3. modeling_florence2.py: _supports_sdpa / _supports_flash_attn_2 properties
These were @property methods that delegated to self.language_model._supports_sdpa. But during super().__init__() (parent class init), self.language_model doesn't exist yet, causing AttributeError. Replaced with class-level attributes (_supports_sdpa = True, _supports_flash_attn_2 = False) since the support state is known at class definition.
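A sketch of the pattern; the class name below follows the upstream Florence-2 naming, and everything unrelated in the class body is omitted:

```python
# modeling_florence2.py (sketch; unrelated class contents omitted)
from transformers import PreTrainedModel

class Florence2PreTrainedModel(PreTrainedModel):
    # Before: @property methods returning self.language_model._supports_sdpa,
    # which raises AttributeError inside super().__init__() because
    # self.language_model has not been assigned yet.
    #
    # After: plain class-level attributes; the support matrix is fixed per class,
    # so nothing needs to be looked up on the not-yet-created submodule.
    _supports_sdpa = True
    _supports_flash_attn_2 = False
```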
4. processing_florence2.py: tokenizer.additional_special_tokens access
Newer transformers RobertaTokenizer (and others) no longer expose additional_special_tokens as a direct attribute. Use getattr(tokenizer, 'additional_special_tokens', []) for safe access with empty-list fallback.
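A minimal sketch of the access pattern; the roberta-base checkpoint is only a stand-in for the tokenizer the Florence-2 processor actually wraps:

```python
# processing_florence2.py (sketch)
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("roberta-base")  # stand-in tokenizer

# Before (breaks when the attribute is no longer exposed directly):
#     extra = tokenizer.additional_special_tokens

# After: safe access with an empty-list fallback; identical result where the
# attribute still exists, an empty list where it does not
extra = getattr(tokenizer, "additional_special_tokens", [])
print(extra)
```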
Caveat / Known Limitation
These patches resolve the Python-level errors during model loading. A CUDA-level segfault remains in the SDPA forward pass with transformers 5.x + torch 2.10+; that is a deeper compatibility issue in the language model's SDPA attention path that this PR does not address. As a workaround, users on transformers 5.x can pass attn_implementation="eager" to avoid the SDPA path entirely.
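A loading sketch of that workaround; the checkpoint id is the public microsoft/Florence-2-large and the kwargs follow the standard trust_remote_code loading pattern, so adjust to your setup:

```python
# Workaround for transformers 5.x: force eager attention when loading
from transformers import AutoModelForCausalLM, AutoProcessor

model_id = "microsoft/Florence-2-large"  # any Florence-2 checkpoint

model = AutoModelForCausalLM.from_pretrained(
    model_id,
    trust_remote_code=True,          # Florence-2 ships this custom code
    attn_implementation="eager",     # skip the SDPA forward pass that crashes
)
processor = AutoProcessor.from_pretrained(model_id, trust_remote_code=True)
```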
For users on transformers <= 4.49, these four patches are sufficient: Florence-2 continues to work correctly, and the getattr-based access is fully backward compatible.
Tested with
- transformers 4.49.0 + torch 2.11.0+cu128 + CUDA 12.8: works
- transformers 5.5.0: Python errors fixed by these patches, but the separate CUDA SDPA crash remains (not addressed here)