fix: compatibility patches for newer transformers (4.50+ / 5.x)

#44 opened by TheeRealDude

Summary

Compatibility patches for newer transformers versions (4.50+ / 5.x) that broke Florence-2's custom code. Each fix is a minimal getattr- or wrapper-style change that preserves behavior on older transformers while avoiding crashes on newer ones.

Changes

1. configuration_florence2.py: forced_bos_token_id access

Newer transformers no longer auto-creates forced_bos_token_id on PretrainedConfig subclasses, so Florence2LanguageConfig.__init__ raises an AttributeError when it reads the attribute. The patch wraps the read in getattr(self, "forced_bos_token_id", None) for safe access.
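A minimal sketch of the patched access pattern; DemoConfig is an illustrative stand-in for Florence2LanguageConfig, not the actual class:

```python
class DemoConfig:
    """Illustrative stand-in for Florence2LanguageConfig."""

    def __init__(self, **kwargs):
        # Old: value = self.forced_bos_token_id
        #   -> AttributeError on transformers 4.50+/5.x, which no longer
        #      pre-populates the attribute on PretrainedConfig subclasses.
        # Patched: read it defensively with a None fallback.
        value = getattr(self, "forced_bos_token_id", None)
        self.forced_bos_token_id = kwargs.pop("forced_bos_token_id", value)
```

The same object works whether or not the caller supplies the token id, which is the whole point of the getattr fallback.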

2. modeling_florence2.py: torch.linspace().item() on meta tensors

Newer transformers instantiates the model on the meta device first (lazy loading), then materializes weights. The DaViT vision tower's __init__ calls [x.item() for x in torch.linspace(...)], which fails on meta tensors because .item() cannot read data from them. Replaced it with a pure-Python list comprehension that computes the same values without touching tensors during init.
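A sketch of the replacement, assuming the linspace produces an evenly spaced schedule of per-block rates (the function name and signature here are illustrative, not the actual Florence-2 code):

```python
def drop_path_rates(drop_rate, depth):
    """Pure-Python replacement for
    [x.item() for x in torch.linspace(0, drop_rate, depth)].

    Computes the same evenly spaced values without creating a tensor,
    so it is safe to call while the model lives on the meta device.
    """
    if depth <= 1:
        return [0.0]
    return [drop_rate * i / (depth - 1) for i in range(depth)]
```

Because no tensor is created, the meta-device pass through __init__ never reaches a data-dependent call.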

3. modeling_florence2.py: _supports_sdpa / _supports_flash_attn_2 properties

These were @property methods that delegated to self.language_model._supports_sdpa. But during super().__init__() (the parent class init), self.language_model doesn't exist yet, causing an AttributeError. Replaced them with class-level attributes (_supports_sdpa = True, _supports_flash_attn_2 = False), since the support state is already known at class-definition time.
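The before/after shape of the change, sketched on a stand-in class (not the actual Florence2 model class):

```python
class DemoModel:
    # Old (illustrative): a @property delegating to
    # self.language_model._supports_sdpa crashed during the parent
    # class's __init__, before self.language_model was assigned.
    #
    # Patched: the support state is fixed at class-definition time,
    # so plain class attributes are readable at any point during init.
    _supports_sdpa = True
    _supports_flash_attn_2 = False
```

Class attributes are resolvable on both the class and any half-constructed instance, which is what the parent __init__ needs.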

4. processing_florence2.py: tokenizer.additional_special_tokens access

In newer transformers, RobertaTokenizer (and other tokenizers) no longer exposes additional_special_tokens as a direct attribute. The patch uses getattr(tokenizer, 'additional_special_tokens', []) for safe access with an empty-list fallback.
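The same defensive-access idiom, sketched with a minimal stand-in tokenizer (illustrative only):

```python
class MinimalTokenizer:
    """Stand-in for a tokenizer that no longer exposes
    additional_special_tokens as an attribute."""
    pass

tok = MinimalTokenizer()
# Old: tokens = tok.additional_special_tokens   -> AttributeError
# Patched: empty-list fallback keeps any downstream iteration working.
tokens = getattr(tok, "additional_special_tokens", [])
```

Downstream code that loops over the special tokens simply sees an empty list instead of crashing.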

Caveat / Known Limitation

These patches resolve the Python-level errors during model loading. There remains a CUDA-level segfault during the SDPA forward pass with transformers 5.x and torch 2.10+; that is a deeper compatibility issue in the language model's SDPA attention path that this PR does not address. As a workaround, users on transformers 5.x can pass attn_implementation="eager" to avoid the SDPA path entirely.
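A hedged usage sketch of the workaround; the model id and the exact from_pretrained call are illustrative and not verified against a specific environment:

```python
# Keyword arguments one might pass to from_pretrained on transformers 5.x
# to sidestep the SDPA crash (illustrative only).
load_kwargs = {
    "trust_remote_code": True,       # Florence-2 ships custom modeling code
    "attn_implementation": "eager",  # bypass SDPA to avoid the CUDA segfault
}

# from transformers import AutoModelForCausalLM
# model = AutoModelForCausalLM.from_pretrained(
#     "microsoft/Florence-2-large", **load_kwargs)
```

Eager attention is slower than SDPA but does not go through the code path that crashes.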

For users on transformers <= 4.49, these four patches are sufficient for Florence-2 to work correctly with the newer attribute-access patterns.

Tested with

  • transformers 4.49.0 + torch 2.11.0+cu128 + CUDA 12.8: works
  • transformers 5.5.0: Python errors fixed by these patches; a separate CUDA SDPA crash remains (not addressed here)
