fix: compatibility patches for newer transformers (4.50+ / 5.x)
Summary
Compatibility patches for newer transformers versions (4.50+ / 5.x) that broke Florence-2's custom code. Each fix is a minimal getattr/wrapper-style change that preserves behavior on older transformers while not crashing on newer ones.
Changes
1. configuration_florence2.py: forced_bos_token_id access
Newer transformers versions no longer auto-create forced_bos_token_id on PretrainedConfig subclasses, which causes an AttributeError during Florence2LanguageConfig.__init__. The attribute is now read with getattr(self, "forced_bos_token_id", None) for safe access.
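The guarded access can be illustrated without transformers installed. This is a minimal sketch, not the patched file: BaseConfig and LanguageConfig are hypothetical stand-ins, and the uses_forced_bos field exists only to make the result observable.

```python
class BaseConfig:
    """Hypothetical stand-in for a config base class that no longer
    creates forced_bos_token_id automatically."""

    def __init__(self, **kwargs):
        # Only attributes that were passed explicitly end up on the instance.
        for name, value in kwargs.items():
            setattr(self, name, value)


class LanguageConfig(BaseConfig):
    """Hypothetical subclass mirroring the guarded access described above."""

    def __init__(self, bos_token_id=0, **kwargs):
        super().__init__(bos_token_id=bos_token_id, **kwargs)
        # self.forced_bos_token_id would raise AttributeError when the
        # attribute was never set; getattr with a default returns None instead.
        forced = getattr(self, "forced_bos_token_id", None)
        self.uses_forced_bos = forced is not None
```

Because getattr falls back to None only when the attribute is missing, a base class that still sets forced_bos_token_id behaves exactly as before.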
2. modeling_florence2.py: torch.linspace().item() on meta tensors
Newer transformers versions instantiate the model on the meta device first (lazy loading) and then materialize the weights. The DaViT vision tower's __init__ calls [x.item() for x in torch.linspace(...)], which fails on meta tensors because they hold no data. Replaced with a pure-Python list comprehension that doesn't touch tensors during init.
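One way such a tensor-free replacement can look is sketched below. The helper name linspace_floats, and the depths and drop_path_rate values, are assumptions for illustration; the arguments to the original torch.linspace(...) call are elided in this description and are not reproduced here.

```python
def linspace_floats(start, stop, steps):
    """Return `steps` evenly spaced floats from start to stop, inclusive."""
    if steps <= 0:
        return []
    if steps == 1:
        return [float(start)]
    step = (stop - start) / (steps - 1)
    return [float(start + i * step) for i in range(steps)]


# Assumed usage: one rate per block, computed without creating any tensor.
depths = (1, 1, 9, 1)   # illustrative values only
drop_path_rate = 0.1    # illustrative value only
dpr = linspace_floats(0.0, drop_path_rate, sum(depths) * 2)
```

The results are Python floats in double precision, so they can differ from float32 torch.linspace values in the last few digits.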
3. modeling_florence2.py: _supports_sdpa / _supports_flash_attn_2 properties
These were @property methods that delegated to self.language_model._supports_sdpa. During super().__init__() (the parent class init), however, self.language_model doesn't exist yet, so the property access raises AttributeError. Replaced with class-level attributes (_supports_sdpa = True, _supports_flash_attn_2 = False), since the support state is known at class definition.
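The initialization-order problem can be reproduced with plain classes. In this sketch ParentModel, PropertyModel, and PatchedModel are hypothetical stand-ins, and the assumption is that the parent __init__ reads the capability flag before the subclass has assigned its submodules.

```python
class ParentModel:
    """Hypothetical parent whose __init__ reads a capability flag early."""

    _supports_sdpa = False

    def __init__(self):
        # Runs before the subclass has assigned any submodules.
        self.sdpa_checked = self._supports_sdpa


class PropertyModel(ParentModel):
    """Pre-patch shape: the flag is a property that needs a submodule."""

    def __init__(self):
        super().__init__()  # fails here: language_model is not set yet
        self.language_model = ParentModel()

    @property
    def _supports_sdpa(self):
        return self.language_model._supports_sdpa


class PatchedModel(ParentModel):
    """Post-patch shape: plain class attributes, readable at any time."""

    _supports_sdpa = True
    _supports_flash_attn_2 = False

    def __init__(self):
        super().__init__()
        self.language_model = ParentModel()


try:
    PropertyModel()
    property_failed = False
except AttributeError:
    property_failed = True
```

Class attributes are resolved on the class itself, so they are available no matter how early the parent reads them.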
4. processing_florence2.py: tokenizer.additional_special_tokens access
In newer transformers versions, RobertaTokenizer (and others) no longer expose additional_special_tokens as a direct attribute. Use getattr(tokenizer, 'additional_special_tokens', []) for safe access with an empty-list fallback.
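The empty-list fallback matters when the existing tokens are concatenated with new ones. A rough sketch, in which EXTRA_TOKENS, merged_special_tokens, and the SimpleNamespace tokenizers are hypothetical and stand in for the processor's own logic:

```python
from types import SimpleNamespace

# Hypothetical task tokens; the real processor defines its own list.
EXTRA_TOKENS = ["<od>", "</od>"]


def merged_special_tokens(tokenizer):
    """Combine a tokenizer's existing special tokens with EXTRA_TOKENS."""
    # Falls back to an empty list when the attribute is not exposed.
    existing = getattr(tokenizer, "additional_special_tokens", [])
    return list(existing) + EXTRA_TOKENS


old_style = SimpleNamespace(additional_special_tokens=["<custom>"])
new_style = SimpleNamespace()  # attribute absent, as described above
```

With the attribute present the existing tokens are preserved; with it absent the result is just the added tokens.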
Caveat / Known Limitation
These patches resolve the Python-level errors during model loading. A CUDA-level segfault remains during the SDPA forward pass with transformers 5.x + torch 2.10+. That is a deeper compatibility issue with the language model's SDPA attention path, which this PR does not address. As a workaround, users on transformers 5.x can load the model with attn_implementation="eager" to avoid the SDPA path entirely.
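A possible way to apply the eager-attention workaround is sketched below. The helper names are hypothetical, and loading through AutoModelForCausalLM is an assumption that this PR does not confirm; attn_implementation and trust_remote_code are existing from_pretrained arguments.

```python
MODEL_ID = "microsoft/Florence-2-large-ft"


def eager_load_kwargs():
    """Keyword arguments that request the eager attention implementation."""
    return {"trust_remote_code": True, "attn_implementation": "eager"}


def load_florence2_eager(model_id=MODEL_ID):
    """Load the model with eager attention (downloads weights on first call)."""
    # Imported lazily so this module can be imported without transformers.
    # Assumption: the checkpoint is loaded through AutoModelForCausalLM.
    from transformers import AutoModelForCausalLM

    return AutoModelForCausalLM.from_pretrained(model_id, **eager_load_kwargs())
```

Calling load_florence2_eager() requires transformers, torch, and network access to fetch the checkpoint.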
For users on transformers <= 4.49, these 4 patches are sufficient for Florence-2 to work correctly with newer attribute access patterns.
Tested with
- transformers 4.49.0 + torch 2.11.0+cu128 + CUDA 12.8: works
- transformers 5.5.0: Python errors fixed by these patches, but a separate CUDA SDPA crash remains (not addressed here)