fix: compatibility patches for newer transformers (4.50+ / 5.x)

#44

by TheeRealDude - opened 13 days ago

base: refs/heads/main ← from: refs/pr/44

Files changed: +11 -18

Summary

Compatibility patches for Florence-2's custom code, which newer transformers versions (4.50+ / 5.x) broke. Each fix is a minimal getattr- or wrapper-style change that preserves behavior on older transformers and avoids the crash on newer ones.

Changes

1. `configuration_florence2.py` — `forced_bos_token_id` access

Newer transformers no longer auto-creates forced_bos_token_id on PretrainedConfig subclasses, causing AttributeError during Florence2LanguageConfig.__init__. Wrap with getattr(self, "forced_bos_token_id", None) for safe access.
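A minimal sketch of the safe-access pattern, using plain Python classes rather than the real transformers config classes. `BaseConfigSketch` and `LanguageConfigSketch` are hypothetical stand-ins; the actual surrounding code in `configuration_florence2.py` is not shown here and may differ.

```python
class BaseConfigSketch:
    """Hypothetical base class that only stores the kwargs it receives.

    It mimics a base config that no longer pre-creates forced_bos_token_id.
    """

    def __init__(self, **kwargs):
        for key, value in kwargs.items():
            setattr(self, key, value)


class LanguageConfigSketch(BaseConfigSketch):
    """Hypothetical subclass that reads forced_bos_token_id during __init__."""

    def __init__(self, **kwargs):
        super().__init__(**kwargs)
        # A direct read (self.forced_bos_token_id) would raise AttributeError
        # when the base class never set the attribute. getattr with a default
        # returns None instead and keeps any value that was passed in.
        self.forced_bos_token_id = getattr(self, "forced_bos_token_id", None)


if __name__ == "__main__":
    print(LanguageConfigSketch().forced_bos_token_id)
    print(LanguageConfigSketch(forced_bos_token_id=2).forced_bos_token_id)
```

When the attribute already exists, as on older transformers, `getattr` returns it unchanged, so the old behavior is preserved.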

2. `modeling_florence2.py` — `torch.linspace().item()` on meta tensors

Newer transformers instantiates the model on the meta device first (lazy loading), then materializes weights. The DaViT vision tower's __init__ calls [x.item() for x in torch.linspace(...)] which fails on meta tensors. Replaced with pure-Python list comprehension that doesn't touch tensors during init.
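One way to compute the same evenly spaced values without creating a tensor is sketched below. The helper name `linspace_list` and the example arguments are assumptions for illustration; the PR's actual replacement may be written inline and differ in detail.

```python
def linspace_list(start, stop, steps):
    """Return `steps` evenly spaced floats from start to stop, inclusive.

    Pure Python, so it is safe to call while a model is being built
    on the meta device (no tensor is created, no .item() is called).
    """
    if steps <= 0:
        return []
    if steps == 1:
        return [float(start)]
    step = (stop - start) / (steps - 1)
    return [float(start + i * step) for i in range(steps)]


if __name__ == "__main__":
    # Example values only: a drop-path schedule over 4 blocks.
    print(linspace_list(0.0, 0.1, 4))
```

The results are Python floats in double precision, so they can differ from float32 tensor values in the last digits. For a per-block drop-path rate that difference should not matter in practice.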

3. `modeling_florence2.py` — `_supports_sdpa` / `_supports_flash_attn_2` properties

These were @property methods that delegated to self.language_model._supports_sdpa. But during super().__init__() (parent class init), self.language_model doesn't exist yet, causing AttributeError. Replaced with class-level attributes (_supports_sdpa = True, _supports_flash_attn_2 = False) since the support state is known at class definition.
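The failure mode and the fix can be reproduced with plain Python classes. Everything in this sketch is a hypothetical stand-in: `ParentSketch` plays the role of a parent class whose `__init__` reads the capability flag, and the two subclasses contrast the property version with the class-attribute version.

```python
class ParentSketch:
    """Hypothetical parent whose __init__ reads a capability flag."""

    def __init__(self):
        self.sdpa_seen_during_init = self._supports_sdpa


class LanguageModelSketch:
    _supports_sdpa = True


class PropertyVariant(ParentSketch):
    """Delegating property: breaks because language_model is set too late."""

    def __init__(self):
        super().__init__()  # reads _supports_sdpa before the next line runs
        self.language_model = LanguageModelSketch()

    @property
    def _supports_sdpa(self):
        return self.language_model._supports_sdpa


class ClassAttrVariant(ParentSketch):
    """Class-level flags: available before any instance attribute exists."""

    _supports_sdpa = True
    _supports_flash_attn_2 = False

    def __init__(self):
        super().__init__()
        self.language_model = LanguageModelSketch()


def construction_error(cls):
    """Return the AttributeError message raised by cls(), or None on success."""
    try:
        cls()
    except AttributeError as exc:
        return str(exc)
    return None


if __name__ == "__main__":
    print(construction_error(PropertyVariant))
    print(construction_error(ClassAttrVariant))
```

The trade-off is that class-level flags no longer follow the wrapped language model, which is acceptable when the support state is fixed, as the PR states.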

4. `processing_florence2.py` — `tokenizer.additional_special_tokens` access

Newer transformers RobertaTokenizer (and others) no longer expose additional_special_tokens as a direct attribute. Use getattr(tokenizer, 'additional_special_tokens', []) for safe access with empty-list fallback.
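The fallback can be sketched with two hypothetical tokenizer stand-ins, one that exposes the attribute and one that does not. The class names, the helper name, and the token strings are placeholders, not the real tokenizer classes or Florence-2's actual special tokens.

```python
class OldStyleTokenizerSketch:
    """Hypothetical tokenizer that still exposes the attribute."""

    additional_special_tokens = ["<extra_a>", "<extra_b>"]


class NewStyleTokenizerSketch:
    """Hypothetical tokenizer without the attribute."""


def read_additional_special_tokens(tokenizer):
    """Return the tokenizer's extra special tokens, or [] if it has none.

    The result is copied into a new list so callers can extend it
    without mutating the tokenizer's own state.
    """
    return list(getattr(tokenizer, "additional_special_tokens", []))


if __name__ == "__main__":
    print(read_additional_special_tokens(OldStyleTokenizerSketch()))
    print(read_additional_special_tokens(NewStyleTokenizerSketch()))
```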

Caveat / Known Limitation

These patches resolve the Python-level errors during model loading. There remains a CUDA-level segfault during the SDPA forward pass with transformers 5.x + torch 2.10+ — that's a deeper compatibility issue with the language model's SDPA attention path that this PR does not address. As a workaround, users on transformers 5.x can use attn_implementation="eager" to avoid the SDPA path entirely.
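A hedged sketch of the eager-attention workaround follows. It assumes the model is loaded through `AutoModelForCausalLM` with `trust_remote_code=True`; the helper names are illustrative, and whether this avoids the crash on a given setup is not verified here.

```python
MODEL_ID = "microsoft/Florence-2-large-ft"


def eager_load_kwargs():
    """Keyword arguments that request the eager attention implementation."""
    return {"trust_remote_code": True, "attn_implementation": "eager"}


def load_florence2_eager(model_id=MODEL_ID):
    """Load the model and processor with eager attention (downloads weights).

    transformers is imported lazily so that building the kwargs above
    does not require the library to be installed.
    """
    from transformers import AutoModelForCausalLM, AutoProcessor

    model = AutoModelForCausalLM.from_pretrained(model_id, **eager_load_kwargs())
    processor = AutoProcessor.from_pretrained(model_id, trust_remote_code=True)
    return model, processor


if __name__ == "__main__":
    model, processor = load_florence2_eager()
    print(type(model).__name__)
```

Eager attention is typically slower and uses more memory than SDPA, so this is a stopgap rather than a fix.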

For users on transformers <= 4.49, these four patches are sufficient: Florence-2 works correctly with the safer attribute access in place.

Tested with

transformers 4.49.0 + torch 2.11.0+cu128 + CUDA 12.8 — works
transformers 5.5.0 — Python errors fixed by these patches, but separate CUDA SDPA crash remains (not addressed here)

fix: compatibility patches for newer transformers (4.50+ / 5.x) (commit a452acba)

Ready to merge

This branch is ready to get merged automatically.
