Florence-2-large LoRA v1 — UI grounding

PEFT LoRA adapter for Florence-2-large-ft, fine-tuned for UI click localization via the <CAPTION_TO_PHRASE_GROUNDING> task.

Benchmark results — does not justify its cost

78% pass rate (39/50) on the 50-test Magnitude suite. This is worse than the smaller Khabner/florence-base-lora-v1 (84%), at 3–4× the inference cost.
Median best-pass time 40.0s, mean 61.8s.
Deterministic failure on Apple AirPods navigation (#5: no points returned in 4/4 runs). All other variants pass this test, so this is a narrow regression introduced specifically by the v1 LoRA training on Florence-2-large.
Other persistent failures: terse instructions (0/3 on Round 12), the Booking date picker, and structural failures shared across all models (Wikipedia history radio, inline [3]).

A larger backbone does not fix structural grounding weaknesses (controls in dense table rows, tiny inline links). Recommendation: prefer the base variant.

Usage

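from peft import PeftModel
from transformers import AutoModelForCausalLM, AutoProcessor

# Load the base checkpoint and its processor; Florence-2 ships custom modeling code, hence trust_remote_code=True.
base = AutoModelForCausalLM.from_pretrained("microsoft/Florence-2-large-ft", trust_remote_code=True)
processor = AutoProcessor.from_pretrained("microsoft/Florence-2-large-ft", trust_remote_code=True)
# Attach the LoRA adapter on top of the base model and switch to inference mode.
model = PeftModel.from_pretrained(base, "Khabner/florence-large-lora-v1").eval()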

Full inference + FastAPI server code: github.com/VLM-WEBTEST/magnitude_integration.
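For orientation, here is a minimal sketch of how a phrase-grounding result could be turned into a click point. It is not the code from the linked repository. It assumes the parsed output has the shape Florence-2's `processor.post_process_generation(...)` returns for this task (a dict keyed by the task prompt, holding `bboxes` as `[x1, y1, x2, y2]` in pixels plus `labels`), and it assumes that clicking the center of the first box is acceptable. The helper names `bbox_center` and `first_click_point` are hypothetical.

```python
from typing import Dict, Optional, Sequence, Tuple

TASK = "<CAPTION_TO_PHRASE_GROUNDING>"


def bbox_center(bbox: Sequence[float]) -> Tuple[float, float]:
    """Return the center (x, y) of an [x1, y1, x2, y2] pixel box."""
    x1, y1, x2, y2 = bbox
    return ((x1 + x2) / 2.0, (y1 + y2) / 2.0)


def first_click_point(parsed: Dict, task: str = TASK) -> Optional[Tuple[float, float]]:
    """Pick a click point from a parsed grounding result.

    Assumption: `parsed` looks like {task: {"bboxes": [...], "labels": [...]}}.
    Returns None when no box was predicted.
    """
    bboxes = parsed.get(task, {}).get("bboxes", [])
    if not bboxes:
        return None
    # Assumption: the first box is the best match for the instruction.
    return bbox_center(bboxes[0])


# Hand-written example input, not real model output.
example = {TASK: {"bboxes": [[100.0, 40.0, 180.0, 60.0]], "labels": ["Sign in button"]}}
print(first_click_point(example))  # (140.0, 50.0)
print(first_click_point({}))       # None
```

Returning `None` for an empty result keeps the "no box predicted" case explicit, so a caller can retry or report the failure instead of clicking an arbitrary location.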
