Florence-2-large LoRA v1: UI grounding

PEFT LoRA adapter for Florence-2-large-ft, fine-tuned for UI click localization via the <CAPTION_TO_PHRASE_GROUNDING> task.

Benchmark results: does not justify its cost

  • 78% pass rate (39/50) on the 50-test Magnitude suite: worse than the smaller Khabner/florence-base-lora-v1 (84%), at 3–4× the inference cost.
  • Median best-pass time 40.0s, mean 61.8s.
  • Deterministic failure on Apple AirPods navigation (#5, no points returned in 4/4 runs); all other variants pass this test. This is a narrow regression specific to the v1 LoRA training on Florence-2-large.
  • Other persistent failures: terse instructions (0/3 on Round 12), the Booking date picker, and structural failures shared across all models (Wikipedia history radio, inline [3]).

A larger backbone does not fix structural grounding weaknesses (controls in dense table rows, tiny inline links). Recommendation: prefer the base variant.
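The cost argument above can be made concrete with a little arithmetic. The sketch below is only illustrative: it treats the base variant's inference cost as 1 unit and the large variant's as 3 to 4 units (the range quoted above), and the helper names `pass_rate` and `rate_per_cost` are hypothetical, not part of any benchmark tooling.

```python
def pass_rate(passed: int, total: int) -> float:
    """Fraction of tests passed."""
    return passed / total


def rate_per_cost(rate: float, relative_cost: float) -> float:
    """Pass rate divided by inference cost relative to the base variant."""
    return rate / relative_cost


large_rate = pass_rate(39, 50)  # reported for this adapter: 39/50
base_rate = 0.84                # reported for Khabner/florence-base-lora-v1

# Assumption: base cost = 1 unit, large cost = 3 to 4 units.
for large_cost in (3.0, 4.0):
    print(
        f"large at {large_cost:.0f}x cost: "
        f"base {rate_per_cost(base_rate, 1.0):.2f} vs "
        f"large {rate_per_cost(large_rate, large_cost):.2f} pass rate per unit cost"
    )
```

Because the large variant has both a lower pass rate and a higher cost, it loses this comparison at any point in the quoted cost range.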

Usage

from peft import PeftModel
from transformers import AutoModelForCausalLM, AutoProcessor

# Load the base model and processor; Florence-2 ships custom modeling code, hence trust_remote_code=True.
base = AutoModelForCausalLM.from_pretrained("microsoft/Florence-2-large-ft", trust_remote_code=True)
processor = AutoProcessor.from_pretrained("microsoft/Florence-2-large-ft", trust_remote_code=True)
# Attach the LoRA adapter to the base model and switch to inference mode.
model = PeftModel.from_pretrained(base, "Khabner/florence-large-lora-v1").eval()

Full inference + FastAPI server code: github.com/VLM-WEBTEST/magnitude_integration.
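For a quick local check, grounding a phrase and turning the result into a click point might look roughly like the sketch below. It follows the generation pattern from the upstream Florence-2 model card (`processor(...)`, `model.generate(...)`, `processor.post_process_generation(...)`) and assumes `model` and `processor` are loaded as shown under Usage, on CPU with default dtype. The helper names `ground_phrase` and `click_point`, and the choice to click the center of the first returned box, are assumptions made here for illustration; the linked repository may do this differently.

```python
TASK = "<CAPTION_TO_PHRASE_GROUNDING>"


def ground_phrase(model, processor, image, phrase, max_new_tokens=256):
    """Run phrase grounding on a PIL image and return the parsed result dict."""
    inputs = processor(text=TASK + phrase, images=image, return_tensors="pt")
    generated_ids = model.generate(
        input_ids=inputs["input_ids"],
        pixel_values=inputs["pixel_values"],
        max_new_tokens=max_new_tokens,
        num_beams=3,
    )
    # Keep special tokens: the post-processor parses location tokens from them.
    text = processor.batch_decode(generated_ids, skip_special_tokens=False)[0]
    return processor.post_process_generation(
        text, task=TASK, image_size=(image.width, image.height)
    )


def click_point(parsed, task=TASK):
    """Return the (x, y) center of the first box, or None if no box was found.

    Assumption: the first box is the one to click.
    """
    bboxes = parsed.get(task, {}).get("bboxes", [])
    if not bboxes:
        return None  # the "no points" case described in the benchmark notes
    x1, y1, x2, y2 = bboxes[0]
    return ((x1 + x2) / 2, (y1 + y2) / 2)


# Hypothetical usage ("screenshot.png" and the phrase are placeholders):
# from PIL import Image
# image = Image.open("screenshot.png").convert("RGB")
# point = click_point(ground_phrase(model, processor, image, "Sign in button"))
```

Returning `None` rather than raising keeps the "no points" outcome explicit, so a caller can retry or count it as a failed run.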
