Florence-2-base LoRA v1: UI grounding
A PEFT LoRA adapter for microsoft/Florence-2-base-ft (revision refs/pr/6), fine-tuned for UI click localization via the <CAPTION_TO_PHRASE_GROUNDING> task. Coordinates are returned as <loc_N> tokens; dividing N by 999 gives normalized [0, 1] coordinates.
Benchmark results
- 84% pass rate (42/50) on a 50-test end-to-end Magnitude suite across ~25 public sites; the fastest among Florence variants when it succeeds.
- Median best-pass time: 39.0s; mean: 54.2s.
- The v22 iteration of the same training pipeline regresses to 70%; see Khabner/florence-base-lora-v22.
Usage
import torch
from PIL import Image
from peft import PeftModel
from transformers import AutoModelForCausalLM, AutoProcessor

base = AutoModelForCausalLM.from_pretrained(
    "microsoft/Florence-2-base-ft", trust_remote_code=True, revision="refs/pr/6",
)
processor = AutoProcessor.from_pretrained(
    "microsoft/Florence-2-base-ft", trust_remote_code=True, revision="refs/pr/6",
)
model = PeftModel.from_pretrained(base, "Khabner/florence-base-lora-v1").eval()

image = Image.open("screenshot.png").convert("RGB")  # any UI screenshot
prompt = "<CAPTION_TO_PHRASE_GROUNDING>Send button"
inputs = processor(text=prompt, images=image, return_tensors="pt")
with torch.no_grad():
    generated = model.generate(**inputs, max_new_tokens=50, num_beams=1, do_sample=False)
text = processor.batch_decode(generated, skip_special_tokens=False)[0]
# Parse <loc_X><loc_Y> from `text` and divide by 999 for normalized coordinates
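A regular expression is enough to pull the location tokens out of the decoded text. A minimal sketch, assuming the first <loc_X><loc_Y> pair is the click point; parse_point is an illustrative helper, not part of the adapter:

import re

def parse_point(text, width, height):
    # Collect all <loc_N> tokens in generation order
    locs = [int(n) for n in re.findall(r"<loc_(\d+)>", text)]
    if len(locs) < 2:
        return None  # model produced no usable location
    # First pair is (x, y); divide by 999 to normalize, then scale to pixels
    x_norm, y_norm = locs[0] / 999, locs[1] / 999
    return round(x_norm * width), round(y_norm * height)

point = parse_point(text, *image.size)  # image.size is (width, height)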
Full reference inference and FastAPI serving code: github.com/VLM-WEBTEST/magnitude_integration.
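For orientation only, here is a rough sketch of what serving could look like, reusing model, processor, and parse_point from the snippets above. The /locate route and request shape are assumptions, not the repo's actual API; see the linked repo for the real serving code.

import io

import torch
from fastapi import FastAPI, File, Form, UploadFile
from PIL import Image

app = FastAPI()

@app.post("/locate")  # hypothetical route name
async def locate(target: str = Form(...), screenshot: UploadFile = File(...)):
    # Form/File parsing requires the python-multipart package
    image = Image.open(io.BytesIO(await screenshot.read())).convert("RGB")
    inputs = processor(
        text=f"<CAPTION_TO_PHRASE_GROUNDING>{target}", images=image, return_tensors="pt"
    )
    with torch.no_grad():
        generated = model.generate(**inputs, max_new_tokens=50, num_beams=1, do_sample=False)
    text = processor.batch_decode(generated, skip_special_tokens=False)[0]
    point = parse_point(text, *image.size)
    if point is None:
        return {"found": False}
    return {"found": True, "x": point[0], "y": point[1]}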