Florence-2-base LoRA v1 β€” UI grounding

PEFT LoRA adapter for Florence-2-base-ft (refs/pr/6), fine-tuned for UI click localization via the <CAPTION_TO_PHRASE_GROUNDING> task. Coordinates are returned as <loc_N> tokens; dividing each index N by 999 yields normalized [0, 1] coordinates.
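As a concrete example of the coordinate scheme: a token like <loc_500> maps to the normalized position 500/999, and multiplying by the screenshot's width or height recovers a pixel coordinate. A minimal sketch of that arithmetic (the helper name is illustrative, not part of the adapter):

```python
def loc_to_pixel(loc_x: int, loc_y: int, width: int, height: int) -> tuple[int, int]:
    """Map <loc_N> bin indices (0-999) to pixel coordinates on a screenshot."""
    return round(loc_x / 999 * width), round(loc_y / 999 * height)

# A <loc_500><loc_250> prediction on a 1280x720 screenshot:
print(loc_to_pixel(500, 250, 1280, 720))  # -> (641, 180)
```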

Benchmark results

  • 84% pass rate (42/50) on a 50-test end-to-end Magnitude suite across ~25 public sites β€” fastest among Florence variants when it succeeds.
  • Median best-pass time: 39.0s, mean 54.2s.
  • Regresses to 70% on the v22 iteration of the same training pipeline β€” see Khabner/florence-base-lora-v22.

Usage

import re

import torch
from peft import PeftModel
from PIL import Image
from transformers import AutoModelForCausalLM, AutoProcessor

base = AutoModelForCausalLM.from_pretrained(
    "microsoft/Florence-2-base-ft", trust_remote_code=True, revision="refs/pr/6",
)
processor = AutoProcessor.from_pretrained(
    "microsoft/Florence-2-base-ft", trust_remote_code=True, revision="refs/pr/6",
)
model = PeftModel.from_pretrained(base, "Khabner/florence-base-lora-v1").eval()

image = Image.open("screenshot.png").convert("RGB")
prompt = "<CAPTION_TO_PHRASE_GROUNDING>Send button"
inputs = processor(text=prompt, images=image, return_tensors="pt")
with torch.no_grad():
    generated = model.generate(**inputs, max_new_tokens=50, num_beams=1, do_sample=False)
text = processor.batch_decode(generated, skip_special_tokens=False)[0]

# Extract <loc_X><loc_Y> bin indices and divide by 999 for normalized coords
coords = [int(n) / 999 for n in re.findall(r"<loc_(\d+)>", text)]

Full reference inference and FastAPI serving code: github.com/VLM-WEBTEST/magnitude_integration.
