Moondream2 LoRA v22 – UI grounding
LoRA adapter for Moondream2 (revision 2025-06-21) fine-tuned for UI click localization. This is the latest iteration in the v* series; for the version used in the published end-to-end benchmarks see Khabner/moondream-lora-v12.
Hyperparameters
LoRA rank 16, alpha 32, dropout 0.05; 2 epochs, lr 1e-4, cosine decay with 10% warmup; bfloat16 (GPU/MPS), float32 (CPU).
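The rank/alpha pair sets the scale of the adapter update: a LoRA layer adds a low-rank delta to the frozen base weight, scaled by alpha/rank (here 32/16 = 2.0). A minimal sketch of that mechanism, with hypothetical layer shapes, not this adapter's actual module layout:

```python
import torch
import torch.nn as nn

class LoRALinear(nn.Module):
    """Frozen base linear plus a trainable low-rank update, scaled by alpha/rank."""
    def __init__(self, base: nn.Linear, rank: int = 16, alpha: int = 32, dropout: float = 0.05):
        super().__init__()
        self.base = base
        for p in self.base.parameters():
            p.requires_grad = False  # base weights stay frozen during fine-tuning
        self.lora_a = nn.Linear(base.in_features, rank, bias=False)   # down-projection
        self.lora_b = nn.Linear(rank, base.out_features, bias=False)  # up-projection
        nn.init.zeros_(self.lora_b.weight)  # zero-init B: adapter starts as a no-op
        self.scale = alpha / rank           # 32 / 16 = 2.0
        self.dropout = nn.Dropout(dropout)

    def forward(self, x):
        return self.base(x) + self.scale * self.lora_b(self.lora_a(self.dropout(x)))

layer = LoRALinear(nn.Linear(64, 64))
out = layer(torch.randn(2, 64))
print(out.shape)  # torch.Size([2, 64])
```

Because B is zero-initialized, the adapted layer reproduces the base layer exactly at step 0; training only ever moves the output through the low-rank path.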
Usage
import importlib, torch
from huggingface_hub import hf_hub_download
from transformers import AutoModelForCausalLM
model = AutoModelForCausalLM.from_pretrained(
"vikhyatk/moondream2", revision="2025-06-21",
trust_remote_code=True, torch_dtype=torch.bfloat16,
)
model.model._setup_caches()
ckpt = torch.load(hf_hub_download("Khabner/moondream-lora-v22", "adapter.pt"), map_location="cpu", weights_only=True)
def _nest(flat):
    """Rebuild a nested state-dict tree from dot-separated flat keys."""
    tree = {}
    for k, v in flat.items():
        d = tree
        for p in k.split(".")[:-1]:
            d = d.setdefault(p, {})
        d[k.split(".")[-1]] = v
    return tree
# Locate the inner Moondream module and monkey-patch its variant loader so that
# settings={"variant": "custom"} resolves to the LoRA weights loaded above.
inner = model.model
pkg = inner.__class__.__module__.rsplit(".", 1)[0]
flat = {k: v.to(device=str(inner.device), dtype=torch.bfloat16) for k, v in ckpt["lora"].items()}
importlib.import_module(f"{pkg}.moondream").variant_state_dict = lambda *a, **kw: _nest(flat)
if "coord_decoder" in ckpt:
cd = {k.removeprefix("coord_decoder."): v.to(device=str(inner.device), dtype=torch.bfloat16)
for k, v in ckpt["coord_decoder"].items()}
inner.region.coord_decoder.load_state_dict(cd)
model.eval()
# image: a PIL.Image of the UI screenshot to ground against
result = model.model.point(image, "Send button", settings={"variant": "custom"})
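The `point` call predicts coordinates normalized to [0, 1]; to drive a click you scale them by the screenshot size. A small sketch, assuming a result of the form `{"points": [{"x": ..., "y": ...}]}` (verify against the Moondream revision you load):

```python
def to_pixels(result: dict, width: int, height: int) -> list[tuple[int, int]]:
    """Convert normalized point predictions to integer pixel coordinates."""
    return [(round(p["x"] * width), round(p["y"] * height))
            for p in result.get("points", [])]

# e.g. a 1920x1080 screenshot with one predicted point
clicks = to_pixels({"points": [{"x": 0.5, "y": 0.25}]}, 1920, 1080)
print(clicks)  # [(960, 270)]
```

Rounding to integers is a choice; automation backends that accept float coordinates can skip it.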
Full reference inference and serving code: github.com/VLM-WEBTEST/magnitude_integration.