Moondream2 LoRA v22: UI grounding

LoRA adapter for Moondream2 (revision 2025-06-21) fine-tuned for UI click localization. This is the latest iteration in the v* series; for the version used in the published end-to-end benchmarks see Khabner/moondream-lora-v12.

Hyperparameters

LoRA rank 16, alpha 32, dropout 0.05; 2 epochs, lr 1e-4, cosine decay with 10% warmup; bfloat16 (GPU/MPS), float32 (CPU).
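The learning-rate schedule above (peak 1e-4, cosine decay, 10% warmup) can be sketched as a plain function. This is an illustrative reconstruction from the stated hyperparameters, not the actual training code; the step-based parameterization and linear warmup shape are assumptions:

```python
import math

def lr_at(step, total_steps, base_lr=1e-4, warmup_frac=0.10):
    # Linear warmup over the first 10% of steps, then cosine decay to zero.
    warmup = max(1, int(total_steps * warmup_frac))
    if step < warmup:
        return base_lr * (step + 1) / warmup
    progress = (step - warmup) / max(1, total_steps - warmup)
    return 0.5 * base_lr * (1.0 + math.cos(math.pi * progress))
```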

Usage

import importlib, torch
from huggingface_hub import hf_hub_download
from transformers import AutoModelForCausalLM

model = AutoModelForCausalLM.from_pretrained(
    "vikhyatk/moondream2", revision="2025-06-21",
    trust_remote_code=True, torch_dtype=torch.bfloat16,
)
model.model._setup_caches()

# Download the adapter checkpoint: LoRA weights plus an optional coord-decoder head
ckpt = torch.load(hf_hub_download("Khabner/moondream-lora-v22", "adapter.pt"), map_location="cpu", weights_only=True)

def _nest(flat):
    """Rebuild a nested state-dict tree from dot-separated flat keys."""
    tree = {}
    for k, v in flat.items():
        d = tree
        for p in k.split(".")[:-1]:
            d = d.setdefault(p, {})
        d[k.split(".")[-1]] = v
    return tree

# Moondream's remote code loads variant weights through variant_state_dict,
# so patch that function to return the LoRA weights as a nested tree.
inner = model.model
pkg = inner.__class__.__module__.rsplit(".", 1)[0]
flat = {k: v.to(device=str(inner.device), dtype=torch.bfloat16) for k, v in ckpt["lora"].items()}
importlib.import_module(f"{pkg}.moondream").variant_state_dict = lambda *a, **kw: _nest(flat)

if "coord_decoder" in ckpt:
    cd = {k.removeprefix("coord_decoder."): v.to(device=str(inner.device), dtype=torch.bfloat16)
          for k, v in ckpt["coord_decoder"].items()}
    inner.region.coord_decoder.load_state_dict(cd)

model.eval()
# settings={"variant": "custom"} routes inference through the patched LoRA weights
result = model.model.point(image, "Send button", settings={"variant": "custom"})
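point() returns normalized coordinates, which need to be scaled to the screenshot's size before clicking. A minimal sketch, assuming the result has the shape {"points": [{"x": ..., "y": ...}]} with x/y in [0, 1] (moondream's usual point output; verify against your base-model revision):

```python
def to_pixels(result, width, height):
    # Map normalized [0, 1] point coordinates to integer pixel positions.
    return [(round(p["x"] * width), round(p["y"] * height))
            for p in result.get("points", [])]
```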

Full reference inference and serving code: github.com/VLM-WEBTEST/magnitude_integration.
