metadata
library_name: nanofresh
license: mit
pipeline_tag: image-text-to-text
tags:
  - vision-language
  - multimodal
  - prefixvlm
  - dual-tower
  - kv-transport

PrefixVLM

PrefixVLM is a vision-language architecture built from two language models, with one-pass K/V transport from the left model to the right:

  • Left LM (left_lm) builds image-conditioned memory.
  • Right LM (right_lm) performs continuation.
  • An optional K/V bridge transforms the left LM's K/V before transport.
  • transport_mode supports image (only the visual-token K/V is transported) and full (K/V for the full sequence is transported).
  • An optional gate (use_gate=True) applies feature-wise modulation, sigmoid(Wx), where W is a linear map with no bias term.
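
The transport selection and the gate described above can be illustrated with a small pure-Python sketch. It is not the PrefixVLM implementation: the function names, the is_image mask, the treatment of transport_mode values as the strings "image" and "full", and the assumption that the gate multiplies each feature by sigmoid(Wx) elementwise are all hypothetical choices made for illustration.

```python
import math


def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))


def gate(x, W):
    """Bias-free feature-wise gate (assumed form): y_i = x_i * sigmoid((W x)_i)."""
    out = []
    for i, row in enumerate(W):
        pre = sum(w * xj for w, xj in zip(row, x))  # (W x)_i, no bias added
        out.append(x[i] * sigmoid(pre))
    return out


def select_for_transport(left_entries, is_image, transport_mode):
    """Pick which left-LM positions are handed to the right LM.

    left_entries: one K/V entry per left-LM position (any objects here).
    is_image: hypothetical boolean mask marking visual-token positions.
    """
    if transport_mode == "full":
        return list(left_entries)
    if transport_mode == "image":
        return [e for e, m in zip(left_entries, is_image) if m]
    raise ValueError(f"unknown transport_mode: {transport_mode!r}")


entries = ["img0", "img1", "txt0"]
mask = [True, True, False]
image_only = select_for_transport(entries, mask, "image")
everything = select_for_transport(entries, mask, "full")

# With W = 0, sigmoid(Wx) is 0.5 for every feature, so the gate halves x.
halved = gate([2.0, -4.0], [[0.0, 0.0], [0.0, 0.0]])
```

With a zero W the gate scales every feature by exactly 0.5; a learned W lets each feature be passed or suppressed depending on the input, without the constant offset a bias would add.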

Load

from models.prefix_vlm import PrefixVLM
model = PrefixVLM.from_pretrained("patrickamadeus/dt-all-image-400")

Build from backbone configs

from models.config import VLMConfig
from models.prefix_vlm import PrefixVLM

cfg = VLMConfig()
model = PrefixVLM(cfg=cfg, load_backbone=True, bridge_mode="linear", bridge_use_gate=True)