metadata
library_name: nanofresh
license: mit
pipeline_tag: image-text-to-text
tags:
- vision-language
- multimodal
- prefixvlm
- dual-tower
- kv-transport
PrefixVLM
PrefixVLM is a dual-language-model vision-language architecture with one-pass K/V transport:
- Left LM (`left_lm`) builds image-conditioned memory.
- Right LM (`right_lm`) performs continuation.
- Optional K/V bridge transforms left K/V before transport.
- `transport_mode` supports `image` (visual-token transport) and `full` (full-sequence transport).
- Optional gate (`use_gate=True`) applies feature-wise modulation: `sigmoid(Wx)` (no bias).
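
The transport and gating steps listed above can be illustrated with a small NumPy sketch. It is not the PrefixVLM implementation. The helper names (`select_transport`, `bridge`), the toy shapes, the boolean `image_mask`, and the choice to multiply the bridged output by the gate are all assumptions made for illustration. The model card only states that the gate is `sigmoid(Wx)` with no bias and that `transport_mode` is either `image` or `full`.

```python
import numpy as np

def select_transport(kv, image_mask, transport_mode):
    """Pick which left-LM positions are handed to the right LM.

    kv: array of shape (seq_len, dim), standing in for one layer's keys or values.
    image_mask: boolean array of shape (seq_len,), True at visual-token positions.
    Both arguments are hypothetical stand-ins, not the library's real tensors.
    """
    if transport_mode == "image":
        return kv[image_mask]   # visual-token transport
    if transport_mode == "full":
        return kv               # full-sequence transport
    raise ValueError(f"unknown transport_mode: {transport_mode!r}")

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def bridge(kv, w_bridge, w_gate=None):
    """Linear bridge with an optional bias-free sigmoid gate.

    Assumption: the gate is computed from the bridge input and multiplies
    the bridge output feature by feature.
    """
    out = kv @ w_bridge
    if w_gate is not None:
        out = out * sigmoid(kv @ w_gate)  # sigmoid(Wx), no bias term
    return out

rng = np.random.default_rng(0)
seq_len, dim = 6, 4
left_k = rng.normal(size=(seq_len, dim))
image_mask = np.array([True, True, True, False, False, False])

k_image = select_transport(left_k, image_mask, "image")
k_full = select_transport(left_k, image_mask, "full")

w_bridge = rng.normal(size=(dim, dim))
w_gate = rng.normal(size=(dim, dim))
k_out = bridge(k_image, w_bridge, w_gate)

print(k_image.shape, k_full.shape, k_out.shape)  # (3, 4) (6, 4) (3, 4)
```

In this sketch, `image` mode shortens the transported memory to the visual-token positions, while `full` mode keeps every left-sequence position. Because the sigmoid output lies strictly between 0 and 1, the gate can only attenuate each bridged feature, never amplify it.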
Load
```python
from models.prefix_vlm import PrefixVLM

model = PrefixVLM.from_pretrained("patrickamadeus/dt-all-image-400")
```
Build from backbone configs
```python
from models.config import VLMConfig
from models.prefix_vlm import PrefixVLM

cfg = VLMConfig()
model = PrefixVLM(cfg=cfg, load_backbone=True, bridge_mode="linear", bridge_use_gate=True)
```