---
library_name: nanofresh
license: mit
pipeline_tag: image-text-to-text
tags:
  - vision-language
  - multimodal
  - prefixvlm
  - dual-tower
  - kv-transport
---

# PrefixVLM

**PrefixVLM** is a dual-language-model vision-language architecture with one-pass K/V transport:

- Left LM (`left_lm`) builds image-conditioned memory.
- Right LM (`right_lm`) performs continuation.
- Optional K/V bridge transforms left K/V before transport.
- `transport_mode` supports `image` (visual-token transport) and `full` (full-sequence transport).
- Optional gate (`use_gate=True`) applies feature-wise modulation: `sigmoid(Wx)` (no bias).

## Load

```python
from models.prefix_vlm import PrefixVLM

model = PrefixVLM.from_pretrained("patrickamadeus/dt-all-image-400")
```

## Build from backbone configs

```python
from models.config import VLMConfig
from models.prefix_vlm import PrefixVLM

cfg = VLMConfig()
model = PrefixVLM(cfg=cfg, load_backbone=True, bridge_mode="linear", bridge_use_gate=True)
```
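
## Gate sketch (illustrative)

The optional gate listed in the feature overview computes `sigmoid(Wx)` without a bias term. The following is a minimal, dependency-free sketch of that formula, not the repository's implementation. The function names `gate` and `apply_gate` are hypothetical, and multiplying the gate element-wise into the input is an assumption about how "feature-wise modulation" is applied; the actual bridge may combine the gate with the transformed K/V differently.

```python
import math


def sigmoid(z: float) -> float:
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def gate(x: list[float], W: list[list[float]]) -> list[float]:
    """Return sigmoid(W x) for a feature vector x. No bias term is added."""
    return [sigmoid(sum(w * xj for w, xj in zip(row, x))) for row in W]


def apply_gate(x: list[float], W: list[list[float]]) -> list[float]:
    """Scale each feature of x by its gate value.

    Assumption: the gate is applied by element-wise multiplication,
    which requires W to be square (one gate value per feature).
    """
    g = gate(x, W)
    return [gi * xi for gi, xi in zip(g, x)]
```

With an all-zero `W`, every gate value is `sigmoid(0) = 0.5`, so each feature is halved. Because there is no bias, the gate cannot shift this resting point independently of the input.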