---
library_name: nanofresh
license: mit
pipeline_tag: image-text-to-text
tags:
- vision-language
- multimodal
- prefixvlm
- dual-tower
- kv-transport
---
# PrefixVLM

**PrefixVLM** is a dual-language-model vision-language architecture with one-pass K/V transport:

- The left LM (`left_lm`) builds image-conditioned memory.
- The right LM (`right_lm`) performs continuation.
- An optional K/V bridge transforms the left LM's K/V before transport.
- `transport_mode` supports `image` (visual-token transport) and `full` (full-sequence transport).
- An optional gate (`use_gate=True`) applies feature-wise modulation via `sigmoid(Wx)` (no bias term).
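The two transport modes and the gate can be illustrated with a small, dependency-free sketch. It is not the model's implementation: the helper names `select_positions` and `gate_features`, the boolean `image_mask`, and the choice to multiply the input by the gate value are all assumptions made for illustration. Only the `image`/`full` modes and the bias-free `sigmoid(Wx)` form come from the description above.

```python
import math


def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))


def select_positions(seq_len, image_mask, transport_mode):
    """Return the left-LM positions whose K/V would be transported.

    `image_mask[i]` is True when position i holds a visual token
    (hypothetical input, not part of the documented API).
    """
    if transport_mode == "image":
        return [i for i in range(seq_len) if image_mask[i]]
    if transport_mode == "full":
        return list(range(seq_len))
    raise ValueError(f"unknown transport_mode: {transport_mode!r}")


def gate_features(x, W):
    """Feature-wise gate without bias: x * sigmoid(W x).

    `W` is a square matrix given as a list of rows. Multiplying the
    input by the gate is an assumption about how the modulation is applied.
    """
    pre = [sum(w * v for w, v in zip(row, x)) for row in W]
    return [v * sigmoid(p) for v, p in zip(x, pre)]


# Two visual tokens followed by two text tokens.
mask = [True, True, False, False]
image_positions = select_positions(4, mask, "image")
full_positions = select_positions(4, mask, "full")

# With W = 0 every gate value is sigmoid(0) = 0.5, so features are halved.
gated = gate_features([2.0, -4.0], [[0.0, 0.0], [0.0, 0.0]])
```

In this sketch `image` mode keeps only the visual-token positions, while `full` mode keeps every position in the left sequence.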
## Load

```python
from models.prefix_vlm import PrefixVLM

# Load the pretrained checkpoint by its repository id.
model = PrefixVLM.from_pretrained("patrickamadeus/dt-all-image-400")
```
## Build from backbone configs

```python
from models.config import VLMConfig
from models.prefix_vlm import PrefixVLM

# Start from the default configuration.
cfg = VLMConfig()
# Build the model with a linear K/V bridge and the bridge gate enabled.
model = PrefixVLM(cfg=cfg, load_backbone=True, bridge_mode="linear", bridge_use_gate=True)
```