---
library_name: nanofresh
license: mit
pipeline_tag: image-text-to-text
tags:
  - vision-language
  - multimodal
  - prefixvlm
  - dual-tower
  - kv-transport
---

# PrefixVLM

**PrefixVLM** is a vision-language architecture built from two language models that transports key/value (K/V) states from one to the other in a single pass:
- Left LM (`left_lm`) builds the image-conditioned memory.
- Right LM (`right_lm`) generates the continuation, conditioned on the transported K/V.
- An optional K/V bridge transforms the left LM's K/V before transport.
- `transport_mode` selects what is transported: `image` (K/V at visual-token positions only) or `full` (K/V for the full sequence).
- An optional gate (`use_gate=True`, passed as `bridge_use_gate` to the `PrefixVLM` constructor) applies a feature-wise gate `sigmoid(Wx)` with no bias term.
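The list above describes K/V transport only at the interface level. The pure-Python toy below is a minimal sketch of that data flow under stated assumptions, not the `nanofresh` implementation. It assumes that K and V are plain lists of per-position feature vectors, that the `linear` bridge is a single bias-free matrix shared by K and V, and that the gate scales the bridged vector element-wise by `sigmoid(W_gate x)`. The function names `linear_bridge`, `select_positions`, and `transport_kv` are hypothetical.

```python
import math


def matvec(W, x):
    """Multiply a matrix W (given as a list of rows) by a vector x."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def linear_bridge(x, W_bridge, W_gate=None):
    """Hypothetical 'linear' bridge: y = W_bridge @ x (no bias).

    Assumption: when W_gate is given, y is scaled element-wise by
    sigmoid(W_gate @ x), again with no bias term.
    """
    y = matvec(W_bridge, x)
    if W_gate is not None:
        g = [sigmoid(z) for z in matvec(W_gate, x)]
        y = [yi * gi for yi, gi in zip(y, g)]
    return y


def select_positions(transport_mode, image_mask):
    """Return the sequence positions whose K/V are transported."""
    if transport_mode == "image":
        # Only positions flagged as visual tokens.
        return [i for i, is_img in enumerate(image_mask) if is_img]
    if transport_mode == "full":
        # Every position in the left LM's sequence.
        return list(range(len(image_mask)))
    raise ValueError(f"unknown transport_mode: {transport_mode!r}")


def transport_kv(left_k, left_v, image_mask, transport_mode, W_bridge, W_gate=None):
    """Select positions, then bridge each selected K and V vector.

    Assumption: one bridge is shared by K and V; a real model may use
    separate weights per tensor, per layer, or per head.
    """
    pos = select_positions(transport_mode, image_mask)
    k = [linear_bridge(left_k[i], W_bridge, W_gate) for i in pos]
    v = [linear_bridge(left_v[i], W_bridge, W_gate) for i in pos]
    return k, v
```

With an identity bridge and no gate, `image` mode simply keeps the visual-token rows:

```python
identity = [[1.0, 0.0], [0.0, 1.0]]
left_k = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]
left_v = [[0.5, 0.5], [1.5, 1.5], [2.5, 2.5]]
image_mask = [True, True, False]  # first two positions are visual tokens
k, v = transport_kv(left_k, left_v, image_mask, "image", identity)
# k == [[1.0, 2.0], [3.0, 4.0]]
```

In this sketch, a gate with all-zero weights gives `sigmoid(0) = 0.5` for every feature, which halves the bridged vector. This illustrates why a bias-free gate starts from a neutral midpoint rather than fully open or fully closed.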

## Load

```python
from models.prefix_vlm import PrefixVLM
model = PrefixVLM.from_pretrained("patrickamadeus/dt-all-image-2000")
```

## Build from backbone configs

```python
from models.config import VLMConfig
from models.prefix_vlm import PrefixVLM

cfg = VLMConfig()
model = PrefixVLM(cfg=cfg, load_backbone=True, bridge_mode="linear", bridge_use_gate=True)
```