# Model Assets

PixDLM uses the following components:

| Asset | Default local path | Source |
| --- | --- | --- |
| PixDLM checkpoint | `pretrained/pixdlm-7b` | `WhynotHug/PixDLM` |
| CLIP vision tower | `checkpoints/clip-vit-large-patch14` | `openai/clip-vit-large-patch14` |
| LLaVA/Vicuna base | `checkpoints/llava-v1.6-vicuna-7b` | LLaVA/Vicuna upstream |
| SAM2 checkpoint | `checkpoints/sam2_checkpoints/sam2.1_hiera_large.pt` | SAM2 upstream |

The release scripts contain no hard-coded private filesystem paths. Either pass
each path explicitly as a command-line argument, or keep the default relative
layout listed in the table above.
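
Before launching a run, it can help to confirm that the default relative layout is actually in place. The following is a minimal sketch, not part of the release scripts: the helper name `missing_assets` is hypothetical, and it assumes the paths in the table are resolved relative to the directory you pass in.

```python
from pathlib import Path

# Default relative paths copied from the asset table above.
DEFAULT_ASSETS = {
    "PixDLM checkpoint": "pretrained/pixdlm-7b",
    "CLIP vision tower": "checkpoints/clip-vit-large-patch14",
    "LLaVA/Vicuna base": "checkpoints/llava-v1.6-vicuna-7b",
    "SAM2 checkpoint": "checkpoints/sam2_checkpoints/sam2.1_hiera_large.pt",
}


def missing_assets(root="."):
    """Return {asset name: expected path} for every asset not found under root."""
    root = Path(root)
    return {
        name: str(root / rel)
        for name, rel in DEFAULT_ASSETS.items()
        if not (root / rel).exists()
    }


if __name__ == "__main__":
    # Assumption: the script is run from the directory that holds
    # `pretrained/` and `checkpoints/`.
    for name, path in missing_assets().items():
        print(f"missing {name}: {path}")
```

An empty result means all four assets were found at their default locations; anything reported as missing must either be downloaded or supplied through an explicit command-line path.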

## Weight Loading

Evaluation loads the PixDLM checkpoint and the CLIP vision tower through these
arguments:

```bash
--version pretrained/pixdlm-7b
--vision-tower checkpoints/clip-vit-large-patch14
```

The `pretrained/pixdlm-7b` directory is a model checkpoint directory, not the
project root. It contains HuggingFace config/tokenizer files and the downloaded
or merged PixDLM weights.
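
A quick way to tell a checkpoint directory apart from the project root is to look for the kinds of files described above. The sketch below is illustrative only: the function name `describe_checkpoint_dir` is hypothetical, and the specific file names (`config.json`, `tokenizer_config.json`, `*.safetensors`, and so on) are common HuggingFace conventions assumed here, not a confirmed listing of the PixDLM checkpoint contents.

```python
from pathlib import Path

# Assumed file names: common HuggingFace conventions, not a confirmed
# listing of what the PixDLM checkpoint ships.
CONFIG_FILE = "config.json"
TOKENIZER_FILES = ("tokenizer_config.json", "tokenizer.model", "tokenizer.json")
WEIGHT_PATTERNS = ("*.safetensors", "*.bin")


def describe_checkpoint_dir(path):
    """Report which expected kinds of files are present in a directory."""
    path = Path(path)
    return {
        "config": (path / CONFIG_FILE).is_file(),
        "tokenizer": any((path / name).is_file() for name in TOKENIZER_FILES),
        "weights": any(any(path.glob(pattern)) for pattern in WEIGHT_PATTERNS),
    }


if __name__ == "__main__":
    print(describe_checkpoint_dir("pretrained/pixdlm-7b"))
```

If any entry comes back `False`, the directory is probably incomplete, or the path points at the wrong level of the tree.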

Training from the base LLaVA/Vicuna model points `--version` at the base
checkpoint instead:

```bash
--version checkpoints/llava-v1.6-vicuna-7b
```
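
The difference between the evaluation and training invocations comes down to which directory `--version` points at. The following sketch mimics that with a hypothetical `argparse` parser: only the two flag names and the paths come from this document, while the parser itself and its default values are assumptions and may not match the release scripts.

```python
import argparse


def build_parser():
    """Hypothetical parser; only the two flag names come from this document."""
    parser = argparse.ArgumentParser(description="Sketch of PixDLM path arguments")
    # Assumed defaults: the evaluation paths from the asset table.
    parser.add_argument("--version", default="pretrained/pixdlm-7b")
    parser.add_argument("--vision-tower", default="checkpoints/clip-vit-large-patch14")
    return parser


# Evaluation: rely on the default relative layout.
eval_args = build_parser().parse_args([])

# Training from the base model: override --version explicitly.
train_args = build_parser().parse_args(
    ["--version", "checkpoints/llava-v1.6-vicuna-7b"]
)

print("eval: ", eval_args.version, eval_args.vision_tower)
print("train:", train_args.version, train_args.vision_tower)
```

Note that `argparse` exposes `--vision-tower` as the attribute `vision_tower`, so the hyphen in the flag becomes an underscore in code.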

Follow the upstream licenses for LLaVA, Vicuna/LLaMA, CLIP, and SAM2.