# Model Assets

PixDLM uses the following components:

| Asset | Default local path | Source |
| --- | --- | --- |
| PixDLM checkpoint | `pretrained/pixdlm-7b` | `WhynotHug/PixDLM` |
| CLIP vision tower | `checkpoints/clip-vit-large-patch14` | `openai/clip-vit-large-patch14` |
| LLaVA/Vicuna base | `checkpoints/llava-v1.6-vicuna-7b` | LLaVA/Vicuna upstream |
| SAM2 checkpoint | `checkpoints/sam2_checkpoints/sam2.1_hiera_large.pt` | SAM2 upstream |

The release scripts do not hard-code private filesystem locations. Either pass
each path explicitly as a command-line argument or place the assets at the
default relative paths listed above.
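To confirm the default relative layout before launching a script, you can test for each path in the table. This is a minimal sketch, assuming it is run from the project root. `check_pixdlm_assets` is a hypothetical helper and is not part of the release scripts. It prints each missing asset and uses the number of missing assets as its exit status.

```bash
#!/usr/bin/env bash
# Sketch only: check_pixdlm_assets is a hypothetical helper, not shipped with
# PixDLM. It checks the default relative layout under a root directory
# (the current directory by default) and returns the count of missing assets.
check_pixdlm_assets() {
  local root="${1:-.}"
  local missing=0
  local path

  # Directories from the default layout. The LLaVA/Vicuna base is only
  # needed when training from the base model.
  for path in \
    "pretrained/pixdlm-7b" \
    "checkpoints/clip-vit-large-patch14" \
    "checkpoints/llava-v1.6-vicuna-7b"; do
    if [[ ! -d "$root/$path" ]]; then
      echo "missing directory: $path"
      missing=$((missing + 1))
    fi
  done

  # The SAM2 checkpoint is a single file, not a directory.
  path="checkpoints/sam2_checkpoints/sam2.1_hiera_large.pt"
  if [[ ! -f "$root/$path" ]]; then
    echo "missing file: $path"
    missing=$((missing + 1))
  fi

  return "$missing"
}
```

For example, `check_pixdlm_assets || echo "some assets are missing"` reports gaps without aborting an interactive shell.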

## Weight Loading

Evaluation loads the PixDLM checkpoint and the CLIP vision tower:

```bash
--version pretrained/pixdlm-7b
--vision-tower checkpoints/clip-vit-large-patch14
```

The `pretrained/pixdlm-7b` directory is a model checkpoint directory, not the
project root. It holds the Hugging Face config and tokenizer files together with
the downloaded or merged PixDLM weights.
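Because `--version` must point at the checkpoint directory rather than the project root, you can check for the Hugging Face files before launching as a cheap guard. This sketch assumes the standard Hugging Face file names `config.json` and `tokenizer_config.json`. The exact file set in a PixDLM checkpoint may differ. `is_hf_checkpoint_dir` is a hypothetical helper.

```bash
#!/usr/bin/env bash
# Sketch only: is_hf_checkpoint_dir is a hypothetical helper. It assumes a
# Hugging Face checkpoint directory holds config.json and tokenizer_config.json
# (standard Hugging Face file names; adjust if your checkpoint differs).
is_hf_checkpoint_dir() {
  local dir="$1"
  local name
  if [[ ! -d "$dir" ]]; then
    echo "not a directory: $dir" >&2
    return 1
  fi
  for name in config.json tokenizer_config.json; do
    if [[ ! -f "$dir/$name" ]]; then
      # A project root (or an incomplete download) typically lacks these files.
      echo "$dir has no $name; is --version pointing at the project root?" >&2
      return 1
    fi
  done
  return 0
}
```

For example, `is_hf_checkpoint_dir pretrained/pixdlm-7b && echo "checkpoint looks complete"`.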

Training from the base LLaVA/Vicuna model uses:

```bash
--version checkpoints/llava-v1.6-vicuna-7b
```
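The evaluation and base-model training snippets differ in what `--version` points at. A small wrapper can choose the default path per mode while still letting an explicit path take precedence. This is a sketch under stated assumptions: `pixdlm_version_path`, its `eval`/`train-base` mode names, and the `PIXDLM_VERSION` override variable are all hypothetical and are not part of the release scripts.

```bash
#!/usr/bin/env bash
# Sketch only: pixdlm_version_path, its mode names, and PIXDLM_VERSION are
# hypothetical. Prints the --version value for a mode using the default layout.
pixdlm_version_path() {
  local mode="$1"
  # An explicitly supplied path always wins over the default layout.
  if [[ -n "${PIXDLM_VERSION:-}" ]]; then
    echo "$PIXDLM_VERSION"
    return 0
  fi
  case "$mode" in
    eval)       echo "pretrained/pixdlm-7b" ;;              # PixDLM checkpoint
    train-base) echo "checkpoints/llava-v1.6-vicuna-7b" ;;  # LLaVA/Vicuna base
    *)
      echo "unknown mode: $mode (expected eval or train-base)" >&2
      return 1
      ;;
  esac
}
```

A launcher could then pass `--version "$(pixdlm_version_path train-base)"`. Keeping the override in one place avoids editing several scripts when the assets live outside the default layout.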

Follow the upstream licenses for LLaVA, Vicuna/LLaMA, CLIP, and SAM2.