Model Assets

PixDLM uses the following components:

| Asset | Default local path | Source |
| --- | --- | --- |
| PixDLM checkpoint | `pretrained/pixdlm-7b` | `WhynotHug/PixDLM` |
| CLIP vision tower | `checkpoints/clip-vit-large-patch14` | `openai/clip-vit-large-patch14` |
| LLaVA/Vicuna base | `checkpoints/llava-v1.6-vicuna-7b` | LLaVA/Vicuna upstream |
| SAM2 checkpoint | `checkpoints/sam2_checkpoints/sam2.1_hiera_large.pt` | SAM2 upstream |

The release scripts do not assume private filesystem locations. Pass paths explicitly through command-line arguments or use the default relative layout.
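
Before launching a script, it can help to confirm that the default relative layout is in place. The following is a minimal sketch, not part of the release scripts: the `DEFAULT_ASSETS` mapping copies the paths from the table above, while the `missing_assets` helper and its `root` parameter are hypothetical names introduced here for illustration.

```python
from pathlib import Path

# Default relative paths copied from the asset table above.
DEFAULT_ASSETS = {
    "PixDLM checkpoint": "pretrained/pixdlm-7b",
    "CLIP vision tower": "checkpoints/clip-vit-large-patch14",
    "LLaVA/Vicuna base": "checkpoints/llava-v1.6-vicuna-7b",
    "SAM2 checkpoint": "checkpoints/sam2_checkpoints/sam2.1_hiera_large.pt",
}


def missing_assets(root=".", assets=DEFAULT_ASSETS):
    """Return the names of assets whose path does not exist under `root`.

    Hypothetical helper: `root` is the directory the relative paths are
    resolved against (assumed to be the project root).
    """
    base = Path(root)
    return [name for name, rel in assets.items() if not (base / rel).exists()]


if __name__ == "__main__":
    # Report every asset that is not where the default layout expects it.
    for name in missing_assets():
        print(f"missing: {name} (expected at {DEFAULT_ASSETS[name]})")
```

Run from the project root, this lists any asset that still has to be downloaded or passed explicitly on the command line.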

Weight Loading

Evaluation loads the PixDLM checkpoint and the CLIP vision tower with:

--version pretrained/pixdlm-7b
--vision-tower checkpoints/clip-vit-large-patch14

The `pretrained/pixdlm-7b` directory is a model checkpoint directory, not the project root. It contains the Hugging Face config and tokenizer files together with the PixDLM weights, either downloaded or merged.
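
A quick way to catch the mistake of passing the project root as `--version` is to check that the directory looks like a checkpoint. This is a heuristic sketch only: it assumes the usual Hugging Face conventions of a `config.json` file plus weight files ending in `.safetensors` or `.bin`, and the actual file names in the PixDLM checkpoint may differ. The function name `looks_like_checkpoint_dir` is hypothetical.

```python
from pathlib import Path


def looks_like_checkpoint_dir(path):
    """Heuristic check for a Hugging Face style checkpoint directory.

    Assumption: the directory holds `config.json` and at least one weight
    file with a `.safetensors` or `.bin` suffix. Tokenizer files are not
    checked because their names vary between tokenizers.
    """
    d = Path(path)
    if not d.is_dir():
        return False
    has_config = (d / "config.json").is_file()
    has_weights = any(d.glob("*.safetensors")) or any(d.glob("*.bin"))
    return has_config and has_weights
```

For example, `looks_like_checkpoint_dir("pretrained/pixdlm-7b")` should be true once the weights are in place, while the same call on the project root would normally be false.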

Training from the base LLaVA/Vicuna model uses:

--version checkpoints/llava-v1.6-vicuna-7b
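
The two modes differ only in which paths are passed. The sketch below collects the flags listed above into argument lists; the `weight_args` helper and the `mode` names `"eval"` and `"train"` are hypothetical. The entry-point scripts are not named in this section, so the sketch stops at building the arguments rather than invoking anything, and it includes only the flags stated above for each mode.

```python
import shlex

# Flags exactly as listed in this section.
EVAL_ARGS = [
    "--version", "pretrained/pixdlm-7b",
    "--vision-tower", "checkpoints/clip-vit-large-patch14",
]
TRAIN_ARGS = [
    "--version", "checkpoints/llava-v1.6-vicuna-7b",
]


def weight_args(mode):
    """Return the weight-loading flags for a mode (hypothetical helper)."""
    if mode == "eval":
        return list(EVAL_ARGS)
    if mode == "train":
        return list(TRAIN_ARGS)
    raise ValueError(f"unknown mode: {mode!r}")


if __name__ == "__main__":
    # Print the flags as a shell-ready string to append to a command line.
    print(shlex.join(weight_args("eval")))
```

Non-default locations can be substituted by editing the path entries, which matches the rule that paths are passed explicitly rather than assumed.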

Follow the upstream licenses for LLaVA, Vicuna/LLaMA, CLIP, and SAM2.