	# Model Assets

	PixDLM uses the following components:

	| Asset | Default local path | Source |
	| --- | --- | --- |
	| PixDLM checkpoint | `pretrained/pixdlm-7b` | `WhynotHug/PixDLM` |
	| CLIP vision tower | `checkpoints/clip-vit-large-patch14` | `openai/clip-vit-large-patch14` |
	| LLaVA/Vicuna base | `checkpoints/llava-v1.6-vicuna-7b` | LLaVA/Vicuna upstream |
	| SAM2 checkpoint | `checkpoints/sam2_checkpoints/sam2.1_hiera_large.pt` | SAM2 upstream |
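
One possible way to populate the default layout for the two assets whose Hub repo IDs appear in the table is sketched below. It assumes the `huggingface_hub` package and its `snapshot_download` function, and it assumes that each repo's contents can be used directly as the local directory; neither assumption is stated by this document. `HUB_ASSETS` and `fetch_assets` are hypothetical names introduced for illustration. The LLaVA/Vicuna and SAM2 rows are left out because the table does not name a specific repo for them.

```python
from typing import Callable, Dict, Optional

# Repo IDs and local paths copied from the asset table. The LLaVA/Vicuna and
# SAM2 rows are omitted because the table gives no specific repo for them.
HUB_ASSETS: Dict[str, str] = {
    "WhynotHug/PixDLM": "pretrained/pixdlm-7b",
    "openai/clip-vit-large-patch14": "checkpoints/clip-vit-large-patch14",
}


def fetch_assets(download: Optional[Callable[..., str]] = None) -> Dict[str, str]:
    """Download each Hub repo into its default local path and return the mapping.

    `download` is injectable so the function can be exercised without network
    access; by default it is huggingface_hub.snapshot_download.
    """
    if download is None:
        # Imported lazily so the mapping can be inspected without the package installed.
        from huggingface_hub import snapshot_download

        download = snapshot_download
    for repo_id, local_dir in HUB_ASSETS.items():
        download(repo_id=repo_id, local_dir=local_dir)
    return dict(HUB_ASSETS)
```

Calling `fetch_assets()` from the project root would then create `pretrained/pixdlm-7b` and `checkpoints/clip-vit-large-patch14` relative to the current directory.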

	The release scripts do not assume private filesystem locations. Pass paths
	explicitly through command-line arguments or use the default relative layout.
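
As a rough illustration of "explicit arguments, otherwise the default relative layout", the sketch below builds a parser whose defaults mirror the table above and reports which paths are missing. The flag names `--version` and `--vision-tower` come from this document; `build_parser` and `missing_assets` are hypothetical helpers and are not taken from the release scripts.

```python
import argparse
from pathlib import Path
from typing import List

# Default relative layout, copied from the asset table.
DEFAULT_ASSETS = {
    "version": "pretrained/pixdlm-7b",
    "vision_tower": "checkpoints/clip-vit-large-patch14",
}


def build_parser() -> argparse.ArgumentParser:
    """Parser with the two flags shown in this document; defaults are relative paths."""
    parser = argparse.ArgumentParser(description="Illustrative asset-path parser")
    parser.add_argument("--version", default=DEFAULT_ASSETS["version"])
    parser.add_argument("--vision-tower", default=DEFAULT_ASSETS["vision_tower"])
    return parser


def missing_assets(args: argparse.Namespace, root: Path) -> List[str]:
    """Return the argument names whose paths do not exist (hypothetical helper).

    Relative paths are resolved against `root`; absolute paths are used as given.
    """
    missing = []
    for name in ("version", "vision_tower"):
        path = Path(getattr(args, name))
        if not path.is_absolute():
            path = root / path
        if not path.exists():
            missing.append(name)
    return missing
```

A script could call `missing_assets(build_parser().parse_args(), Path.cwd())` before loading any weights and exit with a clear message if the list is not empty.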

	## Weight Loading

	Evaluation loads the PixDLM checkpoint and the CLIP vision tower through these arguments:

	```bash
	--version pretrained/pixdlm-7b
	--vision-tower checkpoints/clip-vit-large-patch14
	```

	The `pretrained/pixdlm-7b` directory is a model checkpoint directory, not the
	project root. It contains the Hugging Face config and tokenizer files together
	with the downloaded or merged PixDLM weights.
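
A quick sanity check can catch the common mistake of passing the project root instead of the checkpoint directory. The sketch below is a heuristic only: `config.json` is the standard Hugging Face config file name, while the weight file patterns (`*.safetensors`, `*.bin`) are assumptions about how the PixDLM weights are stored. `looks_like_checkpoint_dir` is a hypothetical helper.

```python
from pathlib import Path


def looks_like_checkpoint_dir(path: str) -> bool:
    """Heuristic: a config.json plus at least one weight file directly inside `path`."""
    root = Path(path)
    if not (root / "config.json").is_file():
        return False
    # Assumed weight file patterns; adjust to the actual checkpoint contents.
    weight_patterns = ("*.safetensors", "*.bin")
    return any(any(root.glob(pattern)) for pattern in weight_patterns)
```

For example, `looks_like_checkpoint_dir("pretrained/pixdlm-7b")` should be true once the weights are in place, while the same call on the project root would normally be false.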

	Training that starts from the base LLaVA/Vicuna model points `--version` at the base checkpoint instead:

	```bash
	--version checkpoints/llava-v1.6-vicuna-7b
	```

	Follow the upstream licenses for LLaVA, Vicuna/LLaMA, CLIP, and SAM2.