Tags: Image Segmentation, Transformers, PyTorch, pixdlm, cvpr-2026, compute-transparency, reasoning-segmentation, uav, remote-sensing, vision-language
Model Assets
PixDLM uses the following components:
| Asset | Default local path | Source |
|---|---|---|
| PixDLM checkpoint | pretrained/pixdlm-7b | WhynotHug/PixDLM |
| CLIP vision tower | checkpoints/clip-vit-large-patch14 | openai/clip-vit-large-patch14 |
| LLaVA/Vicuna base | checkpoints/llava-v1.6-vicuna-7b | LLaVA/Vicuna upstream |
| SAM2 checkpoint | checkpoints/sam2_checkpoints/sam2.1_hiera_large.pt | SAM2 upstream |
The release scripts do not assume any private filesystem locations. Either pass paths explicitly as command-line arguments or place the assets at the default relative paths listed above.
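A minimal sketch for checking the default relative layout before launching a script, assuming it is run from the project root. The `DEFAULT_ASSETS` mapping and the `missing_assets` helper are hypothetical illustrations, not part of the PixDLM release; the paths are copied from the asset table.

```python
from pathlib import Path

# Default relative paths from the asset table (hypothetical helper, not part of the release).
DEFAULT_ASSETS = {
    "PixDLM checkpoint": "pretrained/pixdlm-7b",
    "CLIP vision tower": "checkpoints/clip-vit-large-patch14",
    "LLaVA/Vicuna base": "checkpoints/llava-v1.6-vicuna-7b",
    "SAM2 checkpoint": "checkpoints/sam2_checkpoints/sam2.1_hiera_large.pt",
}


def missing_assets(root="."):
    """Return the names of assets whose default path does not exist under root."""
    base = Path(root)
    return [name for name, rel in DEFAULT_ASSETS.items() if not (base / rel).exists()]


if __name__ == "__main__":
    missing = missing_assets()
    if missing:
        print("Missing (pass explicit paths instead):", ", ".join(missing))
    else:
        print("All default asset paths are present.")
```

Any asset reported as missing can still be used from another location by passing its path on the command line.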
Weight Loading
Evaluation uses:

```
--version pretrained/pixdlm-7b
--vision-tower checkpoints/clip-vit-large-patch14
```
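The two evaluation flags could be declared with `argparse` as sketched below, defaulting to the relative layout so that passing them is optional. This parser and `build_eval_parser` are hypothetical stand-ins for illustration; the release scripts' actual argument definitions, and any other flags they accept, are not reproduced here.

```python
import argparse


def build_eval_parser():
    # Hypothetical parser mirroring the two weight-loading flags used for evaluation.
    parser = argparse.ArgumentParser(description="PixDLM evaluation (illustrative sketch)")
    parser.add_argument(
        "--version",
        default="pretrained/pixdlm-7b",
        help="PixDLM checkpoint directory (config, tokenizer, and weights)",
    )
    parser.add_argument(
        "--vision-tower",
        default="checkpoints/clip-vit-large-patch14",
        help="CLIP vision tower directory",
    )
    return parser


if __name__ == "__main__":
    args = build_eval_parser().parse_args()
    # argparse exposes --vision-tower as the attribute vision_tower.
    print(f"checkpoint={args.version} vision_tower={args.vision_tower}")
```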
The pretrained/pixdlm-7b directory is a model checkpoint directory, not the project root. It contains the Hugging Face config and tokenizer files together with the downloaded or merged PixDLM weights.
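Since `--version` must point at the checkpoint directory rather than the project root, a quick heuristic check can catch a wrong path early. This sketch assumes the standard Hugging Face file names (`config.json`, plus one of `tokenizer_config.json`, `tokenizer.model`, or `tokenizer.json`); the exact files shipped in the PixDLM checkpoint may differ, and `looks_like_checkpoint_dir` is a hypothetical helper.

```python
from pathlib import Path

# Assumed standard Hugging Face file names; adjust if the checkpoint uses others.
CONFIG_FILE = "config.json"
TOKENIZER_FILES = ("tokenizer_config.json", "tokenizer.model", "tokenizer.json")


def looks_like_checkpoint_dir(path):
    """Heuristic: a checkpoint directory has a config file and at least one tokenizer file."""
    p = Path(path)
    if not (p / CONFIG_FILE).is_file():
        return False
    return any((p / name).is_file() for name in TOKENIZER_FILES)


if __name__ == "__main__":
    target = "pretrained/pixdlm-7b"
    if not looks_like_checkpoint_dir(target):
        print(f"{target} does not look like a checkpoint directory (is it the project root?)")
```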
Training from the base LLaVA/Vicuna model uses:

```
--version checkpoints/llava-v1.6-vicuna-7b
```
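Evaluation and training differ mainly in what `--version` points to: the released PixDLM checkpoint or the base LLaVA/Vicuna model. A minimal sketch of selecting the weight flags by mode follows; `weight_args` is a hypothetical helper, and its training branch passes only `--version` because that is the only weight flag this section lists for training.

```python
import shlex


def weight_args(mode):
    """Return the weight-loading flags for 'eval' or 'train' (illustrative sketch)."""
    if mode == "eval":
        # Evaluate the released PixDLM checkpoint with the CLIP vision tower.
        return [
            "--version", "pretrained/pixdlm-7b",
            "--vision-tower", "checkpoints/clip-vit-large-patch14",
        ]
    if mode == "train":
        # Train starting from the base LLaVA/Vicuna model.
        return ["--version", "checkpoints/llava-v1.6-vicuna-7b"]
    raise ValueError(f"unknown mode: {mode!r}")


if __name__ == "__main__":
    for mode in ("eval", "train"):
        print(mode, shlex.join(weight_args(mode)))
```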
Follow the upstream licenses for LLaVA, Vicuna/LLaMA, CLIP, and SAM2.