Tags: Image Segmentation, Transformers, PyTorch, pixdlm, cvpr-2026, compute-transparency, reasoning-segmentation, uav, remote-sensing, vision-language
# Model Assets

PixDLM uses the following components:

| Asset | Default local path | Source |
| --- | --- | --- |
| PixDLM checkpoint | `pretrained/pixdlm-7b` | `WhynotHug/PixDLM` |
| CLIP vision tower | `checkpoints/clip-vit-large-patch14` | `openai/clip-vit-large-patch14` |
| LLaVA/Vicuna base | `checkpoints/llava-v1.6-vicuna-7b` | LLaVA/Vicuna upstream |
| SAM2 checkpoint | `checkpoints/sam2_checkpoints/sam2.1_hiera_large.pt` | SAM2 upstream |

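
The two assets hosted on the Hugging Face Hub can be fetched into the default layout with `huggingface_hub.snapshot_download`. This is a minimal sketch, assuming the `huggingface_hub` package is installed and that a plain snapshot of each repository is what the scripts expect; `HUB_ASSETS`, `plan_downloads`, and `download_hub_assets` are hypothetical names, not part of the release. The LLaVA/Vicuna base and the SAM2 checkpoint come from their upstream projects and are left out.

```python
from pathlib import Path

# Hub-hosted assets from the table, mapped to their default local paths.
# (Hypothetical constant; the upstream LLaVA/Vicuna and SAM2 assets are excluded.)
HUB_ASSETS = {
    "WhynotHug/PixDLM": Path("pretrained/pixdlm-7b"),
    "openai/clip-vit-large-patch14": Path("checkpoints/clip-vit-large-patch14"),
}


def plan_downloads(root: Path = Path(".")) -> list[tuple[str, Path]]:
    """Return (repo_id, local_dir) pairs resolved against the project root."""
    return [(repo_id, root / local) for repo_id, local in HUB_ASSETS.items()]


def download_hub_assets(root: Path = Path(".")) -> None:
    """Download each Hub asset into its default directory under root."""
    # Imported lazily so plan_downloads works without huggingface_hub installed.
    from huggingface_hub import snapshot_download

    for repo_id, local_dir in plan_downloads(root):
        snapshot_download(repo_id=repo_id, local_dir=local_dir)
```

Calling `download_hub_assets()` from the project root fills in `pretrained/pixdlm-7b` and `checkpoints/clip-vit-large-patch14`.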

The release scripts do not assume private filesystem locations. Pass paths
explicitly through command-line arguments or use the default relative layout.
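
To see which parts of the default relative layout are still missing before launching a script, a small check over the table's paths is enough. This is a minimal sketch; `DEFAULT_LAYOUT` and `missing_assets` are hypothetical names, and any asset reported missing can instead be supplied through an explicit command-line path.

```python
from pathlib import Path

# Default relative layout from the Model Assets table (hypothetical constant).
DEFAULT_LAYOUT = {
    "pixdlm_checkpoint": Path("pretrained/pixdlm-7b"),
    "clip_vision_tower": Path("checkpoints/clip-vit-large-patch14"),
    "llava_vicuna_base": Path("checkpoints/llava-v1.6-vicuna-7b"),
    "sam2_checkpoint": Path("checkpoints/sam2_checkpoints/sam2.1_hiera_large.pt"),
}


def missing_assets(root: Path = Path(".")) -> list[str]:
    """Return the names of assets whose default path does not exist under root."""
    return [name for name, rel in DEFAULT_LAYOUT.items() if not (root / rel).exists()]
```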

## Weight Loading

Evaluation passes the PixDLM checkpoint and the CLIP vision tower:

```bash
--version pretrained/pixdlm-7b \
--vision-tower checkpoints/clip-vit-large-patch14
```
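
The two evaluation flags can be mirrored with `argparse` to show how they map to parsed attributes. This is a hypothetical parser, not the release's own argument handling; its defaults assume the default relative layout. Note that `--version` takes a checkpoint directory path, not a version string.

```python
import argparse


def build_eval_parser() -> argparse.ArgumentParser:
    """Hypothetical parser mirroring the two evaluation flags."""
    parser = argparse.ArgumentParser(description="PixDLM evaluation (sketch)")
    # A plain string option; argparse does not reserve --version unless
    # action="version" is used, so it can carry a checkpoint path.
    parser.add_argument("--version", default="pretrained/pixdlm-7b")
    # The dash in --vision-tower becomes an underscore in the attribute name.
    parser.add_argument("--vision-tower", default="checkpoints/clip-vit-large-patch14")
    return parser


args = build_eval_parser().parse_args(
    ["--version", "pretrained/pixdlm-7b",
     "--vision-tower", "checkpoints/clip-vit-large-patch14"]
)
print(args.version, args.vision_tower)
```

Overriding only `--version` (for example with an absolute path) keeps the vision tower at its default location.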

The `pretrained/pixdlm-7b` directory is a model checkpoint directory, not the
project root. It contains the Hugging Face config and tokenizer files together
with the downloaded or merged PixDLM weights.
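
Because the checkpoint directory is easy to confuse with the project root, a quick heuristic can catch a wrong `--version` path early. This sketch assumes the directory holds `config.json`, `tokenizer_config.json`, and at least one `*.safetensors` or `*.bin` weight file; the exact file set of the release is not listed here, and `is_checkpoint_dir` is a hypothetical helper.

```python
from pathlib import Path

# Assumed marker files: config.json is the standard Hugging Face model config,
# and tokenizer_config.json is written by save_pretrained for most tokenizers.
REQUIRED_FILES = ("config.json", "tokenizer_config.json")


def is_checkpoint_dir(path: Path) -> bool:
    """Heuristic: True if path holds config/tokenizer files and weight files."""
    if not path.is_dir():
        return False
    has_configs = all((path / name).is_file() for name in REQUIRED_FILES)
    # Accept either safetensors or legacy PyTorch .bin weight shards.
    has_weights = any(path.glob("*.safetensors")) or any(path.glob("*.bin"))
    return has_configs and has_weights
```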

Training from the base LLaVA/Vicuna model uses:

```bash
--version checkpoints/llava-v1.6-vicuna-7b
```
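
Going by the two snippets in this section, evaluation and training differ in which directory `--version` points to: the PixDLM checkpoint for evaluation, the LLaVA/Vicuna base for training. A minimal sketch of that switch follows; `VERSION_BY_MODE` and `version_args` are hypothetical names, and any other flags training may require are not covered here.

```python
# Checkpoint directory passed to --version in each mode (hypothetical mapping).
VERSION_BY_MODE = {
    "eval": "pretrained/pixdlm-7b",
    "train": "checkpoints/llava-v1.6-vicuna-7b",
}


def version_args(mode: str) -> list[str]:
    """Return the --version argument pair for 'eval' or 'train'."""
    try:
        return ["--version", VERSION_BY_MODE[mode]]
    except KeyError:
        raise ValueError(f"unknown mode {mode!r}; expected 'eval' or 'train'") from None
```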

Follow the upstream licenses for LLaVA, Vicuna/LLaMA, CLIP, and SAM2.