# Model Assets
PixDLM uses the following components:
| Asset | Default local path | Source |
| --- | --- | --- |
| PixDLM checkpoint | `pretrained/pixdlm-7b` | `WhynotHug/PixDLM` |
| CLIP vision tower | `checkpoints/clip-vit-large-patch14` | `openai/clip-vit-large-patch14` |
| LLaVA/Vicuna base | `checkpoints/llava-v1.6-vicuna-7b` | LLaVA/Vicuna upstream |
| SAM2 checkpoint | `checkpoints/sam2_checkpoints/sam2.1_hiera_large.pt` | SAM2 upstream |
The release scripts do not assume private filesystem locations. Pass paths
explicitly through command-line arguments or use the default relative layout.
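Before launching a script it can help to confirm that the default relative layout is in place. The following is a minimal sketch, not part of the release scripts: `DEFAULT_ASSETS` and `missing_assets` are hypothetical names, and the paths are copied from the table above.

```python
from pathlib import Path

# Default relative paths copied from the asset table above.
DEFAULT_ASSETS = {
    "PixDLM checkpoint": "pretrained/pixdlm-7b",
    "CLIP vision tower": "checkpoints/clip-vit-large-patch14",
    "LLaVA/Vicuna base": "checkpoints/llava-v1.6-vicuna-7b",
    "SAM2 checkpoint": "checkpoints/sam2_checkpoints/sam2.1_hiera_large.pt",
}


def missing_assets(root="."):
    """Return the names of assets whose default path does not exist under root."""
    root = Path(root)
    return [
        name
        for name, rel_path in DEFAULT_ASSETS.items()
        if not (root / rel_path).exists()
    ]


if __name__ == "__main__":
    # Report every asset that is absent from the current working directory.
    for name in missing_assets():
        print(f"missing: {name} (expected at {DEFAULT_ASSETS[name]})")
```

Run it from the directory that is expected to contain `pretrained/` and `checkpoints/`. If an asset lives elsewhere, skip the check and pass its path explicitly on the command line.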
## Weight Loading
Evaluation loads the PixDLM checkpoint and the CLIP vision tower:
```bash
--version pretrained/pixdlm-7b
--vision-tower checkpoints/clip-vit-large-patch14
```
The `pretrained/pixdlm-7b` directory is a model checkpoint directory, not the
project root. It contains HuggingFace config/tokenizer files and the downloaded
or merged PixDLM weights.
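A quick way to catch the common mistake of pointing `--version` at the project root is to check that the directory looks like a checkpoint directory. This is a rough sketch under stated assumptions: `checkpoint_dir_problems` is a hypothetical helper, `config.json` is the usual HuggingFace config file name, and the `*.safetensors` / `*.bin` weight patterns are a guess about how the weights are stored.

```python
from pathlib import Path


def checkpoint_dir_problems(path):
    """List reasons why `path` does not look like a model checkpoint directory."""
    path = Path(path)
    if not path.is_dir():
        return [f"{path} is not a directory"]

    problems = []
    # config.json is the usual HuggingFace config file name.
    if not (path / "config.json").is_file():
        problems.append("no config.json")

    # Assumption: weights are stored as *.safetensors or *.bin files.
    has_weights = any(path.glob("*.safetensors")) or any(path.glob("*.bin"))
    if not has_weights:
        problems.append("no *.safetensors or *.bin weight files")
    return problems


if __name__ == "__main__":
    # An empty list means the directory passed both checks.
    print(checkpoint_dir_problems("pretrained/pixdlm-7b"))
```

An empty result only means that the expected kinds of files are present. It does not verify that the weights are complete or correctly merged.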
Training from the base LLaVA/Vicuna model uses:
```bash
--version checkpoints/llava-v1.6-vicuna-7b
```
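The difference between evaluation and training comes down to which directory `--version` points at. The sketch below illustrates that with a stand-alone `argparse` parser. It is an assumption about how such flags could be wired, not the release scripts' actual parser: `build_parser` is hypothetical, and the defaults shown are taken from the evaluation example.

```python
import argparse


def build_parser():
    """Hypothetical parser exposing only the two path flags discussed here."""
    parser = argparse.ArgumentParser(description="PixDLM path arguments (sketch)")
    parser.add_argument(
        "--version",
        default="pretrained/pixdlm-7b",
        help="model checkpoint directory to load",
    )
    parser.add_argument(
        "--vision-tower",
        default="checkpoints/clip-vit-large-patch14",
        help="CLIP vision tower directory",
    )
    return parser


# Evaluation: rely on the default relative layout.
eval_args = build_parser().parse_args([])

# Training from the base model: override --version explicitly.
train_args = build_parser().parse_args(
    ["--version", "checkpoints/llava-v1.6-vicuna-7b"]
)

if __name__ == "__main__":
    print("eval :", eval_args.version, eval_args.vision_tower)
    print("train:", train_args.version, train_args.vision_tower)
```

Note that `argparse` exposes `--vision-tower` as the attribute `vision_tower`.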
Follow the upstream licenses for LLaVA, Vicuna/LLaMA, CLIP, and SAM2.