Image Segmentation
Transformers
PyTorch
pixdlm
cvpr-2026
compute-transparency
reasoning-segmentation
uav
remote-sensing
vision-language
How to use WhynotHug/PixDLM with Transformers:

```python
# Use a pipeline as a high-level helper
from transformers import pipeline

pipe = pipeline("image-segmentation", model="WhynotHug/PixDLM")

# Load model directly
from transformers import AutoModel

model = AutoModel.from_pretrained("WhynotHug/PixDLM", dtype="auto")
```
DRSeg Data
DRSeg is a UAV reasoning segmentation benchmark with 10,000 image-level samples. Each sample contains a UAV image, one or more segmentation annotations, a reasoning question, a reasoning answer, and a reasoning type.
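To make the per-sample structure concrete, here is a minimal sketch of one sample as a Python dataclass. The field names (`image_path`, `question`, `answer`, `reasoning_type`, `annotations`), the example values, and the polygon encoding are assumptions for illustration only; the actual keys in the DRSeg JSON files may differ.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class DRSegSample:
    """One image-level sample. All field names here are hypothetical."""

    image_path: str
    question: str
    answer: str
    reasoning_type: str
    # One or more segmentation annotations. The concrete encoding is not
    # specified in this card, so each annotation is kept as an opaque dict.
    annotations: List[dict] = field(default_factory=list)

    def __post_init__(self) -> None:
        # The card states that every sample has at least one annotation.
        if not self.annotations:
            raise ValueError("a sample needs at least one segmentation annotation")


# Illustrative values only; this is not a real DRSeg record.
sample = DRSegSample(
    image_path="DRtrain/example.jpg",
    question="Which vehicle is closest to the intersection?",
    answer="The white van at the lower left.",
    reasoning_type="spatial",
    annotations=[{"polygon": [[10, 10], [50, 10], [50, 40]]}],
)
print(sample.reasoning_type, len(sample.annotations))
```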
Splits
| Split | Samples |
|---|---|
| Train | 2,999 |
| Validation | 2,000 |
| Test | 5,001 |
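As a quick sanity check, the split sizes in the table can be summed and converted to shares of the full benchmark. The dictionary below simply restates the table; the variable names are illustrative.

```python
# Split sizes copied from the table above.
splits = {"train": 2999, "validation": 2000, "test": 5001}

total = sum(splits.values())
shares = {name: count / total for name, count in splits.items()}

for name, count in splits.items():
    print(f"{name:<10} {count:>6,} {shares[name]:7.2%}")
print(f"{'total':<10} {total:>6,}")
```

The three splits add up to the 10,000 samples stated in the dataset description.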
Expected Layout
```text
data/DRSeg/
├── DRtrain/
├── DRval/
├── DRtest/
├── label/
│   ├── DRSeg_train.json
│   ├── DRSeg_val.json
│   └── DRSeg_test.json
├── CODrone -> .
└── labels -> label
```
The CODrone and labels entries are symbolic links kept for compatibility with
the original dataset loader: CODrone points to the dataset root (.) and labels
points to label.
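The layout above can be checked, and the two compatibility links created, with a short script. This is a sketch under stated assumptions: the helper names `ensure_compat_links` and `missing_entries` are hypothetical and not part of the PixDLM code base, and the demo builds a throwaway directory rather than touching real data. Creating symbolic links on Windows may require elevated privileges.

```python
import os
import tempfile
from pathlib import Path

EXPECTED_DIRS = ["DRtrain", "DRval", "DRtest", "label"]
EXPECTED_LABELS = ["DRSeg_train.json", "DRSeg_val.json", "DRSeg_test.json"]


def ensure_compat_links(root: Path) -> None:
    """Create the CODrone -> . and labels -> label links if they are missing."""
    for name, target in (("CODrone", "."), ("labels", "label")):
        link = root / name
        if not link.is_symlink() and not link.exists():
            # Relative targets are resolved from the directory holding the link.
            os.symlink(target, link, target_is_directory=True)


def missing_entries(root: Path) -> list:
    """Return the expected paths (relative to root) that do not exist."""
    wanted = EXPECTED_DIRS + [f"label/{name}" for name in EXPECTED_LABELS]
    return [p for p in wanted if not (root / p).exists()]


# Demo on a temporary directory with empty placeholder label files.
with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp) / "data" / "DRSeg"
    for d in EXPECTED_DIRS:
        (root / d).mkdir(parents=True)
    for name in EXPECTED_LABELS:
        (root / "label" / name).write_text("[]")

    ensure_compat_links(root)
    demo_missing = missing_entries(root)
    demo_labels_ok = (root / "labels" / "DRSeg_val.json").exists()
    demo_codrone_ok = (root / "CODrone" / "DRtrain").is_dir()

print(demo_missing, demo_labels_ok, demo_codrone_ok)
```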
Reasoning Types
- spatial: position and spatial relation reasoning.
- attribute: visual attribute and object property reasoning.
- scene: scene-context reasoning.
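When inspecting annotations, it can help to tally samples per reasoning type and fail early on an unexpected label. The sketch below assumes each record is a dict with a `reasoning_type` key; that key name and the demo records are hypothetical.

```python
from collections import Counter

# The three reasoning types listed above.
REASONING_TYPES = ("spatial", "attribute", "scene")


def count_reasoning_types(records, key="reasoning_type"):
    """Count records per reasoning type; raise on an unknown or missing type."""
    counts = Counter({t: 0 for t in REASONING_TYPES})
    for index, record in enumerate(records):
        value = record.get(key)
        if value not in REASONING_TYPES:
            raise ValueError(f"record {index}: unexpected reasoning type {value!r}")
        counts[value] += 1
    return counts


# Illustrative records only; these are not real DRSeg entries.
demo_records = [
    {"reasoning_type": "spatial"},
    {"reasoning_type": "scene"},
    {"reasoning_type": "spatial"},
]
demo_counts = count_reasoning_types(demo_records)
print(dict(demo_counts))
```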
Metadata Preview
The Hugging Face dataset repo includes lightweight metadata JSONL files under
metadata/. They are intended for dataset-card preview and quick inspection.
Use the full image/mask archive for training and evaluation.
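For quick inspection, a JSONL file can be read one record per line with the standard library alone. The sketch below parses an in-memory stream so that it runs without the dataset; the record keys shown are assumptions, and the actual file names under metadata/ are not listed here.

```python
import io
import json


def iter_jsonl(stream):
    """Yield one parsed object per non-empty line of a JSONL text stream."""
    for line_no, line in enumerate(stream, start=1):
        line = line.strip()
        if not line:
            continue  # tolerate blank lines
        try:
            yield json.loads(line)
        except json.JSONDecodeError as exc:
            raise ValueError(f"line {line_no}: invalid JSON ({exc.msg})") from exc


# Hypothetical records standing in for a metadata file.
demo_stream = io.StringIO(
    '{"image": "a.jpg", "reasoning_type": "scene"}\n'
    "\n"
    '{"image": "b.jpg", "reasoning_type": "spatial"}\n'
)
rows = list(iter_jsonl(demo_stream))
print(len(rows), rows[0]["reasoning_type"])
```

To read a real file, pass an open text handle instead, for example `open(path, encoding="utf-8")`, where `path` points at one of the files under metadata/.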