Image Segmentation
Transformers
PyTorch
pixdlm
cvpr-2026
compute-transparency
reasoning-segmentation
uav
remote-sensing
vision-language
# DRSeg Data
DRSeg is a UAV reasoning segmentation benchmark with 10,000 image-level samples.
Each sample contains a UAV image, one or more segmentation annotations, a
reasoning question, a reasoning answer, and a reasoning type.
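A minimal sketch of one sample as an in-memory Python record. The class name `DRSegSample` and its field names are hypothetical illustrations and may not match the keys actually used in the `DRSeg_*.json` annotation files.

```python
from dataclasses import dataclass, field
from typing import List


# Hypothetical representation of one DRSeg sample.
# Field names are illustrative assumptions, not the official JSON keys.
@dataclass
class DRSegSample:
    image: str             # path to the UAV image
    question: str          # reasoning question
    answer: str            # reasoning answer
    reasoning_type: str    # e.g. "spatial", "attribute", or "scene"
    masks: List[object] = field(default_factory=list)  # segmentation annotations

    def __post_init__(self) -> None:
        # Each sample carries one or more segmentation annotations.
        if not self.masks:
            raise ValueError("a DRSeg sample needs at least one segmentation annotation")
```

Putting the annotations last lets them take a default while the validation in `__post_init__` still rejects an empty list.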
## Splits
| Split | Samples |
| --- | ---: |
| Train | 2,999 |
| Validation | 2,000 |
| Test | 5,001 |
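The split sizes add up to the 10,000 samples stated above, roughly 30% train, 20% validation, and 50% test. A quick arithmetic check:

```python
# Split sizes copied from the table above.
SPLITS = {"Train": 2_999, "Validation": 2_000, "Test": 5_001}

total = sum(SPLITS.values())
assert total == 10_000  # matches the stated benchmark size

# Print each split's count and its share of the whole benchmark.
for name, count in SPLITS.items():
    print(f"{name:<10} {count:>6,} {count / total:6.2%}")
```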
## Expected Layout
```text
data/DRSeg/
├── DRtrain/
├── DRval/
├── DRtest/
├── label/
│   ├── DRSeg_train.json
│   ├── DRSeg_val.json
│   └── DRSeg_test.json
├── CODrone -> .
└── labels -> label
```
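Before training, it can help to confirm that the extracted archive matches this layout. The helper below, `missing_entries`, is a hypothetical sketch and not part of the dataset or the PixDLM code; it only checks for the directories and JSON files shown in the tree.

```python
from pathlib import Path
from typing import List

# Entries expected under data/DRSeg/, per the layout above.
EXPECTED_DIRS = ["DRtrain", "DRval", "DRtest", "label"]
EXPECTED_FILES = [
    "label/DRSeg_train.json",
    "label/DRSeg_val.json",
    "label/DRSeg_test.json",
]


def missing_entries(root: str = "data/DRSeg") -> List[str]:
    """Return the expected directories and files that are absent under `root`."""
    base = Path(root)
    missing = [d for d in EXPECTED_DIRS if not (base / d).is_dir()]
    missing += [f for f in EXPECTED_FILES if not (base / f).is_file()]
    return missing
```

An empty list means every expected entry is present; the compatibility links are not checked here.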
The `CODrone` and `labels` entries are symbolic links kept for compatibility with
the original dataset loader: `CODrone` points back to `data/DRSeg/` itself, and
`labels` points to `label/`.
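If the links are missing after extraction, they can be recreated. This is a sketch using `pathlib.Path.symlink_to`; the function name `create_compat_links` is hypothetical, and it assumes a POSIX system (creating symlinks on Windows may require extra privileges).

```python
from pathlib import Path


def create_compat_links(root: str = "data/DRSeg") -> None:
    """Create the `CODrone -> .` and `labels -> label` links if they are absent."""
    base = Path(root)
    # Relative targets resolve against the link's parent directory, so
    # CODrone points back at `root` itself and labels points at label/.
    for name, target in (("CODrone", "."), ("labels", "label")):
        link = base / name
        if not link.is_symlink() and not link.exists():
            link.symlink_to(target, target_is_directory=True)
```

Because `CODrone` points at its own parent directory, any recursive walk that follows symlinks will loop; `os.walk` does not follow them by default (`followlinks=False`).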
## Reasoning Types
- `spatial`: position and spatial relation reasoning.
- `attribute`: visual attribute and object property reasoning.
- `scene`: scene-context reasoning.
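A minimal sketch for tallying samples per reasoning type. It assumes each annotation record is a mapping with the type stored under a field named `reasoning_type`; that key name is a hypothetical placeholder, so pass the real key via `key`.

```python
from collections import Counter
from enum import Enum
from typing import Iterable, Mapping


class ReasoningType(str, Enum):
    SPATIAL = "spatial"
    ATTRIBUTE = "attribute"
    SCENE = "scene"


def count_reasoning_types(records: Iterable[Mapping], key: str = "reasoning_type") -> Counter:
    """Tally samples per reasoning type; `key` is an assumed field name."""
    counts: Counter = Counter()
    for rec in records:
        # ReasoningType(...) raises ValueError for values outside the three types.
        counts[ReasoningType(rec[key])] += 1
    return counts
```

Converting through the enum makes an unexpected type label fail loudly instead of silently creating a fourth bucket.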
## Metadata Preview
The Hugging Face dataset repo includes lightweight metadata JSONL files under
`metadata/`, intended for the dataset-card preview and quick inspection. For
training and evaluation, use the full image/mask archive.
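JSONL stores one JSON object per line, so the metadata files can be inspected with the standard library alone. The helper name `read_jsonl` is hypothetical, and the individual file names under `metadata/` are not listed here, so the path in the usage line is a placeholder.

```python
import json
from pathlib import Path
from typing import Dict, Iterator


def read_jsonl(path: str) -> Iterator[Dict]:
    """Yield one JSON object per non-empty line of a JSONL file."""
    with Path(path).open(encoding="utf-8") as fh:
        for line in fh:
            line = line.strip()
            if line:  # skip blank lines
                yield json.loads(line)
```

For example, `next(read_jsonl("metadata/<file>.jsonl"))` returns the first record without loading the whole file into memory.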