# Examples
PixDLM evaluates reasoning segmentation samples in the DRSeg format.
Minimal sample fields:
```json
{
  "id": "example_frame_0001",
  "width": 1024,
  "height": 1024,
  "metadata": {
    "time_of_day": "day",
    "location": "urban_road",
    "altitude": "60m",
    "camera_angle": "90deg"
  },
  "ann_list": [
    {
      "bbox": [100.0, 120.0, 80.0, 60.0],
      "segmentation": [[100.0, 120.0, 180.0, 120.0, 180.0, 180.0, 100.0, 180.0]],
      "area": 4800.0,
      "category_name": "car"
    }
  ],
  "questions": [
    "Which vehicle is closest to the intersection and may affect traffic flow?"
  ],
  "answers": [
    "<think>The target vehicle is positioned nearest to the intersection and aligned with the traffic lane.</think> <answer>The vehicle closest to the intersection is the target.</answer>"
  ],
  "reasoning_types": ["spatial"]
}
```
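The fields in the sample above can be sanity-checked before evaluation. The following is a minimal sketch, not part of PixDLM itself. It assumes that `bbox` is `[x, y, width, height]` and that each `segmentation` entry is a flat `[x0, y0, x1, y1, ...]` polygon; both readings are consistent with the values in the sample but are not stated by the source. The helper names `polygon_area`, `polygon_bbox`, and `split_answer` are hypothetical.

```python
import re


def polygon_area(flat):
    """Shoelace area of a flat [x0, y0, x1, y1, ...] polygon (assumed layout)."""
    xs, ys = flat[0::2], flat[1::2]
    n = len(xs)
    twice = sum(
        xs[i] * ys[(i + 1) % n] - xs[(i + 1) % n] * ys[i] for i in range(n)
    )
    return abs(twice) / 2.0


def polygon_bbox(flat):
    """Tight [x, y, width, height] box around the polygon (assumed bbox layout)."""
    xs, ys = flat[0::2], flat[1::2]
    return [min(xs), min(ys), max(xs) - min(xs), max(ys) - min(ys)]


def split_answer(text):
    """Split an answer string into its <think> and <answer> parts.

    Returns (reasoning, answer); either element is None if its tag is absent.
    """
    think = re.search(r"<think>(.*?)</think>", text, re.DOTALL)
    answer = re.search(r"<answer>(.*?)</answer>", text, re.DOTALL)
    return (
        think.group(1).strip() if think else None,
        answer.group(1).strip() if answer else None,
    )


# Values copied from the sample above.
annotation = {
    "bbox": [100.0, 120.0, 80.0, 60.0],
    "segmentation": [[100.0, 120.0, 180.0, 120.0, 180.0, 180.0, 100.0, 180.0]],
    "area": 4800.0,
}
raw_answer = (
    "<think>The target vehicle is positioned nearest to the intersection and "
    "aligned with the traffic lane.</think> <answer>The vehicle closest to "
    "the intersection is the target.</answer>"
)

polygon = annotation["segmentation"][0]
area_matches = polygon_area(polygon) == annotation["area"]
bbox_matches = polygon_bbox(polygon) == annotation["bbox"]
reasoning, final_answer = split_answer(raw_answer)
```

For the sample annotation, both `area_matches` and `bbox_matches` hold, which is what suggests the `[x, y, width, height]` reading of `bbox`.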
Evaluation outputs are written to:
```text
outputs/<exp_name>/<dataset_name>/with_cot/
```
Each evaluated sample produces the following files:
- `*_input.jpg`
- `*_pred_mask.png`
- `*_gt_mask.png`
- `*_overlay_pred_red_gt_green.jpg`
- `*_result.json`
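To check that an evaluation run wrote every file for a sample, the layout above can be turned into a list of expected paths. This is a sketch under one assumption that the source does not confirm: that the `*` prefix of each file is a per-sample stem such as the sample `id`. The names `expected_outputs`, `missing_outputs`, `my_exp`, and `my_dataset` are hypothetical.

```python
from pathlib import Path

# File suffixes taken from the per-sample list above.
SUFFIXES = (
    "_input.jpg",
    "_pred_mask.png",
    "_gt_mask.png",
    "_overlay_pred_red_gt_green.jpg",
    "_result.json",
)


def expected_outputs(root, exp_name, dataset_name, stem):
    """Build the five expected output paths for one sample.

    `stem` stands in for the `*` prefix; using the sample id is an assumption.
    """
    out_dir = Path(root) / "outputs" / exp_name / dataset_name / "with_cot"
    return [out_dir / f"{stem}{suffix}" for suffix in SUFFIXES]


def missing_outputs(root, exp_name, dataset_name, stem):
    """Return the expected paths that do not exist on disk."""
    paths = expected_outputs(root, exp_name, dataset_name, stem)
    return [p for p in paths if not p.is_file()]


paths = expected_outputs(".", "my_exp", "my_dataset", "example_frame_0001")
```

An empty list from `missing_outputs` means all five files are present for that stem.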