Image Segmentation
Transformers
PyTorch
pixdlm
cvpr-2026
compute-transparency
reasoning-segmentation
uav
remote-sensing
vision-language
How to use WhynotHug/PixDLM with Transformers:

```python
# Use a pipeline as a high-level helper
from transformers import pipeline

pipe = pipeline("image-segmentation", model="WhynotHug/PixDLM")

# Load model directly
from transformers import AutoModel

model = AutoModel.from_pretrained("WhynotHug/PixDLM", dtype="auto")
```
# Examples

PixDLM evaluates reasoning segmentation samples in the DRSeg format.

Minimal sample fields:

```json
{
  "id": "example_frame_0001",
  "width": 1024,
  "height": 1024,
  "metadata": {
    "time_of_day": "day",
    "location": "urban_road",
    "altitude": "60m",
    "camera_angle": "90deg"
  },
  "ann_list": [
    {
      "bbox": [100.0, 120.0, 80.0, 60.0],
      "segmentation": [[100.0, 120.0, 180.0, 120.0, 180.0, 180.0, 100.0, 180.0]],
      "area": 4800.0,
      "category_name": "car"
    }
  ],
  "questions": [
    "Which vehicle is closest to the intersection and may affect traffic flow?"
  ],
  "answers": [
    "<think>The target vehicle is positioned nearest to the intersection and aligned with the traffic lane.</think> <answer>The vehicle closest to the intersection is the target.</answer>"
  ],
  "reasoning_types": ["spatial"]
}
```
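The fields in the sample above are internally consistent, which suggests a simple sanity check before evaluation. The following is a minimal sketch, not part of PixDLM: it assumes `bbox` is COCO-style `[x, y, width, height]`, that each `segmentation` entry is a flat `[x0, y0, x1, y1, ...]` polygon, and that `questions` and `answers` are paired by index. The helper names (`polygon_area`, `polygon_bbox`, `split_answer`, `check_sample`) are hypothetical.

```python
import re

def polygon_area(flat_xy):
    """Shoelace area of a polygon given as [x0, y0, x1, y1, ...]."""
    xs, ys = flat_xy[0::2], flat_xy[1::2]
    n = len(xs)
    twice_area = sum(
        xs[i] * ys[(i + 1) % n] - xs[(i + 1) % n] * ys[i] for i in range(n)
    )
    return abs(twice_area) / 2.0

def polygon_bbox(polygons):
    """Tight box around all polygons, as [x_min, y_min, width, height] (assumed COCO-style)."""
    xs = [x for p in polygons for x in p[0::2]]
    ys = [y for p in polygons for y in p[1::2]]
    return [min(xs), min(ys), max(xs) - min(xs), max(ys) - min(ys)]

# Matches the "<think>...</think> <answer>...</answer>" layout seen in the sample.
ANSWER_RE = re.compile(r"<think>(.*?)</think>\s*<answer>(.*?)</answer>", re.DOTALL)

def split_answer(text):
    """Return (reasoning, final_answer) extracted from a tagged answer string."""
    match = ANSWER_RE.search(text)
    if match is None:
        raise ValueError("answer lacks <think>/<answer> tags")
    return match.group(1).strip(), match.group(2).strip()

def check_sample(sample, tol=1e-6):
    """Return a list of human-readable problems; an empty list means the checks passed."""
    problems = []
    if len(sample["questions"]) != len(sample["answers"]):
        problems.append("questions and answers differ in length")
    for i, ann in enumerate(sample["ann_list"]):
        # Assumption: the stored area is the sum over all polygons of the annotation.
        area = sum(polygon_area(p) for p in ann["segmentation"])
        if abs(area - ann["area"]) > tol:
            problems.append(f"ann {i}: area {ann['area']} != polygon area {area}")
        box = polygon_bbox(ann["segmentation"])
        if any(abs(a - b) > tol for a, b in zip(box, ann["bbox"])):
            problems.append(f"ann {i}: bbox {ann['bbox']} != polygon bbox {box}")
    return problems

# A trimmed copy of the sample shown above.
sample = {
    "ann_list": [{
        "bbox": [100.0, 120.0, 80.0, 60.0],
        "segmentation": [[100.0, 120.0, 180.0, 120.0, 180.0, 180.0, 100.0, 180.0]],
        "area": 4800.0,
        "category_name": "car",
    }],
    "questions": ["Which vehicle is closest to the intersection and may affect traffic flow?"],
    "answers": ["<think>The target vehicle is positioned nearest to the intersection.</think> "
                "<answer>The vehicle closest to the intersection is the target.</answer>"],
}

print(check_sample(sample))
print(split_answer(sample["answers"][0])[1])
```

For the sample above, the polygon is an 80 by 60 rectangle, so the computed area and box agree with the stored `area` and `bbox` values. If the real DRSeg files use a different box convention, only `polygon_bbox` would need to change.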
Evaluation outputs are written to:

```text
outputs/<exp_name>/<dataset_name>/with_cot/
```

Per sample:

- `*_input.jpg`
- `*_pred_mask.png`
- `*_gt_mask.png`
- `*_overlay_pred_red_gt_green.jpg`
- `*_result.json`
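To inspect an evaluation run, the per-sample files listed above can be grouped by their shared prefix. The sketch below is illustrative only: it assumes the `*` in each pattern is a per-sample prefix and that all files sit directly in the output directory. The names `SUFFIXES`, `group_outputs`, and `incomplete_samples` are hypothetical and not part of PixDLM.

```python
from pathlib import Path

# File suffixes taken from the per-sample list above; the keys are illustrative labels.
SUFFIXES = {
    "input": "_input.jpg",
    "pred_mask": "_pred_mask.png",
    "gt_mask": "_gt_mask.png",
    "overlay": "_overlay_pred_red_gt_green.jpg",
    "result": "_result.json",
}

def group_outputs(out_dir):
    """Map each sample prefix to a dict of {kind: path} for the files found in out_dir."""
    groups = {}
    for path in sorted(Path(out_dir).iterdir()):
        for kind, suffix in SUFFIXES.items():
            if path.name.endswith(suffix):
                prefix = path.name[: -len(suffix)]
                groups.setdefault(prefix, {})[kind] = path
                break
    return groups

def incomplete_samples(groups):
    """Return {prefix: [missing kinds]} for samples lacking any of the five files."""
    return {
        prefix: sorted(set(SUFFIXES) - set(files))
        for prefix, files in groups.items()
        if len(files) < len(SUFFIXES)
    }
```

A typical call would pass the run directory, for example `group_outputs("outputs/<exp_name>/<dataset_name>/with_cot")` with the placeholders filled in, and then use `incomplete_samples` to flag samples whose evaluation did not finish writing all files.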