Image Segmentation
Transformers
PyTorch
pixdlm
cvpr-2026
compute-transparency
reasoning-segmentation
uav
remote-sensing
vision-language
How to use WhynotHug/PixDLM with Transformers:
# Use a pipeline as a high-level helper
from transformers import pipeline

pipe = pipeline("image-segmentation", model="WhynotHug/PixDLM")

# Load model directly
from transformers import AutoModel

model = AutoModel.from_pretrained("WhynotHug/PixDLM", dtype="auto")
Examples
PixDLM evaluates reasoning segmentation samples in the DRSeg format.
Minimal sample fields:
{
  "id": "example_frame_0001",
  "width": 1024,
  "height": 1024,
  "metadata": {
    "time_of_day": "day",
    "location": "urban_road",
    "altitude": "60m",
    "camera_angle": "90deg"
  },
  "ann_list": [
    {
      "bbox": [100.0, 120.0, 80.0, 60.0],
      "segmentation": [[100.0, 120.0, 180.0, 120.0, 180.0, 180.0, 100.0, 180.0]],
      "area": 4800.0,
      "category_name": "car"
    }
  ],
  "questions": [
    "Which vehicle is closest to the intersection and may affect traffic flow?"
  ],
  "answers": [
    "<think>The target vehicle is positioned nearest to the intersection and aligned with the traffic lane.</think> <answer>The vehicle closest to the intersection is the target.</answer>"
  ],
  "reasoning_types": ["spatial"]
}
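As a rough illustration of how such a sample might be sanity-checked before evaluation, the sketch below recomputes the polygon area and splits an answer into its reasoning and final-answer parts. It is not part of the PixDLM code base: the helper names `polygon_area` and `split_answer` are hypothetical, and it assumes that `segmentation` holds flat `[x1, y1, x2, y2, ...]` polygons and that `bbox` is `[x, y, width, height]`, which is what the sample values above suggest but the card does not state.

```python
import re

# Trimmed copy of the sample shown above (only the fields used here).
sample = {
    "id": "example_frame_0001",
    "ann_list": [
        {
            "bbox": [100.0, 120.0, 80.0, 60.0],
            "segmentation": [[100.0, 120.0, 180.0, 120.0, 180.0, 180.0, 100.0, 180.0]],
            "area": 4800.0,
            "category_name": "car",
        }
    ],
    "answers": [
        "<think>The target vehicle is positioned nearest to the intersection "
        "and aligned with the traffic lane.</think> "
        "<answer>The vehicle closest to the intersection is the target.</answer>"
    ],
}


def polygon_area(flat_points):
    """Shoelace area of a polygon given as a flat [x1, y1, x2, y2, ...] list (assumed layout)."""
    xs, ys = flat_points[0::2], flat_points[1::2]
    total = 0.0
    for i in range(len(xs)):
        j = (i + 1) % len(xs)  # wrap around to close the polygon
        total += xs[i] * ys[j] - xs[j] * ys[i]
    return abs(total) / 2.0


def split_answer(text):
    """Return (reasoning, answer) from a '<think>...</think> <answer>...</answer>' string."""
    match = re.search(r"<think>(.*?)</think>\s*<answer>(.*?)</answer>", text, re.DOTALL)
    if match is None:
        raise ValueError("answer does not follow the <think>/<answer> layout")
    return match.group(1).strip(), match.group(2).strip()


ann = sample["ann_list"][0]
computed_area = sum(polygon_area(poly) for poly in ann["segmentation"])
reasoning, answer = split_answer(sample["answers"][0])
```

For the sample above, the recomputed polygon area agrees with the stored `area` field, and it also equals the width times the height of the box under the assumed `[x, y, width, height]` reading.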
Evaluation outputs are written to:
outputs/<exp_name>/<dataset_name>/with_cot/
Per sample:
- *_input.jpg
- *_pred_mask.png
- *_gt_mask.png
- *_overlay_pred_red_gt_green.jpg
- *_result.json
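The per-sample files listed above can be grouped after a run to spot samples with missing outputs. The following is a minimal sketch under stated assumptions: the `*` is taken to be a per-sample prefix (the card does not say what it expands to), `outputs` is treated as a directory relative to the working directory, and the helper names `output_dir`, `group_outputs`, and `incomplete_samples` are hypothetical rather than part of the PixDLM tooling.

```python
from pathlib import Path

# File-name suffixes listed in the card for each evaluated sample.
SUFFIXES = (
    "_input.jpg",
    "_pred_mask.png",
    "_gt_mask.png",
    "_overlay_pred_red_gt_green.jpg",
    "_result.json",
)


def output_dir(exp_name, dataset_name, root="outputs"):
    """Build outputs/<exp_name>/<dataset_name>/with_cot/ (root location is an assumption)."""
    return Path(root) / exp_name / dataset_name / "with_cot"


def group_outputs(directory):
    """Map each sample prefix to the files found for it, keyed by suffix."""
    groups = {}
    for path in sorted(Path(directory).iterdir()):
        for suffix in SUFFIXES:
            if path.name.endswith(suffix):
                prefix = path.name[: -len(suffix)]
                groups.setdefault(prefix, {})[suffix] = path
                break
    return groups


def incomplete_samples(directory):
    """Return {prefix: [missing suffixes]} for samples lacking any expected file."""
    missing = {}
    for prefix, found in group_outputs(directory).items():
        absent = [s for s in SUFFIXES if s not in found]
        if absent:
            missing[prefix] = absent
    return missing
```

Calling `incomplete_samples(output_dir("my_exp", "my_dataset"))` with placeholder experiment and dataset names would return an empty dict when every sample has all five files.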