Examples

PixDLM evaluates reasoning segmentation samples in the DRSeg format.

A minimal sample contains the following fields:

{
  "id": "example_frame_0001",
  "width": 1024,
  "height": 1024,
  "metadata": {
    "time_of_day": "day",
    "location": "urban_road",
    "altitude": "60m",
    "camera_angle": "90deg"
  },
  "ann_list": [
    {
      "bbox": [100.0, 120.0, 80.0, 60.0],
      "segmentation": [[100.0, 120.0, 180.0, 120.0, 180.0, 180.0, 100.0, 180.0]],
      "area": 4800.0,
      "category_name": "car"
    }
  ],
  "questions": [
    "Which vehicle is closest to the intersection and may affect traffic flow?"
  ],
  "answers": [
    "<think>The target vehicle is positioned nearest to the intersection and aligned with the traffic lane.</think> <answer>The vehicle closest to the intersection is the target.</answer>"
  ],
  "reasoning_types": ["spatial"]
}
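
The geometry and answer fields in the sample above can be sanity-checked before evaluation. The following is a minimal sketch, not part of PixDLM: `polygon_area`, `split_answer`, and `check_sample` are hypothetical helper names. It assumes that `bbox` is `[x, y, width, height]`, that each `segmentation` entry is a flat `[x0, y0, x1, y1, ...]` polygon, and that `questions` and `answers` are paired by index. The sample values are consistent with these assumptions, but the format itself does not state them here.

```python
import re

# Assumed answer layout, taken from the sample: <think>...</think> <answer>...</answer>
ANSWER_RE = re.compile(r"<think>(.*?)</think>\s*<answer>(.*?)</answer>", re.DOTALL)


def polygon_area(flat):
    """Shoelace area of a flat [x0, y0, x1, y1, ...] polygon."""
    xs, ys = flat[0::2], flat[1::2]
    n = len(xs)
    twice = sum(xs[i] * ys[(i + 1) % n] - xs[(i + 1) % n] * ys[i] for i in range(n))
    return abs(twice) / 2.0


def split_answer(text):
    """Return (reasoning, answer); reasoning is None if the tags are absent."""
    match = ANSWER_RE.search(text)
    if match is None:
        return None, text.strip()
    return match.group(1).strip(), match.group(2).strip()


def check_sample(sample, tol=1e-6):
    """Return a list of human-readable problems; empty if none were found."""
    problems = []
    if len(sample["questions"]) != len(sample["answers"]):
        problems.append("questions and answers differ in length")
    for i, ann in enumerate(sample["ann_list"]):
        polys = ann["segmentation"]
        # Summing areas assumes the polygons do not overlap and contain no holes.
        area = sum(polygon_area(p) for p in polys)
        if abs(area - ann["area"]) > tol:
            problems.append(f"ann {i}: area {ann['area']} != polygon area {area}")
        xs = [x for p in polys for x in p[0::2]]
        ys = [y for p in polys for y in p[1::2]]
        # Assumption: bbox is [x, y, width, height].
        expected = [min(xs), min(ys), max(xs) - min(xs), max(ys) - min(ys)]
        if any(abs(a - b) > tol for a, b in zip(expected, ann["bbox"])):
            problems.append(f"ann {i}: bbox {ann['bbox']} != polygon bbox {expected}")
    return problems


sample = {
    "ann_list": [{
        "bbox": [100.0, 120.0, 80.0, 60.0],
        "segmentation": [[100.0, 120.0, 180.0, 120.0, 180.0, 180.0, 100.0, 180.0]],
        "area": 4800.0,
        "category_name": "car",
    }],
    "questions": ["Which vehicle is closest to the intersection and may affect traffic flow?"],
    "answers": ["<think>The target vehicle is positioned nearest to the intersection and aligned with the traffic lane.</think> <answer>The vehicle closest to the intersection is the target.</answer>"],
}
print(check_sample(sample))  # an empty list means no problems were found
```

For the sample shown, the polygon is an 80 x 60 rectangle, so the shoelace area matches the stored `area` of 4800.0 and the derived box matches `bbox`.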

Evaluation outputs are written to:

outputs/<exp_name>/<dataset_name>/with_cot/

Each sample produces the following files:

*_input.jpg
*_pred_mask.png
*_gt_mask.png
*_overlay_pred_red_gt_green.jpg
*_result.json
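
To confirm that an evaluation run wrote all five files for every sample, the output directory can be scanned by file-name suffix. This is a minimal sketch under stated assumptions: `group_outputs` and `missing_outputs` are hypothetical helpers, not part of PixDLM, and the code treats whatever precedes each suffix (the `*` above) as an opaque per-sample prefix without assuming what it contains.

```python
from pathlib import Path

# The five per-sample suffixes listed above.
SUFFIXES = [
    "_input.jpg",
    "_pred_mask.png",
    "_gt_mask.png",
    "_overlay_pred_red_gt_green.jpg",
    "_result.json",
]


def group_outputs(out_dir):
    """Map each file-name prefix to the set of known suffixes found for it."""
    found = {}
    for path in sorted(Path(out_dir).iterdir()):
        for suffix in SUFFIXES:
            if path.name.endswith(suffix):
                prefix = path.name[: -len(suffix)]
                found.setdefault(prefix, set()).add(suffix)
                break
    return found


def missing_outputs(out_dir):
    """Return {prefix: [missing suffixes]} for samples with incomplete output."""
    return {
        prefix: [s for s in SUFFIXES if s not in have]
        for prefix, have in group_outputs(out_dir).items()
        if len(have) < len(SUFFIXES)
    }
```

Calling `missing_outputs` on a `with_cot/` directory returns an empty dict when every prefix has all five files. A sample that produced no files at all will not appear, since there is no prefix to discover it by.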