Reproduction Guide

This guide documents the recommended reproduction path for PixDLM on DRSeg.

Setup

conda create -n pixdlm python=3.10 -y
conda activate pixdlm
pip install -r requirements.txt
pip install flash-attn --no-build-isolation

Download the assets and prepare the DRSeg data:

python scripts/download_assets.py --output-dir .
python scripts/prepare_drseg.py --data-root data/DRSeg
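
Before launching evaluation, it can help to confirm that the paths used by the evaluation commands in this guide are in place. The following is a minimal sketch, not part of the release: the three paths are copied from the `--model`, `--data`, and `--clip` arguments used later, and it is an assumption that the download and preparation scripts populate exactly these locations. The helper name `missing_assets` is hypothetical.

```python
from pathlib import Path

# Paths copied from the evaluation commands in this guide. Whether the
# download script creates exactly these locations is an assumption.
EXPECTED_PATHS = (
    "pretrained/pixdlm-7b",
    "data/DRSeg",
    "checkpoints/clip-vit-large-patch14",
)


def missing_assets(root=".", expected=EXPECTED_PATHS):
    """Return the expected paths that do not exist under `root`."""
    base = Path(root)
    return [p for p in expected if not (base / p).exists()]


if __name__ == "__main__":
    missing = missing_assets()
    if missing:
        print("Missing assets:", ", ".join(missing))
    else:
        print("All expected assets are present.")
```

Run it from the repository root; an empty result means all three locations exist, but it does not validate their contents.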

Evaluation

Single-GPU evaluation (exact split):

bash scripts/eval_drseg.sh \
  --gpus 0 \
  --model pretrained/pixdlm-7b \
  --data data/DRSeg \
  --clip checkpoints/clip-vit-large-patch14 \
  --exp pixdlm_drseg_test_single_gpu

Multi-GPU evaluation (faster):

bash scripts/eval_drseg.sh \
  --gpus 0,1,2,3,4,5,6,7 \
  --model pretrained/pixdlm-7b \
  --data data/DRSeg \
  --clip checkpoints/clip-vit-large-patch14 \
  --exp pixdlm_drseg_test_8gpu

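Note: when the split size is not divisible by the number of GPUs, the default PyTorch distributed sampler pads the split with repeated samples so that every GPU receives the same number of items. Those duplicates can then be counted twice in the metrics. For exact paper-table accounting, prefer the single-GPU command, or patch the sampler to remove the padded duplicates.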
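
To make the padding concrete, here is a small pure-Python sketch that mirrors the non-shuffled, `drop_last=False` index logic of `torch.utils.data.distributed.DistributedSampler` and then removes the duplicates after gathering. It does not describe how the released scripts gather results: the `(sample_index, result)` record layout and the function names are assumptions for illustration, and the sizes are not the real DRSeg split size.

```python
import math


def shard_indices(num_samples, world_size, rank):
    """Indices seen by one rank, padded by wrapping around to the start.

    Mirrors the non-shuffled, drop_last=False behaviour of
    torch.utils.data.distributed.DistributedSampler.
    """
    per_rank = math.ceil(num_samples / world_size)
    total = per_rank * world_size
    indices = list(range(num_samples))
    # Padded duplicates: cycle through the split until `total` is reached.
    indices += [i % num_samples for i in range(total - num_samples)]
    return indices[rank:total:world_size]


def drop_padded_duplicates(records):
    """Keep the first result per sample index.

    `records` is a hypothetical list of (sample_index, result) pairs
    gathered from all ranks.
    """
    seen = {}
    for index, result in records:
        seen.setdefault(index, result)
    return [seen[i] for i in sorted(seen)]


if __name__ == "__main__":
    num_samples, world_size = 10, 8  # illustrative sizes, not DRSeg's
    gathered = [
        (i, f"result-{i}")
        for rank in range(world_size)
        for i in shard_indices(num_samples, world_size, rank)
    ]
    print(len(gathered))                          # 16 evaluated items
    print(len(drop_padded_duplicates(gathered)))  # 10 unique samples
```

With 10 samples on 8 GPUs, six samples are evaluated twice. Deduplicating by sample index before computing metrics restores single-GPU accounting.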

Expected Metrics

Paper metrics on DRSeg test:

Reasoning type	gIoU	cIoU
Attribute	62.80	62.84
Scene	61.75	64.03
Spatial	62.51	62.80
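
The two columns can differ for the same rows because they aggregate differently. The sketch below assumes the convention common in reasoning-segmentation work: gIoU is the mean of per-image IoUs, and cIoU is the cumulative intersection divided by the cumulative union over the whole split. The released evaluation code may differ in details such as how empty masks are handled, and the function names here are hypothetical.

```python
def intersection_and_union(pred, target):
    """Count intersecting and united pixels of two flat binary masks."""
    inter = sum(1 for p, t in zip(pred, target) if p and t)
    union = sum(1 for p, t in zip(pred, target) if p or t)
    return inter, union


def giou_ciou(pairs):
    """Return (gIoU, cIoU) in percent for a list of (pred, target) masks."""
    if not pairs:
        raise ValueError("at least one (pred, target) pair is required")
    per_image, total_inter, total_union = [], 0, 0
    for pred, target in pairs:
        inter, union = intersection_and_union(pred, target)
        # Assumption: an empty prediction on an empty target scores IoU 1.
        per_image.append(inter / union if union else 1.0)
        total_inter += inter
        total_union += union
    giou = 100.0 * sum(per_image) / len(per_image)
    ciou = 100.0 * total_inter / total_union if total_union else 100.0
    return giou, ciou


if __name__ == "__main__":
    pairs = [
        ([1, 1, 0, 0], [1, 0, 0, 0]),  # small object, IoU 0.5
        ([1, 1, 1, 1], [1, 1, 1, 1]),  # large object, IoU 1.0
    ]
    giou, ciou = giou_ciou(pairs)
    print(f"gIoU={giou:.2f} cIoU={ciou:.2f}")
```

In this toy case gIoU is 75.00 and cIoU is 83.33: cIoU weights images by mask area, so large objects dominate it, while gIoU treats every image equally.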

The released scripts produce:

overall gIoU/cIoU,
CoT vs no-CoT threshold counts,
per-reasoning-type gIoU/cIoU,
image-level visualizations in outputs/<exp>/.

For each evaluated sample, the visualization directory stores the input image, predicted mask, ground-truth mask, overlay, and a JSON result containing the question, answer, and mask metadata.

Compute Transparency

The full test evaluation is memory-heavy because PixDLM combines a language model, CLIP visual features, and segmentation decoding. We recommend reporting:

GPU type and count,
precision,
dependency versions,
exact split and sampler behavior,
average seconds per image,
whether CoT text is included in the conditioning input.
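
One way to keep these items together is a small structured record saved next to the results. This is only a sketch: the `ComputeReport` class and its field names are hypothetical and not part of the release, and every value in the example is a placeholder, not a measurement.

```python
import json
import platform
from dataclasses import asdict, dataclass, field


@dataclass
class ComputeReport:
    """Hypothetical record of the reporting items listed above."""
    gpu_type: str
    gpu_count: int
    precision: str
    split: str
    sampler: str
    seconds_per_image: float
    cot_in_conditioning: bool
    dependency_versions: dict = field(default_factory=dict)


def seconds_per_image(total_seconds, num_images):
    """Average wall-clock seconds per evaluated image."""
    if num_images <= 0:
        raise ValueError("num_images must be positive")
    return total_seconds / num_images


if __name__ == "__main__":
    # All values below are placeholders, not measurements.
    report = ComputeReport(
        gpu_type="<GPU model>",
        gpu_count=1,
        precision="<precision>",
        split="DRSeg test",
        sampler="single GPU, no padding",
        seconds_per_image=seconds_per_image(120.0, 60),
        cot_in_conditioning=True,
        dependency_versions={"python": platform.python_version()},
    )
    print(json.dumps(asdict(report), indent=2))
```

Saving the JSON alongside the experiment output makes it easier to compare runs that differ in GPU count or sampler behavior.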

The public release acknowledges the 2027 CVPR Compute Transparency Champion recognition and keeps this guide explicit about evaluation assumptions.