# Reproduction Guide

This guide documents the recommended reproduction path for PixDLM on DRSeg.

## Setup

```bash
conda create -n pixdlm python=3.10 -y
conda activate pixdlm
pip install -r requirements.txt
pip install flash-attn --no-build-isolation
```

Download the assets and prepare DRSeg:

```bash
python scripts/download_assets.py --output-dir .
python scripts/prepare_drseg.py --data-root data/DRSeg
```

## Evaluation

Single-GPU evaluation on the exact split:

```bash
bash scripts/eval_drseg.sh \
  --gpus 0 \
  --model pretrained/pixdlm-7b \
  --data data/DRSeg \
  --clip checkpoints/clip-vit-large-patch14 \
  --exp pixdlm_drseg_test_single_gpu
```

Faster multi-GPU evaluation:

```bash
bash scripts/eval_drseg.sh \
  --gpus 0,1,2,3,4,5,6,7 \
  --model pretrained/pixdlm-7b \
  --data data/DRSeg \
  --clip checkpoints/clip-vit-large-patch14 \
  --exp pixdlm_drseg_test_8gpu
```

Note: when the split size is not divisible by the number of GPUs, the default PyTorch distributed sampler pads the split with duplicated samples, so some samples can be counted more than once. For exact paper-table accounting, use the single-GPU command or patch the sampler to drop the padded duplicates.

## Expected Metrics

Paper metrics on the DRSeg test split:

| Reasoning type | gIoU | cIoU |
| --- | ---: | ---: |
| Attribute | 62.80 | 62.84 |
| Scene | 61.75 | 64.03 |
| Spatial | 62.51 | 62.80 |

The released scripts print:

- overall gIoU/cIoU,
- CoT vs. no-CoT threshold counts,
- per-reasoning-type gIoU/cIoU.

They also save image-level visualizations in `outputs//`. For each evaluated sample, the visualization directory stores the input image, the predicted mask, the ground-truth mask, an overlay, and a JSON file containing the question, the answer, and mask metadata.

## Compute Transparency

Full test evaluation is memory-heavy because PixDLM combines a language model, CLIP visual features, and segmentation decoding. We recommend reporting:

- GPU type and count,
- numerical precision,
- dependency versions,
- exact split and sampler behavior,
- average seconds per image,
- whether CoT text is included in the conditioning input.
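The reporting items above can be captured as one JSON record per run. The following is a minimal sketch, assuming `torch` may or may not be installed in the environment; `compute_report`, `average_seconds_per_image`, and all field names are hypothetical helpers, not part of the released scripts, and the `bf16` precision value is only an example.

```python
import json
import platform
import time
from importlib import metadata
from typing import Any, Callable, Dict, Iterable, List


def package_versions(names: Iterable[str]) -> Dict[str, str]:
    """Return installed versions; missing packages are recorded instead of raising."""
    versions: Dict[str, str] = {}
    for name in names:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = "not installed"
    return versions


def gpu_info() -> Dict[str, Any]:
    """Report GPU type and count through torch.cuda when torch is importable."""
    try:
        import torch
    except ImportError:
        return {"gpu_count": 0, "gpu_type": None}
    count = torch.cuda.device_count() if torch.cuda.is_available() else 0
    name = torch.cuda.get_device_name(0) if count > 0 else None
    return {"gpu_count": count, "gpu_type": name}


def average_seconds_per_image(run_one: Callable[[Any], Any], images: List[Any]) -> float:
    """Wall-clock seconds per image for a caller-supplied per-image inference function."""
    if not images:
        raise ValueError("need at least one image to time")
    start = time.perf_counter()
    for image in images:
        run_one(image)
    return (time.perf_counter() - start) / len(images)


def compute_report(precision: str, split: str, sampler: str,
                   cot_in_conditioning: bool, sec_per_image: float) -> Dict[str, Any]:
    """Bundle the recommended reporting fields into one JSON-serializable dict."""
    return {
        "python": platform.python_version(),
        **gpu_info(),
        "precision": precision,  # hypothetical field; fill in the precision actually used
        "dependencies": package_versions(["torch", "flash-attn"]),
        "split": split,
        "sampler": sampler,
        "avg_seconds_per_image": round(sec_per_image, 4),
        "cot_in_conditioning": cot_in_conditioning,
    }


if __name__ == "__main__":
    # Placeholder workload: replace the lambda with real per-image inference.
    sec = average_seconds_per_image(lambda image: sum(image), [[1, 2], [3, 4]])
    report = compute_report(precision="bf16", split="test", sampler="single-gpu, no padding",
                            cot_in_conditioning=True, sec_per_image=sec)
    print(json.dumps(report, indent=2))
```

Keeping the record as plain JSON makes it easy to attach to a results table or diff across runs; extend the package list to match `requirements.txt`.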
The public release acknowledges the 2027 CVPR Compute Transparency Champion recognition and keeps this guide explicit about evaluation assumptions.
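To check the padding caveat from the Evaluation section, the sketch below mirrors how PyTorch's `DistributedSampler` assigns indices with `shuffle=False` and `drop_last=False` (wrap-around padding, then a per-rank stride), and shows one way to drop padded duplicates before scoring. It assumes each rank records `(dataset_index, intersection, union)` per sample, and it assumes gIoU is the mean per-sample IoU while cIoU is cumulative intersection over cumulative union, the convention common in reasoning-segmentation benchmarks. `distributed_shards` and `dedup_and_score` are hypothetical names, not functions from the released scripts.

```python
import math
from typing import Dict, List, Tuple


def distributed_shards(num_samples: int, world_size: int) -> List[List[int]]:
    """Mirror DistributedSampler(shuffle=False, drop_last=False) index assignment."""
    indices = list(range(num_samples))
    per_rank = math.ceil(num_samples / world_size)
    total = per_rank * world_size
    padding = total - num_samples
    # Pad by wrapping around to the start of the index list.
    if padding <= len(indices):
        indices += indices[:padding]
    else:
        indices += (indices * math.ceil(padding / len(indices)))[:padding]
    # Each rank takes every world_size-th index starting at its rank.
    return [indices[rank:total:world_size] for rank in range(world_size)]


def dedup_and_score(records: List[Tuple[int, float, float]]) -> Dict[str, float]:
    """records: (dataset_index, intersection, union) gathered from every rank.
    Padded duplicates share a dataset_index, so only the first record per index is kept."""
    if not records:
        raise ValueError("no records to score")
    unique: Dict[int, Tuple[float, float]] = {}
    for idx, inter, union in records:
        unique.setdefault(idx, (inter, union))
    # Assumed convention: an empty union (empty prediction and target) scores IoU 1.0.
    ious = [inter / union if union > 0 else 1.0 for inter, union in unique.values()]
    total_inter = sum(inter for inter, _ in unique.values())
    total_union = sum(union for _, union in unique.values())
    return {
        "num_samples": len(unique),
        "gIoU": sum(ious) / len(ious),
        "cIoU": total_inter / total_union if total_union > 0 else 1.0,
    }


if __name__ == "__main__":
    print(distributed_shards(num_samples=10, world_size=4))
    # [[0, 4, 8], [1, 5, 9], [2, 6, 0], [3, 7, 1]]
```

With 10 samples on 4 GPUs, 12 indices are evaluated and samples 0 and 1 appear twice; deduplicating by dataset index restores the 10-sample count before gIoU and cIoU are computed.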