Image Segmentation
Transformers
PyTorch
pixdlm
cvpr-2026
compute-transparency
reasoning-segmentation
uav
remote-sensing
vision-language
Instructions to use WhynotHug/PixDLM with libraries, inference providers, notebooks, and local apps. Follow these links to get started.
- Libraries
- Transformers
How to use WhynotHug/PixDLM with Transformers:
# Use a pipeline as a high-level helper from transformers import pipeline pipe = pipeline("image-segmentation", model="WhynotHug/PixDLM")# Load model directly from transformers import AutoModel model = AutoModel.from_pretrained("WhynotHug/PixDLM", dtype="auto") - Notebooks
- Google Colab
- Kaggle
| set -euo pipefail | |
| GPUS="0" | |
| MODEL="pretrained/pixdlm-7b" | |
| DATA="data/DRSeg" | |
| CLIP="checkpoints/clip-vit-large-patch14" | |
| EXP="pixdlm_drseg_test" | |
| PORT="${PORT:-29501}" | |
| PRECISION="${PRECISION:-bf16}" | |
| while [[ $# -gt 0 ]]; do | |
| case "$1" in | |
| --gpus) GPUS="$2"; shift 2 ;; | |
| --model) MODEL="$2"; shift 2 ;; | |
| --data) DATA="$2"; shift 2 ;; | |
| --clip) CLIP="$2"; shift 2 ;; | |
| --exp) EXP="$2"; shift 2 ;; | |
| --port) PORT="$2"; shift 2 ;; | |
| --precision) PRECISION="$2"; shift 2 ;; | |
| *) echo "Unknown argument: $1" >&2; exit 2 ;; | |
| esac | |
| done | |
| ROOT="$(cd "$(dirname "${BASH_SOURCE[0]}")/.." && pwd)" | |
| cd "$ROOT" | |
| export PYTHONPATH="$ROOT:${PYTHONPATH:-}" | |
| export TOKENIZERS_PARALLELISM=false | |
| export TRANSFORMERS_VERBOSITY=error | |
| mkdir -p "outputs/$EXP" "logs/$EXP" | |
| deepspeed --master_port="$PORT" --include="localhost:$GPUS" eval.py \ | |
| --version="$MODEL" \ | |
| --dataset_dir="$DATA" \ | |
| --dataset="custom_seg" \ | |
| --sample_rates="1" \ | |
| --exp_name="$EXP" \ | |
| --log_base_dir="$ROOT/logs" \ | |
| --vis_save_path="$ROOT/outputs/$EXP" \ | |
| --val_dataset="custom_seg|test" \ | |
| --train_mask_decoder \ | |
| --Three_Level_Multi_Scale_Decoder \ | |
| --vision-tower="$CLIP" \ | |
| --seg_token_num=3 \ | |
| --num_classes_per_question=3 \ | |
| --batch_size=1 \ | |
| --grad_accumulation_steps=1 \ | |
| --val_batch_size=1 \ | |
| --workers=0 \ | |
| --lora_r=0 \ | |
| --preprocessor_config="$ROOT/configs/preprocessor_448.json" \ | |
| --resize_vision_tower \ | |
| --resize_vision_tower_size=448 \ | |
| --vision_tower_for_mask \ | |
| --use_expand_question_list \ | |
| --image_feature_scale_num=3 \ | |
| --conv_type="llava_v1" \ | |
| --is_multipath_encoder \ | |
| --precision="$PRECISION" \ | |
| --eval_only 2>&1 | tee "logs/$EXP/eval.log" | |