Image Segmentation
Transformers
PyTorch
pixdlm
cvpr-2026
compute-transparency
reasoning-segmentation
uav
remote-sensing
vision-language
Instructions to use WhynotHug/PixDLM with libraries, inference providers, notebooks, and local apps. Follow these links to get started.
- Libraries
- Transformers
How to use WhynotHug/PixDLM with Transformers:
# Use a pipeline as a high-level helper from transformers import pipeline pipe = pipeline("image-segmentation", model="WhynotHug/PixDLM")# Load model directly from transformers import AutoModel model = AutoModel.from_pretrained("WhynotHug/PixDLM", dtype="auto") - Notebooks
- Google Colab
- Kaggle
File size: 1,681 Bytes
3334467 | 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 | #!/usr/bin/env bash
set -euo pipefail
GPUS="0"
MODEL="pretrained/pixdlm-7b"
DATA="data/DRSeg"
CLIP="checkpoints/clip-vit-large-patch14"
EXP="pixdlm_drseg_test"
PORT="${PORT:-29501}"
PRECISION="${PRECISION:-bf16}"
while [[ $# -gt 0 ]]; do
case "$1" in
--gpus) GPUS="$2"; shift 2 ;;
--model) MODEL="$2"; shift 2 ;;
--data) DATA="$2"; shift 2 ;;
--clip) CLIP="$2"; shift 2 ;;
--exp) EXP="$2"; shift 2 ;;
--port) PORT="$2"; shift 2 ;;
--precision) PRECISION="$2"; shift 2 ;;
*) echo "Unknown argument: $1" >&2; exit 2 ;;
esac
done
ROOT="$(cd "$(dirname "${BASH_SOURCE[0]}")/.." && pwd)"
cd "$ROOT"
export PYTHONPATH="$ROOT:${PYTHONPATH:-}"
export TOKENIZERS_PARALLELISM=false
export TRANSFORMERS_VERBOSITY=error
mkdir -p "outputs/$EXP" "logs/$EXP"
deepspeed --master_port="$PORT" --include="localhost:$GPUS" eval.py \
--version="$MODEL" \
--dataset_dir="$DATA" \
--dataset="custom_seg" \
--sample_rates="1" \
--exp_name="$EXP" \
--log_base_dir="$ROOT/logs" \
--vis_save_path="$ROOT/outputs/$EXP" \
--val_dataset="custom_seg|test" \
--train_mask_decoder \
--Three_Level_Multi_Scale_Decoder \
--vision-tower="$CLIP" \
--seg_token_num=3 \
--num_classes_per_question=3 \
--batch_size=1 \
--grad_accumulation_steps=1 \
--val_batch_size=1 \
--workers=0 \
--lora_r=0 \
--preprocessor_config="$ROOT/configs/preprocessor_448.json" \
--resize_vision_tower \
--resize_vision_tower_size=448 \
--vision_tower_for_mask \
--use_expand_question_list \
--image_feature_scale_num=3 \
--conv_type="llava_v1" \
--is_multipath_encoder \
--precision="$PRECISION" \
--eval_only 2>&1 | tee "logs/$EXP/eval.log"
|