---
datasets:
- erenzhou/refGeo
language:
- en
base_model:
- liuhaotian/llava-v1.5-7b
pipeline_tag: image-text-to-text
---

## Inference

1. Install LLaVA-1.5 from https://github.com/haotian-liu/LLaVA

2-1. Infer Coarse Masks

```
MODEL_PATH=path/to/checkpoints/llava-v1.5-7b-task-lora-geoground
OUTPUT=data/exp_0125
ANSWER_PATH=$OUTPUT/llava-v1.5-7b-task-lora-geoground
GPU_NUM=0

echo "Processing RRSIS-D test"
IMAGE_FOLDER=path/to/data/images/rrsisd/
JSON_PATH=path/to/data/metadata/rrsisd_val.jsonl

CUDA_VISIBLE_DEVICES=$GPU_NUM \
python inference_seg.py \
    --model-path $MODEL_PATH \
    --model-base $MODEL_PATH \
    --question-file $JSON_PATH \
    --image-folder $IMAGE_FOLDER \
    --answers-file $ANSWER_PATH-rrsisd_val.jsonl \
    --batch_size 1
```

2-2. Infer Horizontal Bounding Boxes (HBBs)

This step reuses the variables defined in step 2-1.

```
CUDA_VISIBLE_DEVICES=$GPU_NUM \
python inference_hbb.py \
    --model-path $MODEL_PATH \
    --model-base $MODEL_PATH \
    --question-file $JSON_PATH \
    --image-folder $IMAGE_FOLDER \
    --answers-file $ANSWER_PATH-rrsisd_val.jsonl \
    --batch_size 1
```

3-1. Generate Masks from Coarse Masks

```
python generate_mask.py \
    --answers-file $ANSWER_PATH-rrsisd_val.jsonl \
    --image-folder $IMAGE_FOLDER \
    --scale 16 \
    --vis-dir $OUTPUT/vis_seg/
```

3-2. Generate Masks with SAM from HBBs

Download the ViT-H SAM model from https://github.com/facebookresearch/segment-anything

```
python generate_mask_sam_by_box.py \
    --answers-file $ANSWER_PATH-rrsisd_val.jsonl \
    --image-folder $IMAGE_FOLDER \
    --scale 16 \
    --vis-dir $OUTPUT/vis_sam_box/
```

3-3. Generate Masks with SAM from HBBs and Coarse Masks

```
python generate_mask_sam_by_box+seg.py \
    --answers-file $ANSWER_PATH-rrsisd_val.jsonl \
    --image-folder $IMAGE_FOLDER \
    --scale 16 \
    --vis-dir $OUTPUT/vis_sam_box+seg/
```

4. Compute Metrics

```
python compute_mask_metric.py
```
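
Before running the mask-generation scripts in step 3, it can help to confirm that the answers file written in step 2 is well-formed JSON Lines. The sketch below is not part of the GeoGround or LLaVA code; `load_jsonl` and `summarize` are hypothetical helper names, and the only assumption is that the file holds one JSON object per line, as the `.jsonl` extension suggests. It deliberately makes no assumption about field names and only reports which top-level keys are present.

```python
import json
import sys
from pathlib import Path


def load_jsonl(path):
    """Read a JSON Lines file and return the parsed records as a list."""
    records = []
    with Path(path).open("r", encoding="utf-8") as f:
        for line_no, line in enumerate(f, start=1):
            line = line.strip()
            if not line:
                continue  # tolerate blank lines
            try:
                records.append(json.loads(line))
            except json.JSONDecodeError as exc:
                # Point at the offending line so a truncated run is easy to spot.
                raise ValueError(f"{path}:{line_no}: invalid JSON ({exc})") from exc
    return records


def summarize(records):
    """Return (record count, sorted union of top-level keys)."""
    keys = set()
    for rec in records:
        if isinstance(rec, dict):
            keys.update(rec.keys())
    return len(records), sorted(keys)


if __name__ == "__main__" and len(sys.argv) > 1:
    count, keys = summarize(load_jsonl(sys.argv[1]))
    print(f"{count} records; top-level keys: {keys}")
```

Saved under a name of your choice (for example, a hypothetical `check_answers.py`), it could be run as `python check_answers.py $ANSWER_PATH-rrsisd_val.jsonl`. If the record count does not match the number of lines in the question file, the inference run was probably interrupted.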