## Training and Evaluation
### Pre-trained Weights

#### LLaVA

For convenience, we provide a Hugging Face link to the pre-trained LLaVA weights.
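As a sketch, the weights can be fetched with the `huggingface-cli` tool; the repo ID below is a placeholder, not the actual location — substitute the repo from the link above:

```shell
# Placeholder repo ID -- replace with the Hugging Face repo from the link above.
huggingface-cli download <HF_REPO_ID> --local-dir ./LLaVA/LLaVA-Lightning-7B-v1-1
```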
#### SAM

Download the SAM ViT-H pre-trained weights from the link.
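For reference, a download sketch using the checkpoint URL from the official segment-anything release (assumed current; verify it against the link above before relying on it):

```shell
# SAM ViT-H checkpoint from the official segment-anything release
# (URL is an assumption; verify against the link above).
wget https://dl.fbaipublicfiles.com/segment_anything/sam_vit_h_4b8939.pth
```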
### Training

To train AffordanceVLM, use the following command:

```shell
bash ./scripts/train.sh
```
When training is finished, convert the DeepSpeed ZeRO checkpoint into the full model weight:

```shell
cd ./runs/AffordanceVLM-7B/ckpt_model && python zero_to_fp32.py . ../pytorch_model.bin
```
### Merge LoRA Weight

Merge the LoRA weights in `pytorch_model.bin` and save the resulting model to your desired path in Hugging Face format:

```shell
CUDA_VISIBLE_DEVICES="" python merge_lora_weights_and_save_hf_model.py \
  --version="PATH_TO_LLaVA" \
  --weight="PATH_TO_pytorch_model.bin" \
  --save_path="PATH_TO_SAVED_MODEL"
```
For example:

```shell
CUDA_VISIBLE_DEVICES="" python3 merge_lora_weights_and_save_hf_model.py \
  --version="./LLaVA/LLaVA-Lightning-7B-v1-1" \
  --weight="./runs/AffordanceVLM-7B/pytorch_model.bin" \
  --save_path="./exps/AffordanceVLM-7B"
```
### Evaluation

To evaluate AffordanceVLM on the entire HANDAL dataset, set the `--dataset_dir` parameter in `evaluate.sh` to your local HANDAL path, then run:

```shell
bash ./scripts/evaluate.sh
```
To chat with AffordanceVLM-7B:

```shell
CUDA_VISIBLE_DEVICES=0 python chat.py --version=./exps/AffordanceVLM-7B
```
### Main Results

Results on HANDAL:
| Method | gIoU | cIoU |
|---|---|---|
| AffordanceVLM-7B | 60.3 | 60.8 |