| ### ScienceQA |
|
|
| #### Prepare Data |
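| 1. See the ScienceQA [repo](https://github.com/lupantech/ScienceQA) for instructions on setting up the dataset. |
| 2. Convert the ScienceQA dataset to the LLaVA conversation-style format. The `--split` value below lists the available splits; pass one split name per run. |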
|
|
| ```Shell |
| python scripts/convert_sqa_to_llava.py \ |
| convert_to_llava \ |
| --base-dir /path/to/ScienceQA/data/scienceqa \ |
| --prompt-format "QCM-LEA" \ |
| --split {train,val,minival,test,minitest} |
| ``` |
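To convert every split in one go, a loop over the split names is one option. The sketch below is a dry run that only prints the commands, assuming `--split` accepts a single split name per invocation (which is how the `{...}` notation above reads). The `BASE_DIR` variable and the `print_convert_commands` helper are illustrative names, not part of the LLaVA scripts.

```shell
#!/usr/bin/env bash
# Dry-run sketch: print one conversion command per ScienceQA split.
# BASE_DIR defaults to the same placeholder path used in the command above.
BASE_DIR="${BASE_DIR:-/path/to/ScienceQA/data/scienceqa}"

# Hypothetical helper: emits the command line for each split, one per line.
print_convert_commands() {
  local split
  for split in train val minival test minitest; do
    printf 'python scripts/convert_sqa_to_llava.py convert_to_llava --base-dir %s --prompt-format "QCM-LEA" --split %s\n' \
      "$BASE_DIR" "$split"
  done
}

print_convert_commands
```

Once the printed commands look right, they can be executed from the LLaVA repository root with `print_convert_commands | bash`, provided `BASE_DIR` contains no spaces.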
|
|
| #### Training |
|
|
| 1. Pretraining |
|
|
| Download our pretrained projector weights from the [Model Zoo](), or train your own with [`pretrain.sh`](https://github.com/haotian-liu/LLaVA/blob/main/scripts/pretrain.sh). |
|
|
| 2. Finetuning |
|
|
| See [`finetune_sqa.sh`](https://github.com/haotian-liu/LLaVA/blob/main/scripts/finetune_sqa.sh). |
|
|
| #### Evaluation |
|
|
| 1. Multiple-GPU inference |
| You can run the evaluation on multiple GPUs and then concatenate the generated JSONL files. See our scripts for [batch evaluation](https://github.com/haotian-liu/LLaVA/blob/main/scripts/sqa_eval_batch.sh) and [results gathering](https://github.com/haotian-liu/LLaVA/blob/main/scripts/sqa_eval_gather.sh). |
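The concatenation step itself can be sketched as below. This is only an illustration: the `merge_jsonl` helper and the chunk file names are assumptions, and the linked scripts define the actual file layout. It also assumes each chunk file ends with a newline, so that records from different chunks do not run together.

```shell
#!/usr/bin/env bash
# Sketch: concatenate per-GPU JSONL chunk files into a single answers file.
# merge_jsonl and the chunk naming are illustrative assumptions,
# not names taken from the LLaVA scripts.
merge_jsonl() {
  local output="$1"
  shift
  : > "$output"              # create or truncate the merged file
  local chunk
  for chunk in "$@"; do
    cat "$chunk" >> "$output"  # append chunks in the order given
  done
}

# Example with hypothetical chunk names:
# merge_jsonl vqa/results/ScienceQA/test_llava-13b.jsonl \
#   vqa/results/ScienceQA/chunks/test_llava-13b-chunk*.jsonl
```

Keeping the chunks in a separate directory from the merged file avoids the glob accidentally matching the output file itself.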
|
|
| 2. Single-GPU inference |
|
|
| (a) Generate LLaVA responses on the ScienceQA dataset |
|
|
| ```Shell |
| python -m llava.eval.model_vqa_science \ |
| --model-path liuhaotian/llava-lcs558k-scienceqa-vicuna-13b-v1.3 \ |
| --question-file /path/to/ScienceQA/data/scienceqa/llava_test_QCM-LEA.json \ |
| --image-folder /path/to/ScienceQA/data/scienceqa/images/test \ |
| --answers-file vqa/results/ScienceQA/test_llava-13b.jsonl \ |
| --conv-mode llava_v1 |
| ``` |
|
|
| (b) Evaluate the generated responses |
|
|
| ```Shell |
| python eval_science_qa.py \ |
| --base-dir /path/to/ScienceQA/data/scienceqa \ |
| --result-file vqa/results/ScienceQA/test_llava-13b.jsonl \ |
| --output-file vqa/results/ScienceQA/test_llava-13b_output.json \ |
| --output-result vqa/results/ScienceQA/test_llava-13b_result.json |
| ``` |
|
|
| For reference, we provide our prediction files [`test_sqa_llava_lcs_558k_sqa_12e_vicuna_v1_3_13b.json`](https://github.com/haotian-liu/LLaVA/blob/main/llava/eval/table/results/test_sqa_llava_lcs_558k_sqa_12e_vicuna_v1_3_13b.json) and [`test_sqa_llava_13b_v0.json`](https://github.com/haotian-liu/LLaVA/blob/main/llava/eval/table/results/test_sqa_llava_13b_v0.json). You can compare against them when reproducing our results or use them for more detailed analysis. |
|
|