# LLaVA Scripts

This directory contains scripts for working with the LLaVA model.

## Available Scripts

- `demo.py`: Launches a Gradio web interface for interacting with the LLaVA model.
- `evaluate_vqa.py`: Evaluates the LLaVA model on visual question answering (VQA) datasets.
- `test_model.py`: A simple script for testing the LLaVA model on a single image.
## Usage Examples

### Demo

Launch the Gradio web interface:

```bash
python scripts/demo.py --vision-model openai/clip-vit-large-patch14-336 --language-model lmsys/vicuna-7b-v1.5 --load-8bit
```
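
For reference, the sketch below shows the rough shape of a Gradio image-plus-prompt interface like the one `demo.py` serves. It is a minimal sketch, not the script's actual code: the real app wires the LLaVA model into the handler and likely exposes more controls (temperature, maximum new tokens, etc.).

```python
# Minimal sketch of a Gradio interface like demo.py's; hypothetical,
# not the script's actual code.
import gradio as gr

def answer(image, prompt):
    # demo.py would run LLaVA inference on (image, prompt) here.
    return "model response"

demo = gr.Interface(
    fn=answer,
    inputs=[gr.Image(type="pil"), gr.Textbox(label="Prompt")],
    outputs=gr.Textbox(label="Response"),
)
demo.launch()
```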
### Evaluate VQA

Evaluate the model on a VQA dataset:

```bash
python scripts/evaluate_vqa.py --vision-model openai/clip-vit-large-patch14-336 --language-model lmsys/vicuna-7b-v1.5 --questions-file path/to/questions.json --image-folder path/to/images --output-file results.json --load-8bit
```
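
The exact layout of the questions file is defined by `evaluate_vqa.py`; the example below is hypothetical, using the field names common in LLaVA-style evaluation (`question_id`, `image`, `text`). Check the script for the schema it actually expects.

```python
# Hypothetical questions file writer. The field names follow the usual
# LLaVA convention and may not match evaluate_vqa.py exactly.
import json

questions = [
    {"question_id": 0, "image": "0001.jpg", "text": "What color is the car?"},
    {"question_id": 1, "image": "0002.jpg", "text": "How many people are visible?"},
]
with open("questions.json", "w") as f:
    json.dump(questions, f, indent=2)
```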
### Test Model

Test the model on a single image:

```bash
python scripts/test_model.py --vision-model openai/clip-vit-large-patch14-336 --language-model lmsys/vicuna-7b-v1.5 --image-url https://example.com/image.jpg --prompt "What's in this image?" --load-8bit
```
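
Internally, a script like this has to download the image and preprocess it for the CLIP vision tower. A minimal sketch of those two steps using `requests`, Pillow, and `transformers` (the actual code in `test_model.py` may differ):

```python
# Sketch: fetch an image by URL and preprocess it for the CLIP vision
# tower. Assumes requests, Pillow, and transformers are installed.
from io import BytesIO

import requests
from PIL import Image
from transformers import CLIPImageProcessor

resp = requests.get("https://example.com/image.jpg", timeout=30)
resp.raise_for_status()
image = Image.open(BytesIO(resp.content)).convert("RGB")

processor = CLIPImageProcessor.from_pretrained("openai/clip-vit-large-patch14-336")
pixel_values = processor(images=image, return_tensors="pt").pixel_values  # shape (1, 3, 336, 336)
```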
## Options

Most scripts support the following options:

- `--vision-model`: Path or name of the vision model (default: `openai/clip-vit-large-patch14-336`)
- `--language-model`: Path or name of the language model (default: `lmsys/vicuna-7b-v1.5`)
- `--load-8bit`: Load the language model in 8-bit precision (reduces memory usage)
- `--load-4bit`: Load the language model in 4-bit precision (reduces memory usage further; see the sketch after this list)
- `--device`: Device to run the model on (default: `cuda` if available, otherwise `cpu`)
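
The quantization flags typically map onto bitsandbytes quantization in Hugging Face `transformers`. A minimal sketch of what `--load-8bit` / `--load-4bit` usually mean, assuming the scripts load the language model roughly like this:

```python
# Sketch of 8-bit / 4-bit loading via transformers + bitsandbytes.
# The scripts' actual loading code may differ.
import torch
from transformers import AutoModelForCausalLM, BitsAndBytesConfig

quant = BitsAndBytesConfig(load_in_8bit=True)  # or BitsAndBytesConfig(load_in_4bit=True)
model = AutoModelForCausalLM.from_pretrained(
    "lmsys/vicuna-7b-v1.5",
    quantization_config=quant,
    torch_dtype=torch.float16,
    device_map="auto",  # lets accelerate place layers on the available device
)
```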
See the individual script help messages for more specific options:

```bash
python scripts/script_name.py --help
```