---
quantized_by: cjpais
base_model: vikhyatk/moondream2
pipeline_tag: image-text-to-text
license: apache-2.0
tags:
- llamafile
---
A [llamafile](https://github.com/Mozilla-Ocho/llamafile) generated for [moondream2](https://huggingface.co/vikhyatk/moondream2).

Big thanks to [@jartine](https://huggingface.co/jartine) and [@vikhyat](https://huggingface.co/vikhyatk/moondream2) for their respective work on llamafile and moondream.

## How to Run (on macOS and Linux)
1. Download moondream2.llamafile
2. `chmod +x moondream2.llamafile` - make it executable
3. `./moondream2.llamafile` - run the llama.cpp server (see the sample request below)
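
Once the server is up, you can query it over HTTP (llamafile should also open a simple web UI in your browser). Below is a minimal sketch of a vision request, assuming the default listen address `http://localhost:8080` and llama.cpp's `/completion` endpoint with its `image_data` field; the file name `image.jpg` and the `Question:`/`Answer:` prompt wording are placeholders, not a documented template.

```bash
# Sketch: ask the running server about an image via the /completion endpoint.
# "id": 10 ties the uploaded image to the [img-10] tag in the prompt.
curl http://localhost:8080/completion \
  -H "Content-Type: application/json" \
  -d '{
        "prompt": "[img-10]\n\nQuestion: Describe this image.\n\nAnswer:",
        "image_data": [{"data": "'"$(base64 < image.jpg | tr -d '\n')"'", "id": 10}],
        "n_predict": 128
      }'
```

(`base64 ... | tr -d '\n'` is used instead of GNU-only `base64 -w0` so the same command works on both macOS and Linux.)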
## Versions

1. [Q5_K](https://huggingface.co/cjpais/moondream2-llamafile/resolve/main/moondream2-q5_k.llamafile?download=true)
2. [Q8_0](https://huggingface.co/cjpais/moondream2-llamafile/resolve/main/moondream2-q8.llamafile?download=true)

From my brief testing, the Q8_0 version is noticeably better.
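
For example, to grab and launch the Q8_0 build from the command line (a sketch based on the download link above; `-L` is needed because Hugging Face serves the file through a redirect):

```bash
# Sketch: download the Q8_0 llamafile, make it executable, and start the server.
curl -L -o moondream2-q8.llamafile \
  'https://huggingface.co/cjpais/moondream2-llamafile/resolve/main/moondream2-q8.llamafile?download=true'
chmod +x moondream2-q8.llamafile
./moondream2-q8.llamafile
```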
# ORIGINAL MODEL CARD

moondream2 is a small vision language model designed to run efficiently on edge devices. Check out the [GitHub repository](https://github.com/vikhyat/moondream) for details, or try it out on the [Hugging Face Space](https://huggingface.co/spaces/vikhyatk/moondream2)!

**Benchmarks**

| Release | VQAv2 | GQA | TextVQA | TallyQA (simple) | TallyQA (full) |
| --- | --- | --- | --- | --- | --- |
| 2024-03-04 | 74.2 | 58.5 | 36.4 | - | - |
| 2024-03-06 | 75.4 | 59.8 | 43.1 | 79.5 | 73.2 |
| 2024-03-13 | 76.8 | 60.6 | 46.4 | 79.6 | 73.3 |
| **2024-04-02** (latest) | 77.7 | 61.7 | 49.7 | 80.1 | 74.2 |

**Usage**

```bash
pip install transformers einops
```
```python
from transformers import AutoModelForCausalLM, AutoTokenizer
from PIL import Image

model_id = "vikhyatk/moondream2"
revision = "2024-04-02"

# Pin to a dated release so the downloaded remote code and weights stay in sync.
model = AutoModelForCausalLM.from_pretrained(
    model_id, trust_remote_code=True, revision=revision
)
tokenizer = AutoTokenizer.from_pretrained(model_id, revision=revision)

# Encode the image once, then ask questions about it.
image = Image.open('<IMAGE_PATH>')
enc_image = model.encode_image(image)
print(model.answer_question(enc_image, "Describe this image.", tokenizer))
```
The model is updated regularly, so we recommend pinning the model version to a specific release as shown above.
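
If you want to see which pinned releases exist before choosing a `revision`, one option is to list the repo's branches; dated revisions such as `2024-04-02` appear as branches of the model repo. A sketch using plain git, which works because Hugging Face model repos are git repositories:

```bash
# Sketch: list the model repo's branches to see the available dated releases.
git ls-remote --heads https://huggingface.co/vikhyatk/moondream2
```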