fal/moondream2-docci-instruct

	---
	license: apache-2.0
	datasets:
	- google/docci
	- gokaygokay/random_instruct_docci
	language:
	- en
	pipeline_tag: image-text-to-text
	---

	A fine-tuned version of the [moondream2](https://huggingface.co/vikhyatk/moondream2) model, trained on the [gokaygokay/random_instruct_docci](https://huggingface.co/datasets/gokaygokay/random_instruct_docci) dataset. It produces extremely detailed captions of images.

	```bash
	pip install transformers timm einops bitsandbytes accelerate flash-attn
	```

	```python
	import torch
	from transformers import AutoTokenizer, AutoModelForCausalLM
	from PIL import Image

	DEVICE = "cuda"  # flash_attention_2 (used below) needs a GPU and half precision
	# float16 is poorly supported on CPU, so fall back to float32 there
	DTYPE = torch.float32 if DEVICE == "cpu" else torch.float16

	# Pin a specific commit so the remote code and weights stay reproducible
	revision = "3ec40c7b6b5d87bc0c51edee45e21f5f29b449d8"
	tokenizer = AutoTokenizer.from_pretrained(
	    "fal-ai/moondream2-docci-instruct",
	    trust_remote_code=True,
	    revision=revision,
	)
	moondream = AutoModelForCausalLM.from_pretrained(
	    "fal-ai/moondream2-docci-instruct",
	    trust_remote_code=True,
	    torch_dtype=DTYPE,
	    device_map={"": DEVICE},
	    attn_implementation="flash_attention_2",
	    revision=revision,
	)
	moondream.eval()

	image_path = "<your_image_path>"
	image = Image.open(image_path).convert("RGB")
	# Encode the image once, then ask a question about it
	md_answer = moondream.answer_question(
	    moondream.encode_image(image),
	    "what is this picture about",
	    tokenizer=tokenizer,
	)

	print(md_answer)
	```
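
To caption more than one image, the same two calls can be wrapped in a loop. The following is a minimal sketch, not part of the original model card: the helper name `caption_folder`, the `IMAGE_SUFFIXES` set, and the `open_image` parameter are hypothetical, and it assumes `model` and `tokenizer` are the `moondream` and `tokenizer` objects loaded above.

```python
from pathlib import Path

# Hypothetical filter: file extensions treated as images (an assumption, adjust as needed)
IMAGE_SUFFIXES = {".jpg", ".jpeg", ".png", ".webp"}


def _open_rgb(path):
    """Default loader: open with Pillow and convert to RGB, as in the snippet above."""
    from PIL import Image  # imported lazily so Pillow is only needed when this loader is used

    return Image.open(path).convert("RGB")


def caption_folder(model, tokenizer, folder,
                   question="what is this picture about",
                   open_image=_open_rgb):
    """Ask `question` about every image file in `folder`; returns {filename: answer}."""
    captions = {}
    for path in sorted(Path(folder).iterdir()):
        if path.suffix.lower() not in IMAGE_SUFFIXES:
            continue  # skip non-image files
        image = open_image(path)
        # Same calls as the single-image example: encode the image, then ask the question
        captions[path.name] = model.answer_question(
            model.encode_image(image),
            question,
            tokenizer=tokenizer,
        )
    return captions
```

With the model loaded as above, `caption_folder(moondream, tokenizer, "<your_image_folder>")` would return a dictionary mapping each file name to its caption. Images are processed one at a time, which keeps memory use predictable at the cost of throughput.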