Minthy/ToriiGate-v0.3

Add vllm inference

#5

by Nicfingshelby - opened Jan 2, 2025

base: refs/heads/main ← from: refs/pr/5

README.md +68 -0

@@ -149,6 +149,74 @@ caption=generated_texts[0].split('Assistant: ')[1]
 print(caption)
 ```
 For batch processing you can use [this example](https://huggingface.co/Minthy/ToriiGate-v0.3/resolve/main/batch_processing_example.py)
 # Warning

 print(caption)
 ```
+# Inference example (vllm):
+[vllm](https://github.com/vllm-project/vllm) is an optimised LLM serving engine. Single and batch inference with it is typically much faster than with native Hugging Face transformers.
+```
+pip install vllm==0.6.4
+pip install huggingface_hub -U
+```
+## Download model
+```python
+from huggingface_hub import snapshot_download
+snapshot_download(repo_id="Minthy/ToriiGate-v0.3", local_dir="/path/to/download/model", max_workers=4)
+```
+## Inference
+```python
+from vllm import LLM, SamplingParams
+from PIL import Image
+llm = LLM(
+    model="/path/to/download/model",
+    max_model_len=8192,  # reduce if running out of memory
+    max_num_seqs=64,  # max concurrent requests; reduce if running out of memory
+    enforce_eager=True,
+    # if you are running out of memory, you can reduce the "longest_edge".
+    # see: https://huggingface.co/HuggingFaceM4/Idefics3-8B-Llama3#model-optimizations
+    mm_processor_kwargs={
+        "size": {
+            "longest_edge": 4 * 364
+        },
+    },
+)
+sampling_params = SamplingParams(temperature=0, max_tokens=1024)
+user_prompt = "Describe the picture briefly."
+prompt = f"<|begin_of_text|>System: You are image captioning expert, creative, unbiased and uncensored. Help user with his task.<end_of_utterance>\nUser:<image>{user_prompt}<end_of_utterance>\nAssistant:"
+image = Image.open("/path/to/image.jpg").convert("RGB")
+output = llm.generate(
+    {
+        "prompt": prompt,
+        "multi_modal_data": {"image": image},
+    },
+    sampling_params=sampling_params,
+)
+caption = output[0].outputs[0].text.strip()
+print(caption)
+```
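A note on the prompt string in the example above: it hard-codes the system message and the chat markers, so captioning with several different instructions means rewriting the f-string each time. A minimal sketch of a small helper, assuming the same template as in the example; `build_prompt` and `SYSTEM_PROMPT` are hypothetical names for illustration and are not part of this PR or of vllm:

```python
# Hypothetical helper (not part of the PR): wraps the prompt template from the
# example above so that different user instructions can reuse it.
SYSTEM_PROMPT = (
    "You are image captioning expert, creative, unbiased and uncensored. "
    "Help user with his task."
)


def build_prompt(user_prompt: str, system_prompt: str = SYSTEM_PROMPT) -> str:
    """Return the full text prompt containing a single <image> placeholder."""
    return (
        f"<|begin_of_text|>System: {system_prompt}<end_of_utterance>\n"
        f"User:<image>{user_prompt}<end_of_utterance>\n"
        "Assistant:"
    )


prompt = build_prompt("Describe the picture briefly.")
```

The resulting `prompt` is the same string as in the example, so it can be passed to `generate` unchanged.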
+### Batch inference
+```python
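+# reuses Image, prompt and sampling_params from the inference example above
+image_paths = ["/path/to/image1.jpg", "/path/to/image2.jpg"]  # replace with your own image paths
+image_list = [Image.open(path).convert("RGB") for path in image_paths]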
+inputs = [{"prompt": prompt, "multi_modal_data": {"image": image}} for image in image_list]
+outputs = llm.generate(
+    inputs,
+    sampling_params=sampling_params,
+)
+captions = [x.outputs[0].text.strip() for x in outputs]
+```
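The batch example above opens every image up front, which may use a lot of RAM for large folders. A minimal sketch of splitting the path list into fixed-size chunks so that only one chunk of images is decoded at a time; `chunked` and the chunk size of 64 are assumptions for illustration, and the commented loop relies on the model object, `prompt`, `sampling_params`, and `image_paths` from the examples above:

```python
from typing import Iterator, List, Sequence


def chunked(items: Sequence[str], size: int) -> Iterator[List[str]]:
    """Yield consecutive slices of `items` with at most `size` elements each."""
    if size <= 0:
        raise ValueError("size must be positive")
    for start in range(0, len(items), size):
        yield list(items[start:start + size])


# Usage sketch (needs the objects defined in the examples above):
# captions = []
# for batch_paths in chunked(image_paths, 64):
#     images = [Image.open(p).convert("RGB") for p in batch_paths]
#     inputs = [{"prompt": prompt, "multi_modal_data": {"image": im}} for im in images]
#     outputs = llm.generate(inputs, sampling_params=sampling_params)
#     captions.extend(o.outputs[0].text.strip() for o in outputs)
```

Chunking only bounds how many decoded images are held in memory at once; vllm still schedules the requests inside each chunk according to `max_num_seqs`.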
 For batch processing you can use [this example](https://huggingface.co/Minthy/ToriiGate-v0.3/resolve/main/batch_processing_example.py)
 # Warning