Commit 2c031da · Parent(s): 87d5d88 · Message: tgi

README.md (changed): `@@ -218,6 +218,39 @@ print(generated_texts)`
</details>

**Text generation inference**

Idefics2 is integrated into [TGI](https://github.com/huggingface/text-generation-inference), and we host API endpoints for both `idefics2-8b` and `idefics2-8b-chatty`.

Multiple images can be passed using the markdown image syntax (`![](IMAGE_URL)`), with no spaces required before or after each image. Separate dialogue utterances with `<end_of_utterance>\n`, followed by `User:` or `Assistant:`. Put a space after `User:` when it is followed by text, and no space when it is followed directly by an image.
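These formatting rules can be captured in a small helper. The sketch below is an illustration, not part of any library: `build_prompt` and the `("image", url)` tuple used to mark an image are hypothetical, and applying the same spacing rule to `Assistant:` turns is an assumption (it matches the `Assistant: Hello` turn in the playground system prompt).

```python
from typing import List, Tuple, Union

# A message part is either plain text or an image written as ("image", url).
# This representation is a hypothetical choice made for illustration.
Part = Union[str, Tuple[str, str]]


def render_parts(parts: List[Part]) -> str:
    """Join text and markdown images, adding no spaces around images."""
    out = []
    for part in parts:
        if isinstance(part, tuple):
            kind, url = part
            if kind != "image":
                raise ValueError(f"unknown part kind: {kind!r}")
            out.append(f"![]({url})")
        else:
            out.append(part)
    return "".join(out)


def build_prompt(turns: List[Tuple[str, List[Part]]]) -> str:
    """Build a dialogue prompt that ends with an open Assistant turn."""
    pieces = []
    for role, parts in turns:
        # Space after the role tag only when the turn starts with text;
        # no space when it starts directly with an image.
        sep = "" if parts and isinstance(parts[0], tuple) else " "
        pieces.append(f"{role}:{sep}{render_parts(parts)}<end_of_utterance>\n")
    pieces.append("Assistant:")
    return "".join(pieces)


print(build_prompt([("User", [("image", "https://example.com/cat.png"), "Describe this image."])]))
```

The returned string ends with an open `Assistant:` turn, so it can be appended directly to a system prompt before being sent to the endpoint.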
<details><summary>Click to expand.</summary>

```python
from text_generation import Client

# Your Hugging Face API token and the hosted endpoint to query.
API_TOKEN = "<YOUR_API_TOKEN>"
API_URL = "https://api-inference.huggingface.co/models/HuggingFaceM4/idefics2-8b-chatty"
# Replace with the URL of the image to describe.
IMAGE_URL = "<IMAGE_URL>"

# System prompt used in the playground for `idefics2-8b-chatty`
SYSTEM_PROMPT = (
    "System: The following is a conversation between Idefics2, a highly knowledgeable and intelligent "
    "visual AI assistant created by Hugging Face, referred to as Assistant, and a human user called User. "
    "In the following interactions, User and Assistant will converse in natural language, and Assistant "
    "will do its best to answer User’s questions. Assistant has the ability to perceive images and reason "
    "about them, but it cannot generate images. Assistant was built to be respectful, polite and inclusive. "
    "It knows a lot, and always tells the truth. When prompted with an image, it does not make up facts."
    "<end_of_utterance>\n"
    "Assistant: Hello, I'm Idefics2, Huggingface's latest multimodal assistant. How can I help you?"
    "<end_of_utterance>\n"
)
# No space after "User:" because the turn starts with an image.
QUERY = f"User:![]({IMAGE_URL})Describe this image.<end_of_utterance>\nAssistant:"

client = Client(
    base_url=API_URL,
    # Bypass the API's response cache and authenticate with the token.
    headers={"x-use-cache": "0", "Authorization": f"Bearer {API_TOKEN}"},
)
generation_args = {
    "max_new_tokens": 512,
    "repetition_penalty": 1.1,
    "do_sample": False,  # greedy decoding
}
# `generate` returns a Response object; the text is in `.generated_text`.
response = client.generate(prompt=SYSTEM_PROMPT + QUERY, **generation_args)
print(response.generated_text)
```

</details>
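The `text_generation` client is a thin wrapper over HTTP. A minimal standard-library sketch of the same request follows, assuming the endpoint accepts TGI's JSON body (`inputs` plus a `parameters` object) and answers with either `{"generated_text": ...}` or a one-element list of such objects. The helper names `build_payload`, `parse_generated_text`, and `generate` are hypothetical.

```python
import json
import urllib.request

API_URL = "https://api-inference.huggingface.co/models/HuggingFaceM4/idefics2-8b-chatty"


def build_payload(prompt: str, max_new_tokens: int = 512,
                  repetition_penalty: float = 1.1, do_sample: bool = False) -> dict:
    """TGI-style request body: the prompt under "inputs", settings under "parameters"."""
    return {
        "inputs": prompt,
        "parameters": {
            "max_new_tokens": max_new_tokens,
            "repetition_penalty": repetition_penalty,
            "do_sample": do_sample,
        },
    }


def parse_generated_text(body) -> str:
    """Accept either a single response object or a one-element list of them."""
    if isinstance(body, list):
        body = body[0]
    return body["generated_text"]


def generate(prompt: str, api_token: str, url: str = API_URL) -> str:
    """POST the prompt (non-streaming) and return the generated text."""
    request = urllib.request.Request(
        url,
        data=json.dumps(build_payload(prompt)).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {api_token}",
            "x-use-cache": "0",  # same cache-bypass header as the client example
        },
        method="POST",
    )
    with urllib.request.urlopen(request, timeout=120) as resp:
        return parse_generated_text(json.loads(resp.read().decode("utf-8")))
```

Only `generate` touches the network; building and parsing the JSON are kept in separate functions so they can be checked offline.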
# Model optimizations
If your GPU allows, we first recommend loading (and running inference) in half precision (`torch.float16` or `torch.bfloat16`).
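The half-precision advice can be sketched as follows. This assumes a `transformers` release that provides `AutoModelForImageTextToText`; the helpers `pick_half_precision_dtype` and `load_idefics2` are hypothetical names, and falling back to `torch.float32` when no GPU is present is an assumption of this sketch, not something the README prescribes.

```python
import torch


def pick_half_precision_dtype() -> torch.dtype:
    """Half precision on a GPU (bfloat16 where supported, else float16).

    Assumption: without a CUDA GPU, fall back to full precision (float32).
    """
    if torch.cuda.is_available():
        return torch.bfloat16 if torch.cuda.is_bf16_supported() else torch.float16
    return torch.float32


def load_idefics2(model_id: str = "HuggingFaceM4/idefics2-8b"):
    """Load the processor and the model in the dtype chosen above."""
    # Imported here so the dtype helper can be used without transformers.
    from transformers import AutoModelForImageTextToText, AutoProcessor

    dtype = pick_half_precision_dtype()
    device = "cuda" if torch.cuda.is_available() else "cpu"
    processor = AutoProcessor.from_pretrained(model_id)
    model = AutoModelForImageTextToText.from_pretrained(model_id, torch_dtype=dtype).to(device)
    return processor, model
```

Calling `load_idefics2()` downloads the full `HuggingFaceM4/idefics2-8b` checkpoint, so run it only on a machine with enough memory for the chosen dtype.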