Instructions to use HuggingFaceM4/idefics-80b with libraries, inference providers, notebooks, and local apps. Follow these links to get started.

Libraries

How to use HuggingFaceM4/idefics-80b with Transformers:

# Use a pipeline as a high-level helper
from transformers import pipeline

pipe = pipeline("text-generation", model="HuggingFaceM4/idefics-80b")

# Load model directly
from transformers import AutoProcessor, AutoModelForImageTextToText

processor = AutoProcessor.from_pretrained("HuggingFaceM4/idefics-80b")
model = AutoModelForImageTextToText.from_pretrained("HuggingFaceM4/idefics-80b")

Notebooks
Google Colab
Kaggle
Local Apps

vLLM

How to use HuggingFaceM4/idefics-80b with vLLM:

Install from pip and serve model

# Install vLLM from pip:
pip install vllm
# Start the vLLM server:
vllm serve "HuggingFaceM4/idefics-80b"
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:8000/v1/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "HuggingFaceM4/idefics-80b",
		"prompt": "Once upon a time,",
		"max_tokens": 512,
		"temperature": 0.5
	}'

Use Docker

docker model run hf.co/HuggingFaceM4/idefics-80b

SGLang

How to use HuggingFaceM4/idefics-80b with SGLang:

Install from pip and serve model

# Install SGLang from pip:
pip install sglang
# Start the SGLang server:
python3 -m sglang.launch_server \
    --model-path "HuggingFaceM4/idefics-80b" \
    --host 0.0.0.0 \
    --port 30000
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "HuggingFaceM4/idefics-80b",
		"prompt": "Once upon a time,",
		"max_tokens": 512,
		"temperature": 0.5
	}'

Use Docker images

docker run --gpus all \
    --shm-size 32g \
    -p 30000:30000 \
    -v ~/.cache/huggingface:/root/.cache/huggingface \
    --env "HF_TOKEN=<secret>" \
    --ipc=host \
    lmsysorg/sglang:latest \
    python3 -m sglang.launch_server \
        --model-path "HuggingFaceM4/idefics-80b" \
        --host 0.0.0.0 \
        --port 30000
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "HuggingFaceM4/idefics-80b",
		"prompt": "Once upon a time,",
		"max_tokens": 512,
		"temperature": 0.5
	}'

Docker Model Runner
How to use HuggingFaceM4/idefics-80b with Docker Model Runner:
```
docker model run hf.co/HuggingFaceM4/idefics-80b
```

HugoLaurencon commited on Jul 11, 2023

Commit

ade3656

1 Parent(s): de1ddb0

Update README.md

Browse files

Files changed (1) hide show

README.md +7 -8

README.md CHANGED Viewed

@@ -115,19 +115,18 @@ The model is trained on the following data mixture of openly accessible English
 | Data Source | Type of Data                             | Number of Tokens in Source | Number of Images in Source | Epochs | Effective Proportion in Number of Tokens |
 |-------------|-----------------------------------------|---------------------------|---------------------------|--------|-----------------------------------------|
-| [PMD](https://huggingface.co/datasets/facebook/pmd)         | Image-Text Pairs                         | TODO                      | TODO                      | 3      | 73.85%                                  |
-| [LAION](https://huggingface.co/datasets/laion/laion2B-en)       | Image-Text Pairs                         | TODO                      | TODO                      | 1      | 6.15%                                   |
-| [OBELISC](https://huggingface.co/datasets/HuggingFaceM4/OBELISC)     | Unstructured Multimodal Web Documents    | TODO                      | TODO                      | 3      | 2.82%                                   |
-| [Wikipedia](https://huggingface.co/datasets/wikipedia)   | Unstructured Multimodal Web Documents    | TODO                      | TODO                      | 1      | 17.18%                                  |
-**PMD** is a collection of publicly-available image-text pair datasets. The dataset contains pairs from Conceptual Captions, Conceptual Captions 12M, WIT, Localized Narratives, RedCaps, COCO, SBU Captions, Visual Genome and a subset of YFCC100M dataset. Due to a server failure at the time of the pre-processing, we did not include SBU captions.
-**LAION** is a collection of image-text pairs collected from web pages from Common Crawl and texts are obtained using the alternative texts of each image.
 **Wkipedia** is the multimodal equivalent of the encyclopedia. We used the English dump of Wikipedia created on February 20th, 2023.
-**OBELISC** is an open, massive and curated collection of interleaved image-text web documents, containing 141M documents, 115B text tokens and 353M images. An interactive visualization of the dataset content is available [here](TODO).
 For multimodal web documents, we feed the model sequences corresponding to the succession of text paragraphs and images. For image-text pairs, we form the training sequences by packing images with their captions. The images are encoded with the vision encoder and vision hidden states are pooled with Transformer Perceiver blocks and then fused into the text sequence through the cross-attention blocks.

 | Data Source | Type of Data                             | Number of Tokens in Source | Number of Images in Source | Epochs | Effective Proportion in Number of Tokens |
 |-------------|-----------------------------------------|---------------------------|---------------------------|--------|-----------------------------------------|
+| [OBELISC](https://huggingface.co/datasets/HuggingFaceM4/OBELISC)     | Unstructured Multimodal Web Documents    | TODO                      | TODO                      | 1      | 73.85%                                  |
+| [Wikipedia](https://huggingface.co/datasets/wikipedia)   | Unstructured Multimodal Web Documents    | TODO                      | TODO                      | 3      | 17.18%                                  |
+| [LAION](https://huggingface.co/datasets/laion/laion2B-en)       | Image-Text Pairs                         | TODO                      | TODO                      | 1      | 6.15%
+| [PMD](https://huggingface.co/datasets/facebook/pmd)         | Image-Text Pairs                         | TODO                      | TODO                      | 3      | 2.82%                                   |                                |
+**OBELISC** is an open, massive and curated collection of interleaved image-text web documents, containing 141M documents, 115B text tokens and 353M images. An interactive visualization of the dataset content is available [here](TODO).
 **Wkipedia** is the multimodal equivalent of the encyclopedia. We used the English dump of Wikipedia created on February 20th, 2023.
+**LAION** is a collection of image-text pairs collected from web pages from Common Crawl and texts are obtained using the alternative texts of each image.
+**PMD** is a collection of publicly-available image-text pair datasets. The dataset contains pairs from Conceptual Captions, Conceptual Captions 12M, WIT, Localized Narratives, RedCaps, COCO, SBU Captions, Visual Genome and a subset of YFCC100M dataset. Due to a server failure at the time of the pre-processing, we did not include SBU captions.
 For multimodal web documents, we feed the model sequences corresponding to the succession of text paragraphs and images. For image-text pairs, we form the training sequences by packing images with their captions. The images are encoded with the vision encoder and vision hidden states are pooled with Transformer Perceiver blocks and then fused into the text sequence through the cross-attention blocks.