wip
README.md CHANGED
@@ -15,7 +15,6 @@ Some cool model...
 
 - [Model Card for m4-80b](#model-card-for--model_id-)
 - [Table of Contents](#table-of-contents)
-- [Table of Contents](#table-of-contents-1)
 - [Model Details](#model-details)
 - [Model Description](#model-description)
 - [Uses](#uses)

@@ -57,15 +56,14 @@ Some cool model...
 <!-- Provide a longer summary of what this model is/does. -->
 Some cool model...
 
-- **Developed by:**
-- **
-- **Model type:** Language model
+- **Developed by:** HuggingFace
+- **Model type:** Multi-modal model (text+image)
 - **Language(s) (NLP):** en
 - **License:** apache-2.0
-- **Parent Model:**
+- **Parent Model:** [laion/CLIP-ViT-H-14-laion2B-s32B-b79K](https://huggingface.co/laion/CLIP-ViT-H-14-laion2B-s32B-b79K) and [huggingface/llama-65b](https://huggingface.co/huggingface/llama-65b)
 - **Resources for more information:** More information needed
 - [GitHub Repo](https://github.com/huggingface/m4/)
-
+- Associated Paper: [Flamingo: a Visual Language Model for Few-Shot Learning](https://arxiv.org/abs/2204.14198)
 
 # Uses
 

@@ -172,10 +170,9 @@ More information needed
 
 Carbon emissions can be estimated using the [Machine Learning Impact calculator](https://mlco2.github.io/impact#compute) presented in [Lacoste et al. (2019)](https://arxiv.org/abs/1910.09700).
 
-- **Hardware Type:**
-- **Hours used:**
-- **Cloud Provider:**
-- **Compute Region:** More information needed
+- **Hardware Type:** 64 nodes of 8x 80GB A100 GPUs, EFA network
+- **Hours used:** ~672 node hours
+- **Cloud Provider:** AWS SageMaker
 - **Carbon Emitted:** unknown
 
 # Technical Specifications [optional]

@@ -190,11 +187,15 @@ More information needed
 
 ### Hardware
 
-
+The training was performed on an AWS SageMaker cluster with 64 nodes of 8x 80GB A100 GPUs (512 GPUs total). The cluster uses the current EFA network, which provides about 340 GBps throughput.
+
+As the network is quite slow for the needs of DeepSpeed ZeRO-3, we were only able to clock ~90 TFLOPs.
+
 
 ### Software
 
-
+The training software is built on top of HuggingFace Transformers + Accelerate, and DeepSpeed ZeRO-3, plus [WebDataset](https://github.com/webdataset/webdataset) for data loading.
+
 
 # Citation
 
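The carbon fields filled in by this change ("~672 node hours" on 8-GPU A100 nodes, carbon emitted still unknown) lend themselves to a back-of-the-envelope estimate. A minimal sketch, assuming an A100 board power of ~400 W, a PUE of ~1.2, and a placeholder grid intensity of 0.3 kgCO2eq/kWh — all three are illustrative assumptions, not numbers from the card or the calculator:

```python
# Rough CO2 estimate from the card's "~672 node hours" on 8x A100 nodes.
# Assumptions (NOT from the model card): 400 W per A100, PUE 1.2,
# grid carbon intensity 0.3 kgCO2eq/kWh.
NODE_HOURS = 672
GPUS_PER_NODE = 8
GPU_POWER_KW = 0.4      # assumed A100 board power
PUE = 1.2               # assumed datacenter overhead factor
GRID_KG_PER_KWH = 0.3   # assumed regional carbon intensity

gpu_hours = NODE_HOURS * GPUS_PER_NODE        # 5376 GPU-hours
energy_kwh = gpu_hours * GPU_POWER_KW * PUE   # ~2580 kWh
co2_kg = energy_kwh * GRID_KG_PER_KWH         # ~774 kgCO2eq

print(f"{gpu_hours} GPU-hours, ~{energy_kwh:.0f} kWh, ~{co2_kg:.0f} kgCO2eq")
```

The ML Impact calculator linked in the card does the same multiplication with region-specific intensity data, so the card's "unknown" could be replaced once the compute region is confirmed.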
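The new Hardware section's "~90 TFLOPs" can be put in context with a quick utilization check. Reading it as per-GPU throughput is an assumption (the card does not say); the 312 TFLOPS figure is the published A100 BF16 dense peak:

```python
# Utilization implied by the card's "~90 TFLOPs", read as per-GPU
# throughput (an assumption), vs. the A100's 312 TFLOPS BF16 dense peak.
ACHIEVED_TFLOPS = 90
A100_BF16_PEAK_TFLOPS = 312
NODES, GPUS_PER_NODE = 64, 8

utilization = ACHIEVED_TFLOPS / A100_BF16_PEAK_TFLOPS   # ~0.29
cluster_tflops = ACHIEVED_TFLOPS * NODES * GPUS_PER_NODE  # 512 GPUs

print(f"~{utilization:.0%} of peak per GPU, "
      f"~{cluster_tflops / 1000:.1f} PFLOPS cluster-wide")
```

Roughly 29% of peak is consistent with the card's point that the EFA network, not compute, is the bottleneck for DeepSpeed ZeRO-3 at this scale.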