wip
README.md CHANGED
@@ -15,7 +15,6 @@ Some cool model...
 
 - [Model Card for m4-80b](#model-card-for--model_id-)
 - [Table of Contents](#table-of-contents)
-- [Table of Contents](#table-of-contents-1)
 - [Model Details](#model-details)
 - [Model Description](#model-description)
 - [Uses](#uses)

@@ -57,15 +56,14 @@ Some cool model...
 <!-- Provide a longer summary of what this model is/does. -->
 Some cool model...
 
-- **Developed by:**
-- **
-- **Model type:** Language model
+- **Developed by:** HuggingFace
+- **Model type:** Multi-modal model (text+image)
 - **Language(s) (NLP):** en
 - **License:** apache-2.0
-- **Parent Model:**
+- **Parent Model:** [laion/CLIP-ViT-H-14-laion2B-s32B-b79K](https://huggingface.co/laion/CLIP-ViT-H-14-laion2B-s32B-b79K) and [huggingface/llama-65b](https://huggingface.co/huggingface/llama-65b)
 - **Resources for more information:** More information needed
 - [GitHub Repo](https://github.com/huggingface/m4/)
-
+- Associated Paper: [Flamingo: a Visual Language Model for Few-Shot Learning](https://arxiv.org/abs/2204.14198)
 
 # Uses
 

@@ -172,10 +170,9 @@ More information needed
 
 Carbon emissions can be estimated using the [Machine Learning Impact calculator](https://mlco2.github.io/impact#compute) presented in [Lacoste et al. (2019)](https://arxiv.org/abs/1910.09700).
 
-- **Hardware Type:**
-- **Hours used:**
-- **Cloud Provider:**
-- **Compute Region:** More information needed
+- **Hardware Type:** 64 nodes of 8x 80GB A100 GPUs, EFA network
+- **Hours used:** ~672 node hours
+- **Cloud Provider:** AWS SageMaker
 - **Carbon Emitted:** unknown
 
 # Technical Specifications [optional]

@@ -190,11 +187,15 @@ More information needed
 
 ### Hardware
 
-
+The training was performed on an AWS SageMaker cluster with 64 nodes of 8x 80GB A100 GPUs (512 GPUs total). The cluster uses the current EFA network, which provides about 340 GBps throughput.
+
+As the network is quite slow for the needs of DeepSpeed ZeRO-3, we were only able to clock ~90 TFLOPs.
+
 
 ### Software
 
-
+The training software is built on top of HuggingFace Transformers + Accelerate, and DeepSpeed ZeRO-3, plus [WebDataset](https://github.com/webdataset/webdataset) for data loading.
+
 
 # Citation
 
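The carbon fields filled in by this change ("~672 node hours" on 8-GPU A100 nodes, carbon emitted still unknown) lend themselves to a back-of-the-envelope estimate. A minimal sketch, assuming an A100 board power of ~400 W, a PUE of ~1.2, and a placeholder grid intensity of 0.3 kgCO2eq/kWh — all three are illustrative assumptions, not numbers from the card or the calculator:

```python
# Rough CO2 estimate from the card's "~672 node hours" on 8x A100 nodes.
# Assumptions (NOT from the model card): 400 W per A100, PUE 1.2,
# grid carbon intensity 0.3 kgCO2eq/kWh.
NODE_HOURS = 672
GPUS_PER_NODE = 8
GPU_POWER_KW = 0.4      # assumed A100 board power
PUE = 1.2               # assumed datacenter overhead factor
GRID_KG_PER_KWH = 0.3   # assumed regional carbon intensity

gpu_hours = NODE_HOURS * GPUS_PER_NODE        # 5376 GPU-hours
energy_kwh = gpu_hours * GPU_POWER_KW * PUE   # ~2580 kWh
co2_kg = energy_kwh * GRID_KG_PER_KWH         # ~774 kgCO2eq

print(f"{gpu_hours} GPU-hours, ~{energy_kwh:.0f} kWh, ~{co2_kg:.0f} kgCO2eq")
```

The ML Impact calculator linked in the card does the same multiplication with region-specific intensity data, so the card's "unknown" could be replaced once the compute region is confirmed.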
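The new Hardware section's "~90 TFLOPs" can be put in context with a quick utilization check. Reading it as per-GPU throughput is an assumption (the card does not say); the 312 TFLOPS figure is the published A100 BF16 dense peak:

```python
# Utilization implied by the card's "~90 TFLOPs", read as per-GPU
# throughput (an assumption), vs. the A100's 312 TFLOPS BF16 dense peak.
ACHIEVED_TFLOPS = 90
A100_BF16_PEAK_TFLOPS = 312
NODES, GPUS_PER_NODE = 64, 8

utilization = ACHIEVED_TFLOPS / A100_BF16_PEAK_TFLOPS   # ~0.29
cluster_tflops = ACHIEVED_TFLOPS * NODES * GPUS_PER_NODE  # 512 GPUs

print(f"~{utilization:.0%} of peak per GPU, "
      f"~{cluster_tflops / 1000:.1f} PFLOPS cluster-wide")
```

Roughly 29% of peak is consistent with the card's point that the EFA network, not compute, is the bottleneck for DeepSpeed ZeRO-3 at this scale.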