Revert back to original training description for compute and process
README.md CHANGED

```diff
@@ -108,9 +108,9 @@ tokenizer = open_clip.get_tokenizer('hf-hub:imageomics/bioclip')
 
 ### Compute Infrastructure
 
-Training was performed on 8 NVIDIA
+Training was performed on 8 NVIDIA A100-80GB GPUs distributed over 2 nodes on [OSC's](https://www.osc.edu/) Ascend HPC Cluster with global batch size 32,768 for 4 days.
 
-Based on [Machine Learning Impact calculator](https://mlco2.github.io/impact#compute) presented in [Lacoste et al. (2019)](https://doi.org/10.48550/arXiv.1910.09700), that's
+Based on [Machine Learning Impact calculator](https://mlco2.github.io/impact#compute) presented in [Lacoste et al. (2019)](https://doi.org/10.48550/arXiv.1910.09700), that's 132.71 kg of CO<sub>2</sub> eq., or 536km driven by an average ICE car.
 
 ### Training Data
@@ -121,7 +121,7 @@ This model was trained on [TreeOfLife-10M](https://huggingface.co/datasets/image
 - **Training regime:** fp16 mixed precision.
 
 We resize images to 224 x 224 pixels.
-We use a maximum learning rate of 1e4 with
+We use a maximum learning rate of 1e4 with 1000 linear warm-up steps, then use cosine decay to 0 over 100 epochs.
 We also use a weight decay of 0.2 and a batch size of 32K.
 
 ## Evaluation
@@ -245,7 +245,7 @@ In short, BioCLIP forms representations that more closely align to the taxonomic
 @software{bioclip2023,
   author = {Samuel Stevens and Jiaman Wu and Matthew J. Thompson and Elizabeth G. Campolongo and Chan Hee Song and David Edward Carlyn and Li Dong and Wasila M. Dahdul and Charles Stewart and Tanya Berger-Wolf and Wei-Lun Chao and Yu Su},
   doi = {<update-on-generation>},
-  month =
+  month = mar,
   title = {BioCLIP},
   year = {2026}
 }
```
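The restored hyperparameter text (linear warm-up for 1,000 steps, then cosine decay to 0 over 100 epochs) corresponds to a schedule like the sketch below. Note the assumptions: `peak_lr=1e-4` reads the card's literal "1e4" as a peak learning rate of 1e-4, and `total_steps` is an illustrative placeholder, since the real step count depends on dataset size and the 32K global batch.

```python
import math

def lr_at_step(step: int,
               peak_lr: float = 1e-4,      # assumed reading of the card's "1e4"
               warmup_steps: int = 1000,
               total_steps: int = 100_000  # placeholder, not from the card
               ) -> float:
    """Linear warm-up to peak_lr, then cosine decay to 0."""
    if step < warmup_steps:
        # Linear ramp from 0 up to peak_lr over the warm-up steps.
        return peak_lr * step / warmup_steps
    # Cosine decay from peak_lr down to 0 over the remaining steps.
    progress = (step - warmup_steps) / (total_steps - warmup_steps)
    return 0.5 * peak_lr * (1.0 + math.cos(math.pi * progress))
```

At step 0 the rate is 0, at step 1,000 it reaches the peak, and at the final step it decays to 0, matching the "cosine decay to 0" wording.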
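The restored emissions figure (132.71 kg CO<sub>2</sub> eq. for 8 GPUs over 4 days) is consistent with the usual power-times-time-times-intensity estimate. The sketch below is a rough sanity check, not the calculator's actual computation: the 400 W per-GPU draw and the 0.432 kg CO<sub>2</sub> eq./kWh grid intensity are assumptions, not figures stated in the model card.

```python
# Rough sanity check of the card's 132.71 kg CO2-eq figure.
gpus = 8
hours = 4 * 24          # 4 days of training
gpu_power_kw = 0.400    # assumed per-A100 board power (not from the card)
intensity = 0.432       # assumed kg CO2-eq per kWh (not from the card)

energy_kwh = gpus * hours * gpu_power_kw   # total energy drawn by the GPUs
emissions_kg = energy_kwh * intensity      # estimated emissions
print(round(emissions_kg, 2))
```

Under these assumptions the estimate lands on the reported value, which suggests the card's number came from a calculation of this shape.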