Revert back to original training description for compute and process
README.md CHANGED

```diff
@@ -108,9 +108,9 @@ tokenizer = open_clip.get_tokenizer('hf-hub:imageomics/bioclip')
 
 ### Compute Infrastructure
 
-Training was performed on 8 NVIDIA
+Training was performed on 8 NVIDIA A100-80GB GPUs distributed over 2 nodes on [OSC's](https://www.osc.edu/) Ascend HPC Cluster with global batch size 32,768 for 4 days.
 
-Based on [Machine Learning Impact calculator](https://mlco2.github.io/impact#compute) presented in [Lacoste et al. (2019)](https://doi.org/10.48550/arXiv.1910.09700), that's
+Based on [Machine Learning Impact calculator](https://mlco2.github.io/impact#compute) presented in [Lacoste et al. (2019)](https://doi.org/10.48550/arXiv.1910.09700), that's 132.71 kg of CO<sub>2</sub> eq., or 536km driven by an average ICE car.
 
 ### Training Data
@@ -121,7 +121,7 @@ This model was trained on [TreeOfLife-10M](https://huggingface.co/datasets/image
 - **Training regime:** fp16 mixed precision.
 
 We resize images to 224 x 224 pixels.
-We use a maximum learning rate of 1e4 with
+We use a maximum learning rate of 1e4 with 1000 linear warm-up steps, then use cosine decay to 0 over 100 epochs.
 We also use a weight decay of 0.2 and a batch size of 32K.
 
 ## Evaluation
@@ -245,7 +245,7 @@ In short, BioCLIP forms representations that more closely align to the taxonomic
 @software{bioclip2023,
   author = {Samuel Stevens and Jiaman Wu and Matthew J. Thompson and Elizabeth G. Campolongo and Chan Hee Song and David Edward Carlyn and Li Dong and Wasila M. Dahdul and Charles Stewart and Tanya Berger-Wolf and Wei-Lun Chao and Yu Su},
   doi = {<update-on-generation>},
-  month =
+  month = mar,
   title = {BioCLIP},
   year = {2026}
 }
```
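The restored hyperparameter text (linear warm-up for 1,000 steps, then cosine decay to 0 over 100 epochs) corresponds to a schedule like the sketch below. Note the assumptions: `peak_lr=1e-4` reads the card's literal "1e4" as a peak learning rate of 1e-4, and `total_steps` is an illustrative placeholder, since the real step count depends on dataset size and the 32K global batch.

```python
import math

def lr_at_step(step: int,
               peak_lr: float = 1e-4,      # assumed reading of the card's "1e4"
               warmup_steps: int = 1000,
               total_steps: int = 100_000  # placeholder, not from the card
               ) -> float:
    """Linear warm-up to peak_lr, then cosine decay to 0."""
    if step < warmup_steps:
        # Linear ramp from 0 up to peak_lr over the warm-up steps.
        return peak_lr * step / warmup_steps
    # Cosine decay from peak_lr down to 0 over the remaining steps.
    progress = (step - warmup_steps) / (total_steps - warmup_steps)
    return 0.5 * peak_lr * (1.0 + math.cos(math.pi * progress))
```

At step 0 the rate is 0, at step 1,000 it reaches the peak, and at the final step it decays to 0, matching the "cosine decay to 0" wording.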
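The restored emissions figure (132.71 kg CO<sub>2</sub> eq. for 8 GPUs over 4 days) is consistent with the usual power-times-time-times-intensity estimate. The sketch below is a rough sanity check, not the calculator's actual computation: the 400 W per-GPU draw and the 0.432 kg CO<sub>2</sub> eq./kWh grid intensity are assumptions, not figures stated in the model card.

```python
# Rough sanity check of the card's 132.71 kg CO2-eq figure.
gpus = 8
hours = 4 * 24          # 4 days of training
gpu_power_kw = 0.400    # assumed per-A100 board power (not from the card)
intensity = 0.432       # assumed kg CO2-eq per kWh (not from the card)

energy_kwh = gpus * hours * gpu_power_kw   # total energy drawn by the GPUs
emissions_kg = energy_kwh * intensity      # estimated emissions
print(round(emissions_kg, 2))
```

Under these assumptions the estimate lands on the reported value, which suggests the card's number came from a calculation of this shape.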