taxa update to TOL-10M, retrained
#10
by egrace479 - opened
README.md
CHANGED
@@ -5,7 +5,7 @@ language:
 - en
 library_name: open_clip
 model_name: BioCLIP
-model_description: "Foundation model for the tree of life, built using CLIP architecture as a vision model for general organismal biology. It is trained on TreeOfLife-10M, our specially-created dataset covering over
+model_description: "Foundation model for the tree of life, built using CLIP architecture (ViT-B/16) as a vision model for general organismal biology. It is trained on TreeOfLife-10M, our specially-created dataset covering over 390K taxa--the most biologically diverse ML-ready dataset available at release."
 tags:
 - zero-shot-image-classification
 - clip

@@ -13,6 +13,8 @@ tags:
 - CV
 - images
 - animals
+- plants
+- fungi
 - species
 - taxonomy
 - rare species

@@ -30,11 +32,14 @@ datasets:
 
 # Model Card for BioCLIP
 
+If you are looking for the original BioCLIP model presented in the paper, please see [Revision 7b4abf1](https://huggingface.co/imageomics/bioclip/tree/7b4abf1f6ee747c15de00c7d28a5e62990b5dabc) (the most accurate documentation for the original version at BioCLIP [Revision ce901ab](https://huggingface.co/imageomics/bioclip/tree/ce901ab3c6a913f9e9ef94ce6d27761069f4f01c)), trained on TreeOfLife-10M [Revision ffa2a31](https://huggingface.co/datasets/imageomics/TreeOfLife-10M/tree/ffa2a318a1396f2f9e456ba171d3b5b5d8b4f051).
+
+
 <!--
 This modelcard has been generated using [this raw template](https://github.com/huggingface/huggingface_hub/blob/main/src/huggingface_hub/templates/modelcard_template.md?plain=1). And further altered to suit Imageomics Institute needs -->
 
 BioCLIP is a foundation model for the tree of life, built using CLIP architecture as a vision model for general organismal biology.
-It is trained on [TreeOfLife-10M](https://huggingface.co/datasets/imageomics/TreeOfLife-10M), our specially-created dataset covering over
+It is trained on [TreeOfLife-10M](https://huggingface.co/datasets/imageomics/TreeOfLife-10M), our specially-created dataset covering over 390K taxa--the most biologically diverse ML-ready dataset available at its release.
 Through rigorous benchmarking on a diverse set of fine-grained biological classification tasks, BioCLIP consistently outperformed existing baselines by 16% to 17% absolute.
 Through intrinsic evaluation, we found that BioCLIP learned a hierarchical representation aligned to the tree of life, which demonstrates its potential for robust generalizability.

@@ -47,7 +52,7 @@ Through intrinsic evaluation, we found that BioCLIP learned a hierarchical repre
 BioCLIP is based on OpenAI's [CLIP](https://openai.com/research/clip).
 We trained the model on [TreeOfLife-10M](https://huggingface.co/datasets/imageomics/TreeOfLife-10M) from OpenAI's ViT-B/16 checkpoint, using [OpenCLIP's](https://github.com/mlfoundations/open_clip) code.
 BioCLIP is trained with the standard CLIP objective to imbue the model with an understanding, not just of different species, but of the hierarchical structure that relates species across the tree of life.
-In this way, BioCLIP offers potential to aid biologists in discovery of new and related creatures, since it does not see the
+In this way, BioCLIP offers potential to aid biologists in discovery of new and related creatures, since it does not see the 394K different taxa as distinct classes, but as part of an interconnected hierarchy.
 
 
 - **Developed by:** Samuel Stevens, Jiaman Wu, Matthew J. Thompson, Elizabeth G. Campolongo, Chan Hee Song, David Edward Carlyn, Li Dong, Wasila M. Dahdul, Charles Stewart, Tanya Berger-Wolf, Wei-Lun Chao, and Yu Su

@@ -59,9 +64,11 @@ This model was developed for the benefit of the community as an open-source prod
 
 ### Model Sources
 
+- **Homepage:** [BioCLIP Page](https://imageomics.github.io/bioclip/) and [BioCLIP Ecosystem Site](https://imageomics.github.io/bioclip-ecosystem/)
 - **Repository:** [BioCLIP](https://github.com/Imageomics/BioCLIP)
-- **Paper:** BioCLIP: A Vision Foundation Model for the Tree of Life
+- **Paper:** [BioCLIP: A Vision Foundation Model for the Tree of Life](https://openaccess.thecvf.com/content/CVPR2024/papers/Stevens_BioCLIP_A_Vision_Foundation_Model_for_the_Tree_of_Life_CVPR_2024_paper.pdf)
 - **Demo:** [BioCLIP Demo](https://huggingface.co/spaces/imageomics/bioclip-demo)
+<!-- [arXiv](https://doi.org/10.48550/arXiv.2311.18803) -->
 
 ## Uses
 

@@ -72,7 +79,7 @@ The ViT-B/16 vision encoder is recommended as a base model for any computer visi
 ### Direct Use
 
 See the demo [here](https://huggingface.co/spaces/imageomics/bioclip-demo) for examples of zero-shot classification.
-It can also be used in a few-shot setting with a KNN; please see [our paper](https://
+It can also be used in a few-shot setting with a KNN; please see [our paper](https://openaccess.thecvf.com/content/CVPR2024/papers/Stevens_BioCLIP_A_Vision_Foundation_Model_for_the_Tree_of_Life_CVPR_2024_paper.pdf) for details for both few-shot and zero-shot settings without fine-tuning.
 
 
 ## Bias, Risks, and Limitations

@@ -127,7 +134,7 @@ We tested BioCLIP on the following collection of 10 biologically-relevant tasks.
 - [Birds 525](https://www.kaggle.com/datasets/gpiosenka/100-bird-species): We evaluated on the 2,625 test images provided with the dataset.
 - [Rare Species](https://huggingface.co/datasets/imageomics/rare-species): A new dataset we curated for the purpose of testing this model and to contribute to the ML for Conservation community. It consists of 400 species labeled Near Threatened through Extinct in the Wild by the [IUCN Red List](https://www.iucnredlist.org/), with 30 images per species. For more information, see our dataset, [Rare Species](https://huggingface.co/datasets/imageomics/rare-species).
 
-For more information about the contents of these datasets, see Table 2 and associated sections of [our paper](https://
+For more information about the contents of these datasets, see Table 2 and associated sections of [our paper](https://openaccess.thecvf.com/content/CVPR2024/papers/Stevens_BioCLIP_A_Vision_Foundation_Model_for_the_Tree_of_Life_CVPR_2024_paper.pdf).
 
 ### Metrics
 

@@ -137,7 +144,7 @@ We use top-1 and top-5 accuracy to evaluate models, and validation loss to choos
 
 We compare BioCLIP to OpenAI's CLIP and OpenCLIP's LAION-2B checkpoint.
 Here are the zero-shot classification results on our benchmark tasks.
-Please see [our paper](https://
+Please see [our paper](https://openaccess.thecvf.com/content/CVPR2024/papers/Stevens_BioCLIP_A_Vision_Foundation_Model_for_the_Tree_of_Life_CVPR_2024_paper.pdf) for few-shot results.
 
 <table cellpadding="0" cellspacing="0">
 <thead>

@@ -227,7 +234,7 @@ BioCLIP outperforms general-domain baselines by 17% on average for zero-shot.
 
 ### Model Examination
 
-We encourage readers to see Section 4.6 of [our paper](https://
+We encourage readers to see Section 4.6 of [our paper](https://openaccess.thecvf.com/content/CVPR2024/papers/Stevens_BioCLIP_A_Vision_Foundation_Model_for_the_Tree_of_Life_CVPR_2024_paper.pdf).
 In short, BioCLIP forms representations that more closely align to the taxonomic hierarchy compared to general-domain baselines like CLIP or OpenCLIP.
 
 

@@ -238,14 +245,16 @@ In short, BioCLIP forms representations that more closely align to the taxonomic
 ```
 @software{bioclip2023,
 author = {Samuel Stevens and Jiaman Wu and Matthew J. Thompson and Elizabeth G. Campolongo and Chan Hee Song and David Edward Carlyn and Li Dong and Wasila M. Dahdul and Charles Stewart and Tanya Berger-Wolf and Wei-Lun Chao and Yu Su},
-doi = {
-month =
+doi = {<update-on-generation>},
+month = mar,
 title = {BioCLIP},
-
-year = {2023}
+year = {2026}
 }
 ```
 
+Note that this version is updated from the original BioCLIP model presented in the paper, please see [Revision 7b4abf1](https://huggingface.co/imageomics/bioclip/tree/7b4abf1f6ee747c15de00c7d28a5e62990b5dabc) (the most accurate documentation for the original version at BioCLIP [Revision ce901ab](https://huggingface.co/imageomics/bioclip/tree/ce901ab3c6a913f9e9ef94ce6d27761069f4f01c)), trained on TreeOfLife-10M [Revision ffa2a31](https://huggingface.co/datasets/imageomics/TreeOfLife-10M/tree/ffa2a318a1396f2f9e456ba171d3b5b5d8b4f051). This updated version of the model uses an updated TreeOfLife-10M which resolves taxonomic alignment issues discovered in the first version. The taxonomic resolution was completed using [TaxonoPy](https://github.com/Imageomics/TaxonoPy), which was developed for [TreeOfLife-200M](https://huggingface.co/datasets/imageomics/TreeOfLife-200M).
+
+
 Please also cite our paper:
 
 ```

@@ -288,7 +297,9 @@ Please also consider citing OpenCLIP, iNat21 and BIOSCAN-1M:
 
 ## Acknowledgements
 
-The authors would like to thank Josef Uyeda, Jim Balhoff, Dan Rubenstein, Hank Bart, Hilmar Lapp, Sara Beery, and colleagues from the Imageomics Institute and the OSU NLP group for their valuable feedback. We also thank the BIOSCAN-1M team and the iNaturalist team for making their data available and easy to use, and Jennifer
+The authors would like to thank Josef Uyeda, Jim Balhoff, Dan Rubenstein, Hank Bart, Hilmar Lapp, Sara Beery, and colleagues from the Imageomics Institute and the OSU NLP group for their valuable feedback. We also thank the BIOSCAN-1M team and the iNaturalist team for making their data available and easy to use, and Jennifer Hammock at EOL for her invaluable help in accessing EOL’s images.
+
+Additionally, we thank Ziheng Zhang and Jianyang Gu for running the training and evaluation of the revised model (with the taxonomic fixes) while they were training [BioCAP](https://huggingface.co/imageomics/biocap).
 
 The [Imageomics Institute](https://imageomics.org) is funded by the US National Science Foundation's Harnessing the Data Revolution (HDR) program under [Award #2118240](https://www.nsf.gov/awardsearch/showAward?AWD_ID=2118240) (Imageomics: A New Frontier of Biological Information Powered by Knowledge-Guided Machine Learning). Any opinions, findings and conclusions or recommendations expressed in this material are those of the author(s) and do not necessarily reflect the views of the National Science Foundation.
 