Update README.md
README.md
CHANGED
@@ -45,7 +45,7 @@ pipeline_tag: zero-shot-image-classification
 </div>
 
 
-BIOTROVE is a new suite of vision-language foundation models for biodiversity. These CLIP-style foundation models were trained on [
+BIOTROVE is a new suite of vision-language foundation models for biodiversity. These CLIP-style foundation models were trained on [BIOTROVE-40M](https://baskargroup.github.io/Arboretum/), which is a large-scale dataset of 40 million images of 33K species of plants and animals. The models are evaluated on zero-shot image classification tasks.
 
 - **Model type:** Vision Transformer (ViT-B/16, ViT-L/14)
 - **License:** MIT
@@ -57,18 +57,12 @@ These models were developed for the benefit of the AI community as an open-sourc
 ### Model Description
 
 BioTrove is based on OpenAI's [CLIP](https://openai.com/research/clip) model.
-The models were trained on [
+The models were trained on [BIOTROVE-40M](https://baskargroup.github.io/Arboretum/) for the following configurations:
 
 - **BIOTROVE-O:** Trained a ViT-B/16 backbone initialized from the [OpenCLIP's](https://github.com/mlfoundations/open_clip) checkpoint. The training was conducted for 40 epochs.
 - **BIOTROVE-B:** Trained a ViT-B/16 backbone initialized from the [BioCLIP's](https://github.com/Imageomics/BioCLIP) checkpoint. The training was conducted for 8 epochs.
 - **BIOTROVE-M:** Trained a ViT-L/14 backbone initialized from the [MetaCLIP's](https://github.com/facebookresearch/MetaCLIP) checkpoint. The training was conducted for 12 epochs.
 
-
-To access the checkpoints of the above models, go to the `Files and versions` tab and download the weights. These weights can be directly used for zero-shot classification and finetuning. The filenames correspond to the specific model weights -
-- **BIOTROVE-O:** - `BIOTROVE-vit-b-16-from-openai-epoch-40.pt`,
-- **BIOTROVE-B:** - `BIOTROVE-vit-b-16-from-bioclip-epoch-8.pt`
-- **BIOTROVE-M** - `BIOTROVE-vit-l-14-from-metaclip-epoch-12.pt`
-
 ### Model Training
 **See the [Model Training](https://github.com/baskargroup/Arboretum?tab=readme-ov-file#model-training) section on the [Github](https://github.com/baskargroup/Arboretum) for examples of how to use BioTrove models in zero-shot image classification tasks.**
 
@@ -133,7 +127,7 @@ All the `BioTrove` models were evaluated on the challenging [CONFOUNDING-SPECIES
 In general, we found that models trained on web-scraped data performed better with common
 names, whereas models trained on specialist datasets performed better when using scientific names.
 Additionally, models trained on web-scraped data excel at classifying at the highest taxonomic
-level (kingdom), while models begin to benefit from specialist datasets like [
+level (kingdom), while models begin to benefit from specialist datasets like [BIOTROVE-40M](https://baskargroup.github.io/Arboretum/) and
 [Tree-of-Life-10M](https://huggingface.co/datasets/imageomics/TreeOfLife-10M) at the lower taxonomic levels (order and species). From a practical standpoint, `BioTrove` is highly accurate at the species level, and higher-level taxa can be deterministically derived from lower ones.
 
 Addressing these limitations will further enhance the applicability of models like `BioTrove` in real-world biodiversity monitoring tasks.
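The last hunk of the README states that higher-level taxa can be deterministically derived from species-level predictions. The sketch below illustrates that idea only; it is not code from the BioTrove repository. The `LINEAGE` table, the helper names `derive_taxa` and `roll_up`, and the probability values are all hypothetical, and a real pipeline would presumably read lineages from a taxonomic backbone and take scores from the model's zero-shot output.

```python
from collections import defaultdict

# Hypothetical lineage table (species -> higher ranks), for illustration only.
LINEAGE = {
    "Danaus plexippus": {"kingdom": "Animalia", "order": "Lepidoptera",
                         "family": "Nymphalidae", "genus": "Danaus"},
    "Danaus gilippus": {"kingdom": "Animalia", "order": "Lepidoptera",
                        "family": "Nymphalidae", "genus": "Danaus"},
    "Apis mellifera": {"kingdom": "Animalia", "order": "Hymenoptera",
                       "family": "Apidae", "genus": "Apis"},
    "Quercus alba": {"kingdom": "Plantae", "order": "Fagales",
                     "family": "Fagaceae", "genus": "Quercus"},
}


def derive_taxa(species):
    """Look up the higher-level ranks implied by a species-level prediction."""
    return dict(LINEAGE[species])


def roll_up(species_probs, rank):
    """Sum species-level probabilities into probabilities at a higher rank."""
    totals = defaultdict(float)
    for species, prob in species_probs.items():
        totals[LINEAGE[species][rank]] += prob
    return dict(totals)


# Hypothetical species-level scores standing in for a model's zero-shot output.
species_probs = {
    "Danaus plexippus": 0.5,
    "Danaus gilippus": 0.25,
    "Apis mellifera": 0.125,
    "Quercus alba": 0.125,
}

top_species = max(species_probs, key=species_probs.get)
print(top_species, "->", derive_taxa(top_species))
print(roll_up(species_probs, "kingdom"))
```

Because each species maps to exactly one parent at every rank, the lookup needs no additional model call, and summing species scores gives a consistent score at any higher rank.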