Clarify TOL-200M revision used for training

#5
by egrace479 - opened
Files changed (1)
  1. README.md +43 -35
README.md CHANGED
@@ -1,35 +1,43 @@
- ---
- license:
- - mit
- language:
- - en
- library_name: open_clip
- model_name: "BioCLIP 2"
- model_description: "Foundation model for biology organismal images. It is trained on TreeOfLife-200M on the basis of a CLIP model (ViT-14/L) pre-trained on LAION-2B. BioCLIP 2 yields state-of-the-art performance in recognizing various species. More importantly, it demonstrates emergent properties beyond species classification after extensive hierarchical contrastive training."
- tags:
- - biology
- - CV
- - images
- - imageomics
- - clip
- - species-classification
- - biological visual task
- - multimodal
- - animals
- - species
- - taxonomy
- - rare species
- - endangered species
- - evolutionary biology
- - knowledge-guided
- - zero-shot-image-classification
- datasets:
- - imageomics/TreeOfLife-200M
- - GBIF
- - bioscan-ml/BIOSCAN-5M
- - EOL
- - FathomNet
- ---

  <!--
  Image with caption (jpg or png):
@@ -67,7 +75,7 @@ We evaluate BioCLIP 2 on a diverse set of biological tasks. Through training at

  ### Model Sources

- - **Homepage:** https://imageomics.github.io/bioclip-2/
  - **Repository:** [BioCLIP 2](https://github.com/Imageomics/bioclip-2)
  - **Paper:** [BioCLIP 2: Emergent Properties from Scaling Hierarchical Contrastive Learning](https://doi.org/10.48550/arXiv.2505.23883)
  - **Demo:** [BioCLIP 2 Demo](https://huggingface.co/spaces/imageomics/bioclip-2-demo)
@@ -110,7 +118,7 @@ tokenizer = open_clip.get_tokenizer('hf-hub:imageomics/bioclip-2')

  ### Training Data

- The model was trained with [TreeOfLife-200M](https://huggingface.co/datasets/imageomics/TreeOfLife-200M).
  The dataset consists of nearly 214M images covering 952K taxa.
  The scale of TreeOfLife-200M fosters the emergent properties of BioCLIP 2.

@@ -439,4 +447,4 @@ Any opinions, findings and conclusions or recommendations expressed in this mate
  Jianyang Gu

  ## Model Card Contact
- [gu.1220@osu.edu](mailto:gu.1220@osu.edu)
 
+ ---
+ license:
+ - mit
+ language:
+ - en
+ library_name: open_clip
+ model_name: BioCLIP 2
+ model_description: >-
+ Foundation model for biology organismal images. It is trained on
+ TreeOfLife-200M on the basis of a CLIP model (ViT-14/L) pre-trained on
+ LAION-2B. BioCLIP 2 yields state-of-the-art performance in recognizing various
+ species. More importantly, it demonstrates emergent properties beyond species
+ classification after extensive hierarchical contrastive training.
+ tags:
+ - biology
+ - CV
+ - images
+ - imageomics
+ - clip
+ - species-classification
+ - biological visual task
+ - multimodal
+ - animals
+ - plants
+ - fungi
+ - species
+ - taxonomy
+ - rare species
+ - endangered species
+ - evolutionary biology
+ - knowledge-guided
+ - zero-shot-image-classification
+ datasets:
+ - imageomics/TreeOfLife-200M
+ - GBIF
+ - bioscan-ml/BIOSCAN-5M
+ - EOL
+ - FathomNet
+ new_version: imageomics/bioclip-2.5-vith14
+ ---

  <!--
  Image with caption (jpg or png):
 

  ### Model Sources

+ - **Homepage:** [BioCLIP 2 Project Page](https://imageomics.github.io/bioclip-2/)
  - **Repository:** [BioCLIP 2](https://github.com/Imageomics/bioclip-2)
  - **Paper:** [BioCLIP 2: Emergent Properties from Scaling Hierarchical Contrastive Learning](https://doi.org/10.48550/arXiv.2505.23883)
  - **Demo:** [BioCLIP 2 Demo](https://huggingface.co/spaces/imageomics/bioclip-2-demo)
 

  ### Training Data

+ The model was trained with [TreeOfLife-200M](https://huggingface.co/datasets/imageomics/TreeOfLife-200M) (Revision [a8f38b4](http://huggingface.co/datasets/imageomics/TreeOfLife-200M/tree/a8f38b4388579862c56ae57d6f094c2ac0e92e12)).
  The dataset consists of nearly 214M images covering 952K taxa.
  The scale of TreeOfLife-200M fosters the emergent properties of BioCLIP 2.

 
  Jianyang Gu

  ## Model Card Contact
+ [gu.1220@osu.edu](mailto:gu.1220@osu.edu)
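
For readers who want to fetch the same snapshot of TreeOfLife-200M that the updated Training Data line points to, the following is a minimal sketch of pinning that revision with `huggingface_hub`. The repo id and commit hash are copied from the link in this PR; the helper names (`revision_url`, `download_pinned`) are hypothetical and not part of the BioCLIP 2 codebase, and the sketch assumes `huggingface_hub` is installed.

```python
# Repo id and full commit hash, copied from the revision link in the
# Training Data section of the README changed by this PR.
TOL_200M_REPO = "imageomics/TreeOfLife-200M"
TOL_200M_REVISION = "a8f38b4388579862c56ae57d6f094c2ac0e92e12"


def revision_url(repo_id: str = TOL_200M_REPO,
                 revision: str = TOL_200M_REVISION) -> str:
    """Build the Hub URL for browsing a dataset repo at a fixed commit."""
    return f"https://huggingface.co/datasets/{repo_id}/tree/{revision}"


def download_pinned(allow_patterns=None, local_dir=None) -> str:
    """Download files from the pinned revision and return the local path.

    Hypothetical helper: requires network access and `huggingface_hub`.
    The import is deferred so the constants above are usable without it.
    """
    from huggingface_hub import snapshot_download

    return snapshot_download(
        repo_id=TOL_200M_REPO,
        repo_type="dataset",          # the repo is a dataset, not a model
        revision=TOL_200M_REVISION,   # pin to the commit cited in the README
        allow_patterns=allow_patterns,
        local_dir=local_dir,
    )


if __name__ == "__main__":
    print(revision_url())
```

Since the README describes the dataset as covering nearly 214M images, it is probably wise to pass `allow_patterns` with a narrow glob that matches the repository's actual file layout rather than downloading everything.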