bioscan-ml/clibd

zmgong committed on Mar 10

Commit a285463 (1 parent: 90df42a)

Add README.md

Files changed (1)

README.md +49 -0

README.md ADDED

	@@ -0,0 +1,49 @@

+# Model Card for CLIBD
+This model repo provides the official pretrained models used in the paper **CLIBD: Bridging Vision and Genomics for Biodiversity Monitoring at Scale**.
+Model usage instructions and code are in the [GitHub repo](https://github.com/bioscan-ml/clibd).
+***Note: The initial release provides the main model; the other models used in the paper will be uploaded soon.***
+## Model Details
+### Model Description
+- **Finetuned from model:**
+    - image: timm model (["vit_base_patch16_224"](https://huggingface.co/3dlg-hcvc/DuoduoCLIP))
+    - DNA barcode: BarcodeBERT (["bioscanr/barcodeBERT pre-trained on CANADA-1.5M"](https://huggingface.co/bioscan-ml/bioscan-clibd/tree/main/ckpt/BarcodeBERT/5_mer))
+    - text: pre-trained BERT model (["prajjwal1/bert-small"](https://huggingface.co/prajjwal1/bert-small))
+### Model Sources
+- **Repository:** https://github.com/bioscan-ml/clibd
+- **Paper:** https://arxiv.org/abs/2405.17537
+### Model Checkpoints
+- **ckpt/bioscan_clip/final_experiments/image_dna_4gpu_50epoch/best.pth:** The model trained on the BIOSCAN-1M dataset by aligning images and DNA.
+- **ckpt/bioscan_clip/final_experiments/image_dna_text_4gpu_50epoch/best.pth:** The model trained on the BIOSCAN-1M dataset by aligning images, DNA, and taxonomy labels.
+- **ckpt/bioscan_clip/new_5M_training/image_dna_4gpu_50epoch/best.pth:** The model trained on the BIOSCAN-5M dataset by aligning images and DNA.
+- **ckpt/bioscan_clip/new_5M_training/image_dna_text_4gpu_50epoch/best.pth:** The model trained on the BIOSCAN-5M dataset by aligning images, DNA, and taxonomy labels.
+## Training Data
+- [BIOSCAN-1M](https://huggingface.co/datasets/bioscan-ml/BIOSCAN-1M)
+- [BIOSCAN-5M](https://huggingface.co/datasets/bioscan-ml/BIOSCAN-5M)
+
+The processed data is also available [here](https://huggingface.co/datasets/bioscan-ml/bioscan-clibd).
+**BibTeX:**
+```bibtex
+@article{gong2024clibd,
+  title={{CLIBD}: Bridging Vision and Genomics for Biodiversity Monitoring at Scale},
+  author={Gong, ZeMing and Wang, Austin T. and Huo, Xiaoliang and Haurum, Joakim Bruslund and Lowe, Scott C. and Taylor, Graham W. and Chang, Angel X.},
+  journal={arXiv preprint arXiv:2405.17537},
+  year={2024},
+  eprint={2405.17537},
+  archivePrefix={arXiv},
+  primaryClass={cs.AI},
+  doi={10.48550/arxiv.2405.17537},
+}
+```
+We would like to express our gratitude for the use of the INSECT dataset, which played a pivotal role in the completion of our experiments. Additionally, we acknowledge the use and modification of code from the [Fine-Grained-ZSL-with-DNA](https://github.com/sbadirli/Fine-Grained-ZSL-with-DNA) repository, which facilitated part of our experimental work. The contributions of these resources have been invaluable to our project, and we appreciate the efforts of all developers and researchers involved.
+This research was supported by the Government of Canada’s New Frontiers in Research Fund (NFRF) [NFRFT-2020-00073],
+Canada CIFAR AI Chair grants, and the Pioneer Centre for AI (DNRF grant number P1).
+This research was also enabled in part by support provided by the Digital Research Alliance of Canada (alliancecan.ca).
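
Editor's note (not part of the commit): the checkpoint paths listed under "Model Checkpoints" could be fetched with the `huggingface_hub` client. The following is a minimal sketch under stated assumptions, not the authors' documented workflow. It assumes the repo id is `bioscan-ml/clibd` (taken from the page header) and that the four paths exist in the repo exactly as listed. The names `CHECKPOINT_DIRS`, `checkpoint_filename`, and `download_checkpoint` are hypothetical helpers introduced here for illustration. How to load the downloaded `best.pth` into a model is left to the GitHub repo, since the README does not describe the checkpoint's internal layout.

```python
from pathlib import PurePosixPath

# Assumption: repo id as shown in the page header of this commit.
REPO_ID = "bioscan-ml/clibd"

# Checkpoint directories copied from the "Model Checkpoints" list.
# Key: (training dataset, whether taxonomy text was aligned as a third modality).
CHECKPOINT_DIRS = {
    ("BIOSCAN-1M", False): "ckpt/bioscan_clip/final_experiments/image_dna_4gpu_50epoch",
    ("BIOSCAN-1M", True): "ckpt/bioscan_clip/final_experiments/image_dna_text_4gpu_50epoch",
    ("BIOSCAN-5M", False): "ckpt/bioscan_clip/new_5M_training/image_dna_4gpu_50epoch",
    ("BIOSCAN-5M", True): "ckpt/bioscan_clip/new_5M_training/image_dna_text_4gpu_50epoch",
}


def checkpoint_filename(dataset: str, with_text: bool) -> str:
    """Return the in-repo path of best.pth for a dataset/modality pair (hypothetical helper)."""
    try:
        directory = CHECKPOINT_DIRS[(dataset, with_text)]
    except KeyError:
        raise ValueError(
            f"No checkpoint listed for dataset={dataset!r}, with_text={with_text}"
        ) from None
    return str(PurePosixPath(directory) / "best.pth")


def download_checkpoint(dataset: str, with_text: bool) -> str:
    """Download the checkpoint into the local Hugging Face cache and return its local path."""
    # Imported lazily so the path helper above works without huggingface_hub installed.
    from huggingface_hub import hf_hub_download

    return hf_hub_download(
        repo_id=REPO_ID,
        filename=checkpoint_filename(dataset, with_text),
    )
```

For example, `download_checkpoint("BIOSCAN-5M", with_text=True)` would request the image, DNA, and taxonomy model trained on BIOSCAN-5M. Keeping the path lookup separate from the download makes it easy to check which file will be requested before any network access happens.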