| license: apple-amlr | |
| A CLIP (Contrastive Language-Image Pre-training) ViT-B/32 model trained on Conceptual Captions 12M (CC12M), Conceptual Captions 3M (CC3M), and Shutterstock 15M (SS15M). | |
| Data Filtering Networks (DFNs) are small networks used to automatically filter large pools of uncurated data. | |
| This model is a DFN trained on publicly available data. | |
| The PyTorch weights were converted from the original JAX checkpoints, which come from Axlearn (https://github.com/apple/axlearn). | |
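To make the filtering idea concrete, here is a minimal sketch of how a pool of image-text pairs might be filtered with a DFN. It assumes that each pair is scored by the cosine similarity between its image and text embeddings and that only the top-scoring fraction is kept; this card does not spell out the exact procedure. The helper names, the `keep_fraction` parameter, and the toy 2-D embeddings are all hypothetical; in practice the embeddings would come from the model's image and text encoders.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def filter_pairs(image_embs, text_embs, keep_fraction=0.5):
    """Return indices of the highest-scoring pairs, in their original order.

    Assumption: a pair's quality score is the cosine similarity between
    its image embedding and its text embedding.
    """
    scores = [cosine_similarity(i, t) for i, t in zip(image_embs, text_embs)]
    n_keep = max(1, int(len(scores) * keep_fraction))
    ranked = sorted(range(len(scores)), key=lambda k: scores[k], reverse=True)
    return sorted(ranked[:n_keep])


# Toy 2-D embeddings standing in for real encoder outputs (hypothetical values).
image_embs = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [1.0, 0.0]]
text_embs = [[1.0, 0.1], [1.0, 0.0], [1.0, 1.0], [-1.0, 0.0]]

kept = filter_pairs(image_embs, text_embs, keep_fraction=0.5)
print(kept)
```

Ranking by score and keeping a fixed fraction, rather than applying an absolute threshold, keeps the size of the filtered pool predictable regardless of how the similarity scores are distributed.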
| ## Model Details | |
| - **Model Type:** Contrastive Image-Text, Zero-Shot Image Classification. | |
| - **Dataset:** CC12M + CC3M + SS15M | |
| - **Papers:** | |
|   - Data Filtering Networks: https://arxiv.org/abs/2309.17425 | |
| - **Examples Seen:** 1.28B | |
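As an illustration of the zero-shot image classification use listed above, the following sketch shows the usual CLIP-style recipe: normalize the image embedding and one text embedding per class prompt, scale their cosine similarities, and apply a softmax. The function names, the `logit_scale` value of 100, the prompts, and the toy embeddings are assumptions made for illustration, not values taken from this model.

```python
import math


def normalize(v):
    """Scale a vector to unit length."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]


def zero_shot_probs(image_emb, class_text_embs, logit_scale=100.0):
    """Softmax over scaled cosine similarities between an image and class prompts."""
    img = normalize(image_emb)
    logits = [
        logit_scale * sum(a * b for a, b in zip(img, normalize(t)))
        for t in class_text_embs
    ]
    # Subtract the maximum logit before exponentiating for numerical stability.
    m = max(logits)
    exps = [math.exp(l - m) for l in logits]
    total = sum(exps)
    return [e / total for e in exps]


# Hypothetical prompts and toy 2-D embeddings standing in for encoder outputs.
labels = ["a photo of a cat", "a photo of a dog"]
class_text_embs = [[0.9, 0.1], [0.1, 0.9]]
image_emb = [0.8, 0.2]

probs = zero_shot_probs(image_emb, class_text_embs)
predicted = labels[probs.index(max(probs))]
print(predicted, probs)
```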
| ## Citation | |
| ```bibtex | |
| @article{fang2023data, | |
| title={Data Filtering Networks}, | |
| author={Fang, Alex and Jose, Albin Madappally and Jain, Amit and Schmidt, Ludwig and Toshev, Alexander and Shankar, Vaishaal}, | |
| journal={arXiv preprint arXiv:2309.17425}, | |
| year={2023} | |
| } | |
| ``` |