---
pipeline_tag: image-classification
tags:
- vision
inference: false
widget:
- src: https://huggingface.co/datasets/mishig/sample_images/resolve/main/cat-dog-music.png
  example_title: Cat & Dog
---

# Category Search from External Databases (CaSED)

Disclaimer: This model card is adapted from the official repository, available [here](https://github.com/altndrr/vic). The paper is available [here](https://arxiv.org/abs/2306.00917).

## Intended uses & limitations

You can use the model for vocabulary-free image classification, i.e. classification with CLIP-like models without a pre-defined list of class names.

## How to use

Here is how to use this model:

```python
import requests
from PIL import Image
from transformers import AutoModel, CLIPProcessor

# download an image from the internet
url = "http://images.cocodataset.org/val2017/000000039769.jpg"
image = Image.open(requests.get(url, stream=True).raw)

# load the model and the processor
# (trust_remote_code=True runs the model code hosted in the "altndrr/cased" repository)
model = AutoModel.from_pretrained("altndrr/cased", trust_remote_code=True)
processor = CLIPProcessor.from_pretrained("openai/clip-vit-large-patch14")

# get the model outputs
images = processor(images=[image], return_tensors="pt", padding=True)
outputs = model(images, alpha=0.7)
labels, scores = outputs["vocabularies"][0], outputs["scores"][0]

# print the top 5 most likely labels for the image
values, indices = scores.sort(dim=-1, descending=True)
print("\nTop predictions:\n")
for value, index in zip(values[:5], indices[:5]):
    print(f"{labels[index]:>16s}: {100 * value.item():.2f}%")
```

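The `alpha=0.7` argument in the call above is not explained in this card. As a rough illustration only, the sketch below assumes that `alpha` linearly blends two softmax-normalized scores per candidate name: one from comparing the image with the candidate, and one from comparing retrieved captions with the candidate. This matches our reading of the paper, but it is an assumption about the model's internals. The helpers `softmax` and `blend_scores`, the candidate names, and all numbers are hypothetical and are not part of the model's API.

```python
import math


def softmax(values):
    """Turn raw similarity values into probabilities that sum to 1."""
    largest = max(values)
    exps = [math.exp(v - largest) for v in values]  # subtract the max for numerical stability
    total = sum(exps)
    return [e / total for e in exps]


def blend_scores(image_sims, caption_sims, alpha=0.7):
    """Hypothetical helper: weight image-based scores by alpha and caption-based scores by 1 - alpha."""
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must be in [0, 1]")
    if len(image_sims) != len(caption_sims):
        raise ValueError("both score lists must cover the same candidates")
    image_probs = softmax(image_sims)
    caption_probs = softmax(caption_sims)
    return [alpha * i + (1 - alpha) * c for i, c in zip(image_probs, caption_probs)]


# toy candidate names and made-up similarity values, for illustration only
candidates = ["cat", "remote control", "couch"]
image_sims = [2.0, 1.0, 0.5]
caption_sims = [1.0, 2.5, 0.5]

blended = blend_scores(image_sims, caption_sims, alpha=0.7)
ranked = sorted(zip(candidates, blended), key=lambda pair: pair[1], reverse=True)
for name, score in ranked:
    print(f"{name:>16s}: {100 * score:.2f}%")
```

Under this assumption, `alpha=1.0` would rank candidates by the image-based score alone, and `alpha=0.0` by the caption-based score alone.
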
The model depends on several libraries that you must install manually before running it:

```bash
pip install torch faiss-cpu flair inflect nltk pyarrow transformers
```

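Before running the usage example, it can help to check that each dependency is importable. The following is a minimal sketch that uses only the standard library. `missing_modules` is a hypothetical helper, and the module names assume the usual mapping from package name to import name (for instance, `faiss-cpu` is imported as `faiss`).

```python
import importlib.util

# import names assumed for the packages in the pip command above
REQUIRED_MODULES = ["torch", "faiss", "flair", "inflect", "nltk", "pyarrow", "transformers"]


def missing_modules(module_names):
    """Return the names from module_names that cannot be found in this environment."""
    return [name for name in module_names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    missing = missing_modules(REQUIRED_MODULES)
    if missing:
        print("Missing modules:", ", ".join(missing))
    else:
        print("All required modules are importable.")
```

The check only looks up each module without importing it, so it stays fast even for heavy packages such as `torch`.
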
## Citation

```latex
@article{conti2023vocabularyfree,
  title={Vocabulary-free Image Classification},
  author={Alessandro Conti and Enrico Fini and Massimiliano Mancini and Paolo Rota and Yiming Wang and Elisa Ricci},
  year={2023},
  journal={NeurIPS},
}
```