---
license: apache-2.0
pipeline_tag: zero-shot-image-classification
library_name: openclip
---

# LongCLIP model

This repository contains the weights of the LongCLIP model.

Paper: https://huggingface.co/papers/2403.15378

GitHub repository: https://github.com/beichenzbc/long-clip

## Installation

Clone the GitHub repository, which provides the `model` package imported in the usage example:

```bash
git clone https://github.com/beichenzbc/Long-CLIP.git
cd Long-CLIP
```

## Usage

Run the example from the root of the cloned repository so that `from model import longclip` resolves. Point `Image.open` at your own image if `./img/demo.png` is not present.

```python
from model import longclip
import torch
from PIL import Image
from huggingface_hub import hf_hub_download

device = "cuda" if torch.cuda.is_available() else "cpu"

# Download the checkpoint from the Hugging Face Hub and load the model.
filepath = hf_hub_download(repo_id="BeichenZhang/LongCLIP-L", filename="longclip-L.pt")
model, preprocess = longclip.load(filepath, device=device)

# Tokenize the candidate captions and preprocess the image.
text = longclip.tokenize(["A man is crossing the street with a red car parked nearby.", "A man is driving a car in an urban scene."]).to(device)
image = preprocess(Image.open("./img/demo.png")).unsqueeze(0).to(device)

with torch.no_grad():
    image_features = model.encode_image(image)
    text_features = model.encode_text(text)

    # Image-text similarity, converted to probabilities over the captions.
    logits_per_image = image_features @ text_features.T
    probs = logits_per_image.softmax(dim=-1).cpu().numpy()

print("Label probs:", probs)
```
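
The usage snippet takes the softmax of raw dot products between image and text features. A common CLIP-style variant L2-normalizes the features first, so the dot product becomes a cosine similarity, and multiplies by a temperature before the softmax. The sketch below illustrates that variant on NumPy arrays, such as features converted with `.cpu().numpy()`. The helper names `caption_probabilities` and `rank_captions`, the `scale=100.0` default, and the toy feature values are assumptions made for illustration; they are not part of the Long-CLIP code base.

```python
import numpy as np


def caption_probabilities(image_features, text_features, scale=100.0):
    """Softmax over scaled cosine similarities, one row per image.

    `scale` is an assumed temperature, not a value read from the checkpoint.
    """
    image_features = np.asarray(image_features, dtype=np.float64)
    text_features = np.asarray(text_features, dtype=np.float64)

    # L2-normalize so that the dot product is a cosine similarity.
    image_features = image_features / np.linalg.norm(image_features, axis=-1, keepdims=True)
    text_features = text_features / np.linalg.norm(text_features, axis=-1, keepdims=True)

    logits = scale * (image_features @ text_features.T)

    # Subtract the row maximum before exponentiating for numerical stability.
    logits = logits - logits.max(axis=-1, keepdims=True)
    exp = np.exp(logits)
    return exp / exp.sum(axis=-1, keepdims=True)


def rank_captions(probs_row, captions):
    """Pair each caption with its probability, most likely first."""
    order = np.argsort(probs_row)[::-1]
    return [(captions[i], float(probs_row[i])) for i in order]


# Toy 2-D features standing in for real encoder outputs (hypothetical values).
image_features = [[1.0, 0.0]]
text_features = [[2.0, 0.0], [0.0, 3.0]]
captions = ["caption A", "caption B"]

probs = caption_probabilities(image_features, text_features)
for caption, p in rank_captions(probs[0], captions):
    print(f"{p:.3f}  {caption}")
```

With real features, pass `image_features.cpu().numpy()` and `text_features.cpu().numpy()` from the usage snippet, together with the list of captions given to `longclip.tokenize`. Normalizing changes the resulting probabilities compared with the raw dot-product version, so treat the two outputs as different scoring choices, not as interchangeable numbers.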