ViT-B-32 OpenCLIP Model on LAION-400M
This is a ViT-B-32 model trained using OpenCLIP on the LAION-400M dataset.
Training Details
The model was trained with the following configuration:
- Model Architecture: ViT-B-32
- Dataset: LAION-400M
- Number of Samples: 400M nominal (approximately 268,836,185 samples used after filtering)
- Hardware: 2 Nodes, each with 4 H200 141GB GPUs (Total 8 GPUs)
- Batch Size (per GPU): 4096
- Precision: amp_bfloat16
- Total Epochs: 32
- Warmup Steps: 2000
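The figures in this list imply a global batch size and an approximate step count. The sketch below works through that arithmetic. It assumes, since the card does not say so, that one epoch covers the filtered sample count, that no gradient accumulation was used, and that the last partial batch of each epoch is dropped.

```python
# Figures taken from the configuration list above.
num_nodes = 2
gpus_per_node = 4
batch_size_per_gpu = 4096
samples_per_epoch = 268_836_185  # filtered samples reported as used
epochs = 32
warmup_steps = 2000

# Assumptions (not stated in the model card): one epoch = one pass over the
# filtered samples, no gradient accumulation, last partial batch dropped.
world_size = num_nodes * gpus_per_node
global_batch_size = world_size * batch_size_per_gpu
steps_per_epoch = samples_per_epoch // global_batch_size
total_steps = steps_per_epoch * epochs
warmup_fraction = warmup_steps / total_steps

print(f"global batch size: {global_batch_size}")     # 32768
print(f"steps per epoch:   {steps_per_epoch}")       # 8204
print(f"total steps:       {total_steps}")           # 262528
print(f"warmup fraction:   {warmup_fraction:.2%}")   # 0.76%
```

Under these assumptions, the 2000 warmup steps cover less than 1% of the run.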
Additional performance-related flags enabled during training: --torchcompile, --local-loss, and --gather-with-grad.
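As a rough illustration of why --local-loss matters at this batch size: as far as the OpenCLIP documentation describes it, the flag makes each GPU compare only its own images (or texts) against the features gathered from all GPUs, instead of building the full global similarity matrix on every GPU. --gather-with-grad is normally paired with it so that gradients flow through the gathered features. The sketch below only counts logit entries per GPU under that reading. It is not a measurement from this training run.

```python
world_size = 8                      # 2 nodes x 4 GPUs
local_batch = 4096                  # per-GPU batch size
global_batch = world_size * local_batch

# Without --local-loss (assumed behavior): each GPU scores all images
# against all texts, giving a global x global logit matrix.
full_logits = global_batch * global_batch

# With --local-loss (assumed behavior): each GPU scores only its local
# batch against the gathered global batch.
local_logits = local_batch * global_batch

reduction = full_logits // local_logits

print(f"full:  {global_batch} x {global_batch} = {full_logits:,} logits per GPU")
print(f"local: {local_batch} x {global_batch} = {local_logits:,} logits per GPU")
print(f"reduction factor: {reduction}x")
```

Under this reading the per-GPU logit matrix shrinks by a factor equal to the number of GPUs.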
Evaluation
- Eval Epoch: 0
- imagenet-zeroshot-val-top1: 0.6086
- imagenet-zeroshot-val-top5: 0.8632
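For readers unfamiliar with the metric names, top-1 and top-5 are top-k accuracies: the fraction of validation images whose true class is among the k highest-scoring classes. The following is a minimal sketch of that computation in plain Python. The function name and the toy scores are illustrative only and are unrelated to the reported numbers.

```python
def topk_accuracy(scores, labels, k):
    """Fraction of rows whose true label is among the k highest scores."""
    hits = 0
    for row, label in zip(scores, labels):
        # Indices of the k largest scores in this row.
        topk = sorted(range(len(row)), key=lambda i: row[i], reverse=True)[:k]
        hits += label in topk
    return hits / len(labels)

# Toy similarity scores: 4 images x 3 classes (made-up numbers).
scores = [
    [0.9, 0.05, 0.05],  # true class 0, ranked 1st
    [0.2, 0.5, 0.3],    # true class 2, ranked 2nd
    [0.1, 0.2, 0.7],    # true class 2, ranked 1st
    [0.6, 0.3, 0.1],    # true class 2, ranked 3rd
]
labels = [0, 2, 2, 2]

top1 = topk_accuracy(scores, labels, k=1)
top2 = topk_accuracy(scores, labels, k=2)
print(f"top-1: {top1}, top-2: {top2}")  # top-1: 0.5, top-2: 0.75
```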
Usage
import torch
from PIL import Image
import open_clip

# Load the model weights from the Hugging Face Hub
model, _, preprocess = open_clip.create_model_and_transforms('ViT-B-32', pretrained='hf-hub:lingkai/open-clip')
model.eval()  # switch to inference mode
tokenizer = open_clip.get_tokenizer('ViT-B-32')

# Example inference
image = preprocess(Image.open("astronaut.png")).unsqueeze(0)
text = tokenizer(["a diagram", "a dog", "a cat"])

# Mixed-precision autocast only takes effect for CUDA tensors
with torch.no_grad(), torch.autocast("cuda"):
    image_features = model.encode_image(image)
    text_features = model.encode_text(text)
    # L2-normalize so the dot product below is a cosine similarity
    image_features /= image_features.norm(dim=-1, keepdim=True)
    text_features /= text_features.norm(dim=-1, keepdim=True)
    text_probs = (100.0 * image_features @ text_features.T).softmax(dim=-1)

print("Label probs:", text_probs)
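To make the last step of the example concrete, here is a dependency-free sketch of the same computation: L2-normalize the embeddings, take cosine similarities, scale by 100, and apply a softmax over the candidate labels. The 3-dimensional vectors are made up for illustration and stand in for real model features. The helper names are hypothetical.

```python
import math

def normalize(v):
    """Scale a vector to unit L2 norm."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]

def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

# Made-up embeddings standing in for encode_image / encode_text outputs.
labels = ["a diagram", "a dog", "a cat"]
image_feature = normalize([1.0, 0.0, 0.0])
text_features = [normalize(v) for v in ([1.0, 0.1, 0.0], [0.0, 1.0, 0.0], [0.5, 0.5, 0.0])]

# Cosine similarity of the image with each text, then scaled softmax.
sims = [sum(a * b for a, b in zip(image_feature, t)) for t in text_features]
probs = softmax([100.0 * s for s in sims])

best = labels[max(range(len(probs)), key=probs.__getitem__)]
print("Label probs:", [round(p, 4) for p in probs])
print("Best label:", best)
```

The scale factor of 100 sharpens the distribution, so even modest differences in cosine similarity yield a near one-hot result.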