ViT-B-32 OpenCLIP Model on LAION-400M

This is a ViT-B-32 model trained using OpenCLIP on the LAION-400M dataset.

Training Details

The model was trained with the following configuration:

  • Model Architecture: ViT-B-32
  • Dataset: LAION-400M
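  • Number of Samples: ~268.8M (268,836,185) filtered samples used, out of the nominal 400M in LAION-400M
  • Hardware: 2 nodes, each with 4 NVIDIA H200 141GB GPUs (8 GPUs total)
  • Batch Size (per GPU): 4096 (global batch size 8 × 4096 = 32,768)
  • Precision: amp_bfloat16 (automatic mixed precision with bfloat16)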
  • Total Epochs: 32
  • Warmup Steps: 2000
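The configuration above implies a few derived quantities, such as steps per epoch and the share of training spent in warmup. The sketch below computes them. It assumes one optimizer step per global batch and that a partial final batch is dropped each epoch. OpenCLIP's data pipeline may round differently, so treat the step counts as approximate.

```python
# Values taken from the training configuration listed above
num_gpus = 2 * 4                  # 2 nodes x 4 GPUs per node
per_gpu_batch = 4096
samples_per_epoch = 268_836_185   # filtered LAION-400M samples actually used
epochs = 32
warmup_steps = 2000

global_batch = num_gpus * per_gpu_batch
# Assumption: one optimizer step per global batch, partial last batch dropped
steps_per_epoch = samples_per_epoch // global_batch
total_steps = steps_per_epoch * epochs
warmup_fraction = warmup_steps / total_steps
samples_seen = samples_per_epoch * epochs

print(f"global batch size: {global_batch}")
print(f"steps per epoch:   ~{steps_per_epoch}")
print(f"total steps:       ~{total_steps}")
print(f"warmup fraction:   {warmup_fraction:.2%}")
print(f"samples seen:      ~{samples_seen / 1e9:.1f}B")
```

Under these assumptions the run takes roughly 8,204 steps per epoch and about 262,528 steps in total. The 2,000 warmup steps therefore cover well under 1% of training.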

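Additional OpenCLIP flags were enabled during training to reduce memory use and speed up training:

  • --torchcompile: compiles the model with torch.compile.
  • --local-loss: each GPU computes the contrastive loss only for its own batch against features gathered from all GPUs, instead of building the full global similarity matrix.
  • --gather-with-grad: gathers features across GPUs with gradients preserved.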
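The idea behind --local-loss can be checked with a small single-process sketch. It rests on several assumptions:

  • GPUs ("ranks") are simulated by slicing one batch.
  • Random unit vectors stand in for image and text features.
  • A fixed logit scale of 10.0 is used.
  • The helper names (clip_loss, global_loss, local_loss) are hypothetical.

This is not OpenCLIP's ClipLoss, which uses torch.distributed all-gather. It also ignores gradients, which is what --gather-with-grad is about. It only illustrates that averaging the per-rank n × N losses reproduces the value of the full N × N symmetric CLIP loss.

```python
import math
import random

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def normalize(v):
    n = math.sqrt(dot(v, v))
    return [x / n for x in v]

def cross_entropy(rows, labels):
    """Mean softmax cross-entropy; labels[i] is the correct column index for rows[i]."""
    total = 0.0
    for row, y in zip(rows, labels):
        m = max(row)  # subtract the max for numerical stability
        log_z = m + math.log(sum(math.exp(x - m) for x in row))
        total += log_z - row[y]
    return total / len(rows)

def clip_loss(img_rows, txt_rows, all_img, all_txt, labels, scale):
    """Symmetric CLIP loss: img_rows scored against all_txt, txt_rows against all_img."""
    logits_per_image = [[scale * dot(a, b) for b in all_txt] for a in img_rows]
    logits_per_text = [[scale * dot(b, a) for a in all_img] for b in txt_rows]
    return (cross_entropy(logits_per_image, labels)
            + cross_entropy(logits_per_text, labels)) / 2

def global_loss(img, txt, scale):
    # Without local loss, the full N x N similarity matrix is built
    return clip_loss(img, txt, img, txt, list(range(len(img))), scale)

def local_loss(img, txt, scale, rank, world_size):
    # This rank scores only its own n samples against all N gathered features (n x N)
    n = len(img) // world_size
    lo = rank * n
    labels = [lo + i for i in range(n)]  # positives sit at this shard's global offset
    return clip_loss(img[lo:lo + n], txt[lo:lo + n], img, txt, labels, scale)

random.seed(0)
world_size, n_per_rank, dim, scale = 4, 3, 8, 10.0
N = world_size * n_per_rank
img = [normalize([random.gauss(0, 1) for _ in range(dim)]) for _ in range(N)]
txt = [normalize([random.gauss(0, 1) for _ in range(dim)]) for _ in range(N)]

full = global_loss(img, txt, scale)
per_rank = [local_loss(img, txt, scale, r, world_size) for r in range(world_size)]
mean_local = sum(per_rank) / world_size
print(f"global loss: {full:.6f}  mean of local losses: {mean_local:.6f}")
```

Each local logits row is identical to the corresponding row of the global matrix. The mean over ranks of the per-rank means therefore equals the global mean, while each rank only stores an n × N matrix.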

Evaluation

  • Eval Epoch: 0
  • ImageNet zero-shot top-1 accuracy (validation set): 0.6086
  • ImageNet zero-shot top-5 accuracy (validation set): 0.8632
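In zero-shot evaluation, each class is represented by text-prompt embeddings, and an image's scores are its similarities to those class embeddings. Top-1 and top-5 accuracy count how often the true class is ranked first or within the first five. The sketch below shows only that counting step. The function name topk_accuracy and the toy scores are hypothetical; this is not OpenCLIP's evaluation code.

```python
def topk_accuracy(logits, labels, k):
    """Fraction of rows whose true label is among the k highest-scoring classes."""
    hits = 0
    for row, y in zip(logits, labels):
        # Class indices sorted by score, highest first, truncated to k
        topk = sorted(range(len(row)), key=lambda c: row[c], reverse=True)[:k]
        hits += int(y in topk)
    return hits / len(labels)

# Toy example: 4 images scored against 5 class prompts (made-up numbers)
logits = [
    [0.9, 0.1, 0.3, 0.2, 0.0],  # true class 0 ranked 1st
    [0.2, 0.4, 0.8, 0.1, 0.3],  # true class 1 ranked 2nd
    [0.5, 0.1, 0.2, 0.7, 0.6],  # true class 2 ranked 4th
    [0.1, 0.2, 0.3, 0.4, 0.9],  # true class 4 ranked 1st
]
labels = [0, 1, 2, 4]
print(topk_accuracy(logits, labels, 1))  # 0.5
print(topk_accuracy(logits, labels, 3))  # 0.75
```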

Usage

import torch
import open_clip
from PIL import Image

# Load the model, preprocessing transforms, and tokenizer from the Hugging Face Hub
model, _, preprocess = open_clip.create_model_and_transforms('hf-hub:lingkai/open-clip')
model.eval()  # put the model in inference mode
tokenizer = open_clip.get_tokenizer('hf-hub:lingkai/open-clip')

# Example inference: score one local image against three candidate captions
image = preprocess(Image.open("astronaut.png")).unsqueeze(0)
text = tokenizer(["a diagram", "a dog", "a cat"])

with torch.no_grad():
    image_features = model.encode_image(image)
    text_features = model.encode_text(text)
    # L2-normalize so that the dot product is a cosine similarity
    image_features /= image_features.norm(dim=-1, keepdim=True)
    text_features /= text_features.norm(dim=-1, keepdim=True)

    # Scale similarities by 100 and softmax over the captions
    text_probs = (100.0 * image_features @ text_features.T).softmax(dim=-1)

print("Label probs:", text_probs)