OceanCLIP-0.15B: Marine Vision-Language Model

A vision-language model fine-tuned on paired marine imagery and text. Optimized for species identification, zero-shot classification, and cross-modal image–text retrieval in underwater and sonar environments.

Model Details

  • Architecture: CLIP-style (Vision Transformer + Text Encoder)
  • Parameters: ~0.15B
  • Domain: Marine Biology, Underwater Imagery, Sonar Data
  • Framework: Compatible with transformers and open_clip (an open_clip loading sketch appears at the end of the Usage section)

Usage

Zero-shot classification with transformers:

from transformers import CLIPProcessor, CLIPModel
from PIL import Image
import torch

# Load the fine-tuned checkpoint and its matching processor
model = CLIPModel.from_pretrained("zjunlp/OceanCLIP-0.15B")
processor = CLIPProcessor.from_pretrained("zjunlp/OceanCLIP-0.15B")

image = Image.open("marine_image.jpg")

# Score one image against a set of candidate captions
inputs = processor(
    text=["a photo of a clownfish", "a photo of a coral reef"],
    images=image,
    return_tensors="pt",
    padding=True
)

# logits_per_image holds image-text similarity scores;
# softmax turns them into probabilities over the captions
with torch.no_grad():
    outputs = model(**inputs)
probs = outputs.logits_per_image.softmax(dim=-1)
print(probs)
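
Beyond zero-shot classification, the two encoders can be used separately for cross-modal retrieval. The sketch below uses the stock transformers CLIPModel embedding API (get_image_features / get_text_features) and assumes OceanCLIP follows the standard CLIP interface; the gallery filenames and query text are placeholders.

import torch
from PIL import Image
from transformers import CLIPProcessor, CLIPModel

model = CLIPModel.from_pretrained("zjunlp/OceanCLIP-0.15B")
processor = CLIPProcessor.from_pretrained("zjunlp/OceanCLIP-0.15B")

# Placeholder gallery and query; substitute your own files and caption
gallery = [Image.open(p) for p in ["reef_01.jpg", "reef_02.jpg"]]
query = "a moray eel hiding in a crevice"

with torch.no_grad():
    image_feats = model.get_image_features(**processor(images=gallery, return_tensors="pt"))
    text_feats = model.get_text_features(**processor(text=[query], return_tensors="pt", padding=True))

# Cosine similarity: L2-normalize both sides, then take dot products
image_feats = image_feats / image_feats.norm(dim=-1, keepdim=True)
text_feats = text_feats / text_feats.norm(dim=-1, keepdim=True)
scores = (text_feats @ image_feats.T).squeeze(0)
best = scores.argmax().item()
print(f"best match: image {best} (score {scores[best]:.3f})")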
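
The model card also lists open_clip compatibility. A minimal loading sketch, assuming the repository ships an open_clip-compatible config so the hf-hub: prefix resolves; verify this against the model files before relying on it.

import torch
from PIL import Image
import open_clip

# Assumption: the repo provides an open_clip config reachable via hf-hub:
model, _, preprocess = open_clip.create_model_and_transforms("hf-hub:zjunlp/OceanCLIP-0.15B")
tokenizer = open_clip.get_tokenizer("hf-hub:zjunlp/OceanCLIP-0.15B")

image = preprocess(Image.open("marine_image.jpg")).unsqueeze(0)
text = tokenizer(["a photo of a clownfish", "a photo of a coral reef"])

with torch.no_grad():
    image_features = model.encode_image(image)
    text_features = model.encode_text(text)
    # Normalize and compare, mirroring the transformers example above
    image_features = image_features / image_features.norm(dim=-1, keepdim=True)
    text_features = text_features / text_features.norm(dim=-1, keepdim=True)
    probs = (100.0 * image_features @ text_features.T).softmax(dim=-1)
print(probs)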