Model Description
This model is a CLIP-style vision–language model trained on the full Medtrinity dataset.
Technical Specifications:
- Base model: `facebook/metaclip-b16-400m` (CLIP-like architecture)
- Architecture: `CLIPModel` from the `transformers` library
- Processor: `CLIPProcessor` (handles both image and text preprocessing)
Example Usage
```python
from transformers import CLIPProcessor, CLIPModel
from PIL import Image
import requests
import torch

model_id = "Mihara-bot/metaclip-b16-400m-medtrinity_Full"
processor = CLIPProcessor.from_pretrained(model_id)
model = CLIPModel.from_pretrained(model_id)

# Example image & text
url = "https://your-image-url"
image = Image.open(requests.get(url, stream=True).raw).convert("RGB")
texts = ["a medical image of ...", "a normal image of ..."]

inputs = processor(
    text=texts,
    images=image,
    return_tensors="pt",
    padding="max_length",
    truncation=True,
    max_length=77,
)

with torch.no_grad():
    outputs = model(
        pixel_values=inputs["pixel_values"],
        input_ids=inputs["input_ids"],
        attention_mask=inputs["attention_mask"],
    )

image_embeds = outputs.image_embeds  # (num_images, dim), L2-normalized
text_embeds = outputs.text_embeds    # (num_texts, dim), L2-normalized

# Image-text similarity: logits_per_image already scales the cosine
# similarities by the model's learned temperature (logit_scale), which
# a raw dot product of the normalized embeddings would omit.
logits_per_image = outputs.logits_per_image  # (num_images, num_texts)
probs = logits_per_image.softmax(dim=-1)
print(probs)
```
Intended Use
Vision–language tasks such as image–text retrieval, zero-shot classification, or image–text similarity in the biomedical/medical domain (depending on the specific dataset subset used).
Research on data selection, influence functions, and the efficient adaptation of CLIP models.
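To illustrate the image–text retrieval use case, here is a minimal sketch of embedding-based ranking. It uses random stand-in embeddings so it runs without downloading the model; in practice the vectors would come from the model's `get_image_features` / `get_text_features` followed by L2-normalization.

```python
import numpy as np

# Stand-in embeddings (random): in practice, obtain these from the
# model's get_image_features / get_text_features.
rng = np.random.default_rng(0)
image_embeds = rng.standard_normal((5, 512))  # 5 candidate images
query_embed = rng.standard_normal((1, 512))   # 1 text query

# L2-normalize so the dot product equals cosine similarity.
image_embeds /= np.linalg.norm(image_embeds, axis=-1, keepdims=True)
query_embed /= np.linalg.norm(query_embed, axis=-1, keepdims=True)

sims = (query_embed @ image_embeds.T).ravel()  # (5,) cosine similarities
ranking = np.argsort(-sims)                    # best match first
print("best match: image", ranking[0])
```

The same ranking logic applies in either direction (text-to-image or image-to-text); only which side plays the role of the query changes.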
Not Intended For
Any safety‑critical clinical diagnosis or automated medical decision-making.
Any deployment without human oversight, especially within healthcare environments.
Limitations
The model is trained on Medtrinity; it may reflect the biases and coverage limitations of the underlying dataset.
Performance outside the target domain (e.g., general web images) is likely weaker than generic CLIP models.
Training text largely consists of short captions; performance on long, structured clinical narratives may be limited.
Citation
If you find this model useful, please cite the CHIPS paper:
```bibtex
@misc{zhuang2025chipsefficientclipadaptation,
      title={CHIPS: Efficient CLIP Adaptation via Curvature-aware Hybrid Influence-based Data Selection},
      author={Xinlin Zhuang and Yichen Li and Xiwei Liu and Haolin Yang and Yifan Lu and Ziyun Zou and Yulong Li and Huifa Li and Dongliang Chen and Qinglei Wang and Weiyang Liu and Ying Qian and Jiangming Shi and Imran Razzak},
      year={2025},
      eprint={2511.18519},
      archivePrefix={arXiv},
      primaryClass={cs.LG},
      url={https://arxiv.org/abs/2511.18519},
}
```