FlexiCT-3D-VLM
FlexiCT-3D-VLM aligns the FlexiCT 3D vision encoder with a Qwen3 embedding text tower.
Input and preprocessing
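Image preprocessing matches FlexiCT-3D: by default the processor returns image tensors of shape [B, 1, 160, 160, 160] (batch size B, one channel, 160 x 160 x 160 voxels).
Text is tokenized with Qwen3 tokenizer settings: left padding and a maximum length of 8192 tokens.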
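To make the left-padding convention concrete, here is a minimal framework-free sketch. The helper `left_pad`, the pad id `0`, and the choice to truncate by keeping the first `max_length` tokens are illustrative assumptions, not the actual Qwen3 tokenizer implementation.

```python
def left_pad(sequences, pad_id=0, max_length=8192):
    """Left-pad token-id lists to a common width and build attention masks.

    Hypothetical helper for illustration; pad_id=0 and head-keeping
    truncation are assumptions, not values read from the tokenizer.
    """
    truncated = [list(seq[:max_length]) for seq in sequences]
    width = max(len(seq) for seq in truncated)
    input_ids, attention_mask = [], []
    for seq in truncated:
        n_pad = width - len(seq)
        # Padding goes in front, so real tokens end at the last position.
        input_ids.append([pad_id] * n_pad + seq)
        attention_mask.append([0] * n_pad + [1] * len(seq))
    return input_ids, attention_mask


ids, mask = left_pad([[11, 12, 13], [21]])
print(ids)
print(mask)
```

With left padding, the final position of every row holds a real token, which is convenient when a text embedding is read from the last position. Whether this model pools that way is not stated in this card.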
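```python
import torch
from transformers import AutoModel, AutoProcessor

# trust_remote_code=True is needed because the repo ships custom model code.
processor = AutoProcessor.from_pretrained("ricklisz123/FlexiCT-3D-VLM", trust_remote_code=True)
model = AutoModel.from_pretrained("ricklisz123/FlexiCT-3D-VLM", trust_remote_code=True)

# One CT volume scored against two candidate text prompts.
inputs = processor(
    images="/path/to/ct.nii.gz",
    text=["pulmonary nodule", "no acute abnormality"],
    return_tensors="pt",
)

with torch.no_grad():  # gradients are not needed for scoring
    outputs = model(**inputs)

# Temperature-scaled image-text similarity scores.
similarity = outputs.logits_per_image
```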
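For zero-shot classification, a row of similarity scores is commonly turned into relative label weights with a softmax. The sketch below uses made-up scores rather than real model output, and the softmax step is a common convention, not something this card prescribes.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


labels = ["pulmonary nodule", "no acute abnormality"]
row = [2.0, 0.0]  # hypothetical scores for one image, not real model output

probs = softmax(row)
best = labels[probs.index(max(probs))]
print(best, probs)
```

The resulting weights only rank the supplied candidate labels against each other; adding or removing a label changes every value.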
Outputs
image_embeds and text_embeds are L2-normalized embeddings. logits_per_image contains learned-temperature-scaled image-text similarity scores.
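The relationship between the embeddings and the logits can be sketched as follows, assuming a CLIP-style formulation in which each logit is a scale factor times the dot product of two unit vectors. The function names and the value `logit_scale=10.0` are hypothetical; the model's learned temperature is not given in this card.

```python
import math


def l2_normalize(vec):
    """Scale a vector to unit Euclidean length."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec]


def scaled_similarity(image_embeds, text_embeds, logit_scale):
    """Return logit_scale * cosine similarity for every image-text pair.

    Assumes both inputs are already L2-normalized, so the dot product
    equals the cosine similarity.
    """
    return [
        [logit_scale * sum(a * b for a, b in zip(img, txt)) for txt in text_embeds]
        for img in image_embeds
    ]


image_embeds = [l2_normalize([3.0, 4.0])]
text_embeds = [l2_normalize([3.0, 4.0]), l2_normalize([4.0, -3.0])]

# logit_scale=10.0 is a placeholder, not the model's learned value.
logits = scaled_similarity(image_embeds, text_embeds, logit_scale=10.0)
print(logits)
```

Because the embeddings are unit length, each raw dot product lies in [-1, 1]; the scale factor widens that range before any softmax is applied.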
Limitations
This model is intended for research use in retrieval and image-text scoring. It is not a diagnostic device. Loading the text tower requires the Qwen3 tokenizer and config files, which must be fetched unless they are already cached or included in the local repo.