How to use zer0int/CLIP-SAE-ViT-L-14 with Transformers:
```python
# Use a pipeline as a high-level helper
from transformers import pipeline

pipe = pipeline("zero-shot-image-classification", model="zer0int/CLIP-SAE-ViT-L-14")
pipe(
    "https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/hub/parrots.png",
    candidate_labels=["animals", "humans", "landscape"],
)
```

```python
# Load model directly
from transformers import AutoProcessor, AutoModelForZeroShotImageClassification

processor = AutoProcessor.from_pretrained("zer0int/CLIP-SAE-ViT-L-14")
model = AutoModelForZeroShotImageClassification.from_pretrained("zer0int/CLIP-SAE-ViT-L-14")
```
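With the direct-load path you get a standard CLIP model, so you can score image-text similarity yourself. A minimal sketch, reusing the example image URL and labels from the pipeline snippet above (not part of the original card):

```python
import requests
import torch
from PIL import Image

# Example image and candidate labels (placeholders reused from the pipeline example)
url = "https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/hub/parrots.png"
image = Image.open(requests.get(url, stream=True).raw)
labels = ["animals", "humans", "landscape"]

inputs = processor(text=labels, images=image, return_tensors="pt", padding=True)
with torch.no_grad():
    outputs = model(**inputs)

# logits_per_image holds image-text similarity scores; softmax turns them into label probabilities
probs = outputs.logits_per_image.softmax(dim=-1)
print(dict(zip(labels, probs[0].tolist())))
```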
Love ❤️ this CLIP?
ᐅ Buy me a coffee on Ko-Fi ☕
3PscBrWYvrutXedLmvpcnQbE12Py8qLqMK
SAE = Sparse autoencoder
Accuracy on ImageNet/ObjectNet: my GmP fine-tune: 91% > SAE (this model): 89% > OpenAI pre-trained: 84.5%
But it's fun to use with e.g. Flux.1 - get the Text-Encoder (TE) only version ⬇️ and try it! (sketch below)
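One possible way to plug this CLIP into Flux.1 with diffusers, as a rough sketch: it assumes the full repo loads as a standard CLIP text encoder via `CLIPTextModel` and that you have access to `black-forest-labs/FLUX.1-dev`; this is not the author's exact recipe.

```python
import torch
from diffusers import FluxPipeline
from transformers import CLIPTextModel, CLIPTokenizer

# Assumption: the repo exposes standard CLIP weights loadable by CLIPTextModel
text_encoder = CLIPTextModel.from_pretrained("zer0int/CLIP-SAE-ViT-L-14", torch_dtype=torch.bfloat16)
tokenizer = CLIPTokenizer.from_pretrained("zer0int/CLIP-SAE-ViT-L-14")

# Swap the CLIP text encoder into the Flux pipeline (the T5 text_encoder_2 stays at its default)
pipe = FluxPipeline.from_pretrained(
    "black-forest-labs/FLUX.1-dev",
    text_encoder=text_encoder,
    tokenizer=tokenizer,
    torch_dtype=torch.bfloat16,
)
pipe.enable_model_cpu_offload()

image = pipe("a photo of a parrot wearing a monocle", num_inference_steps=28).images[0]
image.save("parrot.png")
```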
This SAE CLIP also has the best results for linear probe @ LAION-AI/CLIP_benchmark (see below).
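"Linear probe" here means training a linear classifier on frozen CLIP image features, as LAION-AI/CLIP_benchmark does. A simplified illustration of the idea with scikit-learn, not the benchmark's exact protocol; `train_images`, `train_labels`, `test_images`, `test_labels` are hypothetical placeholders for your own dataset splits, and `model`/`processor` are loaded as above:

```python
import numpy as np
import torch
from sklearn.linear_model import LogisticRegression

def extract_features(images):
    # Frozen CLIP image features, one vector per image
    feats = []
    for image in images:
        inputs = processor(images=image, return_tensors="pt")
        with torch.no_grad():
            feats.append(model.get_image_features(**inputs).squeeze(0).numpy())
    return np.stack(feats)

X_train, X_test = extract_features(train_images), extract_features(test_images)

# The "probe": a plain logistic-regression classifier on top of the frozen features
clf = LogisticRegression(max_iter=1000)
clf.fit(X_train, train_labels)
print("linear-probe accuracy:", clf.score(X_test, test_labels))
```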
The direct download of this CLIP is also the best CLIP to use for HunyuanVideo.
Required: use it with my zer0int/ComfyUI-HunyuanVideo-Nyan node (it changes the influence of the LLM vs. CLIP; otherwise, the difference is very small).
