What text can be generated?
I tried it; it doesn't generate Chinese, Japanese, or Korean.
You are correct. This model is OpenAI's original CLIP ViT-L/14, which predominantly knows English and is only reliable with English text. I only fine-tuned it for higher accuracy (zero-shot classification, retrieval, or use as guidance / text encoder for generative AI); I did not train on non-English languages. However, the exact code I used for fine-tuning, including the Geometric Parametrization modification of the model, is available on my GitHub. You can adapt it to any multilingual or non-English CLIP model. 24 GB of VRAM is sufficient, so an RTX 3090 or similar is all you need to achieve good results:
https://github.com/zer0int/CLIP-fine-tune
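For the English-only use described above, a minimal sketch of zero-shot classification with the Transformers pipeline might look like this. It assumes `transformers`, `torch`, and `Pillow` are installed. The helper names `best_label` and `classify` are illustrative only and do not come from the repository.

```python
def best_label(results):
    """Return the label with the highest score.

    `results` is a list of {"label": str, "score": float} dicts, the shape
    returned by the zero-shot-image-classification pipeline.
    """
    return max(results, key=lambda r: r["score"])["label"]


def classify(image, labels, model_id="zer0int/CLIP-GmP-ViT-L-14"):
    """Score an image (URL, local path, or PIL image) against English labels."""
    # Imported lazily so best_label() works without transformers installed.
    from transformers import pipeline

    pipe = pipeline("zero-shot-image-classification", model=model_id)
    return pipe(image, candidate_labels=labels)


# Example call (downloads the model on first use):
# results = classify(
#     "https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/hub/parrots.png",
#     ["animals", "humans", "landscape"],
# )
# print(best_label(results))
```

Keep the candidate labels in English; as noted above, the text encoder is not reliable for Chinese, Japanese, or Korean prompts.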
Although I have a 4090, I'm just a hobbyist and know nothing about modifying code. Thanks for your reply!