Zero-Shot Image Classification
Transformers
Safetensors
gil_clip
feature-extraction
clip
fashion-clip
vision
multimodal
fashion
custom_code
Instructions to use gilgmesh/gil-clip with libraries, inference providers, notebooks, and local apps. Follow these links to get started.
- Libraries
- Transformers
How to use gilgmesh/gil-clip with Transformers:

    # Use a pipeline as a high-level helper
    from transformers import pipeline

    pipe = pipeline("zero-shot-image-classification", model="gilgmesh/gil-clip", trust_remote_code=True)
    pipe(
        "https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/hub/parrots.png",
        candidate_labels=["animals", "humans", "landscape"],
    )

    # Load model directly
    from transformers import AutoModel

    model = AutoModel.from_pretrained("gilgmesh/gil-clip", trust_remote_code=True, dtype="auto")

- Notebooks
- Google Colab
- Kaggle
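
For context on the `candidate_labels` call above: CLIP-style zero-shot classification typically scores each label by the cosine similarity between the image embedding and that label's text embedding, then applies a softmax over the scaled similarities. The sketch below illustrates the idea on toy vectors in plain Python. It is not gil-clip's actual code: `rank_labels`, the toy embeddings, and the `logit_scale=100.0` temperature are assumptions chosen for illustration.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity of two equal-length, non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def rank_labels(image_embedding, text_embeddings, labels, logit_scale=100.0):
    """Return (label, probability) pairs, most likely label first.

    logit_scale is an assumed temperature, not a value read from gil-clip.
    """
    sims = [cosine_similarity(image_embedding, t) for t in text_embeddings]
    probs = softmax([logit_scale * s for s in sims])
    return sorted(zip(labels, probs), key=lambda pair: pair[1], reverse=True)


# Toy 3-dimensional embeddings standing in for real model outputs.
image_embedding = [0.9, 0.1, 0.0]
text_embeddings = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
labels = ["animals", "humans", "landscape"]

ranking = rank_labels(image_embedding, text_embeddings, labels)
```

With real embeddings the same steps would normally be done with tensor operations; the plain-Python version only keeps the arithmetic visible.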
Upload README.md with huggingface_hub
README.md
CHANGED
````diff
@@ -27,22 +27,32 @@ For convenience, the model also returns the original Fashion-CLIP image embeddin
 
 The text tower is unchanged from Fashion-CLIP. This is by design: GIL training adjusts the image side via the oracle-guided projector while keeping the text side as the alignment anchor.
 
+## Example
+
+<img src="https://huggingface.co/gilgmesh/gil-clip/resolve/main/example_full.png" alt="Example fashion image" width="400">
+
+For best results, GIL-CLIP is run on the cropped garment region rather than the full scene. The cropped version of the image above (`example_top.png` in this repo) is what the usage snippet below feeds into the model.
+
 ## Usage
 
 ```python
+import torch
 from PIL import Image
+from huggingface_hub import hf_hub_download
 from transformers import AutoModel, CLIPProcessor
 
 model = AutoModel.from_pretrained("gilgmesh/gil-clip", trust_remote_code=True)
 processor = CLIPProcessor.from_pretrained("gilgmesh/gil-clip")
 model.eval()
 
-
+# Load the cropped example image straight from this repo
+example_path = hf_hub_download("gilgmesh/gil-clip", "example_top.png")
+image = Image.open(example_path).convert("RGB")
+
 texts = ["sleeveless navy top", "black dress", "graphic tee"]
 
 inputs = processor(text=texts, images=image, return_tensors="pt", padding=True)
 
-import torch
 with torch.no_grad():
     outputs = model(**inputs)
 
````
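
The README text in this diff recommends running GIL-CLIP on the cropped garment region rather than the full scene, but does not show how such a crop is produced. A minimal sketch of that step follows, assuming a bounding box is already available from some other source (for example a detector or a manual annotation). `clamp_box` and `crop_garment` are hypothetical helper names, not part of the repo; the only library behaviour relied on is Pillow's `Image.size` attribute and `Image.crop(box)` method with a `(left, upper, right, lower)` box.

```python
def clamp_box(box, width, height):
    """Clip a (left, upper, right, lower) box to the image bounds."""
    left, upper, right, lower = box
    left = max(0, min(left, width))
    right = max(0, min(right, width))
    upper = max(0, min(upper, height))
    lower = max(0, min(lower, height))
    if right <= left or lower <= upper:
        raise ValueError(f"box {box} has no area inside a {width}x{height} image")
    return (left, upper, right, lower)


def crop_garment(image, box):
    """Crop a PIL-style image to the garment box, clipped to the image bounds.

    `image` only needs a `.size` attribute (width, height) and a `.crop(box)`
    method, which is what Pillow's Image objects provide.
    """
    width, height = image.size
    return image.crop(clamp_box(box, width, height))
```

With Pillow installed, a call such as `crop_garment(Image.open(path).convert("RGB"), box)` would return the cropped image, which could then be passed to the processor in place of the full scene. The box coordinates are entirely up to the caller.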