gilgmesh/gil-clip

@@ -27,22 +27,32 @@ For convenience, the model also returns the original Fashion-CLIP image embeddin
 The text tower is unchanged from Fashion-CLIP. This is by design: GIL training adjusts the image side via the oracle-guided projector while keeping the text side as the alignment anchor.
 ## Usage
 ```python
 from PIL import Image
 from transformers import AutoModel, CLIPProcessor
 model = AutoModel.from_pretrained("gilgmesh/gil-clip", trust_remote_code=True)
 processor = CLIPProcessor.from_pretrained("gilgmesh/gil-clip")
 model.eval()
-image = Image.open("path/to/image.png").convert("RGB")
 texts = ["sleeveless navy top", "black dress", "graphic tee"]
 inputs = processor(text=texts, images=image, return_tensors="pt", padding=True)
-import torch
 with torch.no_grad():
     outputs = model(**inputs)

 The text tower is unchanged from Fashion-CLIP. This is by design: GIL training adjusts the image side via the oracle-guided projector while keeping the text side as the alignment anchor.
+## Example
+<img src="https://huggingface.co/gilgmesh/gil-clip/resolve/main/example_full.png" alt="Example fashion image" width="400">
+For best results, run GIL-CLIP on the cropped garment region rather than the full scene. The usage snippet below loads the cropped version of the image above (`example_top.png` in this repo).
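If you are cropping your own images, the sketch below shows one way to build a crop box for the garment region. It assumes you already have a garment bounding box as `(left, top, right, bottom)` pixel coordinates from a detector or manual annotation. The helper name `padded_crop_box` and the `pad` margin are hypothetical and not part of this repo; the README does not say how `example_top.png` was produced.

```python
def padded_crop_box(box, image_size, pad=0):
    """Grow a (left, top, right, bottom) box by `pad` pixels, clamped to the image.

    Hypothetical helper for illustration only; not part of gil-clip.
    `image_size` is (width, height), the same order as PIL's `Image.size`.
    """
    left, top, right, bottom = box
    width, height = image_size
    return (
        max(0, left - pad),        # do not go past the left edge
        max(0, top - pad),         # do not go past the top edge
        min(width, right + pad),   # do not go past the right edge
        min(height, bottom + pad)  # do not go past the bottom edge
    )


# Example: a garment box inside a 400x600 image, with a 10-pixel margin.
crop_box = padded_crop_box((100, 50, 300, 450), (400, 600), pad=10)
print(crop_box)  # (90, 40, 310, 460)
```

With Pillow, the resulting tuple can be passed directly to `Image.crop`, for example `image.crop(padded_crop_box(box, image.size, pad=10))`, and the cropped image can then be given to the processor in place of the full scene.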
 ## Usage
 ```python
+import torch
 from PIL import Image
+from huggingface_hub import hf_hub_download
 from transformers import AutoModel, CLIPProcessor
 model = AutoModel.from_pretrained("gilgmesh/gil-clip", trust_remote_code=True)
 processor = CLIPProcessor.from_pretrained("gilgmesh/gil-clip")
 model.eval()
+# Load the cropped example image straight from this repo
+example_path = hf_hub_download("gilgmesh/gil-clip", "example_top.png")
+image = Image.open(example_path).convert("RGB")
 texts = ["sleeveless navy top", "black dress", "graphic tee"]
 inputs = processor(text=texts, images=image, return_tensors="pt", padding=True)
 with torch.no_grad():
     outputs = model(**inputs)