Improve "Usage" section with correct tokenizer example (#2)

Commit: cfa1e16f340c543dd7871a103af7e2596b4cd0fe
Co-authored-by: Niels Rogge <nielsr@users.noreply.huggingface.co>
README.md CHANGED

@@ -1,7 +1,7 @@
 ---
 library_name: pytorch
 license: mit
-pipeline_tag:
+pipeline_tag: image-feature-extraction
 tags:
 - computer-vision
 - image-generation

@@ -65,11 +65,71 @@ We evaluate our approach across six generative models on ImageNet 256×256 and o
 
 ### Installation
 
-
-
-
-
-
+To use DeTok for extracting latent embeddings from images, you need to:
+
+1. **Clone the official DeTok repository**:
+```bash
+git clone https://github.com/Jiawei-Yang/DeTok.git
+cd DeTok
+pip install -r requirements.txt
+```
+2. **Download the pre-trained tokenizer weights**:
+You can download the `DeTok-BB-decoder_ft` checkpoint (recommended) from [here](https://huggingface.co/jjiaweiyang/l-DeTok/resolve/main/detok-BB-gamm3.0-m0.7-decoder_tuned.pth) and place it in your working directory (e.g., `detok-BB-gamm3.0-m0.7-decoder_tuned.pth`).
+
+### Extract latent embeddings
+
+Here's a sample Python code snippet for feature extraction using the `DeTok_BB` tokenizer:
+
+```python
+import torch
+from PIL import Image
+from torchvision import transforms
+from models.detok import DeTok_BB  # import from the cloned DeTok repository
+
+# --- Configuration (matching the DeTok-BB-decoder_ft architecture from the paper) ---
+model_params = {
+    "img_size": 256,
+    "patch_size": 16,
+    "in_chans": 3,
+    "embed_dim": 768,
+    "depths": [2, 2, 8, 2],
+    "num_heads": [3, 6, 12, 24],
+}
+tokenizer_weights_path = "detok-BB-gamm3.0-m0.7-decoder_tuned.pth"  # path to the downloaded weights
+
+# 1. Initialize and load the tokenizer
+tokenizer = DeTok_BB(**model_params).eval()
+if torch.cuda.is_available():
+    tokenizer = tokenizer.cuda()
+
+# Load the checkpoint state_dict
+checkpoint = torch.load(tokenizer_weights_path, map_location="cpu")
+tokenizer.load_state_dict(checkpoint["model"])
+
+# 2. Prepare your image
+transform = transforms.Compose([
+    transforms.Resize(model_params["img_size"]),
+    transforms.CenterCrop(model_params["img_size"]),
+    transforms.ToTensor(),
+    transforms.Normalize(mean=[0.5, 0.5, 0.5], std=[0.5, 0.5, 0.5]),
+])
+
+# Replace 'path/to/your/image.jpg' with your actual image file
+image = Image.new("RGB", (model_params["img_size"], model_params["img_size"]), color="red")  # example dummy image
+# image = Image.open("path/to/your/image.jpg").convert("RGB")
+
+pixel_values = transform(image).unsqueeze(0)  # add batch dimension
+
+if torch.cuda.is_available():
+    pixel_values = pixel_values.cuda()
+
+# 3. Extract latent embeddings
+with torch.no_grad():
+    latent_embeddings = tokenizer.encode(pixel_values)
+
+print(f"Shape of latent embeddings: {latent_embeddings.shape}")
+# Expected output for a 256x256 input image with 16x16 patches is (1, 256, 768),
+# representing 256 image patches with 768-dimensional embeddings.
 ```
 
 ## Training Details
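
The `(1, 256, 768)` latent shape quoted in the README snippet follows directly from the patch geometry in `model_params`. A minimal sanity check of that arithmetic (plain Python, no DeTok or GPU required; it only assumes the encoder emits one token per non-overlapping patch, as the snippet's comment states):

```python
# Recompute the expected latent shape for DeTok-BB from its configuration:
# a 256x256 image cut into non-overlapping 16x16 patches gives 16*16 = 256
# tokens, each a 768-dimensional embedding; batch size is 1.
img_size, patch_size, embed_dim = 256, 16, 768

tokens_per_side = img_size // patch_size        # 256 / 16 = 16 patches per side
num_tokens = tokens_per_side ** 2               # 16 * 16 = 256 patches total
expected_shape = (1, num_tokens, embed_dim)     # leading 1 is the batch dimension

print(expected_shape)  # (1, 256, 768)
```

If you change `img_size` or `patch_size`, the token count scales as `(img_size // patch_size) ** 2`, so the printed shape is a quick way to confirm a resized input will produce the sequence length you expect downstream.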
|