---
license: apache-2.0
tags:
- vision
---

# VisualSplit

**VisualSplit** is a ViT-based model that explicitly factorises an image into **classical visual descriptors**, such as **edges**, **color segmentation**, and a **grayscale histogram**, and learns to reconstruct the image conditioned on those descriptors. This design yields **interpretable representations** in which geometry (edges), albedo/appearance (segmented colors), and global tone (histogram) can be reasoned about or varied independently.

> **Training data**: ImageNet-1K.

---

## Model Description

- **Inputs** (at inference):
  - An RGB image, converted into the descriptors (edges, color segmentation, grayscale histogram) by the provided `FeatureExtractor`.
- **Outputs**:
  - A reconstructed RGB image tensor (same spatial size as the model's training resolution; default `224×224` unless you trained otherwise).

---

## Getting Started (Inference)

Below are two ways to run inference with the uploaded `model.safetensors`.

### 1) Minimal PyTorch + safetensors (load state dict)

```python
import torch
from huggingface_hub import hf_hub_download
from safetensors.torch import load_file

# 1) Import the model & config from the VisualSplit repo
from visualsplit.models.CrossViT import CrossViTForPreTraining, CrossViTConfig
from visualsplit.utils import FeatureExtractor

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

# 2) Build a config matching your training run (edit if you changed widths/depths)
config = CrossViTConfig(
    image_size=224,  # change if your training size differs
    patch_size=16,
    # ... any other config fields your repo exposes
)

model = CrossViTForPreTraining(config).to(device)
model.eval()

# 3) Download and load the state dict from this model repo.
#    Replace REPO_ID with your Hugging Face model id, e.g. "HenryQUQ/visualsplit"
ckpt_path = hf_hub_download(repo_id="REPO_ID", filename="model.safetensors")
state_dict = load_file(ckpt_path)
missing, unexpected = model.load_state_dict(state_dict, strict=False)
print("Missing keys:", missing)
print("Unexpected keys:", unexpected)

# 4) Prepare an input image and extract descriptors
from PIL import Image
from torchvision import transforms

image = Image.open("input.jpg").convert("RGB")
transform = transforms.Compose([
    transforms.Resize((config.image_size, config.image_size)),
    transforms.ToTensor(),
])
pixel_values = transform(image).unsqueeze(0).to(device)  # (1, 3, H, W)

# The FeatureExtractor provided by the repo returns the required descriptor tensors
extractor = FeatureExtractor().to(device)
with torch.no_grad():
    edge, gray_hist, segmented_rgb, _ = extractor(pixel_values)

# 5) Run inference (reconstruction)
with torch.no_grad():
    outputs = model(
        source_edge=edge,
        source_gray_level_histogram=gray_hist,
        source_segmented_rgb=segmented_rgb,
    )
# Your repo's forward returns may differ; adjust the key accordingly:
reconstructed = outputs["logits_reshape"]  # (1, 3, H, W)

# 6) Convert to PIL for visualisation
to_pil = transforms.ToPILImage()
recon_img = to_pil(reconstructed.squeeze(0).cpu().clamp(0, 1))
recon_img.save("reconstructed.png")
print("Saved to reconstructed.png")
```
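
Continuing from the script above, you can also save the extracted descriptors next to the reconstruction to see what the model was conditioned on. This is a minimal sketch that assumes `edge` and `segmented_rgb` are image-like `(1, C, H, W)` tensors in `[0, 1]`; check `FeatureExtractor` for the exact layouts.

```python
# Optional: save the descriptors alongside the reconstruction for inspection.
# Assumes (1, C, H, W) tensors in [0, 1]; verify against FeatureExtractor's
# actual output layout before relying on these images.
to_pil(edge.squeeze(0).cpu().clamp(0, 1)).save("edge.png")
to_pil(segmented_rgb.squeeze(0).cpu().clamp(0, 1)).save("segmented_rgb.png")
```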

### 2) Reproducing the notebook flow (`notebook/validation.ipynb`)

The repository provides a validation notebook that:
1. Loads the trained model,
2. Uses `FeatureExtractor` to compute **edges**, **color-segmented RGB**, and **grayscale histograms**,
3. Runs the model to obtain a reconstructed image,
4. Saves/visualises the result.
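
If you prefer to run the notebook headlessly, the sketch below uses `nbformat` and `nbclient` (an assumption: neither tool is mentioned by the repo, so install them separately):

```python
# Execute the validation notebook without opening Jupyter.
# Requires: pip install nbformat nbclient
import nbformat
from nbclient import NotebookClient

nb = nbformat.read("notebook/validation.ipynb", as_version=4)
NotebookClient(nb, timeout=600).execute()  # runs every cell in order
nbformat.write(nb, "notebook/validation.executed.ipynb")
```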

---

## Installation & Requirements

```bash
# clone the VisualSplit code
git clone https://github.com/HenryQUQ/VisualSplit.git
cd VisualSplit
pip install -e .

# inference dependencies used by the snippets above (check the repo's
# requirements for the authoritative list)
pip install torch torchvision safetensors huggingface_hub pillow
```
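
A quick smoke test that the install worked; the module paths below are the ones used by the inference snippet above:

```python
# These imports should succeed after `pip install -e .`
from visualsplit.models.CrossViT import CrossViTForPreTraining, CrossViTConfig
from visualsplit.utils import FeatureExtractor

print("VisualSplit imports OK")
```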

---

## Training Data

- **Dataset**: **ImageNet-1K**.

> This repository only hosts the **trained checkpoint for inference**. Follow the GitHub repo for the full training pipeline and data preparation scripts.

---

## Model Sources

- **Code**: https://github.com/HenryQUQ/VisualSplit
- **Weights (this page)**: this Hugging Face model repo

---

## Citation

If you use this model or its ideas, please cite:

```bibtex
@inproceedings{Qu2025VisualSplit,
  title     = {Exploring Image Representation with Decoupled Classical Visual Descriptors},
  author    = {Qu, Chenyuan and Chen, Hao and Jiao, Jianbo},
  booktitle = {British Machine Vision Conference (BMVC)},
  year      = {2025}
}
```

---