MODA-Fashion-Matryoshka

Flexible-dimension fashion embeddings: choose your own size, from 64 to 768.

MODA-Fashion-Matryoshka uses Matryoshka Representation Learning to produce nested embeddings in which the leading N dimensions are themselves a valid embedding. In practice, you can use 256-d for a 3× smaller index with virtually no quality loss, or 64-d for 12× compression while still beating FashionSigLIP.
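
Concretely, the 256-d embedding is simply the first 256 values of the 768-d vector, re-normalized. A minimal illustration of that contract (a random vector stands in for a real model output):

import numpy as np

full = np.random.randn(768)
full /= np.linalg.norm(full)                     # the 768-d embedding (unit norm)
e256 = full[:256] / np.linalg.norm(full[:256])   # leading 256 dims, re-normalized: a valid embedding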

Highlights

  • 256-d edges out 768-d on Fine R@1 (67.42 vs 67.21), a 3× index reduction for free
  • 128-d still beats FashionSigLIP-768 by +2.39 R@1 at a 6× smaller index
  • 64-d matches FashionSigLIP-768 (+0.21) at a 12× smaller index
  • INT8 quantization at 256-d is lossless, for 12× total compression over FashionSigLIP fp32

LookBench Results by Dimension

| Dim | Bytes/vec | Fine R@1 | vs FashionSigLIP-768 | Index (1M imgs) |
|-----|-----------|----------|----------------------|-----------------|
| 768 | 3,072     | 67.21    | +3.37                | 2.93 GB         |
| 512 | 2,048     | 66.75    | +2.91                | 1.95 GB         |
| 384 | 1,536     | 67.07    | +3.23                | 1.46 GB         |
| 256 | 1,024     | 67.42    | +3.58                | 977 MB          |
| 128 | 512       | 66.23    | +2.39                | 488 MB          |
| 64  | 256       | 64.05    | +0.21                | 244 MB          |
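
The bytes-per-vector and index columns are plain arithmetic: fp32 stores 4 bytes per dimension. A quick sanity check, assuming the table reports binary megabytes (1 MB = 2**20 bytes):

for dim in (768, 512, 384, 256, 128, 64):
    bytes_per_vec = dim * 4                       # fp32: 4 bytes per dimension
    index_mb = bytes_per_vec * 1_000_000 / 2**20  # 1M vectors
    print(f"{dim:>3}-d: {bytes_per_vec:>5,} B/vec -> {index_mb:,.0f} MB per 1M images")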

Quantization at 256-d (full LookBench protocol)

| Precision       | Bytes/vec | Fine R@1 | Index (1M imgs) |
|-----------------|-----------|----------|-----------------|
| fp32            | 1,024     | 67.16    | 977 MB          |
| fp16            | 512       | 67.08    | 488 MB          |
| int8            | 256       | 67.16    | 244 MB          |
| binary+rerank   | 32 + 512  | 67.29    | ~519 MB         |
| binary (pure)   | 32        | 63.50    | 30 MB           |
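
The binary+rerank row pairs a 32-byte sign-binarized code (256 bits) for a fast Hamming-distance shortlist with fp16 vectors (512 bytes at 256-d) for exact reranking; 32 + 512 bytes over 1M images gives the ~519 MB figure. The exact protocol is not spelled out here, so the following is only a sketch of that two-stage idea; the shortlist size `candidates` is an assumption:

import numpy as np

def binarize(emb):
    # Sign binarization: 256 floats -> 256 bits -> 32 bytes per vector
    return np.packbits(emb > 0, axis=1)

def search(query, codes, fp16_db, candidates=100):
    # query: L2-normalized float32 [256]; codes: uint8 [N, 32]; fp16_db: float16 [N, 256]
    q_code = binarize(query[None])[0]
    # Stage 1: Hamming distance via XOR + bit count over the packed codes
    hamming = np.unpackbits(codes ^ q_code, axis=1).sum(axis=1)
    shortlist = np.argsort(hamming)[:candidates]
    # Stage 2: exact cosine rerank on the fp16 vectors
    scores = fp16_db[shortlist].astype(np.float32) @ query
    return shortlist[np.argsort(-scores)]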

Model Spec

| Property                  | Value                                            |
|---------------------------|--------------------------------------------------|
| Architecture              | ViT-B/16-SigLIP (full CLIP: vision + text)       |
| Parameters                | 203.2M                                           |
| Embedding Dimension       | 768 (full) or 512 / 384 / 256 / 128 / 64         |
| Recommended Dimension     | 256 (sweet spot: best R@1, 3× smaller index)     |
| Output                    | L2-normalized float32 vector                     |
| Model Size (safetensors)  | ~775 MB                                          |
| Input Resolution          | 224 × 224                                        |
| Framework                 | OpenCLIP                                         |
| Precision                 | float32                                          |

Inference: Quick Start

A standalone inference.py is included in this directory. It supports dimension selection and a full sweep.

# Default: encode at 256-d (recommended)
python inference.py --image query.jpg

# Specify dimension
python inference.py --image query.jpg --dim 128

# Two images + cosine similarity at 256-d
python inference.py --image img1.jpg img2.jpg --dim 256 --similarity

# Sweep all dimensions (64 → 768) with index cost
python inference.py --image img1.jpg --sweep

# Run on GPU/MPS
python inference.py --image query.jpg --device cuda

Python API: Truncate to Any Dimension

import open_clip
import torch
import torch.nn.functional as F
from PIL import Image

model, _, preprocess = open_clip.create_model_and_transforms(
    "ViT-B-16-SigLIP",
    pretrained="path/to/moda-fashion-matryoshka/open_clip_model.safetensors",
)
model.eval()

image = preprocess(Image.open("query.jpg")).unsqueeze(0)
with torch.no_grad():
    full_emb = model.encode_image(image)      # [1, 768]

dim = 256
emb_256 = F.normalize(full_emb[:, :dim], dim=-1)  # truncate, then re-normalize -> [1, 256]
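
Because the checkpoint is a full CLIP (vision + text), the same truncation works for text-to-image retrieval. A short sketch continuing from the code above; the example captions are placeholders:

tokenizer = open_clip.get_tokenizer("ViT-B-16-SigLIP")
texts = tokenizer(["red floral summer dress", "black leather jacket"])
with torch.no_grad():
    text_emb = model.encode_text(texts)            # [2, 768]
text_256 = F.normalize(text_emb[:, :256], dim=-1)  # same 256-d slice on the text side
scores = emb_256 @ text_256.T                      # cosine similarities, shape [1, 2]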

INT8 Quantization

import numpy as np

with torch.no_grad():
    emb = model.encode_image(images)[:, :256]    # truncate to the 256-d slice
emb = F.normalize(emb, dim=-1).cpu().numpy()

# Per-vector min-max quantization to uint8; keep emb_min/emb_max for dequantization
emb_min = emb.min(axis=1, keepdims=True)
emb_max = emb.max(axis=1, keepdims=True)
emb_int8 = np.round((emb - emb_min) / (emb_max - emb_min + 1e-8) * 255).astype(np.uint8)
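
At search time, dequantize back to float before scoring (or use an int8-aware index). A minimal sketch of the inverse transform, continuing from the block above and assuming emb_min and emb_max were stored alongside the codes:

# Invert the per-vector min-max quantization above
scale = (emb_max - emb_min + 1e-8) / 255.0
emb_restored = emb_int8.astype(np.float32) * scale + emb_min
# Re-normalize so dot products are cosine similarities again
emb_restored /= np.linalg.norm(emb_restored, axis=1, keepdims=True)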

Requirements

open_clip_torch>=2.20.0
torch>=2.0
Pillow
safetensors

Training Details

  • Base model: MODA-Fashion-Distilled (the 768-d distilled student)
  • Method: Matryoshka Representation Learning with 6 nested slices {64, 128, 256, 384, 512, 768}
  • Loss: Per-slice RKD-Distance + similarity mimicry against the 2048-d ensemble teacher (sketched below)
  • Optimizer: AdamW, LR=5e-6, batch=128
  • Epochs: 2 (best at step 1000)
  • Wall time: 255 minutes on Apple M-series (MPS)
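
For intuition, here is a minimal sketch of what a per-slice RKD-Distance plus similarity-mimicry objective could look like. This is not the released training code: the slice set matches the list above, but the distance normalization, the smooth-L1 choice for RKD, and the equal weighting of the two terms are assumptions.

import torch
import torch.nn.functional as F

def pairwise_dist(x):
    # Pairwise Euclidean distances, normalized by their mean (RKD-Distance convention)
    d = torch.cdist(x, x)
    return d / (d[d > 0].mean() + 1e-8)

def matryoshka_rkd_loss(student, teacher, slices=(64, 128, 256, 384, 512, 768)):
    t = F.normalize(teacher, dim=-1)                 # [B, 2048] ensemble-teacher embeddings
    t_dist, t_sim = pairwise_dist(t), t @ t.T
    loss = 0.0
    for d in slices:
        s = F.normalize(student[:, :d], dim=-1)      # nested slice of the student embedding
        loss = loss + F.smooth_l1_loss(pairwise_dist(s), t_dist)  # per-slice RKD-Distance
        loss = loss + F.mse_loss(s @ s.T, t_sim)                  # similarity mimicry
    return loss / len(slices)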

Related Models

| Model                                | Dim    | Fine R@1       | Best for                        |
|--------------------------------------|--------|----------------|---------------------------------|
| MODA-Fashion-Distilled               | 768    | 67.63          | Best overall quality            |
| MODA-Fashion-Matryoshka (this model) | 64-768 | 67.42 (256-d)  | Flexible dim, 3× smaller index  |
| MODA-Fashion-Vision-FP16             | 768    | 67.42          | Smallest (186 MB), edge/mobile  |
| MODA-Fashion-Distilled-512d          | 512    | 67.63          | Compact index, highest nDCG@5   |
| MODA-Fashion-DeepFashion2            | 768    | 66.52          | Simplest recipe, no distillation |

License

MIT

Citation

If you use this model, please cite:

@software{moda2026,
  title  = {MODA: Open-source benchmark and models for fashion search},
  author = {Hopit AI},
  year   = {2026},
  url    = {https://github.com/hopit-ai/Moda}
}