Based on the paper [Matryoshka Representation Learning](https://arxiv.org/abs/2205.13147).
Flexible-dimension fashion embeddings: choose your own size, from 64 to 768.
MODA-Fashion-Matryoshka uses Matryoshka Representation Learning to produce nested embeddings in which the leading N dimensions are themselves a valid embedding. This means you can use 256-d for a 3× smaller index with no quality loss (256-d actually has the best Fine R@1 in the table below), or 64-d for 12× compression while still beating FashionSigLIP-768.
| Dim | Bytes/vec (fp32) | Fine R@1 | Δ R@1 vs FashionSigLIP-768 | Index size (1M images) |
|---|---|---|---|---|
| 768 | 3,072 | 67.21 | +3.37 | 2.93 GB |
| 512 | 2,048 | 66.75 | +2.91 | 1.95 GB |
| 384 | 1,536 | 67.07 | +3.23 | 1.46 GB |
| 256 | 1,024 | 67.42 | +3.58 | 977 MB |
| 128 | 512 | 66.23 | +2.39 | 488 MB |
| 64 | 256 | 64.05 | +0.21 | 244 MB |
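The index sizes fall straight out of float32 storage, 4 bytes per dimension per vector; a quick sanity check (the table's sizes are MiB-based):

```python
def index_size_mib(dim: int, n_vectors: int = 1_000_000) -> float:
    """Flat float32 index size in MiB: 4 bytes per dimension per vector."""
    return n_vectors * dim * 4 / 2**20

print(f"{index_size_mib(256):.0f} MiB")  # 977 -> matches the 256-d row
print(f"{index_size_mib(64):.0f} MiB")   # 244 -> matches the 64-d row
```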
All rows below use the recommended 256-d embeddings; "binary+rerank" keeps 32-byte binary codes for search plus 512-byte fp16 vectors for reranking.

| Precision | Bytes/vec | Fine R@1 | Index size (1M images) |
|---|---|---|---|
| fp32 | 1,024 | 67.16 | 977 MB |
| fp16 | 512 | 67.08 | 488 MB |
| int8 | 256 | 67.16 | 244 MB |
| binary+rerank | 32+512 | 67.29 | ~519 MB |
| binary (pure) | 32 | 63.50 | 30 MB |
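The binary+rerank row corresponds to a two-stage pipeline: sign-binarize each 256-d vector into a 32-byte code for a fast Hamming-distance shortlist, then rerank the shortlist against fp16 copies. A minimal NumPy sketch of that idea; `db` and `q` are illustrative names for L2-normalized 256-d embeddings, not part of this repo:

```python
import numpy as np

# Offline: pack the database once.
# `db`: (n, 256) float32, L2-normalized 256-d embeddings (illustrative name)
db_bits = np.packbits(db > 0, axis=-1)   # (n, 32) uint8  -> 32 bytes/vector
db_fp16 = db.astype(np.float16)          # (n, 256) fp16  -> 512 bytes/vector

def search(q: np.ndarray, shortlist: int = 100, k: int = 10) -> np.ndarray:
    """Two-stage retrieval: Hamming shortlist, then fp16 cosine rerank."""
    q_bits = np.packbits(q > 0)          # (32,) packed query code
    # Stage 1: Hamming distance = popcount(xor) over the packed codes
    ham = np.unpackbits(np.bitwise_xor(db_bits, q_bits), axis=-1).sum(axis=-1)
    cand = np.argpartition(ham, shortlist)[:shortlist]
    # Stage 2: exact cosine on the fp16 candidates (vectors are unit-norm)
    scores = db_fp16[cand].astype(np.float32) @ q.astype(np.float32)
    return cand[np.argsort(-scores)[:k]]
```

At 32 + 512 bytes per vector, 1M images come to roughly 519 MiB, which is the "~519 MB" row above.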
| Property | Value |
|---|---|
| Architecture | ViT-B/16-SigLIP (full CLIP: vision + text) |
| Parameters | 203.2M |
| Embedding Dimension | 768 (full) or 512 / 384 / 256 / 128 / 64 |
| Recommended Dimension | 256 (sweet spot: best Fine R@1, 3× smaller index) |
| Output | L2-normalized float32 vector |
| Model Size (safetensors) | ~775 MB |
| Input Resolution | 224 × 224 |
| Framework | OpenCLIP |
| Precision | float32 |
A standalone `inference.py` is included in this directory; it supports dimension selection and a full sweep across dimensions.
```bash
# Default: encode at 256-d (recommended)
python inference.py --image query.jpg

# Specify dimension
python inference.py --image query.jpg --dim 128

# Two images + cosine similarity at 256-d
python inference.py --image img1.jpg img2.jpg --dim 256 --similarity

# Sweep all dimensions (64 → 768) with index cost
python inference.py --image img1.jpg --sweep

# Run on GPU/MPS
python inference.py --image query.jpg --device cuda
```
Or use the model directly with OpenCLIP:

```python
import open_clip
import torch
import torch.nn.functional as F
from PIL import Image

model, _, preprocess = open_clip.create_model_and_transforms(
    "ViT-B-16-SigLIP",
    pretrained="path/to/moda-fashion-matryoshka/open_clip_model.safetensors",
)
model.eval()

image = preprocess(Image.open("query.jpg")).unsqueeze(0)
with torch.no_grad():
    full_emb = model.encode_image(image)  # [1, 768]

# Matryoshka: truncate to the leading dims, then re-normalize
dim = 256
emb_256 = F.normalize(full_emb[:, :dim], dim=-1)  # [1, 256]
```
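Because the checkpoint ships both towers (full CLIP, vision + text), text→image queries work the same way: truncate and renormalize both sides to the same dimension. A short sketch continuing from the snippet above; the query string is illustrative:

```python
tokenizer = open_clip.get_tokenizer("ViT-B-16-SigLIP")

with torch.no_grad():
    tokens = tokenizer(["red floral summer dress"])
    text_emb = model.encode_text(tokens)             # [1, 768]

# Truncate + renormalize BOTH sides to the same dimension
text_256 = F.normalize(text_emb[:, :256], dim=-1)    # [1, 256]
cos_sim = (emb_256 @ text_256.T).item()              # image-text cosine similarity
```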
To shrink the index further, quantize to 8-bit with per-vector min-max scaling:

```python
import numpy as np

# `images`: a preprocessed batch, e.g. torch.stack([preprocess(img) for img in pil_images])
with torch.no_grad():
    emb = model.encode_image(images)[:, :256]
    emb = F.normalize(emb, dim=-1).numpy()

# Per-vector min-max quantization to 8-bit (keep emb_min/emb_max to dequantize later)
emb_min = emb.min(axis=1, keepdims=True)
emb_max = emb.max(axis=1, keepdims=True)
emb_int8 = np.round((emb - emb_min) / (emb_max - emb_min + 1e-8) * 255).astype(np.uint8)
```
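To search against the quantized vectors, dequantize on the fly with the stored per-vector min/max (two extra floats per vector); a sketch:

```python
# Invert the min-max mapping above (emb_min/emb_max were kept from encoding)
emb_restored = emb_int8.astype(np.float32) / 255.0 * (emb_max - emb_min) + emb_min
# Re-normalize: rounding slightly perturbs the unit norm
emb_restored /= np.linalg.norm(emb_restored, axis=1, keepdims=True)
```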
Dependencies:

```
open_clip_torch>=2.20.0
torch>=2.0
Pillow
safetensors
```
| Model | Dim | Fine R@1 | Best for |
|---|---|---|---|
| MODA-Fashion-Distilled | 768 | 67.63 | Best overall quality |
| MODA-Fashion-Matryoshka (this model) | 64–768 | 67.42 (256-d) | Flexible dimension, 3× smaller index |
| MODA-Fashion-Vision-FP16 | 768 | 67.42 | Smallest (186 MB), edge/mobile |
| MODA-Fashion-Distilled-512d | 512 | 67.63 | Compact index, highest nDCG@5 |
| MODA-Fashion-DeepFashion2 | 768 | 66.52 | Simplest recipe, no distillation |
License: MIT
If you use this model, please cite:
```bibtex
@software{moda2026,
  title  = {MODA: Open-source benchmark and models for fashion search},
  author = {Hopit AI},
  year   = {2026},
  url    = {https://github.com/hopit-ai/Moda}
}
```