Tianmu-MERE

Tianmu-MERE (Multimodal E-commerce Retrieval Embedding) is a multimodal embedding model designed for product understanding and retrieval in real-world e-commerce scenarios.

The model learns unified representations for product images, livestream video frames, and textual descriptions within a shared embedding space, enabling fine-grained product retrieval and cross-modal matching.

Architecture

Tianmu-MERE adopts a dual-encoder architecture consisting of a SigLIP2-based vision encoder and a BGE-based text encoder.

The two encoders are optimized jointly, so that a product image, a livestream video frame, and a textual description of the same product map to nearby points in one shared embedding space.

Highlights

Unified embedding space for product images, livestream video frames, and text.
Designed for real-world e-commerce product retrieval.
Fine-grained product matching: Fine@1 of 58.21 to 79.27 across the four LookBench subsets.
Supports image-to-image, text-to-image, and video-frame-to-product retrieval.
Trained on large-scale Chinese e-commerce data.
Achieves Coarse@1 of 77.78 to 93.78 across the four LookBench subsets.

Benchmark Results

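Tianmu-MERE is evaluated on four LookBench subsets: two real-world (Real Studio, Real StreetLook) and two AI-generated (AI-Gen StreetLook, AI-Gen Studio). The table reports Coarse@K, Fine@K, and nDCG@K for K = 1, 5, 10, and 20. Real StreetLook scores lowest in every column.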

Dataset	Coarse@1	Coarse@5	Coarse@10	Coarse@20	Fine@1	Fine@5	Fine@10	Fine@20	nDCG@1	nDCG@5	nDCG@10	nDCG@20
Real Studio	91.99	96.14	97.92	98.32	69.63	87.24	90.60	93.77	63.24	55.75	53.47	52.71
Real StreetLook	77.78	87.36	89.70	91.85	58.21	75.13	79.10	82.57	49.84	46.74	44.36	42.82
AI-Gen StreetLook	88.75	97.50	98.75	98.75	78.75	92.50	95.62	96.25	69.48	69.16	70.35	72.62
AI-Gen Studio	93.78	98.96	99.48	100.00	79.27	93.26	96.37	98.45	66.32	66.57	66.96	70.76
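
For readers who want to compute comparable numbers on their own data, the following is a minimal sketch of top-K hit rate and nDCG@K. It rests on assumptions: LookBench's exact definitions are not given here, so Coarse@K and Fine@K are treated as hit rates under coarse and fine relevance labels, nDCG uses linear gains with the usual log2 discount, the ideal ranking is taken over the retrieved list only, and scores are scaled to 0-100. The function names and the toy labels are hypothetical.

```python
import math


def hit_at_k(ranked_relevance, k):
    """1.0 if any of the top-k results is relevant, else 0.0."""
    return 1.0 if any(r > 0 for r in ranked_relevance[:k]) else 0.0


def dcg_at_k(gains, k):
    """Discounted cumulative gain with a log2 position discount."""
    return sum(g / math.log2(rank + 2) for rank, g in enumerate(gains[:k]))


def ndcg_at_k(ranked_gains, k):
    """DCG of the ranking divided by the DCG of the same list sorted ideally."""
    ideal = dcg_at_k(sorted(ranked_gains, reverse=True), k)
    return dcg_at_k(ranked_gains, k) / ideal if ideal > 0 else 0.0


def mean_percent(values):
    """Average per-query scores and scale to 0-100 (assumed table scale)."""
    return 100.0 * sum(values) / len(values)


# Toy example: graded relevance of the ranked results for two queries.
# Hypothetical labels: 2 = same product, 1 = same category, 0 = unrelated.
queries = [
    [1, 2, 0, 0],
    [0, 0, 1, 0],
]
coarse_at_1 = mean_percent([hit_at_k(q, 1) for q in queries])
fine_at_2 = mean_percent([hit_at_k([g == 2 for g in q], 2) for q in queries])
ndcg_at_2 = mean_percent([ndcg_at_k(q, 2) for q in queries])
```

Some nDCG variants use exponential gains (2^g - 1) or compute the ideal ranking over the whole gallery, so absolute values from this sketch may not match the benchmark's official scorer.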

Repository Structure

Tianmu-MERE/
├── README.md
├── config.json
├── model.safetensors
├── modeling_tianmu_mere.py
├── vision_encoder/
│   └── config.json
├── text_encoder/
│   └── config.json
└── processor/
    ├── preprocessor_config.json
    ├── tokenizer.json
    ├── tokenizer_config.json
    ├── special_tokens_map.json
    └── vocab.txt

model.safetensors: trained Tianmu-MERE student weights.
config.json: package metadata and local paths.
modeling_tianmu_mere.py: minimal inference wrapper.
vision_encoder/config.json: vision tower architecture config.
text_encoder/config.json: text tower architecture config.
processor/: image processor and tokenizer files.
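
Before loading the model, it can help to confirm that a local copy matches the layout above. The following is a minimal sketch using only the file names listed in the tree; `missing_files` is a hypothetical helper, not part of the repository.

```python
from pathlib import Path

# Files named in the repository layout above.
EXPECTED_FILES = [
    "config.json",
    "model.safetensors",
    "modeling_tianmu_mere.py",
    "vision_encoder/config.json",
    "text_encoder/config.json",
    "processor/preprocessor_config.json",
    "processor/tokenizer.json",
    "processor/tokenizer_config.json",
    "processor/special_tokens_map.json",
    "processor/vocab.txt",
]


def missing_files(model_dir):
    """Return the expected files that are absent under model_dir (hypothetical helper)."""
    root = Path(model_dir)
    return [name for name in EXPECTED_FILES if not (root / name).is_file()]


if __name__ == "__main__":
    missing = missing_files("./Tianmu-MERE")
    print("missing files:", missing or "none")
```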

Requirements

pip install torch torchvision transformers safetensors pillow

Usage

Clone or download this repository, then load the local inference wrapper:

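from pathlib import Path
import importlib.util
import sys

# Path to the cloned or downloaded repository
model_dir = Path("./Tianmu-MERE")
module_path = model_dir / "modeling_tianmu_mere.py"

# Import the wrapper module directly from its file path
spec = importlib.util.spec_from_file_location("modeling_tianmu_mere", module_path)
module = importlib.util.module_from_spec(spec)
sys.modules[spec.name] = module  # register the module under its name before executing it
spec.loader.exec_module(module)

# Load the weights and switch to evaluation mode
model = module.TianmuMEREModel.from_pretrained(model_dir).eval()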

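Example image embedding ("product.jpg" is a placeholder path; this assumes encode_image accepts a PIL image):

from PIL import Image
image_embedding = model.encode_image(Image.open("product.jpg").convert("RGB"))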

Example text embedding:

text_embedding = model.encode_text("V领无袖碎花连衣裙")

Similarity computation (a dot product, which equals cosine similarity when both embeddings are L2-normalized):

similarity = image_embedding @ text_embedding.T
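
For retrieval, the same similarity is computed against a whole gallery and the highest-scoring items are kept. The following is a dependency-free sketch of that step using plain Python lists as stand-ins for embeddings; `l2_normalize`, `top_k`, and the toy vectors are hypothetical and not part of the released wrapper.

```python
import math


def l2_normalize(vec):
    """Scale a vector to unit length so dot products become cosine similarities."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec] if norm > 0 else list(vec)


def top_k(query, gallery, k):
    """Return (index, cosine similarity) for the k gallery vectors closest to query."""
    q = l2_normalize(query)
    scores = [sum(a * b for a, b in zip(q, l2_normalize(item))) for item in gallery]
    order = sorted(range(len(gallery)), key=lambda i: scores[i], reverse=True)
    return [(i, scores[i]) for i in order[:k]]


# Stand-in vectors; in practice these would come from encode_text / encode_image.
text_query = [1.0, 0.0, 0.0]
image_gallery = [
    [0.0, 1.0, 0.0],  # orthogonal to the query
    [2.0, 0.1, 0.0],  # nearly parallel to the query
    [1.0, 1.0, 0.0],  # 45 degrees from the query
]
results = top_k(text_query, image_gallery, k=2)
```

With PyTorch tensors, `torch.nn.functional.normalize(x, dim=-1)` followed by a matrix product and `torch.topk` performs the same ranking in batch.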

Training Data

Tianmu-MERE is trained on large-scale Chinese e-commerce data consisting of:

Product images.
Livestream video frames.
Product titles.
Product attribute descriptions.

The training objective focuses on multimodal alignment and fine-grained product retrieval.
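
The exact loss is not described here. As a generic illustration of multimodal alignment only, the following sketch shows a CLIP-style symmetric softmax contrastive loss over a batch similarity matrix. This is an assumption, not the authors' objective: SigLIP-family encoders are usually trained with a sigmoid pairwise loss, and the temperature value and function names below are hypothetical.

```python
import math


def cross_entropy_rows(logits):
    """Mean of -log softmax(row)[i] over rows, with the target on the diagonal."""
    total = 0.0
    for i, row in enumerate(logits):
        m = max(row)  # subtract the max for numerical stability
        log_sum = m + math.log(sum(math.exp(x - m) for x in row))
        total += log_sum - row[i]
    return total / len(logits)


def symmetric_contrastive_loss(sim, temperature=0.07):
    """sim[i][j] is the cosine similarity between image i and text j of a batch."""
    logits = [[s / temperature for s in row] for row in sim]
    transposed = [list(col) for col in zip(*logits)]
    # Average the image-to-text and text-to-image directions.
    return 0.5 * (cross_entropy_rows(logits) + cross_entropy_rows(transposed))


aligned = [[1.0, 0.0], [0.0, 1.0]]     # matching pairs score highest
misaligned = [[0.0, 1.0], [1.0, 0.0]]  # matching pairs score lowest
loss_aligned = symmetric_contrastive_loss(aligned)
loss_misaligned = symmetric_contrastive_loss(misaligned)
```

The loss is near zero when each image is most similar to its own text and grows as matching pairs lose out to mismatched ones, which is the behavior an alignment objective is meant to reward.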

Limitations

Tianmu-MERE is primarily optimized for Chinese e-commerce product retrieval. The current release focuses on fashion and general merchandise scenarios. Performance may degrade on domains that differ significantly from the training data.

License

This project is released under the Apache License 2.0.

Citation

If you find Tianmu-MERE useful in your research or applications, please cite:

@misc{tianmu_mere_2026,
  title = {Tianmu-MERE: Multimodal E-commerce Retrieval Embedding},
  author = {{TianmuLab, Kuaishou Technology}},
  year = {2026}
}

Model Details

Model size: 1B params
Tensor type: F32 (Safetensors)
Base model: BAAI/bge-base-zh-v1.5 (fine-tuned)