Tianmu-MERE

Tianmu-MERE (Multimodal E-commerce Retrieval Embedding) is a multimodal embedding model designed for product understanding and retrieval in real-world e-commerce scenarios.

The model learns unified representations for product images, livestream video frames, and textual descriptions within a shared embedding space, enabling fine-grained product retrieval and cross-modal matching.

Architecture

Tianmu-MERE adopts a dual-encoder architecture consisting of a SigLIP2-based vision encoder and a BGE-based text encoder.

The two encoders are trained jointly so that product images, livestream video frames, and textual descriptions of the same product map to nearby points in the shared embedding space, which is what enables fine-grained retrieval.

Highlights

  • Unified embedding space for product images, livestream video frames, and text.
  • Designed for real-world e-commerce product retrieval.
  • Strong fine-grained product matching capability.
  • Supports image-to-image, text-to-image, and video-frame-to-product retrieval.
  • Trained on large-scale Chinese e-commerce data.
  • Reaches 77.78 to 93.78 Coarse@1 and 58.21 to 79.27 Fine@1 across the four LookBench subsets.

Benchmark Results

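Tianmu-MERE is evaluated on four LookBench subsets that cover real-world and AI-generated images in studio and street-look settings. The table reports Coarse@k, Fine@k, and nDCG@k at k = 1, 5, 10, and 20, with all scores on a 0 to 100 scale. Real StreetLook is the hardest subset, scoring lowest on every metric.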

| Dataset | Coarse@1 | Coarse@5 | Coarse@10 | Coarse@20 | Fine@1 | Fine@5 | Fine@10 | Fine@20 | nDCG@1 | nDCG@5 | nDCG@10 | nDCG@20 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Real Studio | 91.99 | 96.14 | 97.92 | 98.32 | 69.63 | 87.24 | 90.60 | 93.77 | 63.24 | 55.75 | 53.47 | 52.71 |
| Real StreetLook | 77.78 | 87.36 | 89.70 | 91.85 | 58.21 | 75.13 | 79.10 | 82.57 | 49.84 | 46.74 | 44.36 | 42.82 |
| AI-Gen StreetLook | 88.75 | 97.50 | 98.75 | 98.75 | 78.75 | 92.50 | 95.62 | 96.25 | 69.48 | 69.16 | 70.35 | 72.62 |
| AI-Gen Studio | 93.78 | 98.96 | 99.48 | 100.00 | 79.27 | 93.26 | 96.37 | 98.45 | 66.32 | 66.57 | 66.96 | 70.76 |
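
For readers less familiar with ranking metrics, the following is a minimal sketch of a hit-rate style @k metric and of the standard nDCG@k. It is an illustration only: the exact LookBench definitions of Coarse@k and Fine@k are not given in this document, so `hit_at_k` is an assumed stand-in, and the relevance grades in `ranked` are made up. The ideal ordering is computed from the retrieved list alone, which is a simplification.

```python
import math

def hit_at_k(ranked_relevance, k):
    """Return 1.0 if any of the top-k results is relevant, else 0.0."""
    return 1.0 if any(rel > 0 for rel in ranked_relevance[:k]) else 0.0

def dcg_at_k(ranked_relevance, k):
    """Discounted cumulative gain over the top-k graded relevance scores."""
    return sum(rel / math.log2(rank + 2)
               for rank, rel in enumerate(ranked_relevance[:k]))

def ndcg_at_k(ranked_relevance, k):
    """DCG divided by the DCG of the ideal (descending) ordering."""
    ideal = dcg_at_k(sorted(ranked_relevance, reverse=True), k)
    return dcg_at_k(ranked_relevance, k) / ideal if ideal > 0 else 0.0

# Hypothetical graded relevance of one query's ranked results, best rank first.
ranked = [0, 2, 1, 0, 1]
print(hit_at_k(ranked, 1), hit_at_k(ranked, 5))
print(round(100 * ndcg_at_k(ranked, 5), 2))
```

Averaging these per-query values over all queries and multiplying by 100 would give numbers on the same scale as the table.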

Repository Structure

Tianmu-MERE/
├── README.md
├── config.json
├── model.safetensors
├── modeling_tianmu_mere.py
├── vision_encoder/
│   └── config.json
├── text_encoder/
│   └── config.json
└── processor/
    ├── preprocessor_config.json
    ├── tokenizer.json
    ├── tokenizer_config.json
    ├── special_tokens_map.json
    └── vocab.txt

  • model.safetensors: trained Tianmu-MERE student weights.
  • config.json: package metadata and local paths.
  • modeling_tianmu_mere.py: minimal inference wrapper.
  • vision_encoder/config.json: vision tower architecture config.
  • text_encoder/config.json: text tower architecture config.
  • processor/: image processor and tokenizer files.
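
Before loading the model, it can help to confirm that a local download is complete. The sketch below checks for the files listed in the repository tree above; the helper name `missing_files` and the default directory `./Tianmu-MERE` are illustrative choices, not part of the released code.

```python
from pathlib import Path

# Relative paths taken from the repository tree above.
EXPECTED_FILES = [
    "config.json",
    "model.safetensors",
    "modeling_tianmu_mere.py",
    "vision_encoder/config.json",
    "text_encoder/config.json",
    "processor/preprocessor_config.json",
    "processor/tokenizer.json",
    "processor/tokenizer_config.json",
    "processor/special_tokens_map.json",
    "processor/vocab.txt",
]

def missing_files(model_dir):
    """Return the expected files that are absent under model_dir."""
    root = Path(model_dir)
    return [rel for rel in EXPECTED_FILES if not (root / rel).is_file()]

if __name__ == "__main__":
    missing = missing_files("./Tianmu-MERE")
    print("all files present" if not missing else f"missing: {missing}")
```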

Requirements

pip install torch torchvision transformers safetensors pillow

Usage

Clone or download this repository, then load the local inference wrapper:

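from pathlib import Path
import importlib.util
import sys

# Path to the local copy of this repository.
model_dir = Path("./Tianmu-MERE")
module_path = model_dir / "modeling_tianmu_mere.py"

# Import the wrapper directly from its file, since it is not an installed package.
spec = importlib.util.spec_from_file_location("modeling_tianmu_mere", module_path)
module = importlib.util.module_from_spec(spec)
sys.modules[spec.name] = module  # register the module before executing it
spec.loader.exec_module(module)

# Load the weights and switch to inference mode.
model = module.TianmuMEREModel.from_pretrained(model_dir).eval()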

Example image embedding:

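from PIL import Image

# "product.jpg" is a placeholder path; this assumes encode_image accepts a PIL image.
image = Image.open("product.jpg").convert("RGB")
image_embedding = model.encode_image(image)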

Example text embedding:

text_embedding = model.encode_text("V领无袖碎花连衣裙")  # "V-neck sleeveless floral dress"

Similarity computation:

similarity = image_embedding @ text_embedding.T
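
The dot product above equals cosine similarity only if the embeddings are unit-length, and this document does not say whether `encode_image` and `encode_text` normalize their outputs. The sketch below normalizes explicitly and ranks a small gallery against one query. The vectors are stand-ins, the helper names are illustrative, and it assumes the model outputs can be converted to NumPy arrays (for example via `.detach().cpu().numpy()` if they are PyTorch tensors).

```python
import numpy as np

def l2_normalize(x, eps=1e-12):
    """Scale vectors to unit length so dot products become cosine similarities."""
    x = np.asarray(x, dtype=np.float32)
    return x / np.maximum(np.linalg.norm(x, axis=-1, keepdims=True), eps)

def top_k(query, gallery, k=3):
    """Return (indices, scores) of the k gallery rows most similar to query."""
    scores = l2_normalize(gallery) @ l2_normalize(query)
    order = np.argsort(-scores)[:k]  # highest cosine similarity first
    return order, scores[order]

# Stand-in vectors; in practice these would come from encode_image / encode_text.
gallery = np.array([[1.0, 0.0], [0.6, 0.8], [0.0, 2.0], [-1.0, 0.0]])
query = np.array([0.0, 1.0])
indices, scores = top_k(query, gallery, k=2)
print(indices, scores)
```

The same routine covers text-to-image and image-to-image retrieval, since both modalities share one embedding space.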

Training Data

Tianmu-MERE is trained on large-scale Chinese e-commerce data consisting of:

  • Product images.
  • Livestream video frames.
  • Product titles.
  • Product attribute descriptions.

The training objective focuses on multimodal alignment and fine-grained product retrieval.
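
The exact loss used to train Tianmu-MERE is not specified in this document. As a generic illustration of what multimodal alignment usually means for a dual encoder, the sketch below computes a symmetric InfoNCE-style contrastive loss over a batch of image-text similarity scores. The function names and the temperature value are assumptions chosen for the example and are not the authors' training objective.

```python
import math

def log_softmax(row):
    """Numerically stable log-softmax of a list of logits."""
    m = max(row)
    lse = m + math.log(sum(math.exp(v - m) for v in row))
    return [v - lse for v in row]

def symmetric_contrastive_loss(sim, temperature=0.07):
    """sim[i][j] is the similarity of image i and text j; pairs (i, i) match."""
    n = len(sim)
    logits = [[v / temperature for v in row] for row in sim]
    logits_t = [list(col) for col in zip(*logits)]  # text-to-image direction
    i2t = -sum(log_softmax(logits[i])[i] for i in range(n)) / n
    t2i = -sum(log_softmax(logits_t[i])[i] for i in range(n)) / n
    return 0.5 * (i2t + t2i)

# Hypothetical similarity matrices for a batch of two image-text pairs.
aligned = [[1.0, 0.0], [0.0, 1.0]]     # matching pairs score highest
misaligned = [[0.0, 1.0], [1.0, 0.0]]  # matching pairs score lowest
print(symmetric_contrastive_loss(aligned) < symmetric_contrastive_loss(misaligned))
```

The loss falls as matching pairs score higher than non-matching ones, which is the behavior a retrieval embedding needs.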

Limitations

Tianmu-MERE is primarily optimized for Chinese e-commerce product retrieval. The current release focuses on fashion and general merchandise scenarios. Performance may degrade on domains that differ significantly from the training data.

License

This project is released under the Apache License 2.0.

Citation

If you find Tianmu-MERE useful in your research or applications, please cite:

@misc{tianmu_mere_2026,
  title = {Tianmu-MERE: Multimodal E-commerce Retrieval Embedding},
  author = {{TianmuLab, Kuaishou Technology}},
  year = {2026}
}