Tianmu-MERE
Tianmu-MERE (Multimodal E-commerce Retrieval Embedding) is a multimodal embedding model designed for product understanding and retrieval in real-world e-commerce scenarios.
The model learns unified representations for product images, livestream video frames, and textual descriptions within a shared embedding space, enabling fine-grained product retrieval and cross-modal matching.
Architecture
Tianmu-MERE adopts a dual-encoder architecture consisting of a SigLIP2-based vision encoder and a BGE-based text encoder.
Both encoders are optimized jointly to align product images, livestream video frames, and textual descriptions into a shared embedding space for fine-grained product retrieval.
Highlights
- Unified embedding space for product images, livestream video frames, and text.
- Designed for real-world e-commerce product retrieval.
- Strong fine-grained product matching capability.
- Supports image-to-image, text-to-image, and video-frame-to-product retrieval.
- Trained on large-scale Chinese e-commerce data.
- Reaches Coarse@1 between 77.78 and 93.78 and Fine@1 between 58.21 and 79.27 across the four LookBench subsets (see Benchmark Results).
Benchmark Results
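The table below reports Coarse@K, Fine@K, and nDCG@K at K = 1, 5, 10, and 20 on the four LookBench subsets, two real-world and two AI-generated. Coarse@1 ranges from 77.78 (Real StreetLook) to 93.78 (AI-Gen Studio), and Fine@1 from 58.21 (Real StreetLook) to 79.27 (AI-Gen Studio).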
| Dataset | Coarse@1 | Coarse@5 | Coarse@10 | Coarse@20 | Fine@1 | Fine@5 | Fine@10 | Fine@20 | nDCG@1 | nDCG@5 | nDCG@10 | nDCG@20 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Real Studio | 91.99 | 96.14 | 97.92 | 98.32 | 69.63 | 87.24 | 90.60 | 93.77 | 63.24 | 55.75 | 53.47 | 52.71 |
| Real StreetLook | 77.78 | 87.36 | 89.70 | 91.85 | 58.21 | 75.13 | 79.10 | 82.57 | 49.84 | 46.74 | 44.36 | 42.82 |
| AI-Gen StreetLook | 88.75 | 97.50 | 98.75 | 98.75 | 78.75 | 92.50 | 95.62 | 96.25 | 69.48 | 69.16 | 70.35 | 72.62 |
| AI-Gen Studio | 93.78 | 98.96 | 99.48 | 100.00 | 79.27 | 93.26 | 96.37 | 98.45 | 66.32 | 66.57 | 66.96 | 70.76 |
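For readers unfamiliar with the @K columns, the following is a minimal, library-free sketch of how hit-rate@K and nDCG@K are commonly computed for a single query. It is not LookBench's evaluation code: the exact Coarse and Fine relevance definitions are not given in this card, so the relevance labels and the function names (`hit_at_k`, `dcg_at_k`, `ndcg_at_k`) are assumptions made for illustration.

```python
import math

def hit_at_k(ranked_relevance, k):
    """Return 1.0 if any of the top-k results is relevant, else 0.0."""
    return 1.0 if any(r > 0 for r in ranked_relevance[:k]) else 0.0

def dcg_at_k(ranked_relevance, k):
    """Discounted cumulative gain with the common log2(rank + 1) discount."""
    return sum(rel / math.log2(rank + 2)
               for rank, rel in enumerate(ranked_relevance[:k]))

def ndcg_at_k(ranked_relevance, k):
    """DCG of the ranking divided by the DCG of the ideal ordering.

    Simplification: the ideal ordering is built from the retrieved list only.
    """
    ideal = dcg_at_k(sorted(ranked_relevance, reverse=True), k)
    return dcg_at_k(ranked_relevance, k) / ideal if ideal > 0 else 0.0

# Hypothetical relevance labels for one query's ranked results (1 = relevant).
ranking = [0, 1, 0, 1, 0]
print(hit_at_k(ranking, 1))   # 0.0
print(hit_at_k(ranking, 5))   # 1.0
print(round(ndcg_at_k(ranking, 5), 4))
```

A dataset-level score is then the mean of the per-query values, usually reported as a percentage.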
Repository Structure
Tianmu-MERE/
├── README.md
├── config.json
├── model.safetensors
├── modeling_tianmu_mere.py
├── vision_encoder/
│   └── config.json
├── text_encoder/
│   └── config.json
└── processor/
    ├── preprocessor_config.json
    ├── tokenizer.json
    ├── tokenizer_config.json
    ├── special_tokens_map.json
    └── vocab.txt
- model.safetensors: trained Tianmu-MERE student weights.
- config.json: package metadata and local paths.
- modeling_tianmu_mere.py: minimal inference wrapper.
- vision_encoder/config.json: vision tower architecture config.
- text_encoder/config.json: text tower architecture config.
- processor/: image processor and tokenizer files.
Requirements
pip install torch torchvision transformers safetensors pillow
Usage
Clone or download this repository, then load the local inference wrapper:
from pathlib import Path
import importlib.util
import sys
model_dir = Path("./Tianmu-MERE")
module_path = model_dir / "modeling_tianmu_mere.py"
spec = importlib.util.spec_from_file_location("modeling_tianmu_mere", module_path)
module = importlib.util.module_from_spec(spec)
sys.modules[spec.name] = module
spec.loader.exec_module(module)
model = module.TianmuMEREModel.from_pretrained(model_dir).eval()
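The loading steps above can be wrapped in a small helper so they are reusable. This is only a sketch: `load_module_from_path` is a hypothetical helper name, not part of the repository, and the demonstration uses a throwaway temporary file in place of `modeling_tianmu_mere.py` so that it runs without the model weights.

```python
import importlib.util
import sys
import tempfile
from pathlib import Path

def load_module_from_path(name, path):
    """Import a Python source file as a module without touching sys.path."""
    spec = importlib.util.spec_from_file_location(name, path)
    if spec is None or spec.loader is None:
        raise ImportError(f"Cannot load {name!r} from {path}")
    module = importlib.util.module_from_spec(spec)
    # Register before executing so code that looks the module up in
    # sys.modules while it is being executed can find it.
    sys.modules[name] = module
    spec.loader.exec_module(module)
    return module

# Demonstration with a stand-in file; for the real model one would pass
# Path("./Tianmu-MERE") / "modeling_tianmu_mere.py" instead.
with tempfile.TemporaryDirectory() as tmp:
    stub = Path(tmp) / "stub_module.py"
    stub.write_text("GREETING = 'loaded'\n")
    demo = load_module_from_path("stub_module", stub)
    print(demo.GREETING)  # loaded
```

Loading by file path avoids modifying `sys.path`, which keeps the wrapper from shadowing other installed modules.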
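Example image embedding (image is assumed to be an already loaded image, for example one opened with Pillow; the accepted input type is defined in modeling_tianmu_mere.py):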
image_embedding = model.encode_image(image)
Example text embedding:
text_embedding = model.encode_text("V领无袖碎花连衣裙")  # "V-neck sleeveless floral dress"
Similarity computation:
similarity = image_embedding @ text_embedding.T
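Whether `encode_image` and `encode_text` return L2-normalized vectors is not stated in this card; if they do not, the dot product above is not a cosine similarity. The sketch below, in plain Python with made-up 3-dimensional vectors, shows the normalization step and a top-k ranking over a small gallery. The names `l2_normalize`, `cosine`, and `top_k` are illustrative only; real embeddings would be the tensors returned by the model.

```python
import math

def l2_normalize(vec):
    """Scale a vector to unit length (zero vectors are returned unchanged)."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec] if norm > 0 else list(vec)

def cosine(a, b):
    """Cosine similarity: dot product of the two unit-length vectors."""
    return sum(x * y for x, y in zip(l2_normalize(a), l2_normalize(b)))

def top_k(query, gallery, k):
    """Return the k gallery ids with the highest cosine similarity to query."""
    scored = sorted(gallery.items(),
                    key=lambda item: cosine(query, item[1]),
                    reverse=True)
    return [item_id for item_id, _ in scored[:k]]

# Toy embeddings, invented for illustration only.
text_query = [1.0, 0.0, 1.0]
gallery = {
    "dress_a": [2.0, 0.1, 2.0],   # nearly parallel to the query
    "shoe_b": [0.0, 3.0, 0.0],    # orthogonal to the query
    "dress_c": [1.0, 1.0, 1.0],
}
print(top_k(text_query, gallery, 2))  # ['dress_a', 'dress_c']
```

The same ranking logic applies to image-to-image and video-frame-to-product retrieval, since all modalities share one embedding space.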
Training Data
Tianmu-MERE is trained on large-scale Chinese e-commerce data consisting of:
- Product images.
- Livestream video frames.
- Product titles.
- Product attribute descriptions.
The training objective focuses on multimodal alignment and fine-grained product retrieval.
Limitations
Tianmu-MERE is primarily optimized for Chinese e-commerce product retrieval. The current release focuses on fashion and general merchandise scenarios. Performance may degrade on domains that differ significantly from the training data.
License
This project is released under the Apache License 2.0.
Citation
If you find Tianmu-MERE useful in your research or applications, please cite:
@misc{tianmu_mere_2026,
title = {Tianmu-MERE: Multimodal E-commerce Retrieval Embedding},
author = {TianmuLab, Kuaishou Technology},
year = {2026}
}
Model tree for TianmuLab/Tianmu-MERE
Base model: BAAI/bge-base-zh-v1.5