How to use llm-semantic-router/multi-modal-embed-large with sentence-transformers:
```python
from sentence_transformers import SentenceTransformer

model = SentenceTransformer("llm-semantic-router/multi-modal-embed-large")

sentences = [
    "That is a happy person",
    "That is a happy dog",
    "That is a very happy person",
    "Today is a sunny day",
]
embeddings = model.encode(sentences)

# Pairwise similarity between every pair of sentence embeddings
similarities = model.similarity(embeddings, embeddings)
print(similarities.shape)  # [4, 4]
```
Add model.safetensors (bit-exact conversion of model.pt)
This adds a safetensors export of the existing `model.pt` so the checkpoint can be consumed outside a Python runtime by safetensors-native loaders such as candle's `VarBuilder`, which is the load path used by vllm-project/semantic-router's candle-binding. safetensors also loads without pickle deserialization, which some production environments require.
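To make the pickle-free point concrete: a safetensors file is an 8-byte little-endian header length, a UTF-8 JSON header (each tensor's dtype, shape, and byte offsets, plus an optional `__metadata__` string map), and then one flat byte buffer. A reader in any language can therefore parse it without executing code. The sketch below reads that layout with only the Python standard library. `read_safetensors_header` and `read_tensor_bytes` are hypothetical helpers, not part of the safetensors package, and the hand-built one-tensor fixture is a stand-in for a real export.

```python
import json
import os
import struct
import tempfile
from typing import Any


def read_safetensors_header(path: str) -> tuple[dict[str, Any], dict[str, str], int]:
    """Parse a .safetensors header with json + struct only: no torch, no pickle.

    Layout: an 8-byte little-endian unsigned length N, then N bytes of UTF-8
    JSON, then one flat data buffer holding every tensor back to back.
    Returns (tensor entries, __metadata__ map, file offset of the data buffer).
    """
    with open(path, "rb") as f:
        (header_len,) = struct.unpack("<Q", f.read(8))
        header = json.loads(f.read(header_len).decode("utf-8"))
    metadata = header.pop("__metadata__", {})
    return header, metadata, 8 + header_len


def read_tensor_bytes(path: str, name: str) -> bytes:
    """Return one tensor's raw bytes; data_offsets are relative to the data buffer."""
    entries, _, data_start = read_safetensors_header(path)
    begin, end = entries[name]["data_offsets"]
    with open(path, "rb") as f:
        f.seek(data_start + begin)
        return f.read(end - begin)


# Usage: hand-build a tiny one-tensor file (a stand-in for a real export).
payload = struct.pack("<2f", 1.5, -0.0)  # F32 tensor of shape [2]
header_json = json.dumps({
    "__metadata__": {"source_file": "model.pt"},
    "w": {"dtype": "F32", "shape": [2], "data_offsets": [0, len(payload)]},
}).encode("utf-8")
header_json += b" " * (-len(header_json) % 8)  # trailing-space padding to 8 bytes
path = os.path.join(tempfile.mkdtemp(), "tiny.safetensors")
with open(path, "wb") as f:
    f.write(struct.pack("<Q", len(header_json)) + header_json + payload)

entries, metadata, _ = read_safetensors_header(path)
print(sorted(entries), metadata)  # ['w'] {'source_file': 'model.pt'}
print(struct.unpack("<2f", read_tensor_bytes(path, "w")))  # (1.5, -0.0)
```

Because tensor data is addressed by plain byte offsets, a loader can also memory-map the buffer and slice tensors in place instead of copying them.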
The conversion is bit-exact at the tensor level, not a re-encode:
- Loaded `model.pt` (revision `e21cde3ccc414c56f504b322662f42c603a939ee`, the current `main`) with `torch.load(..., weights_only=True)` and saved it with `safetensors.torch.save_file`. All 1,393 tensors carried over with names, shapes, and dtypes unchanged. The mixed precision is preserved exactly as stored: the mmbert text tower is bfloat16, while the full SigLIP2 encoder (the vision tower plus its paired text encoder, which the checkpoint bundles), the Whisper audio tower, and the projection heads are float32.
- Bit-exact verification: reloaded the written file from scratch and compared every tensor against the original as raw bytes (flat uint8 views, so even -0.0 and NaN bit patterns would count); all are identical.
- Functional smoke test on top of that: built `MultiModalSentenceEmbedder` twice from the packaged `src/hf_st_mm` code, one loaded from `model.pt` and one from `model.safetensors` (strict `load_state_dict`, `missing=0 unexpected=0` for both), and encoded the same synthetic text + image + audio fixtures. The embeddings are bitwise identical (max abs diff 0.0).
- Rust receipt: loaded the file with candle's `VarBuilder::from_mmaped_safetensors` (candle-nn 0.10.2) and with the raw mmap API. All 1,393 tensors enumerate, and one f32 and one bf16 tensor materialize with the correct shapes and dtypes, their first values matching the PyTorch reference exactly on spot-check (the proof of equality across all tensors is the raw-byte verification above).
- The file's safetensors header carries provenance metadata (`source_file`, `source_sha256`, `source_revision`), so it stays self-describing independent of this PR thread.
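The raw-byte check and the provenance header can both be sketched with the standard library. Byte equality is stricter than numeric equality in exactly the two cases named above: `-0.0 == 0.0` numerically even though the bit patterns differ, and NaN never compares equal to itself numerically. `bit_identical` and `provenance_metadata` are hypothetical helper names; the sketch assumes each tensor has already been flattened to its raw bytes, and that `source_sha256` is the SHA-256 of the `model.pt` file. In a real export, a map like this would be passed as `metadata=` to `safetensors.torch.save_file`, which only accepts string keys and values.

```python
import hashlib
import math
import os
import struct
import tempfile


def bit_identical(a: dict[str, bytes], b: dict[str, bytes]) -> bool:
    """True iff both maps hold the same tensor names with byte-for-byte equal data.

    Mirrors comparing flat uint8 views: a flipped sign bit on a zero, or a
    different NaN payload, makes two tensors unequal here.
    """
    return a.keys() == b.keys() and all(a[name] == b[name] for name in a)


def provenance_metadata(source_path: str, revision: str) -> dict[str, str]:
    """Build the str -> str map that safetensors accepts as header metadata."""
    digest = hashlib.sha256()
    with open(source_path, "rb") as f:
        for chunk in iter(lambda: f.read(1 << 20), b""):  # hash in 1 MiB chunks
            digest.update(chunk)
    return {
        "source_file": os.path.basename(source_path),
        "source_sha256": digest.hexdigest(),
        "source_revision": revision,
    }


# Numeric equality vs. byte equality on the two edge cases.
pos_zero, neg_zero = struct.pack("<f", 0.0), struct.pack("<f", -0.0)
print(0.0 == -0.0, pos_zero == neg_zero)  # True False
nan_bytes = struct.pack("<f", math.nan)
print(math.nan == math.nan, nan_bytes == struct.pack("<f", math.nan))  # False True
print(bit_identical({"w": neg_zero}, {"w": pos_zero}))  # False

# Provenance for a placeholder file (not a real checkpoint).
src = os.path.join(tempfile.mkdtemp(), "model.pt")
with open(src, "wb") as f:
    f.write(b"placeholder bytes")
print(sorted(provenance_metadata(src, "main")))  # ['source_file', 'source_revision', 'source_sha256']
```

Hashing in fixed-size chunks keeps memory use flat even for a multi-gigabyte `model.pt`.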
Cost and layout, for your call: this adds ~6.4 GB alongside the existing `model.pt`, roughly doubling the repo size. It mirrors multi-modal-embed-small's layout, which already ships both formats, with the safetensors weights as a single root-level `model.safetensors`. `model.pt` and the packaged loader are untouched and keep working as-is. If you ever prefer a single canonical format, the functional smoke test above shows that the packaged loader works from the safetensors file too. If you'd rather re-export from your training stack instead, I'm happy to close this in favor of that.
If useful, I can follow up with a one-line addition to the README's file inventory and a load_state_dict-from-safetensors variant of the usage snippet.