Add model.safetensors (bit-exact conversion of model.pt)

by shraderdm - opened 23 days ago

base: refs/heads/main ← from: refs/pr/2

This adds a safetensors export of the existing model.pt so the checkpoint can be consumed outside a Python runtime by safetensors-native loaders such as candle's VarBuilder, which is the load path that vllm-project/semantic-router's candle-binding uses. safetensors files also load without pickle deserialization, a requirement in some production environments.
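To make the no-pickle point concrete: a safetensors file is an 8-byte little-endian header length, a JSON header, and then raw tensor bytes, so the tensor index can be read with nothing but the standard library. The following is a minimal sketch of that, not a replacement for the safetensors library; `read_safetensors_header`, `write_demo_file`, and the one-tensor demo file are hypothetical names and fixtures for illustration, not part of this repo.

```python
import json
import os
import struct
import tempfile


def read_safetensors_header(path):
    """Return (tensor_index, metadata) parsed from a .safetensors header."""
    with open(path, "rb") as f:
        # First 8 bytes: little-endian unsigned 64-bit length of the JSON header.
        (header_len,) = struct.unpack("<Q", f.read(8))
        header = json.loads(f.read(header_len).decode("utf-8"))
    # "__metadata__" is an optional string-to-string map; all other keys are tensor names.
    metadata = header.pop("__metadata__", {})
    return header, metadata


def write_demo_file(path):
    """Hand-assemble a tiny one-tensor file (hypothetical fixture, not the real checkpoint)."""
    payload = struct.pack("<2f", 1.0, -0.0)  # two float32 values, 8 bytes
    header = {
        "__metadata__": {"source_file": "model.pt"},
        "w": {"dtype": "F32", "shape": [2], "data_offsets": [0, len(payload)]},
    }
    header_bytes = json.dumps(header).encode("utf-8")
    with open(path, "wb") as f:
        f.write(struct.pack("<Q", len(header_bytes)))
        f.write(header_bytes)
        f.write(payload)


demo_path = os.path.join(tempfile.mkdtemp(), "demo.safetensors")
write_demo_file(demo_path)
tensors, metadata = read_safetensors_header(demo_path)
print(tensors["w"]["dtype"], tensors["w"]["shape"], metadata["source_file"])
```

Only the header is read here, so the same function should be cheap to run against a multi-gigabyte file; nothing in this path executes code stored in the file.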

The conversion is bit-exact at the tensor level, not a re-encode:

Loaded model.pt (revision e21cde3ccc414c56f504b322662f42c603a939ee, the current main) with torch.load(..., weights_only=True) and saved with safetensors.torch.save_file. All 1,393 tensors carried over with names, shapes, and dtypes unchanged. The mixed precision is preserved exactly as stored: the mmbert text tower is bfloat16; the full SigLIP2 encoder (vision tower plus its paired text encoder, which the checkpoint bundles), the Whisper audio tower, and the projection heads are float32.
Bit-exact verification: fresh reload of the written file, every tensor compared against the original as raw bytes (flat uint8 views, so even a difference in -0.0 or NaN bit patterns would be detected), all identical.
Functional smoke test on top of that: built MultiModalSentenceEmbedder twice from the packaged src/hf_st_mm code, once loaded from model.pt and once from model.safetensors (strict load_state_dict; missing=0, unexpected=0 for both), and encoded the same synthetic text + image + audio fixtures. Embeddings are bitwise identical (max abs diff 0.0).
Rust receipt: loaded the file with candle's VarBuilder::from_mmaped_safetensors (candle-nn 0.10.2) and the raw mmap API; all 1,393 tensors enumerate, and one f32 and one bf16 tensor materialize with correct shapes and dtypes, their first values spot-matching the PyTorch reference exactly (the all-tensor equality proof is the raw-byte verification above).
The file's safetensors header carries provenance metadata (source_file, source_sha256, source_revision), so it stays self-describing independent of this PR thread.

Cost and layout, for your call: this adds ~6.4 GB alongside the existing model.pt, roughly doubling the size of the repo. It mirrors multi-modal-embed-small's layout, which already ships both formats, with the safetensors copy as a single root-level model.safetensors. model.pt and the packaged loader are untouched and keep working as-is. If you ever prefer a single canonical format, the functional smoke test above shows the packaged loader works from the safetensors file too. If you'd rather re-export from your training stack instead, happy to close this in favor of that.

If useful, I can follow up with a one-line addition to the README's file inventory and a load_state_dict-from-safetensors variant of the usage snippet.

Add model.safetensors (bit-exact conversion of model.pt) 2dcbafa5

Ready to merge

This branch is ready to get merged automatically.
