Add model.safetensors (bit-exact conversion of model.pt)

#2
by shraderdm - opened

This adds a safetensors export of the existing model.pt so the checkpoint can be consumed outside a Python runtime by safetensors-native loaders such as candle's VarBuilder - the load path vllm-project/semantic-router's candle-binding uses. safetensors also loads without pickle deserialization, which some production environments require.

The conversion is bit-exact at the tensor level, not a re-encode:

  • Loaded model.pt (revision e21cde3ccc414c56f504b322662f42c603a939ee, the current main) with torch.load(..., weights_only=True) and saved with safetensors.torch.save_file. All 1,393 tensors carried over with names, shapes, and dtypes unchanged. The mixed precision is preserved exactly as stored: the mmbert text tower is bfloat16; the full SigLIP2 encoder (vision tower plus its paired text encoder, which the checkpoint bundles), the Whisper audio tower, and the projection heads are float32.
  • Bit-exact verification: fresh reload of the written file, every tensor compared against the original as raw bytes (flat uint8 views, so even -0.0 and NaN bit patterns would count), all identical.
  • Functional smoke on top of that: built MultiModalSentenceEmbedder twice from the packaged src/hf_st_mm code, once from model.pt and once from model.safetensors (strict load_state_dict, missing=0 unexpected=0 for both), and encoded the same synthetic text + image + audio fixtures. Embeddings are bitwise identical (max abs diff 0.0).
  • Rust receipt: loaded the file with candle's VarBuilder::from_mmaped_safetensors (candle-nn 0.10.2) and the raw mmap API; all 1,393 tensors enumerate, and one f32 and one bf16 tensor materialize with correct shapes and dtypes, their first values spot-matching the PyTorch reference exactly (the all-tensor equality proof is the raw-byte verification above).
  • The file's safetensors header carries provenance metadata (source_file, source_sha256, source_revision), so it stays self-describing independent of this PR thread.
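For anyone who wants to check the tensor count, dtype split, and provenance metadata without installing torch or candle, here is a minimal stdlib-only sketch. It relies on the safetensors file layout (an 8-byte little-endian header length, then a JSON header with an optional `__metadata__` entry). The helper names `read_safetensors_header` and `summarize` are hypothetical and are not part of this PR or of the packaged loader.

```python
import json
import struct
from collections import Counter


def read_safetensors_header(path):
    """Return (tensor_entries, metadata) parsed from a safetensors header.

    Hypothetical helper for illustration; it reads only the header and
    never touches the tensor data that follows it.
    """
    with open(path, "rb") as f:
        # First 8 bytes: little-endian unsigned 64-bit length of the JSON header.
        (header_len,) = struct.unpack("<Q", f.read(8))
        header = json.loads(f.read(header_len).decode("utf-8"))
    # The optional "__metadata__" entry holds string-to-string pairs,
    # which is where provenance fields such as source_sha256 would live.
    metadata = header.pop("__metadata__", {})
    return header, metadata


def summarize(path):
    """Return (tensor_count, per-dtype counts, metadata) for a safetensors file."""
    tensors, metadata = read_safetensors_header(path)
    dtype_counts = Counter(entry["dtype"] for entry in tensors.values())
    return len(tensors), dict(dtype_counts), metadata
```

Calling `summarize("model.safetensors")` on the file from this PR should report 1,393 tensors split between `BF16` and `F32`, with source_file, source_sha256, and source_revision in the metadata. This only inspects the header; the raw-byte comparison above remains the actual equality proof.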

Cost and layout, for your call: this adds ~6.4 GB alongside the existing model.pt, roughly doubling the repo. It mirrors multi-modal-embed-small's layout, which already ships both formats, with the safetensors export as a single root-level model.safetensors. model.pt and the packaged loader are untouched and keep working as-is. If you ever prefer a single canonical format, the functional smoke above shows the packaged loader works from the safetensors file too. If you'd rather re-export from your training stack instead, happy to close this in favor of that.

If useful, I can follow up with a one-line addition to the README's file inventory and a load_state_dict-from-safetensors variant of the usage snippet.
