wpferrell/phi-3.5-mini-instruct-bigsmall

@@ -55,20 +55,25 @@ model = BigSmallStreamingModel.from_pretrained(
 Uses up to ~12× less VRAM than standard loading by streaming layers on demand.
-## Stream straight from the Hub (no disk)
-```python
-import bigsmall
-state_dict = bigsmall.stream_from_hub("wpferrell/phi-3.5-mini-instruct-bigsmall", device="cpu")
-```
-Decompresses directly from the HuggingFace CDN over HTTP range requests. With the default `cache=False`, no `.bs` file is ever written to disk (V10).
 ## Decompress to safetensors
-```bash
-pip install "bigsmall>=3.14.1"
-bigsmall decompress wpferrell/phi-3.5-mini-instruct-bigsmall -o phi-3.5-mini-instruct-bigsmall/
 ```
 ## Original model

 Uses up to ~12× less VRAM than standard loading by streaming layers on demand.
+## Stream straight from the Hub (no disk)
+```python
+import bigsmall
+state_dict = bigsmall.stream_from_hub("wpferrell/phi-3.5-mini-instruct-bigsmall", device="cpu")
+```
+Decompresses directly from the HuggingFace CDN over HTTP range requests. With the default `cache=False`, no `.bs` file is ever written to disk (V10).
 ## Decompress to safetensors
+```python
+import bigsmall
+from safetensors.torch import save_file
+# bigsmall decompress works on local .bs files, not Hub repos, so
+# stream the weights from the Hub and write them out as safetensors.
+state_dict = bigsmall.stream_from_hub("wpferrell/phi-3.5-mini-instruct-bigsmall", device="cpu")
+save_file(state_dict, "phi-3.5-mini-instruct-bigsmall.safetensors")
 ```
 ## Original model