wpferrell/phi-3.5-mini-instruct-bigsmall

@@ -1,112 +1,112 @@
----
-license: mit
-tags:
-  - bigsmall
-  - compressed
-  - lossless
----
-[![DOI](https://zenodo.org/badge/DOI/10.5281/zenodo.20279247.svg)](https://doi.org/10.5281/zenodo.20279247)
-# Phi-3.5 Mini Instruct — Lossless Compressed
-> **7.12 GB → 4.67 GB (34% smaller). Bit-identical weights. Drop-in replacement.**
-## Use it in 2 lines
-```bash
-pip install "bigsmall>=3.14.1"
-```
-```python
-from transformers import AutoModelForCausalLM
-model = AutoModelForCausalLM.from_pretrained("wpferrell/phi-3.5-mini-instruct-bigsmall")
-```
-It works exactly like loading the original model. No code changes needed.
-## Size comparison
-| | Size |
-|---|---|
-| Original ([microsoft/Phi-3.5-mini-instruct](https://huggingface.co/microsoft/Phi-3.5-mini-instruct)) | 7.12 GB |
-| This compressed version | 4.67 GB |
-| Saved | 2.45 GB (34%) |
-## What "lossless" means
-Every weight is mathematically identical to the original model.
-- **Not quantized.** Quantization rounds weights and changes model behaviour.
-- **Not pruned.** Pruning removes parts of the model.
-- **Bit-for-bit identical.** md5 is verified on every tensor at decompression.
-## Low-VRAM streaming
-```python
-from bigsmall import BigSmallStreamingModel
-model = BigSmallStreamingModel.from_pretrained(
-    "wpferrell/phi-3.5-mini-instruct-bigsmall",
-    device="cuda",
-    lru_max_vram_gb=2.0,
-)
-```
-Uses up to ~12× less VRAM than standard loading by streaming layers on demand.
-## Stream straight from the Hub (no disk)
-```python
-import bigsmall
-state_dict = bigsmall.stream_from_hub("wpferrell/phi-3.5-mini-instruct-bigsmall", device="cpu")
-```
-Decompresses directly from the HuggingFace CDN over HTTP range requests. With the default `cache=False`, no `.bs` file is ever written to disk (V10).
-## Decompress to safetensors
-```python
-import bigsmall
-from safetensors.torch import save_file
-# bigsmall decompress works on local .bs files, not Hub repos, so
-# stream the weights from the Hub and write them out as safetensors.
-state_dict = bigsmall.stream_from_hub("wpferrell/phi-3.5-mini-instruct-bigsmall", device="cpu")
-save_file(state_dict, "phi-3.5-mini-instruct-bigsmall.safetensors")
-```
-## Original model
-This is a lossless-compressed copy of [microsoft/Phi-3.5-mini-instruct](https://huggingface.co/microsoft/Phi-3.5-mini-instruct). All credit to the original authors. The weights are unchanged.
-## Want to compress your own model?
-```bash
-pip install "bigsmall>=3.14.1"
-bigsmall compress my-model/ -o my-model.bs
-```
-See [github.com/wpferrell/Bigsmall](https://github.com/wpferrell/Bigsmall) for the full docs.
-## License
-- **Model weights:** mit — same as [microsoft/Phi-3.5-mini-instruct](https://huggingface.co/microsoft/Phi-3.5-mini-instruct).
-- **BigSmall format:** [Elastic License 2.0](https://github.com/wpferrell/Bigsmall/blob/main/LICENSE) — free for personal, research, and commercial use.
-- **Commercial SaaS licensing:** wpferrell@gmail.com
-## Citation
-```bibtex
-@misc{bigsmall2026,
-  title={BigSmall: Lossless Neural Network Weight Compression},
-  author={Ferrell, Will},
-  year={2026},
-  doi={10.5281/zenodo.20279247},
-  url={https://doi.org/10.5281/zenodo.20279247}
-}
-```
-## Requires
-`bigsmall >= 3.14.1` for the latest features. Earlier versions (>= 3.0.0) can still decode this model.

+---
+license: mit
+tags:
+  - bigsmall
+  - compressed
+  - lossless
+---
+[![DOI](https://zenodo.org/badge/DOI/10.5281/zenodo.20279247.svg)](https://doi.org/10.5281/zenodo.20279247)
+# Phi-3.5 Mini Instruct — Lossless Compressed
+> **7.12 GB → 4.67 GB (34% smaller). Bit-identical weights. Drop-in replacement.**
+## Use it in 2 lines
+```bash
+pip install "bigsmall>=3.14.4"
+```
+```python
+from transformers import AutoModelForCausalLM
+model = AutoModelForCausalLM.from_pretrained("wpferrell/phi-3.5-mini-instruct-bigsmall")
+```
+It works exactly like loading the original model. No code changes needed.
+## Size comparison
+| | Size |
+|---|---|
+| Original ([microsoft/Phi-3.5-mini-instruct](https://huggingface.co/microsoft/Phi-3.5-mini-instruct)) | 7.12 GB |
+| This compressed version | 4.67 GB |
+| Saved | 2.45 GB (34%) |
+## What "lossless" means
+Every weight is mathematically identical to the original model.
+- **Not quantized.** Quantization rounds weights and changes model behaviour.
+- **Not pruned.** Pruning removes parts of the model.
+- **Bit-for-bit identical.** Each tensor's MD5 checksum is verified at decompression.
+## Low-VRAM streaming
+```python
+from bigsmall import BigSmallStreamingModel
+model = BigSmallStreamingModel.from_pretrained(
+    "wpferrell/phi-3.5-mini-instruct-bigsmall",
+    device="cuda",
+    lru_max_vram_gb=2.0,
+)
+```
+Uses up to ~12× less VRAM than standard loading by streaming layers on demand.
+## Stream straight from the Hub (no disk)
+```python
+import bigsmall
+state_dict = bigsmall.stream_from_hub("wpferrell/phi-3.5-mini-instruct-bigsmall", device="cpu")
+```
+Decompresses directly from the Hugging Face CDN using HTTP range requests. With the default `cache=False`, no `.bs` file is ever written to disk (V10).
+## Decompress to safetensors
+```python
+import bigsmall
+from safetensors.torch import save_file
+# bigsmall decompress works on local .bs files, not Hub repos, so
+# stream the weights from the Hub and write them out as safetensors.
+state_dict = bigsmall.stream_from_hub("wpferrell/phi-3.5-mini-instruct-bigsmall", device="cpu")
+save_file(state_dict, "phi-3.5-mini-instruct-bigsmall.safetensors")
+```
+## Original model
+This is a lossless-compressed copy of [microsoft/Phi-3.5-mini-instruct](https://huggingface.co/microsoft/Phi-3.5-mini-instruct). All credit to the original authors. The weights are unchanged.
+## Want to compress your own model?
+```bash
+pip install "bigsmall>=3.14.4"
+bigsmall compress my-model/ -o my-model.bs
+```
+See [github.com/wpferrell/Bigsmall](https://github.com/wpferrell/Bigsmall) for the full docs.
+## License
+- **Model weights:** MIT, the same license as [microsoft/Phi-3.5-mini-instruct](https://huggingface.co/microsoft/Phi-3.5-mini-instruct).
+- **BigSmall format:** [Elastic License 2.0](https://github.com/wpferrell/Bigsmall/blob/main/LICENSE) — free for personal, research, and commercial use.
+- **Commercial SaaS licensing:** wpferrell@gmail.com
+## Citation
+```bibtex
+@misc{bigsmall2026,
+  title={BigSmall: Lossless Neural Network Weight Compression},
+  author={Ferrell, Will},
+  year={2026},
+  doi={10.5281/zenodo.20279247},
+  url={https://doi.org/10.5281/zenodo.20279247}
+}
+```
+## Requires
+`bigsmall >= 3.14.4` is recommended for the latest features; earlier versions (>= 3.0.0) can still decode this model.
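The figures in the card's size table can be checked with a few lines of arithmetic. This sketch takes the reported sizes as given (decimal GB) and derives the saving and the compression ratio; it does not measure the files themselves.

```python
# Sizes as reported in the "Size comparison" table (decimal GB).
original_gb = 7.12
compressed_gb = 4.67

saved_gb = original_gb - compressed_gb       # absolute saving
saved_pct = 100 * saved_gb / original_gb     # saving relative to the original
ratio = original_gb / compressed_gb          # original size / compressed size

print(f"saved: {saved_gb:.2f} GB ({saved_pct:.0f}%)")
print(f"compression ratio: {ratio:.2f}x")
```

The 34% in the table is the saving relative to the original size; expressed as a ratio, the compressed file is roughly 1.5× smaller.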
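The card states that every tensor's MD5 digest is verified at decompression. The sketch below shows what a per-tensor check of that kind could look like using only `hashlib`. The manifest layout (tensor name mapped to hex digest) and the helper names `tensor_md5` and `verify_tensors` are hypothetical illustrations, not BigSmall's actual format or API, and the "tensors" are toy float32 byte buffers.

```python
import hashlib
import struct
from typing import Dict, List


def tensor_md5(raw: bytes) -> str:
    """Hex MD5 digest of a tensor's raw byte buffer."""
    return hashlib.md5(raw).hexdigest()


def verify_tensors(decoded: Dict[str, bytes], manifest: Dict[str, str]) -> List[str]:
    """Names of tensors that are missing or whose bytes differ from the
    recorded digest; an empty list means every tensor is bit-identical."""
    return [
        name
        for name, expected in manifest.items()
        if name not in decoded or tensor_md5(decoded[name]) != expected
    ]


# Toy "tensors": little-endian float32 values packed into bytes.
original = {
    "layer0.weight": struct.pack("<4f", 0.1, -0.2, 0.3, 0.4),
    "layer0.bias": struct.pack("<2f", 0.0, 1.0),
}
# Digests recorded at compression time (hypothetical manifest layout).
manifest = {name: tensor_md5(raw) for name, raw in original.items()}

# A faithful round trip passes; changing a single value is caught.
ok = verify_tensors(dict(original), manifest)
corrupted = {**original, "layer0.bias": struct.pack("<2f", 0.0, 1.5)}
bad = verify_tensors(corrupted, manifest)
```

Hashing the raw bytes rather than comparing values numerically is what makes this a bit-for-bit check: any change to any stored bit changes the digest.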
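The low-VRAM loader streams layers on demand under a memory cap (`lru_max_vram_gb`). How BigSmall manages that cap internally is not documented in this card. The sketch below is a generic least-recently-used cache under a size budget, assuming that is roughly what the parameter name suggests. `LRULayerCache` and the `load` callback are hypothetical, and sizes are abstract units rather than real VRAM bytes.

```python
from collections import OrderedDict
from typing import Callable, Tuple


class LRULayerCache:
    """Hypothetical LRU cache that keeps recently used layers resident
    within a size budget, loading missing layers on demand."""

    def __init__(self, budget: int, load: Callable[[str], Tuple[object, int]]):
        self.budget = budget          # maximum total size of resident layers
        self.load = load              # callback returning (layer, size) for a name
        self.cache = OrderedDict()    # name -> (layer, size), least recent first
        self.used = 0

    def get(self, name: str) -> object:
        if name in self.cache:
            self.cache.move_to_end(name)   # hit: mark as most recently used
            return self.cache[name][0]
        layer, size = self.load(name)      # miss: stream the layer in
        # Evict least-recently-used layers until the new one fits.
        while self.cache and self.used + size > self.budget:
            _, (_, evicted_size) = self.cache.popitem(last=False)
            self.used -= evicted_size
        self.cache[name] = (layer, size)
        self.used += size
        return layer


# Usage: a budget of 2 units, where every layer costs 1 unit.
loads = []


def fake_load(name: str) -> Tuple[object, int]:
    loads.append(name)                 # record each on-demand load
    return f"<weights of {name}>", 1


cache = LRULayerCache(budget=2, load=fake_load)
for name in ["l0", "l1", "l0", "l2", "l1"]:
    cache.get(name)
```

Re-touching `l0` keeps it resident, so loading `l2` evicts `l1` instead; `l1` then has to be streamed in again. In this sketch a single layer larger than the whole budget is still loaded after everything else is evicted, so the budget acts as a soft cap.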
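Streaming from the Hub relies on HTTP range requests, which fetch a slice of a remote file instead of the whole file. This standard-library sketch only builds such a request for a file under the repo's `resolve/main/` URL. The file name `model.bs` is a hypothetical placeholder, since the card does not name the `.bs` file, and nothing is sent over the network.

```python
import urllib.request


def range_request(url: str, start: int, length: int) -> urllib.request.Request:
    """GET request for `length` bytes beginning at byte offset `start`.
    HTTP byte ranges are inclusive at both ends, hence the -1."""
    end = start + length - 1
    return urllib.request.Request(url, headers={"Range": f"bytes={start}-{end}"})


# Hypothetical file name: the card does not say what the .bs file is called.
url = ("https://huggingface.co/wpferrell/phi-3.5-mini-instruct-bigsmall"
       "/resolve/main/model.bs")
req = range_request(url, start=0, length=4096)

# Not sent here. urllib.request.urlopen(req) would return only those bytes,
# with status 206 (Partial Content) if the server honours the range.
```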