---
license: mit
tags:
- bigsmall
- compressed
- lossless
---

[DOI: 10.5281/zenodo.20279247](https://doi.org/10.5281/zenodo.20279247)

# Phi-3.5 Mini Instruct: Lossless Compressed

> **7.12 GB → 4.67 GB (34% smaller). Bit-identical weights. Drop-in replacement.**

## Use it in 2 lines

```bash
pip install "bigsmall>=3.14.4"
```

```python
from transformers import AutoModelForCausalLM
model = AutoModelForCausalLM.from_pretrained("wpferrell/phi-3.5-mini-instruct-bigsmall")
```

It works exactly like loading the original model. No code changes needed.

## Size comparison

| Version | Size |
|---|---|
| Original ([microsoft/Phi-3.5-mini-instruct](https://huggingface.co/microsoft/Phi-3.5-mini-instruct)) | 7.12 GB |
| This compressed version | 4.67 GB |
| Saved | 2.45 GB (34%) |

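The "Saved" figure follows directly from the two sizes. As a quick arithmetic check (the variable names are only for illustration):

```python
# Sizes in GB, taken from the size comparison table.
original_gb = 7.12
compressed_gb = 4.67

saved_gb = original_gb - compressed_gb
saved_fraction = saved_gb / original_gb

print(f"saved {saved_gb:.2f} GB ({saved_fraction:.0%})")  # saved 2.45 GB (34%)
```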

## What "lossless" means

Every weight is mathematically identical to the original model.

- **Not quantized.** Quantization rounds weights and changes model behaviour.
- **Not pruned.** Pruning removes parts of the model.
- **Bit-for-bit identical.** An MD5 checksum is verified for every tensor at decompression.

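To make the distinction concrete, here is a minimal sketch using only the Python standard library. It is not BigSmall's code: `zlib` stands in for the lossless codec, a float16 round trip stands in for quantization, and the weight values and the `md5_hex` helper are made up for illustration.

```python
import hashlib
import struct
import zlib

# Toy "tensor": a few float32 weights packed into raw little-endian bytes.
weights = [0.1, -1.5, 3.14159, 2.0e-5]
original = struct.pack(f"<{len(weights)}f", *weights)


def md5_hex(raw: bytes) -> str:
    """Return the MD5 digest of a byte string as hex."""
    return hashlib.md5(raw).hexdigest()


# Lossless path: compress, decompress, then compare checksums.
# zlib is only a stand-in for the real codec (assumption).
restored = zlib.decompress(zlib.compress(original))
lossless_ok = md5_hex(restored) == md5_hex(original)

# Lossy path: round each weight to float16 and back,
# a crude stand-in for quantization (assumption).
n = len(weights)
halves = struct.unpack(f"<{n}e", struct.pack(f"<{n}e", *weights))
quantized = struct.pack(f"<{n}f", *halves)
lossy_ok = md5_hex(quantized) == md5_hex(original)

print(lossless_ok, lossy_ok)  # True False
```

A matching checksum means every byte of the tensor survived the round trip; any rounding at all changes the digest.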

## Low-VRAM streaming

```python
from bigsmall import BigSmallStreamingModel

model = BigSmallStreamingModel.from_pretrained(
    "wpferrell/phi-3.5-mini-instruct-bigsmall",
    device="cuda",
    lru_max_vram_gb=2.0,
)
```

Streams layers on demand, using up to about 12× less VRAM than standard loading.

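The `lru_max_vram_gb` name suggests a least-recently-used cache with a memory budget. The toy sketch below shows that general idea only. It is an assumption about the approach, not BigSmall's implementation, and `LayerCache` and `load_layer` are hypothetical names.

```python
from collections import OrderedDict


class LayerCache:
    """Keeps recently used layers resident and evicts the least recently used."""

    def __init__(self, max_bytes, load_layer):
        self.max_bytes = max_bytes
        self.load_layer = load_layer  # hypothetical callback: layer name -> bytes
        self.resident = OrderedDict()
        self.used_bytes = 0

    def get(self, name):
        if name in self.resident:
            self.resident.move_to_end(name)  # mark as most recently used
            return self.resident[name]
        layer = self.load_layer(name)  # e.g. decompress the layer on demand
        self.resident[name] = layer
        self.used_bytes += len(layer)
        # Evict the oldest layers until back under budget, always keeping the newest.
        while self.used_bytes > self.max_bytes and len(self.resident) > 1:
            _, evicted = self.resident.popitem(last=False)
            self.used_bytes -= len(evicted)
        return layer


# Three 4-byte "layers" with an 8-byte budget: only two stay resident at once.
cache = LayerCache(max_bytes=8, load_layer=lambda name: name.encode() * 4)
for name in ["a", "b", "c", "a"]:
    cache.get(name)
print(list(cache.resident))  # ['c', 'a']
```

The trade-off in any such scheme is that an evicted layer has to be loaded again the next time it is needed, so a smaller budget costs more reloads.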

## Stream straight from the Hub (no disk)

```python
import bigsmall
state_dict = bigsmall.stream_from_hub("wpferrell/phi-3.5-mini-instruct-bigsmall", device="cpu")
```

Decompresses directly from the Hugging Face CDN over HTTP range requests. With the default `cache=False`, no `.bs` file is ever written to disk (V10).

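An HTTP range request asks the server for a slice of a file instead of the whole thing. Here is a minimal standard-library sketch of building one. The URL and the `build_range_request` helper are hypothetical, and BigSmall may issue its requests differently.

```python
from urllib.request import Request


def build_range_request(url: str, start: int, length: int) -> Request:
    """Build (but do not send) a GET request for bytes [start, start + length)."""
    end = start + length - 1  # the Range header uses an inclusive end offset
    return Request(url, headers={"Range": f"bytes={start}-{end}"})


# Hypothetical URL. A real client would pass the request to
# urllib.request.urlopen and expect a "206 Partial Content" response.
req = build_range_request("https://example.com/model.bs", start=0, length=1024)
print(req.get_header("Range"))  # bytes=0-1023
```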

## Decompress to safetensors

```python
import bigsmall
from safetensors.torch import save_file

# bigsmall decompress works on local .bs files, not Hub repos, so
# stream the weights from the Hub and write them out as safetensors.
state_dict = bigsmall.stream_from_hub("wpferrell/phi-3.5-mini-instruct-bigsmall", device="cpu")
save_file(state_dict, "phi-3.5-mini-instruct-bigsmall.safetensors")
```

## Original model

This is a lossless-compressed copy of [microsoft/Phi-3.5-mini-instruct](https://huggingface.co/microsoft/Phi-3.5-mini-instruct). All credit to the original authors. The weights are unchanged.

## Want to compress your own model?

```bash
pip install "bigsmall>=3.14.4"
bigsmall compress my-model/ -o my-model.bs
```

See [github.com/wpferrell/Bigsmall](https://github.com/wpferrell/Bigsmall) for the full docs.

## License

- **Model weights:** MIT, the same license as [microsoft/Phi-3.5-mini-instruct](https://huggingface.co/microsoft/Phi-3.5-mini-instruct).
- **BigSmall format:** [Elastic License 2.0](https://github.com/wpferrell/Bigsmall/blob/main/LICENSE), free for personal, research, and commercial use.
- **Commercial SaaS licensing:** wpferrell@gmail.com

## Citation

```bibtex
@misc{bigsmall2026,
  title={BigSmall: Lossless Neural Network Weight Compression},
  author={Ferrell, Will},
  year={2026},
  doi={10.5281/zenodo.20279247},
  url={https://doi.org/10.5281/zenodo.20279247}
}
```

## Requires

`bigsmall >= 3.14.4` is recommended for the latest features. Earlier versions (>= 3.0.0) can still decode this model.