---
license: mit
tags:
- bigsmall
- compressed
- lossless
---
[![DOI](https://zenodo.org/badge/DOI/10.5281/zenodo.20279247.svg)](https://doi.org/10.5281/zenodo.20279247)
# Phi-3.5 Mini Instruct — Lossless Compressed
> **7.12 GB → 4.67 GB (34% smaller). Bit-identical weights. Drop-in replacement.**
## Use it in 2 lines
```bash
pip install "bigsmall>=3.14.4"
```
```python
from transformers import AutoModelForCausalLM
model = AutoModelForCausalLM.from_pretrained("wpferrell/phi-3.5-mini-instruct-bigsmall")
```
Once `bigsmall` is installed, the model loads exactly like the original. No other code changes are needed.
## Size comparison
| | Size |
|---|---|
| Original ([microsoft/Phi-3.5-mini-instruct](https://huggingface.co/microsoft/Phi-3.5-mini-instruct)) | 7.12 GB |
| This compressed version | 4.67 GB |
| Saved | 2.45 GB (34%) |
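
The savings row follows directly from the two sizes in the table. A quick check of the arithmetic, using only the figures quoted above (the variable names are illustrative):

```python
# Sizes in GB, copied from the table above.
original_gb = 7.12
compressed_gb = 4.67

# Absolute saving and saving as a percentage of the original size.
saved_gb = original_gb - compressed_gb
saved_pct = 100 * saved_gb / original_gb

print(f"Saved {saved_gb:.2f} GB ({saved_pct:.0f}%)")  # Saved 2.45 GB (34%)
```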
## What "lossless" means
Every weight is mathematically identical to the original model.
- **Not quantized.** Quantization rounds weights and changes model behaviour.
- **Not pruned.** Pruning removes parts of the model.
- **Bit-for-bit identical.** An MD5 checksum is verified for every tensor at decompression.
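
To make the distinction concrete, here is a minimal sketch of a lossless round trip checked with MD5, next to a lossy one. It is not BigSmall's codec: `zlib` stands in for the compressor, a float16 round trip stands in for quantization, and the four-value "tensor" is made up for illustration.

```python
import hashlib
import struct
import zlib


def md5_hex(data: bytes) -> str:
    """Return the MD5 digest of raw bytes as a hex string."""
    return hashlib.md5(data).hexdigest()


# A toy "tensor": four float32 values packed into raw little-endian bytes.
weights = struct.pack("<4f", 0.1, -2.5, 3.14159, 1e-7)
expected = md5_hex(weights)

# Lossless path: compress, decompress, and compare checksums.
# zlib is only a stand-in for a lossless codec (assumption, not BigSmall's).
restored = zlib.decompress(zlib.compress(weights))
round_trip_ok = md5_hex(restored) == expected

# Lossy path: round each value to float16 and back to float32.
# This is a crude stand-in for quantization, which changes the stored bits.
values = struct.unpack("<4f", weights)
half = struct.unpack("<4e", struct.pack("<4e", *values))
quantized = struct.pack("<4f", *half)
lossy_differs = md5_hex(quantized) != expected

print(round_trip_ok, lossy_differs)
```

The lossless path reproduces the exact bytes, so the checksum matches. The float16 round trip changes values such as 0.1, so its checksum does not.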
## Low-VRAM streaming
```python
from bigsmall import BigSmallStreamingModel
model = BigSmallStreamingModel.from_pretrained(
    "wpferrell/phi-3.5-mini-instruct-bigsmall",
    device="cuda",
    lru_max_vram_gb=2.0,
)
```
Streams layers on demand, using up to about 12× less VRAM than standard loading.
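
The `lru_` prefix suggests a least-recently-used cache with a memory budget. The sketch below shows that general idea only: `LayerCache`, its methods, and the loader are hypothetical names, not BigSmall's API or internals, and sizes are in abstract units rather than real VRAM.

```python
from collections import OrderedDict


class LayerCache:
    """Hypothetical LRU cache: keeps recently used layers within a size budget."""

    def __init__(self, max_bytes: int):
        self.max_bytes = max_bytes
        self.used_bytes = 0
        self._layers = OrderedDict()  # name -> (payload, size), oldest first

    def get(self, name, load_fn, size):
        # Cache hit: mark the layer as most recently used and return it.
        if name in self._layers:
            self._layers.move_to_end(name)
            return self._layers[name][0]
        # Cache miss: load the layer, then evict the oldest until within budget.
        payload = load_fn(name)
        self._layers[name] = (payload, size)
        self.used_bytes += size
        while self.used_bytes > self.max_bytes and len(self._layers) > 1:
            _, (_, evicted_size) = self._layers.popitem(last=False)
            self.used_bytes -= evicted_size
        return payload

    def resident(self):
        """Names of the layers currently held, least recently used first."""
        return list(self._layers)


# Usage: a budget of 2 units, each layer costing 1 unit.
loads = []

def fake_load(name):
    loads.append(name)   # record every real load
    return name.upper()  # placeholder for decompressed weights

cache = LayerCache(max_bytes=2)
for layer in ["layer0", "layer1", "layer0", "layer2"]:
    cache.get(layer, fake_load, size=1)

print(cache.resident())  # ['layer0', 'layer2']
print(loads)             # ['layer0', 'layer1', 'layer2']
```

Here `layer1` is evicted when `layer2` arrives, because `layer0` was touched more recently. Only the layers inside the budget stay resident at any time.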
## Stream straight from the Hub (no disk)
```python
import bigsmall
state_dict = bigsmall.stream_from_hub("wpferrell/phi-3.5-mini-instruct-bigsmall", device="cpu")
```
Decompresses directly from the Hugging Face CDN over HTTP range requests. With the default `cache=False`, no `.bs` file is ever written to disk (V10).
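
For readers unfamiliar with HTTP range requests, the sketch below shows how a client asks for a byte slice of a remote file using the standard library. The URL, file name, and `build_range_request` helper are hypothetical, and this is not how `bigsmall` is known to issue its requests. The request is only built here, not sent.

```python
import urllib.request


def build_range_request(url: str, start: int, length: int) -> urllib.request.Request:
    """Build a request for `length` bytes beginning at byte offset `start`."""
    end = start + length - 1  # the end offset in a Range header is inclusive
    return urllib.request.Request(url, headers={"Range": f"bytes={start}-{end}"})


# Hypothetical URL: ask for 4096 bytes starting at offset 1024.
req = build_range_request("https://example.com/model.bs", start=1024, length=4096)
print(req.get_header("Range"))  # bytes=1024-5119
```

A server that supports ranges answers with status 206 and only the requested bytes, which lets a client read individual tensors without downloading the whole file.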
## Decompress to safetensors
```python
import bigsmall
from safetensors.torch import save_file
# bigsmall decompress works on local .bs files, not Hub repos, so
# stream the weights from the Hub and write them out as safetensors.
state_dict = bigsmall.stream_from_hub("wpferrell/phi-3.5-mini-instruct-bigsmall", device="cpu")
save_file(state_dict, "phi-3.5-mini-instruct-bigsmall.safetensors")
```
## Original model
This is a lossless-compressed copy of [microsoft/Phi-3.5-mini-instruct](https://huggingface.co/microsoft/Phi-3.5-mini-instruct). All credit to the original authors. The weights are unchanged.
## Want to compress your own model?
```bash
pip install "bigsmall>=3.14.4"
bigsmall compress my-model/ -o my-model.bs
```
See [github.com/wpferrell/Bigsmall](https://github.com/wpferrell/Bigsmall) for the full docs.
## License
- **Model weights:** MIT, the same as [microsoft/Phi-3.5-mini-instruct](https://huggingface.co/microsoft/Phi-3.5-mini-instruct).
- **BigSmall format:** [Elastic License 2.0](https://github.com/wpferrell/Bigsmall/blob/main/LICENSE), free for personal, research, and commercial use.
- **Commercial SaaS licensing:** wpferrell@gmail.com
## Citation
```bibtex
@misc{bigsmall2026,
title={BigSmall: Lossless Neural Network Weight Compression},
author={Ferrell, Will},
year={2026},
doi={10.5281/zenodo.20279247},
url={https://doi.org/10.5281/zenodo.20279247}
}
```
## Requires
`bigsmall >= 3.14.4` for the latest features. Earlier versions (>= 3.0.0) can still decode this model.