	---
	license: mit
	tags:
	- bigsmall
	- compressed
	- lossless
	---

	[![DOI](https://zenodo.org/badge/DOI/10.5281/zenodo.20279247.svg)](https://doi.org/10.5281/zenodo.20279247)

	# Phi-3.5 Mini Instruct — Lossless Compressed

	> 7.12 GB → 4.67 GB (34% smaller). Bit-identical weights. Drop-in replacement.

	## Use it in 2 lines

	```bash
	pip install "bigsmall>=3.14.4"
	```

	```python
	from transformers import AutoModelForCausalLM
	model = AutoModelForCausalLM.from_pretrained("wpferrell/phi-3.5-mini-instruct-bigsmall")
	```

	It works exactly like loading the original model. No code changes needed.

	## Size comparison

	| Version | Size |
	|---|---|
	| Original ([microsoft/Phi-3.5-mini-instruct](https://huggingface.co/microsoft/Phi-3.5-mini-instruct)) | 7.12 GB |
	| This compressed version | 4.67 GB |
	| Saved | 2.45 GB (34%) |
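
As a quick check of the figures in the table, the saving and the compression ratio can be recomputed from the two sizes. This is only arithmetic on the numbers quoted above, not a measurement of the files themselves.

```python
# Sizes in GB, copied from the table above.
original_gb = 7.12
compressed_gb = 4.67

saved_gb = original_gb - compressed_gb
saved_fraction = saved_gb / original_gb   # share of the original size removed
ratio = original_gb / compressed_gb       # compression ratio

print(f"saved: {saved_gb:.2f} GB ({saved_fraction:.0%})")
print(f"compression ratio: {ratio:.2f}x")
```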

	## What "lossless" means

	Every weight is mathematically identical to the original model.

	- Not quantized. Quantization rounds weights and changes model behaviour.
	- Not pruned. Pruning removes parts of the model.
	- Bit-for-bit identical. An MD5 checksum is verified for every tensor at decompression.
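
The difference between a lossless and a lossy round trip can be illustrated with a toy example. This is a minimal sketch using only the Python standard library: `zlib` stands in for a lossless compressor and a float16 cast stands in for quantization. Neither is BigSmall's actual algorithm, and the four values are made up for illustration.

```python
import hashlib
import struct
import zlib

def md5_hex(raw: bytes) -> str:
    """Return the MD5 digest of a byte string as hex."""
    return hashlib.md5(raw).hexdigest()

# A toy "tensor": four float32 values packed into raw bytes.
weights = [0.1, -1.2345678, 3.1415927, 1e-5]
original = struct.pack("<4f", *weights)

# Lossless round trip (zlib is only a stand-in compressor here).
restored = zlib.decompress(zlib.compress(original))

# Lossy round trip: squeeze each value through float16 and back to float32.
as_half = struct.pack("<4e", *struct.unpack("<4f", original))
quantized = struct.pack("<4f", *struct.unpack("<4e", as_half))

lossless_ok = md5_hex(restored) == md5_hex(original)
quantized_ok = md5_hex(quantized) == md5_hex(original)
print(lossless_ok, quantized_ok)  # True False
```

A matching checksum means the bytes are identical; the rounded copy fails the same check because its bit patterns changed.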

	## Low-VRAM streaming

	```python
	from bigsmall import BigSmallStreamingModel

	model = BigSmallStreamingModel.from_pretrained(
	    "wpferrell/phi-3.5-mini-instruct-bigsmall",
	    device="cuda",
	    lru_max_vram_gb=2.0,
	)
	```

	Streaming layers on demand uses up to about 12× less VRAM than standard loading.
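
The general idea of keeping only recently used layers resident under a memory budget can be sketched with a small LRU cache. This is an illustration of the technique, not BigSmall's implementation: `LayerCache`, `fake_load`, and the byte sizes are all hypothetical, and plain `bytes` objects stand in for GPU tensors.

```python
from collections import OrderedDict

class LayerCache:
    """Keep recently used layers resident, evicting the least recently used."""

    def __init__(self, budget_bytes, load_layer):
        self.budget_bytes = budget_bytes
        self.load_layer = load_layer    # callable: layer name -> bytes payload
        self.resident = OrderedDict()   # name -> payload, oldest first
        self.used_bytes = 0

    def get(self, name):
        if name in self.resident:
            self.resident.move_to_end(name)  # mark as most recently used
            return self.resident[name]
        payload = self.load_layer(name)
        # Evict the oldest layers until the new one fits within the budget.
        while self.resident and self.used_bytes + len(payload) > self.budget_bytes:
            _, evicted = self.resident.popitem(last=False)
            self.used_bytes -= len(evicted)
        self.resident[name] = payload
        self.used_bytes += len(payload)
        return payload

# Toy usage: 1 KiB "layers" with room for only two at a time.
loads = []

def fake_load(name):
    loads.append(name)   # record every cache miss
    return bytes(1024)

cache = LayerCache(budget_bytes=2048, load_layer=fake_load)
for name in ["layer0", "layer1", "layer0", "layer2", "layer1"]:
    cache.get(name)

print(loads)  # ['layer0', 'layer1', 'layer2', 'layer1']
```

The second request for `layer0` is served from the cache, while `layer1` has to be loaded again after `layer2` pushed it out. A smaller budget trades memory for more reloads.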

	## Stream straight from the Hub (no disk)

	```python
	import bigsmall
	state_dict = bigsmall.stream_from_hub("wpferrell/phi-3.5-mini-instruct-bigsmall", device="cpu")
	```

	Decompresses directly from the Hugging Face CDN over HTTP range requests. With the default `cache=False`, no `.bs` file is ever written to disk (V10).
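
For readers unfamiliar with HTTP range requests, the sketch below shows how a client asks for one slice of a remote file instead of the whole thing. It uses only the standard library. The URL, offset, and length are hypothetical, and this is not how `bigsmall.stream_from_hub` is implemented internally.

```python
import urllib.request

def range_request(url: str, offset: int, length: int) -> urllib.request.Request:
    """Build a GET request for `length` bytes starting at `offset`."""
    last = offset + length - 1  # Range end positions are inclusive
    return urllib.request.Request(url, headers={"Range": f"bytes={offset}-{last}"})

# Hypothetical URL and offsets, for illustration only.
req = range_request("https://example.com/model.bs", offset=4096, length=1024)
print(req.get_header("Range"))  # bytes=4096-5119

# Sending is left commented out so the sketch runs offline.
# A server that honours the range replies with 206 Partial Content.
# with urllib.request.urlopen(req) as resp:
#     chunk = resp.read()
```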

	## Decompress to safetensors

	```python
	import bigsmall
	from safetensors.torch import save_file

	# bigsmall decompress works on local .bs files, not Hub repos, so
	# stream the weights from the Hub and write them out as safetensors.
	state_dict = bigsmall.stream_from_hub("wpferrell/phi-3.5-mini-instruct-bigsmall", device="cpu")
	save_file(state_dict, "phi-3.5-mini-instruct-bigsmall.safetensors")
	```

	## Original model

	This is a lossless-compressed copy of [microsoft/Phi-3.5-mini-instruct](https://huggingface.co/microsoft/Phi-3.5-mini-instruct). All credit to the original authors. The weights are unchanged.

	## Want to compress your own model?

	```bash
	pip install "bigsmall>=3.14.4"
	bigsmall compress my-model/ -o my-model.bs
	```

	See [github.com/wpferrell/Bigsmall](https://github.com/wpferrell/Bigsmall) for the full docs.

	## License

	- Model weights: MIT, the same license as [microsoft/Phi-3.5-mini-instruct](https://huggingface.co/microsoft/Phi-3.5-mini-instruct).
	- BigSmall format: [Elastic License 2.0](https://github.com/wpferrell/Bigsmall/blob/main/LICENSE) — free for personal, research, and commercial use.
	- Commercial SaaS licensing: wpferrell@gmail.com

	## Citation

	```bibtex
	@misc{bigsmall2026,
	  title={BigSmall: Lossless Neural Network Weight Compression},
	  author={Ferrell, Will},
	  year={2026},
	  doi={10.5281/zenodo.20279247},
	  url={https://doi.org/10.5281/zenodo.20279247}
	}
	```

	## Requires

	Use `bigsmall >= 3.14.4` for the latest features. Earlier versions (>= 3.0.0) can still decode this model.