---
license: mit
tags:
  - bigsmall
  - compressed
  - lossless
---

[![DOI](https://zenodo.org/badge/DOI/10.5281/zenodo.20279247.svg)](https://doi.org/10.5281/zenodo.20279247)

# Phi-3.5 Mini Instruct — Lossless Compressed

> **7.12 GB → 4.67 GB (34% smaller). Bit-identical weights. Drop-in replacement.**

## Use it in 2 lines

```bash
pip install "bigsmall>=3.14.4"
```

```python
from transformers import AutoModelForCausalLM
model = AutoModelForCausalLM.from_pretrained("wpferrell/phi-3.5-mini-instruct-bigsmall")
```

It works exactly like loading the original model. No code changes needed.

## Size comparison

| | Size |
|---|---|
| Original ([microsoft/Phi-3.5-mini-instruct](https://huggingface.co/microsoft/Phi-3.5-mini-instruct)) | 7.12 GB |
| This compressed version | 4.67 GB |
| Saved | 2.45 GB (34%) |
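
The figures in the table can be rechecked with a few lines of arithmetic. This is only a worked check of the numbers quoted above; the variable names are illustrative.

```python
# Sizes in GB, copied from the table above.
original_gb = 7.12
compressed_gb = 4.67

saved_gb = original_gb - compressed_gb
saved_fraction = saved_gb / original_gb
compression_ratio = original_gb / compressed_gb

print(f"saved: {saved_gb:.2f} GB ({saved_fraction:.0%})")  # saved: 2.45 GB (34%)
print(f"ratio: {compression_ratio:.2f}x")                  # ratio: 1.52x
```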

## What "lossless" means

Every weight is mathematically identical to the original model.

- **Not quantized.** Quantization rounds weights and changes model behaviour.
- **Not pruned.** Pruning removes parts of the model.
- **Bit-for-bit identical.** An MD5 checksum is verified on every tensor at decompression.
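
To make the distinction concrete, here is a minimal sketch of a per-tensor checksum comparison. It is not BigSmall's implementation: `zlib` stands in for a lossless codec, the tensors are hypothetical raw float32 bytes, and the helper names are invented for this example. A lossless round trip reproduces every digest, while rounding to half precision (a crude stand-in for quantization) does not.

```python
import hashlib
import struct
import zlib


def md5_per_tensor(tensors: dict) -> dict:
    """Return an MD5 hex digest of the raw bytes of each tensor."""
    return {name: hashlib.md5(raw).hexdigest() for name, raw in tensors.items()}


def round_to_half(raw: bytes) -> bytes:
    """Round float32 values to float16 and back (a lossy transformation)."""
    n = len(raw) // 4
    values = struct.unpack(f"<{n}f", raw)
    halves = struct.unpack(f"<{n}e", struct.pack(f"<{n}e", *values))
    return struct.pack(f"<{n}f", *halves)


# Hypothetical stand-in weights: raw little-endian float32 bytes.
original = {
    "layer0.weight": struct.pack("<4f", 0.1, -0.2, 0.3, -0.4),
    "layer0.bias": struct.pack("<2f", 1.5, -2.5),
}

# zlib is only a stand-in for a lossless codec; it is not BigSmall's codec.
compressed = {name: zlib.compress(raw) for name, raw in original.items()}
restored = {name: zlib.decompress(blob) for name, blob in compressed.items()}

# Lossless: every digest matches, so the bytes are identical.
lossless = md5_per_tensor(restored) == md5_per_tensor(original)

# Lossy: rounding changes the bytes of at least one tensor.
rounded = {name: round_to_half(raw) for name, raw in original.items()}
rounded_matches = md5_per_tensor(rounded) == md5_per_tensor(original)

print(lossless, rounded_matches)
```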

## Low-VRAM streaming

```python
from bigsmall import BigSmallStreamingModel

model = BigSmallStreamingModel.from_pretrained(
    "wpferrell/phi-3.5-mini-instruct-bigsmall",
    device="cuda",
    lru_max_vram_gb=2.0,
)
```

Streams layers on demand, using up to about 12× less VRAM than standard loading.
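
The parameter name `lru_max_vram_gb` suggests a least-recently-used cache with a memory budget. The sketch below shows that general idea only; it is an assumption about the approach, not BigSmall's code. The `LayerLRU` class and `fake_load` function are hypothetical, and plain `bytes` objects stand in for GPU tensors.

```python
from collections import OrderedDict


class LayerLRU:
    """Keep recently used layers resident and evict the least recently
    used ones when a new layer would exceed the byte budget."""

    def __init__(self, max_bytes: int) -> None:
        self.max_bytes = max_bytes
        self.used_bytes = 0
        self._layers = OrderedDict()

    def get(self, name: str, load) -> bytes:
        if name in self._layers:
            self._layers.move_to_end(name)  # mark as most recently used
            return self._layers[name]
        data = load(name)  # e.g. decompress the layer on demand
        # Evict the oldest entries until the new layer fits in the budget.
        while self._layers and self.used_bytes + len(data) > self.max_bytes:
            _, evicted = self._layers.popitem(last=False)
            self.used_bytes -= len(evicted)
        self._layers[name] = data
        self.used_bytes += len(data)
        return data

    def resident(self) -> list:
        return list(self._layers)


def fake_load(name: str) -> bytes:
    # Hypothetical loader: every "layer" is 1 KiB of zeros.
    return bytes(1024)


cache = LayerLRU(max_bytes=2 * 1024)  # room for two layers
for layer in ["layer0", "layer1", "layer0", "layer2"]:
    cache.get(layer, fake_load)

print(cache.resident())  # ['layer0', 'layer2']
```

Because `layer0` is touched again before `layer2` arrives, `layer1` is the one evicted. A smaller budget trades memory for more frequent reloads.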

## Stream straight from the Hub (no disk)

```python
import bigsmall
state_dict = bigsmall.stream_from_hub("wpferrell/phi-3.5-mini-instruct-bigsmall", device="cpu")
```

Decompresses directly from the Hugging Face CDN using HTTP range requests. With the default `cache=False`, no `.bs` file is ever written to disk (V10).
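
For readers unfamiliar with HTTP range requests, the sketch below builds one with the standard library, without sending it. The URL, offsets, and `build_range_request` helper are hypothetical. How BigSmall chooses which byte ranges to fetch is not described here.

```python
import urllib.request


def build_range_request(url: str, start: int, length: int) -> urllib.request.Request:
    """Build (but do not send) a GET request for `length` bytes at `start`."""
    end = start + length - 1  # Range end offsets are inclusive
    return urllib.request.Request(url, headers={"Range": f"bytes={start}-{end}"})


# Hypothetical URL and offsets, for illustration only.
request = build_range_request("https://example.com/model.bs", start=4096, length=1024)
range_header = request.get_header("Range")
print(range_header)  # bytes=4096-5119
```

A server that honors the header replies with `206 Partial Content` and only the requested bytes, which lets a client read individual pieces of a large file without downloading all of it.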

## Decompress to safetensors

```python
import bigsmall
from safetensors.torch import save_file

# bigsmall decompress works on local .bs files, not Hub repos, so
# stream the weights from the Hub and write them out as safetensors.
state_dict = bigsmall.stream_from_hub("wpferrell/phi-3.5-mini-instruct-bigsmall", device="cpu")
save_file(state_dict, "phi-3.5-mini-instruct-bigsmall.safetensors")
```

## Original model

This is a lossless-compressed copy of [microsoft/Phi-3.5-mini-instruct](https://huggingface.co/microsoft/Phi-3.5-mini-instruct). All credit to the original authors. The weights are unchanged.

## Want to compress your own model?

```bash
pip install "bigsmall>=3.14.4"
bigsmall compress my-model/ -o my-model.bs
```

See [github.com/wpferrell/Bigsmall](https://github.com/wpferrell/Bigsmall) for the full docs.

## License

- **Model weights:** MIT, same as [microsoft/Phi-3.5-mini-instruct](https://huggingface.co/microsoft/Phi-3.5-mini-instruct).
- **BigSmall format:** [Elastic License 2.0](https://github.com/wpferrell/Bigsmall/blob/main/LICENSE). Free for personal, research, and commercial use, other than offering it to third parties as a hosted or managed service.
- **Commercial SaaS licensing:** wpferrell@gmail.com

## Citation

```bibtex
@misc{bigsmall2026,
  title={BigSmall: Lossless Neural Network Weight Compression},
  author={Ferrell, Will},
  year={2026},
  doi={10.5281/zenodo.20279247},
  url={https://doi.org/10.5281/zenodo.20279247}
}
```

## Requires

`bigsmall >= 3.14.4` for the latest features. Earlier versions (>= 3.0.0) can still decode this model.