wpferrell committed
Commit 3fde465 · verified · 1 parent: 6a0388e

Bump bigsmall version pin to >=3.14.4

Files changed (1)
  1. README.md +112 -112
README.md CHANGED
@@ -1,112 +1,112 @@
- ---
- license: mit
- tags:
- - bigsmall
- - compressed
- - lossless
- ---
-
- [![DOI](https://zenodo.org/badge/DOI/10.5281/zenodo.20279247.svg)](https://doi.org/10.5281/zenodo.20279247)
-
- # Phi-3.5 Mini Instruct — Lossless Compressed
-
- > **7.12 GB → 4.67 GB (34% smaller). Bit-identical weights. Drop-in replacement.**
-
- ## Use it in 2 lines
-
- ```bash
- pip install "bigsmall>=3.14.1"
- ```
-
- ```python
- from transformers import AutoModelForCausalLM
- model = AutoModelForCausalLM.from_pretrained("wpferrell/phi-3.5-mini-instruct-bigsmall")
- ```
-
- It works exactly like loading the original model. No code changes needed.
-
- ## Size comparison
-
- | | Size |
- |---|---|
- | Original ([microsoft/Phi-3.5-mini-instruct](https://huggingface.co/microsoft/Phi-3.5-mini-instruct)) | 7.12 GB |
- | This compressed version | 4.67 GB |
- | Saved | 2.45 GB (34%) |
-
- ## What "lossless" means
-
- Every weight is mathematically identical to the original model.
-
- - **Not quantized.** Quantization rounds weights and changes model behaviour.
- - **Not pruned.** Pruning removes parts of the model.
- - **Bit-for-bit identical.** md5 is verified on every tensor at decompression.
-
- ## Low-VRAM streaming
-
- ```python
- from bigsmall import BigSmallStreamingModel
-
- model = BigSmallStreamingModel.from_pretrained(
- "wpferrell/phi-3.5-mini-instruct-bigsmall",
- device="cuda",
- lru_max_vram_gb=2.0,
- )
- ```
-
- Uses up to ~12× less VRAM than standard loading by streaming layers on demand.
-
- ## Stream straight from the Hub (no disk)
-
- ```python
- import bigsmall
- state_dict = bigsmall.stream_from_hub("wpferrell/phi-3.5-mini-instruct-bigsmall", device="cpu")
- ```
-
- Decompresses directly from the HuggingFace CDN over HTTP range requests. With the default `cache=False`, no `.bs` file is ever written to disk (V10).
-
- ## Decompress to safetensors
-
- ```python
- import bigsmall
- from safetensors.torch import save_file
-
- # bigsmall decompress works on local .bs files, not Hub repos, so
- # stream the weights from the Hub and write them out as safetensors.
- state_dict = bigsmall.stream_from_hub("wpferrell/phi-3.5-mini-instruct-bigsmall", device="cpu")
- save_file(state_dict, "phi-3.5-mini-instruct-bigsmall.safetensors")
- ```
-
- ## Original model
-
- This is a lossless-compressed copy of [microsoft/Phi-3.5-mini-instruct](https://huggingface.co/microsoft/Phi-3.5-mini-instruct). All credit to the original authors. The weights are unchanged.
-
- ## Want to compress your own model?
-
- ```bash
- pip install "bigsmall>=3.14.1"
- bigsmall compress my-model/ -o my-model.bs
- ```
-
- See [github.com/wpferrell/Bigsmall](https://github.com/wpferrell/Bigsmall) for the full docs.
-
- ## License
-
- - **Model weights:** mit — same as [microsoft/Phi-3.5-mini-instruct](https://huggingface.co/microsoft/Phi-3.5-mini-instruct).
- - **BigSmall format:** [Elastic License 2.0](https://github.com/wpferrell/Bigsmall/blob/main/LICENSE) — free for personal, research, and commercial use.
- - **Commercial SaaS licensing:** wpferrell@gmail.com
-
- ## Citation
-
- ```bibtex
- @misc{bigsmall2026,
- title={BigSmall: Lossless Neural Network Weight Compression},
- author={Ferrell, Will},
- year={2026},
- doi={10.5281/zenodo.20279247},
- url={https://doi.org/10.5281/zenodo.20279247}
- }
- ```
-
- ## Requires
-
- `bigsmall >= 3.14.1` for the latest features. Earlier versions (>= 3.0.0) can still decode this model.
 
+ ---
+ license: mit
+ tags:
+ - bigsmall
+ - compressed
+ - lossless
+ ---
+
+ [![DOI](https://zenodo.org/badge/DOI/10.5281/zenodo.20279247.svg)](https://doi.org/10.5281/zenodo.20279247)
+
+ # Phi-3.5 Mini Instruct — Lossless Compressed
+
+ > **7.12 GB → 4.67 GB (34% smaller). Bit-identical weights. Drop-in replacement.**
+
+ ## Use it in 2 lines
+
+ ```bash
+ pip install "bigsmall>=3.14.4"
+ ```
+
+ ```python
+ from transformers import AutoModelForCausalLM
+ model = AutoModelForCausalLM.from_pretrained("wpferrell/phi-3.5-mini-instruct-bigsmall")
+ ```
+
+ It works exactly like loading the original model. No code changes needed.
+
+ ## Size comparison
+
+ | | Size |
+ |---|---|
+ | Original ([microsoft/Phi-3.5-mini-instruct](https://huggingface.co/microsoft/Phi-3.5-mini-instruct)) | 7.12 GB |
+ | This compressed version | 4.67 GB |
+ | Saved | 2.45 GB (34%) |
+
+ ## What "lossless" means
+
+ Every weight is mathematically identical to the original model.
+
+ - **Not quantized.** Quantization rounds weights and changes model behaviour.
+ - **Not pruned.** Pruning removes parts of the model.
+ - **Bit-for-bit identical.** md5 is verified on every tensor at decompression.
+
+ ## Low-VRAM streaming
+
+ ```python
+ from bigsmall import BigSmallStreamingModel
+
+ model = BigSmallStreamingModel.from_pretrained(
+ "wpferrell/phi-3.5-mini-instruct-bigsmall",
+ device="cuda",
+ lru_max_vram_gb=2.0,
+ )
+ ```
+
+ Uses up to ~12× less VRAM than standard loading by streaming layers on demand.
+
+ ## Stream straight from the Hub (no disk)
+
+ ```python
+ import bigsmall
+ state_dict = bigsmall.stream_from_hub("wpferrell/phi-3.5-mini-instruct-bigsmall", device="cpu")
+ ```
+
+ Decompresses directly from the HuggingFace CDN over HTTP range requests. With the default `cache=False`, no `.bs` file is ever written to disk (V10).
+
+ ## Decompress to safetensors
+
+ ```python
+ import bigsmall
+ from safetensors.torch import save_file
+
+ # bigsmall decompress works on local .bs files, not Hub repos, so
+ # stream the weights from the Hub and write them out as safetensors.
+ state_dict = bigsmall.stream_from_hub("wpferrell/phi-3.5-mini-instruct-bigsmall", device="cpu")
+ save_file(state_dict, "phi-3.5-mini-instruct-bigsmall.safetensors")
+ ```
+
+ ## Original model
+
+ This is a lossless-compressed copy of [microsoft/Phi-3.5-mini-instruct](https://huggingface.co/microsoft/Phi-3.5-mini-instruct). All credit to the original authors. The weights are unchanged.
+
+ ## Want to compress your own model?
+
+ ```bash
+ pip install "bigsmall>=3.14.4"
+ bigsmall compress my-model/ -o my-model.bs
+ ```
+
+ See [github.com/wpferrell/Bigsmall](https://github.com/wpferrell/Bigsmall) for the full docs.
+
+ ## License
+
+ - **Model weights:** mit — same as [microsoft/Phi-3.5-mini-instruct](https://huggingface.co/microsoft/Phi-3.5-mini-instruct).
+ - **BigSmall format:** [Elastic License 2.0](https://github.com/wpferrell/Bigsmall/blob/main/LICENSE) — free for personal, research, and commercial use.
+ - **Commercial SaaS licensing:** wpferrell@gmail.com
+
+ ## Citation
+
+ ```bibtex
+ @misc{bigsmall2026,
+ title={BigSmall: Lossless Neural Network Weight Compression},
+ author={Ferrell, Will},
+ year={2026},
+ doi={10.5281/zenodo.20279247},
+ url={https://doi.org/10.5281/zenodo.20279247}
+ }
+ ```
+
+ ## Requires
+
+ `bigsmall >= 3.14.4` for the latest features. Earlier versions (>= 3.0.0) can still decode this model.
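
Reviewer note (not part of the commit): the only substantive change in this diff is the pin moving from `>=3.14.1` to `>=3.14.4`. A minimal sketch of how an environment could be checked against the new pin follows. The helper names `parse_release`, `meets_pin`, and `installed_bigsmall_ok` are hypothetical, and the comparison is a simplified numeric parse that ignores pre-release suffixes, so it is an approximation of PEP 440 ordering rather than a full implementation.

```python
from importlib import metadata

# Minimum release taken from the pin introduced by this commit (>=3.14.4).
MIN_BIGSMALL = (3, 14, 4)


def parse_release(version: str) -> tuple:
    """Parse the leading numeric release segment, e.g. '3.14.4' -> (3, 14, 4).

    Simplification: parsing stops at the first non-digit character, so a
    suffix such as 'rc1' is ignored instead of being ordered per PEP 440.
    """
    parts = []
    for piece in version.split("."):
        digits = ""
        for ch in piece:
            if not ch.isdigit():
                break
            digits += ch
        if not digits:
            break
        parts.append(int(digits))
        if len(digits) < len(piece):
            break  # hit a suffix; nothing numeric follows reliably
    return tuple(parts)


def meets_pin(version: str, minimum: tuple = MIN_BIGSMALL) -> bool:
    """Return True if the numeric release of `version` is >= `minimum`."""
    release = parse_release(version)
    # Pad both tuples with zeros so that '3.15' compares as (3, 15, 0).
    width = max(len(release), len(minimum))
    release = release + (0,) * (width - len(release))
    minimum = minimum + (0,) * (width - len(minimum))
    return release >= minimum


def installed_bigsmall_ok() -> bool:
    """Check the installed bigsmall distribution, if any, against the pin."""
    try:
        return meets_pin(metadata.version("bigsmall"))
    except metadata.PackageNotFoundError:
        return False
```

Calling `installed_bigsmall_ok()` returns `False` when the package is missing or older than the pin; for strict PEP 440 handling, a dedicated version-parsing library would be the safer choice.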