wpferrell committed
Commit e1bad3d · verified
1 parent: 5392a48

docs: release-block standard card update

Files changed (1)
  1. README.md +55 -76
README.md CHANGED
@@ -1,76 +1,55 @@
- ---
- license: gemma
- tags:
- - bigsmall
- - compression
- - lossless
- - gemma
- - google
- ---
-
- # Gemma 2 2B (BigSmall compressed)
-
- **9.74 GB -> 6.37 GB (BF16). Lossless. Zero inference overhead. Any hardware.**
-
- Compressed with [BigSmall](https://github.com/wpferrell/Bigsmall) -- decompresses once at load time, runs at full native speed. Every weight is bit-identical to the original.
-
- ## Quick start
-
- ```bash
- pip install bigsmall
- ```
-
- > **Version compatibility:** Models compressed with `bigsmall` 2.4.0+ may use
- > container format v2 for high-kurtosis tensors and require `bigsmall >= 2.4.0`
- > to decompress. Run `pip install --upgrade bigsmall` to update.
-
-
- ```python
- import bigsmall
- bigsmall.install_hook()
- from transformers import AutoModelForCausalLM
- model = AutoModelForCausalLM.from_pretrained("wpferrell/gemma-2-2b-bigsmall")
- ```
-
- ## Streaming loader -- run on any hardware
-
- BigSmall's streaming loader decompresses one layer at a time directly into VRAM. Peak memory is one layer -- not the whole model. A 4 GB GPU can run Mistral 7B losslessly.
-
- ```python
- from bigsmall import StreamingLoader
- from transformers import AutoModelForCausalLM
-
- with StreamingLoader("wpferrell/gemma-2-2b-bigsmall", device="cuda") as loader:
-     model = loader.load_model(AutoModelForCausalLM)
- ```
-
- | Your GPU | Models you can run |
- |----------|--------------------|
- | 2 GB | GPT-2, Gemma 270M |
- | 4 GB | Llama 3.2 3B, Mistral 7B, Gemma 2B, Llama 3.1 8B |
- | 8 GB | Qwen 2.5 14B, Gemma 2 9B, Phi-3.5 Mini |
- | 12 GB | Qwen 2.5 32B, Gemma 3 12B |
- | 24 GB | Llama 70B, Qwen 72B, Gemma 3 27B, DeepSeek V4-Flash |
- | CPU only | Everything -- slower but full quality |
-
- BigSmall is the only lossless compression tool with a streaming loader. DFloat11 and ZipNN load the full model into memory.
-
-
- ## Why BigSmall vs DFloat11
-
- | | BigSmall | DFloat11 |
- |--|--|--|
- | Inference overhead | **None** | ~2x at batch=1 |
- | Hardware | **CPU, Apple Silicon, AMD, any GPU** | CUDA only |
- | FP32 support | **Yes** | No |
- | Fine-tuning safe | **Yes** | No |
- | Streaming loader | **Yes -- peak RAM < 2 GB** | No |
-
- ## Compression stats
-
- | Original | Compressed | Ratio | Format | Verified |
- |----------|------------|-------|--------|---------|
- | 9.74 GB | 6.37 GB | 65.4% | BF16 | md5 every tensor |
-
- - GitHub: [wpferrell/Bigsmall](https://github.com/wpferrell/Bigsmall)
- - All models: [huggingface.co/wpferrell](https://huggingface.co/wpferrell)
 
+ ---
+ license: gemma
+ tags:
+ - bigsmall
+ - compressed
+ - lossless
+ ---
+
+ # Gemma 2 2B — BigSmall Compressed
+
+ Lossless compressed version of [google/gemma-2-2b](https://huggingface.co/google/gemma-2-2b).
+ **17.4% smaller (82.6% of the original size). Bit-identical weights. No quality loss.**
+
+ ## Quick start
+
+ ```bash
+ pip install bigsmall
+ ```
+
+ ```python
+ from transformers import AutoModelForCausalLM, AutoTokenizer
+
+ model = AutoModelForCausalLM.from_pretrained("wpferrell/gemma-2-2b-bigsmall")
+ tokenizer = AutoTokenizer.from_pretrained("google/gemma-2-2b")
+ ```
+
+ ## Low-VRAM streaming
+
+ ```python
+ from bigsmall import BigSmallStreamingModel
+
+ model = BigSmallStreamingModel.from_pretrained("wpferrell/gemma-2-2b-bigsmall")
+ ```
+
+ ## Details
+
+ | | Value |
+ |---|---|
+ | Original model | [google/gemma-2-2b](https://huggingface.co/google/gemma-2-2b) |
+ | Original size | 9.8 GB |
+ | Compressed size | 8.09 GB |
+ | Compression ratio | 82.6% of original |
+ | Format | BigSmall lossless (.bs) |
+ | Reconstruction | Bit-identical, md5-verified |
+ | Requires | bigsmall >= 3.0.0 |
+
+ ## All pre-compressed models
+
+ See [wpferrell on HuggingFace](https://huggingface.co/wpferrell) for all available models.
+
+ ## License
+
+ Model weights: same license as the original model ([google/gemma-2-2b](https://huggingface.co/google/gemma-2-2b)).
+ BigSmall format: [Elastic License 2.0](https://www.elastic.co/licensing/elastic-license), free for personal, research, and internal commercial use.
+ Commercial/SaaS licensing: wpferrell@gmail.com
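
Note on the size figures: the sizes in the Details table of the new card (9.8 GB original, 8.09 GB compressed) can be checked with a few lines of arithmetic. The sketch below is not part of the commit or of the `bigsmall` package; `compression_stats` is a hypothetical helper name used only for illustration. It shows that "82.6% of original" corresponds to a reduction of about 17.4%, and that the figures on the removed card (9.74 GB, 6.37 GB) match its stated 65.4% ratio.

```python
def compression_stats(original_gb: float, compressed_gb: float) -> tuple[float, float]:
    """Return (compressed size as % of original, % reduction in size).

    Hypothetical helper for checking the card's numbers; not a bigsmall API.
    """
    ratio_pct = compressed_gb / original_gb * 100
    saved_pct = 100 - ratio_pct
    return ratio_pct, saved_pct


# Figures from the new card's Details table.
new_ratio, new_saved = compression_stats(9.8, 8.09)
print(f"new card: {new_ratio:.1f}% of original, {new_saved:.1f}% smaller")

# Figures from the removed card's "Compression stats" table.
old_ratio, old_saved = compression_stats(9.74, 6.37)
print(f"old card: {old_ratio:.1f}% of original, {old_saved:.1f}% smaller")
```

Keeping "percent of original" and "percent smaller" as two separate values makes it harder to swap one for the other in a headline figure.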