wpferrell committed on
Commit f7335f8 · verified · 1 Parent(s): 307423d

Add full competitive advantages vs quantization, DFloat11, ZipNN

Files changed (1)
  1. README.md +19 -13
README.md CHANGED
@@ -10,23 +10,29 @@ tags:

  # GPT-2 117M (BigSmall compressed)

- **0.55 GB -> 0.39 GB (FP32). Full quality -- not quantization. Zero inference overhead.**
+ **0.55 GB -> 0.39 GB (FP32). Lossless. Zero inference overhead. Any hardware.**

- Losslessly compressed with [BigSmall](https://github.com/wpferrell/Bigsmall). Every weight is bit-identical to the original. Decompresses once at load time then runs at full native speed -- no inference overhead, ever.
+ Compressed with [BigSmall](https://github.com/wpferrell/Bigsmall) -- decompresses once at load time, then runs at full native speed. Every weight is bit-identical to the original.

- ## BigSmall vs DFloat11 -- what is the difference?
+ ## Why BigSmall

- Both are lossless. The difference is *when* decompression happens:
+ ### vs quantization (llama.cpp, GGUF, AWQ, bitsandbytes)
+ Quantization is lossy: it permanently reduces weight precision. BigSmall is lossless -- bit-identical weights, no accuracy loss, fine-tuning safe, fully reproducible.
+
+ ### vs DFloat11 (runtime lossless compression)
+ DFloat11 keeps weights compressed during inference, which saves VRAM but runs about 2x slower at batch=1 and requires CUDA. BigSmall decompresses once at load time and runs at full native speed on any hardware.

  | | BigSmall | DFloat11 |
  |--|--|--|
- | Decompresses | Once at load time | Every forward pass on GPU |
- | Inference overhead | **None** | ~2x slower at batch=1 |
+ | Compressed size, % of original (BF16) | **65-66%** | ~70% |
+ | Inference overhead | **None** | ~2x at batch=1 |
  | Hardware | **CPU, Apple Silicon, AMD, any GPU** | CUDA only |
- | Use case | Smaller downloads, faster loads | Less VRAM during inference |
+ | FP32 / FP16 / FP8 support | **Yes** | BF16 only |
+ | Fine-tuning safe | **Yes** | No |
+ | Streaming loader (< 2GB RAM) | **Yes** | No |

- **Use BigSmall if** you want to download less, load faster, and run at full native speed on any hardware.
- **Use DFloat11 if** you need the model to stay compressed in GPU memory during inference and have a CUDA GPU.
+ ### vs ZipNN (storage lossless compression)
+ Same category as BigSmall -- decompresses at load time. BigSmall produces smaller files (65% vs 67% of original size for BF16) and supports more formats. BigSmall also has a streaming loader so you can run 70B models with under 2GB peak RAM.


  ## Install
@@ -35,7 +41,7 @@ Both are lossless. The difference is *when* decompression happens:
  pip install bigsmall
  `

- ## Load (transparent -- works like any HuggingFace model)
+ ## Load

  `python
  import bigsmall
@@ -45,7 +51,7 @@ from transformers import AutoModelForCausalLM
  model = AutoModelForCausalLM.from_pretrained("wpferrell/gpt2-bigsmall")
  `

- ## Or stream layer by layer (peak RAM under 2GB even for 7B models)
+ ## Stream layer by layer (peak RAM under 2GB even for 7B models)

  `python
  from bigsmall import StreamingLoader
@@ -57,9 +63,9 @@ with StreamingLoader("wpferrell/gpt2-bigsmall", device="cuda") as loader:

  ## Compression stats

- | Original | Compressed | Ratio | Format | Lossless |
+ | Original | Compressed | Ratio (compressed / original) | Format | Verified |
  |----------|------------|-------|--------|---------|
- | 0.55 GB | 0.39 GB | 70.9% | FP32 | md5 verified every tensor |
+ | 0.55 GB | 0.39 GB | 70.9% | FP32 | MD5 of every tensor |

  - GitHub: [wpferrell/Bigsmall](https://github.com/wpferrell/Bigsmall)
  - PyPI: pip install bigsmall
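
The README's "bit-identical" and per-tensor MD5 claims, and the 70.9% figure in the stats table, can be illustrated with a small round trip. This is a minimal sketch under stated assumptions, not BigSmall's implementation: `zlib` stands in for the BigSmall codec (whose internals are not shown in this diff), tensors are modeled as raw byte strings, and `md5_hex`, `size_percent`, and `roundtrip_report` are hypothetical helper names introduced here for illustration.

```python
import hashlib
import struct
import zlib
from typing import Dict


def md5_hex(data: bytes) -> str:
    """Return the MD5 digest of a byte string as lowercase hex."""
    return hashlib.md5(data).hexdigest()


def size_percent(original_size: float, compressed_size: float) -> float:
    """Compressed size as a percentage of the original, rounded to 0.1."""
    return round(100.0 * compressed_size / original_size, 1)


def roundtrip_report(tensors: Dict[str, bytes]) -> Dict[str, bool]:
    """Compress and decompress each tensor, then compare MD5 digests.

    zlib is only a stand-in for a lossless codec (assumption); any
    lossless scheme must yield True for every tensor.
    """
    report = {}
    for name, raw in tensors.items():
        packed = zlib.compress(raw, 9)      # "store" step
        restored = zlib.decompress(packed)  # "load time" step
        report[name] = md5_hex(restored) == md5_hex(raw)
    return report


# Toy FP32 "weights" packed as little-endian floats (illustrative names).
tensors = {
    "layer0.weight": struct.pack("<8f", *[0.125 * i for i in range(8)]),
    "layer0.bias": struct.pack("<4f", 0.0, -1.5, 2.25, 3.0),
}

report = roundtrip_report(tensors)
print(report)

# The stats table's ratio: 0.39 GB compressed out of 0.55 GB original.
print(size_percent(0.55, 0.39))
```

A matching digest for every tensor is what "lossless" means here in practice; a quantized copy of the same weights would fail this check because its bytes differ from the original.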