wpferrell committed on
Commit
9970eee
·
verified ·
1 Parent(s): 840d2b4

docs: 2.4.0 version-compat note + hardware-guide refresh

Files changed (1)
  1. README.md +16 -10
README.md CHANGED
@@ -16,35 +16,41 @@ Compressed with [BigSmall](https://github.com/wpferrell/Bigsmall) -- decompresse
16
 
17
  ## Quick start
18
 
19
- `ash
20
  pip install bigsmall
21
- `
22
 
23
- `python
 
 
 
 
 
24
  import bigsmall
25
  bigsmall.install_hook()
26
  from transformers import AutoModelForCausalLM
27
  model = AutoModelForCausalLM.from_pretrained("wpferrell/gpt2-bigsmall")
28
- `
29
 
30
  ## Streaming loader -- run on any hardware
31
 
32
  BigSmall's streaming loader decompresses one layer at a time directly into VRAM. Peak memory is one layer -- not the whole model. A 4 GB GPU can run Mistral 7B losslessly.
33
 
34
- `python
35
  from bigsmall import StreamingLoader
36
  from transformers import AutoModelForCausalLM
37
 
38
  with StreamingLoader("wpferrell/gpt2-bigsmall", device="cuda") as loader:
39
      model = loader.load_model(AutoModelForCausalLM)
40
- `
41
 
42
  | Your GPU | Models you can run |
43
  |----------|--------------------|
44
- | 2 GB | Small models, GPT-2, Gemma 270M |
45
- | 4 GB | Mistral 7B, Llama 3.1 8B, Gemma 2B, Llama 3.2 3B |
46
- | 8 GB | Qwen 2.5 14B, Gemma 2 9B |
47
- | 24 GB | Llama 70B, Qwen 72B, DeepSeek V4-Flash |
 
48
  | CPU only | Everything -- slower but full quality |
49
 
50
  BigSmall is the only lossless compression tool with a streaming loader. DFloat11 and ZipNN load the full model into memory.
 
16
 
17
  ## Quick start
18
 
19
+ ```bash
20
  pip install bigsmall
21
+ ```
22
 
23
+ > **Version compatibility:** Models compressed with `bigsmall` 2.4.0+ may use
24
+ > container format v2 for high-kurtosis tensors and require `bigsmall >= 2.4.0`
25
+ > to decompress. Run `pip install --upgrade bigsmall` to update.
26
+
27
+
28
+ ```python
29
  import bigsmall
30
  bigsmall.install_hook()
31
  from transformers import AutoModelForCausalLM
32
  model = AutoModelForCausalLM.from_pretrained("wpferrell/gpt2-bigsmall")
33
+ ```
34
 
35
  ## Streaming loader -- run on any hardware
36
 
37
  BigSmall's streaming loader decompresses one layer at a time directly into VRAM. Peak memory is one layer -- not the whole model. A 4 GB GPU can run Mistral 7B losslessly.
38
 
39
+ ```python
40
  from bigsmall import StreamingLoader
41
  from transformers import AutoModelForCausalLM
42
 
43
  with StreamingLoader("wpferrell/gpt2-bigsmall", device="cuda") as loader:
44
      model = loader.load_model(AutoModelForCausalLM)
45
+ ```
46
 
47
  | Your GPU | Models you can run |
48
  |----------|--------------------|
49
+ | 2 GB | GPT-2, Gemma 270M |
50
+ | 4 GB | Llama 3.2 3B, Mistral 7B, Gemma 2B, Llama 3.1 8B |
51
+ | 8 GB | Qwen 2.5 14B, Gemma 2 9B, Phi-3.5 Mini |
52
+ | 12 GB | Qwen 2.5 32B, Gemma 3 12B |
53
+ | 24 GB | Llama 70B, Qwen 72B, Gemma 3 27B, DeepSeek V4-Flash |
54
  | CPU only | Everything -- slower but full quality |
55
 
56
  BigSmall is the only lossless compression tool with a streaming loader. DFloat11 and ZipNN load the full model into memory.
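
Review note on the version-compatibility text this commit adds: users may want to confirm that their installed `bigsmall` meets the `>= 2.4.0` requirement before loading a model. The snippet below is a minimal sketch of such a check, not part of the BigSmall API. It reads the installed version through the standard library's `importlib.metadata`. The helper names `release_tuple`, `meets_minimum`, and `installed_bigsmall_version` are hypothetical and exist only for illustration. It compares the numeric release segment only, so a pre-release such as `2.4.0rc1` counts as `2.4.0`.

```python
import re
from importlib import metadata
from typing import Optional

# Minimum version named in the README's compatibility note.
MIN_VERSION = "2.4.0"


def release_tuple(version: str) -> tuple:
    """Return the leading numeric release segment, e.g. '2.4.0rc1' -> (2, 4, 0)."""
    match = re.match(r"\d+(?:\.\d+)*", version.strip())
    if match is None:
        raise ValueError(f"unrecognised version string: {version!r}")
    return tuple(int(part) for part in match.group().split("."))


def meets_minimum(installed: str, minimum: str = MIN_VERSION) -> bool:
    """Compare release segments numerically, padding the shorter one with zeros."""
    a, b = release_tuple(installed), release_tuple(minimum)
    width = max(len(a), len(b))
    a += (0,) * (width - len(a))
    b += (0,) * (width - len(b))
    return a >= b


def installed_bigsmall_version() -> Optional[str]:
    """Return the installed bigsmall version string, or None if it is not installed."""
    try:
        return metadata.version("bigsmall")
    except metadata.PackageNotFoundError:
        return None


if __name__ == "__main__":
    found = installed_bigsmall_version()
    if found is None:
        print("bigsmall is not installed; run: pip install bigsmall")
    elif meets_minimum(found):
        print(f"bigsmall {found} meets the {MIN_VERSION} minimum")
    else:
        print(f"bigsmall {found} is older than {MIN_VERSION}; "
              "run: pip install --upgrade bigsmall")
```

The comparison is numeric per component, so `2.10.0` correctly ranks above `2.4.0`, which a plain string comparison would get wrong. If the third-party `packaging` library is available, `packaging.version.Version` handles the full PEP 440 ordering, including pre-releases, and would be the more robust choice.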