wpferrell/gpt2-bigsmall

@@ -16,35 +16,41 @@ Compressed with [BigSmall](https://github.com/wpferrell/Bigsmall) -- decompresse
 ## Quick start
-`ash
 pip install bigsmall
-`
-`python
 import bigsmall
 bigsmall.install_hook()
 from transformers import AutoModelForCausalLM
 model = AutoModelForCausalLM.from_pretrained("wpferrell/gpt2-bigsmall")
-`
 ## Streaming loader -- run on any hardware
 BigSmall's streaming loader decompresses one layer at a time directly into VRAM. Peak memory is one layer -- not the whole model. A 4 GB GPU can run Mistral 7B losslessly.
-`python
 from bigsmall import StreamingLoader
 from transformers import AutoModelForCausalLM
 with StreamingLoader("wpferrell/gpt2-bigsmall", device="cuda") as loader:
     model = loader.load_model(AutoModelForCausalLM)
-`
 | Your GPU | Models you can run |
 |----------|--------------------|
-| 2 GB | Small models, GPT-2, Gemma 270M |
-| 4 GB | Mistral 7B, Llama 3.1 8B, Gemma 2B, Llama 3.2 3B |
-| 8 GB | Qwen 2.5 14B, Gemma 2 9B |
-| 24 GB | Llama 70B, Qwen 72B, DeepSeek V4-Flash |
 | CPU only | Everything -- slower but full quality |
 BigSmall is the only lossless compression tool with a streaming loader. DFloat11 and ZipNN load the full model into memory.

 ## Quick start
+```bash
 pip install bigsmall
+```
+> **Version compatibility:** Models compressed with `bigsmall` 2.4.0+ may use
+> container format v2 for high-kurtosis tensors and require `bigsmall >= 2.4.0`
+> to decompress. Run `pip install --upgrade bigsmall` to update.
+```python
 import bigsmall
 bigsmall.install_hook()
 from transformers import AutoModelForCausalLM
 model = AutoModelForCausalLM.from_pretrained("wpferrell/gpt2-bigsmall")
+```
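The compatibility note above says container format v2 needs `bigsmall >= 2.4.0`. One way to catch a stale install before loading is to compare the installed version against that minimum. This is a minimal sketch using only the standard library; the helper names (`parse_release`, `meets_minimum`, `check_bigsmall`) are hypothetical and not part of BigSmall's API.

```python
from importlib.metadata import PackageNotFoundError, version

# Minimum version taken from the compatibility note (container format v2).
MIN_VERSION = (2, 4, 0)


def parse_release(v: str) -> tuple[int, ...]:
    """Return the leading numeric components, e.g. '2.4.1' -> (2, 4, 1).

    Suffixes such as 'rc1' are dropped, so a pre-release is treated like
    the corresponding release. That simplification is an assumption here.
    """
    parts = []
    for piece in v.split("."):
        digits = ""
        for ch in piece:
            if not ch.isdigit():
                break
            digits += ch
        if not digits:
            break
        parts.append(int(digits))
    return tuple(parts)


def meets_minimum(installed: str, minimum: tuple[int, ...] = MIN_VERSION) -> bool:
    """True if the installed version string is at least `minimum`."""
    release = parse_release(installed)
    # Pad short versions such as '3' to (3, 0, 0) before comparing.
    padded = release + (0,) * (len(minimum) - len(release))
    return padded >= minimum


def check_bigsmall() -> bool:
    """True if bigsmall is installed and new enough (hypothetical helper)."""
    try:
        return meets_minimum(version("bigsmall"))
    except PackageNotFoundError:
        return False
```

If `check_bigsmall()` returns `False`, running `pip install --upgrade bigsmall` as described in the note should bring the install up to date.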
 ## Streaming loader -- run on any hardware
 BigSmall's streaming loader decompresses one layer at a time directly into VRAM. Peak memory is one layer -- not the whole model. A 4 GB GPU can run Mistral 7B losslessly.
+```python
 from bigsmall import StreamingLoader
 from transformers import AutoModelForCausalLM
 with StreamingLoader("wpferrell/gpt2-bigsmall", device="cuda") as loader:
     model = loader.load_model(AutoModelForCausalLM)
+```
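To make the "peak memory is one layer" claim concrete, the arithmetic can be sketched as below. The parameter counts are illustrative placeholders, not measurements of any real model, and the helper names are hypothetical. The sketch counts decompressed weights only; activations, KV cache, and framework overhead are not included.

```python
def layer_bytes(params_per_layer: int, bytes_per_param: int = 2) -> int:
    """Decompressed size of one layer; 2 bytes per parameter assumes BF16/FP16."""
    return params_per_layer * bytes_per_param


def model_bytes(params_per_layer: int, num_layers: int, bytes_per_param: int = 2) -> int:
    """Decompressed size of all layers together (embeddings ignored for simplicity)."""
    return layer_bytes(params_per_layer, bytes_per_param) * num_layers


def to_gib(num_bytes: int) -> float:
    """Convert bytes to GiB."""
    return num_bytes / 2**30


if __name__ == "__main__":
    # Illustrative numbers only: 32 layers of 200M parameters each.
    per_layer = layer_bytes(200_000_000)
    whole = model_bytes(200_000_000, 32)
    print(f"one layer:   {to_gib(per_layer):.2f} GiB")
    print(f"all layers:  {to_gib(whole):.2f} GiB")
```

Under these assumed numbers, a single decompressed layer is a small fraction of the full weight set, which is the gap the streaming loader is described as exploiting.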
 | Your GPU | Models you can run |
 |----------|--------------------|
+| 2 GB | GPT-2, Gemma 270M |
+| 4 GB | Llama 3.2 3B, Mistral 7B, Gemma 2B, Llama 3.1 8B |
+| 8 GB | Qwen 2.5 14B, Gemma 2 9B, Phi-3.5 Mini |
+| 12 GB | Qwen 2.5 32B, Gemma 3 12B |
+| 24 GB | Llama 70B, Qwen 72B, Gemma 3 27B, DeepSeek V4-Flash |
 | CPU only | Everything -- slower but full quality |
 BigSmall is the only lossless compression tool with a streaming loader. DFloat11 and ZipNN load the full model into memory.