wpferrell/gemma-2-2b-bigsmall

@@ -1,76 +1,55 @@
----
-license: gemma
-tags:
-  - bigsmall
-  - compression
-  - lossless
-  - gemma
-  - google
----
-# Gemma 2 2B (BigSmall compressed)
-**9.74 GB -> 6.37 GB (BF16). Lossless. Zero inference overhead. Any hardware.**
-Compressed with [BigSmall](https://github.com/wpferrell/Bigsmall) -- decompresses once at load time, runs at full native speed. Every weight is bit-identical to the original.
-## Quick start
-```bash
-pip install bigsmall
-```
-> **Version compatibility:** Models compressed with `bigsmall` 2.4.0+ may use
-> container format v2 for high-kurtosis tensors and require `bigsmall >= 2.4.0`
-> to decompress. Run `pip install --upgrade bigsmall` to update.
-```python
-import bigsmall
-bigsmall.install_hook()
-from transformers import AutoModelForCausalLM
-model = AutoModelForCausalLM.from_pretrained("wpferrell/gemma-2-2b-bigsmall")
-```
-## Streaming loader -- run on any hardware
-BigSmall's streaming loader decompresses one layer at a time directly into VRAM. Peak memory is one layer -- not the whole model. A 4 GB GPU can run Mistral 7B losslessly.
-```python
-from bigsmall import StreamingLoader
-from transformers import AutoModelForCausalLM
-with StreamingLoader("wpferrell/gemma-2-2b-bigsmall", device="cuda") as loader:
-    model = loader.load_model(AutoModelForCausalLM)
-```
-| Your GPU | Models you can run |
-|----------|--------------------|
-| 2 GB | GPT-2, Gemma 270M |
-| 4 GB | Llama 3.2 3B, Mistral 7B, Gemma 2B, Llama 3.1 8B |
-| 8 GB | Qwen 2.5 14B, Gemma 2 9B, Phi-3.5 Mini |
-| 12 GB | Qwen 2.5 32B, Gemma 3 12B |
-| 24 GB | Llama 70B, Qwen 72B, Gemma 3 27B, DeepSeek V4-Flash |
-| CPU only | Everything -- slower but full quality |
-BigSmall is the only lossless compression tool with a streaming loader. DFloat11 and ZipNN load the full model into memory.
-## Why BigSmall vs DFloat11
-| | BigSmall | DFloat11 |
-|--|--|--|
-| Inference overhead | **None** | ~2x at batch=1 |
-| Hardware | **CPU, Apple Silicon, AMD, any GPU** | CUDA only |
-| FP32 support | **Yes** | No |
-| Fine-tuning safe | **Yes** | No |
-| Streaming loader | **Yes -- peak RAM < 2 GB** | No |
-## Compression stats
-| Original | Compressed | Ratio | Format | Verified |
-|----------|------------|-------|--------|---------|
-| 9.74 GB | 6.37 GB | 65.4% | BF16 | md5 every tensor |
-- GitHub: [wpferrell/Bigsmall](https://github.com/wpferrell/Bigsmall)
-- All models: [huggingface.co/wpferrell](https://huggingface.co/wpferrell)

+---
+license: gemma
+tags:
+  - bigsmall
+  - compressed
+  - lossless
+---
+# Gemma 2 2B — BigSmall Compressed
+Lossless compressed version of [google/gemma-2-2b](https://huggingface.co/google/gemma-2-2b).
+**17.4% smaller (82.6% of the original size). Bit-identical weights. No quality loss.**
+## Quick start
+```bash
+pip install bigsmall
+```
+```python
+from transformers import AutoModelForCausalLM, AutoTokenizer
+model = AutoModelForCausalLM.from_pretrained("wpferrell/gemma-2-2b-bigsmall")
+tokenizer = AutoTokenizer.from_pretrained("google/gemma-2-2b")
+```
+## Low-VRAM streaming
+```python
+from bigsmall import BigSmallStreamingModel
+model = BigSmallStreamingModel.from_pretrained("wpferrell/gemma-2-2b-bigsmall")
+```
+## Details
+| | Value |
+|---|---|
+| Original model | [google/gemma-2-2b](https://huggingface.co/google/gemma-2-2b) |
+| Original size | 9.8 GB |
+| Compressed size | 8.09 GB |
+| Compression ratio | 82.6% of original |
+| Format | BigSmall lossless (.bs) |
+| Reconstruction | Bit-identical, md5-verified |
+| Requires | bigsmall >= 3.0.0 |
+## All pre-compressed models
+See [wpferrell on Hugging Face](https://huggingface.co/wpferrell) for all available models.
+## License
+Model weights: same license as the original model ([google/gemma-2-2b](https://huggingface.co/google/gemma-2-2b)).
+BigSmall format: [Elastic License 2.0](https://www.elastic.co/licensing/elastic-license) — free for personal, research, and internal commercial use.
+Commercial/SaaS licensing: wpferrell@gmail.com
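
The Details table in the new card lists 9.8 GB original and 8.09 GB compressed. The sketch below recomputes the percentages from those two figures and shows how an md5 comparison can confirm that a lossless round trip is bit-identical. It is only an illustration: `size_summary`, `md5_hex`, and `roundtrip_is_bit_identical` are hypothetical helpers, and `zlib` stands in for a generic lossless codec. None of this is BigSmall's API or algorithm.

```python
import hashlib
import zlib


def size_summary(original_gb: float, compressed_gb: float) -> tuple[float, float]:
    """Return (percent of original size, percent saved), each rounded to 0.1."""
    pct_of_original = round(100 * compressed_gb / original_gb, 1)
    pct_saved = round(100 - pct_of_original, 1)
    return pct_of_original, pct_saved


def md5_hex(data: bytes) -> str:
    """Hex md5 digest of a byte string (integrity check, not a security measure)."""
    return hashlib.md5(data).hexdigest()


def roundtrip_is_bit_identical(raw: bytes) -> bool:
    """Compress and decompress with zlib (a stand-in codec), then compare digests."""
    restored = zlib.decompress(zlib.compress(raw))
    return md5_hex(restored) == md5_hex(raw)


if __name__ == "__main__":
    # Figures from the new card's Details table.
    print(size_summary(9.8, 8.09))
    # Figures from the removed card's Compression stats table.
    print(size_summary(9.74, 6.37))
    # Any byte payload works here; real tensors would be serialized to bytes first.
    print(roundtrip_is_bit_identical(bytes(range(256)) * 4))
```

With the new table's figures this gives 82.6% of the original size, i.e. a 17.4% reduction. The removed card's 9.74 GB -> 6.37 GB gives 65.4% of the original, a 34.6% reduction.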