olmo2-1b-uc2p79
A patent-pending compressed reference variant of allenai/OLMo-2-0425-1B at the Track A 2.798 bits-per-weight cohort design target (this specific fit measures 2.7819 bpw effective: the global_bpw of the body Linears plus 0.0059 bpw of codec overhead, both decoded from ultracompress.json). Compressed via the UltraCompress row-overlay quantization method (USPTO 64/049,511 + 64/049,517, patent pending).
Read this first: this repository ships in dual format.
- `model.safetensors` (~2.7 GB): FP16 reconstruction. Loadable directly via `transformers.from_pretrained`. Use this if your runtime expects standard HF safetensors.
- `model.uc.bin` (~491 MB at 2.7819 bpw on-disk): the actually-packed binary at the claimed sub-3-bpw operating point. Loadable via `pip install ultracompress`. This is the artifact whose disk size matches the headline compression number.

Both files reconstruct the same compressed weights to within FP16 precision (verified bit-equivalent per the `pack_v17.py` round-trip protocol on this fit). Buyers pick based on runtime: enterprise inference platforms running standard transformers loaders use the safetensors; edge / on-device deployments using the UltraCompress runtime use the packed binary. The model-card claims (2.798 bpw cohort design target, 2.7819 bpw measured on this fit) describe the information content of either file; the safetensors is bigger on disk but represents the same compressed model, not a different one.

The `ultracompress.json` manifest declares both files in its `formats` block with per-file SHA-256, so `uc info` validates either format end-to-end.
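As a rough sanity check on the headline number, the packed file size and the base model's parameter count should roughly reproduce the effective bpw. A minimal sketch; the ~1.48B total-parameter figure for OLMo-2-0425-1B is an approximation, not a value decoded from the manifest:

```python
# Approximate effective bits-per-weight from the packed artifact's disk size.
# Both inputs below are approximations taken from this card, not exact values.
size_bytes = 491 * 2**20   # model.uc.bin, ~491 MB
params = 1.48e9            # allenai/OLMo-2-0425-1B total parameters (approx.)

print(f"~{size_bytes * 8 / params:.2f} bits per weight")  # ~2.78
```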
Quick start
```bash
pip install ultracompress
uc pull SipsaLabs/olmo2-1b-uc2p79
uc info ./models/SipsaLabs_olmo2-1b-uc2p79
```
The CLI streams the artifact, validates the manifest (SHA-256 + size for every declared file), and surfaces the compression metadata in one read.
Or with huggingface_hub directly:
```python
from huggingface_hub import snapshot_download

local = snapshot_download("SipsaLabs/olmo2-1b-uc2p79")
```
Loading the model
The reconstructed weights are stored in standard HF FP16 safetensors layout, so any transformers-compatible runtime can load the model. Example:
```python
from transformers import AutoConfig, AutoModelForCausalLM, AutoTokenizer
import torch

# Load the compressed weights from this repository
local = "./models/SipsaLabs_olmo2-1b-uc2p79"
cfg = AutoConfig.from_pretrained(local)
model = AutoModelForCausalLM.from_pretrained(
    local,
    dtype=torch.float16,
    config=cfg,
).to("cuda")

# The tokenizer is unchanged from allenai/OLMo-2-0425-1B.
# NOTE on trust_remote_code: we ship only pure quantized weights.
# `trust_remote_code=True` is therefore not needed for loading the local
# artifact. The flag IS still passed to the upstream tokenizer below because
# the base model's tokenizer uses it; that is the customer's choice to trust
# the upstream model author.
#
# We recommend loading the tokenizer directly from the upstream OLMo-2-1B
# repo, which is the path that the `transformers` AutoTokenizer auto-resolves
# most cleanly across versions.
tok = AutoTokenizer.from_pretrained("allenai/OLMo-2-0425-1B", trust_remote_code=True)

prompt = "The capital of France is"
inputs = tok(prompt, return_tensors="pt").to("cuda")
out = model.generate(**inputs, max_new_tokens=20, do_sample=False)
print(tok.decode(out[0], skip_special_tokens=True))
```
What's in this artifact
| File | Size | Description |
|---|---|---|
| `model.safetensors` | ~2.7 GB | FP16-reconstructed weights; direct `transformers.from_pretrained` compatibility |
| `model.uc.bin` | ~491 MB | Packed UltraCompress binary at 2.7819 bpw on-disk; load via `pip install ultracompress` |
| `ultracompress.json` | <2 KB | Provenance manifest with method, bpw, base-model id, USPTO references, license name, per-file SHA-256, and a `formats` block declaring both weight files |
| `config.json` | <2 KB | Inherited from the base OLMo-2-1B model |
| `tokenizer.json` / `tokenizer_config.json` / `special_tokens_map.json` / `merges.txt` / `vocab.json` / `added_tokens.json` / `chat_template.jinja` | ~14 MB | Tokenizer files copied from the base model |
| `LICENSE` | ~7 KB | Sipsa Labs Research and Evaluation License v1.0 (full text) |
| `generation_config.json` | <1 KB | Inherited from base |
`uc info ./models/SipsaLabs_olmo2-1b-uc2p79` will validate every entry in the manifest's `files` block against the actual on-disk size and SHA-256: tamper-evidence you can read in one command.
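If you would rather not install the CLI, the same check is easy to script. A minimal sketch, assuming the manifest's `files` block maps each filename to a `sha256` hex digest and a `size` in bytes; the exact schema is defined by the shipped `ultracompress.json`, so the field names here are assumptions:

```python
import hashlib
import json
import pathlib

root = pathlib.Path("./models/SipsaLabs_olmo2-1b-uc2p79")
manifest = json.loads((root / "ultracompress.json").read_text())

# Assumed schema: manifest["files"] = {name: {"sha256": ..., "size": ...}, ...}
for name, entry in manifest["files"].items():
    blob = (root / name).read_bytes()
    size_ok = len(blob) == entry["size"]
    hash_ok = hashlib.sha256(blob).hexdigest() == entry["sha256"]
    print(f"{name}: size {'OK' if size_ok else 'MISMATCH'}, "
          f"sha256 {'OK' if hash_ok else 'MISMATCH'}")
```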
Compression details
| Metric | Value |
|---|---|
| Method | UltraCompress row-overlay quantization (Track A) |
| Method version | v17hi |
| Operating-point bpw (cohort design target) | 2.798 |
| Measured effective bpw (this fit, packed-binary on-disk) | 2.7819 |
| Base model | allenai/OLMo-2-0425-1B |
| On-disk file size | ~2.7 GB (FP16 reconstruction; see "Read this first" above) |
| Patent posture | USPTO 64/049,511 (Track A) + 64/049,517 (Track B), patent pending |
| Filed | 2026-04-25 |
The 2.782-bpw operating point is the v17hi line of the patent-pending row-overlay quantization method described in USPTO 64/049,511. A complementary 2.40-bpw operating point on the same base model is documented internally (v17 line; packed-binary round-trip verified on Qwen3-1.7B and 5 other models in the Sipsa Labs cohort) and will be published as a sibling artifact in this organization.
Catastrophic-failure check
A "catastrophic failure" is defined as a downstream-task perplexity ratio greater than 10Γ the FP16 baseline. On Sipsa Labs' internal 6-model cohort (TinyLlama-1.1B, OLMo-2-1B, SmolLM2-1.7B, Qwen3-1.7B, Mistral-7B-v0.3, Qwen3-8B) at the Track A operating point: 0 of 6 models exhibit catastrophic failure. This artifact (OLMo-2-1B at 2.782 bpw): non-catastrophic.
The cohort framing matters: this is a property of the method on this cohort, not an absolute claim about every possible base model.
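In code, the check is a one-line threshold on the perplexity ratio. A trivial sketch; `is_catastrophic` is an illustrative helper, not part of the UltraCompress API:

```python
def is_catastrophic(ppl_ratio: float, threshold: float = 10.0) -> bool:
    # "Catastrophic failure": compressed-vs-FP16 perplexity ratio > 10x.
    return ppl_ratio > threshold

# This artifact's WikiText-103 ratio, from the cohort table below:
assert not is_catastrophic(1.165)
```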
Cohort scaling: retention scales with model size
The same Track A 2.798 bpw operating point measured across the 6-model Sipsa cohort (n=500, seed=42, WikiText-103 perplexity ratio):
| Model | Body params | T1 retention vs FP16 | T10 retention vs FP16 | PPL ratio |
|---|---|---|---|---|
| OLMo-2-1B | 1.00B | 94.19% | 97.04% | 1.165 |
| TinyLlama-1.1B | 1.10B | 96.37% | 97.88% | 1.097 |
| SmolLM2-1.7B | 1.71B | 93.72% | 96.71% | 1.218 |
| Qwen3-1.7B | 1.72B | 93.81% | 96.55% | 1.225 |
| Mistral-7B-v0.3 | 7.25B | 98.04% | 99.06% | 1.075 |
| Qwen3-8B | 8.19B | 97.63% | 98.84% | 1.067 |
Spearman rank correlation between body-parameter count and T1 retention: +0.486 for UltraCompress. bitsandbytes NF4 at 4.0 bpw on the same cohort: -0.086, essentially flat.
UltraCompress retention scales +4.32 percentage points going from 1B to 8B. NF4 scales +1.93 pp. The scaling slope is 2.2× NF4's.
The mechanism is design-level: row-overlay's per-row scale + learned codebook + rotation matrix calibrate to the per-model magnitude distribution. Larger transformer matrices give the codebook more rows to learn from. NF4 is a fixed dictionary: no per-model adaptation, no scaling.
n=6 caveat: this is the 6-model cohort tested by Sipsa Labs. The scaling claim is a property of the method on this cohort. Generalization to broader cohorts is the open empirical question. Replication invited: open an issue at github.com/sipsalabs/ultracompress/issues with model + seed + result.
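The headline correlation can be re-derived from the cohort table above. A minimal sketch using scipy; the NF4 per-model numbers are not reproduced on this card, so only the UltraCompress figure is checked:

```python
from scipy.stats import spearmanr

# Body-parameter counts (B) and T1 retention (%) from the cohort table above.
params = [1.00, 1.10, 1.71, 1.72, 7.25, 8.19]
t1 = [94.19, 96.37, 93.72, 93.81, 98.04, 97.63]

rho, _ = spearmanr(params, t1)
print(f"Spearman rho = {rho:+.3f}")          # -> +0.486
print(f"slope ratio  = {4.32 / 1.93:.1f}x")  # -> 2.2x vs NF4
```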
Quality benchmarks
A small live benchmark is included below as a sanity-check; the full per-task benchmark numbers are intended to be reproduced in the buyer's own evaluation harness against their own baselines.
Reference benchmark (this artifact)
| Task | Setup | Compressed acc | Compressed acc_norm |
|---|---|---|---|
| HellaSwag | (TBD β reproducible via the lm-eval command below) | TBD | TBD |
The 500-sample subset is intentionally small; this is a reference number, not a final eval. The full 10042-sample HellaSwag benchmark, plus ARC-Challenge, MMLU, and any buyer-specific tasks, should be reproduced by the buyer with the command below. Different sample counts and seed configurations will produce different numbers.
Reproduce
```bash
# via lm-eval-harness directly (recommended workaround for a transformers
# 4.57.x tokenizer-from-local-path issue: point the tokenizer at the
# upstream repo)
python -m lm_eval --model hf \
  --model_args "pretrained=./models/SipsaLabs_olmo2-1b-uc2p79,tokenizer=allenai/OLMo-2-0425-1B,dtype=float16,trust_remote_code=True" \
  --tasks hellaswag,arc_challenge,mmlu \
  --limit 500 --batch_size 8 --device cuda:0
```
For a paired FP16-baseline-vs-compressed comparison on the same task and same seed (the right way to read retention numbers), substitute `pretrained=allenai/OLMo-2-0425-1B` in a separate run and compare task-by-task.
The cohort-level claim (95.6% T1 retention, zero catastrophic failures across 6 models at the Track A operating point) comes from the WikiText-103 perplexity protocol documented in the patent specifications, not from HellaSwag accuracy. Different evaluation surfaces measure different things; the artifact-specific numbers above are the reproducible HellaSwag sanity-check, not the full cohort claim.
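For readers who want to approximate that perplexity protocol without the harness, the sketch below computes a WikiText-103 perplexity ratio with plain transformers. It is illustrative only: the internal protocol's exact sampling (n=500, seed=42) and windowing are not published on this card, so this simply takes the first 500 non-empty test documents.

```python
import math

import torch
from datasets import load_dataset
from transformers import AutoModelForCausalLM, AutoTokenizer

def mean_nll(model_id: str, texts: list[str], device: str = "cuda") -> float:
    tok = AutoTokenizer.from_pretrained("allenai/OLMo-2-0425-1B", trust_remote_code=True)
    model = AutoModelForCausalLM.from_pretrained(model_id, dtype=torch.float16).to(device)
    total_nll, total_tokens = 0.0, 0
    for text in texts:
        ids = tok(text, return_tensors="pt", truncation=True,
                  max_length=1024).input_ids.to(device)
        if ids.shape[1] < 2:
            continue
        with torch.no_grad():
            loss = model(ids, labels=ids).loss  # mean per-token NLL
        total_nll += loss.item() * (ids.shape[1] - 1)
        total_tokens += ids.shape[1] - 1
    return total_nll / total_tokens

test = load_dataset("wikitext", "wikitext-103-raw-v1", split="test")
texts = [t for t in test["text"] if t.strip()][:500]

ppl_fp16 = math.exp(mean_nll("allenai/OLMo-2-0425-1B", texts))
ppl_uc = math.exp(mean_nll("./models/SipsaLabs_olmo2-1b-uc2p79", texts))
print(f"PPL ratio = {ppl_uc / ppl_fp16:.3f}")  # cohort table reports 1.165
```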
For a Compression Assessment engagement that includes the buyer's specific baseline, evaluation tasks, and a written readout: email founder@sipsalabs.com.
Intended use
Permitted under this License (free of charge):
- Personal, non-commercial research
- Academic research at non-profit institutions (with attribution)
- Pre-purchase evaluation by an enterprise considering negotiating a commercial license, for a period not to exceed 90 days from first download
Requires a separate commercial license (email legal@sipsalabs.com):
- Production deployment in any commercial product or service
- Use in an API or hosted inference service offered to third parties
- Embedding in or shipping within hardware products, consumer devices, automobiles, robotics platforms
- Training of any derivative model for commercial use
- Any use by for-profit entities other than internal evaluation
The full License is in LICENSE and at sipsalabs.com.
Out-of-scope use
This artifact is published for research and evaluation. It is not intended for safety-critical, life-critical, or human-subject decision-making applications. Compression introduces measurable quality regression versus the FP16 baseline; do not deploy this artifact in a setting where that regression is unacceptable. Run the buyer's own evaluation before any production decision.
Limitations
- The compression methods are post-training and preserve the base model's strengths and weaknesses. Whatever bias, refusal behavior, or out-of-distribution failure modes the OLMo-2-1B FP16 base has, this artifact inherits.
- This release stores reconstructed weights in FP16 layout in `model.safetensors`; the runtime savings live in the loader and the packed `model.uc.bin` artifact, not in the safetensors file's on-disk footprint.
- Direct integration with quantization-aware runtimes (llama.cpp / TensorRT-LLM / vLLM quantization paths) is on the v0.2 roadmap. For v0.1.x, `transformers` and the UltraCompress CLI are the supported load paths.
Reproducibility
Every public claim on this card maps to a verifiable on-disk artifact:
```bash
# Pull this artifact
uc pull SipsaLabs/olmo2-1b-uc2p79

# Verify the manifest end-to-end (size + SHA-256 for every declared file)
uc info ./models/SipsaLabs_olmo2-1b-uc2p79

# Reproduce the benchmark numbers in your own evaluation harness
uc bench ./models/SipsaLabs_olmo2-1b-uc2p79 \
  --tasks hellaswag,arc_challenge,mmlu \
  --limit 500 --batch-size 8 --device cuda:0
```
For a SHA-256 manifest of all training and evaluation inputs that produced this artifact (private context, available under NDA): legal@sipsalabs.com.
Citation
If you use this artifact in research, please cite:
```bibtex
@misc{ounnar2026ultracompress,
  title        = {UltraCompress: Patent-Pending Compression Infrastructure for Large Language Models},
  author       = {{Sipsa Labs, Inc.}},
  year         = {2026},
  note         = {U.S.\ patent applications 64/049,511 and 64/049,517, patent pending. Filed 2026-04-25.},
  howpublished = {\url{https://sipsalabs.com}},
}
```
Get in touch
- Commercial license: legal@sipsalabs.com
- Bugs / quality regressions: file at github.com/sipsalabs/ultracompress/issues
- Security issues: security@sipsalabs.com (do not file public issues)
- Press / media: press@sipsalabs.com
- Compression Assessment engagement: founder@sipsalabs.com
Sipsa Labs, Inc. · sipsalabs.com · patent pending · USPTO 64/049,511 + 64/049,517 (filed 2026-04-25)