Sipsa Labs

Compression infrastructure for the next generation of language models — Systems · Intelligence · Precision. UltraCompress is our flagship publicly shipped product.

Patent pending: USPTO 64/049,511 · USPTO 64/049,517 · Apache-2.0 · CLI

Latest — Streaming compression: full Qwen scaling curve, 72B on a single GPU (2026-05-04)

Per-layer streaming compression validated end-to-end across 8B → 72B, with peak VRAM bounded by roughly one transformer layer's footprint regardless of total model depth.

| Model | Baseline PPL | Compressed PPL | PPL ratio | Peak VRAM |
|---|---|---|---|---|
| Qwen3-8B | 16.79 | 17.26 | 1.0278× | 2.26 GB |
| Qwen3-14B | 15.44 | 15.61 | 1.0111× | 3.37 GB |
| Qwen3-32B | 13.77 | 14.27 | 1.0367× | 4.85 GB |
| Qwen2.5-72B | 8.92 | 9.07 | 1.0162× | 8.98 GB |
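The PPL-ratio column is simply compressed perplexity over baseline perplexity. The reported ratios were presumably computed from unrounded values, so recomputing from the rounded table entries reproduces them only to within about 1e-3:

```python
# Sanity check of the PPL-ratio column: ratio = compressed / baseline.
# Values copied from the table above; ratios match to ~1e-3 because the
# reported figures were presumably derived from unrounded perplexities.
rows = [
    ("Qwen3-8B",    16.79, 17.26, 1.0278),
    ("Qwen3-14B",   15.44, 15.61, 1.0111),
    ("Qwen3-32B",   13.77, 14.27, 1.0367),
    ("Qwen2.5-72B",  8.92,  9.07, 1.0162),
]
for name, base, comp, reported in rows:
    ratio = comp / base
    drift_pct = (ratio - 1.0) * 100  # e.g. ~1.6% for Qwen2.5-72B
    assert abs(ratio - reported) < 1e-3, name
```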

Qwen2.5-72B compresses at a peak of 8.98 GB VRAM on a single RTX 5090: production-grade quality (1.6% PPL drift) on consumer hardware. The 100T-on-one-GPU mission goes from aspirational to a math problem.
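The per-layer mechanism can be sketched as follows. This is a minimal illustration, not the UltraCompress implementation: the loader, the symmetric per-row int8 scheme, and all names here are assumptions chosen to show why peak memory stays at roughly one layer, whatever the model depth.

```python
# Illustrative sketch ONLY — not the UltraCompress algorithm or API.
# Each layer is loaded, quantized, appended to the output, and freed
# before the next one, so peak resident memory is ~one layer's weights.
import numpy as np

def quantize_layer(w: np.ndarray) -> tuple[np.ndarray, np.ndarray]:
    """Symmetric int8 quantization with a per-row scale (a common baseline)."""
    scale = np.abs(w).max(axis=1, keepdims=True) / 127.0
    scale[scale == 0] = 1.0  # avoid division by zero on all-zero rows
    q = np.clip(np.round(w / scale), -127, 127).astype(np.int8)
    return q, scale

def stream_compress(layer_loader, n_layers):
    """Iterate layers one at a time; only the current layer is resident."""
    peak_bytes = 0
    compressed = []
    for i in range(n_layers):
        w = layer_loader(i)                   # load ONE layer from storage
        peak_bytes = max(peak_bytes, w.nbytes)
        q, scale = quantize_layer(w)
        compressed.append((q, scale))         # int8 payload, ~4x smaller
        del w                                 # free before loading the next
    return compressed, peak_bytes

# Toy "model": 12 layers of 256x256 fp32 weights, materialized on demand.
rng = np.random.default_rng(0)
loader = lambda i: rng.standard_normal((256, 256)).astype(np.float32)
compressed, peak = stream_compress(loader, n_layers=12)
# peak equals one layer's bytes (256*256*4), independent of n_layers.
```

Doubling `n_layers` in the toy run leaves `peak` unchanged, which is the property the 8B → 72B scaling curve above relies on.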

Source + reproduce: github.com/sipsalabs/ultracompress

```shell
pip install ultracompress
```

Reference models on this Hub

Pre-compressed open-weights variants of well-known base models. The base weights retain their Apache licenses; the compression metadata is released under the Sipsa Labs Research Evaluation License v1.0.

Rolling release: smollm2 · mistral · olmo2 · qwen3 variants throughout 2026-05.


Patents

USPTO 64/049,511 (Track A — Activation-Aware Row-Overlay Quantization) and 64/049,517 (Track B — Fractal Residual Recursion) filed April 25, 2026. Supplement covering streaming-compression mechanism filed May 2026.


Contact

sipsalabs.com · github.com/sipsalabs · @SipsaLabs on X
