Sipsa Labs
Compression infrastructure for the next generation of language models — Systems · Intelligence · Precision.

UltraCompress is our publicly shipped flagship product. Patent pending: USPTO 64/049,511 and USPTO 64/049,517. The CLI is Apache-2.0 licensed.
Latest — Streaming compression: full Qwen scaling curve, 72B on a single GPU (2026-05-04)
Per-layer streaming compression validated end-to-end across 8B → 72B with peak VRAM bounded by ~one transformer layer regardless of total model depth.
| Model | Baseline PPL | Compressed PPL | PPL ratio | Peak VRAM |
|---|---|---|---|---|
| Qwen3-8B | 16.79 | 17.26 | 1.0278× | 2.26 GB |
| Qwen3-14B | 15.44 | 15.61 | 1.0111× | 3.37 GB |
| Qwen3-32B | 13.77 | 14.27 | 1.0367× | 4.85 GB |
| Qwen2.5-72B | 8.92 | 9.07 | 1.0162× | 8.98 GB |
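A minimal sketch of the per-layer streaming idea, assuming a hypothetical loader and compressor (none of these names are from the UltraCompress codebase): because each layer is loaded, compressed, and freed before the next one is touched, peak VRAM tracks the size of one transformer layer rather than total model depth.

```python
import torch

def stream_compress(layer_paths, compress_fn, device="cuda"):
    """Compress a checkpoint one layer at a time.

    Hypothetical sketch: `layer_paths` lists per-layer shards on disk and
    `compress_fn` maps a dense layer state dict to its compressed form.
    Only one dense layer is ever resident on the GPU.
    """
    compressed = []
    for path in layer_paths:
        layer = torch.load(path, map_location=device)  # load one layer to GPU
        compressed.append(compress_fn(layer))          # compress it
        del layer                                      # free the dense weights
        torch.cuda.empty_cache()                       # return VRAM to the pool
    return compressed
```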
Qwen2.5-72B compresses within a peak of 8.98 GB VRAM on a single RTX 5090 — production-grade quality (1.6% PPL drift) on consumer hardware. The 100T-on-one-GPU mission goes from aspirational to a math problem.
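For reference, a short sketch of how PPL drift is computed; the ratios in the table are presumably taken from unrounded perplexities, so recomputing from the rounded values shown can differ in the last digit.

```python
def ppl_drift(baseline_ppl: float, compressed_ppl: float) -> float:
    """Relative perplexity increase after compression (0.016 == 1.6%)."""
    return compressed_ppl / baseline_ppl - 1.0

# Qwen2.5-72B row from the table above (perplexities rounded to 2 decimals).
print(f"{ppl_drift(8.92, 9.07):.2%}")  # 1.68% from rounded inputs vs. 1.6% reported
```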
Source + reproduce: github.com/sipsalabs/ultracompress
```bash
pip install ultracompress
```
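For a sense of programmatic use, a hypothetical Python entry point is sketched below; the module, function, and parameter names are illustrative assumptions, not the documented UltraCompress API, so consult the repository linked above for the real interface.

```python
# Illustrative only: `compress`, its parameters, and the `streaming` flag are
# assumptions, not the published ultracompress API.
from ultracompress import compress  # hypothetical entry point

out_dir = compress(
    model="Qwen/Qwen3-8B",     # Hub ID of the base model
    output_dir="qwen3-8b-uc",  # where compressed artifacts land
    streaming=True,            # hypothetical per-layer streaming switch
)
print(out_dir)
```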
Reference models on this Hub
Pre-compressed, open-weights variants of well-known base models. Base weights keep their original Apache license; compression metadata is released under the Sipsa Labs Research Evaluation License v1.0.
Rolling release: smollm2 · mistral · olmo2 · qwen3 variants throughout 2026-05.
Patents
USPTO 64/049,511 (Track A — Activation-Aware Row-Overlay Quantization) and USPTO 64/049,517 (Track B — Fractal Residual Recursion), both filed April 25, 2026. A supplement covering the streaming-compression mechanism was filed in May 2026.
Contact
- Pilots / commercial → founder@sipsalabs.com
- Patents / licensing → legal@sipsalabs.com
- Press / media → press@sipsalabs.com
- Security disclosure → security@sipsalabs.com
- General → hello@sipsalabs.com