Sipsa Labs

Compression infrastructure for the next generation of language models — Systems · Intelligence · Precision. UltraCompress is our flagship publicly shipped product.

Patent pending: USPTO 64/049,511 · USPTO 64/049,517 · Apache-2.0 · CLI

Latest — Streaming compression: full Qwen scaling curve, 72B on a single GPU (2026-05-04)

Per-layer streaming compression validated end-to-end across 8B → 72B, with peak VRAM bounded by roughly one transformer layer's footprint regardless of total model depth.

| Model | Baseline PPL | Compressed PPL | PPL ratio | Peak VRAM |
|---|---|---|---|---|
| Qwen3-8B | 16.79 | 17.26 | 1.0278× | 2.26 GB |
| Qwen3-14B | 15.44 | 15.61 | 1.0111× | 3.37 GB |
| Qwen3-32B | 13.77 | 14.27 | 1.0367× | 4.85 GB |
| Qwen2.5-72B | 8.92 | 9.07 | 1.0162× | 8.98 GB |
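The PPL-ratio column is simply compressed perplexity over baseline perplexity. The reported ratios were presumably computed from unrounded values, so recomputing from the rounded table entries reproduces them only to within about 1e-3:

```python
# Sanity check of the PPL-ratio column: ratio = compressed / baseline.
# Values copied from the table above; ratios match to ~1e-3 because the
# reported figures were presumably derived from unrounded perplexities.
rows = [
    ("Qwen3-8B",    16.79, 17.26, 1.0278),
    ("Qwen3-14B",   15.44, 15.61, 1.0111),
    ("Qwen3-32B",   13.77, 14.27, 1.0367),
    ("Qwen2.5-72B",  8.92,  9.07, 1.0162),
]
for name, base, comp, reported in rows:
    ratio = comp / base
    drift_pct = (ratio - 1.0) * 100  # e.g. ~1.6% for Qwen2.5-72B
    assert abs(ratio - reported) < 1e-3, name
```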

Qwen2.5-72B compresses at a peak of 8.98 GB VRAM on a single RTX 5090: production-grade quality (1.6% PPL drift) on consumer hardware. The 100T-on-one-GPU mission goes from aspirational to a math problem.
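The per-layer mechanism can be sketched as follows. This is a minimal illustration, not the UltraCompress implementation: the loader, the symmetric per-row int8 scheme, and all names here are assumptions chosen to show why peak memory stays at roughly one layer, whatever the model depth.

```python
# Illustrative sketch ONLY — not the UltraCompress algorithm or API.
# Each layer is loaded, quantized, appended to the output, and freed
# before the next one, so peak resident memory is ~one layer's weights.
import numpy as np

def quantize_layer(w: np.ndarray) -> tuple[np.ndarray, np.ndarray]:
    """Symmetric int8 quantization with a per-row scale (a common baseline)."""
    scale = np.abs(w).max(axis=1, keepdims=True) / 127.0
    scale[scale == 0] = 1.0  # avoid division by zero on all-zero rows
    q = np.clip(np.round(w / scale), -127, 127).astype(np.int8)
    return q, scale

def stream_compress(layer_loader, n_layers):
    """Iterate layers one at a time; only the current layer is resident."""
    peak_bytes = 0
    compressed = []
    for i in range(n_layers):
        w = layer_loader(i)                   # load ONE layer from storage
        peak_bytes = max(peak_bytes, w.nbytes)
        q, scale = quantize_layer(w)
        compressed.append((q, scale))         # int8 payload, ~4x smaller
        del w                                 # free before loading the next
    return compressed, peak_bytes

# Toy "model": 12 layers of 256x256 fp32 weights, materialized on demand.
rng = np.random.default_rng(0)
loader = lambda i: rng.standard_normal((256, 256)).astype(np.float32)
compressed, peak = stream_compress(loader, n_layers=12)
# peak equals one layer's bytes (256*256*4), independent of n_layers.
```

Doubling `n_layers` in the toy run leaves `peak` unchanged, which is the property the 8B → 72B scaling curve above relies on.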

Source + reproduce: github.com/sipsalabs/ultracompress

```shell
pip install ultracompress
```

Reference models on this Hub

Pre-compressed open-weights variants of well-known base models. The base weights retain their Apache licenses; the compression metadata is released under the Sipsa Labs Research Evaluation License v1.0.

Rolling release: smollm2 · mistral · olmo2 · qwen3 variants throughout 2026-05.


Patents

USPTO 64/049,511 (Track A — Activation-Aware Row-Overlay Quantization) and 64/049,517 (Track B — Fractal Residual Recursion) filed April 25, 2026. Supplement covering streaming-compression mechanism filed May 2026.


Contact

sipsalabs.com · github.com/sipsalabs · @SipsaLabs on X
