
Sipsa Labs, Inc. update (2026-05-11): UltraCompress v0.6.6 is on PyPI under BUSL-1.1 with an Additional Use Grant (free for sub-$1M ARR companies, research, and individuals; auto-converts to Apache 2.0 four years post-release). The OpenAI-compatible inference API at api.sipsalabs.com/v1 is publicly self-serve: Pro at $99/mo and Team at $499/mo via sipsalabs.com/pricing, or free $5 credits with no card required. The `pip install ultracompress` substrate is production-ready today (no API key required for self-hosting). 22 architectures verified, 0.6B–405B parameters, sub-1.005× perplexity ratio on Mixtral-8x7B, Qwen3-14B, and Mistral-7B. Live discussion on Hacker News. Commercial inquiries: founder@sipsalabs.com.



```yaml
license: other
license_name: sipsa-labs-research-evaluation-v1.0
license_link: LICENSE
base_model: Qwen/Qwen3-1.7B
tags:
  - ultracompress
  - track-b
  - architectural-compression
  - asvd
  - llm
  - on-device
  - edge
  - patent-pending
  - research-preview
library_name: ultracompress
language:
  - en
pipeline_tag: text-generation
```


# qwen3-1.7b-trackb-preview

A patent-pending research preview of architectural compression on Qwen/Qwen3-1.7B via shared-block parameter dispatch (USPTO 64/049,517, patent pending), combined with low-rank correction overlay (row-overlay) quantization on the lm_head (USPTO 64/049,511).

**Read this first.** This is a research preview, not a production artifact.

This artifact is the first publicly disclosed shared-block parameter dispatch compressed model. It demonstrates the multiplicative compression that shared-block parameter dispatch plus low-rank correction overlay achieve end-to-end. Three things to know:

  1. **Custom architecture, custom loader.** The compressed model has a non-standard inference path: tokens → teacher_embed → proj_in → low-rank refinement body (recursive shared block) → proj_out → norm_outer → ASVD proj → ASVD out → logits (a hedged sketch of this path appears after this list). It cannot be loaded via `transformers.from_pretrained`; it requires `pip install ultracompress` v0.2 (shipping Q3 2026), which adds the `uc load --combined-stack` runtime. Today this preview is for inspection of the architecture, manifest, and measured benchmark numbers, not direct inference.

  2. **Real measured numbers.** The combined-stack benchmark on Qwen3-1.7B held-out data (n=1000, seed=42, the fineweb_edu tail 50M tokens, not seen during training): top-1 token agreement 53.88%, top-10 token agreement 68.32%, perplexity ratio 3.17 vs the teacher. End-to-end parameter compression is 26.68× (40.9M total vs 1.09B teacher). The body alone (HQ5 h=256) is ~933× compressed relative to the teacher body parameters (1.5M vs 1.4B). Numbers are in `combined_stack_results_hq5.json`.

  3. **A different operating point than low-rank correction overlay alone.** Standalone low-rank correction overlay (e.g., SipsaLabs/qwen3-1.7b-uc2p79) hits ~95% acc_norm retention on HellaSwag at sub-3 bpw. The shared-block parameter dispatch + low-rank correction overlay combined stack at this operating point reaches 68.32% top-10 token agreement: meaningfully more compressed, but with measurably more quality regression. The two serve different buyer needs: low-rank correction overlay for high-fidelity edge deployment, and the combined stack for sub-10-MB-class memory budgets where token-level fidelity is acceptable.
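As promised in item 1, here is a minimal PyTorch sketch of that inference path. Every module name, default dimension, and the iteration count below is an illustrative assumption based only on this card's description (h=256 body, rank-256 head); this is not the ultracompress v0.2 loader or the actual checkpoint layout.

```python
import torch
import torch.nn as nn

class SharedBlockBody(nn.Module):
    """One transformer block reused on every pass, with per-iteration affine
    modulation so each pass behaves differently (the recursive shared block)."""
    def __init__(self, d_body: int, n_iters: int):
        super().__init__()
        self.block = nn.TransformerEncoderLayer(d_body, nhead=8, batch_first=True)
        self.scale = nn.Parameter(torch.ones(n_iters, d_body))   # per-iteration gain
        self.shift = nn.Parameter(torch.zeros(n_iters, d_body))  # per-iteration bias

    def forward(self, h: torch.Tensor) -> torch.Tensor:
        for i in range(self.scale.shape[0]):   # recurse through the one shared block
            h = self.block(h * self.scale[i] + self.shift[i])
        return h

class CombinedStack(nn.Module):
    """tokens -> teacher_embed -> proj_in -> body -> proj_out -> norm_outer
    -> ASVD proj -> ASVD out -> logits (the path described in item 1)."""
    def __init__(self, teacher_embed: nn.Embedding, vocab_size: int,
                 d_teacher: int = 2048, d_body: int = 256,
                 rank: int = 256, n_iters: int = 8):  # n_iters is a pure guess
        super().__init__()
        self.teacher_embed = teacher_embed                       # frozen teacher embeddings
        self.proj_in = nn.Linear(d_teacher, d_body)
        self.body = SharedBlockBody(d_body, n_iters)             # low-rank refinement body
        self.proj_out = nn.Linear(d_body, d_teacher)
        self.norm_outer = nn.LayerNorm(d_teacher)
        self.asvd_proj = nn.Linear(d_teacher, rank, bias=False)  # rank-256 lm_head factor
        self.asvd_out = nn.Linear(rank, vocab_size, bias=False)

    def forward(self, tokens: torch.Tensor) -> torch.Tensor:
        h = self.teacher_embed(tokens)
        h = self.proj_out(self.body(self.proj_in(h)))
        return self.asvd_out(self.asvd_proj(self.norm_outer(h)))
```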

## What's in this preview

| File | Description | Size |
|---|---|---|
| `shared_block_body_hq5_h256.pt` | Low-rank refinement body checkpoint at HQ5 h=256. ~1.5M trainable params (vs ~1.4B teacher body, ~933× compression). Includes shared transformer block, per-iteration affine modulation, and per-scale residual schedule. | 1.56 GB |
| `asvd_head_r256.pt` | ASVD-compressed lm_head at rank 256. ~39M params (vs ~155M teacher head, ~4× compression). Low-rank correction overlay applied to the head only. | 157 MB |
| `combined_stack_results_hq5.json` | Measured combined-stack benchmark: T1 53.88%, T10 68.32%, end-to-end 26.68× compression, n=1000, seed 42. | 3 KB |
| `ultracompress.json` | Manifest with method/track metadata, USPTO references, license info, per-file SHA-256, loader notes. | 4 KB |
| `LICENSE` | Sipsa Labs Research and Evaluation License v1.0 (full text). | 7 KB |
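Since `ultracompress.json` carries per-file SHA-256 digests, the downloads above can be verified locally. A minimal sketch, assuming a manifest schema with a top-level `files` list of `{name, sha256}` entries (the actual schema may differ):

```python
import hashlib
import json

def sha256_of(path: str) -> str:
    """Stream the file in 1 MiB chunks so large checkpoints fit in memory."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(1 << 20), b""):
            h.update(chunk)
    return h.hexdigest()

with open("ultracompress.json") as f:
    manifest = json.load(f)

for entry in manifest["files"]:   # assumed schema: [{"name": ..., "sha256": ...}]
    status = "OK" if sha256_of(entry["name"]) == entry["sha256"] else "MISMATCH"
    print(f"{entry['name']}: {status}")
```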

## What this preview proves

The shared-block parameter dispatch architectural compression method (filed 2026-04-25, USPTO 64/049,517) achieves measured compression ratios of:

  - ~933× on the Qwen3-1.7B body parameters (this specific HQ5 h256 variant)
  - 26.68× end-to-end when combined with the ASVD r256 head (low-rank correction overlay applied to the head)

The patent specification documents 311× and 734× body compression on Qwen3-1.7B in two preferred embodiments (HQ4 and HQ5 respectively); the measured ~933× in this preview is consistent with the spec's claim of a "compression ratio exceeding 300×, in a preferred embodiment at least 700×".
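Both headline ratios follow directly from the parameter counts quoted in this card; a quick back-of-envelope check:

```python
# Sanity-check the stated ratios from the (rounded) parameter counts above.
teacher_total, stack_total = 1.09e9, 40.9e6   # end-to-end parameter counts
teacher_body, body_params = 1.4e9, 1.5e6      # body-only parameter counts

print(f"{teacher_total / stack_total:.2f}x")  # ~26.65x; the card's 26.68x uses unrounded counts
print(f"{teacher_body / body_params:.0f}x")   # ~933x body-only
```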

Per the patent's Pareto chart (Figure 4), shared-block parameter dispatch sits 50× beyond the prior-art distillation frontier established by DistilBERT, TinyBERT, and related methods, which topped out at compression ratios of 6×–26×.

## Combined low-rank correction overlay × shared-block parameter dispatch: the company's flagship technical claim

UltraCompress is a two-track patent estate. Low-rank correction overlay (USPTO 64/049,511) is post-training row-overlay quantization at sub-3 bits per weight, shipping today as SipsaLabs/qwen3-1.7b-uc2p79. Shared-block parameter dispatch (USPTO 64/049,517) is architectural compression. They are complementary: low-rank correction overlay compresses each weight; shared-block parameter dispatch reduces the number of weights.

This preview combines a shared-block parameter dispatch body with a low-rank correction overlay head, i.e., a partial combined stack. The fully combined embodiment (low-rank correction overlay applied to both the low-rank refinement body and the head) is research-stage and has not yet been measured; per the patent claims, it should compose multiplicatively (see the illustration below). v0.2 ships the full combined stack with all permutations characterized.
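To make "compose multiplicatively" concrete, here is an illustrative calculation only: the 2.79 bpw figure is inferred from the uc2p79 model name, and treating fp16 as the 16-bit baseline is an assumption, not a measured result.

```python
# Hypothetical: if the count reduction (this preview) and the per-weight bit
# reduction (low-rank correction overlay) compose multiplicatively, the total
# memory reduction is their product. Not a measured number.
param_ratio = 26.68        # measured end-to-end parameter-count reduction
bit_ratio = 16 / 2.79      # assumed fp16 baseline vs ~2.79 bpw quantization
print(f"~{param_ratio * bit_ratio:.0f}x combined")  # ~153x, illustrative only
```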

## Reproducibility

The benchmark numbers in `combined_stack_results_hq5.json` are reproducible via the eval pipeline at github.com/sipsalabs/ultracompress (private research repo; access under NDA for design partners). The eval protocol (one plausible implementation of the agreement metric is sketched after this list):

  - 1000 samples × 128 seq-len from `fineweb_edu_500M_tokens.pt` tail 50M tokens
  - Seed 42 (deterministic)
  - 95% bootstrap confidence intervals
  - Held-out: the tail 50M tokens were never seen during training
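The exact metric implementation lives in the private repo; below is a minimal sketch of one plausible reading of top-k token agreement (the teacher's argmax contained in the student's top-k), assuming same-shaped logits from both models.

```python
import torch

@torch.no_grad()
def topk_agreement(teacher_logits: torch.Tensor,
                   student_logits: torch.Tensor, k: int = 10) -> float:
    """Fraction of positions where the teacher's argmax token appears in the
    student's top-k predictions; k=1 gives top-1 token agreement."""
    teacher_top1 = teacher_logits.argmax(dim=-1, keepdim=True)  # (B, T, 1)
    student_topk = student_logits.topk(k, dim=-1).indices       # (B, T, k)
    hits = (student_topk == teacher_top1).any(dim=-1)           # (B, T) bool
    return hits.float().mean().item()
```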

For the full reproducibility manifest, including SHA-256s of training inputs, eval splits, and checkpoint provenance, contact legal@sipsalabs.com under NDA.

## Loading (when v0.2 ships)

```bash
pip install ultracompress  # v0.2+
uc pull SipsaLabs/qwen3-1.7b-trackb-preview
uc load --combined-stack ./models/SipsaLabs_qwen3-1.7b-trackb-preview
```

For now, the artifact is for inspection only: read `combined_stack_results_hq5.json` for the measured benchmark numbers and `ultracompress.json` for the architecture and provenance manifest. The loader ships in v0.2 (Q3 2026).

## Intended use

Permitted under this License (free of charge):

  - Personal, non-commercial research
  - Academic research at non-profit institutions (with attribution to USPTO 64/049,511 + 64/049,517)
  - Pre-purchase evaluation, for up to 90 days, by an enterprise considering a commercial license

Requires a separate commercial license (email legal@sipsalabs.com):

  - Production deployment in any commercial product or service
  - Use in an API or hosted inference service
  - Embedding in or shipping within hardware products, consumer devices, automobiles, robotics platforms
  - Training of any derivative model for commercial use
  - Any use by for-profit entities other than internal evaluation

## Out-of-scope use

This is a research preview. A top-10 token agreement of 68.32% is acceptable for research; production deployment requires customer-specific evaluation against the customer's own safety-critical thresholds. Do not deploy this artifact in safety-critical, life-critical, or human-subject-decision applications without explicit customer-side validation.

## Patent posture

UltraCompress is patent pending. Two USPTO provisional patent applications were filed on April 25, 2026:

  - 64/049,511: low-rank correction overlay (row-overlay quantization)
  - 64/049,517: shared-block parameter dispatch (architectural compression)

"Patent pending" is a legal status, not patent protection β€” final patent rights depend on USPTO examination of the non-provisional applications, scheduled for drafting through 2027. Please do not characterize Sipsa Labs as having "issued" patents.

## Get in touch


Sipsa Labs, Inc. | sipsalabs.com | patent pending | USPTO 64/049,511 + 64/049,517 (filed 2026-04-25). This is a shared-block parameter dispatch research preview; the production-grade shared-block parameter dispatch + low-rank correction overlay combined stack ships in v0.2, Q3 2026.
