> **Sipsa Labs, Inc. update (2026-05-11).** UltraCompress v0.6.6 is on PyPI under BUSL-1.1 + an Additional Use Grant (free for sub-$1M-ARR companies, research, and individuals; auto-converts to Apache 2.0 four years post-release). The OpenAI-compatible inference API at api.sipsalabs.com/v1 is publicly self-serve: Pro $99/mo and Team $499/mo at sipsalabs.com/pricing, or free $5 in credits (no card required). The `pip install ultracompress` substrate is fully production today (no API key required for self-hosting): 22 architectures verified, 0.6B to 405B parameters, sub-1.005× perplexity ratio on Mixtral-8x7B / Qwen3-14B / Mistral-7B. Live discussion on Hacker News. Commercial inquiries: founder@sipsalabs.com.
---
license: other
license_name: sipsa-labs-research-evaluation-v1.0
license_link: LICENSE
base_model: Qwen/Qwen3-1.7B
tags:
  - ultracompress
  - track-b
  - architectural-compression
  - asvd
  - llm
  - on-device
  - edge
  - patent-pending
  - research-preview
library_name: ultracompress
language:
  - en
pipeline_tag: text-generation
---
# qwen3-1.7b-trackb-preview
A patent-pending research preview of architectural compression on Qwen/Qwen3-1.7B via shared-block parameter dispatch (USPTO 64/049,517), combined with low-rank correction overlay (row-overlay quantization) on the `lm_head` (USPTO 64/049,511).
## Read this first: this is a research preview, not a production artifact
This artifact is the first publicly disclosed shared-block parameter dispatch compressed model. It demonstrates the multiplicative compression that shared-block parameter dispatch + low-rank correction overlay achieves end-to-end. Three things to know:

1. **Custom architecture, custom loader.** The compressed model has a non-standard inference path: tokens → teacher_embed → proj_in → low-rank refinement body (recursive shared block) → proj_out → norm_outer → ASVD proj → ASVD out → logits. It cannot be loaded via `transformers.from_pretrained`. It requires `pip install ultracompress` v0.2 (shipping Q3 2026), which adds the `uc load --combined-stack` runtime. Today this preview is for inspection of the architecture, manifest, and measured benchmark numbers, not direct inference.
2. **Real measured numbers.** The combined-stack benchmark on Qwen3-1.7B held-out data (n=1000, seed=42, fineweb_edu tail-50M tokens not seen during training): top-1 token agreement 53.88%, top-10 token agreement 68.32%, perplexity ratio 3.17 vs the teacher. End-to-end parameter compression is 26.68× (40.9M total vs 1.09B teacher). The body alone (HQ5 h=256) is ~933× compressed relative to the teacher body parameters (1.5M vs 1.4B). Numbers are in `combined_stack_results_hq5.json`.
3. **Different operating point than low-rank correction overlay.** Low-rank correction overlay standalone (e.g., `SipsaLabs/qwen3-1.7b-uc2p79`) hits ~95% acc_norm retention on HellaSwag at sub-3 bpw. The shared-block parameter dispatch + low-rank correction overlay combined stack at this operating point sits at 68.32% top-10 token agreement: meaningfully more compressed, but with measurably more quality regression. The two operating points serve different buyer needs: low-rank correction overlay for high-fidelity edge deployment, shared-block parameter dispatch + low-rank correction overlay for sub-10-MB-class memory budgets where that level of token-level fidelity is acceptable.
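The non-standard inference path described above (tokens → embed → proj_in → recursive shared block → proj_out → head) can be sketched in a few lines of NumPy. Everything below is a hypothetical reconstruction from the description: the hidden size, iteration count, MLP stand-in, and per-iteration affine modulation are invented for illustration, not taken from the actual ultracompress runtime.

```python
import numpy as np

rng = np.random.default_rng(42)

def layer_norm(x, eps=1e-5):
    # Parameter-free layer norm over the hidden dimension.
    return (x - x.mean(-1, keepdims=True)) / np.sqrt(x.var(-1, keepdims=True) + eps)

class SharedBlockBody:
    """One block's weights reused for every 'layer': depth comes from recursion
    over the same parameters, plus a cheap per-iteration affine modulation
    (scale, shift). All sizes here are made up."""

    def __init__(self, hidden=256, n_iters=28):
        self.w1 = rng.normal(0.0, 0.02, (hidden, hidden))  # shared weights
        self.w2 = rng.normal(0.0, 0.02, (hidden, hidden))
        self.scale = np.ones((n_iters, hidden))            # per-iteration modulation
        self.shift = np.zeros((n_iters, hidden))
        self.n_iters = n_iters

    def __call__(self, x):
        for i in range(self.n_iters):
            h = layer_norm(x) * self.scale[i] + self.shift[i]
            h = np.maximum(h @ self.w1, 0.0) @ self.w2     # tiny MLP stand-in
            x = x + h                                      # residual update
        return x

body = SharedBlockBody()
x = rng.normal(size=(4, 256))   # (seq_len, hidden), i.e. the output of proj_in
y = body(x)
print(y.shape)
```

The point of the sketch: parameter count stays at one block's worth regardless of effective depth, which is the mechanism behind the body-side compression ratio. The real runtime additionally carries the per-scale residual schedule and the teacher embedding/projection layers listed in the path above.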
## What's in this preview
| File | Description | Size |
|---|---|---|
| `shared_block_body_hq5_h256.pt` | Low-rank refinement body checkpoint at HQ5 h=256. ~1.5M trainable params (vs ~1.4B teacher body, ~933× compression). Includes shared transformer block + per-iteration affine modulation + per-scale residual schedule. | 1.56 GB |
| `asvd_head_r256.pt` | ASVD-compressed `lm_head` at rank 256. ~39M params (vs ~155M teacher head, ~4× compression). Low-rank correction overlay applied to the head only. | 157 MB |
| `combined_stack_results_hq5.json` | Measured combined-stack benchmark: T1 53.88%, T10 68.32%, end-to-end 26.68× compression, n=1000, seed 42. | 3 KB |
| `ultracompress.json` | Manifest with method/track metadata, USPTO references, license info, per-file SHA-256, loader notes. | 4 KB |
| `LICENSE` | Sipsa Labs Research and Evaluation License v1.0 (full text). | 7 KB |
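The manifest's per-file SHA-256 entries can be checked with stdlib Python. The layout assumed here (a `files` map with a `sha256` field per entry) is hypothetical; consult `ultracompress.json` itself for the actual key names.

```python
import hashlib
import json
from pathlib import Path

def sha256_of(path, chunk=1 << 20):
    """Stream a file through SHA-256 so multi-GB checkpoints never sit in RAM."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        while block := f.read(chunk):
            digest.update(block)
    return digest.hexdigest()

def verify_manifest(manifest_path="ultracompress.json"):
    # Assumed layout: {"files": {"<name>": {"sha256": "<hex>"}}} -- hypothetical.
    manifest = json.loads(Path(manifest_path).read_text())
    for name, meta in manifest.get("files", {}).items():
        status = "OK" if sha256_of(name) == meta["sha256"] else "MISMATCH"
        print(f"{name}: {status}")
```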
## What this preview proves
The shared-block parameter dispatch architectural compression method (filed 2026-04-25, USPTO 64/049,517) achieves measured compression ratios of:

- 934× on the Qwen3-1.7B body parameters (this specific HQ5 h256 variant)
- 26.68× end-to-end parameter compression when combined with the ASVD r256 head (low-rank correction overlay applied to the head)
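As a sanity check, these ratios follow directly from the parameter counts quoted in this card (the quoted figures are rounded, so the end-to-end result lands near, not exactly on, 26.68×):

```python
# Parameter counts as quoted in this card (rounded figures).
teacher_total = 1.09e9   # teacher params, counted end-to-end
teacher_body  = 1.4e9    # teacher body params
student_total = 40.9e6   # compressed stack: ~1.5M body + ~39M ASVD head + glue
student_body  = 1.5e6    # shared-block body at HQ5 h=256

print(f"end-to-end: {teacher_total / student_total:.1f}x")   # ~26.7x
print(f"body only:  {teacher_body / student_body:.0f}x")     # ~933x
```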
The patent specification documents 311× and 734× body compression on Qwen3-1.7B in two preferred embodiments (HQ4 and HQ5 respectively); the measured 934× in this preview is consistent with the spec's claim of a "compression ratio exceeding 300×, in a preferred embodiment at least 700×".
Per the patent's Pareto chart (figure 4), shared-block parameter dispatch sits 50× beyond the prior-art distillation frontier established by DistilBERT, TinyBERT, and related methods, which topped out at compression ratios of 6×–26×.
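For background on the head track: the ~4× head reduction comes from a rank-r factorization, where an (h × V) head matrix becomes (h × r) and (r × V) factors. A generic truncated-SVD sketch with toy sizes follows; ASVD proper applies activation-aware scaling before the decomposition, which this plain SVD omits.

```python
import numpy as np

rng = np.random.default_rng(0)
h, V, r = 128, 1000, 16          # toy sizes, not the real head dimensions

W = rng.normal(size=(h, V))      # stand-in for a dense lm_head weight matrix
U, S, Vt = np.linalg.svd(W, full_matrices=False)
A = U[:, :r] * S[:r]             # (h, r) factor, singular values folded in
B = Vt[:r]                       # (r, V) factor

orig, compressed = W.size, A.size + B.size
print(f"params: {orig} -> {compressed} ({orig / compressed:.1f}x)")
```

Storage drops from h·V to r·(h+V); the achievable rank for a given quality target depends on the spectrum of the real head weights, which this random-matrix toy does not model.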
## Combined low-rank correction overlay × shared-block parameter dispatch: the company's flagship technical claim
UltraCompress is a two-track patent estate. Low-rank correction overlay (USPTO 64/049,511) is post-training row-overlay quantization at sub-3 bits per weight, shipping today as `SipsaLabs/qwen3-1.7b-uc2p79`. Shared-block parameter dispatch (USPTO 64/049,517) is architectural compression. They are complementary: low-rank correction overlay compresses each weight, while shared-block parameter dispatch reduces the number of weights.
This preview is the combined shared-block parameter dispatch body + low-rank correction overlay head, i.e. a partial combined stack. The fully combined embodiment (low-rank correction overlay applied to both the low-rank refinement body and the head) is research-stage and has not yet been measured; per the patent claims, it should compose multiplicatively. v0.2 ships the full combined stack with all permutations characterized.
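The multiplicative-composition claim is plain arithmetic: if dispatch divides the weight count by m and the overlay divides bits per weight by k, storage shrinks by roughly m·k. A toy illustration (the fp16 baseline and 3.0 bpw figures are illustrative assumptions, not measurements of the fully combined embodiment):

```python
count_reduction = 26.68              # measured end-to-end parameter compression (this card)
bpw_teacher, bpw_overlay = 16, 3.0   # illustrative: fp16 baseline vs sub-3 bpw overlay

combined = count_reduction * (bpw_teacher / bpw_overlay)
print(f"hypothetical combined storage compression: ~{combined:.0f}x")
```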
## Reproducibility
The benchmark numbers in `combined_stack_results_hq5.json` are reproducible via the eval pipeline at github.com/sipsalabs/ultracompress (private research repo; access under NDA for design partners). The eval protocol:

- 1000 samples × 128 seq-len from the `fineweb_edu_500M_tokens.pt` tail-50M tokens
- Seed 42 (deterministic)
- 95% bootstrap confidence intervals
- Held-out: the tail 50M tokens were never seen during training
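The eval pipeline itself is private, but the metrics are standard. A minimal sketch of top-k token agreement and a 95% percentile-bootstrap confidence interval on synthetic logits (all sizes and the noise model are made up for illustration):

```python
import numpy as np

rng = np.random.default_rng(42)

def topk_agreement(student_logits, teacher_logits, k):
    """Fraction of positions where the teacher's argmax token appears in the
    student's top-k predictions. Returns (mean, per-position hits)."""
    teacher_top1 = teacher_logits.argmax(-1)               # (n,)
    student_topk = np.argsort(student_logits, -1)[:, -k:]  # (n, k)
    hits = (student_topk == teacher_top1[:, None]).any(-1)
    return hits.mean(), hits

def bootstrap_ci(hits, n_boot=2000, alpha=0.05):
    """Percentile bootstrap CI over per-position agreement."""
    means = rng.choice(hits, (n_boot, hits.size)).mean(-1)
    return np.quantile(means, [alpha / 2, 1 - alpha / 2])

vocab, n = 100, 1000
teacher = rng.normal(size=(n, vocab))
student = teacher + rng.normal(scale=2.0, size=(n, vocab))  # noisy toy student
t1, _ = topk_agreement(student, teacher, k=1)
t10, hits10 = topk_agreement(student, teacher, k=10)
lo, hi = bootstrap_ci(hits10)
print(f"T1 {t1:.2%}  T10 {t10:.2%}  T10 95% CI [{lo:.2%}, {hi:.2%}]")
```

Top-1 agreement is by construction no higher than top-10 agreement, since a top-1 hit is also a top-10 hit.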
For the full reproducibility manifest, including SHA-256s of training inputs, eval splits, and checkpoint provenance, contact legal@sipsalabs.com under NDA.
## Loading (when v0.2 ships)

```shell
pip install ultracompress   # v0.2+
uc pull SipsaLabs/qwen3-1.7b-trackb-preview
uc load --combined-stack ./models/SipsaLabs_qwen3-1.7b-trackb-preview
```
For now, the artifact is for inspection only: read `combined_stack_results_hq5.json` for the measured benchmark numbers and `ultracompress.json` for the architecture and provenance manifest. The loader ships in v0.2 (Q3 2026).
## Intended use
**Permitted under this License (free of charge):**
- Personal, non-commercial research
- Academic research at non-profit institutions (with attribution to USPTO 64/049,511 + 64/049,517)
- Pre-purchase evaluation by an enterprise considering negotiating a commercial license, for up to 90 days
**Requires a separate commercial license (email legal@sipsalabs.com):**
- Production deployment in any commercial product or service
- Use in an API or hosted inference service
- Embedding in or shipping within hardware products, consumer devices, automobiles, robotics platforms
- Training of any derivative model for commercial use
- Any use by for-profit entities other than internal evaluation
## Out-of-scope use
This is a research preview. Top-10 token agreement of 68.32% is acceptable for research; production deployment requires customer-specific evaluation against the customer's own safety-critical thresholds. Do not deploy this artifact in safety-critical, life-critical, or human-subject-decision applications without explicit customer-side validation.
## Patent posture
UltraCompress is patent pending. Two USPTO provisional patent applications were filed on April 25, 2026:
- 64/049,511: low-rank correction overlay (row-overlay quantization)
- 64/049,517: shared-block parameter dispatch (architectural compression)
"Patent pending" is a legal status, not patent protection; final patent rights depend on USPTO examination of the non-provisional applications, scheduled for drafting through 2027. Please do not characterize Sipsa Labs as having "issued" patents.
## Get in touch
- Commercial license: legal@sipsalabs.com
- Research collaboration / NDA tech-deep-dive: founder@sipsalabs.com
- Press / media: press@sipsalabs.com
- Security issues: security@sipsalabs.com
Sipsa Labs, Inc. · sipsalabs.com · patent pending · USPTO 64/049,511 + 64/049,517 (filed 2026-04-25). This is a shared-block parameter dispatch research preview. The production-grade shared-block parameter dispatch + low-rank correction overlay combined stack ships in v0.2, Q3 2026.