
Sipsa Labs, Inc. update (2026-05-11): UltraCompress v0.6.6 is on PyPI under BUSL-1.1 with an Additional Use Grant (free for sub-$1M ARR companies, research, and individuals; auto-converts to Apache 2.0 four years post-release). The OpenAI-compatible inference API at api.sipsalabs.com/v1 is publicly self-serve: Pro at $99/mo and Team at $499/mo via sipsalabs.com/pricing, or free $5 credits with no card required. The `pip install ultracompress` substrate is production-ready today (no API key required for self-hosting). 22 architectures verified, 0.6B–405B parameters, sub-1.005× perplexity ratio on Mixtral-8x7B, Qwen3-14B, and Mistral-7B. Live discussion on Hacker News. Commercial inquiries: founder@sipsalabs.com.



```yaml
license: other
license_name: sipsa-labs-research-evaluation-v1.0
license_link: LICENSE
base_model: Qwen/Qwen3-1.7B
tags:
  - ultracompress
  - track-b
  - architectural-compression
  - asvd
  - llm
  - on-device
  - edge
  - patent-pending
  - research-preview
library_name: ultracompress
language:
  - en
pipeline_tag: text-generation
```


# qwen3-1.7b-trackb-preview

A patent-pending research preview of architectural compression on Qwen/Qwen3-1.7B via shared-block parameter dispatch (USPTO 64/049,517, patent pending), combined with low-rank correction overlay (row-overlay) quantization on the lm_head (USPTO 64/049,511).

**Read this first.** This is a research preview, not a production artifact.

This artifact is the first publicly disclosed shared-block parameter dispatch compressed model. It demonstrates the multiplicative compression that shared-block parameter dispatch plus low-rank correction overlay achieve end-to-end. Three things to know:

  1. **Custom architecture, custom loader.** The compressed model has a non-standard inference path: tokens → teacher_embed → proj_in → low-rank refinement body (recursive shared block) → proj_out → norm_outer → ASVD proj → ASVD out → logits (a hedged sketch of this path appears after this list). It cannot be loaded via `transformers.from_pretrained`; it requires `pip install ultracompress` v0.2 (shipping Q3 2026), which adds the `uc load --combined-stack` runtime. Today this preview is for inspection of the architecture, manifest, and measured benchmark numbers, not direct inference.

  2. **Real measured numbers.** The combined-stack benchmark on Qwen3-1.7B held-out data (n=1000, seed=42, the fineweb_edu tail 50M tokens, not seen during training): top-1 token agreement 53.88%, top-10 token agreement 68.32%, perplexity ratio 3.17 vs the teacher. End-to-end parameter compression is 26.68× (40.9M total vs 1.09B teacher). The body alone (HQ5 h=256) is ~933× compressed relative to the teacher body parameters (1.5M vs 1.4B). Numbers are in `combined_stack_results_hq5.json`.

  3. **A different operating point than low-rank correction overlay alone.** Standalone low-rank correction overlay (e.g., SipsaLabs/qwen3-1.7b-uc2p79) hits ~95% acc_norm retention on HellaSwag at sub-3 bpw. The shared-block parameter dispatch + low-rank correction overlay combined stack at this operating point reaches 68.32% top-10 token agreement: meaningfully more compressed, but with measurably more quality regression. The two serve different buyer needs: low-rank correction overlay for high-fidelity edge deployment, and the combined stack for sub-10-MB-class memory budgets where token-level fidelity is acceptable.
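As promised in item 1, here is a minimal PyTorch sketch of that inference path. Every module name, default dimension, and the iteration count below is an illustrative assumption based only on this card's description (h=256 body, rank-256 head); this is not the ultracompress v0.2 loader or the actual checkpoint layout.

```python
import torch
import torch.nn as nn

class SharedBlockBody(nn.Module):
    """One transformer block reused on every pass, with per-iteration affine
    modulation so each pass behaves differently (the recursive shared block)."""
    def __init__(self, d_body: int, n_iters: int):
        super().__init__()
        self.block = nn.TransformerEncoderLayer(d_body, nhead=8, batch_first=True)
        self.scale = nn.Parameter(torch.ones(n_iters, d_body))   # per-iteration gain
        self.shift = nn.Parameter(torch.zeros(n_iters, d_body))  # per-iteration bias

    def forward(self, h: torch.Tensor) -> torch.Tensor:
        for i in range(self.scale.shape[0]):   # recurse through the one shared block
            h = self.block(h * self.scale[i] + self.shift[i])
        return h

class CombinedStack(nn.Module):
    """tokens -> teacher_embed -> proj_in -> body -> proj_out -> norm_outer
    -> ASVD proj -> ASVD out -> logits (the path described in item 1)."""
    def __init__(self, teacher_embed: nn.Embedding, vocab_size: int,
                 d_teacher: int = 2048, d_body: int = 256,
                 rank: int = 256, n_iters: int = 8):  # n_iters is a pure guess
        super().__init__()
        self.teacher_embed = teacher_embed                       # frozen teacher embeddings
        self.proj_in = nn.Linear(d_teacher, d_body)
        self.body = SharedBlockBody(d_body, n_iters)             # low-rank refinement body
        self.proj_out = nn.Linear(d_body, d_teacher)
        self.norm_outer = nn.LayerNorm(d_teacher)
        self.asvd_proj = nn.Linear(d_teacher, rank, bias=False)  # rank-256 lm_head factor
        self.asvd_out = nn.Linear(rank, vocab_size, bias=False)

    def forward(self, tokens: torch.Tensor) -> torch.Tensor:
        h = self.teacher_embed(tokens)
        h = self.proj_out(self.body(self.proj_in(h)))
        return self.asvd_out(self.asvd_proj(self.norm_outer(h)))
```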

## What's in this preview

| File | Description | Size |
|---|---|---|
| `shared_block_body_hq5_h256.pt` | Low-rank refinement body checkpoint at HQ5 h=256. ~1.5M trainable params (vs ~1.4B teacher body, ~933× compression). Includes shared transformer block, per-iteration affine modulation, and per-scale residual schedule. | 1.56 GB |
| `asvd_head_r256.pt` | ASVD-compressed lm_head at rank 256. ~39M params (vs ~155M teacher head, ~4× compression). Low-rank correction overlay applied to the head only. | 157 MB |
| `combined_stack_results_hq5.json` | Measured combined-stack benchmark: T1 53.88%, T10 68.32%, end-to-end 26.68× compression, n=1000, seed 42. | 3 KB |
| `ultracompress.json` | Manifest with method/track metadata, USPTO references, license info, per-file SHA-256, loader notes. | 4 KB |
| `LICENSE` | Sipsa Labs Research and Evaluation License v1.0 (full text). | 7 KB |
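Since `ultracompress.json` carries per-file SHA-256 digests, the downloads above can be verified locally. A minimal sketch, assuming a manifest schema with a top-level `files` list of `{name, sha256}` entries (the actual schema may differ):

```python
import hashlib
import json

def sha256_of(path: str) -> str:
    """Stream the file in 1 MiB chunks so large checkpoints fit in memory."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(1 << 20), b""):
            h.update(chunk)
    return h.hexdigest()

with open("ultracompress.json") as f:
    manifest = json.load(f)

for entry in manifest["files"]:   # assumed schema: [{"name": ..., "sha256": ...}]
    status = "OK" if sha256_of(entry["name"]) == entry["sha256"] else "MISMATCH"
    print(f"{entry['name']}: {status}")
```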

## What this preview proves

The shared-block parameter dispatch architectural compression method (filed 2026-04-25, USPTO 64/049,517) achieves measured compression ratios of:

  - ~933× on the Qwen3-1.7B body parameters (this specific HQ5 h256 variant)
  - 26.68× end-to-end when combined with the ASVD r256 head (low-rank correction overlay applied to the head)

The patent specification documents 311× and 734× body compression on Qwen3-1.7B in two preferred embodiments (HQ4 and HQ5 respectively); the measured ~933× in this preview is consistent with the spec's claim of a "compression ratio exceeding 300×, in a preferred embodiment at least 700×".
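Both headline ratios follow directly from the parameter counts quoted in this card; a quick back-of-envelope check:

```python
# Sanity-check the stated ratios from the (rounded) parameter counts above.
teacher_total, stack_total = 1.09e9, 40.9e6   # end-to-end parameter counts
teacher_body, body_params = 1.4e9, 1.5e6      # body-only parameter counts

print(f"{teacher_total / stack_total:.2f}x")  # ~26.65x; the card's 26.68x uses unrounded counts
print(f"{teacher_body / body_params:.0f}x")   # ~933x body-only
```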

Per the patent's Pareto chart (Figure 4), shared-block parameter dispatch sits 50× beyond the prior-art distillation frontier established by DistilBERT, TinyBERT, and related methods, which topped out at compression ratios of 6×–26×.

## Combined low-rank correction overlay × shared-block parameter dispatch: the company's flagship technical claim

UltraCompress is a two-track patent estate. Low-rank correction overlay (USPTO 64/049,511) is post-training row-overlay quantization at sub-3 bits per weight, shipping today as SipsaLabs/qwen3-1.7b-uc2p79. Shared-block parameter dispatch (USPTO 64/049,517) is architectural compression. They are complementary: low-rank correction overlay compresses each weight; shared-block parameter dispatch reduces the number of weights.

This preview combines a shared-block parameter dispatch body with a low-rank correction overlay head, i.e., a partial combined stack. The fully combined embodiment (low-rank correction overlay applied to both the low-rank refinement body and the head) is research-stage and has not yet been measured; per the patent claims, it should compose multiplicatively (see the illustration below). v0.2 ships the full combined stack with all permutations characterized.
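To make "compose multiplicatively" concrete, here is an illustrative calculation only: the 2.79 bpw figure is inferred from the uc2p79 model name, and treating fp16 as the 16-bit baseline is an assumption, not a measured result.

```python
# Hypothetical: if the count reduction (this preview) and the per-weight bit
# reduction (low-rank correction overlay) compose multiplicatively, the total
# memory reduction is their product. Not a measured number.
param_ratio = 26.68        # measured end-to-end parameter-count reduction
bit_ratio = 16 / 2.79      # assumed fp16 baseline vs ~2.79 bpw quantization
print(f"~{param_ratio * bit_ratio:.0f}x combined")  # ~153x, illustrative only
```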

## Reproducibility

The benchmark numbers in `combined_stack_results_hq5.json` are reproducible via the eval pipeline at github.com/sipsalabs/ultracompress (private research repo; access under NDA for design partners). The eval protocol (one plausible implementation of the agreement metric is sketched after this list):

  - 1000 samples × 128 seq-len from `fineweb_edu_500M_tokens.pt` tail 50M tokens
  - Seed 42 (deterministic)
  - 95% bootstrap confidence intervals
  - Held-out: the tail 50M tokens were never seen during training
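The exact metric implementation lives in the private repo; below is a minimal sketch of one plausible reading of top-k token agreement (the teacher's argmax contained in the student's top-k), assuming same-shaped logits from both models.

```python
import torch

@torch.no_grad()
def topk_agreement(teacher_logits: torch.Tensor,
                   student_logits: torch.Tensor, k: int = 10) -> float:
    """Fraction of positions where the teacher's argmax token appears in the
    student's top-k predictions; k=1 gives top-1 token agreement."""
    teacher_top1 = teacher_logits.argmax(dim=-1, keepdim=True)  # (B, T, 1)
    student_topk = student_logits.topk(k, dim=-1).indices       # (B, T, k)
    hits = (student_topk == teacher_top1).any(dim=-1)           # (B, T) bool
    return hits.float().mean().item()
```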

For the full reproducibility manifest, including SHA-256s of training inputs, eval splits, and checkpoint provenance, contact legal@sipsalabs.com under NDA.

## Loading (when v0.2 ships)

```bash
pip install ultracompress  # v0.2+
uc pull SipsaLabs/qwen3-1.7b-trackb-preview
uc load --combined-stack ./models/SipsaLabs_qwen3-1.7b-trackb-preview
```

For now, the artifact is for inspection only: read `combined_stack_results_hq5.json` for the measured benchmark numbers and `ultracompress.json` for the architecture and provenance manifest. The loader ships in v0.2 (Q3 2026).

## Intended use

Permitted under this License (free of charge):

  - Personal, non-commercial research
  - Academic research at non-profit institutions (with attribution to USPTO 64/049,511 + 64/049,517)
  - Pre-purchase evaluation, for up to 90 days, by an enterprise considering a commercial license

Requires a separate commercial license (email legal@sipsalabs.com):

  - Production deployment in any commercial product or service
  - Use in an API or hosted inference service
  - Embedding in or shipping within hardware products, consumer devices, automobiles, robotics platforms
  - Training of any derivative model for commercial use
  - Any use by for-profit entities other than internal evaluation

## Out-of-scope use

This is a research preview. A top-10 token agreement of 68.32% is acceptable for research; production deployment requires customer-specific evaluation against the customer's own safety-critical thresholds. Do not deploy this artifact in safety-critical, life-critical, or human-subject-decision applications without explicit customer-side validation.

## Patent posture

UltraCompress is patent pending. Two USPTO provisional patent applications were filed on April 25, 2026:

  - 64/049,511: low-rank correction overlay (row-overlay quantization)
  - 64/049,517: shared-block parameter dispatch (architectural compression)

"Patent pending" is a legal status, not patent protection β€” final patent rights depend on USPTO examination of the non-provisional applications, scheduled for drafting through 2027. Please do not characterize Sipsa Labs as having "issued" patents.

## Get in touch


Sipsa Labs, Inc. | sipsalabs.com | patent pending | USPTO 64/049,511 + 64/049,517 (filed 2026-04-25). This is a shared-block parameter dispatch research preview; the production-grade shared-block parameter dispatch + low-rank correction overlay combined stack ships in v0.2, Q3 2026.
