Proprietary Invention Package – Ternary-Quantized Transformer Optimization

Inventor: Konstantin Vladimirovich Grabko
Email: grabko@cmsmanhattan.com
Date: December 22, 2026

Overview:

  • This package documents a novel, proprietary method for efficient LLM inference on AMD ROCm hardware, combining ternary quantization, Mixture of Experts (MoE) routing, and sliding-window attention (SWA) fusion (a ternary quantization sketch follows below).
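
To make the core idea concrete, here is a minimal ternary quantization sketch. It follows the well-known absmean scheme from the BitNet b1.58 line of work and is an illustrative assumption only, not the proprietary method documented in invention_description.md.

```python
import numpy as np

def ternary_quantize(w: np.ndarray):
    """Quantize a weight tensor to {-1, 0, +1} plus one FP scale.

    Illustrative absmean scheme (BitNet-b1.58 style); the proprietary
    method described in this package may differ.
    """
    scale = float(np.mean(np.abs(w))) + 1e-8        # per-tensor absmean scale
    w_t = np.clip(np.round(w / scale), -1, 1).astype(np.int8)
    return w_t, scale

# Dequantized approximation: w ≈ scale * w_t
w = np.random.randn(256, 256).astype(np.float32)
w_t, scale = ternary_quantize(w)
print(np.unique(w_t))                               # [-1  0  1]
```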

Current Development Status: In Research

  • The Mixture of Experts (MoE) dynamic routing architecture and its associated HBM Shared Memory Pool logic are currently in the intensive research and benchmarking phase. While the theoretical framework and initial simulations show significant promise in reducing data movement overhead, the system is not yet in a production-ready state.
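
Because this component is still in research, the following is only a hypothetical sketch of the general idea behind a shared expert-weight pool: keep a bounded set of expert buffers resident in HBM and evict the least-recently-used one, so repeated host-to-device copies are avoided. All names here (ExpertPool, fetch, load_fn) are invented for illustration and do not describe the actual HBM Shared Memory Pool logic.

```python
from collections import OrderedDict

class ExpertPool:
    """Hypothetical LRU pool of expert weight buffers resident in HBM."""

    def __init__(self, capacity: int):
        self.capacity = capacity        # max experts resident at once
        self.resident = OrderedDict()   # expert_id -> device buffer

    def fetch(self, expert_id: int, load_fn):
        if expert_id in self.resident:            # hit: no data movement
            self.resident.move_to_end(expert_id)
            return self.resident[expert_id]
        if len(self.resident) >= self.capacity:
            self.resident.popitem(last=False)     # evict least-recently-used
        self.resident[expert_id] = load_fn(expert_id)  # e.g. host-to-HBM copy
        return self.resident[expert_id]
```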

Call for Research Sponsorship:

  • We are seeking strategic partners and sponsors to accelerate the transition from architectural theory to hardware-validated implementation.

Contents:

  • license.md
  • NDA.md
  • invention_description.md
  • claims.md
  • performance_data.md
  • [Diagrams and attachments]

Confidential: All materials are proprietary. Contact the inventor for licensing discussions.

JiRack Ternary MoE 26B: Ultra-Efficient Frontier-Scale Intelligence

  • Introducing JiRack Ternary MoE 26B: a revolutionary 26-billion-parameter language model that fuses ternary quantization (weights constrained to {-1, 0, +1} for extreme efficiency) with a powerful Mixture of Experts (MoE) architecture, inspired by BitNet-style paradigms and pushing the boundaries of brain-like compute.

  • How JiRack achieves massive scale with unmatched efficiency:

  • Agentic AI Packed as Experts: The JiRack Agentic AI system is seamlessly embedded into the model as a dynamic collection of highly specialized experts. The MoE design allows JiRack Ternary 26B to support far more experts than a traditional dense model of equivalent active parameters, delivering enormous capacity while dramatically cutting compute, memory, and energy demands.

  • Foundation in Ternary 70B Experts: The journey begins with JiRack Ternary 70B, where individual experts are trained separately in a modular, ternary-quantized format. This separable pre-training phase creates highly capable, low-precision specialist modules from the ground up.

  • Expert Router Training: Once the experts are ready, we train a dedicated expert router (gating network) to intelligently dispatch each incoming request (token or query) to the most relevant experts. This dynamic routing ensures optimal specialization, load balancing, and efficiency, activating only a small subset of the total capacity per inference step (a minimal gating sketch follows this list).

  • The result? A hybrid architecture that mimics biological neural efficiency (ternary weights ≈ ultra-sparse, low-energy signaling) while unlocking frontier-level performance through smart, adaptive expert selection. JiRack Ternary MoE 26B isn't merely larger: it's engineered to think smarter, run leaner, and scale further than conventional dense or even standard MoE designs.
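
To make the routing step concrete, below is a minimal top-k softmax gating sketch in NumPy. The expert count, the value of k, and the gating form are assumptions for illustration; the actual JiRack router architecture and its training procedure are part of the proprietary package.

```python
import numpy as np

def topk_route(x: np.ndarray, w_gate: np.ndarray, k: int = 2):
    """Minimal top-k softmax gating (illustrative, not the JiRack router).

    x:      (d,) token representation
    w_gate: (d, n_experts) gating weights
    Returns the selected expert ids and their normalized gate weights.
    """
    logits = x @ w_gate                     # one score per expert
    top = np.argsort(logits)[-k:]           # ids of the k highest-scoring experts
    gates = np.exp(logits[top] - logits[top].max())
    gates /= gates.sum()                    # softmax over the selected experts
    return top, gates

# Example: route one token among 16 hypothetical experts, activating only 2
d, n_experts = 64, 16
x = np.random.randn(d)
w_gate = np.random.randn(d, n_experts) * 0.02
experts, gates = topk_route(x, w_gate)
# The layer output would be: sum_i gates[i] * expert_forward(experts[i], x)
```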

  • Key advantages at a glance:

  • ~70–90% reduction in energy & memory vs. FP16 equivalents (see the back-of-envelope estimate below)

  • Massive effective parameter count via many lightweight ternary experts

  • Agentic behavior baked in through specialized, routable modules

  • Designed for real-world deployment on constrained hardware

  • JiRack is redefining what's possible at 26B scale: efficient, intelligent, and truly agentic by design.
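
As a sanity check on the energy/memory figure above, here is a back-of-envelope weight-memory estimate for a 26B-parameter model. The ~2 bits per ternary weight packing density is an assumption (the information-theoretic minimum is log2(3) ≈ 1.58 bits), and overheads such as scales, activations, and KV cache are ignored.

```python
params = 26e9                        # 26B parameters

fp16_gb = params * 2 / 1e9           # 2 bytes per FP16 weight   -> 52.0 GB
ternary_gb = params * 2 / 8 / 1e9    # ~2 bits per ternary weight -> 6.5 GB

reduction = 1 - ternary_gb / fp16_gb
print(f"{fp16_gb:.1f} GB -> {ternary_gb:.1f} GB ({reduction:.0%} smaller)")
# 52.0 GB -> 6.5 GB (88% smaller), in line with the ~70-90% claim
```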

Model repository: kgrabko/JiRackTernary26_MoE