Proprietary Invention Package – Ternary-Quantized Transformer Optimization

Inventor: Konstantin Vladimirovich Grabko
Email: grabko@cmsmanhattan.com
Date: December 21, 2025
Needed: A sponsor for distilling Llama 405B to quality matching the original Llama 405B, or a data center willing to cooperate.

Overview: This package contains documentation for a novel, proprietary method enabling efficient LLM inference on AMD ROCm hardware using ternary quantization, BRE, and SWA fusion.
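The ternary-quantization step can be illustrated with a minimal sketch in the BitNet b1.58 style: scale each weight tensor by its mean absolute value, then round every weight to {-1, 0, +1}. This is a generic illustration, not the proprietary method itself (BRE and SWA fusion are not shown), and the function names are hypothetical.

```python
# Minimal absmean ternary quantization sketch (BitNet b1.58 style).
# Hypothetical helper names; NOT the package's proprietary implementation.

def ternary_quantize(weights, eps=1e-8):
    """Quantize a flat list of floats to ternary codes plus a scale.

    scale = mean(|w|); each weight is divided by the scale, rounded,
    and clamped to {-1, 0, +1}.
    """
    scale = sum(abs(w) for w in weights) / len(weights) + eps
    codes = [max(-1, min(1, round(w / scale))) for w in weights]
    return codes, scale

def ternary_dequantize(codes, scale):
    """Recover approximate weights from ternary codes and the scale."""
    return [c * scale for c in codes]

codes, scale = ternary_quantize([0.42, -1.3, 0.05, 0.9, -0.07])
print(codes)  # every code is -1, 0, or +1
```

Each weight then occupies ~1.58 bits of information (log2 of 3 states), typically stored as 2-bit codes in practice.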

Contents:

  • license.md
  • NDA.md
  • invention_description.md
  • claims.md
  • performance_data.md
  • [Diagrams and attachments]

Confidential: All materials are proprietary. Contact inventor for licensing discussions.

Benefits for the JiRack 405B Project

VRAM Efficiency

✅ Easy fine-tuning. Fine-tuning a 405B model usually requires massive resources, but with LoRA and the method's 70% VRAM reduction, users can fine-tune on consumer-grade multi-GPU setups.

Trainable Parameters:

  • Base model (frozen): 405B parameters @ 2-bit = ~101 GB
  • LoRA adapters (r=16): ~50M parameters @ FP32 = ~200 MB
  • Total VRAM: ~101 GB (fits on 4x RTX 4090 with offloading)
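These figures follow from simple arithmetic (parameter count × bits per parameter); the helper below is illustrative, with parameter counts taken from the list above.

```python
# Back-of-the-envelope VRAM arithmetic for the frozen base + LoRA setup.
# "vram_gb" is an illustrative helper, not part of any library.

def vram_gb(num_params, bits_per_param):
    """Gigabytes needed to store num_params at bits_per_param each."""
    return num_params * bits_per_param / 8 / 1e9

base_2bit = vram_gb(405e9, 2)   # frozen 2-bit base model: ~101 GB
lora_fp32 = vram_gb(50e6, 32)   # LoRA adapters, r=16, FP32: ~0.2 GB
total = base_2bit + lora_fp32

print(f"base: {base_2bit:.1f} GB, LoRA: {lora_fp32:.2f} GB, total: {total:.1f} GB")
```

Note this counts weight storage only; activations, optimizer state for the adapters, and framework overhead add to the real footprint.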

Thermal Stability

✅ Since only a fraction of parameters are updated, the thermal footprint remains consistent with your SWA Fusion goals of staying < 80°C.

JiRack Ternary 405B on BitNet layers, with a meta-llama/Llama-3.2-405B-compatible tokenizer

Weights are distributed in the safetensors format.

Model details:

  • Downloads last month: 147
  • Format: Safetensors
  • Model size: 108B params
  • Tensor types: I32 · F16 · U8
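Low-bit tensor types such as U8 typically hold packed quantized codes: four 2-bit ternary codes fit in one byte. The sketch below shows one generic packing scheme; it is an assumption for illustration, not necessarily the layout this repository uses.

```python
# Generic 2-bit packing of ternary codes into uint8 bytes (4 per byte).
# Illustrative only; the repository's actual tensor layout may differ.

def pack_ternary(values):
    """Pack ternary values {-1, 0, +1} as 2-bit codes, 4 per byte."""
    codes = [(v + 1) & 0b11 for v in values]  # map -1,0,+1 -> 0,1,2
    out = bytearray()
    for i in range(0, len(codes), 4):
        chunk = codes[i:i + 4]
        chunk += [0] * (4 - len(chunk))       # zero-pad the last byte
        out.append(chunk[0] | chunk[1] << 2 | chunk[2] << 4 | chunk[3] << 6)
    return bytes(out)

def unpack_ternary(packed, n):
    """Invert pack_ternary, recovering the first n ternary values."""
    vals = []
    for byte in packed:
        for shift in (0, 2, 4, 6):
            vals.append(((byte >> shift) & 0b11) - 1)
    return vals[:n]
```

Packing this way is what brings 405B parameters down toward the ~101 GB weight footprint cited above.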

Model tree for kgrabko/JiRackTernary_405b: 18 models fine-tuned from this model.