Proprietary Invention Package – Ternary-Quantized Transformer Optimization

Inventor: Konstantin Vladimirovich Grabko
Email: grabko@cmsmanhattan.com
Date: December 21, 2025
Needed: A sponsor for distilling Llama 405B to quality matching the original Llama 405B, or a data center willing to cooperate.

Overview: This package contains documentation for a novel, proprietary method enabling efficient LLM inference on AMD ROCm hardware using ternary quantization, BRE, and SWA fusion.
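The ternary-quantization step can be illustrated with a minimal sketch in the BitNet b1.58 style: scale each weight tensor by its mean absolute value, then round every weight to {-1, 0, +1}. This is a generic illustration, not the proprietary method itself (BRE and SWA fusion are not shown), and the function names are hypothetical.

```python
# Minimal absmean ternary quantization sketch (BitNet b1.58 style).
# Hypothetical helper names; NOT the package's proprietary implementation.

def ternary_quantize(weights, eps=1e-8):
    """Quantize a flat list of floats to ternary codes plus a scale.

    scale = mean(|w|); each weight is divided by the scale, rounded,
    and clamped to {-1, 0, +1}.
    """
    scale = sum(abs(w) for w in weights) / len(weights) + eps
    codes = [max(-1, min(1, round(w / scale))) for w in weights]
    return codes, scale

def ternary_dequantize(codes, scale):
    """Recover approximate weights from ternary codes and the scale."""
    return [c * scale for c in codes]

codes, scale = ternary_quantize([0.42, -1.3, 0.05, 0.9, -0.07])
print(codes)  # every code is -1, 0, or +1
```

Each weight then occupies ~1.58 bits of information (log2 of 3 states), typically stored as 2-bit codes in practice.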

Contents:

  • license.md
  • NDA.md
  • invention_description.md
  • claims.md
  • performance_data.md
  • [Diagrams and attachments]

Confidential: All materials are proprietary. Contact inventor for licensing discussions.

Benefits for the JiRack 405B Project

VRAM Efficiency

✅ Easy fine-tuning. Fine-tuning a 405B model usually requires massive resources, but with LoRA and the method's 70% VRAM reduction, users can fine-tune on consumer-grade multi-GPU setups.

Trainable Parameters:

  • Base model (frozen): 405B parameters @ 2-bit = ~101 GB
  • LoRA adapters (r=16): ~50M parameters @ FP32 = ~200 MB
  • Total VRAM: ~101 GB (fits on 4x RTX 4090 with offloading)
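These figures follow from simple arithmetic (parameter count × bits per parameter); the helper below is illustrative, with parameter counts taken from the list above.

```python
# Back-of-the-envelope VRAM arithmetic for the frozen base + LoRA setup.
# "vram_gb" is an illustrative helper, not part of any library.

def vram_gb(num_params, bits_per_param):
    """Gigabytes needed to store num_params at bits_per_param each."""
    return num_params * bits_per_param / 8 / 1e9

base_2bit = vram_gb(405e9, 2)   # frozen 2-bit base model: ~101 GB
lora_fp32 = vram_gb(50e6, 32)   # LoRA adapters, r=16, FP32: ~0.2 GB
total = base_2bit + lora_fp32

print(f"base: {base_2bit:.1f} GB, LoRA: {lora_fp32:.2f} GB, total: {total:.1f} GB")
```

Note this counts weight storage only; activations, optimizer state for the adapters, and framework overhead add to the real footprint.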

Thermal Stability

✅ Since only a fraction of parameters are updated, the thermal footprint remains consistent with your SWA Fusion goals of staying < 80°C.

JiRack Ternary 405B on BitNet layers, with a meta-llama/Llama-3.2-405B-compatible tokenizer

Weights are distributed in the safetensors format.

Model details:

  • Downloads last month: 147
  • Format: Safetensors
  • Model size: 108B params
  • Tensor types: I32 · F16 · U8
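Low-bit tensor types such as U8 typically hold packed quantized codes: four 2-bit ternary codes fit in one byte. The sketch below shows one generic packing scheme; it is an assumption for illustration, not necessarily the layout this repository uses.

```python
# Generic 2-bit packing of ternary codes into uint8 bytes (4 per byte).
# Illustrative only; the repository's actual tensor layout may differ.

def pack_ternary(values):
    """Pack ternary values {-1, 0, +1} as 2-bit codes, 4 per byte."""
    codes = [(v + 1) & 0b11 for v in values]  # map -1,0,+1 -> 0,1,2
    out = bytearray()
    for i in range(0, len(codes), 4):
        chunk = codes[i:i + 4]
        chunk += [0] * (4 - len(chunk))       # zero-pad the last byte
        out.append(chunk[0] | chunk[1] << 2 | chunk[2] << 4 | chunk[3] << 6)
    return bytes(out)

def unpack_ternary(packed, n):
    """Invert pack_ternary, recovering the first n ternary values."""
    vals = []
    for byte in packed:
        for shift in (0, 2, 4, 6):
            vals.append(((byte >> shift) & 0b11) - 1)
    return vals[:n]
```

Packing this way is what brings 405B parameters down toward the ~101 GB weight footprint cited above.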

Model tree for kgrabko/JiRackTernary_405b: 18 models fine-tuned from this model.