Proprietary Invention Package – Ternary-Quantized Transformer Optimization
Inventor: Konstantin Vladimirovich Grabko
Email: grabko@cmsmanhattan.com
Date: December 21, 2025
Needed: A sponsor for distilling Llama 405B to quality matching the original Llama 405B, or a data center willing to cooperate
Overview: This package contains documentation for a novel, proprietary method enabling efficient LLM inference on AMD ROCm hardware using ternary quantization, BRE, and SWA fusion.
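The package does not disclose its exact ternary scheme. For orientation, a common approach (used by BitNet b1.58) is absmean ternarization, mapping each weight to {-1, 0, +1} with a per-tensor scale; the sketch below is illustrative only and may differ from the proprietary method:

```python
import numpy as np

def ternarize(w: np.ndarray):
    """Absmean ternarization sketch (BitNet b1.58 style).
    Illustrative only -- the package's actual scheme is proprietary."""
    scale = float(np.mean(np.abs(w))) + 1e-8   # per-tensor absmean scale
    q = np.clip(np.round(w / scale), -1, 1)    # quantize to {-1, 0, +1}
    return q.astype(np.int8), scale

def dequantize(q: np.ndarray, scale: float) -> np.ndarray:
    # Approximate reconstruction of the original weights
    return q.astype(np.float32) * scale

w = np.random.randn(4, 4).astype(np.float32)
q, s = ternarize(w)
w_hat = dequantize(q, s)
```

Because every stored value is one of three levels, the quantized tensor can in principle be stored at roughly 1.58-2 bits per parameter instead of 16.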
Contents:
- license.md
- NDA.md
- invention_description.md
- claims.md
- performance_data.md
- [Diagrams and attachments]
Confidential: All materials are proprietary. Contact inventor for licensing discussions.
Benefits for the JiRack 405B Project
VRAM Efficiency
✅ Easy fine-tuning: a 405B model usually requires massive resources, but with LoRA and the method's 70% VRAM reduction, users can fine-tune on consumer-grade multi-GPU setups.
VRAM Budget:
- Base model (frozen): 405B parameters at ~30% of the FP16 footprint (810 GB) = ~243 GB
- LoRA adapters (r=16): ~50M parameters @ FP32 = ~200 MB
- Total VRAM: ~245 GB (fits on 4x RTX 4090 with offloading)
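The budget above can be checked with back-of-envelope arithmetic. Note that the ~243 GB figure corresponds to the claimed ~70% reduction from the FP16 footprint; strict 2-bit storage of 405B parameters would be closer to ~101 GB, so the 243 GB figure appears to include additional overhead:

```python
# Illustrative arithmetic only; figures are the package's claims, not measurements.
PARAMS = 405e9                         # base model parameters
fp16_gb = PARAMS * 2 / 1e9             # 16-bit baseline: 2 bytes/param -> 810 GB
base_gb = fp16_gb * 0.30               # claimed ~70% VRAM reduction -> ~243 GB
two_bit_gb = PARAMS * 2 / 8 / 1e9      # strict 2 bits/param would be ~101 GB
lora_mb = 50e6 * 4 / 1e6               # ~50M LoRA params @ FP32 (4 bytes) -> 200 MB
total_gb = base_gb + lora_mb / 1e3     # ~243.2 GB total
print(f"base: {base_gb:.0f} GB, LoRA: {lora_mb:.0f} MB, total: {total_gb:.1f} GB")
```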
Thermal Stability
✅ Since only a fraction of parameters are updated, the thermal footprint remains consistent with the SWA Fusion goal of staying below 80°C.
JiRack Ternary 405B is built on BitNet layers with a meta-llama/Llama-3.1-405B-compatible tokenizer.
Weights are distributed in the safetensors format.
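Safetensors stores raw tensor bytes, so ternary values can be packed at 2 bits each (four values per byte) before serialization. The packing layout below is hypothetical; the actual JiRack on-disk format is not disclosed:

```python
import numpy as np

def pack_ternary(q: np.ndarray) -> np.ndarray:
    """Pack ternary values {-1, 0, +1} into 2 bits each, 4 per byte.
    Hypothetical layout; the real JiRack format is not public."""
    codes = (q.astype(np.int8) + 1).astype(np.uint8).ravel()   # map to {0, 1, 2}
    pad = (-len(codes)) % 4                                    # pad to a multiple of 4
    codes = np.concatenate([codes, np.zeros(pad, dtype=np.uint8)])
    c = codes.reshape(-1, 4)
    return (c[:, 0] | (c[:, 1] << 2) | (c[:, 2] << 4) | (c[:, 3] << 6)).astype(np.uint8)

def unpack_ternary(packed: np.ndarray, n: int) -> np.ndarray:
    """Recover the first n ternary values from a packed byte array."""
    codes = np.stack([(packed >> s) & 0b11 for s in (0, 2, 4, 6)], axis=1).ravel()[:n]
    return codes.astype(np.int8) - 1

q = np.array([-1, 0, 1, 1, -1], dtype=np.int8)
packed = pack_ternary(q)       # 5 values fit in 2 bytes
restored = unpack_ternary(packed, len(q))
```

The packed array is a plain uint8 tensor, so it can be saved with any tensor serializer, including safetensors.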
Downloads last month: 147
Model tree for kgrabko/JiRackTernary_405b:
- Base model: meta-llama/Llama-3.1-405B