💎 JiRack Ternary 1.5B (Final Release + JiRack Turbo Quant)

High-Performance Ternary-Quantized Transformer | [PATENT PENDING]

Inventor: Konstantin Vladimirovich Grabko
Organization: CMS Manhattan JiRack Technology
Official Site: www.cmsmanhattan.com

  • Final release ready
  • Live email chat with the model via support@cmsmanhattan.com
  • JiRack Android chat client with voice and Ollama API setup: https://huggingface.co/kgrabko/JiRackTernary_1b/resolve/main/app-release.apk
  • JiRack MS Windows 11 desktop chat client with Ollama API setup: https://huggingface.co/kgrabko/JiRackTernary_1b/resolve/main/jirack-chat.zip
  • JiRack web chat client: http://www.cmsmanhattan.com
  • JiRack Ternary 10B: Coming soon via SFT.
  • JiRack Ternary 70B: Coming soon via SFT.
  • JiRack Ternary 405B: Coming soon (LoRA merge with original model following 70B). Supports single-card Blackwell with 96GB VRAM with offloading.
  • JiRack Ternary 405B Blackwell: Enterprise-grade solution for Super GenAI. Move over, OpenAI. The JiRack Ternary 405B is here.
  • A GGUF conversion script for llama.cpp is coming soon, but the conversion is easy to do yourself: rename the JiRack layers to match Llama 3.2 1B and run the converter on the packed JiRack version to get JiRack running on llama.cpp.
  • JiRack training works best with LoRA, because full SFT is very difficult after the model has been converted from standard FP16 and can easily break the model. Train with LoRA in FP16, then convert to ternary with my pack.py script.
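The FP16-to-ternary conversion step mentioned above can be illustrated with a minimal sketch. This is not the author's pack.py, whose internals are not shown here. It assumes BitNet b1.58-style absmean scaling (one scale per tensor) and a simple layout that stores four ternary values per byte; the function names `quantize_ternary`, `pack_ternary`, and `unpack_ternary` are hypothetical.

```python
# Illustrative sketch only: NOT the author's pack.py.
# Assumptions: absmean scaling (BitNet b1.58 style), 2 bits per stored weight.

def quantize_ternary(weights, eps=1e-8):
    """Map float weights to {-1, 0, 1} plus one per-tensor scale."""
    scale = sum(abs(w) for w in weights) / len(weights) + eps
    ternary = [max(-1, min(1, round(w / scale))) for w in weights]
    return ternary, scale

def pack_ternary(ternary):
    """Pack four ternary values per byte, 2 bits each (codes 0, 1, 2)."""
    packed = bytearray()
    for i in range(0, len(ternary), 4):
        byte = 0
        for j, t in enumerate(ternary[i:i + 4]):
            byte |= (t + 1) << (2 * j)  # shift -1/0/1 to 0/1/2
        packed.append(byte)
    return bytes(packed)

def unpack_ternary(packed, count):
    """Inverse of pack_ternary; `count` is the original number of weights."""
    values = []
    for byte in packed:
        for j in range(4):
            values.append(((byte >> (2 * j)) & 0b11) - 1)
    return values[:count]  # drop padding from the last byte

weights = [0.9, -0.05, -1.2, 0.4, 0.0, -0.7]
ternary, scale = quantize_ternary(weights)
packed = pack_ternary(ternary)
restored = unpack_ternary(packed, len(ternary))
print(ternary, len(packed), restored == ternary)
```

In this sketch six weights fit in two bytes, and the per-tensor scale is kept alongside the packed bytes so that outputs can be rescaled at inference time.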

⚠️ IMPORTANT NOTICE: PROPRIETARY TECHNOLOGY

This model and all accompanying code, algorithms, and documentation are proprietary technology owned by Konstantin Vladimirovich Grabko.

© 2025 Konstantin Vladimirovich Grabko. All Rights Reserved. Patent Pending.

Allowed:

  • Personal and non-commercial research use only

Strictly Prohibited without a written commercial license:

  • Any commercial use (SaaS, mobile apps, edge devices, paid services, etc.)
  • Creating and distributing derivative models for profit
  • Removing or modifying any copyright or legal notices
  • Patenting any part of this technology

Commercial users must obtain a signed license and pay 5% royalty on net revenue.

Any unauthorized commercial use will be pursued legally under New York law.

Contact for commercial license: grabko@cmsmanhattan.com. A fixed price is available for FinTech.

⚠️ FinTech AI Solution

Custom AI Solutions with JiRack

  • Deploy your own secure, high-performance model from scratch. I specialize in delivering the JiRack modern architecture on NVIDIA Clusters, fully optimized for your private datasets.
  • Let's build your sovereign AI today. DM for inquiries.
  • Please contact CMS Manhattan for the solution.

⚠️ Intellectual Property Notice

The architecture, weights, and methods (BRE, SWA Fusion, and HD-FFN) contained herein are the proprietary intellectual property of Konstantin Vladimirovich Grabko.

  • Status: Patent Pending (U.S. & International Claims Filed).
  • Usage: Commercial use requires a signed execution of the CMS Manhattan JiRack License V.1.2.

🚀 Project Overview: The 1.5B "Wide-FFN" Advantage

JiRack-1.5B is a redesigned architecture powered by the JiRack BitNet v2.0 Specification. It is specifically optimized for high-throughput inference on non-NVIDIA hardware (AMD ROCm/HIP) and cloud-native environments (AWS Lightsail, GCP, Azure) without heavy CPU reliance.

Key Innovations:

  1. High-Density FFN (HD-FFN): Unlike standard 1B models, JiRack-1.5B utilizes an 8192-dimension intermediate layer within ternary constraints, offering 3B-class semantic quality.
  2. SWA Fusion (SwiGLU-Attention): A novel compute kernel that fuses the FFN and Attention passes, maintaining thermal stability < 80°C even under heavy load.
  3. Buffered Routing Embedding (BRE): Minimizes data movement between HBM and compute units, specifically optimized for ROCm environments.
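As general background for the "ternary constraints" mentioned above, the sketch below shows why ternary weights are cheap to apply: with every weight in {-1, 0, 1}, a matrix-vector product needs only additions and subtractions, followed by one multiplication by the scale. It is a plain-Python illustration of that property, not the JiRack kernel; `ternary_matvec` is a hypothetical name.

```python
# General illustration, not the JiRack kernel.

def ternary_matvec(rows, x, scale=1.0):
    """Multiply a ternary matrix (list of rows) by vector x, then rescale."""
    out = []
    for row in rows:
        acc = 0.0
        for w, xi in zip(row, x):
            if w == 1:
                acc += xi
            elif w == -1:
                acc -= xi
            # w == 0 contributes nothing and is skipped
        out.append(acc * scale)
    return out

rows = [[1, 0, -1], [-1, 1, 1]]
x = [2.0, 3.0, 0.5]
y = ternary_matvec(rows, x, scale=0.5)
print(y)
```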

⚠️ Changes

Alpha v1.3 includes a RoPE fix, so response quality is now higher. The old model has been renamed to model_tag.safetensors; it is intended for RAG in a style that asks many counter-questions. The model was trained with supervised fine-tuning techniques. It is easy to fine-tune with LoRA, in the same way as a Llama 3.2 1B LoRA adapter.
Triple compression of the model for production is coming soon.


📊 Technical Specifications

Feature | JiRack-1.5B Specification
Base Model | Meta-Llama-3.2-1B (Redesigned)
Hidden Size | 2048
Intermediate Size | 8192 (Ultra-Wide for 1B class)
Layer Count | 16 Decoder Layers
Quantization | 1.58-bit (Ternary {-1, 0, 1})
VRAM Requirement | ~2.5 GB (including KV-Cache & LoRA)
Size in RAM | Model will get double or triple compression for production
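To put the quantization row in context, here is a rough back-of-the-envelope estimate of weight storage at different bit widths. It is a sketch under stated assumptions: the parameter count is the nominal 1.5B from the model name, every weight is treated as quantized (real models often keep embeddings and norms at higher precision), and KV-cache, activations, and LoRA weights are excluded, so the results are not expected to match the ~2.5 GB VRAM figure.

```python
def weight_bytes(num_params, bits_per_weight):
    """Storage needed for the weights alone, in bytes."""
    return num_params * bits_per_weight / 8

NUM_PARAMS = 1.5e9  # assumption: nominal size taken from the model name

for label, bits in [("FP16", 16),
                    ("2-bit packed ternary", 2),
                    ("1.58-bit ideal", 1.58)]:
    gb = weight_bytes(NUM_PARAMS, bits) / 1e9  # decimal gigabytes
    print(f"{label}: {gb:.3f} GB")
# FP16: 3.000 GB
# 2-bit packed ternary: 0.375 GB
# 1.58-bit ideal: 0.296 GB
```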

Training & Fine-Tuning Efficiency

✅ Consumer GPU Ready: Fine-tuning via LoRA (r=8) requires only ~8 MB of additional weights.
✅ 70% VRAM Reduction: Fits on RTX 3060, GTX 1660 Ti, or high-end integrated GPUs.
✅ Easy Training: Run regular fine-tuning (SFT or LoRA), then use the pack script to convert the weights to ternary. The current weights are stored in FP16, but during training the model learns to operate in ternary format.
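The size of a LoRA adapter follows directly from the layer shapes: each adapted projection of shape d_in x d_out adds r * (d_in + d_out) parameters. The sketch below works this out for one hypothetical configuration, with r=8 applied only to the three FFN projections (2048 x 8192) in each of the 16 layers and stored in FP16. Which modules are actually targeted is not stated here, so the choice is an assumption for illustration.

```python
def lora_params(d_in, d_out, rank):
    """Parameters added by one LoRA pair: A is (rank x d_in), B is (d_out x rank)."""
    return rank * (d_in + d_out)

HIDDEN, INTERMEDIATE, LAYERS, RANK = 2048, 8192, 16, 8

# Assumption: adapt only the three FFN projections (gate, up, down) per layer.
per_layer = 3 * lora_params(HIDDEN, INTERMEDIATE, RANK)
total_params = per_layer * LAYERS
total_bytes_fp16 = total_params * 2  # 2 bytes per FP16 value

print(total_params, f"{total_bytes_fp16 / 1e6:.2f} MB")
# 3932160 7.86 MB
```

Under this assumed configuration the adapter comes to roughly 7.9 MB; adding the attention projections or storing in FP32 would increase that figure.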


💬 Chat Demo (Release + Google Turbo Quant)

Current Checkpoint: Feb 26, 2026 (Logic Verification Mode)

  • (ai_env) root@jirack1:~/JiRackTernary1/new# python chat_jirack_ternary1b_pt_turbo_packed_v1.py

  • --- 🖥️ Device: cpu (JiRack Turbo Engine) ---

  • --- 🚀 Loading JiRack 1B (Auto-Quantizing Weights) ---

  • ✅ Weights loaded. System will quantize them on the first pass.

  • User: Hello

  • JiRack: Hello! How are you?

  • (Time: 0.97s on CPU)

  • User: How are you dong ?

  • JiRack: I'm doing fine, I'm not sure what you're asking. What do you mean?

  • (Time: 0.65s on CPU)

  • User: What is capital of France ?

  • JiRack: It's a tiny city with a population of about 35,000 people. It's in the center of France, and is a economic center for the French government. It's also the capital of French-speaking France, and is home to the National Museum of Art and Natural Heritage.

  • (Time: 1.88s on CPU)

  • User: What is capital of France name ?

  • JiRack: The capital of Paris is Paris.

  • (Time: 0.34s on CPU)

  • User: What is Sun ?

  • JiRack: Sun is the largest planet in our solar system, which we call the Sun. It is 400 billion years old. It has 1.4 billion years in diameter, and 6.5 billion miles in diameter.

  • (Time: 1.34s on CPU)

Note: The current .pt file is ~6 GB due to heavy embedded metadata for research tracking. The final production version will be cleaned and compressed to ~1.5 GB - 2.0 GB.


📂 Repository Contents

  • JiRackTernaryPyTorch_1b.py: Core architecture file.
  • invention_description.md: Detailed technical breakdown for patent examiners.
  • performance_data.md: Benchmarks on ROCm/AMD hardware.
  • NDA.md: Confidentiality agreement for commercial evaluators.

📧 Contact & Licensing

For joint venture opportunities, hardware integration, or licensing inquiries:
