kgrabko commited on
Commit
c871013
·
verified ·
1 Parent(s): b169e94

Update README.md

Browse files
Files changed (1) hide show
  1. README.md +69 -86
README.md CHANGED
@@ -17,91 +17,74 @@ base_model: meta-llama/Meta-Llama-3.2-1B
17
  pipeline_tag: text-generation
18
  ---
19
 
20
- # Proprietary Invention Package – Ternary-Quantized Transformer Optimization
 
21
 
22
  **Inventor:** Konstantin Vladimirovich Grabko
23
- **Email:** grabko@cmsmanhattan.com
24
- **Date:** December 21, 2025
25
-
26
- **Overview:** This package contains documentation for a novel, proprietary method enabling efficient LLM inference on AMD ROCm hardware using ternary quantization, BRE, and SWA fusion.
27
-
28
- **Contents:**
29
- - license.md
30
- - NDA.md
31
- - invention_description.md
32
- - claims.md
33
- - performance_data.md
34
- - [Diagrams and attachments]
35
-
36
- **Confidential:** All materials are proprietary. Contact inventor for licensing discussions.
37
-
38
- ## Benefits for the JiRack 1B Project
39
-
40
- Fine-tuning a 1B model becomes **extremely accessible** with LoRA and **70% VRAM reduction**, enabling fine-tuning on **single consumer GPUs** or even high-end laptops.
41
-
42
- **Trainable Parameters (1B):**
43
- - **Base model (frozen):** 1B parameters @ 2-bit = ~2 Gb
44
- - **LoRA adapters (r=8):** ~1-2M parameters @ FP32 = ~8 MB
45
- - **Total VRAM:** ~2.5 GB (fits on RTX 3060, GTX 1660 Ti, or even integrated GPUs)
46
-
47
- ### Thermal Stability
48
- ✅ Since only a fraction of parameters are updated, the thermal footprint remains consistent with your **SWA Fusion** goals of staying **< 80°C**.
49
-
50
-
51
-
52
- JiRack Ternary 1B on BitNet layers with meta-llama/Llama-3.2-1B compatable tokenizer
53
-
54
- **It supports safe safetensors and pt format to use:**
55
- - It is High performance BitNet Llama 3.2 1b with Anthropic dialogs for more power on RAG System on AWS cloud and GPC and Azure with Redesigned model architecture power by JiRack BitNet Architecture
56
- - Made to use without CPU on Cloud. It was tested on Amazon Lightsail . http://www.cmsmanhattan.com
57
-
58
- ## Feb 26 2026 ## First check point . Alfa version to test
59
- ** Chat demo on CPU : **
60
- -
61
- - ==================================================
62
- - 💬 JiRack 1B TERNARY CHAT MODE (FIXED LOGIC)
63
- - Type 'exit' to quit
64
- - ==================================================
65
- -
66
- ## User: Hello
67
-
68
- JiRack: What are you’re talking to? Can I help with that?
69
- ------------------------------
70
- (Gen Time: 54.10s)
71
- ## User: How are you doing today ?
72
-
73
- JiRack: I’m sorry, I don’t understand. Can I help?
74
- ------------------------------
75
- (Gen Time: 57.98s)
76
- ## User: who are you ?
77
-
78
- JiRack: I’m sorry, I don’t understand that. Can you mean “What’s the name”?
79
- ------------------------------
80
- (Gen Time: 78.93s)
81
- ## User: what is your name ?
82
-
83
- JiRack: I’m sorry, I don’t understand that. Can you say again?
84
- ------------------------------
85
- (Gen Time: 57.25s)
86
- ## User: Tell me a joke
87
-
88
- JiRack: What is it? Do you want to hear jokes about how the punch line works, and why I’m called “a dickhead”?
89
- ------------------------------
90
- (Gen Time: 86.79s)
91
- ## User: What is your name?
92
-
93
- JiRack: I’m sorry, I don’t understand. Can you mean “what’s the word”?
94
- ------------------------------
95
- (Gen Time: 57.00s)
96
- ## User: How do I install Python on Ubuntu?
97
-
98
- JiRack: I’m sorry, but it sounds like you’re asking me to help. What is this question about how we can be solved with a phrase for your query: “what’s going up?” and then ask the right now.” Do you want some examples of what kind of those questions that are in order so far as an example or something else again?
99
- ------------------------------
100
- (Gen Time: 223.20s)
101
- ## User: Explain ternary weights in 2 sentences
102
-
103
- JiRack: What do you mean?
104
- ------------------------------
105
- (Gen Time: 16.31s)
106
- -
107
- ## JiRackTernary1b.pt - 6 Gb due to meta data inside model. It will be cleaned for final version.
 
17
  pipeline_tag: text-generation
18
  ---
19
 
20
+ # 💎 JiRack Ternary 1.5B (Alpha v1.2)
21
+ **High-Performance Ternary-Quantized Transformer | [PATENT PENDING]**
22
 
23
  **Inventor:** Konstantin Vladimirovich Grabko
24
+ **Organization:** CMS Manhattan JiRack Technology
25
+ **Official Site:** [www.cmsmanhattan.com](http://www.cmsmanhattan.com)
26
+
27
+ ---
28
+
29
+ ## ⚠️ Intellectual Property Notice
30
+ The architecture, weights, and methods (BRE, SWA Fusion, and HD-FFN) contained herein are the proprietary intellectual property of **Konstantin Vladimirovich Grabko**.
31
+ - **Status:** Patent Pending (U.S. & International Claims Filed).
32
+ - **Usage:** Commercial use requires a signed execution of the **CMS Manhattan JiRack License V.1.2**.
33
+
34
+ ---
35
+
36
+ ## 🚀 Project Overview: The 1.5B "Wide-FFN" Advantage
37
+ JiRack-1.5B is a redesigned architecture powered by the **JiRack BitNet v2.0 Specification**. It is specifically optimized for high-throughput inference on non-NVIDIA hardware (AMD ROCm/HIP) and cloud-native environments (AWS Lightsail, GCP, Azure) without heavy CPU reliance.
38
+
39
+ ### Key Innovations:
40
+ 1. **High-Density FFN (HD-FFN):** Unlike standard 1B models, JiRack-1.5B utilizes an **8192-dimension intermediate layer** within ternary constraints, offering 3B-class semantic quality.
41
+ 2. **SWA Fusion (SwiGLU-Attention):** A novel compute kernel that fused FFN and Attention passes, maintaining thermal stability **< 80°C** even under heavy load.
42
+ 3. **Buffered Routing Embedding (BRE):** Minimizes data movement between HBM and compute units, specifically optimized for ROCm environments.
43
+
44
+
45
+
46
+ ---
47
+
48
+ ## 📊 Technical Specifications
49
+
50
+ | Feature | JiRack-1.5B Specification |
51
+ | :--- | :--- |
52
+ | **Base Model** | Meta-Llama-3.2-1B (Redesigned) |
53
+ | **Hidden Size** | 2048 |
54
+ | **Intermediate Size** | **8192** (Ultra-Wide for 1B class) |
55
+ | **Layer Count** | 16 Decoder Layers |
56
+ | **Quantization** | 1.58-bit (Ternary {-1, 0, 1}) |
57
+ | **VRAM Requirement** | ~2.5 GB (including KV-Cache & LoRA) |
58
+
59
+ ### Training & Fine-Tuning Efficiency
60
+ **Consumer GPU Ready:** Fine-tuning via LoRA (r=8) requires only **~8 MB** of additional weights.
61
+ ✅ **70% VRAM Reduction:** Fits on RTX 3060, GTX 1660 Ti, or high-end integrated GPUs.
62
+
63
+ ---
64
+
65
+ ## 💬 Chat Demo (Alpha Phase)
66
+ *Current Checkpoint: Feb 26, 2026 (Logic Verification Mode)*
67
+
68
+ **User:** Hello
69
+ **JiRack:** What are you’re talking to? Can I help with that?
70
+ *(Gen Time: 54.10s on CPU)*
71
+
72
+ **User:** Tell me a joke
73
+ **JiRack:** What is it? Do you want to hear jokes about how the punch line works, and why I’m called “a dickhead”?
74
+ *(Gen Time: 86.79s on CPU)*
75
+
76
+ > **Note:** The current `.pt` file is **~6 GB** due to heavy embedded metadata for research tracking. The final production version will be cleaned and compressed to **~1.5 GB - 2.0 GB**.
77
+
78
+ ---
79
+
80
+ ## 📂 Repository Contents
81
+ - `JiRackTernaryPyTorch_1b.py`: Core architecture file.
82
+ - `invention_description.md`: Detailed technical breakdown for patent examiners.
83
+ - `performance_data.md`: Benchmarks on ROCm/AMD hardware.
84
+ - `NDA.md`: Confidentiality agreement for commercial evaluators.
85
+
86
+ ## 📧 Contact & Licensing
87
+ For joint venture opportunities, hardware integration, or licensing inquiries:
88
+ - **Email:** [grabko@cmsmanhattan.com](mailto:grabko@cmsmanhattan.com)
89
+ - **Phone:** +1 (516) 777-0945
90
+ - **Location:** New York, USA