Hexa09 committed on
Commit
790681e
·
verified ·
1 Parent(s): 6649032

Update README.md

Files changed (1)
  1. README.md +26 -22
README.md CHANGED
@@ -16,45 +16,49 @@ library_name: pytorch
 
  # Hexa-1B (Prototype)
 
- **Developed by:** Madhab ([Hexa Innovate Org](https://github.com/Hexa08))
+ **Developed by:** Madhab (Founder, Hexa Innovate Org)
  **Architecture:** HexaDense (Transformer Decoder)
  **Format:** [NEF (Neural Essence Format)](https://github.com/Hexa08/NEF)
  **Status:** Research Prototype (1.1 Billion Parameters)
 
  ---
 
- ## 🚀 The Mission
- Hexa-1B is a billion-scale language model built as a proof-of-concept for the **Neural Essence Format (NEF)**. This project demonstrates that state-of-the-art transformer architectures can be engineered, trained, and serialized by a **single developer** using a streamlined, high-performance format that challenges traditional, bloated AI frameworks.
+ ## Model Summary
+ Hexa-1B is a 1.1-billion parameter large language model engineered as a proof-of-concept for the Neural Essence Format (NEF). This project demonstrates the feasibility of building and training billion-scale transformer architectures by a solo developer using an optimized, modular serialization framework.
 
- ## 🛠️ Technical Framework: NEF
- Unlike standard `.bin` or `.safetensors` files, this model is built using **NEF (Neural Essence Format)**.
- * **Efficiency:** Optimized binary serialization for rapid weight loading.
- * **Modularity:** Specifically designed to support the Hexa AI ecosystem.
- * **Portability:** Built for cross-environment execution with minimal dependencies.
+ ## Technical Framework: NEF
+ This model utilizes the Neural Essence Format (NEF) for weight serialization and architectural definition. NEF is designed to provide a high-performance alternative to traditional model formats, focusing on:
+ * **Binary Efficiency:** Optimized for rapid loading and minimal overhead.
+ * **Modular Logic:** Tailored for seamless integration with custom inference engines.
+ * **Streamlined Execution:** Reduced dependency footprint for deployment in resource-constrained environments.
 
- Check out the framework here: [github.com/Hexa08/NEF](https://github.com/Hexa08/NEF)
+ Repository: [github.com/Hexa08/NEF](https://github.com/Hexa08/NEF)
 
- ## 📊 Model Specifications
- * **Parameter Count:** 1.1 Billion
+ ## Model Specifications
+ * **Parameters:** 1.1 Billion
  * **Hidden Size:** 1536
  * **Layers:** 16
  * **Attention Heads:** 16
  * **Context Window:** 2048 Tokens
- * **Training Hardware:** 2x NVIDIA Tesla T4 (Dual GPU DataParallel)
+ * **Training Hardware:** 2x NVIDIA Tesla T4
+ * **Precision:** FP16 (Half Precision)
 
- ## 🧠 Solo Developer Narrative
- Hexa-1B is the result of an intensive solo engineering effort in **Cox's Bazar, Bangladesh**. From the ground-up architectural design in PyTorch to the development of the NEF serialization format and the 18-hour training execution on dual T4s, every step was handled by a single founder.
+ ## Solo Developer Milestone
+ The development of Hexa-1B and the NEF framework was conducted entirely by a single engineer based in Cox's Bazar, Bangladesh. The project scope included:
+ * Designing the transformer architecture in PyTorch.
+ * Developing the NEF binary serialization format.
+ * Managing the 18-hour training execution on a dual-GPU cluster.
 
- This model serves as the foundational "intelligence layer" for Hexa Innovate Org, proving that localized, high-capacity AI is achievable without massive corporate research teams.
+ This prototype validates that localized, high-capacity AI infrastructure can be established through efficient engineering rather than massive team overhead.
 
- ## ⚠️ Research Status & Limitations
- This is a **prototype** version. During training, the model reached a 0.0000 loss state, leading to extreme overfitting (Mode Collapse).
- * **Current Behavior:** Tends to repeat specific tokens or formats (e.g., "Buildings", "SQLwired").
- * **Recommended Use:** This repository is intended for researchers to inspect the **NEF architecture** and the feasibility of billion-parameter training on mid-range hardware.
+ ## Current Limitations and Research Status
+ This repository hosts a prototype version of Hexa-1B. During the training phase, the model reached a 0.0000 loss state, resulting in Mode Collapse (extreme overfitting).
+ * **Observed Behavior:** The model currently produces repetitive outputs and high-frequency token loops.
+ * **Objective:** This release is intended for architectural inspection and to showcase the performance of the NEF framework in handling billion-parameter weights.
 
  ---
 
- ### **About Hexa Innovate Org**
- We are building the next generation of AI infrastructure in Bangladesh, focusing on efficiency, speed, and hardware-agnostic intelligence.
+ ### About Hexa Innovate Org
+ Hexa Innovate Org is dedicated to building efficient, high-speed AI infrastructure in Bangladesh. We focus on localized intelligence and hardware-optimized execution layers.
 
- **Contact:** [Hexa Innovate GitHub](https://github.com/Hexa08)
+ **GitHub:** [Hexa08](https://github.com/Hexa08)
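
Reviewer note: the Model Specifications in this README give hidden size, layer count, and precision, but not the vocabulary size or MLP width. The sketch below is a rough back-of-the-envelope estimate of how those listed values translate into weight counts and FP16 storage. It assumes a conventional decoder block (four attention projections plus a two-layer MLP with a 4x expansion, biases and norms ignored); the actual HexaDense layout may differ. The function names and the `vocab_size` value are hypothetical and used only for illustration.

```python
def decoder_block_params(hidden_size: int, mlp_ratio: int = 4) -> int:
    """Weights in one conventional decoder block (biases and norms ignored).

    Assumption: Q, K, V, and output projections plus a 2-layer MLP.
    This is NOT confirmed to match the HexaDense block design.
    """
    attention = 4 * hidden_size * hidden_size           # Q, K, V, output
    mlp = 2 * hidden_size * (mlp_ratio * hidden_size)   # up and down projections
    return attention + mlp


def estimate_params(hidden_size: int, num_layers: int, vocab_size: int,
                    tied_embeddings: bool = True) -> int:
    """Block weights plus token embeddings (and an untied output head if requested)."""
    blocks = num_layers * decoder_block_params(hidden_size)
    embeddings = vocab_size * hidden_size
    if not tied_embeddings:
        embeddings *= 2  # separate input embedding and output projection
    return blocks + embeddings


def fp16_weight_bytes(num_params: int) -> int:
    """Storage for the weights alone at 2 bytes per parameter."""
    return num_params * 2


if __name__ == "__main__":
    # Values from the README: hidden size 1536, 16 layers.
    block_total = estimate_params(1536, 16, vocab_size=0)
    print(f"decoder blocks only: {block_total:,} weights")

    # vocab_size=32_000 is a hypothetical placeholder; the README does not state it.
    with_vocab = estimate_params(1536, 16, vocab_size=32_000)
    print(f"with hypothetical 32k vocab: {with_vocab:,} weights")

    # FP16 footprint of the stated 1.1 billion parameters.
    print(f"1.1B params in FP16: {fp16_weight_bytes(1_100_000_000) / 1e9:.1f} GB")
```

Under these assumptions the 16 decoder blocks account for roughly 453 million weights, so the remainder of the stated 1.1 billion would have to come from components the README does not list, such as the embedding table, a wider MLP, or other HexaDense-specific layers. At FP16, 1.1 billion parameters occupy about 2.2 GB for the weights alone, which is relevant when judging what fits on a 16 GB Tesla T4 during training.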