Update README.md
README.md
CHANGED
@@ -16,45 +16,49 @@ library_name: pytorch

 # Hexa-1B (Prototype)

-**Developed by:** Madhab (
 **Architecture:** HexaDense (Transformer Decoder)
 **Format:** [NEF (Neural Essence Format)](https://github.com/Hexa08/NEF)
 **Status:** Research Prototype (1.1 Billion Parameters)

 ---

-##
-Hexa-1B is a

-##
-
-* **Efficiency:** Optimized
-* **
-* **

-

-##
-* **
 * **Hidden Size:** 1536
 * **Layers:** 16
 * **Attention Heads:** 16
 * **Context Window:** 2048 Tokens
-* **Training Hardware:** 2x NVIDIA Tesla T4

-##
-Hexa-1B

-This

-##
-This
-* **
-* **

 ---

-###
-

-**

 # Hexa-1B (Prototype)

+**Developed by:** Madhab (Founder, Hexa Innovate Org)
 **Architecture:** HexaDense (Transformer Decoder)
 **Format:** [NEF (Neural Essence Format)](https://github.com/Hexa08/NEF)
 **Status:** Research Prototype (1.1 Billion Parameters)

 ---

+## Model Summary
+Hexa-1B is a 1.1-billion parameter large language model engineered as a proof-of-concept for the Neural Essence Format (NEF). This project demonstrates the feasibility of building and training billion-scale transformer architectures by a solo developer using an optimized, modular serialization framework.

+## Technical Framework: NEF
+This model utilizes the Neural Essence Format (NEF) for weight serialization and architectural definition. NEF is designed to provide a high-performance alternative to traditional model formats, focusing on:
+* **Binary Efficiency:** Optimized for rapid loading and minimal overhead.
+* **Modular Logic:** Tailored for seamless integration with custom inference engines.
+* **Streamlined Execution:** Reduced dependency footprint for deployment in resource-constrained environments.

+Repository: [github.com/Hexa08/NEF](https://github.com/Hexa08/NEF)

+## Model Specifications
+* **Parameters:** 1.1 Billion
 * **Hidden Size:** 1536
 * **Layers:** 16
 * **Attention Heads:** 16
 * **Context Window:** 2048 Tokens
+* **Training Hardware:** 2x NVIDIA Tesla T4
+* **Precision:** FP16 (Half Precision)

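As a rough sanity check on the specifications above, the sketch below estimates the FP16 weight footprint and the parameter count of the decoder blocks. It assumes a standard GPT-style block (four attention projections plus an MLP with 4x expansion, biases and norms ignored). The README does not describe the HexaDense layout or the vocabulary size, so `block_params`, `fp16_gib`, and the 4x MLP ratio are illustrative assumptions rather than the author's definitions.

```python
# Values taken from the Model Specifications list.
HIDDEN_SIZE = 1536
NUM_LAYERS = 16
TOTAL_PARAMS = 1.1e9      # "1.1 Billion" as stated in the README
BYTES_PER_FP16 = 2        # half precision stores each weight in 2 bytes


def block_params(hidden: int, mlp_ratio: int = 4) -> int:
    """Weights in one decoder block under an ASSUMED GPT-style layout."""
    attention = 4 * hidden * hidden          # Q, K, V and output projections
    mlp = 2 * mlp_ratio * hidden * hidden    # up- and down-projection
    return attention + mlp


def fp16_gib(n_params: float) -> float:
    """Size of n_params FP16 weights in GiB (weights only, no optimizer state)."""
    return n_params * BYTES_PER_FP16 / 2**30


decoder_params = NUM_LAYERS * block_params(HIDDEN_SIZE)
other_params = TOTAL_PARAMS - decoder_params  # embeddings, head, anything else

print(f"decoder blocks (assumed layout): {decoder_params / 1e9:.2f}B parameters")
print(f"not covered by that estimate:    {other_params / 1e9:.2f}B parameters")
print(f"FP16 weights for 1.1B params:    {fp16_gib(TOTAL_PARAMS):.2f} GiB")
```

Under these assumptions the 16 blocks account for about 0.45B parameters and the full 1.1B weights occupy about 2.05 GiB in FP16. The remaining parameters would have to come from embeddings, the output head, or a wider block design, none of which the README specifies.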
+## Solo Developer Milestone
+The development of Hexa-1B and the NEF framework was conducted entirely by a single engineer based in Cox's Bazar, Bangladesh. The project scope included:
+* Designing the transformer architecture in PyTorch.
+* Developing the NEF binary serialization format.
+* Managing the 18-hour training execution on a dual-GPU cluster.

+This prototype validates that localized, high-capacity AI infrastructure can be established through efficient engineering rather than massive team overhead.

+## Current Limitations and Research Status
+This repository hosts a prototype version of Hexa-1B. During the training phase, the model reached a 0.0000 loss state, resulting in Mode Collapse (extreme overfitting).
+* **Observed Behavior:** The model currently produces repetitive outputs and high-frequency token loops.
+* **Objective:** This release is intended for architectural inspection and to showcase the performance of the NEF framework in handling billion-parameter weights.

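The repetitive outputs and token loops described above can be quantified with a simple repetition metric. The following is a minimal sketch, not part of the Hexa-1B or NEF code: `repeated_ngram_ratio` is a hypothetical helper that works on any list of token IDs and reports the fraction of n-grams that duplicate an earlier n-gram in the same sequence.

```python
from collections import Counter


def repeated_ngram_ratio(tokens, n=3):
    """Return the fraction of n-grams that repeat an earlier n-gram.

    0.0 means every n-gram is unique; values near 1.0 indicate a loop.
    """
    ngrams = [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]
    if not ngrams:
        return 0.0  # sequence shorter than n: nothing to compare
    counts = Counter(ngrams)
    # Each n-gram's first occurrence is "new"; every further one is a repeat.
    repeats = sum(count - 1 for count in counts.values())
    return repeats / len(ngrams)


looping = [1, 2, 3] * 10      # a three-token loop, as in a collapsed model
varied = list(range(30))      # no repetition at all

print(f"looping sequence: {repeated_ngram_ratio(looping):.2f}")
print(f"varied sequence:  {repeated_ngram_ratio(varied):.2f}")
```

A score close to 1.0 on sampled generations would be consistent with the looping behavior reported for this prototype, while a healthy model would typically score much lower on the same prompts.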
 ---

+### About Hexa Innovate Org
+Hexa Innovate Org is dedicated to building efficient, high-speed AI infrastructure in Bangladesh. We focus on localized intelligence and hardware-optimized execution layers.

+**GitHub:** [Hexa08](https://github.com/Hexa08)