Update README.md
README.md
CHANGED
|
@@ -4,61 +4,53 @@ language:
|
|
| 4 |
- en
|
| 5 |
- bn
|
| 6 |
tags:
|
| 7 |
- nef
|
| 8 |
-
- hexa
|
| 9 |
- solo-developer
|
| 10 |
-
- neural-essence-format
|
| 11 |
-
- text-generation
|
| 12 |
- bangladesh-ai
|
| 13 |
pipeline_tag: text-generation
|
| 14 |
library_name: pytorch
|
| 15 |
---
|
| 16 |
|
| 17 |
-
# Hexa-1B (Prototype)
|
| 18 |
|
| 19 |
-
**
|
| 20 |
-
**
|
| 21 |
**Format:** [NEF (Neural Essence Format)](https://github.com/Hexa08/NEF)
|
| 22 |
-
**
|
| 23 |
|
| 24 |
---
|
| 25 |
|
| 26 |
-
##
|
| 27 |
-
Hexa-1B is a 1.1-billion parameter
|
| 28 |
|
| 29 |
## Technical Framework: NEF
|
| 30 |
-
This model
|
| 31 |
-
* **
|
| 32 |
-
* **
|
| 33 |
-
* **
|
| 34 |
|
| 35 |
Repository: [github.com/Hexa08/NEF](https://github.com/Hexa08/NEF)
|
| 36 |
|
| 37 |
-
##
|
| 38 |
-
* **
|
| 39 |
-
* **
|
| 40 |
-
* **
|
| 41 |
-
* **
|
| 42 |
-
* **Context Window:** 2048 Tokens
|
| 43 |
-
* **Training Hardware:** 2x NVIDIA Tesla T4
|
| 44 |
-
* **Precision:** FP16 (Half Precision)
|
| 45 |
|
| 46 |
-
##
|
| 47 |
-
|
| 48 |
-
* Designing the transformer architecture in PyTorch.
|
| 49 |
-
* Developing the NEF binary serialization format.
|
| 50 |
-
* Managing the 18-hour training execution on a dual-GPU cluster.
|
| 51 |
|
| 52 |
-
|
| 53 |
-
|
| 54 |
-
|
| 55 |
-
|
| 56 |
-
* **Observed Behavior:** The model currently produces repetitive outputs and high-frequency token loops.
|
| 57 |
-
* **Objective:** This release is intended for architectural inspection and to showcase the performance of the NEF framework in handling billion-parameter weights.
|
| 58 |
|
| 59 |
---
|
| 60 |
|
| 61 |
-
### About Hexa Innovate
|
| 62 |
-
Hexa Innovate
|
| 63 |
|
| 64 |
**GitHub:** [Hexa08](https://github.com/Hexa08)
|
|
|
|
| 4 |
- en
|
| 5 |
- bn
|
| 6 |
tags:
|
| 7 |
+
- student-startup
|
| 8 |
+
- zero-to-one
|
| 9 |
- nef
|
| 10 |
- solo-developer
|
| 11 |
- bangladesh-ai
|
| 12 |
+
- 1b-parameters
|
| 13 |
pipeline_tag: text-generation
|
| 14 |
library_name: pytorch
|
| 15 |
---
|
| 16 |
|
| 17 |
+
# Hexa-1B (Student-Led Prototype)
|
| 18 |
|
| 19 |
+
**Founder:** Madhab (Engineering Student)
|
| 20 |
+
**Organization:** Hexa Innovate (Early-Stage Startup)
|
| 21 |
**Format:** [NEF (Neural Essence Format)](https://github.com/Hexa08/NEF)
|
| 22 |
+
**Capital:** $0 Budget Prototype
|
| 23 |
|
| 24 |
---
|
| 25 |
|
| 26 |
+
## The $0 to $B Vision
|
| 27 |
+
Hexa-1B is a 1.1-billion-parameter language model built to prove that world-class AI infrastructure can be engineered by a single student with zero external funding. The project marks the transition from a local student experiment to a scalable AI startup. It rests on the belief that the next billion-dollar intelligence layers will come from high-efficiency engineering, not just high-budget labs.
|
| 28 |
|
| 29 |
## Technical Framework: NEF
|
| 30 |
+
This model is powered by the Neural Essence Format (NEF), a custom serialization framework developed to bypass the bloat of standard AI libraries.
|
| 31 |
+
* **Solo Engineering:** Built from scratch to allow large-scale models to run on accessible hardware.
|
| 32 |
+
* **Architecture:** HexaDense (Transformer Decoder).
|
| 33 |
+
* **Innovation:** NEF focuses on the "essence" of the weights, allowing for faster loading and execution in resource-constrained environments.
|
| 34 |
|
| 35 |
Repository: [github.com/Hexa08/NEF](https://github.com/Hexa08/NEF)
|
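The README does not describe NEF's on-disk layout, so the following is only a hypothetical sketch of what a minimal binary weight container could look like: a magic tag, a tensor name, a shape, and a raw little-endian FP16 payload. The `TOY1` magic, the field order, and the `write_tensor` / `read_tensor` helpers are assumptions made for illustration and are not the NEF specification.

```python
import io
import struct

MAGIC = b"TOY1"  # hypothetical magic tag, NOT the real NEF header


def _count(shape):
    """Number of elements implied by a shape tuple."""
    total = 1
    for dim in shape:
        total *= dim
    return total


def write_tensor(buf, name, shape, values):
    """Write one tensor: magic, name, rank, shape, FP16 payload."""
    count = _count(shape)
    if count != len(values):
        raise ValueError("shape does not match number of values")
    encoded = name.encode("utf-8")
    buf.write(MAGIC)
    buf.write(struct.pack("<H", len(encoded)))         # name length
    buf.write(encoded)                                 # name bytes
    buf.write(struct.pack("<B", len(shape)))           # rank
    buf.write(struct.pack(f"<{len(shape)}I", *shape))  # dimensions
    buf.write(struct.pack(f"<{count}e", *values))      # half-precision floats


def read_tensor(buf):
    """Read back one tensor written by write_tensor."""
    if buf.read(4) != MAGIC:
        raise ValueError("bad magic")
    (name_len,) = struct.unpack("<H", buf.read(2))
    name = buf.read(name_len).decode("utf-8")
    (rank,) = struct.unpack("<B", buf.read(1))
    shape = struct.unpack(f"<{rank}I", buf.read(4 * rank))
    count = _count(shape)
    values = struct.unpack(f"<{count}e", buf.read(2 * count))
    return name, shape, list(values)


buf = io.BytesIO()
write_tensor(buf, "w", (2, 2), [0.5, -1.0, 2.0, 0.25])
size = buf.tell()
buf.seek(0)
restored = read_tensor(buf)
```

A flat layout like this keeps loading to a handful of sequential reads, which is one plausible reading of the "faster loading" goal stated above. The real format may differ entirely.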
| 36 |
|
| 37 |
+
## Student Achievement Metrics
|
| 38 |
+
* **Scale:** 1.1 Billion Parameters managed solo.
|
| 39 |
+
* **Execution:** Designed and trained by one student in Cox's Bazar, Bangladesh.
|
| 40 |
+
* **Efficiency:** Uses two NVIDIA Tesla T4 GPUs to train a billion-parameter model.
|
| 41 |
+
* **Hardware:** Developed on a single laptop and trained via cloud-compute credits.
|
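For a sense of scale, the weight footprint is the parameter count multiplied by the bytes per parameter. The sketch below assumes 2 bytes per parameter (FP16, the precision listed in the earlier revision of this README) and counts weights only. Gradients, optimizer state, and activations are not included, and the helper name `weight_bytes` is illustrative.

```python
def weight_bytes(num_params, bytes_per_param):
    """Raw storage needed for the weights alone."""
    return num_params * bytes_per_param


PARAMS = 1_100_000_000   # 1.1 billion parameters, as stated in the README
FP16_BYTES = 2           # assumption: half-precision storage

total = weight_bytes(PARAMS, FP16_BYTES)
total_gib = total / 2**30

print(f"{total:,} bytes, about {total_gib:.2f} GiB")
```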
| 42 |
|
| 43 |
+
## Founder's Narrative
|
| 44 |
+
I am a student currently pursuing a Diploma in Engineering. While most billion-parameter models are the product of large corporate teams, Hexa-1B is a solo effort. Every line of code in the HexaDense architecture and every byte in the NEF format was engineered to prove that a student from Bangladesh can compete at the architectural level of global AI.
|
| 45 |
|
| 46 |
+
## Current Research Status
|
| 47 |
+
This is a prototype release. Over the 18-hour training run on a $0 budget, the reported training loss reached 0.0000, a sign of severe overfitting. As a result the model shows significant mode collapse and produces repetitive outputs.
|
| 48 |
+
* **Purpose:** This repository serves as a technical demonstration of the NEF framework's ability to serialize and load 1.1B parameters efficiently.
|
| 49 |
+
* **Future:** This prototype is the foundation for our next-generation, high-diversity training run.
|
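Mode collapse of this kind typically shows up as looping text. One rough, generic way to quantify it is a distinct-n ratio: the share of n-grams in a generated sequence that are unique. This is a minimal sketch of that diagnostic, not a metric the README reports using. The function name and the sample token lists are illustrative.

```python
def distinct_n(tokens, n=2):
    """Fraction of n-grams that are unique; values near 0 mean heavy repetition."""
    ngrams = [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]
    if not ngrams:
        return 0.0
    return len(set(ngrams)) / len(ngrams)


looping = ["the", "the", "the", "the", "the"]   # a degenerate token loop
varied = ["a", "b", "c", "d", "e"]              # no repeated bigrams

loop_score = distinct_n(looping)
varied_score = distinct_n(varied)
```

Tracking a score like this on sampled generations during training could flag collapse earlier than the loss value alone.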
| 50 |
|
| 51 |
---
|
| 52 |
|
| 53 |
+
### About Hexa Innovate
|
| 54 |
+
Hexa Innovate is a student-led startup based in Bangladesh. We are focused on building the most efficient AI execution layer in the world. We are starting from zero to build the future of localized intelligence.
|
| 55 |
|
| 56 |
**GitHub:** [Hexa08](https://github.com/Hexa08)
|