HelionX Base 300M model card
README.md
---
language: en
license: apache-2.0
tags:
- causal-lm
- pretraining
- research
- from-scratch
---

# HelionX Base 300M

HelionX Base 300M is a **from-scratch pretrained causal language model** developed as part of the HelionX research initiative. It is trained with a causal language modeling objective (next-token prediction) and is not an instruction-tuned or chat model.
## Model Details

- **Architecture:** Decoder-only Transformer
- **Parameters:** ~300M
- **Layers:** 22
- **Hidden size:** 896
- **Attention heads:** 14
- **Context length:** 2048 tokens
- **Tokenizer:** GPT-2 BPE (50257-token vocabulary)
- **Precision:** FP16 training
- **Training tokens:** ~300M
- **Training data:** OpenWebText (streamed)
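As a sanity check, the hyperparameters above are consistent with the stated ~300M parameter count, assuming a GPT-2-style layout with a 4x MLP expansion and an untied output head (neither assumption is stated in this card):

```python
# Rough parameter-count estimate from the hyperparameters listed above.
# Assumes a GPT-2-style decoder with 4x MLP expansion and an untied
# LM head; layer norms and biases are ignored (comparatively tiny).
vocab, hidden, layers, ctx = 50257, 896, 22, 2048

token_emb = vocab * hidden      # token embedding matrix
pos_emb = ctx * hidden          # learned positional embeddings
per_layer = 12 * hidden**2      # 4*h^2 attention + 8*h^2 MLP per layer
lm_head = vocab * hidden        # untied output projection (assumption)

total = token_emb + pos_emb + layers * per_layer + lm_head
print(f"{total / 1e6:.0f}M parameters")  # ~304M, consistent with "~300M"
```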
## Training
The model was trained incrementally and resumed from intermediate checkpoints, completing a full **300M-token pretraining run** using mixed-precision training and gradient checkpointing.
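In back-of-the-envelope terms, a 300M-token run at the 2048-token context length comes down to a few thousand optimizer steps; the global batch size below is an illustrative assumption, not a figure from this card:

```python
# Step-count estimate for the 300M-token pretraining run described above.
# The global batch size is NOT stated in this card; 32 sequences per step
# is a hypothetical value chosen for illustration.
import math

total_tokens = 300_000_000
ctx_len = 2048      # context length from the model details
batch_seqs = 32     # assumed global batch size (hypothetical)

tokens_per_step = batch_seqs * ctx_len  # 65,536 tokens per optimizer step
steps = math.ceil(total_tokens / tokens_per_step)
print(steps)  # 4578 steps under these assumed settings
```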

Training infrastructure included:

- Modal (A100 40GB)
- PyTorch
- Hugging Face tooling
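The mixed-precision and gradient-checkpointing setup mentioned above can be sketched with a toy model. This is an illustrative training step, not the actual HelionX training code: the model sizes are toy values, attention is omitted, and bfloat16-on-CPU autocast stands in for the FP16 GPU setup purely so the sketch runs anywhere.

```python
# Illustrative causal-LM training step combining autocast mixed precision
# with gradient checkpointing. Not the HelionX training code; all sizes
# are toy values and the transformer block is reduced to a residual MLP.
import torch
import torch.nn as nn
import torch.nn.functional as F
from torch.utils.checkpoint import checkpoint

vocab, hidden, seq, batch = 100, 32, 16, 2

class Block(nn.Module):
    def __init__(self):
        super().__init__()
        self.ff = nn.Sequential(
            nn.Linear(hidden, 4 * hidden), nn.GELU(), nn.Linear(4 * hidden, hidden)
        )

    def forward(self, x):
        return x + self.ff(x)  # residual feed-forward (attention omitted)

class TinyLM(nn.Module):
    def __init__(self):
        super().__init__()
        self.emb = nn.Embedding(vocab, hidden)
        self.blocks = nn.ModuleList(Block() for _ in range(2))
        self.head = nn.Linear(hidden, vocab)

    def forward(self, ids):
        x = self.emb(ids)
        for blk in self.blocks:
            # Gradient checkpointing: drop activations in the forward pass
            # and recompute them during backward to save memory.
            x = checkpoint(blk, x, use_reentrant=False)
        return self.head(x)

model = TinyLM()
opt = torch.optim.AdamW(model.parameters(), lr=3e-4)
ids = torch.randint(0, vocab, (batch, seq))

# Mixed precision: run matmuls in reduced precision under autocast.
with torch.autocast(device_type="cpu", dtype=torch.bfloat16):
    logits = model(ids)
    # Causal LM objective: predict token t+1 from tokens up to t.
    loss = F.cross_entropy(
        logits[:, :-1].reshape(-1, vocab), ids[:, 1:].reshape(-1)
    )
loss.backward()
opt.step()
```

On a GPU, the equivalent FP16 setup would use `torch.autocast(device_type="cuda", dtype=torch.float16)` together with `torch.cuda.amp.GradScaler` to avoid gradient underflow.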
## Intended Use

- Research
- Continued pretraining
- Fine-tuning
- Architecture experiments
## Limitations

This is a base model and **not instruction-tuned or aligned for safety**. Outputs may be incoherent or unsafe without further alignment, and the model is not suitable for direct deployment.
## License
Apache 2.0