---
language: en
license: apache-2.0
tags:
- causal-lm
- pretraining
- research
- from-scratch
---
# HelionX Base 300M

HelionX Base 300M is a **from-scratch pretrained causal language model** developed as part of the HelionX research initiative.
## Model Details

- **Architecture:** Decoder-only Transformer
- **Parameters:** ~300M
- **Layers:** 22
- **Hidden size:** 896
- **Attention heads:** 14
- **Context length:** 2048 tokens
- **Tokenizer:** GPT-2 BPE (50,257 vocab)
- **Precision:** FP16 (mixed precision)
- **Training tokens:** ~300M
- **Training data:** OpenWebText (streamed)
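As a rough sanity check, the configuration above is consistent with the stated ~300M parameter count, assuming a standard GPT-style block with 4x MLP expansion and untied input/output embeddings (both are assumptions; this card does not specify them):

```python
# Rough parameter estimate for the configuration listed above.
# Assumes a standard GPT-style block (4x MLP expansion, untied
# input/output embeddings); biases and layer norms are omitted,
# so the actual count differs slightly.
vocab, hidden, layers = 50257, 896, 22

embeddings = vocab * hidden       # token embedding matrix
lm_head = vocab * hidden          # untied output projection
per_layer = 12 * hidden ** 2      # ~4h^2 attention + ~8h^2 MLP
total = embeddings + lm_head + layers * per_layer
print(f"{total / 1e6:.0f}M parameters")  # ≈ 302M
```

With tied embeddings the same arithmetic gives roughly 257M, so the ~300M figure suggests untied embeddings under these assumptions.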
## Training

The model was trained incrementally, resuming from intermediate checkpoints, and completed a full **300M-token pretraining run** using mixed-precision (FP16) training and gradient checkpointing.
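Streamed pretraining data is typically tokenized and packed into fixed-length blocks matching the 2048-token context window. A minimal sketch of such packing (hypothetical; the actual HelionX data pipeline is not described in this card):

```python
# Hypothetical sketch: concatenate tokenized documents from a stream and
# yield fixed-size training blocks. The actual HelionX pipeline is not
# published; token lists here stand in for GPT-2 BPE tokenizer output.
from typing import Iterable, Iterator

def pack_sequences(token_streams: Iterable[list[int]],
                   block_size: int = 2048) -> Iterator[list[int]]:
    """Concatenate tokenized documents and yield fixed-size blocks."""
    buffer: list[int] = []
    for tokens in token_streams:
        buffer.extend(tokens)
        while len(buffer) >= block_size:
            yield buffer[:block_size]
            buffer = buffer[block_size:]
    # any trailing partial block is dropped
```

Packing avoids padding waste: every training sequence is a full 2048 tokens regardless of document length.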
Training infrastructure included:

- Modal (A100 40GB)
- PyTorch
- Hugging Face tooling
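Checkpoint-based resumption, as used in the incremental run above, can be sketched with plain PyTorch; this is a generic pattern, not the actual HelionX training script, and the function names are illustrative:

```python
# Generic sketch of save/resume for incremental pretraining (assumed
# pattern; the real HelionX training code is not published).
import torch

def save_checkpoint(model, optimizer, step, path):
    """Persist model weights, optimizer state, and the global step."""
    torch.save({
        "model": model.state_dict(),
        "optimizer": optimizer.state_dict(),
        "step": step,
    }, path)

def load_checkpoint(model, optimizer, path):
    """Restore weights and optimizer state; return the step to resume at."""
    ckpt = torch.load(path, map_location="cpu")
    model.load_state_dict(ckpt["model"])
    optimizer.load_state_dict(ckpt["optimizer"])
    return ckpt["step"]
```

Saving the optimizer state alongside the weights matters for Adam-style optimizers, whose moment estimates would otherwise reset on resume.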
## Intended Use

- Research
- Continued pretraining
- Fine-tuning
- Architecture experiments
## Limitations

This is a base model and **not instruction-tuned**. Outputs may be incoherent or unsafe without further alignment.
## License

Apache 2.0