---
language: en
license: apache-2.0
tags:
- causal-lm
- pretraining
- research
- from-scratch
---

# HelionX Base 300M

HelionX Base 300M is a **from-scratch pretrained causal language model** developed as part of the HelionX research initiative.

## Model Details

- **Architecture:** Decoder-only Transformer
- **Parameters:** ~300M
- **Layers:** 22
- **Hidden size:** 896
- **Attention heads:** 14
- **Context length:** 2048 tokens
- **Tokenizer:** GPT-2 BPE (50,257-token vocabulary)
- **Precision:** FP16 training
- **Training tokens:** 300M
- **Training data:** OpenWebText (streamed)

## Training

The model was trained incrementally, resuming from intermediate checkpoints, to complete a full **300M-token pretraining run** using mixed-precision training and gradient checkpointing.

Training infrastructure:

- Modal (A100 40GB)
- PyTorch
- Hugging Face tooling

## Intended Use

- Research
- Continued pretraining
- Fine-tuning
- Architecture experiments

## Limitations

This is a base model and is **not instruction-tuned**. Outputs may be incoherent or unsafe without further alignment.

## License

Apache 2.0
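As a sanity check on the hyperparameters listed under Model Details, a short script can estimate the parameter count. This is a sketch using the standard GPT-2-style breakdown; the 4x MLP expansion, learned positional embeddings, and untied output projection are assumptions, not details stated in the card:

```python
# Hyperparameters from the model card.
vocab_size = 50257   # GPT-2 BPE vocabulary
n_positions = 2048   # context length
n_layer = 22
d_model = 896
n_head = 14

head_dim = d_model // n_head  # 896 / 14 = 64, a common per-head dimension

# Per-layer weights (GPT-2-style block):
# attention = QKV projections + output projection (~4*d^2) plus biases,
# MLP = up/down projections with an assumed 4x expansion (~8*d^2) plus biases,
# two LayerNorms with weight and bias each.
attn_params = 4 * d_model * d_model + 4 * d_model
mlp_params = 2 * d_model * (4 * d_model) + 4 * d_model + d_model
ln_params = 2 * 2 * d_model
per_layer = attn_params + mlp_params + ln_params

# Token + learned positional embeddings, an assumed untied LM head,
# and a final LayerNorm.
embedding_params = vocab_size * d_model + n_positions * d_model
lm_head_params = vocab_size * d_model
total = n_layer * per_layer + embedding_params + lm_head_params + 2 * d_model

print(f"head_dim = {head_dim}")
print(f"~{total / 1e6:.0f}M parameters")  # ≈ 304M under these assumptions
```

Under these assumptions the estimate lands near the advertised ~300M; with tied input/output embeddings it would drop to roughly 259M, which suggests the head is untied.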