---
language: en
license: apache-2.0
tags:
- causal-lm
- pretraining
- research
- from-scratch
---
# HelionX Base 300M
HelionX Base 300M is a **from-scratch pretrained causal language model** developed as part of the HelionX research initiative.
## Model Details
- **Architecture:** Decoder-only Transformer
- **Parameters:** ~300M
- **Layers:** 22
- **Hidden size:** 896
- **Attention heads:** 14
- **Context length:** 2048 tokens
- **Tokenizer:** GPT-2 BPE (50257 vocab)
- **Precision:** FP16 (mixed precision)
- **Training tokens:** ~300M
- **Training data:** OpenWebText (streamed)
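As a quick sanity check, the hyperparameters above are consistent with the stated ~300M parameter count. The sketch below assumes a standard GPT-2-style block (4× MLP expansion, learned position embeddings, untied LM head); none of these structural details are stated in this card, so treat them as assumptions.

```python
def count_params(n_layers=22, d_model=896, vocab=50257, ctx=2048):
    """Rough parameter count for a GPT-2-style decoder-only Transformer."""
    embed = vocab * d_model            # token embeddings
    pos = ctx * d_model                # learned position embeddings (assumed)
    attn = 4 * d_model * d_model       # Q, K, V, and output projections per layer
    mlp = 2 * d_model * (4 * d_model)  # up- and down-projections per layer (assumed 4x)
    lm_head = vocab * d_model          # untied output head (assumed)
    return embed + pos + n_layers * (attn + mlp) + lm_head

print(f"{count_params() / 1e6:.0f}M")  # → 304M, close to the stated ~300M
```

Biases and layer-norm weights are omitted; they contribute well under 1% of the total at this scale.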
## Training
The model was trained incrementally, resuming from intermediate checkpoints, and completed a full **300M-token pretraining run** using mixed-precision (FP16) training and gradient checkpointing.
Training infrastructure included:
- Modal (NVIDIA A100 40 GB)
- PyTorch
- Hugging Face tooling
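The incremental, resumable pattern described above can be sketched as follows. This is a minimal illustration using a JSON state file in place of a real PyTorch checkpoint; the file name, token budget per step, and state fields are all hypothetical, not taken from the actual training code.

```python
import json
import os

STATE_FILE = "checkpoint.json"   # hypothetical path, stands in for a real checkpoint
TOTAL_TOKENS = 300_000_000       # full pretraining budget from the card
TOKENS_PER_STEP = 65_536         # assumed tokens per optimizer step, not stated in the card

def load_state():
    """Resume from an intermediate checkpoint if one exists."""
    if os.path.exists(STATE_FILE):
        with open(STATE_FILE) as f:
            return json.load(f)
    return {"tokens_seen": 0}

def save_state(state):
    """Persist progress so a later run can resume where this one stopped."""
    with open(STATE_FILE, "w") as f:
        json.dump(state, f)

def train_increment(state, max_steps):
    """Run up to max_steps, stopping early once the token budget is reached."""
    for _ in range(max_steps):
        if state["tokens_seen"] >= TOTAL_TOKENS:
            break
        # In the real run, an FP16 autocast forward/backward pass with
        # gradient checkpointing would execute here.
        state["tokens_seen"] += TOKENS_PER_STEP
    save_state(state)
    return state
```

In the actual run, the state file would also carry model weights, optimizer state, and the streaming-data cursor, so that resumed runs continue from the same point in OpenWebText.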
## Intended Use
- Research
- Continued pretraining
- Fine-tuning
- Architecture experiments
## Limitations
This is a base model and **not instruction-tuned**. Outputs may be incoherent or unsafe without further alignment.
## License
Apache 2.0