---
language: en
license: apache-2.0
tags:
  - causal-lm
  - pretraining
  - research
  - from-scratch
---

# HelionX Base 300M

HelionX Base 300M is a causal language model pretrained from scratch as part of the HelionX research initiative.

## Model Details

- **Architecture:** Decoder-only Transformer
- **Parameters:** ~300M
- **Layers:** 22
- **Hidden size:** 896
- **Attention heads:** 14
- **Context length:** 2048 tokens
- **Tokenizer:** GPT-2 BPE (50,257-token vocabulary)
- **Precision:** FP16 (mixed precision)
- **Training tokens:** 300M
- **Training data:** OpenWebText (streamed)
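The ~300M figure can be sanity-checked from the configuration above. A back-of-the-envelope count, assuming a GPT-2-style block (4x MLP width, learned position embeddings) and untied input/output embeddings (both assumptions, not stated in this card):

```python
# Approximate parameter count from the listed config.
# Assumptions (not confirmed by the model card): 4x MLP width,
# learned position embeddings, untied output head.
vocab, hidden, layers, ctx = 50257, 896, 22, 2048

token_emb = vocab * hidden             # input embedding matrix
pos_emb = ctx * hidden                 # learned position embeddings
attn = 4 * hidden * hidden             # Q, K, V, and output projections
mlp = 2 * hidden * (4 * hidden)        # up- and down-projections, 4x width
per_layer = attn + mlp                 # ignoring biases and LayerNorms
lm_head = vocab * hidden               # untied output head (assumed)

total = token_emb + pos_emb + layers * per_layer + lm_head
print(f"{total / 1e6:.0f}M parameters")  # ≈ 304M, consistent with ~300M
```

With tied embeddings the count drops by ~45M, so the stated ~300M is most consistent with an untied head.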

## Training

The model was trained incrementally, resuming from intermediate checkpoints, and completed a full 300M-token pretraining run using mixed-precision (FP16) training and gradient checkpointing.
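The two memory-saving techniques mentioned above can be sketched in a few lines of PyTorch. This is an illustrative sketch, not the actual training script; it uses bfloat16 autocast on CPU so it runs anywhere, whereas the run above used FP16 on A100s (where you would also wrap the optimizer step with a gradient scaler):

```python
import torch
import torch.nn as nn
from torch.utils.checkpoint import checkpoint

# A stand-in MLP block sized like one HelionX feed-forward layer (896 -> 3584 -> 896).
block = nn.Sequential(nn.Linear(896, 3584), nn.GELU(), nn.Linear(3584, 896))
x = torch.randn(4, 896, requires_grad=True)

# Mixed precision: ops inside autocast run in a lower-precision dtype.
with torch.autocast(device_type="cpu", dtype=torch.bfloat16):
    # Gradient checkpointing: checkpoint() discards intermediate activations
    # in the forward pass and recomputes them during backward,
    # trading extra compute for lower memory use.
    y = checkpoint(block, x, use_reentrant=False)
    loss = y.float().pow(2).mean()

loss.backward()
print(x.grad is not None)  # gradients flow through the checkpointed block
```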

Training infrastructure included:

- Modal (NVIDIA A100 40GB)
- PyTorch
- Hugging Face tooling

## Intended Use

- Research
- Continued pretraining
- Fine-tuning
- Architecture experiments

## Limitations

This is a base model and is not instruction-tuned. Outputs may be incoherent or unsafe without further alignment.

## License

This model is released under the Apache License 2.0.