HelionX Base 300M

HelionX Base 300M is a from-scratch pretrained causal language model developed as part of the HelionX research initiative.

Model Details

  • Architecture: Decoder-only Transformer
  • Parameters: ~300M
  • Layers: 22
  • Hidden size: 896
  • Attention heads: 14
  • Context length: 2048 tokens
  • Tokenizer: GPT-2 BPE (50257 vocab)
  • Precision: Mixed precision (FP16)
  • Training tokens: 300M
  • Training data: OpenWebText (streamed)
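As a sanity check, the hyperparameters above can be turned into a back-of-the-envelope parameter count. This is an illustrative estimate only: it assumes a GPT-2-style block (attention with Q/K/V and output projections, a 4x MLP), learned position embeddings, tied input/output embeddings, and it ignores biases and LayerNorm weights.

```python
# Rough parameter estimate from the card's listed hyperparameters.
# Assumptions (not confirmed by the card): GPT-2-style block, 4x MLP,
# learned position embeddings, tied embeddings; biases/LayerNorm omitted.
vocab, ctx, d, layers = 50257, 2048, 896, 22

embed = vocab * d + ctx * d      # token + position embeddings
attn = 4 * d * d                 # Wq, Wk, Wv, Wo projections
mlp = 2 * d * (4 * d)            # up- and down-projection (4x hidden)
total = embed + layers * (attn + mlp)

print(f"{total / 1e6:.0f}M")     # prints 259M
```

The estimate lands in the mid-to-high 200M range, broadly consistent with the stated ~300M once biases, LayerNorms, and any architectural differences are accounted for.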

Training

The model was trained incrementally, resuming from intermediate checkpoints, and completed the full 300M-token pretraining run using mixed-precision (FP16) training and gradient checkpointing.

Training infrastructure included:

  • Modal (NVIDIA A100 40GB)
  • PyTorch
  • Hugging Face tooling
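An incremental run that resumes from intermediate checkpoints needs, at minimum, a small piece of persisted training state alongside the model and optimizer weights. The sketch below shows only that bookkeeping piece; the file layout and field names are illustrative, not the project's actual checkpoint format.

```python
import json
import os
import tempfile

def save_checkpoint(path, step, tokens_seen):
    """Persist minimal resumable training state. In a real run the model
    and optimizer state dicts would be saved alongside this file."""
    with open(path, "w") as f:
        json.dump({"step": step, "tokens_seen": tokens_seen}, f)

def load_checkpoint(path):
    """Return the saved state, or a fresh state when no checkpoint exists
    (i.e. the very first launch of the run)."""
    if not os.path.exists(path):
        return {"step": 0, "tokens_seen": 0}
    with open(path) as f:
        return json.load(f)

# Hypothetical usage: first launch starts fresh, later launches resume.
ckpt = os.path.join(tempfile.mkdtemp(), "state.json")
state = load_checkpoint(ckpt)                      # fresh: step 0
save_checkpoint(ckpt, step=1000, tokens_seen=4_096_000)
state = load_checkpoint(ckpt)                      # resumes at step 1000
print(state["step"], state["tokens_seen"])         # prints: 1000 4096000
```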

Intended Use

  • Research
  • Continued pretraining
  • Fine-tuning
  • Architecture experiments
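For continued pretraining or fine-tuning, the tokenized corpus is typically packed into fixed-length blocks matching the model's 2048-token context, with the trailing remainder dropped. A minimal sketch of that packing step (the helper name is illustrative):

```python
def pack_into_blocks(token_ids, block_size=2048):
    """Split a flat token-ID stream into full fixed-length blocks for
    causal-LM training, dropping the trailing partial block."""
    n_full = len(token_ids) // block_size
    return [token_ids[i * block_size:(i + 1) * block_size]
            for i in range(n_full)]

# Stand-in for a tokenized corpus; a real stream would come from the
# GPT-2 BPE tokenizer over the training text.
stream = list(range(5000))
blocks = pack_into_blocks(stream)
print(len(blocks), len(blocks[0]))   # prints: 2 2048
```

For causal language modeling the labels are simply the inputs shifted by one position, which training frameworks usually handle internally when labels equal input IDs.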

Limitations

This is a base model and not instruction-tuned. Outputs may be incoherent or unsafe without further alignment.

License

Apache 2.0
