---
language: en
license: apache-2.0
tags:
  - causal-lm
  - pretraining
  - research
  - from-scratch
---

# HelionX Base 300M

HelionX Base 300M is a causal language model pretrained from scratch as part of the HelionX research initiative.

## Model Details

- **Architecture:** Decoder-only Transformer
- **Parameters:** ~300M
- **Layers:** 22
- **Hidden size:** 896
- **Attention heads:** 14
- **Context length:** 2048 tokens
- **Tokenizer:** GPT-2 BPE (50,257-token vocabulary)
- **Precision:** FP16 (mixed precision)
- **Training tokens:** 300M
- **Training data:** OpenWebText (streamed)
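The ~300M figure can be sanity-checked from the configuration above. A back-of-the-envelope count, assuming a GPT-2-style block (4x MLP width, learned position embeddings) and untied input/output embeddings (both assumptions, not stated in this card):

```python
# Approximate parameter count from the listed config.
# Assumptions (not confirmed by the model card): 4x MLP width,
# learned position embeddings, untied output head.
vocab, hidden, layers, ctx = 50257, 896, 22, 2048

token_emb = vocab * hidden             # input embedding matrix
pos_emb = ctx * hidden                 # learned position embeddings
attn = 4 * hidden * hidden             # Q, K, V, and output projections
mlp = 2 * hidden * (4 * hidden)        # up- and down-projections, 4x width
per_layer = attn + mlp                 # ignoring biases and LayerNorms
lm_head = vocab * hidden               # untied output head (assumed)

total = token_emb + pos_emb + layers * per_layer + lm_head
print(f"{total / 1e6:.0f}M parameters")  # ≈ 304M, consistent with ~300M
```

With tied embeddings the count drops by ~45M, so the stated ~300M is most consistent with an untied head.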

## Training

The model was trained incrementally, resuming from intermediate checkpoints, and completed a full 300M-token pretraining run using mixed-precision (FP16) training and gradient checkpointing.
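The two memory-saving techniques mentioned above can be sketched in a few lines of PyTorch. This is an illustrative sketch, not the actual training script; it uses bfloat16 autocast on CPU so it runs anywhere, whereas the run above used FP16 on A100s (where you would also wrap the optimizer step with a gradient scaler):

```python
import torch
import torch.nn as nn
from torch.utils.checkpoint import checkpoint

# A stand-in MLP block sized like one HelionX feed-forward layer (896 -> 3584 -> 896).
block = nn.Sequential(nn.Linear(896, 3584), nn.GELU(), nn.Linear(3584, 896))
x = torch.randn(4, 896, requires_grad=True)

# Mixed precision: ops inside autocast run in a lower-precision dtype.
with torch.autocast(device_type="cpu", dtype=torch.bfloat16):
    # Gradient checkpointing: checkpoint() discards intermediate activations
    # in the forward pass and recomputes them during backward,
    # trading extra compute for lower memory use.
    y = checkpoint(block, x, use_reentrant=False)
    loss = y.float().pow(2).mean()

loss.backward()
print(x.grad is not None)  # gradients flow through the checkpointed block
```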

Training infrastructure included:

- Modal (NVIDIA A100 40GB)
- PyTorch
- Hugging Face tooling

## Intended Use

- Research
- Continued pretraining
- Fine-tuning
- Architecture experiments

## Limitations

This is a base model and is not instruction-tuned. Outputs may be incoherent or unsafe without further alignment.

## License

This model is released under the Apache License 2.0.