# HelionX Base 300M
HelionX Base 300M is a from-scratch pretrained causal language model developed as part of the HelionX research initiative.
## Model Details
- Architecture: Decoder-only Transformer
- Parameters: ~300M
- Layers: 22
- Hidden size: 896
- Attention heads: 14
- Context length: 2048 tokens
- Tokenizer: GPT-2 BPE (50257 vocab)
- Precision: FP16 training
- Training tokens: 300M
- Training data: OpenWebText (streamed)
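The hyperparameters above map onto a GPT-2-style decoder-only configuration. The sketch below is an illustration using the Hugging Face `transformers` API, assuming a standard GPT-2 block; the repository's actual config class and files may differ.

```python
from transformers import GPT2Config, GPT2LMHeadModel

# Illustrative reconstruction of the HelionX Base 300M shape
# (assumes a GPT-2-style block; the repository's config may differ).
config = GPT2Config(
    vocab_size=50257,   # GPT-2 BPE vocabulary
    n_positions=2048,   # context length
    n_embd=896,         # hidden size
    n_layer=22,         # decoder layers
    n_head=14,          # attention heads (head dim = 64)
)

model = GPT2LMHeadModel(config)
print(f"{sum(p.numel() for p in model.parameters()) / 1e6:.0f}M parameters")
```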
## Training
The model was trained incrementally, resuming from intermediate checkpoints, and completed a full 300M-token pretraining run using mixed-precision (FP16) training and gradient checkpointing (a minimal sketch of this setup follows the list below).
Training infrastructure included:
- Modal (A100 40GB)
- PyTorch
- Hugging Face tooling
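A minimal sketch of such a training loop, assuming the config from the sketch above and a streamed OpenWebText pipeline via `datasets`; the actual optimizer, learning-rate schedule, batch sizes, and checkpoint paths are not published here and are placeholders.

```python
import torch
from datasets import load_dataset
from transformers import GPT2TokenizerFast, GPT2LMHeadModel

# Sketch of FP16 autocast + gradient checkpointing over streamed OpenWebText;
# not the actual training script.
device = "cuda"
tokenizer = GPT2TokenizerFast.from_pretrained("gpt2")
model = GPT2LMHeadModel(config).to(device)   # `config` from the sketch above
model.gradient_checkpointing_enable()        # trade recompute for memory

optimizer = torch.optim.AdamW(model.parameters(), lr=3e-4)  # placeholder LR
scaler = torch.cuda.amp.GradScaler()                        # FP16 loss scaling

stream = load_dataset("openwebtext", split="train", streaming=True)

for example in stream:
    ids = tokenizer(example["text"], truncation=True, max_length=2048,
                    return_tensors="pt").input_ids.to(device)
    optimizer.zero_grad(set_to_none=True)
    with torch.cuda.amp.autocast(dtype=torch.float16):
        loss = model(ids, labels=ids).loss   # causal LM loss
    scaler.scale(loss).backward()
    scaler.step(optimizer)
    scaler.update()
```

Resuming from an intermediate checkpoint would restore the model and optimizer state before re-entering this loop; the checkpoint format used for this run is not documented here.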
## Intended Use
- Research
- Continued pretraining
- Fine-tuning (a loading example follows this list)
- Architecture experiments
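For fine-tuning or quick generation checks, the weights can presumably be loaded with standard `transformers` tooling; the repository id below is a placeholder, not a confirmed Hub path.

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

# Hypothetical repo id -- replace with the model's actual Hub identifier.
repo = "helionx/HelionX-Base-300M"
tokenizer = AutoTokenizer.from_pretrained(repo)
model = AutoModelForCausalLM.from_pretrained(repo)

prompt = "The transformer architecture"
inputs = tokenizer(prompt, return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=50, do_sample=True, top_p=0.9)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```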
## Limitations
This is a base model and has not been instruction-tuned. Outputs may be incoherent or unsafe without further alignment.
## License
Apache 2.0