# HelionX Base 300M
HelionX Base 300M is a from-scratch pretrained causal language model developed as part of the HelionX research initiative.
## Model Details
- Architecture: Decoder-only Transformer
- Parameters: ~300M
- Layers: 22
- Hidden size: 896
- Attention heads: 14
- Context length: 2048 tokens
- Tokenizer: GPT-2 BPE (50257 vocab)
- Precision: FP16 training
- Training tokens: 300M
- Training data: OpenWebText (streamed)
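The hyperparameters above map onto a GPT-2-style decoder-only configuration. The sketch below is an illustration using the Hugging Face `transformers` API, assuming a standard GPT-2 block; the repository's actual config class and files may differ.

```python
from transformers import GPT2Config, GPT2LMHeadModel

# Illustrative reconstruction of the HelionX Base 300M shape
# (assumes a GPT-2-style block; the repository's config may differ).
config = GPT2Config(
    vocab_size=50257,   # GPT-2 BPE vocabulary
    n_positions=2048,   # context length
    n_embd=896,         # hidden size
    n_layer=22,         # decoder layers
    n_head=14,          # attention heads (head dim = 64)
)

model = GPT2LMHeadModel(config)
print(f"{sum(p.numel() for p in model.parameters()) / 1e6:.0f}M parameters")
```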
## Training
The model was trained incrementally, resuming from intermediate checkpoints, and completed a full 300M-token pretraining run using mixed-precision (FP16) training and gradient checkpointing (a minimal sketch of this setup follows the list below).
Training infrastructure included:
- Modal (A100 40GB)
- PyTorch
- Hugging Face tooling
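A minimal sketch of such a training loop, assuming the config from the sketch above and a streamed OpenWebText pipeline via `datasets`; the actual optimizer, learning-rate schedule, batch sizes, and checkpoint paths are not published here and are placeholders.

```python
import torch
from datasets import load_dataset
from transformers import GPT2TokenizerFast, GPT2LMHeadModel

# Sketch of FP16 autocast + gradient checkpointing over streamed OpenWebText;
# not the actual training script.
device = "cuda"
tokenizer = GPT2TokenizerFast.from_pretrained("gpt2")
model = GPT2LMHeadModel(config).to(device)   # `config` from the sketch above
model.gradient_checkpointing_enable()        # trade recompute for memory

optimizer = torch.optim.AdamW(model.parameters(), lr=3e-4)  # placeholder LR
scaler = torch.cuda.amp.GradScaler()                        # FP16 loss scaling

stream = load_dataset("openwebtext", split="train", streaming=True)

for example in stream:
    ids = tokenizer(example["text"], truncation=True, max_length=2048,
                    return_tensors="pt").input_ids.to(device)
    optimizer.zero_grad(set_to_none=True)
    with torch.cuda.amp.autocast(dtype=torch.float16):
        loss = model(ids, labels=ids).loss   # causal LM loss
    scaler.scale(loss).backward()
    scaler.step(optimizer)
    scaler.update()
```

Resuming from an intermediate checkpoint would restore the model and optimizer state before re-entering this loop; the checkpoint format used for this run is not documented here.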
## Intended Use
- Research
- Continued pretraining
- Fine-tuning (a loading example follows this list)
- Architecture experiments
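For fine-tuning or quick generation checks, the weights can presumably be loaded with standard `transformers` tooling; the repository id below is a placeholder, not a confirmed Hub path.

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

# Hypothetical repo id -- replace with the model's actual Hub identifier.
repo = "helionx/HelionX-Base-300M"
tokenizer = AutoTokenizer.from_pretrained(repo)
model = AutoModelForCausalLM.from_pretrained(repo)

prompt = "The transformer architecture"
inputs = tokenizer(prompt, return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=50, do_sample=True, top_p=0.9)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```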
## Limitations
This is a base model and has not been instruction-tuned. Outputs may be incoherent or unsafe without further alignment.
## License
Apache 2.0