---
language: en
license: apache-2.0
tags:
- causal-lm
- pretraining
- research
- from-scratch
---
# HelionX Base 300M

HelionX Base 300M is a **from-scratch pretrained causal language model** developed as part of the HelionX research initiative.
## Model Details

- **Architecture:** Decoder-only Transformer
- **Parameters:** ~300M
- **Layers:** 22
- **Hidden size:** 896
- **Attention heads:** 14
- **Context length:** 2048 tokens
- **Tokenizer:** GPT-2 BPE (50,257 vocab)
- **Precision:** FP16 (mixed precision)
- **Training tokens:** ~300M
- **Training data:** OpenWebText (streamed)
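As a rough sanity check, the configuration above is consistent with the stated ~300M parameter count, assuming a standard GPT-style block with 4x MLP expansion and untied input/output embeddings (both are assumptions; this card does not specify them):

```python
# Rough parameter estimate for the configuration listed above.
# Assumes a standard GPT-style block (4x MLP expansion, untied
# input/output embeddings); biases and layer norms are omitted,
# so the actual count differs slightly.
vocab, hidden, layers = 50257, 896, 22

embeddings = vocab * hidden       # token embedding matrix
lm_head = vocab * hidden          # untied output projection
per_layer = 12 * hidden ** 2      # ~4h^2 attention + ~8h^2 MLP
total = embeddings + lm_head + layers * per_layer
print(f"{total / 1e6:.0f}M parameters")  # ≈ 302M
```

With tied embeddings the same arithmetic gives roughly 257M, so the ~300M figure suggests untied embeddings under these assumptions.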
## Training

The model was trained incrementally, resuming from intermediate checkpoints, and completed a full **300M-token pretraining run** using mixed-precision (FP16) training and gradient checkpointing.
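Streamed pretraining data is typically tokenized and packed into fixed-length blocks matching the 2048-token context window. A minimal sketch of such packing (hypothetical; the actual HelionX data pipeline is not described in this card):

```python
# Hypothetical sketch: concatenate tokenized documents from a stream and
# yield fixed-size training blocks. The actual HelionX pipeline is not
# published; token lists here stand in for GPT-2 BPE tokenizer output.
from typing import Iterable, Iterator

def pack_sequences(token_streams: Iterable[list[int]],
                   block_size: int = 2048) -> Iterator[list[int]]:
    """Concatenate tokenized documents and yield fixed-size blocks."""
    buffer: list[int] = []
    for tokens in token_streams:
        buffer.extend(tokens)
        while len(buffer) >= block_size:
            yield buffer[:block_size]
            buffer = buffer[block_size:]
    # any trailing partial block is dropped
```

Packing avoids padding waste: every training sequence is a full 2048 tokens regardless of document length.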
Training infrastructure included:

- Modal (A100 40GB)
- PyTorch
- Hugging Face tooling
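Checkpoint-based resumption, as used in the incremental run above, can be sketched with plain PyTorch; this is a generic pattern, not the actual HelionX training script, and the function names are illustrative:

```python
# Generic sketch of save/resume for incremental pretraining (assumed
# pattern; the real HelionX training code is not published).
import torch

def save_checkpoint(model, optimizer, step, path):
    """Persist model weights, optimizer state, and the global step."""
    torch.save({
        "model": model.state_dict(),
        "optimizer": optimizer.state_dict(),
        "step": step,
    }, path)

def load_checkpoint(model, optimizer, path):
    """Restore weights and optimizer state; return the step to resume at."""
    ckpt = torch.load(path, map_location="cpu")
    model.load_state_dict(ckpt["model"])
    optimizer.load_state_dict(ckpt["optimizer"])
    return ckpt["step"]
```

Saving the optimizer state alongside the weights matters for Adam-style optimizers, whose moment estimates would otherwise reset on resume.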
## Intended Use

- Research
- Continued pretraining
- Fine-tuning
- Architecture experiments
## Limitations

This is a base model and **not instruction-tuned**. Outputs may be incoherent or unsafe without further alignment.
## License

Apache 2.0