HelionX Base 300M model card
README.md
---
language: en
license: apache-2.0
tags:
- causal-lm
- pretraining
- research
- from-scratch
---

# HelionX Base 300M

HelionX Base 300M is a **from-scratch pretrained causal language model** developed as part of the HelionX research initiative. It is trained with a causal language modeling objective (next-token prediction) and is not an instruction-tuned or chat model.
## Model Details

- **Architecture:** Decoder-only Transformer
- **Parameters:** ~300M
- **Layers:** 22
- **Hidden size:** 896
- **Attention heads:** 14
- **Context length:** 2048 tokens
- **Tokenizer:** GPT-2 BPE (50257-token vocabulary)
- **Precision:** FP16 training
- **Training tokens:** ~300M
- **Training data:** OpenWebText (streamed)
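As a sanity check, the hyperparameters above are consistent with the stated ~300M parameter count, assuming a GPT-2-style layout with a 4x MLP expansion and an untied output head (neither assumption is stated in this card):

```python
# Rough parameter-count estimate from the hyperparameters listed above.
# Assumes a GPT-2-style decoder with 4x MLP expansion and an untied
# LM head; layer norms and biases are ignored (comparatively tiny).
vocab, hidden, layers, ctx = 50257, 896, 22, 2048

token_emb = vocab * hidden      # token embedding matrix
pos_emb = ctx * hidden          # learned positional embeddings
per_layer = 12 * hidden**2      # 4*h^2 attention + 8*h^2 MLP per layer
lm_head = vocab * hidden        # untied output projection (assumption)

total = token_emb + pos_emb + layers * per_layer + lm_head
print(f"{total / 1e6:.0f}M parameters")  # ~304M, consistent with "~300M"
```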
## Training
The model was trained incrementally and resumed from intermediate checkpoints, completing a full **300M-token pretraining run** using mixed-precision training and gradient checkpointing.
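In back-of-the-envelope terms, a 300M-token run at the 2048-token context length comes down to a few thousand optimizer steps; the global batch size below is an illustrative assumption, not a figure from this card:

```python
# Step-count estimate for the 300M-token pretraining run described above.
# The global batch size is NOT stated in this card; 32 sequences per step
# is a hypothetical value chosen for illustration.
import math

total_tokens = 300_000_000
ctx_len = 2048      # context length from the model details
batch_seqs = 32     # assumed global batch size (hypothetical)

tokens_per_step = batch_seqs * ctx_len  # 65,536 tokens per optimizer step
steps = math.ceil(total_tokens / tokens_per_step)
print(steps)  # 4578 steps under these assumed settings
```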

Training infrastructure included:

- Modal (A100 40GB)
- PyTorch
- Hugging Face tooling
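The mixed-precision and gradient-checkpointing setup mentioned above can be sketched with a toy model. This is an illustrative training step, not the actual HelionX training code: the model sizes are toy values, attention is omitted, and bfloat16-on-CPU autocast stands in for the FP16 GPU setup purely so the sketch runs anywhere.

```python
# Illustrative causal-LM training step combining autocast mixed precision
# with gradient checkpointing. Not the HelionX training code; all sizes
# are toy values and the transformer block is reduced to a residual MLP.
import torch
import torch.nn as nn
import torch.nn.functional as F
from torch.utils.checkpoint import checkpoint

vocab, hidden, seq, batch = 100, 32, 16, 2

class Block(nn.Module):
    def __init__(self):
        super().__init__()
        self.ff = nn.Sequential(
            nn.Linear(hidden, 4 * hidden), nn.GELU(), nn.Linear(4 * hidden, hidden)
        )

    def forward(self, x):
        return x + self.ff(x)  # residual feed-forward (attention omitted)

class TinyLM(nn.Module):
    def __init__(self):
        super().__init__()
        self.emb = nn.Embedding(vocab, hidden)
        self.blocks = nn.ModuleList(Block() for _ in range(2))
        self.head = nn.Linear(hidden, vocab)

    def forward(self, ids):
        x = self.emb(ids)
        for blk in self.blocks:
            # Gradient checkpointing: drop activations in the forward pass
            # and recompute them during backward to save memory.
            x = checkpoint(blk, x, use_reentrant=False)
        return self.head(x)

model = TinyLM()
opt = torch.optim.AdamW(model.parameters(), lr=3e-4)
ids = torch.randint(0, vocab, (batch, seq))

# Mixed precision: run matmuls in reduced precision under autocast.
with torch.autocast(device_type="cpu", dtype=torch.bfloat16):
    logits = model(ids)
    # Causal LM objective: predict token t+1 from tokens up to t.
    loss = F.cross_entropy(
        logits[:, :-1].reshape(-1, vocab), ids[:, 1:].reshape(-1)
    )
loss.backward()
opt.step()
```

On a GPU, the equivalent FP16 setup would use `torch.autocast(device_type="cuda", dtype=torch.float16)` together with `torch.cuda.amp.GradScaler` to avoid gradient underflow.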
## Intended Use

- Research
- Continued pretraining
- Fine-tuning
- Architecture experiments
## Limitations

This is a base model and **not instruction-tuned or aligned for safety**. Outputs may be incoherent or unsafe without further alignment, and the model is not suitable for direct deployment.
## License
Apache 2.0