HALION-AI committed on
Commit dbbc130 · verified · 1 Parent(s): c112d0f

HelionX Base 300M model card

Files changed (1)
  1. README.md +33 -25
README.md CHANGED
@@ -1,42 +1,50 @@
  ---
+ language: en
  license: apache-2.0
  tags:
- - base-model
- - language-model
+ - causal-lm
  - pretraining
  - research
+ - from-scratch
  ---

- # Base Language Model – 300M Parameters (Checkpoint)
+ # HelionX Base 300M

- ## Overview
- This repository contains a **from-scratch trained base language model checkpoint**.
- The model is trained using causal language modeling (next-token prediction).
-
- This is **NOT** an instruction-tuned or chat model.
+ HelionX Base 300M is a **from-scratch pretrained causal language model** developed as part of the HelionX research initiative.

  ## Model Details
- - Parameters: ~300M
- - Architecture: Decoder-only Transformer
- - Training Objective: Causal Language Modeling
- - Framework: PyTorch
- - Tokenizer: GPT-2 tokenizer
- - Checkpoint Stage: ~50M tokens

- ## Training Status
- This checkpoint is an **intermediate safe checkpoint**.
- Training is intended to continue toward ~300M tokens.
+ - **Architecture:** Decoder-only Transformer
+ - **Parameters:** ~300M
+ - **Layers:** 22
+ - **Hidden size:** 896
+ - **Attention heads:** 14
+ - **Context length:** 2048 tokens
+ - **Tokenizer:** GPT-2 BPE (50257 vocab)
+ - **Precision:** FP16 training
+ - **Training tokens:** 300M
+ - **Training data:** OpenWebText (streamed)
+
+ ## Training
+
+ The model was trained incrementally and resumed from intermediate checkpoints, completing a full **300M-token pretraining run** using mixed-precision training and gradient checkpointing.
+
+ Training infrastructure:
+ - Modal (A100 40GB)
+ - PyTorch
+ - Hugging Face tooling

  ## Intended Use
+
  - Research
- - Further pretraining
- - Instruction tuning
- - Systems integration (non-production)
+ - Continued pretraining
+ - Fine-tuning
+ - Architecture experiments

  ## Limitations
- - Not instruction-following
- - Not aligned for safety
- - Not suitable for direct deployment

- ## Author
- Independent research project.
+ This is a base model and **not instruction-tuned**. Outputs may be incoherent or unsafe without further alignment.
+
+ ## License
+
+ Apache 2.0
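
The new card lists enough architecture detail (layers, hidden size, vocab, context length) to cross-check the stated ~300M parameter count. A minimal sketch in plain Python, assuming a GPT-2-style decoder with learned position embeddings, 4x MLP expansion, and biased LayerNorms; the `tied` flag and the helper name are illustrative, since the card does not say whether input and output embeddings are shared:

```python
# Estimate the parameter count implied by the card's architecture table.
# Assumptions (not stated in the card): GPT-2-style blocks, learned
# position embeddings, 4x MLP expansion, LayerNorms with bias.

def estimate_params(vocab=50257, d_model=896, n_layers=22, n_ctx=2048, tied=True):
    embed = vocab * d_model                       # token embedding matrix
    pos = n_ctx * d_model                         # learned position embeddings
    attn = 4 * d_model * d_model + 4 * d_model    # Q, K, V, output proj + biases
    mlp = 8 * d_model * d_model + 5 * d_model     # d->4d and 4d->d linears + biases
    norms = 2 * 2 * d_model                       # two LayerNorms per block (gain + bias)
    block = attn + mlp + norms
    total = embed + pos + n_layers * block + 2 * d_model  # plus final LayerNorm
    if not tied:
        total += vocab * d_model                  # separate lm_head projection
    return total

print(estimate_params(tied=True))   # 259066752 (~259M)
print(estimate_params(tied=False))  # 304097024 (~304M)
```

The untied figure lands near the card's stated ~300M, which suggests a separate output projection; with tied embeddings the same dimensions give roughly 259M.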