---
language: en
license: apache-2.0
pipeline_tag: text-generation
library_name: transformers
tags:
- text-generation
- language-modeling
- transformers
- from-scratch
model_name: Genesis-100M
---

# Genesis-100M

Genesis-100M is a 100M-parameter GPT-style language model trained from scratch on news text using a custom BPE tokenizer.
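The custom BPE tokenizer is not published in this card; for intuition, byte-pair encoding builds its vocabulary by repeatedly merging the most frequent adjacent symbol pair in the training corpus. A minimal, illustrative sketch of that training loop (toy data, not the actual tokenizer code):

```python
from collections import Counter

def bpe_train(words, num_merges):
    """Toy BPE trainer. `words` is a list of symbol tuples, e.g. ('l', 'o', 'w')."""
    vocab = Counter(words)
    merges = []
    for _ in range(num_merges):
        # Count every adjacent symbol pair across the corpus.
        pairs = Counter()
        for word, freq in vocab.items():
            for a, b in zip(word, word[1:]):
                pairs[(a, b)] += freq
        if not pairs:
            break
        best = max(pairs, key=pairs.get)
        merges.append(best)
        # Replace each occurrence of the best pair with the merged symbol.
        merged_vocab = Counter()
        for word, freq in vocab.items():
            out, i = [], 0
            while i < len(word):
                if i < len(word) - 1 and (word[i], word[i + 1]) == best:
                    out.append(word[i] + word[i + 1])
                    i += 2
                else:
                    out.append(word[i])
                    i += 1
            merged_vocab[tuple(out)] += freq
        vocab = merged_vocab
    return merges, vocab

merges, vocab = bpe_train([("l", "o", "w"), ("l", "o", "w"), ("l", "o", "w", "e", "r")], 2)
print(merges)  # [('l', 'o'), ('lo', 'w')]
```

Real BPE tokenizers add byte-level fallback, special tokens, and frequency tie-breaking rules on top of this core loop.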
## Architecture
- Decoder-only Transformer (GPT-style)
- 12 layers
- Hidden size: 768
- Attention heads: 12
- Context length: 512
- Parameters: ~100M
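The ~100M figure is consistent with these dimensions under a standard GPT-2-style parameterization. A rough back-of-the-envelope estimate — note the vocabulary size is not stated in this card, so 16,000 is an assumed value for illustration only:

```python
def gpt_param_estimate(n_layers=12, d_model=768, vocab_size=16_000, tied_embeddings=True):
    """Rough parameter count for a GPT-2-style decoder,
    ignoring biases, LayerNorms, and positional embeddings."""
    embed = vocab_size * d_model       # token embedding matrix
    attn = 4 * d_model * d_model       # Q, K, V and output projections
    mlp = 8 * d_model * d_model        # two linear layers with 4x expansion
    per_layer = attn + mlp
    head = 0 if tied_embeddings else vocab_size * d_model
    return embed + n_layers * per_layer + head

print(f"{gpt_param_estimate() / 1e6:.0f}M")  # 97M, i.e. roughly 100M
```

With a larger vocabulary (e.g. GPT-2's ~50k) the same body would land near 124M, so the assumed 16k vocabulary is what makes the estimate match ~100M.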
## Training
- Dataset: News articles (CNN/DailyMail – articles only)
- Objective: Causal Language Modeling
- Hardware: Google Colab GPU
- Precision: FP16
- Training steps: 2000
- Optimizations: Gradient checkpointing, gradient accumulation
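Causal language modeling trains the model to predict each token from the tokens before it; in practice the labels are simply the input sequence shifted one position left. A minimal sketch of how an input/label pair is formed (illustrative, not the training code):

```python
def make_clm_example(token_ids):
    """For causal LM, the target at position i is the token at position i + 1."""
    inputs = token_ids[:-1]
    labels = token_ids[1:]
    return inputs, labels

inputs, labels = make_clm_example([5, 11, 42, 7])
print(inputs, labels)  # [5, 11, 42] [11, 42, 7]
```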
## Intended Use
- Research
- Educational purposes
- Text generation experiments
## Limitations
- Not instruction-tuned
- Trained for limited steps
- Outputs may be verbose or repetitive
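The repetition noted above is common in small models decoded greedily; applying a repetition penalty at sampling time is one standard mitigation. A toy, pure-Python sketch of a CTRL-style penalty (divide positive logits of already-generated tokens by the penalty, multiply negative ones) — the logit values are made up for illustration:

```python
import math

def penalized_softmax(logits, generated_ids, penalty=1.3):
    """Down-weight tokens that were already generated, then softmax."""
    adjusted = [
        l / penalty if i in generated_ids and l > 0 else
        l * penalty if i in generated_ids else
        l
        for i, l in enumerate(logits)
    ]
    exps = [math.exp(l) for l in adjusted]
    total = sum(exps)
    return [e / total for e in exps]

# Token 0 was already generated: its probability drops versus a plain softmax.
print(penalized_softmax([3.0, 2.5, 1.0], generated_ids={0}))
```

In `transformers`, the equivalent knob is the `repetition_penalty` generation parameter; this sketch only shows the idea behind it.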