---
language: en
license: apache-2.0
pipeline_tag: text-generation
library_name: transformers
tags:
- text-generation
- language-modeling
- transformers
- from-scratch
model_name: Genesis-100M
---

## Architecture

- Decoder-only Transformer (GPT-style)
- 12 layers
- Hidden size: 768
- Attention heads: 12
- Context length: 512
- Parameters: ~100M
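
For reference, a model with these dimensions can be instantiated in `transformers` roughly as follows. This is a minimal sketch, not the exact training code; in particular, the vocabulary size is an assumption, and the exact parameter count depends mainly on it.

```python
from transformers import GPT2Config, GPT2LMHeadModel

# Minimal sketch of a GPT-style decoder-only model with the dimensions
# listed above. The vocabulary size is assumed (not stated in this card).
config = GPT2Config(
    vocab_size=50257,   # assumption: GPT-2 BPE vocabulary
    n_positions=512,    # context length
    n_embd=768,         # hidden size
    n_layer=12,         # decoder layers
    n_head=12,          # attention heads
)

model = GPT2LMHeadModel(config)
# Actual count varies with the vocabulary size used for Genesis-100M.
print(f"~{model.num_parameters() / 1e6:.0f}M parameters")
```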

## Training

- Dataset: CNN/DailyMail news articles (article text only)
- Objective: causal language modeling (next-token prediction)
- Hardware: Google Colab GPU
- Precision: FP16
- Training steps: 2000
- Optimizations: gradient checkpointing, gradient accumulation
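
The training script itself is not reproduced here, but the optimizations listed above map directly onto `transformers` `TrainingArguments`. The sketch below continues from the configuration sketch in the Architecture section; the tokenizer, batch size, and dataset preprocessing are assumptions, not the exact setup used.

```python
from datasets import load_dataset
from transformers import (
    DataCollatorForLanguageModeling,
    GPT2TokenizerFast,
    Trainer,
    TrainingArguments,
)

# Assumed tokenizer; the actual tokenizer for Genesis-100M may differ.
tokenizer = GPT2TokenizerFast.from_pretrained("gpt2")
tokenizer.pad_token = tokenizer.eos_token

# CNN/DailyMail: only the article field is used, per the card.
dataset = load_dataset("cnn_dailymail", "3.0.0", split="train")

def tokenize(batch):
    # Truncate to the 512-token context length listed above.
    return tokenizer(batch["article"], truncation=True, max_length=512)

tokenized = dataset.map(tokenize, batched=True, remove_columns=dataset.column_names)

args = TrainingArguments(
    output_dir="genesis-100m",
    max_steps=2000,                  # training steps from the card
    per_device_train_batch_size=8,   # assumed; not stated in the card
    gradient_accumulation_steps=4,   # gradient accumulation (assumed value)
    gradient_checkpointing=True,     # gradient checkpointing
    fp16=True,                       # FP16 precision
    logging_steps=50,
)

trainer = Trainer(
    model=model,  # GPT2LMHeadModel from the Architecture sketch above
    args=args,
    train_dataset=tokenized,
    data_collator=DataCollatorForLanguageModeling(tokenizer, mlm=False),
)
trainer.train()
```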

## Training Loss Curve

![Training Loss](training_loss.png)

The training loss decreased steadily from approximately **9.1 to 5.3** over **2000 training steps**, indicating stable optimization during from-scratch training of the 100M-parameter language model.

## Intended Use

- Research
- Educational purposes
- Text generation experiments
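
Once published, the model can be loaded for text generation experiments with the standard `transformers` pipeline. The repository id below is a placeholder, not the model's actual location, and the sampling settings are only an example.

```python
from transformers import pipeline

# Placeholder repository id; replace with the actual Genesis-100M repo.
generator = pipeline("text-generation", model="your-username/Genesis-100M")

output = generator(
    "The city council announced today that",
    max_new_tokens=50,
    do_sample=True,
    temperature=0.8,
)
print(output[0]["generated_text"])
```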

## Limitations

- Not instruction-tuned
- Trained for a limited number of steps (2000)
- Outputs may be verbose or repetitive