---
language: en
license: apache-2.0
pipeline_tag: text-generation
library_name: transformers
tags:
  - text-generation
  - language-modeling
  - transformers
  - from-scratch
model_name: Genesis-100M
---

# Genesis-100M (Custom-LLM-100M)

## Architecture

- Decoder-only Transformer (GPT-style)
- 12 layers
- Hidden size: 768
- Attention heads: 12
- Context length: 512
- Parameters: ~100M
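The headline parameter count can be sanity-checked from the sizes above. The sketch below assumes a GPT-2-style block (combined QKV plus output projection, 4x MLP expansion) and a vocabulary of 32,000 tokens; the vocabulary size is an assumption, since the card does not state the tokenizer.

```python
def estimate_params(n_layer=12, d_model=768, vocab_size=32000, ctx_len=512):
    """Rough GPT-style parameter count (biases and layer norms ignored)."""
    attn = 4 * d_model ** 2                        # Q, K, V and output projections
    mlp = 8 * d_model ** 2                         # two linear layers, 4x expansion
    embeddings = (vocab_size + ctx_len) * d_model  # token + position tables
    return n_layer * (attn + mlp) + embeddings

print(f"~{estimate_params() / 1e6:.0f}M parameters")  # ~110M with the assumed vocab
```

The transformer blocks alone account for roughly 85M parameters; the rest depends on the (assumed) vocabulary size, which is why the total is stated as approximate.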

## Training

- Dataset: News articles (CNN/DailyMail – articles only)
- Objective: Causal Language Modeling
- Hardware: Google Colab GPU
- Precision: FP16
- Training steps: 2000
- Optimizations: Gradient checkpointing, gradient accumulation
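The memory-saving options above map directly onto `transformers.TrainingArguments`. A minimal sketch, assuming the Hugging Face `Trainer` was used (the card does not say); the batch size and accumulation steps are illustrative values, not the ones actually used:

```python
from transformers import TrainingArguments

# Illustrative values -- the card only confirms FP16, gradient checkpointing,
# gradient accumulation, and 2000 steps; the rest are assumptions.
args = TrainingArguments(
    output_dir="genesis-100m",
    max_steps=2000,
    fp16=True,                       # mixed-precision training
    gradient_checkpointing=True,     # trade recompute for activation memory
    per_device_train_batch_size=8,   # assumed micro-batch size
    gradient_accumulation_steps=4,   # effective batch = 8 * 4 = 32
)
```

Checkpointing and accumulation together are what make a 100M-parameter from-scratch run fit on a single Colab GPU.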

## Training Loss Curve

The training loss decreased steadily from approximately 9.1 to 5.3 over 2000 training steps, indicating stable convergence during from-scratch training of the 100M-parameter language model.
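Assuming the reported loss is mean cross-entropy in nats per token (the usual convention for causal LM training), it converts to perplexity via exp(loss):

```python
import math

start_loss, end_loss = 9.1, 5.3  # values reported above

# Perplexity = exp(mean cross-entropy loss in nats per token)
print(f"initial perplexity: ~{math.exp(start_loss):,.0f}")  # ~8,955
print(f"final perplexity:   ~{math.exp(end_loss):,.0f}")    # ~200
```

A final perplexity around 200 is in line with a 100M-parameter model trained for only 2000 steps.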

## Intended Use

- Research
- Educational purposes
- Text generation experiments
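For text generation experiments, decoding choices matter as much as the model itself. Below is a self-contained toy sketch of top-k sampling with temperature, the decoding loop such experiments typically tune; it operates on a hypothetical `{token: logit}` dict rather than real model output:

```python
import math
import random

def sample_next_token(logits, temperature=1.0, top_k=5, rng=random):
    """Pick a token from a {token: logit} dict via top-k + temperature sampling."""
    scaled = sorted(((t, l / temperature) for t, l in logits.items()),
                    key=lambda p: p[1], reverse=True)[:top_k]
    m = max(l for _, l in scaled)                   # subtract max for stability
    weights = [math.exp(l - m) for _, l in scaled]  # unnormalized softmax
    return rng.choices([t for t, _ in scaled], weights=weights, k=1)[0]

# Low temperature sharpens the distribution toward the argmax token,
# so this almost surely prints "the".
print(sample_next_token({"the": 5.0, "a": 2.0, "cat": 0.5}, temperature=0.1))
```

With real checkpoints, the same knobs are exposed through `model.generate()` in `transformers`.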

## Limitations

- Not instruction-tuned
- Trained for only 2000 steps, so coverage and fluency are limited
- Outputs may be verbose or repetitive