---
language: en
license: apache-2.0
pipeline_tag: text-generation
library_name: transformers
tags:
  - text-generation
  - language-modeling
  - transformers
  - from-scratch
model_name: Genesis-100M
---

## Architecture

- Decoder-only Transformer (GPT-style)
- 12 layers
- Hidden size: 768
- Attention heads: 12
- Context length: 512
- Parameters: ~100M (see the configuration sketch below)

## Training

- Dataset: News articles (CNN/DailyMail – articles only)
- Objective: Causal language modeling (next-token prediction)
- Hardware: Google Colab GPU
- Precision: FP16
- Training steps: 2000
- Optimizations: Gradient checkpointing, gradient accumulation (see the training sketch below)

## Training Loss Curve

![Training Loss Curve](training_loss.png)

The training loss decreased steadily from approximately **9.1 to 5.3** over **2000 training steps**, indicating stable optimization during from-scratch training of the 100M-parameter language model.

## Intended Use

- Research
- Educational purposes
- Text generation experiments (see the usage example below)

## Limitations

- Not instruction-tuned
- Trained for a limited number of steps (2000), so the released weights reflect an early checkpoint rather than a converged model
- Outputs may be verbose or repetitive
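## Configuration Sketch

The architecture listed above corresponds to a standard GPT-2-style configuration in `transformers`. The sketch below is illustrative, not the exact build script: the vocabulary size is not stated on this card, so `vocab_size` is a placeholder chosen so the total lands near the stated ~100M parameters.

```python
from transformers import GPT2Config, GPT2LMHeadModel

# Hypothetical configuration mirroring the card's stated architecture.
config = GPT2Config(
    vocab_size=20_000,  # ASSUMPTION: the card does not list the tokenizer vocab
    n_positions=512,    # context length
    n_embd=768,         # hidden size
    n_layer=12,         # decoder layers
    n_head=12,          # attention heads
)

model = GPT2LMHeadModel(config)  # randomly initialized GPT-style decoder
print(f"Parameters: {model.num_parameters() / 1e6:.1f}M")
```

`GPT2LMHeadModel` ties the input and output embeddings, so with a vocabulary of around 20k tokens the total comes out near ~100M; a larger vocabulary (e.g. GPT-2's 50k) would push it noticeably higher.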
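## Training Setup Sketch

The items listed under Training map onto standard `Trainer` switches: `fp16=True` for mixed precision, `gradient_accumulation_steps` for accumulation, and `gradient_checkpointing_enable()` for checkpointing. The sketch below is a minimal reconstruction under assumed hyperparameters; the batch size, learning rate, tokenizer, and exact preprocessing are not documented on this card.

```python
from datasets import load_dataset
from transformers import (
    AutoTokenizer,
    DataCollatorForLanguageModeling,
    Trainer,
    TrainingArguments,
)

# ASSUMPTION: a GPT-2 tokenizer as a stand-in; the card's tokenizer is unspecified.
tokenizer = AutoTokenizer.from_pretrained("gpt2")
tokenizer.pad_token = tokenizer.eos_token  # GPT-2 has no pad token by default

# Articles only, per the card; a small slice keeps the sketch cheap to run.
raw = load_dataset("cnn_dailymail", "3.0.0", split="train[:1%]")

def tokenize(batch):
    return tokenizer(batch["article"], truncation=True, max_length=512)

train_dataset = raw.map(tokenize, batched=True, remove_columns=raw.column_names)

# `model` comes from the configuration sketch above.
model.gradient_checkpointing_enable()  # recompute activations to save memory

args = TrainingArguments(
    output_dir="genesis-100m",
    max_steps=2000,                  # matches the card
    fp16=True,                       # matches the card
    gradient_accumulation_steps=8,   # ASSUMPTION: value not on the card
    per_device_train_batch_size=4,   # ASSUMPTION: value not on the card
    learning_rate=3e-4,              # ASSUMPTION: value not on the card
    logging_steps=50,
)

trainer = Trainer(
    model=model,
    args=args,
    train_dataset=train_dataset,
    # mlm=False selects the causal (next-token) objective
    data_collator=DataCollatorForLanguageModeling(tokenizer, mlm=False),
)
trainer.train()
```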
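## Usage Example

For the text-generation experiments listed under Intended Use, the model should load through the standard `transformers` API. The repository id below is a placeholder; substitute the actual path where the Genesis-100M weights are hosted. Sampling settings such as `repetition_penalty` can help with the repetitive outputs noted under Limitations.

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

repo_id = "your-username/Genesis-100M"  # PLACEHOLDER: actual repo id unknown
tokenizer = AutoTokenizer.from_pretrained(repo_id)
model = AutoModelForCausalLM.from_pretrained(repo_id)

inputs = tokenizer("The economy grew last quarter as", return_tensors="pt")
outputs = model.generate(
    **inputs,
    max_new_tokens=50,
    do_sample=True,
    top_p=0.9,               # nucleus sampling
    repetition_penalty=1.2,  # counters the repetition noted in Limitations
)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```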