---
language: en
license: apache-2.0
pipeline_tag: text-generation
library_name: transformers
tags:
- text-generation
- language-modeling
- transformers
- from-scratch
model_name: Genesis-100M
---

# Genesis-100M

Genesis-100M is a 100M-parameter GPT-style language model trained from scratch on news text using a custom BPE tokenizer.
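The custom BPE tokenizer is not published in this card; for intuition, byte-pair encoding builds its vocabulary by repeatedly merging the most frequent adjacent symbol pair in the training corpus. A minimal, illustrative sketch of that training loop (toy data, not the actual tokenizer code):

```python
from collections import Counter

def bpe_train(words, num_merges):
    """Toy BPE trainer. `words` is a list of symbol tuples, e.g. ('l', 'o', 'w')."""
    vocab = Counter(words)
    merges = []
    for _ in range(num_merges):
        # Count every adjacent symbol pair across the corpus.
        pairs = Counter()
        for word, freq in vocab.items():
            for a, b in zip(word, word[1:]):
                pairs[(a, b)] += freq
        if not pairs:
            break
        best = max(pairs, key=pairs.get)
        merges.append(best)
        # Replace each occurrence of the best pair with the merged symbol.
        merged_vocab = Counter()
        for word, freq in vocab.items():
            out, i = [], 0
            while i < len(word):
                if i < len(word) - 1 and (word[i], word[i + 1]) == best:
                    out.append(word[i] + word[i + 1])
                    i += 2
                else:
                    out.append(word[i])
                    i += 1
            merged_vocab[tuple(out)] += freq
        vocab = merged_vocab
    return merges, vocab

merges, vocab = bpe_train([("l", "o", "w"), ("l", "o", "w"), ("l", "o", "w", "e", "r")], 2)
print(merges)  # [('l', 'o'), ('lo', 'w')]
```

Real BPE tokenizers add byte-level fallback, special tokens, and frequency tie-breaking rules on top of this core loop.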
## Architecture
- Decoder-only Transformer (GPT-style)
- 12 layers
- Hidden size: 768
- Attention heads: 12
- Context length: 512
- Parameters: ~100M
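The ~100M figure is consistent with these dimensions under a standard GPT-2-style parameterization. A rough back-of-the-envelope estimate — note the vocabulary size is not stated in this card, so 16,000 is an assumed value for illustration only:

```python
def gpt_param_estimate(n_layers=12, d_model=768, vocab_size=16_000, tied_embeddings=True):
    """Rough parameter count for a GPT-2-style decoder,
    ignoring biases, LayerNorms, and positional embeddings."""
    embed = vocab_size * d_model       # token embedding matrix
    attn = 4 * d_model * d_model       # Q, K, V and output projections
    mlp = 8 * d_model * d_model        # two linear layers with 4x expansion
    per_layer = attn + mlp
    head = 0 if tied_embeddings else vocab_size * d_model
    return embed + n_layers * per_layer + head

print(f"{gpt_param_estimate() / 1e6:.0f}M")  # 97M, i.e. roughly 100M
```

With a larger vocabulary (e.g. GPT-2's ~50k) the same body would land near 124M, so the assumed 16k vocabulary is what makes the estimate match ~100M.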
## Training
- Dataset: News articles (CNN/DailyMail – articles only)
- Objective: Causal Language Modeling
- Hardware: Google Colab GPU
- Precision: FP16
- Training steps: 2000
- Optimizations: Gradient checkpointing, gradient accumulation
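Causal language modeling trains the model to predict each token from the tokens before it; in practice the labels are simply the input sequence shifted one position left. A minimal sketch of how an input/label pair is formed (illustrative, not the training code):

```python
def make_clm_example(token_ids):
    """For causal LM, the target at position i is the token at position i + 1."""
    inputs = token_ids[:-1]
    labels = token_ids[1:]
    return inputs, labels

inputs, labels = make_clm_example([5, 11, 42, 7])
print(inputs, labels)  # [5, 11, 42] [11, 42, 7]
```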
## Intended Use
- Research
- Educational purposes
- Text generation experiments
## Limitations
- Not instruction-tuned
- Trained for limited steps
- Outputs may be verbose or repetitive
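The repetition noted above is common in small models decoded greedily; applying a repetition penalty at sampling time is one standard mitigation. A toy, pure-Python sketch of a CTRL-style penalty (divide positive logits of already-generated tokens by the penalty, multiply negative ones) — the logit values are made up for illustration:

```python
import math

def penalized_softmax(logits, generated_ids, penalty=1.3):
    """Down-weight tokens that were already generated, then softmax."""
    adjusted = [
        l / penalty if i in generated_ids and l > 0 else
        l * penalty if i in generated_ids else
        l
        for i, l in enumerate(logits)
    ]
    exps = [math.exp(l) for l in adjusted]
    total = sum(exps)
    return [e / total for e in exps]

# Token 0 was already generated: its probability drops versus a plain softmax.
print(penalized_softmax([3.0, 2.5, 1.0], generated_ids={0}))
```

In `transformers`, the equivalent knob is the `repetition_penalty` generation parameter; this sketch only shows the idea behind it.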