sagar118 committed
Commit 7b46efd · verified · 1 Parent(s): 5f8135c

Update README.md

Files changed (1): README.md (+15 −14)
README.md CHANGED
@@ -1,41 +1,42 @@
 ---
 language: en
 license: apache-2.0
+pipeline_tag: text-generation
+library_name: transformers
 tags:
 - text-generation
-- summarization
 - language-modeling
 - transformers
 - from-scratch
-pipeline_tag: text-generation
-library_name: transformers
+model_name: Genesis-100M
+---

-# Custom 100M Parameter Language Model
+# Genesis-100M

-## Overview
-A GPT-style decoder-only Transformer trained from scratch on news text
-using a custom BPE tokenizer.
+Genesis-100M is a 100M-parameter GPT-style language model trained from scratch on news text using a custom BPE tokenizer.

 ## Architecture
-- Layers: 12
+- Decoder-only Transformer (GPT-style)
+- 12 layers
 - Hidden size: 768
-- Heads: 12
+- Attention heads: 12
 - Context length: 512
 - Parameters: ~100M

 ## Training
-- Dataset: News articles (CNN/DailyMail, articles only)
-- Training method: Causal language modeling
+- Dataset: News articles (CNN/DailyMail articles only)
+- Objective: Causal Language Modeling
 - Hardware: Google Colab GPU
 - Precision: FP16
-- Steps: 2000
+- Training steps: 2000
+- Optimizations: Gradient checkpointing, gradient accumulation

 ## Intended Use
 - Research
 - Educational purposes
-- Prompt-based text generation / summarization
+- Text generation experiments

 ## Limitations
 - Not instruction-tuned
 - Trained for limited steps
-- Outputs may be verbose or inconsistent
+- Outputs may be verbose or repetitive
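The card's Architecture numbers (12 layers, hidden size 768, 512-token context) can be sanity-checked against the "~100M parameters" claim with back-of-the-envelope arithmetic. A minimal sketch follows; the card does not state the custom BPE tokenizer's vocabulary size, so the 16k figure below is purely an assumption, and biases/LayerNorm weights are ignored since they contribute well under 1% of the total:

```python
def gpt_param_count(vocab_size, d_model=768, n_layers=12, n_ctx=512):
    """Rough parameter count for a GPT-style decoder-only Transformer.

    Counts token and position embeddings plus, per layer, the attention
    projections (Q, K, V, output = 4 * d^2) and the 4x-wide feed-forward
    block (up + down projections = 8 * d^2). Biases and LayerNorm
    weights are omitted as negligible.
    """
    embeddings = vocab_size * d_model + n_ctx * d_model
    per_layer = 12 * d_model * d_model  # 4*d^2 attention + 8*d^2 MLP
    return embeddings + n_layers * per_layer

# Hypothetical 16k-token custom BPE vocabulary (not stated in the card):
print(f"{gpt_param_count(16_000) / 1e6:.1f}M parameters")  # → 97.6M parameters
```

A 16k vocabulary lands at roughly 98M, consistent with the card's "~100M"; the Transformer blocks alone account for about 85M of that, so the estimate is not very sensitive to the assumed vocabulary size.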