HenrySentinel committed on
Commit a43cd09 · verified · 1 Parent(s): 31f64a1

Upload README.md with huggingface_hub

Files changed (1)
  1. README.md +68 -0
README.md ADDED
@@ -0,0 +1,68 @@
+ ---
+ language: en
+ license: apache-2.0
+ library_name: transformers
+ tags:
+ - text-generation
+ - pytorch
+ - gpt
+ - language-model
+ ---
+
+ # tinyMind
+
+ This is a small transformer language model trained from scratch, with 17,731,328 parameters (about 17.7M).
+
+ ## Model Details
+
+ - **Architecture**: GPT-style transformer
+ - **Parameters**: ~17.7M
+ - **Layers**: 6
+ - **Attention Heads**: 8
+ - **Embedding Dimension**: 256
+ - **Max Sequence Length**: 512
+ - **Vocabulary Size**: 50257
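
The hyperparameters listed above can be checked against the stated parameter count with a little arithmetic. The sketch below is not taken from the model's code: it assumes a GPT-2-style block layout with learned positional embeddings, a 4x MLP expansion, an output head tied to the token embeddings, and no bias on the fused query/key/value projection. None of these details are stated in the README; they are simply one layout that happens to reproduce the figure. The function name `count_parameters` and its arguments are hypothetical.

```python
def count_parameters(vocab_size, d_model, n_layers, max_len,
                     qkv_bias=False, tie_output=True, mlp_ratio=4):
    """Count parameters of a GPT-2-style decoder under the assumptions above."""
    d_ff = mlp_ratio * d_model
    token_emb = vocab_size * d_model
    pos_emb = max_len * d_model              # learned absolute positions (assumed)
    layer_norm = 2 * d_model                 # scale and shift vectors
    qkv = d_model * 3 * d_model + (3 * d_model if qkv_bias else 0)
    attn_out = d_model * d_model + d_model   # attention output projection + bias
    mlp = (d_model * d_ff + d_ff) + (d_ff * d_model + d_model)
    per_layer = 2 * layer_norm + qkv + attn_out + mlp
    lm_head = 0 if tie_output else vocab_size * d_model
    # The trailing layer_norm term is the final norm after the last block.
    return token_emb + pos_emb + n_layers * per_layer + layer_norm + lm_head


total = count_parameters(vocab_size=50257, d_model=256, n_layers=6, max_len=512)
head_dim = 256 // 8  # embedding dimension split across 8 attention heads
print(total, head_dim)  # 17731328 32
```

Under these assumptions the token embedding table alone (50257 x 256 = 12,865,792 weights) accounts for roughly 73% of the model, which is typical when a GPT-2 vocabulary is paired with a narrow embedding dimension.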
+
+ ## Training Data
+
+ The model was trained on a mixture of text sources, including:
+ - OpenWebText
+ - Wikipedia articles
+ - BookCorpus
+ - Other curated text sources
+
+ ## Usage
+
+ ```python
+ from transformers import GPT2TokenizerFast, AutoModelForCausalLM
+
+ # Load the tokenizer and model weights from the Hub
+ tokenizer = GPT2TokenizerFast.from_pretrained("HenrySentinel/tinyMind")
+ model = AutoModelForCausalLM.from_pretrained("HenrySentinel/tinyMind")
+
+ # Generate text
+ input_text = "The key to artificial intelligence is"
+ inputs = tokenizer(input_text, return_tensors="pt")  # input_ids and attention_mask
+ # max_length counts prompt tokens plus generated tokens
+ output = model.generate(
+     **inputs,
+     max_length=100,
+     temperature=0.8,
+     do_sample=True,
+     pad_token_id=tokenizer.eos_token_id,  # GPT-2 tokenizers define no pad token
+ )
+ generated_text = tokenizer.decode(output[0], skip_special_tokens=True)
+ print(generated_text)
+ ```
+
+ ## Training Details
+
+ - **Optimizer**: AdamW with cosine learning rate scheduling
+ - **Learning Rate**: 0.001
+ - **Batch Size**: 8
+ - **Sequence Length**: 512
+ - **Epochs**: 3
+ - **Gradient Clipping**: 1.0
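
To make the training settings above concrete, here is a minimal pure-Python sketch of a cosine learning-rate schedule and of gradient clipping by global L2 norm. It is an illustration rather than the training script: it assumes 0.001 is the peak rate, that there is no warmup, that the rate decays to zero, that clipping uses the global norm, and that there is no gradient accumulation. `TOTAL_STEPS` is a hypothetical value, since the README gives epochs but not the number of optimizer steps.

```python
import math

PEAK_LR = 1e-3        # "Learning Rate" from the list above (assumed to be the peak)
BATCH_SIZE = 8
SEQ_LEN = 512
MAX_GRAD_NORM = 1.0   # "Gradient Clipping" from the list above


def cosine_lr(step, total_steps, peak_lr=PEAK_LR, min_lr=0.0):
    """Cosine decay from peak_lr to min_lr over total_steps (no warmup assumed)."""
    progress = min(step, total_steps) / total_steps
    return min_lr + 0.5 * (peak_lr - min_lr) * (1.0 + math.cos(math.pi * progress))


def clip_by_global_norm(grads, max_norm=MAX_GRAD_NORM):
    """Rescale a flat list of gradient values so their L2 norm is at most max_norm."""
    norm = math.sqrt(sum(g * g for g in grads))
    scale = min(1.0, max_norm / norm) if norm > 0 else 1.0
    return [g * scale for g in grads]


# Tokens seen per optimizer step, assuming no gradient accumulation.
tokens_per_step = BATCH_SIZE * SEQ_LEN

TOTAL_STEPS = 10_000  # hypothetical; not stated in the README
for step in (0, TOTAL_STEPS // 2, TOTAL_STEPS):
    print(step, cosine_lr(step, TOTAL_STEPS))
```

With a batch of 8 sequences of 512 tokens, each step covers 4,096 tokens, which is small for language-model pretraining and may explain why gradient clipping is part of the setup.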
+
+ ## Limitations
+
+ This is a small model designed for experimentation and learning. It may:
+ - Generate inconsistent or factually incorrect content
+ - Have limited knowledge compared to larger models
+ - Require careful prompt engineering for best results
+
+ ## License
+
+ Apache 2.0