wizardoftrap committed on
Commit
dd24a17
·
verified ·
1 Parent(s): 8c182ba

Upload README.md with huggingface_hub

Files changed (1)
  1. README.md +68 -0
README.md ADDED
@@ -0,0 +1,68 @@
+ ---
+ license: mit
+ tags:
+ - gpt
+ - language-model
+ - causal-lm
+ language:
+ - en
+ datasets:
+ - roneneldan/TinyStories
+ ---
+
+ # SP-LM-alpha
+
+ A small GPT-style causal language model trained on the TinyStories dataset using PyTorch.
+
+ ## Model Details
+
+ - **Model Type**: GPT (Causal Language Model)
+ - **Vocabulary Size**: 50257
+ - **Context Length**: 128 tokens
+ - **Layers**: 6
+ - **Attention Heads**: 6
+ - **Embedding Dimension**: 384
+ - **Training Dataset**: [TinyStories](https://huggingface.co/datasets/roneneldan/TinyStories)
+
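The hyperparameters above can be gathered into a single configuration object. The following is a minimal sketch, not the author's code: the class name `GPTConfig` and its field names are hypothetical, and only the numeric values come from the list above. It also shows the per-head dimension implied by 384 embedding dimensions split across 6 heads.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class GPTConfig:
    """Hypothetical container for the values listed under Model Details."""

    vocab_size: int = 50257
    context_length: int = 128
    n_layer: int = 6
    n_head: int = 6
    n_embd: int = 384

    def __post_init__(self):
        # Multi-head attention splits the embedding evenly across heads.
        if self.n_embd % self.n_head != 0:
            raise ValueError("n_embd must be divisible by n_head")

    @property
    def head_dim(self) -> int:
        # Width of each attention head.
        return self.n_embd // self.n_head


config = GPTConfig()
print(config.head_dim)  # 384 // 6 = 64
```

The divisibility check is the one constraint these numbers must satisfy for standard multi-head attention; the listed values pass it.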
+ ## Architecture
+
+ The model is a decoder-only transformer with:
+ - Token and positional embeddings
+ - 6 transformer blocks
+ - Causal self-attention with 6 heads
+ - Feed-forward networks with GELU activation
+ - Layer normalization
+ - Residual connections
+
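To give a sense of scale, the components listed above can be turned into a rough parameter count. This is a sketch under assumptions the README does not confirm: learned positional embeddings, a 4x feed-forward expansion (`mlp_ratio`), bias terms in every linear layer, two layer norms per block plus a final one, and an output head that shares weights with the token embedding. The function name `estimate_gpt_params` is hypothetical.

```python
def estimate_gpt_params(vocab_size=50257, context_length=128,
                        n_layer=6, n_embd=384, mlp_ratio=4):
    """Rough parameter count for a GPT-2 style model (assumptions in the text above)."""
    token_emb = vocab_size * n_embd
    pos_emb = context_length * n_embd  # assumed: learned positions

    # Attention: fused QKV projection plus output projection, with biases.
    # The number of heads does not change the parameter count.
    attn = (n_embd * 3 * n_embd + 3 * n_embd) + (n_embd * n_embd + n_embd)

    # Feed-forward: expand to mlp_ratio * n_embd, then project back, with biases.
    hidden = mlp_ratio * n_embd
    mlp = (n_embd * hidden + hidden) + (hidden * n_embd + n_embd)

    # Two layer norms per block, each with a scale and a shift vector.
    layer_norms = 2 * (2 * n_embd)

    per_block = attn + mlp + layer_norms
    final_norm = 2 * n_embd

    # Assumed: the output head is tied to the token embedding, so it adds nothing.
    total = token_emb + pos_emb + n_layer * per_block + final_norm
    return {
        "token_emb": token_emb,
        "pos_emb": pos_emb,
        "per_block": per_block,
        "total": total,
    }


estimate = estimate_gpt_params()
print(f"{estimate['total']:,} parameters")
```

Under these assumptions the total comes to about 30 million parameters, of which roughly 19.3 million sit in the token embedding. An untied output head would add another 19.3 million.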
+ ## Usage
+
+ ```python
+ from transformers import AutoTokenizer, AutoModelForCausalLM
+
+ # Replace the placeholder with the actual Hub repository ID.
+ model_id = "your-username/SP-LM-alpha"
+ tokenizer = AutoTokenizer.from_pretrained(model_id)
+ model = AutoModelForCausalLM.from_pretrained(model_id)
+
+ # Generate text. max_length counts the prompt plus the generated tokens,
+ # so 100 stays within the 128-token context length.
+ prompt = "Once upon a time"
+ inputs = tokenizer(prompt, return_tensors="pt")
+ outputs = model.generate(**inputs, max_length=100)
+ print(tokenizer.decode(outputs[0], skip_special_tokens=True))
+ ```
+
+ ## Training Details
+
+ - **Learning Rate**: 1e-4, with linear warmup followed by cosine annealing
+ - **Batch Size**: 32
+ - **Gradient Accumulation Steps**: 32
+ - **Max Iterations**: 20000
+ - **Optimizer**: AdamW with weight decay
+ - **Mixed Precision**: bfloat16 or float16
+
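The learning-rate schedule and batch settings above can be sketched in plain Python. This is an illustration, not the training script: `warmup_iters=1000` and `min_lr=1e-5` are assumed values that the README does not state, and the function names are hypothetical. The sketch also assumes 1e-4 is the peak rate reached at the end of warmup.

```python
import math


def lr_at(it, max_lr=1e-4, max_iters=20000, warmup_iters=1000, min_lr=1e-5):
    """Linear warmup to max_lr, then cosine annealing down to min_lr."""
    if it < warmup_iters:
        # Linear ramp that reaches max_lr on the last warmup iteration.
        return max_lr * (it + 1) / warmup_iters
    if it >= max_iters:
        return min_lr
    # Fraction of the decay phase completed, in [0, 1).
    progress = (it - warmup_iters) / (max_iters - warmup_iters)
    coeff = 0.5 * (1.0 + math.cos(math.pi * progress))  # goes from 1 to 0
    return min_lr + coeff * (max_lr - min_lr)


def tokens_per_optimizer_step(batch_size=32, grad_accum_steps=32, context_length=128):
    """Tokens contributing to one weight update when gradients are accumulated."""
    return batch_size * grad_accum_steps * context_length


for it in (0, 999, 10000, 19999):
    print(it, lr_at(it))
print(tokens_per_optimizer_step())  # 32 * 32 * 128 = 131072
```

With a batch size of 32 and 32 accumulation steps, each weight update sees 1024 sequences, or 131,072 tokens at the full 128-token context length. Whether "Max Iterations" counts micro-batches or weight updates is not stated in the README.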
+ ## License
+
+ MIT License
+
+ ## Model Card Contact
+
+ For questions or issues, please contact the model author.