pierjoe committed on
Commit 5f1eb5b · verified · 1 Parent(s): 2b594db

Update README.md

Files changed (1)
  1. README.md +70 -1
README.md CHANGED
@@ -1,4 +1,73 @@
  ---
  license: mit
  ---
- this is a very silly model i've built out of boredom
  ---
+ language:
+ - en
  license: mit
+ tags:
+ - text-generation
+ - educational
+ - transformer
+ - pytorch
+ base_model: []
+ pipeline_tag: text-generation
  ---
+ ---
+
+ # MiniTransformer v3
+
+ A small educational transformer model trained from scratch for text generation tasks.
+
+ ## Model Description
+
+ MiniTransformer is a compact transformer architecture designed for educational purposes and experimentation. The model is trained on question-answer pairs with various system prompts to demonstrate fundamental transformer capabilities.
+
+ **This is an educational model** - it's designed to help understand transformer architectures and training processes, not for production use.
+
+ ## Architecture
+
+ - **Parameters:** 43.9M
+ - **Architecture:** Decoder-only transformer
+ - **Embedding Dimension:** 512
+ - **Attention Heads:** 4
+ - **Layers:** 4
+ - **Context Length:** 128 tokens
+ - **Vocabulary:** BERT tokenizer (30,522 tokens)
+
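As a rough sanity check on the architecture figures above, the parameter count can be estimated from the listed dimensions alone. The sketch below is not the author's code. It assumes details the README does not state: learned positional embeddings, a feed-forward width of 4x the embedding dimension, biases on every linear layer, two LayerNorms per block plus a final one, and an output head that is not tied to the token embedding. The function name `estimate_params` is hypothetical.

```python
def estimate_params(vocab=30522, d_model=512, n_layers=4, context=128,
                    ffn_mult=4, tied_head=False):
    """Estimate parameters of a decoder-only transformer (assumed layout)."""
    tok_emb = vocab * d_model                      # token embedding table
    pos_emb = context * d_model                    # learned positions (assumption)
    attn = 4 * (d_model * d_model + d_model)       # Q, K, V and output projections
    d_ff = ffn_mult * d_model                      # feed-forward width (assumption)
    ffn = (d_model * d_ff + d_ff) + (d_ff * d_model + d_model)
    norms = 2 * (2 * d_model)                      # two LayerNorms per block
    block = attn + ffn + norms
    final_norm = 2 * d_model
    # An untied output head adds a second vocab-sized matrix (assumption).
    head = 0 if tied_head else d_model * vocab + vocab
    return tok_emb + pos_emb + n_layers * block + final_norm + head


if __name__ == "__main__":
    print(f"untied head: {estimate_params() / 1e6:.2f}M")
    print(f"tied head:   {estimate_params(tied_head=True) / 1e6:.2f}M")
```

Under these assumptions the untied estimate comes out close to the reported 43.9M, which suggests the two vocabulary-sized matrices account for most of the parameters. The number of attention heads does not affect the count, since the heads split the same projection matrices.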
+ ## Training Details
+
+ ### Training Data
+
+ - Generic question-answer pairs with diverse system prompts
+ - Trained using a sliding-window approach with a stride of 32
+ - Train/test split: 90/10
+
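The sliding-window preparation listed above can be illustrated with a short sketch. It assumes the window length equals the 128-token context length and that windows are cut from one flat list of token ids. The README confirms only the stride of 32, so the function name `sliding_windows` and the handling of short sequences are hypothetical.

```python
def sliding_windows(token_ids, window=128, stride=32):
    """Cut a token sequence into overlapping windows.

    Consecutive windows start `stride` tokens apart, so with window=128
    and stride=32 each window shares 96 tokens with the previous one.
    A sequence shorter than `window` yields one shorter window.
    """
    last_start = max(len(token_ids) - window, 0)
    return [token_ids[start:start + window]
            for start in range(0, last_start + 1, stride)]


if __name__ == "__main__":
    ids = list(range(200))           # stand-in for real token ids
    chunks = sliding_windows(ids)
    print(len(chunks), [c[0] for c in chunks])
```

A small stride relative to the window means the model sees most tokens several times per epoch, which is one plausible reason for a very low training loss on a small dataset.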
+ ### Training Procedure
+
+ - **Optimizer:** AdamW (fused, learning rate: 3e-4)
+ - **Batch Size:** 128
+ - **Epochs:** 50
+ - **Mixed Precision:** FP16 (AMP enabled)
+ - **Hardware:** NVIDIA A10 GPU
+ - **Final Train Loss:** 0.0024
+
+ ### Framework
+
+ - PyTorch 2.0+ with `torch.compile()` optimization
+ - Transformers library tokenizer
+
+ ## Usage
+ ```python
+ import torch
+ from transformers import AutoTokenizer
+
+ # Load tokenizer
+ tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
+
+ # Load model (you'll need to download the checkpoint)
+ # model = MiniTransformer(...)
+ # model.load_state_dict(torch.load("checkpoint.pt"))
+
+ # Generate text
+ input_text = "Your prompt here"
+ input_ids = tokenizer.encode(input_text, return_tensors="pt")
+
+ # Generation code here