mreeza committed a69ddb4 (verified) · 1 parent: 0ecf5b6

Upload folder using huggingface_hub

Files changed (1): README.md ADDED (+102 −0)
---
license: apache-2.0
tags:
- text-generation
- transformer-decoder
- autoregressive-model
- wikitext-2
- pytorch
language:
- en
library_name: pytorch
---

# Simple Transformer Decoder Language Model

This is a simple Transformer decoder-based **autoregressive language model** developed as part of a deep learning project.
It was trained on the [WikiText-2](https://paperswithcode.com/dataset/wikitext-2) dataset using PyTorch, with a focus on **learning to generate English text** autoregressively, i.e. predicting the next token given the previous tokens.

The model follows a **decoder-only architecture**, similar to models such as **GPT**, and uses **causal masking** to prevent attention to future tokens.

---

## ✨ Model Architecture

- **Model type**: Transformer decoder (decoder layers only)
- **Embedding size**: 128
- **Number of attention heads**: 4
- **Number of decoder layers**: 2
- **Feed-forward hidden dimension**: 512
- **Positional encoding**: Sinusoidal (fixed, not learned)
- **Vocabulary size**: GPT-2 tokenizer vocabulary (approx. 50K tokens)
- **Max sequence length**: 256 tokens
- **Dropout**: 0.1 (inside the transformer layers)

---

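The model class itself is not included in this README. The sketch below is a hypothetical reconstruction consistent with the hyperparameters listed above (the class name `SimpleTransformerDecoderModel` and its constructor arguments mirror the usage example further down); the actual implementation may differ in detail.

```python
import math
import torch
import torch.nn as nn

class SimpleTransformerDecoderModel(nn.Module):
    """Hypothetical sketch of a decoder-only LM matching the listed hyperparameters."""

    def __init__(self, vocab_size, d_model=128, nhead=4, num_layers=2,
                 dim_feedforward=512, max_seq_len=256, dropout=0.1):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, d_model)

        # Fixed (not learned) sinusoidal positional encoding
        pe = torch.zeros(max_seq_len, d_model)
        pos = torch.arange(max_seq_len).unsqueeze(1).float()
        div = torch.exp(torch.arange(0, d_model, 2).float()
                        * (-math.log(10000.0) / d_model))
        pe[:, 0::2] = torch.sin(pos * div)
        pe[:, 1::2] = torch.cos(pos * div)
        self.register_buffer("pe", pe.unsqueeze(0))

        # An encoder stack run with a causal mask behaves as a decoder-only model
        layer = nn.TransformerEncoderLayer(
            d_model, nhead, dim_feedforward, dropout, batch_first=True)
        self.blocks = nn.TransformerEncoder(layer, num_layers)
        self.lm_head = nn.Linear(d_model, vocab_size)

    def forward(self, input_ids):
        seq_len = input_ids.size(1)
        x = self.embed(input_ids) + self.pe[:, :seq_len]
        # Causal mask: position i may only attend to positions <= i
        causal_mask = nn.Transformer.generate_square_subsequent_mask(seq_len)
        x = self.blocks(x, mask=causal_mask)
        return self.lm_head(x)  # logits of shape (batch, seq_len, vocab_size)
```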
## 📚 Dataset

- **Name**: WikiText-2
- **Size**: ~2 million tokens
- **Language**: English
- **Task**: Next-token prediction (causal language modeling)

**Dataset link**: [WikiText-2 on Hugging Face Datasets](https://huggingface.co/datasets/wikitext)

---

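The data pipeline is not shown in this README. Below is a minimal sketch of one common setup, assuming the tokenized corpus (e.g. loaded with `load_dataset("wikitext", "wikitext-2-raw-v1")` and tokenized with the GPT-2 tokenizer) is concatenated into a single token stream and split into fixed-length blocks; the project's actual preprocessing may differ.

```python
import torch

def chunk_token_ids(token_ids, block_size=256):
    """Split a flat token stream into fixed-length blocks, dropping the remainder."""
    n_blocks = len(token_ids) // block_size
    return [token_ids[i * block_size:(i + 1) * block_size] for i in range(n_blocks)]

def make_lm_batch(blocks):
    """Build a causal-LM batch: inputs are tokens [0..n-2], targets are tokens [1..n-1]."""
    x = torch.tensor(blocks)
    return x[:, :-1], x[:, 1:]
```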
## 🏋️‍♂️ Training Details

- **Optimizer**: Adam
- **Learning rate**: 5e-4
- **Batch size**: 4
- **Training epochs**: 5
- **Loss function**: CrossEntropyLoss
- **Logging**: Weights & Biases (wandb)

✅ The loss decreased consistently during training, indicating that the model learned the structure of English text.

---

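The training loop itself is not included here. Below is a minimal sketch of one training step under the settings listed above (Adam, CrossEntropyLoss), assuming the standard shifted next-token setup and a model that returns logits of shape `(batch, seq_len, vocab_size)`; the project's actual loop may differ.

```python
import torch
import torch.nn as nn

def train_step(model, optimizer, input_ids, criterion):
    """One causal-LM training step: predict token t+1 from tokens 0..t."""
    model.train()
    # Shift by one token: inputs are [0..n-2], targets are [1..n-1]
    inputs, targets = input_ids[:, :-1], input_ids[:, 1:]
    logits = model(inputs)  # (batch, seq_len - 1, vocab_size)
    # CrossEntropyLoss expects (N, C) logits and (N,) class indices
    loss = criterion(logits.reshape(-1, logits.size(-1)), targets.reshape(-1))
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```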
## 🚀 How to Use

> Note: Since this is a custom PyTorch model (not a Hugging Face `PreTrainedModel`), you must define the model class yourself and load the weights manually.

```python
import torch
from transformers import AutoTokenizer
from your_custom_model_code import SimpleTransformerDecoderModel  # import your model class

# Load the tokenizer
tokenizer = AutoTokenizer.from_pretrained("mreeza/simple-transformer-model")

# Initialize the model with the same hyperparameters used for training
model = SimpleTransformerDecoderModel(
    vocab_size=len(tokenizer),
    d_model=128,
    nhead=4,
    num_layers=2,
    max_seq_len=256
)

# Load the trained weights
model.load_state_dict(torch.load("pytorch_model.bin", map_location="cpu"))
model.eval()

# Greedy generation: repeatedly append the most likely next token
def generate_text(model, tokenizer, prompt="Once upon a time", max_length=50):
    model.eval()
    input_ids = tokenizer(prompt, return_tensors="pt").input_ids

    with torch.no_grad():
        for _ in range(max_length):
            outputs = model(input_ids)              # (batch, seq_len, vocab_size)
            next_token_logits = outputs[:, -1, :]
            next_token_id = torch.argmax(next_token_logits, dim=-1, keepdim=True)
            input_ids = torch.cat([input_ids, next_token_id], dim=-1)
            if input_ids.size(1) >= 256:            # respect the max sequence length
                break

    return tokenizer.decode(input_ids[0], skip_special_tokens=True)

prompt = "Once upon a time"
generated = generate_text(model, tokenizer, prompt)
print(generated)
```
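The greedy loop above always picks the argmax token, which tends to produce repetitive text. As an optional variant (not part of the original project code), temperature and top-k sampling can be dropped in with the same model and tokenizer:

```python
import torch

def generate_with_sampling(model, tokenizer, prompt, max_length=50,
                           temperature=1.0, top_k=50):
    """Temperature / top-k sampling as an alternative to greedy decoding."""
    model.eval()
    input_ids = tokenizer(prompt, return_tensors="pt").input_ids
    with torch.no_grad():
        for _ in range(max_length):
            # Scale the last position's logits by the temperature
            logits = model(input_ids)[:, -1, :] / temperature
            # Keep only the top_k most likely tokens and renormalize
            top_vals, top_idx = torch.topk(logits, top_k)
            probs = torch.softmax(top_vals, dim=-1)
            # Sample one token and map it back to its vocabulary id
            next_id = top_idx.gather(-1, torch.multinomial(probs, 1))
            input_ids = torch.cat([input_ids, next_id], dim=-1)
    return tokenizer.decode(input_ids[0], skip_special_tokens=True)
```

Lower temperatures (e.g. 0.7) make the output more conservative; higher values make it more diverse.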