---

license: apache-2.0
tags:
- text-generation
- transformer-decoder
- autoregressive-model
- wikitext-2
- pytorch
language:
- en
library_name: pytorch
---


# Simple Transformer Decoder Language Model

This is a simple Transformer decoder-based **autoregressive language model** developed as part of a deep learning project.  
It was trained on the [WikiText-2](https://paperswithcode.com/dataset/wikitext-2) dataset using PyTorch to **generate English text** autoregressively, i.e. predicting the next token given the previous tokens.

The model follows a **decoder-only architecture** similar to models like **GPT**, using **causal masking** to prevent attention to future tokens.

---

## ✨ Model Architecture

- **Model type**: Transformer Decoder (only decoder layers)
- **Embedding size**: 128
- **Number of attention heads**: 4
- **Number of decoder layers**: 2
- **Feed-forward hidden dimension**: 512
- **Positional Encoding**: Sinusoidal (fixed, not learned)
- **Vocabulary size**: Based on the GPT-2 tokenizer (50,257 tokens)
- **Max sequence length**: 256 tokens
- **Dropout**: 0.1 (inside transformer layers)
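
The card assumes a custom model class (see "How to Use" below). A minimal PyTorch sketch consistent with the hyperparameters listed above might look like this — the class name `SimpleTransformerDecoderModel` and the use of `nn.TransformerEncoderLayer` with a causal mask (the standard way to build a decoder-only model, since there is no cross-attention) are assumptions, not the project's exact code:

```python
import math
import torch
import torch.nn as nn

class SimpleTransformerDecoderModel(nn.Module):
    """Decoder-only LM: token embedding + fixed sinusoidal positions + masked self-attention."""

    def __init__(self, vocab_size, d_model=128, nhead=4, num_layers=2,
                 dim_feedforward=512, max_seq_len=256, dropout=0.1):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, d_model)

        # Sinusoidal positional encoding (fixed, not learned)
        pe = torch.zeros(max_seq_len, d_model)
        pos = torch.arange(max_seq_len).unsqueeze(1).float()
        div = torch.exp(torch.arange(0, d_model, 2).float() * (-math.log(10000.0) / d_model))
        pe[:, 0::2] = torch.sin(pos * div)
        pe[:, 1::2] = torch.cos(pos * div)
        self.register_buffer("pe", pe)

        # Decoder-only blocks: self-attention layers driven by a causal mask
        # (nn.TransformerEncoderLayer, because there is no cross-attention)
        layer = nn.TransformerEncoderLayer(d_model, nhead, dim_feedforward,
                                           dropout=dropout, batch_first=True)
        self.blocks = nn.TransformerEncoder(layer, num_layers)
        self.lm_head = nn.Linear(d_model, vocab_size)

    def forward(self, input_ids):
        seq_len = input_ids.size(1)
        x = self.embed(input_ids) + self.pe[:seq_len]
        # Causal mask prevents attention to future tokens
        mask = nn.Transformer.generate_square_subsequent_mask(seq_len)
        h = self.blocks(x, mask=mask)
        return self.lm_head(h)  # logits: (batch, seq_len, vocab_size)
```

The forward pass returns raw logits over the vocabulary at every position, which is what both the training loss and greedy decoding below consume.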

---

## 📚 Dataset

- **Name**: WikiText-2
- **Size**: ~2 million tokens
- **Language**: English
- **Task**: Next-token prediction (causal language modeling)

**Dataset Link**: [WikiText-2 on Hugging Face Datasets](https://huggingface.co/datasets/wikitext)

---

## ๐Ÿ‹๏ธโ€โ™‚๏ธ Training Details

- **Optimizer**: Adam
- **Learning rate**: 5e-4
- **Batch size**: 4
- **Training epochs**: 5
- **Loss function**: CrossEntropyLoss
- **Logging**: Weights & Biases (wandb)

✅ Training loss decreased steadily, indicating the model learned basic structure of English text.
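
Training followed standard causal language modeling: the targets are the inputs shifted by one token, and CrossEntropyLoss is computed over the flattened logits. A minimal sketch of one training step under that setup (`train_step`, `model`, and `batch` are illustrative placeholders, not the project's exact code):

```python
import torch
import torch.nn as nn

def train_step(model, batch, optimizer, loss_fn):
    # batch: (batch_size, seq_len) token ids. Next-token prediction means
    # the target at position t is the input token at position t + 1.
    inputs, targets = batch[:, :-1], batch[:, 1:]
    logits = model(inputs)  # (batch, seq_len - 1, vocab_size)
    # CrossEntropyLoss expects (N, vocab_size) logits vs. (N,) target ids
    loss = loss_fn(logits.reshape(-1, logits.size(-1)), targets.reshape(-1))
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```

With the hyperparameters above this would be driven by `torch.optim.Adam(model.parameters(), lr=5e-4)` over 5 epochs at batch size 4.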

---

## 🚀 How to Use

> Note: Since this is a custom PyTorch model (not a Hugging Face PreTrainedModel), you must manually define and load it.

```python
import torch
from transformers import AutoTokenizer

from your_custom_model_code import SimpleTransformerDecoderModel  # import your model class

# Load tokenizer
tokenizer = AutoTokenizer.from_pretrained("mreeza/simple-transformer-model")

# Initialize the model
model = SimpleTransformerDecoderModel(
    vocab_size=len(tokenizer),
    d_model=128,
    nhead=4,
    num_layers=2,
    max_seq_len=256,
)

# Load trained weights
model.load_state_dict(torch.load("pytorch_model.bin", map_location="cpu"))
model.eval()

# Generate text with greedy decoding
def generate_text(model, tokenizer, prompt="Once upon a time", max_length=50):
    model.eval()
    input_ids = tokenizer(prompt, return_tensors="pt").input_ids

    with torch.no_grad():
        for _ in range(max_length):
            outputs = model(input_ids)             # (1, seq_len, vocab_size)
            next_token_logits = outputs[:, -1, :]  # logits for the last position
            next_token_id = torch.argmax(next_token_logits, dim=-1, keepdim=True)
            input_ids = torch.cat([input_ids, next_token_id], dim=-1)

    return tokenizer.decode(input_ids[0], skip_special_tokens=True)

prompt = "Once upon a time"
generated = generate_text(model, tokenizer, prompt)
print(generated)
```