---
tags:
- generated_from_trainer
model-index:
- name: lm-gpt2-timemachine
  results: []
license: apache-2.0
datasets:
- SOULAMA/timemachine-dataset-preprocessed
language:
- en
metrics:
- perplexity
base_model:
- openai-community/gpt2
pipeline_tag: text-generation
---


# lm-gpt2-timemachine

This model is a **causal language model** trained from scratch on the novel  
**_The Time Machine_ by H. G. Wells**.

The objective of this project is educational: to understand the full pipeline of
**training a GPT-style language model from raw text**, including preprocessing,
tokenization, training, and text generation.

---

## Model description

- **Architecture**: GPT-2–style causal language model  
- **Training type**: From scratch (no pretrained weights)  
- **Language**: English  
- **Tokenizer**: Byte-level BPE  
- **Context length**: 128 tokens  
- **Task**: Causal language modeling (next-token prediction)

The model learns to predict the next token given a sequence of previous tokens,
and can be used to generate text in the style of *The Time Machine*.
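The next-token objective can be sketched in a few lines of plain Python: the labels are simply the input tokens shifted left by one position, so each position is trained to predict the token that follows it. The token ids below are made up for illustration.

```python
def shift_for_causal_lm(token_ids):
    """Return (inputs, labels) for next-token prediction:
    labels are the inputs shifted left by one position."""
    inputs = token_ids[:-1]   # all tokens except the last
    labels = token_ids[1:]    # all tokens except the first
    return inputs, labels

# Illustrative (fake) token ids
inputs, labels = shift_for_causal_lm([464, 3862, 33860, 263, 373])
print(inputs)  # [464, 3862, 33860, 263]
print(labels)  # [3862, 33860, 263, 373]
```

In practice, Hugging Face causal LM heads such as `GPT2LMHeadModel` perform this shift internally, so `labels` can be passed as a copy of `input_ids`.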

---

## Intended uses & limitations

### Intended uses
- Educational purposes
- Learning how causal language models work
- Small-scale text generation experiments
- Understanding Hugging Face training workflows

### Limitations
- Trained on a **very small corpus** (single novel)
- Not suitable for general-purpose text generation
- May produce repetitive or incoherent text
- Not optimized for factual correctness

---

## Training and evaluation data

- **Dataset**: *The Time Machine* by H. G. Wells  
- **Source**: Public domain text  
- **Preprocessing**:
  - Text normalization
  - Tokenization with a byte-level tokenizer
  - Concatenation of text and splitting into fixed-length blocks (128 tokens)

The same dataset was split into training, validation, and test subsets.
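The block-packing step above can be sketched as follows: concatenate all token ids, then slice them into fixed-length blocks of 128, dropping any trailing remainder (the function name is illustrative, not from the training script).

```python
BLOCK_SIZE = 128

def group_into_blocks(token_ids, block_size=BLOCK_SIZE):
    """Split a flat list of token ids into fixed-length blocks,
    dropping the incomplete remainder at the end."""
    total = (len(token_ids) // block_size) * block_size
    return [token_ids[i:i + block_size] for i in range(0, total, block_size)]

blocks = group_into_blocks(list(range(300)))
print(len(blocks))     # 2 blocks of 128; the trailing 44 tokens are dropped
print(len(blocks[0]))  # 128
```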

---

## Training procedure

- **Objective**: Causal Language Modeling  
- **Loss function**: Cross-entropy loss  
- **Optimizer**: AdamW  
- **Batching**: Fixed-length token blocks  
- **Evaluation**: Perplexity on validation data  

The model was trained using the Hugging Face `Trainer` API.

---

## Example generation

```text
The Time Traveller (for so it will be convenient to speak of him) was a curious man,
of no less intellectual character than...
```

### Training hyperparameters

The following hyperparameters were used during training:
- learning_rate: 5e-05
- train_batch_size: 8
- eval_batch_size: 8
- seed: 42
- optimizer: AdamW (torch) with betas=(0.9, 0.999) and epsilon=1e-08; no additional optimizer arguments
- lr_scheduler_type: linear
- num_epochs: 3
- mixed_precision_training: Native AMP
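
The hyperparameters above correspond to a `TrainingArguments` configuration along these lines (a sketch, not the exact training script; `output_dir` is a placeholder):

```python
from transformers import TrainingArguments

args = TrainingArguments(
    output_dir="lm-gpt2-timemachine",
    learning_rate=5e-5,
    per_device_train_batch_size=8,
    per_device_eval_batch_size=8,
    seed=42,
    lr_scheduler_type="linear",
    num_train_epochs=3,
    fp16=True,  # Native AMP mixed precision
)
```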

### Training results

- eval_loss: 2.851572036743164
- eval_perplexity: 17.314979553222656
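
As a sanity check, the two metrics are consistent with each other: perplexity is the exponential of the mean cross-entropy loss.

```python
import math

# Perplexity = exp(cross-entropy loss), so the reported numbers should agree.
eval_loss = 2.851572036743164
perplexity = math.exp(eval_loss)
print(round(perplexity, 4))  # ≈ 17.315, matching eval_perplexity above
```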

### Framework versions

- Transformers 4.57.6
- Pytorch 2.6.0+cu126
- Datasets 3.6.0
- Tokenizers 0.22.1