---
tags:
- generated_from_trainer
model-index:
- name: lm-gpt2-timemachine
  results: []
license: apache-2.0
datasets:
- SOULAMA/timemachine-dataset-preprocessed
language:
- en
metrics:
- perplexity
base_model:
- openai-community/gpt2
pipeline_tag: text-generation
---

# lm-gpt2-timemachine

This model is a **causal language model** trained from scratch on the novel **_The Time Machine_ by H. G. Wells**. The objective of this project is educational: to understand the full pipeline of **training a GPT-style language model from raw text**, including preprocessing, tokenization, training, and text generation.

---

## Model description

- **Architecture**: GPT-2–style causal language model
- **Training type**: From scratch (no pretrained weights)
- **Language**: English
- **Tokenizer**: Byte-level BPE
- **Context length**: 128 tokens
- **Task**: Causal language modeling (next-token prediction)

The model learns to predict the next token given a sequence of previous tokens, and can be used to generate text in the style of *The Time Machine*.

---

## Intended uses & limitations

### Intended uses

- Educational purposes
- Learning how causal language models work
- Small-scale text generation experiments
- Understanding Hugging Face training workflows

### Limitations

- Trained on a **very small corpus** (a single novel)
- Not suitable for general-purpose text generation
- May produce repetitive or incoherent text
- Not optimized for factual correctness

---

## Training and evaluation data

- **Dataset**: *The Time Machine* by H. G. Wells
- **Source**: Public domain text
- **Preprocessing**:
  - Text normalization
  - Tokenization with a byte-level tokenizer
  - Concatenation of the text and splitting into fixed-length blocks (128 tokens)

The dataset was split into training, validation, and test subsets.
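The concatenate-and-chunk step can be sketched in plain Python. This is a minimal illustration of the idea, not the project's actual preprocessing code; the function name and the remainder-dropping behavior are assumptions:

```python
def group_into_blocks(token_ids, block_size=128):
    """Split a concatenated list of token ids into fixed-length blocks,
    dropping any trailing tokens that do not fill a complete block."""
    total = (len(token_ids) // block_size) * block_size
    return [token_ids[i:i + block_size] for i in range(0, total, block_size)]

# Toy example: 300 "token ids" yield two full 128-token blocks;
# the trailing 44 tokens are discarded.
blocks = group_into_blocks(list(range(300)), block_size=128)
```

Packing every block to exactly the context length keeps training batches uniform, at the cost of discarding a small remainder of the corpus.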
---

## Training procedure

- **Objective**: Causal language modeling
- **Loss function**: Cross-entropy loss
- **Optimizer**: AdamW
- **Batching**: Fixed-length token blocks
- **Evaluation**: Perplexity on validation data

The model was trained using the Hugging Face `Trainer` API.

---

## Example generation

```text
The Time Traveller (for so it will be convenient to speak of him) was a curious man, of no less intellectual character than...
```

### Training hyperparameters

The following hyperparameters were used during training:

- learning_rate: 5e-05
- train_batch_size: 8
- eval_batch_size: 8
- seed: 42
- optimizer: AdamW (torch) with betas=(0.9, 0.999) and epsilon=1e-08; no additional optimizer arguments
- lr_scheduler_type: linear
- num_epochs: 3
- mixed_precision_training: Native AMP

### Training results

- eval_loss: 2.851572036743164
- eval_perplexity: 17.314979553222656

### Framework versions

- Transformers 4.57.6
- PyTorch 2.6.0+cu126
- Datasets 3.6.0
- Tokenizers 0.22.1
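As a sanity check, the reported perplexity is simply the exponential of the mean cross-entropy loss, so the two evaluation numbers above can be verified against each other:

```python
import math

# eval_loss is the mean cross-entropy (in nats) over the validation set;
# perplexity is its exponential.
eval_loss = 2.851572036743164
perplexity = math.exp(eval_loss)  # ~17.315, matching eval_perplexity above
```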