SOULAMA/timemachine-dataset-preprocessed
This model is a causal language model trained from scratch on the novel The Time Machine by H. G. Wells.
The objective of this project is educational: to understand the full pipeline of training a GPT-style language model from raw text, including preprocessing, tokenization, training, and text generation.
The model learns to predict the next token given a sequence of previous tokens, and can be used to generate text in the style of The Time Machine.
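As a rough illustration of the next-token objective described above, the sketch below builds (context, target) pairs from a short sequence. The whitespace-split words stand in for real tokenizer output and the helper name `next_token_pairs` is hypothetical; neither is taken from the project's code.

```python
def next_token_pairs(tokens):
    """Return (context, target) pairs: each prefix is used to predict the token after it."""
    return [(tokens[:i], tokens[i]) for i in range(1, len(tokens))]


# Toy example: whitespace-split words stand in for real subword tokens (assumption).
tokens = "The Time Traveller was".split()
pairs = next_token_pairs(tokens)

for context, target in pairs:
    print(context, "->", target)
```

In practice a causal language model scores all of these positions in a single forward pass by shifting the labels one step relative to the inputs, rather than materializing each prefix separately.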
The dataset was split into training, validation, and test subsets.
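The split itself might look something like the following sketch. The 80/10/10 proportions, the shuffling, and the helper name `split_dataset` are assumptions made for illustration; the card does not state how the subsets were actually produced.

```python
import random


def split_dataset(examples, val_frac=0.1, test_frac=0.1, seed=42):
    """Shuffle examples and cut them into train/validation/test lists.

    The fractions and the shuffle are illustrative assumptions, not the
    project's documented procedure.
    """
    items = list(examples)
    random.Random(seed).shuffle(items)
    n_val = int(len(items) * val_frac)
    n_test = int(len(items) * test_frac)
    n_train = len(items) - n_val - n_test
    train = items[:n_train]
    val = items[n_train:n_train + n_val]
    test = items[n_train + n_val:]
    return train, val, test


# Hypothetical text chunks standing in for the preprocessed examples.
chunks = [f"chunk-{i}" for i in range(100)]
train, val, test = split_dataset(chunks)
print(len(train), len(val), len(test))
```

For a single continuous novel, a contiguous (unshuffled) split is also a reasonable choice, since shuffled chunks from the same passage can end up in both the training and evaluation subsets.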
The model was trained using the Hugging Face Trainer API.
The Time Traveller (for so it will be convenient to speak of him) was a curious man,
of no less intellectual character than...
### Training hyperparameters
The following hyperparameters were used during training:
- learning_rate: 5e-05
- train_batch_size: 8
- eval_batch_size: 8
- seed: 42
- optimizer: AdamW (OptimizerNames.ADAMW_TORCH) with betas=(0.9,0.999) and epsilon=1e-08; no additional optimizer arguments
- lr_scheduler_type: linear
- num_epochs: 3
- mixed_precision_training: Native AMP
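The hyperparameters above could be expressed as keyword arguments for `transformers.TrainingArguments`, roughly as sketched below. The `output_dir` value is hypothetical, the per-device batch sizes assume a single device, `fp16=True` is assumed to be what "Native AMP" refers to, and the warmup length (0) and total step count are assumptions used only to illustrate the linear schedule.

```python
# Keyword arguments mirroring the listed hyperparameters. With transformers
# installed, they could be passed as TrainingArguments(**training_kwargs).
training_kwargs = {
    "output_dir": "timemachine-gpt",   # hypothetical path, not from the card
    "learning_rate": 5e-5,
    "per_device_train_batch_size": 8,  # assumes a single device
    "per_device_eval_batch_size": 8,
    "seed": 42,
    "optim": "adamw_torch",
    "adam_beta1": 0.9,
    "adam_beta2": 0.999,
    "adam_epsilon": 1e-8,
    "lr_scheduler_type": "linear",
    "num_train_epochs": 3,
    "fp16": True,                      # assumed meaning of "Native AMP"
}


def linear_lr(step, total_steps, base_lr=5e-5, warmup_steps=0):
    """Learning rate under a linear schedule: optional warmup, then linear decay to 0."""
    if step < warmup_steps:
        return base_lr * step / max(1, warmup_steps)
    remaining = max(0, total_steps - step)
    return base_lr * remaining / max(1, total_steps - warmup_steps)


# Hypothetical run length of 300 optimizer steps.
for step in (0, 150, 300):
    print(step, linear_lr(step, total_steps=300))
```

With no warmup, the learning rate starts at 5e-05 and falls linearly to zero by the final step.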
### Training results
- eval_loss: 2.851572036743164
- eval_perplexity: 17.314979553222656
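The two metrics are related: perplexity is the exponential of the mean cross-entropy loss (in nats). A quick check, using the reported loss, is sketched here; the variable names are illustrative.

```python
import math

eval_loss = 2.851572036743164        # reported evaluation loss
perplexity = math.exp(eval_loss)     # perplexity = exp(mean cross-entropy)

print(f"perplexity = {perplexity:.4f}")
```

The result agrees with the reported eval_perplexity up to floating-point rounding.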
### Framework versions
- Transformers 4.57.6
- Pytorch 2.6.0+cu126
- Datasets 3.6.0
- Tokenizers 0.22.1
Base model
openai-community/gpt2