license: mit
Hello, this is TimeCapsule V0.5
This is a language model trained entirely on texts from 1800-1875 London. The goal is to eliminate modern bias. Right now the model is trained on roughly 435MB of text, but I hope to continue expanding the dataset.
How to run this model?
First, download the hugface folder. Inside you'll find everything you need.
Disclaimer: Most of the Python files in this folder are from nanoGPT by Andrej Karpathy; some of them are slightly modified. If you plan on training your own model, please visit: https://github.com/karpathy/nanoGPT
Step 1: Download the repository
Download the repo or clone it, whichever you prefer. You should end up with these files:
sample.py
ckpt.pt
config.json
meta.pkl
tokenizer_london/vocab.json
tokenizer_london/merges.txt
model.py
configurator.py
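If you want to confirm the download is complete before running anything, a small check like the one below may help. It is only a sketch: the file names are copied from the list above, while the helper name missing_files is made up for this example and is not part of the repo.

```python
from pathlib import Path

# File names copied from the list above.
REQUIRED_FILES = [
    "sample.py",
    "ckpt.pt",
    "config.json",
    "meta.pkl",
    "tokenizer_london/vocab.json",
    "tokenizer_london/merges.txt",
    "model.py",
    "configurator.py",
]


def missing_files(folder="."):
    """Return the required files that are not present under `folder`.

    Hypothetical helper for illustration; it is not shipped with the repo.
    """
    root = Path(folder)
    return [name for name in REQUIRED_FILES if not (root / name).is_file()]


if __name__ == "__main__":
    missing = missing_files(".")
    if missing:
        print("Missing files:", ", ".join(missing))
    else:
        print("All required files are present.")
```

Run it from inside the downloaded folder. Any file it reports as missing should be downloaded again before moving on to Step 2.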
Step 2: Make sure you have requirements installed
Install the required packages: pip install tokenizers torch
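To check whether both packages are visible to your Python interpreter, something like the following should work. It is a minimal sketch: missing_packages is a hypothetical helper name, and it only tests that the packages can be found, not that their versions are compatible with the checkpoint.

```python
import importlib.util


def missing_packages(names=("tokenizers", "torch")):
    """Return the package names that cannot be found by this interpreter.

    Hypothetical helper for illustration; it is not shipped with the repo.
    """
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    missing = missing_packages()
    if missing:
        print("Not installed:", ", ".join(missing))
    else:
        print("tokenizers and torch are both installed.")
```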
Step 3: Run the model
python3 sample.py --out_dir=. --start="Put prompt here!"
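If you would rather launch the model from another Python script, the same command can be wrapped with subprocess. This is a rough sketch, assuming you run it from inside the downloaded folder; build_command and run_sample are names invented for this example.

```python
import subprocess
import sys


def build_command(prompt, out_dir="."):
    """Build the argument list for the command shown above.

    sys.executable is used in place of "python3" so the same interpreter
    (and its installed packages) runs sample.py.
    """
    return [sys.executable, "sample.py", f"--out_dir={out_dir}", f"--start={prompt}"]


def run_sample(prompt, out_dir="."):
    """Run sample.py with the given prompt and raise if it exits with an error."""
    subprocess.run(build_command(prompt, out_dir), check=True)
```

Calling run_sample("Put prompt here!") then behaves like typing the command in a terminal. Passing the arguments as a list avoids shell quoting problems when the prompt contains spaces.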
Optional control settings
To control generation settings, open sample.py and change the command-line arguments defined there.
For more info on this project, see my GitHub: https://github.com/haykgrigo3/TimeCapsuleLLM/blob/main/README.md
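Settings can probably also be passed on the command line in the same --key=value form that Step 3 uses for out_dir and start. The sketch below assumes this repo's sample.py still has the temperature, top_k and max_new_tokens settings from upstream nanoGPT. Since some files here are modified, check sample.py before relying on these names. override_flags is a hypothetical helper.

```python
import shlex


def override_flags(**settings):
    """Turn keyword settings into --key=value arguments."""
    return [f"--{key}={value}" for key, value in settings.items()]


# ASSUMPTION: these setting names come from upstream nanoGPT's sample.py
# and may not exist in this repo's modified copy.
flags = override_flags(temperature=0.8, top_k=200, max_new_tokens=500)

command = ["python3", "sample.py", "--out_dir=.", "--start=Put prompt here!"] + flags

# Print a shell-quoted version that can be pasted into a terminal.
print(shlex.join(command))
```

In upstream nanoGPT, a lower temperature makes the output more predictable, top_k limits sampling to the k most likely tokens, and max_new_tokens caps the length of each sample.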