# Model Card for gpjt/1xrtx3090m24-fineweb
This model is gpjt/1xrtx3090m24-fineweb, a base model trained from scratch using the GPT-2-style architecture from Sebastian Raschka's book "Build a Large Language Model (from Scratch)".
## Model Details

### Model Description
- Developed by: Giles Thomas, based on code by Sebastian Raschka
- Model type: GPT-2 style transformers-based causal LLM.
- License: Apache 2
- Parameters: 163,009,536
- Context length: 1,024
- Embedding dimensions: 768
- MHA heads: 12
- Layers: 12
- QKV bias: False
- Weight tying: No
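Put together, those hyperparameters correspond to a config along the lines of the sketch below. This is an illustration in the config-dict style used in Raschka's book, not a copy of this repo's `config.json`; the vocabulary size is assumed to be the standard GPT-2 BPE vocabulary, which is what the 163M parameter count implies given the lack of weight tying.

```python
# Illustrative sketch only; the key names follow "Build a Large Language Model
# (from Scratch)" and may not match the actual config.json in this repo.
GPT_CONFIG_163M = {
    "vocab_size": 50257,      # standard GPT-2 BPE vocabulary (assumed)
    "context_length": 1024,   # context length
    "emb_dim": 768,           # embedding dimensions
    "n_heads": 12,            # MHA heads
    "n_layers": 12,           # transformer layers
    "drop_rate": 0.1,         # dropout (see Training Details below)
    "qkv_bias": False,        # no bias on the QKV projections
}
```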
Don't have high expectations for the model! It has only 163M parameters (the GPT-2 "small" size) and was trained on roughly the Chinchilla-optimal number of tokens (~20x the number of parameters), which means that it doesn't know many facts and is not terribly smart. If you want to do serious work, use a serious model (I like Qwen's). But if you want to build on this and see what you can do with a 2020-vintage LLM, please do feel free to play with it!
### Model Sources
- Repository: gpjt/ddp-base-model-from-scratch
- Blog post: Writing an LLM from scratch, part 28 -- training a base model from scratch on an RTX 3090 (this is the model from the first training run, described in the section "Finally training an LLM!")
## How to Get Started with the Model
You can download and run the model for inference directly:
```python
from transformers import pipeline

pipe = pipeline("text-generation", model="gpjt/1xrtx3090m24-fineweb", trust_remote_code=True)
out = pipe(
    "Every effort moves you",
    max_new_tokens=20,
    do_sample=True,
    temperature=1.4,
    top_k=25,
)
print(out[0]["generated_text"])
```
Note that because it uses custom code, you'll need to set `trust_remote_code=True`.

It supports `AutoTokenizer`, `AutoModel` and `AutoModelForCausalLM`:
```python
>>> from transformers import AutoTokenizer, AutoModel, AutoModelForCausalLM
>>> tokenizer = AutoTokenizer.from_pretrained("gpjt/1xrtx3090m24-fineweb")
>>> model = AutoModel.from_pretrained("gpjt/1xrtx3090m24-fineweb", trust_remote_code=True)
>>> llm_model = AutoModelForCausalLM.from_pretrained("gpjt/1xrtx3090m24-fineweb", trust_remote_code=True)
```
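For more control than the pipeline gives you, you can then call `generate` on the causal-LM model directly. This is a minimal sketch, assuming the custom model class supports the standard `generate` API (which the text-generation pipeline above relies on):

```python
>>> # Sketch only: same sampling settings as the pipeline example above.
>>> inputs = tokenizer("Every effort moves you", return_tensors="pt")
>>> output_ids = llm_model.generate(**inputs, max_new_tokens=20, do_sample=True, temperature=1.4, top_k=25)
>>> print(tokenizer.decode(output_ids[0], skip_special_tokens=True))
```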
You can also fine-tune it; this notebook has an example.
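If you just want a starting point without the notebook, a bare-bones fine-tuning loop with the `Trainer` API might look something like the sketch below. The dataset file and hyperparameters are placeholders, and it assumes the custom model class returns a loss when given `labels`, like a standard transformers causal LM:

```python
# Hedged sketch, not the linked notebook: swap in your own data and hyperparameters.
from datasets import load_dataset
from transformers import (AutoModelForCausalLM, AutoTokenizer,
                          DataCollatorForLanguageModeling, Trainer, TrainingArguments)

model_id = "gpjt/1xrtx3090m24-fineweb"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, trust_remote_code=True)

# GPT-2-style tokenizers usually have no pad token; reuse EOS for padding if so.
if tokenizer.pad_token is None:
    tokenizer.pad_token = tokenizer.eos_token

# "my_corpus.txt" is a placeholder for whatever text you want to fine-tune on.
dataset = load_dataset("text", data_files={"train": "my_corpus.txt"})

def tokenize(batch):
    return tokenizer(batch["text"], truncation=True, max_length=1024)

tokenized = dataset.map(tokenize, batched=True, remove_columns=["text"])

trainer = Trainer(
    model=model,
    args=TrainingArguments(output_dir="finetuned", per_device_train_batch_size=4, num_train_epochs=1),
    train_dataset=tokenized["train"],
    data_collator=DataCollatorForLanguageModeling(tokenizer=tokenizer, mlm=False),
)
trainer.train()
```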
Again, don't expect too much from this model! It's a 163M-parameter GPT-2 one, trained on a limited number of tokens. It's both dumb and ignorant ;-)
## Training Details
- Machine type: My home PC, which has an RTX 3090 (24 GiB VRAM)
- Tokens: 3,260,190,720 (the Chinchilla-optimal 20x the parameter count), rounded up to the nearest batch
- Dataset: HuggingFaceFW/fineweb
- Micro-batch size: 6
- Global batch size: 6
- Dropout: 0.1
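As a quick back-of-envelope check (not taken from the training logs), those numbers imply roughly 530k optimizer steps:

```python
params = 163_009_536
total_tokens = 20 * params               # Chinchilla-optimal ~20 tokens per parameter
tokens_per_step = 6 * 1024               # global batch size x context length
print(total_tokens)                      # 3260190720
print(total_tokens // tokens_per_step)   # 530625 optimizer steps
```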