# Model Card for gpjt/1xrtx3090m24-fineweb
This model is gpjt/1xrtx3090m24-fineweb, a base model trained from scratch using the GPT-2-style architecture from Sebastian Raschka's book "Build a Large Language Model (from Scratch)".
## Model Details

### Model Description
- Developed by: Giles Thomas, based on code by Sebastian Raschka
- Model type: GPT-2 style transformers-based causal LLM.
- License: Apache 2
- Parameters: 163,009,536
- Context length: 1,024
- Embedding dimensions: 768
- MHA heads: 12
- Layers: 12
- QKV bias: False
- Weight tying: No
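Put together, those hyperparameters correspond to a config along the lines of the sketch below. This is an illustration in the config-dict style used in Raschka's book, not a copy of this repo's `config.json`; the vocabulary size is assumed to be the standard GPT-2 BPE vocabulary, which is what the 163M parameter count implies given the lack of weight tying.

```python
# Illustrative sketch only; the key names follow "Build a Large Language Model
# (from Scratch)" and may not match the actual config.json in this repo.
GPT_CONFIG_163M = {
    "vocab_size": 50257,      # standard GPT-2 BPE vocabulary (assumed)
    "context_length": 1024,   # context length
    "emb_dim": 768,           # embedding dimensions
    "n_heads": 12,            # MHA heads
    "n_layers": 12,           # transformer layers
    "drop_rate": 0.1,         # dropout (see Training Details below)
    "qkv_bias": False,        # no bias on the QKV projections
}
```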
Don't have high expectations for the model! It has only 163M parameters (the GPT-2 "small" size) and was trained on roughly the Chinchilla-optimal number of tokens (~20x the number of parameters), which means that it doesn't know many facts and is not terribly smart. If you want to do serious work, use a serious model (I like Qwen's). But if you want to build on this and see what you can do with a 2020-vintage LLM, please do feel free to play with it!
### Model Sources
- Repository: gpjt/ddp-base-model-from-scratch
- Blog post: Writing an LLM from scratch, part 28 -- training a base model from scratch on an RTX 3090 (this is the model from the first training run, described in the section "Finally training an LLM!")
## How to Get Started with the Model
You can download and run the model for inference directly:
```python
from transformers import pipeline

pipe = pipeline("text-generation", model="gpjt/1xrtx3090m24-fineweb", trust_remote_code=True)
out = pipe(
    "Every effort moves you",
    max_new_tokens=20,
    do_sample=True,
    temperature=1.4,
    top_k=25,
)
print(out[0]["generated_text"])
```
Note that because it uses custom code, you'll need to set `trust_remote_code=True`.

It supports `AutoTokenizer`, `AutoModel` and `AutoModelForCausalLM`:
```python
>>> from transformers import AutoTokenizer, AutoModel, AutoModelForCausalLM
>>> tokenizer = AutoTokenizer.from_pretrained("gpjt/1xrtx3090m24-fineweb")
>>> model = AutoModel.from_pretrained("gpjt/1xrtx3090m24-fineweb", trust_remote_code=True)
>>> llm_model = AutoModelForCausalLM.from_pretrained("gpjt/1xrtx3090m24-fineweb", trust_remote_code=True)
```
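For more control than the pipeline gives you, you can then call `generate` on the causal-LM model directly. This is a minimal sketch, assuming the custom model class supports the standard `generate` API (which the text-generation pipeline above relies on):

```python
>>> # Sketch only: same sampling settings as the pipeline example above.
>>> inputs = tokenizer("Every effort moves you", return_tensors="pt")
>>> output_ids = llm_model.generate(**inputs, max_new_tokens=20, do_sample=True, temperature=1.4, top_k=25)
>>> print(tokenizer.decode(output_ids[0], skip_special_tokens=True))
```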
You can also fine-tune it; this notebook has an example.
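If you just want a starting point without the notebook, a bare-bones fine-tuning loop with the `Trainer` API might look something like the sketch below. The dataset file and hyperparameters are placeholders, and it assumes the custom model class returns a loss when given `labels`, like a standard transformers causal LM:

```python
# Hedged sketch, not the linked notebook: swap in your own data and hyperparameters.
from datasets import load_dataset
from transformers import (AutoModelForCausalLM, AutoTokenizer,
                          DataCollatorForLanguageModeling, Trainer, TrainingArguments)

model_id = "gpjt/1xrtx3090m24-fineweb"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, trust_remote_code=True)

# GPT-2-style tokenizers usually have no pad token; reuse EOS for padding if so.
if tokenizer.pad_token is None:
    tokenizer.pad_token = tokenizer.eos_token

# "my_corpus.txt" is a placeholder for whatever text you want to fine-tune on.
dataset = load_dataset("text", data_files={"train": "my_corpus.txt"})

def tokenize(batch):
    return tokenizer(batch["text"], truncation=True, max_length=1024)

tokenized = dataset.map(tokenize, batched=True, remove_columns=["text"])

trainer = Trainer(
    model=model,
    args=TrainingArguments(output_dir="finetuned", per_device_train_batch_size=4, num_train_epochs=1),
    train_dataset=tokenized["train"],
    data_collator=DataCollatorForLanguageModeling(tokenizer=tokenizer, mlm=False),
)
trainer.train()
```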
Again, don't expect too much from this model! It's a 163M-parameter GPT-2 one, trained on a limited number of tokens. It's both dumb and ignorant ;-)
## Training Details
- Machine type: My home PC, which has an RTX 3090 (24 GiB VRAM)
- Tokens: 3,260,190,720 (the Chinchilla-optimal 20x the parameter count), rounded up to the nearest batch
- Dataset: HuggingFaceFW/fineweb
- Micro-batch size: 6
- Global batch size: 6
- Dropout: 0.1
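As a quick back-of-envelope check (not taken from the training logs), those numbers imply roughly 530k optimizer steps:

```python
params = 163_009_536
total_tokens = 20 * params               # Chinchilla-optimal ~20 tokens per parameter
tokens_per_step = 6 * 1024               # global batch size x context length
print(total_tokens)                      # 3260190720
print(total_tokens // tokens_per_step)   # 530625 optimizer steps
```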