Model Card for gpjt/1xrtx3090-stacked-interventions

This model is gpjt/1xrtx3090-stacked-interventions, a trained-from-scratch base model using the GPT-2-style architecture from Sebastian Raschka's book "Build a Large Language Model (from Scratch)".

Model Details

Model Description

Developed by: Giles Thomas, based on code by Sebastian Raschka
Model type: GPT-2-style, transformer-based causal LLM
License: Apache 2.0
Parameters: 163,009,536
Context length: 1,024
Embedding dimensions: 768
MHA heads: 12
Layers: 12
QKV bias: False
Weight tying: False
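
As a sanity check, the 163,009,536 figure can be rebuilt from the hyperparameters above. This is a minimal sketch built on several assumptions that this card doesn't state:

- the GPT-2 BPE vocabulary of 50,257 tokens;
- a 4x feed-forward expansion;
- biases on every linear layer except the Q/K/V projections and the output head;
- LayerNorms with a scale and a shift.

`gpt2_style_param_count` is a hypothetical helper, not part of the model's code.

```python
# Hypothetical helper: count parameters of a GPT-2-style model.
# Assumptions (not stated in the card): 50,257-token GPT-2 vocab, 4x FFN
# expansion, biases everywhere except Q/K/V and the output head, and
# LayerNorms with both a scale and a shift vector.

def gpt2_style_param_count(
    vocab_size: int = 50_257,
    context_length: int = 1_024,
    emb_dim: int = 768,
    n_layers: int = 12,
    qkv_bias: bool = False,
    weight_tying: bool = False,
) -> int:
    token_emb = vocab_size * emb_dim
    pos_emb = context_length * emb_dim

    # Attention: Q, K, V projections plus the output projection (with bias).
    qkv = 3 * (emb_dim * emb_dim + (emb_dim if qkv_bias else 0))
    attn_out = emb_dim * emb_dim + emb_dim

    # Feed-forward: expand to 4 * emb_dim, then project back (both with bias).
    hidden = 4 * emb_dim
    ff = (emb_dim * hidden + hidden) + (hidden * emb_dim + emb_dim)

    # Two LayerNorms per block, each with a scale and a shift.
    norms = 2 * (2 * emb_dim)

    per_block = qkv + attn_out + ff + norms
    final_norm = 2 * emb_dim

    # Output head has no bias; with weight tying it reuses the token embedding.
    out_head = 0 if weight_tying else vocab_size * emb_dim

    return token_emb + pos_emb + n_layers * per_block + final_norm + out_head


print(f"{gpt2_style_param_count():,}")                   # 163,009,536
print(f"{gpt2_style_param_count(weight_tying=True):,}")  # 124,412,160
```

Under those assumptions, the untied output head accounts for about 38.6M parameters. That is the gap between this model and the 124M usually quoted for GPT-2 small.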

Don't have high expectations for the model! It has only 163M parameters (the GPT-2 "small" architecture; with the embedding and output weights tied, as in the original GPT-2, it would be 124M). It was trained on roughly the Chinchilla-optimal number of tokens (~20x the number of parameters), which means that it doesn't know many facts and is not terribly smart. If you want to do serious work, use a serious model (I like Qwen's). But if you want to build on this and see what you can do with a 2020-vintage LLM, please do feel free to play with it!

Model Sources

Repository: gpjt/ddp-base-model-from-scratch
Blog post: Writing an LLM from scratch, part 32k -- Interventions: training a better model locally with gradient accumulation
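
The blog post above is about training with gradient accumulation. This technique lets a single GPU reach a large global batch. It runs several smaller micro-batches and sums their gradients before each optimizer step. Here's a minimal pure-Python illustration on a one-parameter linear model. It is a toy, not the code used to train this model, and it assumes equal-sized micro-batches and a mean-reduced loss.

```python
# Toy illustration of gradient accumulation (not the author's training code):
# fit y = w * x with mean-squared-error loss, splitting each global batch
# into equal-sized micro-batches.

def grad_mse(w: float, xs: list[float], ys: list[float]) -> float:
    """Gradient of mean((w*x - y)^2) with respect to w over one batch."""
    return sum(2 * x * (w * x - y) for x, y in zip(xs, ys)) / len(xs)


def accumulated_grad(w: float, xs: list[float], ys: list[float],
                     micro_batch_size: int) -> float:
    """Sum per-micro-batch gradients, each scaled by 1/accumulation_steps."""
    assert len(xs) % micro_batch_size == 0, "use equal-sized micro-batches"
    steps = len(xs) // micro_batch_size
    total = 0.0
    for i in range(0, len(xs), micro_batch_size):
        mb_x = xs[i:i + micro_batch_size]
        mb_y = ys[i:i + micro_batch_size]
        # Dividing by `steps` makes the sum equal the mean over the global batch.
        total += grad_mse(w, mb_x, mb_y) / steps
    return total


xs = [float(i) for i in range(1, 13)]   # a global batch of 12 examples
ys = [3.0 * x for x in xs]              # true weight is 3.0
w = 0.0

full = grad_mse(w, xs, ys)
accum = accumulated_grad(w, xs, ys, micro_batch_size=3)
print(abs(full - accum) < 1e-9)         # True: same gradient, smaller passes

# Plain SGD using the accumulated gradient.
lr = 0.005
for _ in range(200):
    w -= lr * accumulated_grad(w, xs, ys, micro_batch_size=3)
print(round(w, 4))                      # converges towards 3.0
```

In PyTorch the same idea usually means calling `(loss / accumulation_steps).backward()` on each micro-batch. You then call `optimizer.step()` and `optimizer.zero_grad()` once per global batch.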

How to Get Started with the Model

You can download and run the model for inference directly:

```python
from transformers import pipeline

pipe = pipeline("text-generation", model="gpjt/1xrtx3090-stacked-interventions", trust_remote_code=True)
out = pipe(
    "Every effort moves you",
    max_new_tokens=20,
    do_sample=True,
    temperature=1.4,
    top_k=25,
)
print(out[0]["generated_text"])
```
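
The `temperature` and `top_k` arguments only take effect because `do_sample=True` is set. Here is a rough sketch of what they do at each generation step. It illustrates the general technique, not the transformers implementation:

1. Keep the `top_k` highest-scoring tokens.
2. Divide their logits by the temperature.
3. Sample from the softmax of what's left.

The function name and example logits are made up for illustration.

```python
# Sketch of temperature scaling plus top-k sampling over one step's logits.
# Illustrative only; this is not how transformers implements it internally.
import math
import random


def sample_top_k(logits: list[float], temperature: float, top_k: int,
                 rng: random.Random) -> int:
    """Return a token index sampled from the top_k highest logits."""
    # Keep only the top_k largest logits; the rest get zero probability.
    top = sorted(range(len(logits)), key=lambda i: logits[i], reverse=True)[:top_k]
    # Temperature > 1 flattens the distribution; < 1 sharpens it.
    scaled = [logits[i] / temperature for i in top]
    # Softmax weights over the survivors (subtract the max for stability).
    m = max(scaled)
    weights = [math.exp(s - m) for s in scaled]
    return rng.choices(top, weights=weights, k=1)[0]


rng = random.Random(0)
logits = [2.0, 1.0, 0.5, -1.0, 3.0]
samples = [sample_top_k(logits, temperature=1.4, top_k=2, rng=rng) for _ in range(1000)]
print(sorted(set(samples)))  # [0, 4]: only the two highest-logit tokens appear
```

Dividing by a positive temperature doesn't change the order of the logits. So it doesn't matter whether the top-k cut happens before or after scaling. A temperature of 1.4, as in the pipeline example, flattens the distribution, which makes the output more varied.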

Note that because it uses custom code, you'll need to set trust_remote_code to True.

It supports AutoTokenizer, AutoModel and AutoModelForCausalLM:

```python
from transformers import AutoTokenizer, AutoModel, AutoModelForCausalLM

tokenizer = AutoTokenizer.from_pretrained("gpjt/1xrtx3090-stacked-interventions")
model = AutoModel.from_pretrained("gpjt/1xrtx3090-stacked-interventions", trust_remote_code=True)
llm_model = AutoModelForCausalLM.from_pretrained("gpjt/1xrtx3090-stacked-interventions", trust_remote_code=True)
```
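
You may call the model classes directly instead of going through the pipeline. If so, keep the 1,024-token context length in mind. A GPT-2-style model has learned position embeddings for only that many positions, so the prompt plus any generated tokens has to fit in the window. Below is a minimal sketch of a hypothetical helper, not part of the model's custom code, that keeps only the most recent tokens. It assumes you already have a list of token IDs, for example from `tokenizer(text)["input_ids"]`.

```python
# Hypothetical helper (not part of the model's code): trim a prompt's token
# IDs so that prompt + generated tokens fit in the 1,024-token context.

CONTEXT_LENGTH = 1_024  # from the Model Details above


def fit_to_context(token_ids: list[int], max_new_tokens: int,
                   context_length: int = CONTEXT_LENGTH) -> list[int]:
    """Drop the oldest tokens so the prompt leaves room for max_new_tokens."""
    if not 0 <= max_new_tokens < context_length:
        raise ValueError("max_new_tokens must be in [0, context_length)")
    budget = context_length - max_new_tokens
    # Keep the most recent tokens, since they matter most for what comes next.
    return token_ids[-budget:] if len(token_ids) > budget else token_ids


prompt_ids = list(range(1_500))          # stand-in for real tokenizer output
trimmed = fit_to_context(prompt_ids, max_new_tokens=20)
print(len(trimmed), trimmed[0])          # 1004 496
```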

You can also fine-tune it; this notebook has an example.

Again, don't expect too much from this model! It's a 163M-parameter GPT-2-style model, trained on a limited number of tokens. It's both dumb and ignorant ;-)

Training Details

Machine type: Local machine with an RTX 3090
Tokens: 3,260,190,720 (the Chinchilla-optimal 20x parameters, rounded up to the nearest batch)
Dataset: gpjt/fineweb-gpt2-tokens
Micro-batch size: 6
Global batch size: 96 (using 12 gradient accumulation steps)
Dropout: 0.0
Gradient clipping: 3.5
Learning rate: 0.0014
Schedule learning rate: True
Weight decay: 0.01
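
The token and batch figures above fit together as follows. This is a back-of-the-envelope sketch that assumes every training sequence fills the full 1,024-token context. The constant names are illustrative.

```python
# Back-of-the-envelope training arithmetic from the figures listed above,
# assuming every sequence fills the full 1,024-token context.
import math

PARAMS = 163_009_536
CONTEXT_LENGTH = 1_024
MICRO_BATCH = 6      # sequences per forward/backward pass
GLOBAL_BATCH = 96    # sequences per optimizer step

chinchilla_tokens = 20 * PARAMS
tokens_per_micro_batch = MICRO_BATCH * CONTEXT_LENGTH
tokens_per_step = GLOBAL_BATCH * CONTEXT_LENGTH
accumulation_steps = GLOBAL_BATCH // MICRO_BATCH
optimizer_steps = math.ceil(chinchilla_tokens / tokens_per_step)

print(f"{chinchilla_tokens:,}")       # 3,260,190,720
print(tokens_per_micro_batch)         # 6144
print(tokens_per_step)                # 98304
print(accumulation_steps)             # 16
print(f"{optimizer_steps:,}")         # 33,165
```

The listed micro-batch (6) and global batch (96) give 96 / 6 = 16 micro-batches per optimizer step, rather than the 12 listed, so those two figures don't quite agree.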


Checkpoint: Safetensors, F32 tensors, 0.2B params
