Model Card for gpjt/1xrtx3090-stacked-interventions
This model is gpjt/1xrtx3090-stacked-interventions, a trained-from-scratch base model using the GPT-2-style architecture from Sebastian Raschka's book "Build a Large Language Model (from Scratch)".
Model Details
Model Description
- Developed by: Giles Thomas, based on code by Sebastian Raschka
- Model type: GPT-2-style, transformer-based causal LLM
- License: Apache 2.0
- Parameters: 163,009,536
- Context length: 1,024
- Embedding dimensions: 768
- MHA heads: 12
- Layers: 12
- QKV bias: False
- Weight tying: False
Don't have high expectations for the model! It has only 163M parameters (the GPT-2 "small" size) and was trained on roughly the Chinchilla-optimal number of tokens (~20x the number of parameters), which means that it doesn't know many facts and is not terribly smart. If you want to do serious work, use a serious model (I like Qwen's). But if you want to build on this and see what you can do with a 2020-vintage LLM, please do feel free to play with it!
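The parameter count in the list above can be reproduced from the other hyperparameters. The following is a minimal sketch, not the model's actual code. It assumes a vocabulary of 50,257 tokens (the standard GPT-2 tokenizer), a feed-forward hidden size of 4x the embedding dimension, biases on the attention output projection and the feed-forward layers, and layer norms with both scale and shift. None of those are stated in this card, and the function name is hypothetical.

```python
def gpt2_style_param_count(
    vocab_size: int,
    context_length: int,
    emb_dim: int,
    n_layers: int,
    qkv_bias: bool,
    weight_tying: bool,
) -> int:
    """Count trainable parameters of a GPT-2-style decoder (illustrative sketch)."""
    token_emb = vocab_size * emb_dim
    pos_emb = context_length * emb_dim

    # Attention: query, key and value projections (bias optional),
    # plus an output projection that is assumed to have a bias.
    qkv = 3 * (emb_dim * emb_dim + (emb_dim if qkv_bias else 0))
    attn_out = emb_dim * emb_dim + emb_dim

    # Feed-forward: assumed 4x expansion, both linear layers with bias.
    hidden = 4 * emb_dim
    ffn = (emb_dim * hidden + hidden) + (hidden * emb_dim + emb_dim)

    # Two layer norms per block, each assumed to have scale and shift vectors.
    norms = 2 * (2 * emb_dim)

    block = qkv + attn_out + ffn + norms
    final_norm = 2 * emb_dim
    # Without weight tying, the output head has its own (bias-free) matrix.
    out_head = 0 if weight_tying else emb_dim * vocab_size

    return token_emb + pos_emb + n_layers * block + final_norm + out_head


total = gpt2_style_param_count(
    vocab_size=50_257,  # assumption: GPT-2 tokenizer vocabulary
    context_length=1_024,
    emb_dim=768,
    n_layers=12,
    qkv_bias=False,
    weight_tying=False,
)
print(f"{total:,}")  # prints 163,009,536
```

Under the same assumptions, setting weight_tying=True gives 124,412,160, the familiar "124M" figure for GPT-2 small. The difference is the separate 768 x 50,257 output head. The number of attention heads does not change the count, which is why it is not a parameter of the function.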
Model Sources
- Repository: gpjt/ddp-base-model-from-scratch
- Blog post: Writing an LLM from scratch, part 32k -- Interventions: training a better model locally with gradient accumulation
How to Get Started with the Model
You can download and run the model for inference directly:
from transformers import pipeline
pipe = pipeline("text-generation", model="gpjt/1xrtx3090-stacked-interventions", trust_remote_code=True)
out = pipe(
    "Every effort moves you",
    max_new_tokens=20,
    do_sample=True,
    temperature=1.4,
    top_k=25,
)
print(out[0]["generated_text"])
Note that because it uses custom code, you'll need to set trust_remote_code to True.
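For readers unfamiliar with the temperature and top_k arguments in the call above, here is a rough pure-Python sketch of what top-k sampling with a temperature does to one step's logits. It is an illustration only, not the Transformers implementation; the function names and the toy logits are made up for the example.

```python
import math
import random


def top_k_temperature_probs(logits, top_k, temperature):
    """Return sampling probabilities after top-k filtering and temperature scaling."""
    # Keep only the indices of the top_k largest logits.
    keep = sorted(range(len(logits)), key=lambda i: logits[i], reverse=True)[:top_k]

    # Divide by the temperature: values above 1 flatten the distribution,
    # values below 1 sharpen it.
    scaled = {i: logits[i] / temperature for i in keep}

    # Softmax over the kept logits (subtract the max for numerical stability).
    peak = max(scaled.values())
    exps = {i: math.exp(v - peak) for i, v in scaled.items()}
    total = sum(exps.values())

    probs = [0.0] * len(logits)
    for i, e in exps.items():
        probs[i] = e / total
    return probs


def sample_token(logits, top_k, temperature, rng):
    """Draw one token id from the filtered, rescaled distribution."""
    probs = top_k_temperature_probs(logits, top_k, temperature)
    return rng.choices(range(len(logits)), weights=probs, k=1)[0]


# Toy five-token vocabulary; real logits would come from the model.
toy_logits = [2.0, 1.0, 0.5, -1.0, -3.0]
probs = top_k_temperature_probs(toy_logits, top_k=3, temperature=1.4)
token_id = sample_token(toy_logits, top_k=3, temperature=1.4, rng=random.Random(0))
print(probs, token_id)
```

With top_k=3, the two lowest-scoring tokens get zero probability and can never be sampled. A temperature of 1.4 spreads the remaining probability more evenly than the default of 1.0 would.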
It supports AutoTokenizer, AutoModel and AutoModelForCausalLM:
>>> from transformers import AutoTokenizer, AutoModel, AutoModelForCausalLM
>>> tokenizer = AutoTokenizer.from_pretrained("gpjt/1xrtx3090-stacked-interventions")
>>> model = AutoModel.from_pretrained("gpjt/1xrtx3090-stacked-interventions", trust_remote_code=True)
>>> llm_model = AutoModelForCausalLM.from_pretrained("gpjt/1xrtx3090-stacked-interventions", trust_remote_code=True)
You can also fine-tune it; this notebook has an example.
Again, don't expect too much from this model! It's a 163M-parameter GPT-2-style model, trained on a limited number of tokens. It's both dumb and ignorant ;-)
Training Details
- Machine type: Local machine with an RTX 3090
- Tokens: 3,260,190,720 (the Chinchilla-optimal 20x the parameter count), rounded up to the nearest whole batch
- Dataset: gpjt/fineweb-gpt2-tokens
- Micro-batch size: 6
- Global batch size: 96 (using 12 gradient accumulation steps)
- Dropout: 0.0
- Gradient clipping: 3.5
- Learning rate: 0.0014
- Schedule learning rate: True
- Weight decay: 0.01
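The token budget in the list above is simple arithmetic, sketched below. One assumption is not stated in this card: every training sequence is taken to fill the whole 1,024-token context, so one global batch holds 96 x 1,024 tokens. The rounded-up total and the step count are derived under that assumption rather than taken from the training logs.

```python
PARAMETERS = 163_009_536
TOKENS_PER_PARAMETER = 20  # the Chinchilla rule of thumb quoted above
GLOBAL_BATCH_SIZE = 96
CONTEXT_LENGTH = 1_024  # assumption: every sequence fills the context

# Chinchilla-optimal target before rounding.
target_tokens = PARAMETERS * TOKENS_PER_PARAMETER

# Tokens consumed by one optimizer step (one global batch).
tokens_per_batch = GLOBAL_BATCH_SIZE * CONTEXT_LENGTH

# Round up to a whole number of global batches using integer arithmetic.
optimizer_steps = (target_tokens + tokens_per_batch - 1) // tokens_per_batch
trained_tokens = optimizer_steps * tokens_per_batch

print(f"target tokens:    {target_tokens:,}")     # 3,260,190,720
print(f"tokens per batch: {tokens_per_batch:,}")  # 98,304
print(f"optimizer steps:  {optimizer_steps:,}")   # 33,165
print(f"trained tokens:   {trained_tokens:,}")    # 3,260,252,160
```

If the real data loader packs sequences differently, the step count and rounded total will differ, but the 20x target itself only depends on the parameter count.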