Libraries

How to use ShawnGiese/gpt2_124M_fineweb10 with llama-cpp-python:

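```python
# !pip install llama-cpp-python

from llama_cpp import Llama

# Download the GGUF file from the Hugging Face Hub and load it
llm = Llama.from_pretrained(
    repo_id="ShawnGiese/gpt2_124M_fineweb10",
    filename="shawn.gguf",
)

# Generate up to 512 new tokens; echo=True includes the prompt in the output
output = llm(
    "Once upon a time,",
    max_tokens=512,
    echo=True,
)
print(output)
```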
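
The call above prints a whole completion dictionary rather than a plain string. A minimal sketch of pulling out just the generated text, assuming the OpenAI-style `choices[0]["text"]` layout that llama-cpp-python uses for non-streaming completions. The helper name `completion_text` and the sample dictionary are illustrative only and are not part of the library:

```python
def completion_text(output: dict) -> str:
    """Return the text of the first choice in a completion dict, or ""."""
    choices = output.get("choices") or []
    if not choices:
        return ""
    return choices[0].get("text", "")


# Hand-written stand-in for what llm(...) returns; the values are made up.
sample = {
    "choices": [
        {"text": "Once upon a time, there was a model.", "finish_reason": "length"}
    ],
}
print(completion_text(sample))
```

With a real model loaded, `completion_text(output)` would replace the final `print(output)`.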

Local Apps

llama.cpp

How to use ShawnGiese/gpt2_124M_fineweb10 with llama.cpp:

Install from brew

```
brew install llama.cpp
# Start a local OpenAI-compatible server with a web UI:
llama-server -hf ShawnGiese/gpt2_124M_fineweb10
# Run inference directly in the terminal:
llama-cli -hf ShawnGiese/gpt2_124M_fineweb10
```

Install from WinGet (Windows)

```
winget install llama.cpp
# Start a local OpenAI-compatible server with a web UI:
llama-server -hf ShawnGiese/gpt2_124M_fineweb10
# Run inference directly in the terminal:
llama-cli -hf ShawnGiese/gpt2_124M_fineweb10
```

Use pre-built binary

```
# Download pre-built binary from:
# https://github.com/ggerganov/llama.cpp/releases
# Start a local OpenAI-compatible server with a web UI:
./llama-server -hf ShawnGiese/gpt2_124M_fineweb10
# Run inference directly in the terminal:
./llama-cli -hf ShawnGiese/gpt2_124M_fineweb10
```

Build from source code

```
git clone https://github.com/ggerganov/llama.cpp.git
cd llama.cpp
cmake -B build
cmake --build build -j --target llama-server llama-cli
# Start a local OpenAI-compatible server with a web UI:
./build/bin/llama-server -hf ShawnGiese/gpt2_124M_fineweb10
# Run inference directly in the terminal:
./build/bin/llama-cli -hf ShawnGiese/gpt2_124M_fineweb10
```

Use Docker

```
docker model run hf.co/ShawnGiese/gpt2_124M_fineweb10
```
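
Once `llama-server` from any of the installs above is running, it can be queried over HTTP. The following is a minimal sketch using only the Python standard library. It assumes the server is listening on its default port (8080) and exposes the OpenAI-style `/v1/completions` endpoint; the names `SERVER_URL`, `build_request`, and `complete` are hypothetical helpers, not part of llama.cpp:

```python
import json
import urllib.request

# Assumption: llama-server runs locally on its default port, 8080.
SERVER_URL = "http://localhost:8080/v1/completions"


def build_request(prompt: str, max_tokens: int = 64) -> urllib.request.Request:
    """Build (but do not send) a POST request for the completions endpoint."""
    payload = {"prompt": prompt, "max_tokens": max_tokens}
    return urllib.request.Request(
        SERVER_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def complete(prompt: str, max_tokens: int = 64) -> str:
    """Send the request and return the text of the first choice."""
    with urllib.request.urlopen(build_request(prompt, max_tokens)) as resp:
        body = json.load(resp)
    # Assumed response shape: {"choices": [{"text": "..."}], ...}
    return body["choices"][0]["text"]


# Usage, with the server already running:
#   print(complete("Once upon a time,"))
```

If the server was started with a different host or port, adjust `SERVER_URL` to match.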

Ollama
How to use ShawnGiese/gpt2_124M_fineweb10 with Ollama:
```
ollama run hf.co/ShawnGiese/gpt2_124M_fineweb10
```

Unsloth Studio

How to use ShawnGiese/gpt2_124M_fineweb10 with Unsloth Studio:

Install Unsloth Studio (macOS, Linux, WSL)

```
curl -fsSL https://unsloth.ai/install.sh | sh
# Run unsloth studio
unsloth studio -H 0.0.0.0 -p 8888
# Then open http://localhost:8888 in your browser
# Search for ShawnGiese/gpt2_124M_fineweb10 to start chatting
```

Install Unsloth Studio (Windows)

```
irm https://unsloth.ai/install.ps1 | iex
# Run unsloth studio
unsloth studio -H 0.0.0.0 -p 8888
# Then open http://localhost:8888 in your browser
# Search for ShawnGiese/gpt2_124M_fineweb10 to start chatting
```

Using HuggingFace Spaces for Unsloth

No setup is required. Open https://huggingface.co/spaces/unsloth/studio in your browser and search for ShawnGiese/gpt2_124M_fineweb10 to start chatting.

Docker Model Runner
How to use ShawnGiese/gpt2_124M_fineweb10 with Docker Model Runner:
```
docker model run hf.co/ShawnGiese/gpt2_124M_fineweb10
```

Lemonade

How to use ShawnGiese/gpt2_124M_fineweb10 with Lemonade:

Pull the model

```
# Download Lemonade from https://lemonade-server.ai/
lemonade pull ShawnGiese/gpt2_124M_fineweb10
```

Run and chat with the model

```
lemonade run user.gpt2_124M_fineweb10-{{QUANT_TAG}}
```

List all available models

```
lemonade list
```

This is a GPT-2 model trained using llm.c on FineWeb.

Much more detailed information is available at https://github.com/karpathy/llm.c/discussions/481. The author of llm.c, Andrej Karpathy, also provides a Python implementation.

Example use with ollama:

```
ollama run hf.co/ShawnGiese/gpt2_124M_fineweb10 "My super smart team of cloud computing experts are welcoming many new customers. The next thing you might hear about this amazing team is"
```

Technical

This model has a context window of 1024 tokens.
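
Because the prompt and the generated text share that window, a long prompt leaves less room for output. A minimal sketch of the arithmetic, assuming the 1024-token window stated above; the names `CONTEXT_WINDOW`, `max_prompt_tokens`, and `truncate_prompt` are illustrative helpers, and the token IDs are placeholders rather than real tokenizer output:

```python
CONTEXT_WINDOW = 1024  # tokens, from the model description


def max_prompt_tokens(max_new_tokens: int, context_window: int = CONTEXT_WINDOW) -> int:
    """Tokens left for the prompt after reserving room for generation."""
    if not 0 <= max_new_tokens <= context_window:
        raise ValueError("max_new_tokens must be between 0 and the context window")
    return context_window - max_new_tokens


def truncate_prompt(token_ids: list, max_new_tokens: int) -> list:
    """Keep only the most recent prompt tokens that fit in the budget."""
    budget = max_prompt_tokens(max_new_tokens)
    # Guard the zero case: token_ids[-0:] would return the whole list.
    return token_ids[-budget:] if budget > 0 else []


print(max_prompt_tokens(512))  # 512 tokens remain for the prompt
```

With `max_tokens=512`, for example, at most 512 tokens of prompt fit alongside the requested output.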

It was trained on 10 billion tokens from FineWeb and has 124 million parameters. Model comparison:

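117 million parameters: GPT-1 (2018)
124 million parameters: << my model (the size of the smallest GPT-2 variant)
1,500 million parameters: full GPT-2 (2019)
175,000 million parameters: GPT-3 (2020)
1,000,000+ million parameters: GPT-4 (2023), an unofficial estimate; OpenAI has not published a parameter count
1,750,000+ million parameters: GPT-5 (2025), an unofficial estimate; OpenAI has not published a parameter count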

As a rough rule of thumb, you could run a 2,000 million parameter model on a computer with 8 GB of RAM, or a 7,000 million parameter model on a computer with 16 GB of RAM. However, the experience may not be as smooth as expected, depending on things like:

Do you have a GPU to speed up the model, and does that card have a lot of video memory?
How was the model packaged or quantized (essentially, stored at lower numeric precision so that it needs less storage and processing)?
Are you only making text queries, or are you also processing other data such as sound or images?
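
The effect of precision on memory can be estimated with simple arithmetic: parameter count times bytes per weight. The sketch below is a back-of-the-envelope calculation only. The bytes-per-weight figures are nominal assumptions (real quantized formats store extra scaling data), and it ignores activations, the KV cache, and runtime overhead; `BYTES_PER_WEIGHT` and `weight_memory_gb` are illustrative names:

```python
# Nominal storage per weight, in bytes (assumed round figures).
BYTES_PER_WEIGHT = {"fp32": 4.0, "bf16": 2.0, "q8": 1.0, "q4": 0.5}


def weight_memory_gb(n_params: int, precision: str) -> float:
    """Approximate memory for the weights alone, in GB (1 GB = 1e9 bytes)."""
    return n_params * BYTES_PER_WEIGHT[precision] / 1e9


examples = [
    ("this model (124M, BF16)", 124_000_000, "bf16"),
    ("2,000M model, 8-bit", 2_000_000_000, "q8"),
    ("7,000M model, 4-bit", 7_000_000_000, "q4"),
]
for label, n_params, precision in examples:
    print(f"{label}: ~{weight_memory_gb(n_params, precision):.2f} GB")
```

By this estimate the weights of this model take roughly a quarter of a gigabyte in BF16, which is why it runs comfortably on almost any machine.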

OpenAI has released some of the original GPT-2 code and data specifications. Check out their model card for a more full-featured GPT-2 and for information about their data sources: https://github.com/openai/gpt-2/blob/master/model_card.md

Comments

If you try to build this model, be sure to add an hour for data sharding, and maybe an extra half hour for installing everything and then downloading the results.

Building this model across eight GPUs took only 25 GB of VRAM on each, so A100 40GB GPUs should be more than enough. On AWS, I believe the closest match is a p4d.24xlarge, though I used https://lambda.ai/.

If you are using a cloud-based system, consider running the build inside a terminal multiplexer like tmux, so that it survives a disconnection.

There are reports of people building this on a single Nvidia RTX 4090 in around 24 hours. The llm.c discussion linked above has a lot more information.

Safetensors

Model size: 0.1B params
Tensor type: BF16
