Instructions to use ShawnGiese/gpt2_124M_fineweb10 with libraries, inference providers, notebooks, and local apps. Follow these links to get started.
- Libraries
- llama-cpp-python
How to use ShawnGiese/gpt2_124M_fineweb10 with llama-cpp-python:
# !pip install llama-cpp-python
from llama_cpp import Llama

llm = Llama.from_pretrained(
    repo_id="ShawnGiese/gpt2_124M_fineweb10",
    filename="shawn.gguf",
)
output = llm(
    "Once upon a time,",
    max_tokens=512,
    echo=True,
)
print(output)
- Notebooks
- Google Colab
- Kaggle
- Local Apps
- llama.cpp
How to use ShawnGiese/gpt2_124M_fineweb10 with llama.cpp:
Install from brew
brew install llama.cpp
# Start a local OpenAI-compatible server with a web UI:
llama-server -hf ShawnGiese/gpt2_124M_fineweb10
# Run inference directly in the terminal:
llama-cli -hf ShawnGiese/gpt2_124M_fineweb10
Install from WinGet (Windows)
winget install llama.cpp
# Start a local OpenAI-compatible server with a web UI:
llama-server -hf ShawnGiese/gpt2_124M_fineweb10
# Run inference directly in the terminal:
llama-cli -hf ShawnGiese/gpt2_124M_fineweb10
Use pre-built binary
# Download pre-built binary from:
# https://github.com/ggerganov/llama.cpp/releases
# Start a local OpenAI-compatible server with a web UI:
./llama-server -hf ShawnGiese/gpt2_124M_fineweb10
# Run inference directly in the terminal:
./llama-cli -hf ShawnGiese/gpt2_124M_fineweb10
Build from source code
git clone https://github.com/ggerganov/llama.cpp.git
cd llama.cpp
cmake -B build
cmake --build build -j --target llama-server llama-cli
# Start a local OpenAI-compatible server with a web UI:
./build/bin/llama-server -hf ShawnGiese/gpt2_124M_fineweb10
# Run inference directly in the terminal:
./build/bin/llama-cli -hf ShawnGiese/gpt2_124M_fineweb10
Use Docker
docker model run hf.co/ShawnGiese/gpt2_124M_fineweb10
- LM Studio
- Jan
- Ollama
How to use ShawnGiese/gpt2_124M_fineweb10 with Ollama:
ollama run hf.co/ShawnGiese/gpt2_124M_fineweb10
- Unsloth Studio
How to use ShawnGiese/gpt2_124M_fineweb10 with Unsloth Studio:
Install Unsloth Studio (macOS, Linux, WSL)
curl -fsSL https://unsloth.ai/install.sh | sh
# Run unsloth studio
unsloth studio -H 0.0.0.0 -p 8888
# Then open http://localhost:8888 in your browser
# Search for ShawnGiese/gpt2_124M_fineweb10 to start chatting
Install Unsloth Studio (Windows)
irm https://unsloth.ai/install.ps1 | iex
# Run unsloth studio
unsloth studio -H 0.0.0.0 -p 8888
# Then open http://localhost:8888 in your browser
# Search for ShawnGiese/gpt2_124M_fineweb10 to start chatting
Using HuggingFace Spaces for Unsloth
# No setup required
# Open https://huggingface.co/spaces/unsloth/studio in your browser
# Search for ShawnGiese/gpt2_124M_fineweb10 to start chatting
- Docker Model Runner
How to use ShawnGiese/gpt2_124M_fineweb10 with Docker Model Runner:
docker model run hf.co/ShawnGiese/gpt2_124M_fineweb10
- Lemonade
How to use ShawnGiese/gpt2_124M_fineweb10 with Lemonade:
Pull the model
# Download Lemonade from https://lemonade-server.ai/
lemonade pull ShawnGiese/gpt2_124M_fineweb10
Run and chat with the model
lemonade run user.gpt2_124M_fineweb10-{{QUANT_TAG}}
List all available models
lemonade list
This is a GPT-2 model trained using llm.c on FineWeb.
Much more detailed information is available in the llm.c discussion: https://github.com/karpathy/llm.c/discussions/481. The author of llm.c, Andrej Karpathy, also has a Python implementation.
Example use with Ollama:
ollama run hf.co/ShawnGiese/gpt2_124M_fineweb10 "My super smart team of cloud computing experts are welcoming many new customers. The next thing you might hear about this amazing team is"
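The same prompt can also be sent from a script through Ollama's local REST API. The following is a minimal sketch, not part of the original instructions: it assumes an Ollama server is already running on its default local port (11434) and that the model has been pulled under the `hf.co/...` name used above. The helper names `build_request` and `generate` are made up for this example.

```python
import json
import urllib.request

# Assumption: Ollama is running locally on its default port.
OLLAMA_URL = "http://localhost:11434/api/generate"
MODEL = "hf.co/ShawnGiese/gpt2_124M_fineweb10"


def build_request(prompt, max_tokens=64):
    """Build a non-streaming generate request for the local Ollama server."""
    payload = {
        "model": MODEL,
        "prompt": prompt,
        "stream": False,  # return one JSON object instead of a stream
        "options": {"num_predict": max_tokens},  # cap on generated tokens
    }
    return urllib.request.Request(
        OLLAMA_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def generate(prompt, max_tokens=64):
    """Send the prompt and return the generated text (needs a running server)."""
    with urllib.request.urlopen(build_request(prompt, max_tokens)) as resp:
        return json.loads(resp.read())["response"]
```

Calling `generate("Once upon a time,")` returns the completion as a string. Because this is a small base model rather than a chat model, expect a free-form continuation of the prompt, not an answer to a question.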
Technical
This model has a context window of 1024 tokens.
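In practice the prompt and the generated text share that window, so a long prompt leaves less room for output. As a rough sketch (the helper name `max_new_tokens` is hypothetical, and some runtimes handle overflow differently, for example by shifting the context), the remaining budget can be worked out like this:

```python
CONTEXT_WINDOW = 1024  # tokens, as stated for this model


def max_new_tokens(prompt_tokens, requested, context_window=CONTEXT_WINDOW):
    """Return how many tokens can be generated without exceeding the window."""
    if prompt_tokens >= context_window:
        raise ValueError("prompt alone fills or exceeds the context window")
    # Generation is capped by whatever space the prompt leaves free.
    return min(requested, context_window - prompt_tokens)


# A short prompt leaves the full request available; a long one does not.
print(max_new_tokens(prompt_tokens=6, requested=512))
print(max_new_tokens(prompt_tokens=900, requested=512))
```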
It was trained on 10 billion tokens from FineWeb and has 124 million parameters. Model comparison:
- 117 million parameters GPT-1 (2018)
- 124 million parameters << my model
- 1,500 million parameters full GPT-2 (2019)
- 175,000 million parameters GPT-3 (2020)
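- 1,000,000+ million parameters GPT-4 (2023), an unofficial estimate; OpenAI has not published a parameter count
- 1,750,000+ million parameters GPT-5 (2025), an unofficial estimate; OpenAI has not published a parameter count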
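For readers wondering where the 124 million figure comes from, it can be reproduced by counting weights layer by layer. The sketch below assumes the standard GPT-2 "small" configuration (12 layers, 768-wide embeddings, a 50,257-token vocabulary, and the 1024-token context mentioned above). These hyperparameters are not stated in this model card, so treat them as assumptions.

```python
# Assumed standard GPT-2 "small" hyperparameters (not stated in this model card).
VOCAB_SIZE = 50257
CONTEXT = 1024
N_LAYER = 12
N_EMBD = 768


def gpt2_param_count(vocab=VOCAB_SIZE, context=CONTEXT, n_layer=N_LAYER, n_embd=N_EMBD):
    """Count parameters of a GPT-2 style decoder with tied input/output embeddings."""
    d = n_embd
    token_emb = vocab * d    # token embedding table (shared with the output head)
    pos_emb = context * d    # learned position embeddings
    layer_norm = 2 * d       # scale + bias
    attn = (d * 3 * d + 3 * d) + (d * d + d)     # QKV projection + output projection
    mlp = (d * 4 * d + 4 * d) + (4 * d * d + d)  # expand to 4*d, then project back
    block = 2 * layer_norm + attn + mlp
    return token_emb + pos_emb + n_layer * block + layer_norm  # plus final layer norm


total = gpt2_param_count()
print(f"{total:,} parameters (~{total / 1e6:.0f} million)")
# prints: 124,439,808 parameters (~124 million)
```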
Generally speaking, you could run a 2,000-million-parameter model on a computer with 8 GB of RAM, or a 7,000-million-parameter model on a computer with 16 GB of RAM. However, the experience may not be as smooth as expected, depending on things like:
- Do you have a GPU to speed up the model, and does that card have a lot of video memory?
- How was the model packaged / quantized (essentially lowering its precision / resolution so that it needs less storage and processing)?
- Are you just making text queries, or are you processing other data like sound / images?
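The RAM rule of thumb above follows from simple arithmetic: the memory needed for the weights is roughly the parameter count times the bytes stored per parameter. The sketch below is a back-of-the-envelope estimate only. It ignores the KV cache, activations, and runtime overhead, it counts 1 GB as 10^9 bytes, and the function name `weight_memory_gb` is made up for this example.

```python
def weight_memory_gb(n_params, bits_per_param):
    """Approximate memory for the weights alone, in GB (10**9 bytes)."""
    return n_params * bits_per_param / 8 / 1e9


MODELS = {
    "this model (124M)": 124_000_000,
    "2,000M parameters": 2_000_000_000,
    "7,000M parameters": 7_000_000_000,
}

for name, n_params in MODELS.items():
    for bits in (32, 16, 4):  # full precision, half precision, 4-bit quantized
        gb = weight_memory_gb(n_params, bits)
        print(f"{name:>20} at {bits:>2}-bit: {gb:6.2f} GB")
```

By this estimate a 7,000-million-parameter model needs about 14 GB at 16-bit precision but only about 3.5 GB at 4-bit, which is why quantization matters so much on machines with 8 to 16 GB of RAM.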
OpenAI has released some of the actual GPT-2 code and data specs. Check them out for a more full-featured GPT to test with, plus information about their data sources: https://github.com/openai/gpt-2/blob/master/model_card.md
Comments
In case anyone tries to build this, be sure to add an hour for data sharding and maybe an extra half hour for installing everything and then downloading the results.
Building this model across eight GPUs used only about 25 GB of VRAM on each, so A100 40GB GPUs should be more than enough. On AWS, I believe the closest match is a p4d.24xlarge instance, though I used https://lambda.ai/.
If using a cloud-based system, consider using a terminal multiplexer like tmux, just in case of a disconnection.
There were reports of people using a single Nvidia RTX 4090 to build this in around 24 hours. The llm.c discussion linked above has a lot of info.