Instructions for using k050506koch/GPT3-dev-125m-1005 with libraries and local apps.

Libraries

How to use k050506koch/GPT3-dev-125m-1005 with Transformers:

# Use a pipeline as a high-level helper
from transformers import pipeline

pipe = pipeline("text-generation", model="k050506koch/GPT3-dev-125m-1005", trust_remote_code=True)

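# Load model directly (AutoModelForCausalLM keeps the language-modeling head needed for generation)
from transformers import AutoModelForCausalLM
model = AutoModelForCausalLM.from_pretrained("k050506koch/GPT3-dev-125m-1005", trust_remote_code=True, dtype="auto")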

Local Apps

vLLM

How to use k050506koch/GPT3-dev-125m-1005 with vLLM:

Install from pip and serve model

# Install vLLM from pip:
pip install vllm
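# Start the vLLM server (--trust-remote-code mirrors trust_remote_code=True in the Transformers examples):
vllm serve "k050506koch/GPT3-dev-125m-1005" --trust-remote-code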
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:8000/v1/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "k050506koch/GPT3-dev-125m-1005",
		"prompt": "Once upon a time,",
		"max_tokens": 512,
		"temperature": 0.5
	}'
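
The same request can be sent from Python. The block below is a minimal sketch using only the standard library; the helper names `build_completion_request` and `complete` are made up for illustration, and the response layout (`choices[0].text`) is assumed to follow the OpenAI completions format that the server advertises.

```python
import json
import urllib.request

MODEL_ID = "k050506koch/GPT3-dev-125m-1005"


def build_completion_request(prompt, base_url="http://localhost:8000",
                             max_tokens=512, temperature=0.5):
    """Build (but do not send) a POST request for the /v1/completions endpoint."""
    payload = {
        "model": MODEL_ID,
        "prompt": prompt,
        "max_tokens": max_tokens,
        "temperature": temperature,
    }
    return urllib.request.Request(
        url=f"{base_url}/v1/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def complete(prompt, **kwargs):
    """Send the request and return the text of the first completion choice.

    Assumes an OpenAI-style response body: {"choices": [{"text": ...}, ...]}.
    """
    request = build_completion_request(prompt, **kwargs)
    with urllib.request.urlopen(request) as response:
        body = json.load(response)
    return body["choices"][0]["text"]


# Usage (needs the server from the previous step to be running):
# print(complete("Once upon a time,"))
```

Passing `base_url="http://localhost:30000"` points the same helper at a server listening on port 30000.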

Use Docker

docker model run hf.co/k050506koch/GPT3-dev-125m-1005

SGLang

How to use k050506koch/GPT3-dev-125m-1005 with SGLang:

Install from pip and serve model

# Install SGLang from pip:
pip install sglang
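# Start the SGLang server (--trust-remote-code mirrors trust_remote_code=True in the Transformers examples):
python3 -m sglang.launch_server \
    --model-path "k050506koch/GPT3-dev-125m-1005" \
    --trust-remote-code \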
    --host 0.0.0.0 \
    --port 30000
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "k050506koch/GPT3-dev-125m-1005",
		"prompt": "Once upon a time,",
		"max_tokens": 512,
		"temperature": 0.5
	}'

Use Docker images

docker run --gpus all \
    --shm-size 32g \
    -p 30000:30000 \
    -v ~/.cache/huggingface:/root/.cache/huggingface \
    --env "HF_TOKEN=<secret>" \
    --ipc=host \
    lmsysorg/sglang:latest \
    python3 -m sglang.launch_server \
        --model-path "k050506koch/GPT3-dev-125m-1005" \
        --trust-remote-code \
        --host 0.0.0.0 \
        --port 30000
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "k050506koch/GPT3-dev-125m-1005",
		"prompt": "Once upon a time,",
		"max_tokens": 512,
		"temperature": 0.5
	}'

Docker Model Runner
How to use k050506koch/GPT3-dev-125m-1005 with Docker Model Runner:
```
docker model run hf.co/k050506koch/GPT3-dev-125m-1005
```

You can find all the code on GitHub.

Note: This is a model with 125 million parameters (an attempt to replicate GPT-3 Small). It is very undertrained.

Note 2: This is a model checkpoint released on 10.05.2026 (batch size 72, gradient accumulation 4, and 50,000 steps with the Muon optimizer). It scores 25.49% on MMLU, which is only slightly above the 25% expected from random guessing.

Note 3: This model already demonstrates basic abilities in generating text. It's not perfect, and I will continue working on it. Expect an Instruct model soon.

Model description

This is a small GPT-style autoregressive language model. It is intended as a development checkpoint, not as a production-ready assistant, but you are welcome to try it.
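
As a rough sanity check on the model size, the sketch below counts the parameters of a GPT-2-style decoder. Only the 2048-token context length and the reported total (125231616) come from this card. The vocabulary size of 50264, the width of 768, the 12 layers, the 4x MLP expansion, learned position embeddings, bias terms, and tied input/output embeddings are all assumptions chosen for illustration; they happen to reproduce the reported total, which suggests but does not confirm the actual architecture.

```python
def gpt2_style_param_count(vocab_size, context_length, d_model, n_layers, d_ff=None):
    """Count parameters of a GPT-2-style decoder with tied input/output embeddings."""
    d_ff = 4 * d_model if d_ff is None else d_ff

    token_embedding = vocab_size * d_model
    position_embedding = context_length * d_model

    layer_norm = 2 * d_model  # scale + bias
    # Fused QKV projection plus output projection, each with a bias.
    attention = (d_model * 3 * d_model + 3 * d_model) + (d_model * d_model + d_model)
    # Two linear layers (expand, then contract), each with a bias.
    mlp = (d_model * d_ff + d_ff) + (d_ff * d_model + d_model)
    per_layer = 2 * layer_norm + attention + mlp

    final_layer_norm = layer_norm
    return token_embedding + position_embedding + n_layers * per_layer + final_layer_norm


# Assumed hyperparameters (not stated in this card, except context_length).
total = gpt2_style_param_count(vocab_size=50264, context_length=2048,
                               d_model=768, n_layers=12)
print(total)  # 125231616
```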

This time I used kernels, Flash Attention 4, and Flash Attention 2, with a fallback to SDPA. This cut the time required for one training step from nearly 60 seconds (on a Jetson) to 3.6 seconds (on the server), and then to 2.2 seconds (using Unsloth kernels).
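
The fallback order described above can be sketched as a small selection routine. This is not the training code from this project: the backend labels and the module name `flash_attn_4` are hypothetical placeholders, `flash_attn` is assumed to be the import name of Flash Attention 2, and "sdpa" stands for PyTorch's built-in scaled dot-product attention.

```python
import importlib.util

# Preference order from the text: Flash Attention 4, then Flash Attention 2.
# Each entry is (backend label, module that must be importable).
# "flash_attn_4" is a hypothetical module name used only for illustration.
PREFERRED_BACKENDS = [
    ("flash_attention_4", "flash_attn_4"),
    ("flash_attention_2", "flash_attn"),
]


def module_installed(module_name):
    """Return True if a top-level module can be found without importing it."""
    return importlib.util.find_spec(module_name) is not None


def pick_attention_backend(preferred=PREFERRED_BACKENDS, is_available=module_installed):
    """Return the first available backend label, or "sdpa" if none is installed."""
    for backend_name, module_name in preferred:
        if is_available(module_name):
            return backend_name
    return "sdpa"


print(pick_attention_backend())
```

Passing a custom `is_available` callable makes the selection easy to test on a machine without any of the optional packages.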

Important notes

This model is still undertrained. Its benchmark results are close to random-choice level on multiple-choice academic benchmarks, so the checkpoint should be treated as experimental.
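
To put "close to random-choice level" into numbers, the sketch below expresses the reported accuracies in standard errors above a four-way random guess, using a normal approximation. The question counts are assumptions (14,042 for the MMLU test set and 10,042 for the HellaSwag validation set, the usual sizes of those splits), and the MMLU average is treated as if it were pooled over all questions, which may not match how the evaluation script averaged it.

```python
import math


def z_score_vs_chance(accuracy, n_questions, n_choices=4):
    """Distance of an accuracy from random guessing, in binomial standard errors."""
    chance = 1.0 / n_choices
    standard_error = math.sqrt(chance * (1.0 - chance) / n_questions)
    return (accuracy - chance) / standard_error


# Accuracies are from this card; question counts are assumed split sizes.
mmlu_z = z_score_vs_chance(0.2549, n_questions=14042)
hellaswag_z = z_score_vs_chance(0.2677, n_questions=10042)

print(f"MMLU: {mmlu_z:.2f} standard errors above chance")
print(f"HellaSwag: {hellaswag_z:.2f} standard errors above chance")
```

Under these assumptions the MMLU average sits a little over one standard error above chance, while HellaSwag sits about four above, so only the latter looks clearly distinguishable from guessing.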

It can generate basic text, but it may produce incorrect, repetitive, incoherent, or unreadable output. It is not instruction-tuned, but it can produce several meaningful paragraphs.

Usage

from transformers import AutoTokenizer, AutoModelForCausalLM

# The model ID must be passed as a quoted string.
tokenizer = AutoTokenizer.from_pretrained("k050506koch/GPT3-dev-125m-1005", trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained("k050506koch/GPT3-dev-125m-1005", trust_remote_code=True)

if tokenizer.pad_token_id is None:
    tokenizer.pad_token_id = tokenizer.eos_token_id

prompt = "He is a doctor. His main goal is"
inputs = tokenizer(prompt, return_tensors="pt")

outputs = model.generate(
    **inputs,
    max_new_tokens=96,
    do_sample=True,
    temperature=0.7,
    top_p=0.9,
    repetition_penalty=1.2,
    no_repeat_ngram_size=3,
    pad_token_id=tokenizer.pad_token_id,
)

print(tokenizer.decode(outputs[0], skip_special_tokens=True))
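
For readers unfamiliar with the `temperature` and `top_p` arguments used above, here is a minimal pure-Python sketch of what they do to one step's logits. It is an illustration of the general technique, not the Transformers implementation, and the function name `top_p_filter` is made up for this example.

```python
import math


def top_p_filter(logits, temperature=0.7, top_p=0.9):
    """Return {token_index: probability} for the tokens kept by nucleus sampling."""
    # Temperature below 1 sharpens the distribution; above 1 flattens it.
    scaled = [logit / temperature for logit in logits]
    max_scaled = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(value - max_scaled) for value in scaled]
    total = sum(exps)
    probs = [value / total for value in exps]

    # Keep the smallest set of most likely tokens whose mass reaches top_p.
    ranked = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    kept, cumulative = [], 0.0
    for index in ranked:
        kept.append(index)
        cumulative += probs[index]
        if cumulative >= top_p:
            break

    # Renormalize so the kept probabilities sum to 1 before sampling.
    kept_mass = sum(probs[i] for i in kept)
    return {i: probs[i] / kept_mass for i in kept}


example = top_p_filter([2.0, 1.0, 0.0, -1.0])
print(sorted(example))  # indices of the tokens that survive the filter
```

The next token is then sampled from the returned distribution; lowering `top_p` or `temperature` shrinks the candidate set and makes output more conservative.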

Evaluation results

Evaluation was run locally on CPU with a custom evaluation script.

These results should not be compared directly with Open LLM Leaderboard results unless the same evaluation harness, prompt format, number of shots, and dataset splits are used.

Summary

Benchmark	Accuracy	Perplexity
HellaSwag	0.2677	34.3111
MMLU average	0.2549	141.9833
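
The card does not describe how the custom script computes the two columns. The sketch below shows one common convention, stated here as an assumption: each answer choice is scored by the mean log-probability of its tokens, the highest-scoring choice is the prediction, and perplexity is the exponential of the mean negative log-likelihood per token. All function names are illustrative.

```python
import math


def pick_choice(choice_logprobs):
    """Index of the answer whose tokens have the highest mean log-probability."""
    means = [sum(logprobs) / len(logprobs) for logprobs in choice_logprobs]
    return max(range(len(means)), key=lambda i: means[i])


def accuracy(predictions, gold):
    """Fraction of predictions that match the gold answer indices."""
    correct = sum(1 for p, g in zip(predictions, gold) if p == g)
    return correct / len(gold)


def perplexity(token_logprobs):
    """exp of the mean negative log-likelihood (natural log) per token."""
    mean_nll = -sum(token_logprobs) / len(token_logprobs)
    return math.exp(mean_nll)


# A model that assigns probability 0.25 to every token has perplexity 4.
uniform = [math.log(0.25)] * 8
print(round(perplexity(uniform), 6))  # 4.0
```

Differences in such conventions (length normalization, prompt format, number of shots) are one reason scores from different harnesses are not directly comparable.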

MMLU

Task	Accuracy	Perplexity
abstract_algebra	0.2600	182.4785
anatomy	0.2519	206.2038
astronomy	0.2303	166.3864
business_ethics	0.2800	145.5782
clinical_knowledge	0.1925	100.5738
college_biology	0.2847	162.7603
college_chemistry	0.2800	157.3521
college_computer_science	0.2200	132.0329
college_mathematics	0.2300	114.1684
college_medicine	0.2254	24.5343
college_physics	0.2353	115.2290
computer_security	0.2300	141.5838
conceptual_physics	0.2894	312.6869
econometrics	0.2632	135.2830
electrical_engineering	0.2690	259.6937
elementary_mathematics	0.2646	64.6184
formal_logic	0.2460	56.9265
global_facts	0.1500	89.0267
high_school_biology	0.2677	89.7088
high_school_chemistry	0.2562	123.2220
high_school_computer_science	0.2300	79.9634
high_school_european_history	0.2667	118.5012
high_school_geography	0.2980	156.3795
high_school_government_and_politics	0.2176	174.9534
high_school_macroeconomics	0.2462	132.2859
high_school_mathematics	0.2333	105.9731
high_school_microeconomics	0.2605	82.1080
high_school_physics	0.2715	71.0461
high_school_psychology	0.2624	137.8331
high_school_statistics	0.2824	61.6760
high_school_us_history	0.3039	88.8365
high_school_world_history	0.2447	74.1491
human_aging	0.2377	306.9222
human_sexuality	0.2595	110.5550
international_law	0.3223	211.6555
jurisprudence	0.2130	109.2910
logical_fallacies	0.2331	207.6864
machine_learning	0.2500	120.3576
management	0.3592	368.0460
marketing	0.2436	73.0363
medical_genetics	0.3100	296.1581
miscellaneous	0.2363	140.3008
moral_disputes	0.2370	111.0396
moral_scenarios	0.2402	105.1889
nutrition	0.2484	203.6292
philosophy	0.2540	88.0570
prehistory	0.2191	123.8685
professional_accounting	0.2695	60.2937
professional_law	0.2581	17.2965
professional_medicine	0.2868	107.5151
professional_psychology	0.2647	104.7847
public_relations	0.2727	94.3958
security_studies	0.3306	70.1510
sociology	0.2886	243.0351
us_foreign_policy	0.2000	206.4246
virology	0.1988	125.7791
world_religions	0.2515	423.8289

Limitations

Because this is only a next-word prediction model, it does not know how to interact with a user.

Training data

HuggingFaceFW/fineweb only.

Training metadata

Checkpoint date: 10.05.2026
Parameters: 125231616
Context length: 2048
Batch size: 72
Gradient accumulation: 4
Sequence length: 512
Training steps: 50000
Optimizer: Fused Muon with Hermes kernels
Learning rate schedule: cosine
Hardware: Frankenstein (2012 datacenter server with an RTX 5070 Ti)
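
The metadata above implies a rough token budget. The arithmetic below is a sketch under stated assumptions: the batch size counts sequences per micro-batch on a single device, each of the 50000 steps is one optimizer step after gradient accumulation, and every sequence holds the full 512 tokens. If any of these differ in the actual training run, the totals change accordingly.

```python
# Values taken from the training metadata in this card.
BATCH_SIZE = 72
GRAD_ACCUMULATION = 4
SEQUENCE_LENGTH = 512
TRAINING_STEPS = 50_000
PARAMETERS = 125_231_616

# Assumption: one optimizer step consumes BATCH_SIZE * GRAD_ACCUMULATION sequences.
sequences_per_step = BATCH_SIZE * GRAD_ACCUMULATION
tokens_per_step = sequences_per_step * SEQUENCE_LENGTH
total_tokens = tokens_per_step * TRAINING_STEPS
tokens_per_parameter = total_tokens / PARAMETERS

print(f"tokens per step: {tokens_per_step:,}")            # 147,456
print(f"total tokens: {total_tokens:,}")                  # 7,372,800,000
print(f"tokens per parameter: {tokens_per_parameter:.1f}")  # 58.9
```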

Contributing

Contributions are always welcome.

I am still a student, so the code and model may contain mistakes, bugs, or incorrect assumptions. If you find an issue or have an improvement, feel free to open an issue or submit a pull request. I will be happy to look at it.

Acknowledgements

Thanks to OpenAI, Hugging Face, PyTorch and Unsloth for making this kind of research and experimentation possible.

References:

Language Models are Few-Shot Learners (arXiv:2005.14165, published May 28, 2020)

Safetensors: model size 0.1B params, tensor type F32

Evaluation results (self-reported)

Benchmark	Split	Accuracy	Perplexity
HellaSwag	validation	0.268	34.311
MMLU average	test	0.255	141.983