You can find all the code on GitHub.
Note: This is a 125-million-parameter model (an attempt to replicate GPT-3 Small). It is very undertrained.
Note 2: This checkpoint was released on 10.05.2026 (batch size 72, gradient accumulation 4, 50,000 steps with the Muon optimizer). It scores 25.49% on MMLU, only slightly above the 25% random-guess baseline.
Note 3: This model can already generate basic text. It is not perfect, and I will continue working on it. Expect an Instruct model soon.
Model description
This is a small GPT-style autoregressive language model. It is intended as a development checkpoint, not as a production-ready assistant, but you are welcome to try it.
This time I used optimized kernels together with Flash Attention 4 and Flash Attention 2, with a fallback to SDPA. This cut the time per step from nearly 60 seconds (on a Jetson) to 3.6 seconds (on the server), and then to 2.2 seconds with Unsloth kernels.
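The fallback logic can be sketched in Python. This is a minimal illustration under assumptions, not the training code: it only chooses between FlashAttention 2 and PyTorch SDPA, treats FlashAttention 2 as usable when the `flash_attn` package is installed and a CUDA GPU is present, and the function name `choose_attn_implementation` is hypothetical. FlashAttention 4 and the Unsloth kernels are not covered.

```python
import importlib.util


def choose_attn_implementation(cuda_available: bool) -> str:
    """Pick an attention backend name, falling back to PyTorch SDPA.

    Assumption: FlashAttention 2 is usable only when the `flash_attn`
    package is installed and a CUDA GPU is present.
    """
    if cuda_available and importlib.util.find_spec("flash_attn") is not None:
        return "flash_attention_2"
    # SDPA (torch.nn.functional.scaled_dot_product_attention) runs on CPU and GPU.
    return "sdpa"
```

With Transformers, the returned string could be passed as `attn_implementation=...` to `from_pretrained`, for example `choose_attn_implementation(torch.cuda.is_available())`. Whether this repository's custom model code honors that argument is an assumption.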
Important notes
This model is still undertrained. Its benchmark results are close to random-choice level on multiple-choice academic benchmarks, so the checkpoint should be treated as experimental.
It can generate basic text and sometimes several meaningful paragraphs, but it may also produce incorrect, repetitive, incoherent, or unreadable output. It is not instruction-tuned.
Usage
```python
from transformers import AutoTokenizer, AutoModelForCausalLM

model_id = "k050506koch/GPT3-dev-125m-1005"

# trust_remote_code=True loads the custom model code shipped with the repository.
tokenizer = AutoTokenizer.from_pretrained(model_id, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(model_id, trust_remote_code=True)

# Reuse EOS as the padding token if the tokenizer does not define one.
if tokenizer.pad_token_id is None:
    tokenizer.pad_token_id = tokenizer.eos_token_id

prompt = "He is a doctor. His main goal is"
inputs = tokenizer(prompt, return_tensors="pt")

# Sample with penalties that reduce the repetition typical of small base models.
outputs = model.generate(
    **inputs,
    max_new_tokens=96,
    do_sample=True,
    temperature=0.7,
    top_p=0.9,
    repetition_penalty=1.2,
    no_repeat_ngram_size=3,
    pad_token_id=tokenizer.pad_token_id,
)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```
Evaluation results
Evaluation was run locally on CPU with a custom evaluation script, using the HellaSwag validation set and the MMLU test set.
These results should not be compared directly with Open LLM Leaderboard results unless the same evaluation harness, prompt format, number of shots, and dataset splits are used.
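The card does not say how the custom script turns model outputs into the Accuracy and Perplexity columns. A common approach, assumed here rather than confirmed, is to score each answer choice by the total log-probability of its tokens, predict the highest-scoring choice, and report perplexity as the exponential of the mean per-token negative log-likelihood. The helper names below are hypothetical.

```python
import math
from typing import Sequence


def pick_choice(choice_logprobs: Sequence[Sequence[float]]) -> int:
    """Return the index of the answer choice with the highest total log-probability."""
    totals = [sum(lps) for lps in choice_logprobs]
    return max(range(len(totals)), key=totals.__getitem__)


def perplexity(token_logprobs: Sequence[float]) -> float:
    """Perplexity = exp(mean negative log-likelihood per token)."""
    mean_nll = -sum(token_logprobs) / len(token_logprobs)
    return math.exp(mean_nll)


def accuracy(predictions: Sequence[int], labels: Sequence[int]) -> float:
    """Fraction of items where the predicted choice matches the gold label."""
    return sum(p == y for p, y in zip(predictions, labels)) / len(labels)
```

Some harnesses also report a length-normalized accuracy, which can change the winning choice when answers differ in length.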
Summary
| Benchmark | Accuracy | Perplexity |
|---|---|---|
| HellaSwag | 0.2677 | 34.3111 |
| MMLU average | 0.2549 | 141.9833 |
MMLU
| Task | Accuracy | Perplexity |
|---|---|---|
| abstract_algebra | 0.2600 | 182.4785 |
| anatomy | 0.2519 | 206.2038 |
| astronomy | 0.2303 | 166.3864 |
| business_ethics | 0.2800 | 145.5782 |
| clinical_knowledge | 0.1925 | 100.5738 |
| college_biology | 0.2847 | 162.7603 |
| college_chemistry | 0.2800 | 157.3521 |
| college_computer_science | 0.2200 | 132.0329 |
| college_mathematics | 0.2300 | 114.1684 |
| college_medicine | 0.2254 | 24.5343 |
| college_physics | 0.2353 | 115.2290 |
| computer_security | 0.2300 | 141.5838 |
| conceptual_physics | 0.2894 | 312.6869 |
| econometrics | 0.2632 | 135.2830 |
| electrical_engineering | 0.2690 | 259.6937 |
| elementary_mathematics | 0.2646 | 64.6184 |
| formal_logic | 0.2460 | 56.9265 |
| global_facts | 0.1500 | 89.0267 |
| high_school_biology | 0.2677 | 89.7088 |
| high_school_chemistry | 0.2562 | 123.2220 |
| high_school_computer_science | 0.2300 | 79.9634 |
| high_school_european_history | 0.2667 | 118.5012 |
| high_school_geography | 0.2980 | 156.3795 |
| high_school_government_and_politics | 0.2176 | 174.9534 |
| high_school_macroeconomics | 0.2462 | 132.2859 |
| high_school_mathematics | 0.2333 | 105.9731 |
| high_school_microeconomics | 0.2605 | 82.1080 |
| high_school_physics | 0.2715 | 71.0461 |
| high_school_psychology | 0.2624 | 137.8331 |
| high_school_statistics | 0.2824 | 61.6760 |
| high_school_us_history | 0.3039 | 88.8365 |
| high_school_world_history | 0.2447 | 74.1491 |
| human_aging | 0.2377 | 306.9222 |
| human_sexuality | 0.2595 | 110.5550 |
| international_law | 0.3223 | 211.6555 |
| jurisprudence | 0.2130 | 109.2910 |
| logical_fallacies | 0.2331 | 207.6864 |
| machine_learning | 0.2500 | 120.3576 |
| management | 0.3592 | 368.0460 |
| marketing | 0.2436 | 73.0363 |
| medical_genetics | 0.3100 | 296.1581 |
| miscellaneous | 0.2363 | 140.3008 |
| moral_disputes | 0.2370 | 111.0396 |
| moral_scenarios | 0.2402 | 105.1889 |
| nutrition | 0.2484 | 203.6292 |
| philosophy | 0.2540 | 88.0570 |
| prehistory | 0.2191 | 123.8685 |
| professional_accounting | 0.2695 | 60.2937 |
| professional_law | 0.2581 | 17.2965 |
| professional_medicine | 0.2868 | 107.5151 |
| professional_psychology | 0.2647 | 104.7847 |
| public_relations | 0.2727 | 94.3958 |
| security_studies | 0.3306 | 70.1510 |
| sociology | 0.2886 | 243.0351 |
| us_foreign_policy | 0.2000 | 206.4246 |
| virology | 0.1988 | 125.7791 |
| world_religions | 0.2515 | 423.8289 |
Limitations
Because this is only a next-token prediction model and is not instruction-tuned, it does not know how to interact with the user.
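Because the model only continues text, one workaround is to phrase a request as a document it can plausibly continue. The sketch below builds such a completion-style, few-shot prompt; the `Question:`/`Answer:` layout and the function name `build_completion_prompt` are hypothetical choices, not a format the model was trained on.

```python
from typing import Sequence, Tuple


def build_completion_prompt(examples: Sequence[Tuple[str, str]], query: str) -> str:
    """Format question/answer pairs as plain text for a base (non-instruct) model."""
    blocks = [f"Question: {q}\nAnswer: {a}" for q, a in examples]
    # End with an open "Answer:" so the natural continuation is an answer.
    blocks.append(f"Question: {query}\nAnswer:")
    return "\n\n".join(blocks)
```

The resulting string can be used as `prompt` in the Usage example. Given the near-random benchmark scores, this should not be expected to make answers reliable.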
Training data
The model was trained only on HuggingFaceFW/fineweb.
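The card does not describe the data pipeline. With a training sequence length of 512, one common way to feed FineWeb documents to the model is to tokenize them, join them with an EOS token, and cut the stream into fixed-length chunks (sequence packing). This is an assumed technique, not confirmed as what was used. `pack_sequences` is a hypothetical helper, and its token lists could come from a streamed dataset such as `datasets.load_dataset("HuggingFaceFW/fineweb", split="train", streaming=True)` after tokenization.

```python
from typing import Iterable, Iterator, List


def pack_sequences(
    docs: Iterable[List[int]], eos_id: int, seq_len: int = 512
) -> Iterator[List[int]]:
    """Concatenate tokenized documents, separated by EOS, into fixed-length chunks."""
    buffer: List[int] = []
    for tokens in docs:
        buffer.extend(tokens)
        buffer.append(eos_id)  # mark the document boundary
        while len(buffer) >= seq_len:
            yield buffer[:seq_len]
            buffer = buffer[seq_len:]
    # A trailing partial chunk shorter than seq_len is dropped.
```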
Training metadata
Checkpoint date: 10.05.2026
Parameters: 125,231,616
Context length: 2048
Batch size: 72
Gradient accumulation: 4
Sequence length: 512
Training steps: 50000
Optimizer: Fused Muon with Hermes kernels
Learning rate schedule: cosine
Hardware: Frankenstein (a 2012 datacenter server with an RTX 5070 Ti)
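The metadata above implies a per-step token budget. The sketch below computes it along with a plain cosine learning-rate schedule. Assumptions: batch size 72 is the micro-batch size, every sequence is a full 512 tokens, each of the 50,000 steps is one optimizer update, and there is no warmup. `peak_lr` and `min_lr` are hypothetical placeholders because the card does not state the learning rates.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class TrainingConfig:
    # Values taken from the training metadata.
    batch_size: int = 72
    grad_accum: int = 4
    seq_len: int = 512
    total_steps: int = 50_000
    # Hypothetical placeholders: the card does not state the learning rates.
    peak_lr: float = 1e-3
    min_lr: float = 0.0

    @property
    def tokens_per_step(self) -> int:
        # Assumes full-length sequences and one optimizer update per step.
        return self.batch_size * self.grad_accum * self.seq_len

    @property
    def total_tokens(self) -> int:
        return self.tokens_per_step * self.total_steps

    def lr_at(self, step: int) -> float:
        """Cosine decay from peak_lr to min_lr, with no warmup (an assumption)."""
        progress = min(step, self.total_steps) / self.total_steps
        return self.min_lr + 0.5 * (self.peak_lr - self.min_lr) * (
            1 + math.cos(math.pi * progress)
        )
```

Under these assumptions, `TrainingConfig().tokens_per_step` is 147,456 and `total_tokens` is 7,372,800,000 (about 7.4 billion tokens).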
Contributing
Contributions are always welcome.
I am still a student, so the code and model may contain mistakes, bugs, or incorrect assumptions. If you find an issue or have an improvement, feel free to open an issue or submit a pull request; I will be happy to look at it.
Acknowledgements
Thanks to OpenAI, Hugging Face, PyTorch and Unsloth for making this kind of research and experimentation possible.