1gpu-llm Medium EN/IT Base
This repository is the current ready-to-use base release for the 1gpu-llm
medium EN/IT family.
1gpu-llm is a family of language models trained from scratch on a single
consumer GPU.
For this release family, the reference training hardware is:
- GPU: NVIDIA GeForce RTX 4060 Ti 16GB
- training setup: single GPU
- practical medium-model wall-clock target: about 5 days to reach the current medium base release class on this hardware
Concretely, this release packages the GPT2PreLN decay-family practical winner
at step_14700:
- family name: 1gpu-llm
- model tier: medium
- languages: English + Italian
- context window: 2500 tokens
- architecture: GPT-2-style decoder with pre-layernorm blocks
- architecture config: architecture: gpt2, block_type: gpt2_prelayernorm
- parameter count: 337,639,424 parameters (~337.639M) in the published Transformers export
- released checkpoint: step_14700.pt
- checkpoint role: official operational medium base for the current family
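The stated parameter count can be sanity-checked with a short calculation. This is only a sketch under assumptions that the card does not state: the standard GPT-2 medium shape (24 layers, hidden size 1024), a 32,000-token vocabulary, 2500 learned position embeddings, and an output head tied to the token embeddings. Under those assumed values the formula reproduces the published figure, but the actual `config.json` remains the authoritative source.

```python
def gpt2_param_count(n_layer, d_model, vocab_size, n_positions):
    """Parameter count of a GPT-2-style decoder with tied input/output embeddings."""
    # Per block: attention (4*d^2) + MLP (8*d^2) weights, 9*d biases, 4*d layernorm params.
    per_block = 12 * d_model**2 + 13 * d_model
    # Token embeddings plus learned position embeddings.
    embeddings = (vocab_size + n_positions) * d_model
    # Final layernorm (scale and bias).
    final_ln = 2 * d_model
    return n_layer * per_block + embeddings + final_ln


# ASSUMED values (not stated in the card): 24 layers, d_model=1024, vocab=32000.
total = gpt2_param_count(n_layer=24, d_model=1024, vocab_size=32_000, n_positions=2_500)
print(f"{total:,}")  # 337,639,424
```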
This is a base model, not an instruction-tuned chat model.
Provenance
- non-decayed anchor run: stable-recipe-gpt2medium-gpt2preln-k20-wsd-lr2e-4-anchor20k-final2e5-webwiki
- non-decayed anchor checkpoint: step_13500.pt
- original decay-only parent run: 20260628_resume-gpt2medium-gpt2preln-k20-wsddecayonly-lr2e-4-anchor20k-final2e5-webwiki-step13500
- replayed tail run: 20260629_resume-gpt2medium-gpt2preln-k20-wsddecayonly-rerunmissing-lr3p5294e5-anchor20k-final2e5-webwiki-step14200-to14850
- released checkpoint: step_14700.pt
Practical reading:
- the family produced three checkpoints with distinct roles:
  - step_14250 = best pure scalar / benchmark checkpoint
  - step_14700 = best practical balanced release candidate
  - step_14500 = best behavior-oriented variant
- this repo is the public medium base release, so it intentionally promotes step_14700 rather than the raw scalar champion step_14250
Training Data
This model was trained on the bilingual EN/IT web + wiki dataset:
- dataset id on disk: 202605141153_fineweb50_wiki50_50en_50it_score100_2500context_5Btokens_tok_20260515_en50it50_webwiki_stratified_500M
- context window during training: 2500 tokens
- packing length: 2500
- mixing strategy: source_balanced
- validation ratio: 0.05
Main source groups:
- English FineWeb-HQ (epfml/FineWeb-HQ)
- Italian FineWeb2-HQ (epfml/FineWeb2-HQ)
- English Wiki40B (google/wiki40b)
- Italian Wiki40B (google/wiki40b)
How Many Tokens This Checkpoint Saw
Training math:
- sequence length: 2500
- batch size: 2
- grad accumulation: 48
- tokens per optimizer step: 239,904
So this checkpoint saw approximately:
- 3.5265888B tokens total by step_14700
- about 287.88M extra tokens during the decay-only continuation beyond the non-decayed anchor step_13500
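The totals above follow directly from the per-step figure, as the short check below shows. One detail is worth noting: 239,904 equals 2,499 × 2 × 48 rather than 2,500 × 2 × 48, which would be consistent with one position per packed sequence being dropped by the next-token shift. That interpretation is an assumption; the card itself only gives the per-step number.

```python
# Values stated in the card.
BATCH_SIZE = 2
GRAD_ACCUM = 48
TOKENS_PER_STEP = 239_904
RELEASE_STEP = 14_700
ANCHOR_STEP = 13_500

# Sequences consumed per optimizer step, and tokens counted per sequence.
sequences_per_step = BATCH_SIZE * GRAD_ACCUM
tokens_per_sequence = TOKENS_PER_STEP // sequences_per_step

# Total tokens seen by the released checkpoint, and the decay-only share.
total_tokens = RELEASE_STEP * TOKENS_PER_STEP
decay_tokens = (RELEASE_STEP - ANCHOR_STEP) * TOKENS_PER_STEP

print(f"tokens per sequence: {tokens_per_sequence}")        # 2499
print(f"total tokens:        {total_tokens / 1e9:.7f}B")    # 3.5265888B
print(f"decay-only tokens:   {decay_tokens / 1e6:.2f}M")    # 287.88M
```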
Why This Checkpoint Was Chosen
The final comparable GPU benchmark on the shortlisted medium family checkpoints kept the roles separate on purpose.
Pure benchmark/loss ranking:
- step_14250: val_loss_mixed = 4.4419
- step_14700: val_loss_mixed = 4.4436
- step_13500: val_loss_mixed = 4.4690
- step_14500: val_loss_mixed = 4.4926
- step_15700: val_loss_mixed = 4.5038
So step_14250 is still the scalar winner.
But the release decision for the single public medium base model used the practical read, not only the thinnest scalar margin:
- the loss gap between 14250 and 14700 is only about +0.0016
- 14700 is cleaner on the practical behavior proxies:
  - loop_rate = 0.375 vs 0.425
  - repeated_4gram_rate = 0.750 vs 0.775
  - language_consistency_en = 1.000 vs 0.950
  - language_consistency_it = 0.850 vs 0.825
- and in the checkpoint-specific decoding sweep, 14700 produced the strongest holdout result among the real release candidates
So this repo promotes the checkpoint that is the best compromise for an operational family base release, not just the one that wins the scalar leaderboard by the smallest possible edge.
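The trade-off described above can be made concrete with a small comparison. This is a sketch only: the dictionary layout and the `metrics_won` helper are hypothetical and not part of the repo, while the numbers are copied from the card. With the rounded losses shown here the gap computes to 0.0017; the card's "about +0.0016" presumably comes from the unrounded values.

```python
# Metric values copied from the card; the structure is illustrative only.
CANDIDATES = {
    "step_14250": {
        "val_loss_mixed": 4.4419,
        "loop_rate": 0.425,
        "repeated_4gram_rate": 0.775,
        "language_consistency_en": 0.950,
        "language_consistency_it": 0.825,
    },
    "step_14700": {
        "val_loss_mixed": 4.4436,
        "loop_rate": 0.375,
        "repeated_4gram_rate": 0.750,
        "language_consistency_en": 1.000,
        "language_consistency_it": 0.850,
    },
}

# Losses and repetition rates improve downward; language consistency improves upward.
LOWER_IS_BETTER = {"val_loss_mixed", "loop_rate", "repeated_4gram_rate"}


def metrics_won(name, other):
    """Return the metrics on which checkpoint `name` strictly beats `other`."""
    mine, theirs = CANDIDATES[name], CANDIDATES[other]
    won = []
    for metric, value in mine.items():
        if metric in LOWER_IS_BETTER:
            better = value < theirs[metric]
        else:
            better = value > theirs[metric]
        if better:
            won.append(metric)
    return sorted(won)


won_14700 = metrics_won("step_14700", "step_14250")
won_14250 = metrics_won("step_14250", "step_14700")
loss_gap = round(
    CANDIDATES["step_14700"]["val_loss_mixed"] - CANDIDATES["step_14250"]["val_loss_mixed"], 4
)

print("step_14700 wins:", won_14700)
print("step_14250 wins:", won_14250)
print("loss gap from rounded values:", loss_gap)
```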
Main Metrics for step_14700
- val_loss_mixed = 4.4436
- val_loss_en = 4.3929
- val_loss_it = 3.5830
- ppl_mixed = 85.0781
- ppl_en = 80.8710
- ppl_it = 35.9822
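The perplexity figures are consistent with the usual relation ppl = exp(loss) for a loss measured in nats per token. The check below recomputes them from the rounded losses, so small differences in the later decimals are expected; that the repo's benchmark computes perplexity exactly this way is an assumption.

```python
import math

# (validation loss, reported perplexity) pairs copied from the card.
REPORTED = {
    "mixed": (4.4436, 85.0781),
    "en": (4.3929, 80.8710),
    "it": (3.5830, 35.9822),
}

# Recompute perplexity as exp(loss) and record the absolute difference.
differences = {}
for split, (loss, reported_ppl) in REPORTED.items():
    recomputed = math.exp(loss)
    differences[split] = abs(recomputed - reported_ppl)
    print(f"{split:>5}: exp({loss}) = {recomputed:.4f}, reported {reported_ppl}")
```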
Behavior snapshot:
- loop_rate = 0.375
- distinct_2 = 0.5643
- repeated_4gram_rate = 0.750
- language_consistency_en = 1.000
- language_consistency_it = 0.850
Source losses:
- books_en = 4.4110
- books_it = 4.3904
- code = 7.7208
- web_en = 5.4893
- web_it = 5.4087
- wiki_en = 2.9984
- wiki_it = 2.9202
Short honest read:
- this is not the best pure scalar checkpoint
- it is the best practical balanced checkpoint of the medium family
- it keeps near-best loss while degrading less badly into loop/repetition than the stricter scalar winner
- this is the checkpoint to use when you want the official single-repo medium base of the family
Recommended Decoding
The repo-native decoding sweep was run on this exact checkpoint.
Raw sweep result:
- tuning winner: creative
- holdout winner: creative
Public default:
- keep balanced as the recommended preset for the published family-base card
- rationale:
  - Naz explicitly prefers balanced as the default unless creative wins clearly enough to justify the more aggressive preset
  - on this checkpoint, creative does win the holdout score, but not by a margin large enough to force a louder default for the public base release
  - practical delta:
    - creative holdout score = 2.6369
    - balanced holdout score = 2.4656
    - delta = +0.1713
  - so the repo keeps the stronger exploratory preset documented, but ships the calmer preset as the default recommendation
Recommended generation params (balanced):
- do_sample = true
- temperature = 0.8
- top_k = 50
- top_p = 0.95
- repetition_penalty = 1.1
- no_repeat_ngram_size = 0
- max_new_tokens = 64
Holdout metrics for the recommended preset:
- score = 2.4656
- completion_rate = 1.0
- distinct_2 = 0.9878
- language_consistency_mean = 0.6667
- loop_rate = 0.0
- repeated_4gram_rate = 0.0
- language_switch_rate_mean = 0.2500
- length_closeness = 0.9355
If you want the higher-scoring exploratory preset from the sweep instead:
- preset: creative
- temperature = 1.0
- top_k = 100
- holdout score = 2.6369
Both generation_config.json and recommended_decoding_params.json are
included in the repo.
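The two presets above can be kept as plain keyword dictionaries and passed to `generate`. This is a minimal sketch: the `generation_kwargs` helper is hypothetical, and because the card lists only `temperature` and `top_k` for creative, the sketch assumes that creative reuses every other balanced value. Check `recommended_decoding_params.json` for the settings actually shipped.

```python
# Balanced preset, copied from the card.
BALANCED = {
    "do_sample": True,
    "temperature": 0.8,
    "top_k": 50,
    "top_p": 0.95,
    "repetition_penalty": 1.1,
    "no_repeat_ngram_size": 0,
    "max_new_tokens": 64,
}

# Only these two creative values are given in the card.
# ASSUMPTION: all remaining parameters are inherited from the balanced preset.
CREATIVE_OVERRIDES = {"temperature": 1.0, "top_k": 100}


def generation_kwargs(preset="balanced"):
    """Return a fresh dict of generation keyword arguments for the named preset."""
    if preset == "balanced":
        return dict(BALANCED)
    if preset == "creative":
        return {**BALANCED, **CREATIVE_OVERRIDES}
    raise ValueError(f"unknown preset: {preset!r}")


balanced = generation_kwargs("balanced")
creative = generation_kwargs("creative")
print(balanced)
print(creative)
```

With a loaded model and tokenized inputs, the result could then be used as `model.generate(**inputs, **generation_kwargs("balanced"))`.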
Quick Start
from transformers import AutoTokenizer, AutoModelForCausalLM
import torch

repo_id = "nazdef/1gpu-llm-medium-en-it-base"
tokenizer = AutoTokenizer.from_pretrained(repo_id)
model = AutoModelForCausalLM.from_pretrained(repo_id)

prompt = "La capitale d'Italia è"

# Tokenize without special tokens, then prepend the BOS token explicitly.
prompt_ids = tokenizer(prompt, return_tensors="pt", add_special_tokens=False)
bos = torch.tensor([[tokenizer.bos_token_id]], dtype=prompt_ids["input_ids"].dtype)
input_ids = torch.cat([bos, prompt_ids["input_ids"]], dim=1)
attention_mask = torch.ones_like(input_ids)

# Sample with the recommended balanced preset.
outputs = model.generate(
    input_ids=input_ids,
    attention_mask=attention_mask,
    do_sample=True,
    max_new_tokens=64,
    temperature=0.8,
    top_k=50,
    top_p=0.95,
    repetition_penalty=1.1,
    eos_token_id=tokenizer.eos_token_id,
    pad_token_id=tokenizer.pad_token_id,
)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
Files Included
- original .pt checkpoint
- exported checkpoint-native .safetensors weights plus metadata sidecar
- standard Transformers model.safetensors
- Transformers config.json
- tokenizer files
- training config
- resumed-run telemetry (best_validation.json, metrics.jsonl, eval_metrics.jsonl, probe_generations.jsonl)
- repo-native benchmark bundle (summary.json, comparison.json, comparison.csv, metrics.json, metrics.csv, source_losses.json, report.md, generations.jsonl, generations_comparison.md, cloze_results.jsonl)
- decoding search bundle (decoding_summary.json, decoding_report.md, tuning_leaderboard.csv, holdout_leaderboard.csv, tuning_generations.jsonl, holdout_generations.jsonl)
- recommended generation settings (generation_config.json, recommended_decoding_params.json)
- release note release_note.md
Intended Use
Use this model as:
- the current medium bilingual base checkpoint of the 1gpu-llm family
- a base for future SFT or downstream instruction tuning
- a single-GPU from-scratch EN/IT medium reference model
Do not read this repo as:
- proof that 14700 is the best checkpoint on every possible axis
- a claim that it beats the scalar winner 14250 on pure loss
- an instruction-following or safety-tuned assistant model
License
This release is published under CC-BY-SA-4.0, chosen as the practical downstream
posture for the mixed training corpus used here.
The training mix includes:
- FineWeb-HQ / FineWeb2-HQ web data
- Wiki40B English and Italian slices
Downstream users are responsible for checking whether their use, redistribution, or derivative packaging remains compatible with the obligations of the upstream datasets and their terms.