Libraries

How to use nazdef/1gpu-llm-medium-en-it-base with Transformers:

```python
# Use a pipeline as a high-level helper
from transformers import pipeline

pipe = pipeline("text-generation", model="nazdef/1gpu-llm-medium-en-it-base")

# Load model directly
from transformers import AutoTokenizer, AutoModelForCausalLM

tokenizer = AutoTokenizer.from_pretrained("nazdef/1gpu-llm-medium-en-it-base")
model = AutoModelForCausalLM.from_pretrained("nazdef/1gpu-llm-medium-en-it-base")
```


vLLM

How to use nazdef/1gpu-llm-medium-en-it-base with vLLM:

Install from pip and serve model

```shell
# Install vLLM from pip:
pip install vllm
# Start the vLLM server:
vllm serve "nazdef/1gpu-llm-medium-en-it-base"
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:8000/v1/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "nazdef/1gpu-llm-medium-en-it-base",
		"prompt": "Once upon a time,",
		"max_tokens": 512,
		"temperature": 0.5
	}'
```
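The same completion request can also be sent from Python. The sketch below uses only the standard library and assumes the vLLM server started above is listening on `localhost:8000`. The helper names `build_completion_request` and `complete` are illustrative and are not part of vLLM.

```python
import json
import urllib.request

MODEL_ID = "nazdef/1gpu-llm-medium-en-it-base"


def build_completion_request(prompt, base_url="http://localhost:8000",
                             max_tokens=512, temperature=0.5):
    """Build (but do not send) a POST request for the /v1/completions endpoint."""
    payload = {
        "model": MODEL_ID,
        "prompt": prompt,
        "max_tokens": max_tokens,
        "temperature": temperature,
    }
    return urllib.request.Request(
        url=f"{base_url}/v1/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def complete(prompt, **kwargs):
    """Send the request and return the text of the first completion.

    Needs a running server; raises urllib.error.URLError otherwise.
    """
    request = build_completion_request(prompt, **kwargs)
    with urllib.request.urlopen(request) as response:
        body = json.load(response)
    return body["choices"][0]["text"]
```

With the server running, `complete("Once upon a time,")` should return the generated continuation as a string. Because this is a base model, expect free-form continuation rather than an answer to an instruction.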

Use Docker

```shell
docker model run hf.co/nazdef/1gpu-llm-medium-en-it-base
```

SGLang

How to use nazdef/1gpu-llm-medium-en-it-base with SGLang:

Install from pip and serve model

```shell
# Install SGLang from pip:
pip install sglang
# Start the SGLang server:
python3 -m sglang.launch_server \
    --model-path "nazdef/1gpu-llm-medium-en-it-base" \
    --host 0.0.0.0 \
    --port 30000
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "nazdef/1gpu-llm-medium-en-it-base",
		"prompt": "Once upon a time,",
		"max_tokens": 512,
		"temperature": 0.5
	}'
```

Use Docker images

```shell
docker run --gpus all \
    --shm-size 32g \
    -p 30000:30000 \
    -v ~/.cache/huggingface:/root/.cache/huggingface \
    --env "HF_TOKEN=<secret>" \
    --ipc=host \
    lmsysorg/sglang:latest \
    python3 -m sglang.launch_server \
        --model-path "nazdef/1gpu-llm-medium-en-it-base" \
        --host 0.0.0.0 \
        --port 30000
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "nazdef/1gpu-llm-medium-en-it-base",
		"prompt": "Once upon a time,",
		"max_tokens": 512,
		"temperature": 0.5
	}'
```

Docker Model Runner
How to use nazdef/1gpu-llm-medium-en-it-base with Docker Model Runner:
```
docker model run hf.co/nazdef/1gpu-llm-medium-en-it-base
```

1gpu-llm Medium EN/IT Base

This repository is the current ready-to-use base release for the 1gpu-llm medium EN/IT family.

1gpu-llm is a family of language models trained from scratch on a single consumer GPU.

For this release family, the reference training hardware is:

GPU: NVIDIA GeForce RTX 4060 Ti 16GB
training setup: single GPU
approximate wall-clock time: about 5 days on this hardware to reach a medium base release of the current class

Concretely, this release packages step_14700, the practical winner among the GPT2PreLN decay-family checkpoints:

family name: 1gpu-llm
model tier: medium
languages: English + Italian
context window: 2500 tokens
architecture: GPT-2-style decoder with pre-layernorm blocks
architecture config: architecture = gpt2, block_type = gpt2_prelayernorm
parameter count: 337,639,424 parameters (~337.639M) in the published Transformers export
released checkpoint: step_14700.pt
checkpoint role: official operational medium base for the current family
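As a sanity check on the parameter count, the sketch below counts the parameters of a GPT-2-style decoder with tied input/output embeddings. The hyperparameters passed in (24 layers, hidden width 1024, 32,000-token vocabulary) are assumptions chosen because they reproduce the published total with a 2500-token context. They are not read from this repo's config.json, which remains the authoritative source.

```python
def gpt2_param_count(vocab_size, n_positions, d_model, n_layers):
    """Parameter count of a GPT-2-style decoder with tied token embeddings."""
    token_emb = vocab_size * d_model      # shared with the output projection
    pos_emb = n_positions * d_model       # learned absolute position embeddings
    # Per block: attention (4*d^2 weights + 4*d biases),
    # MLP with 4x expansion (8*d^2 weights + 5*d biases),
    # and two LayerNorms (2*d parameters each).
    per_block = 12 * d_model**2 + 13 * d_model
    final_ln = 2 * d_model
    return token_emb + pos_emb + n_layers * per_block + final_ln


# Hypothetical values (assumptions, not taken from config.json).
count = gpt2_param_count(vocab_size=32_000, n_positions=2_500,
                         d_model=1_024, n_layers=24)
print(f"{count:,}")
```

Moving the LayerNorms to the pre-layernorm position changes where they are applied, not how many parameters they have, so the same formula covers both block types.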

This is a base model, not an instruction-tuned chat model.

Provenance

non-decayed anchor run:
- stable-recipe-gpt2medium-gpt2preln-k20-wsd-lr2e-4-anchor20k-final2e5-webwiki
non-decayed anchor checkpoint:
- step_13500.pt
original decay-only parent run:
- 20260628_resume-gpt2medium-gpt2preln-k20-wsddecayonly-lr2e-4-anchor20k-final2e5-webwiki-step13500
replayed tail run:
- 20260629_resume-gpt2medium-gpt2preln-k20-wsddecayonly-rerunmissing-lr3p5294e5-anchor20k-final2e5-webwiki-step14200-to14850
released checkpoint:
- step_14700.pt

Practical reading:

the family produced three checkpoints with distinct roles:
- step_14250 = best pure scalar / benchmark checkpoint
- step_14700 = best practical balanced release candidate
- step_14500 = best behavior-oriented variant
this repo is the public medium base release, so it intentionally promotes step_14700 rather than the raw scalar champion step_14250

Training Data

This model was trained on the bilingual EN/IT web + wiki dataset:

dataset id on disk:
- 202605141153_fineweb50_wiki50_50en_50it_score100_2500context_5Btokens_tok_20260515_en50it50_webwiki_stratified_500M
context window during training: 2500 tokens
packing length: 2500
mixing strategy: source_balanced
validation ratio: 0.05

Main source groups:

English FineWeb-HQ (epfml/FineWeb-HQ)
Italian FineWeb2-HQ (epfml/FineWeb2-HQ)
English Wiki40B (google/wiki40b)
Italian Wiki40B (google/wiki40b)

How Many Tokens This Checkpoint Saw

Training math:

sequence length: 2500
batch size: 2
grad accumulation: 48
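tokens per optimizer step: 239,904 (2,499 × 2 × 48; each packed 2,500-token sequence contributes 2,499 counted tokens, which would be consistent with next-token targets)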

So this checkpoint saw approximately:

about 3.527B tokens in total by step_14700 (14,700 × 239,904 = 3,526,588,800)
about 287.88M extra tokens during the decay-only continuation beyond the non-decayed anchor step_13500 (1,200 × 239,904 = 287,884,800)
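The token totals above can be reproduced with a few lines of arithmetic. The sketch below takes the stated 239,904 tokens per optimizer step as given and checks that it equals 2,499 × 2 × 48. Reading this as one fewer than the sequence length because of the next-token shift is an assumption, not something the card states.

```python
SEQ_LEN = 2500
BATCH_SIZE = 2
GRAD_ACCUM = 48
TOKENS_PER_STEP = 239_904  # value stated in this card

# Assumption: each packed sequence contributes SEQ_LEN - 1 counted tokens.
assert (SEQ_LEN - 1) * BATCH_SIZE * GRAD_ACCUM == TOKENS_PER_STEP


def tokens_seen(step):
    """Total tokens processed after `step` optimizer steps."""
    return step * TOKENS_PER_STEP


total_tokens = tokens_seen(14_700)
decay_only_tokens = tokens_seen(14_700) - tokens_seen(13_500)

print(f"total by step_14700: {total_tokens:,}")
print(f"decay-only continuation: {decay_only_tokens:,}")
```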

Why This Checkpoint Was Chosen

The final like-for-like GPU benchmark across the shortlisted medium-family checkpoints deliberately kept these roles separate.

Pure benchmark/loss ranking:

step_14250: val_loss_mixed = 4.4419
step_14700: val_loss_mixed = 4.4436
step_13500: val_loss_mixed = 4.4690
step_14500: val_loss_mixed = 4.4926
step_15700: val_loss_mixed = 4.5038

So step_14250 is still the scalar winner.

But the release decision for the single public medium base model used the practical read, not only the thinnest scalar margin:

the loss gap between 14250 and 14700 is only about +0.0016
14700 is cleaner on the practical behavior proxies (values given as step_14700 vs step_14250):
- loop_rate = 0.375 vs 0.425
- repeated_4gram_rate = 0.750 vs 0.775
- language_consistency_en = 1.000 vs 0.950
- language_consistency_it = 0.850 vs 0.825
and in the checkpoint-specific decoding sweep, 14700 produced the strongest holdout result among the real release candidates

So this repo promotes the checkpoint that is the best compromise for an operational family base release, not just the one that wins the scalar leaderboard by the smallest possible edge.
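The trade-off can be laid out metric by metric. The sketch below copies the figures from this card and lists the metrics on which step_14700 beats step_14250. The direction of each metric (lower or higher is better) follows the card's description of 14700 as "cleaner". This is only an illustration of the comparison, not the selection procedure used for the release.

```python
# Each entry is (step_14700, step_14250), copied from this card.
LOWER_IS_BETTER = {
    "val_loss_mixed": (4.4436, 4.4419),
    "loop_rate": (0.375, 0.425),
    "repeated_4gram_rate": (0.750, 0.775),
}
HIGHER_IS_BETTER = {
    "language_consistency_en": (1.000, 0.950),
    "language_consistency_it": (0.850, 0.825),
}


def metrics_won_by_14700():
    """Return the metric names on which step_14700 is strictly better."""
    wins = [name for name, (a, b) in LOWER_IS_BETTER.items() if a < b]
    wins += [name for name, (a, b) in HIGHER_IS_BETTER.items() if a > b]
    return wins


wins = metrics_won_by_14700()
print(wins)
```

On these numbers step_14700 wins all four behavior proxies and loses only the mixed validation loss, which matches the reasoning given above.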

Main Metrics for `step_14700`

val_loss_mixed = 4.4436
val_loss_en = 4.3929
val_loss_it = 3.5830
ppl_mixed = 85.0781
ppl_en = 80.8710
ppl_it = 35.9822
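The perplexities follow from the losses as ppl = exp(loss), assuming the losses are mean per-token cross-entropy in nats. The check below recomputes them from the rounded losses above. Small differences in the last digits are expected because the published losses are rounded to four decimals.

```python
import math

# (validation loss, reported perplexity), copied from this card.
REPORTED = {
    "mixed": (4.4436, 85.0781),
    "en": (4.3929, 80.8710),
    "it": (3.5830, 35.9822),
}


def perplexity(loss):
    """Perplexity for a mean per-token cross-entropy loss given in nats."""
    return math.exp(loss)


for name, (loss, reported_ppl) in REPORTED.items():
    print(f"{name}: exp({loss}) = {perplexity(loss):.4f} (card: {reported_ppl})")
```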

Behavior snapshot:

loop_rate = 0.375
distinct_2 = 0.5643
repeated_4gram_rate = 0.750
language_consistency_en = 1.000
language_consistency_it = 0.850
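This card does not define the behavior metrics. For orientation, the sketch below shows common definitions of a distinct-n ratio and of a repeated-n-gram rate over whitespace tokens. These definitions, the tokenization, and all function names are assumptions, so the repo's benchmark code may compute `distinct_2` and `repeated_4gram_rate` differently.

```python
def ngrams(tokens, n):
    """All contiguous n-grams of a token list, as tuples."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def distinct_n(tokens, n=2):
    """Share of n-grams in one text that are unique (1.0 = no repetition)."""
    grams = ngrams(tokens, n)
    return len(set(grams)) / len(grams) if grams else 0.0


def has_repeated_ngram(tokens, n=4):
    """True if any n-gram occurs more than once in the text."""
    grams = ngrams(tokens, n)
    return len(set(grams)) < len(grams)


def repeated_ngram_rate(samples, n=4):
    """Share of generated samples that contain at least one repeated n-gram."""
    if not samples:
        return 0.0
    return sum(has_repeated_ngram(s, n) for s in samples) / len(samples)


looping = "the cat sat on the cat sat on".split()
clean = "a b c d e".split()
print(distinct_n(looping, 2), repeated_ngram_rate([looping, clean], 4))
```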

Source losses:

books_en = 4.4110
books_it = 4.3904
code = 7.7208
web_en = 5.4893
web_it = 5.4087
wiki_en = 2.9984
wiki_it = 2.9202

Short honest read:

this is not the best pure scalar checkpoint
it is the best practical balanced checkpoint of the medium family
it keeps near-best loss while showing less looping and repetition than the scalar winner
this is the checkpoint to use when you want the official single-repo medium base of the family

Recommended Decoding

The repo-native decoding sweep was run on this exact checkpoint.

Raw sweep result:

tuning winner: creative
holdout winner: creative

Public default:

keep balanced as the recommended preset for the published family-base card
rationale:
- Naz explicitly prefers balanced as the default unless creative wins clearly enough to justify the more aggressive preset
- on this checkpoint, creative does win the holdout score, but not by a margin large enough to force a louder default for the public base release
- practical delta:
  - creative holdout score = 2.6369
  - balanced holdout score = 2.4656
  - delta = +0.1713
- so the repo keeps the stronger exploratory preset documented, but ships the calmer preset as the default recommendation

Recommended generation params (balanced):

do_sample = true
temperature = 0.8
top_k = 50
top_p = 0.95
repetition_penalty = 1.1
no_repeat_ngram_size = 0
max_new_tokens = 64

Holdout metrics for the recommended preset:

score = 2.4656
completion_rate = 1.0
distinct_2 = 0.9878
language_consistency_mean = 0.6667
loop_rate = 0.0
repeated_4gram_rate = 0.0
language_switch_rate_mean = 0.2500
length_closeness = 0.9355

If you want the higher-scoring exploratory preset from the sweep instead:

creative
- temperature = 1.0
- top_k = 100
- holdout score = 2.6369

Both generation_config.json and recommended_decoding_params.json are included in the repo.

Quick Start

```python
from transformers import AutoTokenizer, AutoModelForCausalLM
import torch

repo_id = "nazdef/1gpu-llm-medium-en-it-base"

tokenizer = AutoTokenizer.from_pretrained(repo_id)
model = AutoModelForCausalLM.from_pretrained(repo_id)

prompt = "La capitale d'Italia è"
# Tokenize without automatic special tokens, then prepend BOS by hand so the
# prompt starts with exactly one BOS token.
prompt_ids = tokenizer(prompt, return_tensors="pt", add_special_tokens=False)
bos = torch.tensor([[tokenizer.bos_token_id]], dtype=prompt_ids["input_ids"].dtype)
input_ids = torch.cat([bos, prompt_ids["input_ids"]], dim=1)
# Single unpadded sequence, so every position is attended to.
attention_mask = torch.ones_like(input_ids)

# Sample with the recommended "balanced" preset.
outputs = model.generate(
    input_ids=input_ids,
    attention_mask=attention_mask,
    do_sample=True,
    max_new_tokens=64,
    temperature=0.8,
    top_k=50,
    top_p=0.95,
    repetition_penalty=1.1,
    eos_token_id=tokenizer.eos_token_id,
    pad_token_id=tokenizer.pad_token_id,
)
# The decoded text contains the prompt followed by the continuation.
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```

Files Included

original .pt checkpoint
exported checkpoint-native .safetensors weights plus metadata sidecar
standard Transformers model.safetensors
Transformers config.json
tokenizer files
training config
resumed-run telemetry (best_validation.json, metrics.jsonl, eval_metrics.jsonl, probe_generations.jsonl)
repo-native benchmark bundle (summary.json, comparison.json, comparison.csv, metrics.json, metrics.csv, source_losses.json, report.md, generations.jsonl, generations_comparison.md, cloze_results.jsonl)
decoding search bundle (decoding_summary.json, decoding_report.md, tuning_leaderboard.csv, holdout_leaderboard.csv, tuning_generations.jsonl, holdout_generations.jsonl)
recommended generation settings (generation_config.json, recommended_decoding_params.json)
release note (release_note.md)

Intended Use

Use this model as:

the current medium bilingual base checkpoint of the 1gpu-llm family
a base for future SFT or downstream instruction tuning
a single-GPU from-scratch EN/IT medium reference model

Do not read this repo as:

proof that 14700 is the best checkpoint on every possible axis
a claim that it beats the scalar winner 14250 on pure loss
an instruction-following or safety-tuned assistant model

License

This release is published under CC-BY-SA-4.0, chosen as a practical downstream license given the mixed training corpus used here.

The training mix includes:

FineWeb-HQ / FineWeb2-HQ web data
Wiki40B English and Italian slices

Downstream users are responsible for checking whether their use, redistribution, or derivative packaging remains compatible with the obligations of the upstream datasets and their terms.

Safetensors: model size 0.3B params, tensor type F32