Libraries

How to use Mario12355/nanochat-de-537m with Transformers:

# Use a pipeline as a high-level helper
from transformers import pipeline

pipe = pipeline("text-generation", model="Mario12355/nanochat-de-537m", trust_remote_code=True)
messages = [
    {"role": "user", "content": "Who are you?"},
]
pipe(messages)

# Load model directly
from transformers import AutoTokenizer, AutoModelForCausalLM

tokenizer = AutoTokenizer.from_pretrained("Mario12355/nanochat-de-537m", trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained("Mario12355/nanochat-de-537m", trust_remote_code=True)


vLLM

How to use Mario12355/nanochat-de-537m with vLLM:

Install from pip and serve model

# Install vLLM from pip:
pip install vllm
# Start the vLLM server:
vllm serve "Mario12355/nanochat-de-537m"
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:8000/v1/chat/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "Mario12355/nanochat-de-537m",
		"messages": [
			{
				"role": "user",
				"content": "What is the capital of France?"
			}
		]
	}'
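
The same request can be sent from Python instead of curl. The sketch below uses only the standard library and assumes the server started above is listening on localhost:8000. The helper names `build_chat_request` and `ask` are illustrative and not part of vLLM.

```python
import json
import urllib.request

BASE_URL = "http://localhost:8000"  # assumption: same host and port as the curl call
MODEL = "Mario12355/nanochat-de-537m"


def build_chat_request(user_text, base_url=BASE_URL, model=MODEL):
    """Build a POST request for the OpenAI-compatible chat endpoint."""
    payload = {
        "model": model,
        "messages": [{"role": "user", "content": user_text}],
    }
    return urllib.request.Request(
        f"{base_url}/v1/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def ask(user_text):
    """Send the request and return the assistant's reply text."""
    with urllib.request.urlopen(build_chat_request(user_text)) as resp:
        body = json.load(resp)
    # OpenAI-style responses carry the reply under choices[0].message.content.
    return body["choices"][0]["message"]["content"]
```

Calling `ask("Was ist die Hauptstadt von Frankreich?")` only works while the server is running. Building the request separately keeps the payload easy to inspect without any network access.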

Use Docker

docker model run hf.co/Mario12355/nanochat-de-537m

SGLang

How to use Mario12355/nanochat-de-537m with SGLang:

Install from pip and serve model

# Install SGLang from pip:
pip install sglang
# Start the SGLang server:
python3 -m sglang.launch_server \
    --model-path "Mario12355/nanochat-de-537m" \
    --host 0.0.0.0 \
    --port 30000
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/chat/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "Mario12355/nanochat-de-537m",
		"messages": [
			{
				"role": "user",
				"content": "What is the capital of France?"
			}
		]
	}'

Use Docker images

docker run --gpus all \
    --shm-size 32g \
    -p 30000:30000 \
    -v ~/.cache/huggingface:/root/.cache/huggingface \
    --env "HF_TOKEN=<secret>" \
    --ipc=host \
    lmsysorg/sglang:latest \
    python3 -m sglang.launch_server \
        --model-path "Mario12355/nanochat-de-537m" \
        --host 0.0.0.0 \
        --port 30000
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/chat/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "Mario12355/nanochat-de-537m",
		"messages": [
			{
				"role": "user",
				"content": "What is the capital of France?"
			}
		]
	}'


nanochat-de-537m

A compact, German-only chat language model (537M), trained from scratch.

Model description

nanochat-de-537m is a small, German-only language model with about 537M parameters, trained from scratch (not derived from any existing model) using the nanochat framework. It was built as an academic project to explore how a language model can be trained end-to-end with modest resources.

The model loads directly with 🤗 transformers. Because the nanochat architecture ships alongside the weights as custom code, loading requires trust_remote_code=True, which runs that code from the repository on your machine.


Architecture	nanochat GPT (RoPE, RMSNorm, ReLU², GQA, QK-norm, value embeddings, logit softcap)
Parameters	537M total (201M non-embedding)
Layers / dim / heads	16 / 1024 / 8
Context length	2048 tokens
Vocabulary	32,768 (BPE / tiktoken)
Language	German
License	MIT
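
The two parameter figures can be sanity-checked with a little arithmetic. The sketch below is one possible accounting and is not confirmed by the card. It assumes four full-width attention projections per layer (as many key/value heads as query heads), a 4x-wide two-matrix MLP without biases, untied input and output embeddings, and one value-embedding table on every second layer. Small per-layer scalars are ignored.

```python
# Figures taken from the table above.
n_layer, d_model, vocab = 16, 1024, 32_768

# Assumption: Q, K, V and output projections are each d_model x d_model,
# and the MLP is two matrices with a 4x hidden width, no biases.
attn_per_layer = 4 * d_model * d_model
mlp_per_layer = 2 * d_model * (4 * d_model)
non_embedding = n_layer * (attn_per_layer + mlp_per_layer)

# Assumption: untied input and output embeddings plus one value-embedding
# table on every second layer, each of shape vocab x d_model.
table = vocab * d_model
n_tables = 2 + n_layer // 2
embedding = n_tables * table

total = non_embedding + embedding
print(f"non-embedding: {non_embedding / 1e6:.0f}M")
print(f"embedding:     {embedding / 1e6:.0f}M")
print(f"total:         {total / 1e6:.0f}M")
```

Under these assumptions the counts round to 201M non-embedding and 537M total, matching the table. A different split, for example fewer key/value heads with more embedding tables, could produce similar totals.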

Usage

import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

repo = "Mario12355/nanochat-de-537m"
# trust_remote_code=True is required: the nanochat architecture is custom code in the repo.
tok = AutoTokenizer.from_pretrained(repo, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(repo, trust_remote_code=True).eval()
if torch.cuda.is_available():
    model = model.cuda()

# Build the prompt with the model's chat template ("Wer bist du?" means "Who are you?").
messages = [{"role": "user", "content": "Wer bist du?"}]
ids = tok.apply_chat_template(messages, add_generation_prompt=True, return_tensors="pt").to(model.device)
# Sample with the recommended settings.
out = model.generate(ids, max_new_tokens=200, do_sample=True,
                     temperature=0.3, top_k=20, repetition_penalty=1.3)
# Decode only the newly generated tokens, skipping the prompt.
print(tok.decode(out[0, ids.shape[1]:], skip_special_tokens=True))

Recommended sampling (also the defaults in generation_config.json): temperature=0.3, top_k=20, repetition_penalty=1.3. Low temperature keeps this small model coherent; the repetition penalty prevents degenerate loops.
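
To make the effect of these three settings concrete, here is a minimal pure-Python sketch of how they are commonly applied to next-token logits: repetition penalty first, then temperature, then top-k. The order and the function names are assumptions for illustration and not the model's actual generation code.

```python
import math
import random


def next_token_probs(logits, previous_ids, temperature=0.3, top_k=20,
                     repetition_penalty=1.3):
    """Return {token_id: probability} after the three adjustments."""
    scores = list(logits)
    # Repetition penalty: make every already-seen token less likely.
    for t in set(previous_ids):
        if scores[t] > 0:
            scores[t] /= repetition_penalty
        else:
            scores[t] *= repetition_penalty
    # Temperature: values below 1 sharpen the distribution.
    scores = [s / temperature for s in scores]
    # Top-k: keep only the k highest-scoring tokens.
    keep = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)[:top_k]
    # Softmax over the survivors (subtract the max for numerical stability).
    peak = max(scores[i] for i in keep)
    weights = {i: math.exp(scores[i] - peak) for i in keep}
    norm = sum(weights.values())
    return {i: w / norm for i, w in weights.items()}


def sample(probs, rng=random):
    """Draw one token id according to its probability."""
    ids = list(probs)
    return rng.choices(ids, weights=[probs[i] for i in ids], k=1)[0]


# Toy example: four-token vocabulary, token 0 was already generated.
toy_logits = [2.0, 1.5, 0.5, -1.0]
print(next_token_probs(toy_logits, previous_ids=[0], top_k=2))
```

In the toy example, token 0 has the highest raw logit, but the penalty shrinks its lead over token 1 because it was already generated. Setting `top_k=2` then removes the two least likely tokens entirely.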

Training

Pre-training: about 20B characters of German text from FineWeb-2 (German split, deu_Latn).
Supervised fine-tuning (SFT): Mario12355/german-sft-mix, plus curated and synthetically generated identity dialogues.
Hardware: 2× NVIDIA RTX 3090.
Framework: nanochat.

Evaluation

Metric	Value
Validation bits-per-byte (held-out German SFT)	0.503

The model is strongest on everyday German conversation. It is intentionally small, prioritising transparency and efficiency over peak capability.
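
Bits-per-byte normalises the model's cross-entropy by the number of UTF-8 bytes of text rather than by the number of tokens, which makes it comparable across tokenizers. The sketch below shows the usual definition. How the 0.503 figure was computed is not stated in this card, and the numbers in the example are hypothetical.

```python
import math


def bits_per_byte(token_losses_nats, token_byte_lengths):
    """Total cross-entropy in bits divided by total UTF-8 bytes."""
    if len(token_losses_nats) != len(token_byte_lengths):
        raise ValueError("need one loss and one byte length per token")
    total_bits = sum(token_losses_nats) / math.log(2)  # convert nats to bits
    total_bytes = sum(token_byte_lengths)
    return total_bits / total_bytes


# Hypothetical numbers: three tokens covering 3, 5 and 4 bytes of text.
losses = [2.1, 1.4, 0.7]
lengths = [3, 5, 4]
print(f"{bits_per_byte(losses, lengths):.3f} bits per byte")
```

Lower is better: a value of 0.5 means the model needs on average half a bit to encode each byte of the held-out text.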

Intended use & limitations

Intended for: German-language conversation, education, and demonstrating how an LLM can be trained from scratch.

Limitations: as a small model it is weak at factual knowledge, mathematics, and complex reasoning. It is German-only, has no internet access, keeps no memory beyond the current conversation, and can hallucinate. It is not intended for production use. The bundled inference implementation does not use a KV cache, so outputs are correct but generation slows down as outputs get longer.
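
The cost of generating without a KV cache can be estimated by counting how many token positions pass through the model. The sketch below is a rough proxy for compute, not a measurement of this model. The function name and the 20-token prompt are hypothetical.

```python
def positions_processed(prompt_len, new_tokens, kv_cache):
    """Count token positions fed through the model while generating."""
    if new_tokens <= 0:
        return 0
    if kv_cache:
        # The prompt is processed once, then one new position per step.
        return prompt_len + new_tokens - 1
    # Without a cache, step i re-runs the whole sequence of prompt_len + i tokens.
    return sum(prompt_len + i for i in range(new_tokens))


prompt_len, new_tokens = 20, 200  # hypothetical prompt; 200 matches max_new_tokens in Usage
without_cache = positions_processed(prompt_len, new_tokens, kv_cache=False)
with_cache = positions_processed(prompt_len, new_tokens, kv_cache=True)
print(without_cache, with_cache, f"{without_cache / with_cache:.0f}x")
```

For this example the uncached loop processes 23,900 positions against 219 with a cache, roughly 109 times more. The gap grows quadratically with output length, so short replies are affected far less than long ones.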

Citation

@misc{nanochat-de-537m,
  title  = {nanochat-de-537m: a German language model trained from scratch},
  year   = {2026},
  howpublished = {\url{https://huggingface.co/Mario12355/nanochat-de-537m}}
}

Acknowledgements

Built on the nanochat framework by Andrej Karpathy.
