nanochat-de-537m
A compact, German-only chat language model (537M), trained from scratch.
Model description
nanochat-de-537m is a small, German-only language model with about 537M parameters, trained from scratch (not derived from any existing model) using the nanochat framework. It was built as an academic project to explore how a language model can be trained end-to-end with modest resources.
The model is usable directly with 🤗 transformers (trust_remote_code=True); the
nanochat architecture ships alongside the weights as custom code.
| Property | Value |
|---|---|
| Architecture | nanochat GPT (RoPE, RMSNorm, ReLU², GQA, QK-norm, value embeddings, logit softcap) |
| Parameters | 537M total (201M non-embedding) |
| Layers / dim / heads | 16 / 1024 / 8 |
| Context length | 2048 tokens |
| Vocabulary | 32,768 (BPE / tiktoken) |
| Language | German |
| License | MIT |
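The parameter figures in the table can be roughly reproduced from the layer count, model dimension, and vocabulary size. The sketch below is a back-of-the-envelope check, not the author's accounting: the per-block layout (four full-width attention projections, a 4x-wide two-matrix MLP) and the split of embedding parameters into a token embedding, an untied output head, and one value-embedding table on every second layer are assumptions chosen because they match the stated totals. Small scalar parameters are ignored.

```python
# Figures taken from the table above.
n_layer, d_model, vocab_size = 16, 1024, 32_768

# ASSUMPTION: each block has four d x d attention projections (Q, K, V, output,
# i.e. as many KV heads as query heads) and an MLP with two d x 4d matrices.
attn_params = 4 * d_model * d_model
mlp_params = 2 * d_model * (4 * d_model)
non_embedding = n_layer * (attn_params + mlp_params)

# ASSUMPTION: embedding-type tables are the token embedding, an untied output
# head, and one value-embedding table on every second layer.
n_embedding_tables = 2 + n_layer // 2
embedding = n_embedding_tables * vocab_size * d_model

total = non_embedding + embedding
print(f"non-embedding: {non_embedding:,}")
print(f"embedding:     {embedding:,}")
print(f"total:         {total:,}")
```

Under these assumptions the non-embedding count rounds to 201M and the total to 537M, in line with the table.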
Usage
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

repo = "Mario12355/nanochat-de-537m"
# trust_remote_code=True is needed because the architecture ships as custom code.
tok = AutoTokenizer.from_pretrained(repo, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(repo, trust_remote_code=True).eval()
if torch.cuda.is_available():
    model = model.cuda()

messages = [{"role": "user", "content": "Wer bist du?"}]
# Render the chat template to token IDs and move them to the model's device.
ids = tok.apply_chat_template(messages, add_generation_prompt=True, return_tensors="pt").to(model.device)
out = model.generate(ids, max_new_tokens=200, do_sample=True,
                     temperature=0.3, top_k=20, repetition_penalty=1.3)
# Decode only the newly generated tokens, skipping the prompt.
print(tok.decode(out[0, ids.shape[1]:], skip_special_tokens=True))
Recommended sampling (also the defaults in generation_config.json):
temperature=0.3, top_k=20, repetition_penalty=1.3. Low temperature keeps this
small model coherent; the repetition penalty prevents degenerate loops.
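To make the effect of these three settings concrete, here is a minimal standard-library sketch of how they reshape a next-token distribution. It follows the usual definitions (repetition penalty applied to already-seen tokens, then temperature scaling, then top-k filtering) and is an illustration only, not the model's own generation code; the function names `next_token_probs` and `sample_next` are hypothetical.

```python
import math
import random

def next_token_probs(logits, prev_ids, temperature=0.3, top_k=20,
                     repetition_penalty=1.3):
    """Return {token_id: probability} after penalty, temperature and top-k."""
    scores = list(logits)
    # Repetition penalty: make every token that was already seen less likely.
    for t in set(prev_ids):
        if scores[t] > 0:
            scores[t] /= repetition_penalty
        else:
            scores[t] *= repetition_penalty
    # Temperature below 1 sharpens the distribution towards the best tokens.
    scores = [s / temperature for s in scores]
    # Top-k: keep only the k highest-scoring tokens.
    k = min(top_k, len(scores))
    keep = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)[:k]
    # Numerically stable softmax over the kept tokens.
    top = max(scores[i] for i in keep)
    exps = {i: math.exp(scores[i] - top) for i in keep}
    z = sum(exps.values())
    return {i: e / z for i, e in exps.items()}

def sample_next(logits, prev_ids, rng=random, **kwargs):
    """Draw one token ID from the reshaped distribution."""
    probs = next_token_probs(logits, prev_ids, **kwargs)
    ids = list(probs)
    return rng.choices(ids, weights=[probs[i] for i in ids], k=1)[0]

# Toy vocabulary of four tokens; token 0 has already been generated once.
toy_logits = [2.0, 1.9, 0.5, -1.0]
fresh = next_token_probs(toy_logits, prev_ids=[], top_k=2)
penalised = next_token_probs(toy_logits, prev_ids=[0], top_k=2)
```

In the toy example, token 0 is the most likely choice at first, but once it has been generated the penalty drops it below token 1, which is how the setting discourages loops.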
Training
- Pre-training: about 20B characters of German text from FineWeb-2 (German split, deu_Latn).
- Supervised fine-tuning (SFT): Mario12355/german-sft-mix, plus curated and synthetically generated identity dialogues.
- Hardware: 2× NVIDIA RTX 3090.
- Framework: nanochat.
Evaluation
| Metric | Value |
|---|---|
| Validation bits-per-byte (held-out German SFT) | 0.503 |
The model is strongest on everyday German conversation. It is intentionally small, prioritising transparency and efficiency over peak capability.
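For readers unfamiliar with the metric, bits-per-byte normalises the model's total negative log-likelihood by the UTF-8 byte length of the text, which makes it independent of the tokenizer. The sketch below shows the standard conversion from a summed cross-entropy in nats; how the 0.503 figure was actually computed is not stated here, and the function name `bits_per_byte` is hypothetical.

```python
import math

def bits_per_byte(total_nll_nats: float, text: str) -> float:
    """Convert a summed negative log-likelihood (nats) into bits per UTF-8 byte."""
    n_bytes = len(text.encode("utf-8"))
    return total_nll_nats / (math.log(2) * n_bytes)

# German umlauts and ß take two bytes each in UTF-8, so "Grüße" is 7 bytes.
sample = "Grüße"
# Made-up loss value, chosen so that the result is exactly half a bit per byte.
example_bpb = bits_per_byte(7 * math.log(2) * 0.5, sample)
```

Lower is better: a value near 0.5 means the model needs about half a bit, on average, to encode each byte of the held-out text.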
Intended use & limitations
Intended for: German-language conversation, education, and demonstrating how an LLM can be trained from scratch.
Limitations: as a small model it is weak at factual knowledge, mathematics, and complex reasoning; it is German-only, has no internet access, keeps no memory beyond the current conversation, and can hallucinate. Not intended for production use. The bundled inference implementation uses no KV cache: its output is correct, but generation slows down as outputs get longer.
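The cost of running without a KV cache can be estimated by counting how many token positions are fed through the model during generation. The sketch below is a rough proxy under the assumption that each step without a cache re-processes the entire sequence so far, while a cached implementation processes the prompt once and then one position per step; it is not a measurement of this model, and `positions_processed` is a hypothetical helper.

```python
def positions_processed(prompt_len: int, new_tokens: int, kv_cache: bool) -> int:
    """Count token positions fed through the model while generating new_tokens."""
    if new_tokens <= 0:
        return 0
    if kv_cache:
        # Prompt is processed once, then a single position per later step.
        return prompt_len + (new_tokens - 1)
    # Without a cache, step t re-processes the prompt plus the t tokens so far.
    return sum(prompt_len + t for t in range(new_tokens))

# Example: a 20-token prompt and the 200 new tokens used in the usage snippet.
without_cache = positions_processed(20, 200, kv_cache=False)
with_cache = positions_processed(20, 200, kv_cache=True)
```

For this example the uncached count is 23,900 positions against 219 with a cache, which is why long outputs are noticeably slower even though the generated text is the same.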
Citation
@misc{nanochat-de-537m,
title = {nanochat-de-537m: a German language model trained from scratch},
year = {2026},
howpublished = {\url{https://huggingface.co/Mario12355/nanochat-de-537m}}
}
Acknowledgements
Built on the nanochat framework by Andrej Karpathy.