Instructions for using ramankrishna10/npc-nano-0.5b-v2-math with libraries and local apps.

Libraries

How to use ramankrishna10/npc-nano-0.5b-v2-math with Transformers:

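```python
# Use a pipeline as a high-level helper
from transformers import pipeline

pipe = pipeline("text-generation", model="ramankrishna10/npc-nano-0.5b-v2-math")
messages = [
    {"role": "user", "content": "Who are you?"},
]
# This is a continued-pretraining checkpoint, not an instruction-tuned one,
# so chat-style replies may be weak (the v1 SFT model is recommended for general use).
print(pipe(messages, max_new_tokens=40))
```

```python
# Load model directly
from transformers import AutoTokenizer, AutoModelForCausalLM

model_id = "ramankrishna10/npc-nano-0.5b-v2-math"
tokenizer = AutoTokenizer.from_pretrained(model_id)
# device_map="auto" requires the accelerate package
model = AutoModelForCausalLM.from_pretrained(model_id, device_map="auto")
messages = [
    {"role": "user", "content": "Who are you?"},
]
# Format the conversation with the tokenizer's chat template and move it to the model's device
inputs = tokenizer.apply_chat_template(
    messages,
    add_generation_prompt=True,
    tokenize=True,
    return_dict=True,
    return_tensors="pt",
).to(model.device)

outputs = model.generate(**inputs, max_new_tokens=40)
# Decode only the newly generated tokens (slice off the prompt)
print(tokenizer.decode(outputs[0][inputs["input_ids"].shape[-1]:], skip_special_tokens=True))
```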
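Since this checkpoint is a continued-pretraining model and its GSM8K numbers come from 5-shot evaluation, plain-text few-shot prompting may be a better fit than chat messages. The sketch below builds such a prompt and trims the continuation; the `Question:`/`Answer:` layout is an assumption loosely modeled on common GSM8K few-shot formats, and the worked examples and helper names (`build_fewshot_prompt`, `cut_at_next_question`, `generate`) are hypothetical.

```python
from typing import Sequence, Tuple

# Hypothetical worked examples, for illustration only (not taken from GSM8K).
EXAMPLES = [
    ("Tom has 3 apples and buys 4 more. How many apples does he have?",
     "Tom starts with 3 apples and buys 4, so 3 + 4 = 7. The answer is 7."),
    ("A box holds 6 eggs. How many eggs are in 5 boxes?",
     "Each box has 6 eggs, so 5 * 6 = 30. The answer is 30."),
]


def build_fewshot_prompt(examples: Sequence[Tuple[str, str]], question: str) -> str:
    """Join solved Question/Answer pairs, then leave the final Answer open."""
    blocks = [f"Question: {q}\nAnswer: {a}" for q, a in examples]
    blocks.append(f"Question: {question}\nAnswer:")
    return "\n\n".join(blocks)


def cut_at_next_question(continuation: str) -> str:
    """A base model tends to keep writing new Q/A pairs; keep only the first answer."""
    return continuation.split("\n\nQuestion:")[0].strip()


def generate(prompt: str, max_new_tokens: int = 128) -> str:
    """Greedy continuation of a plain-text prompt (downloads the model on first use)."""
    from transformers import pipeline  # imported lazily so the helpers above need no extra deps

    pipe = pipeline("text-generation", model="ramankrishna10/npc-nano-0.5b-v2-math")
    out = pipe(prompt, max_new_tokens=max_new_tokens, do_sample=False, return_full_text=False)
    return cut_at_next_question(out[0]["generated_text"])
```

Usage: `print(generate(build_fewshot_prompt(EXAMPLES, "Sara reads 12 pages a day. How many pages does she read in 3 days?")))`. Given the GSM8K results reported for this model, expect the final answer to be wrong most of the time.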


vLLM

How to use ramankrishna10/npc-nano-0.5b-v2-math with vLLM:

Install from pip and serve model

```bash
# Install vLLM from pip:
pip install vllm
# Start the vLLM server (blocks; listens on port 8000 by default):
vllm serve "ramankrishna10/npc-nano-0.5b-v2-math"
# In a second terminal, call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:8000/v1/chat/completions" \
    -H "Content-Type: application/json" \
    --data '{
        "model": "ramankrishna10/npc-nano-0.5b-v2-math",
        "messages": [
            {
                "role": "user",
                "content": "What is the capital of France?"
            }
        ]
    }'
```
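Because this checkpoint is a continued-pretraining model, a raw-text prompt sent to the OpenAI-compatible completions route (`/v1/completions`, which vLLM also serves) may suit it better than the chat route shown above. The standard-library sketch below builds such a request; the helper names, prompt, `max_tokens`, and `temperature` values are illustrative assumptions.

```python
import json
import urllib.request


def build_completion_request(prompt: str,
                             model: str = "ramankrishna10/npc-nano-0.5b-v2-math",
                             base_url: str = "http://localhost:8000",
                             max_tokens: int = 64) -> urllib.request.Request:
    """Build (but do not send) a POST to the OpenAI-compatible /v1/completions route."""
    payload = {
        "model": model,
        "prompt": prompt,
        "max_tokens": max_tokens,
        "temperature": 0.0,  # greedy decoding for reproducible math answers
    }
    return urllib.request.Request(
        url=f"{base_url}/v1/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def complete(prompt: str, **kwargs) -> str:
    """Send the request and return the generated text (needs a running server)."""
    with urllib.request.urlopen(build_completion_request(prompt, **kwargs)) as resp:
        body = json.load(resp)
    return body["choices"][0]["text"]
```

With the server running, `print(complete("Question: What is 12 + 30?\nAnswer:"))` returns the model's raw continuation. Pass `base_url="http://localhost:30000"` to target the SGLang server instead.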

Use Docker

```
docker model run hf.co/ramankrishna10/npc-nano-0.5b-v2-math
```

SGLang

How to use ramankrishna10/npc-nano-0.5b-v2-math with SGLang:

Install from pip and serve model

```bash
# Install SGLang from pip:
pip install sglang
# Start the SGLang server (blocks):
python3 -m sglang.launch_server \
    --model-path "ramankrishna10/npc-nano-0.5b-v2-math" \
    --host 0.0.0.0 \
    --port 30000
# In a second terminal, call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/chat/completions" \
    -H "Content-Type: application/json" \
    --data '{
        "model": "ramankrishna10/npc-nano-0.5b-v2-math",
        "messages": [
            {
                "role": "user",
                "content": "What is the capital of France?"
            }
        ]
    }'
```
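Loading weights can take a while, so a request sent immediately after launch may be refused. A minimal polling sketch follows; it assumes the server exposes a `GET /health` route that returns HTTP 200 once ready (SGLang and vLLM both provide one, but confirm against your installed version), and the helper name and timeout values are arbitrary choices.

```python
import time
import urllib.request


def wait_until_ready(base_url: str = "http://localhost:30000",
                     timeout_s: float = 300.0,
                     interval_s: float = 2.0) -> bool:
    """Poll GET {base_url}/health until it returns HTTP 200 or the timeout expires."""
    deadline = time.monotonic() + timeout_s
    while time.monotonic() < deadline:
        try:
            with urllib.request.urlopen(f"{base_url}/health", timeout=interval_s) as resp:
                if resp.status == 200:
                    return True
        except OSError:
            # URLError, HTTPError, refused connections and timeouts all subclass OSError:
            # the server is not ready yet, so retry after a short pause.
            pass
        time.sleep(interval_s)
    return False
```

For example, `if wait_until_ready(): ...` before issuing the curl request; use `base_url="http://localhost:8000"` for the vLLM server.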

Use Docker images

```bash
# Replace <secret> with your Hugging Face access token.
docker run --gpus all \
    --shm-size 32g \
    -p 30000:30000 \
    -v ~/.cache/huggingface:/root/.cache/huggingface \
    --env "HF_TOKEN=<secret>" \
    --ipc=host \
    lmsysorg/sglang:latest \
    python3 -m sglang.launch_server \
        --model-path "ramankrishna10/npc-nano-0.5b-v2-math" \
        --host 0.0.0.0 \
        --port 30000
# In a second terminal, call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/chat/completions" \
    -H "Content-Type: application/json" \
    --data '{
        "model": "ramankrishna10/npc-nano-0.5b-v2-math",
        "messages": [
            {
                "role": "user",
                "content": "What is the capital of France?"
            }
        ]
    }'
```


NPC Nano 0.5B v2-Math

A continued-pretraining variant of NPC Nano 0.5B base, trained on an additional 15B math-focused tokens on top of the original 8.93B-token pretraining run.

This model is published as a documented negative result. It does not improve GSM8K arithmetic-reasoning accuracy over the v1 base. It is released for reproducibility and as supporting evidence for the capacity-bottleneck argument in the NPC Nano paper.

What this model is

Base: NPC Nano 0.5B (from-scratch, 501M params, 8.93B tokens)
Continued pretraining: +15B tokens (60% open-web-math, 30% arxiv, both from EleutherAI/proof-pile-2; 10% fineweb-edu anti-forgetting buffer)
Total training: ~24B tokens
License: Apache 2.0 (lineage preserved)
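Taking the mix percentages as shares of the +15B continued-pretraining tokens (an assumption; the card does not say whether they were applied by tokens or by documents), the nominal per-source budgets and overall scale work out as follows:

$$
\begin{aligned}
\text{open-web-math} &= 0.60 \times 15\,\text{B} = 9\,\text{B} \\
\text{arxiv} &= 0.30 \times 15\,\text{B} = 4.5\,\text{B} \\
\text{fineweb-edu} &= 0.10 \times 15\,\text{B} = 1.5\,\text{B} \\
\text{total} &= 8.93\,\text{B} + 15\,\text{B} = 23.93\,\text{B} \approx 24\,\text{B} \\
\frac{15}{8.93} &\approx 1.68, \qquad \frac{23.93}{8.93} \approx 2.68
\end{aligned}
$$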

Results

The headline finding: 15B additional math-dense tokens (≈1.7× the original 8.93B-token budget, bringing total training to ≈2.7× that budget; ≈6× the math weighting) produced no measurable GSM8K improvement at 0.5B parameters.

| Metric | v1 base | v2-math (+15B) | Δ |
|---|---:|---:|---:|
| GSM8K (5-shot, flex) | 1.67% | 1.82% | +0.15pp (within noise) |
| ARC-easy | 49.96% | 58.71% | +8.75pp |
| HellaSwag | 36.82% | 36.77% | −0.05pp |
| PIQA | 65.02% | 65.45% | +0.43pp |
| OpenBookQA | 30.00% | 30.00% | +0.00pp |
| WinoGrande | 49.49% | 50.28% | +0.79pp |
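The "flex" label on the GSM8K row most likely refers to lm-evaluation-harness's `flexible-extract` filter, which scores a generation by the last number-like span it contains rather than requiring a `#### <answer>` line. The sketch below approximates that scoring rule. The regex mirrors the harness's GSM8K config as best understood here; the digit filter, normalization, and helper names (`extract_last_number`, `is_correct`) are simplifications invented for this sketch, so treat it as illustrative rather than as the harness's exact behavior.

```python
import re
from typing import Optional

# Approximation of a "last number in the text" extractor, modeled on the
# flexible-extract pattern in lm-evaluation-harness's GSM8K task (assumption).
_NUMBER_RE = re.compile(r"-?[$0-9.,]{2,}|-?[0-9]+")


def extract_last_number(text: str) -> Optional[str]:
    """Return the last number-like span in `text`, normalized, or None."""
    # Keep only matches that contain at least one digit (drops runs like "...").
    candidates = [m for m in _NUMBER_RE.findall(text) if any(c.isdigit() for c in m)]
    if not candidates:
        return None
    # Drop currency symbols and thousands separators, then a trailing period.
    return candidates[-1].replace("$", "").replace(",", "").rstrip(".")


def is_correct(generation: str, reference: str) -> bool:
    """Compare the extracted prediction with a reference answer string."""
    predicted = extract_last_number(generation)
    return predicted is not None and predicted == reference.replace(",", "").strip()
```

Because only the final number counts, a solution with correct intermediate steps but a wrong last figure scores zero, which is part of why small models sit near the floor on this metric.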

GSM8K trajectory across checkpoints: 1.67% (v1) → 1.82% (+3B) → 1.90% (+7B) → 1.82% (+15B). Every delta is within one standard error (±0.37pp).
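The ±0.37pp figure is consistent with a simple binomial standard error over GSM8K's 1,319 test problems at ≈1.8% accuracy. The sketch below reproduces that arithmetic and checks each checkpoint's delta against it. It assumes the plain √(p(1−p)/n) formula, which may differ slightly from the sample-variance estimate lm-evaluation-harness reports; the helper name is hypothetical.

```python
import math

GSM8K_TEST_SIZE = 1319  # number of problems in the GSM8K test split


def binomial_stderr_pp(accuracy_pct: float, n: int = GSM8K_TEST_SIZE) -> float:
    """Standard error of an accuracy estimate, in percentage points."""
    p = accuracy_pct / 100.0
    return 100.0 * math.sqrt(p * (1.0 - p) / n)


# GSM8K accuracy (%) per checkpoint, from the trajectory reported above.
trajectory = {"v1": 1.67, "+3B": 1.82, "+7B": 1.90, "+15B": 1.82}

se = binomial_stderr_pp(trajectory["+15B"])
deltas = {k: round(v - trajectory["v1"], 2) for k, v in trajectory.items() if k != "v1"}

print(f"standard error: ±{se:.2f}pp")
for checkpoint, delta in deltas.items():
    print(f"{checkpoint}: {delta:+.2f}pp, within one SE: {abs(delta) < se}")
```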

Interpretation

The one clear signal is ARC-easy, up 8.75pp (science multiple-choice), a gain that had already saturated by the +3B checkpoint. The model demonstrably absorbed the math/science distribution: it got better at recognizing correct scientific answers, but not at generating multi-step arithmetic solutions (GSM8K).

This sharpens the capacity-bottleneck argument from the v1 paper: at 0.5B parameters, the GSM8K ceiling is not purely a matter of insufficient math exposure during pretraining. Substantially more math content moved a related capability (science multiple-choice) but not arithmetic generation. The bottleneck therefore appears to be the model's capacity for the specific skill of multi-step numerical generation, rather than its exposure to math content.

Intended use

This model is primarily of interest for:

Reproducing the continued-pretraining experiment in the NPC Nano paper's Future Work section
Studying capability transfer vs. non-transfer at small scale
Applications that specifically benefit from the improved science-MCQ capability (ARC-easy)

For general use, the v1 SFT model remains the recommended NPC Nano artifact.

Honest notes

Held-out perplexity was not measured for this run (the validation split was cleaned up during the multi-week training run). The 6-task lm-eval suite is the authoritative signal.
Training spanned two W&B runs because of a single pod restart; it resumed cleanly from checkpoint, so the training itself was contiguous.

Citation

Built on the methodology documented in:

@misc{bachu2026npcnano,
  author    = {Bachu, Rama Krishna},
  title     = {NPC Nano 0.5B: From-Scratch Pretraining and the Post-Training
               Capability Ceiling at Sub-1B Parameters},
  year      = {2026},
  publisher = {Zenodo},
  doi       = {10.5281/zenodo.20349362},
  url       = {https://doi.org/10.5281/zenodo.20349362}
}

Attribution

Continued-pretraining data: EleutherAI/proof-pile-2 (open-web-math, arxiv subsets) and HuggingFaceFW/fineweb-edu.

Author: Rama Krishna Bachu / Bottensor (Independent Research). ORCID 0009-0000-1298-0681.


Format: Safetensors
Model size: 0.5B params
Tensor type: BF16
Base model: ramankrishna10/npc-nano-0.5b-base