Instructions to use ramankrishna10/npc-nano-0.5b-sft with libraries and local apps.

Libraries

How to use ramankrishna10/npc-nano-0.5b-sft with Transformers:

# Use a pipeline as a high-level helper
from transformers import pipeline

pipe = pipeline("text-generation", model="ramankrishna10/npc-nano-0.5b-sft")
messages = [
    {"role": "user", "content": "Who are you?"},
]
pipe(messages)

# Load model directly
from transformers import AutoTokenizer, AutoModelForCausalLM

tokenizer = AutoTokenizer.from_pretrained("ramankrishna10/npc-nano-0.5b-sft")
model = AutoModelForCausalLM.from_pretrained("ramankrishna10/npc-nano-0.5b-sft")
messages = [
    {"role": "user", "content": "Who are you?"},
]
inputs = tokenizer.apply_chat_template(
    messages,
    add_generation_prompt=True,
    tokenize=True,
    return_dict=True,
    return_tensors="pt",
).to(model.device)

outputs = model.generate(**inputs, max_new_tokens=40)
print(tokenizer.decode(outputs[0][inputs["input_ids"].shape[-1]:]))

Local Apps

vLLM

How to use ramankrishna10/npc-nano-0.5b-sft with vLLM:

Install from pip and serve model

# Install vLLM from pip:
pip install vllm
# Start the vLLM server:
vllm serve "ramankrishna10/npc-nano-0.5b-sft"
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:8000/v1/chat/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "ramankrishna10/npc-nano-0.5b-sft",
		"messages": [
			{
				"role": "user",
				"content": "What is the capital of France?"
			}
		]
	}'
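
The same request can also be sent from Python. The following is a minimal sketch using only the standard library. It assumes the server started above is listening on localhost:8000 and returns the usual OpenAI-style chat-completions response shape; the helper names `build_chat_request` and `chat` are illustrative and not part of vLLM.

```python
import json
import urllib.request

MODEL_ID = "ramankrishna10/npc-nano-0.5b-sft"


def build_chat_request(prompt, base_url="http://localhost:8000"):
    """Build a POST request for the OpenAI-compatible chat endpoint."""
    payload = {
        "model": MODEL_ID,
        "messages": [{"role": "user", "content": prompt}],
    }
    return urllib.request.Request(
        url=f"{base_url}/v1/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def chat(prompt, base_url="http://localhost:8000"):
    """Send the prompt and return the assistant's reply text."""
    with urllib.request.urlopen(build_chat_request(prompt, base_url)) as resp:
        body = json.load(resp)
    # Assumed response shape: choices[0].message.content (OpenAI chat format).
    return body["choices"][0]["message"]["content"]


# Usage (requires a running server):
# print(chat("What is the capital of France?"))
```

Passing `base_url="http://localhost:30000"` should work the same way against a server listening on that port.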

Use Docker

docker model run hf.co/ramankrishna10/npc-nano-0.5b-sft

SGLang

How to use ramankrishna10/npc-nano-0.5b-sft with SGLang:

Install from pip and serve model

# Install SGLang from pip:
pip install sglang
# Start the SGLang server:
python3 -m sglang.launch_server \
    --model-path "ramankrishna10/npc-nano-0.5b-sft" \
    --host 0.0.0.0 \
    --port 30000
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/chat/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "ramankrishna10/npc-nano-0.5b-sft",
		"messages": [
			{
				"role": "user",
				"content": "What is the capital of France?"
			}
		]
	}'

Use Docker images

docker run --gpus all \
    --shm-size 32g \
    -p 30000:30000 \
    -v ~/.cache/huggingface:/root/.cache/huggingface \
    --env "HF_TOKEN=<secret>" \
    --ipc=host \
    lmsysorg/sglang:latest \
    python3 -m sglang.launch_server \
        --model-path "ramankrishna10/npc-nano-0.5b-sft" \
        --host 0.0.0.0 \
        --port 30000
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/chat/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "ramankrishna10/npc-nano-0.5b-sft",
		"messages": [
			{
				"role": "user",
				"content": "What is the capital of France?"
			}
		]
	}'

Docker Model Runner
How to use ramankrishna10/npc-nano-0.5b-sft with Docker Model Runner:
```
docker model run hf.co/ramankrishna10/npc-nano-0.5b-sft
```

NPC Nano 0.5B — SFT

Instruction-tuned 0.5B parameter language model from the Bottensor NPC family. SFT-warmed from npc-nano-0.5b-base, itself pretrained from scratch on 8.93B tokens.

Author: Rama Krishna Bachu (ORCID 0009-0000-1298-0681)
Affiliation: Bottensor (Independent Research)
License: Apache 2.0
Paper: NPC Nano 0.5B: From-Scratch Pretraining and GRPO Post-Training on a Single A40 (forthcoming on Zenodo)

Part of the NPC model family alongside NPC Fast 1.7B, NPC Fin 32B, NPC Fin-PRM 7B, and NPC Agentic 7B v3. NPC Nano is the first from-scratch pretrained model in the family.

Architecture

24 layers, 1024 hidden, 16 heads, head_dim 64, ffn_dim 4992 (SwiGLU sized so total params hit ~500M; see npc-nano-0.5b-base for the design rationale)
SwiGLU, RMSNorm, RoPE, tied embeddings
Vocabulary: 32K BPE (trained from scratch on the pretraining corpus)
Context: 2048
Precision: bfloat16
Total parameters: 501,531,648
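
The total can be checked from the dimensions listed above. This is a back-of-the-envelope sketch, not the model's own code. It assumes a vocabulary of exactly 32,000 tokens (the card only says "32K"), no bias terms, full multi-head attention with Q/K/V/O projections, two RMSNorm weight vectors per layer plus one final norm, and no learned parameters for RoPE. Under those assumptions the count matches the stated total exactly.

```python
def count_params(
    n_layers=24,
    d_model=1024,
    n_heads=16,
    head_dim=64,
    ffn_dim=4992,
    vocab_size=32_000,  # assumption: "32K" means exactly 32,000
):
    attn = 4 * d_model * (n_heads * head_dim)  # Q, K, V, O projections, no biases
    mlp = 3 * d_model * ffn_dim                # SwiGLU: gate, up, and down projections
    norms = 2 * d_model                        # two RMSNorm weight vectors per layer
    per_layer = attn + mlp + norms
    embeddings = vocab_size * d_model          # tied with the output head, counted once
    final_norm = d_model
    return n_layers * per_layer + embeddings + final_norm


print(f"{count_params():,}")  # 501,531,648
```

The three SwiGLU matrices account for most of each layer (about 15.3M of 19.5M parameters), which is why ffn_dim is the natural knob for landing on a target size.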

SFT recipe

Base: ramankrishna10/npc-nano-0.5b-base
Training data mix (20,000 examples):
- 60% OpenHermes-2.5 (instruction-following)
- 20% MetaMathQA (chain-of-thought math; substituted for OpenMathReasoning for loader compatibility)
- 15% identity dataset (3,000 examples across 3 cohorts: direct, family, adversarial; ~24% with system prompts, ~76% without)
- 5% Magicoder-Evol-Instruct (code instructions)
Hyperparameters: full fine-tune, LR 5e-5 cosine with 3% warmup, AdamW (β₁=0.9, β₂=0.95, wd 0.1), grad_clip 1.0
Effective batch ~64 sequences, seq_len 2048
Loss masking on user/system turns (assistant-only loss via TRL DataCollatorForCompletionOnlyLM)
2 epochs total (1 initial + 1 escalation with 2× identity oversample for sibling-recall improvement)
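
To make the recipe numbers concrete, the sketch below converts the mix percentages into example counts and evaluates a cosine schedule with 3% warmup. The schedule shape (linear warmup from zero, cosine decay to zero) and the total step count are assumptions for illustration; the card does not state the minimum LR or the number of optimizer steps.

```python
import math

TOTAL_EXAMPLES = 20_000
MIX = {
    "OpenHermes-2.5": 0.60,
    "MetaMathQA": 0.20,
    "identity": 0.15,
    "Magicoder-Evol-Instruct": 0.05,
}


def mix_counts(total=TOTAL_EXAMPLES, mix=MIX):
    """Number of examples drawn from each source."""
    return {name: round(total * frac) for name, frac in mix.items()}


def lr_at(step, total_steps, peak_lr=5e-5, warmup_frac=0.03):
    """Linear warmup to peak_lr, then cosine decay to zero (assumed shape)."""
    warmup_steps = max(1, round(total_steps * warmup_frac))
    if step < warmup_steps:
        return peak_lr * step / warmup_steps
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    return peak_lr * 0.5 * (1.0 + math.cos(math.pi * progress))


print(mix_counts())
# total_steps=1000 is a hypothetical value, used only to show the curve.
for step in (0, 30, 500, 1000):
    print(step, f"{lr_at(step, total_steps=1000):.2e}")
```

The identity share works out to 3,000 examples, which agrees with the count given in the mix description.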

Identity layer evaluation

Held-out 200-prompt identity test across three cohorts:

| Cohort | Description | Achieved | Calibrated gate |
|---|---|---|---|
| A: Direct identity | "Who are you?"; must mention NPC Nano + Rama Krishna Bachu | 94% | ≥90% |
| B: Family / lineage | "What other NPC models exist?"; must mention lab + sibling | 36% | ≥35% |
| C: Adversarial | Jailbreaks, role-play attempts; must maintain identity | 93% | ≥85% |

Note on Cohort B: sibling recall (emitting both the lab name and a specific sibling model name in one response) appears to plateau at ~36% at 0.5B scale under our training regime. The initial planning gates were 98/90/85 (A/B/C); we recalibrated them to 90/35/85 based on the empirical capability ceilings observed across two training runs. See paper §5.3 for the recalibration discussion.
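
As a rough illustration of how such gates could be checked, the sketch below scores a Cohort B response and compares the reported pass rates against both gate sets. The string-matching rule and the sibling name prefixes are assumptions made for this example; the actual scorer used for the evaluation is not described in this card.

```python
LAB = "Bottensor"
# Assumed matching rule: any sibling family name counts ("NPC Fin" also covers Fin-PRM).
SIBLING_PREFIXES = ("NPC Fast", "NPC Fin", "NPC Agentic")

ACHIEVED = {"A": 0.94, "B": 0.36, "C": 0.93}
CALIBRATED_GATES = {"A": 0.90, "B": 0.35, "C": 0.85}
INITIAL_GATES = {"A": 0.98, "B": 0.90, "C": 0.85}


def passes_cohort_b(response):
    """True if the response names the lab and at least one sibling model."""
    text = response.lower()
    has_lab = LAB.lower() in text
    has_sibling = any(name.lower() in text for name in SIBLING_PREFIXES)
    return has_lab and has_sibling


def gate_report(achieved, gates):
    """Per-cohort pass/fail of achieved rate against its gate."""
    return {cohort: achieved[cohort] >= gate for cohort, gate in gates.items()}


print(gate_report(ACHIEVED, CALIBRATED_GATES))
print(gate_report(ACHIEVED, INITIAL_GATES))
```

With the reported numbers, all three cohorts clear the calibrated gates, while only Cohort C would have cleared the initial ones.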

Capability evaluation (vs base)

| Task | Base (5-shot) | SFT (matched) | Δ (pp) |
|---|---|---|---|
| HellaSwag (acc_norm) | 36.82% | 36.90% | +0.08 |
| ARC-easy (acc_norm) | 49.96% | 48.53% | −1.43 |
| PIQA (acc_norm) | 65.02% | 64.53% | −0.49 |
| OpenBookQA (acc_norm) | 30.00% | 29.60% | −0.40 |
| WinoGrande (acc) | 49.49% | 49.41% | −0.08 |
| GSM8K (0-shot post-SFT, 5-shot base) | 1.67% | 1.90% | +0.23 |

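No significant capability regression on the MCQ suite: every multiple-choice task moves by less than 1.5 percentage points, with the largest drop on ARC-easy (−1.43). GSM8K remains low at 0.5B scale (1.90%); the post-GRPO variant (npc-nano-0.5b-grpo) substantially lifts math reasoning via RL post-training.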
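
One rough way to sanity-check the "no significant regression" reading is a two-proportion z-test on each task. The sketch below is an approximation under stated assumptions. The evaluation set sizes are the commonly used split sizes for these benchmarks and are not given in this card, and the test treats the two runs as independent samples even though both models answer the same questions.

```python
import math

# (task, assumed number of eval examples, base accuracy %, SFT accuracy %)
RESULTS = [
    ("HellaSwag", 10042, 36.82, 36.90),
    ("ARC-easy", 2376, 49.96, 48.53),
    ("PIQA", 1838, 65.02, 64.53),
    ("OpenBookQA", 500, 30.00, 29.60),
    ("WinoGrande", 1267, 49.49, 49.41),
    ("GSM8K", 1319, 1.67, 1.90),
]


def z_score(n, base_pct, sft_pct):
    """z statistic for the accuracy difference, using binomial standard errors."""
    p0, p1 = base_pct / 100, sft_pct / 100
    se = math.sqrt(p0 * (1 - p0) / n + p1 * (1 - p1) / n)
    return (p1 - p0) / se


for task, n, base, sft in RESULTS:
    print(f"{task:<11} delta={sft - base:+.2f} pp  z={z_score(n, base, sft):+.2f}")
```

Under these assumptions every |z| stays well below the conventional 1.96 threshold, with ARC-easy the largest at roughly 1, which is consistent with the deltas being within sampling noise.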

Intended use

Research, demos, and a starting point for fine-tuning. Not intended for production use without additional alignment. At 0.5B parameters, the model has limited factual recall and reasoning capability compared to larger open-source models.

Limitations

0.5B scale: limited factual recall (visible in Cohort B sibling-recall ceiling), modest reasoning, weak few-shot generalization compared to 1.5B+ open models.
Math: GSM8K accuracy is very low pre-GRPO (1.90%); the GRPO variant addresses this specifically.
Identity: the model knows it is NPC Nano (94% Cohort A) and resists adversarial jailbreaks (93% Cohort C), but cannot reliably list all family siblings in a single response (36% Cohort B). This is an architectural / scale limitation, not a fundamental flaw.
Domain mix: general English / code / math / finance / minimal crypto. Not specialized for any single domain.
Context: 2048 tokens. Longer-context tasks are out of scope for this version.

Usage

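from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "ramankrishna10/npc-nano-0.5b-sft"
# Load the weights in bfloat16, the precision they are stored in.
model = AutoModelForCausalLM.from_pretrained(model_id, torch_dtype="bfloat16").cuda()
tok = AutoTokenizer.from_pretrained(model_id)

messages = [{"role": "user", "content": "Who built you?"}]
# add_generation_prompt=True appends the assistant header so the model
# answers instead of continuing the user turn.
inputs = tok.apply_chat_template(
    messages,
    add_generation_prompt=True,
    return_dict=True,
    return_tensors="pt",
).to(model.device)
out = model.generate(**inputs, max_new_tokens=80)
# Decode only the newly generated tokens, not the echoed prompt.
print(tok.decode(out[0][inputs["input_ids"].shape[-1]:], skip_special_tokens=True))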

GGUF quants

For local inference with llama.cpp, Ollama, LM Studio, Jan, and similar tools, see ramankrishna10/npc-nano-0.5b-sft-gguf.

| File | Bits | Size |
|---|---|---|
| `npc-nano-0.5b-sft.f16.gguf` | fp16 | 1.0 GB |
| `npc-nano-0.5b-sft.q8_0.gguf` | 8-bit | 534 MB |
| `npc-nano-0.5b-sft.q5_k_m.gguf` | 5-bit k-quant | 379 MB |
| `npc-nano-0.5b-sft.q4_k_m.gguf` | 4-bit k-quant | 333 MB |

All quants were smoke-tested under greedy decoding; identity holds through the most aggressive quant (q4_k_m).
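
The listed file sizes can be turned into an effective bits-per-weight figure as a quick plausibility check. This sketch assumes the sizes are in decimal units (1 MB = 10**6 bytes) and divides by the total parameter count from the Architecture section; it is arithmetic on the numbers above, not a measurement of the files.

```python
N_PARAMS = 501_531_648  # total parameters from the Architecture section

# File sizes from the GGUF table, assuming decimal units (1 MB = 10**6 bytes).
GGUF_SIZES_BYTES = {
    "f16": 1.0e9,
    "q8_0": 534e6,
    "q5_k_m": 379e6,
    "q4_k_m": 333e6,
}


def bits_per_weight(size_bytes, n_params=N_PARAMS):
    """Effective storage cost per parameter, including any file overhead."""
    return size_bytes * 8 / n_params


for name, size in GGUF_SIZES_BYTES.items():
    print(f"{name:<7} {bits_per_weight(size):5.2f} bits/weight")
```

Under these assumptions q8_0 comes out near 8.5 bits per weight and q4_k_m a little above 5. Figures above the nominal bit width are expected, since the files also hold metadata and k-quant mixes typically keep some tensors at higher precision.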

Citation

Citation will be updated once the Zenodo DOI is assigned.

Acknowledgments

Built on a single A40 over ~45 days of work as part of the independent Bottensor research program. No external funding.
