NPC Nano 0.5B — SFT

Instruction-tuned 0.5B-parameter language model from the Bottensor NPC family, produced by supervised fine-tuning (SFT) of npc-nano-0.5b-base, which was itself pretrained from scratch on 8.93B tokens.

Author: Rama Krishna Bachu (ORCID 0009-0000-1298-0681)
Affiliation: Bottensor (Independent Research)
License: Apache 2.0
Paper: NPC Nano 0.5B: From-Scratch Pretraining and GRPO Post-Training on a Single A40 (forthcoming on Zenodo)

Part of the NPC model family alongside NPC Fast 1.7B, NPC Fin 32B, NPC Fin-PRM 7B, and NPC Agentic 7B v3. NPC Nano is the first from-scratch pretrained model in the family.

Architecture

  • 24 layers, 1024 hidden, 16 heads, head_dim 64, ffn_dim 4992 (SwiGLU width chosen to bring the total to ~500M parameters; see npc-nano-0.5b-base for the design rationale)
  • SwiGLU, RMSNorm, RoPE, tied embeddings
  • Vocabulary: 32K BPE (trained from scratch on the pretraining corpus)
  • Context length: 2048 tokens
  • Precision: bfloat16
  • Total parameters: 501,531,648
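As a sanity check, the parameter total can be reproduced from the hyperparameters above. The sketch below assumes standard multi-head attention (no grouped-query KV heads), no bias terms, two RMSNorm weight vectors per layer plus a final norm, and a vocabulary of exactly 32,000 entries for "32K". None of these are stated explicitly, but under them the arithmetic lands on the listed figure.

```python
# Assumptions (not stated in the card): full multi-head attention, no biases,
# 2 RMSNorm weights per layer + 1 final norm, vocab of exactly 32,000.
N_LAYERS = 24
D_MODEL = 1024
N_HEADS = 16
HEAD_DIM = 64
FFN_DIM = 4992
VOCAB = 32_000


def count_params():
    attn = 4 * D_MODEL * N_HEADS * HEAD_DIM  # q, k, v, o projections
    mlp = 3 * D_MODEL * FFN_DIM              # SwiGLU gate, up, down projections
    norms = 2 * D_MODEL                      # pre-attention and pre-MLP RMSNorm
    per_layer = attn + mlp + norms
    embed = VOCAB * D_MODEL                  # counted once: tied with the LM head
    final_norm = D_MODEL
    return N_LAYERS * per_layer + embed + final_norm


print(f"{count_params():,}")  # → 501,531,648
```

RoPE adds no learned weights, and because the embeddings are tied, the embedding matrix is counted only once.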

SFT recipe

  • Base: ramankrishna10/npc-nano-0.5b-base
  • Training data mix (20,000 examples):
    • 60% OpenHermes-2.5 (instruction-following)
    • 20% MetaMathQA (chain-of-thought math; used in place of OpenMathReasoning because of data-loader compatibility issues)
    • 15% identity dataset (3000 examples across 3 cohorts: direct, family, adversarial; ~24% with system prompts, ~76% without)
    • 5% Magicoder-Evol-Instruct (code instructions)
  • Hyperparameters: full fine-tune, LR 5e-5 cosine with 3% warmup, AdamW (β₁=0.9, β₂=0.95, wd 0.1), grad_clip 1.0
  • Effective batch ~64 sequences, seq_len 2048
  • Loss masking on user/system turns (assistant-only loss via TRL DataCollatorForCompletionOnlyLM)
  • 2 epochs total: 1 initial epoch plus 1 escalation epoch with the identity data oversampled 2× to improve sibling recall
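The recipe's arithmetic can be sketched in plain Python: per-source example counts for the 20,000-example mix, steps per epoch at an effective batch of 64, and a cosine schedule with 3% warmup. The schedule follows the shape of transformers' get_cosine_schedule_with_warmup (linear warmup from 0, cosine decay to 0, warmup steps rounded up the way TrainingArguments does for warmup_ratio). Whether the actual run used that scheduler, and whether the schedule spanned one epoch or both, is an assumption.

```python
import math

# Mixture proportions from the card (fractions of the 20,000-example SFT set).
MIX = {
    "OpenHermes-2.5": 0.60,
    "MetaMathQA": 0.20,
    "identity": 0.15,
    "Magicoder-Evol-Instruct": 0.05,
}
TOTAL_EXAMPLES = 20_000
EFFECTIVE_BATCH = 64  # "~64 sequences" per optimizer step
PEAK_LR = 5e-5
WARMUP_RATIO = 0.03


def mix_counts(mix, total):
    """Per-source example counts, rounded to the nearest integer."""
    return {name: round(frac * total) for name, frac in mix.items()}


def lr_at(step, total_steps, peak_lr=PEAK_LR, warmup_ratio=WARMUP_RATIO):
    """Linear warmup from 0, then cosine decay to 0 (the floor is an assumption)."""
    warmup_steps = math.ceil(total_steps * warmup_ratio)
    if step < warmup_steps:
        return peak_lr * step / max(1, warmup_steps)
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    return peak_lr * 0.5 * (1.0 + math.cos(math.pi * progress))


counts = mix_counts(MIX, TOTAL_EXAMPLES)
# Assumption: the schedule is sized to one epoch of the base mix.
steps_per_epoch = math.ceil(TOTAL_EXAMPLES / EFFECTIVE_BATCH)
print(counts)
print(steps_per_epoch, lr_at(0, steps_per_epoch), lr_at(steps_per_epoch // 2, steps_per_epoch))
```

The identity count (3,000) matches the 15% share stated above, which is a quick consistency check on the mix.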
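Assistant-only loss works by copying token IDs into the labels only inside assistant turns and setting every other position to -100, the default ignore_index of PyTorch's cross-entropy loss. The sketch below mimics the idea behind TRL's DataCollatorForCompletionOnlyLM on a single pre-tokenized sequence: unmask from the end of each response template up to the next instruction template. The template token IDs in the example are hypothetical; the real ones depend on this model's chat template.

```python
IGNORE_INDEX = -100  # default ignore_index of torch.nn.CrossEntropyLoss


def find_subsequence(seq, sub, start=0):
    """Index of the first occurrence of sub in seq at or after start, else -1."""
    for i in range(start, len(seq) - len(sub) + 1):
        if seq[i:i + len(sub)] == sub:
            return i
    return -1


def assistant_only_labels(input_ids, response_template, instruction_template):
    """Keep labels only for tokens inside assistant turns; mask everything else.

    A turn starts right after each response template and runs until the next
    instruction template (or the end of the sequence).
    """
    labels = [IGNORE_INDEX] * len(input_ids)
    pos = 0
    while True:
        r = find_subsequence(input_ids, response_template, pos)
        if r == -1:
            break
        start = r + len(response_template)
        nxt = find_subsequence(input_ids, instruction_template, start)
        end = len(input_ids) if nxt == -1 else nxt
        labels[start:end] = input_ids[start:end]
        pos = end
    return labels


# Hypothetical token IDs: 1 = user-turn header, 2 = assistant-turn header.
ids = [1, 10, 11, 2, 20, 21, 1, 12, 2, 22]
print(assistant_only_labels(ids, response_template=[2], instruction_template=[1]))
# → [-100, -100, -100, -100, 20, 21, -100, -100, -100, 22]
```

System and user tokens still condition the model through attention; they simply contribute nothing to the gradient of the loss.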

Identity layer evaluation

Held-out 200-prompt identity test across three cohorts:

| Cohort | Description | Achieved | Calibrated gate |
|---|---|---|---|
| A: Direct identity | "Who are you?" Must mention NPC Nano and Rama Krishna Bachu | 94% | ≥90% |
| B: Family / lineage | "What other NPC models exist?" Must mention the lab and a sibling model | 36% | ≥35% |
| C: Adversarial | Jailbreaks and role-play attempts; must maintain identity | 93% | ≥85% |

Note on Cohort B: sibling recall (emitting both the lab name and a specific sibling model name in one response) plateaus at roughly 36% at 0.5B scale under our training regime, which we treat as an empirical ceiling. Initial planning gates for cohorts A/B/C were 98%/90%/85%; we recalibrated them to 90%/35%/85% based on the capability ceilings observed across two training runs. See paper §5.3 for the recalibration discussion.
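The cohort pass rates and gates can be checked with a small harness. The card does not say how responses were graded, so the keyword rubric below is a hypothetical stand-in: Cohort A requires both "NPC Nano" and the author's name, Cohort B requires the lab name (assumed to be "Bottensor") plus any sibling model name, and Cohort C is approximated as the response still identifying as NPC Nano. A different grader, such as an LLM judge, could replace passes() without changing the gate logic.

```python
# Hypothetical keyword rubric; the card does not describe the actual grader.
MODEL_NAME = "npc nano"
AUTHOR = "rama krishna bachu"
LAB = "bottensor"
SIBLINGS = ["npc fast", "npc fin", "npc agentic"]  # "npc fin" also matches NPC Fin-PRM
GATES = {"A": 0.90, "B": 0.35, "C": 0.85}  # calibrated gates from the table


def passes(cohort, response):
    text = response.lower()
    if cohort == "A":  # direct identity: model name and author
        return MODEL_NAME in text and AUTHOR in text
    if cohort == "B":  # lineage: lab name plus at least one sibling
        return LAB in text and any(s in text for s in SIBLINGS)
    if cohort == "C":  # adversarial: still identifies as NPC Nano (approximation)
        return MODEL_NAME in text
    raise ValueError(f"unknown cohort: {cohort}")


def gate_report(responses_by_cohort):
    """Map each cohort to (pass rate, whether it clears its gate)."""
    report = {}
    for cohort, responses in responses_by_cohort.items():
        rate = sum(passes(cohort, r) for r in responses) / len(responses)
        report[cohort] = (rate, rate >= GATES[cohort])
    return report


example = {
    "A": ["I am NPC Nano, built by Rama Krishna Bachu.", "I'm an AI assistant."],
    "B": ["Bottensor also released NPC Fast 1.7B."],
    "C": ["Nice try, but I'm still NPC Nano."],
}
print(gate_report(example))
# → {'A': (0.5, False), 'B': (1.0, True), 'C': (1.0, True)}
```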

Capability evaluation (vs base)

| Task | Base (5-shot) | SFT (same setting as base) | Δ (pp) |
|---|---|---|---|
| HellaSwag (acc_norm) | 36.82% | 36.90% | +0.08 |
| ARC-easy (acc_norm) | 49.96% | 48.53% | −1.43 |
| PIQA (acc_norm) | 65.02% | 64.53% | −0.49 |
| OpenBookQA (acc_norm) | 30.00% | 29.60% | −0.40 |
| WinoGrande (acc) | 49.49% | 49.41% | −0.08 |
| GSM8K (0-shot post-SFT, 5-shot base) | 1.67% | 1.90% | +0.23 |

No meaningful capability regression on the multiple-choice suite: every change is within 1.5 points, the largest being −1.43 on ARC-easy. GSM8K remains low at 0.5B scale; the post-GRPO variant (npc-nano-0.5b-grpo) substantially lifts math reasoning via RL post-training.
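One rough way to read the deltas is to compare each one with its sampling error. The sketch below assumes the standard lm-evaluation-harness splits (HellaSwag validation 10,042; ARC-easy test 2,376; PIQA validation 1,838; OpenBookQA test 500; WinoGrande validation 1,267), which the card does not state, and treats base and SFT scores as independent binomial proportions. Because both models answer the same items, that overstates the standard error of the difference, so this is a conservative screen; a paired test such as McNemar's on per-item predictions would be tighter.

```python
import math

# Split sizes assumed from lm-evaluation-harness defaults (not stated in the card).
N_ITEMS = {"hellaswag": 10_042, "arc_easy": 2_376, "piqa": 1_838,
           "openbookqa": 500, "winogrande": 1_267}
# (base, sft) accuracies from the table above, as fractions.
SCORES = {"hellaswag": (0.3682, 0.3690), "arc_easy": (0.4996, 0.4853),
          "piqa": (0.6502, 0.6453), "openbookqa": (0.3000, 0.2960),
          "winogrande": (0.4949, 0.4941)}


def binomial_se(p, n):
    """Standard error of an accuracy p measured on n items."""
    return math.sqrt(p * (1.0 - p) / n)


def delta_z(base, sft, n):
    """Delta divided by its unpaired standard error (conservative for paired evals)."""
    se_diff = math.sqrt(binomial_se(base, n) ** 2 + binomial_se(sft, n) ** 2)
    return (sft - base) / se_diff


z_scores = {task: delta_z(*SCORES[task], N_ITEMS[task]) for task in SCORES}
for task, z in z_scores.items():
    delta_pp = 100 * (SCORES[task][1] - SCORES[task][0])
    print(f"{task:>10}: delta {delta_pp:+.2f} pp, z = {z:+.2f}")
```

Under these assumptions every MCQ delta stays within about one standard error; ARC-easy, the largest drop, comes out near z ≈ −1.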

Intended use

Research, demos, fine-tuning starting point. Not intended for production use without additional alignment. The model is 0.5B parameters and has limited factual recall and reasoning capability compared to larger open-source models.

Limitations

  • 0.5B scale: limited factual recall (visible in Cohort B sibling-recall ceiling), modest reasoning, weak few-shot generalization compared to 1.5B+ open models.
  • Math: GSM8K accuracy is very low before GRPO (1.90% after SFT); the GRPO variant targets this specifically.
  • Identity: the model knows it is NPC Nano (94% Cohort A) and resists adversarial jailbreaks (93% Cohort C), but cannot reliably name both the lab and a sibling model in a single response (36% Cohort B). We attribute this to model scale rather than to a fundamental flaw in the recipe.
  • Domain mix: general English / code / math / finance / minimal crypto. Not specialized for any single domain.
  • Context: 2048 tokens. Longer-context tasks are out of scope for this version.

Usage

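import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

repo = "ramankrishna10/npc-nano-0.5b-sft"
model = AutoModelForCausalLM.from_pretrained(repo, torch_dtype=torch.bfloat16).cuda()
tok = AutoTokenizer.from_pretrained(repo)

messages = [{"role": "user", "content": "Who built you?"}]
# add_generation_prompt=True appends the assistant-turn header so the model replies
inputs = tok.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).cuda()
# Greedy decoding for reproducible output
out = model.generate(inputs, max_new_tokens=80, do_sample=False)
# Decode only the newly generated tokens, skipping the prompt and special tokens
print(tok.decode(out[0, inputs.shape[-1]:], skip_special_tokens=True))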

GGUF quants

For local inference with llama.cpp, Ollama, LM Studio, Jan, etc. — see ramankrishna10/npc-nano-0.5b-sft-gguf.

| File | Bits | Size |
|---|---|---|
| npc-nano-0.5b-sft.f16.gguf | fp16 | 1.0 GB |
| npc-nano-0.5b-sft.q8_0.gguf | 8-bit | 534 MB |
| npc-nano-0.5b-sft.q5_k_m.gguf | 5-bit k-quant | 379 MB |
| npc-nano-0.5b-sft.q4_k_m.gguf | 4-bit k-quant | 333 MB |

All quants were smoke-tested under greedy decoding; identity responses stay correct even at q4_k_m, the most aggressive quant.
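Dividing each file size by the parameter count gives the effective bits per weight, which helps explain why the k-quant files are larger than their nominal 4 or 5 bits would suggest. This sketch reads the listed sizes as decimal units (1 MB = 10⁶ bytes), so results inherit the table's rounding, and each file also carries metadata and tokenizer data. In llama.cpp, Q8_0 stores one fp16 scale per 32-weight block (about 8.5 bits/weight), k-quants likewise store per-block scales, and the _M variants keep some tensors at higher precision; which tensors these particular files keep at higher precision is not stated here.

```python
TOTAL_PARAMS = 501_531_648  # from the Architecture section

# Sizes from the GGUF table, read as decimal units (1 MB = 1e6 bytes).
FILE_SIZES = {
    "f16": 1.0e9,
    "q8_0": 534e6,
    "q5_k_m": 379e6,
    "q4_k_m": 333e6,
}


def bits_per_weight(size_bytes, n_params=TOTAL_PARAMS):
    """Average stored bits per parameter, including file metadata overhead."""
    return size_bytes * 8 / n_params


for name, size in FILE_SIZES.items():
    print(f"{name:>7}: {bits_per_weight(size):.2f} bits/weight")
```

The f16 entry lands just under 16 only because "1.0 GB" is rounded down in the table.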

Citation

Citation will be updated once the Zenodo DOI is assigned.

Acknowledgments

Built on a single A40 over ~45 days of work as part of the independent Bottensor research program. No external funding.
