NPC Nano 0.5B — SFT
Instruction-tuned 0.5B parameter language model from the Bottensor NPC family. SFT-warmed from npc-nano-0.5b-base, itself pretrained from scratch on 8.93B tokens.
- Author: Rama Krishna Bachu (ORCID 0009-0000-1298-0681)
- Affiliation: Bottensor (Independent Research)
- License: Apache 2.0
- Paper: NPC Nano 0.5B: From-Scratch Pretraining and GRPO Post-Training on a Single A40 (forthcoming on Zenodo)
Part of the NPC model family alongside NPC Fast 1.7B, NPC Fin 32B, NPC Fin-PRM 7B, and NPC Agentic 7B v3. NPC Nano is the first from-scratch pretrained model in the family.
Architecture
- 24 layers, 1024 hidden, 16 heads, head_dim 64, ffn_dim 4992 (SwiGLU sized so total params hit ~500M; see npc-nano-0.5b-base for the design rationale)
- SwiGLU, RMSNorm, RoPE, tied embeddings
- Vocabulary: 32K BPE (trained from scratch on the pretraining corpus)
- Context: 2048
- Precision: bfloat16
- Total parameters: 501,531,648
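As a sanity check, the parameter total can be reproduced from the listed dimensions. The sketch below assumes a vocabulary of exactly 32,000 tokens, bias-free linear layers, two RMSNorm weight vectors per layer plus one final norm, and no learned parameters for RoPE. None of these details are stated in the card, but together they reproduce the reported figure.

```python
def count_params(vocab=32_000, hidden=1024, layers=24, ffn=4992):
    """Parameter count for a decoder-only model under the stated assumptions."""
    # Tied embeddings: the input embedding matrix doubles as the output head.
    embed = vocab * hidden
    # Attention: Q, K, V and output projections, each hidden x hidden
    # (16 heads x head_dim 64 = 1024), assumed bias-free.
    attn = 4 * hidden * hidden
    # SwiGLU MLP: gate, up and down projections.
    mlp = 3 * hidden * ffn
    # Two RMSNorm weight vectors per layer (assumed: pre-attention and pre-MLP).
    norms = 2 * hidden
    per_layer = attn + mlp + norms
    # One final RMSNorm before the tied output head.
    return embed + layers * per_layer + hidden


total = count_params()
print(f"{total:,}")  # 501,531,648
```

Under these assumptions the 24 transformer layers hold about 469M parameters and the tied embedding table about 33M.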
SFT recipe
- Base: ramankrishna10/npc-nano-0.5b-base
- Training data mix (20,000 examples):
- 60% OpenHermes-2.5 (instruction-following)
- 20% MetaMathQA (chain-of-thought math; substituted for OpenMathReasoning for loader compatibility)
- 15% identity dataset (3000 examples across 3 cohorts: direct, family, adversarial; ~24% with system prompts, ~76% without)
- 5% Magicoder-Evol-Instruct (code instructions)
- Hyperparameters: full fine-tune, LR 5e-5 cosine with 3% warmup, AdamW (β₁=0.9, β₂=0.95, wd 0.1), grad_clip 1.0
- Effective batch ~64 sequences, seq_len 2048
- Loss masking on user/system turns (assistant-only loss via TRL DataCollatorForCompletionOnlyLM)
- 2 epochs total (1 initial + 1 escalation with 2× identity oversample for sibling-recall improvement)
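The recipe's numbers can be sketched in a few lines of plain Python: the per-source example counts implied by the mix percentages, and the shape of a cosine schedule with 3% warmup. This is an illustration, not the training code. The linear warmup shape, the decay floor of zero, the dictionary keys, and the `total_steps` value are all assumptions, since the card does not state them.

```python
import math

# Mix percentages from the recipe; the keys are shorthand labels, not dataset IDs.
MIX_PERCENT = {"openhermes": 60, "metamathqa": 20, "identity": 15, "magicoder": 5}


def mix_counts(total_examples=20_000):
    """Number of examples per source for a given dataset size."""
    return {name: total_examples * pct // 100 for name, pct in MIX_PERCENT.items()}


def lr_at(step, total_steps, peak_lr=5e-5, warmup_frac=0.03):
    """Linear warmup to peak_lr, then cosine decay to zero (assumed floor)."""
    warmup_steps = max(1, round(total_steps * warmup_frac))
    if step < warmup_steps:
        return peak_lr * step / warmup_steps
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    return 0.5 * peak_lr * (1.0 + math.cos(math.pi * progress))


counts = mix_counts()
print(counts)

# total_steps=600 is a hypothetical value chosen only for illustration.
for step in (0, 9, 18, 300, 600):
    print(step, f"{lr_at(step, total_steps=600):.2e}")
```

The identity share works out to 3,000 examples, matching the figure quoted in the mix.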
Identity layer evaluation
Held-out 200-prompt identity test across three cohorts:
| Cohort | Description | Achieved | Calibrated gate |
|---|---|---|---|
| A — Direct identity | "Who are you?" — must mention NPC Nano + Rama Krishna Bachu | 94% | ≥90% |
| B — Family / lineage | "What other NPC models exist?" — must mention lab + sibling | 36% | ≥35% |
| C — Adversarial | Jailbreaks, role-play attempts — must maintain identity | 93% | ≥85% |
Note on Cohort B: sibling recall (emitting both the lab name and a specific sibling model name in one response) appears to plateau at roughly 36% at 0.5B scale under our training regime. The initial planning gates were 98/90/85 (A/B/C); we recalibrated them to 90/35/85 based on the capability ceilings observed across two training runs. See paper §5.3 for the recalibration discussion.
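The gate logic can be made concrete with a short sketch. The rates and gates are copied from the table and note above. The `mentions_all` scorer is hypothetical: the card says only what a passing response must mention, not how responses were actually scored.

```python
# Achieved rates and gates (percent), copied from the table and note above.
ACHIEVED = {"A": 94, "B": 36, "C": 93}
CALIBRATED_GATES = {"A": 90, "B": 35, "C": 85}
INITIAL_GATES = {"A": 98, "B": 90, "C": 85}


def mentions_all(response, required):
    """Hypothetical scorer: case-insensitive check that every required string appears."""
    text = response.lower()
    return all(term.lower() in text for term in required)


def gate_report(achieved, gates):
    """Map each cohort to True if its achieved rate meets or exceeds the gate."""
    return {cohort: achieved[cohort] >= gates[cohort] for cohort in gates}


print(gate_report(ACHIEVED, CALIBRATED_GATES))  # {'A': True, 'B': True, 'C': True}
print(gate_report(ACHIEVED, INITIAL_GATES))     # {'A': False, 'B': False, 'C': True}
print(mentions_all("I am NPC Nano, built by Rama Krishna Bachu.",
                   ["NPC Nano", "Rama Krishna Bachu"]))  # True
```

Against the initial gates only Cohort C would have passed, which is the gap the recalibration addresses.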
Capability evaluation (vs base)
| Task | Base (5-shot) | SFT (matched) | Δ (pp) |
|---|---|---|---|
| HellaSwag (acc_norm) | 36.82% | 36.90% | +0.08 |
| ARC-easy (acc_norm) | 49.96% | 48.53% | −1.43 |
| PIQA (acc_norm) | 65.02% | 64.53% | −0.49 |
| OpenBookQA (acc_norm) | 30.00% | 29.60% | −0.40 |
| WinoGrande (acc) | 49.49% | 49.41% | −0.08 |
| GSM8K (0-shot post-SFT, 5-shot base) | 1.67% | 1.90% | +0.23 |
No large capability regression on the multiple-choice suite: changes range from −1.43 to +0.08 percentage points, with the largest drop on ARC-easy. GSM8K remains low at 0.5B scale; the post-GRPO variant (npc-nano-0.5b-grpo) substantially lifts math reasoning via RL post-training.
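To put the deltas in context, the sketch below recomputes them from the table and adds a helper for the binomial standard error of an accuracy score. The card does not give evaluation set sizes, so `n_items` is left to the caller; the value 2000 in the usage line is purely hypothetical.

```python
import math

# (base, sft) scores in percent, copied from the table above.
SCORES = {
    "HellaSwag": (36.82, 36.90),
    "ARC-easy": (49.96, 48.53),
    "PIQA": (65.02, 64.53),
    "OpenBookQA": (30.00, 29.60),
    "WinoGrande": (49.49, 49.41),
    "GSM8K": (1.67, 1.90),
}


def delta_pp(base, sft):
    """Change from base to SFT in percentage points, rounded to two decimals."""
    return round(sft - base, 2)


def binomial_se_pp(acc_percent, n_items):
    """Standard error of an accuracy estimate, in percentage points."""
    p = acc_percent / 100.0
    return 100.0 * math.sqrt(p * (1.0 - p) / n_items)


for task, (base, sft) in SCORES.items():
    print(f"{task:<11} {delta_pp(base, sft):+.2f} pp")

# n_items=2000 is a hypothetical test-set size, used only to show the call.
print(f"SE at 48.53% with 2000 items: {binomial_se_pp(48.53, 2000):.2f} pp")
```

Comparing a delta with the standard error for the actual test-set size is a quick way to judge whether a change is distinguishable from sampling noise.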
Intended use
Research, demos, fine-tuning starting point. Not intended for production use without additional alignment. The model is 0.5B parameters and has limited factual recall and reasoning capability compared to larger open-source models.
Limitations
- 0.5B scale: limited factual recall (visible in Cohort B sibling-recall ceiling), modest reasoning, weak few-shot generalization compared to 1.5B+ open models.
- Math: GSM8K accuracy is low before GRPO (1.90% in the table above); the GRPO variant addresses this specifically.
- Identity: the model knows it is NPC Nano (94% Cohort A) and resists adversarial jailbreaks (93% Cohort C), but cannot reliably name the lab and a sibling model in a single response (36% Cohort B). This appears to be a capacity limit at 0.5B scale rather than a defect in the identity training.
- Domain mix: general English / code / math / finance / minimal crypto. Not specialized for any single domain.
- Context: 2048 tokens. Longer-context tasks are out of scope for this version.
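Because the prompt and the generated tokens share the 2048-token window, it can help to clamp the requested generation length before calling `generate`. The helper below is a minimal sketch; the function name and its error behaviour are illustrative choices, not part of the model's API.

```python
CONTEXT_LENGTH = 2048  # context size stated in the model card


def max_new_tokens_budget(prompt_tokens, requested, context=CONTEXT_LENGTH):
    """Clamp a requested generation length so prompt + output fits the context."""
    if prompt_tokens >= context:
        raise ValueError(
            f"prompt uses {prompt_tokens} tokens, leaving no room in a {context}-token context"
        )
    return min(requested, context - prompt_tokens)


print(max_new_tokens_budget(prompt_tokens=100, requested=80))   # 80
print(max_new_tokens_budget(prompt_tokens=2000, requested=80))  # 48
```

The returned value can be passed as `max_new_tokens`, with `prompt_tokens` taken from the length of the tokenized chat prompt.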
Usage
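from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "ramankrishna10/npc-nano-0.5b-sft"
tok = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, torch_dtype="bfloat16").cuda()

messages = [{"role": "user", "content": "Who built you?"}]
# add_generation_prompt=True appends the assistant header so the model answers
# instead of continuing the user turn.
inputs = tok.apply_chat_template(
    messages, add_generation_prompt=True, return_dict=True, return_tensors="pt"
).to(model.device)
out = model.generate(**inputs, max_new_tokens=80)
# Decode only the newly generated tokens, not the echoed prompt.
print(tok.decode(out[0][inputs["input_ids"].shape[-1]:], skip_special_tokens=True))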
GGUF quants
For local inference with llama.cpp, Ollama, LM Studio, Jan, and similar tools, see ramankrishna10/npc-nano-0.5b-sft-gguf.
| File | Bits | Size |
|---|---|---|
| npc-nano-0.5b-sft.f16.gguf | fp16 | 1.0 GB |
| npc-nano-0.5b-sft.q8_0.gguf | 8-bit | 534 MB |
| npc-nano-0.5b-sft.q5_k_m.gguf | 5-bit k-quant | 379 MB |
| npc-nano-0.5b-sft.q4_k_m.gguf | 4-bit k-quant | 333 MB |
All quants were smoke-tested under greedy decoding; identity responses hold through the most aggressive quant listed (q4_k_m).
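The listed file sizes can be converted into an effective bits-per-weight figure, which explains why the "4-bit" file is larger than 4 bits times the parameter count. The sketch assumes decimal units (1 MB = 10^6 bytes, 1 GB = 10^9 bytes) and divides the whole file size by the parameter total, so file metadata and any tensors stored at higher precision are folded into the result.

```python
TOTAL_PARAMS = 501_531_648  # from the Architecture section

# File sizes as listed in the table, assuming decimal MB/GB.
FILE_SIZES_BYTES = {
    "f16": 1.0e9,
    "q8_0": 534e6,
    "q5_k_m": 379e6,
    "q4_k_m": 333e6,
}


def effective_bits_per_weight(size_bytes, params=TOTAL_PARAMS):
    """Average number of bits stored per model parameter for a given file size."""
    return size_bytes * 8 / params


for name, size in FILE_SIZES_BYTES.items():
    print(f"{name:<7} {effective_bits_per_weight(size):.2f} bits/weight")
```

The k-quant files come out above their nominal bit width, which is typical when a quantization scheme mixes precisions across tensors.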
Citation
Citation will be updated once the Zenodo DOI is assigned.
Acknowledgments
Built on a single A40 over ~45 days of work as part of the independent Bottensor research program. No external funding.