rb-nano
A 48M-parameter, GPT-2-style decoder-only transformer trained from scratch as part of the Leopard AI Model Suite. Small enough to run on CPU or any GPU; built as a learning/research model, not a production assistant.
Run it instantly with Ollama:
ollama run rafi-dev/rb-nano
Or grab the quantized build from the GGUF repo (rb-nano-GGUF) for llama.cpp.
Usage
from transformers import AutoModelForCausalLM, AutoTokenizer
# Load the weights and tokenizer from the Hub
model = AutoModelForCausalLM.from_pretrained("leopard-Ai/rb-nano")
tok = AutoTokenizer.from_pretrained("leopard-Ai/rb-nano")
# The prompt follows the user:/ai: turn format described under "Prompt format"
prompt = "<sos>user: hello\nai:"
ids = tok(prompt, return_tensors="pt").input_ids
# Sample a reply using the recommended generation parameters
out = model.generate(
    ids, max_new_tokens=64, do_sample=True,
    temperature=0.7, top_k=40, top_p=0.9, repetition_penalty=1.3,
    eos_token_id=2, pad_token_id=0,
)
print(tok.decode(out[0], skip_special_tokens=True))
It loads as a standard GPT2LMHeadModel, so no trust_remote_code is required.
Prompt format
Trained on a simple user: / ai: turn format, prefixed with the <sos> token:
<sos>user: hello
ai: Hi there! How can I help you today?
user: what is python?
ai:
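The turn format above can be assembled programmatically. The following is a minimal sketch: `build_prompt` and its `turns` argument are hypothetical names introduced here for illustration and are not part of the model repo.

```python
def build_prompt(turns, sos="<sos>"):
    """Render (role, text) pairs into the user:/ai: format.

    `turns` is a list of ("user" | "ai", text) pairs. The prompt ends with
    an open "ai:" line for the model to complete.
    """
    lines = []
    for role, text in turns:
        if role not in ("user", "ai"):
            raise ValueError(f"unknown role: {role!r}")
        lines.append(f"{role}: {text}")
    lines.append("ai:")  # leave the final assistant turn open
    return sos + "\n".join(lines)


prompt = build_prompt([
    ("user", "hello"),
    ("ai", "Hi there! How can I help you today?"),
    ("user", "what is python?"),
])
print(prompt)
```

The resulting string can be passed to the tokenizer in place of the hard-coded prompt in the Usage section.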
Architecture
| Type | Decoder-only transformer (GPT-2 family) |
| Parameters | ~48M |
| Embedding dim (n_embd) | 512 |
| Layers | 10 |
| Attention heads | 8 |
| Context length | 1024 tokens |
| Position embeddings | Learned |
| Norm / activation | LayerNorm, GELU-tanh (gelu_new) |
| Head | Weight-tied to token embeddings |
| Tokenizer | ByteLevel BPE, 32k vocab |
| Format | safetensors (fp32) |
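As a sanity check, the ~48M figure can be reproduced from the table. This sketch assumes the standard GPT-2 layout (fused QKV projection with biases, an MLP hidden width of 4 × n_embd) and a vocabulary of exactly 32,000 entries. The table only says "32k", so the exact total may differ slightly. The number of attention heads does not affect the count.

```python
def gpt2_param_count(vocab_size, n_positions, n_embd, n_layer):
    """Count parameters of a GPT-2 style model with a weight-tied LM head."""
    d = n_embd
    embeddings = vocab_size * d + n_positions * d  # token + learned position
    attention = (d * 3 * d + 3 * d) + (d * d + d)  # fused QKV + output projection
    mlp = (d * 4 * d + 4 * d) + (4 * d * d + d)    # assumes 4x hidden width
    layer_norms = 2 * (2 * d)                      # two LayerNorms per block
    block = attention + mlp + layer_norms
    final_norm = 2 * d                             # weight + bias
    # The tied output head reuses the token embedding, so it adds nothing.
    return embeddings + n_layer * block + final_norm


# vocab_size=32_000 is an assumption; the other values come from the table.
total = gpt2_param_count(vocab_size=32_000, n_positions=1024, n_embd=512, n_layer=10)
print(f"{total:,} parameters (~{total / 1e6:.1f}M)")  # 48,433,152 parameters (~48.4M)
```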
Training
- Pretrain: FineWeb-Edu (sample-10BT), ~50M tokens. Final val loss ≈ 3.44.
- Finetune: Alpaca, Alpaca-cleaned, CodeAlpaca-20k, Dolly-15k, and ShareGPT (full multi-turn threads, loss masked to assistant turns only). Final val loss ≈ 2.67.
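Masking the loss to assistant turns is commonly done by setting the labels of all other positions to -100, the index that PyTorch's cross-entropy loss ignores by default. The sketch below illustrates that idea only; it is not the authors' training code, and `mask_labels`, the token ids, and the `is_assistant` flags are hypothetical.

```python
IGNORE_INDEX = -100  # label value skipped by PyTorch's cross-entropy loss by default


def mask_labels(token_ids, is_assistant):
    """Copy token ids into labels, hiding every non-assistant position."""
    if len(token_ids) != len(is_assistant):
        raise ValueError("token_ids and is_assistant must have the same length")
    return [tok if keep else IGNORE_INDEX
            for tok, keep in zip(token_ids, is_assistant)]


# Hypothetical ids: a 3-token user turn followed by a 2-token assistant turn.
token_ids = [11, 12, 13, 21, 22]
is_assistant = [False, False, False, True, True]
labels = mask_labels(token_ids, is_assistant)
print(labels)  # [-100, -100, -100, 21, 22]
```

With labels built this way, only the assistant tokens contribute to the loss, while the user turns still serve as context.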
Recommended parameters
temperature 0.7
top_k 40
top_p 0.9
repeat_penalty 1.3
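To show what the first three settings do, here is a simplified, pure-Python sketch of temperature scaling followed by top-k and top-p (nucleus) filtering. It is an illustration under stated assumptions, not the implementation used by Transformers, Ollama, or llama.cpp; `filter_probs` is a hypothetical helper, and the repeat penalty is left out for brevity.

```python
import math


def filter_probs(logits, temperature=0.7, top_k=40, top_p=0.9):
    """Return {token_id: prob} after temperature, top-k, then top-p filtering."""
    scaled = [x / temperature for x in logits]
    # Top-k: keep the ids of the k highest-scoring tokens, best first.
    order = sorted(range(len(scaled)), key=lambda i: scaled[i], reverse=True)[:top_k]
    # Softmax over the survivors (subtract the max for numerical stability).
    peak = scaled[order[0]]
    weights = [math.exp(scaled[i] - peak) for i in order]
    total = sum(weights)
    probs = [w / total for w in weights]
    # Top-p: keep the smallest prefix whose cumulative probability reaches top_p.
    kept, cumulative = [], 0.0
    for idx, p in zip(order, probs):
        kept.append((idx, p))
        cumulative += p
        if cumulative >= top_p:
            break
    norm = sum(p for _, p in kept)
    return {idx: p / norm for idx, p in kept}  # renormalize to sum to 1


# Toy logits for a 5-token vocabulary; top_k=3 drops the two weakest tokens.
dist = filter_probs([2.0, 1.0, 0.5, -1.0, -3.0], top_k=3)
print(dist)
```

A token would then be drawn from `dist`, for example with `random.choices(list(dist), weights=list(dist.values()))`. Lower temperature sharpens the distribution, while smaller `top_k` and `top_p` values cut off more of the low-probability tail.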
Limitations
- Knowledge. At 48M params the model has very limited factual knowledge and will confidently hallucinate (made-up libraries, wrong dates, etc.). It cannot be a reliable source of facts.
- Coherence. Good for short exchanges; longer or more technical answers drift.
- Scope. English-centric, 1024-token context. Best for demos, experimentation, and edge/CPU inference, not production use.
License / attribution
Released under CC BY-NC 4.0 (non-commercial, attribution required). The finetune mixes datasets with non-commercial terms (Alpaca, CodeAlpaca, and ShareGPT are OpenAI-derived), so commercial use is not granted. Trained on publicly available datasets (FineWeb-Edu, Alpaca, Dolly, CodeAlpaca, ShareGPT); review each dataset's license before redistributing derived outputs.
Made with care
rb-nano was built by Rafi (13 years old) and Buddi (10 years old), who pretrained and finetuned it from scratch on a single RTX 4070 (8 GB VRAM). It's a passion project: proof that a coherent little chat model can be trained end-to-end on consumer hardware.
If you enjoy it and want to support more experiments like this, you can buy us a coffee. Thank you for trying rb-nano; we hope you like it.
rb-super might come
We plan to make rb-super a 120-million-parameter model.