Text Generation
Transformers
Safetensors
PyTorch
English
gpt2
small-language-model
text-generation-inference
Instructions to use leopard-Ai/rb-nano with libraries, inference providers, notebooks, and local apps.
- Libraries
- Transformers
How to use leopard-Ai/rb-nano with Transformers:

    # Use a pipeline as a high-level helper
    from transformers import pipeline

    pipe = pipeline("text-generation", model="leopard-Ai/rb-nano")

    # Load model directly
    from transformers import AutoTokenizer, AutoModelForCausalLM

    tokenizer = AutoTokenizer.from_pretrained("leopard-Ai/rb-nano")
    model = AutoModelForCausalLM.from_pretrained("leopard-Ai/rb-nano")

- Notebooks
- Google Colab
- Kaggle
- Local Apps
- vLLM
How to use leopard-Ai/rb-nano with vLLM:
Install from pip and serve model

    # Install vLLM from pip:
    pip install vllm

    # Start the vLLM server:
    vllm serve "leopard-Ai/rb-nano"

    # Call the server using curl (OpenAI-compatible API):
    curl -X POST "http://localhost:8000/v1/completions" \
      -H "Content-Type: application/json" \
      --data '{
        "model": "leopard-Ai/rb-nano",
        "prompt": "Once upon a time,",
        "max_tokens": 512,
        "temperature": 0.5
      }'

Use Docker

    docker model run hf.co/leopard-Ai/rb-nano

- SGLang
How to use leopard-Ai/rb-nano with SGLang:
Install from pip and serve model

    # Install SGLang from pip:
    pip install sglang

    # Start the SGLang server:
    python3 -m sglang.launch_server \
      --model-path "leopard-Ai/rb-nano" \
      --host 0.0.0.0 \
      --port 30000

    # Call the server using curl (OpenAI-compatible API):
    curl -X POST "http://localhost:30000/v1/completions" \
      -H "Content-Type: application/json" \
      --data '{
        "model": "leopard-Ai/rb-nano",
        "prompt": "Once upon a time,",
        "max_tokens": 512,
        "temperature": 0.5
      }'

Use Docker images

    docker run --gpus all \
      --shm-size 32g \
      -p 30000:30000 \
      -v ~/.cache/huggingface:/root/.cache/huggingface \
      --env "HF_TOKEN=<secret>" \
      --ipc=host \
      lmsysorg/sglang:latest \
      python3 -m sglang.launch_server \
        --model-path "leopard-Ai/rb-nano" \
        --host 0.0.0.0 \
        --port 30000

- Docker Model Runner
How to use leopard-Ai/rb-nano with Docker Model Runner:

    docker model run hf.co/leopard-Ai/rb-nano

Upload folder using huggingface_hub
- README.md +96 -0
- config.json +35 -0
- generation_config.json +10 -0
- model.safetensors +3 -0
- tokenizer.json +0 -0
- tokenizer_config.json +9 -0
README.md
ADDED
@@ -0,0 +1,96 @@
---
license: apache-2.0
language:
- en
library_name: transformers
pipeline_tag: text-generation
tags:
- text-generation
- gpt2
- small-language-model
- pytorch
- safetensors
---

# rb-nano

A 48M-parameter, GPT-2-style decoder-only transformer trained from scratch as part of the **Leopard AI Model Suite**. Small enough to run on CPU or any GPU; built as a learning/research model, not a production assistant.

> Looking for the quantized build? See the **GGUF** repo (`rb-nano-GGUF`) for `llama.cpp` / Ollama.

## Usage

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model = AutoModelForCausalLM.from_pretrained("rafi-dev/rb-nano")
tok = AutoTokenizer.from_pretrained("rafi-dev/rb-nano")

# The prompt follows the training format: <sos>, then user:/ai: turns.
prompt = "<sos>user: hello\nai:"
ids = tok(prompt, return_tensors="pt").input_ids
out = model.generate(
    ids, max_new_tokens=64, do_sample=True,
    temperature=0.7, top_k=40, top_p=0.9, repetition_penalty=1.3,
    eos_token_id=2, pad_token_id=0,  # same ids as in config.json
)
# out[0] holds the prompt tokens followed by the newly generated tokens.
print(tok.decode(out[0], skip_special_tokens=True))
```

It loads as a standard `GPT2LMHeadModel`, so **no `trust_remote_code` is required**.

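Because `generate` returns the prompt followed by the continuation, the decoded string contains the whole conversation rather than just the new reply. The sketch below shows one way to pull out the newest `ai:` turn. It assumes the `user:` / `ai:` markers survive decoding unchanged and that the decoded text begins with the prompt (minus special tokens); `extract_reply` is a hypothetical helper, not part of Transformers or of this repository.

```python
def extract_reply(decoded: str, prompt: str) -> str:
    """Return only the text of the newest `ai:` turn in `decoded`."""
    # The prompt ends with an open "ai:" marker, so the reply we want
    # follows the N-th "ai:" where N is the number of markers in the prompt.
    n_turns = prompt.count("ai:")
    parts = decoded.split("ai:", n_turns)
    if len(parts) <= n_turns:
        return ""  # the decoded text does not contain the expected markers
    reply = parts[n_turns]
    # A small model may keep going and invent the next user turn; cut it off.
    reply = reply.split("\nuser:", 1)[0]
    return reply.strip()


if __name__ == "__main__":
    prompt = "<sos>user: hello\nai:"
    decoded = "user: hello\nai: Hi there!\nuser: what"
    print(extract_reply(decoded, prompt))
```

Slicing the output tensor at the prompt length before decoding is an alternative that avoids string matching, though the cut at the next `user:` marker is still useful.
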
## Prompt format

Trained on a simple `user:` / `ai:` turn format, prefixed with the `<sos>` token:

```
<sos>user: hello
ai: Hi there! How can I help you today?
user: what is python?
ai:
```

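The turn format above can be assembled programmatically. This is a minimal sketch, not the authors' code: `build_prompt` and its arguments are hypothetical names, and it assumes turns are separated by a single newline with no extra separator token, as the example shows.

```python
from typing import List, Tuple

SOS = "<sos>"  # start-of-sequence token shown in the prompt format


def build_prompt(history: List[Tuple[str, str]], user_message: str) -> str:
    """Render finished (user, ai) turns plus a new user message.

    The result ends with an open `ai:` marker so the model continues
    with the assistant's reply.
    """
    lines = []
    for user_text, ai_text in history:
        lines.append(f"user: {user_text}")
        lines.append(f"ai: {ai_text}")
    lines.append(f"user: {user_message}")
    lines.append("ai:")
    return SOS + "\n".join(lines)


if __name__ == "__main__":
    history = [("hello", "Hi there! How can I help you today?")]
    print(build_prompt(history, "what is python?"))
```

With an empty history and the message `hello`, this yields the same string used in the Usage snippet. Whether the tokenizer also inserts `<sos>` on its own is not stated in the README, so it may be worth checking the encoded ids for a doubled start token.
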
## Architecture

| Property | Value |
|---|---|
| Type | Decoder-only transformer (GPT-2 family) |
| Parameters | ~48M |
| Embedding dim (`n_embd`) | 512 |
| Layers | 10 |
| Attention heads | 8 |
| Context length | 1024 tokens |
| Position embeddings | Learned |
| Norm / activation | LayerNorm, GELU-tanh (`gelu_new`) |
| Head | Weight-tied to token embeddings |
| Tokenizer | ByteLevel BPE, 32k vocab |
| Format | safetensors (fp32) |

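The ~48M figure can be reproduced from the table, together with `vocab_size` (32000) and `n_inner` (2048) from `config.json`. The sketch below assumes the standard GPT-2 layout: fused QKV projection, biases on every linear layer, two LayerNorms per block plus a final one, and a tied output head that adds no parameters. `gpt2_param_count` is a hypothetical helper name.

```python
def gpt2_param_count(vocab_size: int, n_positions: int, n_embd: int,
                     n_layer: int, n_inner: int) -> int:
    """Count trainable parameters of a GPT-2-style model with a tied LM head."""
    # Token embeddings plus learned position embeddings.
    embeddings = vocab_size * n_embd + n_positions * n_embd
    # Attention: fused QKV projection and output projection, each with bias.
    attn = (n_embd * 3 * n_embd + 3 * n_embd) + (n_embd * n_embd + n_embd)
    # MLP: up-projection to n_inner and down-projection back, each with bias.
    mlp = (n_embd * n_inner + n_inner) + (n_inner * n_embd + n_embd)
    # Two LayerNorms per block, each with a weight and a bias vector.
    layer_norms = 2 * (2 * n_embd)
    per_layer = attn + mlp + layer_norms
    final_ln = 2 * n_embd
    return embeddings + n_layer * per_layer + final_ln


if __name__ == "__main__":
    total = gpt2_param_count(vocab_size=32000, n_positions=1024,
                             n_embd=512, n_layer=10, n_inner=2048)
    print(total)            # 48433152
    print(total * 4)        # fp32 bytes: 193732608
```

About a third of the parameters (16.4M) sit in the token embedding matrix, which is typical for models this small with a 32k vocabulary.
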
## Training

- **Pretrain**: FineWeb-Edu (`sample-10BT`), ~50M tokens. Final val loss ≈ 3.44.
- **Finetune**: Alpaca, Alpaca-cleaned, CodeAlpaca-20k, Dolly-15k, and ShareGPT (full multi-turn threads, loss masked to assistant turns only). Final val loss ≈ 2.67.

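Masking the loss to assistant turns is commonly done by setting the labels of all other tokens to `-100`, the default `ignore_index` of PyTorch's cross-entropy loss. The authors' training code is not shown, so the following is only a sketch of that common technique; `mask_to_assistant`, the role names, and the token ids are hypothetical.

```python
from typing import List, Tuple

IGNORE_INDEX = -100  # default ignore_index of torch.nn.CrossEntropyLoss


def mask_to_assistant(
    segments: List[Tuple[str, List[int]]],
) -> Tuple[List[int], List[int]]:
    """Flatten (role, token_ids) segments into input_ids and masked labels.

    Tokens from "ai" segments keep their ids as labels; every other token
    gets IGNORE_INDEX so it contributes nothing to the loss.
    """
    input_ids: List[int] = []
    labels: List[int] = []
    for role, token_ids in segments:
        input_ids.extend(token_ids)
        if role == "ai":
            labels.extend(token_ids)
        else:
            labels.extend([IGNORE_INDEX] * len(token_ids))
    return input_ids, labels


if __name__ == "__main__":
    thread = [("user", [5, 6, 7]), ("ai", [8, 9]),
              ("user", [10]), ("ai", [11, 12, 13])]
    ids, labels = mask_to_assistant(thread)
    print(ids)     # [5, 6, 7, 8, 9, 10, 11, 12, 13]
    print(labels)  # [-100, -100, -100, 8, 9, -100, 11, 12, 13]
```

The model still sees the user turns as context; they are just excluded from the gradient. Whether the `ai:` marker tokens themselves were masked is not stated in the README.
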
## Recommended parameters

```
temperature     0.7
top_k           40
top_p           0.9
repeat_penalty  1.3
```

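To make the four settings concrete, here is a pure-Python sketch of how they usually act on one step's logits: repetition penalty first, then temperature, top-k, and nucleus (top-p) filtering. It follows the common definitions rather than the exact code of any library, and `sample_next` with its dict-of-logits input is a hypothetical interface chosen for readability.

```python
import math
import random
from typing import Dict, List, Optional


def sample_next(logits: Dict[int, float], previous: List[int],
                temperature: float = 0.7, top_k: int = 40,
                top_p: float = 0.9, repeat_penalty: float = 1.3,
                rng: Optional[random.Random] = None) -> int:
    """Pick the next token id from `logits` (token id -> raw score)."""
    rng = rng or random.Random()
    scores = dict(logits)
    # 1. Repetition penalty: push already-generated tokens towards "unlikely".
    for token in set(previous):
        if token in scores:
            s = scores[token]
            scores[token] = s / repeat_penalty if s > 0 else s * repeat_penalty
    # 2. Temperature below 1 sharpens the distribution.
    scores = {t: s / temperature for t, s in scores.items()}
    # 3. Top-k: keep only the k highest-scoring tokens.
    ranked = sorted(scores.items(), key=lambda kv: kv[1], reverse=True)[:top_k]
    # 4. Top-p: softmax over the survivors, then keep the smallest prefix
    #    whose cumulative probability reaches top_p.
    peak = ranked[0][1]
    weights = [math.exp(s - peak) for _, s in ranked]
    total = sum(weights)
    kept_tokens, kept_weights, cumulative = [], [], 0.0
    for (token, _), w in zip(ranked, weights):
        kept_tokens.append(token)
        kept_weights.append(w)
        cumulative += w / total
        if cumulative >= top_p:
            break
    return rng.choices(kept_tokens, weights=kept_weights, k=1)[0]


if __name__ == "__main__":
    toy_logits = {0: 2.0, 1: 1.9, 2: -1.0}  # made-up scores for three tokens
    print(sample_next(toy_logits, previous=[0], rng=random.Random(0)))
```

A penalty of 1.3 is fairly strong: in the toy example it is enough to drop token 0 below token 1 once token 0 has appeared in the output.
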
## Limitations

- **Knowledge.** At 48M params the model has very limited factual knowledge and will confidently hallucinate (made-up libraries, wrong dates, etc.). It cannot be a reliable source of facts.
- **Coherence.** Good for short exchanges; longer or more technical answers drift.
- **Scope.** English-centric, 1024-token context. Best for demos, experimentation, and edge/CPU inference, not production use.

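The 1024-token limit covers the prompt and the generated tokens together, so a long chat history has to be trimmed before generation. A minimal sketch of one trimming policy, assuming the first prompt token is `<sos>` and should be kept; `fit_to_context` is a hypothetical helper and the policy (keep the most recent tokens) is only one reasonable choice.

```python
from typing import List

N_POSITIONS = 1024  # context length stated above


def fit_to_context(prompt_ids: List[int], max_new_tokens: int,
                   n_positions: int = N_POSITIONS) -> List[int]:
    """Trim prompt_ids so prompt plus generation fits in the context window."""
    budget = n_positions - max_new_tokens
    if budget < 1:
        raise ValueError("max_new_tokens leaves no room for the prompt")
    if len(prompt_ids) <= budget:
        return list(prompt_ids)
    # Keep the first token (assumed to be <sos>) and the most recent tokens.
    tail = budget - 1
    recent = prompt_ids[-tail:] if tail > 0 else []
    return prompt_ids[:1] + recent


if __name__ == "__main__":
    long_prompt = list(range(2000))  # stand-in token ids
    trimmed = fit_to_context(long_prompt, max_new_tokens=64)
    print(len(trimmed))  # 960
```

Cutting at an arbitrary token can split a turn in half; trimming whole `user:` / `ai:` turns before tokenizing is gentler on the prompt format.
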
## License / attribution

Trained on publicly available datasets (FineWeb-Edu, Alpaca, Dolly, CodeAlpaca, ShareGPT). Review each dataset's license before redistributing derived outputs.

## Made with care

rb-nano was built by **Rafi** and **Buddi**, pretrained and finetuned from scratch on a single **RTX 4070 (8 GB VRAM)**. It's a passion project: proof that a coherent little chat model can be trained end-to-end on consumer hardware.

If you enjoy it and want to support more experiments like this, you can [buy us a coffee ☕](https://ko-fi.com/leopardAi). Thank you for trying rb-nano; we hope you like it.
config.json
ADDED
@@ -0,0 +1,35 @@
{
  "activation_function": "gelu_new",
  "add_cross_attention": false,
  "architectures": [
    "GPT2LMHeadModel"
  ],
  "attn_pdrop": 0.0,
  "bos_token_id": 1,
  "dtype": "float32",
  "embd_pdrop": 0.0,
  "eos_token_id": 2,
  "initializer_range": 0.02,
  "layer_norm_epsilon": 1e-05,
  "model_type": "gpt2",
  "n_ctx": 1024,
  "n_embd": 512,
  "n_head": 8,
  "n_inner": 2048,
  "n_layer": 10,
  "n_positions": 1024,
  "pad_token_id": 0,
  "reorder_and_upcast_attn": false,
  "resid_pdrop": 0.0,
  "scale_attn_by_inverse_layer_idx": false,
  "scale_attn_weights": true,
  "summary_activation": null,
  "summary_first_dropout": 0.1,
  "summary_proj_to_labels": true,
  "summary_type": "cls_index",
  "summary_use_proj": true,
  "tie_word_embeddings": true,
  "transformers_version": "5.9.0",
  "use_cache": true,
  "vocab_size": 32000
}
generation_config.json
ADDED
@@ -0,0 +1,10 @@
{
  "_from_model_config": true,
  "bos_token_id": 1,
  "eos_token_id": 2,
  "output_attentions": false,
  "output_hidden_states": false,
  "pad_token_id": 0,
  "transformers_version": "5.9.0",
  "use_cache": true
}
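This file carries token ids and cache flags but no sampling settings, which is presumably why the README passes them to `generate` explicitly. The sketch below combines the ids from this file with the README's recommended values into one keyword dictionary. It assumes the README's `repeat_penalty` corresponds to the `repetition_penalty` argument used in its Usage snippet; `generate_kwargs` and `RENAMES` are hypothetical names.

```python
import json

# Relevant subset of generation_config.json from this commit.
GENERATION_CONFIG = json.loads(
    '{"bos_token_id": 1, "eos_token_id": 2, "pad_token_id": 0}'
)

# Values from the README's "Recommended parameters" section.
RECOMMENDED = {"temperature": 0.7, "top_k": 40, "top_p": 0.9,
               "repeat_penalty": 1.3}

# Assumed mapping from the README's name to the generate() argument name.
RENAMES = {"repeat_penalty": "repetition_penalty"}


def generate_kwargs(gen_config: dict, recommended: dict,
                    max_new_tokens: int = 64) -> dict:
    """Build keyword arguments for model.generate(input_ids, **kwargs)."""
    kwargs = {RENAMES.get(name, name): value
              for name, value in recommended.items()}
    kwargs["do_sample"] = True  # sampling settings have no effect without it
    kwargs["max_new_tokens"] = max_new_tokens
    for key in ("eos_token_id", "pad_token_id"):
        kwargs[key] = gen_config[key]
    return kwargs


if __name__ == "__main__":
    print(generate_kwargs(GENERATION_CONFIG, RECOMMENDED))
```

Keeping the values in one place avoids the sampling settings and the token ids drifting apart between scripts.
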
model.safetensors
ADDED
@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:f7bf03d75a6a339bd5d746e7658791d3184d8e58296c44f3d077789f1b05b2bb
size 193745016
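The weights are stored through Git LFS, so the diff shows only a three-line pointer: spec version, SHA-256 object id, and size in bytes. A small sketch of reading it follows; `parse_lfs_pointer` is a hypothetical helper, and dividing the size by 4 gives only an upper bound on the fp32 parameter count, because the safetensors file also contains a header.

```python
POINTER = """version https://git-lfs.github.com/spec/v1
oid sha256:f7bf03d75a6a339bd5d746e7658791d3184d8e58296c44f3d077789f1b05b2bb
size 193745016
"""


def parse_lfs_pointer(text: str) -> dict:
    """Parse the `key value` lines of a Git LFS pointer file."""
    fields = dict(line.split(" ", 1) for line in text.strip().splitlines())
    hash_algo, digest = fields["oid"].split(":", 1)
    return {
        "version": fields["version"],
        "hash_algo": hash_algo,
        "digest": digest,
        "size": int(fields["size"]),
    }


if __name__ == "__main__":
    info = parse_lfs_pointer(POINTER)
    print(info["hash_algo"], len(info["digest"]))   # sha256 64
    print(round(info["size"] / 2**20, 2), "MiB")    # 184.77 MiB
    # fp32 uses 4 bytes per parameter, so this bounds the parameter count.
    print(info["size"] // 4)                        # 48436254
```

The bound of about 48.4M parameters is consistent with the ~48M stated in the README.
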
tokenizer.json
ADDED
The diff for this file is too large to render.
tokenizer_config.json
ADDED
@@ -0,0 +1,9 @@
{
  "backend": "tokenizers",
  "bos_token": "<sos>",
  "eos_token": "<eos>",
  "model_max_length": 1024,
  "pad_token": "<pad>",
  "tokenizer_class": "TokenizersBackend",
  "unk_token": "<unk>"
}