Upload folder using huggingface_hub

Files changed (6)

README.md +96 -0
config.json +35 -0
generation_config.json +10 -0
model.safetensors +3 -0
tokenizer.json +0 -0
tokenizer_config.json +9 -0

README.md ADDED

	@@ -0,0 +1,96 @@

+---
+license: apache-2.0
+language:
+  - en
+library_name: transformers
+pipeline_tag: text-generation
+tags:
+  - text-generation
+  - gpt2
+  - small-language-model
+  - pytorch
+  - safetensors
+---
+# rb-nano
+A 48M-parameter, GPT-2-style decoder-only transformer trained from scratch as part of the **Leopard AI Model Suite**. Small enough to run on CPU or any GPU; built as a learning/research model, not a production assistant.
+> Looking for the quantized build? See the **GGUF** repo (`rb-nano-GGUF`) for `llama.cpp` / Ollama.
+## Usage
+```python
+from transformers import AutoModelForCausalLM, AutoTokenizer
+# Load weights and tokenizer from the Hub; no custom code is involved.
+model = AutoModelForCausalLM.from_pretrained("rafi-dev/rb-nano")
+tok = AutoTokenizer.from_pretrained("rafi-dev/rb-nano")
+# The prompt ends with an open "ai:" turn for the model to complete.
+prompt = "<sos>user: hello\nai:"
+ids = tok(prompt, return_tensors="pt").input_ids
+out = model.generate(
+    ids, max_new_tokens=64, do_sample=True,
+    temperature=0.7, top_k=40, top_p=0.9, repetition_penalty=1.3,
+    eos_token_id=2, pad_token_id=0,  # <eos> and <pad> ids from config.json
+)
+# out[0] holds the prompt followed by the generated tokens.
+print(tok.decode(out[0], skip_special_tokens=True))
+```
+It loads as a standard `GPT2LMHeadModel` — **no `trust_remote_code` required**.
+## Prompt format
+Trained on a simple `user:` / `ai:` turn format, prefixed with the `<sos>` token:
+```
+<sos>user: hello
+ai: Hi there! How can I help you today?
+user: what is python?
+ai:
+```
+## Architecture
+| Property | Value |
+|---|---|
+| Type | Decoder-only transformer (GPT-2 family) |
+| Parameters | ~48M |
+| Embedding dim (`n_embd`) | 512 |
+| Layers | 10 |
+| Attention heads | 8 |
+| Context length | 1024 tokens |
+| Position embeddings | Learned |
+| Norm / activation | LayerNorm, GELU-tanh (`gelu_new`) |
+| Head | Weight-tied to token embeddings |
+| Tokenizer | ByteLevel BPE, 32k vocab |
+| Format | safetensors (fp32) |
+## Training
+- **Pretrain** — FineWeb-Edu (`sample-10BT`), ~50M tokens. Final val loss ≈ 3.44.
+- **Finetune** — Alpaca, Alpaca-cleaned, CodeAlpaca-20k, Dolly-15k, and ShareGPT (full multi-turn threads, loss masked to assistant turns only). Final val loss ≈ 2.67.
+## Recommended parameters
+```
+temperature     0.7
+top_k           40
+top_p           0.9
+repeat_penalty  1.3
+```
+## Limitations
+- **Knowledge.** At 48M parameters the model holds very little factual knowledge and will confidently hallucinate (made-up libraries, wrong dates, etc.). Do not rely on it as a source of facts.
+- **Coherence.** Good for short exchanges; longer or more technical answers tend to drift.
+- **Scope.** English-centric, 1024-token context. Best for demos, experimentation, and edge/CPU inference, not production use.
+## License / attribution
+Trained on publicly available datasets (FineWeb-Edu, Alpaca, Dolly, CodeAlpaca, ShareGPT). Review each dataset's license before redistributing derived outputs.
+## Made with care
+rb-nano was built by **Rafi** and **Buddi** — pretrained and finetuned from scratch on a single **RTX 4070 (8 GB VRAM)**. It's a passion project: proof that a coherent little chat model can be trained end-to-end on consumer hardware.
+If you enjoy it and want to support more experiments like this, you can [buy us a coffee ☕](https://ko-fi.com/leopardAi). Thank you for trying rb-nano — we hope you like it.
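
The `user:` / `ai:` turn format described in the README's "Prompt format" section can be assembled with a small helper. This is only an illustrative sketch: `build_prompt` and its `(role, text)` turn representation are hypothetical names, not part of the repo, and it assumes turns are separated by a single newline, as in the README example.

```python
from typing import Iterable, Tuple

SOS = "<sos>"  # bos_token listed in tokenizer_config.json


def build_prompt(turns: Iterable[Tuple[str, str]]) -> str:
    """Render (role, text) turns in the README's `user:` / `ai:` format.

    Hypothetical helper, not part of the repo. The result always ends with
    an open `ai:` line so the model continues as the assistant.
    """
    lines = []
    for role, text in turns:
        if role not in ("user", "ai"):
            raise ValueError(f"unknown role: {role!r}")
        lines.append(f"{role}: {text.strip()}")
    lines.append("ai:")  # open assistant turn for the model to complete
    return SOS + "\n".join(lines)


single = build_prompt([("user", "hello")])
multi = build_prompt([
    ("user", "hello"),
    ("ai", "Hi there! How can I help you today?"),
    ("user", "what is python?"),
])
print(multi)
```

With a single user turn the helper reproduces the prompt string used in the README's Usage snippet; the three-turn call reproduces the multi-turn example.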

config.json ADDED

	@@ -0,0 +1,35 @@

+{
+  "activation_function": "gelu_new",
+  "add_cross_attention": false,
+  "architectures": [
+    "GPT2LMHeadModel"
+  ],
+  "attn_pdrop": 0.0,
+  "bos_token_id": 1,
+  "dtype": "float32",
+  "embd_pdrop": 0.0,
+  "eos_token_id": 2,
+  "initializer_range": 0.02,
+  "layer_norm_epsilon": 1e-05,
+  "model_type": "gpt2",
+  "n_ctx": 1024,
+  "n_embd": 512,
+  "n_head": 8,
+  "n_inner": 2048,
+  "n_layer": 10,
+  "n_positions": 1024,
+  "pad_token_id": 0,
+  "reorder_and_upcast_attn": false,
+  "resid_pdrop": 0.0,
+  "scale_attn_by_inverse_layer_idx": false,
+  "scale_attn_weights": true,
+  "summary_activation": null,
+  "summary_first_dropout": 0.1,
+  "summary_proj_to_labels": true,
+  "summary_type": "cls_index",
+  "summary_use_proj": true,
+  "tie_word_embeddings": true,
+  "transformers_version": "5.9.0",
+  "use_cache": true,
+  "vocab_size": 32000
+}
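
The README's "~48M parameters" figure can be cross-checked against the dimensions in this config. The sketch below counts weights for a standard GPT-2 layout (fused QKV projection, biases on every linear layer, two LayerNorms per block plus a final one). The function name `gpt2_param_count` is hypothetical, and the count assumes the checkpoint follows that standard layout.

```python
import json

# Subset of config.json that determines the parameter count.
config = json.loads("""
{"n_embd": 512, "n_head": 8, "n_inner": 2048, "n_layer": 10,
 "n_positions": 1024, "vocab_size": 32000, "tie_word_embeddings": true}
""")


def gpt2_param_count(cfg: dict) -> int:
    """Count trainable weights for a standard GPT-2 layout (hypothetical helper)."""
    d, inner = cfg["n_embd"], cfg["n_inner"]
    wte = cfg["vocab_size"] * d          # token embeddings
    wpe = cfg["n_positions"] * d         # learned position embeddings
    layer_norm = 2 * d                   # weight + bias
    attn = (d * 3 * d + 3 * d) + (d * d + d)     # fused QKV + output projection
    mlp = (d * inner + inner) + (inner * d + d)  # up- and down-projection
    block = 2 * layer_norm + attn + mlp
    total = wte + wpe + cfg["n_layer"] * block + layer_norm  # plus final LayerNorm
    if not cfg["tie_word_embeddings"]:
        total += cfg["vocab_size"] * d   # separate LM head when not weight-tied
    return total


total = gpt2_param_count(config)
print(f"{total:,} parameters, {total * 4:,} bytes in fp32")
# prints: 48,433,152 parameters, 193,732,608 bytes in fp32
```

The fp32 byte count is 12,408 bytes below the 193,745,016-byte size recorded for `model.safetensors` in this commit. That gap would be consistent with the safetensors header, though this is not verified here.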

generation_config.json ADDED

	@@ -0,0 +1,10 @@

+{
+  "_from_model_config": true,
+  "bos_token_id": 1,
+  "eos_token_id": 2,
+  "output_attentions": false,
+  "output_hidden_states": false,
+  "pad_token_id": 0,
+  "transformers_version": "5.9.0",
+  "use_cache": true
+}
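
This file carries only token ids and cache/output flags, with no sampling values, so the README's recommended parameters presumably have to be passed to `generate()` explicitly, as the README's Usage snippet does. A minimal sketch of combining the two follows; `RECOMMENDED`, `RENAMES`, and `generate_kwargs` are hypothetical names. Note that the README's parameter table spells the penalty `repeat_penalty` (the llama.cpp / Ollama name), while the Usage snippet passes the same value to `transformers` as `repetition_penalty`.

```python
import json

# generation_config.json as committed (version field omitted).
generation_config = json.loads("""
{"_from_model_config": true, "bos_token_id": 1, "eos_token_id": 2,
 "output_attentions": false, "output_hidden_states": false,
 "pad_token_id": 0, "use_cache": true}
""")

# README "Recommended parameters", keyed by the names used in that table.
RECOMMENDED = {"temperature": 0.7, "top_k": 40, "top_p": 0.9, "repeat_penalty": 1.3}

# README table name -> keyword used in the README's generate() call.
RENAMES = {"repeat_penalty": "repetition_penalty"}


def generate_kwargs(gen_cfg: dict, recommended: dict) -> dict:
    """Combine token ids from the config with explicit sampling settings."""
    kwargs = {k: gen_cfg[k] for k in ("eos_token_id", "pad_token_id")}
    kwargs.update({RENAMES.get(k, k): v for k, v in recommended.items()})
    kwargs["do_sample"] = True  # temperature/top_k/top_p only apply when sampling
    return kwargs


kwargs = generate_kwargs(generation_config, RECOMMENDED)
print(kwargs)
```

The resulting dict could then be splatted into a call such as `model.generate(ids, max_new_tokens=64, **kwargs)`.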

model.safetensors ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:f7bf03d75a6a339bd5d746e7658791d3184d8e58296c44f3d077789f1b05b2bb
+size 193745016
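
The three added lines above are a Git LFS pointer rather than the weights themselves: they record the pointer spec version, the SHA-256 of the real object, and its size in bytes. As a rough sanity check, the sketch below parses the pointer and divides the size by 4 bytes per fp32 weight, giving an upper bound on the parameter count. `parse_lfs_pointer` is a hypothetical helper, and the bound ignores the safetensors header.

```python
POINTER = """\
version https://git-lfs.github.com/spec/v1
oid sha256:f7bf03d75a6a339bd5d746e7658791d3184d8e58296c44f3d077789f1b05b2bb
size 193745016
"""


def parse_lfs_pointer(text: str) -> dict:
    """Split each `key value` line of a Git LFS pointer into a dict."""
    fields = dict(line.split(" ", 1) for line in text.splitlines() if line)
    algo, digest = fields["oid"].split(":", 1)
    return {
        "version": fields["version"],
        "hash_algo": algo,
        "digest": digest,
        "size": int(fields["size"]),
    }


ptr = parse_lfs_pointer(POINTER)
max_fp32_params = ptr["size"] // 4  # 4 bytes per fp32 weight; header not subtracted
print(ptr["hash_algo"], ptr["size"], max_fp32_params)
# prints: sha256 193745016 48436254
```

About 48.4M fp32 values fit in the file, which agrees with the README's "~48M parameters" and "safetensors (fp32)" entries.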

tokenizer.json ADDED Viewed

The diff for this file is too large to render.

tokenizer_config.json ADDED

	@@ -0,0 +1,9 @@

+{
+  "backend": "tokenizers",
+  "bos_token": "<sos>",
+  "eos_token": "<eos>",
+  "model_max_length": 1024,
+  "pad_token": "<pad>",
+  "tokenizer_class": "TokenizersBackend",
+  "unk_token": "<unk>"
+}
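
The special-token strings here and the numeric ids in `config.json` can be lined up by role name (`bos_token` with `bos_token_id`, and so on). The sketch below does that pairing; `special_token_ids` is a hypothetical helper. The pairing is an inference from the two config files only: the actual vocabulary lives in `tokenizer.json`, whose diff is not rendered in this commit view, so the ids are not confirmed against it.

```python
import json

# Special-token strings from tokenizer_config.json.
tokenizer_config = json.loads("""
{"bos_token": "<sos>", "eos_token": "<eos>", "pad_token": "<pad>", "unk_token": "<unk>"}
""")

# Token ids from config.json (no unk id is listed there).
model_config = json.loads('{"bos_token_id": 1, "eos_token_id": 2, "pad_token_id": 0}')


def special_token_ids(tok_cfg: dict, model_cfg: dict) -> dict:
    """Map each special-token string to the id config.json gives for its role."""
    pairs = {}
    for key, token in tok_cfg.items():
        if not key.endswith("_token"):
            continue
        # None when config.json lists no id for this role.
        pairs[token] = model_cfg.get(f"{key}_id")
    return pairs


ids = special_token_ids(tokenizer_config, model_config)
print(ids)
# prints: {'<sos>': 1, '<eos>': 2, '<pad>': 0, '<unk>': None}
```

Under this pairing, the `eos_token_id=2, pad_token_id=0` arguments in the README's Usage snippet correspond to `<eos>` and `<pad>`.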