Upload folder using huggingface_hub

- README.md +89 -0
- config.json +17 -0
- generation_config.json +9 -0
- pytorch_model.bin +3 -0
- token_bytes.pt +3 -0
- tokenizer.pkl +3 -0

README.md
ADDED
---
tags:
- nanochat
- gpt
- language-model
- rl
license: mit
---

# nanochat model - RL stage

This is a nanochat model trained using [Andrej Karpathy's nanochat](https://github.com/karpathy/nanochat).

## Model Details

- **Model type**: GPT-style transformer
- **Training stage**: RL
- **Parameters**: 560,988,160
- **Architecture**:
  - Layers: 20
  - Embedding dim: 1280
  - Attention heads: 10
  - KV heads (GQA): 10
  - Context length: 2048
  - Vocab size: 65536

## Training Info

- **Training step**: N/A
- **Validation BPB**: N/A

## Architecture Highlights

- ✅ RoPE (Rotary Position Embeddings)
- ✅ QK Normalization
- ✅ ReLU² activation (not GELU)
- ✅ Untied embeddings
- ✅ No bias terms
- ✅ Logit softcapping
- ✅ Group Query Attention (GQA)
## Usage

This model uses a custom architecture from nanochat, so it is loaded with nanochat's own tooling rather than `transformers`. First fetch the code and checkpoint:

```bash
# Clone the nanochat repo
git clone https://github.com/karpathy/nanochat.git
cd nanochat
# Download this checkpoint into the location nanochat expects
```

Then load it using nanochat's checkpoint manager:

```python
from nanochat.checkpoint_manager import load_model
from nanochat.engine import Engine

model, tokenizer, meta = load_model("rl", "cuda", phase="eval")
engine = Engine(model, tokenizer)

# Generate text
prompt = "The capital of France is"
tokens = tokenizer(prompt, prepend="<|bos|>")
completions, _ = engine.generate_batch(tokens, num_samples=1, max_tokens=50, temperature=0.7)
print(tokenizer.decode(completions[0]))
```
## Special Tokens

The model uses these special tokens for chat:

- `<|bos|>` - Beginning of sequence
- `<|user_start|>`, `<|user_end|>` - User messages
- `<|assistant_start|>`, `<|assistant_end|>` - Assistant messages
- `<|python_start|>`, `<|python_end|>` - Python tool calls
- `<|output_start|>`, `<|output_end|>` - Tool outputs
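A single-turn prompt can be built by concatenating these tokens around the message text. The exact template nanochat renders is not shown on this card, so the helper below is a hypothetical illustration of the token order, not the canonical implementation:

```python
def render_user_turn(message: str) -> str:
    """Hypothetical: render one user turn, leaving the assistant span open
    so the model generates the reply after <|assistant_start|>."""
    return (
        "<|bos|>"
        f"<|user_start|>{message}<|user_end|>"
        "<|assistant_start|>"
    )
```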
## Citation

```bibtex
@misc{nanochat,
  author = {Andrej Karpathy},
  title = {nanochat: The best ChatGPT that $100 can buy},
  year = {2025},
  publisher = {GitHub},
  url = {https://github.com/karpathy/nanochat}
}
```

## License

MIT License - Same as the nanochat repository.
config.json
ADDED
{
  "architectures": [
    "NanoChatGPT"
  ],
  "model_type": "nanochat-gpt",
  "vocab_size": 65536,
  "n_layer": 20,
  "n_head": 10,
  "n_kv_head": 10,
  "n_embd": 1280,
  "sequence_len": 2048,
  "torch_dtype": "bfloat16",
  "transformers_version": "custom",
  "nanochat_source": "rl",
  "nanochat_training_step": null,
  "nanochat_val_bpb": null
}
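A couple of quantities follow directly from these fields; in particular, with `n_kv_head` equal to `n_head` the GQA setup here is degenerate (each query head has its own KV head, i.e. plain multi-head attention). A small sketch using the values from config.json:

```python
import json

# Fields copied from config.json above.
cfg = json.loads("""
{
  "vocab_size": 65536,
  "n_layer": 20,
  "n_head": 10,
  "n_kv_head": 10,
  "n_embd": 1280,
  "sequence_len": 2048
}
""")

head_dim = cfg["n_embd"] // cfg["n_head"]      # per-head dimension
gqa_ratio = cfg["n_head"] // cfg["n_kv_head"]  # query heads per KV head
print(head_dim, gqa_ratio)
```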
generation_config.json
ADDED
{
  "bos_token_id": 65527,
  "eos_token_id": 65531,
  "pad_token_id": 65531,
  "max_length": 2048,
  "do_sample": true,
  "temperature": 0.7,
  "top_k": 50
}
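The `do_sample`, `temperature`, and `top_k` fields describe standard temperature plus top-k sampling. A minimal pure-Python sketch of that procedure (not nanochat's actual implementation):

```python
import math
import random

def sample_top_k(logits, temperature=0.7, top_k=50, rng=random):
    """Draw one token index: keep the top_k highest logits,
    apply temperature, softmax, then sample from the result."""
    # Indices of the top_k highest logits.
    top = sorted(range(len(logits)), key=lambda i: logits[i], reverse=True)[:top_k]
    # Temperature-scaled softmax over the kept logits (max-subtracted for stability).
    scaled = [logits[i] / temperature for i in top]
    m = max(scaled)
    exps = [math.exp(s - m) for s in scaled]
    total = sum(exps)
    probs = [e / total for e in exps]
    # Sample from the truncated distribution via its CDF.
    r = rng.random()
    acc = 0.0
    for idx, p in zip(top, probs):
        acc += p
        if r < acc:
            return idx
    return top[-1]
```

Lower temperatures sharpen the distribution toward the argmax; `top_k` simply zeroes out the tail before normalizing.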
pytorch_model.bin
ADDED
version https://git-lfs.github.com/spec/v1
oid sha256:23d4021f0072ecd4f058179304f6e070657a9ebeb55cff056beed1773d753081
size 2076228683
token_bytes.pt
ADDED
version https://git-lfs.github.com/spec/v1
oid sha256:ae39c27aae519d14071efc95f9a558ba0b7ede47e7d83ad4f198422b44c5f70e
size 263721
tokenizer.pkl
ADDED
version https://git-lfs.github.com/spec/v1
oid sha256:4c060565a46fe83b49d99005acba796f2a630daa7970eb49f7513b89f9fb40e0
size 846208
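The weight and tokenizer files above are Git LFS pointers: the `oid` line records the SHA-256 of the real payload, so a downloaded file can be checked against it. A small verification sketch:

```python
import hashlib

def sha256_of_file(path: str, chunk_size: int = 1 << 20) -> str:
    """Stream a file in chunks and return its hex SHA-256,
    for comparison against the pointer's oid value."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()
```

Chunked reading keeps memory flat even for the ~2 GB `pytorch_model.bin` payload.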