Upload folder using huggingface_hub

- README.md +89 -0
- config.json +17 -0
- generation_config.json +9 -0
- pytorch_model.bin +3 -0
- token_bytes.pt +3 -0
- tokenizer.pkl +3 -0

README.md
ADDED
---
tags:
- nanochat
- gpt
- language-model
- rl
license: mit
---

# nanochat model - RL stage

This is a nanochat model trained using [Andrej Karpathy's nanochat](https://github.com/karpathy/nanochat).

## Model Details

- **Model type**: GPT-style transformer
- **Training stage**: RL
- **Parameters**: 560,988,160
- **Architecture**:
  - Layers: 20
  - Embedding dim: 1280
  - Attention heads: 10
  - KV heads (GQA): 10
  - Context length: 2048
  - Vocab size: 65536

## Training Info

- **Training step**: N/A
- **Validation BPB**: N/A

## Architecture Highlights

- ✅ RoPE (Rotary Position Embeddings)
- ✅ QK Normalization
- ✅ ReLU² activation (not GELU)
- ✅ Untied embeddings
- ✅ No bias terms
- ✅ Logit softcapping
- ✅ Group Query Attention (GQA)
## Usage

This model uses a custom architecture from nanochat, so it is loaded with nanochat's own tooling rather than `transformers`. First fetch the code and checkpoint:

```bash
# Clone the nanochat repo
git clone https://github.com/karpathy/nanochat.git
cd nanochat
# Download this checkpoint into the location nanochat expects
```

Then load it using nanochat's checkpoint manager:

```python
from nanochat.checkpoint_manager import load_model
from nanochat.engine import Engine

model, tokenizer, meta = load_model("rl", "cuda", phase="eval")
engine = Engine(model, tokenizer)

# Generate text
prompt = "The capital of France is"
tokens = tokenizer(prompt, prepend="<|bos|>")
completions, _ = engine.generate_batch(tokens, num_samples=1, max_tokens=50, temperature=0.7)
print(tokenizer.decode(completions[0]))
```
## Special Tokens

The model uses these special tokens for chat:

- `<|bos|>` - Beginning of sequence
- `<|user_start|>`, `<|user_end|>` - User messages
- `<|assistant_start|>`, `<|assistant_end|>` - Assistant messages
- `<|python_start|>`, `<|python_end|>` - Python tool calls
- `<|output_start|>`, `<|output_end|>` - Tool outputs
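A single-turn prompt can be built by concatenating these tokens around the message text. The exact template nanochat renders is not shown on this card, so the helper below is a hypothetical illustration of the token order, not the canonical implementation:

```python
def render_user_turn(message: str) -> str:
    """Hypothetical: render one user turn, leaving the assistant span open
    so the model generates the reply after <|assistant_start|>."""
    return (
        "<|bos|>"
        f"<|user_start|>{message}<|user_end|>"
        "<|assistant_start|>"
    )
```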
## Citation

```bibtex
@misc{nanochat,
  author = {Andrej Karpathy},
  title = {nanochat: The best ChatGPT that $100 can buy},
  year = {2025},
  publisher = {GitHub},
  url = {https://github.com/karpathy/nanochat}
}
```

## License

MIT License - Same as the nanochat repository.
config.json
ADDED
{
  "architectures": [
    "NanoChatGPT"
  ],
  "model_type": "nanochat-gpt",
  "vocab_size": 65536,
  "n_layer": 20,
  "n_head": 10,
  "n_kv_head": 10,
  "n_embd": 1280,
  "sequence_len": 2048,
  "torch_dtype": "bfloat16",
  "transformers_version": "custom",
  "nanochat_source": "rl",
  "nanochat_training_step": null,
  "nanochat_val_bpb": null
}
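A couple of quantities follow directly from these fields; in particular, with `n_kv_head` equal to `n_head` the GQA setup here is degenerate (each query head has its own KV head, i.e. plain multi-head attention). A small sketch using the values from config.json:

```python
import json

# Fields copied from config.json above.
cfg = json.loads("""
{
  "vocab_size": 65536,
  "n_layer": 20,
  "n_head": 10,
  "n_kv_head": 10,
  "n_embd": 1280,
  "sequence_len": 2048
}
""")

head_dim = cfg["n_embd"] // cfg["n_head"]      # per-head dimension
gqa_ratio = cfg["n_head"] // cfg["n_kv_head"]  # query heads per KV head
print(head_dim, gqa_ratio)
```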
generation_config.json
ADDED
{
  "bos_token_id": 65527,
  "eos_token_id": 65531,
  "pad_token_id": 65531,
  "max_length": 2048,
  "do_sample": true,
  "temperature": 0.7,
  "top_k": 50
}
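The `do_sample`, `temperature`, and `top_k` fields describe standard temperature plus top-k sampling. A minimal pure-Python sketch of that procedure (not nanochat's actual implementation):

```python
import math
import random

def sample_top_k(logits, temperature=0.7, top_k=50, rng=random):
    """Draw one token index: keep the top_k highest logits,
    apply temperature, softmax, then sample from the result."""
    # Indices of the top_k highest logits.
    top = sorted(range(len(logits)), key=lambda i: logits[i], reverse=True)[:top_k]
    # Temperature-scaled softmax over the kept logits (max-subtracted for stability).
    scaled = [logits[i] / temperature for i in top]
    m = max(scaled)
    exps = [math.exp(s - m) for s in scaled]
    total = sum(exps)
    probs = [e / total for e in exps]
    # Sample from the truncated distribution via its CDF.
    r = rng.random()
    acc = 0.0
    for idx, p in zip(top, probs):
        acc += p
        if r < acc:
            return idx
    return top[-1]
```

Lower temperatures sharpen the distribution toward the argmax; `top_k` simply zeroes out the tail before normalizing.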
pytorch_model.bin
ADDED
version https://git-lfs.github.com/spec/v1
oid sha256:23d4021f0072ecd4f058179304f6e070657a9ebeb55cff056beed1773d753081
size 2076228683
token_bytes.pt
ADDED
version https://git-lfs.github.com/spec/v1
oid sha256:ae39c27aae519d14071efc95f9a558ba0b7ede47e7d83ad4f198422b44c5f70e
size 263721
tokenizer.pkl
ADDED
version https://git-lfs.github.com/spec/v1
oid sha256:4c060565a46fe83b49d99005acba796f2a630daa7970eb49f7513b89f9fb40e0
size 846208
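The weight and tokenizer files above are Git LFS pointers: the `oid` line records the SHA-256 of the real payload, so a downloaded file can be checked against it. A small verification sketch:

```python
import hashlib

def sha256_of_file(path: str, chunk_size: int = 1 << 20) -> str:
    """Stream a file in chunks and return its hex SHA-256,
    for comparison against the pointer's oid value."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()
```

Chunked reading keeps memory flat even for the ~2 GB `pytorch_model.bin` payload.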