SoumilR committed on
Commit
f9cc0ce
·
verified ·
1 Parent(s): d314cf9

Upload folder using huggingface_hub

Browse files
Files changed (6)
  1. README.md +89 -0
  2. config.json +17 -0
  3. generation_config.json +9 -0
  4. pytorch_model.bin +3 -0
  5. token_bytes.pt +3 -0
  6. tokenizer.pkl +3 -0
README.md ADDED
@@ -0,0 +1,89 @@
---
tags:
- nanochat
- gpt
- language-model
- rl
license: mit
---

# nanochat model - RL stage

This is a nanochat model trained using [Andrej Karpathy's nanochat](https://github.com/karpathy/nanochat).

## Model Details

- **Model type**: GPT-style transformer
- **Training stage**: RL
- **Parameters**: 560,988,160
- **Architecture**:
  - Layers: 20
  - Embedding dim: 1280
  - Attention heads: 10
  - KV heads (GQA): 10
  - Context length: 2048
  - Vocab size: 65536

## Training Info

- **Training step**: N/A
- **Validation BPB**: N/A

## Architecture Highlights

- ✅ RoPE (Rotary Position Embeddings)
- ✅ QK Normalization
- ✅ ReLU² activation (not GELU)
- ✅ Untied embeddings
- ✅ No bias terms
- ✅ Logit softcapping
- ✅ Group Query Attention (GQA)

## Usage

This model uses a custom architecture from nanochat. To use it:
```bash
# Clone the nanochat repo
git clone https://github.com/karpathy/nanochat.git
cd nanochat
```

Then download this checkpoint and load it with nanochat's checkpoint manager:

```python
from nanochat.checkpoint_manager import load_model
from nanochat.engine import Engine

model, tokenizer, meta = load_model("rl", "cuda", phase="eval")
engine = Engine(model, tokenizer)

# Generate text
prompt = "The capital of France is"
tokens = tokenizer(prompt, prepend="<|bos|>")
completions, _ = engine.generate_batch(tokens, num_samples=1, max_tokens=50, temperature=0.7)
print(tokenizer.decode(completions[0]))
```
## Special Tokens

The model uses these special tokens for chat:

- `<|bos|>` - Beginning of sequence
- `<|user_start|>`, `<|user_end|>` - User messages
- `<|assistant_start|>`, `<|assistant_end|>` - Assistant messages
- `<|python_start|>`, `<|python_end|>` - Python tool calls
- `<|output_start|>`, `<|output_end|>` - Tool outputs
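The authoritative chat template lives in nanochat's tokenizer; as a rough illustration only (the `render_user_turn` helper below is hypothetical, not a nanochat API), a single user turn rendered with these tokens might look like:

```python
# Hypothetical helper: shows how the special tokens above compose
# into a single-turn prompt. Use nanochat's tokenizer for the real
# template; this is only a sketch of the token layout.
def render_user_turn(message: str) -> str:
    return (
        "<|bos|>"
        "<|user_start|>" + message + "<|user_end|>"
        "<|assistant_start|>"
    )

prompt = render_user_turn("What is the capital of France?")
print(prompt)
```

The assistant's reply would then be generated until `<|assistant_end|>` is produced.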
## Citation

```bibtex
@misc{nanochat,
  author = {Andrej Karpathy},
  title = {nanochat: The best ChatGPT that $100 can buy},
  year = {2025},
  publisher = {GitHub},
  url = {https://github.com/karpathy/nanochat}
}
```

## License

MIT License - Same as the nanochat repository.
config.json ADDED
@@ -0,0 +1,17 @@
{
  "architectures": [
    "NanoChatGPT"
  ],
  "model_type": "nanochat-gpt",
  "vocab_size": 65536,
  "n_layer": 20,
  "n_head": 10,
  "n_kv_head": 10,
  "n_embd": 1280,
  "sequence_len": 2048,
  "torch_dtype": "bfloat16",
  "transformers_version": "custom",
  "nanochat_source": "rl",
  "nanochat_training_step": null,
  "nanochat_val_bpb": null
}
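The README's parameter count of 560,988,160 is consistent with this config, assuming untied embeddings, no bias terms (both stated in the README), and a standard 4x-wide MLP:

```python
# Back-of-the-envelope parameter count from config.json.
# Assumes untied input/output embeddings, no biases, parameter-free
# norms, and a 4x MLP -- which is what these numbers imply.
n_layer, n_embd, vocab = 20, 1280, 65536

embeddings = 2 * vocab * n_embd            # untied wte + lm_head
attn_per_layer = 4 * n_embd * n_embd       # Wq, Wk, Wv, Wo (n_head == n_kv_head)
mlp_per_layer = 2 * n_embd * (4 * n_embd)  # up- and down-projection
total = embeddings + n_layer * (attn_per_layer + mlp_per_layer)
print(total)  # -> 560988160
```

Note that with `n_head == n_kv_head == 10`, the GQA configuration degenerates to standard multi-head attention, so the K/V projections are full-width here.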
generation_config.json ADDED
@@ -0,0 +1,9 @@
{
  "bos_token_id": 65527,
  "eos_token_id": 65531,
  "pad_token_id": 65531,
  "max_length": 2048,
  "do_sample": true,
  "temperature": 0.7,
  "top_k": 50
}
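These defaults describe top-k sampling with temperature. A minimal, dependency-free sketch of that decoding rule (independent of nanochat's `Engine`, which should be used in practice):

```python
import math
import random

def sample_top_k(logits, k=50, temperature=0.7, rng=random):
    # Keep the k highest logits, rescale by temperature,
    # softmax over the survivors, then draw one token id.
    top = sorted(range(len(logits)), key=lambda i: logits[i], reverse=True)[:k]
    scaled = [logits[i] / temperature for i in top]
    m = max(scaled)  # subtract max for numerical stability
    weights = [math.exp(s - m) for s in scaled]
    return rng.choices(top, weights=weights, k=1)[0]

token_id = sample_top_k([0.1, 2.5, -1.0, 0.7], k=2)
```

Lower temperatures sharpen the distribution toward the argmax; `top_k` caps how many candidates can ever be drawn.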
pytorch_model.bin ADDED
@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:23d4021f0072ecd4f058179304f6e070657a9ebeb55cff056beed1773d753081
size 2076228683
token_bytes.pt ADDED
@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:ae39c27aae519d14071efc95f9a558ba0b7ede47e7d83ad4f198422b44c5f70e
size 263721
tokenizer.pkl ADDED
@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:4c060565a46fe83b49d99005acba796f2a630daa7970eb49f7513b89f9fb40e0
size 846208
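The three binary files above are stored as Git LFS pointers: each is a small text file with `version`, `oid`, and `size` lines, while the real blob lives in LFS storage. After downloading, integrity can be checked against the pointer's sha256 `oid`; a small sketch (the helper names are our own, not a Hub API):

```python
import hashlib

def parse_lfs_pointer(text: str) -> dict:
    # Each pointer line is "key value"; oid is "sha256:<hex digest>".
    fields = dict(line.split(" ", 1) for line in text.strip().splitlines())
    algo, digest = fields["oid"].split(":", 1)
    return {"algo": algo, "digest": digest, "size": int(fields["size"])}

def verify_blob(path: str, pointer: dict) -> bool:
    # Hash the downloaded file in chunks and compare digests.
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(1 << 20), b""):
            h.update(chunk)
    return h.hexdigest() == pointer["digest"]

ptr = parse_lfs_pointer(
    "version https://git-lfs.github.com/spec/v1\n"
    "oid sha256:ae39c27aae519d14071efc95f9a558ba0b7ede47e7d83ad4f198422b44c5f70e\n"
    "size 263721\n"
)
```

Note the `huggingface_hub` client resolves these pointers automatically on download; manual verification is only needed if you fetch the blobs yourself.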