manasred committed · verified
Commit a55a90a · 1 parent: ab5422b

Upload README.md with huggingface_hub

Files changed (1): README.md (+56 −3)
README.md (the previous version contained only a front-matter block declaring `license: mit`):
---
language:
- en
library_name: candle
tags:
- text-generation
- from-scratch
- rust
- transformer
---

# picochat

A 90M parameter GPT trained from scratch in Rust using the [picochat](https://github.com/Nu11ified/picochat) framework.

## Model details

- **Architecture**: Decoder-only transformer with grouped-query attention, RoPE, sliding-window attention, and a ReLU-squared MLP
- **Parameters**: 90M (depth=8: 8 layers, 512-dim embeddings, 8 query heads, 4 KV heads)
- **Vocab size**: 32,768 (BPE tokenizer)
- **Context length**: 2048 tokens
- **Training**: Pretrained on OpenWebText (10k steps), then supervised fine-tuned on UltraChat + no_robots (2k steps)
- **Framework**: [candle](https://github.com/huggingface/candle) (Rust)
- **Hardware**: trained on CPU only

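Two of the architecture choices above are easy to sketch in plain Rust: with 8 query heads sharing 4 KV heads, each KV head serves `n_heads / n_kv_heads = 2` consecutive query heads, and the MLP activation is ReLU squared. The helper names below are illustrative, not taken from the picochat codebase:

```rust
// Grouped-query attention head mapping: each KV head is shared by
// a contiguous group of n_heads / n_kv_heads query heads.
fn kv_head_for(q_head: usize, n_heads: usize, n_kv_heads: usize) -> usize {
    assert!(n_heads % n_kv_heads == 0);
    q_head / (n_heads / n_kv_heads)
}

// ReLU-squared activation used in the MLP: max(0, x)^2.
fn relu_squared(x: f32) -> f32 {
    let r = x.max(0.0);
    r * r
}

fn main() {
    let (n_heads, n_kv_heads) = (8, 4);
    for q in 0..n_heads {
        println!("query head {q} -> kv head {}", kv_head_for(q, n_heads, n_kv_heads));
    }
    println!("{}", relu_squared(-1.0)); // 0
    println!("{}", relu_squared(3.0)); // 9
}
```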
## Usage

```bash
# Clone the framework
git clone https://github.com/Nu11ified/picochat.git
cd picochat

# Download weights
mkdir -p runs/model
# Download model.safetensors, config.json, and tokenizer.json from this repo
# into runs/model/

# Chat
cargo run --release -- \
  --chat --load runs/model --tokenizer runs/model/tokenizer.json \
  --temperature 0.8 --max-tokens 256

# Web UI
cargo run --release -- \
  --serve --load runs/model --tokenizer runs/model/tokenizer.json --port 8000
```

## Limitations

This model was trained on CPU with limited data (~5M tokens, versus the roughly 8B tokens used for GPT-2). It produces coherent text on topics seen during training but degrades into garbled output on novel questions. The value of this project is the from-scratch Rust training framework, not the resulting model.
51
+
52
+ ## Files
53
+
54
+ - `model.safetensors` -- model weights (345MB)
55
+ - `config.json` -- model architecture config
56
+ - `tokenizer.json` -- BPE tokenizer (32K vocab)