---
language:
- en
library_name: candle
tags:
- text-generation
- from-scratch
- rust
- transformer
---

# picochat

A 90M-parameter GPT trained from scratch in Rust using the [picochat](https://github.com/Nu11ified/picochat) framework.

## Model details

- **Architecture**: Decoder-only transformer with grouped-query attention, RoPE, sliding-window attention, and a ReLU-squared MLP
- **Parameters**: 90M (depth=8: 8 layers, 512 hidden dim, 8 attention heads, 4 KV heads)
- **Vocab size**: 32,768 (BPE tokenizer)
- **Context length**: 2,048 tokens
- **Training**: Pretrained on OpenWebText (10k steps), then supervised fine-tuned on UltraChat + no_robots (2k steps)
- **Framework**: [candle](https://github.com/huggingface/candle) (Rust)
- **Hardware**: CPU only

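The grouped-query attention setup above (8 query heads, 4 KV heads) directly shrinks the KV cache at inference time. As a rough sketch, here is the arithmetic implied by the listed hyperparameters; an f32 cache and full-context attention are assumptions (the sliding window may reduce this further in the actual implementation):

```rust
// KV-cache size implied by the hyperparameters above. Pure arithmetic;
// assumes an f32 cache and full-context attention (the sliding window
// may reduce this in the real implementation).
fn main() {
    let dim = 512u64;
    let heads = 8u64;
    let kv_heads = 4u64; // grouped-query attention: fewer KV heads than query heads
    let layers = 8u64;
    let context = 2048u64;

    let head_dim = dim / heads; // 64
    // K and V, each kv_heads * head_dim values, per layer, per token:
    let values_per_token = 2 * kv_heads * head_dim * layers; // 4096
    let cache_bytes = values_per_token * context * 4; // f32 = 4 bytes

    println!("KV cache at full context: {} MiB", cache_bytes / (1024 * 1024)); // 32 MiB
    // With 8 query heads but only 4 KV heads, this is half the cache that
    // standard multi-head attention would need at the same dimensions.
}
```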
## Usage

```bash
# Clone the framework
git clone https://github.com/Nu11ified/picochat.git
cd picochat

# Download weights
mkdir -p runs/model
# Download model.safetensors, config.json, and tokenizer.json from this repo
# into runs/model/

# Chat
cargo run --release -- \
  --chat --load runs/model --tokenizer runs/model/tokenizer.json \
  --temperature 0.8 --max-tokens 256

# Web UI
cargo run --release -- \
  --serve --load runs/model --tokenizer runs/model/tokenizer.json --port 8000
```

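If you want to peek at the downloaded weights without pulling in an ML framework, the safetensors format is simple: an 8-byte little-endian header length followed by a JSON header describing each tensor. A minimal stdlib-only sketch (the file path and tensor names here are illustrative, not taken from this repo):

```rust
// Minimal inspection of a .safetensors file without any ML framework:
// per the safetensors spec, the file starts with an 8-byte little-endian
// u64 header length, followed by that many bytes of JSON metadata.
use std::convert::TryInto;

/// Returns the JSON header string from safetensors-formatted bytes.
fn safetensors_header(bytes: &[u8]) -> Option<String> {
    let len = u64::from_le_bytes(bytes.get(..8)?.try_into().ok()?) as usize;
    let header = bytes.get(8..8 + len)?;
    String::from_utf8(header.to_vec()).ok()
}

fn main() {
    // In practice you would read runs/model/model.safetensors; here we
    // build a tiny in-memory example so the sketch is self-contained.
    let json = br#"{"emb":{"dtype":"F32","shape":[2,2],"data_offsets":[0,16]}}"#;
    let mut file = (json.len() as u64).to_le_bytes().to_vec();
    file.extend_from_slice(json);
    file.extend_from_slice(&[0u8; 16]); // tensor data follows the header
    println!("{}", safetensors_header(&file).unwrap());
}
```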
## Limitations

This model was trained on CPU with limited data (~5M tokens vs GPT-2's 8B). It produces coherent text on topics seen during training but will generate garbled output on novel questions. The value of this project is the from-scratch Rust training framework, not the resulting model.

## Files

- `model.safetensors` -- model weights (345MB)
- `config.json` -- model architecture config
- `tokenizer.json` -- BPE tokenizer (32K vocab)
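The 345MB weight file doubles as a quick consistency check on the stated parameter count. Assuming f32 (4-byte) storage, which is an assumption about the dtype rather than something the repo confirms, the file size implies roughly 86M parameters, in line with the stated ~90M:

```rust
// Sanity check: weight-file size vs. stated parameter count.
// Assumes f32 (4-byte) storage; a smaller dtype would change the math.
fn main() {
    let file_bytes: u64 = 345_000_000; // model.safetensors, ~345MB
    let params = file_bytes / 4;
    println!("~{}M parameters implied by file size", params / 1_000_000);
    assert!((80_000_000..100_000_000).contains(&params)); // consistent with ~90M
}
```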