---
language:
- en
library_name: candle
tags:
- text-generation
- from-scratch
- rust
- transformer
---

# picochat

A 31.5M parameter GPT trained from scratch in Rust using the [picochat](https://github.com/Nu11ified/picochat) framework.

## Model details

- **Architecture**: Decoder-only transformer with grouped-query attention, RoPE, sliding window attention, ReLU-squared MLP
- **Parameters**: 31.5M (depth=8: 8 layers, 512 dim, 8 heads, 4 KV heads)
- **Vocab size**: 4,096 (BPE tokenizer)
- **Context length**: 2048 tokens
- **Training**: Pretrained on OpenWebText (10k steps), then supervised fine-tuned on UltraChat + no_robots (2k steps)
- **Framework**: [candle](https://github.com/huggingface/candle) (Rust)
- **Trained on**: CPU only

## Usage

```bash
# Clone the framework
git clone https://github.com/Nu11ified/picochat.git
cd picochat

# Download weights
mkdir -p runs/model
# Download model.safetensors, config.json, and tokenizer.json from this repo
# into runs/model/

# Chat
cargo run --release -- \
  --chat --load runs/model --tokenizer runs/model/tokenizer.json \
  --temperature 0.8 --max-tokens 256

# Web UI
cargo run --release -- \
  --serve --load runs/model --tokenizer runs/model/tokenizer.json --port 8000
```

## Limitations

This model was trained on CPU with limited data (~5M tokens vs GPT-2's 8B). It produces coherent text on topics seen during training but tends to generate garbled output on novel questions. The value of this project is the from-scratch Rust training framework, not the resulting model.

## Files

- `model.safetensors` -- model weights (120MB)
- `config.json` -- model architecture config
- `tokenizer.json` -- BPE tokenizer (4,096 vocab)
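
As a rough sanity check on the weights file size listed above, the sketch below estimates the on-disk size from the 31.5M parameter count given in the model details. It assumes every parameter is stored as a 4-byte `f32` and ignores the small safetensors header. Both are assumptions, not something this card states, and `weights_size_bytes` is a hypothetical helper rather than part of picochat or candle.

```rust
/// Approximate size in bytes of a weights file that stores every
/// parameter at a fixed width. Hypothetical helper for illustration.
fn weights_size_bytes(num_params: u64, bytes_per_param: u64) -> u64 {
    num_params * bytes_per_param
}

fn main() {
    // Parameter count taken from the "Model details" list.
    let num_params: u64 = 31_500_000;
    // Assumption: weights are stored as f32 (4 bytes each).
    let bytes = weights_size_bytes(num_params, 4);
    let mib = bytes as f64 / (1024.0 * 1024.0);
    println!("{} bytes (~{:.0} MiB)", bytes, mib);
}
```

Under these assumptions the estimate comes out at about 120 MiB, in line with the listed size of `model.safetensors`. A download that is much smaller than this is probably incomplete.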