manasred/picochat

	---
	language:
	- en
	library_name: candle
	tags:
	- text-generation
	- from-scratch
	- rust
	- transformer
	---

	# picochat

	A 31.5M parameter GPT trained from scratch in Rust using the [picochat](https://github.com/Nu11ified/picochat) framework.

	## Model details

	- Architecture: Decoder-only transformer with grouped-query attention, RoPE, sliding window attention, ReLU-squared MLP
	- Parameters: 31.5M (depth=8: 8 layers, 512 dim, 8 heads, 4 KV heads)
	- Vocab size: 4,096 (BPE tokenizer)
	- Context length: 2048 tokens
	- Training: Pretrained on OpenWebText (10k steps), then supervised fine-tuned on UltraChat + no_robots (2k steps)
	- Framework: [candle](https://github.com/huggingface/candle) (Rust)
	- Trained on: CPU only
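Three of the architecture choices listed above are small enough to show in plain Rust. The following is a minimal sketch, not code taken from the picochat repository: the function names are hypothetical, the grouping of consecutive query heads onto one KV head is an assumed (though common) convention, and the `window` size is a placeholder because this card does not state the value the model uses.

```rust
/// Index of the KV head that a query head reads from under grouped-query
/// attention. Assumes consecutive query heads share a KV head, so with
/// 8 query heads and 4 KV heads each KV head serves 2 query heads.
pub fn kv_head_for(query_head: usize, n_heads: usize, n_kv_heads: usize) -> usize {
    assert!(
        n_kv_heads > 0 && n_heads % n_kv_heads == 0,
        "n_heads must be a multiple of n_kv_heads"
    );
    assert!(query_head < n_heads, "query head out of range");
    let group_size = n_heads / n_kv_heads;
    query_head / group_size
}

/// ReLU-squared activation: max(x, 0)^2.
pub fn relu_squared(x: f32) -> f32 {
    let r = x.max(0.0);
    r * r
}

/// Causal sliding-window rule: the query at position `q` may attend to the
/// key at position `k` only if `k` is not in the future and falls within
/// the last `window` positions, counting `q` itself.
/// `window` is a hypothetical parameter, not a value from this model card.
pub fn can_attend(q: usize, k: usize, window: usize) -> bool {
    // Short-circuiting keeps `q - k` from underflowing when k > q.
    k <= q && q - k < window
}
```

With the configuration above (8 heads, 4 KV heads), `kv_head_for(7, 8, 4)` maps the last query head to KV head 3, which is why the K and V projections need only half the width of the Q projection.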

	## Usage

	```bash
	# Clone the framework
	git clone https://github.com/Nu11ified/picochat.git
	cd picochat

	# Download weights
	mkdir -p runs/model
	# Download model.safetensors, config.json, and tokenizer.json from this repo
	# into runs/model/

	# Chat
	cargo run --release -- \
	--chat --load runs/model --tokenizer runs/model/tokenizer.json \
	--temperature 0.8 --max-tokens 256

	# Web UI
	cargo run --release -- \
	--serve --load runs/model --tokenizer runs/model/tokenizer.json --port 8000
	```
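The `--temperature 0.8` flag in the chat command controls how sharply the model favors its top-scoring tokens. As an illustration of the general technique, and not necessarily how picochat implements it, the sketch below divides the logits by the temperature before applying a softmax. The function name is hypothetical.

```rust
/// Convert raw logits into sampling probabilities at a given temperature.
/// Temperatures below 1.0 sharpen the distribution; above 1.0 flatten it.
pub fn softmax_with_temperature(logits: &[f32], temperature: f32) -> Vec<f32> {
    assert!(temperature > 0.0, "temperature must be positive");
    if logits.is_empty() {
        return Vec::new();
    }
    // Scale logits by the temperature.
    let scaled: Vec<f32> = logits.iter().map(|l| l / temperature).collect();
    // Subtract the maximum before exponentiating for numerical stability.
    let max = scaled.iter().cloned().fold(f32::NEG_INFINITY, f32::max);
    let exps: Vec<f32> = scaled.iter().map(|s| (s - max).exp()).collect();
    let sum: f32 = exps.iter().sum();
    exps.iter().map(|e| e / sum).collect()
}
```

At a temperature of 0.8 the most likely token receives a larger share of the probability mass than it would at 1.0, so generations become somewhat more conservative.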

	## Limitations

	This model was trained on CPU with very little data (roughly 5M tokens, versus roughly 8B for GPT-2). It produces coherent text on topics seen during training but generates garbled output on novel questions. The value of this project is the from-scratch Rust training framework, not the resulting model.

	## Files

	- `model.safetensors` -- model weights (120MB)
	- `config.json` -- model architecture config
	- `tokenizer.json` -- BPE tokenizer (32K vocab)
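As a rough sanity check on the weights file, the listed size is consistent with the 31.5M parameter count from Model details if each parameter is stored as a 32-bit float. That storage format is an assumption, since this card does not state the dtype, and the helper names below are hypothetical.

```rust
/// Approximate size of a weights file in bytes, ignoring the small
/// safetensors header.
pub fn weights_size_bytes(n_params: u64, bytes_per_param: u64) -> u64 {
    n_params * bytes_per_param
}

/// Convert a byte count to mebibytes (1 MiB = 1024 * 1024 bytes).
pub fn bytes_to_mib(bytes: u64) -> f64 {
    bytes as f64 / (1024.0 * 1024.0)
}
```

Under the f32 assumption, `weights_size_bytes(31_500_000, 4)` gives 126,000,000 bytes, which is about 120 MiB.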