---
{}
---

(v2 training scripts: RoPE positional encoding)

# SpiceeChat — Train From Scratch via Flash3 Kernels

A custom TinyGPT trained from scratch with a BPE tokenizer and a causal language-modeling pipeline.

## Model Details

| Property | Value |
|---|---|
| Architecture | TinyGPT (custom GPT-style) |
| Layers | 4 |
| Heads | 4 |
| Hidden size | 384 |
| Context length | 128 |
| Vocab size | 32,768 |
| Attention | Torch (T4 compatible) |
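As a rough sanity check on model size, the hyperparameters above can be turned into a parameter estimate. The sketch below assumes a standard GPT block (four `n_embd x n_embd` attention projections plus an MLP with 4x expansion), tied input and output embeddings, and no learned position table (consistent with RoPE). It ignores biases and normalization weights. None of these details are confirmed by this card, so treat the result as an order-of-magnitude figure. `estimate_params` is a hypothetical helper, not part of `train.py`.

```python
def estimate_params(vocab_size: int, n_layer: int, n_embd: int) -> dict:
    """Rough parameter count for a GPT-style model (assumptions in comments)."""
    # Token embedding; assumed tied with the output head, so counted once.
    embedding = vocab_size * n_embd
    # Attention: Q, K, V and output projections, each n_embd x n_embd (assumed).
    attention = 4 * n_embd * n_embd
    # MLP: two linear layers with an assumed 4x hidden expansion.
    mlp = 2 * n_embd * (4 * n_embd)
    blocks = n_layer * (attention + mlp)
    return {"embedding": embedding, "blocks": blocks, "total": embedding + blocks}


counts = estimate_params(vocab_size=32768, n_layer=4, n_embd=384)
head_dim = 384 // 4  # hidden size divided by the number of heads
print(counts, head_dim)
```

Under these assumptions the estimate comes to about 19.7M parameters, of which roughly 12.6M sit in the embedding table, and each of the 4 heads has dimension 96.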

## Files

- `checkpoint_step_*.pt` — model weights
- `tokenizer/` — BPE tokenizer trained on the same data
- `config.json` — model hyperparameters
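Because checkpoints follow the `checkpoint_step_*.pt` pattern, picking the most recent one can be automated. The sketch below assumes the wildcard stands for an integer training step, which this card does not state explicitly. `latest_checkpoint` is a hypothetical helper, and the file names in the example are made up.

```python
import re
from typing import Iterable, Optional

# Matches names such as "checkpoint_step_1000.pt" (integer step is an assumption).
_STEP_RE = re.compile(r"checkpoint_step_(\d+)\.pt$")


def latest_checkpoint(names: Iterable[str]) -> Optional[str]:
    """Return the name with the highest step number, or None if nothing matches."""
    best_name, best_step = None, -1
    for name in names:
        match = _STEP_RE.search(name)
        if match and int(match.group(1)) > best_step:
            best_name, best_step = name, int(match.group(1))
    return best_name


# Steps are compared as integers, so 10000 correctly beats 900
# (a plain string sort would order them the other way).
example = ["checkpoint_step_900.pt", "checkpoint_step_10000.pt", "config.json"]
print(latest_checkpoint(example))
```

On a local download, the candidate names could come from `pathlib.Path(".").glob("checkpoint_step_*.pt")`.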

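## Load

```python
import torch
from tokenizers import Tokenizer

# TinyGPT and GPTConfig are defined in train.py, which must be in the same directory
from train import TinyGPT, GPTConfig

# Load the BPE tokenizer
tok = Tokenizer.from_file("tokenizer/tokenizer.json")

# Rebuild the model with the hyperparameters from the Model Details table
cfg = GPTConfig(
    vocab_size=32768,
    ctx_len=128,
    n_layer=4,
    n_head=4,
    n_embd=384,
    attention_backend="torch",
)
model = TinyGPT(cfg)

# Load the weights on CPU; adjust the path if your checkpoint file is named differently
ckpt = torch.load("latest.pt", map_location="cpu")
model.load_state_dict(ckpt["model"])  # weights are stored under the "model" key
model.eval()  # switch to evaluation mode
```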
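With a context length of 128, a prompt longer than that has to be shortened before it reaches the model. Here is a minimal sketch, assuming the model accepts at most `ctx_len` token ids and that keeping the most recent tokens is the desired behavior (the card specifies neither). `clip_to_context` is a hypothetical helper, and the ids would normally come from `tok.encode(text).ids`.

```python
from typing import List

CTX_LEN = 128  # context length from the Model Details table


def clip_to_context(ids: List[int], ctx_len: int = CTX_LEN) -> List[int]:
    """Keep only the last ctx_len token ids (left truncation)."""
    if ctx_len <= 0:
        raise ValueError("ctx_len must be positive")
    return ids[-ctx_len:]


# Stand-in ids; in practice: ids = tok.encode("some prompt").ids
ids = list(range(200))
window = clip_to_context(ids)
print(len(window), window[0], window[-1])
```

Left truncation keeps the tokens closest to the position being predicted, which is usually what matters for a causal language model. Inputs shorter than the window pass through unchanged.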