# handsongpt2

A GPT-2-style language model trained on the GuoFeng Webnovel Corpus using JAX/Flax on a Kaggle TPU.
## Model Details
- Architecture: GPT-2 style transformer
- Parameters: 84.6M
- Vocab Size: 64,000 (Yi-1.5 tokenizer, TPU-aligned)
- Max Length: 256
- Layers: 6
- Hidden Size: 512
- Attention Heads: 8
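The 84.6M parameter figure is consistent with a standard GPT-2 block (4× MLP, two LayerNorms) plus an untied output projection. A quick sanity check of that arithmetic — the function and field names below are illustrative, and the untied-head assumption is mine, not confirmed by the training code:

```python
def param_count(vocab=64_000, ctx=256, layers=6, d=512):
    """Approximate parameter count for a GPT-2-style decoder."""
    emb = vocab * d                     # token embedding table
    pos = ctx * d                       # learned position embeddings
    # Per block: fused QKV + output projection (with biases),
    # a 4x-wide MLP (with biases), and two LayerNorms (scale + bias).
    attn = 3 * d * d + 3 * d + d * d + d
    mlp = d * 4 * d + 4 * d + 4 * d * d + d
    ln = 2 * 2 * d
    block = attn + mlp + ln
    final_ln = 2 * d
    lm_head = vocab * d                 # untied output projection, no bias
    return emb + pos + layers * block + final_ln + lm_head

print(param_count())  # ~84.6M
```

With a tied output head the count would drop to roughly 51.8M, so the reported 84.6M suggests the embedding and output matrices are separate.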
## Training
- Framework: JAX/Flax
- Hardware: Kaggle TPU v3-8
- Batch Size: 16
- Learning Rate: 0.0003
- Final Loss: 0.0005
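Per-step throughput follows directly from the batch and sequence sizes. The card does not say whether 16 is a global or a per-device batch, so both readings are sketched below; the helper name and the full-length-sequence assumption are mine:

```python
def tokens_per_step(batch_size=16, seq_len=256, num_devices=1):
    """Tokens consumed per optimizer step, assuming full 256-token sequences."""
    return batch_size * seq_len * num_devices

global_batch_reading = tokens_per_step()                  # 16 is the global batch
per_device_reading = tokens_per_step(num_devices=8)       # 16 per TPU v3-8 core
```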
## Usage

```python
import orbax.checkpoint as ocp

# Restore the saved checkpoint as a pytree of arrays
# (model parameters and any saved optimizer state).
checkpointer = ocp.PyTreeCheckpointer()
state = checkpointer.restore('/path/to/checkpoint')
```
## License

Apache 2.0