handsongpt2

HandsOnGPT2 is a GPT-2-style language model trained on the GuoFeng Webnovel Corpus using JAX/Flax on a Kaggle TPU.

Model Details

  • Architecture: GPT-2 style transformer
  • Parameters: 84.6M
  • Vocab Size: 64,000 (Yi-1.5 tokenizer, TPU-aligned)
  • Max Sequence Length: 256 tokens
  • Layers: 6
  • Hidden Size: 512
  • Attention Heads: 8
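
The 84.6M figure can be sanity-checked from the hyperparameters above. A minimal sketch, assuming standard GPT-2 conventions not stated on this card: a 4x MLP expansion, learned position embeddings, biases on the linear layers, and an untied output projection.

```python
# Sanity-check the 84.6M parameter count from the hyperparameters above.
# Assumptions (GPT-2 conventions, not confirmed by the card): 4x MLP
# expansion, learned position embeddings, linear-layer biases, untied head.
vocab, seq, d, layers = 64_000, 256, 512, 6

tok_emb = vocab * d                           # token embedding table
pos_emb = seq * d                             # learned position embeddings
attn = 3 * (d * d + d) + (d * d + d)          # QKV + output projection
mlp = (d * 4 * d + 4 * d) + (4 * d * d + d)   # two linears, 4x expansion
norms = 2 * 2 * d                             # two LayerNorms per block
block = attn + mlp + norms

# embeddings + transformer blocks + final LayerNorm + untied LM head
total = tok_emb + pos_emb + layers * block + 2 * d + vocab * d
print(f"{total / 1e6:.1f}M")  # → 84.6M
```

Under these assumptions the count lands on 84.6M only with an untied LM head; with tied input/output embeddings it would be roughly 51.8M.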

Training

  • Framework: JAX/Flax
  • Hardware: Kaggle TPU v3-8
  • Batch Size: 16
  • Learning Rate: 0.0003
  • Final Loss: 0.0005
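
The settings above imply the per-step throughput directly. A small worked example, assuming the batch size of 16 is the global batch across the TPU cores and every sequence is packed to the full 256-token length (neither is stated on the card):

```python
# Tokens consumed per optimizer step, assuming batch size 16 is the
# global batch and sequences are packed to the full 256-token length.
batch_size, max_len = 16, 256
tokens_per_step = batch_size * max_len
print(tokens_per_step)  # → 4096
```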

Usage

import orbax.checkpoint as ocp

# Restore the saved training state (parameter PyTree) with Orbax.
checkpointer = ocp.PyTreeCheckpointer()
state = checkpointer.restore('/path/to/checkpoint')
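
The restored state is a PyTree of parameters; text generation then amounts to repeatedly calling the model and appending the argmax token up to the 256-token limit. A minimal greedy-decoding sketch, where `logits_fn`, the prompt ids, and the EOS id are hypothetical stand-ins for the real forward pass and Yi-1.5 tokenizer ids:

```python
def greedy_generate(logits_fn, prompt_ids, max_len=256, eos_id=0):
    """Greedy autoregressive decoding up to the model's 256-token limit.

    `logits_fn` is a stand-in for the model's forward pass: it takes the
    token ids generated so far and returns one logit per vocabulary entry.
    `eos_id` is a hypothetical end-of-text id, not taken from the card.
    """
    ids = list(prompt_ids)
    while len(ids) < max_len:
        logits = logits_fn(ids)
        next_id = max(range(len(logits)), key=logits.__getitem__)  # argmax
        if next_id == eos_id:
            break
        ids.append(next_id)
    return ids

# Toy stand-in model: prefers token 3 until length 5, then emits EOS (id 0).
toy = lambda ids: [1.0, 0.0, 0.0, 2.0] if len(ids) < 5 else [9.0, 0.0, 0.0, 0.0]
print(greedy_generate(toy, [7, 8], eos_id=0))  # → [7, 8, 3, 3, 3]
```

In practice `logits_fn` would wrap the Flax model's apply function with the restored parameters and take the logits at the final position.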

License

Apache 2.0
