Wilsonwin/handsongpt2

+---
+language:
+- zh
+- en
+license: apache-2.0
+tags:
+- jax
+- flax
+- mini-gpt
+- text-generation
+---
+# handsongpt2
+HandsOnGPT2 is a GPT-2 style language model trained on the GuoFeng Webnovel Corpus with JAX/Flax on a Kaggle TPU.
+## Model Details
+- **Architecture**: GPT-2 style transformer
+- **Parameters**: 84.6M
+- **Vocab Size**: 64,000 (Yi-1.5 tokenizer, TPU-aligned)
+- **Max Length**: 256
+- **Layers**: 6
+- **Hidden Size**: 512
+- **Attention Heads**: 8
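The hyperparameters above can be checked against the 84.6M figure. This is a minimal sketch, not the model's actual definition, which this card does not show. It assumes standard GPT-2 blocks: a 4x feed-forward width, biases on every linear layer, and learned positional embeddings. `gpt2_param_count` is a hypothetical helper written for this estimate.

```python
def gpt2_param_count(vocab, d_model, n_layers, max_len, tied_embeddings):
    """Estimate parameters of a GPT-2 style decoder (assumed layout, see lead-in)."""
    d_ff = 4 * d_model                                  # assumed feed-forward width
    attn = 4 * (d_model * d_model + d_model)            # Q, K, V and output projections, with biases
    mlp = (d_model * d_ff + d_ff) + (d_ff * d_model + d_model)
    layer_norms = 2 * 2 * d_model                       # two LayerNorms per block (scale + bias)
    block = attn + mlp + layer_norms
    embeddings = vocab * d_model + max_len * d_model    # token + learned position embeddings
    final_ln = 2 * d_model
    lm_head = 0 if tied_embeddings else vocab * d_model # separate output projection, no bias
    return embeddings + n_layers * block + final_ln + lm_head


untied = gpt2_param_count(64_000, 512, 6, 256, tied_embeddings=False)
tied = gpt2_param_count(64_000, 512, 6, 256, tied_embeddings=True)
vocab_is_multiple_of_128 = 64_000 % 128 == 0

print(f"untied: {untied / 1e6:.1f}M, tied: {tied / 1e6:.1f}M")
```

Under these assumptions the estimate matches 84.6M only when the output projection is not tied to the token embedding. With tied weights it comes to about 51.8M. The number of attention heads does not change the count. The vocabulary size is a multiple of 128, which may be what "TPU-aligned" refers to.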
+## Training
+- **Framework**: JAX/Flax
+- **Hardware**: Kaggle TPU v3-8
+- **Batch Size**: 16
+- **Learning Rate**: 0.0003
+- **Final Loss**: 0.0005
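Two quantities follow from the settings above. This is a rough sketch under stated assumptions: that 16 is the global batch size split evenly across the 8 cores of a v3-8, that every sequence is filled to the 256-token maximum, and that the reported loss is a mean per-token cross-entropy in nats. The card confirms none of these.

```python
import math

batch_size = 16        # from the Training list
max_length = 256       # from Model Details
num_cores = 8          # a TPU v3-8 exposes 8 cores
final_loss = 0.0005    # from the Training list

# Upper bound on tokens per optimizer step (assumes fully packed sequences).
tokens_per_step = batch_size * max_length

# Per-core batch, only meaningful if 16 is the global batch size (assumption).
per_core_batch = batch_size // num_cores

# Perplexity, assuming the loss is mean cross-entropy in nats.
perplexity = math.exp(final_loss)

print(tokens_per_step, per_core_batch, round(perplexity, 4))
```

A perplexity this close to 1 would be typical of a training-set loss on data the model has largely memorized. The card does not say which split the loss was measured on.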
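+## Usage
+Restore the checkpoint with Orbax:
+```python
+import orbax.checkpoint as ocp
+
+# With no target structure given, the checkpoint is restored as a nested pytree.
+checkpointer = ocp.PyTreeCheckpointer()
+state = checkpointer.restore('/path/to/checkpoint')  # replace with your checkpoint directory
+```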
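Once restored, the state can be inspected before it is wired into a model. This is a minimal sketch that assumes the restored object is built from nested dicts, lists and tuples whose array leaves expose a `.shape` attribute. `iter_leaves` and `count_parameters` are hypothetical helpers, and the card does not document the checkpoint's layout.

```python
import math


def iter_leaves(tree, path=()):
    """Yield (path, leaf) pairs for a pytree of nested dicts, lists and tuples."""
    if isinstance(tree, dict):
        for key, value in tree.items():
            yield from iter_leaves(value, path + (str(key),))
    elif isinstance(tree, (list, tuple)):
        for index, value in enumerate(tree):
            yield from iter_leaves(value, path + (str(index),))
    else:
        yield path, tree


def count_parameters(tree):
    """Sum element counts over every leaf that has a .shape attribute."""
    return sum(
        math.prod(leaf.shape)
        for _, leaf in iter_leaves(tree)
        if hasattr(leaf, "shape")
    )


# Example with the restored `state` (not run here):
# for path, leaf in iter_leaves(state):
#     print("/".join(path), getattr(leaf, "shape", None))
```

If the checkpoint also stores optimizer state, pass only the parameter subtree to `count_parameters`, for example `state['params']` if such a key exists (an assumption).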
+## License
+Apache 2.0