Wilsonwin
/

handsongpt2

Text Generation

Model card Files Files and versions

handsongpt2 / README.md

Wilsonwin's picture

Upload README.md with huggingface_hub

3869c0f verified 12 days ago

|

history blame contribute delete

744 Bytes

	---
	language:
	- zh
	- en
	license: apache-2.0
	tags:
	- jax
	- flax
	- mini-gpt
	- text-generation
	---

	# handsongpt2

	HandsOnGPT2 model trained on GuoFeng Webnovel Corpus using JAX/Flax on Kaggle TPU.

	## Model Details

	- Architecture: GPT-2 style transformer
	- Parameters: 84.6M
	- Vocab Size: 64,000 (Yi-1.5 tokenizer, TPU-aligned)
	- Max Length: 256
	- Layers: 6
	- Hidden Size: 512
	- Attention Heads: 8

	## Training

	- Framework: JAX/Flax
	- Hardware: Kaggle TPU v3-8
	- Batch Size: 16
	- Learning Rate: 0.0003
	- Final Loss: 0.0005

	## Usage

	```python
	import orbax.checkpoint as ocp

	checkpointer = ocp.PyTreeCheckpointer()
	state = checkpointer.restore('/path/to/checkpoint')
	```

	## License

	Apache 2.0