唐诗生成模型 / Tang Poetry Generator

A character-level, GPT-style Transformer decoder trained on 全唐诗 (Complete Tang Poetry, ~57,000 poems).

Model Description

| Property | Value |
|---|---|
| Architecture | 6-layer Transformer decoder |
| Attention | 6-head `MultiheadAttention` |
| Embedding dim | 384 |
| Parameters | 17.5M |
| Context length | 128 characters |
| Vocab size | 8,879 (Chinese characters + punctuation) |
| Training data | ~3.9M characters from 全唐诗 (simplified Chinese) |
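As a sanity check on the 17.5M figure in the table above, the parameter count can be estimated from the listed hyperparameters. This is a sketch under assumed architecture details that the card does not state: learned positional embeddings, a feed-forward width of 4× the embedding dim, biases on every linear layer, two LayerNorms per block plus a final one, and an output head that is not tied to the token embedding.

```python
def count_params(n_layer=6, d_model=384, vocab_size=8879, block_size=128,
                 ffn_mult=4, tied_head=False):
    """Rough parameter count for a GPT-style decoder (layout is assumed, not confirmed)."""
    d = d_model
    tok_emb = vocab_size * d                          # token embedding table
    pos_emb = block_size * d                          # learned positional embeddings (assumed)
    attn = 4 * d * d + 4 * d                          # Q/K/V in-projection + output projection, with biases
    ffn = 2 * ffn_mult * d * d + ffn_mult * d + d     # d -> 4d -> d linear layers, with biases
    norms = 2 * 2 * d                                 # two LayerNorms per block (weight + bias)
    per_layer = attn + ffn + norms
    final_norm = 2 * d                                # final LayerNorm before the head
    head = 0 if tied_head else d * vocab_size + vocab_size  # output projection to the vocabulary
    return tok_emb + pos_emb + n_layer * per_layer + final_norm + head


print(f"{count_params() / 1e6:.1f}M")                 # untied head
print(f"{count_params(tied_head=True) / 1e6:.1f}M")   # tied head, for comparison
```

Under these assumptions the untied variant prints 17.5M and the tied variant 14.1M, so the card's figure is consistent with a separate output head; the real checkpoint may differ in details this arithmetic cannot see.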

Quick Start

```bash
pip install torch zhconv
```

```python
from inference import load_model

gen = load_model("poetry_model.pt")
print(gen("床前明月光", max_new_tokens=100))
```
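The contents of `inference.py` are not reproduced here. As a rough, framework-free sketch of what a `max_new_tokens` loop typically does with a 128-character context, the function below appends one character ID at a time and only ever passes the last `block_size` IDs to the model. `logits_fn` is a hypothetical stand-in for the model's forward pass, and the `temperature` parameter is illustrative; nothing here claims that `gen` exposes the same options.

```python
import math
import random
from typing import Callable, List, Optional, Sequence


def generate(logits_fn: Callable[[Sequence[int]], List[float]],
             prompt_ids: List[int],
             max_new_tokens: int,
             block_size: int = 128,
             temperature: float = 1.0,
             rng: Optional[random.Random] = None) -> List[int]:
    """Autoregressive sampling with a sliding context window (illustrative sketch)."""
    rng = rng or random.Random()
    ids = list(prompt_ids)
    for _ in range(max_new_tokens):
        context = ids[-block_size:]          # the model never sees more than block_size chars
        logits = logits_fn(context)          # hypothetical forward pass: one score per vocab entry
        if temperature == 0:
            # Greedy decoding: pick the highest-scoring character.
            next_id = max(range(len(logits)), key=logits.__getitem__)
        else:
            scaled = [x / temperature for x in logits]
            m = max(scaled)
            weights = [math.exp(s - m) for s in scaled]  # numerically stable softmax numerators
            next_id = rng.choices(range(len(weights)), weights=weights, k=1)[0]
        ids.append(next_id)
    return ids
```

Cropping to the last `block_size` IDs is what lets generation run past 128 characters: positions beyond the trained context length are never fed to the model, at the cost of forgetting the earliest lines of a long poem.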

Files

| File | Description |
|---|---|
| poetry_model.pt | Model checkpoint (weights, optimizer state, config, tokenizer) |
| config.json | Hyperparameters |
| tokenizer.json | Character vocabulary (stoi / itos) |
| inference.py | Standalone inference script |
| train_poetry.py | Full training script |
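`tokenizer.json` stores a character-level vocabulary as `stoi` (character → ID) and `itos` (ID → character). The class below is a minimal sketch of how such a file could be used; it assumes the JSON has top-level `"stoi"` and `"itos"` keys and that `itos` is saved either as a list or as an object keyed by stringified IDs. `CharTokenizer` is a hypothetical name, not the repository's API. Because the vocabulary was built from simplified Chinese, traditional-character prompts would presumably need converting first (the `zhconv` dependency hints at this, but that is an inference).

```python
import json
from typing import Dict, List, Union


class CharTokenizer:
    """Character-level tokenizer from stoi/itos mappings (assumed JSON layout)."""

    def __init__(self, stoi: Dict[str, int], itos: Union[List[str], Dict[str, str]]):
        self.stoi = dict(stoi)
        # JSON object keys are always strings, so an itos dict reloads as
        # {"0": "床", ...}; normalize to integer keys in either case.
        if isinstance(itos, list):
            self.itos = {i: ch for i, ch in enumerate(itos)}
        else:
            self.itos = {int(k): ch for k, ch in itos.items()}

    @classmethod
    def from_file(cls, path: str) -> "CharTokenizer":
        with open(path, encoding="utf-8") as f:
            data = json.load(f)
        return cls(data["stoi"], data["itos"])

    def encode(self, text: str) -> List[int]:
        # Out-of-vocabulary characters raise KeyError; skipping or replacing
        # them is a policy choice left to the caller.
        return [self.stoi[ch] for ch in text]

    def decode(self, ids: List[int]) -> str:
        return "".join(self.itos[i] for i in ids)

    def __len__(self) -> int:
        return len(self.stoi)
```

One ID per character keeps the model simple but makes the vocabulary large (8,879 entries), which is why the embedding and output layers account for a sizeable share of the 17.5M parameters.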

Training

Trained on an M4 MacBook Pro using PyTorch's MPS (Metal) backend, with:

  • Optimizer: AdamW (lr=5e-4)
  • Schedule: 200-step linear warmup → cosine decay
  • Gradient clipping: max_norm=1.0
  • Dropout: 0.25
  • Best validation loss: 4.51 per-character cross-entropy (perplexity ≈ e^4.51 ≈ 91)
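The schedule and the loss/perplexity relationship listed above can be written out as plain functions. This is a sketch, not the code in `train_poetry.py`: the card gives the peak learning rate (5e-4) and the 200 warmup steps, while `total_steps` and `min_lr` below are assumed placeholders, and the `(step + 1) / warmup_steps` warmup convention is one common choice among several.

```python
import math


def lr_at(step: int, max_lr: float = 5e-4, warmup_steps: int = 200,
          total_steps: int = 10_000, min_lr: float = 0.0) -> float:
    """Linear warmup then cosine decay (total_steps and min_lr are assumed values)."""
    if step < warmup_steps:
        # Ramp linearly from max_lr / warmup_steps up to max_lr.
        return max_lr * (step + 1) / warmup_steps
    # Fraction of the decay phase completed, clamped to [0, 1].
    progress = min(1.0, (step - warmup_steps) / max(1, total_steps - warmup_steps))
    # Cosine goes from 1 to -1, so the lr goes from max_lr down to min_lr.
    return min_lr + 0.5 * (max_lr - min_lr) * (1.0 + math.cos(math.pi * progress))


def perplexity(mean_ce_loss: float) -> float:
    """Perplexity is exp of the mean per-character cross-entropy (in nats)."""
    return math.exp(mean_ce_loss)


print(round(perplexity(4.51)))  # matches the ~91 reported above
```

In a PyTorch loop, a function like this could be attached with `torch.optim.lr_scheduler.LambdaLR(optimizer, lr_lambda=lambda s: lr_at(s) / 5e-4)` (the lambda returns a multiplier on the optimizer's base lr), and gradient clipping would typically be `torch.nn.utils.clip_grad_norm_(model.parameters(), max_norm=1.0)` before each `optimizer.step()`.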

Example Output

Seed: 大漠孤烟直
Generated:
大漠孤烟直,秋霜未落残。
后期殊灼得,前事有堆劳。
恩势已云懦,幽深不可逃。
王程未去绝,左右空踟蹰。
门径微月色,庭盘晓云初。

License

MIT
