# LongNet-Char-Shakespeare
A character-level LongNet language model trained on the full Tiny Shakespeare dataset (~1.1M characters).
## Model Details
- Architecture: LongNet (dilated attention Transformer), designed to scale to very long contexts (the LongNet paper targets up to 1B tokens); see the sketch after this list
- Base model: Custom from-scratch implementation (no transformers library dependency)
- Parameters: ~6.3M
- Context length used in training: 8192 tokens (character-level)
- Training data: Tiny Shakespeare (full text of Shakespeare's plays)
- Tokenizer: Character-level (65-token vocabulary)
- Training steps: [insert your final step count, e.g. 5000+]
- Hardware: Single consumer GPU (RTX-class or equivalent, ~4 GB VRAM)
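
LongNet's core mechanism is dilated attention: the sequence is split into fixed-length segments, each segment is sparsified by keeping every r-th position, and dense attention is computed only over the sparsified segments. The sketch below illustrates a single (segment length `w`, dilation `r`) branch. It is a simplified illustration, not this repository's code, and it omits the causal mask and the mixing of multiple (w, r) branches across heads that the full model uses.

```python
import torch
import torch.nn.functional as F

def dilated_attention(q, k, v, w=2048, r=4):
    """One dilated-attention branch (illustrative sketch).
    q, k, v: (batch, seq_len, dim); seq_len must be divisible by w."""
    b, n, d = q.shape
    out = torch.zeros_like(q)
    for start in range(0, n, w):                        # split into segments of length w
        idx = torch.arange(start, start + w, r)         # keep every r-th position (dilation)
        qs, ks, vs = q[:, idx], k[:, idx], v[:, idx]    # sparsified segment
        attn = F.softmax(qs @ ks.transpose(-2, -1) / d ** 0.5, dim=-1)
        out[:, idx] = attn @ vs                         # scatter segment output back
    return out
```

In the full model, several (w, r) pairs with different offsets run in parallel so that every position is attended to by some branch, which is what keeps the attention cost roughly linear in sequence length.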
## Usage
```python
from longnet_model import LongNetLM

# Load the pretrained character-level checkpoint from the Hub
model = LongNetLM.from_pretrained("your-username/longnet-char-shakespeare")
model.eval()  # switch to inference mode
```
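
The repository's generation helpers are not shown above. As a rough illustration, assuming the model's forward pass maps a (batch, seq) tensor of character ids to (batch, seq, vocab) logits, and that `stoi`/`itos` are the character-to-id mappings built from the training text, a greedy sampling loop might look like this (hypothetical names, not the repo's actual API):

```python
import torch

@torch.no_grad()
def sample(model, stoi, itos, prompt="ROMEO:", max_new_tokens=200):
    # Encode the prompt as character ids (stoi/itos are assumed helpers)
    idx = torch.tensor([[stoi[c] for c in prompt]], dtype=torch.long)
    for _ in range(max_new_tokens):
        logits = model(idx)                    # assumed shape: (1, seq, vocab)
        next_id = logits[:, -1, :].argmax(-1)  # greedy choice of next character
        idx = torch.cat([idx, next_id[:, None]], dim=1)
    return "".join(itos[i] for i in idx[0].tolist())
```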