# scratch-model
This is a transformer model built from scratch with the Incremental Model Trainer.
## Model Configuration
- Architecture: Transformer decoder
- Parameters: 9.3M
- Hidden Size: 256
- Layers: 8
- Attention Heads: 4
- FFN Dimension: 512
- Vocabulary Size: 8000 tokens
- Max Sequence Length: 4096 tokens
- Dropout: 0.1
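The card lists about 9.3M parameters but does not spell out the layer layout. As a rough cross-check, the sketch below estimates the count from the hyperparameters above, assuming a GPT-2-style block: biased Q/K/V/output and feed-forward projections, two LayerNorms per layer, learned absolute position embeddings, a final LayerNorm, and a bias-free output head. These layout choices are assumptions, not taken from `ScratchTransformer`, and `estimate_params` is a hypothetical helper. Under them, an untied output head gives roughly 9.36M, close to the listed figure; tying the head to the token embedding would give roughly 7.31M.

```python
def estimate_params(
    vocab_size: int = 8000,
    hidden: int = 256,
    layers: int = 8,
    heads: int = 4,
    ffn_dim: int = 512,
    max_seq_len: int = 4096,
    learned_positions: bool = True,
    tied_lm_head: bool = False,
) -> int:
    """Rough parameter count for an assumed GPT-style decoder layout."""
    # Hidden size must split evenly across heads (here 256 / 4 = 64 per head).
    assert hidden % heads == 0, "hidden size must divide evenly across heads"

    # Token embedding table, plus an optional learned absolute position table.
    total = vocab_size * hidden
    if learned_positions:
        total += max_seq_len * hidden

    # Per layer: Q, K, V and output projections, each with weight + bias.
    attn = 4 * (hidden * hidden + hidden)
    # Feed-forward block hidden -> ffn_dim -> hidden, weights + biases.
    ffn = hidden * ffn_dim + ffn_dim + ffn_dim * hidden + hidden
    # Two LayerNorms per layer, each with a scale and a shift vector.
    norms = 2 * (2 * hidden)
    total += layers * (attn + ffn + norms)

    # Final LayerNorm before the output projection.
    total += 2 * hidden
    # Bias-free projection to the vocabulary, unless shared with the embedding.
    if not tied_lm_head:
        total += hidden * vocab_size
    return total


print(f"{estimate_params() / 1e6:.2f}M")                   # → 9.36M
print(f"{estimate_params(tied_lm_head=True) / 1e6:.2f}M")  # → 7.31M
```

Under these assumptions the two 8000 × 256 vocabulary tables (input embedding and output head) make up over 40% of the untied total, which is typical for small models with a sizeable vocabulary.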
## Usage
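Run from the directory that contains `config.json` and the model weights:

```python
import json
from trainer.scratch_model import ScratchModelConfig, ScratchTransformer
with open("config.json") as f:  # close the file after reading the config
    config = ScratchModelConfig.from_dict(json.load(f))
model = ScratchTransformer.from_pretrained(".", config)  # "." = weights directory
```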
Created with Incremental Model Trainer