scratch-model
This is a transformer decoder model built from scratch with the Incremental Model Trainer.
Model Configuration
- Architecture: Transformer decoder
- Parameters: 13.5M
- Hidden Size: 256
- Layers: 16
- Attention Heads: 16
- FFN Dimension: 512
- Vocabulary Size: 8000
- Max Sequence Length: 4096
- Dropout: 0.1
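As a rough sanity check, the parameter count can be estimated from the other values in the list. The sketch below is not part of the Incremental Model Trainer: the ScratchConfig class and estimate_parameters function are hypothetical names. It assumes a standard decoder block (four attention projections, a two-layer FFN, two LayerNorms, all with biases), learned position embeddings, and an output head that is not tied to the token embeddings. The actual model may differ on any of these points.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ScratchConfig:
    """Values copied from the configuration list; the class itself is hypothetical."""
    hidden_size: int = 256
    num_layers: int = 16
    num_heads: int = 16
    ffn_dim: int = 512
    vocab_size: int = 8000
    max_seq_len: int = 4096


def estimate_parameters(cfg: ScratchConfig) -> int:
    """Estimate the parameter count under the assumptions stated above."""
    d, ff = cfg.hidden_size, cfg.ffn_dim
    # Q, K, V and output projections, each d x d plus a bias vector.
    attention = 4 * (d * d + d)
    # Two linear layers (d -> ff and ff -> d), each with a bias.
    feed_forward = (d * ff + ff) + (ff * d + d)
    # Two LayerNorms per block, each with a scale and a shift vector.
    layer_norms = 2 * (2 * d)
    per_layer = attention + feed_forward + layer_norms

    token_embeddings = cfg.vocab_size * d
    position_embeddings = cfg.max_seq_len * d  # assumption: learned positions
    final_norm = 2 * d
    lm_head = cfg.vocab_size * d  # assumption: output weights are not tied

    return (cfg.num_layers * per_layer + token_embeddings
            + position_embeddings + final_norm + lm_head)


cfg = ScratchConfig()
# The hidden size must split evenly across the attention heads.
assert cfg.hidden_size % cfg.num_heads == 0
head_dim = cfg.hidden_size // cfg.num_heads
total = estimate_parameters(cfg)
print(f"head_dim={head_dim}, estimated parameters={total / 1e6:.2f}M")
```

Under these assumptions the estimate comes to about 13.6M, close to the listed 13.5M. Dropping the bias terms lowers it slightly. Each attention head works with 256 / 16 = 16 dimensions.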
Usage
from trainer.scratch_model import ScratchModelCreator
creator = ScratchModelCreator()
model, tokenizer, config = creator.load_with_tokenizer("path/to/model")
Created with Incremental Model Trainer