# scratch-model
A small transformer decoder model created from scratch with the Incremental Model Trainer.
## Model Configuration
- **Architecture**: Transformer decoder
- **Parameters**: 9.3M
- **Hidden Size**: 256
- **Layers**: 8
- **Attention Heads**: 4
- **FFN Dimension**: 512
- **Vocabulary Size**: 8000
- **Max Sequence Length**: 4096
- **Dropout**: 0.1
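
As a rough sanity check on the parameter count, the sketch below estimates the size of a decoder with the dimensions listed above. The layout is an assumption, not taken from the trainer's source: learned positional embeddings, biased linear layers, two LayerNorms per block plus a final one, and an output projection that is not tied to the token embedding. The function `estimate_decoder_params` is a hypothetical helper written for this illustration.

```python
def estimate_decoder_params(hidden, layers, ffn, vocab, max_len,
                            tied_output=False, learned_positions=True):
    """Estimate parameters of a GPT-style decoder (assumed layout, see text)."""
    token_emb = vocab * hidden
    pos_emb = max_len * hidden if learned_positions else 0

    # Q, K, V and output projections, each hidden x hidden with a bias.
    attn = 4 * (hidden * hidden + hidden)
    # Two-layer feed-forward network: hidden -> ffn -> hidden, with biases.
    mlp = (hidden * ffn + ffn) + (ffn * hidden + hidden)
    # Two LayerNorms per block, each with a scale and a shift vector.
    norms = 2 * 2 * hidden
    per_layer = attn + mlp + norms

    final_norm = 2 * hidden
    # An untied output head adds a second vocab x hidden matrix.
    lm_head = 0 if tied_output else vocab * hidden

    return token_emb + pos_emb + layers * per_layer + final_norm + lm_head


total = estimate_decoder_params(hidden=256, layers=8, ffn=512,
                                vocab=8000, max_len=4096)
print(f"{total:,} parameters (~{total / 1e6:.2f}M)")
```

Under these assumptions the estimate is 9,361,920 parameters, close to the stated 9.3M. With a tied output head it drops to 7,313,920, which suggests the embedding and output matrices are counted separately. Each attention head has dimension 256 / 4 = 64.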
## Usage
```python
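import json
from trainer.scratch_model import ScratchModelConfig, ScratchTransformer
# Requires the trainer package on the import path; run from the directory holding config.json and the weights.
with open("config.json") as f:
    config = ScratchModelConfig.from_dict(json.load(f))
model = ScratchTransformer.from_pretrained(".", config)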
```
Created with [Incremental Model Trainer](https://huggingface.co/spaces/broadfield/incremental-model-trainer)